The Bacillus Formate Bioeconomy

Anthropogenic emission of greenhouse gases has prompted massive investment in renewable energy and is driving humanity towards becoming an energy-abundant society. Eventually, renewable electricity will be a more efficient method than photosynthesis for capturing energy. Excess energy from renewable sources can be diverted towards producing molecules that microorganisms can use directly to make useful products.

One such molecule that can be used as a carbon and energy source for living organisms is formate. Formate can be efficiently produced using electricity with up to 98% Faradaic efficiency[1] and has even been proposed as a foundational molecule of the future bioeconomy[2]. The process is simple: excess energy from renewable sources is used to produce formate. This formate is then fed to microorganisms, which use it as a carbon source for biomass generation and as an energy source. These organisms can be used as fuels, feedstocks, or as factories for useful chemicals. In 2011, an economic analysis of formate production[3] put the electrical production of formate at ~$573 per ton (Fig. 16), with $558 of that spent on the 7,838 kWh of electricity necessary for conversion. Glucose costs approximately $300-$400 per ton[2], with 3x the number of bonds available for energy. Formate therefore becomes economically competitive with glucose when 1 ton of formate costs ~$100 (one third of the low-end glucose price), which, with the ~$15 of non-electricity costs unchanged, means 7,838 kWh costing ~$85, or 1 kWh costing ~1 cent.
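The break-even arithmetic above can be reproduced in a few lines. This is only a sketch using the figures quoted in the text; the variable and function names are my own, and it assumes the non-electricity share of the 2011 cost stays fixed.

```python
# Figures quoted in the text (2011 analysis [3] and glucose price [2]).
FORMATE_COST_PER_TON = 573.0      # USD, total electrochemical production cost
ELECTRICITY_COST_PER_TON = 558.0  # USD, share of that cost spent on electricity
KWH_PER_TON = 7838.0              # kWh of electricity per ton of formate
GLUCOSE_PRICE_PER_TON = 300.0     # USD, low end of the $300-$400 range
ENERGY_RATIO = 3.0                # the text's "3x" factor for glucose vs formate


def breakeven_electricity_price(glucose_price=GLUCOSE_PRICE_PER_TON):
    """Return (target formate price, electricity budget, USD per kWh) at parity."""
    # Formate must cost 1/3 of glucose per ton to match it on energy content.
    target_formate_price = glucose_price / ENERGY_RATIO
    # Assumption: non-electricity costs stay at their 2011 value.
    fixed_costs = FORMATE_COST_PER_TON - ELECTRICITY_COST_PER_TON
    electricity_budget = target_formate_price - fixed_costs
    return target_formate_price, electricity_budget, electricity_budget / KWH_PER_TON


target, budget, per_kwh = breakeven_electricity_price()
print(f"target formate price: ${target:.0f}/ton")
print(f"electricity budget:   ${budget:.0f}/ton")
print(f"break-even power:     {per_kwh * 100:.2f} cents/kWh")
```

Passing a different glucose price (for example 400.0) shows how sensitive the break-even electricity price is to the sugar market.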

Solar is rapidly decreasing in price, with some estimates predicting that the average price per kWh of solar in sunny parts of the world will be only ~1-2 cents by 2030[4]. Assuming no new technology is created to increase the efficiency of CO2 to formate conversion and assuming no carbon offset incentives, formate should be competitive with glucose for bioproduction within 10-15 years.

Bacillus subtilis is an important chassis for the future decentralized bioeconomy because of some unique abilities the bacterium has, such as natural competence[5], sporulation[6], and protein secretion[7]. Natural competence allows Bacillus subtilis to take up DNA from its environment using its own biological machinery, simplifying the process of genetic modification. Sporulation allows Bacillus subtilis to become resistant to nearly all environmental conditions, including exposure to the vacuum of space[8], greatly simplifying storage of strains and freeing strain distribution from refrigerated cold chains. The protein secretion abilities of Bacillus subtilis make it a widely used organism for industrial production of enzymes[9], simplifying the process necessary to purify enzymes from expression strains. Bacillus subtilis has also been a model organism[10] for over 20 years, with some researchers even investigating Bacillus subtilis as a chassis for minimal life[11].

As genetic engineering becomes more common, it also becomes necessary to put in place biosafety mechanisms that prevent genetically engineered strains from being released and spreading into the environment. Previously, this has been done with synthetic auxotrophies[12]. A similar approach could be taken with synthetic formate auxotrophy[13] by knocking out sugar metabolism genes, creating an organism which is industrially viable yet cannot survive in the natural environment. The protection also works in the other direction: it is unlikely that environmental strains of bacteria, which may be harmful to humans, will be able to survive in formate-based media. Natural resistance to contamination and environmental release makes formatotrophic strains attractive engineering platforms.

For these reasons, I propose the synthesis of all genes necessary to make Bacillus subtilis a formatotroph. This toolkit will include ~9 genes necessary for the biosynthetic pathway and ~6 deletion vectors for the verification of biosynthetic function. Hopefully, this will enable the creation of a scalable, useful, and safe bacterial production platform for the future.

IP rights

There is currently a patent, US20150218528A1[14], for the general use of this pathway, which expires in 2033.


[1] Electroreduction of CO2 to Formate on a Copper-Based Electrocatalyst at High Pressures with High Energy Conversion Efficiency

[2] The formate bio-economy

[3] The Electrochemical Reduction of Carbon Dioxide to Formate/Formic Acid: Engineering and Economic Feasibility

[4] Solar’s Future is Insanely Cheap (2020)

[5] Natural competence

[6] Sporulation in Bacillus subtilis

[7] Protein secretion pathways in Bacillus subtilis: Implication for optimization of heterologous protein secretion

[8] Viability of Bacillus subtilis spores exposed to space environment in the M-191 experiment system aboard Apollo 16

[9] Improvement of Heterologous Protein Secretion by Bacillus subtilis

[10] Bacillus subtilis, the model Gram-positive bacterium: 20 years of annotation refinement

[11] The Blueprint of a Minimal Cell: MiniBacillus

[12] Biological containment of genetically modified Lactococcus lactis for intestinal delivery of human interleukin 10

[13] Growth of E. coli on formate and methanol via the reductive glycine pathway

[14] US patent US20150218528A1


This toolkit contains the specific genes necessary to give Bacillus subtilis synthetic formatotrophy. A formatotrophic Bacillus subtilis isn't supplied - that is up to you! If you need a good review on this system in Escherichia coli, check out "Growth of E. coli on formate and methanol via the reductive glycine pathway". The pathway described there can be a little hard to understand, so I've broken down the biosynthesis pathway with EC numbers and explanations below.

Growth of E. coli on formate and methanol via the reductive glycine pathway

Why synthetic formatotrophy?

The idea of synthetic formatotrophy is to build a pathway that allows an organism to depend on formate instead of glucose for cellular energy and carbon biomass. This formate can be created with renewable energy sources, removing photosynthesis as the bottom rung of the trophic ladder. If humanity can increase electricity production enough, synthetic formatotrophy opens up a whole new way to create biomass for useful chemicals, food, and more!

There are 2 pathways that can use formate to generate biomass in organisms that do not natively utilize formate - the rubisco pathway and the reductive glycine pathway. The rubisco pathway uses formate to power a non-native Calvin cycle, while the reductive glycine pathway uses formate to synthesize glycine, serine, and pyruvate. The rubisco pathway's cycle creates a lot of intermediate metabolic products that are used in other pathways, while the reductive glycine pathway is more linear and easier to model. This becomes apparent in the length of time necessary to achieve synthetic formatotrophy - with the rubisco pathway, it took ~200 days of continuous evolution to get a cell that could live purely on formate, while with the reductive glycine pathway, a formatotroph could be found immediately after all modifications were made. In addition to this advantage, we will be using the reductive glycine pathway since there are auxotrophic methods to test if the formatotrophy pathways are functioning.

Rubisco pathway

Reductive glycine pathway

How do you create synthetic formatotrophy?

To get from formate to cell energy, you convert formate to NADH, which can be used for the electron transport chain. To get from formate to biomass, you convert formate to glycine, then to serine, and finally to pyruvate. Pyruvate can be used in central metabolism, while glycine and serine can be used as amino acids in proteins.

The entire pathway is broken up into 4 "modules" - EM, C1, C2, and C3. Successful expression of EM will allow the cell to use formate for energy. Successful expression of C1 and C2 allows glycine auxotrophs to synthesize glycine. Successful expression of C3 will allow serine auxotrophs to synthesize serine (and pyruvate). In Escherichia coli, there are a total of 9 genes that need to be expressed: 1 for EM, 3 for C1, 3 for C2, and 2 for C3. The 4 genes in EM and C1 need to be introduced synthetically from different organisms (Pseudomonas strain 101 is used for EM and Methylobacterium extorquens is used for C1), while the 5 genes in C2 and C3 already exist in the native biosynthetic pathways of Escherichia coli and simply need to be overexpressed. In Bacillus subtilis, the same 4 genes for EM and C1 will need to be introduced synthetically, and a different set of genes is used for C2 and C3.
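As a quick sanity check, the module layout described above for Escherichia coli can be tallied in code. The dictionary layout and field names here are just illustrative shorthand of mine, not part of the toolkit.

```python
# Module layout as described for Escherichia coli in the text.
# "introduced" means the genes come from another organism; otherwise they are
# native genes that only need to be overexpressed.
MODULES = {
    "EM": {"genes": 1, "introduced": True, "source": "Pseudomonas sp. (strain 101)"},
    "C1": {"genes": 3, "introduced": True, "source": "Methylobacterium extorquens"},
    "C2": {"genes": 3, "introduced": False, "source": "native"},
    "C3": {"genes": 2, "introduced": False, "source": "native"},
}

total = sum(m["genes"] for m in MODULES.values())
introduced = sum(m["genes"] for m in MODULES.values() if m["introduced"])
overexpressed = total - introduced

print(f"total genes: {total}")
print(f"introduced from other organisms: {introduced}")
print(f"native, overexpressed: {overexpressed}")
```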

In Escherichia coli, 6 deletions are required to test the functionality of the formate uptake and conversion pathway - serA, kbl, ltaE, aceA, glyA, and sdaA. Deleting serA forces serine auxotrophy, deleting kbl and ltaE removes threonine cleavage to glycine, and deleting aceA forces glycine auxotrophy. glyA and sdaA are overexpressed in C3, and so were deleted from the genome (presumably so that homologous recombination wouldn't occur between the native and introduced sequences). In Bacillus subtilis, at least 4 of those deletions would be necessary to test if C1, C2, and C3 are functional.

Let's do a deep dive of the actual pathway used for synthetic formatotrophy.

The Synthetic Formatotrophy Pathway

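This is the compressed pathway for synthetic formatotrophy, listed by enzyme.

# Energy
formate dehydrogenase

# Biomass
formate-THF ligase -> methenyltetrahydrofolate cyclohydrolase -> methylenetetrahydrofolate dehydrogenase -> T protein -> P protein[1][2] -> serine hydroxymethylase[3] -> serine deaminase[4]
[1] Recycles with L protein to enter at T protein
[2] Glycine output can be used as amino acid
[3] Serine output can be used as amino acid
[4] Pyruvate output can be used for energy or for fatty acid synthesis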

EM - Energy production

formate -> NADH (energy)

In the EM module, formate is directly converted to NADH for use as energy in the cell.

This single reaction is performed by formate dehydrogenase, creating NADH, which can be used with NADH dehydrogenase to create a proton gradient and generate energy (ATP).

EC - formate dehydrogenase:
formate + NAD+ = CO2 + NADH

C1 - Formate conversion

formate -> 5,10-methylene-THF

In the C1 module, formate is converted to a more familiar molecule, 5,10-methylene-THF, for entry into the C2 module and C3 module.

These 3 reactions are quite simple and each relies on a single protein. Formate-THF ligase first ligates formate onto THF to give 10-formyl-THF, which is then converted to 5,10-methenyl-THF by methenyltetrahydrofolate cyclohydrolase. Methylenetetrahydrofolate dehydrogenase finally converts 5,10-methenyl-THF to 5,10-methylene-THF.

EC - formate---tetrahydrofolate ligase:
formate + THF + ATP = 10-formyl-THF + ADP + Pi

EC - methenyltetrahydrofolate cyclohydrolase:
10-formyl-THF = 5,10-methenyl-THF + H2O

EC - methylenetetrahydrofolate dehydrogenase:
5,10-methenyl-THF + NADPH + H+ = 5,10-methylene-THF + NADP+
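To see what the C1 module costs per formate, the three reactions can be summed so that the intermediates (10-formyl-THF and 5,10-methenyl-THF) cancel. This is a minimal sketch: the helper names are my own, and the stoichiometry is the simplified form written in this section rather than a fully charge-balanced version.

```python
from collections import Counter

# Each reaction maps species -> coefficient: negative = consumed, positive = produced.
# The reactions are written in the formate-assimilating direction.
C1_REACTIONS = {
    "formate-THF ligase": {
        "formate": -1, "THF": -1, "ATP": -1,
        "10-formyl-THF": 1, "ADP": 1, "Pi": 1,
    },
    "methenyl-THF cyclohydrolase": {
        "10-formyl-THF": -1,
        "5,10-methenyl-THF": 1, "H2O": 1,
    },
    "methylene-THF dehydrogenase": {
        "5,10-methenyl-THF": -1, "NADPH": -1, "H+": -1,
        "5,10-methylene-THF": 1, "NADP+": 1,
    },
}


def net_reaction(reactions):
    """Sum reactions and drop species whose coefficients cancel to zero."""
    totals = Counter()
    for stoichiometry in reactions.values():
        for species, coefficient in stoichiometry.items():
            totals[species] += coefficient
    return {s: c for s, c in totals.items() if c != 0}


def format_reaction(net):
    """Render a net reaction; every coefficient here is 1, so only names are shown."""
    left = " + ".join(s for s, c in net.items() if c < 0)
    right = " + ".join(s for s, c in net.items() if c > 0)
    return f"{left} = {right}"


print(format_reaction(net_reaction(C1_REACTIONS)))
```

The net result is one ATP and one NADPH spent for every formate converted to 5,10-methylene-THF.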

C2 - Glycine cleavage system / synthase

5,10-methylene-THF + CO2 -> glycine

In the C2 module, 5,10-methylene-THF is converted into glycine. Glycine can be used as an amino acid, or given to the C3 module for further conversion.

There is a fantastic Wikipedia page on the glycine cleavage system, so below will be a focused simplification. The most important thing to remember about the glycine cleavage system is that its reactions are reversible, so the direction of metabolic flux is determined by the concentrations of substrates and products. For biosynthesis from formate, we will be running the glycine cleavage system in the direction of glycine synthesis by supplying more substrates.

There are 4 proteins in the system: the T-protein, the P-protein, the L-protein, and the H-protein. The H-protein facilitates transfer of molecules between the 3 other proteins. The T-protein does reaction EC, using the H-protein, 5,10-methylene-THF from the C1 system, and ammonia to produce THF and the second H-protein state. The H-protein is then passed to the P-protein, which uses the H-protein and CO2 to produce glycine and the third H-protein state. Finally, the H-protein is passed to the L-protein, which uses NADH and H+ to regenerate the original H-protein (releasing NAD+), which is recycled into the system. In the glycine synthesis direction the system therefore consumes NADH; it only produces NADH when running in the cleavage direction.

- H(x) represents H protein in different states
- 5,10-methylene-THF is the output of C1

EC - T protein:
H(0) + 5,10-methylene-THF + NH3 = H(1) + THF

EC - P protein:
H(1) + CO2 = glycine + H(2)

EC - L protein:
H(2) + NADH + H+ = H(0) + NAD+

C3 - Serine and pyruvate synthesis

glycine + methylene-THF -> serine + pyruvate

In the C3 module, glycine is converted into serine and pyruvate to make biomass.

The C3 module uses 2 enzymes, serine hydroxymethylase and serine deaminase. Serine hydroxymethylase converts glycine and 5,10-methylene-THF into serine, which can be used directly as an amino acid. Alternatively, serine deaminase can convert the serine to pyruvate, which can be used for energy or for creating fatty acids.

- EC, serine deaminase, actually does 3 reactions to get to its final product
- EC, threonine deaminase, can work as a serine deaminase in several organisms
- 5,10-methylene-THF is the output of C1 and glycine is the output of C2

EC - serine hydroxymethylase:
5,10-methylene-THF + glycine + H2O = THF + L-serine

EC - serine deaminase:
L-serine = pyruvate + NH3

The Deletions

As said above, there are 4 genes that must be deleted to test the synthetic formatotrophy pathway. Generally, these genes prevent synthesis of serine and glycine from native pathways. Glycine and serine auxotrophy can then be used to test if formatotrophy is functioning in the target organism. Quite a clever trick!

- serA catalyzes the first step of serine biosynthesis (its product, 3-phosphooxypyruvate, is used to synthesize serine)
- In the literature, it seems glyoxylate synthesized by aceA can be used for glycine synthesis under stress conditions

serA - EC
(2R)-3-phosphoglycerate + NAD+ = 3-phosphooxypyruvate + H+ + NADH

kbl - EC
(2S)-2-amino-3-oxobutanoate + CoA = glycine + acetyl-CoA

ltaE - EC
L-threonine = acetaldehyde + glycine

aceA - EC
D-threo-isocitrate = glyoxylate + succinate
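The auxotrophy trick can be written down as a tiny truth table. This is a deliberately simplified sketch: the function name `required_supplements` is my own, it assumes the deletions completely block native glycine and serine synthesis, and it ignores the EM module and any leaky side reactions.

```python
def required_supplements(functional_modules):
    """Return the amino acids a deletion strain would still need in the medium.

    Simplified logic from the module descriptions in the text:
    - C1 makes 5,10-methylene-THF and C2 turns it into glycine.
    - C3 makes serine from glycine plus 5,10-methylene-THF.
    """
    modules = set(functional_modules)
    needs = set()

    # Glycine synthesis needs both C1 and C2 to work.
    if not {"C1", "C2"} <= modules:
        needs.add("glycine")

    # Serine synthesis needs C3 plus the C1 output; the glycine it consumes
    # may come from C2 or from the medium, so C2 is not required here.
    if not {"C1", "C3"} <= modules:
        needs.add("serine")

    return needs


for modules in (["C1", "C2", "C3"], ["C1", "C2"], ["C1", "C3"], []):
    print(modules, "->", sorted(required_supplements(modules)) or "no supplements")
```

A strain that grows without glycine or serine in the medium is therefore evidence that C1, C2, and C3 are all functional.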

Gene deletions can be done in several different ways, though the most promising would likely be CRISPR plasmids.

CRISPR plasmids for engineering Bacillus subtilis

Synthesis Plan

Paper with most genes: "Growth of E. coli on formate and methanol via the reductive glycine pathway"

EM module

In "Growth of E. coli on formate and methanol via the reductive glycine pathway", the EM module was built from the gene fdh (formate dehydrogenase) in Pseudomonas sp. (strain 101). We will be using that exact gene, codon optimized for Bacillus subtilis.


C1 module

In "Growth of E. coli on formate and methanol via the reductive glycine pathway" the C1 module was built from genes from Methylobacterium extorquens - in particular, fhs (formate--tetrahydrofolate ligase), fchA (methenyltetrahydrofolate cyclohydrolase), and mtdA (methylenetetrahydrofolate dehydrogenase). We'll use these exact genes, codon optimized for Bacillus subtilis.




C2 module

The C2 module is made of the genes involved in the glycine cleavage system, which exists in both Escherichia coli and Bacillus subtilis. Therefore, the 3-4 genes necessary for this pathway will simply be PCR'd out of Escherichia coli and Bacillus subtilis. [TODO 2021-12-30: Update this with synthetic genes]

C3 module

The C3 module is also made out of genes naturally occurring in Escherichia coli and Bacillus subtilis. These too will be PCR'd out. [TODO 2021-12-30: Update this with synthetic genes]


The deletions are a bit trickier. There are many great systems for doing genomic modification of Bacillus subtilis, but we will be using CRISPR plasmids from "New CRISPR-Cas9 vectors for genetic modifications of Bacillus species". Since we only need to do 4 replacements, we will just use oligos with the existing protocol in that paper. If it works, we will move on to much higher throughput work to target all genes for deletion. The primers necessary to do this are part of primers.fasta. [TODO 2021-12-30: Update this with synthetic genes]
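For anyone designing their own guide oligos, a first pass at finding candidate target sites might look like the sketch below. It assumes a Cas9 that recognizes an NGG PAM (as Streptococcus pyogenes Cas9 does), only scans the strand it is given, and does no off-target checking. The function name and the toy sequence are hypothetical and are not taken from primers.fasta or from the paper's protocol.

```python
import re


def find_protospacers(sequence, spacer_length=20):
    """Yield (start, protospacer, pam) for every NGG PAM on the given strand."""
    sequence = sequence.upper()
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % spacer_length)
    for match in pattern.finditer(sequence):
        yield match.start(), match.group(1), match.group(2)


# Hypothetical toy sequence, not a real Bacillus subtilis locus.
toy = "ATGCATGCATGCATGCATGCAGGTTT"
for start, spacer, pam in find_protospacers(toy):
    print(start, spacer, pam)
```

To cover both strands, the same function can be run on the reverse complement of the target region.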

New CRISPR-Cas9 vectors for genetic modifications of Bacillus species

Keoni Gandall