Probing the Genome-Scale Metabolic Landscape of Bordetella pertussis, the Causative Agent of Whooping Cough

ABSTRACT Whooping cough is a highly contagious respiratory disease caused by Bordetella pertussis. Despite widespread vaccination, its incidence has been rising alarmingly, and yet, the physiology of B. pertussis remains poorly understood. We combined genome-scale metabolic reconstruction, a novel optimization algorithm, and experimental data to probe the full metabolic potential of this pathogen, using B. pertussis strain Tohama I as a reference. Experimental validation showed that B. pertussis secretes a significant proportion of nitrogen as arginine and purine nucleosides, which may contribute to modulation of the host response. We also found that B. pertussis can be unexpectedly versatile, being able to metabolize many compounds while displaying minimal nutrient requirements. It can grow without cysteine, using inorganic sulfur sources, such as thiosulfate, and it can grow on organic acids, such as citrate or lactate, as sole carbon sources, providing in vivo demonstration that its tricarboxylic acid (TCA) cycle is functional. Although the metabolic reconstruction of eight additional strains indicates that the structural genes underlying this metabolic flexibility are widespread, experimental validation suggests a role of strain-specific regulatory mechanisms in shaping metabolic capabilities. Among five alternative strains tested, three strains were shown to grow on substrate combinations requiring a functional TCA cycle, but only one strain could use thiosulfate. Finally, the metabolic model was used to rationally design growth media with >2-fold improvements in pertussis toxin production. This study thus provides novel insights into B. pertussis physiology and highlights the potential, but also the limitations, of models based solely on metabolic gene content. IMPORTANCE The metabolic capabilities of Bordetella pertussis, the causative agent of whooping cough, were investigated from a systems-level perspective. We constructed a comprehensive genome-scale metabolic model for B. pertussis and challenged its predictions experimentally. This systems approach shed light on new potential host-microbe interactions and allowed us to rationally design novel growth media with >2-fold improvements in pertussis toxin production. Most importantly, we also uncovered the potential for metabolic flexibility of B. pertussis (significantly larger range of substrates than previously alleged; novel active pathways allowing growth in minimal, nearly mineral nutrient combinations where only the carbon source must be organic), although our results also highlight the importance of strain-specific regulatory determinants in shaping metabolic capabilities. Deciphering the underlying regulatory mechanisms appears to be crucial for a comprehensive understanding of B. pertussis's lifestyle and the epidemiology of whooping cough. The contribution of metabolic models in this context will require the extension of the genome-scale metabolic model to integrate this regulatory dimension.


I.2. Newly defined pathways and reactions
The draft metabolic network of B. pertussis Tohama I, as reconstructed based on the metabolic network of E. coli, lacks several essential pathways of B. pertussis, and is therefore incapable of (i) producing several key biomass constituents (B. pertussis specific lipooligosaccharide constituents, B. pertussis specific siderophores, biosynthesis of storage compounds such as PHB or polyphosphate) and (ii) accurately explaining the catabolism of several amino acids (such as branched-chain or aromatic amino acids) which are taken up in sometimes large excess compared to the biomass needs. The necessary pathways were reconstructed based on literature information and gene content, as described in details in Dataset S3.

I.3. Final manual curation
A final round of manual curation was performed with the updated model. I.3.1. "Orphan" metabolites and elemental/charge balances Before curation, 362 metabolites (out of 1646) were identified as "orphan" metabolites -i.e. metabolites that are the product or substrate of only one reaction in the network -, reflecting missing reactions in the network or "carry-over" reactions from the E. coli network. These metabolites were screened manually and decision was taken either to remove them from the network (together with the corresponding reactions) or to add the missing reaction(s). This was performed iteratively, on a case-by-case basis. In parallel, all reactions were checked for charge and atom balance, and corrected if required. Overall, this resulted in the deletion of 390 reactions and 396 metabolites, the addition of 154 new reactions and 13 metabolites, and the modification of 30 reactions (name and/or equation) and 15 metabolites (name only).
Every reaction in the resulting network is elementary and charge balanced, and the network only contains 15 "orphan" metabolites, which were retained because they are the product/substrate of a gene-associated reaction (hence, there is a high confidence level for the presence of the reaction). These metabolites reflect different situations: (i) possible remaining gaps in the network which could not be filled based on genome sequence analysis alone (for instance, several acyl-coenzyme A species), (ii) absence of an external sink for the metabolite (for instance, biotin or several metals, which are not part of the biomass equation), or (iii) truly incomplete pathways such as NAD biosynthesis (hence, the auxotrophy of B. pertussis for niacin). For model-based identification of metabolic engineering strategies, for instance, it is important to keep such metabolites in the network, although the corresponding reactions would never carry a flux in the current version of the model. Also, these 15 metabolites represent good starting points for future improvement of the B. pertussis metabolic network. Genes encoding all enzymes of the TCA cycle are present in the genome sequence (and therefore in the modeled metabolic network), whereas this cycle has been reported as partially dysfunctional, based on the observation that acetyl-coenzyme A and oxaloacetate cannot be converted to α-ketoglutarate [6]. Reactions corresponding to the dysfunctional part of the TCA cycle are R_CS (M_accoa + M_oaa + M_h2o => M_cit + M_coa + M_h), R_ACONTa (M_cit <=> M_acon_C + M_h2o), R_ACONTb (M_acon_C + M_h2o <=> M_icit), R_ICDHyr (M_icit + M_nadp <=> M_akg + M_co2 + M_nadph). In order to account for the physiological observation of Thalen et al. [6], these 4 reactions were constrained in order not to function in the direction from oxaloacetate to α-ketoglutarate (i.e. R_CS: 0,0; R_ACONTa, R_ACONTb, and R_ICDHyr: -99999,0). Flux balance analysis (FBA) was then performed on different model versions, using the external flux constraints derived from measurements in the reference fermentations (Dataset S1), in order to evaluate the impact on (i) growth feasibility and (ii) the energetics of the cell (i.e. how much ATP can be produced). No impact on the feasibility of growth was found with a partially dysfunctional Krebs cycle. However, a significant impact was observed on the maximum possible energy production, depending on the constraints imposed on the Krebs cycle. Constraining either R_CS, R_ACONTa, or R_ACONTb (i.e. any of the reactions that leads from oxaloacetate to isocitrate) results in an equivalent impact on ATP production (19% decrease), while the impact of constraining only R_ICDHyr is less dramatic (10% decrease).

I.3.2. Default constraints to account for physiological observations
In view of the potentially big impact on energy parameters, the abovementioned constraints for all 4 reactions were implemented by default in the model, i.e. the default model was constrained to have a partially dysfunctional Krebs cycle.

I.3.2.2. L-Cys biosynthesis from inorganic sulfate
Because the full sequence of reactions for the L-Cys biosynthesis pathway from sulfate via 3'-phosphoadenylyl sulfate, is present in the reconstructed B. pertussis Tohama I metabolic network, growth with sulfate as the sole source of S would be feasible in silico. However, this would be in contradiction with many reports about the growth requirements of B. pertussis [7,8]. Based on these physiological observations and on the fact that two of the genes encoding reactions in the pathway from sulfate to L-Cys are pseudogenes (BP0970, encoding CysH, and BP0971, encoding CysN [8]), default constraints were set on the corresponding reactions (R_PAPSR, R_PAPSR2, and R_ADSK) so that no flux is possible through this pathway. This was verified with FBA: no biomass production was possible with sulfate as the sole S source when these 3 reactions were constrained.

II.1. Definitions
Metabolic energy is classically divided between the demands of biosynthetic processes (growth) and those of nongrowth associated processes [9], which are traditionally represented by the growth yield (Y ATP ; g DCW mmol ATP -1 ) and the maintenance coefficient (m ATP ; mmol ATP g DCW -1 h -1 ), respectively. This can be represented by the following relation: where q ATP represents the specific ATP consumption rate (mmol ATP g DCW -1 h -1 ) and µ is the specific growth rate (h -1 ).
A fraction of the growth-associated ATP consumption (termed x; mmol ATP g DCW -1 ) is expcitly included in the B. pertussis reaction network in the reactions leading to the synthesis of biomass precursors. However, growth (i.e. the net production of new biomass) also requires additional ATP expenses, which are not explicitly taken into account in the model. This non-explicit ATP consumption (termed y; mmol ATP g DCW -1 ) covers the amount of ATP necessary for the assembly of biomass precursors into new cells [10]. The growth yield Y ATP , the total growth-associated ATP consumption (GAM TOT ; mmol ATP g DCW -1 ), and parameters x and y are related as follows: From a modeling point of view, x, the explicit growth-associated ATP consumption, is intrinsically taken into account in the model. The non-explicit growth-associated ATP consumption, y, can be introduced as an ATP hydrolysis reaction within the biomass equation and has units mmol ATP g DCW -1 [10]. The maintenance ATP consumption (i.e. non-growth associated) can be modeled as an independent ATP hydrolysis reaction (R_ATPM: M_atp_c + M_h2o_c => M_adp_c + M_pi_c + M_h_c), the flux through which represents maintenance [10]. Units of the flux through the maintenance reaction should be identical to units of the other fluxes.
Conversion of specific rates into concentrations (or absolute amounts) in equation 1 results in the following equation: where ATP tot represents the total ATP consumption (mmol ATP L -1 or mmol ATP ), X t=0 is the biomass concentration (or amount) at the start of fermentation (g DCW L -1 or g DCW ), and t is the cultivation time (h). Importantly, equation 3 implies a constant specific growth rate, i.e. exponential growth.
Equation 3 can also be written in terms of x and y (see equation 2): ) to the intercept with the y-axis (ATP consumption at zero growth). ATP consumption can be calculated considered equal to ATP that production under the assumption of full coupling between catabolic ATP supply and anabolic ATP demand.
In the present study, the two reference fermentations are batch fermentations, and no chemostat data were generated. The total amount of ATP generated (ATP tot ; mmol ATP ) was calculated independently for each phase by using Flux Balance Analysis (FBA) to maximize the flux through reaction R_ATPM (an ATP hydrolysis reaction), using a model version without a functional Krebs cycle, with a biomass equation containing no growth-associated ATP hydrolysis (y=0), and with constraints reflecting the measured substrates and products (full description of constraints available in Dataset S4).
The maximum possible ATP production for every phase of the two fermentations, as calculated from FBA, is summarized in Dataset S4. These values represent y, the fraction of ATP consumption that is not explicitly taken into account in the model (i.e. maintenance and assembly of the biomass precursors into cells). From the same FBA simulations, the fraction of growth-associated ATP consumption which is explicitly taken into account in the model in the form of biomass precursor biosynthesis (x), was calculated as the reduced cost associated to the biomass equation (mmol ATP g DCW -1 ), multiplied by the biomass produced in every phase (g DCW ).
In the next step, the growth yield (Y ATP ; g DCW mmol ATP -1 ) and maintenance ATP requirements (m ATP ; mmol ATP g DCW -1 h -1 ) were estimated from the calculated total ATP consumption and from the measured biomass production of individual phases. This was done by fitting values for parameters Y ATP and m ATP , so as to minimize the sum of squared differences between FBA-derived total ATP consumption and calculated total ATP consumption (using equation 3), with the Solver function in Excel. Importantly, the use of equation 3 implies that growth during each metabolic phase follows an exponential growth pattern at a constant µ. The specific growth rate was calculated individually for every phase, based on the corresponding initial and final biomass amounts (Dataset S4).

II.3. Comparison of energy parameters with published values
In the literature, only one report is available for the determination of maintenance and yield parameter values of B. pertussis using glutamate-limited chemostat data at different dilution rates [11]. Other B. pertussis reports such as those of Thalen et al. [6] or Frohlich et al. [12] only provide global substrate requirements in the form of an apparent growth yield, i.e. a lumped factor including growth-associated and maintenance requirements.
The energy parameters of Licari et al. [11] are expressed as a function of substrate amounts (i.e. m s and Y xs ) rather than ATP amounts. Conversion of these parameters is therefore needed, which first requires computing the maximum yield of ATP per L-Glu. This was calculated using the metabolic model, by maximizing the amount of ATP produced (using Flux Balance Analysis) with L-Glu as the sole substrate, unlimited oxygen supply, and no constraint on the metabolic end-products: a value of 15.5 mol ATP per mol L-Glu was determined for the yield of ATP on L-Glu. The resulting energy parameters are m ATP = 0.83 mmol ATP g DCW -1 h -1 and Y ATP = 0.0027 g DCW mmol ATP -1 (Dataset S4).

II.4. Implementation of energy parameters into the model
For implementing energy parameters into the metabolic model, parameter Y ATP cannot be used as such, but must be divided into explicit (x) and non-explicit (y) growth-associated ATP demand (equation 2 When working with specific rates as constraints, the calculated value for the maintenance coefficient m ATP (9.21 mmol ATP g DCW -1 h -1 ) can be introduced as such as a lower bound to the R_ATPM ATP hydrolysis reaction (M_atp_c + M_h2o_c => M_adp_c + M_pi_c + M_h_c).
However, when working with concentrations or absolute amounts as flux constraints -as in the case of calculating the biomass production of batch cultures -, the total amount of ATP used for maintenance purposes must be calculated. This can be done with the following equaton (maintenance part of equation 3): For the reference fermentations, ATP maintenance was calculated based on the measured values for X t=0 , t and µ, and equals 10235.30 mmol ATP or 890.03 mmol ATP L -1 (average of the two reference fermentations; Dataset S4). The calculated value for ATP maintenance can be introduced as a lower bound constraint to the R_ATPM reaction.

III. PREDICTION OF MINIMAL GROWTH REQUIREMENTS
Specifically, we pose the question: is it possible to list the minimal growth requirements that will enable a specific yield? In this context, we define growth as biomass yield per unit substrate uptake and minimal growth requirements as the simplest set of substrates necessary to achieve a target biomass yield.

III.1. Modeling framework and software
In the discussion that follows, a medium component is represented by an uptake flux, i.e. if substrate X is required for growth in the predicted medium, this is modeled as a required non-zero flux through the exchange reaction for X.
All modeling and algorithms described here were developed and implemented using the Open Source software, PySCeS CBMPy (version 0.7.0) making use of the IBM ILOG CPLEX™ 12.5.1 (Academic Edition) for optimization [4].
CBMPy is a Python-based modeling environment and examples of the scripts used to generate the results shown below are available upon request.

III.2. Algorithm for the calculation of the minimal number of substrates
A set of import reactions that define the minimal growth requirements can be used to conceive a minimal growth medium, i.e. one that contains the fewest number of added components. We adopt the hypothesis that, with respect to transport reactions, it is more likely that the cell minimizes the number of active transporters used to achieve a specific objective, rather than the combined flux passing through them. A minimal medium is therefore defined as a medium which supports the minimum number of active transporters (uptake) needed to satisfy a particular objective, in this case biomass production.
In order to compute such a minimal medium, we make use of a mixed integer linear program (MILP) based on the standard FBA LP formulation (constraints and flux bounds) but in addition introduce binary variables to control whether a flux is on (a non-zero flux) or off, and a constraint that fixes the objective to a specific target value. This MILP is described in equation B and is analogous to a previously used approach [13].  . This implementation proved to be both fast and efficient in calculating a set of minimal growth requirements and showed that it was computationally feasible to compute a minimal medium. The algorithm provides a single solution (medium composition) to the problem, from which the number of active fluxes can be deduced.

III.3. Algorithm for the enumeration of minimal active fluxes
While the algorithm described above can provide a single solution (i.e. a single medium composition) that satisfy the condition of the minimal number of active fluxes, there may be alternative solutions (media compositions) that satisfy minimize: such that: this same optimal criterion. In this paragraph, we describe the development of an algorithm to enumerate all possible substrate combinations that have the same, minimal number of active uptake fluxes.
In general, the enumeration of MILP solution spaces is a computationally hard problem. Here, we developed a strategy that takes advantage of the initial solution provided by the MILP optimization: by limiting the scope of our search to only find combinations with a set (minimal) number of active transport reactions, the search space is reduced so as to become computationally feasible.
The EMAF algorithm (for Enumeration of Minimal Active Fluxes) is essentially composed of three parts, each represented as pseudo-code in Fig. S4 and examples of which are available upon request.
To begin with, EMAF initializes the model, sets up the external environment and performs both an active flux minimization and flux variability analysis (FVA) on the target reactions (in this case, the import reactions). This provides the global minimum number of reaction (medium components) as well as a set of required reactions defined as: reactions that have an FVA minimum greater than zero or in other words reactions that are required to carry a flux in all solutions (auxotrophies). It then calls the Subsearch function recursively.
The Subsearch function is the core of the algorithm and takes as arguments a model instance, minimization target and a list of reactions to ignore. Essentially Subsearch computes minimal number of active flux solutions and will exit if the model is either infeasible or the number of active solutions is larger than the global minimum. It then finds the set of non-zero fluxes, as specified by the target list and not in the ignore list (Subsearch will return if target set is empty) and labels them as testable.
For each flux in testable, clone the model, set its bounds to zero and all other fluxes in testable to a small positive number, add them to a new ignore list and call Subsearch recursively using the new model, target and ignore parameters. When Subsearch fails any of the tests mentioned earlier, it returns either an empty solution or the testable flux it was evaluating (set to zero) which results in a depth-first search of the minimal media solution space.
The final part of the algorithm, ConstructMedia unrolls the recursive data structures and regenerates the qualitative media compositions. It then uses this information coupled with an absolute flux minimization approach to quantify the minimal flux required to be taken up to achieve the target objective. For this particular model, while not formally proven, EMAF appears to enumerate all minimal media required to satisfy a target objective.

III.4. Prediction of minimal growth requirements for Tohama I
EMAF was used to identify all possible sets of nutrients -among pre-defined possible nutrients -required for producing B. pertussis Tohama I biomass, with the least possible number of active input fluxes (i.e. least possible number of nutrients). Specifically, in the present study, we applied EMAF to identify possible S sources, possible amino acids as C/N sources, and possible organic acids as C sources.
Practically, for each simulation, EMAF was provided with the following informations: (i) the metabolic model of B. pertussis Tohama I, (ii) a constraint file of input fluxes, reflecting the possible nutrients to choose from (including the maximal concentration allowed; Dataset S7), (iii) the list of target reactions to be minimized (i.e. the list of reactions from which the minimal number of active fluxes must be identified; in this case, all import reactions (R_EXi_) except the import reactions for , O 2 , and H + ; Dataset S7), and (iv) the objective function to be maximized (in this case, the biomass reaction R_BP_biomass_130704, subject to the following constraints: LB=UB=1).
As an output of EMAF, the flux distribution for each possible combination of nutrients resulting in the least number of input fluxes, was obtained (Dataset S7). a the "unsplit" and "split" models only differ in the number and definition of exchange reactions. In the "unsplit" model, exchange reactions are defined as reversible reactions (termed R_EX_), and can account for both the input and output of metabolites into the system. In the "split" model, exchange reactions are irreversible, and each R_EX_ reaction has been split into a reaction that accounts for the input of the metabolite into the model (R_EXi_) and a reaction that accounts for the output of the metabolite (R_EXo_). V. SUPPLEMENTAL FIGURES Figure S1. Net consumption/production of biomass and major extracellular metabolites in reference fermentations A and B (used for model construction and calibration).

IV. SUPPLEMENTAL TABLES
The net consumption/production (expressed in mmol) represents the total amount consumed/produced from the start of the fermentation, in the entire culture volume, at each time-point. Symbols represent the measured net consumption/production of metabolites (available in Dataset S1), vertical lines represent metabolic phase boundaries, solid lines represent fitted spline functions, and dotted lines represent the spline function for the initial phase (Phase 0), during which no sampling was performed. Ten additional metabolites were measured but their concentration was found to be very low. Polyphosphate (PPPi) is intracellular: because the polyphosphate content of biomass fluctuates with time, it was not included in the biomass equation; rather, the amount of intracellular polyphosphate was converted to extracellular equivalents of a triphosphate molecule.  Reactions in the model are represented by lines, together with the stoichiometric coefficients. Metabolites, macromolecules, and pools of metabolites are represented by squares, whose color reflects the macromolecule or pool of metabolites (grey, proteins; blue, RNA; light red, DNA; light green, peptidoglycan; light blue, lipopolysaccharide; red, poly-β-hydroxybutyrate; purple, lipids; orange, cofactors; green, inorganic ions). Metabolite nomenclature is as in the model (M_aaa[_BP]_b, where aaa is the metabolite abbreviation and b indicates the cellular compartment (omitted for all groups except lipids)). For protein, RNA, and DNA synthesis reactions, as well as for biomass assembly, hydrolysis of ATP/GTP, and release of inorganic pyrophosphate (PPi), is indicated in red, with the associated stoichiometric coefficients. In the case of biomass synthesis, the stoichiometric coefficient y represents the growth-associated ATP consumption. Determination of the value of y is described in supplemental text and Dataset S4. For protein synthesis, the release of uncharged tRNA (M_trnaxxx, where xxx is an amino acid) is not explicitly depicted for every amino acid.   Figure S3. Arginine and nucleobase production by B. pertussis.
A. Net production of L-arginine in reference fermentations A and B. Symbols represent the measured net consumption/production of metabolites, vertical lines represent metabolic phase boundaries, solid lines represent fitted spline functions, and dotted lines represent the spline function for the initial phase (Phase 0), during which no sampling was performed. B. Net production of nucleobases in reference fermentation A (LC-MS based quantification).  Figure S4. Pseudo-code representation of EMAF minimal media generating algorithm.
A targeted, recursive media search algorithm is implemented in three parts. An EMAF driver method that initializes the model, determines the initial search parameters and initiates a recursive Subsearch which through a process of elimination determines minimal viable media combinations. Finally in ConstructMedia, minimal media combinations are unrolled and quantified to construct minimal media sets. This algorithm has been fully implemented using CBMPy. Strain names in red refer to strains for which the available genome sequence is made of more than one contig/scaffold. Dataset S4. Calculation of energy parameters (Microsoft Excel workbook -.xlsx). This file contains the data used for determination of ATP parameters for model iBP1870, based on the measured metabolic fluxes in reference fermentations A and B.
Dataset S5. Validation of model iBP1870 against published and newly generated datasets (Microsoft Excel workbook -.xlsx). This file contains data for the comparison of simulated and experimentally determined growth yields, using growth yield information from previously published work and from newly generated growth data.
Dataset S6. Identification of N sinks (Microsoft Excel workbook -.xslx). This file contains the data from LC-MS based identification of N-containing end-products.
Dataset S7. Prediction of minimal growth requirements of B. pertussis Tohama I using EMAF (Microsoft Excel workbook -.xlsx). This file contains the data from the simulations of minimal growth requirements of B. pertussis Tohama I using EMAF, including constraints and results.
Dataset S8. Metabolic reconstructions for 14 additional strains of classical bordetellae (Microsoft Excel workbook -.xlsx). This file contains data from the reconstruction of the metabolic networks of 14 additional strains of Bordetella sp., including the method used.
Model iBP1870 can be downloaded from https://github.com/SystemsBioinformatics/pubdata/tree/master/bordetella-pertussis-model (different formats available, including SBML). This version of the model contains default constraints reflecting a partially dysfunctional Krebs (TCA) cycle, the biomass equation with ATP consumption as defined in the supplemental material, and exchange reactions split between input and output.