API Reference

perfusio — open reference implementation of self-driving perfusion bioprocess development.

perfusio implements the methodology of Gadiyar et al. (2026) and Hutter et al. (2021), providing step-wise Gaussian-process hybrid models, entity-embedding transfer learning, Bayesian Experimental Design, and an online-retraining digital twin for CHO perfusion bioreactors. It is intended as an open, citable complement to commercial platforms.

References

[Gadiyar2026]

Gadiyar, C. J., Müller, C., Vuillemin, T., Bielser, J.-M., Souquet, J., Fagnani, A., Sokolov, M., von Stosch, M., Feidl, F., Butté, A., & Cruz Bournazou, M. N. (2026). Self-Driving Development of Perfusion Processes for Monoclonal Antibody Production. Biotechnology and Bioengineering, 123(2), 391–405. https://doi.org/10.1002/bit.70093

[Hutter2021]

Hutter, S., von Stosch, M., Cruz Bournazou, M. N., & Butté, A. (2021). Knowledge transfer across cell lines using hybrid Gaussian process models with entity embedding vectors. Biotechnology and Bioengineering, 118(12), 4710–4725. https://doi.org/10.1002/bit.27907

class perfusio.DesignSpace(*, name, control_bounds=<factory>, species_bounds=<factory>, viability_min=95.0, vcv_target=None, titer_target=None)[source]

Full description of the operating envelope for a perfusion experiment.

This object encodes both the measurable state bounds and the control variable ranges that are passed to the Bayesian Experimental Design optimizer. It is the single source of truth for all optimize_acqf calls.

Parameters:
  • name (str) – Experiment name (used in audit logs and figure titles).

  • control_bounds (dict[str, perfusio.config.ControlBounds]) – Mapping of control variable name → ControlBounds.

  • species_bounds (dict[str, perfusio.config.SpeciesBounds]) – Mapping of species name → SpeciesBounds.

  • viability_min (float) – Minimum acceptable viability (%) for feasibility constraints. Default 95.0 matches the paper’s constraint.

  • vcv_target (float | None) – Target volumetric cell volume (%) for the single-objective VCV-tracking use case (Gadiyar et al. 2026, §3.5).

  • titer_target (float | None) – Optional target product concentration [mg L⁻¹].

Examples

>>> from perfusio.config import DesignSpace, ControlBounds, SpeciesBounds
>>> ds = DesignSpace(
...     name="ambr250_run_01",
...     control_bounds={
...         "perfusion_rate": ControlBounds(lo=0.5, hi=2.0, unit="vvd"),
...         "bleed_rate": ControlBounds(lo=0.05, hi=0.30, unit="vvd"),
...     },
...     species_bounds={
...         "VCD": SpeciesBounds(lo=0.0, hi=120.0, unit="1e6 cells/mL",
...                              description="Viable Cell Density"),
...     },
... )
property bounds_tensor: tuple[Tensor, Tensor]

Return (lower, upper) tensors of shape (n_controls,) for BoTorch.

Returns:

lower and upper bound tensors, dtype float64.

Return type:

tuple[Tensor, Tensor]
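The layout this property produces can be illustrated with a minimal sketch. bounds_from_controls is a hypothetical stand-in that uses plain lists in place of the float64 torch tensors returned by the real property:

```python
# Hypothetical sketch of what bounds_tensor computes, using plain lists
# instead of float64 torch tensors.
def bounds_from_controls(control_bounds: dict) -> tuple[list, list]:
    """Return (lower, upper) lists ordered like control_names."""
    names = list(control_bounds)  # dict insertion order = control order
    lower = [control_bounds[n][0] for n in names]
    upper = [control_bounds[n][1] for n in names]
    return lower, upper

lo, hi = bounds_from_controls({
    "perfusion_rate": (0.5, 2.0),
    "bleed_rate": (0.05, 0.30),
})
print(lo, hi)  # [0.5, 0.05] [2.0, 0.3]
```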

control_bounds: dict[str, ControlBounds]
property control_names: list[str]

Ordered list of control variable names.

model_config = {}

Configuration for the model; a dictionary conforming to pydantic.config.ConfigDict.

property n_controls: int

Number of control variables.

name: str
species_bounds: dict[str, SpeciesBounds]
property species_names: list[str]

Ordered list of species names.

titer_target: float | None
vcv_target: float | None
viability_min: float
class perfusio.RunConfig(*, seed=None, device='cpu', dtype='float64', n_mc_samples=256, n_restarts=10, raw_samples=512, log_level='INFO', allow_write=False)[source]

Runtime configuration for perfusio computations.

Parameters:
  • seed (int | None) – Global random seed. None for non-deterministic execution.

  • device (str) – Torch device string ("cpu", "cuda:0", …). Defaults to "cpu" to ensure CI compatibility without GPUs.

  • dtype (Literal['float32', 'float64']) – Torch dtype. Defaults to torch.float64 (required for numerical stability of GP marginal log-likelihood optimisation — never downcast).

  • n_mc_samples (int) – Number of Monte Carlo samples for uncertainty propagation in rollout. Default 256 (adequate for 80% PI; increase to 1024 for publication).

  • n_restarts (int) – Number of random restarts for acquisition function optimisation. Default 10.

  • raw_samples (int) – Initial Sobol samples for warm-starting acquisition optimisation. Default 512.

  • log_level (Literal['DEBUG', 'INFO', 'WARNING', 'ERROR']) – structlog log level. One of "DEBUG", "INFO", "WARNING", "ERROR".

  • allow_write (bool) – If False (default), the OPC UA connector operates in read-only mode and will raise PermissionError on any write_setpoints call. Must be explicitly set to True to enable hardware writes.

Notes

dtype = torch.float64 is a hard requirement for the GP marginal log-likelihood optimisation. The LBFGS optimiser with strong-Wolfe line search becomes numerically unstable in float32 for datasets with more than ~100 training points.
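The allow_write flag describes a read-only-by-default guard. A minimal sketch of that pattern, with a hypothetical connector class (the real OPC UA connector and its internals may differ):

```python
# Hypothetical sketch of the read-only guard described above; the real
# OPC UA connector class and its method names may differ.
class ReadOnlyConnector:
    def __init__(self, allow_write: bool = False):
        self.allow_write = allow_write

    def write_setpoints(self, setpoints: dict) -> None:
        if not self.allow_write:
            raise PermissionError(
                "allow_write=False: refusing to write to hardware"
            )
        # ... actual hardware writes would happen here ...

conn = ReadOnlyConnector()  # default: read-only
try:
    conn.write_setpoints({"perfusion_rate": 1.0})
except PermissionError as e:
    print(e)  # allow_write=False: refusing to write to hardware
```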

allow_write: bool
device: str
dtype: Literal['float32', 'float64']
log_level: Literal['DEBUG', 'INFO', 'WARNING', 'ERROR']
make_generator()[source]

Create a seeded torch.Generator, or None if seed is None.

Return type:

torch.Generator or None

model_config = {}

Configuration for the model; a dictionary conforming to pydantic.config.ConfigDict.

n_mc_samples: int
n_restarts: int
raw_samples: int
seed: int | None
property torch_device: device

Parsed torch.device.

property torch_dtype: dtype

Parsed torch.dtype.

class perfusio.State(day, species, controls, volume_L, reactor_id='R00', clone_id='unknown', run_id='unknown')[source]

Bioreactor state at a single discrete time step.

All numeric fields are plain Python floats (with float("nan") for missing measurements), so that State objects can be serialised easily.

Parameters:
  • day (int) – Process day (0-indexed integer; 0 = inoculation day).

  • species (dict[str, float]) – Mapping of species name → measured (or simulated) value. Missing measurements are represented as float("nan").

  • controls (dict[str, float]) – Mapping of control variable name → applied setpoint.

  • volume_L (float) – Reactor working volume in litres. For ambr®250, this is typically 0.250 L. Must be > 0.

  • reactor_id (str) – Unique string identifier for the reactor vessel (e.g. "R01").

  • clone_id (str) – Cell-line / clone identifier used by CloneRegistry.

  • run_id (str) – Experiment / run identifier.

Notes

The species dict uses the canonical species names defined in perfusio.chemistry.species; see that module for the full registry.

Examples

>>> s = State(
...     day=5,
...     species={"VCD": 18.3, "Via": 97.1, "Glc": 3.2, "Amm": 4.1},
...     controls={"perfusion_rate": 1.0, "bleed_rate": 0.15},
...     volume_L=0.250,
...     reactor_id="R01",
...     clone_id="CX",
...     run_id="EXP001",
... )
>>> s.day
5
clone_id: str = 'unknown'
controls: dict[str, float]
day: int
classmethod from_tensor(t, species_names, control_names, *, day, volume_L=0.25, reactor_id='R00', clone_id='unknown', run_id='unknown')[source]

Reconstruct a State from a flat tensor.

Parameters:
  • t (Tensor) – Flat 1-D tensor as produced by to_tensor().

  • species_names (list[str]) – Species ordering used when the tensor was created.

  • control_names (list[str]) – Control variable ordering.

  • day (int) – Process day (not stored in the tensor to avoid dtype issues).

  • volume_L (float) – Working volume.

  • reactor_id (str) – Reactor ID.

  • clone_id (str) – Clone ID.

  • run_id (str) – Run ID.

Return type:

State

reactor_id: str = 'R00'
run_id: str = 'unknown'
species: dict[str, float]
to_tensor(species_names, control_names, *, dtype=torch.float64, device='cpu')[source]

Serialise this state to a flat 1-D tensor.

Parameters:
  • species_names (list[str]) – Ordered list of species to include; missing ones are nan.

  • control_names (list[str]) – Ordered list of controls to include; missing ones are nan.

  • dtype (dtype) – Target dtype (default float64).

  • device (device | str) – Target device.

Returns:

Shape (len(species_names) + len(control_names) + 1,). The trailing +1 is the process day cast to float.

Return type:

Tensor
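The flat layout described above ([species..., controls..., day]) can be sketched without torch, using a plain list and a hypothetical helper (state_to_flat is illustrative, not the library's implementation):

```python
import math

# Sketch of the flat-vector layout: [species..., controls..., day].
# state_to_flat is a hypothetical helper mirroring to_tensor().
def state_to_flat(species, controls, species_names, control_names, day):
    vec = [species.get(n, float("nan")) for n in species_names]
    vec += [controls.get(n, float("nan")) for n in control_names]
    vec.append(float(day))  # trailing +1 slot: process day cast to float
    return vec

vec = state_to_flat(
    species={"VCD": 18.3, "Glc": 3.2},      # "Via" is missing -> nan
    controls={"perfusion_rate": 1.0},
    species_names=["VCD", "Via", "Glc"],
    control_names=["perfusion_rate"],
    day=5,
)
print(len(vec))            # 5 == n_species + n_controls + 1
print(math.isnan(vec[1]))  # True: missing "Via" measurement
print(vec[-1])             # 5.0
```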

volume_L: float
class perfusio.StateBatch(day, species, controls, volume_L, reactor_ids, clone_ids, run_ids, species_names, control_names)[source]

Bioreactor states for B reactors at a single time step.

Parameters:
  • day (int) – Shared process day.

  • species (torch.Tensor) – Tensor of shape (B, n_species); rows are reactors, columns are species in canonical order.

  • controls (torch.Tensor) – Tensor of shape (B, n_controls).

  • volume_L (torch.Tensor) – Tensor of shape (B,) with working volumes in litres.

  • reactor_ids (list[str]) – List of length B with reactor identifiers.

  • clone_ids (list[str]) – List of length B with clone identifiers.

  • run_ids (list[str]) – List of length B with run identifiers.

  • species_names (list[str]) – Column names for the species tensor.

  • control_names (list[str]) – Column names for the controls tensor.

Examples

>>> import torch
>>> sb = StateBatch(
...     day=10,
...     species=torch.zeros(4, 11),
...     controls=torch.zeros(4, 6),
...     volume_L=torch.full((4,), 0.250),
...     reactor_ids=["R01", "R02", "R03", "R04"],
...     clone_ids=["CX"] * 4,
...     run_ids=["EXP001"] * 4,
...     species_names=["VCD", "VCV", "Via", "Diam", "Glc", "Gln",
...                     "Glu", "Lac", "Amm", "Pyr", "Titer"],
...     control_names=["perfusion_rate", "bleed_rate", "glucose_setpoint",
...                     "temperature", "agitation", "pyruvate_feed"],
... )
>>> sb.batch_size
4
property batch_size: int

Number of reactors in this batch.

clone_ids: list[str]
control_names: list[str]
controls: Tensor
day: int
reactor_ids: list[str]
run_ids: list[str]
species: Tensor
species_names: list[str]
to_input_tensor()[source]

Concatenate species + controls into a single (B, d) tensor.

Returns:

Shape (B, n_species + n_controls + 1) where the last column is the process day as float.

Return type:

Tensor

volume_L: Tensor
class perfusio.Trajectory(species, controls, volume_L, days, species_names, control_names, run_id='unknown', clone_id='unknown', reactor_id='R00', metadata=<factory>)[source]

Full time-series data for a single bioreactor run.

A Trajectory is the primary input to GP training. Each row corresponds to one daily observation; the number of rows equals the run length in days.

Parameters:
  • species (torch.Tensor) – Float tensor of shape (T, n_species) with measured species values. Missing observations are float("nan").

  • controls (torch.Tensor) – Float tensor of shape (T, n_controls) with applied setpoints.

  • volume_L (torch.Tensor) – Float tensor of shape (T,) with reactor working volumes.

  • days (torch.Tensor) – Integer tensor of shape (T,) with process days (0-indexed).

  • species_names (list[str]) – Column names for species.

  • control_names (list[str]) – Column names for controls.

  • run_id (str) – Unique run identifier.

  • clone_id (str) – Clone / cell-line identifier.

  • reactor_id (str) – Reactor vessel identifier.

  • metadata (dict[str, str]) – Arbitrary string → string metadata (e.g. medium lot, operator).

Notes

All tensors should be on the same device and have the same dtype. Use to() to move the trajectory to a different device.

Examples

>>> import torch
>>> traj = Trajectory(
...     species=torch.randn(20, 11),
...     controls=torch.ones(20, 6),
...     volume_L=torch.full((20,), 0.250),
...     days=torch.arange(20),
...     species_names=["VCD", "VCV", "Via", "Diam", "Glc", "Gln",
...                     "Glu", "Lac", "Amm", "Pyr", "Titer"],
...     control_names=["perfusion_rate", "bleed_rate", "glucose_setpoint",
...                     "temperature", "agitation", "pyruvate_feed"],
...     run_id="EXP001",
...     clone_id="CX",
...     reactor_id="R01",
... )
>>> traj.n_days
20
clone_id: str = 'unknown'
control_names: list[str]
controls: Tensor
days: Tensor
get_species(name)[source]

Return a 1-D tensor of values for species name.

Parameters:

name (str) – Species name.

Returns:

Shape (T,).

Return type:

Tensor

metadata: dict[str, str]
property n_controls: int

Number of control variables.

property n_days: int

Number of time steps in the trajectory.

property n_species: int

Number of measured species.

reactor_id: str = 'R00'
run_id: str = 'unknown'
slice_days(start, end)[source]

Return a sub-trajectory for days in [start, end).

Parameters:
  • start (int) – First day to include (0-indexed).

  • end (int) – Exclusive upper bound.

Return type:

Trajectory

species: Tensor
species_index(name)[source]

Return the column index for name in the species tensor.

Parameters:

name (str) – Species name, must be in species_names.

Return type:

int

Raises:

KeyError – If name is not a registered species.

species_names: list[str]
to(device)[source]

Return a copy with all tensors moved to device.

Parameters:

device (device | str) – Target PyTorch device.

Return type:

Trajectory

to_input_tensor()[source]

Stack species + controls + day into (T, d) tensor.

Returns:

Shape (T, n_species + n_controls + 1).

Return type:

Tensor

valid_mask()[source]

Boolean mask (T, n_species) that is True where measurement is present.

Returns:

dtype bool, shape (T, n_species).

Return type:

Tensor
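The semantics of the mask are simple to state in plain Python; a sketch over nested lists standing in for the (T, n_species) tensor:

```python
import math

# Sketch of valid_mask(): True wherever a measurement is present (not NaN).
def valid_mask(rows):
    return [[not math.isnan(v) for v in row] for row in rows]

nan = float("nan")
mask = valid_mask([[18.3, nan, 3.2],
                   [19.1, 97.0, nan]])
print(mask)  # [[True, False, True], [True, True, False]]
```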

volume_L: Tensor

Mechanistic Models

Mechanistic kinetic rate expressions for CHO perfusion models.

All rate laws are implemented from first principles with citable sources. Kinetic constants are labelled as synthetic defaults (S) or as drawn from a specific reference. No Merck-proprietary values are used.

Kinetics implemented

  1. Monod growth on glucose and glutamine (dual substrate).

  2. Pirt maintenance (energy maintenance on glucose independent of growth).

  3. Luedeking–Piret mAb production model (growth + non-growth associated).

  4. Lactate switch: CHO clone X switches from lactate production to lactate consumption when glucose falls below a threshold (Warburg effect reversal). Clone Y never consumes lactate.

  5. Ammonium production from glutamine deamidation.

  6. Pyruvate-NH₄⁺ scavenging: pyruvate reacts with NH₄⁺ in a first-order reaction (reduces ammonium accumulation when pyruvate is fed).

  7. First-order cell death with temperature and shear stress dependence.

  8. Apoptosis-induced diameter increase (phenomenological).
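The lactate switch (item 4) is the most clone-specific rule in the list. A sketch of its branching logic, using the class's default parameter values; q_lac in CHOKinetics is the authoritative form, and the 1e9 factor (VCD in 10⁶ cells mL⁻¹ → cells L⁻¹) is an assumption about internal units:

```python
# Sketch of the lactate-switch logic (kinetics item 4). Default values
# mirror CHOKinetics; the 1e9 unit conversion is an assumption.
def lactate_rate(glc, q_glc, vcd, *, Y_lac_glc=1.8, glc_switch=2.0,
                 lac_consump_rate=3e-10, consumes_lactate=True):
    if consumes_lactate and glc < glc_switch:
        # clone X below the glucose threshold: consume lactate
        return -lac_consump_rate * vcd * 1e9
    return Y_lac_glc * q_glc  # Warburg regime: produce lactate

print(lactate_rate(5.0, 0.3, 20.0) > 0)  # True: production above threshold
print(lactate_rate(1.0, 0.1, 20.0) < 0)  # True: consumption below (clone X)
print(lactate_rate(1.0, 0.1, 20.0, consumes_lactate=False) > 0)  # clone Y
```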

References

[Monod1942]

Monod, J. (1942). Recherches sur la Croissance des Cultures Bacteriennes. Hermann et Cie, Paris.

[Pirt1965]

Pirt, S. J. (1965). The maintenance energy of bacteria in growing cultures. Proceedings of the Royal Society B, 163(991), 224–231.

[LuedekingPiret1959]

Luedeking, R., & Piret, E. L. (1959). A kinetic study of the lactic acid fermentation. Journal of Biochemical and Microbiological Technology and Engineering, 1(4), 393–412.

[Gagnon2011]

Gagnon, M., et al. (2011). Metabolic flux analysis of CHO cells. Biotechnology and Bioengineering, 108(6), 1328–1337. (Source of synthetic Monod parameters, labelled S below.)

[Gadiyar2026]

Gadiyar et al. (2026), §3.3 (in-silico setup).

class perfusio.mechanistic.kinetics.CHOKinetics(mu_max=0.04, K_glc=0.08, K_gln=0.5, q_glc_max=5.5e-11, m_glc=2e-12, Y_lac_glc=1.8, glc_switch=2.0, lac_consump_rate=3e-10, consumes_lactate=True, q_gln_max=5e-11, K_d=0.003, K_T=0.001, T_ref=37.0, q_mAb_growth=2e-09, q_mAb_nongrowth=1.8e-09, alpha_amm_gln=0.85, k_pyr_amm=0.05, diam_base=20.5, diam_death_coeff=4.0)[source]

Bases: object

Kinetic parameter set for a CHO perfusion model.

All parameters are synthetic defaults calibrated to reproduce the trajectory shapes shown in Gadiyar et al. (2026) Fig. 7. They are labelled with their source (S = synthetic default; literature citation otherwise).

Parameters:
  • mu_max (float) – Maximum specific growth rate [h⁻¹]. (S — typical CHO: 0.04 h⁻¹)

  • K_glc (float) – Glucose Monod constant [g L⁻¹]. (S — Gagnon et al. 2011: ~0.08 g/L)

  • K_gln (float) – Glutamine Monod constant [mmol L⁻¹]. (S — ~0.5 mmol/L)

  • q_glc_max (float) – Maximum specific glucose consumption rate [g cell⁻¹ h⁻¹]. (S)

  • m_glc (float) – Pirt glucose maintenance coefficient [g cell⁻¹ h⁻¹]. (S)

  • Y_lac_glc (float) – Lactate yield from glucose [mmol mmol⁻¹] in the Warburg regime. (S)

  • glc_switch (float) – Glucose threshold [g L⁻¹] below which clone X switches from lactate production to lactate consumption (Warburg effect reversal). (S)

  • lac_consump_rate (float) – Maximum specific lactate consumption rate [mmol cell⁻¹ h⁻¹] after the metabolic switch (clone X only). (S)

  • q_gln_max (float) – Maximum specific glutamine consumption rate [mmol cell⁻¹ h⁻¹]. (S)

  • K_d (float) – First-order cell death rate constant [h⁻¹]. (S)

  • K_T (float) – Temperature sensitivity of death rate [h⁻¹ K⁻¹]. (S)

  • T_ref (float) – Reference temperature for death rate [°C]. (S — 37.0)

  • q_mAb_growth (float) – Growth-associated mAb production [mg cell⁻¹]. (Luedeking–Piret α)

  • q_mAb_nongrowth (float) – Non-growth-associated mAb production [mg cell⁻¹ h⁻¹]. (LP β)

  • alpha_amm_gln (float) – Ammonium production stoichiometry from glutamine [mmol mmol⁻¹]. (S)

  • k_pyr_amm (float) – First-order rate constant for pyruvate–NH₄⁺ scavenging [h⁻¹]. (S)

  • diam_base (float) – Base mean cell diameter [μm]. (S — 20.5 μm; cf. Gadiyar et al. 2026, Fig. 7)

  • diam_death_coeff (float) – Diameter increase per unit dead-cell fraction [μm per fraction]. (S)

  • consumes_lactate (bool) – Whether this clone has the metabolic switch (True for clone X). (S)

Notes

These are in-silico defaults only. Real CHO cell lines have proprietary kinetic parameters. See LIMITATIONS.md for an explicit disclaimer.

K_T: float = 0.001
K_d: float = 0.003
K_glc: float = 0.08
K_gln: float = 0.5
STATE_ORDER: ClassVar[list[str]] = ['VCD', 'Via', 'Glc', 'Gln', 'Glu', 'Lac', 'Amm', 'Pyr', 'Titer']

Canonical ordering of ODE state variables.

T_ref: float = 37.0
Y_lac_glc: float = 1.8
alpha_amm_gln: float = 0.85
cell_diameter(via)[source]

Phenomenological cell diameter as a function of viability.

Dead and apoptotic cells swell, increasing the population mean diameter. This is calibrated to match Gadiyar et al. Fig. 7 (Diam ≈ 20.5 → 22.5 μm as viability falls from ~99% to ~90%).

\[\bar{d} = d_0 + \delta_d \cdot (1 - \text{Via}/100)\]
Parameters:

via (float) – Cell viability [%].

Returns:

Mean cell diameter [μm].

Return type:

float

consumes_lactate: bool = True
diam_base: float = 20.5
diam_death_coeff: float = 4.0
glc_switch: float = 2.0
k_death(temperature_C=37.0)[source]

Temperature-dependent specific cell death rate [h⁻¹].

\[k_d(T) = k_d^{\text{ref}} + K_T \cdot (T - T_{\text{ref}})\]
Parameters:

temperature_C (float) – Culture temperature [°C].

Returns:

Death rate [h⁻¹]. Clamped to ≥ 0.

Return type:

float
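A direct transcription of the formula above with the class defaults (K_d = 0.003 h⁻¹, K_T = 0.001 h⁻¹ K⁻¹, T_ref = 37 °C), including the clamp at zero:

```python
# Linear temperature dependence of the death rate, with the class's
# default constants; clamped to >= 0 as documented.
def k_death(T_C, K_d=0.003, K_T=0.001, T_ref=37.0):
    return max(0.0, K_d + K_T * (T_C - T_ref))

print(k_death(37.0))  # 0.003 at the reference temperature
print(k_death(33.0))  # 0.0 -- a mild-hypothermia shift clamps at zero
```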

k_pyr_amm: float = 0.05
lac_consump_rate: float = 3e-10
m_glc: float = 2e-12
mu(glc, gln)[source]

Specific growth rate by dual-substrate Monod kinetics.

\[\mu = \mu_{\max} \frac{[\text{Glc}]}{K_{\text{Glc}} + [\text{Glc}]} \frac{[\text{Gln}]}{K_{\text{Gln}} + [\text{Gln}]}\]
Parameters:
  • glc (float) – Glucose concentration [g L⁻¹]. Clamped to ≥ 0.

  • gln (float) – Glutamine concentration [mmol L⁻¹]. Clamped to ≥ 0.

Returns:

Specific growth rate [h⁻¹].

Return type:

float

References

[Monod1942]

Monod (1942), eq. (1).
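The expression above transcribes directly into a few lines with the class defaults (mu_max = 0.04 h⁻¹, K_glc = 0.08 g L⁻¹, K_gln = 0.5 mmol L⁻¹):

```python
# Dual-substrate Monod growth rate with the class's default constants.
def mu(glc, gln, mu_max=0.04, K_glc=0.08, K_gln=0.5):
    glc, gln = max(glc, 0.0), max(gln, 0.0)  # clamp to >= 0 as documented
    return mu_max * glc / (K_glc + glc) * gln / (K_gln + gln)

rate = mu(glc=3.2, gln=4.0)
print(round(rate, 4))  # 0.0347 -- close to mu_max when substrates are ample
print(mu(0.0, 4.0))    # 0.0 -- growth stops without glucose
```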

mu_max: float = 0.04
ode_rhs(t, y, controls)[source]

Right-hand side of the CHO ODE system for scipy.integrate.solve_ivp.

State vector ordering (must match STATE_ORDER): [VCD, Via, Glc, Gln, Glu, Lac, Amm, Pyr, Titer]

Parameters:
  • t (float) – Time [h] (not used in this autonomous system, but required by the solve_ivp signature).

  • y (list[float]) – Current state vector (see ordering above).

  • controls (dict[str, float]) – Dict with keys "temperature" [°C], "perfusion_rate" [vvd], "bleed_rate" [vvd], "glucose_feed_conc" [g L⁻¹], "gln_feed_conc" [mmol L⁻¹], "pyr_feed_conc" [mmol L⁻¹], "volume_L" [L].

Returns:

Time derivatives for each state variable [h⁻¹ × units].

Return type:

list[float]

q_amm(q_gln_val, pyr, amm)[source]

Volumetric ammonium production rate [mmol L⁻¹ h⁻¹].

Ammonium is produced from glutamine deamidation and scavenged by pyruvate in a first-order reaction.

\[R_{\text{NH}_4^+} = \alpha_{\text{Gln}} \cdot q_{\text{Gln}} - k_{\text{pyr}} \cdot [\text{Pyr}] \cdot [\text{NH}_4^+]\]
Parameters:
  • q_gln_val (float) – Glutamine consumption rate [mmol L⁻¹ h⁻¹].

  • pyr (float) – Pyruvate concentration [mmol L⁻¹].

  • amm (float) – Ammonium concentration [mmol L⁻¹].

Returns:

Net ammonium production rate [mmol L⁻¹ h⁻¹].

Return type:

float
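The ammonium balance above in a few lines, with the class defaults (alpha_amm_gln = 0.85, k_pyr_amm = 0.05 h⁻¹); it shows how a pyruvate feed can flip the net rate negative:

```python
# Net ammonium rate: glutamine-derived production minus first-order
# pyruvate scavenging, with the class's default constants.
def q_amm(q_gln_val, pyr, amm, alpha=0.85, k_pyr=0.05):
    return alpha * q_gln_val - k_pyr * pyr * amm

print(round(q_amm(0.2, 0.0, 4.0), 2))  # 0.17 -- no pyruvate, pure production
print(round(q_amm(0.2, 5.0, 4.0), 2))  # -0.83 -- scavenging dominates
```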

q_glc(mu_val, vcd)[source]

Specific glucose consumption rate (Pirt maintenance + growth).

\[q_{\text{Glc}} = \frac{\mu}{Y_{X/\text{Glc}}} + m_{\text{Glc}}\]

We absorb \(Y_{X/\text{Glc}}\) into q_glc_max for parsimony. The total volumetric rate is \(Q_{\text{Glc}} = q_{\text{Glc}} \cdot X_v\).

Parameters:
  • mu_val (float) – Current specific growth rate [h⁻¹].

  • vcd (float) – Viable cell density [10⁶ cells mL⁻¹].

Returns:

Volumetric glucose consumption rate [g L⁻¹ h⁻¹].

Return type:

float

References

[Pirt1965]

Pirt (1965).

q_glc_max: float = 5.5e-11
q_gln(mu_val, vcd)[source]

Volumetric glutamine consumption rate [mmol L⁻¹ h⁻¹].

Parameters:
  • mu_val (float) – Specific growth rate [h⁻¹].

  • vcd (float) – VCD [10⁶ cells mL⁻¹].

Returns:

Glutamine consumption rate [mmol L⁻¹ h⁻¹].

Return type:

float

q_gln_max: float = 5e-11
q_lac(glc, q_glc_val, vcd=0.0)[source]

Volumetric lactate production/consumption rate.

For clone X: produces lactate when glc > glc_switch; consumes lactate when glc < glc_switch. For clone Y: always produces lactate.

Parameters:
  • glc (float) – Current glucose [g L⁻¹].

  • q_glc_val (float) – Volumetric glucose consumption rate [g L⁻¹ h⁻¹].

  • vcd (float) – Viable cell density [10⁶ cells mL⁻¹]. Required for the metabolic-switch (lactate consumption) branch.

Returns:

Volumetric lactate rate [mmol L⁻¹ h⁻¹]. Positive = production; negative = consumption.

Return type:

float

References

[Gagnon2011]

Gagnon et al. (2011) — metabolic switch parameters.

q_mAb_growth: float = 2e-09
q_mAb_nongrowth: float = 1.8e-09
q_mab(mu_val, vcd)[source]

Volumetric mAb production rate (Luedeking–Piret, 1959).

\[R_{\text{mAb}} = (\alpha \mu + \beta) X_v\]
Parameters:
  • mu_val (float) – Specific growth rate [h⁻¹].

  • vcd (float) – VCD [10⁶ cells mL⁻¹].

Returns:

Volumetric mAb production rate [mg L⁻¹ h⁻¹].

Return type:

float

References

[LuedekingPiret1959]

Luedeking & Piret (1959).
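A sketch of the Luedeking–Piret expression with the class defaults (α = 2 × 10⁻⁹ mg cell⁻¹, β = 1.8 × 10⁻⁹ mg cell⁻¹ h⁻¹). The 1e9 factor converting VCD in 10⁶ cells mL⁻¹ to cells L⁻¹ is an assumption about the library's internal unit handling:

```python
# Luedeking-Piret volumetric mAb production rate: R = (alpha*mu + beta) * Xv.
# The 1e9 unit conversion (1e6 cells/mL -> cells/L) is an assumption.
def q_mab(mu_val, vcd, alpha=2e-9, beta=1.8e-9):
    X_v = vcd * 1e9                        # cells per litre
    return (alpha * mu_val + beta) * X_v   # mg L^-1 h^-1

rate = q_mab(mu_val=0.03, vcd=18.3)
print(round(rate, 2))  # ~34.04 mg/L/h at high cell density
```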

High-level CHO perfusion mechanistic model.

This module wraps CHOKinetics and integrate_run() into a clean CHOPerfusionModel API suitable for use as a mechanistic prior in the hybrid SW-GP.

The model computes, for any process state \(s_i\), the mechanistic prediction of daily rates \(\hat{R}^{\text{mech}}_k(s_i)\). The hybrid model’s GP then learns the residual:

\[R_k(s_i) = \hat{R}^{\text{mech}}_k(s_i) + \epsilon_k(s_i)\]

where \(\epsilon_k \sim \mathcal{GP}(0, k(s_i, s_j))\).
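The decomposition above can be illustrated with a toy stand-in: a deliberately biased mechanistic guess plus a data-driven residual correction. Here the mean of past residuals stands in for the GP \(\epsilon_k\); the real model uses the ResidualKernel GP instead:

```python
# Toy illustration of the hybrid decomposition: mechanistic prediction
# plus a learned residual. A mean-residual correction stands in for the GP.
def mech_rate(glc):           # deliberately biased mechanistic model
    return 2.0 * glc

observations = [(1.0, 2.5), (2.0, 4.6), (3.0, 6.4)]  # (glc, observed rate)
residuals = [r - mech_rate(g) for g, r in observations]
correction = sum(residuals) / len(residuals)          # mean residual

def hybrid_rate(glc):
    return mech_rate(glc) + correction

print(round(hybrid_rate(2.0), 2))  # 4.5 -- closer to the observed 4.6
```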

References

[Gadiyar2026]

Gadiyar et al. (2026), §3.1 (“Hybrid model structure”).

class perfusio.mechanistic.models.CHOPerfusionModel(kinetics=None, volume_L=0.25, dt_hours=24.0)[source]

Bases: object

Deterministic mechanistic CHO perfusion model.

Parameters:
  • kinetics (CHOKinetics | None) – Kinetic parameter set. If None, uses the default clone X parameters (consumes lactate at low glucose).

  • volume_L (float) – Nominal reactor working volume [L]. Default 0.250 (ambr®250).

  • dt_hours (float) – Reporting interval [h]. Default 24.0 (daily samples).

Examples

>>> model = CHOPerfusionModel()
>>> y0 = {"VCD": 1.0, "Via": 99.0, "Glc": 5.0, "Gln": 4.0,
...        "Glu": 0.5, "Lac": 2.0, "Amm": 0.5, "Pyr": 0.0,
...        "Titer": 0.0}
>>> controls = {"perfusion_rate": 1.0, "bleed_rate": 0.15,
...              "temperature": 37.0}
>>> traj = model.simulate(y0, controls, n_days=14, seed=42)
>>> traj.shape  # (15, 9) — 14 daily steps + initial state
(15, 9)
STATE_SPECIES: list[str] = ['VCD', 'Via', 'Glc', 'Gln', 'Glu', 'Lac', 'Amm', 'Pyr', 'Titer']
predict_rates(c, controls)[source]

Predict net volumetric rates at state c using the kinetic model.

Parameters:
  • c (Tensor) – State vector, shape (n_species,).

  • controls (dict[str, float]) – Control variable setpoints.

Returns:

Rate tensor, shape (n_species,). Units are [species-unit h⁻¹].

Return type:

Tensor

predict_rates_batch(c_batch, controls_batch, control_names)[source]

Vectorised rate prediction for a batch of states.

Parameters:
  • c_batch (Tensor) – Batch of states, shape (N, n_species).

  • controls_batch (Tensor) – Batch of controls, shape (N, n_controls).

  • control_names (list[str]) – Ordered list of control names matching columns of controls_batch.

Returns:

Shape (N, n_species).

Return type:

Tensor

simulate(y0, controls, n_days, seed=None)[source]

Run a deterministic simulation over n_days daily steps.

Parameters:
  • y0 (dict[str, float]) – Initial state, mapping species name → value.

  • controls (dict[str, float]) – Control variable values (constant over the run).

  • n_days (int) – Number of daily steps to simulate.

  • seed (int | None) – Ignored (deterministic ODE; kept for API consistency).

Returns:

Shape (n_days + 1, n_species) where species follow STATE_SPECIES ordering. Row 0 is the initial state.

Return type:

np.ndarray

ODE integration utilities for the mechanistic CHO model.

Uses scipy.integrate.solve_ivp with automatic stiff fallback:

  • Primary solver: RK45 (non-stiff, fast).

  • Fallback: Radau (implicit, handles stiff metabolite ODEs).

  • Second fallback: LSODA (automatic stiffness detection).

All solutions are reported at daily (or sub-daily if requested) time points, regardless of the internal adaptive step chosen by the solver.

References

[SciPy]

Virtanen et al. (2020). SciPy 1.0. Nature Methods, 17, 261–272.

perfusio.mechanistic.integrators.integrate_run(kinetics, y0, controls, n_days, dt_hours=24.0)[source]

Integrate the CHO ODE system over n_days daily steps.

Parameters:
  • kinetics (CHOKinetics) – Kinetic parameter set.

  • y0 (list[float]) – Initial state vector, length n_species. Order must match STATE_ORDER.

  • controls (dict[str, float]) – Control variable dict (constant throughout the run).

  • n_days (int) – Number of daily steps to simulate.

  • dt_hours (float) – Reporting interval in hours. Default 24.0.

Returns:

Shape (n_days + 1, n_species) — states at each reporting point including the initial state.

Return type:

np.ndarray

Raises:

RuntimeError – If all ODE solvers fail to converge.

Notes

The integration time span is [0, n_days * dt_hours] hours. Reporting points are at every dt_hours interval. Negative concentrations are clipped to zero after each step.
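The fallback chain behaves like a try-in-order loop. A sketch of that control flow with stand-in solver callables instead of scipy.integrate.solve_ivp, so only the pattern (not SciPy itself) is shown:

```python
# Sketch of the solver-fallback pattern. The real code calls
# scipy.integrate.solve_ivp with method="RK45", then "Radau", then "LSODA";
# here each "solver" is a stand-in callable so the control flow is visible.
def integrate_with_fallback(solvers, rhs):
    for name, solve in solvers:
        try:
            return name, solve(rhs)
        except RuntimeError:
            continue  # try the next (stiffer) method
    raise RuntimeError("all ODE solvers failed to converge")

def rk45(rhs):
    raise RuntimeError("step size underflow (stiff system)")

def radau(rhs):
    return "solution"  # implicit method copes with stiffness

name, sol = integrate_with_fallback([("RK45", rk45), ("Radau", radau)],
                                    rhs=None)
print(name)  # Radau -- RK45 failed, the first fallback succeeded
```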

Gaussian Process Layer

Kernel definitions for the hybrid SW-GP model.

The primary kernel is a product of:

  1. Matérn-5/2 over the metabolite state space (captures smooth, non-periodic autocorrelation in bioreactor trajectories).

  2. Linear kernel over the operating day variable (models the secular trend within a run).

  3. Categorical / Index kernel for multi-task output (one task per species / rate variable), sharing a common latent representation via an index kernel.

These choices follow Cruz-Bournazou et al. (2022) §2.3 and Rasmussen & Williams (2006) §4.2.

References

[RW2006]

Rasmussen, C. E., & Williams, C. K. I. (2006). Gaussian Processes for Machine Learning. MIT Press. §4.2.

[CruzBournazou2022]

Cruz-Bournazou, M. N., et al. (2022). Digitizing bioprocess development. Current Opinion in Biotechnology, 76, 102764.

class perfusio.gp.kernels.PerfusionKernel(n_tasks, n_state_dims, rank=1, ard_num_dims=1, **kwargs)[source]

Bases: Kernel

Composite kernel for CHO perfusion rate GP models.

The kernel factorises as:

\[k(\mathbf{x}, \mathbf{x}') = k_{\text{Matern5/2}}(\mathbf{x}_{\text{state}}, \mathbf{x}'_{\text{state}}) \times k_{\text{Linear}}(x_{\text{day}}, x'_{\text{day}}) \times k_{\text{Index}}(i, j)\]

where \(\mathbf{x}_{\text{state}}\) are metabolite concentrations, \(x_{\text{day}}\) is the culture day, and \(i,j\) index the output task (species).

Parameters:
  • n_tasks (int) – Number of output tasks (species / rate variables).

  • n_state_dims (int) – Dimensionality of the metabolite state input.

  • rank (int) – Rank of the index kernel covariance factor. Default 1.

  • ard_num_dims (int) – If > 1, uses automatic relevance determination (ARD) for the Matérn kernel.

  • kwargs (Any)

Notes

Input convention:

  • x[:, :n_state_dims] — metabolite concentrations (normalised)

  • x[:, n_state_dims] — culture day

  • x[:, n_state_dims + 1] — task index (integer)

forward(x1, x2, **params)[source]

Compute the covariance between \(\mathbf x_1\) and \(\mathbf x_2\) under the composite Matérn-5/2 × Linear × Index kernel.

Parameters:
  • x1 (Tensor) – First set of data (… x N x D).

  • x2 (Tensor) – Second set of data (… x M x D).

  • diag – Should the Kernel compute the whole kernel, or just the diag? If True, it must be the case that x1 == x2. (Default: False.)

  • last_dim_is_batch – If True, treat the last dimension of x1 and x2 as another batch dimension. (Useful for additive structure over the dimensions). (Default: False.)

  • params (Any)

Returns:

The kernel matrix or vector. The shape depends on the kernel’s evaluation mode:

  • full_covar: … x N x M

  • full_covar with last_dim_is_batch=True: … x K x N x M

  • diag: … x N

  • diag with last_dim_is_batch=True: … x K x N

Return type:

Tensor

is_stationary: bool = False
class perfusio.gp.kernels.ResidualKernel(n_state_dims, ard_num_dims=1, **kwargs)[source]

Bases: Kernel

RBF-based residual kernel for the SW-GP correction term.

Used in HybridStateSpaceModel to capture smooth, local residuals that the mechanistic model cannot explain.

Parameters:
  • n_state_dims (int) – Dimensionality of the input (state + controls concatenated).

  • ard_num_dims (int) – If > 1, uses ARD (one length-scale per input dimension).

  • kwargs (Any)

forward(x1, x2, **params)[source]

Compute the covariance between \(\mathbf x_1\) and \(\mathbf x_2\) under the RBF residual kernel.

Parameters:
  • x1 (Tensor) – First set of data (… x N x D).

  • x2 (Tensor) – Second set of data (… x M x D).

  • diag – Should the Kernel compute the whole kernel, or just the diag? If True, it must be the case that x1 == x2. (Default: False.)

  • last_dim_is_batch – If True, treat the last dimension of x1 and x2 as another batch dimension. (Useful for additive structure over the dimensions). (Default: False.)

  • params (Any)

Returns:

The kernel matrix or vector. The shape depends on the kernel’s evaluation mode:

  • full_covar: … x N x M

  • full_covar with last_dim_is_batch=True: … x K x N x M

  • diag: … x N

  • diag with last_dim_is_batch=True: … x K x N

Return type:

Tensor

is_stationary: bool = True

Exact GP model for multi-task rate prediction.

Implements a batched ExactGP from GPyTorch that simultaneously models n_tasks output variables (metabolite rates) using a shared kernel and task-specific hyperparameters.

The model can be used with a zero mean, a linear mean, or a mechanistic prior mean (MechanisticPriorMean).

References

[Gardner2018]

Gardner, J. R., et al. (2018). GPyTorch: Blackbox Matrix-Matrix Gaussian Process Inference with GPU Acceleration. NeurIPS.

class perfusio.gp.exact_gp.MultiTaskRateGP(train_x, train_y, likelihood, mean_module, kernel=None, n_tasks=9, n_state_dims=9)[source]

Bases: ExactGP

Multi-output exact GP for metabolite rate prediction.

Each of the n_tasks outputs shares the same PerfusionKernel structure but has its own output scale and noise level.

Parameters:
  • train_x (Tensor) – Training inputs, shape (N * n_tasks, d_input) in the indexed multi-task format — last column is the task_id (0..n_tasks-1).

  • train_y (Tensor) – Training targets, shape (N * n_tasks,) — one scalar per row.

  • likelihood (GaussianLikelihood) – Gaussian likelihood (single-output; task correlation is captured by the PerfusionKernel IndexKernel).

  • mean_module (Mean) – Mean function module. Should be a GPyTorch Mean or MechanisticPriorMean.

  • kernel (PerfusionKernel | None) – Composite kernel. If None, constructs a default PerfusionKernel.

  • n_tasks (int) – Number of output tasks (species).

  • n_state_dims (int) – Input dimensionality (species + controls); needed to build default kernel.

Examples

>>> import torch, gpytorch
>>> from perfusio.gp import MultiTaskRateGP
>>> N, d, T = 50, 10, 5
>>> # Indexed format: N*T rows, last col = task_id
>>> X = torch.randn(N * T, d + 2)  # d state + 1 day + 1 task_id
>>> Y = torch.randn(N * T)          # flat scalar targets
>>> likelihood = gpytorch.likelihoods.GaussianLikelihood()
>>> mean = gpytorch.means.ZeroMean()
>>> gp = MultiTaskRateGP(X, Y, likelihood, mean, n_tasks=T, n_state_dims=d)
forward(x)[source]

Compute the prior distribution at x.

Parameters:

x (Tensor) – Input tensor, shape (N, d_input).

Returns:

The GP prior (mean + covariance).

Return type:

MultivariateNormal

predict_with_ci(x_new, ci_levels=(0.1, 0.5, 0.9))[source]

Return posterior mean and credible-interval quantiles.

Parameters:
  • x_new (Tensor) – Test inputs, shape (M, d_input).

  • ci_levels (tuple[float, float, float]) – Probability levels for quantiles (e.g. 10/50/90th percentiles).

Returns:

Keys "mean" and "q{p*100:.0f}" for each level. All tensors have shape (M,) (or (M, n_tasks) for multitask).

Return type:

dict[str, Tensor]

Step-wise GP (SW-GP) for multi-step-ahead trajectory prediction.

The SW-GP predicts the state at time \(t+1\) from the state at time \(t\) (one-step GP). For multi-step-ahead forecasting, predictions are chained (rollout) with uncertainty propagated by:

  1. Monte Carlo sampling: sample from the one-step posterior, feed the sample as the next step’s input, repeat h times, and report percentiles over n_samples trajectories.

  2. Moment matching (fast approximation): propagate mean and variance analytically through the one-step GP (Girard et al., 2003). Used when speed is critical (BED inner loop).

Both approaches are exposed through a common predict_quantiles interface.
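The Monte Carlo rollout can be sketched with a toy one-step model (all names and numbers here are illustrative stand-ins, not the perfusio API): sample from the one-step predictive distribution, feed each sample back in as the next input, and summarise percentiles per step.

```python
import random

def one_step(state, rng):
    """Toy one-step posterior: mean relaxes toward 10, fixed noise.
    Stands in for a sample from the one-step GP predictive distribution."""
    return rng.gauss(state + 0.5 * (10.0 - state), 0.2)

def mc_rollout(x0, horizon, n_samples=500, seed=0):
    rng = random.Random(seed)
    trajectories = []
    for _ in range(n_samples):
        state, path = x0, []
        for _ in range(horizon):
            state = one_step(state, rng)   # chain: sample becomes next input
            path.append(state)
        trajectories.append(path)
    # report 10th / 50th / 90th percentiles over trajectories, per step
    out = {"q10": [], "q50": [], "q90": []}
    for t in range(horizon):
        vals = sorted(traj[t] for traj in trajectories)
        for p, key in ((0.1, "q10"), (0.5, "q50"), (0.9, "q90")):
            out[key].append(vals[int(p * (n_samples - 1))])
    return out

q = mc_rollout(x0=2.0, horizon=3)
```

The widening of the q10–q90 band over the horizon reflects the uncertainty accumulated by chaining samples, which moment matching approximates analytically.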

References

[Girard2003]

Girard, A., et al. (2003). Gaussian Process Priors With Uncertain Inputs Application to Multiple-Step Ahead Time Series Forecasting. NeurIPS.

[Gadiyar2026]

Gadiyar et al. (2026), §3.2 (SW-GP rollout for BED OFV).

class perfusio.gp.stepwise.StepwiseGP(model, likelihood, n_species, control_names)[source]

Bases: object

Step-wise GP with multi-step rollout.

Wraps a fitted one-step GP model and provides predict_quantiles() for horizon-h forecasts.

Parameters:
  • model (gpytorch.models.ExactGP) – Fitted GPyTorch one-step GP model.

  • likelihood (gpytorch.likelihoods.Likelihood) – Corresponding likelihood module.

  • n_species (int) – Number of state species modelled.

  • control_names (list[str]) – Ordered list of control variable names.

Notes

The one-step GP takes inputs of shape (N, n_species + n_controls + 1) — species concentrations, controls, and culture day — and returns the next-step species concentrations, shape (N, n_species).

predict_quantiles(x0, controls, horizon, n_samples=200, seed=None, method='mc')[source]

Multi-step-ahead prediction with uncertainty quantification.

Parameters:
  • x0 (Tensor) – Initial state, shape (n_species,) or (B, n_species).

  • controls (Tensor) – Control trajectories, shape (horizon, n_controls) or (B, horizon, n_controls).

  • horizon (int) – Number of steps ahead to predict.

  • n_samples (int) – Number of MC samples (ignored when method="moment_matching").

  • seed (int | None) – Random seed for MC sampling.

  • method (str) – "mc" (default) or "moment_matching".

Returns:

Keys: "mean" shape (horizon, n_species), "q10" / "q50" / "q90" same shape.

Return type:

dict[str, Tensor]

Jackknife ensemble of GP models for uncertainty quantification.

The ensemble trains K GP models, each on a random subsample of the training data (jackknife-after-bootstrap), then aggregates their predictions as 10th / 50th / 90th percentiles. This provides calibrated predictive intervals even for small training sets (< 30 observations per species), where the Normal approximation of a single GP may be overconfident.

The approach is analogous to the jackknife+ of Barber et al. (2021) but applied at the GP model level rather than at the residual level.
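The aggregation idea can be illustrated with toy members (a least-squares slope stands in for a GP; none of these names are the perfusio API): fit K models on random subsamples, then take percentiles of their predictions.

```python
import random

def fit_member(xs, ys):
    """Toy 'model': least-squares slope through the origin (stands in for a GP)."""
    slope = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
    return lambda x: slope * x

def jackknife_predict(xs, ys, x_new, K=20, frac=0.8, seed=0):
    rng = random.Random(seed)
    n_sub = int(frac * len(xs))
    preds = []
    for _ in range(K):
        idx = rng.sample(range(len(xs)), n_sub)   # random subsample per member
        member = fit_member([xs[i] for i in idx], [ys[i] for i in idx])
        preds.append(member(x_new))
    preds.sort()
    # 10th / 50th / 90th percentiles across the K member predictions
    return {f"q{int(p * 100)}": preds[int(p * (K - 1))] for p in (0.1, 0.5, 0.9)}

rng = random.Random(1)
xs = [rng.uniform(0, 5) for _ in range(25)]
ys = [2.0 * x + rng.gauss(0, 0.3) for x in xs]   # true slope = 2
q = jackknife_predict(xs, ys, x_new=3.0)
```

The spread of the member predictions, rather than a single model's Normal posterior, supplies the interval — which is what keeps the intervals honest on small datasets.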

References

[Barber2021]

Barber, R. F., et al. (2021). Predictive inference with the jackknife+. Annals of Statistics, 49(1), 486–507.

[Gadiyar2026]

Gadiyar et al. (2026), §3.2 (ensemble calibration).

class perfusio.gp.ensemble.EnsembleMember(model, likelihood, train_idx)[source]

Bases: object

Container for a single jackknife ensemble member.

Parameters:
  • model (ExactGP)

  • likelihood (Likelihood)

  • train_idx (Tensor)

likelihood: Likelihood
model: ExactGP
train_idx: Tensor
class perfusio.gp.ensemble.JackknifeEnsemble(K=20, subsample_fraction=0.8, seed=0)[source]

Bases: object

Jackknife ensemble over K GP models.

Parameters:
  • K (int) – Number of ensemble members. Default 20 (Gadiyar §3.2).

  • subsample_fraction (float) – Fraction of training data used per member. Default 0.8.

  • seed (int) – Random seed for reproducibility.

Examples

>>> from perfusio.gp import JackknifeEnsemble, MultiTaskRateGP
>>> import gpytorch, torch
>>> ens = JackknifeEnsemble(K=5)
fit(train_x, train_y, model_factory, n_iter=200, lr=0.05)[source]

Train all K ensemble members.

Parameters:
  • train_x (Tensor) – Training inputs, shape (N, d).

  • train_y (Tensor) – Training targets, shape (N, n_tasks) or (N,).

  • model_factory (object) – Callable (x_sub, y_sub) -> (model, likelihood) that constructs a fresh GP model for a given subsample.

  • n_iter (int) – MLL training iterations per member.

  • lr (float) – Adam learning rate.

Returns:

Self (for chaining).

Return type:

JackknifeEnsemble

members: list[EnsembleMember]
predict(x_new, quantiles=(0.1, 0.5, 0.9))[source]

Ensemble posterior mean and quantiles.

Parameters:
  • x_new (Tensor) – Test inputs, shape (M, d).

  • quantiles (tuple[float, float, float]) – Probability levels. Default (0.10, 0.50, 0.90).

Returns:

Keys: "mean" (ensemble mean), and "q{p*100:.0f}" for each quantile level. All tensors have the same shape as x_new[..., 0].

Return type:

dict[str, Tensor]

Raises:

RuntimeError – If fit() has not been called yet.

Mean function definitions for the GP module.

Provides three mean functions:

  1. ZeroMeanMultiTask — zero mean function used when a pure data-driven GP is desired or when the training set is large.

  2. LinearMean — affine mean function for improved extrapolation.

  3. MechanisticPriorMean — wraps the CHOPerfusionModel to provide a first-principles prior mean. The GP then learns only the residual between the mechanistic prediction and the observations.

References

[Gadiyar2026]

Gadiyar et al. (2026), §3.1 (“Hybrid model”).

class perfusio.gp.means.LinearMean(input_size, batch_shape=None)[source]

Bases: Mean

Affine (linear + bias) mean function.

Parameters:
  • input_size (int) – Dimensionality of the input.

  • batch_shape (torch.Size | None) – Batch shape (for multi-output GP).

Notes

Parameters are learnable via MLL training.

forward(x)[source]

Evaluate the linear mean.

Parameters:

x (Tensor) – Input tensor, shape (..., input_size).

Returns:

Mean predictions, shape (...,).

Return type:

Tensor

class perfusio.gp.means.MechanisticPriorMean(mech_model, n_species, species_names, control_names, task_index=None, scale=1.0)[source]

Bases: Mean

GP mean function backed by a mechanistic CHO model.

At any input point \(\mathbf{x} = (\mathbf{c}, \mathbf{u}, t)\), this mean function evaluates the mechanistic rate \(\hat{R}^{\text{mech}}_k(\mathbf{x})\) from CHOPerfusionModel.

The GP thus models:

\[R_k(\mathbf{x}) = \hat{R}^{\text{mech}}_k(\mathbf{x}) + \epsilon_k(\mathbf{x})\]

where \(\epsilon_k \sim \mathcal{GP}(0, k_{\epsilon})\).

Parameters:
  • mech_model (CHOPerfusionModel) – Instantiated CHOPerfusionModel.

  • n_species (int) – Number of species whose rates are modelled.

  • species_names (list[str]) – Ordered list of species names matching kinetics.STATE_ORDER.

  • control_names (list[str]) – Ordered list of control names.

  • task_index (int | None) – Index of the target task (species index into STATE_ORDER). Used when the GP is single-task, predicting one species at a time. Pass None for multi-task models that predict all species jointly.

  • scale (float) – Optional learnable scale parameter to allow the GP to up- or down-weight the mechanistic prior.

Notes

The expected input layout is:

  • x[:, :n_species] — normalised species concentrations

  • x[:, n_species:n_species+n_controls] — controls

  • x[:, -1] — culture day (hours ÷ 24 internally)

Gradient flow is blocked through the mechanistic model (torch.no_grad) since the kinetics are not themselves differentiable PyTorch modules.

forward(x)[source]

Return mechanistic rate predictions for a batch of inputs.

Parameters:

x (Tensor) – Input batch, shape (N, n_species + n_controls + 1).

Returns:

Shape (N,) if single-task, (N, n_species) if multi-task.

Return type:

Tensor

class perfusio.gp.means.ZeroMeanMultiTask(batch_shape=(), **kwargs)[source]

Bases: ZeroMean

Zero mean function — thin subclass for consistent naming.

Transfer Learning

Clone registry and entity-embedding module.

Each CHO cell line (clone) is assigned a learnable dense embedding vector that captures clone-specific characteristics (growth rate, metabolism, productivity) in a low-dimensional latent space. The embedding is learned jointly with the GP hyperparameters during MLL training and fine-tuned during transfer.

Architecture

clone_id  →  Embedding(n_clones, embed_dim)  →  e ∈ ℝ^{embed_dim}
e is concatenated with (c, u, t) → GP input

Following Hutter et al. (2021) §2.3, the default embedding dimension is 4, which was found to balance expressiveness and identifiability for ≤ 20 clones.
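The lookup-and-concatenate step can be mirrored in plain Python (the table values are arbitrary; in the real EntityEmbedding the table is a learnable torch parameter optimised with the GP):

```python
# Illustrative embedding table: clone_id -> dense vector in R^4.
EMBED_DIM = 4
table = {
    0: [0.1, -0.3, 0.7, 0.0],   # Clone X embedding (arbitrary values)
    1: [-0.5, 0.2, 0.1, 0.4],   # Clone Y embedding
}

def embed_and_concat(x_rows, clone_ids):
    """x_rows: list of feature rows (c, u, t); returns rows of length d + EMBED_DIM."""
    return [row + table[cid] for row, cid in zip(x_rows, clone_ids)]

X = [[1.0, 2.0, 0.5], [0.9, 2.1, 0.5], [1.1, 1.9, 0.5]]   # three (c, u, t) rows
augmented = embed_and_concat(X, [0, 1, 0])
```

Each GP input row thus carries both the process state and a clone-identity vector, so one shared kernel can serve all clones.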

References

[Hutter2021]

Hutter et al. (2021), §2.3.

class perfusio.embedding.clones.CloneInfo(clone_id, name, consumes_lactate=True, description='')[source]

Bases: object

Metadata for a single CHO cell line.

Parameters:
  • clone_id (int)

  • name (str)

  • consumes_lactate (bool)

  • description (str)

clone_id: int
consumes_lactate: bool = True
description: str = ''
name: str
class perfusio.embedding.clones.CloneRegistry(clones)[source]

Bases: object

Registry that maps clone names to integer IDs and metadata.

Parameters:

clones (list[CloneInfo]) – List of CloneInfo objects. The clone_id field must form a contiguous range starting at 0.

Examples

>>> from perfusio.embedding import CloneRegistry
>>> reg = CloneRegistry.default()
>>> reg["CloneX"].clone_id
0
>>> len(reg)
2
clone_id(name)[source]

Return the integer ID for a clone by name.

Parameters:

name (str)

Return type:

int

classmethod default()[source]

Return the default two-clone registry (Clone X and Clone Y).

Clone X switches from lactate production to consumption at low glucose (Warburg reversal). Clone Y does not.

Return type:

CloneRegistry

property n_clones: int

Number of registered clones.

class perfusio.embedding.clones.EntityEmbedding(n_clones, embed_dim=4)[source]

Bases: Module

Learnable dense entity embedding for CHO cell lines.

Parameters:
  • n_clones (int) – Number of distinct clones (vocabulary size).

  • embed_dim (int) – Embedding dimensionality. Default 4 (Hutter et al. 2021, §2.3).

Examples

>>> emb = EntityEmbedding(n_clones=2, embed_dim=4)
>>> clone_ids = torch.tensor([0, 1, 0])  # batch of three samples
>>> e = emb(clone_ids)
>>> e.shape
torch.Size([3, 4])
embed_and_concat(x, clone_ids)[source]

Concatenate state features with clone embeddings.

Parameters:
  • x (Tensor) – Feature tensor, shape (N, d).

  • clone_ids (Tensor) – Clone ID tensor, shape (N,).

Returns:

Augmented features, shape (N, d + embed_dim).

Return type:

Tensor

forward(clone_ids)[source]

Look up embeddings for a batch of clone IDs.

Parameters:

clone_ids (Tensor) – Integer tensor of clone IDs, shape (N,) or (B,).

Returns:

Embedding matrix, shape (N, embed_dim).

Return type:

Tensor

Transfer learning orchestration for entity-embedded GP models.

Implements the two-stage transfer protocol of Hutter et al. (2021):

  1. Warm start (warm_start): Train on the source clone dataset. The EntityEmbedding is initialised and the full model (embedding + GP kernel) is optimised.

  2. Joint fine-tuning (joint_finetune): Load weights from the source run, optionally freeze the kernel backbone, and fine-tune on a small target clone dataset with a higher learning rate on the embedding.

This two-stage approach allows the model to learn shared process dynamics from data-rich source clones and rapidly adapt to a new clone with as few as 3–5 runs (Hutter et al. §3.2).
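The differential-learning-rate idea behind joint_finetune can be shown on a toy quadratic (scalar stand-ins for the parameter groups, not the GPyTorch training loop): after the same number of steps, the fast-lr "embedding" has converged while the slow-lr "backbone" has barely moved.

```python
# Toy loss: (b - 2)^2 + (e - 2)^2, both optima at 2.0.
lr_backbone, lr_embedding = 0.01, 0.1   # embedding lr is 10x the backbone lr
backbone, embedding = 1.0, 1.0          # scalar stand-ins for parameter groups

def grads(b, e):
    return 2 * (b - 2), 2 * (e - 2)     # analytic gradients of the toy loss

for _ in range(50):
    gb, ge = grads(backbone, embedding)
    backbone -= lr_backbone * gb        # slow: preserves shared process dynamics
    embedding -= lr_embedding * ge      # fast: adapts clone-specific latent vector
```

This is the mechanism that lets the target clone's embedding adapt rapidly without disturbing the kernel hyperparameters learned on the source clone.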

References

[Hutter2021]

Hutter et al. (2021), §2.3 and §3.2.

class perfusio.embedding.transfer.TransferLearner(embedding, model, likelihood, lr_backbone=0.01, lr_embedding=0.1)[source]

Bases: object

Two-stage transfer learning manager.

Parameters:
  • embedding (EntityEmbedding) – Shared EntityEmbedding module.

  • model (gpytorch.models.ExactGP) – GPyTorch model (must accept augmented inputs from the embedding).

  • likelihood (gpytorch.likelihoods.Likelihood) – Corresponding GPyTorch likelihood.

  • lr_backbone (float) – Learning rate for GP kernel hyperparameters during fine-tuning.

  • lr_embedding (float) – Learning rate for embedding parameters during fine-tuning. Typically 10× the backbone lr.

Examples

>>> from perfusio.embedding import TransferLearner, EntityEmbedding
>>> emb = EntityEmbedding(n_clones=2, embed_dim=4)
>>> # attach model and likelihood (constructed externally)
>>> # tl = TransferLearner(emb, model, likelihood)
joint_finetune(train_x, train_y, clone_ids, n_iter=100, freeze_backbone=False)[source]

Fine-tune on target-clone data with differential learning rates.

Parameters:
  • train_x (Tensor) – Target features (without embedding), shape (M, d_raw). M may be as small as the number of experiments × time points.

  • train_y (Tensor) – Targets, shape (M,) or (M, n_tasks).

  • clone_ids (Tensor) – Clone IDs for the target data, shape (M,).

  • n_iter (int) – Number of fine-tuning steps.

  • freeze_backbone (bool) – If True, freeze all GP kernel/mean parameters and update only the embedding. Useful when the target dataset is very small (< 3 runs) to avoid catastrophic forgetting.

Return type:

None

load(path)[source]

Load state from disk (must match the architecture).

Parameters:

path (str | Path) – Path to checkpoint saved by save().

Return type:

None

save(path)[source]

Save embedding and model state to disk.

Parameters:

path (str | Path) – File path. Saves a dict with "embedding", "model", "likelihood" state dicts.

Return type:

None

warm_start(train_x, train_y, clone_ids, n_iter=300)[source]

Train the full model on source-clone data.

Parameters:
  • train_x (Tensor) – Source features (without embedding), shape (N, d_raw).

  • train_y (Tensor) – Targets, shape (N,) or (N, n_tasks).

  • clone_ids (Tensor) – Integer clone IDs, shape (N,).

  • n_iter (int) – Number of optimisation steps.

Return type:

None

Hybrid State-Space Model

Hybrid State-Space Model: mechanistic prior + GP residual.

The HybridStateSpaceModel is the central model object used by the self-driving loop. It holds the mechanistic CHOPerfusionModel, the fitted SW-GP residual model, and (optionally) an EntityEmbedding for multi-clone scenarios.

Prediction combines the mechanistic and GP contributions as:

\[\hat{c}_{t+1} = c_t + \Delta t \cdot \left[ \hat{R}^{\text{mech}}(c_t, u_t) + \hat{\epsilon}^{\text{GP}}(c_t, u_t) \right]\]
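Written out for a single species with illustrative numbers, the update above is a plain Euler step whose rate is the sum of the mechanistic prediction and the GP residual correction:

```python
# c_{t+1} = c_t + dt * (R_mech + eps_gp), with dt_hours = 24 -> one day per step.
dt_days = 1.0
c_t = 10.0        # current concentration (illustrative units)
r_mech = 2.0      # mechanistic rate prediction R_mech
eps_gp = -0.4     # GP residual correction (posterior mean of eps)

c_next = c_t + dt_days * (r_mech + eps_gp)   # -> 11.6
```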

References

[Gadiyar2026]

Gadiyar et al. (2026), §3.1 and §3.2.

class perfusio.hybrid.model.HybridStateSpaceModel(mech_model, gp_model, embedding=None, dt_hours=24.0, species_names=None, control_names=None)[source]

Bases: object

Combined mechanistic + GP hybrid model.

Parameters:
  • mech_model (CHOPerfusionModel) – Fitted (or default) mechanistic model.

  • gp_model (StepwiseGP) – Fitted one-step SW-GP model.

  • embedding (EntityEmbedding | None) – Optional entity embedding (None for single-clone scenarios).

  • dt_hours (float) – Time step in hours. Default 24 (daily).

  • species_names (list[str] | None) – Ordered species names corresponding to state columns.

  • control_names (list[str] | None) – Ordered control names corresponding to the control input.

Examples

>>> from perfusio.mechanistic import CHOPerfusionModel
>>> mech = CHOPerfusionModel()
>>> # gp_model = ... (fitted StepwiseGP)
>>> # hybrid = HybridStateSpaceModel(mech, gp_model)
predict_next_state(c_t, u_t, day, clone_id=None)[source]

Predict next-day species concentrations.

Parameters:
  • c_t (Tensor) – Current state, shape (n_species,).

  • u_t (Tensor) – Current controls, shape (n_controls,).

  • day (int) – Current culture day (1-indexed).

  • clone_id (int | None) – Integer clone index for embedding lookup. Required only when self.embedding is not None.

Returns:

(mean, q10, q90) — posterior mean and 80% credible interval for \(c_{t+1}\). All shape (n_species,).

Return type:

tuple[Tensor, Tensor, Tensor]

predict_trajectory(c0, controls, n_days, clone_id=None, n_samples=200, seed=0)[source]

Rollout the hybrid model over n_days steps.

Parameters:
  • c0 (Tensor) – Initial state, shape (n_species,).

  • controls (Tensor) – Control sequence, shape (n_days, n_controls) (constant or varying by day).

  • n_days (int) – Prediction horizon.

  • clone_id (int | None) – Clone ID for embedding.

  • n_samples (int) – MC samples for GP rollout.

  • seed (int) – Random seed.

Returns:

Keys "mean", "q10", "q90" each shape (n_days, n_species).

Return type:

dict[str, Tensor]

Training routines for the hybrid SW-GP model.

Provides:

  • train_hybrid(): Main training entry point. Fits the GP residual on pre-computed mechanistic residuals, then optionally runs jackknife ensemble.

  • retrain_online(): Lightweight incremental update when new daily data arrive (fast re-optimisation of GP hyperparameters only).

Optimisation strategy

  1. L-BFGS (primary): batch optimiser, fast for small datasets (≤ 200 pts).

  2. Adam fallback: used when L-BFGS diverges (detected by NaN/Inf loss).
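The fallback control flow can be sketched as follows (the two "optimisers" are stubs standing in for the actual MLL fitting routines):

```python
import math

def fit_lbfgs():
    return float("nan")   # pretend L-BFGS diverged on this problem

def fit_adam():
    return 0.123          # pretend Adam converged to this final loss

def train_with_fallback():
    loss = fit_lbfgs()
    if not math.isfinite(loss):   # NaN/Inf divergence detection
        return fit_adam(), "adam"
    return loss, "lbfgs"

loss, optimiser = train_with_fallback()
```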

References

[Gadiyar2026]

Gadiyar et al. (2026), §3.3 (model retrain cadence).

perfusio.hybrid.train.retrain_online(new_x, new_y, model, likelihood, n_iter=50, lr=0.05)[source]

Incremental re-optimisation when new daily data arrive.

Uses Adam only (L-BFGS is overkill for a single new data point). The existing GP training data are not replaced; the model is updated in-place by calling set_train_data to append the new observations.

Parameters:
  • new_x (Tensor) – New inputs, shape (M, d).

  • new_y (Tensor) – New targets, shape (M,) or (M, n_tasks).

  • model (ExactGP) – Model to update in-place.

  • likelihood (Likelihood) – Corresponding likelihood.

  • n_iter (int) – Number of Adam steps.

  • lr (float) – Learning rate.

Return type:

None

Notes

strict=False is passed to set_train_data to allow appending to a different-length dataset.

perfusio.hybrid.train.train_hybrid(train_x, train_y, model, likelihood, n_iter_lbfgs=50, n_iter_adam=200, lr_adam=0.05)[source]

Fit the hybrid GP residual model by maximising the MLL.

Attempts L-BFGS first; falls back to Adam if L-BFGS produces NaN loss.

Parameters:
  • train_x (Tensor) – Training inputs, shape (N, d).

  • train_y (Tensor) – Residual targets (observed − mechanistic), shape (N,) or (N, n_tasks).

  • model (ExactGP) – GPyTorch ExactGP model.

  • likelihood (Likelihood) – Corresponding GPyTorch likelihood.

  • n_iter_lbfgs (int) – Number of L-BFGS steps.

  • n_iter_adam (int) – Number of Adam fallback steps.

  • lr_adam (float) – Adam learning rate.

Return type:

None

Multi-step-ahead trajectory forecasting with the hybrid model.

Exposes forecast_run() as the primary public entry point. It wraps the HybridStateSpaceModel MC rollout and returns structured outputs that can be directly consumed by the BED OFV function and the visualisation layer.

References

[Gadiyar2026]

Gadiyar et al. (2026), §3.2 (3-step-ahead horizon for BED).

perfusio.hybrid.forecast.forecast_run(hybrid, c0, controls, horizon=3, n_samples=200, seed=0, clone_id=None)[source]

Forecast species trajectories horizon steps ahead.

Parameters:
  • hybrid (HybridStateSpaceModel) – Fitted HybridStateSpaceModel.

  • c0 (Tensor) – Initial state, shape (n_species,).

  • controls (Tensor) – Control sequence to apply. Shape (horizon, n_controls). If shape (n_controls,) is given, the same controls are applied at every step.

  • horizon (int) – Number of future days to predict. Default 3 (Gadiyar §3.2).

  • n_samples (int) – MC rollout samples for uncertainty quantification.

  • seed (int) – Random seed.

  • clone_id (int | None) – Clone ID for embedding lookup.

Returns:

A dict with keys:

  • "days" — shape (horizon,) — day indices (1-indexed from c0).

  • "mean" — shape (horizon, n_species) — posterior mean.

  • "q10" — shape (horizon, n_species) — 10th percentile.

  • "q50" — shape (horizon, n_species) — 50th percentile (median).

  • "q90" — shape (horizon, n_species) — 90th percentile.

Return type:

dict[str, Tensor]

Bayesian Experimental Design

BoTorch acquisition function factory for Bayesian Experimental Design.

Provides build_acquisition(), a single entry point that constructs any of the 10 acquisition functions supported by perfusio:

Single-objective (analytic):

PI, EI, LogEI, UCB

Single-objective (Monte Carlo):

qEI, qLogEI, qUCB

Multi-objective (Monte Carlo):

qEHVI, qNEHVI, qNParEGO

Constrained variants are automatically applied when constraints are passed.

References

[Balandat2020]

Balandat, M., et al. (2020). BoTorch: A Framework for Efficient Monte-Carlo Bayesian Optimization. NeurIPS.

perfusio.bed.acquisitions.build_acquisition(name, model, best_f=None, beta=0.2, ref_point=None, partitioning=None, sampler=None, constraints=None, **kwargs)[source]

Construct a BoTorch acquisition function by name.

Parameters:
  • name (str) – One of: PI, EI, LogEI, UCB, qEI, qLogEI, qUCB, qEHVI, qNEHVI, qNParEGO.

  • model (Model) – Fitted BoTorch / GPyTorch surrogate model.

  • best_f (float | None) – Incumbent best observed value. Required for PI, EI, LogEI, qEI, qLogEI.

  • beta (float) – UCB exploration parameter. Required for UCB, qUCB.

  • ref_point (list[float] | None) – Reference point for hypervolume-based acquisitions (qEHVI, qNEHVI). Should be strictly dominated by all feasible points.

  • partitioning (Any) – Pre-computed box decomposition (NondominatedPartitioning). Required for qEHVI.

  • sampler (Any) – MC sampler (IIDNormalSampler or SobolQMCNormalSampler). Defaults to SobolQMCNormalSampler(512) for MC acquisitions.

  • constraints (list[Callable[[Tensor], Tensor]] | None) – Optional list of constraint callables c(X) <= 0 to enforce. Applied as soft-constraint penalties where available.

  • **kwargs (Any) – Additional keyword arguments forwarded to the acquisition constructor.

Return type:

botorch.acquisition.AcquisitionFunction

Raises:

ValueError – If name is not a recognised acquisition type.

Objective function values (OFVs) for Bayesian Experimental Design.

Implements Gadiyar et al. (2026) Eq. 7:

\[\text{OFV}(\mathbf{u}) = \sum_{j=1}^{3} \left( \hat{y}_{t+j}(\mathbf{u}) - y_{\text{target}} \right)^2\]

where \(\hat{y}_{t+j}\) is the j-step-ahead hybrid model prediction (50th-percentile / mean) for the chosen objective species (e.g. VCD, Titer, viability).

The OFV is used to evaluate candidate control setpoints \(\mathbf{u}\) proposed by the acquisition function.
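Eq. (7) written out for one candidate control, with an illustrative 3-step forecast against a target of 30:

```python
# OFV(u) = sum_{j=1..3} (y_hat_{t+j}(u) - y_target)^2 ; lower = better tracking.
y_target = 30.0
y_hat = [28.0, 29.5, 30.5]   # illustrative 1-, 2-, 3-step-ahead predictions

ofv = sum((y - y_target) ** 2 for y in y_hat)   # 4.0 + 0.25 + 0.25 = 4.5
```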

References

[Gadiyar2026]

Gadiyar et al. (2026), §3.2, Eq. (7).

class perfusio.bed.objectives.MultiObjectiveOFV(hybrid, objectives, horizon=3, n_samples=100, seed=42)[source]

Bases: object

Vector-valued OFV for multi-objective BED.

Returns one OFV per objective (e.g. maximise titer AND minimise ammonium). Used with multi-objective acquisition functions (qEHVI, qNEHVI, qNParEGO).

Parameters:
  • hybrid (HybridStateSpaceModel) – Fitted hybrid model.

  • objectives (list[tuple[int, float, float]]) – List of (species_index, target, direction) tuples where direction is +1 for maximisation and -1 for minimisation (the sign is applied to the OFV before passing to botorch acqf).

  • horizon (int) – Prediction horizon.

  • n_samples (int) – MC rollout samples.

  • seed (int) – Random seed.

evaluate(c0, u)[source]

Return the multi-objective OFV vector for one candidate control.

Parameters:
  • c0 (Tensor) – Current state, shape (n_species,).

  • u (Tensor) – Candidate control, shape (n_controls,).

Returns:

Shape (n_objectives,). Higher values are better, as required by BoTorch acquisition functions (each objective is sign-adjusted via direction so that it is passed as a maximisation).

Return type:

Tensor

class perfusio.bed.objectives.TargetSpec(species_index, target, weight=1.0)[source]

Bases: object

Specification for a single tracked species target.

Parameters:
species_index: int
target: float
weight: float = 1.0
class perfusio.bed.objectives.TargetTrackingOFV(targets, hybrid=None, horizon=3, n_samples=100, seed=42)[source]

Bases: object

3-step-ahead squared-error OFV from Gadiyar et al. (2026) Eq. 7.

Parameters:
  • targets (list[TargetSpec]) – List of TargetSpec objects — one per tracked species.

  • hybrid (HybridStateSpaceModel | None) – Fitted hybrid model for forecasting.

  • horizon (int) – Prediction horizon in days. Default 3 (Gadiyar §3.2).

  • n_samples (int) – MC rollout samples. Default 100 (fast for BED inner loop).

  • seed (int) – Fixed random seed for reproducible OFV evaluations.

Examples

>>> from perfusio.bed.objectives import TargetTrackingOFV, TargetSpec
>>> # assume `hybrid` is a fitted HybridStateSpaceModel
>>> ofv = TargetTrackingOFV(
...     hybrid=hybrid,
...     targets=[TargetSpec(0, target=30.0, weight=1.0)],  # VCD → 30e6 cells/mL
... )
>>> # ofv.evaluate(c0, u_candidate)  # returns scalar Tensor
evaluate(c0, u)[source]

Evaluate the OFV for a candidate control vector.

Parameters:
  • c0 (Tensor) – Current state, shape (n_species,).

  • u (Tensor) – Candidate control vector, shape (n_controls,). Applied constantly over the horizon.

Returns:

Scalar OFV value (lower = better).

Return type:

Tensor

evaluate_batch(c0, u_batch)[source]

Evaluate OFV for a batch of candidate controls.

Parameters:
  • c0 (Tensor) – Current state, shape (n_species,).

  • u_batch (Tensor) – Candidate controls, shape (B, n_controls).

Returns:

OFV values, shape (B,).

Return type:

Tensor

score_trajectories(Y)[source]

Score a batch of pre-computed trajectories against targets.

Useful for evaluating acquisition function surrogates directly on GP prediction samples without running the full hybrid model.

Parameters:

Y (Tensor) – Pre-computed trajectories, shape (B, horizon, n_species).

Returns:

OFV values, shape (B,) — higher is better (negative SSE).

Return type:

Tensor

High-level BED policy — daily self-driving decision loop.

BEDPolicy orchestrates the complete self-driving step:

  1. Receive current bioreactor state.

  2. Evaluate the acquisition function over the design space.

  3. Propose next setpoints (with --allow-write safety gate).

  4. Log the decision and uncertainty to the audit trail.

This is the class called by the digital twin’s daily scheduler.
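The safety-gate pattern in steps 3–4 can be sketched as follows (propose_setpoints is a stub standing in for the acquisition optimisation; the names and values are illustrative, not the BEDPolicy API):

```python
def propose_setpoints(state):
    """Stub for the acquisition-function optimisation step."""
    return {"perfusion_rate": 1.2, "bleed_rate": 0.1}   # illustrative values

def daily_step(state, allow_write, audit_log, writes):
    controls = propose_setpoints(state)
    audit_log.append(controls)      # decisions are always logged to the audit trail
    if allow_write:
        writes.append(controls)     # setpoints reach the bioreactor only in closed loop
    return controls

audit, writes = [], []
controls = daily_step({"vcd": 25.0}, allow_write=False, audit_log=audit, writes=writes)
```

With allow_write=False the recommendation is recorded but never written, which is the intended open-loop commissioning mode.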

References

[Gadiyar2026]

Gadiyar et al. (2026), §3.2 and Fig. 5.

class perfusio.bed.policies.BEDDecision(timestamp, day, recommended_controls, acqf_value, acqf_name, current_state, forecast_mean=<factory>)[source]

Bases: object

A single decision made by the BED policy.

Parameters:
acqf_name: str
acqf_value: float
current_state: dict[str, float]
day: int
forecast_mean: dict[str, list[float]]
recommended_controls: dict[str, float]
timestamp: datetime
class perfusio.bed.policies.BEDPolicy(hybrid, design_space, acqf_name='qLogEI', allow_write=False, targets=None, n_restarts=10, n_raw_samples=512)[source]

Bases: object

Daily self-driving control policy.

Parameters:
  • hybrid (object) – Fitted hybrid model.

  • design_space (DesignSpace) – Process design space with control bounds.

  • acqf_name (str) – Acquisition function to use. Default "qLogEI" (robust for noisy bioprocess observations).

  • allow_write (bool) – Safety gate: if False, recommendations are logged but NOT written to the bioreactor. Must be True for closed-loop operation.

  • targets (list[object] | None) – List of TargetSpec objects.

  • n_restarts (int) – optimize_acqf restarts.

  • n_raw_samples (int) – Sobol initialisation candidates.

Examples

>>> from perfusio.bed import BEDPolicy
>>> # policy = BEDPolicy(hybrid, design_space, allow_write=False)
>>> # decision = policy.decide(c_current, day=7)
decide(c_current, day, surrogate_model, best_f=None, seed=None)[source]

Compute the recommended control setpoints for the next day.

Parameters:
  • c_current (Tensor) – Current species state, shape (n_species,).

  • day (int) – Current culture day.

  • surrogate_model (Any) – Fitted BoTorch surrogate (wraps the hybrid GP).

  • best_f (float | None) – Best observed OFV so far. Required for EI-based acquisitions.

  • seed (int | None) – Random seed for deterministic optimisation.

Returns:

Decision record including recommended controls and forecast.

Return type:

BEDDecision

property history: list[BEDDecision]

All decisions made so far.

Pareto front computation and hypervolume indicator.

Provides:

  • compute_pareto_front() — boolean mask of non-dominated points.

  • hypervolume() — hypervolume indicator with respect to a reference point.

These are used by the multi-objective BED loop to assess diversity and progress of the Pareto set.

References

[Daulton2020]

Daulton, S., et al. (2020). Differentiable Expected Hypervolume Improvement for Parallel Multi-Objective Bayesian Optimization. NeurIPS.

perfusio.bed.pareto.compute_pareto_front(Y)[source]

Return the indices of non-dominated points in Y.

Assumes maximisation for all objectives (negate objectives that should be minimised before calling).

Parameters:

Y (Tensor) – Objective matrix, shape (N, n_objectives). Higher is better for all objectives.

Returns:

Boolean mask of shape (N,); True for non-dominated points.

Return type:

Tensor

Examples

>>> import torch
>>> from perfusio.bed.pareto import compute_pareto_front
>>> Y = torch.tensor([[1.0, 2.0], [2.0, 1.0], [1.5, 1.5], [0.5, 0.5]])
>>> compute_pareto_front(Y)
tensor([ True,  True,  True, False])
perfusio.bed.pareto.hypervolume(Y, ref_point)[source]

Compute the hypervolume indicator for a set of points.

Uses the BoTorch Hypervolume class for efficiency and correctness.

Parameters:
  • Y (Tensor) – Objective matrix of non-dominated points, shape (N, n_objectives). Higher is better (negate minimisation objectives before calling).

  • ref_point (Tensor) – Reference point, shape (n_objectives,). Must be dominated by all points in Y.

Returns:

Hypervolume dominated by Y w.r.t. ref_point.

Return type:

float
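For intuition, the 2-objective case reduces to a staircase sweep (a pure-Python sketch; perfusio delegates to BoTorch's exact algorithm for the general case):

```python
def hypervolume_2d(points, ref):
    """Area dominated by `points` above `ref`, both objectives maximised."""
    # Keep points that dominate the reference, sweep by first objective descending.
    pts = sorted((p for p in points if p[0] > ref[0] and p[1] > ref[1]),
                 key=lambda p: p[0], reverse=True)
    hv, y_prev = 0.0, ref[1]
    for x, y in pts:
        if y > y_prev:                          # contributes a new staircase strip
            hv += (x - ref[0]) * (y - y_prev)
            y_prev = y
    return hv

# Union of [0,2]x[0,1] and [0,1]x[0,2] has area 2 + 2 - 1 = 3.
hv = hypervolume_2d([(1.0, 2.0), (2.0, 1.0)], ref=(0.0, 0.0))
```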

Acquisition function optimisation over the process design space.

Wraps BoTorch’s optimize_acqf with:

  • Explicit lower/upper bounds from DesignSpace.

  • Restarts from Sobol quasi-random initial candidates.

  • Batch support for q > 1 (parallelised reactor experiments).

References

[Balandat2020]

Balandat et al. (2020). BoTorch: A Framework for Efficient Monte-Carlo Bayesian Optimization. NeurIPS.

perfusio.bed.search.optimise_acquisition(acqf, design_space, q=1, n_restarts=10, n_raw_samples=512, fixed_features=None, seed=None, dtype=torch.float64, device=None)[source]

Optimise acqf over the control design space.

Parameters:
  • acqf (ba.AcquisitionFunction) – BoTorch acquisition function.

  • design_space (DesignSpace) – DesignSpace with control bounds.

  • q (int) – Batch size (number of concurrent recommendations). Use q > 1 only with MC acquisitions (qEI, qUCB, etc.).

  • n_restarts (int) – Number of restarts for optimize_acqf.

  • n_raw_samples (int) – Number of Sobol candidates for initialisation.

  • fixed_features (dict[int, float] | None) – Dict mapping feature indices to fixed values (e.g. fix temperature).

  • seed (int | None) – Random seed.

  • dtype (torch.dtype) – Tensor dtype. Default torch.float64.

  • device (torch.device | None) – Compute device. Default CPU.

Returns:

(candidates, acqf_values) — the optimal candidate(s) (shape (q, n_controls)) and their acquisition values (shape (q,)).

Return type:

tuple[Tensor, Tensor]

Simulator

CHO perfusion simulator — virtual ambr®250 bioreactor.

Provides a fully self-contained, stochastic in-silico CHO bioreactor that replicates the operating conditions described in Gadiyar et al. (2026) §3.3:

  • Clone X: Warburg metabolic switch (lactate consumption at Glc < 2 g/L)

  • Clone Y: no switch (always produces lactate)

  • 24-run Box-Behnken training experiment generator

  • 5-reactor closed-loop demonstration (Gadiyar Fig. 5)

The simulator implements the same BioreactorConnector interface as the real OPC UA connector, so the digital twin and self-driving loop can run identically against both the virtual and real hardware.

References

[Gadiyar2026]

Gadiyar et al. (2026), §3.3 (in-silico validation).

class perfusio.simulator.cho_perfusion.CHOSimulator(clone='CloneX', y0=None, controls=None, volume_L=0.25, noise=None, acceleration=1.0, seed=42)[source]

Bases: object

Virtual ambr®250 bioreactor implementing the BioreactorConnector interface.

Runs the CHO kinetic ODE in accelerated simulation time (up to 1000× real-time), applies measurement noise, and exposes the same async API as the OPC UA connector.

Parameters:
  • clone (str) – "CloneX" (Warburg switch) or "CloneY" (no switch).

  • y0 (dict[str, float] | None) – Initial state dict. Defaults to DEFAULT_Y0.

  • controls (dict[str, float] | None) – Initial control setpoints. Updated via write_setpoints().

  • volume_L (float) – Reactor working volume [L]. Default 0.250 (ambr®250).

  • noise (NoiseModel | None) – Noise model. Defaults to NoiseModel.

  • acceleration (float) – Simulation speed multiplier vs. real-time. Default 1 (real-time). Set to 1440 for 1 simulated day per real minute.

  • seed (int) – Random seed.

Examples

>>> import asyncio
>>> from perfusio.simulator import CHOSimulator
>>> sim = CHOSimulator(clone="CloneX", seed=42)
>>> asyncio.run(sim.read_sample(day=3))
STATE_SPECIES: list[str] = ['VCD', 'Via', 'Glc', 'Gln', 'Glu', 'Lac', 'Amm', 'Pyr', 'Titer']
generate_box_behnken_experiment(n_days=28, seed=0)[source]

Generate a 24-run Box-Behnken training dataset.

Performs a Box-Behnken design over 4 key control factors:

  • perfusion_rate: [0.5, 1.0, 1.5] vvd

  • bleed_rate: [0.10, 0.15, 0.20] vvd

  • temperature: [36.5, 37.0, 37.5] °C

  • glucose_setpoint: [4.0, 5.0, 6.0] g/L

Returns:

Each element is {"run_id": int, "controls": dict, "trajectory": np.ndarray, "noisy_samples": list[dict]}.

Return type:

list[dict]

Parameters:
  • n_days (int) – Number of culture days per run.

  • seed (int) – Random seed for the noisy samples.

async is_alive()[source]

Return True if the virtual reactor is still running.

Return type:

bool

async read_sample(day)[source]

Return a (noisy) sample for the given culture day.

Parameters:

day (int) – Culture day (1-indexed).

Returns:

Observed species values. None indicates missing data.

Return type:

dict[str, float | None]

simulate_run(n_days=28, controls=None, seed=None)[source]

Run a complete N-day simulation deterministically.

Parameters:
  • n_days (int) – Number of daily time steps.

  • controls (dict[str, float] | None) – Override control dict. If None, uses current setpoints.

  • seed (int | None) – Ignored (ODE is deterministic).

Returns:

Shape (n_days + 1, n_species) — clean (noiseless) trajectory.

Return type:

np.ndarray

async write_setpoints(setpoints)[source]

Update control setpoints (resets the cached ODE trajectory).

Parameters:

setpoints (dict[str, float]) – New setpoints dict. Only keys present in the dict are updated.

Return type:

None

Design-of-Experiments (DoE) plan generators.

Provides:

  • box_behnken(): Box-Behnken design for 3–7 factors.

  • central_composite(): Central Composite Design (CCD).

  • latin_hypercube(): Maximin Latin Hypercube (using SciPy’s QMC engine).

box_behnken() and central_composite() return a numpy.ndarray of shape (n_runs, n_factors) in the normalised space [-1, 1]; latin_hypercube() returns samples in [0, 1]. Use scale_to_bounds() to map a design to physical units.

References

[BoxBehnken1960]

Box, G. E. P., & Behnken, D. W. (1960). Some new three level designs for the study of quantitative variables. Technometrics, 2(4).

[Montgomery2017]

Montgomery, D. C. (2017). Design and Analysis of Experiments. 9th ed. Wiley. §14.

perfusio.simulator.doe.box_behnken(n_factors, center_points=3)[source]

Generate a Box-Behnken design matrix.

Parameters:
  • n_factors (int) – Number of continuous factors (3 ≤ k ≤ 7).

  • center_points (int) – Number of centre-point replicates.

Returns:

Design matrix, shape (n_runs, n_factors) in normalised [-1, 1].

Return type:

np.ndarray

Raises:

ValueError – If n_factors is not in [3, 7].
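The k = 3 construction (each pair of factors at ±1 while the remaining factor sits at 0, plus centre replicates) can be sketched as follows; box_behnken_3 is an illustrative stand-in, not the library function:

```python
import itertools
import numpy as np

def box_behnken_3(center_points: int = 3) -> np.ndarray:
    """Box-Behnken design for 3 factors in normalised [-1, 1] coding."""
    runs = []
    for i, j in itertools.combinations(range(3), 2):
        # All four ±1 combinations of the chosen factor pair
        for a, b in itertools.product((-1.0, 1.0), repeat=2):
            row = [0.0, 0.0, 0.0]
            row[i], row[j] = a, b
            runs.append(row)
    runs += [[0.0, 0.0, 0.0]] * center_points   # centre replicates
    return np.array(runs)

design = box_behnken_3()
print(design.shape)  # (15, 3): 12 edge-midpoint runs + 3 centre points
```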

perfusio.simulator.doe.central_composite(n_factors, alpha='rotatable', center_points=(4, 4))[source]

Generate a Central Composite Design (CCD) matrix.

Parameters:
  • n_factors (int) – Number of factors.

  • alpha (str) – Axial distance strategy: "rotatable" or "orthogonal".

  • center_points (tuple[int, int]) – (n_center_factorial, n_center_star) — centre-point replicates.

Returns:

Design matrix, shape (n_runs, n_factors) in normalised space. Factorial points at ±1; axial points at ±alpha_val; centre at 0.

Return type:

np.ndarray

perfusio.simulator.doe.latin_hypercube(n_runs, n_factors, seed=0, optimise=True)[source]

Generate a maximin Latin Hypercube Sample in [0, 1]^n_factors.

Parameters:
  • n_runs (int) – Number of runs.

  • n_factors (int) – Number of factors.

  • seed (int) – Random seed.

  • optimise (bool) – If True, use scipy.stats.qmc.LatinHypercube with optimization = "random-cd" for improved space filling (SciPy ≥ 1.7).

Returns:

LHC sample, shape (n_runs, n_factors) in [0, 1].

Return type:

np.ndarray

Notes

Map to physical bounds externally: x_phys = lo + (hi - lo) * x_lhc.

perfusio.simulator.doe.scale_to_bounds(design, lo, hi, normalised=True)[source]

Scale a normalised (or [0,1]) design to physical bounds.

Parameters:
  • design (ndarray) – Design matrix, shape (n_runs, n_factors).

  • lo (ndarray) – Lower bounds, shape (n_factors,).

  • hi (ndarray) – Upper bounds, shape (n_factors,).

  • normalised (bool) – If True, input is in [-1, 1]; else in [0, 1].

Returns:

Scaled design in physical units, same shape.

Return type:

np.ndarray
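The mapping is a simple affine transform; a NumPy sketch of the documented behaviour (scale is an illustrative stand-in for scale_to_bounds()):

```python
import numpy as np

def scale(design: np.ndarray, lo: np.ndarray, hi: np.ndarray,
          normalised: bool = True) -> np.ndarray:
    """Map a [-1, 1] (or [0, 1]) coded design to physical bounds."""
    if normalised:                      # [-1, 1] -> [0, 1] first
        design = (design + 1.0) / 2.0
    return lo + design * (hi - lo)

d = np.array([[-1.0, 0.0], [1.0, 1.0]])
lo, hi = np.array([36.5, 0.5]), np.array([37.5, 1.5])
print(scale(d, lo, hi))  # rows map to [36.5, 1.0] and [37.5, 1.5]
```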

Noise model for the in-silico simulator.

Simulates typical analytical measurement uncertainty for a CHO perfusion bioreactor operating in an ambr®250 platform.

Calibrated to match Gadiyar et al. (2026) §3.3:

  • VCD: 5% CV (coefficient of variation)

  • Viability: 2% CV

  • Metabolites (Glc, Gln, Glu, Lac, Amm, Pyr): 7% CV

  • Titer: 8% CV

  • Missing-data probability: 5% per species per day

  • Measurement time jitter: ±1 hour (uniform)

References

[Gadiyar2026]

Gadiyar et al. (2026), §3.3 — in-silico noise parameterisation.

class perfusio.simulator.noise.NoiseModel(cv_by_species=<factory>, default_cv=0.07, missing_prob=0.05, jitter_hours=1.0, seed=42)[source]

Bases: object

Additive proportional Gaussian noise + missing-data model.

Parameters:
  • cv_by_species (dict[str, float]) – Dict mapping species name → coefficient of variation (0–1). Species not listed use default_cv.

  • default_cv (float) – Fallback CV for unlisted species.

  • missing_prob (float) – Probability that a single (species, day) measurement is missing.

  • jitter_hours (float) – Half-width of uniform jitter on measurement time [h].

  • seed (int) – Random seed.

Examples

>>> from perfusio.simulator import NoiseModel
>>> nm = NoiseModel(seed=0)
>>> clean = {"VCD": 20.0, "Glc": 5.0}
>>> noisy = nm.apply(clean, day=7)
apply(state, day)[source]

Apply noise and missing-data to a clean state observation.

Parameters:
  • state (dict[str, float]) – Clean simulated state, dict mapping species name → value.

  • day (int) – Culture day (not used for noise level; reserved for future use).

Returns:

Noisy observation. Missing values are represented as None.

Return type:

dict[str, float | None]

cv_by_species: dict[str, float]
default_cv: float = 0.07
jitter_hours: float = 1.0
missing_prob: float = 0.05
seed: int = 42
time_jitter()[source]

Return a random measurement time offset [h].

Return type:

float
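The proportional-noise and missing-data behaviour can be sketched as a standalone function (apply_noise is hypothetical, not the library method):

```python
import numpy as np

def apply_noise(state, cv_by_species, default_cv=0.07,
                missing_prob=0.05, rng=None):
    """Multiplicative Gaussian noise + per-measurement dropout."""
    rng = rng or np.random.default_rng(42)
    noisy = {}
    for species, value in state.items():
        if rng.random() < missing_prob:      # measurement goes missing
            noisy[species] = None
            continue
        cv = cv_by_species.get(species, default_cv)
        noisy[species] = value * (1.0 + cv * rng.standard_normal())
    return noisy

clean = {"VCD": 20.0, "Glc": 5.0}
print(apply_noise(clean, {"VCD": 0.05}))
```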

Digital Twin

Online digital twin — real-time hybrid model + BED decision loop.

DigitalTwin is the central coordinator that:

  1. Reads daily samples from a BioreactorConnector (hardware OPC UA or virtual ambr®250).

  2. Appends new observations to the training set and online-retrains the hybrid model (retrain_online()).

  3. Runs a 3-step-ahead forecast.

  4. Queries the BEDPolicy for the next setpoints.

  5. Checks predictive alarms via AlarmNotifier.

  6. Writes setpoints back to the connector (gated by allow_write).

  7. Records every event to the AuditLogger.

Usage — blocking (sync wrapper):

twin = DigitalTwin(connector=sim, hybrid=model, ...)
twin.step(day=7)              # single step (tests / notebooks)

Usage — fully async:

await twin.run(n_days=28)    # full 28-day run

Safety gate:

twin = DigitalTwin(..., allow_write=False)   # read-only mode (default)
twin = DigitalTwin(..., allow_write=True)    # closed-loop (operator must opt-in)
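The seven-step daily loop can be sketched against stub components; every name below is an illustrative stand-in, not a perfusio class:

```python
import asyncio

class StubConnector:
    async def read_sample(self, day):          # stands in for the connector
        return {"VCD": 20.0 + day, "Glc": 5.0}
    async def write_setpoints(self, sp):
        self.last = sp

async def daily_step(conn, day, allow_write=False):
    sample = await conn.read_sample(day)       # 1. read daily sample
    # 2. retrain, 3. forecast, 4. query BED policy — stubbed here
    decision = {"perfusion_rate": 1.0}
    # 5. predictive alarms would scan the forecast here
    if allow_write:                            # 6. safety-gated write-back
        await conn.write_setpoints(decision)
    return {"sample": sample, "decision": decision}   # 7. audit record

conn = StubConnector()
out = asyncio.run(daily_step(conn, day=7, allow_write=True))
print(out["decision"])  # {'perfusion_rate': 1.0}
```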

References

[Gadiyar2026]

Gadiyar et al. (2026), §3.4 — Fig. 5.

[Mione2024]

Mione et al. (2024) — audit trail requirements.

class perfusio.twin.digital_twin.DigitalTwin(connector, hybrid, design_space, bed_policy, notifier=None, log_dir=PosixPath('perfusio_audit'), run_id='default', allow_write=False, species_names=None, retrain_every=1)[source]

Bases: object

Online digital twin with BED-driven closed-loop control.

Parameters:
  • connector (BioreactorConnector) – Bioreactor data connector (OPC UA, SQL, or virtual simulator).

  • hybrid (HybridStateSpaceModel) – Fitted HybridStateSpaceModel.

  • design_space (DesignSpace) – Process control design space.

  • bed_policy (BEDPolicy) – Configured BEDPolicy.

  • notifier (AlarmNotifier | None) – Alarm notifier. Optional.

  • log_dir (Path | str) – Directory for audit logs. Defaults to ./perfusio_audit.

  • run_id (str) – Unique identifier for this run.

  • allow_write (bool) – Safety gate — if False (default), setpoints are NOT written back.

  • species_names (list[str] | None) – Ordered species names matching the model output.

  • retrain_every (int) – Online-retrain the hybrid GP every N days. Default 1 (daily).

async run(n_days, interval_seconds=86400.0)[source]

Run the full digital twin for n_days asynchronously.

Parameters:
  • n_days (int) – Number of culture days.

  • interval_seconds (float) – Real-time delay between steps. Set to 0 for simulation.

Return type:

None

run_sync(n_days, interval_seconds=0.0)[source]

Run the full twin synchronously (blocking).

Parameters:
  • n_days (int) – Number of culture days.

  • interval_seconds (float) – Real-time delay between steps. Default 0 (no delay).

Return type:

None

step(day)[source]

Execute one daily step synchronously (blocking).

Parameters:

day (int) – Culture day.

Returns:

Summary with keys: "sample", "forecast", "decision".

Return type:

dict

Immutable audit logger for the digital twin.

Produces two artefacts per run:

  • audit_events.csv — human-readable event log (append-only).

  • audit_events.parquet — machine-readable columnar store for analysis.

Records conform to the audit-trail requirements discussed in:

Mione, A., et al. (2024). Regulatory considerations for self-driving bioprocesses. Process Biochemistry.

The audit log is append-only by design. Past records are never modified.

References

[Mione2024]

Mione et al. (2024).

class perfusio.twin.audit.AuditLogger(log_dir, run_id, flush_every=10)[source]

Bases: object

Append-only CSV + Parquet audit recorder.

Parameters:
  • log_dir (Path) – Directory in which to write audit files.

  • run_id (str) – Unique run identifier (written to every record).

  • flush_every (int) – Flush buffer to disk after this many events.

Examples

>>> from pathlib import Path
>>> from perfusio.twin.audit import AuditLogger
>>> audit = AuditLogger(log_dir=Path("/tmp/audit"), run_id="run001")
>>> audit.log("SETPOINT_CHANGE", {"perfusion_rate": 1.2})
>>> audit.close()
COLUMNS = ['timestamp', 'run_id', 'event_type', 'day', 'user', 'payload_json']
close()[source]

Flush remaining events and write a Parquet snapshot.

Return type:

None

log(event_type, payload, day=0, user='system')[source]

Append one audit event.

Parameters:
  • event_type (str) – E.g. "SETPOINT_CHANGE", "MODEL_RETRAIN", "ALARM_RAISED", "DECISION_MADE".

  • payload (dict[str, Any]) – Serialisable dict with event-specific data.

  • day (int) – Culture day at which the event occurred.

  • user (str) – Actor ("system" for automated events).

Return type:

None

Predictive constraint alarm system for the digital twin.

Raises alarms when the forecast trajectory is predicted to violate process constraints before the violation occurs — giving operators lead time to intervene.

Alarm channels (configurable via AlarmConfig):

  • Console / logging (always active)

  • Email (SMTP, requires EMAIL_* environment variables)

  • Slack webhook (requires SLACK_WEBHOOK_URL)

References

[Gadiyar2026]

Gadiyar et al. (2026), §3.4 (constraint monitoring).

class perfusio.twin.notifications.AlarmEvent(day, species, predicted_value, threshold, direction, lead_days)[source]

Bases: object

A single predictive alarm event.

Parameters:
  • day (int) – Culture day at which the violation is predicted.

  • species (str) – Species that violates the constraint.

  • predicted_value (float) – Forecast value that crosses the threshold.

  • threshold (float) – Constraint bound that is crossed.

  • direction (str) – Direction of the crossing (below the lower bound or above the upper bound).

  • lead_days (int) – Days of lead time before the predicted violation.
class perfusio.twin.notifications.AlarmNotifier(thresholds, lead_days=3, channels=None)[source]

Bases: object

Predictive alarm dispatcher for the digital twin.

Parameters:
  • thresholds (dict[str, tuple[float, float]]) – Dict mapping species name → (lo, hi) tuple. If a predicted value crosses lo (below) or hi (above) within lead_days, an alarm is raised.

  • lead_days (int) – How many days ahead to scan the forecast for violations.

  • channels (list[str] | None) – List of channel names: "log", "email", "slack".

Examples

>>> from perfusio.twin.notifications import AlarmNotifier
>>> notifier = AlarmNotifier(
...     thresholds={"Glc": (2.0, 8.0), "Amm": (0.0, 5.0)},
...     channels=["log"],
... )
check_forecast(forecast, species_names, current_day)[source]

Scan the horizon forecast for predicted constraint violations.

Parameters:
  • forecast (dict[str, Any]) – Output of forecast_run() — dict with "mean" key, shape (horizon, n_species).

  • species_names (list[str]) – Ordered list of species names matching the model output.

  • current_day (int) – Current culture day.

Returns:

All violations predicted within the lead window.

Return type:

list[AlarmEvent]
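The scan can be sketched as a loop over the forecast horizon; scan_forecast and its plain-tuple events are simplified stand-ins for the AlarmEvent objects:

```python
import numpy as np

def scan_forecast(mean, species_names, thresholds, current_day, lead_days=3):
    """Return (day, species, value, bound, direction) for predicted violations."""
    events = []
    for h in range(min(lead_days, mean.shape[0])):
        for k, name in enumerate(species_names):
            if name not in thresholds:
                continue
            lo, hi = thresholds[name]
            v = mean[h, k]
            if v < lo:
                events.append((current_day + h + 1, name, float(v), lo, "below"))
            elif v > hi:
                events.append((current_day + h + 1, name, float(v), hi, "above"))
    return events

mean = np.array([[5.0, 3.0], [1.5, 4.0], [1.0, 6.0]])  # (horizon, n_species)
ev = scan_forecast(mean, ["Glc", "Amm"], {"Glc": (2.0, 8.0), "Amm": (0.0, 5.0)},
                   current_day=7)
print(ev)  # Glc predicted below 2.0 on days 9-10, Amm above 5.0 on day 10
```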

Daily decision scheduler for the digital twin.

Runs a 24-hour async cadence: read sample → update model → forecast → decide setpoints → write setpoints → log audit trail.

The scheduler is designed to be cancellable (asyncio.CancelledError is handled gracefully) and restartable.

References

[Gadiyar2026]

Gadiyar et al. (2026), Fig. 5 — closed-loop timing diagram.

class perfusio.twin.scheduler.DailyScheduler(step_fn, n_days, interval_seconds=86400.0, start_day=1)[source]

Bases: object

Async daily-cadence scheduler for the digital twin.

Parameters:
step_fn (Callable[[int], Coroutine[Any, Any, Any]]) – Coroutine factory step_fn(day: int) -> None — the work to execute each simulated day. Raise StopIteration inside step_fn to end the run early.

  • n_days (int) – Total number of days to run.

  • interval_seconds (float) – Real-time interval between steps (default 86400 = 24 h real time). Set to a small value for testing (e.g. 1 s).

  • start_day (int) – First day number.

Examples

>>> import asyncio
>>> from perfusio.twin.scheduler import DailyScheduler
>>> async def my_step(day: int) -> None:
...     print(f"Day {day}")
>>> scheduler = DailyScheduler(my_step, n_days=3, interval_seconds=0)
>>> asyncio.run(scheduler.run())
Day 1
Day 2
Day 3
async run()[source]

Execute the scheduler for all days.

Return type:

None

stop()[source]

Request a graceful stop after the current day completes.

Return type:

None

Connectors

Abstract base class for bioreactor connectors.

All connectors — OPC UA, SQL, filesystem, and the ambr®250 emulator — must implement this interface. The digital twin calls only the methods defined here, so the twin is fully connector-agnostic.

The interface matches the BioreactorConnector structural protocol, providing both inheritance and Protocol compatibility.

class perfusio.connectors.base.BioreactorConnectorBase[source]

Bases: ABC

Abstract base class for bioreactor data connectors.

Concrete connectors should subclass this and implement all abstract methods. The async methods are designed for use with asyncio; blocking connectors should wrap their I/O in asyncio.get_event_loop().run_in_executor().
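Wrapping a blocking driver call can be sketched with asyncio.get_running_loop() (the modern spelling of the idiom above); read_blocking is a hypothetical stand-in for a synchronous vendor SDK call:

```python
import asyncio
import time

def read_blocking(day: int) -> dict:
    """Pretend synchronous driver call (e.g. a vendor SDK)."""
    time.sleep(0.01)
    return {"day": day, "VCD": 20.0}

async def read_sample(day: int) -> dict:
    # Off-load the blocking call to the default thread-pool executor
    # so the event loop stays responsive.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, read_blocking, day)

print(asyncio.run(read_sample(3)))  # {'day': 3, 'VCD': 20.0}
```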

abstractmethod async is_alive()[source]

Return True if the connection to the bioreactor is healthy.

Return type:

bool

abstractmethod async read_sample(day)[source]

Read daily offline sample for the given culture day.

Parameters:

day (int) – Culture day (1-indexed).

Returns:

Species name → measurement value mapping. None values indicate missing measurements.

Return type:

dict[str, Any]

abstractmethod async write_setpoints(setpoints)[source]

Write control setpoints to the bioreactor.

Parameters:

setpoints (dict[str, float]) – Control variable name → new value mapping.

Return type:

None

Notes

Implementations MUST honour the allow_write gate at the DigitalTwin level. The connector itself does not enforce this gate.

OPC UA connector for real bioreactor hardware.

Wraps the asyncua async OPC UA client with:

  • Auto-reconnect with exponential backoff (max 5 retries).

  • Browse-tree caching (resolved node IDs are stored to avoid repeated discovery).

  • Recorded-replay mode (saves/loads OPC UA values from a JSONL log file for offline testing and audit trail reconstruction).

  • Certificate-based and username/password authentication.

Required OPC UA node IDs must be configured via the node_map dict.

References

[OPCUA]

IEC 62541 — OPC Unified Architecture.

class perfusio.connectors.opcua_client.OPCUAConnector(url, node_map, username=None, password=None, cert_path=None, replay_log=None, max_retries=5, retry_delay_s=2.0)[source]

Bases: BioreactorConnectorBase

Async OPC UA client for real bioreactor hardware.

Parameters:
  • url (str) – OPC UA server URL, e.g. "opc.tcp://192.168.1.10:4840".

  • node_map (dict[str, str]) – Dict mapping species/control name → OPC UA node-ID string (e.g. "ns=2;s=VCD").

  • username (str | None) – Optional username for Basic256Sha256 auth.

  • password (str | None) – Optional password.

  • cert_path (str | None) – Optional path to PEM client certificate.

  • replay_log (Path | str | None) – If provided, read values from this JSONL file (replay mode) instead of connecting to a real server. Each line is a JSON object with keys day, node_id, value.

  • max_retries (int) – Maximum reconnection attempts.

  • retry_delay_s (float) – Initial delay between reconnection attempts (doubles each retry).

Examples

>>> from perfusio.connectors import OPCUAConnector
>>> conn = OPCUAConnector(
...     url="opc.tcp://localhost:4840",
...     node_map={"VCD": "ns=2;s=VCD", "Glc": "ns=2;s=Glucose"},
... )
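The replay-log format (one JSON object per line with keys day, node_id, value) can be parsed with the standard library; load_replay is illustrative, not the connector’s internal parser:

```python
import io
import json

def load_replay(fp) -> dict:
    """Index replay records as {day: {node_id: value}}."""
    by_day: dict = {}
    for line in fp:
        rec = json.loads(line)
        by_day.setdefault(rec["day"], {})[rec["node_id"]] = rec["value"]
    return by_day

log = io.StringIO(
    '{"day": 1, "node_id": "ns=2;s=VCD", "value": 2.3}\n'
    '{"day": 1, "node_id": "ns=2;s=Glucose", "value": 4.8}\n'
)
print(load_replay(log))  # {1: {'ns=2;s=VCD': 2.3, 'ns=2;s=Glucose': 4.8}}
```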
async disconnect()[source]

Gracefully disconnect from the OPC UA server.

Return type:

None

async is_alive()[source]

Return True if the connection to the bioreactor is healthy.

Return type:

bool

async read_sample(day)[source]

Read daily offline sample for the given culture day.

Parameters:

day (int) – Culture day (1-indexed).

Returns:

Species name → measurement value mapping. None values indicate missing measurements.

Return type:

dict[str, Any]

async write_setpoints(setpoints)[source]

Write control setpoints to the bioreactor.

Parameters:

setpoints (dict[str, float]) – Control variable name → new value mapping.

Return type:

None

Notes

Implementations MUST honour the allow_write gate at the DigitalTwin level. The connector itself does not enforce this gate.

SQLAlchemy 2.0 data store for bioreactor run data.

Schema (8 tables):

  • experiment — top-level experiment metadata

  • reactor — individual reactor/vessel metadata

  • sample — offline daily sample measurements

  • setpoint_history — historical setpoint changes

  • model_run — GP/hybrid model training runs

  • forecast — multi-step forecasts stored for audit

  • audit_event — structured audit events (mirror of CSV log)

  • alarm_event — predictive alarms

Uses Alembic-compatible MetaData declarations. Run migrations with:

alembic upgrade head

References

[Mione2024]

Mione et al. (2024) — regulatory data integrity requirements.

class perfusio.connectors.sql_store.SQLStore(db_url, experiment_name, reactor_name, echo=False)[source]

Bases: BioreactorConnectorBase

SQLAlchemy-backed connector and data archive.

Parameters:
  • db_url (str) – SQLAlchemy database URL, e.g. "sqlite:///perfusio.db" or "postgresql+psycopg2://user:pass@host/db".

  • experiment_name (str) – Name of the experiment row to read/write.

  • reactor_name (str) – Name of the reactor row.

  • echo (bool) – If True, echo SQL to stdout.

Examples

>>> store = SQLStore("sqlite:///test.db", "Exp1", "R1")
>>> store.create_tables()
create_tables()[source]

Create all tables if they do not already exist.

Return type:

None

async is_alive()[source]

Return True if the connection to the bioreactor is healthy.

Return type:

bool

log_audit_event(run_id, event_type, payload, day=0, user='system')[source]

Insert an audit event record.

Parameters:
  • run_id (str) – Unique run identifier.

  • event_type (str) – Event type string (e.g. "SETPOINT_CHANGE").

  • payload (dict[str, Any]) – Serialisable dict with event-specific data.

  • day (int) – Culture day at which the event occurred.

  • user (str) – Actor ("system" for automated events).

Return type:

None

async read_sample(day)[source]

Read all species measurements for the given culture day.

Parameters:

day (int)

Return type:

dict[str, Any]

write_sample(day, sample)[source]

Insert a sample record into the database.

Parameters:
  • day (int) – Culture day.

  • sample (dict[str, Any]) – Species name → measurement value mapping (None for missing).

Return type:

None

async write_setpoints(setpoints)[source]

Persist setpoint changes to the setpoint_history table.

Parameters:

setpoints (dict[str, float])

Return type:

None

Filesystem connector — CSV / Parquet / Excel data store.

Reads bioreactor samples from flat-file archives (e.g. exported from an ambr®250 UNICORN workstation or a LIMS export). Supports write-back to a rolling CSV log for closed-loop applications.

Expected CSV format:

day,VCD,Via,Glc,Gln,Glu,Lac,Amm,Pyr,Titer
1,2.3,98.1,4.8,...

or long-form Parquet with columns day, species, value.

class perfusio.connectors.filesystem.FilesystemStore(source_path, setpoint_log=None, wide_format=True)[source]

Bases: BioreactorConnectorBase

Read bioreactor data from CSV / Parquet / Excel files.

Parameters:
  • source_path (Path | str) – Path to the source data file (CSV, Parquet, or Excel).

  • setpoint_log (Path | str | None) – Optional path to append setpoint changes as CSV rows.

  • wide_format (bool) – If True (default), source is wide-format (one column per species). If False, source is long-format with day, species, value columns.

Examples

>>> from pathlib import Path
>>> store = FilesystemStore(Path("data/run01.csv"))
async is_alive()[source]

Return True if the connection to the bioreactor is healthy.

Return type:

bool

async read_sample(day)[source]

Return the sample recorded for the given culture day.

Parameters:

day (int)

Return type:

dict[str, Any]

async write_setpoints(setpoints)[source]

Append setpoints to the setpoint log CSV.

Parameters:

setpoints (dict[str, float])

Return type:

None

ambr®250 virtual emulator — full-featured digital twin of the Sartorius ambr®250.

Wraps CHOSimulator with the BioreactorConnectorBase interface, adding:

  • Configurable acceleration factor (up to 1000× real-time).

  • 5-reactor parallel emulation (as in Gadiyar Fig. 5).

  • Identical public API to the real OPC UA connector.

This allows the digital twin to run closed-loop simulations with no code changes when switching from simulation to real hardware.

class perfusio.connectors.ambr250_emulator.Ambr250Emulator(clone='CloneX', volume_L=0.25, controls=None, acceleration=1.0, seed=42)[source]

Bases: BioreactorConnectorBase

Virtual ambr®250 bioreactor emulator.

Parameters:
  • clone (str) – "CloneX" or "CloneY".

  • volume_L (float) – Reactor working volume. Default 0.250 (ambr®250 max).

  • controls (dict[str, float] | None) – Initial control setpoints.

  • acceleration (float) – Speed multiplier vs. real-time. Default 1 (real-time). Set to 1440 for 1 simulated day per real minute.

  • seed (int) – Random seed for the noise model.

Examples

>>> import asyncio
>>> from perfusio.connectors.ambr250_emulator import Ambr250Emulator
>>> em = Ambr250Emulator(clone="CloneX", seed=0)
>>> sample = asyncio.run(em.read_sample(day=5))
>>> sample["VCD"]
classmethod five_reactor_ensemble(clones=None, seed_start=0)[source]

Instantiate 5 virtual reactors (Gadiyar Fig. 5 set-up).

Parameters:
  • clones (list[str] | None) – List of 5 clone names. Defaults to 3× CloneX + 2× CloneY.

  • seed_start (int) – Seed for the first reactor; each subsequent reactor gets seed+1.

Returns:

Five fully independent virtual reactors.

Return type:

list[Ambr250Emulator]

async is_alive()[source]

Always True for the virtual reactor.

Return type:

bool

async read_sample(day)[source]

Return a noisy sample from the virtual reactor at day.

Parameters:

day (int)

Return type:

dict[str, Any]

simulate_run(n_days=28)[source]

Run a clean (noiseless) N-day simulation and return trajectory.

Parameters:

n_days (int)

Return type:

Any

async write_setpoints(setpoints)[source]

Update control setpoints on the virtual reactor.

Parameters:

setpoints (dict[str, float])

Return type:

None

Metrics

Relative Root Mean Squared Error over a prediction horizon.

Implements Gadiyar et al. (2026) Equations 5–6:

\[\text{rRMSE}_{k,h} = \frac{1}{\sigma_k} \sqrt{ \frac{1}{R} \sum_{r=1}^{R} \left( \hat{y}_{r,h,k} - y_{r,h,k} \right)^2 }\]

where \(h\) is the prediction horizon step, \(k\) is the species index, \(R\) is the number of runs and \(\sigma_k\) is the standard deviation of the measured species computed across all runs and all time steps (the “normalisation scale” of Gadiyar Eq. 6).

A truncated variant is also provided: the first n_tail time steps are excluded from the normalisation to avoid division-by-small-variance artefacts at inoculation.
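Eqs. (5)–(6) translate directly into a few lines of NumPy — a sketch of the formula, not the library’s torch implementation:

```python
import numpy as np

def rrmse(true: np.ndarray, pred: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """rRMSE[h, k]: RMSE across runs, scaled by per-species std (Eqs. 5-6).

    true, pred: shape (n_runs, T, n_species).
    """
    rmse = np.sqrt(np.mean((pred - true) ** 2, axis=0))     # (T, n_species)
    sigma = true.std(axis=(0, 1))    # normalisation scale across runs + time
    return rmse / (sigma + eps)

rng = np.random.default_rng(0)
y = rng.standard_normal((5, 10, 3))
yhat = y + 0.1 * rng.standard_normal((5, 10, 3))
print(rrmse(y, yhat).shape)  # (10, 3)
```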

References

[Gadiyar2026]

Gadiyar et al. (2026), Eqs. (5)–(6).

perfusio.metrics.rrmse.rrmse_horizon(true, pred, horizon=None, n_tail=0, eps=1e-08)[source]

Relative RMSE across runs, time steps and species.

Parameters:
  • true (Tensor) – Ground-truth observations, shape (n_runs, T, n_species).

  • pred (Tensor) – Model predictions (mean), shape (n_runs, T, n_species).

  • horizon (int | None) – If given, only the first horizon time steps are evaluated. This allows head-to-head comparison of 1-step vs. 3-step models.

  • n_tail (int) – Number of leading time steps to exclude from the normalisation denominator (avoids division by near-zero early-run variance).

  • eps (float) – Small constant added to the normalisation denominator for numerical stability.

Returns:

Shape (T, n_species) if horizon is None, else (horizon, n_species). Element [h, k] is the \(\text{rRMSE}_{k,h}\).

Return type:

Tensor

Examples

>>> import torch
>>> from perfusio.metrics import rrmse_horizon
>>> y = torch.randn(5, 10, 9)
>>> yhat = y + 0.1 * torch.randn_like(y)
>>> rr = rrmse_horizon(y, yhat, horizon=3)
>>> rr.shape
torch.Size([3, 9])

Probabilistic forecast calibration metrics.

Provides:

  • pi_coverage(): empirical coverage of prediction intervals.

  • sharpness(): mean width of prediction intervals (lower = sharper).

  • crps(): Continuous Ranked Probability Score (lower = better).

All functions accept batched inputs to facilitate per-species analysis.

References

[Gneiting2007]

Gneiting, T., & Raftery, A. E. (2007). Strictly Proper Scoring Rules, Prediction, and Estimation. JASA 102(477).

perfusio.metrics.coverage.crps(true, samples)[source]

Energy-form Continuous Ranked Probability Score (CRPS).

Uses the energy score decomposition:

\[\text{CRPS} = \mathbb{E}[|X - y|] - \frac{1}{2}\mathbb{E}[|X - X'|]\]

where \(X, X'\) are i.i.d. draws from the predictive distribution.

Parameters:
  • true (Tensor) – Observations, shape (N, n_species).

  • samples (Tensor) – Posterior predictive samples, shape (N, S, n_species) where S is the number of samples.

Returns:

Mean CRPS per species, shape (n_species,).

Return type:

Tensor

Notes

An energy-score estimate is used rather than the exact Gaussian CRPS to support non-Gaussian predictive distributions from the MC rollout.
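The energy-form estimator can be sketched in NumPy; for a degenerate forecast whose samples all equal the observation it evaluates to zero (crps_energy is illustrative, not the library function):

```python
import numpy as np

def crps_energy(true: np.ndarray, samples: np.ndarray) -> np.ndarray:
    """CRPS ~ E|X - y| - 0.5 E|X - X'|, averaged over N, per species.

    true: (N, n_species); samples: (N, S, n_species).
    """
    # E|X - y|: mean absolute deviation of samples from the observation
    term1 = np.mean(np.abs(samples - true[:, None, :]), axis=1)
    # 0.5 E|X - X'|: mean over all S x S pairs of samples
    diff = np.abs(samples[:, :, None, :] - samples[:, None, :, :])
    term2 = 0.5 * diff.mean(axis=(1, 2))
    return (term1 - term2).mean(axis=0)          # (n_species,)

y = np.zeros((4, 2))
perfect = np.zeros((4, 16, 2))    # every sample equals the observation
print(crps_energy(y, perfect))  # [0. 0.]
```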

perfusio.metrics.coverage.pi_coverage(true, q_lo, q_hi)[source]

Empirical coverage of prediction intervals.

Parameters:
  • true (Tensor) – Observed values, shape (..., n_species).

  • q_lo (Tensor) – Lower quantile predictions, same shape as true.

  • q_hi (Tensor) – Upper quantile predictions, same shape as true.

Returns:

Coverage probability per species, shape (n_species,). Value 1.0 = all observations within the PI.

Return type:

Tensor

Examples

>>> import torch
>>> from perfusio.metrics import pi_coverage
>>> y = torch.tensor([[1.0, 2.0], [1.5, 2.5], [0.5, 1.5]])
>>> lo = torch.zeros_like(y)
>>> hi = torch.full_like(y, 3.0)
>>> pi_coverage(y, lo, hi)
tensor([1., 1.])
perfusio.metrics.coverage.sharpness(q_lo, q_hi)[source]

Mean prediction interval width per species.

Parameters:
  • q_lo (Tensor) – Lower quantile, shape (..., n_species).

  • q_hi (Tensor) – Upper quantile, shape (..., n_species).

Returns:

Mean width per species, shape (n_species,).

Return type:

Tensor

Multi-objective performance metrics.

Provides:

  • hypervolume_indicator(): hypervolume dominated by a Pareto front.

  • igd_plus(): Inverted Generational Distance+ (IGD+).

  • epsilon_indicator(): Additive ε-indicator.

These metrics are used to assess the quality of the Pareto front returned by the multi-objective BED loop (qEHVI / qNEHVI / qNParEGO).

References

[Ishibuchi2015]

Ishibuchi, H., et al. (2015). Modified Distance Calculation in Generational Distance and Inverted Generational Distance. EMO.

[Zitzler2010]

Zitzler, E., Thiele, L., & Bader, J. (2010). On Set-Based Multiobjective Optimization. IEEE Transactions on Evolutionary Computation, 14(1).

perfusio.metrics.multiobjective.epsilon_indicator(approx, reference)[source]

Additive ε-indicator.

The smallest ε such that the approximation set ε-dominates the reference set (i.e. every reference point has an approximation point within ε).

\[\varepsilon(A, R) = \max_{r \in R} \min_{a \in A} \max_k (r_k - a_k)\]
Parameters:
  • approx (Tensor) – Approximation Pareto front, shape (M, n_obj).

  • reference (Tensor) – True (or best-known) Pareto front, shape (K, n_obj).

Returns:

ε-indicator (lower is better).

Return type:

float
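The formula is a direct max-min-max over pairwise objective differences; a NumPy transcription under the maximisation convention (eps_indicator is illustrative):

```python
import numpy as np

def eps_indicator(approx: np.ndarray, reference: np.ndarray) -> float:
    """Additive epsilon-indicator (maximisation convention)."""
    # diff[r, a, k] = reference_k - approx_k for every (r, a) pair
    diff = reference[:, None, :] - approx[None, :, :]
    return float(diff.max(axis=2).min(axis=1).max())

front = np.array([[3.0, 1.0], [1.0, 3.0]])
print(eps_indicator(front, front))        # 0.0 — a set epsilon-dominates itself
print(eps_indicator(front - 0.5, front))  # 0.5 — uniform shortfall of 0.5
```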

perfusio.metrics.multiobjective.hypervolume_indicator(Y, ref_point)[source]

Hypervolume dominated by Y w.r.t. ref_point.

Delegates to hypervolume().

Parameters:
  • Y (Tensor) – Objective matrix (higher is better), shape (N, n_objectives).

  • ref_point (Tensor) – Reference point, shape (n_objectives,).

Return type:

float

perfusio.metrics.multiobjective.igd_plus(approx, reference)[source]

Inverted Generational Distance+ (IGD+).

Measures the average modified distance from each reference-set point to the nearest point in the approximation set.

\[\text{IGD+}(A, R) = \frac{1}{|R|} \sum_{r \in R} \min_{a \in A} \left\| \max(r - a, 0) \right\|_2\]
Parameters:
  • approx (Tensor) – Approximation Pareto front, shape (M, n_obj).

  • reference (Tensor) – True (or best-known) Pareto front, shape (K, n_obj).

Returns:

Mean IGD+ (lower is better, 0 = perfect approximation).

Return type:

float
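The IGD+ formula above also vectorises cleanly. An illustrative standalone re-implementation (maximization convention; names are ours, not the library's):

```python
import torch

def igd_plus_ref(approx: torch.Tensor, reference: torch.Tensor) -> float:
    # max(r - a, 0) per objective, shape (K, M, n_obj)
    shortfall = (reference.unsqueeze(1) - approx.unsqueeze(0)).clamp(min=0.0)
    dists = shortfall.norm(dim=-1)                 # Euclidean norm, (K, M)
    return float(dists.min(dim=-1).values.mean())  # min over A, mean over R

A = torch.tensor([[1.0, 0.0], [0.0, 1.0]])
R = torch.tensor([[1.0, 0.0], [0.0, 1.0]])
igd_plus_ref(A, R)  # 0.0 — the fronts coincide
```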

Visualisation

Static Matplotlib reproductions of Gadiyar et al. (2026) Figures 4, 6, 7, 8.

Each function returns a matplotlib.figure.Figure and includes an alt_text parameter to generate an accessibility string for the figure.

Figure map

  • Fig. 4: Training experiment trajectories (Box-Behnken 24 runs, Clone X + Y).

  • Fig. 6: 3-step-ahead hybrid model predictions vs. observations (all species).

  • Fig. 7: Multi-objective Pareto front (titer vs. VCV at day 14).

  • Fig. 8: Closed-loop control performance — VCD, Glc, Titer over 28 days.

References

[Gadiyar2026]

Gadiyar et al. (2026), Figures 4, 6, 7, 8.

perfusio.viz.static.fig4_training_trajectories(runs, species='VCD', clone_labels=None, alt_text=False)[source]

Reproduce Gadiyar Fig. 4 — training experiment trajectories.

Parameters:
  • runs (list[dict[str, Any]]) – List of run dicts as returned by generate_box_behnken_experiment().

  • species (str) – Species to plot on the y-axis. Default "VCD".

  • clone_labels (list[str] | None) – Optional run-level labels for the legend.

  • alt_text (bool) – If True, prints an accessibility alt-text description.

Return type:

matplotlib.figure.Figure

perfusio.viz.static.fig6_model_predictions(true_traj, pred_mean, pred_q10, pred_q90, species_names, horizon=3, alt_text=False)[source]

Reproduce Gadiyar Fig. 6 — 3-step hybrid model predictions.

Parameters:
  • true_traj (Any) – Tensor/array, shape (T, n_species).

  • pred_mean (Any) – Predictions, shape (T, n_species).

  • pred_q10 (Any) – Predictions, shape (T, n_species).

  • pred_q90 (Any) – Predictions, shape (T, n_species).

  • species_names (list[str]) – List of species labels.

  • horizon (int) – Prediction horizon (drawn as shaded region).

  • alt_text (bool) – If True, prints an accessibility alt-text description.

Return type:

matplotlib.figure.Figure

perfusio.viz.static.fig7_pareto_front(pareto_titer, pareto_vcv, feasible_titer=None, feasible_vcv=None, alt_text=False)[source]

Reproduce Gadiyar Fig. 7 — Pareto front: titer vs. VCV.

Parameters:
  • pareto_titer (Any) – Titer values on the Pareto front.

  • pareto_vcv (Any) – VCV (or VCD) values on the Pareto front.

  • feasible_titer (Any | None) – All feasible evaluated points (background scatter).

  • feasible_vcv (Any | None) – All feasible evaluated points (background scatter).

  • alt_text (bool) – If True, prints an accessibility alt-text description.

Return type:

matplotlib.figure.Figure

perfusio.viz.static.fig8_closed_loop_performance(days, vcd, vcd_target, glc, glc_target, titer, titer_target, alt_text=False)[source]

Reproduce Gadiyar Fig. 8 — closed-loop control performance.

Parameters:
  • days (Any) – Culture days (x-axis).

  • vcd (Any) – Measured trajectories.

  • glc (Any) – Measured trajectories.

  • titer (Any) – Measured trajectories.

  • vcd_target (float) – Setpoint targets drawn as horizontal dashed lines.

  • glc_target (float) – Setpoint targets drawn as horizontal dashed lines.

  • titer_target (float) – Setpoint targets drawn as horizontal dashed lines.

  • alt_text (bool) – If True, prints an accessibility alt-text description.

Return type:

matplotlib.figure.Figure

Plotly interactive figures for perfusio.

Functions

perfusio.viz.interactive.acquisition_surface(x1, x2, acq_values, x1_label='Perfusion rate (vvd)', x2_label='Bleed rate (vvd)', title='Acquisition function surface')[source]

2-D acquisition function heat-map as an interactive contour plot.

Parameters:
  • x1 (ndarray) – 1-D arrays of grid values (will be meshed internally).

  • x2 (ndarray) – 1-D arrays of grid values (will be meshed internally).

  • acq_values (ndarray) – 2-D array of acquisition values, shape (len(x1), len(x2)).

  • x1_label (str) – Axis labels.

  • x2_label (str) – Axis labels.

  • title (str) – Figure title.

Return type:

plotly.graph_objects.Figure

perfusio.viz.interactive.forecast_figure(days, mean, q10, q90, observed_days=None, observed_values=None, species='VCD', title='3-step forecast')[source]

Plotly figure with GP mean and 80% prediction interval ribbon.

Parameters:
  • days (list[int] | ndarray) – Forecast horizon days.

  • mean (ndarray) – Model predictions, 1-D arrays.

  • q10 (ndarray) – Model predictions, 1-D arrays.

  • q90 (ndarray) – Model predictions, 1-D arrays.

  • observed_days (list[int] | ndarray | None) – Optional measured data to overlay.

  • observed_values (ndarray | None) – Optional measured data to overlay.

  • species (str) – Species label.

  • title (str) – Figure title.

Return type:

plotly.graph_objects.Figure

perfusio.viz.interactive.pareto_scatter(titer, vcv, is_pareto=None, hover_text=None, title='Pareto front Titer vs. VCV')[source]

Interactive Pareto front scatter plot.

Parameters:
  • titer (ndarray) – Titer values for all evaluated points.

  • vcv (ndarray) – VCV (viable cell volume) values.

  • is_pareto (ndarray | None) – Boolean mask; Pareto-optimal points are highlighted.

  • hover_text (list[str] | None) – Optional hover labels per point.

  • title (str) – Figure title.

Return type:

plotly.graph_objects.Figure
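When is_pareto is not precomputed, a mask can be derived from the raw objective values. A small sketch (a hypothetical helper, not part of the perfusio API; both objectives maximized):

```python
import numpy as np

def pareto_mask(titer: np.ndarray, vcv: np.ndarray) -> np.ndarray:
    pts = np.column_stack([titer, vcv])
    mask = np.ones(len(pts), dtype=bool)
    for i, p in enumerate(pts):
        others = np.delete(pts, i, axis=0)
        # dominated: some other point is >= in all objectives and > in at least one
        mask[i] = not np.any(np.all(others >= p, axis=1)
                             & np.any(others > p, axis=1))
    return mask

pareto_mask(np.array([1.0, 2.0, 3.0]), np.array([3.0, 1.0, 2.0]))
# array([ True, False,  True]) — (2, 1) is dominated by (3, 2)
```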

perfusio.viz.interactive.trajectory_figure(runs, species='VCD', title='Perfusion run trajectories')[source]

Interactive Plotly line chart of multiple run trajectories.

Parameters:
  • runs (list[dict[str, Any]]) – List of run dicts with keys run_id and trajectory (ndarray).

  • species (str) – Species to display.

  • title (str) – Figure title.

Return type:

plotly.graph_objects.Figure

Chemistry Utilities

Discrete mass-balance implementation for perfusion bioreactors.

Implements Equations 2–4 of Gadiyar et al. (2026) in vectorised, batch-aware PyTorch code. The fundamental relation is:

\[\frac{\mathrm{d}(V \cdot c_k)}{\mathrm{d}t} = u_{f,k} - u_{b,k} - u_{p,k} + R_k \cdot V\]

where

  • \(c_k\) is the concentration of species \(k\) [g L⁻¹ or mmol L⁻¹],

  • \(V\) is the working volume [L],

  • \(u_{f,k}\) is the molar/mass feed rate [mmol h⁻¹ or g h⁻¹],

  • \(u_{b,k}\) is the bleed removal rate,

  • \(u_{p,k}\) is the permeate (harvest) removal rate,

  • \(R_k\) is the volumetric net production rate [mmol L⁻¹ h⁻¹ or g L⁻¹ h⁻¹].

In the discrete daily step form (Eq. 3 of the paper, rearranged):

\[R_k(s_i) = \frac{c_{i+1,k} - c_{i,k}}{\Delta t} - \frac{1}{V_i} \left[ u_{f,k} - u_{b,k} - u_{p,k} - c_{i,k} \frac{\Delta V}{\Delta t} \right]\]

This module computes \(R_k\) analytically from observed trajectories (for GP training) and also integrates \(c_{i+1}\) forward given \(R_k\) (for forecast rollout).

References

[Gadiyar2026]

Gadiyar et al. (2026), Equations 2–4.

class perfusio.chemistry.balances.DiscreteMassBalance(species, dt_hours=24.0)[source]

Bases: object

Vectorised implementation of Gadiyar et al. (2026), Eq. 3.

For each species \(k\):

\[R_k(s_i) = \frac{c_{i+1,k} - c_{i,k}}{\Delta t} - \frac{1}{V_i} \left[ u_{f,k} - u_{b,k} - u_{p,k} - c_{i,k} \frac{\Delta V}{\Delta t} \right]\]
Parameters:
  • species (Sequence[Species]) – Ordered sequence of Species objects whose concentrations are tracked by the balance.

  • dt_hours (float) – Discrete time step in hours. Default is 24 h (daily sampling).

Notes

Volume convention: For ambr®250 perfusion reactors in constant-volume mode, \(\Delta V / \Delta t = 0\) because the harvest rate equals perfusion_rate - bleed_rate. The term is included here for generality but will be zero in most practical use cases.

Units: All concentrations must be in the same base unit throughout a given trajectory (e.g. all in mmol L⁻¹). The balances do not convert units. VCD is conventionally in 10⁶ cells mL⁻¹; the rate \(R_\text{VCD}\) will therefore be in 10⁶ cells mL⁻¹ h⁻¹.

Examples

Round-trip test: rates_from_observations then step should reconstruct the original trajectory within floating-point tolerance.

>>> import torch
>>> from perfusio.chemistry.species import SpeciesRegistry
>>> reg = SpeciesRegistry.DEFAULT
>>> species = [reg["VCD"], reg["Glc"]]
>>> bal = DiscreteMassBalance(species, dt_hours=24.0)
>>> # Construct a noiseless 5-day trajectory
>>> T, K = 5, 2
>>> c = torch.ones(T, K, dtype=torch.float64)
>>> u_f = torch.zeros(T - 1, K, dtype=torch.float64)
>>> u_b = torch.zeros(T - 1, K, dtype=torch.float64)
>>> u_p = torch.zeros(T - 1, K, dtype=torch.float64)
>>> V = torch.full((T,), 0.250, dtype=torch.float64)
>>> R = bal.rates_from_observations(c, V, u_f, u_b, u_p)
>>> c_recon = bal.step(c[0], R[0], V[0], u_f[0], u_b[0], u_p[0], 24.0)
>>> torch.allclose(c_recon, c[1], atol=1e-10)
True
rates_from_observations(c, V, u_f, u_b, u_p)[source]

Compute net volumetric rates \(R_k(s_i)\) from observations.

This implements Eq. 3 of Gadiyar et al. (2026) (rearranged to solve for \(R\)), vectorised over all species and all time steps.

Parameters:
  • c (Tensor) – Concentration tensor, shape (T, n_species) [same unit as the corresponding Species].

  • V (Tensor) – Working volume tensor, shape (T,) [L].

  • u_f (Tensor) – Feed rate tensor, shape (T-1, n_species) [concentration-unit × L h⁻¹ = amount h⁻¹]. Set to zero for species with no direct feed stream.

  • u_b (Tensor) – Bleed removal rate, shape (T-1, n_species).

  • u_p (Tensor) – Permeate (harvest) removal rate, shape (T-1, n_species).

Returns:

Shape (T-1, n_species) — one rate vector per time interval. Units are [concentration-unit h⁻¹].

Return type:

Tensor

Raises:

ValueError – If tensor shapes are inconsistent.

Notes

The derivation follows directly from Eq. 3 of the paper. Rearranging the discrete balance:

\[R_k = \frac{c_{i+1,k} - c_{i,k}}{\Delta t} - \frac{u_{f,k} - u_{b,k} - u_{p,k}}{V_i} + c_{i,k} \frac{V_{i+1} - V_i}{V_i \Delta t}\]

The last term vanishes in constant-volume mode.
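The rearranged balance vectorises naturally over time steps and species. A standalone sketch of the computation (our own helper, mirroring the formula above, not the library method):

```python
import torch

def rates_sketch(c, V, u_f, u_b, u_p, dt=24.0):
    # R = dc/dt - (u_f - u_b - u_p)/V_i + c_i * dV / (V_i * dt)
    V_i = V[:-1].unsqueeze(-1)            # (T-1, 1), broadcasts over species
    dc_dt = (c[1:] - c[:-1]) / dt
    flows = (u_f - u_b - u_p) / V_i
    dV = (V[1:] - V[:-1]).unsqueeze(-1)
    return dc_dt - flows + c[:-1] * dV / (V_i * dt)

# Constant concentrations, constant volume, zero flows -> zero rates.
c = torch.ones(5, 2, dtype=torch.float64)
V = torch.full((5,), 0.250, dtype=torch.float64)
z = torch.zeros(4, 2, dtype=torch.float64)
rates_sketch(c, V, z, z, z)  # all zeros, shape (4, 2)
```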

rates_from_trajectory(traj, species_indices=None)[source]

Convenience wrapper: extract rates from a Trajectory.

Parameters:
  • traj (Trajectory) – A full run trajectory.

  • species_indices (list[int] | None) – Column indices to extract from traj.species. If None, all columns are used.

Returns:

Shape (T-1, n_selected_species).

Return type:

Tensor

property species: list[Species]

Ordered list of tracked species.

step(c_t, R_t, V_t, u_f, u_b, u_p, dt=None)[source]

Integrate one discrete step forward (Eq. 2 of Gadiyar et al. 2026).

Computes \(c_{t+1}\) from \(c_t\), \(R_t\), and the mass-flow terms.

\[c_{t+1,k} = c_{t,k} + \Delta t \left[ R_k + \frac{u_{f,k} - u_{b,k} - u_{p,k}}{V_t} - c_{t,k} \frac{\Delta V}{V_t \Delta t} \right]\]
Parameters:
  • c_t (Tensor) – Concentration at time t, shape (n_species,) or (B, n_species) for a batch.

  • R_t (Tensor) – Net volumetric rate at time t, same shape as c_t.

  • V_t (Tensor) – Working volume at time t [L], scalar or shape (B,).

  • u_f (Tensor) – Feed rate at time t, same shape as c_t.

  • u_b (Tensor) – Bleed rate at time t, same shape as c_t.

  • u_p (Tensor) – Permeate rate at time t, same shape as c_t.

  • dt (float | None) – Override for the time step [h]. Defaults to dt_hours.

Returns:

\(c_{t+1}\), same shape as c_t.

Return type:

Tensor

Notes

In constant-volume mode the volume change ΔV is zero, so the term c_t * ΔV / (V_t * Δt) vanishes and the update reduces to c_{t+1} = c_t + Δt * [R_t + (u_f - u_b - u_p) / V_t].

Examples

>>> import torch
>>> from perfusio.chemistry.species import SpeciesRegistry
>>> reg = SpeciesRegistry.DEFAULT
>>> bal = DiscreteMassBalance([reg["VCD"], reg["Glc"]], dt_hours=24.0)
>>> c = torch.tensor([10.0, 3.0], dtype=torch.float64)
>>> R = torch.tensor([0.5, -0.1], dtype=torch.float64)
>>> V = torch.tensor(0.250, dtype=torch.float64)
>>> u = torch.zeros(2, dtype=torch.float64)
>>> c_next = bal.step(c, R, V, u, u, u)
>>> c_next.shape
torch.Size([2])
class perfusio.chemistry.balances.FlowRateCalculator(volume_L=0.25, dt_hours=24.0)[source]

Bases: object

Compute molar/mass flow rates for feed, bleed, and permeate streams.

Translates volumetric flow settings (in vessel volumes per day, vvd) into the per-species flow rates needed by DiscreteMassBalance.

Parameters:
  • volume_L (float) – Nominal reactor working volume [L]. For ambr®250, 0.250 L.

  • dt_hours (float) – Discrete time step [h]. Default 24.0 (daily).

Notes

In constant-volume perfusion mode:

\[u_{b,k} = \text{bleed\_rate} \cdot V \cdot c_k / \Delta t\]
\[u_{p,k} = (\text{perf\_rate} - \text{bleed\_rate}) \cdot V \cdot c_k \cdot \sigma_k / \Delta t\]

where \(\sigma_k \in [0,1]\) is the sieving coefficient for species \(k\) through the alternating tangential flow (ATF) filter. For cells, \(\sigma_{\text{cell}} \approx 0\) (perfect retention). For metabolites, \(\sigma \approx 1\) (free passage).

compute_flows(c, perfusion_rate_vvd, bleed_rate_vvd, c_feed=None, sieving_coeffs=None)[source]

Compute (u_f, u_b, u_p) for a single time step.

Parameters:
  • c (Tensor) – Current concentrations, shape (K,) [concentration units].

  • perfusion_rate_vvd (float | Tensor) – Perfusion rate [vessel volumes per day].

  • bleed_rate_vvd (float | Tensor) – Bleed rate [vvd].

  • c_feed (Tensor | None) – Feed stream concentrations, shape (K,). None implies zero.

  • sieving_coeffs (Tensor | None) – Per-species sieving coefficient through ATF/TFF filter, shape (K,). 0 = fully retained (cells), 1 = freely passing (metabolites). Default: 1.0 for all (no retention).

Returns:

(u_f, u_b, u_p) each shape (K,) in [concentration-unit × L / h].

Return type:

tuple[Tensor, Tensor, Tensor]

Notes

A harvest rate \(\text{harvest} = \text{perf} - \text{bleed}\) is applied only to freely passing species (\(\sigma_k = 1\)). Cell-retained species (\(\sigma_k = 0\)) leave only via the bleed stream.
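Under these conventions, the vvd-to-flow translation can be sketched as follows. This is a standalone illustration, not the library method; we weight the permeate term by the sieving coefficient so that σ = 0 (cells) gives zero permeate loss, matching the sieving_coeffs description above:

```python
import torch

def flows_sketch(c, perf_vvd, bleed_vvd, V=0.250, dt=24.0,
                 c_feed=None, sigma=None):
    # Translate vvd settings into per-species flows [concentration-unit * L/h].
    sigma = torch.ones_like(c) if sigma is None else sigma
    c_feed = torch.zeros_like(c) if c_feed is None else c_feed
    u_f = perf_vvd * V * c_feed / dt                    # feed stream
    u_b = bleed_vvd * V * c / dt                        # bleed removes whole broth
    u_p = (perf_vvd - bleed_vvd) * V * c * sigma / dt   # permeate, sieving-weighted
    return u_f, u_b, u_p

c = torch.tensor([20.0, 3.0])      # e.g. VCD, Glc
sigma = torch.tensor([0.0, 1.0])   # cells retained, glucose passes freely
u_f, u_b, u_p = flows_sketch(c, perf_vvd=1.0, bleed_vvd=0.15, sigma=sigma)
u_p[0]  # 0.0 — retained cells never leave via the permeate
```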

Species registry for perfusion bioreactor models.

This module defines the canonical set of process variables tracked by perfusio. All modules reference species by their canonical short name (e.g. "VCD") to ensure consistent ordering in tensors.

Biological context

The species list covers all variables measured in a typical ambr®250 perfusion run using a Cedex Bio HT or Nova BioProfile FLEX2 at-line analyser, plus diameter measured by a Vi-Cell or Cedex HiRes.

References

[Gadiyar2026]

Gadiyar et al. (2026), §2.1 — “Process variables” table.

Notes

DCD (dead cell density) and Lys (lysed cell density) are derived from VCD, total cell count, and viability; they are included as species for GP modelling but are not independent degrees of freedom.

class perfusio.chemistry.species.Species(short_name, full_name, unit, lo, hi, is_measured=True, is_control=False)[source]

Bases: object

Metadata for a single bioreactor species.

Parameters:
  • short_name (str) – Canonical identifier used in tensor column names (e.g. "VCD").

  • full_name (str) – Human-readable name for axis labels.

  • unit (str) – Physical unit string for display (e.g. "10⁶ cells mL⁻¹").

  • lo (float) – Physically plausible lower bound.

  • hi (float) – Physically plausible upper bound.

  • is_measured (bool) – Whether this species is directly measured (True) or derived (False).

  • is_control (bool) – Whether this is a state variable that can be indirectly controlled via a feed (e.g. Glc via glucose setpoint).

Examples

>>> sp = Species(
...     short_name="VCD",
...     full_name="Viable Cell Density",
...     unit="10⁶ cells mL⁻¹",
...     lo=0.0,
...     hi=120.0,
... )
>>> sp.short_name
'VCD'
full_name: str
hi: float
is_control: bool = False
is_measured: bool = True
lo: float
short_name: str
unit: str
class perfusio.chemistry.species.SpeciesEnum(*values)[source]

Bases: str, Enum

Enumeration of all canonical species short names.

AMM = 'Amm'
DCD = 'DCD'
DIAM = 'Diam'
GLC = 'Glc'
GLN = 'Gln'
GLU = 'Glu'
LAC = 'Lac'
LYS = 'Lys'
PYR = 'Pyr'
TITER = 'Titer'
VCD = 'VCD'
VCV = 'VCV'
VIA = 'Via'
class perfusio.chemistry.species.SpeciesRegistry(species)[source]

Bases: object

Singleton registry of all process variables.

This is the single source of truth for the canonical species list used throughout perfusio. Obtain the default registry via SpeciesRegistry.DEFAULT.

Parameters:

species (list[Species])

DEFAULT

The library-wide default registry matching the paper’s species table.

Type:

ClassVar[perfusio.chemistry.species.SpeciesRegistry]

Examples

>>> reg = SpeciesRegistry.DEFAULT
>>> reg.names
['VCD', 'VCV', 'Via', 'Diam', 'Glc', 'Gln', 'Glu', 'Lac', 'Amm', 'Pyr', 'Titer', 'DCD', 'Lys']
>>> reg["VCD"].unit
'10⁶ cells mL⁻¹'
DEFAULT: ClassVar[SpeciesRegistry] = <perfusio.chemistry.species.SpeciesRegistry object>
property derived: list[Species]

Species derived from measured values.

index(name)[source]

Return the 0-based column index for name.

Parameters:

name (str) – Short name of the species.

Return type:

int

Raises:

KeyError – If name is not in this registry.

property measured: list[Species]

Species that are directly measured at-line.

property names: list[str]

Ordered list of all short names.

subset(names)[source]

Return a sub-registry containing only the listed species.

Parameters:

names (list[str]) – Ordered list of short names to include.

Return type:

SpeciesRegistry

Raises:

KeyError – If any requested name is not in this registry.

Perfusion volume bookkeeping for ambr®250-style constant-volume operation.

In perfusion culture, the reactor working volume is kept approximately constant by balancing the perfusion (feed) rate against the sum of the bleed rate and the harvest (permeate) rate:

\[\frac{\mathrm{d}V}{\mathrm{d}t} = F_{\text{feed}} - F_{\text{bleed}} - F_{\text{harvest}} \approx 0\]

Hence:

\[F_{\text{harvest}} = F_{\text{feed}} - F_{\text{bleed}}\]

When a VCV setpoint is exceeded, the bleed rate is transiently increased to remove excess biomass (and cells that are retained by the ATF/TFF filter).

References

[Gadiyar2026]

Gadiyar et al. (2026), §2 (“Perfusion process description”).

perfusio.chemistry.volumes.bleed_trigger(vcv_current, vcv_setpoint, bleed_rate_base_vvd, bleed_rate_max_vvd, k_bleed=0.5)[source]

Compute a VCV-controlled bleed rate (proportional controller).

When vcv_current > vcv_setpoint, the bleed rate is increased proportionally to the deviation. This mimics the simple feedback controller used in ambr®250 automated perfusion protocols.

\[F_{\text{bleed}} = F_{\text{base}} + k_{\text{bleed}} \cdot \max(\text{VCV} - \text{VCV}_{\text{sp}}, 0)\]
Parameters:
  • vcv_current (float | Tensor) – Current volumetric cell volume [%].

  • vcv_setpoint (float) – Target VCV [%].

  • bleed_rate_base_vvd (float) – Base bleed rate [vvd] applied even at setpoint.

  • bleed_rate_max_vvd (float) – Maximum allowable bleed rate [vvd].

  • k_bleed (float) – Proportional gain [vvd per % VCV deviation].

Returns:

Bleed rate [vvd], clamped to [bleed_rate_base_vvd, bleed_rate_max_vvd].

Return type:

float

Examples

>>> round(bleed_trigger(35.0, 30.0, 0.10, 0.40, k_bleed=0.05), 4)
0.35
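The controller is essentially a one-liner; a standalone sketch that reproduces the doctest above (our own helper, not the library function):

```python
def bleed_trigger_sketch(vcv, vcv_sp, base_vvd, max_vvd, k_bleed=0.5):
    # Proportional increase above setpoint, clamped to [base_vvd, max_vvd].
    raw = base_vvd + k_bleed * max(vcv - vcv_sp, 0.0)
    return min(max(raw, base_vvd), max_vvd)

bleed_trigger_sketch(35.0, 30.0, 0.10, 0.40, k_bleed=0.05)  # 0.35
bleed_trigger_sketch(50.0, 30.0, 0.10, 0.40, k_bleed=0.05)  # 0.40 (clamped at max)
```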
perfusio.chemistry.volumes.constant_volume_harvest_rate(perfusion_rate_L_per_h, bleed_rate_L_per_h)[source]

Return the harvest rate that maintains constant volume.

\[F_{\text{harvest}} = F_{\text{feed}} - F_{\text{bleed}}\]
Parameters:
  • perfusion_rate_L_per_h (float | Tensor) – Volumetric perfusion rate [L h⁻¹].

  • bleed_rate_L_per_h (float | Tensor) – Volumetric bleed rate [L h⁻¹].

Returns:

Scalar — harvest rate [L h⁻¹].

Return type:

Tensor

Raises:

ValueError – If bleed_rate > perfusion_rate (physically impossible steady state).

Examples

>>> import torch
>>> F_h = constant_volume_harvest_rate(0.250/24, 0.250*0.15/24)
>>> round(float(F_h / (0.250 / 24)), 4)
0.85
perfusio.chemistry.volumes.perfusion_volume_step(V_t, F_feed_L_per_h, F_bleed_L_per_h, F_harvest_L_per_h, dt_hours=24.0)[source]

Compute the reactor volume at time t+1 from volumetric flow rates.

For constant-volume operation, the caller should ensure that F_feed = F_bleed + F_harvest, in which case this function returns a volume very close to \(V_t\).

Parameters:
  • V_t (Tensor | float) – Working volume at time t [L].

  • F_feed_L_per_h (float | Tensor) – Volumetric perfusion (feed) rate [L h⁻¹].

  • F_bleed_L_per_h (float | Tensor) – Volumetric bleed rate [L h⁻¹].

  • F_harvest_L_per_h (float | Tensor) – Volumetric permeate (harvest) rate [L h⁻¹].

  • dt_hours (float) – Time step [h]. Default 24.0.

Returns:

Scalar tensor with the volume at t+1 [L].

Return type:

Tensor

Notes

\(V_{t+1} = V_t + \Delta t (F_{\text{feed}} - F_{\text{bleed}} - F_{\text{harvest}})\)

If the result is negative (physically impossible), a RuntimeError is raised. This should never occur under correct operating conditions.

Examples

>>> import torch
>>> V_next = perfusion_volume_step(0.250, 0.250/24, 0.250*0.15/24, 0.250*0.85/24)
>>> abs(float(V_next) - 0.250) < 1e-9
True

Configuration & Types

Pydantic-v2 configuration models for perfusio.

This module provides validated, strongly-typed configuration objects used throughout the library. All physical bounds are stored here so that they serve as a single source of truth for the design space.

Classes

SpeciesBounds

Lower and upper measurement bounds for a single process variable.

DesignSpace

Full description of the operating envelope for a perfusion experiment, including control variable ranges and target criteria.

RunConfig

Runtime configuration (device, dtype, seeds, logging level).

AlarmConfig

Thresholds that trigger predictive constraint-violation notifications.

class perfusio.config.AlarmConfig(*, viability_warning=97.0, viability_critical=95.0, ammonium_warning=8.0, ammonium_critical=12.0, glucose_warning=1.0, glucose_critical=0.3, notification_channel='log', alarm_file=None)[source]

Bases: BaseModel

Thresholds that trigger predictive constraint-violation notifications.

Parameters:
  • viability_warning (float) – Predicted viability [%] below which a WARNING alarm is raised.

  • viability_critical (float) – Predicted viability [%] below which a CRITICAL alarm is raised.

  • ammonium_warning (float) – Predicted ammonium [mmol L⁻¹] above which a WARNING is raised.

  • ammonium_critical (float) – Critical ammonium threshold.

  • glucose_warning (float) – Predicted glucose [g L⁻¹] below which a WARNING is raised (impending starvation).

  • glucose_critical (float) – Critical glucose starvation threshold.

  • notification_channel (Literal['log', 'file', 'email', 'slack']) – Where to send alarms. Supported: "log" (structlog), "file". "email" and "slack" require environment variable configuration.

  • alarm_file (Path | None) – Path to append alarm records when notification_channel == "file".

Notes

This class deliberately does not support email or Slack credentials inline. Provide them via environment variables PERFUSIO_SMTP_* or PERFUSIO_SLACK_WEBHOOK_URL, respectively.
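As an illustration of how such warning/critical thresholds are typically evaluated (a hypothetical helper, not part of the AlarmConfig API):

```python
def alarm_level(value, warning, critical, direction="below"):
    # direction="below": low values are bad (viability, glucose);
    # direction="above": high values are bad (ammonium).
    breached_critical = value < critical if direction == "below" else value > critical
    breached_warning = value < warning if direction == "below" else value > warning
    if breached_critical:
        return "CRITICAL"
    if breached_warning:
        return "WARNING"
    return "OK"

alarm_level(96.0, warning=97.0, critical=95.0)                    # 'WARNING'
alarm_level(13.0, warning=8.0, critical=12.0, direction="above")  # 'CRITICAL'
```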

alarm_file: Path | None
ammonium_critical: float
ammonium_warning: float
glucose_critical: float
glucose_warning: float
model_config = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

notification_channel: Literal['log', 'file', 'email', 'slack']
viability_critical: float
viability_warning: float
class perfusio.config.ControlBounds(*, lo, hi, unit, description='', hard_lower=True, hard_upper=True)[source]

Bases: BaseModel

Bounds on a single manipulated (control) variable.

Parameters:
  • lo (float) – Minimum allowable setpoint.

  • hi (float) – Maximum allowable setpoint.

  • unit (str) – Physical unit of the control variable.

  • description (str) – Human-readable name.

  • hard_lower (bool) – If True, the optimizer will never suggest values below lo. Used to encode cell-death constraints (e.g. stir < 700 rpm kills cells).

  • hard_upper (bool) – Analogous upper hard constraint.

Notes

Hard constraints are encoded as bounds passed to botorch.optim.optimize_acqf(), not as penalty terms in the acquisition function. Penalty-based approaches violate the BoTorch API and can produce numerical artefacts.

description: str
hard_lower: bool
hard_upper: bool
hi: float
lo: float
model_config = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

unit: str
class perfusio.config.DesignSpace(*, name, control_bounds=<factory>, species_bounds=<factory>, viability_min=95.0, vcv_target=None, titer_target=None)[source]

Bases: BaseModel

Full description of the operating envelope for a perfusion experiment.

This object encodes both the measurable state bounds and the control variable ranges that are passed to the Bayesian Experimental Design optimizer. It is the single source of truth for all optimize_acqf calls.

Parameters:
  • name (str) – Experiment name (used in audit logs and figure titles).

  • control_bounds (dict[str, ControlBounds]) – Mapping of control variable name → ControlBounds.

  • species_bounds (dict[str, SpeciesBounds]) – Mapping of species name → SpeciesBounds.

  • viability_min (float) – Minimum acceptable viability (%) for feasibility constraints. Default 95.0 matches the paper’s constraint.

  • vcv_target (float | None) – Target volumetric cell volume (%) for the single-objective VCV-tracking use case (Gadiyar et al. 2026, §3.5).

  • titer_target (float | None) – Optional target product concentration [mg L⁻¹].

Examples

>>> from perfusio.config import DesignSpace, ControlBounds, SpeciesBounds
>>> ds = DesignSpace(
...     name="ambr250_run_01",
...     control_bounds={
...         "perfusion_rate": ControlBounds(lo=0.5, hi=2.0, unit="vvd"),
...         "bleed_rate": ControlBounds(lo=0.05, hi=0.30, unit="vvd"),
...     },
...     species_bounds={
...         "VCD": SpeciesBounds(lo=0.0, hi=120.0, unit="1e6 cells/mL",
...                              description="Viable Cell Density"),
...     },
... )
property bounds_tensor: tuple[Tensor, Tensor]

Return (lower, upper) tensors of shape (n_controls,) for BoTorch.

Returns:

lower and upper bound tensors, dtype float64.

Return type:

tuple[Tensor, Tensor]
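For orientation, BoTorch's optimize_acqf expects the bounds stacked as a (2, d) tensor. Assembling that from per-control (lo, hi) pairs might look like this (an illustrative sketch using plain tuples rather than ControlBounds objects):

```python
import torch

# Hypothetical control bounds in insertion order, as (lo, hi) tuples.
control_bounds = {"perfusion_rate": (0.5, 2.0), "bleed_rate": (0.05, 0.30)}
lower = torch.tensor([lo for lo, _ in control_bounds.values()], dtype=torch.float64)
upper = torch.tensor([hi for _, hi in control_bounds.values()], dtype=torch.float64)
bounds = torch.stack([lower, upper])  # shape (2, n_controls), as optimize_acqf expects
```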

control_bounds: dict[str, ControlBounds]
property control_names: list[str]

Ordered list of control variable names.

model_config = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

property n_controls: int

Number of control variables.

name: str
species_bounds: dict[str, SpeciesBounds]
property species_names: list[str]

Ordered list of species names.

titer_target: float | None
vcv_target: float | None
viability_min: float
class perfusio.config.RunConfig(*, seed=None, device='cpu', dtype='float64', n_mc_samples=256, n_restarts=10, raw_samples=512, log_level='INFO', allow_write=False)[source]

Bases: BaseModel

Runtime configuration for perfusio computations.

Parameters:
  • seed (int | None) – Global random seed. None for non-deterministic execution.

  • device (str) – Torch device string ("cpu", "cuda:0", …). Defaults to "cpu" to ensure CI compatibility without GPUs.

  • dtype (Literal['float32', 'float64']) – Torch dtype. Defaults to torch.float64 (required for numerical stability of GP marginal log-likelihood optimisation — never downcast).

  • n_mc_samples (int) – Number of Monte Carlo samples for uncertainty propagation in rollout. Default 256 (adequate for 80% PI; increase to 1024 for publication).

  • n_restarts (int) – Number of random restarts for acquisition function optimisation. Default 10.

  • raw_samples (int) – Initial Sobol samples for warm-starting acquisition optimisation. Default 512.

  • log_level (Literal['DEBUG', 'INFO', 'WARNING', 'ERROR']) – structlog log level. One of "DEBUG", "INFO", "WARNING", "ERROR".

  • allow_write (bool) – If False (default), the OPC UA connector operates in read-only mode and will raise PermissionError on any write_setpoints call. Must be explicitly set to True to enable hardware writes.

Notes

dtype = torch.float64 is a hard requirement for the GP marginal log-likelihood optimisation. The LBFGS optimiser with strong-Wolfe line search becomes numerically unstable in float32 for datasets with more than ~100 training points.

allow_write: bool
device: str
dtype: Literal['float32', 'float64']
log_level: Literal['DEBUG', 'INFO', 'WARNING', 'ERROR']
make_generator()[source]

Create a seeded torch.Generator, or None if seed is None.

Return type:

torch.Generator or None
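A standalone sketch of the documented semantics (seeded generator, or None for non-deterministic execution; our own helper, not the library method):

```python
import torch

def make_generator_sketch(seed):
    # None seed -> non-deterministic execution (no generator).
    if seed is None:
        return None
    return torch.Generator(device="cpu").manual_seed(seed)

g1, g2 = make_generator_sketch(42), make_generator_sketch(42)
torch.equal(torch.rand(3, generator=g1), torch.rand(3, generator=g2))  # True
```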

model_config = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

n_mc_samples: int
n_restarts: int
raw_samples: int
seed: int | None
property torch_device: device

Parsed torch.device.

property torch_dtype: dtype

Parsed torch.dtype.

class perfusio.config.SpeciesBounds(*, lo, hi, unit, description='')[source]

Bases: BaseModel

Lower and upper physical bounds for a single measured species.

Parameters:
  • lo (float) – Minimum physically plausible value (e.g. 0.0 for VCD).

  • hi (float) – Maximum physically plausible value.

  • unit (str) – SI or derived unit string, e.g. "10^6 cells mL^-1".

  • description (str) – Human-readable name, used in axis labels.

Examples

>>> b = SpeciesBounds(lo=0.0, hi=100.0, unit="10^6 cells mL^-1",
...                   description="Viable Cell Density")
>>> b.lo
0.0
description: str
hi: float
lo: float
model_config = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

unit: str

State, StateBatch, and Trajectory data structures for perfusio.

These are the primary data containers that flow through the entire library. They are deliberately plain dataclasses (not Pydantic models) so that PyTorch operations can be applied directly without serialisation overhead.

All tensors are stored as torch.float64 on the CPU by default. When GPU support is desired, call .to(device) on the container.

Design notes

  • State holds the full bioreactor state at a single time step.

  • StateBatch holds states for B reactors at a single time step.

  • Trajectory holds the full time series for a single reactor run.

The distinction matters for the digital twin: the BED optimiser works on StateBatch (one entry per candidate reactor), while the GP is trained on Trajectory objects (one per historical run).

class perfusio.states.State(day, species, controls, volume_L, reactor_id='R00', clone_id='unknown', run_id='unknown')[source]

Bases: object

Bioreactor state at a single discrete time step.

All numeric fields are plain Python floats (or None for missing measurements), so that State objects can be serialised easily.

Parameters:
  • day (int) – Process day (0-indexed integer; 0 = inoculation day).

  • species (dict[str, float]) – Mapping of species name → measured (or simulated) value. Missing measurements are represented as float("nan").

  • controls (dict[str, float]) – Mapping of control variable name → applied setpoint.

  • volume_L (float) – Reactor working volume in litres. For ambr®250, this is typically 0.250 L. Must be > 0.

  • reactor_id (str) – Unique string identifier for the reactor vessel (e.g. "R01").

  • clone_id (str) – Cell-line / clone identifier used by CloneRegistry.

  • run_id (str) – Experiment / run identifier.

Notes

The species dict uses the canonical species names defined in perfusio.chemistry.species; see that module for the full registry.

Examples

>>> s = State(
...     day=5,
...     species={"VCD": 18.3, "Via": 97.1, "Glc": 3.2, "Amm": 4.1},
...     controls={"perfusion_rate": 1.0, "bleed_rate": 0.15},
...     volume_L=0.250,
...     reactor_id="R01",
...     clone_id="CX",
...     run_id="EXP001",
... )
>>> s.day
5
clone_id: str = 'unknown'
controls: dict[str, float]
day: int
classmethod from_tensor(t, species_names, control_names, *, day, volume_L=0.25, reactor_id='R00', clone_id='unknown', run_id='unknown')[source]

Reconstruct a State from a flat tensor.

Parameters:
  • t (Tensor) – Flat 1-D tensor as produced by to_tensor().

  • species_names (list[str]) – Species ordering used when the tensor was created.

  • control_names (list[str]) – Control variable ordering.

  • day (int) – Process day (not stored in the tensor to avoid dtype issues).

  • volume_L (float) – Working volume.

  • reactor_id (str) – Reactor ID.

  • clone_id (str) – Clone ID.

  • run_id (str) – Run ID.

Return type:

State

reactor_id: str = 'R00'
run_id: str = 'unknown'
species: dict[str, float]
to_tensor(species_names, control_names, *, dtype=torch.float64, device='cpu')[source]

Serialise this state to a flat 1-D tensor.

Parameters:
  • species_names (list[str]) – Ordered list of species to include; missing ones are nan.

  • control_names (list[str]) – Ordered list of controls to include; missing ones are nan.

  • dtype (dtype) – Target dtype (default float64).

  • device (device | str) – Target device.

Returns:

Shape (len(species_names) + len(control_names) + 1,). The trailing +1 is the process day cast to float.

Return type:

Tensor
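The flat layout produced by to_tensor() (species values, then controls, then the day cast to float) can be reproduced with plain torch. The sketch below is illustrative only, assuming the documented ordering; it does not call perfusio itself:

```python
import torch

# Documented to_tensor() layout: species first, then controls, then the
# process day cast to float, all in a single 1-D float64 tensor.
species_names = ["VCD", "Via", "Glc"]
control_names = ["perfusion_rate", "bleed_rate"]
species = {"VCD": 18.3, "Via": 97.1}          # "Glc" is missing -> nan
controls = {"perfusion_rate": 1.0, "bleed_rate": 0.15}
day = 5

t = torch.tensor(
    [species.get(n, float("nan")) for n in species_names]
    + [controls.get(n, float("nan")) for n in control_names]
    + [float(day)],
    dtype=torch.float64,
)

assert t.shape == (len(species_names) + len(control_names) + 1,)
assert t[-1].item() == 5.0      # trailing entry is the process day
assert torch.isnan(t[2])        # missing "Glc" measurement
```

from_tensor() inverts this layout, which is why it requires the same species_names and control_names orderings that were used at serialisation time.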

volume_L: float
class perfusio.states.StateBatch(day, species, controls, volume_L, reactor_ids, clone_ids, run_ids, species_names, control_names)[source]

Bases: object

Bioreactor states for B reactors at a single time step.

Parameters:
  • day (int) – Shared process day.

  • species (torch.Tensor) – Tensor of shape (B, n_species); rows are reactors, columns are species in canonical order.

  • controls (torch.Tensor) – Tensor of shape (B, n_controls).

  • volume_L (torch.Tensor) – Tensor of shape (B,) with working volumes in litres.

  • reactor_ids (list[str]) – List of length B with reactor identifiers.

  • clone_ids (list[str]) – List of length B with clone identifiers.

  • run_ids (list[str]) – List of length B with run identifiers.

  • species_names (list[str]) – Column names for the species tensor.

  • control_names (list[str]) – Column names for the controls tensor.

Examples

>>> import torch
>>> sb = StateBatch(
...     day=10,
...     species=torch.zeros(4, 11),
...     controls=torch.zeros(4, 6),
...     volume_L=torch.full((4,), 0.250),
...     reactor_ids=["R01", "R02", "R03", "R04"],
...     clone_ids=["CX"] * 4,
...     run_ids=["EXP001"] * 4,
...     species_names=["VCD", "VCV", "Via", "Diam", "Glc", "Gln",
...                     "Glu", "Lac", "Amm", "Pyr", "Titer"],
...     control_names=["perfusion_rate", "bleed_rate", "glucose_setpoint",
...                     "temperature", "agitation", "pyruvate_feed"],
... )
>>> sb.batch_size
4
property batch_size: int

Number of reactors in this batch.

clone_ids: list[str]
control_names: list[str]
controls: Tensor
day: int
reactor_ids: list[str]
run_ids: list[str]
species: Tensor
species_names: list[str]
to_input_tensor()[source]

Concatenate species, controls, and the process day into a single (B, d) tensor.

Returns:

Shape (B, n_species + n_controls + 1) where the last column is the process day as float.

Return type:

Tensor
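The batched input layout can likewise be sketched with plain torch: species columns, then control columns, then the shared day broadcast to a final column. This is an illustration of the documented shape, not a call into perfusio:

```python
import torch

# to_input_tensor() returns (B, n_species + n_controls + 1), with the
# shared process day appended as the last column.
B, n_species, n_controls, day = 4, 11, 6, 10
species = torch.zeros(B, n_species)
controls = torch.zeros(B, n_controls)
day_col = torch.full((B, 1), float(day))

x = torch.cat([species, controls, day_col], dim=1)

assert x.shape == (B, n_species + n_controls + 1)
assert (x[:, -1] == 10.0).all()   # last column is the process day
```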

volume_L: Tensor
class perfusio.states.Trajectory(species, controls, volume_L, days, species_names, control_names, run_id='unknown', clone_id='unknown', reactor_id='R00', metadata=<factory>)[source]

Bases: object

Full time-series data for a single bioreactor run.

A Trajectory is the primary input to GP training. Each row corresponds to one daily observation; the number of rows equals the run length in days.

Parameters:
  • species (torch.Tensor) – Float tensor of shape (T, n_species) with measured species values. Missing observations are float("nan").

  • controls (torch.Tensor) – Float tensor of shape (T, n_controls) with applied setpoints.

  • volume_L (torch.Tensor) – Float tensor of shape (T,) with reactor working volumes.

  • days (torch.Tensor) – Integer tensor of shape (T,) with process days (0-indexed).

  • species_names (list[str]) – Column names for species.

  • control_names (list[str]) – Column names for controls.

  • run_id (str) – Unique run identifier.

  • clone_id (str) – Clone / cell-line identifier.

  • reactor_id (str) – Reactor vessel identifier.

  • metadata (dict[str, str]) – Arbitrary string → string metadata (e.g. medium lot, operator).

Notes

All tensors should be on the same device, and the float tensors (species, controls, volume_L) should share a dtype; days remains an integer tensor. Use to() to move the trajectory to a different device.

Examples

>>> import torch
>>> traj = Trajectory(
...     species=torch.randn(20, 11),
...     controls=torch.ones(20, 6),
...     volume_L=torch.full((20,), 0.250),
...     days=torch.arange(20),
...     species_names=["VCD", "VCV", "Via", "Diam", "Glc", "Gln",
...                     "Glu", "Lac", "Amm", "Pyr", "Titer"],
...     control_names=["perfusion_rate", "bleed_rate", "glucose_setpoint",
...                     "temperature", "agitation", "pyruvate_feed"],
...     run_id="EXP001",
...     clone_id="CX",
...     reactor_id="R01",
... )
>>> traj.n_days
20
clone_id: str = 'unknown'
control_names: list[str]
controls: Tensor
days: Tensor
get_species(name)[source]

Return a 1-D tensor of values for species name.

Parameters:

name (str) – Species name.

Returns:

Shape (T,).

Return type:

Tensor

metadata: dict[str, str]
property n_controls: int

Number of control variables.

property n_days: int

Number of time steps in the trajectory.

property n_species: int

Number of measured species.

reactor_id: str = 'R00'
run_id: str = 'unknown'
slice_days(start, end)[source]

Return a sub-trajectory for days in [start, end).

Parameters:
  • start (int) – First day to include (0-indexed).

  • end (int) – Exclusive upper bound.

Return type:

Trajectory
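The half-open [start, end) convention of slice_days() matches Python slicing, so day end itself is excluded. A minimal sketch of the underlying day-based mask on plain tensors (not a perfusio call):

```python
import torch

# slice_days(start, end) keeps rows whose day falls in [start, end).
days = torch.arange(20)                 # days 0..19
start, end = 5, 10
mask = (days >= start) & (days < end)

sliced_days = days[mask]
assert sliced_days.tolist() == [5, 6, 7, 8, 9]   # day 10 is excluded
```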

species: Tensor
species_index(name)[source]

Return the column index for name in the species tensor.

Parameters:

name (str) – Species name, must be in species_names.

Return type:

int

Raises:

KeyError – If name is not a registered species.

species_names: list[str]
to(device)[source]

Return a copy with all tensors moved to device.

Parameters:

device (device | str) – Target PyTorch device.

Return type:

Trajectory

to_input_tensor()[source]

Stack species + controls + day into (T, d) tensor.

Returns:

Shape (T, n_species + n_controls + 1).

Return type:

Tensor

valid_mask()[source]

Boolean mask of shape (T, n_species) that is True where a measurement is present.

Returns:

dtype bool, shape (T, n_species).

Return type:

Tensor
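Because missing species measurements are encoded as float("nan") (see the species parameter above), a validity mask of this kind is simply the complement of an isnan test. A sketch with plain torch, illustrating the semantics rather than the perfusio implementation:

```python
import torch

# Rows are days, columns are species; nan marks a missing measurement.
species = torch.tensor([[18.3, float("nan")],
                        [19.1, 97.0]])

mask = ~torch.isnan(species)   # True where a measurement is present

assert mask.dtype == torch.bool
assert mask.tolist() == [[True, False], [True, True]]
```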

volume_L: Tensor