API Reference¶
perfusio — open reference implementation of self-driving perfusion bioprocess development.
perfusio implements the methodology of Gadiyar et al. (2026) and Hutter et al. (2021),
providing step-wise Gaussian-process hybrid models, entity-embedding transfer learning,
Bayesian Experimental Design, and an online-retraining digital twin for CHO perfusion
bioreactors. It is intended as an open, citable complement to commercial platforms.
References
Gadiyar, C. J., Müller, C., Vuillemin, T., Bielser, J.-M., Souquet, J., Fagnani, A., Sokolov, M., von Stosch, M., Feidl, F., Butté, A., & Cruz Bournazou, M. N. (2026). Self-Driving Development of Perfusion Processes for Monoclonal Antibody Production. Biotechnology and Bioengineering, 123(2), 391–405. https://doi.org/10.1002/bit.70093
Hutter, S., von Stosch, M., Cruz Bournazou, M. N., & Butté, A. (2021). Knowledge transfer across cell lines using hybrid Gaussian process models with entity embedding vectors. Biotechnology and Bioengineering, 118(12), 4710–4725. https://doi.org/10.1002/bit.27907
- class perfusio.DesignSpace(*, name, control_bounds=<factory>, species_bounds=<factory>, viability_min=95.0, vcv_target=None, titer_target=None)[source]¶
Full description of the operating envelope for a perfusion experiment.
This object encodes both the measurable state bounds and the control variable ranges that are passed to the Bayesian Experimental Design optimizer. It is the single source of truth for all
optimize_acqf calls.
- Parameters:
name (str) – Experiment name (used in audit logs and figure titles).
control_bounds (dict[str, perfusio.config.ControlBounds]) – Mapping of control variable name → ControlBounds.
species_bounds (dict[str, perfusio.config.SpeciesBounds]) – Mapping of species name → SpeciesBounds.
viability_min (float) – Minimum acceptable viability (%) for feasibility constraints. Default 95.0 matches the paper’s constraint.
vcv_target (float | None) – Target volumetric cell volume (%) for the single-objective VCV-tracking use case (Gadiyar et al. 2026, §3.5).
titer_target (float | None) – Optional target product concentration [mg L⁻¹].
Examples
>>> from perfusio.config import DesignSpace, ControlBounds, SpeciesBounds
>>> ds = DesignSpace(
...     name="ambr250_run_01",
...     control_bounds={
...         "perfusion_rate": ControlBounds(lo=0.5, hi=2.0, unit="vvd"),
...         "bleed_rate": ControlBounds(lo=0.05, hi=0.30, unit="vvd"),
...     },
...     species_bounds={
...         "VCD": SpeciesBounds(lo=0.0, hi=120.0, unit="1e6 cells/mL",
...                              description="Viable Cell Density"),
...     },
... )
- property bounds_tensor: tuple[Tensor, Tensor]¶
Return (lower, upper) tensors of shape (n_controls,) for BoTorch.
- Returns:
lower and upper bound tensors, dtype float64.
- Return type:
tuple[Tensor, Tensor]
- control_bounds: dict[str, ControlBounds]¶
- model_config = {}¶
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- species_bounds: dict[str, SpeciesBounds]¶
- class perfusio.RunConfig(*, seed=None, device='cpu', dtype='float64', n_mc_samples=256, n_restarts=10, raw_samples=512, log_level='INFO', allow_write=False)[source]¶
Runtime configuration for perfusio computations.
- Parameters:
seed (int | None) – Global random seed. None for non-deterministic execution.
device (str) – Torch device string ("cpu", "cuda:0", …). Defaults to "cpu" to ensure CI compatibility without GPUs.
dtype (Literal['float32', 'float64']) – Torch dtype. Defaults to torch.float64 (required for numerical stability of GP marginal log-likelihood optimisation — never downcast).
n_mc_samples (int) – Number of Monte Carlo samples for uncertainty propagation in rollout. Default 256 (adequate for 80% PI; increase to 1024 for publication).
n_restarts (int) – Number of random restarts for acquisition function optimisation. Default 10.
raw_samples (int) – Initial Sobol samples for warm-starting acquisition optimisation. Default 512.
log_level (Literal['DEBUG', 'INFO', 'WARNING', 'ERROR']) – structlog log level. One of "DEBUG", "INFO", "WARNING", "ERROR".
allow_write (bool) – If False (the default), the OPC UA connector operates in read-only mode and raises PermissionError on any write_setpoints call. Must be explicitly set to True to enable hardware writes.
Notes
dtype = torch.float64 is a hard requirement for the GP marginal log-likelihood optimisation. The LBFGS optimiser with strong-Wolfe line search becomes numerically unstable in float32 for datasets with more than ~100 training points.
- dtype: Literal['float32', 'float64']¶
- log_level: Literal['DEBUG', 'INFO', 'WARNING', 'ERROR']¶
- make_generator()[source]¶
Create a seeded torch.Generator, or None if seed is None.
- Return type:
torch.Generator or None
- model_config = {}¶
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- property torch_device: device¶
Parsed torch.device.
- property torch_dtype: dtype¶
Parsed torch.dtype.
- class perfusio.State(day, species, controls, volume_L, reactor_id='R00', clone_id='unknown', run_id='unknown')[source]¶
Bioreactor state at a single discrete time step.
All numeric fields are plain Python floats (or None for missing measurements), so that State objects can be serialised easily.
- Parameters:
day (int) – Process day (0-indexed integer; 0 = inoculation day).
species (dict[str, float]) – Mapping of species name → measured (or simulated) value. Missing measurements are represented as float("nan").
controls (dict[str, float]) – Mapping of control variable name → applied setpoint.
volume_L (float) – Reactor working volume in litres. For ambr®250, this is typically 0.250 L. Must be > 0.
reactor_id (str) – Unique string identifier for the reactor vessel (e.g. "R01").
clone_id (str) – Cell-line / clone identifier used by CloneRegistry.
run_id (str) – Experiment / run identifier.
Notes
The species dict uses the canonical species names defined in perfusio.chemistry.species; see that module for the full registry.
Examples
>>> s = State(
...     day=5,
...     species={"VCD": 18.3, "Via": 97.1, "Glc": 3.2, "Amm": 4.1},
...     controls={"perfusion_rate": 1.0, "bleed_rate": 0.15},
...     volume_L=0.250,
...     reactor_id="R01",
...     clone_id="CX",
...     run_id="EXP001",
... )
>>> s.day
5
- classmethod from_tensor(t, species_names, control_names, *, day, volume_L=0.25, reactor_id='R00', clone_id='unknown', run_id='unknown')[source]¶
Reconstruct a State from a flat tensor.
- Parameters:
t (Tensor) – Flat 1-D tensor as produced by to_tensor().
species_names (list[str]) – Species ordering used when the tensor was created.
control_names (list[str]) – Control ordering used when the tensor was created.
day (int) – Process day (not stored in the tensor to avoid dtype issues).
volume_L (float) – Working volume.
reactor_id (str) – Reactor ID.
clone_id (str) – Clone ID.
run_id (str) – Run ID.
- Return type:
State
- class perfusio.StateBatch(day, species, controls, volume_L, reactor_ids, clone_ids, run_ids, species_names, control_names)[source]¶
Bioreactor states for B reactors at a single time step.
- Parameters:
day (int) – Shared process day.
species (torch.Tensor) – Tensor of shape (B, n_species); rows are reactors, columns are species in canonical order.
controls (torch.Tensor) – Tensor of shape (B, n_controls).
volume_L (torch.Tensor) – Tensor of shape (B,) with working volumes in litres.
reactor_ids (list[str]) – List of length B with reactor identifiers.
clone_ids (list[str]) – List of length B with clone identifiers.
run_ids (list[str]) – List of length B with run identifiers.
species_names (list[str]) – Column names for the species tensor.
control_names (list[str]) – Column names for the controls tensor.
Examples
>>> import torch
>>> sb = StateBatch(
...     day=10,
...     species=torch.zeros(4, 11),
...     controls=torch.zeros(4, 6),
...     volume_L=torch.full((4,), 0.250),
...     reactor_ids=["R01", "R02", "R03", "R04"],
...     clone_ids=["CX"] * 4,
...     run_ids=["EXP001"] * 4,
...     species_names=["VCD", "VCV", "Via", "Diam", "Glc", "Gln",
...                    "Glu", "Lac", "Amm", "Pyr", "Titer"],
...     control_names=["perfusion_rate", "bleed_rate", "glucose_setpoint",
...                    "temperature", "agitation", "pyruvate_feed"],
... )
>>> sb.batch_size
4
- class perfusio.Trajectory(species, controls, volume_L, days, species_names, control_names, run_id='unknown', clone_id='unknown', reactor_id='R00', metadata=<factory>)[source]¶
Full time-series data for a single bioreactor run.
A Trajectory is the primary input to GP training. Each row corresponds to one daily observation; the number of rows equals the run length in days.
- Parameters:
species (torch.Tensor) – Float tensor of shape (T, n_species) with measured species values. Missing observations are float("nan").
controls (torch.Tensor) – Float tensor of shape (T, n_controls) with applied setpoints.
volume_L (torch.Tensor) – Float tensor of shape (T,) with reactor working volumes.
days (torch.Tensor) – Integer tensor of shape (T,) with process days (0-indexed).
species_names (list[str]) – Column names for the species tensor.
control_names (list[str]) – Column names for the controls tensor.
run_id (str) – Unique run identifier.
clone_id (str) – Clone / cell-line identifier.
reactor_id (str) – Reactor vessel identifier.
metadata (dict[str, str]) – Arbitrary string → string metadata (e.g. medium lot, operator).
Notes
All tensors should be on the same device and have the same dtype. Use to() to move the trajectory to a different device.
Examples
>>> import torch
>>> traj = Trajectory(
...     species=torch.randn(20, 11),
...     controls=torch.ones(20, 6),
...     volume_L=torch.full((20,), 0.250),
...     days=torch.arange(20),
...     species_names=["VCD", "VCV", "Via", "Diam", "Glc", "Gln",
...                    "Glu", "Lac", "Amm", "Pyr", "Titer"],
...     control_names=["perfusion_rate", "bleed_rate", "glucose_setpoint",
...                    "temperature", "agitation", "pyruvate_feed"],
...     run_id="EXP001",
...     clone_id="CX",
...     reactor_id="R01",
... )
>>> traj.n_days
20
- get_species(name)[source]¶
Return a 1-D tensor of values for species name.
- Parameters:
name (str) – Species name.
- Returns:
Shape (T,).
- Return type:
Tensor
- slice_days(start, end)[source]¶
Return a sub-trajectory for days in [start, end).
- Parameters:
start (int) – First day (inclusive).
end (int) – Last day (exclusive).
- Return type:
Trajectory
- species_index(name)[source]¶
Return the column index for name in the species tensor.
- Parameters:
name (str) – Species name; must be in species_names.
- Return type:
int
- Raises:
KeyError – If name is not a registered species.
- to_input_tensor()[source]¶
Stack species + controls + day into a (T, d) tensor.
- Returns:
Shape (T, n_species + n_controls + 1).
- Return type:
Tensor
Mechanistic Models¶
Mechanistic kinetic rate expressions for CHO perfusion models.
All rate laws are implemented from first principles with citable sources. Kinetic constants are labelled as synthetic defaults (S) or as drawn from a specific reference. No Merck-proprietary values are used.
Kinetics implemented¶
Monod growth on glucose and glutamine (dual substrate).
Pirt maintenance (energy maintenance on glucose independent of growth).
Luedeking–Piret mAb production model (growth + non-growth associated).
Lactate switch: CHO clone X switches from lactate production to lactate consumption when glucose falls below a threshold (Warburg effect reversal). Clone Y never consumes lactate.
Ammonium production from glutamine deamidation.
Pyruvate-NH₄⁺ scavenging: pyruvate reacts with NH₄⁺ in a first-order reaction (reduces ammonium accumulation when pyruvate is fed).
First-order cell death with temperature and shear stress dependence.
Apoptosis-induced diameter increase (phenomenological).
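The rate laws above can be sketched in a few lines of plain Python. This is an illustrative re-statement of the documented formulas, not the package implementation: the constants below are made-up round numbers in the units given by CHOKinetics, and the fixed lactate consumption rate is a stand-in for lac_consump_rate scaling.

```python
import math

# Illustrative constants (NOT the CHOKinetics package defaults).
MU_MAX, K_GLC, K_GLN = 0.04, 0.08, 0.5    # h^-1, g/L, mmol/L
Y_LAC_GLC, GLC_SWITCH = 1.8, 2.0          # mmol/mmol, g/L
ALPHA_AMM_GLN, K_PYR_AMM = 0.85, 0.05     # mmol/mmol, h^-1

def mu(glc: float, gln: float) -> float:
    """Dual-substrate Monod growth rate [h^-1]."""
    return MU_MAX * glc / (K_GLC + glc) * gln / (K_GLN + gln)

def q_glc(mu_val: float, q_glc_max: float = 5.5e-11, m_glc: float = 2e-12) -> float:
    """Pirt: growth-coupled glucose uptake plus maintenance [g cell^-1 h^-1]."""
    return q_glc_max * mu_val / MU_MAX + m_glc

def lactate_rate(glc: float, q_glc_val: float, consumes_lactate: bool) -> float:
    """Lactate switch: production above GLC_SWITCH; consumption below it
    (clone X only). Positive = production, negative = consumption."""
    if glc > GLC_SWITCH or not consumes_lactate:
        return Y_LAC_GLC * q_glc_val
    return -3e-10  # illustrative fixed consumption rate after the switch

def amm_rate(q_gln_val: float, pyr: float, amm: float) -> float:
    """Ammonium from glutamine deamidation minus first-order pyruvate scavenging."""
    return ALPHA_AMM_GLN * q_gln_val - K_PYR_AMM * pyr * amm
```

Note how the scavenging term makes pyruvate feeding reduce net ammonium accumulation, which is the mechanism the BED loop exploits.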
References
Monod, J. (1942). Recherches sur la Croissance des Cultures Bacteriennes. Hermann et Cie, Paris.
Pirt, S. J. (1965). The maintenance energy of bacteria in growing cultures. Proceedings of the Royal Society B, 163(991), 224–231.
Luedeking, R., & Piret, E. L. (1959). A kinetic study of the lactic acid fermentation. Journal of Biochemical and Microbiological Technology and Engineering, 1(4), 393–412.
Gagnon, M., et al. (2011). Metabolic flux analysis of CHO cells. Biotechnology and Bioengineering, 108(6), 1328–1337. (Source of synthetic Monod parameters, labelled S below.)
Gadiyar et al. (2026), §3.3 (in-silico setup).
- class perfusio.mechanistic.kinetics.CHOKinetics(mu_max=0.04, K_glc=0.08, K_gln=0.5, q_glc_max=5.5e-11, m_glc=2e-12, Y_lac_glc=1.8, glc_switch=2.0, lac_consump_rate=3e-10, consumes_lactate=True, q_gln_max=5e-11, K_d=0.003, K_T=0.001, T_ref=37.0, q_mAb_growth=2e-09, q_mAb_nongrowth=1.8e-09, alpha_amm_gln=0.85, k_pyr_amm=0.05, diam_base=20.5, diam_death_coeff=4.0)[source]¶
Bases: object
Kinetic parameter set for a CHO perfusion model.
All parameters are synthetic defaults calibrated to reproduce the trajectory shapes shown in Gadiyar et al. (2026) Fig. 7. They are labelled with their source (S = synthetic default; literature citation otherwise).
- Parameters:
mu_max (float) – Maximum specific growth rate [h⁻¹]. (S — typical CHO: 0.04 h⁻¹)
K_glc (float) – Glucose Monod constant [g L⁻¹]. (S — Gagnon et al. 2011: ~0.08 g/L)
K_gln (float) – Glutamine Monod constant [mmol L⁻¹]. (S — ~0.5 mmol/L)
q_glc_max (float) – Maximum specific glucose consumption rate [g cell⁻¹ h⁻¹]. (S)
m_glc (float) – Pirt glucose maintenance coefficient [g cell⁻¹ h⁻¹]. (S)
Y_lac_glc (float) – Lactate yield from glucose [mmol mmol⁻¹] in the Warburg regime. (S)
glc_switch (float) – Glucose threshold [g L⁻¹] below which clone X switches from lactate production to lactate consumption (Warburg effect reversal). (S)
lac_consump_rate (float) – Maximum specific lactate consumption rate [mmol cell⁻¹ h⁻¹] after the metabolic switch (clone X only). (S)
q_gln_max (float) – Maximum specific glutamine consumption rate [mmol cell⁻¹ h⁻¹]. (S)
K_d (float) – First-order cell death rate constant [h⁻¹]. (S)
K_T (float) – Temperature sensitivity of death rate [h⁻¹ K⁻¹]. (S)
T_ref (float) – Reference temperature for death rate [°C]. (S — 37.0)
q_mAb_growth (float) – Growth-associated mAb production [mg cell⁻¹]. (Luedeking–Piret α)
q_mAb_nongrowth (float) – Non-growth-associated mAb production [mg cell⁻¹ h⁻¹]. (LP β)
alpha_amm_gln (float) – Ammonium production stoichiometry from glutamine [mmol mmol⁻¹]. (S)
k_pyr_amm (float) – First-order rate constant for pyruvate–NH₄⁺ scavenging [h⁻¹]. (S)
diam_base (float) – Base mean cell diameter [μm]. (S — 20.5 μm, Gadiyar Fig.7)
diam_death_coeff (float) – Diameter increase per unit dead-cell fraction [μm per fraction]. (S)
consumes_lactate (bool) – Whether this clone has the metabolic switch (True for clone X). (S)
Notes
These are in-silico defaults only. Real CHO cell lines have proprietary kinetic parameters. See LIMITATIONS.md for an explicit disclaimer.
- STATE_ORDER: ClassVar[list[str]] = ['VCD', 'Via', 'Glc', 'Gln', 'Glu', 'Lac', 'Amm', 'Pyr', 'Titer']¶
Canonical ordering of ODE state variables.
- cell_diameter(via)[source]¶
Phenomenological cell diameter as a function of viability.
Dead and apoptotic cells swell, increasing the population mean diameter. This is calibrated to match Gadiyar et al. Fig. 7 (Diam ≈ 20.5 → 22.5 μm as viability falls from ~99% to ~90%).
\[\bar{d} = d_0 + \delta_d \cdot (1 - \text{Via}/100)\]
- k_death(temperature_C=37.0)[source]¶
Temperature-dependent specific cell death rate [h⁻¹].
\[k_d(T) = k_d^{\text{ref}} + K_T \cdot (T - T_{\text{ref}})\]
- mu(glc, gln)[source]¶
Specific growth rate by dual-substrate Monod kinetics.
\[\mu = \mu_{\max} \frac{[\text{Glc}]}{K_{\text{Glc}} + [\text{Glc}]} \frac{[\text{Gln}]}{K_{\text{Gln}} + [\text{Gln}]}\]
- Parameters:
glc (float) – Glucose concentration [g L⁻¹].
gln (float) – Glutamine concentration [mmol L⁻¹].
- Returns:
Specific growth rate [h⁻¹].
- Return type:
float
References
[Monod1942] Monod (1942), eq. (1).
- ode_rhs(t, y, controls)[source]¶
Right-hand side of the CHO ODE system for scipy.integrate.solve_ivp.
State vector ordering (must match STATE_ORDER): [VCD, Via, Glc, Gln, Glu, Lac, Amm, Pyr, Titer]
- Parameters:
t (float) – Time [h] (not used in this autonomous system, but required by the solve_ivp signature).
y (list[float]) – Current state vector (see ordering above).
controls (dict[str, float]) – Dict with keys "temperature" [°C], "perfusion_rate" [vvd], "bleed_rate" [vvd], "glucose_feed_conc" [g L⁻¹], "gln_feed_conc" [mmol L⁻¹], "pyr_feed_conc" [mmol L⁻¹], "volume_L" [L].
- Returns:
Time derivatives for each state variable [h⁻¹ × units].
- Return type:
- q_amm(q_gln_val, pyr, amm)[source]¶
Volumetric ammonium production rate [mmol L⁻¹ h⁻¹].
Ammonium is produced from glutamine deamidation and scavenged by pyruvate in a first-order reaction.
\[R_{\text{NH}_4^+} = \alpha_{\text{Gln}} \cdot q_{\text{Gln}} - k_{\text{pyr}} \cdot [\text{Pyr}] \cdot [\text{NH}_4^+]\]
- q_glc(mu_val, vcd)[source]¶
Specific glucose consumption rate (Pirt maintenance + growth).
\[q_{\text{Glc}} = \frac{\mu}{Y_{X/\text{Glc}}} + m_{\text{Glc}}\]
We absorb \(Y_{X/\text{Glc}}\) into q_glc_max for parsimony. The total volumetric rate is \(Q_{\text{Glc}} = q_{\text{Glc}} \cdot X_v\).
- Parameters:
mu_val (float) – Specific growth rate [h⁻¹].
vcd (float) – Viable cell density.
- Returns:
Volumetric glucose consumption rate [g L⁻¹ h⁻¹].
- Return type:
References
[Pirt1965] Pirt (1965).
- q_lac(glc, q_glc_val, vcd=0.0)[source]¶
Volumetric lactate production/consumption rate.
For clone X: produces lactate when glc > glc_switch; consumes lactate when glc < glc_switch. For clone Y: always produces lactate.
- Parameters:
glc (float) – Glucose concentration [g L⁻¹].
q_glc_val (float) – Volumetric glucose consumption rate.
vcd (float) – Viable cell density. Default 0.0.
- Returns:
Volumetric lactate rate [mmol L⁻¹ h⁻¹]. Positive = production; negative = consumption.
- Return type:
References
[Gagnon2011] Gagnon et al. (2011) — metabolic switch parameters.
High-level CHO perfusion mechanistic model.
This module wraps CHOKinetics and
integrate_run() into a clean
CHOPerfusionModel API suitable for use as a mechanistic prior in
the hybrid SW-GP.
The model computes, for any process state \(s_i\), the mechanistic prediction of daily rates \(\hat{R}^{\text{mech}}_k(s_i)\). The hybrid model’s GP then learns the residual:
\[R_k(s_i) = \hat{R}^{\text{mech}}_k(s_i) + \epsilon_k(s_i)\]
where \(\epsilon_k \sim \mathcal{GP}(0, k(s_i, s_j))\).
References
Gadiyar et al. (2026), §3.1 (“Hybrid model structure”).
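The residual decomposition reduces, at training time, to subtracting the mechanistic prediction from each observed rate. A minimal sketch with made-up numbers (the rate values are hypothetical, not from the package):

```python
# Hypothetical daily observed rates and mechanistic predictions for one
# species k (units: species-unit per hour); values are illustrative only.
observed_rates = [0.52, 0.61, 0.48, 0.40]
mech_rates     = [0.50, 0.55, 0.50, 0.45]

# The GP is trained on eps_k(s_i) = R_k(s_i) - R_mech_k(s_i) ...
residual_targets = [r - m for r, m in zip(observed_rates, mech_rates)]

def hybrid_prediction(mech_rate: float, gp_residual_mean: float) -> float:
    """... and at prediction time the mechanistic prior mean is added back."""
    return mech_rate + gp_residual_mean
```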
- class perfusio.mechanistic.models.CHOPerfusionModel(kinetics=None, volume_L=0.25, dt_hours=24.0)[source]¶
Bases: object
Deterministic mechanistic CHO perfusion model.
- Parameters:
kinetics (CHOKinetics | None) – Kinetic parameter set. If None, uses the default clone X parameters (consumes lactate at low glucose).
volume_L (float) – Nominal reactor working volume [L]. Default 0.250 (ambr®250).
dt_hours (float) – Reporting interval [h]. Default 24.0 (daily samples).
Examples
>>> model = CHOPerfusionModel()
>>> y0 = {"VCD": 1.0, "Via": 99.0, "Glc": 5.0, "Gln": 4.0,
...       "Glu": 0.5, "Lac": 2.0, "Amm": 0.5, "Pyr": 0.0,
...       "Titer": 0.0}
>>> controls = {"perfusion_rate": 1.0, "bleed_rate": 0.15,
...             "temperature": 37.0}
>>> traj = model.simulate(y0, controls, n_days=14, seed=42)
>>> traj.shape  # (15, 9) — 14 daily steps + initial state
(15, 9)
- predict_rates(c, controls)[source]¶
Predict net volumetric rates at state c using the kinetic model.
- Parameters:
c (Tensor) – State tensor, shape (n_species,). Species ordered as STATE_SPECIES.
- Returns:
Rate tensor, shape (n_species,). Units are [species-unit h⁻¹].
- Return type:
Tensor
- predict_rates_batch(c_batch, controls_batch, control_names)[source]¶
Vectorised rate prediction for a batch of states.
- simulate(y0, controls, n_days, seed=None)[source]¶
Run a deterministic simulation over n_days daily steps.
- Parameters:
y0 (dict[str, float]) – Initial state mapping (see Examples for keys).
controls (dict[str, float]) – Constant control setpoints.
n_days (int) – Number of daily steps to simulate.
seed (int | None) – Random seed.
- Returns:
Shape (n_days + 1, n_species) where species follow STATE_SPECIES ordering. Row 0 is the initial state.
- Return type:
np.ndarray
ODE integration utilities for the mechanistic CHO model.
Uses scipy.integrate.solve_ivp with automatic stiff fallback:
- Primary solver: RK45 (non-stiff, fast).
- Fallback: Radau (implicit, handles stiff metabolite ODEs).
- Second fallback: LSODA (automatic stiffness detection).
All solutions are reported at daily (or sub-daily if requested) time points, regardless of the internal adaptive step chosen by the solver.
References
Virtanen et al. (2020). SciPy 1.0. Nature Methods, 17, 261–272.
- perfusio.mechanistic.integrators.integrate_run(kinetics, y0, controls, n_days, dt_hours=24.0)[source]¶
Integrate the CHO ODE system over n_days daily steps.
- Parameters:
kinetics (CHOKinetics) – Kinetic parameter set.
y0 (list[float]) – Initial state vector, length n_species. Order must match STATE_ORDER.
controls (dict[str, float]) – Control variable dict (constant throughout the run).
n_days (int) – Number of daily steps to simulate.
dt_hours (float) – Reporting interval in hours. Default 24.0.
- Returns:
Shape (n_days + 1, n_species) — states at each reporting point including the initial state.
- Return type:
np.ndarray
- Raises:
RuntimeError – If all ODE solvers fail to converge.
Notes
The integration time span is [0, n_days * dt_hours] hours. Reporting points are at every dt_hours interval. Negative concentrations are clipped to zero after each step.
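The fallback chain described above (primary solver, then stiff fallbacks, then RuntimeError, with negative-concentration clipping) can be sketched generically. This is a structural sketch only: the `solve` callables stand in for the actual RK45/Radau/LSODA calls, and the helper name is hypothetical.

```python
from typing import Callable, Sequence

def integrate_with_fallback(
    solvers: Sequence[tuple[str, Callable[[], list[list[float]]]]],
) -> list[list[float]]:
    """Try each (name, solve) pair in order — 'RK45' -> 'Radau' -> 'LSODA'
    in integrate_run() — and raise RuntimeError if every solver fails."""
    errors = []
    for name, solve in solvers:
        try:
            states = solve()
        except Exception as exc:  # solver failed to converge
            errors.append(f"{name}: {exc}")
            continue
        # Clip negative concentrations to zero at each reported step.
        return [[max(v, 0.0) for v in row] for row in states]
    raise RuntimeError("All ODE solvers failed: " + "; ".join(errors))
```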
Gaussian Process Layer¶
Kernel definitions for the hybrid SW-GP model.
The primary kernel is a product of:
Matérn-5/2 over the metabolite state space (captures smooth, non-periodic autocorrelation in bioreactor trajectories).
Linear kernel over the operating day variable (models the secular trend within a run).
Categorical / Index kernel for multi-task output (one task per species / rate variable), sharing a common latent representation via an index kernel.
These choices follow Cruz-Bournazou et al. (2022) §2.3 and Rasmussen & Williams (2006) §4.2.
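A single entry of this product kernel can be written out in plain Python to make the factorisation concrete. This is a pedagogical sketch, not the GPyTorch implementation; lengthscales and the task covariance matrix B are illustrative.

```python
import math

def matern52(x: list[float], y: list[float], lengthscale: float = 1.0) -> float:
    """Matérn-5/2 kernel over the (normalised) metabolite state vector."""
    r = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y))) / lengthscale
    return (1 + math.sqrt(5) * r + 5 * r * r / 3) * math.exp(-math.sqrt(5) * r)

def linear_day(d1: float, d2: float, variance: float = 1.0) -> float:
    """Linear kernel over the culture-day input (secular within-run trend)."""
    return variance * d1 * d2

def index_kernel(i: int, j: int, B: list[list[float]]) -> float:
    """Index kernel: B is the task covariance matrix (B = V V^T + diag(v))."""
    return B[i][j]

def perfusion_kernel(x1, x2, day1, day2, task1, task2, B):
    """Product kernel k = k_Matern5/2 * k_Linear * k_Index."""
    return matern52(x1, x2) * linear_day(day1, day2) * index_kernel(task1, task2, B)
```

At zero state distance the Matérn factor is 1, so the entry reduces to the day product weighted by the inter-task covariance.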
References
Rasmussen, C. E., & Williams, C. K. I. (2006). Gaussian Processes for Machine Learning. MIT Press. §4.2.
Cruz-Bournazou, M. N., et al. (2022). Digitizing bioprocess development. Current Opinion in Biotechnology, 76, 102764.
- class perfusio.gp.kernels.PerfusionKernel(n_tasks, n_state_dims, rank=1, ard_num_dims=1, **kwargs)[source]¶
Bases: Kernel
Composite kernel for CHO perfusion rate GP models.
The kernel factorises as:
\[k(\mathbf{x}, \mathbf{x}') = k_{\text{Matern5/2}}(\mathbf{x}_{\text{state}}, \mathbf{x}'_{\text{state}}) \times k_{\text{Linear}}(x_{\text{day}}, x'_{\text{day}}) \times k_{\text{Index}}(i, j)\]where \(\mathbf{x}_{\text{state}}\) are metabolite concentrations, \(x_{\text{day}}\) is the culture day, and \(i,j\) index the output task (species).
- Parameters:
n_tasks (int) – Number of output tasks (species / rate variables).
n_state_dims (int) – Dimensionality of the metabolite state input.
rank (int) – Rank of the index kernel covariance factor. Default 1.
ard_num_dims (int) – If > 1, uses automatic relevance determination for the Matérn kernel.
kwargs (Any)
Notes
Input convention:
x[:, :n_state_dims] — metabolite concentrations (normalised)
x[:, n_state_dims] — culture day
x[:, n_state_dims + 1] — task index (integer)
- forward(x1, x2, **params)[source]¶
Computes the covariance between \(\mathbf x_1\) and \(\mathbf x_2\). This method should be implemented by all Kernel subclasses.
- Parameters:
x1 (Tensor) – First set of data (… x N x D).
x2 (Tensor) – Second set of data (… x M x D).
diag – Should the Kernel compute the whole kernel, or just the diag? If True, it must be the case that x1 == x2. (Default: False.)
last_dim_is_batch – If True, treat the last dimension of x1 and x2 as another batch dimension. (Useful for additive structure over the dimensions). (Default: False.)
params (Any)
- Returns:
The kernel matrix or vector. The shape depends on the kernel’s evaluation mode:
full_covar: … x N x M
full_covar with last_dim_is_batch=True: … x K x N x M
diag: … x N
diag with last_dim_is_batch=True: … x K x N
- Return type:
- class perfusio.gp.kernels.ResidualKernel(n_state_dims, ard_num_dims=1, **kwargs)[source]¶
Bases: Kernel
RBF-based residual kernel for the SW-GP correction term.
Used in HybridStateSpaceModel to capture smooth, local residuals that the mechanistic model cannot explain.
- Parameters:
n_state_dims (int) – Dimensionality of the state input.
ard_num_dims (int) – ARD dimensions for the RBF kernel.
- forward(x1, x2, **params)[source]¶
Computes the covariance between \(\mathbf x_1\) and \(\mathbf x_2\). This method should be implemented by all Kernel subclasses.
- Parameters:
x1 (Tensor) – First set of data (… x N x D).
x2 (Tensor) – Second set of data (… x M x D).
diag – Should the Kernel compute the whole kernel, or just the diag? If True, it must be the case that x1 == x2. (Default: False.)
last_dim_is_batch – If True, treat the last dimension of x1 and x2 as another batch dimension. (Useful for additive structure over the dimensions). (Default: False.)
params (Any)
- Returns:
The kernel matrix or vector. The shape depends on the kernel’s evaluation mode:
full_covar: … x N x M
full_covar with last_dim_is_batch=True: … x K x N x M
diag: … x N
diag with last_dim_is_batch=True: … x K x N
- Return type:
Exact GP model for multi-task rate prediction.
Implements a batched ExactGP from GPyTorch that simultaneously models
n_tasks output variables (metabolite rates) using a shared kernel and
task-specific hyperparameters.
The model can be used with either a zero mean, linear mean, or a mechanistic
prior mean (MechanisticPriorMean).
References
Gardner, J. R., et al. (2018). GPyTorch: Blackbox Matrix-Matrix Gaussian Process Inference with GPU Acceleration. NeurIPS.
- class perfusio.gp.exact_gp.MultiTaskRateGP(train_x, train_y, likelihood, mean_module, kernel=None, n_tasks=9, n_state_dims=9)[source]¶
Bases: ExactGP
Multi-output exact GP for metabolite rate prediction.
Each of the n_tasks outputs shares the same PerfusionKernel structure but has its own output scale and noise level.
- Parameters:
train_x (Tensor) – Training inputs, shape (N * n_tasks, d_input) in the indexed multi-task format — last column is the task_id (0..n_tasks-1).
train_y (Tensor) – Training targets, shape (N * n_tasks,) — one scalar per row.
likelihood (GaussianLikelihood) – Gaussian likelihood (single-output; task correlation is captured by the PerfusionKernel IndexKernel).
mean_module (Mean) – Mean function module. Should be a GPyTorch Mean or MechanisticPriorMean.
kernel (PerfusionKernel | None) – Composite kernel. If None, constructs a default PerfusionKernel.
n_tasks (int) – Number of output tasks (species).
n_state_dims (int) – Input dimensionality (species + controls); needed to build default kernel.
Examples
>>> import torch, gpytorch
>>> from perfusio.gp import MultiTaskRateGP
>>> N, d, T = 50, 10, 5
>>> # Indexed format: N*T rows, last col = task_id
>>> X = torch.randn(N * T, d + 2)  # d state + 1 day + 1 task_id
>>> Y = torch.randn(N * T)         # flat scalar targets
>>> likelihood = gpytorch.likelihoods.GaussianLikelihood()
>>> mean = gpytorch.means.ZeroMean()
>>> gp = MultiTaskRateGP(X, Y, likelihood, mean, n_tasks=T, n_state_dims=d)
- forward(x)[source]¶
Compute the prior distribution at x.
- Parameters:
x (Tensor) – Input tensor, shape (N, d_input).
- Returns:
The GP prior (mean + covariance).
- Return type:
MultivariateNormal
Step-wise GP (SW-GP) for multi-step-ahead trajectory prediction.
The SW-GP predicts the state at time \(t+1\) from the state at time \(t\) (one-step GP). For multi-step-ahead forecasting, predictions are chained (rollout) with uncertainty propagated by:
Monte Carlo sampling: sample from the one-step posterior, feed the sample as the next step’s input, repeat h times, and report percentiles over n_samples trajectories.
Moment matching (fast approximation): propagate mean and variance analytically through the one-step GP (Girard et al., 2003). Used when speed is critical (BED inner loop).
Both approaches are exposed through a common predict_quantiles interface.
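The Monte Carlo rollout can be sketched for a scalar state in pure Python. The `step_mean` / `step_std` callables below are stand-ins for the fitted one-step GP posterior; everything else mirrors the documented procedure (sample, feed back, repeat, report percentiles).

```python
import random

def rollout_quantiles(x0, step_mean, step_std, horizon, n_samples=200, seed=0):
    """MC rollout for a scalar one-step model: sample the one-step
    predictive distribution, chain the sample as the next input for
    `horizon` steps, and report per-step 10/50/90th percentiles."""
    rng = random.Random(seed)
    paths = []
    for _ in range(n_samples):
        x, path = x0, []
        for _ in range(horizon):
            x = rng.gauss(step_mean(x), step_std(x))  # sample posterior
            path.append(x)                            # feed back as input
        paths.append(path)
    out = {"q10": [], "q50": [], "q90": []}
    for t in range(horizon):
        vals = sorted(p[t] for p in paths)
        out["q10"].append(vals[int(0.10 * n_samples)])
        out["q50"].append(vals[int(0.50 * n_samples)])
        out["q90"].append(vals[int(0.90 * n_samples)])
    return out
```

The interval width grows with the horizon because each step injects fresh posterior uncertainty into the next input, which is exactly what moment matching approximates analytically.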
References
Girard, A., et al. (2003). Gaussian Process Priors With Uncertain Inputs Application to Multiple-Step Ahead Time Series Forecasting. NeurIPS.
Gadiyar et al. (2026), §3.2 (SW-GP rollout for BED OFV).
- class perfusio.gp.stepwise.StepwiseGP(model, likelihood, n_species, control_names)[source]¶
Bases: object
Step-wise GP with multi-step rollout.
Wraps a fitted one-step GP model and provides predict_quantiles() for horizon-h forecasts.
- Parameters:
model – Fitted one-step GP model.
likelihood – GPyTorch likelihood.
n_species (int) – Number of species outputs.
control_names (list[str]) – Ordered control variable names.
Notes
The one-step GP takes inputs of shape (N, n_species + n_controls + 1) — species concentrations, controls, and culture day — and outputs (N, n_species) species concentrations at the next time step.
- predict_quantiles(x0, controls, horizon, n_samples=200, seed=None, method='mc')[source]¶
Multi-step-ahead prediction with uncertainty quantification.
- Parameters:
x0 (Tensor) – Initial state, shape (n_species,) or (B, n_species).
controls (Tensor) – Control trajectories, shape (horizon, n_controls) or (B, horizon, n_controls).
horizon (int) – Number of steps ahead to predict.
n_samples (int) – Number of MC samples (ignored when method="moment_matching").
seed (int | None) – Random seed for MC sampling.
method (str) – "mc" (default) or "moment_matching".
- Returns:
Keys: "mean" shape (horizon, n_species), "q10" / "q50" / "q90" same shape.
- Return type:
dict[str, Tensor]
Jackknife ensemble of GP models for uncertainty quantification.
The ensemble trains K GP models, each on a random subsample of the training data (jackknife-after-bootstrap), then aggregates their predictions as 10th / 50th / 90th percentiles. This provides calibrated predictive intervals even for small training sets (< 30 observations per species), where the Normal approximation of a single GP may be overconfident.
The approach is analogous to the jackknife+ of Barber et al. (2021) but applied at the GP model level rather than at the residual level.
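The subsample-and-aggregate scheme can be illustrated with a toy predictor standing in for a trained GP. A minimal sketch, assuming only that each member is fit on a random fraction of the data and that the ensemble reports percentiles of the members' point predictions:

```python
import random

def jackknife_quantiles(train, fit, x_new, K=20, subsample_fraction=0.8, seed=0):
    """Fit K models on random subsamples of `train` and aggregate their
    predictions at x_new into 10/50/90th percentiles. `fit(subset)` stands
    in for GP training and returns a callable predictor."""
    rng = random.Random(seed)
    n_sub = max(1, int(subsample_fraction * len(train)))
    preds = []
    for _ in range(K):
        subset = rng.sample(train, n_sub)  # jackknife-style subsample
        model = fit(subset)
        preds.append(model(x_new))
    preds.sort()
    return {"q10": preds[int(0.1 * K)],
            "q50": preds[int(0.5 * K)],
            "q90": preds[int(0.9 * K)]}
```

The spread of the K predictions reflects sensitivity to which observations were seen, which is why the intervals stay honest on small training sets.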
References
Barber, R. F., et al. (2021). Predictive inference with the jackknife+. Annals of Statistics, 49(1), 486–507.
Gadiyar et al. (2026), §3.2 (ensemble calibration).
- class perfusio.gp.ensemble.EnsembleMember(model, likelihood, train_idx)[source]¶
Bases: object
Container for a single jackknife ensemble member.
- Parameters:
model (ExactGP)
likelihood (Likelihood)
train_idx (Tensor)
- likelihood: Likelihood¶
- model: ExactGP¶
- class perfusio.gp.ensemble.JackknifeEnsemble(K=20, subsample_fraction=0.8, seed=0)[source]¶
Bases: object
Jackknife ensemble over K GP models.
- Parameters:
K (int) – Number of ensemble members. Default 20.
subsample_fraction (float) – Fraction of training points used per member. Default 0.8.
seed (int) – Random seed for subsampling. Default 0.
Examples
>>> from perfusio.gp import JackknifeEnsemble, MultiTaskRateGP
>>> import gpytorch, torch
>>> ens = JackknifeEnsemble(K=5)
- fit(train_x, train_y, model_factory, n_iter=200, lr=0.05)[source]¶
Train all K ensemble members.
- Parameters:
train_x (Tensor) – Training inputs, shape (N, d).
train_y (Tensor) – Training targets, shape (N, n_tasks) or (N,).
model_factory (object) – Callable (x_sub, y_sub) -> (model, likelihood) that constructs a fresh GP model for a given subsample.
n_iter (int) – MLL training iterations per member.
lr (float) – Adam learning rate.
- Returns:
Self (for chaining).
- Return type:
JackknifeEnsemble
- members: list[EnsembleMember]¶
- predict(x_new, quantiles=(0.1, 0.5, 0.9))[source]¶
Ensemble posterior mean and quantiles.
- Parameters:
x_new (Tensor) – Prediction inputs.
quantiles (tuple[float, ...]) – Quantile levels. Default (0.1, 0.5, 0.9).
- Returns:
Keys: "mean" (ensemble mean), and "q{p*100:.0f}" for each quantile level. All tensors have the same shape as x_new[..., 0].
- Return type:
dict[str, Tensor]
- Raises:
RuntimeError – If fit() has not been called yet.
Mean function definitions for the GP module.
Provides three mean functions:
ZeroMeanMultiTask — zero mean function used when a pure data-driven GP is desired or when the training set is large.
LinearMean — affine mean function for improved extrapolation.
MechanisticPriorMean — wraps the CHOPerfusionModel to provide a first-principles prior mean. The GP then learns only the residual between the mechanistic prediction and the observations.
References
Gadiyar et al. (2026), §3.1 (“Hybrid model”).
- class perfusio.gp.means.LinearMean(input_size, batch_shape=None)[source]¶
Bases: Mean
Affine (linear + bias) mean function.
- Parameters:
input_size (int) – Dimensionality of the input.
batch_shape (torch.Size | None) – Batch shape (for multi-output GP).
Notes
Parameters are learnable via MLL training.
- class perfusio.gp.means.MechanisticPriorMean(mech_model, n_species, species_names, control_names, task_index=None, scale=1.0)[source]¶
Bases: Mean
GP mean function backed by a mechanistic CHO model.
At any input point \(\mathbf{x} = (\mathbf{c}, \mathbf{u}, t)\), this mean function evaluates the mechanistic rate \(\hat{R}^{\text{mech}}_k(\mathbf{x})\) from CHOPerfusionModel.
The GP thus models:
\[R_k(\mathbf{x}) = \hat{R}^{\text{mech}}_k(\mathbf{x}) + \epsilon_k(\mathbf{x})\]where \(\epsilon_k \sim \mathcal{GP}(0, k_{\epsilon})\).
- Parameters:
mech_model (CHOPerfusionModel) – Instantiated CHOPerfusionModel.
n_species (int) – Number of species whose rates are modelled.
species_names (list[str]) – Ordered list of species names matching kinetics.STATE_ORDER.
control_names (list[str]) – Ordered list of control variable names.
task_index (int | None) – Index of the target task (species index into STATE_ORDER). Used when the GP is single-task, predicting one species at a time. Pass None for multi-task models that predict all species jointly.
scale (float) – Optional learnable scale parameter to allow the GP to up- or down-weight the mechanistic prior.
Notes
The input layout expected is:
x[:, :n_species] — normalised species concentrations
x[:, n_species:n_species+n_controls] — controls
x[:, -1] — culture day (hours ÷ 24 internally)
Gradient flow is blocked through the mechanistic model (torch.no_grad) since the kinetics are not themselves differentiable PyTorch modules.
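As an illustration, the expected input layout can be assembled by simple concatenation. This is a plain-Python sketch with hypothetical sizes and values, not taken from the library:

```python
# Hypothetical sizes and values; the real layout comes from the fitted model.
n_species, n_controls = 3, 2
c = [0.8, 0.4, 0.1]     # normalised species concentrations -> x[:n_species]
u = [1.0, 37.0]         # controls -> x[n_species:n_species + n_controls]
day = 7.0               # culture day -> x[-1]

x = c + u + [day]
```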
Transfer Learning¶
Clone registry and entity-embedding module.
Each CHO cell line (clone) is assigned a learnable dense embedding vector that captures clone-specific characteristics (growth rate, metabolism, productivity) in a low-dimensional latent space. The embedding is learned jointly with the GP hyperparameters during MLL training and fine-tuned during transfer.
Architecture¶
clone_id → Embedding(n_clones, embed_dim) → e ∈ ℝ^{embed_dim}
e is concatenated with (c, u, t) → GP input
Following Hutter et al. (2021) §2.3, the default embedding dimension is 4, which was found to balance expressiveness and identifiability for ≤ 20 clones.
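The augmentation step above can be sketched in plain Python. The random-initialised table E stands in for the learnable embedding weights; names and sizes are illustrative, not the library API:

```python
import random

random.seed(0)
n_clones, embed_dim = 2, 4
# Random-initialised stand-in for the learnable embedding table; in the real
# model this is trained jointly with the GP hyperparameters via MLL.
E = [[random.gauss(0.0, 1.0) for _ in range(embed_dim)] for _ in range(n_clones)]

def augment(clone_id, c, u, t):
    """Concatenate the clone's embedding vector with (c, u, t) to form the GP input."""
    return E[clone_id] + c + u + [t]

x = augment(0, [20.0, 5.0], [1.0], 7.0)   # len = embed_dim + len(c) + len(u) + 1
```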
References
Hutter et al. (2021), §2.3.
- class perfusio.embedding.clones.CloneInfo(clone_id, name, consumes_lactate=True, description='')[source]¶
Bases: object
Metadata for a single CHO cell line.
- class perfusio.embedding.clones.CloneRegistry(clones)[source]¶
Bases: object
Registry that maps clone names to integer IDs and metadata.
- Parameters:
clones (list[CloneInfo]) – List of CloneInfo objects. The clone_id field must form a contiguous range starting at 0.
Examples
>>> from perfusio.embedding import CloneRegistry >>> reg = CloneRegistry.default() >>> reg["CloneX"].clone_id 0 >>> len(reg) 2
- class perfusio.embedding.clones.EntityEmbedding(n_clones, embed_dim=4)[source]¶
Bases: Module
Learnable dense entity embedding for CHO cell lines.
- Parameters:
Examples
>>> emb = EntityEmbedding(n_clones=2, embed_dim=4) >>> clone_ids = torch.tensor([0, 1, 0]) # batch of three samples >>> e = emb(clone_ids) >>> e.shape torch.Size([3, 4])
Transfer learning orchestration for entity-embedded GP models.
Implements the two-stage transfer protocol of Hutter et al. (2021):
Warm start (warm_start): Train on the source clone dataset. The EntityEmbedding is initialised and the full model (embedding + GP kernel) is optimised.
Joint fine-tuning (joint_finetune): Load weights from the source run, optionally freeze the kernel backbone, and fine-tune on a small target clone dataset with a higher learning rate on the embedding.
This two-stage approach allows the model to learn shared process dynamics from data-rich source clones and rapidly adapt to a new clone with as few as 3–5 runs (Hutter et al. §3.2).
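The differential-learning-rate idea behind the second stage can be sketched with a toy scalar gradient step. Parameter and gradient values here are scalar stand-ins for the real tensors, and the function name is hypothetical:

```python
def finetune_step(theta_backbone, theta_embed, grad_backbone, grad_embed,
                  lr_backbone=0.01, lr_embedding=0.1, freeze_backbone=False):
    """One fine-tuning step with differential learning rates (toy scalar sketch).
    The embedding lr is 10x the backbone lr, mirroring the TransferLearner defaults."""
    if not freeze_backbone:
        theta_backbone = theta_backbone - lr_backbone * grad_backbone
    # The embedding always updates, even when the kernel backbone is frozen.
    theta_embed = theta_embed - lr_embedding * grad_embed
    return theta_backbone, theta_embed
```

With freeze_backbone=True only the embedding moves, which is the behaviour documented for very small target datasets.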
References
Hutter et al. (2021), §2.3 and §3.2.
- class perfusio.embedding.transfer.TransferLearner(embedding, model, likelihood, lr_backbone=0.01, lr_embedding=0.1)[source]¶
Bases: object
Two-stage transfer learning manager.
- Parameters:
embedding (EntityEmbedding) – Shared EntityEmbedding module.
model (gpytorch.models.ExactGP) – GPyTorch model (must accept augmented inputs from the embedding).
likelihood (gpytorch.likelihoods.Likelihood) – Corresponding GPyTorch likelihood.
lr_backbone (float) – Learning rate for GP kernel hyperparameters during fine-tuning.
lr_embedding (float) – Learning rate for embedding parameters during fine-tuning. Typically 10× the backbone lr.
Examples
>>> from perfusio.embedding import TransferLearner, EntityEmbedding >>> emb = EntityEmbedding(n_clones=2, embed_dim=4) >>> # attach model and likelihood (constructed externally) >>> # tl = TransferLearner(emb, model, likelihood)
- joint_finetune(train_x, train_y, clone_ids, n_iter=100, freeze_backbone=False)[source]¶
Fine-tune on target-clone data with differential learning rates.
- Parameters:
train_x (Tensor) – Target features (without embedding), shape (M, d_raw). M may be as small as the number of experiments × time points.
train_y (Tensor) – Targets, shape (M,) or (M, n_tasks).
clone_ids (Tensor) – Clone IDs for the target data, shape (M,).
n_iter (int) – Number of fine-tuning steps.
freeze_backbone (bool) – If True, freeze all GP kernel/mean parameters and update only the embedding. Useful when the target dataset is very small (< 3 runs) to avoid catastrophic forgetting.
- Return type:
None
Hybrid State-Space Model¶
Hybrid State-Space Model: mechanistic prior + GP residual.
The HybridStateSpaceModel is the central model object used by the
self-driving loop. It holds:
A CHOPerfusionModel (mechanistic prior)
A StepwiseGP (data-driven residual)
Optionally an EntityEmbedding for cross-clone transfer.
Prediction combines the mechanistic and GP contributions as:
\[\hat{\mathbf{c}}_{t+1} = \mathbf{c}_t + \Delta t \left( \hat{R}^{\text{mech}}(\mathbf{c}_t, \mathbf{u}_t) + \hat{R}^{\text{GP}}(\mathbf{c}_t, \mathbf{u}_t, t) \right)\]
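A minimal sketch of the combination, assuming a forward-Euler one-step update. The callables and names here are illustrative, not the library API; the real model additionally propagates GP uncertainty via MC sampling:

```python
def predict_next_state(c_t, u_t, mech_rate, gp_residual, dt_days=1.0):
    """One-step hybrid update: c_{t+1} = c_t + dt * (R_mech + R_gp).
    mech_rate and gp_residual are hypothetical per-species rate callables."""
    return [c + dt_days * (m + g)
            for c, m, g in zip(c_t, mech_rate(c_t, u_t), gp_residual(c_t, u_t))]
```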
References
Gadiyar et al. (2026), §3.1 and §3.2.
- class perfusio.hybrid.model.HybridStateSpaceModel(mech_model, gp_model, embedding=None, dt_hours=24.0, species_names=None, control_names=None)[source]¶
Bases: object
Combined mechanistic + GP hybrid model.
- Parameters:
mech_model (CHOPerfusionModel) – Fitted (or default) mechanistic model.
gp_model (StepwiseGP) – Fitted one-step SW-GP model.
embedding (EntityEmbedding | None) – Optional entity embedding (None for single-clone scenarios).
dt_hours (float) – Time step in hours. Default 24 (daily).
species_names (list[str] | None) – Ordered species names corresponding to state columns.
control_names (list[str] | None) – Ordered control names corresponding to the control input.
Examples
>>> from perfusio.mechanistic import CHOPerfusionModel >>> mech = CHOPerfusionModel() >>> # gp_model = ... (fitted StepwiseGP) >>> # hybrid = HybridStateSpaceModel(mech, gp_model)
- predict_next_state(c_t, u_t, day, clone_id=None)[source]¶
Predict next-day species concentrations.
- Parameters:
- Returns:
(mean, q10, q90) — posterior mean and 80% credible interval for \(c_{t+1}\). All shape (n_species,).
- Return type:
tuple[Tensor, Tensor, Tensor]
Training routines for the hybrid SW-GP model.
Provides:
train_hybrid(): Main training entry point. Fits the GP residual on pre-computed mechanistic residuals, then optionally runs jackknife ensemble.
retrain_online(): Lightweight incremental update when new daily data arrive (fast re-optimisation of GP hyperparameters only).
Optimisation strategy¶
L-BFGS (primary): batch optimiser, fast for small datasets (≤ 200 pts).
Adam fallback: used when L-BFGS diverges (detected by NaN/Inf loss).
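The fallback logic can be sketched as follows. The two callables stand in for the actual optimiser loops and are hypothetical names:

```python
import math

def fit_with_fallback(lbfgs_fit, adam_fit):
    """Run the primary optimiser; switch to the fallback if the loss is NaN/Inf.
    Each callable runs a full optimisation and returns the final loss."""
    loss = lbfgs_fit()
    if not math.isfinite(loss):   # L-BFGS diverged
        loss = adam_fit()
    return loss
```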
References
Gadiyar et al. (2026), §3.3 (model retrain cadence).
- perfusio.hybrid.train.retrain_online(new_x, new_y, model, likelihood, n_iter=50, lr=0.05)[source]¶
Incremental re-optimisation when new daily data arrive.
Uses Adam only (L-BFGS is overkill for a single new data point). The existing GP training data are not replaced; the model is updated in-place by calling set_train_data to append the new observations.
- Parameters:
- Return type:
None
Notes
strict=False is passed to set_train_data to allow appending to a different-length dataset.
- perfusio.hybrid.train.train_hybrid(train_x, train_y, model, likelihood, n_iter_lbfgs=50, n_iter_adam=200, lr_adam=0.05)[source]¶
Fit the hybrid GP residual model by maximising the MLL.
Attempts L-BFGS first; falls back to Adam if L-BFGS produces NaN loss.
- Parameters:
train_x (Tensor) – Training inputs, shape (N, d).
train_y (Tensor) – Residual targets (observed − mechanistic), shape (N,) or (N, n_tasks).
model (ExactGP) – GPyTorch ExactGP model.
likelihood (Likelihood) – Corresponding GPyTorch likelihood.
n_iter_lbfgs (int) – Number of L-BFGS steps.
n_iter_adam (int) – Number of Adam fallback steps.
lr_adam (float) – Adam learning rate.
- Return type:
None
Multi-step-ahead trajectory forecasting with the hybrid model.
Exposes forecast_run() as the primary public entry point. It wraps the
HybridStateSpaceModel MC rollout and returns
structured outputs that can be directly consumed by the BED OFV function and
the visualisation layer.
References
Gadiyar et al. (2026), §3.2 (3-step-ahead horizon for BED).
- perfusio.hybrid.forecast.forecast_run(hybrid, c0, controls, horizon=3, n_samples=200, seed=0, clone_id=None)[source]¶
Forecast species trajectories horizon steps ahead.
- Parameters:
hybrid (HybridStateSpaceModel) – Fitted HybridStateSpaceModel.
c0 (Tensor) – Initial state, shape (n_species,).
controls (Tensor) – Control sequence to apply. Shape (horizon, n_controls). If shape (n_controls,) is given, the same controls are applied at every step.
horizon (int) – Number of future days to predict. Default 3 (Gadiyar §3.2).
n_samples (int) – MC rollout samples for uncertainty quantification.
seed (int) – Random seed.
clone_id (int | None) – Clone ID for embedding lookup.
- Returns:
A dict with keys:
"days"— shape(horizon,)— day indices (1-indexed from c0)."mean"— shape(horizon, n_species)— posterior mean."q10"— shape(horizon, n_species)— 10th percentile."q50"— shape(horizon, n_species)— 50th percentile (median)."q90"— shape(horizon, n_species)— 90th percentile.
- Return type:
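The MC rollout behind these outputs can be sketched in plain Python. Here step_sample is a hypothetical stochastic one-step model, not the library API:

```python
import random
import statistics

def rollout_quantiles(step_sample, c0, controls, horizon=3, n_samples=200, seed=0):
    """Toy MC rollout for uncertainty quantification. `step_sample(rng, c, u)`
    is a hypothetical stochastic one-step model returning the next state."""
    rng = random.Random(seed)
    paths = []                       # paths[s][t] = state (list) at step t
    for _ in range(n_samples):
        c, path = list(c0), []
        for t in range(horizon):
            c = step_sample(rng, c, controls[t])
            path.append(list(c))
        paths.append(path)

    def summarise(reduce):
        # Apply a reducer over samples, per (step, species) cell.
        return [[reduce([p[t][k] for p in paths]) for k in range(len(c0))]
                for t in range(horizon)]

    def quantile(level):
        return lambda xs: sorted(xs)[int(level * (len(xs) - 1))]

    return {"mean": summarise(statistics.fmean),
            "q10": summarise(quantile(0.10)),
            "q50": summarise(quantile(0.50)),
            "q90": summarise(quantile(0.90))}
```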
Bayesian Experimental Design¶
BoTorch acquisition function factory for Bayesian Experimental Design.
Provides build_acquisition(), a single entry-point that constructs any of
the 11 acquisition functions supported by perfusio:
- Single-objective (analytic): PI, EI, LogEI, UCB
- Single-objective (Monte Carlo): qEI, qLogEI, qUCB
- Multi-objective (Monte Carlo): qEHVI, qNEHVI, qNParEGO
Constrained variants are automatically applied when constraints are passed.
References
Balandat, M., et al. (2020). BoTorch: A Framework for Efficient Monte-Carlo Bayesian Optimization. NeurIPS.
- perfusio.bed.acquisitions.build_acquisition(name, model, best_f=None, beta=0.2, ref_point=None, partitioning=None, sampler=None, constraints=None, **kwargs)[source]¶
Construct a BoTorch acquisition function by name.
- Parameters:
name (str) – One of: PI, EI, LogEI, UCB, qEI, qLogEI, qUCB, qEHVI, qNEHVI, qNParEGO.
model (Model) – Fitted BoTorch / GPyTorch surrogate model.
best_f (float | None) – Incumbent best observed value. Required for PI, EI, LogEI, qEI, qLogEI.
beta (float) – UCB exploration parameter. Required for UCB, qUCB.
ref_point (list[float] | None) – Reference point for hypervolume-based acquisitions (qEHVI, qNEHVI). Should be strictly dominated by all feasible points.
partitioning (Any) – Pre-computed box decomposition (NondominatedPartitioning). Required for qEHVI.
sampler (Any) – MC sampler (IIDNormalSampler or SobolQMCNormalSampler). Defaults to SobolQMCNormalSampler(512) for MC acquisitions.
constraints (list[Callable[[Tensor], Tensor]] | None) – Optional list of constraint callables c(X) <= 0 to enforce. Applied as soft-constraint penalties where available.
**kwargs (Any) – Additional keyword arguments forwarded to the acquisition constructor.
- Return type:
botorch.acquisition.AcquisitionFunction
- Raises:
ValueError – If name is not a recognised acquisition type.
Objective function values (OFVs) for Bayesian Experimental Design.
Implements Gadiyar et al. (2026) Eq. 7:
\[\mathrm{OFV}(\mathbf{u}) = -\sum_{j=1}^{h} \left( \hat{y}_{t+j}(\mathbf{u}) - y^{\text{target}} \right)^2\]
where \(\hat{y}_{t+j}\) is the j-step-ahead hybrid model prediction (50th-percentile / mean) for the chosen objective species (e.g. VCD, Titer, viability).
The OFV is used to evaluate candidate control setpoints \(\mathbf{u}\) proposed by the acquisition function.
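The scoring idea can be sketched as a negative weighted sum of squared errors. This is a plain-Python stand-in, not the library implementation:

```python
def target_tracking_ofv(y_hat, targets, weights=None):
    """Negative weighted SSE over the forecast horizon (sketch of the Eq. 7 idea).
    y_hat: list of per-day forecasts, each a list over tracked species.
    Higher is better; a perfect forecast scores 0."""
    weights = weights or [1.0] * len(targets)
    return -sum(w * (y - t) ** 2
                for day in y_hat
                for y, t, w in zip(day, targets, weights))
```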
References
Gadiyar et al. (2026), §3.2, Eq. (7).
- class perfusio.bed.objectives.MultiObjectiveOFV(hybrid, objectives, horizon=3, n_samples=100, seed=42)[source]¶
Bases: object
Vector-valued OFV for multi-objective BED.
Returns one OFV per objective (e.g. maximise titer AND minimise ammonium). Used with multi-objective acquisition functions (qEHVI, qNEHVI, qNParEGO).
- Parameters:
hybrid (HybridStateSpaceModel) – Fitted hybrid model.
objectives (list[tuple[int, float, float]]) – List of (species_index, target, direction) tuples where direction is +1 for maximisation and -1 for minimisation (the sign is applied to the OFV before passing to the BoTorch acqf).
horizon (int) – Prediction horizon.
n_samples (int) – MC rollout samples.
seed (int) – Random seed.
- class perfusio.bed.objectives.TargetSpec(species_index, target, weight=1.0)[source]¶
Bases: object
Specification for a single tracked species target.
- class perfusio.bed.objectives.TargetTrackingOFV(targets, hybrid=None, horizon=3, n_samples=100, seed=42)[source]¶
Bases: object
3-step-ahead squared-error OFV from Gadiyar et al. (2026) Eq. 7.
- Parameters:
hybrid (HybridStateSpaceModel | None) – Fitted hybrid model for forecasting.
targets (list[TargetSpec]) – List of TargetSpec objects — one per tracked species.
horizon (int) – Prediction horizon in days. Default 3 (Gadiyar §3.2).
n_samples (int) – MC rollout samples. Default 100 (fast for BED inner loop).
seed (int) – Fixed random seed for reproducible OFV evaluations.
Examples
>>> from perfusio.bed.objectives import TargetTrackingOFV, TargetSpec >>> # assume `hybrid` is a fitted HybridStateSpaceModel >>> ofv = TargetTrackingOFV( ... hybrid=hybrid, ... targets=[TargetSpec(0, target=30.0, weight=1.0)], # VCD → 30e6 cells/mL ... ) >>> # ofv.evaluate(c0, u_candidate) # returns scalar Tensor
- score_trajectories(Y)[source]¶
Score a batch of pre-computed trajectories against targets.
Useful for evaluating acquisition function surrogates directly on GP prediction samples without running the full hybrid model.
- Parameters:
Y (Tensor) – Pre-computed trajectories, shape (B, horizon, n_species).
- Returns:
OFV values, shape (B,) — higher is better (negative SSE).
- Return type:
Tensor
High-level BED policy — daily self-driving decision loop.
BEDPolicy orchestrates the complete self-driving step:
Receive current bioreactor state.
Evaluate the acquisition function over the design space.
Propose next setpoints (with --allow-write safety gate).
Log the decision and uncertainty to the audit trail.
This is the class called by the digital twin’s daily scheduler.
References
Gadiyar et al. (2026), §3.2 and Fig. 5.
- class perfusio.bed.policies.BEDDecision(timestamp, day, recommended_controls, acqf_value, acqf_name, current_state, forecast_mean=<factory>)[source]¶
Bases: object
A single decision made by the BED policy.
- Parameters:
- class perfusio.bed.policies.BEDPolicy(hybrid, design_space, acqf_name='qLogEI', allow_write=False, targets=None, n_restarts=10, n_raw_samples=512)[source]¶
Bases: object
Daily self-driving control policy.
- Parameters:
hybrid (object) – Fitted hybrid model.
design_space (DesignSpace) – Process design space with control bounds.
acqf_name (str) – Acquisition function to use. Default "qLogEI" (robust for noisy bioprocess observations).
allow_write (bool) – Safety gate: if False, recommendations are logged but NOT written to the bioreactor. Must be True for closed-loop operation.
targets (list[object] | None) – List of TargetSpec objects.
n_restarts (int) – optimize_acqf restarts.
n_raw_samples (int) – Sobol initialisation candidates.
Examples
>>> from perfusio.bed import BEDPolicy >>> # policy = BEDPolicy(hybrid, design_space, allow_write=False) >>> # decision = policy.decide(c_current, day=7)
- decide(c_current, day, surrogate_model, best_f=None, seed=None)[source]¶
Compute the recommended control setpoints for the next day.
- Parameters:
c_current (Tensor) – Current species state, shape (n_species,).
day (int) – Current culture day.
surrogate_model (Any) – Fitted BoTorch surrogate (wraps the hybrid GP).
best_f (float | None) – Best observed OFV so far. Required for EI-based acquisitions.
seed (int | None) – Random seed for deterministic optimisation.
- Returns:
Decision record including recommended controls and forecast.
- Return type:
- property history: list[BEDDecision]¶
All decisions made so far.
Pareto front computation and hypervolume indicator.
Provides:
compute_pareto_front(): Identify non-dominated points.
hypervolume(): Hypervolume dominated by the Pareto front w.r.t. a reference point.
These are used by the multi-objective BED loop to assess diversity and progress of the Pareto set.
References
Daulton, S., et al. (2020). Differentiable Expected Hypervolume Improvement for Parallel Multi-Objective Bayesian Optimization. NeurIPS.
- perfusio.bed.pareto.compute_pareto_front(Y)[source]¶
Return the indices of non-dominated points in Y.
Assumes maximisation for all objectives (negate objectives that should be minimised before calling).
- Parameters:
Y (Tensor) – Objective matrix, shape (N, n_objectives). Higher is better for all objectives.
- Returns:
Boolean mask of shape (N,) — True for non-dominated points.
- Return type:
Tensor
Examples
>>> import torch >>> from perfusio.bed.pareto import compute_pareto_front >>> Y = torch.tensor([[1.0, 2.0], [2.0, 1.0], [1.5, 1.5], [0.5, 0.5]]) >>> compute_pareto_front(Y) tensor([ True, True, True, False])
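A plain-Python reference implementation of the non-dominated check, consistent with the example above, might look like this (a sketch, not the library's Tensor version):

```python
def pareto_mask(Y):
    """Boolean mask of non-dominated rows, assuming maximisation (O(N^2) sketch)."""
    def dominates(a, b):
        # a dominates b if a is >= everywhere and strictly > somewhere.
        return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))
    return [not any(dominates(other, y) for other in Y) for y in Y]
```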
- perfusio.bed.pareto.hypervolume(Y, ref_point)[source]¶
Compute the hypervolume indicator for a set of points.
Uses the BoTorch Hypervolume class for efficiency and correctness.
- Parameters:
- Returns:
Hypervolume dominated by Y w.r.t. ref_point.
- Return type:
Acquisition function optimisation over the process design space.
Wraps BoTorch’s optimize_acqf with:
- Explicit lower/upper bounds from DesignSpace.
- Restarts from Sobol quasi-random initial candidates.
- Batch support for q > 1 (parallelised reactor experiments).
References
Balandat et al. (2020). BoTorch: A Framework for Efficient Monte-Carlo Bayesian Optimization. NeurIPS.
- perfusio.bed.search.optimise_acquisition(acqf, design_space, q=1, n_restarts=10, n_raw_samples=512, fixed_features=None, seed=None, dtype=torch.float64, device=None)[source]¶
Optimise acqf over the control design space.
- Parameters:
acqf (ba.AcquisitionFunction) – BoTorch acquisition function.
design_space (DesignSpace) – DesignSpace with control bounds.
q (int) – Batch size (number of concurrent recommendations). Use q > 1 only with MC acquisitions (qEI, qUCB, etc.).
n_restarts (int) – Number of restarts for optimize_acqf.
n_raw_samples (int) – Number of Sobol candidates for initialisation.
fixed_features (dict[int, float] | None) – Dict mapping feature indices to fixed values (e.g. fix temperature).
seed (int | None) – Random seed.
dtype (torch.dtype) – Tensor dtype. Default torch.float64.
device (torch.device | None) – Compute device. Default CPU.
- Returns:
(candidates, acqf_values) — the optimal candidate(s) (shape (q, n_controls)) and their acquisition values (shape (q,)).
- Return type:
tuple[Tensor, Tensor]
Simulator¶
CHO perfusion simulator — virtual ambr®250 bioreactor.
Provides a fully self-contained, stochastic in-silico CHO bioreactor that replicates the operating conditions described in Gadiyar et al. (2026) §3.3:
Clone X: Warburg metabolic switch (lactate consumption at Glc < 2 g/L)
Clone Y: no switch (always produces lactate)
24-run Box-Behnken training experiment generator
5-reactor closed-loop demonstration (Gadiyar Fig. 5)
The simulator implements the same BioreactorConnector
interface as the real OPC UA connector, so the digital twin and self-driving
loop can run identically against both the virtual and real hardware.
References
Gadiyar et al. (2026), §3.3 (in-silico validation).
- class perfusio.simulator.cho_perfusion.CHOSimulator(clone='CloneX', y0=None, controls=None, volume_L=0.25, noise=None, acceleration=1.0, seed=42)[source]¶
Bases: object
Virtual ambr®250 bioreactor implementing the BioreactorConnector interface.
Runs the CHO kinetic ODE in accelerated simulation time (up to 1000× real-time), applies measurement noise, and exposes the same async API as the OPC UA connector.
- Parameters:
clone (str) –
"CloneX"(Warburg switch) or"CloneY"(no switch).y0 (dict[str, float] | None) – Initial state dict. Defaults to
DEFAULT_Y0.controls (dict[str, float] | None) – Initial control setpoints. Updated via
write_setpoints().volume_L (float) – Reactor working volume [L]. Default 0.250 (ambr®250).
noise (NoiseModel | None) – Noise model. Defaults to
NoiseModel.acceleration (float) – Simulation speed multiplier vs. real-time. Default 1 (real-time). Set to 1440 for 1 simulated day per real minute.
seed (int) – Random seed.
Examples
>>> import asyncio >>> from perfusio.simulator import CHOSimulator >>> sim = CHOSimulator(clone="CloneX", seed=42) >>> asyncio.run(sim.read_sample(day=3))
- generate_box_behnken_experiment(n_days=28, seed=0)[source]¶
Generate a 24-run Box-Behnken training dataset.
Performs a Box-Behnken design over 4 key control factors:
- perfusion_rate: [0.5, 1.0, 1.5] vvd
- bleed_rate: [0.10, 0.15, 0.20] vvd
- temperature: [36.5, 37.0, 37.5] °C
- glucose_setpoint: [4.0, 5.0, 6.0] g/L
Design-of-Experiments (DoE) plan generators.
Provides:
- box_behnken(): Box-Behnken design for 3–7 factors.
- central_composite(): Central Composite Design (CCD).
- latin_hypercube(): Maximin Latin Hypercube (using SciPy’s QMC engine).
All functions return a numpy.ndarray of shape (n_runs, n_factors)
in the normalised space [-1, 1] unless scale_to_bounds is True.
References
Box, G. E. P., & Behnken, D. W. (1960). Some new three level designs for the study of quantitative variables. Technometrics, 2(4).
Montgomery, D. C. (2017). Design and Analysis of Experiments. 9th ed. Wiley. §14.
- perfusio.simulator.doe.box_behnken(n_factors, center_points=3)[source]¶
Generate a Box-Behnken design matrix.
- Parameters:
- Returns:
Design matrix, shape (n_runs, n_factors) in normalised [-1, 1].
- Return type:
np.ndarray
- Raises:
ValueError – If n_factors is not in [3, 7].
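The classic Box-Behnken construction (±1 on each factor pair, all other factors at the centre, plus centre-point replicates) can be sketched as:

```python
from itertools import combinations

def box_behnken(n_factors, center_points=3):
    """Box-Behnken design sketch: all (+/-1, +/-1) combinations per factor pair,
    remaining factors at 0, plus centre-point replicates."""
    runs = []
    for i, j in combinations(range(n_factors), 2):
        for a in (-1, 1):
            for b in (-1, 1):
                row = [0] * n_factors
                row[i], row[j] = a, b
                runs.append(row)
    runs.extend([0] * n_factors for _ in range(center_points))
    return runs
```

For 4 factors this gives C(4, 2) × 4 = 24 edge runs, matching the 24-run training design mentioned above (before centre points).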
- perfusio.simulator.doe.central_composite(n_factors, alpha='rotatable', center_points=(4, 4))[source]¶
Generate a Central Composite Design (CCD) matrix.
- Parameters:
- Returns:
Design matrix, shape (n_runs, n_factors) in normalised space. Factorial points at ±1; axial points at ±alpha_val; centre at 0.
- Return type:
np.ndarray
- perfusio.simulator.doe.latin_hypercube(n_runs, n_factors, seed=0, optimise=True)[source]¶
Generate a maximin Latin Hypercube Sample in [0, 1]^n_factors.
- Parameters:
- Returns:
LHC sample, shape (n_runs, n_factors) in [0, 1].
- Return type:
np.ndarray
Notes
Map to physical bounds externally:
x_phys = lo + (hi - lo) * x_lhc.
- perfusio.simulator.doe.scale_to_bounds(design, lo, hi, normalised=True)[source]¶
Scale a normalised (or [0,1]) design to physical bounds.
- Parameters:
design (ndarray) – Design matrix, shape (n_runs, n_factors).
lo (ndarray) – Lower bounds, shape (n_factors,).
hi (ndarray) – Upper bounds, shape (n_factors,).
normalised (bool) – If True, input is in [-1, 1]; else in [0, 1].
- Returns:
Scaled design in physical units, same shape.
- Return type:
np.ndarray
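The scaling rule can be sketched in plain Python (a list-based stand-in for the ndarray version):

```python
def scale_to_bounds(design, lo, hi, normalised=True):
    """Map a normalised ([-1, 1]) or unit ([0, 1]) design onto physical bounds."""
    def scale_row(row):
        # Shift [-1, 1] to [0, 1] first if needed, then interpolate lo..hi.
        return [l + (h - l) * ((x + 1.0) / 2.0 if normalised else x)
                for x, l, h in zip(row, lo, hi)]
    return [scale_row(row) for row in design]
```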
Noise model for the in-silico simulator.
Simulates typical analytical measurement uncertainty for a CHO perfusion bioreactor operating in an ambr®250 platform.
Calibrated to match Gadiyar et al. (2026) §3.3:
- VCD: 5% CV (coefficient of variation)
- Viability: 2% CV
- Metabolites (Glc, Gln, Glu, Lac, Amm, Pyr): 7% CV
- Titer: 8% CV
- Missing-data probability: 5% per species per day
- Measurement time jitter: ±1 hour (uniform)
References
Gadiyar et al. (2026), §3.3 — in-silico noise parameterisation.
- class perfusio.simulator.noise.NoiseModel(cv_by_species=<factory>, default_cv=0.07, missing_prob=0.05, jitter_hours=1.0, seed=42)[source]¶
Bases: object
Additive proportional Gaussian noise + missing-data model.
- Parameters:
cv_by_species (dict[str, float]) – Dict mapping species name → coefficient of variation (0–1). Species not listed use default_cv.
default_cv (float) – Fallback CV for unlisted species.
missing_prob (float) – Probability that a single (species, day) measurement is missing.
jitter_hours (float) – Half-width of uniform jitter on measurement time [h].
seed (int) – Random seed.
Examples
>>> from perfusio.simulator import NoiseModel >>> nm = NoiseModel(seed=0) >>> clean = {"VCD": 20.0, "Glc": 5.0} >>> noisy = nm.apply(clean, day=7)
Digital Twin¶
Online digital twin — real-time hybrid model + BED decision loop.
DigitalTwin is the central coordinator that:
Reads daily samples from a BioreactorConnector (hardware OPC UA or virtual ambr®250).
Appends new observations to the training set and online-retrains the hybrid model (retrain_online()).
Runs a 3-step-ahead forecast.
Queries the BEDPolicy for the next setpoints.
Checks predictive alarms via AlarmNotifier.
Writes setpoints back to the connector (gated by allow_write).
Records every event to the AuditLogger.
Usage — blocking (sync wrapper):
twin = DigitalTwin(connector=sim, hybrid=model, ...)
twin.step(day=7) # single step (tests / notebooks)
Usage — fully async:
await twin.run(n_days=28) # full 28-day run
Safety gate:
twin = DigitalTwin(..., allow_write=False) # read-only mode (default)
twin = DigitalTwin(..., allow_write=True) # closed-loop (operator must opt-in)
References
Gadiyar et al. (2026), §3.4 — Fig. 5.
Mione et al. (2024) — audit trail requirements.
- class perfusio.twin.digital_twin.DigitalTwin(connector, hybrid, design_space, bed_policy, notifier=None, log_dir=PosixPath('perfusio_audit'), run_id='default', allow_write=False, species_names=None, retrain_every=1)[source]¶
Bases: object
Online digital twin with BED-driven closed-loop control.
- Parameters:
connector (BioreactorConnector) – Bioreactor data connector (OPC UA, SQL, or virtual simulator).
hybrid (HybridStateSpaceModel) – Fitted HybridStateSpaceModel.
design_space (DesignSpace) – Process control design space.
notifier (AlarmNotifier | None) – Alarm notifier. Optional.
log_dir (Path | str) – Directory for audit logs. Defaults to ./perfusio_audit.
run_id (str) – Unique identifier for this run.
allow_write (bool) – Safety gate — if False (default), setpoints are NOT written back.
species_names (list[str] | None) – Ordered species names matching the model output.
retrain_every (int) – Online-retrain the hybrid GP every N days. Default 1 (daily).
- async run(n_days, interval_seconds=86400.0)[source]¶
Run the full digital twin for n_days asynchronously.
Immutable audit logger for the digital twin.
Produces two artefacts per run:
- audit_events.csv — human-readable event log (append-only).
- audit_events.parquet — machine-readable columnar store for analysis.
Records conform to the audit-trail requirements discussed in:
Mione, A., et al. (2024). Regulatory considerations for self-driving bioprocesses. Process Biochemistry.
The audit log is append-only by design. Past records are never modified.
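An append-only record writer matching the COLUMNS layout documented below might look like this (a sketch with a hypothetical function name; the real logger also mirrors events to Parquet):

```python
import csv
import datetime
import json

def append_event(stream, run_id, event_type, payload, day=0, user="system"):
    """Append one audit record in COLUMNS order; past rows are never rewritten."""
    csv.writer(stream).writerow([
        datetime.datetime.now(datetime.timezone.utc).isoformat(),
        run_id, event_type, day, user, json.dumps(payload),
    ])
```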
References
Mione et al. (2024).
- class perfusio.twin.audit.AuditLogger(log_dir, run_id, flush_every=10)[source]¶
Bases: object
Append-only CSV + Parquet audit recorder.
- Parameters:
Examples
>>> from pathlib import Path >>> from perfusio.twin.audit import AuditLogger >>> audit = AuditLogger(log_dir=Path("/tmp/audit"), run_id="run001") >>> audit.log("SETPOINT_CHANGE", {"perfusion_rate": 1.2}) >>> audit.close()
- COLUMNS = ['timestamp', 'run_id', 'event_type', 'day', 'user', 'payload_json']¶
Predictive constraint alarm system for the digital twin.
Raises alarms when the forecast trajectory is predicted to violate process constraints, before the violation actually occurs, giving operators lead time to intervene.
Alarm channels (configurable via AlarmConfig):
- Console / logging (always active)
- Email (SMTP, requires EMAIL_* environment variables)
- Slack webhook (requires SLACK_WEBHOOK_URL)
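The predictive scan can be sketched as follows (a plain-Python stand-in with a hypothetical function name; the real notifier also dispatches to the configured channels):

```python
def scan_forecast(forecast, thresholds, lead_days=3):
    """Scan a forecast for predicted constraint violations within lead_days.
    forecast: {species: [value at day+1, day+2, ...]}; returns a list of
    (day_offset, species, predicted_value, crossed_bound) tuples."""
    alarms = []
    for species, (lo, hi) in thresholds.items():
        for j, v in enumerate(forecast.get(species, [])[:lead_days], start=1):
            if v < lo or v > hi:
                alarms.append((j, species, v, lo if v < lo else hi))
    return alarms
```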
References
Gadiyar et al. (2026), §3.4 (constraint monitoring).
- class perfusio.twin.notifications.AlarmEvent(day, species, predicted_value, threshold, direction, lead_days)[source]¶
Bases: object
A single predictive alarm event.
- Parameters:
- class perfusio.twin.notifications.AlarmNotifier(thresholds, lead_days=3, channels=None)[source]¶
Bases: object
Predictive alarm dispatcher for the digital twin.
- Parameters:
thresholds (dict[str, tuple[float, float]]) – Dict mapping species name → (lo, hi) tuple. If a predicted value crosses lo (below) or hi (above) within lead_days, an alarm is raised.
lead_days (int) – How many days ahead to scan the forecast for violations.
channels (list[str] | None) – List of channel names: "log", "email", "slack".
Examples
>>> from perfusio.twin.notifications import AlarmNotifier >>> notifier = AlarmNotifier( ... thresholds={"Glc": (2.0, 8.0), "Amm": (0.0, 5.0)}, ... channels=["log"], ... )
Daily decision scheduler for the digital twin.
Runs a 24-hour async cadence: read sample → update model → forecast → decide setpoints → write setpoints → log audit trail.
The scheduler is designed to be cancellable (asyncio.CancelledError is
handled gracefully) and restartable.
References
Gadiyar et al. (2026), Fig. 5 — closed-loop timing diagram.
- class perfusio.twin.scheduler.DailyScheduler(step_fn, n_days, interval_seconds=86400.0, start_day=1)[source]¶
Bases: object
Async daily-cadence scheduler for the digital twin.
- Parameters:
step_fn (Callable[[int], Coroutine[Any, Any, Any]]) – Coroutine factory step_fn(day: int) -> None — the work to execute each simulated day. Raise StopIteration to end the run early.
n_days (int) – Total number of days to run.
interval_seconds (float) – Real-time interval between steps (default 86400 = 24 h real time). Set to a small value for testing (e.g. 1 s).
start_day (int) – First day number.
Examples
>>> import asyncio >>> from perfusio.twin.scheduler import DailyScheduler >>> async def my_step(day: int) -> None: ... print(f"Day {day}") >>> scheduler = DailyScheduler(my_step, n_days=3, interval_seconds=0) >>> asyncio.run(scheduler.run()) Day 1 Day 2 Day 3
Connectors¶
Abstract base class for bioreactor connectors.
All connectors — OPC UA, SQL, filesystem, and the ambr®250 emulator — must implement this interface. The digital twin calls only the methods defined here, so the twin is fully connector-agnostic.
The interface matches the BioreactorConnector
structural protocol, providing both inheritance and Protocol compatibility.
- class perfusio.connectors.base.BioreactorConnectorBase[source]¶
Bases: ABC
Abstract base class for bioreactor data connectors.
Concrete connectors should subclass this and implement all abstract methods. The async methods are designed for use with asyncio; blocking connectors should wrap their I/O in asyncio.get_event_loop().run_in_executor().
- abstractmethod async is_alive()[source]¶
Return True if the connection to the bioreactor is healthy.
- Return type:
- abstractmethod async write_setpoints(setpoints)[source]¶
Write control setpoints to the bioreactor.
- Parameters:
setpoints (dict[str, float]) – Control variable name → new value mapping.
- Return type:
None
Notes
Implementations MUST honour the allow_write gate at the DigitalTwin level. The connector itself does not enforce this gate.
OPC UA connector for real bioreactor hardware.
Wraps the asyncua async OPC UA client with:
Auto-reconnect with exponential backoff (max 5 retries).
Browse-tree caching (resolved node IDs are stored to avoid repeated discovery).
Recorded-replay mode (saves/loads OPC UA values from a JSONL log file for offline testing and audit trail reconstruction).
Certificate-based and username/password authentication.
Required OPC UA node IDs must be configured via the node_map dict.
References
IEC 62541 — OPC Unified Architecture.
- class perfusio.connectors.opcua_client.OPCUAConnector(url, node_map, username=None, password=None, cert_path=None, replay_log=None, max_retries=5, retry_delay_s=2.0)[source]¶
Bases: BioreactorConnectorBase
Async OPC UA client for real bioreactor hardware.
- Parameters:
url (str) – OPC UA server URL, e.g. "opc.tcp://192.168.1.10:4840".
node_map (dict[str, str]) – Dict mapping species/control name → OPC UA node-ID string (e.g. "ns=2;s=VCD").
username (str | None) – Optional username for Basic256Sha256 auth.
password (str | None) – Optional password.
cert_path (str | None) – Optional path to PEM client certificate.
replay_log (Path | str | None) – If provided, read values from this JSONL file (replay mode) instead of connecting to a real server. Each line is a JSON object with keys day, node_id, value.
max_retries (int) – Maximum reconnection attempts.
retry_delay_s (float) – Initial delay between reconnection attempts (doubles each retry).
Examples
>>> from perfusio.connectors import OPCUAConnector >>> conn = OPCUAConnector( ... url="opc.tcp://localhost:4840", ... node_map={"VCD": "ns=2;s=VCD", "Glc": "ns=2;s=Glucose"}, ... )
- async write_setpoints(setpoints)[source]¶
Write control setpoints to the bioreactor.
- Parameters:
setpoints (dict[str, float]) – Control variable name → new value mapping.
- Return type:
None
Notes
Implementations MUST honour the allow_write gate at the DigitalTwin level. The connector itself does not enforce this gate.
SQLAlchemy 2.0 data store for bioreactor run data.
Schema (8 tables):
- experiment — top-level experiment metadata
- reactor — individual reactor/vessel metadata
- sample — offline daily sample measurements
- setpoint_history — historical setpoint changes
- model_run — GP/hybrid model training runs
- forecast — multi-step forecasts stored for audit
- audit_event — structured audit events (mirror of CSV log)
- alarm_event — predictive alarms
Uses Alembic-compatible MetaData declarations. Run migrations with:
alembic upgrade head
References
Mione et al. (2024) — regulatory data integrity requirements.
- class perfusio.connectors.sql_store.SQLStore(db_url, experiment_name, reactor_name, echo=False)[source]¶
Bases: BioreactorConnectorBase
SQLAlchemy-backed connector and data archive.
- Parameters:
Examples
>>> store = SQLStore("sqlite:///test.db", "Exp1", "R1")
>>> store.create_tables()
- log_audit_event(run_id, event_type, payload, day=0, user='system')[source]¶
Insert an audit event record.
Filesystem connector — CSV / Parquet / Excel data store.
Reads bioreactor samples from flat-file archives (e.g. exported from an ambr®250 UNICORN workstation or a LIMS export). Supports write-back to a rolling CSV log for closed-loop applications.
Expected CSV format:
day,VCD,Via,Glc,Gln,Glu,Lac,Amm,Pyr,Titer
1,2.3,98.1,4.8,...
or long-form Parquet with columns day, species, value.
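Converting between the two layouts is a plain pandas operation. A sketch of turning the wide CSV format above into the long (day, species, value) form, using an inline string in place of a real file:

```python
import io
import pandas as pd

# Wide format: one column per species, as in the CSV example above.
csv = "day,VCD,Via,Glc\n1,2.3,98.1,4.8\n2,4.1,97.9,4.2\n"
wide = pd.read_csv(io.StringIO(csv))

# melt() unpivots the species columns into (day, species, value) rows,
# i.e. the long format accepted by the Parquet path.
long = wide.melt(id_vars="day", var_name="species", value_name="value")
print(long.shape)  # (6, 3)
```

The reverse direction is `long.pivot(index="day", columns="species", values="value")`.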
- class perfusio.connectors.filesystem.FilesystemStore(source_path, setpoint_log=None, wide_format=True)[source]¶
Bases: BioreactorConnectorBase
Read bioreactor data from CSV / Parquet / Excel files.
- Parameters:
source_path (Path | str) – Path to the source data file (CSV, Parquet, or Excel).
setpoint_log (Path | str | None) – Optional path to append setpoint changes as CSV rows.
wide_format (bool) – If True (default), source is wide-format (one column per species). If False, source is long-format with day, species, value columns.
Examples
>>> from pathlib import Path
>>> store = FilesystemStore(Path("data/run01.csv"))
ambr®250 virtual emulator — full-featured digital twin of the Sartorius ambr®250.
Wraps CHOSimulator with the
BioreactorConnectorBase interface, adding:
Configurable acceleration factor (up to 1000× real-time).
5-reactor parallel emulation (as in Gadiyar Fig. 5).
Identical public API to the real OPC UA connector.
This allows the digital twin to run closed-loop simulations with no code changes when switching from simulation to real hardware.
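The no-code-change swap works because the control loop depends only on the connector interface, not on a concrete class. A sketch of the pattern with a hypothetical stub connector standing in for Ambr250Emulator or OPCUAConnector (the stub's readings and the naive glucose rule are illustrative only):

```python
import asyncio

class StubConnector:
    """Hypothetical stand-in exposing the same async read/write surface."""
    def __init__(self):
        self.written = []

    async def read_sample(self, day):
        # Fake daily sample; a real connector would query hardware or a model.
        return {"VCD": 10.0 + day, "Glc": 4.0}

    async def write_setpoints(self, setpoints):
        self.written.append(setpoints)

async def control_loop(conn, days=3):
    # The loop only uses the shared interface, so any connector fits.
    for day in range(days):
        sample = await conn.read_sample(day)
        # Illustrative rule: raise perfusion when glucose runs low.
        rate = 1.5 if sample["Glc"] < 2.0 else 1.0
        await conn.write_setpoints({"perfusion_rate": rate})

conn = StubConnector()
asyncio.run(control_loop(conn))
```

Switching from simulation to hardware is then just constructing a different connector and passing it to the same loop.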
- class perfusio.connectors.ambr250_emulator.Ambr250Emulator(clone='CloneX', volume_L=0.25, controls=None, acceleration=1.0, seed=42)[source]¶
Bases: BioreactorConnectorBase
Virtual ambr®250 bioreactor emulator.
- Parameters:
clone (str) – "CloneX" or "CloneY".
volume_L (float) – Reactor working volume. Default 0.250 (ambr®250 max).
controls (dict[str, float] | None) – Initial control setpoints.
acceleration (float) – Speed multiplier vs. real-time. Default 1 (real-time). Set to 1440 for 1 simulated day per real minute.
seed (int) – Random seed for the noise model.
Examples
>>> import asyncio
>>> from perfusio.connectors.ambr250_emulator import Ambr250Emulator
>>> em = Ambr250Emulator(clone="CloneX", seed=0)
>>> sample = asyncio.run(em.read_sample(day=5))
>>> sample["VCD"]
- classmethod five_reactor_ensemble(clones=None, seed_start=0)[source]¶
Instantiate 5 virtual reactors (Gadiyar Fig. 5 set-up).
- Parameters:
- Returns:
Five fully independent virtual reactors.
- Return type:
list[Ambr250Emulator]
Metrics¶
Relative Root Mean Squared Error over a prediction horizon.
Implements Gadiyar et al. (2026) Equations 5–6:
\[\text{rRMSE}_{k,h} = \frac{1}{\sigma_k} \sqrt{\frac{1}{R} \sum_{r=1}^{R} \left( \hat{c}_{r,k,h} - c_{r,k,h} \right)^2}\]
where \(h\) is the prediction horizon step, \(k\) is the species index, \(R\) is the number of runs, \(\hat{c}\) and \(c\) are the predicted and measured values, and \(\sigma_k\) is the standard deviation of the measured species computed across all runs and all time steps (the “normalisation scale” of Gadiyar Eq. 6).
A truncated tail variant is also provided: only the first n_tail time
steps are excluded from normalisation to avoid division-by-small-variance
artefacts at inoculation.
References
Gadiyar et al. (2026), Eqs. (5)–(6).
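The normalised-RMSE idea is compact enough to hand-roll. A minimal sketch (illustrative only; perfusio.metrics.rrmse_horizon below is the library API, with its additional horizon/n_tail handling):

```python
import torch

def rrmse(true: torch.Tensor, pred: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """RMSE across runs at each (time step, species), scaled by sigma_k."""
    # true, pred: (n_runs, T, n_species)
    rmse = ((pred - true) ** 2).mean(dim=0).sqrt()        # (T, n_species)
    # Per-species std over all runs and time steps (the normalisation scale).
    sigma = true.reshape(-1, true.shape[-1]).std(dim=0)   # (n_species,)
    return rmse / (sigma + eps)

y = torch.randn(5, 10, 3, dtype=torch.float64)
print(rrmse(y, y + 0.1).shape)  # torch.Size([10, 3])
```

A perfect prediction gives 0 everywhere; a value of 1 means the error is as large as the natural spread of that species.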
- perfusio.metrics.rrmse.rrmse_horizon(true, pred, horizon=None, n_tail=0, eps=1e-08)[source]¶
Relative RMSE across runs, time steps and species.
- Parameters:
true (Tensor) – Ground-truth observations, shape (n_runs, T, n_species).
pred (Tensor) – Model predictions (mean), shape (n_runs, T, n_species).
horizon (int | None) – If given, only the first horizon time steps are evaluated. This allows head-to-head comparison of 1-step vs. 3-step models.
n_tail (int) – Number of leading time steps to exclude from the normalisation denominator (avoids division by near-zero early-run variance).
eps (float) – Small constant added to the normalisation denominator for numerical stability.
- Returns:
Shape (T, n_species) if horizon is None, else (horizon, n_species). Element [h, k] is the \(\text{rRMSE}_{k,h}\).
- Return type:
Tensor
Examples
>>> import torch
>>> from perfusio.metrics import rrmse_horizon
>>> y = torch.randn(5, 10, 9)
>>> yhat = y + 0.1 * torch.randn_like(y)
>>> rr = rrmse_horizon(y, yhat, horizon=3)
>>> rr.shape
torch.Size([3, 9])
Probabilistic forecast calibration metrics.
Provides:
- pi_coverage(): empirical coverage of prediction intervals.
- sharpness(): mean width of prediction intervals (lower = sharper).
- crps(): Continuous Ranked Probability Score (lower = better).
All functions accept batched inputs to facilitate per-species analysis.
References
Gneiting, T., & Raftery, A. E. (2007). Strictly Proper Scoring Rules, Prediction, and Estimation. JASA 102(477).
- perfusio.metrics.coverage.crps(true, samples)[source]¶
Energy-form Continuous Ranked Probability Score (CRPS).
Uses the energy score decomposition:
\[\text{CRPS} = \mathbb{E}[|X - y|] - \frac{1}{2}\mathbb{E}[|X - X'|]\]where \(X, X'\) are i.i.d. draws from the predictive distribution.
- Parameters:
- Returns:
Mean CRPS per species, shape (n_species,).
- Return type:
Tensor
Notes
An energy-score estimate is used rather than the exact Gaussian CRPS to support non-Gaussian predictive distributions from the MC rollout.
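The energy-form estimator above reduces to two sample averages. A hedged sketch for a single scalar observation (the library's crps() is the batched, per-species version):

```python
import torch

def crps_energy(samples: torch.Tensor, y: float) -> torch.Tensor:
    """Energy-form CRPS from S Monte Carlo draws for one observation y."""
    term1 = (samples - y).abs().mean()                      # E|X - y|
    pairwise = samples.unsqueeze(0) - samples.unsqueeze(1)  # (S, S) differences
    term2 = 0.5 * pairwise.abs().mean()                     # (1/2) E|X - X'|
    return term1 - term2

# A degenerate forecast concentrated exactly on the truth scores 0.
exact = torch.full((1000,), 2.0)
print(float(crps_energy(exact, 2.0)))  # 0.0
```

Shifting every draw away from the truth increases term1 without changing term2, so the score grows, which is the sense in which lower is better.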
- perfusio.metrics.coverage.pi_coverage(true, q_lo, q_hi)[source]¶
Empirical coverage of prediction intervals.
- Parameters:
- Returns:
Coverage probability per species, shape (n_species,). Value 1.0 = all observations within the PI.
- Return type:
Tensor
Examples
>>> import torch
>>> from perfusio.metrics import pi_coverage
>>> y = torch.tensor([[1.0, 2.0], [1.5, 2.5], [0.5, 1.5]])
>>> lo = torch.zeros_like(y)
>>> hi = torch.full_like(y, 3.0)
>>> pi_coverage(y, lo, hi)
tensor([1., 1.])
- perfusio.metrics.coverage.sharpness(q_lo, q_hi)[source]¶
Mean prediction interval width per species.
Multi-objective performance metrics.
Provides:
- hypervolume_indicator(): hypervolume dominated by a Pareto front.
- igd_plus(): Inverted Generational Distance+ (IGD+).
- epsilon_indicator(): Additive ε-indicator.
These metrics are used to assess the quality of the Pareto front returned by the multi-objective BED loop (qEHVI / qNEHVI / qNParEGO).
References
Ishibuchi, H., et al. (2015). Modified Distance Calculation in Generational Distance and Inverted Generational Distance. EMO.
Zitzler, E., Thiele, L., & Bader, J. (2010). On Set-Based Multiobjective Optimization. IEEE TEC 14(1).
- perfusio.metrics.multiobjective.epsilon_indicator(approx, reference)[source]¶
Additive ε-indicator.
The smallest ε such that the approximation set ε-dominates the reference set (i.e. every reference point has an approximation point within ε).
\[\varepsilon(A, R) = \max_{r \in R} \min_{a \in A} \max_k (r_k - a_k)\]
- perfusio.metrics.multiobjective.hypervolume_indicator(Y, ref_point)[source]¶
Hypervolume dominated by Y w.r.t. ref_point.
Delegates to
hypervolume().
- perfusio.metrics.multiobjective.igd_plus(approx, reference)[source]¶
Inverted Generational Distance+ (IGD+).
Measures the average modified distance from each reference-set point to the nearest point in the approximation set.
\[\text{IGD+}(A, R) = \frac{1}{|R|} \sum_{r \in R} \min_{a \in A} \left\| \max(r - a, 0) \right\|_2\]
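Both indicator formulas above are direct max/min reductions over pairwise differences. A NumPy sketch for maximisation-oriented fronts (illustrative; perfusio.metrics.multiobjective provides the library versions):

```python
import numpy as np

def epsilon_indicator(A: np.ndarray, R: np.ndarray) -> float:
    """Additive epsilon: max over r of min over a of max over objectives (r_k - a_k)."""
    diffs = R[:, None, :] - A[None, :, :]          # (|R|, |A|, n_objectives)
    return float(diffs.max(axis=2).min(axis=1).max())

def igd_plus(A: np.ndarray, R: np.ndarray) -> float:
    """Mean modified distance from each reference point to its nearest approximation point."""
    diffs = np.maximum(R[:, None, :] - A[None, :, :], 0.0)
    d = np.linalg.norm(diffs, axis=2)              # (|R|, |A|)
    return float(d.min(axis=1).mean())

front = np.array([[1.0, 3.0], [2.0, 2.0], [3.0, 1.0]])
print(epsilon_indicator(front, front), igd_plus(front, front))  # 0.0 0.0
```

An approximation set identical to the reference front scores 0 on both indicators; degrading the front (e.g. shifting all points down) increases both.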
Visualisation¶
Static Matplotlib reproductions of Gadiyar et al. (2026) Figures 4, 6, 7, 8.
Each function returns a matplotlib.figure.Figure and includes an
alt_text parameter to generate an accessibility string for the figure.
Figure map¶
Fig. 4: Training experiment trajectories (Box-Behnken 24 runs, Clone X + Y).
Fig. 6: 3-step-ahead hybrid model predictions vs. observations (all species).
Fig. 7: Multi-objective Pareto front (titer vs. VCV at day 14).
Fig. 8: Closed-loop control performance — VCD, Glc, Titer over 28 days.
References
Gadiyar et al. (2026), Figures 4, 6, 7, 8.
- perfusio.viz.static.fig4_training_trajectories(runs, species='VCD', clone_labels=None, alt_text=False)[source]¶
Reproduce Gadiyar Fig. 4 — training experiment trajectories.
- Parameters:
runs (list[dict[str, Any]]) – List of run dicts as returned by generate_box_behnken_experiment().
species (str) – Species to plot on the y-axis. Default "VCD".
clone_labels (list[str] | None) – Optional run-level labels for the legend.
alt_text (bool) – If True, prints an accessibility alt-text description.
- Return type:
matplotlib.figure.Figure
- perfusio.viz.static.fig6_model_predictions(true_traj, pred_mean, pred_q10, pred_q90, species_names, horizon=3, alt_text=False)[source]¶
Reproduce Gadiyar Fig. 6 — 3-step hybrid model predictions.
- Parameters:
true_traj (Any) – Tensor/array, shape (T, n_species).
pred_mean (Any) – Predictions, shape (T, n_species).
pred_q10 (Any) – Predictions, shape (T, n_species).
pred_q90 (Any) – Predictions, shape (T, n_species).
horizon (int) – Prediction horizon (drawn as shaded region).
alt_text (bool) – If True, prints an accessibility alt-text description.
- Return type:
matplotlib.figure.Figure
- perfusio.viz.static.fig7_pareto_front(pareto_titer, pareto_vcv, feasible_titer=None, feasible_vcv=None, alt_text=False)[source]¶
Reproduce Gadiyar Fig. 7 — Pareto front: titer vs. VCV.
- Parameters:
pareto_titer (Any) – Titer values on the Pareto front.
pareto_vcv (Any) – VCV (or VCD) values on the Pareto front.
feasible_titer (Any | None) – All feasible evaluated points (background scatter).
feasible_vcv (Any | None) – All feasible evaluated points (background scatter).
alt_text (bool) – If True, prints an accessibility alt-text description.
- Return type:
matplotlib.figure.Figure
- perfusio.viz.static.fig8_closed_loop_performance(days, vcd, vcd_target, glc, glc_target, titer, titer_target, alt_text=False)[source]¶
Reproduce Gadiyar Fig. 8 — closed-loop control performance.
- Parameters:
days (Any) – Culture days (x-axis).
vcd (Any) – Measured trajectories.
glc (Any) – Measured trajectories.
titer (Any) – Measured trajectories.
vcd_target (float) – Setpoint targets drawn as horizontal dashed lines.
glc_target (float) – Setpoint targets drawn as horizontal dashed lines.
titer_target (float) – Setpoint targets drawn as horizontal dashed lines.
alt_text (bool) – If True, prints an accessibility alt-text description.
- Return type:
matplotlib.figure.Figure
Plotly interactive figures for perfusio.
Functions¶
trajectory_figure() — animated multi-run trajectory explorer.
forecast_figure() — GP posterior with shaded uncertainty ribbon.
pareto_scatter() — interactive Pareto front scatter.
acquisition_surface() — 2-D acquisition function heat-map.
- perfusio.viz.interactive.acquisition_surface(x1, x2, acq_values, x1_label='Perfusion rate (vvd)', x2_label='Bleed rate (vvd)', title='Acquisition function surface')[source]¶
2-D acquisition function heat-map as an interactive contour plot.
- Parameters:
x1 (ndarray) – 1-D arrays of grid values (will be meshed internally).
x2 (ndarray) – 1-D arrays of grid values (will be meshed internally).
acq_values (ndarray) – 2-D array of acquisition values, shape (len(x1), len(x2)).
x1_label (str) – Axis labels.
x2_label (str) – Axis labels.
title (str) – Figure title.
- Return type:
plotly.graph_objects.Figure
- perfusio.viz.interactive.forecast_figure(days, mean, q10, q90, observed_days=None, observed_values=None, species='VCD', title='3-step forecast')[source]¶
Plotly figure with GP mean and 80% prediction interval ribbon.
- Parameters:
mean (ndarray) – Model predictions, 1-D arrays.
q10 (ndarray) – Model predictions, 1-D arrays.
q90 (ndarray) – Model predictions, 1-D arrays.
observed_days (list[int] | ndarray | None) – Optional measured data to overlay.
observed_values (ndarray | None) – Optional measured data to overlay.
species (str) – Species label.
title (str) – Figure title.
- Return type:
plotly.graph_objects.Figure
- perfusio.viz.interactive.pareto_scatter(titer, vcv, is_pareto=None, hover_text=None, title='Pareto front — Titer vs. VCV')[source]¶
Interactive Pareto front scatter plot.
- Parameters:
- Return type:
plotly.graph_objects.Figure
Chemistry Utilities¶
Discrete mass-balance implementation for perfusion bioreactors.
Implements Equations 2–4 of Gadiyar et al. (2026) in vectorised, batch-aware PyTorch code. The fundamental relation is the species mass balance:
\[\frac{d (V c_k)}{dt} = u_{f,k} - u_{b,k} - u_{p,k} + V R_k\]
where
\(c_k\) is the concentration of species \(k\) [g L⁻¹ or mmol L⁻¹],
\(V\) is the working volume [L],
\(u_{f,k}\) is the molar/mass feed rate [mmol h⁻¹ or g h⁻¹],
\(u_{b,k}\) is the bleed removal rate,
\(u_{p,k}\) is the permeate (harvest) removal rate,
\(R_k\) is the volumetric net production rate [mmol L⁻¹ h⁻¹ or g L⁻¹ h⁻¹].
In the discrete daily step form (Eq. 3 of the paper, rearranged):
\[c_{i+1,k} = c_{i,k} + \Delta t \left[ R_k + \frac{u_{f,k} - u_{b,k} - u_{p,k}}{V_i} - c_{i,k} \frac{\Delta V}{V_i \Delta t} \right]\]
This module computes \(R_k\) analytically from observed trajectories (for GP training) and also integrates \(c_{i+1}\) forward given \(R_k\) (for forecast rollout).
References
Gadiyar et al. (2026), Equations 2–4.
- class perfusio.chemistry.balances.DiscreteMassBalance(species, dt_hours=24.0)[source]¶
Bases: object
Vectorised implementation of Gadiyar et al. (2026), Eq. 3.
For each species \(k\):
\[R_k(s_i) = \frac{c_{i+1,k} - c_{i,k}}{\Delta t} - \frac{1}{V_i} \left[ u_{f,k} - u_{b,k} - u_{p,k} - c_{i,k} \frac{\Delta V}{\Delta t} \right]\]- Parameters:
Notes
Volume convention: For ambr®250 perfusion reactors in constant-volume mode, \(\Delta V / \Delta t = 0\) because the harvest rate equals perfusion_rate − bleed_rate. The term is included here for generality but will be zero in most practical use cases.
Units: All concentrations must be in the same base unit throughout a given trajectory (e.g. all in mmol L⁻¹). The balances do not convert units. VCD is conventionally in 10⁶ cells mL⁻¹; the rate \(R_\text{VCD}\) will therefore be in 10⁶ cells mL⁻¹ h⁻¹.
Examples
Round-trip test: rates_from_observations then step should reconstruct the original trajectory within floating-point tolerance.
>>> import torch
>>> from perfusio.chemistry.species import SpeciesRegistry
>>> reg = SpeciesRegistry.DEFAULT
>>> species = [reg["VCD"], reg["Glc"]]
>>> bal = DiscreteMassBalance(species, dt_hours=24.0)
>>> # Construct a noiseless 5-day trajectory
>>> T, K = 5, 2
>>> c = torch.ones(T, K, dtype=torch.float64)
>>> u_f = torch.zeros(T - 1, K, dtype=torch.float64)
>>> u_b = torch.zeros(T - 1, K, dtype=torch.float64)
>>> u_p = torch.zeros(T - 1, K, dtype=torch.float64)
>>> V = torch.full((T,), 0.250, dtype=torch.float64)
>>> R = bal.rates_from_observations(c, V, u_f, u_b, u_p)
>>> c_recon = bal.step(c[0], R[0], V[0], u_f[0], u_b[0], u_p[0], 24.0)
>>> torch.allclose(c_recon, c[1], atol=1e-10)
True
- rates_from_observations(c, V, u_f, u_b, u_p)[source]¶
Compute net volumetric rates \(R_k(s_i)\) from observations.
This implements Eq. 3 of Gadiyar et al. (2026) (rearranged to solve for \(R\)), vectorised over all species and all time steps.
- Parameters:
c (Tensor) – Concentration tensor, shape (T, n_species) [same unit as the corresponding Species].
V (Tensor) – Working volume tensor, shape (T,) [L].
u_f (Tensor) – Feed rate tensor, shape (T-1, n_species) [concentration-unit × L h⁻¹ = amount h⁻¹]. Set to zero for species with no direct feed stream.
u_b (Tensor) – Bleed removal rate, shape (T-1, n_species).
u_p (Tensor) – Permeate (harvest) removal rate, shape (T-1, n_species).
- Returns:
Shape (T-1, n_species) — one rate vector per time interval. Units are [concentration-unit h⁻¹].
- Return type:
Tensor
- Raises:
ValueError – If tensor shapes are inconsistent.
Notes
The derivation follows directly from Eq. 3 of the paper. Rearranging the discrete balance:
\[R_k = \frac{c_{i+1,k} - c_{i,k}}{\Delta t} - \frac{u_{f,k} - u_{b,k} - u_{p,k}}{V_i} + c_{i,k} \frac{V_{i+1} - V_i}{V_i \Delta t}\]The last term vanishes in constant-volume mode.
- rates_from_trajectory(traj, species_indices=None)[source]¶
Convenience wrapper: extract rates from a Trajectory.
- Parameters:
traj (Trajectory) – A full run trajectory.
species_indices (list[int] | None) – Column indices to extract from traj.species. If None, all columns are used.
- Returns:
Shape (T-1, n_selected_species).
- Return type:
Tensor
- step(c_t, R_t, V_t, u_f, u_b, u_p, dt=None)[source]¶
Integrate one discrete step forward (Eq. 2 of Gadiyar et al. 2026).
Computes \(c_{t+1}\) from \(c_t\), \(R_t\), and the mass-flow terms.
\[c_{t+1,k} = c_{t,k} + \Delta t \left[ R_k + \frac{u_{f,k} - u_{b,k} - u_{p,k}}{V_t} - c_{t,k} \frac{\Delta V}{V_t \Delta t} \right]\]- Parameters:
c_t (Tensor) – Concentration at time t, shape (n_species,) or (B, n_species) for a batch.
R_t (Tensor) – Net volumetric rate at time t, same shape as c_t.
V_t (Tensor) – Working volume at time t [L], scalar or shape (B,).
u_f (Tensor) – Feed rate at time t, same shape as c_t.
u_b (Tensor) – Bleed rate at time t, same shape as c_t.
u_p (Tensor) – Permeate rate at time t, same shape as c_t.
dt (float | None) – Override for the time step [h]. Defaults to dt_hours.
- Returns:
\(c_{t+1}\), same shape as c_t.
- Return type:
Tensor
Notes
In constant-volume mode, pass V_t as a scalar and dV=0. The term c_t * dV / (V_t * dt) vanishes automatically because the volume change is zero.
Examples
>>> import torch
>>> from perfusio.chemistry.species import SpeciesRegistry
>>> reg = SpeciesRegistry.DEFAULT
>>> bal = DiscreteMassBalance([reg["VCD"], reg["Glc"]], dt_hours=24.0)
>>> c = torch.tensor([10.0, 3.0], dtype=torch.float64)
>>> R = torch.tensor([0.5, -0.1], dtype=torch.float64)
>>> V = torch.tensor(0.250, dtype=torch.float64)
>>> u = torch.zeros(2, dtype=torch.float64)
>>> c_next = bal.step(c, R, V, u, u, u)
>>> c_next.shape
torch.Size([2])
- class perfusio.chemistry.balances.FlowRateCalculator(volume_L=0.25, dt_hours=24.0)[source]¶
Bases: object
Compute molar/mass flow rates for feed, bleed, and permeate streams.
Translates volumetric flow settings (in vessel volumes per day, vvd) into the per-species flow rates needed by DiscreteMassBalance.
- Parameters:
Notes
In constant-volume perfusion mode:
\[u_{b,k} = \text{bleed\_rate} \cdot V \cdot c_k / \Delta t\]\[u_{p,k} = (\text{perf\_rate} - \text{bleed\_rate}) \cdot V \cdot c_k \cdot \sigma_k / \Delta t\]where \(\sigma_k \in [0,1]\) is the sieving coefficient for species \(k\) through the alternating tangential flow (ATF) filter. For cells, \(\sigma_{\text{cell}} \approx 0\) (perfect retention, so the permeate term vanishes). For metabolites, \(\sigma \approx 1\) (free passage).
- compute_flows(c, perfusion_rate_vvd, bleed_rate_vvd, c_feed=None, sieving_coeffs=None)[source]¶
Compute (u_f, u_b, u_p) for a single time step.
- Parameters:
c (Tensor) – Current concentrations, shape (K,) [concentration units].
perfusion_rate_vvd (float | Tensor) – Perfusion rate [vessel volumes per day].
bleed_rate_vvd (float | Tensor) – Bleed rate [vessel volumes per day].
c_feed (Tensor | None) – Feed stream concentrations, shape (K,). None implies zero.
sieving_coeffs (Tensor | None) – Per-species sieving coefficient through ATF/TFF filter, shape (K,). 0 = fully retained (cells), 1 = freely passing (metabolites). Default: 1.0 for all (no retention).
- Returns:
(u_f, u_b, u_p), each of shape (K,), in [concentration-unit × L / h].
- Return type:
tuple[Tensor, Tensor, Tensor]
Notes
A harvest rate \(\text{harvest} = \text{perf} - \text{bleed}\) is applied only to freely passing species (\(\sigma_k = 1\)). Cell-retained species (\(\sigma_k = 0\)) only leave via the bleed stream.
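The sieving behaviour above can be worked through by hand. A sketch of the two flow formulas, assuming V in litres, rates in vvd, and dt = 24 h (names are illustrative; FlowRateCalculator.compute_flows is the library API):

```python
import torch

def flows(c, perf_vvd, bleed_vvd, sigma, V=0.250, dt=24.0):
    """Bleed and permeate flows per the constant-volume formulas above."""
    u_b = bleed_vvd * V * c / dt                       # bleed removes all species
    u_p = (perf_vvd - bleed_vvd) * V * c * sigma / dt  # permeate scaled by sieving
    return u_b, u_p

c = torch.tensor([10.0, 4.0])     # [VCD, Glc] in their own units
sigma = torch.tensor([0.0, 1.0])  # cells retained by the filter, glucose passes
u_b, u_p = flows(c, perf_vvd=1.0, bleed_vvd=0.15, sigma=sigma)
print(u_p[0].item())  # 0.0
```

With sigma = 0 the permeate term vanishes entirely, so biomass leaves only through the bleed, matching the note above.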
Species registry for perfusion bioreactor models.
This module defines the canonical set of process variables tracked by
perfusio. All modules reference species by their canonical short name
(e.g. "VCD") to ensure consistent ordering in tensors.
Biological context¶
The species list covers all variables measured in a typical ambr®250 perfusion run using a Cedex Bio HT or Nova BioProfile FLEX2 at-line analyser, plus diameter measured by a Vi-Cell or Cedex HiRes.
References
Gadiyar et al. (2026), §2.1 — “Process variables” table.
Notes
DCD (dead cell density) and Lys (lysed cell density) are derived from
VCD, total cell count, and viability; they are included as species for GP
modelling but are not independent degrees of freedom.
- class perfusio.chemistry.species.Species(short_name, full_name, unit, lo, hi, is_measured=True, is_control=False)[source]¶
Bases: object
Metadata for a single bioreactor species.
- Parameters:
short_name (str) – Canonical identifier used in tensor column names (e.g. "VCD").
full_name (str) – Human-readable name for axis labels.
unit (str) – Physical unit string for display (e.g. "10⁶ cells mL⁻¹").
lo (float) – Physically plausible lower bound.
hi (float) – Physically plausible upper bound.
is_measured (bool) – Whether this species is directly measured (True) or derived (False).
is_control (bool) – Whether this is a state variable that can be indirectly controlled via a feed (e.g. Glc via glucose setpoint).
Examples
>>> sp = Species(
...     short_name="VCD",
...     full_name="Viable Cell Density",
...     unit="10⁶ cells mL⁻¹",
...     lo=0.0,
...     hi=120.0,
... )
>>> sp.short_name
'VCD'
- class perfusio.chemistry.species.SpeciesEnum(*values)[source]¶
Enumeration of all canonical species short names.
- AMM = 'Amm'¶
- DCD = 'DCD'¶
- DIAM = 'Diam'¶
- GLC = 'Glc'¶
- GLN = 'Gln'¶
- GLU = 'Glu'¶
- LAC = 'Lac'¶
- LYS = 'Lys'¶
- PYR = 'Pyr'¶
- TITER = 'Titer'¶
- VCD = 'VCD'¶
- VCV = 'VCV'¶
- VIA = 'Via'¶
- class perfusio.chemistry.species.SpeciesRegistry(species)[source]¶
Bases: object
Singleton registry of all process variables.
This is the single source of truth for the canonical species list used throughout perfusio. Obtain the default registry via SpeciesRegistry.DEFAULT.
- DEFAULT¶
The library-wide default registry matching the paper’s species table.
- Type:
SpeciesRegistry
Examples
>>> reg = SpeciesRegistry.DEFAULT
>>> reg.names
['VCD', 'VCV', 'Via', 'Diam', 'Glc', 'Gln', 'Glu', 'Lac', 'Amm', 'Pyr', 'Titer', 'DCD', 'Lys']
>>> reg["VCD"].unit
'10⁶ cells mL⁻¹'
- DEFAULT: ClassVar[SpeciesRegistry] = <perfusio.chemistry.species.SpeciesRegistry object>¶
Perfusion volume bookkeeping for ambr®250-style constant-volume operation.
In perfusion culture, the reactor working volume is kept approximately constant by balancing the perfusion (feed) rate against the sum of the bleed rate and the harvest (permeate) rate:
\[F_{\text{feed}} = F_{\text{bleed}} + F_{\text{harvest}}\]
Hence:
\[F_{\text{harvest}} = F_{\text{feed}} - F_{\text{bleed}}\]
When a VCV setpoint is exceeded, the bleed rate is transiently increased to remove excess biomass (and cells that are retained by the ATF/TFF filter).
References
Gadiyar et al. (2026), §2 (“Perfusion process description”).
- perfusio.chemistry.volumes.bleed_trigger(vcv_current, vcv_setpoint, bleed_rate_base_vvd, bleed_rate_max_vvd, k_bleed=0.5)[source]¶
Compute a VCV-controlled bleed rate (proportional controller).
When
vcv_current > vcv_setpoint, the bleed rate is increased proportionally to the deviation. This mimics the simple feedback controller used in ambr®250 automated perfusion protocols.\[F_{\text{bleed}} = F_{\text{base}} + k_{\text{bleed}} \cdot \max(\text{VCV} - \text{VCV}_{\text{sp}}, 0)\]- Parameters:
vcv_current (float | Tensor) – Current volumetric cell volume [%].
vcv_setpoint (float) – Target VCV [%].
bleed_rate_base_vvd (float) – Base bleed rate [vvd] applied even at setpoint.
bleed_rate_max_vvd (float) – Maximum allowable bleed rate [vvd].
k_bleed (float) – Proportional gain [vvd per % VCV deviation].
- Returns:
Bleed rate [vvd], clamped to [bleed_rate_base_vvd, bleed_rate_max_vvd].
- Return type:
float | Tensor
Examples
>>> round(bleed_trigger(35.0, 30.0, 0.10, 0.40, k_bleed=0.05), 4) 0.35
- perfusio.chemistry.volumes.constant_volume_harvest_rate(perfusion_rate_L_per_h, bleed_rate_L_per_h)[source]¶
Return the harvest rate that maintains constant volume.
\[F_{\text{harvest}} = F_{\text{feed}} - F_{\text{bleed}}\]- Parameters:
- Returns:
Scalar — harvest rate [L h⁻¹].
- Return type:
Tensor
- Raises:
ValueError – If bleed_rate > perfusion_rate (physically impossible steady state).
Examples
>>> import torch
>>> F_h = constant_volume_harvest_rate(0.250/24, 0.250*0.15/24)
>>> round(float(F_h / (0.250 / 24)), 4)
0.85
- perfusio.chemistry.volumes.perfusion_volume_step(V_t, F_feed_L_per_h, F_bleed_L_per_h, F_harvest_L_per_h, dt_hours=24.0)[source]¶
Compute the reactor volume at time t+1 from volumetric flow rates.
For constant-volume operation, the caller should ensure that F_feed ≈ F_bleed + F_harvest, in which case this function returns a volume very close to \(V_t\).
- Parameters:
- Returns:
Scalar tensor with the volume at t+1 [L].
- Return type:
Tensor
Notes
\(V_{t+1} = V_t + \Delta t (F_{\text{feed}} - F_{\text{bleed}} - F_{\text{harvest}})\)
If the result is negative (physically impossible), a RuntimeError is raised. This should never occur under correct operating conditions.
Examples
>>> import torch
>>> V_next = perfusion_volume_step(0.250, 0.250/24, 0.250*0.15/24, 0.250*0.85/24)
>>> abs(float(V_next) - 0.250) < 1e-9
True
Configuration & Types¶
Pydantic-v2 configuration models for perfusio.
This module provides validated, strongly-typed configuration objects used throughout the library. All physical bounds are stored here so that they serve as a single source of truth for the design space.
Classes¶
- SpeciesBounds
Lower and upper measurement bounds for a single process variable.
- DesignSpace
Full description of the operating envelope for a perfusion experiment, including control variable ranges and target criteria.
- RunConfig
Runtime configuration (device, dtype, seeds, logging level).
- AlarmConfig
Thresholds that trigger predictive constraint-violation notifications.
- class perfusio.config.AlarmConfig(*, viability_warning=97.0, viability_critical=95.0, ammonium_warning=8.0, ammonium_critical=12.0, glucose_warning=1.0, glucose_critical=0.3, notification_channel='log', alarm_file=None)[source]¶
Bases: BaseModel
Thresholds that trigger predictive constraint-violation notifications.
- Parameters:
viability_warning (float) – Predicted viability [%] below which a WARNING alarm is raised.
viability_critical (float) – Predicted viability [%] below which a CRITICAL alarm is raised.
ammonium_warning (float) – Predicted ammonium [mmol L⁻¹] above which a WARNING is raised.
ammonium_critical (float) – Critical ammonium threshold.
glucose_warning (float) – Predicted glucose [g L⁻¹] below which a WARNING is raised (impending starvation).
glucose_critical (float) – Critical glucose starvation threshold.
notification_channel (Literal['log', 'file', 'email', 'slack']) – Where to send alarms. Supported: "log" (structlog), "file". "email" and "slack" require environment variable configuration.
alarm_file (Path | None) – Path to append alarm records when notification_channel == "file".
Notes
This class deliberately does not support email or Slack credentials inline. Provide them via environment variables PERFUSIO_SMTP_* or PERFUSIO_SLACK_WEBHOOK_URL, respectively.
- model_config = {}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- notification_channel: Literal['log', 'file', 'email', 'slack']¶
- class perfusio.config.ControlBounds(*, lo, hi, unit, description='', hard_lower=True, hard_upper=True)[source]¶
Bases: BaseModel
Bounds on a single manipulated (control) variable.
- Parameters:
lo (float) – Minimum allowable setpoint.
hi (float) – Maximum allowable setpoint.
unit (str) – Physical unit of the control variable.
description (str) – Human-readable name.
hard_lower (bool) – If True, the optimizer will never suggest values below lo. Used to encode cell-death constraints (e.g. stir < 700 rpm kills cells).
hard_upper (bool) – Analogous upper hard constraint.
Notes
Hard constraints are encoded as bounds passed to botorch.optim.optimize_acqf(), not as penalty terms in the acquisition function. Penalty-based approaches violate the BoTorch API and can produce numerical artefacts.
- model_config = {}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
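The bounds encoding described in the note above amounts to stacking the (lo, hi) pairs into the 2 × d tensor that botorch.optim.optimize_acqf expects. A sketch, with a plain dict of tuples as a hypothetical stand-in for ControlBounds objects (DesignSpace.bounds_tensor below is the library helper):

```python
import torch

# Hypothetical control bounds: name -> (lo, hi), in vvd.
control_bounds = {"perfusion_rate": (0.5, 2.0), "bleed_rate": (0.05, 0.30)}

# Row 0 = lower bounds, row 1 = upper bounds, one column per control.
lo = torch.tensor([b[0] for b in control_bounds.values()], dtype=torch.float64)
hi = torch.tensor([b[1] for b in control_bounds.values()], dtype=torch.float64)
bounds = torch.stack([lo, hi])  # shape (2, n_controls)
print(bounds.shape)  # torch.Size([2, 2])
```

The resulting tensor is passed directly as the `bounds` argument of optimize_acqf, so the optimizer can never propose setpoints outside the hard envelope.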
- class perfusio.config.DesignSpace(*, name, control_bounds=<factory>, species_bounds=<factory>, viability_min=95.0, vcv_target=None, titer_target=None)[source]¶
Bases: BaseModel
Full description of the operating envelope for a perfusion experiment.
This object encodes both the measurable state bounds and the control variable ranges that are passed to the Bayesian Experimental Design optimizer. It is the single source of truth for all optimize_acqf calls.
- Parameters:
name (str) – Experiment name (used in audit logs and figure titles).
control_bounds (dict[str, ControlBounds]) – Mapping of control variable name → ControlBounds.
species_bounds (dict[str, SpeciesBounds]) – Mapping of species name → SpeciesBounds.
viability_min (float) – Minimum acceptable viability (%) for feasibility constraints. Default 95.0 matches the paper’s constraint.
vcv_target (float | None) – Target volumetric cell volume (%) for the single-objective VCV-tracking use case (Gadiyar et al. 2026, §3.5).
titer_target (float | None) – Optional target product concentration [mg L⁻¹].
Examples
>>> from perfusio.config import DesignSpace, ControlBounds, SpeciesBounds
>>> ds = DesignSpace(
...     name="ambr250_run_01",
...     control_bounds={
...         "perfusion_rate": ControlBounds(lo=0.5, hi=2.0, unit="vvd"),
...         "bleed_rate": ControlBounds(lo=0.05, hi=0.30, unit="vvd"),
...     },
...     species_bounds={
...         "VCD": SpeciesBounds(lo=0.0, hi=120.0, unit="1e6 cells/mL",
...                              description="Viable Cell Density"),
...     },
... )
- property bounds_tensor: tuple[Tensor, Tensor]¶
Return (lower, upper) tensors of shape (n_controls,) for BoTorch.
- Returns:
lower and upper bound tensors, dtype float64.
- Return type:
tuple[Tensor, Tensor]
- control_bounds: dict[str, ControlBounds]¶
- model_config = {}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- species_bounds: dict[str, SpeciesBounds]¶
- class perfusio.config.RunConfig(*, seed=None, device='cpu', dtype='float64', n_mc_samples=256, n_restarts=10, raw_samples=512, log_level='INFO', allow_write=False)[source]¶
Bases: BaseModel
Runtime configuration for perfusio computations.
- Parameters:
seed (int | None) – Global random seed. None for non-deterministic execution.
device (str) – Torch device string ("cpu", "cuda:0", …). Defaults to "cpu" to ensure CI compatibility without GPUs.
dtype (Literal['float32', 'float64']) – Torch dtype. Defaults to torch.float64 (required for numerical stability of GP marginal log-likelihood optimisation — never downcast).
n_mc_samples (int) – Number of Monte Carlo samples for uncertainty propagation in rollout. Default 256 (adequate for 80% PI; increase to 1024 for publication).
n_restarts (int) – Number of random restarts for acquisition function optimisation. Default 10.
raw_samples (int) – Initial Sobol samples for warm-starting acquisition optimisation. Default 512.
log_level (Literal['DEBUG', 'INFO', 'WARNING', 'ERROR']) – structlog log level. One of "DEBUG", "INFO", "WARNING", "ERROR".
allow_write (bool) – If False (default), the OPC UA connector operates in read-only mode and will raise PermissionError on any write_setpoints call. Must be explicitly set to True to enable hardware writes.
Notes
dtype = torch.float64 is a hard requirement for the GP marginal log-likelihood optimisation. The LBFGS optimiser with strong-Wolfe line search becomes numerically unstable in float32 for datasets with more than ~100 training points.
- dtype: Literal['float32', 'float64']¶
- log_level: Literal['DEBUG', 'INFO', 'WARNING', 'ERROR']¶
- make_generator()[source]¶
Create a seeded torch.Generator, or None if seed is None.
- Return type:
torch.Generator or None
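The seeded-or-None pattern can be sketched without torch; here random.Random stands in for torch.Generator (illustration only, not the real implementation):

```python
import random
from typing import Optional

def make_generator(seed: Optional[int]) -> Optional[random.Random]:
    """Stand-in for RunConfig.make_generator: return a seeded generator,
    or None when seed is None (non-deterministic execution)."""
    if seed is None:
        return None
    return random.Random(seed)

g1, g2 = make_generator(42), make_generator(42)
assert make_generator(None) is None
assert g1.random() == g2.random()  # same seed, same stream
```

Passing the resulting generator explicitly (rather than seeding global state) keeps sampling reproducible per run, which is the point of a global seed in RunConfig.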
- model_config = {}¶
Configuration for the model; should be a dictionary conforming to pydantic's ConfigDict.
- property torch_device: device¶
Parsed torch.device.
- property torch_dtype: dtype¶
Parsed torch.dtype.
- class perfusio.config.SpeciesBounds(*, lo, hi, unit, description='')[source]¶
Bases:
BaseModel
Lower and upper physical bounds for a single measured species.
- Parameters:
lo (float) – Lower physical bound.
hi (float) – Upper physical bound.
unit (str) – Unit string for the measurement.
description (str) – Optional human-readable description. Default "".
Examples
>>> b = SpeciesBounds(lo=0.0, hi=100.0, unit="10^6 cells mL^-1",
...                   description="Viable Cell Density")
>>> b.lo
0.0
- model_config = {}¶
Configuration for the model; should be a dictionary conforming to pydantic's ConfigDict.
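A plain-dataclass sketch of the bounds container, with an ordering check and a hypothetical containment helper added for illustration (the real SpeciesBounds is a pydantic BaseModel and may validate differently):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SpeciesBoundsSketch:
    """Illustrative stand-in for perfusio.config.SpeciesBounds."""
    lo: float
    hi: float
    unit: str
    description: str = ""

    def __post_init__(self):
        # Guard against inverted bounds; a sensible validation one would
        # expect, though not confirmed by the docs above.
        if not self.lo < self.hi:
            raise ValueError(f"lo ({self.lo}) must be strictly below hi ({self.hi})")

    def contains(self, value: float) -> bool:
        # Hypothetical helper: is a measurement inside the physical envelope?
        return self.lo <= value <= self.hi

b = SpeciesBoundsSketch(lo=0.0, hi=100.0, unit="10^6 cells mL^-1")
print(b.contains(18.3))  # True
```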
State, StateBatch, and Trajectory data structures for perfusio.
These are the primary data containers that flow through the entire library. They are deliberately plain dataclasses (not Pydantic models) so that PyTorch operations can be applied directly without serialisation overhead.
All tensors are stored as torch.float64 on the CPU by default. When GPU
support is desired, call .to(device) on the container.
Design notes¶
- State holds the full bioreactor state at a single time step.
- StateBatch holds states for B reactors at a single time step.
- Trajectory holds the full time series for a single reactor run.
The distinction matters for the digital twin: the BED optimiser works on
StateBatch (one entry per candidate reactor), while the GP is trained on
Trajectory objects (one per historical run).
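The row/column layout behind this distinction can be sketched with plain lists standing in for the torch tensors (toy data, not the real containers):

```python
# Stacking per-day State dicts for ONE reactor gives a Trajectory-style
# matrix of shape (T, n_species); stacking States for B reactors at ONE
# day would give a StateBatch-style matrix of shape (B, n_species).
species_names = ["VCD", "Via", "Glc"]

daily_states = [                                  # three State observations
    {"VCD": 10.0, "Via": 98.0, "Glc": 4.0},       # day 0
    {"VCD": 14.5, "Via": 97.5, "Glc": 3.6},       # day 1
    {"VCD": 18.3, "Via": 97.1, "Glc": 3.2},       # day 2
]

trajectory_matrix = [[s[n] for n in species_names] for s in daily_states]
assert len(trajectory_matrix) == 3                # T rows (days)
assert len(trajectory_matrix[0]) == 3             # n_species columns
```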
- class perfusio.states.State(day, species, controls, volume_L, reactor_id='R00', clone_id='unknown', run_id='unknown')[source]¶
Bases:
object
Bioreactor state at a single discrete time step.
All numeric fields are plain Python floats (or None for missing measurements), so that State objects can be serialised easily.
- Parameters:
day (int) – Process day (0-indexed integer; 0 = inoculation day).
species (dict[str, float]) – Mapping of species name → measured (or simulated) value. Missing measurements are represented as float("nan").
controls (dict[str, float]) – Mapping of control variable name → applied setpoint.
volume_L (float) – Reactor working volume in litres. For ambr®250, this is typically 0.250 L. Must be > 0.
reactor_id (str) – Unique string identifier for the reactor vessel (e.g. "R01").
clone_id (str) – Cell-line / clone identifier used by CloneRegistry.
run_id (str) – Experiment / run identifier.
Notes
The species dict uses the canonical species names defined in perfusio.chemistry.species; see that module for the full registry.
Examples
>>> s = State(
...     day=5,
...     species={"VCD": 18.3, "Via": 97.1, "Glc": 3.2, "Amm": 4.1},
...     controls={"perfusion_rate": 1.0, "bleed_rate": 0.15},
...     volume_L=0.250,
...     reactor_id="R01",
...     clone_id="CX",
...     run_id="EXP001",
... )
>>> s.day
5
- classmethod from_tensor(t, species_names, control_names, *, day, volume_L=0.25, reactor_id='R00', clone_id='unknown', run_id='unknown')[source]¶
Reconstruct a State from a flat tensor.
- Parameters:
t (Tensor) – Flat 1-D tensor as produced by to_tensor().
species_names (list[str]) – Species ordering used when the tensor was created.
control_names (list[str]) – Control ordering used when the tensor was created.
day (int) – Process day (not stored in the tensor to avoid dtype issues).
volume_L (float) – Working volume.
reactor_id (str) – Reactor ID.
clone_id (str) – Clone ID.
run_id (str) – Run ID.
- Return type:
State
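The to_tensor / from_tensor round trip can be sketched with plain lists in place of 1-D torch tensors. The flat layout assumed here (species values first, then controls) is an illustration, not a documented contract:

```python
species_names = ["VCD", "Via"]
control_names = ["perfusion_rate", "bleed_rate"]

def to_flat(species: dict, controls: dict) -> list:
    # Serialise in the fixed name order; day is carried separately
    # (as the docs note, to avoid mixing int and float dtypes).
    return [species[n] for n in species_names] + [controls[n] for n in control_names]

def from_flat(t: list) -> tuple:
    # Invert to_flat using the same orderings.
    k = len(species_names)
    species = dict(zip(species_names, t[:k]))
    controls = dict(zip(control_names, t[k:]))
    return species, controls

sp = {"VCD": 18.3, "Via": 97.1}
ct = {"perfusion_rate": 1.0, "bleed_rate": 0.15}
sp2, ct2 = from_flat(to_flat(sp, ct))
assert (sp2, ct2) == (sp, ct)  # lossless round trip
```

This is why both methods take the same species_names / control_names lists: the flat tensor is meaningless without the ordering used to create it.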
- class perfusio.states.StateBatch(day, species, controls, volume_L, reactor_ids, clone_ids, run_ids, species_names, control_names)[source]¶
Bases:
object
Bioreactor states for B reactors at a single time step.
- Parameters:
day (int) – Shared process day.
species (torch.Tensor) – Tensor of shape (B, n_species); rows are reactors, columns are species in canonical order.
controls (torch.Tensor) – Tensor of shape (B, n_controls).
volume_L (torch.Tensor) – Tensor of shape (B,) with working volumes in litres.
reactor_ids (list[str]) – List of length B with reactor identifiers.
clone_ids (list[str]) – List of length B with clone identifiers.
run_ids (list[str]) – List of length B with run identifiers.
species_names (list[str]) – Column names for the species tensor.
control_names (list[str]) – Column names for the controls tensor.
Examples
>>> import torch
>>> sb = StateBatch(
...     day=10,
...     species=torch.zeros(4, 11),
...     controls=torch.zeros(4, 6),
...     volume_L=torch.full((4,), 0.250),
...     reactor_ids=["R01", "R02", "R03", "R04"],
...     clone_ids=["CX"] * 4,
...     run_ids=["EXP001"] * 4,
...     species_names=["VCD", "VCV", "Via", "Diam", "Glc", "Gln",
...                    "Glu", "Lac", "Amm", "Pyr", "Titer"],
...     control_names=["perfusion_rate", "bleed_rate", "glucose_setpoint",
...                    "temperature", "agitation", "pyruvate_feed"],
... )
>>> sb.batch_size
4
- class perfusio.states.Trajectory(species, controls, volume_L, days, species_names, control_names, run_id='unknown', clone_id='unknown', reactor_id='R00', metadata=<factory>)[source]¶
Bases:
object
Full time-series data for a single bioreactor run.
A Trajectory is the primary input to GP training. Each row corresponds to one daily observation; the number of rows equals the run length in days.
- Parameters:
species (torch.Tensor) – Float tensor of shape (T, n_species) with measured species values. Missing observations are float("nan").
controls (torch.Tensor) – Float tensor of shape (T, n_controls) with applied setpoints.
volume_L (torch.Tensor) – Float tensor of shape (T,) with reactor working volumes.
days (torch.Tensor) – Integer tensor of shape (T,) with process days (0-indexed).
species_names (list[str]) – Column names for the species tensor.
control_names (list[str]) – Column names for the controls tensor.
run_id (str) – Unique run identifier.
clone_id (str) – Clone / cell-line identifier.
reactor_id (str) – Reactor vessel identifier.
metadata (dict[str, str]) – Arbitrary string → string metadata (e.g. medium lot, operator).
Notes
All tensors should be on the same device and have the same dtype. Use to() to move the trajectory to a different device.
Examples
>>> import torch
>>> traj = Trajectory(
...     species=torch.randn(20, 11),
...     controls=torch.ones(20, 6),
...     volume_L=torch.full((20,), 0.250),
...     days=torch.arange(20),
...     species_names=["VCD", "VCV", "Via", "Diam", "Glc", "Gln",
...                    "Glu", "Lac", "Amm", "Pyr", "Titer"],
...     control_names=["perfusion_rate", "bleed_rate", "glucose_setpoint",
...                    "temperature", "agitation", "pyruvate_feed"],
...     run_id="EXP001",
...     clone_id="CX",
...     reactor_id="R01",
... )
>>> traj.n_days
20
- get_species(name)[source]¶
Return a 1-D tensor of values for species name.
- Parameters:
name (str) – Species name.
- Returns:
Shape (T,).
- Return type:
Tensor
- slice_days(start, end)[source]¶
Return a sub-trajectory for days in [start, end).
- Parameters:
start (int) – First day (inclusive).
end (int) – Last day (exclusive).
- Return type:
Trajectory
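The half-open [start, end) convention can be sketched over row-aligned arrays (plain lists standing in for the trajectory's torch tensors):

```python
days = list(range(20))                        # 0-indexed process days
species_rows = [[float(d)] for d in days]     # toy (T, 1) species matrix

def slice_days(start: int, end: int):
    """Keep rows whose day lies in [start, end), mirroring the
    documented behaviour of Trajectory.slice_days."""
    idx = [i for i, d in enumerate(days) if start <= d < end]
    return [days[i] for i in idx], [species_rows[i] for i in idx]

d, s = slice_days(5, 8)
assert d == [5, 6, 7]                         # end day is excluded
```

The half-open convention means slice_days(0, k) and slice_days(k, T) partition a run with no overlap, which is convenient for train/test splits in time.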
- species_index(name)[source]¶
Return the column index for name in the species tensor.
- Parameters:
name (str) – Species name; must be in species_names.
- Return type:
int
- Raises:
KeyError – If name is not a registered species.
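A minimal sketch of the lookup, assuming it is a positional search over species_names (note list.index alone would raise ValueError, so the KeyError documented above is translated explicitly):

```python
species_names = ["VCD", "VCV", "Via", "Glc"]

def species_index(name: str) -> int:
    """Column lookup mirroring Trajectory.species_index."""
    try:
        return species_names.index(name)
    except ValueError:
        raise KeyError(f"{name!r} is not a registered species") from None

assert species_index("Via") == 2
```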
- to_input_tensor()[source]¶
Stack species + controls + day into a (T, d) tensor.
- Returns:
Shape (T, n_species + n_controls + 1).
- Return type:
Tensor
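The resulting column layout can be sketched with nested lists in place of torch tensors (toy data; column order shown here, species then controls then day, is an assumption consistent with the method name):

```python
species = [[18.3, 97.1], [19.0, 96.8]]        # (T=2, n_species=2)
controls = [[1.0, 0.15], [1.0, 0.15]]         # (T=2, n_controls=2)
days = [5, 6]

# Concatenate columns per row; the day becomes one extra float column.
rows = [s + c + [float(d)] for s, c, d in zip(species, controls, days)]
assert len(rows) == 2
assert len(rows[0]) == 2 + 2 + 1              # n_species + n_controls + 1
```

This (T, d) matrix is the per-run design matrix the GP consumes during training.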