The Importance of Scaling Stirred Reactors in Terms of Development and Manufacturing Costs

The scale-up of a pharmaceutical process is a crucial step in the development of any drug product. Many active pharmaceutical ingredients (APIs) are manufactured in stirred reactors. Manufacturing-scale volumes range from several hundred to several thousand litres, depending on the type of process and the potency of the API. Before a process is run in a stirred reactor at manufacturing scale, it is developed in the laboratory, typically at volumes from millilitres to litres. From there it undergoes a stepwise scale-up to kilo scale, pilot scale and manufacturing scale.

Getting the scale-up right the first time is therefore particularly important: it can save costs on the order of several tens of thousands up to one million per individual process test. To achieve this, the scale-up strategy aims to maintain, as far as physically possible, the same process conditions and the associated fluid dynamics in the small-scale and the large-scale reactor. This helps ensure consistent product quality and yield across scales and is therefore essential for cost-effective process development and manufacturing.

Utilizing Computational Fluid Dynamics (CFD) for Simulation and Analysis of Stirred Reactors

Computational Fluid Dynamics (CFD) is a powerful tool used to simulate and analyze fluid flow and heat transfer in stirred reactors (Figure 1). By creating a virtual model of the reactor, CFD enables the prediction of process behaviour and optimization of operating conditions. CFD can provide insights into flow patterns, mixing efficiency, and heat distribution, allowing for the identification of potential issues and the improvement of reactor performance.

Figure 1: Reactor CFD simulation. Left and right: Slice through the reactor showing velocity vectors and axial velocity direction (red: positive, blue: negative). Middle: Top view of the vortex showing the mesomixing time map.

CFD is therefore particularly useful to support the scale-up of a pharmaceutical process in a stirred reactor. Since the aim of any scale-up is to maintain the same process conditions and fluid dynamics between the two reactors, CFD can provide the necessary insights to achieve this.

However, building a CFD model and analysing the results to derive actionable recommendations can be very time-consuming. This is particularly true when reactor configurations and operating conditions have to be changed several times to reach the best possible fluid dynamic similarity between the two reactors. To reduce the number of iterations needed, machine learning can be used to identify the most promising configuration before the time-consuming CFD work starts.

Integrating CFD and Machine Learning for Fast Prediction and Enhanced Efficiency

Integrating CFD and machine learning makes the scale-up of stirred reactors more efficient. For a defined design space consisting of a number of stirred reactor configurations, operating conditions and scales, CFD provides the data needed to train accurate predictive models. This combination not only allows for real-time optimization, leading to improved performance and faster scale-up. It also enables a new way of collaborating, in which teams make ad-hoc decisions on critical tasks such as selecting the most appropriate reactor configuration and the associated long-term availability planning.

Building a Machine Learning Dataset


Integrating CFD and machine learning requires know-how from both numerical simulation and AI. Our SimLab and AiLab have built an example training dataset from CliqScale.R results for stirred reactors such as the one shown in Figure 2.

Figure 2: Base reactor setup for ML-training dataset

Several reactor configuration parameters were varied: reactor diameter, liquid height, number of immersed stirrers, number of blades, pitch angle and stirring speed. The dataset consists of 768 samples built from the following configuration.

Dataset configuration:

  • Volume range: 1 L to 7000 L
  • 3 volumes per reactor
  • 4 stirring speeds (rpm) per volume
  • 3 blade counts [2, 3, 4]
  • 2 pitch angles [45°, 60°]
  • 1 baffle configuration [4 baffles]
  • Fluid: water at approx. 20 °C
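
As a rough illustration of how such a design space can be enumerated, the following Python sketch builds the full factorial of levels for one reactor. Only the blade counts, the pitch angles, the baffle count and the "3 volumes, 4 speeds" structure come from the list above. The fill volumes, the speeds and all names are hypothetical placeholders, and the sketch is not meant to reproduce the 768 samples of the actual dataset.

```python
from itertools import product

# Levels for ONE hypothetical reactor. Volumes and speeds are assumed values;
# blade counts, pitch angles and baffle count follow the list above.
FILL_VOLUMES_L = [2.0, 2.5, 3.0]
SPEEDS_RPM = [100, 200, 300, 400]
BLADE_COUNTS = [2, 3, 4]
PITCH_ANGLES_DEG = [45, 60]
BAFFLE_COUNTS = [4]

KEYS = ("volume_l", "rpm", "n_blades", "pitch_deg", "n_baffles")


def enumerate_cases():
    """Return one dict per CFD case as the full factorial of all levels."""
    levels = (FILL_VOLUMES_L, SPEEDS_RPM, BLADE_COUNTS,
              PITCH_ANGLES_DEG, BAFFLE_COUNTS)
    return [dict(zip(KEYS, combo)) for combo in product(*levels)]


cases = enumerate_cases()
print(len(cases))   # 3 * 4 * 3 * 2 * 1 = 72 cases for this one reactor
print(cases[0])
```

Each dictionary would then describe one CFD run. In practice the stirring speeds would probably be chosen per volume rather than shared across all volumes as assumed here.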

With our automated CFD postprocessing routine, cumulative distributions were extracted and the 50th percentile (median) was obtained as shown in Figure 3 with the dotted black lines.

Figure 3: Cumulative distributions and extraction of the 50th percentile (median) as shown with the dotted black line.
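
The extraction of the median from a cumulative distribution can be sketched as follows. This is a minimal illustration and not the automated postprocessing routine itself. It assumes that each CFD cell contributes one value weighted by its cell volume, and the function name and the sample numbers are hypothetical.

```python
def weighted_percentile(values, weights, q=50.0):
    """Smallest value at which the weighted cumulative distribution reaches q percent.

    No interpolation is done between neighbouring values, so for q=50 this
    returns the lower weighted median.
    """
    if len(values) != len(weights) or not values:
        raise ValueError("values and weights must be non-empty and of equal length")
    pairs = sorted(zip(values, weights))
    threshold = sum(weight for _, weight in pairs) * q / 100.0
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= threshold:
            return value
    return pairs[-1][0]


# Placeholder cell values (e.g. epsilon) and cell volumes, not real CFD data.
cell_values = [0.1, 0.4, 0.2, 0.9]
cell_volumes = [1.0, 1.0, 1.0, 1.0]
print(weighted_percentile(cell_values, cell_volumes))
```

With equal weights this reduces to an ordinary percentile of the cell values. The weighting matters when the mesh is refined locally, because small cells would otherwise be over-represented.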

The response variables are the median values of turbulent kinetic energy (k), turbulent dissipation rate (epsilon), strain rate (strainRate) and velocity magnitude (Umag). Combining the reactor features of one sample with its response variables yields a feature vector, and storing all feature vectors in one file yields the final machine learning dataset. A machine learning algorithm can then learn how the reactor features and the response variables relate to each other. Once trained, the model can be presented with a completely new set of reactor features and predicts the corresponding response variables within seconds. This allows fast screening of different reactor configurations and helps find the most appropriate operating conditions before running a full CFD simulation.
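
The assembly of feature vectors into one dataset file might look like the sketch below. The response names k, epsilon, strainRate and Umag are taken from the text. The feature column names, the CSV layout and the numbers are assumptions made for illustration only and are not results from the dataset.

```python
import csv
import io

# Hypothetical column names for the reactor features.
FEATURES = ["volume_l", "n_blades", "pitch_deg", "n_impellers", "impeller_z_m", "rpm"]
# Response names as used in the text.
RESPONSES = ["k", "epsilon", "strainRate", "Umag"]


def to_feature_vector(features, medians):
    """Concatenate reactor features and median responses into one flat row."""
    return [features[name] for name in FEATURES] + [medians[name] for name in RESPONSES]


def write_dataset(samples, handle):
    """Write a header row plus one feature vector per (features, medians) sample."""
    writer = csv.writer(handle)
    writer.writerow(FEATURES + RESPONSES)
    for features, medians in samples:
        writer.writerow(to_feature_vector(features, medians))


# One placeholder sample; the numbers are invented for illustration.
sample = (
    {"volume_l": 3.0, "n_blades": 4, "pitch_deg": 45,
     "n_impellers": 1, "impeller_z_m": 0.05, "rpm": 400},
    {"k": 0.004, "epsilon": 0.02, "strainRate": 12.0, "Umag": 0.15},
)
buffer = io.StringIO()
write_dataset([sample], buffer)
print(buffer.getvalue())
```

Keeping features and responses in fixed column order makes it straightforward to split the file into inputs and targets when a model is trained later.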

The following example shows the increase in efficiency:

The screening of ten reactor configurations using an ML-Model takes about 30 minutes including the preparation of all reactor configurations in an Excel file.

Screening ten reactor configurations with CFD can take weeks because of the large overhead typical of CFD simulations. In addition, the typical CFD waiting time of hours makes real-time analysis during team meetings impossible. Furthermore, CFD can only be carried out by CFD experts, whereas the ML model can be used by anyone once it has been appropriately operationalized. Implementing an ML model is therefore beneficial in many respects.

Training and Testing Different Machine Learning Algorithms

Trying and testing different machine learning algorithms is common practice for finding the best algorithm and the corresponding hyperparameters. Depending on the properties of the data, some algorithms perform better than others, and it is not always obvious in advance which algorithm will best meet the requirements. The focus of the current use case is on producing a dataset, training some preliminary models and demonstrating how this helps to screen different reactor configurations and to find suitable initial operating conditions for a subsequent CFD simulation.

Figure 4 shows cross-validated results obtained with a random forest regressor. With only a little optimization, explained variance scores (EVS) from 0.8 to 0.98 were achieved.

Figure 4: Predicted vs true values for the median of the turbulent dissipation rate (upper left) with an explained variance score (EVS) of 0.8, turbulent kinetic energy (upper right) with an EVS of 0.96, velocity magnitude (lower left) with an EVS of 0.98 and for the median of the strain rate (lower right) with an EVS of 0.97.
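
For readers unfamiliar with the metric, the explained variance score can be written as EVS = 1 - Var(y_true - y_pred) / Var(y_true). The following sketch implements this definition with the Python standard library. It is an illustration of the metric under that definition, not the evaluation code behind Figure 4, and the sample numbers are placeholders.

```python
from statistics import pvariance


def explained_variance_score(y_true, y_pred):
    """EVS = 1 - Var(y_true - y_pred) / Var(y_true), using population variances."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    variance_true = pvariance(y_true)
    if variance_true == 0:
        raise ValueError("EVS is undefined when y_true is constant")
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    return 1.0 - pvariance(residuals) / variance_true


# Placeholder values, not taken from the dataset.
y_true = [1.0, 2.0, 3.0]
print(explained_variance_score(y_true, [1.0, 2.0, 3.0]))  # perfect prediction
print(explained_variance_score(y_true, [1.5, 2.5, 3.5]))  # constant offset
print(explained_variance_score(y_true, [2.0, 2.0, 2.0]))  # always the mean
```

Because the score is based on the variance of the residuals, a constant offset between prediction and truth does not lower it. A predicted-versus-true plot such as Figure 4 is therefore a useful complement to the number itself.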

Intermediate Summary

The sections above explained the importance of scale-up and its implications for development and manufacturing costs, how CFD can support scaling up stirred reactors right the first time, and how machine learning can accelerate CFD-based scale-up.

A machine learning model was trained on a CFD dataset generated with CliqScale.R consisting of 768 samples. The resulting ML-model can predict the medians of epsilon, k, strain rate and velocity magnitude from the reactor features liquid volume, number of blades, pitch angle of blades, number of immersed impellers, axial position of impellers and rpm.

The next section demonstrates the application of the ML model. It shows how the operating conditions for two given reactors can be found with the ML model. This corresponds to the stage at which different reactor configurations have been screened and a lab and a pilot reactor have been selected. In a second step, these operating conditions are used in CFD simulations to verify the ML-predicted operating conditions.

Using the ML-Model for the Initial Estimation of Reactor Operating Conditions

There are several ways to estimate the initial stirring speed, such as constant tip speed or constant power per volume. However, these are very rough estimators that do not account for the geometrical specifics of the reactors. A machine learning model gives much better estimates, as the following example shows. Here, a scale-up from a 3 L lab-scale reactor to a 280 L pilot reactor is carried out, using machine learning to predict the operating conditions and a full CFD simulation to confirm them.

Scale-up configuration: Lab to Pilot

Figure 5 shows the scale-up configuration with a 3L reactor that needs to be scaled to 280L.

Figure 5: Scale-up from 3 L to 280 L

Initial estimate of stirring speed based on power per volume

The initial estimate based on power per volume (P/V) results in a stirring speed of 132 rpm. The P/V approach is related to the turbulent dissipation rate but, unlike a machine learning model, provides no a priori information about the levels of turbulence, velocity and strain rate.

Figure 6: Initial estimate of stirring speed based on power per volume is 132 rpm
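
For comparison, the two classical rules of thumb can be sketched in a few lines. The sketch assumes geometrically similar reactors, a constant power number and fully turbulent flow, so that constant tip speed gives N2 = N1 * (D1/D2) and constant P/V gives N2 = N1 * (D1/D2)^(2/3). The function name and the impeller diameters are hypothetical, and because the real reactor geometries are not given here, the sketch is not expected to reproduce the 132 rpm quoted above.

```python
# Exponent on the impeller diameter ratio D_source / D_target for each rule,
# assuming geometric similarity, constant power number and turbulent flow.
SCALING_EXPONENTS = {
    "tip_speed": 1.0,                # pi * N * D = const
    "power_per_volume": 2.0 / 3.0,   # N^3 * D^2 = const
}


def scale_speed(n_source_rpm, d_source_m, d_target_m, rule):
    """Estimate the target stirring speed from the source speed and impeller diameters."""
    if rule not in SCALING_EXPONENTS:
        raise ValueError(f"unknown rule: {rule}")
    ratio = d_source_m / d_target_m
    return n_source_rpm * ratio ** SCALING_EXPONENTS[rule]


# Hypothetical impeller diameters; only the 400 rpm source speed is from the article.
for rule in SCALING_EXPONENTS:
    print(rule, round(scale_speed(400.0, 0.07, 0.32, rule), 1))
```

Both rules depend only on the diameter ratio, which is exactly why they cannot reflect differences in blade count, pitch angle or impeller position between the two reactors.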

Using the Machine Learning Model

The values in the yellow rectangle in Figure 7 are the ML-predicted medians for the source reactor with a volume of 3 L. The values below them are the medians for the target reactor with a volume of 280 L at different stirring speeds. The objective is to find an initial guess for the stirring speed that approximately matches epsilon, k and velocity of the 3 L reactor.

These stirring speeds were found to be 110 rpm for the best match of epsilon and 60 rpm for the best match of k and Umag. Note that a perfect match cannot be expected, for several reasons. First, the algorithm is based on decision trees used for regression, so its predictions change in steps rather than continuously. Second, there may be gaps in the data, so that the requested configuration is not fully represented. Third, the model is not yet fine-tuned.

Figure 7: Medians predicted with the ML-model for the 3L reactor (yellow rectangle) and the 280L reactor below. The values in the red rectangles are the values of interest and best matches of the 3L values.
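
The matching step can also be automated. The sketch below picks, from a table of predicted medians at several candidate speeds, the speed whose values are closest to those of the source reactor. The log-ratio distance is an assumed design choice, and all numbers are invented placeholders rather than the values shown in Figure 7.

```python
import math


def log_distance(target_value, source_value):
    """Absolute difference in orders of magnitude between two positive values."""
    return abs(math.log10(target_value / source_value))


def best_speed(source_medians, target_predictions, quantities):
    """Return the candidate rpm whose predicted medians best match the source.

    target_predictions maps rpm -> {quantity: predicted median}.
    """
    def score(rpm):
        return sum(log_distance(target_predictions[rpm][q], source_medians[q])
                   for q in quantities)
    return min(target_predictions, key=score)


# Placeholder medians for the source reactor and three candidate target speeds.
source = {"epsilon": 0.02, "k": 0.004, "Umag": 0.15}
target = {
    60: {"epsilon": 0.004, "k": 0.0045, "Umag": 0.16},
    90: {"epsilon": 0.012, "k": 0.009, "Umag": 0.24},
    110: {"epsilon": 0.021, "k": 0.013, "Umag": 0.29},
}
print(best_speed(source, target, ["epsilon"]))    # 110 for these placeholders
print(best_speed(source, target, ["k", "Umag"]))  # 60 for these placeholders
```

A logarithmic distance is used because quantities such as epsilon can span several orders of magnitude. Different quantities can favour different speeds, as in the example above, so the choice of which quantity to match remains a process decision.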

The CFD results

The purpose of the present use case is to demonstrate how scale-up can be accelerated through a quick upfront screening of reactor configurations and operating conditions using machine learning (ML). For the ML-predicted operating conditions, full CFD simulations are carried out to verify the validity of these operating conditions. As shown in Figure 7, the stirring speed was set to 110 rpm to match epsilon and to 60 rpm to match k and Umag. As a consequence, the cumulative distributions of these quantities are expected to agree between the source reactor (3 L) and the target reactor (280 L).

Figure 8: Turbulent dissipation rate at the ML-predicted target stirring speed of 110 rpm (green line), in very good agreement with the source reactor at 400 rpm (black dashed line)

Figure 9: Turbulent kinetic energy at the ML-predicted target stirring speed of 60 rpm (blue line), in good agreement with the source reactor at 400 rpm (black dashed line)

Figure 10: Velocity magnitude at the ML-predicted target stirring speed of 60 rpm (blue line), in excellent agreement with the source reactor at 400 rpm (black dashed line)
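
The agreement between two cumulative distributions is judged visually in the figures above. If a number is wanted, one possible measure is the largest vertical gap between the two empirical cumulative distributions, as in the sketch below. This metric is an assumption for illustration and not part of the workflow described here. The sketch also treats all cell values as equally weighted, and the sample values are placeholders.

```python
from bisect import bisect_right


def ecdf(sorted_sample, x):
    """Fraction of sample values that are less than or equal to x."""
    return bisect_right(sorted_sample, x) / len(sorted_sample)


def max_cdf_gap(sample_a, sample_b):
    """Largest vertical distance between the two empirical cumulative distributions."""
    a, b = sorted(sample_a), sorted(sample_b)
    # Both distributions are step functions, so checking every data point suffices.
    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in a + b)


# Placeholder cell values for a source and a target reactor.
source_values = [1.0, 2.0, 3.0, 4.0]
target_values = [3.0, 4.0, 5.0, 6.0]
print(max_cdf_gap(source_values, source_values))  # 0.0, identical distributions
print(max_cdf_gap(source_values, target_values))  # 0.5 for these placeholders
```

A value close to 0 means the curves nearly coincide, while 1 means the two distributions do not overlap at all. For real CFD data, a volume-weighted version would probably be more appropriate.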



The present use case shows how machine learning can be used for a simplified and accelerated screening of different reactor configurations to find the most suitable reactor configuration and operating conditions for scale-up, based on different fluid dynamic quantities. The resulting configuration and operating conditions were verified as correct by CFD simulations.

Get a Machine Learning Model of Your Own Reactors

If such an ML model is trained for a company's specific reactors, operationalized and made available internally via a web application, it can contribute significantly to minimizing the risk of batch failures, because it supports the selection of a suitable reactor at an early stage. This can result in significant cost savings, both from the raw materials saved and from the avoided overhead that is typically associated with every new process run.

As shown in the present use case, we can simulate individual reactors using our automated process and train an ML model that represents these reactors.

Are you interested in an ML model that represents your reactors? We are happy to help. Contact us for a free consultation.