Predictive Modeling of Pipetting Dynamics. Multivariate Regression Analysis: PLS and ANN for Estimating Density and Volume from Pressure Recordings

This is a Master's thesis from Lund University/Department of Biomedical Engineering

Author: Lisa Linard Pedersen; [2024]

Keywords: Technology and Engineering;

Abstract: Thermo Fisher Scientific manufactures automatic pipetting instruments for diagnostic tests. These tests are sensitive to abnormalities, and changes in, for example, volume or density could reduce precision or cause other issues in the pipetting workflow. Data collected from a pressure sensor inside the pipette could provide a way of automatically verifying different aspects of the pipetting. Machine learning may be a powerful tool for continuously evaluating these aspects and keeping the handler notified of any changes. This thesis investigates the feasibility of extracting useful insights from pipetting pressure recordings. The initial objective was to classify error causes such as bubbles or foam in the pipette, and data was collected with this in mind. This was not successful, however, as these errors were not detectable in the pressure recordings. Hence, the thesis focuses on the secondary objective: estimating pipetted volume and density from pressure sensor data. The data was collected using the Thermo Fisher pipetting instrument Phadia 200. Three data sets were collected. D1 consists of 4 groups of 80 observations each: water, 5% glycerol, 10% glycerol and 40% glycerol. D2 consists of 3 groups of 50 observations each: three different human samples. D3 consists of 3 groups of 50 observations each: 2.5% glycerol, 7.5% glycerol and 20% glycerol. Pressure recordings as well as estimated volumes were collected for each sample. A partial least squares (PLS) model and an artificial neural network (ANN) model were used for the regression problem. The results of the regressions were not satisfactory, and it was concluded that the data was not ideal for the task. All models except those trained on all data sets yielded very poor R2 scores, especially in the volume estimations.
The best model was a PLS model with an R2 of 0.96 for volume predictions and 0.54 for density predictions, and an RMSE of 0.9660 for volume and 0.0140 for density. However, since this model was trained on all data and did not predict any new densities, these results say nothing about generalizability to new, unseen data. The model with the best results on unseen data was a PLS model trained on the D1 data set and evaluated on the D3 data set. For these density predictions, R2 was 0.80 and RMSE 0.0093. For the volume predictions, however, R2 was -40 and RMSE 2.1135. The data was collected with the primary objective, a classification problem, in mind. Since the data was ultimately used for a regression task, it was concluded that shortcomings in the experimental design were a crucial factor affecting the results. It is not possible, however, to say whether a better setup and data set would yield better results. There is a risk that the relationships between pressure, volume and density are simply not clear enough, or are too easily affected by outside factors. The conclusion is therefore that further investigation is needed to evaluate the feasibility of the methods.
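
A strongly negative R2 alongside a moderate RMSE, as reported for the volume predictions on D3, can look contradictory. The sketch below is an illustration only, not the thesis code. It assumes the common definition R2 = 1 - SS_res / SS_tot, with both metrics computed on the same test set. The function names and the synthetic data are hypothetical. Under that assumption, R2 = 1 - RMSE^2 / Var(y), so a small spread in the true values makes even a modest systematic offset produce a very negative R2.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def r2(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


def implied_target_std(rmse_value, r2_value):
    """Standard deviation of the true values implied by an (RMSE, R2) pair.

    Follows from R2 = 1 - RMSE^2 / Var(y), i.e. Var(y) = RMSE^2 / (1 - R2).
    """
    return float(rmse_value / np.sqrt(1.0 - r2_value))


# Synthetic illustration (NOT thesis data): true values with a narrow spread
# and predictions that are off by a constant amount.
rng = np.random.default_rng(0)
y_true = 50.0 + 0.3 * rng.standard_normal(150)
y_pred = y_true + 2.0

example_rmse = rmse(y_true, y_pred)  # equals the offset, 2.0
example_r2 = r2(y_true, y_pred)      # strongly negative

# Applied to the figures quoted in the abstract (R2 = -40, RMSE = 2.1135):
reported_std = implied_target_std(2.1135, -40.0)  # about 0.33

print(f"synthetic: RMSE={example_rmse:.3f}, R2={example_r2:.1f}")
print(f"implied std of true volumes: {reported_std:.2f}")
```

If the assumed definition matches the one used in the thesis, the reported pair implies that the true volumes in the test set had a standard deviation of roughly 0.33 in the units of the volume RMSE, far smaller than the prediction error. Since -40 is presumably a rounded value, this is an order-of-magnitude reading rather than an exact figure.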
