Statistical Modelling of Plug-In Hybrid Fuel Consumption: A study using data science methods on test fleet driving data

This is a thesis for a professional degree at advanced level from Umeå universitet/Institutionen för matematik och matematisk statistik (Umeå University, Department of Mathematics and Mathematical Statistics).

Abstract: The automotive industry is taking major technological steps to reduce emissions and fight climate change. To reduce the reliance on fossil fuels, much research is devoted to electric motors (EM) and their applications. One such application is the plug-in hybrid electric vehicle (PHEV), in which an internal combustion engine (ICE) and an EM are used in combination and take turns propelling the vehicle depending on the driving conditions. The main optimization problem for a PHEV is deciding when to use which motor. If this optimization is done with respect to emissions, the entire electric charge should be used up before the end of the trip. However, if the charge is used up too early, later driving segments for which the EM would have been the optimal choice have to be driven using the ICE. To address this optimization problem, we studied the fuel consumption under different driving conditions. These driving conditions are characterized by hundreds of sensors that continuously collect data about the state of the vehicle while driving. From these data, we constructed 150-second segments of signals such as vehicle speed, and then engineered new descriptive features for each segment, e.g. maximum vehicle speed. Using the characteristics of typical driving conditions specified by the Worldwide Harmonized Light Vehicles Test Cycle (WLTC), segments were labelled as highway or city road segments. To reduce the dimensionality while retaining most of the information, principal component analysis was conducted, and a Gaussian mixture model was used to uncover hidden structures in the data. Three machine learning regression models were trained and tested: a linear mixed model, a kernel ridge regression model with a linear kernel function, and a kernel ridge regression model with an RBF kernel function. By splitting the data into a training set and a test set, the models were evaluated on data they had not been trained on.
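The segmentation and feature-engineering step described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the thesis's pipeline. It assumes a 1 Hz vehicle-speed signal in km/h, non-overlapping 150-second windows, and an illustrative feature set. It also uses a hypothetical max-speed threshold (`CITY_MAX_SPEED_KMH`) in place of the WLTC-based labelling criteria, which the abstract does not specify. All function names are hypothetical.

```python
from statistics import mean, pstdev

SEGMENT_SECONDS = 150       # segment length stated in the abstract
CITY_MAX_SPEED_KMH = 60.0   # hypothetical threshold, NOT the thesis's WLTC criterion


def segment_signal(speed_kmh, segment_seconds=SEGMENT_SECONDS):
    """Split a 1 Hz speed trace into consecutive, non-overlapping segments.

    Assumption: a trailing remainder shorter than one segment is dropped.
    """
    n_full = len(speed_kmh) // segment_seconds
    return [speed_kmh[i * segment_seconds:(i + 1) * segment_seconds]
            for i in range(n_full)]


def engineer_features(segment):
    """Compute a few descriptive features for one segment (illustrative choice)."""
    return {
        "max_speed": max(segment),
        "mean_speed": mean(segment),
        "std_speed": pstdev(segment),
        # share of samples below 1 km/h, treated here as idling
        "idle_share": sum(1 for v in segment if v < 1.0) / len(segment),
    }


def label_segment(features, city_max=CITY_MAX_SPEED_KMH):
    """Hypothetical rule: a segment whose max speed stays below city_max is 'city'."""
    return "city" if features["max_speed"] < city_max else "highway"


trace = [30.0] * 150 + [100.0] * 150 + [50.0] * 40  # made-up 1 Hz speed trace
segments = segment_signal(trace)                     # last 40 s are dropped
features = [engineer_features(s) for s in segments]
print([label_segment(f) for f in features])          # ['city', 'highway']
```

Each feature dictionary becomes one row of the segment-level data set. In the study, such rows were then passed on to dimensionality reduction and regression.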
The performance measures obtained for each model, namely R², mean absolute error (MAE) and mean squared error (MSE), were compared to find the best model. The study shows that fuel consumption can be modelled from the sensor data of a PHEV test fleet, where six features together contribute to an explanation ratio of 0.5 and thus have the highest impact on fuel consumption. One needs to keep in mind that the data were collected during the Covid-19 outbreak, when travel patterns were not considered normal. No regression model can explain the real world better than the underlying data allow.
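The comparison of a linear-kernel and an RBF-kernel ridge regression on held-out data, scored with R², MAE and MSE, can be sketched in dependency-free Python. This is not the thesis's implementation. The regularization strength `lam`, the RBF width `gamma`, the absence of an intercept term, the deterministic split and the toy data are all assumptions made for illustration, and the function names are hypothetical.

```python
import math


def linear_kernel(x, z):
    """Linear kernel k(x, z) = <x, z> (no intercept term)."""
    return sum(a * b for a, b in zip(x, z))


def rbf_kernel(x, z, gamma=0.5):
    """RBF kernel k(x, z) = exp(-gamma * ||x - z||^2); gamma is an assumed value."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit_krr(X, y, kernel, lam=1e-3):
    """Kernel ridge regression: solve (K + lam*I) alpha = y, predict sum_i alpha_i k(x_i, x)."""
    n = len(X)
    K = [[kernel(X[i], X[j]) + (lam if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    alpha = solve(K, y)
    return lambda x: sum(a * kernel(xi, x) for a, xi in zip(alpha, X))


def metrics(y_true, y_pred):
    """R^2, mean absolute error and mean squared error on held-out data."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)
    y_bar = sum(y_true) / n
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return {"R2": 1.0 - ss_res / ss_tot,
            "MAE": sum(abs(r) for r in residuals) / n,
            "MSE": ss_res / n}


# Made-up toy data: two features per segment and a nonlinear target.
X = [[i / 10, (i % 5) / 5] for i in range(30)]
y = [x0 ** 2 + 0.5 * x1 for x0, x1 in X]
X_train, y_train = X[::2], y[::2]   # deterministic split, for illustration only
X_test, y_test = X[1::2], y[1::2]
for name, kernel in [("linear", linear_kernel), ("rbf", rbf_kernel)]:
    model = fit_krr(X_train, y_train, kernel)
    print(name, metrics(y_test, [model(x) for x in X_test]))
```

In practice a library implementation would normally replace the hand-written solver, for example scikit-learn's `KernelRidge` with `kernel="linear"` or `kernel="rbf"` and `alpha` as the ridge penalty. Pure Python is used here only to keep the sketch self-contained.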
