Detecting Metro Service Disruptions and Predicting their Spillover Effects throughout the Network using GTFS and Large-Scale Vehicle Location Data

This is a Master's thesis from KTH/Transportplanering

Author: Weizhi Michelle Teo; [2023]

Abstract: One of the top factors influencing commuters’ satisfaction with public transport is the punctuality of the service. Commuters rely on public transport to get them from origin to destination on time, and any delay incurs additional cost for both commuters and public transport operators. Hence, numerous studies have been conducted to better understand the causes of delays and to predict how such delays propagate through the network. The main objectives of this thesis are to: i) identify sources of disruption within the Stockholm Metro system; ii) understand how delays propagate to other trains and stations in the network; and iii) formulate a model to predict delays. The data used to achieve these objectives was downloaded from Trafiklab’s API and comprised the static General Transit Feed Specification (GTFS) data, which contains information such as the timetable, and the real-time vehicle position data. First, the static GTFS data was converted to GPS-like records and checked against the published metro map for consistency. Next, the real-time data was decoded from Google’s protobuf format and superimposed onto the planned vehicle trajectory using the K-Nearest Neighbour algorithm. Subsequently, data cleaning and imputation were performed, as missing and erroneous real-time vehicle positions were observed. In addition, because dwell times were not planned for in the timetable, historical records, after removal of outliers using the Interquartile Range (IQR) method, were used as a proxy for the scheduled dwell times. To detect the possible locations of primary delays, the cleaned data was used as input to the backward pass critical method on an activity graph representing the metro network.
Thereafter, the forward pass critical method was implemented to illustrate how a primary delay propagates to other trains and stations, resulting in secondary delays. The output data, containing these secondary delays and the independent features associated with them, was then used to train and validate a random forest model capable of predicting secondary delays upon detection of a disruption (i.e. a primary delay). Mean Absolute Error (MAE) and Root Mean Square Error (RMSE) were used to evaluate the predictions. The prediction error was within an acceptable range of less than a minute, and the importance plot of the random forest model indicated that the main factor contributing to secondary delay was the deviation in dwell time at each delayed station, followed by the magnitude of the primary delay. These two features have also been used in delay prediction models elsewhere in the literature, which supports the model proposed in this project. Public transport operators can use this model to decide on recovery actions in the event of a disruption and to inform commuters about the delay they are likely to experience if they choose to use the affected service.
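
Two of the steps named in the abstract, the IQR outlier filter applied to historical dwell times and the MAE/RMSE evaluation, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the thesis's implementation: the function names, the 1.5 multiplier on the IQR, the use of the median as the scheduled-dwell proxy, and all sample numbers are hypothetical and do not come from the thesis.

```python
import math
import statistics


def iqr_filter(values, k=1.5):
    """Keep values inside [Q1 - k*IQR, Q3 + k*IQR] (k=1.5 is an assumed convention)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


def mae(actual, predicted):
    """Mean Absolute Error between two equal-length sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root Mean Square Error between two equal-length sequences."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


# Hypothetical observed dwell times (seconds) at one station; 240 is an outlier.
dwell_times = [28, 30, 31, 29, 32, 30, 240]
cleaned = iqr_filter(dwell_times)

# Assumption: the median of the cleaned records stands in for the scheduled dwell time.
scheduled_dwell = statistics.median(cleaned)

# Hypothetical observed vs. predicted secondary delays (seconds).
observed_delay = [10, 20]
predicted_delay = [13, 16]
print(scheduled_dwell, mae(observed_delay, predicted_delay), rmse(observed_delay, predicted_delay))
```

RMSE penalises large individual errors more heavily than MAE, so reporting both gives a sense of whether the average error is driven by a few badly predicted delays.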