Spatiotemporal Forecasting of Traffic Flow Data using GNN (Ongoing)

According to a study, India loses $21.3 billion annually to delays and additional fuel consumption caused by poor road conditions and frequent halts. Congestion on city roads obstructs the movement of traffic and sharply increases trip delays, wasting time that could otherwise be spent productively. The stop-and-go driving pattern typical of congestion increases fuel consumption and therefore carbon emissions, raising pollution levels in the city. Congestion also produces high noise levels (above 90 dB), which make the environment unpleasant.

These factors made clear to us the importance of reducing travel time and of limiting the harmful effects of congestion on the environment and the country’s economy. This motivated a solution that predicts an optimal path to avoid traffic congestion altogether.

PeMSD7 and PeMSD8 were found to be the two main datasets used in developing traffic forecasting models. Both, however, were collected in foreign settings and are unsuitable for use in the Indian context, so we needed to create our own dataset. The area to be covered was finalized and explored, and 20 distinct points were selected.


A Node.js script that uses the TomTom API was therefore deployed on Heroku as our own API server. This server collects traffic data at the selected nodes at regular 15-minute intervals.
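The collection step can be sketched roughly as follows. This is a Python illustration, not the deployed Node.js script: the node coordinates, the `fetch_json` parameter, and all function names are hypothetical, and the URL pattern and the `flowSegmentData`, `freeFlowSpeed`, and `currentSpeed` field names are assumed from TomTom's Flow Segment Data endpoint, so they should be checked against the current API reference.

```python
import json
import urllib.parse
import urllib.request

# Assumed URL pattern of TomTom's Flow Segment Data endpoint (verify against the docs).
BASE_URL = "https://api.tomtom.com/traffic/services/4/flowSegmentData/absolute/10/json"

# Placeholder node list: id -> (latitude, longitude). The real dataset uses 20 points.
NODES = {0: (18.5204, 73.8567), 1: (18.5310, 73.8446)}


def build_url(lat, lon, api_key):
    """Build the request URL for one node."""
    query = urllib.parse.urlencode({"point": f"{lat},{lon}", "key": api_key}, safe=",")
    return f"{BASE_URL}?{query}"


def http_get_json(url):
    """Default fetcher: plain HTTP GET returning the decoded JSON body."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def collect_snapshot(api_key, timestamp, fetch_json=http_get_json):
    """Query every node once and return one record per node."""
    rows = []
    for node_id, (lat, lon) in NODES.items():
        payload = fetch_json(build_url(lat, lon, api_key))
        segment = payload["flowSegmentData"]
        rows.append({
            "timestamp": timestamp,
            "node": node_id,
            "free_flow_speed": segment["freeFlowSpeed"],
            "current_speed": segment["currentSpeed"],
        })
    return rows
```

A scheduler (for example a cron job, or a loop that sleeps 900 seconds) would call `collect_snapshot` once per 15-minute interval and append the returned rows to storage. Passing the fetcher as a parameter keeps the parsing logic testable without network access.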


Exploratory data analysis of the collected data showed a strong correlation between the free-flow velocities of the nodes at any given instant.
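One way such a correlation check could be carried out is with a pairwise Pearson correlation matrix over the per-node series. The sketch below uses only the standard library; the readings and node ids are made up for illustration and are not taken from the collected data, and in practice a library such as pandas would usually do this in one call.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    # Note: a constant series has zero variance and would divide by zero here.
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(series_by_node):
    """Pairwise Pearson correlations for a dict of node id -> list of readings."""
    nodes = sorted(series_by_node)
    return [[pearson(series_by_node[i], series_by_node[j]) for j in nodes] for i in nodes]


# Illustrative readings (km/h) for three hypothetical nodes, one value per 15-minute step.
readings = {
    0: [42.0, 40.0, 35.0, 30.0, 38.0],
    1: [50.0, 48.0, 43.0, 38.0, 46.0],  # node 0 shifted by +8, so perfectly correlated
    2: [30.0, 31.0, 29.0, 33.0, 30.0],
}
corr = correlation_matrix(readings)
```

Entries close to 1 between many node pairs would correspond to the strong correlation reported above.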


Data engineering was then done to bring the data into a form similar to that of PeMSD7 and PeMSD8. The result is DIVE (Dataset for Indian Vehicular Traffic Evaluation), a dataset based on the Indian setting for traffic forecasting and evaluation. It currently consists of 2 CSV files:

  • Node.adj.csv:
    • The adjacency matrix, where each entry represents the Euclidean distance between two nodes.


  • FreeFlowVelocity.csv:
    • The free flow velocity of 20 sensors at 15-minute intervals.
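As an illustration of the first file, a matrix of pairwise Euclidean distances could be produced along these lines. The coordinates below are hypothetical planar positions in kilometres; how the actual node positions were projected before measuring distance is not stated here, so treat that as an assumption.

```python
import csv
import io
import math

# Hypothetical planar coordinates (x, y) in kilometres for three nodes.
coords = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]


def distance_matrix(points):
    """Symmetric matrix whose (i, j) entry is the Euclidean distance between nodes i and j."""
    return [[math.dist(p, q) for q in points] for p in points]


def to_csv(matrix):
    """Serialise the matrix as CSV text, one row per node."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(matrix)
    return buffer.getvalue()


dist = distance_matrix(coords)
print(to_csv(dist))
```

The diagonal is zero and the matrix is symmetric, so row i and column i both describe node i.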


So far, we have trained SST-GNN, the current state of the art, on our dataset. The results are competitive with those obtained on PeMSD7.
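Results are reported at horizons of 15, 30, 45, and 60 minutes. Assuming these are counted in 15-minute steps (1 to 4 steps ahead), a speed series for one node could be cut into training samples as sketched below. The history length of 4 and the sample values are illustrative assumptions, not the configuration used for SST-GNN.

```python
def make_windows(series, history, horizon):
    """Return (past, target) pairs: `history` consecutive readings and the value `horizon` steps later."""
    pairs = []
    for start in range(len(series) - history - horizon + 1):
        past = series[start:start + history]
        target = series[start + history + horizon - 1]
        pairs.append((past, target))
    return pairs


STEP_MINUTES = 15  # sampling interval stated for the dataset
speeds = [40, 42, 41, 38, 35, 33, 36, 39]  # illustrative readings for one node

for horizon in (1, 2, 3, 4):  # 15, 30, 45 and 60 minutes ahead
    pairs = make_windows(speeds, history=4, horizon=horizon)
    print(f"{horizon * STEP_MINUTES} min ahead: {len(pairs)} samples")
```

Longer horizons leave fewer usable samples from the same series, because each target must lie further past the end of its history window.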

Dataset |       15 mins       |       30 mins       |       45 mins       |       60 mins
        |    MAE   RMSE   MAPE |    MAE   RMSE   MAPE |    MAE   RMSE   MAPE |    MAE   RMSE   MAPE
PeMSD7  |   2.04   3.53   4.77 |   2.67   4.80   6.60 |   3.17   5.79   8.00 |   3.48   6.39   9.04
PeMSD8  |   1.03   2.08   1.86 |   1.39   2.80   2.67 |   1.62   3.28   2.67 |   1.74   3.57   3.50
DIVE    |   0.95   1.76   2.01 |   0.96   1.69   2.03 |   0.91   1.69   1.94 |  12.61  26.23  26.23
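For reference, the three error metrics in the table can be computed as in the following sketch. It uses the standard definitions and assumes MAPE is expressed in percent; the sample values are invented and do not come from any of the datasets.

```python
import math


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mape(actual, predicted):
    """Mean absolute percentage error in percent; assumes no actual value is zero."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Invented speeds (km/h) for illustration only.
actual = [40.0, 50.0, 20.0]
predicted = [38.0, 53.0, 21.0]

print(mae(actual, predicted), rmse(actual, predicted), mape(actual, predicted))
```

RMSE penalises large individual errors more heavily than MAE, which is why it is never smaller than MAE on the same predictions.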