Publications

Toward agent-based microsimulation of cyclist following behavior: Estimation of reward function parameters using inverse reinforcement learning
Reward functions are a key component in developing agent-based microsimulation models. The objective of this research is to estimate reward function parameters for cyclists in following interactions with other cyclists on bicycle paths. Decisions of cyclists (acceleration and direction) in following interactions are modeled as a finite-state Markov Decision Process (MDP), in which the reward function describing the desired state of the cyclist is unknown. Two imitation learning algorithms based on Inverse Reinforcement Learning (IRL) are evaluated to estimate the reward function parameters: Feature Matching (FM) and Maximum Entropy (ME) IRL. The algorithms are trained on 1297 cyclist trajectories in following interactions extracted from video data using computer vision, and then validated using a separate set of 349 trajectories. The estimated reward function parameters indicate how cyclists weigh the five state features in the reward function: speed, speed difference from the leading cyclist, lateral position in the path, lateral distance from the leading cyclist, and longitudinal distance from the leading cyclist. Following cyclists tend to prefer intermediate longitudinal and lateral distances to leading cyclists. Cyclists also prefer high speeds, with a low speed difference from the leading cyclist and low deviation from the center of the path. Implementation of the reward functions derived from the FM and ME algorithms correctly predicted 58% and 67%, respectively, of the observed cyclist decisions (acceleration and direction) in the validation data set. This research is a key step toward developing operational bicycle traffic microsimulation models with applications such as facility planning and bicycle safety modeling.
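
For readers unfamiliar with ME IRL, the sketch below illustrates the estimation loop on a finite-state MDP with a linear reward r(s) = w·φ(s). This is a minimal sketch, not the paper's implementation: the feature matrix, transition model, and hyperparameters (phi, P, demos, learning rate) are illustrative assumptions.

```python
import numpy as np

def soft_value_iteration(r, P, gamma=0.95, iters=100):
    """Maximum-entropy (soft) value iteration; returns a stochastic policy pi(a|s)."""
    A, S, _ = P.shape
    V = np.zeros(S)
    for _ in range(iters):
        # Q[s, a] = r(s) + gamma * E[V(s') | s, a]
        Q = r[:, None] + gamma * np.einsum('ast,t->sa', P, V)
        Qmax = Q.max(axis=1, keepdims=True)
        V = (Qmax + np.log(np.exp(Q - Qmax).sum(axis=1, keepdims=True))).ravel()
    Q = r[:, None] + gamma * np.einsum('ast,t->sa', P, V)
    return np.exp(Q - V[:, None])  # pi(a|s) proportional to exp(Q[s, a])

def expected_svf(policy, P, p0, T):
    """Expected state visitation frequencies under the policy over horizon T."""
    D = np.zeros((T, p0.shape[0]))
    D[0] = p0
    for t in range(1, T):
        # Propagate mass one step: D[t, s'] = sum_{s,a} D[t-1, s] pi(a|s) P[a, s, s']
        D[t] = np.einsum('s,sa,ast->t', D[t - 1], policy, P)
    return D.sum(axis=0)

def maxent_irl(phi, P, demos, gamma=0.95, lr=0.05, epochs=200):
    """phi: (S, K) state features; P: (A, S, S) transition probabilities;
    demos: list of state-index sequences (observed trajectories)."""
    S, K = phi.shape
    # Empirical feature expectations from the demonstrated trajectories.
    f_emp = np.mean([phi[traj].sum(axis=0) for traj in demos], axis=0)
    p0 = np.bincount([traj[0] for traj in demos], minlength=S) / len(demos)
    T = max(len(traj) for traj in demos)
    w = np.zeros(K)
    for _ in range(epochs):
        policy = soft_value_iteration(phi @ w, P, gamma)
        # Gradient: empirical minus expected feature expectations under w.
        w += lr * (f_emp - phi.T @ expected_svf(policy, P, p0, T))
    return w
```

In the paper's setting, φ would encode discretized versions of the five state features listed above, and the demonstrations would be the 1297 observed following trajectories.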
A Bi-Level Approach for Calibrating a Traffic Simulation Model of Greater Cairo Region
Traffic simulation has proved to be a vital tool for planning and operating transportation systems. Traffic simulation models need to be calibrated by adjusting model parameters to ensure the model's ability to reproduce local traffic conditions and serve as a reliable test bed for evaluating modification scenarios. This research developed and calibrated a mesoscopic traffic simulation model for the exceptionally large traffic network of the Greater Cairo Region (GCR). The scope of the study is limited to calibrating traffic stream parameters, while a typical user equilibrium traffic assignment model was adopted. The open-source traffic simulation software DynusT was used as the modeling platform. A wide range of field data was consolidated from previous related studies. The calibration procedure involved two levels: theoretical-based and simulation-based calibration. In the theoretical-based calibration stage, the traffic stream parameters of the modified Greenshields traffic flow model were estimated using a non-linear regression approach. In the simulation-based calibration stage, the Anisotropic Mesoscopic Model parameter was estimated using a genetic algorithm optimization approach. A sensitivity analysis of the estimated parameter values was conducted to verify the appropriateness of the chosen values. Testing results revealed the potential of the adopted calibration approach and the credibility of the estimated traffic stream parameter values. Limited discrepancy was observed between simulated and observed link traffic volumes on most links, with a normalized root mean square error (NRMSE) of 10.6%.
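
As an illustration of the theoretical-based calibration step, the sketch below fits a modified Greenshields speed-density relationship to field observations with non-linear least squares and computes an NRMSE in the same style as the statistic quoted above. The functional form v = v0 + (vf - v0)(1 - k/k_jam)^alpha follows a formulation commonly used with DynusT and is an assumption here, as are the data points, bounds, and initial guesses.

```python
import numpy as np
from scipy.optimize import curve_fit

def modified_greenshields(k, vf, v0, k_jam, alpha):
    """Speed v as a function of density k: v = v0 + (vf - v0) * (1 - k/k_jam)^alpha."""
    return v0 + (vf - v0) * np.clip(1.0 - k / k_jam, 0.0, None) ** alpha

# Hypothetical field data: (density veh/km/lane, speed km/h) observations.
k_obs = np.array([10, 25, 40, 60, 80, 110, 140], dtype=float)
v_obs = np.array([78, 70, 58, 44, 31, 17, 8], dtype=float)

# Non-linear least squares with physically plausible bounds and initial guesses.
params, _ = curve_fit(modified_greenshields, k_obs, v_obs,
                      p0=[80, 5, 160, 2],
                      bounds=([40, 0, 120, 0.5], [120, 20, 250, 6]))
vf, v0, k_jam, alpha = params

# NRMSE of the fit, normalized by the mean of the observations (one common choice).
v_fit = modified_greenshields(k_obs, *params)
nrmse = np.sqrt(np.mean((v_obs - v_fit) ** 2)) / v_obs.mean()
print(f"vf={vf:.1f}, v0={v0:.1f}, k_jam={k_jam:.0f}, alpha={alpha:.2f}, NRMSE={nrmse:.3f}")
```

The simulation-based level would wrap a similar error measure around full DynusT runs and search the Anisotropic Mesoscopic Model parameter with a genetic algorithm, which is omitted here.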
Inferring Nonlinear Reward Functions for Cyclists in Following Interactions Using Continuous Inverse Reinforcement Learning
Understanding and modeling cyclist movement patterns is an essential step in developing agent-based microsimulation models. The aim of this study is to infer how cyclists in following interactions weigh different state features, such as relative distances and speeds, when making guidance decisions. Cyclist guidance decisions are modeled as a continuous state and action Markov Decision Process (MDP). Two Inverse Reinforcement Learning (IRL) algorithms are evaluated to estimate the MDP reward function in a linear form based on Maximum Entropy (ME) and in a nonlinear form based on Gaussian Processes (GP). The algorithms are trained on 856 cyclist trajectories in following interactions extracted from video data using computer vision, and then validated using a separate set of 172 trajectories. The estimated reward functions imply cyclist preferences for low lateral distances, path deviations, speed differences, accelerations, and direction angles, but high longitudinal distances from leading cyclists. The mean and variance of the reward function learned using GP can be applied to simulate heterogeneous cyclist preferences and behavior. Predicted trajectories based on Q-learning with the linear and nonlinear reward functions are compared to the validation data. This research is a fundamental step toward developing operational bicycle traffic microsimulation models with applications such as facility planning and bicycle safety modeling. Key novel aspects are the investigation of continuous, nonlinear, and stochastic reward functions for cyclist agents using real-world observational data.
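
To make the mean-and-variance point concrete, the sketch below shows how a reward surface represented as a Gaussian Process could be queried for a representative cyclist (posterior mean) and sampled to generate heterogeneous simulated cyclists (posterior draws). The inducing states, feature ranges, and reward values are hypothetical stand-ins; the paper's GP-based IRL training step itself is not reproduced.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)

# Hypothetical inducing states: [longitudinal dist. (m), lateral dist. (m), speed diff. (m/s)].
X_u = rng.uniform([2, -1.5, -2], [20, 1.5, 2], size=(50, 3))
# Hypothetical reward values at those states (stand-ins for values a GP-based
# IRL algorithm would infer from the demonstrated trajectories).
r_u = 0.05 * X_u[:, 0] - 0.1 * np.abs(X_u[:, 1]) - 0.2 * np.abs(X_u[:, 2])

# Anisotropic RBF kernel (one length scale per feature) plus observation noise.
gp = GaussianProcessRegressor(kernel=RBF(length_scale=[5, 1, 1]) + WhiteKernel(1e-3),
                              normalize_y=True).fit(X_u, r_u)

# Posterior mean for a "representative" cyclist, posterior std as a measure of
# preference heterogeneity, and posterior samples as per-agent reward functions.
states = rng.uniform([2, -1.5, -2], [20, 1.5, 2], size=(5, 3))
r_mean, r_std = gp.predict(states, return_std=True)
r_agents = gp.sample_y(states, n_samples=3, random_state=1)  # one column per simulated agent
print(r_mean, r_std, r_agents.shape)
```

Each sampled column plays the role of one cyclist's reward function, which a Q-learning agent could then optimize to produce the heterogeneous trajectories described above.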