Uber is one of the largest companies in the world providing ride-sharing services. It is based on a platform for connecting a two-sided marketplace of drivers and riders. The company has been providing its services since 2009 and currently operates in over 85 countries all over the world.
A major concern for the company is the balance of supply and demand for rides on its platform. The average pickup time for a rider and the average gross earnings per hour of a driver are the fundamental factors that determine the performance of the company. Uber therefore puts considerable effort into managing both of these factors using data science.
Uber’s ride cancellation problem
In 2016, Sai Alluri, the Analytics Lead at Uber India, spoke on upGrad about how Uber was using data science and analytics to solve real-life problems. He explained that one of the problems Uber faced at the time was the high rate at which drivers cancelled rides when they were asked to drop passengers off at an airport in India.
The problem came to light after passengers complained that their rides to the airport were being cancelled by one driver after another. This led the company to investigate how its marketplace was handling supply and demand at airports.
Understanding the cause of the problem using data analytics
To understand the root cause of the problem, Uber used advanced data analytics to gain insights from its structured as well as unstructured data.
The structured data at Uber consisted of organized data related to transactions and various customer interactions, whereas the unstructured data came from customer complaint tickets or real-time operations that described the problem at hand, such as customers not getting trips to the airport.
One of the key metrics used in the research was the “completed by request” ratio, which was used to determine the reliability of the marketplace. As stated by Alluri, it is the ratio of the number of completed trips to the total number of trip requests for a given hour of the day in an area. A higher ratio meant that the marketplace was better balanced in that area.
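The metric can be illustrated with a minimal sketch. It reads the ratio as completed trips divided by total requests, which is consistent with the later point that a low ratio signals demand without enough supply. The record layout, area names, and sample values below are hypothetical and are not taken from Uber's data.

```python
from collections import defaultdict

# Hypothetical trip-request records: (area, hour_of_day, was_completed).
requests = [
    ("airport", 5, True),
    ("airport", 5, False),
    ("airport", 5, False),
    ("airport", 5, True),
    ("downtown", 5, True),
    ("downtown", 5, True),
    ("downtown", 5, False),
    ("downtown", 5, True),
]


def completed_by_request(records):
    """Return {(area, hour): completed / requested} for each area-hour bucket."""
    requested = defaultdict(int)
    completed = defaultdict(int)
    for area, hour, was_completed in records:
        requested[(area, hour)] += 1
        if was_completed:
            completed[(area, hour)] += 1
    # Every key in `requested` has at least one request, so no division by zero.
    return {key: completed[key] / requested[key] for key in requested}


ratios = completed_by_request(requests)
for (area, hour), ratio in sorted(ratios.items()):
    print(f"{area} at {hour}:00 -> {ratio:.2f}")
```

In this made-up sample, the airport bucket has a lower ratio than the downtown bucket, which is the kind of gap the analysis was looking for.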
Observations
Upon analyzing the trip data, it was discovered that the cancellation rate for trips going to the airport was much higher than the overall cancellation rate. Further research into the characteristics of the cancelled trips showed that the cancellations were directly related to two major factors: lengthy routes and long waiting times. These factors made airport trips less favourable for the drivers in terms of both time and earnings.
The drivers had to take long trips from the city in order to reach the airport. This took up a lot of their time compared to short trips within the city, where they could earn more by completing multiple trips. After spending a significant amount of time and gas on the trip, the drivers did not want to return from the airport empty. However, getting a ride back from the airport was possible only when flights were arriving around the time the driver reached the airport. Hence, the drivers often faced a long wait to get another ride from the airport.
For instance, if a driver took a trip to the airport at around 5 am and the next flight arrived at 8 am, the driver had to wait for about 2 to 3 hours at the airport. Essentially, the driver’s waiting time was idle, unutilized time in which the driver could have been earning had he or she been outside the airport.
The idle time of the driver at the airport changed throughout the day depending on the flight patterns and the time at which the driver reached the airport. Moreover, at times when more flights were landing than taking off, there seemed to be a shortage of supply to meet the demand for trips from the airport.
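The dependence of idle time on flight patterns can be sketched with a small calculation. This is a simplified illustration that assumes a driver gets a return ride as soon as the next flight lands; the function name, the 05:30 arrival time, and the flight schedule are hypothetical.

```python
from bisect import bisect_left


def idle_minutes(driver_arrival, flight_arrivals):
    """Minutes a driver waits until the next flight lands.

    Times are minutes since midnight and flight_arrivals must be sorted.
    Returns None when no further flights land that day.
    """
    i = bisect_left(flight_arrivals, driver_arrival)
    if i == len(flight_arrivals):
        return None
    return flight_arrivals[i] - driver_arrival


# Hypothetical landings at 08:00, 09:30 and 13:00.
flights = [8 * 60, 9 * 60 + 30, 13 * 60]

# A driver who reaches the airport at 05:30 after an early-morning trip.
wait = idle_minutes(5 * 60 + 30, flights)
print(f"Idle time: {wait} minutes")
```

Running the same function for different arrival times shows how the wait shrinks or grows over the day as the gaps between landings change.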
Demand Positioning
The problem at hand required an efficient method to ensure that supply met demand. The demand positioning model at Uber was aimed at solving this problem. The model focused on increasing overall efficiency by forecasting demand patterns and creating a mechanism to position drivers at the right spot before the demand arose.
The first step was to plot the inflow and outflow of cabs at the airport for every hour of the day from 4 am to 1 am. The city was divided into several pockets to identify the distinct characteristics of each area. A methodology called “search-surge” was used to actively find the area with the highest demand. This data was then used to develop a sophisticated demand prediction model that could estimate upcoming demand in a particular area of the city.
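The hourly inflow and outflow comparison could be prepared along the following lines before plotting. This is only a sketch: the trip log format, the direction labels, and the sample values are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical trip log: (hour_of_day, direction relative to the airport).
trips = [
    (5, "to_airport"),
    (5, "to_airport"),
    (5, "from_airport"),
    (8, "from_airport"),
    (8, "from_airport"),
    (8, "from_airport"),
    (8, "to_airport"),
]


def hourly_net_inflow(trip_log):
    """Return {hour: cabs arriving at the airport minus cabs leaving it}."""
    inflow = Counter(hour for hour, direction in trip_log if direction == "to_airport")
    outflow = Counter(hour for hour, direction in trip_log if direction == "from_airport")
    hours = sorted(set(inflow) | set(outflow))
    # Counter returns 0 for hours that have no trips in one direction.
    return {hour: inflow[hour] - outflow[hour] for hour in hours}


net = hourly_net_inflow(trips)
for hour, value in net.items():
    print(f"{hour}:00 net inflow = {value}")
```

A positive value marks an hour in which cabs accumulate at the airport, while a negative value marks an hour in which more cabs leave with riders than arrive.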
The prediction model also allowed drivers to view an estimated waiting time based on the predicted demand, so they could decide whether to head to the airport or complete other rides in the city first. Drivers were now able to determine optimal times to arrive at the airport without creating excessive congestion of idle drivers.
The key points taken into consideration while building the demand positioning model were as follows.
- Historical data of about three to four weeks.
- The specific time of the day, city, and areas where a surge in demand occurred.
- The “completed by request” ratio of rides in different areas of the city. A low ratio denoted a high demand but not enough supply.
- Positioning the drivers around the areas of predicted demand prior to the demand surge.
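The points above can be tied together in a deliberately naive sketch. It averages a few weeks of “completed by request” ratios per area and hour, then flags the buckets that fall below a cutoff as candidates for positioning drivers in advance. The simple average, the 0.7 threshold, and all sample values are assumptions for illustration and do not represent Uber's actual model.

```python
from statistics import mean

# Hypothetical history: {(area, hour): weekly completed-by-request ratios}.
history = {
    ("airport", 8): [0.55, 0.60, 0.50, 0.55],
    ("airport", 14): [0.90, 0.85, 0.95, 0.90],
    ("downtown", 8): [0.80, 0.75, 0.85, 0.80],
}


def areas_to_preposition(ratio_history, threshold=0.7):
    """Return (area, hour) buckets whose average ratio is below threshold.

    Buckets are ordered from the lowest average ratio (largest gap) upward.
    """
    averages = {key: mean(values) for key, values in ratio_history.items()}
    flagged = [key for key, avg in averages.items() if avg < threshold]
    return sorted(flagged, key=lambda key: averages[key])


targets = areas_to_preposition(history)
for area, hour in targets:
    print(f"Position drivers near {area} before {hour}:00")
```

A real forecasting model would account for trends, flight schedules, and day-of-week effects; the average here only stands in for the prediction step.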
Conclusion
Data science plays a major role in most studies and research at Uber. It provides tools and techniques that give a new direction to traditional problem-solving approaches. Uber focuses on creating efficiency across all areas of its business, and supply and demand remains one of the most critical aspects of the platform. Thus, Uber has been continuously working on improving the experience for both drivers and riders with the help of advanced data analytics. Following positive results for drivers and in key metrics, the company has continued to roll out these features to drivers around the world.