Ridesharing the Data Pipeline | Data Stack Academy Blogs

Lyft surpassed one billion rides back in 2018. Think of how much data collection, storage, and transmission that requires. That’s why Lyft has a team of data engineers. They process millions of GPS logs from rides that have already happened in an area, along with the hundreds or thousands of rides happening at that moment. The data includes current GPS readings, current ride requests, and historic ride statistics.

That’s a lot of data, more than data scientists can work with directly. Their algorithms work on sample sizes in the thousands, not millions. Data engineers scale the data down to make it manageable.
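One common way to scale a large stream down to a fixed-size sample is reservoir sampling. The sketch below is only an illustration of that idea, not a description of how Lyft's engineers actually do it. The function name `reservoir_sample` and the stand-in log records are hypothetical.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def reservoir_sample(records: Iterable[T], k: int, seed: int = 0) -> List[T]:
    """Return up to k records chosen uniformly at random from a stream.

    Hypothetical helper: the stream is read once and never held in
    memory as a whole, so it can be far larger than the sample.
    """
    rng = random.Random(seed)  # seeded so the sketch is repeatable
    sample: List[T] = []
    for i, record in enumerate(records):
        if i < k:
            # Fill the reservoir with the first k records.
            sample.append(record)
        else:
            # Keep the new record with probability k / (i + 1).
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < k:
                sample[j] = record
    return sample


# Stand-in for a stream of a million GPS log IDs (made-up data).
gps_log_ids = range(1_000_000)
small_sample = reservoir_sample(gps_log_ids, k=1000)
```

The appeal of this approach is that every record has the same chance of ending up in the sample, even though the total size of the stream does not need to be known in advance.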

Data scientists then use that data to give riders a clearer picture of what to expect. Their prediction algorithms break down current ride demand. Within seconds, the Lyft app tells the rider how long they will have to wait until their driver arrives.
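To make the wait-time idea concrete, here is a deliberately simple sketch. It assumes the estimate is just the straight-line distance to the nearest available driver divided by an average speed taken from historic rides. Real prediction models are far richer than this, and the names `haversine_km` and `estimate_wait_minutes` are made up for illustration.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float]  # (latitude, longitude) in degrees


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two GPS points, in kilometers."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def estimate_wait_minutes(rider: Point, drivers: List[Point],
                          avg_speed_kmh: float) -> Optional[float]:
    """Naive wait estimate: nearest driver's distance over average speed.

    Assumption: avg_speed_kmh would come from historic ride statistics.
    Returns None when no driver is available.
    """
    if not drivers:
        return None
    nearest_km = min(haversine_km(*rider, *driver) for driver in drivers)
    return nearest_km / avg_speed_kmh * 60
```

Calling `estimate_wait_minutes` with the rider's position, a list of driver positions, and a speed in km/h returns a wait in minutes. Straight-line distance ignores roads and traffic, which is exactly the kind of gap that historic ride data helps close.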

Data processing doesn’t end there. Data scientists have built anomaly detection algorithms to flag when a ride is not going as predicted. For instance, they can notify a customer if a driver takes too many wrong turns. Why? Because people are unpredictable. Lyft wants to get riders to their destinations quickly and safely, and using data to predict behavior helps ensure that.
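A basic form of anomaly detection compares a new value against what past data says is normal. The sketch below assumes each ride is summarized by a "detour ratio" (distance actually driven divided by the planned route distance) and flags a ride whose ratio sits far above the historic average. The metric, the function name `is_anomalous_detour`, and the threshold of 3 standard deviations are all illustrative assumptions, not Lyft's actual method.

```python
from statistics import mean, stdev
from typing import Sequence


def is_anomalous_detour(historic_ratios: Sequence[float],
                        current_ratio: float,
                        z_threshold: float = 3.0) -> bool:
    """Flag a ride whose detour ratio is unusually high.

    Hypothetical z-score check; needs at least two historic values
    because the sample standard deviation is undefined for fewer.
    """
    mu = mean(historic_ratios)
    sigma = stdev(historic_ratios)
    if sigma == 0:
        # All historic rides were identical, so any excess is unusual.
        return current_ratio > mu
    z_score = (current_ratio - mu) / sigma
    return z_score > z_threshold


# Made-up historic detour ratios for one route.
past_ratios = [1.0, 1.05, 1.1, 0.95, 1.0, 1.02, 0.98]
typical_ride_flagged = is_anomalous_detour(past_ratios, 1.03)
long_detour_flagged = is_anomalous_detour(past_ratios, 1.8)
```

With these made-up numbers, a ride that runs 3% over the planned distance passes unflagged, while one that runs 80% over is flagged. Only unusually long routes are flagged here, since a shorter-than-planned ride is rarely a concern for the rider.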