The cloud. If you don’t know what it is, you’re probably on the wrong page. You probably also don’t realize it’s been integrated into just about every good or service in the U.S. market. Take those organic wood blocks you bought for your kids: the company website has tutorials on what to build and lets you create your own designs, which are saved under your profile. Or that Swiss chard you got at the farmers’ market: the farmer uses weather data trends stored in the cloud to find the precise time each year to plant their crops.
Lyft surpassed one billion rides back in 2018. Think of how much data collection, storage, and transmission that requires. That’s why Lyft has a team of data engineers. They process millions of GPS logs from rides that have already happened in an area, along with the hundreds or thousands of rides happening at that moment. The data includes current GPS stats, current ride requests, and historical ride statistics.
Pandas, a Python library, is one of the most widely used tools in a data engineer’s or data scientist’s toolbox. It lets you read and write data in a wide variety of file formats, and it provides extensive built-in functionality to aggregate, join, filter, and transform datasets efficiently. For datasets that fit in memory and can be processed on a single machine, pandas is among the quickest and easiest ways to extract, transform, and load (ETL) data.
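To make the extract, transform, and load steps concrete, here is a minimal sketch of a pandas ETL pass. The ride and city tables, their column names, and the run_etl helper are all hypothetical and made up for illustration; the data is inlined as CSV text so the example runs without any external files.

```python
import io

import pandas as pd

# Hypothetical sample data, inlined so the sketch is self-contained.
RIDES_CSV = """ride_id,city_id,distance_km,fare_usd
1,10,3.2,9.50
2,10,12.8,27.00
3,20,1.1,5.25
4,20,7.4,16.75
5,10,0.0,0.00
"""

CITIES_CSV = """city_id,city
10,Seattle
20,Denver
"""


def run_etl(rides_source, cities_source):
    """Hypothetical helper: read two CSV sources and return a per-city summary."""
    # Extract: read_csv accepts a file path or any file-like object.
    rides = pd.read_csv(rides_source)
    cities = pd.read_csv(cities_source)

    # Transform (filter): drop rides with no recorded distance.
    valid = rides[rides["distance_km"] > 0]

    # Transform (join): attach the city name to each ride.
    joined = valid.merge(cities, on="city_id", how="left")

    # Transform (derive): add a fare-per-kilometer column.
    joined = joined.assign(fare_per_km=joined["fare_usd"] / joined["distance_km"])

    # Transform (aggregate): one summary row per city.
    return joined.groupby("city", as_index=False).agg(
        rides=("ride_id", "count"),
        total_fare_usd=("fare_usd", "sum"),
        mean_fare_per_km=("fare_per_km", "mean"),
    )


summary = run_etl(io.StringIO(RIDES_CSV), io.StringIO(CITIES_CSV))

# Load: write the result out; a StringIO buffer stands in for a real file here.
output = io.StringIO()
summary.to_csv(output, index=False)
print(output.getvalue())
```

In a real pipeline the StringIO objects would typically be replaced by file paths or other sources that pandas can read, and the final to_csv call by whatever destination the data needs to reach.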
This lesson will teach you basic data engineering skills with pandas.