One post tagged with "aws emr"

11 min read
Parham Parvizi

A quick deep dive into Apache Spark, one of the most popular engines for distributed data processing.

What is Spark? Why is it so popular? When and how should you use it?

Learn the differences between its subcomponents (RDDs, DataFrames, SQL, Streaming, ...), set up PySpark, and write Spark transformations using Python and Jupyter Notebook.