PySpark
Credits: The following notes are based on Spark: The Definitive Guide by Bill Chambers and Matei Zaharia.
- Overview of Big Data and Spark
- Structured APIs - DataFrames, SQL, and Datasets
The sections up to this point are probably enough to learn about 80% of PySpark, after which most people can dive into it head first.
The next part should be practiced hands-on with a running Spark cluster, and the book is probably the best place to read about it.
- Production Applications
For streaming solutions, Apache Flink is also worth a look.
- Streaming
For running machine learning workloads on Spark, there is a dedicated ML guide on the site.
- Advanced Analytics and Machine Learning
- Ecosystem