Credits: The following notes are based on what I learned from Spark: The Definitive Guide by Bill Chambers and Matei Zaharia.

The material up to this point probably covers about 80% of what you need to learn PySpark. From here, you should be able to dive deeper into PySpark on your own.

These topics are best practiced hands-on with a running Spark cluster, and the book is probably the best place to read about them in depth.

For streaming solutions, you should also take a look at Apache Flink.

For running machine learning workloads on Spark, there is a dedicated ML guide on this site.