Ecosystem and Community
Spark Packages
Spark has a dedicated package repository: Spark Packages. Packages are libraries for Spark applications that can easily be shared with the community. GraphFrames is a good example: it makes graph analysis available on Spark's structured APIs in ways that are much easier to use than the lower-level GraphX API built into Spark.
Healthcare and genomics have seen a surge in opportunity for big data applications. For example, the ADAM Project leverages unique, internal optimizations to Spark's Catalyst engine to provide a scalable API and CLI for genome processing.
Another package, Hail, is an open source, scalable framework for exploring and analyzing genomic data. Starting from sequencing or microarray data in VCF and other formats, Hail provides scalable algorithms to enable statistical analysis of gigabyte-scale data on a laptop or terabyte-scale data on a cluster.
Popular Spark Packages
- Spark Cassandra Connector: This connector helps you get data in and out of the Cassandra database.
- Spark Redshift Connector: This connector helps you get data in and out of the Redshift database.
- Spark BigQuery: This connector helps you get data in and out of Google's BigQuery.
- Spark Avro: This package allows you to read and write Avro files.
- Elasticsearch: This package allows you to get data into and out of Elasticsearch.
- Magellan: Allows you to perform geo-spatial data analytics on top of Spark.
- GraphFrames: Allows you to perform graph analysis with DataFrames.
- Spark Deep Learning: Allows you to leverage Deep Learning and Spark together.
Community
Spark Summit
- An event held around the globe several times a year.
- People attend to learn about the cutting edge in Spark and its use cases.
- Hundreds of videos from these events are available.
Local Meetups
- Use meetup.com to find groups near you!