Ecosystem and Community

Spark Packages

Spark has a dedicated package repository, Spark Packages. Packages are libraries for Spark applications that can easily be shared with the community. GraphFrames is a perfect example: it makes graph analysis available on Spark's structured APIs in ways that are much easier to use than the lower-level GraphX API built into Spark.

Healthcare and genomics have seen a surge in opportunities for big data applications. For example, the ADAM Project leverages unique, internal optimizations to Spark's Catalyst engine to provide a scalable API and CLI for genome processing.

Another package, Hail, is an open-source, scalable framework for exploring and analyzing genomic data. Starting from sequencing or microarray data in VCF and other formats, Hail provides scalable algorithms to enable statistical analysis of gigabyte-scale data on a laptop or terabyte-scale data on a cluster.

  • Spark Cassandra Connector: This connector helps you get data in and out of the Cassandra database.
  • Spark Redshift Connector: This connector helps you get data in and out of the Redshift database.
  • Spark BigQuery: This connector helps you get data in and out of Google's BigQuery.
  • Spark Avro: This package allows you to read and write Avro files.
  • Elasticsearch: This package allows you to get data into and out of Elasticsearch.
  • Magellan: Allows you to perform geo-spatial data analytics on top of Spark.
  • GraphFrames: Allows you to perform graph analysis with DataFrames.
  • Spark Deep Learning: Allows you to leverage Deep Learning and Spark together.
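
Packages like the ones above are typically pulled into an application by Maven coordinate through the `--packages` option of `spark-submit` (with `--repositories` for additional repositories). The following is a minimal sketch of assembling such a command. The helper name `spark_submit_command` and the coordinates and repository URL are hypothetical placeholders, not real package versions; look up the actual coordinate on the package's own page.

```python
import shlex


def spark_submit_command(app, packages, repositories=()):
    """Build a spark-submit argument list that pulls packages by Maven coordinate.

    Hypothetical helper for illustration; it only assembles the command
    and does not launch Spark.
    """
    for coord in packages:
        parts = coord.split(":")
        # Coordinates are expected in the form groupId:artifactId:version.
        if len(parts) != 3 or not all(parts):
            raise ValueError(f"expected groupId:artifactId:version, got {coord!r}")

    cmd = ["spark-submit"]
    if packages:
        # --packages takes a comma-separated list of coordinates.
        cmd += ["--packages", ",".join(packages)]
    if repositories:
        # --repositories takes a comma-separated list of extra repository URLs.
        cmd += ["--repositories", ",".join(repositories)]
    cmd.append(app)
    return cmd


# Placeholder coordinate and repository URL (assumptions, not real artifacts).
example = spark_submit_command(
    "my_app.py",
    ["com.example:spark-demo-connector_2.12:1.0.0"],
    repositories=["https://repo.example.com/maven"],
)
print(shlex.join(example))
```

Building the command as a list and joining it only for display keeps each argument intact if it is later passed to a process launcher.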

Community

Spark Summit

  • An event held around the globe at various times of the year.
  • People attend to learn about the cutting edge in Spark and about use cases.
  • Hundreds of videos from these events are available.

Local Meetups

  • Use meetup.com to find groups near you!