Running Spark on Amazon Web Services (AWS)

When you search the web for ways to run Apache Spark on AWS infrastructure, you will most likely be redirected to the documentation of the AWS EMR (Elastic MapReduce) service, Amazon's Hadoop distribution designed to run in the AWS cloud environment. It's quite an easy way to deploy your data pipelines, but bootstrapping a huge cluster just to perform a simple ad-hoc analysis can be a cumbersome task. As the saying goes:

"to a man with a hammer everything looks like a nail" :)

and we fell into this trap with EMR once.

The article below describes two other ways of running Apache Spark jobs on AWS-managed infrastructure, AWS Glue and AWS Fargate, that we use on our clients' data warehousing projects. You will find the key differences between these methods in terms of flexibility and pricing, which show why there is no place for a "one service fits all" approach in the AWS world.

Check it out!

big data

spark

AWS

Amazon Web Services

Last updated: 18 December 2019

Written by

Mariusz Strzelecki

Data Engineer
