Use-cases/Project
2 min read

Running Spark on Amazon Web Services (AWS)

running apache spark on aws
Source: Acast-Tech https://medium.com/acast-tech/

When you search thought the net looking for methods of running Apache Spark on AWS infrastructure you are most likely to be redirected to the documentation of AWS EMR (Elastic Map Reduce) service, which is Amazon's Hadoop distribution suited to run in AWS cloud environment. It's quite an easy way to deploy your data pipelines, but sometimes bootstrapping a huge cluster to perform simple ad-hoc analysis it's a cumbersome task. They say:

"to a man with a hammer everything looks like a nail" :)

and we felt into this trap with EMR once.

The article below describes two other ways of running Apache Spark jobs on AWS-managed infrastructure - AWS Glue and AWS Fargate - that we use on our clients' data warehousing projects. You will find there the key differences between these methods when it comes to flexibility and pricing, showing why there is no place for "one service fits all" approach in AWS world.

Check out!

big data
spark
AWS
Amazon Web Services
18 December 2019

Want more? Check our articles

datagenerationobszar roboczy 1 4
Tutorial

Data online generation for event stream processing

In a lot of business cases that we solve at Getindata when working with our clients, we need to analyze sessions: a series of related events of actors…

Read more
getindator create an image illustrating the concept of data ske b0d7e21f 9c85 40d2 9a52 32caba3aece3
Tutorial

Data skew in Flink SQL

Data processing in real-time has become crucial for businesses, and Apache Flink, with its powerful stream processing capabilities, is at the…

Read more
logs analytics in cloud loki albert lewandowski getindata big data blog notext
Tutorial

Logs analytics at scale in the cloud with Loki

Logs can provide a lot of useful information about the environment and status of the application and should be part of our monitoring stack. We'll…

Read more
getindata joins forces with xebia 2 twitter facebook 1

GetInData Join Forces With Xebia

The partnership empowers both to meet the growing global demand Xebia, the company at the forefront of digital transformation, today proudly announced…

Read more
getindator data metrics shown on modern visualization being che 643c6b8e 8140 4873 b9b9 3188291a0ef9
Whitepaper

Data Quality Rules: enforcing reliability of datasets. Data Quality Assurance using AWS Glue DataBrew

In today's data-driven world, maintaining the quality and integrity of your data is paramount. Ensuring that organizations' datasets are accurate…

Read more
blogpodcast tumbnail  1
Tutorial

Airflow in a multi-teams / multi-tenant environment. Deployment strategies

This article explores using Airflow 2 in environments with multiple teams (tenants) and concludes with a brief overview of out-of-the-box features to…

Read more

Contact us

Interested in our solutions?
Contact us!

Together, we will select the best Big Data solutions for your organization and build a project that will have a real impact on your organization.


What did you find most impressive about GetInData?

They did a very good job in finding people that fitted in Acast both technically as well as culturally.
Type the form or send a e-mail: hello@getindata.com
The administrator of your personal data is GetInData Poland Sp. z o.o. with its registered seat in Warsaw (02-508), 39/20 Pulawska St. Your data is processed for the purpose of provision of electronic services in accordance with the Terms & Conditions. For more information on personal data processing and your rights please see Privacy Policy.

By submitting this form, you agree to our Terms & Conditions and Privacy Policy