Next term
Not scheduled
If you are interested, please contact us
Duration
2 days
Target audience
Data Engineers, Data Scientists
Technology
Apache Spark
Advanced Spark Training
This 2-day training is dedicated to Big Data engineers and data scientists who are already familiar with the basic concepts of Apache Spark and have hands-on experience implementing and running Spark applications.
Training outcome
Participants will gain knowledge of advanced aspects of working with Apache Spark and will be able to apply it to optimise and streamline Spark applications, as well as to integrate Spark with external data sources and sinks.
Course agenda*
Day 1
Apache Spark Training
Introduction
- Overview of Interactive Notebooks
- How to set up an interactive environment for Spark
Spark Optimisation
- Parquet Optimisations
- RDD vs DataFrames performance
- Eliminating Shuffle
- Controlling number of partitions
- Exercise
Datasets
- Introduction to Datasets
- RDD vs Datasets vs DataFrames
- Sample: how to process MAX_INT_VALUE records in 2 sec
- Encoders - Object Serialization in Spark
- Exercise: Working with Datasets
Day 2
Spark Advanced
YARN
- Architecture details
- Advanced configuration settings important for Spark applications
Developer skills
- Testing Spark code
- Structuring Spark applications (clean code principles)
- Scala API tips
- Exercise
Spark 2.0
- New features at a glance
- SparkSession
- Unifying DataFrames and Datasets
- Nested queries
- Whole Stage Codegen
- Vectorized reader
- Exercise
* GetInData reserves the right to make any changes and adjustments to the presented agenda.
Contact person
Klaudia Wachnio
klaudia@getindata.com
Piotr Krewski
piotr@getindata.com
+48 888 185 137
Testimonials
Other Big Data Training
Machine Learning Operations Training (MLOps)
This four-day course will teach you how to operationalize Machine Learning models using popular open-source tools, like Kedro and Kubeflow, and deploy them using cloud computing.
Hadoop Administrator Training
This four-day course provides the practical and theoretical knowledge necessary to operate a Hadoop cluster. We put great emphasis on practical hands-on exercises that aim to prepare participants to work as effective Hadoop administrators.
Data Analyst Training
This four-day course teaches Data Analysts how to analyse massive amounts of data available in a Hadoop YARN cluster.
Real-Time Stream Processing
This two-day course teaches data engineers how to process unbounded streams of data in real-time using popular open-source frameworks.
Analytics engineering with Snowflake and dbt
This 2-day training is dedicated to data analysts, analytics engineers & data engineers, who are interested in learning how to build and deploy Snowflake data transformation workflows faster than ever before.
Mastering ML/MLOps and AI-powered Data Applications in the Snowflake Data Cloud
This 2-day training is dedicated to data engineers, data scientists, and tech enthusiasts. This workshop will provide hands-on experience and real-world insights into architecting data applications on the Snowflake Data Cloud.
Modern Data Pipelines with DBT
In this one-day workshop, you will learn how to create modern data transformation pipelines managed by DBT. Discover how you can improve your pipelines’ quality and your data team's workflow by introducing a tool that standardizes the way you incorporate good practices within the data team.
Real-time analytics with Snowflake and dbt
This 2-day training is dedicated to data analysts, analytics engineers & data engineers, who are interested in learning how to build and deploy real-time Snowflake data pipelines.
Contact us
Interested in our solutions?
Contact us!
Together, we will select the best Big Data solutions for your organization and build a project that will have a real impact on your organization.
What did you find most impressive about GetInData?
They did a very good job in finding people that fitted in Acast both technically as well as culturally.
Fill in the form or send an e-mail: hello@getindata.com