Real-Time Stream Processing
This two-day course teaches data engineers how to process unbounded streams of data in real time using popular open-source frameworks.
Outcome
After the training, participants will be able to independently implement real-time big data processing scenarios with Apache Kafka and Apache Flink.
They will also understand the inner workings of these widely used open-source streaming technologies.
Course agenda
Day 1
Introduction to Apache Kafka and Flink
Real-time data collection with Apache Kafka
- Key concepts of the log-based approach
- Daemons and cluster infrastructure
- Hands-on exercise: Interacting with a Kafka Cluster to produce and consume messages with CLI scripts
Interactive reporting and data exploration with Elasticsearch
- A search engine as the core of data-driven decisions
- Live demo: visualizing continuously arriving data with Kibana
Introduction to Apache Flink
- Constructing DataStreams with Flink APIs
- Hands-on exercises: Applying simple filters to a stream of events and running jobs in a YARN cluster
- Grouping data into windows based on different notions of time
- Hands-on exercises: Calculating user session statistics
- Connecting to the external world
- Hands-on exercises: Reading events from Kafka and writing statistics to Elasticsearch for real-time dashboards in Kibana
Day 2
Apache Flink Advanced
Deep dive into Apache Flink
- Advanced time handling, when out-of-the-box solutions are not enough
- Daemons and cluster infrastructure, overview of deployment modes, e.g. YARN, Mesos, Docker, Standalone
- Accessing fault-tolerant state and how it is checkpointed
- Hands-on exercises: Using low-level functions and state for constructing complex time-based scenarios
- Advantages of the relational approach with StreamSQL
- Hands-on exercises: Querying streams with SQL
- Early alerting based on a sequence of events with the Flink CEP library
- Hands-on exercises: Writing pattern sequences and converting matches to alerts
Comparison with other streaming frameworks such as Spark Streaming, Kafka Streams, and Storm
- Daemons and cluster infrastructure
- How they implement fault-tolerance
- Feature sets
Other Big Data Training
Machine Learning Operations Training (MLOps)
This four-day course will teach you how to operationalize Machine Learning models using popular open-source tools, like Kedro and Kubeflow, and deploy them using cloud computing.
Hadoop Administrator Training
This four-day course provides the practical and theoretical knowledge necessary to operate a Hadoop cluster. We put great emphasis on practical hands-on exercises that aim to prepare participants to work as effective Hadoop administrators.
Advanced Spark Training
This 2-day training is dedicated to Big Data engineers and data scientists who are already familiar with the basic concepts of Apache Spark and have hands-on experience implementing and running Spark applications.
Data Analyst Training
This four-day course teaches Data Analysts how to analyse massive amounts of data available in a Hadoop YARN cluster.
Analytics engineering with Snowflake and dbt
This 2-day training is dedicated to data analysts, analytics engineers & data engineers who are interested in learning how to build and deploy Snowflake data transformation workflows faster than ever before.
Mastering ML/MLOps and AI-powered Data Applications in the Snowflake Data Cloud
This 2-day training is dedicated to data engineers, data scientists, or tech enthusiasts. This workshop will provide hands-on experience and real-world insights into architecting data applications on the Snowflake Data Cloud.
Modern Data Pipelines with DBT
In this one-day workshop, you will learn how to create modern data transformation pipelines managed by DBT. Discover how you can improve the quality of your pipelines and your data team's workflow by introducing a tool that standardizes how good practices are applied across the team.
Real-time analytics with Snowflake and dbt
This 2-day training is dedicated to data analysts, analytics engineers & data engineers who are interested in learning how to build and deploy real-time Snowflake data pipelines.
Contact us
Interested in our solutions?
Contact us!
Together, we will select the best Big Data solutions for your organization and build a project that will have a real impact on it.