Hadoop Administrator Training
This four-day course provides the practical and theoretical knowledge necessary to operate a Hadoop cluster. We put great emphasis on practical hands-on exercises that aim to prepare participants to work as effective Hadoop administrators.
Training Overview
After the training, participants will be able to independently install and configure a secure and stable Hadoop cluster. They will understand the architecture, requirements, and role of each core Hadoop component. They will also be prepared to troubleshoot problems with Hadoop clusters and tune cluster performance.
Course agenda*
Day 1
Hadoop Ecosystem
Course introduction
A quick introduction to core Hadoop components
Hands-on Exercises: Installing the Hadoop cluster using a cluster manager
- Connecting to machines in the public cloud
- Installing the cluster manager (Cloudera Manager or Apache Ambari)
- Installation of core components of a Hadoop cluster
Overview of HDFS
- Basic concepts e.g. writing/reading files, replication, metadata and blocks of data
- Daemons and cluster infrastructure e.g. NameNode, DataNodes
- Key properties and use-cases
- Hands-on Exercises: Verification of HDFS installation and running HDFS commands
Overview of YARN
- Motivation and basic concepts
- Daemons and cluster infrastructure e.g. ResourceManager, NodeManagers, containers
- Hands-on Exercises: Verification of YARN installation and running YARN commands
Overview of projects from Hadoop Ecosystem
- Processing data in Hadoop cluster with Hive
- Interactive analysis with Spark
- Transferring data to HDFS with Sqoop
- Defining and submitting workflow with Oozie
- Hands-on Exercises: Using Hive, Sqoop, and Spark
Day 2
Advanced Hadoop
Administrative aspects of HDFS
- NameNode internals e.g. metadata management, startup procedure, checkpointing with Secondary NameNode
- Important HDFS configuration settings
- Hands-on Exercises: Changing the Java heap size, restarting NameNode, checking checkpointing status, balancing HDFS
Administrative aspects of YARN
- Cluster resources e.g. container sizes, limits and best practices
- Important configuration settings
- Hands-on Exercises: Reviewing and tuning resource-related settings such as vcores and RAM
Monitoring and alerting
- Monitoring and alerting capabilities
- Hands-on Exercises: Creating custom charts, dashboards and receiving alerts
Day 3
Hadoop Security, High Availability and Multi-tenancy
Hadoop security
- Authentication with Kerberos
- Authorization for Hadoop (including Apache Sentry or Apache Ranger)
- Security-related features e.g. impersonation, encryption, auditing
High availability for Hadoop components
- HA design for HDFS, YARN, Hive, Oozie, HUE
- Hands-on Exercises: Enabling NameNode HA and verifying its correctness
- Bonus Hands-on Exercises: Migrating NameNode to a different host
- Bonus Hands-on Exercises: Enabling and verifying ResourceManager HA
YARN Schedulers
- Overview of Fair/Capacity Scheduler
- Hands-on Exercises: Configuring queues and ACLs in the Scheduler
- Hands-on Exercises: Configuring multi-tenant queues and ACLs in the Scheduler
Day 4
Popular Maintenance Tasks
Popular cluster maintenance tasks
- Hands-on Exercises: Expanding the cluster, balancing HDFS, decommissioning a node, troubleshooting a Spark application
Backup and Disaster Recovery
- Built-in BDR features and components in Hadoop and other Hadoop-related projects
- Hands-on Exercises: Using Trash, HDFS snapshots and DistCp
BONUS: Advanced configuration settings for HDFS and YARN
BONUS: Hardware and software selection for Hadoop clusters
Other Big Data Training
Machine Learning Operations Training (MLOps)
This four-day course will teach you how to operationalize Machine Learning models using popular open-source tools, like Kedro and Kubeflow, and deploy them using cloud computing.
Advanced Spark Training
This 2-day training is dedicated to Big Data engineers and data scientists who are already familiar with the basic concepts of Apache Spark and have hands-on experience implementing and running Spark applications.
Data Analyst Training
This four-day course teaches Data Analysts how to analyse massive amounts of data available in a Hadoop YARN cluster.
Real-Time Stream Processing
This two-day course teaches data engineers how to process unbounded streams of data in real-time using popular open-source frameworks.
Analytics engineering with Snowflake and dbt
This 2-day training is dedicated to data analysts, analytics engineers & data engineers who are interested in learning how to build and deploy Snowflake data transformation workflows faster than ever before.
Mastering ML/MLOps and AI-powered Data Applications in the Snowflake Data Cloud
This 2-day training is dedicated to data engineers, data scientists, and tech enthusiasts. This workshop will provide hands-on experience and real-world insights into architecting data applications on the Snowflake Data Cloud.
Modern Data Pipelines with DBT
In this one-day workshop, you will learn how to create modern data transformation pipelines managed by DBT. Discover how you can improve your pipelines’ quality and your data team's workflow by introducing a tool that standardizes how good practices are incorporated within the team.
Real-time analytics with Snowflake and dbt
This 2-day training is dedicated to data analysts, analytics engineers & data engineers who are interested in learning how to build and deploy real-time Snowflake data pipelines.
Contact us
Interested in our solutions?
Contact us!
Together, we will select the best Big Data solutions for your organization and build a project that will have a real impact on your organization.