About The Course
The Hadoop Cluster Administration training course is designed to provide the knowledge and skills needed to become a successful Hadoop Architect. It starts with the fundamental concepts of Apache Hadoop and the Hadoop Cluster, then covers how to deploy, configure, manage, monitor, and secure a Hadoop Cluster. The course also covers HBase Administration. There will be many challenging, practical and focused hands-on exercises for the learners. By the end of this Hadoop Cluster Administration training, you will be prepared to understand and solve real-world problems that you may come across while working on a Hadoop Cluster.
After completing the ‘Hadoop Administration’ course at LearnChase, you should be able to:
1. Get a clear understanding of Apache Hadoop, HDFS, Hadoop Cluster and Hadoop Administration
2. Gain insight into Hadoop 2.0, NameNode High Availability, HDFS Federation, YARN, MapReduce v2
3. Plan and Deploy a Hadoop Cluster
4. Load Data and Run Applications
5. Configure a Hadoop Cluster and tune its performance
6. Manage, Maintain, Monitor and Troubleshoot a Hadoop Cluster
7. Secure a deployment and understand Backup and Recovery
8. Understand Oozie, HCatalog/Hive, and HBase Administration
Who should go for this course?
This course is best suited to Systems Administrators, Windows Administrators, Linux Administrators, Infrastructure Engineers, DB Administrators, Big Data Architects, Mainframe Professionals and IT Managers who are interested in learning Hadoop Administration.
Basic knowledge of Linux is required as Hadoop runs on Linux. LearnChase offers a complimentary course on “Linux Fundamentals” to all the Hadoop Administration course participants.
How will I execute the Practicals?
For your practical work, we will help you set up a Virtual Machine on your system, which gives you local access. You can also create an account on AWS EC2 and use ‘Free tier usage’ eligible servers to create your Hadoop Cluster on AWS EC2. The step-by-step procedure is documented and shared in the LMS. Our 24/7 expert support team will also be available to assist you.
Which Case-Studies will be a part of the Course?
Towards the end of the course, you will work on a live project in which different Hadoop ecosystem components work together in a Hadoop implementation to solve Big Data problems.
1. Set up a minimum 2-node Hadoop Cluster
Node 1 – NameNode, DataNode, TaskTracker
Node 2 – JobTracker, DataNode, TaskTracker
2. Create a simple text file and copy it to HDFS
Find out the location of the node to which it went
Find in which data node the output files are written
3. Create a large text file and copy it to HDFS with a block size of 256 MB. Keep all the other files at the default block size and find out how block size affects performance
4. Set a spaceQuota of 200MB for projects and copy a file of 70MB with replication=2
What is the reason it is not letting you copy the file?
How will you solve this problem without increasing the spaceQuota?
5. Configure Rack Awareness and copy the file to HDFS
Find its rack distribution and the command used for it
How do you change the replication factor of an existing file?
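The space quota exercise (item 4) comes down to simple arithmetic. The following is a minimal sketch, not HDFS code: it assumes the NameNode checks the quota at block allocation time by reserving one full block per replica, and that usage is adjusted to the actual bytes once a block is complete. The function name `fits_quota` and the block sizes tried are illustrative assumptions, since the exercise does not state the cluster's block size.

```python
MB = 1024 * 1024

def fits_quota(file_size, block_size, replication, space_quota):
    """Return True if every block allocation stays within the space quota.

    Assumption: each new block reserves block_size * replication bytes
    up front; after the block is written, only the real bytes count.
    """
    consumed = 0
    remaining = file_size
    while remaining > 0:
        # Reservation check for the next block, across all replicas.
        if consumed + block_size * replication > space_quota:
            return False
        written = min(block_size, remaining)
        consumed += written * replication
        remaining -= written
    return True

# 70 MB file, replication 2, 200 MB quota, assumed 64 MB blocks:
# the second block would need 128 MB + 128 MB = 256 MB, so the copy fails.
print(fits_quota(70 * MB, 64 * MB, 2, 200 * MB))   # False
# A smaller block size or a lower replication factor fits the same quota.
print(fits_quota(70 * MB, 32 * MB, 2, 200 * MB))   # True
print(fits_quota(70 * MB, 64 * MB, 1, 200 * MB))   # True
```

Under these assumptions, the file's real footprint (70 MB × 2 = 140 MB) is below the quota, yet the copy is rejected because of the per-block reservation. That is why changing the block size or replication factor for the copy can help without raising the quota.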
The final certification project is based on real world use cases as follows:
Problem Statement 1:
1. Set up a single-node or 2-node Hadoop cluster in which all daemons (NameNode, DataNode, JobTracker, TaskTracker) run, with block size = 128MB
2. Write a Namespace ID for the cluster and create a directory with a namespace quota of 10 and a space quota of 100MB
3. Use the distcp command to copy projects to the same cluster and create the list of DataNodes participating in the cluster
Problem statement 2:
1. Save the namespace of the NameNode without using the Secondary NameNode. The edits file must be merged without stopping the NameNode daemon.
2. Set an include file so that no other nodes can talk to the NameNode
3. Set the cluster rebalancer threshold to 40%.
4. Set the map and reduce slots to 4 and 2 respectively for each node
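The slot and include-file settings in Problem Statement 2 are expressed as Hadoop configuration properties. As a rough sketch, the helper below renders such properties into the `<configuration>` XML layout that Hadoop site files use. The helper name `render_site_xml` and the include-file path are hypothetical; the property names are the Hadoop 1.x ones (`mapred.tasktracker.map.tasks.maximum`, `mapred.tasktracker.reduce.tasks.maximum`, `dfs.hosts`) and should be checked against the version you deploy.

```python
import xml.etree.ElementTree as ET

def render_site_xml(properties):
    """Render a dict of property names and values as Hadoop-style XML."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")

# mapred-site.xml: 4 map slots and 2 reduce slots per TaskTracker.
mapred_site = render_site_xml({
    "mapred.tasktracker.map.tasks.maximum": 4,
    "mapred.tasktracker.reduce.tasks.maximum": 2,
})

# hdfs-site.xml: only hosts listed in this file may register as DataNodes.
# The path is a placeholder chosen for illustration.
hdfs_site = render_site_xml({"dfs.hosts": "/etc/hadoop/conf/dfs.include"})

print(mapred_site)
print(hdfs_site)
```

The balancer threshold, by contrast, is normally passed on the command line (for example `hadoop balancer -threshold 40`) rather than set in a site file.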
1. Hadoop Cluster Administration
Learning Objectives – In this module, you will understand what Big Data and Apache Hadoop are, how Hadoop solves Big Data problems, the Hadoop Cluster Architecture, the MapReduce framework, Hadoop data loading techniques, and the role of a Hadoop Cluster Administrator.
Topics – Introduction to Big Data, Hadoop Architecture, MapReduce Framework, A typical Hadoop Cluster, Data Loading into HDFS, Hadoop Cluster Administrator: Roles and Responsibilities
2. Hadoop Architecture and Cluster setup
Learning Objectives – After this module, you will understand Multiple Hadoop Server roles such as NameNode and DataNode, and MapReduce data processing. You will also understand the Hadoop 1.0 Cluster setup and configuration, Setting up Hadoop Clients using Hadoop 1.0, and important Hadoop configuration files and parameters.
Topics – Hadoop server roles and their usage, Rack Awareness, Anatomy of Write and Read, Replication Pipeline, Data Processing, Hadoop Installation and Initial Configuration, Deploying Hadoop in pseudo-distributed mode, deploying a multi-node Hadoop cluster, Installing Hadoop Clients
3. Hadoop Cluster: Planning and Managing
Learning Objectives – In this module, you will understand Planning and Managing a Hadoop Cluster, Hadoop Cluster Monitoring and Troubleshooting, Analyzing logs, and Auditing. You will also understand Scheduling and Executing MapReduce Jobs, and different Schedulers.
Topics – Planning the Hadoop Cluster, Cluster Size, Hardware and Software considerations, Managing and Scheduling Jobs, types of schedulers in Hadoop, Configuring the schedulers and running MapReduce jobs, Cluster Monitoring and Troubleshooting.
4. Backup, Recovery and Maintenance
Learning Objectives – In this module, you will understand day-to-day Cluster Administration tasks such as adding and removing DataNodes, NameNode recovery, configuring Backup and Recovery in Hadoop, diagnosing node failures in the Cluster, and upgrading Hadoop.
Topics – Configure Rack Awareness, Setting up Hadoop Backup, whitelist and blacklist DataNodes in a cluster, set up quotas, upgrade Hadoop cluster, copy data across clusters using distcp, Diagnostics and Recovery, Cluster Maintenance.
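Rack awareness is usually configured by pointing Hadoop at a topology script (property `topology.script.file.name` in Hadoop 1.x, `net.topology.script.file.name` in Hadoop 2.x). Hadoop calls the script with one or more host names or IP addresses and reads one rack path per argument from standard output. The sketch below shows one possible shape for such a script; the IP addresses, rack names, and the `resolve` helper are made up for illustration.

```python
import sys

# Hypothetical host-to-rack mapping; replace with your own inventory.
RACK_BY_HOST = {
    "10.0.1.11": "/dc1/rack1",
    "10.0.1.12": "/dc1/rack1",
    "10.0.2.21": "/dc1/rack2",
}

# Rack name Hadoop itself uses when no topology is configured.
DEFAULT_RACK = "/default-rack"

def resolve(hosts):
    """Return one rack path per host, in the same order as the input."""
    return [RACK_BY_HOST.get(host, DEFAULT_RACK) for host in hosts]

if __name__ == "__main__":
    # Hadoop passes the hosts as command-line arguments.
    print(" ".join(resolve(sys.argv[1:])))
```

Unknown hosts fall back to the default rack here so the script always returns an answer for every argument it receives.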
5. Hadoop 2.0 and High Availability
Learning Objectives – In this module, you will understand Secondary NameNode setup and checkpointing, Hadoop 2.0 New Features, HDFS High Availability, YARN framework, MRv2, and Hadoop 2.0 Cluster setup in pseudo-distributed and distributed mode.
Topics – Configuring Secondary NameNode, Hadoop 2.0, YARN framework, MRv2, Hadoop 2.0 Cluster setup, Deploying Hadoop 2.0 in pseudo-distributed mode, deploying a multi-node Hadoop 2.0 cluster.
6. Advanced Topics: QJM, HDFS Federation and Security
Learning Objectives – In this module, you will understand basics of Hadoop security, Managing security with Kerberos, HDFS Federation setup and Log Management. You will also understand HDFS High Availability using Quorum Journal Manager (QJM).
Topics – Configuring HDFS Federation, Basics of Hadoop Platform Security, Securing the Platform, Configuring Kerberos.
7. Oozie, HCatalog/Hive and HBase Administration
Learning Objectives – In this module, you will understand setting up the Apache Oozie Workflow Scheduler for Hadoop jobs, HCatalog/Hive Administration, deploying HBase with other Hadoop components, and using HBase effectively to load data, including writing to and reading from HBase.
Topics – Oozie, HCatalog/Hive Administration, HBase Architecture, HBase setup, HBase and Hive Integration, HBase performance optimization.
8. Project: Hadoop Implementation
Learning Objectives – In this module, you will understand how multiple Hadoop ecosystem components work together in a Hadoop implementation to solve Big Data problems. You will also learn how to plan, design, and deploy a Hadoop Cluster using a typical Real-World Use Case.
Topics – Understanding the Problem, Plan, Design, and Create a Hadoop Cluster for a Real-World Use Case, Set up and Configure commonly used Hadoop ecosystem components such as Pig and Hive, Configure Ganglia on the Hadoop cluster and troubleshoot common Cluster Problems.
- 10 Days