HADOOP ADMINISTRATION
Apache Hadoop, the open source data management software that helps organizations analyze huge volumes of structured and unstructured data, is a very hot topic across the tech industry. You can quickly learn to take advantage of the MapReduce framework through technical sessions and hands-on labs.
Course Details
Hadoop Developer/Admin Training – Course Content
Training Objectives of Hadoop Developer/Admin:
The Hadoop Admin course covers the basic concepts of MapReduce applications developed using Hadoop, including a close look at framework components, the use of Hadoop for a variety of data analysis tasks, and numerous examples of Hadoop in action. The course also examines related technologies such as Hive, Pig, and Apache Accumulo.
Target Students / Prerequisites:
Students should have an IT background and be familiar with Java and Linux concepts.
Hadoop Architecture:
Introduction to
- Parallel Computing vs. Distributed Computing
- How to install Hadoop on your system
- How to install Hadoop cluster on multiple
- Hadoop Daemons introduction: NameNode, DataNode, JobTracker, TaskTracker
- Exploring HDFS (Hadoop Distributed File System)
- Exploring the HDFS Apache Web UI
- NameNode architecture (EditLog, FsImage, location of replicas)
- Secondary NameNode architecture
- DataNode architecture
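As a rough illustration of how HDFS spreads one file across DataNodes, the sketch below computes how many blocks and block replicas a file produces. The block size (128 MB) and replication factor (3) are assumed values chosen for the example; both are configurable per cluster. The function name `hdfs_block_layout` is hypothetical and not part of any Hadoop API.

```python
import math

def hdfs_block_layout(file_size_mb, block_size_mb=128, replication=3):
    """Return (blocks, replicas, raw_storage_mb) for a single file.

    block_size_mb and replication are assumptions for illustration;
    a real cluster takes them from its configuration.
    """
    # A file is cut into fixed-size blocks; the last block may be partial.
    blocks = math.ceil(file_size_mb / block_size_mb)
    # Every block is stored `replication` times on different DataNodes.
    replicas = blocks * replication
    # A partial last block only occupies the space of the data it holds,
    # so raw storage is file size times replication.
    raw_storage_mb = file_size_mb * replication
    return blocks, replicas, raw_storage_mb

blocks, replicas, raw_mb = hdfs_block_layout(1000)
print(blocks, replicas, raw_mb)
```

The NameNode keeps track of which DataNodes hold each of these replicas, which is the "location of replicas" item in the outline above.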
MapReduce Architecture:
- Exploring JobTracker/TaskTracker
- How a client submits a Map-Reduce job
- Exploring Mapper/Reducer/Combiner
- Shuffle: Sort & Partition
- Input/output formats
- Job Scheduling (FIFO, Fair Scheduler, Capacity Scheduler)
- Exploring the Apache MapReduce Web UI
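To make the Mapper, Combiner, Shuffle (sort and partition), and Reducer stages concrete, here is a minimal in-memory word count written in plain Python. It is a teaching sketch only and does not use Hadoop; all function names are hypothetical, and the byte-sum partition function is a toy stand-in for a real hash partitioner.

```python
from collections import defaultdict

def mapper(line):
    # Emit a (word, 1) pair for every word in one input line.
    for word in line.lower().split():
        yield word, 1

def combiner(pairs):
    # Pre-aggregate a single mapper's output before it is shuffled.
    local = defaultdict(int)
    for key, value in pairs:
        local[key] += value
    return list(local.items())

def partition(key, num_reducers):
    # Toy deterministic partitioner (assumption, not Hadoop's own).
    return sum(key.encode("utf-8")) % num_reducers

def shuffle(mapper_outputs, num_reducers):
    # Route each key to one reducer, group its values, and sort by key.
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for pairs in mapper_outputs:
        for key, value in pairs:
            buckets[partition(key, num_reducers)][key].append(value)
    return [sorted(bucket.items()) for bucket in buckets]

def reducer(key, values):
    # Sum the partial counts for one word.
    return key, sum(values)

def run_job(splits, num_reducers=2):
    # One mapper (plus combiner) runs per input split.
    mapper_outputs = []
    for split in splits:
        pairs = [pair for line in split for pair in mapper(line)]
        mapper_outputs.append(combiner(pairs))
    result = {}
    for reducer_input in shuffle(mapper_outputs, num_reducers):
        for key, values in reducer_input:
            word, total = reducer(key, values)
            result[word] = total
    return result

splits = [["the quick brown fox", "the lazy dog"], ["the fox"]]
counts = run_job(splits)
print(counts["the"], counts["fox"])
```

In a real job the framework performs the shuffle and calls the mapper and reducer; the developer supplies only those functions and, optionally, a combiner and partitioner.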
Hadoop Developer Tasks:
- Writing a map-reduce programme
- Reading and writing data using
- Java Hadoop Eclipse integration
- Mapper in detail
- Reducer in detail
- Using Combiners
- Reducing Intermediate Data with Combiners
- Writing Partitioners for Better Load Balancing
- Sorting in HDFS
- Searching in HDFS
- Indexing in HDFS
- Hands-On Exercise
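The partitioner topic in this module can be previewed with a small experiment: count how many keys each reducer receives under two partitioning strategies. This is a plain Python sketch under stated assumptions, not Hadoop code. The hash is modeled on Java's `String.hashCode` for ASCII keys, and the mask-then-modulo step mirrors the usual hash-partitioning idea; actual Hadoop key types may hash differently. All function names are hypothetical.

```python
from collections import Counter

def java_style_hash(key):
    # 32-bit polynomial hash in the style of Java's String.hashCode
    # (ASCII keys assumed for this illustration).
    h = 0
    for ch in key:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h

def hash_partition(key, num_reducers):
    # Clear the sign bit, then take the remainder, so the index is never negative.
    return (java_style_hash(key) & 0x7FFFFFFF) % num_reducers

def first_letter_partition(key, num_reducers):
    # Naive partitioner: keys sharing a first letter land on the same reducer.
    return ord(key[0]) % num_reducers

def reducer_load(keys, partitioner, num_reducers):
    # Number of keys assigned to each reducer index.
    counts = Counter(partitioner(k, num_reducers) for k in keys)
    return [counts[i] for i in range(num_reducers)]

keys = ["user%d" % i for i in range(8)]
naive_load = reducer_load(keys, first_letter_partition, 4)
hashed_load = reducer_load(keys, hash_partition, 4)
print(naive_load)
print(hashed_load)
```

With these eight keys the first-letter partitioner sends every record to a single reducer, while the hash-based one spreads them evenly across all four, which is the load-balancing effect a well-chosen partitioner aims for.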
Hadoop Administrative Tasks:
- Routine Administrative Procedures
- Understanding dfsadmin and mradmin
- Block Scanner, Balancer
- Health Check & Safe mode
- DataNode commissioning/decommissioning
- Monitoring and Debugging on a production cluster
- NameNode Backup and Recovery
- ACL (Access control list)
- Upgrading Hadoop
HBase Architecture:
- Introduction to HBase
- HBase vs. RDBMS
- Exploring HBase Master & region server
- Column Families and Regions
- Basic HBase shell commands
Hive Architecture:
- Introduction to Hive
- HBase vs Hive
- Installation of Hive
- HQL (Hive query language)
- Basic Hive commands
Pig Architecture:
- Introduction to Pig
- Installation of Pig on your system
- Basic Pig commands
- Hands-On Exercise
Sqoop Architecture:
- Introduction to Sqoop
- Installation of Sqoop on your system
- Import/Export data between RDBMS and HDFS
- Import/Export data between RDBMS and HBase
- Import/Export data between RDBMS and Hive
- Hands-On Exercise
Mini Project / POC (Proof of Concept):
- Facebook-Hive POC
- Usages of Hadoop/Hive @ Facebook
- Static & dynamic partitioning
- UDF (User-defined functions)