Course Details


HADOOP ADMINISTRATION

Apache Hadoop, the open-source data management software that helps organizations analyze huge volumes of structured and unstructured data, is a very hot topic across the tech industry. Through technical sessions and hands-on labs, you can quickly learn to take advantage of its MapReduce framework.


Hadoop Developer/Admin Training – Course Content
Training Objectives of Hadoop Developer/Admin:
The Hadoop Admin course covers the basic concepts of MapReduce applications developed with Hadoop, including a close look at the framework's components, the use of Hadoop for a variety of data analysis tasks, and numerous examples of Hadoop in action. It also examines related technologies such as Hive, Pig, and Apache Accumulo.
Target Students / Prerequisites:
Students should have an IT background and be familiar with Java and Linux concepts.
Hadoop Architecture:
Introduction to 

  • Parallel Computing vs. Distributed Computing
  • How to install Hadoop on your system
  • How to install a Hadoop cluster on multiple nodes
  • Hadoop Daemons introduction: NameNode, DataNode, JobTracker, TaskTracker
  • Exploring HDFS (Hadoop Distributed File System)
  • Exploring the HDFS Apache Web UI
  • NameNode architecture (EditLog, FsImage, location of replicas)
  • Secondary NameNode architecture
  • DataNode architecture
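HDFS splits each file into fixed-size blocks, and the NameNode records which DataNodes hold each block's replicas. As a rough sizing aid, the following Python sketch estimates the block count and raw DataNode storage for a single file. It assumes a 128 MB block size and a replication factor of 3, common defaults in Hadoop 2 and later (Hadoop 1 defaulted to 64 MB blocks); both are configurable, checksum metadata is ignored, and the helper name `hdfs_footprint` is hypothetical.

```python
MB = 1024 * 1024

def hdfs_footprint(file_bytes: int, block_size: int = 128 * MB,
                   replication: int = 3) -> tuple[int, int]:
    """Return (block count, raw bytes stored across all DataNodes).

    Hypothetical sizing helper. The final block holds only the remaining
    bytes (HDFS does not pad partial blocks on disk), so raw storage is
    file size x replication; checksum files are ignored.
    """
    if file_bytes < 0 or block_size <= 0 or replication < 1:
        raise ValueError("invalid size, block size, or replication factor")
    blocks = (file_bytes + block_size - 1) // block_size  # ceiling division
    return blocks, file_bytes * replication

# 1 GB file: 8 blocks of 128 MB, 3 replicas each -> 3072 MB of raw storage.
blocks, raw = hdfs_footprint(1024 * MB)
print(blocks, raw // MB)  # 8 3072
```

A 300 MB file, for example, becomes three blocks (128 + 128 + 44 MB), so the NameNode tracks three blocks and nine replicas even though the last block is small.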

MapReduce Architecture:

  • Exploring JobTracker/TaskTracker
  • How a client submits a MapReduce job
  • Exploring Mapper/Reducer/Combiner
  • Shuffle: Sort & Partition
  • Input/output formats
  • Job Scheduling (FIFO, Fair Scheduler, Capacity Scheduler)
  • Exploring the Apache MapReduce Web UI
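The map, shuffle (sort and group), and reduce stages can be illustrated with the classic word-count example. This is a single-process Python simulation for teaching purposes, not the Hadoop Java API: real jobs implement `Mapper` and `Reducer` classes and let the framework run the shuffle across the cluster. The function names here are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Iterator

def mapper(line: str) -> Iterator[tuple[str, int]]:
    # Map phase: emit (word, 1) for every word in one input line.
    for word in line.lower().split():
        yield word, 1

def shuffle(pairs: Iterable[tuple[str, int]]) -> list[tuple[str, list[int]]]:
    # Shuffle phase: group values by key and sort by key, so each key's
    # values reach a single reduce call together.
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())

def reducer(key: str, values: list[int]) -> tuple[str, int]:
    # Reduce phase: sum the counts for one word.
    return key, sum(values)

def run_word_count(lines: list[str]) -> list[tuple[str, int]]:
    mapped = (pair for line in lines for pair in mapper(line))
    return [reducer(key, values) for key, values in shuffle(mapped)]

print(run_word_count(["the quick fox", "the lazy dog", "The fox"]))
# [('dog', 1), ('fox', 2), ('lazy', 1), ('quick', 1), ('the', 3)]
```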

Hadoop Developer Tasks:

  • Writing a MapReduce program
  • Reading and writing data using Java
  • Hadoop Eclipse integration
  • Mapper in detail
  • Reducer in detail
  • Using Combiners
  • Reducing Intermediate Data with Combiners
  • Writing Partitioners for Better Load Balancing
  • Sorting in HDFS
  • Searching in HDFS
  • Indexing in HDFS
  • Hands-On Exercise
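Combiners and partitioners both act on mapper output before it reaches the reducers. The Python sketch below shows a combiner that pre-aggregates one mapper's (word, 1) pairs, and a partitioner that uses the same formula as Hadoop's default `HashPartitioner`, `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`. As an assumption, it uses a Java `String.hashCode()`-style hash as a stand-in for the key's hash (Hadoop's `Text` keys hash their bytes somewhat differently); all function names are hypothetical.

```python
from collections import Counter

def java_string_hash(key: str) -> int:
    """Java-style String.hashCode(): h = 31*h + c, as a signed 32-bit int.

    Matches Java for characters in the Basic Multilingual Plane; used here
    only as a stand-in for a Hadoop key's hashCode().
    """
    h = 0
    for ch in key:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h - (1 << 32) if h >= (1 << 31) else h

def hash_partition(key: str, num_reducers: int) -> int:
    # Same formula as Hadoop's default HashPartitioner:
    # (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks
    return (java_string_hash(key) & 0x7FFFFFFF) % num_reducers

def combine(mapper_output: list[tuple[str, int]]) -> list[tuple[str, int]]:
    # Combiner: sum counts locally on one mapper's output, so fewer
    # (word, count) records are shuffled across the network.
    totals: Counter = Counter()
    for word, count in mapper_output:
        totals[word] += count
    return sorted(totals.items())

pairs = [("the", 1), ("fox", 1), ("the", 1), ("the", 1)]
combined = combine(pairs)
print(combined)  # [('fox', 1), ('the', 3)]
print({word: hash_partition(word, 2) for word, _ in combined})  # {'fox': 1, 'the': 1}
```

Here the combiner cuts four intermediate records to two. A custom partitioner for better load balancing would replace `hash_partition` with logic that knows the key distribution, for example giving a known hot key its own reducer.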

Hadoop Administrative Tasks:

  • Routine Administrative Procedures
  • Understanding dfsadmin and mradmin
  • Block Scanner, Balancer
  • Health Check & Safe mode
  • DataNode commissioning/decommissioning
  • Monitoring and Debugging on a production cluster
  • NameNode Backup and Recovery
  • ACLs (Access Control Lists)
  • Upgrading Hadoop
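The Balancer evens out disk usage by moving blocks until each DataNode's utilization (used space divided by capacity) is within a threshold of the whole cluster's utilization; its `-threshold` option sets this value in percent, 10 by default. The following Python sketch shows only that classification step, with hypothetical node names and sizes; it does not model how the real Balancer chooses which blocks to move.

```python
def classify_datanodes(nodes: dict[str, tuple[float, float]],
                       threshold_pct: float = 10.0) -> dict[str, str]:
    """Label each DataNode relative to cluster-wide utilization.

    nodes maps a DataNode name to (used, capacity) in any consistent unit.
    A node counts as balanced when its utilization is within threshold_pct
    percentage points of the cluster utilization.
    """
    total_used = sum(used for used, _ in nodes.values())
    total_capacity = sum(cap for _, cap in nodes.values())
    cluster_util = 100.0 * total_used / total_capacity
    labels = {}
    for name, (used, cap) in nodes.items():
        util = 100.0 * used / cap
        if util > cluster_util + threshold_pct:
            labels[name] = "over-utilized"    # candidate source of block moves
        elif util < cluster_util - threshold_pct:
            labels[name] = "under-utilized"   # candidate target of block moves
        else:
            labels[name] = "balanced"
    return labels

print(classify_datanodes({"dn1": (900, 1000), "dn2": (300, 1000), "dn3": (600, 1000)}))
# {'dn1': 'over-utilized', 'dn2': 'under-utilized', 'dn3': 'balanced'}
```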

HBase Architecture:

  • Introduction to HBase
  • HBase vs. RDBMS
  • Exploring HBase Master & region server
  • Column Families and Regions
  • Basic HBase shell commands
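An HBase table is split into regions, each holding a contiguous, sorted range of row keys from a start key (inclusive) to an end key (exclusive), and each region is served by one region server. The Python sketch below shows how a row key maps to a region when the region start keys are known. It is a simplified model: the real client looks regions up in the `hbase:meta` catalog table (`.META.` in older releases) and caches the result. The split points and function name are hypothetical.

```python
from bisect import bisect_right

def find_region(region_start_keys: list[bytes], row_key: bytes) -> int:
    """Return the index of the region that contains row_key.

    region_start_keys must be sorted, and the first region starts at the
    empty key b"". Row keys compare as raw bytes, as in HBase.
    """
    if not region_start_keys or region_start_keys[0] != b"":
        raise ValueError("first region must start at the empty key")
    # The owning region is the last one whose start key is <= row_key.
    return bisect_right(region_start_keys, row_key) - 1

starts = [b"", b"g", b"p"]  # hypothetical regions: [ , g), [g, p), [p, )
for key in (b"apple", b"g", b"hbase", b"zebra"):
    print(key, find_region(starts, key))
# b'apple' 0, b'g' 1, b'hbase' 1, b'zebra' 2 (one per line)
```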

Hive Architecture:

  • Introduction to Hive
  • HBase vs Hive
  • Installation of Hive
  • HQL (Hive Query Language)
  • Basic Hive commands

Pig Architecture:

  • Introduction to Pig
  • Installation of Pig on your system
  • Basic Pig commands
  • Hands-On Exercise

Sqoop Architecture:

  • Introduction to Sqoop
  • Installation of Sqoop on your system
  • Import/Export data from RDBMS to HDFS
  • Import/Export data from RDBMS to HBase
  • Import/Export data from RDBMS to Hive
  • Hands-On Exercise
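During an import, Sqoop runs several parallel map tasks (four by default, set with `--num-mappers` or `-m`). It reads the minimum and maximum of a split column (the table's primary key unless `--split-by` names another) and gives each mapper a sub-range. The Python sketch below imitates that idea for an integer column; the column name `id` and both helper functions are hypothetical, and Sqoop's actual splitter differs in rounding details.

```python
def integer_splits(min_val: int, max_val: int,
                   num_mappers: int = 4) -> list[tuple[int, int]]:
    """Divide the inclusive range [min_val, max_val] into contiguous ranges.

    Simplified stand-in for Sqoop's integer splitting.
    """
    if num_mappers < 1:
        raise ValueError("num_mappers must be at least 1")
    if max_val < min_val:
        return []
    total = max_val - min_val + 1
    count = min(num_mappers, total)  # never more splits than distinct values
    size, extra = divmod(total, count)
    splits, lo = [], min_val
    for i in range(count):
        hi = lo + size - 1 + (1 if i < extra else 0)  # spread the remainder
        splits.append((lo, hi))
        lo = hi + 1
    return splits

def split_where_clauses(column: str, splits: list[tuple[int, int]]) -> list[str]:
    # Each mapper would import only the rows matching its own range.
    return [f"{column} >= {lo} AND {column} <= {hi}" for lo, hi in splits]

for clause in split_where_clauses("id", integer_splits(1, 1000)):
    print(clause)
```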

Mini Project / POC (Proof of Concept):

  • Facebook-Hive POC
  • Uses of Hadoop/Hive at Facebook
  • Static & dynamic partitioning
  • UDFs (User-Defined Functions)
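In Hive, each partition of a table is a subdirectory named `column=value` under the table's directory. With static partitioning the value is fixed in the statement (for example `INSERT OVERWRITE TABLE page_views PARTITION (dt='2014-01-01') ...`); with dynamic partitioning Hive derives it from the data at insert time. The Python sketch below models only the resulting directory layout for the dynamic case. The table name `page_views`, its columns, and the helper are illustrative assumptions, `/user/hive/warehouse` is Hive's usual default warehouse location, and Hive's escaping of special characters in partition values is not modeled.

```python
from collections import defaultdict
from typing import Any

Row = dict[str, Any]

def dynamic_partition(rows: list[Row], partition_cols: list[str],
                      table_dir: str) -> dict[str, list[Row]]:
    """Group rows into Hive-style key=value partition directories.

    Partition values come from the data itself (dynamic partitioning) and
    are dropped from the stored rows, since Hive keeps them in the path.
    """
    partitions: dict[str, list[Row]] = defaultdict(list)
    for row in rows:
        subdir = "/".join(f"{col}={row[col]}" for col in partition_cols)
        stored = {k: v for k, v in row.items() if k not in partition_cols}
        partitions[f"{table_dir}/{subdir}"].append(stored)
    return dict(partitions)

rows = [
    {"user_id": 1, "page": "home", "dt": "2014-01-01"},
    {"user_id": 2, "page": "photos", "dt": "2014-01-02"},
    {"user_id": 3, "page": "home", "dt": "2014-01-01"},
]
for path, stored in dynamic_partition(rows, ["dt"], "/user/hive/warehouse/page_views").items():
    print(path, len(stored))
# /user/hive/warehouse/page_views/dt=2014-01-01 2
# /user/hive/warehouse/page_views/dt=2014-01-02 1
```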

Course Reviews

Average Rating: 4.6

5 Stars: 24
4 Stars: 5
3 Stars: 2
2 Stars: 0
1 Star: 0