Hadoop Developer
Data science is an interdisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge and insights from structured and unstructured data. In our Data Science program, expert faculty train students in these methods so that they can extract meaningful data points and share insights from structured and unstructured data across a broad range of application domains.
Course Details
Module Topic Details Status
1 Setting up VM and Hadoop
Status: Completed
Set up Cloudera VM
Install JDK
Installation steps for a Hadoop single-node cluster
Install VMware / VirtualBox
2 Introduction to Big Data
Status: Completed
Big Data landscape
Course content
Session details and feedback process
3 Hadoop Architecture, Networking and Cluster (HDFS & MapReduce)
Status: Completed
Name Node
Data Node
Secondary Name Node
Rack Awareness
Replication & Re-replication
HDFS Read & Write
4 Linux & HDFS Commands
Status: Completed
Basic Linux Commands
HDFS Commands
5 Working Session: Local FileSystem & HDFS Commands
6 MapReduce 1 (MRv1)
Status: Completed
Understanding MapReduce
JobTracker and TaskTracker
Architecture of MapReduce
Data Flow of MapReduce
Hadoop Writable Types and Java Data Types
Map Function & Reduce Function
How MapReduce Works
Anatomy of a MapReduce Job
Submission & Initialization of a MapReduce Job
Monitoring & Progress of a MapReduce Job
Understanding the Difference Between a Block and an Input Split
Role of the RecordReader and the Shuffle and Sort Phases
File Input Formats
How to Check the Logs of All Nodes (NN, DN, TT, JT, SNN)
Setting Up the Eclipse Development Environment
Creating MapReduce Projects
Configuring the Hadoop API in the Eclipse IDE
Life Cycle of a Job
Identity Reducer
7 Working Session: Programs
Status: Completed
MapReduce program flow with word count
Cricket match average score program
8 Assessment
Hadoop MCQ
9 Apache Sqoop
Installation of Sqoop
Introduction to Sqoop & Its Architecture
Import data from RDBMS to HDFS
Handling incremental loads using Sqoop
Hands-on exercise
10 Working Session: Sqoop Commands
11 Sqoop Assignment
12 Apache Hive
Apache Hive Introduction & History
End-to-End Workflow (Hive Architecture)
Data Types in Hive
Apache Hive Tables
Types of Tables in Hive (External & Internal)
Partitions (Static & Dynamic)
Types of Insertion (Single-Table & Multi-Table)
CTAS (CREATE TABLE AS SELECT) & CVAS (CREATE VIEW AS SELECT) Concepts
Bucketing
File Formats (RCFILE, TEXTFILE, ORCFILE, SEQUENCEFILE)
13 Working Session: Hive Practice
14 Hive Assignment
15 Apache Pig
Pig Introduction
Pig Architecture
Pig Commands
16 Working Session: Apache Pig Practice
17 YARN (MapReduce v2), Hadoop 2.x
Introduction to YARN
Architecture of YARN
18 ZooKeeper
Role of ZooKeeper
Journal Node
Uses of ZooKeeper
19 Apache HBase
HBase Introduction
HBase Commands
How to View Table Data
How to Insert, Update and Delete Data
20 Apache Oozie
Oozie Introduction
Oozie Components
How to Schedule a Job
What Is a Workflow
What Is a Coordinator
What Is a Bundle
21 Hue
Introduction to Hue
How to Run ETL Processes in Hue (Sqoop, Hive, Pig, Oozie)
22 Course Closure: Mock Interview
Closing the course with a mock interview on the concepts covered