About Apache Hadoop
Apache Hadoop is a collection of open-source software utilities that facilitate using a network of many computers to solve problems involving massive amounts of data and computation. It provides a software framework for distributed storage and processing of big data using the MapReduce programming model.
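
To make the MapReduce programming model concrete, below is a minimal sketch of the classic word-count job written against Hadoop's org.apache.hadoop.mapreduce API. The class name and the input/output paths are placeholders supplied on the command line; treat this as an illustration of the model, not part of the course materials.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map phase: emit (word, 1) for every token in the input split.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reduce phase: sum the counts that the shuffle grouped under each word.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class); // combiner shrinks map output before the shuffle
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // e.g. /user/demo/input (placeholder)
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // must not already exist
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

The map phase emits a (word, 1) pair for each token, the shuffle groups the pairs by word, and the reduce phase sums each group's counts; here the reducer also serves as a combiner to cut intermediate data.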

Course Contents
The following topics are covered in the Big Data / Apache Hadoop course:
- Introduction to Big Data and Hadoop
- Getting Started with Hadoop
- Introduction to Big Data Stack and Spark
- The Motivation for Hadoop
- Hadoop Overview
- Data Storage: HDFS
- Distributed Data Processing: MapReduce
- Data Processing and Analysis: Pig
- Data Integration: Sqoop & Flume
- Other Hadoop Data Tools & Ecosystem
- Hive as a Data Warehouse
- HBase as a NoSQL Database
- Oozie for Workflow Management & Scheduling
- Cluster Computing and Hadoop Clusters
- Hadoop Components and the Hadoop Ecosphere
- What Do Hadoop Administrators Do?
- Key Differences between Hadoop 1 and Hadoop 2
- Distributed Data Processing: MapReduce and Spark
- Data Integration: Apache Sqoop
- Key Areas of Hadoop Administration
- Distributed Computing and Hadoop
- Hadoop 2 Architecture
- Data Storage – the Hadoop Distributed File System
- HDFS – Hadoop Distributed File System
- HDFS Architecture
- Hadoop 1.x Components
- NameNode
- Fault Tolerance & High Availability
- Failure Handling: FSImage
- HDFS Commands (see the HDFS sketch after this list)
- Hadoop Distributions and Installation Types
- Understanding the Configuration files
- Configuration Property names and Values
- Setting Up a Portable Hadoop File System
- Setting up a Pseudo-Distributed Hadoop 2 Cluster
- Performing the Initial Hadoop Configuration
- Operating the New Hadoop Cluster
- Hands-On Exercise
- Planning your Hadoop Cluster
- Going from a Single Rack to Multiple Racks
- Creating a Multi-Node Cluster
- Modifying the Hadoop Configuration
- Starting up the Cluster
- Configuring Hadoop Services
- Hands-On Exercise
- MapReduce Anatomy
- MapReduce Examples
- Running MapReduce programs in Hadoop
- Hadoop 2.x Components
- Block Size and Performance
- YARN
- Hadoop 2.x vs Hadoop 1.x
- Hands-On Exercise
- Single-Node Setup
- Hands-On Exercise
- Multi-Node Setup
- Scaling a Hadoop Cluster Up and Down
- Replication Distribution and Automatic Discovery
- Hands-On Exercise
- Using Combiners
- Reducing Intermediate Data with Combiners
- Using the Distributed Cache
- Logging
- Splittable File Formats
- Determining the Optimal Number of Reducers
- Map-Only MapReduce Jobs
- Hands-On Exercise
- Sqoop Introduction & Architecture
- Importing RDBMS Data into HDFS
- Importing RDBMS Data into Hive
- Apache Pig Introduction
- Apache Pig Setup
- Apache Pig Commands
- FILTER
- Structured Data Processing (including XML/JSON) Using Apache Pig
- Parameter Substitution
- Macros in Pig
- Unstructured Data Processing Using Apache Pig
- Best Practices for Pig
- Pig UDF
- Advanced Pig
- Flume Introduction
- Flume with the Local File System
- Flume with HDFS
- Flume with Hive
- Flume with HBase
- Apache Hive - Introduction
- Apache Hive - Setup
- Managed Tables & External Tables
- Apache Hive - Commands
- Unstructured Data Handling with Big Data Tools
- Hands-On Use Case PoC
- Best Practices for Monitoring a Hadoop Cluster
- Using Logs and Stack Traces for Monitoring and Troubleshooting
- Using Open-Source Tools to Monitor a Hadoop Cluster
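
As a companion to the HDFS topics above, here is a minimal sketch of common HDFS operations through Hadoop's Java FileSystem API, with the roughly equivalent hdfs dfs shell commands noted in comments. The paths are hypothetical, and the snippet assumes a running (e.g. pseudo-distributed) cluster whose fs.defaultFS is set in core-site.xml on the classpath.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsDemo {
  public static void main(String[] args) throws Exception {
    // Resolves the cluster from fs.defaultFS in core-site.xml,
    // e.g. hdfs://localhost:9000 on a typical pseudo-distributed setup.
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    Path dir = new Path("/user/demo");        // hypothetical paths for illustration
    Path file = new Path(dir, "hello.txt");

    fs.mkdirs(dir);                           // ~ hdfs dfs -mkdir -p /user/demo

    // Write a small file, overwriting if it exists. ~ hdfs dfs -put
    try (FSDataOutputStream out = fs.create(file, true)) {
      out.write("hello, hdfs\n".getBytes(StandardCharsets.UTF_8));
    }

    // Read it back. ~ hdfs dfs -cat /user/demo/hello.txt
    try (BufferedReader in = new BufferedReader(
        new InputStreamReader(fs.open(file), StandardCharsets.UTF_8))) {
      System.out.println(in.readLine());
    }

    fs.delete(file, false);                   // ~ hdfs dfs -rm (non-recursive)
  }
}
```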