About Apache Hadoop

Apache Hadoop is a collection of open-source software utilities that facilitate using a network of many computers to solve problems involving massive amounts of data and computation. It provides a software framework for distributed storage and processing of big data using the MapReduce programming model.
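The MapReduce model described above can be sketched in a few lines of plain Python. This is an illustrative word-count example, not the Hadoop API: real Hadoop jobs implement Mapper and Reducer classes in Java, and the framework performs the shuffle step across the cluster.

```python
from collections import defaultdict

def map_phase(lines):
    # Mapper: emit a (word, 1) pair for every word in every input line
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def shuffle(pairs):
    # Shuffle/sort: group all emitted values by key, as the framework would
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reducer: sum the counts for each word
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["Hadoop stores big data", "Hadoop processes big data"]
result = reduce_phase(shuffle(map_phase(lines)))
print(result["hadoop"])  # 2
```

The same three-stage shape (map, shuffle, reduce) is what Hadoop distributes across many machines, with HDFS providing the storage layer.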

Course Contents

The following topics are covered in the Big Data / Apache Hadoop course:

  • Introduction to Big Data and Hadoop
  • Getting Started with Hadoop
  • Introduction to Big Data Stack and Spark
  • The Motivation for Hadoop
  • Hadoop Overview
  • Data Storage: HDFS
  • Distributed Data Processing: MapReduce
  • Data Processing and Analysis: Pig
  • Data Integration: Sqoop & Flume
  • Other Hadoop Data Tools & EcoSystem
  • Hive as a Data Warehouse
  • HBase as a NoSQL Database
  • Oozie for Workflow Management & Scheduling
  • Cluster Computing and Hadoop Clusters
  • Hadoop Components and the Hadoop Ecosphere
  • What Do Hadoop Administrators Do?
  • Key Differences between Hadoop 1 and Hadoop 2
  • Distributed Data Processing: MapReduce and Spark
  • Data Integration: Apache Sqoop
  • Key Areas of Hadoop Administration
  • Distributed Computing and Hadoop
  • Hadoop 2 Architecture
  • Data Storage: HDFS (the Hadoop Distributed File System)
  • HDFS Architecture
  • Hadoop 1.x Components
  • Namenode
  • Fault Tolerance & High Availability
  • Failure Handling: FSImage
  • HDFS Commands
  • Hadoop Distributions and Installation Types
  • Understanding the Configuration files
  • Configuration Property names and Values
  • Setting Up a Portable Hadoop File System
  • Setting up a Pseudo-Distributed Hadoop 2 Cluster
  • Performing the Initial Hadoop Configuration
  • Operating the New Hadoop Cluster
  • Hands-On Exercise
  • Planning your Hadoop Cluster
  • Going from a Single Rack to Multiple Racks
  • Creating a Multi-Node Cluster
  • Modifying the Hadoop Configuration
  • Starting up the Cluster
  • Configuring Hadoop Services
  • Hands-On Exercise
  • MapReduce Anatomy
  • MapReduce Examples
  • Running MapReduce programs in Hadoop
  • Hadoop 2.x Components
  • Block size and performance
  • YARN
  • Hadoop 2.x vs Hadoop 1.x
  • Hands-On Exercise
  • Single-Node Setup
  • Hands-On Exercise
  • Multi-Node Setup
  • Scaling a Hadoop Cluster Up/Down
  • Replica Distribution and Automatic Discovery
  • Hands-On Exercise
  • Using Combiners
  • Reducing Intermediate Data with Combiners
  • Using The Distributed Cache
  • Logging
  • Splittable File Formats
  • Determining the Optimal Number of Reducers
  • Map-Only MapReduce Jobs
  • Hands-On Exercise
  • Sqoop Introduction & Architecture
  • Importing RDBMS Data to HDFS
  • Importing RDBMS Data to Hive
  • Apache Pig Introduction
  • Apache Pig Setup
  • Apache Pig Commands
  • FILTER
  • Structured Data (including XML/JSON) Processing Using Apache Pig
  • Parameter substitution
  • Macros in Pig
  • Unstructured data processing using Apache Pig
  • Best Practices for Pig
  • Pig UDF
  • Advanced Pig
  • Flume Introduction
  • Flume with the Local File System
  • Flume with HDFS
  • Flume with Hive
  • Flume with HBase
  • Apache Hive - Introduction
  • Apache Hive - Setup
  • Managed tables & external tables
  • Apache Hive - Commands
  • Unstructured Data Handling with Big Data Tools
  • Hands-On Use Case (Proof of Concept)
  • Best practices of monitoring a Hadoop cluster
  • Using logs and stack traces for monitoring and troubleshooting
  • Using open-source tools to monitor Hadoop cluster
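Several topics above, such as "Using Combiners" and "Reducing Intermediate Data with Combiners", can be illustrated with a small sketch. A combiner runs the reducer's aggregation logic locally on each mapper's output, so far fewer intermediate records cross the network during the shuffle. The function names below are illustrative, not the Hadoop API:

```python
from collections import Counter

def mapper_output(lines):
    # Raw mapper output: one (word, 1) record per word occurrence
    return [(w.lower(), 1) for line in lines for w in line.split()]

def combine(pairs):
    # Combiner: pre-aggregate locally, using the same logic as the reducer
    counts = Counter()
    for word, n in pairs:
        counts[word] += n
    return list(counts.items())

split = ["big data big data", "big data tools"]
raw = mapper_output(split)   # 7 intermediate records
combined = combine(raw)      # 3 records after local aggregation
print(len(raw), len(combined))  # 7 3
```

On a real cluster, that reduction in record count is the whole point: shuffle traffic often dominates job runtime, and a combiner is safe whenever the reduce operation is commutative and associative (as summation is).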

Have a Question?

Contact us

Website:

robochef.co