901 Bannock St, Denver, CO 80204 USA
Hadoop Online Training – Course Content

Training Objectives of Hadoop:
This course covers the basic concepts of MapReduce applications developed with Hadoop, including a close look at the framework components, the use of Hadoop for a variety of data analysis tasks, and numerous examples of Hadoop in action. It also examines related technologies such as Hive, Pig, and Apache Accumulo.

Target Students / Prerequisites:
Students should have an IT background and be familiar with Java and Linux concepts.

Introduction, The Motivation for Hadoop:
  Problems with traditional large-scale systems
  Requirements for a new approach

Hadoop Basic Concepts:
  An Overview of Hadoop
  The Hadoop Distributed File System
  Hands-on Exercise
  How MapReduce Works
  Hands-on Exercise
  Anatomy of a Hadoop Cluster
  Other Hadoop Ecosystem Components

Writing a MapReduce Program:
  Examining a Sample MapReduce Program, with several examples
  Basic API Concepts
  The Driver Code
  The Mapper
  The Reducer
  Hadoop’s Streaming API

Delving Deeper Into The Hadoop API:
  More About ToolRunner
  Testing with MRUnit
  Reducing Intermediate Data With Combiners
  The configure and close methods for Map/Reduce Setup and Teardown
  Writing Partitioners for Better Load Balancing
  Hands-On Exercise
  Directly Accessing HDFS
  Using the Distributed Cache
  Hands-On Exercise

Performing several Hadoop jobs:
  The configure and close Methods
  Sequence Files
  Record Reader
  Record Writer
  Role of Reporter
  Output Collector
  Processing video files and audio files
  Processing image files
  Processing XML files
  Counters
  Directly Accessing HDFS
  ToolRunner
  Using The Distributed Cache

Common MapReduce Algorithms:
  Sorting and Searching
  Indexing
  Classification/Machine Learning
  Term Frequency-Inverse Document Frequency
  Word Co-Occurrence
  Hands-On Exercise: Creating an Inverted Index
  Identity Mapper
  Identity Reducer
  Exploring well-known problems using MapReduce applications

Using HBase:
  What is HBase?
  HBase API
  Managing large data sets with HBase
  Using HBase in Hadoop applications
  Hands-on Exercise

Using Hive and Pig:
  Hive Basics
  Pig Basics
  Hands-on Exercise

Practical Development Tips and Techniques:
  Debugging MapReduce Code
  Using LocalJobRunner Mode for Easier Debugging
  Retrieving Job Information with Counters
  Logging
  Splittable File Formats
  Determining the Optimal Number of Reducers
  Map-Only MapReduce Jobs
  Hands-on Exercise

Debugging MapReduce Programs:
  Testing with MRUnit
  Logging
  Classification/Machine Learning

Advanced MapReduce Programming:
  A Recap of the MapReduce Flow
  The Secondary Sort
  Customized InputFormats and OutputFormats
  Pipelining Jobs With Oozie
  Map-Side Joins
  Reduce-Side Joins

Joining Data Sets in MapReduce:
  Map-Side Joins
  The Secondary Sort
  Reduce-Side Joins

Monitoring and debugging on a Production Cluster:
  Counters
  Skipping Bad Records
  Rerunning failed tasks with IsolationRunner

Tuning for Performance in MapReduce:
  Reducing network traffic with combiners
  Partitioners
  Reducing the amount of input data
  Using Compression
  Reusing the JVM
  Running with speculative execution
  Refactoring code and rewriting algorithms
  Parameters affecting Performance
  Other Performance Aspects
© 2015 Mapple ITs. All rights reserved | Design by www.mappleits.com