
Big education


Big Data Analytics – Hadoop 2.0

Course Outline

Day 1

Introduction to Big Data

  • What is Big Data, Examples of Big Data, Use cases of Big Data
  • What is Hadoop, History of Hadoop, Where Hadoop is being used
  • Problems with Traditional Large-Scale Systems and Need for Hadoop
  • Understanding distributed systems and Hadoop

Software Installation

  • Prerequisites, Understanding Hadoop Configuration Files
  • Setting Up a Single-Node Hadoop Cluster, The Command-Line Interface

Hadoop Ecosystem

  • HDFS, MapReduce, Hive-Introduction, Sqoop-Introduction
  • Pig-Introduction, HBase-Introduction, Flume-Introduction
  • Spark-Introduction, Oozie-Introduction

Understanding Hadoop Distributed File System (HDFS)

  • Understanding Hadoop / HDFS Architecture
  • Hadoop Components – HDFS, MapReduce
  • NameNodes and DataNodes, Hadoop 2.0 Architecture
  • Running Hadoop, Web-Based Cluster UIs – Master UI, MapReduce UI

Hands-On Exercise: HDFS Commands

  • Basic HDFS commands

Understanding MapReduce

  • How MapReduce works, Data flow in MapReduce
  • Map operation, Reduce operation, MapReduce Driver Class
  • Running your First Program, Split, Record Reader (RR), Sorter
  • Shuffler and Partitioner, Combiner in-depth, Distributed Cache
  • Writing your first MapReduce Drivers, Mappers and Reducers in Java with Eclipse, Code Walkthrough, Error Handling in MapReduce
  • MapReduce Job Execution Flow In-Depth

Day 2

Hands-On Exercise: MapReduce

  • First MapReduce Program with Basic Word Count
  • Calculate Aggregation for Structured Data
  • Handling Unstructured Data, Processing Fixed Length Values

Apache Hive

  • Introduction to Apache Hive, Hive architecture, Installing Hive
  • Getting Data into Hive, Hive-HQL & Query Execution
  • Working with WHERE Clause, Partitions in Hive (Static and Dynamic)
  • Performing JOIN Operations in Hive (Map-Side and Reduce-Side Joins)
  • Compression in Hive (ORC), Executing Hive queries in real time

Hands-On Exercise: Hive Queries

  • Loading Data, Sample Query with WHERE and JOIN, Partitions

Apache Sqoop

  • Installing and Configuring Sqoop, Importing RDBMS Data into Hive Using Sqoop
  • Exporting from Hive to an RDBMS Using Sqoop, Incremental Loads

Hands-On Exercise: Sqoop

  • Import Data from RDBMS to HDFS and Hive
  • Export Data from HDFS or Hive to RDBMS

Day 3

Apache Pig

  • Introduction to Apache Pig, Installing Pig, Pig Architecture
  • Pig Latin – reading and writing data using Pig, Parameter Passing with Pig, UDFs in Pig, Managing Multiple Pig Scripts in Real-Time Cases
  • Executing Pig Scripts in Real-Time Projects

Apache HBase

  • What is HBase, Installing HBase, HBase Architecture
  • Command-Line Interface Exercise, MapReduce Programs in HBase
  • Filters in HBase, HBase – Hive Integration

Hands-On Exercise: HBase

  • HBase command-line interface, loading data into HBase with MapReduce

Day 4

Apache Spark

  • What Spark is and why, Installing Spark, Spark Standalone Cluster Mode and UI, Using the Spark Shell, Spark Components
  • Spark Streaming Overview, Functional Programming with Spark


  • Resilient Distributed Datasets (RDDs), Key-Value Pair RDDs
  • Spark Interface with Scala and Java


  • RDD Partitions and HDFS Data Locality, MapReduce and Pair RDD Operations, Programming in Spark
  • Example: Streaming Word Count, Creating the SparkContext
  • Configuring Spark properties, caching overview, distributed persistence
  • Other streaming operations, common Spark algorithms, iterative algorithms, Building and Running a Spark Application, Logging

Hands-On Exercise: Spark I

  • Hands-on examples in the Spark shell, hands-on Spark MapReduce
  • Building a Spark application


Introduction to Flume, Flume Installation

  • Flume Components (Agent, Source, Channel, Sink, Receiver)
  • Configuring Flume with a Source to Write Data to a File (Local and HDFS), Multiple Sources with Flume
  • Running Flume Agent, Running Receivers and Test with Sample Data

Hands-On Exercise: Spark II

  • Configuring Flume with an HDFS sink, Streaming Data to Spark
  • Validating Chat Application

Training Course Project

  • Real-time project using real-time data: take data from different source systems such as text files, CSV files and RDBMSs, load the data into Hadoop, and develop analytics solutions using MapReduce, Hive and Pig
May 31, 2016

Big Data Analytics: Data Mining and Predictive Analytics

Course Outline

RapidMiner Basics Pt. 1

  • Overview
    • Business Scenario, Analytics
    • Data Mining in the Enterprise, CRISP-DM
  • Basic usage
    • User Interface, Creating and Handling RapidMiner Repositories
    • Starting a New RapidMiner Project, Operators and Processes
    • Loading Data, Storing Data, Processes, and Results
  • EDA: Exploratory Data Analysis
    • Data Types, Data Hierarchy, Quick Summary Statistics
    • Visualizing Data, Charting
  • Data preparation
    • Normalization and Standardization
    • Basic Transformations of Value Types
    • Handling Missing Values, Sampling
    • Filtering examples and attributes, Handling attribute roles
  • Building better processes
    • Organizing, Renaming, Relative Path, Flow Control
    • Subprocesses, Building Blocks, Breakpoints
  • Predictive models
    • Correlations, K-Nearest Neighbor, Naive Bayes, Linear Regression
    • Rules, Decision Trees, Importance of Attributes
  • Model evaluation
    • Applying Models, Overfitting, Splitting Data
    • Evaluation Methods, Performance Criteria
  • Sharing and collaboration
    • Exporting Images, RapidMiner Server

RapidMiner Basics Pt. 2

  • Overview
    • Business Case Changes, Intro Course Recap, Loading New Data
  • EDA
    • Multiple Sources, Understanding New Attributes
    • Schema Relationships
  • Data preparation
    • Joins, Aggregation, Multi-level Aggregation
    • Pivot, Set Theory, Calculated Values, Regular Expressions
    • Changing Value Types, Balancing Data, Outlier Detection
    • Feature Selection, Dimensionality Reduction
  • Predictive Models (Sample Varies)
    • SVM, Random Forest, K-Means Clustering, Neural Networks
    • Logistic Regression, Meta Learning
  • Model evaluation
    • Advanced Performance Criteria, ROC Plots
    • Comparison between Models, Lift Chart
    • Significance Tests, Logging Results
    • Validation of Preprocessing and Preprocessing Models
  • Deployment
    • Sharing data, models, and processes
    • Exporting processes as web service
    • Basics of Report Creation
    • Managing Processes and Services
May 30, 2016


Introduction to Business Analytics
• The concept of Business Analytics
• Data, Information, Knowledge and Wisdom
• Data as Unique Enterprise Asset
• Data, Information and Analytics Lifecycle
• Business Analytics – Current Context
• Types of Analytics
  ◦ Descriptive Analytics
  ◦ Predictive Analytics
  ◦ Prescriptive Analytics

Data/Information Architecture for Business Analytics
• Data/Information Architecture
• Concept of Data Warehouse/Enterprise Data Warehouse (EDW)
• ETL – Key Process
• Concept of Data Mart
• Business Intelligence
• Data Mining

Data Mining Tool
• Understand the open source DM tool RapidMiner
• Explore the various features of RapidMiner
• Walk through a RapidMiner demo with different scenarios

Data Mining Techniques
• Understand the various data mining techniques
• Understand how correlation matrix works
• Understand how association rule mining works
• Understand the predictive analytics technique
• Understand the forecasting technique

Introduction to Big Data
• What is Big Data? Why Big Data?
• 3V’s of Big Data
• The Rapid Growth of Unstructured Data
• Big Data Market Forecast
• Big Data Analytics
• Big Data in Business
• Big Data Types & Architecture

Introduction to Hadoop
• Big Data – Current Industry Trends
• Why Process Big Data?
• Challenges in Data Processing
• Why Hadoop?
• What Does Hadoop Offer?
• Hadoop Network Structure
• Hadoop Eco-System
• Hadoop Core Components
• Hadoop – Features
• Hadoop – Relevance
• Hadoop in Action

Hadoop HDFS & MapReduce
• Hadoop HDFS
  ◦ What Does HDFS Facilitate?
  ◦ HDFS Architecture
  ◦ Hadoop Network and Server Infrastructure
  ◦ NameNode, Secondary NameNode and DataNode
  ◦ Ensuring Data Correctness
  ◦ Data Pipelining while Loading Data
  ◦ fs Operations
• Hadoop MapReduce
  ◦ MapReduce Conceptualization
  ◦ MapReduce – Overview
  ◦ MapReduce – Programming Model
  ◦ MapReduce – Execution Overview
  ◦ Hadoop – Application Examples
  ◦ Word Count – Example

Apache HBase
• What is HBase?
• HBase Architecture
• ZooKeeper
• HBase Data model
• HBase Deployment
• HBase Cluster Architecture
• Indexes in HBase
• Scaling HBase
• Data Locality, Coherence and Concurrency, Fault Tolerance
• Hadoop Integration
• High-Level Architecture
• Replication of Data Across Data Centres
• HBase Applications
• Advantages and Disadvantages

Apache Hive
• What is Hive?
• Why Hive?
• Where to use Hive?
• Hive Architecture
• Hive: Benefits
• Hive: Tradeoffs
• Hive: Real-World Examples

May 27, 2016

NICF – Statistics Bootcamp Using R and Tableau

Who should attend

This course is designed for:

  • Individuals with some IT background who would like to learn statistics with the latest tools, i.e., R and Tableau.
  • Individuals who learnt statistics a long time ago and would like to refresh or update their statistics knowledge.
  • Individuals who have no knowledge or experience in business analytics but would like to explore work opportunities in analytics.
  • Users in organisations who need to become familiar with state-of-the-art analytic techniques.

Prerequisites

  • Knowledge or familiarity with basic statistics and programming will be useful


What will be covered

  • Visualising and summarising data for quick insights
  • Comparing and evaluating different business strategies
  • Making predictions based on relevant factors
  • Introduction to big data analytics
  • Presenting and Reporting Analysis Results
May 27, 2016

Executive Education: Competing on Big Data Analytics

Day One

Basic Level in Big Data Analytics 

  • Understanding big data
  • Introduction to data science and big data analytics
  • Data Analytics Process and Lifecycle
  • Basic Data Analytics Techniques

Day Two

Advanced Level in Big Data Analytics   

  • Advanced Data Analytics Techniques
  • Data Mining and Extraction of Big Data
  • Predictive Modelling and Forecasting
  • Building business cases using big data analytics
May 26, 2016

Data Science Course


(a) Sessions 1, 2 and 3 – Data science in context; R and Python basics; and an R/Python data challenge

(b) Sessions 4 and 5 – Database concepts and Data collection

(c) Sessions 6, 7 and 8 – Insights, data visualization and storytelling; advanced data visualization; and Group Project 1

(d) Sessions 9 and 10 – Statistics fundamentals and data modelling; and data preparation

(e) Sessions 11 and 12 – Regression basics; and linear and logistic regression

(f) Sessions 13 and 14 – Clustering and classification; and sentiment analysis and natural language processing

(g) Sessions 15, 16 and 17 – Time series analysis and forecasting; introduction to big data; and Group Project 2


May 26, 2016

NICF – Data Analytics

This modular course includes the following topics:

– Introduction to Big Data
– Hadoop overview and basic concepts
– Writing MapReduce Applications
– Reducers and Partitioners
– Hadoop API library
– Input and output formats
– Advanced MapReduce features
– Sqoop, Flume, HBase, Hive and Pig overview
– Oozie overview

May 25, 2016


Course Outline

This course consists of two post-diploma certificates (PDCs). Each PDC comprises two modules, as follows:

Semester One
PDC 1: Certificate in Fundamentals of Data Analysis
Module 1 – Probability and Statistics
Module 2 – Data Mining

Semester Two
PDC 2: Certificate in Applied Data Analysis
Module 3 – Applied Statistics
Module 4 – Regression Analysis

May 25, 2016