Hadoop

Hadoop/MR vs Spark/RDD Example by Word count analysis

Apache Spark provides an efficient way for solving iterative algorithms b...
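A minimal sketch of word-count logic using plain Scala collections instead of the Spark API, so it runs without a cluster. It assumes words are separated by whitespace, and the sample lines are hypothetical. The steps correspond to the usual RDD pipeline: `flatMap` splits lines into words, `map` pairs each word with 1, and `reduceByKey(_ + _)` sums the counts. Here `groupBy` plus a per-key sum stands in for `reduceByKey`.

```scala
// Word count over an in-memory collection of lines.
// Mirrors the RDD pipeline: flatMap(split) -> map((w, 1)) -> reduceByKey(_ + _).
def wordCount(lines: Seq[String]): Map[String, Int] =
  lines
    .flatMap(_.split("\\s+"))   // split each line into words
    .filter(_.nonEmpty)         // drop empty tokens caused by leading spaces
    .map(word => (word, 1))     // pair each word with a count of 1
    .groupBy(_._1)              // group pairs by word (the "shuffle" step)
    .map { case (word, pairs) => (word, pairs.map(_._2).sum) } // sum per word

// Hypothetical sample input.
val sample = Seq("hadoop is simple", "spark is fast", "hadoop and spark")
println(wordCount(sample))
```

In Hadoop MapReduce, the same logic is split into a mapper that emits `(word, 1)` and a reducer that sums the values for each key.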

Converting Airline dataset from the row format to colum...

To process Big Data, a huge number of machines is required. Instead of buying ...

Apache Spark : RDD vs DataFrame vs Dataset

With the Spark 2.0 release, there are 3 types of data abstractions which Spark of...

DL_BTEQ_WRAPPER_SRIPT

#!/bin/bash #==================================================================...

DL_TDCH_SCRIPT

#!/usr/bin/bsh #===============================================================...

DL_PARQUET_WRAPPER_SRIPT

#==============================================================================...

SCHEMA_GENERATION_WRAPPER_SRIPT

#Source DBSRVR config file #Taking user input and passing to DBSRVR file echo...

spark_1.x_sql_examples_1

-------------------------------------------------------------------------------...

spark_1.x_sql_examples_2

-------------------------------------------------------------------------------...

spark_1.x_submit_examples

-------------------------------------------------------------------------------...

spark_2.x_sql_examples_1

-------------------------------------------------------------------------------...

spark_2.x_submit_examples

-------------------------------------------------------------------------------...

spark_command_line_examples

------------------------------------------------------------ Installation Ste...

spark_core_rdd_practice

val nums = List(1,2,3,4,5,6) val chars = List('a', 'b', 'c', 'd', 'e', 'f') s...

spark_day_1_practice

-------------------------------------------------- Spark Provides Libraries -> ...

spark_day_2_practice

--------------------------------------------------- def myfunc[T](index: Int, i...

spark_day_3_practice

------------------------------------------------------- def myfunc[T](index: I...

SPARK_DAY1_PRACTICE

Spark: --------------------- Apache Spark™ is a fast and general engine for lar...

Spark_Day3_3

--------------------------------------------------- val t1 = List((1, "kalyan")...

Spark_Day2_2

--------------------------------------------------- Find the `average` of 1 to ...

spark_hadoop_commands

=================================================================== Hadoop Spar...

spark_graphx_examples

------------------------------------------------ $SPARK_HOME/bin/run-example o...

spark_projects_build_commands

How to Build Eclipse Project: -------------------------------------------- 1. ...

spark_mllib_examples

---------------------------------------------- hadoop fs -put $SPARK_HOME/data ...

spark_streaming_examples

Create Spark Streaming Context: ========================================== scal...

HADOOP (PROOF OF CONCEPT) HEALTHCARE POC BY MAHESH CHAN...

INDUSTRY: HEALTHCARE DATA INPUT FORMAT :- PDF (My Input Data is in PDF Form...

HADOOP (PROOF OF CONCEPT) SENSEXLOG EXCEL DATA BY MAHES...

INDUSTRY: SENSEX LOG DATA INPUT FORMAT :- .xls (My Input Data is in excel...

HADOOP (PROOF OF CONCEPT) RETAIL DATA BY MAHESH CHANDRA...

INDUSTRY: RETAIL Data Input Format :- .xls (My Input Data is in excel 2007-20...

HADOOP - EXCEL INPUT FORMAT TO READ ANY EXCEL FILE

Hello Friends, After publishing my blogs on my POC for processing Excel file...

HIVE 2.1.1 INSTALLATION IN HADOOP 2.7.3 IN UBUNTU 16

Hello Friends, Welcome to the blog where I am going to explain and take you...

HADOOP (PROOF OF CONCEPTS) WEATHER REPORT ANALYSIS

Hello Friends, Welcome back... This blog is for analysis of Weather Report P...

HADOOP POC ON EXCEL DATA WEATHER REPORT ANALYSIS

Hello Friends, Glad to present this blog which is for analysis of Weather Re...

XML FILE PROCESSING IN HADOOP

Dear Friends, Welcome back, after a long time. I was asked by one of my fri...

MULTIPLE OUTPUT WITH MULTIPLE INPUT FILE NAME

Dear Friends, I was asked how to process different files at a...

WAYS TO BULK LOAD DATA IN HBASE

Dear Friends, continuing with my posts, this one was requested by one of my frie...

HIVE TO THE RESCUE: A HEALTHCARE USE CASE ON STRUCTURED DATA

Dear Friends, We know that Hadoop's HIVE component is very good for structur...

A USE CASE ON A TRAVEL APP

Dear Friends, Welcome Back... Day by day I am learning different things ...

HIVE INTERVIEW RELATED PREPARATION

Dear Friends... I spent a few days preparing for and giving interviews for job c...

CRUNCH YOUR WAY IN HADOOP

Welcome Back Friends... Sorry... It took me some time to post my blog...

What is Hadoop?

Apache Hadoop is an open-source software framework, written in Java by Doug ...

Hadoop Distributions

Below are the companies offering commercial implementations and/or providing s...

What is Spark ? Why Apache Spark?

Spark is one of Hadoop’s subprojects, developed in 2009 in UC Berkeley’s AMPLab...

Hadoop Certifications

Hadoop Certifications Cloudera/Hortonworks/IBM Cloudera - Cloudera Universit...

Hadoop Training Institutes in India

Apache Hadoop Online/Offline Training Institutes in India (Bangalore/Hyderabad...

Hadoop Commands

Hadoop CLI Commands hadoop command [genericOptions] [commandOptions] hadoop ...

Apache Hadoop Versions

Hadoop Versions Hadoop 3 6 Apr 2018: Release 3.1.0 available 25 March 2018: R...

Apache Hadoop Terms/Abbreviations

HDFS - Hadoop Distributed File System GFS - Google File System JSON - Java Scr...

Google Cloud CLI commands

gcloud commands gcloud help gcloud help compute instances create gcloud -h g...

Terraform commands

Terraform Command Line Interface (Terraform CLI) $ terraform Usage: terraform...

AWS Certified Cloud Practitioner Practice Exam 2

AWS Certified Cloud Practitioner Practice Exam 2

AWS Cloud Practitioner Exam 1

AWS Certified Cloud Practitioner Practice Exam 1

Difference between NACL and Security Groups in AWS

What are the differences between Security Groups and NACL in Amazon Web Servic...

Google Associate Cloud Engineer Certification Practice ...

GCP Associate Cloud Engineer (ACE) Certification - Sample Questions ...

awscli commands

Amazon Web Services Command Line Interface (AWSCLI) Installing aws cli in Pyt...

Sqoop Commands

Sqoop command line (CLI) sqoop help usage: sqoop COMMAND [ARGS] Available co...

Hive Date functions

Date functions in Apache Hive hive> select current_date(); current_timestamp()...

Apache Spark Online Quiz

Spark Interview Questions and Answers [2022]

Hadoop Online Quiz [ 2022 ]

Apache Hadoop Interview Questions And Answers

Comparison between AWS and GCP resources

GCP (Google Cloud Platform) & AWS (Amazon Web Services) Cloud services compari...

Proof of Concept or POC on Customer Complaints Analysis

POC #: Customer Complaints Analysis The POC is based on Consumer Complaints r...

Proof of Concept or POC on Youtube Data Analysis

POC #: Youtube Data Analysis The POC is based on Youtube data.  Public DATA...

POC #: Analyse social bookmarking sites to find insights

Industry: Social Media Data: It comprises the information gathered from...

POC #: Generate Analytics from a Product based Company ...

POC #: Generate Analytics from a Product based Company Web Log. The POC is ...

POC #: Sensex Log Data Processing (PDF File Processing ...

Industry: Financial Data: Input Format - .PDF (Our Input Data is in PDF Forma...

How to Analyze Data in Apache Spark

In this activity, we will load data into Apache Spark and inspect the data u...

Analytics on India census using Spark

POC#: Analytics on India census using Spark In this article, I have explored Ce...

Sentiment Analysis on Demonetization(India)

POC#: Sentiment Analysis on demonetization in India using Spark In this arti...

Pig : Data types and Operators

 Data types:   simple data types:  ---------------------    int --> 32 bit int...

Pig : How to perform grouping by Multiple Columns

 how to perform grouping by multiple columns. --------------------------------...

Pig : Entire Column Aggregations

 Entire column aggregations.  select sum(sal) from emp; grunt> describe emp ...

Pig : Word Count Using Pig Data Flow

Word Count Using Pig DataFlow: [cloudera@quickstart ~]$ cat comment hadoop is ...

Spark : Entire Column Aggregations

 Entire Column Aggregations:  sql:    select sum(sal) from emp; scala> val e...
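A minimal sketch of an entire-column aggregation, the equivalent of `select sum(sal) from emp;`, using plain Scala collections. It assumes the comma-separated layout id,name,sal,sex,dno, based on the emp rows shown in the Pig and Spark joins posts. The field names are assumptions. On an RDD of numbers, the same total could be computed with `reduce(_ + _)`.

```scala
// Entire-column aggregation, equivalent to SQL: select sum(sal) from emp;
// Assumed record layout: id,name,sal,sex,dno (e.g. "101,aaaa,40000,m,11").
val emp = Seq("101,aaaa,40000,m,11", "102,bbbbbb,50000,f,12")

// Extract the salary column (third field) as an Int.
val sals = emp.map(line => line.split(",")(2).toInt)

val total = sals.sum                      // sum(sal)
val maxSal = sals.max                     // max(sal)
val minSal = sals.min                     // min(sal)
val avgSal = total.toDouble / sals.size   // avg(sal); toDouble avoids integer division
println(s"sum=$total max=$maxSal min=$minSal avg=$avgSal")
```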

Spark : Handling CSV files .. Removing Headers

scala> val l = List(10,20,30,40,50,56,67) scala> val r2 = r.collect.reverse.ta...
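One common way to remove a CSV header is to read the first line and then filter it out. The sketch below uses plain Scala collections and hypothetical sample data. On an RDD, the first line is available through `rdd.first()`, and the filter step is the same.

```scala
// Remove the header row from CSV lines before parsing.
// The sample data below is hypothetical.
val csv = Seq(
  "id,name,sal",
  "101,aaaa,40000",
  "102,bbbbbb,50000"
)

val header = csv.head                          // on an RDD: rdd.first()
val rows = csv.filter(line => line != header)  // keep every non-header line
val parsed = rows.map(_.split(",")).map(f => (f(0).toInt, f(1), f(2).toInt))
parsed.foreach(println)
```

Filtering by equality removes every line identical to the header, not just the first one. This is usually what you want when several CSV files with the same header are read together.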

Spark : Conditional Transformations

Conditional Transformations: val trans = emp.map{ x =>        val w = x.split...
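A minimal sketch of a conditional transformation: `if`/`else` expressions inside `map` derive new columns from each record. The id,name,sal,sex,dno layout is assumed. The decoded sex labels and the 45000 salary threshold are hypothetical choices for illustration, not values from the original post. The same closure could be passed to `RDD.map`.

```scala
// Conditional transformation inside map: decode the sex column and derive
// a salary band. Assumed layout: id,name,sal,sex,dno.
// The 45000 threshold is a hypothetical value for illustration.
val emp = Seq("101,aaaa,40000,m,11", "102,bbbbbb,50000,f,12")

val trans = emp.map { x =>
  val w = x.split(",")
  val sex = if (w(3) == "m") "Male" else "Female"          // decode m/f
  val band = if (w(2).toInt >= 45000) "High" else "Low"    // salary band
  (w(0), w(1), w(2).toInt, sex, band)
}
trans.foreach(println)
```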

Pig : CoGroup examples Vs Union Examples

-- co grouping grunt> cat piglab/emp 101,aaaa,40000,m,11 102,bbbbbb,50000,f,12 103...

Spark : Union and Distinct

Unions in Spark. val l1 = List(10,20,30,40,50) val l2 = List(100,200,300,400,50...
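A minimal sketch of union followed by distinct, using plain Scala lists. `l1` is taken from the post. `l2` is a hypothetical list chosen to overlap with `l1`, because the original `l2` is truncated. Like `RDD.union`, `++` keeps duplicates (SQL `UNION ALL`), and `distinct` then removes them (SQL `UNION`).

```scala
// Union followed by distinct.
// l2 is hypothetical and chosen to overlap with l1.
val l1 = List(10, 20, 30, 40, 50)
val l2 = List(40, 50, 60)

val all = l1 ++ l2        // keeps duplicates, like RDD.union
val uniq = all.distinct   // removes duplicates, like RDD.distinct
println(all)
println(uniq)
```

On a List, `distinct` keeps the first occurrence of each element in its original order. On an RDD, `distinct` involves a shuffle, so the order of the result should not be relied on.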

Spark : CoGroup And Handling Empty Compact Buffers

Co Grouping using Spark: -------------------------- scala> branch1.collect.forea...
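The sketch below imitates the semantics of `cogroup` with plain Scala collections. For every key present in either input, it collects the values from each side. A key that is missing on one side gets an empty collection, which Spark prints as `CompactBuffer()`. The `cogroup` helper here is hypothetical, and the `branch1`/`branch2` contents, given as (dno, sal) pairs, are made-up sample data. The point it illustrates: `sum` of an empty buffer is 0, but `max` throws, so empty buffers need an explicit guard.

```scala
// Hypothetical helper that emulates pairRdd.cogroup(other) on in-memory data.
def cogroup[K, V, W](a: Seq[(K, V)], b: Seq[(K, W)]): Map[K, (Seq[V], Seq[W])] = {
  val keys = (a.map(_._1) ++ b.map(_._1)).distinct
  keys.map { k =>
    // Collect the values for key k from each side; either may be empty.
    k -> (a.collect { case (`k`, v) => v }, b.collect { case (`k`, w) => w })
  }.toMap
}

// Hypothetical (dno, sal) pairs for two branches.
val branch1 = Seq((11, 40000), (12, 50000), (11, 30000))
val branch2 = Seq((11, 60000), (13, 70000))

val grouped = cogroup(branch1, branch2)

// Per key: (sum1, sum2, max1, max2). sum of an empty Seq is 0,
// but max would throw, so empty buffers are replaced with 0 explicitly.
val summary = grouped.map { case (dno, (s1, s2)) =>
  val max1 = if (s1.isEmpty) 0 else s1.max
  val max2 = if (s2.isEmpty) 0 else s2.max
  dno -> (s1.sum, s2.sum, max1, max2)
}
println(summary)
```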

Pig : load Operator

Load Operator: -------------- to load data from a file into a relation. [cloudera@qui...

Pig : Foreach Operator

Foreach Operator: ------------------- grunt> emp = load 'piglab/emp' using PigSt...

Pig : Subsetting using Filter, Limit, Sample

Techniques of subsetting relations: i) filter: used for conditional filtering...

Spark : Joins

[cloudera@quickstart ~]$ hadoop fs -copyFromLocal emp spLab/e [cloudera@quickst...
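A minimal sketch of an inner join on keyed data, imitating `pairRdd.join(other)` with plain Scala collections. Only keys present on both sides appear in the result. The emp rows follow the id,name,sal,sex,dno layout used elsewhere on this page. The `dept` pairs and department names are hypothetical.

```scala
// Inner join of two keyed collections, mirroring pairRdd.join(other).
// Assumed emp layout: id,name,sal,sex,dno. dept contents are hypothetical.
val emp = Seq("101,aaaa,40000,m,11", "102,bbbbbb,50000,f,12")
val dept = Seq((11, "marketing"), (13, "finance"))

// Key each employee by dno, keeping (name, sal) as the value.
val empByDno = emp.map { line =>
  val w = line.split(",")
  (w(4).toInt, (w(1), w(2).toInt))
}

// Keep only pairs whose keys match on both sides.
val joined = for {
  (dno, e) <- empByDno
  (dno2, dname) <- dept
  if dno == dno2
} yield (dno, (e, dname))
joined.foreach(println)
```

Keys 12 and 13 have no match on the other side, so they are dropped. `leftOuterJoin` would instead keep key 12 with `None` for the department.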

Spark : Joins 2

Denormalizing datasets using Joins [cloudera@quickstart ~]$ cat > children c101,...

Pig : Joins

[cloudera@quickstart ~]$ hadoop fs -cat spLab/e 101,aaaa,40000,m,11 102,bbbbbb,...

Pig : Order [ Sorting ], exec, run, pig

 order :-    to sort data (tuples) in ascending or descending order.  emp = l...

Pig : Cross Operator for Cartesian Product

Cross: ----- used for the Cartesian product. Each element of the left set joins ...
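A minimal sketch of a Cartesian product in plain Scala, where each element of the left set is paired with every element of the right set. Pig's `CROSS` and Spark's `RDD.cartesian` follow the same idea. Both input lists are hypothetical. The result size is always the product of the input sizes, so a cross of large relations grows very quickly.

```scala
// Cartesian product: every element of left paired with every element of right.
// Both input lists are hypothetical.
val left = List("a", "b")
val right = List(1, 2, 3)

val cross = for {
  l <- left
  r <- right
} yield (l, r)

println(cross) // 2 x 3 = 6 pairs
```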
