Hadoop

Proof of Concept or POC on Customer Complaints Analysis

POC #: Customer Complaints Analysis. The POC is based on Consumer Complaints r...

Proof of Concept or POC on YouTube Data Analysis

POC #: YouTube Data Analysis. The POC is based on YouTube data. Public DATA...

POC #: Analyse social bookmarking sites to find insights

Industry: Social Media. Data: It comprises the information gathered from...

POC #: Generate Analytics from a Product based Company Web Log.

POC #: Generate Analytics from a Product based Company Web Log. The POC is ...

POC #: Sensex Log Data Processing (PDF File Processing in Map Reduce)

Industry: Financial. Data: Input Format - .PDF (Our Input Data is in PDF Forma...

How to Analyze Data in Apache Spark

In this activity, you will load data into Apache Spark and inspect the data u...

Analytics on India census using Spark

POC#: Analytics on India census using Spark. In this article, I have explored Ce...

Sentiment Analysis on Demonetization (India)

POC#: Sentiment Analysis on demonetization in India using Spark. In this arti...

Pig : Data types and Operators

Data types: simple data types: int --> 32 bit int...

Pig : How to perform grouping by Multiple Columns

How to perform grouping by multiple columns...

Pig : Entire Column Aggregations

Entire column aggregations. select sum(sal) from emp; grunt> describe emp ...

Pig : Word Count Using Pig Data Flow

Word Count Using Pig DataFlow: [cloudera@quickstart ~]$ cat comment hadoop is ...

Spark : Entire Column Aggregations

Entire Column Aggregations: sql: select sum(sal) from emp; scala> val e...

Spark : Handling CSV files .. Removing Headers

scala> val l = List(10,20,30,40,50,56,67) scala> val r2 = r.collect.reverse.ta...

Spark : Conditional Transformations

Conditional Transformations: val trans = emp.map{ x => val w = x.split...

Pig : CoGroup examples Vs Union Examples

-- co grouping grunt> cat piglab/emp 101,aaaa,40000,m,11 102,bbbbbb,50000,f,12 103...

Spark : Union and Distinct

Unions in Spark. val l1 = List(10,20,30,40,50) val l2 = List(100,200,300,400,50...

Spark : CoGroup And Handling Empty Compact Buffers

Co Grouping using Spark: scala> branch1.collect.forea...

Pig : load Operator

Load Operator: to load data from file to relation. [cloudera@qui...

Pig : Foreach Operator

Foreach Operator: grunt> emp = load 'piglab/emp' using PigSt...

Pig : Subsetting using Filter, Limit, Sample

Techniques of subsetting relations: i) filter: used for conditional filtering...

Spark : Joins

[cloudera@quickstart ~]$ hadoop fs -copyFromLocal emp spLab/e [cloudera@quickst...

Spark : Joins 2

Denormalizing datasets using Joins. [cloudera@quickstart ~]$ cat > childrenc101,...

Pig : Joins

[cloudera@quickstart ~]$ hadoop fs -cat spLab/e 101,aaaa,40000,m,11 102,bbbbbb,...

Pig : Order [Sorting], exec, run, pig

order: to sort data (tuples) in ascending or descending order. emp = l...

Pig : Cross Operator for Cartesian Product

Cross: used for Cartesian product. Each element of the left set joins ...

Pig : UDFs

Pig UDFs. UDF ---> user defined functions. Advantages: i) ...

Spark : Spark streaming and Kafka Integration

Steps: 1) Start ZooKeeper server 2) Start Kafka brokers [one or more] 3)...

Python Examples 1

name = input("Enter name ") age = input("Enter age") print(name, " is ", age...

Pig : UDFs using Python

We can keep multiple functions under one program (.py). transoform.py ...

Hive Partitioned tables [case study]

[cloudera@quickstart ~]$ cat saleshistory 01/01/2011,2000 01/01/2011,3000 01/0...

Pig Video Lessons

Pig class Links: PigLab1 Video: https://drive.google.com/file/d/0B6ZYkhJgGD6XTz...

Hive(10AmTo1:00Pm) Lab1 notes : Hive Inner and Externa...

hive> create table samp1(line string); -- here we did not select any database. ...

Python Options in Hadoop

New developers in the Hadoop ecosystem often struggle to get involved because th...

16 Hadoop fs Commands Every Data Engineer Must Know

Commands in Hadoop: The Hadoop shell is the CLI for the Hadoop cluster. Most of t...

Ultimate Hadoop Python Example

What are the options for using Python in Hadoop? Python developers are looking t...

How to Find HDFS Path URL?

Have you ever been running a script from the HDFS command line and gotten this er...

What’s New in Hadoop 3.0?

Major Hadoop Release! Hadoop 3.0 has dropped! There is a lot of excitement in...

Freelance Hadoop Administrative Roles

Freelance Hadoop Admin Roles: A lot of the world’s economy is shifting to freelan...

What is the Difference Between Spark & Hadoop

Spark & Hadoop Workloads are Huge: Data Engineers and Big Data Developers spend a...

Learn HDFS Without Java?

HDFS Skills Without Java: In the world of Hadoop and Big Data, HDFS is king. Data ...

Certifications Required For Hadoop Administrators?

Hadoop Certifications: Data Engineers looking to grow their careers are constantl...

Spark vs. Hadoop 2019

Spark vs. Hadoop 2019: In 2019 which skill is in more demand for Data Engineers S...

Hadoop: 3 Top Real Time Applications

Hadoop's massive data processing capability helps build many real-time applica...

Understanding Hadoop MapReduce Fault Tolerance

Hadoop MapReduce is totally different from other distributed systems. It handles...

Sqoop, Flume and Storm: Understand the Differences Quickly

Top differences between Sqoop, Flume and Storm in the Hadoop framework

What is Adaptive MapReduce in Hadoop and How it Works

The performance and the approach of Adaptive MapReduce in Hadoop explained.

How to Copy HDFS files to Local Linux GET Vs copyToLocal

Two popular HDFS commands you can use to copy HDFS files to local Linux. I have ...

How to Install HBase Properly

HBase is a NoSQL database in the Hadoop framework. Correct installation is needed t...

Industry’s First Auto-Scaling Hadoop Clusters

Background: In 2009 I first started playing around with Hive and EC2/S3. I was bl...

Optimizing Hadoop for S3 – Part 1

Introduction: Users of Qubole Data Service use Hive queries or Hadoop jobs to pr...

Sqoop as a Service

Background: As Qubole Data Service has gained adoption – many of our customers a...

Case Study: Building Analytics Applications

This is a guest blog post written by Marc Rossen, a Qubole user, and advocate. T...

Caching in on the cloud!

Motivation: One of the interesting things about using Hadoop and Hive in the clou...

Case Study: Big Data Cloud Computing – Part 1

The scalability of cloud databases and the potential of big data cloud computing...

Top 10 Industry Examples of HDFS

Top 10 Industry Examples of HDFS: Not everyone comes to us with a clear strategy ...

Qubole Available on Google Compute Engine

Qubole is a leading provider of Hadoop as a service with the mission of providin...

Save Time Executing Hive Queries Using Command Templates

A common characteristic of many analytics queries is that they are mostly invari...

Hadoop Cloud vs On-Premise Hadoop

As topics of conversation go, the terms “Big Data” and “Hadoop functionality” se...

Komli Media Improves Utilization with Premium Big Data Platform Qubole

Komli Media, Asia Pacific’s leading media technology company, depends on reachin...

Accenture Technology Labs Hadoop Deployment Comparison Study

Background: The Accenture Technology Labs Hadoop Deployment Comparison study rece...

The Challenges and Opportunities for E-commerce in a Big Data World

The highly competitive world of e-Commerce is driven by price and advertising. C...

Job Scheduling in Hadoop – A 7 Year Perspective

In a recent presentation at Flipkart’s 2014 SlashN conference, I summarized seve...

Announcing General Availability of Presto-as-a-Service

Presto Ready! We announced our Presto-as-a-Service Alpha Program on Amazon Web S...

Hadoop in the Cloud: Qubole shows 2x – 8x speedup in performance over Apache Hadoop

Qubole aims to provide the best platform for big data analysis in the cloud. In ...

Forbes: Qubole Data Service Road to Hadoop

On Monday, May 26, 2014, Qubole was featured on Forbes.com. Technology contribut...

Qubole Founders Open Up About the Transformation of Hadoop

Seven years ago, Joydeep Sen Sarma and Ashish Thusoo were first introduced to bi...

Hadoop vs Traditional Databases: Big Data Considerations

Today’s ultra-connected world is generating massive volumes of data at ever-acce...

Top 5 Big Data Myths Debunked

The era of big data has arrived. Today, companies both large and small are disco...

Securely sharing data across Organizations with Qubole

Customers love that Qubole enables collaboration via a shared workbench across m...

High Performance Hadoop with New Generation AWS Instances

Welcome New Generation Instance Types: Amazon Web Services (AWS) offers a range o...

MapReduce vs Apache Spark

Cluster Computing Comparisons: MapReduce vs Apache Spark. Since its early beginni...

Not All Hadoop Distributions are Created Equal

The debate is over. Big data analytics has proven benefits. And organizations lo...

Looking Forward: Hadoop Industry Trends

From its primitive beginnings as a modest open-source search engine called “Nutc...

Hadoop with Enhanced Networking on AWS

Introduction: At Qubole, many of our customers run their Hadoop clusters on AWS E...

Apache Hadoop 2.6.0 Now Generally Available on Qubole

We’re excited to announce that Apache Hadoop 2.6.0, the latest stable release* o...

Hadoop is Hard! But Big Data Doesn’t Have To Be

When it comes to big data analytics, Hadoop has been heralded as the all-in-one ...

Drag-n-Drop upgrades of Hadoop, Spark and Presto Clusters

Introduction: As the Big Data stack has matured, many companies have started usin...

Multi-tenant Job History Server for Ephemeral Hadoop and Spark Clusters

Introduction: Qubole Data Service (QDS) allows users to configure logical Hadoop ...

Riding the Spotted Elephant

Introduction: One of the benefits of moving Hadoop workloads to the cloud is red...

The Main Types of Big Data Vendors: A Comparative Look

The big data boom has given rise to a host of vendors, each promoting their own ...

Keeping Big Data Safe: Common Hadoop Security Issues and Best Practices

The big data explosion has given rise to a host of information technology tools ...

Apache Spark vs Hadoop

Which Big Data Framework is the Best Fit? Apache Hadoop wasn’t just the “elepha...

Cassandra vs Hadoop: A Comparative Look

Technology is reshaping our world. The proliferation of mobile devices, the expl...

Hadoop Data Warehouse

This post was originally published in August 2014 and has since been updated. Bi...

Big Data Implementation

Big Data Challenges: With all the hype, it’s little wonder that organizations ar...

The Big Data Lifecycle At TubeMogul

This post was written by Chris Chanyi, Senior Data Architect at TubeMogul. It or...

RubiX: Fast Cache Access for Big Data Analytics on Cloud Storage

Qubole introduced first-generation Caching for S3 files in Presto in 2014 and do...

Quark: Control and Optimize SQL Across Hadoop and RDBMS

One of the important functions of a database administrator is to manage storage ...

Qubole announces Heterogeneous Clusters on AWS – Reduce costs up to 90% with Spot Fleet

Co-authored by Hariharan Iyer, Member of the Technical Staff at Qubole. Introduc...

Auto-scaling in Qubole With AWS Elastic Block Storage

Co-authored by Hariharan Iyer, Member of the Technical Staff at Qubole. Amazon E...

Data Platforms 2017: The Conference I Wish Existed in 2007

This post is authored by Ashish Thusoo, Co-Founder and Chief Executive Officer, ...

Container Packing: A New Algorithm for Resource Scheduling in the Cloud

Container Packing for Resource Scheduling in the Cloud: In this post we describe ...
