Hadoop

How to practice pyspark programs in Jupyter How to conf...

nikhil Dec 3, 2024 0 2

Azure DataFactory Copy all files from one Storage to an...

nikhil Dec 3, 2024 0 0

How to create GCP VM instance | Connect using PuTTY | ...

nikhil Dec 3, 2024 0 1

Apache Superset 2.1 installation in EC2 | Latest supser...

nikhil Dec 3, 2024 0 1

Invalid Spark URL spark HeartbeatReceiver

nikhil Dec 3, 2024 0 1

databricks process xml data convert xml to csv using p...

nikhil Dec 3, 2024 0 2

Azure lookup activity get multiple tables from MySQL t...

nikhil Dec 3, 2024 0 1

Proof of Concept or POC on Customer Complaints Analysis

nikhil Dec 3, 2024 0 0

POC #: Customer Complaints Analysis The POC is based on Consumer Complaints r...

Proof of Concept or POC on Youtube Data Analysis

nikhil Dec 3, 2024 0 0

POC #: Youtube Data Analysis The POC is based on Youtube data. Public DATA...

POC #: Analyse social bookmarking sites to find insights

nikhil Dec 3, 2024 0 1

Industry: Social Media Data: It comprises the information gathered from...

POC #: Generate Analytics from a Product based Company ...

nikhil Dec 3, 2024 0 1

POC #: Generate Analytics from a Product based Company Web Log. The POC is ...

POC #: Sensex Log Data Processing (PDF File Processing ...

nikhil Dec 3, 2024 0 2

Industry: Financial Data: Input Format - .PDF (Our Input Data is in PDF Forma...

How to Analyze Data in Apache Spark

nikhil Dec 3, 2024 0 1

In this activity, you will load data into Apache Spark and inspect the data u...

Analytics on India census using Spark

nikhil Dec 3, 2024 0 1

POC#: Analytics on India census using Spark In this article, I have explored Ce...

Sentiment Analysis on Demonetization(India)

nikhil Dec 3, 2024 0 0

POC#: Sentiment Analysis on demonetization in India using Spark In this arti...

Pig : Data types and Operators

nikhil Dec 3, 2024 0 2

Data types: simple data types: --------------------- int --> 32 bit int...

Pig : How to perform grouping by Multiple Columns

nikhil Dec 3, 2024 0 1

how to perform grouping by multiple columns. --------------------------------...

Pig : Entire Column Aggregations

nikhil Dec 3, 2024 0 1

Entire column aggregations. select sum(sal) from emp; grunt> describe emp ...

Pig : Word Count Using Pig Data Flow

nikhil Dec 3, 2024 0 1

Word Count Using Pig DataFlow: [cloudera@quickstart ~]$ cat comment hadoop is ...

Spark : Entire Column Aggregations

nikhil Dec 3, 2024 0 1

Entire Column Aggregations: sql: select sum(sal) from emp; scala> val e...

Spark : Handling CSV files .. Removing Headers

nikhil Dec 3, 2024 0 1

scala> val l = List(10,20,30,40,50,56,67) scala> val r2 = r.collect.reverse.ta...

Spark : Conditional Transformations

nikhil Dec 3, 2024 0 1

Conditional Transformations: val trans = emp.map{ x => val w = x.split...

Pig : CoGroup examples Vs Union Examples

nikhil Dec 3, 2024 0 1

-- co grouping grunt> cat piglab/emp 101,aaaa,40000,m,11 102,bbbbbb,50000,f,12 103...

Spark : Union and Distinct

nikhil Dec 3, 2024 0 1

Unions in spark. val l1 = List(10,20,30,40,50) val l2 = List(100,200,300,400,50...

Spark : CoGroup And Handling Empty Compact Buffers

nikhil Dec 3, 2024 0 1

Co Grouping using Spark: -------------------------- scala> branch1.collect.forea...

Pig : load Operator

nikhil Dec 3, 2024 0 1

Load Operator: -------------- to load data from file to relation. [cloudera@qui...

Pig : Foreach Operator

nikhil Dec 3, 2024 0 1

Foreach Operator: ------------------- grunt> emp = load 'piglab/emp' using PigSt...

Pig : Subsetting using Filter, Limit, Sample

nikhil Dec 3, 2024 0 1

Techniques of subsetting relations: i) filter: used for conditional filtering...

Spark : Joins

nikhil Dec 3, 2024 0 0

[cloudera@quickstart ~]$ hadoop fs -copyFromLocal emp spLab/e [cloudera@quickst...

Spark : Joins 2

nikhil Dec 3, 2024 0 0

Denormalizing datasets using Joins [cloudera@quickstart ~]$ cat > childrenc101,...

Pig : Joins

nikhil Dec 3, 2024 0 0

[cloudera@quickstart ~]$ hadoop fs -cat spLab/e 101,aaaa,40000,m,11 102,bbbbbb,...

Pig : Order [ Sorting ] , exec, run , pig

nikhil Dec 3, 2024 0 2

order :- to sort data (tuples) in ascending or descending order. emp = l...

Pig : Cross Operator to Cartesian

nikhil Dec 3, 2024 0 0

Cross: ----- used for Cartesian product. Each element of the left set joins ...

Pig : UDFs

nikhil Dec 3, 2024 0 0

Pig UDFS ---------- UDF ---> user defined functions. adv: i) ...

Spark : Spark streaming and Kafka Integration

nikhil Dec 3, 2024 0 0

steps: 1) Start ZooKeeper server 2) Start Kafka brokers [ one or more ] 3)...

Python Examples 1

nikhil Dec 3, 2024 0 0

name = input("Enter name ") age = input("Enter age") print(name, " is ", age...

Pig : Udfs using Python

nikhil Dec 3, 2024 0 1

we can keep multiple functions under one program(.py) transoform.py -------...

Hive Partitioned tables [case study]

nikhil Dec 3, 2024 0 2

[cloudera@quickstart ~]$ cat saleshistory 01/01/2011,2000 01/01/2011,3000 01/0...

Pig Video Lessons

nikhil Dec 3, 2024 0 0

Pig class Links: PigLab1 Video: https://drive.google.com/file/d/0B6ZYkhJgGD6XTz...

Hive(10AmTo1:00Pm) Lab1 notes : Hive Inner and Externa...

nikhil Dec 3, 2024 0 0

hive> create table samp1(line string); -- here we did not select any database. ...

Python Options in Hadoop

nikhil Dec 3, 2024 0 0

New developers in the Hadoop ecosystem often struggle to get involved because th...

16 Hadoop fs Commands Every Data Engineer Must Know

nikhil Dec 3, 2024 0 1

Commands in Hadoop The Hadoop shell is the CLI for the Hadoop cluster. Most of t...

Ultimate Hadoop Python Example

nikhil Dec 3, 2024 0 1

What are the options for using Python in Hadoop? Python developers are looking t...

How to Find HDFS Path URL?

nikhil Dec 3, 2024 0 0

Have you ever been running a script from the HDFS command line and gotten this er...

What’s New in Hadoop 3.0?

nikhil Dec 3, 2024 0 0

Major Hadoop Release! Hadoop 3.0 has dropped! There is a lot of excitement in...

Freelance Hadoop Administrative Roles

nikhil Dec 3, 2024 0 0

Freelance Hadoop Admin Roles A lot of the world’s economy is shifting to freelan...

What is the Difference Between Spark & Hadoop

nikhil Dec 3, 2024 0 0

Spark & Hadoop Workloads are Huge Data Engineers and Big Data Developers spend a...

Learn HDFS Without Java?

nikhil Dec 3, 2024 0 1

HDFS Skills Without Java In the world of Hadoop and Big Data HDFS is king. Data ...

Certifications Required For Hadoop Administrators?

nikhil Dec 3, 2024 0 0

Hadoop Certifications Data Engineers looking to grow their careers are constantl...

Spark vs. Hadoop 2019

nikhil Dec 3, 2024 0 0

Spark vs. Hadoop 2019 In 2019 which skill is in more demand for Data Engineers S...

Hadoop: 3 Top Real Time Applications

nikhil Dec 3, 2024 0 1

Hadoop's massive data processing capability helps build many real-time applica...

Understanding Hadoop MapReduce Fault Tolerance

nikhil Dec 3, 2024 0 0

Hadoop MapReduce is totally different from other distributed systems. It handles...

Sqoop, Flume and Storm Understand The Differences Quickly

nikhil Dec 3, 2024 0 0

Top differences between Sqoop, Flume and Storm in the Hadoop framework

What is Adaptive MapReduce in Hadoop and How it Works

nikhil Dec 3, 2024 0 0

The performance and the approach of Adaptive MapReduce in Hadoop explained.

How to Copy HDFS files to Local Linux GET Vs copyToLocal

nikhil Dec 3, 2024 0 1

Two popular HDFS commands you can use to copy HDFS files to local Linux. I have ...

How to Install HBase Properly

nikhil Dec 3, 2024 0 0

HBase is a NoSQL database in the Hadoop framework. Correct installation is needed t...

Industry’s First Auto-Scaling Hadoop Clusters

nikhil Dec 3, 2024 0 1

Background In 2009 I first started playing around with Hive and EC2/S3. I was bl...

Optimizing Hadoop for S3 – Part 1

nikhil Dec 3, 2024 0 0

Introduction: Users of Qubole Data Service use Hive queries or Hadoop jobs to pr...

Sqoop as a Service

nikhil Dec 3, 2024 0 0

Background: As Qubole Data Service has gained adoption – many of our customers a...

Case Study: Building Analytics Applications

nikhil Dec 3, 2024 0 0

This is a guest blog post written by Marc Rossen, a Qubole user, and advocate. T...

Caching in on the cloud!

nikhil Dec 3, 2024 0 0

Motivation One of the interesting things about using Hadoop and Hive in the clou...

Case Study: Big Data Cloud Computing – Part 1

nikhil Dec 3, 2024 0 1

The scalability of cloud databases and the potential of big data cloud computing...

Top 10 Industry Examples of HDFS

nikhil Dec 3, 2024 0 1

Top 10 Industry Examples of HDFS Not everyone comes to us with a clear strategy ...

Qubole Available on Google Compute Engine

nikhil Dec 3, 2024 0 0

Qubole is a leading provider of Hadoop as a service with the mission of providin...

Save Time Executing Hive Queries Using Command Templates

nikhil Dec 3, 2024 0 1

A common characteristic of many analytics queries is that they are mostly invari...

Hadoop Cloud vs On-Premise Hadoop

nikhil Dec 3, 2024 0 0

As topics of conversation go, the terms “Big Data” and “Hadoop functionality” se...

Komli Media Improves Utilization with Premium Big Data ...

nikhil Dec 3, 2024 0 1

Komli Media, Asia Pacific’s leading media technology company, depends on reachin...

Accenture Technology Labs Hadoop Deployment Comparison ...

nikhil Dec 3, 2024 0 1

Background The Accenture Technology Labs Hadoop Deployment Comparison study rece...

The Challenges and Opportunities for E-commerce in a Bi...

nikhil Dec 3, 2024 0 0

The highly competitive world of e-Commerce is driven by price and advertising. C...

Job Scheduling in Hadoop – A 7 Year Perspective

nikhil Dec 3, 2024 0 0

In a recent presentation at Flipkart’s 2014 SlashN conference, I summarized seve...

Announcing General Availability of Presto-as-a-Service

nikhil Dec 3, 2024 0 1

Presto Ready! We announced our Presto-as-a-Service Alpha Program on Amazon Web S...

Hadoop in the Cloud: Qubole shows 2x – 8x speedup in pe...

nikhil Dec 3, 2024 0 0

Qubole aims to provide the best platform for big data analysis in the cloud. In ...

Forbes: Qubole Data Service Road to Hadoop

nikhil Dec 3, 2024 0 1

On Monday, May 26, 2014, Qubole was featured on Forbes.com. Technology contribut...

Qubole Founders Open Up About the Transformation of Hadoop

nikhil Dec 3, 2024 0 1

Seven years ago, Joydeep Sen Sarma and Ashish Thusoo were first introduced to bi...

Hadoop vs Traditional Databases: Big Data Considerations

nikhil Dec 3, 2024 0 0

Today’s ultra-connected world is generating massive volumes of data at ever-acce...

Top 5 Big Data Myths Debunked

nikhil Dec 3, 2024 0 1

The era of big data has arrived. Today, companies both large and small are disco...

Securely sharing data across Organizations with Qubole

nikhil Dec 3, 2024 0 0

Customers love that Qubole enables collaboration via a shared workbench across m...

High Performance Hadoop with New Generation AWS Instances

nikhil Dec 3, 2024 0 1

Welcome New Generation Instance Types Amazon Web Services (AWS) offers a range o...

MapReduce vs Apache Spark

nikhil Dec 3, 2024 0 0

Cluster Computing Comparisons: MapReduce vs Apache Spark Since its early beginni...

Not All Hadoop Distributions are Created Equal

nikhil Dec 3, 2024 0 0

The debate is over. Big data analytics has proven benefits. And organizations lo...

Looking Forward: Hadoop Industry Trends

nikhil Dec 3, 2024 0 0

From its primitive beginnings as a modest open-source search engine called “Nutc...

Hadoop with Enhanced Networking on AWS

nikhil Dec 3, 2024 0 0

Introduction At Qubole, many of our customers run their Hadoop clusters on AWS E...

Apache Hadoop 2.6.0 Now Generally Available on Qubole

nikhil Dec 3, 2024 0 0

We’re excited to announce that Apache Hadoop 2.6.0, the latest stable release* o...

Hadoop is Hard! But Big Data Doesn’t Have To Be

nikhil Dec 3, 2024 0 1

When it comes to big data analytics, Hadoop has been heralded as the all-in-one ...

Drag-n-Drop upgrades of Hadoop, Spark and Presto Clusters

nikhil Dec 3, 2024 0 0

Introduction As the Big Data stack has matured, many companies have started usin...

Multi-tenant Job History Server for Ephemeral Hadoop an...

nikhil Dec 3, 2024 0 0

Introduction Qubole Data Service (QDS) allows users to configure logical Hadoop ...

Riding the Spotted Elephant

nikhil Dec 3, 2024 0 0

Introduction: One of the benefits of moving Hadoop workloads to the cloud is red...

The Main Types of Big Data Vendors: A Comparative Look

nikhil Dec 3, 2024 0 1

The big data boom has given rise to a host of vendors, each promoting their own ...

Keeping Big Data Safe: Common Hadoop Security Issues an...

nikhil Dec 3, 2024 0 1

The big data explosion has given rise to a host of Information technology tools ...

Apache Spark vs Hadoop

nikhil Dec 3, 2024 0 0

Which Big Data Framework is the Best Fit? Apache Hadoop wasn’t just the “elepha...

Cassandra vs Hadoop: A Comparative Look

nikhil Dec 3, 2024 0 0

Technology is reshaping our world. The proliferation of mobile devices, the expl...

Hadoop Data Warehouse

nikhil Dec 3, 2024 0 0

This post was originally published in August 2014 and has since been updated. Bi...

Big Data Implementation

nikhil Dec 3, 2024 0 0

Big Data Challenges With all the hype, it’s little wonder that organizations ar...

The Big Data Lifecycle At TubeMogul

nikhil Dec 3, 2024 0 1

This post was written by Chris Chanyi, Senior Data Architect at TubeMogul. It or...

RubiX: Fast Cache Access for Big Data Analytics on Clou...

nikhil Dec 3, 2024 0 0

Qubole introduced first-generation Caching for S3 files in Presto in 2014 and do...

Quark: Control and Optimize SQL Across Hadoop and RDBMS

nikhil Dec 3, 2024 0 0

One of the important functions of a database administrator is to manage storage ...

Qubole announces Heterogeneous Clusters on AWS – Reduce...

nikhil Dec 3, 2024 0 1

Co-authored by Hariharan Iyer, Member of the Technical Staff at Qubole. Introduc...

Auto-scaling in Qubole With AWS Elastic Block Storage

nikhil Dec 3, 2024 0 1

Co-authored by Hariharan Iyer, Member of the Technical Staff at Qubole. Amazon E...

Data Platforms 2017: The Conference I Wish Existed in 2007

nikhil Dec 3, 2024 0 1

This post is authored by Ashish Thusoo, Co-Founder and Chief Executive Officer, ...

Container Packing: A New Algorithm for Resource Schedul...

nikhil Dec 3, 2024 0 0

Container Packing for Resource Scheduling in the Cloud In this post we describe ...
