Apache Spark provides an efficient way for solving iterative algorithms b...
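The reason Spark suits iterative algorithms is that it keeps the working set in memory across passes instead of re-reading from disk each time. A minimal sketch of the idea, using a plain Scala collection standing in for a cached RDD (the update rule here is purely illustrative):

```scala
// Spark caches an RDD in memory with .cache(), so each pass of an
// iterative algorithm avoids re-reading the data from disk.
// Sketched with a plain Scala collection standing in for the RDD.
val data = (1 to 10).map(_.toDouble)

// three passes of a simple update rule (illustrative only)
val result = (1 to 3).foldLeft(data) { (acc, _) => acc.map(x => x / 2.0) }

println(result.head) // 1.0 halved three times = 0.125
```

In real Spark code the same shape appears as `val rdd = sc.textFile(...).cache()` followed by a loop of transformations over `rdd`.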
To process Big Data, a huge number of machines is required. Instead of buying ...
With the Spark 2.0 release, there are three types of data abstractions which Spark of...
#!/bin/bash
#==============================================================================
#!/usr/bin/bsh
#==============================================================================
#==============================================================================...
# Source DBSRVR config file
# Taking user input and passing to DBSRVR file
echo...
------------------------------------------------------------
Installation Ste...
val nums = List(1,2,3,4,5,6)
val chars = List('a', 'b', 'c', 'd', 'e', 'f')
s...
--------------------------------------------------
Spark Provides Libraries -> ...
---------------------------------------------------
def myfunc[T](index: Int, i...
-------------------------------------------------------
def myfunc[T](index: I...
Spark:
---------------------
Apache Spark™ is a fast and general engine for lar...
---------------------------------------------------
val t1 = List((1, "kalyan")...
---------------------------------------------------
Find the `average` of 1 to ...
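The averaging exercise above boils down to a sum divided by a count. A minimal sketch with plain Scala collections, using 1 to 10 as an assumed range since the original upper bound is truncated (on an RDD this is typically a `sum`/`reduce` plus `count()`):

```scala
// Average of 1 to 10: sum divided by count.
// In Spark: rdd.sum / rdd.count, or a reduce for the sum.
val nums = (1 to 10).toList
val avg = nums.sum.toDouble / nums.size
println(avg) // 55 / 10 = 5.5
```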
===================================================================
Hadoop Spar...
------------------------------------------------
$SPARK_HOME/bin/run-example o...
How to Build Eclipse Project:
--------------------------------------------
1. ...
----------------------------------------------
hadoop fs -put $SPARK_HOME/data ...
Create Spark Streaming Context:
==========================================
scal...
INDUSTRY: HEALTHCARE DATA INPUT FORMAT :- PDF (My Input Data is in PDF Form...
INDUSTRY: SENSEX LOG DATA INPUT FORMAT :- .xls (My Input Data is in excel...
INDUSTRY: RETAIL Data Input Format :- .xls (My Input Data is in excel 2007-20...
Hello Friends, After publishing my blogs on my POC for processing Excel file...
Hello Friends, Welcome to the blog where I am going to explain and take you...
Hello Friends, Welcome back... This blog is for analysis of Weather Report P...
Hello Friends, Glad to present this blog which is for analysis of Weather Re...
Dear Friends, Welcome back, after a long time. I was asked by one of my fri...
Dear Friends, I was asked how to process different files at a...
Dear Friends, Continuing with my posts, this one was asked by one of my frie...
Dear Friends, We know that Hadoop's HIVE component is very good for structur...
Dear Friends.... I spent a few days preparing for and giving interviews for job c...
Welcome Back Friends... Sorry.... It took me some time to post my blog...
Below are the companies offering commercial implementations and/or providing s...
Spark is one of Hadoop's sub-projects, developed in 2009 in UC Berkeley's AMPLab...
Hadoop Certifications
Cloudera/Hortonworks/IBM
Cloudera - Cloudera Universit...
Apache Hadoop Online/Offline Training Institutes in India (Bangalore/Hyderabad...
Hadoop CLI Commands
hadoop command [genericOptions] [commandOptions]
hadoop ...
Hadoop Versions
Hadoop 3
6 Apr 2018: Release 3.1.0 available
25 March 2018: R...
HDFS - Hadoop Distributed File System
GFS - Google File System
JSON - Java Scr...
gcloud commands
gcloud help
gcloud help compute instances create
gcloud -h
g...
Terraform Command Line Interface (Terraform CLI)
$ terraform
Usage: terraform...
AWS Certified Cloud Practitioner Practice Exam 2
AWS Certified Cloud Practitioner Practice Exam 1
What are the differences between Security Groups and NACL in Amazon Web Servic...
GCP Associate Cloud Engineer (ACE) Certification - Sample Questions ...
Amazon Web Services Command Line Interface (AWSCLI)
Installing aws cli in Pyt...
Sqoop command line (CLI)
sqoop help
usage: sqoop COMMAND [ARGS]
Available co...
Date functions in Apache Hive
hive> select current_date();
current_timestamp()...
Spark Interview Questions and Answers [2022]
Apache Hadoop Interview Questions And Answers
GCP (Google Cloud Platform) & AWS (Amazon Web Services) Cloud services compari...
POC #: Customer Complaints Analysis
The POC is based on Consumer Complaints r...
POC #: Youtube Data Analysis
The POC is based on Youtube data. Public DATA...
Industry: Social Media
Data: It comprises the information gathered from...
POC #: Generate Analytics from a Product-based Company Web Log.
The POC is ...
Industry: Financial Data: Input Format - .PDF (Our Input Data is in PDF Forma...
In this activity, we will load data into Apache Spark and inspect the data u...
POC#: Analytics on India census using Spark In this article, I have explored Ce...
POC#: Sentiment Analysis on demonetization in India using Spark In this arti...
Data types:
simple data types:
---------------------
int --> 32 bit int...
How to perform grouping by multiple columns.
---------------------------------------------
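Grouping by multiple columns amounts to grouping on a composite key. A minimal sketch with Scala collections, assuming the (name, sal, sex, dno) record layout used in the emp samples elsewhere in these notes; `groupBy` on a tuple key works the same way on a Spark RDD or Dataset:

```scala
// Group by the composite key (dno, sex) and aggregate salary.
// Record layout assumed from the emp examples in these notes.
case class Emp(name: String, sal: Int, sex: String, dno: Int)

val emps = List(
  Emp("aaaa", 40000, "m", 11),
  Emp("bbbbbb", 50000, "f", 12),
  Emp("cccc", 60000, "m", 11)
)

// total salary per (dno, sex) pair
val totals = emps.groupBy(e => (e.dno, e.sex))
                 .map { case (key, rows) => (key, rows.map(_.sal).sum) }

println(totals((11, "m"))) // 40000 + 60000 = 100000
```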
Entire column aggregations.
select sum(sal) from emp;
grunt> describe emp ...
Word Count Using Pig DataFlow:
[cloudera@quickstart ~]$ cat comment
hadoop is ...
Entire Column Aggregations:
sql: select sum(sal) from emp;
scala> val e...
scala> val l = List(10,20,30,40,50,56,67)
scala> val r2 = r.collect.reverse.ta...
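The `collect.reverse.take` chain above is the top-N pattern: sort ascending, reverse, then take the first n. A minimal sketch with a plain Scala list (on an RDD, `sortBy` followed by `take`, or `takeOrdered` with a reverse ordering, avoids collecting everything to the driver):

```scala
// Top-N pattern: sort ascending, reverse, take(n).
val l = List(10, 20, 30, 40, 50, 56, 67)
val top3 = l.sorted.reverse.take(3)
println(top3) // List(67, 56, 50)
```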
Conditional Transformations:
val trans = emp.map{ x =>
  val w = x.split...
-- co grouping
grunt> cat piglab/emp
101,aaaa,40000,m,11
102,bbbbbb,50000,f,12
103...
Unions in spark.
val l1 = List(10,20,30,40,50)
val l2 = List(100,200,300,400,50...
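One point worth noting about unions: Spark's `union()` keeps duplicates (like SQL UNION ALL), and `distinct()` must be applied afterwards for SQL UNION semantics. A minimal sketch with Scala lists, where `++` mirrors `union()` (the sample values here are chosen to overlap and are not the ones from the truncated example above):

```scala
// union keeps duplicates; distinct removes them afterwards.
val l1 = List(10, 20, 30, 40, 50)
val l2 = List(30, 40, 50, 100, 200)
val all = l1 ++ l2        // 10 elements, duplicates kept
val dedup = all.distinct  // 7 distinct values
println(dedup.size)
```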
Co Grouping using Spark:
--------------------------
scala> branch1.collect.forea...
Load Operator:
--------------
To load data from a file into a relation.
[cloudera@qui...
Foreach Operator:
-------------------
grunt> emp = load 'piglab/emp' using PigSt...
Techniques of subsetting relations:
i) filter: used for conditional filtering...
[cloudera@quickstart ~]$ hadoop fs -copyFromLocal emp spLab/e
[cloudera@quickst...
Denormalizing datasets using Joins
[cloudera@quickstart ~]$ cat > children
c101,...
[cloudera@quickstart ~]$ hadoop fs -cat spLab/e
101,aaaa,40000,m,11
102,bbbbbb,...
order :- to sort data (tuples) in ascending or descending order.
emp = l...
Cross:
-----
Used for the Cartesian product: each element of the left set joins ...
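The cross (Cartesian product) described above pairs every element of the left set with every element of the right set, as Pig's CROSS and Spark's `cartesian()` do. A minimal sketch with Scala collections and made-up sample values:

```scala
// Cartesian product: |left| * |right| pairs in the result.
val left = List(1, 2)
val right = List("a", "b", "c")
val cross = for (l <- left; r <- right) yield (l, r)
println(cross.size) // 2 * 3 = 6
```

Note that cross output grows multiplicatively, which is why CROSS/`cartesian()` is used sparingly on large datasets.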