Load Operator
-------------
The load operator loads data from a file into a relation.

Create a tab-delimited file samp1 and copy it into the piglab directory in HDFS:

[cloudera@quickstart ~]$ cat > samp1
100	200	300
400	500	900
100	120	23
123	900	800
[cloudera@quickstart ~]$ hadoop fs -copyFromLocal samp1 piglab
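As a quick sanity check before loading, the uploaded file can be read back from HDFS; the output is simply the file just created:

[cloudera@quickstart ~]$ hadoop fs -cat piglab/samp1
100	200	300
400	500	900
100	120	23
123	900	800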
grunt> s1 = load 'piglab/samp1' using PigStorage('\t')
>>     as (a:int, b:int, c:int);
grunt> s2 = load 'piglab/samp1' using PigStorage()
>>     as (a:int, b:int, c:int);
grunt> s3 = load 'piglab/samp1'
>>     as (a:int, b:int, c:int);
grunt> dump s3
(100,200,300)
(400,500,900)
(100,120,23)
(123,900,800)
grunt> dump s2
(100,200,300)
(400,500,900)
(100,120,23)
(123,900,800)
grunt> dump s1
(100,200,300)
(400,500,900)
(100,120,23)
(123,900,800)

The outputs of s1, s2, and s3 are identical. In s2, PigStorage() falls back to its default delimiter, tab. In s3, no load function is named at all; of the built-in load functions (PigStorage() and BinStorage()), PigStorage() is applied by default, again with the tab delimiter. So s1, s2, and s3 mean exactly the same thing.
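The as clause is optional as well. A minimal sketch (the relation name s6 is chosen just for illustration): without a schema the fields carry no names or declared types, describe reports the schema as unknown, and fields would have to be addressed positionally as $0, $1, $2.

grunt> s6 = load 'piglab/samp1';
grunt> dump s6
(100,200,300)
(400,500,900)
(100,120,23)
(123,900,800)
grunt> describe s6
Schema for s6 unknown.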
grunt> s4 = load 'piglab/samp1' as (a:int, b:int, c:int, d:int);

The first 3 fields of the file are mapped to a, b, and c. There is no 4th field in the file, so d becomes null (shown as the empty value after the last comma):

grunt> dump s4
(100,200,300,)
(400,500,900,)
(100,120,23,)
(123,900,800,)

-- The following skips trailing fields:
grunt> s5 = load 'piglab/samp1'
>>     as (a:int, b:int)
>>     ;
grunt> illustrate s5
--------------------------------
| s5     | a:int    | b:int    |
--------------------------------
|        | 100      | 120      |
--------------------------------
-- But to skip middle fields, take the help of the foreach operator [covered later]; see the sketch below.
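As a preview of foreach, a minimal sketch that keeps only the first and third columns of s3 and skips the middle field b (the relation name t is chosen just for this example):

grunt> t = foreach s3 generate a, c;
grunt> dump t
(100,300)
(400,900)
(100,23)
(123,800)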
Loading non-tab-delimited files into a Pig relation
---------------------------------------------------

[cloudera@quickstart ~]$ cat > samp2
100,10,1
2,200,20
3,30,300
[cloudera@quickstart ~]$ hadoop fs -copyFromLocal samp2 piglab

grunt> ss1 = load 'piglab/samp2' as (a:int, b:int, c:int);
grunt> dump ss1
(,,)
(,,)
(,,)

Here load expects the tab delimiter, but the file contains no tabs, so each entire line becomes a single string field. That string is mapped to the first field of the relation, a, but it cannot be cast to int, so a becomes null. The file has no 2nd or 3rd fields, which is why b and c also become null.

grunt> ss2 = load 'piglab/samp2' as (a:chararray, b:int, c:int);
grunt> dump ss2
(100,10,1,,)
(2,200,20,,)
(3,30,300,,)

With a declared as chararray, the whole line survives as the single value of a; the two trailing commas in each output tuple are the null b and c.

grunt> ss3 = load 'piglab/samp2' using PigStorage(',') as (a:int, b:int, c:int);
grunt> dump ss3
(100,10,1)
(2,200,20)
(3,30,300)

grunt> cat piglab/emp
101,aaaa,40000,m,11
102,bbbbbb,50000,f,12
103,cccc,50000,m,12
104,dd,90000,f,13
105,ee,10000,m,12
106,dkd,40000,m,12
107,sdkfj,80000,f,13
108,iiii,50000,m,11

grunt> emp = load 'piglab/emp'
>>     using PigStorage(',')
>>     as (id:int, name:chararray, sal:int, sex:chararray, dno:int);
grunt> illustrate emp
----------------------------------------------------------------------------------------
| emp   | id:int   | name:chararray   | sal:int   | sex:chararray   | dno:int   |
----------------------------------------------------------------------------------------
|       | 104      | dd               | 90000     | f                | 13       |
----------------------------------------------------------------------------------------
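PigStorage plays the same role in reverse: the delimiter argument also applies when a relation is written back to HDFS with the store operator. A minimal sketch, assuming the output directory piglab/emp_tab does not already exist (store fails if it does):

grunt> store emp into 'piglab/emp_tab' using PigStorage('\t');
grunt> cat piglab/emp_tab
101	aaaa	40000	m	11
102	bbbbbb	50000	f	12
...

store writes one line per tuple, with fields joined by the given delimiter.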