Hadoop Commands
Hadoop CLI Commands
hadoop command [genericOptions] [commandOptions]
hadoop fs
Usage: hadoop fs [generic options]
[-appendToFile <localsrc> ... <dst>]
[-cat [-ignoreCrc] <src> ...]
[-checksum <src> ...]
[-chgrp [-R] GROUP PATH...]
[-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...]
[-chown [-R] [OWNER][:[GROUP]] PATH...]
[-copyFromLocal [-f] [-p] [-l] [-d] <localsrc> ... <dst>]
[-copyToLocal [-f] [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
[-count [-q] [-h] [-v] [-t [<storage type>]] [-u] [-x] <path> ...]
[-cp [-f] [-p | -p[topax]] [-d] <src> ... <dst>]
[-createSnapshot <snapshotDir> [<snapshotName>]]
[-deleteSnapshot <snapshotDir> <snapshotName>]
[-df [-h] [<path> ...]]
[-du [-s] [-h] [-x] <path> ...]
[-expunge]
[-find <path> ... <expression> ...]
[-get [-f] [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
[-getfacl [-R] <path>]
[-getfattr [-R] {-n name | -d} [-e en] <path>]
[-getmerge [-nl] [-skip-empty-file] <src> <localdst>]
[-help [cmd ...]]
[-ls [-C] [-d] [-h] [-q] [-R] [-t] [-S] [-r] [-u] [<path> ...]]
[-mkdir [-p] <path> ...]
[-moveFromLocal <localsrc> ... <dst>]
[-moveToLocal <src> <localdst>]
[-mv <src> ... <dst>]
[-put [-f] [-p] [-l] [-d] <localsrc> ... <dst>]
[-renameSnapshot <snapshotDir> <oldName> <newName>]
[-rm [-f] [-r|-R] [-skipTrash] [-safely] <src> ...]
[-rmdir [--ignore-fail-on-non-empty] <dir> ...]
[-setfacl [-R] [{-b|-k} {-m|-x <acl_spec>} <path>]|[--set <acl_spec> <path>]]
[-setfattr {-n name [-v value] | -x name} <path>]
[-setrep [-R] [-w] <rep> <path> ...]
[-stat [format] <path> ...]
[-tail [-f] <file>]
[-test -[defsz] <path>]
[-text [-ignoreCrc] <src> ...]
[-touchz <path> ...]
[-truncate [-w] <length> <path> ...]
[-usage [cmd ...]]
hadoop fs -ls / -- Get a directory listing of the HDFS root directory
hadoop fs -ls /user/hive
hadoop fs -ls s3a://satya-hive/trip_data_dec17
hadoop fs -cat /test1/foo.txt -- Display contents of the file residing in HDFS
hadoop fs -cat /test1/foo.txt | wc -l -- Count the number of lines in a file in HDFS
hadoop fs -rm /test1/foo.txt -- Remove a file from HDFS
hadoop fs -rm -r hadoop-test2
hadoop fs -mkdir /test6/ -- Create a directory under the HDFS root directory
hadoop fs -put foo.txt /test6/ -- Copy file foo.txt from local disk to the directory
hadoop fs -put /etc/note.txt /test2/note_fs.txt
hadoop fs -get /user/satya/passwd ./ -- Copy the file back to local disk
hadoop fs -cp hadoop-test1/dwp-payments-april10.csv hadoop-test2
hadoop fs -cp /user/satya/reviews_Home_and_Kitchen_5.json s3a://satya-sparks/reviews_HomeKitchen
hadoop fs -Ddfs.replication=2 -cp hadoop-test2/dwp-payments-april10.csv hadoop-test2/test_with_rep2.csv
hadoop fs -mv hadoop-test1/dwp-payments-april10.csv hadoop-test3
hadoop fs -setrep -R 5 /user/satya/tmp/ -- Options precede the replication factor
hadoop fs -chmod 1777 /tmp
hadoop fs -touch /user/satya/test/foo -- Update access/modification times (newer Hadoop releases only)
hadoop fs -rmr /user/satya/test/foo -- Deprecated; same as -rm -r
hadoop fs -touchz /user/satya/test/bar -- Create a zero-length file
hadoop fs -count -q /user/satya
hadoop fs -copyFromLocal /hirw-starterkit/hdfs/commands/dwp-payments-april10.csv hadoop-test1
hadoop fs -copyToLocal hadoop-test1/dwp-payments-april10.csv .
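Because -test sets a shell exit code instead of printing output, it combines naturally with scripting. A minimal sketch, reusing the paths from the examples above:
if hadoop fs -test -e /test6/foo.txt; then
  echo "/test6/foo.txt already exists, skipping upload"
else
  hadoop fs -put foo.txt /test6/
fi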
start-all.sh -- Start all Hadoop daemons (deprecated; use start-dfs.sh and start-yarn.sh)
hadoop job -list
hadoop job -kill jobID
hadoop job -list-attempt-ids jobID taskType taskState
hadoop job -kill-task taskAttemptId
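For example, to kill a stuck job, take its ID from the first column of the -list output (the job ID below is hypothetical):
hadoop job -list
hadoop job -kill job_201401011200_0001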
hadoop namenode -format -- Format a new NameNode (destroys existing HDFS metadata; run only on first setup)
hadoop jar <jar> [mainClass] args... -- Run a jar on the cluster
hadoop jar /opt/hadoop/hadoop-examples-1.0.4.jar wordcount <input_dir> /out/wc_output
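A complete wordcount round trip might look like the following; the input path and file are illustrative, and the output directory must not already exist:
hadoop fs -mkdir /wc_input
hadoop fs -put notes.txt /wc_input/
hadoop jar /opt/hadoop/hadoop-examples-1.0.4.jar wordcount /wc_input /out/wc_output
hadoop fs -cat /out/wc_output/part-* -- One part file per reducer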
hadoop dfsadmin -report
hadoop dfsadmin -setSpaceQuota 10737418240 /user/esammer
hadoop dfsadmin -refreshNodes
hadoop dfsadmin -upgradeProgress status
hadoop dfsadmin -finalizeUpgrade
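A space-quota round trip on the directory above (-clrSpaceQuota removes the quota again):
hadoop dfsadmin -setSpaceQuota 10737418240 /user/esammer -- 10 GiB, expressed in bytes
hadoop fs -count -q /user/esammer -- Quota columns show the limit and the remaining space
hadoop dfsadmin -clrSpaceQuota /user/esammer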
hadoop fsck
Usage: DFSck <path> [-move | -delete | -openforwrite] [-files [-blocks [-locations | -racks]]]
<path> -- start checking from this path
-move -- move corrupted files to /lost+found
-delete -- delete corrupted files
-files -- print out files being checked
-openforwrite -- print out files opened for write
-blocks -- print out block report
-locations -- print out locations for every block
-racks -- print out network topology for data-node locations
By default fsck ignores files opened for write; use -openforwrite to report such files. They are usually tagged CORRUPT or HEALTHY depending on their block allocation status.
hadoop fsck / -files -blocks -locations
hadoop fsck /user/satya -files -blocks -locations
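On a busy cluster the full block report is long; a quick health scan can filter for trouble (the pattern below is illustrative):
hadoop fsck / | egrep -i 'corrupt|missing|under replicated'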
hadoop distcp -- Distributed Copy (distcp)
distcp [OPTIONS] <srcurl>* <desturl>
OPTIONS:
-p[rbugp] Preserve status
r: replication number
b: block size
u: user
g: group
p: permission
-p alone is equivalent to -prbugp
-i Ignore failures
-log <logdir> Write logs to <logdir>
-m <num_maps> Maximum number of simultaneous copies
-overwrite Overwrite destination
-update Overwrite if src size different from dst size
-skipcrccheck Do not use CRC check to determine if src is different from dest. Relevant only if -update is specified
-f <urilist_uri> Use list at <urilist_uri> as src list
-filelimit <n> Limit the total number of files to be <= n
-sizelimit <n> Limit the total size to be <= n bytes
-delete Delete the files existing in the dst but not in src
-mapredSslConf <f> Filename of SSL configuration for mapper task
NOTE 1: if -overwrite or -update are set, each source URI is interpreted as an isomorphic update to an existing directory.
For example:
hadoop distcp -p -update "hdfs://A:8020/user/foo/bar" "hdfs://B:8020/user/foo/baz"
would update all descendants of 'baz' also in 'bar'; it would *not* update /user/foo/baz/bar
NOTE 2: The parameter <n> in -filelimit and -sizelimit can be specified with symbolic representation. For example:
1230k = 1230 * 1024 = 1259520
891g = 891 * 1024^3 = 956703965184
hadoop distcp hdfs://A:8020/path/one hdfs://B:8020/path/two
hadoop distcp /path/one /path/two
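Combining the options above, a throttled, incremental, attribute-preserving copy between two clusters might look like this (hosts and paths are illustrative):
hadoop distcp -p -update -m 20 -log hdfs://B:8020/distcp_logs hdfs://A:8020/path/one hdfs://B:8020/path/two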