Useful commands for Hadoop developers
This post collects the most frequently used Spark, EMR, YARN, and AWS commands that a Hadoop developer reaches for day to day.
Kill Spark jobs:
This command kills every Spark process running on the current host.
ps aux | grep -i spark | grep -v grep | awk '{print $2}' | xargs kill -9
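If the jobs were submitted through YARN, a gentler alternative is to let the ResourceManager terminate them instead of signalling the JVMs directly. A minimal sketch, assuming the application listing lines contain the word "spark" (adjust the filter to your naming):
# List running YARN applications, keep those whose listing line mentions Spark,
# and ask the ResourceManager to kill each one.
for app_id in $(yarn application -list -appStates RUNNING 2>/dev/null | grep -i spark | awk '{print $1}'); do
  yarn application -kill "$app_id"
done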
EMR and Hadoop commands
YARN ResourceManager restart
sudo /sbin/stop hadoop-yarn-resourcemanager
sudo /sbin/start hadoop-yarn-resourcemanager
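On newer EMR releases that run on Amazon Linux 2, Hadoop daemons are managed by systemd rather than upstart, so the equivalent restart looks like the sketch below (assuming such a release; the service name stays the same):
# Restart the ResourceManager where the daemon is managed by systemd.
sudo systemctl stop hadoop-yarn-resourcemanager
sudo systemctl start hadoop-yarn-resourcemanager
# Verify the daemon is running again.
sudo systemctl status hadoop-yarn-resourcemanager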
List EMR clusters and instances
aws emr list-clusters
aws emr list-instances --cluster-id j-XXXXXX
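To narrow the output, the CLI can filter on cluster state and pull individual fields with a JMESPath query; a small sketch (the cluster id is a placeholder):
# Show only clusters that are not yet terminated.
aws emr list-clusters --active
# Print just the master node's public DNS name for one cluster.
aws emr describe-cluster --cluster-id j-XXXXXX --query 'Cluster.MasterPublicDnsName' --output text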
DistCp
hadoop distcp /<hdfs-location>/* s3://<bucket-name>/<key> &
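For larger copies it usually helps to pin the number of mappers and to skip files that already exist at the destination; a sketch under those assumptions (the mapper count and log file name are illustrative):
# Copy from HDFS to S3 with 20 mappers, skipping files already present at the
# destination; keep the job running after logout and capture its output.
nohup hadoop distcp -m 20 -update /<hdfs-location>/ s3://<bucket-name>/<key>/ > distcp.log 2>&1 &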
MapReduce
mapred job -list
mapred job -kill $jobId
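To avoid killing someone else's job by accident, check the job first, or scope a bulk kill to your own submissions. A sketch assuming the default mapred job -list layout, where the job id is the first column and the submitting user appears on the same line:
# Inspect a single job before killing it (the job id is a placeholder).
mapred job -status $jobId
# Kill every listed job that was submitted by the current user.
for job_id in $(mapred job -list 2>/dev/null | grep '^job_' | grep "$USER" | awk '{print $1}'); do
  mapred job -kill "$job_id"
done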
YARN jobs:
yarn application -list
yarn application -kill <Application ID>
yarn logs -applicationId <Application ID>
yarn logs -applicationId <Application ID> -containerId <Container ID>
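The application list can also be filtered by state and type, and full logs are easier to work with when redirected to a file; a minimal sketch (the application id is a placeholder, and SPARK as the application type is an assumption):
# List only running Spark applications.
yarn application -list -appStates RUNNING -appTypes SPARK
# Save the aggregated logs for one application to a local file.
yarn logs -applicationId <Application ID> > app_logs.txt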
AWS S3
aws s3 ls s3://<bucket-name>/<key> --recursive --human-readable --summarize
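When only the totals matter, the summary lines at the end of the listing are enough; aws s3 sync is also handy for pushing a local directory to the same prefix (the local path below is an example):
# Print only the object count and total size for the prefix.
aws s3 ls s3://<bucket-name>/<key> --recursive --summarize --human-readable | tail -n 2
# Copy a local directory to the prefix, transferring only new or changed files.
aws s3 sync ./local-dir s3://<bucket-name>/<key>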
S3DistCp (s3-dist-cp)
s3-dist-cp --src=<source-location> --dest=s3://<bucket-name>/<key>
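s3-dist-cp can also restrict the copy to files matching a regular expression via --srcPattern; a sketch assuming gzip files are the target (the pattern is an example):
# Copy only the .gz files under the source location to S3.
s3-dist-cp --src=<source-location> --dest=s3://<bucket-name>/<key> --srcPattern='.*\.gz'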