Tuesday, July 10, 2012

Run a custom jar with Elastic MapReduce (EMR) from the command line

I followed the instructions in this article to run a custom jar on EMR:

http://aws.amazon.com/articles/3938

I got stuck at step 5:
5. Run the job flow.
 $ ./elasticmapreduce-client.rb RunJobFlow streaming_jobflow.json 
I couldn't find the file "elasticmapreduce-client.rb" anywhere. After some searching online, I got it to work. The correct command is:
./elastic-mapreduce --create --json path/to/your/flow
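Here is what my flow file looks like. "MainClass" is left out because this jar names its main class in its manifest; to set it explicitly, add "MainClass": "fully-qualified-class-name" inside "HadoopJarStep" (JSON does not allow comments, so the entry cannot simply be commented out in the file):
   [
      {
         "Name": "Custom Jar Grep Example 1",
         "ActionOnFailure": "CONTINUE",
         "HadoopJarStep":
         {
            "Jar": "s3n://YOUR_BUCKET/hadoop-examples-0.20.2-cdh3u4.jar",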
            "Args":
            [
               "grep",
               "s3n://YOUR_BUCKET/input/example",
               "s3n://YOUR_BUCKET/output/example",
               "dfs[a-z.]+"
            ]
         }
      }
   ]
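Since the flow file has to be strict JSON, it can save a round trip to check it locally before handing it to elastic-mapreduce. Below is a minimal Python sketch of such a check. The function name check_flow and the keys it treats as required are my own assumptions for illustration, not something the EMR tooling defines:

```python
import json


def check_flow(text):
    """Return a list of problems found in a flow file's text (empty if none).

    Hypothetical helper: the "required" keys below are an assumption based
    on the example flow in this post, not an official EMR schema.
    """
    try:
        steps = json.loads(text)
    except ValueError as err:  # json.JSONDecodeError is a ValueError
        return ["not valid JSON: %s" % err]

    if not isinstance(steps, list):
        return ["top level must be a JSON array of steps"]

    problems = []
    for i, step in enumerate(steps):
        if not isinstance(step, dict):
            problems.append("step %d: must be a JSON object" % i)
            continue
        if "Name" not in step:
            problems.append("step %d: missing Name" % i)
        jar_step = step.get("HadoopJarStep")
        if not isinstance(jar_step, dict):
            problems.append("step %d: missing HadoopJarStep" % i)
        elif "Jar" not in jar_step:
            problems.append("step %d: HadoopJarStep has no Jar" % i)
    return problems
```

To use it, read the file and print whatever comes back, for example print(check_flow(open("path/to/your/flow").read())). An empty list means the file at least parses and has the fields used in this post.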
This flow corresponds to the following hadoop command:
hadoop jar hadoop-examples-0.20.2-cdh3u4.jar grep input output 'dfs[a-z.]+'
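To make the mapping between a step and its hadoop command concrete, here is a small Python sketch that turns one step from the flow file into the equivalent command line. The function name step_to_hadoop_command is hypothetical, and using only the jar's file name (rather than its full S3 path) is my assumption about how you would run it locally:

```python
import shlex


def step_to_hadoop_command(step):
    """Build the `hadoop jar ...` command line equivalent to one flow step.

    Hypothetical helper for illustration. It keeps only the jar's file
    name, assuming the jar has been copied to the current directory.
    """
    jar_step = step["HadoopJarStep"]
    jar_name = jar_step["Jar"].rsplit("/", 1)[-1]

    parts = ["hadoop", "jar", jar_name]
    # MainClass is optional; when present it goes right after the jar.
    if "MainClass" in jar_step:
        parts.append(jar_step["MainClass"])
    parts.extend(jar_step.get("Args", []))

    # Quote each argument so patterns like dfs[a-z.]+ survive the shell.
    return " ".join(shlex.quote(p) for p in parts)
```

For the flow in this post, the result is the same command with the S3 input and output paths in place of the local "input" and "output" directories.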

Some useful tips:

1. Show the logs for a job flow:
 ./elastic-mapreduce --jobflow JOB_ID --logs
