Tuesday, July 10, 2012

Run a custom jar with Elastic MapReduce (EMR) on the command line

I followed the instructions for running a custom jar on EMR:

http://aws.amazon.com/articles/3938

I got stuck at step 5:
5. Run the job flow.
 $ ./elasticmapreduce-client.rb RunJobFlow streaming_jobflow.json 
I couldn't find the file "elasticmapreduce-client.rb" anywhere. After some searching online, I got it to work. The correct command is:
./elastic-mapreduce --create --json path/to/your/flow
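Here is what my flow file looks like. "MainClass" is left out because the examples jar names its main class in its manifest; for a jar that does not, add "MainClass": "fully-qualified-class-name" inside "HadoopJarStep":
   [
      {
         "Name": "Custom Jar Grep Example 1",
         "ActionOnFailure": "CONTINUE",
         "HadoopJarStep":
         {
            "Jar": "s3n://YOUR_BUCKET/hadoop-examples-0.20.2-cdh3u4.jar",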
            "Args":
            [
               "grep",
               "s3n://YOUR_BUCKET/input/example",
               "s3n://YOUR_BUCKET/output/example",
               "dfs[a-z.]+"
            ]
         }
      }
   ]
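The flow file has to be valid JSON, so it can save a failed submission to generate and check it locally first. The following is only a sketch: the bucket name `my-bucket` and the file name `flow.json` are hypothetical placeholders, and it assumes `python3` is installed locally for the validation step.

```shell
#!/bin/sh
# Hypothetical values; replace with your own bucket and file name.
BUCKET="my-bucket"
FLOW_FILE="flow.json"

# Write the flow definition, substituting the bucket name into the S3 paths.
cat > "$FLOW_FILE" <<EOF
[
  {
    "Name": "Custom Jar Grep Example 1",
    "ActionOnFailure": "CONTINUE",
    "HadoopJarStep": {
      "Jar": "s3n://$BUCKET/hadoop-examples-0.20.2-cdh3u4.jar",
      "Args": [
        "grep",
        "s3n://$BUCKET/input/example",
        "s3n://$BUCKET/output/example",
        "dfs[a-z.]+"
      ]
    }
  }
]
EOF

# Check that the file parses as JSON before handing it to the CLI.
if python3 -m json.tool "$FLOW_FILE" > /dev/null; then
  echo "flow file OK: $FLOW_FILE"
else
  echo "flow file is not valid JSON" >&2
  exit 1
fi
```

Once the check passes, the generated file can be passed to the --create --json command shown earlier.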
This flow corresponds to the following Hadoop command:
hadoop jar hadoop-examples-0.20.2-cdh3u4.jar grep input output 'dfs[a-z.]+'
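To get a feel for what the pattern 'dfs[a-z.]+' picks up before paying for a cluster run, it can be tried on a few sample lines locally. This is a rough sketch: the sample lines are made up for illustration, and it uses the ordinary grep utility with extended regular expressions rather than Hadoop's Java regex engine, which should agree for a pattern this simple.

```shell
#!/bin/sh
# Made-up sample input; the real job reads the files under the S3 input path.
matches=$(printf '%s\n' \
  'dfs.replication=3' \
  'dfsadmin -report' \
  'mapred.job.tracker=local' \
  | grep -oE 'dfs[a-z.]+')

# Print each matched string on its own line.
echo "$matches"
```

The first two sample lines each contain one match and the third contains none.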

Some useful tips:

1. Show the logs for a job flow:
 ./elastic-mapreduce --jobflow JOB_ID --logs