Technology Blog: Run a custom JAR with Elastic MapReduce (EMR) on the command line

Tuesday, July 10, 2012

I followed the instructions in this AWS article to run a custom JAR on EMR:

http://aws.amazon.com/articles/3938

I got stuck at step 5:

5. Run the job flow.

 $ ./elasticmapreduce-client.rb RunJobFlow streaming_jobflow.json

I couldn't find the file "elasticmapreduce-client.rb" anywhere. After some searching online, I got it to work. The correct command is:

./elastic-mapreduce --create --json path/to/your/flow

Here is what my flow file looks like:

   [
      {
         "Name": "Custom Jar Grep Example 1",
         "ActionOnFailure": "CONTINUE",
         "HadoopJarStep":
         {
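            "Jar": "s3n://YOUR_BUCKET/hadoop-examples-0.20.2-cdh3u4.jar",
            "Args":
            [
               "grep",
               "s3n://YOUR_BUCKET/input/example",
               "s3n://YOUR_BUCKET/output/example",
               "dfs[a-z.]+"
            ]
         }
      }
   ]

"HadoopJarStep" also accepts an optional "MainClass": "fully-qualified-class-name" entry. It is omitted here, as in the equivalent hadoop command. Because JSON has no comment syntax, an unused entry has to be removed from the file rather than commented out.

This flow corresponds to the following hadoop command: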

hadoop jar hadoop-examples-0.20.2-cdh3u4.jar grep input output 'dfs[a-z.]+'
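
To make the mapping between a flow step and its hadoop command concrete, here is a minimal Python sketch. It is not part of the EMR tools. The function names (build_flow, check_flow, hadoop_command) are made up for illustration, and the check only covers the keys used in this post, not everything EMR accepts. Unlike the command above, the generated command keeps the full s3n:// paths from the flow.

```python
import json
import shlex


def build_flow(bucket, jar_name, args, name="Custom Jar Grep Example 1"):
    """Return a flow (a list with one custom-jar step) shaped like the file above."""
    return [
        {
            "Name": name,
            "ActionOnFailure": "CONTINUE",
            "HadoopJarStep": {
                "Jar": "s3n://%s/%s" % (bucket, jar_name),
                "Args": list(args),
            },
        }
    ]


def check_flow(flow):
    """Raise ValueError if the flow lacks the keys used in this post."""
    if not isinstance(flow, list) or not flow:
        raise ValueError("flow must be a non-empty JSON array of steps")
    for step in flow:
        jar_step = step.get("HadoopJarStep", {})
        if "Name" not in step or "Jar" not in jar_step:
            raise ValueError("each step needs Name and HadoopJarStep.Jar")


def hadoop_command(step):
    """Return the `hadoop jar` command line that a step corresponds to."""
    jar_step = step["HadoopJarStep"]
    # Only the file name of the jar is used, as in the command above.
    parts = ["hadoop", "jar", jar_step["Jar"].rsplit("/", 1)[-1]]
    if "MainClass" in jar_step:
        # `hadoop jar` takes the optional main class right after the jar.
        parts.append(jar_step["MainClass"])
    parts.extend(jar_step.get("Args", []))
    # Quote arguments such as the regex so the shell passes them unchanged.
    return " ".join(shlex.quote(part) for part in parts)


if __name__ == "__main__":
    flow = build_flow(
        "YOUR_BUCKET",
        "hadoop-examples-0.20.2-cdh3u4.jar",
        [
            "grep",
            "s3n://YOUR_BUCKET/input/example",
            "s3n://YOUR_BUCKET/output/example",
            "dfs[a-z.]+",
        ],
    )
    check_flow(flow)
    # json.dumps always emits valid JSON, so a stray comment line cannot sneak in.
    print(json.dumps(flow, indent=3))
    print(hadoop_command(flow[0]))
```

Redirecting the printed JSON to a file gives a flow file that can be passed to the --json option shown earlier.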

Some useful tips:

1. Show the log:

./elastic-mapreduce --jobflow JOB_ID --logs
