http://aws.amazon.com/articles/3938
I got stuck with step 5:
5. Run the job flow.
$ ./elasticmapreduce-client.rb RunJobFlow streaming_jobflow.json
I couldn't find the file "elasticmapreduce-client.rb" anywhere. After some searching online, I got it working. The correct command is:
./elastic-mapreduce --create --json path/to/your/flow
Here is what my flow file looks like:
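[
  {
    "Name": "Custom Jar Grep Example 1",
    "ActionOnFailure": "CONTINUE",
    "HadoopJarStep":
    {
      "Jar": "s3n://YOUR_BUCKET/hadoop-examples-0.20.2-cdh3u4.jar",
      "Args":
      [
        "grep",
        "s3n://YOUR_BUCKET/input/example",
        "s3n://YOUR_BUCKET/output/example",
        "dfs[a-z.]+"
      ]
    }
  }
]
You can add a "MainClass": "fully-qualified-class-name" entry inside "HadoopJarStep" if the jar's manifest does not name a main class. JSON has no comment syntax, so leave that line out entirely rather than commenting it with ##. A ## prefix makes the file invalid.
The flow corresponds to the following hadoop command, with the S3 paths taking the place of input and output:
hadoop jar hadoop-examples-0.20.2-cdh3u4.jar grep input output 'dfs[a-z.]+'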
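Hand-editing the flow file makes JSON mistakes easy, so you can also generate it. The Python below is a minimal sketch and is not part of the EMR tools. make_step, write_flow and check_flow are hypothetical helper names, and check_flow only looks for the keys used in the flow file above. The sketch also shows how the shell quoting in the hadoop command ('dfs[a-z.]+') disappears in the JSON "Args" list.

```python
import json
import shlex


def make_step(name, jar, hadoop_args, action_on_failure="CONTINUE"):
    """Build one step in the same layout as the flow file above (hypothetical helper).

    hadoop_args is what follows the jar name in a 'hadoop jar <jar> ...'
    command, typed as in a shell. shlex.split strips the shell quoting,
    so 'dfs[a-z.]+' ends up as the plain string dfs[a-z.]+.
    """
    return {
        "Name": name,
        "ActionOnFailure": action_on_failure,
        "HadoopJarStep": {"Jar": jar, "Args": shlex.split(hadoop_args)},
    }


def write_flow(steps, path):
    # json.dump always emits valid JSON, so no stray "##..." lines can sneak in
    with open(path, "w") as f:
        json.dump(steps, f, indent=2)


def check_flow(path):
    """Parse a flow file and check that each step has the keys used above."""
    with open(path) as f:
        steps = json.load(f)  # raises json.JSONDecodeError on invalid JSON
    for step in steps:
        for key in ("Name", "ActionOnFailure", "HadoopJarStep"):
            if key not in step:
                raise ValueError(f"step {step.get('Name', '?')!r} lacks {key!r}")
        if "Jar" not in step["HadoopJarStep"]:
            raise ValueError(f"step {step['Name']!r} has no Jar")
    return steps


if __name__ == "__main__":
    bucket = "s3n://YOUR_BUCKET"
    step = make_step(
        "Custom Jar Grep Example 1",
        f"{bucket}/hadoop-examples-0.20.2-cdh3u4.jar",
        f"grep {bucket}/input/example {bucket}/output/example 'dfs[a-z.]+'",
    )
    write_flow([step], "flow.json")
    print(check_flow("flow.json")[0]["HadoopJarStep"]["Args"])
```

Running the script writes flow.json to the current directory. You can then pass that file to ./elastic-mapreduce --create --json flow.json.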
Some useful tips:
1. Show the logs of a job flow:
./elastic-mapreduce --jobflow JOB_ID --logs
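If you submit flows and check logs often, you can wrap the two commands in a small script. This Python sketch only builds argument lists for the flags used in this post and runs them with subprocess. It assumes the elastic-mapreduce script sits in the current directory (EMR_CLI is a hypothetical setting), and it does not try to parse the CLI's output.

```python
import subprocess

# Assumption: the Ruby CLI is in the current directory, as in the commands above.
EMR_CLI = "./elastic-mapreduce"


def create_cmd(flow_path):
    # same as: ./elastic-mapreduce --create --json path/to/your/flow
    return [EMR_CLI, "--create", "--json", flow_path]


def logs_cmd(job_id):
    # same as: ./elastic-mapreduce --jobflow JOB_ID --logs
    return [EMR_CLI, "--jobflow", job_id, "--logs"]


def run(cmd):
    # A list argument (no shell=True) keeps paths and job IDs from being
    # re-interpreted by a shell; check=True raises on a non-zero exit code.
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout
```

For example, print(run(logs_cmd("JOB_ID"))) prints the logs, with JOB_ID replaced by your job flow ID.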
Awesome! :)