Friday, December 7, 2012
1. Find the total line count of certain files in a folder:
find test -iname '*.java' | xargs wc -l | tail -1
2. Count the number of files in each folder, recursively:
for t in `find . -type d -ls | awk '{print $11}'`; do echo "$t `find "$t" -type f | wc -l`" ; done > ~/count.txt
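Both commands break on paths that contain spaces (the first can be hardened with find -print0 | xargs -0). A more robust sketch of the counting loop, writing to the same example output file:
#!/bin/bash
# Count files under each directory, recursively; null-delimited names survive spaces.
find . -type d -print0 | while IFS= read -r -d '' dir; do
  printf '%s %d\n' "$dir" "$(find "$dir" -type f | wc -l)"
done > ~/count.txt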
Thursday, November 1, 2012
Some svn commands
Create tag from trunk or branch:
svn copy http://svn.mydomain.com/repository/myproject/trunk \
         http://svn.mydomain.com/repository/myproject/tags/release-1.0 \
         -m "Tagging the 1.0 release."
The tag created is a snapshot of the trunk at the time "svn copy" is executed.
To be more precise, a revision number can be passed to "svn copy":
svn copy -r 12345 http://svn.mydomain.com/repository/myproject/trunk \
         http://svn.mydomain.com/repository/myproject/tags/release-1.0 \
         -m "Tagging the 1.0 release."
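To verify that the tag was created, list the tags directory (same hypothetical repository URL as above):
svn list http://svn.mydomain.com/repository/myproject/tags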
Merge from Trunk to a branch:
1. Check out the branch (assume the path is /path/to/mybranch).
2. Go to that folder.
3. Run the following command:
svn merge http://svn.mydomain.com/repository/myproject/trunk .
It will merge all changes from trunk into mybranch since the last merge.
The above command is the same as:
svn merge -rLastMergedRevision:HEAD http://svn.mydomain.com/repository/myproject/trunk .
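The last merged revision is recorded in the svn:mergeinfo property (Subversion 1.5+), so you can look it up before merging (the output shown is hypothetical):
cd /path/to/mybranch
svn propget svn:mergeinfo .
# e.g. /myproject/trunk:196340-196401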
View merge history:
"svn log" doesn't display the merge history. It only shows the merge commit:
svn log
------------------------------------------------------------------------
r196402 | liz | 2012-11-01 10:39:13 -0700 (Thu, 01 Nov 2012) | 1 line

Merging r196340 through r196401
------------------------------------------------------------------------
r196340 | liz | 2012-10-31 14:52:06 -0700 (Wed, 31 Oct 2012) | 1 line

development branch for new feature
If you need to see the merge history, use the --use-merge-history (-g) option with svn log:
svn log -g
------------------------------------------------------------------------
r196402 | liz | 2012-11-01 10:39:13 -0700 (Thu, 01 Nov 2012) | 1 line

Merging r196340 through r196401
------------------------------------------------------------------------
r196388 | xyz | 2012-11-01 09:50:28 -0700 (Thu, 01 Nov 2012) | 2 lines
Merged via: r196402

Added new unit tests
------------------------------------------------------------------------
r196340 | liz | 2012-10-31 14:52:06 -0700 (Wed, 31 Oct 2012) | 1 line

development branch for new feature
Tuesday, July 17, 2012
Run Sqoop on Amazon Elastic MapReduce (EMR) with Amazon RDS
Amazon EMR doesn't come with Sqoop installed, but it is possible to run Sqoop on EMR. The following blog shows how to install and run it:
http://blog.kylemulka.com/2012/04/how-to-install-sqoop-on-amazon-elastic-map-reduce-emr/
However, the solution isn't perfect: the input files are usually in S3, and Sqoop doesn't support S3 directly. Here is my script to install Sqoop and export data from S3 to Amazon RDS (MySQL):
#!/bin/bash
BUCKET_NAME=zli-emr-test
SQOOP_FOLDER=sqoop-1.4.1-incubating__hadoop-0.20
SQOOP_TAR=$SQOOP_FOLDER.tar.gz
##change to home directory
cd ~
##Install sqoop on emr
hadoop fs -copyToLocal s3n://$BUCKET_NAME/$SQOOP_TAR $SQOOP_TAR
tar -xzf $SQOOP_TAR
##Install JDBC driver (e.g. mysql-connector-java.jar) into the sqoop lib folder
hadoop fs -copyToLocal s3n://$BUCKET_NAME/mysql-connector-java-5.1.19.jar ~/$SQOOP_FOLDER/lib/
##Copy input file from S3 to HDFS
HADOOP_INPUT=hdfs:///user/hadoop/myinput
hadoop distcp s3://$BUCKET_NAME/myinput $HADOOP_INPUT
~/$SQOOP_FOLDER/bin/sqoop export --connect jdbc:mysql://RDS-Host-name:3306/DB_NAME --username USERNAME --password PASSWORD --table TABLE_NAME --export-dir $HADOOP_INPUT --input-fields-terminated-by '\t'
The script assumes that the Sqoop tarball and mysql-connector-java.jar are in the S3 bucket, and that the input file is in S3 as well.
Note that RDS needs to be configured to allow database access from the following two EC2 security groups:
ElasticMapReduce-master
ElasticMapReduce-slave
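Once the export finishes, a quick sanity check that the rows arrived (a sketch, reusing the script's placeholder host, credentials, and table; mysql will prompt for the password):
mysql -h RDS-Host-name -u USERNAME -p DB_NAME -e "SELECT COUNT(*) FROM TABLE_NAME;"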
Tuesday, July 10, 2012
Run custom jar with elastic map reduce (EMR) on command line
I followed these instructions to run a custom jar on EMR:
http://aws.amazon.com/articles/3938
I got stuck with step 5:
5. Run the job flow.
$ ./elasticmapreduce-client.rb RunJobFlow streaming_jobflow.json
I couldn't find the file "elasticmapreduce-client.rb" at all. After some online searching, I got it to work. The correct command is:
./elastic-mapreduce --create --json path/to/your/flow
Here is what my flow file looks like:
[
  {
    "Name": "Custom Jar Grep Example 1",
    "ActionOnFailure": "CONTINUE",
    "HadoopJarStep":
    {
      "Jar": "s3n://YOUR_BUCKET/hadoop-examples-0.20.2-cdh3u4.jar",
      ##"MainClass": "fully-qualified-class-name",
      "Args":
      [
        "grep",
        "s3n://YOUR_BUCKET/input/example",
        "s3n://YOUR_BUCKET/output/example",
        "dfs[a-z.]+"
      ]
    }
  }
]
The flow corresponds to the following hadoop command:
hadoop jar hadoop-examples-0.20.2-cdh3u4.jar grep input output 'dfs[a-z.]+'
Some useful tips:
1. Show the logs:
./elastic-mapreduce --jobflow JOB_ID --logs
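2. To find the JOB_ID in the first place, the same CLI can list your job flows (as far as I recall the Ruby client supports --list; check your version):
./elastic-mapreduce --list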
Labels: command line, custom jar, elastic map reduce, emr, hadoop
Friday, June 29, 2012
mvn test: LocalJobRunner.java with java.lang.OutOfMemoryError
When I ran mvn test to unit test some Hadoop-related code, I got the following error:
java.lang.OutOfMemoryError: Java heap space
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:949)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:428)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
I tried to set MAVEN_OPTS=-Xmx2048m, but it didn't work: Surefire forks a separate JVM for the tests by default, so MAVEN_OPTS never reaches the test JVM. After some online research, I fixed the problem by adding the following to the pom:
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-surefire-plugin</artifactId>
<configuration>
<argLine>-Xms256m -Xmx512m</argLine>
</configuration>
</plugin>
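Note that Surefire exposes argLine as a user property, so when the pom doesn't hard-code it, the same setting can also be passed on the command line:
mvn test -DargLine="-Xms256m -Xmx512m"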
Labels: Java heap space, maven, maven-surefire-plugin, mvn, OutOfMemoryError, plugin, test, unit test
Thursday, June 28, 2012
maven shade plugin: Invalid signature file digest for Manifest main attributes
If you get the following error message with maven shade plugin:
Exception in thread "main" java.lang.SecurityException: Invalid signature file digest for Manifest main attributes
You need to add the following to pom.xml:
<configuration>
<filters>
<filter>
<artifact>*:*</artifact>
<excludes>
<exclude>META-INF/*.SF</exclude>
<exclude>META-INF/*.DSA</exclude>
<exclude>META-INF/*.RSA</exclude>
</excludes>
</filter>
</filters>
</configuration>
Explanation:
The above configuration excludes all files in META-INF ending with .SF, .DSA, and .RSA, for all artifacts (*:*), when creating the uber-jar.
The java.lang.SecurityException is raised because some dependency jars are signed. A jar is signed using jarsigner, which creates two additional files and places them in META-INF:
- a signature file, with a .SF extension, and
- a signature block file, with a .DSA, .RSA, or .EC extension.
See the jarsigner documentation for a detailed explanation of the JAR Signing and Verification Tool.
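If the error persists after shading, it can help to check whether any signature files made it into the uber-jar (a sketch; the jar path is just an example):
unzip -l target/myapp-shaded.jar | grep -E 'META-INF/.*\.(SF|DSA|RSA)'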