Friday, December 7, 2012

my shell script cheat sheet

1. find the total line count for some files in a folder:

find test -iname '*.java' | xargs wc -l | tail -1
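The pattern is quoted so the shell doesn't expand it before find sees it. If the file names may contain spaces, a null-delimited variant is safer (a sketch; "test" is just the example folder):

find test -iname '*.java' -print0 | xargs -0 wc -l | tail -1

One caveat for both versions: with a very large number of files, xargs may invoke wc more than once, and tail -1 then reports only the last batch's total.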

2. count the number of files in each folder recursively:

for t in `find . -type d -ls | awk '{print $11}'`; do echo "$t `find $t -type f  | wc -l`" ; done > ~/count.txt
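The awk '{print $11}' extraction is fragile: it breaks on paths that contain spaces. A sketch of a more robust version that avoids parsing the find -ls output:

find . -type d | while read -r dir; do
    echo "$dir $(find "$dir" -type f | wc -l)"
done > ~/count.txt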

insert control character with vi

To insert a visible ^A (Control-A), type Ctrl-V followed by Ctrl-A in insert mode; Ctrl-V tells vi to insert the next keystroke literally.
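To confirm the control character actually made it into the file, cat -v prints non-printing characters in the same caret notation (the file name is just an example):

cat -v myfile.txt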

Thursday, November 1, 2012

Some svn commands

Create tag from trunk or branch:
svn copy http://svn.mydomain.com/repository/myproject/trunk \
           http://svn.mydomain.com/repository/myproject/tags/release-1.0 \
      -m "Tagging the 1.0 release." 
The tag created is a snapshot of the trunk at the time the "svn copy" is executed.
To be more precise, a revision number can be passed to "svn copy":
svn copy -r 12345 http://svn.mydomain.com/repository/myproject/trunk \
           http://svn.mydomain.com/repository/myproject/tags/release-1.0 \
      -m "Tagging the 1.0 release."
Merge from Trunk to a branch:

1. check out the branch (assume the path is /path/to/mybranch)
2. go to the above folder
3. run the following command:
svn merge http://svn.mydomain.com/repository/myproject/trunk .
It will merge all changes from trunk into mybranch since the last merge.
The above command is the same as:
svn merge -rLastMergedRevision:HEAD http://svn.mydomain.com/repository/myproject/trunk .
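Since Subversion 1.5, "svn mergeinfo" can list the trunk revisions that have not been merged into the branch yet, which shows exactly what the next merge will pick up:

svn mergeinfo --show-revs eligible http://svn.mydomain.com/repository/myproject/trunk .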
View merge history:
"svn log" doesn't display the merge history.  It only shows the merge commit:
svn log
------------------------------------------------------------------------
r196402 | liz | 2012-11-01 10:39:13 -0700 (Thu, 01 Nov 2012) | 1 line

Merging r196340 through r196401
------------------------------------------------------------------------
r196340 | liz | 2012-10-31 14:52:06 -0700 (Wed, 31 Oct 2012) | 1 line

development branch for new feature
To see the merge history, the option --use-merge-history (-g) can be used with svn log:
svn log -g
------------------------------------------------------------------------
r196402 | liz | 2012-11-01 10:39:13 -0700 (Thu, 01 Nov 2012) | 1 line

Merging r196340 through r196401
------------------------------------------------------------------------
r196388 | xyz | 2012-11-01 09:50:28 -0700 (Thu, 01 Nov 2012) | 2 lines
Merged via: r196402

Added new unit tests

------------------------------------------------------------------------
r196340 | liz | 2012-10-31 14:52:06 -0700 (Wed, 31 Oct 2012) | 1 line

development branch for new feature

Tuesday, July 17, 2012

Run Sqoop on Amazon Elastic MapReduce (EMR) with Amazon RDS

Amazon EMR doesn't come with Sqoop installed, but it is possible to run Sqoop on EMR.  The following blog shows how to install and run it:

http://blog.kylemulka.com/2012/04/how-to-install-sqoop-on-amazon-elastic-map-reduce-emr/

However, that solution isn't complete, since the input files are usually in S3 and Sqoop doesn't support S3 directly.  Here is my script to install Sqoop and export data from S3 to Amazon RDS (MySQL):

#!/bin/bash
BUCKET_NAME=zli-emr-test
SQOOP_FOLDER=sqoop-1.4.1-incubating__hadoop-0.20
SQOOP_TAR=$SQOOP_FOLDER.tar.gz

##change to home directory
cd ~

##Install sqoop on emr
hadoop fs -copyToLocal s3n://$BUCKET_NAME/$SQOOP_TAR $SQOOP_TAR
tar -xzf $SQOOP_TAR

##Install jdbc driver (ex mysql-connection-java.jar) to sqoop lib folder
hadoop fs -copyToLocal s3n://$BUCKET_NAME/mysql-connector-java-5.1.19.jar ~/$SQOOP_FOLDER/lib/
##Copy input file from S3 to HDFS
HADOOP_INPUT=hdfs:///user/hadoop/myinput
hadoop distcp s3://$BUCKET_NAME/myinput $HADOOP_INPUT
~/$SQOOP_FOLDER/bin/sqoop export --connect jdbc:mysql://RDS-Host-name:3306/DB_NAME --username USERNAME --password PASSWORD --table TABLE_NAME --export-dir $HADOOP_INPUT --input-fields-terminated-by '\t'
The script assumes that the Sqoop tarball, mysql-connector-java.jar, and the input file are all in the S3 bucket.

Note that RDS needs to be configured to allow database access from the following 2 EC2 security groups:
ElasticMapReduce-master
ElasticMapReduce-slave
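This can be done in the AWS console; below is a hedged sketch of doing it from the command line with the AWS CLI (the DB security group name "default" is just an example, and YOUR_AWS_ACCOUNT_ID is the account that owns the EC2 security groups):

aws rds authorize-db-security-group-ingress --db-security-group-name default \
    --ec2-security-group-name ElasticMapReduce-master \
    --ec2-security-group-owner-id YOUR_AWS_ACCOUNT_ID
aws rds authorize-db-security-group-ingress --db-security-group-name default \
    --ec2-security-group-name ElasticMapReduce-slave \
    --ec2-security-group-owner-id YOUR_AWS_ACCOUNT_ID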

Tuesday, July 10, 2012

Run custom jar with elastic map reduce (EMR) on command line

I followed these instructions to run a custom jar on EMR:

http://aws.amazon.com/articles/3938

I got stuck with step 5:
5. Run the job flow.
 $ ./elasticmapreduce-client.rb RunJobFlow streaming_jobflow.json 
I couldn't find the file "elasticmapreduce-client.rb" at all.  After some online searching, I got it to work.  The correct command is:
./elastic-mapreduce --create --json path/to/your/flow
Here is what my flow file looks like:
   [
      {
         "Name": "Custom Jar Grep Example 1",
         "ActionOnFailure": "CONTINUE",
         "HadoopJarStep":
         {
            "Jar": "s3n://YOUR_BUCKET/hadoop-examples-0.20.2-cdh3u4.jar",
                       ##"MainClass": "fully-qualified-class-name", 
            "Args":
            [
               "grep",
               "s3n://YOUR_BUCKET/input/example",
               "s3n://YOUR_BUCKET/output/example",
               "dfs[a-z.]+"
            ]
         }
      }
   ]
The flow corresponds to the following hadoop command (an optional "MainClass" entry can be added to "HadoopJarStep" if the jar's manifest doesn't specify a main class; note that JSON doesn't allow comments, so it can't be commented out in place):
hadoop jar hadoop-examples-0.20.2-cdh3u4.jar grep input output 'dfs[a-z.]+'
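The same job can also be submitted without a JSON file, using the CLI's --jar and --arg options directly (sketched from memory; check ./elastic-mapreduce --help for the exact flags):

./elastic-mapreduce --create --name "Custom Jar Grep Example 1" \
  --jar s3n://YOUR_BUCKET/hadoop-examples-0.20.2-cdh3u4.jar \
  --arg grep \
  --arg s3n://YOUR_BUCKET/input/example \
  --arg s3n://YOUR_BUCKET/output/example \
  --arg 'dfs[a-z.]+'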

Some useful tips:

1. show the log:
 ./elastic-mapreduce --jobflow JOB_ID --logs

Friday, June 29, 2012

mvn test: LocalJobRunner.java with java.lang.OutOfMemoryError

When I ran mvn test to unit test some hadoop-related code, I got the following:
java.lang.OutOfMemoryError: Java heap space
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:949)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:428)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
I tried to set MAVEN_OPTS=-Xmx2048m, but it didn't work: surefire forks a separate JVM for the tests, and that JVM doesn't inherit MAVEN_OPTS.  After some online research, I fixed the problem by adding the following to the pom:

      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-surefire-plugin</artifactId>
        <configuration>
          <argLine>-Xms256m -Xmx512m</argLine>
        </configuration>
      </plugin>
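Since surefire exposes argLine as a user property, the same setting can also be passed on the command line without touching the pom:

mvn test -DargLine="-Xms256m -Xmx512m"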

Thursday, June 28, 2012

maven shade plugin: Invalid signature file digest for Manifest main attributes

If you get the following error message with maven shade plugin:

Exception in thread "main" java.lang.SecurityException: Invalid signature file digest for Manifest main attributes

You need to add the following to pom.xml, inside the maven-shade-plugin declaration:

        <configuration>
          <filters>
            <filter>
              <artifact>*:*</artifact>
              <excludes>
                <exclude>META-INF/*.SF</exclude>
                <exclude>META-INF/*.DSA</exclude>
                <exclude>META-INF/*.RSA</exclude>
              </excludes>
            </filter>
          </filters>
        </configuration>

Explanation:

When building the uber-jar, the above configuration filters out all files in META-INF ending with .SF, .DSA, and .RSA for all artifacts (*:*).

The reason java.lang.SecurityException is raised is that some dependency jar files are signed.  A jar file is signed using jarsigner, which creates 2 additional files and places them in META-INF:
  • a signature file, with a .SF extension, and
  • a signature block file, with a .DSA, .RSA, or .EC extension.
Once the uber-jar is created, the signatures no longer match the jar's contents, so java.lang.SecurityException is thrown when the uber-jar is executed.

See jarsigner for a detailed explanation of the JAR Signing and Verification Tool.
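A quick way to check whether stale signature files ended up in an uber-jar (the jar path is just an example):

jar tf target/my-app-uber.jar | grep -E 'META-INF/.*\.(SF|DSA|RSA)$'

If this prints anything for an unsigned uber-jar, the filter above isn't being applied.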

Sunday, June 24, 2012

Eclipse: fix plugin execution not covered by lifecycle configuration

I recently used maven-jaxb2-plugin to generate Java classes from XML schemas (XSD).  With the following configuration, I can run "mvn compile" on the command line to generate the classes:
            <plugin>
                <groupId>org.jvnet.jaxb2.maven2</groupId>
                <artifactId>maven-jaxb2-plugin</artifactId>
                <executions>
                    <execution>
                        <goals>
                            <goal>generate</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>

However, my Eclipse shows the following error:

Plugin execution not covered by lifecycle configuration: org.jvnet.jaxb2.maven2:maven-jaxb2-plugin:0.8.1:generate (execution: default, phase: generate-sources)


This error prevents Eclipse from generating the Java sources and adding them to the classpath.  To fix the problem, I need to install the m2e connector for maven-jaxb2-plugin.  It can be installed manually from the following update site:

http://bitstrings.github.com/m2e-connectors-p2/milestones/

However, Eclipse provides a better way to install it.  In the pom editor, hover over the flagged <execution> element and click the "Discover new m2e connectors" quick-fix link; Eclipse then searches for the correct connector automatically.  It found the same m2e connector, and clicking Finish installs it.  After the installation succeeded, Eclipse was happy with maven-jaxb2-plugin.

This approach can be applied to other plugins too.

Friday, June 8, 2012

Enable CORS support in REST services with Spring 3.1

We are working on a web project with 2 components, a UI and web services, deployed to 2 different servers.  The UI uses jQuery to make ajax calls to back-end restful web services built on the Spring 3.1 MVC framework.  Because the two components live on different origins, we run into the browsers' same-origin policy: modern browsers block plain cross-origin ajax calls.

The back-end rest web services support the HTTP methods GET, POST, PUT and DELETE.  The first problem we solved was how to call GET from the UI using jQuery.  Since I had used JSONP before, it was a natural solution to use JSONP again.  My coworker already blogged the solution here: http://www.iceycake.com/2012/06/xml-json-jsonp-web-service-endpoints-spring-3-1.

However, JSONP only supports GET, and you can't set HTTP headers on a JSONP request.  I also realized that JSONP is rather dated.  So how to support the other methods like PUT and POST?

CORS, Cross-Origin Resource Sharing, defines a mechanism to enable client-side cross-origin requests.  CORS is widely supported by modern browsers like Firefox, Chrome, Safari, and IE.

The UI code is fairly simple:
<html>
<head>
<script type="text/javascript" src="jquery.js"></script>
<script type="application/javascript">      
(function($) {
    var url = 'http://localhost:8080/employee/id';
    $.ajax({
        type: 'PUT',
        url: url,
        async: true,
        contentType: 'application/json',
        data: '{"id": 1, "name": "John Doe"}',
        success: function(response) {
            alert("success");
        },
        error: function(xhr) {
            alert('Error!  Status = ' + xhr.status + " Message = " + xhr.statusText);
        }
    });
})(jQuery);
</script>
</head>
<body>
    <!-- we will add our HTML content here -->
</body>
</html>
If you just run the above JavaScript against the backend, you may notice that the PUT request shows up as OPTIONS: the browser first sends a "pre-flight" OPTIONS request, and the server must answer it with the correct headers.  To return the correct response, a filter is created (thanks to https://gist.github.com/2232095):
package com.zhentao;
import java.io.IOException;
import javax.servlet.FilterChain;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import org.springframework.web.filter.OncePerRequestFilter;
public class CorsFilter extends OncePerRequestFilter {

    @Override
    protected void doFilterInternal(HttpServletRequest request, HttpServletResponse response, FilterChain filterChain)
            throws ServletException, IOException {
        if (request.getHeader("Access-Control-Request-Method") != null && "OPTIONS".equals(request.getMethod())) {
            // CORS "pre-flight" request
            response.addHeader("Access-Control-Allow-Origin", "*");
            response.addHeader("Access-Control-Allow-Methods", "GET, POST, PUT, DELETE");
            response.addHeader("Access-Control-Allow-Headers", "Content-Type");
            response.addHeader("Access-Control-Max-Age", "1800");//30 min
        }
        filterChain.doFilter(request, response);
    }
}
The following also needs to be added to web.xml:
  <filter>
    <filter-name>cors</filter-name>
    <filter-class>com.zhentao.CorsFilter</filter-class>
  </filter>
  <filter-mapping>
    <filter-name>cors</filter-name>
    <url-pattern>/*</url-pattern>
  </filter-mapping>
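With the filter mapped, the pre-flight handling can be verified from the command line; a sketch with curl (the URL and origin are just examples):

curl -i -X OPTIONS http://localhost:8080/employee/1 \
  -H "Origin: http://localhost:9090" \
  -H "Access-Control-Request-Method: PUT" \
  -H "Access-Control-Request-Headers: Content-Type"

The response should carry the Access-Control-Allow-* headers set by the filter.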

   
However, this isn't enough yet, and the original post didn't address it: since HTTP is stateless, the actual responses must carry the Access-Control-Allow-Origin header too, so the rest web services need to return it.  Here is what I did with the Spring rest api:
    @RequestMapping(value = "/employee/{id}", method = RequestMethod.PUT, consumes = {"application/json"})
    public ResponseEntity<Employee> create(@Valid @RequestBody Employee employee, @PathVariable String id) {
        employee.setId(id);//some logic here
        HttpHeaders headers = new HttpHeaders();
        headers.add("Access-Control-Allow-Origin", "*");
        ResponseEntity<Employee> entity = new ResponseEntity<Employee>(headers, HttpStatus.CREATED);
        return entity;
    }
Update: it seems my post caused some confusion, so I created another post with fully working examples for CORS. See here.

Wednesday, May 30, 2012

log4j filter some text and don't log it

Use org.apache.log4j.varia.StringMatchFilter to filter out text that shouldn't be logged.  Here is an example log4j.xml:
<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE log4j:configuration SYSTEM "log4j.dtd">
<log4j:configuration xmlns:log4j="http://jakarta.apache.org/log4j/">
  <appender name="console" class="org.apache.log4j.ConsoleAppender">
    <layout class="org.apache.log4j.PatternLayout">
      <param name="ConversionPattern" value="%d{yyyy-MM-dd HH:mm:ss} [%t] %-5p - %m%n" />
    </layout>

    <filter class="org.apache.log4j.varia.StringMatchFilter">
      <param name="StringToMatch" value="the-text-not-to-log" />
      <param name="AcceptOnMatch" value="false" />
    </filter>

    <!-- <filter class="org.apache.log4j.varia.DenyAllFilter" /> -->
  </appender>

  <root>
    <priority value="info" />
    <appender-ref ref="console" />
  </root>
</log4j:configuration>
Then run the following to test:

        LOG.info("This is an info and shouldn't be logged with the-text-not-to-log and blah blah");
        LOG.info("This is logged");
        LOG.debug("This is a debug and not logged");
        LOG.error("This is an error and logged");

If you only want to log messages containing some text, change the value of AcceptOnMatch to true and add DenyAllFilter after it.

Wednesday, May 23, 2012

mac: no acceptable C compiler found in $PATH

If you get the following error on mac:
configure: error: no acceptable C compiler found in $PATH
Here are the steps to fix it:

1. download Xcode from https://developer.apple.com/downloads/index.action
2. install Xcode
3. Open Xcode, go to Preferences --> Downloads --> install "Command Line Tools"
4. relaunch terminal
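Afterwards, the compiler should be on the PATH; verify with:

which gcc && gcc --version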

Saturday, May 19, 2012

Enable Spring Jdbc Transaction with Annotation

Spring 3.1 introduced a new annotation, @EnableTransactionManagement.  With it, the XML transaction configuration can be dropped entirely.  Here is an example of how to use it:

@Configuration
@EnableTransactionManagement
public class JdbcConfig {

    @Bean
    public DataSource dataSource() {
        // configure the DataSource here; for example (hypothetical in-memory
        // H2 database), using Spring's SimpleDriverDataSource:
        return new SimpleDriverDataSource(new org.h2.Driver(), "jdbc:h2:mem:test", "sa", "");
    }

    @Bean
    public PlatformTransactionManager txManager() {
        return new DataSourceTransactionManager(dataSource());
    }

    @Bean
    public MyDao myDao() {
        MyDaoJdbc dao = new MyDaoJdbc();
        dao.setDataSource(dataSource());
        return dao;
    }
}
Next you need to annotate your dao with @Transactional.  The source code can be found on github:

https://github.com/zhentao/spring-jdbc-transaction-example

cobertura-maven-plugin

Recently I created a parent pom for my team; all of the team's projects inherit from it.  The parent pom is rather simple:

  <properties>
    <default.encoding>UTF-8</default.encoding>
    <default.jdk>1.6</default.jdk>
    <line.coverage.target>90</line.coverage.target>
    <branch.coverage.target>90</branch.coverage.target>
  </properties>

  <build>
    <defaultGoal>install</defaultGoal>
    <pluginManagement>
      <plugins>
        <plugin>
          <groupId>org.apache.maven.plugins</groupId>
          <artifactId>maven-compiler-plugin</artifactId>
          <version>2.3.2</version>
          <configuration>
            <source>${default.jdk}</source>
            <target>${default.jdk}</target>
            <encoding>${default.encoding}</encoding>
          </configuration>
        </plugin>
        <plugin>
          <groupId>org.apache.maven.plugins</groupId>
          <artifactId>maven-source-plugin</artifactId>
          <version>2.1.2</version>
        </plugin>
        <plugin>
          <artifactId>maven-resources-plugin</artifactId>
          <version>2.5</version>
          <configuration>
            <encoding>${default.encoding}</encoding>
          </configuration>
        </plugin>
        <plugin>
          <groupId>org.apache.maven.plugins</groupId>
          <artifactId>maven-javadoc-plugin</artifactId>
          <version>2.8</version>
          <configuration>
            <encoding>${default.encoding}</encoding>
          </configuration>
        </plugin>
        <plugin>
          <groupId>org.apache.maven.plugins</groupId>
          <artifactId>maven-release-plugin</artifactId>
          <version>2.2.2</version>
        </plugin>

        <!-- Sonar uses these versions, below -->
        <plugin>
          <artifactId>maven-pmd-plugin</artifactId>
          <version>2.7.1</version>
          <configuration>
            <targetJdk>${default.jdk}</targetJdk>
            <encoding>${default.encoding}</encoding>
          </configuration>
        </plugin>
        <plugin>
          <artifactId>maven-checkstyle-plugin</artifactId>
          <version>2.9.1</version>
          <configuration>
            <encoding>${default.encoding}</encoding>
          </configuration>
        </plugin>
        <plugin>
          <groupId>org.codehaus.mojo</groupId>
          <artifactId>cobertura-maven-plugin</artifactId>
          <version>2.5.1</version>
        </plugin>
        <plugin>
          <groupId>org.apache.maven.plugins</groupId>
          <artifactId>maven-surefire-plugin</artifactId>
          <version>2.12</version>
        </plugin>
      </plugins>
    </pluginManagement>

    <plugins>
      <plugin>
        <groupId>org.codehaus.mojo</groupId>
        <artifactId>cobertura-maven-plugin</artifactId>
        <configuration>
          <check>
            <!-- Per-class thresholds -->
            <branchRate>${branch.coverage.target}</branchRate>
            <lineRate>${line.coverage.target}</lineRate>
            <!-- Project-wide thresholds -->
            <totalLineRate>${line.coverage.target}</totalLineRate>
            <totalBranchRate>${branch.coverage.target}</totalBranchRate>
          </check>
          <formats>
            <format>html</format>
          </formats>
        </configuration>
        <executions>
          <execution>
            <phase>package</phase>
            <goals>
              <goal>clean</goal>
              <goal>cobertura</goal>
              <goal>check</goal>
            </goals>
          </execution>
        </executions>
      </plugin>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-surefire-plugin</artifactId>
      </plugin>
      <plugin>
        <!-- Ensure that source code is packaged and deployed for inclusion into IDEs -->
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-source-plugin</artifactId>
        <executions>
          <execution>
            <phase>package</phase>
            <goals>
              <goal>jar</goal>
            </goals>
          </execution>
        </executions>
      </plugin>

    </plugins>
  </build>
We are targeting pretty high unit test coverage: 90% for both line and branch coverage.  However, some classes don't need unit tests, or don't need high coverage.  Each individual project can then add the following to its own pom:
    <build>
    <plugins>
      <plugin>
        <groupId>org.codehaus.mojo</groupId>
        <artifactId>cobertura-maven-plugin</artifactId>
        <configuration>
          <instrumentation>
            <ignores>
              <ignore>org.slf4j.*</ignore>
            </ignores>
            <excludes>
              <exclude>com/zhentao/Some.class</exclude>
              <exclude>com/zhentao/other.class</exclude>
            </excludes>
          </instrumentation>
        </configuration>
      </plugin>
    </plugins>
  </build>
<ignore>org.slf4j.*</ignore> ignores all logging statements in the code, since we don't need to test logs.
<exclude>com/zhentao/Some.class</exclude> excludes Some.class from the cobertura report.
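To generate the HTML coverage report locally without running the whole build, the plugin's goal can be invoked directly; the report ends up under target/site/cobertura:

mvn clean cobertura:cobertura
open target/site/cobertura/index.html   # "open" works on a Mac; use a browser elsewhere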

However, this exclusion flexibility can be abused.  I got a build failure notification today:
11:52:52  Archiving artifacts
11:52:52  [htmlpublisher] Archiving HTML reports...
11:52:52  [htmlpublisher] Archiving at PROJECT level /target/site/cobertura to /htmlreports/Cobertura_Report
11:52:52  ERROR: Specified HTML directory '/target/site/cobertura' does not exist.
11:52:52  Build step 'Publish HTML reports' changed build result to FAILURE
One developer excluded all classes from testing, so no coverage report was generated at all.  What a developer!

Friday, May 4, 2012

Yahoo hadoop tutorials

This is the link from yahoo for hadoop tutorials:

http://developer.yahoo.com/hadoop/tutorial/index.html

Monday, January 16, 2012

"someCommand > /dev/null 2>&1" means

someCommand > /dev/null 2>&1

When I first saw it, I didn't understand what it meant, since I'm not good at shell scripting.  Today I searched online and found the answer at the following link:

http://www.xaprb.com/blog/2006/06/06/what-does-devnull-21-mean/

Below is my summary of the above blog:

The greater-than sign (>) redirects output; the number 1 stands for STDOUT, 2 for STDERR, and 0 for STDIN.  By default, if no number is provided, the redirect applies to STDOUT (1).  So the command above redirects STDOUT to /dev/null (the "bit bucket"), and then redirects STDERR to where STDOUT currently points, which is also /dev/null.  Both STDOUT and STDERR end up discarded.  Note that the order matters: 2>&1 must come after > /dev/null, because it duplicates whatever STDOUT points to at that moment.
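A few variations make the pieces concrete (ls on a nonexistent path is just a convenient way to produce STDERR output):

ls /nonexistent > /dev/null           # error still shows: only STDOUT is discarded
ls /nonexistent 2> /dev/null          # error discarded, STDOUT kept
ls /nonexistent > /dev/null 2>&1      # both discarded
ls /nonexistent 2>&1 > /dev/null      # order matters: STDERR still reaches the terminal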