Monday, August 25, 2014

Splitting a git repository

Sometimes when you are starting with version control for your code, you don't know how big your repository is going to become. Deciding whether to create a new repository for every module or to keep them all in one repository can be difficult.
When the modules are small, you would like to keep them all in one place; you can decide to move them into their own repositories later. It took me some googling around to figure out how to split a git repo. This is fairly easy and this great blog post summarizes it very well. In this post, I am putting down the lessons I learnt while splitting a git repo.

Please ensure that all your commits to the module are merged into the master branch. Only the version history of the master branch is preserved after the split.

Say I have a git project called webservice.git, with the subdirectories service1/, service2/ and service3/. Of these, service1/ has grown large enough to be moved into its own repository.

Use the following steps to move the folder service1/ into its own git repo, along with the logs of the master branch.

Step 1: Clone existing repo as desired repo:
$git clone --no-hardlinks git@gitserver:webservice.git service1

Step 2: Filter the branch and reset to exclude other files, so they can be pruned:
cd service1
$git filter-branch --subdirectory-filter service1 HEAD -- --all
$git reset --hard
$git gc --aggressive
$git prune

Step 3: Create new empty repo on git server

Step 4: On the local machine, replace remote origin to point to new repo:
cd service1
$git remote rm origin
$git remote add origin git@gitserver:service1.git
$git push origin master

Thursday, August 21, 2014

An Introduction to Zookeeper - Part I of the Zookeeper series

A distributed system consists of multiple computers that communicate through a computer network and interact with each other to achieve a common goal. The major benefits that distributed systems offer over centralized systems are scalability and redundancy. Systems can be easily expanded by adding more machines as needed, and even if one of the machines is unavailable, the service continues to be available. However, such a system comes with its own challenges.

Zookeeper is an open source, high-performance coordination service for distributed applications. It tackles the common challenges that distributed applications face. It exposes common services that are required in a distributed environment, such as naming, configuration management, group services and distributed synchronization.

Zookeeper can be run in either single-server mode or cluster (replicated) mode. Running zookeeper in single-server mode does not take advantage of zookeeper's inherent features of high availability and resilience. Typically, in a production environment, Zookeeper is run in multi-server mode. A zookeeper cluster is called an ensemble. A leader is elected on service startup. If the leader goes down, a new leader is elected. A client connects to a single zookeeper server at a time and maintains a TCP connection to it. A client can read from any zookeeper server; writes, however, go through the leader and need agreement from a majority of the servers.
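
The majority requirement for writes can be made concrete with a little arithmetic. The sketch below is only an illustration of that rule; the class and method names are made up for this post and are not part of the Zookeeper API.

```java
// Toy arithmetic only; QuorumMath is a hypothetical helper, not a Zookeeper class.
class QuorumMath {
    // Smallest number of servers that forms a strict majority of the ensemble.
    static int majority(int ensembleSize) {
        return ensembleSize / 2 + 1;
    }

    // Number of servers that can be down while a majority is still reachable.
    static int tolerableFailures(int ensembleSize) {
        return ensembleSize - majority(ensembleSize);
    }

    public static void main(String[] args) {
        for (int n : new int[] {3, 4, 5}) {
            System.out.println(n + " servers: majority = " + majority(n)
                    + ", tolerable failures = " + tolerableFailures(n));
        }
    }
}
```

Note that going from 3 to 4 servers does not let the ensemble survive any more failures, which is why ensembles are usually sized with an odd number of servers.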

Zookeeper provides a sequential consistency guarantee, i.e., updates from a client are applied in the order in which they are sent. It guarantees atomic updates, i.e., an update either succeeds or fails; there are no partial updates. It guarantees that a zookeeper client sees the same view of the service irrespective of the server in the ensemble that it connects to. A server will not accept a connection from a client until it has caught up with the state of the server to which the client was connected previously. Zookeeper ensures reliability, i.e., if an update succeeds in Zookeeper, then it is not rolled back. Zookeeper guarantees timeliness, i.e., a client's view of the system is guaranteed to be up to date within a certain time bound.

Zookeeper is an eventually consistent system, i.e., it does not guarantee that different clients will have an identical view of zookeeper data at every instant in time. However, it guarantees that if a follower falls too far behind the leader, then it goes offline.

The distributed processes using Zookeeper coordinate with each other through shared hierarchical namespaces. These namespaces are organized like a UNIX file system. More on Zookeeper in the next post.
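
To give a feel for the file-system-like namespace, here is a small sketch that lists the ancestors of a node path, much like the parent directories of a file. The path /app/config/db and the helper class are made up for illustration; nothing here talks to a real Zookeeper server.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of the Zookeeper API.
class ZnodePaths {
    // Returns every ancestor of an absolute path, from the top level down,
    // excluding the path itself.
    static List<String> ancestors(String path) {
        List<String> result = new ArrayList<>();
        int slash = path.indexOf('/', 1);
        while (slash != -1) {
            result.add(path.substring(0, slash));
            slash = path.indexOf('/', slash + 1);
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints [/app, /app/config]
        System.out.println(ancestors("/app/config/db"));
    }
}
```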

Tuesday, August 12, 2014

Multiple Java Installations on OSX and switching between them

Sometimes it may be necessary to have two versions of Java on your OSX machine, and to switch between the two in a fast, reliable and convenient manner. This post aims to show how this can be achieved.

I am currently using version 10.9.4 of OSX, and I was required to have both JAVA6 and JAVA7 on my system, as some of the codebases I was working on were dependent on JAVA6 and some were dependent on JAVA7.

Download the required JDK versions from the Oracle downloads page.

After the installations are done, run the following command to find out which Java version is currently being used.
$java -version
java version "1.7.0_67"
Java(TM) SE Runtime Environment (build 1.7.0_67-b01)
Java HotSpot(TM) 64-Bit Server VM (build 24.65-b04, mixed mode)

Run the following command. Replace 1.6 with the version you want to switch to.
$/usr/libexec/java_home -v 1.6
/System/Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents/Home

Export the output of the above command as JAVA_HOME as follows:
$export JAVA_HOME="/System/Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents/Home"

Now run the version command again to confirm the switch:
$java -version
java version "1.6.0_65"
Java(TM) SE Runtime Environment (build 1.7.0_67-b01)
Java HotSpot(TM) 64-Bit Server VM (build 24.65-b04, mixed mode)

In this way, switching between the two versions can be done easily.
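
If you want to double-check from inside a program which installation is actually running your code, the standard system properties java.version and java.home can be printed. This is a minimal sketch; the class name WhichJava is just a name chosen for this example.

```java
// Minimal check of the running JVM; WhichJava is a name chosen for this example.
class WhichJava {
    // Builds a one-line description from two standard system properties.
    static String describe() {
        return "java.version=" + System.getProperty("java.version")
                + ", java.home=" + System.getProperty("java.home");
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Compile and run it once before and once after exporting JAVA_HOME; the printed java.home should follow the installation you switched to.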

Monday, August 11, 2014

Java Inheritance - Super keyword

Inheritance is an important principle in object oriented programming. It is a mechanism by which one class acquires the properties of another. The class which gives the properties is called the super/base/parent class, and the one that receives them is called the sub/derived/child class. The child class inherits the fields and methods of the parent class and extends it with its own functionality by adding fields and methods.
One of the major advantages of inheritance is reusability. The derived class gets the properties of the base class, so the code within the base class is reused in the derived class, leading to a faster development cycle.

This post looks at the effect inheritance has on constructors and methods, and particularly the role of the super keyword in inheritance.

In case of inheritance, when an object of the derived class is created, the base class constructor runs first and then the derived class constructor. In case of multilevel inheritance, the constructors run in the order of inheritance, starting from the topmost class.

If the base class and the derived class both have a method with the same signature, the derived class method overrides the base class method. When this common method is called on an object of the derived class, the method belonging to the derived class is the one that runs.
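
Before the step-by-step walkthrough, here is a compact sketch of both rules with three levels of inheritance. The class names Base, Middle and Leaf and the LOG list are made up for this illustration; the list simply records which constructor ran when.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only; all names here are invented for this sketch.
class ConstructorOrderDemo {
    // Records the order in which constructors run.
    static final List<String> LOG = new ArrayList<>();

    static class Base {
        Base() { LOG.add("Base"); }
        String name() { return "Base"; }
    }

    static class Middle extends Base {
        Middle() { LOG.add("Middle"); }
    }

    static class Leaf extends Middle {
        Leaf() { LOG.add("Leaf"); }

        @Override
        String name() { return "Leaf"; }
    }

    public static void main(String[] args) {
        Base obj = new Leaf();
        System.out.println(LOG);        // [Base, Middle, Leaf]
        System.out.println(obj.name()); // Leaf, the overriding method is called
    }
}
```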

Let's define a simple class A as shown in the snippet below. This is going to be our parent class.
class A{
    //Constructor of A
    A(){
        System.out.println("A's constructor");
    }
    public void method(){
        System.out.println("A's get method");
    }
}

We define a child class that extends the above class as follows.
class B extends A{
    //Constructor of B
    B(){
        System.out.println("B's constructor");
    }
}

Here is a sample test main method for testing the inheritance.
public static void main(String[] args){
    System.out.println("JAVA - INHERITANCE");
    B objectB = new B();
}

The output of the above code is as shown below.
JAVA - INHERITANCE
A's constructor
B's constructor

As we see, the base class constructor is called implicitly. We can also call the base class constructor explicitly by using super(). However, the super call should be the first statement in the derived class constructor. The following code snippet will give a compiler error.
class B extends A{
    //Constructor of B
    B(){
        System.out.println("B's constructor");  //This is not allowed
        super();
    }
}

The super() call has to be the first line in the constructor, as it is important to initialize the base class before the derived class is initialized.
The correct way of writing the above code is as follows:
class B extends A{
    //Constructor of B
    B(){
        super();
    }
}

However, a base class method can be called from any line of the method in the derived class using the keyword super. This is shown in the snippet below.

class B extends A{
    //Constructor of B
    B(){
        super();
    }
 
    @Override
    public void method(){
        System.out.println("B's get method");
        super.method();
    }
}

Here is the sample main function where we call the derived class method.
public static void main(String[] args){
    System.out.println("JAVA - INHERITANCE");
    B objectB = new B();
    objectB.method();
}

The output of the above code is as shown below.
JAVA - INHERITANCE
A's constructor
B's constructor
B's get method
A's get method


Thus we have seen in this post that the parent class constructor is executed before the child class constructor. If the base class only has constructors that take one or more arguments (that is, it has no no-argument constructor), then it is mandatory for the derived class to have a constructor that passes the arguments to the base class constructor using super(...).
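
A minimal sketch of that last point follows. The classes Vehicle and Car are made up for this example and are separate from the classes A and B used above.

```java
// Illustration only; Vehicle and Car are invented for this sketch.
class Vehicle {
    private final int wheels;

    // The only constructor takes an argument, so Vehicle has no no-argument constructor.
    Vehicle(int wheels) {
        this.wheels = wheels;
    }

    int wheels() {
        return wheels;
    }
}

class Car extends Vehicle {
    Car() {
        // Required: there is no Vehicle() for the compiler to call implicitly.
        super(4);
    }
}

class SuperArgsDemo {
    public static void main(String[] args) {
        System.out.println(new Car().wheels()); // 4
    }
}
```

Removing the super(4) call, or removing the Car constructor altogether, makes the code fail to compile, because the implicit super() call has no matching constructor in Vehicle.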

Sunday, August 3, 2014

Configuration and Coordination with Zookeeper

It took me a while to understand the concept of Zookeeper, and it took me some more time to understand how to use it for the task that I had begun with. This post is intended to help others cross the bridge faster.
Dynamic configuration management for today's systems comes with all the nitty-gritty involved in a distributed environment. A distributed environment is relatively unreliable, with problems like network failures and clock synchronization, and it becomes the responsibility of each server to keep track of the correctness of the current configuration. These problems are the motivation behind configuration and coordination systems.

This particular post looks at Zookeeper as a configuration management tool. A short and good course on Zookeeper

Setup: A zookeeper cluster with 5 servers. A java wrapper that uses zookeeper to keep track of configuration changes. For our purposes we used the Curator framework.

Read zookeeper multi-server setup on a single host for setting up zookeeper cluster.

In your Java project, include the dependency for the Curator framework.

Initializing connection to Zookeeper ensemble
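import org.apache.curator.RetryPolicy;
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.retry.ExponentialBackoffRetry;
import org.apache.curator.utils.DefaultZookeeperFactory;

public static CuratorFramework createConnection(String zookeeperConnectionString) {
    // Retry up to 3 times with exponential backoff, starting from a base sleep time of 1 second.
    RetryPolicy retryPolicy = new ExponentialBackoffRetry(1000, 3);
    CuratorFramework client = CuratorFrameworkFactory.builder()
            .connectString(zookeeperConnectionString)
            .retryPolicy(retryPolicy)
            .namespace("session_service")   // all paths are relative to /session_service
            .canBeReadOnly(true)
            .zookeeperFactory(new DefaultZookeeperFactory())
            .build();
    client.start();
    return client;
}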

Creating a new znode
public static void create(CuratorFramework client, String path) throws Exception {
    client.create().forPath(path);
}

Setting data of a znode
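public static void setData(CuratorFramework client, String path, String data) throws Exception {
    // Use an explicit charset so the bytes do not depend on the platform default
    byte[] payload = data.getBytes(java.nio.charset.StandardCharsets.UTF_8);
    client.setData().forPath(path, payload);
}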

Deleting a znode
public static void delete(CuratorFramework client, String path) throws Exception {
    client.delete().forPath(path);
}

Get children and set the given watcher on the node
public static List<String> watchedGetChildren(CuratorFramework client, String path) throws Exception {
    return client.getChildren().usingWatcher(new WatcherImpl(client, path)).forPath(path);
}

Get Data of the node and set a watcher on the node
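public static String getData(CuratorFramework client, String path) throws Exception {
    byte[] raw = client.getData().usingWatcher(new WatcherImpl(client, path)).forPath(path);
    // Decode with the same explicit charset that setData uses
    return new String(raw, java.nio.charset.StandardCharsets.UTF_8);
}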

Zookeeper works on the idea of clients setting watches on znodes. Whenever the znode changes, the watch is triggered, that is, the client is notified.

We need a class that implements the Watcher interface. This class must implement the process method, which is called if and when the corresponding changes occur.
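import org.apache.curator.framework.CuratorFramework;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;

public class WatcherImpl implements Watcher {
    private final CuratorFramework client;
    private final String path;

    // Matches the new WatcherImpl(client, path) calls in the helper methods above
    public WatcherImpl(CuratorFramework client, String path) {
        this.client = client;
        this.path = path;
    }

    @Override
    public void process(WatchedEvent event) {
        if (event.getType() == Event.EventType.NodeDataChanged) {
            System.out.println("The Data has changed");
        } else if (event.getType() == Event.EventType.NodeChildrenChanged) {
            System.out.println("Children have changed");
        }
    }
}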

It's important to note that watches are one-time events: if you need to keep monitoring changes, the watch has to be set again after it fires. All the read operations like getData(), getChildren() and exists() have the option of setting a watch. These watches are triggered on the corresponding changes.
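
The one-time behaviour is easy to get wrong, so here is a toy model of it in plain Java. This is deliberately not the Zookeeper or Curator API: ToyZnode is a made-up in-memory class that only mimics the rule that a watch fires once and is then discarded, so the example runs without a server.

```java
import java.util.ArrayList;
import java.util.List;

// Toy in-memory model of one-time watches; NOT the Zookeeper or Curator API.
class ToyZnode {
    private String data = "";
    private List<Runnable> watchers = new ArrayList<>();

    // Reads the data and registers a watcher for the next change only.
    String getData(Runnable watcher) {
        watchers.add(watcher);
        return data;
    }

    // Updates the data, then fires and discards every registered watcher.
    void setData(String newData) {
        data = newData;
        List<Runnable> toFire = watchers;
        watchers = new ArrayList<>();
        for (Runnable w : toFire) {
            w.run();
        }
    }
}

class OneShotWatchDemo {
    static int oneShotCount = 0;
    static int rearmedCount = 0;

    // Reads the node and sets the watch again every time it fires.
    static void watchForever(ToyZnode node) {
        node.getData(() -> {
            rearmedCount++;
            watchForever(node);
        });
    }

    public static void main(String[] args) {
        ToyZnode node = new ToyZnode();
        node.getData(() -> oneShotCount++); // set once, never set again
        watchForever(node);                 // sets itself again after each event

        node.setData("v1");
        node.setData("v2");
        node.setData("v3");

        System.out.println("one-shot watcher fired " + oneShotCount + " time(s)");  // 1
        System.out.println("re-armed watcher fired " + rearmedCount + " time(s)");  // 3
    }
}
```

With a real client the same pattern applies: inside process(), perform the read again with the watcher attached so that the next change is also reported.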

That's all for getting the configuration management with zookeeper up and running.