Tuesday, December 23, 2014

Configuring Hive on Ubuntu


Hive facilitates querying and managing large datasets residing in distributed storage. It is built on top of Hadoop. Hive defines a simple query language called Hive Query Language (HQL), which enables users familiar with SQL to query the data. Hive converts HQL queries into a series of MapReduce jobs for execution on a Hadoop cluster. In this post we will configure Hive on our machine.

Download Hive from the Apache Hive site. Unpack the .tar to the location of your choice and assign ownership to the user setting up Hive. At the time of this writing, the latest version available is 0.14.0.

Prerequisites:
Java: 1.6 or higher; 1.7 is preferred.
Hadoop: 2.x. For Hadoop installation you can refer to this post.

Installation

Set the environment variable HIVE_HOME to point to the installation directory. You can set this in your .bashrc:
export HIVE_HOME=/user/hive

Finally, add $HIVE_HOME/bin to your PATH.
$export PATH=$HIVE_HOME/bin:$PATH

Setting HADOOP_HOME in Hive's configuration script
Append the following line to $HIVE_HOME/bin/hive-config.sh so that Hive can find your Hadoop installation.
export HADOOP_HOME=/user/hadoop


Running Hive
You must create /tmp and /user/hive/warehouse on HDFS and set the appropriate permissions before you can create any table in Hive.
$ hadoop fs -mkdir -p /user/hive/warehouse
$ hadoop fs -chmod g+w /user/hive/warehouse
$ hadoop fs -mkdir /tmp
$ hadoop fs -chmod g+w /tmp

Start the Hive shell:
$ hive

The shell will look something like this:
Logging initialized using configuration in jar:file:/user/hive/lib/hive-common-0.14.0.jar!/hive-log4j.properties
hive>
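To verify the setup end to end, you can try a few basic statements from the shell; the table and column names below are just placeholders.
hive> CREATE TABLE test_table (id INT, name STRING);
hive> SHOW TABLES;
hive> DROP TABLE test_table;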

Reference : https://cwiki.apache.org/confluence/display/Hive/Home

Tuesday, December 16, 2014

Configuring Hadoop on Ubuntu in pseudo-distributed mode


Hadoop is an open-source Apache project that enables processing of extremely large datasets in a distributed computing environment. There are three different modes in which it can be run:

1. Standalone Mode
2. Pseudo-Distributed Mode
3. Fully-Distributed Mode

This post covers setting up Hadoop 2.5.2 in pseudo-distributed mode on an Ubuntu machine. For setting up Hadoop on OS X, refer to this post.

Prerequisites


Java: Install Java if it isn’t installed on your system.
Keyless SSH: First, ensure ssh is installed. Then generate the key pairs.
$sudo apt-get install ssh
$ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
Now ssh into localhost once to confirm that keyless login works, accepting the host authenticity prompt if asked.
rsync utility:
$sudo apt-get install rsync

Installation


Download Hadoop from the Apache Hadoop site. Unpack the .tar to the location of your choice and assign ownership to the user setting up Hadoop. At the time of this writing, the latest version available is 2.5.2.

Configuration


Every component of Hadoop is configured using an XML file located in hadoop-2.5.2/etc/hadoop. MapReduce properties go in mapred-site.xml, HDFS properties in hdfs-site.xml, and common properties in core-site.xml. General Hadoop environment settings are found in hadoop-env.sh.

hadoop-env.sh
# set to the root of your Java installation
export JAVA_HOME=/usr

# Assuming your installation directory is /user/hadoop
export HADOOP_PREFIX=/user/hadoop
For the rest of this post, we refer to /user/hadoop when we say $HADOOP_HOME.

core-site.xml
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>

hdfs-site.xml

The Hadoop Distributed File System properties go in this config file. Since we are only setting up one node, we set the value of dfs.replication to 1.
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>


Execution


Before starting the daemons we must format the newly installed HDFS.
$ cd $HADOOP_HOME
$ bin/hdfs namenode -format

Start the Daemons:
$ cd $HADOOP_HOME
$ sbin/start-dfs.sh

Monitoring
By default, the web interface for NameNode is available at http://localhost:50070

Check the output of jps
$jps
10582 SecondaryNameNode
10260 NameNode
10685 Jps
10404 DataNode

Running Examples
1. Create the HDFS directories required to execute MapReduce jobs:
$ cd $HADOOP_HOME
$ bin/hdfs dfs -mkdir /user
$ bin/hdfs dfs -mkdir /user/<username>

2. Copy the input files to the Hadoop Distributed File System
$ bin/hdfs dfs -put etc/hadoop input

3. Run the example provided
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.2.jar grep input output 'dfs[a-z.]+'

4. View the output files on HDFS
$ bin/hdfs dfs -cat output/*
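Alternatively, copy the output from HDFS to the local filesystem and examine it there:
$ bin/hdfs dfs -get output output
$ cat output/*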

Stop the Daemons:
$ cd $HADOOP_HOME
$ sbin/stop-dfs.sh

Reference : http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html#Pseudo-Distributed_Operation

Friday, December 12, 2014

Git Basics - A cheat sheet for your daily git needs.

This post is for anyone to refer to for their daily git needs. We will not be covering any advanced git concepts here.


Git is a distributed version control system.
Some basic terminologies:
Directory: A folder that contains multiple files.
Repository: A directory where Git has been initialized to start version controlling your files.

I have created an empty directory called gitBasics on my machine.
$ ls -a
.  ..

Let us initialize an empty git repository.
$ git init
Initialized empty Git repository in /Users/anjana/gitBasics/.git/
$ ls -a
.    ..   .git
As seen above, a hidden .git directory is created inside gitBasics, indicating that a repository has been initialized.

Next, let's check the current status of the directory as compared to the repository.
$ git status
On branch master

Initial commit

nothing to commit (create/copy files and use "git add" to track)

Now let's create a file filename.txt in the directory.
$ ls -a
.            ..           .git         filename.txt

Let's check the status again.
$ git status
On branch master

Initial commit

Untracked files:
  (use "git add ..." to include in what will be committed)

 filename.txt

nothing added to commit but untracked files present (use "git add" to track)
Git shows that an untracked file is present.

Let's add this file to the staging area.
$ git add filename.txt
$ git status
On branch master

Initial commit

Changes to be committed:
  (use "git rm --cached ..." to unstage)

 new file:   filename.txt

Next, let's commit these changes.
$ git commit -m"Adding test file"
[master (root-commit) 4b8b52d] Adding test file
 Committer: Shankar 
Your name and email address were configured automatically based
on your username and hostname. Please check that they are accurate.
You can suppress this message by setting them explicitly:

    git config --global user.name "Your Name"
    git config --global user.email you@example.com

After doing this, you may fix the identity used for this commit with:

    git commit --amend --reset-author

 1 file changed, 1 insertion(+)
 create mode 100644 filename.txt

At the time of a commit, git tries to identify the author of the commit. To set your identity explicitly, use the following commands.
$ git config --global user.name "Anjana Shankar"
$ git config --global user.email "***@g***.com"

Now when you run git status, it says that the working directory is clean and there is nothing to commit.
$ git status
On branch master
nothing to commit, working directory clean

Next we have the git log command. This command prints the history of the repository.
$ git log
commit 4b8b52d4071a04c7f98436aae959ab9b10fec2ec
Author: Shankar 
Date:   Thu Dec 11 22:24:28 2014 +0530

    Adding test file
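For a quicker overview of the history, git log --oneline prints one line per commit:
$ git log --oneline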

Now let's add the remote origin to our local repo.
$ git remote add origin git@github.com:*****/gitBasics.git

After the remote is added, we should push our code to the remote repo. The -u flag sets the upstream tracking branch, so later pushes and pulls can omit the remote and branch names:
$git push -u origin master

To pull from the remote branch, use the following command:
$git pull origin master

To see the differences between the working directory and the last committed version of the code, use the following:
$ git diff HEAD
diff --git a/filename.txt b/filename.txt
index c9e358c..411cdda 100644
--- a/filename.txt
+++ b/filename.txt
@@ -1 +1 @@
-First File
+First File Modified

or you can simply use
$ git diff
diff --git a/filename.txt b/filename.txt
index c9e358c..411cdda 100644
--- a/filename.txt
+++ b/filename.txt
@@ -1 +1 @@
-First File
+First File Modified

A line prefixed with '-' shows a deleted line and a line prefixed with '+' shows an added line.

When we use the git add command, we stage the changes. Let's stage some changes first, and then see how to unstage them and revert to the last committed snapshot. I have created another file, 'filename2.txt', committed it, and then made some changes to it.
$ git status
On branch master
Your branch is up-to-date with 'origin/master'.
Changes not staged for commit:
  (use "git add ..." to update what will be committed)
  (use "git checkout -- ..." to discard changes in working directory)

 modified:   filename2.txt

no changes added to commit (use "git add" and/or "git commit -a")
$ git add filename2.txt 
$ git status
On branch master
Your branch is up-to-date with 'origin/master'.
Changes to be committed:
  (use "git reset HEAD ..." to unstage)

 modified:   filename2.txt

To see the staged differences, use the following:
$ git diff --staged
diff --git a/filename2.txt b/filename2.txt
index f686acc..5701cbe 100644
--- a/filename2.txt
+++ b/filename2.txt
@@ -1 +1 @@
-Second File
+Second File Modified

You can unstage the files as follows:
$ git reset filename2.txt
Unstaged changes after reset:
M filename2.txt
$ git status
On branch master
Your branch is up-to-date with 'origin/master'.
Changes not staged for commit:
  (use "git add ..." to update what will be committed)
  (use "git checkout -- ..." to discard changes in working directory)

 modified:   filename2.txt

no changes added to commit (use "git add" and/or "git commit -a")

After unstaging, the changes in the working directory can be discarded as follows:
$ git checkout -- filename2.txt
$ git status
On branch master
Your branch is up-to-date with 'origin/master'.
nothing to commit, working directory clean

Let's talk about branches now:
To create a new branch, use the following:
$ git branch newBranch

Use the following to switch branches
$ git checkout newBranch
Switched to branch 'newBranch'
$ git status
On branch newBranch
nothing to commit, working directory clean
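As a shortcut, you can create a branch and switch to it in a single step:
$ git checkout -b newBranch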

I have modified 'filename2.txt' and committed and pushed the change on this branch.
$ git log
commit 889ab1f0f42e7efd5818f68b30a42ced587db320
Author: Anjana Shankar <*****@gmail.com>
Date:   Fri Dec 12 10:14:05 2014 +0530

    Modified file on the branch

Now let's merge this branch into master. First we have to switch back to master; once on master, we can merge the branch.
$ git checkout master
Switched to branch 'master'
Your branch is up-to-date with 'origin/master'.
$ git merge newBranch
Updating 7c4f3ad..889ab1f
Fast-forward
 filename2.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
$ git status
On branch master
Your branch is ahead of 'origin/master' by 1 commit.
  (use "git push" to publish your local commits)
nothing to commit, working directory clean
$git push
Total 0 (delta 0), reused 0 (delta 0)
To git@github.com:****/gitBasics.git
   7c4f3ad..889ab1f  master -> master

Finally, as we are done with the branch, let's delete it both locally and on the remote.
$ git branch -d newBranch
Deleted branch newBranch (was 889ab1f).
$ git push origin --delete newBranch
To git@github.*****/gitBasics.git
 - [deleted]         newBranch

To see the remote branches available, use the following:
$ git branch -r
  origin/master

That's it for this post. I will try to cover a few advanced git concepts in upcoming posts.
Reference : Pro Git book

Friday, November 7, 2014

Configuring Hadoop on Mac OSx in pseudo-distributed cluster mode.


Hadoop is an open-source Apache project that enables processing of extremely large datasets in a distributed computing environment. There are three different modes in which it can be run:

1. Standalone Mode
2. Pseudo-Distributed Mode
3. Fully-Distributed Mode

This post covers setting up Hadoop 2.5.1 in pseudo-distributed mode, in which each Hadoop daemon runs as a separate Java process.

Prerequisites


Java: Install Java if it isn't installed on your Mac.
Homebrew: Homebrew is a package manager for Mac. You can find the installation instructions here.
Keyless SSH: First, ensure Remote Login under System Preferences -> Sharing is checked to enable SSH. Then generate the key pairs.
$ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
Now ssh into localhost once to confirm that keyless login works, accepting the host authenticity prompt if asked.

Installation


This is where Homebrew is used.
$brew install hadoop
If you do not want to use Homebrew, or you want to install a specific version of Hadoop, you can download it from the Apache Hadoop site. Unpack the .tar to the location of your choice and assign ownership to the user setting up Hadoop.
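If you installed Hadoop through Homebrew and are unsure where it landed, brew can print the install prefix (a symlink into the Cellar path used in the sections below):
$ brew --prefix hadoop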

Configuration


Every component of Hadoop is configured using an XML file located in /usr/local/Cellar/hadoop/2.5.1/libexec/etc/hadoop. MapReduce properties go in mapred-site.xml, HDFS properties in hdfs-site.xml, and common properties in core-site.xml. General Hadoop environment settings are found in hadoop-env.sh.

hadoop-env.sh

Replace the existing HADOOP_OPTS with the following.
export HADOOP_OPTS="-Djava.security.krb5.realm=OX.AC.UK -Djava.security.krb5.kdc=kdc0.ox.ac.uk:kdc1.ox.ac.uk"
If Homebrew was not used to install Hadoop, point JAVA_HOME to your Java installation.

core-site.xml
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>

hdfs-site.xml

The Hadoop Distributed File System properties go in this config file. Since we are only setting up one node, we set the value of dfs.replication to 1.
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>


Execution

Before starting the daemons we must format the newly installed HDFS.
$ cd /usr/local/Cellar/hadoop/2.5.1/libexec/bin
$ hdfs namenode -format

Start the Daemons:
$ cd /usr/local/Cellar/hadoop/2.5.1/libexec/sbin
$ ./start-dfs.sh

Monitoring
Check the output of jps
$jps
10756 NameNode
1282 
10842 DataNode
11022 Jps
10951 SecondaryNameNode
1842 

Alternatively, the web interface for the NameNode can be browsed at http://localhost:50070

Running Examples
1. Create the HDFS directories required to execute MapReduce jobs:
$ cd /usr/local/Cellar/hadoop/2.5.1/libexec/bin
$ hdfs dfs -mkdir /user
$ hdfs dfs -mkdir /user/<username>

2. Copy the input files to the Hadoop Distributed File System
$ hdfs dfs -put ../etc/hadoop input

3. Run the example provided
$ hadoop jar ../share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.1.jar grep input output 'dfs[a-z.]+'

4. View the output files on HDFS
$ hdfs dfs -cat output/*

Reference : http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html#Pseudo-Distributed_Operation

Wednesday, October 8, 2014

Setting up keyboard shortcuts in Mac OSX

How many of you have wished that you could maximize your window without having to drag your mouse to the green button on the title bar? You will have to make your own keyboard shortcut for this one, since it isn't set by default. This post aims to show you how this can be done.
Go to System Preferences -> Keyboard.
Choose the Shortcuts tab.
From the left pane, select App Shortcuts. Click the + icon at the bottom of the right pane.

Enter Zoom in the Menu Title field and then click in the Keyboard Shortcut box to define your keyboard shortcut.



Make sure to enter the exact name of the menu item that you want to add the shortcut for.

Monday, August 25, 2014

Splitting a git repository

Sometimes when you are starting out with version control for your code, you don't know how big your repository is going to become. The decision whether to create a new repository for every module or to keep them all in one repository may be a difficult one.
While the modules are small, you may want to keep them all in one place and move them into their own repositories later. It took me some googling to figure out how to split a git repo. It is fairly easy, and this great blog post summarizes it very well. In this post, I am putting down the lessons I learnt while splitting a git repo.

Please ensure that all your commits to the module are merged into the master branch; only the master branch's history will be preserved after the split.

Say I have a git project called webservice.git with the subdirectories service1/, service2/, and service3/. Of these, service1/ has grown large enough to be moved into its own repository.

Follow these steps to move the folder service1/ into its own git repo, along with the history of the master branch.

Step 1: Clone existing repo as desired repo:
$git clone --no-hardlinks git@gitserver:webservice.git service1

Step 2: Filter the branch and reset to exclude other files, so they can be pruned:
$cd service1
$git filter-branch --subdirectory-filter service1 HEAD -- --all
$git reset --hard
$git gc --aggressive
$git prune

Step 3: Create new empty repo on git server

Step 4: On the local machine, replace the remote origin so that it points to the new repo:
$cd service1
$git remote rm origin
$git remote add origin git@gitserver:service1.git
$git push origin master
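If you also want to remove the now-split service1/ directory from the original webservice repository, a simple option (which keeps it in old history but drops it going forward) is:
$cd webservice
$git rm -r service1
$git commit -m "Moved service1 to its own repository"
$git push origin master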

Thursday, August 21, 2014

An Introduction to Zookeeper - Part I of the Zookeeper series

A distributed system consists of multiple computers that communicate through a computer network and interact with each other to achieve a common goal. The major benefits that distributed systems offer over centralized systems are scalability and redundancy: the system can easily be expanded by adding more machines as needed, and even if one of the machines is unavailable, the service continues to be available. However, such a system comes with its own challenges.

Zookeeper is an open-source, high-performance coordination service for distributed applications. It tackles the common challenges that distributed applications face by exposing services that are frequently needed in a distributed environment, such as naming, configuration management, group membership, and distributed synchronization.

Zookeeper can be run in either single-server mode or clustered (replicated) mode. Running Zookeeper in single-server mode does not take advantage of its inherent high availability and resilience, so in a production environment Zookeeper is typically run in multi-server mode. A Zookeeper cluster is called an ensemble. A leader is elected on service startup, and if the leader goes down, a new leader is elected. Each client connects to a single Zookeeper server and maintains a TCP connection to it. A client can read from any Zookeeper server; writes, however, go through the leader and need a majority consensus.

Zookeeper provides a sequential consistency guarantee, i.e., updates are applied in the order in which they are sent. It guarantees atomic updates: an update either succeeds or fails; there are no partial updates. It guarantees that a Zookeeper client sees the same view of the service irrespective of the server in the ensemble it connects to; a server will not accept a connection from a client until it has caught up with the state of the server to which the client was previously connected. Zookeeper ensures reliability, i.e., once an update succeeds it is not rolled back. Finally, Zookeeper guarantees timeliness, i.e., a client is bound to see system changes within a certain time bound.

Zookeeper is an eventually consistent system, i.e., it does not guarantee that different clients will have identical views of Zookeeper data at every instant in time. It does guarantee, however, that a follower that falls too far behind the leader is taken offline.

The distributed processes using Zookeeper coordinate with each other through a shared hierarchical namespace, which is organized much like a UNIX file system.
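To make the namespace idea concrete, here is what a quick session with the command-line client shipped with Zookeeper (bin/zkCli.sh) could look like; the znode name /app1 and its data are arbitrary examples.
$ bin/zkCli.sh -server localhost:2181
create /app1 app1-config
ls /
get /app1

More on Zookeeper in the next post.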

Tuesday, August 12, 2014

Multiple Java Installations on OSX and switching between them

Sometimes it may be necessary to have two versions of Java on your OS X machine and to switch between the two in a fast, reliable, and convenient manner. This post aims to show how this can be achieved.

I am currently using OS X 10.9.4, and I needed both Java 6 and Java 7 on my system, as some of the codebases I was working on depended on Java 6 and others on Java 7.

Download the required JDK versions from the Oracle downloads page.

After the installations are done, run the following command to see which Java version is currently in use.
$java -version
java version "1.7.0_67"
Java(TM) SE Runtime Environment (build 1.7.0_67-b01)
Java HotSpot(TM) 64-Bit Server VM (build 24.65-b04, mixed mode)

Run the following command, replacing 1.6 with the version you want to switch to. It prints the home directory of that JDK.
$/usr/libexec/java_home -v 1.6
/System/Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents/Home

Export the output of the above command as JAVA_HOME as follows:
$export JAVA_HOME="/System/Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents/Home"
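As a convenience, the two steps above can be combined so that the path is never hard-coded; this line can also go into your .bashrc or .bash_profile:
$export JAVA_HOME=$(/usr/libexec/java_home -v 1.6)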

Now run the command again to check the Java version:
$java -version
java version "1.6.0_65"
Java(TM) SE Runtime Environment (build 1.7.0_67-b01)
Java HotSpot(TM) 64-Bit Server VM (build 24.65-b04, mixed mode)

In this way, switching between the two versions can be done easily.

Monday, August 11, 2014

Java Inheritance - Super keyword

Inheritance is an important principle in object-oriented programming. It is a mechanism by which one class acquires the properties of another. The class that provides the properties is called the super/base/parent class and the one that receives them is called the sub/derived/child class. The child class inherits the fields and methods of the parent class and extends it by adding fields and methods of its own.
One of the major advantages of inheritance is reusability: the derived class gets the properties of the base class, so the code in the base class is reused in the derived class, leading to a faster development cycle.

This post looks at the effect inheritance has on constructors and methods, and in particular at the role of the super keyword.

When an object of a derived class is created, the base class constructor is invoked first, followed by the derived class constructor. In the case of multilevel inheritance, the constructors are invoked in the order of inheritance, starting from the topmost base class.

If the base class and the derived class both define a method with the same signature, the derived class's method overrides the base class's method, and calling it on a derived-class object invokes the derived version.

Let's define a simple class A as shown in the snippet below. This is going to be our parent class.
class A{
    //Constructor of A
    A(){
        System.out.println("A's constructor");
    }
    public void method(){
 System.out.println("A's get method");
    }
}

We define a child class that extends the above class as follows.
class B extends A{
    //Constructor of B
    B(){
        System.out.println("B's constructor");
    }
}

Here is a sample main method for testing the inheritance.
public static void main(String[] args){
    System.out.println("JAVA - INHERITANCE");
    B objectB = new B();
}

The output of the above code is as shown below.
JAVA - INHERITANCE
A's constructor
B's constructor

As we can see, the base class constructor is called implicitly. We can also call the base class constructor explicitly by using super(). However, the super() call must be the first statement in the derived class constructor. The following code snippet will give a compile-time error.
class B extends A{
    //Constructor of B
    B(){
        System.out.println("B's constructor");  //This is not allowed
        super();
    }
}

The super() call has to be the first statement in the constructor, as it is important to initialize the base class before the derived class is initialized.
The correct way of writing the above code is as follows:
class B extends A{
    //Constructor of B
    B(){
        super();
    }
}

However, a base class method can be called from anywhere inside the overriding method of the derived class using the keyword super. This is shown in the snippet below.

class B extends A{
    //Constructor of B
    B(){
        super();
    }
 
    @Override
    public void method(){
    System.out.println("B's get method");
       super.method();
    }
}

Here is the sample main method where we call the derived class method.
public static void main(String[] args){
    System.out.println("JAVA - INHERITANCE");
    B objectB = new B();
    objectB.method();
}

The output of the above code is as shown below.
JAVA - INHERITANCE
A's constructor
B's constructor
B's get method
A's get method


Thus we have seen in this post that the parent class constructor is executed before the child class constructor. Also note that if the base class defines only constructors that take one or more arguments (and therefore has no no-argument constructor), the derived class must define a constructor that explicitly passes the required arguments to the base class constructor via super(...).
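Here is a minimal sketch of that last point, using hypothetical class names: Parent declares only a parameterized constructor, so Child has to call super(...) explicitly as the first statement of its own constructor.
class Parent {
    private final String name;

    // Only a parameterized constructor is defined, so there is no default no-arg constructor.
    Parent(String name) {
        this.name = name;
        System.out.println("Parent's constructor, name = " + name);
    }
}

class Child extends Parent {
    Child(String name) {
        // Mandatory: Parent has no no-arg constructor, so super(...) must be
        // called explicitly and must be the first statement.
        super(name);
        System.out.println("Child's constructor");
    }
}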

Sunday, August 3, 2014

Configuration and Coordination with Zookeeper

It took me a while to understand the concept of Zookeeper, and another while to understand how to use it for the task I had begun with. This post is intended to help others cross that bridge faster.
Dynamic configuration management for today's systems comes with all the nitty-gritty of a distributed environment. A distributed environment is relatively unreliable, with problems like network failures and clock drift, and it becomes each server's responsibility to keep its view of the current configuration correct. These problems are the motivation behind configuration and coordination systems.

This particular post looks at Zookeeper as a configuration management tool. A short and good course on Zookeeper is also available online.

Setup: a Zookeeper cluster with 5 servers, and a Java wrapper that uses Zookeeper to keep track of configuration changes. For our purposes we used the Curator framework.

Read about a Zookeeper multi-server setup on a single host for setting up the Zookeeper cluster.

In your Java project, include the dependency for the Curator framework.
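If you are using Maven, the dependency looks roughly like this. The coordinates are Curator's standard ones, but the version is only an example from around the time of this post, so pick the release that matches your setup.
<dependency>
    <groupId>org.apache.curator</groupId>
    <artifactId>curator-framework</artifactId>
    <version>2.6.0</version>
</dependency>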

Initializing connection to Zookeeper ensemble
public static CuratorFramework createConnection(String zookeeperConnectionString) {
    //The first retry will wait 1 second, the second up to 2 seconds, and the third up to 4 seconds.
    RetryPolicy retryPolicy = new ExponentialBackoffRetry(1000, 3);
    CuratorFramework client = CuratorFrameworkFactory.builder().connectString(zookeeperConnectionString)
            .retryPolicy(retryPolicy)
            .namespace("session_service")
            .canBeReadOnly(true)
            .zookeeperFactory(new DefaultZookeeperFactory())
            .build();
    client.start();
    return client;
}
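A minimal usage sketch; the connection string below is only an example, so list the host:port pairs of your own ensemble:
CuratorFramework client = createConnection("zk1:2181,zk2:2181,zk3:2181");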

Creating a new znode
public static void create(CuratorFramework client, String path) throws Exception {
    client.create().forPath(path);
}

Setting data of a znode
public static void setData(CuratorFramework client, String path, String data) throws Exception {
    byte[] payload = data.getBytes();
    client.setData().forPath(path, payload);
}

Deleting a znode
public static void delete(CuratorFramework client, String path) throws Exception {
    client.delete().forPath(path);
}

Get children and set the given watcher on the node
public static List<String> watchedGetChildren(CuratorFramework client, String path) throws Exception {
    return client.getChildren().usingWatcher(new WatcherImpl(client,path)).forPath(path);
}

Get Data of the node and set a watcher on the node
public static String getData(CuratorFramework client, String path) throws Exception{
    String str = new String(client.getData().usingWatcher(new WatcherImpl(client,path)).forPath(path));
    return str;
}

Zookeeper works on the idea of clients setting watches on znodes. Whenever a znode changes, the watch is triggered, i.e., the client is notified.

We need a class that implements the Watcher interface and its process method, which is called when the corresponding change occurs. Since the watcher is constructed with the client and the path (as in the calls above), the class keeps references to both.
public class WatcherImpl implements Watcher{
    private final CuratorFramework client;
    private final String path;

    public WatcherImpl(CuratorFramework client, String path) {
        this.client = client;
        this.path = path;
    }

    @Override
    public void process(WatchedEvent event) {
        if(event.getType() == Event.EventType.NodeDataChanged) {
            System.out.println("The Data has changed");
        }
        else if(event.getType() == Event.EventType.NodeChildrenChanged){
            System.out.println("Children have changed");
        }
    }
}

It is important to note that watches are one-time events; if you need to keep monitoring changes, the watch has to be set again. All the read operations like getData(), getChildren(), and exists() have the option of setting a watch, and these watches are triggered on the corresponding changes.
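A minimal sketch of re-arming a data watch from inside process(), assuming the WatcherImpl above keeps its client and path references:
@Override
public void process(WatchedEvent event) {
    if (event.getType() == Event.EventType.NodeDataChanged) {
        System.out.println("The Data has changed");
    }
    try {
        // Watches fire only once, so register the watch again to keep receiving notifications.
        client.getData().usingWatcher(this).forPath(path);
    } catch (Exception e) {
        e.printStackTrace();
    }
}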

That's all for getting configuration management with Zookeeper up and running.