Tuesday, December 23, 2014

Configuring Hive on Ubuntu


Hive facilitates querying and managing large datasets residing in distributed storage. It is built on top of Hadoop. Hive defines a simple query language called as Hive Query language (HQL) which enables users familiar with SQL to query the data. Hive converts your HQL (Hive Query Language) queries into a series of MapReduce jobs for execution on a Hadoop cluster. In this post we will configure Hive on our machine.

Download Hive from the Apache Hive site. Unpack the .tar to the location of your choice and assign ownership to the user setting up Hive. At the time of this writing, the latest version available is 0.14.0.

Prerequisites:
Java: 1.6 or higher. Preferred version would be 1.7
Hadoop: 2.x. For Hadoop installation you can refer to this post.

Installation

Set the environment variable HIVE_HOME to point to the installation directory. You can set this in your .bashrc
export HIVE_HOME=/user/hive

Finally, add $HIVE_HOME/bin to your PATH.
$export PATH=$HIVE_HOME/bin:$PATH

Setting HADOOP_PATH in HIVE config.sh
Append the following line to the file $HIVE_HOME/bin/config.sh.
export HADOOP_HOME=/user/hadoop


Running Hive
You must create /tmp and /user/hive/warehouse and set appropriate permissions before you can create any table in hive.
$ hadoop fs -mkdir /usr/hive/warehouse
$ hadoop fs -chmod g+w /usr/hive/warehouse
$ hadoop fs -mkdir /tmp
$ hadoop fs -chmod g+w /tmp

Start the hive shell
$ hive

The shell would look something like
Logging initialized using configuration in jar:file:/user/hive/lib/hive-common-0.14.0.jar!/hive-log4j.properties
hive >

Reference : https://cwiki.apache.org/confluence/display/Hive/Home

No comments :

Post a Comment