Friday, January 22, 2016

How to install and run Apache Hive in local (standalone) mode?



To install and run Apache Hive in local (standalone) mode, follow the steps below:

Step 1: Download the following software:

a) Java software: Java must be installed on the system before Hadoop and Hive ( http://www.oracle.com/technetwork/java/javaee/downloads/index.html ).

b) Apache Hadoop framework software:
In standalone mode, Apache Hive needs to point to a Hadoop installation directory. I am using hadoop-2.5.2.tar.gz from https://hadoop.apache.org/releases.html

c) Apache Hive framework software:
I am using the apache-hive-1.2.1-bin.tar.gz release from https://hive.apache.org/downloads.html

Step 2: Uncompress the downloaded gzip tar files (.tgz or .tar.gz)

a) Uncompress / install the Java software:

tar xvzf jdk-8u71-linux-i586.tar.gz
Follow the JDK's own installation instructions if further steps are required.

b) Uncompress the Apache Hadoop framework at your Unix/Linux/OS X command prompt:

tar xvzf hadoop-2.5.2.tar.gz

The above command creates the hadoop-2.5.2 folder and uncompresses the software into it.

c) Move the hadoop-2.5.2 directory from the uncompressed location to the /usr/lib directory (rename it if required):

/usr/lib/apache-hadoop-2.5.2

d) Uncompress the Apache Hive framework at your Unix/Linux/OS X command prompt:

tar xvzf apache-hive-1.2.1-bin.tar.gz

The above command uncompresses the software files into the apache-hive-1.2.1-bin folder.

e) Move the apache-hive-1.2.1-bin directory from the uncompressed location to the /usr/lib directory (rename it if required):

/usr/lib/apache-hive-1.2.1
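The extract-and-move steps above can also be scripted. The following is only a minimal sketch: install_tarball is a made-up helper name, not something that ships with Hadoop or Hive, and it assumes each archive unpacks into a single top-level folder, as the two releases used here do.

```shell
# install_tarball ARCHIVE DEST
# Hypothetical helper (not part of Hadoop or Hive): extracts a .tar.gz into a
# scratch directory, then moves its single top-level directory to DEST.
install_tarball() {
  archive="$1"
  dest="$2"

  # Refuse to overwrite an existing installation.
  if [ -e "$dest" ]; then
    echo "install_tarball: $dest already exists" >&2
    return 1
  fi

  workdir="$(mktemp -d)" || return 1
  tar -xzf "$archive" -C "$workdir" || { rm -rf "$workdir"; return 1; }

  # Assumption: the archive unpacks into exactly one top-level folder.
  top="$(ls "$workdir" | head -n 1)"
  mkdir -p "$(dirname "$dest")" && mv "$workdir/$top" "$dest"
  status=$?
  rm -rf "$workdir"
  return "$status"
}

# Example calls (run as a user who is allowed to write to /usr/lib):
# install_tarball hadoop-2.5.2.tar.gz /usr/lib/apache-hadoop-2.5.2
# install_tarball apache-hive-1.2.1-bin.tar.gz /usr/lib/apache-hive-1.2.1
```

Writing under /usr/lib usually requires root privileges, so the example calls are left commented out.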

Step 3: Configure the user's .bash_profile as below:

export JAVA_HOME="/usr/lib/Java/JavaVirtualMachines/jdk1.8.0_65.jdk/Contents/Home"

export PATH="${JAVA_HOME}/bin:${PATH}"

#HADOOP_HOME for Hive Installation
export HADOOP_HOME="/usr/lib/apache-hadoop-2.5.2"

export PATH="${HADOOP_HOME}/bin:${PATH}"


#Apache Hive Home Path
export PATH="/usr/lib/apache-hive-1.2.1/bin:${PATH}"
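After opening a new terminal (or sourcing the profile), it can help to confirm that the settings are visible before starting Hive. Here is a rough sketch; check_hive_env is a hypothetical helper name, and it only checks that the two home variables point to directories and that the three commands resolve on PATH.

```shell
# check_hive_env: hypothetical helper that reports whether the variables and
# commands configured in the profile are visible to the current shell.
check_hive_env() {
  status=0

  # Each *_HOME variable should point to an existing directory.
  for var in JAVA_HOME HADOOP_HOME; do
    eval "dir=\${$var:-}"
    if [ -d "$dir" ]; then
      echo "ok: $var=$dir"
    else
      echo "missing: $var does not point to a directory"
      status=1
    fi
  done

  # Each command should be reachable through PATH.
  for cmd in java hadoop hive; do
    if command -v "$cmd" >/dev/null 2>&1; then
      echo "ok: $cmd found at $(command -v "$cmd")"
    else
      echo "missing: $cmd is not on PATH"
      status=1
    fi
  done

  return "$status"
}

# Example:
# check_hive_env
```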

Step 4: Run Apache Hive from the command prompt:

/user/home: hive <enter>

a) The above command may return the following error:
Logging initialized using configuration in jar:file:/Library/apache-hive-1.2.1/lib/hive-common-1.2.1.jar!/hive-log4j.properties
[ERROR] Terminal initialization failed; falling back to unsupported

java.lang.IncompatibleClassChangeError: Found class jline.Terminal, but interface was expected
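This error is commonly reported when an older jline jar bundled with Hadoop is loaded ahead of the newer jline jar that Hive ships. As a rough diagnostic sketch (list_jline_jars is a made-up helper name), the jline jars under both installations can be listed like this:

```shell
# list_jline_jars DIR...: print every jline jar found under the given
# directories (hypothetical helper, not part of Hadoop or Hive).
list_jline_jars() {
  for root in "$@"; do
    # Skip arguments that are not existing directories.
    if [ -d "$root" ]; then
      find "$root" -name 'jline*.jar' -print
    fi
  done
}

# Example, using the install locations from Step 2:
# list_jline_jars /usr/lib/apache-hadoop-2.5.2 /usr/lib/apache-hive-1.2.1
```

If two different jline versions show up, that is consistent with the class conflict named in the message.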

Step 5: Resolve the terminal initialization error:

a) To resolve the error, add the line below to .bash_profile. It makes Hadoop put user-supplied classpath entries (here, Hive's libraries) ahead of its own, so that Hive's jline library is loaded first:

export HADOOP_USER_CLASSPATH_FIRST=true
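Because this tutorial adds several export lines to the profile, it is easy to end up with duplicates when a step is repeated. A small sketch of an idempotent append follows; append_once is a hypothetical helper name, and the profile path in the example is an assumption about where your profile lives.

```shell
# append_once LINE FILE: append LINE to FILE unless an identical line is
# already there (hypothetical helper for editing the profile safely).
append_once() {
  line="$1"
  file="$2"

  # Create the file if it does not exist yet.
  touch "$file" || return 1

  # -x matches whole lines only, -F treats LINE as a fixed string.
  if ! grep -qxF -- "$line" "$file"; then
    printf '%s\n' "$line" >> "$file"
  fi
}

# Example:
# append_once 'export HADOOP_USER_CLASSPATH_FIRST=true' "$HOME/.bash_profile"
```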

b) Run Apache Hive again:

/user/home: hive <enter>

c) On success, we get the Apache Hive interactive shell:

Logging initialized using configuration in jar:file:/usr/lib/apache-hive-1.2.1/lib/hive-common-1.2.1.jar!/hive-log4j.properties
hive>

d) You can view the Apache Hive default database:

hive> show databases; <enter>
OK
default      --- Please note that this is Hive's default database
Time taken: 1.994 seconds, Fetched: 1 row(s)

e) Try to create a new database for testing:


hive> create database test_db; <enter>

FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Unable to create database path file:/user/hive/warehouse/test_db.db, failed to create database test_db)

Step 6: Resolve the "Unable to create database path file" error:

a) To let Apache Hive create and store its database files, add the following line to .bash_profile. It points the warehouse directory and the Derby metastore at locations under /tmp:

export HIVE_OPTS='-hiveconf mapred.job.tracker=local -hiveconf fs.default.name=file:///tmp -hiveconf hive.metastore.warehouse.dir=file:///tmp/warehouse -hiveconf javax.jdo.option.ConnectionURL=jdbc:derby:;databaseName=/tmp/metastore_db;create=true'
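The same option string can be assembled from a single base directory, which makes it easier to move away from /tmp (many systems clear it on reboot). This is only a sketch: build_hive_opts is a made-up helper name, and it assumes the base directory is an absolute path.

```shell
# build_hive_opts BASE_DIR: print the Step 6 options with BASE_DIR in place
# of /tmp (hypothetical helper; BASE_DIR must be an absolute path).
build_hive_opts() {
  base="$1"
  # printf reuses the '%s' format for every argument, so the pieces are
  # simply concatenated into one line.
  printf '%s' \
    "-hiveconf mapred.job.tracker=local" \
    " -hiveconf fs.default.name=file://$base" \
    " -hiveconf hive.metastore.warehouse.dir=file://$base/warehouse" \
    " -hiveconf javax.jdo.option.ConnectionURL=jdbc:derby:;databaseName=$base/metastore_db;create=true"
  printf '\n'
}

# Example: reproduce the setting shown above.
# export HIVE_OPTS="$(build_hive_opts /tmp)"
```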

b) Quit your terminal window and open a new one so that the setting above takes effect.

c) Run Apache Hive again:

/user/home: hive <enter>

Logging initialized using configuration in jar:file:/Library/apache-hive-1.2.1/lib/hive-common-1.2.1.jar!/hive-log4j.properties
hive> show databases;
OK
default
Time taken: 1.652 seconds, Fetched: 1 row(s)
hive> create database test_db;
OK
Time taken: 0.323 seconds
hive> show databases;
OK
default
test_db
Time taken: 0.035 seconds, Fetched: 2 row(s)

hive> 

d) You can now use Hive in local (standalone) mode and create databases and tables without the errors shown above.
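The checks above can also be run without opening the interactive shell, since the Hive CLI accepts a statement through its -e option (-S keeps the output quiet). A minimal sketch, where run_hive_sql is a hypothetical wrapper name:

```shell
# run_hive_sql SQL: run one statement through the Hive CLI without opening
# the interactive shell (hypothetical wrapper around `hive -S -e`).
run_hive_sql() {
  if ! command -v hive >/dev/null 2>&1; then
    echo "run_hive_sql: hive is not on PATH" >&2
    return 127
  fi
  hive -S -e "$1"
}

# Example:
# run_hive_sql 'show databases;'
```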

I will be creating more blog posts to share new things in Apache Hive. Please keep on reading. Thanks!
