INSTALLING HADOOP, HIVE, DERBY ON CENTOS

INSTALLING HADOOP ON CENTOS 6
INSTALLING HIVE ON CENTOS 6
INSTALLING DERBY ON CENTOS 6
Hadoop version used: hadoop-0.20.203.0rc1

This guide covers the installation of the Hadoop ecosystem (Hadoop, Hive and Derby).
It is long, so please follow it step by step.

====INSTALLATION=====

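1. Installing Java
    #sun-java6-jdk is the Debian/Ubuntu package name; the stock CentOS 6 repositories
    #provide OpenJDK as java-1.6.0-openjdk-devel instead (adjust JAVA_HOME below to match)
    yum install sun-java6-jdk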

2. Adding a dedicated user for Hadoop
This adds the user hdoopuser and the group hdoopgroup to the local machine (run as root).
    /usr/sbin/useradd hdoopuser
    groupadd hdoopgroup
    usermod -a -G hdoopgroup hdoopuser
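
To confirm the account ended up in the right group, the group list reported by `id` can be checked. This is a minimal sketch; the helper name `in_group` is an assumption introduced here and is not part of the original steps.

```shell
#!/bin/sh
# in_group USER GROUP: succeed if USER belongs to GROUP (hypothetical helper).
in_group() {
    # "id -nG USER" prints the user's group names separated by spaces
    id -nG "$1" 2>/dev/null | tr ' ' '\n' | grep -qxF -- "$2"
}

# Report on the account created in this step
if in_group hdoopuser hdoopgroup; then
    echo "hdoopuser is in hdoopgroup"
else
    echo "hdoopuser is NOT in hdoopgroup (or the user does not exist)"
fi
```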

3. Configuring SSH
    su - hdoopuser        #log in as hdoopuser
    ssh-keygen -t rsa -P ""    #generate a key with an empty passphrase
    cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys    #authorize the new key
    chmod 0600 $HOME/.ssh/authorized_keys    #restrict the file to its owner; sshd rejects key files writable by others
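
Hadoop's start scripts log in over SSH, so it is worth checking that the key setup took effect before going further. The sketch below only inspects the file mode; the helper name `key_file_ok` is an assumption, and `stat -c` is the GNU form found on CentOS.

```shell
#!/bin/sh
# key_file_ok FILE: succeed if FILE exists and its mode is 600 or 400
# (hypothetical helper).
key_file_ok() {
    [ -f "$1" ] || return 1
    mode=$(stat -c %a "$1" 2>/dev/null) || return 1   # GNU stat
    [ "$mode" = "600" ] || [ "$mode" = "400" ]
}

if key_file_ok "$HOME/.ssh/authorized_keys"; then
    echo "authorized_keys permissions look fine"
else
    echo "authorized_keys is missing or too permissive"
fi

# To test the login itself, run manually as hdoopuser:
#   ssh -o BatchMode=yes localhost true && echo "passwordless SSH works"
```

BatchMode makes ssh fail instead of prompting, so the host key has to be accepted once with a normal `ssh localhost` beforehand.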

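4. Disabling IPv6
    #run as root; this sets NETWORKING_IPV6=no (the plain NETWORKING entry must stay as it is, or all networking is switched off)
    sed -i 's/^\(NETWORKING_IPV6\s*=\s*\).*$/\1no/' /etc/sysconfig/network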
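
Because `sed -i` rewrites a system file in place, it can help to preview the substitution on a copy first. The following is a sketch only: the helper `preview_sed`, the sample file contents and the `NETWORKING_IPV6` key used in the demo are assumptions for illustration.

```shell
#!/bin/sh
# preview_sed EXPR FILE: print a diff of what "sed -i EXPR FILE" would change,
# without modifying FILE (hypothetical helper).
preview_sed() {
    tmp=$(mktemp) || return 1
    sed "$1" "$2" > "$tmp"
    diff "$2" "$tmp" || true    # diff exits 1 when the files differ, which is expected
    rm -f "$tmp"
}

# Demo on a scratch file with assumed sample values
sample=$(mktemp)
printf 'NETWORKING=yes\nNETWORKING_IPV6=yes\nHOSTNAME=localhost\n' > "$sample"
preview_sed 's/^\(NETWORKING_IPV6[[:space:]]*=[[:space:]]*\).*$/\1no/' "$sample"
rm -f "$sample"
```

Lines prefixed with `<` in the output are the current contents and lines prefixed with `>` are what the file would contain afterwards.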

5. Installation/configuration/startup of Hadoop
    mkdir /hadoop
    chown -R hdoopuser /hadoop
    cd /hadoop/
    wget http://mirrors.abdicar.com/Apache-HTTP-Server//hadoop/common/stable/hadoop-0.20.203.0rc1.tar.gz
    tar -xvzf hadoop-0.20.203.0rc1.tar.gz
    ln -s /hadoop/hadoop-0.20.203.0rc1/ /hadoop/hadoop
    cd /hadoop/hadoop

    #basic config
    1)
    vim conf/core-site.xml
        #Add the following inside the <configuration> tag
        <property>
          <name>fs.default.name</name>
          <value>hdfs://localhost:9000/</value>
        </property>
        <property>
          <name>dfs.permissions</name>
          <value>false</value>
        </property>
    2)
    vim conf/hdfs-site.xml
        #Add the following inside the <configuration> tag
        <property>
          <name>dfs.name.dir</name>
          <value>/hadoop/hdfs/name</value>
        </property>
        <property>
          <name>dfs.data.dir</name>
          <value>/hadoop/hdfs/data</value>
        </property>
        <property>
          <name>dfs.replication</name>
          <value>2</value>
        </property>
    3)
    vim conf/mapred-site.xml
        #Add the following inside the <configuration> tag
        <property>
          <name>mapred.job.tracker</name>
          <value>localhost:9001</value>
        </property>
    4)
    vim conf/hadoop-env.sh
        export JAVA_HOME=/opt/jre/
        export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true
    5)
    Format the namenode
        su - hdoopuser
        cd /hadoop/hadoop
        bin/hadoop namenode -format
    6)
    Start Hadoop
        bin/start-all.sh
        notes: Hadoop web consoles
            http://localhost:50030/ for the jobtracker
            http://localhost:50070/ for the namenode
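
After `start-all.sh` returns, the `jps` tool from the JDK lists the running Java processes, which gives a quick check that all five daemons came up. This is a sketch, not part of the original procedure; the helper name `missing_daemons` is an assumption, and the daemon names are the ones a single-node Hadoop 0.20 setup starts.

```shell
#!/bin/sh
# missing_daemons: read "jps" output (lines of "PID Name") on stdin and print
# each expected Hadoop daemon that is absent (hypothetical helper).
missing_daemons() {
    running=$(awk '{print $2}')
    for d in NameNode SecondaryNameNode DataNode JobTracker TaskTracker; do
        echo "$running" | grep -qxF -- "$d" || echo "$d"
    done
}

# Usage on a live node, as hdoopuser:
#   jps | missing_daemons
# No output means every expected daemon is running.
```

If a daemon is reported missing, its log file under /hadoop/hadoop/logs is the first place to look.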

5. Installation/configuration/startup of Hive/Derby
    cd /hadoop
    wget http://mirrors.ucr.ac.cr/apache//hive/stable/hive-0.8.1-bin.tar.gz
    tar -xvzf hive-0.8.1-bin.tar.gz
    ln -s /hadoop/hive-0.8.1-bin/ /hadoop/hive
    export HADOOP_HOME=/hadoop/hadoop/
    cd /hadoop/hive
    mv conf/hive-default.xml.template conf/hive-site.xml
    #test Hive
    bin/hive
        > show tables;
    #installing the Derby metastore
    cd /hadoop
    wget http://archive.apache.org/dist/db/derby/db-derby-10.4.2.0/db-derby-10.4.2.0-bin.tar.gz
    tar -xzf db-derby-10.4.2.0-bin.tar.gz
    ln -s db-derby-10.4.2.0-bin derby
    mkdir derby/data
    export DERBY_INSTALL=/hadoop/derby/
    export DERBY_HOME=/hadoop/derby/
    export HADOOP=/hadoop/hadoop/bin/hadoop  

    vim /hadoop/hadoop/bin/start-dfs.sh
    #add the following 2 lines to start-dfs.sh
        cd /hadoop/derby/data
        nohup /hadoop/derby/bin/startNetworkServer -h 0.0.0.0 &

    vim /hadoop/hadoop/bin/start-all.sh
    #add the following 2 lines to start-all.sh
        cd /hadoop/derby/data
        nohup /hadoop/derby/bin/startNetworkServer -h 0.0.0.0 &

    #HIVE CONF
    vim /hadoop/hive/conf/hive-site.xml
    #search for "javax.jdo.option.ConnectionURL" and edit it to look like the following
        <property>
          <name>javax.jdo.option.ConnectionURL</name>
          <value>jdbc:derby://localhost:1527/metastore_db;create=true</value>
          <description>JDBC connect string for a JDBC metastore</description>
        </property>
    #HTTP CONSOLE OF HIVE (web panel)
    /hadoop/hive/bin/hive --service hwi &
        URL: http://localhost:9999/hwi/

    #create new file
    vim /hadoop/hive/conf/jpox.properties
    #add the following
        javax.jdo.PersistenceManagerFactoryClass=org.jpox.PersistenceManagerFactoryImpl
        org.jpox.autoCreateSchema=false
        org.jpox.validateTables=false
        org.jpox.validateColumns=false
        org.jpox.validateConstraints=false
        org.jpox.storeManagerType=rdbms
        org.jpox.autoCreateSchema=true
        org.jpox.autoStartMechanismMode=checked
        org.jpox.transactionIsolation=read_committed
        javax.jdo.option.DetachAllOnCommit=true
        javax.jdo.option.NontransactionalRead=true
        javax.jdo.option.ConnectionDriverName=org.apache.derby.jdbc.ClientDriver
        javax.jdo.option.ConnectionURL=jdbc:derby://localhost:1527/metastore_db;create=true
        javax.jdo.option.ConnectionUserName=APP
        javax.jdo.option.ConnectionPassword=mine
    #now copy the Derby client JARs into Hive's lib directory
    cp /hadoop/derby/lib/derbyclient.jar /hadoop/hive/lib
    cp /hadoop/derby/lib/derbytools.jar  /hadoop/hive/lib
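
When a properties file is assembled by hand, a key can end up assigned more than once, and `java.util.Properties` keeps the last assignment. The sketch below prints the value a key effectively gets; the helper name `prop_value` is an assumption, and the parsing is deliberately simple (no escapes or line continuations).

```shell
#!/bin/sh
# prop_value KEY FILE: print the value KEY ends up with in a .properties file.
# If KEY is assigned several times, the last assignment is printed
# (hypothetical helper).
prop_value() {
    awk -v k="$1" '
        /^[ \t]*[#!]/ { next }                  # skip comment lines
        {
            i = index($0, "=")                  # split at the first "="
            if (i == 0) next
            key = substr($0, 1, i - 1)
            gsub(/^[ \t]+|[ \t]+$/, "", key)    # trim whitespace around the key
            if (key == k) { val = substr($0, i + 1); found = 1 }
        }
        END { if (found) print val }
    ' "$2"
}

# Usage once the file exists:
#   prop_value org.jpox.autoCreateSchema /hadoop/hive/conf/jpox.properties
#   prop_value javax.jdo.option.ConnectionURL /hadoop/hive/conf/jpox.properties
```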

    #HTTP CONSOLE OF HIVE
    http://localhost:9999/hwi/ for Hive

6. START CLUSTER
    /hadoop/hadoop/bin/start-all.sh
    /hadoop/hive/bin/hive --service hwi &   #hwi = web panel
  

7. MAKING THE SETTINGS PERMANENT. Add the exports to the system-wide shell profile
    vi /etc/profile
    #add the following lines at the end of the file
    export JAVA_HOME=/opt/jre/
    export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true
    export HADOOP_HOME=/hadoop/hadoop/
    export DERBY_INSTALL=/hadoop/derby/
    export DERBY_HOME=/hadoop/derby/
    export HADOOP=/hadoop/hadoop/bin/hadoop
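
On CentOS, `/etc/profile` also sources every `*.sh` file in `/etc/profile.d/`, so the same exports can live in a file of their own instead of being appended to `/etc/profile`. A sketch under that assumption; the file name `hadoop.sh` and the helper `write_hadoop_profile` are chosen here for illustration only.

```shell
#!/bin/sh
# write_hadoop_profile DIR: write the exports from this step to DIR/hadoop.sh
# (hypothetical helper; DIR would be /etc/profile.d on the real machine, as root).
write_hadoop_profile() {
    cat > "$1/hadoop.sh" <<'EOF'
export JAVA_HOME=/opt/jre/
export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true
export HADOOP_HOME=/hadoop/hadoop/
export DERBY_INSTALL=/hadoop/derby/
export DERBY_HOME=/hadoop/derby/
export HADOOP=/hadoop/hadoop/bin/hadoop
EOF
}

# Dry run into a scratch directory, then load the file the way a login shell would
scratch=$(mktemp -d)
write_hadoop_profile "$scratch"
. "$scratch/hadoop.sh"
echo "HADOOP_HOME=$HADOOP_HOME"
rm -rf "$scratch"
```

Keeping the settings in a separate file makes them easy to remove or change later without editing `/etc/profile` itself.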


======RUNNING======
PANELS:
http://localhost:50030/ for the jobtracker
http://localhost:50060/ for the tasktracker
http://localhost:50070/ for the namenode
http://localhost:9999/hwi/ for Hive
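
A quick way to see which of these panels is actually listening is to probe the ports from the shell. This is a sketch: the helper name `port_open` is an assumption, it relies on bash's `/dev/tcp` pseudo-device (so it needs bash rather than plain sh), and port 1527 for Derby's network server is taken from the JDBC URL used earlier in this guide.

```shell
#!/usr/bin/env bash
# port_open HOST PORT: succeed if a TCP connection to HOST:PORT can be opened
# (hypothetical helper, bash-only because of /dev/tcp).
port_open() {
    (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null
}

# Ports from the panel list above, plus the Derby network server
for entry in 50030:jobtracker 50060:tasktracker 50070:namenode 9999:hwi 1527:derby; do
    port=${entry%%:*}
    name=${entry#*:}
    if port_open localhost "$port"; then
        echo "$name ($port): up"
    else
        echo "$name ($port): not reachable"
    fi
done
```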
