Monday, October 21, 2013

Set up the newest Hadoop 2.x (2.2.0) on Ubuntu

In this tutorial I am going to guide you through setting up a single-node Hadoop 2.2.0 environment on Ubuntu.

Prerequisites

$ sudo apt-get install openjdk-7-jdk
$ java -version
java version "1.7.0_25"
OpenJDK Runtime Environment (IcedTea 2.3.12) (7u25-2.3.12-4ubuntu3)
OpenJDK 64-Bit Server VM (build 23.7-b01, mixed mode)
$ cd /usr/lib/jvm
$ sudo ln -s java-7-openjdk-amd64 jdk

$ sudo apt-get install openssh-server

Add Hadoop Group and User

$ sudo addgroup hadoop
$ sudo adduser --ingroup hadoop hduser
$ sudo adduser hduser sudo
After the user is created, re-login into Ubuntu as hduser.
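
If you want to switch accounts in the current terminal instead of logging out, su gives the same result:

$ su - hduser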

Set up SSH Key

$ ssh-keygen -t rsa -P ''
...
Your identification has been saved in /home/hduser/.ssh/id_rsa.
Your public key has been saved in /home/hduser/.ssh/id_rsa.pub.
...
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ ssh localhost
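
The first connection asks you to confirm the host key (answer yes); after that, ssh should log you in without asking for a password. If it still prompts, re-check the authorized_keys step above. Exit the nested session so you stay in your own shell:

$ exit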

Download Hadoop 2.2.0

$ cd ~
$ wget https://archive.apache.org/dist/hadoop/core/hadoop-2.2.0/hadoop-2.2.0.tar.gz
$ sudo tar vxzf hadoop-2.2.0.tar.gz -C /usr/local
$ cd /usr/local
$ sudo mv hadoop-2.2.0 hadoop
$ sudo chown -R hduser:hadoop hadoop
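
You can quickly confirm the ownership change before moving on; the listing should show hduser and hadoop as owner and group:

$ ls -ld /usr/local/hadoop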

Set up Hadoop Environment Variables

$ cd ~
$ vi .bashrc

Paste the following at the end of the file:

#Hadoop variables
export JAVA_HOME=/usr/lib/jvm/jdk/
export HADOOP_INSTALL=/usr/local/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
###end of paste
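
To apply the new variables to your current shell without logging out, reload the file (re-logging in, as described below, works just as well):

$ source ~/.bashrc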

$ cd /usr/local/hadoop/etc/hadoop
$ vi hadoop-env.sh

#modify JAVA_HOME
export JAVA_HOME=/usr/lib/jvm/jdk/

Re-login into Ubuntu as hduser and check the Hadoop version:
$ hadoop version
Hadoop 2.2.0
Subversion https://svn.apache.org/repos/asf/hadoop/common -r 1529768
Compiled by hortonmu on 2013-10-07T06:28Z
Compiled with protoc 2.5.0
From source with checksum 79e53ce7994d1628b240f09af91e1af4
This command was run using /usr/local/hadoop-2.2.0/share/hadoop/common/hadoop-common-2.2.0.jar
At this point, Hadoop is installed.

Configure Hadoop

$ cd /usr/local/hadoop/etc/hadoop
$ vi core-site.xml
#Paste the following between the <configuration> tags

<property>
   <name>fs.default.name</name>
   <value>hdfs://localhost:9000</value>
</property>
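
Note: fs.default.name still works in 2.2.0 but is marked deprecated; if you prefer the newer key, the equivalent setting is:

<property>
   <name>fs.defaultFS</name>
   <value>hdfs://localhost:9000</value>
</property>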

$ vi yarn-site.xml
#Paste the following between the <configuration> tags

<property>
   <name>yarn.nodemanager.aux-services</name>
   <value>mapreduce_shuffle</value>
</property>
<property>
   <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
   <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>

$ mv mapred-site.xml.template mapred-site.xml
$ vi mapred-site.xml
#Paste the following between the <configuration> tags

<property>
   <name>mapreduce.framework.name</name>
   <value>yarn</value>
</property>

$ cd ~
$ mkdir -p mydata/hdfs/namenode
$ mkdir -p mydata/hdfs/datanode
$ cd /usr/local/hadoop/etc/hadoop
$ vi hdfs-site.xml
Paste the following between the <configuration> tags

<property>
   <name>dfs.replication</name>
   <value>1</value>
</property>
<property>
   <name>dfs.namenode.name.dir</name>
   <value>file:/home/hduser/mydata/hdfs/namenode</value>
</property>
<property>
   <name>dfs.datanode.data.dir</name>
   <value>file:/home/hduser/mydata/hdfs/datanode</value>
</property>

Format Namenode

hduser@ubuntu40:~$ hdfs namenode -format

Start Hadoop Service

$ start-dfs.sh
....
$ start-yarn.sh
....

hduser@ubuntu40:~$ jps
If everything is successful, you should see the following services running:
2583 DataNode
2970 ResourceManager
3461 Jps
3177 NodeManager
2361 NameNode
2840 SecondaryNameNode
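
You can also confirm the daemons are up through their web interfaces (default ports in Hadoop 2.2.0):

NameNode:        http://localhost:50070
ResourceManager: http://localhost:8088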

Run Hadoop Example

hduser@ubuntu:~$ cd /usr/local/hadoop
hduser@ubuntu:/usr/local/hadoop$ hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar pi 2 5

Number of Maps  = 2
Samples per Map = 5
13/10/21 18:41:03 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Wrote input for Map #0
Wrote input for Map #1
Starting Job
13/10/21 18:41:04 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
13/10/21 18:41:04 INFO input.FileInputFormat: Total input paths to process : 2
13/10/21 18:41:04 INFO mapreduce.JobSubmitter: number of splits:2
13/10/21 18:41:04 INFO Configuration.deprecation: user.name is deprecated. Instead, use mapreduce.job.user.name
...
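
Once the pi job completes, you can exercise HDFS directly with a few basic commands (the paths below are only examples):

$ hdfs dfs -mkdir -p /user/hduser
$ hdfs dfs -put /usr/local/hadoop/etc/hadoop/core-site.xml /user/hduser/
$ hdfs dfs -ls /user/hduser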

Note: ericduq has created a shell script (make-single-node.sh) for this setup; it is available in his git repo at https://github.com/ericduq/hadoop-scripts.

What to read next
Hadoop FileSystem (HDFS) Tutorial 1
Hadoop 2.x Core (HDFS and YARN) Components Explained
Hadoop Wordcount example

Feel free to leave comments below. I will add more Hadoop tutorials regularly.

262 comments:

  1. Nicely written tutorial.

    I'm uncertain why you gave the hduser sudo privs.

    Modifying .bashrc was a really nice touch.

    If you're planning more of these, it would be great to add one with setting up the other nodes of a multi-node cluster.

    ReplyDelete
  2. Nice writeup! I second Chotu's comment that I'd like to see how to add additional nodes to the cluster now.

    Thanks!

    ReplyDelete
  3. Also, there is a typo here: $ mkdir -p mydata/hdfs/namdnode

    ReplyDelete
  4. Thanks got my server running with your info :)

    ReplyDelete
  5. Very helpful article. how to add new nodes ?

    ReplyDelete
  6. This comment has been removed by the author.

    ReplyDelete
  7. Nice article. Great job. If you can add multi-node cluster, it will be much appreciated.

    ReplyDelete
  8. Very nice article, after struggling with the official documentation half the morning and 5 mins. of your tutorial have yielded better results :D
    Btw, as everyone above me has mentioned, a tutorial on how to set up a multi-node cluster would be awesome.

    ReplyDelete
  9. Thanks everyone for the feedback. Will work on a multi-node setup shortly.

    ReplyDelete
  10. The article is very well written and helped me a lot. But I am facing one problem that when I try and run jps, I cannot see the namenode and the datanode running.
    I am getting the following message when I run start-dfs.sh command:

    $ start-dfs.sh
    13/11/13 01:07:29 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Incorrect configuration: namenode address dfs.namenode.servicerpc-address or dfs.namenode.rpc-address is not configured.
    Starting namenodes on []
    localhost: starting namenode, logging to /usr/local/hadoop/logs/hadoop-hduser-namenode-ubuntu.out
    localhost: starting datanode, logging to /usr/local/hadoop/logs/hadoop-hduser-datanode-ubuntu.out
    Starting secondary namenodes [0.0.0.0]
    0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop/logs/hadoop-hduser-secondarynamenode-ubuntu.out
    13/11/13 01:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

    Please tell me what am I doing wrong...

    ReplyDelete
    Replies
    1. Make sure you have filled up "hdfs-site.xml" correctly.

      Delete
    2. I think you are missing the XML tags. Keep the tags properly in the *.xml files. You can use the reference from here:
      http://raseshmori.wordpress.com/2012/09/23/install-hadoop-2-0-1-yarn-nextgen/

      Delete
    3. This comment has been removed by the author.

      Delete
    4. This comment has been removed by the author.

      Delete
    5. Try adding this line to ~/.bashrc :
      export HADOOP_CONF_DIR=$HADOOP_INSTALL/etc/hadoop/

      HTH

      Delete
    6. I've had the same issue but SecondaryNameNode was running though. The problem was that I didn't log out and log in as hduser. I did create the directories in my own home folder. They had to be created in the hduser's home folder.

      You could do:
      me@ubuntu ~$ su hduser
      (now you are `hduser` in the terminal for that session)
      hduser@ubuntu /home/me$ cd ~
      hduser@ubuntu ~$ mkdir -p .....{ etc. }

      Delete
    7. This comment has been removed by the author.

      Delete
    8. My mistake was to try and format the xml nicely, eg I had

      name
      hdfs://localhost:9000
      name

      instead of namehdfs://localhost:9000name

      believe it or not but it makes a massive difference, I think somehow it puts spaces before the hdfs. By the way I had to remove the <> around name above as blogger wouldn't show it.

      Delete
  11. Thanks good article for big data learners.....RAJMOHAN

    ReplyDelete
  12. Nice article but I am not able to run it.. Getting a Connection refused error on port 9000.. Please help, thanks

    ReplyDelete
  13. Thank You.... your blog saved me a lot of grunt work :)

    ReplyDelete
  14. Thank you... brief installation guide for hadoop

    ReplyDelete
  15. This Article really looks good. Then what about the Job Tracker & Task Tracker . may i know the procedure for configuring the Job Tracker & Task Tracker in Hadoop 2.2.0

    ReplyDelete
  16. excellent tutorial, thanks so much, worked like a champ.

    however, i had to do one extra tweak (running crunchbang, which is really just debian under the covers)


    had to explicitly set JAVA_HOME in hadoop-env.sh


    export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64

    it wasnt picking it up from my bashrc profile. weird.

    ReplyDelete
  17. To be clear: when viewed in a web browser, the "contents of the tags" are supposed to have some XML tags around them, but they are not shown on the web page because they are swallowed by the browser. To see what you really need to put into those three files, view the page source and you will see tags around those configs.

    ReplyDelete
  18. A shell script (make-single-node.sh) for this setup is in my git repo at https://github.com/ericduq/hadoop-scripts. I will be adding a multi-node version. Thanks, Zhi.

    ReplyDelete
    Replies
    1. post updated to include the url. Thanks.

      Delete
    2. great script Eric and it worked flawlessly.

      Delete
    3. Can you please share the same script for 32 bit intel

      Distributor ID: Ubuntu
      Description: Ubuntu 13.10
      Release: 13.10
      Codename: saucy

      Delete
  19. After entering the start-dfs.sh command I get the following error:

    Incorrect configuration: namenode address dfs.namenode.servicerpc-address or dfs.namenode.rpc-address is not configured.
    Starting namenodes on []

    After entering start-yarn.sh and jps, it only shows jps, ResourceManager, and NodeManager running.

    I've double and triple checked all of the config files. Any thoughts?

    ReplyDelete
    Replies
    1. Same issue also for me! I'm installing on Debian 7.

      I double checked all xml config file, all correct!

      Please clear to me also the core-site.xml parameter fs.default.name: on Hadoop official doc it is named DEPRECATED.

      Also, for the fs.default.name value: localhost or the hostname?

      Any update?

      Delete
    2. core-site.xml should be like

      <property>
      <name>fs.default.name</name>
      <value>hdfs://localhost:9000</value>
      </property>

      Delete
  20. Hi,

    Have submitted job its running, but not able to see any completed jobs in the UI.

    http://localhost:8088

    Do I need to specify the property in configurations?

    ReplyDelete
  21. Hi, How do i start additional Data node.

    ReplyDelete
  22. This comment has been removed by the author.

    ReplyDelete
  23. This article is really helpful. But I am also facing issue with starting Data Node.

    Here is the complete detail:

    hduser@ubuntu:/usr/local/hadoop$ sudo -u hduser sbin/start-dfs.sh
    13/12/08 00:53:59 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Starting namenodes on [localhost]
    hduser@localhost's password:
    localhost: starting namenode, logging to /usr/local/hadoop/logs/hadoop-hduser-namenode-ubuntu.out
    hduser@localhost's password:
    localhost: starting datanode, logging to /usr/local/hadoop/logs/hadoop-hduser-datanode-ubuntu.out
    Starting secondary namenodes [0.0.0.0]
    hduser@0.0.0.0's password:
    0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop/logs/hadoop-hduser-secondarynamenode-ubuntu.out
    13/12/08 00:54:32 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    hduser@ubuntu:/usr/local/hadoop$ jps
    5731 NameNode
    6272 Jps
    6152 SecondaryNameNode

    Can any one please help me in fixing this issue.

    ReplyDelete
    Replies
    1. This comment has been removed by the author.

      Delete
    2. This comment has been removed by the author.

      Delete
    3. I too had the same issue:
      for namenode: The problem was that I had not created the mydata/hdfs/ directory under hduser. Creating the datanode and namenode folders as mentioned in the blog solves the problem.
      For datanode: I had formatted the namenode while the services were active. If you run the command 'hadoop datanode' you get an error indicating "Incompatible clusterIDs". Deleting and recreating mydata/hdfs/datanode solves the problem. Refer: http://stackoverflow.com/questions/16020334/hadoop-datanode-process-killed
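
      A minimal sketch of that recovery, assuming the directory layout from this tutorial (note that it wipes the datanode's block data):

      $ stop-dfs.sh
      $ rm -rf ~/mydata/hdfs/datanode
      $ mkdir -p ~/mydata/hdfs/datanode
      $ start-dfs.sh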

      Delete
  24. I just started to learn Linux and hadoop. By following the instruction to the 'Start', I typed in $start-dfs.sh and got -bash: start-dfs.sh: command not found. for $jps, I got 18087 Jps.
    Any suggestions and instructions are greatly appreciated.
    PZ

    ReplyDelete
  25. It is very good and very useful for beginners to setup first time like me . Thanks you for the post, I expect you will keep post more on hadoop.

    ReplyDelete
  26. Hi,
    Thanks for this posting as I have also struggled with the official doc and was very happy to find this tutorial. Unfortunately I have ran into the following problem when running start-dfs.sh any help or insight is appreciated:

    [hduser@localhost hadoop]$ start-dfs.sh
    13/12/14 15:37:56 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Starting namenodes on [OpenJDK 64-Bit Server VM warning: You have loaded library /usr/local/hadoop-2.2.0/lib/native/libhadoop.so.1.0.0 which might have disabled stack guard. The VM will try to fix the stack guard now.
    It's highly recommended that you fix the library with 'execstack -c ', or link it with '-z noexecstack'.
    localhost]
    sed: -e expression #1, char 6: unknown option to `s'
    64-Bit: ssh: Could not resolve hostname 64-Bit: Name or service not known
    VM: ssh: Could not resolve hostname VM: Name or service not known
    library: ssh: Could not resolve hostname library: Name or service not known
    ... lots more Name or service not known messages

    I have verify that OPEN jdk 1.7 is installed and changed the JAVA_HOME to point to it but it still doesn't work. However, I am running 1.7.45 not sure if that is causing problems...
    And I am running on centos instead of ubuntu but the download is generic to linux x86 ...

    ReplyDelete
    Replies
    1. There were a number of problems that I solved with these problems and hopefully it will help someone who faces the same issues:
      1) I download a 32 bit pre-built and running it on a 64 bit platform which causes all these error messages to be displayed.
      so building hadoop from scratch on my machine will fix it.
      2) the start-dfs.sh script is treating the error messages as input to the next command in the script and that causes more error.
      I "customized" the script to ignore these messages.
      3) My ssh from one box to another is not password less even though I did execute all the commands listed. There were a few more steps to ensure a truly password less ssh clustered environment and without it the script just sits there waiting for a password but without the prompt because the ssh operation puts it in a subshell so after step number 2 I figured out what was going on and just type in hduser's password whenever the script appears to be "stuck".

      Delete
    2. Hey Tim,
      Running into the same thing. Can you post how you customized the script to ignore errors?

      Thanks,
      Mathew

      Delete
    3. Hi Tim T,

      Can you post how you customized the script, im getting the same issue while trying to start dfs

      Delete
  27. I had an interesting hangup on ubuntu that I fixed with point 3 here: http://wiki.apache.org/hadoop/ConnectionRefused

    I had to comment out the first line of /etc/hosts

    ReplyDelete
  28. I am getting this error "/home/hduser/hadoop/bin/hdfs: line 201: /home/lib/jvm/jdk//bin/java: No such file or directory" while installig in Ubuntu, can anyone tell me how to fix it. I have already changed the path in .bashrc file

    export JAVA_HOME=/usr/lib/java-7-openjdk-amd64
    export HADOOP_INSTALL=/usr/local/hadoop
    export PATH=$PATH:$HADOOP_INSTALL/bin
    export PATH=$PATH:$HADOOP_INSTALL/sbin
    export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
    export HADOOP_COMMON_HOME=$HADOOP_INSTALL
    export HADOOP_HDFS_HOME=$HADOOP_INSTALL
    export YARN_HOME=$HADOOP_INSTALL

    ReplyDelete
    Replies
    1. if you followed this tutorial closely, the jdk should be in /usr/lib/jvm/jdk/

      make sure you do

      $ cd /usr/lib/jvm
      $ ln -s java-7-openjdk-amd64 jdk

      Delete
    2. Hi Zhi,
      First of all thank you for putting up such a wonderful tutorial . Cheers..!!
      I am also getting the same error as muk rram when I try to run
      hduser@anilkumar:~$ hadoop version I get the following error
      /usr/local/hadoop/bin/hadoop: line 133: /usr/lib/jvm/jdk//bin/java: No such file or directory

      I followed your tutorial closely, but only differed while creating the jdk soft link since my system is a 32 bit machine, with Xubuntu, so my
      :~$ java -version Returns the following
      java version "1.7.0_25"
      OpenJDK Runtime Environment (IcedTea 2.3.10) (7u25-2.3.10-1ubuntu0.13.04.2)
      OpenJDK Client VM (build 23.7-b01, mixed mode, sharing)

      and

      $ cd /usr/lib/jvm | ls returns the following.
      java-1.7.0-openjdk-i386 java-7-openjdk-common
      java-6-openjdk-i386 java-7-openjdk-i386

      So instead of using $ ln -s java-7-openjdk-amd64 jdk I used ln -s java-7-openjdk-i386 jdk. Can you please let me know where I erred and help me out.

      Delete
    3. Hi Zhi and everyone,
      Error solved..I changed my JAVA_HOME=/usr/lib/jvm/java-7-openjdk-i386/jre and that worked.
      Also, when I tried to run jps..I was getting the following error
      "Unable to find jps..."
      So I installed the following packages openjdk-7-jdk with Synaptic Package Manager.
      And it worked.
      Once again thanks for putting up such a wonderful post..!!

      Delete
  29. @Zhi thanks a lot. Sharing is caring.

    ReplyDelete
  30. This comment has been removed by the author.

    ReplyDelete
  31. I have installed and run the example; however when I am trying to copy text file "hadoop dfs -cp largetextfile.txt /home/hduser/somefile.txt" I am getting the error saying "cp: ''/home/hduser" : no such file or directory.

    ReplyDelete
  32. This comment has been removed by the author.

    ReplyDelete
  33. Very helpful article. Please let me know, how to configure jobtacker and tasktracker. Here, jps is not show job and task tracker. do we need some additional configuration for that?

    ReplyDelete
  34. Nice post for the beginners. I configured as per the instruction and comments. Very first time, i ran the pi program and saw the result out of it. then switched of my PC and start to verify the pi.. i am getting the below exception.. any help is appreciated..

    Number of Maps = 2
    Samples per Map = 5
    14/01/12 23:02:20 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    java.net.ConnectException: Call From ishu/127.0.1.1 to localhost:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
    at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:783)
    at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:730)
    at org.apache.hadoop.ipc.Client.call(Client.java:1351)
    at org.apache.hadoop.ipc.Client.call(Client.java:1300)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
    at com.sun.proxy.$Proxy9.getFileInfo(Unknown Source)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
    at com.sun.proxy.$Proxy9.getFileInfo(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:651)
    at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1679)
    at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1106)
    at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1102)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1102)
    at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1397)
    at org.apache.hadoop.examples.QuasiMonteCarlo.estimatePi(QuasiMonteCarlo.java:278)
    at org.apache.hadoop.examples.QuasiMonteCarlo.run(QuasiMonteCarlo.java:354)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.hadoop.examples.QuasiMonteCarlo.main(QuasiMonteCarlo.java:363)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
    at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
    at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:74)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:212)

    ReplyDelete
  35. You need to login with your Google ID (this is the only one I tried) to be able to view the tags in the conf files. They do not appear to be visible unless you are logged in

    ReplyDelete
  36. This is very good post dude but after following all the steps it was not showing Namenode and Datanode on 'jps' but here is the updated post for this with completes all the rest thing
    http://www.javatute.com/javatute/faces/post/hadoop/2014/setting-hadoop-2.2.0-on-ubuntu-12-lts.xhtml

    ReplyDelete
  37. Hey Zhi,

    Thanks for the tutorial but I am facing one problem. When I try to check the hadoop version I get the following:

    Error: Could not find or load main class org.apache.hadoop.util.VersionInfo

    Also, when I try: hdfs namenode -format

    I get the following:

    Error: Could not find or load main class org.apache.hadoop.hdfs.server.namenode.NameNode

    The java version used is:

    java version "1.7.0_25"
    OpenJDK Runtime Environment (IcedTea 2.3.10) (7u25-2.3.10-1ubuntu0.12.04.2)
    OpenJDK 64-Bit Server VM (build 23.7-b01, mixed mode)

    Can you or anyone else please help me?

    ReplyDelete
  38. Wonderful post. Thanks

    But I my namenode does not start.

    I get this message instead
    Incorrect configuration: namenode address dfs.namenode.servicerpc-address or dfs.namenode.rpc-address is not configured.
    Starting namenodes on []

    ReplyDelete
  39. I tried installing hadoop 2.2.0 on Mac os X 10.8. When I try jps on my system i only get the follwoing output.

    26453 NodeManager
    26485 Jps
    26366 ResourceManager

    there is no DataNode/NameNode/Secondary NameNode.

    Isn't this the right way to set the configuration in hdfs-site.xml:

    <property>
    <name>dfs.replication</name>
    <value>1</value>
    </property>
    <property>
    <name>dfs.namenode.name.dir</name>
    <value>Users/hduser/mydata/hdfs/namenode</value>
    </property>
    <property>
    <name>dfs.datanode.data.dir</name>
    <value>/Users/hduser/mydata/hdfs/datanode</value>
    </property>

    kindly guide me on this issue.

    ReplyDelete
  40. Just to say that the described holds also for CentOS and Slackware ... thank you!

    ReplyDelete
  41. Thanks for the instructions. One problem, when I disabled IPv6 I lost network access. Any idea how I get this back?

    ReplyDelete
  42. Thanks Zhi! Quite a helpful article and surely saves many hours of newbies. Below are two problems I ran into and their solutions

    a. ssh connection refused. It turned out that my ubuntu installation did not have sshd installed and running.
    Solution is install openssh server and run it using
    sudo apt-get install openssh-server
    b. command start-dfs.sh failed to complete due to error 'unable to load native-hadoop library for your platform'. I have added environment variable to my bashrc profile as mentioned in https://gist.github.com/ruo91/7154697

    ReplyDelete
  43. congratulations guys, quality information you have given!!! Big Data and Analytics

    ReplyDelete
  44. Thanks. Well written. The official hadoop doc for 2.2.0 starts somewhere at map-reduce instead of this single-node setup. Readers, please be careful when editing the *.xml files: inside the configuration tag, each pair goes on a line as <property><name>...</name><value>...</value></property>. The editor on the blog site is trimming them off :-).

    ReplyDelete
  45. Hello Iam new to Linux and Hadoop. I have a Windows 7 32 bit host OS on which iam running a Ubuntu 12.04 32 bit guest OS using VMware Workstation. Do i follow the same steps as listed above to install a 32 bit Hadoop version on Ubuntu 12.04 32 bit guest .( Is this even a valid question like is there a 64 bit Hadoop versus a 32 bit Hadoop?)

    Also my Toshiba laptop uses a Intel(R) Core(TM)2Duo CPU P7450 @2.13Ghz and Windows says it can support 64 bit OS , when i go to my bios i dont see anything related to Intel Vt -x and VMware utility says that i cannot have a 64 bit guest OS installed. Why would that be? Can anyone explain?

    ReplyDelete
  46. Excellent article! It's how we got our dev hadoop up.

    I did have to perform one additional action, lest jobs were forever in "pending": $ yarn-daemon.sh start nodemanager

    Cheers!

    ReplyDelete
  47. Great article! However I'm getting an error when trying to start the dfs:

    hduser@notebook:~$ start-dfs.sh
    14/02/17 23:13:00 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Starting namenodes on [OpenJDK 64-Bit Server VM warning: You have loaded library /usr/local/hadoop-2.2.0/lib/native/libhadoop.so.1.0.0 which might have disabled stack guard. The VM will try to fix the stack guard now.
    It's highly recommended that you fix the library with 'execstack -c ', or link it with '-z noexecstack'.
    localhost]
    sed: -e expression #1, char 6: unknown option to `s'
    -c: Unknown cipher type 'cd'
    now.: ssh: Could not resolve hostname now.: No such file or directory
    noexecstack'.: ssh: Could not resolve hostname noexecstack'.: No such file or directory
    It's: ssh: Could not resolve hostname It's: No such file or directory
    might: ssh: Could not resolve hostname might: No such file or directory
    recommended: ssh: Could not resolve hostname recommended: No such file or directory
    with: ssh: Could not resolve hostname with: No such file or directory
    ',: ssh: Could not resolve hostname ',: No such file or directory
    stack: ssh: Could not resolve hostname stack: No such file or directory
    Server: ssh: Could not resolve hostname Server: No such file or directory
    OpenJDK: ssh: Could not resolve hostname OpenJDK: No such file or directory

    and a lot more of similar errors. Do you have an idea of the reason of this problem?

    Thanks!

    ReplyDelete
    Replies
    1. I think that the error is because hadoop is compiled for 32 bit and my VM is 64 bit. You can avoid this error by starting the single components one by one:

      $ hadoop-daemon.sh start namenode
      $ hadoop-daemon.sh start datanode
      $ yarn-daemon.sh start resourcemanager
      $ yarn-daemon.sh start nodemanager
      $ mr-jobhistory-daemon.sh start historyserver

      But of course the best solution would be to compile hadoop in your local environment

      Delete
    2. Yes, I took a few hours to build 64 bit and copy all files in native folder to /usr/local/hadoop ... to overwrite the 32 bit files, that works!

      Delete
    3. I made a fix for this issue in the make-single-node.sh script. See https://github.com/ericduq/hadoop-scripts. You don't need to recompile hadoop or start the services one at a time.

      Delete
    4. ericduq - can you please describe that difference between what you did in make-single-node.sh and the original steps?

      Delete
    5. This comment has been removed by the author.

      Delete
    6. The differences between make-single-node.sh and original steps are:
      ericduq add lines 41, 42:
      sudo sh -c 'echo export HADOOP_COMMON_LIB_NATIVE_DIR=\$\{HADOOP_INSTALL\}/lib/native >> /home/hduser/.bashrc'
      sudo sh -c 'echo export HADOOP_OPTS=\"-Djava.library.path=\$HADOOP_INSTALL/lib\" >> /home/hduser/.bashrc'

      Delete
    7. These 2 lines did make difference. Now it starts like a charm :) Thanks John.

      Delete
  48. Excellent tutorial. One gotcha. Remove the mydata directory before formatting namenode. Otherwise the data node will not startup.

    ReplyDelete
  49. This would be necessary if you did not perform the startup commands in proper sequence. You must run "hdfs namenode -format" before running "start-dfs.sh"

    ReplyDelete
    Replies
    1. data node does not start up. Can you please help

      Delete
  50. This comment has been removed by the author.

    ReplyDelete
  51. Hi ericduq. I followed the tutorial and your script. But both resulted in an exception on container-launch when I ran the built-in example "hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar pi 2 5". Anyone got similar exceptions?
    Number of Maps = 2
    Samples per Map = 5
    14/02/27 01:05:14 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Wrote input for Map #0
    Wrote input for Map #1
    Starting Job
    14/02/27 01:05:17 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
    14/02/27 01:05:18 INFO input.FileInputFormat: Total input paths to process : 2
    14/02/27 01:05:18 INFO mapreduce.JobSubmitter: number of splits:2
    14/02/27 01:05:18 INFO Configuration.deprecation: user.name is deprecated. Instead, use mapreduce.job.user.name
    14/02/27 01:05:18 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
    14/02/27 01:05:18 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
    14/02/27 01:05:18 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
    14/02/27 01:05:18 INFO Configuration.deprecation: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
    14/02/27 01:05:18 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
    14/02/27 01:05:18 INFO Configuration.deprecation: mapreduce.map.class is deprecated. Instead, use
    14/02/27 01:05:18 INFO Configuration.deprecation: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
    14/02/27 01:05:19 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1393463083822_0001
    14/02/27 01:05:20 INFO impl.YarnClientImpl: Submitted application application_1393463083822_0001 to ResourceManager at /0.0.0.0:8032
    14/02/27 01:05:20 INFO mapreduce.Job: The url to track the job: http://ip-10-164-77-20:8088/proxy/application_1393463083822_0001/
    14/02/27 01:05:20 INFO mapreduce.Job: Running job: job_1393463083822_0001
    14/02/27 01:06:31 INFO mapreduce.Job: Job job_1393463083822_0001 running in uber mode : false
    14/02/27 01:06:31 INFO mapreduce.Job: map 0% reduce 0%
    14/02/27 01:06:31 INFO mapreduce.Job: Job job_1393463083822_0001 failed with state FAILED due to: Application application_1393463083822_0001 failed 2 times due to AM Container for appattempt_1393463083822_0001_000002 exited with exitCode: 1 due to: Exception from container-launch:
    org.apache.hadoop.util.Shell$ExitCodeException:
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
    at org.apache.hadoop.util.Shell.run(Shell.java:379)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:283)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:79)

    ReplyDelete
  52. Hello Everyone,
    I am new to big data hadoop, after installation i ran build pi value program, it gave the output. But now when i try to run the program, I get below error:
    Number of Maps = 2
    Samples per Map = 5
    14/03/04 20:08:07 WARN util.NativeCodeLoader: Unable to load native-hadoop libra
    ry for your platform... using builtin-java classes where applicable
    java.net.ConnectException: Call From bigdata/127.0.1.1 to localhost:9000 failed
    on connection exception: java.net.ConnectException: Connection refused; For more
    details see: http://wiki.apache.org/hadoop/ConnectionRefused
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstruct
    orAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingC
    onstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:532)
    at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:783)
    at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:730)
    at org.apache.hadoop.ipc.Client.call(Client.java:1351)
    at org.apache.hadoop.ipc.Client.call(Client.java:1300)
    Any help would be appreciated

    ReplyDelete
  53. of the hook my friend, very good job, thank you

    ReplyDelete
  54. I could install hadoop 2.2.0 (single node setup) on lubuntu 13.10 (64-bit) by following these instructions.
    'Name or service not known' problem resolved by setting up hadoop options.

    Thanks a lot for this tutorial.

    Cheers,
    Mukhesh.

    ReplyDelete
  55. It is good to see the best site for all Hadoop tutorials. And you can also more queries about Hadoop here

    Hadoop Tutorial

    ReplyDelete
  56. Hadoop is best technology to maintain all big data . IF u like to know more about it visit here Hadoop Interview Questions

    ReplyDelete
  57. Hello

    while running the ./start-dfs.sh command I am getting the following error -

    14/03/18 00:13:49 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    ]tarting namenodes on [localhost
    : Name or service not knownstname localhost
    vijay@localhost's password:
    localhost: starting datanode, logging to /cygdrive/c/cygwin64/usr/local/hadoop-2.2.0/logs/hadoop-vijay-datanode-vijay-THINK.out
    ]tarting secondary namenodes [0.0.0.0
    : Name or service not knownstname 0.0.0.0
    14/03/18 00:14:12 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

    Could you please suggest solution for this ?

    Thanks
    Vijay

    ReplyDelete
  58. I have followed the entire steps. But data node does not start. Even though I start it manually $ hadoop-daemon.sh start datanode. It shuts down again.

    ReplyDelete
  59. Zhi,

    Awsome tutorial, many thanks for taking the time and effort to put it all together and share it with us.

    I have hit a snag at the last hurdle and was wondering if you had any thoughts on how to solve the problem. I have installed and configured everything and all seems ok until I go to run the Map Reduce example. I get the following :

    Number of Maps = 2
    Samples per Map = 5
    14/03/28 06:41:51 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Wrote input for Map #0
    Wrote input for Map #1
    Starting Job
    14/03/28 06:43:03 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
    14/03/28 06:43:37 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 0 time(s); maxRetries=45
    14/03/28 06:43:58 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 1 time(s); maxRetries=45
    14/03/28 06:44:18 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 2 time(s); maxRetries=45
    14/03/28 06:44:38 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 3 time(s); maxRetries=45
    ...
    14/03/28 06:59:00 WARN security.UserGroupInformation: PriviledgedActionException as:hduser (auth:SIMPLE) cause:org.apache.hadoop.net.ConnectTimeoutException: Call From Node1/199.101.28.130 to 0.0.0.0:8032 failed on socket timeout exception: org.apache.hadoop.net.ConnectTimeoutException: 20000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=0.0.0.0/0.0.0.0:8032]; For more details see: http://wiki.apache.org/hadoop/SocketTimeout
    org.apache.hadoop.net.ConnectTimeoutException: Call From Node1/199.101.28.130 to 0.0.0.0:8032 failed on socket timeout exception: org.apache.hadoop.net.ConnectTimeoutException: 20000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=0.0.0.0/0.0.0.0:8032]; For more details see: http://wiki.apache.org/hadoop/SocketTimeout
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)

    ReplyDelete
    Replies
    1. Hi Paul, I had the same issue. Was able to start job after adding three properties in yarn-site.xml
      see this post

      http://stackoverflow.com/questions/20586920/hadoop-connecting-to-resourcemanager-failed
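
      For reference, a common fix of this kind is to pin the ResourceManager addresses in yarn-site.xml, roughly along these lines (localhost is an assumed placeholder for your master host):

      <property>
      <name>yarn.resourcemanager.address</name>
      <value>localhost:8032</value>
      </property>
      <property>
      <name>yarn.resourcemanager.scheduler.address</name>
      <value>localhost:8030</value>
      </property>
      <property>
      <name>yarn.resourcemanager.resource-tracker.address</name>
      <value>localhost:8031</value>
      </property>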

      Delete
  60. Nice tutorial!!!. I followed this tutorial but i am not able to run hadoop map reduce example of pi successfully.It gives me following error

    14/04/18 05:17:48 ERROR hdfs.DFSClient: Failed to close file /user/hduser/QuasiMonteCarlo_1397823451260_1750583313/in/part0
    org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /user/hduser/QuasiMonteCarlo_1397823451260_1750583313/in/part0 could only be replicated to 0 nodes instead of minReplication (=1). There are 0 datanode(s) running and no node(s) are excluded in this operation.
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1384)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2477)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:555)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:387)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:59582)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2042)

    at org.apache.hadoop.ipc.Client.call(Client.java:1347)
    at org.apache.hadoop.ipc.Client.call(Client.java:1300)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
    at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
    at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:330)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1226)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1078)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:514)

    How can i fix this error?

    ReplyDelete
  61. I tried this tutorial and ended up with the error.

    I used this and able to start data,secondary and name node.

    sudo sh -c 'echo export HADOOP_COMMON_LIB_NATIVE_DIR=\$\{HADOOP_INSTALL\}/lib/native >> /home/hduser/.bashrc'
    sudo sh -c 'echo export HADOOP_OPTS=\"-Djava.library.path=\$HADOOP_INSTALL/lib\" >> /home/hduser/.bashrc'

    ReplyDelete
  62. How come i don't see ETC directory? Can some one help?

    cannot access /usr/local/hadoop/etc: No such file or directory

    ReplyDelete
  63. Hi

    I am getting below error. Please help

    hduser@ubuntu:~$ start-dfs.sh
    14/05/21 22:37:26 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Starting namenodes on [OpenJDK 64-Bit Server VM warning: You have loaded library /usr/local/hadoop/lib/native/libhadoop.so.1.0.0 which might have disabled stack guard. The VM will try to fix the stack guard now.
    It's highly recommended that you fix the library with 'execstack -c ', or link it with '-z noexecstack'.
    localhost]
    sed: -e expression #1, char 6: unknown option to `s'
    -c: Unknown cipher type 'cd'
    now.: ssh: Could not resolve hostname now.: Name or service not known
    guard.: ssh: Could not resolve hostname guard.: Name or service not known
    noexecstack'.: ssh: Could not resolve hostname noexecstack'.: Name or service not known
    localhost: namenode running as process 3662. Stop it first.

    recommended: ssh: connect to host recommended port 22: Connection refused
    stack: ssh: connect to host stack port 22: Connection refused
    which: ssh: connect to host which port 22: Connection refused
    will: ssh: connect to host will port 22: Connection refused
    VM: ssh: connect to host VM port 22: Connection refused
    'execstack: ssh: Could not resolve hostname 'execstack: Name or service not known
    warning:: ssh: Could not resolve hostname warning:: Name or service not known
    ',: ssh: Could not resolve hostname ',: Name or service not known
    '-z: ssh: Could not resolve hostname '-z: Name or service not known
    It's: ssh: Could not resolve hostname It's: Name or service not known
    you: ssh: connect to host you port 22: Connection refused
    Server: ssh: connect to host Server port 22: Connection refused
    or: ssh: connect to host or port 22: Connection refused
    link: ssh: connect to host link port 22: Connection refused
    to: ssh: connect to host to port 22: Connection refused
    VM: ssh: connect to host VM port 22: Connection refused
    localhost: datanode running as process 3770. Stop it first.
    Starting secondary namenodes [OpenJDK 64-Bit Server VM warning: You have loaded library /usr/local/hadoop/lib/native/libhadoop.so.1.0.0 which might have disabled stack guard. The VM will try to fix the stack guard now.
    It's highly recommended that you fix the library with 'execstack -c ', or link it with '-z noexecstack'.
    0.0.0.0]
    sed: -e expression #1, char 6: unknown option to `s'
    guard.: ssh: Could not resolve hostname guard.: Name or service not known
    now.: ssh: Could not resolve hostname now.: Name or service not known
    -c: Unknown cipher type 'cd'
    noexecstack'.: ssh: Could not resolve hostname noexecstack'.: Name or service not known
    The authenticity of host '0.0.0.0 (0.0.0.0)' can't be established.
    ECDSA key fingerprint is d0:d6:21:4f:c3:12:f5:ed:af:2d:06:e9:b4:28:fd:90.
    Are you sure you want to continue connecting (yes/no)? 64-Bit: ssh: connect to host 64-Bit port 22: Connection refused
    loaded: ssh: connect to host loaded port 22: Connection refused
    stack: ssh: connect to host stack port 22: Connection refused
    it: ssh: connect to host it port 22: Connection refused
    with: ssh: connect to host with port 22: Connection refused
    with: ssh: connect to host with port 22: Connection refused
    warning:: ssh: Could not resolve hostname warning:: Name or service not known
    It's: ssh: Could not resolve hostname It's: Name or service not known
    ',: ssh: Could not resolve hostname ',: Name or service not known
    'execstack: ssh: Could not resolve hostname 'execstack: Name or service not known
    '-z: ssh: Could not resolve hostname '-z: Name or service not known

    ReplyDelete
    Replies
    1. 1.http://askubuntu.com/questions/144433/how-to-install-hadoop
      2.http://bigdatahandler.com/hadoop-hdfs/installing-single-node-hadoop-2-2-0-on-ubuntu/
      rely useful links ...ref these above links and ref this blog simultaneously u will successfully install hadoop on linux...

      Delete
  64. Really very nice and useful tutorial.. Thanks.

    ReplyDelete
  65. Very nice guide! This is the best! Thank you so, so very much!!!

    ReplyDelete
  66. Replies
    1. 1.http://askubuntu.com/questions/144433/how-to-install-hadoop
      2.http://bigdatahandler.com/hadoop-hdfs/installing-single-node-hadoop-2-2-0-on-ubuntu/

      Delete
  67. can we add hadoop 1.2.0 and hadoop 2.2. under the same new user , as created before for the purpose of hadoop-1.2.0, please reply soon .

    ReplyDelete
  68. Hi Zhi

    I followed your instruction and the nodemanager kept crashing. I received the following error in the nodemanager log file.

    FATAL org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Failed to initialize mapreduce_shuffle
    java.lang.RuntimeException: No class defiend for mapreduce_shuffle
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.init(AuxServices.java:94)

    I was able to fix the situation by changing the yarn-site.xml configuration from "mapreduce_shuffle" to "mapreduce.shuffle".
    This is the exact configuration I used in yarn-site.xml:

    <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce.shuffle</value>
    </property>
    <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>

    ReplyDelete
  69. If you want to install it on Mac OS Mavericks 10.9 you should put the export paths in .profile in the user you created ex. hduser. If it's not there(most probably) just create it: $touch .profile
    you should create it in /Users/hduser
    except that everything is the same, but be careful when you copy-paste inside the .xml files: you should paste it between <configuration> and </configuration>

    ReplyDelete
  70. This comment has been removed by the author.

    ReplyDelete
  71. I am new to Hadoop i am trying to install the same in Ubuntu 14 , i followed all the steps but i am stuck at downloading and installing hadoop .
    Error i am getting is "Connecting to www.trieuvan.com (www.trieuvan.com)|66.201.46.168|:80... failed: Connection refused."
    Even i tried to install other url's none of them is working .Please tell me where i am going wrong

    ReplyDelete
    Replies
    1. http://bigdatahandler.com/hadoop-hdfs/installing-single-node-hadoop-2-2-0-on-ubuntu/
      http://askubuntu.com/questions/144433/how-to-install-hadoop
      check out these links
      i guess u r getting error 404 ...
      u can download hadoop.tar file from http://www.apache.org/dyn/closer.cgi/hadoop/core

      Delete
  72. This comment has been removed by a blog administrator.

    ReplyDelete
  73. Really nice, awesome tutorial, thanks a lot.. successfully installed hadoop on my laptop just now

    ReplyDelete
  74. hi

    i didn't get datanode ..


    i am getting only 5nodes:
    hduser@ubuntu:~$ jps
    3713 ResourceManager
    3522 SecondaryNameNode
    3931 NodeManager
    7899 Jps
    3055 NameNode
    hduser@ubuntu:~$

    ReplyDelete
  75. This comment has been removed by a blog administrator.

    ReplyDelete
  76. This is the best Hadoop setup tutorial I was able to find. Thank you.

    ReplyDelete
  77. Hi Everyone,
    It's great Tutorial!
    I'm new & trying Hadoop 2.4 & jdk8 on ubuntu14.04. Actually I follow the site: http://www.sybaris.ca/big-data-notes/installing-hadoop . later I find this awesome tutiorial. :) Both seems similar. I've followed above the Configuration Section. when I try this

    hduser@rony-laptop:~$ hadoop version # i get the following error
    /usr/lib/jvm/jdk//bin/java: error while loading shared libraries: libjli.so: cannot open shared object file:No such file or directory

    I've also execute permission issue problem that is solved. please help me to overcome this problem.

    ReplyDelete
  78. This comment has been removed by a blog administrator.

    ReplyDelete
  79. This comment has been removed by a blog administrator.

    ReplyDelete
  80. This comment has been removed by a blog administrator.

    ReplyDelete
  81. My datanode, namenode and secondary nodes are not starting. When i run start-all.sh it shows:
    This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
    Starting namenodes on [localhost]
    localhost: starting namenode, logging to /usr/local/hadoop1/logs/hadoop-hduser4-namenode-ubuntu.out
    localhost: starting datanode, logging to /usr/local/hadoop1/logs/hadoop-hduser4-datanode-ubuntu.out
    Starting secondary namenodes [0.0.0.0]
    0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop1/logs/hadoop-hduser4-secondarynamenode-ubuntu.out
    starting yarn daemons
    starting resourcemanager, logging to /usr/local/hadoop1/logs/yarn-hduser4-resourcemanager-ubuntu.out
    localhost: starting nodemanager, logging to /usr/local/hadoop1/logs/yarn-hduser4-nodemanager-ubuntu.out

    But on running jps, it shows only Resource Manager, JPS and Node Manager

    What do i do? Plz help

    ReplyDelete
  82. I was trying to search regarding hadoop setting and its installation for last several days. Finally your post helps me a lot.
    Anobody intrested in Hadoop Training so please check https://intellipaat.com/

    ReplyDelete
  83. Best web hosting companies. Reviews, rates, statistics of top hosting companies.
    Find best hosting company at HostingCompaniesz.com

    ReplyDelete
  84. Thanks for InformationHadoop Course will provide the basic concepts of MapReduce applications developed using Hadoop, including a close look at framework components, use of Hadoop for a variety of data analysis tasks, and numerous examples of Hadoop in action. This course will further examine related technologies such as Hive, Pig, and Apache Accumulo. HADOOP Online Training

    ReplyDelete
  85. "HI, thanks you so much for publishing this tutorial. Highly impressive one. In this tutorial we will be able to know step by step process for setting up a Hadoop Single Node cluster, so that we can play around with the framework and learn more about it. I would also like to suggest the newbies who are looking for more info regarding Hadoop, they may visit this page as well- https://intellipaat.com/
    ."

    ReplyDelete
  86. Hi this is raj i am having 3 years of experience as a php developer and i am certified. i have knowledge on OOPS concepts in php but dont know indepth. After learning hadoop will be enough to get a good career in IT with good package? and i crossed hadoop training in chennai website where someone please help me to identity the syllabus covers everything or not??
    Thanks,
    raj

    ReplyDelete
  87. Fusion Homes present Fusion Homes Greater Noida West at Fusion Homes Greater Noida West Review with best Fusion Homes Greater Noida West

    ReplyDelete
  88. Arihan Ambar presents 2/3 BHK #residential apartments in Noida Extension. Arihant Ambar will be developed on the modern concept of urbanliving.

    ReplyDelete
  89. Supertech Azaliya New project in Gurgaon, hues location sector 68 Sohna Road Gurgaon. 1/2 BHK Semi luxury Apartments. Call Now +91-9582251924

    ReplyDelete
  90. Amrapali Group Amrapali Verona Heights Promoted by Property30.com

    ReplyDelete
  91. Supertech Limited Launching new address Supertech Azaliya for home purchasers in Gurgaon Sector 68 Gurgaon.

    ReplyDelete
  92. Now after a long time the developer comes up with one of the finest residential development project in Greater Noida West which will be named Arihant Ambar.

    ReplyDelete
  93. Bring your dream home to reality with plush, green and well located Supertech Aadri at Sector 79 Gurgaon.

    ReplyDelete
  94. Hi,
    Thanks for giving all the details and valuable information about Hadoop

    ReplyDelete
  95. this is very nice article and very good information for Oracle ADF Learners. our Cubtraining also provide all Oracle Courses

    ReplyDelete
  96. "HI, thanks for publishing this tutorial. Highly impressive and informative.In this tutorial will be able to know step by step process for setting up a Hadoop Single Node cluster.I would also like to suggest the beginners who are looking for more info regarding Hadoop, they may visit this page as well- https://www.hadooponlinetutor.com/
    ."

    ReplyDelete
  97. Thanks a lot for writing this blog. I used it step by step and everything worked well except the link for downloading hadoop 2.2.0. If you want please update it to "https://archive.apache.org/dist/hadoop/core/hadoop-2.2.0/hadoop-2.2.0.tar.gz".
    I used the same version of hadoop just to be on the safe side. I hope this works well with the newer versions as well. If it does not, it'd be really helpful if you can write one blog for the latest stable release as well.
    Thanks again.

    ReplyDelete
  98. mkdir: cannot create directory `/logs': Permission denied
    chown: cannot access `/logs': No such file or directory
    starting resourcemanager, logging to /logs/yarn-hduser-resourcemanager-ubuntu.out
    /usr/local/hadoop/sbin/yarn-daemon.sh: line 124: /logs/yarn-hduser-resourcemanager-ubuntu.out: No such file or directory
    head: cannot open `/logs/yarn-hduser-resourcemanager-ubuntu.out' for reading: No such file or directory
    /usr/local/hadoop/sbin/yarn-daemon.sh: line 129: /logs/yarn-hduser-resourcemanager-ubuntu.out: No such file or directory
    /usr/local/hadoop/sbin/yarn-daemon.sh: line 130: /logs/yarn-hduser-resourcemanager-ubuntu.out: No such file or directory
    localhost: mkdir: cannot create directory `/logs': Permission denied
    localhost: chown: cannot access `/logs': No such file or directory
    localhost: starting nodemanager, logging to /logs/yarn-hduser-nodemanager-ubuntu.out
    localhost: /usr/local/hadoop/sbin/yarn-daemon.sh: line 124: /logs/yarn-hduser-nodemanager-ubuntu.out: No such file or directory
    localhost: head: cannot open `/logs/yarn-hduser-nodemanager-ubuntu.out' for reading: No such file or directory
    localhost: /usr/local/hadoop/sbin/yarn-daemon.sh: line 129: /logs/yarn-hduser-nodemanager-ubuntu.out: No such file or directory
    localhost: /usr/local/hadoop/sbin/yarn-daemon.sh: line 130: /logs/yarn-hduser-nodemanager-ubuntu.out: No such file or directory

    Hey guys, please help me: this error comes up while I am starting YARN.
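
A likely cause, though not confirmed in this thread, is that the YARN environment variables were not set in the shell that ran start-yarn.sh, so the daemon scripts fall back to a log directory of /logs, which hduser cannot create. Assuming the stock Hadoop 2.2 scripts derive YARN_LOG_DIR from HADOOP_YARN_HOME (which the bare /logs path suggests), a minimal workaround sketch is to re-login (or source ~/.bashrc) so the variables are picked up, or to set them explicitly before starting the daemons:

$ export HADOOP_YARN_HOME=/usr/local/hadoop     # install path shown in the error output above
$ export YARN_LOG_DIR=$HADOOP_YARN_HOME/logs    # any directory writable by hduser works
$ start-yarn.sh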

  100. Thanks for this great tutorial. It helped a lot while setting up a single-node Hadoop cluster.

    Thanks.

  149. These steps work like a wonder! Thanks so much for the wonderful article.

  150. Amazingly simple tutorial if followed patiently and accurately! My configuration: Ubuntu 14.04 running in a virtual machine (VMware Player) on a Windows 10 laptop, trying Hadoop 2.6.0.

    Suggestion: try using vim instead of vi for editing all those files; it is much simpler. To do so (a short sketch follows this list):
    - Install vim if you don't have it already: sudo apt-get install vim
    - Wherever the article asks you to run "vi filename", run "vim filename" instead
    - Once the file is open, move the cursor to where you need to edit using the arrow keys
    - Press "i" to enter insert mode
    - Make the required change
    - Press "Esc", then type :wq to save the changes and quit the file
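
As a quick sketch of that workflow (the file name below is only an example; substitute whichever file the step asks you to edit):

$ sudo apt-get install vim    # skip this if vim is already installed
$ vim core-site.xml           # example file; opens it in vim
# press i to enter insert mode, make the change,
# then press Esc and type :wq to save and quit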
