Install and Configure HADOOP on OsX
by BoxOfSugar in Circuits > Apple
Installing Hadoop on OSX
I decided to set up a Hadoop cluster on the Macs I run, mainly because Xgrid is no longer available in the new version of OS X. I have set up SGE clusters before, as well as Xgrid and Microsoft Cluster Server, so I wanted to get Hadoop under my belt too. This isn't the definitive guide, but it worked fairly well for me. I am still not sure of some of the concepts, but that will come with practice.
The first step is to make sure you have the basics:
the Xcode command line tools and the Java Developer package for your version of OS X.
https://developer.apple.com/downloads/index.action
Let's first create a group and a user on every machine.
Create a group named 'hadoop' and then add an admin user 'hadoopadmin' to the group.
Let's do everything as hadoopadmin to make it easy.
You can download Hadoop and install it yourself, but I took a shortcut and used Homebrew to install it.
-> brew install hadoop
This sets all your env paths in the proper Hadoop config files, which is a help.
Once it is installed, let's set up the Hadoop config files.
I named my first two machines hadoop01 and hadoop02.
Configure the masters and slaves files on all machines.
masters:
hadoopadmin@hadoop01
slaves:
hadoopadmin@hadoop01
hadoopadmin@hadoop02
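As a minimal sketch, the two files above could be written from the command line like this. CONF_DIR is a hypothetical variable I am introducing here, and it defaults to a scratch directory so the snippet can be tried without touching a real install. Point it at your Hadoop config directory to use it for real.

```shell
# Assumption: CONF_DIR is a hypothetical variable; it defaults to a scratch
# directory so nothing in a real Hadoop install is overwritten by accident.
CONF_DIR="${CONF_DIR:-$(mktemp -d)}"

# masters: one entry per line
printf '%s\n' 'hadoopadmin@hadoop01' > "$CONF_DIR/masters"

# slaves: one entry per line
printf '%s\n' 'hadoopadmin@hadoop01' 'hadoopadmin@hadoop02' > "$CONF_DIR/slaves"

echo "Wrote masters and slaves to $CONF_DIR"
```

Running the same snippet on each machine keeps the two files identical across the cluster.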
Also configure /etc/hosts on all machines.
#
# localhost is used to configure the loopback interface
# when the system is booting. Do not change this entry.
##
127.0.0.1 localhost
255.255.255.255 broadcasthost
::1 localhost
fe80::1%lo0 localhost
#
#
#
# hadoop
132.235.132.67 hadoop01
132.235.132.46 hadoop02
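If you would rather not hand-edit the hosts file on every machine, something like the following sketch could append the two entries only when they are missing. HOSTS_FILE and add_host are names I made up for illustration. HOSTS_FILE defaults to a scratch file, and for real use you would set it to /etc/hosts and run the script with sudo.

```shell
# Assumption: HOSTS_FILE is a hypothetical variable; it defaults to a scratch
# file so the real /etc/hosts is left alone unless you ask for it.
HOSTS_FILE="${HOSTS_FILE:-$(mktemp)}"

# add_host IP HOSTNAME (hypothetical helper)
# Appends the entry only if no line already ends with that hostname.
add_host() {
  if ! grep -qE "[[:space:]]$2\$" "$HOSTS_FILE" 2>/dev/null; then
    printf '%s %s\n' "$1" "$2" >> "$HOSTS_FILE"
  fi
}

add_host 132.235.132.67 hadoop01
add_host 132.235.132.46 hadoop02
add_host 132.235.132.67 hadoop01   # repeated call changes nothing
```

Because the helper checks before appending, it is safe to run more than once on the same machine.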
I am using Hadoop 2.4.0, so the Hadoop config files are located in
/usr/local/Cellar/hadoop/2.4.0/libexec/etc/hadoop
Edit
hadoop-env.sh
I changed these two lines.
#export JAVA_HOME="$(/usr/libexec/java_home)"
to
export JAVA_HOME=`/usr/libexec/java_home -v 1.6`
and
#export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true"
to
export HADOOP_OPTS="-Djava.security.krb5.realm= -Djava.security.krb5.kdc="
This last one stopped an error I was getting upon startup.
Edit
hdfs-site.xml
Insert this configuration
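<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>
<property>
  <name>dfs.name.dir</name>
  <value>/usr/local/Cellar/hadoop/2.4.0/hdfs/name</value>
</property>
<property>
  <name>dfs.data.dir</name>
  <value>/usr/local/Cellar/hadoop/2.4.0/hdfs/data</value>
</property>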
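Hadoop's site files are XML, with each setting wrapped in a property element inside a configuration element. As a rough sketch of what the finished hdfs-site.xml might look like, the script below generates one with the three settings above. CONF_DIR, HADOOP_BASE and the property helper are hypothetical names of mine, and the file is written to a scratch directory by default rather than over a real config.

```shell
# Assumption: CONF_DIR and HADOOP_BASE are hypothetical variables. CONF_DIR
# defaults to a scratch directory; HADOOP_BASE is the install path used in
# this guide.
CONF_DIR="${CONF_DIR:-$(mktemp -d)}"
HADOOP_BASE="/usr/local/Cellar/hadoop/2.4.0"

# property NAME VALUE (hypothetical helper)
# Prints one <property> element. Values are not XML-escaped, which is fine
# for the plain numbers and paths used here.
property() {
  printf '  <property>\n    <name>%s</name>\n    <value>%s</value>\n  </property>\n' "$1" "$2"
}

{
  printf '<?xml version="1.0"?>\n<configuration>\n'
  property dfs.replication 3
  property dfs.name.dir "$HADOOP_BASE/hdfs/name"
  property dfs.data.dir "$HADOOP_BASE/hdfs/data"
  printf '</configuration>\n'
} > "$CONF_DIR/hdfs-site.xml"

cat "$CONF_DIR/hdfs-site.xml"
```

The same helper would work for the settings in mapred-site.xml and core-site.xml by changing the names, values and output file.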
Edit
mapred-site.xml.template
Insert
<property>
  <name>mapred.job.tracker</name>
  <value>hadoop01:9001</value>
</property>
Edit
core-site.xml
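<property>
  <name>fs.default.name</name>
  <value>hdfs://hadoop01:9000</value>
</property>
<property>
  <name>hadoop.tmp.dir</name>
  <value>/usr/local/Cellar/hadoop/2.4.0/tmp</value>
</property>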
Now let's create a few Hadoop directories. These are ordinary local directories matching the paths in the config files above, so from inside
/usr/local/Cellar/hadoop/2.4.0
a plain mkdir does the job:
-> mkdir tmp
-> mkdir hdfs
-> mkdir hdfs/name
-> mkdir hdfs/data
I enabled passwordless SSH on all machines.
ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
I found info on this at
http://stackoverflow.com/questions/7134535/setup-passphraseless-ssh-to-localhost-on-os-x
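For the master to log in to the other nodes without a password, its public key also has to end up in authorized_keys on each of them, and sshd is picky about permissions on those files. Here is a minimal sketch of a helper for that step. SSH_DIR and authorize_key are hypothetical names I am using for illustration, and SSH_DIR defaults to a scratch directory, so set it to ~/.ssh for real use.

```shell
# Assumption: SSH_DIR is a hypothetical variable; it defaults to a scratch
# directory so a real ~/.ssh is not modified by accident.
SSH_DIR="${SSH_DIR:-$(mktemp -d)}"

# authorize_key PUBKEY_FILE (hypothetical helper)
# Appends the public key to authorized_keys once, then tightens permissions.
authorize_key() {
  mkdir -p "$SSH_DIR"
  touch "$SSH_DIR/authorized_keys"
  # -x whole line, -F fixed string, -f read the pattern from the key file
  if ! grep -qxF -f "$1" "$SSH_DIR/authorized_keys"; then
    cat "$1" >> "$SSH_DIR/authorized_keys"
  fi
  chmod 700 "$SSH_DIR"
  chmod 600 "$SSH_DIR/authorized_keys"
}
```

One way to use it: copy id_dsa.pub from hadoop01 to hadoop02, then on hadoop02 run authorize_key with the path to that copied file and SSH_DIR set to ~/.ssh.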
I then formatted the name node
-> hadoop namenode -format
Then started hadoop by running
/usr/local/Cellar/hadoop/2.4.0/libexec/sbin/start-all.sh
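A quick way to see whether everything came up is jps, which ships with the JDK and lists running Java processes. The sketch below reports which daemons are missing from that list. The missing_daemons helper is a hypothetical name of mine, and the daemon list is my assumption for a node that runs everything, the way hadoop01 is both master and slave here. A slave-only node would normally show just DataNode and NodeManager.

```shell
# missing_daemons (hypothetical helper)
# Reads jps-style output on stdin and prints each expected daemon that is
# not present. The daemon list is an assumption for an all-in-one node.
missing_daemons() {
  listing=$(cat)
  for d in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
    # -w matches whole words, so SecondaryNameNode does not count as NameNode
    printf '%s\n' "$listing" | grep -qw "$d" || printf '%s\n' "$d"
  done
}

# Only query the process list when jps is actually installed
if command -v jps >/dev/null 2>&1; then
  jps | missing_daemons
fi
```

No output means all five daemons were found. Any names printed are the ones worth looking up in the Hadoop logs.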
I did all of this on all my machines, although I think some of these steps do not need to be repeated on every one.
I have to thank
http://stackoverflow.com &
http://dennyglee.com
for tutorials and help getting through this.
Thanks
Joe Murphy
AKA Grehyton