Hadoop Cluster Installation

Configure one machine fully first, then rsync everything to each slave. Start by installing on the Namenode (master).

1. Install Java and raise system limits
  [root@hadoop src]# java -version
  java version "1.7.0_21"
  Java(TM) SE Runtime Environment (build 1.7.0_21-b11)
  Java HotSpot(TM) 64-Bit Server VM (build 23.21-b01, mixed mode)
Raise the system open-file limit and the maximum number of processes per user.
Add to /etc/security/limits.conf:
  hadoop  -  nofile  32768
  hadoop  -  nproc   32000
Add this line to /etc/pam.d/common-session:
  session required pam_limits.so
For the change to take effect, log out and start a new login shell.
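To confirm the new limits actually apply (a quick check; the values are the ones set above), open a fresh login session as the hadoop user and inspect ulimit:
  # pam_limits is only applied on a new login session
  su - hadoop -c 'ulimit -n; ulimit -u'
  # Expected: 32768 (open files) and 32000 (max user processes)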

2. Create the hadoop user
  useradd hadoop
3. SSH setup
  ssh-keygen -t rsa -f /root/.ssh/id_rsa
  cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
  Copy to each of the other machines: ssh-copy-id -i /root/.ssh/id_rsa.pub root@IP
Note: make sure the namenode and the datanodes can SSH to *each other* without a password.
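A quick way to verify passwordless access in both directions (the hostnames are the ones configured later in /etc/hosts):
  # From the namenode: should print the slave's hostname with no password prompt
  ssh root@hadoop.node1 hostname
  # From the slave: should print the namenode's hostname with no password prompt
  ssh root@hadoop.namenode hostname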

4. Install Hadoop
Unpack Hadoop under /usr/local/:
  cd /usr/local/
  tar zxvf /usr/local/src/hadoop-1.2.1.tar.gz
Set the owner and group of the Hadoop directory:
  chown -R hadoop:hadoop hadoop-1.2.1/
5. Hadoop configuration
Add environment variables to /etc/profile:
  export HADOOP_INSTALL=/usr/local/hadoop-1.2.1
  export PATH=$PATH:$HADOOP_INSTALL/bin
  export PATH=$PATH:$JAVA_HOME/bin
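After editing /etc/profile, reload it and confirm the hadoop command resolves (a quick sanity check):
  source /etc/profile
  # Should print /usr/local/hadoop-1.2.1/bin/hadoop
  which hadoop
  # Should report version 1.2.1
  hadoop version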
Modify the following in hadoop-env.sh:
  export JAVA_HOME=/usr/java/jdk1.7.0_21
  export HADOOP_HEAPSIZE=2000
  # Log path; to make future upgrades easier, keep it outside the Hadoop directory
  export HADOOP_LOG_DIR=/usr/local/logs
  # Identifier used in log file names
  export HADOOP_IDENT_STRING=hadoop.namenode
  # SSH options (port 22 by default; a non-standard port must be given explicitly)
  export HADOOP_SSH_OPTS="-p 10000 -o ConnectTimeout=1"
  #export HADOOP_MASTER=namenode:/usr/local/hadoop-1.2.1
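Since HADOOP_LOG_DIR points outside the Hadoop tree, make sure the directory exists and is writable by the hadoop user before the daemons start (a minimal sketch):
  mkdir -p /usr/local/logs
  chown -R hadoop:hadoop /usr/local/logs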
Add to core-site.xml:
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop/tmp</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://hadoop.namenode/</value>
    <final>true</final>
  </property>
Changing the tmp directory is recommended. The various data directories need not be created in advance; Hadoop creates them automatically at startup.

hdfs-site.xml
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/usr/local/hadoop/hdfs/name</value>
    <final>true</final>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/usr/local/hadoop/hdfs/data</value>
    <final>true</final>
  </property>
  <property>
    <name>fs.checkpoint.dir</name>
    <value>/usr/local/hadoop/hdfs/namesecondary</value>
    <final>true</final>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.support.append</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.support.broken.append</name>
    <value>true</value>
  </property>
Note: dfs.name.dir may list two directories; for safety, one can be an NFS mount and one local, separated by a comma, in this form:
  <property>
    <name>dfs.name.dir</name>
    <value>/usr/local/hadoop/hdfs/name,/data7/hadoop/hdfs/name</value>
    <final>true</final>
  </property>
mapred-site.xml
  <property>
    <name>mapred.job.tracker</name>
    <value>hadoop.namenode:8021</value>
  </property>
  <property>
    <name>mapred.local.dir</name>
    <value>/usr/local/hadoop/mapred/local</value>
    <final>true</final>
  </property>
  <property>
    <name>mapred.system.dir</name>
    <value>/usr/local/hadoop/mapred/system</value>
    <final>true</final>
  </property>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>7</value>
    <final>true</final>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>7</value>
    <final>true</final>
  </property>
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx400m</value>
  </property>
About mapred.job.tracker: on a larger cluster, a dedicated server is normally assigned as the jobtracker to run MapReduce. On a small cluster where the jobtracker and namenode share one machine, it can be set to localhost:8021.

6. Node configuration
Add each node's information to /etc/hosts:
  192.168.0.31   hadoop.namenode
  192.168.0.32   hadoop.node1
Edit the NameNode and DataNode node files, masters and slaves, which live in the conf directory by default. (The slave side needs no separate setup here; the files can simply be synced over.)
The nodes can be listed either all by IP or all by hostname:
  [root@hadoop conf]# cat masters
  192.168.0.31
  [root@hadoop conf]# cat slaves
  192.168.0.32
7. Sync to the slaves
The entire /usr/local/hadoop-1.2.1/ directory can be pushed straight to each slave:
  rsync -av --progress /usr/local/hadoop-1.2.1 root@hadoop.node1:/usr/local/
On each slave, run:
  useradd hadoop
  chown -R hadoop:hadoop /usr/local/hadoop-1.2.1
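With several slaves, the sync and user setup can be looped; a sketch, where hadoop.node2 is a hypothetical extra slave following the naming used above:
  for node in hadoop.node1 hadoop.node2; do
      rsync -av --progress /usr/local/hadoop-1.2.1 root@${node}:/usr/local/
      ssh root@${node} 'useradd hadoop; chown -R hadoop:hadoop /usr/local/hadoop-1.2.1'
  done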
8. Format the filesystem (run on the master/NameNode)
HDFS must be formatted before use. The metadata directory is the dfs.name.dir setting from hdfs-site.xml above; change it from the default, which sits on the / partition, to avoid running out of filesystem space.
  [root@hadoop hdfs]# hadoop namenode -format
  13/10/18 01:17:01 INFO namenode.NameNode: STARTUP_MSG:
  /************************************************************
  STARTUP_MSG: Starting NameNode
  STARTUP_MSG:   host = hadoop.namenode/127.0.0.1
  STARTUP_MSG:   args = [-format]
  STARTUP_MSG:   version = 1.2.1
  STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2 -r 1503152; compiled by 'mattf' on Mon Jul 22 15:23:09 PDT 2013
  STARTUP_MSG:   java = 1.7.0_21
  ************************************************************/
  13/10/18 01:17:01 INFO util.GSet: Computing capacity for map BlocksMap
  13/10/18 01:17:01 INFO util.GSet: VM type       = 64-bit
  13/10/18 01:17:01 INFO util.GSet: 2.0% max memory = 1864171520
  13/10/18 01:17:01 INFO util.GSet: capacity      = 2^22 = 4194304 entries
  13/10/18 01:17:01 INFO util.GSet: recommended=4194304, actual=4194304
  13/10/18 01:17:01 INFO namenode.FSNamesystem: fsOwner=root
  13/10/18 01:17:01 INFO namenode.FSNamesystem: supergroup=supergroup
  13/10/18 01:17:01 INFO namenode.FSNamesystem: isPermissionEnabled=true
  13/10/18 01:17:01 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
  13/10/18 01:17:01 WARN namenode.FSNamesystem: The dfs.support.append option is in your configuration, however append is not supported. This configuration option is no longer required to enable sync
  13/10/18 01:17:01 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
  13/10/18 01:17:01 INFO namenode.FSEditLog: dfs.namenode.edits.toleration.length = 0
  13/10/18 01:17:01 INFO namenode.NameNode: Caching file names occuring more than 10 times
  13/10/18 01:17:01 INFO common.Storage: Image file /usr/local/hadoop/hdfs/name/current/fsimage of size 110 bytes saved in 0 seconds.
  13/10/18 01:17:01 INFO namenode.FSEditLog: closing edit log: position=4, editlog=/usr/local/hadoop/hdfs/name/current/edits
  13/10/18 01:17:01 INFO namenode.FSEditLog: close success: truncate to 4, editlog=/usr/local/hadoop/hdfs/name/current/edits
  13/10/18 01:17:01 INFO common.Storage: Storage directory /usr/local/hadoop/hdfs/name has been successfully formatted.
  13/10/18 01:17:01 INFO namenode.NameNode: SHUTDOWN_MSG:
  /************************************************************
  SHUTDOWN_MSG: Shutting down NameNode at hadoop.namenode/127.0.0.1
  ************************************************************/
9. Start Hadoop
If the environment variables were set earlier, you can run directly:
  start-all.sh
Or start the services separately:
  start-dfs.sh
  start-mapred.sh
10. Verify Hadoop is running correctly
On a healthy start, the master runs the NameNode, JobTracker, and SecondaryNameNode processes,
and each slave runs the DataNode and TaskTracker processes.
  [root@hadoop.namenode ~]# jps
  6366 Jps
  4250 JobTracker
  3997 NameNode
  4157 SecondaryNameNode

  [root@hadoop.node1 conf]# jps
  15373 TaskTracker
  15268 DataNode
  17443 Jps
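Besides jps, you can check that the master daemons are listening on their usual ports (50070 = NameNode web UI, 50030 = JobTracker web UI, 8021 = JobTracker RPC as configured above):
  netstat -tlnp | egrep '50070|50030|8021'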
Or check the state of HDFS:
  [root@hadoop logs]# hadoop dfsadmin -report
  Configured Capacity: 158107308032 (147.25 GB)
  Present Capacity: 149379117071 (139.12 GB)
  DFS Remaining: 149379088384 (139.12 GB)
  DFS Used: 28687 (28.01 KB)
  DFS Used%: 0%
  Under replicated blocks: 1
  Blocks with corrupt replicas: 0
  Missing blocks: 0
  -------------------------------------------------
  Datanodes available: 1 (1 total, 0 dead)
  Name: 192.168.0.32:50010
  Decommission Status : Normal
  Configured Capacity: 158107308032 (147.25 GB)
  DFS Used: 28687 (28.01 KB)
  Non DFS Used: 8728190961 (8.13 GB)
  DFS Remaining: 149379088384 (139.12 GB)
  DFS Used%: 0%
  DFS Remaining%: 94.48%
  Last contact: Fri Oct 18 17:31:18 CST 2013
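A simple read/write smoke test against HDFS (the path names here are arbitrary):
  hadoop fs -mkdir /test
  hadoop fs -put /etc/hosts /test/hosts
  hadoop fs -ls /test
  hadoop fs -cat /test/hosts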
Open http://hadoop.namenode:50030/jobtracker.jsp to check node status.
The node count shown corresponds to the number of slaves.
[Screenshot: node1.png, JobTracker page showing the active node]
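If no browser is handy, the web UIs can also be probed from the shell (assuming hadoop.namenode resolves as set in /etc/hosts):
  # Both should return HTTP 200
  curl -s -o /dev/null -w '%{http_code}\n' http://hadoop.namenode:50030/jobtracker.jsp
  curl -s -o /dev/null -w '%{http_code}\n' http://hadoop.namenode:50070/dfshealth.jsp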
阿铭 (admin) replied: Good stuff.
nihao426181 replied: I can't follow this…

回复帖子,请先登录注册

退出全屏模式 全屏模式 回复
评分
可选评分理由: