Hadoop Cluster Installation
Configure one machine fully first, then rsync it out to the slaves; install on the Namenode (master) first. In outline:
1. Install Java and raise the system limits on open files and on the maximum number of processes per user: add the limits to /etc/security/limits.conf, add the pam_limits line to /etc/pam.d/common-session, then log in to a fresh shell for them to take effect.
2. Create the hadoop user.
3. Configure SSH. Note: make sure the namenode and the datanodes can reach *each other* without a password.
4. Install Hadoop: unpack it under /usr/local/ and set the owner and group of the Hadoop directory.
5. Configure Hadoop: add environment variables to /etc/profile; edit hadoop-env.sh; fill in core-site.xml (changing the tmp directory is recommended; the various data directories need not be created in advance — Hadoop creates them on startup), hdfs-site.xml (note: dfs.name.dir may list two comma-separated directories; for safety one can be on NFS and one local) and mapred-site.xml (for mapred.job.tracker, a larger cluster normally dedicates a server as the jobtracker to run MapReduce; on a small cluster the jobtracker and namenode share one machine, configured as localhost:8021).
6. Node setup: add every node to /etc/hosts and edit the masters and slaves files, by default in the conf directory (the slave side does not need them set, though syncing the files over does no harm). List the nodes either all by IP or all by hostname.
7. Sync to the slaves: the /usr/local/hadoop-1.2.1/ directory can be pushed to the slaves as-is, then a couple of commands are run on each slave.
8. Format the filesystem (on the master). HDFS must be formatted before use; it lives under the dfs.name.dir directories from hdfs-site.xml, so adjust that setting to keep it off the default / partition, which would otherwise fill up.
9. Start Hadoop: with the environment variables set, the start scripts run directly; the daemons can also be started one by one.
10. Confirm Hadoop is running: a healthy start shows the NameNode, JobTracker and SecondaryNameNode processes on the master and DataNode and TaskTracker on each slave; alternatively check the HDFS status, or open http://hadoop.namenode:50030/jobtracker.jsp — the node count there should match the number of slaves.
1. Install Java and raise the system limits
- [root@hadoop src]# java -version
- java version "1.7.0_21"
- Java(TM) SE Runtime Environment (build 1.7.0_21-b11)
- Java HotSpot(TM) 64-Bit Server VM (build 23.21-b01, mixed mode)
Add to /etc/security/limits.conf:
- hadoop - nofile 32768
- hadoop soft nproc 32000
- hadoop hard nproc 32000
Add this line to /etc/pam.d/common-session so the limits take effect (then log in to a fresh shell):
- session required pam_limits.so
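Whether the new limits actually took effect is easiest to confirm from a freshly opened login shell; a minimal check:

```shell
# Values the current session actually received; open a new login shell
# after editing limits.conf, or these will still show the old limits.
soft_files=$(ulimit -Sn)   # open files (nofile), soft limit
procs=$(ulimit -u)         # max user processes (nproc)
echo "open files: $soft_files"
echo "max processes: $procs"
```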
2. Create the hadoop user
- useradd hadoop
3. Configure SSH (make sure the namenode and the datanodes can reach each other without a password)
- ssh-keygen -t rsa -f /root/.ssh/id_rsa
- cat ~/.ssh/id_rsa.pub >>~/.ssh/authorized_keys
- Copy to the other machines: ssh-copy-id -i /root/.ssh/id_rsa.pub root@IP
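Copying the key to several machines can be scripted; a dry-run sketch that only prints the commands (the node names are hypothetical — remove the echo to execute):

```shell
# Hypothetical node names; in practice read them from a file.
NODES="hadoop.node1 hadoop.node2"
for n in $NODES; do
  # echo makes this a dry run; remove it to actually push the key
  echo ssh-copy-id -i /root/.ssh/id_rsa.pub root@"$n"
done
```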
4. Install Hadoop
Unpack Hadoop under /usr/local/ and set the owner and group of the Hadoop directory:
- cd /usr/local/
- tar zxvf /usr/local/src/hadoop-1.2.1.tar.gz
- chown -R hadoop:hadoop hadoop-1.2.1/
5. Configure Hadoop
Add the environment variables to /etc/profile:
- export HADOOP_INSTALL=/usr/local/hadoop-1.2.1
- export PATH=$PATH:$HADOOP_INSTALL/bin
- export JAVA_HOME=/usr/java/jdk1.7.0_21
- export PATH=$PATH:$JAVA_HOME/bin
Edit hadoop-env.sh as follows (JAVA_HOME must be set here as well):
- export JAVA_HOME=/usr/java/jdk1.7.0_21
- export HADOOP_HEAPSIZE=2000
- # Log directory; to ease future upgrades, keep it outside the Hadoop directory
- export HADOOP_LOG_DIR=/usr/local/logs
- # Identifier used in the log file names
- export HADOOP_IDENT_STRING=hadoop.namenode
- # SSH options (port 22 by default; a non-standard port must be given explicitly)
- export HADOOP_SSH_OPTS="-p 10000 -o ConnectTimeout=1"
- #export HADOOP_MASTER=namenode:/usr/local/hadoop-1.2.1
Add to core-site.xml (changing the tmp directory is recommended):
- <configuration>
-   <property>
-     <name>hadoop.tmp.dir</name>
-     <value>/usr/local/hadoop/tmp</value>
-   </property>
-   <property>
-     <name>fs.default.name</name>
-     <value>hdfs://hadoop.namenode/</value>
-     <final>true</final>
-   </property>
- </configuration>
The various data directories need not be created in advance; Hadoop creates them when it starts.
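Hand-editing the *-site.xml files is error-prone; a small self-contained sanity check (using a throwaway sample file under /tmp) that every <property> stanza is closed:

```shell
# Write a sample of the core-site.xml settings above, then check that
# the number of opening and closing <property> tags matches.
f=/tmp/core-site.sample.xml
cat > "$f" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop/tmp</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://hadoop.namenode/</value>
    <final>true</final>
  </property>
</configuration>
EOF
opens=$(grep -c '<property>' "$f")
closes=$(grep -c '</property>' "$f")
if [ "$opens" -eq "$closes" ]; then
  echo "property tags balanced: $opens"
else
  echo "unbalanced: $opens open, $closes close"
fi
```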
hdfs-site.xml:
- <configuration>
-   <property>
-     <name>dfs.replication</name>
-     <value>1</value>
-   </property>
-   <property>
-     <name>dfs.name.dir</name>
-     <value>/usr/local/hadoop/hdfs/name</value>
-     <final>true</final>
-   </property>
-   <property>
-     <name>dfs.data.dir</name>
-     <value>/usr/local/hadoop/hdfs/data</value>
-     <final>true</final>
-   </property>
-   <property>
-     <name>fs.checkpoint.dir</name>
-     <value>/usr/local/hadoop/hdfs/namesecondary</value>
-     <final>true</final>
-   </property>
-   <property>
-     <name>dfs.webhdfs.enabled</name>
-     <value>true</value>
-   </property>
-   <property>
-     <name>dfs.support.append</name>
-     <value>true</value>
-   </property>
-   <property>
-     <name>dfs.support.broken.append</name>
-     <value>true</value>
-   </property>
- </configuration>
Note: dfs.name.dir can list two directories; for safety, one can be on NFS and one local, separated by a comma, as in:
- <property>
-   <name>dfs.name.dir</name>
-   <value>/usr/local/hadoop/hdfs/name,/data7/hadoop/hdfs/name</value>
-   <final>true</final>
- </property>
mapred-site.xml (for mapred.job.tracker, a larger cluster normally dedicates a server as the jobtracker to run MapReduce; on a small cluster the jobtracker and namenode share one machine, configured as localhost:8021):
- <configuration>
-   <property>
-     <name>mapred.job.tracker</name>
-     <value>hadoop.namenode:8021</value>
-   </property>
-   <property>
-     <name>mapred.local.dir</name>
-     <value>/usr/local/hadoop/mapred/local</value>
-     <final>true</final>
-   </property>
-   <property>
-     <name>mapred.system.dir</name>
-     <value>/usr/local/hadoop/mapred/system</value>
-     <final>true</final>
-   </property>
-   <property>
-     <name>mapred.tasktracker.map.tasks.maximum</name>
-     <value>7</value>
-     <final>true</final>
-   </property>
-   <property>
-     <name>mapred.tasktracker.reduce.tasks.maximum</name>
-     <value>7</value>
-     <final>true</final>
-   </property>
-   <property>
-     <name>mapred.child.java.opts</name>
-     <value>-Xmx400m</value>
-   </property>
- </configuration>
6. Node setup
Add every node to /etc/hosts:
- 192.168.0.31 hadoop.namenode
- 192.168.0.32 hadoop.node1
Edit the masters and slaves files, by default in the conf directory (the slave side does not need these set, though syncing the files over does no harm). List the nodes either all by IP or all by hostname:
- [root@hadoop conf]# cat masters
- 192.168.0.31
- [root@hadoop conf]# cat slaves
- 192.168.0.32
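Since /etc/hosts, masters and slaves must stay consistent, all three can be generated from a single list; a sketch using the example addresses above:

```shell
# One node per line as "IP name"; the first entry is the master.
nodes="192.168.0.31 hadoop.namenode
192.168.0.32 hadoop.node1"
printf '%s\n' "$nodes"                                   # /etc/hosts entries
printf '%s\n' "$nodes" | head -n 1 | awk '{print $2}'    # contents of masters
printf '%s\n' "$nodes" | tail -n +2 | awk '{print $2}'   # contents of slaves
```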
7. Sync to the slaves
The /usr/local/hadoop-1.2.1/ directory can be pushed to the slaves as-is:
- rsync -av --progress /usr/local/hadoop-1.2.1 root@hadoop.node1:/usr/local/
On the slave side, run:
- useradd hadoop
- chown -R hadoop:hadoop /usr/local/hadoop-1.2.1
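With more than one slave, the sync and the follow-up commands are easy to drive from a list; a dry-run sketch that only prints what would run (the IP is the example slave above — remove the echo to execute):

```shell
# The example slave from above; in practice: SLAVES=$(cat conf/slaves)
SLAVES="192.168.0.32"
for s in $SLAVES; do
  # echo makes these dry runs; remove it to execute
  echo rsync -av --progress /usr/local/hadoop-1.2.1 root@"$s":/usr/local/
  echo ssh root@"$s" "useradd hadoop; chown -R hadoop:hadoop /usr/local/hadoop-1.2.1"
done
```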
8. Format the filesystem (on the master)
HDFS must be formatted before first use. The filesystem lives under the dfs.name.dir directories set in hdfs-site.xml above; change that setting so the default location on the / partition does not run out of space.
- [root@hadoop hdfs]# hadoop namenode -format
- 13/10/18 01:17:01 INFO namenode.NameNode: STARTUP_MSG:
- /************************************************************
- STARTUP_MSG: Starting NameNode
- STARTUP_MSG: host = hadoop.namenode/127.0.0.1
- STARTUP_MSG: args = [-format]
- STARTUP_MSG: version = 1.2.1
- STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2 -r 1503152; compiled by 'mattf' on Mon Jul 22 15:23:09 PDT 2013
- STARTUP_MSG: java = 1.7.0_21
- ************************************************************/
- 13/10/18 01:17:01 INFO util.GSet: Computing capacity for map BlocksMap
- 13/10/18 01:17:01 INFO util.GSet: VM type = 64-bit
- 13/10/18 01:17:01 INFO util.GSet: 2.0% max memory = 1864171520
- 13/10/18 01:17:01 INFO util.GSet: capacity = 2^22 = 4194304 entries
- 13/10/18 01:17:01 INFO util.GSet: recommended=4194304, actual=4194304
- 13/10/18 01:17:01 INFO namenode.FSNamesystem: fsOwner=root
- 13/10/18 01:17:01 INFO namenode.FSNamesystem: supergroup=supergroup
- 13/10/18 01:17:01 INFO namenode.FSNamesystem: isPermissionEnabled=true
- 13/10/18 01:17:01 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
- 13/10/18 01:17:01 WARN namenode.FSNamesystem: The dfs.support.append option is in your configuration, however append is not supported. This configuration option is no longer required to enable sync
- 13/10/18 01:17:01 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
- 13/10/18 01:17:01 INFO namenode.FSEditLog: dfs.namenode.edits.toleration.length = 0
- 13/10/18 01:17:01 INFO namenode.NameNode: Caching file names occuring more than 10 times
- 13/10/18 01:17:01 INFO common.Storage: Image file /usr/local/hadoop/hdfs/name/current/fsimage of size 110 bytes saved in 0 seconds.
- 13/10/18 01:17:01 INFO namenode.FSEditLog: closing edit log: position=4, editlog=/usr/local/hadoop/hdfs/name/current/edits
- 13/10/18 01:17:01 INFO namenode.FSEditLog: close success: truncate to 4, editlog=/usr/local/hadoop/hdfs/name/current/edits
- 13/10/18 01:17:01 INFO common.Storage: Storage directory /usr/local/hadoop/hdfs/name has been successfully formatted.
- 13/10/18 01:17:01 INFO namenode.NameNode: SHUTDOWN_MSG:
- /************************************************************
- SHUTDOWN_MSG: Shutting down NameNode at hadoop.namenode/127.0.0.1
- ************************************************************/
9. Start Hadoop
With the environment variables above set, this can be run directly:
- start-all.sh
The daemons can also be started separately:
- start-dfs.sh
- start-mapred.sh
10. Confirm Hadoop is running
A healthy start shows the NameNode, JobTracker and SecondaryNameNode processes on the master,
and the DataNode and TaskTracker processes on each slave:
- [root@hadoop.namenode ~]# jps
- 6366 Jps
- 4250 JobTracker
- 3997 NameNode
- 4157 SecondaryNameNode
- [root@hadoop.node1 conf]# jps
- 15373 TaskTracker
- 15268 DataNode
- 17443 Jps
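This check can be scripted; a sketch where the jps listing above is pasted in as text for illustration (on a live master you would use jps_out=$(jps) instead, and the same loop with DataNode/TaskTracker fits the slaves):

```shell
# jps output pasted in as text; on a live master: jps_out=$(jps)
jps_out="6366 Jps
4250 JobTracker
3997 NameNode
4157 SecondaryNameNode"
missing=""
for d in NameNode JobTracker SecondaryNameNode; do
  printf '%s\n' "$jps_out" | grep -q "$d" || missing="$missing $d"
done
if [ -z "$missing" ]; then
  echo "master daemons OK"
else
  echo "missing:$missing"
fi
```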
Alternatively, check the HDFS status:
- [root@hadoop logs]# hadoop dfsadmin -report
- Configured Capacity: 158107308032 (147.25 GB)
- Present Capacity: 149379117071 (139.12 GB)
- DFS Remaining: 149379088384 (139.12 GB)
- DFS Used: 28687 (28.01 KB)
- DFS Used%: 0%
- Under replicated blocks: 1
- Blocks with corrupt replicas: 0
- Missing blocks: 0
- -------------------------------------------------
- Datanodes available: 1 (1 total, 0 dead)
- Name: 192.168.0.32:50010
- Decommission Status : Normal
- Configured Capacity: 158107308032 (147.25 GB)
- DFS Used: 28687 (28.01 KB)
- Non DFS Used: 8728190961 (8.13 GB)
- DFS Remaining: 149379088384(139.12 GB)
- DFS Used%: 0%
- DFS Remaining%: 94.48%
- Last contact: Fri Oct 18 17:31:18 CST 2013
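The report can also be checked mechanically; a sketch with the relevant line pasted in as text (on a live cluster, capture hadoop dfsadmin -report and count conf/slaves instead):

```shell
# Report line pasted in as text; live: report=$(hadoop dfsadmin -report)
report="Datanodes available: 1 (1 total, 0 dead)"
live=$(printf '%s\n' "$report" | sed -n 's/.*(\([0-9]*\) total.*/\1/p')
slaves=1   # on a real cluster: slaves=$(wc -l < conf/slaves)
if [ "$live" -eq "$slaves" ]; then
  echo "all $slaves slave(s) reporting"
else
  echo "mismatch: $live live vs $slaves slaves"
fi
```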
Or open http://hadoop.namenode:50030/jobtracker.jsp to check the nodes; the node count there should match the number of slaves.