Hadoop - Setup ENV
Environment:
Xshell: 5
Xftp: 4
Virtual Box: 5.16
Linux: CentOS-7-x86_64-Minimal-1511
Vim: yum -y install vim-enhanced
JDK: 8
Hadoop: 2.7.3.tar.gz
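After installing Linux in VirtualBox, configure the network interface to start automatically at boot.
Check the machine's network interfaces:
nmcli d
The output should list one interface: enp0s3
Open the interface configuration file with vi:
vi /etc/sysconfig/network-scripts/ifcfg-enp0s3
Change the last line: ONBOOT=no -> ONBOOT=yes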
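The common fields in this file are (example values):
DEVICE=eth0 | Device alias of the interface; for example, in the file ifcfg-eth0 it is eth0 |
BOOTPROTO=static | How the interface obtains its IP address. Possible values are static, dhcp, or bootp: a statically assigned address, an address obtained via DHCP, or an address obtained via BOOTP |
BROADCAST=192.168.0.255 | Broadcast address of the subnet |
HWADDR=00:07:E9:05:E8:B4 | Hardware (MAC) address of the interface |
IPADDR=192.168.1.2 | IP address of the interface, used when the address is assigned statically |
IPV6INIT=no | Enable or disable IPv6: no to disable, yes to enable |
IPV6_AUTOCONF=no | Enable or disable IPv6 autoconfiguration: no to disable, yes to enable |
NETMASK=255.255.255.0 | Network mask of the interface |
NETWORK=192.168.1.0 | Network address of the interface |
ONBOOT=yes | Whether to bring this interface up at boot; when set to yes, the device is activated at system startup |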
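The manual ONBOOT edit described above can also be scripted. The following is a minimal sketch, assuming the file consists of plain KEY=value lines; the function name set_onboot_yes is hypothetical and not part of any CentOS tool.

```shell
# Set ONBOOT=yes in an ifcfg-style file (hypothetical helper).
# Assumption: the file uses one KEY=value pair per line.
set_onboot_yes() {
    cfg="$1"
    if grep -q '^ONBOOT=' "$cfg"; then
        tmp="$(mktemp)"
        # Rewrite the existing ONBOOT line, then copy the result back
        # with cat so the original file keeps its permissions.
        sed 's/^ONBOOT=.*/ONBOOT=yes/' "$cfg" > "$tmp" && cat "$tmp" > "$cfg"
        rm -f "$tmp"
    else
        # No ONBOOT line yet: append one.
        echo 'ONBOOT=yes' >> "$cfg"
    fi
}
```

Run as root, it could be used as `set_onboot_yes /etc/sysconfig/network-scripts/ifcfg-enp0s3`, followed by `systemctl restart network` to apply the change.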
Install Hadoop
[root@centosmaster opt]# tar zxf hadoop-2.7.3.tar.gz
[root@centosmaster opt]# cd hadoop-2.7.3
[root@centosmaster opt]# cd /opt/hadoop-2.7.3/etc/hadoop
core-site.xml
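<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://centosmaster:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:/opt/hadoop-2.7.3/current/tmp</value>
  </property>
  <property>
    <name>fs.trash.interval</name>
    <value>8</value>
  </property>
</configuration>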
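The settings above are name/value pairs; in the actual file each pair is wrapped in a <property> element inside <configuration>. As a minimal sketch of that layout, the hypothetical helper print_site_xml below prints such a file from argument pairs. It assumes the values contain no XML special characters (&, <, >).

```shell
# Print a Hadoop *-site.xml document from name/value argument pairs
# (hypothetical helper, not part of Hadoop).
# Assumption: values contain no XML special characters (&, <, >).
print_site_xml() {
    echo '<?xml version="1.0"?>'
    echo '<configuration>'
    while [ "$#" -ge 2 ]; do
        echo '  <property>'
        echo "    <name>$1</name>"
        echo "    <value>$2</value>"
        echo '  </property>'
        shift 2
    done
    echo '</configuration>'
}
```

For example, `print_site_xml fs.defaultFS hdfs://centosmaster:9000 fs.trash.interval 8 > core-site.xml` would write a two-property file.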
hdfs-site.xml
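<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/opt/hadoop-2.7.3/current/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/opt/hadoop-2.7.3/current/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.permissions.superusergroup</name>
    <value>staff</value>
  </property>
  <property>
    <name>dfs.permissions.enabled</name>
    <value>false</value>
  </property>
</configuration>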
yarn-site.xml
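<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>centosmaster</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>centosmaster:18040</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>centosmaster:18030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>centosmaster:18025</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>centosmaster:18141</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>centosmaster:18088</value>
  </property>
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>86400</value>
  </property>
  <property>
    <name>yarn.log-aggregation.retain-check-interval-seconds</name>
    <value>86400</value>
  </property>
  <property>
    <name>yarn.nodemanager.remote-app-log-dir</name>
    <value>/tmp/logs</value>
  </property>
  <property>
    <name>yarn.nodemanager.remote-app-log-dir-suffix</name>
    <value>logs</value>
  </property>
</configuration>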
mapred-site.xml
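<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobtracker.http.address</name>
    <value>centosmaster:50030</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>centosmaster:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>centosmaster:19888</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.done-dir</name>
    <value>/jobhistory/done</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.intermediate-done-dir</name>
    <value>/jobhistory/one_intermediate</value>
  </property>
  <property>
    <name>mapreduce.job.ubertask.enable</name>
    <value>true</value>
  </property>
</configuration>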
Add this machine's hostname (or IP) to the slaves file so that it also acts as a slave:
centosmaster
Point Hadoop at the Java JDK:
vim hadoop-env.sh
# The java implementation to use.
export JAVA_HOME=/usr/java/jdk1.8.0_111/
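The same change can be made without opening an editor. This is a minimal sketch, assuming hadoop-env.sh sets the variable on a line that starts with `export JAVA_HOME=`; the function name set_java_home is hypothetical.

```shell
# Replace the JAVA_HOME export in hadoop-env.sh (hypothetical helper).
# Assumption: any existing setting is on a line starting with
# "export JAVA_HOME=".
set_java_home() {
    env_file="$1"
    java_home="$2"
    tmp="$(mktemp)"
    # Keep every line except the old export (grep exits non-zero when
    # nothing is left, which is fine here).
    grep -v '^export JAVA_HOME=' "$env_file" > "$tmp" || true
    # Append the new setting at the end of the file.
    echo "export JAVA_HOME=$java_home" >> "$tmp"
    cat "$tmp" > "$env_file"
    rm -f "$tmp"
}
```

For the installation in this article the call would be `set_java_home /opt/hadoop-2.7.3/etc/hadoop/hadoop-env.sh /usr/java/jdk1.8.0_111`.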
Format the HDFS file system
[root@centosmaster~]# hdfs namenode -format
************************************************************/
16/10/23 08:58:31 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
16/10/23 08:58:31 INFO namenode.NameNode: createNameNode [-format]
16/10/23 08:58:31 WARN common.Util: Path /opt/hadoop-2.7.3/current/dfs/name should be specified as a URI in configuration files. Please update hdfs configuration.
16/10/23 08:58:31 WARN common.Util: Path /opt/hadoop-2.7.3/current/dfs/name should be specified as a URI in configuration files. Please update hdfs configuration.
Formatting using clusterid: CID-1294bdbb-d45c-49f3-b5c5-3d26934e084f
16/10/23 08:58:32 INFO namenode.FSNamesystem: No KeyProvider found.
16/10/23 08:58:32 INFO namenode.FSNamesystem: fsLock is fair:true
16/10/23 08:58:32 INFO blockmanagement.DatanodeManager: dfs.block.invalidate.limit=1000
16/10/23 08:58:32 INFO blockmanagement.DatanodeManager: dfs.namenode.datanode.registration.ip-hostname-check=true
16/10/23 08:58:32 INFO blockmanagement.BlockManager: dfs.namenode.startup.delay.block.deletion.sec is set to 000:00:00:00.000
16/10/23 08:58:32 INFO blockmanagement.BlockManager: The block deletion will start around 2016 Oct 23 08:58:32
16/10/23 08:58:32 INFO util.GSet: Computing capacity for map BlocksMap
16/10/23 08:58:32 INFO util.GSet: VM type = 64-bit
16/10/23 08:58:32 INFO util.GSet: 2.0% max memory 966.7 MB = 19.3 MB
16/10/23 08:58:32 INFO util.GSet: capacity = 2^21 = 2097152 entries
16/10/23 08:58:32 INFO blockmanagement.BlockManager: dfs.block.access.token.enable=false
16/10/23 08:58:32 INFO blockmanagement.BlockManager: defaultReplication = 1
16/10/23 08:58:32 INFO blockmanagement.BlockManager: maxReplication = 512
16/10/23 08:58:32 INFO blockmanagement.BlockManager: minReplication = 1
16/10/23 08:58:32 INFO blockmanagement.BlockManager: maxReplicationStreams = 2
16/10/23 08:58:32 INFO blockmanagement.BlockManager: replicationRecheckInterval = 3000
16/10/23 08:58:32 INFO blockmanagement.BlockManager: encryptDataTransfer = false
16/10/23 08:58:32 INFO blockmanagement.BlockManager: maxNumBlocksToLog = 1000
16/10/23 08:58:32 INFO namenode.FSNamesystem: fsOwner = root (auth:SIMPLE)
16/10/23 08:58:32 INFO namenode.FSNamesystem: supergroup = staff
16/10/23 08:58:32 INFO namenode.FSNamesystem: isPermissionEnabled = false
16/10/23 08:58:32 INFO namenode.FSNamesystem: HA Enabled: false
16/10/23 08:58:32 INFO namenode.FSNamesystem: Append Enabled: true
16/10/23 08:58:32 INFO util.GSet: Computing capacity for map INodeMap
16/10/23 08:58:32 INFO util.GSet: VM type = 64-bit
16/10/23 08:58:32 INFO util.GSet: 1.0% max memory 966.7 MB = 9.7 MB
16/10/23 08:58:32 INFO util.GSet: capacity = 2^20 = 1048576 entries
16/10/23 08:58:32 INFO namenode.FSDirectory: ACLs enabled? false
16/10/23 08:58:32 INFO namenode.FSDirectory: XAttrs enabled? true
16/10/23 08:58:32 INFO namenode.FSDirectory: Maximum size of an xattr: 16384
16/10/23 08:58:32 INFO namenode.NameNode: Caching file names occuring more than 10 times
16/10/23 08:58:32 INFO util.GSet: Computing capacity for map cachedBlocks
16/10/23 08:58:32 INFO util.GSet: VM type = 64-bit
16/10/23 08:58:32 INFO util.GSet: 0.25% max memory 966.7 MB = 2.4 MB
16/10/23 08:58:32 INFO util.GSet: capacity = 2^18 = 262144 entries
16/10/23 08:58:32 INFO namenode.FSNamesystem: dfs.namenode.safemode.threshold-pct = 0.9990000128746033
16/10/23 08:58:32 INFO namenode.FSNamesystem: dfs.namenode.safemode.min.datanodes = 0
16/10/23 08:58:32 INFO namenode.FSNamesystem: dfs.namenode.safemode.extension = 30000
16/10/23 08:58:32 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.window.num.buckets = 10
16/10/23 08:58:32 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.num.users = 10
16/10/23 08:58:32 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.windows.minutes = 1,5,25
16/10/23 08:58:32 INFO namenode.FSNamesystem: Retry cache on namenode is enabled
16/10/23 08:58:32 INFO namenode.FSNamesystem: Retry cache will use 0.03 of total heap and retry cache entry expiry time is 600000 millis
16/10/23 08:58:32 INFO util.GSet: Computing capacity for map NameNodeRetryCache
16/10/23 08:58:32 INFO util.GSet: VM type = 64-bit
16/10/23 08:58:32 INFO util.GSet: 0.029999999329447746% max memory 966.7 MB = 297.0 KB
16/10/23 08:58:32 INFO util.GSet: capacity = 2^15 = 32768 entries
16/10/23 08:58:32 INFO namenode.FSImage: Allocated new BlockPoolId: BP-1532573559-192.168.0.105-1477184312651
16/10/23 08:58:32 INFO common.Storage: Storage directory /opt/hadoop-2.7.3/current/dfs/name has been successfully formatted.
16/10/23 08:58:32 INFO namenode.FSImageFormatProtobuf: Saving image file /opt/hadoop-2.7.3/current/dfs/name/current/fsimage.ckpt_0000000000000000000 using no compression
16/10/23 08:58:32 INFO namenode.FSImageFormatProtobuf: Image file /opt/hadoop-2.7.3/current/dfs/name/current/fsimage.ckpt_0000000000000000000 of size 346 bytes saved in 0 seconds.
16/10/23 08:58:32 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
16/10/23 08:58:32 INFO util.ExitUtil: Exiting with status 0
16/10/23 08:58:32 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at CentOS_105/192.168.0.105
************************************************************/
The log output shows that formatting succeeded:
INFO common.Storage: Storage directory /opt/hadoop-2.7.3/current/dfs/name has been successfully formatted.
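Rather than scanning the log by eye, the success line can be searched for. This is a minimal sketch, assuming the format output has been saved to a file; the function name format_succeeded and the file name format.log are hypothetical.

```shell
# Succeed if a saved "hdfs namenode -format" log contains the
# success message quoted above (hypothetical helper).
format_succeeded() {
    grep -q 'has been successfully formatted' "$1"
}
```

A possible use is `hdfs namenode -format 2>&1 | tee format.log`, then `format_succeeded format.log && echo "format OK"`. The `2>&1` is there because the console log is typically written to standard error.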
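The log contains a warning about the HDFS paths: they should be specified as URIs. Update hdfs-site.xml accordingly (for example, file:/opt/hadoop-2.7.3/current/dfs/name).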
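The warning asks for a URI instead of a bare path. As a small sketch of the conversion, the hypothetical helper to_file_uri below prefixes an absolute local path with the `file:` scheme, the same form this article uses for hadoop.tmp.dir.

```shell
# Turn an absolute local path into a file: URI (hypothetical helper).
# Assumption: the path needs no percent-encoding (no spaces etc.).
to_file_uri() {
    case "$1" in
        /*) echo "file:$1" ;;
        *)  echo "to_file_uri: expected an absolute path: $1" >&2
            return 1 ;;
    esac
}
```

For example, `to_file_uri /opt/hadoop-2.7.3/current/dfs/name` prints a value that can be used for dfs.namenode.name.dir.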
Then format the NameNode again:
hdfs namenode -format
Check the hostname:
hostnamectl
Change the hostname:
[root@centosmaster~]# hostnamectl set-hostname "centosmaster"
Start Hadoop:
[root@centosmaster hadoop-2.7.3]# sbin/start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [localhost]
localhost: starting namenode, logging to /opt/hadoop-2.7.3/logs/hadoop-root-namenode-centosmaster.out
centosmaster: starting datanode, logging to /opt/hadoop-2.7.3/logs/hadoop-root-datanode-centosmaster.out
Starting secondary namenodes [Centosmaster]
Centosmaster: starting secondarynamenode, logging to /opt/hadoop-2.7.3/logs/hadoop-root-secondarynamenode-centosmaster.out
starting yarn daemons
starting resourcemanager, logging to /opt/hadoop-2.7.3/logs/yarn-root-resourcemanager-centosmaster.out
centosmaster: starting nodemanager, logging to /opt/hadoop-2.7.3/logs/yarn-root-nodemanager-centosmaster.out
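The output above notes that start-all.sh is deprecated in favor of start-dfs.sh and start-yarn.sh. A minimal sketch of calling the two scripts in order follows; the wrapper name start_hadoop is hypothetical, and the default path assumes the /opt/hadoop-2.7.3 install location used in this article.

```shell
# Start HDFS first, then YARN, using the non-deprecated scripts
# (hypothetical wrapper). YARN is only started if HDFS started cleanly.
start_hadoop() {
    hadoop_home="${1:-/opt/hadoop-2.7.3}"
    "$hadoop_home/sbin/start-dfs.sh" && "$hadoop_home/sbin/start-yarn.sh"
}
```

Calling `start_hadoop` with no argument uses the default location; another install directory can be passed as the first argument.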
Use jps to see which daemons are running:
[root@centosmaster hadoop]# jps
2546 NodeManager
3090 SecondaryNameNode
3348 Jps
2201 DataNode
2109 NameNode
2447 ResourceManager
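On a single-node setup like this one, the jps listing above shows five Hadoop daemons. As a minimal sketch, the hypothetical helper missing_daemons reads jps output on standard input and prints any of those five that are absent. It compares the second column exactly, so SecondaryNameNode is not mistaken for NameNode.

```shell
# Print each expected daemon that does not appear in jps output read
# from stdin (hypothetical helper). Prints nothing when all are present.
missing_daemons() {
    jps_output="$(cat)"
    for daemon in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
        # awk exits 0 only if some line has exactly this name in column 2.
        if ! printf '%s\n' "$jps_output" |
            awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
            echo "$daemon"
        fi
    done
}
```

Typical use would be `jps | missing_daemons`; empty output means all five daemons are up.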
Stop Hadoop:
sbin/stop-all.sh
Verification:
Problem 1 - Permissions:
[root@CentOS_105 jdk1.8.0_111]# java -version
bash: /usr/java/jdk1.8.0_111//bin/java: Permission denied
Fix: chmod 777 /usr/java/jdk1.8.0_111/bin/java (chmod 755 is enough to make the binary executable)
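The fix comes down to giving the java binary its execute bit. A minimal sketch that checks first and only changes the mode when needed is shown here; the function name ensure_executable is hypothetical, and it applies mode 755 rather than 777 as a narrower alternative.

```shell
# Make sure a file exists and is executable (hypothetical helper).
# Uses mode 755 instead of 777: executable by all, writable by owner only.
ensure_executable() {
    if [ ! -f "$1" ]; then
        echo "not found: $1" >&2
        return 1
    fi
    [ -x "$1" ] || chmod 755 "$1"
}
```

For the JDK in this article: `ensure_executable /usr/java/jdk1.8.0_111/bin/java`.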
Problem 2 - Configuration:
[root@centos_1 hadoop-2.7.3]# sbin/start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Incorrect configuration: namenode address dfs.namenode.servicerpc-address or dfs.namenode.rpc-address is not configured.
Starting namenodes on []
Fix: add the following property to etc/hadoop/core-site.xml (fs.default.name is the deprecated name of fs.defaultFS):
<property>
  <name>fs.default.name</name>
  <value>hdfs://127.0.0.1:9000</value>
</property>
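Before starting the daemons, it may help to confirm that core-site.xml actually defines the NameNode address. This is a minimal sketch, assuming each <name> element sits on a single line; the function name has_namenode_address is hypothetical, and it accepts either fs.defaultFS or the older fs.default.name.

```shell
# Succeed if the given core-site.xml declares fs.defaultFS or the
# deprecated fs.default.name (hypothetical helper; a plain text
# search, not a real XML parse).
has_namenode_address() {
    grep -Eq '<name>[[:space:]]*(fs\.defaultFS|fs\.default\.name)[[:space:]]*</name>' "$1"
}
```

For example: `has_namenode_address /opt/hadoop-2.7.3/etc/hadoop/core-site.xml || echo "NameNode address missing"`.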
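Problem 3 - Hostname
Does not contain a valid host:port authority:
Cause: certain special characters in the host part of Hadoop's XML configuration prevent the address from being parsed.
Fix: the hostname in use is not valid. Change it to a hostname that does not contain problematic characters such as '.', '/', or '_'.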
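A hostname can be checked against this advice before it is put into the configuration files. The sketch below is deliberately strict: following the article, it accepts only letters, digits, and inner hyphens, so it also rejects dotted names. The function name is_safe_hostname is hypothetical, and length limits are not checked.

```shell
# Succeed if the name consists only of letters, digits, and hyphens,
# and neither starts nor ends with a hyphen (hypothetical helper).
# Assumption: a single-label hostname, as recommended in this article.
is_safe_hostname() {
    printf '%s\n' "$1" | grep -Eq '^[A-Za-z0-9]([A-Za-z0-9-]*[A-Za-z0-9])?$'
}
```

With the names used in this article, `is_safe_hostname centosmaster` succeeds, while `is_safe_hostname CentOS_105` fails because of the underscore.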
References
Network interface configuration: http://www.krizna.com/centos/setup-network-centos-7/
JDK installation guide: http://www.cnblogs.com/wangfajun/p/5257899.html