Administering Hadoop
I. HDFS
Persistent data structures
1.1 Namenode directory structure
[root@datanode1 name]# cd /data0/hadoop/dfs/name/current/
[root@datanode1 current]# ls
edits  edits.new  fsimage  fstime  VERSION
[root@datanode1 current]# ls -l
总用量 56
-rw-rw-r--. 1 hadoop hadoop     789 1月  15 16:59 edits
-rw-rw-r--. 1 hadoop hadoop 1049088 1月  15 18:00 edits.new
-rw-rw-r--. 1 hadoop hadoop   14557 1月  14 18:47 fsimage
-rw-rw-r--. 1 hadoop hadoop       8 1月  14 18:47 fstime
-rw-rw-r--. 1 hadoop hadoop     100 1月  14 18:47 VERSION
1.1.2 The VERSION file is a Java properties file that holds version information about the running HDFS
[root@datanode1 current]# cat VERSION
#Thu Jan 14 18:47:15 CST 2016
namespaceID=688384215
cTime=0
storageType=NAME_NODE
layoutVersion=-32
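layoutVersion is a negative integer that describes the version of HDFS's persistent data structures (also called the layout). It is unrelated to the release number of the Hadoop distribution. Whenever the layout changes, the number is decremented (for example, the version after -18 is -19), and HDFS must then be upgraded. Otherwise the disks still hold the old layout, and a newer namenode or datanode cannot operate on it.
namespaceID: a unique identifier for the filesystem, set when the filesystem is first formatted.
cTime: marks the creation time of the namenode's storage. For storage that has just been formatted, the value is 0.
storageType: indicates that this storage directory contains the namenode's data structures.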
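Because VERSION is a plain Java properties file, its fields can be read with a few lines of code. The following is a minimal sketch, assuming the simple key=value layout of the sample file shown earlier; parse_version_file is a hypothetical helper and not part of Hadoop.

```python
def parse_version_file(text):
    """Parse key=value lines of a VERSION file, skipping blank and '#' lines."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


# Contents copied from the namenode VERSION file in this article.
SAMPLE = """#Thu Jan 14 18:47:15 CST 2016
namespaceID=688384215
cTime=0
storageType=NAME_NODE
layoutVersion=-32
"""

props = parse_version_file(SAMPLE)
layout_version = int(props["layoutVersion"])
print(props["storageType"], layout_version)
```

Comparing layout_version across nodes is one quick way to spot a node that still holds an older on-disk layout.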
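1.1.3 Filesystem image and edit log
When a filesystem client performs a write operation (such as creating or moving a file), the operation is first recorded in the edit log. The namenode also maintains the filesystem metadata in memory; whenever the edit log is modified, the in-memory metadata is updated as well. The in-memory metadata serves clients' read requests.
After every write operation, and before a success code is returned to the client, the edit log is flushed and synced. When the namenode writes to several directories, it returns only after the write has completed in all of them, so that no operation is lost because of a machine failure.
fsimage: a persistent checkpoint of the filesystem metadata. If the namenode fails, the latest metadata can be reconstructed by loading the fsimage file into memory and then replaying the operations recorded in the edit log.
The fsimage file contains a serialized form of all directory and file inodes in the filesystem. Each inode is the internal representation of a file's or directory's metadata. For a file it includes the replication level, modification and access times, access permissions, block size, and the blocks that make up the file. For a directory it includes the modification time, access permissions, and quota metadata.
Blocks are stored on datanodes, but the fsimage file does not record which datanodes hold them. Instead, the namenode keeps this block mapping in memory. When a datanode joins the cluster, the namenode asks it for its block list to build the mapping, and it then polls the datanodes periodically to keep the mapping current.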
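Running a secondary namenode creates checkpoints of the primary namenode's in-memory filesystem metadata. The process is:
(1) The secondary namenode asks the primary to stop using its edits file and to record new write operations in a new file for the time being.
(2) The secondary namenode retrieves fsimage and edits from the primary (using HTTP GET).
(3) The secondary namenode loads fsimage into memory, applies each operation from edits, and creates a new fsimage file.
(4) The secondary namenode sends the new fsimage file back to the primary (using HTTP POST).
(5) The primary namenode replaces the old fsimage with the one received from the secondary, and replaces the old edits file with the new one started in step 1. It also updates the fstime file to record when the checkpoint was taken.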
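The five steps can be walked through with a toy model. This is only a sketch under simplifying assumptions: the namespace is a dict, the edit log is a list of tuples, and PrimaryNamenode, apply_edits and run_checkpoint are hypothetical names that do not correspond to Hadoop classes.

```python
def apply_edits(namespace, edits):
    """Replay edit-log operations on a copy of the namespace and return it."""
    ns = dict(namespace)
    for op, *args in edits:
        if op == "create":
            ns[args[0]] = "file"
        elif op == "mkdir":
            ns[args[0]] = "dir"
        elif op == "rename":
            ns[args[1]] = ns.pop(args[0])
        elif op == "delete":
            del ns[args[0]]
        else:
            raise ValueError(f"unknown operation: {op}")
    return ns


class PrimaryNamenode:
    """Toy stand-in for the primary namenode's on-disk files (hypothetical)."""

    def __init__(self, fsimage, edits):
        self.fsimage = fsimage   # last checkpoint
        self.edits = edits       # operations logged since that checkpoint
        self.edits_new = None    # exists only while a checkpoint is running

    def log(self, *operation):
        # During a checkpoint, new writes go to edits.new instead of edits.
        target = self.edits if self.edits_new is None else self.edits_new
        target.append(operation)

    def roll_edits(self):
        self.edits_new = []

    def install_checkpoint(self, new_fsimage):
        self.fsimage = new_fsimage
        self.edits, self.edits_new = self.edits_new, None


def run_checkpoint(primary, writes_during_checkpoint=()):
    primary.roll_edits()                                         # step 1
    fsimage, edits = dict(primary.fsimage), list(primary.edits)  # step 2
    merged = apply_edits(fsimage, edits)                         # step 3
    for operation in writes_during_checkpoint:                   # clients keep writing
        primary.log(*operation)
    primary.install_checkpoint(merged)                           # steps 4 and 5
    return merged


primary = PrimaryNamenode(
    {"/data": "dir"},
    [("create", "/data/a"), ("rename", "/data/a", "/data/b")],
)
run_checkpoint(primary, writes_during_checkpoint=[("mkdir", "/tmp")])
print(primary.fsimage)
print(primary.edits)
```

The point of rolling the edit log in step 1 is visible here: writes that arrive while the merge is running end up in the new edits file, so nothing is lost when the old one is discarded.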
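Checkpoint creation is controlled by two configuration parameters.
(1) The secondary namenode creates a checkpoint every hour (set by fs.checkpoint.period, in seconds).
(2) When the edit log reaches 64 MB (set by fs.checkpoint.size, in bytes), a checkpoint is created even if the hour has not elapsed. The system checks the edit log size every 5 minutes.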
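The two triggers combine as a simple "either condition" test. The sketch below assumes the values quoted in the text (one hour and 64 MB); should_checkpoint is a hypothetical helper, not a Hadoop API.

```python
CHECKPOINT_PERIOD_SECONDS = 3600            # fs.checkpoint.period, as quoted in the text
CHECKPOINT_SIZE_BYTES = 64 * 1024 * 1024    # fs.checkpoint.size, as quoted in the text


def should_checkpoint(seconds_since_last, edits_size_bytes,
                      period=CHECKPOINT_PERIOD_SECONDS,
                      size=CHECKPOINT_SIZE_BYTES):
    """Return True when either the time trigger or the size trigger fires."""
    return seconds_since_last >= period or edits_size_bytes >= size


# The edits.new file in the listing earlier in this article is 1049088 bytes,
# far below the size trigger, so only the hourly timer would apply to it.
print(should_checkpoint(600, 1049088))
print(should_checkpoint(600, 64 * 1024 * 1024))
```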
1.2 Secondary namenode directory structure
[root@slave-two current]# pwd
/data0/hadoop/dfs/data/current
[root@slave-two current]# cat VERSION
#Fri Jan 15 15:34:22 CST 2016
namespaceID=688384215
storageID=DS-1030151558-10.1.2.216-50010-1452481280886
cTime=0
storageType=DATA_NODE
layoutVersion=-32
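If the primary namenode fails (and there is no up-to-date backup, not even on NFS), the data can be recovered from the secondary namenode. There are two ways to do this:
(1) Copy the relevant storage directories to the new namenode.
(2) Start the namenode daemon with the -importCheckpoint option, so that the secondary namenode takes over as the new primary. With this option, the latest checkpoint is loaded from the directory defined by fs.checkpoint.dir, but only if the directory defined by dfs.name.dir contains no metadata; otherwise the operation fails.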
1.3 Datanode directory structure
A datanode's storage directories are created automatically at startup; they do not need to be formatted separately.
[root@slave-one current]# ls
blk_-1342046564177101301            blk_3255346014128987307             blk_-4378222930931288631            blk_7478159877522346339             blk_-8475713792677154223
blk_-1342046564177101301_1004.meta  blk_3255346014128987307_1010.meta   blk_-4378222930931288631_1065.meta  blk_7478159877522346339_1002.meta   blk_-8475713792677154223_1063.meta
blk_-1859875086242295767            blk_3484901243420393976             blk_5202437766650751967             blk_7579826132350507903             blk_-9058686418693604829
blk_-1859875086242295767_1061.meta  blk_3484901243420393976_1067.meta   blk_5202437766650751967_1072.meta   blk_7579826132350507903_1080.meta   blk_-9058686418693604829_1062.meta
blk_253660519371394588              blk_-350256639016866731             blk_5450455005443823908             blk_774901497839428573              dncp_block_verification.log.curr
blk_253660519371394588_1014.meta    blk_-350256639016866731_1077.meta   blk_5450455005443823908_1076.meta   blk_774901497839428573_1068.meta    VERSION
blk_2653614491429524571             blk_-4332403947618992681            blk_6996247191717220870             blk_7996063171811697628
blk_2653614491429524571_1066.meta   blk_-4332403947618992681_1012.meta  blk_6996247191717220870_1064.meta   blk_7996063171811697628_1013.meta
[root@slave-one current]# pwd
/data0/hadoop/dfs/data/current
[root@slave-one current]# cat VERSION
#Fri Jan 15 15:34:16 CST 2016
namespaceID=688384215
storageID=DS-444750413-10.1.2.215-50010-1452481260852
cTime=0
storageType=DATA_NODE
layoutVersion=-32
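The other files in a datanode's current directory all carry the blk_ prefix and come in two kinds: HDFS block files (raw data only) and block metadata files (with a .meta suffix). A block file holds the raw bytes of a portion of a stored file; a metadata file consists of a header (with version and type information) followed by a series of checksums for sections of the block.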
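To make the naming scheme concrete, the sketch below pairs each block file with its .meta file using only the file names. It assumes the two name patterns visible in the listing (blk_ID and blk_ID_NUMBER.meta) and treats the number before .meta as opaque; pair_blocks is a hypothetical helper.

```python
import re

BLOCK_RE = re.compile(r"^blk_(-?\d+)$")
META_RE = re.compile(r"^blk_(-?\d+)_(\d+)\.meta$")


def pair_blocks(names):
    """Return ({block_id: meta_file_name}, [block ids that have no .meta file])."""
    blocks, metas = set(), {}
    for name in names:
        meta = META_RE.match(name)
        if meta:
            metas[int(meta.group(1))] = name
            continue
        block = BLOCK_RE.match(name)
        if block:
            blocks.add(int(block.group(1)))
        # Anything else (VERSION, verification logs, ...) is ignored.
    paired = {b: metas[b] for b in blocks if b in metas}
    orphans = sorted(b for b in blocks if b not in metas)
    return paired, orphans


listing = [
    "blk_-1342046564177101301",
    "blk_-1342046564177101301_1004.meta",
    "blk_253660519371394588",          # .meta deliberately left out for the demo
    "dncp_block_verification.log.curr",
    "VERSION",
]
paired, orphans = pair_blocks(listing)
print(paired)
print(orphans)
```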
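When the number of blocks in a directory grows to a certain size, the datanode creates a new subdirectory for new blocks and their metadata. A new subdirectory is created each time a directory holds 64 blocks (set by dfs.datanode.numblocks).
If dfs.data.dir lists several directories on different disks, blocks are written to them in round-robin fashion. Note that a block is never duplicated across the disks of a single datanode; replicas of a block exist only on different datanodes.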
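Round-robin placement can be illustrated in a few lines. This is a simplified sketch: it ignores free space and failed disks, and the directory names are made up for the example.

```python
from itertools import cycle


def assign_round_robin(block_ids, data_dirs):
    """Distribute blocks over the configured directories in rotating order."""
    placement = {d: [] for d in data_dirs}
    for block_id, directory in zip(block_ids, cycle(data_dirs)):
        placement[directory].append(block_id)
    return placement


# Hypothetical dfs.data.dir entries on two disks.
dirs = ["/data0/hadoop/dfs/data", "/data1/hadoop/dfs/data"]
placement = assign_round_robin([1, 2, 3, 4, 5], dirs)
print(placement)
```

Each block lands in exactly one directory, which matches the note that a single datanode never holds two copies of the same block.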
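2. Safe mode
When the namenode starts, it first loads fsimage into memory and applies the operations in edits. Once it has built an in-memory image of the filesystem metadata, it writes a new fsimage file (without help from the secondary namenode) and an empty edit log. Only then does it start listening for RPC and HTTP requests. At this point the namenode is in safe mode, meaning that its filesystem is read-only for clients.
Entering and leaving safe mode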
[hadoop@slave-one current]$ hadoop dfsadmin -safemode get
Safe mode is ON
The HDFS web UI also shows whether the namenode is in safe mode.
To enter safe mode, run the command below. To keep the namenode in safe mode permanently, set dfs.safemode.threshold.pct to a value greater than 1.
[hadoop@slave-one current]$ hadoop dfsadmin -safemode enter
Safe mode is ON
To leave safe mode:
[hadoop@slave-one current]$ hadoop dfsadmin -safemode leave
Safe mode is OFF
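The automatic exit from safe mode depends on the fraction of blocks that have reached their minimum replication. The sketch below shows why a threshold greater than 1 can never be satisfied. It assumes a default threshold of 0.999, which is my recollection of the Hadoop 1.x default rather than something stated in this article; can_leave_safe_mode is a hypothetical helper.

```python
def can_leave_safe_mode(blocks_at_min_replication, total_blocks, threshold_pct=0.999):
    """Return True when enough blocks satisfy the minimum replication condition."""
    if total_blocks == 0:
        # Simplifying assumption: with no blocks there is nothing to wait for.
        return True
    return blocks_at_min_replication / total_blocks >= threshold_pct


print(can_leave_safe_mode(999, 1000))
print(can_leave_safe_mode(998, 1000))
# A threshold above 1 cannot be reached even when every block is reported.
print(can_leave_safe_mode(1000, 1000, threshold_pct=1.5))
```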
1.4 Tools
1.4.1 The dfsadmin tool
dfsadmin reports HDFS status information and performs administrative operations on HDFS.
hadoop dfsadmin
[hadoop@slave-one current]$ hadoop dfsadmin -help
hadoop dfsadmin is the command to execute DFS administrative commands.
The full syntax is:
hadoop dfsadmin [-report] [-safemode
[-saveNamespace]
[-refreshNodes]
[-setQuota
[-clrQuota
[-setSpaceQuota
[-clrSpaceQuota
[-refreshServiceAcl]
[-refreshUserToGroupsMappings]
[refreshSuperUserGroupsConfiguration]
[-setBalancerBandwidth
[-help [cmd]]
-report: Reports basic filesystem information and statistics.
-safemode
Safe mode is a Namenode state in which it
1. does not accept changes to the name space (read-only)
2. does not replicate or delete blocks.
Safe mode is entered automatically at Namenode startup, and
leaves safe mode automatically when the configured minimum
percentage of blocks satisfies the minimum replication
condition. Safe mode can also be entered manually, but then
it can only be turned off manually as well.
-saveNamespace: Save current namespace into storage directories and reset edits log.
Requires superuser permissions and safe mode.
-refreshNodes: Updates the set of hosts allowed to connect to namenode.
Re-reads the config file to update values defined by
dfs.hosts and dfs.host.exclude and reads the
entires (hostnames) in those files.
Each entry not defined in dfs.hosts but in
dfs.hosts.exclude is decommissioned. Each entry defined
in dfs.hosts and also in dfs.host.exclude is stopped from
decommissioning if it has aleady been marked for decommission.
Entires not present in both the lists are decommissioned.
-finalizeUpgrade: Finalize upgrade of HDFS.
Datanodes delete their previous version working directories,
followed by Namenode doing the same.
This completes the upgrade process.
-upgradeProgress
request current distributed upgrade status,
a detailed status or force the upgrade to proceed.
-metasave
to
1. Datanodes heart beating with Namenode
2. Blocks waiting to be replicated
3. Blocks currrently being replicated
4. Blocks waiting to be deleted
-setQuota
The directory quota is a long integer that puts a hard limit
on the number of names in the directory tree
Best effort for the directory, with faults reported if
1. N is not a positive integer, or
2. user is not an administrator, or
3. the directory does not exist or is a file, or
-clrQuota
Best effort for the directory. with fault reported if
1. the directory does not exist or is a file, or
2. user is not an administrator.
It does not fault if the directory has no quota.
-setSpaceQuota
The space quota is a long integer that puts a hard limit
on the total size of all the files under the directory tree.
The extra space required for replication is also counted. E.g.
a 1GB file with replication of 3 consumes 3GB of the quota.
Quota can also be speciefied with a binary prefix for terabytes,
petabytes etc (e.g. 50t is 50TB, 5m is 5MB, 3p is 3PB).
Best effort for the directory, with faults reported if
1. N is not a positive integer, or
2. user is not an administrator, or
3. the directory does not exist or is a file, or
-clrSpaceQuota
Best effort for the directory. with fault reported if
1. the directory does not exist or is a file, or
2. user is not an administrator.
It does not fault if the directory has no quota.
-refreshServiceAcl: Reload the service-level authorization policy file
Namenode will reload the authorization policy file.
-refreshUserToGroupsMappings: Refresh user-to-groups mappings
-refreshSuperUserGroupsConfiguration: Refresh superuser proxy groups mappings
-setBalancerBandwidth
Changes the network bandwidth used by each datanode during
HDFS block balancing.
that will be used by each datanode. This value overrides
the dfs.balance.bandwidthPerSec parameter.
--- NOTE: The new value is not persistent on the DataNode.---
-help [cmd]: Displays help for the given command or all commands if none
is specified.
Generic options supported are
-conf
-D
-fs
-jt
-files
-libjars
-archives
The general command line syntax is
bin/hadoop command [genericOptions] [commandOptions]
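-help: displays help for the given command, or for all commands if none is specified
[hadoop@slave-one current]$ hadoop dfsadmin -safemode -help
Usage: java DFSAdmin [-safemode enter | leave | get | wait]
-report: shows filesystem statistics (similar to what the web UI displays)
-metasave: writes certain information to a file in the Hadoop log directory, including the blocks being replicated or deleted and the list of connected datanodes
-safemode: changes or queries the safe mode state
-saveNamespace: saves the in-memory filesystem image to a new fsimage file and resets the edits file. This operation can be run only in safe mode.
-refreshNodes: updates the set of datanodes allowed to connect to the namenode
-upgradeProgress: gets progress information about an HDFS upgrade, or forces the upgrade to proceed
-finalizeUpgrade: removes the previous-version data from the datanode and namenode storage directories. Run it only after the upgrade is complete and the cluster is running well on the new version.
-setQuota: sets a directory quota, that is, the maximum number of files and directories in the tree rooted at that directory. This is an effective way to stop users from creating huge numbers of small files, which protects the namenode's memory (information about every file, directory, and block in the filesystem is held in memory).
-clrQuota: clears the specified directory quota
-setSpaceQuota: sets a space quota on a directory, limiting the total size of all files stored in the tree. It is useful for giving each user a bounded amount of storage.
-clrSpaceQuota: clears the specified space quota
-refreshServiceAcl: reloads the namenode's service-level authorization policy file.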
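The help text for -setSpaceQuota makes two points that are easy to check numerically: quotas may be written with a binary suffix, and replication counts against the quota. The sketch below covers both. The suffixes m, t and p come from the help text; k and g are included on the assumption that they follow the same scheme. All function names are hypothetical.

```python
UNITS = {"k": 1024, "m": 1024**2, "g": 1024**3, "t": 1024**4, "p": 1024**5}


def parse_space_quota(text):
    """Convert a quota such as '50t' or '1024' to a number of bytes."""
    text = text.strip().lower()
    if text and text[-1] in UNITS:
        return int(text[:-1]) * UNITS[text[-1]]
    return int(text)


def quota_consumed(file_sizes_bytes, replication):
    """Space charged against the quota: every replica counts."""
    return sum(file_sizes_bytes) * replication


def fits_quota(file_sizes_bytes, replication, quota_text):
    return quota_consumed(file_sizes_bytes, replication) <= parse_space_quota(quota_text)


one_gb = 1024**3
print(quota_consumed([one_gb], 3) // one_gb)   # a 1 GB file at replication 3
print(fits_quota([one_gb], 3, "2g"))
print(fits_quota([one_gb], 3, "3g"))
```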
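1.4.2 The fsck tool
Hadoop provides the fsck tool to check the health of files in HDFS. It looks for blocks that are missing from all datanodes, as well as under-replicated and over-replicated blocks. Note that fsck gets its information only from the namenode; it does not contact any datanode, so it never actually retrieves block data.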
hadoop fsck /
[root@xenserver hadoop6]# xm console hadoop1
WARNING: xend/xm is deprecated.
PCI: Warning: Cannot find a gap in the 32bit address range
PCI: Unassigned devices with 32bit resource registers may break!
PCI: Fatal: No config space access function found
ipmi_si: Could not set up I/O space
ipmi_si: Could not set up I/O space
ipmi_si: Could not set up I/O space
Welcome to CentOS
Starting udev: [ OK ]
Setting hostname hadoop1: [ OK ]
Setting up Logical Volume Management: 3 logical volume(s) in volume group "VolGroup" now active
[ OK ]
Checking filesystems
Checking all file systems.
[/sbin/fsck.ext4 (1) -- /] fsck.ext4 -a /dev/mapper/VolGroup-lv_root
/dev/mapper/VolGroup-lv_root: clean, 62607/3276800 files, 1911998/13107200 blocks
[/sbin/fsck.ext4 (1) -- /boot] fsck.ext4 -a /dev/xvda1
/dev/xvda1: recovering journal
/dev/xvda1: clean, 38/128016 files, 49624/512000 blocks
[/sbin/fsck.ext4 (1) -- /home] fsck.ext4 -a /dev/mapper/VolGroup-lv_home
/dev/mapper/VolGroup-lv_home: recovering journal
/dev/mapper/VolGroup-lv_home: clean, 8755/29204480 files, 1969939/116811776 blocks
[ OK ]
Remounting root filesystem in read-write mode: [ OK ]
Mounting local filesystems: [ OK ]
Enabling /etc/fstab swaps: [ OK ]
Entering non-interactive startup
Starting monitoring for VG VolGroup: 3 logical volume(s) in volume group "VolGroup" monitored
[ OK ]
ip6tables: Applying firewall rules: [ OK ]
Bringing up loopback interface: [ OK ]
Bringing up interface eth0: Determining if ip address 10.1.2.184 is already in use for device eth0...
[ OK ]
Starting auditd: [ OK ]
Starting system logger: [ OK ]
Mounting filesystems: [ OK ]
Retrigger failed udev events[ OK ]
Starting sshd: [ OK ]
Starting postfix: [ OK ]
Starting crond: [ OK ]
CentOS release 6.5 (Final)
Kernel 2.6.32-431.el6.x86_64 on an x86_64
hadoop1 login: root
Password:
Last login: Tue Jan 19 10:59:56 from 10.1.2.192
[root@hadoop1 ~]# hadoop fsck /
-bash: hadoop: command not found
[root@hadoop1 ~]# cd /home/hadoop/hadoop-1.0.4/bin/
[root@hadoop1 bin]# ./hadoop fsck /
FSCK started by root from /10.1.2.184 for path / at Tue Jan 19 20:59:55 CST 2016
........................
/data/appstore/chDownloadForPlayer/2016/01/14/00/output/_logs/history/job_201601082048_0706_conf.xml: CORRUPT block blk_-1739242649335851318
........................
/data/appstore/chRetainAndFresh/2016/01/14/00/output/_logs/history/job_201601082048_0707_conf.xml: CORRUPT block blk_5175780252882211574
........................
FSCK ended at Tue Jan 19 20:59:56 CST 2016 in 1469 milliseconds
Permission denied: user=root, access=READ_EXECUTE, inode=".staging":hadoop:supergroup:rwx------
Fsck on path '/' FAILED
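The fsck output reports the following conditions:
Over-replicated blocks
Blocks whose replica count exceeds the target replication level. Strictly speaking this is not a serious problem; HDFS deletes the excess replicas automatically.
Under-replicated blocks
Blocks whose replica count is below the target replication level. HDFS automatically creates new replicas until the target is met. Use hadoop dfsadmin -metasave FILE to get information about blocks that are being replicated (or waiting to be replicated).
Misreplicated blocks
Blocks that violate the replica placement policy. For example, in a multi-rack cluster with a replication level of 3, a block whose three replicas all sit on the same rack is misreplicated, because replicas should be spread across at least two racks for reliability.
Corrupt blocks
Blocks whose replicas are all corrupt. A block with at least one intact replica is not reported as corrupt; the namenode creates new replicas from the intact one until the target replication level is met.
Missing replicas
Blocks that have no replicas anywhere in the cluster.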
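The categories can be summarized as a small decision function. This is a sketch of the definitions given above, not of fsck's actual implementation; classify_block and its parameters are hypothetical, and the order of the checks is my own choice.

```python
def classify_block(good_replicas, corrupt_replicas, target_replication,
                   racks_used, multi_rack_cluster=True):
    """Map replica counts for one block to the category fsck would report."""
    if good_replicas == 0 and corrupt_replicas == 0:
        return "missing"
    if good_replicas == 0:
        return "corrupt"            # every remaining replica is damaged
    if good_replicas > target_replication:
        return "over-replicated"
    if good_replicas < target_replication:
        return "under-replicated"
    if multi_rack_cluster and target_replication > 1 and racks_used < 2:
        return "misreplicated"      # all replicas share one rack
    return "healthy"


print(classify_block(4, 0, 3, racks_used=2))
print(classify_block(2, 1, 3, racks_used=2))
print(classify_block(3, 0, 3, racks_used=1))
print(classify_block(0, 3, 3, racks_used=2))
print(classify_block(0, 0, 3, racks_used=0))
```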
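1.5 The balancer
The balancer is a Hadoop daemon that redistributes blocks by moving them from over-utilized datanodes to under-utilized ones.
Start the balancer with the command below. The -threshold argument gives the threshold (as a percentage) used to decide whether the cluster is balanced; the default is 10%.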
start-balancer.sh
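The threshold test can be written out explicitly. The sketch assumes the usual definition of "balanced": each datanode's utilization (used space divided by capacity) differs from the utilization of the whole cluster by no more than the threshold. is_balanced is a hypothetical helper and the node figures are invented.

```python
def is_balanced(nodes, threshold_pct=10.0):
    """nodes is a list of (used_bytes, capacity_bytes) pairs, one per datanode."""
    total_used = sum(used for used, _ in nodes)
    total_capacity = sum(capacity for _, capacity in nodes)
    cluster_utilization = 100.0 * total_used / total_capacity
    return all(
        abs(100.0 * used / capacity - cluster_utilization) <= threshold_pct
        for used, capacity in nodes
    )


print(is_balanced([(50, 100), (55, 100)]))   # both nodes close to the cluster average
print(is_balanced([(90, 100), (10, 100)]))   # 40 points away from the average of 50%
```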
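III. Maintenance
Metadata backups
If the namenode's persistent metadata is lost or damaged, the entire filesystem becomes unusable. One way to back it up is to use a script that periodically archives the secondary namenode's previous.checkpoint subdirectory to an offsite location. This subdirectory lives under the directory defined by fs.checkpoint.dir.
Data backups
distcp is an ideal backup tool; its parallel file copying can store backups on another HDFS cluster.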
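3. Adding new nodes
3.1 Commissioning new nodes
(1) Configure hdfs-site.xml to point to the namenode.
(2) Configure mapred-site.xml to point to the jobtracker.
(3) Start the datanode and tasktracker daemons.
Note: the datanodes that are permitted to connect to the namenode are listed in a file whose name is given by the dfs.hosts property. The file resides on the namenode's local filesystem and contains one line per datanode, identified by network address. If a datanode needs several network addresses, put them on one line separated by spaces. Because nodes in a cluster usually run both the datanode and tasktracker daemons, dfs.hosts and mapred.hosts typically point to the same file, referred to as the include file.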
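The include file format described in the note (one datanode per line, several addresses separated by spaces) is simple enough to parse directly. A minimal sketch follows; parse_hosts_file is a hypothetical helper, and the pairing of address and hostname in the sample is made up for illustration.

```python
def parse_hosts_file(text):
    """Return one list of addresses per non-empty line of an include file."""
    nodes = []
    for line in text.splitlines():
        addresses = line.split()
        if addresses:
            nodes.append(addresses)
    return nodes


sample = "10.1.2.215\n10.1.2.216 slave-two\n\n"
nodes = parse_hosts_file(sample)
print(nodes)
```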
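3.2 The file (or files) named by dfs.hosts and mapred.hosts are different from the slaves file
The former are used by the namenode and jobtracker to decide which worker nodes may connect.
The latter is used by the Hadoop control scripts to perform cluster-wide operations, such as restarting the cluster.
3.3 Steps for adding new nodes to the cluster
(1) Add the network addresses of the new nodes to the include file.
(2) Update the namenode with the new set of permitted datanodes: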
hadoop dfsadmin -refreshNodes
(3) Update the jobtracker with the new set of permitted tasktrackers:
hadoop mradmin -refreshNodes
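(4) Update the slaves file with the new nodes, so that the Hadoop control scripts include them in future operations.
(5) Start the new datanodes and tasktrackers.
(6) Check that the new datanodes and tasktrackers appear in the web UI.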
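4. Decommissioning old nodes
4.1 The user tells the namenode which datanodes are to be retired, and Hadoop replicates their blocks to other datanodes before those datanodes are shut down.
4.2 The HDFS include and exclude files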
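Node in include file? | Node in exclude file? | Interpretation |
No | No | Node may not connect |
No | Yes | Node may not connect |
Yes | No | Node may connect |
Yes | Yes | Node may connect and will be decommissioned |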
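The table reduces to two checks, which the following sketch encodes. node_status is a hypothetical helper that mirrors the four rows; it does not call into Hadoop.

```python
def node_status(in_include, in_exclude):
    """Return the interpretation from the include/exclude table."""
    if not in_include:
        return "may not connect"
    if in_exclude:
        return "may connect and will be decommissioned"
    return "may connect"


for include in (False, True):
    for exclude in (False, True):
        print(include, exclude, node_status(include, exclude))
```

Note that the include file is checked first: a node that is absent from it cannot connect, whatever the exclude file says.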
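4.3 Steps for removing nodes from the cluster
(1) Add the network addresses of the nodes to be decommissioned to the exclude file. Do not update the include file.
(2) Update the namenode with the new set of permitted datanodes: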
hadoop dfsadmin -refreshNodes
(3) Update the jobtracker with the new set of permitted tasktrackers:
hadoop mradmin -refreshNodes
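(4) Go to the web UI and check that the admin state of the datanodes being decommissioned has changed to "Decommission In Progress". These datanodes copy their blocks to other datanodes.
(5) When all the datanodes report their state as "Decommissioned", all blocks have been replicated. Shut down the decommissioned nodes.
(6) Remove the nodes from the include file, then run: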
hadoop dfsadmin -refreshNodes
hadoop mradmin -refreshNodes
(7) Remove the nodes from the slaves file.