Hadoop Analysis, Part 3: Roles and Functions of the Classes in org.apache.hadoop.hdfs.server.namenode

 
This analysis uses Hadoop 0.21 as an example.
NameNode.java: maintains the filesystem namespace and file metadata. The class comment in the source explains:
/**********************************************************
 * NameNode serves as both directory namespace manager and
 * "inode table" for the Hadoop DFS.  There is a single NameNode
 * running in any DFS deployment.  (Well, except when there
 * is a second backup/failover NameNode.)
 *
 * The NameNode controls two critical tables:
 *   1)  filename -> blocksequence (namespace)
 *   2)  block -> machinelist ("inodes")
 *
 * The first table is stored on disk and is very precious.
 * The second table is rebuilt every time the NameNode comes
 * up.
 *
 * 'NameNode' refers to both this class as well as the 'NameNode server'.
 * The 'FSNamesystem' class actually performs most of the filesystem
 * management.  The majority of the 'NameNode' class itself is concerned
 * with exposing the IPC interface and the http server to the outside world,
 * plus some configuration management.
 *
 * NameNode implements the ClientProtocol interface, which allows
 * clients to ask for DFS services.  ClientProtocol is not
 * designed for direct use by authors of DFS client code.  End-users
 * should instead use the org.apache.nutch.hadoop.fs.FileSystem class.
 *
 * NameNode also implements the DatanodeProtocol interface, used by
 * DataNode programs that actually store DFS data blocks.  These
 * methods are invoked repeatedly and automatically by all the
 * DataNodes in a DFS deployment.
 *
 * NameNode also implements the NamenodeProtocol interface, used by
 * secondary namenodes or rebalancing processes to get partial namenode's
 * state, for example partial blocksMap etc.
 **********************************************************/
FSNamesystem.java: maintains several key tables: the mapping from file names to block lists; the set of valid blocks; the mapping from blocks to the machines that store them; the mapping from machines to their block lists; and an LRU cache of recently updated heartbeat machines. From the source:
/***************************************************
 * FSNamesystem does the actual bookkeeping work for the
 * DataNode.
 *
 * It tracks several important tables.
 *
 * 1)  valid fsname --> blocklist  (kept on disk, logged)
 * 2)  Set of all valid blocks (inverted #1)
 * 3)  block --> machinelist (kept in memory, rebuilt dynamically from reports)
 * 4)  machine --> blocklist (inverted #2)
 * 5)  LRU cache of updated-heartbeat machines
 ***************************************************/
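The relationships among these tables can be sketched with a toy model. This is only an illustration of the bookkeeping described above, not the actual FSNamesystem implementation; all class and method names here are invented for the example:

```java
import java.util.*;

// Toy model of the FSNamesystem tables listed in the comment above.
// Table 2 (the set of valid blocks) is derived from table 1;
// tables 3 and 4 are mutual inverses, rebuilt from block reports.
class NamespaceTables {
    // 1) filename -> ordered block list (the part persisted to disk)
    final Map<String, List<Long>> fileToBlocks = new HashMap<>();
    // 3) block -> machines holding a replica (rebuilt from block reports)
    final Map<Long, Set<String>> blockToMachines = new HashMap<>();
    // 4) machine -> blocks it stores (inverse of table 3)
    final Map<String, Set<Long>> machineToBlocks = new HashMap<>();

    void addFile(String name, List<Long> blocks) {
        fileToBlocks.put(name, blocks);
    }

    // Simulates part of a DataNode block report: record that a machine
    // holds a replica of the given block, updating both inverse tables.
    void reportReplica(long blockId, String machine) {
        blockToMachines.computeIfAbsent(blockId, b -> new HashSet<>()).add(machine);
        machineToBlocks.computeIfAbsent(machine, m -> new HashSet<>()).add(blockId);
    }

    // 2) the set of all valid blocks, derived from table 1
    Set<Long> validBlocks() {
        Set<Long> all = new HashSet<>();
        for (List<Long> bs : fileToBlocks.values()) all.addAll(bs);
        return all;
    }
}
```

Note that only table 1 needs to be durable; a restart can rebuild tables 3 and 4 by replaying block reports, which is exactly why the NameNode comment says the second table "is rebuilt every time the NameNode comes up."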
INode.java: HDFS abstracts files and directories as INodes.
/**
 * We keep an in-memory representation of the file/block hierarchy.
 * This is a base INode class containing common fields for file and
 * directory inodes.
 */
FSImage.java: persists INode information to the FSImage file on disk.
/**
 * FSImage handles checkpointing and logging of the namespace edits.
 *
 */
FSEditLog.java: writes the edits file.
/**
 * FSEditLog maintains a log of the namespace modifications.
 *
 */
BlockInfo.java: while INodes describe files and directories, file contents are described in terms of blocks. Given a file of length Size, the file is divided, starting from offset 0, into fixed-size, sequentially numbered pieces; each such piece is a block.
/**
 * Internal class for block metadata.
 */
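The fixed-size division described above amounts to simple ceiling arithmetic. A minimal sketch (the class and method names are illustrative, not part of Hadoop):

```java
// Hypothetical helper showing how a file of a given length is split
// into fixed-size, sequentially numbered blocks.
class BlockSplit {
    /** Number of blocks needed for a file of the given size. */
    static long blockCount(long fileSize, long blockSize) {
        if (fileSize == 0) return 0;
        return (fileSize + blockSize - 1) / blockSize;  // ceiling division
    }

    /** Byte offset at which block number i (0-based) starts. */
    static long blockStart(long i, long blockSize) {
        return i * blockSize;
    }

    public static void main(String[] args) {
        long blockSize = 64L * 1024 * 1024;   // 64 MB, the default of that era
        long fileSize = 200L * 1024 * 1024;   // a 200 MB file
        // 200 MB / 64 MB rounds up to 4 blocks; the last one is partial.
        System.out.println(blockCount(fileSize, blockSize));
    }
}
```

Only the last block may be shorter than the block size; every other block is exactly blockSize bytes.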
DatanodeDescriptor.java: represents a concrete storage node.
/**************************************************
 * DatanodeDescriptor tracks stats on a given DataNode,
 * such as available storage capacity, last update time, etc.,
 * and maintains a set of blocks stored on the datanode.
 *
 * This data structure is a data structure that is internal
 * to the namenode. It is *not* sent over-the-wire to the Client
 * or the Datanodes. Neither is it stored persistently in the
 * fsImage.

 **************************************************/
FSDirectory.java: represents all directories in HDFS and their structural attributes.
/*************************************************
 * FSDirectory stores the filesystem directory state.
 * It handles writing/loading values to disk, and logging
 * changes as we go.
 *
 * It keeps the filename->blockset mapping always-current
 * and logged to disk.
 *
 *************************************************/
EditLogOutputStream.java: all journal records are written through EditLogOutputStream. In a concrete deployment, this set of streams comprises several EditLogFileOutputStream instances and one EditLogBackupOutputStream.
/**
 * A generic abstract class to support journaling of edits logs into
 * a persistent storage.
 */
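The fan-out just described, where one edit is written to every registered stream (local file streams and the backup-node stream alike), can be sketched as follows. The names here are illustrative stand-ins, not the actual Hadoop classes:

```java
import java.util.*;

// Sketch of journaling fan-out: one logical edit goes to every
// registered destination, mirroring how FSEditLog drives its set
// of EditLogOutputStream instances.
abstract class JournalStream {
    abstract void write(String record);
    abstract void flush();
}

// Stand-in destination that just buffers records in memory;
// a real file-backed stream would write and fsync to disk here.
class MemoryJournalStream extends JournalStream {
    final List<String> records = new ArrayList<>();
    void write(String record) { records.add(record); }
    void flush() { /* a file stream would sync to stable storage */ }
}

class EditLogFanout {
    private final List<JournalStream> streams = new ArrayList<>();

    void addStream(JournalStream s) { streams.add(s); }

    // Log one namespace edit to every destination, then flush them all,
    // so no destination falls behind the others.
    void logEdit(String record) {
        for (JournalStream s : streams) s.write(record);
        for (JournalStream s : streams) s.flush();
    }
}
```

Because every destination receives every record, a file stream and a backup-node stream stay in step, which is what lets the backup node maintain an up-to-date copy of the namespace.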
EditLogFileOutputStream.java: writes journal records to edits or edits.new.
/**
 * An implementation of the abstract class {@link EditLogOutputStream}, which
 * stores edits in a local file.
 */
EditLogBackupOutputStream.java: sends journal records over the network to the backup node.
/**
 * An implementation of the abstract class {@link EditLogOutputStream},
 * which streams edits to a backup node.
 *
 * @see org.apache.hadoop.hdfs.server.protocol.NamenodeProtocol#journal
 * (org.apache.hadoop.hdfs.server.protocol.NamenodeRegistration,
 *  int, int, byte[])
 */
BackupNode.java: the NameNode's backup. Its evolution across upgrades: Secondary NameNode -> Checkpoint Node (periodically saves metadata and performs checkpoints) -> Backup Node (keeps an in-memory image fully consistent with the NameNode, updating it whenever the metadata changes; it can checkpoint from its own image without downloading from the NameNode) -> Standby Node (supports hot standby).
/**
 * BackupNode.
 * <p>
 * Backup node can play two roles.
 * <ol>
 * <li>{@link NamenodeRole#CHECKPOINT} node periodically creates checkpoints,
 * that is downloads image and edits from the active node, merges them, and
 * uploads the new image back to the active. </li>
 * <li>{@link NamenodeRole#BACKUP} node keeps its namespace in sync with the
 * active node, and periodically creates checkpoints by simply saving the
 * namespace image to local disk(s).</li>
 * </ol>
 */
BackupStorage.java: creates the jspool under the Backup Node's backup directory, creates edits.new, and points the output stream at edits.new.
  /**
   * Load checkpoint from local files only if the memory state is empty.<br>
   * Set new checkpoint time received from the name-node. <br>
   * Move <code>lastcheckpoint.tmp</code> to <code>previous.checkpoint</code>.
   * @throws IOException
   */
TransferFsImage.java: responsible for fetching files from the NameNode.
/**
 * This class provides fetching a specified file from the NameNode.
 */
GetImageServlet.java: an HttpServlet subclass that handles doGet requests.
/**
 * This class is used in Namesystem's jetty to retrieve a file.
 * Typically used by the Secondary NameNode to retrieve image and
 * edit file for periodic checkpointing.
 */