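Hadoop generally ships with three default parameter files: core-default.xml, hdfs-default.xml, and mapred-default.xml. The files that users are expected to configure are core-site.xml, hdfs-site.xml, and mapred-site.xml. The meaning of the relevant parameters in each of them is described below.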
The three key configuration files
1. core-site.xml
[node1 conf]$ cat core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://192.168.0.75:9000/</value>
<description>URI of NameNode.</description>
</property>
-- fs.default.name: the URI of the default file system (here, the NameNode).
<property>
<name>hadoop.tmp.dir</name>
<value>/data1/tmp</value>
<description>Temp dir.</description>
</property>
-- hadoop.tmp.dir: the base temporary directory.
<property>
<name>dfs.hosts.exclude</name>
<value>/home/ocdc/hadoop-ocdc/conf/excludes</value>
<description>List of excluded DataNodes.</description>
</property>
-- dfs.hosts.exclude: path to a file listing the DataNodes to exclude (a blacklist).
<property>
<name>fs.default.name0</name>
<value>hdfs://0.0.0.0:9000/</value>
</property>
-- fs.default.name0: an Avatar Hadoop setting, the URI of the primary NameNode. It is currently not set to a real host (the address is 0.0.0.0).
<property>
<name>fs.default.name1</name>
<value>hdfs://192.168.0.69:9000/</value>
</property>
</configuration>
-- fs.default.name1: the URI of the standby NameNode.
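To make the relationship between the default files and the site files concrete, here is a minimal Python sketch (not Hadoop's own loader, which is the Java `Configuration` class). It parses a Hadoop-style configuration document into a dictionary and lets site values override defaults. The sample is trimmed from the core-site.xml listing above; the function names and the `defaults` dictionary are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Trimmed from the core-site.xml listing above.
CORE_SITE = """<?xml version="1.0"?>
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://192.168.0.75:9000/</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/data1/tmp</value>
</property>
</configuration>
"""


def load_properties(xml_text):
    """Return {name: value} for every <property> in a Hadoop-style config."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


def merge(defaults, site):
    """Site values win over defaults that share the same name."""
    merged = dict(defaults)
    merged.update(site)
    return merged


# Illustrative stand-in for values that would come from core-default.xml.
defaults = {"fs.default.name": "file:///", "io.file.buffer.size": "4096"}
effective = merge(defaults, load_properties(CORE_SITE))
print(effective["fs.default.name"])
```

A parameter that appears only in the defaults (here `io.file.buffer.size`) keeps its default value, while one that appears in both takes the site value.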
2. hdfs-site.xml
[node1 conf]$ cat hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<!-- The necessary parameters, please modify value
-->
<property>
<name>dfs.name.dir</name>
<value>/home/ocdc/hadoop-ocdc/data/namenode</value>
<description>File fsimage location.
If this is a comma-delimited list of directories then the name
table is replicated in all of the directories, for redundancy.
</description>
</property>
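-- dfs.name.dir: where the NameNode stores its data (the fsimage). The default is ${hadoop.tmp.dir}/dfs/name. If several directories are listed, a copy of the data is kept in each of them.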
<property>
<name>dfs.name.edits.dir</name>
<value>/home/ocdc/hadoop-ocdc/data/editlog</value>
<description>File edits location.
If this is a comma-delimited list of directories then the name
table is replicated in all of the directories, for redundancy.
</description>
</property>
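-- dfs.name.edits.dir: where the NameNode stores its edit log. Several directories can be listed here as well, for redundancy.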
<property>
<name>dfs.data.dir</name>
<value>/data01/data,/data02/data,/data03/data,/data04/data,/data05/data</value>
<description>Determines where on the local filesystem an DFS data node
should store its blocks. If this is a comma-delimited
list of directories, then data will be stored in all named
directories, typically on different devices.
Directories that do not exist are ignored.
</description>
</property>
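-- dfs.data.dir: the directories in which a DataNode stores its blocks. Several directories can be listed to hold the data jointly; directories that do not exist are ignored. Five data directories are configured here.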
[hadoop@node1 ~]$ df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/vg_wtoc3-lv_root
25G 3.3G 20G 15% /
tmpfs 32G 88K 32G 1% /dev/shm
/dev/cciss/c0d0p1 485M 38M 422M 9% /boot
/dev/mapper/vg_wtoc300-lv_data1
917G 1023M 870G 1% /data1
/dev/mapper/vg_wtoc301-lv_data2
917G 3.1G 868G 1% /data2
/dev/mapper/vg_wtoc3-lv_home
80G 13G 63G 17% /home
/dev/mapper/mpathe 474G 381G 69G 85% /data02
/dev/mapper/mpathf 474G 379G 71G 85% /data03
tmpfs 16G 3.4G 13G 22% /dev/flare
/dev/mapper/mpathd 474G 378G 72G 85% /data01
fuse_dfs 12T 8.9T 2.8T 77% /home/ocdc/fuse-dfs
/dev/mapper/mpathg 474G 373G 77G 83% /data04
/dev/mapper/mpathh 474G 373G 77G 83% /data05
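The rule that a comma-delimited directory list is split and that missing directories are ignored can be sketched in Python as follows. This only illustrates the rule stated in the description of dfs.data.dir; the real DataNode is written in Java and performs further checks. The helper name `usable_data_dirs` is an assumption made for this example.

```python
import os
import tempfile


def usable_data_dirs(value):
    """Split a comma-delimited directory list and keep only existing directories."""
    candidates = [d.strip() for d in value.split(",") if d.strip()]
    return [d for d in candidates if os.path.isdir(d)]


# Demo with a temporary directory so the example runs on any machine.
with tempfile.TemporaryDirectory() as base:
    existing = os.path.join(base, "data01")
    os.mkdir(existing)
    missing = os.path.join(base, "data02")  # deliberately never created
    result = usable_data_dirs(existing + "," + missing)
    print(result == [existing])
```

On the host shown above, the same function could be applied to the configured value `/data01/data,/data02/data,/data03/data,/data04/data,/data05/data` to see which of the five directories are actually present.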
<property>
<name>dfs.datanode.address</name>
<value>0.0.0.0:50010</value>
<description>
The address where the datanode server will listen to.
If the port is 0 then the server will start on a free port.
</description>
</property>
-- dfs.datanode.address: the address and port the DataNode listens on.
[hadoop@node1 ~]$ netstat -an|grep 50010|grep LISTEN
tcp 0 0 ::ffff:192.168.0.70:50010 :::* LISTEN
<property>
<name>dfs.datanode.http.address</name>
<value>0.0.0.0:50075</value>
<description>
The datanode http server address and port.
If the port is 0 then the server will start on a free port.
</description>
</property>
-- dfs.datanode.http.address: the DataNode HTTP server address and port; the default port is 50075.
[hadoop@node1 ~]$ netstat -an|grep 50075
tcp 0 0 ::ffff:192.168.0.70:50075 :::* LISTEN
<property>
<name>dfs.datanode.ipc.address</name>
<value>0.0.0.0:50020</value>
<description>
The datanode ipc server address and port.
If the port is 0 then the server will start on a free port.
</description>
</property>
-- dfs.datanode.ipc.address: the DataNode IPC (RPC) server address and port; the default port is 50020.
<property>
<name>dfs.datanode.handler.count</name>
<value>20</value>
<description>The number of server threads for the datanode.</description>
</property>
-- dfs.datanode.handler.count: the number of DataNode server threads.
<property>
<name>dfs.namenode.handler.count</name>
<value>20</value>
<description>The number of server threads for the namenode.</description>
</property>
-- dfs.namenode.handler.count: the number of NameNode server threads.
<property>
<name>dfs.web.ugi</name>
<value>hadoop,hadoop</value>
<description>The user account used by the web interface.
Syntax: USERNAME,GROUP1,GROUP2, ...
</description>
</property>
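-- dfs.web.ugi: the user name and groups used by the web interface. The value set here clearly does not match the accounts configured on this host.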
<property>
<name>dfs.replication</name>
<value>2</value>
<description>Default block replication.
The actual number of replications can be specified when the file is created.
The default is used if replication is not specified in create time.
</description>
</property>
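-- dfs.replication: the number of replicas kept for each block. On raw disks the default of 3 is the usual setting; here it is 2.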
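The replication factor directly determines how much user data the raw disk space can hold. A rough Python sketch, assuming the five 474G data volumes from the df output above and ignoring space reserved for non-DFS use:

```python
def logical_capacity(raw_capacity, replication):
    """User data that fits when every block is stored `replication` times."""
    if replication < 1:
        raise ValueError("replication must be at least 1")
    return raw_capacity / replication


# Five data volumes of 474G each on this node (sizes taken from df -h).
raw_gb = 5 * 474

for factor in (2, 3):
    print(factor, logical_capacity(raw_gb, factor))
```

With a factor of 2 the node's 2370G of raw space holds about 1185G of user data; with the default of 3 it would hold about 790G.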
<property>
<name>dfs.replication.min</name>
<value>1</value>
<description>Minimal block replication.
</description>
</property>
-- dfs.replication.min: the minimum number of block replicas.
<property>
<name>dfs.support.append</name>
<value>true</value>
<description>set if hadoop support append</description>
</property>
-- dfs.support.append: whether appending to files is allowed; the default is false.
<property>
<name>fs.checkpoint.period</name>
<value>30</value>
<description>set if hadoop support append</description>
</property>
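-- fs.checkpoint.period: the interval between checkpoints, in seconds. The default is 3600; here it is 30. The description text in this entry was evidently copied from dfs.support.append and does not describe this parameter.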
<property>
<name>dfs.http.address</name>
<value>0.0.0.0:50070</value>
</property>
-- dfs.http.address: the address and port of the NameNode HTTP interface.
<property>
<name>fs.checkpoint.dir</name>
<value>${hadoop.tmp.dir}/dfs/namesecondary</value>
<description>Determines where on the local filesystem the DFS secondary
name node should store the temporary images to merge.
If this is a comma-delimited list of directories then the image is
replicated in all of the directories for redundancy.
</description>
</property>
-- fs.checkpoint.dir: the directory in which the secondary NameNode stores the temporary images to merge.
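Values such as `${hadoop.tmp.dir}/dfs/namesecondary` rely on variable expansion. The sketch below shows the idea in Python under simplifying assumptions: it looks names up only in a dictionary of properties (Hadoop's Java `Configuration` class also consults Java system properties), and the depth limit of 20 is chosen for this example.

```python
import re

# Matches ${name} where name contains no braces, dollar signs, or whitespace.
_VAR = re.compile(r"\$\{([^}$\s]+)\}")


def expand(value, props, max_depth=20):
    """Replace ${name} references with values from props until nothing changes."""
    for _ in range(max_depth):
        # Unknown names are left in place unchanged.
        expanded = _VAR.sub(lambda m: props.get(m.group(1), m.group(0)), value)
        if expanded == value:
            return expanded
        value = expanded
    raise ValueError("too many nested substitutions in " + value)


props = {"hadoop.tmp.dir": "/data1/tmp"}
print(expand("${hadoop.tmp.dir}/dfs/namesecondary", props))
```

With hadoop.tmp.dir set to /data1/tmp as in core-site.xml, the checkpoint directory resolves to /data1/tmp/dfs/namesecondary.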
<property>
<name>dfs.balance.bandwidthPerSec</name>
<value>10485760</value>
<description>
Specifies the maximum bandwidth that each datanode can utilize for the balancing purpose in term of the number of bytes per second.
</description>
</property>
-- dfs.balance.bandwidthPerSec: the maximum bandwidth, in bytes per second, that each DataNode may use for balancing.
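To get a feel for what 10485760 bytes per second means in practice, the following Python sketch converts the value to MiB/s and estimates how long one DataNode would need to move a given amount of data at that cap. The 100 GiB figure is a hypothetical workload, not something taken from this cluster.

```python
BANDWIDTH = 10485760  # dfs.balance.bandwidthPerSec, in bytes per second


def mib_per_second(bytes_per_second):
    """Convert bytes/s to MiB/s."""
    return bytes_per_second / (1024 * 1024)


def seconds_to_move(num_bytes, bytes_per_second):
    """Lower bound on the time one DataNode needs to move num_bytes."""
    return num_bytes / bytes_per_second


one_hundred_gib = 100 * 1024 ** 3  # hypothetical amount of data to rebalance

print(mib_per_second(BANDWIDTH))
print(seconds_to_move(one_hundred_gib, BANDWIDTH) / 3600)
```

The value works out to 10 MiB/s, so moving 100 GiB through a single DataNode takes at least 10240 seconds, a little under three hours.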
<property>
<name>dfs.datanode.max.xcievers</name>
<value>4096</value>
</property>
-- dfs.datanode.max.xcievers: the upper limit on the number of files a DataNode handles concurrently, somewhat like nfile on Linux.
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
-- dfs.permissions: whether permissions are checked on file operations; false turns the check off.
<property>
<name>fs.hdfs.impl.disable.cache</name>
<value>false</value>
</property>
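-- fs.hdfs.impl.disable.cache: whether to disable the client-side cache of FileSystem instances for the hdfs scheme; false leaves the cache enabled.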
<property>
<name>dfs.name.dir.shared0</name>
<value>/namenode/namenode0</value>
</property>
-- dfs.name.dir.shared0: the directory in which the HDFS image for AvatarNode0 is stored.
<property>
<name>dfs.name.dir.shared1</name>
<value>/namenode/namenode1</value>
</property>
<property>
<name>dfs.name.edits.dir.shared0</name>
<value>/namenode/editlog0</value>
</property>
<property>
<name>dfs.name.edits.dir.shared1</name>
<value>/namenode/editlog1</value>
</property>
-- Besides the standard Hadoop settings, this file contains the Avatar settings dfs.name.dir.shared0, dfs.name.edits.dir.shared0, dfs.name.dir.shared1, and dfs.name.edits.dir.shared1: the image and edit-log directories for AvatarNode0 and for AvatarNode1, respectively. All of these directories are on a shared NFS mount. When AvatarNode0 runs the primary NameNode, it writes its edit log to dfs.name.edits.dir.shared0, and the standby NameNode on AvatarNode1 reads that log. Conversely, when AvatarNode1 runs the primary NameNode, it writes to dfs.name.edits.dir.shared1, and the standby NameNode on AvatarNode0 reads it.
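The writer/reader relationship described above can be summarized in a few lines of Python. This is only a model of the description in this post, not Avatar code; the function name and the returned dictionary layout are assumptions made for illustration.

```python
# Shared edit-log directories from hdfs-site.xml, keyed by AvatarNode index.
SHARED_EDITS = {
    0: "/namenode/editlog0",  # dfs.name.edits.dir.shared0
    1: "/namenode/editlog1",  # dfs.name.edits.dir.shared1
}


def edit_log_roles(primary):
    """Return who writes and who reads the shared edit log for a given primary."""
    if primary not in SHARED_EDITS:
        raise ValueError("primary must be 0 or 1")
    standby = 1 - primary
    return {
        "writer": "AvatarNode%d" % primary,
        "reader": "AvatarNode%d" % standby,
        "directory": SHARED_EDITS[primary],
    }


print(edit_log_roles(0))
print(edit_log_roles(1))
```

Whichever node is primary writes to its own shared directory, and the other node follows along by reading from that same directory.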
<property>
<name>dfs.http.address0</name>
<value>0.0.0.0:50070</value>
</property>
<property>
<name>dfs.http.address1</name>
<value>192.168.0.69:50070</value>
</property>
</configuration>
-- 0.0.0.0 means the HTTP interface can be reached through any network interface.
3. mapred-site.xml
[node1 conf]$ cat mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>192.168.0.75:9002</value>
<description>The host and port that the MapReduce job tracker runs
at. If "local", then jobs are run in-process as a single map
and reduce task.
</description>
</property>
-- mapred.job.tracker: the host (or IP address) and port of the JobTracker.
<property>
<name>mapred.job.tracker.http.address</name>
<value>192.168.0.75:50030</value>
<description>
The job tracker http server address and port the server will listen on.
If the port is 0 then the server will start on a free port.
</description>
</property>
-- mapred.job.tracker.http.address: the address and port of the JobTracker HTTP server.
<property>
<name>mapred.job.tracker.handler.count</name>
<value>2</value>
<description>
The number of server threads for the JobTracker. This should be roughly
4% of the number of tasktracker nodes.
</description>
</property>
-- mapred.job.tracker.handler.count: the number of JobTracker server threads, roughly 4% of the number of TaskTracker nodes.
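The 4% rule of thumb from the description is easy to turn into a small calculation. The Python sketch below uses integer arithmetic to round to the nearest whole thread; the minimum of one thread and the node counts in the demo are assumptions for this example, not values taken from this cluster.

```python
def jobtracker_handlers(num_tasktrackers, percent=4, minimum=1):
    """Roughly `percent`% of the TaskTracker count, rounded to the nearest integer."""
    rounded = (num_tasktrackers * percent + 50) // 100
    return max(minimum, rounded)


# Hypothetical cluster sizes.
for nodes in (10, 50, 100):
    print(nodes, jobtracker_handlers(nodes))
```

Read the other way round, the value 2 configured here corresponds to a cluster of about 50 TaskTracker nodes.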
<property>
<name>mapred.tasktracker.map.tasks.maximum</name>
<value>7</value>
</property>
-- mapred.tasktracker.map.tasks.maximum: the maximum number of map tasks a TaskTracker runs simultaneously.
<property>
<name>mapred.tasktracker.reduce.tasks.maximum</name>
<value>1</value>
<description>The maximum number of reduce tasks that will be run
simultaneously by a task tracker.
</description>
</property>
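-- mapred.tasktracker.reduce.tasks.maximum: the maximum number of reduce tasks a TaskTracker runs simultaneously.
-- Together, these two parameters set the map and reduce concurrency on each node; in effect they cap the number of tasks running at the same time.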
<property>
<name>mapred.task.tracker.report.address</name>
<value>127.0.0.1:0</value>
<description>The interface and port that task tracker server listens on.
Since it is only connected to by the tasks, it uses the local interface.
EXPERT ONLY. Should only be changed if your host does not have the loopback
interface.</description>
</property>
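-- mapred.task.tracker.report.address: the address the TaskTracker listens on for its tasks. Leave it at 127.0.0.1 unless the host has no loopback interface.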
<property>
<name>mapred.local.dir</name>
<value>${hadoop.tmp.dir}/mapred/local</value>
<description>The local directory where MapReduce stores intermediate
data files. May be a comma-separated list of
directories on different devices in order to spread disk i/o.
Directories that do not exist are ignored.
</description>
</property>
-- mapred.local.dir: where MapReduce stores intermediate data; several directories can be listed.
<property>
<name>mapred.system.dir</name>
<value>${hadoop.tmp.dir}/mapred/system</value>
<description>The shared directory where MapReduce stores control files.
</description>
</property>
-- mapred.system.dir: the HDFS path under which the Map/Reduce framework stores its control (system) files.
<property>
<name>mapred.temp.dir</name>
<value>${hadoop.tmp.dir}/mapred/temp</value>
<description>A shared directory for temporary files.
</description>
</property>
-- mapred.temp.dir: a shared directory for MapReduce temporary files.
<property>
<name>mapred.map.tasks</name>
<value>1</value>
<description>The default number of map tasks per job.
Ignored when mapred.job.tracker is "local".
</description>
</property>
-- mapred.map.tasks: the default number of map tasks per job.
<property>
<name>mapred.reduce.tasks</name>
<value>1</value>
<description>The default number of reduce tasks per job. Typically set to 99%
of the cluster's reduce capacity, so that if a node fails the reduces can
still be executed in a single wave.
Ignored when mapred.job.tracker is "local".
</description>
</property>
-- mapred.reduce.tasks: the default number of reduce tasks per job.
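The description of mapred.reduce.tasks suggests about 99% of the cluster's reduce capacity, and that capacity follows from the per-node slot settings shown earlier (7 map slots and 1 reduce slot per TaskTracker). A Python sketch of the arithmetic, assuming a hypothetical cluster of 50 TaskTrackers:

```python
def cluster_slots(num_tasktrackers, slots_per_tracker):
    """Total task slots of one kind across the cluster."""
    return num_tasktrackers * slots_per_tracker


def suggested_reduce_tasks(reduce_capacity):
    """About 99% of the reduce capacity, rounded down, but at least 1."""
    return max(1, reduce_capacity * 99 // 100)


trackers = 50  # hypothetical node count
map_capacity = cluster_slots(trackers, 7)     # mapred.tasktracker.map.tasks.maximum
reduce_capacity = cluster_slots(trackers, 1)  # mapred.tasktracker.reduce.tasks.maximum

print(map_capacity, reduce_capacity, suggested_reduce_tasks(reduce_capacity))
```

Under that assumption the cluster has 350 map slots and 50 reduce slots, and the guideline would suggest 49 reduce tasks rather than the value of 1 configured here.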
<property>
<name>hadoop.job.history.user.location</name>
<value>none</value>
<description> User can specify a location to store the history files of
a particular job. If nothing is specified, the logs are stored in
output directory. The files are stored in "_logs/history/" in the directory.
User can stop logging by giving the value "none".
</description>
</property>
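-- hadoop.job.history.user.location: where the history files of a job are stored; the value "none" turns this logging off.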
<property>
<name>mapred.compress.map.output</name>
<value>true</value>
</property>
-- mapred.compress.map.output: whether map output is compressed.
<property>
<name>mapred.tasktracker.expiry.interval</name>
<value>30000</value>
</property>
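-- mapred.tasktracker.expiry.interval: the TaskTracker expiry time, in milliseconds. A TaskTracker that sends no heartbeat for 30000 ms (30 seconds) is considered expired.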
<property>
<name>mapred.job.reuse.jvm.num.tasks</name>
<value>-1</value>
<description>How many tasks to run per jvm. If set to -1, there is
no limit.
</description>
</property>
-- mapred.job.reuse.jvm.num.tasks: the number of tasks run per JVM; -1 means no limit.
<property>
<name>mapred.task.timeout</name>
<value>90000000</value>
<description>The number of milliseconds before a task will be
terminated if it neither reads an input, writes an output, nor
updates its status string.
</description>
</property>
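-- mapred.task.timeout: a task that neither reads input, writes output, nor updates its status for 90000000 ms (90000000/1000/60/60 = 25 hours) is terminated. The default is 10 minutes.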
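The unit conversion for mapred.task.timeout can be checked with a short Python snippet. The 600000 ms figure is the 10-minute default expressed in milliseconds.

```python
def ms_to_minutes(milliseconds):
    """Convert milliseconds to minutes."""
    return milliseconds / 1000 / 60


def ms_to_hours(milliseconds):
    """Convert milliseconds to hours."""
    return ms_to_minutes(milliseconds) / 60


configured = 90000000  # value set in this mapred-site.xml
default = 600000       # the 10-minute default, in milliseconds

print(ms_to_hours(configured))
print(ms_to_minutes(default))
```

The configured value is 150 times the default, so a hung task on this cluster can occupy its slot for more than a day before it is killed.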
<property>
<name>mapred.reduce.parallel.copies</name>
<value>20</value>
</property>
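-- mapred.reduce.parallel.copies: the number of parallel transfers a reduce task runs during the copy phase; the default is 5.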
<property>
<name>mapred.child.java.opts</name>
<value>-Xmx1024m</value>
</property>
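-- mapred.child.java.opts: reduce tasks are usually the more memory-hungry ones, and this option sets the maximum JVM heap available to each task. Do not set it too high, though: if a task needs more than 2 GB, consider optimizing the program design instead.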
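Because every task slot can start its own child JVM, the heap limit multiplies with the slot counts. The Python sketch below extracts the -Xmx value from an options string and estimates the worst-case heap demand per node, using the 7 map slots and 1 reduce slot configured earlier. The parser is a simplification written for this example and handles only the k, m, and g suffixes.

```python
import re

_UNITS = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def max_heap_bytes(java_opts):
    """Return the -Xmx limit in bytes, or None if the option is absent."""
    match = re.search(r"-Xmx(\d+)([kKmMgG]?)", java_opts)
    if match is None:
        return None
    number, unit = match.groups()
    return int(number) * _UNITS.get(unit.lower(), 1)


heap = max_heap_bytes("-Xmx1024m")  # mapred.child.java.opts
slots = 7 + 1                       # map slots + reduce slots per TaskTracker
worst_case = heap * slots

print(heap, worst_case / 1024 ** 3)
```

With these settings a fully loaded node may need up to 8 GiB of heap for task JVMs alone, in addition to the DataNode and TaskTracker daemons.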
<!-- FAIR Scheduler
<property>
<name>mapred.jobtracker.taskScheduler</name>
<value>org.apache.hadoop.mapred.FairScheduler</value>
</property>
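-- mapred.jobtracker.taskScheduler: the class that implements task scheduling. The default is org.apache.hadoop.mapred.JobQueueTaskScheduler (FIFO); a non-default value such as this one means the Fair Scheduler algorithm is used instead of FIFO.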
<property>
<name>mapred.fairscheduler.allocation.file</name>
<value>conf/pool-site.xml</value>
</property>
<property>
<name>mapred.fairscheduler.poolnameproperty</name>
<value>pool.name</value>
</property>
<property>
<name>pool.name</name>
<value>${user.name}</value>
</property>
-->
-- Settings for the Fair Scheduler.
<!-- FAST Scheduler
<property>
<name>mapred.jobtracker.taskScheduler</name>
<value>org.apache.hadoop.mapred.FastScheduler</value>
</property>
<property>
<name>mapred.fastscheduler.standard</name>
<value>100</value>
</property>
<property>
<name>mapred.fastscheduler.scale</name>
<value>0.8</value>
</property>
-->
-- The default scheduling algorithm is FIFO; the alternatives are the Capacity Scheduler (developed by Yahoo!) and the Fair Scheduler (developed by Facebook).
<!-- Average Schedule
<property>
<name>mapred.jobtracker.priority.average</name>
<value>false</value>
</property>
-->
</configuration>
References
http://blog.csdn.net/yangjl38/article/details/7583374
http://hadoop.apache.org/docs/r1.0.4/cluster_setup.html#Configuring+the+Hadoop+Daemons