Stage 1 : HDFS Network Setup
1. Log in to HDP120
$ sudo virsh console HDP120
Connected to domain HDP120
Escape character is ^]
2. Configure the network : run on the HDP120 host
Edit the /etc/hosts name-resolution file
Enter the IP addresses and names of all hosts
$ nano /etc/hosts
127.0.0.1 localhost
192.168.100.20 HDP120
192.168.100.21 HDP121
192.168.100.22 HDP122
[Note] Make sure HDP120 is NOT mapped to the loopback address 127.0.0.1; if it is, the DataNodes will be unable to connect to the NameNode.
Use the scp command to copy /etc/hosts to the other Hadoop hosts
# login password is student
$ scp /etc/hosts root@HDP121:/etc/hosts
The authenticity of host 'hdp121 (192.168.100.21)' can't be established.
ECDSA key fingerprint is d9:3b:ed:58:44:29:33:b9:7e:d7:98:89:3a:01:7c:49.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'hdp121,192.168.100.21' (ECDSA) to the list of known hosts.
root@HDP121's password:
hosts 100% 263 0.3KB/s 00:00
# login password is student
$ scp /etc/hosts root@HDP122:/etc/hosts
The authenticity of host 'hdp122 (192.168.100.22)' can't be established.
ECDSA key fingerprint is d9:3b:ed:58:44:29:33:b9:7e:d7:98:89:3a:01:7c:49.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'hdp122,192.168.100.22' (ECDSA) to the list of known hosts.
root@HDP122's password:
hosts 100% 263 0.3KB/s 00:00
3. Set up passwordless SSH login
Perform the following on the NameNode host (HDP120). The hosts that act purely as DataNodes are started by the NameNode logging in to them automatically over ssh; the NameNode also needs passwordless ssh to itself, so that it can start its own namenode, datanode, and secondarynamenode services.
Copy the NameNode's public key (id_dsa.pub) to the NameNode and DataNode hosts, renaming it to authorized_keys
# scp ~/.ssh/id_dsa.pub root@HDP121:/root/.ssh/authorized_keys
root@hdp121's password:
id_dsa.pub 100% 601 0.6KB/s 00:00
# scp ~/.ssh/id_dsa.pub root@HDP122:/root/.ssh/authorized_keys
root@hdp122's password:
id_dsa.pub 100% 601 0.6KB/s 00:00
[Note] The .ssh directory must already exist under /root on HDP121 and HDP122 (already done)
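If the DSA key pair does not exist on HDP120 yet, and to give the NameNode passwordless login to itself (it is listed in conf/slaves and runs a datanode too), the key can be generated and authorized locally roughly as follows. This is only a sketch assuming the default key location; the original steps start from an already existing id_dsa.pub.
# ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa          # generate the DSA key pair with an empty passphrase
# cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys   # authorize the key on HDP120 itself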
4. Test passwordless login and collect the host keys (this step must be performed)
Log in to the NameNode and both DataNodes with the commands below, using the host names (HostName), not IP addresses. Run the commands exactly as shown so that the host keys stored in .ssh/known_hosts can be used correctly when start-dfs.sh is executed.
Passwordless login to HDP120
# ssh HDP120
Linux HDP120 2.6.32-33-generic-pae #72-Ubuntu SMP Fri Jul 29 22:06:29 UTC 2011 i686 GNU/Linux
Ubuntu 10.04.4 LTS
Welcome to Ubuntu!
* Documentation: https://help.ubuntu.com/
Last login: Thu Aug 2 19:00:44 2012
root@HDP120:~# exit
Passwordless login to HDP121
$ ssh HDP121 # do not use the IP address
$ filetool.sh -b # TinyCore system: run this command to save the host key
$ exit
Passwordless login to HDP122
$ ssh HDP122 # do not use the IP address
$ filetool.sh -b # TinyCore system: run this command to save the host key
$ exit
Stage 2 : NameNode Setup (HDP120)
1. NameNode host configuration files
Edit conf/masters
$ cd /mnt/hda1/hadoop-1.0.3/
$ nano conf/masters
HDP120
Edit conf/slaves
$ nano conf/slaves
HDP120
HDP121
HDP122
Edit conf/core-site.xml
$ cd /mnt/hda1/hadoop-1.0.3
$ mkdir data # create the HDFS data storage directory
$ nano conf/core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://HDP120:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/mnt/hda1/hadoop-1.0.3/data</value>
</property>
</configuration>
[Key point]
It is critically important in a real cluster that dfs.name.dir and dfs.data.dir be moved out from hadoop.tmp.dir.
A real cluster should never consider these directories temporary, as
they are where all persistent HDFS data resides. Production clusters
should have two paths listed for dfs.name.dir which are
on two different physical file systems, to ensure that cluster metadata
is preserved in the event of hardware failure.
dfs.name.dir : Path on the local filesystem where the NameNode stores the namespace and transaction logs persistently.
dfs.data.dir : Comma-separated list of paths on the local filesystem of a DataNode where it should store its blocks.
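For illustration only, an hdfs-site.xml fragment that follows this advice might look like the lines below; the paths are hypothetical and are not used anywhere in this lab.
<property>
<name>dfs.name.dir</name>
<value>/disk1/hdfs/name,/disk2/hdfs/name</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/disk1/hdfs/data,/disk2/hdfs/data</value>
</property>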
Edit conf/hdfs-site.xml
$ nano conf/hdfs-site.xml
<?xml version="1.0"?> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?> <!-- Put site-specific property overrides in this file. --> <configuration> <property> <name>dfs.replication</name> <value>2</value> </property> </configuration>
[Note] If dfs.safemode.threshold.pct is set to 0, the NameNode will not enter safe mode (read-only) at startup. The property is declared like this:
<property>
<name>dfs.safemode.threshold.pct</name>
<value>1</value>
</property>
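For reference, safe mode can also be inspected or left manually with the standard dfsadmin subcommands; these are not part of the original steps.
# hadoop dfsadmin -safemode get    # show whether the NameNode is currently in safe mode
# hadoop dfsadmin -safemode leave  # force the NameNode out of safe mode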
2. Recreate the data directory (this step must be performed)
# rm -r data
# mkdir data
3. Format the NameNode storage
# Because storage was created previously, you will be asked whether to re-format. Answer "Y" (must be uppercase)
# hadoop namenode -format
12/08/02 19:18:20 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = HDP120/192.168.100.20
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 1.0.3
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0 -r 1335192; compiled by 'hortonfo' on Tue May 8 20:31:25 UTC 2012
************************************************************/
Re-format filesystem in /mnt/hda1/hadoop-1.0.3/data/dfs/name ? (Y or N) Y
12/08/02 19:18:24 INFO util.GSet: VM type = 32-bit
12/08/02 19:18:24 INFO util.GSet: 2% max memory = 7.425 MB
12/08/02 19:18:24 INFO util.GSet: capacity = 2^21 = 2097152 entries
12/08/02 19:18:24 INFO util.GSet: recommended=2097152, actual=2097152
12/08/02 19:18:25 INFO namenode.FSNamesystem: fsOwner=root
12/08/02 19:18:25 INFO namenode.FSNamesystem: supergroup=supergroup
12/08/02 19:18:25 INFO namenode.FSNamesystem: isPermissionEnabled=true
12/08/02 19:18:25 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
12/08/02 19:18:25 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
12/08/02 19:18:25 INFO namenode.NameNode: Caching file names occuring more than 10 times
12/08/02 19:18:26 INFO common.Storage: Image file of size 110 saved in 0 seconds.
12/08/02 19:18:26 INFO common.Storage: Storage directory /mnt/hda1/hadoop-1.0.3/data/dfs/name has been successfully formatted.
12/08/02 19:18:26 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at HDP120/192.168.100.20
************************************************************/
In the messages above, the common.Storage directory is /mnt/hda1/hadoop-1.0.3/data/dfs/name.
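Looking ahead: once the DataNodes in the next stage have also been configured, HDFS would be started from the NameNode and the running daemons checked roughly like this (a sketch; the original walkthrough does not show this step explicitly):
# start-dfs.sh   # starts the namenode, the datanodes listed in conf/slaves (via ssh), and the secondarynamenode
# jps            # on HDP120 this should list NameNode, DataNode and SecondaryNameNode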
Stage 3 : DataNode Host Setup (HDP121, HDP122)
Perform the following configuration on every DataNode host.
1. Configure the DataNode host
# Log in to HDP121 with account root, password student
$ sudo virsh console HDP121
Connected to domain HDP121
Escape character is ^]
Micro Core Linux 3.8.2
HDP121 login: root
Password:
# Switch to the /mnt/hda1/hadoop-1.0.3 directory
$ cd /mnt/hda1/hadoop-1.0.3
$ mkdir data # create the HDFS data storage directory
Edit the core-site.xml configuration file
$ nano conf/core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://HDP120:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/mnt/hda1/hadoop-1.0.3/data</value>
</property>
</configuration>
Edit the hdfs-site.xml configuration file
$ nano conf/hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
</configuration>
2. Edit the conf/hadoop-env.sh configuration file
$ nano conf/hadoop-env.sh
# Set Hadoop-specific environment variables here.
# The only required environment variable is JAVA_HOME. All others are
# optional. When running a distributed configuration it is best to
# set JAVA_HOME in this file, so that it is correctly defined on
# remote nodes.
# The java implementation to use. Required.
export JAVA_HOME=/mnt/hda1/jdk1.6.0_33 # modified
# Extra Java CLASSPATH elements. Optional.
# export HADOOP_CLASSPATH=
# The maximum amount of heap to use, in MB. Default is 1000.
export HADOOP_HEAPSIZE=384 # modified
# Extra Java runtime options. Empty by default.
# export HADOOP_OPTS=-server
:
[Key points]
DataNode memory calculation
Datanode                        1 * 384 (HADOOP_HEAPSIZE)
Tasktracker                     1 * 384 (HADOOP_HEAPSIZE)
Tasktracker child map task      2 * 200 (mapred.tasktracker.map.tasks.maximum=2)
Tasktracker child reduce task   2 * 200 (mapred.tasktracker.reduce.tasks.maximum=2)
------------------------------------------------------------------------------------
Total                           1568 MB
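For reference, the task-slot and child-heap numbers used above would come from mapred-site.xml entries along these lines; this lab only configures HDFS, so the snippet is illustrative rather than something to apply here.
<property>
<name>mapred.tasktracker.map.tasks.maximum</name>
<value>2</value>
</property>
<property>
<name>mapred.tasktracker.reduce.tasks.maximum</name>
<value>2</value>
</property>
<property>
<name>mapred.child.java.opts</name>
<value>-Xmx200m</value>
</property>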
NameNode memory calculation
1000 MB per million blocks of storage
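As a worked example with a hypothetical block count: a cluster holding about 2 million blocks would want roughly 2 * 1000 MB = 2000 MB of NameNode heap, which could be set in the NameNode's conf/hadoop-env.sh along these lines:
export HADOOP_NAMENODE_OPTS="-Xmx2000m $HADOOP_NAMENODE_OPTS"   # heap for the NameNode daemon only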
----------------------------------------------------------------
Adding a Secondary NameNode
NameNode : HDP120
1. Log in to HDP120
$ sudo virsh console HDP120
Connected to domain HDP120
Escape character is ^]
2. Edit the /etc/hosts file
Add the line "192.168.100.30 HDP130"
$ nano /etc/hosts
127.0.0.1 localhost
192.168.100.20 HDP120
192.168.100.21 HDP121
192.168.100.22 HDP122
192.168.100.30 HDP130
:
3. Copy /etc/hosts to HDP130
# scp /etc/hosts root@192.168.100.30:/etc/hosts
The authenticity of host '192.168.100.30 (192.168.100.30)' can't be established.
RSA key fingerprint is 8d:b7:99:60:7f:39:05:b5:09:5f:ed:a4:af:27:cb:46.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '192.168.100.30' (RSA) to the list of known hosts.
root@192.168.100.30's password:
hosts 100% 306 0.3KB/s 00:00
4. Copy HDP120's SSH public key to HDP130
# scp ~/.ssh/id_dsa.pub root@HDP130:/root/.ssh/authorized_keys
The authenticity of host 'hdp130 (192.168.100.30)' can't be established.
RSA key fingerprint is 8d:b7:99:60:7f:39:05:b5:09:5f:ed:a4:af:27:cb:46.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'hdp130' (RSA) to the list of known hosts.
root@hdp130's password:
id_dsa.pub 100% 601 0.6KB/s 00:00
5. Designate HDP130 as the Secondary NameNode
The masters file specifies the Secondary NameNode, not the NameNode
# cd /mnt/hda1/hadoop-1.0.3
# nano conf/masters
HDP130
6. Log in to HDP130 and save the newly copied files
# ssh HDP130
# filetool.sh -b
Backing up files to /mnt/hda1/tce/mydata.tgz
# exit
7. Declare the Secondary NameNode's data transfer port
Declare it in the hdfs-site.xml configuration file, as follows :
$ nano conf/hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.http.address</name>
<value>HDP120:50070</value>
</property>
<property>
<name>dfs.secondary.http.address</name>
<value>HDP130:50090</value>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
</configuration>
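After HDFS is restarted from HDP120 (for example with stop-dfs.sh followed by start-dfs.sh), the SecondaryNameNode daemon should now come up on HDP130; a quick check, assuming the JDK's jps tool is available on that host:
# ssh HDP130
# jps    # should list SecondaryNameNode
# exit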
8. Detach from the HDP120 console
Press Ctrl + ]
Configure the DataNode (HDP131)
1. Log in to HDP131
$ sudo virsh console HDP131
Connected to domain HDP131
Escape character is ^]
Micro Core Linux 3.8.2
HDP131 login: root
Password:
[Note] Log in with account root, password student
2. Edit the core-site.xml configuration file
# cd /mnt/hda1/hadoop-1.0.3/
# nano conf/core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://HDP120:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/mnt/hda1/hadoop-1.0.3/data</value>
</property>
</configuration>
3. Edit the hdfs-site.xml configuration file
# nano conf/hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
</configuration>
4. Start the datanode service (run the command on HDP131)
# hadoop-daemon.sh start datanode
[Note] The command to start the tasktracker service is:
# hadoop-daemon.sh start tasktracker
5. Verify that HDP131 has become a DataNode
# hadoop dfsadmin -report
Configured Capacity: 16662200320 (15.52 GB)
Present Capacity: 13815033856 (12.87 GB)
DFS Remaining: 13814894592 (12.87 GB)
DFS Used: 139264 (136 KB)
DFS Used%: 0%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
-------------------------------------------------
Datanodes available: 4 (4 total, 0 dead)
Name: 192.168.100.22:50010
Decommission Status : Normal
Configured Capacity: 4226125824 (3.94 GB)
DFS Used: 28672 (28 KB)
Non DFS Used: 397152256 (378.75 MB)
DFS Remaining: 3828944896(3.57 GB)
DFS Used%: 0%
DFS Remaining%: 90.6%
Last contact: Fri Apr 19 14:44:31 CST 2013
Name: 192.168.100.31:50010
Decommission Status : Normal
Configured Capacity: 4226125824 (3.94 GB)
DFS Used: 28672 (28 KB)
Non DFS Used: 833675264 (795.05 MB)
DFS Remaining: 3392421888(3.16 GB)
DFS Used%: 0%
DFS Remaining%: 80.27%
Last contact: Fri Apr 19 14:44:30 CST 2013
Name: 192.168.100.20:50010
Decommission Status : Normal
Configured Capacity: 3983822848 (3.71 GB)
DFS Used: 40960 (40 KB)
Non DFS Used: 1219186688 (1.14 GB)
DFS Remaining: 2764595200(2.57 GB)
DFS Used%: 0%
DFS Remaining%: 69.4%
Last contact: Fri Apr 19 14:44:29 CST 2013
Name: 192.168.100.21:50010
Decommission Status : Normal
Configured Capacity: 4226125824 (3.94 GB)
DFS Used: 40960 (40 KB)
Non DFS Used: 397152256 (378.75 MB)
DFS Remaining: 3828932608(3.57 GB)
DFS Used%: 0%
DFS Remaining%: 90.6%
Last contact: Fri Apr 19 14:44:30 CST 2013
Remove DataNode - HDP120
1. Log in to HDP120
$ sudo virsh console HDP120
Connected to domain HDP120
Escape character is ^]
2. Create the DataNode exclusion list file
# cd /mnt/hda1/hadoop-1.0.3
# nano conf/exclude
HDP120
3. Configure the DataNode exclusion list file
# nano conf/hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.http.address</name>
<value>HDP120:50070</value>
</property>
<property>
<name>dfs.secondary.http.address</name>
<value>HDP130:50090</value>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.hosts.exclude</name>
<value>/mnt/hda1/hadoop-1.0.3/conf/exclude</value>
</property>
</configuration>
4. Apply the DataNode exclusion list file
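The exclude file takes effect only after the NameNode re-reads its node lists; in Hadoop 1.x that is normally done with the dfsadmin command below (the original transcript goes straight to the report, so it is added here as a reminder). After refreshing, the report that follows should show the excluded node as decommissioned.
# hadoop dfsadmin -refreshNodes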
# hadoop dfsadmin -report
Configured Capacity: 16662200320 (15.52 GB)
Present Capacity: 13814939633 (12.87 GB)
DFS Remaining: 13814788096 (12.87 GB)
DFS Used: 151537 (147.99 KB)
DFS Used%: 0%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
-------------------------------------------------
Datanodes available: 3 (4 total, 1 dead)
Name: 192.168.100.22:50010
Decommission Status : Normal
Configured Capacity: 4226125824 (3.94 GB)
DFS Used: 28672 (28 KB)
Non DFS Used: 397160448 (378.76 MB)
DFS Remaining: 3828936704(3.57 GB)
DFS Used%: 0%
DFS Remaining%: 90.6%
Last contact: Fri Apr 19 20:24:45 CST 2013
Name: 192.168.100.31:50010
Decommission Status : Normal
Configured Capacity: 4226125824 (3.94 GB)
DFS Used: 40960 (40 KB)
Non DFS Used: 833683456 (795.06 MB)
DFS Remaining: 3392401408(3.16 GB)
DFS Used%: 0%
DFS Remaining%: 80.27%
Last contact: Fri Apr 19 20:24:45 CST 2013
Name: 192.168.100.21:50010
Decommission Status : Normal
Configured Capacity: 4226125824 (3.94 GB)
DFS Used: 40960 (40 KB)
Non DFS Used: 397160448 (378.76 MB)
DFS Remaining: 3828924416(3.57 GB)
DFS Used%: 0%
DFS Remaining%: 90.6%
Last contact: Fri Apr 19 20:24:42 CST 2013
Name: 192.168.100.20:50010
Decommission Status : Decommissioned
Configured Capacity: 3983822848 (3.71 GB)
DFS Used: 40945 (39.99 KB)
Non DFS Used: 1219256335 (1.14 GB)
DFS Remaining: 2764525568(2.57 GB)
DFS Used%: 0%
DFS Remaining%: 69.39%
Last contact: Fri Apr 19 20:12:49 CST 2013
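Once the node reports Decommissioned, its datanode daemon can be stopped on HDP120; this follow-up step is not shown in the original transcript.
# hadoop-daemon.sh stop datanode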
------------------------------
A new DataNode requires the following to be installed (a bring-up sketch follows this checklist):
1. OS (Red Hat or Fedora)
2. JDK (two choices: Oracle JDK or OpenJDK) => install the Oracle JDK
3. Hadoop package (apache.org)
4. Configuration file settings
conf/core-site.xml
fs.default.name
hadoop.tmp.dir
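A rough bring-up sketch for such a node, assuming the same install path as the existing nodes and reusing the configuration from HDP120 (the host names and paths are simply the ones used earlier in this guide):
# cd /mnt/hda1/hadoop-1.0.3
# scp root@HDP120:/mnt/hda1/hadoop-1.0.3/conf/core-site.xml conf/   # reuse fs.default.name and hadoop.tmp.dir
# scp root@HDP120:/mnt/hda1/hadoop-1.0.3/conf/hdfs-site.xml conf/
# mkdir data                        # HDFS data storage directory
# hadoop-daemon.sh start datanode   # register the new node with the NameNode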