PancrasL's Blog

Deploying a Single-Node Hadoop Platform

2021-01-11

Hadoop

Preliminaries

One of the labs in my Advanced Operating Systems course was to deploy Hadoop on a single machine; this post records the setup process.

System environment

  • Ubuntu 18.04
  • All operations are performed as root

Requirements

  • Make sure the ssh service is installed
  • Disable the Ubuntu firewall (on a cloud server, configure the security group to open the required ports)

Edit the hosts file

$ hostnamectl set-hostname master
$ vi /etc/hosts
# add a local DNS mapping
...
your-ip master
...

Install Java

  • Install the JDK
$ apt update
$ apt install openjdk-8-jdk
  • Add environment variables
$ vi /etc/profile
# add the following
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64

$ source /etc/profile
  • Verify the installation
$ java -version
openjdk version "1.8.0_275"
OpenJDK Runtime Environment (build 1.8.0_275-8u275-b01-0ubuntu1~18.04-b01)
OpenJDK 64-Bit Server VM (build 25.275-b01, mixed mode)

Install Hadoop

Get the Hadoop tarball

  • Download the Hadoop 2.x tarball
$ cd ~
$ wget https://mirrors.aliyun.com/apache/hadoop/common/hadoop-2.10.1/hadoop-2.10.1.tar.gz
  • Extract it
$ tar -xzf hadoop-2.10.1.tar.gz
$ ls
hadoop-2.10.1 hadoop-2.10.1.tar.gz

Edit the configuration files

  • Enter the hadoop-2.10.1/etc/hadoop directory
$ cd /root/hadoop-2.10.1/etc/hadoop
$ ls
capacity-scheduler.xml httpfs-env.sh mapred-env.sh
configuration.xsl httpfs-log4j.properties mapred-queues.xml.template
container-executor.cfg httpfs-signature.secret mapred-site.xml.template
core-site.xml httpfs-site.xml slaves
hadoop-env.cmd kms-acls.xml ssl-client.xml.example
hadoop-env.sh kms-env.sh ssl-server.xml.example
hadoop-metrics2.properties kms-log4j.properties yarn-env.cmd
hadoop-metrics.properties kms-site.xml yarn-env.sh
hadoop-policy.xml log4j.properties yarn-site.xml
hdfs-site.xml mapred-env.cmd
  • Edit hadoop-env.sh
$ vi hadoop-env.sh
# set the JAVA_HOME field
...
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
...
  • Edit core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/tmp/hadoop</value>
  </property>
</configuration>
  • Edit hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///data/hadoop/hdfs/nn</value>
  </property>
  <property>
    <name>dfs.namenode.checkpoint.dir</name>
    <value>file:///data/hadoop/hdfs/snn</value>
  </property>
  <property>
    <name>dfs.namenode.checkpoint.edits.dir</name>
    <value>file:///data/hadoop/hdfs/snn</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///data/hadoop/hdfs/dn</value>
  </property>
</configuration>
  • Edit mapred-site.xml (create it from the template first: cp mapred-site.xml.template mapred-site.xml)
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
  • Edit yarn-site.xml
<configuration>
  <!-- hostname of the ResourceManager -->
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>master</value>
  </property>
  <!-- how reducers fetch data -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.local-dirs</name>
    <value>file:///data/hadoop/yarn/nm</value>
  </property>
</configuration>
  • Edit slaves
$ vi slaves
master
  • Create the data directories (the last path must match yarn.nodemanager.local-dirs, i.e. /data/hadoop/yarn/nm, not /data/hadoop/hdfs/nm)
$ mkdir -p /tmp/hadoop
$ mkdir -p /data/hadoop/hdfs/nn
$ mkdir -p /data/hadoop/hdfs/dn
$ mkdir -p /data/hadoop/hdfs/snn
$ mkdir -p /data/hadoop/yarn/nm
  • Configure the Hadoop environment variables
$ vi /etc/profile
# add the following
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64

export HADOOP_HOME=/root/hadoop-2.10.1

export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

$ source /etc/profile
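Four XML files were edited by hand above, and a single malformed tag will make the daemons fail at startup. A quick well-formedness check can catch this early; the sketch below assumes python3 is available and demonstrates the idea on a scratch copy (for the real files, run the loop inside hadoop-2.10.1/etc/hadoop instead):

```shell
# Check that each *-site.xml parses as XML before starting Hadoop.
# Demo creates one scratch file; run the same loop on the real configs.
DIR=$(mktemp -d)
cat > "$DIR/core-site.xml" <<'EOF'
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
  </property>
</configuration>
EOF
for f in "$DIR"/*.xml; do
  python3 -c "import sys, xml.etree.ElementTree as ET; ET.parse(sys.argv[1])" "$f" \
    && echo "$(basename "$f"): OK"
done
```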

Start Hadoop

  • Format the NameNode
$ hdfs namenode -format
...
21/01/10 14:38:44 INFO common.Storage: Storage directory /data/hadoop/hdfs/nn has been successfully formatted.
...
  • Start HDFS
$ start-dfs.sh
Starting namenodes on [master]
The authenticity of host 'master (172.17.212.17)' can't be established.
ECDSA key fingerprint is SHA256:aOcOvEQXhyzfs4i5vOBoM2raFUt7tqCo22B5zCS4Tto.
Are you sure you want to continue connecting (yes/no)? yes
master: Warning: Permanently added 'master,172.17.212.17' (ECDSA) to the list of known hosts.
root@master's password:
master: starting namenode, logging to /root/hadoop-2.10.1/logs/hadoop-root-namenode-master.out
root@master's password:
master: starting datanode, logging to /root/hadoop-2.10.1/logs/hadoop-root-datanode-master.out
Starting secondary namenodes [0.0.0.0]
The authenticity of host '0.0.0.0 (0.0.0.0)' can't be established.
ECDSA key fingerprint is SHA256:aOcOvEQXhyzfs4i5vOBoM2raFUt7tqCo22B5zCS4Tto.
Are you sure you want to continue connecting (yes/no)? yes
0.0.0.0: Warning: Permanently added '0.0.0.0' (ECDSA) to the list of known hosts.
root@0.0.0.0's password:
0.0.0.0: starting secondarynamenode, logging to /root/hadoop-2.10.1/logs/hadoop-root-secondarynamenode-master.out
  • Start YARN
$ start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /root/hadoop-2.10.1/logs/yarn-root-resourcemanager-master.out
root@master's password:
master: starting nodemanager, logging to /root/hadoop-2.10.1/logs/yarn-root-nodemanager-master.out
  • Check that all daemons are running
$ jps
20771 NodeManager
21557 Jps
19784 NameNode
20233 SecondaryNameNode
19979 DataNode
20620 ResourceManager
  • Web UIs
HDFS: http://ip:50070
YARN: http://ip:8088
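The repeated password prompts during start-dfs.sh and start-yarn.sh come from the start scripts ssh-ing back into the node once per daemon. Key-based SSH removes them. A sketch of the idea, written against a scratch directory so it is safe to run anywhere (for the real setup, the key pair and authorized_keys belong in ~/.ssh, e.g. via ssh-copy-id root@master):

```shell
# Generate an unencrypted RSA key pair and authorize it.
# Demo writes into a scratch dir; in practice use ~/.ssh.
KEYDIR=$(mktemp -d)
ssh-keygen -q -t rsa -N "" -f "$KEYDIR/id_rsa"
cat "$KEYDIR/id_rsa.pub" >> "$KEYDIR/authorized_keys"
chmod 600 "$KEYDIR/authorized_keys"
ls "$KEYDIR"
```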

Run Hadoop Jobs

Upload a file to the Hadoop cluster

  • Upload hadoop-2.10.1.tar.gz to the cluster
$ ls
hadoop-2.10.1 hadoop-2.10.1.tar.gz
$ hadoop fs -put hadoop-2.10.1.tar.gz hdfs://master:9000/


Estimate Pi

$ cd ~/hadoop-2.10.1/share/hadoop/mapreduce/
$ hadoop jar hadoop-mapreduce-examples-2.10.1.jar pi 5 5
...
Estimated value of Pi is 3.68000000000000000000
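The estimate is coarse because `pi 5 5` runs only 5 map tasks with 5 samples each; more samples tighten it. The example throws points into the unit square and counts those inside the quarter circle (Hadoop's version uses a Halton quasi-random sequence rather than plain rand()). A local, Hadoop-free sketch of the same Monte Carlo idea in awk:

```shell
# Plain Monte Carlo estimate of pi: with 100,000 samples the result
# lands near 3.14, unlike the 25-sample run above.
awk 'BEGIN {
  srand(42); n = 100000; inside = 0
  for (i = 0; i < n; i++) {
    x = rand(); y = rand()
    if (x*x + y*y <= 1.0) inside++
  }
  printf "Estimated value of Pi is %f\n", 4 * inside / n
}'
```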

Word Count

# create the source file and upload it to HDFS
$ cat > words.txt <<EOF
hello hello hello
good good
welcome to hadoop, nice to meet you.
EOF

# create directories in HDFS
$ hadoop fs -mkdir /wordcount
$ hadoop fs -mkdir /wordcount/input
$ hadoop fs -put words.txt /wordcount/input

# run the wordcount job
$ cd ~/hadoop-2.10.1/share/hadoop/mapreduce/
$ hadoop jar hadoop-mapreduce-examples-2.10.1.jar wordcount /wordcount/input /wordcount/output
# inspect the results
$ hadoop fs -ls /wordcount/output
Found 2 items
-rw-r--r-- 1 root supergroup 0 2021-01-10 15:17 /wordcount/output/_SUCCESS
-rw-r--r-- 1 root supergroup 61 2021-01-10 15:17 /wordcount/output/part-r-00000
$ hadoop fs -cat /wordcount/output/part-r-00000
good 2
hadoop, 1
hello 3
meet 1
nice 1
to 2
welcome 1
you. 1
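The same result can be reproduced locally with standard Unix tools, which is a handy sanity check: split on whitespace, sort, and count. Note that, like the example job's default tokenizer, this keeps punctuation attached to words, hence `hadoop,` and `you.` in the output:

```shell
# Local equivalent of the wordcount job over the same input:
# one word per line, sorted, counted, then printed as word<TAB>count.
printf 'hello hello hello\ngood good\nwelcome to hadoop, nice to meet you.\n' \
  | tr -s ' ' '\n' \
  | sort | uniq -c \
  | awk '{print $2 "\t" $1}'
```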