Environment preparation:
One Linux server (a virtual machine works fine) with a JDK runtime installed.
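If you are not sure whether a JDK is already available, a quick check might look like the lines below; /usr/local/jdk is simply the location this guide uses later in hadoop-env.sh, not a requirement:
java -version
# list the JDK installation directory that hadoop-env.sh will point to later
ls /usr/local/jdk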
Installation steps:
Download Hadoop
Download addresses:
http://archive.apache.org/dist/hadoop/core/
http://mirror.bit.edu.cn/apache/hadoop/common/
# Beijing Institute of Technology mirror
wget http://mirror.bit.edu.cn/apache/hadoop/common/hadoop-3.2.1/hadoop-3.2.1.tar.gz
# official Apache archive (downloads from within China may be slow)
wget http://archive.apache.org/dist/hadoop/core/hadoop-3.2.1/hadoop-3.2.1.tar.gz
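Optionally, verify the download before unpacking. The checksum file name below follows Apache's usual naming convention and is an assumption; compare the published digest with the locally computed one:
# compute the local digest
sha512sum hadoop-3.2.1.tar.gz
# fetch the published digest (assumed file name) and compare the two values
curl -O http://archive.apache.org/dist/hadoop/core/hadoop-3.2.1/hadoop-3.2.1.tar.gz.sha512
cat hadoop-3.2.1.tar.gz.sha512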
Extract and move to the target directory
tar -zxvf hadoop-3.2.1.tar.gz
mv hadoop-3.2.1 /usr/local/hadoop/
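The relative paths used in the rest of this guide (etc/hadoop/..., ./bin/..., ./sbin/...) assume your working directory is the extracted Hadoop directory, for example:
# the exact path depends on how the mv above resolved
cd /usr/local/hadoop/hadoop-3.2.1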
Edit the configuration files
Edit core-site.xml
vim etc/hadoop/core-site.xml

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
    <description>NameNode URL</description>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/work/hadoop/temp</value>
    <description>Base directory for Hadoop's local temporary files</description>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>131702</value>
    <description>Buffer size for reading and writing sequence files</description>
  </property>
</configuration>
Edit hdfs-site.xml
vim etc/hadoop/hdfs-site.xml

<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/home/work/hadoop/name</value>
    <description>Local filesystem path where the NameNode stores the namespace and edit logs</description>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/home/work/hadoop/data</value>
    <description>Comma-separated list of local paths where the DataNode stores blocks</description>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
    <description>Replication factor; size it according to the number of cluster nodes</description>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>http://localhost:9001</value>
    <description>HTTP address of the secondary NameNode</description>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
    <description>Whether to enable WebHDFS access</description>
  </property>
</configuration>
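The local directories referenced in the two files above (temp, name, and data) can be created up front to avoid permission surprises; a minimal sketch using the paths configured here:
mkdir -p /home/work/hadoop/temp /home/work/hadoop/name /home/work/hadoop/data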
Edit hadoop-env.sh
Set the JAVA_HOME variable here. Hadoop does not read it from the system environment, so it has to be configured manually.
vim etc/hadoop/hadoop-env.sh
Add the following line at the end of the file:
export JAVA_HOME=/usr/local/jdk/jdk1.8.0_211
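If you are unsure where the JDK actually lives on your machine, one common way to find it is shown below; the path printed will of course differ per installation:
# prints the real path of the java binary, e.g. /usr/local/jdk/jdk1.8.0_211/bin/java
readlink -f $(which java)
# JAVA_HOME is that path with the trailing /bin/java removed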
Format the NameNode
./bin/hadoop namenode -format
Do not run this step more than once; otherwise, uploading files later will fail with the following error:
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /user/jcyx/c602d45d-1a84-4f34-9d95-cac9a556721b.txt could only be written to 0 of the 1 minReplication nodes. There are 1 datanode(s) running and 1 node(s) are excluded in this operation.
To reinitialize the NameNode, first stop all daemons, then clear the data directory configured for HDFS (i.e., the folder that hadoop.tmp.dir in core-site.xml points to), and only then format again.
For more details, see the article "hadoop上传文件错误" (Hadoop file-upload error).
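A sketch of that reset sequence, using the directories configured earlier; removing the name and data directories as well is a common precaution rather than something strictly required by the text above:
./sbin/stop-dfs.sh
# clear the local storage directories from core-site.xml and hdfs-site.xml
rm -rf /home/work/hadoop/temp /home/work/hadoop/name /home/work/hadoop/data
./bin/hadoop namenode -format
./sbin/start-dfs.sh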
Start HDFS
./sbin/start-dfs.sh
If you see the following error:
[root@localhost hadoop-3.2.1]# ./sbin/start-dfs.sh
Starting namenodes on [node001]
ERROR: Attempting to operate on hdfs namenode as root
ERROR: but there is no HDFS_NAMENODE_USER defined. Aborting operation.
Starting datanodes
ERROR: Attempting to operate on hdfs datanode as root
ERROR: but there is no HDFS_DATANODE_USER defined. Aborting operation.
Starting secondary namenodes [localhost]
ERROR: Attempting to operate on hdfs secondarynamenode as root
ERROR: but there is no HDFS_SECONDARYNAMENODE_USER defined. Aborting operation.
Solution:
Add the following parameters at the top of each of the four scripts below.
In sbin/start-dfs.sh and sbin/stop-dfs.sh, add:
#!/usr/bin/env bash
HDFS_DATANODE_USER=root
HADOOP_SECURE_DN_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root
In sbin/start-yarn.sh and sbin/stop-yarn.sh, add:
#!/usr/bin/env bash
YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=root
If startup still fails with a permission-denied error:
WARNING: HADOOP_SECURE_DN_USER has been replaced by HDFS_DATANODE_SECURE_USER. Using value of HADOOP_SECURE_DN_USER.
Starting namenodes on [192.168.40.138]
Last login: Fri Jul 3 06:07:35 EDT 2020 on pts/1
192.168.40.138: Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
Starting datanodes
Last login: Fri Jul 3 06:07:55 EDT 2020 on pts/1
localhost: Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
Starting secondary namenodes [192.168.40.138]
Last login: Fri Jul 3 06:07:55 EDT 2020 on pts/1
192.168.40.138: Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
You need to set up passwordless SSH login:
Run: ssh-keygen -t rsa
Press Enter at each prompt to accept the defaults.
Then run: ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.40.138
Or, for a regular user: ssh-copy-id NAME@IP
Enter the root password of the 192.168.40.138 machine when prompted.
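Since the log above also shows localhost being refused, the key may need to be copied to the local machine as well (assuming a single-node setup running as root); afterwards, both logins should succeed without a password prompt:
# copy the key to the local machine too (assumption: single-node, root user)
ssh-copy-id root@localhost
# both of these should now log in without asking for a password; type exit to return
ssh root@localhost
ssh root@192.168.40.138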
Check that the daemons are running:
jps
The output should look like this:
[root@localhost hadoop-3.2.1]# jps
14646 DataNode
14504 NameNode
15001 Jps
14874 SecondaryNameNode
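Once NameNode, DataNode, and SecondaryNameNode all appear in jps, a quick end-to-end check confirms that HDFS accepts writes. The directory and file below are arbitrary examples; 9870 is the default NameNode web UI port in Hadoop 3.x:
./bin/hdfs dfs -mkdir -p /tmp/smoke-test
./bin/hdfs dfs -put etc/hadoop/core-site.xml /tmp/smoke-test/
./bin/hdfs dfs -ls /tmp/smoke-test
# the NameNode web UI should also be reachable at http://localhost:9870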