Hadoop - Cluster Deployment


Example:

IP (Hostname)                Roles
192.168.140.130 (hadoop01)   Nn Dn Snn Nm
192.168.140.131 (hadoop02)   Dn Nm Rm
192.168.140.132 (hadoop03)   Dn Nm

Node roles: Nn = NameNode, Dn = DataNode, Snn = SecondaryNameNode, Nm = NodeManager, Rm = ResourceManager. For more on node types, see: Hadoop-2.节点


The following steps are performed on all three machines (steps that apply to a single node only are noted inline):

  1. Synchronize the system time (one way is sketched below)
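    • A minimal sketch, assuming a CentOS/RHEL-style system where chrony is the stock NTP client:
      # install chrony, start it at boot, and confirm it is tracking an NTP source
      yum install -y chrony
      systemctl enable --now chronyd
      chronyc sources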
  2. Edit the hosts file with vi /etc/hosts, adding:
    • 192.168.140.130 hadoop01
    • 192.168.140.131 hadoop02
    • 192.168.140.132 hadoop03
  3. Turn off the firewall
    • systemctl stop firewalld.service
    • systemctl disable firewalld.service
    • firewall-cmd --state (confirm it reports not running)
  4. Change the hostname with vi /etc/hostname
    • Set the three machines to hadoop01, hadoop02, and hadoop03 respectively
  5. Set up passwordless SSH login (mainly needed on the Nn and Rm nodes, which run the start scripts in step 12)
    • ssh-keygen -t rsa
    • ssh-copy-id hadoop01
    • ssh-copy-id hadoop02
    • ssh-copy-id hadoop03
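    • A quick sanity check that passwordless login works (it should print the remote hostname without prompting for a password):
      ssh hadoop02 hostname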
  6. Install Java and configure its environment variables (see: Linux-yum安装jdk11)
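    • A minimal sketch using yum (the package names below are the stock OpenJDK 11 packages on CentOS/RHEL 9; adjust for your distribution):
      yum install -y java-11-openjdk java-11-openjdk-devel
      java -version
      ls /usr/lib/jvm/   # confirm the actual JDK directory before setting JAVA_HOME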
  7. Unpack the Hadoop tarball: tar -zxvf hadoop-3.3.5.tar.gz -C /opt/hadoop
  8. Edit the configuration files under /opt/hadoop/hadoop-3.3.5/etc/hadoop/:
    • hadoop-env.sh: export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-11.0.18.0.10-2.el9_1.x86_64
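      If the daemons will be started as root (as the scp commands in step 10 assume), Hadoop 3 also refuses to start until the run-as users are declared; a commonly used addition to hadoop-env.sh, assuming everything runs as root:
      export HDFS_NAMENODE_USER=root
      export HDFS_DATANODE_USER=root
      export HDFS_SECONDARYNAMENODE_USER=root
      export YARN_RESOURCEMANAGER_USER=root
      export YARN_NODEMANAGER_USER=root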
    • core-site.xml (points clients at the Nn, i.e. hadoop01; note that hadoop.tmp.dir under /tmp is cleared on reboot, so use a persistent path for anything beyond testing)
      <configuration>
          <property>
              <name>fs.defaultFS</name>
              <value>hdfs://hadoop01:9000</value>
          </property>
          <property>
              <name>hadoop.tmp.dir</name>
              <value>/tmp/hadoop</value>
          </property>
      </configuration>
      
    • hdfs-site.xml (keep 3 replicas of each file and place the Snn on hadoop01)
      <configuration>
         <property>
             <name>dfs.replication</name>
             <value>3</value>
         </property>
         <property>
             <name>dfs.namenode.secondary.http-address</name>
             <value>hadoop01:50090</value>
         </property>
      </configuration>
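      Once HADOOP_HOME is on the PATH (step 11), the effective replication factor can be double-checked with:
      hdfs getconf -confKey dfs.replication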
      
    • mapred-site.xml (run MapReduce jobs on YARN)
      <configuration>
         <property>
             <name>mapreduce.framework.name</name>
             <value>yarn</value>
         </property>
      </configuration>
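      If the pi test in step 14 later fails with a ClassNotFoundException, one commonly used fix (the referenced article may take a different route) is to add the following properties inside this file's <configuration> block, pointing the MapReduce ApplicationMaster and tasks at the install path from step 7:
         <property>
             <name>yarn.app.mapreduce.am.env</name>
             <value>HADOOP_MAPRED_HOME=/opt/hadoop/hadoop-3.3.5</value>
         </property>
         <property>
             <name>mapreduce.map.env</name>
             <value>HADOOP_MAPRED_HOME=/opt/hadoop/hadoop-3.3.5</value>
         </property>
         <property>
             <name>mapreduce.reduce.env</name>
             <value>HADOOP_MAPRED_HOME=/opt/hadoop/hadoop-3.3.5</value>
         </property>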
      
    • yarn-site.xml (places the Rm on hadoop02 and enables the MapReduce shuffle service)
      <configuration>
          <property>
              <name>yarn.resourcemanager.hostname</name>
              <value>hadoop02</value>
          </property>
          <property>
              <name>yarn.nodemanager.aux-services</name>
              <value>mapreduce_shuffle</value>
          </property>
      </configuration>
      
    • workers (this file was named slaves in Hadoop 2); list all worker nodes:
      • hadoop01
      • hadoop02
      • hadoop03
  9. Format HDFS (run once, on the Nn node hadoop01 only): hdfs namenode -format (or hadoop namenode -format)
  10. (Optional) Copy the fully configured directory from one machine to the other two instead of repeating the edits there:
    • scp -r /opt/hadoop/hadoop-3.3.5 root@hadoop02:/opt/hadoop/hadoop-3.3.5
    • scp -r /opt/hadoop/hadoop-3.3.5 root@hadoop03:/opt/hadoop/hadoop-3.3.5
  11. Configure the environment variables
    • vi /etc/profile
    • export HADOOP_HOME=/opt/hadoop/hadoop-3.3.5
    • export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
    • source /etc/profile
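    • A quick check that the new PATH works:
      hadoop version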
  12. Start the cluster
    • start-dfs.sh (run on the Nn node, i.e. hadoop01)
    • start-yarn.sh (run on the Rm node, i.e. hadoop02)
  13. Web UI ports
    • (Nn node, i.e. hadoop01) 50070 (Hadoop 2) | 9870 (Hadoop 3)
    • (Rm node, i.e. hadoop02) 8088
  14. Test
    • Run jps on each node to confirm the expected daemons started (a sketch of the expected layout follows this list)
    • Test uploading a file: hdfs dfs -put /path/to/file /target/path
      • Open hadoop01:9870: Utilities -> Browse the file system -> Go!
      • Check that the uploaded file shows 3 replicas (Replication column)
    • Test the pi calculation example (if it throws a ClassNotFoundException, see the article Hadoop-ClassNotFoundException and the note under mapred-site.xml above)
      • hadoop jar /opt/hadoop/hadoop-3.3.5/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.5.jar pi 10 10
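
Based on the role table at the top, jps should roughly report the following daemons on each node (a sketch; the Jps process itself also appears and PIDs will differ):

hadoop01: NameNode, DataNode, SecondaryNameNode, NodeManager
hadoop02: DataNode, NodeManager, ResourceManager
hadoop03: DataNode, NodeManager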
