Sqoop安装以及指令学习 ----------语句学习一条龙

    技术2022-07-11  83

    Sqoop:

    Sqoop介绍Sqoop安装Sqoop操作连接mysqlHadoop拉取全量导入:按sql语句导入:Append 追加:

    Sqoop介绍

    Sqoop是一个用于在Hadoop和关系型数据库之间传输数据的工具 将数据从RDBMS导入到HDFS RDBMS:HDFS、Hive、Hbase从HDFS导出数据到RDBMS使用MapReduce导入和导出数据,提供并行操作和容错 目标用户 系统管理员、数据库管理员大数据分析师、大数据开发工程师等

    Sqoop安装

    1.解压安装:

    [root@zjw opt]# tar -zxf sqoop-1.4.6-cdh5.14.2.tar.gz [root@zjw opt]# mv sqoop-1.4.6-cdh5.14.2 soft/sqoop146

    2.添加所需驱动 连接Hadoop得三个驱动分别在这三个位置:

    连接mysql的驱动可在自己idea maven本地仓库找。 添加驱动: 3. 配置文件 首先准备好hadoop和hive的环境变量:

    [root@zjw conf]# echo $HADOOP_HOME /opt/soft/hadoop/hadoop-2.6.0-cdh5.14.2 [root@zjw conf]# echo $HIVE_HOME\ > /opt/soft/hive/hive-1.1.0-cdh5.14.2 export HADOOP_COMMON_HOME=/opt/soft/hadoop/hadoop-2.6.0-cdh5.14.2

    复制临时配置文件:

    [root@zjw conf]# cp sqoop-env-template.sh sqoop-env.sh

    在里面添加环境变量:

    #Set path to where hadoop-*-core.jar is available export HADOOP_MAPRED_HOME=/opt/soft/hadoop/hadoop-2.6.0-cdh5.14.2 #set the path to where bin/hbase is available #export HBASE_HOME= #Set the path to where bin/hive is available export HIVE_HOME=/opt/soft/hive/hive-1.1.0-cdh5.14.2 #Set the path for where zookeper config dir is export ZOOCFGDIR=/opt/soft/zookeeper/zookeeper-3.4.5-cdh5.14.2/conf

    4.环境变量配置

    # SQOOP_HOME export SQOOP_HOME=/opt/soft/sqoop146 export PATH=$PATH:$SQOOP_HOME/bin

    然后激活环境变量:

    [root@zjw sqoop146]# vi /etc/profile [root@zjw sqoop146]# source /etc/profile

    Sqoop操作

    连接mysql

    连接mysql 拉取databases 语句: [root@zjw conf]# sqoop list-databases --username root --password 1234 --connect jdbc:mysql://192.168.56.100:3306/ 执行结果: [root@zjw conf]# sqoop list-databases --username root --password 1234 --connect jdbc:mysql://192.168.56.100:3306/ Warning: /opt/soft/sqoop146/../hcatalog does not exist! HCatalog jobs will fail. Please set $HCAT_HOME to the root of your HCatalog installation. Warning: /opt/soft/sqoop146/../accumulo does not exist! Accumulo imports will fail. Please set $ACCUMULO_HOME to the root of your Accumulo installation. 20/06/26 10:13:10 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.14.2 20/06/26 10:13:10 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead. 20/06/26 10:13:10 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset. Fri Jun 26 10:13:11 CST 2020 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification. information_schema hive mydemo mysql performance_schema sys 查看表 语句: [root@zjw conf]# sqoop list-tables --username root --password 1234 --connect jdbc:mysql://192.168.56.100:3306/mydemo 执行结果: [root@zjw conf]# sqoop list-tables --username root --password 1234 --connect jdbc:mysql://192.168.56.100:3306/mydemo Warning: /opt/soft/sqoop146/../hcatalog does not exist! HCatalog jobs will fail. Please set $HCAT_HOME to the root of your HCatalog installation. Warning: /opt/soft/sqoop146/../accumulo does not exist! Accumulo imports will fail. Please set $ACCUMULO_HOME to the root of your Accumulo installation. 20/06/26 10:44:57 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.14.2 20/06/26 10:44:57 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead. 20/06/26 10:44:58 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset. Fri Jun 26 10:44:58 CST 2020 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification. user

    Hadoop拉取

    启动hadoop 然后准备利用sqoop导出:

    全量导入:

    [root@zjw conf]# sqoop import --connect jdbc:mysql://192.168.56.100:3306/mydemo --driver com.mysql.jdbc.Driver --username root --password 1234 --table user --target-dir /tmp/user -m 3

    此时会可能会出现少jar包的错误:

    下载地址:java-json.jar

    再次执行: 查看其中的一个分片内容:

    [root@zjw conf]# hdfs dfs -text /tmp/user/part-m-00000 20/06/26 11:01:55 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 1,zs,male

    按sql语句导入:

    [root@zjw conf]# sqoop import --connect jdbc:mysql://192.168.56.100:3306/mydemo --driver com.mysql.jdbc.Driver --username root --password 1234 --query "select * from user where gender='male' and \$CONDITIONS" --target-dir /tmp/user1 --split-by Id --fields-terminated-by ',' -m 1

    查看查询的内容:

    [root@zjw conf]# hdfs dfs -text /tmp/user1/part-m-00000 20/06/26 11:33:59 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 1,zs,male

    Append 追加:

    [root@zjw myshl]# sqoop import --username root --password 1234 --connect jdbc:mysql://192.168.56.100:3306/mydemo --driver com.mysql.jdbc.Driver --query "select * from myorder where orderdate='2020-06-25' and \$CONDITIONS" --incremental append --check-column orderdate --target-dir /mydata/mytest -m 1

    然后追加6-26日的:

    [root@zjw myshl]# sqoop import --username root --password 1234 --connect jdbc:mysql://192.168.56.100:3306/mydemo --driver com.mysql.jdbc.Driver --query "select * from myorder where orderdate='2020-06-26' and \$CONDITIONS" --incremental append --check-column orderdate --target-dir /mydata/mytest -m 1

    创建一个外部表接数据:

    hive> create database myddd; OK Time taken: 0.8 seconds hive> use myddd; hive> create external table mytab( > custid string, > custname string, > birthday string > ) > row format delimited fields terminated by ',' > location '/mydata/mytest'; OK Time taken: 0.277 seconds hive> select * from mytab; OK 1 dd0001 2020-06-25 2 dd0002 2020-06-25 3 dd0003 2020-06-26 4 dd0004 2020-06-26 3 dd0003 2020-06-26 4 dd0004 2020-06-26 Time taken: 0.388 seconds, Fetched: 6 row(s)

    每次运行 这边外部表就会接取数据

    Processed: 0.010, SQL: 9