HIVE的一些实用技巧

    技术2026-01-16  8

    1 设置会话级别的参数

    # 显示 config 配置的值 # 格式 set <config>; # 示例 set hive.cli.print.current.db # 设置config配置的值 # 格式 set <config>=<config_value>; # 设置示例 set hive.cli.print.current.db=true;

    疑问:调优的时候很多参数是可以调整的,    为什么不配置到全局的里面 (hive-site.xml 文件里都是全局) 去呢? 答:  每个场景的作业情况不同, 调优是针对具体作业的,一个调优无法适应所有场景

     

    2 查看表结构详细

    hive> desc formatted course_select; OK col_name data_type comment # col_name data_type comment stu_id string the id of a student select_courses array<string> selected courses of a student # Detailed Table Information Database: default OwnerType: USER Owner: hdfs CreateTime: Sat Jul 04 16:04:56 CST 2020 LastAccessTime: UNKNOWN Protect Mode: None Retention: 0 Location: hdfs://cdh01:8020/user/hive/warehouse/course_select Table Type: EXTERNAL_TABLE Table Parameters: COLUMN_STATS_ACCURATE true EXTERNAL TRUE numFiles 1 totalSize 98 transient_lastDdlTime 1593850340 # Storage Information SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe InputFormat: org.apache.hadoop.mapred.TextInputFormat OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat Compressed: No Num Buckets: -1 Bucket Columns: [] Sort Columns: [] Storage Desc Params: colelction.delim , field.delim \t serialization.format \t Time taken: 0.138 seconds, Fetched: 34 row(s)

    3 shell 脚本方式执行hive命令

    3.1 hive -e "<hive command>"

    hive> create external table if not exists student_roll( > stu_id int comment 'the id of a student', > basic_info struct<name:string,age:int> comment 'the info of a student', > roll_year string) > row format delimited > fields terminated by '\t' > collection items terminated by ','; OK Time taken: 0.079 seconds hive> select * from student_roll; OK student_roll.stu_id student_roll.basic_info student_roll.roll_year 8102101 {"name":"zhouyu","age":23} 2012 8102121 {"name":"huanggai","age":25} 2015 8102141 {"name":"liming","age":20} 2018 8102108 {"name":"XiaoXiu","age":24} 2013 Time taken: 0.058 seconds, Fetched: 4 row(s) # 实验hive -e 命令 [hdfs@cdh04 hive-app-data]$ vim select_student_roll.sh #!/bin/bash roll_year=$1 hive -e "select * from student_roll where roll_year='$roll_year'" # 执行脚本,这种方式便于传参,更为灵活 [hdfs@cdh04 hive-app-data]$ ./select_student_roll.sh 2018 Total MapReduce CPU Time Spent: 2 seconds 770 msec OK 8102141 {"name":"liming","age":20} 2018 Time taken: 21.038 seconds, Fetched: 1 row(s)

     

    3.2 hive -f <sql脚本文件>

    先删除上面创建的表

    hive> drop table student_roll; OK Time taken: 0.125 seconds hive> > select * from student_roll; FAILED: SemanticException [Error 10001]: Line 1:14 Table not found 'student_roll' hive>

    将创建表和查询数据的动作写入脚本文件

    vim hive-sql02.sql

    create external table if not exists student_roll( stu_id int comment 'the id of a student', basic_info struct<name:string,age:int> comment 'the info of a student', roll_year string) row format delimited fields terminated by '\t' collection items terminated by ','; load data local inpath '/var/lib/hadoop-hdfs/hive-app-data/student_roll.dat' overwrite into table student_roll; select * from student_roll;

    hive -f hive-sql02.sql

    [hdfs@cdh04 hive-app-data]$ hive -f hive-sql02.sql Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=512M; support was removed in 8.0 Java HotSpot(TM) 64-Bit Server VM warning: Using incremental CMS is deprecated and will likely be removed in a future release Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=512M; support was removed in 8.0 Logging initialized using configuration in jar:file:/opt/cloudera/parcels/CDH-5.16.2-1.cdh5.16.2.p0.8/jars/hive-common-1.1.0-cdh5.16.2.jar!/hive-log4j.properties OK Time taken: 1.359 seconds Loading data to table default.student_roll Table default.student_roll stats: [numFiles=1, numRows=0, totalSize=99, rawDataSize=0] OK Time taken: 0.778 seconds OK 8102101 {"name":"zhouyu","age":23} 2012 8102121 {"name":"huanggai","age":25} 2015 8102141 {"name":"liming","age":20} 2018 8102108 {"name":"XiaoXiu","age":24} 2013 Time taken: 0.373 seconds, Fetched: 4 row(s) WARN: The method class org.apache.commons.logging.impl.SLF4JLogFactory#release() was invoked. WARN: Please see http://www.slf4j.org/codes.html#release for an explanation.

     

    Processed: 0.011, SQL: 9