Hive Operation Notes

    2026-01-18

    1. View a database's description and path

    hive> describe database database_name;

    2. Create a table. Write the script /opt/createtable.sql:

    create table IF NOT EXISTS lining_test.emp(
        empno int,
        ename string,
        job string,
        mgr int,
        hiredate string,
        sal double,
        comm double,
        deptno int
    ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

    Run the script:

    hive> source /opt/createtable.sql;

    Load data:

    load data local inpath '/opt/test/emp.txt' overwrite into table emp;

    Query:

    use lining_test;select * from emp;

    Add a column:

    use lining_test;alter table emp add columns(year string);

    3. Partitioned table. Write the script /opt/createparttable.sql:

    create table lining_test.emp_part(
        empno int,
        empname string,
        job string,
        mgr int,
        hiredate string,
        salary double,
        comm double,
        deptno int
    ) partitioned by (year string, month string)
    row format delimited fields terminated by '\t';

    Run the script:

    hive> source /opt/createparttable.sql;

    Load data:

    load data local inpath '/opt/emp2.txt' into table lining_test.emp_part partition(year='2018',month='8');
    load data local inpath '/opt/emp3.txt' into table lining_test.emp_part partition(year='2018',month='3');
    load data local inpath '/opt/emp4.txt' into table lining_test.emp_part partition(year='2017',month='12');

    Query:

    use lining_test;select * from emp_part;

    View the table schema:

    describe lining_test.emp_part;

    4. Run Hive commands directly from the shell

    hive -e "use lining_test; ALTER TABLE emp CHANGE payyear month string;"
    hive -f /path/to/script.sql

    Note: -e takes an inline query string, while -f takes a SQL script file (e.g. one containing select * from db_hive01.emp), not an inline query.

    5. Export query results

    insert overwrite local directory '/opt/test/' row format delimited fields terminated by '\t' select * from emp_part;
    hive -e 'select * from emp_part' >> ./emp_export.txt

    6. Export to HDFS

    insert overwrite directory '/export_data' select * from lining_test.emp;

    Exporting with LOCAL is noticeably slower than exporting without it.

    7. List all functions:

    hive>show functions;

    View a function's usage:

    hive> describe function fun_name;

    Common functions:

    select upper(empname) from emp;
    select unix_timestamp(trackTime) from bflog limit 3;
    select year(hiredate) from emp;
    select month(hiredate) from emp;
    select hour(hiredate) from emp;
    select substr(hiredate,1,4) from emp;
    select split(hiredate,'-')[1] from emp;
    select reverse(hiredate) from emp;
    select concat(empno,'-',empname) from emp;
    case when condition1 then ... when condition2 then ... else ... end

    Custom UDF:

    add jar /opt/test/mylower.jar;
    CREATE TEMPORARY FUNCTION mylower AS 'org.gh.hadoop.hive.MyLower';

    8. Inner join: hive> SELECT sales.*, things.* FROM sales JOIN things ON (sales.id = things.id);

    9. See how many MapReduce jobs Hive will use for a query: hive> EXPLAIN SELECT sales.*, things.* FROM sales JOIN things ON (sales.id = things.id);

    10. Left join: hive> SELECT sales.*, things.* FROM sales LEFT OUTER JOIN things ON (sales.id = things.id);

    11. Map join: Hive can load the smaller table into each mapper's memory to perform the join. hive> SELECT /*+ MAPJOIN(things) */ sales.*, things.* FROM sales JOIN things ON (sales.id = things.id);
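The map-join idea itself is simple: build an in-memory hash map from the small table, then stream the large table past it, joining each row with a single lookup and no shuffle. A minimal Python sketch of that semantics (the row data is made-up illustration data, not from these notes):

```python
# Sketch of the map-join idea behind /*+ MAPJOIN(things) */:
# the small table ("things") is held in memory as a hash map by every mapper,
# and the large table ("sales") is streamed through it, so no reduce-side
# shuffle of the big table is needed.

def map_join(sales, things):
    # Build phase: index the small table by the join key.
    things_by_id = {row["id"]: row for row in things}
    # Probe phase: stream the large table, joining by dictionary lookup.
    joined = []
    for sale in sales:
        match = things_by_id.get(sale["id"])
        if match is not None:  # inner-join semantics: drop non-matching rows
            joined.append({**sale, **match})
    return joined

# Illustration data:
things = [{"id": 1, "name": "hat"}, {"id": 2, "name": "coat"}]
sales = [{"id": 1, "qty": 3}, {"id": 2, "qty": 1}, {"id": 9, "qty": 5}]
print(map_join(sales, things))
```

Because the probe side is never shuffled, this only works when the build side actually fits in a mapper's memory, which is exactly the constraint the MAPJOIN hint expresses.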

    12. INSERT import

    INSERT OVERWRITE TABLE stations_by_year SELECT year, COUNT(DISTINCT station) FROM records2 GROUP BY year;

    13. Create a view:

    CREATE VIEW valid_records AS SELECT * FROM records2 WHERE temperature !=9999;

    14. Regular expressions in Hive. Select every column whose name starts with price:
    (1) hive> select `price.*` from product; (the column regex goes in backticks, not quotes)
    (2) Use LIKE or RLIKE in a WHERE clause.

    15. ORDER BY performs a global sort

    insert overwrite local directory '/opt/test/local' row format delimited fields terminated by '\t' select * from emp order by empno;

    16. SORT BY with DISTRIBUTE BY. DISTRIBUTE BY works like the partitioner in MapReduce: it decides which reducer each row goes to. Combined with SORT BY, each reducer sorts its own partition, so the result is not globally sorted. DISTRIBUTE BY must appear before SORT BY, and mapreduce.job.reduces must be set; otherwise the exported result still lands in a single file.

    set mapreduce.job.reduces=3;
    insert overwrite local directory '/opt/test/local' row format delimited fields terminated by '\t' select * from emp distribute by deptno sort by empno;

    When the DISTRIBUTE BY and SORT BY columns are the same, use CLUSTER BY instead.
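What `distribute by deptno sort by empno` with three reducers produces can be sketched in Python. Hive's default partitioner hashes the distribute key; plain modulo is used here so the sketch is deterministic, and the rows are illustration data, not the emp.txt contents:

```python
NUM_REDUCES = 3  # corresponds to: set mapreduce.job.reduces=3;

def distribute_and_sort(rows, num_reduces=NUM_REDUCES):
    # DISTRIBUTE BY deptno: assign each row to a reducer by its key
    # (modulo stands in for Hive's hash partitioner).
    partitions = [[] for _ in range(num_reduces)]
    for row in rows:
        partitions[row["deptno"] % num_reduces].append(row)
    # SORT BY empno: each reducer sorts only its own partition,
    # so there is no global order across reducers.
    return [sorted(p, key=lambda r: r["empno"]) for p in partitions]

# Illustration rows:
rows = [
    {"empno": 7902, "deptno": 20}, {"empno": 7369, "deptno": 20},
    {"empno": 7839, "deptno": 10}, {"empno": 7499, "deptno": 30},
]
for i, part in enumerate(distribute_and_sort(rows)):
    print(f"reducer {i}: {[r['empno'] for r in part]}")
```

Each partition here maps to one output file of the export, which is why the result stays in a single file when the reducer count is left at one.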

    17. Aggregate functions. Map-side aggregation can improve aggregation performance; enable it with:

    hive> set hive.map.aggr=true;
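hive.map.aggr helps because each mapper pre-aggregates its rows (combiner-style) before the shuffle, so far fewer records cross the network. A rough Python sketch of that effect (the splits are made-up illustration data):

```python
from collections import Counter

# Without map-side aggregation, every input row is shuffled to a reducer.
# With hive.map.aggr=true, each mapper first collapses its rows into
# partial counts, and reducers only merge those partials.

def mapper_with_aggr(keys):
    # Partial aggregation inside a single mapper.
    return Counter(keys)

def reduce_partials(partials):
    # Reducers merge the per-mapper partial counts into final counts.
    total = Counter()
    for p in partials:
        total.update(p)
    return total

# Two mappers' input splits (illustration data):
split1 = ["sales", "hr", "sales", "sales"]
split2 = ["hr", "sales"]
partials = [mapper_with_aggr(split1), mapper_with_aggr(split2)]
# 4 partial records are shuffled here instead of 6 raw rows.
print(reduce_partials(partials))
```

The saving grows with the number of duplicate keys per mapper, at the cost of extra mapper memory for the in-map hash table.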

    18. Word count

    select word, count(*) from (
        select explode(split(sentence, ' ')) as word from article
    ) t group by word;
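The explode(split(...)) word count above is equivalent to this short Python sketch: split turns a sentence into an array, explode emits one row per word, and the outer group by counts them (the sample sentences are made up):

```python
from collections import Counter

# Equivalent of:
#   select word, count(*) from (
#     select explode(split(sentence, ' ')) as word from article
#   ) t group by word;

def word_count(article):
    counts = Counter()
    for sentence in article:
        # split(sentence, ' ') then explode: one row per word.
        counts.update(sentence.split(" "))
    return counts

article = ["hello hive", "hello world"]  # sample rows, not from the notes
print(word_count(article))
```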