目录
hbase+es
OBServer
Endpoint
Coprocessor加载
静态加载
动态加载
phoenix+hbase构建二级索引方案
phoenix使用
Global Indexing的二级索引
Local Indexing的二级索引
本文介绍基于hbase+es 和 phoenix两种方案。
hbase+es索引方案即可根据协处理器在数据写入hbase前或者写入hbase后进行同步到es.可以选择所有数据进行同步,空间换时间。在数据量大的情况下也可通过具体逻辑获取rowkey和需要查询的字段保存到es.通过es查找要查询的字段,锁定rowkey在去hbase种获取。
Hbase Coprocessor分为2种。 OBServer 和 Endpoint.
Observer类似于Mysql的触发器。当发生某些事件的时候这类协处理器会被server调用。本质就是一些散布在Hbase Server端代码种的hook钩子,在固定的事件时被调用。
比如: 1. put操作之前有钩子函数prePut, put操作之后有钩子函数postPut函数
2.get操作之前有钩子函数preGet,get操作之后有钩子函数postGet函数等等
Hbase为我们提供了4种Observer接口。
RegionObserver 主要监听region相关的事件,比如get,put,scan,delete以及flush等。RegionServerObserver 主要监听RegionServer相关事件,比如RegionServer启动,关闭,或者执行Region合并等事件。WALObserver 主要监听WAL相关的事件,比如WAL写入,滚动等。MasterObserver 主要监听Master相关事件,比如建表,删表,以及修改表结构等。
Endpoint类似于mysql的存储过程,客户端可以在调用这些Endpoint处理器执行一段Server端代码,也就是可以将计算逻辑下推到RegionServer执行。
在hbase-site.xml添加对应的协处理器类,并将jar添加到hbse lib目录下。重启集群。
例:
<property> <name>hbase.coprocessor.region.classes</name> <value>org.myname.hbase.coprocessor.AggregateImplementation</value> </property>两种动态加载的方式。
hbase shell. disable 'mytable'alter 'mytable', METHOD => 'table_att', 'Coprocessor'=>'hdfs://namenode:por/ user/hadoop-user/coprocessor.jar| org.myname.hbase.Coprocessor.RegionObserverExample|1073741823| arg1=1,arg2=2'enable 'mytable'卸载协处理器的方式: 主需要把第二步替换成: alter 'mytable',METHOD=>'table_att_unset',NAME=>'coprocessor$1'
2.通过api编码
全局索引。适用于==读多写少==的业务场景。
本地索引。适用于==写操作频繁==以及空间受限制的场景。
针对本地索引和全局索引。又分为不可变索引(适用于数据==只增加不更新并且按照时间先后顺序存储 )
可变索引(适用于数据有==增删改==的场景。默认可变 )。
操作phoenix 批量脚本执行
(批量任务脚本执行) 1. 编写创表脚本 //注意表名不加双引号,会全部大写 vim user_phoenix.sql create table if not exists user_phoenix ( state varchar(10) NOT NULL, city varchar(20) NOT NULL, population BIGINT CONSTRAINT my_pk PRIMARY KEY (state, city)); 2.//准备数据 vim user_phoenix.csv NY,New York,8143197 CA,Los Angeles,3844829 IL,Chicago,2842518 TX,Houston,2016582 PA,Philadelphia,1463281 AZ,Phoenix,1461575 TX,San Antonio,1256509 CA,San Diego,1255540 TX,Dallas,1213825 CA,San Jose,912332 3.编写查看脚本 vim user_phoenix_query.sql select state as "userState",count(city) as "City Count",sum(population) as "Population Sum" FROM user_phoenix GROUP BY state; 4.脚本批量执行。使用psql.py在phoenix安装目录bin/ psql.py /home/hadoop/user_phoenix.sql /home/hadoop/user_phoenix.csv /home/hadoop/user_phoenix_query.sql结果:
操作phoenix第二种方式: shell
1.进入hbase hbase shell 2.hbase添加数据 create 'employee','company','family' put 'employee','row1','company:name','ted' put 'employee','row1','company:position','worker' put 'employee','row1','family:tel','13600912345' put 'employee','row1','family:age','18' put 'employee','row2','company:name','michael' put 'employee','row2','company:position','manager' put 'employee','row2','family:tel','1894225698' put 'employee','row2','family:age','20' 3.建立hhbase到phoenix的映射表 /usr/hdp/current/phoenix-client/bin/sqlline.py CREATE TABLE IF NOT EXISTS "employee" ( "no" VARCHAR(10) NOT NULL PRIMARY KEY, "company"."name" VARCHAR(30), "company"."position" VARCHAR(20), "family"."tel" VARCHAR(20), "family"."age" VARCHAR(20)) column_encoded_bytes=0;结果:
操作phoenix第三种方式:API
public class PhoenixSearch { private Connection connection; private Statement statement; private ResultSet rs; @BeforeTest public void init() throws SQLException { //定义phoenix的连接url地址 String url="jdbc:phoenix:node01:2181"; connection = DriverManager.getConnection(url); //构建Statement对象 statement = connection.createStatement(); } @Test public void queryTable() throws SQLException { //定义查询的sql语句,注意大小写 String sql="select * from US_POPULATION"; //执行sql语句 try { rs=statement.executeQuery(sql); while(rs.next()){ System.out.println("state:"+rs.getString("state")); System.out.println("city:"+rs.getString("city")); System.out.println("population:"+rs.getInt("population")); System.out.println("-------------------------"); } } catch (SQLException e) { e.printStackTrace(); }finally { if(connection!=null){ connection.close(); } } } }
二级索引使用:
1.bin/sqlline.py 2. create table user ( "session_id" varchar(100) not null primary key, "f"."cookie_id" varchar(100), "f"."visit_time" varchar(100), "f"."user_id" varchar(100), "f"."age" varchar(100), "f"."sex" varchar(100), "f"."visit_url" varchar(100), "f"."visit_os" varchar(100), "f"."browser_name" varchar(100), "f"."visit_ip" varchar(100), "f"."province" varchar(100), "f"."city" varchar(100), "f"."page_id" varchar(100), "f"."goods_id" varchar(100), "f"."shop_id" varchar(100)) column_encoded_bytes=0; 3.准备数据 vim user50w.csv d0a13bf9-debe-414e-a7ec-1f4dc6ffec81,d0a13bf9-debe-414e,2016-12-10 07:35:02,e8d3e1e9-7-0,75,male,a.jsp,Unix,Chrome,61.235.105.44,海南省,三亚,page_65,goods_522,shop_319 fe46f7bf-68e0-42c3-8b4b-5181ae992e6b,fe46f7bf-68e0-42c3,2016-12-10 09:46:36,82844fff-0-1,35,male,i.jsp,Unix,Safari,222.67.119.139,云南省,昆明,page_79,goods_961,shop_224 af64005e-d8ef-4523-952b-a797ccc87eae,af64005e-d8ef-4523,2016-12-10 08:59:52,cc13d758-5-2,47,female,f.jsp,window,Chrome,182.90.120.126,福建省,厦门,page_94,goods_355,shop_844 79ccd15c-5eec-41e9-948f-a8c90e2ab561,79ccd15c-5eec-41e9,2016-12-10 09:17:47,8e174007-f-3,16,female,f.jsp,window,Safari,61.235.158.199,河北省,保定,page_77,goods_661,shop_983 acf9cf99-c02b-4e58-a543-592d5ddd15da,acf9cf99-c02b-4e58,2016-12-10 08:35:24,d0205151-b-4,76,male,f.jsp,window,Opera,182.87.170.107,江西省,南昌,page_58,goods_917,shop_321 76507d6a-7e8c-429c-9c44-51235fa50e42,76507d6a-7e8c-429c,2016-12-10 08:33:07,7e3e9aa5-8-5,26,male,f.jsp,Unix,IE,171.13.22.69,四川省,成都,page_73,goods_831,shop_652 31b4e67d-2603-46a2-9152-ef08d51e5b44,31b4e67d-2603-46a2,2016-12-10 07:36:16,7a21a9dd-1-6,17,male,b.jsp,Linux,Firefox,171.8.205.253,湖北省,黄冈,page_79,goods_624,shop_73 ee79a7b8-0440-46d3-a5a6-e7cf973a59bc,ee79a7b8-0440-46d3,2016-12-10 08:07:06,eca940cb-3-7,61,male,a.jsp,Unix,Firefox,182.89.187.67,海南省,三亚,page_77,goods_213,shop_500 4ee59f5e-7a27-40cf-acdc-9bf7252fdaae,4ee59f5e-7a27-40cf,2016-12-10 08:17:39,092a7454-9-8,66,male,i.jsp,Linux,Firefox,123.234.240.235,江西省,南昌,page_75,goods_479,shop_952 6c63e315-ac55-4968-b803-4620654a4cf2,6c63e315-ac55-4968,2016-12-10 08:25:23,72ec3e29-a-9,23,male,g.jsp,window,Firefox,123.235.233.65,河北省,保定,page_21,goods_870,shop_846 241ced55-fe21-46e5-8171-1dc8903e645a,241ced55-fe21-46e5,2016-12-10 07:35:36,9abb589e-0-10,52,male,d.jsp,window,360,222.71.166.211,山西省,太原,page_66,goods_565,shop_381 74afd1ac-3bb4-4001-a6a6-5a87b6d034cd,74afd1ac-3bb4-4001,2016-12-10 09:06:50,ecab0619-1-11,26,female,g.jsp,Unix,360,36.58.133.237,河北省,石家庄,page_92,goods_720,shop_102 6cd8b9a5-ec5d-4ed3-9fd2-a5b6b2a6c0f8,6cd8b9a5-ec5d-4ed3,2016-12-10 09:40:39,430f6b45-8-12,9,female,b.jsp,window,Firefox,61.235.63.41,江西省,南昌,page_0,goods_302,shop_964 37836cbb-5edc-42b1-b567-96fd373c49a1,37836cbb-5edc-42b1,2016-12-10 08:25:39,fc177b4f-6-13,65,male,a.jsp,Unix,IE,182.83.197.94,上海,上海,page_35,goods_671,shop_589 0eaf16e3-455b-4a6f-aae8-8338c560233a,0eaf16e3-455b-4a6f,2016-12-10 07:59:58,f9c54b17-a-14,66,female,f.jsp,Unix,Chrome,182.91.164.116,河南省,郑州,page_88,goods_670,shop_521 a4452fbf-df1c-4a0f-81f5-6ea2f29ce37c,a4452fbf-df1c-4a0f,2016-12-10 09:43:39,ffc3f499-7-15,14,male,c.jsp,window,Opera,171.14.65.223,海南省,三亚,page_84,goods_506,shop_539 542f9fd1-d43a-417f-bd35-067fc809e97e,542f9fd1-d43a-417f,2016-12-10 07:41:36,6f12025e-f-16,78,female,g.jsp,window,Safari,121.76.65.104,河北省,石家庄,page_32,goods_233,shop_115 9a728dae-4246-4b85-8163-4340312f8513,9a728dae-4246-4b85,2016-12-10 07:55:07,82fc4ce6-f-17,50,male,i.jsp,window,360,106.85.80.251,河北省,保定,page_30,goods_558,shop_514 4191896a-5d49-4639-b686-6ea5bd4adc21,4191896a-5d49-4639,2016-12-10 09:54:04,b13ff532-3-18,10,female,f.jsp,window,Firefox,139.214.246.126,湖北省,武汉,page_27,goods_790,shop_952 e9cf805e-9dd0-4f6f-a557-2d4dd77787d9,e9cf805e-9dd0-4f6f,2016-12-10 07:25:28,fa96f2e8-3-19,12,female,i.jsp,window,Chrome,139.197.132.59,福建省,泉州,page_56,goods_859,shop_829 e640a4b8-e3e8-4243-8288-10174a27ebcd,e640a4b8-e3e8-4243,2016-12-10 07:11:22,b0602438-6-20,4,female,c.jsp,Linux,Opera,182.81.203.77,湖北省,武汉,page_60,goods_572,shop_599 98908358-6786-4bc1-aec0-95df25de8625,98908358-6786-4bc1,2016-12-10 08:05:06,90aea264-9-21,71,male,g.jsp,Unix,Chrome,210.46.30.153,海南省,三亚,page_83,goods_684,shop_164 1b59b298-ffcd-48b0-be87-cfd3616a9eb6,1b59b298-ffcd-48b0,2016-12-10 09:52:49,3cca179f-3-22,16,female,c.jsp,window,Opera,121.76.38.151,海南省,三亚,page_61,goods_935,shop_892 7ca8f087-e1a8-40a2-af98-2b6acde257cd,7ca8f087-e1a8-40a2,2016-12-10 08:08:20,ddb9a1e1-f-23,48,male,h.jsp,Unix,Firefox,61.233.133.84,湖北省,黄冈,page_3,goods_800,shop_499 522ee5e4-df0c-4f44-9386-e636f71be21a,522ee5e4-df0c-4f44,2016-12-10 07:13:10,e608db19-8-24,77,female,c.jsp,Unix,Chrome,123.232.74.186,云南省,昆明,page_47,goods_633,shop_193 9162d621-bafd-4320-a563-d9414735feb7,9162d621-bafd-4320,2016-12-10 10:15:37,0e6dc16d-9-25,8,male,f.jsp,window,Chrome,182.85.84.124,湖北省,武汉,page_75,goods_780,shop_261 3fb5df6e-ffab-4d65-a630-96aa411aaea5,3fb5df6e-ffab-4d65,2016-12-10 07:16:20,badf55e0-8-26,40,female,b.jsp,Linux,360,171.10.101.56,黑龙江,哈尔滨,page_80,goods_393,shop_807 8920282e-d47b-4898-a7d8-50003111b790,8920282e-d47b-4898,2016-12-10 10:32:48,ed726c62-7-27,61,male,h.jsp,Unix,Opera,139.205.253.141,湖南省,长沙,page_22,goods_956,shop_757 605dce05-de7c-40ca-b580-71eb5d53fb88,605dce05-de7c-40ca,2016-12-10 08:53:47,ab2ea182-2-28,28,female,d.jsp,Linux,Opera,139.205.238.221,四川省,成都,page_62,goods_377,shop_9 4./usr/hdp/current/phoenix-client/bin/psql.py -t USER /home/hadoop/phoenixData/user50w.csv现在表中已经有数据了
建立索引的时候还要多写一份include字段,一个索引表对应一个独立的Hbase表,读的时候直接从索引表定位并读出信息。所以这种表的应用场景定位是写的慢,读得快
添加索引 create index USER_COOKIE_ID_INDEX on USER ("f"."cookie_id");
注意:不是索引字段,是不会使用到索引表去查询的。
如果有字段不是索引字段是不会走索引查询,此时可以使用强制索引表。
创建covered index
create index on USER_COOKIE_ID_INDEX(user_id) include(sex);针对具体字段添加索引表。 这种索引设计方式只写索引数据,省了不少空间的占用,根据索引信息拿到原rowkey后再根据rowkey到原数据里面获取其它信息。所以这种表的应用场景定位是写的快,读得慢
创建local Indexing 索引: create local index USER_USER_ID_INDEX on USER ("f"."user_id");注意:本地索引查询字段如果不再索引表中,也可以走索引表查询。
删除某个表的某张索引: drop index USER_COOKIE_ID_INDEX on user; 索引重建 alter index USER_COOKIE_ID_INDEX on user rebuild;
对phoenix映射的hbase表创建索引,直接使用hbase API写Hbase表,索引不会同步 只有通过phoenix语法写入才能使用二级索引同步机制。