hive 如何提取json 数组的某一字段

    技术2022-07-10  159

    背景:

    提取json中某个字段的故事,get_json_object,但是发现好像不能直接提取数组

    尝试使用

    split(regexp_replace(substr(contacts, 2, length(contacts)-2), '"}","', '"}",,,,"'), ',,,,')) 

    之后提取失败~ 提取内容基本如下

    [{"userId":89224,"contactName":"","contactPhone":"+","originPhone":"+"},

    {"userId":89223,"contactName":"","contactPhone":"+","originPhone":""},

    {"userId":89221,"contactName":"","contactPhone":"+","originPhone":"+"}]

    还是直接放方案吧 '$[*]' 推荐

    select user_id, get_json_object(contacts,'$[*].userId') as cons from XXXX 

    你就会得到 类似如下的内容

     user_idcons1107521212[8397213,85,7418687,304388953,198708241,148,111920,5285539,97773639]

     

    但是还不够 我想要生成映射map

    此时推荐 :

    LATERAL VIEW explode (split(substr(t.cons, 2, length(t.cons)-2) , ','))  temp as contact_id 

    关于 explode 和  LATERAL VIEW 的用法 可以自行百度 不容踩坑

    最终写法:完成列转行和映射关系 

    SELECT t.user_id as user_id , contact_id from ( select user_id, get_json_object(contacts,'$[*].userId') as cons from phone_contacts WHERE dt = '2020-06-27') as t LATERAL VIEW explode (split(substr(t.cons, 2, length(t.cons)-2) , ',')) temp as contact_id

     

    Processed: 0.017, SQL: 9