HiveSql面试题6--小打卡圈主与用户映射关系问题(SQL之join思维典型案例代表)

    技术2025-12-07  13

    目录

    0 需求分析

    小打卡面试题:计算小打卡平台的圈主建了多少圈子,名下有多少用户

    1 需求实现

    3 小 结


    0 需求分析

    小打卡面试题:计算小打卡平台的圈主建了多少圈子,名下有多少用户

    已知,数据如下:

    圈子可以理解为微信公众号,用户可以理解为公众号的粉丝 tb_habit圈子表:近千万行数据

    master_id(圈住的用户id)habit_id(圈住所建的圈子)open_id1habit_id1open_id1habit_id2open_id1habit_id3open_id2habit_id4open_id2habit_id5open_id3habit_id6open_id3habit_id7.........................

    user_habit_relation用户与圈子关系表:亿行数据

    habit_id(所建的圈子)user_id(加入圈子的用户)habit_id1user_id1habit_id1user_id3habit_id1user_id4habit_id3user_id2habit_id3user_id1habit_id2user_id5habit_id2user_id1habit_id2user_id7habit_id4user_id11habit_id4user_id12habit_id4user_id1habit_id6user_id17

     

    需求:请用hive sql计算出如下结果(同一个圈主名下,同一个用户加多个圈子只计算一次): 

    1 需求实现

    (1)数据准备

    圈住用户id 所建圈子id open_id1,habit_id1 open_id1,habit_id2 open_id1,habit_id3 open_id2,habit_id4 open_id2,habit_id5 open_id3,habit_id6 open_id3,habit_id7 圈子id 加入圈子的用户id habit_id1,user_id1 habit_id1,user_id3 habit_id1,user_id4 habit_id3,user_id2 habit_id3,user_id1 habit_id2,user_id5 habit_id2,user_id1 habit_id2,user_id7 habit_id4,user_id11 habit_id4,user_id12 habit_id4,user_id1 habit_id6,user_id17

    (2)创建hive表

    drop table if exists dan_test.tb_habit CREATE TABLE dan_test.tb_habit ( master_id string, habit_id string ) ROW format delimited FIELDS TERMINATED BY ","; ———————————————— drop table if exists dan_test.user_habit_relation CREATE TABLE dan_test.user_habit_relation ( habit_id string, user_id string ) ROW format delimited FIELDS TERMINATED BY ",";

    (3)导入数据

    load data local inpath "/home/centos/dan_test/tb_habit.txt" into table tb_habit; load data local inpath "/home/centos/dan_test/user_habit_relation.txt" into table user_habit_relation;

    (4)需求实现

      分析:建立圈子关系树。如下图所示:

     

         很明显本题需要先对两张表进行join(join思维:寻找完整表的数据)通过join将两张表关联起来,采用left join以圈子表为主表,建立圈主与粉丝之间的映射关系,此时记录数将会被补全,对应的会出现重复的数据如habit_id,因而再求个数时需要去重,如上图所示由于用户1出现了不同的圈子但又是隶属于同一个圈主的,因而在算用户数时需要去重。具体SQL如下:

    select c.master_id ,count(distinct c.a_habit_id) ,count(distinct c.user_id) from( select a.master_id as master_id ,a.habit_id as a_habit_id ,b.habit_id as b_habit_id ,b.user_id as user_id from tb_habit a left join user_habit_relation b on a.habit_id = b.habit_id ) c group by c.master_id OK open_id1 3 6 open_id2 2 3 open_id3 2 1 Time taken: 10.707 seconds, Fetched: 3 row(s)

    子表结果如下:

    select a.master_id as master_id ,a.habit_id as a_habit_id ,b.habit_id as b_habit_id ,b.user_id as user_id from tb_habit a left join user_habit_relation b on a.habit_id = b.habit_id --------------------------------- OK open_id1 habit_id1 habit_id1 user_id1 open_id1 habit_id1 habit_id1 user_id4 open_id1 habit_id1 habit_id1 user_id3 open_id1 habit_id2 habit_id2 user_id5 open_id1 habit_id2 habit_id2 user_id7 open_id1 habit_id2 habit_id2 user_id1 open_id1 habit_id3 habit_id3 user_id2 open_id1 habit_id3 habit_id3 user_id1 open_id2 habit_id4 habit_id4 user_id11 open_id2 habit_id4 habit_id4 user_id1 open_id2 habit_id4 habit_id4 user_id12 open_id2 habit_id5 NULL NULL open_id3 habit_id6 habit_id6 user_id17 open_id3 habit_id7 NULL NULL Time taken: 10.573 seconds, Fetched: 14 row(s)

    本题的解题关键:建立圈子关系映射图,如上图所示,通过关系树构建关联表,然后按照题意找出需求结果。本题是典型的SQL思维join思维解题方法的典型代表。

    3 小 结

    本题是典型的SQL思维之join思维的典型代表题,主要采用join思维进行解题,根据题意建立映射关系图,然后选择合适的join进行求解。

    本题拓展案例:

    需求:计算平台的每一个用户发过多少朋友圈、获得多少点赞

    已知,数据如下:

    T1:数十万行数据

     需求:请用sql计算出如下结果:

    有了本题的分析作为基础,相信聪明的读者应该不难,具体答案评论区见分晓。该拓展案例映射关系如下图所示,SQL由读者给出。

    Processed: 0.008, SQL: 9