Hive insert succeeds with small data but fails with large data


    With Tez as the Hive execution engine, the following SQL fails (loading a ~7 GB table into a dynamically partitioned ORC table with 30-odd partitions):

    insert overwrite table  dmtest.business_licence_SOFT_INFOE_s partition(AREA_CODE) select enterprise_id,revoke_flag,REGISTER_TIME,update_time, AREA_CODE  from dmtest.business_licence_SOFT_INFOE_H ;

    This is a dynamic partition insert.
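    For context, a dynamic partition insert like this one normally needs dynamic partitioning enabled for the session. A minimal sketch of those settings, which the original post does not show (the limits are illustrative):

    set hive.exec.dynamic.partition=true;               -- allow dynamic partitions
    set hive.exec.dynamic.partition.mode=nonstrict;     -- AREA_CODE comes from the data, not a literal
    set hive.exec.max.dynamic.partitions=1000;          -- per-query cap; 30-odd partitions fits comfortably
    set hive.exec.max.dynamic.partitions.pernode=1000;  -- per-task cap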

    On the full table, the insert fails while writing its output to HDFS:

    Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"enterprise_id":"34000004046964","area_code":"340000","register_time":"2019-10-06 00:00:00","revoke_flag":"1","update_time":"2020-07-03"}
        at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:95)
        at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:70)
        at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:419)
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:185)
        ... 14 more
    Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"enterprise_id":"34000004046964","area_code":"340000","register_time":"2019-10-06 00:00:00","revoke_flag":"1","update_time":"2020-07-03"}
        at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:562)
        at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:86)
        ... 17 more
    Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /tmp/hive/staging/.hive-staging_hive_2020-07-03_15-55-00_467_6569156417071306755-1/_task_tmp.-ext-10000/area_code=340000/_tmp.000009_3 could only be replicated to 0 nodes instead of minReplication (=1).  There are 2 datanode(s) running and no node(s) are excluded in this operation.

    By contrast, executing

    insert overwrite table  dmtest.business_licence_SOFT_INFOE_s partition(AREA_CODE) select enterprise_id,revoke_flag,REGISTER_TIME,update_time, AREA_CODE  from dmtest.business_licence_SOFT_INFOE_H limit 1000;

    completes without error.

    Cause: the job ran with many mappers (58). In a dynamic partition insert, every mapper opens one ORC writer per partition it encounters, so after compression the output is a flood of small files (roughly 30 partitions × 58 mappers ≈ 1,740 files), and the HDFS write stage collapses, hence the "could only be replicated to 0 nodes" error above.
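    As an aside (not the fix this post settles on), a common way to cap the file count in this situation is to send each partition's rows to a single writer, for example by distributing on the partition column; a sketch using the same tables:

    -- one writer (and therefore roughly one ORC file) per AREA_CODE value,
    -- instead of one file per mapper per partition
    insert overwrite table dmtest.business_licence_SOFT_INFOE_s partition(AREA_CODE)
    select enterprise_id, revoke_flag, REGISTER_TIME, update_time, AREA_CODE
    from dmtest.business_licence_SOFT_INFOE_H
    distribute by AREA_CODE;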

     

    Solution:

    For the Tez engine:

    SET tez.grouping.min-size=1024000000;  -- minimum grouped split ~1024 MB; merges small map-side inputs
    SET tez.grouping.max-size=2048000000;  -- maximum grouped split ~2048 MB; caps how much data one mapper gets (optional)

    For the MR engine, simply reduce the number of mappers via the split-size settings (a sketch follows below).
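    A minimal sketch of the MR-side settings, assuming the default CombineHiveInputFormat is in use and ~1 GB splits are acceptable (the values are illustrative):

    set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;  -- combine small input splits
    set mapreduce.input.fileinputformat.split.maxsize=1024000000;               -- ~1024 MB per split => fewer mappers
    set mapreduce.input.fileinputformat.split.minsize.per.node=1024000000;
    set mapreduce.input.fileinputformat.split.minsize.per.rack=1024000000;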

    After applying these settings, the mapper count dropped from 58 to 9.

    The insert then completed successfully.

     

    After that, it is also worth enabling small-file merging for the output (see the link and the sketch below).

    https://www.cnblogs.com/hit-zb/p/12030721.html
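    A minimal sketch of the merge settings usually meant here, assuming Hive on Tez (the size thresholds are illustrative):

    set hive.merge.tezfiles=true;                 -- merge small output files of Tez jobs
    set hive.merge.mapfiles=true;                 -- merge outputs of map-only jobs
    set hive.merge.mapredfiles=true;              -- merge outputs of map-reduce jobs
    set hive.merge.smallfiles.avgsize=256000000;  -- trigger a merge pass when the average output file is below ~256 MB
    set hive.merge.size.per.task=256000000;       -- target size of each merged file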

     

     
