KDB加载文本数据最佳方案--kdloader

    技术2022-07-11  155

    KDB快速加载文本数据方式

    kdloader介绍kdloader最佳性能实践测试环境信息测试步骤总结注意事项kdloader参数说明

    kdloader介绍

    kdLoader 是 K-DB 数据库提供的数据加载工具。只需要创建保存数据的文本文件,从文本文件中导入数据到数据库。可轻松的将大量数据批量载入 K-DB 数据库。

    kdloader最佳性能实践

    测试环境信息

    ENV: 2 + 3 DISK: PCIE SSD CPU: Intel® Xeon® CPU E5-2680 v4 @ 2.40GHz 56cores 机器: NF5280M4 测试表/rows/大小: account / 5725269 / 4G /

    测试步骤

    普通加载,日志产生量为4.2G, 耗时33min kdloader userid=test/test control=ACCOUNT.ctl log=account1.log 设置表不记录redo,日志产生量为4.2G。 耗时33min,无效果 alter table account nologging; kdloader userid=test/test control=ACCOUNT.ctl log=account2.log 调整bindsize, 从65536 改为最大值10485760,日志产生量为3.3.G。 耗时6min, 提高了5倍加载速度 kdloader errors=500 bindsize=10485760 userid=test/test control=ACCOUNT.ctl log=accoun3.log 使用并行加载 direct=Y 方式,日志产生量为0.02G。 耗时2min, 提高了15倍速度, 单个CPU core打满 kdloader errors=500 bindsize=10485760 readsize=2097152 userid=test/test direct=Y control=ACCOUNT.ctl log=accoun5.log 把大文件拆分成多个小文件。 日志产生量为0.02G. 耗时0.5min, 提高了60倍速度。 多个CPU core打满 kdloader errors=500 bindsize=10485760 userid=test/test direct=Y control=ACCOUNT1.ctl log=accoun6.log

    总结

    kdloader程序,最快的加载方式为: 使用 直接路径加载,把大文件拆分为多个小文件,并行加载。速度最快

    注意事项

    设置 LD_MEM_PROTECT=N --没有明显效果如果表上有索引, 不能使用 并行直接路径加载方式。表上有索引的,建议导入数据之后重建索引数据文件中每个列最好不用双引号分割, 否则parallel参数无效。

    kdloader参数说明

    $ kdloader --help

    Options: -h|–help Display the more detailed information. -c|–charset Display usable character sets in this version. -v|–version Display the version information.

    Controls: userid=username/password@dbname Logon information to connect. The dbname has to be matched with the name described in tbdsn.tbr exactly.

    control= The name of the control file. Allowed to use a absolute path and a relative path.

    data= The name of the data file to load. Allowed to use a absolute path and a relative path.

    log= The name of the log file. This is for logging the entire loading process. Allowed to use a absolute path and a relative path. If this parameter is omitted, it is used .log as default.

    bad= The name of the bad file. This is for logging the records to be failed to load. Allowed to use a absolute path and a relative path. If this parameter is omitted, it is used .bad as default.

    discard= The name of the discard file. This is for logging the records to be discarded. Allowed to use a absolute path and a relative path. If this parameter is omitted, the records does not be logged.

    skip= The count of line to skip in data file from the start line. The default is 0.

    errors= The count of error to allow. If error is raised more than this value, abort the loading immediately. The default is 50.

    rows= The count of row for each commit. If this value is smaller, the performance is lower. The default is ‘commit after the end’.

    message= The count of records to process to print out. The value is smaller, the performance is slower. The default is 0.

    readsize= The size of buffer to read from the data file. This value is allowed to use the maximum 2M(2,097,152) byte. The default is 65536 byte.

    disable_idx=<Y|N> Whether to disable indexes before loading. The default is ‘N’.

    direct=<Y|N> Whether to use the Direct Path Loading. If you choose ‘y|Y’, load by the Direct Path Loading. Or load by the conventional loading. The default is ‘N’.

    dpl_log=<Y|N> Whether to log about the Direct Path Loading. If you choose ‘y|Y’, be logged the loading process and use it by recovery. But the performance is low. If you choose ‘n|N’, do not be logged. The default is ‘N’.

    parallel= The count of threads to execute Parallel Direct Path Loading. The default is 1.

    bindsize= The size of buffer to read in the Direct Path Loading. This value is allowed to use the maximum 10M(10,485,760) byte. The default is 65536 byte.

    Processed: 0.008, SQL: 9