Exception when calling Sqoop concurrently through the Java API

    Tech  2023-09-12

    Calling Sqoop concurrently through the Java API produced the following exception:

    2020-07-03 15:10:44 [ pool-1-thread-6:350039 ] - [ ERROR ] Got exception running Sqoop: java.lang.NullPointerException
    java.lang.NullPointerException
        at java.util.Objects.requireNonNull(Objects.java:203)
        at java.util.Arrays$ArrayList.<init>(Arrays.java:3813)
        at java.util.Arrays.asList(Arrays.java:3800)
        at org.apache.sqoop.util.FileListing.getFileListingNoSort(FileListing.java:76)
        at org.apache.sqoop.util.FileListing.getFileListingNoSort(FileListing.java:82)
        at org.apache.sqoop.util.FileListing.getFileListing(FileListing.java:67)
        at com.cloudera.sqoop.util.FileListing.getFileListing(FileListing.java:39)
        at org.apache.sqoop.orm.CompilationManager.addClassFilesFromDir(CompilationManager.java:289)
        at org.apache.sqoop.orm.CompilationManager.jar(CompilationManager.java:374)
        at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:108)
        at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:494)
        at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:621)
        at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
        at cn.xxx.xxx.sync.SqoopSync$sqoopSyncTask.run(SqoopSync.java:196)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

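    Tracing through the source showed that the problem is the jarOutputDir directory held by SqoopOptions: when the failing job went to package its compiled classes into a jar, the directory was gone, so the directory listing in FileListing came back null and Arrays.asList threw the NullPointerException. jarOutputDir is assigned when SqoopOptions is initialized, by calling getNonceJarDir. The relevant code is: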

        private void initDefaults(Configuration baseConfiguration) {
            ...
            this.jarOutputDir = getNonceJarDir(this.tmpDir + "sqoop-" + localUsername + "/compile");
            ...
        }

        private static String getNonceJarDir(String tmpBase) {
            // Upper bound on attempts, so a permissions problem cannot loop forever.
            final int MAX_DIR_CREATE_ATTEMPTS = 32;

            // Once created, the directory is cached in the static field curNonce and reused.
            if (null != curNonce) {
                return curNonce;
            }

            File baseDir = new File(tmpBase);
            File hashDir = null;
            for (int attempts = 0; attempts < MAX_DIR_CREATE_ATTEMPTS; attempts++) {
                // Pick a random MD5-named subdirectory that does not exist yet.
                hashDir = new File(baseDir, RandomHash.generateMD5String());
                while (hashDir.exists()) {
                    hashDir = new File(baseDir, RandomHash.generateMD5String());
                }
                if (hashDir.mkdirs()) {
                    // Ask the JVM to remove the directory on exit (this only succeeds if it is empty by then).
                    hashDir.deleteOnExit();
                    break;
                }
            }

            if (hashDir != null && hashDir.exists()) {
                LOG.debug("Generated nonce dir: " + hashDir.toString());
                curNonce = hashDir.toString();
                return curNonce;
            } else {
                throw new RuntimeException("Could not create temporary directory: " + hashDir
                        + "; check for a directory permissions issue on /tmp.");
            }
        }

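    If jarOutputDir is not set explicitly, it defaults to curNonce. curNonce is a static field, so every Sqoop job running in the same Java process shares one compile directory. Once a Sqoop job that has already finished deletes that directory, the jobs still running fail with the NullPointerException above. Launching Sqoop from the command line does not have this problem, because each Sqoop invocation runs in its own process.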
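    The failure mode can be reproduced without Sqoop at all. The following is a minimal sketch, assuming only the JDK: a static field plays the role of curNonce, and listCompiledFiles copies the listFiles() plus Arrays.asList pattern that appears at the top of the stack trace. The class and method names (NonceDirDemo, getNonceDir, listCompiledFiles) are hypothetical and are not Sqoop APIs.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class: imitates a process-wide cached compile directory.
public class NonceDirDemo {
    // Plays the role of SqoopOptions.curNonce: created once, then shared by every caller in the JVM.
    private static String curNonce;

    static synchronized String getNonceDir() {
        if (curNonce == null) {
            try {
                curNonce = Files.createTempDirectory("compile-").toString();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return curNonce;
    }

    // Same pattern as the listing code in the stack trace: listFiles() returns null
    // for a directory that no longer exists, and Arrays.asList(null) throws NullPointerException.
    static List<File> listCompiledFiles(String dir) {
        File[] files = new File(dir).listFiles();
        return Arrays.asList(files);
    }

    public static void main(String[] args) throws IOException {
        String dirForTaskA = getNonceDir();
        String dirForTaskB = getNonceDir();
        System.out.println("same directory: " + dirForTaskA.equals(dirForTaskB));

        // Task A finishes first and removes the directory it believes is its own.
        Files.deleteIfExists(new File(dirForTaskA).toPath());

        try {
            listCompiledFiles(dirForTaskB); // task B still expects the directory to exist
            System.out.println("task B listed its files");
        } catch (NullPointerException e) {
            System.out.println("task B failed with NullPointerException");
        }
    }
}
```

    Run as-is, it prints "same directory: true" and then "task B failed with NullPointerException", which is the same sequence as two pool threads sharing one cached compile directory.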

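    According to the official documentation, the fix is to set the following parameter. A UUID can be used directly as the directory name to avoid collisions: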

    --bindir <dir>    Output directory for compiled objects
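    Below is a minimal sketch of the per-task --bindir idea, assuming each thread-pool task creates its own UUID-named directory, passes it to Sqoop, and removes only that directory when it finishes. TaskDirs, createTaskDir and deleteRecursively are hypothetical helper names. The base path "sqoop-bindir" under java.io.tmpdir is also an assumption, not a Sqoop default.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.UUID;
import java.util.stream.Stream;

// Hypothetical helpers: one compile directory per task instead of one per JVM.
public class TaskDirs {

    // Creates <base>/<random UUID> and returns it; no two tasks get the same path.
    static Path createTaskDir(String base) {
        Path dir = Paths.get(base, UUID.randomUUID().toString());
        try {
            return Files.createDirectories(dir);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Deletes the directory tree; reverse path order visits children before their parents.
    static void deleteRecursively(Path root) {
        if (!Files.exists(root)) {
            return;
        }
        try (Stream<Path> paths = Files.walk(root)) {
            paths.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String base = Paths.get(System.getProperty("java.io.tmpdir"), "sqoop-bindir").toString();
        Path dir = createTaskDir(base);
        try {
            // Here the task would add "--bindir", dir.toString() to its Sqoop argument array.
            System.out.println("bindir for this task: " + dir);
        } finally {
            // Only this task's directory is removed, so other running tasks are unaffected.
            deleteRecursively(dir);
        }
    }
}
```

    The point of the design is ownership: the task that creates a directory is the only one that deletes it, which removes the cross-task deletion described above.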

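    It is also advisable to set the following options, so that several Sqoop jobs processing the same table do not conflict. For --class-name, the table name plus a UUID suffix works: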

    --outdir <dir>    Output directory for generated code
    --class-name <name>    Sets the generated class name
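    Putting the three options together, here is a sketch of building a per-task import argument list. It assumes each task gets its own UUID suffix and uses it for --bindir, --outdir and --class-name. The connection string, table name, base directory and the names SqoopArgs and importArgs are placeholders. The hyphens are stripped from the UUID because "-" is not legal in a Java class name. The sketch also assumes the table name is already a valid identifier.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.UUID;

// Hypothetical argument builder: every call yields distinct directories and a distinct class name.
public class SqoopArgs {

    static List<String> importArgs(String connect, String table, String baseDir) {
        // UUIDs contain '-', which cannot appear in a Java class name, so remove it.
        String suffix = UUID.randomUUID().toString().replace("-", "");
        String taskDir = baseDir + "/" + suffix;
        return new ArrayList<>(Arrays.asList(
                "import",
                "--connect", connect,
                "--table", table,
                "--bindir", taskDir + "/bin",   // compiled objects for this task only
                "--outdir", taskDir + "/src",   // generated source for this task only
                "--class-name", table + "_" + suffix));
    }

    public static void main(String[] args) {
        // Placeholder connection string, table and base directory.
        List<String> a = importArgs("jdbc:mysql://db-host/demo", "orders", "/tmp/sqoop-tasks");
        System.out.println(String.join(" ", a));
        // The list would then be converted with a.toArray(new String[0]) and handed to Sqoop
        // inside the thread-pool task, for example via Sqoop.runTool(String[], Configuration),
        // which expects the tool name ("import") as the first element.
    }
}
```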