Compiling Spark from Source Against a CDH Hadoop Version


Contents

0. Build Target
1. Download the Source for the Specified Version
2. Prepare the Build Environment: Maven, JDK, and Git
3. Building
  3.1 Modify make-distribution.sh (the file is in ./dev)
  3.2 Add the CDH repository to the root pom.xml of the Spark source
  3.3 Build Command
4. Successful Build and Inspection
5. Building Spark 3.0.0 Against CDH
  5.1 The steps are essentially the same as above
  5.2 Errors and Fixes
    5.2.1 The error
    5.2.2 Modifying the Spark 3.0.0 source
  5.3 Rebuild: Success

0. Build Target

Build spark-2.4.5.tgz integrated with hadoop-2.6.0-cdh5.16.2.

1. Download the Source for the Specified Version

    https://archive.apache.org/dist/spark/

The last four entries in the listing are the source code in four different archive formats; the remaining entries are pre-built packages.

    wget https://archive.apache.org/dist/spark/spark-2.4.5/spark-2.4.5.tgz
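The source unpacks into a directory named after the release. A minimal sketch of extracting it and entering the source root (the /spark/source path that appears later in this article is simply where it was unpacked on the build host):

    tar -xzvf spark-2.4.5.tgz
    cd spark-2.4.5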

2. Prepare the Build Environment: Maven, JDK, and Git

Maven 3.5.4 or newer, Java 8, and Scala 2.11.12 are required; install git with yum install -y git. If you are building inside China, you can point Maven at the Aliyun mirrors by editing /usr/local/maven/conf/settings.xml:

    <mirror>
      <id>nexus-aliyun</id>
      <mirrorOf>central</mirrorOf>
      <name>Nexus aliyun</name>
      <url>http://maven.aliyun.com/nexus/content/groups/public</url>
    </mirror>
    <mirror>
      <id>CN</id>
      <name>OSChina Central</name>
      <url>http://maven.oschina.net/content/groups/public/</url>
      <mirrorOf>central</mirrorOf>
    </mirror>
    <mirror>
      <id>alimaven</id>
      <mirrorOf>central</mirrorOf>
      <name>aliyun maven</name>
      <url>https://maven.aliyun.com/nexus/content/repositories/central/</url>
    </mirror>
    <mirror>
      <id>jboss-public-repository-group</id>
      <mirrorOf>central</mirrorOf>
      <name>JBoss Public Repository Group</name>
      <url>https://repository.jboss.org/nexus/content/groups/public</url>
    </mirror>
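Before starting a long build it is worth confirming the toolchain versions and giving Maven enough memory. A minimal check, assuming the tools are already on the PATH (the MAVEN_OPTS values follow the recommendation in the Spark build documentation):

    # verify the build toolchain
    java -version     # expect 1.8.x
    mvn -version      # expect 3.5.4 or newer
    scala -version    # expect 2.11.12 for Spark 2.4.5
    git --version

    # give Maven more heap and code cache, as the Spark docs suggest
    export MAVEN_OPTS="-Xmx2g -XX:ReservedCodeCacheSize=512m"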

3. Building

Official reference: https://spark.apache.org/docs/2.4.5/building-spark.html

Pay attention to the Scala version when building:

(1) Spark 3.0.0 requires Scala 2.12; you can pass -Dscala.version=2.12.10 to pin the exact version.

    The Maven-based build is the build of reference for Apache Spark. Building Spark using Maven requires Maven 3.6.3 and Java 8. Spark requires Scala 2.12; support for Scala 2.11 was removed in Spark 3.0.0.

To build Spark 3.0.0, Scala 2.12.10 must be installed on the local Linux machine:

    [root@hadoop003 ~]# scala -version
    Scala code runner version 2.12.10 -- Copyright 2002-2019, LAMP/EPFL and Lightbend, Inc.

    (2)To build Spark using another supported Scala version, please change the major Scala version using (e.g. 2.12):

    ./dev/change-scala-version.sh 2.12

(3) Also set SCALA_VERSION=2.12 in ./dev/make-distribution.sh.

(4) The build command then becomes:

    ./dev/make-distribution.sh \
      --name 2.6.0-cdh5.16.2 \
      --tgz \
      -Phadoop-2.6 \
      -Dhadoop.version=2.6.0-cdh5.16.2 \
      -Phive -Phive-thriftserver -Pyarn \
      -DskipTests \
      -Pscala-2.12 \
      -Dscala.version=2.12.10 \
      -X

3.1 Modify make-distribution.sh (the file is in ./dev)

When building with make-distribution.sh, the script first runs Maven to detect the Scala, Hadoop, and other version numbers, and this detection takes quite a long time. To avoid waiting at this step, comment out the detection code (around line 120 of dev/make-distribution.sh) and hard-code the values right after it. VERSION is the Spark version, and SPARK_HIVE=1 means Hive support is enabled:

    #VERSION=$("$MVN" help:evaluate -Dexpression=project.version $@ 2>/dev/null | grep -v "INFO" | tail -n 1)
    #SCALA_VERSION=$("$MVN" help:evaluate -Dexpression=scala.binary.version $@ 2>/dev/null\
    #    | grep -v "INFO"\
    #    | tail -n 1)
    #SPARK_HADOOP_VERSION=$("$MVN" help:evaluate -Dexpression=hadoop.version $@ 2>/dev/null\
    #    | grep -v "INFO"\
    #    | tail -n 1)
    #SPARK_HIVE=$("$MVN" help:evaluate -Dexpression=project.activeProfiles -pl sql/hive $@ 2>/dev/null\
    #    | grep -v "INFO"\
    #    | fgrep --count "<id>hive</id>";\
    #    # Reset exit status to 0, otherwise the script stops here if the last grep finds nothing\
    #    # because we use "set -o pipefail"
    #    echo -n)
    VERSION=2.4.5
    SCALA_VERSION=2.11
    SPARK_HADOOP_VERSION=2.6.0-cdh5.16.2
    SPARK_HIVE=1
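After editing, a quick sanity check run from the Spark source root confirms the hard-coded overrides are in place; a minimal sketch:

    grep -nE '^(VERSION|SCALA_VERSION|SPARK_HADOOP_VERSION|SPARK_HIVE)=' dev/make-distribution.sh
    # expected output: the four assignments shown above, with their line numbers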

3.2 Add the CDH repository to the root pom.xml of the Spark source

    <repository>
      <id>cloudera</id>
      <name>cloudera Repository</name>
      <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
    </repository>
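Optionally, you can confirm that the Cloudera repository actually resolves CDH artifacts before starting the long build. A sketch using the standard maven-dependency-plugin; hadoop-client is just one representative artifact:

    mvn dependency:get \
      -Dartifact=org.apache.hadoop:hadoop-client:2.6.0-cdh5.16.2 \
      -DremoteRepositories=cloudera::default::https://repository.cloudera.com/artifactory/cloudera-repos/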

3.3 Build Command

    ./dev/make-distribution.sh \
      --name 2.6.0-cdh5.16.2 \
      --tgz \
      -Phadoop-2.6 \
      -Dhadoop.version=2.6.0-cdh5.16.2 \
      -Phive -Phive-thriftserver -Pyarn \
      -DskipTests \
      -X

or, as a single line:

    ./dev/make-distribution.sh --name 2.6.0-cdh5.16.2 --tgz -Phadoop-2.6 -Dhadoop.version=2.6.0-cdh5.16.2 -Phive -Phive-thriftserver -Pyarn -DskipTests

See the make-distribution.sh script itself for the details of these options. It is best to set --name to the Hadoop version you are integrating, so you can tell at a glance which Hadoop version a Spark build was compiled against.
a. --name: the Hadoop version being integrated, for easy identification
b. --tgz: package the build as a .tgz archive
c. -Dhadoop.version: the exact Hadoop version number
d. -Phadoop-2.6: the Hadoop major-version profile
e. -Phive -Phive-thriftserver: enable Hive support
f. -Pyarn: enable YARN support
g. -DskipTests: if test compilation or execution fails, skip the tests
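The build takes on the order of ten minutes or more, so it can be convenient to run it in the background and keep the full log for later inspection; a sketch (build.log is an arbitrary file name):

    nohup ./dev/make-distribution.sh --name 2.6.0-cdh5.16.2 --tgz \
      -Phadoop-2.6 -Dhadoop.version=2.6.0-cdh5.16.2 \
      -Phive -Phive-thriftserver -Pyarn -DskipTests > build.log 2>&1 &
    tail -f build.log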

4. Successful Build and Inspection

(1) The build succeeds:

    [INFO] Reactor Summary for Spark Project Parent POM 2.4.5:
    [INFO]
    [INFO] Spark Project Parent POM ........................... SUCCESS [  2.964 s]
    [INFO] Spark Project Tags ................................. SUCCESS [  8.686 s]
    [INFO] Spark Project Sketch ............................... SUCCESS [  9.732 s]
    [INFO] Spark Project Local DB ............................. SUCCESS [  6.172 s]
    [INFO] Spark Project Networking ........................... SUCCESS [ 13.268 s]
    [INFO] Spark Project Shuffle Streaming Service ............ SUCCESS [  9.987 s]
    [INFO] Spark Project Unsafe ............................... SUCCESS [ 14.402 s]
    [INFO] Spark Project Launcher ............................. SUCCESS [ 11.740 s]
    [INFO] Spark Project Core ................................. SUCCESS [02:40 min]
    [INFO] Spark Project ML Local Library ..................... SUCCESS [ 26.442 s]
    [INFO] Spark Project GraphX ............................... SUCCESS [ 22.666 s]
    [INFO] Spark Project Streaming ............................ SUCCESS [ 56.907 s]
    [INFO] Spark Project Catalyst ............................. SUCCESS [02:18 min]
    [INFO] Spark Project SQL .................................. SUCCESS [03:01 min]
    [INFO] Spark Project ML Library ........................... SUCCESS [02:27 min]
    [INFO] Spark Project Tools ................................ SUCCESS [  8.742 s]
    [INFO] Spark Project Hive ................................. SUCCESS [01:06 min]
    [INFO] Spark Project REPL ................................. SUCCESS [  7.208 s]
    [INFO] Spark Project YARN Shuffle Service ................. SUCCESS [ 17.390 s]
    [INFO] Spark Project YARN ................................. SUCCESS [ 22.008 s]
    [INFO] Spark Project Hive Thrift Server ................... SUCCESS [ 37.512 s]
    [INFO] Spark Project Assembly ............................. SUCCESS [  3.743 s]
    [INFO] Spark Integration for Kafka 0.10 ................... SUCCESS [ 17.082 s]
    [INFO] Kafka 0.10+ Source for Structured Streaming ........ SUCCESS [ 17.158 s]
    [INFO] Spark Project Examples ............................. SUCCESS [ 23.347 s]
    [INFO] Spark Integration for Kafka 0.10 Assembly .......... SUCCESS [  7.639 s]
    [INFO] Spark Avro ......................................... SUCCESS [ 10.695 s]
    [INFO] ------------------------------------------------------------------------
    [INFO] BUILD SUCCESS
    [INFO] ------------------------------------------------------------------------
    [INFO] Total time: 12:06 min (Wall Clock)
    [INFO] Finished at: 2020-07-04T19:50:53+08:00
    [INFO] ------------------------------------------------------------------------

(2) Once the build finishes, the corresponding tarball, spark-2.4.5-bin-2.6.0-cdh5.16.2.tgz, is produced in the root of the Spark source tree:

    [root@hadoop003 spark-2.4.5]# pwd
    /spark/source/spark-2.4.5
    [root@hadoop003 spark-2.4.5]# ls
    appveyor.yml  build   CONTRIBUTING.md  dev   examples  hadoop-cloud  licenses     NOTICE   python     repl               scalastyle-config.xml                streaming
    assembly      common  core             dist  external  launcher      mllib        pom.xml  R          resource-managers  spark-2.4.5-bin-2.6.0-cdh5.16.2.tgz  target
    bin           conf    data             docs  graphx    LICENSE       mllib-local  project  README.md  sbin               sql

(3) Extract it:

    tar -zxvf spark-2.4.5-bin-2.6.0-cdh5.16.2.tgz
    [root@hadoop003 spark-2.4.5]# cd spark-2.4.5-bin-2.6.0-cdh5.16.2
    [root@hadoop003 spark-2.4.5-bin-2.6.0-cdh5.16.2]# ls
    bin  conf  data  examples  jars  python  README.md  RELEASE  sbin  yarn
    [root@hadoop003 spark-2.4.5-bin-2.6.0-cdh5.16.2]# ls -alh
    total 60K
    drwxr-xr-x 10 root root 4.0K Jul 4 19:50 .
    drwxr-xr-x 32 1000 1000 4.0K Jul 4 19:59 ..
drwxr-xr-x 2 root root 4.0K Jul 4 19:50 bin drwxr-xr-x 2 root root 4.0K Jul 4 19:50 conf drwxr-xr-x 5 root root 4.0K Jul 4 19:50 data drwxr-xr-x 4 root root 4.0K Jul 4 19:50 examples drwxr-xr-x 2 root root 16K Jul 4 19:50 jars drwxr-xr-x 7 root root 4.0K Jul 4 19:50 python -rw-r--r-- 1 root root 3.7K Jul 4 19:50 README.md -rw-r--r-- 1 root root 150 Jul 4 19:50 RELEASE drwxr-xr-x 2 root root 4.0K Jul 4 19:50 sbin drwxr-xr-x 2 root root 4.0K Jul 4 19:50 yarn # 可以看到jars中的hadoop版本 [root@hadoop003 jars]# pwd /spark/source/spark-2.4.5/spark-2.4.5-bin-2.6.0-cdh5.16.2/jars [root@hadoop003 jars]# ls activation-1.1.1.jar hadoop-mapreduce-client-shuffle-2.6.0-cdh5.16.2.jar metrics-json-3.1.5.jar aircompressor-0.10.jar hadoop-yarn-api-2.6.0-cdh5.16.2.jar metrics-jvm-3.1.5.jar antlr-2.7.7.jar hadoop-yarn-client-2.6.0-cdh5.16.2.jar minlog-1.3.0.jar antlr4-runtime-4.7.jar hadoop-yarn-common-2.6.0-cdh5.16.2.jar netty-3.9.9.Final.jar antlr-runtime-3.4.jar hadoop-yarn-server-common-2.6.0-cdh5.16.2.jar netty-all-4.1.42.Final.jar aopalliance-1.0.jar hadoop-yarn-server-web-proxy-2.6.0-cdh5.16.2.jar objenesis-2.5.1.jar aopalliance-repackaged-2.4.0-b34.jar hive-beeline-1.2.1.spark2.jar opencsv-2.3.jar apacheds-i18n-2.0.0-M15.jar hive-cli-1.2.1.spark2.jar orc-core-1.5.5-nohive.jar apacheds-kerberos-codec-2.0.0-M15.jar hive-exec-1.2.1.spark2.jar orc-mapreduce-1.5.5-nohive.jar apache-log4j-extras-1.2.17.jar hive-jdbc-1.2.1.spark2.jar orc-shims-1.5.5.jar api-asn1-api-1.0.0-M20.jar hive-metastore-1.2.1.spark2.jar oro-2.0.8.jar api-util-1.0.0-M20.jar hk2-api-2.4.0-b34.jar osgi-resource-locator-1.0.1.jar arpack_combined_all-0.1.jar hk2-locator-2.4.0-b34.jar paranamer-2.8.jar arrow-format-0.10.0.jar hk2-utils-2.4.0-b34.jar parquet-column-1.10.1.jar arrow-memory-0.10.0.jar hppc-0.7.2.jar parquet-common-1.10.1.jar arrow-vector-0.10.0.jar htrace-core4-4.0.1-incubating.jar parquet-encoding-1.10.1.jar avro-1.8.2.jar httpclient-4.5.6.jar parquet-format-2.4.0.jar avro-ipc-1.8.2.jar httpcore-4.4.10.jar parquet-hadoop-1.10.1.jar avro-mapred-1.8.2-hadoop2.jar ivy-2.4.0.jar parquet-hadoop-bundle-1.6.0.jar aws-java-sdk-bundle-1.11.134.jar jackson-annotations-2.6.7.jar parquet-jackson-1.10.1.jar bonecp-0.8.0.RELEASE.jar jackson-core-2.6.7.jar protobuf-java-2.5.0.jar breeze_2.11-0.13.2.jar jackson-core-asl-1.9.13.jar py4j-0.10.7.jar breeze-macros_2.11-0.13.2.jar jackson-databind-2.6.7.3.jar pyrolite-4.13.jar calcite-avatica-1.2.0-incubating.jar jackson-jaxrs-1.9.13.jar RoaringBitmap-0.7.45.jar calcite-core-1.2.0-incubating.jar jackson-mapper-asl-1.9.13.jar scala-compiler-2.11.12.jar calcite-linq4j-1.2.0-incubating.jar jackson-module-paranamer-2.7.9.jar scala-library-2.11.12.jar chill_2.11-0.9.3.jar jackson-module-scala_2.11-2.6.7.1.jar scala-parser-combinators_2.11-1.1.0.jar chill-java-0.9.3.jar jackson-xc-1.9.13.jar scala-reflect-2.11.12.jar commons-beanutils-1.9.4.jar janino-3.0.9.jar scala-xml_2.11-1.0.5.jar commons-cli-1.2.jar JavaEWAH-0.3.2.jar shapeless_2.11-2.3.2.jar commons-codec-1.10.jar javassist-3.18.1-GA.jar shims-0.7.45.jar commons-collections-3.2.2.jar javax.annotation-api-1.2.jar slf4j-api-1.7.16.jar commons-compiler-3.0.9.jar javax.inject-1.jar slf4j-log4j12-1.7.16.jar commons-compress-1.8.1.jar javax.inject-2.4.0-b34.jar snappy-0.2.jar commons-configuration-1.6.jar javax.servlet-api-3.1.0.jar snappy-java-1.1.7.3.jar commons-crypto-1.0.0.jar javax.ws.rs-api-2.0.1.jar spark-catalyst_2.11-2.4.5.jar commons-dbcp-1.4.jar javolution-5.5.1.jar spark-core_2.11-2.4.5.jar commons-digester-1.8.jar jaxb-api-2.2.2.jar 
spark-graphx_2.11-2.4.5.jar commons-httpclient-3.1.jar jcl-over-slf4j-1.7.16.jar spark-hive_2.11-2.4.5.jar commons-io-2.4.jar jdo-api-3.0.1.jar spark-hive-thriftserver_2.11-2.4.5.jar commons-lang-2.6.jar jersey-client-2.22.2.jar spark-kvstore_2.11-2.4.5.jar commons-lang3-3.5.jar jersey-common-2.22.2.jar spark-launcher_2.11-2.4.5.jar commons-logging-1.1.3.jar jersey-container-servlet-2.22.2.jar spark-mllib_2.11-2.4.5.jar commons-math3-3.4.1.jar jersey-container-servlet-core-2.22.2.jar spark-mllib-local_2.11-2.4.5.jar commons-net-3.1.jar jersey-guava-2.22.2.jar spark-network-common_2.11-2.4.5.jar commons-pool-1.5.4.jar jersey-media-jaxb-2.22.2.jar spark-network-shuffle_2.11-2.4.5.jar compress-lzf-1.0.3.jar jersey-server-2.22.2.jar spark-repl_2.11-2.4.5.jar core-1.1.2.jar jetty-6.1.26.cloudera.4.jar spark-sketch_2.11-2.4.5.jar curator-client-2.6.0.jar jetty-util-6.1.26.cloudera.4.jar spark-sql_2.11-2.4.5.jar curator-framework-2.6.0.jar jline-2.14.6.jar spark-streaming_2.11-2.4.5.jar curator-recipes-2.6.0.jar joda-time-2.9.3.jar spark-tags_2.11-2.4.5.jar datanucleus-api-jdo-3.2.6.jar jodd-core-3.5.2.jar spark-unsafe_2.11-2.4.5.jar datanucleus-core-3.2.10.jar jpam-1.1.jar spark-yarn_2.11-2.4.5.jar datanucleus-rdbms-3.2.9.jar json4s-ast_2.11-3.5.3.jar spire_2.11-0.13.0.jar derby-10.12.1.1.jar json4s-core_2.11-3.5.3.jar spire-macros_2.11-0.13.0.jar eigenbase-properties-1.1.5.jar json4s-jackson_2.11-3.5.3.jar ST4-4.0.4.jar flatbuffers-1.2.0-3f79e055.jar json4s-scalap_2.11-3.5.3.jar stax-api-1.0.1.jar gson-2.2.4.jar jsr305-1.3.9.jar stax-api-1.0-2.jar guava-14.0.1.jar jta-1.1.jar stream-2.7.0.jar guice-3.0.jar jtransforms-2.4.0.jar stringtemplate-3.2.1.jar guice-servlet-3.0.jar jul-to-slf4j-1.7.16.jar super-csv-2.2.0.jar hadoop-annotations-2.6.0-cdh5.16.2.jar kryo-shaded-4.0.2.jar univocity-parsers-2.7.3.jar hadoop-auth-2.6.0-cdh5.16.2.jar leveldbjni-all-1.8.jar validation-api-1.1.0.Final.jar hadoop-aws-2.6.0-cdh5.16.2.jar libfb303-0.9.3.jar xbean-asm6-shaded-4.8.jar hadoop-client-2.6.0-cdh5.16.2.jar libthrift-0.9.3.jar xercesImpl-2.9.1.jar hadoop-common-2.6.0-cdh5.16.2.jar log4j-1.2.17.jar xmlenc-0.52.jar hadoop-hdfs-2.6.0-cdh5.16.2.jar lz4-java-1.4.0.jar xz-1.5.jar hadoop-mapreduce-client-app-2.6.0-cdh5.16.2.jar machinist_2.11-0.6.1.jar zookeeper-3.4.6.jar hadoop-mapreduce-client-common-2.6.0-cdh5.16.2.jar macro-compat_2.11-1.1.1.jar zstd-jni-1.3.2-2.jar hadoop-mapreduce-client-core-2.6.0-cdh5.16.2.jar metrics-core-3.1.5.jar hadoop-mapreduce-client-jobclient-2.6.0-cdh5.16.2.jar metrics-graphite-3.1.5.jar
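A quick way to verify the resulting distribution before deploying it is to check the reported version and run one of the bundled examples locally; a minimal smoke test, assuming you are in the extracted directory:

    cd spark-2.4.5-bin-2.6.0-cdh5.16.2
    ./bin/spark-submit --version        # should report Spark 2.4.5 built for Scala 2.11
    ./bin/run-example SparkPi 10        # runs locally; look for "Pi is roughly ..." in the output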

5. Building Spark 3.0.0 Against CDH

5.1 The steps are essentially the same as above

The root pom.xml of Spark 3.0.0 already sets scala.version to 2.12, so there is no need to run change-scala-version. Modify make-distribution.sh (the file is in ./dev) the same way as before:

    VERSION=3.0.0
    SCALA_VERSION=2.12
    SPARK_HADOOP_VERSION=2.6.0-cdh5.16.2
    SPARK_HIVE=1

Then build, adding -X for debug output:

    ./dev/make-distribution.sh \
      --name 2.6.0-cdh5.16.2 \
      --tgz \
      -Phadoop-2.6 \
      -Dhadoop.version=2.6.0-cdh5.16.2 \
      -Phive -Phive-thriftserver -Pyarn \
      -DskipTests \
      -Pscala-2.12 \
      -Dscala.version=2.12.10 \
      -X

5.2 Errors and Fixes

5.2.1 The error

This error occurs because Hadoop 2.6 does not provide the API that the Spark 3.0.0 code calls directly.

    Caused by: org.apache.maven.plugin.PluginExecutionException: Execution scala-compile-first of goal net.alchim31.maven:scala-maven-plugin:4.3.0:compile failed.
        at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo (DefaultBuildPluginManager.java:148)
        at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:210)
        at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:156)
        at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:148)
        at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:117)
        at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:81)
        at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build (SingleThreadedBuilder.java:56)
        at org.apache.maven.lifecycle.internal.LifecycleStarter.execute (LifecycleStarter.java:128)
        at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:305)
        at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:192)
        at org.apache.maven.DefaultMaven.execute (DefaultMaven.java:105)
        at org.apache.maven.cli.MavenCli.execute (MavenCli.java:957)
        at org.apache.maven.cli.MavenCli.doMain (MavenCli.java:289)
        at org.apache.maven.cli.MavenCli.main (MavenCli.java:193)
        at sun.reflect.NativeMethodAccessorImpl.invoke0 (Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke (NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke (DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke (Method.java:498)
        at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced (Launcher.java:282)
        at org.codehaus.plexus.classworlds.launcher.Launcher.launch (Launcher.java:225)
        at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode (Launcher.java:406)
        at org.codehaus.plexus.classworlds.launcher.Launcher.main (Launcher.java:347)
    Caused by: sbt.internal.inc.CompileFailed
        at sbt.internal.inc.AnalyzingCompiler.call (AnalyzingCompiler.scala:253)
        at sbt.internal.inc.AnalyzingCompiler.compile (AnalyzingCompiler.scala:122)
        at sbt.internal.inc.AnalyzingCompiler.compile (AnalyzingCompiler.scala:95)
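One way to confirm which API is missing is to inspect the CDH YARN jar with javap. The sketch below checks for the LogAggregationContext setters that the patch in 5.2.2 works around, and assumes the CDH artifacts have already been downloaded into the default local Maven repository:

    javap -cp ~/.m2/repository/org/apache/hadoop/hadoop-yarn-api/2.6.0-cdh5.16.2/hadoop-yarn-api-2.6.0-cdh5.16.2.jar \
      org.apache.hadoop.yarn.api.records.LogAggregationContext | grep -i rolledlogs
    # no output means the setRolledLogs*Pattern methods are absent, so direct calls to them will not compile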

5.2.2 Modifying the Spark 3.0.0 source

(1) Where the code lives: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala (see the second link under (4) below).

(2) The modification

The method in question, createApplicationSubmissionContext in Client.scala, looks like this in Spark 3.0.0:

    /**
     * Set up the context for submitting our ApplicationMaster.
     * This uses the YarnClientApplication not available in the Yarn alpha API.
     */
    def createApplicationSubmissionContext(
        newApp: YarnClientApplication,
        containerContext: ContainerLaunchContext): ApplicationSubmissionContext = {
      val componentName = if (isClusterMode) {
        config.YARN_DRIVER_RESOURCE_TYPES_PREFIX
      } else {
        config.YARN_AM_RESOURCE_TYPES_PREFIX
      }
      val yarnAMResources = getYarnResourcesAndAmounts(sparkConf, componentName)
      val amResources = yarnAMResources ++
        getYarnResourcesFromSparkResources(SPARK_DRIVER_PREFIX, sparkConf)
      logDebug(s"AM resources: $amResources")
      val appContext = newApp.getApplicationSubmissionContext
      appContext.setApplicationName(sparkConf.get("spark.app.name", "Spark"))
      appContext.setQueue(sparkConf.get(QUEUE_NAME))
      appContext.setAMContainerSpec(containerContext)
      appContext.setApplicationType("SPARK")

      sparkConf.get(APPLICATION_TAGS).foreach { tags =>
        appContext.setApplicationTags(new java.util.HashSet[String](tags.asJava))
      }
      sparkConf.get(MAX_APP_ATTEMPTS) match {
        case Some(v) => appContext.setMaxAppAttempts(v)
        case None => logDebug(s"${MAX_APP_ATTEMPTS.key} is not set. " +
          "Cluster's default value will be used.")
      }

      sparkConf.get(AM_ATTEMPT_FAILURE_VALIDITY_INTERVAL_MS).foreach { interval =>
        appContext.setAttemptFailuresValidityInterval(interval)
      }

      val capability = Records.newRecord(classOf[Resource])
      capability.setMemory(amMemory + amMemoryOverhead)
      capability.setVirtualCores(amCores)

      if (amResources.nonEmpty) {
        ResourceRequestHelper.setResourceRequests(amResources, capability)
      }
      logDebug(s"Created resource capability for AM request: $capability")

      sparkConf.get(AM_NODE_LABEL_EXPRESSION) match {
        case Some(expr) =>
          val amRequest = Records.newRecord(classOf[ResourceRequest])
          amRequest.setResourceName(ResourceRequest.ANY)
          amRequest.setPriority(Priority.newInstance(0))
          amRequest.setCapability(capability)
          amRequest.setNumContainers(1)
          amRequest.setNodeLabelExpression(expr)
          appContext.setAMContainerResourceRequest(amRequest)
        case None =>
          appContext.setResource(capability)
      }

      sparkConf.get(ROLLED_LOG_INCLUDE_PATTERN).foreach { includePattern =>
        try {
          val logAggregationContext = Records.newRecord(classOf[LogAggregationContext])
          logAggregationContext.setRolledLogsIncludePattern(includePattern)
          sparkConf.get(ROLLED_LOG_EXCLUDE_PATTERN).foreach { excludePattern =>
            logAggregationContext.setRolledLogsExcludePattern(excludePattern)
          }
          appContext.setLogAggregationContext(logAggregationContext)
        } catch {
          case NonFatal(e) =>
            logWarning(s"Ignoring ${ROLLED_LOG_INCLUDE_PATTERN.key} because the version of YARN " +
              "does not support it", e)
        }
      }
      appContext.setUnmanagedAM(isClientUnmanagedAMEnabled)

      sparkConf.get(APPLICATION_PRIORITY).foreach { appPriority =>
        appContext.setPriority(Priority.newInstance(appPriority))
      }
      appContext
    }

To fix the compile failure against Hadoop 2.6, comment out this Spark 3.0.0 block:

    sparkConf.get(ROLLED_LOG_INCLUDE_PATTERN).foreach { includePattern =>
      try {
        val logAggregationContext = Records.newRecord(classOf[LogAggregationContext])
        logAggregationContext.setRolledLogsIncludePattern(includePattern)
        sparkConf.get(ROLLED_LOG_EXCLUDE_PATTERN).foreach { excludePattern =>
          logAggregationContext.setRolledLogsExcludePattern(excludePattern)
        }
        appContext.setLogAggregationContext(logAggregationContext)
      } catch {
        case NonFatal(e) =>
          logWarning(s"Ignoring ${ROLLED_LOG_INCLUDE_PATTERN.key} because the version of YARN " +
            "does not support it", e)
      }
    }
    appContext.setUnmanagedAM(isClientUnmanagedAMEnabled)

    sparkConf.get(APPLICATION_PRIORITY).foreach { appPriority =>
      appContext.setPriority(Priority.newInstance(appPriority))
    }

and replace it with the Spark 2.4.5 version of the code, which calls the setters through reflection:

    sparkConf.get(ROLLED_LOG_INCLUDE_PATTERN).foreach { includePattern =>
      try {
        val logAggregationContext = Records.newRecord(classOf[LogAggregationContext])

        // These two methods were added in Hadoop 2.6.4, so we still need to use reflection to
        // avoid compile error when building against Hadoop 2.6.0 ~ 2.6.3.
        val setRolledLogsIncludePatternMethod =
          logAggregationContext.getClass.getMethod("setRolledLogsIncludePattern", classOf[String])
        setRolledLogsIncludePatternMethod.invoke(logAggregationContext, includePattern)

        sparkConf.get(ROLLED_LOG_EXCLUDE_PATTERN).foreach { excludePattern =>
          val setRolledLogsExcludePatternMethod =
            logAggregationContext.getClass.getMethod("setRolledLogsExcludePattern", classOf[String])
          setRolledLogsExcludePatternMethod.invoke(logAggregationContext, excludePattern)
        }

        appContext.setLogAggregationContext(logAggregationContext)
      } catch {
        case NonFatal(e) =>
          logWarning(s"Ignoring ${ROLLED_LOG_INCLUDE_PATTERN.key} because the version of YARN " +
            "does not support it", e)
      }
    }

(A quick way to check that the patched module compiles on its own is sketched after the references below.)

(3) Before-and-after comparison of the code change

(4) References

Source changes in Spark 3.0 relative to Spark 2.4.5: https://github.com/apache/spark/pull/16884/files
Location of the Spark 3.0.0 source file to modify: https://github.com/apache/spark/blob/e93b8f02cd706bedc47c9b55a73f632fe9e61ec3/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
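Before re-running the full distribution build, it can save time to check that the patched module compiles on its own. A sketch that builds only the YARN module, mirroring the profiles and versions used above:

    ./build/mvn -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0-cdh5.16.2 \
      -Pscala-2.12 -DskipTests -pl resource-managers/yarn -am compile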

5.3 Rebuild: Success

    [root@hadoop003 source]# cd spark-3.0.0/
    [root@hadoop003 spark-3.0.0]# ls
    appveyor.yml  common           data  examples      launcher  mllib-local  python     resource-managers                    sql
    assembly      conf             dev   external      LICENSE   NOTICE       R          sbin                                 streaming
    bin           CONTRIBUTING.md  dist  graphx        licenses  pom.xml      README.md  scalastyle-config.xml                target
    build         core             docs  hadoop-cloud  mllib     project      repl       spark-3.0.0-bin-2.6.0-cdh5.16.2.tgz  tools

Extract spark-3.0.0-bin-2.6.0-cdh5.16.2.tgz from the source root directory and it is ready to use.
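To run the newly built Spark against the CDH cluster, point it at the cluster's Hadoop/YARN configuration. A minimal sketch, where both paths are assumptions to adjust for your environment (/etc/hadoop/conf is the usual CDH client-config location, and the install directory is hypothetical):

    export HADOOP_CONF_DIR=/etc/hadoop/conf                      # CDH client configs (assumed path)
    export SPARK_HOME=/opt/spark-3.0.0-bin-2.6.0-cdh5.16.2       # wherever you copied the extracted build
    $SPARK_HOME/bin/spark-shell --master yarn --deploy-mode client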
