(Spark 异常) Failed to get broadcast

    技术2026-02-15  19

    问题

    之前开发的时候遇到. Failed to get broadcast_0_piece0 of broadcast_0异常.

    20/07/03 15:58:50 ERROR Utils: Exception encountered org.apache.spark.SparkException: Failed to get broadcast_0_piece0 of broadcast_0 at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1.apply$mcVI$sp(TorrentBroadcast.scala:178) at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1.apply(TorrentBroadcast.scala:150) at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1.apply(TorrentBroadcast.scala:150) at scala.collection.immutable.List.foreach(List.scala:381) at org.apache.spark.broadcast.TorrentBroadcast.org$apache$spark$broadcast$TorrentBroadcast$$readBlocks(TorrentBroadcast.scala:150) at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1.apply(TorrentBroadcast.scala:222) at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1303) at org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:206) at org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:66) at org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:66) at org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:96) at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70) at org.apache.spark.rdd.HadoopRDD.getJobConf(HadoopRDD.scala:139) at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:212) at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:208) at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:94) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) at org.apache.spark.rdd.RDD.iterator(RDD.scala:287) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) at org.apache.spark.rdd.RDD.iterator(RDD.scala:287) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) at org.apache.spark.rdd.RDD.iterator(RDD.scala:287) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) at org.apache.spark.rdd.RDD.iterator(RDD.scala:287) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87) at org.apache.spark.scheduler.Task.run(Task.scala:108) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745)

    原因&解决措施

    去节点查看了一下知道. 基本上是第一次没有执行完毕, 随后又开了一个SparkContext导致的.

    使用单例模式替换即可. 单例模式代码如下所示(此处使用饿汉模式):

    // SparkConfig 要写成单例模式. public class SeanSparkConfig { public static JavaSparkContext sparkContext; public static String exameDateFilePath = "data/demos/exam/exam.txt"; static { //boolean runLocalFlag = false; boolean runLocalFlag = true; SparkConf sparkconf; if(runLocalFlag) { sparkconf = new SparkConf().setAppName("Examle").setMaster("local[2]"); }else { sparkconf = new SparkConf().setAppName("Examle").setMaster("spark://localhost:7077").setJars(new String[] {"/Users/sean/Documents/Gitrep/bigdata/spark/target/spark-demo.jar"}); } sparkContext = new JavaSparkContext(sparkconf); } } 使用. 使用时直接调用即可. JavaRDD<String> examLinesRDD = SeanSparkConfig.sparkContext.textFile(SeanSparkConfig.exameDateFilePath);

    Reference

    [1]. Failed to get broadcast_10_piece0 of broadcast_10 [2]. Spark常见异常:Failed to get broadcast_32_piece0 of broadcast_32 [3]. repartition导致的广播失败,关于错误Failed to get broadcast_544_piece0 of broadcast_544

    Processed: 0.010, SQL: 10