Barrier Execution Mode in Spark
The barrier execution mode is experimental and handles only a limited set of scenarios. See SPIP: Barrier Execution Mode and Design Doc.
In case of a task failure, instead of restarting only the failed task, Spark aborts the entire stage and re-launches all of its tasks.
Use the RDD.barrier transformation to mark the current stage as a barrier stage.
barrier(): RDDBarrier[T]
barrier
simply creates an RDDBarrier that comes with the barrier-aware mapPartitions transformation.
mapPartitions[S: ClassTag](
f: Iterator[T] => Iterator[S],
preservesPartitioning: Boolean = false): RDD[S]
mapPartitions
simply changes the regular RDD.mapPartitions transformation to create a MapPartitionsRDD with the isFromBarrier flag enabled.
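As a minimal sketch of how these two transformations are used together (the `sc` handle, the 4-partition RDD, and the `local[4]` master are illustrative assumptions, not from the text):

```scala
// Assumes a SparkContext `sc` on a local[4] master, so there are
// 4 slots and all 4 barrier tasks can be launched at the same time.
val rdd = sc.parallelize(1 to 100, numSlices = 4)

val doubled = rdd.barrier().mapPartitions { iter =>
  val ctx = org.apache.spark.BarrierTaskContext.get()
  ctx.barrier()   // global barrier: blocks until every task in the stage arrives
  iter.map(_ * 2)
}
doubled.collect()
```

Inside a barrier stage, tasks use BarrierTaskContext (rather than the regular TaskContext) to coordinate, e.g. via the blocking `barrier()` call shown above.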
Task
has an isBarrier flag that says whether the task belongs to a barrier stage (default: false).
Spark must launch all the tasks at the same time for a barrier stage.
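A consequence of this all-at-once launch requirement: if a barrier stage has more tasks than the cluster has slots, the job cannot run. A sketch of the failing shape (the `sc` handle and partition counts are assumptions for illustration):

```scala
// On a local[4] master there are only 4 slots, but this barrier
// stage would need 16 tasks launched simultaneously.
val rdd = sc.parallelize(1 to 100, numSlices = 16)

// Expected to fail at scheduling time (a SparkException about
// insufficient resources) rather than hang waiting for slots:
// rdd.barrier().mapPartitions(identity).collect()
```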
An RDD is in a barrier stage if at least one of its parent RDDs, or the RDD itself, is mapped from an RDDBarrier.
ShuffledRDD always has the isBarrier flag disabled (false).
MapPartitionsRDD is the only RDD that can have the isBarrier flag enabled.
RDDBarrier.mapPartitions is the only transformation that creates a MapPartitionsRDD with the isFromBarrier flag enabled.
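The propagation rules above imply that a shuffle boundary ends a barrier stage. A sketch (the `sc` handle and the specific transformations are assumptions for illustration):

```scala
// MapPartitionsRDD created with the isFromBarrier flag enabled,
// so this RDD's stage is a barrier stage.
val barrierRDD = sc.parallelize(1 to 100, numSlices = 4)
  .barrier()
  .mapPartitions(identity)

// repartition introduces a ShuffledRDD; since ShuffledRDD always has
// isBarrier disabled, the post-shuffle stage is a regular stage again.
val downstream = barrierRDD.repartition(2).map(_ + 1)
```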