Monthly Archive: September 2019

Difference between DataFrame, Dataset, and RDD in Spark

RDD: an RDD is a fault-tolerant collection of elements that can be operated on in parallel. DataFrame: a DataFrame is a Dataset organised into named columns. It is conceptually equivalent to a...

Spark Jobs, Stages, Tasks

Every distributed computation in Spark is divided into smaller parts called jobs, stages, and tasks. Knowing them is especially useful during monitoring because it helps to detect bottlenecks. Job -> Stages -> Tasks...