Distribution of Executors, Cores and Memory for a Spark Application
Resource allocation is an important aspect of executing any Spark job. If not configured correctly, a single Spark job can consume the entire cluster's resources and leave other applications starved for resources.
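As a minimal sketch of the knobs this post is about, the executor count, cores per executor, and memory per executor can all be set on the session builder. The property names below are standard Spark configuration keys; the values (4 executors, 5 cores, 4 GB each) are purely illustrative, not a sizing recommendation:

```scala
import org.apache.spark.sql.SparkSession

object ResourceAllocationDemo {
  def main(args: Array[String]): Unit = {
    // Illustrative values only: 4 executors with 5 cores and 4 GB of heap each.
    // On YARN, the same settings can be passed to spark-submit as
    // --num-executors, --executor-cores and --executor-memory.
    val spark = SparkSession.builder()
      .appName("resource-allocation-demo")
      .config("spark.executor.instances", "4")
      .config("spark.executor.cores", "5")
      .config("spark.executor.memory", "4g")
      .getOrCreate()

    // ... application logic ...

    spark.stop()
  }
}
```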
Spark uses a master/slave architecture: one central coordinator (the driver) communicates with many distributed workers (executors). The driver and the executors each run in their own Java processes.
RDD: a fault-tolerant collection of elements that can be operated on in parallel.
DataFrame: a Dataset organised into named columns. It is conceptually equivalent to a table in a relational database.
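A short sketch contrasting the two abstractions, runnable in spark-shell (the sample data is made up):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("abstractions-demo")
  .master("local[*]")
  .getOrCreate()

// RDD: a fault-tolerant, partitioned collection operated on in parallel.
val rdd = spark.sparkContext.parallelize(Seq(1, 2, 3, 4))
println(rdd.map(_ * 2).reduce(_ + _))   // 20

// DataFrame: a Dataset organised into named columns, like a relational table.
import spark.implicits._
val df = Seq(("alice", 30), ("bob", 25)).toDF("name", "age")
df.filter($"age" > 26).show()
```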
Every distributed computation is divided into small parts called jobs, stages and tasks. Knowing them is useful, especially during monitoring, because it helps to detect bottlenecks. The hierarchy is Job -> Stages -> Tasks: an action triggers a job, the job is split into stages at shuffle boundaries, and each stage runs one task per partition, as the sketch below shows.
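A hedged illustration of that hierarchy (the word data is made up): `reduceByKey` introduces a shuffle, so the single job triggered by `collect` is split into two stages, and the two input partitions give two tasks per stage. The Spark UI at http://localhost:4040 shows the resulting breakdown while the job runs:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("job-stage-task-demo")
  .master("local[*]")
  .getOrCreate()
val sc = spark.sparkContext

// Two partitions => two tasks per stage.
val words = sc.parallelize(Seq("a", "b", "a", "c", "b", "a"), numSlices = 2)

// reduceByKey forces a shuffle: the map-side work lands in one stage,
// the reduce-side work in a second stage of the same job.
val counts = words.map(w => (w, 1)).reduceByKey(_ + _)
counts.collect().foreach(println)
```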
1. What is SparkContext? “SparkContext” is the main entry point for Spark functionality. A “SparkContext” represents the connection to a Spark cluster, and can be used to create RDDs, accumulators and broadcast variables on that cluster.
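A minimal sketch of that classic entry point, showing all three things it can create; in modern code the context is usually obtained via `SparkSession.sparkContext` rather than constructed directly:

```scala
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("sparkcontext-demo")
  .setMaster("local[*]")
val sc = new SparkContext(conf)

val rdd    = sc.parallelize(1 to 10)           // an RDD on the cluster
val acc    = sc.longAccumulator("matches")     // an accumulator
val lookup = sc.broadcast(Map(1 -> "one"))     // a broadcast variable

// Count how many elements appear in the broadcast lookup table.
rdd.foreach(n => if (lookup.value.contains(n)) acc.add(1))
println(acc.value)   // 1

sc.stop()
```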
Spark can access files in S3, even when running in local mode, given AWS credentials. By default, with s3a URLs, Spark will search for credentials in a few standard locations: the fs.s3a.access.key and fs.s3a.secret.key Hadoop properties, the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables, and (on EC2) the instance profile.
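As a sketch, the credentials can also be supplied explicitly through the Hadoop configuration. The property keys below are the standard hadoop-aws/s3a names; the bucket, path, and placeholder key values are made up, and the hadoop-aws artifact must be on the classpath:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("s3a-demo")
  .master("local[*]")
  .getOrCreate()

// Standard s3a credential properties from hadoop-aws; values are placeholders.
val hadoopConf = spark.sparkContext.hadoopConfiguration
hadoopConf.set("fs.s3a.access.key", "<your-access-key>")
hadoopConf.set("fs.s3a.secret.key", "<your-secret-key>")

// Hypothetical bucket and path, for illustration only.
val df = spark.read.text("s3a://my-example-bucket/logs/*.txt")
println(df.count())
```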