Distribution of Executors, Cores and Memory for a Spark Application
Resource allocation is an important aspect of executing any Spark job. If not configured correctly, a Spark job can consume the entire cluster's resources and make other...
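As a sketch of what such configuration looks like, executor count, cores and memory are typically set through spark-submit flags. The values below are illustrative placeholders, not tuned recommendations, and `my-app.jar` is a hypothetical application:

```shell
# Illustrative spark-submit invocation (placeholder values, not recommendations).
# --num-executors applies on YARN; on standalone or Kubernetes the executor count
# is instead derived from spark.cores.max or spark.executor.instances.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --num-executors 5 \
  --executor-cores 4 \
  --executor-memory 4g \
  --driver-memory 2g \
  my-app.jar
```

Note that total memory per executor is higher than `--executor-memory` alone, since Spark adds overhead memory (controlled by `spark.executor.memoryOverhead`) on top of the heap.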
Spark uses a master/slave architecture. As you can see in the figure, it has one central coordinator (Driver) that communicates with many distributed workers (executors). The driver and...
RDD: a fault-tolerant collection of elements that can be operated on in parallel. DataFrame: a Dataset organised into named columns. It is conceptually equivalent to a...
Every distributed computation is divided into small parts called jobs, stages and tasks. Knowing them is useful, especially during monitoring, because it helps to detect bottlenecks. Job -> Stages -> Tasks...
1. What is SparkContext? “SparkContext” is the main entry point for Spark functionality. A “SparkContext” represents the connection to a Spark cluster, and can be used to create...
by beginnershadoop · Published September 27, 2019 · Last modified September 28, 2019
Spark can access files in S3, even when running in local mode, given AWS credentials. By default, with s3a URLs, Spark will search for credentials in a few...
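As a hedged sketch of passing credentials explicitly rather than relying on that search, the `spark.hadoop.` prefix forwards properties to the Hadoop configuration, and the `fs.s3a.*` key names come from the hadoop-aws S3A connector; the values and jar name below are placeholders:

```shell
# Placeholder credentials; in practice prefer the default provider chain
# (environment variables, instance profiles) over hard-coding keys.
spark-submit \
  --conf spark.hadoop.fs.s3a.access.key=YOUR_ACCESS_KEY \
  --conf spark.hadoop.fs.s3a.secret.key=YOUR_SECRET_KEY \
  my-app.jar
```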
Classification: What are the advantages of different classification algorithms? What are the advantages of using a decision tree for classification? What are the disadvantages of using a decision...
In this tutorial, we will go over the Scala programming language. Scala is a powerful high-level programming language that incorporates object-oriented and functional programming. It’s a type-safe language that relies on the JVM...
Apache Spark has a very powerful built-in API for gathering data from a relational database. Effectiveness and efficiency, following the usual Spark approach, are managed in a transparent way....
In this blog post, I’ll help you get started using Apache Spark’s spark.ml Logistic Regression to predict whether someone makes more or less than $50,000. Classification...