Category: Scala
JDBC in Spark SQL
Apache Spark has a very powerful built-in API for gathering data from a relational database. Effectiveness and efficiency, following the usual Spark approach, are managed in a transparent way....
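As a taste of what the post covers, here is a minimal sketch of reading a table over JDBC with Spark's DataFrameReader. The connection URL, table name, and credentials are placeholders, and the matching JDBC driver (PostgreSQL here, as an assumption) must be on the classpath.

```scala
import org.apache.spark.sql.SparkSession

object JdbcReadExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("jdbc-read-example")
      .master("local[*]")
      .getOrCreate()

    // Read a table from a relational database over JDBC.
    // URL, table name, and credentials below are placeholders.
    val employeesDf = spark.read
      .format("jdbc")
      .option("url", "jdbc:postgresql://db-host:5432/company")
      .option("dbtable", "public.employees")
      .option("user", "spark_user")
      .option("password", "secret")
      .load()

    employeesDf.show(5)
    spark.stop()
  }
}
```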
Machine Learning: Logistic Regression using Apache Spark
In this blog post, I’ll help you get started with Apache Spark’s spark.ml Logistic Regression for predicting whether someone makes more or less than $50,000. Classification...
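The sketch below shows the shape of a spark.ml LogisticRegression workflow on toy data; the feature values and the mapping of label 1.0 to ">50K" are assumptions for illustration, not the post's actual dataset or pipeline.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.linalg.Vectors

object LogisticRegressionSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("logistic-regression-sketch")
      .master("local[*]")
      .getOrCreate()

    // Toy training data: label 1.0 stands for ">50K", 0.0 for "<=50K".
    // The two feature values per row are made up and already scaled.
    val training = spark.createDataFrame(Seq(
      (1.0, Vectors.dense(0.8, 0.9)),
      (0.0, Vectors.dense(0.2, 0.3)),
      (1.0, Vectors.dense(0.7, 0.8)),
      (0.0, Vectors.dense(0.1, 0.4))
    )).toDF("label", "features")

    val lr = new LogisticRegression()
      .setMaxIter(10)
      .setRegParam(0.01)

    val model = lr.fit(training)
    model.transform(training)
      .select("label", "prediction", "probability")
      .show()

    spark.stop()
  }
}
```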
User-defined functions (UDF) in Spark
UDFs, or user-defined functions, are a simple way of adding a function to the Spark SQL language. Such a function operates on distributed DataFrames and works row by row....
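A minimal sketch of what that looks like: wrapping a plain Scala function as a UDF for the DataFrame API and registering the same function for use in SQL. The column and view names are placeholders.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf

object UdfExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("udf-example")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    val df = Seq("alice", "bob").toDF("name")

    // A plain Scala function wrapped as a UDF; it runs once per row.
    val capitalize = udf((s: String) => s.capitalize)
    df.select(capitalize($"name").as("capitalized")).show()

    // The same function registered for use inside Spark SQL.
    spark.udf.register("capitalize", (s: String) => s.capitalize)
    df.createOrReplaceTempView("people")
    spark.sql("SELECT capitalize(name) AS capitalized FROM people").show()

    spark.stop()
  }
}
```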
Redshift Database connection in spark
This blog primarily focuses on how to connect to Redshift from Spark. Redshift: Amazon Redshift is a fully managed, petabyte-scale data warehouse service. Redshift is designed for analytic...
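One straightforward way to connect, sketched below, is plain JDBC against the cluster endpoint. The endpoint, database, table, credentials, and driver class name are assumptions; the Redshift JDBC driver jar must be on the classpath, and the post itself may use a dedicated Spark-Redshift connector instead.

```scala
import org.apache.spark.sql.SparkSession

object RedshiftJdbcExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("redshift-jdbc-example")
      .getOrCreate()

    // Plain JDBC read from Redshift; the endpoint, database, table and
    // credentials are placeholders, and the driver class name depends on
    // the Redshift JDBC driver version on the classpath.
    val salesDf = spark.read
      .format("jdbc")
      .option("url", "jdbc:redshift://my-cluster.abc123.us-east-1.redshift.amazonaws.com:5439/dev")
      .option("dbtable", "public.sales")
      .option("user", "awsuser")
      .option("password", "secret")
      .option("driver", "com.amazon.redshift.jdbc42.Driver")
      .load()

    salesDf.printSchema()
    spark.stop()
  }
}
```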
Most common issues faced by Spark developers and their solutions
Timeout waiting for connection from pool Caused by: com.amazon.ws.emr.hadoop.fs.shaded.org.apache.http.conn.ConnectionPoolTimeoutException: Timeout waiting for connection from pool To resolve this...
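The excerpt cuts off before the fix, so the sketch below shows one commonly suggested remedy rather than the post's own: enlarging the EMRFS S3 connection pool. The property name and value are assumptions and should be checked against your EMR release.

```scala
import org.apache.spark.sql.SparkSession

object ConnectionPoolTuningSketch {
  def main(args: Array[String]): Unit = {
    // A commonly suggested remedy for the EMRFS "Timeout waiting for
    // connection from pool" error is to enlarge the S3 connection pool.
    // The property below (fs.s3.maxConnections) is the emrfs-site setting
    // and is an assumption here; verify it for your EMR release.
    val spark = SparkSession.builder()
      .appName("connection-pool-tuning")
      .config("spark.hadoop.fs.s3.maxConnections", "200")
      .getOrCreate()

    // ... run the job that previously exhausted the pool ...
    spark.stop()
  }
}
```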
File Operations in Scala
File operations are important in any application. We might have to provide some configuration information or some input to an application; in such scenarios we have to perform...
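A small sketch of the basic pattern: writing a configuration file with java.io.PrintWriter and reading it back with scala.io.Source. The file name and keys are placeholders.

```scala
import scala.io.Source
import java.io.{File, PrintWriter}

object FileOperationExample {
  def main(args: Array[String]): Unit = {
    // Write a small configuration file (the path is a placeholder).
    val writer = new PrintWriter(new File("app.conf"))
    try {
      writer.println("host=localhost")
      writer.println("port=8080")
    } finally {
      writer.close()
    }

    // Read it back line by line.
    val source = Source.fromFile("app.conf")
    try {
      source.getLines().foreach(println)
    } finally {
      source.close()
    }
  }
}
```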
Spark and ElasticSearch integration
In this blog, as the title suggests, I’m going to explain the end-to-end process of writing and reading data...
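A minimal sketch of that round trip, assuming the elasticsearch-hadoop (elasticsearch-spark) connector is on the classpath: write a DataFrame to an index and read it back. The node address, port, and index name are placeholders.

```scala
import org.apache.spark.sql.SparkSession

object ElasticsearchIntegrationSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("es-integration-sketch")
      .master("local[*]")
      .config("es.nodes", "localhost")
      .config("es.port", "9200")
      .getOrCreate()
    import spark.implicits._

    // Write a DataFrame to an Elasticsearch index via the connector's
    // data source; the index name "blog_posts" is a placeholder.
    val df = Seq((1, "spark"), (2, "elasticsearch")).toDF("id", "keyword")
    df.write
      .format("org.elasticsearch.spark.sql")
      .mode("append")
      .save("blog_posts")

    // Read the same index back into a DataFrame.
    val readBack = spark.read
      .format("org.elasticsearch.spark.sql")
      .load("blog_posts")
    readBack.show()

    spark.stop()
  }
}
```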
Missing Imputation in Scala
Imputation: In statistics, imputation is the process of replacing missing data with substituted values. When substituting for a data point, it is known as “unit imputation”; when substituting...
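To make the idea concrete, here is a sketch of two common approaches in Spark: constant-value (unit) imputation with DataFrame.na.fill and mean imputation with spark.ml's Imputer. The toy columns and values are assumptions, not the post's data.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.ml.feature.Imputer

object MissingImputationSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("missing-imputation-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // A toy DataFrame with missing (null) values.
    val df = Seq(
      (Some(25.0), Some(50000.0)),
      (None, Some(48000.0)),
      (Some(31.0), None)
    ).toDF("age", "salary")

    // Unit imputation: replace missing values with a constant.
    df.na.fill(Map("age" -> 0.0, "salary" -> 0.0)).show()

    // Mean imputation with spark.ml's Imputer (Spark 2.2+).
    val imputer = new Imputer()
      .setInputCols(Array("age", "salary"))
      .setOutputCols(Array("age_imputed", "salary_imputed"))
      .setStrategy("mean")
    imputer.fit(df).transform(df).show()

    spark.stop()
  }
}
```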
Spark SQL Using Parquet
Today, I’m focusing on how to use the Parquet format in Spark. Please get more insight into the Parquet format if you are new to it. Parquet: Apache Parquet is a...
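A quick sketch of the basics: writing a DataFrame as Parquet and reading it back, with Spark recovering the schema from the Parquet metadata. The output path and sample data are placeholders.

```scala
import org.apache.spark.sql.SparkSession

object ParquetExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("parquet-example")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Write a small DataFrame as Parquet (the output path is a placeholder).
    val df = Seq((1, "scala"), (2, "spark")).toDF("id", "topic")
    df.write.mode("overwrite").parquet("/tmp/topics.parquet")

    // Read it back; Spark recovers the schema from the Parquet footer.
    val fromParquet = spark.read.parquet("/tmp/topics.parquet")
    fromParquet.show()

    spark.stop()
  }
}
```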