Beginner's Hadoop

Scala 101

In this tutorial, we will go over the Scala programming language. Scala is a powerful high-level programming language that combines object-oriented and functional programming. It’s a type-safe language that runs on the JVM...
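
The following is a small illustrative sketch (not taken from the tutorial itself) of how Scala mixes the two paradigms: a sealed trait with case classes on the object-oriented side, and pattern matching plus higher-order functions over an immutable `List` on the functional side. The `Shape`, `Circle`, `Rectangle`, and `Scala101` names are hypothetical.

```scala
// Object-oriented side: a sealed trait with case-class subtypes
sealed trait Shape {
  def area: Double
}

final case class Circle(radius: Double) extends Shape {
  def area: Double = math.Pi * radius * radius
}

final case class Rectangle(width: Double, height: Double) extends Shape {
  def area: Double = width * height
}

object Scala101 {
  // Pattern matching over the sealed hierarchy: the compiler warns if a case is missing
  def describe(shape: Shape): String = shape match {
    case Circle(r)       => s"circle of radius $r"
    case Rectangle(w, h) => s"rectangle $w x $h"
  }

  // Functional side: filter, map, and sum over an immutable List, with no mutation
  def totalLargeArea(shapes: List[Shape], minArea: Double): Double =
    shapes.filter(_.area >= minArea).map(_.area).sum

  def main(args: Array[String]): Unit = {
    val shapes = List(Circle(1.0), Rectangle(2.0, 3.0), Rectangle(1.0, 1.0))
    shapes.map(describe).foreach(println)
    // totalLargeArea only accepts a List[Shape]; other element types are rejected at compile time
    println(totalLargeArea(shapes, 2.0))
  }
}
```

Because `Shape` is sealed, all of its subtypes must live in the same file, which is what lets the compiler check the `match` in `describe` for exhaustiveness.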

JDBC in Spark SQL

Apache Spark has a very powerful built-in API for reading data from relational databases. In the usual Spark fashion, effectiveness and efficiency are handled transparently...

Redshift Database Connection in Spark

This post focuses primarily on how to connect to Redshift from Spark. Amazon Redshift is a fully managed, petabyte-scale data warehouse service. Redshift is designed for analytic...

Useful Commands for Hadoop Developers

This post collects the commands Hadoop developers use most often for Spark, EMR, YARN, and AWS. Kill Spark jobs: this command kills all running Spark jobs.

File Operations in Scala

File operations are an important part of an application. We might have to provide configuration information or input to an application, and in such scenarios we have to perform...
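
As a rough sketch of the kind of file handling described above (not the post's own code), the snippet below writes lines with `java.io.PrintWriter`, reads them back with `scala.io.Source`, and parses them as configuration. The `key=value` format with `#` comments, the `FileOps` object, and its method names are all assumptions made for illustration.

```scala
import java.io.{File, PrintWriter}
import scala.io.Source

object FileOps {
  // Write each line to the file, closing the writer even if an exception is thrown
  def writeLines(path: String, lines: Seq[String]): Unit = {
    val writer = new PrintWriter(new File(path), "UTF-8")
    try lines.foreach(line => writer.println(line))
    finally writer.close()
  }

  // Read all lines eagerly into a List, then close the underlying source
  def readLines(path: String): List[String] = {
    val source = Source.fromFile(path, "UTF-8")
    try source.getLines().toList
    finally source.close()
  }

  // Hypothetical config format: "key=value" lines; blank lines and '#' comments are skipped
  def parseConfig(lines: Seq[String]): Map[String, String] =
    lines.map(_.trim)
      .filter(line => line.nonEmpty && !line.startsWith("#"))
      .flatMap { line =>
        // Split on the first '=' only, so values may themselves contain '='
        line.split("=", 2) match {
          case Array(key, value) => Some(key.trim -> value.trim)
          case _                 => None
        }
      }
      .toMap

  def main(args: Array[String]): Unit = {
    val tmp = File.createTempFile("app", ".conf")
    tmp.deleteOnExit()
    writeLines(tmp.getPath, Seq("# sample config", "input.path = /data/in", "batch.size=100"))
    println(parseConfig(readLines(tmp.getPath)))
  }
}
```

Closing resources in a `finally` block keeps the sketch compatible with older Scala versions; on Scala 2.13 and later, `scala.util.Using` is an alternative.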

Missing Imputation in Scala

Imputation: In statistics, imputation is the process of replacing missing data with substituted values. When substituting for a data point, it is known as “unit imputation”; when substituting...
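
One common and simple form of unit imputation is mean imputation: each missing value is replaced with the mean of the observed values. The sketch below illustrates that idea in plain Scala, with missing entries modeled as `None`; the post itself may use a different method, and the `MeanImputation` object and its method names are hypothetical.

```scala
object MeanImputation {
  // Mean of the observed (non-missing) values; None if every value is missing
  def observedMean(values: Seq[Option[Double]]): Option[Double] = {
    val observed = values.flatten
    if (observed.isEmpty) None else Some(observed.sum / observed.size)
  }

  // Replace each missing entry with the mean of the observed entries
  def meanImpute(values: Seq[Option[Double]]): Seq[Double] =
    observedMean(values) match {
      case Some(mean) => values.map(_.getOrElse(mean))
      case None       => throw new IllegalArgumentException("cannot impute: no observed values")
    }

  def main(args: Array[String]): Unit = {
    val ages = Seq(Some(20.0), None, Some(30.0), Some(40.0), None)
    println(meanImpute(ages)) // List(20.0, 30.0, 30.0, 40.0, 30.0)
  }
}
```

Mean imputation preserves the column mean but shrinks its variance, so median or model-based imputation is often preferred for skewed data.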