Category: Spark

JDBC in Spark SQL

Apache Spark has a very powerful built-in API for gathering data from a relational database. Effectiveness and efficiency, following the usual Spark approach, are managed in a transparent way....

Useful commands for hadoop developer

This post collects the commands most frequently used by Hadoop developers for Spark, EMR, YARN, and AWS. Kill Spark job: This command will kill all running Spark jobs.

...

Missing Imputation in scala

Imputation: In statistics, imputation is the process of replacing missing data with substituted values. When substituting for a data point, it is known as “unit imputation”; when substituting...

Missing Imputation in python

Imputation: In statistics, imputation is the process of replacing missing data with substituted values. When substituting for a data point, it is known as “unit imputation”; when substituting...