Category: Hadoop

Useful Commands for Hadoop Developers

This post collects the commands Hadoop developers use most often with Spark, EMR, YARN and AWS. Kill Spark job: this command kills all running Spark jobs.

...
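The excerpt does not show the post's own command. Here is a hedged sketch that assumes the Spark jobs run on YARN. It combines the standard `yarn application -list` and `yarn application -kill` commands from Python. The helper names `running_spark_app_ids` and `kill_running_spark_apps` are hypothetical. By default (`dry_run=True`) the helper only prints the kill commands it would run.

```python
import subprocess
from typing import List


def running_spark_app_ids(listing: str) -> List[str]:
    """Extract application IDs from the text printed by `yarn application -list`."""
    app_ids = []
    for line in listing.splitlines():
        fields = line.split()
        # Data rows begin with the application ID; summary and header lines do not.
        if fields and fields[0].startswith("application_"):
            app_ids.append(fields[0])
    return app_ids


def kill_running_spark_apps(dry_run: bool = True) -> List[str]:
    """Kill every RUNNING YARN application of type SPARK (hypothetical helper)."""
    listing = subprocess.run(
        ["yarn", "application", "-list", "-appTypes", "SPARK", "-appStates", "RUNNING"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    app_ids = running_spark_app_ids(listing)
    for app_id in app_ids:
        cmd = ["yarn", "application", "-kill", app_id]
        if dry_run:
            # Only show what would be killed.
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
    return app_ids
```

To kill the jobs, call `kill_running_spark_apps(dry_run=False)` on a node where the YARN client is configured. Filtering on `-appTypes SPARK` leaves other YARN applications (MapReduce, Tez) untouched.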

Missing Imputation in Scala

Missing imputation algorithm: read the data; get all column names and column types; replace every missing value (NA, N.A., N.A//, " ") with null; set Boolean value for...

Missing Imputation in Python

Missing imputation algorithm: read the data; get all column names and column types; replace every missing value (NA, N.A., N.A//, " ") with null; set Boolean value for...
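The first three steps listed above can be sketched in plain Python without Spark. This is a minimal illustration, not the post's code, and it rests on a few assumptions. The data is assumed to be CSV text with a header row. Column type is inferred simply as "numeric" or "string". Empty cells are also treated as missing. The excerpt breaks off before the actual imputation strategy, so the final fill step (column mean for numeric columns, most frequent value for string columns) is an assumed, commonly used choice. The function names are hypothetical.

```python
import csv
import io
from collections import Counter
from statistics import mean
from typing import Dict, List, Optional

# Missing-value tokens from the post; "" (empty cell) is an added assumption.
MISSING_TOKENS = {"NA", "N.A.", "N.A//", " ", ""}


def read_rows(text: str) -> List[Dict[str, Optional[str]]]:
    """Step 1: read the data (assumed here to be CSV text with a header row)."""
    return list(csv.DictReader(io.StringIO(text)))


def to_null(value: Optional[str]) -> Optional[str]:
    """Step 3: replace a missing-value token with None (null)."""
    return None if value in MISSING_TOKENS else value


def is_number(value: str) -> bool:
    try:
        float(value)
        return True
    except ValueError:
        return False


def column_types(rows) -> Dict[str, str]:
    """Step 2: column names and an inferred type, ignoring missing tokens."""
    if not rows:
        return {}
    types = {}
    for name in rows[0]:
        present = [to_null(r[name]) for r in rows if to_null(r[name]) is not None]
        numeric = bool(present) and all(is_number(v) for v in present)
        types[name] = "numeric" if numeric else "string"
    return types


def impute(rows):
    """Assumed fill strategy: mean for numeric columns, mode for string columns."""
    types = column_types(rows)
    cleaned = [{k: to_null(v) for k, v in r.items()} for r in rows]
    for name, kind in types.items():
        present = [r[name] for r in cleaned if r[name] is not None]
        if kind == "numeric":
            fill = mean(float(v) for v in present) if present else None
            for r in cleaned:
                r[name] = fill if r[name] is None else float(r[name])
        else:
            fill = Counter(present).most_common(1)[0][0] if present else None
            for r in cleaned:
                if r[name] is None:
                    r[name] = fill
    return cleaned
```

Types are inferred while skipping missing tokens. Otherwise a single "NA" would turn a numeric column into a string column before step 3 runs.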

Spark SQL Using Parquet

Today, I’m focusing on how to use the Parquet format in Spark. If you are new to this format, it is worth reading up on Parquet first. Parquet: Apache Parquet is a...

Spark SQL Using Avro

Today, I’m shedding light on how to use Avro, a data serialization system and data format, with Spark SQL. Unlike Hive, Spark does not provide direct support for the...

Spark SQL Using Hive

In this post I’m going to describe how to integrate Hive with Spark. You can find this code on Spark’s official GitHub page. My aim is to describe...