Category: Hadoop

October 2, 2019

Impala Export to CSV

Apache Impala is an open-source, massively parallel processing (MPP) SQL query engine for data stored in a computer cluster running Apache Hadoop. In some cases, impala-shell is installed manually...

Hadoop / Scala / Spark

October 1, 2019

Salesforce connector in Spark

Salesforce is a customer relationship management (CRM) solution that brings companies and customers together. It’s one integrated CRM platform that gives all your departments, including marketing, sales, commerce,...

Hadoop / Scala / Spark

September 30, 2019

Distribution of Executors, Cores and Memory for a Spark Application

Resource allocation is an important aspect of running any Spark job. If it is not configured correctly, a Spark job can consume the entire cluster’s resources and make other...
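To make the sizing arithmetic concrete, here is a minimal Python sketch of one common way to split a YARN cluster into executors. The rules of thumb it encodes are assumptions, not settings taken from this post: reserve 1 core and 1 GiB per node for the OS and Hadoop daemons, use about 5 cores per executor, leave one executor slot for the YARN ApplicationMaster, and budget memory overhead as max(384 MiB, 10% of the heap), modelled on Spark’s default executor memory overhead. The function `plan_executors` and the example cluster (10 nodes with 16 cores and 64 GiB each) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExecutorPlan:
    num_executors: int        # would map to --num-executors
    executor_cores: int       # would map to --executor-cores
    executor_memory_mib: int  # heap size for --executor-memory, in MiB


def plan_executors(nodes: int, cores_per_node: int, mem_per_node_gib: int,
                   cores_per_executor: int = 5,
                   overhead_fraction: float = 0.10,
                   min_overhead_mib: int = 384) -> ExecutorPlan:
    # Assumption: keep 1 core and 1 GiB per node for the OS and Hadoop daemons.
    usable_cores = cores_per_node - 1
    usable_mem_mib = (mem_per_node_gib - 1) * 1024

    executors_per_node = usable_cores // cores_per_executor
    if executors_per_node < 1:
        raise ValueError("not enough cores per node for a single executor")

    # Assumption: give one executor slot to the YARN ApplicationMaster.
    num_executors = executors_per_node * nodes - 1

    # Each executor's container must hold heap + overhead, where
    # overhead = max(min_overhead_mib, overhead_fraction * heap).
    container_mib = usable_mem_mib // executors_per_node
    heap_mib = min(int(container_mib / (1 + overhead_fraction)),
                   container_mib - min_overhead_mib)
    return ExecutorPlan(num_executors, cores_per_executor, heap_mib)


# Hypothetical cluster: 10 nodes, each with 16 cores and 64 GiB of RAM.
print(plan_executors(10, 16, 64))
# → ExecutorPlan(num_executors=29, executor_cores=5, executor_memory_mib=19549)
```

Under these assumptions the job would be submitted with 29 executors of 5 cores each and roughly 19 GiB of executor memory; changing the reserved resources or cores per executor shifts the result accordingly.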

AWS / Hadoop / Scala / Spark

September 27, 2019

Accessing Data Stored in Amazon S3 through Spark

Spark can access files in S3, even when running in local mode, given AWS credentials. By default, with s3a URLs, Spark will search for credentials in a few...

Hadoop / HDFS / Map Reduce / Spark / Yarn

August 31, 2017

Useful commands for Hadoop developers

This post collects the commands Hadoop developers use most frequently for Spark, EMR, YARN and AWS. Kill Spark job: This command will kill all the running Spark jobs. ps...

Elastic Search / Hadoop / Scala / Spark

July 14, 2016

Spark and Elasticsearch integration

As the title suggests, in this blog I’m going to explain the end-to-end process of writing and reading data...

Hadoop / Scala / Spark

June 19, 2016

Missing Imputation in Scala

Imputation: In statistics, imputation is the process of replacing missing data with substituted values. When substituting for a data point, it is known as “unit imputation”; when substituting...

Hadoop / Python / Spark

June 19, 2016

Missing Imputation in Python

Imputation: In statistics, imputation is the process of replacing missing data with substituted values. When substituting for a data point, it is known as “unit imputation”; when substituting...
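As a minimal illustration of unit imputation, the Python sketch below replaces each missing value (`None`) with the mean of the observed values. Mean substitution is only one simple strategy and is an assumption here, not necessarily the method the post itself uses; the function name `mean_impute` is hypothetical.

```python
from statistics import mean
from typing import List, Optional


def mean_impute(values: List[Optional[float]]) -> List[float]:
    """Replace each missing value (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute: every value is missing")
    fill = mean(observed)
    # Unit imputation: each missing data point is substituted with a single value.
    return [fill if v is None else v for v in values]


print(mean_impute([2.0, None, 4.0, 6.0, None]))
# → [2.0, 4.0, 4.0, 6.0, 4.0]
```

Mean substitution keeps the column mean unchanged but shrinks its variance, which is one reason more elaborate imputation methods exist.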

Hadoop / Scala / Spark / Spark Sql

May 30, 2016

Spark SQL Using Parquet

Today, I’m focusing on how to use the Parquet format in Spark. If you are new to this format, it’s worth getting more insight into Parquet first. Parquet: Apache Parquet is a...

Hadoop / Scala / Spark / Spark Sql

May 28, 2016

Spark SQL using Avro

Today, I’m shedding light on how to use Avro, a data serialization system and data format, with Spark SQL. Unlike Hive, Spark does not provide direct support for the...

