Spark SQL Using Parquet
Today, I’m focusing on how to use the Parquet format in Spark. If you are new to this format, please read up on Parquet first to get more insight into it.
Parquet: Apache Parquet is a columnar storage format available to any project in the Hadoop ecosystem, regardless of the choice of data processing framework, data model or programming language.
Working with Parquet is pretty straightforward because Spark provides built-in support for the Parquet format. To load a Parquet file, you only have to provide the file location; Spark will read the data automatically.
val parquetFile = sqlContext.read.parquet("resources/wiki_parquet")
After reading the data, register the resulting DataFrame as a temporary table and give the table a name.
parquetFile.registerTempTable("employee")
Then you can run SQL queries against that table as per your need, as shown below.
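For example, the query used in the complete code below selects every row from the employee table registered above and prints the result:

val allrecords = sqlContext.sql("SELECT * FROM employee")
allrecords.show()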
Complete code:
import org.apache.spark.{ SparkConf, SparkContext }
import org.apache.spark.sql._
import org.apache.log4j.{ Level, Logger }

object SparkUsingParquet {
  def main(args: Array[String]) {
    // Configure and create the Spark context and SQL context
    val sparkConf = new SparkConf().setAppName("Spark SQL parquet").setMaster("local[*]")
    val sc = new SparkContext(sparkConf)
    val sqlContext = new SQLContext(sc)

    // Load the Parquet file and register it as a temporary table
    val parquetFile = sqlContext.read.parquet("resources/wiki_parquet")
    parquetFile.registerTempTable("employee")

    // Query the table and show the results
    val allrecords = sqlContext.sql("SELECT * FROM employee")
    allrecords.show()
  }
}
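As a side note, the same DataFrame API can also write results back out in Parquet format. A minimal sketch follows; the output path resources/employee_output is hypothetical and not part of the original example:

// Write the query result back to disk as Parquet (output path is an assumption)
allrecords.write.parquet("resources/employee_output")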