Spark SQL using Avro
Today, I’m shedding light on how to use Avro, a data serialization system, as a data format in Spark SQL. Unlike Hive, Spark does not provide direct support for the Avro format, so to load Avro data we have to use the spark-avro package. For that, import com.databricks.spark.avro._
// import needed for the .avro method to be added
import com.databricks.spark.avro._
And then
// The Avro records get converted to Spark types, filtered, and
// then written back out as Avro records
val df = sqlContext.read.avro("resources/episodes.avro")
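The same import also adds an .avro method on DataFrameWriter, so the records can be written back out as Avro, as the comment above mentions. A minimal sketch; the path output/episodes is a hypothetical output directory, not part of the original example:

// write the DataFrame back out as Avro records
// ("output/episodes" is a hypothetical output directory)
df.write.avro("output/episodes")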
Complete Code:
import org.apache.spark.{ SparkConf, SparkContext }
import org.apache.spark.sql._

object SparkAvro {
  def main(args: Array[String]) {
    // import needed for the .avro method to be added
    import com.databricks.spark.avro._

    val conf = new SparkConf().setAppName("Spark Using Avro").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)

    // The Avro records get converted to Spark types, filtered, and
    // then written back out as Avro records
    val df = sqlContext.read.avro("resources/episodes.avro")

    import sqlContext.implicits._
    df.registerTempTable("AvroSample")
    val result = sqlContext.sql("select * from AvroSample")
    result.show()
  }
}
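Once the DataFrame is registered as a temp table, any Spark SQL query can run against it. As a sketch, assuming the sample episodes.avro file exposes title and doctor columns (adjust the names if your file's schema differs):

// project specific columns instead of every field
// ("title" and "doctor" are assumed column names from the sample file)
val titles = sqlContext.sql("select title, doctor from AvroSample")
titles.show()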
Databricks Avro dependency
<!-- http://mvnrepository.com/artifact/com.databricks/spark-avro_2.10 -->
<dependency>
    <groupId>com.databricks</groupId>
    <artifactId>spark-avro_2.10</artifactId>
    <version>2.0.1</version>
</dependency>
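If the project builds with sbt instead of Maven, the same coordinates translate to a single library dependency line:

// sbt equivalent of the Maven dependency above (same artifact and version)
libraryDependencies += "com.databricks" % "spark-avro_2.10" % "2.0.1"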