Spark SQL using Avro
Today, I’m shining a light on how to use Avro, a data serialization system, as a data format in Spark SQL. Unlike Hive, Spark does not provide built-in support for the Avro format; to load Avro data we have to use the spark-avro package. For that, import com.databricks.spark.avro._
// import needed for the .avro method to be added
import com.databricks.spark.avro._
And then
// The Avro records get converted to Spark SQL rows
val df = sqlContext.read.avro("resources/episodes.avro")
Complete code:
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql._

object SparkAvro {
  def main(args: Array[String]) {
    // import needed for the .avro method to be added
    import com.databricks.spark.avro._

    val conf = new SparkConf().setAppName("Spark Using Avro").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)

    // The Avro records get converted to Spark SQL rows
    val df = sqlContext.read.avro("resources/episodes.avro")

    // Register the DataFrame as a temporary table so it can be queried with SQL
    df.registerTempTable("AvroSample")
    val result = sqlContext.sql("select * from AvroSample")
    result.show()

    sc.stop()
  }
}
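Writing a DataFrame back out as Avro works the same way: the spark-avro import also adds an .avro method on DataFrameWriter. Here is a minimal sketch; the "doctor" column comes from the sample episodes.avro file shipped with spark-avro, and the output path "resources/episodes-copy" is just a made-up example.

```scala
// import needed for the .avro method on DataFrameWriter
import com.databricks.spark.avro._
import sqlContext.implicits._

// Filter the episodes and write the surviving rows back out
// as Avro records under a hypothetical output directory
val filtered = df.filter($"doctor" > 5)
filtered.write.avro("resources/episodes-copy")
```

Reading the output directory back with sqlContext.read.avro(...) should return the filtered rows, which is a quick way to confirm the round trip.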
Databricks spark-avro dependency (Maven):
<!-- http://mvnrepository.com/artifact/com.databricks/spark-avro_2.10 -->
<dependency>
    <groupId>com.databricks</groupId>
    <artifactId>spark-avro_2.10</artifactId>
    <version>2.0.1</version>
</dependency>
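If you build with sbt instead of Maven, the same coordinates can be expressed as (assuming a Scala 2.10 build, so %% resolves to spark-avro_2.10):

```scala
libraryDependencies += "com.databricks" %% "spark-avro" % "2.0.1"
```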