Spark SQL Using Hive
In this blog I’m going to describe how to integrate Hive with Spark. You may find this code on Spark’s official GitHub page; my aim here is to explain each step of the code.
For a Spark word count example please follow my previous blog, and for Spark SQL you can go through the SparkSQL blog. The basic configuration is similar to the Spark word count example, i.e. a SparkConf and a SparkContext. The only difference is the creation of a HiveContext. Spark provides direct support for Hive tables, and using a HiveContext we can run our SQL queries against them.
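For reference, the familiar configuration from the word count example boils down to a SparkConf and a SparkContext (the app name and `local[*]` master here match the complete code at the end of this post):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Standard Spark setup, identical to the word count example.
val sparkconf = new SparkConf()
  .setMaster("local[*]")          // run locally using all available cores
  .setAppName("Spark SQL Test")
val sc = new SparkContext(sparkconf)
```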
val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
Using this context you can query existing Hive tables, or you can create a new table and load data into it:
sqlContext.sql("CREATE TABLE IF NOT EXISTS employee(id INT, name STRING, age INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n'")
sqlContext.sql("LOAD DATA LOCAL INPATH 'src/data/employee.txt' INTO TABLE employee")
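The table definition expects comma-separated fields with one record per line, so `src/data/employee.txt` might look like this (the rows below are illustrative, not from the original data set):

```
1,John,28
2,Asha,32
3,Miguel,25
```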
The next step is to run your intended query:
val result = sqlContext.sql("SELECT id, name, age FROM employee")
The query above is lazily evaluated; Spark only executes the job when you trigger an action. In this code, result.show() is an example of an action:
result.show()
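show() prints the first rows of the result as an ASCII table. Assuming a hypothetical data file with the rows `1,John,28` and `2,Asha,32`, the output would look roughly like:

```
+---+----+---+
| id|name|age|
+---+----+---+
|  1|John| 28|
|  2|Asha| 32|
+---+----+---+
```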
Complete code:
```scala
import org.apache.spark.{SparkConf, SparkContext}

object SparkSqlHiveExample {
  def main(args: Array[String]) {
    val sparkconf = new SparkConf()
      .setMaster("local[*]")
      .setAppName("Spark SQL Test")
    val sc = new SparkContext(sparkconf)

    // HiveContext adds support for Hive tables and HiveQL on top of SQLContext.
    val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)

    // Create the table (if needed) and load the comma-delimited data file.
    sqlContext.sql("CREATE TABLE IF NOT EXISTS employee(id INT, name STRING, age INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n'")
    sqlContext.sql("LOAD DATA LOCAL INPATH 'src/data/employee.txt' INTO TABLE employee")

    // Lazily define the query, then trigger execution with the show() action.
    val result = sqlContext.sql("SELECT id, name, age FROM employee")
    result.show()
  }
}
```
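Note that HiveContext lives in the spark-hive module, which is not pulled in by spark-core alone. With sbt the dependencies would look something like this (the version number is illustrative; pick the one matching your cluster):

```scala
// build.sbt -- version numbers are illustrative
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.6.2",
  "org.apache.spark" %% "spark-sql"  % "1.6.2",
  "org.apache.spark" %% "spark-hive" % "1.6.2"
)
```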