Network Streaming in Spark
Network streaming in Spark is another interesting topic. Here I am going to explain how network streaming works and provide complete Spark Scala code.
Before jumping to the code, I would like to go through a couple of steps.
Streaming Context:
The first thing a Spark Streaming program must do is create a StreamingContext object, which contains information about the cluster. There are three ways to create a StreamingContext.
val ssc = new StreamingContext(path) // path: String, the checkpoint location
This re-creates a StreamingContext from a checkpoint file.
OR
val ssc = new StreamingContext("local[2]","NetworkWordCount", Seconds(1))
OR
val conf = new SparkConf().setMaster("local[2]").setAppName("NetworkWordCount")
val ssc = new StreamingContext(conf, Seconds(1))
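where: local[2] –> Creates a local StreamingContext with two worker threads. A socket receiver occupies one thread, so at least two are needed to process the data as well. If you want to use YARN mode, change the master to yarn.
Seconds(1) –> Batch duration, which means that every 1 second the data received in that interval becomes an RDD on which batch operations are performed.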
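To make the third option concrete, here is a minimal sketch of a standalone program that builds the context from a SparkConf. It assumes the Spark core and Spark Streaming libraries are on the classpath; the object name StreamingContextSketch and the helper createContext are my own illustrative names, not part of Spark.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StreamingContextSketch {
  // Hypothetical helper (illustration only): builds a StreamingContext
  // for the given master URL and batch duration in seconds.
  def createContext(master: String = "local[2]", batchSeconds: Long = 1): StreamingContext = {
    val conf = new SparkConf().setMaster(master).setAppName("NetworkWordCount")
    new StreamingContext(conf, Seconds(batchSeconds))
  }

  def main(args: Array[String]): Unit = {
    val ssc = createContext()
    // The underlying SparkContext carries the master and application name.
    println(s"master = ${ssc.sparkContext.master}, app = ${ssc.sparkContext.appName}")
    // No streams are defined yet, so we only release the resources here.
    // By default this also stops the underlying SparkContext.
    ssc.stop()
  }
}
```

Passing "yarn" as the master instead of "local[2]" should be the only change needed in this sketch to target a YARN cluster, provided the job is submitted in an environment configured for YARN.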
Creating SocketStream:
We have to create a SocketStream to listen on the particular socket where data is being streamed. For this we need to know the host address and the port number.
val hostname = "localhost" // replace with your intended host name
val port = 44444 // change the port according to your needs
socketTextStream creates a DStream that connects to hostname:port, for example localhost:44444:
val lines = ssc.socketTextStream("localhost", 44444)
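Putting the context and the socket stream together, the following is a minimal sketch that simply prints whatever arrives on the socket. It assumes the Spark Streaming library is on the classpath and that some process is already listening on localhost:44444 and writing newline-terminated text. The object name SocketStreamSketch is an illustrative choice of mine.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object SocketStreamSketch {
  def main(args: Array[String]): Unit = {
    val hostname = "localhost" // assumption: the data source runs on this machine
    val port = 44444           // assumption: the data source listens on this port

    val conf = new SparkConf().setMaster("local[2]").setAppName("NetworkWordCount")
    val ssc = new StreamingContext(conf, Seconds(1))

    // Each record in the DStream is one line of text read from the socket.
    val lines = ssc.socketTextStream(hostname, port)

    // At least one output operation must be registered before start();
    // print() shows the first ten elements of every batch.
    lines.print()

    ssc.start()            // begin receiving and processing data
    ssc.awaitTermination() // block until the context is stopped
  }
}
```

Spark's socket receiver connects as a client, so the server side must be provided separately. For a quick local test, a tool such as netcat (for example `nc -lk 44444`) can play that role: each line typed into it should show up in the next one-second batch.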