
Chapter 7

Analytics at Scale

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf().setAppName("wiki_test") // Create a Spark configuration object
val sc = new SparkContext(conf) // Create a Spark context
val data = sc.textFile("/path/to/somedir") // Read the text files under "somedir" into an RDD of lines
val tokens = data.flatMap(_.split(" ")) // Split each line into a list of tokens (words)
val wordFreq = tokens.map((_, 1)).reduceByKey(_ + _) // Attach a count of one to each token, then sum the counts per word
wordFreq.sortBy(s => -s._2).map(x => (x._2, x._1)).top(10) // Get the top 10 words; swap word and count to sort by count

On top of Spark Core, Spark provides the following:

• Spark SQL, which is a SQL interface through the command line or a database connector interface. It also provides a SQL interface for the Spark DataFrame object (a brief sketch follows this list).

• Spark Streaming, which enables you to process streaming data in real time.

• MLlib, a machine learning library for building analytical models on Spark data.

• GraphX, a distributed graph processing framework.
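As a brief illustration of the Spark SQL interface mentioned above, the following sketch registers a DataFrame as a temporary view and queries it with standard SQL. The application name, file path, and column names are hypothetical, and local[*] is used only so the example runs on a single machine.

import org.apache.spark.sql.SparkSession

object SparkSqlDemo {
  def main(args: Array[String]): Unit = {
    // SparkSession is the entry point for Spark SQL and the DataFrame API.
    val spark = SparkSession.builder()
      .appName("spark_sql_demo")
      .master("local[*]") // run locally; use a cluster URL in production
      .getOrCreate()

    // Hypothetical input: a CSV file of sales records with a header row.
    val sales = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("/path/to/sales.csv")

    // Register the DataFrame as a temporary view so it can be queried with SQL.
    sales.createOrReplaceTempView("sales")

    // Standard SQL over the DataFrame, executed by the Spark engine.
    spark.sql(
      """SELECT product, SUM(amount) AS total
        |FROM sales
        |GROUP BY product
        |ORDER BY total DESC""".stripMargin
    ).show(10)

    spark.stop()
  }
}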

Analytics in the Cloud

Like many other fields, analytics is being impacted by the cloud, and it is affected in two ways. Big cloud providers are continuously releasing machine learning APIs, so a developer can easily write a machine learning application without worrying about the underlying algorithm. For example, Google provides APIs for computer vision, natural language, speech, and more.
