Advanced Data Analytics Using Python: With Machine Learning, Deep Learning and NLP Examples (2023)
Chapter 7
Analytics at Scale
val conf = new SparkConf().setAppName("wiki_test") // Create a Spark configuration object
val sc = new SparkContext(conf) // Create a Spark context
val data = sc.textFile("/path/to/somedir") // Read the files under "somedir" into an RDD of lines
val tokens = data.flatMap(_.split(" ")) // Split each line into a list of tokens (words)
val wordFreq = tokens.map((_, 1)).reduceByKey(_ + _) // Pair each token with a count of one, then sum the counts per word
wordFreq.sortBy(s => -s._2).map(x => (x._2, x._1)).top(10) // Swap word and count, then take the 10 most frequent words
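The same word-count pipeline can be sketched in plain Python. This is a local, in-memory analogue rather than a Spark job: the sample lines are made up, a list comprehension plays the role of flatMap, and Counter stands in for reduceByKey:

```python
from collections import Counter

# Stand-in for the RDD of lines read from "somedir" (hypothetical sample data).
lines = [
    "to be or not to be",
    "that is the question",
    "to be is to do",
]

# flatMap(_.split(" ")): split every line into tokens.
tokens = [word for line in lines for word in line.split(" ")]

# map((_, 1)).reduceByKey(_ + _): sum a count of one per occurrence of each word.
word_freq = Counter(tokens)

# sortBy(-count) + top(10): take the 10 most frequent words.
top10 = word_freq.most_common(10)
print(top10)  # most frequent words first, e.g. ('to', 4) at the head
```

On a real cluster the same shape of computation is what Spark distributes: the per-line splitting runs in parallel on each partition, and the per-word summation is the shuffle step performed by reduceByKey.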
On top of Spark Core, Spark provides the following:
• Spark SQL, which provides a SQL interface through
the command line or through a database connector. It
also exposes a SQL interface for the Spark DataFrame
object.
• Spark Streaming, which enables you to process
streaming data in real time.
• MLlib, a machine learning library for building
analytical models on Spark data.
• GraphX, a distributed graph processing framework.
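Spark Streaming handles real-time data by grouping the incoming stream into small batches and applying the usual Spark operations to each batch. The micro-batch idea can be sketched in plain Python; this is a toy, non-Spark illustration, and the event stream and batch size are made up:

```python
from collections import Counter
from typing import Iterable, Iterator, List

def micro_batches(stream: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Group an unbounded stream of records into fixed-size batches."""
    batch: List[str] = []
    for record in stream:
        batch.append(record)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # flush the final partial batch
        yield batch

# Hypothetical event stream; in Spark Streaming these records would arrive continuously.
stream = ["click", "view", "click", "click", "view", "buy", "click"]

# Process each batch independently, e.g. counting event types per batch.
for i, batch in enumerate(micro_batches(stream, batch_size=3)):
    print(i, dict(Counter(batch)))
```

In Spark Streaming the batching interval is a configuration parameter, and each batch is processed with the same transformations (map, reduceByKey, and so on) used on static RDDs.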
Analytics in the Cloud
Like many other fields, analytics is being impacted by the cloud, in
two ways. First, the big cloud providers are continuously releasing
machine learning APIs, so a developer can easily write a machine
learning application without worrying about the underlying algorithm.
For example, Google provides APIs for computer vision, natural language,