
Consider the following example: A dataset is given by [1, 1, 4, 3, 5, 2, 6, 2, 4], and point P is equal to 5. Figure 7-4 shows how kNN would be applied to this dataset.

Now, if you specify that k is equal to 3, there are, based on distance, three nearest neighbors to the point 5. Those neighbors are 4, 4, and 6. So, based on the kNN algorithm, query point P is classified as 4, because 4 is the majority value among the three points nearest to it. kNN classifies every other query point by the same majority principle.

When using kNN, it's crucial to choose a k value that minimizes noise (unexplainable random variation, in other words). At the same time, you must choose a k value that includes sufficient data points in the selection process. If the data points aren't uniformly distributed, it's generally harder to predetermine a good k value. Be careful to select an optimal k value for each dataset you're analyzing.

Large k values tend to produce less noise and more boundary smoothing (clearer definition and less overlap) between classes than small k values do.
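One practical way to pick k for a given dataset is to scan candidate values with cross-validation. The sketch below assumes scikit-learn is available; its bundled iris data is just a stand-in for whatever dataset you're actually analyzing.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

scores = {}
for k in range(1, 21):
    model = KNeighborsClassifier(n_neighbors=k)
    # Mean 5-fold cross-validated accuracy for this value of k
    scores[k] = cross_val_score(model, X, y, cv=5).mean()

best_k = max(scores, key=scores.get)
print(f"best k = {best_k} (cv accuracy = {scores[best_k]:.3f})")
```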

Knowing when to use the k-nearest neighbor algorithm

kNN is particularly useful for multi-label learning: supervised learning in which the algorithm automatically learns from (detects patterns in) multiple sets of instances, each of which could have several classes of its own. With multi-label learning, the algorithm learns to predict multiple class labels for each new instance it encounters.
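As a toy illustration of this, scikit-learn's KNeighborsClassifier accepts a binary indicator matrix as its target, so a single fitted model predicts a full label vector per instance. The data here is made up purely for demonstration.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Toy multi-label data: each row of y flags which of three labels
# apply to the matching instance in X
X = np.array([[0.0, 0.1], [0.2, 0.0], [0.9, 1.0], [1.0, 0.8], [0.5, 0.5]])
y = np.array([[1, 0, 0],
              [1, 0, 0],
              [0, 1, 1],
              [0, 1, 1],
              [1, 1, 0]])

model = KNeighborsClassifier(n_neighbors=3).fit(X, y)
# Each prediction is a full label vector, voted label by label
print(model.predict([[0.1, 0.1], [0.9, 0.9]]))  # [[1 0 0], [0 1 1]]
```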

The problem with kNN is that it takes a lot longer than other classification methods to classify a sample. Nearest neighbor classifier performance depends on the cost of calculating the distance function and on the value of the neighborhood parameter k. You can try to speed things up by specifying optimal values for k and n.
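One common speed-up is to index the training data so that each query avoids a brute-force scan of every point. The sketch below assumes scikit-learn, whose algorithm="kd_tree" option builds one such index; the random data is only a placeholder for a real training set.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Illustrative random data standing in for a real training set
rng = np.random.default_rng(0)
X_train = rng.random((10_000, 3))
y_train = (X_train.sum(axis=1) > 1.5).astype(int)

# algorithm="kd_tree" indexes the training points in a k-d tree,
# so each query avoids a brute-force scan over all 10,000 of them
fast_knn = KNeighborsClassifier(n_neighbors=5, algorithm="kd_tree")
fast_knn.fit(X_train, y_train)
print(fast_knn.predict(rng.random((3, 3))))
```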

Exploring common applications of k-nearest neighbor algorithms

kNN is often used for internet database management purposes. In this capacity, kNN is useful for website categorization, web page ranking, and analyzing other user dynamics across the web.

kNN classification techniques are also quite beneficial in customer relationship management (CRM), a set of processes that help a business sustain strong relationships with its clients while simultaneously increasing its revenues. Most CRMs benefit tremendously from using kNN to data-mine customer information for patterns that are useful in boosting customer retention.

The method is so versatile that even if you're a small-business owner or a marketing department manager, you can easily use kNN to boost your own marketing return on investment. Simply use kNN to analyze your customer data for purchasing patterns, and then use those findings to customize marketing initiatives so that they're more precisely targeted to your customer base.
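As a rough sketch of what that analysis might look like, the snippet below uses scikit-learn's NearestNeighbors to find the customers whose purchasing behavior most resembles a given customer's. The customer table and its three features are hypothetical; in practice you'd also scale the features so that no single one (such as spend) dominates the distance.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Hypothetical customer table: one row per customer, columns are
# (monthly spend, orders per month, months since last purchase)
customers = np.array([
    [120.0, 4, 1],
    [ 95.0, 3, 2],
    [300.0, 9, 0],
    [ 15.0, 1, 8],
    [110.0, 4, 1],
])

# Find the customers whose purchasing pattern is closest to customer 0;
# the first hit is customer 0 itself, at distance zero
nn = NearestNeighbors(n_neighbors=3).fit(customers)
distances, indices = nn.kneighbors(customers[[0]])
print(indices)  # [[0 4 1]] -- customers 4 and 1 look most similar
```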
