Big Data Analytics

Recommendations

Info

Massive Data Analysis: Tasks, Tools, . . . 21 4.1 Hadoop Distributed File System The Hadoop Distributed File System (HDFS) 6 is a distributed file system designed to run reliably and to scale on commodity hardware. HDFS achieves high faulttolerance by dividing data into smaller chunks and replicating them across several nodes in a cluster. It can scale up to 200 PB in data, and 4500 machines in single cluster. HDFS is a side project of Hadoop and works closely with it. HDFS is designed to work efficiently in batch mode, rather than in interactive mode. Characteristics of typical applications developed for HDFS, such as write once and read multiple times, and simple and coherent data access, increases the throughput. HDFS is designed to handle large file sizes from Gigabytes to a few Terabytes. HDFS follows the master/slave architecture with one NameNode and multiple DataNodes. NameNode is responsible for managing the file system’s meta data and handling requests from applications. DataNodes physically hold the data. Typically, every node in the cluster has one DataNode. Every file stored in HDFS is divided into blocks with default block size of 64 MB. For the sake of fault tolerance, every block is replicated into user-defined number of times (recommended to be a minimum of 3 times) and distributed across different data nodes. All meta data about replication and distribution of the file are stored in the NameNode. Each DataNode sends a heartbeat signal to NameNode. If it fails to do so, the NameNode marks the DataNode as failed. HDFS maintains a Secondary NameNode, which is periodically updated with information from NameNode. In case of NameNode failure, HDFS restores a NameNode with information from the Secondary NameNode, which ensures faulttolerance of the NameNode. HDFS has a built-in balancer feature, which ensures uniform data distribution across the cluster, and re-replication of missing blocks to maintain the correct number of replications. 4.2 NoSQL Databases Conventionally, Relational Database Management Systems (RDBMS) are used to manage large datasets and handle tons of requests securely and reliably. Built-in features, such as data integrity, security, fault-tolerance, and ACID (atomicity, consistency, isolation, and durability) have made RDBMS a go-to data management technology for organizations and enterprises. In spite of RDBMS’ advantages, it is either not viable or is too expensive for applications that deal with Big data. This has made organizations to adopt a special type of database called “NoSQL” (Not an SQL), which means database systems that do not employ traditional “SQL” or adopt the constraints of the relational database model. NoSQL databases cannot provide all strong built-in features of RDBMS. Instead, they are more focused on faster read/write access to support ever-growing data. 6 http://hadoop.apache.org.
22 M.K. Pusala et al. According to December 2014 statistics from Facebook [16], it has 890 Million average daily active users sharing billions of messages and posts every day. In order to handle huge volumes and a variety of data, Facebook uses a Key-Value database system with memory cache technology that can handle billions of read/write requests. At any given point in time, it can efficiently store and access trillions of items. Such operations are very expensive in relational database management systems. Scalability is another feature in NoSQL databases, attracting large number of organizations. NoSQL databases are able to distribute data among different nodes within a cluster or across different clusters. This helps to avoid capital expenditure on specialized systems, since clusters can be built with commodity computers. Unlike relational databases, NoSQL systems have not been standardized and features vary from one system to another. Many NoSQL databases trade-off ACID properties in favor of high performance, scalability, and faster store and retrieve operations. Enumerations of such NoSQL databases tend to vary, but they are typically categorized as Key-Value databases, Document databases, Wide Column databases, and Graph databases. Figure 2 shows a hierarchical view of NoSQL types, with two examples of each type. 4.2.1 Key-Value Database As the name suggests, Key-Value databases store data as Key-Value pairs, which makes them schema-free systems. In most of Key-Value databases, the key is functionally generated by the system, while the value can be of any data type from a character to a large binary object. Keys are typically stored in hash tables by hashing each key to a unique index. All the keys are logically grouped, eventhough data values are not physically grouped. The logical group is referred to as a ‘bucket’. Data can only be accessed with both a bucket and a key value because the unique index is hashed using the Fig. 2 Categorization of NoSQL databases: The first level in the hierarchy is the categorization of NoSQL. Second level, provides examples for each NoSQL database type
Page 1 and 2: Saumyadipta Pyne · B.L.S. Prakasa
Page 3 and 4: Saumyadipta Pyne ⋅ B.L.S. Prakasa
Page 5 and 6: Foreword Big data is transforming t
Page 7 and 8: viii Preface We thank all the autho
Page 9 and 10: x Contents Managing Large-Scale Sta
Page 11 and 12: xii About the Editors biological, n
Page 13 and 14: 2 S. Pyne et al. ethics. If data po
Page 15 and 16: 4 S. Pyne et al. Therefore, as data
Page 17 and 18: 6 S. Pyne et al. characteristics. I
Page 19 and 20: 8 S. Pyne et al. stays of high-dime
Page 21 and 22: 10 S. Pyne et al. References 1. Ken
Page 23 and 24: 12 M.K. Pusala et al. 1 Introductio
Page 25 and 26: 14 M.K. Pusala et al. to obtain hid
Page 27 and 28: 16 M.K. Pusala et al. to identify t
Page 29 and 30: 18 M.K. Pusala et al. duce works cl
Page 31: 20 M.K. Pusala et al. (ETL), as wel
Page 35 and 36: 24 M.K. Pusala et al. 4.2.3 Documen
Page 37 and 38: 26 M.K. Pusala et al. also offers s
Page 39 and 40: 28 M.K. Pusala et al. The Real-time
Page 41 and 42: 30 M.K. Pusala et al. Fig. 4 Implem
Page 43 and 44: 32 M.K. Pusala et al. However, thes
Page 45 and 46: 34 M.K. Pusala et al. Rack-level da
Page 47 and 48: 36 M.K. Pusala et al. 7 Summary and
Page 49 and 50: 38 M.K. Pusala et al. 15. Enhancing
Page 51 and 52: 40 M.K. Pusala et al. 64. Yuan D, C
Page 53 and 54: 42 A. Laha ability to query the dat
Page 55 and 56: 44 A. Laha as positive, negative or
Page 57 and 58: 46 A. Laha 3.2 Velocity Some of the
Page 59 and 60: 48 A. Laha angular data), variation
Page 61 and 62: 50 A. Laha that it creates a histog
Page 63 and 64: 52 A. Laha It is seen that the ASR
Page 65 and 66: 54 A. Laha However, better methods
Page 67 and 68: Application of Mixture Models to La
Page 83 and 84:
Application of Mixture Models to La
Page 85 and 86:
An Efficient Partition-Repetition A
Page 87 and 88:
Page 89 and 90:
Page 91 and 92:
Page 93 and 94:
Page 95 and 96:
Page 97 and 98:
Page 99 and 100:
Page 101 and 102:
Page 103 and 104:
Page 105 and 106:
96 X. Chen and J. Huan 1 Introducti
Page 107 and 108:
98 X. Chen and J. Huan Hash or rand
Page 109 and 110:
100 X. Chen and J. Huan Fig. 1 Vert
Page 111 and 112:
102 X. Chen and J. Huan According t
Page 113 and 114:
104 X. Chen and J. Huan Fig. 3 Assi
Page 115 and 116:
106 X. Chen and J. Huan The ADG alg
Page 117 and 118:
108 X. Chen and J. Huan partition o
Page 119 and 120:
110 X. Chen and J. Huan Table 5 Ave
Page 121 and 122:
112 X. Chen and J. Huan orders. At
Page 123 and 124:
114 X. Chen and J. Huan 10. Ng AY (
Page 125 and 126:
116 Y. Simmhan and S. Perera will b
Page 127 and 128:
118 Y. Simmhan and S. Perera human-
Page 129 and 130:
120 Y. Simmhan and S. Perera IoT sy
Page 131 and 132:
122 Y. Simmhan and S. Perera Fig. 1
Page 133 and 134:
124 Y. Simmhan and S. Perera Finall
Page 135 and 136:
126 Y. Simmhan and S. Perera detect
Page 137 and 138:
128 Y. Simmhan and S. Perera Fig. 3
Page 139 and 140:
130 Y. Simmhan and S. Perera perfor
Page 141 and 142:
132 Y. Simmhan and S. Perera get fe
Page 143 and 144:
134 Y. Simmhan and S. Perera Refere
Page 145 and 146:
Complex Event Processing in Big Dat
Page 147 and 148:
Page 149 and 150:
Page 151 and 152:
Page 153 and 154:
Page 155 and 156:
Page 157 and 158:
Page 159 and 160:
Page 161 and 162:
Page 163 and 164:
Page 165 and 166:
Page 167 and 168:
Page 169 and 170:
Page 171 and 172:
164 C. Hota et al. security and pri
Page 173 and 174:
166 C. Hota et al. The network is p
Page 175 and 176:
168 C. Hota et al. Fig. 5 Web usage
Page 177 and 178:
170 C. Hota et al. the P2P network
Page 179 and 180:
172 C. Hota et al. authors in [45]
Page 181 and 182:
174 C. Hota et al. Table 1 Applicat
Page 183 and 184:
176 C. Hota et al. Fig. 7 Flow-base
Page 185 and 186:
178 C. Hota et al. applications lik
Page 187 and 188:
180 C. Hota et al. Fig. 10 Botnet T
Page 189 and 190:
182 C. Hota et al. The build time o
Page 191 and 192:
184 C. Hota et al. With the edge-pa
Page 193 and 194:
186 C. Hota et al. 24. Liang J, Kum
Page 195 and 196:
Application-Level Benchmarking of B
Page 197 and 198:
Page 199 and 200:
Page 201 and 202:
Page 203 and 204:
Page 205 and 206:
Page 207 and 208:
202 S. Batra and S. Sachdeva 1 Intr
Page 209 and 210:
204 S. Batra and S. Sachdeva In 201
Page 211 and 212:
206 S. Batra and S. Sachdeva 2.2 Du
Page 213 and 214:
208 S. Batra and S. Sachdeva One mo
Page 215 and 216:
210 S. Batra and S. Sachdeva Retrie
Page 217 and 218:
212 S. Batra and S. Sachdeva 3.4 Op
Page 219 and 220:
214 S. Batra and S. Sachdeva inform
Page 221 and 222:
216 S. Batra and S. Sachdeva Fig. 4
Page 223 and 224:
218 S. Batra and S. Sachdeva 7. IHT
Page 225 and 226:
Microbiome Data Mining for Microbia
Page 227 and 228:
Page 229 and 230:
Page 231 and 232:
Page 233 and 234:
Page 235 and 236:
Page 237 and 238:
Page 239 and 240:
Page 241 and 242:
238 K. Das and Z. Nenadic a viable
Page 243 and 244:
240 K. Das and Z. Nenadic band [47,
Page 245 and 246:
242 K. Das and Z. Nenadic (a) (b) C
Page 247 and 248:
244 K. Das and Z. Nenadic Fig. 3 Tw
Page 249 and 250:
246 K. Das and Z. Nenadic membershi
Page 251 and 252:
248 K. Das and Z. Nenadic and are e
Page 253 and 254:
250 K. Das and Z. Nenadic Fig. 6 Ex
Page 255 and 256:
252 K. Das and Z. Nenadic 3.2.3 Dis
Page 257 and 258:
254 K. Das and Z. Nenadic 3. Belhum
Page 259 and 260:
256 K. Das and Z. Nenadic 50. Ramos
Page 261 and 262:
Big Data and Cancer Research Binay
Page 263 and 264:
Big Data and Cancer Research 261 sy
Page 265 and 266:
Big Data and Cancer Research 263 Ta
Page 267 and 268:
Big Data and Cancer Research 265 in
Page 269 and 270:
Big Data and Cancer Research 267 Id
Page 271 and 272:
Big Data and Cancer Research 269 us
Page 273 and 274:
Big Data and Cancer Research 271 Ac
Page 275 and 276:
Big Data and Cancer Research 273 39
Page 277 and 278:
Big Data and Cancer Research 275 75
show all

Big Data Analytics

You also want an ePaper? Increase the reach of your titles

Delete template?

Save as template?