Figure 6: Normalized daily failures in CD1 over 4 years. Infant mortality seems clear.
6 Modeling Lifetime in Modern DCs
Models of hardware component lifetime have been used extensively in industry and academia to understand and predict failures, e.g. [9, 29, 30]. In the context of a datacenter, which hosts hundreds of thousands of components, modeling lifetimes is important for selecting the conditions under which datacenters should operate and for preparing for the unavoidable failures. Because disk drives fail much more frequently than other components, they typically receive the most modeling attention. The two most commonly used disk lifetime models are the bathtub model and the failure acceleration model based on the Arrhenius equation (Equation 1). The latter acceleration model does not consider the impact of relative humidity, which became a significant factor only with the advent of free cooling (Section 5.3).
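Equation 1 is defined earlier in the paper and is not reproduced here; for orientation, the sketch below computes a temperature acceleration factor using the standard Arrhenius form AF_T = e^((E_a/k)·(1/T_b − 1/T_e)). The activation energy and temperatures are illustrative assumptions, not values fitted in this work.

import math

BOLTZMANN_EV = 8.617e-5  # Boltzmann's constant k, in eV/K

def arrhenius_af(t_base_c, t_elev_c, ea_ev=0.46):
    """Standard Arrhenius acceleration factor AF_T between a baseline and an
    elevated operating temperature (ea_ev is an illustrative activation energy)."""
    t_b = t_base_c + 273.15  # Celsius to Kelvin
    t_e = t_elev_c + 273.15
    return math.exp((ea_ev / BOLTZMANN_EV) * (1.0 / t_b - 1.0 / t_e))

# Example: hypothetical 20C baseline vs. a 30C free-cooled operating point.
print(arrhenius_af(20.0, 30.0))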
Bathtub model. This model abstracts the lifetime of large component populations as three consecutive phases: (1) in the first phase, a significant number of components fail prematurely (infant mortality); (2) the second phase is a stable period of lower failure rate; and (3) in the third phase, component failures increase again due to wear-out effects. We clearly observe infant mortality in some datacenters but not others, depending on disk model/vendor. For example, Figure 3 does not show any obvious infant mortality in HH1. The same is the case for CD3. (Prior works have also observed the absence of infant mortality in certain cases, e.g. [28].) In contrast, CD1 uses disks from a different vendor and does show clear infant mortality. Figure 6 shows the normalized number of daily failures over a period of 4 years in CD1. This length of time spans one server refresh (which happened after 3 years of operation). The figure clearly shows two spikes of temporally clustered failures: one group 8 months after the first server deployment, and another 9 months after the first server refresh. A full deployment/refresh may take roughly 3 months, so the first group of disks may have failed 5-8 months after deployment, and the second 6-9 months after deployment.
A new disk lifetime model for free-cooled datacenters. Section 5.3 uses multiple sources of evidence to argue that relative humidity has a significant impact on hardware reliability in humid free-cooled datacenters. The section also classifies the disk failures into two main groups: SMART and controller/connectivity. Thus, we extend the acceleration model above to include relative humidity, and recognize that there are two main disk lifetime processes in free-cooled datacenters: (1) one that affects the disk mechanics and SMART-related errors; and (2) another that affects its controller/connector.
We model process #1 as the Arrhenius acceleration factor, i.e. AF_1 = AF_T (we define AF_T in Equation 1), as has been done in the past [10, 27]. For process #2, we model the corrosion rate due to high relative humidity and temperature, as both of them are known to affect corrosion [33]. Prior works have devised models allowing for more than one accelerating variable [12]. A general such model extends the Arrhenius failure rate to account for relative humidity, and computes a corrosion rate CR:
CR(T, RH) = const · e^(−E_a / (k·T)) · e^(b·RH + (c·RH) / (k·T))    (2)
where T is the average temperature, RH is the average relative humidity, E_a is the temperature activation energy, k is Boltzmann's constant, and const, b, and c are other constants. Peck empirically found an accurate model that assumes c = 0 [24], and we make the same assumption. Intuitively, Equation 2 relates the corrosion rate exponentially to both temperature and relative humidity.
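A minimal sketch of Equation 2 under Peck's c = 0 assumption follows; const, b, and E_a are fitted constants that the text does not report, so the defaults below are placeholders chosen only for illustration.

import math

BOLTZMANN_EV = 8.617e-5  # Boltzmann's constant k, in eV/K

def corrosion_rate(t_celsius, rh_fraction, const=1.0, ea_ev=0.46, b=5.0):
    """Corrosion rate CR(T, RH) from Equation 2 with Peck's c = 0 assumption:
    CR = const * e^(-Ea/(k*T)) * e^(b*RH). All constants are placeholders."""
    t_kelvin = t_celsius + 273.15
    return const * math.exp(-ea_ev / (BOLTZMANN_EV * t_kelvin)) * math.exp(b * rh_fraction)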
One can now compute the acceleration factor AF_2 by dividing the corrosion rate at the elevated temperature and relative humidity CR(T_e, RH_e) by the same rate at the baseline temperature and relative humidity CR(T_b, RH_b). Essentially, AF_2 calculates how much faster disks will fail due to the combined effects of these environmentals. This division produces:

AF_2 = AF_T · AF_RH    (3)

where AF_RH = e^(b·(RH_e − RH_b)) and AF_T is from Equation 1.
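A sketch of Equation 3 built from the division above: the unknown const cancels, and under c = 0 the humidity term reduces to AF_RH = e^(b·(RH_e − RH_b)). As before, the activation energy and b are placeholder constants.

import math

BOLTZMANN_EV = 8.617e-5  # Boltzmann's constant k, in eV/K

def af_temperature(t_base_c, t_elev_c, ea_ev=0.46):
    """AF_T, the Arrhenius factor from Equation 1 (placeholder Ea)."""
    t_b, t_e = t_base_c + 273.15, t_elev_c + 273.15
    return math.exp((ea_ev / BOLTZMANN_EV) * (1.0 / t_b - 1.0 / t_e))

def af_humidity(rh_base, rh_elev, b=5.0):
    """AF_RH = e^(b*(RH_e - RH_b)); b is a placeholder fitted constant."""
    return math.exp(b * (rh_elev - rh_base))

def af_process2(t_base_c, t_elev_c, rh_base, rh_elev):
    """AF_2 = AF_T * AF_RH (Equation 3)."""
    return af_temperature(t_base_c, t_elev_c) * af_humidity(rh_base, rh_elev)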
Now, we can compute the compound average failure rate FR as AF_1 · FR_1b + AF_2 · FR_2b, where FR_1b is the average mechanical failure rate at the baseline temperature, and FR_2b is the average controller/connector failure rate at the baseline relative humidity and temperature. The rationale for this formulation is that the two failure processes proceed in parallel, and a disk's controller/connector would not fail at exactly the same time as its other components; AF_1 estimates the extra failures due to mechanical/SMART issues, and AF_2 estimates the extra failures due to controller/connectivity issues.
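The compound rate is then just the sum of the two independently accelerated baseline rates, as in the sketch below; the baseline rates and acceleration factors used in the example are hypothetical values for illustration, not measurements from the paper.

def compound_failure_rate(af1, af2, fr1_base, fr2_base):
    """FR = AF_1 * FR_1b + AF_2 * FR_2b: the mechanical/SMART and
    controller/connector failure processes proceed in parallel, so their
    accelerated baseline rates are summed."""
    return af1 * fr1_base + af2 * fr2_base

# Hypothetical example: 2% annual mechanical rate, 1% annual controller rate.
print(compound_failure_rate(af1=1.5, af2=3.0, fr1_base=0.02, fr2_base=0.01))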
To account for varying temperature and relative humidity, we also weight the factors based on the duration of those temperatures and relative humidities. Other works have used weighted acceleration factors, e.g. [31]. For simplicity in the monitoring of these environmentals, we can use the average temperature and average relative humidity.
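As one plausible reading of the duration-based weighting (the exact scheme, e.g. as in [31], may differ), the sketch below averages per-interval acceleration factors weighted by the time spent at each temperature/humidity operating point.

def weighted_af(af_values, durations):
    """Duration-weighted acceleration factor: each interval's AF is weighted
    by the fraction of time the disks spent at that (T, RH) operating point."""
    total = sum(durations)
    return sum(af * d for af, d in zip(af_values, durations)) / total

# Hypothetical example: 9 months at AF=1.2 and 3 humid months at AF=3.5.
print(weighted_af([1.2, 3.5], [9, 3]))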