Nirmesh Malviya - MIT Database Group



BIGTABLE
Nirmesh Malviya
nirmesh@csail.mit.edu
April 30, 2012


Content and Figure Credits
● http://research.yahoo.com/files/6DeanGoogle.pdf
● http://research.cs.wisc.edu/areas/os/Seminar/schedules/archive/bigtable.ppt
● http://www-scf.usc.edu/~csci572/2011Spring/presentations/Taheriyan.pptx
4/30/2012 6.830 - Spring 2012


What is Bigtable?
● Distributed storage system.
● Manages structured data via a simple data model.
● Scalable.
● Self-managing.


Why did Google not use a relational database?
● Google has LOTS of data.
● No commercial system big enough.
● Too expensive even if there was one.
● Don’t have end-to-end control.
● Low-level storage optimizations difficult.


Data model: sparse map
● Sparse multi-dimensional map.
● Indexed by key.
● Values are uninterpreted bytes.
● Distributed, persistent and sorted.
● Essentially a column-oriented physical store.
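The map on this slide can be sketched in a few lines of Python. This is a toy, single-machine illustration only: the class and method names are hypothetical, and the timestamp dimension of the key comes from the published Bigtable paper rather than from this slide.

```python
from bisect import insort


class SparseTable:
    """Toy sparse, sorted map: (row, column, timestamp) -> bytes."""

    def __init__(self):
        self._cells = {}  # (row, column, timestamp) -> uninterpreted bytes
        self._keys = []   # the same keys, kept in sorted order

    def put(self, row, column, timestamp, value):
        key = (row, column, timestamp)
        if key not in self._cells:
            insort(self._keys, key)  # keep keys sorted on insert
        self._cells[key] = value

    def get_latest(self, row, column):
        """Return the value with the highest timestamp, or None if absent."""
        versions = [k for k in self._keys if k[0] == row and k[1] == column]
        return self._cells[versions[-1]] if versions else None

    def scan_rows(self, start_row, end_row):
        """Yield (key, value) for start_row <= row < end_row, in key order."""
        for key in self._keys:
            if start_row <= key[0] < end_row:
                yield key, self._cells[key]
```

Only cells that were actually written take up space, which is what "sparse" means here; rows that lack a column simply have no entry for it.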


Data model: example


Data model is not relational
● Writes to a row atomic.
● No multirow transactions.
● No table-wide integrity constraints.


API
● Writes (atomic)
  ● Set(): write cells in a row.
  ● DeleteCells(): delete cells in a row.
  ● DeleteRow(): delete all cells in a row.
● Reads.
● Metadata operations.
  ● Create/delete tables, column families, change metadata.


API Example: Write/Modify
atomic row modification
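The code from the original slide was not recovered, so the following is only a rough Python sketch of what an atomic row modification could look like. The `RowMutation` and `Table` classes are hypothetical stand-ins modeled on the Set(), DeleteCells() and DeleteRow() operations listed on the API slide; a single table-wide lock is assumed for brevity, where a real system would lock per row.

```python
import threading


class RowMutation:
    """Collects changes to one row so they can be applied together."""

    def __init__(self, row):
        self.row = row
        self.ops = []

    def set(self, column, value):
        self.ops.append(("set", column, value))

    def delete_cells(self, column):
        self.ops.append(("delete_cells", column, None))

    def delete_row(self):
        self.ops.append(("delete_row", None, None))


class Table:
    def __init__(self):
        self._rows = {}                # row -> {column: value}
        self._lock = threading.Lock()  # assumption: one lock for the whole table

    def apply(self, mutation):
        """Apply every op in the mutation while holding the lock."""
        with self._lock:
            cells = dict(self._rows.get(mutation.row, {}))  # work on a copy
            for op, column, value in mutation.ops:
                if op == "set":
                    cells[column] = value
                elif op == "delete_cells":
                    cells.pop(column, None)
                elif op == "delete_row":
                    cells.clear()
            # Publish the new row state in one step.
            if cells:
                self._rows[mutation.row] = cells
            else:
                self._rows.pop(mutation.row, None)

    def read_row(self, row):
        with self._lock:
            return dict(self._rows.get(row, {}))
```

Readers either see the row before the mutation or after it, never a half-applied state. Nothing here spans more than one row, matching the "no multirow transactions" point.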


API Example: Read
Return sets can be filtered using regular expressions:
anchor: com.cnn.*
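As a rough illustration of filtering a read by column name, the sketch below uses Python's `re` module on a plain dictionary. The function name `read_row_filtered` and the sample column names are assumptions for illustration, and the pattern is written in Python regex syntax, not the exact syntax shown on the slide.

```python
import re


def read_row_filtered(cells, column_pattern):
    """Return the cells whose column name fully matches the pattern.

    `cells` maps column name -> value for a single row.
    """
    regex = re.compile(column_pattern)
    return {
        column: value
        for column, value in sorted(cells.items())
        if regex.fullmatch(column)
    }


# Example row with three anchor columns and one contents column.
row = {
    "anchor:com.cnn.www": b"CNN",
    "anchor:com.cnn.money": b"CNN Money",
    "anchor:org.c-span.www": b"C-SPAN",
    "contents:": b"<html>...",
}
cnn_anchors = read_row_filtered(row, r"anchor:com\.cnn\..*")
```

Here `cnn_anchors` holds only the two columns under `anchor:com.cnn.`, in sorted column order.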


GFS (now Colossus)
● Large-scale distributed filesystem.
● Master: responsible for metadata.
● Chunk servers: responsible for reading and writing large chunks of data.
● Chunks replicated on 3 machines.


Google File System (GFS)
[Figure: GFS master with master replicas, clients, and chunkservers holding replicated chunks (C0, C1, C3, C4, C5).]
Data transfers happen directly between clients/chunkservers.


SSTable
● Immutable, sorted file of key-value pairs.
● Chunks of data plus an index.
● Index is of block ranges, not values.
[Figure: an SSTable made of 64K blocks plus an index.]
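A minimal sketch of the block-plus-index idea follows, assuming the index stores the last key of each block (one reasonable way to index block ranges; the slide does not say which boundary is stored). Blocks hold a fixed number of pairs here instead of 64K of bytes, and the class name is reused only for illustration.

```python
from bisect import bisect_left


class SSTable:
    """Immutable, sorted key-value pairs split into blocks, plus a block index."""

    def __init__(self, items, block_size=2):
        pairs = sorted(items.items())
        # Split the sorted pairs into fixed-size blocks (tuples are immutable).
        self._blocks = tuple(
            tuple(pairs[i:i + block_size])
            for i in range(0, len(pairs), block_size)
        )
        # Index of block ranges: the last key in each block, not the values.
        self._index = tuple(block[-1][0] for block in self._blocks)

    def get(self, key):
        # Binary search the index for the first block whose last key >= key.
        i = bisect_left(self._index, key)
        if i == len(self._blocks):
            return None  # key is past the end of the table
        # Only this one block needs to be read and scanned.
        for k, v in self._blocks[i]:
            if k == key:
                return v
        return None
```

Because the index covers ranges, a lookup touches the small index and then at most one block, regardless of how many blocks the table has.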


Tablet
● Contains some range of rows of the table.
● Built out of multiple SSTables.
[Figure: a tablet (Start: aardvark, End: apple) built from two SSTables, each made of 64K blocks plus an index.]


Table
● Multiple tablets make up a table.
● SSTables can be shared.
● Tablets do not overlap, SSTables can overlap.
[Figure: two tablets (aardvark to apple, apple_two_E to boat) over four SSTables.]


Locality Groups
● Group column families together into an SSTable.
● Can keep some groups all in memory.
● Can compress locality groups.
● Bloom Filters on locality groups – avoid searching SSTable.
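To show why a Bloom filter lets a read skip an SSTable, here is a small generic Bloom filter in Python. It is a textbook-style sketch, not Bigtable's implementation; the bit count, hash count, and use of SHA-256 are arbitrary choices made for illustration.

```python
import hashlib


class BloomFilter:
    """Probabilistic set: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Derive several bit positions by salting the key with the hash number.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        # False means the key was definitely never added.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(key)
        )
```

A read would first ask `might_contain(row_key)`; only when the answer is true does it need to open the SSTable, so most lookups for absent keys avoid the search entirely.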


Bigtable: Building blocks
● Scheduler.
● GFS.
● Chubby Lock service.
● MapReduce helpful but not required.


Typical Cluster
[Figure: shared services (cluster scheduling master, lock service, GFS master) and Machines 1 to 3. Each machine runs Linux, a scheduler slave, and a GFS chunkserver; user tasks, BigTable servers, and the BigTable master run on top of them.]


Chubby
● {lock/file/name} service.
● Coarse-grained locks, can store small amount of data in a lock.
● 5 replicas, need a majority vote to be active.


Finding a tablet
● Tablets move around from server to server.
● Given a row, how do clients find the right machine?
● Tablet property – startrowindex and endrowindex.
● Instead: store special tables containing tablet location info in Bigtable cell itself.


Finding a tablet
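The lookup described on the "Finding a tablet" slides can be sketched as a binary search over tablet boundaries. This assumes location entries are keyed by each tablet's end row, as the Bigtable paper describes for its METADATA table; the `TabletLocator` class and the server names are hypothetical, and the multi-level hierarchy from the lost figure is not modeled.

```python
from bisect import bisect_left


class TabletLocator:
    """Maps a row key to the server holding the tablet that covers it."""

    def __init__(self, tablets):
        # tablets: iterable of (end_row, server). A tablet covers every row
        # after the previous tablet's end_row, up to and including its own.
        self._tablets = sorted(tablets)
        self._end_rows = [end_row for end_row, _ in self._tablets]

    def locate(self, row):
        # First tablet whose end row is >= the requested row.
        i = bisect_left(self._end_rows, row)
        if i == len(self._tablets):
            raise KeyError(f"no tablet covers row {row!r}")
        return self._tablets[i][1]


locator = TabletLocator([
    ("apple", "server-1"),   # rows up to and including "apple"
    ("boat", "server-2"),    # rows after "apple" up to "boat"
    ("\uffff", "server-3"),  # everything after "boat"
])
```

Because tablets do not overlap and are sorted, one binary search is enough; when a tablet moves, only its entry in the location table has to change.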


Tablet Server
● Manages tablets, multiple tablets per server.
  ● Each tablet is 100-200MB.
  ● Each tablet lives on only one server.
● Tablet server splits tablets that get too big.


Tablet Server startup
● On startup, creates and acquires an exclusive lock on a uniquely named file in a Chubby directory.
● Tablet server stops serving its tablets if it loses its exclusive lock.


Bigtable Master
● Responsible for load balancing and fault tolerance.
● Use Chubby to monitor health of tablet servers, restart failed servers.
  ● If Chubby session expires, master kills itself.
● Preferably start tablet server on same machine that the data is already at.


Master Startup
● Grabs unique master lock in Chubby.
● Prevents multi-instantiations.
● Scans directory in Chubby for live servers, communicates with every live tablet server.
● Scans METADATA table to learn the set of tablets.


Bigtable Master
● Master monitors Chubby directory to discover tablet servers.
● Master is responsible for finding when a tablet server is no longer serving its tablets.
  ● Detects this by periodically checking the status of the lock of each tablet server.
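The lock-based liveness check can be illustrated with an in-memory stand-in for the lock service. Everything here is hypothetical: `LockService` is not Chubby's API, the `servers/` file naming is invented for the example, and session expiry is simulated by an explicit `release` call.

```python
class LockService:
    """In-memory stand-in for a Chubby-like directory of exclusive lock files."""

    def __init__(self):
        self._holders = {}  # file name -> id of the current lock holder

    def acquire(self, name, holder):
        """Take an exclusive lock; fails if someone else already holds it."""
        if name in self._holders:
            return False
        self._holders[name] = holder
        return True

    def release(self, name, holder):
        # Also stands in for a lock lost through session expiry.
        if self._holders.get(name) == holder:
            del self._holders[name]

    def holder(self, name):
        return self._holders.get(name)

    def list_files(self):
        return sorted(self._holders)


def find_dead_servers(locks, known_servers):
    """Master-side check: servers that no longer hold their own lock file."""
    return [
        server for server in known_servers
        if locks.holder(f"servers/{server}") != server
    ]
```

In this sketch the master would call `find_dead_servers` periodically and reassign the tablets of any server it returns; a tablet server that notices it has lost its lock would stop serving on its own.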


Writing to a table
● Mutations are logged, then applied to an in-memory version.
● Logfile stored in GFS.
[Figure: a tablet (rows apple_two_E to boat) with a log of Insert/Delete mutations, a memtable, and two SSTables.]
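The log-then-apply order can be sketched as follows. This is a simplified illustration with hypothetical names: a Python list stands in for the log file in GFS, a dictionary stands in for the memtable, and a delete is recorded as a `None` marker instead of removing the key.

```python
class Tablet:
    """Toy write path: append to the log first, then update the memtable."""

    def __init__(self, log=None):
        self.log = [] if log is None else list(log)  # stand-in for the GFS log
        self.memtable = {}                           # in-memory version
        # Recovery: rebuild the memtable by replaying logged mutations.
        for record in self.log:
            self._apply(record)

    def _apply(self, record):
        op, key, value = record
        if op == "insert":
            self.memtable[key] = value
        elif op == "delete":
            self.memtable[key] = None  # deletion marker, not physical removal
        else:
            raise ValueError(f"unknown operation: {op!r}")

    def write(self, op, key, value=None):
        record = (op, key, value)
        self.log.append(record)  # 1. make the mutation durable
        self._apply(record)      # 2. apply it to the in-memory version
```

Logging before applying means a restarted server can rebuild the same memtable from the log, as the constructor does here.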


Tablet Compactions
● Minor compaction.
  ● Reduce memory usage.
  ● Reduce log traffic on restart.
● Merging compaction.
● Major compaction.
  ● No deletion records, only live data.
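The three compaction types can be contrasted in a short sketch. It assumes a memtable is a dictionary, an SSTable is a sorted tuple of (key, value) pairs, and a deletion record is a `None` value; these representations and the function names are illustrative choices, not Bigtable's own.

```python
TOMBSTONE = None  # assumption: a deleted key is stored with this marker


def minor_compaction(memtable):
    """Freeze the memtable into an immutable, sorted SSTable."""
    return tuple(sorted(memtable.items()))


def _merge(sstables, drop_deletes):
    """Merge SSTables given oldest to newest; newer entries win."""
    merged = {}
    for table in sstables:
        merged.update(table)  # later (newer) tables overwrite older values
    items = sorted(merged.items())
    if drop_deletes:
        items = [(k, v) for k, v in items if v is not TOMBSTONE]
    return tuple(items)


def merging_compaction(sstables):
    """Combine some SSTables; deletion records must be kept, because
    SSTables left out of the merge may still hold the deleted data."""
    return _merge(sstables, drop_deletes=False)


def major_compaction(all_sstables):
    """Combine every SSTable into one with only live data."""
    return _merge(all_sstables, drop_deletes=True)
```

After a minor compaction the memtable can be emptied and the log entries it covered no longer need replaying, which is where the memory and restart savings come from.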


BigTable System Architecture
[Figure: a Bigtable cell. Components and their labels:]
● Bigtable client, using the Bigtable client library (arrows labeled Metadata ops, Read/write, Open()).
● Bigtable master: performs metadata ops, load balancing.
● Bigtable tablet servers (three shown): serve data.
● Cluster scheduling master: handles failover, monitoring.
● GFS: holds tablet data, logs.
● Lock service: holds metadata, handles master-election.


Bigtable in the real world
● Bigtable is closed-source and owned by Google.
● Apache HBase is an open-source implementation.
● Famous user: Facebook messaging platform.
