14.11.2014 Views

PLMCE2014-Innodb-Architecture-And-Performance-Optimization

PLMCE2014-Innodb-Architecture-And-Performance-Optimization

PLMCE2014-Innodb-Architecture-And-Performance-Optimization

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

InnoDB <strong>Architecture</strong>, and <br />

<strong>Performance</strong> Op7miza7on <br />

Peter Zaitsev, Ovais Tariq <br />

Percona Live, Santa Clara,CA <br />

April 1,2014 <br />

1 <br />

www.percona.com


<strong>Architecture</strong> and <strong>Performance</strong> <br />

• Advanced <strong>Performance</strong> Op?miza?on requires <br />

transparency/X-­‐ray Vision <br />

• Impossible without understanding system <br />

architecture <br />

• Focus on Conceptual Aspects <br />

– Exact Checksum algorithm InnoDB uses is not <br />

important <br />

– What maSers <br />

• How fast is that algorithm ? <br />

• How checksums are checked/updated <br />

2 <br />

www.percona.com


Aspects of <strong>Architecture</strong> <br />

• General <strong>Architecture</strong> <br />

• Storage Files Layout <br />

• Threads <strong>Architecture</strong> <br />

• Indexes <br />

• Buffer Pool <br />

• Page Cleaner Thread <br />

• Adap?ve Hash Index <br />

• Change Buffer <br />

3 <br />

www.percona.com


Aspects of <strong>Architecture</strong> 2 <br />

• Doublewrite Buffer <br />

• Redo Logging <br />

• Recovery <br />

• Mul? Versioning <br />

• Locking and Latching <br />

• BLOB Storage <br />

• Compression <br />

• Miscellaneous <br />

4 <br />

www.percona.com


InnoDB Versions <br />

• Below MySQL 5.5 <br />

– Ancient History. Upgrade Recommended <br />

• MySQL 5.5 <br />

– Scalability further improved <br />

• MySQL 5.6 (GA) <br />

– Further improvements in Scalability <br />

– Full Text Search, fast checksums etc. <br />

– Focus of today <br />

• MySQL 5.7 <br />

– Future <br />

5 <br />

www.percona.com


XtraDB <br />

• Follows MySQL/InnoDB Versions <br />

• Included in Percona Server and MariaDB <br />

– No more available as separate plugin <br />

• Includes all InnoDB features and <br />

improvements plus many more <br />

– hSp://goo.gl/bnq5sI <br />

• Percona Server 5.6 is latest GA version <br />

6 <br />

www.percona.com


Overview of General InnoDB <strong>Architecture</strong> <br />

GENERAL ARCHITECTURE <br />

7 <br />

www.percona.com


General <strong>Architecture</strong> <br />

• Tradi?onal OLTP Engine <br />

– “Emulates Oracle <strong>Architecture</strong>” <br />

• Implemented using MySQL Storage engine API <br />

• Row Based Storage. Row Locking. MVCC <br />

• Data Stored in Tablespaces <br />

• Log of changes stored in circular log files <br />

• Data, Indexes Pages cached in “Buffer Pool” <br />

8 <br />

www.percona.com


Physical Structure of InnoDB Tablespace and Logs <br />

STORAGE FILES LAYOUT <br />

9 <br />

www.percona.com


InnoDB Tablespaces <br />

• All data stored in Tablespaces <br />

– Changes to these databases stored in Circular Logs <br />

– Changes has to be reflected in tablespace before log <br />

record is overwriSen <br />

• Single tablespace or mul?ple tablespaces <br />

– innodb_file_per_table=1 <br />

– Mul?ple tablespace support is enabled by default <br />

• System informa?on is always in main tablespace <br />

– Main tablespace can consist of many files <br />

• They are concatenated <br />

10 <br />

www.percona.com


Tablespace Format <br />

• Collec?on of Segments <br />

– Segment is like a “file” <br />

• Segment is number of extents <br />

– Typically 64 of 16K page sizes <br />

– Smaller extents for very small objects <br />

• First Tablespace page contains header <br />

– Tablespace size <br />

– Tablespace id <br />

11 <br />

www.percona.com


Types of Segments <br />

• Each table is Set of Indexes <br />

– InnoDB has “index organized tables” <br />

• Each index has <br />

– Leaf node segment <br />

– Non Leaf node segment <br />

• Special Segments <br />

– Rollback Segment(s) <br />

– Insert buffer, etc. <br />

12 <br />

www.percona.com


InnoDB On-­‐Disk Layout <br />

13 <br />

www.percona.com


InnoDB Space Alloca7on <br />

• Small Segments (less than 32 pages) – by page <br />

• Large Segments <br />

– Extent at the ?me (to avoid fragmenta?on) <br />

• Free pages recycled within same segment <br />

• All pages in extent must be free before it is used <br />

in different segment of same tablespace <br />

– innodb_file_per_table=1 -­‐ free space can be used by <br />

same table only <br />

• InnoDB never shrinks its tablespaces <br />

14 <br />

www.percona.com


InnoDB Page Structure <br />

15 <br />

www.percona.com


Storage Tuning Parameters <br />

• innodb_file_per_table <br />

– Store each table in its own file/tablespace <br />

• innodb_autoextend_increment <br />

– Extend all tablespaces in this increment <br />

– Extension of tablespaces can happen concurrently <br />

• innodb_page_size <br />

– May be set to 4K, 8K or 16K <br />

– Must be set before the database is created <br />

16 <br />

www.percona.com


Using File per Table <br />

• Typically more convenient <br />

• Reclaim space from dropped table <br />

• OPTIMIZE TABLE mytable; <br />

– reduce file size aler data was deleted <br />

• Store different tables/databases on different drives <br />

• Backup/Restore tables one by one <br />

• Support for compression <br />

• Will use more space with many tables <br />

• Longer unclean restart ?me with many tables <br />

17 <br />

www.percona.com


<strong>Performance</strong> and InnoDB File Per <br />

Table <br />

• <strong>Performance</strong> is Similar in majority of cases <br />

• Very large number of tables is a problem <br />

– Can use a mix of tables in their own tablespaces <br />

and tables in shared tablespace <br />

• Can help with i-­‐node level locking on some <br />

filesystems <br />

– EXT3 <br />

18 <br />

www.percona.com


Drop Table with <br />

innodb_file_per_table <br />

• Dropping tablespace can get expensive <br />

– Even empty one! <br />

• DROP TABLE; TRUNCATE TABLE; some ALTER <br />

• Nega?ve Scalability <br />

– More expensive with larger buffer pool size <br />

• Significant improvements in MySQL 5.6 <br />

• More work in MySQL 5.7 <br />

– Especially for temporary tables <br />

19 <br />

www.percona.com


Dealing with Runaway tablespace <br />

• Main Tablespace does not shrink <br />

– Consider sepng max size <br />

–<br />

innodb_data_file_path=ibdata1:10M:autoextend:<br />

max:10G <br />

• Dump and Restore <br />

• Use Transportable Tablespaces feature <br />

20 <br />

www.percona.com


Transportable Tablespaces <br />

• Copy tables from one loca?on to another <br />

• FLUSH TABLES FOR EXPORT <br />

• Copy .frm, .ibd, .cfg file to a backup loca?on <br />

• UNLOCK TABLES <br />

• Import the tablespace at a different loca?on <br />

– But make sure that the table defini?on is the same on <br />

the second loca?on <br />

• Or use Percona XtraBackup hSp://goo.gl/MVGfIS <br />

21 <br />

www.percona.com


Separate Undo Tablespace <br />

• Store undo tablespace in separate set of files <br />

– innodb_undo_directory <br />

• Path where external undo tablespace are stored <br />

• Should be SSD backed for best performance <br />

– innodb_undo_tablespaces <br />

• How many undo tablespaces to create <br />

• Must be set at database crea?on ?me and cannot be <br />

changed <br />

• The tablespace size grows and never shrinks <br />

• Beware of running long running transac?ons <br />

22 <br />

www.percona.com


Separate Undo Tablespace (cont.) <br />

• Store undo tablespace in separate set of files <br />

– innodb_undo_logs <br />

• earlier innodb_rollback_segments <br />

23 <br />

www.percona.com


Page Checksums <br />

• Protec?on from corrupted data <br />

– Bad hardware, OS Bugs, InnoDB Bugs <br />

– Are not completely replaced by filesystem checksums <br />

• Checked when page is Read to Buffer Pool <br />

• Updated when page is flushed to disk <br />

• Can be significant overhead <br />

– Especially for very fast storage <br />

• Can be disabled by innodb_checksums=0 <br />

24 <br />

www.percona.com


Page Checksums (cont.) <br />

• For Fast Storage you might use faster <br />

checksums <br />

– innodb_checksum_algorithm=crc32 <br />

25 <br />

www.percona.com


What threads are there and what they do <br />

THREADS ARCHITECTURE <br />

26 <br />

www.percona.com


General Thread <strong>Architecture</strong> <br />

• Using MySQL Threads for execu?on <br />

– Normally thread per connec?on <br />

• Transac?on executed mainly by such thread <br />

– LiSle benefit from Mul?-­‐Core for single query <br />

• innodb_thread_concurrency can be used to limit <br />

number of execu?ng threads <br />

– This limit is number of threads in kernel <br />

• Including threads doing Disk IO or storing data in TMP Table. <br />

– Limits concurrency but reduces conten?on <br />

27 <br />

www.percona.com


General Thread <strong>Architecture</strong> (cont.) <br />

• innodb_thread_sleep_delay can be used to <br />

control thread queuing <br />

– This limit controls for how long threads sleep <br />

before joining the queue of wai?ng threads <br />

– Only used if innodb_thread_concurrency > 0 <br />

– innodb_adap7ve_max_sleep_delay <br />

• Makes thread sleep delay more adap?ve <br />

• InnoDB can adap?vely tune <br />

innodb_thread_sleep_delay, based on this <br />

28 <br />

www.percona.com


Helper Threads <br />

• Main Thread <br />

– Schedules background ac?vi?es <br />

• Change buffer merges, possible purge, etc. <br />

• IO Threads <br />

– Read -­‐ mul?ple threads used for read ahead <br />

• innodb_read_io_threads <br />

– Write -­‐ mul?ple threads used for background writes <br />

• innodb_write_io_threads <br />

– Insert Buffer thread used for change buffer merges <br />

29 <br />

www.percona.com


Helper Threads (cont.) <br />

• Page Cleaner Thread <br />

• Purge thread(s) <br />

• Deadlock detec?on thread <br />

• <strong>And</strong> others <br />

30 <br />

www.percona.com


ThreadPool <br />

• Helpful for handling very large number of <br />

connec?ons <br />

– 10K+ connec?ons are possible <br />

• Available in MySQL Enterprise, Percona Server, <br />

MariaDB <br />

• Implementa?ons are a bit different <br />

31 <br />

www.percona.com


How Indexes are Implemented in InnoDB <br />

INDEXES <br />

32 <br />

www.percona.com


Everything is the Index <br />

• InnoDB tables are “Index Organized” <br />

– PRIMARY KEY contains data instead of data pointer <br />

• Hidden PRIMARY KEY is used if not defined (6b) <br />

– Hidden PRIMARY KEY is a global conten?on point <br />

• Data is “Clustered” by PRIMARY KEY <br />

– Data with close PK value is stored close to each other <br />

– Clustering is within page ONLY <br />

• InnoDB system tables INNODB_SYS_TABLES and <br />

INNODB_SYS_INDEXES expose informa?on <br />

33 <br />

www.percona.com


INNODB_SYS_TABLES Example <br />

• Gives good overview informa?on <br />

mysql (information_schema) > select * from INNODB_SYS_TABLES limit 10;<br />

+----------+----------------------------+------+--------+-------+-------------+------------+---------------+<br />

| TABLE_ID | NAME | FLAG | N_COLS | SPACE | FILE_FORMAT | ROW_FORMAT | ZIP_PAGE_SIZE |<br />

+----------+----------------------------+------+--------+-------+-------------+------------+---------------+<br />

| 14 | SYS_DATAFILES | 0 | 5 | 0 | Antelope | Redundant | 0 |<br />

| 11 | SYS_FOREIGN | 0 | 7 | 0 | Antelope | Redundant | 0 |<br />

| 12 | SYS_FOREIGN_COLS | 0 | 7 | 0 | Antelope | Redundant | 0 |<br />

| 13 | SYS_TABLESPACES | 0 | 6 | 0 | Antelope | Redundant | 0 |<br />

| 16 | mysql/innodb_index_stats | 1 | 11 | 2 | Antelope | Compact | 0 |<br />

| 15 | mysql/innodb_table_stats | 1 | 9 | 1 | Antelope | Compact | 0 |<br />

| 18 | mysql/slave_master_info | 1 | 26 | 4 | Antelope | Compact | 0 |<br />

| 17 | mysql/slave_relay_log_info | 1 | 11 | 3 | Antelope | Compact | 0 |<br />

| 19 | mysql/slave_worker_info | 1 | 15 | 5 | Antelope | Compact | 0 |<br />

+----------+----------------------------+------+--------+-------+-------------+------------+---------------+<br />

9 rows in set (0.00 sec)<br />

34 <br />

www.percona.com


INNODB_SYS_INDEXES Example <br />

• Gives good overview informa?on <br />

mysql (information_schema) > select * from INNODB_SYS_INDEXES where table_id in (15,<br />

16);<br />

+----------+---------+----------+------+----------+---------+-------+<br />

| INDEX_ID | NAME | TABLE_ID | TYPE | N_FIELDS | PAGE_NO | SPACE |<br />

+----------+---------+----------+------+----------+---------+-------+<br />

| 17 | PRIMARY | 15 | 3 | 2 | 3 | 1 |<br />

| 18 | PRIMARY | 16 | 3 | 4 | 3 | 2 |<br />

+----------+---------+----------+------+----------+---------+-------+<br />

2 rows in set (0.00 sec)<br />

35 <br />

www.percona.com


Index Structure <br />

• Secondary Indexes refer to rows by Primary Key <br />

– No update when row is moved to different page <br />

• Long Primary Keys are expensive <br />

– Increase size of all indexes <br />

• Random Primary Key Inserts are expensive <br />

– Cause page splits; Fragmenta?on <br />

– Make page space u?liza?on low <br />

• Auto-­‐Increment keys are olen beSer than <br />

ar?ficial keys, UUIDs, SHA1 etc. <br />

36 <br />

www.percona.com


More on Clustered Index <br />

• PRIMARY KEY lookups are the most efficient <br />

– Secondary key lookup is essen?ally 2 key lookups <br />

• Op?mized with Adap?ve Hash Index <br />

• PRIMARY KEY ranges are very efficient <br />

– Build Schema keeping it in mind <br />

– (user_id,message_id) may be beSer than (message_id) <br />

• Changing PRIMARY KEY is expensive <br />

– Effec?vely removing row and adding new one <br />

– Effects all secondary indexes <br />

• Sequen?al Inserts give compact, least fragmented <br />

storage <br />

37 <br />

www.percona.com


More on Indexes <br />

• There is no Prefix Index compressions <br />

– Index can be 10x larger than for MyISAM table <br />

– InnoDB has page compression. Not the same thing. <br />

• Indexes contain transac?on informa?on = fat <br />

– Allow to see row visibility = index covering queries <br />

• Fast Index Crea?on <br />

– Secondary indexes built by sor?ng <br />

– Good page fill factor and no fragmenta?on <br />

– innodb_sort_buffer_size <br />

38 <br />

www.percona.com


Fragmenta7on <br />

• Inter-­‐row fragmenta?on <br />

– The row itself is fragmented <br />

– Happens in MyISAM but NOT in InnoDB <br />

• Intra-­‐row fragmenta?on <br />

– Sequen?al scan of rows is not sequen?al <br />

– Happens in InnoDB, outside of page boundary <br />

• Empty Space Fragmenta?on <br />

– A lot of empty space can be lel between rows <br />

39 <br />

www.percona.com


Fragmenta7on (cont.) <br />

• Defragmenta?on can be done by OPTIMIZE TABLE <br />

– OPTIMIZE TABLE mytable; <br />

• Cannot be done online in MySQL <br />

• Can be done online using pt-­‐online-­‐schema-­‐change <br />

hSp://goo.gl/pKBV2R <br />

pt-online-schema-change --alter<br />

"ENGINE=InnoDB" D=sakila,t=actor<br />

40 <br />

www.percona.com


Structure and working of InnoDB in-­‐memory storage area <br />

BUFFER POOL <br />

41 <br />

www.percona.com


What is the Buffer Pool? <br />

• In-­‐memory storage area that caches data and <br />

indexes <br />

• The unit of data storage is pages <br />

• All data access requests first go to the buffer pool <br />

• Data not found is loaded from disk and copied to <br />

it <br />

• In-­‐memory data access is very fast as compared <br />

to disk IO <br />

• Data modifica?ons are also cached and then <br />

synced to disk in the background <br />

42 <br />

www.percona.com


Important Structures <br />

• LRU List <br />

– List of pages in the buffer pool ordered by last access <br />

?me <br />

– Uses a modified LRU algorithm <br />

– Actually maintains two lists <br />

• List of young block <br />

• List of old blocks <br />

– Addi?onal list when dealing with compressed tables <br />

• unzip_LRU list -­‐ maintains a list of uncompressed pages for <br />

compressed tables <br />

43 <br />

www.percona.com


Important Structures (cont.) <br />

• Flush List <br />

– List of dirty pages in the buffer pool <br />

– Dirty pages are pages that contain modified rows <br />

and that need to be synced to data files on disk <br />

– Pages are moved from flush list to LRU list when <br />

changes have been synced to disk <br />

– Flush list size is capped by the redo log size (among <br />

other limits) <br />

44 <br />

www.percona.com


Configuring the Buffer Pool <br />

• innodb_buffer_pool_size is most important <br />

op?on <br />

– Use all your memory nor commiSed to anything else <br />

– Keep overhead into account (~5%) <br />

– Never let Buffer Pool Swapping to happen <br />

– Up to 80-­‐90% of memory on InnoDB only Systems <br />

– innodb_buffer_pool_instances=N <br />

• Set to 4-­‐16 depending number of cores to get beSer <br />

scalability <br />

• Default is 8 instances <br />

• Percona Server does well with few instances <br />

45 <br />

www.percona.com


Buffer Pool Page Replacement <br />

• InnoDB uses LRU for page replacement <br />

– With Midpoint Inser?on <br />

• innodb_old_blocks_pct, innodb_old_blocks_7me <br />

– Offers Scan resistance from large full table scans <br />

– innodb_old_blocks_7me defaults to 1000ms now <br />

meaning pages read in by scans age out faster <br />

• Page Cleaner thread maintains clean pages in free <br />

list <br />

• Evic?on is done from the list of old blocks <br />

46 <br />

www.percona.com


Buffer Pool Page Replacement (cont.) <br />

• Compressed / uncompressed pages evic?on <br />

– If size of unzip_LRU list


Buffer Pool Page Flushing <br />

• Different flushing flavors <br />

– Background flushing <br />

• Adap?ve, async, innodb_max_dirty_pages_pct, LRU <br />

and idle flushing <br />

– Foreground Sync flushing <br />

• Done by query thread that detects the condi?on <br />

• All other query threads are halted <br />

48 <br />

www.percona.com


Neighbor Page Flushing <br />

• InnoDB looks for next and previous dirty pages <br />

and flushes them as well to keep IO more <br />

bulky <br />

• Controlled by innodb_flush_neighbors <br />

• This is a good op?miza?on for HDD <br />

• Such behavior may be a poor choice <br />

– Especially for SSD which does not have random IO <br />

penalty <br />

– Disable it with innodb_flush_neighbors=0 <br />

49 <br />

www.percona.com


Buffer Pool Dump and Restore <br />

• Ability to dump Buffer Pool contents allowing <br />

for warm cache on restarts <br />

• Automa?cally dump and reload on shutdown <br />

and restart <br />

– innodb_buffer_pool_dump_at_shutdown <br />

– innodb_buffer_pool_load_at_startup <br />

• On-­‐demand dump and reload <br />

– innodb_buffer_pool_dump_now <br />

– innodb_buffer_pool_load_now <br />

50 <br />

www.percona.com


What’s in the Buffer Pool? <br />

• Informa?on about Buffer Pool pages and <br />

sta?s?cs is available in I_S tables <br />

– INNODB_BUFFER_PAGE_LRU <br />

– INNODB_BUFFER_PAGE <br />

– INNODB_BUFFER_POOL_STATS <br />

mysql > select POOL_ID, POOL_SIZE, FREE_BUFFERS, DATABASE_PAGES, OLD_DATABASE_PAGES, MODIFIED_DATABASE_PAGES from<br />

INNODB_BUFFER_POOL_STATS;<br />

+---------+-----------+--------------+----------------+--------------------+-------------------------+<br />

| POOL_ID | POOL_SIZE | FREE_BUFFERS | DATABASE_PAGES | OLD_DATABASE_PAGES | MODIFIED_DATABASE_PAGES |<br />

+---------+-----------+--------------+----------------+--------------------+-------------------------+<br />

| 0 | 8191 | 8043 | 148 | 0 | 0 |<br />

+---------+-----------+--------------+----------------+--------------------+-------------------------+<br />

1 row in set (0.00 sec)<br />

51 <br />

www.percona.com


What’s in the Buffer Pool? (cont.) <br />

• InnoDB Status also gives high level picture of <br />

the buffer pool <br />

– Percona Server exposes more informa?on <br />

– SHOW ENGINE INNODB STATUS\G <br />

----------------------<br />

BUFFER POOL AND MEMORY<br />

----------------------<br />

Total memory allocated 137363456; in additional pool allocated 0<br />

Dictionary memory allocated 43128<br />

Buffer pool size 8191<br />

Free buffers 8043<br />

Database pages 148<br />

Old database pages 0<br />

Modified db pages 0<br />

Pending reads 0<br />

Pending writes: LRU 0, flush list 0, single page 0<br />

Pages made young 0, not young 0<br />

0.00 youngs/s, 0.00 non-youngs/s<br />

Pages read 148, created 0, written 1<br />

0.00 reads/s, 0.00 creates/s, 0.00 writes/s<br />

No buffer pool page gets since the last printout<br />

Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s<br />

LRU len: 148, unzip_LRU len: 0<br />

I/O sum[0]:cur[0], unzip sum[0]:cur[0]<br />

52 <br />

www.percona.com


Data Dic7onary <br />

• Holds informa?on about InnoDB Tables <br />

– Sta?s?cs; Auto-­‐increment value, System informa?on <br />

– Can be 4KB to 10KB per table <br />

• Can consume a lot of memory with huge number <br />

of tables <br />

– Think hundreds of thousands <br />

• Data dic?onary LRU <br />

– Controlled automa?cally by table_defini7on_cache <br />

• If you have many tables increasing it is important <br />

53 <br />

www.percona.com


Buffer Pool Improvements in Percona <br />

Server <br />

• Lot of work in Percona Server to improve <br />

buffer pool scalability <br />

• Different mutex for different jobs reduce <br />

global conten?on <br />

– LRU list mutex <br />

– Flush list mutex <br />

– Free list mutex <br />

– and others .. hSp://goo.gl/Ln1Z9a <br />

54 <br />

www.percona.com


How does it func?on and what work does it do? <br />

PAGE CLEANER THREAD <br />

55 <br />

www.percona.com


Page Cleaner Thread <br />

• Handles all types of background flushing <br />

– Flushes pages from end of LRU list <br />

– Flushes pages from flush list <br />

• Wakes up once per second <br />

– Skips sleeping if server is idle <br />

• Server will do merges and flushes faster when it is <br />

idle <br />

• In some cases flushing can be done from user <br />

threads <br />

– Percona Server allows you to disable this <br />

56 <br />

www.percona.com


LRU List Flushing <br />

• Can happen in workloads with data sets larger <br />

than memory <br />

• Flushes dirty pages from the end of the LRU <br />

list <br />

• Moves clean pages at the tail of the LRU to the <br />

free list <br />

• Page cleaner does LRU flushing in all cases <br />

with some excep?ons <br />

57 <br />

www.percona.com


LRU List Flushing (cont.) <br />

• innodb_lru_scan_depth=N <br />

– How deeply to examine tail for dirty pages <br />

– User thread may scan up to this depth as well if no <br />

page available in free list <br />

– Important to tune to prevent synchronous flushes <br />

58 <br />

www.percona.com


Synchronous LRU List Flushing <br />

• Synchronous flush from user thread if it cannot <br />

find a clean page in the free list <br />

• Flushes a single page from user thread in case <br />

of synchronous flush <br />

• Typically when page cleaner thread is not able <br />

to keep up with page dirtying rate <br />

59 <br />

www.percona.com


Flushing Overview <br />

• Hard Problem to Balance <br />

– Flush too much now <br />

• Compete with IO which must happen <br />

• Loose possibility of op?miza?on <br />

– Delay too much <br />

• Might have to do too much IO in the future <br />

• Poten?ally cause “stalls” or performance dips <br />

• Lots of con?nuing work <br />

• Too Many tuning op?ons to play with <br />

60 <br />

www.percona.com


Flush List Flushing / Adap7ve Flushing <br />

• Flushing to advance “earliest modify LSN” <br />

– To free log space so it can be reduced <br />

• Mostly done from Page Cleaner from the end <br />

of the flush list <br />

– Approximately done once per second <br />

• Most of writes typically happen this way <br />

• Number of pages to flush per cycle depends on <br />

the load <br />

61 <br />

www.percona.com


Some Variables <br />

• innodb_io_capacity <br />

• innodb_io_capacity_max <br />

• innodb_max_dirty_pages_pct <br />

• innodb_max_dirty_pages_pct_lwm <br />

• innodb_flushing_avg_loops <br />

62 <br />

www.percona.com


Synchronous Flush List Flushing <br />

• Synchronous flushing outside of Page Cleaner <br />

in cases when sync point is reached <br />

• Synchronous flushing is done from user thread <br />

which detects the condi?on <br />

• Every other user thread is blocked to prevent <br />

increase in checkpoint age <br />

• Causes system-­‐wide stalls <br />

63 <br />

www.percona.com


Percona Server Improvements <br />

• Focus on BeSer default behavior with no <br />

tuning needed <br />

– Increase Priority of Page Cleaner Thread <br />

– Reduce flushing from user level threads <br />

– Make algorithm more adap?ve to changes in <br />

workload <br />

– BeSer balancing on processing mul?ple buffer <br />

pools <br />

64 <br />

www.percona.com


Some <strong>Performance</strong> Gains <br />

65 <br />

www.percona.com


What does this structure do? <br />

ADAPTIVE HASH INDEX <br />

66 <br />

www.percona.com


What is Adap7ve Hash Index? <br />

• On-­‐demand hash index built automa?cally by <br />

InnoDB <br />

• Built on top of exis?ng B+TREE Indexes to <br />

speed up lookups <br />

– Both PRIMARY and Secondary indexes <br />

• Can be built for full index and prefixes <br />

• Par?al Index <br />

– Only built for index values which are accessed <br />

olen <br />

67 <br />

www.percona.com


What is Adap7ve Hash Index? (cont.) <br />

• Adap?ve Hash Index lookup <br />

Search the<br />

adaptive hash<br />

extension_number<br />

1<br />

4<br />

2 6<br />

8<br />

Find a pointer directly to<br />

12 the leaf node of the<br />

clustered index.<br />

10 14<br />

3 5 7 9 11 13 15<br />

4 0xACDC<br />

12 0xACDC<br />

12 0xACDC<br />

2 0xACDC<br />

6 0xACDC 10 0xACDC 14 0xACDC<br />

1 ..<br />

3 .. 5 .. 7 .. 9 ..<br />

11 ..<br />

13 ..<br />

15 ..<br />

68 <br />

www.percona.com


Tuning Adap7ve Hash Index <br />

• No tuning op?on available <br />

– Can either be enabled or disabled <br />

– Enabled by default <br />

• Can be disabled for performance reasons <br />

– innodb_adap7ve_hash_index <br />

– Improves concurrency but reduces performance <br />

• Can be Par??oned in Percona Server <br />

– innodb_adap7ve_hash_index_par77ons=8 <br />

69 <br />

www.percona.com


What is change buffering in InnoDB? <br />

CHANGE BUFFER <br />

70 <br />

www.percona.com


What is Change Buffer? <br />

• Buffer INSERT, UPDATE and PURGE opera?ons <br />

– DELETE is covered as a special UPDATE <br />

• Designed to speed up modifica?ons into <br />

secondary Indexes when index page not in buffer <br />

pool <br />

– Secondary index updates are expensive <br />

– Reported up to 15 ?mes IO reduc?on for some cases <br />

• Works for Non-­‐Unique Secondary Indexes only <br />

• If leaf index page is not in buffer pool <br />

– Store a note the page should be updated in memory <br />

71 <br />

www.percona.com


What is Change Buffer ? (cont.) <br />

• If page containing buffered entries is read <br />

from disk they are merged transparently <br />

• InnoDB performs gradual change buffer <br />

merges in background <br />

– Change buffer merges may con?nue across <br />

restarts <br />

– InnoDB slow shutdown forces complete change <br />

buffer merge on shutdown <br />

• innodb_fast_shutdown=0 <br />

72 <br />

www.percona.com


Tuning Change Buffer <br />

• Can take up to ¼ the size of the buffer pool <br />

– Size of the change buffer can be controlled via <br />

innodb_change_buffer_max_size <br />

• Background merge speed may not be enough <br />

– Tune innodb_io_capacity <br />

• You can disable change buffering <br />

– innodb_change_buffering=none <br />

– Can be good for SSDs <br />

– Can also be good if dataset fits in memory <br />

73 <br />

www.percona.com


Change Buffering Overhead <br />

• Full change buffer is useless and wastes <br />

memory <br />

• Delayed change buffering may cause <br />

slowdown <br />

– Too many merges need to happen on page read <br />

• Aler restart merge speed may slow down <br />

– Finding index entries to merge requires random IO <br />

74 <br />

www.percona.com


How is this structure used in InnoDB <br />

DOUBLEWRITE BUFFER <br />

75 <br />

www.percona.com


What is Doublewrite Buffer? <br />

• InnoDB redo logs requires consistent pages for <br />

recovery <br />

• Page write may only be par?ally completed <br />

– Upda?ng part of 16K and leaving the rest <br />

• Doublewrite Buffer is short term page level log <br />

• All page writes go to doublewrite buffer first: <br />

– Write pages to doublewrite buffer; Sync <br />

– Write Pages to their original loca?ons; Sync <br />

– Pages contain tablespace_id+page_id <br />

76 <br />

www.percona.com


How DoubleWrite helps recovery <br />

• On crash recovery pages in this buffer are <br />

compared to their original loca?on <br />

– At least one of them should be conistent <br />

77 <br />

www.percona.com


What is Doublewrite Buffer? (cont.) <br />

• Doublewrite Buffer usage: <br />

Sync to double<br />

write area.<br />

Buffer<br />

Buffer Pool<br />

Sync individual<br />

pages to correct<br />

destinations.<br />

Tablespace<br />

78 <br />

www.percona.com


Tuning Doublewrite Buffer <br />

• No par?cular tuning op?ons available <br />

– Either enable it or disable it <br />

– innodb_doublewrite <br />

• Not much overhead on tradi?onal disks because <br />

writes are sequen?al <br />

• Much more overhead on SSDs <br />

– Also reduces their lifespan <br />

• Makes sense to disable it on atomic writes <br />

guarantee <br />

79 <br />

www.percona.com


Tuning Doublewrite Buffer (cont.) <br />

• Atomic writes guarantees <br />

– ZFS filesystem <br />

– DirectFS filesystem on Fusion-­‐io devices <br />

• Percona Server has support <br />

• hSp://goo.gl/ylpDK1 <br />

80 <br />

www.percona.com


What is it and how does it work? <br />

REDO LOGGING <br />

81 <br />

www.percona.com


InnoDB Log Files <br />

• Set of log files (ib_logfile?) <br />

– 2 log files by default. Effec?vely concatenated <br />

• Log Header <br />

– Stores informa?on about last checkpoint <br />

• Log is NOT organized in pages, but records <br />

– Records aligned 512 bytes, matching disk sector <br />

• Log record format “physiological” <br />

– Stores Page# and opera?on to do on it <br />

• Only REDO opera?ons are stored in logs. <br />

82 <br />

www.percona.com


More on Log Files <br />

• The log files are used to perform recovery in case <br />

of an unclean shutdown <br />

– Larger log files -­‐ Longer recovery <br />

• There is virtually no limit on the size of the log <br />

files <br />

– Well the combined size of the log files cannot exceed <br />

~512GB <br />

• Percona Server allows different log file block size <br />

– innodb_log_block_size <br />

• Default value is 512 bytes and good in most general cases <br />

• 4096 can be beSer for SSDs <br />

83 <br />

www.percona.com


More on Log Files (cont.) <br />

• Percona Server adds <br />

– innodb_log_checksum_algorithm=crc32 <br />

• Reduce CPU load for heavy writes to log file <br />

• If you're using compressed pages full pages <br />

can be logged to log file. <br />

84 <br />

www.percona.com


More on Log Files (cont.) <br />

• InnoDB Logs usage <br />

01010<br />

Log Files<br />

UPDATE City SET<br />

name = 'Morgansville'<br />

WHERE name = 'Brisbane'<br />

AND CountryCode='AUS'<br />

Buffer Pool<br />

Tablespace<br />

85 <br />

www.percona.com


InnoDB Log IO <br />

• Log are opened in buffered mode <br />

– Even with innodb_flush_method=O_DIRECT <br />

– Percona Server use O_DIRECT for logs as well <br />

• innodb_flush_method=ALL_O_DIRECT <br />

• innodb_log_block_size may need to be adjusted for EXT4 <br />

filesystem for ALL_O_DIRECT to work <br />

• Flushed by fsync() -­‐ default or O_SYNC <br />

• Logs which fit in cache may improve performance <br />

– Small transac?ons and <br />

innodb_flush_log_at_trx_commit=1 or 2 <br />

86 <br />

www.percona.com


Resizing Log Files <br />

• Log files are automa?cally resized <br />

– If innodb_log_file_size is changed between <br />

restarts <br />

– Log files are automa?cally resized to match the <br />

new desired size <br />

• Automa?c resizing does not work with <br />

versions < 5.6 <br />

87 <br />

www.percona.com


Tuning Parameters <br />

• Total log file size is calculated as <br />

– innodb_log_file_size * innodb_log_files_in_group <br />

• innodb_log_file_size <br />

– A general principle is to have these big enough to <br />

hold one hour worth of changes <br />

– Extremely important to tune this for write <br />

performance <br />

• innodb_log_files_in_group <br />

– No good reason to change <br />

88 <br />

www.percona.com


Log Buffer <br />

• Redo log records are first wriSen to a log <br />

buffer in memory <br />

• innodb_flush_log_at_7meout=N <br />

– Log buffer is synced to disk every N seconds <br />

irrespec?vely anyhow <br />

– Once per second by default <br />

89 <br />

www.percona.com


Log Buffer (cont.) <br />

• innodb_log_buffer_size <br />

– Values 32-­‐256MB typically make sense <br />

– Larger values are good to reduce conten?on <br />

– May need to be larger if using large BLOBs <br />

– See amount of data wriSen to the logs <br />

– Log buffer covering 10sec is good enough <br />

90 <br />

www.percona.com


Log Buffer (cont.) <br />

• innodb_flush_log_at_trx_commit <br />

Log Files<br />

Buffer Pool<br />

Log Buffer OS Cache RAID Cache<br />

Physical Storage<br />

1 fsync every commit<br />

2<br />

fsync once per innodb_flush_log_at_7meout<br />

second<br />

0<br />

fsync once per innodb_flush_log_at_7meout<br />

second<br />

91 <br />

www.percona.com


How InnoDB recovers from crash? <br />

RECOVERY <br />

92 <br />

www.percona.com


Recovery Stages <br />

• Physical Recovery <br />

– Recover par?ally wriSen pages from doublewrite <br />

buffer <br />

• Redo Recovery <br />

– Redo all the changes stored in transac?onal logs <br />

• Undo Recovery <br />

– Rollback uncommiSed transac?ons <br />

93 <br />

www.percona.com


Redo Recovery Stage <br />

• Foreground <br />

– Server is not started un?l it is complete <br />

• Larger Logs = Longer recovery ?me <br />

– Though row sizes, database size, workload also maSer <br />

• Scan Log files <br />

– Buffer modifica?ons on per page basics <br />

– Apply modifica?ons to data file <br />

• LSN stored in the page tells if change needs to be <br />

applied <br />

94 <br />

www.percona.com


Tuning Redo Recovery <br />

• Redo log size maSers <br />

– large logs longer recovery <br />

– tune innodb_log_file_size accordingly <br />

• innodb_max_dirty_pages_pct <br />

– Fewer the dirty pages, faster the recovery <br />

• innodb_buffer_pool_size <br />

– Larger buffer pool means faster IO recovery <br />

95 <br />

www.percona.com


Undo Recovery Stage <br />

• Is done in the background <br />

– Performed aler MySQL is started <br />

• Speed depends on transac?on length <br />

– Very large UPDATE, INSERT... SELECT is problem. <br />

• Is NOT a problem with ALTER TABLE <br />

– Commits every 10000 rows to avoid this problem <br />

– Unless it is Par??oned table <br />

• Faster with larger innodb_log_file_size <br />

• Be careful killing MySQL with runaway update <br />

queries. <br />

96 <br />

www.percona.com


Implementa?on of Mul? Versioning <br />

MULTI VERSIONING <br />

97 <br />

www.percona.com


Mul7 Versioning Overview <br />

• InnoDB maintains old row versions of modified <br />

rows <br />

• Read Transac?on can read old version of row, <br />

while it is modified <br />

– Prevents the need to do locking reads <br />

• Locking reads can s?ll be performed with <br />

SELECT FOR UPDATE and LOCK IN SHARE <br />

MODE Modifiers <br />

98 <br />

www.percona.com


Updates and Locking Reads <br />

• Updates bypass Mul? Versioning <br />

– You can only modify row which currently exists <br />

• Locking Read bypass mul?-­‐versioning <br />

– Result from SELECT vs. SELECT .. LOCK IN SHARE <br />

MODE will be different <br />

• Locking Reads are slower <br />

– Because they have to set locks <br />

– Can be 2x+ slower! <br />

– SELECT FOR UPDATE has larger overhead <br />

99 <br />

www.percona.com


Mul7 Versioning Implementa7on <br />

• The most recent row version is stored in the page <br />

– Even before it is commiSed <br />

• Previous row versions stored in undo space <br />

• The number of versions stored is not limited <br />

– Can cause system/undo tablespace size to explode <br />

• Access to old versions require going through <br />

linked list <br />

– Long transac?ons with many concurrent updates can <br />

impact performance. <br />

100 <br />

www.percona.com


Mul7 Versioning Internals <br />

• Each row in the database has <br />

– DB_TRX_ID (6b) – Transac?on inserted/updated row <br />

– DB_ROLL_PTR (7b) -­‐ Pointer to previous version <br />

• Dele?on handled as Special Update <br />

• DB_TRX_ID + list of currently running transac?ons <br />

is used to check which version is visible <br />

• Insert and Update Undo Segments <br />

– Inserts history can be discarded when transac?on <br />

commits. <br />

– Update history is used for MVCC implementa?on <br />

101 <br />

www.percona.com


Mul7 Versioning <strong>Performance</strong> <br />

• Only changed columns stored in the undo <br />

segment <br />

• Short rows are faster to update <br />

– Separate table to store counters olen make sense <br />

• Beware of long transac?ons <br />

– Especially containing many updates <br />

• “Rows Read” can be misleading <br />

– Single row may correspond to scanning thousand <br />

of versions/index entries <br />

102 <br />

www.percona.com


Mul7 Versioning Indexes <br />

• Indexes contain pointers to all versions <br />

– Index key 5 will point to all rows which were 5 in <br />

the past <br />

• Indexes contain TRX_ID <br />

– Easy to check entry is visible <br />

– Can use “Covering Indexes” <br />

• Many old versions is performance problem <br />

– Slow down accesses <br />

– Will leave many “holes” in pages when purged <br />

103 <br />

www.percona.com


Dealing with Runaway Transac7ons <br />

• Monitor SHOW ENGINE INNODB STATUS for <br />

transac?ons ACTIVE for long ?me <br />

– Looking for large InnoDB History List Length is also <br />

good idea <br />

• <strong>Innodb</strong>_history_list_length status variable <br />

• Percona Server has feature to kill idle transac?ons <br />

– Idle transac?ons are ones that are open but inac?ve <br />

– innodb_kill_idle_transac7on hSp://goo.gl/poBUVi <br />

104 <br />

www.percona.com


Transac7on Isola7on Modes <br />

• SERIALIZABLE <br />

– Locking reads. Bypass mul? versioning <br />

• REPEATABLE-­‐READ (default) <br />

– Read commiSed data at it was on start of transac?on <br />

• READ-­‐COMMITTED <br />

– Read commiSed data as it was at start of statement <br />

• READ-­‐UNCOMMITTED <br />

– Read uncommiSed data as it is changing live <br />

105 <br />

www.percona.com


Purging the Garbage <br />

• Old row and index entries need to be removed <br />

– When they are not needed for any ac?ve transac?on <br />

• REPEATABLE READ <br />

– Need to be able to read everything at transac?on start <br />

• READ-­‐COMMITTED <br />

– Need to read everything at statement start <br />

• Purging can be done by the main InnoDB thread <br />

or by dedicated thread(s) <br />

106 <br />

www.percona.com


Purging the Garbage (cont.) <br />

• Purging is done by default by a single thread <br />

– Can have between 0 -­‐ 32 purge threads <br />

– A minimum of 1 purge thread is recommended <br />

– 4 is probably good for write intensive workloads <br />

– More than 1 thread may decrease throughput for IO <br />

bound workload <br />

• innodb_purge_batch_size <br />

– Default is 300 which is good in most cases <br />

– Keep this to a reasonably high value because purging <br />

many rollback segments is expensive <br />

107 <br />

www.percona.com


Purging the Garbage (cont.) <br />

• Purge Thread(s) may be unable to keep up <br />

with intensive updates <br />

– InnoDB “History List Length” will grow high <br />

– innodb_max_purge_lag slows updates down <br />

• A “History List Length” threshold that when crossed, <br />

triggers DML(s) to be throSled <br />

– innodb_max_purge_lag_delay <br />

• Controls maximum delay in milliseconds which may be <br />

added to a DML <br />

108 <br />

www.percona.com


How row-­‐level locking and internal locks work? <br />

LOCKING AND LATCHING <br />

109 <br />

www.percona.com


InnoDB Locking Basics <br />

• Row level locking <br />

– No lock escala?on <br />

• Pessimis?c Locking Strategy <br />

• Row Level lock wait ?meout <br />

– innodb_lock_wait_7meout <br />

• Tradi?onal “S” and “X” locks <br />

• Inten?on locks on tables “IS” “IX” <br />

– Restric?ng table opera?ons <br />

110 <br />

www.percona.com


InnoDB Locking Basics (cont.) <br />

• Locks on Rows AND Index Records <br />

• Automa?c deadlock detec?on <br />

– Non-­‐recursive <br />

– innodb_print_all_deadlocks <br />

• Prints all deadlocks to the error log <br />

– For some deadlocks you may get <br />

innodb_lock_wait_7meout error <br />

111 <br />

www.percona.com


Gap Locking <br />

• InnoDB not only locks rows but also gap between them <br />

• Needed for Consistent Reads <br />

– Also used by update statements <br />

– READ-­‐COMMITTED + RBR can reduce those needs <br />

• InnoDB has no phantom rows even in Consistent Reads <br />

• Gap locking olen cause complex deadlock situa?ons <br />

• “Infinum”, “Supremum” records define bounds of data <br />

stored on the page <br />

• Only record lock is needed for PK Update <br />

112 <br />

www.percona.com


Types of Locks in InnoDB <br />

• Next-­‐Key-­‐Lock <br />

– Lock Key and gap before the key <br />

• Gap-­‐Lock <br />

– Lock just the gap before the key <br />

• Record-­‐Only-­‐Lock <br />

– Lock record only <br />

• Insert inten?on gap locks <br />

– Held when wai?ng to insert into the gap <br />

113 <br />

www.percona.com


Advanced Gap Locking Stuff <br />

• Gaps can change on row dele?on <br />

– Actually when Purge thread removes record <br />

• Leaving conflic?ng Gap locks held <br />

• Gap Locks are “purely inhibi?ve” <br />

– Only block inser?on. <br />

– Holding lock does not allow inser?on. Must also wait <br />

for conflic?ng locks to be released <br />

• “supremum” record can have lock, “infinum” <br />

can't <br />

• You rarely need it in prac?ce <br />

114 <br />

www.percona.com


InnoDB Lock Storage <br />

• InnoDB locks storage is preSy compact <br />

– This is why there is no lock escala?on! <br />

• Lock space needed depends on lock loca?on <br />

– Locking sparse rows is more expensive <br />

• Each Page having locks gets bitmap allocated <br />

for it <br />

– Bitmap holds lock informa?on for all records on <br />

the page <br />

• Locks typically take 3-­‐8 bits per locked row <br />

115 <br />

www.percona.com


Auto Increment Locks <br />

• Tradi?onal vs. Consecu?ve Lock Mode <br />

• Tradi?onal lock mode means taking table lock <br />

• Consecu?ve lock mode means a lightweight <br />

mutex is used instead <br />

• innodb_autoinc_lock_mode – set lock <br />

behavior <br />

– “1” -­‐ Does not hold table lock for simple Inserts <br />

– “2” -­‐ Does not hold lock in any case. <br />

• Only works with Row level replica?on <br />

116 <br />

www.percona.com


InnoDB Latching <br />

• InnoDB implements its own mutex and RW-­‐Locks <br />

– For transparency not only <strong>Performance</strong> <br />

• Latching stats shown in SHOW INNODB STATUS <br />

-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐ <br />

SEMAPHORES <br />

-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐ <br />

OS WAIT ARRAY INFO: reserva?on count 13569 <br />

OS WAIT ARRAY INFO: signal count 11421 <br />

-­‐-­‐Thread 1152170336 has waited at ./../include/buf0buf.ic line 630 for 0.00 seconds the semaphore: <br />

Mutex at 0x2a957858b8 created file buf0buf.c line 517, lock var 0 <br />

-­‐-­‐Thread 1147709792 has waited at ./../include/buf0buf.ic line 630 for 0.00 seconds the semaphore: <br />

Mutex at 0x2a957858b8 created file buf0buf.c line 517, lock var 0 <br />

Mutex spin waits 5018, rounds 70800, OS waits 2033 <br />

RW-­‐shared spins 2736, rounds 84289, OS waits 2791 <br />

RW-­‐excl spins 904, rounds 111941, OS waits 3607 <br />

Spin rounds per wait: 14.11 mutex, 30.81 RW-­‐shared, 123.83 RW-­‐excl <br />

117 <br />

www.percona.com


Latching <strong>Performance</strong> <br />

• Has been improving over the years <br />

– Great improvements in MySQL 5.6 & XtraDB <br />

– S?ll hot spots remain <br />

• innodb_thread_concurrency <br />

– Limi?ng concurrency can reduce conten?on <br />

– Introduces conten?on on its own <br />

• innodb_sync_spin_loops <br />

– Trade Spinning for context switching <br />

– Typically limited produc?on impact <br />

118 <br />

www.percona.com


Current Hot Spots <br />

• log_mutex <br />

– Wri?ng data to the log buffer <br />

• Index-­‐>lock <br />

– Lock held for dura?on of low level index modifica?on <br />

– Can be serious hot spot for heavy write workloads <br />

– Improvements in MySQL 5.7 (not a full solu?on) <br />

• Adap?ve Hash Latch <br />

– Global latch to protect access to Adap?ve Hash Index <br />

119 <br />

www.percona.com


Current Hot Spots (cont.) <br />

• Adap?ve Hash Latch <br />

– Introduces conten?on in heavy read-­‐write <br />

workload <br />

– innodb_adap7ve_hash_index=0 <br />

• Can slow things down but also reduces conten?on <br />

– Percona Server tackles this by par??oning the <br />

global hash index <br />

• innodb_adap7ve_hash_index_par77ons=N <br />

• Helps when workload is spread across mul?ple tables <br />

120 <br />

www.percona.com


Where is my conten7on <br />

• <strong>Performance</strong> Schema is best way to look <br />

– Check out ps_helper <br />

+---------------------------------------------+---------+--------+<br />

| event_name |nsecs_per| seconds|<br />

+---------------------------------------------+---------+--------+<br />

| wait/synch/rwlock/innodb/index_tree_rw_lock | 19543.3 | 3456.1 |<br />

| wait/synch/mutex/innodb/log_sys_mutex | 2071.3 | 385.8 |<br />

| wait/synch/rwlock/innodb/hash_table_locks | 165.7 | 184.5 |<br />

| wait/synch/mutex/innodb/fil_system_mutex | 328.3 | 113.6 |<br />

| wait/synch/mutex/innodb/redo_rseg_mutex | 1766.4 | 84.9 |<br />

| wait/synch/rwlock/sql/MDL_lock::rwlock | 430.2 | 73.9 |<br />

| wait/synch/mutex/innodb/buf_pool_mutex | 264.5 | 72.7 |<br />

| wait/synch/rwlock/innodb/fil_space_latch | 27216.1 | 53.8 |<br />

| wait/synch/mutex/sql/THD::LOCK_query_plan | 167.0 | 50.7 |<br />

| wait/synch/mutex/innodb/trx_sys_mutex | 394.9 | 41.5 |<br />

+---------------------------------------------+---------+--------+<br />

121 <br />

www.percona.com


How are BLOBs stored in InnoDB <br />

BLOB STORAGE <br />

122 <br />

www.percona.com


Handling Blobs <br />

• Blobs are handled by InnoDB in a special way <br />

– <strong>And</strong> differently by different file formats <br />

• Small blobs <br />

– Whole row fits in ~8000 bytes stored on the page <br />

• Large Blobs <br />

– Can be stored full on external pages (Barracuda) <br />

– Can be stored par?ally on external page <br />

• First 768 bytes are stored on the page (Antelope) <br />

• InnoDB will NOT read external blobs unless they <br />

are needed by the query <br />

123 <br />

www.percona.com


Blobs in a Separate Table <br />

• It Depends :) <br />

• No need to store large blobs in separate table as <br />

they are already stored outside of the row <br />

• Storing medium size blobs (which fit on the page) <br />

in the separate table makes sense <br />

• Only split blobs to separate table if they are <br />

accessed infrequently <br />

• TODO: Would love to be able to specify BLOBs to <br />

be stored off the page <br />

124 <br />

www.percona.com


InnoDB BLOB != MySQL BLOB <br />

• MySQL Has limit of 65535 bytes per row <br />

excluding BLOB and TEXT column <br />

– This limit applies to VARCHAR columns <br />

• InnoDB limit is only 8000 (half a page) <br />

– So long VARCHAR fields may be stored as a BLOB <br />

inside InnoDB <br />

mysql> create table ai(c varchar(40000), d varchar(40000));<br />

ERROR 1118 (42000): Row size too large.<br />

The maximum row size for the used table type, not counting BLOBs, is 65535.<br />

You have to change some columns to TEXT or BLOBs<br />

125 <br />

www.percona.com


BLOB Page Alloca7on <br />

• Each BLOB Stored in separate segment <br />

– Normal alloca?on rules apply. By page when by extent <br />

– One large BLOB is faster than several medium ones <br />

– Many BLOBs can cause extreme waste <br />

• 500 byte blobs will require full 16K page if it does not fit with <br />

complete row <br />

• External BLOB is NOT updated in place <br />

– InnoDB always creates a new version <br />

• Large VARCHAR/TEXT are handled similarly to <br />

BLOB <br />

126 <br />

www.percona.com


How is compression implemented in InnoDB? <br />

COMPRESSION <br />

127 <br />

www.percona.com


Compression Basics <br />

• Requires “Barracuda” and innodb_file_per_table=1 <br />

• Per Page compression (mostly) <br />

• Uses zlib for compression <br />

– innodb_compression_level sets the zlib compression level <br />

• Default is 6 and max is 9 <br />

• ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8; <br />

– Es?mate how well the data will compress <br />

– Compressed page size of 8KB works well in most cases <br />

– Choosing too small a value can cause more page splits <br />

leading to subop?mal performance <br />

128 <br />

www.percona.com


Compression Basics (cont.) <br />

• Per page update log to avoid recompression <br />

• Both Compressed and Uncompressed page can be <br />

stored in Buffer Pool <br />

– Uncompressed pages would be evicted from the <br />

Buffer Pool first <br />

• Compression failure can be expensive <br />

– Causes InnoDB to split the page and compress each <br />

one separately <br />

– InnoDB can automa?cally add padding to prevent <br />

compression failure <br />

129 <br />

www.percona.com


Compression Considera7ons <br />

• Controlling padding to avoid compression failure <br />

– innodb_compression_failure_threshold_pct <br />

• Determines when to start adding padding to avoid <br />

compression failure <br />

– innodb_compression_pad_pct_max <br />

• Max percentage of space that can be reserved in page for <br />

whitespace padding <br />

• KEY_BLOCK_SIZE=16 <br />

– Only compress externally stored BLOBs <br />

– Can reduce size without overhead <br />

130 <br />

www.percona.com


Compression Considera7ons (cont.) <br />

• May need to provision more memory for the <br />

buffer pool to avoid frequent uncompression <br />

• May not be too good for a workload that is <br />

already CPU bound <br />

• Page size is too small for good compression <br />

• Have to “Guess” how data will compress to set <br />

the compressed block size <br />

131 <br />

www.percona.com


Compression Considera7ons (cont.) <br />

• Compression sepng is Per table <br />

– Though some indexes compress beSer than others <br />

• It may be more suitable to compress within <br />

the applica?on before wri?ng to the database <br />

– Can avoid overhead where some part of data <br />

compresses well and some doesn’t <br />

– Can help distribute the CPU consump?on across <br />

applica?on servers <br />

132 <br />

www.percona.com


Compression Sta7s7cs <br />

• Per-­‐index compression sta?s?cs <br />

– innodb_cmp_per_index_enabled <br />

• Enables I_S table INNODB_CMP_PER_INDEX <br />

• Can get expensive <br />

• Compression info exposed via I_S tables <br />

– Exposed via INNODB_CMP% tables <br />

– INNODB_CMP, INNODB_CMP_PER_INDEX, <br />

INNODB_CMPMEM, etc. <br />

133 <br />

www.percona.com


Alterna7ves to <strong>Innodb</strong> Compression <br />

• Compress on Applica?on <br />

• Use Filesystem/Storage which support <br />

compression <br />

• FusionIO – Hardware assisted compression <br />

– Inform Flash of “holes” which do not need to be <br />

kept <br />

• TokuDB Storage Engine <br />

– Offers much higher compression rate <br />

134 <br />

www.percona.com


Miscellaneous features and topics <br />

MISCELLANEOUS <br />

135 <br />

www.percona.com


Foreign Keys <br />

• Implemented on InnoDB level <br />

• Requires indexes on both tables <br />

– Can be very expensive some?mes <br />

• Checks happen when row is modified <br />

– No delayed checks ?ll transac?on commit <br />

• Foreign Keys introduce addi?onal locking <br />

overhead <br />

– Many tricky deadlock situa?ons are foreign key <br />

related <br />

136 <br />

www.percona.com


Direct IO Opera7on <br />

• Default IO mode for InnoDB data is Buffered <br />

• Good <br />

– Faster flushing when no write cache <br />

– Faster warm up on restart <br />

– Reduce problems with inode locking on EXT3 <br />

• Bad <br />

– Lost of effec?ve cache memory due to double buffering <br />

– Increased tendency to swap due to IO pressure <br />

• innodb_flush_method=O_DIRECT <br />

– O_DIRECT_NO_FSYNC on some filesystems <br />

– ALL_O_DIRECT on Percona Server <br />

137 <br />

www.percona.com


Direct IO Opera7on (cont.) <br />

• innodb_flush_method=O_DIRECT <br />

01010 0101 <br />

Buffered <br />

Log Files<br />

OS <br />

Cache <br />

Direct DMA write <br />

Buffer Pool<br />

Tablespace<br />

138 <br />

www.percona.com


Direct IO Opera7on (cont.) <br />

• innodb_flush_method=ALL_O_DIRECT <br />

01010 0101 <br />

Direct DMA write <br />

Log Files<br />

OS <br />

Cache <br />

Direct DMA write <br />

Buffer Pool<br />

Tablespace<br />

139 <br />

www.percona.com


READ-­‐ONLY Mode <br />

• innodb_read_only=1 <br />

• Make all InnoDB tables and opera?ons read-­‐only <br />

– Allows InnoDB to be operated from read-­‐only media <br />

– Allows more than one MySQL instances to share the <br />

same dataset <br />

• Following must be done for this to work <br />

– InnoDB slow shutdown <br />

– InnoDB started with change buffering disabled <br />

140 <br />

www.percona.com


Persistent Op7mizer Stats <br />

• Persistent stats are enabled by default <br />

• Enable query execu?on plans to be more stable <br />

• Stats recalculated on-­‐demand <br />

• Collect stats via ANALYZE/OPTIMIZE TABLE <br />

• Make sure you collect stats aler ALTER TABLE <br />

• Stats can be copied from master to slave <br />

– Stats tables: mysql.innodb_index_stats, <br />

mysql.innodb_table_stats <br />

141 <br />

www.percona.com


Unlocking Features in XtraBackup <br />

• Some important XtraBackup features are only <br />

available when used with XtraDB <br />

• Fast Incremental Backups <br />

– A bitmap based fast incremental backup strategy <br />

– Relies on “Changed Page Tracking” feature in <br />

XtraDB -­‐ hSp://goo.gl/xhjLRc <br />

• Archive Log Backups <br />

– Keep history of modifica?ons in redo log archives <br />

– Use with XtraBackup -­‐ hSp://goo.gl/W3bm8F <br />

142 <br />

www.percona.com


<strong>Innodb</strong> Full Text Search <br />

• Available since MySQL 5.6 <br />

• Is NOT 100% same as with MyISAM <br />

• Small/Medium scale implementa?ons <br />

– Larger ones use Solr, Sphinx, Elas?cSearch <br />

• Implemented as several “hidden” <strong>Innodb</strong> tables <br />

– Would require separate tutorial to cover in details <br />

• Loading a lot of data: <br />

– Load the data and when add FTS index <br />

143 <br />

www.percona.com


The Future <br />

SOME HIGHLIGHTS ON MYSQL 5.7 <br />

144 <br />

www.percona.com


More Online <br />

• Online Index Rename <br />

• Online VARCHAR length increase <br />

• Online OPTIMIZE TABLE <br />

145 <br />

www.percona.com


Temporary Tables <br />

• Faster Create/Drop of Temporary Tables <br />

• Can create temporary tables tablespace in <br />

separate path <br />

– <strong>Innodb</strong>_temp_data_path <br />

• Undo logs op?mized to be used for rollback <br />

only <br />

– No redo logging for temporary tables <br />

146 <br />

www.percona.com


General <strong>Performance</strong> <br />

• Automa?c detec?on of read only transac?ons <br />

• Internal Btree handling op?mized for lock <br />

conten?on <br />

• Mul?ple Cleaner Threads supported <br />

147 <br />

www.percona.com


Thank you ! <br />

pz@percona.com <br />

148 <br />

www.percona.com

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!