Physical Database Design (ch. 16 & ch. 3) Introduction
Inputs to Physical Design<br />
• Normalized relations.<br />
• Volume estimates.<br />
• Attribute definitions.<br />
• Data usage: entered, retrieved, deleted, updated.<br />
• Response time requirements.<br />
• Requirements for security, backup, recovery, retention,<br />
integrity.<br />
• DBMS characteristics.<br />
Physical Design Decisions<br />
• Specifying attribute data types.<br />
• Modifying the logical design.<br />
• Specifying the file organization.<br />
• Choosing indexes.<br />
Data Volumes and Query Frequencies<br />
• Data volumes: the estimated number of data records in each entity.<br />
• Query frequencies: the estimated number of queries per hour against each entity.<br />
These two estimates determine the storage and performance requirements that drive physical design decisions.<br />
An Example<br />
[Figure: E-R diagram annotated with data volumes and access frequencies. Volumes: PART 1000; SUPPLIER 50; MANUFACTURED PART 400 (40% of PART); PURCHASED PART 700 (70% of PART; the two subtypes overlap); SUPPLY (M:N relationship, labeled Quotation) 2500. Access frequencies shown on the arrows: 200, 75, 70, 140, 40, 60, 40, 80.]<br />
Designing Fields<br />
• Choosing data type -- Char(8), Date, etc.<br />
• Coding, compression, encryption.<br />
• Controlling data integrity.<br />
– Default value.<br />
– Range control.<br />
– Null value control.<br />
– Referential integrity.<br />
An example of a code look-up table<br />
PRODUCT File<br />
Product_No Description Finish …<br />
B100 Chair C<br />
B120 Desk A<br />
M128 Table C<br />
T100 Bookcase B<br />
… … …<br />
FINISH Look-up Table<br />
Code Value<br />
A Birch<br />
B Maple<br />
C Oak<br />
Coding is a way to achieve compression<br />
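The look-up idea above can be sketched in a few lines of Python. This is only an illustration: the rows and codes are taken from the slide, while the names `FINISH`, `PRODUCTS`, and `decode` are hypothetical.

```python
# Look-up table from the slide: one-character code -> full finish name.
FINISH = {"A": "Birch", "B": "Maple", "C": "Oak"}

# PRODUCT rows store only the code: (Product_No, Description, Finish code).
PRODUCTS = [
    ("B100", "Chair", "C"),
    ("B120", "Desk", "A"),
    ("M128", "Table", "C"),
    ("T100", "Bookcase", "B"),
]


def decode(rows, lookup):
    """Replace each finish code with its full value from the look-up table."""
    return [(no, desc, lookup[code]) for no, desc, code in rows]


decoded = decode(PRODUCTS, FINISH)

# Characters spent on the Finish field with and without coding.
coded_chars = sum(len(code) for _, _, code in PRODUCTS)
plain_chars = sum(len(finish) for _, _, finish in decoded)
```

For these four rows the coded Finish field takes 4 characters instead of 16. The look-up table itself adds a small fixed cost, so the saving grows with the number of PRODUCT rows.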
Designing Fields<br />
• Handling missing data.<br />
– Substitute an estimate of the missing value.<br />
– Trigger a report listing missing values.<br />
– In programs, ignore missing data unless the<br />
value is significant.<br />
Physical Records<br />
• Physical Record: A group of fields stored in adjacent memory locations and retrieved together as a unit.<br />
• Page: The amount of data read or written in one I/O operation. A page usually contains several physical records.<br />
• Blocking Factor: The number of physical records per page.<br />
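A minimal sketch of the blocking-factor arithmetic, assuming fixed-length records that are never split across pages. The page and record sizes below are made-up illustration values, and the function names are hypothetical.

```python
def blocking_factor(page_size: int, record_size: int) -> int:
    """Number of whole physical records that fit in one page."""
    if page_size <= 0 or record_size <= 0:
        raise ValueError("sizes must be positive")
    return page_size // record_size


def pages_needed(num_records: int, page_size: int, record_size: int) -> int:
    """Pages required to hold num_records when records do not span pages."""
    bf = blocking_factor(page_size, record_size)
    if bf == 0:
        raise ValueError("a record does not fit in one page")
    return -(-num_records // bf)  # ceiling division


# Assumed values: 4096-byte pages, 200-byte records, 1000 records.
bf = blocking_factor(4096, 200)
pages = pages_needed(1000, 4096, 200)
```

With these assumed sizes the blocking factor is 20 and 1000 records occupy 50 pages; the 96 bytes left over in each page are unused.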
Designing Physical Files<br />
• Physical File: A file as stored on the disk.<br />
• Constructs to link two pieces of data:<br />
– Sequential storage.<br />
– Pointers.<br />
• File Organization: How the files are<br />
arranged on the disk.<br />
• Access Method: How the data can be<br />
retrieved based on the file organization.<br />
Sequential File Organization<br />
• Records of the file are stored in sequence by<br />
the primary key field values.<br />
[Figure: records stored in key order (Aces, Boilermakers, Devils, Flyers, Hawkeyes, Hoosiers, …, Minors, Panthers, …, Seminoles, …). A scan begins at the start of the file and reads the records in sequence.]<br />
Indexed File Organizations<br />
• Index: an auxiliary file to improve access<br />
efficiency of the main data file.<br />
• B-tree or B+-tree index.<br />
• Bitmap index<br />
– Ideal for attributes that have only a few possible values<br />
– Often requires less storage space<br />
– Can be used for multiple keys<br />
Bitmap Index on Product Price attribute<br />
Product Table Row Number<br />
Price 1 2 3 4 5 6 7 8 9 10<br />
100 0 0 1 0 1 0 0 0 0 0<br />
200 1 0 0 0 0 0 0 0 0 0<br />
300 0 1 0 0 0 0 1 0 0 1<br />
400 0 0 0 1 0 1 0 1 1 0<br />
Products 3 and 5 have Price $100<br />
Product 1 has Price $200<br />
Products 2, 7, and 10 have Price $300<br />
Products 4, 6, 8, and 9 have Price $400<br />
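The bitmap above can be rebuilt with a short Python sketch. The `PRICES` list is read off the slide's table (one price per row, rows 1 to 10); the function names are hypothetical and the list-of-bits representation is a simplification of how a DBMS would store bitmaps.

```python
# Price of each product row, rows 1..10, as implied by the bitmap table.
PRICES = [200, 300, 100, 400, 100, 400, 300, 400, 400, 300]


def build_bitmap_index(values):
    """Map each distinct value to a bit list with a 1 in every matching row."""
    index = {}
    for position, value in enumerate(values):
        bits = index.setdefault(value, [0] * len(values))
        bits[position] = 1
    return index


def rows_matching(bits):
    """Return the 1-based row numbers whose bit is set."""
    return [position + 1 for position, bit in enumerate(bits) if bit]


index = build_bitmap_index(PRICES)

# A condition such as "Price = 100 OR Price = 200" becomes a bitwise OR.
either = [a | b for a, b in zip(index[100], index[200])]
```

Here `rows_matching(index[400])` gives rows 4, 6, 8, and 9, and `rows_matching(either)` gives rows 1, 3, and 5. Combining bitmaps with OR or AND is what makes the structure convenient for conditions over several keys.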
Hashed File Organization<br />
• Hashing Algorithm: Converts a primary key<br />
value into a record address.<br />
• Division-remainder method:<br />
– Given 1000 pages to store employee records<br />
– Choose the prime number closest to 1000, i.e., 997<br />
– The bucket number of each record is equal to the remainder of employee ID divided by 997<br />
– Finding the location of any employee record needs<br />
only a computation.<br />
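The division-remainder method can be sketched as follows. The modulus 997 comes from the slide; the `HashedFile` class, the in-memory bucket lists, and the sample employee IDs are assumptions made for illustration only.

```python
NUM_BUCKETS = 997  # prime closest to the 1000 available pages


def bucket_for(employee_id: int) -> int:
    """Division-remainder hash: bucket = employee ID mod 997."""
    return employee_id % NUM_BUCKETS


class HashedFile:
    """Toy hashed file: each bucket is a list standing in for a page."""

    def __init__(self, num_buckets: int = NUM_BUCKETS):
        self.num_buckets = num_buckets
        self.buckets = [[] for _ in range(num_buckets)]

    def insert(self, employee_id: int, record: str) -> None:
        self.buckets[employee_id % self.num_buckets].append((employee_id, record))

    def find(self, employee_id: int):
        # One computation selects the bucket; only that bucket is searched.
        for key, record in self.buckets[employee_id % self.num_buckets]:
            if key == employee_id:
                return record
        return None


employees = HashedFile()
employees.insert(12345, "hypothetical employee A")
employees.insert(13342, "hypothetical employee B")  # same bucket as 12345
```

Both sample IDs hash to bucket 381, which is a collision: in a real file the second record may have to go to an overflow page once the bucket's page is full.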
Comparison of File Organizations<br />
• Sequential:<br />
– No wasted space<br />
– Fast sequential retrieval<br />
– No random retrieval<br />
– Updates are slow and require reorganization<br />
• Indexed<br />
– Requires additional space for the index<br />
– Supports random retrieval<br />
– Deletion, addition, and update of records require modification of the indexes<br />
• Hashed<br />
– May require overflow pages<br />
– Sequential retrieval is impractical<br />
– Random retrieval on the primary key is very fast since no index access is needed<br />
– Deletion, addition, and modification of records are relatively easy<br />
Denormalization<br />
• The reversal of normalization in order to<br />
increase query processing efficiency.<br />
• During physical database design, denormalization is applied when performance considerations outweigh the risk of operational (update) anomalies.<br />
An example<br />
• Two entities with a one-to-one relationship<br />
[Figure: E-R diagram. STUDENT (SID, Address) submits, 1:1, a SCHOLARSHIP APPLICATION (AID, Application_Date, Qualifications).]<br />
STUDENT(SID, Address)<br />
APPLICATION(AID, Application_Date, Qualification, SID)<br />
Denormalized relation: STUDENT(SID, Address, AID, Application_Date, Qualification)<br />
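A rough sketch of the merge, using plain Python dictionaries in place of tables. The attribute names follow the relations above; the sample rows and the `denormalize` function are hypothetical.

```python
# Hypothetical sample data for the two normalized relations.
STUDENT = {
    "S1": {"Address": "12 Elm St"},
    "S2": {"Address": "9 Oak Ave"},
}
APPLICATION = [
    {"AID": "A7", "Application_Date": "2024-01-15",
     "Qualification": "GPA 3.8", "SID": "S1"},
]


def denormalize(students, applications):
    """Fold each student's (at most one) application into the student row."""
    by_sid = {app["SID"]: app for app in applications}
    rows = []
    for sid, student in students.items():
        app = by_sid.get(sid, {})
        rows.append({
            "SID": sid,
            "Address": student["Address"],
            "AID": app.get("AID"),  # None when no application exists
            "Application_Date": app.get("Application_Date"),
            "Qualification": app.get("Qualification"),
        })
    return rows


student_denormalized = denormalize(STUDENT, APPLICATION)
```

Reading a student together with the application now touches one row instead of two relations. In this sample, student S2 has no application, so the application columns of that row are empty, which is one cost of the merged design.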
Partitioning<br />
• Horizontal Partitioning: Distributing the<br />
rows of a table into several separate files.<br />
• Vertical Partitioning: Distributing the<br />
columns of a table into several separate<br />
files.<br />
– The primary key must be repeated in each file.<br />
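Both kinds of partitioning can be illustrated on a small in-memory table. The table, its column names, and the two helper functions are hypothetical; lists of dictionaries stand in for separate files.

```python
# Hypothetical table: each dict is one row, "id" is the primary key.
ROWS = [
    {"id": 1, "region": "East", "name": "Ann", "salary": 50},
    {"id": 2, "region": "West", "name": "Bob", "salary": 60},
    {"id": 3, "region": "East", "name": "Cy", "salary": 55},
]


def horizontal_partition(rows, key):
    """Distribute whole rows into separate 'files' by the value of one column."""
    parts = {}
    for row in rows:
        parts.setdefault(row[key], []).append(row)
    return parts


def vertical_partition(rows, primary_key, column_groups):
    """Distribute columns into separate 'files', repeating the primary key."""
    return [
        [{col: row[col] for col in (primary_key, *group)} for row in rows]
        for group in column_groups
    ]


by_region = horizontal_partition(ROWS, "region")
names_file, pay_file = vertical_partition(ROWS, "id", [["name"], ["region", "salary"]])
```

Every row of `names_file` and `pay_file` carries `id`, which is what allows the original rows to be reassembled by joining the vertical partitions on the primary key.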
• Advantages of Partitioning:<br />
– Records used together are grouped together.<br />
– Each partition can be optimized for performance.<br />
– Security, recovery.<br />
– Partitions stored on different disks.<br />
– Take advantage of parallel processing capability.<br />
• Disadvantages of Partitioning:<br />
– Slow retrievals across partitions.<br />
– Complexity.<br />