21.01.2015 Views

Physical Database Design (ch. 16 & ch. 3) Introduction

Physical Database Design (ch. 16 & ch. 3) Introduction

Physical Database Design (ch. 16 & ch. 3) Introduction

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

¢¡¤£¥¡§¦¨¡©¡¡¤£<br />

"!$#¨%&%#('()*<br />

<br />

+¢,¤-¥,§.¨,/01,2,¤30402-<br />

98":$;¨


@¢A¤B¥A§C¨ADEFAGA¤HEIEGB<br />

NM"O$P¨Q&JQP(R(ST<br />

JKLM<br />

VYZ[V\V¤]Z^Z\W<br />

U¢V¤W¥V§Ẍ<br />

cb"d$ë f&_fe(g(hi<br />

_`ab<br />

Inputs to <strong>Physical</strong> <strong>Design</strong><br />

• Normalized relations.<br />

• Volume estimates.<br />

• Attribute definitions.<br />

• Data usage: entered, retrieved, deleted, updated.<br />

• Response time requirements.<br />

• Requirements for security, backup, recovery, retention,<br />

integrity.<br />

• DBMS <strong>ch</strong>aracteristics.<br />

<strong>Physical</strong> <strong>Design</strong> Decisions<br />

• Specifying attribute data types.<br />

• Modifying the logical design.<br />

• Specifying the file organization.<br />

• Choosing indexes.<br />

2


j¢k¤l¥k§m¨knopkqk¤rosoql<br />

xw"y$z¨{&t{z(|(}~<br />

tuvw<br />

¢€¤¥€§‚¨€ƒ„…€†€¤‡„ˆ„†<br />

Œ"Ž$¨&‰(‘(’“<br />

‰Š‹Œ<br />

Data Volumes and Query Frequencies<br />

• Data volumes: estimation of number of data<br />

records in ea<strong>ch</strong> entity<br />

• Query frequencies: estimation of number of<br />

queries per hour towards ea<strong>ch</strong> entity<br />

These two types of information are useful for<br />

determining the storage requirements and<br />

performance requirements, whi<strong>ch</strong> are needed<br />

to make physical design decisions.<br />

An Example<br />

200 75<br />

PART<br />

1000<br />

70<br />

SUPLLIER<br />

50<br />

(<br />

MANUFACTURED<br />

PART<br />

400<br />

o<br />

40% 70%<br />

140<br />

(<br />

PURCHASED<br />

PART<br />

700<br />

40<br />

60<br />

40<br />

SUPLLY<br />

2500<br />

M<br />

N<br />

80<br />

Quotation<br />

3


”¢•¤–¥•§—¨•˜š•›•¤œ›–<br />

¡ ¢¡"£$¤¨¥&ž¥¤(¦(§¨<br />

žŸ<br />

©¢ª¤«¥ª§¬¨ª­®¯ª°ª¤±®²®°«<br />

·"¸$¹¨º&³º¹(»(¼½<br />

³´µ<br />

<strong>Design</strong>ing Fields<br />

• Choosing data type -- Char(8), Date, etc.<br />

• Coding, compression, encryption.<br />

• Controlling data integrity.<br />

– Default value.<br />

– Range control.<br />

– Null value control.<br />

– Referential integrity.<br />

An example of code look-up table<br />

PRODUCT File<br />

Product_No Description Finish …<br />

B100 Chair C<br />

B120 Desk A<br />

M128 Table C<br />

T100 Bookcase B<br />

… … …<br />

FINISH Look-up Table<br />

Code Value<br />

A Bir<strong>ch</strong><br />

B Maple<br />

C Oak<br />

Coding is a way to a<strong>ch</strong>ieve compression<br />

4


¾¢¿¤À¥¿§Á¨¿ÂÃÄ¿Å¿¤ÆÃÇÃÅÀ<br />

ÌË"Í$ΨÏ&ÈÏÎ(Ð(ÑÒ<br />

ÈÉÊË<br />

Ó¢Ô¤Õ¥Ô§Ö¨Ô×ØÙÔÚÔ¤ÛØÜØÚÕ<br />

áà"â$ã¨ä&Ýäã(å(æç<br />

ÝÞßà<br />

<strong>Design</strong>ing Fields<br />

• Handling missing data.<br />

– Substitute an estimate of the missing value.<br />

– Trigger a report listing missing values.<br />

– In programs, ignore missing data unless the<br />

value is significant.<br />

<strong>Physical</strong> Records<br />

• <strong>Physical</strong> Record: A group of fields stored in<br />

adjacent memory locations and retrieved<br />

together as a unit.<br />

• Page: The amount of data read or written in<br />

one I/O operation. A page contains usually<br />

a number of physical records.<br />

• Blocking Factor: The number of physical<br />

records per page.<br />

5


è¢é¤ê¥é§ë¨éìíîéïé¤ðíñíïê<br />

öõ"÷$ø¨ù&òùø(ú(ûü<br />

òóôõ<br />

¨þ£¢¥¤§¦þ©¨þ¤¤¨ÿ<br />

ý¢þ¤ÿ¥þ¡<br />

!"©#<br />

£!<br />

<strong>Design</strong>ing <strong>Physical</strong> Files<br />

• <strong>Physical</strong> File: A file as stored on the disk.<br />

• Constructs to link two pieces of data:<br />

– Sequential storage.<br />

– Pointers.<br />

• File Organization: How the files are<br />

arranged on the disk.<br />

• Access Method: How the data can be<br />

retrieved based on the file organization.<br />

Sequential File Organization<br />

• Records of the file are stored in sequence by<br />

the primary key field values.<br />

Aces<br />

Boilermakers<br />

Devils<br />

Flyers<br />

Hawkeyes<br />

Hoosiers<br />

…<br />

Minors<br />

Panthers<br />

…<br />

Seminoles<br />

…<br />

Start of file<br />

Scan<br />

6


$&%'(%¡)%£*¥+§,-%©.¥%/+0+.1'<br />

23456£5789298!:!;©<<br />

=&>(>¡@>£A¥B§C->©D¥>EBFBD1<br />

GHIJK£JLMNGNM!O!P©Q<br />

Indexed File Organizations<br />

• Index: an auxiliary file to improve access<br />

efficiency of the main data file.<br />

• B-tree or B + -tree index.<br />

• Bitmap index<br />

– Ideal for attributes that have even a few possible<br />

values<br />

– Often requires less storage space<br />

– Can be used for multiple keys<br />

Bitmap Index on Product Price attribute<br />

Product Table Row Number<br />

Price 1 2 3 4 5 6 7 8 9 10<br />

100 0 0 1 0 1 0 0 0 0 0<br />

200 1 0 0 0 0 0 0 0 0 0<br />

300 0 1 0 0 0 0 1 0 0 1<br />

400 0 0 0 1 0 1 0 1 1 0<br />

Product 3 and 5 have Price $100<br />

Product 1 has Price $200<br />

Product 2, 7, and 10 have Price 300<br />

Product 4, 6, 8, and 0 have Price $400<br />

7


R&ST(S¡US£V¥W§X-S©Y¥SZW[WY1T<br />

\]^_`£_abc\cb!d!e©f<br />

g&hi(h¡jh£k¥l§m-h©n¥holpln1i<br />

qrstu£tvwxqxw!y!z©{<br />

Hashed File Organization<br />

• Hashing Algorithm: Converts a primary key<br />

value into a record address.<br />

• Division-remainder method:<br />

– Given 1000 pages to store employee records<br />

– Choose the prime number closest to 1000, i.e., 997<br />

– The bucket number of ea<strong>ch</strong> record is equal to the<br />

remainder of employee ID divided by 997<br />

– Finding the location of any employee record needs<br />

only a computation.<br />

Comparison of File Organizations<br />

• Sequential:<br />

– No waste space<br />

– Fast sequential retrieval<br />

– no random retrieval<br />

– update requires reorganization and slow<br />

8


|&}~(}¡}£€¥§‚-}©ƒ¥}„…ƒ1~<br />

†‡ˆ‰Š£‰‹Œ†Œ!Ž!©<br />

‘&’“(’¡”’£•¥–§—-’©˜¥’–š–˜1“<br />

¡¢›¢¡!£!¤©¥<br />

›œžŸ£ž<br />

Comparison of File Organizations<br />

• Indexed<br />

– require additional space for index<br />

– support random retrieval<br />

– deletion, addition, and update of records require<br />

modification of indexes<br />

Comparison of File Organizations<br />

• Hashed<br />

– May require overflow pages<br />

– sequential retrieval is impractical<br />

– random retrieval on primary key is very fast<br />

since it does not need to access index<br />

– deletion, addition, and modification of records<br />

are relatively easy<br />

9


¦&§¨(§¡©§£ª¥«§¬-§©­¥§®«¯«­1¨<br />

°±²³´£³µ·°·!¸!¹©º<br />

»&¼½(¼¡¾¼£¿¥À§Á-¼©Â¥¼ÃÀÄÀÂ1½<br />

ÅÆÇÈÉ£ÈÊËÌÅÌË!Í!ΩÏ<br />

Denormalization<br />

• The reversal of normalization in order to<br />

increase query processing efficiency.<br />

• During physical database design,<br />

denormalization is done if performance<br />

consideration dominates the issue of<br />

operational anomalies.<br />

An example<br />

• Two entities with a one-to-one relationship<br />

SID<br />

STUDET<br />

1 1<br />

Submit<br />

SCHOLARSHIP<br />

APPLICATION<br />

AID<br />

Address<br />

STUDENT(SID, Address)<br />

Application_<br />

Date<br />

Qualifications<br />

APPLICATION(AID, Application_date, Qualification, SID)<br />

Denormalized relation: STUDENT(SID, Address, AID, Application_Date,<br />

Qualification)<br />

10


Ð&ÑÒ(Ñ¡ÓÑ£Ô¥Õ§Ö-ѩץÑØÕÙÕ×1Ò<br />

ÚÛÜÝÞ£ÝßàáÚáà!â!ã©ä<br />

å&æç(æ¡èæ£é¥ê§ë-æ©ì¥æíêîêì1ç<br />

ïðñòó£òôõöïöõ!÷!ø©ù<br />

Partitioning<br />

• Horizontal Partitioning: Distributing the<br />

rows of a table into several separate files.<br />

• Vertical Partitioning: Distributing the<br />

columns of a table into several separate<br />

files.<br />

– The primary key must be repeated in ea<strong>ch</strong> file.<br />

Partitioning<br />

• Advantages of Partitioning:<br />

– Records used together are grouped together.<br />

– Ea<strong>ch</strong> partition can be optimized for performance.<br />

– Security, recovery.<br />

– Partitions stored on different disks.<br />

– Take advantage of parallel processing capability.<br />

• Disadvantages of Partitioning:<br />

– Slow retrievals across partitions.<br />

– Complexity.<br />

11

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!