Physical Database Design (ch. 16 & ch. 3) Introduction

Inputs to Physical Design 

• Normalized relations. 

• Volume estimates. 

• Attribute definitions. 

• Data usage: where and when data are entered, retrieved, deleted, and updated. 

• Response time requirements. 

• Requirements for security, backup, recovery, retention, and integrity. 

• DBMS characteristics. 

Physical Design Decisions 

• Specifying attribute data types. 

• Modifying the logical design. 

• Specifying the file organization. 

• Choosing indexes. 

Data Volumes and Query Frequencies 

• Data volumes: the estimated number of records stored for each entity. 

• Query frequencies: the estimated number of accesses per hour to each entity. 

Together, these estimates determine the storage and performance requirements that drive physical design decisions. 

An Example 

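[Figure: ER diagram annotated with data volumes and access frequencies. Data volumes: PART 1000 records; SUPPLIER 50; MANUFACTURED PART 400 (40% of PART); PURCHASED PART 700 (70% of PART); SUPPLY (Quotation), an M:N relationship, 2500. The two subtypes of PART overlap (o). Access frequencies per hour shown on the arrows: 200, 75, 70, 140, 40, 60, 40, 80.] 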

Designing Fields 

• Choosing data type -- Char(8), Date, etc. 

• Coding, compression, encryption. 

• Controlling data integrity. 

– Default value. 

– Range control. 

– Null value control. 

– Referential integrity. 
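
The four integrity controls listed above can be illustrated with a small validation routine. This is only a sketch: the names FieldSpec, check_field, and check_reference are hypothetical, and a real DBMS would normally enforce these rules through column constraints rather than application code.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class FieldSpec:
    """Hypothetical description of one numeric field's integrity controls."""
    name: str
    default: Optional[float] = None      # default value
    min_value: Optional[float] = None    # range control (lower bound)
    max_value: Optional[float] = None    # range control (upper bound)
    nullable: bool = True                # null value control


def check_field(spec: FieldSpec, value: Optional[float]) -> Optional[float]:
    """Return the value to store, or raise ValueError if a control is violated."""
    if value is None:
        value = spec.default             # fall back to the default, if any
    if value is None:
        if not spec.nullable:
            raise ValueError(f"{spec.name}: null value not allowed")
        return None
    if spec.min_value is not None and value < spec.min_value:
        raise ValueError(f"{spec.name}: {value} is below {spec.min_value}")
    if spec.max_value is not None and value > spec.max_value:
        raise ValueError(f"{spec.name}: {value} is above {spec.max_value}")
    return value


def check_reference(foreign_key: str, parent_keys: Set[str]) -> None:
    """Referential integrity: the foreign key must match an existing parent key."""
    if foreign_key not in parent_keys:
        raise ValueError(f"no parent row with key {foreign_key!r}")
```

For example, a quantity field declared with default 1, range 0 to 100, and no nulls would store 1 when no value is supplied and reject 101.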

An example of a code look-up table 

PRODUCT File 

Product_No   Description   Finish   … 
B100         Chair         C 
B120         Desk          A 
M128         Table         C 
T100         Bookcase      B 
…            …             … 

FINISH Look-up Table 

Code   Value 
A      Birch 
B      Maple 
C      Oak 

Coding is a way to achieve compression 
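
As a rough illustration of the look-up idea, the sketch below stores only the one-character finish code in each product row and expands it on retrieval. The data come from the two tables above; the names FINISH_CODES, PRODUCTS, and products_with_finish are made up for this example.

```python
# FINISH look-up table: one-character code -> full value.
FINISH_CODES = {"A": "Birch", "B": "Maple", "C": "Oak"}

# PRODUCT file rows: (Product_No, Description, Finish code).
PRODUCTS = [
    ("B100", "Chair", "C"),
    ("B120", "Desk", "A"),
    ("M128", "Table", "C"),
    ("T100", "Bookcase", "B"),
]


def products_with_finish():
    """Replace each stored finish code with its value from the look-up table."""
    return [(no, desc, FINISH_CODES[code]) for no, desc, code in PRODUCTS]
```

Each product row carries one character instead of the full finish name, which is where the space saving comes from; the cost is an extra look-up whenever the full value is needed.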

Designing Fields (continued) 

• Handling missing data. 

– Substitute an estimate of the missing value. 

– Trigger a report listing missing values. 

– In programs, ignore missing data unless the value is significant. 
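
The first two strategies can be sketched together. The example assumes the estimate is the mean of the known values, which is only one possible choice, and the function name fill_missing is hypothetical.

```python
from statistics import mean


def fill_missing(values):
    """Substitute the mean of the known values for each missing (None) entry.

    Returns the filled list plus the positions that were missing, so the
    positions can be listed in a report.
    """
    known = [v for v in values if v is not None]
    if not known:
        raise ValueError("no known values to estimate from")
    estimate = mean(known)
    missing_positions = [i for i, v in enumerate(values) if v is None]
    filled = [estimate if v is None else v for v in values]
    return filled, missing_positions
```

Calling fill_missing([10, None, 20]) fills position 1 with the mean of 10 and 20 and reports that position as missing.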

Physical Records 

• Physical Record: A group of fields stored in adjacent memory locations and retrieved together as a unit. 

• Page: The amount of data read or written in one I/O operation. A page usually contains several physical records. 

• Blocking Factor: The number of physical records per page. 
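
A small calculation may make the blocking factor concrete. The sketch assumes fixed-length records that are never split across pages and ignores any page header overhead; the page and record sizes in the usage note are illustrative, not taken from the slides.

```python
def blocking_factor(page_size_bytes: int, record_size_bytes: int) -> int:
    """Number of whole fixed-length records that fit in one page."""
    return page_size_bytes // record_size_bytes


def pages_needed(num_records: int, page_size_bytes: int, record_size_bytes: int) -> int:
    """Pages required to hold num_records, rounding up to a whole page."""
    bf = blocking_factor(page_size_bytes, record_size_bytes)
    if bf == 0:
        raise ValueError("record does not fit in a single page")
    return -(-num_records // bf)  # ceiling division
```

With a 4096-byte page and 300-byte records, the blocking factor is 13, so 1000 records occupy 77 pages.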

Designing Physical Files 

• Physical File: A file as stored on the disk. 

• Constructs to link two pieces of data: 

– Sequential storage. 

– Pointers. 

• File Organization: How the records of a file are physically arranged on the disk. 

• Access Method: How the data can be retrieved, given the file organization. 

Sequential File Organization 

• Records of the file are stored in sequence by primary key value. 

[Figure: records stored in key order (Aces, Boilermakers, Devils, Flyers, Hawkeyes, Hoosiers, …, Minors, Panthers, …, Seminoles, …). Retrieval scans forward from the start of the file.] 
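
The scan from the start of the file can be sketched as follows. The function name sequential_find is hypothetical, and the key list in the usage note holds only the first six keys from the figure.

```python
def sequential_find(sorted_keys, target):
    """Scan from the start of a key-ordered file.

    Returns (position, records_read). The position is None when the target
    is absent; because the keys are sorted, the scan can stop as soon as it
    passes the place where the target would have been.
    """
    reads = 0
    for position, key in enumerate(sorted_keys):
        reads += 1
        if key == target:
            return position, reads
        if key > target:
            return None, reads
    return None, reads
```

With the keys ["Aces", "Boilermakers", "Devils", "Flyers", "Hawkeyes", "Hoosiers"], finding "Flyers" reads four records. On average a lookup reads about half the file, which is why random retrieval is a poor fit for this organization.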

Indexed File Organizations 

• Index: An auxiliary file that improves access efficiency for the main data file. 

• B-tree or B+-tree index. 

• Bitmap index: 

– Ideal for attributes that have only a few possible values. 

– Often requires less storage space than a conventional tree index. 

– Can be used for multiple keys. 

Bitmap Index on Product Price attribute 

         Product Table Row Number 
Price    1   2   3   4   5   6   7   8   9   10 
100      0   0   1   0   1   0   0   0   0   0 
200      1   0   0   0   0   0   0   0   0   0 
300      0   1   0   0   0   0   1   0   0   1 
400      0   0   0   1   0   1   0   1   1   0 

Products 3 and 5 have price $100. 

Product 1 has price $200. 

Products 2, 7, and 10 have price $300. 

Products 4, 6, 8, and 9 have price $400. 
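
The bitmap above can be reproduced with one integer per distinct price, using bit i for row i + 1. This is a minimal sketch rather than how any particular DBMS stores bitmaps; PRICES lists the price of rows 1 to 10 as read from the table, and the function names are hypothetical.

```python
# Price of product rows 1..10, read column by column from the bitmap table.
PRICES = [200, 300, 100, 400, 100, 400, 300, 400, 400, 300]


def build_bitmap_index(values):
    """Map each distinct value to an integer whose bit i is set for row i + 1."""
    index = {}
    for pos, value in enumerate(values):
        index[value] = index.get(value, 0) | (1 << pos)
    return index


def rows_matching(bitmap, n_rows):
    """Return the 1-based row numbers whose bit is set in the bitmap."""
    return [pos + 1 for pos in range(n_rows) if (bitmap >> pos) & 1]
```

A query such as "price is $100 or $200" becomes a bitwise OR of two bitmaps: with index = build_bitmap_index(PRICES), rows_matching(index[100] | index[200], 10) selects rows 1, 3, and 5 without touching the product table.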

Hashed File Organization 

• Hashing Algorithm: Converts a primary key value into a record address. 

• Division-remainder method: 

– Suppose 1000 pages are available to store employee records. 

– Choose the prime number closest to 1000, i.e., 997. 

– The bucket number of each record is the remainder of the employee ID divided by 997. 

– Locating any employee record requires only a computation, with no index access. 
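
The division-remainder method is short enough to write out directly. The employee IDs in the usage note are invented for illustration, and the load helper is a hypothetical way to show what happens when two IDs hash to the same bucket.

```python
from collections import defaultdict

N_BUCKETS = 997  # the prime closest to the 1000 available pages


def bucket_for(employee_id: int, n_buckets: int = N_BUCKETS) -> int:
    """Division-remainder hashing: bucket = employee ID mod n_buckets."""
    return employee_id % n_buckets


def load(employee_ids):
    """Group employee IDs by bucket number; IDs sharing a bucket collide."""
    buckets = defaultdict(list)
    for emp_id in employee_ids:
        buckets[bucket_for(emp_id)].append(emp_id)
    return buckets
```

Employee 12345 lands in bucket 381. IDs 997 and 1994 both land in bucket 0; once a bucket's page is full, further colliding records need an overflow page.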

Comparison of File Organizations 

• Sequential: 

– No wasted space 

– Fast sequential retrieval 

– Random retrieval is impractical 

– Updates are slow and require reorganizing the file 

• Indexed: 

– Requires additional space for the index 

– Supports random retrieval 

– Deleting, adding, or updating records requires modifying the indexes 


• Hashed: 

– May require overflow pages 

– Sequential retrieval is impractical 

– Random retrieval on the primary key is very fast because no index access is needed 

– Deleting, adding, and modifying records is relatively easy 

Denormalization 

• The reversal of normalization in order to increase query processing efficiency. 

• During physical database design, denormalization is applied when performance considerations outweigh the risk of operational anomalies. 

An example 

• Two entities with a one-to-one relationship 

[Figure: ER diagram. STUDENT (SID, Address) submits SCHOLARSHIP APPLICATION (AID, Application_Date, Qualification) in a 1:1 relationship.] 

Normalized relations: 

STUDENT(SID, Address) 

APPLICATION(AID, Application_Date, Qualification, SID) 

Denormalized relation: 

STUDENT(SID, Address, AID, Application_Date, Qualification) 
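
The effect of merging the two relations can be sketched with plain dictionaries. The sample rows are invented for illustration and the function name denormalize is hypothetical; the attribute names follow the relations above.

```python
def denormalize(students, applications):
    """Merge each STUDENT row with its (at most one) APPLICATION row."""
    app_by_sid = {app["SID"]: app for app in applications}
    merged = []
    for student in students:
        app = app_by_sid.get(student["SID"])
        merged.append({
            "SID": student["SID"],
            "Address": student["Address"],
            # Students without an application get null application fields.
            "AID": app["AID"] if app else None,
            "Application_Date": app["Application_Date"] if app else None,
            "Qualification": app["Qualification"] if app else None,
        })
    return merged


# Hypothetical sample data.
STUDENTS = [
    {"SID": 1, "Address": "12 Elm St"},
    {"SID": 2, "Address": "34 Oak Ave"},
]
APPLICATIONS = [
    {"AID": 500, "Application_Date": "2024-03-01", "Qualification": "GPA 3.8", "SID": 1},
]
```

After merging, a query for a student together with the application reads one row instead of joining two tables. The trade-off shows up for student 2, whose application fields are null and still take up space in the merged relation.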

Partitioning 

• Horizontal Partitioning: Distributing the rows of a table into several separate files. 

• Vertical Partitioning: Distributing the columns of a table into several separate files. 

– The primary key must be repeated in each file. 
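
Both kinds of partitioning can be sketched on a list of row dictionaries. The function names and the choice of partitioning criteria in the usage note are assumptions for illustration only.

```python
def horizontal_partition(rows, key_fn):
    """Split rows into separate groups according to key_fn(row)."""
    partitions = {}
    for row in rows:
        partitions.setdefault(key_fn(row), []).append(row)
    return partitions


def vertical_partition(rows, primary_key, column_groups):
    """Split columns into separate files, repeating the primary key in each."""
    files = []
    for columns in column_groups:
        keep = [primary_key] + list(columns)
        files.append([{col: row[col] for col in keep} for row in rows])
    return files
```

For product rows with Product_No, Description, and Finish, horizontal_partition(rows, lambda r: r["Finish"]) yields one group of whole rows per finish code, while vertical_partition(rows, "Product_No", [["Description"], ["Finish"]]) yields two narrower files that can be rejoined on Product_No.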

Partitioning (continued) 

• Advantages of Partitioning: 

– Records used together are grouped together. 

– Each partition can be optimized for performance. 

– Security and recovery can be handled separately for each partition. 

– Partitions can be stored on different disks to reduce contention. 

– Take advantage of parallel processing capability. 

• Disadvantages of Partitioning: 

– Slow retrievals across partitions. 

– Complexity. 
