04.06.2015 Views

Database Modeling and Design

Database Modeling and Design

Database Modeling and Design

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

6.1 Fundamentals of Normalization 109<br />

interdependencies among individual attributes associated with those<br />

tables <strong>and</strong> taking projections (subsets of columns) of larger tables to<br />

form smaller ones.<br />

Let us first review the basic normal forms, which have been well<br />

established in the relational database literature <strong>and</strong> in practice.<br />

6.1.1 First Normal Form<br />

Definition. A table is in first normal form (1NF) if <strong>and</strong> only if all columns<br />

contain only atomic values, that is, each column can have<br />

only one value for each row in the table.<br />

Relational database tables, such as the Sales table illustrated in Figure<br />

6.1, have only atomic values for each row <strong>and</strong> for each column. Such<br />

tables are considered to be in first normal form, the most basic level of<br />

normalized tables.<br />

To better underst<strong>and</strong> the definition for 1NF, it helps to know the difference<br />

between a domain, an attribute, <strong>and</strong> a column. A domain is the<br />

set of all possible values for a particular type of attribute, but may be<br />

used for more than one attribute. For example, the domain of people’s<br />

names is the underlying set of all possible names that could be used for<br />

either customer-name or salesperson-name in the database table in Figure<br />

6.1. Each column in a relational table represents a single attribute,<br />

but in some cases more than one column may refer to different<br />

attributes from the same domain. When this occurs, the table is still in<br />

1NF because the values in the table are still atomic. In fact, st<strong>and</strong>ard SQL<br />

assumes only atomic values <strong>and</strong> a relational table is by default in 1NF. A<br />

nice explanation of this is given in Muller [1999].<br />

6.1.2 Superkeys, C<strong>and</strong>idate Keys, <strong>and</strong> Primary Keys<br />

A table in 1NF often suffers from data duplication, update performance,<br />

<strong>and</strong> update integrity problems, as noted above. To underst<strong>and</strong> these<br />

issues better, however, we must define the concept of a key in the context<br />

of normalized tables. A superkey is a set of one or more attributes,<br />

which, when taken collectively, allows us to identify uniquely an entity<br />

or table. Any subset of the attributes of a superkey that is also a superkey,<br />

<strong>and</strong> not reducible to another superkey, is called a c<strong>and</strong>idate key. A primary<br />

key is selected arbitrarily from the set of c<strong>and</strong>idate keys to be used in an<br />

index for that table.

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!