01.12.2014 Views

here in PDF - Parasol Laboratory, Department of Computer Science ...


CPSC 211 Data Structures & Implementations (c) Texas A&M University [ 0 ]

About These Slides

These slides were developed by

Prof. Jennifer Welch
Department of Computer Science
Texas A&M University
College Station, TX 77843-3112
welch@cs.tamu.edu

during Spring 1999. Comments and suggestions for improvements are welcome.


What are Data Structures?

Data structures are ways to organize data (information).

Examples:

simple variables:

objects:

arrays:

linked lists:

Typically, algorithms go with the data structures to manipulate the data (e.g., the methods of a class).

This course will cover some more complicated data structures:

how

what


Abstract Data Types

An abstract data type (ADT) defines

Similar to a

This course will cover

specifications of

pros and cons of

how the


Specific ADTs

The ADTs to be studied (and some sample applications) are:


How Does C Fit In?

Although data structures are universal (can be implemented in any programming language), this course will use Java and C:

We will learn how to gain the advantages of

Reasons to learn C:

learn

useful

ubiquitous and

Unix

C code can be very

very efficient


Other Topics

Course will emphasize good software development practice:

Course will touch on several more advanced computer science topics that appear later in the curriculum, and fit in with our topics this semester:


Principles of Computer Science

Computer Science is like:

engineering:

science:

math:

However, CS studies

Recurring concepts in computer science are:

layers, hierarchies, information-hiding, abstraction, interfaces

efficiency, tradeoffs, resource usage

reliability, affordability, correctness


Introduction to Data Structures

Data structures are one of the enduring principles in computer science. Why?

1. Data structures are based on the notion of information hiding:

2. A number of data structures are useful in a wide range of applications.


Efficiency Considerations

Since these data structures are so widespread, it’s important to implement them efficiently. Measures of efficiency:

in

We will study tradeoffs, such as

Efficiency will be measured using


Asymptotic Analysis

Actual (wall-clock) time of a program is affected by:

Instead of wall-clock time, look at the pattern of the program’s behavior as the problem size increases. This is called asymptotic analysis.


Big-Oh Notation

Big-oh notation is used to capture the generic

From a practical point of view, you can get the big-oh notation for a function by

1.

2.

Which terms are lower order than others? In increasing order:

Examples:

$4302 =$

$n^3 + n \log n + n^5 + n =$

$34n^3 - 2n \log n + 0.0004n^5 + 5.2n =$

See Appendix B, Section 4 of Standish, or CPSC 311, for mathematical definitions and justifications.
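As a worked sketch of the three examples, assuming the usual practical rule (drop constant multiplicative factors, then keep only the highest-order term), the right-hand sides would come out as follows. This is an illustration of that rule, not a formal derivation:

```latex
\begin{align*}
4302 &= O(1) \\
n^3 + n \log n + n^5 + n &= O(n^5) \\
34n^3 - 2n \log n + 0.0004n^5 + 5.2n &= O(n^5)
\end{align*}
```

In the last line the $n^5$ term dominates even though its coefficient is tiny, because constants do not affect the growth rate.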


Why Multiplicative Constants are Unimportant

An example showing how multiplicative constants become unimportant as n gets very large:

n | $1000 \log n$ | $0.0001 n^2$
2 | |
256 | |
4096 | |
8192 | |
16,384 | |
32,768 | |
1,048,576 | |

Big-oh notation is not always appropriate! If your program is working on small input sizes,
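The table entries can be computed with a few lines of Java. This is a minimal sketch; the class and method names are hypothetical, and it assumes the logarithm is base 2 (the slide does not say which base is meant).

```java
class ConstantsTable {
    // 1000 * log2(n); the base-2 logarithm is an assumption
    static double thousandLog(long n) {
        return 1000.0 * (Math.log(n) / Math.log(2));
    }

    // 0.0001 * n^2
    static double smallQuadratic(long n) {
        return 0.0001 * n * n;
    }

    public static void main(String[] args) {
        long[] sizes = {2, 256, 4096, 8192, 16384, 32768, 1048576};
        for (long n : sizes) {
            // one row per problem size: n, 1000 log n, 0.0001 n^2
            System.out.printf("%10d %14.1f %18.1f%n",
                    n, thousandLog(n), smallQuadratic(n));
        }
    }
}
```

Under the base-2 assumption the quadratic column is smaller up to n = 8192 and larger from n = 16,384 on, despite its much smaller constant.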


Generic Steps

How can you figure out the running time of an algorithm without implementing it, running it on various inputs, plotting the results, and fitting a curve to the data? And even if you did that, how would you know you fit the right curve?

We count generic steps of the algorithm. Each generic step that we count should be

Classifying an assignment statement as a generic step is

Classifying a statement “sort the entire array” as a generic step is


Stack vs. Heap

Memory used by an executing program is partitioned:

the stack:

– When a method begins executing, a piece of the stack (stack frame) is devoted to it.

– There is an entry in the stack frame for

– For variables of primitive type, the data itself is stored

For variables of object type,

– When the method finishes, the method’s stack frame is

the heap: Dynamically allocated memory goes here, including the actual data for objects. Lifetime is


Stack Frames Example

Stack contents after each event (bottom of the stack listed first):

initially: main
main calls p: main, p
p calls q: main, p, q
q returns: main, p
p calls r: main, p, r
r calls s: main, p, r, s
s returns: main, p, r
r returns: main, p
p returns: main


Objects

An object is an entity (e.g., a ball) that has

state:

behavior:

A class is the

Analogy: a class is like an

an object is like an

class defines important

construction is required to

many objects/houses can be created


Data Abstraction

The class concept supports

Similar principles apply as for procedural abstraction:

group

group

separate the issue of

separate the issue of


References

The class of an object is its

Objects are declared differently than are variables of primitive types.

Suppose there is a class called Person.

int total;
Person neighbor;

Declaration of total allocates storage on the

Declaration of neighbor allocates storage on the


Creating Objects

A constructor is a special method of the class that

When a constructor is called,

storage space is allocated

each object gets

the object’s state is

The name of the constructor for class X is X(). Ex:

neighbor = new Person();

The operator new must be put in front of the call to the constructor.

Summary: Declaring a variable of an object type produces


Creating Objects (cont’d)

You can combine the declaration and initialization:

Person neighbor = new Person();

just as you can for primitive types:

int total = 25;


Object Assignment & Aliases

The meaning of assignment is different for objects than it is for primitive types.

int num1 = 5;
int num2 = 12;
num2 = num1;

At the end, num2 holds 5.

Person neighbor = new Person(); // creates object 1
Person friend = new Person(); // creates object 2
friend = neighbor;

At the end, friend and neighbor both refer to object 1 (they are aliases of each other) and nothing refers to object 2 (it is inaccessible).
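The difference can be checked directly in code. The sketch below is illustrative only: the Person class here is a stand-in with a hypothetical name field, added so that a change made through one alias can be observed through the other.

```java
class AliasDemo {
    static class Person {
        String name;                      // hypothetical field, for illustration
        Person(String name) { this.name = name; }
    }

    // Primitive assignment copies the value: later changes to num1 do not affect num2.
    static int primitiveCopy() {
        int num1 = 5;
        int num2 = 12;
        num2 = num1;
        num1 = 99;
        return num2;
    }

    // Object assignment copies the reference: both variables name the same object.
    static boolean aliasesAfterAssignment() {
        Person neighbor = new Person("object 1");
        Person friend = new Person("object 2");
        friend = neighbor;                // object 2 is now unreachable
        friend.name = "renamed";          // change made through one alias...
        return neighbor == friend && neighbor.name.equals("renamed"); // ...seen through the other
    }

    public static void main(String[] args) {
        System.out.println(primitiveCopy());
        System.out.println(aliasesAfterAssignment());
    }
}
```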


Data Abstraction Revisited

As a rule of thumb, referring to instance variables outside the class is

For instance, the implementor of the Person class might decide to store the age

In this case, getAgeInYears must change:

Code that got the age using this method need not change, but code that got the age using .age directly
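One way to picture this, as a sketch only: suppose the implementor switches to keeping the age in months. That representation and the field name ageInMonths are assumptions made for illustration; only the method name getAgeInYears comes from the slide.

```java
class AgeDemo {
    static class Person {
        private int ageInMonths;          // assumed internal representation

        Person(int ageInMonths) {
            this.ageInMonths = ageInMonths;
        }

        // The public method hides the representation: callers still get whole years.
        public int getAgeInYears() {
            return ageInMonths / 12;
        }
    }

    public static void main(String[] args) {
        Person p = new Person(30);
        System.out.println(p.getAgeInYears());
    }
}
```

Callers that use getAgeInYears() keep working after the change, while any caller that read an age field directly would no longer compile, since that field is gone.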


Public vs. Private

You can tailor the ability to access methods and variables from outside the class, using visibility modifiers.

public: the variable or method can

private: the variable or method can

Visibility modifiers go at the beginning of the line that declares the variable or method. Ex:

public static void main(...
private int age;

Rules of thumb:

make instance variables

make instance methods that are part of the public interface of the class

make instance methods that help with internal work of a class


Public vs. Private (cont’d)

Instance variables should be accessible only indirectly via public "get" and "set" methods. Ex:

getAgeInYears()

Group together all the private variables/methods, and all the public ones when you format your program.


Specification vs. Implementation

Users of a class should rely only on the specification of the class. They are allowed to

declare

create

invoke

Implementors of a class should

define

hide

protect

feel free to


Inheritance

Inheritance lets a programmer derive a new class from an existing class. New class can

use

modify

have

Thus inheritance promotes software reuse. It is a defining characteristic of

Terminology:

Class A is derived from (or, inherits from) another class B

A is called subclass or child class.

B is called superclass or parent class.


Benefits of Inheritance

Inheritance is particularly useful in large software projects:

– saves

– provides

– supports


Costs of Inheritance

– Usually this disadvantage is outweighed by

– Once system is working,


Inheritance in Java

To declare that a class is a subclass of another class:

class <child-class> extends <parent-class> {
    ... // define the child-class
}

child class inherits

child class inherits

child class does NOT inherit

child class does NOT inherit

Inherited variables and methods can be used in the child class

Inheritance is a one-way street!
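A minimal sketch of the syntax in use. The classes Animal and Dog and their members are hypothetical, chosen only to show a child class using inherited members and adding its own:

```java
class InheritanceDemo {
    static class Animal {                       // hypothetical parent class
        String name;
        Animal(String name) { this.name = name; }
        String describe() { return name + " is an animal"; }
    }

    static class Dog extends Animal {           // hypothetical child class
        // Constructors are not inherited, so the child declares its own
        // and passes the name up to the parent's constructor.
        Dog(String name) { super(name); }

        // New method; it uses the inherited variable name directly.
        String fetch() { return name + " fetches"; }
    }

    public static void main(String[] args) {
        Dog d = new Dog("Rex");
        System.out.println(d.describe());       // inherited method
        System.out.println(d.fetch());          // method added by the child
    }
}
```

The one-way street shows up here too: an Animal variable cannot call fetch(), because the parent knows nothing about what its children add.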


Protected Visibility

private:

public:

This makes it dangerous to inherit variables, since normally instance variables should not be made accessible outside the class.

The solution is

protected:


Overriding Methods

When a child class defines a method with the same name and signature (sequence of parameters) as the parent, the child’s version overrides the parent’s version.

Useful when

Polymorphism means that

These are not necessarily the same, since a variable can refer to any object whose class is a descendant of the variable’s class.

When in doubt, draw a memory diagram!
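The sketch below illustrates overriding with two hypothetical classes, Shape and Circle. The method that runs is chosen by the class of the object actually referred to, not by the declared type of the variable:

```java
class OverrideDemo {
    static class Shape {                         // hypothetical parent class
        String name() { return "shape"; }
    }

    static class Circle extends Shape {          // hypothetical child class
        @Override
        String name() { return "circle"; }       // same name and signature: overrides
    }

    // The parameter's declared type is Shape, but the object may be a Circle.
    static String nameOf(Shape s) {
        return s.name();
    }

    public static void main(String[] args) {
        Shape s = new Circle();                  // variable type and object class differ
        System.out.println(nameOf(s));           // prints "circle"
    }
}
```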


Abstract Classes: Motivation

Consider a database for a veterinarian to keep track of medical and billing information for each patient.

Each patient is someone’s pet (e.g., dog, bird).

Some aspects of the vet’s business are independent of the particular species (e.g., billing, owner info).

Some aspects depend critically on the species (e.g., the vaccination schedule, diet recommendations).

An obvious organization is to have a

Note that it does not make sense to create a Pet object

The Pet class is used to


Rules for Abstract Classes and Methods

Only instance methods can be declared

Any class with an abstract method must be declared

A class may be declared abstract

An abstract class cannot

A non-abstract subclass of an abstract class must

If a subclass of an abstract class does not implement all of the abstract methods that it inherits, then

Since an abstract class cannot be instantiated, its variables and methods are not directly used. But they can be
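The veterinarian example might be sketched as follows. Only the class name Pet comes from the slides; the Dog subclass, the field, and the method names are hypothetical, and the schedule text is a placeholder rather than real veterinary data.

```java
class VetDemo {
    static abstract class Pet {
        private final String owner;              // species-independent information

        Pet(String owner) { this.owner = owner; }

        // Concrete method shared by every species.
        String billingLabel() { return "Bill to " + owner; }

        // Abstract method: each species must supply its own version.
        abstract String vaccinationSchedule();
    }

    static class Dog extends Pet {               // non-abstract subclass
        Dog(String owner) { super(owner); }

        @Override
        String vaccinationSchedule() { return "dog schedule (placeholder)"; }
    }

    public static void main(String[] args) {
        Pet patient = new Dog("Ann");            // new Pet("Ann") would not compile
        System.out.println(patient.billingLabel());
        System.out.println(patient.vaccinationSchedule());
    }
}
```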


Declaring an Interface

An interface is an abstract class taken to the extreme. It is like an abstract class in which

interface <interface-name> {
    // public final
    // public abstract
}

An interface provides

a collection of

a collection of

For example:


Implementing an Interface

The syntax for “inheriting from” (called implementing) an interface I is:

class B implements I { ... }

For example:

The class Account

can access the

must provide an implementation of
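A minimal sketch of a class implementing an interface. The slide names a class Account but does not show which interface it implements, so the interface InterestBearing, its constant RATE, and its method are all hypothetical:

```java
class InterfaceDemo {
    interface InterestBearing {                  // hypothetical interface
        double RATE = 0.05;                      // implicitly public, static and final
        double yearlyInterest();                 // implicitly public and abstract
    }

    static class Account implements InterestBearing {
        private final double balance;

        Account(double balance) { this.balance = balance; }

        // The implementing class must define every method of the interface,
        // and it can use the interface's constants directly.
        public double yearlyInterest() {
            return balance * RATE;
        }
    }

    public static void main(String[] args) {
        InterestBearing a = new Account(200.0);
        System.out.println(a.yearlyInterest());
    }
}
```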


Abstract Classes vs. Interfaces

An abstract class can be used as a repository of

A class can implement

Both abstract classes and interfaces can be used to


Object-Oriented Design

The design of a software system is an iterative process.

choose

develop

previous step may indicate that

develop

etc.

As the design matures, objects are abstracted into classes:

group

put

determine

Initial design effort focuses on the overall structure of the program. The algorithms for the methods are specified using pseudocode. Actual coding begins


Deciding on Objects and Classes

Make some guesses about what the objects in the system are and try to arrange them into groups (which will be the classes). Although you should put serious thought into this, don’t try to do this perfectly on the first pass.

Rule of Thumb:

Later you may need

As you come up with the objects, some details (variables and methods) will be obvious. Document these and test them out with scenarios:

A scenario is a


Linked List

Linked lists are useful when

Linked lists are an example of

Separate blocks of storage are

Linked representations are an important alternative to

Many key abstract data types (lists, stacks, queues, sets, trees, tables) can be represented with either

Important to understand the


Pointers

Pointers in Java are called

However, you cannot


Linear Linked Lists

The list consists of a series of

Each node contains

To realize this idea in Java:

each

class

another class


Linear Linked Lists (cont’d)

Here is a diagram of the heap:

Space complexity:


Linked List Example: Node Class

For a linked list of books, first define a class that represents individual list elements (nodes).

The type of the link variable is the same as the class being defined:


Linked List Example: List Class

Then define a class that represents


Linked List Operations

What should be the operations on a linked list?

Add some instance methods to the BookList class:


Using a Linked List

Example:


Inserting at the Front of a Linked List

Pseudocode:

1.

2.

In Java (assuming the parameter is not null):
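A minimal sketch of front insertion for the book list. BookList is the class name used in these slides; the node class name BookNode, the title field, and the method names are assumptions, and the two steps shown are one plausible reading of the pseudocode:

```java
class FrontInsertDemo {
    static class BookNode {                      // hypothetical node class
        String title;                            // hypothetical data field
        BookNode link;                           // next node, or null at the end of the list
        BookNode(String title) { this.title = title; }
    }

    static class BookList {
        BookNode first;                          // null when the list is empty

        // Assumes node is not null.
        void insertAtFront(BookNode node) {
            node.link = first;                   // step 1: new node points at the old first node
            first = node;                        // step 2: the list now starts at the new node
        }

        // Helper for inspection: titles from first to last, separated by spaces.
        String titles() {
            StringBuilder sb = new StringBuilder();
            for (BookNode cur = first; cur != null; cur = cur.link) {
                if (sb.length() > 0) sb.append(' ');
                sb.append(cur.title);
            }
            return sb.toString();
        }
    }

    public static void main(String[] args) {
        BookList list = new BookList();
        list.insertAtFront(new BookNode("A"));
        list.insertAtFront(new BookNode("B"));
        System.out.println(list.titles());       // prints "B A"
    }
}
```

Both steps are single assignments, so the work done does not depend on the length of the list.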


Inserting at the Front of a Linked List (cont’d)

What happens if we do step 1 and step 2 in the opposite order?

Time Complexity:


Inserting at the End of a Linked List

First, assume the list is empty (i.e., first equals null).

1.

2.

Now, assume the list is not empty (i.e., first does not equal null).

1.

2.

How do we do step 1?


Inserting at the End of a Linked List (cont’d)

Time Complexity:


Using a Last Pointer

To improve running time, keep a pointer to the last node in the list class, as well as the first node.

Time Complexity:


Using a Last Pointer (cont’d)
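A minimal sketch of insertion at the end when the list keeps a last reference as well as first. The class names, the int data field, and the method name are hypothetical:

```java
class LastPointerDemo {
    static class Node {
        int data;                                // hypothetical data field
        Node link;                               // next node, or null at the end
        Node(int data) { this.data = data; }
    }

    static class IntList {
        Node first;                              // null when the list is empty
        Node last;                               // null when the list is empty

        // Assumes node is not null.
        void insertAtEnd(Node node) {
            node.link = null;                    // the new node becomes the end of the list
            if (first == null) {
                first = node;                    // empty list: the new node is also the first
            } else {
                last.link = node;                // non-empty list: hook it after the old last node
            }
            last = node;                         // in both cases the new node is now last
        }
    }

    public static void main(String[] args) {
        IntList list = new IntList();
        for (int i = 1; i <= 3; i++) list.insertAtEnd(new Node(i));
        System.out.println(list.first.data + " ... " + list.last.data);
    }
}
```

No traversal is needed, so each insertion takes the same number of steps regardless of the list's length.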


Deleting Last Node from Linked List

Suppose we want to delete the node at the end of the list and return the deleted node.

First, let’s handle the boundary conditions:

If the list is empty,

If the list has only one element


Deleting Last Node from Linked List (cont’d)

Suppose the list has at least two elements.

First attempt:

1.

2.

3.

...

Step 1 can be done as before.

return this

What about step 2?


Deleting Last Node from Linked List (cont’d)

Time Complexity:

Would it help to keep a last pointer?
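One possible way to delete the last node of a singly linked list, sketched under assumptions: the class and method names are hypothetical, and returning null for an empty list is a choice made here, not something the slides specify.

```java
class DeleteLastDemo {
    static class Node {
        int data;                                // hypothetical data field
        Node link;                               // next node, or null at the end
        Node(int data, Node link) { this.data = data; this.link = link; }
    }

    static class IntList {
        Node first;                              // null when the list is empty

        Node deleteLast() {
            if (first == null) return null;      // empty list: nothing to delete (assumed behavior)
            if (first.link == null) {            // one element: the list becomes empty
                Node only = first;
                first = null;
                return only;
            }
            Node prev = first;                   // at least two elements:
            while (prev.link.link != null) {     // walk to the next-to-last node
                prev = prev.link;
            }
            Node removed = prev.link;
            prev.link = null;                    // the next-to-last node becomes the end
            return removed;
        }
    }

    public static void main(String[] args) {
        IntList list = new IntList();
        list.first = new Node(1, new Node(2, new Node(3, null)));
        System.out.println(list.deleteLast().data);   // prints 3
    }
}
```

In this sketch the loop visits every node but the last, so the time grows in proportion to n. A last reference on its own would not remove that walk in a singly linked list, because the node before the last one still has to be found.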


Linked Lists Pitfalls

Check that a link is not null before following it!

Example:

Mark end of list

Be careful with boundary cases!

Draw memory diagrams!

Don’t lose access to needed objects!


Linked Lists vs. Arrays

Space complexity:

Time Complexity (n data items):

operation | singly linked | singly linked, last ptr | doubly linked | doubly linked, last ptr | array
insert front | | | | |
insert end | | | | |
delete first | | | | |
delete last | | | | |
search | | | | |


Linked Lists vs. Arrays (cont’d)

Suppose the items in the sequence are in sorted order. Then data items must be inserted in the correct place. But perhaps this will make searching for an item easier.

Break the insertion process into two parts:

1. search

2. insert

operation | singly linked | singly linked, last ptr | doubly linked | doubly linked, last ptr | array
search | | | | |
insert | | | | |


Linked Lists vs. Arrays (cont’d)

Tradeoff:

linked list:

– insert is

– search is

because nodes

arrays:

– insert is

– search is

because nodes

Binary search cannot be used on

Later we will see some other data structures that try to


Other Linked Structures

We don’t have to restrict ourselves to just having one link instance variable per node. We can get arbitrarily complicated linked structures.

Some of the more common and useful ones are:

doubly linked list:

rings:

trees:

general graphs:


Recursion

Idea of recursion is closely related to the principle of

Figure out how to

Assume you have a

Figure out how to

This is also an application of

Rules for recursive programs:

There must be

Recursive call(s) must


Stack Frames for Recursive Methods

When a recursive method is executed,

Example:

The factorial of n, represented n!, is calculated as $n \times (n-1) \times (n-2) \times \cdots \times 2 \times 1$.

To compute n!:


Stack Frames for Factorial Example

Stack frames when calling fact(4):
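A minimal recursive factorial, using the method name fact from the slide. Treating every n of 1 or less as the stopping case is a choice made here; the return type long is also an assumption.

```java
class FactorialDemo {
    static long fact(int n) {
        if (n <= 1) {
            return 1;                  // stopping case: no further recursive call
        }
        return n * fact(n - 1);        // recursive call on a smaller problem
    }

    public static void main(String[] args) {
        System.out.println(fact(4));   // prints 24
    }
}
```

Calling fact(4) creates frames for fact(4), fact(3), fact(2) and fact(1) in turn. Each frame holds its own copy of n and is removed when its call returns, so the results 1, 2, 6 and 24 are produced as the stack unwinds.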


Reversing a Linked List Recursively

To find a recursive solution, break the problem down into a smaller problem. Let the list consist of nodes $x_1, x_2, \ldots, x_n$.

One idea:

1. Reverse

2. Put

Step 1 solves a smaller problem; step 2 does a little more work to solve the larger problem.

A similar idea:

1. Reverse

2. Put

Stopping case?


Reversing a Linked List Recursively (cont’d)

abstract class Node {
    Node link;
}

class LinkedList {
    Node first;
    ...
    void reverseList() {
        first = reverse(first);
    }
}

reverseList is an instance method that

Note a common occurrence:


Reversing a Linked List Recursively (cont’d)

reverse takes as a parameter

reverse returns


Concatenating Two Lists

Method concat appends the list starting with node b to the end of the list starting with node a. It returns a reference to the first node in the resulting list.

Time Complexity: To reverse a list of n nodes takes
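One way reverse and concat could fit together, as a sketch under assumptions. The method names come from the slides, but the bodies are reconstructions. The Node class here is concrete and carries a hypothetical int data field so the result can be inspected, and returning b when a is null is a choice made here.

```java
class ReverseDemo {
    static class Node {
        int data;                                // hypothetical data field
        Node link;                               // next node, or null at the end
        Node(int data, Node link) { this.data = data; this.link = link; }
    }

    // Appends the list starting at b to the end of the list starting at a.
    static Node concat(Node a, Node b) {
        if (a == null) return b;                 // assumed behavior for an empty first list
        Node cur = a;
        while (cur.link != null) cur = cur.link; // walk to the last node of a
        cur.link = b;
        return a;
    }

    // Reverses the list starting at x and returns the new first node.
    static Node reverse(Node x) {
        if (x == null || x.link == null) return x;   // stopping case: 0 or 1 nodes
        Node rest = x.link;
        x.link = null;                           // detach the first node
        return concat(reverse(rest), x);         // reversed rest, then the old first node
    }

    public static void main(String[] args) {
        Node r = reverse(new Node(1, new Node(2, new Node(3, null))));
        for (Node cur = r; cur != null; cur = cur.link) System.out.print(cur.data + " ");
    }
}
```

In this particular sketch every level of the recursion walks the already reversed part once inside concat, so the total work grows roughly as $n^2$.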


Figure for Reversing a Linked List Recursively


Reversing an Array Recursively

Let A be an array of size n. To reverse A, we must change which indexes are occupied by which data, so that at the end:

A[0] contains

A[1] contains

etc.

We can follow the ideas from the linked list:

1. save

2. recursively cause

3. store

This breaks the problem of size n down into a subproblem of size $n - 1$.

Stopping case:


Reversing an Array Recursively (cont’d)
The following reverses the elements of A starting at index start:
The top level call is:

Figure for Reversing an Array Recursively

Towers of Hanoi
Towers of Hanoi is an example of a problem that is much easier to solve using recursion than not using recursion.
There are 3 pegs and n disks, all of different sizes
Initially all disks are on the start peg, stacked in decreasing size, with largest on bottom and smallest on top.
We must move all the disks to the end peg
The third peg
Example: n = 2. Solution is:
1. Move
2. Move
3. Move
For larger n, it becomes difficult to figure out.


Recursive Solution to Towers of Hanoi
Using recursion can help. Suppose someone gives us a method M to move n - 1 disks. We can use it to solve the problem for n disks as follows:
1. Move
2. Move
3. Move
Steps 1 and 3 will be done
Stopping case?

Figure for Towers of Hanoi

Recursive Solution to Towers of Hanoi (cont’d)
The output of the program will be a list of instructions.
To call this method, suppose you have 4 disks and you want to use peg 1 as the start peg, peg 3 as the finish peg, and peg 2 as the spare peg:
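The recursive idea can be sketched as follows. This is only an illustration under stated assumptions: it is written in C (the language the later slides introduce) rather than Java, the name hanoi, its parameter order, its move-count return value, and the choice of n == 0 as the stopping case are all hypothetical and not taken from the slides.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical sketch: prints one instruction per move and returns
   the number of moves made. Pegs are identified by integers. */
long hanoi(int n, int start, int finish, int spare) {
    long moves;
    assert(n >= 0);
    if (n == 0) {
        return 0;                                    /* assumed stopping case */
    }
    moves = hanoi(n - 1, start, spare, finish);      /* step 1: clear the way */
    printf("Move disk %d from peg %d to peg %d\n", n, start, finish);
    moves = moves + 1;                               /* step 2: largest disk  */
    moves = moves + hanoi(n - 1, spare, finish, start);  /* step 3            */
    return moves;
}
```

With these assumptions, the call described above would be hanoi(4, 1, 3, 2), which prints 15 instructions.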


Time Complexity of Towers of Hanoi Solution
Time Complexity: Asymptotically proportional to the number of
Each instantiation of the method
To count the number of instantiations, draw a
Number of vertices in the tree is
Therefore time complexity is

Parsing Arithmetic Expressions
An important part of a compiler is the parser, which checks whether
An important part of this problem is to check whether
a + (b - (x / y))
a + + b / z
(a)) c
To simplify the problem:
Assume that the operands are
Only consider operators
The correct syntax for arithmetic expressions can be described using


A Grammar for Arithmetic Expressions
Sample Rules: (| means “or”)
1.
2.
3.
Here are some derivations:

Recursive Parsing Algorithm
Idea is to try to obtain an expression from the input. To do this, try to obtain from the input
To obtain a term from the input (starting at the current position), try to obtain
To obtain a factor from the input (starting at the current position), try to obtain


Recursive Parsing Algorithm (cont’d)
At the top level:

boolean valid(String input) {
    String remainder = getExpr(input);
    return ((remainder != null) &&
            (remainder.length() == 0));
}

getExpr recognizes an expression at the beginning of input and returns the rest of the string, which will be the empty string if nothing is left over. If a syntax error is encountered, it returns null. (Does not handle white space in the input.)

Recursive Parsing Algorithm (cont’d)


Abstract Data Types
An abstract data type (ADT) defines entities that have
ADTs provide the benefits of
There is a strict separation between
This separation facilitates
ADTs are easily achieved in

ADT Example: Priority Queue Specification
The priority queue ADT is useful in many situations. Here is its specification:
The state is
The operations on a priority queue are:
–
–
–
Note that there is no operation to
Example applications:
Pay
Provide


Using a Priority Queue to Sort a List of Integers
Even without knowing anything about how a priority queue might be implemented, we can take advantage of its operations to solve other problems.
For example, to sort a list of numbers:
Insert
Successively
Store

Implementing a Priority Queue with an Array

Implementing a Priority Queue with a Linked List
Pseudocode:
To insert an element:
To remove the highest priority element:
– Scan
– When
Time is
Asymptotic running times are
Time to sort is
Can we do things faster by keeping the array, or linked list, elements in sorted order?
Warning:


Implementing a PQ with a Sorted Array
Keep the array elements in increasing order of priority. (If highest priority is smallest element, then elements will be in decreasing order).
Pseudocode:
To insert an element:
To remove the highest priority element:

Implementing a PQ with a Sorted Linked List
Pseudocode:
To insert an element:
To remove the highest priority element:
Asymptotic times are

Generic PQ Implementation Using Java
To avoid rewriting the priority queue implementation for every different kind of element (integer, double, String, user-defined classes, etc.), we can use Java’s interface feature.
All that is required is

Using the ComparisonKey Interface
Change the specification of the PriorityQueue class to consist of a collection of
Any class that
Define a class called PQItem that
sortPQ, the sorting algorithm that uses a priority queue, can


Generic Implementation of PQ with Array

class PriorityQueue {
    private ComparisonKey[] A =
        new ComparisonKey[100];              // int -> CK
    private int next;

    PriorityQueue() {
        next = 0;
    }

    public void insert(ComparisonKey x) {    // int -> CK
        A[next] = x;
        next++;
    }

    public ComparisonKey remove() {          // int -> CK
        ComparisonKey high = A[0];           // int -> CK
        int highLoc = 0;
        for (int cur = 1; cur < next; cur++) {
            if (high.compareTo(A[cur]) ==
                ComparisonKey.LOWER) {       // use compareTo method
                high = A[cur];
                highLoc = cur;
            }
        }
        A[highLoc] = A[next-1];
        next--;
        return high;
    }
}


Implementing the Generic PQItem
Here is a possible PQItem class for integers. Note
For a PQItem class for strings:
make
make
the method

Generic PQItem’s (cont’d)
This approach is particularly powerful since we can
Suppose the items are
One form of priority might be
Another form might be
All those decisions will be encapsulated inside the


Sorting with Generic PQ
Finally, here is the sorting algorithm:

void sortPQ (ComparisonKey[] A) {
    int n = A.length;
    PriorityQueue pq =
        new PriorityQueue();
    for (int i = 0; i < n; i++)
        pq.insert(A[i]);
    for (int i = 0; i < n; i++)
        A[i] = pq.remove();
}

The only difference from before is
IMPORTANT TO NOTICE:
The PriorityQueue class
The sortPQ method

Importance of Modularity and Information Hiding
Why is it valuable to be able to do these kinds of things?
The public/private visibility modifiers of Java, and the discipline of not making the internal details be available outside are forms of
Information hiding promotes modular programming: you can
The key to abstraction is


Compiling and Running a C Program in Unix
Simple scenario in which your program is in a single file: Suppose you want to name your program test.
1. edit
2. compile
3. if
4. run
5. if

Structure of a C Program
A C program is a list of
Every C program must contain
Functions are
The
For
The \n is
Comments

A Useful Library
See the Reek book (especially Chapter 16) for a description of what you can do with built-in libraries. In addition to stdio.h,
stdlib.h lets you use functions for, e.g.,
–
–
–
–
math.h provides
string.h has


Printf
The function printf is used to print to the standard output (screen):
It can take a
The first argument must
The first argument might
A
Following the first argument is a
Example:
Output is:
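A minimal sketch of a printf call with a format string followed by the values to substitute. The function name print_report and the message text are hypothetical; only printf itself and its conversion codes come from the C standard library.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical example: prints one line and returns printf's result,
   which is the number of characters written. */
int print_report(int count, double average) {
    assert(count >= 0);
    /* %i is replaced by the int argument, %.2f by the double argument
       rounded to two decimal places; \n ends the line. */
    return printf("Read %i values, average %.2f\n", count, average);
}
```

Calling print_report(3, 2.5) prints the line "Read 3 values, average 2.50".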


Variables and Arithmetic Expressions
The main numeric data types that we will use are:
Variables are declared and manipulated in arithmetic expressions pretty much as in Java. For instance,
However, in C,

Reading from the Keyboard
The function scanf reads in data from the keyboard.
scanf takes a
The first argument is
Each
After the first argument is a
The subsequent arguments must each be
The code for an
When you run this program, it will wait for you to enter two integers, and then continue. The integers can be on the same line separated by a space, or on two lines.
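The behavior described above can be sketched as follows. To keep the example checkable without a keyboard, it uses sscanf, which applies the same format rules as scanf to a string; the function name sum_two_ints is hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: reads two integers from text and stores their sum.
   Returns 1 on success and 0 if two integers could not be read. */
int sum_two_ints(const char* text, int* sum) {
    int a, b;
    assert(text != NULL && sum != NULL);
    /* Each %d matches one integer; the arguments after the format
       string are addresses, obtained with the & operator. */
    if (sscanf(text, "%d %d", &a, &b) != 2) {
        return 0;
    }
    *sum = a + b;
    return 1;
}
```

With keyboard input, the equivalent call would be scanf("%d %d", &a, &b). Because white space in the input is skipped before each integer, "3 4" on one line and "3" and "4" on two lines are read the same way.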


Functions
Functions in C are pretty much like methods in Java (dealing only with primitive types). Example:

#include <stdio.h>

double times2 (double x) {
    x = 2*x;
    return x;
}

int main (void) {
    double y = 301.4;
    printf("Original value is %f; final value is %f.\n",
           y, times2(y));
    return 0;
}

Functions must be
As in Java, parameters are
As in Java, if the function does not return any value,
Parameters and local variables of functions


Recursive Functions
Recursion is essentially the same as in Java.
The only difference is if you have mutually recursive functions, also called indirect recursion: for instance, if function A calls function B, while B calls A.
Then you have a problem with the requirement that functions be defined before they are used.
You can get around this problem with
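One standard way in C to let two functions call each other is to declare a prototype for the second function before the first one uses it. The sketch below illustrates this with two hypothetical functions, is_even and is_odd, which are not from the slides.

```c
#include <assert.h>

/* Prototype: announces is_odd's signature before its definition,
   so is_even can call it. */
int is_odd(int n);

int is_even(int n) {
    assert(n >= 0);
    if (n == 0) {
        return 1;
    }
    return is_odd(n - 1);     /* allowed because of the prototype above */
}

int is_odd(int n) {
    assert(n >= 0);
    if (n == 0) {
        return 0;
    }
    return is_even(n - 1);    /* is_even is already defined at this point */
}
```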


Global Variables and Constants
C also provides global variables.
A global variable is defined
A global variable can be used
Generally, global variables that can be changed are frowned upon, as contributing to errors. However, global variables are very appropriate for constants. Constants are defined using macros:
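A minimal sketch of constants defined with #define. The names MAX_STUDENTS, PI, and circle_area are hypothetical examples, not taken from the slides.

```c
#include <assert.h>

/* A macro definition has no '=' and no ';'. The preprocessor replaces
   each later use of the name with the text that follows it. */
#define MAX_STUDENTS 100
#define PI 3.14159

/* Hypothetical function that uses one of the constants. */
double circle_area(double radius) {
    assert(radius >= 0.0);
    return PI * radius * radius;
}
```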


Boolean Expressions
The operators to compare two values are the same as in Java:
However, instead of returning a boolean value, they return
Actually, C interprets
Thus the analog in C of a boolean expression in Java is any expression that produces
As in Java, boolean expressions can be operated on with
Some examples:
(10 == 3) evaluates to
!(10 == 3) evaluates to
!( (x < 4) || (y == 5) ) : if x is 10 and y is 5, then this evaluates to

If Statements and Loops
Given the preceding interpretation of “boolean expression”, the following statements are the same in C as in Java:
Since Boolean expressions are essentially integers, you can have a for statement like this in C:

for (int count = 99; count; count--) {
    ...
}

count is initialized to
the loop is executed
count is
This loop is executed


Switch
C has a switch statement that is like that in Java:

switch ( <integer-expression> ) {
    case <integer-constant> :
        <statements>
        break;
    case <integer-constant> :
        <statements>
        break;
    ...
    default : <statements>
}

Don’t forget the break statements!
The integer expression must produce a value belonging to any of the integral data types (various size integers and characters).


Enumerations
This is something neat that Java did not have at the time (Java later added enum types in version 5).

An enumeration is a way to give
For instance, suppose you need to have some codes in your program to indicate whether a library book is checked in, checked out, or lost. Instead of

#define CHECKED_IN 0
#define CHECKED_OUT 1
#define LOST 2

you can use an enumeration declaration:
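A minimal sketch of such a declaration, reusing the three names from the macros above. The helper can_lend is a hypothetical addition for illustration.

```c
#include <assert.h>

/* Anonymous enumeration: the names receive the values 0, 1, 2 in order,
   matching the three #define lines. */
enum { CHECKED_IN, CHECKED_OUT, LOST };

/* Hypothetical helper: a book can be lent only if it is checked in. */
int can_lend(int status) {
    assert(status >= CHECKED_IN && status <= LOST);
    return status == CHECKED_IN;
}
```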


Using an Enumeration in a Switch Statement

int status;
/* some code to give status a value */
switch (status) {
    case CHECKED_IN :
        /* handle a checked in book */
        break;
    case CHECKED_OUT :
        /* handle a checked out book */
        break;
    case LOST :
        /* handle a lost book */
        break;
}

Enumeration Data Type
You can give a name to an enumeration and thus create an enumeration data type. The syntax is:

enum <type-name> { <list-of-names> };

For example:

enum book_status { CHECKED_IN, CHECKED_OUT, LOST };

Why bother to do this?


Type Synonyms
The enumeration type is our first example of a user defined type in C.
It’s rather unpleasant to have to carry around the word enum all the time for this type.
Instead, you can give a name to this type you have created, and subsequently just use that type, without having to keep repeating enum. For example:
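A minimal sketch of a type synonym created with typedef. The synonym name BookStatus and the helper return_book are hypothetical; enum book_status is the type from the earlier slide.

```c
#include <assert.h>

enum book_status { CHECKED_IN, CHECKED_OUT, LOST };

/* BookStatus (a hypothetical name) is now a synonym for enum book_status. */
typedef enum book_status BookStatus;

/* Hypothetical helper: returning a checked-out book checks it in;
   any other status is left unchanged. */
BookStatus return_book(BookStatus s) {
    assert(s == CHECKED_IN || s == CHECKED_OUT || s == LOST);
    return (s == CHECKED_OUT) ? CHECKED_IN : s;
}
```

Variables can then be declared as BookStatus s; instead of enum book_status s;.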


Structures
C also gives you a way to create more general types of your own, as structures. These are essentially like objects in Java, if you just consider the instance variables.
A structure groups together related data items that can be of different types.
The syntax to define a structure is:

Storage on the Stack
The statement

struct student stu;

causes the entire stu structure to be stored

Using typedef with Structures
When using the structure type, you have to carry along the word struct.
To avoid this, you can use a
A more concise way to do this is:
Now you can create a Student variable:

Using a Structure
You can access the pieces of a structure using dot notation (analogous to accessing instance variables of an object in Java):
You can also have the entire struct on either the left or the right side of the assignment operator:
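The points above can be sketched together. The structure definition is an assumption based on the fields that later slides use (age and grade_point); the function copy_demo is hypothetical.

```c
#include <assert.h>

/* Assumed definition: structure tag student, type synonym Student. */
typedef struct student {
    int age;
    double grade_point;
} Student;

/* Hypothetical demonstration of dot notation and whole-structure copy.
   Returns how much older the modified copy is than the original. */
int copy_demo(void) {
    Student stu;
    Student other;
    stu.age = 20;                  /* dot notation selects one field */
    stu.grade_point = 3.4;
    other = stu;                   /* assignment copies every field */
    other.age = other.age + 1;     /* changes only the copy */
    assert(other.grade_point == stu.grade_point);
    return other.age - stu.age;
}
```

After the assignment, stu and other are two separate structures, so copy_demo returns 1.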


Figure for Copying a Structure

Passing a Structure to a Function
Structures can be passed as parameters to functions:
Then you can call the function:
But if you put the following line of code after the printf in print_info:
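A sketch of what passing a structure by value implies. The body of print_info and the extra assignment are assumptions made for illustration; the slides' own code is not reproduced here.

```c
#include <assert.h>
#include <stdio.h>

typedef struct student {
    int age;
    double grade_point;
} Student;

/* Hypothetical body: s is a copy of the caller's structure. */
void print_info(Student s) {
    assert(s.age >= 0);
    printf("Age is %i, grade point is %f\n", s.age, s.grade_point);
    s.age = 0;    /* hypothetical extra line: changes only the local copy */
}
```

Because the whole structure is copied into s when the function is called, the caller's variable still has its original age after print_info returns.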


Returning a Structure From a Function
You can return a structure from a function also. Suppose you have the following function:
Now you can call the function:

Figure for Returning a Structure from a Function
The copying of formal parameters and return values can be avoided by

Arrays
To define an array:
For example:
Unlike Java,
Unlike Java,
Unlike Java,
As in Java,
As in Java,

Arrays (cont’d)
Two things you CAN do:
If you have an array of structures,
You can declare a two-dimensional array (and higher): e.g.,
Two things you CANNOT do:
We’ll see how to accomplish these tasks


Pointers in C
Pointers are used in C to
circumvent
– copying of parameters and return values
– lasting changes
access
allow
For each data type T,
For instance,
declares iptr to be of type “pointer to int”. iptr refers to a
Actually, most C programmers write it as:

Addresses and Indirection
Computer memory is
Each variable is
The address of the variable is
iptr refers to
*iptr refers to
Applying the * operator is called

The Address-Of Operator
We saw the & operator in scanf. It

int i;
int* iptr;
i = 55;
iptr = &i;
*iptr = *iptr + 1;

Last line gets data out of location whose address is in iptr, adds 1 to that data, and stores result back in location whose address is in iptr.


Comparing Indirection and Address-Of Operators
As a rule of thumb:
Indirection:
– It CANNOT
– It CAN
Address-Of:
– It CAN
– It CANNOT

Pointers and Structures
Remember the struct type Student, which has an int age and a double grade_point:

Student stu;
Student* sptr;
sptr = &stu;

To access variables of the structure:
There is a “shorthand” for this notation:
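Both notations can be sketched side by side. The Student definition is assumed from the fields named above, and the function age_two_ways is hypothetical.

```c
#include <assert.h>

typedef struct student {
    int age;
    double grade_point;
} Student;

/* Hypothetical demonstration: the same field reached in two ways. */
int age_two_ways(void) {
    Student stu;
    Student* sptr = &stu;
    (*sptr).age = 20;               /* indirection first, then dot notation */
    sptr->age = sptr->age + 1;      /* -> is shorthand for (*sptr).age */
    sptr->grade_point = 3.4;
    assert((*sptr).age == sptr->age);
    return stu.age;                 /* stu itself was changed through sptr */
}
```

The parentheses in (*sptr).age are needed because the dot operator binds more tightly than *.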


Passing Pointer Variables as Parameters
You can pass pointer variables as parameters.

void printAge(Student* sp) {
    printf("Age is %i",sp->age);
}

When this function is called,
1. a Student* variable:
or
2. apply the & operator to a Student variable:
C still uses call by value to pass pointer parameters, but because they are pointers, what gets copied are
Data coming in to the function is not copied.

Passing Pointer Variables as Parameters (cont’d)
Now we can

void changeAge(Student* sp, int newAge) {
    sp->age = newAge;
}

You can also
Old initialize with copying:

Student initialize(int old, double gpa) {
    Student st;
    st.age = old;
    st.grade_point = gpa;
    return st;
}

More efficient initialize using pointers:

Passing Pointer Variables as Parameters (cont’d)
Using pointers is an optimization in previous case. But it is

void swapAges (Student* sp1, Student* sp2) {
    int temp;
    temp = sp1->age;
    sp1->age = sp2->age;
    sp2->age = temp;
}

To call this function:
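A sketch of one way to call swapAges, assuming the Student type with fields age and grade_point. The function is repeated so the example is self-contained; the added assert and the helper swap_demo are hypothetical.

```c
#include <assert.h>

typedef struct student {
    int age;
    double grade_point;
} Student;

/* swapAges as on the slide, with a precondition check added. */
void swapAges(Student* sp1, Student* sp2) {
    int temp;
    assert(sp1 && sp2);
    temp = sp1->age;
    sp1->age = sp2->age;
    sp2->age = temp;
}

/* Hypothetical caller: passes the addresses of two Student variables. */
int swap_demo(void) {
    Student a = {20, 3.4};
    Student b = {31, 3.9};
    swapAges(&a, &b);               /* & yields a Student* for each variable */
    return a.age == 31 && b.age == 20;
}
```

Since the function receives addresses rather than copies, the exchange is visible in the caller's variables.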


Pointers and Arrays
The name of an array is
It is a
To reference array elements, you can use
or
What is going on with the pointer notation?
a refers to
*a refers to
a+1 refers to
*(a+1) refers to
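The two ways of reaching array elements can be compared in a short sketch. The function names sum_subscript and sum_pointer are hypothetical.

```c
#include <assert.h>

/* Sums n elements using subscript notation. */
int sum_subscript(const int a[], int n) {
    int i;
    int total = 0;
    assert(n >= 0);
    for (i = 0; i < n; i++) {
        total = total + a[i];
    }
    return total;
}

/* Sums n elements using pointer notation: *(a + i) is the same
   element as a[i]. */
int sum_pointer(const int* a, int n) {
    int i;
    int total = 0;
    assert(n >= 0);
    for (i = 0; i < n; i++) {
        total = total + *(a + i);
    }
    return total;
}
```

Adding i to a pointer advances it by i elements, not i bytes, which is why both functions visit the same data.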


Pointers and Arrays (cont’d)
You can also refer to array elements with
For example,

int a[5];
int* p;
p = a;   /* p = &a[0]; is same */

p refers to
*p refers to
p+1 refers to
*(p+1) refers to
Since p is a non-constant pointer, you can also
Warning: NO BOUNDS CHECKING IS DONE IN C!

Passing an Array as a Parameter
To pass an array to a function:

void printAllAges(int a[], int n) {
    int i;
    for (i = 0; i < n; i++) {
        printf("%i \n", a[i]);
    }
}

The “array” parameter indicates
Alternative definition:

void printAllAges(int* p, int n) {
    int i;
    for (i = 0; i < n; i++) {
        printf("%i \n", *p);
        p++;
    }
}

The formal array parameter is a
You can call the function like this:


Dynamic Memory Allocation in Java
Java
That means that
This happens whenever
In Java there is strict distinction between
Every variable is either
memory for variables is
This memory
memory for variables of primitive type
memory that holds the actual contents of an object is
This memory goes away

Dynamic Memory Allocation in C
In C,
Every type has the possibility of being allocated statically (on the stack) or dynamically (on the heap).
To allocate space statically, you
Space is allocated
To allocate space dynamically, use
It takes one integer parameter indicating the
Use sizeof operator to get the length;
It returns a

The pointer has type void*. These slides cast it to the appropriate type; in standard C the conversion from void* is implicit, so the cast is optional (C++ requires it). If malloc fails to allocate the space, it returns NULL.


CPSC 211 Data Structures & Implementations (c) Texas A&M University [ 134]<br />

malloc Example

To dynamically allocate space for an int:

int* p;
p = (int*) malloc(sizeof(int));   /* cast result to int* */
if (p == NULL) {                  /* to be on the safe side */
    printf("malloc failed!");
} else {
    *p = 33;
    printf("%i", *p);
}

Normally, you don’t need to allocate a single integer at a time. Typically, you would use malloc to:
allocate
allocate



Another malloc Example

To dynamically allocate space for a structure:

Student* sptr;
sptr = (Student*) malloc(sizeof(Student));
sptr->age = 20;
sptr->grade_point = 3.4;



Allocating a Linked List Node Dynamically

For a singly linked list of students, use this type:

typedef struct Stu_Node {
    int age;
    double grade_point;
    struct Stu_Node* link;
} StuNode;

To allocate a node for the list:

To insert the node pointed to by sptr after the node pointed to by some other pointer, say cur:
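A minimal sketch of both steps, using the StuNode type above. The helper names `makeNode` and `insertAfter` are hypothetical, chosen only for this illustration.

```c
#include <stdlib.h>

typedef struct Stu_Node {
    int age;
    double grade_point;
    struct Stu_Node* link;
} StuNode;

/* Hypothetical helper: allocate and fill one node; returns NULL if malloc fails. */
StuNode* makeNode(int age, double gpa) {
    StuNode* sptr = (StuNode*) malloc(sizeof(StuNode));
    if (sptr == NULL) return NULL;
    sptr->age = age;
    sptr->grade_point = gpa;
    sptr->link = NULL;
    return sptr;
}

/* Hypothetical helper: splice the node sptr in right after the node cur. */
void insertAfter(StuNode* cur, StuNode* sptr) {
    sptr->link = cur->link;   /* new node takes over cur's old successor */
    cur->link = sptr;         /* cur now points at the new node */
}
```

The order of the two assignments in insertAfter matters: setting cur->link first would lose the rest of the list.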



Allocating an Array Dynamically

To allocate an array dynamically,

int i;
int* p;
p = (int*) malloc(100*sizeof(int));   /* 100 elt array */
/* now p points to the beginning of the array */
for (i = 0; i < 100; i++)             /* initialize the array */
    p[i] = 0;                         /* access the elements */

Similarly, you can allocate an array of structures:
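One way this might look, assuming the two-field Student struct used elsewhere in these slides. The function name `makeRoster` is hypothetical.

```c
#include <stdlib.h>

typedef struct {
    int age;
    double grade_point;
} Student;

/* Hypothetical helper: allocate n Student structs in one block and zero them. */
Student* makeRoster(int n) {
    int i;
    Student* roster = (Student*) malloc((size_t) n * sizeof(Student));
    if (roster == NULL) return NULL;
    for (i = 0; i < n; i++) {
        roster[i].age = 0;            /* elements are structs, so use . not -> */
        roster[i].grade_point = 0.0;
    }
    return roster;
}
```

The caller owns the block and is expected to release it with free when done.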



Deallocating Memory Dynamically

When memory is allocated using malloc,
You can get

void sub() {
    int *p;
    p = (int*) malloc(100*sizeof(int));
    return;
}

Although the space for the pointer variable p goes away when sub finishes executing,
But they are completely useless after sub is done,
If you had wanted them to be accessible outside of sub,
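The two ways out of this situation can be sketched as follows. Both function names are hypothetical, and the first variant relies on free, which is the standard library's counterpart to malloc.

```c
#include <stdlib.h>

/* Variant 1: the array is only needed inside the function,
   so release it before returning. */
int subLocal(void) {
    int first;
    int* p = (int*) malloc(100 * sizeof(int));
    if (p == NULL) return 0;
    p[0] = 7;
    first = p[0];
    free(p);          /* nothing is left allocated after the call */
    return first;
}

/* Variant 2: the array must outlive the function,
   so return the pointer and let the caller free it later. */
int* subShared(void) {
    int* p = (int*) malloc(100 * sizeof(int));
    return p;         /* may be NULL if malloc failed */
}
```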



Using free

To deallocate memory when you are through with it, call free.
It takes as an argument a pointer to the beginning of the allocated space and returns nothing. The result of free is that all the space starting at the designated location will be made available for reuse by later calls to malloc.
In the function void sub above, just before the return, you should say:

free(p);

DO NOT DO THE FOLLOWING:



Saving Space with Arrays of Pointers

Suppose you need an array of structures, where each structure is fairly large. But you are not sure at compile time how big the array needs to be.

1. Allocate
2. Find out
3. Allocate



Array of Pointers Example

To implement with the usual Student struct:
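The following is one reading of the idea, not the slide's own code: keep a fixed array of pointers (cheap), and allocate a full structure only for the entries actually needed. The bound `MAX_STUDENTS`, the array `roster`, and the function `fillRoster` are assumptions made for this sketch.

```c
#include <stdlib.h>

#define MAX_STUDENTS 1000   /* assumed upper bound on the array size */

typedef struct {
    int age;
    double grade_point;
} Student;

/* Only pointers are reserved up front; a global array starts out all NULL. */
Student* roster[MAX_STUDENTS];

/* Allocate one Student per needed slot; returns how many were allocated. */
int fillRoster(int count) {
    int i;
    if (count < 0) count = 0;
    if (count > MAX_STUDENTS) count = MAX_STUDENTS;
    for (i = 0; i < count; i++) {
        roster[i] = (Student*) malloc(sizeof(Student));
        if (roster[i] == NULL) return i;   /* stop early if malloc fails */
        roster[i]->age = 0;
        roster[i]->grade_point = 0.0;
    }
    return count;
}
```

Unused slots cost only the size of a pointer each, rather than the size of a whole structure.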



Information Hiding in C

Java provides support for information hiding by

Advantages of data abstraction, including the use of constructor and accessor (set and get) functions:
push
easier
easy
easy

C does not provide the same level of compiler support as Java, but you can achieve the same effect with some



Information Hiding in C (cont’d)

A “constructor” in C would be a function that
calls
initializes
returns

For example:
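A minimal sketch of such a constructor, assuming the two-field Student struct. The name and signature `constructStudent(int age, double gpa)` follow the version that appears later in these slides; the NULL check is an addition.

```c
#include <stdlib.h>

typedef struct {
    int age;
    double grade_point;
} Student;

/* Allocates a Student on the heap, initializes it, and returns a pointer to it. */
Student* constructStudent(int age, double gpa) {
    Student* sptr = (Student*) malloc(sizeof(Student));
    if (sptr == NULL) return NULL;   /* added check: allocation can fail */
    sptr->age = age;
    sptr->grade_point = gpa;
    return sptr;
}
```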



Information Hiding in C (cont’d)

The analog of a Java instance method in C would be a function whose first parameter is the “object” to be operated on.

You can write set and get functions in C:



Information Hiding in C (cont’d)

You can use the set and get functions to swap the ages for two student objects:
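A sketch of what the accessors and the swap might look like. The names `getAge`, `setAge`, and `swapAges` are hypothetical; only the Student fields come from the slides.

```c
typedef struct {
    int age;
    double grade_point;
} Student;

/* Hypothetical accessors: the first parameter plays the role of the "object". */
int getAge(Student* s) { return s->age; }
void setAge(Student* s, int age) { s->age = age; }

/* Swap the ages of two students without touching the fields directly. */
void swapAges(Student* a, Student* b) {
    int temp = getAge(a);
    setAge(a, getAge(b));
    setAge(b, temp);
}
```

Because swapAges never names the age field, the struct layout could change without this function needing an edit.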

When should you provide set and get functions and when should you not? They obviously impose some overhead in terms of additional function calls.



Strings in C

There is no explicit string type in C.
A string in C is an array of characters that is terminated with the null character.
The length
The null character
A sequence of characters enclosed in double quotes



Strings in C (cont’d)

You can also declare a
To initialize name, do not assign to a string literal!
Instead, either
Access elements using the brackets notation:

char firstLetter;
name[3] = 'a';
firstLetter = name[0];
namePtr[3] = 'b';
firstLetter = namePtr[0];



Passing Strings to and from Functions

To pass a string into a function or return one from a function, you must
Passing in a string:
Returning a string:
You can call these functions like this:
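As a hedged illustration, strings travel in and out of functions as character pointers. The two functions below, `countChar` and `copyString`, are made-up examples and use const char* for parameters that are only read.

```c
#include <stdlib.h>
#include <string.h>

/* Passing in a string: the parameter is a pointer to the first character. */
int countChar(const char* s, char c) {
    int count = 0;
    while (*s != '\0') {      /* walk until the null character */
        if (*s == c) count++;
        s++;
    }
    return count;
}

/* Returning a string: hand back a pointer to heap space, which outlives
   the call. The caller is responsible for freeing it. */
char* copyString(const char* s) {
    char* result = (char*) malloc(strlen(s) + 1);   /* +1 for '\0' */
    if (result != NULL) strcpy(result, s);
    return result;
}
```

Returning a pointer to a local array instead would be a bug, since that array disappears when the function returns.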



Reading in a String from the User

To read in a string from the user, call:

scanf("%s", name);

Notice the use of %s in scanf. The corresponding data must be a
scanf reads a string from the input stream up to
The letters are read into
You must make sure that you have a large enough array to hold the string. How much space is needed?
If you don’t have enough space, whatever follows the array will be
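One common safeguard, which goes beyond what the slide shows, is to put a maximum field width in the format so the read cannot run past the array. The sketch below uses sscanf on a fixed input string so it can run without a keyboard; with scanf the format string is the same. `NAME_CAP` and `readName` are names assumed for this example.

```c
#include <stdio.h>

#define NAME_CAP 20   /* assumed array size: 19 letters plus the null character */

/* Reads one whitespace-delimited word from input into name.
   Returns 1 if a word was read, 0 otherwise. */
int readName(const char* input, char name[NAME_CAP]) {
    /* %19s stores at most 19 characters and then appends '\0' */
    return sscanf(input, "%19s", name) == 1;
}
```

The interactive equivalent would be scanf("%19s", name) with name declared as char name[NAME_CAP].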



String Manipulation Functions

There are some useful string manipulation functions provided for you in C. These include:

strlen, which takes a string as an argument and returns the length of the string, not counting the null character at the end. I.e., it counts how many characters it encounters before reaching '\0'.

strcpy, which takes two strings as arguments and copies its second argument to its first argument.

First, to use them, you need to include headers for the string handling library:

#include <string.h>

To demonstrate the use of strlen and strcpy, suppose you want to add a name component to the Student structure and change the constructor so that it asks the user interactively for the name:



String Manipulation Functions Example

typedef struct {
    char* name;
    int age;
    double grade_point;
} Student;

Student* constructStudent(int age, double gpa) {
    char inputBuffer[100];   /* read name into this */
    Student* sptr;
    sptr = (Student*) malloc(sizeof(Student));
    sptr->age = age;
    sptr->grade_point = gpa;
    /* here's the new part: */
    printf("Enter student's name: ");
    scanf("%99s", inputBuffer);   /* width keeps the read inside inputBuffer */
    /* allocate just enough space for the name */
    sptr->name = (char*) malloc(
        (strlen(inputBuffer) + 1)*sizeof(char) );
    /* copy name into new space */
    strcpy(sptr->name, inputBuffer);
    return sptr;
}

When the constructor returns, inputBuffer goes away.
Space allocated for the Student object is an int, a double, a char* pointer, and just enough space for the actual name.



Other Kinds of Character Arrays

Not every character array has to be used to represent a string. You may want a character array that holds all possible letter grades, for instance:

char grades[5];
grades[0] = 'A';
grades[1] = 'B';
grades[2] = 'C';
grades[3] = 'D';
grades[4] = 'F';

In this case, there is no reason for the last array entry to be the null character, and in fact, it is not.



File Input and Output

File I/O is much simpler than in Java.
Include <stdio.h>.
Declare a pointer to FILE.
Call fopen to open the file.
Writing to a file is done with fprintf.
Reading from a file is done with fscanf.
Call fclose to close the file.



File I/O Example

/* to use the built in file functions */
#include <stdio.h>

int main(void) {
    /* create a pointer to a struct called FILE; */
    /* it is system dependent */
    FILE* fp;
    char line[80];
    int i;

    /* open the file for writing */
    fp = fopen("testfile", "w");
    /* write into the file */
    fprintf(fp, "Line %i ends \n", 1);
    fprintf(fp, "Line %i ends \n", 2);
    /* close the file */
    fclose(fp);

    /* open the file for reading */
    fp = fopen("testfile", "r");
    /* read six strings from the file */
    for (i = 1; i < 7; i++) {
        fscanf(fp, "%79s", line);   /* width keeps the read inside line */
        printf("got from the file: %s \n", line);
    }
    /* close the file */
    fclose(fp);
    return 0;
}



Motivation for Stacks

Some examples of last-in, first-out (LIFO) behavior:
Web browser’s
Text editors
The most recent pending method/function call
To evaluate an arithmetic expression,

A stack is a sequence of elements, to which elements can be added (push) and removed (pop):



Specifying an ADT with an Abstract State

We would like a specification to be as independent of any particular implementation as possible.
But since people naturally think in terms of state, a popular way to specify an ADT is



Specifying the Stack ADT with an Abstract State

1. A stack’s state is modeled as
2. Initially the state of the stack is
3. The effect of a push(x) operation is to
4. The effect of a pop operation is to



Specifying an ADT with Operation Sequences

But a purist might complain that a state-based specification is, implicitly, suggesting a particular implementation. To be even more abstract, one can specify an ADT
For instance:
push(a) pop(a):
pop(a):
push(a) push(b) push(c) pop(c) pop(b) push(d) pop(d):
push(a) push(b) pop(a):



Additional Stack Operations

Other operations that you sometimes want to provide:
peek:
size:
empty:



Balanced Parentheses

Recursive definition of a sequence of parentheses that is balanced:
the sequence
if the sequence
According to this definition:
(): balanced
(()(())): balanced
(()))(): not balanced
())(: not balanced



Algorithm to Check for Balanced Parentheses

Key observations:
1. There must be the same number of left and right parentheses.
2. In any prefix, the number of right parentheses must not exceed the number of left parentheses.
Pseudocode:
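The check can be sketched in C as below. This version swaps the stack for a simple counter, which is enough when there is only one kind of parenthesis: the counter is the number of left parentheses still waiting for a match. The function name and the choice to ignore other characters are assumptions of this sketch.

```c
/* Returns 1 if the parentheses in s are balanced, 0 otherwise.
   Characters other than ( and ) are ignored. */
int isBalanced(const char* s) {
    int open = 0;   /* left parentheses seen but not yet matched */
    int i;
    for (i = 0; s[i] != '\0'; i++) {
        if (s[i] == '(') {
            open++;                      /* like a push */
        } else if (s[i] == ')') {
            if (open == 0) return 0;     /* like popping an empty stack */
            open--;                      /* like a pop */
        }
    }
    return open == 0;                    /* like testing for an empty stack */
}
```

With several kinds of brackets a counter is no longer enough, because the kind of the most recent unmatched left bracket must be remembered; that is where a real stack is needed.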



Java Method to Check for Balanced Parentheses

Using java.util.Stack class (which manipulates objects):

import java.util.*;

boolean isBalanced(char[] parens) {
    Stack S = new Stack();
    try {   // pop might throw an exception
        for (int i = 0; i < parens.length; i++) {
            if ( parens[i] == '(' )
                S.push(new Character('('));
            else
                S.pop();   // discard popped object
        }
        return S.empty();
    }
    catch (EmptyStackException e) {
        return false;
    }
}



Checking for Multiple Kinds of Balanced Parens

Suppose there are 3 different kinds of parentheses: ( and ), [ and ], { and }.

Modify the program:

boolean isBalanced3(char[] parens) {
    Stack S = new Stack();
    try {
        for (int i = 0; i < parens.length; i++) {
            if (leftParen(parens[i]))   // ( or [ or {
                S.push(new Character(parens[i]));
            else {
                char leftp = ((Character)S.pop()).charValue();
                if (!match(leftp,parens[i])) return false;
            }
        }
        return S.empty();
    }   // end try
    catch (EmptyStackException e) {
        return false;
    }
}



Multiple Kinds of Parentheses (cont’d)

boolean leftParen(char c) {
    return ((c == '(') || (c == '[') || (c == '{'));
}

boolean match(char lp, char rp) {
    if ((lp == '(') && (rp == ')')) return true;
    if ((lp == '[') && (rp == ']')) return true;
    if ((lp == '{') && (rp == '}')) return true;
    return false;
}



Postfix Expressions

We normally write arithmetic expressions using infix notation:
Another way to write arithmetic expressions is to use postfix notation:
For example,
3 4 + is the same as 3 + 4
1 2 - 5 - 6 5 / + is the same as ((1 - 2) - 5) + (6 / 5)
One advantage of postfix is that you don’t need parentheses.
For instance,
(1 + 2) * 3 becomes 1 2 + 3 *
1 + (2 * 3) becomes 1 2 3 * +



Using a Stack to Evaluate Postfix Expressions

Pseudocode:
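The usual stack-based evaluation can be sketched in C as follows: push each operand; on an operator, pop two operands, apply it, and push the result. The fixed-size array stack, the limit `MAX_DEPTH`, the function name, and the error-code convention are assumptions of this sketch. Operands are non-negative numbers, since a leading minus sign would be read as the subtraction operator.

```c
#include <ctype.h>
#include <stdlib.h>

#define MAX_DEPTH 64   /* assumed maximum stack depth */

/* Evaluates a postfix expression with + - * / and non-negative operands.
   Stores the value in *result and returns 1, or returns 0 if malformed. */
int evalPostfix(const char* expr, double* result) {
    double stack[MAX_DEPTH];
    int next = 0;                 /* index of the next free stack slot */
    const char* p = expr;

    while (*p != '\0') {
        if (isspace((unsigned char) *p)) {
            p++;
        } else if (isdigit((unsigned char) *p) || *p == '.') {
            char* end;
            double v = strtod(p, &end);          /* parse one operand */
            if (end == p || next == MAX_DEPTH) return 0;
            stack[next++] = v;                   /* push */
            p = end;
        } else {
            double x, y;
            if (next < 2) return 0;              /* not enough operands */
            y = stack[--next];                   /* right operand is on top */
            x = stack[--next];
            switch (*p) {
                case '+': stack[next++] = x + y; break;
                case '-': stack[next++] = x - y; break;
                case '*': stack[next++] = x * y; break;
                case '/': stack[next++] = x / y; break;
                default:  return 0;              /* unknown character */
            }
            p++;
        }
    }
    if (next != 1) return 0;      /* exactly one value must remain */
    *result = stack[0];
    return 1;
}
```

Note the pop order: the first value popped is the right operand, which matters for subtraction and division.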



StringTokenizer Class

Java’s StringTokenizer class is very helpful to break up the input string into operators and operands, called tokens.
Create a StringTokenizer object out of the input string. It
Use instance method hasMoreTokens to test
Use instance method nextToken to
Second argument to constructor indicates that,
Third argument to constructor indicates that



Java Method to Evaluate Postfix Expressions

public static double evalPostFix(String postfix)
        throws EmptyStackException {
    Stack S = new Stack();
    StringTokenizer parser = new StringTokenizer
        (postfix, " \n\t\r+-*/", true);
    while (parser.hasMoreTokens()) {
        String token = parser.nextToken();
        char c = token.charAt(0);
        if (isOperator(c)) {
            double y = ((Double)S.pop()).doubleValue();
            double x = ((Double)S.pop()).doubleValue();
            switch (c) {
                case '+':
                    S.push(new Double(x+y)); break;
                case '-':
                    S.push(new Double(x-y)); break;
                case '*':
                    S.push(new Double(x*y)); break;
                case '/':
                    S.push(new Double(x/y)); break;
            }   // end switch
        }   // end if
        else if (!isWhiteSpace(c))   // token is operand
            S.push(Double.valueOf(token));
    }   // end while
    return ((Double)S.pop()).doubleValue();
}



Evaluating Postfix (cont’d)

public static boolean isOperator(char c) {
    return ( (c == '+') || (c == '-') ||
             (c == '*') || (c == '/') );
}

public static boolean isWhiteSpace(char c) {
    return ( (c == ' ') || (c == '\n') ||
             (c == '\t') || (c == '\r') );
}

Does not
Does no



Implementing a Stack with an Array

Since Java supplies a Stack class, why bother?
Idea:
Issues for Java implementation:
elements in the array are to be of type
throw exception if
dynamically increase the size of the array to avoid
To handle the last point, we’ll do the following:
initially,
if array is full and a push occurs,



Implementing a Stack with an Array in Java

class Stack {
    private Object[] A;
    private int next;

    public Stack () {
        A = new Object[16];
        next = 0;
    }

    public void push(Object obj) {
        if (next == A.length) {
            // array is full, double its size
            Object[] newA = new Object[2*A.length];
            for (int i = 0; i < next; i++)   // copy
                newA[i] = A[i];
            A = newA;   // old A can now be garbage collected
        }
        A[next] = obj;
        next++;
    }

    public Object pop() throws EmptyStackException {
        if (next == 0)
            throw new EmptyStackException();
        else {
            next--;
            return A[next];
        }
    }



Implementing a Stack with an Array in Java (cont’d)

    public boolean empty() {
        return (next == 0);
    }

    public Object peek() throws EmptyStackException {
        if (next == 0)
            throw new EmptyStackException();
        else
            return A[next-1];
    }
}   // end Stack class

class EmptyStackException extends Exception {
    public EmptyStackException() {
        super();
    }
}



Time Performance of Array Implementation

push:
pop:
empty:
peek:



Implementing a Stack with a Linked List in Java

Idea:

class StackNode {
    Object item;
    StackNode link;
}

class Stack {
    private StackNode top;   // first node in list, the top

    public Stack () {
        top = null;
    }

    public void push(Object obj) {
        StackNode node = new StackNode();
        node.item = obj;
        node.link = top;
        top = node;
    }



Implementing a Stack with a Linked List in Java (cont’d)

    public Object pop() throws EmptyStackException {
        if (top == null)
            throw new EmptyStackException();
        else {
            StackNode temp = top;
            top = top.link;
            return temp.item;
        }
    }

    public boolean empty() {
        return (top == null);
    }

    public Object peek() throws EmptyStackException {
        if (top == null)
            throw new EmptyStackException();
        else
            return top.item;
    }
}   // end Stack class



Time Performance of Linked List Implementation

push:
pop:
empty:
peek:



Interchangeability of Implementations

If you have done things right, you can:
write a program using the built-in Stack class
compile and run that program
then make available your own Stack class, using the array implementation (e.g., put Stack.class in the same directory)
WITHOUT CHANGING OR RECOMPILING YOUR PROGRAM, run your program; it will use the local Stack implementation and will still be correct!
then replace the array-based Stack.class file with your own linked-list-based Stack.class file
again, WITHOUT CHANGING OR RECOMPILING YOUR PROGRAM, run your program; it will use the local Stack implementation and will still be correct!



Motivation for Queues

Some examples of first-in, first-out (FIFO) behavior:

A queue is a



Specifying the Queue ADT

Using the abstract state style of specification:
The state of a queue is modeled as a
Initially the state of the queue is the
The effect of an enqueue(x) operation is to
The effect of a dequeue operation is to



Specifying the Queue ADT (cont’d)

Alternative specification using allowable sequences would give some rules (an “algebra”). Some specific examples:
enqueue(a) dequeue(a):
dequeue(a):
enqueue(a) enqueue(b) enqueue(c) dequeue(a) enqueue(d) dequeue(b):
enqueue(a) enqueue(b) dequeue(b):
Other popular queue operations:



Applications of Queues in Operating Systems

The text discusses some applications of queues in operating systems:
to buffer data coming from a running process going to a printer:
a printer may be shared between several computers that are networked together.



Application of Queues in Discrete Event Simulators

A simulation program is a program that mimics, or “simulates”, the behavior of some complicated real-world situation, such as

These systems are typically too complicated to be modeled exactly mathematically, so instead, they are simulated: events take place in them according to some random number generator. For instance,
at random times,
at random times,
at random times,



Using a Queue to Convert Infix to Postfix

First attempt: Assume infix expression is fully parenthesized.
For example:
(((22 / 7) + 4) * (6 - 2))
(7 - (((2 * 3) + 5) * (8 - (4 / 2))))
Pseudocode:



Converting Infix to Postfix (cont’d)

Examples:
(((22 / 7) + 4) * (6 - 2))
Q:
S:
(7 - (((2 * 3) + 5) * (8 - (4 / 2))))
Q:
S:



Converting Infix to Postfix with Precedence

It is too restrictive to require parentheses around everything.
Instead, precedence conventions tell
For instance, 4 * 3 + 2 equals (4 * 3) + 2 = 14, not 4 * (3 + 2) = 20.
We need to modify the above algorithm to handle operator precedence.



Converting Infix to Postfix with Precedence (cont’d)

create queue Q to hold postfix expression
create stack S to hold operators not yet
    added to the postfix expression
while there are more tokens do
    get next token t
    if t is a number then enqueue t on Q
    else if S is empty then push t on S
    else if t is ( then push t on S
    else if t is ) then
        while top of S is not ( do
            pop S and enqueue result on Q
        endwhile
        pop S   // get rid of ( that ended while
    else   // t is a real operator and S is not empty
        while prec(t)
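The pseudocode above is cut off in this copy, so the following C sketch shows the standard stack-based conversion rather than a reconstruction of the slide. Two points are assumptions: operators are treated as left-associative, and an operator on the stack is popped while its precedence is at least that of the incoming operator. The output buffer `out` plays the role of the queue Q, and the names `prec`, `infixToPostfix`, and `MAX_OPS` are chosen for this illustration.

```c
#include <ctype.h>
#include <stddef.h>

#define MAX_OPS 64   /* assumed maximum operator-stack depth */

/* Higher number binds tighter; ( gets 0 so an operator never pops it. */
static int prec(char op) {
    if (op == '*' || op == '/') return 2;
    if (op == '+' || op == '-') return 1;
    return 0;
}

/* Converts infix (non-negative integers, + - * /, parentheses) into
   space-separated postfix in out. Returns 1 on success, 0 on error. */
int infixToPostfix(const char* in, char* out, size_t cap) {
    char S[MAX_OPS];      /* operator stack */
    int top = 0;
    size_t n = 0;         /* characters written to out so far */
    const char* p = in;

    if (cap == 0) return 0;
    while (*p != '\0') {
        if (isspace((unsigned char) *p)) {
            p++;
        } else if (isdigit((unsigned char) *p)) {
            while (isdigit((unsigned char) *p)) {   /* copy the whole number */
                if (n + 2 >= cap) return 0;
                out[n++] = *p++;
            }
            out[n++] = ' ';
        } else if (*p == '(') {
            if (top == MAX_OPS) return 0;
            S[top++] = *p++;
        } else if (*p == ')') {
            while (top > 0 && S[top - 1] != '(') {  /* flush back to the ( */
                if (n + 2 >= cap) return 0;
                out[n++] = S[--top];
                out[n++] = ' ';
            }
            if (top == 0) return 0;                 /* no matching ( */
            top--;                                  /* discard the ( */
            p++;
        } else if (prec(*p) > 0) {
            while (top > 0 && prec(S[top - 1]) >= prec(*p)) {
                if (n + 2 >= cap) return 0;
                out[n++] = S[--top];
                out[n++] = ' ';
            }
            if (top == MAX_OPS) return 0;
            S[top++] = *p++;
        } else {
            return 0;                               /* unknown character */
        }
    }
    while (top > 0) {                               /* flush what is left */
        if (S[top - 1] == '(' || n + 2 >= cap) return 0;
        out[n++] = S[--top];
        out[n++] = ' ';
    }
    if (n > 0) n--;                                 /* drop trailing space */
    out[n] = '\0';
    return 1;
}
```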



Converting Infix to Postfix with Precedence (cont’d)

For example:
(22 / 7 + 4) * (6 - 2)
Q:
S:
7 - (2 * 3 + 5) * (8 - 4 / 2)
Q:
S:



Implementing a Queue with an Array

State is represented with:
array A
integer head that holds
integer tail that holds
Operation implementations:
enqueue(x):
dequeue(x):
empty:
peek:
size:
Problem:



Implementing a Queue with a Circular Array

Wrap around to reuse the vacated space at the beginning of the array in a circular fashion, using mod operator %.
enqueue(x):
dequeue(x):
empty:
The problem is that
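A minimal circular-array queue might look like the sketch below. The conventions are assumptions of this example, not taken from the slide: head is the index of the front element, tail is the index of the next free slot, and a separate count field tells a full array apart from an empty one (in both cases head equals tail).

```c
#define QCAP 4   /* small assumed capacity so wraparound is easy to see */

typedef struct {
    int A[QCAP];
    int head;    /* index of the front element */
    int tail;    /* index of the next free slot */
    int count;   /* number of stored elements */
} Queue;

void initQueue(Queue* q) { q->head = 0; q->tail = 0; q->count = 0; }

int empty(const Queue* q) { return q->count == 0; }

/* Returns 1 on success, 0 if the array is full. */
int enqueue(Queue* q, int x) {
    if (q->count == QCAP) return 0;
    q->A[q->tail] = x;
    q->tail = (q->tail + 1) % QCAP;   /* wrap around past the end */
    q->count++;
    return 1;
}

/* Stores the front element in *x and returns 1, or returns 0 if empty. */
int dequeue(Queue* q, int* x) {
    if (q->count == 0) return 0;
    *x = q->A[q->head];
    q->head = (q->head + 1) % QCAP;   /* wrap around past the end */
    q->count--;
    return 1;
}
```

An alternative to the count field is to leave one array slot permanently unused, so that head equal to tail can only mean empty.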



Expanding Size of Queue Dynamically

To avoid overflow problem in circular array implementation of a queue, use same idea as for array implementation of stack:
If array is discovered to be full during an enqueue,
allocate
copy
enqueue
free
One complication with the queue, though, is that the contents of the queue might be in two sections:
1. from
2. then from
Copying into the new array must take this into account.
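The growth step can be sketched in C as below, under assumed conventions (head is the front index, count is the number of elements, and capacity doubles when full). Reading the old array through a modular index walks both sections in order, so the elements land at the start of the new array with head reset to 0. All names here are hypothetical.

```c
#include <stdlib.h>

typedef struct {
    int* A;      /* heap-allocated storage */
    int cap;     /* current capacity of A (assumed to be at least 1) */
    int head;    /* index of the front element */
    int count;   /* number of stored elements */
} Queue;

/* Returns 1 on success, 0 if the initial allocation fails. */
int initQueue(Queue* q, int cap) {
    q->A = (int*) malloc((size_t) cap * sizeof(int));
    q->cap = cap;
    q->head = 0;
    q->count = 0;
    return q->A != NULL;
}

/* Double the array, unwrapping the two sections into one run. */
static int grow(Queue* q) {
    int i;
    int newCap = 2 * q->cap;
    int* B = (int*) malloc((size_t) newCap * sizeof(int));
    if (B == NULL) return 0;
    for (i = 0; i < q->count; i++)
        B[i] = q->A[(q->head + i) % q->cap];   /* head..end, then 0..tail-1 */
    free(q->A);                                /* release the old array */
    q->A = B;
    q->cap = newCap;
    q->head = 0;
    return 1;
}

int enqueue(Queue* q, int x) {
    if (q->count == q->cap && !grow(q)) return 0;
    q->A[(q->head + q->count) % q->cap] = x;   /* slot just past the last element */
    q->count++;
    return 1;
}

int dequeue(Queue* q, int* x) {
    if (q->count == 0) return 0;
    *x = q->A[q->head];
    q->head = (q->head + 1) % q->cap;
    q->count--;
    return 1;
}
```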



Performance of Circular Array

Performance of the circular array implementation of a queue:
Time:
Space:



Implementing a Queue with a Linked List

State representation:
Data items are kept in
Pointer head points to
Pointer tail points to
Operation implementations:
To enqueue an item,
To dequeue an item,



Implementing a Queue with a Linked List (cont’d)

class Queue {
    private QueueNode head;
    private QueueNode tail;

    public Queue() {
        head = null;
        tail = null;
    }

    public boolean empty() {
        return (head == null);
    }

    public void enqueue(Object obj) {
        QueueNode node = new QueueNode(obj);
        if (empty()) {
            head = node;
            tail = node;
        } else {
            tail.link = node;
            tail = node;
        }
    }
    // continued on next slide



Implementing a Queue with a Linked List (cont’d)<br />
    // continued from previous slide<br />
    public Object dequeue() {<br />
        if ( empty() )<br />
            return null; // or throw an EmptyQueueException<br />
        else {<br />
            Object returnItem = head.item;<br />
            head = head.link; // remove first node from list<br />
            if (head == null) // fix tail pointer if needed<br />
                tail = null;<br />
            return returnItem;<br />
        }<br />
    }<br />
}<br />
Every operation always takes constant time.



Motivation for the List ADT<br />

This ADT is good for modeling<br />

Some sample applications:



Specifying the List ADT<br />
The state of a list object is<br />
Typical operations on a list are:<br />
create:<br />
empty:<br />
length:<br />
select(i):<br />
replace(i,x):<br />
delete(x):<br />
insert(x):



Implementing the List ADT<br />
Array implementation:<br />
Keep a counter<br />
To select or replace at some location,<br />
To insert at some location, shift the later items down.<br />
To delete at some location,<br />
Linked list implementation:<br />
Keep a count of<br />
To select, replace, delete or insert an item,



Comparing the Times of List Implementations<br />
Time for various operations, on a list of n data items:<br />
list operation | singly linked list | array<br />
empty | |<br />
length | |<br />
select(i) | |<br />
replace(i) | |<br />
delete(i) | |<br />
insert(i) | |<br />
The time for insert in an array assumes no overflow occurs. If overflow occurs,



Comparing the Space of List Implementations<br />
Space requirements:<br />
If the array holds pointers to the items, then there is the space overhead of<br />
If the array holds the items themselves, then there is the space overhead of<br />
In both kinds of arrays, there is also the overhead of<br />
If you use a linked list, then the space overhead is<br />
for<br />
To quantify the space tradeoffs between the array of items and linked list representations:<br />
Let p be the number of<br />
Let q be the number of<br />
Let m be the number of



Comparing the Space (cont’d)<br />
To hold n items,<br />
the array representation uses<br />
the linked list representation uses<br />
The tradeoff point is when<br />

When nqm=(p + q),<br />

When the item size, q, is much larger than the pointer size, p,<br />
When the item size, q, is closer to the pointer size, p,



Generalized Lists<br />

A generalized list is<br />

Example: (a, b, (c, (d, e), f), g, (h, i)).<br />
There are five elements in the (top level) list:<br />
1. a<br />
2. b<br />
3. (c, (d, e), f)<br />
4. g<br />
5. (h, i)<br />
Items which are not lists are called atoms (they cannot be further subdivided).



Sample Java Code for Generalized List<br />

class Node {<br />
    Object item;<br />
    Node link;<br />
    Node (Object obj) { item = obj; }<br />
}<br />
class GenList {<br />
    private Node first;<br />
    GenList() { first = null; }<br />
    // insert newItem at the front of the list<br />
    void insert(Object newItem) {<br />
        Node node = new Node(newItem);<br />
        node.link = first;<br />
        first = node;<br />
    }<br />
    // print the list, recursing into any item that is itself a GenList<br />
    void print() {<br />
        System.out.print("( ");<br />
        Node node = first;<br />
        while (node != null) {<br />
            if (node.item instanceof GenList)<br />
                ((GenList)node.item).print();<br />
            else System.out.print(node.item);<br />
            node = node.link;<br />
            if (node != null) System.out.print(", ");<br />
        }<br />
        System.out.print(" )");<br />
    }<br />
}



Sample Java Code (cont’d)<br />

Notice:<br />
o instanceof C returns true if<br />
– object o<br />
– object o<br />
– object o<br />
– object o<br />
casts node.item to type GenList, if appropriate<br />
recursive call of the GenList method print<br />
implicit use of the toString method of every class, in the call to System.out.print<br />
(Don’t confuse the print method of System.out with the print method we are defining for class GenList.)



Sample Java Code (cont’d)<br />

How do we know that print is well-defined and won’t get into an infinite loop?<br />
The print method is recursive and uses a while loop.<br />
The while loop<br />
If an item is not a generalized list, then it<br />
If an item is itself a generalized list, then<br />
The while loop stops when<br />
Each recursive call takes you deeper into the nesting of the generalized list.<br />
Assume<br />
The stopping case for the recursion is<br />
Each recursive call takes you closer to a stopping case.



Generalized List Pitfalls<br />

Warning! If there is a cycle in the generalized list, print will go into an infinite loop. For instance:<br />
Be careful about shared sublists. For instance,



Application of Generalized Lists: LISP<br />
Generalized lists are<br />
highly<br />
good for applications where<br />
the key structuring paradigm in<br />
LISP is a functional language:<br />
Each function call is represented as a list, with the name of the function coming first, and the arguments coming after it:



LISP-like Approach to Arithmetic Expressions<br />

Apply this approach to evaluating arithmetic expressions:<br />
Use prefix notation (as opposed to postfix), with parentheses to delimit the sublists:



Strings and StringBuffers<br />
Java differentiates between<br />
There are no methods that change an existing String.<br />
If you want to change the characters in a string, use a StringBuffer. Some key features are:<br />
change<br />
append<br />
insert<br />
The StringBuffer class can be implemented using an array of characters. The ideas are not complicated.
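
A short sketch of the three kinds of StringBuffer changes mentioned above, using the standard java.lang.StringBuffer methods append, insert and setCharAt. The class name StringBufferDemo and the sample text are made up for illustration.

```java
// Hypothetical demo class; only StringBuffer itself comes from the Java library.
class StringBufferDemo {
    static String build() {
        StringBuffer sb = new StringBuffer("data");
        sb.append(" structures");   // append characters at the end
        sb.insert(0, "some ");      // insert characters at a given index
        sb.setCharAt(5, 'D');       // change one character in place
        return sb.toString();       // convert to an unchangeable String
    }

    public static void main(String[] args) {
        System.out.println(build());
    }
}
```

Each call modifies the same buffer object, whereas the corresponding String methods would return a new String and leave the original untouched.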



The Heap<br />

When you use new or malloc to dynamically allocate some space, the run-time system handles the mechanics of actually finding the required free space of the necessary size.<br />
When you make an object inaccessible (in Java) or use free (in C), again the run-time system handles the mechanics of reclaiming the space.<br />
We are now going to look at HOW one could implement dynamic allocation of objects from the heap. The reasons are:



What is the Heap?<br />

The heap is an area of memory used to store objects that will be dynamically allocated and deallocated.<br />
Memory can be viewed as one long array of memory locations, where the address of a memory location is the index of the location in the array.<br />
Thus we can view the heap as<br />
Contiguous locations in the heap (array) are grouped together into<br />
When a request arrives to allocate n bytes, the system<br />
finds<br />
allocates<br />
returns<br />
Blocks are classified as either<br />
Initially,



Heap Data Structures<br />

Once blocks are allocated, the heap might get chopped up into alternating allocated and free blocks of varying sizes.<br />
We need a way to locate all the free blocks.<br />
This will be done by keeping the free blocks in a<br />
The linked list is implemented using<br />
Each block has some



Allocation<br />

When a request arrives to allocate n bytes,<br />

There are two strategies for choosing the block to use:<br />

<br />

<br />

If the block found is bigger than n, then<br />

If the block found is exactly of size n, then<br />

If no block large enough is found, then



Deallocation<br />

When a block is deallocated, as a first cut, simply insert the block at the front of the free list.<br />
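
The following is a minimal Java simulation of this first cut, offered as a sketch under several assumptions: the free list is a java.util.LinkedList of (start, size) records rather than blocks with headers inside the heap, allocation uses first fit, and free is told the block size explicitly. The names FreeListHeap, Block and freeBlockCount are hypothetical.

```java
import java.util.Iterator;
import java.util.LinkedList;

// Hypothetical simulation of a heap managed with a free list (block headers ignored).
class FreeListHeap {
    // A free block: start address and size in bytes.
    static class Block {
        int start;
        int size;
        Block(int start, int size) { this.start = start; this.size = size; }
    }

    private final LinkedList<Block> freeList = new LinkedList<>();

    FreeListHeap(int heapSize) {
        freeList.add(new Block(0, heapSize));  // initially one big free block
    }

    // First fit: use the first free block that is large enough.
    // Returns the start address, or -1 if no free block is large enough.
    int alloc(int n) {
        Iterator<Block> it = freeList.iterator();
        while (it.hasNext()) {
            Block b = it.next();
            if (b.size == n) {
                it.remove();                   // exact fit: block leaves the free list
                return b.start;
            } else if (b.size > n) {
                int address = b.start;         // split: front part is allocated,
                b.start += n;                  // the remainder stays on the free list
                b.size -= n;
                return address;
            }
        }
        return -1;
    }

    // First cut at deallocation: insert the block at the front of the free list.
    void free(int start, int size) {
        freeList.addFirst(new Block(start, size));
    }

    int freeBlockCount() {
        return freeList.size();
    }
}
```

Running the sequence alloc(10), alloc(20), free, alloc(40), free on an 80-byte heap leaves three separate free blocks of 10, 20 and 10 bytes, so a later alloc(30) fails even though 40 bytes are free in total.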

[Figure: an 80-byte heap (addresses 0 to 79) shown after each step of the sequence p := alloc(10), q := alloc(20), free(p), r := alloc(40), free(q).]



Fragmentation<br />

[Figure: the heap after free(q): free blocks of 10, 20 and 10 bytes starting at addresses 0, 10 and 70; r occupies the 40 bytes starting at address 30.]<br />

Problem with previous example: If a request comes in for 30 bytes, the system will check the free list, and find



Coalescing<br />
A solution to fragmentation is to<br />
physical neighbor:<br />
virtual neighbor:<br />
To facilitate this operation, we will need additional space overhead in the header, and it will also help to keep “footer” information at the end of each block to:<br />
make<br />
indicate<br />
replicate



More Insidious Fragmentation<br />

[Figure: the heap after free(q): free blocks of 10, 20 and 10 bytes starting at addresses 0, 10 and 70; r occupies the 40 bytes starting at address 30.]<br />

However, coalescing will not accommodate a request for



Compaction<br />

The solution to this problem is called<br />

The difficulty though is that



Master Pointers<br />
A solution is to use<br />
A special area of the heap contains<br />
The addresses<br />
The address returned by the allocate procedure is<br />
The contents of a master pointer<br />
But the user,



Master Pointers (cont’d)<br />
[Figure: variables p, q and r, the master pointers, and the rest of the heap.]<br />
Costs:<br />
Additional<br />
Additional<br />
Unpredictable



Garbage Collection<br />

The above discussion of deallocation assumes the memory allocation algorithm is somehow informed about which blocks are no longer in use:<br />
In C, this is done<br />
In Java,<br />
This process is part of garbage collection:<br />
<br />
<br />
One of the challenging aspects of garbage collection is how to



Trees<br />

Important terminology:<br />
Some uses of trees:<br />
model<br />
model<br />
a clever implementation of



Trees (cont’d)<br />

Some more terms:<br />

path:<br />

length of path:<br />
height of a node:<br />
height of tree:<br />
depth (or level) of a node:<br />
depth of tree:<br />
Fact: The depth of a tree equals the height of the tree.



Binary Trees<br />
Binary tree: a tree in which<br />
Complete binary tree: tree in which<br />
Important Facts:<br />
A complete binary tree with L levels contains<br />
A complete binary tree with n nodes has



Binary Trees (cont’d)<br />
Leftmost binary tree: like a complete binary tree, except that<br />
however, all leaves at bottom level are<br />
Important Facts:<br />
A leftmost binary tree with L levels contains<br />
A leftmost binary tree with n nodes has



Binary Heap<br />
Now suppose that there is a data item, called<br />
inside each node of a tree.<br />
A binary heap (or min-heap) is a<br />
leftmost binary tree<br />
satisfies the<br />
Do not confuse this use of “heap” with its usage in memory management!<br />
Important Fact: The same set of keys<br />
There is no



Using a Heap to Implement a Priority Queue<br />
To implement the priority queue operation insert(x):<br />
1.<br />
2.<br />
3.<br />
Time:<br />
To implement the priority queue operation remove():<br />
Tricky part is how to remove the root without messing up the tree structure.<br />

1.<br />

2.<br />

3.<br />

Time:



Using a Heap to Implement a PQ (cont’d)<br />
PQ operation | sorted array or linked list | unsorted array or linked list | heap<br />
insert | | |<br />
remove (min) | | |<br />
No longer have the severe tradeoffs of the array and linked list representations of priority queue.



Heap Sort<br />

Recall the sorting algorithm that used a priority queue:<br />
1. insert the elements to be sorted, one by one, into a priority queue.<br />
2. remove the elements, one by one, from the priority queue; they will come out in sorted order.<br />
If the priority queue is implemented with a heap, the running time is O(n log n).
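
The two steps can be sketched in Java with the library class java.util.PriorityQueue, which is itself based on a heap. The class name PriorityQueueSort and the choice of int keys are assumptions for this example.

```java
import java.util.PriorityQueue;

// Hypothetical sketch: sort by inserting into, then removing from, a priority queue.
class PriorityQueueSort {
    static int[] sort(int[] input) {
        PriorityQueue<Integer> pq = new PriorityQueue<>();
        for (int x : input) {
            pq.add(x);                 // step 1: insert the elements one by one
        }
        int[] sorted = new int[input.length];
        for (int i = 0; i < sorted.length; i++) {
            sorted[i] = pq.remove();   // step 2: remove the smallest remaining element
        }
        return sorted;
    }

    public static void main(String[] args) {
        int[] result = sort(new int[] {5, 3, 8, 1, 9, 2});
        for (int x : result) {
            System.out.print(x + " ");
        }
        System.out.println();
    }
}
```

With n elements there are n inserts and n removes, each costing O(log n) in a heap.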



Linked Structure Implementation of Heap<br />
To implement a heap with a linked structure, each node of the tree will be represented with an object containing<br />
<br />
<br />
<br />
<br />
To find the next available location for insert, or the rightmost node on the bottom level for remove, in constant time,<br />
<br />
<br />
Then keep a



Array Implementation of Heap<br />
Fortunately, there’s a nifty way to implement a heap using an array, based on an interesting observation: If you number the nodes in a leftmost binary tree, starting at the root and going across levels and down levels, you see a pattern:<br />
[Figure: a leftmost binary tree whose nodes are numbered 1 through 9, level by level.]<br />
Node number i has left child 2i<br />
Node number i has right child 2i + 1<br />
If 2i > n, then i has no left child<br />
If 2i + 1 > n, then i has no right child<br />
Therefore, node number i is a leaf if 2i > n<br />
The parent of node i is ⌊i/2⌋<br />
Next available location for insert is index n + 1<br />
Rightmost node on the bottom level is index n



Array Implementation of Heap (cont’d)<br />
Representation consists of<br />
array A[1..max] (ignore location 0)<br />
integer n, which is initially 0, holding number of elements in heap<br />
To implement insert(x) (ignoring overflow):<br />
n := n+1 // make a new leaf node<br />
A[n] := x // new node’s key is initially x<br />
cur := n // start bubbling x up<br />
parent := cur/2<br />
while (parent != 0) && A[parent] > A[cur] do<br />
    // current node is not the root and its key<br />
    // has not found final resting place<br />
    swap A[cur] and A[parent]<br />
    cur := parent // move up a level in the tree<br />
    parent := cur/2<br />
endwhile



Array Implementation of Heap (cont’d)<br />
To implement remove (ignoring underflow):<br />
minKey := A[1] // smallest key, to be returned<br />
A[1] := A[n] // replace root’s key with key in<br />
             // rightmost leaf on bottom level<br />
n := n-1 // delete rightmost leaf on bottom level<br />
cur := 1 // start bubbling down key in root<br />
Lchild := 2*cur<br />
Rchild := 2*cur + 1<br />

while (Lchild
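
The remove loop is cut off in this copy of the slides. As a rough sketch only, the array representation could be written in Java as follows. The class name ArrayMinHeap, the use of int keys, and the bubble-down loop (a standard one that swaps with the smaller child) are assumptions and may differ from the original slide.

```java
// Hypothetical sketch of a min-heap stored in a[1..n]; a[0] is ignored.
class ArrayMinHeap {
    private final int[] a;
    private int n = 0;               // number of elements in the heap

    ArrayMinHeap(int max) {
        a = new int[max + 1];
    }

    int size() {
        return n;
    }

    void insert(int x) {             // ignoring overflow
        n = n + 1;                   // make a new leaf node
        a[n] = x;
        int cur = n;                 // start bubbling x up
        int parent = cur / 2;        // integer division
        while (parent != 0 && a[parent] > a[cur]) {
            swap(cur, parent);
            cur = parent;            // move up a level in the tree
            parent = cur / 2;
        }
    }

    int remove() {                   // ignoring underflow
        int minKey = a[1];           // smallest key, to be returned
        a[1] = a[n];                 // move key of rightmost bottom leaf to root
        n = n - 1;                   // delete that leaf
        int cur = 1;                 // start bubbling down key in root
        while (2 * cur <= n) {       // cur has at least a left child
            int child = 2 * cur;
            if (child + 1 <= n && a[child + 1] < a[child]) {
                child = child + 1;   // right child exists and is smaller
            }
            if (a[cur] <= a[child]) {
                break;               // key has found its final resting place
            }
            swap(cur, child);
            cur = child;             // move down a level in the tree
        }
        return minKey;
    }

    private void swap(int i, int j) {
        int tmp = a[i];
        a[i] = a[j];
        a[j] = tmp;
    }
}
```

Both loops move one level per iteration, so each operation takes time proportional to the number of levels in the tree.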



Binary Tree Traversals<br />
Now consider any kind of binary tree with data in the nodes, not just leftmost binary trees.<br />
In many applications, we need to traverse a tree: “visit” each node exactly once. When the node is visited, some computation can take place, such as printing the key.<br />
There are three popular kinds of traversals, differing in the order in which each node is visited in relation to the order in which its left and right subtrees are visited:<br />
inorder traversal:<br />
preorder traversal:<br />
postorder traversal:



Binary Tree Traversals (cont’d)<br />
preorder(x):<br />
if x is not empty then<br />
    visit x<br />
    preorder(leftchild(x))<br />
    preorder(rightchild(x))<br />
inorder(x):<br />
if x is not empty then<br />
    inorder(leftchild(x))<br />
    visit x<br />
    inorder(rightchild(x))<br />
postorder(x):<br />
if x is not empty then<br />
    postorder(leftchild(x))<br />
    postorder(rightchild(x))<br />
    visit x<br />
[Figure: a binary tree with nodes a through i.]<br />
preorder:<br />
inorder:<br />
postorder:



Binary Tree Traversals (cont’d)<br />
These traversals are particularly interesting when the binary tree is a parse tree for an arithmetic expression:<br />
Postorder traversal results in the<br />
Preorder gives<br />
Does inorder give<br />
[Figure: a parse tree with operators *, + and - and operands 5, 3, 2 and 1.]<br />
preorder:<br />
inorder:<br />
postorder:



Representation of a Binary Tree<br />
The most straightforward representation for an (arbitrary) binary tree is a linked structure, where each node has<br />
<br />
<br />
<br />
Notice that the array representation used for a heap will not work, because the structure of the tree is not necessarily very regular.<br />
class TreeNode {<br />
    Object data; // data in the node<br />
    TreeNode left; // left child<br />
    TreeNode right; // right child<br />
    // constructor goes here...<br />
    void visit() {<br />
        // what to do when node is visited<br />
    }<br />
}



Representation of a Binary Tree (cont’d)<br />
class Tree {<br />
    TreeNode root;<br />
    // other information...<br />
    void preorderTraversal() {<br />
        preorder(root);<br />
    }<br />
    void preorder(TreeNode t) {<br />
        if (t != null) { // stopping case for recursion is t == null<br />
            t.visit(); // user-defined visit method<br />
            preorder(t.left);<br />
            preorder(t.right);<br />
        }<br />
    }<br />
}<br />
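
To see all three traversals side by side, here is a small self-contained Java sketch on a hand-built parse tree for (5 + 3) * (2 - 1). The classes ExprNode and TraversalDemo, the sample expression, and the choice of appending each visited node to a StringBuilder are assumptions made for illustration.

```java
// Hypothetical node type for this example only.
class ExprNode {
    String data;
    ExprNode left, right;
    ExprNode(String data, ExprNode left, ExprNode right) {
        this.data = data;
        this.left = left;
        this.right = right;
    }
}

class TraversalDemo {
    // "Visiting" a node here means appending its data to out.
    static void preorder(ExprNode t, StringBuilder out) {
        if (t != null) {
            out.append(t.data).append(' ');
            preorder(t.left, out);
            preorder(t.right, out);
        }
    }

    static void inorder(ExprNode t, StringBuilder out) {
        if (t != null) {
            inorder(t.left, out);
            out.append(t.data).append(' ');
            inorder(t.right, out);
        }
    }

    static void postorder(ExprNode t, StringBuilder out) {
        if (t != null) {
            postorder(t.left, out);
            postorder(t.right, out);
            out.append(t.data).append(' ');
        }
    }

    // Parse tree for (5 + 3) * (2 - 1), built by hand.
    static ExprNode sampleTree() {
        ExprNode plus = new ExprNode("+",
                new ExprNode("5", null, null), new ExprNode("3", null, null));
        ExprNode minus = new ExprNode("-",
                new ExprNode("2", null, null), new ExprNode("1", null, null));
        return new ExprNode("*", plus, minus);
    }

    public static void main(String[] args) {
        StringBuilder pre = new StringBuilder();
        StringBuilder in = new StringBuilder();
        StringBuilder post = new StringBuilder();
        preorder(sampleTree(), pre);
        inorder(sampleTree(), in);
        postorder(sampleTree(), post);
        System.out.println("preorder:  " + pre.toString().trim());   // * + 5 3 - 2 1
        System.out.println("inorder:   " + in.toString().trim());    // 5 + 3 * 2 - 1
        System.out.println("postorder: " + post.toString().trim());  // 5 3 + 2 1 - *
    }
}
```

Note that the inorder output drops the parentheses of the original expression, so read on its own it would be evaluated differently under the usual precedence rules.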

But we haven’t yet talked about how you actually MAKE a binary tree. We’ll do that next, when we talk about



Dictionary ADT Specification<br />

So far, we’ve seen the abstract data types<br />

<br />

<br />

<br />

<br />

Another useful ADT is a dictionary (or table). The abstract state of a dictionary is a<br />
The main operations are:<br />
<br />
<br />
<br />
Some additional operations are:<br />
find the<br />
find the



Dictionary ADT Applications<br />

The dictionary (or table) ADT is<br />

For instance, student records at a university can be kept in a dictionary data structure:<br />
When a new student enrolls,<br />
When a student graduates,<br />
When information about a student needs to be updated,<br />
Once the search has located the record for that student,<br />
When information about a student needs to be retrieved,<br />
The world is full of information databases, many of them extremely large (imagine what the IRS has).<br />
When the number of elements gets very large,



Dictionary Implementations<br />

We will study a number of implementations:<br />

Search Trees<br />

<br />

:<br />

<br />

–<br />

–<br />

–<br />

Hash Tables



Binary Search Tree<br />
Recall the heap ordering property for binary heaps:<br />
Another ordering property is the binary search tree property: for each node x,<br />
all keys in the left subtree of x<br />
all keys in the right subtree of x<br />
A binary search tree (BST) is



Searching in a BST<br />
To search for a particular key in a binary search tree, we take advantage of the binary search tree property:<br />
search(x,k): // x is node where search starts<br />
----------- // k is key searched for<br />
if x is null then // stopping case for recursion<br />
    return "not found"<br />
else if k = the key of x then<br />
    return x<br />
else if k < the key of x then<br />
    return search(leftchild(x),k) // recursive call<br />
else // k > the key of x<br />
    return search(rightchild(x),k) // recursive call<br />
endif<br />
The top level call has x equal to<br />
In the previous tree, the search path for 17 is<br />
and the search path for 21 is<br />
Running Time:<br />
If BST is a chain, then



Searching in a BST (cont’d)<br />
Iterative version of search:<br />
search(x,k):<br />
------------<br />
while x != null do<br />
    if k = the key of x then<br />
        return x<br />
    else if k < the key of x then<br />
        x := leftchild(x)<br />
    else // k > the key of x<br />
        x := rightchild(x)<br />
    endif<br />
endwhile<br />
return "not found"<br />
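
A direct Java transcription of the iterative search might look like the sketch below. The class names BstNode and BstSearch, the int keys, the use of null for "not found", and the sample tree are all assumptions for this example (the sample tree is not the one from the slides).

```java
// Hypothetical node type with an int key.
class BstNode {
    int key;
    BstNode left, right;
    BstNode(int key) { this.key = key; }
}

class BstSearch {
    // Returns the node holding k, or null if k is not in the tree rooted at x.
    static BstNode search(BstNode x, int k) {
        while (x != null) {
            if (k == x.key) {
                return x;
            } else if (k < x.key) {
                x = x.left;       // k can only be in the left subtree
            } else {
                x = x.right;      // k can only be in the right subtree
            }
        }
        return null;              // "not found"
    }

    // A small hand-built binary search tree used only for demonstration.
    static BstNode sampleTree() {
        BstNode root = new BstNode(19);
        root.left = new BstNode(10);
        root.left.left = new BstNode(4);
        root.left.right = new BstNode(16);
        root.left.right.right = new BstNode(17);
        root.right = new BstNode(37);
        root.right.left = new BstNode(26);
        return root;
    }

    public static void main(String[] args) {
        BstNode root = sampleTree();
        System.out.println(search(root, 17) != null);  // path 19, 10, 16, 17
        System.out.println(search(root, 21) != null);  // path 19, 37, 26, then null
    }
}
```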

As in the recursive version,<br />
The comparison of the search key with the node key tells you at each level<br />
Running Time:



Searching in a Balanced BST<br />
If the tree is a complete binary tree, then the depth is<br />
and thus the search time is<br />
Binary trees with O(log n) depth are considered balanced: there is balance between<br />
You can have binary trees that are<br />
so that the depth is<br />
but might have a larger constant hidden in the big-oh.<br />
As an aside, a binary heap does not have<br />
Since nodes at the same level of the heap have no particular ordering relationship to each other, you will need to



Inserting into a BST<br />
To insert a key k into a binary search tree,<br />
Then<br />
insert(x,k):<br />
-----------<br />
if x = null then<br />
    make a new node containing k<br />
    return new node<br />
else if k = the key of x then<br />
    return x // key already exists; tree is unchanged<br />
else if k < the key of x then<br />
    leftchild(x) := insert(leftchild(x),k)<br />
    return x<br />
else // k > the key of x<br />
    rightchild(x) := insert(rightchild(x),k)<br />
    return x<br />
endif<br />

Insert called on node x<br />

unless x is null, <strong>in</strong> which case<br />

As a result, a child <strong>of</strong> a node<br />

Runn<strong>in</strong>g Time:
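
The recursive insert on this slide can be sketched in Python as below. The `Node` class is a hypothetical representation, and the sketch leaves the tree unchanged when the key is already present, which is an assumption about the intended behavior for duplicates.

```python
class Node:
    """Hypothetical BST node used only for this sketch."""
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(x, k):
    """Insert k into the subtree rooted at x; return that subtree's root."""
    if x is None:
        return Node(k)                 # empty spot found: the new node goes here
    if k < x.key:
        x.left = insert(x.left, k)
    elif k > x.key:
        x.right = insert(x.right, k)
    # k == x.key: assumed behavior is to leave the tree unchanged
    return x


root = None
for key in [19, 10, 22, 4, 16]:        # illustrative keys
    root = insert(root, key)
```

Because every call returns the root of the subtree it was given (or the new node), the caller can always reassign the child pointer without special cases.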



Inserting into a BST (cont’d)



Finding Min and Max in Binary Search Tree<br />

Fact: The smallest key in a binary search tree is found by<br />

Running Time:<br />

Guess how to find the largest key and how long it takes.<br />

Min is<br />

and max is
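
A minimal sketch of min and max, assuming the standard BST fact that smaller keys always lie to the left and larger keys to the right. The slide leaves the procedure blank, so the code below is filled in from that property rather than from the slide text, and the `Node` class is hypothetical.

```python
class Node:
    """Hypothetical BST node used only for this sketch."""
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right


def tree_min(x):
    """Follow left pointers from x (assumed non-None) as far as possible."""
    while x.left is not None:
        x = x.left
    return x


def tree_max(x):
    """Follow right pointers from x (assumed non-None) as far as possible."""
    while x.right is not None:
        x = x.right
    return x


# Illustrative tree: 19 at the root, 10 and 22 below it.
root = Node(19, Node(10, Node(4), Node(16)), Node(22, None, Node(26)))
```

Both loops walk a single root-to-descendant path, so their cost is bounded by the depth of the tree.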



Printing a BST in Sorted Order<br />

Cute tie-in between tree traversals and BSTs.<br />

Theorem: Inorder traversal of a binary search tree visits<br />

the nodes<br />

Inorder traversal on previous tree gives:<br />

Proof: Let’s look at some small cases and then use<br />

induction for the general case.<br />

Case 1:<br />

Case 2:<br />

Case n: Suppose true for trees of size<br />

Consider a tree of size



Printing a BST in Sorted Order (cont’d)<br />

L contains at most<br />

and R contains at most<br />

Inorder traversal:<br />

prints out<br />

then prints out<br />

then prints out<br />

2<br />

Running Time:



Tree Sort<br />

Does the previous theorem suggest yet another sorting algorithm<br />

to you?<br />

Tree Sort: Insert all the keys<br />

then do an<br />

Running Time:<br />

since each of the n inserts takes
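
One way to picture tree sort is the sketch below: insert every key into an initially empty, unbalanced BST and then read the keys back with an inorder traversal. The `Node` class and the choice to send duplicate keys to the right subtree are assumptions of this sketch, not something the slides specify.

```python
class Node:
    """Hypothetical BST node used only for this sketch."""
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None


def insert(x, k):
    if x is None:
        return Node(k)
    if k < x.key:
        x.left = insert(x.left, k)
    else:                          # duplicates go right so none are lost
        x.right = insert(x.right, k)
    return x


def inorder(x, out):
    """Append the keys of the subtree rooted at x to out, in sorted order."""
    if x is not None:
        inorder(x.left, out)
        out.append(x.key)
        inorder(x.right, out)
    return out


def tree_sort(keys):
    root = None
    for k in keys:
        root = insert(root, k)
    return inorder(root, [])
```

With an unbalanced tree, already-sorted input produces a chain, which is the bad case for the insert phase.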



Finding Successor in a BST<br />

The successor of a node x in a BST is<br />

Case 1: If x has a right child, then the successor of x<br />

is the<br />

follow x’s right pointer, then follow left pointers until<br />

there are no more.<br />

Path to find successor of 19 is



Finding Successor in a BST (cont’d)<br />

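[Figure: binary search tree with root 19; 19 has children 10 and 22; 10 has children 4 and 16; 22 has children 20 and 26; 16 has children 13 and 17; 26 has right child 27.]<br />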

Case 2: If x does not have a right child, then find the<br />

Path to find successor of 17 is<br />

If you never find an ancestor that is larger than x’s key,<br />

then<br />

Path to try to find successor of 27 is<br />

Running Time:
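
The two successor cases can be sketched in Python as follows. The `Node` class with a `parent` field, the helper `find`, and the insertion order used to build the tree are assumptions of this sketch; the keys are the ones that appear on this slide. Case 2 is implemented by climbing parent pointers until the climb arrives from a left child, which is one common way to locate the first ancestor with a larger key.

```python
class Node:
    """Hypothetical BST node with a parent pointer."""
    def __init__(self, key):
        self.key = key
        self.left = self.right = self.parent = None


def insert(root, k):
    """Iterative BST insert that also maintains parent pointers."""
    new = Node(k)
    if root is None:
        return new
    x = root
    while True:
        if k < x.key:
            if x.left is None:
                x.left, new.parent = new, x
                return root
            x = x.left
        else:
            if x.right is None:
                x.right, new.parent = new, x
                return root
            x = x.right


def find(x, k):
    while x is not None and x.key != k:
        x = x.left if k < x.key else x.right
    return x


def successor(x):
    if x.right is not None:          # Case 1: leftmost node of the right subtree
        x = x.right
        while x.left is not None:
            x = x.left
        return x
    y = x.parent                     # Case 2: climb until we arrive from a left child
    while y is not None and x is y.right:
        x, y = y, y.parent
    return y                         # None means x holds the largest key


root = None
for k in [19, 10, 22, 4, 16, 20, 26, 13, 17, 27]:
    root = insert(root, k)
```

In either case the search follows a single path up or down the tree, so the work is bounded by the tree's depth.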



Finding Predecessor in a BST<br />

The predecessor of a node x in a BST is the node<br />

whose<br />

To find it,<br />

Case 1: If x has a left child, then the predecessor of x<br />

follow x’s left pointer, then follow right pointers until<br />

there are no more.<br />

Case 2: If x does not have a left child, then find the<br />

lowest ancestor of x<br />

(I.e., follow parent pointers from x until reaching a key<br />

smaller than x’s.)<br />

If you never find an ancestor that is smaller than x’s<br />

key, then<br />

Running Time:



Deleting a Node from a BST<br />

Case 1: x is a leaf. Then<br />

Case 2: x has only one child. Then<br />

Case 3: x has two children. Use the same strategy as<br />

binary heap: Instead of removing the root node,<br />

1. Find<br />

2. Delete<br />

3. Replace<br />

Running Time:



Deleting a Node from a BST (cont’d)



Balanced Search Trees<br />

We would like to come up with a way to keep a binary<br />

search tree “balanced”, so that the depth is<br />

and thus the running time for the BST operations will<br />

be<br />

There are a number of schemes that have been devised.<br />

We will briefly look at a few of them.<br />

They all require much more complicated algorithms<br />

for insertion and deletion, in order to<br />

The algorithms for searching, finding min, max, predecessor<br />

or successor, are essentially the same as for<br />

Next few slides give the main idea for the definitions<br />

of the trees, but not why the definitions give O(log n)<br />

depth, and not how the algorithms for insertion and<br />

deletion work.



AVL Trees<br />

An AVL tree is a binary search tree such that for each<br />

node, the heights of the left and right subtrees of the<br />

node<br />

Theorem: The depth of an AVL tree is<br />

When inserting or deleting a node in an AVL tree, if<br />

you detect that the AVL tree property has been violated,<br />

then you



Red-Black Trees<br />

A red-black tree is a binary search tree in which<br />

every “real” node is given<br />

every node is colored<br />

– every leaf node is<br />

– if a node is red, then both its children are<br />

– every path from a node to a leaf contains<br />

From a fixed node, all paths from that node to a leaf<br />

differ in length by<br />

Theorem: The depth of a red-black tree is<br />

Insert and delete algorithms are quite involved.



B-Trees<br />

The AVL tree and red-black tree allowed some variation<br />

in<br />

An alternative idea is to make sure that all root-to-leaf<br />

paths have<br />

and allow<br />

The definition of a B-tree uses a parameter m:<br />

every leaf<br />

the root has<br />

every non-root node has<br />

Keys are placed into nodes like this:<br />

Each non-leaf node has<br />

Each leaf node has<br />

The keys within a node are



B-Trees (cont’d)<br />

And we require the extended search tree property:<br />

For each node x, the i-th key in x is<br />

and is<br />

B-trees are extensively used in the real world, for instance,<br />

in database applications. In practice,<br />

Theorem: The depth of a B-tree is<br />

Insert and delete algorithms are quite involved.



Tries<br />

In the previous search trees, each key is<br />

except for their<br />

For some kinds of keys, one key might be a<br />

For example, if the keys are strings, then the key “at”<br />

is a prefix of the key “atlas”.<br />

The next kind of tree takes advantage of<br />

to store them more efficiently.<br />

A trie is a (not necessarily binary) tree in which<br />

each node corresponds to<br />

prefix for each node<br />

The trie storing “a”, “ale”, “ant”, “bed”, “bee”, “bet”:



Inserting into a Trie<br />

To insert into a trie:<br />

insert(x,s): // x is node, s is string to insert<br />
------------<br />
if length(s) = 0 then<br />
    mark x as holding a complete key<br />
else<br />
    c := first character in s<br />
    if no outgoing edge from x is labeled with c then<br />
        create a new child node of x<br />
        label the edge to the new child node with c<br />
        put the edge in the correct sorted order<br />
            among all of x’s outgoing edges<br />
    endif<br />
    x := child of x reached by edge labeled c<br />
    s := result of removing first character from s<br />
    insert(x,s)<br />
endif<br />

Start the recursion<br />

To insert “an” and “beep”:<br />

[Figure: the trie storing “a”, “ale”, “ant”, “bed”, “bee”, “bet”, used to trace the two insertions.]



Searching in a Trie<br />

To search in a trie:<br />

search(x,s): // x is node, s is string to search for<br />
------------<br />
if length(s) = 0 then<br />
    if x holds a complete key then return x<br />
    else return null // s is not in the trie<br />
else<br />
    c := first character in s<br />
    if no outgoing edge from x is labeled with c then<br />
        return null // s is not in the trie<br />
    else<br />
        x := child of x reached by edge labeled c<br />
        s := result of removing first character from s<br />
        return search(x,s)<br />
    endif<br />
endif<br />

Start the recursion<br />

To search for “art” and “bee”:<br />

[Figure: the trie storing “a”, “ale”, “ant”, “bed”, “bee”, “bet”, used to trace the two searches.]
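
The trie insert and search pseudocode can be sketched together in Python as below. The `TrieNode` class is hypothetical, and a dictionary keyed by character replaces the sorted list of labeled edges, so this sketch does not keep edges in sorted order as the pseudocode does.

```python
class TrieNode:
    """Hypothetical trie node used only for this sketch."""
    def __init__(self):
        self.children = {}     # edge label (one character) -> child node
        self.is_key = False    # True if this node holds a complete key


def insert(x, s):
    if len(s) == 0:
        x.is_key = True
    else:
        c = s[0]
        if c not in x.children:
            x.children[c] = TrieNode()    # create the missing edge and child
        insert(x.children[c], s[1:])


def search(x, s):
    if len(s) == 0:
        return x if x.is_key else None
    c = s[0]
    if c not in x.children:
        return None                       # s is not in the trie
    return search(x.children[c], s[1:])


root = TrieNode()
for word in ["a", "ale", "ant", "bed", "bee", "bet"]:
    insert(root, word)
insert(root, "an")      # only marks an existing node as a complete key
insert(root, "beep")    # adds one new node below "bee"
```

Searching for "art" fails at the second character because the node for "a" has no edge labeled "r", while "bee" ends on a node marked as a complete key.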



Hash Table Implementation of Dictionary ADT<br />

Another implementation of the Dictionary ADT is a<br />

Hash tables support the operations<br />

<br />

<br />

<br />

with<br />

This is a significant advantage over even balanced search<br />

trees, which have average times of<br />

The disadvantage of hash tables is that<br />

and printing all elements in sorted order takes



Main Idea of Hash Table<br />

Main idea: exploit random access feature of arrays:<br />

the i-th entry of array A can be accessed<br />

Simple example: Suppose all keys are in the range<br />

Then store elements in an array A with<br />

Initialize all entries to some empty indicator.<br />

To insert x with key k:<br />

To search for key k:<br />

To delete element with key k:<br />

All times are<br />

But this idea does not scale well.



Hash Functions<br />

Suppose<br />

elements are<br />

school has<br />

keys are<br />

Since there are 1 billion possible SSNs, we need an<br />

array of length 1 billion. And most of it will be wasted,<br />

since only a 40,000/1,000,000,000 = 1/25,000 fraction is<br />

nonempty.<br />

Instead, we need a way to<br />

Let M be the size of the array we are willing to provide.<br />

Use a hash function, h, to<br />

Then h maps key values to integers in the range



Simple Hash Function Example<br />

Suppose keys are integers. Let the hash function be<br />

h(k) = k mod M. Notice that this always gives you<br />

something in the range<br />

To insert x with key k:<br />

To search for element with key k:<br />

To delete element with key k:<br />

All times are<br />

assuming the hash function can be computed in constant<br />

time.<br />

The key to making this work is to



Collisions<br />

In reality, any hash function will have collisions: when<br />

two different keys<br />

This is inevitable, since the hash function is squashing<br />

down a large domain into a small range.<br />

For example, if h(k) = k mod M, then<br />

since they both hash to<br />

What should you do when you have a collision? Two<br />

common solutions are<br />

1.<br />

2.



Chaining<br />

Keep all data items that hash to the same array location<br />

in a<br />

to insert element x with key k:<br />

to search for element with key k:<br />

to delete element with key k:<br />

Worst case times, assuming computing h is constant:<br />

insert:<br />

search and delete:<br />

Worst case is if all n elements
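
A minimal sketch of chaining, assuming integer keys and the hash function h(k) = k mod M from the earlier slide. The class name `ChainedHashTable` is hypothetical, and each chain is a Python list rather than a linked list. Unlike a plain insert at the front of a chain, this `insert` first scans the chain so that a repeated key overwrites its old value, which makes it proportional to the chain length.

```python
class ChainedHashTable:
    """Hypothetical chained hash table for integer keys."""
    def __init__(self, M):
        self.M = M
        self.buckets = [[] for _ in range(M)]   # one chain per array slot

    def _h(self, k):
        return k % self.M

    def insert(self, k, value):
        chain = self.buckets[self._h(k)]
        for i, (key, _) in enumerate(chain):
            if key == k:
                chain[i] = (k, value)            # overwrite an existing key
                return
        chain.append((k, value))

    def search(self, k):
        for key, value in self.buckets[self._h(k)]:
            if key == k:
                return value
        return None

    def delete(self, k):
        chain = self.buckets[self._h(k)]
        for i, (key, _) in enumerate(chain):
            if key == k:
                del chain[i]
                return True
        return False


table = ChainedHashTable(13)
table.insert(14, "a")
table.insert(27, "b")    # 14 and 27 both hash to slot 1 when M = 13
```

If every key lands in the same slot, that one chain holds all n elements and search degenerates to a linear scan.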



Good Hash Functions for Chaining<br />

Intuition: Hash function should<br />

More formally:<br />

Impractical to check in practice since<br />

For example: Suppose the symbol table in a compiler<br />

is implemented with a hash table. The compiler writer<br />

cannot know in advance which variable names will appear<br />

in each program to be compiled.<br />

Heuristics are used to approximate this condition:



Good Hash Functions for Chaining (cont’d)<br />

Some issues to consider in choosing a hash function:<br />

Exploit<br />

For symbol table example, take into account the kinds<br />

of variable names that people often choose (e.g.,<br />

x1).<br />

Hash function should depend on<br />

For example: if the keys are English words, it is not<br />

a good idea to hash on the first letter, since many<br />

words begin with S and few with X.



Average Case Analysis of Chaining<br />

Define load factor of hash table with M entries and n<br />

keys to be $\alpha = n/M$.<br />

Assume a hash function that is ideal for chaining<br />

Fact: Average length of each linked list is<br />

The average running time for chaining:<br />

Insert:<br />

Unsuccessful Search:<br />

O(1) time to compute h(k); $\alpha$ items, on average, in<br />

the linked list are checked until discovering that k is<br />

not present.<br />

Successful Search:<br />

O(1) time to compute h(k); on average, key being<br />

sought is in middle of linked list, so $\alpha/2$ comparisons<br />

needed to find k.<br />

Delete:<br />

For these times to be O(1), $\alpha$ must be O(1), so n cannot<br />

be too much larger than



Open Addressing<br />

With this scheme, there are<br />

Instead,<br />

If there is a collision, you have to probe the table –<br />

You must pick a pattern that you will use to probe the<br />

table.<br />

The simplest pattern is to<br />

and then check<br />

This is called<br />

If h(k) = 7, the probe sequence will be



Clustering<br />

A problem with linear probing:<br />

If an insert probe sequence begins in a cluster,<br />

<br />

<br />

To reduce clustering,<br />

to skip over some locations, so locations are not checked<br />

There are various schemes for how to choose the increments;<br />

in fact, the increment to use can be



Clustering (cont’d)<br />

If the probe sequence starts at 7 and the probe increment<br />

is 4, then the probe sequence will be<br />

Warning! The probe increment must be<br />

otherwise you will not search all locations.<br />

For example, suppose you have table size 9 and increment<br />

3. You will only search
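
The effect of the probe increment can be checked with a small sketch. The function names are hypothetical, and the test `gcd(increment, M) == 1` encodes the standard number-theory fact that a fixed increment visits every slot exactly when it shares no common factor with the table size; the slide's own wording of the condition is left blank.

```python
from math import gcd


def probe_sequence(start, increment, M):
    """Slots visited, in order, until the sequence returns to its start."""
    seq = []
    slot = start % M
    while slot not in seq:
        seq.append(slot)
        slot = (slot + increment) % M
    return seq


def covers_table(increment, M):
    """True when a fixed increment eventually probes every slot."""
    return gcd(increment, M) == 1
```

With table size 9 and increment 3, a sequence starting at slot 0 visits only slots 0, 3 and 6, whereas increment 4 reaches all nine slots.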



Double Hashing<br />

Even when “non-linear” probing is used, it is still true<br />

that<br />

To get around this problem, use<br />

1. One hash function, $h_1$, is used to determine<br />

2. A second hash function, $h_2$, is used to determine<br />

If the hash functions are chosen properly,



Double Hashing Example<br />

Let $h_1(k) = k \bmod 13$ and $h_2(k) = 1 + (k \bmod 11)$.<br />

To insert 14: start probing at<br />

Probe increment is<br />

Probe sequence is<br />

To insert 27: start probing at<br />

Probe increment is<br />

Probe sequence is<br />

To search for 18: start probing at<br />

Probe increment is<br />

Probe sequence is
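
The probe sequences for this example can be computed with a short sketch. It assumes the table size is 13 (suggested by the first hash function, k mod 13), that the first hash function gives the starting slot, and that the second gives the increment added, modulo the table size, on each further probe.

```python
M = 13   # assumed table size, implied by h1(k) = k mod 13


def h1(k):
    return k % 13


def h2(k):
    return 1 + (k % 11)


def probe_sequence(k):
    """All M slots in the order double hashing would try them for key k."""
    start, step = h1(k), h2(k)
    return [(start + i * step) % M for i in range(M)]
```

Under these assumptions, keys 14 and 27 both start at slot 1 but use increments 4 and 6, so their sequences diverge after the first probe: 1, 5, 9, 0, ... versus 1, 7, 0, 6, ... Key 18 starts at slot 5 with increment 8.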



Deleting with Open Addressing<br />

Open addressing has another complication:<br />

to insert:<br />

to search:<br />

Suppose we use linear probing. Consider this sequence:<br />

Insert $k_1$, where $h(k_1) = 3$, at location 3.<br />

Insert $k_2$, where $h(k_2) = 3$, at location 4.<br />

Insert $k_3$, where $h(k_3) = 3$, at location 5.<br />

Delete $k_2$ from location 4 by setting location 4 to<br />

empty.<br />

Search for $k_3$.<br />

Solution: when an element is deleted, instead of marking<br />

the slot as empty,<br />

Then the search algorithm needs to continue searching<br />

if it finds one of those slots.
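
The scenario on this slide can be reproduced with a small linear-probing sketch that uses a special "deleted" marker. The class name, the two sentinel objects, and the keys 3, 16 and 29 (all congruent to 3 modulo 13) are assumptions chosen to mirror the three keys that hash to location 3.

```python
EMPTY = object()     # slot never used
DELETED = object()   # slot once held a key that was later deleted


class LinearProbingSet:
    """Hypothetical open-addressing set of integers with linear probing."""
    def __init__(self, M):
        self.M = M
        self.slots = [EMPTY] * M

    def _probe(self, k):
        start = k % self.M
        for i in range(self.M):
            yield (start + i) % self.M

    def contains(self, k):
        for j in self._probe(k):
            if self.slots[j] is EMPTY:
                return False          # a truly empty slot ends the search
            if self.slots[j] is not DELETED and self.slots[j] == k:
                return True           # DELETED slots are skipped over
        return False

    def insert(self, k):
        if self.contains(k):
            return
        for j in self._probe(k):
            if self.slots[j] is EMPTY or self.slots[j] is DELETED:
                self.slots[j] = k     # deleted slots may be reused
                return
        raise RuntimeError("table is full")

    def delete(self, k):
        for j in self._probe(k):
            if self.slots[j] is EMPTY:
                return
            if self.slots[j] is not DELETED and self.slots[j] == k:
                self.slots[j] = DELETED
                return


t = LinearProbingSet(13)
for key in (3, 16, 29):   # all three hash to slot 3 when M = 13
    t.insert(key)
t.delete(16)              # slot 4 becomes DELETED, not EMPTY
```

Had slot 4 been reset to empty, the search for 29 would stop there and wrongly report that the key is absent.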



Good Hash Functions for Open Addressing<br />

An ideal hash function for open addressing would satisfy<br />

an even stronger property than that for chaining,<br />

namely:<br />

This is even harder to achieve in practice than the ideal<br />

property for chaining.<br />

A good approximation is double hashing with this scheme:<br />

<br />

Generalizes the earlier example.



Average Case Analysis of Open Addressing<br />

In this situation, the load factor $\alpha = n/M$ is always<br />

less than 1:<br />

Assume that there is always at least one empty slot.<br />

Assume that the hash function ensures that each key is<br />

equally likely to have each permutation of<br />

$\{0, 1, \ldots, M-1\}$ as its probe sequence.<br />

Average case running times:<br />

Unsuccessful Search:<br />

Insert:<br />

Successful Search:<br />

Delete:<br />

The reasoning behind these formulas requires more sophisticated<br />

probability than for chaining.



Sanity Check for Open Addressing Analysis<br />

The time for searches should<br />

The formula for unsuccessful search is<br />

As n gets closer to M,<br />

so<br />

so<br />

At the extreme, when $n = M - 1$, the formula $\frac{1}{1-\alpha} = M$, meaning that
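
The arithmetic behind the extreme case can be spelled out. This assumes the load factor is $\alpha = n/M$ and that the unsuccessful-search formula is $1/(1-\alpha)$, as the surviving fragment on this slide suggests:

```latex
\[
n = M - 1
\;\Longrightarrow\;
\alpha = \frac{M-1}{M}
\;\Longrightarrow\;
1 - \alpha = \frac{1}{M}
\;\Longrightarrow\;
\frac{1}{1-\alpha} = M .
\]
```

For instance, with $M = 13$ and $n = 12$, the load factor is $12/13$ and the formula evaluates to 13.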



Sorting<br />

Insertion Sort:<br />

– Consider<br />

– Shift<br />

– Insert<br />

– Worst-case time is<br />

Treesort:<br />

– Insert<br />

– Then do<br />

– For a basic BST, worst-case time is<br />

but average time is<br />

– For a balanced BST, worst-case time is<br />

although code is more complicated.



Sorting (cont’d)<br />

Heapsort:<br />

– Insert<br />

– Then<br />

– Worst-case time is<br />

Mergesort: Apply the idea of<br />

– Split<br />

– Recursively<br />

– Recursively<br />

– Then<br />

– Worst-case time is<br />

however, it requires more space.
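
As a concrete reference point, here is a sketch of the usual top-down mergesort: split the list in half, sort each half recursively, then merge the two sorted halves. The slide's own step descriptions are blank, so this follows the textbook algorithm rather than the slide's wording. The function names are hypothetical, and the sketch returns a new list instead of sorting in place.

```python
def merge(left, right):
    """Merge two sorted lists into one sorted list."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:      # <= keeps equal keys in their original order
            out.append(left[i])
            i += 1
        else:
            out.append(right[j])
            j += 1
    out.extend(left[i:])             # at most one of these two is non-empty
    out.extend(right[j:])
    return out


def merge_sort(a):
    if len(a) <= 1:
        return list(a)
    mid = len(a) // 2
    return merge(merge_sort(a[:mid]), merge_sort(a[mid:]))
```

The temporary lists built by `merge` are where the extra space mentioned on the slide goes.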



Object-Oriented Software Engineering<br />

References:<br />

Standish textbook, Appendix C<br />

Developing Java Software, by Russel Winder and<br />

Graham Roberts, John Wiley & Sons, 1998 (ch 8-9).<br />

Outline of material:



Small Scale vs. Large Scale Programming<br />

Programming in the small: programs done by<br />

whose length is<br />

Programming in the large: projects consisting of<br />

and producing<br />

Obviously the complications are much greater here.<br />

The field of software engineering is mostly oriented<br />

toward<br />

However, the principles still hold (although simplified)<br />

for programming in the small. It’s worth understanding<br />

these principles so that



Object-Oriented Software Engineering<br />

Software engineering studies<br />

Object-oriented software engineering uses<br />

Why object-oriented?<br />

use of abstractions to<br />

benefits of encapsulation to<br />

power of inheritance to<br />

Experience has shown that object-oriented software engineering<br />

helps create robust reliable programs with<br />

promotes the development of programs by



Object-Oriented Software Engineering (cont’d)<br />

Solutions to specific problems tend to be fragile and<br />

short-lived:<br />

To minimize effects of requirement changes<br />

instead of just focusing on<br />

Usually the problem domain is fairly stable, whereas a<br />

If you capture the problem domain as the core of<br />

your design, then the code is likely to be<br />

More traditional structured programming tends to lead<br />

to a



Object-Oriented Software Engineering (cont’d)<br />

In OO analysis and design, identify<br />

and model them as<br />

Leads to<br />

go downwards to<br />

go upwards to<br />

This approach tends to lead to<br />

and<br />

For instance, when the requirements change, you may<br />

have all the basic abstractions right but you<br />

Aim for<br />

which are specialized by inheritance to provide



Software Life Cycle<br />

inception:<br />

– requirements:<br />

elaboration:<br />

– analysis:<br />

– design:<br />

– identify reuse:<br />

implementation<br />

–<br />

–<br />

–<br />

testing<br />

delivery and maintenance



Software Life Cycle (cont’d)<br />

Life cycle is not followed linearly;<br />

An ideal way to proceed is by<br />

implement<br />

review<br />

decide<br />

proceed<br />

continue<br />

This supports<br />

letting you try alternatives and



Requirements<br />

Decide what the program is supposed to do<br />

Harder than it sounds.<br />

Ask the user<br />

<br />

<br />

Involve the user in reviewing the requirements when<br />

they are produced and the prototypes developed.<br />

Typically, requirements are organized<br />

Helpful to construct scenarios, which describe



Requirements (cont’d)<br />

An example scenario to look up a phone number:<br />

1. select<br />

2. enter<br />

3.<br />

4. program computes, to<br />

(do NOT specify data structure to be used at this<br />

level)<br />

5.<br />

Construct as many scenarios as needed until you feel<br />

comfortable, and have gotten feedback from the user,<br />

that<br />

This part of the software life cycle is no different for<br />

object-oriented software engineering than for non-object-oriented.



Object-Oriented Analysis and Design<br />

Main objective:<br />

Analysis and design are two ends of a spectrum: Analysis<br />

focuses more on the<br />

while design focuses more on the<br />

For large scale projects, there might be a real distinction:<br />

for example,<br />

might be required to implement<br />

For small scale projects, there is typically no distinction<br />

between analysis and design:



Object-Oriented Analysis and Design (cont’d)<br />

To decide on the classes:<br />

Study<br />

Look for nouns in the requirements:<br />

These will probably turn into<br />

and/or<br />

See how the requirements specify interactions between<br />

things (e.g., each student has a GPA, each<br />

course has a set of enrolled students).<br />

Use an analysis method:<br />

(Particularly aimed at large scale projects.)



An Example OO Analysis Method<br />

CRC (Class, Responsibility, Collaboration): It clearly<br />

identifies the Classes, what the Responsibilities are of<br />

each class, and how the classes Collaborate (interact).<br />

In the CRC method, you draw class diagrams:<br />

each class is<br />

–<br />

–<br />

–<br />

if class 1 is a subclass of class 2, then<br />

if an object of class 1 is part of (an instance variable<br />

of) class 2, then<br />

if objects of class 1 need to communicate with objects<br />

of class 2, then<br />

The arrows and lines can be annotated to indicate the<br />

number of objects involved, the role they play, etc.



CRC Example<br />

To model a game with several players who take turns<br />

throwing a cup containing dice, in which some scoring<br />

system is used to determine the best score:<br />

This is a diagram of the<br />

not the<br />

Object diagrams are trickier since<br />

Double-check that the class diagram is consistent with<br />

requirements scenarios.



Object-Oriented Analysis and Design (cont’d)

While fleshing out the design, after identifying what the different methods of the classes should be, figure out

This means deciding what

Do not fall in love with one particular solution (such as the first one that occurs to you). Generate

and then try to

Do not commit to a particular solution too early in the process. Concentrate on

The use of ADTs assists in this aspect.
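
One way to see how an ADT postpones commitment is to write client code against an interface and keep the representation open. The sketch below is only an illustration: the `IntStack` interface, its two implementations, and `StackClient` are hypothetical names, not part of the course material.

```java
import java.util.ArrayList;

// Hypothetical ADT: the operations are fixed, the representation is not.
interface IntStack {
    void push(int x);
    int pop();          // assumes the stack is non-empty
    boolean isEmpty();
}

// One candidate representation: a growable array.
class ArrayIntStack implements IntStack {
    private final ArrayList<Integer> items = new ArrayList<>();
    public void push(int x) { items.add(x); }
    public int pop() { return items.remove(items.size() - 1); }
    public boolean isEmpty() { return items.isEmpty(); }
}

// Another candidate representation: a singly linked list.
class LinkedIntStack implements IntStack {
    private static class Node {
        final int value;
        final Node next;
        Node(int value, Node next) { this.value = value; this.next = next; }
    }
    private Node top;
    public void push(int x) { top = new Node(x, top); }
    public int pop() { int v = top.value; top = top.next; return v; }
    public boolean isEmpty() { return top == null; }
}

class StackClient {
    // Depends only on the ADT, so either implementation can be swapped in later.
    static int sumAndClear(IntStack s) {
        int sum = 0;
        while (!s.isEmpty()) sum += s.pop();
        return sum;
    }
}
```

Because `sumAndClear` never mentions a concrete class, the choice between the two representations can be deferred or revisited without touching the client.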



Verification and Correctness Proofs

Part of the design includes

You should have some convincing argument as to why these algorithms are correct.

In many cases, it will be obvious:

But sometimes you might be coming up with your own algorithm, or

In these cases, it’s important to check what you are doing!



Verification and Correctness Proofs (cont’d)

The Standish book describes one particular way to prove correctness of small programs, or program fragments.

The important lessons are:

It is possible to

Formalisms can help you to

Spending a lot of time thinking about your program, no matter what formalism, will

These approaches are impossible to do

For large programs, there are research efforts aimed at

i.e., programs that

Generally automatic verification is slow and cumbersome, and requires some specialized skills.



Verification and Correctness Proofs (cont’d)

An alternative approach to program verification is

Instead of trying to verify actual code,

Represent the algorithm in

then

Of course, you might make a mistake when translating your pseudocode into Java, but the proving will be much more manageable than the verification.
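
As a small example of arguing correctness at the algorithm level and then carrying the argument into Java, the sketch below finds the largest element of an array and records a loop invariant in comments. The class and method names are illustrative assumptions, and the invariant-style argument is a common technique, not necessarily the one used in the Standish book.

```java
class MaxFinder {
    // Returns the largest element of a; assumes a.length >= 1.
    static int max(int[] a) {
        int best = a[0];
        // Invariant: at the start of each iteration, best is the maximum of a[0..i-1].
        // It holds initially (i = 1, best = a[0]) and every iteration preserves it.
        for (int i = 1; i < a.length; i++) {
            if (a[i] > best) {
                best = a[i];
            }
        }
        // On exit i == a.length, so best is the maximum of the whole array.
        return best;
    }
}
```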



Implementation

The design is now fleshed out to the level of code:

As the code is written, document the key design decisions, implementation choices, and any non-obvious aspects of the code.

Software reuse: Use library classes as appropriate (e.g., Stack, Vector, Date, Hashtable). Kinds of reuse:

But sometimes modifications can be more time consuming than starting from scratch.



Testing and Debugging: The Limitations

Testing cannot prove that your program is correct.

It is impossible to test a program on every single input, so

Even if you could apply some kind of program verification to your program,

And in fact, how do you know that your requirements

However, testing still serves a worthwhile, pragmatic purpose.



Test Cases, Plans and Logs

Run the program on various test cases.

Test cases should

More specifically,

test on

test on

test on

Organize your test cases according to a

Purposes:

make it clear

ensure that

Results of running a set of tests is a

After fixing a bug, you must

(Winder and Roberts call this the Principle of Maximum Paranoia.)
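
Keeping test cases in a table makes it cheap to rerun the whole set after every change. The following is a minimal sketch under assumptions: `Gcd.gcd` is a hypothetical method under test, and the cases and log format are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class Gcd {
    // Hypothetical code under test: greatest common divisor of non-negative ints.
    static int gcd(int a, int b) {
        while (b != 0) {
            int t = a % b;
            a = b;
            b = t;
        }
        return a;
    }
}

class GcdRegression {
    // Each row is {a, b, expected}: typical, boundary, and equal-input cases.
    static final int[][] CASES = {
        {12, 18, 6},
        {7, 13, 1},
        {0, 5, 5},
        {5, 0, 5},
        {9, 9, 9},
    };

    // Runs every case and returns a log with one line per case.
    static List<String> runAll() {
        List<String> log = new ArrayList<>();
        for (int[] c : CASES) {
            int actual = Gcd.gcd(c[0], c[1]);
            String status = (actual == c[2]) ? "PASS" : "FAIL";
            log.add(status + " gcd(" + c[0] + ", " + c[1] + ") = " + actual);
        }
        return log;
    }
}
```

Calling `GcdRegression.runAll()` again after each fix reruns all cases, not only the one that exposed the bug.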



Kinds of Testing

Unit testing:

Integration testing:

Two approaches to integration testing:

Bottom-up testing

Then progress to the next level up: those methods and classes that only use the bottom level ones already tested.

Use a driver to test combinations of the bottom two layers.

Proceed until



Kinds of Testing (cont’d)

Top down testing proceeds in the opposite direction, making

Reasons to do top down testing:

to allow software development to

if you have modules that are mutually dependent, e.g., X uses Y, Y uses Z, and Z uses X. You can



Other Approaches to Debugging

In addition to testing, another approach to debugging a program is to

A third approach is called a

Some companies give your (group’s) code to another group, whose job is to try to make your code break!



Maintenance and Documentation

Maintenance includes:

Most often, the person (or people) doing the maintenance are NOT the one(s) who originally wrote the program.

There are (at least) two kinds of documentation, both of which need to be updated during maintenance:

internal documentation,

external documentation,



Maintenance and Documentation (cont’d)

In addition to good documentation, a clean and easily modifiable structure is needed for effective maintenance,

If changes are made in an ad hoc, kludgey way (either because the maintainer does not understand the underlying design or because the design is poor), the program will

Trying to fix one problem causes something else to break, so in desperation you put in some jumps (spaghetti code) to try to avoid this, etc.

Eventually it may be better to replace the program with



Measurement and Tuning

Experience has shown:

These observations suggest that optimizing your program can pay big benefits, but that it is smarter to

How can you figure out where your program is spending its time?

use a tool called an



Measurement and Tuning (cont’d)

Things you can do to speed up a program:

find

replace

replace

take advantage of

Don’t do things that are stupidly slow in your program from the beginning.

On the other hand, don’t go overboard in supposed optimizations (that might hurt readability) unless you



Software Reuse and Bottom-up Programming

The bottom line from section C.7 in Standish is:

the effort required to build software is

making use of reusable components can

So it makes lots of sense to try to reuse software. Of course, there are costs associated with reuse:

Using lots of reusable components leads to more bottom-up, rather than top down, programming. Or perhaps, more appropriately,



Design Patterns

As you gain experience, you will learn to recognize good and bad design and build up

Why not try to exploit other people’s experience in this area as well?

A design pattern captures a component of a complete design that has been observed to

It provides both a solution to a problem and information about them.

There is a growing literature on design patterns, especially for object oriented programming. It is worthwhile to become familiar with it. For instance, search the WWW for “design pattern” and see what you get.



File Structures

A file is

Why on mass storage?

The data is subdivided into

Each record contains a number of

One (or more) field is the

Issue:

We will discuss sequential files, indexed files, and hashed files.



Sequential Files

Records are conceptually organized in

The actual storage might or might not be sequential:

On a tape,

On a disk,

Convenient way to batch (group together) a number of updates:

Store the

Sort the

Scan through

Not a convenient organization for accessing a particular record quickly.
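
A common way to batch updates against a sequential file, which may or may not match the exact steps intended above, is to sort the pending updates by key and merge them with the key-sorted master file in a single pass. The sketch below uses in-memory lists in place of files; the `Rec` and `BatchUpdate` names and the "replace or insert" update rule are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical record: an integer key plus one data field.
class Rec {
    final int key;
    final String data;
    Rec(int key, String data) { this.key = key; this.data = data; }
}

class BatchUpdate {
    // Merges a key-sorted master list with a key-sorted list of updates in one
    // sequential pass. An update with a matching key replaces the master record;
    // an update with a new key is inserted. Assumes keys are unique in each list.
    static List<Rec> apply(List<Rec> master, List<Rec> updates) {
        List<Rec> out = new ArrayList<>();
        int m = 0, u = 0;
        while (m < master.size() && u < updates.size()) {
            Rec a = master.get(m), b = updates.get(u);
            if (a.key < b.key) { out.add(a); m++; }
            else if (a.key > b.key) { out.add(b); u++; }
            else { out.add(b); m++; u++; }
        }
        while (m < master.size()) out.add(master.get(m++));
        while (u < updates.size()) out.add(updates.get(u++));
        return out;
    }
}
```

Each record of both inputs is read exactly once and in order, which suits tape-like storage; the cost is that a single lookup still requires scanning.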



Indexed Files

Sequential search is even slower on disk/tape than in main memory. Try to improve performance using

An index for a file is a

Typically the key field is

The index can be organized as a list, a search tree, a hash table, etc. To find a particular record:

Multiple indexes, one per key field, allow



Hashed Files

An alternative to storing the index as a hash table is to

Instead, hash on the key to find the address of the desired record and

The usual hashing considerations arise.
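
To make "hash on the key to find the address" concrete, here is a minimal sketch that simulates a hashed file as a fixed number of record slots. The `HashedFile` class, the modulo hash, and the linear-probing collision rule are all assumptions chosen for illustration; a real file would turn the slot number into a byte offset and seek to it.

```java
// Simulates a hashed file as a fixed number of record slots. A real file would
// compute a byte offset (slot * recordSize) and seek there; arrays stand in here.
class HashedFile {
    private final int[] keys;        // key stored in each slot
    private final String[] data;     // record data stored in each slot
    private final boolean[] used;    // whether the slot holds a record

    HashedFile(int slots) {
        keys = new int[slots];
        data = new String[slots];
        used = new boolean[slots];
    }

    // Home slot ("address") for a key.
    int address(int key) {
        return Math.floorMod(key, keys.length);
    }

    // Stores a record, probing linearly past occupied slots. Returns false if full.
    boolean put(int key, String value) {
        int start = address(key);
        for (int i = 0; i < keys.length; i++) {
            int slot = (start + i) % keys.length;
            if (!used[slot] || keys[slot] == key) {
                keys[slot] = key;
                data[slot] = value;
                used[slot] = true;
                return true;
            }
        }
        return false;
    }

    // Looks up a record by key; returns null if absent.
    String get(int key) {
        int start = address(key);
        for (int i = 0; i < keys.length; i++) {
            int slot = (start + i) % keys.length;
            if (!used[slot]) return null;
            if (keys[slot] == key) return data[slot];
        }
        return null;
    }
}
```

The sketch omits deletion, which would need tombstones or rehashing so that probe sequences stay intact.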



Databases

A database is

Example: Collection of student records can be viewed as a database to be used by:

The advantages of consolidating the data:



Database System Organization

The “software architecture” of a database system is

End user calls application software to access the data. End user thinks of data

Application software calls database management system (DBMS) software. The applications software has a

DBMS deals with the

As usual, the advantages of layering are that



Communication with a Database

Databases usually provide a useful and powerful interface for obtaining information from them. So far, we’ve just seen requests of the form:

But suppose you’d like to print out the names of all students that are freshmen and either have a 4.0 GPA or whose names start with X.

There are ways to conceptually organize the data to allow such queries to be answered efficiently, using what are called

The application software communicates with

The DBMS must



Database Integrity

Data in a database is typically

Thus it must

Data can be corrupted if

Example of corrupted data:

T1 transfers

T2 inventories

Suppose this sequence of events occurs:

T1 subtracts

T2 gets the

T2 gets the

T1 adds

T2’s total balance is



DB Serializability

To prevent transactions from interfering with each other, the DBMS should

This property is called

The DBMS does not have to (and should not) actually make the transactions run serially, but if there is a potential conflict,

One solution is

Before accessing any data item, the transaction must

Only one transaction at a time can

If another transaction already has the lock, then

After accessing all the data items,



Committing and Aborting a Transaction

Two-phase locking can lead to deadlock, e.g.:

The DBMS must periodically check for deadlock, and if one is discovered, it must

If the aborted transaction has already made changes to the database, the DBMS must

either

don’t actually

Once the transaction has successfully completed, then it is



Artificial Intelligence

Goal: Develop machines that

and proceed “intelligently”

Distinct but related goals:

1.

2.

3.



8-Puzzle Example

Given a 3-by-3 box that holds 8 tiles, numbered 1 through 8. One tile is missing. The goal is to start with the tiles scrambled and

We will try to solve this problem by a machine that has

a gripper,

a video camera,

a computer,

a “finger”,

Ideas from mechanical engineering can be used to implement the gripper and the finger. We will talk about how to “see” where the tiles are, and how to decide how to move the tiles.


Computer Vision

It is not enough to simply store the image obtained from the camera. The program must be

figure out which parts of the image are the salient objects, called

and then recognize the objects by comparing them to known symbols, called

For the 8-puzzle, this problem can be highly simplified:

always expect the digits to

But in general this is a very difficult problem and one where there has been extensive research.



Reasoning

How can the program solve the puzzle?

One solution is to

For example, if the input is

then the solution is to

But in this case there are approximately 9! = 362,880 different inputs, some of which require a long sequence of moves to solve, and it would require a lot of space. Plus, someone would have to figure out all the answers in advance.



Production Systems

Instead, have the program figure out the solution. One approach is the

First, consider the state graph of the problem:

Here is a tiny piece of the state graph for the 8-puzzle:

Identify the

The control system figures out how to



Solving a Production System

We must find a path through the state graph from

Luckily, finding paths in graphs is

One way is to build a search tree (not to be confused with a binary search tree), which

Two solutions are



Breadth-First Search

Build the search tree in a breadth-first manner:

The root

The next level

The next level

For example:

[Figure: the first levels of a breadth-first search tree of 8-puzzle board states; the tile layout was lost in extraction.]

But the search tree grows exponentially.
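
A breadth-first search over 8-puzzle states can be sketched as follows. The encoding is an assumption made for illustration: a state is a 9-character string read row by row, with `'0'` standing for the empty square, and the goal is passed in as a parameter rather than fixed. Unlike a pure search tree, this version remembers the states it has already generated so that none is expanded twice.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

class EightPuzzleBfs {
    // A state is a 9-character string read row by row; '0' marks the empty square.
    // Returns the minimum number of moves from start to goal, or -1 if unreachable.
    static int solve(String start, String goal) {
        Map<String, Integer> depth = new HashMap<>();
        Queue<String> frontier = new ArrayDeque<>();
        depth.put(start, 0);
        frontier.add(start);
        int[] rowStep = {-1, 1, 0, 0};
        int[] colStep = {0, 0, -1, 1};
        while (!frontier.isEmpty()) {
            String state = frontier.remove();
            if (state.equals(goal)) return depth.get(state);
            int blank = state.indexOf('0');
            int r = blank / 3, c = blank % 3;
            for (int d = 0; d < 4; d++) {
                int nr = r + rowStep[d], nc = c + colStep[d];
                if (nr < 0 || nr > 2 || nc < 0 || nc > 2) continue;
                // Slide the neighboring tile into the empty square.
                char[] cells = state.toCharArray();
                cells[blank] = cells[nr * 3 + nc];
                cells[nr * 3 + nc] = '0';
                String next = new String(cells);
                if (!depth.containsKey(next)) {
                    depth.put(next, depth.get(state) + 1);
                    frontier.add(next);
                }
            }
        }
        return -1;
    }
}
```

For example, `EightPuzzleBfs.solve("123406758", "123456780")` explores level by level and finds a two-move solution (slide the 5 up, then the 8 left). The map of visited states is what keeps memory bounded by the number of reachable states rather than by the size of the tree.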



Depth-First Search

Another approach is

Pursue more promising paths to greater depths and consider other options only if

To implement this idea, we need some criterion to decide which paths are promising, or appear to be promising. Such criteria are called heuristics. A heuristic is

We need something quantitative so we can



Heuristic for 8-Puzzle

For the 8-puzzle example, our intuitive rule of thumb is to

A quantitative heuristic measure is:

For instance, if the input is

then the heuristic measure is

This heuristic has two desirable properties:

1. it is a

2. it is



Using a Heuristic in Depth-First Search

Repeatedly

Choose the

Generate

Continue

In the 8-puzzle example above:

Generate the root. Its heuristic measure is

Generate all children of the root. They have measures

Choose the leaf with measure 2 and generate all its children. They have measures

Choose the leaf with measure 1 and generate all its children. They have measures

In this depth-first search, we only had to generate 9 states, instead of



Other Applications of Production Systems

Many problems can be formulated as production systems. In addition to the 8-puzzle,

You can even model the process of drawing logical conclusions from a set of given facts as a production system. In this case,

each state is

a production/rule/move corresponds to

For instance, part of the state graph might be:

since there is a rule of logic that says: Given the facts

1.

2.

then you can deduce that



Some Other Areas of AI

Neural Networks: Try to take advantage of the power of parallelism (multiprocessor computer architectures) using a paradigm that (roughly) follows the model of

Robotics: Hardware and software working together, e.g., automated manufacturing. Great interest in having machines explore and function in uncontrolled and unpredictable environments, such as

Expert Systems: Combine domain specific knowledge from human experts with

For example:



Time Complexity of an Algorithm

Time complexity of an algorithm: the function T(n) that describes the

Given a particular algorithm, discover this function by attacking the problem from two directions:

find an upper bound U(n) on the function T(n), i.e., convince ourselves that the algorithm will

find a lower bound L(n) on the function T(n), i.e., convince ourselves that, for each n, there is

Try to find smallest U and largest L, so that T is squeezed in between and has no room to hide.



Time Complexity of an Algorithm (cont’d)

(a) No execution on an input of size n_0 takes

(b) The slowest execution on all inputs of size n_0 takes

(c) At least one execution on an input of size n_0 takes



Time Complexity of Heapsort

Let T(n) be the time complexity of heapsort.

First cut at upper bound:

First cut at lower bound:

Refined argument for upper bound: each heap operation never

Refined argument for lower bound: Describe a particular input that

On input n, n - 1, n - 2, ..., 3, 2, 1, running time is at least

Thus T(n) now precisely identified as



Time Complexity of a Problem

Time complexity of a problem: the time complexity for

To show that a problem has time complexity T(n):

Identify a

Then prove

Example: Sorting problem has time complexity O(n log n).

It can be proved that

Problems can be classified by their time complexity. Harder problems are considered to be those



The Class P

All problems (not algorithms) whose time complexity is at most some polynomial are said to be

Example:

Not all problems are in P.

Example: Consider the problem of listing all permutations of the integers 1 through n.

Output size is

Thus running time is

n! is larger than 2^n, thus



NP-Complete Problems

There is an important class of problems that

These problems are called

These problems have the following characteristic:

Many real-world problems in science, math, engineering, operations research, etc. are NP-complete.



Traveling Salesman Problem

An example NP-complete problem is the

Given a set of cities and the distances between them, determine an order in which to

A candidate solution for TSP is

To check whether the allowed mileage is exceeded, add up the distances between adjacent cities in the listing, which will take

But the total number of different candidate solutions is



P vs. NP

Imagine an (unrealistically) powerful model of computation in which the computer first makes a lucky guess (a nondeterministic choice) as to a candidate solution in constant time, and then behaves as an ordinary computer and verifies the solution.

Problems solvable on this computer in polynomial time are

NP includes

Having polynomial running time on this funny computer would not seem to ensure polynomial running time on a real computer.

That is, it seems likely that

But no one has yet been able to prove P ≠ NP. Outstanding open question in CS since the 1970’s.
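
The "verify" half of this model is ordinary deterministic work. For the traveling salesman problem, checking one guessed tour can be sketched as below. The `TourChecker` name, the distance-matrix representation, and the choice to count the trip back to the first city are assumptions for illustration.

```java
class TourChecker {
    // Returns true if tour visits every city exactly once and the round trip
    // (including the leg back to the first city) has total length at most limit.
    // dist[i][j] is the distance from city i to city j.
    static boolean verify(int[][] dist, int[] tour, int limit) {
        int n = dist.length;
        if (tour.length != n) return false;
        boolean[] seen = new boolean[n];
        for (int city : tour) {
            if (city < 0 || city >= n || seen[city]) return false;
            seen[city] = true;
        }
        long total = 0;
        for (int i = 0; i < n; i++) {
            total += dist[tour[i]][tour[(i + 1) % n]];
        }
        return total <= limit;
    }
}
```

The check makes two passes over the n cities, so it runs in time linear in n once the tour is given. What makes the problem hard on an ordinary computer is not the checking but the number of tours that might need to be tried.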



Computability Theory

Complexity theory focuses on

Computability theory focuses on

We will focus on computing (mathematical) functions, with inputs and outputs.

We would like to know if there exist functions that



Church-Turing Thesis

First, we have to decide what constitutes an algorithm.

Assembly languages have

High-level languages have

Church-Turing thesis: (“thesis” means “conjecture”)

Anything that can reasonably be considered an algorithm can be

A Turing machine is a

Thus, for theoretical purposes,



Computing Functions

Some sample functions:

f(n) = 3:

f(n) = 2n:

f(n) = sin n:

There exist non-computable functions, functions whose input/output relationships are so complicated that there is no

We will assume

your

with a

only consider



Goedel Number of a Program

Here is a way to convert a program into an integer.

Conversely, any integer can be converted

Most of the time,

Sometimes it

Rarely,

More rarely,

Use this numbering scheme to



An Uncomputable Function

Define a function h called the

If the program with Goedel number n halts when its input is n, then

If the program with Goedel number n does not halt when its input is n, then

Theorem: h is uncomputable

Proof: Assume in contradiction that h is computable. Then

Define another program I (which will be in the listing):

1. n

2. run program H

3. let x be

4. if x = 0 then

5. else



An Uncomputable Function (cont’d)

Let n_I be the Goedel number of I.

Case 1:

Case 2:

Thus the hypothetical program H

□

Another way to view this result is that
