12.07.2015


Introduction to Database Systems

A database is a collection of related data. It is a collection of information that exists over a long period of time, often many years. The common use of the term database usually refers to a collection of data that is managed by a database management system, or DBMS.

A database has the following implicit properties:
‐ A database represents some aspect of the real world, sometimes called the miniworld or the universe of discourse (UoD). Changes to the miniworld are reflected in the database.
‐ A database is a logically coherent collection of data with some inherent meaning.
‐ A database is designed, built, and populated with data for a specific purpose. It has an intended group of users and some applications in which these users are interested.

A DBMS is a powerful tool for creating and managing large amounts of data efficiently and allowing the data to persist over long periods of time.

A DBMS is a collection of programs that enables users to create and maintain a database. The DBMS is a general-purpose software system that facilitates the processes of defining, constructing, manipulating, and sharing databases among various users and applications.

The DBMS is expected to:
‐ Allow users to create new databases and specify their logical structure, or schema, using a specialized data definition language (DDL).
‐ Give users the ability to query the data (a query is a question about the data) and to modify the data, using an appropriate language called the query language or data manipulation language (DML).
‐ Support the storage of very large amounts of data over a long period of time, allowing efficient access to the data for queries and data modifications.
‐ Enable durability, that is, the recovery of data in the face of failures, data inconsistency, or intentional misuse.
‐ Control access to data from many users at once, without allowing unexpected interactions between users (isolation) or incomplete operations on data (atomicity).

Database Catalog, Dictionary, or Metadata:
Defining a database involves specifying the data types, structures, and constraints of the data to be stored in the database. The DBMS stores the database definition in the form of a database catalog or dictionary; this is called metadata.

Constructing the database is the process of storing the data, based on the catalog, on some storage device controlled by the DBMS.

Manipulating a database includes functions such as querying the database to retrieve specific data and updating the database to reflect changes in the miniworld.

Sharing a database allows multiple users or programs to access the database simultaneously.

An application program accesses a database by sending queries or requests for information to the DBMS. A query typically causes some data to be retrieved.
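As a rough illustration of defining, constructing, and manipulating a database, and of the catalog that holds its metadata, here is a minimal sketch using Python's built-in `sqlite3` module. SQLite is just one convenient DBMS, not one these notes prescribe, and the `student` table with its columns is a hypothetical example. SQLite happens to expose its catalog as a table named `sqlite_master`, which the sketch reads back.

```python
from __future__ import annotations

import sqlite3


def build_database() -> sqlite3.Connection:
    """Define and construct a small, hypothetical database in memory."""
    conn = sqlite3.connect(":memory:")
    # Defining: a DDL statement specifies data types, structure, and constraints.
    conn.execute(
        """
        CREATE TABLE student (
            student_id INTEGER PRIMARY KEY,
            name       TEXT NOT NULL,
            major      TEXT
        )
        """
    )
    # Constructing: store data that conforms to the definition.
    conn.executemany(
        "INSERT INTO student (student_id, name, major) VALUES (?, ?, ?)",
        [(1, "Alice", "CS"), (2, "Bob", "Math"), (3, "Carol", "CS")],
    )
    conn.commit()
    return conn


def catalog_tables(conn: sqlite3.Connection) -> list[str]:
    """Read table names back from SQLite's catalog (its metadata)."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [name for (name,) in rows]


def students_in_major(conn: sqlite3.Connection, major: str) -> list[str]:
    """Manipulating, part 1: a query retrieves specific data."""
    rows = conn.execute(
        "SELECT name FROM student WHERE major = ? ORDER BY name", (major,)
    )
    return [name for (name,) in rows]


def change_major(conn: sqlite3.Connection, name: str, major: str) -> None:
    """Manipulating, part 2: an update reflects a change in the miniworld."""
    conn.execute("UPDATE student SET major = ? WHERE name = ?", (major, name))
    conn.commit()


if __name__ == "__main__":
    db = build_database()
    print(catalog_tables(db))
    print(students_in_major(db, "CS"))
    change_major(db, "Bob", "CS")
    print(students_in_major(db, "CS"))
```

The application program never reads the storage format directly; it only sends DDL and query text to the DBMS, which consults its own catalog to interpret them.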


A transaction may cause some data to be read and some data to be written into the database.

Data Stored in Other Ways:
Besides storing data in databases, there are other ways of storing data, for example Word documents, Excel spreadsheets, etc.

Files:
Using files such as text files or Word documents to store data has many advantages, especially when the amount of data is small. There is no need to acquire a database system or learn how to use it, and it is easy to write applications using standard programming languages.

Disadvantages of using files:
‐ Limited user interface.
‐ Not efficient in handling large amounts of data.
‐ No support for specifying relationships between data items.
‐ No support for data integrity.
‐ No query language.

Spreadsheets:
Advantages:
‐ Provide a structured storage model (rows and columns).
‐ Support data types, allow fields to be computed (based on other fields), and others.
However, spreadsheets also suffer from most of the disadvantages listed for files.

Early Database Management Systems:
The first database management systems evolved from file systems and appeared in the late 1960s.
File systems store data over a long period of time and allow for storage of large amounts of data. However, they do not generally guarantee that data will not be lost, and they do not support efficient access to data when its location is not known.
The first applications to use a DBMS were ones where data was composed of many small items and many queries or modifications were required (banking, airline reservations, corporate record keeping, etc.).
Nowadays, databases have evolved to be of any size and complexity: from a simple address book of hundreds of records to millions of entries (IRS, Amazon).
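To make the "no query language" disadvantage of files concrete, the sketch below answers the same question twice over a small CSV text file: once with hand-written parsing and filtering, and once by loading the rows into SQLite (via Python's `sqlite3` module) and stating the question as a query. The data, column names, and function names are hypothetical, and an in-memory string stands in for the file.

```python
from __future__ import annotations

import csv
import io
import sqlite3

# Hypothetical contents of a plain text (CSV) file, held in a string.
CSV_TEXT = """name,major,credits
Alice,CS,90
Bob,Math,45
Carol,CS,30
"""


def cs_students_from_file(text: str, min_credits: int) -> list[str]:
    """File approach: the program itself parses, converts types, and filters."""
    reader = csv.DictReader(io.StringIO(text))
    return sorted(
        row["name"]
        for row in reader
        if row["major"] == "CS" and int(row["credits"]) >= min_credits
    )


def cs_students_from_db(text: str, min_credits: int) -> list[str]:
    """DBMS approach: load the rows once, then state the question as a query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE student (name TEXT, major TEXT, credits INTEGER)")
    conn.executemany(
        "INSERT INTO student VALUES (?, ?, ?)",
        (
            (row["name"], row["major"], int(row["credits"]))
            for row in csv.DictReader(io.StringIO(text))
        ),
    )
    rows = conn.execute(
        "SELECT name FROM student WHERE major = 'CS' AND credits >= ? ORDER BY name",
        (min_credits,),
    )
    result = [name for (name,) in rows]
    conn.close()
    return result


if __name__ == "__main__":
    print(cs_students_from_file(CSV_TEXT, 40))
    print(cs_students_from_db(CSV_TEXT, 40))
```

With files, every new question needs new parsing and filtering code; with a DBMS, only the query text changes.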




Insulation between Programs and Data, and Data Abstraction:
Data Abstraction:
In traditional file processing, the structure of data files is embedded in the application programs, so any change to the structure of a file may require changing all programs that access that file. DBMS access programs do not require such changes in most cases, because the structure of data files is stored in the DBMS catalog separately from the access programs. This is called program-data independence (encapsulation).

In some types of databases, such as OO (object-oriented) and OR (object-relational) systems, users can define functions or methods and provide only their prototype or interface (name, and number and data types of their arguments) to the application programs that use them, without those programs knowing the details of their implementation. This is termed program-operation independence (abstraction).

The DBMS provides users with a conceptual representation of the data that does not include how the data is stored or how the operations are implemented.

Database Views:
A database typically has many users, each of whom requires a different perspective or view of the database. A view may be a subset of the database, or it may contain virtual data that is derived from the database files but is not explicitly stored.
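The sketch below, again assuming SQLite through Python's `sqlite3` module and a hypothetical `employee` table, defines two views: one that is a subset of the stored data (it hides the salary column) and one that contains virtual, derived data (department averages computed at query time, never stored). Programs that read only through the views keep working after the underlying table gains a column, a small instance of program-data independence.

```python
from __future__ import annotations

import sqlite3


def make_employee_db() -> sqlite3.Connection:
    """Create a hypothetical stored table plus two views defined over it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employee (name TEXT, dept TEXT, salary INTEGER)")
    conn.executemany(
        "INSERT INTO employee VALUES (?, ?, ?)",
        [("Ann", "Sales", 50000), ("Ben", "Sales", 60000), ("Cy", "IT", 70000)],
    )
    # Subset view: a staff directory that omits the salary column.
    conn.execute("CREATE VIEW directory AS SELECT name, dept FROM employee")
    # Virtual data: averages are derived on each query, not stored anywhere.
    conn.execute(
        "CREATE VIEW dept_avg AS "
        "SELECT dept, AVG(salary) AS avg_salary FROM employee GROUP BY dept"
    )
    conn.commit()
    return conn


def read_directory(conn: sqlite3.Connection) -> list[tuple[str, str]]:
    """A user group that sees names and departments only."""
    return conn.execute("SELECT name, dept FROM directory ORDER BY name").fetchall()


def read_dept_averages(conn: sqlite3.Connection) -> list[tuple[str, float]]:
    """A user group that sees only derived, per-department figures."""
    return conn.execute(
        "SELECT dept, avg_salary FROM dept_avg ORDER BY dept"
    ).fetchall()


if __name__ == "__main__":
    db = make_employee_db()
    print(read_directory(db))
    print(read_dept_averages(db))
    # Change the stored structure; the view-based readers are unaffected.
    db.execute("ALTER TABLE employee ADD COLUMN phone TEXT")
    print(read_directory(db))
```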
A multiuser DBMS whose users have a variety of distinct applications must provide the ability to define multiple views.

Sharing of Data and Multiuser Transaction Processing:
A multiuser DBMS must allow multiple users to access the database at the same time. If data for multiple applications is to be integrated and maintained in a single database, it is essential that the DBMS include concurrency control. Concurrency control ensures that several users trying to update the same data at the same time do so in a controlled manner, so that the result of the updates is correct.

A transaction is an executing program or process that includes one or more database accesses. Each transaction is supposed to execute a logically correct database access if executed in its entirety without interference from other transactions.
‐ The isolation property ensures that each transaction appears to execute in isolation from other transactions, even though many of them could be executing concurrently.
‐ The atomicity property ensures that either all the database operations in a transaction are executed or none are.

Actors:


Database Administrators (DBA):
In a database environment, the primary resource is the database, and the secondary resource is the DBMS and related software. The DBA is responsible for authorizing access to the database, coordinating and monitoring its use, and acquiring software and hardware resources as needed. The DBA is accountable for problems such as security breaches, poor response time, etc.

Database Designers:
Database designers are responsible for identifying the data to be stored in the database and for choosing the appropriate structures to represent and store the data. These tasks are undertaken before the database is implemented and populated with data.
It is the responsibility of database designers to communicate with the prospective users in order to understand their needs and create a design that meets these requirements. Database designers typically interact with each potential group of users and develop views of the database that meet the data and processing requirements of these groups. The final product must support the requirements of all user groups.

End Users:
‐ Casual users: access the database occasionally, but with different needs each time; typically middle- or high-level managers.
‐ Naïve or parametric users: make up a sizable portion of database end users. Their main function is to query and update the database using standard types of queries and updates, called canned transactions.

System Analysts (Software Engineers):
Determine the requirements of end users, especially naïve end users, and develop specifications for canned transactions that meet these requirements. Application engineers implement these specifications as programs, then test, debug, document, and deploy them.

Other workers:
• DBMS system designers and implementers: A DBMS is a very complex software system that consists of many components or modules, including implementation of the catalog, definition and implementation of query languages, interface processors, data access, concurrency control, etc. System designers and implementers design and implement these components; they usually work for a software company that sells the system, with some customization, to other non-software companies.
• Tool developers: persons who design and implement tools that facilitate database system design and use.
• Operators and maintenance personnel: system administration personnel who are responsible for the actual running and maintenance of the hardware and software environment for the database system.

Advantages of Using a DBMS:

Controlling Redundancy:
Redundancy is storing the same data multiple times. In traditional software development using file processing, each application usually maintains its own files for its data processing. This can lead to several problems:
‐ Duplication of effort: a single logical update requires entering the same data several times.
‐ Storage waste: when the same data is stored multiple times, space is wasted.
‐ Data may become inconsistent: updates can miss one or more repositories, or data may be entered differently from one file to another.

In the database approach, the views of different groups are integrated during database design. For consistency, we should have a database design that stores each logical data item, such as a student's name or birth date, in only one place in the database. This saves space and avoids inconsistency.

However, controlled redundancy may be useful for improving the performance of queries, in that we do not have to search multiple files to retrieve information. In such cases, the DBMS should have the capability to control this redundancy so as to prevent inconsistencies among files. This can be done by automatically checking that a field value in one table matches the value of another field in a different table. Such checks are specified to the DBMS during database definition and design, and are automatically enforced by the DBMS whenever there is an update.

Restricting Unauthorized Access:
When multiple users share a database, it is likely that they will have different access rights and authorizations. The DBMS should provide a security and authorization subsystem, which the DBA uses to create accounts and to specify account restrictions.

Providing Persistent Storage for Program Objects and Data Structures:


An object is said to be persistent if it survives the termination of program execution and can later be directly retrieved by another program. The persistent storage of program objects and data structures is an important function of database systems.

Providing Multiple User Interfaces:
Because many types of users with varying levels of technical knowledge use a database, a DBMS should provide a variety of user interfaces. These include query languages for casual users; programming language interfaces for application programmers; forms and command codes for parametric users; and menu-driven interfaces and natural language interfaces for stand-alone users.

Representing Complex Relationships among Data:
A database may include numerous varieties of data. A DBMS must have the capability to represent a variety of complex relationships among the data, as well as to retrieve and update related data easily and efficiently.

Enforcing Integrity Constraints:
Integrity constraints specify conditions for data correctness. The simplest kind involves defining a data type for each data item. Others involve the length of a field or a range of values for a specific field. More complex kinds of constraints involve the interrelations between fields in different tables, or the uniqueness of data item values.

Providing Backup and Recovery:
A DBMS must provide facilities for recovering from hardware and software failures. The backup and recovery subsystem of the DBMS is responsible for recovery.
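A minimal sketch of declaring integrity constraints so the DBMS enforces them on every update, again assuming SQLite via Python's `sqlite3` module and hypothetical `department` and `student` tables. It covers uniqueness, a field-length limit, a range of values, and an inter-table constraint (a foreign key). SQLite only loosely enforces declared column types, so data-type constraints are not demonstrated, and it enforces foreign keys only after `PRAGMA foreign_keys = ON`. The sketch also shows the atomicity property from the transaction discussion: if any row in a batch violates a constraint, the whole batch is rolled back.

```python
from __future__ import annotations

import sqlite3


def make_university_db() -> sqlite3.Connection:
    """Hypothetical schema whose constraints are declared in the definition."""
    conn = sqlite3.connect(":memory:")
    # SQLite checks foreign keys only when this is enabled on the connection.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE department (
            dept_code TEXT PRIMARY KEY,          -- uniqueness
            name      TEXT NOT NULL UNIQUE
        );
        CREATE TABLE student (
            student_id INTEGER PRIMARY KEY,      -- uniqueness
            name       TEXT NOT NULL CHECK (length(name) <= 40),     -- field length
            year       INTEGER NOT NULL CHECK (year BETWEEN 1 AND 5), -- value range
            dept_code  TEXT NOT NULL REFERENCES department (dept_code) -- inter-table
        );
        INSERT INTO department VALUES ('CS', 'Computer Science');
        """
    )
    return conn


def try_insert(conn: sqlite3.Connection, rows: list[tuple]) -> bool:
    """Insert all rows as one transaction; return False if any row is rejected."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.executemany("INSERT INTO student VALUES (?, ?, ?, ?)", rows)
    except sqlite3.IntegrityError:
        return False
    return True


def student_count(conn: sqlite3.Connection) -> int:
    return conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]


if __name__ == "__main__":
    db = make_university_db()
    print(try_insert(db, [(1, "Alice", 2, "CS")]))
    # The second row breaks the range check, so neither row is kept.
    print(try_insert(db, [(2, "Bob", 3, "CS"), (3, "Carol", 9, "CS")]))
    print(student_count(db))
```

The `with conn:` block is what makes the two-row batch all-or-nothing: the valid first row is discarded together with the invalid second one, leaving the database in its previous consistent state.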
