
COURSE SCHEDULING IN MULTIPLE FACULTIES USING
A GRID COMPUTING ENVIRONMENT

MR. NGUYEN CONG DANH

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS
FOR THE DEGREE OF MASTER OF SCIENCE (INFORMATION TECHNOLOGY)
GRADUATE COLLEGE
KING MONGKUT'S INSTITUTE OF TECHNOLOGY NORTH BANGKOK
ACADEMIC YEAR 2005
ISBN 974-19-0543-2
COPYRIGHT OF KING MONGKUT'S INSTITUTE OF TECHNOLOGY NORTH BANGKOK


Name : Mr. Nguyen Cong Danh
Thesis Title : Course Scheduling in Multiple Faculties Using a Grid Computing Environment
Major Field : Information Technology
King Mongkut's Institute of Technology North Bangkok
Thesis Advisor : Assistant Professor Dr. Yaowadee Temtanapat
Academic Year : 2005

Abstract

Course scheduling for multiple-faculty universities is a large and complex problem. In these universities, each faculty wants its own timetable for its own resources. However, lecturers, courses, rooms, and other resources can be shared between faculties, so the data used for course scheduling needs to be shared across the university. As a result, constraint conflicts in the timetable can occur not only within each faculty but also across faculties, and the course scheduling problem becomes more difficult to solve. This study proposes a hybrid centralized and decentralized approach to course scheduling, using a genetic algorithm and a grid computing environment. The genetic algorithm resolves the hard and soft constraints, while the grid computing environment serves as an infrastructure for distributed and parallel computing. The results of this research indicate that the proposed system can satisfy most of the required constraints and that grid computing can significantly improve the computing performance of the whole system.

(Total 145 pages)

___________________________________________________________Chairperson



Name : Mr. Nguyen Cong Danh
Thesis Title : Course Scheduling for Universities with Multiple Faculties Using a Grid Computing Environment
Major Field : Information Technology
King Mongkut's Institute of Technology North Bangkok
Thesis Advisor : Assistant Professor Dr. Yaowadee Temtanapat
Academic Year : 2005 (B.E. 2548)

Abstract (translated from Thai)

Course scheduling for universities with multiple faculties is a large and complex problem. In these universities, each faculty wants a timetable of its own that uses the resources it holds. Nevertheless, lecturers, courses, rooms, and other resources can still be shared, so the data for course scheduling must also be shared. Consequently, constraint conflicts in the timetable arise not only within each faculty but also between faculties, which makes the course scheduling problem in these universities even harder. This study therefore proposes a hybrid of centralized and decentralized scheduling, using a genetic algorithm together with a grid computing environment. The genetic algorithm is used to resolve the hard constraints and the soft constraints, while the grid computing environment serves as the foundation for distributed and parallel processing. The results of the research indicate that the proposed system can satisfy most of the constraints and that grid computing clearly improves the computing performance of the whole system.

(The thesis contains 145 pages in total)

_______________________________Chairperson of the Thesis Advisory Committee



ACKNOWLEDGEMENTS

First and foremost, I would like to thank Assistant Professor Dr. Yaowadee Temtanapat for her support and encouragement throughout my time at King Mongkut's Institute of Technology North Bangkok (KMITNB). I deeply appreciate not only her intelligence, knowledge, and willingness to provide guidance for my thesis, but also her sense of humor and her enthusiasm.

Grateful acknowledgements are addressed to Assistant Professor Dr. Utomporn Phalavonk, Assistant Professor Dr. Phayung Meesad, Dr. Gareth Clayton, and the other members of the program committee for their valuable and constructive comments on this thesis.

I wish to express my gratitude to all the teachers and staff at KMITNB for their knowledge, encouragement, and support during my study.

Thanks to my friends and fellow graduate students for their encouragement. They also made my time at KMITNB and in Thailand an enjoyable experience.

The most sincere thanks to my parents, who have always been true believers and encouraged me over the past two years.

Last but certainly not least, I am especially indebted to my scholarship provider, DTEC, for the financial support that gave me the opportunity to study at KMITNB.

Nguyen Cong Danh



TABLE OF CONTENTS

Page
Abstract (in English) ii
Abstract (in Thai) iii
Acknowledgements iv
List of Tables vii
List of Figures viii
Chapter 1. Introduction 1
1.1 Problem Statement and Background 1
1.2 The Objectives of the Study 3
1.3 The Scope of the Study 3
1.4 The Utilizations of the Study 5
Chapter 2. Literature Review 7
2.1 The Course Scheduling Problems 7
2.2 The Related Works on Course Scheduling Problems 10
2.3 Genetic Algorithms 19
2.4 Grid Computing 24
2.5 Summary 31
Chapter 3. Methodology 33
3.1 System Development 33
3.2 Problem Definition 34
3.3 The System Boundary 36
3.4 The Proposed Course Scheduling System 37
3.5 The Database Design 40
3.6 The Proposed Genetic Algorithm 42
3.7 The System for Experiment 53
3.8 The Grid Components 54
Chapter 4. Experimental Results 61
4.1 The Data for the Experiments 61
4.2 The Experiments and Discussions 66
4.3 The Sample Results 74

TABLE OF CONTENTS (CONTINUED)

Page
Chapter 5. Conclusion 79
5.1 Conclusions 79
5.2 Future Works 80
References 81
Appendix A 87
Appendix B 95
Appendix C 109
Appendix D 119
Appendix E 121
Biography 145



LIST OF TABLES

Table Page
2-1 Courses taught by a department 8
2-2 Teaching assignment 9
2-3 Sample timetable 10
2-4 Tentative list of tools for grid computing 27
4-1 Courses fulfilled by each class 61
4-2 Lecturer and classroom assignment 64
4-3 Timetable created by the centralized scheduling program 74
4-4 Timetable created by the decentralized scheduling program for Faculty of Engineering 75
4-5 Timetable created by the decentralized scheduling program for Faculty of Science 76
A-1 Faculty 88
A-2 Department 88
A-3 Lecturer 89
A-4 Busy Time 89
A-5 Building 90
A-6 Classroom 90
A-7 Classroom group 90
A-8 Department controls classroom 91
A-9 Course 91
A-10 Program 92
A-11 Curriculum 92
A-12 Class 93
A-13 Course section 93
A-14 Timetable 94
B-1 Host names, IP addressing, and software 97
B-2 Group, user ID and password 98
B-3 Distinguished name and passphrase 98



LIST OF FIGURES

Figure Page
1-1 Shared lecturers, courses, and classrooms 1
1-2 Outline of the basic genetic algorithm 2
1-3 Sample timetable for a classroom 4
2-1 Graph of 12 events 11
2-2 Graph after coloring 11
2-3 Local optimal problem 13
2-4 Simulated annealing algorithm 14
2-5 Tabu search algorithm 16
2-6 Multi agent system 19
2-7 Encoding chromosome 20
2-8 Example of crossover 21
2-9 Example of mutation 21
2-10 Roulette wheel selection 23
2-11 Rank selection 24
2-12 Application consists of jobs: B, C, D, and E executed in parallel 25
2-13 Application consists of jobs that are networked 26
2-14 Components of Globus Toolkit 2.2 28
2-15 Simple LDAP configuration 28
2-16 Grid components: a high-level perspective 29
3-1 Shared classrooms in a multiple faculty university 35
3-2 Use case diagram of the course scheduling system 36
3-3 Proposed system 38
3-4 System architecture 39
3-5 Entity relation diagram 41
3-6 High level representation of the proposed genetic algorithm 42
3-7 Sub-timetable of a classroom 43
3-8 Chromosome 44
3-9 Population 44
3-10 Creating constraint data 45
3-11 Algorithm for initializing a random population 45
3-12 Pseudo code for creating a random chromosome 46
3-13 Pseudo code for checking small classroom conflicts 47
3-14 Pseudo code for checking lecturer's busy time 47
3-15 Pseudo code for detecting conflicts about preferable times 48
3-16 Pseudo code for checking conflicts about double scheduled lecturers 48
3-17 Pseudo code for checking conflicts about double scheduled classes 49
3-18 Pseudo code for checking conflicts about double scheduled courses 49
3-19 Crossover 50
3-20 Pseudo code for crossover 51
3-21 Mutation 52
3-22 Pseudo code for mutating a chromosome 52
3-23 Hardware and software for each machine 53
3-24 MDS configuration 54
3-25 Working with a broker 55
3-26 Centralized scheduling 56
3-27 Job scheduler for the grid computing environment 57
3-28 Overview of GRAM and GASS 58
4-1 The average fitness value of hard constraints vs. various weights 67
4-2 The average fitness value of soft constraints vs. various weights 68
4-3 The average execution time for a resultant solution vs. population sizes 69
4-4 The GA with various mutation rates 71
4-5 The execution time versus various models 72
4-6 Parallel execution versus serial execution 73
C-1 Visual-grid-proxy-init 113
C-2 Service configuration 115
C-3 Result in the web browser 117



CHAPTER 1
INTRODUCTION

1.1 Problem Statement and Background
1.1.1 Problem Statement

Course scheduling problems are very common, but very difficult to solve in practice. They are known as constraint optimization problems and are NP-hard; they concern the allocation, subject to constraints, of given resources to objects in space and time in such a way as to satisfy a set of desirable objectives as far as possible [1, 2, 3]. Courses are scheduled to times and classrooms so that lecturers can teach and students can attend these courses without any conflicts. A large amount of research has been carried out on these problems [1, 2, 3]. However, most of this research has focused on solving the problems of universities without the separation of resources between faculties. Course scheduling for a multiple-faculty university still needs more research [4, 5].

[Figure 1-1 shows Faculty 1 through Faculty n, each with its own lecturers, classrooms, courses, and timetable, and the lecturers, courses, and classrooms shared between them.]

FIGURE 1-1 Shared lecturers, courses, and classrooms

Course scheduling becomes more complex in a multiple-faculty university where each faculty has its own resources such as lecturers, courses, and classrooms, as illustrated in Figure 1-1. Moreover, these resources can be shared between faculties. Lecturers working in one faculty can teach courses of other faculties. Courses can be attended by students who come from different faculties.



Classrooms are sometimes shared between faculties. Each faculty needs its own timetable for its own resources. As a result, many problems still exist in course scheduling related to these shared resources.

Course scheduling itself involves a large number of conflicts and needs a large amount of processing time. For course scheduling across multiple faculties, the data used for scheduling also needs to be collected and shared across the faculties. This study proposes a hybrid centralized and decentralized approach, a genetic algorithm, and a grid computing environment for the course scheduling problem in multiple-faculty universities. The proposed approach and the genetic algorithm are used to solve the NP-hard problem. In addition, the grid computing environment is used as an infrastructure for distributed and parallel computing.

1.1.2 Background

The genetic algorithm (GA) is a global search optimization algorithm that searches from multiple points in parallel. While searching for solutions, the GA uses a fitness function that guides the direction of the search [6]. The GA evolves the population by using genetic operators such as selection, crossover, and mutation. The outline of the basic GA is presented in Figure 1-2.

1 [Start] Generate a random population of n chromosomes.
2 [Fitness] Evaluate the fitness f(x) of each chromosome x in the population.
3 [New population] Create a new population by repeating the following steps until the new population is complete.
3.1 [Selection] Select two parent chromosomes from the population according to their fitness (the better the fitness, the bigger the chance of being selected).
3.2 [Crossover] With a crossover rate, cross over the parents to form new offspring (children). If no crossover is performed, the offspring are exact copies of the parents.
3.3 [Mutation] With a mutation rate, mutate the new offspring at each locus (position in the chromosome).
3.4 [Accepting] Place the new offspring in the new population.
4 [Replace] Use the newly generated population for a further run of the algorithm.
5 [Test] If the end condition is satisfied, stop, and return the best solution in the current population.
6 [Loop] Go to step 2.

FIGURE 1-2 Outline of the basic genetic algorithm [6]
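As a concrete illustration of this outline, the following short Python sketch runs the same loop on a toy bit-string problem (maximizing the number of 1-bits). The fitness function, rates, population size, and selection scheme here are illustrative assumptions only, not the parameters or operators used later in this thesis.

    import random

    def run_basic_ga(fitness, chrom_len=16, pop_size=20, crossover_rate=0.8,
                     mutation_rate=0.01, generations=100):
        # [Start] generate a random population of bit-string chromosomes
        pop = [[random.randint(0, 1) for _ in range(chrom_len)] for _ in range(pop_size)]
        for _ in range(generations):
            # [Fitness] evaluate and sort the chromosomes, best first
            ranked = sorted(pop, key=fitness, reverse=True)
            new_pop = []
            while len(new_pop) < pop_size:
                # [Selection] draw parents from the fitter half of the population
                p1, p2 = random.choices(ranked[:pop_size // 2], k=2)
                # [Crossover] single-point crossover with probability crossover_rate
                if random.random() < crossover_rate:
                    point = random.randint(1, chrom_len - 1)
                    child = p1[:point] + p2[point:]
                else:
                    child = p1[:]
                # [Mutation] flip each locus with probability mutation_rate
                child = [1 - g if random.random() < mutation_rate else g for g in child]
                # [Accepting] place the offspring in the new population
                new_pop.append(child)
            # [Replace] the new generation becomes the current population
            pop = new_pop
        # [Test] return the best chromosome found in the final population
        return max(pop, key=fitness)

    # Toy fitness: the number of 1-bits in the chromosome (assumption for illustration)
    best = run_basic_ga(fitness=sum)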



The GA is based on the principle of survival of the fittest members of the population to produce the solution. Individuals selected according to their fitness in the problem domain create the new set of solutions. The GA is an iterative process that is repeated until the convergence criterion is satisfied.

Grid computing, most simply stated, is distributed computing. The goal is to create the illusion of a simple yet large and powerful self-managing virtual computer out of a large collection of connected heterogeneous systems sharing various combinations of resources [7].

Not all applications are suitable for grid computing. We need to look at the considerations for an application to run in a grid environment where resources are dynamically allocated based on actual needs. Normally, an application consists of jobs that can be executed in parallel, serially, or in a networked fashion. If an application consists of several jobs that can be executed in parallel, a grid may be very suitable for effective execution on dedicated nodes, especially when there is no or only a very limited exchange of data among the jobs [8].

1.2 The Objectives of the Study

The objectives of this study can be defined as follows:
1.2.1 To provide a system that helps multiple-faculty universities solve their course scheduling problems.
1.2.2 To investigate the use of the proposed GA and the grid computing environment for the course scheduling problem in multiple-faculty universities.

1.3 The Scope of the Study

The scope of this study can be defined as follows:
1.3.1 The system must satisfy the following hard constraints:
1.3.1.1 Every course must be scheduled exactly once in a week.
1.3.1.2 For courses at each faculty, the values assigned to days of the week are Monday, Tuesday, Wednesday, Thursday, and Friday. In addition, eight time-slots are used in a day. The hours assigned to the time-slots are 08:00-12:00 and 13:00-17:00. No course is scheduled across the morning and afternoon working sessions. Figure 1-3 presents a sample timetable for a classroom.



Classroom i

Time-slot  Hour         Mon  Tue  Wed  Thu  Fri
0          08:00-09:00  Course 1   Course 3   Course 15
1          09:00-10:00  Course 1   Course 4   Course 3   Course 15
2          10:00-11:00  Course 1   Course 4   Course 2   Course 15
3          11:00-12:00  Course 2   Course 15
4          13:00-14:00  Course 8   Course 5   Course 6   Course 7
5          14:00-15:00  Course 8   Course 5   Course 6   Course 7
6          15:00-16:00  Course 13  Course 5   Course 19  Course 7
7          16:00-17:00  Course 13  Course 19  Course 7

FIGURE 1-3 Sample timetable for a classroom

1.3.1.3 Neither a class nor a lecturer nor a classroom is assigned to more than one course at the same time.
1.3.1.4 Each course must be booked into a classroom that is large enough to hold the students of that course.
1.3.1.5 In each semester, each class of students studies from a list of courses in the curriculum. All of these courses have to be scheduled to different times in each week so that all students in that class can attend.
1.3.1.6 If a course is attended by students who come from different classes, it has to be scheduled so that these students can attend this course and their other courses without any time conflicts.
1.3.1.7 Each lecturer can teach courses in his/her faculty and in other faculties.
1.3.1.8 Lecturers can declare some unavoidable working sessions in a week. For instance, Dr. Tim cannot teach on Monday morning because of a weekly meeting, so his courses must be scheduled at another time.
1.3.1.9 Each course must be booked into a classroom of a designated classroom group.
1.3.2 The system tries to satisfy the following soft constraint as far as possible:
The system avoids booking lecturers' courses at their undesired times.



Unlike the hard constraint in section 1.3.1.8, which the system must satisfy, the soft constraint is satisfied as far as possible; several conflicts with this soft constraint in the resultant solution are acceptable.

All hard and soft constraints are applied to all timetables in all faculties.
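One common way to fold hard and soft constraints like these into a GA is a weighted penalty fitness function. The sketch below shows that idea only; the helper functions and the weights are hypothetical placeholders, not the exact formulation given in chapter 3.

    # Hypothetical sketch of a weighted penalty fitness for a candidate timetable.
    # count_hard_violations and count_soft_violations stand in for checks of the
    # constraints in sections 1.3.1 and 1.3.2 (classroom clashes, busy times, etc.).
    HARD_WEIGHT = 10.0   # assumed: each hard-constraint violation is penalized heavily
    SOFT_WEIGHT = 1.0    # assumed: each soft-constraint violation is penalized lightly

    def timetable_fitness(timetable, count_hard_violations, count_soft_violations):
        penalty = (HARD_WEIGHT * count_hard_violations(timetable)
                   + SOFT_WEIGHT * count_soft_violations(timetable))
        # Higher fitness is better; a conflict-free timetable scores 1.0.
        return 1.0 / (1.0 + penalty)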

1.3.3 The Globus Toolkit 2.2 is used as middleware to implement the grid computing environment [7, 8].
1.3.4 The efficiency of the proposed GA and the grid computing environment will be evaluated and discussed with respect to the following:
1.3.4.1 The suitability of the proposed GA against the hard constraints and soft constraints.
1.3.4.2 Performance measurement of using grid computing versus not using grid computing.

1.4 The Utilizations of the Study

1.4.1 To provide a system that helps multiple-faculty universities resolve their course scheduling problems.
1.4.2 To investigate the efficiency of applying a genetic algorithm and grid computing to the course scheduling problem in a multiple-faculty university.


CHAPTER 2
LITERATURE REVIEW

In this chapter, course scheduling problems, related works, genetic algorithms, and grid computing are reviewed. Section 2.1 describes the activities needed to prepare data for course scheduling. Section 2.2 describes the related works, including existing research. Section 2.3 presents basic knowledge about genetic algorithms. Finally, section 2.4 presents knowledge about grid computing and the Globus Toolkit 2.2.

2.1 The Course Scheduling Problems

Course scheduling is part of the general scheduling problem. It deals with the satisfactory allocation of resources over time to achieve an organization's tasks. It is a decision-making process with the intention of optimizing one or more objectives.

In any optimization problem, there are objectives, decisions to make, available resources, and related constraints. In the course scheduling problem, the available resources are lecturers, students, courses, classrooms, and time periods. A solution must group these resources together to create a timetable that satisfies the constraints. There are two types of constraints: hard constraints and soft constraints. Hard constraints are conditions that must be satisfied, such as no two distinct courses being held in the same classroom at the same time. Soft constraints may be violated, but should be satisfied as much as possible, such as some lecturers disliking teaching at certain times.

Course scheduling systems usually vary considerably from one university to another. This variation comes from the set of hard and soft constraints as well as the management requirements at each university. This section introduces the activities needed for a basic course scheduling problem. A particular course scheduling system is introduced in detail in chapter 3.



2.1.1 General Activities for Course Scheduling

Each university usually has a central course scheduling office where experienced staff work. In each department of the faculties, several staff members also have similar responsibilities. The course scheduling activities need the cooperation of all of these staff members.

2.1.2 The Activities of Staff in Departments of Each Faculty

Each department is responsible for teaching many courses. To prepare the data for course scheduling, each department has to make a teaching plan. The departments have to know the list of courses and the corresponding classes that will study these courses. The departments then make an assignment based on their own resources, such as lecturers and classrooms. The resources that concern the lecturers are sometimes subject to change. For instance, some lecturers are in training or feel bored if teaching the same course every semester. Some courses sometimes need lecturers from other faculties. Table 2-1 shows an example of courses taught by a department.

TABLE 2-1 Courses taught by a department

Course  | Class    | Number of Students | Section | Lecturer | Classroom Group
CSC211  | BSCS04A  | 30                 |         |          |
CSC211  | BSCS05B  | 35                 |         |          |
CSC221  | BSCS04A  | 30                 |         |          |
CSC210  | BSCS04A  | 30                 |         |          |
CSC110  | BSCS04A  | 30                 |         |          |
CSC113  | BSCS04A  | 30                 |         |          |
CSC113  | BSCS04B  | 35                 |         |          |

In this case, a class is a group of students who study the same program and have the same enrolment year. A classroom group is a group of classrooms that have the same function. A course will be scheduled into a classroom of a designated classroom group. Of course, each department knows how many students will study a particular course. This helps the department separate the courses into a suitable number of sections. A section with too many students usually makes it difficult for a lecturer to teach effectively. However, in some cases, if the department does not have enough classrooms or lecturers, a section with a large number of students is acceptable. Finally, an assignment is created for each department, as shown in Table 2-2.

TABLE 2-2 Teaching assignment

Course  | Class    | Number of Students | Section | Lecturer | Classroom Group
CSC211  | BSCS04A  | 30                 | 1       | 00020    | CSCCOMLB
CSC211  | BSCS05B  | 35                 | 2       | 00020    | CSCCOMLB
CSC221  | BSCS04A  | 30                 | 1       | 00012    | CSCLECRM
CSC210  | BSCS04A  | 30                 | 1       | 00012    | CSCLECRM
CSC110  | BSCS04A  | 30                 | 1       | 00015    | CSCLECRM
CSC113  | BSCS04A  | 30                 | 1       | 00023    | CSCCOMLB
CSC113  | BSCS04B  | 35                 | 1       | 00023    | CSCCOMLB

In Table 2-2, course CSC211 is studied by two different classes, BSCS04A and BSCS05B, and it is divided into two distinct sections, 1 and 2. On the other hand, course CSC113 is also studied by two different classes, BSCS04A and BSCS04B, but both are mixed to study the same section. CSC211 and CSC113 use classrooms in group CSCCOMLB, whereas CSC221, CSC210, and CSC110 use classrooms in group CSCLECRM.

2.1.3 Activities of Staff at the Central Course Scheduling Office

After the central course scheduling office receives all the data from the departments, it runs the course scheduling system to create a timetable. Booking sections of courses into time-slots in the timetable is a hard job. Its complexity depends on the complexity of the constraints and rules of each university. Table 2-3 presents a sample timetable.

The timetable has to satisfy the constraints. Lecturers who teach several sections have to be scheduled so that they can teach their sections without any time conflict. One classroom cannot hold more than one section at the same time. When a class studies many different courses, these courses also have to be scheduled at different times. The other constraints also have to be satisfied.

TABLE 2-3 Sample timetable

Course  | Section | Time        | Day | Classroom | Lecturer
CSC211  | 1       | 13:00-16:00 | W   | B304A01   | 00020
CSC211  | 2       | 8:00-11:00  | W   | B304A01   | 00020
CSC221  | 1       | 10:00-12:00 | T   | B304A05   | 00012
CSC210  | 1       | 13:00-16:00 | M   | B304A02   | 00012
CSC110  | 1       | 9:00-12:00  | F   | B304A02   | 00015
CSC113  | 1       | 13:00-16:00 | T   | B304A05   | 00023

2.2 The Related Works on Course Scheduling Problems

Course scheduling is a multi-dimensional NP-complete problem that has generated hundreds of papers by the many researchers who have attempted to solve it. In this section, we discuss some of the primary approaches that have been applied to general course scheduling problems, that is, scheduling for courses and exams. In practice, the main ideas used for course scheduling can be applied to exam scheduling and vice versa. The approaches can be divided into four groups: sequential methods, cluster methods, constraint-based methods, and meta-heuristic methods [9].

2.2.1 Sequential Methods

Sequential methods order the events for scheduling using heuristics (often graph coloring heuristics). They assign the ordered events to valid time periods so that no events in a period are in conflict with each other, i.e. two events which require the same resource are not scheduled in the same time period [10].

The graph coloring approach usually represents events as vertices, with an edge between two vertices when the two respective events conflict in some way. Graph coloring is the process of allocating different colors to the vertices so that no two adjacent (conflicting) vertices have the same color.



The set of vertices is considered as the set of classes, with the edges corresponding to courses that conflict with each other. For instance, two courses are in conflict if there is a student who must be in both courses at the same time. Coloring the graph then assigns courses to appropriate periods such that conflicts are avoided [11].

FIGURE 2-1 Graph of 12 events

The final result of coloring can be presented by a three-color graph (denoted by three different shapes), as shown in Figure 2-2.

FIGURE 2-2 Graph after coloring

This result means that the timetable may be constructed in three periods, one period per color. For larger timetables or graphs this is much less likely to be the case, since the graph coloring problem is NP-complete. Much research has used a heuristic algorithm to find a reasonable coloring if not an optimal one [12-13].
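As a small illustration, the first-fit greedy heuristic below colors a conflict graph given as an adjacency dictionary; it is only a minimal sketch of the idea, not the specific heuristics cited in [12-13].

    def greedy_coloring(conflicts):
        """conflicts maps each event to the set of events it clashes with."""
        color = {}
        for event in conflicts:                       # events could also be ordered by degree
            used = {color[other] for other in conflicts[event] if other in color}
            c = 0
            while c in used:                          # smallest period not used by a neighbor
                c += 1
            color[event] = c
        return color

    # Example: e1 clashes with e2 and e3, so e1 gets period 0 and e2, e3 share period 1
    periods = greedy_coloring({"e1": {"e2", "e3"}, "e2": {"e1"}, "e3": {"e1"}})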



2.2.2 Cluster Methods

Cluster methods split the set of events into groups which are conflict-free and then assign the groups to time periods to fulfill the other constraints imposed on the scheduling problem [14]. This technique can be applied to schedule courses or exams. The multiphase exam scheduling package described by Arani et al. consists of three phases [15]. In the first phase, clusters of exams are formed with the aim of minimizing the number of students with simultaneous exams. In the second phase, these clusters are assigned to exam days while minimizing the number of students with two or more exams per day. Finally, the exam days and clusters are arranged to minimize the number of students with consecutive exams.

The main drawback of these approaches is that the clusters of events are formed and fixed at the beginning of the algorithm, which may result in a poor quality timetable.

2.2.3 Constraint Based Methods

A constraint satisfaction problem (CSP) can be expressed in the following form. Given a set of variables, a set of possible values that can be assigned to each variable, and a list of constraints, the CSP is to find values of the variables that satisfy every constraint. For example, given x = {x1, x2, x3} with possible values of x1, x2, and x3 in [0..100], find x1, x2, and x3 so that they satisfy the constraints x1 ≠ x2, 2x1 = 10x2 + x3, and x1x2 < x3.
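This toy CSP is small enough to solve by exhaustive enumeration, as the sketch below does; real CLP systems search far more cleverly, so this is only to make the formulation concrete.

    from itertools import product

    # Enumerate x1, x2, x3 in [0..100] and keep the assignments that satisfy
    # every constraint of the example: x1 != x2, 2*x1 = 10*x2 + x3, x1*x2 < x3.
    solutions = [(x1, x2, x3)
                 for x1, x2, x3 in product(range(101), repeat=3)
                 if x1 != x2 and 2 * x1 == 10 * x2 + x3 and x1 * x2 < x3]

    print(solutions[0])   # the first solution found is (1, 0, 2)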

Constraint-based approaches model a course scheduling problem as a set of variables (i.e. courses) to which values (i.e. resources such as classrooms and time periods) have to be assigned so as to satisfy a number of constraints (i.e. classroom sizes and contiguous periods) [16-18].

Constraint Logic Programming (CLP) is usually used for CSPs. A labeling strategy dictates the order in which the search space is traversed, which is vital for an effective search. There are two orderings: the order in which the variables are instantiated (i.e. courses placed), and the order in which the values (i.e. times and classrooms) are assigned. Programming languages such as PROLOG, LISP, C, and C++ can be used for CLP.



Gueret et al. have implemented a lecture scheduling system in CHIP called FELIAC [19]. CHIP is a Constraint Logic Programming language based on Prolog, which provides several types of constraints. CHIP's new "cumulative" constraints limit the amount of a resource which can be used at any time, and Gueret et al. use this to implement the classroom capacity constraint. The longest courses are scheduled first, in the day which has the shortest total length of clashing lectures. Relaxation of constraints is essential for the highly constrained CSPs of course scheduling. (A problem in which constraints may be relaxed is called a dynamic CSP.) For each failed assignment, FELIAC stores a "justification", which identifies the constraints which the assignment violated. These justifications are used to undo the effects of a constraint when it is relaxed.

Using CLP for course scheduling usually brings advantages such as short programs and fast execution times.

2.2.4 Meta-heuristic Methods

Over the last two decades a variety of meta-heuristic approaches such as simulated annealing, tabu search, genetic algorithms, and hybrid approaches have been investigated for the course scheduling problem. Meta-heuristic methods begin with one or more initial solutions and employ search strategies that try to avoid local optima. All of these search algorithms can produce high-quality solutions but often have a considerable computational cost [20-25].

FIGURE 2-3 Local optimal problem



2.2.4.1 Simulated Annealing

Simulated annealing (SA) is a Monte-Carlo technique which can be used to find solutions to optimization problems. The technique simulates the cooling of a collection of hot vibrating atoms.

The approach comprises the following:
• A cost function E that associates an energy with the state of the system.
• A "temperature" T that decreases slowly.
• Various ways to change the state of the system.

Figure 2-4 presents the SA algorithm.

1. Generate an initial timetable s.
2. Set the initial best timetable s* = s.
3. Compute the cost of s: C(s).
4. Compute the initial temperature T0.
5. Set the temperature T = T0.
6. While the stop criterion is not satisfied do:
   a. Repeat Markov chain length (M) times:
      i. Select a random neighbor s' of the current timetable (s' ∈ Ns).
      ii. Set Δ(C) = C(s') − C(s).
      iii. If Δ(C) ≤ 0 (downhill move):
           • Set s = s'.
           • If C(s) < C(s*) then set s* = s.
      iv. If Δ(C) > 0 (uphill move):
           • Choose a random number r uniformly from [0, 1].
           • If r < e^(−Δ(C)/T) then set s = s'.
   b. Reduce (or update) the temperature T.
7. Return the timetable s*.

FIGURE 2-4 Simulated annealing algorithm

An uphill move would increase the cost by Δ(C). Here, s is the current schedule and s' is a neighboring schedule obtained from the current neighborhood space (Ns) by swapping two courses in time and/or space.



When the atoms are at a high temperature they are free to move around and tend to move with random displacements. However, as the mass cools, the inter-particle bonds force the atoms together. When the mass is cool, no movement is possible and the configuration is frozen. If the mass is cooled quickly, the chance of obtaining a low-cost solution is lower than if it is cooled slowly (or annealed). At any given temperature, a new configuration of atoms is accepted if the system energy is lowered. If the energy is higher, the configuration is accepted only with a probability that shrinks with the size of the increase and with the given temperature [26-27].

The SA algorithm has both advantages and disadvantages compared to other global optimization techniques. It is an extremely popular method and appears competitive with many of the best heuristics in solving large problems such as course scheduling, job scheduling, etc. However, it has two drawbacks: being trapped in local minima, and taking too long to find a reasonable solution. In order to overcome these drawbacks, much recent research combines SA with other heuristics such as genetic algorithms, or implements SA as a parallel algorithm. The main aim is to avoid local minima traps and/or to achieve faster convergence [28-29].
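A compact Python rendering of the loop in Figure 2-4 is given below, assuming minimization of a cost function and a geometric cooling schedule; the cost and neighbor functions are placeholders for the timetable-specific moves described above.

    import math
    import random

    def simulated_annealing(initial, cost, neighbor, t0=100.0, alpha=0.95,
                            chain_length=50, t_min=0.01):
        s = initial
        best, best_cost = s, cost(s)
        t = t0
        while t > t_min:                        # stop criterion: a temperature floor
            for _ in range(chain_length):       # Markov chain of length M at temperature t
                s_new = neighbor(s)             # e.g. swap two courses in time and/or space
                delta = cost(s_new) - cost(s)
                if delta <= 0:                  # downhill move: always accept
                    s = s_new
                    if cost(s) < best_cost:
                        best, best_cost = s, cost(s)
                elif random.random() < math.exp(-delta / t):
                    s = s_new                   # uphill move: accept with probability e^(-delta/T)
            t *= alpha                          # reduce (update) the temperature
        return best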

2.2.4.2 Tabu Search

Tabu search is a meta-heuristic that guides a local heuristic search procedure to explore the solution space beyond local optimality. Tabu search has been applied successfully to a number of combinatorial optimization problems, in particular course scheduling [30-31].

The basic concept of tabu search as described by Glover is: "A meta-heuristic superimposed on another heuristic. The overall approach is to avoid entrainment in cycles by forbidding or penalizing moves which take the solution, in the next iteration, to points in the solution space previously visited ('tabu')" [32].

Tabu search is a typical local search that explores the neighborhood of the current solution for a transformed solution (s') that can be obtained by a simple local change. Each transition to a new solution is known as a move. In simple cases, every move is added to a tabu list that remembers the N most recent moves taken, where N is the size of the tabu list. The tabu list acts as a short-term memory (first in, first out) of the N most recent moves. Any new move that is already in the tabu list is avoided, that is, it is tabu. This blocks recently tried moves and prevents the search from cycling around a local optimum, thus driving the search towards a different direction in the search space and giving it a better chance of reaching the global optimum.

The decision to move to a transformed solution state is usually based on the steepest descent or mildest ascent in the objective function value. With this strategy, the heuristic accepts a marginal and temporary deterioration in its objective function value in exchange for opportunities to escape from a local optimum and move towards the global optimum, as illustrated in Figure 2-3. Figure 2-5 presents the tabu search algorithm.

1. Generate an initially random but feasible solution s.
2. Repeat:
   i. Attempt to find an improved feasible solution s' with objective function value z(s'), avoiding moves already stored in the tabu list.
   ii. Compute the moves from s to s'.
   iii. Update the tabu list by adding the latest move so that it is tabu for some subsequent moves.
   iv. If z(s') < z(s) + (mildest ascent tolerance) then
       perform the exchanges: s := s', z(s) := z(s')
       End if
   Until (no improved solution is found) or (the stopping criterion is met)

FIGURE 2-5 Tabu search algorithm

The result z(s') is the best estimated minimum; the method does not guarantee finding the global minimum but stands a better chance compared to a gradient descent approach.
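The loop of Figure 2-5 can be sketched in Python as below; the neighbors function is an assumed helper that yields (move, candidate) pairs for the timetable-specific local changes, and a bounded deque serves as the first-in, first-out tabu list.

    from collections import deque

    def tabu_search(initial, cost, neighbors, tabu_size=20, max_iters=1000):
        """neighbors(s) yields (move, candidate) pairs; 'move' identifies the local change."""
        s = best = initial
        tabu = deque(maxlen=tabu_size)          # short-term memory of the N most recent moves
        for _ in range(max_iters):
            candidates = [(m, c) for m, c in neighbors(s) if m not in tabu]
            if not candidates:
                break
            # steepest descent / mildest ascent: take the best non-tabu neighbor,
            # even if it is slightly worse than the current solution
            move, s = min(candidates, key=lambda mc: cost(mc[1]))
            tabu.append(move)                   # forbid undoing this move for a while
            if cost(s) < cost(best):
                best = s
        return best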

2.2.4.3 Genetic Algorithms

The idea of genetic algorithms is based on the evolutionary principle developed by Darwin [6]. A "population" of feasible timetables is maintained. The "fittest" timetables are selected to form the basis of the next iteration, or "generation", thus improving the overall fitness whilst maintaining diversity.



The outline of the basic genetic algorithm is presented in section 1.1.2.

At present, a large amount of research has used GAs for course scheduling. The proposed GAs differ in how they represent chromosomes and populations, how they set the GA parameters (population size, crossover rate, and mutation rate), the strategies designed for selection, crossover, and mutation, and how the fitness function is evaluated.

The chromosome represents a timetable, i.e. a solution, and it can be represented indirectly or directly. In the indirect representation, the timetable is usually a long encoded bit string that stands for when and where each course takes place [33]; pairs of selected timetables may then be "crossed over" by cutting and splicing the bit strings to create a new timetable. In the direct representation, the timetable is represented using a data structure such as a multi-dimensional array or a linked list. The indirect representation brings the advantage of short processing time and simple GA operations; however, it needs complex processing to exchange and maintain constraints between the bit string and the real timetable. In contrast, the direct representation needs more processing time for the GA operations, but it is easy to maintain a large number of constraints for a real timetable. More details of the GAs will be presented in section 2.3.

2.2.4.4 Hybrid Approaches

The above approaches have been shown to create good solutions for course scheduling problems. However, as mentioned above, they usually need a long computational time. In order to overcome this problem, many researchers have used hybrid approaches.

Tuan et al. have successfully combined constraint programming and simulated annealing for the problem of exam scheduling with real data sets [34]. The proposed algorithm consists of two phases. A constraint programming phase provides an initial solution. This solution is then improved by the simulated annealing phase. Tuan et al. applied the Kempe chain as the neighborhood structure, a special technique for determining the starting temperature T0, and a mechanism that allows the user to define a certain period of time in which the algorithm should run. This mechanism not only helps to increase the efficiency of the SA algorithm but also makes simulated annealing experiments easier.



Alkan et al. have developed Memetic Algorithms (MAs) by combining GAs with a local search technique, hill climbing [1]. This approach has achieved good computational performance. The idea behind the hill climbing approach is to create a hill climbing method for each type of constraint and combine them under a single hill climbing method, denoted AHC. Starting from a high level, a constraint-type-based hill climbing method is selected by a selection method, giving a higher chance to the operator of the constraint type causing more violations. There are three improvement strategies. First of all, the selected operator for the related type of constraints is invoked, producing a new individual. Second, if this attempt does not make any improvement on the old one, the new individual is ignored; depending on the constraint type, a selected block of genes, possibly the one causing more violations among the other blocks, is then attempted to be corrected. Finally, if this attempt also fails to produce a better individual, then, using the old one, a selected single gene in a block of genes, possibly the one causing more violations, is attempted to be corrected. If the fitness of an individual improves in any case, AHC is reapplied to it.

Some other researchers have also used distributed and parallel computing models for the course scheduling problem. One of them is the Multi Agent System model, which has been applied to problems that are similar to the one in this study.

The Multi Agent System (MAS) model has been introduced to the course scheduling problem by Kaplansky et al. [35]. The architecture is composed of a set of autonomous scheduling agents (SAs) that solve the course scheduling for each department. Each agent has its own course scheduling problem and its own goals. The scheduling agents must coordinate these goals with the other agents in order to achieve a solution for the whole organization that yields a better result with respect to the global targets. To achieve a coherent and consistent global solution, the SAs make use of a sophisticated negotiation protocol among scheduling agents that always ends in an agreement (which is not guaranteed to be optimal). The main functionalities of this protocol are agent-to-agent relation definition, a mechanism to approve a chain of requests for changes (RfC), and an electronic marketplace for bidding on preferred common time-slots.



As shown in Figure 2-6, the scheduling agents first conduct negotiation for the global timetable. Next, the room agent (RA) adds new constraints to the SAs. The SAs solve the modified problem and send back a new timetable.

FIGURE 2-6 Multi agent system

2.3 Genetic Algorithms

Genetic algorithms are inspired by Darwin's theory of evolution. Simply said, problems are solved by an evolutionary process resulting in a best (fittest) solution; in other words, the solution is evolved.

The algorithm begins with a set of solutions (represented by chromosomes) called the population. Solutions from one population are taken and used to form a new population, motivated by the hope that the new population will be better than the old one. Solutions are selected to form new solutions (offspring) according to their fitness: the more suitable they are, the more chances they have to reproduce [6].

The outline of the basic genetic algorithm is presented in section 1.1.2.

2.3.1 Biological Background
2.3.1.1 Chromosome

All living organisms consist of cells. In each cell there is the same set of chromosomes. Chromosomes are strings of DNA and serve as a model for the whole organism. A chromosome consists of genes, which are blocks of DNA. Each gene encodes a particular protein. Basically, it can be said that each gene encodes a trait, for example eye color. Possible settings for a trait (e.g. blue, brown) are called alleles. Each gene has its own position in the chromosome; this position is called its locus.



The complete set of genetic material (all chromosomes) is called the genome. A particular set of genes in the genome is called a genotype. The genotype, with later development after birth, is the base for the organism's phenotype, its physical and mental characteristics, such as eye color, intelligence, etc.

2.3.1.2 Reproduction

During reproduction, recombination (or crossover) first occurs. Genes from the parents combine to form a whole new chromosome. The newly created offspring can then be mutated. Mutation means that the elements of DNA are changed a little. These changes are mainly caused by errors in copying genes from the parents.

The fitness of an organism is measured by the success of the organism in its life (survival).

2.3.2 Operators of GA

As presented in the outline of the basic genetic algorithm, crossover and mutation are the most important parts of the genetic algorithm. The performance is influenced mainly by these two operators. Before we can explain more about crossover and mutation, some more information on chromosomes will be outlined.

A chromosome should in some way contain information about the solution that it represents. The most common way of encoding is a binary string, as shown in Figure 2-7.

Chromosome 1: 1101100100110110
Chromosome 2: 1101111000011110

FIGURE 2-7 Encoding chromosome

Each chromosome is represented by a binary string. Each bit in the string can represent some characteristic of the solution. Another possibility is that the whole string represents a number. Of course, there are many other ways of encoding. The encoding depends mainly on the problem being solved. For example, one can directly encode integer or real numbers. Sometimes it is useful to encode permutations, and so on.



2.3.2.1 Crossover

After we have decided what encoding we will use, we can proceed to the crossover operation. Crossover operates on selected genes from the parent chromosomes and creates new offspring. The simplest way of doing this is to choose a random crossover point, copy everything before this point from the first parent, and then copy everything after the crossover point from the other parent.

Crossover can be illustrated as in Figure 2-8 (| is the crossover point).

Chromosome 1: 11011 | 00100110110
Chromosome 2: 11011 | 11000011110
Offspring 1:  11011 | 11000011110
Offspring 2:  11011 | 00100110110

FIGURE 2-8 Example of crossover

There are other ways to perform a crossover. For example, we can choose more crossover points. Crossover can be quite complicated and depends mainly on the encoding of the chromosomes. A specific crossover made for a specific problem can improve the performance of the genetic algorithm.
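In Python, the single-point crossover of Figure 2-8 takes only a few lines; bit strings are shown as plain character strings purely for illustration.

    import random

    def single_point_crossover(parent1, parent2):
        # choose a random crossover point and splice the two bit strings
        point = random.randint(1, len(parent1) - 1)
        child1 = parent1[:point] + parent2[point:]
        child2 = parent2[:point] + parent1[point:]
        return child1, child2

    # With crossover point 5, the chromosomes of Figure 2-8 yield exactly the two
    # offspring shown there; the call below chooses the crossover point at random.
    offspring = single_point_crossover("1101100100110110", "1101111000011110")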

2.3.2.2 Mutation

After a crossover is performed, mutation takes place. Mutation is intended to prevent all solutions in the population from falling into a local optimum of the solved problem. The mutation operation randomly changes the offspring resulting from crossover. In the case of binary encoding we can switch a few randomly chosen bits from 1 to 0 or from 0 to 1. Mutation can be illustrated as in Figure 2-9.

Original offspring 1: 1101111000011110
Original offspring 2: 1101100100110110
Mutated offspring 1:  1100111000011110
Mutated offspring 2:  1101101100110110

FIGURE 2-9 Example of mutation
FIGURE 2-9 Example of mutation



The technique of mutation (as well as crossover) depends mainly on the encoding of the chromosomes. For example, when we are encoding permutations, mutation could be performed as an exchange of two genes.
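A bit-flip mutation matching Figure 2-9 can likewise be written directly; the mutation rate shown is an arbitrary example value.

    import random

    def mutate(chromosome, mutation_rate=0.01):
        # flip each bit independently with probability mutation_rate (cf. Figure 2-9)
        return "".join(
            ("1" if bit == "0" else "0") if random.random() < mutation_rate else bit
            for bit in chromosome
        )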

2.3.3 Parameters of GA
2.3.3.1 Crossover and Mutation Rate

There are two basic parameters of a GA: the crossover rate and the mutation rate.

The crossover rate describes how often crossover is performed. If there is no crossover, offspring are exact copies of the parents. If there is crossover, offspring are made from parts of both parents' chromosomes. If the crossover rate is 100%, then all offspring are made by crossover. If it is 0%, the whole new generation is made from exact copies of chromosomes from the old population. Crossover is performed in the hope that the new chromosomes will contain good parts of the old chromosomes and therefore be better. However, it is good to let some part of the old population survive to the next generation.

The mutation rate describes how often parts of a chromosome are mutated. If there is no mutation, offspring are generated immediately after crossover (or directly copied) without any change. If mutation is performed, one or more parts of a chromosome are changed. If the mutation rate is 100%, the whole chromosome is changed; if it is 0%, nothing is changed. Mutation generally prevents the GA from falling into local extremes. Mutation should not occur very often, because then the GA would in fact change into a random search.

2.3.3.2 Other Parameters

Another important parameter is the population size, which describes how many chromosomes are in a population. If there are too few chromosomes, the GA has few possibilities to perform crossover and only a small part of the search space is explored. On the other hand, if there are too many chromosomes, the GA slows down. Research shows that beyond some limit (which depends mainly on the encoding and the problem) it is not useful to use very large populations, because they do not solve the problem faster than moderately sized populations.

2.3.4 Methods of Selection

As presented in the outline of the basic genetic algorithm, chromosomes are selected from the population to be parents for crossover. The problem is how to select these chromosomes. According to Darwin's theory of evolution, the best ones survive to create new offspring. There are many different methods which a GA can use to select the chromosomes to be copied over into the next generation; some of the most common methods are listed below.

2.3.4.1 Roulette Wheel Selection

Parents are selected according to their fitness. The better the chromosomes are, the more chances they have of being selected. Imagine a roulette wheel on which all the chromosomes in the population are placed. The size of each chromosome's section of the roulette wheel is proportional to the value of its fitness function: the bigger the value, the larger the section. Figure 2-10 shows an example.

[Figure 2-10 shows a pie chart of Chromosome 1 through Chromosome 4 with slices proportional to their fitness.]

FIGURE 2-10 Roulette wheel selection

A marble is thrown onto the roulette wheel and the chromosome where it stops is selected. Clearly, the chromosomes with bigger fitness values will be selected more times.
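The same idea in code: spin once on a wheel whose slice sizes are the fitness values (assuming non-negative fitness); this is a minimal sketch, not the selection operator used later in this thesis.

    import random

    def roulette_select(population, fitness):
        # each chromosome occupies a slice of the wheel proportional to its fitness
        total = sum(fitness(c) for c in population)
        spin = random.uniform(0, total)
        running = 0.0
        for chrom in population:
            running += fitness(chrom)
            if running >= spin:
                return chrom
        return population[-1]        # guard against floating-point round-off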

2.3.4.2 Rank Selection

The previous type of selection has problems when there are big differences between the fitness values. For example, if the best chromosome's fitness is 90% of the sum of all fitness values, then the other chromosomes will have very few chances to be selected.

Rank selection first ranks the population and then every chromosome receives a fitness value determined by this ranking, as shown in Figure 2-11. The worst will have fitness 1, the second worst 2, etc., and the best will have fitness N.

Now all the chromosomes have a chance to be selected. However, this method can lead to slower convergence, because the best chromosomes do not differ so much from the others.


FIGURE 2-11 Rank selection

2.3.4.3 Steady-State Selection

Steady-state selection works in the following way. In every generation, a few good chromosomes (those with higher fitness) are selected for creating new offspring. Some bad chromosomes (those with lower fitness) are then removed and replaced by the new offspring. The rest of the population survives into the new generation.

2.3.4.4 Tournament Selection

Subgroups of chromosomes are chosen from the larger population, and the members of each subgroup compete against each other. Only one chromosome from each subgroup is chosen to reproduce [36].
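As a hedged illustration (again using a hypothetical Chromosome holder rather than the thesis data structures), tournament selection can be sketched in Java as follows.

import java.util.List;
import java.util.Random;

// Illustrative sketch of tournament selection.
public class TournamentSelection {
    private static final Random RAND = new Random();

    static class Chromosome {
        double fitness;
    }

    // Pick a random subgroup of the given size and return its fittest member.
    public static Chromosome select(List<Chromosome> population, int subgroupSize) {
        Chromosome best = null;
        for (int i = 0; i < subgroupSize; i++) {
            Chromosome candidate = population.get(RAND.nextInt(population.size()));
            if (best == null || candidate.fitness > best.fitness) {
                best = candidate;
            }
        }
        return best;
    }
}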

2.3.4.5 Elitism Selection

Elitism is a method that first copies the best chromosome (or a few best chromosomes) into the new population. The rest of the population can be constructed by any of the methods described above. Elitism can rapidly increase the performance of the GA, because it prevents the loss of the best solution found so far.

2.4 Grid Computing

Grid computing is a method for sharing computing and data resources. It is used for distributed systems that share resources over a local or wide area network. The specific focus underlying grid computing is coordinated resource sharing in a multi-institutional environment [7-8]. It attempts to combine all types of resources, including supercomputers and clusters of machines, from multiple institutions into a resource that is more powerful than any single resource.

This section introduces grid computing under the following topics: application considerations, the Globus Toolkit, the Globus Toolkit 2.2, and the grid components.



2.4.1 Application Considerations

If an application consists of several jobs that can all be executed in parallel, a grid may be very suitable for effective execution on dedicated nodes, especially when there is no, or only a very limited, exchange of data among the jobs.

From an initial job, a number of jobs are launched to execute on pre-selected or dynamically assigned nodes within the grid. Each job may receive a discrete set of data, fulfills its computational task independently, and delivers its output. The output is collected by a final job or stored in a defined data store, as shown in Figure 2-12.

FIGURE 2-12 Application consisting of jobs B, C, D, and E executed in parallel

Many other applications consist of jobs that are executable in parallel but have interdependences between them. For example, as shown in Figure 2-13, jobs B and C can be launched simultaneously, but they exchange data heavily with each other. Job F cannot be launched before B and C have completed, whereas jobs E and D can be launched upon completion of B and C respectively. Finally, job G collects all output from jobs D, E, and F, and its termination and results then represent the completion of the grid application.

For such applications, a possible approach is to do more analysis to determine how best to split the application into individual jobs, maximizing parallelism. This adds more dependencies on grid infrastructure services such as schedulers and brokers, but once that infrastructure is in place, the application can benefit from the flexibility and utilization of the virtual computing environment. The use of a job flow management service not only handles the synchronization of the individual results, but also creates a loose coupling between the jobs that avoids heavy inter-process communication and reduces the overheads in the grid [37].

FIGURE 2-13 Application consisting of jobs that are networked

2.4.2 The Globus Toolkit

In the most general case, grid resources are geographically distributed and owned by different organizations, each with proprietary policies regarding security, resource allocation, platform maintenance, and so on. Such an environment depends strongly upon a robust infrastructure of fundamental services, able to smooth out mismatches between different machines, security policies, scheduling policies, operating systems, and platforms. Besides this, resource sharing must be highly controlled, with resource providers and consumers clearly defining what is shared, who is allowed to share, and the conditions under which sharing occurs. Furthermore, access to resources has to be carefully scheduled in order to extract the maximum performance from the available resources, and applications should be able to tailor their behavior dynamically in order to cope with resource failure, a highly probable event in such a variegated context.

All these requirements can be summarized as the need to allow transparent access to resources, as if they belonged to a single, unified "metacomputer." There are many grid projects worldwide aimed at achieving this ambitious goal, as shown in Table 2-4. The Globus Toolkit is one of the most promising: it is rapidly becoming the de facto standard grid middleware [39]. The Globus Toolkit is a joint initiative of the University of Southern California, the Argonne National Laboratory, and the University of Chicago. It provides an open-source set of services addressing fundamental grid issues, such as security, information discovery, resource management, data management, and communication. Due to its flexibility and high interoperability with the most widespread technologies used for distributed and parallel computing, the Globus Toolkit has been chosen for our problem.

TABLE 2-4 Tentative list of tools for grid computing [37]

GLOBUS - A bag of services giving basic software infrastructure for grid development: http://www.globus.org
LEGION - An object-based project at the University of Virginia: http://legion.virginia.edu
UNICORE - The UNiform Interface to COmputing Resources, a European grid computing effort: http://www.unicore.org
NETSOLVE - A client/server system oriented to solving computational science problems: http://icl.cs.utk.edu/netsolve/
CACTUS - An open-source problem-solving environment designed for parallel computing and collaborative software development: http://www.cactuscode.org

The next section introduces the Globus Toolkit 2.2, which is used in our study.

2.4.3 Globus Toolkit 2.2

The Globus Toolkit 2.2 provides [7]:

2.4.3.1 A set of basic facilities needed for grid computing, shown in Figure 2-14.


FIGURE 2-14 Components of Globus Toolkit 2.2

a) Security: single sign-on, authentication, authorization, and secure data transfer.
b) Resource Management provides support for:
   - Resource allocation.
   - Submitting jobs: remotely running executable files and receiving results.
   - Managing job status and progress.
c) Data Management provides a system to transfer files among machines in the grid and to manage these transfers.
d) Information Services includes directory services of the available resources and their status. It provides support for collecting information in the grid and for querying this information, based on the Lightweight Directory Access Protocol (LDAP), as shown in Figure 2-15.

FIGURE 2-15 Simple LDAP configuration [7]


2.4.3.2 Application Programming Interfaces (APIs) to the above facilities.

2.4.3.3 C bindings needed to build and compile programs.

In addition to the above, which are considered the core of the toolkit, other components are available that complement or build on top of these facilities. For instance, Globus provides a rapid development kit known as the Commodity Grid (CoG), which supports technologies such as Java, Python, Web services, CORBA, and so on.

2.4.4 Grid Components

This section describes, at a high level, the primary components of the grid environment, shown in Figure 2-16. Depending on the grid design and its expected use, some of these components may or may not be required, and in some cases they may be combined to form a hybrid component.

FIGURE 2-16 Grid components: a high-level perspective [8]

2.4.4.1 Grid Portal

The grid portal provides an interface through which a user launches applications that will utilize the resources and services provided by the grid. The current Globus Toolkit does not provide any services or tools to generate a portal.

2.4.4.2 Security

A major requirement for grid computing is security. There must be mechanisms that provide authentication, authorization, and data encryption.
encryption.


The Grid Security Infrastructure (GSI) component of the Globus Toolkit provides robust security mechanisms. The GSI includes an OpenSSL implementation and also provides a single sign-on mechanism: once a user is authenticated, a proxy certificate is created and used when performing actions within the grid.

2.4.4.3 Broker

Once authenticated, the user launches an application. Based on the parameters provided by the user, the broker identifies the available and appropriate resources to utilize within the grid.

Although the Globus Toolkit does not provide a broker implementation, it does provide an LDAP-based information service. This service is called the Grid Resource Information Service (GRIS), or more commonly the Monitoring and Discovery Service (MDS).

2.4.4.4 Scheduler

Once the resources have been identified, the next logical step is to schedule the individual jobs to run on the individual nodes within the grid. The Globus Toolkit does not have its own job scheduler to find available resources and automatically send jobs to suitable machines. Instead, it provides the tools and interfaces needed to implement schedulers.

2.4.4.5 Data Management

If any data (including application modules) must be moved or made accessible to the nodes where the application's jobs will execute, there needs to be a secure and reliable method for moving files and data to the various nodes within the grid. The Globus Toolkit contains a data management component that provides such services. This component, known as Grid Access to Secondary Storage (GASS), includes facilities such as GridFTP. GridFTP is built on top of the standard FTP protocol, but adds additional functions and utilizes the GSI for user authentication and authorization.

2.4.4.6 Job and Resource Management

This component provides the services to launch a job on a particular resource, check its status, and retrieve its results when it is complete. The Grid Resource Allocation Manager (GRAM) of the Globus Toolkit provides the services for this component.


2.5 Summary

Course scheduling is a part of the general scheduling problem. It assigns courses to periods of time and classrooms so that lecturers can teach and students can attend their courses without any conflicts.

Much research has been carried out on course scheduling problems. The different approaches can be divided into four groups: sequential methods, cluster methods, constraint-based methods, and meta-heuristic methods. Although they have successfully solved course scheduling problems, little research has focused on the problems of multiple faculty universities. In such universities, conflicts can occur across faculties due to both shared and non-shared resources.

This study proposes a new system for multiple faculty universities. The proposed system applies a hybrid centralized and de-centralized approach, a GA, and a grid computing environment. The GA is a global search optimization algorithm that works on multiple points in parallel, so it is suitable and flexible for satisfying the constraints of the required timetable. The combination of the GA and the hybrid centralized and de-centralized approach is able to create solutions without any conflicts between the resources across the university. The grid computing environment is used as the infrastructure for sharing computing and data over a local or wide area network.


CHAPTER 3
METHODOLOGY

The general course scheduling problem and the objectives and scope of our study were presented in Chapter 1. This chapter presents the plan and the phases of analyzing, designing, and implementing the proposed course scheduling system.

3.1 System Development

In order to achieve the expected objectives, we will follow the six phases below:

3.1.1 Phase 1: Systems Analysis
a) To verify the requirements and the objectives of the study.
b) To choose the tools and software to be used to develop the system.

3.1.2 Phase 2: Design
a) To study the genetic algorithms and the grid computing environment.
b) To specify the proposed system.
c) To design the interfaces and the modules' functions.
d) To design the database.
e) To design a prototype for connecting users and the system.

3.1.3 Phase 3: Implementation
a) To study the genetic algorithms and the grid computing environment.
b) To install the software needed to develop the system.
c) To install the database.
d) To implement the prototype for connecting users and the system.
e) To implement the designed modules.

3.1.4 Phase 4: Testing
a) To test the system.
b) To run a demonstration.
c) To evaluate the effectiveness of the system.

3.1.5 Phase 5: Measurement
a) To evaluate the suitability of the proposed GA against the hard and soft constraints.
b) To measure the performance of using grid computing versus not using grid computing.

3.1.6 Phase 6: Documentation
a) To write the user manuals.
b) To write reports.

3.2 Problem Definition

The more realistic the problem, the more complex it is for developers to overcome. In the real world, course scheduling problems are very complex; for multiple faculty universities they are really hard jobs, and they depend strongly on the particular requirements of each university. This study focuses on the common requirements of multiple faculty universities. However, the proposed system and its solved constraints are general enough that not many changes are needed to obtain a good system for a particular university.

The multiple faculty universities where we had the chance to collect data are King Mongkut's Institute of Technology North Bangkok in Thailand and Cantho University in Vietnam. At these universities, each faculty has several departments. Each department has its own resources, which include lecturers, courses, and classrooms. Each department desires to construct a timetable using its own resources. These resources can also be shared with other departments in the university.

Each course, which is usually divided into many sections, belongs to just one department. However, it is almost always the case that a significant part of the curriculum of one department is provided by another department. If a course is provided to more than one department, it must be scheduled in the same time-slot on all the departmental timetables that use this course. These courses are called shared courses.

Similarly, we have shared classrooms. Each department desires to use its own classrooms. However, some courses sometimes need to use the shared classrooms of the faculty, of common buildings, or of other faculties. Therefore, a group of classrooms used for a particular course has to be assigned before scheduling. A course has to be scheduled into these classrooms without any conflicts between the departments. Figure 3-1 illustrates an arrangement for the shared classrooms.

FIGURE 3-1 Shared classrooms in a multiple faculty university

Each department has a responsibility to teach a number of courses; therefore, a teaching assignment for its lecturers has to be made. Some lecturers from other faculties are invited to teach, so we also have shared lecturers who teach courses in more than one faculty.

We do not schedule for individual students; instead, we handle student problems at the class level. The students are divided into classes and are expected to follow chronologically the advised pre-requisites in the curriculum of their respective program. Our responsibility is to schedule a timetable that helps the students fulfill the courses in their curriculum. We say that two courses are in conflict with each other if they belong to the same curriculum and are scheduled at the same time.

In many cases, a course can be attended by students who come from classes of different departments or faculties. This means that the students who study such a shared course can have different curriculums. In any case, we have to schedule so that the students can attend their courses.

All the above problems are summarized, in a brief and clear way, by the set of hard and soft constraints solved in our study, presented in Section 1.3.

3.3 The System Boundary

The system boundary gives a brief overview of the application through the use case diagram in Figure 3-2.

FIGURE 3-2 Use case diagram of the course scheduling system (actors: Lecturer, Department Staff, Faculty Staff, Central Office Staff, and University Information System; use cases: assign classrooms to departments, create classes, create combined classes, assign teaching, schedule courses, view timetable, request busy time, and request preferable time)

There are five actors in the use case diagram of the course scheduling system.

3.3.1 Lecturer: A person who can specify his/her busy and preferable times so that the course scheduling programs try to avoid those times. Lecturers can view the timetable after it is completed.

3.3.2 Department Staff: A person who works in a department. The department staff prepares the classes to be scheduled and, based on the teaching plan, assigns lecturers to teach the courses.

3.3.3 Faculty Staff: A person who works in a faculty. The faculty staff assigns classrooms to the departments in the faculty, and each department can use these classrooms for its courses. This allocation does not always need to be repeated every semester.

3.3.4 Central Office Staff: A person who works in the central office of the university. The central office staff activates the course scheduling system to schedule all courses for the whole university.

3.3.5 University Information System: A system actor that includes a database and a database management system. It is responsible for storing and managing the data of the university.

3.4 The Proposed Course Scheduling System

This section presents the proposed system through its scheduling strategy and its system architecture.

3.4.1 The Scheduling Strategy

In general, there are two approaches to the course scheduling problem, namely centralized and de-centralized. Both approaches have their own advantages and disadvantages.

The centralized approach uses software to schedule the timetable for the entire university. This software has a global view of the problem and all the information necessary to create the most effective timetable. Unfortunately, the size of the problem is too big, so the course scheduling program is unable to create a good timetable. Furthermore, the co-operation between the faculties and the central scheduling office is also a difficult problem [5].

The de-centralized approach lets each faculty schedule its own timetable using its own resources. However, this approach rapidly becomes infeasible when there are shared resources across faculties; it can only work well if the communication between faculties is reduced to a minimum [5].

Our study proposes a hybrid centralized and de-centralized approach. The centralized course scheduling program schedules only the shared resources, whereas the de-centralized course scheduling program schedules the remaining resources of each faculty. The proposed course scheduling system is shown in Figure 3-3.

The proposed system is designed to consist of jobs that are processed in parallel. After the clients at all faculties send their own course scheduling data to the Central Manager Host, a client in the central office runs the course scheduling program. In turn, the following three stages are performed automatically.


FIGURE 3-3 Proposed system (clients at the faculties and at the central office submit data and jobs to the Central Manager Host, which distributes them to Execution Hosts that compute the schedules for Faculty 1 to Faculty n)

3.4.1.1 Stage 1

The Central Manager Host requests a job to execute the centralized course scheduling program on a remote Execution Host, creating a timetable for the shared resources across the faculties. The result is written into the database on the Central Manager Host.

3.4.1.2 Stage 2

The Central Manager Host requests jobs to execute the de-centralized course scheduling program in parallel on remote Execution Hosts. In this stage, each remote host uses the fixed timetable created in Stage 1 as an initial input and then tries to find a timetable for its faculty. The de-centralized course scheduling program must produce results that do not conflict with the centralized scheduling output. The results from all remote nodes are also written into the database on the Central Manager Host.

3.4.1.3 Stage 3

The Central Manager Host requests a job to merge the results in the database of the Central Manager Host. Finally, the entire timetable for the whole university is created.

We will use a genetic algorithm to develop both the centralized and the de-centralized course scheduling programs. The grid computing environment is used as the infrastructure for distributed and parallel computing.


3.4.2 The System Architecture

The system can be separated into two subsystems, the Front End system and the Grid system, as shown in Figure 3-4.

The Front End system is based on a 3-tier architecture. It is used by the clients in the faculties and in the central office to prepare the data before scheduling. It includes three components: the GUIs, the application program, and the data storage.

By separating the system into three tiers, the tiers can work independently. The presentation tier contains the graphical user interfaces. The application tier consists of the application manager. The last tier, the database tier, consists of a database and its database management system (DBMS).

FIGURE 3-4 System architecture (presentation tier: clients at the faculties and at the central office; application tier: application manager, scheduling engine, and Commodity Grid on top of the Globus grid environment; database tier: DBMS and database)


The Grid system is used only by a client in the central office, which starts the scheduling engine that then activates the grid system. The grid system is also a 3-tier architecture consisting of the Client, the Commodity Grid (CoG), and the Globus Grid Environment (Grid).

The Client tier is the interface between the users and the grid system. It is responsible for receiving the command to run the scheduling engine.

The CoG tier acts as an interface between the Grid and the Client tier. Using the facilities provided by the API, the CoG allows secure file transfers and also takes responsibility for job scheduling and for monitoring the status of jobs. There is one job for centralized course scheduling and many other jobs for de-centralized course scheduling. When a job needs to be performed, the CoG looks for available nodes to assign it to. The Monitoring and Discovery Service (MDS) provided by the Globus Toolkit provides information about the available nodes within the Grid.

Next, it checks and locates the data on the available machines (nodes). Security (GSI) and reliability are important when transferring data to the various nodes within the Grid. In order to meet these requirements, the Globus Toolkit provides a data management component, known as Grid Access to Secondary Storage (GASS), for secure and reliable data transfers. It uses the GridFTP protocol to facilitate the checking and transport of data files.

The CoG tier monitors the progress of each job and polls regularly to check whether the jobs are finished. The Grid Resource Allocation Manager (GRAM) provides the necessary services for these processes. Once compiled, the results are stored into the database, and their status is shown to the Client.

3.5 The Database Design

For the database design, we present an entity relationship diagram, shown in Figure 3-5. This design also helps us understand the system requirements more clearly.

The data relations between the entities in the diagram are very important. The course scheduling programs do not work directly on the database; they work on data structures instead. Therefore, the data and their relations need to be loaded from the database into the corresponding data structures before scheduling. The course scheduling programs have to know the data relations so that they can find enough information to satisfy the hard and soft constraints.

FIGURE 3-5 Entity relationship diagram (entities: Building, Faculty, Department, ClassroomGroup, Classroom, Program, Curriculum, Class, Course, CourseSection, Lecturer, and BusyTime, together with their attributes and relationships)

The data dictionary is presented in Appendix A.


3.6 The Proposed Genetic Algorithm

This section presents the proposed genetic algorithm, including the genetic representations and the processes for creating the constraint data, initializing a random population, evaluating the fitness function, and performing crossover and mutation on the chromosomes. Figure 3-6 presents a high-level representation of this algorithm.

FIGURE 3-6 High-level representation of the proposed genetic algorithm (create the constraint data; initialize a random population of n chromosomes; while the fitness f(x) of the first chromosome x is not satisfied, select two parents, perform crossover and mutation to breed an offspring, evaluate its fitness, and insert it into the population in order of fitness, deleting some low-fitness chromosomes when the population exceeds n)

To generate an optimal result, we apply the genetic algorithm to create one or more solutions with various fitness values. By comparing, changing, and creating new solutions, we can choose a good solution; indeed, we can obtain a variety of good solutions.

As shown in Figure 3-6, a newly mutated chromosome is inserted into its correct position in the population. The crossover and mutation operations are repeated to change the population until the first chromosome of the population reaches a good enough fitness value f(x). However, if repeated too many times, these operations will create more chromosomes than the preset population size. To solve this problem, once the number of chromosomes increases to a critical value n, we kill off half of the population.

3.6.1 Representations

This section defines the genetic representations of the chromosomes, the genes, and the population.

3.6.1.1 Chromosomes

A chromosome is a solution, in our case a timetable of the university. The timetable contains a number of sub-timetables of classrooms; each classroom has its own sub-timetable.

FIGURE 3-7 Sub-timetable of a classroom (hourly time-slots from 08:00 to 17:00, Monday to Friday, with each course occupying consecutive slots)

We use a classroom as a 'storage space': courses are scheduled into the time-slots of each classroom. This direct representation gives a visual view. Here the courses are divided into sections, and each section is assigned to be taught by a particular lecturer and studied by a class of students. Looking at the data relations in the database, we have course → lecturer and course → class, which is a good foundation for checking the hard and soft constraint conflicts.

Figure 3-8 illustrates an entire chromosome.


FIGURE 3-8 Chromosome (a chromosome x_i with fitness f(x_i) is the set of sub-timetables of classrooms 1 to n; each time-slot is a gene)

Each chromosome x_i has a fitness value f(x_i). We will use this value to look for a good chromosome (a good solution).

3.6.1.2 Genes

A gene is a time-slot in a chromosome, so there are many genes in a chromosome. A gene contains 0 if no course is held at that position; otherwise, the gene contains a course. By changing the values of the genes, we create a new chromosome.

3.6.1.3 Population

A population is a set of n chromosomes, that is, n solutions. The population is always sorted in decreasing order of the chromosomes' fitness values. As a result, the first chromosome has the highest fitness value and is thus a candidate for the best solution, as illustrated in Figure 3-9.

FIGURE 3-9 Population (chromosomes x_1 to x_n with fitness values f(x_1) to f(x_n))
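A minimal Java sketch of such a sorted population is shown below; the Chromosome holder is a stand-in for the thesis data structures.

import java.util.ArrayList;
import java.util.List;

// Sketch of a population kept sorted by decreasing fitness.
public class Population {
    static class Chromosome {
        double fitness;
    }

    private final List<Chromosome> chromosomes = new ArrayList<>();

    // Insert a chromosome so that the list stays ordered by fitness.
    public void insert(Chromosome c) {
        int i = 0;
        while (i < chromosomes.size() && chromosomes.get(i).fitness >= c.fitness) {
            i++;
        }
        chromosomes.add(i, c);
    }

    // The first chromosome is always the current best candidate solution.
    public Chromosome best() {
        return chromosomes.get(0);
    }

    public int size() {
        return chromosomes.size();
    }
}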


3.6.2 Creating Constraint Data

Figure 3-10 presents the processes used to prepare the data before scheduling.

FIGURE 3-10 Creating constraint data (user input - faculties, departments, curriculums, classrooms, lecturers, courses, classes, and assignments - is stored into data structures, which together with the GA parameters feed the GA that produces the timetable)

All the data and their relations, plus the GA parameters, have to be prepared before running the GA. The data about each faculty, department, curriculum, classroom, lecturer, course, class, and teaching assignment are entered into the database by the users. A program module then automatically extracts and stores these data into the data structures. List data structures are used because they are flexible for designing the algorithms. The GA parameters, such as the population size, the mutation and crossover rates, and the penalty costs for the unsatisfied constraints, are also prepared as variables in the program.

3.6.3 Initializing a Random Population of Chromosomes

FIGURE 3-11 Algorithm for initializing a random population (starting from an empty population, while the population size is less than n, create a random chromosome x, evaluate its fitness f(x), and add it to the population in order of fitness)


A population is a list of n chromosomes. Starting with an empty population, we create new random chromosomes and add them to this population one after another. Pseudo code for creating a random chromosome is given in Figure 3-12.

For each course
    n = number of time-slots needed for this course (= number of credits)
    Repeat
        Randomly select a classroom from the list of classrooms permissible for this course
        Search for n free time-slots in the chosen classroom
        If (n free time-slots are found)
            Book the current course into these time-slots
    Until (course is booked)

FIGURE 3-12 Pseudo code for creating a random chromosome
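To make the booking step concrete, the sketch below expresses the pseudo code of Figure 3-12 in Java; the Course, Classroom, and Timetable interfaces are hypothetical stand-ins for the list-based data structures used in the actual system. As in the pseudo code, the loop retries until a booking succeeds; a practical implementation would bound the retries or backtrack when no free run of slots exists.

import java.util.List;
import java.util.Random;

// Sketch of random chromosome construction following Figure 3-12.
public class RandomChromosomeBuilder {
    private static final Random RAND = new Random();

    interface Classroom { }
    interface Course {
        int credits();                                   // slots needed = credits
        List<Classroom> permissibleClassrooms();
    }
    interface Timetable {
        int findFreeRun(Classroom room, int length);     // start index, or -1 if none
        void book(Classroom room, int start, int length, Course course);
    }

    public static void build(Timetable timetable, List<Course> courses) {
        for (Course course : courses) {
            int n = course.credits();
            boolean booked = false;
            while (!booked) {                            // repeat until the course is booked
                List<Classroom> rooms = course.permissibleClassrooms();
                Classroom room = rooms.get(RAND.nextInt(rooms.size()));
                int start = timetable.findFreeRun(room, n);
                if (start >= 0) {
                    timetable.book(room, start, n, course);
                    booked = true;
                }
            }
        }
    }
}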

3.6.4 Evaluating the Fitness Function

As described above, each chromosome x has a fitness value f(x). In this section, we discuss how to compute f(x).

Assume that we have m hard constraints. Let Hc_i denote the number of conflicts against hard constraint i, where i = 1..m. Each hard constraint i is assigned a penalty cost Penalty_hc_i. We use f_1(x) to denote the fitness value of the hard constraints.

f_1(x) = \frac{1}{1 + \sum_{i=1}^{m} Hc_i \cdot Penalty\_hc_i}                    Eq. 3-1

Similarly, assume that we have n soft constraints. Let Sc_j denote the number of conflicts against soft constraint j, where j = 1..n. Each soft constraint j is assigned a penalty cost Penalty_sc_j. We use f_2(x) to denote the fitness value of the soft constraints.

f_2(x) = \frac{1}{1 + \sum_{j=1}^{n} Sc_j \cdot Penalty\_sc_j}                    Eq. 3-2

Thus, the fewer conflicts a chromosome has, the higher the values of f_1(x) and f_2(x). We use f(x) to denote the fitness value of the chromosome x.

f(x) = W_1 f_1(x) + W_2 f_2(x)                                                    Eq. 3-3


where W_1 and W_2 denote the weights of the hard and soft constraints respectively. We will carry out experiments to identify suitable values for these weights.

In this study, we design a course scheduling algorithm that looks for solutions with the highest fitness value f(x). This is a heuristic search, so we keep examining solutions with high fitness values until we find a solution whose f_1(x) is equal to 1.
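As a worked illustration of Eq. 3-1 to Eq. 3-3, the Java sketch below computes f(x) from conflict counts and penalty costs that are assumed to have been collected by the checking routines of the following subsections; the method and parameter names are illustrative only.

// Illustrative computation of the fitness value f(x) from Eq. 3-1 to Eq. 3-3.
public class FitnessFunction {

    // Eq. 3-1: fitness contribution of the hard constraints.
    public static double hardFitness(int[] hc, double[] penaltyHc) {
        double sum = 0.0;
        for (int i = 0; i < hc.length; i++) {
            sum += hc[i] * penaltyHc[i];
        }
        return 1.0 / (1.0 + sum);
    }

    // Eq. 3-2: fitness contribution of the soft constraints.
    public static double softFitness(int[] sc, double[] penaltySc) {
        double sum = 0.0;
        for (int j = 0; j < sc.length; j++) {
            sum += sc[j] * penaltySc[j];
        }
        return 1.0 / (1.0 + sum);
    }

    // Eq. 3-3: weighted combination of the two contributions.
    public static double fitness(int[] hc, double[] penaltyHc,
                                 int[] sc, double[] penaltySc,
                                 double w1, double w2) {
        return w1 * hardFitness(hc, penaltyHc) + w2 * softFitness(sc, penaltySc);
    }
}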

3.6.4.1 Checking Conflicts about Small Classrooms

Each course must be booked into a classroom that is large enough to hold the students of that course. Pseudo code for checking this is given in Figure 3-13.

Count = 0
For each classroom
    For each day in a week
        For each time-slot in a day
            If (number of students attending the course held in the current classroom >
                number of seats of the current classroom) Count = Count + 1

FIGURE 3-13 Pseudo code for checking small classroom conflicts
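A minimal Java rendering of this check is sketched below, assuming the timetable is held as a three-dimensional array indexed by classroom, day, and time-slot; these names are illustrative only.

// Sketch of the small-classroom check of Figure 3-13.
public class SmallClassroomCheck {
    static class Course {
        int numStudents;
    }

    // timetable[room][day][slot] holds the booked course, or null if the slot is free.
    public static int count(Course[][][] timetable, int[] seats) {
        int count = 0;
        for (int room = 0; room < timetable.length; room++) {
            for (int day = 0; day < timetable[room].length; day++) {
                for (int slot = 0; slot < timetable[room][day].length; slot++) {
                    Course c = timetable[room][day][slot];
                    if (c != null && c.numStudents > seats[room]) {
                        count++;                     // the classroom is too small
                    }
                }
            }
        }
        return count;
    }
}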

3.6.4.2 Checking Conflicts Regarding a Lecturer's Busy Time

The courses taught by a lecturer cannot be booked into his/her busy working sessions in a week. Pseudo code for checking this is given in Figure 3-14.

Count = 0
For each lecturer
    For each day in a week
        For each time-slot in a day
            For each classroom
                If (the current lecturer teaches the course held in the current classroom at
                    this time-slot) and (the current lecturer is busy at this time) Count = Count + 1

FIGURE 3-14 Pseudo code for checking a lecturer's busy time


Lecturers register their busy times. This check compares those busy times with the times at which the lecturers' courses are booked; if they overlap, an error is counted.

3.6.4.3 Checking Conflicts about Preferable Time

Some lecturers dislike teaching in certain working sessions of the week, and the system should try to avoid booking their courses at those times. The course scheduling program tries to book lecturers' courses into their preferred time periods; any conflict is counted as a soft constraint violation. Pseudo code for checking this is given in Figure 3-15.

Count = 0
For each lecturer
    For each day in a week
        For each time-slot in a day
            For each classroom
                If (the current lecturer teaches the course held in the current classroom at
                    this time-slot) and (the current lecturer dislikes teaching at this time) Count = Count + 1

FIGURE 3-15 Pseudo code for detecting conflicts about preferable times

3.6.4.4 Checking Conflicts about Double-Booked Lecturers

A lecturer cannot teach more than one course at the same time. Pseudo code for checking this is given in Figure 3-16.

Count = 0
For each lecturer
    For each day in a week
        For each time-slot in a day
            Booked = 0
            For each classroom
                If (the course held in this classroom is taught by the current lecturer) Booked = Booked + 1
            If (Booked > 1) Count = Count + 1

FIGURE 3-16 Pseudo code for checking conflicts about double-booked lecturers
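Under the same illustrative timetable layout as the previous sketch, the double-booking check of Figure 3-16 might look as follows in Java.

// Sketch of the double-booked lecturer check of Figure 3-16.
public class DoubleBookedLecturerCheck {
    static class Course {
        int lecturerId;
    }

    // timetable[room][day][slot] holds the booked course, or null if the slot is free.
    public static int count(Course[][][] timetable, int numLecturers, int days, int slotsPerDay) {
        int count = 0;
        for (int lecturer = 0; lecturer < numLecturers; lecturer++) {
            for (int day = 0; day < days; day++) {
                for (int slot = 0; slot < slotsPerDay; slot++) {
                    int booked = 0;
                    for (Course[][] room : timetable) {
                        Course c = room[day][slot];
                        if (c != null && c.lecturerId == lecturer) {
                            booked++;                // the lecturer appears in this room
                        }
                    }
                    if (booked > 1) {
                        count++;                     // taught in two rooms at the same time
                    }
                }
            }
        }
        return count;
    }
}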


If a lecturer is booked to teach more than one course at the same time, a conflict is counted.

3.6.4.5 Checking Conflicts about Double-Scheduled Classes

Courses attended by the same class of students have to be scheduled at different times so that all students of that class can attend. Pseudo code for checking this is given in Figure 3-17.

Count = 0
For each class
    For each day in a week
        For each time-slot in a day
            Booked = 0
            For each classroom
                If (the course held in the current time-slot is studied by the current class) Booked = Booked + 1
            If (Booked > 1) Count = Count + 1

FIGURE 3-17 Pseudo code for checking conflicts about double-scheduled classes

A class cannot be booked to study more than one course at the same time; if it is double-scheduled, a conflict is counted.

3.6.4.6 Checking Conflicts about Double-Scheduled Courses

Every course must be scheduled exactly once in a week. Pseudo code for checking this is given in Figure 3-18.

Count = 0
For each course
    Booked = 0
    For each classroom
        For each day in a week
            For each time-slot in a day
                If (the current course is held in this time-slot) Booked = Booked + 1
    If (Booked > the number of credits of the current course) Count = Count + 1

FIGURE 3-18 Pseudo code for checking conflicts about double-scheduled courses


A course is booked into time-slots according to its number of credits. In our study, the number of credits of a course can be 1, 2, 3, or 4. We stipulate that if a course has n credits, it must be scheduled into n consecutive time-slots in one day. For instance, course MAT125 has 3 credits, so it has to be scheduled into 3 consecutive time-slots. In any other case, a conflict is counted.

3.6.5 Crossover

Two chromosomes from the population are chosen at random as mother and father. A new offspring is generated by creating an empty chromosome and then inserting genes (time-slots) alternately from the mother and the father, as illustrated in Figure 3-19.

FIGURE 3-19 Crossover (working sessions of chromosome x, the mother, and chromosome y, the father, are copied alternately into the new offspring z)


The new offspring is created from an empty chromosome, into which genes from the mother and the father are inserted alternately. Because an n-credit course is scheduled into n consecutive time-slots, consecutive time-slots have to be copied together from the mother or the father. To facilitate this, all time-slots of a morning or afternoon working session are copied from the mother or the father to the new offspring.

Usually the new offspring is not correct, so it needs to be repaired: if a course has not been scheduled yet, it has to be scheduled; conversely, if a course has been scheduled more than once in a week, the duplicate has to be removed.

Pseudo code for crossover is given in Figure 3-20.

Crossover rate pc = 0.5
Father x = a chromosome chosen randomly from the population
Mother y = a chromosome chosen randomly from the population (y ≠ x)
For each day in a week
    For each working-session in [morning, afternoon]
        For each classroom
            If (random(100) < pc*100)
                Copy the time-slots of the current working-session of father x into the new offspring z
            Else
                Copy the time-slots of the current working-session of mother y into the new offspring z
Mutate the new offspring z
Repair the new offspring z
Calculate the fitness value of the new offspring z
Insert the new offspring z into the population in order of fitness value

FIGURE 3-20 Pseudo code for crossover

If the crossover rate pc is chosen to be 50%, then 50% of the genes come from the mother and 50% from the father.

3.6.6 Mutation

A new offspring that has just been created by crossover is mutated with a given mutation rate. This is done by going through each gene and swapping its content with that of another gene in the same chromosome.

As mentioned in the previous section, a course has to be scheduled into consecutive time-slots, so we have to swap the consecutive time-slots booked for a course with other consecutive time-slots. To facilitate this, we choose all the time-slots of one working session and swap them with those of another, as illustrated in Figure 3-21.

FIGURE 3-21 Mutation (the contents of two working sessions, in classrooms of the same classroom group, are swapped with each other)

A pseudo code <strong>for</strong> mutating is given in Figure 3-22.<br />

Mutation rate pm=0.02<br />

For each classroom<br />

For each day in a week<br />

For each working-session in [morning, afternoon]<br />

If (random(100) < pm*100)<br />

R= a classroom is chosen randomly from the classroom group that is the<br />

same group of the current classroom<br />

Swap all time-slots of the current working-session of the current classroom<br />

with those of classroom R<br />

FIGURE 3-22 Pseudo code <strong>for</strong> mutating a chromosome<br />

Because a course may only be scheduled into classrooms of its assigned classroom group, any swap must stay within that classroom group.

If the mutation rate is chosen to be 2%, only 2% of the genes have their contents swapped with others.

3.7 The System for the Experiment

The Globus Toolkit 2.2 is used as the middleware for our grid computing environment [7, 8]. This section presents the main steps for installing and setting up this environment.

An Ethernet LAN and three Intel Pentium machines were used to build the grid environment. Redhat Linux 9.0 and the Globus Toolkit 2.2 were installed and set up. Figure 3-23 presents this environment with the host names and the functions of each machine.

FIGURE 3-23 Hardware and software for each machine (m1.kmitnb.ac.th: Globus client, J2SDK 1.4, Java CoG Kit 1.1, MySQL 4.0; m2.kmitnb.ac.th: Globus server, GIIS, GRIS, CA, NTP server, centralized and decentralized course scheduling programs; m3.kmitnb.ac.th: Globus server, GRIS, decentralized course scheduling program)

The host names are m1, m2, and m3. The machines should have a clock speed of at least 500 MHz, at least 128 MB of memory, and at least an 8 GB hard drive.

We configure the Monitoring and Discovery Service (MDS) with one Grid Information Index Service (GIIS) on machine m2, which collects the data reported by the Grid Resource Information Servers (GRIS) on all the machines, as shown in Figure 3-24.

The GRIS servers send information about their respective machines to the GIIS. We use this to find the available machines. The user can query the GIIS from the client machine m1. Machine m2 is also used as the Certificate Authority machine.

FIGURE 3-24 MDS configuration (the GRIS on m2.kmitnb.ac.th and m3.kmitnb.ac.th register to the GIIS on m2.kmitnb.ac.th, which is queried with grid-info-search from m1.kmitnb.ac.th)

The MDS is secured so that only certified users can access the GIIS and only certified GRIS servers can register to send information to the GIIS. Machine m2 is also used as a Network Time Protocol (NTP) server, and the NTP clients are configured on the other machines (m1 and m3). NTP is needed because the grid requires the clocks on all of the machines to be synchronized.

The installation and set-up process is presented in detail in Appendix B.

3.8 The Grid Components

This section introduces the following grid components: the broker, the scheduler, and the job and resource management.

3.8.1 Broker

The broker identifies the available resources to utilize within the grid environment. The Globus Toolkit 2.2 does not provide a broker implementation, but it provides the necessary functions and framework to create one through the MDS component.

The broker communicates with the GIIS and GRIS servers via the LDAP protocol of the Globus Toolkit 2.2. The broker can also be linked with other information, stored in databases or plain files, that provides resource information, as shown in Figure 3-25.

In our study, we use a broker that uses the LDAP APIs provided by the Globus Toolkit 2.2 to send requests to the GIIS server located on machine m2. The complete source code for the broker is given in the file GridInfoSearch.java in Appendix E.

FIGURE 3-25 Working with a broker (the application on m1.kmitnb.ac.th calls the broker, which sends an LDAP query to the GIIS on m2.kmitnb.ac.th; the GIIS aggregates the GRIS servers, including the one on m3.kmitnb.ac.th)
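To give a feel for the kind of query involved, the sketch below uses the standard Java JNDI API to search an LDAP server such as the GIIS; the thesis broker itself (GridInfoSearch.java) uses the LDAP classes shipped with the Globus Toolkit 2.2, and the port, base DN, and filter shown here are illustrative placeholders rather than verified MDS settings.

import java.util.Hashtable;
import javax.naming.Context;
import javax.naming.NamingEnumeration;
import javax.naming.directory.DirContext;
import javax.naming.directory.InitialDirContext;
import javax.naming.directory.SearchControls;
import javax.naming.directory.SearchResult;

// Generic JNDI sketch of an LDAP query against the GIIS (illustrative only).
public class GiisQueryExample {
    public static void main(String[] args) throws Exception {
        Hashtable<String, String> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
        env.put(Context.PROVIDER_URL, "ldap://m2.kmitnb.ac.th:2135");  // assumed GIIS port

        DirContext ctx = new InitialDirContext(env);
        SearchControls controls = new SearchControls();
        controls.setSearchScope(SearchControls.SUBTREE_SCOPE);

        // List every entry under an assumed base DN and print its attributes.
        NamingEnumeration<SearchResult> results =
                ctx.search("o=Grid", "(objectclass=*)", controls);
        while (results.hasMore()) {
            System.out.println(results.next().getAttributes());
        }
        ctx.close();
    }
}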

When called, the GIIS server returns a list of the available hosts within the grid. For each host the following resource information is gathered:
- Host name
- CPU speed (MHz)
- Number of CPU(s)
- Free CPU percentage

The list of available hosts is sorted by a weight that measures the CPU capacity available on each host.

Weight_{host} = \frac{CPU_{speed} \times CPU_{count} \times CPU_{load}}{100}       Eq. 3-4

where CPU_speed is the CPU speed, CPU_count is the number of CPUs, and CPU_load is the free CPU percentage (the current CPU availability) of the host. The host with the highest weight, i.e. the most available host, is selected to run a new job.
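A worked Java sketch of Eq. 3-4 and of the host selection is given below; the Host fields are illustrative, and CPU_load is interpreted here as the free-CPU percentage reported by the GRIS.

import java.util.Comparator;
import java.util.List;

// Sketch of the host weight of Eq. 3-4 and of selecting the most available host.
public class HostSelection {
    static class Host {
        String name;
        double cpuSpeedMhz;      // CPU speed
        int cpuCount;            // number of CPUs
        double freeCpuPercent;   // CPU load term, taken as the free-CPU percentage

        double weight() {
            return cpuSpeedMhz * cpuCount * freeCpuPercent / 100.0;   // Eq. 3-4
        }
    }

    // The most available host is the one with the largest weight.
    public static Host mostAvailable(List<Host> hosts) {
        return hosts.stream()
                    .max(Comparator.comparingDouble(Host::weight))
                    .orElseThrow(() -> new IllegalArgumentException("no hosts available"));
    }
}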


The complete source code for managing the available hosts is given in the file AvailableHost.java in Appendix E.

3.8.2 Job Scheduler

The job scheduler schedules the individual jobs to run on the individual hosts. Hamscher et al. [40] presented three job scheduling paradigms for a grid: centralized, hierarchical, and distributed. Our study uses a centralized scheduling system; in addition, because the Globus Toolkit does not have its own job scheduler, our study proposes one.

In the centralized scheduling paradigm, a central machine acts as a resource manager that schedules jobs onto all the surrounding hosts within the grid environment. Figure 3-26 presents the architecture of this scheduling.

FIGURE 3-26 Centralized scheduling (jobs are submitted to the central scheduler, which dispatches Job 1, Job 2, and Job 3 to Host 1, Host 2, and Host 3)

In this scenario, the jobs are first submitted to the central scheduler, which then dispatches them to the appropriate hosts. Jobs that cannot be started on a host are normally stored in a central job queue to be started later.

In our study, the central scheduling is implemented on machine m1. There are two kinds of jobs: the centralized course scheduling job and the decentralized course scheduling jobs. These jobs are run on machines m2 and m3. Figure 3-27 presents the proposed algorithm for the centralized scheduling.


FIGURE 3-27 Job scheduler for the grid computing environment (flowchart of the three stages described below: Stage 1 requests the centralized course scheduling job on a designated host, waits for the results, and resubmits it if it fails; Stage 2 selects each decentralized course scheduling job, searches for the host with the lowest load, and requests the job on that host until all decentralized jobs are requested; Stage 3 checks the status of each job until all jobs are done, resubmitting any failed job to the host with the lowest load)

The algorithm can be divided into three stages:

3.8.2.1 Stage 1

The centralized course scheduling job is requested to be executed on a designated host, machine m2. The system waits for the results and resubmits the job if it fails.


3.8.2.2 Stage 2

After the centralized course scheduling job has been executed successfully, all decentralized course scheduling jobs are requested to be executed on the remote machines m2 and m3.

There is no exchange of data between the decentralized course scheduling jobs, so these jobs can be requested one after another to run in parallel in the grid. After each job is submitted to be executed on a host, the most available host is updated.

3.8.2.3 Stage 3

The system monitors all the decentralized course scheduling jobs and resubmits a job if it fails.

The complete source code for this job scheduler is given in the file Scheduling.java in Appendix E.
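The outline below is a plain-Java sketch of the three stages, not the actual Scheduling.java; the helper methods (submitJob, jobFailed, jobSucceeded, leastLoadedHost) and the example faculty identifiers are hypothetical stand-ins for the GRAM/GASS calls and data used in the real code.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SchedulerSketch {

    void run() throws InterruptedException {
        // Stage 1: run the centralized course scheduling job on the designated host (m2)
        // and resubmit it until it succeeds.
        String centralJob = submitJob("centralizedscheduling", "m2.kmitnb.ac.th");
        while (!jobSucceeded(centralJob)) {
            centralJob = submitJob("centralizedscheduling", "m2.kmitnb.ac.th");
        }

        // Stage 2: dispatch every decentralized course scheduling job to the host with
        // the lowest load; the jobs are independent, so they can run in parallel.
        Map<String, String> jobs = new HashMap<>();
        for (String facultyId : List.of("ENG", "EDU", "SCI")) { // example faculty IDs
            jobs.put(facultyId, submitJob("decentralizedscheduling " + facultyId, leastLoadedHost()));
        }

        // Stage 3: monitor the decentralized jobs and resubmit any job that failed.
        boolean allDone = false;
        while (!allDone) {
            Thread.sleep(5000); // poll interval chosen arbitrarily for this sketch
            allDone = true;
            for (Map.Entry<String, String> e : jobs.entrySet()) {
                if (jobFailed(e.getValue())) {
                    e.setValue(submitJob("decentralizedscheduling " + e.getKey(), leastLoadedHost()));
                }
                if (!jobSucceeded(e.getValue())) {
                    allDone = false;
                }
            }
        }
    }

    // --- hypothetical helpers, not part of the thesis code ---
    String submitJob(String command, String host) { return command + "@" + host; }
    boolean jobFailed(String jobHandle) { return false; }
    boolean jobSucceeded(String jobHandle) { return true; }
    String leastLoadedHost() { return "m3.kmitnb.ac.th"; }
}
```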

3.8.3 Job and Resource Management

The job and resource management component submits a job to a particular resource, queries the job status, and resubmits the job if it fails.

FIGURE 3-28 Overview of GRAM and GASS


The job and resource management in the Java CoG Kit is done by using the Grid Resource Allocation Manager (GRAM) and the Grid Access to Secondary Storage (GASS), shown in Figure 3-28.

GRAM is the module that provides remote execution and status management of the execution. When a job is submitted by a client, the request is sent to the remote host and handled by the gatekeeper daemon located on that host. The gatekeeper then creates a job manager to start and monitor the job. When the job is finished, the job manager sends the status information back to the client and terminates.

3.8.3.1 Job

In Globus terminology, a job is a binary executable or command to be run on a remote resource (machine). In order to run a job, the remote server must have the Globus Toolkit installed; the remote server is also referred to as a gatekeeper.

In our case, we have two jobs that are executable programs: the centralized course scheduling program and the decentralized course scheduling program. Both are written in the C language. The centralized course scheduling program schedules courses whose lecturers are invited from other faculties and courses whose students come from other faculties. The decentralized course scheduling program, on the other hand, schedules the courses of each particular faculty that have not yet been scheduled by the centralized course scheduling program.

3.8.3.2 The Resource Specification Language (RSL)

RSL is the language used by clients to submit a job. All job submission requests are described in RSL, including the executable file and the conditions under which it must be executed.

The following is a sample RSL string that requests to execute the file decentralizedscheduling.exe one time on a remote host. The directory of this file is also identified.

&(executable = decentralizedscheduling.exe)
 (directory = /usr/study/coursescheduling)
 (arguments = facultyID)(count=1)
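As a small illustration, an RSL string like the one above could be assembled in Java before being handed to the job submission code; the helper below is hypothetical and is not part of GassJob.java.

```java
// Hypothetical helper that builds an RSL request of the form shown above.
public class RslBuilder {
    static String buildDecentralizedJob(String facultyId) {
        return "&(executable = decentralizedscheduling.exe)"
             + "(directory = /usr/study/coursescheduling)"
             + "(arguments = " + facultyId + ")"
             + "(count=1)";
    }

    public static void main(String[] args) {
        System.out.println(buildDecentralizedJob("SCI")); // example faculty ID
    }
}
```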


3.8.3.3 The Gatekeeper

The gatekeeper daemon builds the secure communication between the clients and the servers. It communicates with the GRAM client and authenticates the right to submit jobs. After authentication, the gatekeeper spawns a job manager and delegates to it the authority to communicate with the clients.

The Java CoG Kit provides a personal gatekeeper that can be used as a lightweight alternative to the Globus gatekeeper. A gridmap file is used by the gatekeeper to map Globus credentials to local users. The gridmap file is introduced in Appendix B.

3.8.3.4 Job Manager

The job manager is created by the gatekeeper daemon as part of the job requesting process. It provides the interfaces that control the allocation of each local resource manager. The job manager functions are:

a) Parse the RSL.
b) Allocate job requests to the local resource managers. The local resource manager is usually a job scheduler such as PBS, LSF, or LoadLeveler; however, our study does not use these job schedulers.
c) Send callbacks to clients, if necessary.
d) Receive status and cancel requests from clients.
e) Send output results to clients using the GASS, if requested.

GRAM uses the GASS to provide the mechanism for transferring the output file from the servers to the clients. APIs are provided under the Grid Security Infrastructure (GSI) protocol to furnish secure transfers.

The complete source code for the job submission is given in the file GassJob.java in Appendix E.


CHAPTER 4

EXPERIMENTAL RESULTS

The system for the experiments was installed and set up as outlined in section 3.7. This chapter discusses some of the results of our genetic algorithm (GA) and of the grid computing environment. Section 4.1 presents the data used for the experiments. Section 4.2 presents the experiments and discussions. Section 4.3 presents sample results.

4.1 The Data for the Experiments

The data used for the experiments were collected from three departments of three different faculties of Cantho University (Vietnam): the Department of English (Faculty of Education), the Department of Electrical and Computer Engineering (Faculty of Engineering), and the Department of Computer Science (Faculty of Science). Twelve classes are to be scheduled to study 76 sections of the courses in their curriculums in the first semester of 2006. The classes are Bachelor of Science in Computer Science (BSCS04A, BSCS04B, BSCS05A, BSCS05B, BSCS06A, and BSCS06B) and Bachelor of Science in Electrical Engineering (BSEE04A, BSEE04B, BSEE05A, BSEE05B, BSEE06A, and BSEE06B), shown in Table 4-1.

TABLE 4-1 Courses fulfilled by each class

Class    Semester  Course  Section  Credits  Number of Students
BSCS04A  5  CSC329  001  3  30
BSCS04A  5  CSC330  001  2  30
BSCS04A  5  ENL307  001  3  30
BSCS04A  5  CSC326  001  3  30
BSCS04A  5  CSC327  001  2  30
BSCS04A  5  CSC328  001  2  30
BSCS04B  5  CSC326  002  3  30
BSCS04B  5  CSC327  002  2  30
BSCS04B  5  CSC328  002  2  30
BSCS04B  5  CSC329  002  3  30
BSCS04B  5  CSC330  002  2  30
BSCS04B  5  ENL307  001  3  30
BSCS05A  3  ECE218  001  2  30
BSCS05A  3  MAT220  001  3  30
BSCS05A  3  CSC211  002  4  30
BSCS05A  3  CSC215  002  2  30
BSCS05A  3  CSC221  002  3  30
BSCS05A  3  ECE217  001  2  30
BSCS05A  3  CSC210  002  3  30
BSCS05B  3  CSC215  001  2  30
BSCS05B  3  CSC221  001  3  30
BSCS05B  3  ECE217  002  2  30
BSCS05B  3  ECE218  002  2  30
BSCS05B  3  MAT220  002  3  30
BSCS05B  3  CSC211  001  4  30
BSCS05B  3  CSC210  001  3  30
BSCS06A  1  CSC120  002  3  30
BSCS06A  1  CSC127  002  2  30
BSCS06A  1  ENL101  001  3  30
BSCS06A  1  MAT125  001  3  30
BSCS06A  1  CSC110  002  2  30
BSCS06A  1  CSC113  002  2  30
BSCS06A  1  CSC115  002  2  30
BSCS06B  1  MAT125  001  3  30
BSCS06B  1  CSC113  001  2  30
BSCS06B  1  CSC115  001  2  30
BSCS06B  1  CSC120  001  3  30
BSCS06B  1  CSC127  001  2  30
BSCS06B  1  ENL101  001  3  30
BSCS06B  1  CSC110  001  2  30
BSEE04A  5  ECE320  001  2  30
BSEE04A  5  ECE325  001  3  30
BSEE04A  5  ECE326  001  2  30
BSEE04A  5  ENL308  001  3  30
BSEE04A  5  MAT322  001  2  30
BSEE04A  5  SIE305  001  3  30
BSEE04B  5  ECE320  002  2  30
BSEE04B  5  ECE325  002  3  30
BSEE04B  5  ECE326  002  2  30
BSEE04B  5  ENL308  002  3  30
BSEE04B  5  MAT322  002  2  30
BSEE04B  5  SIE305  002  3  30
BSEE05A  3  ECE212  001  3  30
BSEE05A  3  MAT223  001  2  30
BSEE05A  3  PHY241  001  3  30
BSEE05A  3  ECE200  001  2  30
BSEE05A  3  ECE205  001  2  30
BSEE05A  3  ECE203  001  2  30
BSEE05B  3  MAT223  002  2  30
BSEE05B  3  PHY241  002  3  30
BSEE05B  3  ECE200  002  2  30
BSEE05B  3  ECE203  002  2  30
BSEE05B  3  ECE205  002  2  30
BSEE05B  3  ECE212  002  3  30
BSEE06A  1  ENL101  002  3  30
BSEE06A  1  MAT125  002  3  30
BSEE06A  1  CHE103  006  3  30
BSEE06A  1  CHE104  006  2  30
BSEE06A  1  ECE120  001  3  30
BSEE06A  1  ECE102  001  2  30
BSEE06B  1  CHE103  005  3  30
BSEE06B  1  CHE104  005  2  30
BSEE06B  1  ECE102  002  2  30
BSEE06B  1  ENL101  003  3  30
BSEE06B  1  MAT125  002  3  30
BSEE06B  1  ECE120  002  3  30


Twenty-six lecturers are assigned to teach the courses. The classroom group used for each "course + section" is identified, as shown in Table 4-2.

TABLE 4-2 Lecturer and classroom assignment

Course  Section  Lecturer  Room Group
ENL101  001  00001  ENLLECRM
ENL101  002  00001  ENLLECRM
ENL101  003  00001  ENLLECRM
ENL307  001  00003  ENLLECRM
ENL308  001  00003  ENLLECRM
ENL308  002  00003  ENLLECRM
PHY241  002  00006  PHYLECRM
PHY241  001  00007  PHYLECRM
CSC110  001  00014  CSCLECRM
CSC110  002  00014  CSCLECRM
CSC113  002  00014  CSCCOMLB
CSC115  002  00014  CSCLECRM
CSC120  002  00015  CSCLECRM
CSC127  001  00015  CSCLECRM
CSC127  002  00015  CSCLECRM
CSC210  001  00015  CSCLECRM
CSC113  001  00016  CSCCOMLB
CSC115  001  00016  CSCLECRM
CSC120  001  00016  CSCLECRM
CSC211  001  00016  CSCCOMLB
CSC221  001  00017  CSCLECRM
CSC221  002  00017  CSCLECRM
CSC210  002  00018  CSCLECRM
CSC211  002  00018  CSCCOMLB
CSC215  001  00018  CSCLECRM
CSC215  002  00018  CSCLECRM
CSC326  001  00019  CSCLECRM
CSC326  002  00019  CSCLECRM
CSC327  001  00019  CSCLECRM
CSC327  002  00019  CSCLECRM
CSC329  001  00020  CSCLECRM
CSC329  002  00020  CSCLECRM
CSC330  001  00020  CSCLECRM
CSC328  001  00021  CSCCOMLB
CSC328  002  00021  CSCCOMLB
CSC330  002  00021  CSCLECRM
ECE120  001  00031  ECELECRM
ECE120  002  00031  ECELECRM
ECE200  001  00031  ECEESTLB
ECE200  002  00031  ECEESTLB
ECE102  001  00032  ECELECRM
ECE102  002  00032  ECELECRM
ECE205  002  00032  ECELECRM
ECE212  001  00032  ECELECRM
ECE203  001  00033  ECELECRM
ECE203  002  00033  ECELECRM
ECE205  001  00033  ECELECRM
ECE212  002  00033  ECELECRM
ECE217  001  00034  ECELECRM
ECE217  002  00034  ECELECRM
ECE218  001  00034  ECEDCDLB
ECE218  002  00034  ECEDCDLB
ECE320  001  00035  ECELECRM
ECE320  002  00035  ECELECRM
ECE325  001  00035  ECELECRM
ECE325  002  00035  ECELECRM
ECE326  001  00036  ECEELCLB
ECE326  002  00036  ECEELCLB
SIE305  001  00046  SIELECRM
SIE305  002  00047  SIELECRM
MAT125  001  00059  MATLECRM
MAT125  002  00059  MATLECRM
MAT220  001  00061  MATLECRM
MAT220  002  00061  MATLECRM
MAT223  001  00061  MATLECRM
MAT223  002  00061  MATLECRM
MAT322  001  00063  MATLECRM
MAT322  002  00063  MATLECRM
CHE103  005  00071  CHELECRM
CHE103  006  00071  CHELECRM
CHE104  005  00072  CHEFTCLB
CHE104  006  00073  CHEFTCLB

Similarly, constraints on classroom size and on lecturers' available times are also prepared.

4.2 The Experiments and Discussions

4.2.1 Experimental Designs

The aims of the experiments are to evaluate the influence of the GA parameter settings and the influence of the grid computing environment.

The proposed GA presented in chapter 3 is applied to both the centralized course scheduling program and the decentralized course scheduling program, and the same GA parameter values are applied to both programs. Thus, to evaluate the efficiency of the GA, we only need to test one of these programs; here, we test the centralized course scheduling program. To evaluate the influence of the grid computing environment, we use the grid system shown in section 3.7.

We perform four separate experiments. The first experiment tests the influence of the weights for hard and soft constraints in the fitness function. The second and third experiments test the influence of the population size and of the mutation rate, respectively, on the speed of evolution. Finally, the fourth experiment tests the influence of using the grid computing environment.

Course scheduling is an NP-hard problem, and the GA itself is a meta-heuristic algorithm. Therefore, we aim to obtain a good enough solution, if not the best one. Each experiment runs its models until the GA finds the best solution or until the GA cannot improve the fitness value over 300 consecutive generations. The model that reaches a given fitness value faster over many runs is considered the better one.


4.2.2 Experiment 1: Hard and Soft Constraint Weight Test

The aim of this experiment is to analyze the behavior of the GA as the weights W1 and W2 in the fitness function f(x) = W1 f1(x) + W2 f2(x) are modified. More details about this function were presented in section 3.6.4.
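As a concrete reading of the weighted fitness function, the small sketch below combines a hard-constraint fitness f1(x) and a soft-constraint fitness f2(x); the class and method names are invented for illustration, and the real evaluation of f1 and f2 belongs to the C scheduling programs described in chapter 3.

```java
// Hypothetical illustration of f(x) = W1*f1(x) + W2*f2(x) from section 3.6.4.
public class Fitness {
    static final double W1 = 0.75; // hard constraint weight (the value chosen in Experiment 1)
    static final double W2 = 0.25; // soft constraint weight

    // f1 and f2 are assumed to return values in [0, 1]; their real definitions
    // are given in the thesis, not here.
    static double combinedFitness(double hardFitness, double softFitness) {
        return W1 * hardFitness + W2 * softFitness;
    }

    public static void main(String[] args) {
        // Example: 90% of hard constraints and 60% of soft constraints satisfied.
        System.out.println(combinedFitness(0.9, 0.6)); // prints 0.825
    }
}
```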

To perform this experiment, the centralized course scheduling program is run on one Pentium IV 1.7 GHz machine with the following GA settings:
- Population size : 10
- Crossover rate : 0.5
- Mutation rate : 0.02
- Selection method : Steady state
- Hard constraint weight W1 : Varied
- Soft constraint weight W2 : Varied

The experiment is performed for 3 different pairs of weights:
- W1 = 1.0 and W2 = 0.0
- W1 = 0.75 and W2 = 0.25
- W1 = 0.5 and W2 = 0.5

Each pair of weights is tested 5 times. Figure 4-1 presents the average fitness value f1(x) of the hard constraints after 500 generations.

FIGURE 4-1 The average fitness value of hard constraints vs various weights (fitness value f1(x) from 0 to 1 over 500 generations, for W1=1.0 & W2=0.0, W1=0.75 & W2=0.25, and W1=0.5 & W2=0.5)


This result shows that the GA rapidly obtains a high fitness value f1(x) when a large value of W1 is used. This is because solutions with a high fitness value for the hard constraints have a better chance of being selected for survival. When W1 is 1.0, the GA gives the fastest evolution of the hard constraints.

Now consider what happens to the fitness value f2(x) of the soft constraints. Figure 4-2 presents the average fitness values f2(x) after 500 generations. The result likewise shows that the GA rapidly obtains a high value of f2(x) when a large value of W2 is used. When W2 is 0.5, the GA gives the fastest evolution of the soft constraints.

FIGURE 4-2 The average fitness value of soft constraints vs various weights (fitness value f2(x) from 0 to 1 over 500 generations, for the same three weight pairs)

However, using a larger weight for the hard constraints means using a smaller weight for the soft constraints, so we have to balance the two. In our study, there are nine hard constraints and only one soft constraint. Therefore, the pair W1 = 0.75 and W2 = 0.25 seems the most suitable one for our GA.

4.2.3 Experiment 2: Population Size Test

The aim of this experiment is to analyze the behavior of the GA as the population size is modified.


To perform this experiment, the centralized course scheduling program is run on one Pentium IV 1.7 GHz machine with the following GA settings:
- Crossover rate : 0.5
- Mutation rate : 0.02
- Selection method : Steady state
- Hard constraint weight W1 : 0.75
- Soft constraint weight W2 : 0.25
- Population size : Varied

The experiment is performed for 3 different population sizes: 5, 10 and 15. Each population size is tested 5 times. The chart of the average execution time needed to reach a resultant solution as the population size is changed is given in Figure 4-3.

FIGURE 4-3 The average execution time for a resultant solution vs population sizes (execution time in seconds: population size 5 takes 2652.8 s, size 10 takes 2842.6 s, and size 15 takes 5829 s)

We know that a large population contains many different individuals, which creates a diversity of possible solutions. Using a large population size, the GA can obtain a resultant solution after a small number of generations. However, our experiment shows that, in terms of time, the GA with a small population size converges to a solution faster than the GA with a large population. To explain this result, recall the chromosome representation presented in section 3.6.1. Each chromosome directly represents a timetable, i.e. a complete solution, so it stores a large amount of data, together with a large amount of related data from the database. As a result, a larger population needs more memory and more processing time for the GA operations.

This experiment also shows that with the smallest population size (five) we have the fastest GA. GAs with a large population do not give a faster speed of evolution. However, in order to maintain diversity of solutions, it may be safe to keep the population size somewhat larger than the apparent optimum even though it is a little slower to execute. We will use a population of 10 for our GA.

4.2.4 Experiment 3: Mutation Rate Test

The aim of this experiment is to analyze the behavior of the GA as the mutation rate is modified.

To perform this experiment, the centralized course scheduling program is run on one Pentium IV 1.7 GHz machine with the following GA settings:
- Population size : 10
- Crossover rate : 0.5
- Selection method : Steady state
- Hard constraint weight W1 : 0.75
- Soft constraint weight W2 : 0.25
- Mutation rate : Varied

The experiment is performed for 4 different mutation rates: 0.00, 0.02, 0.20 and 0.40. Each rate is tested 5 times. The chart of the average fitness value f(x) after 500 generations versus the different mutation rates is given in Figure 4-4.

The best mutation rate is found to be 0.02. Mutation rates lower or higher than this give slower evolution, and the effect is clearly visible. If there is no mutation (0.00), offspring are generated by crossover without any further change, so the GA tends to fall into a local optimum. On the other hand, high mutation rates push the search toward random exploration of the search space; the GA then degenerates into a near-random search instead of searching among the offspring of good parents.


FIGURE 4-4 The GA with various mutation rates (average fitness value f(x) from 0 to 1 over 500 generations for mutation rates 0.00, 0.02, 0.20 and 0.40)

4.2.5 Experiment 4: Parallel Execution on the Grid Computing Environment

The aim of this experiment is to evaluate the influence of the grid computing environment on the resultant solutions.

The experiment tests three different models. The first model uses a single machine to perform the centralized course scheduling strategy introduced in section 3.4.1; the centralized course scheduling program is used for a centralized execution that schedules all courses. The second model also uses a single machine, but both the centralized course scheduling program and the decentralized course scheduling program are used for a serial execution: first the centralized course scheduling program schedules all shared resources, and then, one after another, the decentralized course scheduling program schedules the remaining resources of each faculty. Finally, the third model uses the grid computing environment for a parallel execution: first the centralized course scheduling program is executed on one machine, and then the decentralized course scheduling program is executed in parallel on the remote machines.

Both the centralized course scheduling program and the decentralized course scheduling program are set up with the following GA settings:


- Population size : 10
- Crossover rate : 0.5
- Mutation rate : 0.02
- Selection method : Steady state
- Hard constraint weight W1 : 0.75
- Soft constraint weight W2 : 0.25

The first and second models are performed on a Pentium IV 1.7 GHz machine. The third model is performed on a grid computing environment of 3 machines, as shown in Figure 3-23: the Central Manager Host m1 is a Pentium III 700 MHz machine, and the remote machines m2 and m3 are Pentium IV 1.7 GHz machines.

Figure 4-5 presents a chart of the average execution time of each model over 5 runs. Each model is executed until the GA finds a resultant solution.

FIGURE 4-5 The execution time versus various models (average execution time in seconds: centralized execution 2842.6 s, serial execution 852.6 s, parallel execution on the grid 439.6 s)

The first model is slower than the second model. The first model has a global view of the whole data, so one might expect it to produce a resultant solution within a short time; however, it gave an unexpected result. This is because, when all of the data are centralized to be processed on a single machine, the size of the problem becomes too big. The GA slows down considerably when it works on large chromosomes with a large number of conflicting hard and soft constraints. If, instead, the data are separated and processed one after another by the centralized course scheduling program and the decentralized course scheduling program, the overall execution time is shorter.

The parallel execution of the third model is significantly faster than the serial execution of the second model, which is easy to understand: instead of performing the course scheduling jobs one after another, some of them are performed in parallel by different processors, as illustrated in Figure 4-6.

FIGURE 4-6 Parallel execution versus serial execution (in the parallel execution the centralized course scheduling program runs first, and the decentralized course scheduling programs for the Faculty of Engineering, the Faculty of Education, and the Faculty of Science then run at the same time on different processors; in the serial execution the same programs run one after another on a single processor)

The total execution time for a complete resultant solution of the third model can be expressed as follows:

Total parallel execution time = Time for the centralized course scheduling + Max(Time for the decentralized course scheduling on the remote machines)
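A tiny numeric sketch of this formula is given below; the per-job times in it are invented for the example and are not the measured results reported above.

```java
import java.util.Collections;
import java.util.List;

// Illustration of: total parallel time = centralized time + max(decentralized times).
public class ParallelTimeSketch {
    public static void main(String[] args) {
        double centralizedSeconds = 120.0;                                   // invented value
        List<Double> decentralizedSeconds = List.of(200.0, 150.0, 180.0);    // one job per faculty

        double total = centralizedSeconds + Collections.max(decentralizedSeconds);
        System.out.println("Total parallel execution time: " + total + " s"); // 320.0 s
    }
}
```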

The data used by the course scheduling programs are transferred from the central database to the remote machines once, before they are processed, and there is no exchange of data while the programs are being executed. The time for network communication is much smaller than the execution time of each program, so this time is not considered in this experiment.

4.3 The Sample Results

This section presents the results obtained by running the third model presented in the previous section.

First of all, the centralized course scheduling program is executed on machine m2. It schedules the shared resources, which consist of courses whose lecturers are invited from other faculties and courses whose students come from other faculties. The results are presented in Table 4-3. Then the decentralized course scheduling program is submitted to be executed in parallel on machines m2 and m3. It schedules the remaining resources of each faculty. All courses taught by the Faculty of Education have already been scheduled by the centralized course scheduling program, so the decentralized course scheduling program only schedules the courses taught by the Faculty of Engineering and the Faculty of Science. The results are presented in Table 4-4 and Table 4-5.

TABLE 4-3 Timetable created by the centralized course scheduling program

Course  Section  Classroom  Day  Time-slot  Class    Lecturer
ENL307  001  B201A01  3  4->6  BSCS04A  00003
ENL307  001  B201A01  3  4->6  BSCS04B  00003
ECE218  001  B301B02  4  2->3  BSCS05A  00034
ECE217  001  B301A07  2  4->5  BSCS05A  00034
ECE218  002  B301B02  1  6->7  BSCS05B  00034
ECE217  002  B301A06  1  2->3  BSCS05B  00034
ENL101  001  B201A01  2  4->6  BSCS06A  00001
ENL101  001  B201A01  2  4->6  BSCS06B  00001
MAT322  001  B101A09  0  6->7  BSEE04A  00063
ENL308  001  B201A03  4  0->2  BSEE04A  00003
ENL308  002  B201A03  4  4->6  BSEE04B  00003
MAT322  002  B101A10  4  2->3  BSEE04B  00063
MAT223  001  B101A12  1  4->5  BSEE05A  00061
PHY241  001  B102A04  2  0->2  BSEE05A  00007
MAT223  002  B101A08  0  2->3  BSEE05B  00061
PHY241  002  B102A06  3  4->6  BSEE05B  00006
CHE104  006  B103A15  0  2->3  BSEE06A  00073
ENL101  002  B201A02  3  0->2  BSEE06A  00001
MAT125  002  B101A01  2  0->2  BSEE06A  00059
CHE103  006  B103A06  0  4->6  BSEE06A  00071
MAT125  002  B101A01  2  0->2  BSEE06B  00059
CHE103  005  B103A01  4  0->2  BSEE06B  00071
CHE104  005  B103A11  4  6->7  BSEE06B  00072
ENL101  003  B201A01  1  0->2  BSEE06B  00001

TABLE 4-4 Timetable created by the decentralized course scheduling program for Faculty of Engineering

Course  Section  Classroom  Day  Time-slot  Class    Lecturer
ECE325  001  B301A04  3  0->2  BSEE04A  00035
ECE326  001  B301B01  2  4->5  BSEE04A  00036
SIE305  001  B302A03  4  4->6  BSEE04A  00046
ECE320  001  B301A01  1  2->3  BSEE04A  00035
ECE320  002  B301A10  0  2->3  BSEE04B  00035
ECE325  002  B301A10  2  0->2  BSEE04B  00035
SIE305  002  B302A02  1  4->6  BSEE04B  00047
ECE326  002  B301B01  4  0->1  BSEE04B  00036
ECE212  001  B301A01  3  4->6  BSEE05A  00032
ECE203  001  B301A02  4  0->1  BSEE05A  00033
ECE200  001  B301B05  4  4->5  BSEE05A  00031
ECE205  001  B301A01  1  0->1  BSEE05A  00033
ECE205  002  B301A01  4  4->5  BSEE05B  00032
ECE212  002  B301A09  2  0->2  BSEE05B  00033
ECE200  002  B301B05  4  6->7  BSEE05B  00031
ECE203  002  B301A08  0  4->5  BSEE05B  00033
ECE102  001  B301A07  4  6->7  BSEE06A  00032
ECE120  001  B301A08  2  4->6  BSEE06A  00031
ECE120  002  B301A08  3  0->2  BSEE06B  00031
ECE102  002  B301A01  0  0->1  BSEE06B  00032


TABLE 4-5 Timetable created by the decentralized course scheduling program for Faculty of Science

Course  Section  Classroom  Day  Time-slot  Class    Lecturer
CSC328  001  B104B18  2  2->3  BSCS04A  00021
CSC326  001  B104B05  1  0->2  BSCS04A  00019
CSC329  001  B104B11  0  0->2  BSCS04A  00020
CSC327  001  B104B05  4  6->7  BSCS04A  00019
CSC330  001  B104B02  4  2->3  BSCS04A  00020
CSC328  002  B104B16  1  2->3  BSCS04B  00021
CSC329  002  B104B09  0  4->6  BSCS04B  00020
CSC326  002  B104B10  2  4->6  BSCS04B  00019
CSC330  002  B104B03  4  6->7  BSCS04B  00021
CSC327  002  B104B01  2  2->3  BSCS04B  00019
CSC210  002  B104B06  3  4->6  BSCS05A  00018
CSC215  002  B104B04  4  6->7  BSCS05A  00018
CSC221  002  B104B03  1  4->6  BSCS05A  00017
MAT220  001  B101A02  0  4->6  BSCS05A  00061
CSC211  002  B104B17  2  0->3  BSCS05A  00018
MAT220  002  B101A11  2  4->6  BSCS05B  00061
CSC221  001  B104B09  2  0->2  BSCS05B  00017
CSC211  001  B104B15  3  4->7  BSCS05B  00016
CSC210  001  B104B08  0  0->2  BSCS05B  00015
CSC215  001  B104B04  3  2->3  BSCS05B  00018
MAT125  001  B101A03  4  4->6  BSCS06A  00059
CSC120  002  B104B07  3  4->6  BSCS06A  00015
CSC115  002  B104B12  1  0->1  BSCS06A  00014
CSC110  002  B104B06  3  0->1  BSCS06A  00014
CSC127  002  B104B01  4  2->3  BSCS06A  00015
CSC113  002  B104B14  4  0->1  BSCS06A  00014
CSC120  001  B104B12  4  0->2  BSCS06B  00016
CSC110  001  B104B08  3  6->7  BSCS06B  00014
MAT125  001  B101A03  4  4->6  BSCS06B  00059
CSC127  001  B104B04  1  4->5  BSCS06B  00015
CSC113  001  B104B14  3  2->3  BSCS06B  00016
CSC115  001  B104B08  2  0->1  BSCS06B  00016


These results show that all constraints presented in section 1.3 have been satisfied. Every "course + section" is scheduled exactly once in a week. No course is scheduled across the morning and afternoon working sessions. Neither a class nor a lecturer nor a classroom is assigned to more than one course at the same time. For example, as shown in Table 4-3, section 001 of course ENL308 is scheduled for lecturer 00003 in classroom B201A03 on day 4 (Friday) in time-slots 0, 1, and 2; therefore, this lecturer and this classroom are not booked for other courses at that time.

When a class of students studies a list of courses, these courses have to be scheduled in different time periods. For example, as shown in Table 4-1, class BSCS05B studies section 001 of courses CSC215, CSC221, CSC210, and CSC211, and section 002 of courses ECE217, ECE218, and MAT220; these "course + section" entries are therefore scheduled in different time periods. Another example is shown in Table 4-3: section 001 of course ENL307 is attended by both classes BSCS04A and BSCS04B, so this course section is scheduled in the same time period and the same classroom for both classes, allowing each class to attend it alongside its other courses.

The other constraints presented in section 1.3 have also been satisfied, but they are not discussed here.

The decentralized course scheduling program must give results that do not conflict with the centralized course scheduling output. If a class has already been scheduled by the centralized course scheduling program, then the decentralized course scheduling program has to schedule the remaining courses of that class at other times. For example, as shown in Table 4-3, the centralized course scheduling program scheduled some of the courses attended by class BSEE06A; the decentralized course scheduling program therefore scheduled the other courses studied by this class at other times, as shown in Table 4-4.
to another time, shown in Table 4-4.


CHAPTER 5<br />

CONCLUSION<br />

5.1 Conclusions<br />

This study proposed a hybrid centralized and de-centralized approach, a <strong>genetic</strong><br />

<strong>algorithm</strong>, and a grid computing environment <strong>for</strong> course scheduling in <strong>multi</strong>ple<br />

faculty universities.<br />

The proposed GA demonstrated its ability <strong>for</strong> solving a complex optimization<br />

problem, the highly constrained course scheduling problem. The direct representation<br />

of chromosomes is convenient <strong>for</strong> representing a large number of constraints of a<br />

realistic timetable. Additional constraints can easily be added into the model without<br />

much modification on the basic model.<br />

The speed of evolution of the GA is significantly different dependent on GA<br />

parameters used. The GAs with large populations do not give a faster speed of<br />

convergence. However, in order to have diversity of solutions, it may be safe to keep<br />

the population size larger than an optimum size although it is a little slower. The<br />

experiments also show that the use of mutation is very important <strong>for</strong> the GA. A small<br />

enough rate is effective. No mutation or mutation with high rates gives a slower<br />

evolution. The weighting <strong>for</strong> hard and soft constraints in the fitness function should be<br />

based on the number and importance of them. The hard constraints should be<br />

weighted larger than the soft constraints.<br />

The hybrid centralized and de-centralized approach was used. The centralized<br />

course scheduling program only schedules <strong>for</strong> shared resources whereas the<br />

decentralized course scheduling program schedules <strong>for</strong> remaining resources of each<br />

faculty. The results showed that this approach gave the expected solutions without<br />

any constraint conflicts between resources around the university. The resultant<br />

solution can help lecturers not only teach at their faculty but also at other faculties. A<br />

course can be attended by many different classes.<br />

The grid computing environment is used as the infrastructure for distributed and parallel computing, in combination with the hybrid centralized and decentralized approach. The centralized course scheduling program and the decentralized course scheduling program are treated as jobs that are scheduled for execution: the centralized course scheduling job is performed first, and then the decentralized course scheduling jobs are performed in parallel on separate machines. The decentralized course scheduling program must give results that do not conflict with the centralized course scheduling output.

The use of the grid computing environment gave a high level of efficiency. It significantly reduces the overall execution time needed for a resultant solution, because a very large problem with many conflicting constraints is split into smaller problems that are processed in parallel by many different machines instead of by only one.

5.2 Future Works

Overall, our preliminary experiments suggest that the proposed model satisfies the objectives in our proposal. We have worked on two interesting areas, the genetic algorithm and grid computing; both are wide areas, so what has been obtained is a foundation for further research.

Our experiments identified GA parameters for an effective GA. Further experiments should be done with different data sets and more soft constraints. We also need to design algorithms that can automatically identify suitable values for the GA parameters.

Local search techniques should be used to improve the speed of the GA. The local search algorithms should also help the GA to create solutions that minimize the use of university resources, e.g. the number of classrooms used and the spread of lecturer time.

To satisfy both hard and soft constraints in a balanced way, multi-objective genetic algorithms should be researched.

The grid computing environment was implemented on Linux machines. For more flexible use, it should be extended to heterogeneous environments with more machines added.
machines added.


REFERENCES

1. Alkan, A. and Ozcan, E. "Memetic Algorithms for Timetabling." IEEE Congress on Evolutionary Computation. 3 (2003, December 8-12) : 1796-1802.
2. Marc Buf, Tim Fischer, et al. "Automated solution of a highly constrained school timetabling problem - preliminary results." Applications of Evolutionary Computing : EvoWorkshops 2001: EvoCOP, EvoFlight, EvoIASP, EvoLearn, and EvoSTIM, Como, Italy. (2001, April 18-20) : 431-440.
3. Goulas, G. and Housos, E. "SchedSP: Providing GRID-enabled Real-World Scheduling Solutions as Application Services." EuroWeb 2002 Conference, St Anne's College, Oxford, UK. (2002, December 17-18).
4. Kaplansky, E., Kendall, G., et al. "Distributed Examination Timetabling." PATAT '04 Proceedings of the 5th International Conference on the Practice and Theory of Automated Timetabling, Pittsburgh, PA, USA. (2004, August 18-20) : 511-516.
5. Lim, A., Ang, J. C., et al. "UTTSExam: A Campus-Wide University Exam-Timetabling System." Proceedings of the Eighteenth National Conference on Artificial Intelligence and Fourteenth Conference on Innovative Applications of Artificial Intelligence, Edmonton, Alberta, Canada. (2002, July 28 - August 1) : 838-844.
6. Genetic Algorithm [Online]. Available from: http://cs.felk.cvut.cz/~xobitko/ga/gaintro.html [2005, May 2].
7. Luis Ferreira, et al. Introduction to Grid Computing with Globus. IBM Redbooks, September 2003.
8. Bart Jacob, et al. Enabling Applications for Grid Computing with Globus. IBM Redbooks, June 2003.
9. Carter, M. W. and Laporte, G. "Recent Developments in Practical Course Timetabling." In Edmund Burke and Michael Carter, editors, Practice and Theory of Automated Timetabling II, Springer-Verlag LNCS. 1408 (1998) : 3-19.
10. Carter, M. W. "A Survey of Practical Applications of Examination Timetabling Algorithms." Operations Research. 34 (1986) : 193-202.
11. Burke, E. K., Elliman, D. G., et al. "University Timetabling System Based on Graph Colouring and Constraint Manipulation." Journal of Research on Computing in Education. 27(1) (1993) : 1-18.
12. Burke, E. K., Dror, M., et al. "Hybrid Graph Heuristics within a Hyper-heuristic Approach to Exam Timetabling Problems." The Next Wave in Computing, Optimization, and Decision Technologies. (2005) : 79-91.
13. Redl, T. A. "A Study of University Timetabling that Blends Graph Coloring with the Satisfaction of Various Essential and Preferential Conditions." PhD Thesis, Rice University, Houston, Texas, 2004.
14. Balakrishnan, N., Lucena, A. and Wong, R. T. "Scheduling Examinations to Reduce Second-Order Conflicts." Computers & Operations Research. 19 (1992) : 353-361.
15. Arani, T. and Lotfi, V. "A Three Phased Approach to Final Exam Scheduling." IIE Trans. 21 (1989) : 86-96.
16. Sally C. Brailsford, Chris N. Potts, et al. "Constraint Satisfaction Problems: Algorithms and Applications." European Journal of Operational Research. 119 (1999) : 557-581.
17. White, G. M. "Constrained Satisfaction, Not So Constrained Satisfaction and the Timetabling Problem." PATAT '00 Proceedings of the 3rd International Conference on the Practice and Theory of Automated Timetabling, Konstanz, Germany. 1 (2000, August 16-18) : 32-47.
18. Valouxis, C. and Housos, E. "Constraint Programming Approach for School Timetabling." Computers & Operations Research. 30(1) (2003, September) : 1555-1572.
19. Gueret, C., Jussien, N., et al. "Building University Timetables using Constraint Logic Programming." Proceedings of the First International Conference on the Practice and Theory of Automated Timetabling (ICPTAT '95), France. (1995) : 393-408.
20. Burke, E. K. and Newall, J. P. "A Multi-Stage Evolutionary Algorithm for the Timetable Problem." The IEEE Transactions on Evolutionary Computation. 3(1) (1999, April) : 63-74.
21. Paechter, B., Rankin, R. C. and Cumming, A. "Improving a Lecture Timetabling System for University-Wide Use." In: Burke, E., Carter, M. (eds.): The Practice and Theory of Automated Timetabling II: Selected Papers (PATAT '97, University of Toronto), Lecture Notes in Computer Science, Springer-Verlag, Berlin Heidelberg New York. 1408 (1998) : 156-165.
22. Ross, P., Hart, E. and Corne, D. "Some Observations about GA-based Timetabling." In: Burke, E., Carter, M. (eds.): The Practice and Theory of Automated Timetabling II: Selected Papers (PATAT '97, University of Toronto), Lecture Notes in Computer Science, Springer-Verlag, Berlin Heidelberg New York. 1408 (1998) : 115-129.
23. Elmohamed, S., Coddington, P. and Fox, F. A. "A Comparison of Annealing Techniques for Academic Course Scheduling." In: Burke, E., Carter, M. (eds.): The Practice and Theory of Automated Timetabling II: Selected Papers (PATAT '97, University of Toronto), Lecture Notes in Computer Science, Springer-Verlag, Berlin Heidelberg New York. 1408 (1998) : 92-112.
24. White, G. M. and Zhang, J. "Generating Complete University Timetables by Combining Tabu Search with Constraint Logic." In: Burke, E., Carter, M. (eds.): The Practice and Theory of Automated Timetabling II: Selected Papers (PATAT '97, University of Toronto), Lecture Notes in Computer Science, Springer-Verlag, Berlin Heidelberg New York. 1408 (1998) : 187-210.
25. Dowsland, K. A. "Off the Peg or Made to Measure." In: Burke, E., Carter, M. (eds.): The Practice and Theory of Automated Timetabling II: Selected Papers (PATAT '97, University of Toronto), Lecture Notes in Computer Science, Springer-Verlag, Berlin Heidelberg New York. 1408 (1998) : 37-52.
26. Elmohamed, S., et al. "A Comparison of Annealing Techniques for Academic Course Scheduling." Lecture Notes in Computer Science. 1408 (1998) : 92-114.
27. Abramson, D. "Constructing School Timetables using Simulated Annealing: Sequential and Parallel Algorithms." Management Science. 37(1) (1991, January) : 98-113.
28. Aydin, M. E. "A Distributed Evolutionary Simulated Annealing Algorithm for Combinatorial Optimisation Problems." Journal of Heuristics. 10 (2004) : 269-292.
29. Calaor, A. E., Hermosilla, A. Y., et al. "Parallel Hybrid Adventures with Simulated Annealing and Genetic Algorithms." Proceedings of the International Symposium on Parallel Architectures, Algorithms and Networks (ISPAN'02). (2002, May 22-24) : 33-38.
30. Alvarez-Valdes, R. "A Tabu Search Algorithm to Schedule University Examinations." QUESTIIO. 21 (1997) : 201-215.
31. Burke, E. K., Kendall, G. and Soubeiga, E. "Tabu-Search Hyperheuristic for Timetabling and Rostering." Journal of Heuristics. 9 (2003) : 451-470.
32. Tabu Search [Online]. Available from: http://www.cs.sandia.gov/opt/survey/ts.html [2005, September 12].
33. Wang, Y. Z. "A GA-based methodology to determine an optimal curriculum for schools." Expert Systems with Applications. 28 (2005) : 163-174.
34. Tuan, D. A. and Kim, H. L. "Combining Constraint Programming and Simulated Annealing on University Exam Timetabling." International Conference RIVF'04, Hanoi, Vietnam. (2004, February 2-5) : 205-210.
35. Kaplansky, E. and Meisels, A. "Negotiation among Scheduling Agents for Distributed Timetabling." In Submitted to the 5th International Conference on the Practice and Theory of Automated Timetabling PATAT'04, Pittsburgh, PA, USA. (2004, August) : 84-105.
36. Marczyk, A. Genetic Algorithms and Evolutionary Computation [Online]. Available from: http://www.talkorigins.org/faqs/genalg/genalg.html [2005, September 18].
37. Esposito, A. and Tarricone, L. "Grid Computing for Electromagnetics: A Beginner's Guide with Applications." IEEE Antennas and Propagation Magazine. 45(2) (2003, April) : 91-100.
38. Globus Toolkit [Online]. Available from: http://www.globus.org [2005, September 20].
39. Foster, I., Kesselman, C. and Tuecke, S. "The Anatomy of the Grid: Enabling Scalable Virtual Organizations." International Journal of High Performance Computing Applications. 15(3) (2001) : 200-222.
40. Hamscher, V., Schwiegelshohn, U., et al. "Evaluation of Job-Scheduling Strategies for Grid Computing." In Proceedings of the 7th International Conference on High Performance Computing HiPC-2000, Springer, Berlin, Lecture Notes in Computer Science LNCS 1971, Bangalore, India. (2000, December) : 192-202.


APPENDIX A

DATA DICTIONARY

This section presents the structure of the tables in the database created for the entity relationship diagram shown in Figure 3-5.

A.1 Faculty

TABLE A-1 Faculty

Table: Faculty
Field        Type       Key      Description
FacultyID    char(2)    Primary  ID of faculty
FacultyName  char(100)           Name of faculty

The university has several faculties, e.g. the Faculty of Science, the Faculty of Engineering, and the Faculty of Education.

A.2 Department

TABLE A-2 Department

Table: Department
Field       Type        Key       Description
DeptID      char(3)     Primary   ID of department
DeptName    char(255)             Name of department
FacultyID   char(2)     Foreign   ID of faculty

Each faculty has several departments, each of which includes a set of lecturers and courses within the same scientific domain, e.g. Department of Computer Science, Department of Mathematics, and Department of Physics.


A.3 Lecturer

TABLE A-3 Lecturer

Table: Lecturer
Field          Type       Key       Description
LecturerID     char(5)    Primary   ID of lecturer
LecturerName   char(40)             Name of lecturer
Gender         char(1)              Gender of lecturer
DeptID         char(3)    Foreign   ID of department

Lecturers are responsible for teaching several courses. Each lecturer is a member of a department.

A.4 Busy Time

TABLE A-4 Busy Time

Table: BusyTime
Field            Type      Key       Description
LecturerID       char(5)   Primary   ID of lecturer
Day              int(2)              Day in a week
Workingsession   int(2)              Working session in a day
State            int(1)              State of lecturer

Not every working session of a day in each week is available to be scheduled for a lecturer. For instance, Mr. Tim cannot teach on Monday mornings because of a weekly meeting. Other lecturers simply dislike teaching in certain working sessions; for instance, Miss Mary dislikes teaching on Friday mornings. Based on the data stored in the BusyTime table, the system tries to satisfy lecturers' preferences. The State field takes one of three values: 2 indicates an available working session, 1 indicates that the lecturer dislikes teaching at this time (soft constraint), and 0 indicates that the lecturer cannot teach at this time (hard constraint).
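As a concrete illustration of these state values, the two situations above could be recorded as rows in the BusyTime table. The snippet below is only a sketch: the database name scheduling, the lecturer IDs, and the day/session numbering convention are assumptions, not values defined by this study.

# Hypothetical rows: lecturer T0001 cannot teach session 1 on day 1 (hard constraint, State = 0);
# lecturer M0002 dislikes session 1 on day 5 (soft constraint, State = 1).
mysql -u root -p scheduling -e "
  INSERT INTO BusyTime (LecturerID, Day, Workingsession, State) VALUES ('T0001', 1, 1, 0);
  INSERT INTO BusyTime (LecturerID, Day, Workingsession, State) VALUES ('M0002', 5, 1, 1);
  SELECT * FROM BusyTime WHERE State < 2;"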


A.5 Building

TABLE A-5 Building

Table: Building
Field          Type        Key       Description
BuildingID     char(2)     Primary   ID of building
BuildingName   char(100)             Name of building

The university has several buildings, each containing a number of classrooms.

A.6 Classroom

TABLE A-6 Classroom

Table: Classroom
Field              Type       Key       Description
ClassroomID        char(7)    Primary   ID of classroom
ClassroomName      char(10)             Name of classroom
Seats              int(3)               Number of seats
BuildingID         char(2)    Foreign   ID of building
ClassroomGroupID   char(8)    Foreign   ID of classroom group

A classroom in a building belongs to a certain classroom group.

A.7 Classroom Group

TABLE A-7 Classroom group

Table: ClassroomGroup
Field                Type        Key       Description
ClassroomGroupID     char(8)     Primary   ID of classroom group
ClassroomGroupName   char(100)             Name of classroom group

Classrooms are organized into groups, and a course is scheduled only into a classroom of its designated groups. For instance, course ECE218 (Digital Circuit Design Lab) is only expected to be scheduled into group ECEDCDLB (Digital Circuit Design Labs).

A.8 Department Controls Rooms

TABLE A-8 Department controls classroom

Table: DeptControlRoom
Field              Type      Key       Description
DeptID             char(3)   Primary   ID of department
ClassroomGroupID   char(8)   Primary   ID of classroom group

A department owns a number of classroom groups that are used for its courses.

A.9 Course

TABLE A-9 Course

Table: Course
Field        Type       Key       Description
CourseID     char(6)    Primary   ID of course
CourseName   char(80)             Name of course
Credits      int(2)               Number of credits
Kind         char(1)              Kind: lecture or practice
DeptID       char(3)    Foreign   ID of department

A course belongs to a department.


A.10 Program

TABLE A-10 Program

Table: Program
Field          Type        Key       Description
ProgramID      char(4)     Primary   ID of program
ProgramName    char(255)             Name of program
NumSemesters   int(2)                Number of semesters
DeptID         char(3)     Foreign   ID of department

The university offers a number of programs. After completing a program, which consists of a number of courses, a student is awarded a degree, e.g. Bachelor of Science in Computer Science. A program belongs to a department.

A.11 Curriculum

TABLE A-11 Curriculum

Table: Curriculum
Field       Type      Key       Description
ProgramID   char(4)   Primary   ID of program
CourseID    char(6)   Primary   ID of course
Semester    int(2)              Semester in which this course is taken
Year        int(4)              Enrolment year of the students to whom this curriculum applies

To earn a degree, a student has to complete a list of courses in each semester. For instance, in the first semester, students of the Bachelor of Science in Computer Science take the courses ENL101, CSC110, CSC113, MAT125, CSC115, CSC120, and CSC127. A curriculum is applied to students based on their enrolment year.


A.12 Class

TABLE A-12 Class

Table: Class
Field         Type        Key       Description
ClassID       char(7)     Primary   ID of class
ClassName     char(100)             Name of class
NumStudents   int(3)                Number of students
EnrolYear     int(4)                Enrolment year
ProgramID     char(4)     Foreign   ID of program

Students who study the same program and have the same enrolment year are grouped into classes.

A.13 Course Section

TABLE A-13 Course section

Table: CourseSection
Field         Type      Key       Description
ClassID       char(7)   Primary   ID of class
Semester      int(2)    Primary   Current semester
Year          int(4)    Primary   Current year
CourseID      char(6)   Primary   ID of course
SectionNo     char(3)             Section number
LecturerID    char(5)             ID of lecturer
NumStudents   char(4)             Number of students

A section is an instance of a course taught by a lecturer. The combination "a section of a course + a lecturer + an estimated number of attending students" is the unit that is scheduled into the time-slots of a certain classroom.


A.14 Timetable

TABLE A-14 Timetable

Table: Timetable
Field             Type      Key       Description
RoomID            char(7)   Primary   ID of room
Day               int(2)    Primary   Day in a week
Hour              int(2)    Primary   Hour in a day
CourseSectionID   char(9)             CourseID + SectionID

Although this table looks simple, it stores the results of the whole course scheduling system. A section of a course is scheduled into successive time-slots.
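Because each occupied time-slot is stored as a separate row, the schedule of a single section can be read back by filtering on CourseSectionID. The query below is only a sketch; the database name scheduling and the sample section ID CSC110001 (CourseID CSC110 plus section 001) are assumptions.

mysql -u root -p scheduling -e "
  SELECT RoomID, Day, Hour
  FROM Timetable
  WHERE CourseSectionID = 'CSC110001'
  ORDER BY Day, Hour;"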


APPENDIX B

INSTALLING GRID ENVIRONMENT


This section presents, in detail, the steps for installing and setting up the grid environment, which includes Red Hat Linux, the Network Time Protocol, Globus, and a Certificate Authority.

The following topics are discussed:
- Required software
- Hardware environment
- Operating system installation
- Globus installation and setup
- CA installation and setup

B.1 Required Software

Globus Toolkit 2.2 is used in this study. Globus Toolkit 2.x supports Red Hat Linux on xSeries and AIX on pSeries. We select Red Hat Linux 9.0 as our host operating system.

The following files need to be downloaded:
- Globus Packaging Technology: gpt-2.2.2-src.tar.gz
- Globus client: globus-all-client-2.2.3-i686-pc-linux-gnu-bin.tar.gz
- Server bundle: globus-all-server-2.2.3-i686-pc-linux-gnu-bin.tar.gz
- Certificate Authority: globus_simple_ca_bundle-0.9.tar.gz
- Network Time Protocol (NTP): ntp-4.1.1-1.i386.rpm

Place these files in the directory /usr/src. The Globus files can be downloaded from ftp://ftp.globus.org/pub/gt2/2.2/.

The NTP package is already installed in Red Hat Linux 9.0, so we do not need to download and install it. However, for other versions of Linux, we have to set up NTP on the hosts.

B.2 Setting Up the Environment

An Ethernet LAN and three Intel Pentium machines were used to build the grid environment. Figure 3-23 presents this environment with the host names and the functions installed on each machine.

The host names are m1, m2, and m3. Each machine should have a clock speed of at least 500 MHz, at least 128 MB of memory, and a hard drive of at least 8 GB.

There are dependencies among the installation and setup steps, so they must be performed in the order given below.

The major steps to set up the grid environment are to install:
- Red Hat Linux 9.0 on each machine
- the Network Time Protocol server on one machine (here we use m2), configuring NTP clients on the others (m1 and m3)
- Globus Packaging Technology on each machine
- the Globus Server on the m2 and m3 machines
- the Globus Client on m1
- the Globus Simple Certificate Authority on m2

The grid is then configured using the following major steps:
- Sign the certificate requests from all components and users needing them
- Set up gridmap files for each system
- Set up automated grid startup
- Set up each GRIS to talk to one GIIS
- Set up MDS security

B.2.1 Naming and Addressing Planning

Table B-1 shows the host names, IP addresses, and software to be installed on the machines.

TABLE B-1 Host names, IP addressing, and software
Host name         IP               Software
m1.kmitnb.ac.th   192.168.10.241   Globus client, centralized scheduling program, MySQL 4.0
m2.kmitnb.ac.th   192.168.10.242   Globus server, CA, and NTP server
m3.kmitnb.ac.th   192.168.10.243   Globus server

We also define the user IDs, groups, and passwords before implementation, as shown in Table B-2.

The root and globususer IDs are used on all machines. Some machines have no password for the snobol and adminca IDs because the corresponding user ID is not installed on those machines.


TABLE B-2 Group, user ID and password
User ID      Group ID   m1 password   m2 password   m3 password
root         root       pwrtm1        pwrm2         pwrm3
globususer   globus     pwgbm1        pwgm2         pwgm3
snobol       snobol     pwsbm1
adminca      adminca                  pwamm2

The globususer ID is used to run jobs on the grid on behalf of the user. Since this user ID has more than eight characters, we will need to create it later, rather than creating it as part of the Linux install process. The other user IDs can be created as part of the Linux installation or later.

The snobol ID is used to submit jobs to the grid.

The adminca ID is used to receive certificate requests for the Certificate Authority. The adminca ID can be used to ftp the certificate requests to machine m2 in our installation. The certificates are then signed using the root ID on machine m2.

Before installing the Globus Simple Certificate Authority, we must define the distinguished name (DN) that will be used by the CA in our environment. Table B-3 describes the distinguished name used for the Certificate Authority in our environment. The distinguished names for the users and for the Globus services will be generated automatically.

TABLE B-3 Distinguished name and passphrase
Certificate Authority DN   cn=my test CA, ou=m2.kmitnb.ac.th, ou=demotest, o=grid
Passphrase                 mycapw

The distinguished name (DN) and the passphrase will be used by the Certificate Authority to sign certificate requests.

B.2.2 Installing Linux

Install Linux on all of the machines using the "server" install, selecting all packages and "no firewall". Each system should use a fixed network IP address with the corresponding host name given in Table B-1; do not use DHCP.

After installing Linux on each system, we create the user IDs listed in Table B-2. The following is an example of how to add the globususer ID on machine m1.

Add a group for globus by executing:
groupadd -g 900 globus

Add the user globususer (with password globususer) by executing:
adduser -u 900 -g globus -d /home/globususer -n globususer

Change the globususer ID's password from globususer to pwgbm1 or another password by executing:
passwd globususer

B.2.3 Installing the Network Time Protocol (NTP)

NTP needs to be installed because the grid needs the clocks on the systems to be synchronized. The security process creates proxy certificates that are valid for specific times. If the systems do not have their clocks synchronized, users may not be able to use the grid, because the proxy certificates may appear to have expired or to be not yet valid.

On all of the grid machines, install NTP as follows using the root ID:
$ rpm -ivh /usr/src/ntp-4.1.1-1.i386.rpm

If the package is already installed as part of the Linux distribution, ignore the error message and continue to set up the NTP server. Proceed by setting up the server and daemons.

Edit the file /etc/ntp.conf on the machine designated to be the time server, machine m2, and leave the following four lines as the only un-commented ones, commenting out all others with a leading "#" character:
server 127.127.1.0     # local clock
fudge 127.127.1.0 stratum 10
driftfile /etc/ntp/drift
broadcastdelay 0.008

Also, on the NTP server machine (m2), use the ntsysv command to enable the NTP daemon (ntpd) on the next reboot. We can also start the Red Hat Service Configuration tool by clicking on Main Menu > System Settings > Server Settings > Services. Scroll down the list of services on the left side until we reach the ntpd service, click on it, and click Start to run it.

On the other machines in the grid (m1 and m3), change the file /etc/ntp.conf to leave only the following lines un-commented:
server m2.kmitnb.ac.th
driftfile /etc/ntp/drift
broadcastdelay 0.008
authenticate no

Next, execute the following command to have them obtain the time from the server machine m2:
ntpdate -b m2.kmitnb.ac.th

This should be executed at least once per boot, and could be set up to run periodically using crond and crontab, as sketched below.
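One possible arrangement (an assumption, not part of the original procedure) is to let the root crontab on m1 and m3 resynchronize the clock once an hour:

# edit the root crontab on m1 and m3
crontab -e
# then add a line such as the following to resynchronize against m2 at the top of every hour
0 * * * * /usr/sbin/ntpdate -b m2.kmitnb.ac.th > /dev/null 2>&1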

B.2.4 Setting Up Host Files and Environment Variables on Each Machine

As root, use an editor to edit the hosts file /etc/hosts on each machine so that it contains the following lines:
127.0.0.1        localhost
192.168.10.241   m1.kmitnb.ac.th   m1
192.168.10.242   m2.kmitnb.ac.th   m2
192.168.10.243   m3.kmitnb.ac.th   m3

Verify machine connectivity after the next reboot, using the ping command to ping each of the other machines by name.

Edit the file /etc/profile on each machine. Insert the following three lines after the line in /etc/profile that says "export PATH USER ...":
export GPT_LOCATION=/usr/local/gpt
export GLOBUS_LOCATION=/usr/local/globus
export PATH=$PATH:$GLOBUS_LOCATION/bin:$GLOBUS_LOCATION/sbin

Log off and log back on to the machines after modifying /etc/profile so that the above settings take effect.

B.2.5 Installing the GPT

Log on as root and install GPT on all of the machines. Ignore any warnings from Globus:
cd /usr/src
tar -xzvf gpt-2.2.2-src.tar.gz
cd gpt-2.2.2
./build_gpt
ls ${GPT_LOCATION}/sbin | wc -l

The final ls command should show 29 gpt-* executable files.

B.2.6 Installing a Globus Server Bundle

The following steps install the server bundle; perform them on each machine that will be a server. In our demo, machines m2 and m3 are the servers.

As root, run:
cd /usr/src
export PATH=$PATH:$GPT_LOCATION/sbin
gpt-install globus-all-server-2.2.3-i686-pc-linux-gnu-bin.tar.gz
gpt-postinstall
/usr/local/globus/setup/globus/setup-gsi
y
q

B.2.7 Installing a Globus Client Bundle

The following steps install the client bundle on any machine that will be used to query or submit jobs to the grid. In our application, we install the client on machine m1.

As root, run:
cd /usr/src
export PATH=$PATH:$GPT_LOCATION/sbin
gpt-install globus-all-client-2.2.3-i686-pc-linux-gnu-bin.tar.gz
gpt-postinstall
/usr/local/globus/setup/globus/setup-gsi
y
q

B.2.8 Installing the Globus Simple Certificate Authority

To install the Globus Simple Certificate Authority, one of the Globus bundles (server or client) must already be installed on the machine because of a dependency. We install the CA and a Globus server on machine m2.

As root, run:
cd /usr/src
export PATH=$PATH:$GPT_LOCATION/sbin
gpt-build -nosrc gcc32
gpt-build globus_simple_ca_bundle-0.9.tar.gz gcc32
gpt-postinstall
...
Do you want to keep this as the CA subject (y/n) [y]: n
Enter a unique subject name for this CA:
cn=my test CA, ou=m2.kmitnb.ac.th, ou=demotest, o=grid
Enter the email of the CA:
adminca@m2.kmitnb.ac.th
[default 5 years] 1825
mycapw
[enter]

During the above process, a hash number is generated and used as part of several file names. Note this number for use in the next steps. Run the script named at the end of the prior install, substituting the hex hash number printed by the above process for <hash> below, and adding the "-default" argument:
/usr/local/globus/setup/globus_simple_ca_<hash>_setup/setup-gsi -default
y
q

The file /root/.globus/simpleCA/private/cakey.pem is the CA's private key and should not be given out to anyone else. The file /root/.globus/simpleCA/cacert.pem contains the CA's public key.

The following steps install the CA's certificate on each of the other grid machines. The file /root/.globus/simpleCA/globus_simple_ca_<hash>_setup-0.9.tar.gz contains the public CA key and the other information needed to participate in this grid. It must be copied to each of the other machines and installed using the gpt-build command.

First, on machine m2, use ftp to copy the file /root/.globus/simpleCA/globus_simple_ca_<hash>_setup-0.9.tar.gz to the directory /usr/src/ of each of the other grid machines. This can be done in two steps: ftp the file to the directory /home/globususer on each of those machines using the globususer ID, then, as root, move it to the directory /usr/src. Next, issue the following commands on each of those machines as root:
gpt-build /usr/src/globus_simple_ca_<hash>_setup-0.9.tar.gz
gpt-postinstall
/usr/local/globus/setup/globus_simple_ca_<hash>_setup/setup-gsi -default
y
q

B.2.9 Requesting and Signing Gatekeeper Certificates for Servers

On each of the server machines (m2 and m3), perform the following steps to request and sign certificates:
grid-cert-request -host <hostname>

Use ftp or e-mail (if available, using the adminca ID) to copy the file /etc/grid-security/hostcert_request.pem to the CA machine and put it into the directory /root. On the CA machine, as root, sign the certificate using the following:
grid-ca-sign -in /root/hostcert_request.pem -out /root/hostcert.pem
mycapw

Then ftp the file /root/hostcert.pem back to the server machine and place it in the directory /etc/grid-security.

B.2.10 Requesting and Signing User Certificates

For each user who will use the grid (in our example, the user snobol on the client machine m1), the following procedure must be carried out by the user and the Certificate Authority. Logged on as snobol, run:
grid-cert-request
<passphrase>
<passphrase again>

The user should make up his own passphrase for his certificate. He will use this same passphrase later with the grid-proxy-init command to authenticate with the grid. In our example, the snobol user's login password could be used here.

The user must then send the file /home/<user>/.globus/usercert_request.pem to the Certificate Authority (machine m2) for signing. On the CA machine (m2), sign the certificate as root with the following command, adjusting the location of usercert_request.pem to point to wherever the request file is now stored on m2:
grid-ca-sign -in usercert_request.pem -out usercert.pem
mycapw

Securely send the file usercert.pem back to the requesting user. The user should put the file usercert.pem into his /home/<user>/.globus directory.

The user should also be added to the grid-mapfile (on machine m2, as root) using the following command (note the backquote characters next to the double quote characters):
grid-mapfile-add-entry -dn "`grid-cert-info -f usercert.pem -subject`" -ln globususer

Copy the resulting /etc/grid-security/grid-mapfile to each of the other servers (m3) so that all of the servers have this file.
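For reference, each line that grid-mapfile-add-entry appends to /etc/grid-security/grid-mapfile simply maps a certificate subject to a local account. The entry shown below is purely illustrative; the actual subject string depends on the name details entered when the snobol user certificate was requested:

# cat /etc/grid-security/grid-mapfile
"/O=grid/OU=demotest/OU=m2.kmitnb.ac.th/CN=snobol" globususer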

B.2.11 Setting Up the Gatekeepers

On each server (m2 and m3), add the following two lines to the file /etc/services:
gsigatekeeper   2119/tcp   # globus gatekeeper
gsiftp          2811/tcp   # globus wuftp

Create the file /etc/xinetd.d/gsigatekeeper on each server, containing the lines:
service gsigatekeeper
{
    socket_type  = stream
    protocol     = tcp
    wait         = no
    user         = root
    env          = LD_LIBRARY_PATH=/usr/local/globus/lib
    server       = /usr/local/globus/sbin/globus-gatekeeper
    server_args  = -conf /usr/local/globus/etc/globus-gatekeeper.conf
    disable      = no
}

Create the file /etc/xinetd.d/gsiftp on each server, containing the lines:
service gsiftp
{
    instances       = 1000
    socket_type     = stream
    wait            = no
    user            = root
    env             = LD_LIBRARY_PATH=/usr/local/globus/lib
    server          = /usr/local/globus/sbin/in.ftpd
    server_args     = -l -a -G /usr/local/globus
    log_on_success += DURATION USERID
    log_on_failure += USERID
    nice            = 10
    disable         = no
}

Now reboot all of the machines.

B.3 Setting Up the MDS

We configure the Monitoring and Discovery Service (MDS) with one Grid Information Index Service (GIIS) on machine m2, which collects the data reported by the Grid Resource Information Servers (GRIS) on all of the machines.

The GRIS servers send information about their respective hosts to the GIIS. In the demo application, we use this information to find machines that are not too busy. The user is able to query the GIIS from the client machine m1.

To set up this structure, we need to modify several configuration files. These files name the GIIS and GRIS, and specify how these components register with each other.

Figure 3-24 presents the relationship among the MDS components in our application.

B.3.1 Setting Up the GIIS and GRIS on the Machine m2

On m2, make the following modifications to the conf files in the directory $GLOBUS_LOCATION/etc.

In the file grid-info-slapd.conf, name the GIIS on machine m2. Change the lines
database giis
suffix "Mds-Vo-name=site, o=Grid"
to
database giis
suffix "Mds-Vo-name=m2.kmitnb.ac.th, o=Grid"

In the file grid-info-site-policy.conf, allow registrations from the domain. Change the line
policydata: (&(Mds-Service-hn=site) (Mds-Service-port=2135))
to
policydata: (&(Mds-Service-hn=*.kmitnb.ac.th) (Mds-Service-port=2135))

In the file grid-info-resource-register.conf, tell the m2 GRIS to register with the m2 GIIS. Change the two matching lines to the settings shown below:
dn: Mds-Vo-Op-name=register, Mds-Vo-name=m2.kmitnb.ac.th, o=grid
reghn: m2.kmitnb.ac.th

B.3.2 Setting Up the GRIS on m3

On all of the other server machines (here we have only m3), make the following modifications to the conf files in the directory $GLOBUS_LOCATION/etc.

In the file grid-info-slapd.conf, remove the GIIS server from these machines by removing the block of lines starting with:
database giis
suffix "Mds-Vo-name=site, o=Grid"

In the file grid-info-resource-register.conf, tell the GRIS which GIIS to register with. Change the two matching lines as shown below:
dn: Mds-Vo-Op-name=register, Mds-Vo-name=m2.kmitnb.ac.th, o=grid
reghn: m2.kmitnb.ac.th


B.3.3 Starting the MDS on All of the Servers

Start the MDS on all of the servers (m2 and m3) using:
globus-mds start

This can be automated by placing the script in the run-level directories per the usual conventions. Copy the globus-mds script into the directory /etc/init.d/, then create two symbolic links as follows:
cp $GLOBUS_LOCATION/sbin/globus-mds /etc/init.d/
cd /etc/rc.d/rc5.d/
ln -s /etc/init.d/globus-mds S92globus-mds
ln -s /etc/init.d/globus-mds K92globus-mds
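As a quick sanity check (not part of the original procedure), the MDS slapd process can be confirmed to be listening on its service port, 2135, in the same way the gatekeeper and GridFTP ports are checked in B.4:

netstat -an | grep 2135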

B.3.4 Setting Up the MDS Client m1

Modify the lines shown below in the file $GLOBUS_LOCATION/etc/grid-info.conf so that searches go to the GIIS on machine m2:
GRID_INFO_HOST="m2.kmitnb.ac.th"
GRID_INFO_ORGANIZATION_DN="Mds-Vo-name=m2.kmitnb.ac.th, o=Grid"

B.3.5 Setting Up a Secure MDS

So far, we have set up an MDS that permits anonymous access; the grid-info-search command should use the -x flag to indicate an anonymous search request. However, the MDS can be secured so that only certified users can access the GIIS and only certified server GRISs can register to send information to the GIIS. The following steps should be performed.

B.3.5.1 Requesting and Signing Certificates for Each Server Machine

For each of the server machines (m2 and m3), request LDAP certificates, sign them using the Certificate Authority on m2, and copy the signed certificates to the proper location. The steps for one of the servers (m3) are shown below.

On the server machine (m3), as root, run:
grid-cert-request -service ldap -host m3.kmitnb.ac.th

Copy the request certificate from /etc/grid-security/ldap/ldapcert_request.pem to the Certificate Authority machine (m2) using ftp or any other desired method. Sign the certificate as root on m2, substituting the correct locations for the request certificate and the signed certificate:
grid-ca-sign -in ldapcert_request.pem -out ldapcert.pem

Copy the resulting signed certificate file ldapcert.pem from the Certificate Authority machine (m2) to the location /etc/grid-security/ldap/ldapcert.pem on the server machine (m3).

B.3.5.2 Changing the conf Files

Change the following configuration files on the servers.

In $GLOBUS_LOCATION/etc/grid-info-slapd.conf, change the anonymousbind setting(s) from
anonymousbind yes
to
anonymousbind no

Change the files $GLOBUS_LOCATION/etc/grid-info-resource-register.conf on the servers to require authentication when registering, replacing the anonymous bind method
bindmethod: ANONYM-ONLY
with an authenticated bind method (for example, AUTHC-ONLY).

At this point the registration bind method has been specified, which determines who can register with whom and how. Because anonymous bind has been deactivated, each registering node must also be told that the GIIS (m2) is authorized to receive its resource information.

To authorize m2 (the GIIS) to receive registration information, m2's LDAP subject name must be entered in the grid-mapfile. To obtain an LDAP subject name, we run grid-cert-info on m3 as follows, in the directory /etc/grid-security, with the assumption that m2's LDAP subject name is similar:
% grid-cert-info -f /etc/grid-security/ldap/ldapcert.pem -subject

The name was:
/O=grid/OU=demotest/OU=m2.kmitnb.ac.th/CN=ldap/m3.kmitnb.ac.th

Since direct editing of the grid-mapfile is discouraged, we run the following command using the name obtained above, substituting "m2" for "m3":
% grid-mapfile-add-entry \
  -dn "/O=grid/OU=demotest/OU=m2.kmitnb.ac.th/CN=ldap/m2.kmitnb.ac.th" \
  -ln globususer

A successful entry is indicated by the following returned string:
(1) entry added

After making all of these changes, the server machines should be rebooted, or the following should be used to restart the MDS on each of the servers (m2 and m3):
globus-mds stop
globus-mds start


B.4 Checking the Installation

To check the installation on each machine, as root use the command:
$GPT_LOCATION/sbin/gpt-verify

The following commands can be used on a server machine to see whether the GRAM gatekeeper and GridFTP are listening on their respective ports:
netstat -an | grep 2119
netstat -an | grep 2811

From the client machine (m1), logged on as the user snobol, do the following.

This command sets up the environment so that Globus commands can be issued by the user; one may want to add this line to one's login profile:
. $GLOBUS_LOCATION/etc/globus-user-env.sh

This command refreshes the proxy certificate for the user (snobol):
grid-proxy-init
<passphrase>

The following commands send a simple job to each server machine. This tests whether jobs can be submitted to each of the server machines:
globus-job-run m2.kmitnb.ac.th "/bin/hostname"
globus-job-run m3.kmitnb.ac.th "/bin/hostname"

To refine a search to look for processors that were more than 90 percent free of CPU utilization over the last minute, use:
grid-info-search -x "(&(Mds-Device-Group-name=processors)(Mds-Cpu-Free-1minX100>=90))"
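Other attributes published by the GRIS can be filtered in the same way. For example, the free-memory attribute that the Java grid program in Appendix E also reads can be queried as follows; the 64 MB threshold is only an illustrative value:

grid-info-search -x "(Mds-Memory-Ram-Total-freeMB>=64)"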

Now we are ready to install and run the course scheduling application.


APPENDIX C

INSTALLING SOFTWARE


This section presents the steps for installing and setting up MySQL 4.0, J2SDK 1.4, the Java CoG Kit 1.1, Tomcat 5.0, mod_jk2 and the JDBC driver on Red Hat Linux 9.0 (RH9). In this study, we install this software on machine m1.

C.1 Installing MySQL 4.0

First, make sure there is no previous version of MySQL installed on the system. As root, execute the command:
#rpm -q mysql

If there is none, proceed to the install phase; otherwise uninstall it with the command:
#rpm -e mysql

Download the rpm packages for MySQL's server, client, shared libraries and development files:
- MySQL-server-4.0.24-0.i386.rpm
- MySQL-client-4.0.24-0.i386.rpm
- MySQL-shared-4.0.24-0.i386.rpm
- MySQL-devel-4.0.24-0.i386.rpm

Then install them one by one using the following commands as root:
#rpm -ivh MySQL-server-4.0.24-0.i386.rpm
#rpm -ivh MySQL-client-4.0.24-0.i386.rpm
#rpm -ivh MySQL-shared-4.0.24-0.i386.rpm
#rpm -ivh MySQL-devel-4.0.24-0.i386.rpm

The MySQL data directory has been created in /var/lib/mysql. Initialize the MySQL database after installation by typing:
#mysql_install_db

Do not forget to add the libmysqlclient.so path to the search path file /etc/ld.so.conf. For example, if we have:
/usr/lib/libmysqlclient.so
make sure /etc/ld.so.conf contains:
/usr/lib
Then run:
#/usr/sbin/ldconfig

The following instructions change the default empty password for the MySQL root user to whatever we like. For example, to change the empty password to ncdanh:
#/usr/bin/mysqladmin -u root password ncdanh

Now, try to log in to MySQL with the new password. As root, type:
#mysql -u root -p
Enter password: ncdanh
mysql>
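With the server running, the database that will hold the course scheduling data can be created and the Appendix A tables defined in it. The snippet below is only a sketch: the database name scheduling is an assumption (the schema name is not fixed here), and only the Faculty table from Table A-1 is shown.

#mysql -u root -p -e "CREATE DATABASE scheduling;"
#mysql -u root -p scheduling -e "
  CREATE TABLE Faculty (
    FacultyID   char(2)   NOT NULL PRIMARY KEY,  -- ID of faculty
    FacultyName char(100)                        -- name of faculty
  );"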

C.2 Installing J2SDK 1.4

To install J2SDK 1.4, do the following steps:
- Download the file j2sdk-1_4_2_10-linux-i586.bin and copy it to /usr/local:
[root@m1 root]#cp -p j2sdk-1_4_2_10-linux-i586.bin /usr/local
- Change to /usr/local and run the file:
[root@m1 root]#cd /usr/local
[root@m1 local]#./j2sdk-1_4_2_10-linux-i586.bin
This creates the directory /usr/local/j2sdk1.4.2_10.
- Insert the following lines into the file /etc/profile or /root/.bashrc:
export JAVA_HOME=/usr/local/j2sdk1.4.2_10
export CLASSPATH=$JAVA_HOME/lib/tools.jar:$JAVA_HOME/jre/lib/rt.jar:./

C.3 Installing the Java CoG Kit 1.1

This section presents how to download, install and configure the Java CoG Kit 1.1.

Installation is the first step that needs to be accomplished before the Java CoG Kit can be used; it ensures that the Java CoG Kit exists on our local machine in a proper state. After installation, configuration is needed to adjust various parameters that are specific to our environment.

C.3.1 Downloading the Java CoG Kit

This study uses the stable jglobus binary distribution. With this version we are interested only in the jar files, without modifying them.

The stable binary distribution of jglobus is available from:
http://www.globus.org/cog/java/1.1/cog-1.1-bin.tar.gz

As root, do the following steps:
- Download the file cog-1.1-bin.tar.gz and copy it to /usr/local.
- Unpack the file:
[root@m1 root]#cd /usr/local
[root@m1 local]#tar -xzf cog-1.1-bin.tar.gz

A directory named cog-1.1 will be created. This directory will, from now on, be referred to as the CoG installation directory.


C.3.2 Configuration

This section shows how to configure the Java CoG Kit.

C.3.2.1 Environment Variables

The COG_INSTALL_PATH environment variable is used to determine the installation location of the Java CoG Kit; it should point to the CoG installation directory. It is also highly recommended to add its bin subdirectory to the binary search path (named PATH on most systems).

Add the following lines to /etc/profile:
export COG_INSTALL_PATH=/usr/local/cog-1.1
export PATH=$PATH:$COG_INSTALL_PATH/bin

Log out and log back in to the RH9 machine to activate the above profile.

C.3.2.2 Configuration

Manual configuration of the Java CoG Kit is also possible. Using an editor, we create a configuration file named cog.properties and place it in the .globus subdirectory of the user's home directory. In our situation, this directory is /home/snobol/.globus (the snobol user is created in Appendix B).

A sample Java CoG Kit configuration file is shown below:
#Java CoG Kit Configuration File
#Mon Dec 26 10:30:30 CST 2005
usercert=/home/snobol/.globus/usercert.pem
userkey=/home/snobol/.globus/userkey.pem
proxy=/tmp/x509up_u800
cacert=/usr/local/globus/etc/grid-security/certificates/42864e48.0
ip=192.168.10.241

It includes a number of important properties:
- usercert: points to the location of the Globus user certificate.
- userkey: points to the location of the private key associated with the Globus user certificate.
- proxy: points to the location of the user proxy. The proxy is located in a temporary directory, and its name is composed of the string x509up_u and a user id (OS specific). In the above example, the user id is 800.
- cacert: contains a comma-separated list of certificate authorities that the user trusts.
- ip: the IP address of the machine the Java CoG Kit will be run from.

C.3.2.3 Managing Certificates and Proxies

Currently, the Java CoG Kit provides some GUI-based tools for credential management. These tools need the environment variable COG_INSTALL_PATH to be set to the CoG installation directory.

One of these tools is visual-grid-proxy-init, which allows a proxy to be created. The lifetime and cryptographic strength of the proxy can be specified, as well as the locations of the user's long-term credentials and of the resulting proxy file.

FIGURE C-1 Visual-grid-proxy-init

To run this tool, as root, do the following steps:
- Run the command:
[root@m1 root]# visual-grid-proxy-init
The system shows a dialog box as presented in Figure C-1.
- Input the password: pwsbm1.
- Input the options with the following values:
  • Proxy lifetime: 12h
  • Strength: 512
  • Proxy file: /tmp/x509up_u800
  • User certificate: /home/snobol/.globus/usercert.pem
  • User private key: /home/snobol/.globus/userkey.pem
- Press the "Create" button.

For testing, after the proxy has been created, run the following commands:
- Display information about the proxy:
[root@m1 root]#grid-proxy-info
- Execute a command on the remote machine m2 from the local machine m1:
[root@m1 root]#globusrun -r m2.kmitnb.ac.th -o "&(executable=/bin/ls)"

C.4 Installing Tomcat 5.0

C.4.1 Installing Tomcat 5.0

To install Tomcat 5.0, do the following steps:
- Download the file jakarta-tomcat-5.0.28.tar.gz and copy it to /usr/local/opt:
[root@m1 root]#cp -p jakarta-tomcat-5.0.28.tar.gz /usr/local/opt
- Change into /usr/local/opt and run the following commands:
[root@m1 root]# cd /usr/local/opt
[root@m1 opt]# tar -zxvf jakarta-tomcat-5.0.28.tar.gz
[root@m1 opt]# ln -s jakarta-tomcat-5.0.28 tomcat

Tomcat has now been installed into /usr/local/opt/jakarta-tomcat-5.0.28 and linked to /usr/local/opt/tomcat.
- Insert the following line into the file /etc/profile or /root/.bashrc:
export CATALINA_HOME=/usr/local/opt/tomcat

Now log out and log back in to the RH9 machine to ensure that all changes take effect.

C.4.2 Starting and Stopping Tomcat 5.0

First of all, we need to ensure that CATALINA_HOME and JAVA_HOME are correctly set. To do this, open a terminal and type the following commands:
# echo $JAVA_HOME
# echo $CATALINA_HOME

If we get a blank line, or if a directory points anywhere other than where it is supposed to, we have to correct these environment variables before continuing.

If everything is fine, we can start Tomcat with the following command as root:
# $CATALINA_HOME/bin/startup.sh

To check that Tomcat is running, open a browser and point it to http://localhost:8080. We should see the default Tomcat welcome page.

To stop Tomcat, as root:
# $CATALINA_HOME/bin/shutdown.sh

If Tomcat does not start and we downloaded the zip file, the cause is probably file permissions. Ensure that the following files inside the directory $CATALINA_HOME/bin are executable:
# chmod +x startup.sh
# chmod +x shutdown.sh
# chmod +x tomcat.sh

After making the files executable, try starting and stopping Tomcat again.

C.5 Installing mod_jk

We use the Apache server included in RH9 instead of installing another one. The httpd service is installed in /etc/httpd.

Before installing mod_jk2, we should shut down both the httpd service and Tomcat. The httpd service can be shut down from the RH9 menu bar (System Settings > Server Settings > Services), as shown in Figure C-2: select httpd and press "Stop".

FIGURE C-2 Service configuration

Now, to install mod_jk2, do the following steps:
- Download the file mod_jk2-2.0.4-2jpp.i386.rpm (it can be downloaded from http://rpm.pbone.net) and copy it to /usr/software.
[root@m1 root]#cd /usr/software
- Install the file:
[root@m1 software]#rpm -ihv mod_jk2-2.0.4-2jpp.i386.rpm

The system automatically puts both mod_jk2.so and jkjni.so into /etc/httpd/modules of RH9.

We now configure the following files: server.xml, workers2.properties and httpd.conf.

C.5.1 Editing the server.xml File

Open the file $CATALINA_HOME/conf/server.xml and look for the "non-SSL Coyote HTTP/1.1 Connector", the standard Tomcat-only connector element listening on port 8080. Comment this Connector element out (wrap it in <!-- and -->), since Apache will handle HTTP requests and forward them to Tomcat over the AJP connector on port 8009.

C.5.2 Creating the workers2.properties File

Create the file /etc/httpd/conf/workers2.properties with the following contents:
[shm]
file=/etc/httpd/logs/shm.file
size=1048576
# socket channel
[channel.socket:localhost:8009]
port=8009
host=127.0.0.1
# worker for the connector
[ajp13:localhost:8009]
channel=channel.socket:localhost:8009

Note that the port matches the AJP port defined in the file server.xml for Tomcat.

C.5.3 Editing the httpd.conf File

Open the file /etc/httpd/conf/httpd.conf and add the following lines at the end of the list of modules loaded into Apache: the LoadModule line, followed by one <Location> block for each URL pattern that should be forwarded to Tomcat, each block containing the same worker directive. For example, to forward the scheduling application:
LoadModule jk2_module modules/mod_jk2.so
<Location "/scheduling/*">
    JkUriSet worker ajp13:localhost:8009
</Location>
The same JkUriSet worker ajp13:localhost:8009 directive is repeated in a <Location> block for every other path that must be served by Tomcat (in our configuration it appears five times).

For testing, we create the directory $CATALINA_HOME/webapps/ROOT/scheduling to store the JSP and HTML files for our system, and then create a simple file test.jsp in that directory. Any minimal JSP page will do, for example one whose body contains the single expression <%= new java.util.Date() %>.

Now try to access it from a web browser, as presented in Figure C-3.

FIGURE C-3 Result in the web browser

Tomcat automatically creates the corresponding compiled class files under:
$CATALINA_HOME/work/Catalina/localhost/_/org/apache/jsp/scheduling/*.class


C.6 Installing the JDBC Driver on Linux

Assume that MySQL is already installed on the Red Hat Linux machine. To access MySQL from Java or JSP programs, we need to download MySQL Connector/J from its website. This study uses MySQL Connector/J 3.2.
- Download the file mysql-connector-java-3.2.0-alpha.tar.gz (it can be downloaded from http://www.mysql.com/products/connector/j/index.html).
- Unpack the tar.gz file and place it under /usr/local.
- Copy the file mysql-connector-java-3.2.0-alpha-bin.jar to the directory $JAVA_HOME/jre/lib/ext.
- Copy the file Driver.class to $JAVA_HOME/jre/lib/ext. This allows the Java interpreter to find the driver.
- Finally, insert the following lines into the file /etc/profile or /root/.bashrc:
export CLASSPATH=$JAVA_HOME/lib/tools.jar:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/jre/lib/ext/mysql-connector-java-3.2.0-alpha-bin.jar:./


APPENDIX D

INSTALLING CENTRALIZED AND DECENTRALIZED COURSE SCHEDULING PROGRAMS


This section presents how to compile the centralized and decentralized course scheduling programs. These programs are written in the C language, using the compiler included in the Red Hat Linux 9.0 installation.

D.1 The Centralized Course Scheduling Program

This program is installed on machine m2. On machine m2, we do the following steps:
- Copy the file centralizedscheduling.c to /usr/study/coursescheduling.
- Run the following commands as root:
[root@m2 root]#cd /usr/study/coursescheduling
[root@m2 coursescheduling]# gcc -I/usr/include/mysql centralizedscheduling.c -L/usr/lib/mysql -lmysqlclient -lz -o centralizedscheduling.exe

The file centralizedscheduling.exe is created in the same directory. For testing, we can run:
[root@m2 coursescheduling]#./centralizedscheduling.exe

D.2 The Decentralized Course Scheduling Program

This program is installed on machines m2 and m3. The following steps compile it on machine m2; repeat them on m3.
- Copy the file decentralizedscheduling.c to /usr/study/coursescheduling.
- Run the following commands as root:
[root@m2 root]#cd /usr/study/coursescheduling
[root@m2 coursescheduling]# gcc -I/usr/include/mysql decentralizedscheduling.c -L/usr/lib/mysql -lmysqlclient -lz -o decentralizedscheduling.exe

The file decentralizedscheduling.exe is created in the same directory.
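Once the grid environment of Appendix B is running, the compiled program can also be launched remotely from the client machine m1 with globus-job-run, in the same way as the /bin/hostname test in B.4. This is only a sketch; the program may expect command-line arguments that are normally supplied by the Java grid system in Appendix E.

grid-proxy-init
globus-job-run m2.kmitnb.ac.th /usr/study/coursescheduling/decentralizedscheduling.exe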


APPENDIX E

JAVA SOURCE CODE FOR GRID SYSTEM


All of the following files are compiled and stored in the directory /usr/study/gridsystem on machine m1.

GridInfoSearch.java

import java.util.Hashtable;
import java.util.Enumeration;
import java.net.InetAddress;
import java.net.UnknownHostException;
import javax.naming.Context;
import javax.naming.NamingEnumeration;
import javax.naming.NamingException;
import javax.naming.directory.Attribute;
import javax.naming.directory.SearchControls;
import javax.naming.directory.SearchResult;
import javax.naming.directory.Attributes;
import javax.naming.ldap.LdapContext;
import javax.naming.ldap.InitialLdapContext;
import org.globus.mds.gsi.common.GSIMechanism;

// we could add: aliasing, referral support
public class GridInfoSearch {

    // default values
    private static final String version = org.globus.common.Version.getVersion();
    private static final String DEFAULT_CTX = "com.sun.jndi.ldap.LdapCtxFactory";
    private String hostname = "m2.sched.grid.com";
    private int port = 2135;
    private String baseDN = "mds-vo-name=m2.sched.grid.com, o=grid";
    private int scope = SearchControls.SUBTREE_SCOPE;
    private int ldapVersion = 3;
    private int sizeLimit = 0;
    private int timeLimit = 0;
    private boolean ldapTrace = false;
    private String saslMech;
    private String bindDN;
    private String password;
    private String qop = "auth"; // could be auth, auth-int, auth-conf
    // static means that the values of ob will exist until the program finishes
    private static AvailableHost ob;

    public GridInfoSearch() {
    }


    public String getTheBestHost() {
        GridInfoSearch gridInfoSearch = new GridInfoSearch();
        String filter = "(&(Mds-Device-Group-name=processors)(Mds-Cpu-Free-1minX100>=0))";
        gridInfoSearch.search(filter);
        ob.displayHost();
        System.out.println("the best:" + ob.getBestHost());
        return ob.getBestHost();
    }

    // Search the LDAP server for the specified filter
    private void search(String filter) {
        Hashtable env = new Hashtable();
        String url = "ldap://" + hostname + ":" + port;
        env.put("java.naming.ldap.version", String.valueOf(ldapVersion));
        env.put(Context.INITIAL_CONTEXT_FACTORY, DEFAULT_CTX);
        env.put(Context.PROVIDER_URL, url);
        if (bindDN != null) {
            env.put(Context.SECURITY_PRINCIPAL, bindDN);
        }
        // use GSI authentication from the grid-proxy-init certificate
        saslMech = GSIMechanism.NAME;
        env.put("javax.security.sasl.client.pkgs", "org.globus.mds.gsi.jndi");
        env.put(Context.SECURITY_AUTHENTICATION, saslMech);
        env.put("javax.security.sasl.qop", qop);
        LdapContext ctx = null;
        // create a new LDAP context and perform the search on the filter
        try {
            ctx = new InitialLdapContext(env, null);
            SearchControls constraints = new SearchControls();
            constraints.setSearchScope(scope);
            constraints.setCountLimit(sizeLimit);
            constraints.setTimeLimit(timeLimit);
            // store the results of the search in the results variable
            NamingEnumeration results = ctx.search(baseDN, filter, constraints);
            //displayResults(results);
            getAvailableHosts(results); // the results will be stored in ob
        } catch (Exception e) {
            System.err.println("Failed to search: " + e.getMessage());
        } finally {
            if (ctx != null) {
                try { ctx.close(); } catch (Exception e) {}
            }
        }
    }

    // Display the results of a search
    private void displayResults(NamingEnumeration results) throws NamingException {
        if (results == null) return;
        String dn;
        String attribute;
        Attributes attrs;
        Attribute at;
        SearchResult si;
        // walk through the results returned by the search method and print them
        while (results.hasMoreElements()) {
            si = (SearchResult) results.next();
            attrs = si.getAttributes();
            if (si.getName().trim().length() == 0) {
                dn = baseDN;
            } else {
                dn = si.getName() + ", " + baseDN;
                if (dn.substring(0, 11).equals("Mds-Host-hn")) {
                    System.out.println("dn: " + dn);
                    for (NamingEnumeration ae = attrs.getAll(); ae.hasMoreElements();) {
                        at = (Attribute) ae.next();
                        attribute = at.getID();
                        if (attribute.equals("Mds-Cpu-Free-1minX100")) {
                            Enumeration vals = at.getAll();
                            while (vals.hasMoreElements()) {
                                System.out.println(attribute + ": " + vals.nextElement());
                            }
                        }
                    }
                    System.out.println();
                }
            } // else
        } // while
    }


    // Collect the available hosts from the search results and store them in ob
    private void getAvailableHosts(NamingEnumeration results) throws NamingException {
        if (results == null) return;
        String dn;
        String attribute;
        Attributes attrs;
        Attribute at;
        SearchResult si;
        int Mds_Cpu_speedMHz = 0;
        int Mds_Memory_Ram_Total_freeMB = 0;
        int Mds_Cpu_Total_count = 0;
        String Mds_Host_hn = "";
        int Mds_Cpu_Free_1minX100 = 0;
        // walk through the results returned by the search method
        ob = new AvailableHost();
        while (results.hasMoreElements()) {
            si = (SearchResult) results.next();
            attrs = si.getAttributes();
            if (si.getName().trim().length() == 0) {
                dn = baseDN;
            } else {
                dn = si.getName() + ", " + baseDN;
                if (dn.substring(0, 32).equals("Mds-Device-Group-name=processors")) {
                    System.out.println("dn: " + dn);
                    for (NamingEnumeration ae = attrs.getAll(); ae.hasMoreElements();) {
                        at = (Attribute) ae.next();
                        attribute = at.getID();
                        if (attribute.equals("Mds-Cpu-speedMHz")) {
                            Enumeration vals = at.getAll();
                            Mds_Cpu_speedMHz = Integer.parseInt((String) vals.nextElement());
                            System.out.println(attribute + ": " + Mds_Cpu_speedMHz);
                        } else if (attribute.equals("Mds-Memory-Ram-Total-freeMB")) {
                            Enumeration vals = at.getAll();
                            Mds_Memory_Ram_Total_freeMB = Integer.parseInt((String) vals.nextElement());
                            System.out.println(attribute + ": " + Mds_Memory_Ram_Total_freeMB);
                        } else if (attribute.equals("Mds-Cpu-Total-count")) {
                            Enumeration vals = at.getAll();
                            Mds_Cpu_Total_count = Integer.parseInt((String) vals.nextElement());
                            System.out.println(attribute + ": " + Mds_Cpu_Total_count);
                        } else if (attribute.equals("Mds-Host-hn")) {
                            Enumeration vals = at.getAll();
                            Mds_Host_hn = (String) vals.nextElement();
                            System.out.println(attribute + ": " + Mds_Host_hn);
                        } else if (attribute.equals("Mds-Cpu-Free-1minX100")) {
                            Enumeration vals = at.getAll();
                            Mds_Cpu_Free_1minX100 = Integer.parseInt((String) vals.nextElement());
                            System.out.println(attribute + ": " + Mds_Cpu_Free_1minX100);
                        } // else if
                    } // for
                    // extract the hostname from the dn
                    Mds_Host_hn = (String) dn.substring(dn.indexOf("Mds-Host-hn") + 12,
                            dn.indexOf("mds-vo-name") - 2);
                    System.out.println(Mds_Host_hn);
                    // add the host to the ArrayList
                    ob.addHost(Mds_Host_hn,
                            Mds_Cpu_speedMHz,
                            Mds_Memory_Ram_Total_freeMB,
                            Mds_Cpu_Total_count,
                            Mds_Cpu_Free_1minX100);
                }
                System.out.println();
            }
        } // while
    }

    // Create a new instance of GridInfoSearch and use the specified filter string
    public static void main(String[] args) {
        GridInfoSearch gridInfoSearch = new GridInfoSearch();
        String filter = "(&(Mds-Device-Group-name=processors)(Mds-Cpu-Free-1minX100>=0))";
        gridInfoSearch.search(filter);
    }
}


AvailableHost.java

import java.util.*;

public class AvailableHost {
    ArrayList ar;

    public AvailableHost() {
        ar = new ArrayList();
    }

    public void addHost(String Mds_Host_hn,
                        int Mds_Cpu_speedMHz,
                        int Mds_Memory_Ram_Total_freeMB,
                        int Mds_Cpu_Total_count,
                        int Mds_Cpu_Free_1minX100) {
        ar.add(new Host(Mds_Host_hn,
                Mds_Cpu_speedMHz,
                Mds_Memory_Ram_Total_freeMB,
                Mds_Cpu_Total_count,
                Mds_Cpu_Free_1minX100));
    }

    public void displayHost() {
        // print every known host with its weight
        for (int i = 0; i < ar.size(); i++) {
            System.out.println(ar.get(i));
        }
    }

    // The bodies of the two methods below were cut off in the printed listing;
    // they are reconstructed here from the way they are called in GridInfoSearch and in main.
    public String getBestHost() {
        // Host implements Comparable so that sorting puts the highest weight first
        Collections.sort(ar);
        return ((Host) ar.get(0)).getHostname();
    }

    public void displayBestHost() {
        System.out.println("the best: " + getBestHost());
    }

    public static void main(String args[]) {
        AvailableHost ob = new AvailableHost();
        ob.addHost("m1.sched.grid.com", 2000/*MHz*/, 123/*MB*/, 1/*cpu*/, 70/*%freeCPU*/);
        ob.addHost("m2.sched.grid.com", 2000/*MHz*/, 123/*MB*/, 1/*cpu*/, 90/*%freeCPU*/);
        ob.addHost("m3.sched.grid.com", 2000/*MHz*/, 123/*MB*/, 1/*cpu*/, 80/*%freeCPU*/);
        ob.displayHost();
        ob.displayBestHost();
    } // main
} // class AvailableHost

class Host implements Comparable {
    private int Mds_Cpu_speedMHz;
    private int Mds_Memory_Ram_Total_freeMB;
    private int Mds_Cpu_Total_count;
    private String Mds_Host_hn;
    private int Mds_Cpu_Free_1minX100;
    private int Weight;

    public Host(String Mds_Host_hn,
                int Mds_Cpu_speedMHz,
                int Mds_Memory_Ram_Total_freeMB,
                int Mds_Cpu_Total_count,
                int Mds_Cpu_Free_1minX100) {
        this.Mds_Host_hn = Mds_Host_hn;
        this.Mds_Cpu_speedMHz = Mds_Cpu_speedMHz;
        this.Mds_Memory_Ram_Total_freeMB = Mds_Memory_Ram_Total_freeMB;
        this.Mds_Cpu_Total_count = Mds_Cpu_Total_count;
        this.Mds_Cpu_Free_1minX100 = Mds_Cpu_Free_1minX100;
        this.Weight =
                (int) (Mds_Cpu_Free_1minX100 * Mds_Cpu_speedMHz * Mds_Cpu_Total_count / 100.00);
    }

    public String getHostname() {
        return Mds_Host_hn;
    }

    public int getWeight() {
        return Weight;
    }

    public String toString() {
        return Mds_Host_hn + "\t" + Weight;
    }

    // Order by free CPU weight, highest first
    public int compareTo(Object ob) throws ClassCastException {
        Host temp = (Host) ob;
        int cpu1 = Weight, cpu2 = temp.Weight;
        if (cpu2 > cpu1) {
            return 1;
        } else if (cpu2 < cpu1) {   // the remainder of this method was cut off in the printed listing
            return -1;
        } else {
            return 0;
        }
    }
} // class Host


130<br />

        System.out.println(CentralizedSchedulingJobOut);
        System.out.println(gassJob[0].doGetStatus());

        // if failed, resubmit it
        // waiting for the result
        System.out.println("\nWaiting for the centralized scheduling job to finish");
        do {
            stillRunningJob = false;
            if (jobListeners[0].stillActive()) {
                stillRunningJob = true;
            }
            if (jobListeners[0].fail()){
                System.out.println("Resubmit:" + CentralizedSchedulingRSL);
                gassJob[0] = new GassJob(centralmachine,false);
                CentralizedSchedulingJobOut =
                    gassJob[0].GlobusRun(CentralizedSchedulingRSL);
                jobListeners[0] = gassJob[0].getInteractiveJobListener();
                stillRunningJob = true;
            }//if
            System.out.print(".");
            delay(1000);
            jobs.updateJobId(0, gassJob[0].doGetJobId());
            jobs.updateStatus(0, gassJob[0].doGetStatus());
        } while (stillRunningJob);
        System.out.println("\n");

        /********************************
         * Decentralized scheduling
         ********************************/
        String gassJobOut;
        String deRSL;
        String theBestMachine;



        //request all these jobs
        for(int i=1; i<jobs.getSize(); i++){



                    gassJob[jobCount] = new GassJob(theBestMachine,false);
                    gassJobOut = gassJob[jobCount].GlobusRun(deRSL);
                    jobListeners[jobCount] =
                        gassJob[jobCount].getInteractiveJobListener();

                    //wait to receive a jobid
                    //update jobid for this Job
                    jobs.updateJobId(jobCount, gassJob[jobCount].doGetJobId());

                    //update machine that is used for this job
                    jobs.updateMachine(jobCount, theBestMachine);
                    jobs.updateStatus(jobCount, gassJob[jobCount].doGetStatus());

                    stillRunningJob = true;
                    delay(30000);
                }//if
            }//for
            System.out.print(".");
            delay(5000);
        } while (stillRunningJob);
        System.out.println("\n");
    }
}//main

GassJob.java

import org.globus.gram.*;
import org.gridforum.jgss.*;
import org.ietf.jgss.*;
import org.globus.security.gridmap.*;
import org.globus.io.gass.server.*;
import org.globus.util.deactivator.Deactivator;
import COM.claymoresystems.sslg.*;
import xjava.security.interfaces.*;
import cryptix.asn1.lang.*;

/**
 * Java CoG Job submission class
 **/
public class GassJob implements JobOutputListener
{
    private GassServer m_gassServer;      // GASS Server: required to get job output
    private String m_gassURL = null;      // URL of the GASS server
    private GramJob m_job = null;         // GRAM JOB to be executed
    private String m_jobOutput = "";      // job output as string
    private boolean m_batch = false;      // Submission modes: batch=do not wait for output
                                          //                   non-batch=wait for output.
    private String m_remoteHost = null;   // host where job will run
    private GSSCredential m_proxy = null; // Globus proxy used for authentication against gatekeeper

    InteractiveJobListener jobListeners;

    // Job output variables:
    // Used for non-batch mode jobs to receive output from
    // gatekeeper through the GASS server
    private JobOutputStream m_stdoutStream = null;
    private JobOutputStream m_stderrStream = null;

    private String m_jobid = null;        // Globus job id on the form:
                                          // https://server.com:39374/15621/1021382777/

    public GassJob(String Contact, boolean batch) {
        m_remoteHost = Contact;   // remote host
        m_batch = batch;          // submission mode
    }

    /**
     * Start the Globus GASS Server. Used to get the output from the server
     * back to the client.
     */
    private boolean startGassServer(GSSCredential proxy) {
        if (m_gassServer != null) return true;
        try {
            m_gassServer = new GassServer(proxy, 0);
            m_gassURL = m_gassServer.getURL();
        } catch(Exception e) {
            System.err.println("gass server failed to start!");
            e.printStackTrace();
            return false;
        }
        m_gassServer.registerDefaultDeactivator();
        return true;
    }


    /**
     * Init job out listeners for non-batch mode jobs.
     */
    private void initJobOutListeners() throws Exception {
        if ( m_stdoutStream != null ) return;
        // job output vars
        m_stdoutStream = new JobOutputStream(this);
        m_stderrStream = new JobOutputStream(this);
        m_jobid = String.valueOf(System.currentTimeMillis());
        // register output listeners
        m_gassServer.registerJobOutputStream("err-" + m_jobid, m_stderrStream);
        m_gassServer.registerJobOutputStream("out-" + m_jobid, m_stdoutStream);
        return;
    }

    /**
     * This method is used to notify the implementer when the status of a
     * GramJob has changed.
     *
     * @param job The GramJob whose status has changed.
     */
    public void statusChanged(GramJob job) {
        try {
            if ( job.getStatus() == GramJob.STATUS_DONE ) {
                // notify waiting thread when job ready
                m_jobOutput = "Job sent. url=" + job.getIDAsString();
                // if notify enabled return URL as output
                synchronized(this) {
                    notify();
                }
            }
        }
        catch (Exception ex) {
            System.out.println("statusChanged Error:" + ex.getMessage());
        }
    }


    /**
     * This method is used to get the status of the job
     */
    public String doGetStatus(){
        return jobListeners.doGetStatus();
    }

    /**
     * This method is used to get the id of the job
     */
    public String doGetJobId(){
        return m_job.getIDAsString();
    }

    public InteractiveJobListener getInteractiveJobListener(){
        return jobListeners;
    }

    /**
     * It is called whenever the job's output
     * has been updated.
     *
     * @param output new output
     */
    public void outputChanged(String output) {
        m_jobOutput += output;
    }

    /**
     * It is called whenever the job finished
     * and no more output will be generated.
     */
    public void outputClosed() {
    }

    public synchronized String GlobusRun(String RSL) {
        try {
            // load default Globus proxy. Java CoG kit must be installed
            // and a user certificate set up properly
            ExtendedGSSManager manager =
                (ExtendedGSSManager)ExtendedGSSManager.getInstance();
            m_proxy = manager.createCredential(GSSCredential.INITIATE_AND_ACCEPT);

            // Start GASS server
            if (! startGassServer(m_proxy)) {
                throw new Exception("Unable to start GASS server.");
            }

            // setup Job Output listeners
            initJobOutListeners();

            // Append GASS URL to job String so we can get some output back
            String newRSL = null;

            // if non-batch, then get some output back
            if ( !m_batch) {
                newRSL = "&" + RSL.substring(0, RSL.indexOf('&')) +
                    "(rsl_substitution=(GLOBUSRUN_GASS_URL " + m_gassURL + "))" +
                    RSL.substring(RSL.indexOf('&') + 1, RSL.length()) +
                    "(stdout=$(GLOBUSRUN_GASS_URL)/dev/stdout-" + m_jobid + ")" +
                    "(stderr=$(GLOBUSRUN_GASS_URL)/dev/stderr-" + m_jobid + ")";
            }
            else {
                // format batching RSL so output can be retrieved later on using any GTK commands
                newRSL = RSL +
                    "(stdout=x-gass-cache://$(GLOBUS_GRAM_JOB_CONTACT)stdout anExtraTag)"
                    + "(stderr=x-gass-cache://$(GLOBUS_GRAM_JOB_CONTACT)stderr anExtraTag)";
            }
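            // Illustrative example (not taken from the thesis): for an input RSL such as
            //   &(executable=/bin/hostname)
            // the non-batch branch above would produce roughly
            //   &(rsl_substitution=(GLOBUSRUN_GASS_URL <GASS server URL>))
            //    (executable=/bin/hostname)
            //    (stdout=$(GLOBUSRUN_GASS_URL)/dev/stdout-<m_jobid>)
            //    (stderr=$(GLOBUSRUN_GASS_URL)/dev/stderr-<m_jobid>)
            // so the gatekeeper streams the job's stdout/stderr back through the GASS server.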

            m_job = new GramJob(newRSL);

            // set proxy. CoG kit and user credentials must be installed and set
            // up properly
            m_job.setCredentials(m_proxy);

            // if non-batch then listen for output
            jobListeners = new InteractiveJobListener(false);
            m_job.addListener(jobListeners);

            System.out.println("Sending job request to: " + m_remoteHost);
            m_job.request(m_remoteHost, m_batch, false);
            m_jobOutput = "Job sent. url=" + m_job.getIDAsString();
        }

        catch (Exception ex) {
            if ( m_gassServer != null ) {
                // unregister from gass server
                m_gassServer.unregisterJobOutputStream("err-" + m_jobid);
                m_gassServer.unregisterJobOutputStream("out-" + m_jobid);
            }
            m_jobOutput = "Error submitting job: " + ex.getClass() + ":"
                          + ex.getMessage();
        }
        // cleanup
        //Deactivator.deactivateAll();
        return m_jobOutput;
    }
}//class GassJob
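A minimal submission sketch using the GassJob class above (illustrative only: the host name and RSL string are placeholders, the class name GassJobExample is hypothetical, and a valid Globus proxy must already be available to the Java CoG kit):

public class GassJobExample {
    public static void main(String[] args) {
        // non-batch mode: output is streamed back through the GASS server
        GassJob job = new GassJob("m1.sched.grid.com", false);
        String out = job.GlobusRun("&(executable=/bin/hostname)");
        System.out.println(out);
        System.out.println("Status: " + job.doGetStatus());
    }
}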

InteractiveJobListener.java

import java.io.*;
import org.globus.gram.Gram;
import org.globus.gram.GramJob;
import org.globus.gram.GramException;
import org.globus.gram.WaitingForCommitException;
import org.globus.gram.GramJobListener;

class InteractiveJobListener extends JobListener {

    private boolean quiet;
    private boolean finished = false;
    private boolean fail = false;
    private String strStatus = "";

    public InteractiveJobListener(boolean quiet) {
        this.quiet = quiet;
    }

    public boolean stillActive() {
        return !this.finished;
    }

    public boolean fail(){
        return this.fail;
    }

    // waits for DONE or FAILED status
    public synchronized void waitFor() throws InterruptedException {
        while (!finished) {
            wait();
        }
    }

    public synchronized String doGetStatus(){
        return strStatus;
    }

    public synchronized void statusChanged(GramJob job) {
        if (!quiet) {
            System.out.println("Job: " + job.getStatusAsString());
        }
        status = job.getStatus();
        strStatus = job.getStatusAsString();
        if (status == GramJob.STATUS_DONE) {
            finished = true;
            error = 0;
            notify();
        } else if (job.getStatus() == GramJob.STATUS_FAILED) {
            finished = true;
            fail = true;
            error = job.getError();
            notify();
        }
    }
}//class InteractiveJobListener
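Besides the polling loop used in the scheduling program (stillActive() together with delay()), the listener can also block until the job ends; a short sketch (the class name WaitForJob is hypothetical and it assumes a job has already been submitted with this listener attached):

public class WaitForJob {
    public static void waitAndReport(InteractiveJobListener listener) throws InterruptedException {
        listener.waitFor();   // blocks until the job reaches DONE or FAILED
        if (listener.fail()) {
            System.out.println("Job failed, Globus error code " + listener.getError());
        } else {
            System.out.println("Job finished: " + listener.doGetStatus());
        }
    }
}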

JobListener.java

import org.globus.gram.GramJob;
import org.globus.gram.GramJobListener;

abstract class JobListener implements GramJobListener {

    protected int status = 0;
    protected int error = 0;

    public abstract void waitFor() throws InterruptedException;

    public int getError() {
        return error;
    }

    public int getStatus() {
        return status;
    }

    public boolean isFinished() {
        return (status == GramJob.STATUS_DONE || status == GramJob.STATUS_FAILED);
    }
}

Jobs.java

import java.util.*;

public class Jobs{

    public static ArrayList ar;

    public Jobs() {
        ar = new ArrayList();
        ar.add(new Job("centralizedscheduling","",
            "& (executable =/usr/study/coursescheduling/centralizedscheduling)",
            "m2.sched.grid.com","",0));
        ar.add(new Job("decentralizedschedulingER","",
            "& (executable =/usr/study/coursescheduling/decentralizedscheduling.exe) (arguments=ER)",
            "","",0));
        ar.add(new Job("decentralizedschedulingSC","",
            "& (executable =/usr/study/coursescheduling/decentralizedscheduling.exe) (arguments=SC)",
            "","",0));
        ar.add(new Job("decentralizedschedulingED","",
            "& (executable =/usr/study/coursescheduling/decentralizedscheduling.exe) (arguments=ED)",
            "","",0));
    }

    //get a job that has index i
    public Job getJob(int i){
        return (Job) ar.get(i);
    }

    public int getSize(){
        return (int) ar.size();
    }



    //get RSL of Job having index i
    public String getRSL(int i){
        Job ob = getJob(i);
        return ob.getRSL();
    }

    //get Machine of Job having index i
    public String getMachine(int i){
        Job ob = getJob(i);
        return ob.getMachine();
    }

    //get Status of Job having index i
    public String getStatus(int i){
        Job ob = getJob(i);
        return ob.getStatus();
    }

    //update a new jobid for the job that has index i
    public void updateJobId(int i, String jobid ){
        Job oldJob = getJob(i);
        ar.set(i, new Job( oldJob.getJobName(),
                           jobid,
                           oldJob.getRSL(),
                           oldJob.getMachine(),
                           oldJob.getStatus(),
                           oldJob.getExectime()));
    }

    //update a new machine for the job that has index i
    public void updateMachine(int i, String machine){
        Job oldJob = getJob(i);
        ar.set(i, new Job( oldJob.getJobName(),
                           oldJob.getJobId(),
                           oldJob.getRSL(),
                           machine,
                           oldJob.getStatus(),
                           oldJob.getExectime()));
    }


    //update a new status for the job that has index i
    public void updateStatus(int i, String status){
        Job oldJob = getJob(i);
        ar.set(i, new Job( oldJob.getJobName(),
                           oldJob.getJobId(),
                           oldJob.getRSL(),
                           oldJob.getMachine(),
                           status,
                           oldJob.getExectime()));
    }

    public void displayJobs(){
        for(int i=0; i<ar.size(); i++){
            System.out.println(getJob(i));
        }
    }
}//class Jobs



class Job {

    private String jobname;
    private String jobid;
    private String RSL;
    private String machine;
    private String status;
    private int exectime;

    public Job(String jobname, String jobid, String RSL, String machine, String status, int exectime){
        this.jobname = jobname;
        this.jobid = jobid;
        this.RSL = RSL;
        this.machine = machine;
        this.status = status;
        this.exectime = exectime;
    }

    public String getJobName(){
        return jobname;
    }

    public String getRSL(){
        return RSL;
    }

    public String getJobId(){
        return jobid;
    }

    public String getMachine(){
        return machine;
    }

    public String getStatus(){
        return status;
    }

    public int getExectime(){
        return exectime;
    }

    public void updateJobId(String jobid ){
        this.jobid = jobid;
    }

    public void updateMachine(String machine ){
        this.machine = machine;
    }

    public void updateStatus(String status){
        this.status = status;
    }

    public String toString() {
        return jobname + "\t" + machine + "\t" + status + "\t" + exectime;
    }
}//class Job
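A small usage sketch of the job table (the class name JobsExample and the reassignment are illustrative; the four jobs are the ones created in the Jobs constructor above):

public class JobsExample {
    public static void main(String[] args) {
        Jobs jobs = new Jobs();                      // one centralized + three decentralized jobs
        System.out.println("Total jobs: " + jobs.getSize());
        jobs.displayJobs();                          // jobname, machine, status, exectime
        jobs.updateMachine(1, "m3.sched.grid.com");  // illustrative reassignment of the ER job
        jobs.displayJobs();
    }
}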



BIOGRAPHY

Name         : Mr. Nguyen Cong Danh
Thesis Title : Course Scheduling in Multiple Faculties Using a Grid Computing
               Environment
Major Field  : Information Technology

Biography

I graduated with a bachelor’s degree in Computer Science from Cantho
University (Vietnam) in 2000.

My contact address is 1 Ly Tu Trong street, Ninh Kieu district, Cantho city,
Vietnam. My e-mail address is ncdanh@cit.ctu.edu.vn.
