
COURSE SCHEDULING IN MULTIPLE FACULTIES USING
A GRID COMPUTING ENVIRONMENT

MR. NGUYEN CONG DANH

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS
FOR THE DEGREE OF MASTER OF SCIENCE (INFORMATION TECHNOLOGY)

GRADUATE COLLEGE

KING MONGKUT'S INSTITUTE OF TECHNOLOGY NORTH BANGKOK

ACADEMIC YEAR 2005

ISBN 974-19-0543-2

COPYRIGHT OF KING MONGKUT'S INSTITUTE OF TECHNOLOGY NORTH BANGKOK


Name : Mr. Nguyen Cong Danh

Thesis Title : Course Scheduling in Multiple Faculties Using a Grid Computing Environment

Major Field : Information Technology

King Mongkut's Institute of Technology North Bangkok

Thesis Advisor : Assistant Professor Dr. Yaowadee Temtanapat

Academic Year : 2005

Abstract

Course scheduling for multiple-faculty universities is a large and complex problem. In these universities, each faculty wants its own timetable for its own resources. However, lecturers, courses, rooms, and other resources can be shared between faculties, so the data used for course scheduling must be shared across the university. As a result, constraint conflicts in the timetable can occur not only within each faculty but also across faculties, which makes the course scheduling problem more difficult to solve. This study proposes a hybrid centralized and decentralized approach to course scheduling that uses a genetic algorithm and a grid computing environment. The genetic algorithm is used to satisfy the hard and soft constraints, while the grid computing environment serves as the infrastructure for distributed and parallel computing. The results of this research indicate that the proposed system can satisfy most of the required constraints and that grid computing significantly improves the computing performance of the whole system.

(Total 145 pages)

___________________________________________________________Chairperson


Name : Mr. Nguyen Cong Danh

Thesis Title : Course Scheduling for Multiple-Faculty Universities Using a Grid Computing Environment

Major Field : Information Technology

King Mongkut's Institute of Technology North Bangkok

Thesis Advisor : Assistant Professor Dr. Yaowadee Temtanapat

Academic Year : 2548 B.E. (2005)

Abstract

Course scheduling for universities with multiple faculties is a large and complex problem. In these universities, each faculty wants its own timetable that uses the resources it owns. However, lecturers, courses, rooms, and other resources can still be shared, so the data for course scheduling must be shared as well. As a result, constraint conflicts arise not only within the timetable of one faculty but also between faculties, which makes course scheduling in these universities even more difficult. This study therefore proposes an approach that combines centralized and decentralized course scheduling, using a genetic algorithm together with a grid computing environment. The genetic algorithm is used to resolve the hard constraints and the soft constraints, while the grid computing environment serves as the foundation for distributed and parallel processing. The results of the research indicate that the proposed system can resolve most of the constraints and that grid computing noticeably improves the processing performance of the whole system.

(The thesis has 145 pages in total)

_______________________________Chairperson of the Thesis Advisory Committee


ACKNOWLEDGEMENTS

First and foremost, I would like to thank Assistant Professor Dr. Yaowadee Temtanapat for her support and encouragement throughout my time at King Mongkut's Institute of Technology North Bangkok (KMITNB). I deeply appreciate not only her intelligence, knowledge, and willingness to provide guidance for my thesis, but also her sense of humor and her enthusiasm.

Grateful acknowledgements are addressed to Assistant Professor Dr. Utomporn Phalavonk, Assistant Professor Dr. Phayung Meesad, Dr. Gareth Clayton, and other members of the program committee for their valuable and constructive comments on this thesis.

I wish to express my gratitude to all the teachers and staff at KMITNB for their knowledge, encouragement, and support during my study.

Thanks to my friends and fellow graduate students for their encouragement. They also made my time at KMITNB and in Thailand an enjoyable experience.

My most sincere thanks go to my parents, who have always believed in me and encouraged me over the past two years.

Last but certainly not least, I am especially indebted to my scholarship provider "DTEC" for the financial support that gave me the opportunity to study at KMITNB.

Nguyen Cong Danh


TABLE OF CONTENTS

Abstract (in English)
Abstract (in Thai)
Acknowledgements
List of Tables
List of Figures
Chapter 1. Introduction
    1.1 Problem Statement and Background
    1.2 The Objectives of the Study
    1.3 The Scope of the Study
    1.4 The Utilizations of the Study
Chapter 2. Literature Review
    2.1 The Course Scheduling Problems
    2.2 The Related Works on Course Scheduling Problems
    2.3 Genetic Algorithms
    2.4 Grid Computing
    2.5 Summary
Chapter 3. Methodology
    3.1 System Development
    3.2 Problem Definition
    3.3 The System Boundary
    3.4 The Proposed Course Scheduling System
    3.5 The Database Design
    3.6 The Proposed Genetic Algorithm
    3.7 The System for Experiment
    3.8 The Grid Components
Chapter 4. Experimental Results
    4.1 The Data for the Experiments
    4.2 The Experiments and Discussions
    4.3 The Sample Results
Chapter 5. Conclusion
    5.1 Conclusions
    5.2 Future Works
References
Appendix A
Appendix B
Appendix C
Appendix D
Appendix E
Biography


LIST OF TABLES

2-1 Courses taught by a department
2-2 Teaching assignment
2-3 Sample timetable
2-4 Tentative list of tools for grid computing
4-1 Courses fulfilled by each class
4-2 Lecturer and classroom assignment
4-3 Timetable created by the centralized scheduling program
4-4 Timetable created by the decentralized scheduling program for Faculty of Engineering
4-5 Timetable created by the decentralized scheduling program for Faculty of Science
A-1 Faculty
A-2 Department
A-3 Lecturer
A-4 Busy Time
A-5 Building
A-6 Classroom
A-7 Classroom group
A-8 Department controls classroom
A-9 Course
A-10 Program
A-11 Curriculum
A-12 Class
A-13 Course section
A-14 Timetable
B-1 Host names, IP addressing, and software
B-2 Group, user ID and password
B-3 Distinguished name and passphrase


LIST OF FIGURES

1-1 Shared lecturers, courses, and classrooms
1-2 Outline of the basic genetic algorithm
1-3 Sample timetable for a classroom
2-1 Graph of 12 events
2-2 Graph after coloring
2-3 Local optimal problem
2-4 Simulated annealing algorithm
2-5 Tabu search algorithm
2-6 Multi agent system
2-7 Encoding chromosome
2-8 Example of crossover
2-9 Example of mutation
2-10 Roulette wheel selection
2-11 Rank selection
2-12 Application consists of jobs: B, C, D, and E executed in parallel
2-13 Application consists of jobs that are networked
2-14 Components of Globus Toolkit 2.2
2-15 Simple LDAP configuration
2-16 Grid components: a high-level perspective
3-1 Shared classrooms in a multiple faculty university
3-2 Use case diagram of the course scheduling system
3-3 Proposed system
3-4 System architecture
3-5 Entity relation diagram
3-6 High level representation of the proposed genetic algorithm
3-7 Sub-timetable of a classroom
3-8 Chromosome
3-9 Population
3-10 Creating constraint data
3-11 Algorithm for initializing a random population
3-12 Pseudo code for creating a random chromosome
3-13 Pseudo code for checking small classroom conflicts
3-14 Pseudo code for checking lecturer's busy time
3-15 Pseudo code for detecting conflicts about preferable times
3-16 Pseudo code for checking conflicts about double scheduled lecturers
3-17 Pseudo code for checking conflicts about double scheduled classes
3-18 Pseudo code for checking conflicts about double scheduled courses
3-19 Crossover
3-20 Pseudo code for crossover
3-21 Mutation
3-22 Pseudo code for mutating a chromosome
3-23 Hardware and software for each machine
3-24 MDS configuration
3-25 Working with a broker
3-26 Centralized scheduling
3-27 Job scheduler for the grid computing environment
3-28 Overview of GRAM and GASS
4-1 The average fitness value of hard constraints vs various weights
4-2 The average fitness value of soft constraints vs various weights
4-3 The average execution time for a resultant solution vs population sizes
4-4 The GA with various mutation rates
4-5 The execution time versus various models
4-6 Parallel execution versus serial execution
C-1 Visual-grid-proxy-init
C-2 Service configuration
C-3 Result in the web browser


CHAPTER 1

INTRODUCTION

1.1 Problem Statement and Background

1.1.1 Problem Statement

Course scheduling problems are very common but very difficult to solve in practice. They are constraint optimization problems and are NP-hard: given resources must be allocated to objects in space and time, subject to constraints, in such a way as to satisfy as nearly as possible a set of desirable objectives [1, 2, 3]. Courses are assigned to times and classrooms so that lecturers can teach and students can attend them without any conflicts. A large body of research has addressed these problems [1, 2, 3]. However, most of it has focused on universities in which resources are not separated between faculties. Course scheduling for a multiple-faculty university still needs more research [4, 5].

[Figure: Faculty 1, Faculty 2, ..., Faculty n each have their own lecturers, classrooms, courses, and timetable; lecturers, courses, and classrooms are shared between the faculties.]

FIGURE 1-1 Shared lecturers, courses, and classrooms

Course scheduling becomes more complex in a multiple-faculty university, where each faculty has its own resources such as lecturers, courses, and classrooms, as illustrated in Figure 1-1. Moreover, these resources can be shared between faculties. Lecturers working in one faculty can teach courses of other faculties. Courses can be attended by students who come from different faculties. Classrooms are sometimes shared between faculties. Each faculty nevertheless needs its own timetable for its own resources. As a result, many problems related to the shared resources remain in course scheduling.

Course scheduling itself involves a large number of conflicts and requires a large amount of processing time. For course scheduling across multiple faculties, the data used for scheduling must also be collected and shared across the faculties. This study applies a hybrid centralized and decentralized approach, a genetic algorithm, and a grid computing environment to the course scheduling problem in multiple-faculty universities. The hybrid approach and the genetic algorithm are used to solve the NP-hard problem, and the grid computing environment is used as the infrastructure for distributed and parallel computing.

1.1.2 Background

The genetic algorithm (GA) is a global search optimization algorithm that explores many points of the search space in parallel. While searching for solutions, the GA uses a fitness function that steers the direction of the search [6]. The GA evolves the population by using genetic operators such as selection, crossover, and mutation. The outline of the basic GA is presented in Figure 1-2.

1 [Start] Generate a random population of n chromosomes.

2 [Fitness] Evaluate the fitness f(x) of each chromosome x in the population.

3 [New population] Create a new population by repeating the following steps until the new population is complete.

3.1 [Selection] Select two parent chromosomes from the population according to their fitness (the better the fitness, the bigger the chance of being selected).

3.2 [Crossover] With a crossover rate, cross over the parents to form new offspring (children). If no crossover is performed, the offspring are exact copies of the parents.

3.3 [Mutation] With a mutation rate, mutate the new offspring at each locus (position in the chromosome).

3.4 [Accepting] Place the new offspring in the new population.

4 [Replace] Use the newly generated population for a further run of the algorithm.

5 [Test] If the end condition is satisfied, stop and return the best solution in the current population.

6 [Loop] Go to step 2.

FIGURE 1-2 Outline of the basic genetic algorithm [6]
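The outline in Figure 1-2 can be sketched in a few lines of Python. This is only an illustrative sketch and not the GA proposed in this thesis: it assumes bit-string chromosomes, fitness-proportional selection, one-point crossover, a non-negative fitness function, and a fixed number of generations as the end condition. The function name `run_ga` and all parameter values are hypothetical.

```python
import random

def run_ga(fitness, n_genes, pop_size=20, crossover_rate=0.8,
           mutation_rate=0.02, generations=100, seed=0):
    """Basic GA over bit strings, following steps 1-6 of Figure 1-2.

    Assumes n_genes >= 2 and fitness(chromosome) >= 0.
    """
    rng = random.Random(seed)
    # Step 1 [Start]: random population of pop_size chromosomes.
    population = [[rng.randint(0, 1) for _ in range(n_genes)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    # Step 5 [Test] is simplified here to a fixed generation budget.
    for _ in range(generations):
        # Step 2 [Fitness]: evaluate every chromosome.
        weights = [fitness(c) + 1e-9 for c in population]
        new_population = []
        # Step 3 [New population]: repeat until the new population is full.
        while len(new_population) < pop_size:
            # 3.1 [Selection]: fitness-proportional choice of two parents.
            p1, p2 = rng.choices(population, weights=weights, k=2)
            c1, c2 = p1[:], p2[:]
            # 3.2 [Crossover]: one-point crossover with the given rate.
            if rng.random() < crossover_rate:
                cut = rng.randint(1, n_genes - 1)
                c1, c2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            for child in (c1, c2):
                # 3.3 [Mutation]: flip each locus with the mutation rate.
                for i in range(n_genes):
                    if rng.random() < mutation_rate:
                        child[i] = 1 - child[i]
                # 3.4 [Accepting]: place the offspring in the new population.
                new_population.append(child)
        # Step 4 [Replace]: the new population is used for the next run.
        population = new_population[:pop_size]
        best = max(population + [best], key=fitness)
    return best

# Toy objective: maximize the number of 1-bits in a 12-gene chromosome.
best = run_ga(fitness=sum, n_genes=12)
```

The toy fitness function (`sum`) only exercises the loop; a timetabling GA would instead score a chromosome by its constraint violations.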


The GA is based on the principle of survival of the fittest: the fittest members of the population are used to produce the solution. Individuals selected according to their fitness level in the problem domain create the next set of solutions. The GA is an iterative process that is repeated until the convergence criterion is satisfied.

Grid computing, most simply stated, is distributed computing. The goal is to create the illusion of a simple yet large and powerful self-managing virtual computer out of a large collection of connected heterogeneous systems sharing various combinations of resources [7].

Not all applications are suitable for grid computing. We need to consider whether an application can run in a grid environment, where resources are dynamically allocated based on actual needs. Normally, an application consists of jobs that can be executed in parallel, serially, or in a networked fashion. If an application consists of several jobs that can be executed in parallel, a grid may be very suitable for effective execution on dedicated nodes, especially when there is no or very limited exchange of data among the jobs [8].
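The job structure described above, several independent jobs with no data exchanged between them, can be illustrated on a single machine. The sketch below is only a local stand-in that uses a Python thread pool; it does not use a grid or the Globus Toolkit, and the job function and its workloads are hypothetical. The job names B, C, D, and E follow the caption of Figure 2-12.

```python
from concurrent.futures import ThreadPoolExecutor

def job(name, n):
    """Stand-in for an independent unit of work that shares no data with other jobs."""
    return name, sum(i * i for i in range(n))

def run_parallel(jobs, workers=4):
    """Submit every job independently and gather the results once all have finished."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(job, name, n) for name, n in jobs]
        return dict(f.result() for f in futures)

results = run_parallel([("B", 10), ("C", 100), ("D", 1000), ("E", 10000)])
```

Because no job reads another job's output, the jobs can be dispatched in any order and to any worker, which is the property that makes such an application a good candidate for a grid.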

1.2 The Objectives of the Study

The objectives of this study are as follows:

1.2.1 To provide a system that helps multiple-faculty universities solve their course scheduling problems.

1.2.2 To investigate the use of the proposed GA and the grid computing environment for the course scheduling problem in multiple-faculty universities.

1.3 The Scope of the Study

The scope of this study is defined as follows:

1.3.1 The system must satisfy the following hard constraints:

1.3.1.1 Every course must be scheduled exactly once a week.

1.3.1.2 For courses at each faculty, the days of the week are Monday, Tuesday, Wednesday, Thursday, and Friday. Each day has 8 time-slots, which cover the hours 08:00-12:00 and 13:00-17:00. No course is scheduled across the morning and afternoon sessions. Figure 1-3 presents a sample timetable for a classroom.


Classroom i

Time-slot Hour Mon Tue Wed Thu Fri

0 08:00-09:00 Course 1 Course 3 Course 15

1 09:00-10:00 Course 1 Course 4 Course 3 Course 15

2 10:00-11:00 Course 1 Course 4 Course 2 Course 15

3 11:00-12:00 Course 2 Course 15

4 13:00-14:00 Course 8 Course 5 Course 6 Course 7

5 14:00-15:00 Course 8 Course 5 Course 6 Course 7

6 15:00-16:00 Course 13 Course 5 Course 19 Course 7

7 16:00-17:00 Course 13 Course 19 Course 7

FIGURE 1-3 Sample timetable for a classroom

1.3.1.3 No class, lecturer, or classroom is assigned to more than one course at the same time.

1.3.1.4 Each course must be booked into a classroom that is large enough to hold the students of that course.

1.3.1.5 In each semester, each class of students studies a list of courses from the curriculum. All these courses must be scheduled at different times in each week so that all students in that class can attend.

1.3.1.6 If a course is attended by students who come from different classes, it must be scheduled so that these students can attend this course and their other courses without any time conflicts.

1.3.1.7 Each lecturer can teach courses in his/her own faculty and in other faculties.

1.3.1.8 Lecturers can declare working sessions in a week during which they are unavoidably unavailable. For instance, Dr. Tim cannot teach on Monday morning because of a weekly meeting. Therefore, his courses must be scheduled at another time.

1.3.1.9 Each course must be booked into a classroom of a designated classroom group.

1.3.2 The system tries to satisfy the following soft constraint as far as possible:

The system avoids booking lecturers' courses at times they do not want.


Unlike the hard constraint in section 1.3.1.8, which the system must satisfy, the soft constraint is satisfied as far as possible. A few violations of this soft constraint in the resultant solution are acceptable.

All hard and soft constraints are applied to all timetables in all faculties.

1.3.3 The Globus Toolkit 2.2 is used as middleware to implement the grid computing environment [7, 8].

1.3.4 The efficiency of the proposed GA and the grid computing environment will be evaluated and discussed with respect to the following:

1.3.4.1 The suitability of the proposed GA against the hard constraints and soft constraints.

1.3.4.2 Performance measurement of using grid computing vs. not using grid computing.
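The time model of hard constraint 1.3.1.2 (five days, eight one-hour time-slots per day, and no course crossing the morning and afternoon sessions) can be made concrete with a short sketch. It is an illustration under assumptions rather than the system's implementation: a course is assumed to occupy a run of consecutive slots, and the names `valid_starts` and `slot_label` are hypothetical.

```python
DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri"]
SLOTS_PER_DAY = 8   # slots 0-3 cover 08:00-12:00, slots 4-7 cover 13:00-17:00
SESSION_LEN = 4     # slots per morning or afternoon session

def valid_starts(duration):
    """Return every (day, start_slot) at which a course of `duration`
    consecutive slots fits entirely inside one session."""
    starts = []
    for day in DAYS:
        for start in range(SLOTS_PER_DAY):
            end = start + duration - 1
            same_session = start // SESSION_LEN == end // SESSION_LEN
            if end < SLOTS_PER_DAY and same_session:
                starts.append((day, start))
    return starts

def slot_label(slot):
    """Map a slot index to its hour range, skipping the 12:00-13:00 break."""
    hour = 8 + slot if slot < SESSION_LEN else 9 + slot
    return f"{hour:02d}:00-{hour + 1:02d}:00"
```

Under these assumptions a three-slot course has four legal start slots per day (0, 1, 4, and 5), and a course longer than four slots cannot be placed at all.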

1.4 The Utilizations of the Study

1.4.1 To provide a system that helps multiple-faculty universities resolve their course scheduling problems.

1.4.2 To investigate the efficiency of using a genetic algorithm and grid computing for the course scheduling problem in a multiple-faculty university.


CHAPTER 2

LITERATURE REVIEW

In this chapter, course scheduling problems, related works, genetic algorithms, and grid computing are reviewed. Section 2.1 describes the activities needed to prepare data for course scheduling. Section 2.2 describes the related works, including existing research. Section 2.3 presents the basic knowledge about genetic algorithms. Finally, section 2.4 presents grid computing and the Globus Toolkit 2.2.

2.1 The Course Scheduling Problems

Course scheduling is a part of the general scheduling problem. It deals with the satisfactory allocation of resources over time to achieve an organization's tasks. It is a decision-making process with the intention of optimizing one or more objectives.

In any optimization problem, there are objectives, decisions to make, available resources, and related constraints. In the course scheduling problem, the available resources are lecturers, students, courses, classrooms, and time periods. A solution must group these resources together to create a timetable that satisfies the constraints.

There are two types of constraints: hard constraints and soft constraints. Hard constraints are conditions that must be satisfied; for example, no two distinct courses can be held at the same time in the same classroom. Soft constraints may be violated but should be satisfied as far as possible; for example, some lecturers dislike teaching at certain times.

Course scheduling systems usually vary considerably between universities, depending on each university's set of hard and soft constraints and on its management requirements. This section introduces the activities needed for a basic course scheduling problem. A particular course scheduling system is introduced in detail in chapter 3.


2.1.1 General Activities for Course Scheduling

Each university usually has a central course scheduling office with experienced staff. In each department of the faculties, several staff members have similar responsibilities. The course scheduling activities need the cooperation of all these staff.

2.1.2 The Activities of Staff in Departments of Each Faculty

Each department is responsible for teaching many courses. To prepare the data for course scheduling, each department has to make a teaching plan. The departments have to know the list of courses and the classes that will study them. The departments then make an assignment based on their own resources such as lecturers and classrooms. The availability of lecturers is sometimes subject to change. For instance, some lecturers are in training or become bored when teaching the same course every semester. Some courses sometimes need lecturers from other faculties. Table 2-1 shows an example of courses taught by a department.

TABLE 2-1 Courses taught by a department

Course   Class     Number of Students   Section   Lecturer   Classroom Group
CSC211   BSCS04A   30
CSC211   BSCS05B   35
CSC221   BSCS04A   30
CSC210   BSCS04A   30
CSC110   BSCS04A   30
CSC113   BSCS04A   30
CSC113   BSCS04B   35

In this case, a class is a group of students who study the same program and have the same enrolment year. A classroom group is a group of classrooms that have the same function. A course will be scheduled into a classroom of a designated classroom group. Each department knows how many students will study a particular course, which helps the department separate the courses into a suitable number of sections. A section with too many students usually makes it difficult for a lecturer to teach effectively. However, if the department does not have enough classrooms or lecturers, a section with a large number of students is acceptable. Finally, an assignment is created for each department, as shown in Table 2-2.

TABLE 2-2 Teaching assignment

Course   Class     Number of Students   Section   Lecturer   Classroom Group
CSC211   BSCS04A   30                   1         00020      CSCCOMLB
CSC211   BSCS05B   35                   2         00020      CSCCOMLB
CSC221   BSCS04A   30                   1         00012      CSCLECRM
CSC210   BSCS04A   30                   1         00012      CSCLECRM
CSC110   BSCS04A   30                   1         00015      CSCLECRM
CSC113   BSCS04A   30                   1         00023      CSCCOMLB
CSC113   BSCS04B   35                   1         00023      CSCCOMLB

In Table 2-2, course CSC211 is studied by two different classes, BSCS04A and BSCS05B, and it is divided into two distinct sections, 1 and 2. Course CSC113 is also studied by two different classes, BSCS04A and BSCS04B, but both classes are mixed in the same section. CSC211 and CSC113 use classrooms in group CSCCOMLB, whereas CSC221, CSC210, and CSC110 use classrooms in group CSCLECRM.

2.1.3 Activities of Staff at the Central Course Scheduling Office

After the central course scheduling office receives all data from the departments, its staff run the course scheduling system to create a timetable. Booking the sections of courses into time-slots in the timetable is a hard job. Its complexity depends on the complexity of the constraints and rules of each university. Table 2-3 presents a sample timetable.

The timetable has to satisfy the constraints. Lecturers who teach several sections have to be scheduled so that they can teach their sections without any time conflict. One classroom cannot hold more than one section at the same time. When a class studies many different courses, these courses also have to be scheduled at different times. The other constraints must be satisfied as well.

TABLE 2-3 Sample timetable

Course   Section   Time          Day   Classroom   Lecturer
CSC211   1         13:00-16:00   W     B304A01     00020
CSC211   2         8:00-11:00    W     B304A01     00020
CSC221   1         10:00-12:00   T     B304A05     00012
CSC210   1         13:00-16:00   M     B304A02     00012
CSC110   1         9:00-12:00    F     B304A02     00015
CSC113   1         13:00-16:00   T     B304A05     00023
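The lecturer and classroom constraints described above can be checked mechanically. The following sketch encodes the rows of Table 2-3 and reports any pair of bookings that overlaps in time while sharing a classroom or a lecturer. The `Booking` type and the function names are assumptions introduced for illustration, times are reduced to whole hours, and the check for classes of students is omitted because Table 2-3 does not list classes.

```python
from itertools import combinations
from typing import NamedTuple

class Booking(NamedTuple):
    course: str
    section: int
    start: int      # start hour on a 24-hour clock
    end: int        # end hour (exclusive)
    day: str
    classroom: str
    lecturer: str

# Rows of Table 2-3.
TIMETABLE = [
    Booking("CSC211", 1, 13, 16, "W", "B304A01", "00020"),
    Booking("CSC211", 2, 8, 11, "W", "B304A01", "00020"),
    Booking("CSC221", 1, 10, 12, "T", "B304A05", "00012"),
    Booking("CSC210", 1, 13, 16, "M", "B304A02", "00012"),
    Booking("CSC110", 1, 9, 12, "F", "B304A02", "00015"),
    Booking("CSC113", 1, 13, 16, "T", "B304A05", "00023"),
]

def overlaps(a, b):
    """True when two bookings fall on the same day and their hours intersect."""
    return a.day == b.day and a.start < b.end and b.start < a.end

def find_conflicts(bookings):
    """Return (resource, course_a, course_b) for each double-booked
    classroom or lecturer."""
    conflicts = []
    for a, b in combinations(bookings, 2):
        if overlaps(a, b):
            if a.classroom == b.classroom:
                conflicts.append(("classroom", a.course, b.course))
            if a.lecturer == b.lecturer:
                conflicts.append(("lecturer", a.course, b.course))
    return conflicts

conflicts = find_conflicts(TIMETABLE)
```

For the sample timetable the list of conflicts is empty. Moving CSC113 to 10:00-12:00 on the same day would produce a classroom conflict with CSC221, since both use B304A05.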

2.2 The Related Works on Course Scheduling Problems

Course scheduling is a multi-dimensional NP-complete problem that has been the subject of hundreds of papers and that thousands of researchers have attempted to solve. In this section, we discuss some of the primary approaches that have been applied to general scheduling problems for courses and exams. In practice, the main ideas used for course scheduling can be applied to exam scheduling and vice versa. The approaches can be divided into four groups: sequential methods, cluster methods, constraint based methods, and meta-heuristic methods [9].

2.2.1 Sequential Methods

Sequential methods order the events for scheduling using heuristics (often graph coloring heuristics). They assign the ordered events to valid time periods so that no events in a period are in conflict with each other, i.e. two events that require the same resource are not scheduled in the same time period [10].

The graph coloring approach usually represents events as vertices, with an edge between two vertices when the two respective events conflict in some way. Graph coloring is the process of allocating a color to each vertex so that no two adjacent (conflicting) vertices have the same color.


The vertices represent the courses, and the edges connect courses that conflict with each other. For instance, two courses conflict if there is a student who must attend both, because they cannot then be held at the same time. Coloring the graph then assigns courses to appropriate periods such that conflicts are avoided [11].

FIGURE 2-1 Graph of 12 events

The final result of coloring can be presented by a three-color graph (the colors are denoted by three different shapes), shown in Figure 2-2.

FIGURE 2-2 Graph after coloring

This result means that the timetable may be constructed in three periods, one period per color. For larger timetables or graphs, finding such a minimal coloring is much harder, since the graph coloring problem is NP-complete. Many researchers have used heuristic algorithms to find a reasonable coloring, if not an optimal one [12-13].
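A common heuristic of this kind is greedy coloring in largest-degree-first order. The sketch below is one such heuristic, shown under stated assumptions and not a specific algorithm from [12-13]: the conflict graph is a small hypothetical example with five events (not the 12-event graph of Figure 2-1), and the color index of each event is read as its time period.

```python
def greedy_coloring(conflicts):
    """Assign each event the smallest period not used by a conflicting event.

    conflicts: dict mapping each event to the set of events it conflicts with.
    Returns a dict mapping each event to a period index (0, 1, 2, ...).
    """
    # Heuristic ordering: events with the most conflicts are placed first.
    order = sorted(conflicts, key=lambda v: (-len(conflicts[v]), v))
    period = {}
    for event in order:
        used = {period[u] for u in conflicts[event] if u in period}
        p = 0
        while p in used:
            p += 1
        period[event] = p
    return period

# Hypothetical conflict graph: A, B, and C conflict pairwise; D and E form a chain.
CONFLICTS = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C", "E"},
    "E": {"D"},
}
periods = greedy_coloring(CONFLICTS)
```

The greedy pass always yields a valid coloring, but it gives no guarantee that the number of periods is minimal. In this small example three periods are unavoidable because A, B, and C conflict pairwise.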


2.2.2 Cluster methods

Cluster methods split the set of events into groups that are conflict-free and then assign the groups to time periods so as to fulfill the other constraints imposed on the scheduling problem [14]. This technique can be applied to schedule courses or exams. The multiphase exam scheduling package described by Arani et al. consists of three phases [15]. In the first phase, clusters of exams are formed with the aim of minimizing the number of students with simultaneous exams. In the second phase, these clusters are assigned to exam days while minimizing the number of students with two or more exams per day. Finally, the exam days and clusters are arranged to minimize the number of students with consecutive exams.

The main drawback of these approaches is that the clusters of events are formed and fixed at the beginning of the algorithm, which may result in a poor-quality timetable.

2.2.3 Constraint Based Methods<br />

A constraint satisfaction problem (CSP) can be expressed in the following form.<br />

Given a set of variables, a set of possible values that can be assigned to each variable,<br />

and a list of constraints, solving the CSP means finding values of the variables that satisfy<br />

every constraint. For example, given $x = \{x_1, x_2, x_3\}$ with possible values of $x_1$, $x_2$, and $x_3$<br />

in $[0..100]$, find $x_1$, $x_2$, and $x_3$ that satisfy the constraints $x_1 \neq x_2$, $2x_1 = 10x_2 + x_3$,<br />

and $x_1 x_2 < x_3$.<br />
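The example can be checked with a naive generate-and-test search, sketched below. This is only an illustration of what satisfying a CSP means; real constraint solvers prune the search space instead of enumerating it, and the function name is an assumption of this sketch.

```python
from itertools import product


def satisfies(x1, x2, x3):
    """The three constraints of the example above."""
    return x1 != x2 and 2 * x1 == 10 * x2 + x3 and x1 * x2 < x3


domain = range(0, 101)  # possible values [0..100] for every variable

# Generate-and-test: stop at the first assignment satisfying every constraint.
solution = next(
    (x for x in product(domain, repeat=3) if satisfies(*x)),
    None,
)
print(solution)  # (1, 0, 2)
```

The first assignment found in lexicographic order is $x_1 = 1$, $x_2 = 0$, $x_3 = 2$; other solutions exist.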

Constraint based approaches model a course scheduling problem as a set of<br />

variables (i.e. courses) to which values (i.e. resources such as classrooms and time<br />

periods) have to be assigned to satisfy a number of constraints (i.e. classroom sizes<br />

and contiguous periods) [16-18].<br />

Constraint Logic Programming (CLP) is usually used for CSPs. A labeling<br />

strategy dictates the order in which the search space is traversed, which is vital for an<br />

effective search. Two orderings are involved: the order in which the variables are<br />

instantiated (i.e. courses placed), and the order in which the values (i.e. times<br />

and classrooms) are assigned. Programming languages such as PROLOG, LISP, C,<br />

and C++ can be used to implement CLP.



Gueret et al. have implemented a lecture scheduling system in CHIP called<br />

FELIAC [19]. CHIP is a Constraint Logic Programming language based on Prolog,<br />

which provides several types of constraints. CHIP’s new “cumulative” constraints<br />

limit the amount of a resource which can be used at any time, and Gueret et al. use<br />

them to implement the classroom capacity constraint. The longest courses are scheduled<br />

first, in the day which has the shortest total length of clashing lectures. Relaxation of<br />

constraints is essential for highly constrained CSPs such as course scheduling. (A<br />

problem in which constraints may be relaxed is called a dynamic CSP.) For each<br />

failed assignment, FELIAC stores a “justification”, which identifies the constraints<br />

which the assignment violated. These justifications are used to undo the effects of a<br />

constraint when it is relaxed.<br />

Using the CLP <strong>for</strong> the course scheduling usually brings advantages such as<br />

short programs and fast execution time.<br />

2.2.4 Meta-heuristic Methods<br />

Over the last two decades a variety of meta-heuristic approaches such as<br />

simulated annealing, tabu search, <strong>genetic</strong> <strong>algorithm</strong>s, and hybrid approaches have<br />

been investigated <strong>for</strong> the course scheduling problem. Meta-heuristic methods begin<br />

with one or more initial solutions and employ search strategies that try to avoid local<br />

optima. All of these search <strong>algorithm</strong>s can produce high quality solutions but often<br />

have a considerable computational cost [20-25].<br />

FIGURE 2-3 Local optimal problem



2.2.4.1 Simulated Annealing<br />

Simulated annealing (SA) is a Monte-Carlo technique which can be used to find<br />

solutions <strong>for</strong> optimization problems. The technique simulates the cooling of a<br />

collection of hot vibrating atoms.<br />

The approach comprises the following:<br />

• A cost function E that associates an energy with the state of the system.<br />

• A ''temperature'' T that decreases slowly.<br />

• Various ways to change the state of the system.<br />

Figure 2-4 presents the SA <strong>algorithm</strong>.<br />

1. Generate an initial timetable s.<br />

2. Set the initial best timetable s* = s.<br />

3. Compute the cost of s: C(s).<br />

4. Compute the initial temperature T_0.<br />

5. Set the temperature T = T_0.<br />

6. While the stop criterion is not satisfied do:<br />

   a. Repeat Markov chain length (M) times:<br />

      i. Select a random neighbor s’ of the current timetable (s’ ∈ Ns).<br />

      ii. Set Δ(C) = C(s’) − C(s).<br />

      iii. If Δ(C) ≤ 0 {downhill move}:<br />

         • Set s = s’.<br />

         • If C(s) < C(s*) then set s* = s.<br />

      iv. If Δ(C) > 0 {uphill move}:<br />

         • Choose a random number r uniformly from [0, 1].<br />

         • If r < e^(−Δ(C)/T) then set s = s’.<br />

   b. Reduce (or update) the temperature T.<br />

7. Return the timetable s*.<br />

FIGURE 2-4 Simulated annealing <strong>algorithm</strong><br />

Here Δ(C) is the change in cost caused by a move, so an uphill move would increase the cost by Δ(C). Also, s is the current schedule<br />

and s’ is a neighboring schedule obtained from the current neighborhood space (Ns)<br />

by swapping two courses in time and/or space.
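The steps of Figure 2-4 can be sketched as follows. This is a minimal illustration under several assumptions that are not part of the original algorithm description: the cost counts conflicting course pairs that share a period, the temperature is reduced geometrically by a factor `alpha`, the stop criterion is a minimum temperature, and the course names and conflicts are hypothetical.

```python
import math
import random


def cost(timetable, conflicts):
    """Number of conflicting course pairs placed in the same period."""
    return sum(1 for a, b in conflicts if timetable[a] == timetable[b])


def simulated_annealing(timetable, conflicts, t0=2.0, alpha=0.9,
                        chain_length=20, t_min=0.01, seed=0):
    rng = random.Random(seed)          # fixed seed keeps the sketch reproducible
    s, best = dict(timetable), dict(timetable)
    t = t0
    while t > t_min:                   # stop criterion (assumed)
        for _ in range(chain_length):  # Markov chain of length M
            a, b = rng.sample(sorted(s), 2)
            s_new = dict(s)
            s_new[a], s_new[b] = s[b], s[a]   # neighbor: swap two courses
            delta = cost(s_new, conflicts) - cost(s, conflicts)
            # Downhill moves are always accepted; uphill moves are accepted
            # with probability e^(-delta / T).
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                s = s_new
                if cost(s, conflicts) < cost(best, conflicts):
                    best = dict(s)
        t *= alpha                     # geometric cooling (assumed)
    return best


# Hypothetical data: four courses, two periods, three conflicting pairs.
conflicts = [("c1", "c2"), ("c3", "c4"), ("c1", "c3")]
start = {"c1": 0, "c2": 0, "c3": 1, "c4": 1}
best = simulated_annealing(start, conflicts)
```

Because only improvements replace `best`, the returned timetable is never worse than the initial one, even though the current timetable `s` may temporarily get worse.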



When the atoms are at a high temperature they are free to move around, and<br />

tend to move with random displacements. However, as the mass cools the interparticle<br />

bonds force the atoms together. When the mass is cool, no movement is<br />

possible, and the configuration is frozen. If the mass is cooled quickly then the chance of<br />

obtaining a low cost solution is lower than if it is cooled slowly (or annealed). At any<br />

given temperature a new configuration of atoms is accepted if the system energy is<br />

lowered. However, if the energy is higher, then the configuration is accepted only if<br />

the probability of such an increase is lower than that expected at the given<br />

temperature [26-27].<br />

The SA <strong>algorithm</strong> has both advantages and disadvantages compared to other<br />

global optimization techniques. It is an extremely popular method and appears<br />

competitive with many of the best heuristics in solving large problems such as course<br />

scheduling, job scheduling, etc. However, it has two drawbacks: it can become trapped in<br />

local minima, and it can take too long to find a reasonable solution. In order to<br />

overcome these drawbacks, many recent studies combine SA with other<br />

heuristics such as genetic algorithms, or implement SA as a parallel algorithm.<br />

The main aim is to avoid local minima traps and/or to achieve faster convergence [28-29].<br />

2.2.4.2 Tabu Search<br />

Tabu search is a meta-heuristic that guides a local heuristic search procedure to<br />

explore the solution space beyond local optimality. Tabu search has been applied<br />

successfully in a number of combinatorial optimization problems, in particular course<br />

scheduling [30-31].<br />

The basic concept of tabu search as described by Glover is: “A meta-heuristic<br />

superimposed on another heuristic. The overall approach is to avoid entrainment in<br />

cycles by forbidding or penalizing moves which take the solution, in the next iteration,<br />

to points in the solution space previously visited (“tabu”)” [32].<br />

Tabu search is a typical local search that explores the neighborhood of the current solution for a<br />

transformed solution (s’) that can be obtained by a simple local change. Each<br />

transition to a new solution is known as a move. In simple cases, every move is added<br />

to a tabu list that remembers the N most recent moves, where N is the size of the<br />

tabu list. A tabu list acts as a short-term memory (like a first-in, first-out queue) that



remembers the N recent moves. Any new move that is already in the tabu list is<br />

avoided, that is, it is tabu. This approach prevents recently tried moves from being repeated and<br />

prevents the search from cycling around a local optimum, thus driving the search<br />

in a different direction in the search space and giving it a better opportunity<br />

to reach the global optimum.<br />

The decision to move to a trans<strong>for</strong>med solution state is usually based on the<br />

steepest descent or mildest ascent in the objective function value. With this strategy, a<br />

heuristic accepts a marginal and temporary deterioration in its objective function<br />

value in exchange for opportunities to escape from a local optimum and move towards<br />

the global optimum, as illustrated in Figure 2-3. Figure 2-5 presents the tabu search<br />

algorithm.<br />

1. Generate an initial random but feasible solution s.<br />

2. Repeat:<br />

   i. Attempt to find an improved feasible solution s' with the objective function<br />

      value z(s'), avoiding moves already stored in the tabu list.<br />

   ii. Compute the move from s to s’.<br />

   iii. Update the tabu list by adding the latest move so that it is tabu for some subsequent moves.<br />

   iv. If z(s') < z(s) + (mildest ascent tolerance) then<br />

         perform the exchange: s := s', z(s) := z(s')<br />

       End if<br />

   Until (no improved solution is found) or (a stopping criterion is met)<br />

FIGURE 2-5 Tabu search <strong>algorithm</strong><br />

The result z(s') is the best minimum found. The method is not guaranteed to find the<br />

global minimum, but it stands a better chance than a gradient descent approach.<br />
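The behavior described above can be illustrated with a deliberately small sketch. It assumes a one-dimensional search space in which a move goes to an adjacent index, and, for simplicity, it stores recently visited solutions in the tabu list rather than the moves themselves. The objective values are hypothetical.

```python
from collections import deque


def tabu_search(z, start, tabu_size=3, max_iters=20):
    """Minimize z over indices 0..len(z)-1 using moves to adjacent indices."""
    s = best = start
    tabu = deque([start], maxlen=tabu_size)   # FIFO short-term memory
    for _ in range(max_iters):
        neighbors = [n for n in (s - 1, s + 1)
                     if 0 <= n < len(z) and n not in tabu]
        if not neighbors:
            break                              # every move is tabu
        # Steepest descent or, failing that, mildest ascent.
        s = min(neighbors, key=lambda n: z[n])
        tabu.append(s)
        if z[s] < z[best]:
            best = s
    return best


# Hypothetical objective values: local minimum at index 2,
# global minimum at index 6.
z = [5, 3, 2, 4, 6, 3, 1, 4]
print(tabu_search(z, start=0))  # 6
```

A pure descent started at index 0 would stop at the local minimum at index 2. Because moving back is tabu, the search accepts the ascent through indices 3 and 4 and reaches index 6.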

2.2.4.3 Genetic Algorithms<br />

The idea of <strong>genetic</strong> <strong>algorithm</strong>s is based on the evolutionary principle developed<br />

by Darwin [6]. A “population” of feasible timetables is maintained. The “fittest”<br />

timetables are selected to <strong>for</strong>m the basis of the next iteration, or “generation”, thus<br />

improving the overall fitness whilst maintaining diversity.



The outline of the basic <strong>genetic</strong> <strong>algorithm</strong> is presented in section 1.1.2.<br />

At present, a large number of studies have used GAs for course<br />

scheduling. The proposed GAs differ in how they represent<br />

chromosomes and populations, set up GA parameters (population size, crossover<br />

rate, and mutation rate), design strategies for selection, crossover, and mutation, and<br />

evaluate the fitness function.<br />

The chromosome represents a timetable that is a solution. It can be represented<br />

indirectly or directly. In the former, the timetable is usually encoded as a long bit string<br />

that stands for when and where each course takes place [33]. Thus, pairs of<br />

selected timetables may be “crossed over” by cutting and splicing the bit strings to<br />

create a new timetable. In the latter, the timetable is<br />

represented by a data structure such as a multi-dimensional array or a linked list.<br />

The indirect representation brings the advantages of short processing time and simple GA<br />

operations. However, it needs complex processing to convert between the bit string<br />

and the real timetable and to maintain constraints. In contrast, the direct<br />

representation needs more processing time for GA operations, but it makes it easy to<br />

maintain a large number of constraints for a real timetable. More details of GAs<br />

are presented in section 2.3.<br />

2.2.4.4 Hybrid Approaches<br />

The above approaches have been shown to produce good solutions for<br />

course scheduling problems. However, as mentioned above, they usually need a long<br />

computational time. In order to overcome this problem, many researchers have used<br />

hybrid approaches.<br />

Tuan et al. have successfully combined constraint programming and simulated<br />

annealing for the problem of exam scheduling with real data sets [34]. The proposed<br />

algorithm consists of two phases. A constraint programming phase provides an<br />

initial solution. This solution is improved by the simulated annealing phase. Tuan et<br />

al. have applied a Kempe chain neighborhood structure, a special technique for<br />

determining the starting temperature T_0, and a mechanism that allows the user to define a<br />

certain period of time in which the algorithm should run. This mechanism<br />

not only helps to increase the efficiency of the SA algorithm but also makes simulated<br />

annealing experiments easier.



Alkan et al. have developed a memetic algorithm (MA) by combining GAs<br />

with a local search technique, hill climbing [1]. This approach has achieved good<br />

computational performance. The idea behind the hill climbing approach is to create a hill<br />

climbing method for each type of constraint and combine them under a single hill<br />

climbing method, denoted as AHC. Starting from a high resolution, AHC selects a constraint-type-based<br />

hill climbing method by using a selection method that gives a higher chance<br />

to the operator of the constraint type causing more violations. There are three<br />

improvement strategies. First, the selected operator is invoked for the related type<br />

of constraints, producing a new individual. Second, if this attempt does not<br />

improve on the old individual, the new individual is ignored and, depending on the constraint<br />

type, an attempt is made to correct a selected block of genes, possibly the one causing more violations than the other<br />

blocks. Finally, if this attempt also fails to produce a<br />

better individual, then, using the old one, an attempt is made to correct a selected single gene in a block of genes,<br />

possibly the one causing more violations. If the fitness of an<br />

individual improves in any case, AHC is reapplied to it.<br />

Some other researchers have also used distributed and parallel computing<br />

models for the course scheduling problem. One of them is the Multi Agent System model,<br />

which addresses problems that are similar to those in our study.<br />

The Multi Agent System (MAS) model has been introduced to the course<br />

scheduling problem by Kaplansky et al. [35]. The architecture is composed of a set of<br />

autonomous scheduling agents (SA_i) that each solve the course scheduling problem for one<br />

department. Each agent has its own course scheduling problem and its own goals. The<br />

scheduling agents must coordinate these goals with the other agents in order to<br />

achieve a solution <strong>for</strong> the whole organization that yields a better result with respect to<br />

the global targets. To achieve a coherent and consistent global solution, the SAs make<br />

use of a sophisticated negotiation protocol among scheduling agents that always ends<br />

in an agreement (not ensured to be optimal). The main functionalities of this protocol<br />

are agent-to-agent relation definition, a mechanism to approve a chain of requests for<br />

change (RfC), and an electronic marketplace for bidding on preferred common timeslots.



As shown in Figure 2-6, the scheduling agents first negotiate<br />

a global timetable. Next, the room agent (RA) adds new constraints to the scheduling agents SA_i. The<br />

SA_i then solve the modified problem and send back a new timetable.<br />

FIGURE 2-6 Multi agent system<br />

2.3 Genetic Algorithms<br />

Genetic algorithms are inspired by Darwin's theory of evolution. Simply<br />

put, problems are solved by an evolutionary process resulting in a best (fittest)<br />

solution; in other words, the solution is evolved.<br />

The algorithm begins with a set of solutions (represented by chromosomes) called a<br />

population. Solutions from one population are taken and used to form a new<br />

population, in the hope that the new population will be better than<br />

the old one. The solutions used to form new solutions (offspring) are<br />

selected according to their fitness: the more suitable they are, the more chances they<br />

have to reproduce [6].<br />

The outline of the basic <strong>genetic</strong> <strong>algorithm</strong> is presented in section 1.1.2.<br />

2.3.1 Biological Background<br />

2.3.1.1 Chromosome<br />

All living organisms consist of cells. In each cell there is the same set of<br />

chromosomes. The chromosomes are strings of DNA and serve as a model <strong>for</strong> the<br />

whole organism. A chromosome consists of genes, blocks of DNA. Each gene<br />

encodes a particular protein. Basically, it can be said that each gene encodes a trait,<br />

<strong>for</strong> example color of eyes. Possible settings <strong>for</strong> a trait (e.g. blue, brown) are called<br />

alleles. Each gene has its own position in the chromosome. This position is called<br />

locus.



The complete set of genetic material (all chromosomes) is called the genome. A particular<br />

set of genes in the genome is called a genotype. The genotype, together with later development<br />

after birth, is the basis for the organism's phenotype, its physical and mental<br />

characteristics, such as eye color, intelligence, etc.<br />

2.3.1.2 Reproduction<br />

During reproduction, recombination (or crossover) occurs first. Genes from the<br />

parents combine to form a whole new chromosome. The newly created offspring can<br />

then be mutated. Mutation means that elements of the DNA are slightly changed. These<br />

changes are mainly caused by errors in copying genes from the parents.<br />

The fitness of an organism is measured by the success of the organism in its life<br />

(survival).<br />

2.3.2 Operators of GA<br />

As presented in the outline of the basic <strong>genetic</strong> <strong>algorithm</strong>, the crossover and<br />

mutation are the most important parts of the <strong>genetic</strong> <strong>algorithm</strong>. The per<strong>for</strong>mance is<br />

influenced mainly by these two operators. Be<strong>for</strong>e we can explain more about<br />

crossover and mutation, more in<strong>for</strong>mation on chromosomes will be outlined.<br />

A chromosome should in some way contain in<strong>for</strong>mation about the solution that<br />

it represents. The most common way of encoding is a binary string, as shown in<br />

Figure 2-7.<br />

Chromosome 1 1101100100110110<br />

Chromosome 2 1101111000011110<br />

FIGURE 2-7 Encoding chromosome<br />

Each chromosome is represented by a binary string. Each bit in the string can<br />

represent some characteristics of the solution. Another possibility is that the whole<br />

string can represent a number. Of course, there are many other ways of encoding. The<br />

encoding depends mainly on the problem being solved. For example, one can directly encode<br />

integers or real numbers. Sometimes it is useful to encode permutations,<br />

and so on.



2.3.2.1 Crossover<br />

After we have decided what encoding to use, we can proceed to the crossover<br />

operation. Crossover operates on selected genes from the parent chromosomes and<br />

creates new offspring. The simplest way of doing that is to choose a random<br />

crossover point and copy everything be<strong>for</strong>e this point from the first parent and then<br />

copy everything after the crossover point from the other parent.<br />

Crossover can be illustrated as in Figure 2-8 (| is the crossover point).<br />

Chromosome 1 11011 | 00100110110<br />

Chromosome 2 11011 | 11000011110<br />

Offspring 1 11011 | 11000011110<br />

Offspring 2 11011 | 00100110110<br />

FIGURE 2-8 Example of crossover<br />

There are other ways to make a crossover. For example, we can choose more<br />

crossover points. Crossover can be quite complicated and depends mainly on the<br />

encoding of chromosomes. A specific crossover made <strong>for</strong> a specific problem can<br />

improve the per<strong>for</strong>mance of the <strong>genetic</strong> <strong>algorithm</strong>.<br />
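Single-point crossover on binary strings can be sketched in a few lines. The function name is an assumption of this sketch; the chromosomes and the crossover point (after the fifth bit) are those of Figure 2-8.

```python
def single_point_crossover(parent1, parent2, point):
    """Swap the tails of two equal-length bit strings after `point`."""
    child1 = parent1[:point] + parent2[point:]
    child2 = parent2[:point] + parent1[point:]
    return child1, child2


offspring1, offspring2 = single_point_crossover(
    "1101100100110110",  # Chromosome 1
    "1101111000011110",  # Chromosome 2
    5,                   # crossover point
)
print(offspring1)  # 1101111000011110
print(offspring2)  # 1101100100110110
```

In this particular example both parents share the same first five bits, so each offspring happens to equal the other parent. With a different crossover point the offspring would generally differ from both parents.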

2.3.2.2 Mutation<br />

After a crossover is performed, mutation takes place. Mutation is intended to<br />

prevent all solutions in the population from falling into a local optimum of the problem<br />

being solved. The mutation operation randomly changes the offspring resulting from crossover.<br />

In the case of binary encoding we can switch a few randomly chosen bits from 1 to 0 or<br />

from 0 to 1. Mutation can then be illustrated as in Figure 2-9.<br />

Original offspring 1 1101111000011110<br />

Original offspring 2 1101100100110110<br />

Mutated offspring 1 1100111000011110<br />

Mutated offspring 2 1101101100110110<br />

FIGURE 2-9 Example of mutation



The technique of mutation (as well as crossover) depends mainly on the<br />

encoding of chromosomes. For example, when we are encoding permutations,<br />

mutation could be per<strong>for</strong>med as an exchange of two genes.<br />
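Both kinds of mutation mentioned above can be sketched as follows. The function names are assumptions of this sketch, and the bit-flip variant interprets the mutation rate as an independent per-bit probability, which is one common convention.

```python
import random


def bit_flip_mutation(chromosome, rate, rng):
    """Flip each bit of a binary string independently with probability `rate`."""
    return "".join(
        ("1" if bit == "0" else "0") if rng.random() < rate else bit
        for bit in chromosome
    )


def swap_mutation(permutation, rng):
    """Exchange two randomly chosen genes of a permutation."""
    i, j = rng.sample(range(len(permutation)), 2)
    mutated = list(permutation)
    mutated[i], mutated[j] = mutated[j], mutated[i]
    return mutated


rng = random.Random(0)  # fixed seed keeps the sketch reproducible
mutated_bits = bit_flip_mutation("1101111000011110", 0.05, rng)
mutated_perm = swap_mutation([1, 2, 3, 4, 5], rng)
```

The swap variant keeps the result a valid permutation, which bit flipping would not guarantee for a permutation encoding.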

2.3.3 Parameters of GA<br />

2.3.3.1 Crossover and Mutation Rate<br />

There are two basic parameters of a GA: crossover rate and mutation rate.<br />

The crossover rate describes how often a crossover will be performed. If there is<br />

no crossover, offspring are exact copies of parents. If there is crossover, offspring are<br />

made from parts of both parents' chromosomes. If the crossover rate is 100%, then all<br />

offspring are made by crossover. If it is 0%, the whole new generation is made from exact<br />

copies of chromosomes from the old population. Crossover is made in the hope that new<br />

chromosomes will contain good parts of old chromosomes and therefore the new<br />

chromosomes will be better. However, it is good to let some part of the old population<br />

survive into the next generation.<br />

The mutation rate describes how often parts of a chromosome will be mutated. If<br />

there is no mutation, offspring are generated immediately after crossover (or directly<br />

copied) without any change. If mutation is performed, one or more parts of a<br />

chromosome are changed. If the mutation rate is 100%, the whole chromosome is changed; if<br />

it is 0%, nothing is changed. Mutation generally prevents the GA from falling into<br />

local extremes. Mutation should not occur very often, because the GA would then in effect<br />

turn into a random search.<br />

2.3.3.2 Other Parameters<br />

Another important parameter is population size. Population size describes<br />

how many chromosomes are in a population. If there are too few chromosomes, the<br />

GA has few possibilities to perform crossover and only a small part of the search space is<br />

explored. On the other hand, if there are too many chromosomes, the GA slows down.<br />

Research shows that after some limit (which depends mainly on encoding and the<br />

problem) it is not useful to use very large populations because it does not solve the<br />

problem faster than moderate sized populations.<br />

2.3.4 Methods of Selection<br />

As presented in the outline of the basic <strong>genetic</strong> <strong>algorithm</strong>, chromosomes are<br />

selected from the population to be parents <strong>for</strong> crossover. The problem is how to select



these chromosomes. According to Darwin's theory of evolution the best ones survive<br />

to create new offspring. There are many different methods which a GA can use to<br />

select the chromosomes to be copied over into the next generation, but listed below<br />

are some of the most common methods.<br />

2.3.4.1 Roulette Wheel Selection<br />

Parents are selected according to their fitness. The better the chromosomes are,<br />

the more chances they have of being selected. Imagine a roulette wheel on which all the<br />

chromosomes in the population are placed. The size of each chromosome's section of the roulette<br />

wheel is proportional to the value of its fitness function: the<br />

bigger the value is, the larger the section is. Figure 2-10 shows an example.<br />

[Pie chart with one section each for Chromosome 1, Chromosome 2, Chromosome 3, and Chromosome 4]<br />

FIGURE 2-10 Roulette wheel selection<br />

A marble is thrown on the roulette wheel and the chromosome where it stops is<br />

selected. Clearly, the chromosomes with bigger fitness value will be selected more<br />

times.<br />
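Roulette wheel selection can be sketched as below. The fitness values are hypothetical, the function name is an assumption of this sketch, and all fitness values are assumed to be positive.

```python
import random


def roulette_wheel_select(fitness, rng):
    """Pick a chromosome with probability proportional to its fitness."""
    total = sum(fitness.values())
    marble = rng.uniform(0, total)     # where the marble stops
    running = 0.0
    for chromosome, value in fitness.items():
        running += value               # end of this chromosome's section
        if marble <= running:
            return chromosome
    return chromosome                  # guard against rounding error


# Hypothetical fitness values.
fitness = {"Chromosome 1": 10, "Chromosome 2": 20,
           "Chromosome 3": 50, "Chromosome 4": 20}
rng = random.Random(1)                 # fixed seed keeps the sketch reproducible
counts = {name: 0 for name in fitness}
for _ in range(10_000):
    counts[roulette_wheel_select(fitness, rng)] += 1
```

With these values, Chromosome 3 holds half of the wheel and is therefore expected to be selected in roughly half of the throws.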

2.3.4.2 Rank Selection<br />

The previous type of selection has problems when there are big differences<br />

between the fitness values. For example, if the fitness of the best chromosome is 90% of the<br />

sum of all fitness values, then the other chromosomes will have very few chances to be<br />

selected.<br />

Rank selection ranks the population first, and then every chromosome receives a<br />

fitness value determined by this ranking, as shown in Figure 2-11. The worst will<br />

have fitness 1, the second worst 2, etc., and the best will have fitness N.<br />

Now all the chromosomes have a chance to be selected. However, this method<br />

can lead to slower convergence, because the best chromosomes do not differ so much<br />

from the others.



[Pie chart with one section each for Chromosome 1, Chromosome 2, Chromosome 3, and Chromosome 4]<br />

Rank    Chromosome<br />

1       Chromosome 1<br />

2       Chromosome 2<br />

3       Chromosome 4<br />

4       Chromosome 3<br />

FIGURE 2-11 Rank selection<br />
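The ranking step can be sketched as follows. The raw fitness values are hypothetical and were chosen so that the resulting ranks match the table of Figure 2-11; the function name is an assumption of this sketch.

```python
def rank_fitness(fitness):
    """Replace raw fitness by rank: the worst gets 1, the best gets N."""
    ordered = sorted(fitness, key=fitness.get)   # worst first
    return {name: rank for rank, name in enumerate(ordered, start=1)}


# Hypothetical raw fitness values with one dominant chromosome.
raw = {"Chromosome 1": 2, "Chromosome 2": 5,
       "Chromosome 3": 90, "Chromosome 4": 8}
ranks = rank_fitness(raw)
print(ranks)
# {'Chromosome 1': 1, 'Chromosome 2': 2, 'Chromosome 4': 3, 'Chromosome 3': 4}
```

With the raw values, Chromosome 3 would occupy 90/105 of a roulette wheel. After ranking it occupies only 4/10, so the other chromosomes get a realistic chance of being selected.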

2.3.4.3 Steady-State Selection<br />

The steady-state selection works in the following way. In every generation a<br />

few good (higher-fitness) chromosomes are selected for creating new offspring.<br />

Then some bad (lower-fitness) chromosomes are removed and the new offspring<br />

take their place. The rest of the population survives into the new generation.<br />

2.3.4.4 Tournament selection<br />

Subgroups of chromosomes are chosen from a larger population, and members<br />

of each subgroup compete against each other. Only one chromosome from each<br />

subgroup is chosen to reproduce [36].<br />
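A single tournament can be sketched as below. The tournament size `k`, the function name, and the fitness values are assumptions made for this illustration.

```python
import random


def tournament_select(fitness, k, rng):
    """Draw k distinct chromosomes at random and return the fittest of them."""
    contestants = rng.sample(sorted(fitness), k)
    return max(contestants, key=fitness.get)


# Hypothetical fitness values.
fitness = {"c1": 3, "c2": 9, "c3": 5, "c4": 1}
rng = random.Random(0)                 # fixed seed keeps the sketch reproducible
parent = tournament_select(fitness, k=2, rng=rng)
```

A larger `k` increases the selection pressure: when `k` equals the population size, the best chromosome always wins, whereas `k = 1` amounts to uniform random selection.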

2.3.4.5 Elitism Selection<br />

Elitism is the name of the method that first copies the best chromosome (or a few of the<br />

best chromosomes) to the new population. The rest of the population can be<br />

constructed by the methods described above. Elitism can rapidly increase the<br />

performance of the GA, because it prevents the loss of the best solution found.<br />
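Elitism can be sketched as a wrapper around whatever procedure produces the remaining offspring. In the sketch below, `make_offspring` is a hypothetical callback standing in for selection, crossover, and mutation, and the fitness function (the number of 1 bits) is chosen only for illustration.

```python
def next_generation(population, fitness, make_offspring, n_elite=1):
    """Copy the n_elite best chromosomes, then fill up with new offspring."""
    elite = sorted(population, key=fitness, reverse=True)[:n_elite]
    offspring = [make_offspring(population)
                 for _ in range(len(population) - n_elite)]
    return elite + offspring


def count_ones(chromosome):
    """Illustrative fitness: the number of 1 bits."""
    return chromosome.count("1")


population = ["1111", "1000", "0011", "0000"]
# Deliberately poor offspring, to show that the best solution still survives.
new_population = next_generation(population, count_ones,
                                 lambda pop: "0000", n_elite=1)
print(new_population)  # ['1111', '0000', '0000', '0000']
```

Even though every new offspring here is worse than its parents, the best chromosome of the old population is carried over unchanged.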

2.4 Grid Computing<br />

Grid computing is a method for sharing computing and data resources. It<br />

is used in distributed systems that share resources over a local or wide<br />

area network. The specific focus that underlies grid computing is coordinated<br />

resource sharing in a multi-institutional environment [7-8]. It attempts to combine all<br />

types of resources, including supercomputers and clusters of machines, from <strong>multi</strong>ple<br />

institutions, into a resource that is more powerful than any single resource.<br />

This section will introduce grid computing in the following topics: the<br />

application considerations, the Globus Toolkit, the Globus Toolkit 2.2 and the grid<br />

components.



2.4.1 Application Considerations<br />

If an application consists of several jobs that can all be executed in parallel, a<br />

grid may be very suitable <strong>for</strong> effective execution on dedicated nodes, especially in the<br />

case when there is no or a very limited exchange of data among the jobs.<br />

From an initial job, a number of jobs are launched to execute on pre-selected or<br />

dynamically assigned nodes within the grid. Each job may receive a discrete set of<br />

data, fulfills its computational task independently, and delivers its output. The<br />

output is collected by a final job or stored in a defined data store, as shown in Figure<br />

2-12.<br />

FIGURE 2-12 Application consists of jobs: B, C, D, and E executed in parallel<br />

Many other applications consist of jobs that are executable in parallel but have<br />

interdependences between them. For example, as shown in Figure 2-13, jobs B and C<br />

can be launched simultaneously, but they heavily exchange data with each other. Job<br />

F cannot be launched before B and C have completed, whereas job E or D can be<br />

launched upon completion of B or C respectively. Finally, job G collects all<br />

output from the jobs D, E, and F, and its termination and results then represent the<br />

completion of the grid application.<br />
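The launch order implied by these dependencies can be derived mechanically. The sketch below uses the `graphlib` module of the Python standard library (Python 3.9 or later) and only models precedence between jobs B to G; the data exchange between B and C and the actual submission of jobs to grid nodes are outside its scope.

```python
from graphlib import TopologicalSorter

# Each job maps to the set of jobs that must finish before it can start.
depends_on = {
    "B": set(),
    "C": set(),
    "E": {"B"},
    "D": {"C"},
    "F": {"B", "C"},
    "G": {"D", "E", "F"},
}

sorter = TopologicalSorter(depends_on)
sorter.prepare()
waves = []
while sorter.is_active():
    ready = sorted(sorter.get_ready())   # jobs whose predecessors are done
    waves.append(ready)
    sorter.done(*ready)                  # pretend the whole wave has finished

print(waves)  # [['B', 'C'], ['D', 'E', 'F'], ['G']]
```

Each inner list is a group of jobs that could run in parallel. A real scheduler would mark jobs as done individually, so that E could start as soon as B finishes without waiting for C.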

For such applications, a possible approach is to do more analysis to determine<br />

how best to split the application into individual jobs, maximizing parallelism. It also<br />

adds more dependencies on the grid infrastructure services such as schedulers and<br />

brokers, but once that infrastructure is in place, the application can benefit from the<br />

flexibility and utilization of the virtual computing environment. The use of a job flow



management service can not only handle the synchronization of the individual results<br />

but also create loose coupling between the jobs, which avoids heavy inter-process<br />

communication and reduces overhead in the grid [37].<br />

FIGURE 2-13 Application consists of jobs that are networked<br />

2.4.2 The Globus Toolkit<br />

In the most general case, grid resources are supposed to be geographically<br />

distributed and to be owned by different organizations, each with proprietary policies<br />

regarding security, resource allocation, plat<strong>for</strong>m maintenance, and so on. Such an<br />

environment depends strongly upon the construction of a robust infrastructure of<br />

fundamental services, able to smooth out mismatches between different machines,<br />

security policies, scheduling policies, operating systems, and plat<strong>for</strong>ms. Besides this,<br />

resource sharing must be highly controlled, with resource providers and consumers<br />

clearly defining what is shared, who is allowed to share, and the conditions under<br />

which sharing occurs. Furthermore, access to resources has to be carefully scheduled<br />

in order to extract the maximum per<strong>for</strong>mance from the available resources, and<br />

applications should have the possibility of tailoring their behavior dynamically, in<br />

order to cope with resource failure, a highly probable event in such a variegated<br />

context.<br />

All these requirements can be summarized by the need to allow transparent<br />

access to resources, as if they belonged to a single, unified “metacomputer.” There are<br />

many grid projects worldwide aimed at achieving this ambitious goal, as shown in Table<br />

2-4. Globus Toolkit is one of the most promising: it is rapidly becoming the de facto<br />

standard grid middleware [39]. Globus Toolkit is a joint initiative of the University of



Southern California, the Argonne National Lab, and the University of Chicago. It<br />

provides an open-source set of services addressing fundamental grid issues, such as<br />

security, information discovery, resource management, data management, and<br />

communication. Due to its flexibility and high interoperability with the most<br />

widespread technologies used for distributed and parallel computing, Globus Toolkit<br />

has been chosen for our problem.<br />

TABLE 2-4 Tentative list of tools for grid computing [37]<br />

GLOBUS: A bag of services giving basic software infrastructure for grid development: http://www.globus.org<br />

LEGION: An object-based project at the University of Virginia: http://legion.virginia.edu<br />

UNICORE: The UNiform Interface to COmputing Resources is a European grid computing effort: http://www.unicore.org<br />

NETSOLVE: A client/server system oriented to solve computational science problems: http://icl.cs.utk.edu/netsolve/<br />

CACTUS: An open-source problem-solving environment designed for parallel computing and collaborative software development: http://www.cactuscode.org<br />

The next section introduces Globus Toolkit 2.2, which will be used in our<br />

study.<br />

2.4.3 Globus Toolkit 2.2<br />

The Globus Toolkit 2.2 provides [7]:<br />

2.4.3.1 A set of basic facilities needed for grid computing, shown in Figure<br />

2-14.



FIGURE 2-14 Components of Globus Toolkit 2.2<br />

a) Security: Single sign-on, authentication, authorization, and<br />

secure data transfer.<br />

b) Resource Management provides support for:<br />

- Resource allocation.<br />

- Submitting jobs: Remotely running executable files and<br />

receiving results.<br />

- Managing job status and progress.<br />

c) Data Management provides a system to transfer files among<br />

machines in the grid and for the management of these transfers.<br />

d) Information Services includes directory services of available<br />

resources and their status. It provides support for collecting information in the grid<br />

and for querying this information, based on the Lightweight Directory Access<br />

Protocol (LDAP), shown in Figure 2-15.<br />

FIGURE 2-15 Simple LDAP configuration [7]



2.4.3.2 Application Programming Interfaces (APIs) to the above facilities.<br />

2.4.3.3 C bindings are needed to build and compile programs.<br />

In addition to the above, which are considered the core of the toolkit, other<br />

components are also available that complement or build on top of these facilities. For<br />

instance, Globus provides a rapid development kit known as Commodity Grid (CoG),<br />

which supports technologies such as Java, Python, Web services, CORBA, and so on.<br />

2.4.4 Grid Components<br />

This section describes, at a high level, the primary components of the grid<br />

environment, shown in Figure 2-16. Depending on the grid design and its expected<br />

use, some of these components may or may not be required, and in some cases they<br />

may be combined to form a hybrid component.<br />

FIGURE 2-16 Grid components: a high-level perspective [8]<br />

2.4.4.1 Grid portal<br />

The grid portal provides an interface for a user to launch applications that will<br />

utilize the resources and services provided by the grid.<br />

The current Globus Toolkit does not provide any services or tools to generate a<br />

portal.<br />

2.4.4.2 Security<br />

A major requirement for grid computing is security. There must be<br />

mechanisms to provide security including authentication, authorization, and data<br />

encryption.



The Grid Security Infrastructure (GSI) component of the Globus Toolkit<br />

provides robust security mechanisms. The GSI includes an OpenSSL implementation.<br />

It also provides a single sign-on mechanism. Therefore, once a user is authenticated, a<br />

proxy certificate is created and used when performing actions within the grid.<br />

2.4.4.3 Broker<br />

Once authenticated, a user will launch the application. Based on the parameters<br />

provided by the user, the broker will identify the available and appropriate resources<br />

to utilize within the grid.<br />

Though there is no broker implementation provided by Globus Toolkit, there is<br />

an LDAP-based information service. This service is called Grid Resource Information<br />

Service (GRIS), or more commonly the Monitoring and Discovery Service (MDS).<br />

2.4.4.4 Scheduler<br />

Once the resources have been identified, the next logical step is to schedule the<br />

individual jobs to run on the individual nodes within the grid.<br />

Globus Toolkit does not have its own job scheduler to find available resources<br />

and automatically send jobs to suitable machines. Instead, it provides the tools and<br />

interfaces needed to implement schedulers.<br />

2.4.4.5 Data Management<br />

If any data (including application modules) must be moved or made accessible<br />

to the nodes where the application’s jobs will execute, then there needs to be a secure<br />

and reliable method for moving files and data to various nodes within the grid.<br />

The Globus Toolkit contains a data management component that provides such<br />

services. This component, known as Grid Access to Secondary Storage (GASS),<br />

includes facilities such as GridFTP. GridFTP is built on top of the standard<br />

FTP protocol, but adds additional functions and utilizes the<br />

GSI for user authentication and authorization.<br />

2.4.4.6 Job and Resource Management<br />

This component provides the services to actually launch a job on a particular<br />

resource, check on its status, and retrieve its results when it is complete.<br />

The Grid Resource Allocation Manager (GRAM) of Globus Toolkit provides<br />

the services for this component.



2.5 Summary<br />

The course scheduling is a part of a general scheduling problem. It schedules<br />

courses to periods of time and classrooms so that lecturers can teach and students can<br />

attend their courses without any conflicts.<br />

Much research has been carried out on course scheduling problems. The<br />

different approaches can be divided into four groups: sequential methods, cluster<br />

methods, constraint based methods, and meta-heuristic methods. Although they have<br />

successfully solved course scheduling problems, few studies have<br />

focused on the problems of multiple faculty universities. In such<br />

universities, conflicts can occur across faculties over both shared and non-shared<br />

resources.<br />

This study proposes a new system for multiple faculty universities. The<br />

proposed system will apply a hybrid centralized and de-centralized approach, a GA,<br />

and a grid computing environment. The GA is a global search optimization algorithm<br />

that explores many points in parallel, so it is suitable and flexible for satisfying the constraints of the required<br />

timetable. Combining the GA with the hybrid centralized and decentralized<br />

approach can create solutions without any conflicts over the<br />

resources around the university. The grid computing environment is used as<br />

infrastructure for sharing computing and data over a local or wide area network.


CHAPTER 3<br />

METHODOLOGY<br />

The general course scheduling problem, objectives and scope of our study were<br />

presented in chapter 1. This chapter presents the plan and the phases of analyzing,<br />

designing and implementing the proposed course scheduling system.<br />

3.1 System Development<br />

In order to achieve the expected objectives, we will follow the six phases below:<br />

3.1.1 Phase 1: Systems Analysis<br />

a) To verify the requirements and the objectives of the study.<br />

b) To choose the tools and software to be used to develop the system.<br />

3.1.2 Phase 2: Design<br />

a) To study the genetic algorithms and grid computing environment.<br />

b) To specify the proposed system.<br />

c) To design the interfaces and the module’s functions.<br />

d) To design the database.<br />

e) To design a prototype for connecting users and the system.<br />

3.1.3 Phase 3: Implementation<br />

a) To study the genetic algorithms and grid computing environment.<br />

b) To install the correct software to develop the system.<br />

c) To install the database.<br />

d) To implement the prototype for connecting users and the<br />

system.<br />

e) To implement the designed modules.<br />

3.1.4 Phase 4: Testing<br />

a) To test the system.<br />

b) To run a demonstration.<br />

c) To do some evaluations on the effectiveness of the system.



3.1.5 Phase 5: Measurement<br />

a) To evaluate the suitability of the proposed GA against the hard and soft<br />

constraints.<br />

b) To measure the performance of using grid computing vs. not using grid<br />

computing.<br />

3.1.6 Phase 6: Documentation<br />

a) To write the user manuals.<br />

b) To write reports.<br />

3.2 Problem Definition<br />

The more realistic the problem, the more complex it is for the developers to<br />

overcome. In the real world, course scheduling problems are very complex, and for<br />

multiple faculty universities they are especially hard. They also depend strongly on<br />

the particular requirements of each university. This study will focus on the common<br />

requirements of multiple faculty universities. However, the proposed system with its<br />

solved constraints is general enough that not many changes are needed to obtain a<br />

good system for a particular university.<br />

The multiple faculty universities where we have the chance to collect data are<br />

King Mongkut’s Institute of Technology North Bangkok in Thailand and Cantho<br />

University in Vietnam. At these universities, each faculty has several departments.<br />

Each department has its own resources that include lecturers, courses, and classrooms.<br />

Each department desires to construct a timetable using its own resources. These<br />

resources can also be shared by other departments in the university.<br />

Each course, which is usually divided into several sections, belongs to just one<br />

department. However, it is almost always the case that a significant part of the<br />

curriculum of one department is provided by another department. If a course is<br />

provided to more than one department it must be scheduled at the same time-slot on<br />

all the departmental timetables that use this course. These courses are called shared<br />

courses.<br />

Similarly we have shared classrooms. Each department desires to use its own<br />

classrooms. However, some courses sometimes need to use the shared classrooms of<br />

the faculty, common buildings or other faculties. Therefore, a group of classrooms<br />

used for a particular course has to be assigned before scheduling. A course has to be<br />

scheduled to these classrooms without any conflicts between the departments. Figure<br />

3-1 illustrates an arrangement for the shared classrooms.<br />

[Figure: each faculty (Faculty 1 to Faculty n) contains the classrooms of its departments (Dept 1 to Dept m) and its own shared classrooms; a common building holds further shared classrooms.]<br />

FIGURE 3-1 Shared classrooms in a multiple faculty university<br />

Each department has a responsibility to teach a number of courses. Therefore, a<br />

teaching assignment for its lecturers has to be done. Some lecturers from other<br />

faculties are invited to teach. We thus have shared lecturers who teach courses<br />

in more than one faculty.<br />

Also, we do not schedule for individual students. However, we will handle<br />

student problems at a class level instead. The students are divided into classes and<br />

expected to chronologically follow their advised pre-requisites in the curriculum of<br />

their respective program. Our responsibility is to schedule a timetable to help the<br />

students fulfill the courses in their curriculum. We say that two courses are in conflict<br />

with each other if they belong to the same curriculum and are scheduled at the same<br />

time.<br />
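The conflict rule stated above can be sketched in a few lines of Python. This is only an illustration: the tuple layout (course, curriculum, day, time-slot) and the sample names are assumptions, not the data model of the thesis.<br />

```python
from itertools import combinations

# Hypothetical booking record: (course, curriculum, day, time_slot).
def curriculum_conflicts(bookings):
    """Return pairs of distinct courses that share a curriculum and a time-slot."""
    conflicts = []
    for a, b in combinations(bookings, 2):
        same_curriculum = a[1] == b[1]
        same_time = (a[2], a[3]) == (b[2], b[3])
        if a[0] != b[0] and same_curriculum and same_time:
            conflicts.append((a[0], b[0]))
    return conflicts

bookings = [
    ("Databases", "IT-2005", "Mon", 1),
    ("Networks", "IT-2005", "Mon", 1),   # same curriculum, same time: conflict
    ("Calculus", "Math-2005", "Mon", 1), # different curriculum: no conflict
]
print(curriculum_conflicts(bookings))
```

A shared course attended by classes from several curriculums would simply appear once per curriculum in such a list.<br />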

In many cases, a course can be attended by students who come from classes of<br />

different departments or faculties. This means that the students who study this shared<br />

course can have different curriculums. In any case, we have to schedule so that the<br />

students can attend their courses.



All the above problems are summarized briefly in<br />

section 1.3 as the set of hard and soft constraints solved in our study.<br />

3.3 The System Boundary<br />

The system boundary gives a brief application overview through a use case<br />

diagram in Figure 3-2.<br />

[Figure: use case diagram. Actors: Lecturer, Department Staff, Faculty Staff, Central Office Staff, University Information System. Use cases: Assign classrooms to departments, Create classes, Create combined classes, Assign teaching, Schedule courses, View timetable, Request busy time, Request preferable time.]<br />

FIGURE 3-2 Use case diagram of the course scheduling system<br />

There are five actors in the use case diagram of the course scheduling system.<br />

3.3.1 Lecturer: This is a person who can request his/her busy and preferable<br />

times so that the course scheduling programs try to avoid these times. The lecturers<br />

can view the timetable after it is completed.<br />

3.3.2 Department Staff: This is a person who works in the department. The<br />

department staff prepares classes to be scheduled. Based on the teaching plan, the<br />

department staff will assign lecturers to teach the courses.<br />

3.3.3 Faculty Staff: This is a person who works in the faculty. The faculty staff<br />

can assign the classrooms to the departments in the faculty. Each department can use<br />

these classrooms for its courses. This allocation does not always need to be done<br />

in each semester.



3.3.4 Central Office Staff: This is a person who works in the central office of the<br />

university. The central office staff will activate the course scheduling system to<br />

schedule all courses for the whole university.<br />

3.3.5 University Information System: This is a system actor that includes a<br />

database and a database management system. It is responsible for storing and<br />

managing the data of the university.<br />

3.4 The Proposed Course Scheduling System<br />

This section presents the proposed system through a scheduling strategy and the<br />

system architecture.<br />

3.4.1 The Scheduling Strategy<br />

In general, there are two approaches to the course scheduling problem, namely<br />

centralized and de-centralized. Both approaches have their own advantages and<br />

disadvantages.<br />

The centralized approach uses software to schedule the timetable for the entire<br />

university. This software has a global view of the problem, with all the<br />

information necessary to create a timetable most effectively. Unfortunately, the size<br />

of the problem is too big, so the course scheduling program is unable to create a good<br />

timetable. Furthermore, the co-operation between faculties and the central scheduling<br />

office is also a difficult problem [5].<br />

The de-centralized approach lets each faculty schedule its own timetable using<br />

its own resources. However, this approach rapidly becomes infeasible when there are<br />

shared resources across faculties. This approach can only work well if the<br />

communication between faculties is reduced to a minimum [5]. Our study proposes a<br />

hybrid centralized and de-centralized approach. The centralized course scheduling<br />

program only schedules for shared resources whereas the decentralized course<br />

scheduling program schedules for the remaining resources of each faculty. The<br />

proposed course scheduling system is shown in Figure 3-3.<br />

The proposed system is designed to consist of jobs that are processed in parallel.<br />

After clients at all faculties send their own data used in course scheduling to the<br />

Central Manager Host, a client in the central office will run the course scheduling<br />

program. In turn, the following three stages will be performed automatically.



[Figure: a client at each faculty submits data for the course scheduling, and the client at the central office submits the job for the course scheduling, to the Central Manager Host; the Central Manager Host sends data and jobs for execution to the Execution Hosts, which schedule for Faculty 1 through Faculty n.]<br />

FIGURE 3-3 Proposed system<br />

3.4.1.1 Stage 1<br />

The Central Manager Host requests a job to execute the centralized course<br />

scheduling program on a remote Execution Host to create a timetable of the shared<br />

resources across the faculties. The result will be written into the database on the<br />

Central Manager Host.<br />

3.4.1.2 Stage 2<br />

The Central Manager Host requests jobs to execute the decentralized course<br />

scheduling program in parallel on remote Execution Hosts. In this stage, each remote<br />

host uses the fixed timetable created in Stage 1 as an initial input, and then tries to<br />

find a timetable for each faculty. The decentralized course scheduling program must<br />

give results that do not conflict with the centralized scheduling output. The results<br />

from all remote nodes will also be written into the database on the Central Manager<br />

Host.<br />

3.4.1.3 Stage 3<br />

The Central Manager Host requests a job to merge the results in the database of<br />

the Central Manager Host. Finally, the entire timetable for the whole university will be<br />

created.<br />

We will use a genetic algorithm to develop both the centralized course<br />

scheduling program and the decentralized course scheduling program. The grid<br />

computing environment is used as infrastructure for distributed and parallel<br />

computing.
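
The three stages can be illustrated with a small, self-contained Python sketch. It is not the thesis implementation: local threads stand in for remote Execution Hosts, trivial slot assignment stands in for the GA, and all function names and the sample data are hypothetical.<br />

```python
from concurrent.futures import ThreadPoolExecutor

def schedule_shared(shared_courses):
    # Stage 1 stand-in: fix one time-slot for every shared course.
    return {course: slot for slot, course in enumerate(sorted(shared_courses))}

def schedule_faculty(faculty, own_courses, fixed):
    # Stage 2 stand-in: place a faculty's own courses in slots
    # that the Stage 1 timetable has not already taken.
    used = set(fixed.values())
    timetable, slot = {}, 0
    for course in sorted(own_courses):
        while slot in used:
            slot += 1
        timetable[course] = slot
        used.add(slot)
    return faculty, timetable

def run_pipeline(shared_courses, faculty_courses):
    fixed = schedule_shared(shared_courses)            # Stage 1: one central job
    with ThreadPoolExecutor() as pool:                 # Stage 2: one job per faculty
        futures = [pool.submit(schedule_faculty, name, courses, fixed)
                   for name, courses in faculty_courses.items()]
        per_faculty = dict(f.result() for f in futures)
    return {"shared": fixed, **per_faculty}            # Stage 3: merge the results

merged = run_pipeline({"Math", "English"},
                      {"Eng": ["Circuits"], "Sci": ["Physics", "Chemistry"]})
print(merged)
```

The point of the sketch is the data flow: Stage 2 jobs only read the fixed Stage 1 output, so they can run independently of each other.<br />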



3.4.2 The System Architecture<br />

The system can be separated into two subsystems: Front End system and Grid<br />

system, shown in Figure 3-4.<br />

The Front End system is based on the 3-tier architecture. This will be used by<br />

the clients in the faculties and in the central office to prepare the data before<br />

scheduling. It includes three components: GUIs, application program and data<br />

storage.<br />

By separating the system into 3 tiers, they can work independently. The<br />

presentation tier involves the graphical user interface. The application tier consists of<br />

the application manager. The last tier, the database tier, consists of a database and its<br />

database management system (DBMS).<br />

[Figure: presentation tier (Client 1 to Client n at the faculties, Client n+1 at the central office); application tier (Application Manager, Scheduling Engine); Commodity Grid (search available machines, send data to machines, send jobs to machines, get results from jobs submitted to the machines); Globus Grid Environment (jobs and data distributed to Node 1, Node 2, ..., Node n); database tier (DBMS, DB, results).]<br />

FIGURE 3-4 System architecture



The Grid system is only used by a client in the central office to start the<br />

scheduling engine that then activates the grid system. The grid system is also a 3-tier<br />

architecture of the following: Client, Commodity Grid (CoG), and Globus Grid<br />

Environment (Grid).<br />

The Client tier is the interface between users and the grid system. It is<br />

responsible for receiving the command to run the scheduling engine.<br />

The CoG tier acts as an interface between the Grid and Client tier. Using the<br />

facilities provided by the API, the CoG is able to allow secure file transfers and also<br />

takes the responsibility of job scheduling and monitoring the status of jobs. There is<br />

one job for centralized course scheduling, and many other jobs for decentralized<br />

course scheduling. When a job needs to be performed, the CoG will look for available<br />

nodes to assign it to. The Monitoring and Discovery Service (MDS) provided by the<br />

Globus Toolkit will provide information about the available nodes within the Grid.<br />

Next, it checks the scheduling data and sends it to the available machines (nodes).<br />

Security (GSI) and reliability are important when transferring data to various nodes<br />

within the Grid. In order to provide for such requirements, the Globus Toolkit<br />

provides a data management component, known as Grid Access to Secondary Storage<br />

(GASS), for secure and reliable data transfers. It uses the GridFTP protocol to<br />

facilitate the checking and transport of data files.<br />

The CoG tier monitors the progress of each job and polls regularly to check if<br />

the jobs are finished. The Grid Resource Allocation Manager (GRAM) provides the<br />

necessary services for these processes. Once compiled, the results will be stored into<br />

the database, and their status will be shown to the Client.<br />

3.5 The Database Design<br />

In the database design, we present an entity-relationship diagram, shown in Figure<br />

3-5. This design also helps us understand the system requirements more clearly.<br />

Data relations between the entities in the diagram are very important.<br />

The course scheduling programs do not work directly on the database; they work<br />

on data structures instead. Therefore, the data and its relations need to be loaded<br />

from the database into the corresponding data structures before scheduling. The<br />

course scheduling programs have to know the data relations so that they are able to<br />

look for enough information to satisfy the hard and soft constraints.<br />
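
As a minimal sketch of this loading step, the Python below copies query rows into in-memory records and builds a lookup from lecturer to course sections. The column names follow labels visible in Figure 3-5 (CourseID, SectionNo, NumStudents, LecturerID), but the record layout, the helper names and the sample rows are assumptions made for illustration.<br />

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class CourseSection:
    course_id: str
    section_no: int
    num_students: int
    lecturer_id: str

def load_sections(rows):
    """Copy query rows (dicts keyed by column name) into in-memory records."""
    return [CourseSection(r["CourseID"], r["SectionNo"],
                          r["NumStudents"], r["LecturerID"]) for r in rows]

def index_by_lecturer(sections):
    """Keep the course -> lecturer relation available without querying the database."""
    index = defaultdict(list)
    for section in sections:
        index[section.lecturer_id].append(section)
    return index

# Hypothetical rows, standing in for the result of a database query.
rows = [
    {"CourseID": "IT101", "SectionNo": 1, "NumStudents": 40, "LecturerID": "L01"},
    {"CourseID": "IT102", "SectionNo": 1, "NumStudents": 35, "LecturerID": "L01"},
    {"CourseID": "IT201", "SectionNo": 2, "NumStudents": 25, "LecturerID": "L02"},
]
sections = load_sections(rows)
by_lecturer = index_by_lecturer(sections)
print([s.course_id for s in by_lecturer["L01"]])
```

With such an index, a check like "is this lecturer teaching two sections at the same time" needs no database access during the search.<br />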

[Figure: entity-relationship diagram. Entities: Building, Faculty, ClassroomGroup, Department, Classroom, Course, Curriculum, Program, Class, CourseSection, Lecturer, BusyTime, connected by relationships such as "has", "consists of", "controls", "takes" and "teaches". The attributes are listed in the data dictionary.]<br />

FIGURE 3-5 Entity relation diagram<br />

The data dictionary is presented in Appendix A.



3.6 The Proposed Genetic Algorithm<br />

This section presents the proposed genetic algorithm, which includes the genetic<br />

representations and the processes to create constraint data, initialize a random population,<br />

evaluate the fitness function, crossover, and mutate chromosomes. Figure 3-6 presents<br />

the high level representation of this algorithm.<br />

a) Start: create the constraint data.<br />

b) Initialize a random population of n chromosomes.<br />

c) If the fitness f(x) of the first chromosome x is satisfied, output the solution and stop.<br />

d) Otherwise, delete some bad chromosomes (low fitness value).<br />

e) While the population size is less than n: select 2 chromosomes as parents; crossover to breed a new chromosome (offspring); mutate; evaluate the fitness value of the offspring; add the offspring to the population in order of fitness value.<br />

f) When the population size is no longer less than n, return to step c).<br />

FIGURE 3-6 High level representation of the proposed genetic algorithm<br />

To generate an optimum result, we apply the genetic algorithm to create one or<br />

more solutions that have various fitness values. Based on comparisons, changes, and<br />

creation of new solutions, we can choose a good solution. Of course, we can obtain a<br />

variety of good solutions.<br />

As shown in Figure 3-6, we will insert a new chromosome that has just mutated<br />

into the right position in the population. The crossover and mutation operations are<br />

repeated to change the population until the first chromosome of the population obtains<br />

a good enough fitness value f(x). However, if repeated too many times, these<br />

operations will create a number of chromosomes that exceeds the preset<br />

population size. To solve this problem, once the number of chromosomes increases up<br />

to a critical value n, we will kill off half of the population.<br />
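
The loop described above can be sketched in Python on a toy problem. This is a hedged illustration, not the thesis code: the bit-string chromosome, the operators, the `max_rounds` safety limit and the uniform random parent selection are all assumptions introduced here; only the sorted population, the stop test on the first chromosome and the culling of half the population come from the text.<br />

```python
import random

def insert_sorted(population, chromosome, fitness):
    """Insert (score, chromosome) so the list stays sorted by descending score."""
    score = fitness(chromosome)
    idx = 0
    while idx < len(population) and population[idx][0] >= score:
        idx += 1
    population.insert(idx, (score, chromosome))

def evolve(new_chromosome, fitness, crossover, mutate,
           n=20, target=1.0, max_rounds=500, seed=0):
    rng = random.Random(seed)
    population = []
    while len(population) < n:                 # initialize a random population
        insert_sorted(population, new_chromosome(rng), fitness)
    for _ in range(max_rounds):                # max_rounds: added safety limit
        if population[0][0] >= target:         # first chromosome is good enough
            break
        del population[n // 2:]                # kill off the worse half
        while len(population) < n:             # refill by crossover and mutation
            (_, mother), (_, father) = rng.sample(population, 2)
            child = mutate(crossover(mother, father, rng), rng)
            insert_sorted(population, child, fitness)
    return population[0]

# Toy problem (hypothetical): maximize the fraction of 1-bits in 8 genes.
BITS = 8

def new_bits(rng):
    return [rng.randint(0, 1) for _ in range(BITS)]

def ones_fraction(bits):
    return sum(bits) / BITS

def one_point(a, b, rng):
    cut = rng.randrange(1, BITS)
    return a[:cut] + b[cut:]

def flip_one(bits, rng):
    i = rng.randrange(BITS)
    return bits[:i] + [1 - bits[i]] + bits[i + 1:]

best_score, best = evolve(new_bits, ones_fraction, one_point, flip_one)
print(best_score, best)
```

Keeping the population sorted at insertion time means the stop test only ever has to look at the first element.<br />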

3.6.1 Representations<br />

This section defines the genetic representations of the chromosomes, the genes,<br />

and the population.<br />

3.6.1.1 Chromosomes<br />

A chromosome is a solution, in our case a timetable of the university. The<br />

timetable contains a number of sub-timetables of classrooms. Each classroom has its<br />

own sub-timetable.<br />

Classroom i<br />

Hour Mon Tue Wed Thu Fri<br />

08:00-09:00 Course 1 Course 2<br />

09:00-10:00 Course 1 Course 2<br />

10:00-11:00 Course 1 Course 2<br />

11:00-12:00<br />

13:00-14:00 Course 3<br />

14:00-15:00 Course 3<br />

15:00-16:00 Course 4<br />

16:00-17:00 Course 4<br />

FIGURE 3-7 Sub-timetable of a classroom<br />

We use a classroom as a ‘storage space’. Courses are scheduled to the time-slots<br />

for each classroom. This direct representation gives a visual view of the timetable. Here, each course i<br />

denotes a section of a course, since courses are divided into sections. Each section is assigned to be taught by<br />

a particular lecturer and attended by a class of students. From the data relations in the<br />

database, we have course → lecturer and course → class. This is a good foundation for<br />

checking the hard and soft constraint conflicts.<br />

Figure 3-8 illustrates an entire chromosome.



[Figure: chromosome x_i with fitness f(x_i), drawn as a stack of sub-timetables for Classroom 1 through Classroom n (columns Mon to Fri); each occupied cell holds a course together with its class, and each cell is one gene (one time-slot).]<br />

FIGURE 3-8 Chromosome<br />

Each chromosome x_i has a fitness value f(x_i). We will use this value to look for a<br />

good chromosome (a good solution).<br />

3.6.1.2 Genes<br />

A gene is a time-slot in a chromosome, so there are many genes in a<br />

chromosome. Each gene contains a 0 if no course is held at that position.<br />

Otherwise, the gene contains a course. Changing the values of the genes creates a<br />

new chromosome.<br />
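
One possible in-memory layout for this representation is sketched below in Python: a chromosome maps each classroom to a grid of genes indexed by day and time-slot. The eight slots per day follow Figure 3-7; the dictionary layout and the `book` helper are assumptions for illustration, not the data structures of the thesis.<br />

```python
DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri"]
SLOTS_PER_DAY = 8          # eight one-hour time-slots, as in Figure 3-7
EMPTY = 0                  # a gene holds 0 when no course is booked

def empty_chromosome(classrooms):
    """One sub-timetable (days x slots grid of genes) per classroom."""
    return {room: [[EMPTY] * SLOTS_PER_DAY for _ in DAYS] for room in classrooms}

def book(chromosome, room, day, first_slot, length, course):
    """Write `course` into `length` consecutive genes; refuse if any is taken."""
    genes = chromosome[room][DAYS.index(day)]
    wanted = genes[first_slot:first_slot + length]
    if first_slot + length > SLOTS_PER_DAY or any(g != EMPTY for g in wanted):
        return False
    for slot in range(first_slot, first_slot + length):
        genes[slot] = course
    return True

timetable = empty_chromosome(["Classroom 1", "Classroom 2"])
print(book(timetable, "Classroom 1", "Mon", 0, 3, "Course 1"))  # free slots
print(book(timetable, "Classroom 1", "Mon", 2, 2, "Course 2"))  # overlaps Course 1
print(timetable["Classroom 1"][0])
```

Because a gene can hold only one course, two courses can never occupy the same classroom at the same time in this layout.<br />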

3.6.1.3 Population<br />

A population is a set of n chromosomes, or n solutions. The population is<br />

always sorted in decreasing order of the chromosomes’ fitness values. As a<br />

result, the first chromosome has the highest fitness value and is thus a candidate for the best<br />

solution, as illustrated in Figure 3-9.<br />

[Figure: a population drawn as a stack of chromosomes x_1, x_2, ..., x_n with fitness values f(x_1), f(x_2), ..., f(x_n).]<br />

FIGURE 3-9 Population



3.6.2 Creating Constraint Data<br />

Figure 3-10 presents processes to prepare data before scheduling.<br />

[Figure: user input (faculties, departments, curriculums, classrooms, lecturers, courses, classes, assignments) forms the constraint data, which are stored into data structures; together with the GA parameters they feed the GA, which produces the timetable.]<br />

FIGURE 3-10 Creating constraint data<br />

All data and their relations, plus the GA parameters, have to be prepared before<br />

running the GA. The data about each faculty, department, curriculum, classroom,<br />

lecturer, course, class and teaching assignment are entered into the database by the<br />

users. A program module then automatically extracts these data and stores them in the<br />

data structures. List data structures are used because they are flexible for<br />

designing the algorithms. The GA parameters, such as the population size, mutation<br />

and crossover rates, and penalty costs for the unsatisfied constraints, are also prepared<br />

as variables in the program.<br />

3.6.3 Initializing a Random Population of Chromosomes<br />

a) Start: initialize an empty population.<br />

b) While the population size is less than n: create a random chromosome x; evaluate the fitness f(x) for the new chromosome x; add the new chromosome x to the population in order of fitness.<br />

c) Stop when the population size is no longer less than n.<br />

FIGURE 3-11 Algorithm for initializing a random population



A population is a list of n chromosomes. Starting with an empty population, one<br />

after another we create and add new random chromosomes into this population.<br />

Pseudo code for creating a random chromosome is given in Figure 3-12.<br />

For each course<br />

    n = number of time-slots needed for this course (= number of credits)<br />

    Repeat<br />

        Randomly select a classroom from the list of classrooms that are permissible for this course<br />

        Search for n free time-slots in the chosen classroom<br />

        If (n free time-slots are found)<br />

            Book the current course to these time-slots<br />

    Until (course is booked)<br />

FIGURE 3-12 Pseudo code for creating a random chromosome<br />
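
The pseudo code of Figure 3-12 could be rendered in Python roughly as follows. Several details are assumptions not stated in the figure: the n time-slots are taken as consecutive slots on one randomly chosen day, a `max_tries` limit replaces the open-ended Repeat loop so the sketch always terminates, and the course and classroom names are hypothetical.<br />

```python
import random

DAYS = 5
SLOTS_PER_DAY = 8

def find_free_run(day_genes, n):
    """Return the start index of the first run of n free (0) genes, or None."""
    run = 0
    for i, gene in enumerate(day_genes):
        run = run + 1 if gene == 0 else 0
        if run == n:
            return i - n + 1
    return None

def random_chromosome(courses, permitted_rooms, rng, max_tries=1000):
    """courses: {course: credits}; permitted_rooms: {course: [classroom, ...]}."""
    rooms = {room for allowed in permitted_rooms.values() for room in allowed}
    chromosome = {room: [[0] * SLOTS_PER_DAY for _ in range(DAYS)] for room in rooms}
    for course, credits in courses.items():
        for _ in range(max_tries):             # "Repeat ... Until (course is booked)"
            room = rng.choice(permitted_rooms[course])
            day = rng.randrange(DAYS)
            start = find_free_run(chromosome[room][day], credits)
            if start is not None:
                chromosome[room][day][start:start + credits] = [course] * credits
                break
        else:
            raise RuntimeError(f"could not book {course}")
    return chromosome

courses = {"Course 1": 3, "Course 2": 2}
permitted = {"Course 1": ["R1"], "Course 2": ["R1", "R2"]}
chromosome = random_chromosome(courses, permitted, random.Random(1))
for room, days in sorted(chromosome.items()):
    print(room, days)
```

Each course ends up occupying exactly as many genes as it has credits, in one of its permitted classrooms.<br />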

3.6.4 Evaluating Fitness Function<br />

As described above, each chromosome x has a fitness value f(x). In this section, we discuss how to compute f(x).

Assume that we have m hard constraints. Let $Hc_i$ denote the number of conflicts with hard constraint i, where i = 1..m. Each hard constraint i is assigned a penalty cost $Penalty\_hc_i$. We use $f_1(x)$ to denote the fitness value with respect to the hard constraints.

$$f_1(x) = \frac{1}{1 + \sum_{i=1}^{m} Hc_i \cdot Penalty\_hc_i} \qquad \text{(Eq. 3-1)}$$

Similarly, assume that we have n soft constraints. Let $Sc_j$ denote the number of conflicts with soft constraint j, where j = 1..n. Each soft constraint j is assigned a penalty cost $Penalty\_sc_j$. We use $f_2(x)$ to denote the fitness value with respect to the soft constraints.

$$f_2(x) = \frac{1}{1 + \sum_{j=1}^{n} Sc_j \cdot Penalty\_sc_j} \qquad \text{(Eq. 3-2)}$$

Thus, the fewer conflicts a chromosome has, the higher its values of $f_1(x)$ and $f_2(x)$. We use f(x) to denote the overall fitness value of the chromosome x.

$$f(x) = W_1 f_1(x) + W_2 f_2(x) \qquad \text{(Eq. 3-3)}$$

where $W_1$ and $W_2$ denote the weights of the hard and soft constraints respectively. We will do experiments to identify suitable values for these weights.

In this study, we design a course scheduling algorithm to find solutions that have the highest fitness value f(x). Because this is a heuristic search, we keep examining solutions with high fitness values until we find one whose $f_1(x)$ is equal to 1, that is, one that violates no hard constraint.
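As an illustration of Eq. 3-1 to Eq. 3-3, the fitness computation can be sketched in Python. The function names and the example penalty costs and weights are assumptions made for this sketch; the thesis determines suitable weights experimentally.

```python
def component_fitness(conflicts, penalties):
    """1 / (1 + sum of conflict count times penalty cost), as in Eq. 3-1 and Eq. 3-2."""
    return 1.0 / (1.0 + sum(c * p for c, p in zip(conflicts, penalties)))

def fitness(hard_conflicts, hard_penalties, soft_conflicts, soft_penalties, w1, w2):
    """Weighted sum of the hard and soft fitness values, as in Eq. 3-3."""
    f1 = component_fitness(hard_conflicts, hard_penalties)
    f2 = component_fitness(soft_conflicts, soft_penalties)
    return w1 * f1 + w2 * f2
```

With no hard conflicts, `component_fitness` returns exactly 1, which is the stopping condition on $f_1(x)$ used by the search.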

3.6.4.1 Checking Conflicts about Small Classrooms<br />

Each course must be booked to a classroom that is large enough to hold the<br />

students of that course.<br />

Pseudocode for this check is given in Figure 3-13.

Count = 0
For each classroom
    For each day in a week
        For each time-slot in a day
            If (number of students attending the course held in the current classroom > number of seats of the current classroom)
                Count = Count + 1

FIGURE 3-13 Pseudo code for checking small classroom conflicts
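A compact Python version of this check is sketched below. It assumes, for illustration only, that a timetable is available as a list of `(room, day, slot, course)` tuples with one tuple per booked time-slot; this representation and all names are hypothetical.

```python
def count_small_classroom_conflicts(bookings, seats, students):
    """Count booked time-slots whose course has more students than the room has seats.

    bookings: iterable of (room, day, slot, course) tuples, one per booked time-slot
    seats:    {room: number of seats}
    students: {course: number of students}
    """
    return sum(1 for room, _day, _slot, course in bookings
               if students[course] > seats[room])
```

Like Figure 3-13, the sketch counts one conflict per time-slot, so a 3-credit course in a room that is too small contributes three conflicts.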

3.6.4.2 Checking Conflicts Regarding Lecturer’s Busy Time<br />

The courses taught by a lecturer cannot be booked to his/her busy working sessions in a week.

Pseudocode for this check is given in Figure 3-14.

Count = 0
For each lecturer
    For each day in a week
        For each time-slot in a day
            For each classroom
                If (the current lecturer teaches a course held in the current classroom at this time-slot) and (the current lecturer is busy at this time)
                    Count = Count + 1

FIGURE 3-14 Pseudo code for checking lecturer’s busy time



Lecturers register their busy times. This check compares each lecturer's busy times with the time-slots booked for his/her courses. If they overlap, a conflict is counted.
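The comparison described above can be sketched in Python. The booking tuples, the `lecturer_of` mapping and the `blocked` set are assumed representations chosen for this example, not the data structures of the thesis program.

```python
def count_lecturer_time_conflicts(bookings, lecturer_of, blocked):
    """Count booked time-slots that fall in a time the lecturer has registered.

    bookings:    iterable of (room, day, slot, course) tuples
    lecturer_of: {course: lecturer}
    blocked:     set of (lecturer, day, slot) triples registered by the lecturers
    """
    return sum(1 for _room, day, slot, course in bookings
               if (lecturer_of[course], day, slot) in blocked)
```

Passing the set of busy times yields a hard-constraint count; the same function could be called with a set of disliked times to obtain a soft-constraint count.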

3.6.4.3 Checking Conflicts about Preferable Time<br />

Some lecturers dislike teaching in certain working sessions in a week. The course scheduling program tries to avoid booking their courses in these sessions. Any course that is nevertheless booked in a disliked session is counted as a soft-constraint conflict.

Pseudocode for this check is given in Figure 3-15.

Count = 0
For each lecturer
    For each day in a week
        For each time-slot in a day
            For each classroom
                If (the current lecturer teaches a course held in the current classroom at this time-slot) and (the current lecturer dislikes teaching at this time)
                    Count = Count + 1

FIGURE 3-15 Pseudo code for detecting conflicts about preferable times

3.6.4.4 Checking Conflicts about Double Booked Lecturers<br />

A lecturer cannot teach more than one course at the same time.<br />

Pseudocode for this check is given in Figure 3-16.

Count = 0
For each lecturer
    For each day in a week
        For each time-slot in a day
            Booked = 0
            For each classroom
                If (course held in this classroom is taught by the current lecturer)
                    Booked = Booked + 1
            If (Booked > 1)
                Count = Count + 1

FIGURE 3-16 Pseudo code for checking conflicts about double scheduled lecturers



If a lecturer is booked to teach more than one course in the same time-slot, a conflict is counted.
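A minimal Python sketch of the double-booking check follows. It assumes the timetable is given as `(room, day, slot, course)` tuples and that `lecturer_of` maps each course to its lecturer; both are illustrative assumptions.

```python
from collections import Counter

def count_double_booked_lecturers(bookings, lecturer_of):
    """Count (lecturer, day, slot) combinations that hold more than one booking."""
    load = Counter((lecturer_of[course], day, slot)
                   for _room, day, slot, course in bookings)
    return sum(1 for booked in load.values() if booked > 1)
```

The `Counter` plays the role of the `Booked` variable in Figure 3-16: it is incremented once for every classroom in which the lecturer is booked at a given time-slot.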

3.6.4.5 Checking Conflicts about Double Scheduled Classes<br />

Courses attended by the same class of students have to be scheduled at different times so that all students of that class can attend.

Pseudocode for this check is given in Figure 3-17.

Count = 0
For each class
    For each day in a week
        For each time-slot in a day
            Booked = 0
            For each classroom
                If (the course held in this classroom at the current time-slot is studied by the current class)
                    Booked = Booked + 1
            If (Booked > 1)
                Count = Count + 1

FIGURE 3-17 Pseudo code for checking conflicts about double scheduled classes

A class cannot be booked to study more than one course in the same time-slot. If it is, a conflict is counted.

3.6.4.6 Checking Conflicts about Double Scheduled Courses<br />

Every course must be scheduled exactly once in a week.<br />

Pseudocode for this check is given in Figure 3-18.

Count = 0
For each course
    Booked = 0
    For each classroom
        For each day in a week
            For each time-slot in a day
                If (the current course is held in this time-slot)
                    Booked = Booked + 1
    If (Booked > the number of credits of the current course)
        Count = Count + 1

FIGURE 3-18 Pseudo code for checking conflicts about double scheduled courses



A course is booked to time-slots according to its number of credits. In our study, a course can have 1, 2, 3, or 4 credits. We stipulate that a course with n credits is scheduled to n consecutive time-slots in one day. For instance, course MAT125 has 3 credits, so it has to be scheduled to 3 consecutive time-slots. In any other case, a conflict is counted.
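The rule stated above can be sketched in Python. Note that this sketch checks the full rule from the text (exactly n consecutive time-slots in one classroom on one day), which is stricter than the over-booking count of Figure 3-18; the tuple representation and names are assumptions for illustration.

```python
from collections import defaultdict

def count_course_booking_conflicts(bookings, credits):
    """Count courses not booked to exactly n consecutive time-slots in one room and day.

    bookings: iterable of (room, day, slot, course) tuples
    credits:  {course: n}, the number of credits of each course
    """
    cells = defaultdict(list)
    for room, day, slot, course in bookings:
        cells[course].append((room, day, slot))
    conflicts = 0
    for course, n in credits.items():
        booked = cells[course]
        slots = sorted(slot for _room, _day, slot in booked)
        one_place = len({(room, day) for room, day, _slot in booked}) == 1
        consecutive = bool(slots) and slots == list(range(slots[0], slots[0] + n))
        if not (one_place and consecutive):
            conflicts += 1
    return conflicts
```

A course that is missing from the timetable, split across days, booked with a gap, or booked too many times is counted once.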

3.6.5 Crossover<br />

Two chromosomes from a population are chosen at random as mother and<br />

father. A new offspring is generated by creating an empty chromosome, then inserting<br />

alternately genes (time-slots) from the mother and father, as illustrated in Figure 3-19.<br />

[Figure 3-19: three timetable grids, Chromosome x (Mother), Chromosome y (Father) and New chromosome z (Offspring). Each grid shows classrooms 1 to n with columns Mon to Fri and the courses and classes booked in each time-slot.]

FIGURE 3-19 Crossover



The new offspring starts as an empty chromosome and is then filled alternately with genes from the mother and the father. Because an n-credit course is scheduled to n consecutive time-slots, consecutive time-slots have to be copied together. To facilitate this, all time-slots of a morning or afternoon working session are copied as a unit from the mother or the father to the new offspring.

The new offspring is usually not a valid timetable, so it needs to be repaired. If a course has not been scheduled yet, it has to be scheduled. Conversely, if a course has been scheduled more than once in a week, the extra bookings have to be removed.

Pseudocode for crossover is given in Figure 3-20.

Crossover rate pc = 0.5
Father x = a chromosome chosen randomly from the population
Mother y = a chromosome chosen randomly from the population (y ≠ x)
New offspring z = an empty chromosome
For each day in a week
    For each working-session in [morning, afternoon]
        For each classroom
            If (random(100) < pc*100)
                Copy the time-slots of the current working-session from father x to the new offspring z
            Else
                Copy the time-slots of the current working-session from mother y to the new offspring z
Mutate the new offspring z
Repair the new offspring z
Calculate fitness value for the new offspring z
Insert the new offspring z into the population in order of fitness value

FIGURE 3-20 Pseudo code for crossover

If the crossover rate pc is chosen to be 50%, on average 50% of the genes are copied from the father and 50% from the mother to the new offspring.
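The session-wise copying can be sketched in Python as below. The chromosome layout (a dictionary from classroom to a list of days, each holding a list of working sessions) and all names are assumptions of this sketch; the mutation and repair steps are deliberately left out.

```python
import random

def crossover(father, mother, pc, rng):
    """Build an offspring by copying whole working sessions from either parent.

    A chromosome is modelled as {room: days}, where days[d][s] is the list of
    courses booked in working session s (morning or afternoon) of day d.
    """
    child = {}
    for room in father:
        child[room] = []
        for day in range(len(father[room])):
            sessions = []
            for s in range(len(father[room][day])):
                parent = father if rng.random() < pc else mother
                sessions.append(list(parent[room][day][s]))   # copy the session as a unit
            child[room].append(sessions)
    return child
```

Copying a session as a unit keeps the consecutive time-slots of a course together, but the offspring may still contain missing or duplicated courses and would need the repair step described above.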

3.6.6 Mutation<br />

A new offspring that has just been created by crossover will be mutated with a<br />

mutation rate. This is done via the following process: go through each gene and swap<br />

its content with another gene in the same chromosome.



As mentioned in the previous section, a course has to be scheduled to successive<br />

time-slots, so we have to swap the successive time-slots booked <strong>for</strong> a course with<br />

other successive time-slots. To facilitate this, we choose all time-slots of a working<br />

session to swap with those of another, as illustrated in Figure 3-21.<br />

[Figure 3-21: Chromosome x shown as the weekly timetables (Mon to Fri) of classrooms i and j, with the contents of two working sessions swapped with each other.]

FIGURE 3-21 Mutation

Pseudocode for mutation is given in Figure 3-22.

Mutation rate pm = 0.02
For each classroom
    For each day in a week
        For each working-session in [morning, afternoon]
            If (random(100) < pm*100)
                R = a classroom chosen randomly from the classroom group of the current classroom
                Swap all time-slots of the current working-session of the current classroom with those of classroom R

FIGURE 3-22 Pseudo code for mutating a chromosome

Because a course may only be scheduled in classrooms of its assigned classroom group, any swap has to stay within that classroom group.

If the mutation rate is chosen to be 2%, on average only 2% of the genes have their contents swapped with others.
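A Python sketch of this mutation is given below. It assumes the same hypothetical chromosome layout of classroom, day and working session, a `group_of` mapping from classroom to classroom group, and that the swap partner is taken at the same day and working session; none of these details are fixed by the thesis text.

```python
import random

def mutate(chromosome, group_of, pm, rng):
    """Swap whole working sessions between classrooms of the same group, in place."""
    rooms_in = {}
    for room, group in group_of.items():
        rooms_in.setdefault(group, []).append(room)
    for room in chromosome:
        for day in range(len(chromosome[room])):
            for s in range(len(chromosome[room][day])):
                if rng.random() < pm:
                    # pick a partner classroom R from the same classroom group
                    other = rng.choice(rooms_in[group_of[room]])
                    mine, theirs = chromosome[room][day][s], chromosome[other][day][s]
                    chromosome[room][day][s], chromosome[other][day][s] = theirs, mine
    return chromosome
```

Because whole sessions are exchanged, the consecutive time-slots of each course stay together, and no course leaves its classroom group.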

3.7 The System for Experiment

The Globus Toolkit 2.2 is used as middleware to develop our grid computing environment [7, 8]. This section presents the main steps for installing and setting up this environment.

An Ethernet LAN and three Intel Pentium machines were used to build the grid environment. Red Hat Linux 9 and Globus Toolkit 2.2 were installed and set up on them. Figure 3-23 presents this environment with the host name and functions of each machine.

m1.kmitnb.ac.th (submits jobs, receives output)
- Globus client
- J2sdk1.4, Java CoG Kit 1.1
- MySQL 4.0

m2.kmitnb.ac.th
- Centralized course scheduling program
- Decentralized course scheduling program
- Globus server
- GIIS, GRIS
- CA
- NTP server

m3.kmitnb.ac.th
- Decentralized course scheduling program
- Globus server
- GRIS

FIGURE 3-23 Hardware and software for each machine

The host names are m1, m2 and m3. Each machine should have a clock speed of at least 500 MHz, at least 128 MB of memory and a hard drive of at least 8 GB.

We configure the Monitoring and Discovery Service (MDS) to have one Grid Index Information Service (GIIS) on machine m2, which collects the data reported by the Grid Resource Information Service (GRIS) on each of the machines, as shown in Figure 3-24.

The GRIS servers send information about their respective hosts to the GIIS. We use this information to find the available machines. The user can query the GIIS from the client machine m1. The machine m2 is also used as the Certificate Authority (CA) machine.

[Figure 3-24: diagram of m1.kmitnb.ac.th (grid-info-search), m2.kmitnb.ac.th (GIIS, GRIS) and m3.kmitnb.ac.th (GRIS).]

FIGURE 3-24 MDS configuration

The MDS is secured so that only certified users can access the GIIS and only certified GRIS servers can register to send information to the GIIS. The machine m2 is also used as a Network Time Protocol (NTP) server, and the other machines (m1 and m3) are configured as NTP clients. NTP is needed because the grid requires the clocks on all of the machines to be synchronized.

The installation and setup process is presented in detail in Appendix B.

3.8 The Grid Components<br />

This section introduces the following grid components: broker, scheduler, and<br />

job and resource management.<br />

3.8.1 Broker<br />

The broker identifies the available resources to utilize within the grid environment. The Globus Toolkit 2.2 does not provide a broker implementation, but it provides the necessary functions and framework to create one through the MDS component.

The broker communicates with the GIIS and GRIS servers via the LDAP protocol supported by the Globus Toolkit 2.2. The broker can also be linked with other resource information stored in databases or plain files, as shown in Figure 3-25.

In our study, the broker uses the LDAP APIs provided by the Globus Toolkit 2.2 to send requests to the GIIS server located on machine m2.

The complete source code for the broker is given in the file GridInfoSearch.java in Appendix E.

[Figure 3-25: the application and the broker on m1.kmitnb.ac.th send an LDAP query to the GIIS on m2.kmitnb.ac.th, which is fed by the GRIS servers, including the one on m3.kmitnb.ac.th.]

FIGURE 3-25 Working with a broker

When called, the GIIS server returns a list of the available hosts within the grid. The following resource information is gathered for each host:

- Host name<br />

- CPU speed (MHz)<br />

- Number of CPU(s)<br />

- Free CPU Percentage<br />

The list of available hosts will be sorted by the weight that measures CPU workload.

$$Weight_{host} = \frac{CPU_{speed} \times CPU_{count} \times CPU_{load}}{100} \qquad \text{(Eq. 3-4)}$$

where $CPU_{speed}$ is the CPU speed, $CPU_{count}$ is the number of CPU(s), and $CPU_{load}$ is the current CPU workload.

The most available host will be selected to run a new job.
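The ranking by Eq. 3-4 can be sketched in Python as follows. The sketch makes one explicit assumption that the text leaves open: $CPU_{load}$ is taken to be the free CPU percentage reported for each host, so that the host with the largest weight is the most available one. The `Host` class and the sample values are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Host:
    name: str
    cpu_mhz: float           # CPU speed (MHz)
    cpu_count: int           # number of CPUs
    free_cpu_percent: float  # assumed to be the CPU_load term of Eq. 3-4

def weight(host):
    """Weight of a host as in Eq. 3-4 (under the assumption stated above)."""
    return host.cpu_mhz * host.cpu_count * host.free_cpu_percent / 100.0

def rank_hosts(hosts):
    """Return the hosts sorted from most to least available."""
    return sorted(hosts, key=weight, reverse=True)
```

For example, a 1000 MHz host that is 50% free gets weight 500, while an 800 MHz host that is 90% free gets weight 720 and would be selected first.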



The complete source code for managing the available hosts is given in the file AvailableHost.java in Appendix E.

3.8.2 Job Scheduler<br />

The job scheduler schedules the individual jobs to run on the individual hosts. Hamscher et al. [40] presented three job scheduling paradigms for a grid: centralized, hierarchical and distributed. Our study uses a centralized scheduling system. In addition, because the Globus Toolkit does not have its own job scheduler, our study proposes one.

In a centralized scheduling paradigm, a central machine acts as a resource manager that schedules jobs to all the surrounding hosts within the grid environment. Figure 3-26 presents the architecture of this paradigm.

[Figure 3-26: jobs enter a central scheduler, which dispatches Job 1, Job 2 and Job 3 to Host 1, Host 2 and Host 3.]

FIGURE 3-26 Centralized scheduling

In this scenario, the jobs are first submitted to the central scheduler, which then dispatches them to the appropriate hosts. Jobs that cannot be started on a host are normally stored in a central job queue for a later start.

In our study, the central scheduler is implemented on machine m1. There are two kinds of jobs: the centralized course scheduling job and the decentralized course scheduling jobs. These jobs are run on machines m2 and m3. Figure 3-27 presents the proposed algorithm for the centralized scheduling.


Start
Stage 1
    Request the centralized course scheduling job to be run on a designated host
    Wait for the results
    If (the job fails) repeat Stage 1
Stage 2
    Repeat
        Select a job from the list of all decentralized course scheduling jobs
        Search a host having the lowest load
        Request the decentralized course scheduling job to be run on the searched host
    Until (all decentralized course scheduling jobs are requested)
Stage 3
    While (not all jobs are done)
        Select a job from the list of all decentralized course scheduling jobs
        Get status of the job
        If (the job failed)
            Search a host having the lowest load
            Request the failed job to be run on the searched host
End

FIGURE 3-27 Job scheduler for the grid computing environment

The algorithm can be divided into three stages:

3.8.2.1 Stage 1<br />

The centralized course scheduling job is requested to be executed on a designated host, machine m2. The system waits for the results and resubmits the job if it fails.



3.8.2.2 Stage 2<br />

After the centralized course scheduling job has been executed successfully, all decentralized course scheduling jobs are requested to be executed on the remote machines m2 and m3.

No data are exchanged between the decentralized course scheduling jobs, so these jobs can be requested one after another and run in parallel in the grid.

After each job is submitted to a host, the choice of the most available host is updated.

3.8.2.3 Stage 3<br />

The system monitors all the decentralized course scheduling jobs and resubmits a job if it fails.

The complete source code for this job scheduler is given in the file Scheduling.java in Appendix E.
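The control flow of the three stages can be sketched in Python. This is a simplified, synchronous illustration and not the code of Scheduling.java: `submit` and `pick_host` are hypothetical callables standing in for GRAM job submission and the broker, the jobs are run one at a time rather than in parallel, and the `max_attempts` limit is an added assumption.

```python
def run_jobs(central_job, faculty_jobs, submit, pick_host, central_host, max_attempts=3):
    """Run the centralized job first, then every decentralized job, retrying failures.

    submit(job, host) -> bool  returns True when the job finished successfully
    pick_host()       -> host  returns the currently most available host
    Returns a dictionary {job: host on which it finally succeeded}.
    """
    def run_with_retry(job, choose_host):
        for _ in range(max_attempts):
            host = choose_host()
            if submit(job, host):
                return host
        raise RuntimeError(f"{job} failed {max_attempts} times")

    # Stage 1: the centralized job runs on the designated host and is resubmitted on failure
    placement = {central_job: run_with_retry(central_job, lambda: central_host)}
    # Stages 2 and 3: each decentralized job goes to the most available host;
    # a failed job is resubmitted to the host that is most available at that moment
    for job in faculty_jobs:
        placement[job] = run_with_retry(job, pick_host)
    return placement
```

Calling `pick_host` again before every attempt mirrors the remark above that the most available host is updated after each submission.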

3.8.3 Job and Resource Management<br />

The job and resource management component submits a job to a particular resource, queries the job status, and resubmits the job if it fails.

In the Java CoG Kit, job and resource management is done by using the Grid Resource Allocation Manager (GRAM) and the Global Access to Secondary Storage (GASS), as shown in Figure 3-28.

FIGURE 3-28 Overview of GRAM and GASS

The GRAM is a module that provides remote execution and status management of the execution. When a job is submitted by a client, the request is sent to the remote host and handled by the gatekeeper daemon located on the remote host. The gatekeeper then creates a job manager to start and monitor the job. When the job is finished, the job manager sends the status information back to the client and terminates.

3.8.3.1 Job<br />

In Globus terminology, a job is a binary executable or command to be run on a remote resource (machine). In order to run the job, the remote server must have the Globus Toolkit installed. The remote server is also referred to as a gatekeeper.

In our case, there are two jobs, both executable programs written in C: the centralized course scheduling program and the decentralized course scheduling program. The centralized program schedules the courses whose lecturers are invited from other faculties and the courses whose students come from other faculties. The decentralized program schedules the courses of each particular faculty that have not yet been scheduled by the centralized program.

3.8.3.2 The Resource Specification Language (RSL)

RSL is the language used by the clients to submit a job. All job submission requests are described in RSL, including the executable file and the conditions under which it must be executed.

The following sample RSL string requests that the file decentralizedscheduling.exe be executed once on a remote host. The directory of this file is also specified.

&(executable = decentralizedscheduling.exe)
(directory = /usr/study/coursescheduling)
(arguments = facultyID)(count=1)
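As a small illustration, an RSL string of this form could be assembled by a helper like the one below. The helper `build_rsl` is hypothetical and not part of the Globus Toolkit or the Java CoG Kit; it only concatenates the `executable`, `directory`, `arguments` and `count` attributes used in the sample, and it assumes that the values need no quoting.

```python
def build_rsl(executable, directory, arguments=(), count=1):
    """Assemble a GRAM RSL string from the attributes used in the sample above."""
    parts = [f"(executable={executable})", f"(directory={directory})"]
    if arguments:
        parts.append("(arguments=" + " ".join(arguments) + ")")
    parts.append(f"(count={count})")
    return "&" + "".join(parts)

rsl = build_rsl("decentralizedscheduling.exe",
                "/usr/study/coursescheduling",
                ["facultyID"])
```

One such string would be built for each decentralized course scheduling job, with the faculty identifier passed as the argument.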



3.8.3.3 The Gatekeeper<br />

The gatekeeper daemon establishes secure communication between the clients and the servers. It communicates with the GRAM client and authenticates the right to submit jobs. After authentication, the gatekeeper forks a job manager and delegates to it the authority to communicate with the clients.

The Java CoG Kit provides a personal gatekeeper that can be used as a lightweight alternative to the Globus gatekeeper. The gatekeeper uses a gridmap file to map Globus credentials to local users. The gridmap file is introduced in Appendix B.

3.8.3.4 Job manager<br />

The job manager is created by the gatekeeper daemon as part of the job request process. It provides the interfaces that control the allocation of each local resource manager. The functions of the job manager are:

a) Parse the RSL.
b) Allocate job requests to the local resource managers. The local resource manager is usually a job scheduler such as PBS, LSF, or LoadLeveler. However, our study does not use these job schedulers.
c) Send callbacks to clients, if necessary.
d) Receive status and cancel requests from clients.
e) Send output results to clients using the GASS, if requested.

The GRAM uses the GASS to transfer output files from servers to clients. Some APIs are provided under the Grid Security Infrastructure (GSI) protocol to secure these transfers.

The complete source code for the job submission is given in the file GassJob.java in Appendix E.


CHAPTER 4<br />

EXPERIMENTAL RESULTS<br />

The system for the experiments was installed and set up as outlined in Section 3.7. This chapter discusses the results of our genetic algorithm (GA) and the grid computing environment. Section 4.1 presents the data used for the experiments. Section 4.2 presents the experiments and discussions. Section 4.3 presents sample results.

4.1 The Data for the Experiments

The data used for the experiments were collected from three departments of three different faculties at Cantho University (Vietnam): the Department of English (Faculty of Education), the Department of Electrical and Computer Engineering (Faculty of Engineering), and the Department of Computer Science (Faculty of Science). Twelve classes are to be scheduled for 76 course sections from their curricula in the first semester of 2006. The classes belong to the Bachelor of Science in Computer Science (BSCS04A, BSCS04B, BSCS05A, BSCS05B, BSCS06A, and BSCS06B) and the Bachelor of Science in Electrical Engineering (BSEE04A, BSEE04B, BSEE05A, BSEE05B, BSEE06A, and BSEE06B), as shown in Table 4-1.

TABLE 4-1 Courses fulfilled by each class

Class     Semester  Course  Section  Credits  Number of Students
BSCS04A   5         CSC329  001      3        30
BSCS04A   5         CSC330  001      2        30
BSCS04A   5         ENL307  001      3        30
BSCS04A   5         CSC326  001      3        30
BSCS04A   5         CSC327  001      2        30
BSCS04A   5         CSC328  001      2        30
BSCS04B   5         CSC326  002      3        30
BSCS04B   5         CSC327  002      2        30
BSCS04B   5         CSC328  002      2        30
BSCS04B   5         CSC329  002      3        30
BSCS04B   5         CSC330  002      2        30
BSCS04B   5         ENL307  001      3        30
BSCS05A   3         ECE218  001      2        30
BSCS05A   3         MAT220  001      3        30
BSCS05A   3         CSC211  002      4        30
BSCS05A   3         CSC215  002      2        30
BSCS05A   3         CSC221  002      3        30
BSCS05A   3         ECE217  001      2        30
BSCS05A   3         CSC210  002      3        30
BSCS05B   3         CSC215  001      2        30
BSCS05B   3         CSC221  001      3        30
BSCS05B   3         ECE217  002      2        30
BSCS05B   3         ECE218  002      2        30
BSCS05B   3         MAT220  002      3        30
BSCS05B   3         CSC211  001      4        30
BSCS05B   3         CSC210  001      3        30
BSCS06A   1         CSC120  002      3        30
BSCS06A   1         CSC127  002      2        30
BSCS06A   1         ENL101  001      3        30
BSCS06A   1         MAT125  001      3        30
BSCS06A   1         CSC110  002      2        30
BSCS06A   1         CSC113  002      2        30
BSCS06A   1         CSC115  002      2        30
BSCS06B   1         MAT125  001      3        30
BSCS06B   1         CSC113  001      2        30
BSCS06B   1         CSC115  001      2        30
BSCS06B   1         CSC120  001      3        30
BSCS06B   1         CSC127  001      2        30
BSCS06B   1         ENL101  001      3        30
BSCS06B   1         CSC110  001      2        30
BSEE04A   5         ECE320  001      2        30
BSEE04A   5         ECE325  001      3        30
BSEE04A   5         ECE326  001      2        30
BSEE04A   5         ENL308  001      3        30
BSEE04A   5         MAT322  001      2        30
BSEE04A   5         SIE305  001      3        30
BSEE04B   5         ECE320  002      2        30
BSEE04B   5         ECE325  002      3        30
BSEE04B   5         ECE326  002      2        30
BSEE04B   5         ENL308  002      3        30
BSEE04B   5         MAT322  002      2        30
BSEE04B   5         SIE305  002      3        30
BSEE05A   3         ECE212  001      3        30
BSEE05A   3         MAT223  001      2        30
BSEE05A   3         PHY241  001      3        30
BSEE05A   3         ECE200  001      2        30
BSEE05A   3         ECE205  001      2        30
BSEE05A   3         ECE203  001      2        30
BSEE05B   3         MAT223  002      2        30
BSEE05B   3         PHY241  002      3        30
BSEE05B   3         ECE200  002      2        30
BSEE05B   3         ECE203  002      2        30
BSEE05B   3         ECE205  002      2        30
BSEE05B   3         ECE212  002      3        30
BSEE06A   1         ENL101  002      3        30
BSEE06A   1         MAT125  002      3        30
BSEE06A   1         CHE103  006      3        30
BSEE06A   1         CHE104  006      2        30
BSEE06A   1         ECE120  001      3        30
BSEE06A   1         ECE102  001      2        30
BSEE06B   1         CHE103  005      3        30
BSEE06B   1         CHE104  005      2        30
BSEE06B   1         ECE102  002      2        30
BSEE06B   1         ENL101  003      3        30
BSEE06B   1         MAT125  002      3        30
BSEE06B   1         ECE120  002      3        30



Twenty-six lecturers are assigned to teach the courses. The classroom group used for each course section is also identified, as shown in Table 4-2.

TABLE 4-2 Lecturer and classroom assignment<br />

Course Section Lecturer Room Group<br />

ENL101<br />

ENL101<br />

ENL101<br />

001<br />

002<br />

003<br />

00001<br />

00001<br />

00001<br />

ENLLECRM<br />

ENLLECRM<br />

ENLLECRM<br />

ENL307<br />

001<br />

00003<br />

ENLLECRM<br />

ENL308<br />

001<br />

00003<br />

ENLLECRM<br />

ENL308<br />

002<br />

00003<br />

ENLLECRM<br />

PHY241 002 00006 PHYLECRM<br />

PHY241 001 00007 PHYLECRM<br />

CSC110 001 00014 CSCLECRM<br />

CSC110 002 00014 CSCLECRM<br />

CSC113 002 00014 CSCCOMLB<br />

CSC115 002 00014 CSCLECRM<br />

CSC120 002 00015 CSCLECRM<br />

CSC127 001 00015 CSCLECRM<br />

CSC127 002 00015 CSCLECRM<br />

CSC210 001 00015 CSCLECRM<br />

CSC113 001 00016 CSCCOMLB<br />

CSC115 001 00016 CSCLECRM<br />

CSC120 001 00016 CSCLECRM<br />

CSC211 001 00016 CSCCOMLB<br />

CSC221 001 00017 CSCLECRM<br />

CSC221 002 00017 CSCLECRM<br />

CSC210 002 00018 CSCLECRM<br />

CSC211 002 00018 CSCCOMLB<br />

CSC215 001 00018 CSCLECRM<br />

CSC215 002 00018 CSCLECRM<br />

CSC326 001 00019 CSCLECRM<br />

CSC326 002 00019 CSCLECRM<br />

CSC327 001 00019 CSCLECRM<br />

CSC327 002 00019 CSCLECRM<br />



CSC329 001 00020 CSCLECRM<br />

CSC329 002 00020 CSCLECRM<br />

CSC330 001 00020 CSCLECRM<br />

CSC328 001 00021 CSCCOMLB<br />

CSC328 002 00021 CSCCOMLB<br />

CSC330 002 00021 CSCLECRM<br />

ECE120 001 00031 ECELECRM<br />

ECE120 002 00031 ECELECRM<br />

ECE200 001 00031 ECEESTLB<br />

ECE200 002 00031 ECEESTLB<br />

ECE102 001 00032 ECELECRM<br />

ECE102 002 00032 ECELECRM<br />

ECE205 002 00032 ECELECRM<br />

ECE212 001 00032 ECELECRM<br />

ECE203 001 00033 ECELECRM<br />

ECE203 002 00033 ECELECRM<br />

ECE205 001 00033 ECELECRM<br />

ECE212 002 00033 ECELECRM<br />

ECE217 001 00034 ECELECRM<br />

ECE217 002 00034 ECELECRM<br />

ECE218 001 00034 ECEDCDLB<br />

ECE218 002 00034 ECEDCDLB<br />

ECE320 001 00035 ECELECRM<br />

ECE320 002 00035 ECELECRM<br />

ECE325 001 00035 ECELECRM<br />

ECE325 002 00035 ECELECRM<br />

ECE326 001 00036 ECEELCLB<br />

ECE326 002 00036 ECEELCLB<br />

SIE305 001 00046 SIELECRM<br />

SIE305 002 00047 SIELECRM<br />

MAT125 001 00059 MATLECRM<br />

MAT125 002 00059 MATLECRM<br />

MAT220 001 00061 MATLECRM<br />

MAT220 002 00061 MATLECRM<br />

MAT223 001 00061 MATLECRM<br />



MAT223 002 00061 MATLECRM<br />

MAT322 001 00063 MATLECRM<br />

MAT322 002 00063 MATLECRM<br />

CHE103 005 00071 CHELECRM<br />

CHE103 006 00071 CHELECRM<br />

CHE104 005 00072 CHEFTCLB<br />

CHE104 006 00073 CHEFTCLB<br />

Similarly, the constraints on classroom size and on lecturers’ time are also prepared.<br />

4.2 The Experiments and Discussions<br />

4.2.1 Experimental Designs<br />

The experiments aim to evaluate the influence of the GA parameter settings and the<br />

influence of the grid computing environment.<br />

The GA proposed in chapter 3 is applied to both the centralized course scheduling<br />

program and the decentralized course scheduling program, with the same GA parameter<br />

values in both. Thus, to evaluate the efficiency of the GA, we only need to test one of<br />

these programs; here, we test the centralized course scheduling program. To evaluate<br />

the influence of the grid computing environment, we use the grid system shown in<br />

section 3.7.<br />

We conduct four separate experiments. The first experiment tests the influence of the<br />

weighting of hard and soft constraints in the fitness function. The second and third<br />

experiments test the influence of the population size and the mutation rate,<br />

respectively, on the speed of evolution. Finally, the fourth experiment tests the<br />

influence of using the grid computing environment.<br />

Course scheduling is an NP-hard problem, and the GA itself is a metaheuristic<br />

algorithm. Therefore, we expect to obtain a good enough solution, if not the best<br />

one. Each experiment runs the models until the GA finds the best solution or until<br />

the GA cannot improve the fitness value for 300 consecutive generations. The model<br />

that reaches a given fitness value faster, averaged over many runs, is the better one.<br />
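The stopping rule described above can be sketched as follows. This is only an illustrative sketch, not the thesis implementation: the function name `run_until_stagnation`, the `step` callback (assumed to advance the GA by one generation and return the best fitness found in that generation), and the assumption that the best possible fitness value is 1.0 are all hypothetical.

```python
from typing import Callable, Tuple


def run_until_stagnation(
    step: Callable[[], float],
    target_fitness: float = 1.0,   # assumed value of a "best" solution
    patience: int = 300,           # generations allowed without improvement
) -> Tuple[float, int]:
    """Run the GA one generation at a time until it reaches the target
    fitness or fails to improve for `patience` consecutive generations.

    Returns the best fitness seen and the number of generations executed.
    """
    best = float("-inf")
    stale = 0
    generation = 0
    while True:
        generation += 1
        fitness = step()
        if fitness > best:
            best, stale = fitness, 0   # improvement resets the counter
        else:
            stale += 1
        if best >= target_fitness or stale >= patience:
            return best, generation
```

Counting only strict improvements means that a run which plateaus at a good but imperfect timetable still terminates, which matches the "good enough solution" expectation stated above.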



4.2.2 Experiment 1: Hard and Soft Constraint Weight Test<br />

The aim of this experiment is to analyze the behavior of the GA as the weights $W_1$<br />

and $W_2$ in the fitness function $f(x) = W_1 f_1(x) + W_2 f_2(x)$ are modified. More details<br />

about this function were presented in section 3.6.4.<br />
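As a minimal sketch of how the weighted sum behaves, the snippet below evaluates it for the three weight pairs tested in this experiment. The function name `weighted_fitness` and the component values 0.9 and 0.4 are hypothetical illustrations, not measured data, and the check that the weights sum to 1 is an assumption drawn from the three pairs used here.

```python
import math


def weighted_fitness(f1: float, f2: float, w1: float, w2: float) -> float:
    """Combine the hard-constraint fitness f1 and soft-constraint fitness f2."""
    # Assumption: the two weights sum to 1, as in all three pairs tested.
    if not math.isclose(w1 + w2, 1.0):
        raise ValueError("w1 + w2 should equal 1")
    return w1 * f1 + w2 * f2


# Hypothetical component values for a single timetable (not measured data).
f1_example, f2_example = 0.9, 0.4
for w1, w2 in [(1.0, 0.0), (0.75, 0.25), (0.5, 0.5)]:
    value = weighted_fitness(f1_example, f2_example, w1, w2)
    print(f"W1={w1}, W2={w2}: f(x)={value:.3f}")
```

With a timetable that satisfies hard constraints better than soft ones, raising the hard-constraint weight raises its overall fitness, which is why the choice of weights changes which individuals survive selection.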

To perform this experiment, the centralized course scheduling program will be<br />

run on one Pentium IV 1.7 GHz machine with the following GA settings:<br />

- Population size : 10<br />

- Crossover rate : 0.5<br />

- Mutation rate : 0.02<br />

- Selection method : Steady state<br />

- Hard constraint weight $W_1$ : Varied<br />

- Soft constraint weight $W_2$ : Varied<br />

This experiment is performed for 3 different pairs of weights, as below:<br />

- $W_1$ = 1.0 and $W_2$ = 0.0<br />

- $W_1$ = 0.75 and $W_2$ = 0.25<br />

- $W_1$ = 0.5 and $W_2$ = 0.5<br />

Each pair of weights is tested 5 times. Figure 4-1 presents the average fitness<br />

value $f_1(x)$ of the hard constraints over 500 generations.<br />

[Figure: line chart “The Fitness Value of Hard Constraints vs Various Weights”. x-axis: Generation (1 to 501); y-axis: Fitness Value f1(x) (0 to 1). Series: W1=1.0 & W2=0.0; W1=0.75 & W2=0.25; W1=0.5 & W2=0.5.]<br />

FIGURE 4-1 The average fitness value of hard constraints vs various weights<br />



This result shows that the GA reaches a high fitness value $f_1(x)$ more rapidly when a<br />

larger weight $W_1$ is used. This is because solutions with a high hard-constraint<br />

fitness value have a greater chance of being selected for survival. When $W_1$ is 1.0, the<br />

GA gives the fastest evolution of the hard constraints.<br />

Now we consider what happens to the fitness value $f_2(x)$ of the soft<br />

constraints. Figure 4-2 presents the average fitness value $f_2(x)$ over 500 generations.<br />

The result likewise shows that the GA reaches a high value of $f_2(x)$ more rapidly when a<br />

larger weight $W_2$ is used. When $W_2$ is 0.5, the GA gives the fastest evolution of the<br />

soft constraints.<br />

[Figure: line chart “The Fitness Value of Soft Constraints vs Various Weights”. x-axis: Generation (1 to 501); y-axis: Fitness Value f2(x) (0 to 1). Series: W1=1.0 & W2=0.0; W1=0.75 & W2=0.25; W1=0.5 & W2=0.5.]<br />

FIGURE 4-2 The average fitness value of soft constraints vs various weights<br />

However, using a larger weight for the hard constraints means using a smaller<br />

weight for the soft constraints, so the two have to be balanced.<br />

In our study, there are nine hard constraints and only one soft constraint. Therefore,<br />

the pair $W_1$ = 0.75 and $W_2$ = 0.25 seems the most suitable one for our GA.<br />

4.2.3 Experiment 2: Population Size Test<br />

The aim of this experiment is to analyze the behavior of the GA as the population<br />

size is modified.<br />



To perform this experiment, the centralized course scheduling program will be<br />

run on one Pentium IV 1.7 GHz machine with the following GA settings:<br />

- Crossover rate : 0.5<br />

- Mutation rate : 0.02<br />

- Selection method : Steady state<br />

- Hard constraint weight $W_1$ : 0.75<br />

- Soft constraint weight $W_2$ : 0.25<br />

- Population size : Varied<br />

This experiment is performed for 3 different population sizes: 5, 10 and 15.<br />

Each population size is tested 5 times. Figure 4-3 charts the average execution time<br />

needed to reach a resultant solution for each population size.<br />

[Figure: horizontal bar chart “The Average Time for a Resultant Solution”. y-axis: Population Size (5, 10, 15); x-axis: Execution Time in Seconds (0 to 7000). Bar values, matched to the population sizes using the discussion in the text and Figure 4-5: population 5, 2652.8 s; population 10, 2842.6 s; population 15, 5829 s.]<br />

FIGURE 4-3 The average execution time for a resultant solution vs population sizes<br />

A large population contains many different individuals, which creates a diversity<br />

of possible solutions. With a large population, the GA can obtain a resultant<br />

solution after a small number of generations. However, our experiment shows that,<br />

in terms of time, the GA with a small population converges to a solution faster than<br />

the GA with a large population. To explain this result, we should revisit the<br />

chromosome representation presented in section 3.6.1. Each chromosome directly<br />

represents a timetable (a solution), so it stores a large amount of data. It also carries<br />

a large amount of related data from the database. As a result, a larger population<br />

needs more memory and more processing time for the GA operations.<br />

This experiment also shows that the smallest population size (five) gives the<br />

fastest GA.<br />

GAs with a large population do not give a faster speed of evolution.<br />

However, to maintain diversity of solutions, it may be safer to keep the population<br />

size larger than the fastest size, although execution is a little slower. We therefore<br />

use a population of 10 for our GA.<br />
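To make "a little slower" concrete, the following sketch compares the three average execution times reported in Figure 4-3. The mapping of bar values to population sizes is an assumption inferred from the text (population 5 is stated to be the fastest, and 2842.6 s reappears in Figure 4-5 for the centralized run with population 10); the variable names are illustrative.

```python
# Average execution times in seconds, keyed by population size.
# Assumption: the value-to-size mapping is inferred from the surrounding text.
avg_seconds = {5: 2652.8, 10: 2842.6, 15: 5829.0}

fastest = min(avg_seconds, key=avg_seconds.get)
relative = {size: t / avg_seconds[fastest] for size, t in avg_seconds.items()}

for size in sorted(relative):
    print(f"population {size}: {relative[size]:.2f}x the time of population {fastest}")
```

Under this reading, moving from 5 to 10 individuals costs only a few percent in execution time, whereas 15 individuals roughly doubles it, which is consistent with the choice of 10 as a compromise between speed and diversity.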

4.2.4 Experiment 3: Mutation Rate Test<br />

The aim of this experiment is to analyze the behavior of the GA as the mutation rate<br />

is modified.<br />

To perform this experiment, the centralized course scheduling program will be<br />

run on one Pentium IV 1.7 GHz machine with the following GA settings:<br />

- Population size : 10<br />

- Crossover rate : 0.5<br />

- Selection method : Steady state<br />

- Hard constraint weight $W_1$ : 0.75<br />

- Soft constraint weight $W_2$ : 0.25<br />

- Mutation rate : Varied<br />

This experiment is performed for 4 different mutation rates: 0.00, 0.02, 0.20 and<br />

0.40. Each rate is tested 5 times. Figure 4-4 charts the average fitness value $f(x)$ over<br />

500 generations for the different mutation rates.<br />

The best mutation rate is found to be 0.02. Mutation rates lower or higher than<br />

this rate give slower evolution, as Figure 4-4 shows clearly. If there is no<br />

mutation (0.00), offspring are passed on directly after crossover without any<br />

change, so the GA tends to become trapped in a local optimum. On the other hand,<br />

high mutation rates cause excessive exploration of the search space: the GA<br />

degenerates towards a random search instead of searching among the offspring of<br />

good parents.<br />
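The role of the mutation rate can be illustrated with a generic per-gene mutation operator. This is a sketch under stated assumptions, not the operator defined in chapter 3: the gene encoding `(day, start time-slot)`, the function name `mutate`, and the ranges of 5 days and 8 time-slots (chosen to match the day and time-slot values that appear in the result tables) are all hypothetical.

```python
import random
from typing import List, Tuple

Gene = Tuple[int, int]  # (day, start time-slot); an illustrative encoding only


def mutate(chromosome: List[Gene], rate: float, rng: random.Random,
           days: int = 5, slots: int = 8) -> List[Gene]:
    """Return a copy of the chromosome in which each gene is replaced by a
    random (day, slot) pair with probability `rate`."""
    child = []
    for gene in chromosome:
        if rng.random() < rate:
            # Mutation: move this "course + section" to a random position.
            child.append((rng.randrange(days), rng.randrange(slots)))
        else:
            child.append(gene)
    return child


rng = random.Random(42)
parent = [(0, 2), (1, 4), (3, 0), (4, 6)]
print(mutate(parent, 0.00, rng))  # no mutation: the child equals the parent
print(mutate(parent, 0.40, rng))  # high rate: many genes are re-randomized
```

At rate 0.00 the operator never introduces new positions, so the population can only recombine what it already contains; at high rates most genes are overwritten, so little of the parents' structure survives.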



[Figure: line chart “The GA with Various Mutation Rates”. x-axis: Generation (1 to 501); y-axis: Fitness Value f(x) (0 to 1). Series: mutation rates 0.00, 0.02, 0.20 and 0.40.]<br />

FIGURE 4-4 The GA with various mutation rates<br />

4.2.5 Experiment 4: Parallel Execution on the Grid Computing Environment<br />

The aim of this experiment is to evaluate the influence of the grid computing<br />

environment on the resultant solutions.<br />

The experiment tests three different models. The first model uses a single<br />

machine to perform the centralized course scheduling strategy introduced in section<br />

3.4.1: the centralized course scheduling program schedules all courses in one<br />

centralized execution. The second model also uses a single machine, but runs the<br />

centralized course scheduling program and the decentralized course scheduling<br />

program serially. First, the centralized course scheduling program schedules all<br />

shared resources; then the decentralized course scheduling program schedules the<br />

remaining resources of each faculty, one faculty after another. Finally, the third<br />

model uses a grid computing environment for parallel execution. First, the<br />

centralized course scheduling program is executed on one machine; then the<br />

decentralized course scheduling program is executed in parallel on remote machines.<br />

Both the centralized course scheduling program and the decentralized course<br />

scheduling program are set up with the following GA settings:<br />



- Population size : 10<br />

- Crossover rate : 0.5<br />

- Mutation rate : 0.02<br />

- Selection method : Steady state<br />

- Hard constraint weight $W_1$ : 0.75<br />

- Soft constraint weight $W_2$ : 0.25<br />

The first and second models are run on a Pentium IV 1.7 GHz machine.<br />

The third model is run on a grid computing environment of 3 machines,<br />

as shown in Figure 3-23. The Central Manager Host m1 is a Pentium III<br />

700 MHz machine. The remote machines m2 and m3 are Pentium IV 1.7 GHz<br />

machines.<br />

Figure 4-5 presents the average execution time of each model over 5<br />

runs. Each model is executed until the GA finds a resultant solution.<br />

[Figure: horizontal bar chart “Execution Time vs Models”. y-axis: Model; x-axis: Execution Time in Seconds (0 to 3000). Bar values: Parallel Execution on the Grid, 439.6 s; Serial Execution, 852.6 s; Centralized Execution, 2842.6 s.]<br />

FIGURE 4-5 The execution time versus various models<br />
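The relative gains between the three models can be worked out directly from the average times in Figure 4-5. The snippet below is a small arithmetic sketch; the dictionary keys and the helper name `speedup` are illustrative.

```python
# Average execution times in seconds, as reported in Figure 4-5.
avg_seconds = {
    "centralized": 2842.6,
    "serial": 852.6,
    "parallel on the grid": 439.6,
}


def speedup(slower: str, faster: str) -> float:
    """Ratio of the two average execution times."""
    return avg_seconds[slower] / avg_seconds[faster]


print(f"serial vs centralized:   {speedup('centralized', 'serial'):.2f}x")
print(f"parallel vs serial:      {speedup('serial', 'parallel on the grid'):.2f}x")
print(f"parallel vs centralized: {speedup('centralized', 'parallel on the grid'):.2f}x")
```

Most of the gain comes from splitting the problem (centralized to serial); running the decentralized jobs in parallel then roughly halves the remaining time.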

The first model is slower than the second model. The first model has a global<br />

view of the whole data set, so it might be expected to give a resultant solution<br />

within a short time. However, it did not. When all the data are centralized and<br />

processed on a single machine, the problem becomes too large: the GA slows down<br />

when it works on large chromosomes with a large number of conflicting hard and<br />

soft constraints. If instead the data are split and processed one part after another<br />

by the centralized course scheduling program and the decentralized course<br />

scheduling program, the overall execution time is shorter.<br />

The parallel execution of the third model is significantly faster than the serial<br />

execution of the second model. This is to be expected: instead of the course<br />

scheduling jobs being performed one after another, some of them are performed in<br />

parallel by different processors, as illustrated in Figure 4-6.<br />

[Figure: timeline diagram of processors against execution time. Parallel execution: the Centralized Course Scheduling Program runs first, followed by the Decentralized Course Scheduling Programs for the Faculty of Science, the Faculty of Engineering and the Faculty of Education, each on its own processor. Serial execution: the Centralized Course Scheduling Program and the three Decentralized Course Scheduling Programs run one after another.]<br />

FIGURE 4-6 Parallel execution versus serial execution<br />

The total execution time for a complete resultant solution of the third model can<br />

be expressed as follows:<br />

Total parallel execution time = Time for the centralized course scheduling +<br />

Max(Time for the decentralized course scheduling on remote machines)<br />
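The formula above, and its serial counterpart for the second model, can be sketched as two small functions. The function names and the per-job times in the example are hypothetical and chosen only for illustration; they are not measurements from this experiment.

```python
from typing import Dict


def total_parallel_time(central: float, decentral: Dict[str, float]) -> float:
    """Centralized job first, then all decentralized jobs at the same time:
    the slowest decentralized job determines when the run finishes."""
    return central + max(decentral.values())


def total_serial_time(central: float, decentral: Dict[str, float]) -> float:
    """Centralized job first, then the decentralized jobs one after another."""
    return central + sum(decentral.values())


# Hypothetical per-job times in seconds (illustration only, not measured).
central_time = 100.0
decentral_times = {"Engineering": 300.0, "Science": 250.0}

print(total_parallel_time(central_time, decentral_times))
print(total_serial_time(central_time, decentral_times))
```

The gap between the two totals grows with the number of faculties, since the serial total adds every decentralized job while the parallel total adds only the longest one.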

The data used by the course scheduling programs are transferred from the<br />

central database to the remote machines once, before processing begins. In addition,<br />

no data are exchanged while the programs are being executed. The time<br />

for network communication is much smaller than the execution time of each program,<br />

so it is not considered in this experiment.<br />

4.3 The Sample Results<br />

This section presents the results obtained by running the third model<br />

described in the previous section.<br />

First, the centralized course scheduling program is executed on machine<br />

m2. It schedules the shared resources, which consist of courses whose lecturers are<br />

invited from other faculties and courses whose students come from other faculties.<br />

The results are presented in Table 4-3. Then the decentralized course scheduling<br />

program is submitted for parallel execution on machines m2 and m3. It<br />

schedules the remaining resources of each faculty. All courses taught by the<br />

Faculty of Education have already been scheduled by the centralized course<br />

scheduling program, so the decentralized course scheduling program only schedules<br />

courses taught by the Faculty of Engineering and the Faculty of Science. The results<br />

are presented in Table 4-4 and Table 4-5.<br />

TABLE 4-3 Timetable created by the centralized course scheduling program<br />

Course Section Classroom Day Time-slot Class Lecturer<br />

ENL307 001 B201A01 3 4->6 BSCS04A 00003<br />

ENL307 001 B201A01 3 4->6 BSCS04B 00003<br />

ECE218 001 B301B02 4 2->3 BSCS05A 00034<br />

ECE217 001 B301A07 2 4->5 BSCS05A 00034<br />

ECE218 002 B301B02 1 6->7 BSCS05B 00034<br />

ECE217 002 B301A06 1 2->3 BSCS05B 00034<br />

ENL101 001 B201A01 2 4->6 BSCS06A 00001<br />

ENL101 001 B201A01 2 4->6 BSCS06B 00001<br />

MAT322 001 B101A09 0 6->7 BSEE04A 00063<br />

ENL308 001 B201A03 4 0->2 BSEE04A 00003<br />

ENL308 002 B201A03 4 4->6 BSEE04B 00003<br />

MAT322 002 B101A10 4 2->3 BSEE04B 00063<br />

MAT223 001 B101A12 1 4->5 BSEE05A 00061<br />

PHY241 001 B102A04 2 0->2 BSEE05A 00007<br />



MAT223 002 B101A08 0 2->3 BSEE05B 00061<br />

PHY241 002 B102A06 3 4->6 BSEE05B 00006<br />

CHE104 006 B103A15 0 2->3 BSEE06A 00073<br />

ENL101 002 B201A02 3 0->2 BSEE06A 00001<br />

MAT125 002 B101A01 2 0->2 BSEE06A 00059<br />

CHE103 006 B103A06 0 4->6 BSEE06A 00071<br />

MAT125 002 B101A01 2 0->2 BSEE06B 00059<br />

CHE103 005 B103A01 4 0->2 BSEE06B 00071<br />

CHE104 005 B103A11 4 6->7 BSEE06B 00072<br />

ENL101 003 B201A01 1 0->2 BSEE06B 00001<br />

TABLE 4-4 Timetable created by the decentralized course scheduling program for<br />

Faculty of Engineering<br />

Course Section Classroom Day Time-slot Class Lecturer<br />

ECE325 001 B301A04 3 0->2 BSEE04A 00035<br />

ECE326 001 B301B01 2 4->5 BSEE04A 00036<br />

SIE305 001 B302A03 4 4->6 BSEE04A 00046<br />

ECE320 001 B301A01 1 2->3 BSEE04A 00035<br />

ECE320 002 B301A10 0 2->3 BSEE04B 00035<br />

ECE325 002 B301A10 2 0->2 BSEE04B 00035<br />

SIE305 002 B302A02 1 4->6 BSEE04B 00047<br />

ECE326 002 B301B01 4 0->1 BSEE04B 00036<br />

ECE212 001 B301A01 3 4->6 BSEE05A 00032<br />

ECE203 001 B301A02 4 0->1 BSEE05A 00033<br />

ECE200 001 B301B05 4 4->5 BSEE05A 00031<br />

ECE205 001 B301A01 1 0->1 BSEE05A 00033<br />

ECE205 002 B301A01 4 4->5 BSEE05B 00032<br />

ECE212 002 B301A09 2 0->2 BSEE05B 00033<br />

ECE200 002 B301B05 4 6->7 BSEE05B 00031<br />

ECE203 002 B301A08 0 4->5 BSEE05B 00033<br />

ECE102 001 B301A07 4 6->7 BSEE06A 00032<br />

ECE120 001 B301A08 2 4->6 BSEE06A 00031<br />

ECE120 002 B301A08 3 0->2 BSEE06B 00031<br />

ECE102 002 B301A01 0 0->1 BSEE06B 00032<br />



TABLE 4-5 Timetable created by the decentralized course scheduling program for<br />

Faculty of Science<br />

Course Section Classroom Day Time-slot Class Lecturer<br />

CSC328 001 B104B18 2 2->3 BSCS04A 00021<br />

CSC326 001 B104B05 1 0->2 BSCS04A 00019<br />

CSC329 001 B104B11 0 0->2 BSCS04A 00020<br />

CSC327 001 B104B05 4 6->7 BSCS04A 00019<br />

CSC330 001 B104B02 4 2->3 BSCS04A 00020<br />

CSC328 002 B104B16 1 2->3 BSCS04B 00021<br />

CSC329 002 B104B09 0 4->6 BSCS04B 00020<br />

CSC326 002 B104B10 2 4->6 BSCS04B 00019<br />

CSC330 002 B104B03 4 6->7 BSCS04B 00021<br />

CSC327 002 B104B01 2 2->3 BSCS04B 00019<br />

CSC210 002 B104B06 3 4->6 BSCS05A 00018<br />

CSC215 002 B104B04 4 6->7 BSCS05A 00018<br />

CSC221 002 B104B03 1 4->6 BSCS05A 00017<br />

MAT220 001 B101A02 0 4->6 BSCS05A 00061<br />

CSC211 002 B104B17 2 0->3 BSCS05A 00018<br />

MAT220 002 B101A11 2 4->6 BSCS05B 00061<br />

CSC221 001 B104B09 2 0->2 BSCS05B 00017<br />

CSC211 001 B104B15 3 4->7 BSCS05B 00016<br />

CSC210 001 B104B08 0 0->2 BSCS05B 00015<br />

CSC215 001 B104B04 3 2->3 BSCS05B 00018<br />

MAT125 001 B101A03 4 4->6 BSCS06A 00059<br />

CSC120 002 B104B07 3 4->6 BSCS06A 00015<br />

CSC115 002 B104B12 1 0->1 BSCS06A 00014<br />

CSC110 002 B104B06 3 0->1 BSCS06A 00014<br />

CSC127 002 B104B01 4 2->3 BSCS06A 00015<br />

CSC113 002 B104B14 4 0->1 BSCS06A 00014<br />

CSC120 001 B104B12 4 0->2 BSCS06B 00016<br />

CSC110 001 B104B08 3 6->7 BSCS06B 00014<br />

MAT125 001 B101A03 4 4->6 BSCS06B 00059<br />

CSC127 001 B104B04 1 4->5 BSCS06B 00015<br />

CSC113 001 B104B14 3 2->3 BSCS06B 00016<br />

CSC115 001 B104B08 2 0->1 BSCS06B 00016<br />



These results show that all constraints presented in section 1.3 have been<br />

satisfied. Every “course + section” is scheduled exactly once a week. No course is<br />

scheduled across the morning and afternoon working sessions. No class, lecturer or<br />

classroom is assigned to more than one course at the same time. For<br />

example, as shown in Table 4-3, section 001 of course ENL308 is scheduled for lecturer<br />

00003 in classroom B201A03 on day 4 (Friday) in time-slots 0, 1 and 2.<br />

Therefore, this lecturer and this classroom are not booked for other courses at this<br />

time.<br />
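The "no double booking" constraint can be checked mechanically on timetable rows. The sketch below is an illustration rather than the checker used in the thesis: the names `slots` and `find_conflicts` are hypothetical, the sample rows are copied from Table 4-3, and a time-slot such as `0->2` is read as slots 0, 1 and 2 inclusive, following the ENL308 example above.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# course, section, classroom, day, time-slot, class, lecturer
Row = Tuple[str, str, str, int, str, str, str]


def slots(span: str) -> range:
    """Expand a time-slot such as '0->2' into slots 0, 1, 2 (inclusive)."""
    first, last = span.split("->")
    return range(int(first), int(last) + 1)


def find_conflicts(rows: List[Row]) -> List[Tuple[str, str, int, int]]:
    """Return (resource kind, resource id, day, slot) for every slot in which
    a classroom, class or lecturer is used by more than one course + section."""
    booked: Dict[Tuple[str, str, int, int], Set[Tuple[str, str]]] = defaultdict(set)
    for course, section, room, day, span, cls, lecturer in rows:
        for slot in slots(span):
            for kind, ident in (("classroom", room), ("class", cls), ("lecturer", lecturer)):
                booked[(kind, ident, day, slot)].add((course, section))
    return sorted(key for key, users in booked.items() if len(users) > 1)


# A few rows copied from Table 4-3.
TABLE_4_3_SAMPLE: List[Row] = [
    ("ENL307", "001", "B201A01", 3, "4->6", "BSCS04A", "00003"),
    ("ENL307", "001", "B201A01", 3, "4->6", "BSCS04B", "00003"),
    ("ENL308", "001", "B201A03", 4, "0->2", "BSEE04A", "00003"),
    ("ENL308", "002", "B201A03", 4, "4->6", "BSEE04B", "00003"),
    ("MAT322", "002", "B101A10", 4, "2->3", "BSEE04B", "00063"),
]

print(find_conflicts(TABLE_4_3_SAMPLE))
```

Bookings are grouped by "course + section", so the two ENL307 rows (one section attended by two classes in the same room at the same time) count as a single use of the room and lecturer rather than a clash.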

When a class of students takes a list of courses, these courses have to be<br />

scheduled in different time periods. For example, as shown in Table 4-1, class BSCS05B<br />

takes section 001 of courses CSC215, CSC221, CSC210 and CSC211, and section<br />

002 of courses ECE217, ECE218 and MAT220. Therefore, these “course + section”<br />

pairs are scheduled in different time periods. Another example is shown in Table 4-3:<br />

section 001 of course ENL307 is attended by both classes BSCS04A and BSCS04B.<br />

Therefore, this course section is scheduled in the same time period and the same<br />

classroom for both classes, so that they can attend it alongside their other courses.<br />

Other constraints presented in section 1.3 have also been satisfied, but they are<br />

not discussed here.<br />

The decentralized course scheduling program must give results that do not<br />

conflict with the centralized course scheduling output. If a class has been scheduled<br />

by the centralized course scheduling program, then the decentralized course<br />

scheduling program has to schedule the remaining courses of this class at other times.<br />

For example, as shown in Table 4-3, the centralized course scheduling program<br />

scheduled the courses that are attended by class BSEE06A. Therefore, the<br />

decentralized course scheduling program scheduled the other courses taken by this<br />

class at other times, as shown in Table 4-4.<br />
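The BSEE06A example can be verified by comparing the (day, time-slot) cells that the class occupies in the two timetables. This is a hedged sketch: the helper name `occupied` is hypothetical, the bookings are the BSEE06A rows as read from Tables 4-3 and 4-4, and time-slots are treated as inclusive ranges.

```python
from typing import List, Set, Tuple

Booking = Tuple[str, int, str]  # course, day, time-slot


def occupied(bookings: List[Booking]) -> Set[Tuple[int, int]]:
    """Return the set of (day, slot) cells covered by the bookings."""
    cells: Set[Tuple[int, int]] = set()
    for _course, day, span in bookings:
        first, last = (int(part) for part in span.split("->"))
        cells.update((day, slot) for slot in range(first, last + 1))
    return cells


# Class BSEE06A in Table 4-3 (centralized course scheduling program).
CENTRAL_BSEE06A: List[Booking] = [
    ("CHE104", 0, "2->3"),
    ("ENL101", 3, "0->2"),
    ("MAT125", 2, "0->2"),
    ("CHE103", 0, "4->6"),
]

# Class BSEE06A in Table 4-4 (decentralized program, Faculty of Engineering).
DECENTRAL_BSEE06A: List[Booking] = [
    ("ECE102", 4, "6->7"),
    ("ECE120", 2, "4->6"),
]

overlap = occupied(CENTRAL_BSEE06A) & occupied(DECENTRAL_BSEE06A)
print("conflicting cells:", sorted(overlap))
```

An empty intersection means the decentralized timetable leaves every cell already used by the centralized one untouched for this class.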


CHAPTER 5<br />

CONCLUSION<br />

5.1 Conclusions<br />

This study proposed a hybrid centralized and decentralized approach, a genetic<br />

algorithm, and a grid computing environment for course scheduling in<br />

multiple-faculty universities.<br />

The proposed GA demonstrated its ability to solve a complex optimization<br />

problem, the highly constrained course scheduling problem. The direct representation<br />

of chromosomes is convenient for representing the large number of constraints of a<br />

realistic timetable. Additional constraints can easily be added without<br />

much modification to the basic model.<br />

The speed of evolution of the GA differs significantly depending on the GA<br />

parameters used. GAs with large populations do not converge faster.<br />

However, to maintain diversity of solutions, it may be safer to keep<br />

the population size larger than the fastest size, although it is a little slower. The<br />

experiments also show that mutation is very important for the GA. A sufficiently<br />

small rate is effective; no mutation, or mutation at high rates, gives slower<br />

evolution. The weighting of hard and soft constraints in the fitness function should be<br />

based on their number and importance. The hard constraints should be<br />

weighted more heavily than the soft constraints.<br />

The hybrid centralized and decentralized approach was used. The centralized<br />

course scheduling program schedules only the shared resources, whereas the<br />

decentralized course scheduling program schedules the remaining resources of each<br />

faculty. The results showed that this approach gave the expected solutions without<br />

any constraint conflicts between resources across the university. The resultant<br />

solution allows lecturers to teach not only at their own faculty but also at other<br />

faculties, and a course can be attended by many different classes.<br />

The grid computing environment is used as the infrastructure for distributed and<br />

parallel computing, in combination with the hybrid centralized and decentralized<br />

approach. The centralized course scheduling program and the decentralized course<br />

scheduling program are treated as jobs, which are scheduled for execution. The<br />

centralized course scheduling job is performed first, and then the decentralized<br />

course scheduling jobs are performed in parallel on separate machines. The<br />

decentralized course scheduling program must give results that do not conflict with<br />

the centralized course scheduling output.<br />
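The job ordering described above (one centralized job, then one decentralized job per faculty in parallel) can be sketched on a single machine with a thread pool. This is only an analogy under loud assumptions: threads stand in for the remote grid machines, no grid middleware is involved, and `schedule_shared`, `schedule_faculty` and `run_hybrid` are hypothetical placeholders rather than the programs developed in this study.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def schedule_shared() -> List[str]:
    """Placeholder for the centralized job: returns the shared bookings."""
    return ["ENL308 001 B201A03 4 0->2 BSEE04A 00003"]


def schedule_faculty(faculty: str, shared: List[str]) -> str:
    """Placeholder for a decentralized job: it receives the shared bookings
    so that it can avoid conflicting with them."""
    return f"{faculty} timetable built around {len(shared)} shared booking(s)"


def run_hybrid(faculties: List[str]) -> Dict[str, str]:
    # Step 1: the centralized job must finish before anything else starts.
    shared = schedule_shared()
    # Step 2: one decentralized job per faculty, executed concurrently.
    # Threads here only imitate the remote machines of the grid.
    with ThreadPoolExecutor(max_workers=len(faculties)) as pool:
        results = pool.map(lambda f: schedule_faculty(f, shared), faculties)
        return dict(zip(faculties, results))


print(run_hybrid(["Engineering", "Science"]))
```

Passing the centralized output into every decentralized job mirrors the requirement that the decentralized results must not conflict with the centralized timetable.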

The use of the grid computing environment gave a high level of efficiency: it<br />

significantly reduces the overall execution time needed to reach a resultant solution.<br />

This is because a very large problem with many conflicting constraints is separated<br />

into smaller problems that are processed in parallel by different machines instead of<br />

by a single machine.<br />

5.2 Future Works<br />

Overall, our preliminary experiments suggest that the proposed model<br />

succeeded in satisfying the objectives of our proposal. We have worked in two<br />

interesting areas: genetic algorithms and grid computing. Both are wide areas,<br />

so what has been obtained is a foundation for further research.<br />

Our experiments identified GA parameters that make the GA effective. Further<br />

experiments should be carried out on more varied data and with more soft constraints.<br />

We also need to design algorithms that can automatically identify suitable values for<br />

the GA parameters.<br />
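One simple way to identify suitable parameter values automatically is an exhaustive sweep over candidate settings. The sketch below is only an illustration under stated assumptions, not a method evaluated in this study: a toy bit-counting problem replaces the timetabling fitness function, and `run_ga` and `tune` are hypothetical names.

```python
import random
from itertools import product


def run_ga(pop_size, mutation_rate, generations=30, length=20, seed=0):
    """Tiny GA on a toy problem (maximize the number of 1 bits); needs pop_size >= 2."""
    rng = random.Random(seed)  # fixed seed so that runs are repeatable
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=sum, reverse=True)
        parents = pop[: max(2, pop_size // 2)]  # keep the better half as parents
        children = []
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, length)  # one-point crossover
            child = a[:cut] + b[cut:]
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        pop = children
    return max(sum(individual) for individual in pop)


def tune(pop_sizes, mutation_rates):
    """Try every combination and return the best setting with all scores."""
    scores = {(p, m): run_ga(p, m) for p, m in product(pop_sizes, mutation_rates)}
    return max(scores, key=scores.get), scores
```

A sweep like this grows quickly with the number of parameters, so in practice it would probably be limited to a few candidate values per parameter.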

Local search techniques should be used to speed up the GA. Local search<br />

should also help the GA to create solutions that<br />

minimize the use of university resources, e.g. the number of classrooms used and the<br />

stretch of each lecturer's teaching time.<br />
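As an illustration of how a local search step could reduce the number of classrooms used, the sketch below applies simple hill climbing to a finished timetable. It is a hypothetical example, not part of this study: `concentration` is a proxy objective chosen for the sketch (it grows as sections are packed into fewer rooms), and room capacities and lecturer constraints are ignored.

```python
from collections import Counter


def rooms_used(assignment):
    """Number of distinct rooms in a {section: (room, slot)} assignment."""
    return len({room for room, _ in assignment.values()})


def concentration(assignment):
    """Proxy objective: sum of squared room loads (higher means fewer, fuller rooms)."""
    loads = Counter(room for room, _ in assignment.values())
    return sum(n * n for n in loads.values())


def local_search(assignment):
    """Hill climbing: move one section to another used room while the objective improves."""
    current = dict(assignment)
    improved = True
    while improved:
        improved = False
        occupied = set(current.values())
        rooms = sorted({room for room, _ in occupied})
        for section, (room, slot) in list(current.items()):
            for target in rooms:
                if target == room or (target, slot) in occupied:
                    continue  # same room, or target room is busy at that slot
                candidate = {**current, section: (target, slot)}
                if concentration(candidate) > concentration(current):
                    current, improved = candidate, True
                    break
            if improved:
                break  # restart the scan from the improved timetable
    return current
```

Every accepted move strictly increases the proxy objective, which is bounded, so the loop always terminates. The time-slot of each section is never changed, only its room.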

To satisfy both hard and soft constraints in a balanced way, multi-objective<br />

genetic algorithms should be investigated.<br />

The grid computing environment was implemented on Linux machines. For<br />

more flexible use, it should be extended to heterogeneous environments with more<br />

machines.


REFERENCES<br />

1. Alkan, A. and Ozcan, E. “Memetic Algorithms for Timetabling.” IEEE Congress<br />

on Evolutionary Computation. 3 (2003, December 8-12) : 1796-1802.<br />

2. Marc Buf, Tim Fischer, et al. “Automated solution of a highly constrained school<br />

timetabling problem - preliminary results.” Applications of Evolutionary<br />

Computing : EvoWorkshops 2001: EvoCOP, EvoFlight, EvoIASP, EvoLearn,<br />

and EvoSTIM, Como, Italy. (2001, April 18-20) : 431-440.<br />

3. Goulas, G. and Housos, E. “SchedSP: Providing GRID-enabled Real-World<br />

Scheduling Solutions as Application Services.” EuroWeb 2002 Conference,<br />

St Anne's College, Oxford, UK. (2002, December 17-18).<br />

4. Kaplansky, E., Kendall, G., et al. “Distributed Examination Timetabling.”<br />

PATAT '04 Proceedings of the 5th International Conference on the Practice<br />

and Theory of Automated Timetabling, Pittsburgh, PA USA. (2004, August<br />

18-20) : 511-516.<br />

5. Lim, A., Ang, J. C., et al. “UTTSExam: A Campus-Wide University Exam-<br />

Timetabling System”. Proceedings of the Eighteenth National Conference<br />

on Artificial Intelligence and Fourteenth Conference on Innovative<br />

Applications of Artificial Intelligence, Edmonton, Alberta, Canada. (2002,<br />

July 28 - August 1) : 838-844.<br />

6. Genetic Algorithm [Online]. Available from:<br />

http://cs.felk.cvut.cz/~xobitko/ga/gaintro.html [2005, May 2].<br />

7. Luis Ferreira, et al. Introduction to Grid Computing with Globus. IBM Redbooks,<br />

September 2003.<br />

8. Bart Jacob, et al. Enabling Applications for Grid Computing with Globus. IBM<br />

Redbooks, June 2003.<br />

9. Carter, M. W. and Laporte, G. “Recent Developments in Practical Course<br />

Timetabling.” In Edmund Burke and Michael Carter, editors, Practice and<br />

Theory of Automated Timetabling II, Springer-Verlag LNCS. 1408 (1998) :<br />

3-19.



10. Carter, M. W. “A Survey of Practical Applications of Examination Timetabling<br />

Algorithms.” Operations Research. 34 (1986) : 193-202.<br />

11. Burke, E. K., Elliman, D. G., et al. “University Timetabling System Based on<br />

Graph Colouring and Constraint Manipulation.” Journal of Research on<br />

Computing in Education. 27(1) (1993) : 1-18.<br />

12. Burke, E. K., Dror, M., et al. “Hybrid Graph Heuristics within a Hyper-heuristic<br />

Approach to Exam Timetabling Problems.” The Next Wave in Computing,<br />

Optimization, and Decision Technologies. (2005) : 79-91.<br />

13. Redl, T. A. “A Study of University Timetabling that Blends Graph Coloring with<br />

the Satisfaction of Various Essential and Preferential Conditions.”<br />

PhD.Thesis, Rice University, Houston, Texas, 2004.<br />

14. Balakrishnan, N., Lucena, A. and Wong, R. T. “Scheduling Examinations to<br />

Reduce Second-Order Conflicts.” Computers & Operations Research. 19<br />

(1992) : 353-361.<br />

15. Arani, T. and Lotfi, V. “A Three Phased Approach to Final Exam Scheduling.”<br />

IIE Trans. 21 (1989) : 86-96.<br />

16. Sally C. Brailsford, Chris N. Potts, et al. “Constraint Satisfaction Problems:<br />

Algorithms and Applications.” European Journal of Operational Research.<br />

119 (1999) : 557-581.<br />

17. White, G. M. “Constrained Satisfaction, Not So Constrained Satisfaction and the<br />

Timetabling Problem.” PATAT '00 Proceedings of the 3rd International<br />

Conference on the Practice and Theory of Automated Timetabling, Konstanz,<br />

Germany. 1 (2000, August 16-18) : 32-47.<br />

18. Valouxis, C. and Housos, E. “Constraint Programming Approach for School<br />

Timetabling.” Computers & Operations Research. 30(1) (2003, September) :<br />

1555–1572.<br />

19. Gueret, C., Jussien, N., et al. “Building University timetables using Constraint<br />

Logic Programming.” Proceedings of the First International Conference on<br />

the Practice and Theory of Automated Timetabling (ICPTAT '95), France.<br />

(1995) : 393-408.



20. Burke, E. K. and Newall, J. P. “A Multi-Stage Evolutionary Algorithm for the<br />

Timetable Problem.” The IEEE Transactions on Evolutionary Computation.<br />

3(1) (1999, April) : 63-74.<br />

21. Paechter, B., Rankin, R. C. and Cumming, A. “Improving a Lecture Timetabling<br />

System for University-Wide Use.” In: Burke, E., Carter, M. (eds.): The<br />

Practice and Theory of Automated Timetabling II: Selected Papers<br />

(PATAT ’97, University of Toronto), Lecture Notes in Computer Science,<br />

Springer-Verlag, Berlin Heidelberg New York. 1408 (1998) : 156-165.<br />

22. Ross, P., Hart, E. and Corne, D. “Some Observations about GA based<br />

Timetabling.” In: Burke, E., Carter, M. (eds.): The Practice and Theory of<br />

Automated Timetabling II: Selected Papers (PATAT ’97, University of<br />

Toronto), Lecture Notes in Computer Science, Springer-Verlag, Berlin<br />

Heidelberg New York. 1408 (1998) : 115-129.<br />

23. Elmohamed, S., Coddington, P. and Fox., F. A. “Comparison of Annealing<br />

Techniques <strong>for</strong> Academic Course Scheduling.” In: Burke, E., Carter, M.<br />

(eds.): The Practice and Theory of Automated Timetabling II: Selected<br />

Papers (PATAT ’97, University of Toronto), Lecture Notes in Computer<br />

Science, Springer-Verlag, Berlin Heidelberg New York. 1408 (1998) : 92-<br />

112.<br />

24. White, G. M. and Zhang, J. “Generating Complete University Timetables by<br />

Combining Tabu Search with Constraint Logic.” In: Burke, E., Carter, M.<br />

(eds.): The Practice and Theory of Automated Timetabling II: Selected<br />

Papers (PATAT ’97, University of Toronto), Lecture Notes in Computer<br />

Science, Springer-Verlag, Berlin Heidelberg New York. 1408 (1998) : 187-<br />

210.<br />

25. Dowsland, K. A. “Off the Peg or Made to Measure.” In: Burke, E., Carter, M.<br />

(eds.): The Practice and Theory of Automated Timetabling II: Selected<br />

Papers (PATAT ’97, University of Toronto), Lecture Notes in Computer<br />

Science, Springer-Verlag, Berlin Heidelberg New York. 1408 (1998) : 37-52.<br />

26. Elmohamed, S., et al. “A Comparison of Annealing Techniques <strong>for</strong> Academic<br />

Course Scheduling.” Lecture Notes in Computer Science. 1408 (1998) : 92-<br />

114.



27. Abramson, D. “Constructing School Timetables using Simulated Annealing:<br />

Sequential and Parallel Algorithms.” Management Science. 37(1) (1991,<br />

January) : 98 – 113.<br />

28. Aydin, M. E. “A Distributed Evolutionary Simulated Annealing Algorithm for<br />

Combinatorial Optimisation Problems.” Journal of Heuristics. 10 (2004) :<br />

269–292.<br />

29. Calaor, A. E., Hermosilla, A.Y., et al. “Parallel Hybrid Adventures with<br />

Simulated Annealing and Genetic Algorithms.” Proceedings of the<br />

International Symposium on Parallel Architectures, Algorithms and<br />

Networks (ISPAN.02). (2002, May 22-24) : 33-38.<br />

30. Alvarez-valdes, R. “A Tabu Search Algorithm to Schedule University<br />

Examinations.” QUESTIIO. 21 (1997) : 201-215.<br />

31. Burke, E. K., Kendall, G. and Soubeiga, E. “Tabu-Search Hyperheuristic for<br />

Timetabling and Rostering.” Journal of Heuristics. 9 (2003) : 451–470.<br />

32. Tabu Search [Online]. Available from:<br />

http://www.cs.sandia.gov/opt/survey/ts.html [2005, September 12].<br />

33. Wang, Y. Z. “A GA-based methodology to determine an optimal curriculum for<br />

schools.” Expert Systems with Applications. 28 (2005) : 163–174.<br />

34. Tuan, D. A. and Kim, H. L. “Combining Constraint Programming and Simulated<br />

Annealing on University Exam Timetabling.” International Conference,<br />

RIVF’04, Hanoi, Vietnam. (2004, February 2-5) : 205-210.<br />

35. Kaplansky, E. and Meisels, A. “Negotiation among Scheduling Agents for<br />

Distributed Timetabling.” In Submitted to the 5th International Conference<br />

on the Practice and Theory of Automated Timetabling PATAT'04, Pittsburgh,<br />

PA USA. (2004, August) : 84-105.<br />

36. Marczyk, A. Genetic Algorithms and Evolutionary Computation [Online].<br />

Available from: http://www.talkorigins.org/faqs/genalg/genalg.html [2005,<br />

September 18].<br />

37. Esposito, A. and Tarricone, L. “Grid Computing for Electromagnetics: A<br />

Beginner’s Guide with Applications.” IEEE Antennas and Propagation<br />

Magazine. 45(2) (2003, April) : 91-100.



38. Globus Toolkit [Online]. Available from: http://www.globus.org [2005,<br />

September 20].<br />

39. Foster, I., Kesselman, C. and Tuecke, S. “The Anatomy of the Grid: Enabling<br />

Scalable Virtual Organizations.” International Journal of High Performance<br />

Computing Applications. 15(3) (2001) : 200-222.<br />

40. Hamscher, V., Schwiegelshohn, U., et al. “Evaluation of Job-Scheduling<br />

Strategies for Grid Computing.” In Proceedings of the 7th International<br />

Conference on High Performance Computing HiPC-2000, Springer, Berlin,<br />

Lecture Notes in Computer Science LNCS 1971, Bangalore, India. (2000,<br />

December) : 192-202.


APPENDIX A<br />

DATA DICTIONARY



This appendix presents the structure of the database tables created from<br />

the entity relationship diagram shown in Figure 3-5.<br />

A.1 Faculty<br />

TABLE A-1 Faculty<br />

Table: Faculty<br />

Field | Type | Key | Description<br />

FacultyID | char(2) | Primary | ID of faculty<br />

FacultyName | char(100) | - | Name of faculty<br />

The university has several faculties, e.g. Faculty of Science, Faculty of<br />

Engineering, and Faculty of Education.<br />

A.2 Department<br />

TABLE A-2 Department<br />

Table: Department<br />

Field | Type | Key | Description<br />

DeptID | char(3) | Primary | ID of department<br />

DeptName | char(255) | - | Name of department<br />

FacultyID | char(2) | Foreign | ID of faculty<br />

Each faculty has several departments that include a set of lecturers and courses<br />

within the same scientific domain, e.g. Department of Computer Science, Department<br />

of Mathematics, and Department of Physics.
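The relationship between the Faculty and Department tables can be tried out with a small script. The sketch below is an assumption-laden illustration: it uses Python's built-in SQLite module as a stand-in database rather than the database used in this study, and `build_schema` and `departments_of` are hypothetical helper names. The column names and types follow Tables A-1 and A-2.

```python
import sqlite3


def build_schema():
    """Create the Faculty and Department tables in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
    conn.executescript(
        """
        CREATE TABLE Faculty (
            FacultyID   CHAR(2) PRIMARY KEY,
            FacultyName CHAR(100)
        );
        CREATE TABLE Department (
            DeptID    CHAR(3) PRIMARY KEY,
            DeptName  CHAR(255),
            FacultyID CHAR(2) REFERENCES Faculty(FacultyID)
        );
        """
    )
    return conn


def departments_of(conn, faculty_id):
    """Return the names of all departments that belong to one faculty."""
    rows = conn.execute(
        "SELECT DeptName FROM Department WHERE FacultyID = ? ORDER BY DeptName",
        (faculty_id,),
    )
    return [name for (name,) in rows]
```

With the foreign key enforced, a department row that refers to an unknown faculty is rejected instead of being stored silently.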



A.3 Lecturer<br />

TABLE A-3 Lecturer<br />

Table: Lecturer<br />

Field | Type | Key | Description<br />

LecturerID | char(5) | Primary | ID of lecturer<br />

LecturerName | char(40) | - | Name of lecturer<br />

Gender | char(1) | - | Gender of lecturer<br />

DeptID | char(3) | Foreign | ID of department<br />

Lecturers are responsible for teaching several courses. Each lecturer is a member<br />

of a department.<br />

A.4 Busy Time<br />

TABLE A-4 Busy Time<br />

Table: BusyTime<br />

Field | Type | Key | Description<br />

LecturerID | char(5) | Primary | ID of lecturer<br />

Day | int(2) | - | Day in a week<br />

Workingsession | int(2) | - | Working session in a day<br />

State | int(1) | - | State of lecturer<br />

Not all working sessions of a day in each week are available to be scheduled for<br />

a lecturer. For instance, Mr. Tim cannot teach on Monday mornings because of a<br />

weekly meeting. Some other lecturers dislike teaching in some working sessions. For<br />

instance, Miss Mary dislikes teaching on Friday mornings. Based on data stored in the<br />

BusyTime table, the system tries to satisfy lecturers’ desires. The State field takes one<br />

of three values: 0, 1, or 2. The value 2 represents an available working session.<br />

The value 1 is used if the lecturer dislikes teaching at this time (soft constraint).<br />

Finally, the value 0 is used if the lecturer cannot teach at this time (hard constraint).
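The three state values can be turned into constraint counts when a candidate timetable is evaluated. The following Python sketch is a hypothetical illustration, not the evaluation function of this study: the function name `count_violations`, the numeric encoding of days and sessions, and the rule that a missing BusyTime row means "available" are all assumptions.

```python
CANNOT_TEACH = 0  # hard constraint
DISLIKES = 1      # soft constraint
AVAILABLE = 2


def count_violations(busy_time, placements):
    """Count hard and soft violations for (lecturer_id, day, session) placements.

    busy_time maps (lecturer_id, day, session) to a State value; a key that is
    missing is treated as AVAILABLE (an assumption made for this sketch).
    """
    hard = soft = 0
    for key in placements:
        state = busy_time.get(key, AVAILABLE)
        if state == CANNOT_TEACH:
            hard += 1
        elif state == DISLIKES:
            soft += 1
    return hard, soft
```

A timetable with a non-zero hard count would be infeasible, while the soft count can be used as a penalty to be minimized.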



A.5 Building<br />

TABLE A-5 Building<br />

Table: Building<br />

Field | Type | Key | Description<br />

BuildingID | char(2) | Primary | ID of building<br />

BuildingName | char(100) | - | Name of building<br />

The university has several buildings that have a number of classrooms.<br />

A.6 Classroom<br />

TABLE A-6 Classroom<br />

Table: Classroom<br />

Field | Type | Key | Description<br />

ClassroomID | char(7) | Primary | ID of classroom<br />

ClassroomName | char(10) | - | Name of classroom<br />

Seats | int(3) | - | Number of seats<br />

BuildingID | char(2) | Foreign | ID of building<br />

ClassroomGroupID | char(8) | Foreign | ID of classroom group<br />

A classroom in a building belongs to a certain classroom group.<br />

A.7 Classroom Group<br />

TABLE A-7 Classroom group<br />

Table: ClassroomGroup<br />

Field | Type | Key | Description<br />

ClassroomGroupID | char(8) | Primary | ID of classroom group<br />

ClassroomGroupName | char(100) | - | Name of classroom group



Classrooms are organized into groups. A course is scheduled in a classroom that<br />

belongs to one of its designated groups. For instance, course ECE218 (Digital Circuit<br />

Design Lab) is expected to be scheduled only in group ECEDCDLB (Digital Circuit Design Labs).<br />

A.8 Department Controls Rooms<br />

TABLE A-8 Department controls classroom<br />

Table: DeptControlRoom<br />

Field | Type | Key | Description<br />

DeptID | char(3) | Primary | ID of department<br />

ClassroomGroupID | char(8) | Primary | ID of classroom group<br />

A department owns a number of classroom groups that are used for its courses.<br />

A.9 Course<br />

TABLE A-9 Course<br />

Table: Course<br />

Field | Type | Key | Description<br />

CourseID | char(6) | Primary | ID of course<br />

CourseName | char(80) | - | Name of course<br />

Credits | int(2) | - | Number of credits<br />

Kind | char(1) | - | Kind: lecture or practice<br />

DeptID | char(3) | Foreign | ID of a department<br />

A course belongs to a department.



A.10 Program<br />

TABLE A-10 Program<br />

Table: Program<br />

Field | Type | Key | Description<br />

ProgramID | char(4) | Primary | ID of program<br />

ProgramName | char(255) | - | Name of program<br />

NumSemesters | int(2) | - | Number of semesters<br />

DeptID | char(3) | Foreign | ID of department<br />

The university has a number of programs. After completing a program that<br />

includes a number of courses, a student receives a degree, e.g. Bachelor of Science in<br />

Computer Science. A program belongs to a department.<br />

A.11 Curriculum<br />

TABLE A-11 Curriculum<br />

Table: Curriculum<br />

Field | Type | Key | Description<br />

ProgramID | char(4) | Primary | ID of program<br />

CourseID | char(6) | Primary | ID of course<br />

Semester | int(2) | - | Semester has this course<br />

Year | int(4) | - | Enrolment year of students for applying this curriculum<br />

To earn a degree, a student has to complete a list of courses in each semester. For<br />

instance, in the first semester, students of Bachelor of Science in Computer Science<br />

take the courses ENL101, CSC110, CSC113, MAT125, CSC115, CSC120, and CSC127.<br />

A curriculum is applied to students based on their enrolment year.



A.12 Class<br />

TABLE A-12 Class<br />

Table: Class<br />

Field | Type | Key | Description<br />

ClassID | char(7) | Primary | ID of class<br />

ClassName | char(100) | - | Name of class<br />

NumStudents | int(3) | - | Number of students<br />

EnrolYear | int(4) | - | Enrolment year<br />

ProgramID | char(4) | Foreign | ID of program<br />

Students who study the same program and have the same enrolment year are<br />

grouped into classes.<br />

A.13 Course Section<br />

TABLE A-13 Course section<br />

Table: CourseSection<br />

Field | Type | Key | Description<br />

ClassID | char(7) | Primary | ID of class<br />

Semester | int(2) | Primary | Current semester<br />

Year | int(4) | Primary | Current year<br />

CourseID | char(6) | Primary | ID of course<br />

SectionNo | char(3) | - | Section number<br />

LecturerID | char(5) | - | ID of lecturer<br />

NumStudents | char(4) | - | Number of students<br />

A section is an instance of a course taught by a lecturer. The unit that we<br />

schedule into the time-slots of a particular classroom is “a section of a course + a<br />

lecturer + an estimated number of attending students”.



A.14 Timetable<br />

TABLE A-14 Timetable<br />

Table: Timetable<br />

Field | Type | Key | Description<br />

RoomID | char(7) | Primary | ID of room<br />

Day | int(2) | Primary | Day in a week<br />

Hour | int(2) | Primary | Hour in a day<br />

CourseSectionID | char(9) | - | CourseID + SectionID<br />

Although this timetable looks simple, it stores the results from the whole course<br />

scheduling system. A section of a course will be scheduled in successive time-slots.
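Because the primary key of the Timetable table is (RoomID, Day, Hour), a room can hold at most one section in any hour. The Python sketch below illustrates how a section could be placed into successive time-slots under that rule. It is a hypothetical example: the `Section` class, its `hours` field, the `place` function, and the sample IDs are assumptions made for illustration and are not taken from the scheduling program.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Section:
    course_section_id: str  # CourseID + section number, as in Table A-14
    lecturer_id: str
    num_students: int
    hours: int              # assumed number of successive time-slots needed


def place(timetable, seats, section, room_id, day, start_hour):
    """Book successive hours for a section; return False and change nothing on conflict.

    timetable maps (room_id, day, hour) to a course_section_id.
    seats maps room_id to the number of seats in that room.
    """
    if seats[room_id] < section.num_students:
        return False  # room is too small
    keys = [(room_id, day, start_hour + h) for h in range(section.hours)]
    if any(key in timetable for key in keys):
        return False  # at least one hour is already taken
    for key in keys:
        timetable[key] = section.course_section_id
    return True
```

Using the key tuple as the dictionary key mirrors the primary key of the table, so a double booking cannot be stored by accident.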


APPENDIX B<br />

INSTALLING GRID ENVIRONMENT



This appendix presents the detailed steps for installing and setting up the grid<br />

environment, which includes Red Hat Linux, the Network Time Protocol, Globus, and a<br />

Certificate Authority.<br />

The following topics are discussed:<br />

- Required software<br />

- Hardware environment<br />

- Operating system installation<br />

- Globus installation and setup<br />

- CA installation and setup<br />

B.1 Required Software<br />

Globus Toolkit 2.2 is used in this study. Globus Toolkit 2.x supports Red Hat<br />

Linux on xSeries and AIX on pSeries. We select Red Hat Linux 9.0 as our host<br />

operating system.<br />

The following files are required and must be downloaded:<br />

- Globus Packaging Technology: gpt-2.2.2-src.tar.gz<br />

- Globus client: globus-all-client-2.2.3-i686-pc-linux-gnu-bin.tar.gz<br />

- Server bundle: globus-all-server-2.2.3-i686-pc-linux-gnu-bin.tar.gz<br />

- Certificate Authority: globus_simple_ca_bundle-0.9.tar.gz<br />

- Network Time Protocol (NTP): ntp-4.1.1-1.i386.rpm<br />

Place these files in the directory /usr/src. These Globus files can be downloaded<br />

from the address: ftp://ftp.globus.org/pub/gt2/2.2/.<br />

The NTP package is already installed in Red Hat Linux 9.0, so we do not need<br />

to download and install it. For other versions of Linux, however, NTP has to be set<br />

up on the hosts.<br />

B.2 Setting Up the Environment<br />

An Ethernet LAN and three Intel Pentium machines were used to build the grid<br />

environment. Figure 3-23 presents this environment with the host names and<br />

functions to be installed in each machine.<br />

The host names are m1, m2, and m3. The machines should have a clock speed<br />

of at least 500 MHz, at least 128 MB of memory, and a hard drive of at least 8 GB.



The installation and setup steps depend on one another, so they must<br />

be performed in the order given.<br />

The major steps to set up the grid environment include installing:<br />

- Red Hat Linux 9.0 on each machine<br />

- Network Time Protocol server on one machine (here we use m2) and<br />

configuring NTP clients for the others (m1 and m3)<br />

- Globus Packaging Technology on each machine<br />

- Globus Server on the m2 and m3 machines<br />

- Globus Client on m1<br />

- Globus Simple Certificate Authority on m2<br />

The grid is then configured using the following major steps:<br />

- Sign the certificate requests from all components and users needing them<br />

- Set up gridmap files for each system<br />

- Set up automated grid startup<br />

- Set up each GRIS to talk to one GIIS<br />

- Set up MDS security<br />

B.2.1 Naming and Addressing Planning<br />

Table B-1 shows the host names, IP addresses, and software to be installed on<br />

each machine.<br />

TABLE B-1 Host names, IP addressing, and software<br />

Host name | IP | Software<br />

m1.kmitnb.ac.th | 192.168.10.241 | Globus client, centralized scheduling program, MySQL 4.0<br />

m2.kmitnb.ac.th | 192.168.10.242 | Globus server, CA, and NTP server<br />

m3.kmitnb.ac.th | 192.168.10.243 | Globus server<br />

We also define the user IDs, groups, and passwords before implementation,<br />

as shown in Table B-2.<br />

The root and globususer IDs are used on all machines. Where Table B-2 shows no<br />

password for the snobol or adminca ID, that user ID is not installed on the<br />

corresponding machine.



TABLE B-2 Group, user ID and password<br />

User ID | Group ID | m1 password | m2 password | m3 password<br />

root | root | pwrtm1 | pwrm2 | pwrm3<br />

globususer | globus | pwgbm1 | pwgm2 | pwgm3<br />

snobol | snobol | pwsbm1 | - | -<br />

adminca | adminca | - | pwamm2 | -<br />

The globususer ID is used to run jobs on the grid for the user. Since this user ID<br />

has more than eight characters, we need to create it after the Linux installation rather<br />

than as part of the Linux install process. The other user IDs can be created as part of the<br />

Linux installation or later.<br />

The snobol ID is used to submit jobs to the grid.<br />

The adminca ID is used to receive certificate requests for the Certificate<br />

Authority. In our installation, the adminca ID can be used to ftp the certificate requests<br />

to machine m2. The certificates will be signed using the root ID on machine m2.<br />

Before installing the Globus Simple Certificate Authority, we must define the<br />

distinguished name (DN) that will be used by the CA in our environment. Table B-3<br />

describes the distinguished name used for the Certificate Authority in our<br />

environment. The distinguished names for the users and for the Globus services will<br />

be generated automatically.<br />

TABLE B-3 Distinguished name and passphrase<br />

Certificate Authority DN | cn=my test CA, ou=m2.kmitnb.ac.th, ou=demotest, o=grid<br />

Passphrase | mycapw<br />

The distinguished name (DN) and passphrase will be used by the Certificate<br />

Authority to sign certificate requests.<br />

B.2.2 Installing Linux<br />

Install Linux on all of the machines using the “server” install, selecting all<br />

packages and “no firewall”. Each system should use the fixed IP address and<br />

corresponding host name given in Table B-1; do not use DHCP.<br />

After installing Linux on each system, we create the user IDs listed in Table B-2. The<br />

following is an example of how to add the globususer ID on machine m1.



Add a group for globus by executing:<br />

groupadd -g 900 globus<br />

Add the user globususer (with password globususer) by executing:<br />

adduser -u 900 -g globus -d /home/globususer -n globususer<br />

Change the globususer ID’s password from globususer to pwgbm1 (see Table B-2) or<br />

another password by executing:<br />

passwd globususer<br />

B.2.3 Installing Network Time Protocol (NTP)<br />

NTP needs to be installed because the grid needs the clocks on the systems to be<br />

synchronized. The security process creates proxy certificates that are valid for specific<br />

times. If the systems do not have their clocks synchronized, then the users may not be<br />

able to use the grid, because the proxy certificates may look like they have expired or<br />

are not yet valid.<br />
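The effect of an unsynchronized clock on a proxy certificate can be illustrated with a small validity-window check. The Python sketch below is a simplified illustration, not the check performed by Globus itself: the function name `proxy_status` and the times used with it are hypothetical.

```python
from datetime import datetime, timedelta


def proxy_status(not_before, not_after, local_clock):
    """Classify a certificate's validity as seen by a machine with the given clock."""
    if local_clock < not_before:
        return "not yet valid"  # the local clock is behind the issuing machine
    if local_clock > not_after:
        return "expired"        # the local clock is ahead, or the lifetime has ended
    return "valid"
```

For example, a proxy created at 12:00 looks "not yet valid" on a server whose clock still shows 11:55, even though the proxy was created correctly.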

On all of the grid machines, install NTP as follows using the root ID:<br />

rpm -ivh /usr/src/ntp-4.1.1-1.i386.rpm<br />

If the package is already installed as a part of the Linux distribution, ignore the<br />

error message and continue to set up the NTP server. Proceed by setting up the server<br />

and daemons.<br />

Edit the file /etc/ntp.conf on the machine designated to be the time server,<br />

machine m2, and leave the following four lines as the only un-commented ones,<br />

commenting all others with a leading “#” character:<br />

server 127.127.1.0 # local clock<br />

fudge 127.127.1.0 stratum 10<br />

driftfile /etc/ntp/drift<br />

broadcastdelay 0.008<br />

Also, on the NTP server machine (m2), use the ntsysv command to<br />

enable the NTP daemon (ntpd) on the next reboot. We can also start the Red Hat<br />

Service Configuration tool by clicking Main Menu > System Settings > Server<br />

Settings > Services. Scroll down the list of services on the left side to the<br />

ntpd service. Click the ntpd service and click Start to run it.<br />

On the other machines in the grid (m1 and m3), change the file /etc/ntp.conf to<br />

leave only the following lines un-commented:<br />

server m2.kmitnb.ac.th<br />

driftfile /etc/ntp/drift



broadcastdelay 0.008<br />

authenticate no<br />

Next, execute the following command on these machines to synchronize their time<br />

with the server machine m2:<br />

ntpdate -b m2.kmitnb.ac.th<br />

This should be executed at least once per boot, and could be set up to run<br />

periodically using crond and crontab.<br />

B.2.4 Setting Up Host Files and Environment Variables on Each Machine<br />

As root, use an editor to edit the hosts file /etc/hosts on each machine with the<br />

following lines:<br />

127.0.0.1 localhost<br />

192.168.10.241 m1.kmitnb.ac.th m1<br />

192.168.10.242 m2.kmitnb.ac.th m2<br />

192.168.10.243 m3.kmitnb.ac.th m3<br />

Verify machine connectivity after the next reboot, using the ping command to<br />

ping each of the other machines by name.<br />
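Since the same host entries must appear on every machine, they can be generated from a single definition to avoid typing mistakes. The Python sketch below is a hypothetical convenience, not part of the installation procedure: `render_hosts` is an illustrative name, and the addresses are those of Table B-1.

```python
HOSTS = {
    "m1": "192.168.10.241",
    "m2": "192.168.10.242",
    "m3": "192.168.10.243",
}
DOMAIN = "kmitnb.ac.th"


def render_hosts(hosts, domain):
    """Build the /etc/hosts content: IP address, full host name, short name."""
    lines = ["127.0.0.1 localhost"]
    for name, ip in sorted(hosts.items()):
        lines.append(f"{ip} {name}.{domain} {name}")
    return "\n".join(lines) + "\n"
```

The returned text can be reviewed and then written to /etc/hosts as root on each machine.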

Edit the file /etc/profile on each machine. Insert the following three lines after<br />

the line in /etc/profile that says “export PATH USER ...”:<br />

export GPT_LOCATION=/usr/local/gpt<br />

export GLOBUS_LOCATION=/usr/local/globus<br />

export PATH=$PATH:$GLOBUS_LOCATION/bin:$GLOBUS_LOCATION/sbin<br />

Log off and log back on to each machine after modifying the file /etc/profile so that<br />

the above settings take effect.<br />

B.2.5 Installing the GPT<br />

Log on as root and install GPT on all of the machines. Please ignore all<br />

warnings from Globus:<br />

cd /usr/src<br />

tar -xzvf gpt-2.2.2-src.tar.gz<br />

cd gpt-2.2.2<br />

./build_gpt<br />

ls ${GPT_LOCATION}/sbin | wc -l<br />

The final command should report 29, the number of gpt-* executable files.<br />

B.2.6 Installing a Globus Server Bundle<br />

The following is used to install the server bundle on each server machine.<br />

Perform these steps on each machine that will be a server. In our demo, we will use<br />

machines m2 and m3 as servers.



As root, run:<br />

cd /usr/src<br />

export PATH=$PATH:$GPT_LOCATION/sbin<br />

gpt-install globus-all-server-2.2.3-i686-pc-linux-gnu-bin.tar.gz<br />

gpt-postinstall<br />

/usr/local/globus/setup/globus/setup-gsi<br />

y<br />

q<br />

B.2.7 Installing a Globus Client Bundle<br />

The following is used to install the client bundle on any machines that will be<br />

used to query or submit jobs to the grid. In our application, we will install the client<br />

on the machine m1.<br />

As root, run:<br />

cd /usr/src<br />

export PATH=$PATH:$GPT_LOCATION/sbin<br />

gpt-install globus-all-client-2.2.3-i686-pc-linux-gnu-bin.tar.gz<br />

gpt-postinstall<br />

/usr/local/globus/setup/globus/setup-gsi<br />

y<br />

q<br />

B.2.8 Installing the Globus Simple Certificate Authority<br />

To install the Globus Simple Certificate Authority, one of the Globus bundles<br />

(server or client) needs to be installed on the machine due to a dependency. We will<br />

install the CA and a Globus server on the machine m2.<br />

As root, run:<br />

cd /usr/src<br />

export PATH=$PATH:$GPT_LOCATION/sbin<br />

gpt-build -nosrc gcc32<br />

gpt-build globus_simple_ca_bundle-0.9.tar.gz gcc32<br />

gpt-postinstall<br />

...<br />

Do you want to keep this as the CA subject (y/n) [y]: n<br />

Enter a unique subject name for this CA:<br />

cn=my test CA, ou=m2.kmitnb.ac.th, ou=demotest, o=grid<br />

Enter the email of the CA:<br />

adminca@m2.kmitnb.ac.th<br />

[default 5 years] 1825



mycapw<br />

[enter]<br />

During the above process, a hash number is generated and used as part of the<br />

file name. Please note this number for use in the next steps. Run the script named<br />

at the end of the previous install, adding the “-default” argument. In the path shown<br />

below, the hex hash number printed by the above process belongs between the two<br />

consecutive underscores of globus_simple_ca__setup:<br />

/usr/local/globus/setup/globus_simple_ca__setup/setup-gsi -default<br />

y<br />

q<br />

The file /root/.globus/simpleCA/private/cakey.pem is the CA’s private key and<br />

should not be given out to anyone else. The file /root/.globus/simpleCA/cacert.pem<br />

contains the CA’s public key.<br />

The following is used to install the CA’s certificate on each of the other grid<br />

machines. /root/.globus/simpleCA/globus_simple_ca__setup-0.9.tar.gz is the<br />

file containing the public CA key and other information needed to participate in this<br />

grid. This must be copied to each of the other machines and installed using the<br />

gpt-build command.<br />

First, on machine m2, use ftp to copy the file<br />

/root/.globus/simpleCA/globus_simple_ca__setup-0.9.tar.gz to the directory<br />

/usr/src/ of each of the other grid machines. This can be done in two steps: first ftp<br />

the file to the directory /home/globususer on each of those machines using the<br />

globususer ID; then, as root, move the file to the directory /usr/src. Next, issue the<br />

following commands on each of those machines as root:<br />

gpt-build /usr/src/globus_simple_ca__setup-0.9.tar.gz<br />

gpt-postinstall<br />

/usr/local/globus/setup/globus_simple_ca__setup/setup-gsi -default<br />

y<br />

q<br />

B.2.9 Requesting and Signing Gatekeeper Certificates for Servers<br />

On each of the server machines (m2 and m3), we perform the following steps to<br />

request and sign certificates:<br />

grid-cert-request -host <br />

Use ftp or e-mail (if available and using the adminca ID) to copy the file<br />

/etc/grid-security/hostcert_request.pem to the CA machine and put it into the directory<br />

/root. On the CA machine, as root, sign the certificate using the following:



grid-ca-sign -in /root/hostcert_request.pem -out /root/hostcert.pem<br />

mycapw<br />

Then, ftp the file /root/hostcert.pem back to the server machine and place it in<br />

the directory /etc/grid-security.<br />

B.2.10 Requesting and Signing User Certificates<br />

For each user who will use the grid (in our example, user snobol on the client<br />

machine m1), the following procedure must be carried out by the user and the Certificate<br />

Authority. Logged on as the snobol user, run:<br />

grid-cert-request<br />

<br />

<br />

The user should make up his own passphrase for his certificate. He will use this<br />

same passphrase later with the grid-proxy-init command to authenticate with the<br />

grid. In our example, the snobol user’s login password could be used here.<br />

The user must then send the file /home//.globus/usercert_request.pem<br />

to the Certificate Authority (machine m2) for signing. On the CA machine (m2), sign<br />

the certificate using root with the following command, adjusting the location of<br />

usercert_request.pem to point to wherever the above request file is now stored on m2:<br />

grid-ca-sign -in usercert_request.pem -out usercert.pem<br />

mycapw<br />

Securely send the file usercert.pem back to the requesting user. The user should<br />

put the file usercert.pem into his /home//.globus directory.<br />

The user should also be added to the grid-mapfile (on machine m2 under root)<br />

using the following command (note the backward apostrophe characters next to the<br />

double quote characters):<br />

grid-mapfile-add-entry -dn "`grid-cert-info -f usercert.pem -subject`" -ln globususer<br />

Copy the file /etc/grid-security/grid-mapfile to each of the other servers<br />

(m3) so that all of the servers have this file.<br />

B.2.11 Setting Up the Gatekeepers<br />

On each server (m2 and m3), add the following two lines to the file<br />

/etc/services:<br />

gsigatekeeper 2119/tcp #globus gatekeeper<br />

gsiftp 2811/tcp #globus wuftp<br />

Create the file /etc/xinetd.d/gsigatekeeper on each server, containing the lines:



service gsigatekeeper<br />

{<br />

socket_type = stream<br />

protocol = tcp<br />

wait = no<br />

user = root<br />

env = LD_LIBRARY_PATH=/usr/local/globus/lib<br />

server = /usr/local/globus/sbin/globus-gatekeeper<br />

server_args = -conf /usr/local/globus/etc/globus-gatekeeper.conf<br />

disable = no<br />

}<br />

Create the file /etc/xinetd.d/gsiftp on each server, containing the lines:<br />

service gsiftp<br />

{<br />

instances = 1000<br />

socket_type = stream<br />

wait = no<br />

user = root<br />

env = LD_LIBRARY_PATH=/usr/local/globus/lib<br />

server = /usr/local/globus/sbin/in.ftpd<br />

server_args = -l -a -G /usr/local/globus<br />

log_on_success += DURATION USERID<br />

log_on_failure += USERID<br />

nice = 10<br />

disable = no<br />

}<br />

Now reboot all of the machines.<br />

B.3 Setting Up the MDS<br />

We will configure the Monitoring and Discovery Service (MDS) to have one<br />

Grid Information Index Service (GIIS) on the machine m2, which collects the data<br />

reported by the Grid Resource Information Servers (GRIS) on all of the machines.<br />

The GRIS servers send information about their respective servers to the GIIS. In<br />

the demo application, we will use this to find machines that are not too busy. The user<br />

will be able to query the GIIS from the client machine m1.
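
As a minimal sketch of how an application could pick a machine that is not too busy, the following Java class chooses the host that reports the highest free-CPU percentage. The class name LeastBusyHost, the method pick and the sample percentages are assumptions made for illustration only; they are not part of the Globus Toolkit, and the selection rule (largest free-CPU value wins) is one plausible reading of "not too busy".

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: chooses the host with the most free CPU.
class LeastBusyHost {

    // Returns the host name with the highest free-CPU percentage,
    // or null when no host has been reported.
    static String pick(Map<String, Integer> freeCpuByHost) {
        String best = null;
        int bestFree = -1;
        for (Map.Entry<String, Integer> e : freeCpuByHost.entrySet()) {
            if (e.getValue() > bestFree) {
                bestFree = e.getValue();
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Illustrative values only; real values come from the GIIS.
        Map<String, Integer> free = new LinkedHashMap<String, Integer>();
        free.put("m2.kmitnb.ac.th", 90);
        free.put("m3.kmitnb.ac.th", 80);
        System.out.println("Least busy host: " + pick(free));
    }
}
```

When two hosts report the same value, this sketch keeps the one that was reported first.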



To set up this structure, we need to modify several configuration files. These<br />

files name the GIIS and GRIS, and show how these components should register with<br />

each other.<br />

Figure 3-24 presents the relationship among the MDS components in our<br />

application.<br />

B.3.1 Setting Up the GIIS and GRIS on the Machine m2<br />

On m2, make the following modifications to the conf files in the directory<br />

$GLOBUS_LOCATION/etc.<br />

In the file grid-info-slapd.conf, name the GIIS on machine m2. Change the<br />

second of the lines:<br />

database giis<br />

suffix "Mds-Vo-name=site, o=Grid"<br />

to<br />

database giis<br />

suffix "Mds-Vo-name=m2.kmitnb.ac.th, o=Grid"<br />

In the file grid-info-site-policy.conf, allow registrations from the domain.<br />

Change the line:<br />

policydata: (&(Mds-Service-hn=site) (Mds-Service-port=2135))<br />

to<br />

policydata: (&(Mds-Service-hn=*.kmitnb.ac.th) (Mds-Service-port=2135))<br />

In the file grid-info-resource-register.conf, tell the m2 GRIS to register with the<br />

m2 GIIS. Change the two matching lines to the settings shown below:<br />

dn: Mds-Vo-Op-name=register, Mds-Vo-name=m2.kmitnb.ac.th, o=grid<br />

reghn: m2.kmitnb.ac.th<br />

B.3.2 Setting Up the GRIS on m3<br />

On all of the other server machines (here we have only m3), make the following<br />

modifications to the conf files in the directory $GLOBUS_LOCATION/etc.<br />

In the file grid-info-slapd.conf, remove the GIIS server from these machines.<br />

Remove the block of lines starting with the following lines:<br />

database giis<br />

suffix "Mds-Vo-name=site, o=Grid"<br />

In the file grid-info-resource-register.conf, tell the GRIS which GIIS to register<br />

with. Change the two matching lines as shown below:<br />

dn: Mds-Vo-Op-name=register, Mds-Vo-name=m2.kmitnb.ac.th, o=grid<br />

reghn: m2.kmitnb.ac.th



B.3.3 Starting the MDS on All of the Servers<br />

Start the MDS on all of the servers (m2 and m3) using:<br />

globus-mds start<br />

This can be automated by putting it in /etc/rc.d/rc5.d per the usual conventions.<br />

Copy the globus-mds script into the directory /etc/init.d/. Then create two symbolic<br />

links as follows:<br />

cp $GLOBUS_LOCATION/sbin/globus-mds /etc/init.d/<br />

cd /etc/rc.d/rc5.d/<br />

ln -s /etc/init.d/globus-mds S92globus-mds<br />

ln -s /etc/init.d/globus-mds K92globus-mds<br />

B.3.4 Setting Up the MDS Client m1<br />

Modify the lines shown below in the file $GLOBUS_LOCATION/etc/grid-info.conf<br />

so that searches go to the GIIS on machine m2:<br />

GRID_INFO_HOST="m2.kmitnb.ac.th"<br />

GRID_INFO_ORGANIZATION_DN="Mds-Vo-name=m2.kmitnb.ac.th, o=Grid"<br />

B.3.5 Setting Up a Secure MDS<br />

So far, we have set up an MDS that permits anonymous access. The grid-info-search<br />

command should use the -x flag to indicate an anonymous search request.<br />

However, the MDS can be secured so that only certified users can access the GIIS and<br />

only certified server GRISs can register to send information to the GIIS. The<br />

following steps should be performed.<br />

B.3.5.1 Requesting and Signing Certificates for Each Server Machine<br />

For each of the server machines (m2 and m3), request LDAP certificates, sign<br />

them using the Certificate Authority on m2, and copy the signed certificates to the<br />

proper location. The steps for one of the servers (m3) are shown below.<br />

On the server machine (m3) under root, run:<br />

grid-cert-request -service ldap -host m3.kmitnb.ac.th<br />

Copy the request certificate from /etc/grid-security/ldap/ldapcert_request.pem to<br />

the Certificate Authority machine (m2) using ftp or any other desired method. Sign<br />

the certificate using root on m2, substituting the correct locations for the request<br />

certificate and signed certificates:<br />

grid-ca-sign -in ldapcert_request.pem -out ldapcert.pem



Copy the resulting signed certificate file ldapcert.pem from the Certificate<br />

Authority machine (m2) to the location /etc/grid-security/ldap/ldapcert.pem on the server machine (m3).<br />

B.3.5.2 Changing the conf Files<br />

Change the following configuration files on the servers.<br />

Change $GLOBUS_LOCATION/etc/grid-info-slapd.conf to change the<br />

anonymousbind setting(s) as follows:<br />

anonymousbind yes<br />

Change the files $GLOBUS_LOCATION/etc/grid-info-resource-register.conf<br />

on the servers to require authentication when registering:<br />

bindmethod: ANONYM-ONLY<br />

At this point, the registration bind method has been specified, that is,<br />

who can register with whom and how. However, when anonymous bind has been<br />

deactivated, each registrant node must also be informed that the GIIS (m2) is authorized to<br />

receive resource information.<br />

To authorize m2 (the GIIS) to receive registration information, m2's ldap<br />

subject name must be entered in the grid-mapfile file. To get m2's ldap subject name,<br />

we run "grid-cert-info" on m3 as follows, in directory /etc/grid-security, with the<br />

assumption that m3's ldap subject name would be similar.<br />

% grid-cert-info -f /etc/grid-security/ldap/ldapcert.pem -subject<br />

The name was<br />

/O=grid/OU=demotest/OU=m2.kmitnb.ac.th/CN=ldap/m3.kmitnb.ac.th<br />

Since direct editing of the grid-mapfile is discouraged, we run the following<br />

command using the name obtained from above, substituting "m2" for "m3."<br />

% grid-mapfile-add-entry \<br />

-dn "/O=grid/OU=demotest/OU=m2.kmitnb.ac.th/CN=ldap/m2.kmitnb.ac.th" \<br />

-ln globususer<br />

Successful entry was indicated with the following string returned:<br />

(1) entry added<br />
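
Each line of the grid-mapfile pairs a quoted certificate subject name with a local account name. As an illustration only, the following Java sketch formats such a line for the entry added above. The class GridMapEntry and its method format are hypothetical helpers written for this example; they are not part of the Globus Toolkit, which maintains the file through grid-mapfile-add-entry.

```java
// Hypothetical helper: formats one grid-mapfile line.
class GridMapEntry {

    // A grid-mapfile line is the quoted subject name, a space, then the local account.
    static String format(String subjectDn, String localUser) {
        return "\"" + subjectDn + "\" " + localUser;
    }

    public static void main(String[] args) {
        // Subject name and local account taken from the command shown above.
        String line = format(
            "/O=grid/OU=demotest/OU=m2.kmitnb.ac.th/CN=ldap/m2.kmitnb.ac.th",
            "globususer");
        System.out.println(line);
    }
}
```

Comparing this line with the contents of /etc/grid-security/grid-mapfile is a quick way to confirm that the entry was recorded as intended.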

After making all of these changes, the server machines should be rebooted or<br />

the following should be used to restart the MDS on each of the servers (m2 and m3):<br />

globus-mds stop<br />

globus-mds start



B.4 Checking the Installation<br />

To check the installations on each machine, as root use the command:<br />

$GPT_LOCATION/sbin/gpt-verify<br />

The following commands can be used on a server machine to see if the GRAM<br />

and GridFTP are listening on their respective ports:<br />

netstat -an | grep 2119<br />

netstat -an | grep 2811<br />

From the client machine (m1) logged on as the user snobol, do the following:<br />

This command sets up the environment so that Globus commands can be issued<br />

by the user. One may want to add this line to one’s login profile:<br />

. $GLOBUS_LOCATION/etc/globus-user-env.sh<br />

This command refreshes the proxy certificate for the user (snobol):<br />

grid-proxy-init<br />

<br />

The following commands send a simple job to each server machine. This tests<br />

whether jobs can be submitted to each of the server machines:<br />

globus-job-run m2.kmitnb.ac.th "/bin/hostname"<br />

globus-job-run m3.kmitnb.ac.th "/bin/hostname"<br />

To search the MDS for processors whose CPU has been at least 90 percent free<br />

over the last minute, use:<br />

grid-info-search -x "(&(Mds-Device-Group-name=processors)(Mds-Cpu-Free-1minX100>=90))"<br />
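
The same LDAP filter can be built programmatically when the threshold should vary. The following is a minimal Java sketch; the class MdsFilter and the method cpuFreeAtLeast are names invented for this example, while the two attribute names are the ones used in the command above.

```java
// Hypothetical helper: builds the MDS search filter for a free-CPU threshold.
class MdsFilter {

    // Returns an LDAP filter that matches processor entries whose
    // one-minute free-CPU value is at least the given percentage.
    static String cpuFreeAtLeast(int percent) {
        if (percent < 0 || percent > 100) {
            throw new IllegalArgumentException("percent must be between 0 and 100");
        }
        return "(&(Mds-Device-Group-name=processors)"
             + "(Mds-Cpu-Free-1minX100>=" + percent + "))";
    }

    public static void main(String[] args) {
        // Same threshold as the grid-info-search command above.
        System.out.println(cpuFreeAtLeast(90));
    }
}
```

A threshold of 0 yields a filter that matches every processor entry that reports the attribute, which is useful for listing all hosts before ranking them.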

Now we are ready to install and run the course scheduling application.


APPENDIX C<br />

INSTALLING SOFTWARE



This section introduces the steps for installing and setting up MySQL 4.0,<br />

J2sdk1.4, Java CoG Kit 1.1, Tomcat 5.0, mod_jk2 and the JDBC driver on Redhat Linux<br />

9.0 (RH9). In this study, we will install this software on machine m1.<br />

C.1 Installing MySQL 4.0<br />

First, make sure there is no previous version of MySQL installed on the system.<br />

As root execute the command:<br />

#rpm -q mysql<br />

If there is none, proceed to the installation; otherwise, uninstall it with the command:<br />

#rpm -e mysql<br />

Download the rpm packages for MySQL’s server, client, dynamic shared libraries and<br />

development files:<br />

- MySQL-server-4.0.24-0.i386.rpm<br />

- MySQL-client-4.0.24-0.i386.rpm<br />

- MySQL-shared-4.0.24-0.i386.rpm<br />

- MySQL-devel-4.0.24-0.i386.rpm<br />

Then install them one by one by using the following commands as root:<br />

#rpm -ivh MySQL-server-4.0.24-0.i386.rpm<br />

#rpm -ivh MySQL-client-4.0.24-0.i386.rpm<br />

#rpm -ivh MySQL-shared-4.0.24-0.i386.rpm<br />

#rpm -ivh MySQL-devel-4.0.24-0.i386.rpm<br />

The MySQL database has been created in /var/lib/mysql.<br />

Initialize MySQL database after installation by typing:<br />

#mysql_install_db<br />

Do not forget to add the directory that contains libmysqlclient.so to the library search<br />

path file /etc/ld.so.conf. For example, we have:<br />

/usr/lib/libmysqlclient.so<br />

Make sure /etc/ld.so.conf contains:<br />

/usr/lib<br />

Then run<br />

#/usr/sbin/ldconfig<br />

The following command changes the default empty password of the MySQL root<br />

user to one of our choice. For example, to change the empty password to ncdanh:<br />

#/usr/bin/mysqladmin -u root password ncdanh<br />

Now, try to log in to MySQL with the new password. As root, type:



#mysql -u root -p<br />

Enter password: ncdanh<br />

mysql><br />

C.2 Installing J2sdk1.4<br />

To install J2sdk1.4, do the following steps:<br />

- Download the j2sdk-1_4_2_10-linux-i586.bin file and copy it to /usr/local:<br />

[root@m1 root]#cp -p j2sdk-1_4_2_10-linux-i586.bin /usr/local<br />

- Change into /usr/local and run the above file:<br />

[root@m1 root]#cd /usr/local<br />

[root@m1 local]#./j2sdk-1_4_2_10-linux-i586.bin<br />

This leaves directory /usr/local/j2sdk1.4.2_10.<br />

- Insert the following lines inside file /etc/profile or /root/.bashrc:<br />

export JAVA_HOME=/usr/local/j2sdk1.4.2_10<br />

export CLASSPATH=$JAVA_HOME/lib/tools.jar:$JAVA_HOME/jre/lib/rt.jar:./<br />

C.3 Installing Java Cog Kit 1.1<br />

This section presents how to download, install and configure the Java CoG Kit<br />

1.1.<br />

Installation is the first step that needs to be accomplished before the Java CoG<br />

Kit can be used. It ensures that the Java CoG Kit exists on our local machine in a<br />

proper state. After installation, configuration is needed to adjust various parameters<br />

that are specific to our environment.<br />

C.3.1 Downloading the Java Cog Kit<br />

This study uses the stable binary distribution of jglobus. From this distribution, we use<br />

just the jar files, without modifying them.<br />

The stable binary distribution of jglobus is available from the website:<br />

http://www.globus.org/cog/java/1.1/cog-1.1-bin.tar.gz.<br />

As root, do the following steps:<br />

- Download cog-1.1-bin.tar.gz file and copy to /usr/local.<br />

- Unpack this file:<br />

[root@m1 root]#cd /usr/local<br />

[root@m1 local]#tar -xzf cog-1.1-bin.tar.gz<br />

A directory named cog-1.1 will be created. This directory will, from now on, be<br />

referred to as



C.3.2 Configuration<br />

This section shows how to configure the Java CoG Kit.<br />

C.3.2.1 Environment Variables<br />

The COG_INSTALL_PATH environment variable is used to determine the<br />

installation location of the Java CoG Kit. The COG_INSTALL_PATH should point to<br />

the directory.<br />

It is also highly recommended that you add the /bin directory<br />

to the binary search path (named PATH on most systems).<br />

Add the following commands to the /etc/profile:<br />

export COG_INSTALL_PATH=/usr/local/cog-1.1<br />

export PATH=$PATH:$COG_INSTALL_PATH/bin<br />

Log out of the RH9 machine and log in again to activate the above profile.<br />

C.3.2.2 Configuration<br />

Manual configuration of the Java CoG Kit is also possible. Using an editor, we<br />

create the configuration file named cog.properties and place it in the directory /.globus.<br />

In our situation, this directory is /home/snobol/.globus (the snobol<br />

user is created in Appendix B).<br />

A sample Java CoG Kit configuration file is shown as follows:<br />

#Java CoG Kit Configuration File<br />

#Mon Dec 26 10:30:30 CST 2005<br />

usercert=/home/snobol/.globus/usercert.pem<br />

userkey=/home/snobol/.globus/userkey.pem<br />

proxy=/tmp/x509up_u800<br />

cacert=/usr/local/globus/etc/grid-security/certificates/42864e48.0<br />

ip=192.168.10.241<br />

It includes a number of important properties. These properties are:<br />

- usercert: points to the location of the Globus user certificate.<br />

- userkey: points to the location of the private key associated with the Globus<br />

user certificate.<br />

- proxy: points to the location of the user proxy. The proxy is located in a<br />

temporary directory, and has its name composed of the string x509up_u and a user id<br />

(OS specific). In the above example, the user id is 800.<br />

- cacert: contains a comma separated list of certificate authorities that the user<br />

trusts.



- ip: represents the IP address of the machine the Java CoG Kit will be run<br />

from.<br />
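
The properties above follow the standard Java properties format, so they can be read with java.util.Properties. The following is a minimal sketch that parses the sample values from this section and rebuilds the proxy file name from the user id. The class CogProperties and its methods are hypothetical helpers written for this example and are not part of the Java CoG Kit API; the /tmp directory is taken from the sample file above.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical helper: reads cog.properties style text.
class CogProperties {

    // Builds the proxy file name from the string x509up_u and a numeric user id.
    static String proxyPath(int uid) {
        return "/tmp/x509up_u" + uid;
    }

    // Parses key=value lines; lines starting with '#' are treated as comments.
    static Properties parse(String text) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(text));
        } catch (IOException e) {
            // Not expected when reading from an in-memory string.
            throw new IllegalStateException(e);
        }
        return p;
    }

    public static void main(String[] args) {
        String sample =
              "#Java CoG Kit Configuration File\n"
            + "usercert=/home/snobol/.globus/usercert.pem\n"
            + "userkey=/home/snobol/.globus/userkey.pem\n"
            + "proxy=/tmp/x509up_u800\n"
            + "ip=192.168.10.241\n";
        Properties p = parse(sample);
        System.out.println("proxy file: " + p.getProperty("proxy"));
        System.out.println("matches uid 800: " + proxyPath(800).equals(p.getProperty("proxy")));
    }
}
```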

C.3.2.3 Managing Certificates and Proxies<br />

Currently, the Java CoG Kit provides some GUI-based tools for credential<br />

management. These tools need the environment variable COG_INSTALL_PATH to<br />

be set to .<br />

One of the tools is Visual-grid-proxy-init. This tool allows creation of a proxy.<br />

Lifetime and cryptographic strength of the proxy can be specified. Also, the locations<br />

of user’s long-term credentials and the location of the resulting proxy file can be<br />

specified.<br />

FIGURE C-1 Visual-grid-proxy-init<br />

To run this tool, as root, do the following steps:<br />

- Run the following command:<br />

[root@m1 root]# visual-grid-proxy-init<br />

The system will show a dialog box as presented in Figure C-1.<br />

- Input password: pwsbm1.<br />

- Input the options with the following values:<br />

• Proxy lifetime : 12h<br />

• Strength : 512<br />

• Proxy file : /tmp/x509up_u800<br />

• User certificate : /home/snobol/.globus/usercert.pem<br />

• User private key : /home/snobol/.globus/userkey.pem<br />

- Press ”Create” button.<br />

To test, after creating the proxy file, run the following commands:<br />

- Display information regarding a proxy



[root@m1 root]#grid-proxy-info<br />

- Execute a command on remote machine m2 from local machine m1:<br />

[root@m1 root]#globusrun -r m2.kmitnb.ac.th -o "&(executable=/bin/ls)"<br />

C.4 Installing Tomcat 5.0<br />

C.4.1 Installing Tomcat 5.0<br />

To install Tomcat 5.0, do the following steps:<br />

- Download file jakarta-tomcat-5.0.28.tar.gz and copy it to /usr/local/opt.<br />

[root@m1 root]#cp -p jakarta-tomcat-5.0.28.tar.gz /usr/local/opt<br />

- Change into /usr/local/opt and run the following commands:<br />

[root@m1 root]# cd /usr/local/opt<br />

[root@m1 opt]# tar -zxvf jakarta-tomcat-5.0.28.tar.gz<br />

[root@m1 opt]# ln -s jakarta-tomcat-5.0.28 tomcat<br />

Tomcat has been installed into /usr/local/opt/jakarta-tomcat-5.0.28 and<br />

linked to /usr/local/opt/tomcat.<br />

- Insert the following line inside file /etc/profile or /root/.bashrc:<br />

export CATALINA_HOME=/usr/local/opt/tomcat<br />

Now, log out of the RH9 machine and log in again to ensure that all changes<br />

take effect.<br />

C.4.2 Starting and Stopping Tomcat 5.0<br />

First of all, we need to ensure that CATALINA_HOME and JAVA_HOME are<br />

correctly set. To do this, open a terminal and type the following commands:<br />

# echo $JAVA_HOME<br />

# echo $CATALINA_HOME<br />

If either command prints a blank line, or a directory other than the expected<br />

one, we have to correct these environment variables before<br />

continuing.<br />
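
The same check can be made from Java, which is convenient because Tomcat itself runs on the JVM. The following is a minimal sketch; the class EnvCheck and the method isDirectory are names invented for this example.

```java
import java.io.File;

// Hypothetical helper: verifies that the two environment variables
// point to existing directories.
class EnvCheck {

    // True only when the value is non-empty and names an existing directory.
    static boolean isDirectory(String path) {
        return path != null
            && path.trim().length() > 0
            && new File(path).isDirectory();
    }

    public static void main(String[] args) {
        String[] names = { "JAVA_HOME", "CATALINA_HOME" };
        for (String name : names) {
            String value = System.getenv(name);
            String status = isDirectory(value) ? "ok" : "not set or not a directory";
            System.out.println(name + "=" + value + " (" + status + ")");
        }
    }
}
```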

If everything is fine, we can start Tomcat with the following command. As root,<br />

# $CATALINA_HOME/bin/startup.sh<br />

To check if Tomcat is running fine, we should open a browser and point the<br />

URL to http://localhost:8080. We should see the default Tomcat welcome page.<br />

To stop Tomcat, as root,<br />

# $CATALINA_HOME/bin/shutdown.sh



If Tomcat does not start and we downloaded the zip file, the cause is probably<br />

file permissions. Ensure that the following files inside the directory<br />

$CATALINA_HOME/bin are executable:<br />

# chmod +x startup.sh<br />

# chmod +x shutdown.sh<br />

# chmod +x tomcat.sh<br />

After making the files executable, we try starting and stopping Tomcat again.<br />

C.5 Installing mod_jk<br />

We will use the Apache server included in RH9, instead of installing another<br />

one. The httpd service was installed in /etc/httpd.<br />

Before installing mod_jk, we should shut down both the httpd service and<br />

Tomcat. The httpd service can be shut down from the menu bar of RH9 (System<br />

Settings/Server Settings/Services), shown in Figure C-2. Select httpd and press<br />

“Stop”.<br />

FIGURE C-2 Service configuration<br />

Now, to install mod_jk do the following steps:<br />

- Download the file mod_jk2-2.0.4-2jpp.i386.rpm (it can be downloaded from<br />

http://rpm.pbone.net) and copy it to /usr/software.<br />

[root@m1 root]#cd /usr/software<br />

- Install this file:<br />

[root@m1 software]#rpm -ihv mod_jk2-2.0.4-2jpp.i386.rpm



The system will automatically put both mod_jk2.so and jkjni.so into<br />

/etc/httpd/modules of RH9.<br />

Now we configure the following files: server.xml, workers2.properties and<br />

httpd.conf.<br />

C.5.1 Editing server.xml File<br />

Open the file $CATALINA_HOME/conf/server.xml and look for the "non-SSL<br />

Coyote HTTP/1.1 Connector". This is a standard Tomcat-only connector. Comment it<br />

out since we will be using Apache for handling HTTP requests:<br />

<br />

<br />

C.5.2 Creating workers2.properties File<br />

Create file /etc/httpd/conf/workers2.properties with the following contents:<br />

[shm]<br />

file=/etc/httpd/logs/shm.file<br />

size=1048576<br />

# socket channel<br />

[channel.socket:localhost:8009]<br />

port=8009<br />

host=127.0.0.1<br />

# worker for the connector<br />

[ajp13:localhost:8009]<br />

channel=channel.socket:localhost:8009<br />

Note that the port matches that defined in the file server.xml for Tomcat.<br />

C.5.3 Editing httpd.conf File<br />

Open the file /etc/httpd/conf/httpd.conf and add the following lines at the end of<br />

the list of modules loaded into Apache.<br />

LoadModule jk2_module modules/mod_jk2.so<br />

<br />

JkUriSet worker ajp13:localhost:8009<br />

<br />

<br />

JkUriSet worker ajp13:localhost:8009<br />



<br />

JkUriSet worker ajp13:localhost:8009<br />

<br />

<br />

JkUriSet worker ajp13:localhost:8009<br />

<br />

<br />

JkUriSet worker ajp13:localhost:8009<br />

<br />

For testing, we will create the directory<br />

$CATALINA_HOME/webapps/ROOT/scheduling to store the JSP or HTML files for our<br />

system, then create a simple file test.jsp and put this file into the above directory. The<br />

file test.jsp has the following content:<br />

<br />

<br />

<br />

<br />

<br />

Now, try to access it from a web browser as presented in Figure C-3.<br />

FIGURE C-3 Result in the web browser<br />

Tomcat will automatically create the following files:<br />

$CATALINA_HOME/work/Catalina/localhost/_/org/apache/jsp/scheduling/*.class



C.6 Installing JDBC Driver on Linux<br />

Assume that we already have MySQL installed on the Redhat Linux machine.<br />

To access MySQL from Java or JSP programs, we need to download MySQL<br />

Connector/J from its website. This study uses MySQL Connector/J 3.2.<br />

- Download the file mysql-connector-java-3.2.0-alpha.tar.gz (it can be<br />

downloaded from http://www.mysql.com/products/connector/j/index.html).<br />

- Unpack this tar.gz file and place the extracted files in /usr/local.<br />

- Copy the file mysql-connector-java-3.2.0-alpha-bin.jar to the directory<br />

$JAVA_HOME/jre/lib/ext.<br />

- Copy the file Driver.class to $JAVA_HOME/jre/lib/ext. This will allow the<br />

Java interpreter to find the driver.<br />

- Finally, insert the following line inside file /etc/profile or /root/.bashrc:<br />

export CLASSPATH=$JAVA_HOME/lib/tools.jar:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/jre/lib/ext/mysql-connector-java-3.2.0-alpha-bin.jar:./
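
Once the driver is on the class path, a short JDBC program can confirm that Java reaches the MySQL server. The following is a minimal sketch under several assumptions that should be adapted: the server from Section C.1 runs on localhost at the default port 3306, a database named test exists, and the root password is ncdanh as set in C.1. The class MySqlConnectionCheck and the method jdbcUrl are names invented for this example; com.mysql.jdbc.Driver is the driver class shipped with Connector/J 3.x.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical check program: connects to MySQL through Connector/J.
class MySqlConnectionCheck {

    // Builds a JDBC URL of the form jdbc:mysql://host:port/database.
    static String jdbcUrl(String host, int port, String database) {
        return "jdbc:mysql://" + host + ":" + port + "/" + database;
    }

    public static void main(String[] args) {
        // Assumed values: local server, default port, database named "test".
        String url = jdbcUrl("localhost", 3306, "test");
        try {
            Class.forName("com.mysql.jdbc.Driver"); // load the Connector/J driver
            Connection con = DriverManager.getConnection(url, "root", "ncdanh");
            try {
                Statement st = con.createStatement();
                ResultSet rs = st.executeQuery("SELECT VERSION()");
                if (rs.next()) {
                    System.out.println("MySQL version: " + rs.getString(1));
                }
            } finally {
                con.close();
            }
        } catch (ClassNotFoundException e) {
            System.err.println("Connector/J is not on the class path: " + e.getMessage());
        } catch (SQLException e) {
            System.err.println("Could not query MySQL: " + e.getMessage());
        }
    }
}
```

If the program reports that the driver class is missing, the CLASSPATH setting above is the first thing to check.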


APPENDIX D<br />

INSTALLING CENTRALIZED AND DECENTRALIZED COURSE<br />

SCHEDULING PROGRAMS



This section presents how to compile the centralized and decentralized course<br />

scheduling programs. These programs are written in C and compiled with the gcc compiler included in<br />

the Redhat Linux 9.0 installation.<br />

D.1 The Centralized Course Scheduling Program<br />

This program will be installed on machine m2. On machine m2, we do the<br />

following steps:<br />

- Copy the file centralizedscheduling.c to /usr/study/coursescheduling.<br />

- Run the following commands as root:<br />

[root@m2 root]#cd /usr/study/coursescheduling<br />

[root@m2 coursescheduling]# gcc -I/usr/include/mysql centralizedscheduling.c -L/usr/lib/mysql -lmysqlclient -lz -o centralizedscheduling.exe<br />

The file centralizedscheduling.exe has been created in the same directory.<br />

For testing, we can run the following command.<br />

[root@m2 coursescheduling]#./centralizedscheduling.exe<br />

D.2 The Decentralized Course Scheduling Program<br />

This program will be installed on machines m2 and m3. The following steps are<br />

to compile it on machine m2.<br />

- Copy the file decentralizedscheduling.c to /usr/study/coursescheduling.<br />

- Run the following commands as root:<br />

[root@m2 root]#cd /usr/study/coursescheduling<br />

[root@m2 coursescheduling]# gcc -I/usr/include/mysql decentralizedscheduling.c -L/usr/lib/mysql -lmysqlclient -lz -o decentralizedscheduling.exe<br />

The file decentralizedscheduling.exe has been created in the same directory.


APPENDIX E<br />

JAVA SOURCE CODE FOR GRID SYSTEM



All the following files are compiled and stored in the directory<br />

/usr/study/gridsystem on machine m1.<br />

GridInfoSearch.java<br />

import java.util.Hashtable;<br />

import java.util.Enumeration;<br />

import java.net.InetAddress;<br />

import java.net.UnknownHostException;<br />

import javax.naming.Context;<br />

import javax.naming.NamingEnumeration;<br />

import javax.naming.NamingException;<br />

import javax.naming.directory.Attribute;<br />

import javax.naming.directory.SearchControls;<br />

import javax.naming.directory.SearchResult;<br />

import javax.naming.directory.Attributes;<br />

import javax.naming.ldap.LdapContext;<br />

import javax.naming.ldap.InitialLdapContext;<br />

import org.globus.mds.gsi.common.GSIMechanism;<br />

// we could add: aliasing, referral support<br />

public class GridInfoSearch {<br />

//Default values<br />

private static final String version = org.globus.common.Version.getVersion();<br />

private static final String DEFAULT_CTX ="com.sun.jndi.ldap.LdapCtxFactory";<br />

private String hostname = "m2.sched.grid.com";<br />

private int port = 2135;<br />

private String baseDN = "mds-vo-name=m2.sched.grid.com, o=grid";<br />

private int scope = SearchControls.SUBTREE_SCOPE;<br />

private int ldapVersion = 3;<br />

private int sizeLimit = 0;<br />

private int timeLimit = 0;<br />

private boolean ldapTrace = false;<br />

private String saslMech;<br />

private String bindDN;<br />

private String password;<br />

private String qop = "auth"; //could be auth, auth-int, auth-conf<br />

private static AvailableHost ob;//static means that the value of ob persists until the program finishes<br />

public GridInfoSearch(){<br />

}



public String getTheBestHost(){<br />

GridInfoSearch gridInfoSearch = new GridInfoSearch();<br />

String filter = "(&(Mds-Device-Group-name=processors)(Mds-Cpu-Free-1minX100>=0))";<br />

gridInfoSearch.search(filter);<br />

ob.displayHost();<br />

System.out.println("the best:"+ob.getBestHost());<br />

return ob.getBestHost();<br />

}<br />

//Search the ldap server for the filter specified in the main function<br />

private void search(String filter) {<br />

Hashtable env = new Hashtable();<br />

String url = "ldap://" + hostname + ":" + port;<br />

env.put("java.naming.ldap.version", String.valueOf(ldapVersion));<br />

env.put(Context.INITIAL_CONTEXT_FACTORY, DEFAULT_CTX);<br />

env.put(Context.PROVIDER_URL, url);<br />

if (bindDN != null) {<br />

env.put(Context.SECURITY_PRINCIPAL, bindDN);<br />

}<br />

//use GSI authentication from grid-proxy-init certificate<br />

saslMech = GSIMechanism.NAME;<br />

env.put("javax.security.sasl.client.pkgs",<br />

"org.globus.mds.gsi.jndi");<br />

env.put(Context.SECURITY_AUTHENTICATION, saslMech);<br />

env.put("javax.security.sasl.qop", qop);<br />

LdapContext ctx = null;<br />

//create a new ldap context and perform the search on the filter<br />

try {<br />

ctx = new InitialLdapContext(env, null);<br />

SearchControls constraints = new SearchControls();<br />

constraints.setSearchScope(scope);<br />

constraints.setCountLimit(sizeLimit);<br />

constraints.setTimeLimit(timeLimit);<br />

//store the results of the search in the results variable<br />

NamingEnumeration results = ctx.search(baseDN, filter, constraints);<br />

//displayResults(results);<br />

getAvailableHosts(results);//the results will be stored in ob<br />

} catch (Exception e) {<br />

System.err.println("Failed to search: " + e.getMessage());<br />

} finally {<br />

if (ctx != null) {<br />

try { ctx.close(); } catch (Exception e) {}<br />

}<br />

}<br />

}<br />

// Display results of search<br />

private void displayResults(NamingEnumeration results) throws NamingException {<br />

if (results == null) return;<br />

String dn;<br />

String attribute;<br />

Attributes attrs;<br />

Attribute at;<br />

SearchResult si;<br />

//use the results variable from search method and store them in a printable variable.<br />

while (results.hasMoreElements()) {<br />

si = (SearchResult)results.next();<br />

attrs = si.getAttributes();<br />

if (si.getName().trim().length() == 0) {<br />

dn = baseDN;<br />

} else {<br />

dn = si.getName() + ", " + baseDN;<br />

if(dn.substring(0,11).equals("Mds-Host-hn")){<br />

System.out.println("dn: " + dn);<br />

for (NamingEnumeration ae = attrs.getAll(); ae.hasMoreElements();) {<br />

at = (Attribute)ae.next();<br />

attribute = at.getID();<br />

if(attribute.equals("Mds-Cpu-Free-1minX100")){<br />

Enumeration vals = at.getAll();<br />

while(vals.hasMoreElements()) {<br />

System.out.println(attribute + ": " + vals.nextElement());<br />

}<br />

}<br />

}<br />

System.out.println();<br />

}<br />

}//else<br />

}//while<br />

}<br />



// Collect the available hosts from the search results<br />

private void getAvailableHosts(NamingEnumeration results)throws NamingException {<br />

if (results == null) return;<br />

String dn;<br />

String attribute;<br />

Attributes attrs;<br />

Attribute at;<br />

SearchResult si;<br />

int Mds_Cpu_speedMHz=0;<br />

int Mds_Memory_Ram_Total_freeMB=0;<br />

int Mds_Cpu_Total_count=0;<br />

String Mds_Host_hn="";<br />

int Mds_Cpu_Free_1minX100=0;<br />

//use the results variable from search method and store them in a printable variable.<br />

ob=new AvailableHost();<br />

while (results.hasMoreElements()) {<br />

si = (SearchResult)results.next();<br />

attrs = si.getAttributes();<br />

if (si.getName().trim().length() == 0) {<br />

dn = baseDN;<br />

} else {<br />

dn = si.getName() + ", " + baseDN;<br />

if(dn.substring(0,32).equals("Mds-Device-Group-name=processors")){<br />

System.out.println("dn: " + dn);<br />

for (NamingEnumeration ae = attrs.getAll(); ae.hasMoreElements();) {<br />

at = (Attribute)ae.next();<br />

attribute = at.getID();<br />

if(attribute.equals("Mds-Cpu-speedMHz")){<br />

Enumeration vals = at.getAll();<br />

Mds_Cpu_speedMHz=Integer.parseInt((String)vals.nextElement());<br />

System.out.println(attribute + ": " + Mds_Cpu_speedMHz);<br />

}else if(attribute.equals("Mds-Memory-Ram-Total-freeMB")){<br />

Enumeration vals = at.getAll();<br />

Mds_Memory_Ram_Total_freeMB=<br />

Integer.parseInt((String)vals.nextElement());<br />

System.out.println(attribute + ": " + Mds_Memory_Ram_Total_freeMB);<br />

}else if(attribute.equals("Mds-Cpu-Total-count")){<br />

Enumeration vals = at.getAll();<br />

Mds_Cpu_Total_count=Integer.parseInt((String)vals.nextElement());<br />

System.out.println(attribute + ": " + Mds_Cpu_Total_count);<br />

}else if(attribute.equals("Mds-Host-hn")){<br />

Enumeration vals = at.getAll();<br />

Mds_Host_hn=(String)vals.nextElement();<br />

System.out.println(attribute + ": " + Mds_Host_hn);<br />

}else if(attribute.equals("Mds-Cpu-Free-1minX100")){<br />

Enumeration vals = at.getAll();<br />

Mds_Cpu_Free_1minX100=<br />

Integer.parseInt((String)vals.nextElement());<br />

System.out.println(attribute + ": " + Mds_Cpu_Free_1minX100);<br />

}//else if<br />

}//for<br />

//extract hostname from dn<br />

Mds_Host_hn=(String)dn.substring(dn.indexOf("Mds-Host-hn")+12,<br />

dn.indexOf("mds-vo-name")-2);<br />

System.out.println(Mds_Host_hn);<br />

//add hosts into ArrayList<br />

ob.addHost( Mds_Host_hn,<br />

Mds_Cpu_speedMHz,<br />

Mds_Memory_Ram_Total_freeMB,<br />

Mds_Cpu_Total_count,<br />

Mds_Cpu_Free_1minX100);<br />

}//if processors<br />

System.out.println();<br />

}//else<br />

}//while<br />

}<br />

//Create new instance of GridInfoSearch and use specified filter string<br />

public static void main( String [] args ) {<br />

GridInfoSearch gridInfoSearch = new GridInfoSearch();<br />

String filter = "(&(Mds-Device-Group-name=processors)(Mds-Cpu-Free-1minX100>=0))";<br />

gridInfoSearch.search(filter);<br />

}<br />

}<br />
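The substring arithmetic that pulls the host name out of the DN can be isolated and checked on its own. The sketch below is a minimal illustration, assuming a DN of the shape `..., Mds-Host-hn=<host>, mds-vo-name=...`. The class `DnHostName`, the method `extractHost`, the null return for unexpected input, and the sample DN are hypothetical and are not part of the thesis code.

```java
// Hypothetical helper, not part of the thesis listing.
final class DnHostName {
    private static final String HOST_KEY = "Mds-Host-hn=";   // 12 characters
    private static final String VO_KEY = "mds-vo-name";

    /** Returns the Mds-Host-hn value in dn, or null if the expected parts are missing. */
    static String extractHost(String dn) {
        int start = dn.indexOf(HOST_KEY);
        int end = dn.indexOf(VO_KEY);
        if (start < 0 || end < 0) {
            return null;
        }
        start += HOST_KEY.length();   // same offset as indexOf("Mds-Host-hn") + 12
        end -= 2;                     // skip the ", " separator before mds-vo-name
        if (end < start) {
            return null;
        }
        return dn.substring(start, end);
    }

    public static void main(String[] args) {
        // Sample DN made up for illustration.
        String dn = "Mds-Device-Group-name=processors, "
                  + "Mds-Host-hn=m1.sched.grid.com, mds-vo-name=local, o=grid";
        System.out.println(extractHost(dn));
    }
}
```

Returning null instead of letting `substring` throw is a design choice of this sketch; the listing itself assumes every DN has both components.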



AvailableHost.java<br />

import java.util.*;<br />

public class AvailableHost{<br />

ArrayList ar;<br />

public AvailableHost() {<br />

ar = new ArrayList();<br />

}<br />

public void addHost( String Mds_Host_hn,<br />

int Mds_Cpu_speedMHz,<br />

int Mds_Memory_Ram_Total_freeMB,<br />

int Mds_Cpu_Total_count,<br />

int Mds_Cpu_Free_1minX100){<br />

ar.add(new Host( Mds_Host_hn,<br />

Mds_Cpu_speedMHz,<br />

Mds_Memory_Ram_Total_freeMB,<br />

Mds_Cpu_Total_count,<br />

Mds_Cpu_Free_1minX100));<br />

}<br />

public void displayHost(){<br />

for(int i=0; i



public static void main(String args[]){<br />

AvailableHost ob = new AvailableHost();<br />

ob.addHost("m1.sched.grid.com",2000/*MHz*/,123/*MB*/,1/*cpu*/,70/*%freeCPU*/);<br />

ob.addHost("m2.sched.grid.com",2000/*MHz*/,123/*MB*/,1/*cpu*/,90/*%freeCPU*/);<br />

ob.addHost("m3.sched.grid.com",2000/*MHz*/,123/*MB*/,1/*cpu*/,80/*%freeCPU*/);<br />

ob.displayHost();<br />

ob.displayBestHost();<br />

}//main<br />

}//class AvailableHost<br />
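The bodies of `displayHost` and `displayBestHost` are cut off in this listing. As a rough sketch only, and assuming the best host is simply the one with the largest weight, the selection could look as follows. The weight formula (free CPU percentage × clock speed × CPU count / 100) is the one used in the `Host` constructor of this appendix. The names `WeightedHost`, `HostRanking` and `best` are hypothetical, and the generic `Comparator` is a modern substitute for the raw `Comparable` used in the thesis code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for the Host class of the listing.
final class WeightedHost {
    final String name;
    final int weight;

    WeightedHost(String name, int speedMHz, int cpuCount, int cpuFreePercent) {
        this.name = name;
        // Same formula as the Host constructor: free% * MHz * CPUs / 100.
        this.weight = (int) (cpuFreePercent * speedMHz * cpuCount / 100.00);
    }
}

final class HostRanking {
    /** Returns the host with the largest weight, or null for an empty list. */
    static WeightedHost best(List<WeightedHost> hosts) {
        if (hosts.isEmpty()) {
            return null;
        }
        return Collections.max(hosts,
                Comparator.comparingInt((WeightedHost h) -> h.weight));
    }

    public static void main(String[] args) {
        // Same sample hosts as the main method of AvailableHost.
        List<WeightedHost> hosts = new ArrayList<>();
        hosts.add(new WeightedHost("m1.sched.grid.com", 2000, 1, 70));
        hosts.add(new WeightedHost("m2.sched.grid.com", 2000, 1, 90));
        hosts.add(new WeightedHost("m3.sched.grid.com", 2000, 1, 80));
        WeightedHost b = best(hosts);
        System.out.println(b.name + "\t" + b.weight);
    }
}
```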

class Host implements Comparable {<br />

private int Mds_Cpu_speedMHz;<br />

private int Mds_Memory_Ram_Total_freeMB;<br />

private int Mds_Cpu_Total_count;<br />

private String Mds_Host_hn;<br />

private int Mds_Cpu_Free_1minX100;<br />

private int Weight;<br />

public Host(<br />

String Mds_Host_hn,<br />

int Mds_Cpu_speedMHz,<br />

int Mds_Memory_Ram_Total_freeMB,<br />

int Mds_Cpu_Total_count,<br />

int Mds_Cpu_Free_1minX100){<br />

this.Mds_Host_hn=Mds_Host_hn;<br />

this.Mds_Cpu_speedMHz=Mds_Cpu_speedMHz;<br />

this.Mds_Memory_Ram_Total_freeMB=Mds_Memory_Ram_Total_freeMB;<br />

this.Mds_Cpu_Total_count=Mds_Cpu_Total_count;<br />

this.Mds_Cpu_Free_1minX100=Mds_Cpu_Free_1minX100;<br />

this.Weight=<br />

(int)(Mds_Cpu_Free_1minX100*Mds_Cpu_speedMHz*Mds_Cpu_Total_count/100.00);<br />

}<br />

public String getHostname(){<br />

return Mds_Host_hn;<br />

}<br />

public int getWeight(){<br />

return Weight;<br />

}



public String toString() {<br />

return Mds_Host_hn + "\t" + Weight;<br />

}<br />

//Order by cpu<br />

public int compareTo(Object ob) throws ClassCastException{<br />

Host temp = (Host)ob;<br />

int cpu1=Weight,cpu2=temp.Weight;<br />

if(cpu2>cpu1){<br />

return 1;}<br />

else if(cpu2



System.out.println(CentralizedSchedulingJobOut);<br />

System.out.println(gassJob[0].doGetStatus());<br />

// if failed, resubmit it<br />

// waiting for the result<br />

System.out.println("\nWaiting for the centralized scheduling job to finish");<br />

do {<br />

stillRunningJob=false;<br />

if (jobListeners[0].stillActive()) {<br />

stillRunningJob = true;<br />

}<br />

if(jobListeners[0].fail()){<br />

System.out.println("Resubmit:"+CentralizedSchedulingRSL);<br />

gassJob[0]=new GassJob(centralmachine,false);<br />

CentralizedSchedulingJobOut =<br />

gassJob[0].GlobusRun(CentralizedSchedulingRSL);<br />

jobListeners[0]=gassJob[0].getInteractiveJobListener();<br />

stillRunningJob = true;<br />

}//if<br />

System.out.print(".");<br />

delay(1000);<br />

jobs.updateJobId(0, gassJob[0].doGetJobId());<br />

jobs.updateStatus(0,gassJob[0].doGetStatus());<br />

} while (stillRunningJob);<br />

System.out.println("\n");<br />

/********************************<br />

*Decentralized scheduling<br />

********************************/<br />

String gassJobOut;<br />

String deRSL;<br />

String theBestMachine;



//request all these jobs<br />

for(int i=1; i



gassJob[jobCount]=new GassJob(theBestMachine,false);<br />

gassJobOut = gassJob[jobCount].GlobusRun(deRSL);<br />

jobListeners[jobCount]=<br />

gassJob[jobCount].getInteractiveJobListener();<br />

//wait to receive a jobid<br />

//update jobid for this Job<br />

jobs.updateJobId(jobCount, gassJob[jobCount].doGetJobId());<br />

//update machine that is used for this job<br />

jobs.updateMachine(jobCount, theBestMachine);<br />

jobs.updateStatus(jobCount,gassJob[jobCount].doGetStatus());<br />

stillRunningJob = true;<br />

delay(30000);<br />

}//if<br />

}//for<br />

System.out.print(".");<br />

delay(5000);<br />

} while (stillRunningJob);<br />

System.out.println("\n");<br />

}<br />

}//main<br />
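The wait loop for the centralized job polls the listener at a fixed interval and resubmits the job whenever the listener reports a failure. The following sketch isolates that control flow without any Globus classes. `Submission`, `Submitter` and `runUntilDone` are hypothetical names, and the `maxAttempts` cap is an addition of this sketch; the thesis code keeps resubmitting without a limit.

```java
// Hypothetical abstractions of a submitted job; these are not Globus APIs.
interface Submission {
    boolean stillActive();
    boolean failed();
}

interface Submitter {
    Submission submit();
}

final class ResubmitLoop {
    /**
     * Submits a job and polls until it finishes, resubmitting on failure.
     * Returns the number of submissions made, or -1 if maxAttempts was
     * exhausted or the waiting thread was interrupted.
     */
    static int runUntilDone(Submitter submitter, int maxAttempts, long pollMillis) {
        int attempts = 1;
        Submission job = submitter.submit();
        while (true) {
            if (job.failed()) {
                if (attempts >= maxAttempts) {
                    return -1;
                }
                job = submitter.submit();   // resubmit the same request
                attempts++;
            } else if (!job.stillActive()) {
                return attempts;            // finished without failure
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return -1;
            }
        }
    }

    public static void main(String[] args) {
        final int[] calls = {0};
        // Simulated submitter: the first submission fails, the second completes.
        Submitter flaky = () -> {
            final boolean fails = (calls[0]++ == 0);
            return new Submission() {
                public boolean stillActive() { return false; }
                public boolean failed() { return fails; }
            };
        };
        System.out.println("submissions: " + runUntilDone(flaky, 3, 10L));
    }
}
```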

GassJob.java<br />

import org.globus.gram.*;<br />

import org.gridforum.jgss.*;<br />

import org.ietf.jgss.*;<br />

import org.globus.security.gridmap.*;<br />

import org.globus.io.gass.server.*;<br />

import org.globus.util.deactivator.Deactivator;<br />

import COM.claymoresystems.sslg.*;<br />

import xjava.security.interfaces.*;<br />

import cryptix.asn1.lang.*;<br />

/**<br />

* Java CoG Job submission class<br />

**/<br />

public class GassJob implements JobOutputListener<br />

{<br />

private GassServer m_gassServer; // GASS Server: required to get job output<br />

private String m_gassURL = null; // URL of the GASS server<br />

private GramJob m_job = null; // GRAM JOB to be executed


private String m_jobOutput = ""; // job output as string<br />

private boolean m_batch = false; // Submission modes: batch=do not wait for output<br />

// non-batch=wait for output.<br />

private String m_remoteHost = null; // host where job will run<br />

private GSSCredential m_proxy=null; // Globus proxy used for authentication against gatekeeper<br />

InteractiveJobListener jobListeners;<br />

// Job output variables:<br />

// Used for non-batch mode jobs to receive output from<br />

// gatekeeper through the GASS server<br />

private JobOutputStream m_stdoutStream = null;<br />

private JobOutputStream m_stderrStream = null;<br />

private String m_jobid = null; // Globus job id of the form:<br />

//https://server.com:39374/15621/1021382777/<br />

public GassJob(String Contact, boolean batch) {<br />

m_remoteHost = Contact; // remote host<br />

m_batch = batch; // submission mode<br />

}<br />

/**<br />

* Start the Globus GASS Server. Used to get the output from the server<br />

* back to the client.<br />

*/<br />

private boolean startGassServer(GSSCredential proxy) {<br />

if (m_gassServer != null) return true;<br />

try {<br />

m_gassServer = new GassServer(proxy, 0);<br />

m_gassURL = m_gassServer.getURL();<br />

} catch(Exception e) {<br />

System.err.println("gass server failed to start!");<br />

e.printStackTrace();<br />

return false;<br />

}<br />

m_gassServer.registerDefaultDeactivator();<br />

return true;<br />

}



/**<br />

* Init job out listeners for non-batch mode jobs.<br />

*/<br />

private void initJobOutListeners() throws Exception {<br />

if ( m_stdoutStream != null ) return;<br />

// job output vars<br />

m_stdoutStream = new JobOutputStream(this);<br />

m_stderrStream = new JobOutputStream(this);<br />

m_jobid = String.valueOf(System.currentTimeMillis());<br />

// register output listeners<br />

m_gassServer.registerJobOutputStream("err-" + m_jobid, m_stderrStream);<br />

m_gassServer.registerJobOutputStream("out-" + m_jobid, m_stdoutStream);<br />

return;<br />

}<br />

/**<br />

* This method is used to notify the implementer when the status of a<br />

* GramJob has changed.<br />

*<br />

* @param job The GramJob whose status has changed.<br />

*/<br />

public void statusChanged(GramJob job) {<br />

try {<br />

if ( job.getStatus() == GramJob.STATUS_DONE ) {<br />

// notify waiting thread when job ready<br />

m_jobOutput = "Job sent. url=" + job.getIDAsString();<br />

// if notify enabled return URL as output<br />

synchronized(this) {<br />

notify();<br />

}<br />

}<br />

}<br />

catch (Exception ex) {<br />

System.out.println("statusChanged Error:" + ex.getMessage());<br />

}<br />

}



/**<br />

* This method is used to get the status of the job<br />

*/<br />

public String doGetStatus(){<br />

return jobListeners.doGetStatus();<br />

}<br />

/**<br />

* This method is used to get the id of the job<br />

*/<br />

public String doGetJobId(){<br />

return m_job.getIDAsString();<br />

}<br />

public InteractiveJobListener getInteractiveJobListener(){<br />

return jobListeners;<br />

}<br />

/**<br />

* It is called whenever the job's output<br />

* has been updated.<br />

*<br />

* @param output new output<br />

*/<br />

public void outputChanged(String output) {<br />

m_jobOutput += output;<br />

}<br />

/**<br />

* It is called whenever job finished<br />

* and no more output will be generated.<br />

*/<br />

public void outputClosed() {<br />

}<br />

public synchronized String GlobusRun(String RSL) {<br />

try {<br />

// load default Globus proxy. Java CoG kit must be installed<br />

//and a user certificate setup properly<br />

ExtendedGSSManager manager =<br />

(ExtendedGSSManager)ExtendedGSSManager.getInstance();<br />

GSSCredential m_proxy =<br />

manager.createCredential(GSSCredential.INITIATE_AND_ACCEPT);



// Start GASS server<br />

if (! startGassServer(m_proxy)) {<br />

throw new Exception("Unable to start GASS server.");<br />

}<br />

// setup Job Output listeners<br />

initJobOutListeners();<br />

// Append GASS URL to job String so we can get some output back<br />

String newRSL = null;<br />

// if non-batch, then get some output back<br />

if ( !m_batch) {<br />

newRSL = "&" + RSL.substring(0, RSL.indexOf('&')) +<br />

"(rsl_substitution=(GLOBUSRUN_GASS_URL " + m_gassURL + "))" +<br />

RSL.substring(RSL.indexOf('&') + 1, RSL.length()) +<br />

"(stdout=$(GLOBUSRUN_GASS_URL)/dev/stdout-" + m_jobid + ")" +<br />

"(stderr=$(GLOBUSRUN_GASS_URL)/dev/stderr-" + m_jobid + ")";<br />

}<br />

else {<br />

// format batching RSL so output can be retrieved later on using any GTK commands<br />

newRSL = RSL +<br />

"(stdout=x-gass-cache://$(GLOBUS_GRAM_JOB_CONTACT)stdout anExtraTag)"<br />

+ "(stderr=x-gass-cache://$(GLOBUS_GRAM_JOB_CONTACT)stderr anExtraTag)";<br />

}<br />

m_job = new GramJob(newRSL);<br />

// set proxy. CoG kit and user credentials must be installed and set<br />

// up properly<br />

m_job.setCredentials(m_proxy);<br />

// if non-batch then listen for output<br />

jobListeners=new InteractiveJobListener(false);<br />

m_job.addListener(jobListeners);<br />

System.out.println("Sending job request to: " + m_remoteHost);<br />

m_job.request(m_remoteHost, m_batch, false);<br />

m_jobOutput = "Job sent. url=" + m_job.getIDAsString();<br />

}<br />

catch (Exception ex) {<br />

if ( m_gassServer != null ) {<br />

// unregister from gass server<br />

m_gassServer.unregisterJobOutputStream("err-" + m_jobid);<br />

m_gassServer.unregisterJobOutputStream("out-" + m_jobid);<br />

}<br />

m_jobOutput = "Error submitting job: " + ex.getClass() + ":"<br />

+ ex.getMessage();<br />

}<br />

// cleanup<br />

//Deactivator.deactivateAll();<br />

return m_jobOutput;<br />

}<br />

}<br />
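In non-batch mode, `GlobusRun` splices an `rsl_substitution` for the GASS URL and the stdout/stderr redirections into the request string. The sketch below reproduces only that string manipulation as a pure function so it can be inspected without a grid. `RslRewriter` and `withGassOutput` are hypothetical names, the sample URL and job id are made up, and the explicit check for a missing `&` is an addition of this sketch.

```java
// Hypothetical helper; it mirrors the non-batch branch of GlobusRun
// with plain string operations and uses no Globus classes.
final class RslRewriter {
    static String withGassOutput(String rsl, String gassUrl, String jobId) {
        int amp = rsl.indexOf('&');
        if (amp < 0) {
            // Added guard: the listing assumes the RSL always contains '&'.
            throw new IllegalArgumentException("RSL must contain '&'");
        }
        return "&" + rsl.substring(0, amp)
             + "(rsl_substitution=(GLOBUSRUN_GASS_URL " + gassUrl + "))"
             + rsl.substring(amp + 1)
             + "(stdout=$(GLOBUSRUN_GASS_URL)/dev/stdout-" + jobId + ")"
             + "(stderr=$(GLOBUSRUN_GASS_URL)/dev/stderr-" + jobId + ")";
    }

    public static void main(String[] args) {
        // Sample values invented for illustration.
        System.out.println(withGassOutput(
                "&(executable=/bin/date)", "https://client.example.org:1234", "42"));
    }
}
```

Keeping the rewrite in a side-effect-free method makes it possible to check the generated RSL before anything is sent to a gatekeeper.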

InteractiveJobListener.java<br />

import java.io.*;<br />

import org.globus.gram.Gram;<br />

import org.globus.gram.GramJob;<br />

import org.globus.gram.GramException;<br />

import org.globus.gram.WaitingForCommitException;<br />

import org.globus.gram.GramJobListener;<br />

class InteractiveJobListener extends JobListener {<br />

private boolean quiet;<br />

private boolean finished = false;<br />

private boolean fail=false;<br />

private String strStatus="";<br />

public InteractiveJobListener(boolean quiet) {<br />

this.quiet = quiet;<br />

}<br />

public boolean stillActive() {<br />

return !this.finished;<br />

}<br />

public boolean fail(){<br />

return this.fail;<br />

}<br />

// waits for DONE or FAILED status<br />

public synchronized void waitFor() throws InterruptedException {<br />

while (!finished) {<br />

wait();<br />

}<br />

}<br />

public synchronized String doGetStatus(){<br />

return strStatus;<br />

}<br />

public synchronized void statusChanged(GramJob job) {<br />

if (!quiet) {<br />

System.out.println("Job: "+ job.getStatusAsString());<br />

}<br />

status = job.getStatus();<br />

strStatus=job.getStatusAsString();<br />

if (status == GramJob.STATUS_DONE) {<br />

finished = true;<br />

error = 0;<br />

notify();<br />

} else if (job.getStatus() == GramJob.STATUS_FAILED) {<br />

finished = true;<br />

fail=true;<br />

error = job.getError();<br />

notify();<br />

}<br />

}<br />

}<br />
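The listener combines a status callback with `wait`/`notify` so that one thread can block until the job reaches a final state. A minimal stand-alone sketch of that pattern follows. `StatusLatch` and its `DONE`/`FAILED` codes are hypothetical and are not the `GramJob` constants. Unlike the listing, the sketch also synchronizes the two query methods, on the assumption that they may be called from a different thread than the callback.

```java
// Hypothetical stand-in for the listener; no Globus classes are used.
final class StatusLatch {
    static final int DONE = 1;     // made-up codes, not GramJob constants
    static final int FAILED = 2;

    private boolean finished = false;
    private boolean failed = false;

    /** Callback: records a final status and wakes up any waiting thread. */
    synchronized void statusChanged(int status) {
        if (status == DONE || status == FAILED) {
            finished = true;
            failed = (status == FAILED);
            notifyAll();
        }
    }

    /** Blocks until DONE or FAILED has been reported. */
    synchronized void waitFor() throws InterruptedException {
        while (!finished) {
            wait();
        }
    }

    synchronized boolean stillActive() { return !finished; }

    synchronized boolean fail() { return failed; }

    public static void main(String[] args) throws InterruptedException {
        StatusLatch latch = new StatusLatch();
        // Simulated job: reports DONE from another thread after a short delay.
        Thread worker = new Thread(() -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            latch.statusChanged(DONE);
        });
        worker.start();
        latch.waitFor();
        System.out.println("active=" + latch.stillActive() + " failed=" + latch.fail());
    }
}
```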

JobListener.java<br />

import org.globus.gram.GramJob;<br />

import org.globus.gram.GramJobListener;<br />

abstract class JobListener implements GramJobListener {<br />

protected int status = 0;<br />

protected int error = 0;<br />

public abstract void waitFor() throws InterruptedException;


public int getError() {<br />

return error;<br />

}<br />

public int getStatus() {<br />

return status;<br />

}<br />

public boolean isFinished() {<br />

return (status == GramJob.STATUS_DONE ||status == GramJob.STATUS_FAILED);<br />

}<br />

}<br />

Jobs.java<br />

import java.util.*;<br />

public class Jobs{<br />

public static ArrayList ar;<br />

public Jobs() {<br />

ar = new ArrayList();<br />

ar.add(new Job("centralizedscheduling","",<br />

"& (executable =/usr/study/coursescheduling/centralizedscheduling)","m2.sched.grid.com","",0));<br />

ar.add(new Job("decentralizedschedulingER","",<br />

"& (executable =/usr/study/coursescheduling/decentralizedscheduling.exe)" +<br />

"(arguments=ER)", "","",0));<br />

ar.add(new Job("decentralizedschedulingSC","",<br />

"& (executable =/usr/study/coursescheduling/decentralizedscheduling.exe)" +<br />

"(arguments=SC)", "","",0));<br />

ar.add(new Job("decentralizedschedulingED","",<br />

"& (executable =/usr/study/coursescheduling/decentralizedscheduling.exe)" +<br />

"(arguments=ED)", "","",0));<br />

}<br />

//get a job that has index i<br />

public Job getJob(int i){<br />

return (Job) ar.get(i);<br />

}<br />

public int getSize(){<br />

return (int) ar.size();<br />

}



//get RSL of Job having index i<br />

public String getRSL(int i){<br />

Job ob= getJob(i);<br />

return ob.getRSL();<br />

}<br />

//get Machine of Job having index i<br />

public String getMachine(int i){<br />

Job ob= getJob(i);<br />

return ob.getMachine();<br />

}<br />

//get Status of Job having index i<br />

public String getStatus(int i){<br />

Job ob= getJob(i);<br />

return ob.getStatus();<br />

}<br />

//update a new jobid for the job that has index i<br />

public void updateJobId(int i, String jobid ){<br />

Job oldJob= getJob(i);<br />

ar.set(i, new Job( oldJob.getJobName(),<br />

jobid,<br />

oldJob.getRSL(),<br />

oldJob.getMachine(),<br />

oldJob.getStatus(),<br />

oldJob.getExectime()));<br />

}<br />

//update a new machine for the job that has index i<br />

public void updateMachine(int i, String machine){<br />

Job oldJob= getJob(i);<br />

ar.set(i, new Job( oldJob.getJobName(),<br />

oldJob.getJobId(),<br />

oldJob.getRSL(),<br />

machine,<br />

oldJob.getStatus(),<br />

oldJob.getExectime()));<br />

}



//update a new status for the job that has index i<br />

public void updateStatus(int i, String status){<br />

Job oldJob= getJob(i);<br />

ar.set(i, new Job( oldJob.getJobName(),<br />

oldJob.getJobId(),<br />

oldJob.getRSL(),<br />

oldJob.getMachine(),<br />

status,<br />

oldJob.getExectime()));<br />

}<br />

public void displayJobs(){<br />

for(int i=0; i



class Job {<br />

private String jobname;<br />

private String jobid;<br />

private String RSL;<br />

private String machine;<br />

private String status;<br />

private int exectime;<br />

public Job(String jobname, String jobid, String RSL, String machine, String status, int exectime){<br />

this.jobname = jobname;<br />

this.jobid = jobid;<br />

this.RSL = RSL;<br />

this.machine = machine;<br />

this.status = status;<br />

this.exectime= exectime;<br />

}<br />

public String getJobName(){<br />

return jobname;<br />

}<br />

public String getRSL(){<br />

return RSL;<br />

}<br />

public String getJobId(){<br />

return jobid;<br />

}<br />

public String getMachine(){<br />

return machine;<br />

}<br />

public String getStatus(){<br />

return status;<br />

}<br />

public int getExectime(){<br />

return exectime;<br />

}<br />

public void updateJobId(String jobid ){<br />

this.jobid = jobid;<br />

}<br />

public void updateMachine(String machine ){<br />

this.machine = machine;<br />

}<br />

public void updateStatus(String status){<br />

this.status = status;<br />

}<br />

public String toString() {<br />

return jobname + "\t" + machine + "\t" + status + "\t" + exectime;<br />

}<br />

}//class Job
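`Jobs` replaces the whole `Job` object on every update, although `Job` already offers `updateJobId`, `updateMachine` and `updateStatus`. A possible simplification, shown only as a sketch, keeps a typed list and changes the stored record in place. `JobRecord` and `JobTable` are hypothetical names, and the record carries fewer fields than the `Job` class of the listing.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, reduced stand-in for the Job class of the listing.
final class JobRecord {
    final String name;
    String jobId = "";
    String machine = "";
    String status = "";

    JobRecord(String name) {
        this.name = name;
    }

    @Override
    public String toString() {
        return name + "\t" + machine + "\t" + status;
    }
}

// Hypothetical job table that updates records in place.
final class JobTable {
    private final List<JobRecord> jobs = new ArrayList<>();

    /** Adds a job and returns its index. */
    int add(String name) {
        jobs.add(new JobRecord(name));
        return jobs.size() - 1;
    }

    void updateJobId(int i, String jobId) { jobs.get(i).jobId = jobId; }

    void updateMachine(int i, String machine) { jobs.get(i).machine = machine; }

    void updateStatus(int i, String status) { jobs.get(i).status = status; }

    JobRecord get(int i) { return jobs.get(i); }

    int size() { return jobs.size(); }

    public static void main(String[] args) {
        JobTable table = new JobTable();
        int i = table.add("centralizedscheduling");
        table.updateMachine(i, "m2.sched.grid.com");
        table.updateStatus(i, "ACTIVE");
        System.out.println(table.get(i));
    }
}
```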



BIOGRAPHY<br />

Name : Mr. Nguyen Cong Danh<br />

Thesis Title : Course Scheduling in Multiple Faculties Using a Grid Computing<br />

Environment<br />

Major Field : Information Technology<br />

Biography<br />

I graduated with a bachelor’s degree in Computer Science from Cantho<br />

University (Vietnam) in 2000.<br />

My contact address is 1 Ly Tu Trong street, Ninh Kieu district, Cantho city,<br />

Vietnam. My e-mail address is ncdanh@cit.ctu.edu.vn.
