COURSE SCHEDULING IN MULTIPLE FACULTIES USING
A GRID COMPUTING ENVIRONMENT
MR. NGUYEN CONG DANH
A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS
FOR THE DEGREE OF MASTER OF SCIENCE (INFORMATION TECHNOLOGY)
GRADUATE COLLEGE
KING MONGKUT'S INSTITUTE OF TECHNOLOGY NORTH BANGKOK
ACADEMIC YEAR 2005
ISBN 974-19-0543-2
COPYRIGHT OF KING MONGKUT'S INSTITUTE OF TECHNOLOGY NORTH BANGKOK
Name : Mr. Nguyen Cong Danh
Thesis Title : Course Scheduling in Multiple Faculties Using a Grid
Computing Environment
Major Field : Information Technology
King Mongkut’s Institute of Technology North Bangkok
Thesis Advisor : Assistant Professor Dr. Yaowadee Temtanapat
Academic Year : 2005
Abstract
Course scheduling for multiple-faculty universities is a large and complex
problem. In these universities, each faculty wants its own timetable for its own
resources. However, lecturers, courses, rooms, and other resources can be shared
between faculties, so the data used for course scheduling needs to be shared
across the university. As a result, constraint conflicts in the timetable can occur
not only within each faculty but also across faculties, which makes the course
scheduling problem more difficult to solve. This study proposes a hybrid centralized
and decentralized approach to course scheduling that uses a genetic algorithm and a
grid computing environment. The genetic algorithm handles the hard and soft
constraints, while the grid computing environment serves as an infrastructure for
distributed and parallel computing. The results of this research indicate that the
proposed system can satisfy most of the required constraints and that grid computing
can significantly improve the computing performance of the whole system.
(Total 145 pages)
___________________________________________________________Chairperson
Name : Mr. Nguyen Cong Danh
Thesis Title : Course Scheduling for Multiple-Faculty Universities Using a
Grid Computing Environment
Major Field : Information Technology
King Mongkut’s Institute of Technology North Bangkok
Thesis Advisor : Assistant Professor Dr. Yaowadee Temtanapat
Academic Year : 2548 (Buddhist Era)
Abstract (Thai version)
Course scheduling for universities with multiple faculties is a large and complex
problem. In these universities, each faculty wants its own timetable that uses the
resources it holds. However, lecturers, courses, rooms, and other resources can still
be shared, so the data for course scheduling also has to be shared. As a result,
constraint conflicts arise not only within a faculty's timetable but also between
faculties, which makes the course scheduling problem in these universities even more
difficult. In this study we therefore propose an approach that combines centralized
and decentralized scheduling, using a genetic algorithm together with a grid computing
environment. The genetic algorithm is used to resolve the hard constraints and the
soft constraints, while the grid computing environment serves as the basis for
distributed and parallel processing. The results of the research indicate that the
proposed system can resolve most of the constraints and that grid processing can
noticeably improve the processing performance of the whole system.
(The thesis has a total of 145 pages)
_______________________________Chairperson of the Thesis Advisory Committee
ACKNOWLEDGEMENTS
First and foremost, I would like to thank Assistant Professor Dr. Yaowadee
Temtanapat for her support and encouragement throughout my time at King
Mongkut’s Institute of Technology North Bangkok (KMITNB). I deeply appreciate
not only her intelligence, knowledge, and willingness to provide guidance for my
thesis, but also her sense of humor and her enthusiasm.
Grateful acknowledgements are addressed to Assistant Professor Dr. Utomporn
Phalavonk, Assistant Professor Dr. Phayung Meesad, Dr. Gareth Clayton, and the other
members of the program committee for their valuable and constructive comments on
this thesis.
I wish to express my gratitude to all teachers and staff at KMITNB for their
knowledge, encouragement, and support during my study.
Thanks to my friends and fellow graduate students for their encouragement. They
also made my time at KMITNB and in Thailand an enjoyable experience.
My most sincere thanks go to my parents, who have always believed in me and
encouraged me over the past two years.
Last but certainly not least, I am especially indebted to my scholarship provider,
“DTEC”, for the financial support that gave me the opportunity to study at KMITNB.
Nguyen Cong Danh
TABLE OF CONTENTS
Abstract (in English)
Abstract (in Thai)
Acknowledgements
List of Tables
List of Figures
Chapter 1. Introduction
    1.1 Problem Statement and Background
    1.2 The Objectives of the Study
    1.3 The Scope of the Study
    1.4 The Utilizations of the Study
Chapter 2. Literature Review
    2.1 The Course Scheduling Problems
    2.2 The Related Works on Course Scheduling Problems
    2.3 Genetic Algorithms
    2.4 Grid Computing
    2.5 Summary
Chapter 3. Methodology
    3.1 System Development
    3.2 Problem Definition
    3.3 The System Boundary
    3.4 The Proposed Course Scheduling System
    3.5 The Database Design
    3.6 The Proposed Genetic Algorithm
    3.7 The System for Experiment
    3.8 The Grid Components
Chapter 4. Experimental Results
    4.1 The Data for the Experiments
    4.2 The Experiments and Discussions
    4.3 The Sample Results
Chapter 5. Conclusion
    5.1 Conclusions
    5.2 Future Works
References
Appendix A
Appendix B
Appendix C
Appendix D
Appendix E
Biography
LIST OF TABLES
Table
2-1 Courses taught by a department
2-2 Teaching assignment
2-3 Sample timetable
2-4 Tentative list of tools for grid computing
4-1 Courses fulfilled by each class
4-2 Lecturer and classroom assignment
4-3 Timetable created by the centralized scheduling program
4-4 Timetable created by the decentralized scheduling program for Faculty of Engineering
4-5 Timetable created by the decentralized scheduling program for Faculty of Science
A-1 Faculty
A-2 Department
A-3 Lecturer
A-4 Busy Time
A-5 Building
A-6 Classroom
A-7 Classroom group
A-8 Department controls classroom
A-9 Course
A-10 Program
A-11 Curriculum
A-12 Class
A-13 Course section
A-14 Timetable
B-1 Host names, IP addressing, and software
B-2 Group, user ID and password
B-3 Distinguished name and passphrase
LIST OF FIGURES
Figure
1-1 Shared lecturers, courses, and classrooms
1-2 Outline of the basic genetic algorithm
1-3 Sample timetable for a classroom
2-1 Graph of 12 events
2-2 Graph after coloring
2-3 Local optimal problem
2-4 Simulated annealing algorithm
2-5 Tabu search algorithm
2-6 Multi agent system
2-7 Encoding chromosome
2-8 Example of crossover
2-9 Example of mutation
2-10 Roulette wheel selection
2-11 Rank selection
2-12 Application consists of jobs: B, C, D, and E executed in parallel
2-13 Application consist of jobs that are networked
2-14 Components of Globus Toolkit 2.2
2-15 Simple LDAP configuration
2-16 Grid components: a high-level perspective
3-1 Shared classrooms in a multiple faculty university
3-2 Use case diagram of the course scheduling system
3-3 Proposed system
3-4 System architecture
3-5 Entity relation diagram
3-6 High level representation of the proposed genetic algorithm
3-7 Sub-timetable of a classroom
3-8 Chromosome
3-9 Population
3-10 Creating constraint data
3-11 Algorithm for initializing a random population
3-12 Pseudo code for creating a random chromosome
3-13 Pseudo code for checking small classroom conflicts
3-14 Pseudo code for checking lecturer’s busy time
3-15 Pseudo code for detecting conflicts about preferable times
3-16 Pseudo code for checking conflicts about double scheduled lecturers
3-17 Pseudo code for checking conflicts about double scheduled classes
3-18 Pseudo code for checking conflicts about double scheduled courses
3-19 Crossover
3-20 Pseudo code for crossover
3-21 Mutation
3-22 Pseudo code for mutating a chromosome
3-23 Hardware and software for each machine
3-24 MDS configuration
3-25 Working with a broker
3-26 Centralized scheduling
3-27 Job scheduler for the grid computing environment
3-28 Overview of GRAM and GASS
4-1 The average fitness value of hard constraints vs various weights
4-2 The average fitness value of soft constraints vs various weights
4-3 The average execution time for a resultant solution vs population sizes
4-4 The GA with various mutation rates
4-5 The execution time versus various models
4-6 Parallel execution versus serial execution
C-1 Visual-grid-proxy-init
C-2 Service configuration
C-3 Result in the web browser
CHAPTER 1
INTRODUCTION
1.1 Problem Statement and Background
1.1.1 Problem Statement
Course scheduling problems are very common but very difficult to solve in
practice. They are constraint optimization problems and are NP-hard: they concern
the allocation, subject to constraints, of given resources to objects in space and
time in such a way as to satisfy, as nearly as possible, a set of desirable
objectives [1, 2, 3]. Courses are scheduled to times and classrooms so that lecturers
can teach and students can attend these courses without any conflicts. A large body
of research has been carried out on these problems [1, 2, 3]. However, most of it has
focused on universities without a separation of resources between faculties. Course
scheduling for a multiple-faculty university still needs more research [4, 5].
[Figure: Faculty 1, Faculty 2, ..., Faculty n, each with its own lecturers, classrooms, courses, and timetable, connected through shared lecturers, courses, and classrooms]
FIGURE 1-1 Shared lecturers, courses, and classrooms
Course scheduling becomes more complex in a multiple-faculty university, where
each faculty has its own resources such as lecturers, courses, and classrooms, as
illustrated in Figure 1-1. Moreover, these resources can be shared between faculties.
Lecturers working in one faculty can teach courses of other faculties. Courses can be
attended by students who come from different faculties. Classrooms are sometimes
shared between faculties. Each faculty needs its own timetable for its own resources.
As a result, many problems related to the shared resources still exist in course
scheduling.
Course scheduling itself involves a large number of conflicts and needs a large
amount of processing time. For course scheduling across multiple faculties, the data
used for scheduling also needs to be collected and shared across the faculties. This
study applies a hybrid centralized and decentralized approach, a genetic algorithm,
and a grid computing environment to the course scheduling problem in multiple-faculty
universities. The proposed approach and the genetic algorithm are used to tackle this
NP-hard problem. In addition, the grid computing environment is used as the
infrastructure for distributed and parallel computing.
1.1.2 Background
The genetic algorithm (GA) is a global search optimization algorithm that
explores many points of the search space in parallel. While searching for solutions,
the GA uses a fitness function that affects the direction of the search [6]. The GA
evolves the population by using genetic operators such as selection, crossover, and
mutation. The outline of the basic GA is presented in Figure 1-2.
1 [Start] Generate random population of n chromosomes.
2 [Fitness] Evaluate the fitness f(x) of each chromosome x in the population.
3 [New population] Create a new population by repeating following steps until the new population is complete.
  3.1 [Selection] Select two parent chromosomes from a population according to their fitness (the better fitness, the bigger chance to be selected).
  3.2 [Crossover] With a crossover rate cross over the parents to form new offspring (children). If no crossover was performed, offspring is the exact copy of parents.
  3.3 [Mutation] With a mutation rate mutate new offspring at each locus (position in chromosome).
  3.4 [Accepting] Place new offspring in the new population.
4 [Replace] Use new generated population for a further run of the algorithm.
5 [Test] If the end condition is satisfied, stop, and return the best solution in current population.
6 [Loop] Go to step 2.
FIGURE 1-2 Outline of the basic genetic algorithm [6]
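The steps of Figure 1-2 can be illustrated with a minimal Python sketch. It is not the GA proposed in this thesis: the bit-string chromosome, the one-point crossover, the default rates, the fixed number of generations used as the end condition, and the name `run_ga` are all assumptions made for illustration. The sketch also remembers the best chromosome seen so far, which is a small addition to the outline.

```python
import random

def run_ga(fitness, n_genes, pop_size=20, crossover_rate=0.8,
           mutation_rate=0.01, generations=50, seed=0):
    """Basic GA over bit strings; fitness must return a non-negative number."""
    rng = random.Random(seed)
    # Step 1 [Start]: random population of pop_size chromosomes
    pop = [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):  # Steps 5-6: fixed generation budget (assumed end condition)
        # Step 2 [Fitness]: the small offset keeps every selection weight positive
        weights = [fitness(c) + 1e-9 for c in pop]
        new_pop = []
        while len(new_pop) < pop_size:
            # Step 3.1 [Selection]: fitness-proportional choice of two parents
            p1, p2 = rng.choices(pop, weights=weights, k=2)
            c1, c2 = p1[:], p2[:]
            # Step 3.2 [Crossover]: one-point crossover with the given rate
            if rng.random() < crossover_rate:
                cut = rng.randrange(1, n_genes)
                c1, c2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            for child in (c1, c2):
                # Step 3.3 [Mutation]: flip each locus with the mutation rate
                for i in range(n_genes):
                    if rng.random() < mutation_rate:
                        child[i] ^= 1
                # Step 3.4 [Accepting]
                new_pop.append(child)
        # Step 4 [Replace]
        pop = new_pop[:pop_size]
        best = max(pop + [best], key=fitness)
    return best

# Toy usage: maximize the number of 1-bits in a 16-bit chromosome.
solution = run_ga(sum, n_genes=16)
```

For a timetabling problem, the chromosome would encode course placements and the fitness would be derived from the constraint conflicts rather than from a bit count.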
The GA is based on the principle of survival of the fittest members of the
population to produce the solution. Individuals selected according to their fitness
in the problem domain form the next set of candidate solutions. The GA is an
iterative process that is repeated until the convergence criterion is satisfied.
Grid computing, most simply stated, is distributed computing. The goal is to
create the illusion of a simple yet large and powerful self-managing virtual computer
out of a large collection of connected heterogeneous systems sharing various
combinations of resources [7].
Not all applications are suitable for grid computing. We need to consider whether
an application can run in a grid environment, where resources are dynamically
allocated based on actual needs. Normally, an application consists of jobs that can
be executed in parallel, in series, or as a network of dependent jobs. If an
application consists of several jobs that can be executed in parallel, a grid may be
very suitable for effective execution on dedicated nodes, especially when there is no
or only a very limited exchange of data among the jobs [8].
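The case of independent jobs can be illustrated with a small Python sketch. It uses a local thread pool from the standard library as a stand-in for grid nodes and does not use any Globus Toolkit interface. The job function `schedule_faculty` and its placeholder result are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def schedule_faculty(faculty):
    """Hypothetical job: stands in for one faculty's scheduling run."""
    return faculty, f"timetable for {faculty}"

def run_parallel(faculties, max_workers=4):
    # The jobs exchange no data, so they can be dispatched side by side
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(schedule_faculty, faculties))

results = run_parallel(["Engineering", "Science"])
```

Because no job reads another job's output, the order in which the workers finish does not affect the collected results.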
1.2 The Objectives of the Study
The objectives of this study are as follows:
1.2.1 To provide a system that helps multiple-faculty universities solve their
course scheduling problems.
1.2.2 To investigate the use of the proposed GA and the grid computing
environment for the course scheduling problem in multiple-faculty universities.
1.3 The Scope of the Study
The scope of this study is defined as follows:
1.3.1 The system must satisfy the following hard constraints:
1.3.1.1 Every course must be scheduled exactly once in a week.
1.3.1.2 For courses at each faculty, the days of the week are Monday,
Tuesday, Wednesday, Thursday, and Friday. In addition, 8 time-slots are used per day,
covering 08:00-12:00 and 13:00-17:00. No course is scheduled across the morning and
afternoon working sessions. Figure 1-3 presents a sample timetable for a classroom.
Classroom i
Time-slot Hour Mon Tue Wed Thu Fri
0 08:00-09:00 Course 1 Course 3 Course 15
1 09:00-10:00 Course 1 Course 4 Course 3 Course 15
2 10:00-11:00 Course 1 Course 4 Course 2 Course 15
3 11:00-12:00 Course 2 Course 15
4 13:00-14:00 Course 8 Course 5 Course 6 Course 7
5 14:00-15:00 Course 8 Course 5 Course 6 Course 7
6 15:00-16:00 Course 13 Course 5 Course 19 Course 7
7 16:00-17:00 Course 13 Course 19 Course 7
FIGURE 1-3 Sample timetable for a classroom
1.3.1.3 Neither a class nor a lecturer nor a classroom is assigned to more
than one course at the same time.
1.3.1.4 Each course must be booked into a classroom that is large enough to
hold the students of that course.
1.3.1.5 In each semester, each class of students studies a list of courses
from the curriculum. All these courses have to be scheduled at different times in
each week so that all students in that class can attend.
1.3.1.6 If a course is attended by students who come from different
classes, it has to be scheduled so that these students can attend this course and
their other courses without any time conflicts.
1.3.1.7 Each lecturer can teach courses in his/her own faculty and in other
faculties.
1.3.1.8 Lecturers can declare working sessions in a week during which they
are unavoidably busy. For instance, Dr. Tim cannot teach on Monday morning because of
a weekly meeting. Therefore, his courses must be scheduled at another time.
1.3.1.9 Each course must be booked into a classroom of a designated
classroom group.
1.3.2 The system tries to satisfy the following soft constraint as much as
possible:
The system avoids booking lecturers’ courses at times they do not want.
Unlike the hard constraint in section 1.3.1.8, which the system must satisfy, the
soft constraint is satisfied only as far as possible. Several violations of this soft
constraint in the resultant solution are acceptable.
All hard and soft constraints apply to all timetables in all faculties.
1.3.3 The Globus Toolkit 2.2 is used as middleware to implement the grid
computing environment [7, 8].
1.3.4 The efficiency of the proposed GA and the grid computing environment
will be evaluated and discussed with respect to the following:
1.3.4.1 The suitability of the proposed GA for the hard constraints
and soft constraints.
1.3.4.2 Performance measurement of using grid computing vs. not
using grid computing.
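To make some of the hard constraints in section 1.3.1 concrete, the following Python sketch counts violations of constraints 1.3.1.2 (no course across the morning and afternoon sessions), 1.3.1.3 (no double booking of a classroom, lecturer, or class), and 1.3.1.4 (classroom capacity). The `Booking` record, its field names, and the helper functions are assumptions made for illustration and are not the data model or the pseudo code of this thesis.

```python
from dataclasses import dataclass
from itertools import combinations

SLOTS_PER_DAY = 8   # time-slots 0-3 are the morning, 4-7 the afternoon
MORNING_END = 4     # first afternoon time-slot

@dataclass(frozen=True)
class Booking:
    course: str
    lecturer: str
    classes: frozenset  # classes of students attending the course
    room: str
    day: int            # 0 = Monday ... 4 = Friday
    start: int          # first time-slot
    length: int         # number of consecutive time-slots
    students: int

def crosses_sessions(b):
    """True if the booking runs past the day or straddles the lunch break."""
    end = b.start + b.length  # exclusive end slot
    return end > SLOTS_PER_DAY or (b.start < MORNING_END < end)

def overlaps(a, b):
    """True if two bookings share at least one time-slot on the same day."""
    return (a.day == b.day and a.start < b.start + b.length
            and b.start < a.start + a.length)

def count_hard_conflicts(bookings, capacity):
    conflicts = 0
    for b in bookings:
        conflicts += crosses_sessions(b)             # constraint 1.3.1.2
        conflicts += b.students > capacity[b.room]   # constraint 1.3.1.4
    for a, b in combinations(bookings, 2):
        if overlaps(a, b):                           # constraint 1.3.1.3
            conflicts += a.room == b.room
            conflicts += a.lecturer == b.lecturer
            conflicts += bool(a.classes & b.classes)
    return conflicts
```

A timetable is feasible with respect to these three constraints when the count is zero. The pairwise check is quadratic in the number of bookings, which is acceptable for a sketch but would likely be indexed by time-slot in a real system.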
1.4 The Utilizations of the Study
1.4.1 To provide a system that helps multiple-faculty universities resolve their
course scheduling problems.
1.4.2 To investigate the efficiency of applying a genetic algorithm and grid
computing to the course scheduling problem in a multiple-faculty university.
CHAPTER 2
LITERATURE REVIEW
In this chapter, course scheduling problems, related works, genetic algorithms,
and grid computing are reviewed. Section 2.1 describes the activities that prepare
the data for course scheduling. Section 2.2 describes the related works, including
existing research. Section 2.3 presents the basic knowledge about genetic algorithms.
Finally, section 2.4 presents knowledge about grid computing and the Globus
Toolkit 2.2.
2.1 The Course Scheduling Problems
Course scheduling is a part of the general scheduling problem. It deals with the
satisfactory allocation of resources over time to achieve an organization’s tasks. It
is a decision-making process with the intention of optimizing one or more objectives.
In any optimization problem, there are objectives, decisions to make, available
resources, and related constraints. In the course scheduling problem, the available
resources are lecturers, students, courses, classrooms, and time periods. A solution
must group these resources together to create a timetable that satisfies the
constraints. There are two types of constraints: hard constraints and soft
constraints. Hard constraints are conditions that must be satisfied, for example that
no two distinct courses can be held at the same time in the same classroom. Soft
constraints, however, may be violated but should be satisfied as much as possible,
for example that some lecturers dislike teaching at certain times.
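One common way to combine the two types of constraints in a single score is a weighted penalty, sketched below in Python. The weights `w_hard` and `w_soft`, the function names, and the booking tuples are hypothetical and serve only as an illustration.

```python
def soft_violations(bookings, disliked):
    """Count booked (lecturer, day, slot) triples that fall in a disliked time."""
    return sum((day, slot) in disliked.get(lecturer, set())
               for lecturer, day, slot in bookings)

def penalty(hard, soft, w_hard=10.0, w_soft=1.0):
    """Lower is better; 0 means that no constraint is violated."""
    return w_hard * hard + w_soft * soft

bookings = [("00020", "Mon", 0), ("00020", "Tue", 4), ("00012", "Mon", 0)]
disliked = {"00020": {("Mon", 0)}}
score = penalty(hard=0, soft=soft_violations(bookings, disliked))
```

With these assumed weights, a timetable with one hard violation scores worse than a timetable with five soft violations, which reflects the idea that soft constraints may be violated while hard constraints must not be.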
Course scheduling systems usually vary considerably from university to university,
because each university has its own set of hard and soft constraints as well as its
own management requirements. This section introduces the activities needed for a
basic course scheduling problem. A particular course scheduling system is introduced
in detail in chapter 3.
2.1.1 General Activities for Course Scheduling
Each university usually has a central course scheduling office with experienced
staff. In each department of the faculties, several staff members also have similar
responsibilities. The course scheduling activities need the cooperation of all of
these staff.
2.1.2 The Activities of Staffs in Departments of Each Faculty
Each department is responsible for teaching many courses. To prepare the data for
course scheduling, each department has to make a teaching plan. The department has to
know the list of courses and the corresponding classes that will study these courses.
The department then makes an assignment based on its own resources, such as lecturers
and classrooms. The lecturer resources are sometimes subject to change. For instance,
some lecturers are away for training, or feel bored if they teach the same course
every semester. Some courses sometimes need lecturers from other faculties. Table 2-1
shows an example of courses taught by a department.
TABLE 2-1 Courses taught by a department
Course   Class     Number of Students   Section   Lecturer   Classroom Group
CSC211   BSCS04A   30
CSC211   BSCS05B   35
CSC221   BSCS04A   30
CSC210   BSCS04A   30
CSC110   BSCS04A   30
CSC113   BSCS04A   30
CSC113   BSCS04B   35
In this case, a class is a group of students who study the same program and have
the same enrolment year. A classroom group is a group of classrooms that have the
same function. A course is scheduled into a classroom of a designated classroom
group. Each department knows how many students will study a particular course. This
helps the department separate the courses into a suitable number of sections. A
section with too many students usually makes it difficult for a lecturer to teach
effectively. However, in some cases, if the department does not have enough
classrooms or lecturers, a section with a large number of students is acceptable.
Finally, an assignment is created for each department, as shown in Table 2-2.
TABLE 2-2 Teaching assignment
Course   Class     Number of Students   Section   Lecturer   Classroom Group
CSC211   BSCS04A   30                   1         00020      CSCCOMLB
CSC211   BSCS05B   35                   2         00020      CSCCOMLB
CSC221   BSCS04A   30                   1         00012      CSCLECRM
CSC210   BSCS04A   30                   1         00012      CSCLECRM
CSC110   BSCS04A   30                   1         00015      CSCLECRM
CSC113   BSCS04A   30                   1         00023      CSCCOMLB
CSC113   BSCS04B   35                   1         00023      CSCCOMLB
In Table 2-2, course CSC211 is studied by two different classes, BSCS04A and
BSCS05B, and it is divided into two distinct sections, 1 and 2. Course CSC113, on the
other hand, is also studied by two different classes, BSCS04A and BSCS04B, but both
classes are combined into the same section. CSC211 and CSC113 use classrooms in
group CSCCOMLB, whereas CSC221, CSC210, and CSC110 use classrooms in group
CSCLECRM.
2.1.3 Activities of Staffs at the Central Course Scheduling Office
After the central course scheduling office receives all data from the departments,
it runs the course scheduling system to create a timetable. Booking sections of
courses into time-slots in the timetable is a hard job. Its complexity depends on the
complexity of the constraints and rules of each university. Table 2-3 presents a
sample timetable.
The timetable has to satisfy the constraints. Lecturers who teach several sections
have to be scheduled so that they can teach their sections without any time conflict.
One classroom cannot hold more than one section at the same time. When a class
studies many different courses, these courses also have to be scheduled at different
times. The other constraints must be satisfied as well.
TABLE 2-3 Sample timetable
Course   Section   Time          Day   Classroom   Lecturer
CSC211   1         13:00-16:00   W     B304A01     00020
CSC211   2         8:00-11:00    W     B304A01     00020
CSC221   1         10:00-12:00   T     B304A05     00012
CSC210   1         13:00-16:00   M     B304A02     00012
CSC110   1         9:00-12:00    F     B304A02     00015
CSC113   1         13:00-16:00   T     B304A05     00023
2.2 The Related Works on Course Scheduling Problems
Course scheduling is a multi-dimensional NP-complete problem. It has generated
hundreds of papers, and thousands of researchers have attempted to solve it. In this
section, we discuss some of the primary approaches that have been applied to general
scheduling problems for courses and exams. In practice, the main ideas used for
course scheduling can be applied to exam scheduling and vice versa. The approaches
can be divided into four groups: sequential methods, cluster methods, constraint
based methods, and meta-heuristic methods [9].
2.2.1 Sequential Methods
Sequential methods order the events for scheduling using heuristics (often graph
coloring heuristics). They assign the ordered events to valid time periods so that no
events in a period are in conflict with each other, i.e. two events that require the
same resource are not scheduled in the same time period [10].
The graph coloring approach usually represents events as vertices, with an edge
between two vertices whenever the two respective events conflict in some way. Graph
coloring is the process of allocating a color to each vertex so that no two adjacent
(conflicting) vertices have the same color.
The vertices represent the courses, and the edges connect courses that conflict
with each other. For instance, two courses are in conflict if there is a student who
would have to be in both courses at the same time. Coloring the graph then assigns
the courses to appropriate periods such that conflicts are avoided [11].
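A simple greedy heuristic for this coloring can be sketched in Python as follows. The largest-degree-first ordering and the function name `greedy_coloring` are assumptions chosen for illustration; the cited works may use other orderings.

```python
def greedy_coloring(conflicts):
    """conflicts maps each event to the set of events it conflicts with.

    Returns a dict that maps each event to a period (color) index.
    """
    # Heuristic ordering: events with the most conflicts are placed first
    order = sorted(conflicts, key=lambda v: len(conflicts[v]), reverse=True)
    period = {}
    for v in order:
        # Periods already taken by conflicting events
        used = {period[u] for u in conflicts[v] if u in period}
        p = 0
        while p in used:
            p += 1
        period[v] = p  # smallest period not used by any neighbour
    return period

conflicts = {"A": {"B", "C", "D"}, "B": {"A", "C"},
             "C": {"A", "B"}, "D": {"A"}}
periods = greedy_coloring(conflicts)
```

The greedy result is always conflict-free, but it does not necessarily use the minimum number of periods.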
FIGURE 2-1 Graph of 12 events
The final result of coloring can be presented as a three-color graph (the colors
are denoted by three different shapes), shown in Figure 2-2.
FIGURE 2-2 Graph after coloring
This result means that the timetable can be constructed in three periods, one
period per color. For larger timetables or graphs, finding such a minimal coloring is
much less likely to be feasible, since the graph coloring problem is NP-complete.
Many studies therefore use a heuristic algorithm to find a reasonable coloring, if
not an optimal one [12-13].
2.2.2 Cluster Methods
Cluster methods split the set of events into groups that are conflict-free and
then assign the groups to time periods so as to fulfill the other constraints imposed
on the scheduling problem [14]. This technique can be applied to schedule either
courses or exams. The multiphase exam scheduling package described by Arani et al.
consists of three phases [15]. In the first phase, clusters of exams are formed with
the aim of minimizing the number of students with simultaneous exams. In the second
phase, these clusters are assigned to exam days while minimizing the number of
students with two or more exams per day. Finally, the exam days and clusters are
arranged to minimize the number of students with consecutive exams.
The main drawback of these approaches is that the clusters of events are formed
and fixed at the beginning of the algorithm, which may result in a poor-quality
timetable.
2.2.3 Constraint Based Methods
A constraint satisfaction problem (CSP) can be expressed in the following form.
Given a set of variables, a set of possible values that can be assigned to each
variable, and a list of constraints, the task is to find values of the variables that
satisfy every constraint. For example, given $x = \{x_1, x_2, x_3\}$ with possible
values of $x_1$, $x_2$, and $x_3$ in $[0..100]$, find $x_1$, $x_2$, and $x_3$ so that
they satisfy the constraints $x_1 \neq x_2$, $2x_1 = 10x_2 + x_3$, and
$x_1 x_2 < x_3$.
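The example CSP above is small enough to solve by direct enumeration, as the following Python sketch shows. It assumes integer values and reads the third constraint as a product of the first two variables; the function name `solve` is introduced only for illustration.

```python
from itertools import product

def solve():
    """Return the first (x1, x2, x3) in 0..100 that satisfies all constraints."""
    domain = range(0, 101)
    for x1, x2 in product(domain, repeat=2):
        if x1 == x2:                       # constraint: x1 != x2
            continue
        x3 = 2 * x1 - 10 * x2              # forced by 2*x1 = 10*x2 + x3
        if x3 in domain and x1 * x2 < x3:  # domain check and x1*x2 < x3
            return x1, x2, x3
    return None

print(solve())  # prints (1, 0, 2)
```

Using the equality constraint to compute the third variable removes one loop. This is a very small instance of the pruning that constraint solvers perform systematically.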
Constraint based approaches model a course scheduling problem as a set of
variables (i.e. courses) to which values (i.e. resources such as classrooms and time
periods) have to be assigned so as to satisfy a number of constraints (i.e. classroom
sizes and contiguous periods) [16-18].
Constraint Logic Programming (CLP) is usually used for CSPs. A labeling strategy
dictates the order in which the search space is traversed, which is vital for an
effective search. Two orderings matter: the order in which the variables are
instantiated (i.e. courses placed) and the order in which the values (i.e. times and
classrooms) are assigned. Programming languages such as PROLOG, LISP, C, and C++ can
be used for CLP.
13<br />
Gueret et al. have implemented a lecture scheduling system in CHIP called<br />
FELIAC [19]. CHIP is a Constraint Logic Programming language based on Prolog,<br />
which provides several types of constraints. CHIP’s new “cumulative” constraints<br />
limit the amount of a resource which can be used at any time, and Gueret et al. use<br />
this to implement the classroom capacity constraint. Longest courses are scheduled<br />
first in the day which has the shortest total length of clashing lectures. Relaxation of<br />
constraints is essential <strong>for</strong> highly constrained CSPs of the course scheduling. (A<br />
problem in which constraints may be relaxed is called a dynamic CSP.) For each<br />
failed assignment, FELIAC stores a “justification”, which identifies the constraints<br />
which the assignment violated. These justifications are used to undo the effects of a<br />
constraint when it is relaxed.<br />
Using the CLP <strong>for</strong> the course scheduling usually brings advantages such as<br />
short programs and fast execution time.<br />
2.2.4 Meta-heuristic Methods<br />
Over the last two decades a variety of meta-heuristic approaches such as<br />
simulated annealing, tabu search, <strong>genetic</strong> <strong>algorithm</strong>s, and hybrid approaches have<br />
been investigated <strong>for</strong> the course scheduling problem. Meta-heuristic methods begin<br />
with one or more initial solutions and employ search strategies that try to avoid local<br />
optima. All of these search <strong>algorithm</strong>s can produce high quality solutions but often<br />
have a considerable computational cost [20-25].<br />
FIGURE 2-3 Local optimal problem
2.2.4.1 Simulated Annealing<br />
Simulated annealing (SA) is a Monte-Carlo technique which can be used to find<br />
solutions <strong>for</strong> optimization problems. The technique simulates the cooling of a<br />
collection of hot vibrating atoms.<br />
The approach comprises the following:<br />
• A cost function E that associates an energy with the state of the system.<br />
• A “temperature” T that decreases slowly.<br />
• Various ways to change the state of the system.<br />
Figure 2-4 presents the SA <strong>algorithm</strong>.<br />
1. Generate an initial timetable s.<br />
2. Set the initial best timetable s* = s.<br />
3. Compute cost of s: C(s).<br />
4. Compute initial temperature $T_0$.<br />
5. Set the temperature $T = T_0$.<br />
6. While stop criterion is not satisfied do:<br />
a. Repeat Markov chain length (M) times:<br />
i. Select a random neighbor s’ to the current timetable (s’ ∈ Ns).<br />
ii. Set Δ(C) = C(s’) − C(s).<br />
iii. If (Δ(C) ≤ 0 {downhill move}):<br />
• Set s = s’.<br />
• If C(s) < C(s*) then set s* = s.<br />
iv. If (Δ(C) > 0 {uphill move}):<br />
• Choose a random number r uniformly from [0, 1].<br />
• If $r < e^{-\Delta(C)/T}$ then set s = s’.<br />
b. Reduce (or update) temperature T.<br />
7. Return the timetable s*.<br />
FIGURE 2-4 Simulated annealing <strong>algorithm</strong><br />
An uphill move would increase the cost by Δ(C) at the current temperature T. Also, s is the current schedule<br />
and s’ is a neighboring schedule obtained from the current neighborhood space (Ns)<br />
by swapping two courses in time and/or space.<br />
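The steps of Figure 2-4 can be sketched in Python as follows. This is an illustrative sketch only: the toy conflict list, the geometric cooling factor alpha, the stop criterion, and the neighborhood (moving one course to a random slot instead of swapping two courses) are assumptions made for the example, not values taken from the cited works.<br />

```python
import math
import random

# Hypothetical toy instance: pairs of courses that must not share a time slot.
CONFLICTS = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
N_SLOTS = 3


def clash_cost(timetable):
    """C(s): number of conflicting course pairs placed in the same slot."""
    return sum(1 for a, b in CONFLICTS if timetable[a] == timetable[b])


def random_neighbor(timetable, rng):
    """s': move one randomly chosen course to a random slot (assumed neighborhood)."""
    new = list(timetable)
    new[rng.randrange(len(new))] = rng.randrange(N_SLOTS)
    return tuple(new)


def simulated_annealing(initial, cost, neighbor, t0=10.0, alpha=0.9,
                        chain_length=50, t_min=1e-3, rng=None):
    rng = rng or random.Random()
    s, c_s = initial, cost(initial)
    best, c_best = s, c_s
    t = t0
    # Assumed stop criterion: temperature floor reached or no violations left.
    while t > t_min and c_best > 0:
        for _ in range(chain_length):          # Markov chain of length M
            s_new = neighbor(s, rng)
            delta = cost(s_new) - c_s
            # Downhill moves are always accepted; uphill moves with
            # probability exp(-delta / T).
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                s, c_s = s_new, c_s + delta
                if c_s < c_best:
                    best, c_best = s, c_s
        t *= alpha                              # geometric cooling (assumption)
    return best, c_best


initial = (0, 0, 0, 0)
best, best_cost = simulated_annealing(initial, clash_cost, random_neighbor,
                                      rng=random.Random(1))
print(best, best_cost)
```

Because the best timetable s* is stored separately from the current one, accepting uphill moves never loses the best solution found so far.<br />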
When the atoms are at a high temperature they are free to move around, and<br />
tend to move with random displacements. However, as the mass cools the interparticle<br />
bonds <strong>for</strong>ce the atoms together. When the mass is cool, no movement is<br />
possible, and the configuration is frozen. If the mass is cooled quickly then the chance of<br />
obtaining a low cost solution is lower than if it is cooled slowly (or annealed). At any<br />
given temperature a new configuration of atoms is accepted if the system energy is<br />
lowered. However, if the energy is higher, then the configuration is accepted only if<br />
the probability of such an increase is lower than that expected at the given<br />
temperature [26-27].<br />
The SA algorithm has both advantages and disadvantages compared to other<br />
global optimization techniques. It is an extremely popular method and appears<br />
competitive with many of the best heuristics in solving large problems such as course<br />
scheduling and job scheduling. However, it has two drawbacks: it can become trapped in<br />
local minima, and it can take too long to find a reasonable solution. In order to<br />
overcome these drawbacks, many recent studies combine SA with other<br />
heuristics such as genetic algorithms or implement SA as a parallel algorithm.<br />
The main aim is to avoid local minima traps and/or to achieve faster convergence [28-29].<br />
2.2.4.2 Tabu Search<br />
Tabu search is a meta-heuristic that guides a local heuristic search procedure to<br />
explore the solution space beyond local optimality. Tabu search has been applied<br />
successfully in a number of combinatorial optimization problems, in particular course<br />
scheduling [30-31].<br />
The basic concept of tabu search, as described by Glover, is: “A meta-heuristic<br />
superimposed on another heuristic. The overall approach is to avoid entrainment in<br />
cycles by <strong>for</strong>bidding or penalizing moves which take the solution, in the next iteration,<br />
to points in the solution space previously visited (“tabu”)” [32].<br />
Tabu search is a typical local search that explores its neighborhood for a<br />
transformed solution (s’) that can be obtained by a simple local change. Each<br />
transition from the current solution to a neighboring one is known as a move. In simple cases, every move is added<br />
to a tabu list that remembers the N most recent moves, where N is the size of the<br />
tabu list. The tabu list thus acts as a short-term memory (like a first-in-first-out queue).<br />
Any new move that is already in the tabu list is<br />
avoided, that is, it is tabu. This approach prevents recently tried moves from being repeated and<br />
keeps the search from cycling around a locally optimal area, thus driving the search<br />
in a different direction in the search space and giving a better chance of<br />
reaching the global optimum.<br />
The decision to move to a transformed solution state is usually based on the<br />
steepest descent or mildest ascent in the objective function value. With this strategy, a<br />
heuristic accepts a marginal and temporary deterioration in its objective function<br />
value in exchange for opportunities to escape from a local optimum and move towards<br />
the global optimum, as illustrated in Figure 2-3. Figure 2-5 presents the tabu search<br />
algorithm.<br />
1. Generate an initially random but feasible solution s.<br />
2. Repeat:<br />
i. Attempt to find an improved feasible solution s' with the <strong>objective</strong> function<br />
value z(s'), avoid using moves already stored in the tabu list.<br />
ii. Compute the moves from s to s’.<br />
iii. Update tabu list by adding the latest move so that it is set as a tabu <strong>for</strong> some subsequent<br />
moves.<br />
iv. If z(s') < z(s) + (mildest ascent tolerance) then<br />
per<strong>for</strong>m exchanges: s := s', z(s) := z(s')<br />
End if<br />
Until (no improved solution is found) or (a stopping criterion is met)<br />
FIGURE 2-5 Tabu search <strong>algorithm</strong><br />
The result z(s') is the best minimum found. Tabu search is not guaranteed to find the<br />
global minimum, but it stands a better chance than a gradient descent approach.<br />
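The loop of Figure 2-5 can be illustrated with a short Python sketch. The toy conflict list, the tabu list size, and the choice to store the reverse of each move (so that a course cannot be moved straight back to the slot it just left) are assumptions made for this example; the sketch always takes the best non-tabu neighbor, which corresponds to the steepest descent or mildest ascent rule described above.<br />

```python
from collections import deque

# Hypothetical toy instance: pairs of courses that must not share a time slot.
CONFLICTS = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]


def clash_cost(timetable):
    """z(s): number of conflicting course pairs placed in the same slot."""
    return sum(1 for a, b in CONFLICTS if timetable[a] == timetable[b])


def tabu_search(initial, cost, n_slots, tabu_size=5, max_iters=100):
    s = list(initial)
    best, best_cost = tuple(s), cost(s)
    tabu = deque(maxlen=tabu_size)   # FIFO short-term memory of (course, slot) moves
    for _ in range(max_iters):
        candidates = []
        for course in range(len(s)):
            for slot in range(n_slots):
                if slot == s[course] or (course, slot) in tabu:
                    continue         # skip non-moves and tabu moves
                trial = s[:]
                trial[course] = slot
                candidates.append((cost(trial), course, slot))
        if not candidates:
            break
        # Steepest descent or, if nothing improves, mildest ascent.
        c, course, slot = min(candidates)
        tabu.append((course, s[course]))  # forbid moving the course straight back
        s[course] = slot
        if c < best_cost:
            best, best_cost = tuple(s), c
        if best_cost == 0:           # assumed stop criterion: no violations left
            break
    return best, best_cost


best, best_cost = tabu_search((0, 0, 0, 0), clash_cost, n_slots=3)
print(best, best_cost)
```

Using a deque with a fixed maximum length gives the first-in-first-out behavior of the tabu list without any extra bookkeeping.<br />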
2.2.4.3 Genetic Algorithms<br />
The idea of <strong>genetic</strong> <strong>algorithm</strong>s is based on the evolutionary principle developed<br />
by Darwin [6]. A “population” of feasible timetables is maintained. The “fittest”<br />
timetables are selected to <strong>for</strong>m the basis of the next iteration, or “generation”, thus<br />
improving the overall fitness whilst maintaining diversity.
The outline of the basic <strong>genetic</strong> <strong>algorithm</strong> is presented in section 1.1.2.<br />
At present, a large number of studies have used GAs for course<br />
scheduling. The proposed GAs differ in how they represent<br />
chromosomes and populations, set the GA parameters (population size, crossover<br />
rate, and mutation rate), design the selection, crossover, and mutation strategies, and<br />
evaluate the fitness function.<br />
The chromosome represents a timetable that is a solution. It can be represented<br />
indirectly or directly. In the former, the timetable is usually encoded as a long bit string<br />
that stands for when and where each course takes place [33]. Thus, pairs of<br />
selected timetables may be “crossed over” by cutting and splicing the bit strings to<br />
create a new timetable. In the latter, the timetable is<br />
represented by a data structure such as a multi-dimensional array or a linked list.<br />
The indirect representation brings the advantages of short processing time and simple GA<br />
operations. However, it needs complex processing to convert between the bit string and the real timetable and to maintain<br />
the constraints. In contrast, the direct<br />
representation needs more processing time for GA operations, but it makes it easy to<br />
maintain a large number of constraints for a real timetable. More details of the GAs<br />
will be presented in section 2.3.<br />
2.2.4.4 Hybrid Approaches<br />
The above approaches have been shown to produce good solutions for<br />
course scheduling problems. However, as mentioned above, they usually need a long<br />
computational time. In order to overcome this problem, many researchers have used<br />
hybrid approaches.<br />
Tuan et al. have successfully combined constraint programming and simulated<br />
annealing for the problem of exam scheduling with real data sets [34]. The proposed<br />
algorithm consists of two phases. A constraint programming phase provides an<br />
initial solution, which is then improved by the simulated annealing phase. Tuan et<br />
al. have applied Kempe chains as the neighborhood structure, a special technique for<br />
determining the starting temperature $T_0$, and a mechanism that allows the user to define a<br />
certain period of time in which the algorithm should run. This mechanism<br />
not only helps to increase the efficiency of the SA algorithm but also makes simulated<br />
annealing experiments easier.<br />
Alkan et al. have developed a memetic algorithm (MA) by combining GAs<br />
with a local search technique, hill climbing [1]. This approach has achieved good<br />
computational performance. The idea is to create a hill<br />
climbing method for each type of constraint and to combine them under a single hill<br />
climbing method, denoted AHC. Starting from a high resolution, AHC selects a constraint<br />
type based hill climbing method by using a selection method that gives a higher chance<br />
to the operator of the constraint type causing more violations. There are three<br />
improvement strategies. First, the selected operator is invoked for the related type<br />
of constraints, producing a new individual. Second, if this attempt does not<br />
improve on the old individual, the new individual is ignored and, depending on the constraint<br />
type, a selected block of genes, possibly the one causing more violations than the other<br />
blocks, is corrected. Finally, if this attempt also fails to produce a<br />
better individual, a selected single gene in a block of genes of the old individual,<br />
possibly the one causing more violations, is corrected. If the fitness of an<br />
individual improves in any case, AHC is reapplied to it.<br />
Some other researchers have also used distributed and parallel computing<br />
models for the course scheduling problem. One of them is the Multi Agent System model,<br />
which has been applied to problems that are similar to the one in our study.<br />
The Multi Agent System (MAS) model has been introduced to the course<br />
scheduling problem by Kaplansky et al. [35]. The architecture is composed of a set of<br />
autonomous scheduling agents ($SA_i$) that solve the course scheduling for each<br />
department. Each agent has its own course scheduling problem and its own goals. The<br />
scheduling agents must coordinate these goals with the other agents in order to<br />
achieve a solution for the whole organization that yields a better result with respect to<br />
the global targets. To achieve a coherent and consistent global solution, the scheduling agents make<br />
use of a sophisticated negotiation protocol that always ends<br />
in an agreement (not ensured to be optimal). The main functionalities of this protocol<br />
are agent to agent relation definition, a mechanism to approve a chain of requests for<br />
change (RfC), and an electronic marketplace for bidding on preferred common timeslots.<br />
As shown in Figure 2-6, the scheduling agents first conduct a negotiation<br />
for the global timetable. Next, the room agent (RA) adds new constraints to the $SA_i$. The<br />
$SA_i$ solve the modified problem and send back a new timetable.<br />
FIGURE 2-6 Multi agent system<br />
2.3 Genetic Algorithms<br />
The <strong>genetic</strong> <strong>algorithm</strong>s are inspired by Darwin's theory of evolution. Simply<br />
said, problems are solved by an evolutionary process resulting in a best (fittest)<br />
solution - in other words, the solution is evolved.<br />
The algorithm begins with a set of solutions (represented by chromosomes) called a<br />
population. Solutions from one population are taken and used to form a new<br />
population. This is motivated by the hope that the new population will be better than<br />
the old one. Solutions which are used to form new solutions (offspring) are<br />
selected according to their fitness: the more suitable they are, the more chances they<br />
have to reproduce [6].<br />
The outline of the basic <strong>genetic</strong> <strong>algorithm</strong> is presented in section 1.1.2.<br />
2.3.1 Biological Background<br />
2.3.1.1 Chromosome<br />
All living organisms consist of cells. In each cell there is the same set of<br />
chromosomes. The chromosomes are strings of DNA and serve as a model <strong>for</strong> the<br />
whole organism. A chromosome consists of genes, blocks of DNA. Each gene<br />
encodes a particular protein. Basically, it can be said that each gene encodes a trait,<br />
<strong>for</strong> example color of eyes. Possible settings <strong>for</strong> a trait (e.g. blue, brown) are called<br />
alleles. Each gene has its own position in the chromosome. This position is called<br />
locus.
The complete set of genetic material (all chromosomes) is called the genome. A particular<br />
set of genes in the genome is called a genotype. The genotype, together with later development<br />
after birth, is the basis for the organism's phenotype, its physical and mental<br />
characteristics, such as eye color and intelligence.<br />
2.3.1.2 Reproduction<br />
During <strong>reproduction</strong>, recombination (or crossover) first occurs. Genes from<br />
parents combine to <strong>for</strong>m a whole new chromosome. The newly created offspring can<br />
then be mutated. Mutation means that the elements of DNA are slightly changed. These<br />
changes are mainly caused by errors in copying genes from parents.<br />
The fitness of an organism is measured by success of the organism in its life<br />
(survival).<br />
2.3.2 Operators of GA<br />
As presented in the outline of the basic <strong>genetic</strong> <strong>algorithm</strong>, the crossover and<br />
mutation are the most important parts of the <strong>genetic</strong> <strong>algorithm</strong>. The per<strong>for</strong>mance is<br />
influenced mainly by these two operators. Be<strong>for</strong>e we can explain more about<br />
crossover and mutation, more in<strong>for</strong>mation on chromosomes will be outlined.<br />
A chromosome should in some way contain in<strong>for</strong>mation about the solution that<br />
it represents. The most common way of encoding is a binary string, as shown in<br />
Figure 2-7.<br />
Chromosome 1 1101100100110110<br />
Chromosome 2 1101111000011110<br />
FIGURE 2-7 Encoding chromosome<br />
Each chromosome is represented by a binary string. Each bit in the string can<br />
represent some characteristics of the solution. Another possibility is that the whole<br />
string can represent a number. Of course, there are many other ways of encoding. The<br />
encoding depends mainly on the solved problem. For example, one can encode<br />
directly integer or real numbers. Sometimes it is useful to encode some permutations<br />
and so on.
2.3.2.1 Crossover<br />
After we have decided what encoding we will use, we can proceed to crossover<br />
operation. Crossover operates on selected genes from parent chromosomes and<br />
creates a new offspring. The simplest way of doing that is to choose at random some<br />
crossover point and copy everything be<strong>for</strong>e this point from the first parent and then<br />
copy everything after the crossover point from the other parent.<br />
Crossover can be illustrated as in Figure 2-8 (| is the crossover point).<br />
Chromosome 1 11011 | 00100110110<br />
Chromosome 2 11011 | 11000011110<br />
Offspring 1 11011 | 11000011110<br />
Offspring 2 11011 | 00100110110<br />
FIGURE 2-8 Example of crossover<br />
There are other ways to make a crossover. For example, we can choose more<br />
crossover points. Crossover can be quite complicated and depends mainly on the<br />
encoding of chromosomes. A specific crossover made <strong>for</strong> a specific problem can<br />
improve the per<strong>for</strong>mance of the <strong>genetic</strong> <strong>algorithm</strong>.<br />
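Single-point crossover on binary strings can be sketched in Python as follows. The function name and its optional point argument are hypothetical; fixing the crossover point at 5 reproduces the parents and offspring of Figure 2-8.<br />

```python
import random


def single_point_crossover(parent1, parent2, point=None, rng=random):
    """Swap the tails of two equal-length chromosomes after a crossover point."""
    if len(parent1) != len(parent2):
        raise ValueError("parents must have the same length")
    if point is None:
        # Choose a point strictly inside the string so both parents contribute.
        point = rng.randrange(1, len(parent1))
    child1 = parent1[:point] + parent2[point:]
    child2 = parent2[:point] + parent1[point:]
    return child1, child2


child1, child2 = single_point_crossover("1101100100110110",
                                        "1101111000011110", point=5)
print(child1)
print(child2)
```

When no point is given, the sketch picks one at random, which is the behavior described in the text.<br />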
2.3.2.2 Mutation<br />
After a crossover is performed, mutation takes place. Mutation is intended to<br />
prevent all solutions in the population from falling into a local optimum of the<br />
problem being solved. The mutation operation randomly changes the offspring resulting from crossover.<br />
In the case of binary encoding we can switch a few randomly chosen bits from 1 to 0 or<br />
from 0 to 1. Mutation can then be illustrated as in Figure 2-9.<br />
Original offspring 1 1101111000011110<br />
Original offspring 2 1101100100110110<br />
Mutated offspring 1 1100111000011110<br />
Mutated offspring 2 1101101100110110<br />
FIGURE 2-9 Example of mutation
The technique of mutation (as well as crossover) depends mainly on the<br />
encoding of chromosomes. For example, when we are encoding permutations,<br />
mutation could be per<strong>for</strong>med as an exchange of two genes.<br />
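Bit-flip mutation for the binary encoding can be sketched in Python as follows. The helper names and the per-bit mutation rate of 0.05 are assumptions chosen for illustration; flipping positions 3 and 6 (counted from zero) reproduces the two mutated offspring of Figure 2-9.<br />

```python
import random


def flip_bits(chromosome, positions):
    """Return a copy of the bit string with the given positions inverted."""
    bits = list(chromosome)
    for i in positions:
        bits[i] = "1" if bits[i] == "0" else "0"
    return "".join(bits)


def mutate(chromosome, rate=0.05, rng=random):
    """Flip each bit independently with probability `rate` (assumed scheme)."""
    positions = [i for i in range(len(chromosome)) if rng.random() < rate]
    return flip_bits(chromosome, positions)


mutated1 = flip_bits("1101111000011110", [3])
mutated2 = flip_bits("1101100100110110", [6])
print(mutated1)
print(mutated2)
```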
2.3.3 Parameters of GA<br />
2.3.3.1 Crossover and Mutation Rate<br />
There are two basic parameters of a GA: crossover rate and mutation rate.<br />
The crossover rate describes how often a crossover will be performed. If there is<br />
no crossover, offspring are exact copies of parents. If there is crossover, offspring are<br />
made from parts of both parents' chromosomes. If the crossover rate is 100%, then all<br />
offspring are made by crossover. If it is 0%, the whole new generation is made from exact<br />
copies of chromosomes from the old population. Crossover is made in the hope that new<br />
chromosomes will contain good parts of old chromosomes and therefore the new<br />
chromosomes will be better. However, it is good to let some part of the old population<br />
survive to the next generation.<br />
The mutation rate describes how often parts of a chromosome will be mutated. If<br />
there is no mutation, offspring are generated immediately after crossover (or directly<br />
copied) without any change. If mutation is performed, one or more parts of a<br />
chromosome are changed. If the mutation rate is 100%, the whole chromosome is changed; if<br />
it is 0%, nothing is changed. Mutation generally prevents the GA from falling into<br />
local extremes. Mutation should not occur very often, because the GA would then in effect<br />
turn into a random search.<br />
2.3.3.2 Other Parameters<br />
Another important parameter is population size. Population size describes<br />
how many chromosomes are in a population. If there are too few chromosomes, the<br />
GA has few possibilities to per<strong>for</strong>m crossover and only a small part of search space is<br />
explored. On the other hand, if there are too many chromosomes, the GA slows down.<br />
Research shows that after some limit (which depends mainly on encoding and the<br />
problem) it is not useful to use very large populations because it does not solve the<br />
problem faster than moderate sized populations.<br />
2.3.4 Methods of Selection<br />
As presented in the outline of the basic <strong>genetic</strong> <strong>algorithm</strong>, chromosomes are<br />
selected from the population to be parents <strong>for</strong> crossover. The problem is how to select
these chromosomes. According to Darwin's theory of evolution the best ones survive<br />
to create new offspring. There are many different methods which a GA can use to<br />
select the chromosomes to be copied over into the next generation, but listed below<br />
are some of the most common methods.<br />
2.3.4.1 Roulette Wheel Selection<br />
Parents are selected according to their fitness. The better the chromosomes are,<br />
the more chances to be selected they have. Imagine a roulette wheel where all the<br />
chromosomes in the population are placed. The size of the section in the roulette<br />
wheel is proportional to the value of the fitness function of every chromosome - the<br />
bigger the value is, the larger the section is. Figure 2-10 shows an example.<br />
FIGURE 2-10 Roulette wheel selection (wheel sectors: Chromosome 1, Chromosome 2, Chromosome 3, Chromosome 4)<br />
A marble is thrown on the roulette wheel and the chromosome where it stops is<br />
selected. Clearly, the chromosomes with bigger fitness value will be selected more<br />
times.<br />
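Roulette wheel selection can be sketched in Python as follows. The function name and the example fitness values are hypothetical. The cumulative fitness sums play the role of the sector boundaries on the wheel, and a uniform random number plays the role of the marble.<br />

```python
import random
from bisect import bisect_right
from itertools import accumulate


def roulette_wheel_select(population, fitnesses, rng=random):
    """Pick one individual with probability proportional to its fitness."""
    cumulative = list(accumulate(fitnesses))   # sector boundaries on the wheel
    total = cumulative[-1]
    if total <= 0:
        raise ValueError("total fitness must be positive")
    r = rng.random() * total                   # where the marble stops, in [0, total)
    return population[bisect_right(cumulative, r)]


population = ["Chromosome 1", "Chromosome 2", "Chromosome 3", "Chromosome 4"]
fitnesses = [1, 2, 3, 14]                      # assumed values for illustration
rng = random.Random(42)
picks = [roulette_wheel_select(population, fitnesses, rng) for _ in range(2000)]
print({name: picks.count(name) for name in population})
```

With these assumed values, Chromosome 4 holds 70% of the wheel and is therefore selected far more often than the others.<br />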
2.3.4.2 Rank Selection<br />
The previous type of selection has problems when there are big differences<br />
between the fitness values. For example, if the best chromosome fitness is 90% of the<br />
sum of all fitness then the other chromosomes will have very few chances to be<br />
selected.<br />
Rank selection ranks the population first and then every chromosome receives<br />
fitness value determined by this ranking, as shown in Figure 2-11. The worst will<br />
have the fitness 1, the second worst 2, etc, and the best will have fitness N.<br />
Now all the chromosomes have a chance to be selected. However this method<br />
can lead to slower convergence, because the best chromosomes do not differ so much<br />
from others.
Rank Chromosome<br />
1 Chromosome 1<br />
2 Chromosome 2<br />
3 Chromosome 4<br />
4 Chromosome 3<br />
FIGURE 2-11 Rank selection (wheel sectors: Chromosome 1 to Chromosome 4)<br />
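Rank selection can be sketched in Python as follows. The raw fitness values are assumed for illustration and are chosen so that the resulting ranks match the table in Figure 2-11 (Chromosome 3 is the best and Chromosome 4 the second best). The function names are hypothetical.<br />

```python
import random


def rank_fitness(fitnesses):
    """Replace raw fitness by rank: worst gets 1, best gets N."""
    order = sorted(range(len(fitnesses)), key=lambda i: fitnesses[i])
    ranks = [0] * len(fitnesses)
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    return ranks


def rank_select(population, fitnesses, rng=random):
    """Select one individual with probability proportional to its rank."""
    return rng.choices(population, weights=rank_fitness(fitnesses), k=1)[0]


population = ["Chromosome 1", "Chromosome 2", "Chromosome 3", "Chromosome 4"]
raw_fitness = [5, 10, 60, 25]          # assumed values for illustration
ranks = rank_fitness(raw_fitness)
print(ranks)
print(rank_select(population, raw_fitness, rng=random.Random(7)))
```

With raw fitness, Chromosome 3 would own 60% of the wheel; with ranks it owns 4/10, so the weaker chromosomes keep a realistic chance of being selected.<br />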
2.3.4.3 Steady-State Selection<br />
The steady-state selection works in the following way. In every generation a<br />
few good (with higher fitness) chromosomes are selected <strong>for</strong> creating new offspring.<br />
Then some bad (with lower fitness) chromosomes are removed and the new offspring<br />
is placed in their place. The rest of population survives to new generation.<br />
2.3.4.4 Tournament Selection<br />
Subgroups of chromosomes are chosen from a larger population, and members<br />
of each subgroup compete against each other. Only one chromosome from each<br />
subgroup is chosen to reproduce [36].<br />
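Tournament selection can be sketched in Python as follows. The function name, the subgroup size k, and the example population are assumptions made for illustration.<br />

```python
import random


def tournament_select(population, fitness, k=3, rng=random):
    """Draw k distinct competitors at random and return the fittest of them."""
    contestants = rng.sample(population, k)
    return max(contestants, key=fitness)


population = ["1101100100110110", "1101111000011110", "0000000000000001"]


def count_ones(chromosome):
    """Toy fitness used only for this example: the number of 1 bits."""
    return chromosome.count("1")


winner = tournament_select(population, count_ones, k=2, rng=random.Random(5))
print(winner)
```

A larger k increases the selection pressure, because weak chromosomes then rarely win their tournament.<br />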
2.3.4.5 Elitism Selection<br />
Elitism is the name of the method that first copies the best chromosome (or the few<br />
best chromosomes) to the new population. The rest of the population can be<br />
constructed by the methods described above. Elitism can rapidly increase the<br />
performance of the GA, because it prevents the loss of the best solution found.<br />
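Elitism can be sketched in Python as follows. The function name and the parameter n_elite are hypothetical; the offspring are assumed to have been produced already by one of the selection methods above together with crossover and mutation.<br />

```python
def apply_elitism(old_population, offspring, fitness, n_elite=1):
    """Copy the n_elite best old chromosomes, then fill up with offspring."""
    elite = sorted(old_population, key=fitness, reverse=True)[:n_elite]
    # Keep the population size constant by truncating the offspring list.
    return elite + offspring[:len(old_population) - n_elite]


old_population = ["1111", "0011", "0001", "0000"]
offspring = ["0101", "0110", "1000", "0010"]
new_population = apply_elitism(old_population, offspring,
                               fitness=lambda c: c.count("1"), n_elite=1)
print(new_population)
```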
2.4 Grid Computing<br />
Grid computing is a method for sharing computing and data resources. Grid<br />
computing is used for distributed systems that share resources over a local or wide<br />
area network. The specific focus that underlies grid computing is coordinated<br />
resource sharing in a <strong>multi</strong>-institutional environment [7-8]. It attempts to combine all<br />
types of resources, including supercomputers and clusters of machines, from <strong>multi</strong>ple<br />
institutions, into a resource that is more powerful than any single resource.<br />
This section will introduce grid computing in the following topics: the<br />
application considerations, the Globus Toolkit, the Globus Toolkit 2.2 and the grid<br />
components.
2.4.1 Application Considerations<br />
If an application consists of several jobs that can all be executed in parallel, a<br />
grid may be very suitable <strong>for</strong> effective execution on dedicated nodes, especially in the<br />
case when there is no or a very limited exchange of data among the jobs.<br />
From an initial job, a number of jobs are launched to execute on pre-selected or<br />
dynamically assigned nodes within the grid. Each job may receive a discrete set of<br />
data, and fulfills its computational task independently and delivers its output. The<br />
output is collected by a final job or stored in a defined data store, as shown in Figure<br />
2-12.<br />
FIGURE 2-12 Application consists of jobs: B, C, D, and E executed in parallel<br />
Many other applications consist of jobs that are executable in parallel but<br />
have interdependences between them. For example, as shown in Figure 2-13, jobs B and C<br />
can be launched simultaneously, but they heavily exchange data with each other. Job<br />
F cannot be launched before B and C have completed, whereas job E or D can be<br />
launched upon completion of B or C respectively. Finally, job G collects all<br />
output from the jobs D, E, and F, and its termination and results then represent the<br />
completion of the grid application.<br />
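The launch order implied by these dependencies can be worked out with a topological sort. The following Python sketch uses the standard graphlib module (Python 3.9 or later); the initial job A and the function name launch_waves are assumptions added for the example, and the sketch models only the ordering constraints, not the data exchange between B and C.<br />

```python
from graphlib import TopologicalSorter

# Each job maps to the set of jobs that must complete before it can start.
dependencies = {
    "B": {"A"},            # A is an assumed initial job that launches B and C
    "C": {"A"},
    "E": {"B"},            # E can start once B has completed
    "D": {"C"},            # D can start once C has completed
    "F": {"B", "C"},       # F needs both B and C
    "G": {"D", "E", "F"},  # G collects the output of D, E, and F
}


def launch_waves(deps):
    """Group jobs into waves; all jobs within a wave can run in parallel."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # jobs whose predecessors are all done
        waves.append(ready)
        sorter.done(*ready)
    return waves


waves = launch_waves(dependencies)
print(waves)
```

A job flow management service performs essentially this bookkeeping, launching each job as soon as the jobs it depends on have delivered their results.<br />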
For such applications, a possible approach is to do more analysis to determine<br />
how best to split the application into individual jobs, maximizing parallelism. It also<br />
adds more dependencies on the grid infrastructure services such as schedulers and<br />
brokers, but once that infrastructure is in place, the application can benefit from the<br />
flexibility and utilization of the virtual computing environment. The use of a job flow<br />
management service not only can handle the synchronization of the individual results,<br />
but also can create a loose coupling between the jobs, which avoids high inter-process<br />
communication and reduces the overheads in the grid [37].<br />
FIGURE 2-13 Application consists of jobs that are networked<br />
2.4.2 The Globus Toolkit<br />
In the most general case, grid resources are supposed to be geographically<br />
distributed and to be owned by different organizations, each with proprietary policies<br />
regarding security, resource allocation, plat<strong>for</strong>m maintenance, and so on. Such an<br />
environment depends strongly upon the construction of a robust infrastructure of<br />
fundamental services, able to smooth out mismatches between different machines,<br />
security policies, scheduling policies, operating systems, and plat<strong>for</strong>ms. Besides this,<br />
resource sharing must be highly controlled, with resource providers and consumers<br />
clearly defining what is shared, who is allowed to share, and the conditions under<br />
which sharing occurs. Furthermore, access to resources has to be carefully scheduled<br />
in order to extract the maximum per<strong>for</strong>mance from the available resources, and<br />
applications should have the possibility of tailoring their behavior dynamically, in<br />
order to cope with resource failure, a highly probable event in such a variegated<br />
context.<br />
All these requirements can be summarized by the need to allow transparent<br />
access to resources, as if they belonged to a single, unified “metacomputer.” There are<br />
many grid projects worldwide aimed at achieving this ambitious goal, shown in Table<br />
2-4. Globus Toolkit is one of the most promising: it is rapidly becoming the de facto<br />
standard grid middleware [39]. Globus Toolkit is a joint initiative of the University of
Southern California, the Argonne National Lab, and the University of Chicago. It<br />
provides an open-source set of services addressing fundamental grid issues, such as<br />
security, in<strong>for</strong>mation discovery, resource management, data management, and<br />
communication. Due to its flexibility and high interoperability with the most<br />
widespread technologies used <strong>for</strong> distributed and parallel computing, Globus Toolkit<br />
has been chosen <strong>for</strong> our problem.<br />
TABLE 2-4 Tentative list of tools for grid computing [37]<br />
GLOBUS: A bag of services giving basic software infrastructure for grid development: http://www.globus.org<br />
LEGION: An object-based project at the University of Virginia: http://legion.virginia.edu<br />
UNICORE: The UNiform Interface to COmputing Resources is a European grid computing effort: http://www.unicore.org<br />
NETSOLVE: A client/server system oriented to solve computational science problems: http://icl.cs.utk.edu/netsolve/<br />
CACTUS: An open-source problem-solving environment designed for parallel computing and collaborative software development: http://www.cactuscode.org<br />
The next section introduces Globus Toolkit 2.2, which will be used for our<br />
study.<br />
2.4.3 Globus Toolkit 2.2<br />
The Globus Toolkit 2.2 provides [7]:<br />
2.4.3.1 A set of basic facilities needed <strong>for</strong> grid computing, shown in Figure<br />
2-14.
FIGURE 2-14 Components of Globus Toolkit 2.2<br />
a) Security: Single sign-on, authentication, authorization, and<br />
secure data transfer.<br />
b) Resource Management provides support <strong>for</strong>:<br />
- Resource allocation.<br />
- Submitting jobs: Remotely running executable files and<br />
receiving results.<br />
- Managing job status and progress.<br />
c) Data Management provides a system to transfer files among<br />
machines in the grid and <strong>for</strong> the management of these transfers.<br />
d) In<strong>for</strong>mation Services includes directory services of available<br />
resources and their status. It provides support <strong>for</strong> collecting in<strong>for</strong>mation in the grid<br />
and <strong>for</strong> querying this in<strong>for</strong>mation, based on the Lightweight Directory Access<br />
Protocol (LDAP), shown in Figure 2-15.<br />
FIGURE 2-15 Simple LDAP configuration [7]
2.4.3.2 Application Programming Interfaces (APIs) to the above facilities.<br />
2.4.3.3 C bindings needed to build and compile programs.<br />
In addition to the above, which are considered the core of the toolkit, other<br />
components are also available that complement or build on top of these facilities. For<br />
instance, Globus provides a rapid development kit known as Commodity Grid (CoG),<br />
which supports technologies such as Java, Python, Web services, CORBA, and so on.<br />
2.4.4 Grid Components<br />
This section describes, at a high level, the primary components of the grid<br />
environment, shown in Figure 2-16. Depending on the grid design and its expected<br />
use, some of these components may or may not be required, and in some cases they<br />
may be combined to form a hybrid component.<br />
FIGURE 2-16 Grid components: a high-level perspective [8]<br />
2.4.4.1 Grid portal<br />
The grid portal provides an interface <strong>for</strong> a user to launch applications that will<br />
utilize the resources and services provided by the grid.<br />
The current Globus Toolkit does not provide any services or tools to generate a<br />
portal.<br />
2.4.4.2 Security<br />
A major requirement for grid computing is security. There must be<br />
mechanisms to provide security, including authentication, authorization, and data<br />
encryption.
The Grid Security Infrastructure (GSI) component of the Globus Toolkit<br />
provides robust security mechanisms. The GSI includes an OpenSSL implementation.<br />
It also provides a single sign-on mechanism: once a user is authenticated, a<br />
proxy certificate is created and used when performing actions within the grid.<br />
2.4.4.3 Broker<br />
Once authenticated, a user will launch the application. Based on the parameters<br />
provided by the user, the broker will identify the available and appropriate resources<br />
to utilize within the grid.<br />
Though there is no broker implementation provided by Globus Toolkit, there is<br />
an LDAP-based in<strong>for</strong>mation service. This service is called Grid Resource In<strong>for</strong>mation<br />
Service (GRIS), or more commonly the Monitoring and Discovery Service (MDS).<br />
2.4.4.4 Scheduler<br />
Once the resources have been identified, the next logical step is to schedule the<br />
individual jobs to run on the individual nodes within the grid.<br />
Globus Toolkit does not have its own job scheduler to find available resources<br />
and automatically send jobs to suitable machines. Instead, it provides the tools and<br />
interfaces needed to implement schedulers.<br />
2.4.4.5 Data Management<br />
If any data (including application modules) must be moved or made accessible<br />
to the nodes where the application’s jobs will execute, then there needs to be a secure<br />
and reliable method <strong>for</strong> moving files and data to various nodes within the grid.<br />
The Globus Toolkit contains a data management component that provides such<br />
services. This component, known as Grid Access to Secondary Storage (GASS),<br />
includes facilities such as GridFTP. GridFTP is built on top of the standard FTP<br />
protocol, but adds additional functions and utilizes the<br />
GSI for user authentication and authorization.<br />
2.4.4.6 Job and Resource Management<br />
This component provides the services to actually launch a job on a particular<br />
resource, check on its status, and retrieve its results when it is complete.<br />
The Grid Resource Allocation Manager (GRAM) of Globus Toolkit provides<br />
the services <strong>for</strong> this component.
2.5 Summary<br />
Course scheduling is part of the general scheduling problem. It assigns<br />
courses to periods of time and classrooms so that lecturers can teach and students can<br />
attend their courses without any conflicts.<br />
Much research has been carried out on course scheduling problems. The<br />
different approaches can be divided into four groups: sequential methods, cluster<br />
methods, constraint-based methods, and meta-heuristic methods. Although these approaches have<br />
successfully solved course scheduling problems, few studies have<br />
focused on the problems of multiple faculty universities. In such<br />
universities, conflicts can occur across faculties because of both shared and non-shared<br />
resources.<br />
This study proposes a new system for multiple faculty universities. The<br />
proposed system will apply a hybrid centralized and de-centralized approach, a GA,<br />
and a grid computing environment. The GA is a global search optimization algorithm<br />
that searches from a population of points in parallel, so it is suitable and flexible for satisfying the constraints of the required<br />
timetable. The combination of the GA and the hybrid centralized and de-centralized<br />
approach is intended to create solutions without resource conflicts<br />
across the university. The grid computing environment is used as<br />
infrastructure for sharing computing and data over a local or wide area network.
CHAPTER 3<br />
METHODOLOGY<br />
The general course scheduling problem and the objectives and scope of our study were<br />
presented in Chapter 1. This chapter presents the plan and the phases of analyzing,<br />
designing, and implementing the proposed course scheduling system.<br />
3.1 System Development<br />
To achieve the stated objectives, we follow the six phases below:<br />
3.1.1 Phase 1: Systems Analysis<br />
a) To verify the requirements and the <strong>objective</strong>s of the study.<br />
b) To choose the tools and software to be used to develop the system.<br />
3.1.2 Phase 2: Design<br />
a) To study the <strong>genetic</strong> <strong>algorithm</strong>s and grid computing environment.<br />
b) To specify the proposed system.<br />
c) To design the interfaces and the module’s functions.<br />
d) To design the database.<br />
e) To design a prototype <strong>for</strong> connecting between users and the system.<br />
3.1.3 Phase 3: Implementation<br />
a) To study the <strong>genetic</strong> <strong>algorithm</strong>s and grid computing environment.<br />
b) To install the correct software to develop the system.<br />
c) To install the database.<br />
d) To implement the prototype <strong>for</strong> connecting between users and the<br />
system.<br />
e) To implement the designed modules.<br />
3.1.4 Phase 4: Testing<br />
a) To test the system.<br />
b) To run a demonstration.<br />
c) To do some evaluations on the effectiveness of the system.
3.1.5 Phase 5: Measurement<br />
a) To evaluate the suitability of the proposed GA against the hard and soft<br />
constraints.<br />
b) To measure the per<strong>for</strong>mance of using grid computing vs. not using grid<br />
computing.<br />
3.1.6 Phase 6: Documentation<br />
a) To write the user manuals.<br />
b) To write reports.<br />
3.2 Problem Definition<br />
The more realistic a problem is, the harder it is for developers to solve. Real-world<br />
course scheduling problems are very complex, and they are especially hard for<br />
multiple faculty universities. They also depend strongly on<br />
the particular requirements of each university. This study focuses on the common<br />
requirements of multiple faculty universities. The proposed system and the<br />
constraints it handles are nevertheless intended to be general enough that few changes are needed to adapt it<br />
to a particular university.<br />
The multiple faculty universities from which we were able to collect data are<br />
King Mongkut’s Institute of Technology North Bangkok in Thailand and Cantho<br />
University in Vietnam. At these universities, each faculty has several departments.<br />
Each department has its own resources, which include lecturers, courses, and classrooms.<br />
Each department desires to construct a timetable using its own resources. These<br />
resources can also be shared with other departments in the university.<br />
Each course, which is usually divided into several sections, belongs to exactly one<br />
department. However, it is almost always the case that a significant part of the<br />
curriculum of one department is provided by another department. If a course is<br />
provided to more than one department, it must be scheduled at the same time-slot on<br />
all the departmental timetables that use this course. Such courses are called shared<br />
courses.<br />
Similarly, there are shared classrooms. Each department desires to use its own<br />
classrooms. However, some courses sometimes need to use the shared classrooms of<br />
the faculty, of common buildings, or of other faculties. Therefore, the group of classrooms<br />
that may be used for a particular course has to be assigned before scheduling. A course has to be<br />
scheduled into these classrooms without any conflicts between the departments. Figure<br />
3-1 illustrates an arrangement for the shared classrooms.<br />
[Figure 3-1: each faculty (Faculty 1 to Faculty n) contains the classrooms of its departments and faculty-level shared classrooms; a common building provides further shared classrooms.]<br />
FIGURE 3-1 Shared classrooms in a multiple faculty university<br />
Each department is responsible for teaching a number of courses. Therefore,<br />
teaching assignments for its lecturers have to be made. Some lecturers from other<br />
faculties are invited to teach, so there are also shared lecturers who teach courses<br />
in more than one faculty.<br />
We do not schedule individual students. Instead, we handle<br />
student constraints at the class level. The students are divided into classes and are<br />
expected to follow, in chronological order, the advised pre-requisites in the curriculum of<br />
their respective program. Our responsibility is to produce a timetable that helps the<br />
students complete the courses in their curriculum. We say that two courses are in conflict<br />
with each other if they belong to the same curriculum and are scheduled at the same<br />
time.<br />
In many cases, a course can be attended by students who come from classes of<br />
different departments or faculties. This means that the students who study this shared<br />
course can have different curricula. In any case, the schedule must allow these<br />
students to attend all of their courses.
All of the above problems are summarized as the set of hard and soft constraints<br />
addressed in our study, which is listed in Section 1.3.<br />
3.3 The System Boundary<br />
The system boundary gives a brief application overview through a use case<br />
diagram in Figure 3-2.<br />
[Figure 3-2: use case diagram. Actors: Lecturer, Department Staff, Faculty Staff, Central Office Staff, University Information System. Use cases: Assign classrooms to departments, Create classes, Create combined classes, Assign teaching, Schedule courses, View timetable, Request busy time, Request preferable time.]<br />
FIGURE 3-2 Use case diagram of the course scheduling system<br />
There are five actors in the use case diagram of the course scheduling system.<br />
3.3.1 Lecturer: This is a person who can request busy times and preferable<br />
times, which the course scheduling programs then try to respect. The lecturers<br />
can view the timetable after it is completed.<br />
3.3.2 Department Staff: This is a person who works in the department. The<br />
department staff prepares the classes to be scheduled and, based on the teaching plan,<br />
assigns lecturers to teach the courses.<br />
3.3.3 Faculty Staff: This is a person who works in the faculty. The faculty staff<br />
can assign the classrooms to the departments in the faculty. Each department can use<br />
these classrooms for its courses. This allocation does not necessarily need to be repeated<br />
every semester.
3.3.4 Central Office Staff: This is a person who works in the central office of the<br />
university. The central office staff will activate the course scheduling system to<br />
schedule all courses <strong>for</strong> the whole university.<br />
3.3.5 University In<strong>for</strong>mation System: This is a system actor that includes a<br />
database and a database management system. It is responsible <strong>for</strong> storing and<br />
managing the data of the university.<br />
3.4 The Proposed Course Scheduling System<br />
This section presents the proposed system through a scheduling strategy and the<br />
system architecture.<br />
3.4.1 The Scheduling Strategy<br />
In general, there are two approaches to the course scheduling problem, namely<br />
centralized and de-centralized. Both approaches have their own advantages and<br />
disadvantages.<br />
The centralized approach uses one program to schedule the timetable for the entire<br />
university. This program has a global view of the problem and all the<br />
information needed to create a timetable effectively. Unfortunately, the<br />
problem is often too large for the course scheduling program to create a good<br />
timetable. Furthermore, co-operation between the faculties and the central scheduling<br />
office is also difficult [5].<br />
The de-centralized approach lets each faculty schedule its own timetable using<br />
its own resources. However, this approach rapidly becomes infeasible when there are<br />
shared resources across faculties. It can only work well if the<br />
communication between faculties is reduced to a minimum [5]. Our study proposes a<br />
hybrid centralized and de-centralized approach. The centralized course scheduling<br />
program schedules only the shared resources, whereas the decentralized course<br />
scheduling program schedules the remaining resources of each faculty. The<br />
proposed course scheduling system is shown in Figure 3-3.<br />
The proposed system is designed to consist of jobs that are processed in parallel.<br />
After the clients at all faculties have sent their course scheduling data to the<br />
Central Manager Host, a client in the central office runs the course scheduling<br />
program. The following three stages are then performed automatically, in order.
[Figure 3-3: a client at a faculty submits data for the course scheduling, and the client at the central office submits the job for the course scheduling, to the Central Manager Host, which sends data and jobs for execution to the Execution Hosts that schedule for Faculty 1 to Faculty n.]<br />
FIGURE 3-3 Proposed system<br />
3.4.1.1 Stage 1<br />
The Central Manager Host requests a job to execute the centralized course<br />
scheduling program on a remote Execution Host to create a timetable of the shared<br />
resources across the faculties. The result will be written into the database on the<br />
Central Manager Host.<br />
3.4.1.2 Stage 2<br />
The Central Manager Host requests jobs to execute the decentralized course<br />
scheduling program in parallel on remote Execution Hosts. In this stage, each remote<br />
host uses the fixed timetable created in Stage 1 as an initial input, and then tries to<br />
find a timetable <strong>for</strong> each faculty. The decentralized course scheduling program must<br />
give results that do not conflict with the centralized scheduling output. The results<br />
from all remote nodes will also be written into the database on the Central Manager<br />
Host.<br />
3.4.1.3 Stage 3<br />
The Central Manager Host requests a job to merge the results in the database of<br />
the Central Manager Host. This produces the complete timetable for the whole<br />
university.<br />
We will use a <strong>genetic</strong> <strong>algorithm</strong> to develop both the centralized course<br />
scheduling program and decentralized course scheduling program. The grid<br />
computing environment is used as infrastructure <strong>for</strong> distributed and parallel<br />
computing.
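The control flow of the three stages can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the toy slot-assignment logic, and the use of a local thread pool in place of remote Execution Hosts are all hypothetical and are not part of the proposed system, which uses a GA for the scheduling itself and the Globus Toolkit for distribution.<br />

```python
from concurrent.futures import ThreadPoolExecutor


def schedule_shared(shared_courses):
    """Stage 1 stand-in: give every shared course a fixed time-slot."""
    return {course: slot for slot, course in enumerate(sorted(shared_courses))}


def schedule_faculty(faculty, own_courses, fixed):
    """Stage 2 stand-in: place a faculty's own courses around the fixed slots."""
    used = set(fixed.values())
    timetable = {}
    slot = 0
    for course in sorted(own_courses):
        while slot in used:          # never reuse a slot taken in Stage 1
            slot += 1
        timetable[course] = slot
        used.add(slot)
    return faculty, timetable


def run_three_stages(shared_courses, faculty_courses):
    # Stage 1: one centralized job for the shared resources.
    fixed = schedule_shared(shared_courses)
    # Stage 2: one decentralized job per faculty, run in parallel.
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(schedule_faculty, name, courses, fixed)
                   for name, courses in faculty_courses.items()]
        per_faculty = dict(f.result() for f in futures)
    # Stage 3: merge all partial results into one timetable.
    return {"shared": fixed, **per_faculty}


result = run_three_stages({"MATH101", "ENG101"},
                          {"IT": ["IT201", "IT202"], "EE": ["EE201"]})
```

The point of the sketch is the ordering: Stage 2 jobs only start after Stage 1 has fixed the shared timetable, and each of them treats that timetable as read-only input.<br />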
3.4.2 The System Architecture<br />
The system can be separated into two subsystems, the Front End system and the Grid<br />
system, shown in Figure 3-4.<br />
The Front End system is based on a 3-tier architecture. It is used by<br />
the clients in the faculties and in the central office to prepare the data before<br />
scheduling. It includes three components: GUIs, an application program, and data<br />
storage.<br />
Separating the system into three tiers allows the tiers to work independently. The<br />
presentation tier involves the graphical user interface. The application tier consists of<br />
the application manager. The last tier, the database tier, consists of a database and its<br />
database management system (DBMS).<br />
[Figure 3-4: system architecture. Presentation tier: clients at the faculties (Client 1 to Client n) and the client at the central office (Client n+1). Application tier: Application Manager and Scheduling Engine; the Commodity Grid searches for available machines, sends data and jobs to the machines, and gets results from the submitted jobs; the Globus Grid Environment distributes jobs and data to Node 1 to Node n. Database tier: DBMS, DB, and results.]<br />
FIGURE 3-4 System architecture
The Grid system is used only by a client in the central office, who starts the<br />
scheduling engine, which in turn activates the grid system. The grid system also has a 3-tier<br />
architecture: Client, Commodity Grid (CoG), and Globus Grid<br />
Environment (Grid).<br />
The Client tier is the interface between users and the grid system. It is<br />
responsible for receiving the command to run the scheduling engine.<br />
The CoG tier acts as an interface between the Grid and Client tiers. Using the<br />
facilities provided by the API, the CoG allows secure file transfers and also<br />
takes responsibility for job scheduling and for monitoring the status of jobs. There is<br />
one job for centralized course scheduling, and many other jobs for decentralized<br />
course scheduling. When a job needs to be performed, the CoG looks for available<br />
nodes to assign it to. The Monitoring and Discovery Service (MDS) provided by the<br />
Globus Toolkit supplies information about the available nodes within the Grid.<br />
Next, the CoG checks and transfers the data to the available machines (nodes).<br />
Security (provided by the GSI) and reliability are important when transferring data to various nodes<br />
within the Grid. To meet these requirements, the Globus Toolkit<br />
provides a data management component, known as Grid Access to Secondary Storage<br />
(GASS), for secure and reliable data transfers. It uses the GridFTP protocol to<br />
facilitate the checking and transport of data files.<br />
The CoG tier monitors the progress of each job and polls regularly to check whether<br />
the jobs are finished. The Grid Resource Allocation Manager (GRAM) provides the<br />
necessary services for these processes. Once the jobs are completed, the results are stored in<br />
the database, and their status is shown to the Client.<br />
3.5 The Database Design<br />
In the database design, we present an entity-relationship diagram, shown in Figure<br />
3-5. This design also helps us understand the system requirements more clearly.<br />
The data relations between the entities in this diagram are very important.<br />
The course scheduling programs do not work directly on the database; they work<br />
on in-memory data structures instead. Therefore, the data and its relations need to be loaded<br />
from the database into the corresponding data structures before scheduling. The<br />
course scheduling programs have to know the data relations so that they are able to<br />
look up enough information to check the hard and soft constraints.<br />
[Figure 3-5: entity-relationship diagram. Entities: Building (BuildingID, BuildingName), Faculty (FacultyID, FacultyName), Department (DeptID, DeptName), ClassroomGroup (ClassroomGroupID, ClassroomGroupName), Classroom (ClassroomID, ClassroomName, Seats), Program (ProgramID, ProgramName, NumSemesters), Course (CourseID, CourseName, Credits, Kind), Class (classID, className, enrolYear, numStudents, hasTimeTable), CourseSection (SectionNo, Semester, Year, NumStudents), Lecturer (LecturerID, LecturerName, Gender), Curriculum, Time-slot, and BusyTime. Further attribute labels in the diagram: semester, year, Semester, DayinWeek, Year, Working Session, State. Relationship labels: has, consists of, controls, takes, teaches.]<br />
FIGURE 3-5 Entity relation diagram<br />
The data dictionary is presented in Appendix A.
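As a rough illustration of loading the data and its relations into data structures before scheduling, the following Python sketch models three of the entities from Figure 3-5. The field names loosely follow the diagram, but the class layout, the `load_sections` helper, and the row format are assumptions made for this example; the actual data dictionary is the one in Appendix A.<br />

```python
from dataclasses import dataclass, field


@dataclass
class Lecturer:
    lecturer_id: str
    busy: set = field(default_factory=set)   # {(day_in_week, working_session)}


@dataclass
class Classroom:
    classroom_id: str
    seats: int


@dataclass
class CourseSection:
    course_id: str
    section_no: int
    credits: int
    num_students: int
    lecturer_id: str                          # relation: section -> lecturer
    class_id: str                             # relation: section -> class
    classroom_ids: list = field(default_factory=list)  # permissible classrooms


def load_sections(rows, lecturers, classrooms):
    """Build in-memory sections from database-like rows and check relations."""
    sections = []
    for row in rows:
        section = CourseSection(**row)
        if section.lecturer_id not in lecturers:
            raise KeyError(f"unknown lecturer {section.lecturer_id!r}")
        missing = [c for c in section.classroom_ids if c not in classrooms]
        if missing:
            raise KeyError(f"unknown classrooms {missing!r}")
        sections.append(section)
    return sections


lecturers = {"L1": Lecturer("L1", {("Mon", "morning")})}
classrooms = {"R1": Classroom("R1", 40)}
rows = [dict(course_id="IT101", section_no=1, credits=3, num_students=35,
             lecturer_id="L1", class_id="IT-2005-A", classroom_ids=["R1"])]
sections = load_sections(rows, lecturers, classrooms)
```

Keeping the relations as plain identifiers lets a constraint check go from a booked section to its lecturer or classroom with a single dictionary lookup.<br />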
3.6 The Proposed Genetic Algorithm<br />
This section presents the proposed genetic algorithm, which includes the genetic<br />
representations and the processes to create constraint data, initialize a random population,<br />
evaluate the fitness function, and crossover and mutate chromosomes. Figure 3-6 presents<br />
the high level representation of this algorithm.<br />
[Figure 3-6: flowchart. Start; create constraint data; initialize a random population of n chromosomes. Loop: if the fitness f(x) of the first chromosome x is satisfied, output the solution and stop. Otherwise, if the population size is not less than n, delete some bad chromosomes (low fitness value). Then select 2 chromosomes as parents, apply crossover to breed a new chromosome (offspring), mutate it, evaluate the fitness value of the offspring, add the offspring to the population in order of fitness value, and repeat.]<br />
FIGURE 3-6 High level representation of the proposed genetic algorithm<br />
To obtain a good result, we apply the genetic algorithm to create one or<br />
more solutions that have various fitness values. By comparing solutions, changing them, and<br />
creating new ones, we can choose a good solution, and possibly obtain a<br />
variety of good solutions.<br />
As shown in Figure 3-6, each chromosome that has just been mutated is inserted<br />
at the position in the population determined by its fitness value. The crossover and mutation operations are<br />
repeated to change the population until the first chromosome of the population obtains<br />
a good enough fitness value f(x). However, if repeated too many times, these<br />
operations create more chromosomes than the preset<br />
population size allows. To solve this problem, once the number of chromosomes reaches<br />
a critical value n, we kill off half of the population.<br />
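The loop described above can be sketched as a small steady-state GA in Python. This is a minimal sketch, not the thesis implementation: the function `run_ga`, the uniform random parent selection, the step limit `max_steps`, and the bit-string toy problem used to exercise it are all assumptions introduced for illustration.<br />

```python
import bisect
import itertools
import random


def run_ga(fitness, new_chromosome, crossover, mutate, n, target, rng,
           max_steps=10_000):
    """Steady-state GA over a population kept sorted best-first."""
    tie = itertools.count()      # unique tie-breaker; chromosomes are never compared
    population = []              # entries: (-fitness, tie, chromosome)

    def add(chromosome):
        # Storing -fitness keeps the highest fitness at index 0.
        bisect.insort(population, (-fitness(chromosome), next(tie), chromosome))

    for _ in range(n):
        add(new_chromosome(rng))
    for _ in range(max_steps):
        if -population[0][0] >= target:       # first chromosome is good enough
            break
        if len(population) >= n:              # critical size reached:
            del population[max(2, n // 2):]   # kill off the worse half
        p1, p2 = rng.sample(population, 2)    # assumption: uniform parent choice
        add(mutate(crossover(p1[2], p2[2], rng), rng))
    best = population[0]
    return best[2], -best[0]


# Toy problem (hypothetical): maximize the share of 1-bits in a 12-bit string.
BITS = 12

def onemax(bits):
    return sum(bits) / len(bits)

def random_bits(rng):
    return [rng.randint(0, 1) for _ in range(BITS)]

def one_point(a, b, rng):
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]

def flip_one(bits, rng):
    i = rng.randrange(len(bits))
    return bits[:i] + [1 - bits[i]] + bits[i + 1:]


best, score = run_ga(onemax, random_bits, one_point, flip_one,
                     n=20, target=1.0, rng=random.Random(0))
```

Because the population is always sorted, checking the stopping condition and culling the worse half are both cheap operations on the ends of one list.<br />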
3.6.1 Representations<br />
This section defines the <strong>genetic</strong> representations of the chromosomes, the genes,<br />
and the population.<br />
3.6.1.1 Chromosomes<br />
A chromosome is a solution, in our case a timetable of the university. The<br />
timetable contains a number of sub-timetables of classrooms. Each classroom has its<br />
own sub-timetable.<br />
Classroom i<br />
Hour Mon Tue Wed Thu Fri<br />
08:00-09:00 Course 1 Course 2<br />
09:00-10:00 Course 1 Course 2<br />
10:00-11:00 Course 1 Course 2<br />
11:00-12:00<br />
13:00-14:00 Course 3<br />
14:00-15:00 Course 3<br />
15:00-16:00 Course 4<br />
16:00-17:00 Course 4<br />
FIGURE 3-7 Sub-timetable of a classroom<br />
We use a classroom as a ‘storage space’. Courses are scheduled to the time-slots<br />
of each classroom. This direct representation gives a visual view of the timetable. Here, the entries Course i are<br />
courses that are divided into sections. Each section is assigned to be taught by a<br />
particular lecturer and attended by a class of students. From the data relations in the<br />
database, we have course → lecturer and course → class. This is a good foundation for<br />
checking the hard and soft constraint conflicts.<br />
Figure 3-8 illustrates an entire chromosome.
[Figure 3-8: a chromosome x_i with fitness f(x_i), drawn as a stack of classroom sub-timetables (Classroom 1 to Classroom n). Each sub-timetable is a Mon to Fri grid whose cells hold a course together with its class (for example Course 1 with Class1). A gene is a time-slot.]<br />
FIGURE 3-8 Chromosome<br />
Each chromosome x_i has a fitness value f(x_i). We will use this value to look for a<br />
good chromosome (a good solution).<br />
3.6.1.2 Genes<br />
A gene is a time-slot in a chromosome, so there are many genes in a<br />
chromosome. Each gene contains a 0 if no course is held at that position.<br />
Otherwise, the gene contains a course. Changing the values of genes creates a<br />
new chromosome.<br />
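One possible in-memory form of this representation is sketched below in Python: a chromosome is a mapping from classroom to a day-by-slot grid, and each cell is a gene holding either 0 or a course. The names `empty_chromosome` and `book`, the five-day week, and the eight slots per day (taken from the rows of Figure 3-7) are assumptions for this example rather than the thesis data structures.<br />

```python
DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri"]
SLOTS_PER_DAY = 8   # assumption: eight hourly time-slots, as in Figure 3-7


def empty_chromosome(classrooms):
    """One sub-timetable per classroom; every gene starts as 0 (no course)."""
    return {room: [[0] * SLOTS_PER_DAY for _ in DAYS] for room in classrooms}


def book(chromosome, room, day, first_slot, length, course):
    """Write a course into consecutive genes; refuse if any gene is taken."""
    row = chromosome[room][DAYS.index(day)]
    if first_slot < 0 or first_slot + length > SLOTS_PER_DAY:
        return False
    span = range(first_slot, first_slot + length)
    if any(row[s] != 0 for s in span):
        return False
    for s in span:
        row[s] = course
    return True


timetable = empty_chromosome(["Classroom 1", "Classroom 2"])
ok_first = book(timetable, "Classroom 1", "Mon", 0, 3, "Course 1")
ok_overlap = book(timetable, "Classroom 1", "Mon", 2, 2, "Course 2")
```

Here the second booking is rejected because it would overwrite a gene that already holds Course 1, so a double-booked classroom cannot be represented at all.<br />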
3.6.1.3 Population<br />
A population is a set of n chromosomes, or n solutions. The population is<br />
always kept sorted in decreasing order of the chromosomes’ fitness values. As a<br />
result, the first chromosome has the highest fitness value and is thus the candidate for the best<br />
solution, as illustrated in Figure 3-9.<br />
[Figure 3-9: a population drawn as a stack of chromosomes x_1 to x_n, each with its fitness f(x_1) to f(x_n), with x_1 at the front.]<br />
FIGURE 3-9 Population
3.6.2 Creating Constraint Data<br />
Figure 3-10 presents processes to prepare data be<strong>for</strong>e scheduling.<br />
[Figure 3-10: user input (faculties, departments, curriculums, classrooms, lecturers, courses, classes, assignments) becomes constraint data, which are stored into data structures; together with the GA parameters, these feed the GA, which produces the timetable.]<br />
FIGURE 3-10 Creating constraint data<br />
All data and their relations, plus the GA parameters, have to be prepared before<br />
running the GA. The data about each faculty, department, curriculum, classroom,<br />
lecturer, course, class and teaching assignment are entered into the database by the<br />
users. A program module then automatically extracts these data and stores them in the<br />
data structures. List data structures are used because they are flexible for<br />
designing the algorithms. The GA parameters, such as the population size, mutation<br />
and crossover rates, and penalty costs for the unsatisfied constraints, are also prepared<br />
as variables in the program.<br />
3.6.3 Initializing a Random Population of Chromosomes<br />
[Figure 3-11: flowchart. Start; initialize an empty population. While the population size is less than n: create a random chromosome x, evaluate the fitness f(x) for the new chromosome x, and add x to the population in order of fitness. Stop when the population size reaches n.]<br />
FIGURE 3-11 Algorithm for initializing a random population
A population is a list of n chromosomes. Starting with an empty population, we<br />
create new random chromosomes one after another and add them to this population.<br />
Pseudocode for creating a random chromosome is given in Figure 3-12.<br />
For each course<br />
  n = number of time-slots needed for this course (= number of credits)<br />
  Repeat<br />
    Randomly select a classroom in the list of classrooms that are permissible for this course<br />
    Search for n free time-slots in the chosen classroom<br />
    If (n free time-slots are found)<br />
      Book the current course to these time-slots<br />
  Until (course is booked)<br />
FIGURE 3-12 Pseudo code <strong>for</strong> creating a random chromosome<br />
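A runnable Python rendering of Figure 3-12 might look as follows. Several details are assumptions that the pseudocode leaves open: the n time-slots are taken as consecutive slots on a single randomly chosen day, the grid is 5 days by 8 slots, and a `max_tries` cap replaces the unbounded Repeat loop so that an over-full timetable raises an error instead of looping forever.<br />

```python
import random

DAYS, SLOTS = 5, 8   # assumption: Mon-Fri, eight time-slots per day


def find_free_run(day_row, n):
    """Return the first index of n consecutive free genes (0), or None."""
    for start in range(len(day_row) - n + 1):
        if all(gene == 0 for gene in day_row[start:start + n]):
            return start
    return None


def random_chromosome(courses, rng, max_tries=1000):
    """courses maps a course name to (credits, [permissible classrooms])."""
    rooms = sorted({r for _, allowed in courses.values() for r in allowed})
    chromosome = {r: [[0] * SLOTS for _ in range(DAYS)] for r in rooms}
    for name, (credits, allowed) in courses.items():
        for _ in range(max_tries):
            room = rng.choice(allowed)            # random permissible classroom
            day = rng.randrange(DAYS)             # assumption: random day
            start = find_free_run(chromosome[room][day], credits)
            if start is not None:
                chromosome[room][day][start:start + credits] = [name] * credits
                break
        else:
            raise RuntimeError(f"could not place {name!r}")
    return chromosome


courses = {"C1": (3, ["R1"]), "C2": (3, ["R1", "R2"]), "C3": (2, ["R2"])}
chromosome = random_chromosome(courses, random.Random(1))
```

The resulting chromosome is free of classroom double bookings by construction, but it may still violate other constraints, which is what the fitness evaluation measures.<br />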
3.6.4 Evaluating Fitness Function<br />
As described above, each chromosome x has a fitness value f(x). In this<br />
section, we discuss how to find f(x).<br />
Assume that we have m hard constraints. Let $Hc_i$ denote the number of<br />
conflicts of hard constraint i, where i = 1..m. Each hard constraint i is assigned a<br />
penalty cost $Penalty\_hc_i$. We use $f_1(x)$ to denote the fitness value of the hard constraints.<br />
$$f_1(x) = \frac{1}{1 + \sum_{i=1}^{m} Hc_i \cdot Penalty\_hc_i} \qquad \text{(Eq. 3-1)}$$<br />
Similarly, assume that we have n soft constraints. Let $Sc_j$ denote the number of<br />
conflicts of soft constraint j, where j = 1..n. Each soft constraint j is assigned a penalty<br />
cost $Penalty\_sc_j$. We use $f_2(x)$ to denote the fitness value of the soft constraints.<br />
$$f_2(x) = \frac{1}{1 + \sum_{j=1}^{n} Sc_j \cdot Penalty\_sc_j} \qquad \text{(Eq. 3-2)}$$<br />
Thus, the fewer conflicts a chromosome has, the higher $f_1(x)$ and $f_2(x)$<br />
are. We use f(x) to denote the fitness value of the chromosome x.<br />
$$f(x) = W_1 f_1(x) + W_2 f_2(x) \qquad \text{(Eq. 3-3)}$$<br />
where $W_1$ and $W_2$ denote the weights of the hard and soft constraints, respectively. We will<br />
do experiments to identify suitable values for these weights.<br />
In this study, we design a course scheduling algorithm to find solutions that<br />
have the highest fitness value f(x). This is a heuristic search, so we examine<br />
solutions with high fitness values until we find one whose $f_1(x)$ is equal to 1, which by Eq. 3-1 means that no hard-constraint conflicts remain (assuming positive penalty costs).<br />
3.6.4.1 Checking Conflicts about Small Classrooms<br />
Each course must be booked to a classroom that is large enough to hold the<br />
students of that course.<br />
Pseudocode for checking this is given in Figure 3-13.<br />
Count = 0<br />
For each classroom<br />
  For each day in a week<br />
    For each time-slot in a day<br />
      If (number of students attending the course held in the current classroom ><br />
          number of seats of the current classroom) Count = Count + 1<br />
FIGURE 3-13 Pseudo code <strong>for</strong> checking small classroom conflicts<br />
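A minimal Python sketch of the check in Figure 3-13 follows. It assumes, purely for illustration, that a timetable is a dictionary mapping (classroom, day, time-slot) to a course code; this data layout, the function name, and the room names are not taken from the thesis.

```python
def count_small_classroom_conflicts(timetable, students, seats):
    """Count booked time-slots whose course does not fit into its classroom.

    timetable: {(classroom, day, slot): course}   (assumed layout)
    students:  {course: number of students}
    seats:     {classroom: number of seats}
    """
    count = 0
    for (classroom, _day, _slot), course in timetable.items():
        if students[course] > seats[classroom]:
            count += 1
    return count


# Hypothetical data: CSC329 (30 students) is booked twice into a 25-seat room.
timetable = {
    ("R1", "Mon", 0): "CSC329",
    ("R1", "Mon", 1): "CSC329",
    ("R2", "Tue", 0): "CSC330",
}
print(count_small_classroom_conflicts(
    timetable, {"CSC329": 30, "CSC330": 30}, {"R1": 25, "R2": 40}))
```

As in the pseudo code, every offending time-slot adds one conflict, so a three-credit course in a room that is too small contributes three conflicts.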
3.6.4.2 Checking Conflicts Regarding Lecturer’s Busy Time<br />
The courses taught by a lecturer cannot be booked in working-sessions during which<br />
he or she is busy.<br />
Pseudo code for checking this is given in Figure 3-14.<br />
Count = 0<br />
For each lecturer<br />
    For each day in a week<br />
        For each time-slot in a day<br />
            For each classroom<br />
                If (the current lecturer teaches the course held in the current classroom at<br />
                    this time-slot) and (the current lecturer is busy at this time) Count = Count + 1<br />
FIGURE 3-14 Pseudo code for checking lecturer’s busy time<br />
Lecturers register their busy times. This check compares those busy times<br />
with the time-slots in which the lecturers’ courses are booked. If they overlap, a conflict is<br />
counted.<br />
3.6.4.3 Checking Conflicts about Preferable Time<br />
Some lecturers dislike teaching in certain working-sessions of the week. The system<br />
should try to avoid booking their courses in these sessions.<br />
The course scheduling program therefore tries to book each lecturer’s courses outside the<br />
disliked time periods. Any violation is counted as a soft-constraint conflict.<br />
Pseudo code for checking this is given in Figure 3-15.<br />
Count = 0<br />
For each lecturer<br />
    For each day in a week<br />
        For each time-slot in a day<br />
            For each classroom<br />
                If (the current lecturer teaches the course held in the current classroom at<br />
                    this time-slot) and (the current lecturer dislikes teaching at this time) Count = Count + 1<br />
FIGURE 3-15 Pseudo code for detecting conflicts about preferable times<br />
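Figures 3-14 and 3-15 differ only in the set of time-slots that is compared against the timetable, so one routine can serve both checks. The sketch below assumes a timetable stored as a dictionary from (classroom, day, time-slot) to a course code and a per-lecturer set of blocked (day, time-slot) pairs; these structures and all names are illustrative assumptions, not the thesis data model.

```python
def count_time_conflicts(timetable, lecturer_of, blocked):
    """Count booked time-slots that fall into a lecturer's blocked times.

    timetable:   {(classroom, day, slot): course}   (assumed layout)
    lecturer_of: {course: lecturer id}
    blocked:     {lecturer id: set of (day, slot)}; busy times for the hard
                 constraint, disliked times for the soft constraint
    """
    count = 0
    for (_classroom, day, slot), course in timetable.items():
        lecturer = lecturer_of[course]
        if (day, slot) in blocked.get(lecturer, set()):
            count += 1
    return count


timetable = {
    ("R1", "Mon", 0): "CSC326",
    ("R1", "Tue", 1): "CSC327",
    ("R2", "Mon", 0): "ENL307",
}
lecturer_of = {"CSC326": "00019", "CSC327": "00019", "ENL307": "00003"}
busy = {"00019": {("Mon", 0)}}                                 # hard constraint input
disliked = {"00003": {("Mon", 0)}, "00019": {("Tue", 1)}}      # soft constraint input
print(count_time_conflicts(timetable, lecturer_of, busy))
print(count_time_conflicts(timetable, lecturer_of, disliked))
```

The first count would feed a hard-constraint term of Eq. 3-1 and the second a soft-constraint term of Eq. 3-2.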
3.6.4.4 Checking Conflicts about Double Booked Lecturers<br />
A lecturer cannot teach more than one course at the same time.<br />
Pseudo code for checking this is given in Figure 3-16.<br />
Count = 0<br />
For each lecturer<br />
    For each day in a week<br />
        For each time-slot in a day<br />
            Booked = 0<br />
            For each classroom<br />
                If (course held in this classroom is taught by the current lecturer) Booked = Booked + 1<br />
            If (Booked > 1) Count = Count + 1<br />
FIGURE 3-16 Pseudo code for checking conflicts about double scheduled lecturers<br />
If a lecturer is booked to teach more than one course in the same time-slot, a<br />
conflict is counted.<br />
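The double-booking check can be written compactly by counting bookings per (lecturer, day, time-slot). The following Python sketch assumes the same illustrative timetable dictionary as before, keyed by (classroom, day, time-slot); the names are hypothetical.

```python
from collections import Counter


def count_double_booked_lecturers(timetable, lecturer_of):
    """Count (lecturer, day, slot) combinations with more than one booking.

    timetable:   {(classroom, day, slot): course}   (assumed layout)
    lecturer_of: {course: lecturer id}
    """
    bookings = Counter()
    for (_classroom, day, slot), course in timetable.items():
        bookings[(lecturer_of[course], day, slot)] += 1
    # One conflict per lecturer and time-slot with Booked > 1, as in Figure 3-16.
    return sum(1 for booked in bookings.values() if booked > 1)


timetable = {
    ("R1", "Mon", 0): "CSC326",
    ("R2", "Mon", 0): "CSC327",   # same lecturer, same time-slot, other room
    ("R1", "Tue", 0): "ENL307",
}
lecturer_of = {"CSC326": "00019", "CSC327": "00019", "ENL307": "00003"}
print(count_double_booked_lecturers(timetable, lecturer_of))
```

Replacing `lecturer_of` with a mapping from course to class of students would give the analogous count for double scheduled classes.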
3.6.4.5 Checking Conflicts about Double Scheduled Classes<br />
Courses attended by the same class of students have to be scheduled at different<br />
times so that all students of that class can attend.<br />
Pseudo code for checking this is given in Figure 3-17.<br />
Count = 0<br />
For each class<br />
    For each day in a week<br />
        For each time-slot in a day<br />
            Booked = 0<br />
            For each classroom<br />
                If (the course held in this classroom in the current time-slot is studied by the current class)<br />
                    Booked = Booked + 1<br />
            If (Booked > 1) Count = Count + 1<br />
FIGURE 3-17 Pseudo code for checking conflicts about double scheduled classes<br />
A class cannot be booked to study more than one course in the same time-slot. If it is<br />
double scheduled, a conflict is counted.<br />
3.6.4.6 Checking Conflicts about Double Scheduled Courses<br />
Every course must be scheduled exactly once in a week.<br />
Pseudo code for checking this is given in Figure 3-18.<br />
Count = 0<br />
For each course<br />
    Booked = 0<br />
    For each classroom<br />
        For each day in a week<br />
            For each time-slot in a day<br />
                If (the current course is held in this time period)<br />
                    Booked = Booked + 1<br />
    If (Booked > the number of credits of the current course) Count = Count + 1<br />
FIGURE 3-18 Pseudo code for checking conflicts about double scheduled courses<br />
A course is booked into time-slots according to its number of credits. In our<br />
study, a course can have 1, 2, 3, or 4 credits. We stipulate that a<br />
course with n credits is scheduled in n consecutive time-slots in a day. For instance,<br />
course MAT125 has 3 credits, so it has to be scheduled in 3 consecutive time-slots. In any<br />
other case, a conflict is counted.<br />
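The rule that an n-credit course occupies n consecutive time-slots in one day can be checked as sketched below. The sketch assumes integer time-slot indices within a day and additionally assumes that all slots of a course lie in the same classroom; both are assumptions made for the example, as are the function and room names.

```python
def is_correctly_scheduled(timetable, course, credits):
    """Return True if `course` occupies exactly `credits` consecutive slots in one day.

    timetable: {(classroom, day, slot): course}, with integer slot indices (assumed layout)
    """
    cells = [key for key, booked in timetable.items() if booked == course]
    if credits < 1 or len(cells) != credits:
        return False  # not scheduled, or scheduled too few or too many times
    rooms = {room for room, _day, _slot in cells}
    days = {day for _room, day, _slot in cells}
    slots = sorted(slot for _room, _day, slot in cells)
    consecutive = slots == list(range(slots[0], slots[0] + credits))
    # Same classroom is an extra assumption of this sketch.
    return len(rooms) == 1 and len(days) == 1 and consecutive


good = {("R1", "Mon", 0): "MAT125", ("R1", "Mon", 1): "MAT125", ("R1", "Mon", 2): "MAT125"}
gap = {("R1", "Mon", 0): "MAT125", ("R1", "Mon", 1): "MAT125", ("R1", "Mon", 3): "MAT125"}
print(is_correctly_scheduled(good, "MAT125", 3))
print(is_correctly_scheduled(gap, "MAT125", 3))
```

Note that this is stricter than Figure 3-18, which only counts a conflict when a course is booked in more time-slots than it has credits.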
3.6.5 Crossover<br />
Two chromosomes from the population are chosen at random as mother and<br />
father. A new offspring is generated by creating an empty chromosome and then alternately<br />
inserting genes (time-slots) from the mother and the father, as illustrated in Figure 3-19.<br />
[Figure 3-19: chromosome x (mother) and chromosome y (father), each a set of weekly classroom timetables (Classroom 1 to n, Mon to Fri), are combined into the new chromosome z (offspring).]<br />
FIGURE 3-19 Crossover<br />
The new offspring starts as an empty chromosome, into which genes from the mother and<br />
father are inserted alternately. Because an n-credit course is<br />
scheduled in n successive time-slots, successive time-slots have to be copied together from<br />
the mother or the father. To facilitate this, all time-slots of a morning or afternoon<br />
working-session are copied as a unit from the mother or the father to the new offspring.<br />
The new offspring is usually not a valid timetable, so it needs to be repaired. If a course<br />
has not been scheduled, it is scheduled. Conversely, if a course<br />
has been scheduled more than once in a week, the extra booking is removed.<br />
Pseudo code for crossover is given in Figure 3-20.<br />
Crossover rate pc = 0.5<br />
Father x = a chromosome chosen randomly from the population<br />
Mother y = a chromosome chosen randomly from the population (y ≠ x)<br />
For each day in a week<br />
    For each working-session in [morning, afternoon]<br />
        For each classroom<br />
            If (random(100) < pc*100)<br />
                Copy the time-slots of the current working-session of father x to the same working-session of the new offspring z<br />
            Else<br />
                Copy the time-slots of the current working-session of mother y to the same working-session of the new offspring z<br />
Mutate the new offspring z<br />
Repair the new offspring z<br />
Calculate fitness value for the new offspring z<br />
Insert the new offspring z into the population in order of fitness value<br />
FIGURE 3-20 Pseudo code for crossover<br />
If the crossover rate pc is 50%, about 50% of the genes are copied from the<br />
father and about 50% from the mother to the new offspring.<br />
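The session-wise copying at the heart of the crossover can be sketched in Python as follows. The sketch assumes a chromosome is a dictionary mapping (classroom, day, working-session) to a tuple holding the courses of that session's time-slots; this representation and the names are hypothetical, and the mutation, repair, and fitness steps of Figure 3-20 are left out.

```python
import random

SESSIONS = ("morning", "afternoon")


def crossover(father, mother, classrooms, days, pc=0.5, rng=random):
    """Build an offspring by copying whole working-sessions from either parent.

    father, mother: {(classroom, day, session): tuple of courses per time-slot}
    pc: probability that a working-session is taken from the father.
    """
    child = {}
    for day in days:
        for session in SESSIONS:
            for classroom in classrooms:
                key = (classroom, day, session)
                parent = father if rng.random() < pc else mother
                child[key] = parent[key]  # tuples are immutable, so sharing is safe
    return child


rooms, days = ["R1", "R2"], ["Mon", "Tue"]
father = {(r, d, s): ("F-" + r,) for r in rooms for d in days for s in SESSIONS}
mother = {(r, d, s): ("M-" + r,) for r in rooms for d in days for s in SESSIONS}
child = crossover(father, mother, rooms, days, pc=0.5, rng=random.Random(1))
print(sum(1 for k in child if child[k] == father[k]), "of", len(child), "sessions from the father")
```

Copying a whole session at once keeps the consecutive time-slots of a course together, which is the reason given above for working at session granularity.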
3.6.6 Mutation<br />
A new offspring that has just been created by crossover is mutated according to a<br />
mutation rate. The process goes through each gene and, with that probability, swaps<br />
its content with that of another gene in the same chromosome.<br />
As mentioned in the previous section, a course has to be scheduled in successive<br />
time-slots, so the successive time-slots booked for a course must be swapped with<br />
other successive time-slots. To facilitate this, all time-slots of a working-session<br />
are swapped with those of another, as illustrated in Figure 3-21.<br />
[Figure 3-21: in chromosome x, the contents of two working-sessions, one in classroom i and one in classroom j (Mon to Fri timetables), are swapped with each other.]<br />
FIGURE 3-21 Mutation<br />
Pseudo code for mutating is given in Figure 3-22.<br />
Mutation rate pm = 0.02<br />
For each classroom<br />
    For each day in a week<br />
        For each working-session in [morning, afternoon]<br />
            If (random(100) < pm*100)<br />
                R = a classroom chosen randomly from the classroom group<br />
                    of the current classroom<br />
                Swap all time-slots of the current working-session of the current classroom<br />
                    with those of classroom R<br />
FIGURE 3-22 Pseudo code for mutating a chromosome<br />
Because a course may only use classrooms in its assigned<br />
classroom group, every swap has to stay within that<br />
classroom group.<br />
If the mutation rate is 2%, only about 2% of the genes have their<br />
contents swapped with others.<br />
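A Python sketch of this mutation step is given below. It assumes the same hypothetical chromosome layout as a dictionary from (classroom, day, working-session) to a tuple of courses, and a mapping from each classroom to its classroom group; swapping with the same working-session of classroom R is how this sketch reads Figure 3-22.

```python
import random


def mutate(chromosome, room_group, days, sessions=("morning", "afternoon"),
           pm=0.02, rng=random):
    """Return a mutated copy in which some working-sessions are swapped between
    classrooms of the same classroom group.

    chromosome: {(classroom, day, session): tuple of courses}   (assumed layout)
    room_group: {classroom: classroom group}
    """
    mutated = dict(chromosome)
    rooms = sorted(room_group)
    for room in rooms:
        for day in days:
            for session in sessions:
                if rng.random() < pm:
                    # Candidate partners share the group of the current classroom.
                    peers = [r for r in rooms if room_group[r] == room_group[room]]
                    other = rng.choice(peers)
                    a, b = (room, day, session), (other, day, session)
                    mutated[a], mutated[b] = mutated[b], mutated[a]
    return mutated


group = {"R1": "CSCLECRM", "R2": "CSCLECRM", "L1": "CSCCOMLB"}
chromosome = {(r, "Mon", s): (r + "-" + s,) for r in group for s in ("morning", "afternoon")}
print(mutate(chromosome, group, ["Mon"], pm=1.0, rng=random.Random(3)))
```

Because the partner is always drawn from the same group, a laboratory course can never end up in a lecture room through mutation.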
3.7 The System for Experiment<br />
The Globus Toolkit 2.2 is used as middleware to develop our grid computing<br />
environment [7, 8]. This section presents the main steps for installing and setting up<br />
this environment.<br />
An Ethernet LAN and three Intel Pentium machines were used to build the grid<br />
environment. Red Hat Linux 9 and Globus Toolkit 2.2 were installed and set up.<br />
Figure 3-23 presents this environment with the host names and functions of each<br />
machine.<br />
m1.kmitnb.ac.th (submits jobs and receives output):<br />
- Globus client<br />
- J2sdk1.4, Java CoG Kit 1.1<br />
- MySQL 4.0<br />
m2.kmitnb.ac.th:<br />
- Centralized course scheduling program<br />
- Decentralized course scheduling program<br />
- Globus server<br />
- GIIS, GRIS<br />
- CA<br />
- NTP server<br />
m3.kmitnb.ac.th:<br />
- Decentralized course scheduling program<br />
- Globus server<br />
- GRIS<br />
FIGURE 3-23 Hardware and software for each machine<br />
The host names are m1, m2 and m3. The machines should have a clock speed of<br />
at least 500 MHz, at least 128 MB of memory and at least an 8 GB hard drive.<br />
We configure the Monitoring and Discovery Service (MDS) to have one<br />
Grid Index Information Service (GIIS) on machine m2, which collects the data<br />
reported by the Grid Resource Information Services (GRIS) on all the machines, as<br />
shown in Figure 3-24.<br />
Each GRIS sends information about its own host to the GIIS.<br />
We use this to find the available machines. The user is able to query the<br />
GIIS from the client machine m1. The machine m2 is also used as the Certificate Authority (CA)<br />
machine.<br />
[Figure 3-24: the client m1.kmitnb.ac.th runs grid-info-search against the GIIS on m2.kmitnb.ac.th; the GRIS on m2.kmitnb.ac.th and the GRIS on m3.kmitnb.ac.th report to that GIIS.]<br />
FIGURE 3-24 MDS configuration<br />
The MDS is secured so that only certified users can access the GIIS and only<br />
certified GRIS servers can register to send information to the GIIS. The machine m2<br />
is also used as a Network Time Protocol (NTP) server, and NTP<br />
clients are configured on the other machines (m1 and m3). NTP is needed because the grid<br />
requires the clocks on all of the machines to be synchronized.<br />
The installation and setup process is presented in detail in Appendix B.<br />
3.8 The Grid Components<br />
This section introduces the following grid components: broker, scheduler, and<br />
job and resource management.<br />
3.8.1 Broker<br />
The broker identifies the available resources to utilize within the grid<br />
environment. The Globus Toolkit 2.2 does not provide a broker implementation, but it<br />
provides the necessary functions and framework to create one through the MDS<br />
component.<br />
The broker communicates with the GIIS and GRIS servers via the LDAP protocol<br />
supported by the Globus Toolkit 2.2. The broker can also be linked with other information<br />
stored in databases or plain files that provide resource information, as shown in<br />
Figure 3-25.<br />
In our study, the broker uses the LDAP APIs provided by the Globus<br />
Toolkit 2.2 to send requests to the GIIS server located on machine m2.<br />
The complete source code for the broker is given in the file GridInfoSearch.java<br />
in Appendix E.<br />
[Figure 3-25: the application on m1.kmitnb.ac.th calls the broker, which sends an LDAP query to the GIIS on m2.kmitnb.ac.th; the GIIS collects data from the GRIS on m2.kmitnb.ac.th, m3.kmitnb.ac.th and further hosts.]<br />
FIGURE 3-25 Working with a broker<br />
When called, the GIIS server returns a list of available hosts within the grid.<br />
The following resource information is gathered for each host:<br />
- Host name<br />
- CPU speed (MHz)<br />
- Number of CPU(s)<br />
- Free CPU percentage<br />
The list of available hosts is sorted by a weight that measures the CPU<br />
workload.<br />
$$Weight_{host} = \frac{CPU_{speed} \times CPU_{count} \times CPU_{load}}{100} \qquad \text{(Eq. 3-4)}$$<br />
where CPU_speed is the CPU speed, CPU_count is the number of CPU(s), and CPU_load is the<br />
current CPU workload.<br />
The most available host is selected to run a new job.<br />
The complete source code for managing the available hosts is given in the file<br />
AvailableHost.java in Appendix E.<br />
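The host ranking based on Eq. 3-4 can be illustrated as follows. This sketch is not the code of AvailableHost.java. It assumes that the CPU_load term of Eq. 3-4 is the free CPU percentage reported for each host, so that a larger weight means more spare capacity; the class, field names, and host figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    cpu_speed_mhz: float
    cpu_count: int
    cpu_free_percent: float  # assumption: this plays the role of CPU_load in Eq. 3-4


def weight(host):
    """Weight of a host following the form of Eq. 3-4."""
    return host.cpu_speed_mhz * host.cpu_count * host.cpu_free_percent / 100.0


def rank_hosts(hosts):
    """Sort hosts so that the most available one (largest weight) comes first."""
    return sorted(hosts, key=weight, reverse=True)


hosts = [
    Host("m2.kmitnb.ac.th", 500, 1, 40),
    Host("m3.kmitnb.ac.th", 500, 1, 90),
]
for host in rank_hosts(hosts):
    print(host.name, weight(host))
```

Under this reading, the first element of the ranked list is the host that would receive the next job.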
3.8.2 Job Scheduler<br />
The job scheduler schedules the individual jobs to run on the individual hosts.<br />
Hamscher et al. [40] presented three job scheduling paradigms for a grid:<br />
centralized, hierarchical and distributed. Our study uses a centralized scheduling<br />
system. In addition, because the Globus Toolkit does not have its own job scheduler,<br />
our study proposes one.<br />
In a centralized scheduling paradigm, a central machine acts as a resource<br />
manager to schedule jobs to all the surrounding hosts within the grid environment.<br />
Figure 3-26 presents the architecture of this scheduling.<br />
[Figure 3-26: jobs enter a central scheduler, which dispatches Job 1, Job 2 and Job 3 to Host 1, Host 2 and Host 3.]<br />
FIGURE 3-26 Centralized scheduling<br />
In this scenario, the jobs are first submitted to the central scheduler, which then<br />
dispatches them to the appropriate hosts. Jobs that cannot be started on a host<br />
are normally stored in a central job queue to be started later.<br />
In our study, the central scheduler is implemented on machine m1. There are<br />
two kinds of jobs: the centralized course scheduling job and the<br />
decentralized course scheduling job. These jobs run on machines m2 and m3.<br />
Figure 3-27 presents the proposed algorithm for the centralized scheduling.<br />
[Figure 3-27: flowchart of the job scheduler.]<br />
Start<br />
Stage 1: Request the centralized course scheduling job to be run on a designated host and wait for the results. If the job fails, request it again; otherwise continue with Stage 2.<br />
Stage 2: Select a job from the list of all decentralized course scheduling jobs, search for the host having the lowest load, and request the job to be run on that host. Repeat until all decentralized course scheduling jobs have been requested.<br />
Stage 3: While not all jobs are done, select a job from the list of all decentralized course scheduling jobs and get its status. If the job failed, search for the host having the lowest load and request the failed job to be run on that host.<br />
End<br />
FIGURE 3-27 Job scheduler for the grid computing environment<br />
The algorithm can be divided into three stages:<br />
3.8.2.1 Stage 1<br />
The centralized course scheduling job is requested to be executed on a<br />
designated host, machine m2. The system waits for the results and resubmits the job if it<br />
fails.<br />
3.8.2.2 Stage 2<br />
After the centralized course scheduling job has executed successfully, all<br />
decentralized course scheduling jobs are requested to be executed on the remote<br />
machines m2 and m3.<br />
There is no exchange of data between the decentralized course scheduling jobs,<br />
so they can be requested one after another and run in parallel in the grid.<br />
After each job is submitted to a host, the choice of the most available host is<br />
updated.<br />
3.8.2.3 Stage 3<br />
The system monitors all the decentralized course scheduling jobs and resubmits a<br />
job if it fails.<br />
The complete source code for this job scheduler is given in the file<br />
Scheduling.java in Appendix E.<br />
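The control flow of the three stages can be sketched in Python with the grid calls replaced by plain callbacks. This is a simplified, synchronous illustration and not the code of Scheduling.java: `submit` and `pick_host` are hypothetical stand-ins for job submission and broker lookup, and the `max_attempts` limit is an assumption added so that the sketch always terminates.

```python
def run_grid_schedule(central_job, faculty_jobs, designated_host,
                      submit, pick_host, max_attempts=3):
    """Run the three scheduling stages and return a log of (job, host, succeeded).

    submit(job, host) -> bool   hypothetical stand-in for a job submission
    pick_host() -> host         hypothetical stand-in for the broker's best host
    """
    log = []

    def attempt(job, choose_host):
        for _ in range(max_attempts):
            host = choose_host()
            succeeded = submit(job, host)
            log.append((job, host, succeeded))
            if succeeded:
                return True
        return False

    # Stage 1: centralized job on the designated host, resubmitted on failure.
    if not attempt(central_job, lambda: designated_host):
        raise RuntimeError("centralized course scheduling job failed")

    # Stage 2: request every decentralized job once on the currently best host.
    failed = []
    for job in faculty_jobs:
        host = pick_host()
        succeeded = submit(job, host)
        log.append((job, host, succeeded))
        if not succeeded:
            failed.append(job)

    # Stage 3: resubmit failed jobs on the best host at that moment.
    for job in failed:
        if not attempt(job, pick_host):
            raise RuntimeError("job failed repeatedly: " + str(job))
    return log


hosts = iter(["m2", "m3", "m2", "m3"])
print(run_grid_schedule("central", ["faculty-A", "faculty-B"], "m2",
                        submit=lambda job, host: True,
                        pick_host=lambda: next(hosts)))
```

In a real grid the decentralized jobs run in parallel and Stage 3 polls their status; the sequential loop here only shows the order of decisions.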
3.8.3 Job and Resource Management<br />
The job and resource management submits a job to a particular resource, queries<br />
job status, and resubmits a job if it fails.<br />
FIGURE 3-28 Overview of GRAM and GASS
Job and resource management in the Java CoG Kit is done by using the Globus<br />
Resource Allocation Manager (GRAM) and the Global Access to Secondary Storage<br />
(GASS), as shown in Figure 3-28.<br />
GRAM is a module that provides remote execution and status<br />
management of the execution. When a job is submitted by a client, the request is sent<br />
to the remote host and handled by the gatekeeper daemon located on the remote host.<br />
The gatekeeper then creates a job manager to start and monitor the job. When the job<br />
is finished, the job manager sends the status information back to the client and<br />
terminates.<br />
3.8.3.1 Job<br />
In Globus terminology, a job is a binary executable or command to be run on a<br />
remote resource (machine). In order to run the job, the remote server must have the<br />
Globus Toolkit installed. The remote server is also referred to as a gatekeeper.<br />
In our case, we have two jobs that are executable programs: the centralized<br />
course scheduling program and the decentralized course scheduling program. Both are written in<br />
C. The centralized course scheduling program schedules courses whose<br />
lecturers are invited from other faculties and courses whose students come from other<br />
faculties. The decentralized course scheduling program, on the other hand, schedules the<br />
courses of each particular faculty that have not yet been scheduled by the centralized<br />
course scheduling program.<br />
3.8.3.2 The Resource Specification Language (RSL)<br />
RSL is the language used by clients to submit a job. All job submission<br />
requests are described in RSL, including the executable file and the conditions under which it<br />
must be executed.<br />
The following is a sample RSL string that requests the file<br />
decentralizedscheduling.exe to be executed once on a remote host. The directory of this file is<br />
also specified.<br />
&(executable = decentralizedscheduling.exe)<br />
(directory = /usr/study/coursescheduling)<br />
(arguments = facultyID)(count=1)<br />
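A client program typically assembles such a string from its parameters before submitting it. The helper below is a hypothetical Python sketch of that step; it only concatenates the four attributes used in the sample above and does not handle values that would need quoting (for example values containing spaces or parentheses).

```python
def build_rsl(executable, directory, arguments, count=1):
    """Assemble a simple RSL conjunction from the attributes used in the sample."""
    args = " ".join(str(argument) for argument in arguments)
    return (
        f"&(executable={executable})"
        f"(directory={directory})"
        f"(arguments={args})"
        f"(count={count})"
    )


rsl = build_rsl("decentralizedscheduling.exe", "/usr/study/coursescheduling", ["facultyID"])
print(rsl)
```

Here `facultyID` is kept as the placeholder from the sample; an actual submission would pass the identifier of the faculty to be scheduled.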
3.8.3.3 The Gatekeeper<br />
The gatekeeper daemon establishes secure communication between the clients<br />
and the servers. It communicates with the GRAM client and authenticates the right to<br />
submit jobs. After authentication, the gatekeeper forks and creates a job manager,<br />
delegating to it the authority to communicate with clients.<br />
The Java CoG Kit provides a personal gatekeeper that can be used as a<br />
lightweight alternative to the Globus gatekeeper. A gridmap file is used by the<br />
gatekeeper to map the Globus credentials to local users. The gridmap file is<br />
introduced in Appendix B.<br />
3.8.3.4 Job manager<br />
The job manager is created by the gatekeeper daemon as part of the job<br />
requesting process. It provides the interfaces that control the allocation of each local<br />
resource manager. The job manager functions are:<br />
a) Parse the RSL.<br />
b) Allocate job requests to the local resource managers. The local<br />
resource manager is usually a job scheduler like PBS, LSF, or LoadLeveler. However,<br />
our study does not use these job schedulers.<br />
c) Send callbacks to clients, if necessary.<br />
d) Receive the status and cancel requests from clients.<br />
e) Send output results to clients using the GASS, if requested.<br />
GRAM uses GASS to provide the mechanism for transferring the output<br />
file from servers to clients. Some APIs are provided under the Grid Security<br />
Infrastructure (GSI) protocol to furnish secure transfers.<br />
The complete source code for the job submission is given in the file<br />
GassJob.java in Appendix E.<br />
CHAPTER 4<br />
EXPERIMENTAL RESULTS<br />
The system for the experiment was installed and set up as outlined in section<br />
3.7. This chapter discusses some of the results of our genetic algorithm (GA) and the<br />
grid computing environment. Section 4.1 presents the data used for the experiments.<br />
Section 4.2 presents experiments and discussions. Section 4.3 presents sample results.<br />
4.1 The Data for the Experiments<br />
The data used for the experiments were collected from three departments of<br />
three different faculties: Department of English – Faculty of Education, Department<br />
of Electrical and Computer Engineering – Faculty of Engineering, and Department of<br />
Computer Science – Faculty of Science, in Cantho University (Vietnam). Twelve<br />
classes are scheduled to study 76 sections of the courses in their curricula in<br />
the first semester of 2006. They belong to the Bachelor of Science in Computer Science<br />
(BSCS04A, BSCS04B, BSCS05A, BSCS05B, BSCS06A, and BSCS06B) and<br />
Bachelor of Science in Electrical Engineering (BSEE04A, BSEE04B, BSEE05A,<br />
BSEE05B, BSEE06A, and BSEE06B) programs, as shown in Table 4-1.<br />
TABLE 4-1 Courses fulfilled by each class<br />
Class Semester Course Section Credits Number of Students<br />
BSCS04A 5 CSC329 001 3 30<br />
BSCS04A 5 CSC330 001 2 30<br />
BSCS04A 5 ENL307 001 3 30<br />
BSCS04A 5 CSC326 001 3 30<br />
BSCS04A 5 CSC327 001 2 30<br />
BSCS04A 5 CSC328 001 2 30<br />
BSCS04B 5 CSC326 002 3 30<br />
BSCS04B 5 CSC327 002 2 30<br />
BSCS04B 5 CSC328 002 2 30<br />
BSCS04B 5 CSC329 002 3 30<br />
BSCS04B 5 CSC330 002 2 30<br />
BSCS04B 5 ENL307 001 3 30<br />
BSCS05A 3 ECE218 001 2 30<br />
BSCS05A 3 MAT220 001 3 30<br />
BSCS05A 3 CSC211 002 4 30<br />
BSCS05A 3 CSC215 002 2 30<br />
BSCS05A 3 CSC221 002 3 30<br />
BSCS05A 3 ECE217 001 2 30<br />
BSCS05A 3 CSC210 002 3 30<br />
BSCS05B 3 CSC215 001 2 30<br />
BSCS05B 3 CSC221 001 3 30<br />
BSCS05B 3 ECE217 002 2 30<br />
BSCS05B 3 ECE218 002 2 30<br />
BSCS05B 3 MAT220 002 3 30<br />
BSCS05B 3 CSC211 001 4 30<br />
BSCS05B 3 CSC210 001 3 30<br />
BSCS06A 1 CSC120 002 3 30<br />
BSCS06A 1 CSC127 002 2 30<br />
BSCS06A 1 ENL101 001 3 30<br />
BSCS06A 1 MAT125 001 3 30<br />
BSCS06A 1 CSC110 002 2 30<br />
BSCS06A 1 CSC113 002 2 30<br />
BSCS06A 1 CSC115 002 2 30<br />
BSCS06B 1 MAT125 001 3 30<br />
BSCS06B 1 CSC113 001 2 30<br />
BSCS06B 1 CSC115 001 2 30<br />
BSCS06B 1 CSC120 001 3 30<br />
BSCS06B 1 CSC127 001 2 30<br />
BSCS06B 1 ENL101 001 3 30<br />
BSCS06B 1 CSC110 001 2 30<br />
BSEE04A 5 ECE320 001 2 30<br />
BSEE04A 5 ECE325 001 3 30<br />
BSEE04A 5 ECE326 001 2 30<br />
BSEE04A 5 ENL308 001 3 30<br />
BSEE04A 5 MAT322 001 2 30<br />
BSEE04A 5 SIE305 001 3 30<br />
BSEE04B 5 ECE320 002 2 30<br />
BSEE04B 5 ECE325 002 3 30<br />
BSEE04B 5 ECE326 002 2 30<br />
BSEE04B 5 ENL308 002 3 30<br />
BSEE04B 5 MAT322 002 2 30<br />
BSEE04B 5 SIE305 002 3 30<br />
BSEE05A 3 ECE212 001 3 30<br />
BSEE05A 3 MAT223 001 2 30<br />
BSEE05A 3 PHY241 001 3 30<br />
BSEE05A 3 ECE200 001 2 30<br />
BSEE05A 3 ECE205 001 2 30<br />
BSEE05A 3 ECE203 001 2 30<br />
BSEE05B 3 MAT223 002 2 30<br />
BSEE05B 3 PHY241 002 3 30<br />
BSEE05B 3 ECE200 002 2 30<br />
BSEE05B 3 ECE203 002 2 30<br />
BSEE05B 3 ECE205 002 2 30<br />
BSEE05B 3 ECE212 002 3 30<br />
BSEE06A 1 ENL101 002 3 30<br />
BSEE06A 1 MAT125 002 3 30<br />
BSEE06A 1 CHE103 006 3 30<br />
BSEE06A 1 CHE104 006 2 30<br />
BSEE06A 1 ECE120 001 3 30<br />
BSEE06A 1 ECE102 001 2 30<br />
BSEE06B 1 CHE103 005 3 30<br />
BSEE06B 1 CHE104 005 2 30<br />
BSEE06B 1 ECE102 002 2 30<br />
BSEE06B 1 ENL101 003 3 30<br />
BSEE06B 1 MAT125 002 3 30<br />
BSEE06B 1 ECE120 002 3 30<br />
Twenty-six lecturers are assigned to teach the courses. The classroom group used for each<br />
“course + section” is identified in Table 4-2.<br />
TABLE 4-2 Lecturer and classroom assignment<br />
Course Section Lecturer Room Group<br />
ENL101 001 00001 ENLLECRM<br />
ENL101 002 00001 ENLLECRM<br />
ENL101 003 00001 ENLLECRM<br />
ENL307 001 00003 ENLLECRM<br />
ENL308 001 00003 ENLLECRM<br />
ENL308 002 00003 ENLLECRM<br />
PHY241 002 00006 PHYLECRM<br />
PHY241 001 00007 PHYLECRM<br />
CSC110 001 00014 CSCLECRM<br />
CSC110 002 00014 CSCLECRM<br />
CSC113 002 00014 CSCCOMLB<br />
CSC115 002 00014 CSCLECRM<br />
CSC120 002 00015 CSCLECRM<br />
CSC127 001 00015 CSCLECRM<br />
CSC127 002 00015 CSCLECRM<br />
CSC210 001 00015 CSCLECRM<br />
CSC113 001 00016 CSCCOMLB<br />
CSC115 001 00016 CSCLECRM<br />
CSC120 001 00016 CSCLECRM<br />
CSC211 001 00016 CSCCOMLB<br />
CSC221 001 00017 CSCLECRM<br />
CSC221 002 00017 CSCLECRM<br />
CSC210 002 00018 CSCLECRM<br />
CSC211 002 00018 CSCCOMLB<br />
CSC215 001 00018 CSCLECRM<br />
CSC215 002 00018 CSCLECRM<br />
CSC326 001 00019 CSCLECRM<br />
CSC326 002 00019 CSCLECRM<br />
CSC327 001 00019 CSCLECRM<br />
CSC327 002 00019 CSCLECRM<br />
CSC329 001 00020 CSCLECRM<br />
CSC329 002 00020 CSCLECRM<br />
CSC330 001 00020 CSCLECRM<br />
CSC328 001 00021 CSCCOMLB<br />
CSC328 002 00021 CSCCOMLB<br />
CSC330 002 00021 CSCLECRM<br />
ECE120 001 00031 ECELECRM<br />
ECE120 002 00031 ECELECRM<br />
ECE200 001 00031 ECEESTLB<br />
ECE200 002 00031 ECEESTLB<br />
ECE102 001 00032 ECELECRM<br />
ECE102 002 00032 ECELECRM<br />
ECE205 002 00032 ECELECRM<br />
ECE212 001 00032 ECELECRM<br />
ECE203 001 00033 ECELECRM<br />
ECE203 002 00033 ECELECRM<br />
ECE205 001 00033 ECELECRM<br />
ECE212 002 00033 ECELECRM<br />
ECE217 001 00034 ECELECRM<br />
ECE217 002 00034 ECELECRM<br />
ECE218 001 00034 ECEDCDLB<br />
ECE218 002 00034 ECEDCDLB<br />
ECE320 001 00035 ECELECRM<br />
ECE320 002 00035 ECELECRM<br />
ECE325 001 00035 ECELECRM<br />
ECE325 002 00035 ECELECRM<br />
ECE326 001 00036 ECEELCLB<br />
ECE326 002 00036 ECEELCLB<br />
SIE305 001 00046 SIELECRM<br />
SIE305 002 00047 SIELECRM<br />
MAT125 001 00059 MATLECRM<br />
MAT125 002 00059 MATLECRM<br />
MAT220 001 00061 MATLECRM<br />
MAT220 002 00061 MATLECRM<br />
MAT223 001 00061 MATLECRM<br />
MAT223 002 00061 MATLECRM<br />
MAT322 001 00063 MATLECRM<br />
MAT322 002 00063 MATLECRM<br />
CHE103 005 00071 CHELECRM<br />
CHE103 006 00071 CHELECRM<br />
CHE104 005 00072 CHEFTCLB<br />
CHE104 006 00073 CHEFTCLB<br />
Similarly, constraints about classroom size and lecturer’s time are also prepared.<br />
4.2 The Experiments and Discussions<br />
4.2.1 Experimental Designs<br />
The aims of the experiments are to evaluate the influence of setting the GA<br />
parameters and the influence of the grid computing environment.<br />
The proposed GA that is presented in chapter 3 is applied to both the centralized<br />
course scheduling program and decentralized course scheduling program. In addition,<br />
the same values of the GA parameters will be applied to these programs. Thus, to<br />
evaluate the efficiency of the GA, we only need to test one of the above course<br />
scheduling programs. Here, we test the centralized course scheduling program. To<br />
evaluate the influence of the grid computing environment, we use the grid system as<br />
shown in section 3.7.<br />
We will do four separate experiments. The first experiment tests the influence of<br />
the weighting for hard and soft constraints in the fitness function. The second and third<br />
experiments test the influence of the population size and the mutation rate on the<br />
speed of evolution, respectively. Finally, the fourth experiment tests the influence of<br />
using the grid computing environment.<br />
Course scheduling is an NP-hard problem, and the GA itself is a metaheuristic<br />
algorithm. Therefore, we aim to obtain a good enough solution, if not the best<br />
one. Each experiment runs its models until the GA finds the best solution or until<br />
the GA cannot improve the fitness value in 300 consecutive generations. A model<br />
that reaches a given fitness value faster, averaged over several runs, is considered the better one.<br />
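The stopping rule described above can be sketched in Python as follows. This is only a minimal illustration and not the program used in this study: the `step` callable is hypothetical, and the target fitness of 1.0 is an assumption taken from the upper end of the fitness axis used in the charts of this chapter.<br />

```python
def run_until_converged(step, max_stall=300, target=1.0):
    """Run a GA until the target fitness is reached or progress stalls.

    `step` is a hypothetical callable that advances the GA by one
    generation and returns the best fitness found in that generation.
    `target` (assumed to be 1.0) stands for "the best solution".
    """
    best = float("-inf")
    stall = 0          # consecutive generations without improvement
    generations = 0
    while best < target and stall < max_stall:
        fitness = step()
        generations += 1
        if fitness > best:
            best, stall = fitness, 0
        else:
            stall += 1
    return best, generations
```

The function returns both the best fitness and the number of generations used, so several runs of the same model can be averaged and compared.<br />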
4.2.2 Experiment 1: Hard and Soft Constraint Weight Test<br />
The aim of this experiment is to analyze the behavior of the GA as weights W 1<br />
and W 2 in the fitness function $f(x) = W_1 f_1(x) + W_2 f_2(x)$ are modified. More details<br />
about this function were presented in section 3.6.4.<br />
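As a small worked example, the weighted sum can be written in Python as below. The function name and the sample fitness values are illustrative assumptions; the sketch also assumes that f 1 (x) and f 2 (x) both lie between 0 and 1, as the fitness axes in this chapter suggest.<br />

```python
def weighted_fitness(f1, f2, w1=0.75, w2=0.25):
    """Combine the hard (f1) and soft (f2) constraint fitness values."""
    return w1 * f1 + w2 * f2


# The three weight pairs tested in this experiment.
WEIGHT_PAIRS = [(1.0, 0.0), (0.75, 0.25), (0.5, 0.5)]

if __name__ == "__main__":
    # Hypothetical timetable: all hard constraints satisfied (f1 = 1.0),
    # soft constraints half satisfied (f2 = 0.5).
    for w1, w2 in WEIGHT_PAIRS:
        print(w1, w2, weighted_fitness(1.0, 0.5, w1, w2))
```

For this hypothetical timetable the overall fitness falls from 1.0 to 0.875 to 0.75 as W 2 grows, which shows how a larger soft constraint weight makes unmet soft constraints count more heavily.<br />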
To perform this experiment, the centralized course scheduling program will be<br />
run on one Pentium IV 1.7 GHz machine with the following GA settings:<br />
- Population size : 10<br />
- Crossover rate : 0.5<br />
- Mutation rate : 0.02<br />
- Selection method : Steady state<br />
- Hard constraint weight W 1 : Varied<br />
- Soft constraint weight W 2 : Varied<br />
This experiment is performed for 3 different pairs of weights as below:<br />
- W 1 =1.0 and W 2 =0.0<br />
- W 1 =0.75 and W 2 =0.25<br />
- W 1 =0.5 and W 2 =0.5<br />
Each pair of weights is tested 5 times. Figure 4-1 presents the average fitness<br />
value f 1 (x) of hard constraints over 500 generations.<br />
[Figure: line chart of the fitness value f1(x) (0 to 1) versus generation (1 to 501) for W1=1.0 & W2=0.0, W1=0.75 & W2=0.25, and W1=0.5 & W2=0.5]<br />
FIGURE 4-1 The average fitness value of hard constraints vs various weights<br />
This result shows that the GA reaches a high fitness value f 1 (x) more quickly when<br />
W 1 is larger. This is because the solutions that have a high fitness value of hard<br />
constraints have a greater chance of being selected for survival. When W 1 is 1.0, the GA<br />
gives the fastest evolution of hard constraints.<br />
Next, we consider the fitness value f 2 (x) of soft<br />
constraints. Figure 4-2 presents the average fitness values f 2 (x) over 500 generations.<br />
The result likewise shows that the GA reaches a high value f 2 (x) more quickly when<br />
W 2 is larger. When W 2 is 0.5, the GA gives the fastest evolution of soft constraints.<br />
[Figure: line chart of the fitness value f2(x) (0 to 1) versus generation (1 to 501) for W1=1.0 & W2=0.0, W1=0.75 & W2=0.25, and W1=0.5 & W2=0.5]<br />
FIGURE 4-2 The average fitness value of soft constraints vs various weights<br />
However, using a larger weight for the hard constraints means using a smaller<br />
weight for the soft constraints. We therefore have to balance hard and soft constraints.<br />
In our study, there are nine hard constraints and only one soft constraint. Therefore,<br />
the pair of W 1 =0.75 and W 2 =0.25 seems the most suitable one for our GA.<br />
4.2.3 Experiment 2: Population Size Test<br />
The aim of this experiment is to analyze the behavior of the GA as the population<br />
size is modified.<br />
To perform this experiment, the centralized course scheduling program will be<br />
run on one Pentium IV 1.7 GHz machine with the following GA settings:<br />
- Crossover rate : 0.5<br />
- Mutation rate : 0.02<br />
- Selection method : Steady state<br />
- Hard constraint weight W 1 : 0.75<br />
- Soft constraint weight W 2 : 0.25<br />
- Population size : Varied<br />
This experiment is performed for 3 different population sizes: 5, 10 and 15.<br />
Each population size is tested 5 times. The average execution time for a<br />
resultant solution at each population size is charted in Figure 4-3.<br />
[Figure: bar chart of the average execution time in seconds for each population size: 5 (2652.8), 10 (2842.6), 15 (5829)]<br />
FIGURE 4-3 The average execution time for a resultant solution vs population sizes<br />
We know that a large population contains many different individuals. This<br />
creates a diversity of possible solutions. Using a large population size, the GA can<br />
obtain a resultant solution after a small number of generations. However, our<br />
experiment shows that, in terms of time, the GA with a small population size converges<br />
to a solution faster than the GA with a large population. To explain this result, we<br />
should revisit the chromosome representation presented in section 3.6.1. Each<br />
chromosome directly represents a timetable, that is, a solution, so it stores a large amount<br />
of data. It also carries a large amount of related data from the database. As a result, a<br />
larger population needs more memory and more processing time for GA operations.<br />
This experiment also shows that the smallest population size (five) gives<br />
the fastest GA.<br />
GAs with a large population do not evolve faster.<br />
However, in order to have diversity of solutions, it may be safe to keep the population<br />
size larger than the optimum size, although it is a little slower to execute. We will use<br />
a population of 10 for our GA.<br />
4.2.4 Experiment 3: Mutation Rate Test<br />
The aim of this experiment is to analyze the behavior of the GA as the mutation rate<br />
is modified.<br />
To perform this experiment, the centralized course scheduling program will be<br />
run on one Pentium IV 1.7 GHz machine with the following GA settings:<br />
- Population size : 10<br />
- Crossover rate : 0.5<br />
- Selection method : Steady state<br />
- Hard constraint weight W 1 : 0.75<br />
- Soft constraint weight W 2 : 0.25<br />
- Mutation rate : Varied<br />
This experiment is performed for 4 different mutation rates: 0.00, 0.02, 0.20 and<br />
0.40. Each rate is tested 5 times. The chart of the average fitness value f(x) over 500<br />
generations for the different mutation rates is given in Figure 4-4.<br />
The best mutation rate is found to be 0.02. Mutation rates that are lower or<br />
higher than this rate give slower evolution, and the results show this clearly. If there is no<br />
mutation (0.00), offspring are generated immediately after crossover without any<br />
change. Therefore, the GA tends to get trapped in a local optimum. On the other hand,<br />
high mutation rates cause excessive exploration of the search space. The GA then degenerates<br />
toward a random search instead of searching from the offspring of good parents.<br />
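The effect of the mutation rate can be illustrated with a minimal Python sketch. It is not the mutation operator of this study: it assumes, purely for illustration, that a chromosome is a list of (day, time-slot) genes with 5 days and 8 time-slots per day, and it redraws each gene at random with probability equal to the mutation rate.<br />

```python
import random


def mutate(chromosome, rate, n_days=5, n_slots=8, rng=random):
    """Return a copy in which each gene is redrawn with probability `rate`.

    Assumed representation (hypothetical): one (day, slot) pair per
    course section, with days 0..n_days-1 and slots 0..n_slots-1.
    """
    child = []
    for gene in chromosome:
        if rng.random() < rate:
            gene = (rng.randrange(n_days), rng.randrange(n_slots))
        child.append(gene)
    return child
```

With a rate of 0.00 the offspring is returned unchanged, so no new genetic material enters the population. With a rate close to 1.0 almost every gene is redrawn, so the offspring keeps little of what its parents had learned.<br />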
[Figure: line chart of the fitness value f(x) (0 to 1) versus generation (1 to 501) for mutation rates 0.00, 0.02, 0.20 and 0.40]<br />
FIGURE 4-4 The GA with various mutation rates<br />
4.2.5 Experiment 4: Parallel Execution on the Grid Computing Environment<br />
The aim of this experiment is to evaluate the influence of the grid computing<br />
environment on the resultant solutions.<br />
The experiment tests three different models. The first model uses a single<br />
machine to perform the centralized course scheduling strategy introduced in section<br />
3.4.1. The centralized course scheduling program is used to test a centralized<br />
execution that schedules all courses. The second model also uses a single machine,<br />
but both the centralized course scheduling program and the decentralized course<br />
scheduling program are used for a serial execution. First, the centralized course<br />
scheduling program schedules all shared resources, and then, one after another, the<br />
decentralized course scheduling program schedules the remaining resources of<br />
each faculty. Finally, the third model uses a grid computing environment for parallel<br />
execution. First, the centralized course scheduling program is executed on a machine,<br />
and then the decentralized course scheduling program is executed in parallel on<br />
remote machines.<br />
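The ordering of jobs in the third model can be sketched in Python as follows. This is only an analogy under stated assumptions: local threads stand in for the remote grid machines, the grid middleware is not modelled, and `schedule_shared` and `schedule_faculty` are hypothetical stand-ins for the centralized and decentralized course scheduling programs.<br />

```python
from concurrent.futures import ThreadPoolExecutor


def run_hybrid(schedule_shared, schedule_faculty, faculties):
    """Run the centralized job first, then one decentralized job per faculty.

    `schedule_shared()` returns the timetable of shared resources;
    `schedule_faculty(name, shared)` schedules the remaining resources
    of one faculty around it. Both callables are hypothetical.
    """
    shared = schedule_shared()  # must finish before any faculty job starts
    with ThreadPoolExecutor(max_workers=max(1, len(faculties))) as pool:
        futures = {
            name: pool.submit(schedule_faculty, name, shared)
            for name in faculties
        }
        results = {name: fut.result() for name, fut in futures.items()}
    return shared, results
```

Passing the shared timetable to every faculty job reflects the requirement that the decentralized results must not conflict with the centralized output.<br />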
Both the centralized course scheduling program and the decentralized course<br />
scheduling program are set up with the following GA settings:<br />
- Population size : 10<br />
- Crossover rate : 0.5<br />
- Mutation rate : 0.02<br />
- Selection method : Steady state<br />
- Hard constraint weight W 1 : 0.75<br />
- Soft constraint weight W 2 : 0.25<br />
The first and second models are performed on a Pentium IV 1.7 GHz machine.<br />
The third model, on the other hand, is performed on a grid computing environment of 3<br />
machines, as shown in Figure 3-23. The Central Manager Host m1 is a Pentium III<br />
700 MHz machine. The remote machines m2 and m3 are Pentium IV 1.7 GHz<br />
machines.<br />
Figure 4-5 presents a chart of the average execution time of each model over 5<br />
runs. Each model is executed until the GA finds a resultant solution.<br />
[Figure: bar chart of the average execution time in seconds for each model: Centralized Execution (2842.6), Serial Execution (852.6), Parallel Execution on the Grid (439.6)]<br />
FIGURE 4-5 The execution time versus various models<br />
The first model is slower than the second model (2842.6 seconds versus 852.6 seconds<br />
on average). The first model has a global view of the whole data, so it might be expected<br />
to give a resultant solution within a short time. However, it did not. This is because when the whole data<br />
are centralized to be processed on a single machine, the size of the problem becomes<br />
too big. The GA is slowed down when it works on large chromosomes<br />
with a large number of conflicting hard and soft constraints. However, if the data are<br />
separated to be processed one after another by the centralized course scheduling<br />
program and the decentralized course scheduling program, the overall execution time<br />
is shorter.<br />
The parallel execution of the third model is significantly faster than the serial<br />
execution of the second model (439.6 seconds versus 852.6 seconds). This is as expected:<br />
instead of all course scheduling jobs being performed one after another, some of them are performed in<br />
parallel by different processors, as illustrated in Figure 4-6.<br />
[Figure: timeline of processors versus execution time. In the parallel execution, the centralized course scheduling program runs first, and then the decentralized course scheduling programs for the Faculty of Science, the Faculty of Engineering and the Faculty of Education run at the same time. In the serial execution, the same programs run one after another.]<br />
FIGURE 4-6 Parallel execution versus serial execution<br />
The total execution time for a complete resultant solution of the third model can<br />
be expressed as follows:<br />
Total parallel execution time = Time for the centralized course scheduling +<br />
Max(Time for the decentralized course scheduling on remote machines)<br />
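The formula above, and the corresponding sum for the serial model, can be written as a short Python sketch. The function names and the sample job times are illustrative assumptions; only the three averages in `AVERAGE_SECONDS` are taken from Figure 4-5.<br />

```python
def serial_time(central, decentralized):
    """Total time when every job runs one after another."""
    return central + sum(decentralized)


def parallel_time(central, decentralized):
    """Total time when the decentralized jobs run at the same time."""
    return central + max(decentralized, default=0)


# Average execution times in seconds reported in Figure 4-5.
AVERAGE_SECONDS = {"centralized": 2842.6, "serial": 852.6, "parallel": 439.6}


def speedup(model, baseline):
    """How many times faster `model` is than `baseline`."""
    return AVERAGE_SECONDS[baseline] / AVERAGE_SECONDS[model]


if __name__ == "__main__":
    # Hypothetical job times: 100 s centralized, 200 s and 150 s decentralized.
    print(serial_time(100, [200, 150]), parallel_time(100, [200, 150]))
    print(round(speedup("parallel", "serial"), 2))
    print(round(speedup("parallel", "centralized"), 2))
```

With the averages from Figure 4-5, the parallel model is roughly 1.9 times faster than the serial model and roughly 6.5 times faster than the centralized model.<br />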
The data used for the course scheduling programs is transferred from the<br />
central database to the remote machines once, before it is processed. In addition,<br />
no data is exchanged while the programs are being executed. The time<br />
for network communication is much smaller than the execution time of each program,<br />
so this time is not considered in this experiment.<br />
4.3 The Sample Results<br />
This section presents the results obtained by running the third model<br />
presented in the previous section.<br />
First of all, the centralized course scheduling program is executed on machine<br />
m2. It schedules the shared resources, which consist of courses whose lecturers are<br />
invited from other faculties and courses whose students come from other faculties.<br />
The results are presented in Table 4-3. Then the decentralized course scheduling<br />
program is submitted to be executed in parallel on the machines m2 and m3. It<br />
schedules the remaining resources of each faculty. All courses taught by the<br />
Faculty of Education have been scheduled by the centralized course scheduling<br />
program, so the decentralized course scheduling program only schedules<br />
courses taught by the Faculty of Engineering and the Faculty of Science. The results<br />
are presented in Table 4-4 and Table 4-5.<br />
TABLE 4-3 Timetable created by the centralized course scheduling program<br />
Course Section Classroom Day Time-slot Class Lecturer<br />
ENL307 001 B201A01 3 4->6 BSCS04A 00003<br />
ENL307 001 B201A01 3 4->6 BSCS04B 00003<br />
ECE218 001 B301B02 4 2->3 BSCS05A 00034<br />
ECE217 001 B301A07 2 4->5 BSCS05A 00034<br />
ECE218 002 B301B02 1 6->7 BSCS05B 00034<br />
ECE217 002 B301A06 1 2->3 BSCS05B 00034<br />
ENL101 001 B201A01 2 4->6 BSCS06A 00001<br />
ENL101 001 B201A01 2 4->6 BSCS06B 00001<br />
MAT322 001 B101A09 0 6->7 BSEE04A 00063<br />
ENL308 001 B201A03 4 0->2 BSEE04A 00003<br />
ENL308 002 B201A03 4 4->6 BSEE04B 00003<br />
MAT322 002 B101A10 4 2->3 BSEE04B 00063<br />
MAT223 001 B101A12 1 4->5 BSEE05A 00061<br />
PHY241 001 B102A04 2 0->2 BSEE05A 00007<br />
MAT223 002 B101A08 0 2->3 BSEE05B 00061<br />
PHY241 002 B102A06 3 4->6 BSEE05B 00006<br />
CHE104 006 B103A15 0 2->3 BSEE06A 00073<br />
ENL101 002 B201A02 3 0->2 BSEE06A 00001<br />
MAT125 002 B101A01 2 0->2 BSEE06A 00059<br />
CHE103 006 B103A06 0 4->6 BSEE06A 00071<br />
MAT125 002 B101A01 2 0->2 BSEE06B 00059<br />
CHE103 005 B103A01 4 0->2 BSEE06B 00071<br />
CHE104 005 B103A11 4 6->7 BSEE06B 00072<br />
ENL101 003 B201A01 1 0->2 BSEE06B 00001<br />
TABLE 4-4 Timetable created by the decentralized course scheduling program for<br />
Faculty of Engineering<br />
Course Section Classroom Day Time-slot Class Lecturer<br />
ECE325 001 B301A04 3 0->2 BSEE04A 00035<br />
ECE326 001 B301B01 2 4->5 BSEE04A 00036<br />
SIE305 001 B302A03 4 4->6 BSEE04A 00046<br />
ECE320 001 B301A01 1 2->3 BSEE04A 00035<br />
ECE320 002 B301A10 0 2->3 BSEE04B 00035<br />
ECE325 002 B301A10 2 0->2 BSEE04B 00035<br />
SIE305 002 B302A02 1 4->6 BSEE04B 00047<br />
ECE326 002 B301B01 4 0->1 BSEE04B 00036<br />
ECE212 001 B301A01 3 4->6 BSEE05A 00032<br />
ECE203 001 B301A02 4 0->1 BSEE05A 00033<br />
ECE200 001 B301B05 4 4->5 BSEE05A 00031<br />
ECE205 001 B301A01 1 0->1 BSEE05A 00033<br />
ECE205 002 B301A01 4 4->5 BSEE05B 00032<br />
ECE212 002 B301A09 2 0->2 BSEE05B 00033<br />
ECE200 002 B301B05 4 6->7 BSEE05B 00031<br />
ECE203 002 B301A08 0 4->5 BSEE05B 00033<br />
ECE102 001 B301A07 4 6->7 BSEE06A 00032<br />
ECE120 001 B301A08 2 4->6 BSEE06A 00031<br />
ECE120 002 B301A08 3 0->2 BSEE06B 00031<br />
ECE102 002 B301A01 0 0->1 BSEE06B 00032<br />
TABLE 4-5 Timetable created by the decentralized course scheduling program for<br />
Faculty of Science<br />
Course Section Classroom Day Time-slot Class Lecturer<br />
CSC328 001 B104B18 2 2->3 BSCS04A 00021<br />
CSC326 001 B104B05 1 0->2 BSCS04A 00019<br />
CSC329 001 B104B11 0 0->2 BSCS04A 00020<br />
CSC327 001 B104B05 4 6->7 BSCS04A 00019<br />
CSC330 001 B104B02 4 2->3 BSCS04A 00020<br />
CSC328 002 B104B16 1 2->3 BSCS04B 00021<br />
CSC329 002 B104B09 0 4->6 BSCS04B 00020<br />
CSC326 002 B104B10 2 4->6 BSCS04B 00019<br />
CSC330 002 B104B03 4 6->7 BSCS04B 00021<br />
CSC327 002 B104B01 2 2->3 BSCS04B 00019<br />
CSC210 002 B104B06 3 4->6 BSCS05A 00018<br />
CSC215 002 B104B04 4 6->7 BSCS05A 00018<br />
CSC221 002 B104B03 1 4->6 BSCS05A 00017<br />
MAT220 001 B101A02 0 4->6 BSCS05A 00061<br />
CSC211 002 B104B17 2 0->3 BSCS05A 00018<br />
MAT220 002 B101A11 2 4->6 BSCS05B 00061<br />
CSC221 001 B104B09 2 0->2 BSCS05B 00017<br />
CSC211 001 B104B15 3 4->7 BSCS05B 00016<br />
CSC210 001 B104B08 0 0->2 BSCS05B 00015<br />
CSC215 001 B104B04 3 2->3 BSCS05B 00018<br />
MAT125 001 B101A03 4 4->6 BSCS06A 00059<br />
CSC120 002 B104B07 3 4->6 BSCS06A 00015<br />
CSC115 002 B104B12 1 0->1 BSCS06A 00014<br />
CSC110 002 B104B06 3 0->1 BSCS06A 00014<br />
CSC127 002 B104B01 4 2->3 BSCS06A 00015<br />
CSC113 002 B104B14 4 0->1 BSCS06A 00014<br />
CSC120 001 B104B12 4 0->2 BSCS06B 00016<br />
CSC110 001 B104B08 3 6->7 BSCS06B 00014<br />
MAT125 001 B101A03 4 4->6 BSCS06B 00059<br />
CSC127 001 B104B04 1 4->5 BSCS06B 00015<br />
CSC113 001 B104B14 3 2->3 BSCS06B 00016<br />
CSC115 001 B104B08 2 0->1 BSCS06B 00016<br />
These results show that all constraints presented in section 1.3 have been<br />
satisfied. Every “course + section” is scheduled exactly once in a week. No course is<br />
scheduled across the morning and afternoon working sessions. No class,<br />
lecturer or classroom is assigned to more than one course at the same time. For<br />
example, as shown in Table 4-3, section 001 of course ENL308 is scheduled for lecturer<br />
00003 using classroom B201A03 on day 4 (Friday) in time-slots 0, 1, and 2.<br />
Therefore, this lecturer and this classroom are not booked for other courses at this<br />
time.<br />
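The "no double booking" check described above can be sketched in Python as follows. This is a hypothetical helper, not part of the scheduling programs of this study. It assumes that a time-slot range such as 0->2 covers slots 0, 1 and 2, and it treats the same course section attended by several classes as a single booking. The sample rows are taken from Table 4-3.<br />

```python
from collections import defaultdict

# (course, section, classroom, day, first_slot, last_slot, class, lecturer)
SAMPLE_ROWS = [
    ("ENL307", "001", "B201A01", 3, 4, 6, "BSCS04A", "00003"),
    ("ENL307", "001", "B201A01", 3, 4, 6, "BSCS04B", "00003"),
    ("ENL308", "001", "B201A03", 4, 0, 2, "BSEE04A", "00003"),
    ("ENL308", "002", "B201A03", 4, 4, 6, "BSEE04B", "00003"),
]


def find_conflicts(rows):
    """Return the resource bookings shared by more than one course section.

    Keys are (kind, resource, day, slot); values are the sets of
    (course, section) pairs that claim that resource at that time.
    """
    usage = defaultdict(set)
    for course, section, room, day, first, last, cls, lecturer in rows:
        for slot in range(first, last + 1):
            for kind, resource in (
                ("classroom", room),
                ("class", cls),
                ("lecturer", lecturer),
            ):
                usage[(kind, resource, day, slot)].add((course, section))
    return {key: users for key, users in usage.items() if len(users) > 1}
```

For the sample rows the function returns an empty dictionary. Adding a made-up row that uses classroom B201A03 on day 4 in time-slots 2 and 3 would be reported as a clash with ENL308 section 001 in time-slot 2.<br />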
When a class of students takes a list of courses, these courses have to be<br />
scheduled in different time periods. For example, as shown in Table 4-1, class BSCS05B<br />
studies section 001 of courses CSC215, CSC221, CSC210, and CSC211, and section<br />
002 of courses ECE217, ECE218, and MAT220. Therefore, these “course + section”<br />
pairs are scheduled in different time periods. Another example is shown in Table 4-3.<br />
Section 001 of course ENL307 is attended by both classes BSCS04A and BSCS04B.<br />
Therefore, this course section is scheduled in the same time period and the same<br />
classroom for both classes, so that they can attend it together as well as their other courses.<br />
Other constraints presented in section 1.3 have also been satisfied, but they are<br />
not discussed here.<br />
The decentralized course scheduling program must give results that do not<br />
conflict with the centralized course scheduling output. If a class is scheduled by the<br />
centralized course scheduling program, then the decentralized course scheduling<br />
program has to schedule the remaining courses of this class at other times.<br />
For example, as shown in Table 4-3, the centralized course scheduling program<br />
scheduled some of the courses attended by class BSEE06A. Therefore, the<br />
decentralized course scheduling program scheduled the other courses studied by this class<br />
at other times, as shown in Table 4-4.<br />
CHAPTER 5<br />
CONCLUSION<br />
5.1 Conclusions<br />
This study proposed a hybrid centralized and decentralized approach, a genetic<br />
algorithm, and a grid computing environment for course scheduling in multiple<br />
faculty universities.<br />
The proposed GA demonstrated its ability to solve a complex optimization<br />
problem, the highly constrained course scheduling problem. The direct representation<br />
of chromosomes is convenient for representing the large number of constraints of a<br />
realistic timetable. Additional constraints can easily be added without<br />
much modification to the basic model.<br />
The speed of evolution of the GA depends significantly on the GA<br />
parameters used. GAs with large populations do not converge<br />
faster. However, in order to have diversity of solutions, it may be safe to keep<br />
the population size larger than the optimum size, although it is a little slower. The<br />
experiments also show that mutation is very important for the GA. A small<br />
enough rate is effective. No mutation or mutation at high rates gives slower<br />
evolution. The weighting of hard and soft constraints in the fitness function should be<br />
based on their number and importance. The hard constraints should be<br />
weighted more heavily than the soft constraints.<br />
The hybrid centralized and decentralized approach was used. The centralized<br />
course scheduling program only schedules the shared resources, whereas the<br />
decentralized course scheduling program schedules the remaining resources of each<br />
faculty. The results showed that this approach gave the expected solutions without<br />
any constraint conflicts between resources across the university. The resultant<br />
solution allows lecturers to teach not only at their own faculty but also at other faculties. A<br />
course can be attended by many different classes.<br />
The grid computing environment is used as infrastructure for distributed and<br />
parallel computing. The hybrid centralized and decentralized approach is combined<br />
with the grid computing environment: the centralized course<br />
scheduling program and the decentralized course scheduling program are treated as<br />
jobs, which are scheduled for execution. The centralized course scheduling job<br />
is performed first, and then the decentralized course scheduling jobs are performed in<br />
parallel on separate machines. The decentralized course scheduling program must<br />
give results that do not conflict with the centralized course scheduling output.<br />
The use of the grid computing environment gave a high level of efficiency. It<br />
significantly reduces the overall execution time for a resultant solution. This is<br />
because a very large problem with many conflicting constraints is separated into<br />
smaller problems that are processed in parallel by different machines instead of<br />
on only one machine.<br />
5.2 Future Works<br />
Overall, our preliminary experiments suggest that the proposed model<br />
satisfies the objectives of our proposal. We have worked on two<br />
interesting areas: genetic algorithms and grid computing. They are wide areas,<br />
so what has been obtained is a foundation for further research.<br />
Our experiments identified GA parameters for an effective GA. Further<br />
experiments should be done with various data and more soft constraints. We also need<br />
to design algorithms that can automatically identify suitable values for the GA<br />
parameters.<br />
Local search techniques should be used to improve the speed of the GA. The<br />
local search algorithms should also help the GA create solutions that<br />
minimize the use of university resources, e.g. the number of classrooms used and the<br />
stretch of lecturer time.<br />
To satisfy both hard and soft constraints in a balanced way, the multi-objective<br />
genetic algorithm should be researched.<br />
The grid computing environment was implemented on Linux machines. For<br />
more flexible use, it should be developed for heterogeneous environments with more<br />
machines added.<br />
REFERENCES<br />
1. Alkan, A. and Ozcan, E. “Memetic Algorithms for Timetabling.” IEEE Congress<br />
on Evolutionary Computation. 3 (2003, December 8-12) : 1796-1802.<br />
2. Marc Buf, Tim Fischer, et al. “Automated solution of a highly constrained school<br />
timetabling problem - preliminary results.” Applications of Evolutionary<br />
Computing : EvoWorkshops 2001: EvoCOP, EvoFlight, EvoIASP, EvoLearn,<br />
and EvoSTIM, Como, Italy. (2001, April 18-20) : 431-440.<br />
3. Goulas, G. and Housos, E. “SchedSP: Providing GRID-enabled Real - World<br />
Scheduling Solutions as Application Services.” EuroWeb 2002 Conference,<br />
St Anne's College, Oxford, UK. (2002, December 17-18).<br />
4. Kaplansky, E., Kendall, G., et al. “Distributed Examination Timetabling.”<br />
PATAT '04 Proceedings of the 5th International Conference on the Practice<br />
and Theory of Automated Timetabling, Pittsburgh, PA USA. (2004, August<br />
18-20) : 511-516.<br />
5. Lim, A., Ang, J. C., et al. “UTTSExam: A Campus-Wide University Exam-<br />
Timetabling System”. Proceedings of the Eighteenth National Conference<br />
on Artificial Intelligence and Fourteenth Conference on Innovative<br />
Applications of Artificial Intelligence, Edmonton, Alberta, Canada. (2002,<br />
July 28 - August 1) : 838-844.<br />
6. Genetic Algorithm [Online]. Available from:<br />
http://cs.felk.cvut.cz/~xobitko/ga/gaintro.html [2005, May 2].<br />
7. Luis Ferreira, et al. Introduction to Grid Computing with Globus. IBM Redbooks,<br />
September 2003.<br />
8. Bart Jacob, et al. Enabling Applications for Grid Computing with Globus. IBM<br />
Redbooks, June 2003.<br />
9. Carter, M. W. and Laporte, G. “Recent Developments in Practical Course<br />
Timetabling.” In Edmund Burke and Michael Carter, editors, Practice and<br />
Theory of Automated Timetabling II, Springer-Verlag LNCS. 1408 (1998) :<br />
3-19.<br />
10. Carter, M. W. “A Survey of Practical Applications of Examination Timetabling<br />
Algorithms.” Operations Research. 34 (1986) : 193-202.<br />
11. Burke, E. K., Elliman, D. G., et al. “University Timetabling System Based on<br />
Graph Colouring and Constraint Manipulation.” Journal of Research on<br />
Computing in Education. 27(1) (1993) : 1-18.<br />
12. Burke, E. K., Dror, M., et al. “Hybrid Graph Heuristics within a Hyper-heuristic<br />
Approach to Exam Timetabling Problems.” The Next Wave in Computing,<br />
Optimization, and Decision Technologies. (2005) : 79-91.<br />
13. Redl, T. A. “A Study of University Timetabling that Blends Graph Coloring with<br />
the Satisfaction of Various Essential and Preferential Conditions.”<br />
PhD.Thesis, Rice University, Houston, Texas, 2004.<br />
14. Balakrishnan, N., Lucena, A. and Wong, R. T. “Scheduling Examinations to<br />
Reduce Second-Order Conflicts.” Computers & Operations Research. 19<br />
(1992) : 353-361.<br />
15. Arani, T. and Lotfi, V. “A Three Phased Approach to Final Exam Scheduling.”<br />
IIE Trans. 21 (1989) : 86-96.<br />
16. Sally C. Brailsford, Chris N. Potts, et al. “Constraint Satisfaction Problems:<br />
Algorithms and Applications.” European Journal of Operational Research.<br />
119 (1999) : 557-581.<br />
17. White, G. M. “Constrained Satisfaction, Not So Constrained Satisfaction and the<br />
Timetabling Problem.” PATAT '00 Proceedings of the 3rd International<br />
Conference on the Practice and Theory of Automated Timetabling, Konstanz,<br />
Germany. 1 (2000, August 16-18) : 32-47.<br />
18. Valouxis, C. and Housos, E. “Constraint Programming Approach for School<br />
Timetabling.” Computers & Operations Research. 30(1) (2003, September) :<br />
1555–1572.<br />
19. Gueret, C., Jussien, N., et al. “Building University timetables using Constraint<br />
Logic Programming.” Proceedings of the First International Conference on<br />
the Practice and Theory of Automated Timetabling (ICPTAT '95), France.<br />
(1995) : 393-408.
20. Burke, E. K. and Newall, J. P. “A Multi-Stage Evolutionary Algorithm for the<br />
Timetable Problem.” The IEEE Transactions on Evolutionary Computation.<br />
3(1) (1999, April) : 63-74.<br />
21. Paechter, B., Rankin, R. C. and Cumming, A. “Improving a Lecture Timetabling<br />
System for University-Wide Use.” In: Burke, E., Carter, M. (eds.): The<br />
Practice and Theory of Automated Timetabling II: Selected Papers<br />
(PATAT ’97, University of Toronto), Lecture Notes in Computer Science,<br />
Springer-Verlag, Berlin Heidelberg New York. 1408 (1998) : 156-165.<br />
22. Ross, P., Hart, E. and Corne, D. “Some Observations about GA based<br />
Timetabling.” In: Burke, E., Carter, M. (eds.): The Practice and Theory of<br />
Automated Timetabling II: Selected Papers (PATAT ’97, University of<br />
Toronto), Lecture Notes in Computer Science, Springer-Verlag, Berlin<br />
Heidelberg New York. 1408 (1998) : 115-129.<br />
23. Elmohamed, S., Coddington, P. and Fox, G. “A Comparison of Annealing<br />
Techniques for Academic Course Scheduling.” In: Burke, E., Carter, M.<br />
(eds.): The Practice and Theory of Automated Timetabling II: Selected<br />
Papers (PATAT ’97, University of Toronto), Lecture Notes in Computer<br />
Science, Springer-Verlag, Berlin Heidelberg New York. 1408 (1998) : 92-<br />
112.<br />
24. White, G. M. and Zhang, J. “Generating Complete University Timetables by<br />
Combining Tabu Search with Constraint Logic.” In: Burke, E., Carter, M.<br />
(eds.): The Practice and Theory of Automated Timetabling II: Selected<br />
Papers (PATAT ’97, University of Toronto), Lecture Notes in Computer<br />
Science, Springer-Verlag, Berlin Heidelberg New York. 1408 (1998) : 187-<br />
210.<br />
25. Dowsland, K. A. “Off the Peg or Made to Measure.” In: Burke, E., Carter, M.<br />
(eds.): The Practice and Theory of Automated Timetabling II: Selected<br />
Papers (PATAT ’97, University of Toronto), Lecture Notes in Computer<br />
Science, Springer-Verlag, Berlin Heidelberg New York. 1408 (1998) : 37-52.<br />
26. Elmohamed, S., et al. “A Comparison of Annealing Techniques for Academic<br />
Course Scheduling.” Lecture Notes in Computer Science. 1408 (1998) : 92-<br />
114.
27. Abramson, D. “Constructing School Timetables using Simulated Annealing:<br />
Sequential and Parallel Algorithms.” Management Science. 37(1) (1991,<br />
January) : 98 – 113.<br />
28. Aydin, M. E. “A Distributed Evolutionary Simulated Annealing Algorithm for<br />
Combinatorial Optimisation Problems.” Journal of Heuristics. 10 (2004) :<br />
269–292.<br />
29. Calaor, A. E., Hermosilla, A.Y., et al. “Parallel Hybrid Adventures with<br />
Simulated Annealing and Genetic Algorithms.” Proceedings of the<br />
International Symposium on Parallel Architectures, Algorithms and<br />
Networks (ISPAN.02). (2002, May 22-24) : 33-38.<br />
30. Alvarez-Valdes, R. “A Tabu Search Algorithm to Schedule University<br />
Examinations.” QUESTIIO. 21 (1997) : 201-215.<br />
31. Burke, E. K., Kendall, G. and Soubeiga, E. “Tabu-Search Hyperheuristic for<br />
Timetabling and Rostering.” Journal of Heuristics. 9 (2003) : 451–470.<br />
32. Tabu Search [Online]. Available from:<br />
http://www.cs.sandia.gov/opt/survey/ts.html [2005, September 12].<br />
33. Wang, Y. Z. “A GA-based methodology to determine an optimal curriculum for<br />
schools.” Expert Systems with Applications. 28 (2005) : 163–174.<br />
34. Tuan, D. A. and Kim, H. L. “Combining Constraint Programming and Simulated<br />
Annealing on University Exam Timetabling.” International Conference,<br />
RIVF’04, Hanoi, Vietnam. (2004, February 2-5) : 205-210.<br />
35. Kaplansky, E. and Meisels, A. “Negotiation among Scheduling Agents for<br />
Distributed Timetabling.” In Submitted to the 5th International Conference<br />
on the Practice and Theory of Automated Timetabling PATAT'04, Pittsburgh,<br />
PA USA. (2004, August) : 84-105.<br />
36. Marczyk, A. Genetic Algorithms and Evolutionary Computation [Online].<br />
Available from: http://www.talkorigins.org/faqs/genalg/genalg.html [2005,<br />
September 18].<br />
37. Esposito, A. and Tarricone, L. “Grid Computing for Electromagnetics: A<br />
Beginner’s Guide with Applications.” IEEE Antennas and Propagation<br />
Magazine. 45(2) (2003, April) : 91-100.
38. Globus Toolkit [Online]. Available from: http://www.globus.org [2005,<br />
September 20].<br />
39. Foster, I., Kesselman, C. and Tuecke, S. “The Anatomy of the Grid: Enabling<br />
Scalable Virtual Organizations.” International Journal of High Performance<br />
Computing Applications. 15(3) (2001) : 200-222.<br />
40. Hamscher, V., Schwiegelshohn, U., et al. “Evaluation of Job-Scheduling<br />
Strategies for Grid Computing.” In Proceedings of the 7th International<br />
Conference on High Performance Computing HiPC-2000, Springer, Berlin,<br />
Lecture Notes in Computer Science LNCS 1971, Bangalore, India. (2000,<br />
December) : 192-202.
APPENDIX A<br />
DATA DICTIONARY
This section presents the structure of the database tables created for<br />
the entity relationship diagram shown in Figure 3-5.<br />
A.1 Faculty<br />
TABLE A-1 Faculty<br />
Table: Faculty<br />
Field | Type | Key | Description<br />
FacultyID | char(2) | Primary | ID of faculty<br />
FacultyName | char(100) | | Name of faculty<br />
The university has several faculties, e.g. Faculty of Science, Faculty of<br />
Engineering, and Faculty of Education.<br />
A.2 Department<br />
TABLE A-2 Department<br />
Table: Department<br />
Field | Type | Key | Description<br />
DeptID | char(3) | Primary | ID of department<br />
DeptName | char(255) | | Name of department<br />
FacultyID | char(2) | Foreign | ID of faculty<br />
Each faculty has several departments that include a set of lecturers and courses<br />
within the same scientific domain, e.g. Department of Computer Science, Department<br />
of Mathematics, and Department of Physics.
A.3 Lecturer<br />
TABLE A-3 Lecturer<br />
Table: Lecturer<br />
Field | Type | Key | Description<br />
LecturerID | char(5) | Primary | ID of lecturer<br />
LecturerName | char(40) | | Name of lecturer<br />
Gender | char(1) | | Gender of lecturer<br />
DeptID | char(3) | Foreign | ID of department<br />
Lecturers are responsible for teaching several courses. Each lecturer is a member<br />
of a department.<br />
A.4 Busy Time<br />
TABLE A-4 Busy Time<br />
Table: BusyTime<br />
Field | Type | Key | Description<br />
LecturerID | char(5) | Primary | ID of lecturer<br />
Day | int(2) | | Day in a week<br />
Workingsession | int(2) | | Working session in a day<br />
State | int(1) | | State of lecturer<br />
Not all working sessions in a week are available for scheduling<br />
a lecturer. For instance, Mr. Tim cannot teach on Monday mornings because of a<br />
weekly meeting. Other lecturers dislike teaching in certain working sessions. For<br />
instance, Miss Mary dislikes teaching on Friday mornings. Based on the data stored in the<br />
BusyTime table, the system tries to satisfy lecturers’ preferences. The State field takes one of<br />
three values: 0, 1, or 2. The value 2 represents an available working session.<br />
The value 1 is used if the lecturer dislikes teaching at this time (soft constraint).<br />
Finally, the value 0 is used if the lecturer cannot teach at this time (hard constraint).<br />
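The three State values can be read as a small scoring rule. The following Python sketch is illustrative only: the function name `slot_penalty`, the sample rows, the day and session numbers, the penalty weight of 1, and the rule that unlisted sessions count as available are all assumptions rather than part of the scheduling system described in this thesis.

```python
# Illustrative sketch only: names, sample rows, and the penalty weight are assumptions.
CANNOT_TEACH, DISLIKES, AVAILABLE = 0, 1, 2  # State values from the BusyTime table

# (LecturerID, Day, Workingsession) -> State.
# Assumption: sessions that are not listed are treated as available.
busy_time = {
    ("L0001", 2, 1): CANNOT_TEACH,  # e.g. a weekly meeting
    ("L0002", 6, 1): DISLIKES,      # e.g. a session the lecturer prefers to avoid
}

def slot_penalty(lecturer_id, day, session, soft_weight=1):
    """Return None for a hard-constraint violation, else the soft-constraint penalty."""
    state = busy_time.get((lecturer_id, day, session), AVAILABLE)
    if state == CANNOT_TEACH:
        return None          # hard constraint: the assignment is infeasible
    if state == DISLIKES:
        return soft_weight   # soft constraint: allowed, but penalized
    return 0                 # available session, no penalty
```

Returning `None` for state 0 keeps infeasible assignments distinct from merely undesirable ones, which mirrors the hard and soft constraint split described above.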
A.5 Building<br />
TABLE A-5 Building<br />
Table: Building<br />
Field | Type | Key | Description<br />
BuildingID | char(2) | Primary | ID of building<br />
BuildingName | char(100) | | Name of building<br />
The university has several buildings that have a number of classrooms.<br />
A.6 Classroom<br />
TABLE A-6 Classroom<br />
Table: Classroom<br />
Field | Type | Key | Description<br />
ClassroomID | char(7) | Primary | ID of classroom<br />
ClassroomName | char(10) | | Name of classroom<br />
Seats | int(3) | | Number of seats<br />
BuildingID | char(2) | Foreign | ID of building<br />
ClassroomGroupID | char(8) | Foreign | ID of classroom group<br />
A classroom in a building belongs to a certain classroom group.<br />
A.7 Classroom Group<br />
TABLE A-7 Classroom group<br />
Table: ClassroomGroup<br />
Field | Type | Key | Description<br />
ClassroomGroupID | char(8) | Primary | ID of classroom group<br />
ClassroomGroupName | char(100) | | Name of classroom group<br />
Classrooms are organized into groups. A course is scheduled into a classroom from its<br />
designated groups. For instance, course ECE218 (Digital Circuit Design Lab) is only<br />
expected to be scheduled into group ECEDCDLB (Digital Circuit Design Labs).<br />
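As a minimal sketch of how classroom groups narrow the choice of rooms, the Python fragment below filters rooms by group. The sample rooms, the group LECTHALL, and the helper name `candidate_rooms` are hypothetical. The additional seat check is an assumption suggested by the Seats field, not a rule stated in this appendix.

```python
# Illustrative sketch only: sample rows and the helper name are assumptions.
# Rows mirror the Classroom table: (ClassroomID, Seats, ClassroomGroupID).
classrooms = [
    ("B1-0101", 40, "ECEDCDLB"),
    ("B1-0102", 25, "ECEDCDLB"),
    ("B2-0201", 120, "LECTHALL"),
]

def candidate_rooms(allowed_groups, num_students):
    """Rooms that belong to a designated group and have enough seats."""
    return [
        room_id
        for room_id, seats, group_id in classrooms
        if group_id in allowed_groups and seats >= num_students
    ]
```

For a 30-student section restricted to group ECEDCDLB, only the 40-seat lab remains a candidate.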
A.8 Department Controls Rooms<br />
TABLE A-8 Department controls classroom<br />
Table: DeptControlRoom<br />
Field | Type | Key | Description<br />
DeptID | char(3) | Primary | ID of department<br />
ClassroomGroupID | char(8) | Primary | ID of classroom group<br />
A department owns a number of classroom groups that are used for its courses.<br />
A.9 Course<br />
TABLE A-9 Course<br />
Table: Course<br />
Field | Type | Key | Description<br />
CourseID | char(6) | Primary | ID of course<br />
CourseName | char(80) | | Name of course<br />
Credits | int(2) | | Number of credits<br />
Kind | char(1) | | Kind: lecture or practice<br />
DeptID | char(3) | Foreign | ID of a department<br />
A course belongs to a department.
A.10 Program<br />
TABLE A-10 Program<br />
Table: Program<br />
Field | Type | Key | Description<br />
ProgramID | char(4) | Primary | ID of program<br />
ProgramName | char(255) | | Name of program<br />
NumSemesters | int(2) | | Number of semesters<br />
DeptID | char(3) | Foreign | ID of department<br />
The university offers a number of programs. A student who completes a program, which<br />
comprises a number of courses, earns a degree, e.g. Bachelor of Science in<br />
Computer Science. A program belongs to a department.<br />
A.11 Curriculum<br />
TABLE A-11 Curriculum<br />
Table: Curriculum<br />
Field | Type | Key | Description<br />
ProgramID | char(4) | Primary | ID of program<br />
CourseID | char(6) | Primary | ID of course<br />
Semester | int(2) | | Semester in which the course is taken<br />
Year | int(4) | | Enrolment year of the students to whom this curriculum applies<br />
To earn a degree, a student has to complete a list of courses in each semester. For<br />
instance, in the first semester, students of Bachelor of Science in Computer Science<br />
take the courses ENL101, CSC110, CSC113, MAT125, CSC115, CSC120, and CSC127.<br />
A curriculum is applied to students based on their enrolment year.<br />
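The link between a curriculum and an enrolment year can be sketched as a simple lookup. In the Python fragment below, the program ID BSCS, the course IDs CSC210 and CSC111, the year values, and the helper name `courses_for` are hypothetical; only the four column names come from the Curriculum table.

```python
# Illustrative sketch only: sample rows and the helper name are assumptions.
# Each row mirrors the Curriculum table: (ProgramID, CourseID, Semester, Year).
curriculum = [
    ("BSCS", "ENL101", 1, 2005),
    ("BSCS", "CSC110", 1, 2005),
    ("BSCS", "CSC210", 2, 2005),
    ("BSCS", "CSC111", 1, 2004),
]

def courses_for(program_id, semester, enrol_year):
    """List the courses one cohort takes in one semester of its curriculum."""
    return sorted(
        course
        for prog, course, sem, year in curriculum
        if (prog, sem, year) == (program_id, semester, enrol_year)
    )
```

Two cohorts of the same program can therefore follow different course lists in the same semester, depending on the year in which they enrolled.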
A.12 Class<br />
TABLE A-12 Class<br />
Table: Class<br />
Field | Type | Key | Description<br />
ClassID | char(7) | Primary | ID of class<br />
ClassName | char(100) | | Name of class<br />
NumStudents | int(3) | | Number of students<br />
EnrolYear | int(4) | | Enrolment year<br />
ProgramID | char(4) | Foreign | ID of program<br />
Students who study the same program and have the same enrolment year are<br />
grouped into classes.<br />
A.13 Course Section<br />
TABLE A-13 Course section<br />
Table: CourseSection<br />
Field | Type | Key | Description<br />
ClassID | char(7) | Primary | ID of class<br />
Semester | int(2) | Primary | Current semester<br />
Year | int(4) | Primary | Current year<br />
CourseID | char(6) | Primary | ID of course<br />
SectionNo | char(3) | | Section number<br />
LecturerID | char(5) | | ID of lecturer<br />
NumStudents | char(4) | | Number of students<br />
A section is an instance of a course taught by a lecturer. The unit that we<br />
schedule into the time-slots of a certain classroom is “a section of a course + a lecturer +<br />
an estimated number of attending students”.<br />
A.14 Timetable<br />
TABLE A-14 Timetable<br />
Table: Timetable<br />
Field | Type | Key | Description<br />
RoomID | char(7) | Primary | ID of room<br />
Day | int(2) | Primary | Day in a week<br />
Hour | int(2) | Primary | Hour in a day<br />
CourseSectionID | char(9) | | CourseID + SectionID<br />
Although this table looks simple, it stores the results of the whole course<br />
scheduling system. A section of a course is scheduled into successive time-slots.<br />
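Because (RoomID, Day, Hour) is the primary key, a room can hold at most one section in any hour. A minimal Python sketch of placing a section into successive time-slots under that rule follows. The helper name `place_section`, the sample IDs, and the all-or-nothing booking policy are assumptions made for illustration.

```python
# Illustrative sketch only: the helper name and sample IDs are assumptions.
# Keys mirror the Timetable primary key (RoomID, Day, Hour);
# values hold the CourseSectionID stored in that slot.
timetable = {}

def place_section(room_id, day, start_hour, length, course_section_id):
    """Book `length` successive hourly slots in one room, or book nothing at all."""
    slots = [(room_id, day, start_hour + i) for i in range(length)]
    if any(slot in timetable for slot in slots):
        return False  # a slot is already taken, so the primary key would clash
    for slot in slots:
        timetable[slot] = course_section_id
    return True
```

A three-hour section starting at hour 8 occupies hours 8, 9, and 10, so a later request that needs hour 10 in the same room on the same day is rejected as a whole.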
APPENDIX B<br />
INSTALLING GRID ENVIRONMENT
This section presents the detailed steps for installing and setting up the grid<br />
environment, which includes Red Hat Linux, Network Time Protocol, Globus, and a<br />
Certificate Authority.<br />
The following topics are discussed:<br />
- Required software<br />
- Hardware environment<br />
- Operating system installation<br />
- Globus installation and setup<br />
- CA installation and setup<br />
B.1 Required Software<br />
Globus Toolkit 2.2 is used in this study. Globus Toolkit 2.x supports Red Hat<br />
Linux on xSeries and AIX on pSeries. We select Red Hat Linux 9.0 as our host<br />
operating system.<br />
The following files must be downloaded:<br />
- Globus Packaging Technology: gpt-2.2.2-src.tar.gz<br />
- Globus client: globus-all-client-2.2.3-i686-pc-linux-gnu-bin.tar.gz<br />
- Server bundle: globus-all-server-2.2.3-i686-pc-linux-gnu-bin.tar.gz<br />
- Certificate Authority: globus_simple_ca_bundle-0.9.tar.gz<br />
- Network Time Protocol (NTP): ntp-4.1.1-1.i386.rpm<br />
Place these files in the directory /usr/src. These Globus files can be downloaded<br />
from the address: ftp://ftp.globus.org/pub/gt2/2.2/.<br />
The NTP package is already installed in Red Hat Linux 9.0, so we do not need<br />
to download and install it. However, for other versions of Linux, we have to set up<br />
NTP on the hosts.<br />
B.2 Setting Up the Environment<br />
An Ethernet LAN and three Intel Pentium machines were used to build the grid<br />
environment. Figure 3-23 presents this environment with the host names and<br />
functions to be installed in each machine.<br />
The host names are m1, m2, and m3. The machines should have a clock speed<br />
of at least 500 MHz, at least 128 MB of memory, and at least 8 GB hard drives.<br />
The installation and setup steps depend on one another, so they must be<br />
performed in the order given.<br />
The major steps to set up the grid environment include installing:<br />
- Red Hat Linux 9.0 on each machine<br />
- Network Time Protocol server on one machine (here we use m2) and<br />
configuring NTP clients for the others (m1 and m3)<br />
- Globus Packaging Technology on each machine<br />
- Globus Server on the m2 and m3 machines<br />
- Globus Client on m1<br />
- Globus Simple Certificate Authority on m2<br />
The grid is configured using the following major steps:<br />
- Sign the certificate requests from all components and users needing them<br />
- Set up gridmap files for each system<br />
- Set up automated grid startup<br />
- Set up each GRIS to talk to one GIIS<br />
- Set up MDS security<br />
B.2.1 Naming and Addressing Planning<br />
Table B-1 shows the host names, IP addresses, and software to be installed on the<br />
machines.<br />
TABLE B-1 Host names, IP addressing, and software<br />
Host name | IP | Software<br />
m1.kmitnb.ac.th | 192.168.10.241 | Globus client, centralized scheduling program, MySQL 4.0<br />
m2.kmitnb.ac.th | 192.168.10.242 | Globus server, CA, and NTP server<br />
m3.kmitnb.ac.th | 192.168.10.243 | Globus server<br />
We also define the user IDs, groups, and passwords before implementation, as<br />
shown in Table B-2.<br />
The root and globususer IDs are used on all machines. The snobol and adminca IDs have no<br />
password on some machines because those machines do not<br />
have that user ID installed.<br />
TABLE B-2 Group, user ID and password<br />
User ID | Group ID | m1 password | m2 password | m3 password<br />
root | root | pwrtm1 | pwrm2 | pwrm3<br />
globususer | globus | pwgbm1 | pwgm2 | pwgm3<br />
snobol | snobol | pwsbm1 | | <br />
adminca | adminca | | pwamm2 | <br />
The globususer ID is used to run jobs on the grid for the user. Since this user ID<br />
has more than eight characters, we need to create it later, rather than creating it<br />
as part of the Linux install process. The other user IDs can be created as part of the<br />
Linux installation or later.<br />
The snobol ID is used to submit jobs to the grid.<br />
The adminca ID is used to receive certificate requests for the Certificate<br />
Authority. The adminca ID could be used to ftp the certificate requests to the machine<br />
m2 in our install. The certificates will be signed using the root ID on machine m2.<br />
Before installing the Globus Simple Certificate Authority, we must define the<br />
distinguished name (DN) that will be used by the CA in our environment. Table B-3<br />
describes the distinguished name used for the Certificate Authority in our<br />
environment. The distinguished names for the users and for the Globus services will<br />
be generated automatically.<br />
TABLE B-3 Distinguished name and passphrase<br />
Certificate Authority DN<br />
cn=my test CA, ou=m2.kmitnb.ac.th, ou=demotest, o=grid<br />
Passphrase<br />
mycapw<br />
The distinguished name (DN) and passphrase will be used by the Certificate<br />
Authority to sign certificate requests.<br />
B.2.2 Installing Linux<br />
Install Linux on all of the machines using the “server” install, selecting all<br />
packages and “no firewall”. Each system should use the fixed IP address and<br />
corresponding host name given in Table B-1; do not use DHCP.<br />
After installing Linux on each system, we create the user IDs listed in Table B-2. The<br />
following is an example of how to add the globususer ID on machine m1.<br />
Add a group for globus by executing:<br />
groupadd -g 900 globus<br />
Add the user globususer (with password globususer) by executing:<br />
adduser -u 900 -g globus -d /home/globususer -n globususer<br />
Change the globususer ID’s password from globususer to pwgbm1 (see Table B-2) or another<br />
password by executing:<br />
passwd globususer<br />
B.2.3 Installing Network Time Protocol (NTP)<br />
NTP needs to be installed because the grid needs the clocks on the systems to be<br />
synchronized. The security process creates proxy certificates that are valid only for a specific<br />
time window. If the systems do not have their clocks synchronized, then users may not be<br />
able to use the grid, because the proxy certificates may appear to have expired or to be<br />
not yet valid.<br />
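The effect of unsynchronized clocks on a proxy certificate can be illustrated with a simplified model. The Python sketch below is not Globus code: the function name `proxy_status`, the 12-hour lifetime, and the timestamps are assumptions chosen only to show why a verifying host with a skewed clock may reject an otherwise good proxy.

```python
from datetime import datetime, timedelta

def proxy_status(not_before, not_after, issuer_now, verifier_skew):
    """Classify a proxy as seen by a host whose clock is off by `verifier_skew`."""
    verifier_now = issuer_now + verifier_skew  # time according to the verifying host
    if verifier_now < not_before:
        return "not yet valid"
    if verifier_now > not_after:
        return "expired"
    return "valid"

# Assumed example values: a proxy created at 09:00 with a 12-hour lifetime.
issued = datetime(2005, 9, 20, 9, 0, 0)
not_before, not_after = issued, issued + timedelta(hours=12)
```

In this model, a server whose clock runs ten minutes behind the client sees a freshly created proxy as not yet valid, while a server running thirteen hours ahead sees it as expired.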
On all of the grid machines, install NTP as follows using the root ID:<br />
$ rpm -ivh /usr/src/ntp-4.1.1-1.i386.rpm<br />
If the package is already installed as a part of the Linux distribution, ignore the<br />
error message and continue to set up the NTP server. Proceed by setting up the server<br />
and daemons.<br />
Edit the file /etc/ntp.conf on the machine designated to be the time server,<br />
machine m2, and leave the following four lines as the only un-commented ones,<br />
commenting all others with a leading “#” character:<br />
server 127.127.1.0 # local clock<br />
fudge 127.127.1.0 stratum 10<br />
driftfile /etc/ntp/drift<br />
broadcastdelay 0.008<br />
Also, on the NTP server machine (m2), use the ntsysv command to<br />
enable the NTP daemon (ntpd) on the next reboot. We can also start the Red Hat<br />
Service Configuration tool by clicking on Main Menu > System Settings > Server<br />
Settings > Services. Scroll down the list of services on the left side until we get to the<br />
ntpd service. Click on the ntpd service and click Start to run it.<br />
On the other machines in the grid (m1 and m3), change the file /etc/ntp.conf to<br />
leave only the following lines un-commented:<br />
server m2.kmitnb.ac.th<br />
driftfile /etc/ntp/drift
broadcastdelay 0.008<br />
authenticate no<br />
Next, execute the following command to have them check for the time from the<br />
above server machine m2:<br />
ntpdate -b m2.kmitnb.ac.th<br />
This should be executed at least once per boot, and could be set up to run<br />
periodically using crond and crontab.<br />
B.2.4 Setting Up Host Files and Environment Variables on Each Machine<br />
As root, use an editor to edit the hosts file /etc/hosts on each machine with the<br />
following lines:<br />
127.0.0.1 localhost<br />
192.168.10.241 m1.kmitnb.ac.th m1<br />
192.168.10.242 m2.kmitnb.ac.th m2<br />
192.168.10.243 m3.kmitnb.ac.th m3<br />
Verify machine connectivity after the next reboot, using the ping command to<br />
ping each of the other machines by name.<br />
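Because the same host entries must appear on every machine, it can help to generate them from one table instead of typing them three times. The Python sketch below does this with the addresses from Table B-1. The helper name `render_hosts` is an assumption, and the output still has to be written to /etc/hosts by hand as root.

```python
# Illustrative sketch only: the helper name is an assumption.
# Host names and addresses are taken from Table B-1.
hosts = {
    "m1": "192.168.10.241",
    "m2": "192.168.10.242",
    "m3": "192.168.10.243",
}

def render_hosts(hosts, domain="kmitnb.ac.th"):
    """Build hosts-file lines: IP address, fully qualified name, short alias."""
    lines = ["127.0.0.1 localhost"]
    for name in sorted(hosts):
        lines.append(f"{hosts[name]} {name}.{domain} {name}")
    return "\n".join(lines) + "\n"

print(render_hosts(hosts), end="")
```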
Edit the file /etc/profile in each machine. Insert the following three lines after<br />
the line in /etc/profile that says “export PATH USER ...”:<br />
export GPT_LOCATION=/usr/local/gpt<br />
export GLOBUS_LOCATION=/usr/local/globus<br />
export PATH=$PATH:$GLOBUS_LOCATION/bin:$GLOBUS_LOCATION/sbin<br />
Log off and log back on the machines after modifying the file /etc/profile so that<br />
the above settings take effect.<br />
B.2.5 Installing the GPT<br />
Log on as root and install GPT on all of the machines. Please ignore all<br />
warnings from Globus:<br />
cd /usr/src<br />
tar -xzvf gpt-2.2.2-src.tar.gz<br />
cd gpt-2.2.2<br />
./build_gpt<br />
ls ${GPT_LOCATION}/sbin | wc -l<br />
The final command should report 29, the number of gpt-* executable files.<br />
B.2.6 Installing a Globus Server Bundle<br />
The following is used to install the server bundle on each server machine.<br />
Perform these steps on each machine that will be a server. In our demo, we will use<br />
machines m2 and m3 as servers.<br />
As root, run:<br />
cd /usr/src<br />
export PATH=$PATH:$GPT_LOCATION/sbin<br />
gpt-install globus-all-server-2.2.3-i686-pc-linux-gnu-bin.tar.gz<br />
gpt-postinstall<br />
/usr/local/globus/setup/globus/setup-gsi<br />
y<br />
q<br />
B.2.7 Installing a Globus Client Bundle<br />
The following is used to install the client bundle on any machines that will be<br />
used to query or submit jobs to the grid. In our application, we will install the client<br />
on the machine m1.<br />
As root, run:<br />
cd /usr/src<br />
export PATH=$PATH:$GPT_LOCATION/sbin<br />
gpt-install globus-all-client-2.2.3-i686-pc-linux-gnu-bin.tar.gz<br />
gpt-postinstall<br />
/usr/local/globus/setup/globus/setup-gsi<br />
y<br />
q<br />
B.2.8 Installing the Globus Simple Certificate Authority<br />
To install the Globus Simple Certificate Authority, one of the Globus bundles<br />
(server or client) needs to be installed on the machine due to a dependency. We will<br />
install the CA and a Globus server on the machine m2.<br />
As root, run:<br />
cd /usr/src<br />
export PATH=$PATH:$GPT_LOCATION/sbin<br />
gpt-build -nosrc gcc32<br />
gpt-build globus_simple_ca_bundle-0.9.tar.gz gcc32<br />
gpt-postinstall<br />
...<br />
Do you want to keep this as the CA subject (y/n) [y]: n<br />
Enter a unique subject name <strong>for</strong> this CA:<br />
cn=my test CA, ou=m2.kmitnb.ac.th, ou=demotest, o=grid<br />
Enter the email of the CA:<br />
adminca@m2.kmitnb.ac.th<br />
[default 5 years] 1825
mycapw<br />
[enter]<br />
During the above process, a hash number is generated and used as part of the<br />
file name. Please note this number for use in the next steps. Run the script named<br />
at the end of the prior install, substituting the hex hash number printed by the<br />
above process into the path shown below and adding the “-default” argument:<br />
/usr/local/globus/setup/globus_simple_ca__setup/setup-gsi -default<br />
y<br />
q<br />
The file /root/.globus/simpleCA/private/cakey.pem is the CA’s private key and<br />
should not be given out to anyone else. The file /root/.globus/simpleCA/cacert.pem<br />
contains the CA’s public key.<br />
The following is used to install the CA’s certificate on each of the other grid<br />
machines. /root/.globus/simpleCA/globus_simple_ca__setup-0.9.tar.gz is the<br />
file containing the public CA key and other information needed to participate in this<br />
grid. This must be copied to each of the other machines and installed using the<br />
gpt-build command.<br />
First, on machine m2, use ftp to copy the file<br />
/root/.globus/simpleCA/globus_simple_ca__setup-0.9.tar.gz to the directory<br />
/usr/src/ of each of the other grid machines. This can be done in two steps by ftp-ing<br />
them to the directory /home/globususer on each of those machines using globususer<br />
ID. Then, using root, this file can be moved to the directory /usr/src. Next, issue the<br />
following commands on each of those machines as root:<br />
gpt-build /usr/src/globus_simple_ca__setup-0.9.tar.gz<br />
gpt-postinstall<br />
/usr/local/globus/setup/globus_simple_ca__setup/setup-gsi -default<br />
y<br />
q<br />
B.2.9 Requesting and Signing Gatekeeper Certificates for Servers<br />
On each of the server machines (m2 and m3), we perform the following steps to<br />
request and sign certificates:<br />
grid-cert-request -host <br />
Use ftp or e-mail (if available and using the adminca ID) to copy the file<br />
/etc/grid-security/hostcert_request.pem to the CA machine and put it into the directory<br />
/root. On the CA machine, as root, sign the certificate using the following:
grid-ca-sign -in /root/hostcert_request.pem -out /root/hostcert.pem<br />
mycapw<br />
Then, ftp the file /root/hostcert.pem back to the server machine and place it in<br />
the directory /etc/grid-security.<br />
B.2.10 Requesting and Signing User Certificates<br />
For each user who will use the grid (in our example, user snobol on the client<br />
machine m1), the following procedure must be executed by the user and Certificate<br />
Authority. On the snobol user’s logon, run:<br />
grid-cert-request<br />
<br />
<br />
The user should make up his own passphrase for his certificate. He will use this<br />
same passphrase later with the grid-proxy-init command to authenticate with the<br />
grid. In our example, the snobol user’s login password could be used here.<br />
The user must then send the file /home//.globus/usercert_request.pem<br />
to the Certificate Authority (machine m2) for signing. On the CA machine (m2), sign<br />
the certificate using root with the following command, adjusting the location of<br />
usercert_request.pem to point to wherever the above request file is now stored on m2:<br />
grid-ca-sign -in usercert_request.pem -out usercert.pem<br />
mycapw<br />
Securely send the file usercert.pem back to the requesting user. The user should<br />
put the file usercert.pem into his /home//.globus directory.<br />
The user should also be added to the grid-mapfile (on machine m2 under root)<br />
using the following command (note the backward apostrophe characters next to the<br />
double quote characters):<br />
grid-mapfile-add-entry -dn "`grid-cert-info -f usercert.pem -subject`" -ln globususer<br />
Copy the file /etc/grid-security/grid-mapfile to each of the other servers<br />
(m3) so that all of the servers have this file.<br />
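After copying the file, it can be useful to check which certificate subjects map to which local accounts. The Python sketch below assumes the usual grid-mapfile layout of one entry per line, with a quoted distinguished name followed by a local user name. The helper name `parse_gridmap` and the sample DN are hypothetical, since the real DN is generated when the user certificate is requested.

```python
import shlex

def parse_gridmap(text):
    """Map each quoted certificate subject (DN) to its local user name."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        dn, local_user = shlex.split(line)[:2]  # shlex removes the quotes around the DN
        mapping[dn] = local_user
    return mapping

# Hypothetical entry; the actual subject comes from `grid-cert-info -subject`.
sample = '"/O=Grid/OU=demotest/OU=m1.kmitnb.ac.th/CN=snobol" globususer\n'
print(parse_gridmap(sample))
```

Comparing the parsed mapping from m2 and m3 is one simple way to confirm that both servers hold the same entries.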
B.2.11 Setting Up the Gatekeepers<br />
On each server (m2 and m3), add the following two lines to the file<br />
/etc/services:<br />
gsigatekeeper 2119/tcp #globus gatekeeper<br />
gsiftp 2811/tcp #globus wuftp<br />
Create the file /etc/xinetd.d/gsigatekeeper on each server, containing the lines:
service gsigatekeeper<br />
{<br />
socket_type = stream<br />
protocol = tcp<br />
wait = no<br />
user = root<br />
env = LD_LIBRARY_PATH=/usr/local/globus/lib<br />
server = /usr/local/globus/sbin/globus-gatekeeper<br />
server_args = -conf /usr/local/globus/etc/globus-gatekeeper.conf<br />
disable = no<br />
}<br />
Create the file /etc/xinetd.d/gsiftp on each server, containing the lines:<br />
service gsiftp<br />
{<br />
instances = 1000<br />
socket_type = stream<br />
wait = no<br />
user = root<br />
env = LD_LIBRARY_PATH=/usr/local/globus/lib<br />
server = /usr/local/globus/sbin/in.ftpd<br />
server_args = -l -a -G /usr/local/globus<br />
log_on_success += DURATION USERID<br />
log_on_failure += USERID<br />
nice = 10<br />
disable = no<br />
}<br />
Now reboot all of the machines.<br />
B.3 Setting Up the MDS<br />
We will configure the Monitoring and Discovery Service (MDS) to have one<br />
Grid Information Index Service (GIIS) on the machine m2, which collects the data<br />
reported by the Grid Resource Information Servers (GRIS) on all of the machines.<br />
The GRIS servers send information about their respective servers to the GIIS. In<br />
the demo application, we will use this to find machines that are not too busy. The user<br />
will be able to query the GIIS from the client machine m1.<br />
To set up this structure, we need to modify several configuration files. These<br />
files name the GIIS and GRIS, and show how these components should register with<br />
each other.<br />
Figure 3-24 presents the relationship among the MDS components in our<br />
application.<br />
B.3.1 Setting Up the GIIS and GRIS on the Machine m2<br />
On m2, make the following modifications to the conf files in the directory<br />
$GLOBUS_LOCATION/etc.<br />
In the file grid-info-slapd.conf, name the GIIS on machine m2. Change the<br />
second of the lines:<br />
database giis<br />
suffix "Mds-Vo-name=site, o=Grid"<br />
to<br />
database giis<br />
suffix "Mds-Vo-name=m2.kmitnb.ac.th, o=Grid"<br />
In the file grid-info-site-policy.conf, allow registrations from the domain.<br />
Change the below line:<br />
policydata: (&(Mds-Service-hn=site) (Mds-Service-port=2135))<br />
to<br />
policydata: (&(Mds-Service-hn=*.kmitnb.ac.th) (Mds-Service-port=2135))<br />
In the file grid-info-resource-register.conf, tell the m2 GRIS to register with the<br />
m2 GIIS. Change the two matching lines to the settings shown below:<br />
dn: Mds-Vo-Op-name=register, Mds-Vo-name=m2.kmitnb.ac.th, o=grid<br />
reghn: m2.kmitnb.ac.th<br />
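The grid-info-slapd.conf change in this subsection amounts to a single text substitution: the default VO name `site` is replaced by the GIIS host name. The Python sketch below shows that substitution on a two-line excerpt. The helper name `name_giis` is an assumption, and the sketch works on a string rather than editing the real file.

```python
def name_giis(conf_text, giis_host):
    """Replace the default VO name `site` with the GIIS host name in config text."""
    return conf_text.replace("Mds-Vo-name=site,", f"Mds-Vo-name={giis_host},")

# Excerpt of the default lines quoted in this subsection.
before = 'database giis\nsuffix "Mds-Vo-name=site, o=Grid"\n'
after = name_giis(before, "m2.kmitnb.ac.th")
print(after, end="")
```

Matching on the trailing comma keeps the substitution from touching other values that merely begin with `site`, and running it a second time leaves the text unchanged.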
B.3.2 Setting Up the GRIS on m3<br />
On all of the other server machines (here we have only m3), make the following<br />
modifications to the conf files in the directory $GLOBUS_LOCATION/etc.<br />
In the file grid-info-slapd.conf, remove the GIIS server from these machines.<br />
Remove the block of lines starting with the following lines:<br />
database giis<br />
suffix "Mds-Vo-name=site, o=Grid"<br />
In the file grid-info-resource-register.conf, tell the GRIS which GIIS to register<br />
with. Change the two matching lines as shown below:<br />
dn: Mds-Vo-Op-name=register, Mds-Vo-name=m2.kmitnb.ac.th, o=grid<br />
reghn: m2.kmitnb.ac.th
B.3.3 Starting the MDS on All of the Servers<br />
Start the MDS on all of the servers (m2 and m3) using:<br />
globus-mds start<br />
This can be automated by adding it to /etc/rc.d/rc5.d per the usual conventions.<br />
Copy the globus-mds script into the directory /etc/init.d/. Then create two symbolic<br />
links as follows:<br />
cp $GLOBUS_LOCATION/sbin/globus-mds /etc/init.d/<br />
cd /etc/rc.d/rc5.d/<br />
ln -s /etc/init.d/globus-mds S92globus-mds<br />
ln -s /etc/init.d/globus-mds K92globus-mds<br />
B.3.4 Setting Up the MDS Client m1<br />
In the file $GLOBUS_LOCATION/etc/grid-info.conf, modify the lines shown below<br />
so that searches go to the GIIS on machine m2:<br />
GRID_INFO_HOST="m2.kmitnb.ac.th"<br />
GRID_INFO_ORGANIZATION_DN="Mds-Vo-name=m2.kmitnb.ac.th, o=Grid"<br />
B.3.5 Setting Up a Secure MDS<br />
So far, we have set up an MDS that permits anonymous access. The grid-info-search<br />
command should use the -x flag to indicate an anonymous search request.<br />
However, the MDS can be secured so that only certified users can access the GIIS and<br />
only certified server GRISs can register to send information to the GIIS. The<br />
following steps should be performed.<br />
B.3.5.1 Requesting and Signing Certificates for Each Server Machine<br />
For each of the server machines (m2 and m3) request LDAP certificates, sign<br />
them using the Certificate Authority on m2, and copy the signed certificates to the<br />
proper location. The steps for one of the servers (m3) are shown below.<br />
On the server machine (m3) under root, run:<br />
grid-cert-request -service ldap -host m3.kmitnb.ac.th<br />
Copy the certificate request from /etc/grid-security/ldap/ldapcert_request.pem to<br />
the Certificate Authority machine (m2) using ftp or any other desired method. Sign<br />
the certificate as root on m2, substituting the correct locations for the request<br />
and signed certificates:<br />
grid-ca-sign -in ldapcert_request.pem -out ldapcert.pem
Copy the resulting signed certificate file ldapcert.pem from the Certificate<br />
Authority machine (m2) to the location /etc/grid-security/ldap/ldapcert.pem on the<br />
server machine (m3).<br />
B.3.5.2 Changing the conf Files<br />
Change the following configuration files on the servers.<br />
Change $GLOBUS_LOCATION/etc/grid-info-slapd.conf to change the<br />
anonymousbind setting(s) as follows:<br />
anonymousbind no<br />
Change the files $GLOBUS_LOCATION/etc/grid-info-resource-register.conf<br />
on the servers to require authentication when registering:<br />
bindmethod: AUTHC-ONLY<br />
At this point, the registration bind method ("authentication") has been specified,<br />
which determines who can register with whom and how. However, once anonymous bind<br />
has been deactivated, each registrant node must be informed that the GIIS (m2) is<br />
authorized to receive resource information.<br />
To authorize m2 (the GIIS) to receive registration information, m2's ldap<br />
subject name must be entered in the grid-mapfile file. To get m2's ldap subject name,<br />
we run "grid-cert-info" on m3 as follows, in directory /etc/grid-security, on the<br />
assumption that m2's ldap subject name is similar to m3's.<br />
% grid-cert-info -f /etc/grid-security/ldap/ldapcert.pem -subject<br />
The name was<br />
/O=grid/OU=demotest/OU=m2.kmitnb.ac.th/CN=ldap/m3.kmitnb.ac.th<br />
Since direct editing of the grid-mapfile is discouraged, we run the following<br />
command using the name obtained from above, substituting "m2" for "m3":<br />
% grid-mapfile-add-entry \<br />
-dn "/O=grid/OU=demotest/OU=m2.kmitnb.ac.th/CN=ldap/m2.kmitnb.ac.th" \<br />
-ln globususer<br />
Successful entry was indicated with the following string returned:<br />
(1) entry added<br />
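Each grid-mapfile entry pairs a quoted certificate subject name with a local account name. As an illustration only, the following Java sketch formats and parses such an entry. It assumes a current JDK rather than J2sdk1.4, and the class GridMapEntry and its methods are hypothetical helpers, not part of the Globus Toolkit.

```java
// Hypothetical helper for illustration; not part of the Globus Toolkit.
final class GridMapEntry {
    final String subjectDn;
    final String localUser;

    GridMapEntry(String subjectDn, String localUser) {
        this.subjectDn = subjectDn;
        this.localUser = localUser;
    }

    // Formats the entry as a quoted subject name followed by the local account.
    String toLine() {
        return "\"" + subjectDn + "\" " + localUser;
    }

    // Parses a line of the form: "subject name" localuser
    static GridMapEntry parse(String line) {
        String s = line.trim();
        int close = s.lastIndexOf('"');
        if (!s.startsWith("\"") || close <= 0) {
            throw new IllegalArgumentException("Expected a quoted subject name: " + line);
        }
        String dn = s.substring(1, close);
        String user = s.substring(close + 1).trim();
        if (user.isEmpty()) {
            throw new IllegalArgumentException("Missing local user name: " + line);
        }
        return new GridMapEntry(dn, user);
    }

    public static void main(String[] args) {
        // Subject name and local user taken from the command above.
        GridMapEntry entry = new GridMapEntry(
            "/O=grid/OU=demotest/OU=m2.kmitnb.ac.th/CN=ldap/m2.kmitnb.ac.th",
            "globususer");
        System.out.println(entry.toLine());
    }
}
```

Such a helper is only useful for inspecting the file; entries should still be added with grid-mapfile-add-entry as described above.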
After making all of these changes, the server machines should be rebooted or<br />
the following should be used to restart the MDS on each of the servers (m2 and m3):<br />
globus-mds stop<br />
globus-mds start
B.4 Checking the Installation<br />
To check the installations on each machine, as root use the command:<br />
$GPT_LOCATION/sbin/gpt-verify<br />
The following commands can be used on a server machine to see if the GRAM<br />
and GridFTP are listening on their respective ports:<br />
netstat -an | grep 2119<br />
netstat -an | grep 2811<br />
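The same check can be made from a client program. The following is a minimal Java sketch, assuming a current JDK rather than J2sdk1.4; the class PortCheck is a hypothetical helper and not part of the Globus Toolkit. It simply tries to open a TCP connection to each port used in the netstat commands above.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper for illustration; not part of the Globus Toolkit.
final class PortCheck {
    // Returns true if a TCP connection to host:port succeeds within timeoutMs.
    static boolean isListening(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "localhost";
        // Ports taken from the netstat commands above (GRAM and GridFTP).
        int[] ports = {2119, 2811};
        for (int port : ports) {
            System.out.println(host + ":" + port + " listening: "
                + isListening(host, port, 2000));
        }
    }
}
```

Unlike netstat, this only shows that something accepts connections on the port; it does not identify the service behind it.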
From the client machine (m1) logged on as the user snobol, do the following:<br />
This command sets up the environment so that Globus commands can be issued<br />
by the user. One may want to add this line to one’s login profile:<br />
. $GLOBUS_LOCATION/etc/globus-user-env.sh<br />
This command refreshes the proxy certificate for the user (snobol):<br />
grid-proxy-init<br />
The following commands send a simple job to the server machine. This tests<br />
whether jobs can be submitted to each of the server machines:<br />
globus-job-run m2.kmitnb.ac.th "/bin/hostname"<br />
globus-job-run m3.kmitnb.ac.th "/bin/hostname"<br />
To refine the search to processors whose CPU has been at least 90 percent free<br />
over the last minute, use:<br />
grid-info-search -x "(&(Mds-Device-Group-name=processors)(Mds-Cpu-Free-1minX100>=90))"<br />
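The filter passed to grid-info-search is an ordinary LDAP filter string, and the GridInfoSearch class in Appendix E uses the same form with a threshold of 0. A minimal Java sketch for building it with a variable threshold is shown below; the class MdsFilter and its method are hypothetical names introduced for illustration.

```java
// Hypothetical helper for illustration; not part of the Globus Toolkit.
final class MdsFilter {
    // Builds an LDAP filter matching processor entries whose 1-minute
    // free CPU value is at least minFreePercent.
    static String cpuFreeAtLeast(int minFreePercent) {
        if (minFreePercent < 0 || minFreePercent > 100) {
            throw new IllegalArgumentException("percentage out of range: " + minFreePercent);
        }
        return "(&(Mds-Device-Group-name=processors)"
            + "(Mds-Cpu-Free-1minX100>=" + minFreePercent + "))";
    }

    public static void main(String[] args) {
        // Same filter as in the grid-info-search command above.
        System.out.println(cpuFreeAtLeast(90));
    }
}
```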
Now we are ready to install and run the course scheduling application.
APPENDIX C<br />
INSTALLING SOFTWARE
This section introduces the steps for installing and setting up MySQL 4.0,<br />
J2sdk1.4, Java CoG Kit 1.1, Tomcat 5.0, mod_jk2 and the JDBC driver on Redhat Linux<br />
9.0 (RH9). In this study, we install this software on machine m1.<br />
C.1 Installing MySQL 4.0<br />
First, make sure there is no previous version of MySQL installed on the system.<br />
As root execute the command:<br />
#rpm -q mysql<br />
If there is none, proceed to the installation; otherwise uninstall it with the command:<br />
#rpm -e mysql<br />
Download the rpm packages for MySQL's server, client and dynamic shared<br />
libraries:<br />
- MySQL-server-4.0.24-0.i386.rpm<br />
- MySQL-client-4.0.24-0.i386.rpm<br />
- MySQL-shared-4.0.24-0.i386.rpm<br />
- MySQL-devel-4.0.24-0.i386.rpm<br />
Then install them one by one by using the following commands as root:<br />
#rpm -ivh MySQL-server-4.0.24-0.i386.rpm<br />
#rpm -ivh MySQL-client-4.0.24-0.i386.rpm<br />
#rpm -ivh MySQL-shared-4.0.24-0.i386.rpm<br />
#rpm -ivh MySQL-devel-4.0.24-0.i386.rpm<br />
The MySQL database has been created in /var/lib/mysql.<br />
Initialize MySQL database after installation by typing:<br />
#mysql_install_db<br />
Do not forget to add the directory containing libmysqlclient.so to the search path<br />
file /etc/ld.so.conf. For example, we have:<br />
/usr/lib/libmysqlclient.so<br />
Make sure /etc/ld.so.conf contains:<br />
/usr/lib<br />
Then run<br />
#/usr/sbin/ldconfig<br />
The following instructions change the default empty password of the MySQL<br />
root user to one of our choice. For example, change the empty password to ncdanh:<br />
#/usr/bin/mysqladmin -u root password ncdanh<br />
Now, try to log in to MySQL with the new password. As root, type:<br />
#mysql -u root -p<br />
Enter password: ncdanh<br />
mysql><br />
C.2 Installing J2sdk1.4<br />
To install J2sdk1.4, do the following steps:<br />
- Download j2sdk-1_4_2_10-linux-i586.bin file and copy it to /usr/local:<br />
[root@m1 root]#cp -p j2sdk-1_4_2_10-linux-i586.bin /usr/local<br />
- Run the above file from /usr/local:<br />
[root@m1 root]#cd /usr/local<br />
[root@m1 local]#./j2sdk-1_4_2_10-linux-i586.bin<br />
This creates the directory /usr/local/j2sdk1.4.2_10.<br />
- Insert the following lines inside file /etc/profile or /root/.bashrc:<br />
export JAVA_HOME=/usr/local/j2sdk1.4.2_10<br />
export CLASSPATH=$JAVA_HOME/lib/tools.jar:$JAVA_HOME/jre/lib/rt.jar:./<br />
C.3 Installing Java Cog Kit 1.1<br />
This section presents how to download, install and configure the Java CoG Kit<br />
1.1.<br />
Installation is the first step that needs to be accomplished before the Java CoG<br />
Kit can be used. It ensures that the Java CoG Kit exists on our local machine in a<br />
proper state. After installation, configuration is needed to adjust various parameters<br />
that are specific to our environment.<br />
C.3.1 Downloading the Java Cog Kit<br />
This study uses jglobus stable binary. Using this version, we are interested in<br />
just the jar files without modifying them.<br />
The stable binary distribution of the jglobus is available from the web-site:<br />
http://www.globus.org/cog/java/1.1/cog-1.1-bin.tar.gz.<br />
As root, do the following steps:<br />
- Download cog-1.1-bin.tar.gz file and copy to /usr/local.<br />
- Unpack this file:<br />
[root@m1 root]#cd /usr/local<br />
[root@m1 local]#tar -xzf cog-1.1-bin.tar.gz<br />
A directory named cog-1.1 will be created. This directory will, from now on, be<br />
referred to as
C.3.2 Configuration<br />
This section shows how to configure the Java CoG Kit.<br />
C.3.2.1 Environment Variables<br />
The COG_INSTALL_PATH environment variable is used to determine the<br />
installation location of the Java CoG Kit. The COG_INSTALL_PATH should point to<br />
the directory.<br />
It is also highly recommended that you add the /bin directory<br />
to the binary search path (named PATH on most systems).<br />
Add the following commands to /etc/profile:<br />
export COG_INSTALL_PATH=/usr/local/cog-1.1<br />
export PATH=$PATH:$COG_INSTALL_PATH/bin<br />
Log out and log back in to the RH9 machine to activate the above profile.<br />
C.3.2.2 Configuration<br />
Manual configuration of the Java CoG Kit is also possible. Using an Editor, we<br />
create the configuration file named cog.properties and locate it in the directory /.globus.<br />
In our situation, this directory is /home/snobol/.globus (The snobol<br />
user is created in Appendix B).<br />
A sample Java CoG Kit configuration file is shown as follows:<br />
#Java CoG Kit Configuration File<br />
#Mon Dec 26 10:30:30 CST 2005<br />
usercert=/home/snobol/.globus/usercert.pem<br />
userkey=/home/snobol/.globus/userkey.pem<br />
proxy=/tmp/x509up_u800<br />
cacert=/usr/local/globus/etc/grid-security/certificates/42864e48.0<br />
ip=192.168.10.241<br />
It includes a number of important properties. These properties are:<br />
- usercert: points to the location of the Globus user certificate.<br />
- userkey: points to the location of the private key associated with the Globus<br />
user certificate.<br />
- proxy: points to the location of the user proxy. The proxy is located in a<br />
temporary directory, and has its name composed of the string x509up_u and a user id<br />
(OS specific). In the above example, the user id is 800.<br />
- cacert: contains a comma separated list of certificate authorities that the user<br />
trusts.
- ip: represents the IP address of the machine the Java CoG Kit will be run<br />
from.<br />
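Because cog.properties uses the standard key=value properties format, its contents can be inspected with java.util.Properties. The following is a minimal sketch, assuming a current JDK rather than J2sdk1.4; the class CogPropertiesSketch is hypothetical and is not the Java CoG Kit's own configuration loader. The sample values are those of the configuration file shown above.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical reader for illustration; not the Java CoG Kit's own loader.
final class CogPropertiesSketch {
    // Sample content copied from the configuration file shown above.
    static final String SAMPLE =
        "#Java CoG Kit Configuration File\n"
        + "usercert=/home/snobol/.globus/usercert.pem\n"
        + "userkey=/home/snobol/.globus/userkey.pem\n"
        + "proxy=/tmp/x509up_u800\n"
        + "cacert=/usr/local/globus/etc/grid-security/certificates/42864e48.0\n"
        + "ip=192.168.10.241\n";

    // Loads key=value pairs; lines starting with '#' are treated as comments.
    static Properties load(Reader reader) throws IOException {
        Properties props = new Properties();
        props.load(reader);
        return props;
    }

    // The cacert property is a comma-separated list of trusted CA files.
    static String[] caCertificates(Properties props) {
        String value = props.getProperty("cacert", "").trim();
        return value.isEmpty() ? new String[0] : value.split("\\s*,\\s*");
    }

    public static void main(String[] args) throws IOException {
        Properties props = load(new StringReader(SAMPLE));
        System.out.println("usercert = " + props.getProperty("usercert"));
        System.out.println("proxy    = " + props.getProperty("proxy"));
        System.out.println("trusted CAs: " + caCertificates(props).length);
    }
}
```

To read the real file, a java.io.FileReader opened on /home/snobol/.globus/cog.properties could be passed to load instead of the StringReader.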
C.3.2.3 Managing Certificates and Proxies<br />
Currently, the Java CoG Kit provides some GUI-based tools for credential<br />
management. These tools need the environment variable COG_INSTALL_PATH to<br />
be set to .<br />
One of the tools is Visual-grid-proxy-init. This tool allows creation of a proxy.<br />
Lifetime and cryptographic strength of the proxy can be specified. Also, the locations<br />
of user’s long-term credentials and the location of the resulting proxy file can be<br />
specified.<br />
FIGURE C-1 Visual-grid-proxy-init<br />
To run this tool, as root, do the following steps:<br />
- Run the following command:<br />
[root@m1 root]# visual-grid-proxy-init<br />
The system will show a dialog box as presented in Figure C-1.<br />
- Input password: pwsbm1.<br />
- Input the options with the following values:<br />
• Proxy lifetime : 12h<br />
• Strength : 512<br />
• Proxy file : /tmp/x509up_u800<br />
• User certificate : /home/snobol/.globus/usercert.pem<br />
• User private key : /home/snobol/.globus/userkey.pem<br />
- Press the "Create" button.<br />
To test, after creating the proxy file, run the following commands:<br />
- Display information regarding a proxy:<br />
[root@m1 root]#grid-proxy-info<br />
- Execute a command on remote machine m2 from local machine m1:<br />
[root@m1 root]#globusrun -r m2.kmitnb.ac.th -o "&(executable=/bin/ls)"<br />
C.4 Installing Tomcat 5.0<br />
C.4.1 Installing Tomcat 5.0<br />
To install Tomcat 5.0, do the following steps:<br />
- Download file jakarta-tomcat-5.0.28.tar.gz and copy it to /usr/local/opt.<br />
[root@m1 root]#cp -p jakarta-tomcat-5.0.28.tar.gz /usr/local/opt<br />
- Change into /usr/local/opt and do the following commands:<br />
[root@m1 root]# cd /usr/local/opt<br />
[root@m1 opt]# tar -zxvf jakarta-tomcat-5.0.28.tar.gz<br />
[root@m1 opt]# ln -s jakarta-tomcat-5.0.28 tomcat<br />
Tomcat has been installed into /usr/local/opt/jakarta-tomcat-5.0.28 and<br />
linked to /usr/local/opt/tomcat.<br />
- Insert the following line inside file /etc/profile or /root/.bashrc:<br />
export CATALINA_HOME=/usr/local/opt/tomcat<br />
Now, log out and then log in the RH9 machine to ensure that all changes<br />
take effect.<br />
C.4.2 Starting and Stopping Tomcat 5.0<br />
First of all, we need to ensure that CATALINA_HOME and JAVA_HOME are<br />
correctly set. To do this, open a terminal and type the following commands:<br />
# echo $JAVA_HOME<br />
# echo $CATALINA_HOME<br />
If we get a blank line, or if the directory points anywhere besides where it is<br />
supposed to, we will have to correct these environment variables first, be<strong>for</strong>e<br />
continuing.<br />
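The same check can be scripted. The sketch below is an illustration only, assuming a current JDK (System.getenv is not usable in this way on J2sdk1.4); the class EnvCheck is a hypothetical helper. It reports whether each variable is set and names an existing directory.

```java
import java.io.File;

// Hypothetical helper for illustration.
final class EnvCheck {
    // Returns true if value is non-empty and names an existing directory.
    static boolean isExistingDirectory(String value) {
        return value != null
            && !value.trim().isEmpty()
            && new File(value).isDirectory();
    }

    public static void main(String[] args) {
        String[] names = {"JAVA_HOME", "CATALINA_HOME"};
        for (String name : names) {
            String value = System.getenv(name);
            String verdict = isExistingDirectory(value)
                ? "ok"
                : "not set or not a directory";
            System.out.println(name + "=" + value + " (" + verdict + ")");
        }
    }
}
```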
If everything is fine, we can start Tomcat with the following command. As root,<br />
# $CATALINA_HOME/bin/startup.sh<br />
To check if Tomcat is running fine, we should open a browser and point the<br />
URL to http://localhost:8080. We should see the default Tomcat welcome page.<br />
To stop Tomcat, as root,<br />
# $CATALINA_HOME/bin/shutdown.sh
If Tomcat does not start and we downloaded the zip file, the cause is probably<br />
file permissions. Ensure that the following files inside directory<br />
$CATALINA_HOME/bin are executable:<br />
# chmod +x startup.sh<br />
# chmod +x shutdown.sh<br />
# chmod +x catalina.sh<br />
After making the files executable, we try starting and stopping Tomcat again.<br />
C.5 Installing mod_jk<br />
We will use the Apache server included in RH9, instead of installing another<br />
one. The httpd service was installed in /etc/httpd.<br />
Before installing mod_jk, we should shut down both the httpd service and<br />
Tomcat. The httpd service can be shut down from the menu bar of RH9 (System<br />
Settings/Server Settings/Services), shown in Figure C-2. Select httpd and press<br />
"Stop".<br />
FIGURE C-2 Service configuration<br />
Now, to install mod_jk do the following steps:<br />
- Download file mod_jk2-2.0.4-2jpp.i386.rpm (available from<br />
http://rpm.pbone.net) and copy it to /usr/software.<br />
[root@m1 root]#cd /usr/software<br />
- Install this file:<br />
[root@m1 software]#rpm -ihv mod_jk2-2.0.4-2jpp.i386.rpm<br />
The system will automatically put both mod_jk2.so and jkjni.so into<br />
/etc/httpd/modules of RH9.<br />
Now we configure the following files: server.xml, workers2.properties and<br />
httpd.conf.<br />
C.5.1 Editing server.xml File<br />
Open the file $CATALINA_HOME/conf/server.xml and look for the "non-SSL<br />
Coyote HTTP/1.1 Connector". This is a standard Tomcat-only connector. Comment it<br />
out since we will be using Apache for handling HTTP requests:<br />
<br />
<br />
C.5.2 Creating workers2.properties File<br />
Create file /etc/httpd/conf/workers2.properties with the following contents:<br />
[shm]<br />
file=/etc/httpd/logs/shm.file<br />
size=1048576<br />
# socket channel<br />
[channel.socket:localhost:8009]<br />
port=8009<br />
host=127.0.0.1<br />
# worker for the connector<br />
[ajp13:localhost:8009]<br />
channel=channel.socket:localhost:8009<br />
Note that the port matches that defined in the file server.xml <strong>for</strong> Tomcat.<br />
C.5.3 Editing httpd.conf File<br />
Open the file /etc/httpd/conf/httpd.conf and add the following lines at the end of<br />
the list of modules loaded into Apache.<br />
LoadModule jk2_module modules/mod_jk2.so<br />
<br />
JkUriSet worker ajp13:localhost:8009<br />
<br />
<br />
JkUriSet worker ajp13:localhost:8009<br />
<br />
JkUriSet worker ajp13:localhost:8009<br />
<br />
<br />
JkUriSet worker ajp13:localhost:8009<br />
<br />
<br />
JkUriSet worker ajp13:localhost:8009<br />
<br />
For testing, we create the directory<br />
$CATALINA_HOME/webapps/ROOT/scheduling to store the JSP or HTML files for our<br />
system, then create a simple file test.jsp and put it into that directory. The<br />
file test.jsp has the following content:<br />
<br />
<br />
<br />
<br />
<br />
Now, try to access it from a web browser as presented in Figure C-3.<br />
FIGURE C-3 Result in the web browser<br />
Tomcat will automatically create the following files:<br />
$CATALINA_HOME/work/Catalina/localhost/_/org/apache/jsp/scheduling/*.class<br />
C.6 Installing JDBC Driver on Linux<br />
Assume that we already have MySQL installed on the Redhat Linux machine.<br />
To access MySQL from Java or JSP programs, we need to download the MySQL<br />
Connector-J from its website. This study uses MySQL Connector/J 3.2.<br />
- Download the file mysql-connector-java-3.2.0-alpha.tar.gz (We can<br />
download it from http://www.mysql.com/products/connector/j/index.html).<br />
- Unpack this tar.gz file into /usr/local.<br />
- Copy the file mysql-connector-java-3.2.0-alpha-bin.jar to the directory<br />
$JAVA_HOME/jre/lib/ext.<br />
- Copy the file Driver.class to $JAVA_HOME/jre/lib/ext. This will allow the<br />
java interpreter to find the driver.<br />
- Finally, insert the following lines inside file /etc/profile or /root/.bashrc.<br />
export CLASSPATH=$JAVA_HOME/lib/tools.jar:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/jre/lib/ext/mysql-connector-java-3.2.0-alpha-bin.jar:./<br />
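To confirm that the driver can be found and that MySQL accepts connections, a short JDBC test program can be run. The following is a minimal sketch, assuming a current JDK rather than J2sdk1.4; the class JdbcCheck is hypothetical. The driver class name com.mysql.jdbc.Driver is the one used by Connector/J 3.x, port 3306 is assumed to be the MySQL default, the database named mysql is the system database, and the root password is the one set in C.1.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical connectivity check for illustration.
final class JdbcCheck {
    // Builds a MySQL JDBC URL of the form jdbc:mysql://host:port/database.
    static String jdbcUrl(String host, int port, String database) {
        return "jdbc:mysql://" + host + ":" + port + "/" + database;
    }

    public static void main(String[] args) {
        // Assumptions: default port 3306 and the "mysql" system database.
        String url = jdbcUrl("localhost", 3306, "mysql");
        try {
            // Driver class of MySQL Connector/J 3.x.
            Class.forName("com.mysql.jdbc.Driver");
            try (Connection con = DriverManager.getConnection(url, "root", "ncdanh");
                 Statement st = con.createStatement();
                 ResultSet rs = st.executeQuery("SELECT VERSION()")) {
                if (rs.next()) {
                    System.out.println("MySQL version: " + rs.getString(1));
                }
            }
        } catch (ClassNotFoundException e) {
            System.err.println("Driver not found on the classpath: " + e.getMessage());
        } catch (SQLException e) {
            System.err.println("Connection failed: " + e.getMessage());
        }
    }
}
```

If the driver jar is missing from the classpath or the server is not running, the program prints the corresponding error message instead of the version.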
APPENDIX D<br />
INSTALLING CENTRALIZED AND DECENTRALIZED COURSE<br />
SCHEDULING PROGRAMS
This section presents how to compile the centralized and decentralized course<br />
scheduling programs. These programs are written in C and compiled with gcc, which<br />
was included in the Redhat Linux 9.0 installation.<br />
D.1 The Centralized Course Scheduling Program<br />
This program will be installed on machine m2. On machine m2, we do the<br />
following steps:<br />
- Copy the file centralizedscheduling.c to /usr/study/coursescheduling.<br />
- Run the following commands as root:<br />
[root@m2 root]#cd /usr/study/coursescheduling<br />
[root@m2 coursescheduling]# gcc -I/usr/include/mysql centralizedscheduling.c -L/usr/lib/mysql -lmysqlclient -lz -o centralizedscheduling.exe<br />
The file centralizedscheduling.exe has been created in the same directory.<br />
For testing, we can run the following command.<br />
[root@m2 coursescheduling]#./centralizedscheduling.exe<br />
D.2 The Decentralized Course Scheduling Program<br />
This program will be installed on machines m2 and m3. The following steps are<br />
to compile it on machine m2.<br />
- Copy the file decentralizedscheduling.c to /usr/study/coursescheduling.<br />
- Run the following commands as root:<br />
[root@m2 root]#cd /usr/study/coursescheduling<br />
[root@m2 coursescheduling]# gcc -I/usr/include/mysql decentralizedscheduling.c -L/usr/lib/mysql -lmysqlclient -lz -o decentralizedscheduling.exe<br />
The file decentralizedscheduling.exe has been created in the same directory.
APPENDIX E<br />
JAVA SOURCE CODE FOR GRID SYSTEM
All the following files are compiled and stored in the directory<br />
/usr/study/gridsystem on machine m1.<br />
GridInfoSearch.java<br />
import java.util.Hashtable;<br />
import java.util.Enumeration;<br />
import java.net.InetAddress;<br />
import java.net.UnknownHostException;<br />
import javax.naming.Context;<br />
import javax.naming.NamingEnumeration;<br />
import javax.naming.NamingException;<br />
import javax.naming.directory.Attribute;<br />
import javax.naming.directory.SearchControls;<br />
import javax.naming.directory.SearchResult;<br />
import javax.naming.directory.Attributes;<br />
import javax.naming.ldap.LdapContext;<br />
import javax.naming.ldap.InitialLdapContext;<br />
import org.globus.mds.gsi.common.GSIMechanism;<br />
// we could add: aliasing, referral support<br />
public class GridInfoSearch {<br />
//Default values<br />
private static final String version = org.globus.common.Version.getVersion();<br />
private static final String DEFAULT_CTX ="com.sun.jndi.ldap.LdapCtxFactory";<br />
private String hostname = "m2.sched.grid.com";<br />
private int port = 2135;<br />
private String baseDN = "mds-vo-name=m2.sched.grid.com, o=grid";<br />
private int scope = SearchControls.SUBTREE_SCOPE;<br />
private int ldapVersion = 3;<br />
private int sizeLimit = 0;<br />
private int timeLimit = 0;<br />
private boolean ldapTrace = false;<br />
private String saslMech;<br />
private String bindDN;<br />
private String password;<br />
private String qop = "auth"; //could be auth, auth-int, auth-conf<br />
private static AvailableHost ob;//static means that the value of ob persists until the program finishes<br />
public GridInfoSearch(){<br />
}
public String getTheBestHost(){<br />
GridInfoSearch gridInfoSearch = new GridInfoSearch();<br />
String filter = "(&(Mds-Device-Group-name=processors)(Mds-Cpu-Free-1minX100>=0))";<br />
gridInfoSearch.search(filter);<br />
ob.displayHost();<br />
System.out.println("the best:"+ob.getBestHost());<br />
return ob.getBestHost();<br />
}<br />
//Search the ldap server for the filter specified in the main function<br />
private void search(String filter) {<br />
Hashtable env = new Hashtable();<br />
String url = "ldap://" + hostname + ":" + port;<br />
env.put("java.naming.ldap.version", String.valueOf(ldapVersion));<br />
env.put(Context.INITIAL_CONTEXT_FACTORY, DEFAULT_CTX);<br />
env.put(Context.PROVIDER_URL, url);<br />
if (bindDN != null) {<br />
env.put(Context.SECURITY_PRINCIPAL, bindDN);<br />
}<br />
//use GSI authentication from grid-proxy-init certificate<br />
saslMech = GSIMechanism.NAME;<br />
env.put("javax.security.sasl.client.pkgs",<br />
"org.globus.mds.gsi.jndi");<br />
env.put(Context.SECURITY_AUTHENTICATION, saslMech);<br />
env.put("javax.security.sasl.qop", qop);<br />
LdapContext ctx = null;<br />
//create a new ldap context and perform the search on the filter<br />
try {<br />
ctx = new InitialLdapContext(env, null);<br />
SearchControls constraints = new SearchControls();<br />
constraints.setSearchScope(scope);<br />
constraints.setCountLimit(sizeLimit);<br />
constraints.setTimeLimit(timeLimit);<br />
//store the results of the search in the results variable<br />
NamingEnumeration results = ctx.search(baseDN, filter, constraints);<br />
//displayResults(results);<br />
getAvailableHosts(results);//the results will be stored in ob<br />
} catch (Exception e) {<br />
System.err.println("Failed to search: " + e.getMessage());<br />
} finally {<br />
if (ctx != null) {<br />
try { ctx.close(); } catch (Exception e) {}<br />
}<br />
}<br />
}<br />
// Display results of search<br />
private void displayResults(NamingEnumeration results) throws NamingException {<br />
if (results == null) return;<br />
String dn;<br />
String attribute;<br />
Attributes attrs;<br />
Attribute at;<br />
SearchResult si;<br />
//use the results variable from search method and store them in a printable variable.<br />
while (results.hasMoreElements()) {<br />
si = (SearchResult)results.next();<br />
attrs = si.getAttributes();<br />
if (si.getName().trim().length() == 0) {<br />
dn = baseDN;<br />
} else {<br />
dn = si.getName() + ", " + baseDN;<br />
if(dn.substring(0,11).equals("Mds-Host-hn")){<br />
System.out.println("dn: " + dn);<br />
for (NamingEnumeration ae = attrs.getAll(); ae.hasMoreElements();) {<br />
at = (Attribute)ae.next();<br />
attribute = at.getID();<br />
if(attribute.equals("Mds-Cpu-Free-1minX100")){<br />
Enumeration vals = at.getAll();<br />
while(vals.hasMoreElements()) {<br />
System.out.println(attribute + ": " + vals.nextElement());<br />
}<br />
}<br />
}<br />
System.out.println();<br />
}<br />
}//else<br />
}//while<br />
}<br />
// Collect the available hosts from the results of the search<br />
private void getAvailableHosts(NamingEnumeration results)throws NamingException {<br />
if (results == null) return;<br />
String dn;<br />
String attribute;<br />
Attributes attrs;<br />
Attribute at;<br />
SearchResult si;<br />
int Mds_Cpu_speedMHz=0;<br />
int Mds_Memory_Ram_Total_freeMB=0;<br />
int Mds_Cpu_Total_count=0;<br />
String Mds_Host_hn="";<br />
int Mds_Cpu_Free_1minX100=0;<br />
//use the results variable from search method and store them in a printable variable.<br />
ob=new AvailableHost();<br />
while (results.hasMoreElements()) {<br />
si = (SearchResult)results.next();<br />
attrs = si.getAttributes();<br />
if (si.getName().trim().length() == 0) {<br />
dn = baseDN;<br />
} else {<br />
dn = si.getName() + ", " + baseDN;<br />
if(dn.substring(0,32).equals("Mds-Device-Group-name=processors")){<br />
System.out.println("dn: " + dn);<br />
for (NamingEnumeration ae = attrs.getAll(); ae.hasMoreElements();) {<br />
at = (Attribute)ae.next();<br />
attribute = at.getID();<br />
if(attribute.equals("Mds-Cpu-speedMHz")){<br />
Enumeration vals = at.getAll();<br />
Mds_Cpu_speedMHz=Integer.parseInt((String)vals.nextElement());<br />
System.out.println(attribute + ": " + Mds_Cpu_speedMHz);<br />
}else if(attribute.equals("Mds-Memory-Ram-Total-freeMB")){<br />
Enumeration vals = at.getAll();<br />
Mds_Memory_Ram_Total_freeMB=<br />
Integer.parseInt((String)vals.nextElement());<br />
System.out.println(attribute + ": " + Mds_Memory_Ram_Total_freeMB);<br />
}else if(attribute.equals("Mds-Cpu-Total-count")){<br />
Enumeration vals = at.getAll();<br />
Mds_Cpu_Total_count=Integer.parseInt((String)vals.nextElement());<br />
System.out.println(attribute + ": " + Mds_Cpu_Total_count);
}else if(attribute.equals("Mds-Host-hn")){<br />
Enumeration vals = at.getAll();<br />
Mds_Host_hn=(String)vals.nextElement();<br />
System.out.println(attribute + ": " + Mds_Host_hn);<br />
}else if(attribute.equals("Mds-Cpu-Free-1minX100")){<br />
Enumeration vals = at.getAll();<br />
Mds_Cpu_Free_1minX100=<br />
Integer.parseInt((String)vals.nextElement());<br />
System.out.println(attribute + ": " + Mds_Cpu_Free_1minX100);<br />
}//else if<br />
}//for<br />
//extract hostname from dn<br />
Mds_Host_hn=(String)dn.substring(dn.indexOf("Mds-Host-hn")+12,<br />
dn.indexOf("mds-vo-name")-2);<br />
System.out.println(Mds_Host_hn);<br />
//add hosts into ArrayList<br />
ob.addHost( Mds_Host_hn,<br />
Mds_Cpu_speedMHz,<br />
Mds_Memory_Ram_Total_freeMB,<br />
Mds_Cpu_Total_count,<br />
Mds_Cpu_Free_1minX100);<br />
}<br />
System.out.println();<br />
}<br />
}//while<br />
}<br />
//Create new instance of GridInfoSearch and use specified filter string<br />
public static void main( String [] args ) {<br />
GridInfoSearch gridInfoSearch = new GridInfoSearch();<br />
String filter = "(&(Mds-Device-Group-name=processors)(Mds-Cpu-Free-1minX100>=0))";<br />
gridInfoSearch.search(filter);<br />
}<br />
}<br />
AvailableHost.java<br />
import java.util.*;<br />
public class AvailableHost{<br />
ArrayList ar;<br />
public AvailableHost() {<br />
ar = new ArrayList();<br />
}<br />
public void addHost( String Mds_Host_hn,<br />
int Mds_Cpu_speedMHz,<br />
int Mds_Memory_Ram_Total_freeMB,<br />
int Mds_Cpu_Total_count,<br />
int Mds_Cpu_Free_1minX100){<br />
ar.add(new Host( Mds_Host_hn,<br />
Mds_Cpu_speedMHz,<br />
Mds_Memory_Ram_Total_freeMB,<br />
Mds_Cpu_Total_count,<br />
Mds_Cpu_Free_1minX100));<br />
}<br />
public void displayHost(){<br />
for(int i=0; i
public static void main(String args[]){<br />
AvailableHost ob = new AvailableHost();<br />
ob.addHost("m1.sched.grid.com",2000/*MHz*/,123/*MB*/,1/*cpu*/,70/*%freeCPU*/);<br />
ob.addHost("m2.sched.grid.com",2000/*MHz*/,123/*MB*/,1/*cpu*/,90/*%freeCPU*/);<br />
ob.addHost("m3.sched.grid.com",2000/*MHz*/,123/*MB*/,1/*cpu*/,80/*%freeCPU*/);<br />
ob.displayHost();<br />
ob.displayBestHost();<br />
}//main<br />
}//class AvailableHost<br />
class Host implements Comparable {<br />
private int Mds_Cpu_speedMHz;<br />
private int Mds_Memory_Ram_Total_freeMB;<br />
private int Mds_Cpu_Total_count;<br />
private String Mds_Host_hn;<br />
private int Mds_Cpu_Free_1minX100;<br />
private int Weight;<br />
public Host(<br />
String Mds_Host_hn,<br />
int Mds_Cpu_speedMHz,<br />
int Mds_Memory_Ram_Total_freeMB,<br />
int Mds_Cpu_Total_count,<br />
int Mds_Cpu_Free_1minX100){<br />
this.Mds_Host_hn=Mds_Host_hn;<br />
this.Mds_Cpu_speedMHz=Mds_Cpu_speedMHz;<br />
this.Mds_Memory_Ram_Total_freeMB=Mds_Memory_Ram_Total_freeMB;<br />
this.Mds_Cpu_Total_count=Mds_Cpu_Total_count;<br />
this.Mds_Cpu_Free_1minX100=Mds_Cpu_Free_1minX100;<br />
this.Weight=<br />
(int)(Mds_Cpu_Free_1minX100*Mds_Cpu_speedMHz*Mds_Cpu_Total_count/100.00);<br />
}<br />
public String getHostname(){<br />
return Mds_Host_hn;<br />
}<br />
public int getWeight(){<br />
return Weight;<br />
}
public String toString() {<br />
return Mds_Host_hn + "\t" + Weight;<br />
}<br />
//Order by cpu<br />
public int compareTo(Object ob) throws ClassCastException{<br />
Host temp = (Host)ob;<br />
int cpu1=Weight,cpu2=temp.Weight;<br />
if(cpu2>cpu1){<br />
return 1;}<br />
else if(cpu2
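
The remainder of this listing is not reproduced here. As a standalone illustration of the weighting used by the Host class above, the following sketch computes the same weight (free CPU percentage times clock speed times CPU count, divided by 100) and picks the host with the largest value. It assumes a current JDK rather than J2sdk1.4, and it assumes that the best host is the one with the largest weight, which the descending order in compareTo suggests. The class BestHostSketch and its members are hypothetical and are not the thesis code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical standalone sketch; not the AvailableHost/Host code of the thesis.
final class BestHostSketch {
    static final class Candidate {
        final String hostname;
        final int weight;

        Candidate(String hostname, int speedMHz, int cpuCount, int cpuFreePercent) {
            this.hostname = hostname;
            // Same formula as Host.Weight above: free share of total CPU capacity.
            this.weight = (int) (cpuFreePercent * speedMHz * cpuCount / 100.00);
        }
    }

    // Returns the hostname with the largest weight, or null for an empty list.
    // On ties, the first candidate in the list wins.
    static String bestHost(List<Candidate> hosts) {
        Candidate best = null;
        for (Candidate c : hosts) {
            if (best == null || c.weight > best.weight) {
                best = c;
            }
        }
        return best == null ? null : best.hostname;
    }

    public static void main(String[] args) {
        // Same sample values as in AvailableHost.main above.
        List<Candidate> hosts = new ArrayList<>();
        hosts.add(new Candidate("m1.sched.grid.com", 2000, 1, 70));
        hosts.add(new Candidate("m2.sched.grid.com", 2000, 1, 90));
        hosts.add(new Candidate("m3.sched.grid.com", 2000, 1, 80));
        System.out.println("the best: " + bestHost(hosts));
    }
}
```

With the sample values, m2.sched.grid.com has the largest weight (90 * 2000 * 1 / 100 = 1800) and is selected.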
System.out.println(CentralizedSchedulingJobOut);<br />
System.out.println(gassJob[0].doGetStatus());<br />
// if failed, resubmit it<br />
// waiting for the result<br />
System.out.println("\nWaiting for the centralized scheduling job to finish");<br />
do {<br />
stillRunningJob=false;<br />
if (jobListeners[0].stillActive()) {<br />
stillRunningJob = true;<br />
}<br />
if(jobListeners[0].fail()){<br />
System.out.println("Resubmit:"+CentralizedSchedulingRSL);<br />
gassJob[0]=new GassJob(centralmachine,false);<br />
CentralizedSchedulingJobOut =<br />
gassJob[0].GlobusRun(CentralizedSchedulingRSL);<br />
jobListeners[0]=gassJob[0].getInteractiveJobListener();<br />
stillRunningJob = true;<br />
}//if<br />
System.out.print(".");<br />
delay(1000);<br />
jobs.updateJobId(0, gassJob[0].doGetJobId());<br />
jobs.updateStatus(0,gassJob[0].doGetStatus());<br />
} while (stillRunningJob);<br />
System.out.println("\n");<br />
/********************************<br />
*Decentralized scheduling<br />
********************************/<br />
String gassJobOut;<br />
String deRSL;<br />
String theBestMachine;
//request all these jobs<br />
for(int i=1; i
gassJob[jobCount]=new GassJob(theBestMachine,false);<br />
gassJobOut = gassJob[jobCount].GlobusRun(deRSL);<br />
jobListeners[jobCount]=<br />
gassJob[jobCount].getInteractiveJobListener();<br />
//wait to receive a jobid<br />
//update jobid <strong>for</strong> this Job<br />
jobs.updateJobId(jobCount, gassJob[jobCount].doGetJobId());<br />
//update machine that is used <strong>for</strong> this job<br />
jobs.updateMachine(jobCount, theBestMachine);<br />
jobs.updateStatus(jobCount,gassJob[jobCount].doGetStatus());<br />
stillRunningJob = true;<br />
delay(30000);<br />
}//if<br />
}//for<br />
System.out.print(".");<br />
delay(5000);<br />
} while (stillRunningJob);<br />
System.out.println("\n");<br />
}<br />
}//main<br />
GassJob.java<br />
import org.globus.gram.*;<br />
import org.gridforum.jgss.*;<br />
import org.ietf.jgss.*;<br />
import org.globus.security.gridmap.*;<br />
import org.globus.io.gass.server.*;<br />
import org.globus.util.deactivator.Deactivator;<br />
import COM.claymoresystems.sslg.*;<br />
import xjava.security.interfaces.*;<br />
import cryptix.asn1.lang.*;<br />
/**<br />
* Java CoG Job submission class<br />
**/<br />
public class GassJob implements JobOutputListener<br />
{<br />
private GassServer m_gassServer; // GASS Server: required to get job output<br />
private String m_gassURL = null; // URL of the GASS server<br />
private GramJob m_job = null; // GRAM JOB to be executed
private String m_jobOutput = ""; // job output as string<br />
// Submission modes: batch=do not wait for output<br />
// non-batch=wait for output.<br />
private boolean m_batch = false;<br />
private String m_remoteHost = null; // host where job will run<br />
// Globus proxy used for authentication against gatekeeper<br />
private GSSCredential m_proxy=null;<br />
InteractiveJobListener jobListeners;<br />
// Job output variables:<br />
// Used for non-batch mode jobs to receive output from<br />
// gatekeeper through the GASS server<br />
private JobOutputStream m_stdoutStream = null;<br />
private JobOutputStream m_stderrStream = null;<br />
private String m_jobid = null; // Globus job id of the form:<br />
//https://server.com:39374/15621/1021382777/<br />
public GassJob(String Contact, boolean batch) {<br />
m_remoteHost = Contact; // remote host<br />
m_batch = batch; // submission mode<br />
}<br />
/**<br />
* Start the Globus GASS Server. Used to get the output from the server<br />
* back to the client.<br />
*/<br />
private boolean startGassServer(GSSCredential proxy) {<br />
if (m_gassServer != null) return true;<br />
try {<br />
m_gassServer = new GassServer(proxy, 0);<br />
m_gassURL = m_gassServer.getURL();<br />
} catch(Exception e) {<br />
System.err.println("gass server failed to start!");<br />
e.printStackTrace();<br />
return false;<br />
}<br />
m_gassServer.registerDefaultDeactivator();<br />
return true;<br />
}
/**<br />
* Init job out listeners for non-batch mode jobs.<br />
*/<br />
private void initJobOutListeners() throws Exception {<br />
if ( m_stdoutStream != null ) return;<br />
// job output vars<br />
m_stdoutStream = new JobOutputStream(this);<br />
m_stderrStream = new JobOutputStream(this);<br />
m_jobid = String.valueOf(System.currentTimeMillis());<br />
// register output listeners<br />
m_gassServer.registerJobOutputStream("err-" + m_jobid, m_stderrStream);<br />
m_gassServer.registerJobOutputStream("out-" + m_jobid, m_stdoutStream);<br />
return;<br />
}<br />
/**<br />
* This method is used to notify the implementer when the status of a<br />
* GramJob has changed.<br />
*<br />
* @param job The GramJob whose status has changed.<br />
*/<br />
public void statusChanged(GramJob job) {<br />
try {<br />
if ( job.getStatus() == GramJob.STATUS_DONE ) {<br />
// notify waiting thread when job ready<br />
m_jobOutput = "Job sent. url=" + job.getIDAsString();<br />
// if notify enabled return URL as output<br />
synchronized(this) {<br />
notify();<br />
}<br />
}<br />
}<br />
catch (Exception ex) {<br />
System.out.println("statusChanged Error:" + ex.getMessage());<br />
}<br />
}
/**<br />
* This method is used to get the status of the job<br />
*/<br />
public String doGetStatus(){<br />
return jobListeners.doGetStatus();<br />
}<br />
/**<br />
* This method is used to get the id of the job<br />
*/<br />
public String doGetJobId(){<br />
return m_job.getIDAsString();<br />
}<br />
public InteractiveJobListener getInteractiveJobListener(){<br />
return jobListeners;<br />
}<br />
/**<br />
* It is called whenever the job's output<br />
* has been updated.<br />
*<br />
* @param output new output<br />
*/<br />
public void outputChanged(String output) {<br />
m_jobOutput += output;<br />
}<br />
/**<br />
* It is called whenever job finished<br />
* and no more output will be generated.<br />
*/<br />
public void outputClosed() {<br />
}<br />
public synchronized String GlobusRun(String RSL) {<br />
try {<br />
// load default Globus proxy. Java CoG kit must be installed<br />
//and a user certificate setup properly<br />
ExtendedGSSManager manager =<br />
(ExtendedGSSManager)ExtendedGSSManager.getInstance();<br />
GSSCredential m_proxy =<br />
manager.createCredential(GSSCredential.INITIATE_AND_ACCEPT);
// Start GASS server<br />
if (! startGassServer(m_proxy)) {<br />
throw new Exception("Unable to start GASS server.");<br />
}<br />
// setup Job Output listeners<br />
initJobOutListeners();<br />
// Append GASS URL to job String so we can get some output back<br />
String newRSL = null;<br />
// if non-batch, then get some output back<br />
if ( !m_batch) {<br />
newRSL = "&" + RSL.substring(0, RSL.indexOf('&')) +<br />
"(rsl_substitution=(GLOBUSRUN_GASS_URL " + m_gassURL + "))" +<br />
RSL.substring(RSL.indexOf('&') + 1, RSL.length()) +<br />
"(stdout=$(GLOBUSRUN_GASS_URL)/dev/stdout-" + m_jobid + ")" +<br />
"(stderr=$(GLOBUSRUN_GASS_URL)/dev/stderr-" + m_jobid + ")";<br />
}<br />
else {<br />
// format batching RSL so output can be retrieved later on using any GTK commands<br />
newRSL = RSL +<br />
"(stdout=x-gass-cache://$(GLOBUS_GRAM_JOB_CONTACT)stdout anExtraTag)"<br />
+ "(stderr=x-gass-cache://$(GLOBUS_GRAM_JOB_CONTACT)stderr anExtraTag)";<br />
}<br />
m_job = new GramJob(newRSL);<br />
// set proxy. CoG kit and user credentials must be installed and set<br />
// up properly<br />
m_job.setCredentials(m_proxy);<br />
// if non-batch then listen for output<br />
jobListeners=new InteractiveJobListener(false);<br />
m_job.addListener(jobListeners);<br />
System.out.println("Sending job request to: " + m_remoteHost);<br />
m_job.request(m_remoteHost, m_batch, false);<br />
m_jobOutput = "Job sent. url=" + m_job.getIDAsString();<br />
}<br />
catch (Exception ex) {<br />
if ( m_gassServer != null ) {<br />
// unregister from gass server<br />
m_gassServer.unregisterJobOutputStream("err-" + m_jobid);<br />
m_gassServer.unregisterJobOutputStream("out-" + m_jobid);<br />
}<br />
m_jobOutput = "Error submitting job: " + ex.getClass() + ":"<br />
+ ex.getMessage();<br />
}<br />
// cleanup<br />
//Deactivator.deactivateAll();<br />
return m_jobOutput;<br />
}<br />
}<br />
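The non-batch branch of `GlobusRun` rewrites the RSL string before submission: it inserts an `rsl_substitution` for the GASS URL directly after the leading `&` and appends `stdout` and `stderr` relations that refer to that URL. The sketch below isolates only this string manipulation so the result can be inspected. The class and method names, the example URL and the job id `1` are assumptions made for illustration, and the explicit check for a missing `&` is an addition that the listing does not contain.

```java
class RslRewriteSketch {

    /**
     * Mirrors the non-batch branch of GlobusRun: inserts an rsl_substitution
     * for the GASS URL after the leading '&' and appends stdout/stderr
     * relations that point back at the GASS server.
     */
    static String interactiveRsl(String rsl, String gassUrl, String jobId) {
        int amp = rsl.indexOf('&');
        if (amp < 0) {
            // The listing assumes a leading '&'; fail loudly otherwise.
            throw new IllegalArgumentException("RSL must contain '&': " + rsl);
        }
        return "&" + rsl.substring(0, amp)
                + "(rsl_substitution=(GLOBUSRUN_GASS_URL " + gassUrl + "))"
                + rsl.substring(amp + 1)
                + "(stdout=$(GLOBUSRUN_GASS_URL)/dev/stdout-" + jobId + ")"
                + "(stderr=$(GLOBUSRUN_GASS_URL)/dev/stderr-" + jobId + ")";
    }

    public static void main(String[] args) {
        String rsl = "& (executable =/usr/study/coursescheduling/centralizedscheduling)";
        // Placeholder URL and job id, chosen only for this example.
        System.out.println(interactiveRsl(rsl, "https://client.example.org:40000", "1"));
    }
}
```

Running the sketch prints the rewritten RSL on a single line, which makes it easy to verify that the substitution is defined before the relations that use `$(GLOBUSRUN_GASS_URL)`.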
InteractiveJobListener.java<br />
import java.io.*;<br />
import org.globus.gram.Gram;<br />
import org.globus.gram.GramJob;<br />
import org.globus.gram.GramException;<br />
import org.globus.gram.WaitingForCommitException;<br />
import org.globus.gram.GramJobListener;<br />
class InteractiveJobListener extends JobListener {<br />
private boolean quiet;<br />
private boolean finished = false;<br />
private boolean fail=false;<br />
private String strStatus="";<br />
public InteractiveJobListener(boolean quiet) {<br />
this.quiet = quiet;<br />
}<br />
public boolean stillActive() {<br />
return !this.finished;<br />
}<br />
public boolean fail(){<br />
return this.fail;<br />
}<br />
// waits for DONE or FAILED status<br />
public synchronized void waitFor() throws InterruptedException {<br />
while (!finished) {<br />
wait();<br />
}<br />
}<br />
public synchronized String doGetStatus(){<br />
return strStatus;<br />
}<br />
public synchronized void statusChanged(GramJob job) {<br />
if (!quiet) {<br />
System.out.println("Job: "+ job.getStatusAsString());<br />
}<br />
status = job.getStatus();<br />
strStatus=job.getStatusAsString();<br />
if (status == GramJob.STATUS_DONE) {<br />
finished = true;<br />
error = 0;<br />
notify();<br />
} else if (job.getStatus() == GramJob.STATUS_FAILED) {<br />
finished = true;<br />
fail=true;<br />
error = job.getError();<br />
notify();<br />
}<br />
}<br />
}<br />
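`InteractiveJobListener` combines a blocking `waitFor()` with a `statusChanged` callback that sets a flag and calls `notify()`. The following self-contained sketch shows the same wait/notify pattern in isolation. `CompletionSignalSketch`, `complete` and `hasFailed` are hypothetical names, the callback thread merely simulates a status update, and `notifyAll()` is used here instead of `notify()` as a conservative choice in case more than one thread waits.

```java
class CompletionSignalSketch {
    private boolean finished = false;
    private boolean failed = false;

    // Blocks the caller until complete(...) has been called.
    public synchronized void waitFor() throws InterruptedException {
        while (!finished) { // loop guards against spurious wakeups
            wait();
        }
    }

    // Called from the callback thread when a terminal status arrives.
    public synchronized void complete(boolean failure) {
        finished = true;
        failed = failure;
        notifyAll();
    }

    public synchronized boolean hasFailed() {
        return failed;
    }

    public static void main(String[] args) throws InterruptedException {
        CompletionSignalSketch signal = new CompletionSignalSketch();
        Thread callback = new Thread(() -> {
            try {
                Thread.sleep(50); // pretend the job runs for a while
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            signal.complete(false);
        });
        callback.start();
        signal.waitFor(); // returns once complete(...) has run
        System.out.println("failed = " + signal.hasFailed());
    }
}
```

Reading the flags through synchronized accessors, as in this sketch, also ensures that a polling thread sees the values written by the callback thread.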
JobListener.java<br />
import org.globus.gram.GramJob;<br />
import org.globus.gram.GramJobListener;<br />
abstract class JobListener implements GramJobListener {<br />
protected int status = 0;<br />
protected int error = 0;<br />
public abstract void waitFor() throws InterruptedException;
public int getError() {<br />
return error;<br />
}<br />
public int getStatus() {<br />
return status;<br />
}<br />
public boolean isFinished() {<br />
return (status == GramJob.STATUS_DONE ||status == GramJob.STATUS_FAILED);<br />
}<br />
}<br />
Jobs.java<br />
import java.util.*;<br />
public class Jobs{<br />
public static ArrayList ar;<br />
public Jobs() {<br />
ar = new ArrayList();<br />
ar.add(new Job("centralizedscheduling","",<br />
"& (executable =/usr/study/coursescheduling/centralizedscheduling)","m2.sched.grid.com","",0));<br />
ar.add(new Job("decentralizedschedulingER","",<br />
"& (executable =/usr/study/coursescheduling/decentralizedscheduling.exe)" +<br />
"(arguments=ER)", "","",0));<br />
ar.add(new Job("decentralizedschedulingSC","",<br />
"& (executable =/usr/study/coursescheduling/decentralizedscheduling.exe)" +<br />
"(arguments=SC)", "","",0));<br />
ar.add(new Job("decentralizedschedulingED","",<br />
"& (executable =/usr/study/coursescheduling/decentralizedscheduling.exe)" +<br />
"(arguments=ED)", "","",0));<br />
}<br />
//get a job that has index i<br />
public Job getJob(int i){<br />
return (Job) ar.get(i);<br />
}<br />
public int getSize(){<br />
return (int) ar.size();<br />
}
//get RSL of Job having index i<br />
public String getRSL(int i){<br />
Job ob= getJob(i);<br />
return ob.getRSL();<br />
}<br />
//get Machine of Job having index i<br />
public String getMachine(int i){<br />
Job ob= getJob(i);<br />
return ob.getMachine();<br />
}<br />
//get Status of Job having index i<br />
public String getStatus(int i){<br />
Job ob= getJob(i);<br />
return ob.getStatus();<br />
}<br />
//update a new jobid for the job that has index i<br />
public void updateJobId(int i, String jobid ){<br />
Job oldJob= getJob(i);<br />
ar.set(i, new Job( oldJob.getJobName(),<br />
jobid,<br />
oldJob.getRSL(),<br />
oldJob.getMachine(),<br />
oldJob.getStatus(),<br />
oldJob.getExectime()));<br />
}<br />
//update a new machine for the job that has index i<br />
public void updateMachine(int i, String machine){<br />
Job oldJob= getJob(i);<br />
ar.set(i, new Job( oldJob.getJobName(),<br />
oldJob.getJobId(),<br />
oldJob.getRSL(),<br />
machine,<br />
oldJob.getStatus(),<br />
oldJob.getExectime()));<br />
}
//update a new status for the job that has index i<br />
public void updateStatus(int i, String status){<br />
Job oldJob= getJob(i);<br />
ar.set(i, new Job( oldJob.getJobName(),<br />
oldJob.getJobId(),<br />
oldJob.getRSL(),<br />
oldJob.getMachine(),<br />
status,<br />
oldJob.getExectime()));<br />
}<br />
public void displayJobs(){<br />
for(int i=0; i
class Job {<br />
private String jobname;<br />
private String jobid;<br />
private String RSL;<br />
private String machine;<br />
private String status;<br />
private int exectime;<br />
public Job(String jobname, String jobid, String RSL, String machine, String status, int exectime){<br />
this.jobname = jobname;<br />
this.jobid = jobid;<br />
this.RSL = RSL;<br />
this.machine = machine;<br />
this.status = status;<br />
this.exectime= exectime;<br />
}<br />
public String getJobName(){<br />
return jobname;<br />
}<br />
public String getRSL(){<br />
return RSL;<br />
}<br />
public String getJobId(){<br />
return jobid;<br />
}<br />
public String getMachine(){<br />
return machine;<br />
}<br />
public String getStatus(){<br />
return status;<br />
}<br />
public int getExectime(){<br />
return exectime;<br />
}<br />
public void updateJobId(String jobid ){<br />
this.jobid = jobid;<br />
}<br />
public void updateMachine(String machine ){<br />
this.machine = machine;<br />
}<br />
public void updateStatus(String status){<br />
this.status = status;<br />
}<br />
public String toString() {<br />
return jobname + "\t" + machine + "\t" + status + "\t" + exectime;<br />
}<br />
}//class Job
BIOGRAPHY<br />
Name : Mr. Nguyen Cong Danh<br />
Thesis Title : Course Scheduling in Multiple Faculties Using a Grid Computing<br />
Environment<br />
Major Field : Information Technology<br />
Biography<br />
I graduated with a bachelor’s degree in Computer Science from Cantho<br />
University (Vietnam) in 2000.<br />
My contact address is 1 Ly Tu Trong street, Ninh Kieu district, Cantho city,<br />
Vietnam. My e-mail address is ncdanh@cit.ctu.edu.vn.