Learning Techniques in a Mobile Network
best part of each one. This approach does not require a model and is incremental. These methods also distinguish themselves in terms of convergence speed and efficiency. Below we describe two temporal-difference learning methods: Q-learning and Sarsa.

11.2.3.1.1. Q-learning algorithm

Developed in 1989 by Watkins [WAT 92, WAT 89], Q-learning belongs to the model-free reinforcement learning methods, since the agent learns by experimenting with the actions available in the current state. The agent's goal is to learn a policy π: S → A which makes it possible to choose the next action a_t = π(s_t) to accomplish based on the current state s_t.

With each policy π we associate a value Q^π(s, a), called its Q-value, which represents the expected average gain obtained if action a is executed when the current system state is s and π is then followed as the decision policy. The optimal policy π*(s) is the policy maximizing the accumulation of revenues r_t = r(s_t, a_t) received over an infinite horizon. The goal of the Q-learning algorithm is to approximate Q*(s, a) = Q^{π*}(s, a) recursively, with only the quadruple (s_t, a_t, s'_t, r_t) as available information. This quadruple contains the state at moment t (s_t), the state at moment t+1 (s'_t = s_{t+1}), the action taken when the system is in state s_t (a_t), and the revenue received at moment t (r_t) following the execution of this action.

Q-values are updated recursively at each transition using formula [11.2]:

Q_{t+1}(s,a) = \begin{cases} Q_t(s,a) + \alpha_t \, \Delta Q_t(s,a), & \text{if } s = s_t \text{ and } a = a_t \\ Q_t(s,a), & \text{otherwise} \end{cases}    [11.2]

where:

\Delta Q_t(s,a) = \left[ r_t + \gamma \max_{b} Q_t(s'_t, b) \right] - Q_t(s,a)    [11.3]

from which we obtain the algorithm summed up in Figure 11.2.
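A minimal Python sketch of this update rule is given below, applying formulas [11.2] and [11.3] to a dictionary-based Q-table. The names (q_table, q_update, alpha, gamma) are illustrative assumptions and do not come from the chapter's own implementation.

```python
from collections import defaultdict

# Sketch of the tabular Q-learning update in formulas [11.2] and [11.3].
# q_table maps (state, action) pairs to Q-values, initialized to 0.
q_table = defaultdict(float)

def q_update(s, a, r, s_next, actions_next, alpha=0.1, gamma=0.5):
    """One recursive update after observing the quadruple (s_t, a_t, s'_t, r_t)."""
    best_next = max(q_table[(s_next, b)] for b in actions_next)  # max_b Q_t(s'_t, b)
    delta_q = (r + gamma * best_next) - q_table[(s, a)]          # formula [11.3]
    q_table[(s, a)] += alpha * delta_q                           # formula [11.2]
```

The default values alpha=0.1 and gamma=0.5 mirror the settings reported later in section 11.4.2, but any learning-rate schedule α_t satisfying the usual convergence conditions could be used instead.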


themselves in the context where a service provider wants to sell network resources in order to maximize his revenues.

11.3. Call admission control

This section presents an approach for solving the call admission control problem in FCA cellular networks supporting several traffic classes.

We recall that the geographical zone served by a mobile network is divided into cells which share a frequency band according to one of two methods: FCA and DRA [KAT 96]. In FCA, a fixed group of channels is allocated to each cell and a channel is allocated to each user. We also recall that one of the major concerns during mobile network design is reducing the probability that a call is dropped during an intercellular transfer (handoff).

Guard channel techniques decrease this probability of communication break by reserving, in each cell, channels for the exclusive use of handoffs [ZHA 89, KAT 96]. While these techniques are easy to dimension when a single traffic class is considered (one telephone call), they become much more complicated to optimize, and are less effective, in a multiclass traffic context.

In fact, in a multiclass traffic context, it is sometimes preferable to block a low priority class call and accept another call belonging to a higher priority class. The call admission control presented in this section enables this type of mechanism. It is obtained by using the reinforcement learning algorithm Q-learning, which was presented in the previous section.

11.3.1. Problem formulation

We will focus on a single FCA cell with N available channels and two traffic classes C1 and C2 (the idea can easily be extended to several traffic classes). First class calls require one channel only, whereas second class calls require two channels. This cellular system can be considered as a discrete event system. The main events which can occur in a cell are incoming and outgoing calls. These events are modeled by stochastic variables with appropriate distributions. In particular, new incoming calls and handoffs follow Poisson processes, and the time spent on each call is exponentially distributed. Calls arrive in the cell and leave, either by executing a handoff to another cell, or by terminating normally. The network then has to choose whether to accept or reject these connection requests.


In return, it retrieves the revenues received from accepted clients (gains), while the revenues of the rejected clients are counted as losses. The objective of the network administrator is to find a CAC policy which maximizes long-term gains and reduces handoff blocking probabilities and losses.

We describe each state as s = ((x_1, x_2), e), where x_i is the number of current class C_i calls and e represents a new incoming call or a handoff request in the cell. When an event occurs, the agent must choose one of the possible actions A(s) = {reject, accept}. At the end of a call, no action needs to be taken.

A revenue is associated with each type of call. Since the main goal of the network administrator is to decrease handoff blocking probabilities, relatively high revenue values are assigned to handoffs. Revenue values for class C1 calls are higher than those for class C2 calls, since C1 is presumed to have higher priority than C2.

The agent must determine a call acceptance policy with the state of the network as its only knowledge. This system constitutes an SMDP with a finite set S = {s = (x, e)} as its space of states and a finite set A = {0, 1} as its space of possible actions, for which Q-learning is an ideal solution.

11.3.2. Implementation of the algorithm

Having formulated the call admission control problem as an SMDP, we now describe two implementations of the Q-learning algorithm able to solve it. We have named them TRL-CAC (table reinforcement learning CAC) and NRL-CAC (neural network reinforcement learning CAC). TRL-CAC uses a simple table to represent the function Q (the set of Q-values), whereas NRL-CAC uses a multilayer neural network [MIT 97]. The differences between these two algorithms are explained in terms of memory size and computational complexity.

Since it uses a table, TRL-CAC is the simplest and most efficient method. This approach leads to an exact calculation and it fully complies with the structural assumptions needed to prove the convergence of the Q-learning algorithm. However, when the set of state-action pairs (s, a) is large, or when the input variables (x, e) constituting state s are continuous, the use of a simple table becomes impractical because of the huge storage requirements. In this case, approximation functions such as state aggregation, neural networks [MIT 97, MCC 43] or regression trees [BRE 84] can be used efficiently. The neural network used in NRL-CAC is made up of 4 inputs, 10 hidden units and one output unit.
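As an illustration of the function-approximation variant, the sketch below builds a 4-10-1 multilayer perceptron of the size quoted above and uses it to estimate a Q-value. The input encoding (x1, x2, e, a) and all names are assumptions made for illustration, since the excerpt does not specify them.

```python
import numpy as np

rng = np.random.default_rng(0)

# Weights of a 4-10-1 multilayer perceptron: 4 inputs, 10 hidden units, 1 output,
# matching the size quoted above. The input encoding (x1, x2, e, a) is an assumption.
W1, b1 = rng.normal(scale=0.1, size=(10, 4)), np.zeros(10)
W2, b2 = rng.normal(scale=0.1, size=(1, 10)), np.zeros(1)

def q_approx(x1, x2, e, a):
    """Estimate Q(s, a) for state s = ((x1, x2), e) and action a (0 = reject, 1 = accept)."""
    x = np.array([x1, x2, e, a], dtype=float)
    h = np.tanh(W1 @ x + b1)          # hidden layer of 10 units
    return float((W2 @ h + b2)[0])    # single linear output: the estimated Q-value
```

Training such a network, with ΔQ back-propagated as the error signal as described in the next subsection, would replace the table update used by TRL-CAC.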


11.3.2.1. Implementation

When a call arrives (new call or handoff), the algorithm determines whether quality of service would be violated by accepting this call (by simply verifying whether there are enough channels available in the cell). If quality of service would be violated, the call is rejected; otherwise the action is chosen according to the formula:

a = \arg\max_{a \in A(s)} Q^*(s, a)    [11.4]

where A(s) = {1 = accept, 0 = reject}.

In particular, [11.4] implies the following procedure: when a call arrives, the acceptance Q-value and the rejection Q-value are determined from the table (TRL-CAC) or from the neural network (NRL-CAC). If the rejection Q-value is the higher one, the call is rejected; otherwise, the call is accepted.

In both cases, and in order to find the optimal values Q*(s, a), the function is updated at each system transition from state s to state s'. For the two algorithms, this is done in the following manner:

– TRL-CAC: formula [11.2] is used to update the appropriate Q-value in the table;

– NRL-CAC: when a neural network is used to store the function Q, a second learning procedure is required to find the neural network weights. This procedure uses the back-propagation (BP) algorithm [MIT 97]. In this case, ΔQ defined by formula [11.3] is used as the error signal which is back-propagated through the different layers of the neural network.

11.3.2.2. Exploration

For a correct and efficient execution of the Q-learning algorithm, all potentially significant state-action pairs (s, a) must be explored. To this end, during a long enough learning period, the action is not chosen from formula [11.4] but, with an exploration probability ε, from the following formula [11.5]:

a = \arg\min_{a \in A(s)} \mathrm{visits}(s, a)    [11.5]

where visits(s, a) indicates the number of times the configuration (s, a) has been visited. This heuristic, called ε-directed, significantly accelerates Q-value convergence.
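A minimal sketch of this ε-directed selection is given below, combining formulas [11.4] and [11.5]. The names (choose_action, visits, epsilon, learning_phase) and the dictionary-based bookkeeping are illustrative assumptions; the q_table from the earlier sketch could serve as q_values here.

```python
import random
from collections import defaultdict

visits = defaultdict(int)   # counts how many times each (state, action) pair has been tried

def choose_action(s, actions, q_values, epsilon=0.1, learning_phase=True):
    """During learning, with probability epsilon explore the least visited (s, a) pair
    (formula [11.5]); otherwise act greedily with respect to the Q-values ([11.4])."""
    if learning_phase and random.random() < epsilon:
        a = min(actions, key=lambda b: visits[(s, b)])     # epsilon-directed exploration
    else:
        a = max(actions, key=lambda b: q_values[(s, b)])   # greedy choice, as in [11.4]
    visits[(s, a)] += 1
    return a
```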


allocation and admission control has been previously studied [NAG 95, TAJ 88, YAN 94].

This section presents a new approach for solving the dynamic resource allocation problem while also considering call admission control in DRA systems. For the reasons mentioned in the previous section, call admission control is vital when the network supports several classes of clients. Allocation policies are obtained by using the reinforcement learning algorithm Q-learning. The main functions of the proposed mechanism, called Q-DRA and presented in this section, are: accepting some clients and rejecting others, and allocating the best available channel to accepted clients. The goal is to maximize the accumulation of revenue received over time.

We now briefly describe the formulation of the problem as an SMDP and the implementation of the Q-learning algorithm capable of solving this SMDP.

11.4.1. Problem formulation

This contribution is an extension of the study by Nie and Haykin [NIE 99] and belongs to the family of exhaustive-search DRA algorithms. We consider the joint resolution of the dynamic resource allocation problem and the call admission control problem in a cellular network. This network contains N cells and M available channels maintained in a common pool. It supports two traffic classes (contrary to [NIE 99], where only one traffic class – telephone calls – was taken into account). Each channel can be temporarily allocated to any cell, as long as the constraint on the reuse distance is satisfied (a given signal quality must be maintained).

Calls arrive in a cell and leave according to appropriate distributions. The network then chooses to accept or reject these connection requests. If a call is accepted, the system allocates one of the available channels to it from the common pool. The goal of the network administrator is to find a dynamic resource allocation policy capable of maximizing long-term gains while decreasing handoff blocking probabilities (contrary to [NIE 99], which does not give any priority to handoff calls).

We have chosen the set of states as {s = (i, D(i), (x_1, x_2), e_i)}, where D(i) represents the number of available channels in cell i where event e_i has occurred, x_k represents the number of current calls from class C_k, and e_i indicates the arrival of a new call or a handoff call in cell i. When an event occurs, the agent must choose one of the possible actions A(s) = {0 = reject} ∪ {1, …, M}. When a call ends, no action has to be taken. The agent has to determine a policy for accepting or rejecting a call and, when a call is accepted, for allocating the channel that maximizes the accumulation of received gains, knowing only the current network state s.


This system constitutes an SMDP with a finite set S = {(i, D(i), x, e)} as its space of states and a finite set A = {0, 1, …, M} as its space of possible actions. The choice of revenues that we have used takes intercellular interference into account, and its consequence is that channels already used in compact cells (cells with an average minimum distance between co-channel cells [ZHA 91]) have a better chance of being chosen. Contrary to other studies, Q-DRA considers the call type and grants priority to handoff calls.

11.4.2. Algorithm implementation

After the formulation of the problem as an SMDP, we describe the implementation of the Q-learning algorithm capable of solving it. The cellular system studied is made up of N = 36 hexagonal cells and M = 70 channels available in a common pool. We use a reuse distance D = √21 R (where R represents the cell radius). This implies that if a channel is allocated to a cell i, then it cannot be reused in the first two rings of cells around i because of co-channel interference. In this way, there are at most 18 cells interfering with each cell of the system. The temporal propagation constant γ has been set to 0.5 and the learning rate α to 0.1.

In the previous section we used a neural network to represent the Q-values (NRL-CAC), but this time we have chosen an approximation based on state aggregation. Instead of representing the exact number of calls x_i for each traffic class C_i, we characterize the traffic as low, medium or high. The space of states is thus reduced and a simple table can be used to represent the aggregated states. Because identical states (or states with a similar number of current calls) have the same Q-values, the performance loss linked to aggregation is insignificant [TON 99].

11.4.2.1. Implementation

When a call arrives (new call or handoff call) in cell i, the algorithm determines whether quality of service would be violated by accepting this call (by simply verifying whether there are free channels in the common pool). If quality of service would be violated, the call is rejected; otherwise the action is chosen according to the following expression:

a = \arg\max_{a \in A(s)} Q^*(s, a)    [11.8]

where A(s) = {0 = reject, 1, 2, …, M}.


Formula [11.8] implies the following procedure. When there is a call connection attempt in cell i, the rejection Q-value (a = 0) as well as the acceptance Q-values (a = 1, 2, …, M) are determined from the Q-value table. The acceptance Q-values are the Q-values corresponding to the choice of each channel a (a = 1, 2, …, M) to serve the call. If the rejection has the highest value, then the call is rejected. Otherwise, if one of the acceptance values has the highest value, the call is accepted and channel a is allocated to it.

11.4.2.2. Exploration

For a correct and efficient execution of the Q-DRA algorithm, during a relatively long learning period the action is not chosen from formula [11.4] but drawn from a Boltzmann distribution [WAT 89]. The idea is first to favor exploration (the probability of executing actions other than the one with the highest Q-value) by using all possible actions with the same probability, and then to move gradually towards the use of the action with the highest Q-value (an illustrative sketch of this selection rule is given after the list of experiments below). The values learned in this way are used later to initialize the Q-values in Q-DRA.

11.4.3. Experimental results

In order to study the performance of Q-DRA, a set of simulations was carried out. We compared Q-DRA with the greedy-DRA policy [TON 99] (a policy which randomly chooses a channel to serve a call with no measure of interference, each of the M channels having the same probability of being chosen), as well as with the DRA-Nie algorithm [NIE 99]. The performance of the algorithms was evaluated in terms of gains, losses and handoff blocking probabilities. The simulations included: a case of traffic load evenly distributed over all cells, a case of unevenly distributed load, a case of traffic load varying in time, and a case of equipment failure. We will not present all the results obtained (see [SEN 03a, SEN 03c] for more information):

– even traffic distribution: the first experiment considers a constant traffic load in the 36 cells for both traffic classes. We used the policies learned during the learning period, but with five different traffic loads (for both the C1 and C2 classes);

– uneven traffic distribution: in this second experiment, we used the policies learned during the learning period, but the traffic loads are no longer evenly distributed over the 36 cells. The average traffic load considered was 7.5 Erlangs;
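Referring back to the exploration rule of section 11.4.2.2, the short sketch below samples an action (reject, or one of the M channels) from a Boltzmann distribution over the Q-values. The names and the temperature parameter are illustrative assumptions; gradually lowering the temperature over the learning period would reproduce the shift from near-uniform exploration to greedy selection described above.

```python
import numpy as np

def boltzmann_action(s, actions, q_values, temperature=1.0):
    """Sample an action from a Boltzmann (softmax) distribution over the Q-values.
    A high temperature explores almost uniformly; a low temperature is near-greedy."""
    q = np.array([q_values[(s, a)] for a in actions], dtype=float)
    prefs = np.exp((q - q.max()) / temperature)   # subtract the max for numerical stability
    probs = prefs / prefs.sum()
    return int(np.random.choice(actions, p=probs))
```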


generally less desirable than the break of a call connection belonging to a lower priority class, new CAC mechanisms are vital. In fact, efficient call admission control is required to cope with the limited radio resources available at the cellular network radio interface. Efficient management of radio resources through dynamic allocation policies also proves essential to prevent this type of problem. These new mechanisms consist of defining resource management rules for each traffic class, in order to optimize the usage rate and to satisfy the multiple QoS constraints.

To prevent this type of problem, several propositions exist in other works, and their main goal is to avoid the inconvenience caused to users by communication breaks. However, we have noticed that these solutions often ignore the experience and knowledge which could be acquired during real-time system execution.

The contributions presented in this chapter, relative to cellular networks, are intended to benefit from this experience and knowledge in order to optimize the handling of problems encountered in cellular networks. In the first contribution we proposed a new approach for solving the CAC problem in a multiservice cellular network where channels are permanently allocated to cells. In the second, we proposed a new approach to the dynamic resource allocation problem in a multiservice cellular network. This last contribution is ingenious, since it combines the search for an optimal CAC policy with the best dynamic channel allocation strategy. The mechanisms proposed to solve such complex problems as those linked to cellular networks use reinforcement learning as their solution. These are creative and intelligent solutions. In addition to the creativity of these mechanisms, the advantages gained by using such approaches can be summarized as follows. Contrary to other studies (studies based on mathematical models or simulations, presuming fixed experimental parameters), these solutions adapt to variations of the network state (i.e. variations of traffic conditions, equipment failures, etc.). Because of their distributed nature, they can easily be implemented in each base station, which makes them more attractive. Call admission control and dynamic allocation decisions are determined quickly with little computational effort. They are obtained by a simple specification of preferences between service classes. We have also demonstrated, with a large set of experiments, that these mechanisms give better results than other heuristics. They are distributed algorithms, and the signaling information exchanged between base stations is almost null.
These mechanisms are therefore all the more attractive because of their implementation simplicity.

Finally, we can say that our main contribution has been to propose and test mechanisms for solving problems encountered in mobile networks (CAC and dynamic resource allocation). We have been able to demonstrate that it is possible to use techniques from AI and, more specifically, learning techniques, in order to develop efficient, robust and easy-to-implement mechanisms.


11.6. Bibliography

[AGH 01] AL AGHA K., PUJOLLE G., VIVIER G., Réseaux de mobiles et réseaux sans fil, Eyrolles, 2001.

[BOY 94] BOYAN J.A., LITTMAN M.L., "Packet routing in dynamically changing networks: a reinforcement learning approach", Advances in Neural Information Processing Systems (NIPS'94), vol. 6, p. 671-678, San Mateo, 1994.

[BRE 84] BREIMAN L., FRIEDMAN J.H., OLSEN R.A., STONE C.J., Classification and Regression Trees, Chapman & Hall, 1984.

[CAL 92] CALHOUN G., Radio cellulaire numérique, Tec & Doc, 1992.

[COR] CORSINI M.M., Class on reinforcement learning, available at: http://www.sm.ubordeaux2.fr/~corsini/Cours/HeVeA/rl.html.

[COX 72] COX D.C., REUDINK D.O., "Dynamic channel assignment in two dimensional large mobile radio systems", Bell Syst. Tech. J., vol. 51, p. 1611-1627, 1972.

[DEL 93] DEL RE E., FANTACCI R., RONGA L., "A dynamic channel allocation technique based on Hopfield neural networks", IEEE Trans. Vehicular Technology, vol. 45, p. 26-32, February 1996.

[DIM 93] DIMITRIJEVIC D.D., VUCETIC J., "Design and performance analysis of the algorithms for channel allocation in cellular networks", IEEE Trans. Vehicular Technology, vol. 42, p. 526-534, November 1993.

[GIB 96] GIBSON J.D., The Telecommunications Handbook, IEEE Press, 1996.

[KAT 96] KATZELA I., NAGHSHINEH M., "Channel assignment schemes for cellular mobile telecommunications systems", IEEE Personal Communications Magazine, June 1996.

[LEV 97] LEVINE D.A., AKYILDIZ I.F., NAGHSHINEH M., "A Resource Estimation and Call Admission Algorithm for Wireless Multimedia Networks Using the Shadow Cluster Concept", IEEE/ACM Transactions on Networking, vol. 5, no. 1, p. 1-12, February 1997.

[MANET] IETF MANET Working Group (mobile ad hoc networks), www.ietf.org/html.charters/manet-charter.html.

[MAR 97] MARBACH P., TSITSIKLIS J.N., "A Neuro-Dynamic Approach to Admission Control in ATM Networks: The Single Link Case", ICASSP'97, 1997.

[MAR 98] MARBACH P., MIHATSCH O., SCHULTE M., TSITSIKLIS J.N., "Reinforcement learning for call admission control and routing in integrated service networks", in JORDAN M. et al. (ed.), Advances in NIPS 10, MIT Press, 1998.


[MAR 00] MARBACH P., MIHATSCH O., TSITSIKLIS J.N., "Call admission control and routing in integrated services networks using neuro-dynamic programming", IEEE Journal on Selected Areas in Communications (JSAC'2000), vol. 18, no. 2, p. 197-208, February 2000.

[MCC 43] MCCULLOCH W.S., PITTS W., "A logical calculus of the ideas immanent in nervous activity", Bulletin of Math. Biophysics, vol. 5, 1943.

[MIT 97] MITCHELL T.M., Machine Learning, McGraw-Hill, 1997.

[MIT 98] MITRA D., REIMAN M.I., WANG J., "Robust dynamic admission control for unified cell and call QoS in statistical multiplexers", IEEE Journal on Selected Areas in Communications (JSAC'1998), vol. 16, no. 5, p. 692-707, June 1998.

[NAG 95] NAGHSHINEH M., SCHWARTZ O., "Distributed call admission control in mobile/wireless networks", PIMRC, Proceedings of Personal Indoor and Mobile Radio Communications, 1995.

[NIE 99] NIE J., HAYKIN S., "A Q-Learning based dynamic channel assignment technique for mobile communication systems", IEEE Transactions on Vehicular Technology, vol. 48, no. 5, September 1999.

[RAM 96] RAMJEE R., NAGARAJAN R., TOWSLEY D., "On Optimal Call Admission Control in Cellular Networks", IEEE INFOCOM, p. 43-50, San Francisco, March 1996.

[SEN 03a] SENOUCI S., Application de techniques d'apprentissage dans les réseaux mobiles, PhD Thesis, Pierre and Marie Curie University, Paris, October 2003.

[SEN 03b] SENOUCI S., BEYLOT A.-L., PUJOLLE G., "Call Admission Control in Cellular Networks: A Reinforcement Learning Solution", ACM/Wiley International Journal of Network Management, vol. 14, no. 2, March-April 2003.

[SEN 03c] SENOUCI S., PUJOLLE G., "New Channel Assignments in Cellular Networks: A Reinforcement Learning Solution", Asian Journal of Information Technology (AJIT'2003), vol. 2, no. 3, p. 135-149, Grace Publications Network, July-September 2003.

[SIV 90] SIVARAJAN K.N., MCELIECE R.J., KETCHUM J.W., "Dynamic channel assignment in cellular radio", Proc. IEEE 40th Vehicular Technology Conf., p. 631-637, May 1990.

[SRI 00] SRIDHARAN M., TESAURO G., "Multi-agent Q-learning and Regression Trees for Automated Pricing Decisions", Proceedings of the Seventeenth International Conference on Machine Learning (ICML'00), Stanford, June-July 2000.

[SUT 98] SUTTON R.S., BARTO A.G., Reinforcement Learning: An Introduction, MIT Press, 1998.

[TAJ 88] TAJIMA J., IMAMURA K., "A strategy for flexible channel assignment in mobile communication systems", IEEE Transactions on Vehicular Technology, vol. 37, p. 92-103, May 1988.
[TON 99] TONG H., Adaptive Admission Control for Broadband Communications, PhD Thesis, University of Colorado, Boulder, Summer 1999.


[TON 00] TONG H., BROWN T.X., "Adaptive Call Admission Control under Quality of Service Constraints: a Reinforcement Learning Solution", IEEE Journal on Selected Areas in Communications (JSAC'2000), vol. 18, no. 2, p. 209-221, February 2000.

[WAT 89] WATKINS C.J.C.H., Learning from Delayed Rewards, PhD Thesis, University of Cambridge, Psychology Department, 1989.

[WAT 92] WATKINS C.J.C.H., DAYAN P., "Q-learning", Machine Learning, vol. 8, p. 279-292, 1992.

[YAN 94] YANG W.B., GERANIOTIS E., "Admission policies for integrated voice and data traffic in CDMA packet radio networks", IEEE Journal on Selected Areas in Communications, vol. 12, p. 654-664, May 1994.

[ZHA 89] ZHANG M., YUM T.S., "Comparisons of channel assignment strategies in cellular mobile systems", IEEE Trans. Vehicular Technology, vol. 38, no. 1, p. 211-215, June 1989.

[ZHA 91] ZHANG M., YUM T.S., "The non-uniform compact pattern allocation algorithm for cellular mobile systems", IEEE Trans. Vehicular Technology, vol. 40, no. 2, p. 387-391, May 1991.
