Actas JP2011 - Universidad de La Laguna

Actas XXII Jornadas de Paralelismo (JP2011), La Laguna, Tenerife, 7-9 septiembre 2011

Fig. 1. Modeled system.

II. RELATED WORK

Scheduling in multiprocessor systems can be performed in two main ways depending on the task queue management: global scheduling, where a single task queue is shared by all the processors, or partitioned scheduling, which uses a private task queue for each processor. The former allows task migrations, since all the processors share the same task queue. In the latter case, the scheduling in each processor can be performed by applying well-established uniprocessor algorithms such as EDF (Earliest Deadline First) or RMS (Rate Monotonic Scheduling). An example of global scheduling for sporadic tasks can be found in [9].

In the partitioned scheduling case, research can focus either on the partitioner or on the scheduler. Acting on the partitioner, recent works have addressed the energy-aware task allocation problem [10], [11], [8]. For instance, Wei et al. [10] reduce energy consumption by exploiting the parallelism of multimedia tasks on a multicore platform, combining DVS with switching off cores. Aydin et al. [11] present a new algorithm that reserves a subset of processors for the execution of tasks whose utilization does not exceed a threshold. Unlike our work, none of these techniques uses task migration among cores.

Some proposals have dealt with task migration. Brandenburg et al. [12] evaluate several scheduling algorithms (both global and partitioned) in terms of scalability, although power consumption was not investigated. In [13], Zheng divides tasks into fixed and migration tasks, allocating each of the latter to two cores so that it can migrate from one to the other.
Unlike our work, that paper does not consider dynamic workload changes (tasks arriving to and leaving the system); instead, all tasks are assumed to arrive at the same instant, so migrations can be scheduled off-line. Seo et al. [5] present a dynamic repartitioning algorithm with migrations to balance the workload and reduce consumption. In [14], Brião et al. analyze how the migration of soft tasks affects NoC-based MPSoCs in terms of deadline misses and energy consumption. These two latter works focus on non-threaded architectures.

Regarding the scheduler, in [15] El-Haj-Mahmoud et al. virtualize a simultaneous multithreaded (SMT) processor into multiple single-threaded superscalar processors with the aim of combining high performance with real-time formalism. In order to improve the predictability of real-time tasks, Cazorla et al. [16] devise an interaction technique between the Operating System (OS) and an SMT processor. Notice that these works do not tackle energy consumption.

III. SYSTEM MODEL

Figure 1 shows a block diagram of the modeled system. When a task reaches the system, a partitioner module allocates it to a task queue associated with a core, which contains the tasks that are ready for execution on that core. These task queues are components of the power-aware scheduler, which communicates with a DVS regulator in charge of adjusting the working frequency of the cores in order to satisfy the workload requirements. To focus our research, the experiments consider a two-core processor implementing three hardware threads each.

Processor cores implement the coarse-grain multithreading paradigm, which switches the running thread when a long-latency event occurs (e.g., a main memory access). Thus, the running thread issues instructions to execute while the other threads access memory, overlapping their execution.
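The excerpt does not specify the partitioner's allocation heuristic or the DVS regulator's frequency policy. As a minimal sketch only, assuming a worst-fit utilization-based allocation and a frequency set proportionally to each core's utilization (both are our assumptions, not the paper's actual policies):

```python
def allocate(task_util, core_utils, capacity=1.0):
    """Worst-fit partitioner sketch: place the arriving task on the
    least-loaded core's queue (hypothetical heuristic; the text only
    says the partitioner assigns each task to one core's queue)."""
    core = min(range(len(core_utils)), key=core_utils.__getitem__)
    if core_utils[core] + task_util > capacity:
        return None  # reject: no core can host the task
    core_utils[core] += task_util
    return core

def dvs_frequency(core_util, f_max=2.0e9):
    """DVS regulator sketch: run each core just fast enough to cover
    its current utilization (assumed proportional policy, in Hz)."""
    return core_util * f_max
```

For example, with per-core utilizations [0.5, 0.2], a new task of utilization 0.3 would land on the second core, whose frequency would then be raised accordingly.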
In the mo<strong>de</strong>led system, the issueslots are always assigned to the thread executing the taskwith the highest real-time priority. If this thread stalls dueto a long latency memory event, then the issue slots aretemporarily reassigned until the event is resolved.A. Real-Time Task BehaviorThe system workload executes periodic hard real-timetasks. There is no task <strong>de</strong>pen<strong>de</strong>ncy and each task has itsown period of computation. A task can be launched to executeat the beginning of each active period, and it mustend its execution before reaching its <strong>de</strong>adline (hard realtime).The end of the period and the <strong>de</strong>adline of a task areconsi<strong>de</strong>red to be the same for a more tractable schedulingprocess. There are also some periods where tasks do notexecute since they are not active (i.e., inactive periods).In short, a task arrives to the system, executes severaltimes repeatedly, leaves the system, remains out of thesystem for some periods, and then it enters the system<strong>JP2011</strong>-186
