20.01.2013 Views

5kk80 Assignment 2 – Design-Space Exploration

5kk80 Assignment 2 – Design-Space Exploration

5kk80 Assignment 2 – Design-Space Exploration

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

<strong>5kk80</strong> <strong>Assignment</strong> 2 <strong>–</strong> <strong>Design</strong>-<strong>Space</strong> <strong>Exploration</strong><br />

1 Introduction<br />

Eindhoven University of Technology<br />

Department of Electrical Engineering<br />

3 April 2008<br />

A well-known approach for design-space exploration of multiprocessor systems is the Y-chart<br />

approach shown in Figure 1. A multiprocessor system consists of a (set of) parallelised application(s)<br />

that are mapped onto the multiprocessor platform. The overall performance of the system<br />

depends on all these three aspects. The mapping determines for example which processor unit of<br />

the platform executes what tasks of the application and which communication units realise the<br />

dependencies between the tasks. After developing a model incorporating all three parts, the obtained<br />

performance results may give hints to improving the application, platform and/or mapping.<br />

Iteratively applying such improvements leads to finding an optimal design solution. The goal of<br />

this assignment is to explore the design space of a multiprocessor system according the Y-chart.<br />

The assignment consists of two parts. <strong>Assignment</strong> 2-1 concentrates on searching the design<br />

space for Pareto optimal mappings and platforms for a single execution of a given application.<br />

<strong>Assignment</strong> 2-2 extends the search for an optimal realisation of a streaming or pipelined execution<br />

of the same application, where additional configuration options for the platform (voltage scaling<br />

and operating system type) as well as modification of the application are allowed.<br />

Figure 1: Y-chart approach.<br />

For your design-space exploration activities, a POOSL model is provided, which can be downloaded<br />

from StudyWeb. This handout discusses the provided POOSL model of the considered<br />

multiprocessor system in more detail as well as the actual assignment.<br />

2 Application<br />

A task graph of the application considered in this assignment is depicted in Figure 2. Each node<br />

represents certain functionality (like decoding or filtering) comprising a task that can potentially<br />

1


e executed in parallel with other tasks. The edges represent dependencies between the tasks;<br />

solid lines denote data dependencies, whereas dotted lines indicate control dependencies. Such<br />

dependencies emerge from the communication of information between the tasks, which is implemented<br />

using FIFO buffers. The unit of information that is exchanged between tasks is referred<br />

to as a token, where each token contains 1 Byte of information. The dot with label 4 for buffer<br />

F17 indicates that F17 already contains 4 tokens when the application is started. They allow a<br />

streaming or pipelined execution of the application, where Task1 can be executed at maximum 4<br />

times before Task7 must be executed.<br />

Figure 2: Task graph with data and control dependencies.<br />

Figure 3: Markov chain for determining<br />

scenarios in Task1.<br />

The application in Figure 2 can operate in two modes or scenarios named S1 and S2. The<br />

firing (execution) of a task starts with determining the scenario in which it will operate. Task1<br />

determines the scenario based on the Markov chain (state machine with probabilities) in Figure 3.<br />

This Markov chain models the real functionality of Task1 that interprets certain input data (which<br />

may for example be read from a file) to determine the scenario. Each time Task1 fires, a transition<br />

in the Markov chain is made, where the resulting state determines the scenario. All other tasks<br />

determine the scenario by means of receiving a token from Task1 through their Control port (i.e.,<br />

through the control dependencies in Figure 2). After fixing the scenario, a task continues with<br />

receiving a token on all its other inputs. When all input data has become available, the actual<br />

execution of the task can be performed by the processor node on which the task is mapped. After<br />

completing the execution, the task finalises its firing with producing a token on all its outputs.<br />

For Task1, this also includes sending tokens to the Control port of all other tasks valued with<br />

the scenario in which they operate.<br />

Not all tasks and dependencies are active in both scenarios. Task4 does not perform any<br />

functionality in scenario S2 implying that also no information is communicated through buffers F5<br />

and F12. Moreover, no information is exchanged through F9 in scenario S2. On the other hand,<br />

F11 is inactive in scenario S1. In addition to these variations, the number of tokens produced<br />

and consumed by the tasks using F8 differs for scenarios S1 and S2. A detailed overview of the<br />

resource requirements for the tasks and buffers is given in Tables 1 and 2 respectively. Note that<br />

the execution times in Table 1 are in cycles and not in seconds, see also Section 3.<br />

In <strong>Assignment</strong> 2-1, only a single firing in scenario S1 of each task is considered (the Markov<br />

chain in Figure 3 is not used). In <strong>Assignment</strong> 2-2, the application is to be executed continuously<br />

in a streaming (pipelined) fashion, where all tasks concurrently fire in subsequent scenarios.<br />

Task Graph Transformations <strong>Assignment</strong> 2-1 assumes that the task graph is fixed. For<br />

many applications, it is however possible to adapt the task graph by exploiting for instance task<br />

parallelism. Conversely, some of the overhead (like communication and task switching overhead)<br />

incurred by the parallelisation can be reduced or eliminated by combining tasks. In <strong>Assignment</strong><br />

2


MIPS Scenario S1 Scenario S2<br />

Task Exec. Time (Cycles) Mem. (Bytes) Exec. Time (Cycles) Mem. (Bytes)<br />

Task1 60000 43008 52000 24576<br />

Task2 18000 18432 46000 16384<br />

Task3 24000 36864 55000 49152<br />

Task4 40000 65536 0 0<br />

Task5 85000 20480 65000 28672<br />

Task6 70000 32768 120000 40960<br />

Task7 90000 49152 85000 34816<br />

ARM7 Scenario S1 Scenario S2<br />

Task Exec. Time (Cycles) Mem. (Bytes) Exec. Time (Cycles) Mem. (Bytes)<br />

Task1 40000 32768 48000 34816<br />

Task2 18000 28672 32000 18432<br />

Task3 15500 55296 28000 30720<br />

Task4 24000 24576 0 0<br />

Task5 32500 16384 55000 22528<br />

Task6 64000 20480 46000 36864<br />

Task7 56000 32768 48000 47104<br />

TriMedia Scenario S1 Scenario S2<br />

Task Exec. Time (Cycles) Mem. (Bytes) Exec. Time (Cycles) Mem. (Bytes)<br />

Task1 72000 49152 48000 20480<br />

Task2 15000 24576 32000 12288<br />

Task3 20000 65536 40000 57344<br />

Task4 64000 98304 0 0<br />

Task5 96000 32768 56000 30720<br />

Task6 108000 16384 64000 46080<br />

Task7 80000 12288 98000 22528<br />

Table 1: Resource requirements of the tasks.<br />

2-2, the task graph in Figure 2 is assumed to be the most parallel version of the application.<br />

From this most parallel version, it is allowed to adapt the task graph such that at maximum one<br />

combination of two of the tasks Task2, Task3, Task4, Task5 and Task6 is performed.<br />

Combining tasks affects the control and data dependencies as follows. Since all tasks receive the<br />

same control token indicating the scenario in which they will operate, a single control dependency<br />

is needed from Task1 (instead of two). On the other hand, all data dependencies with other tasks<br />

remain valid, even if this implies multiple data dependencies between two tasks. For example, when<br />

combining Task3 and Task4, one of the control dependencies through F4 and F6 can be removed,<br />

whereas both data dependencies through F3 and F5 must remain to be modelled separately. Instead<br />

of providing a list of profiling information for all allowed task combinations, the following approach<br />

is used to derive the profiling information for the combined task as a rough approximation. On<br />

any processor type and in each scenario, the execution time of the combined task equals the sum of<br />

the execution times of the two original tasks, minus 10% (representing the reduction in overhead).<br />

Similarly, the memory requirement for the combined task equals the maximum of the amount<br />

of memory required by the original tasks, plus the number of tokens communicated between the<br />

original tasks (if any). This approach is illustrated in Figure 4 when combining Task3 and Task4<br />

into TaskX.<br />

3 Platform<br />

The platform considered in this assignment concerns a Network-on-Chip (NoC) based Multi-<br />

Processor System-on-Chip (MPSoC) in a battery-powered embedded device. Four kinds of re-<br />

3


Scenario S1 Scenario S2<br />

Buffer # Tokens # Tokens<br />

F1 2048 2048<br />

F2 1 1<br />

F3 1024 1024<br />

F4 1 1<br />

F5 2048 0<br />

F6 1 1<br />

F7 1 1<br />

F8 1024 2048<br />

Scenario S1 Scenario S2<br />

Buffer # Tokens # Tokens<br />

F9 4096 0<br />

F10 2048 2048<br />

F11 0 4096<br />

F12 4096 0<br />

F13 1 1<br />

F14 3072 3072<br />

F15 1024 1024<br />

F16 1 1<br />

F17 1 1<br />

Table 2: Resource requirements of the buffers (Each token is 1 byte).<br />

MIPS Scenario S1 Scenario S2<br />

Task Exec. Time (Cycles) Mem. (Bytes) Exec. Time (Cycles) Mem. (Bytes)<br />

TaskX 57600 65536 49500 49152<br />

Figure 4: Transformed task graph with derived profiling information for a MIPS.<br />

sources can be distinguished: processor units, communication units, storage units and an energy<br />

source. The considered platform includes one energy resource (the battery), up to four processor<br />

nodes and a NoC. The NoC of the considered platform provides point-to-point connections with<br />

a guaranteed bandwidth and latency. Each node includes a processor unit that runs an operating<br />

system on which tasks can be mapped. The processor unit has a private storage unit (data<br />

memory) to store the code and context of the tasks. Moreover, a node includes a communication<br />

unit on which buffers can be mapped that realise dependencies between the tasks that are mapped<br />

on the processor unit. This communication unit has a private storage unit (buffer memory) to<br />

store the information in the buffers mapped onto the node. On the other hand, the NoC only<br />

includes a communication unit and a storage unit for mapping dependencies between tasks that<br />

are mapped onto different nodes. Any processor, communication and storage unit drains energy<br />

from the battery when it is used.<br />

The platform has several parameters that can be set to obtain an optimal realisation of the<br />

multiprocessor system. Next to the number of processor nodes that can be used, there is a choice<br />

of three different processor types: MIPS, ARM7 and TriMedia 1 . Each processor type has different<br />

characteristics, which are summarised in Table 3. The frequency in Table 3 denotes the base<br />

frequency of the processor unit, while the context switching time refers to the number of cycles<br />

it takes to initialise the execution of another task. The power consumption refers to the power<br />

1 The processor specifications used in this assignment are fictitious. The names do not refer to real processors.<br />

4


MIPS<br />

Frequency 167000000 Cyles per Second<br />

Context Switching Time 1500 Cycles<br />

Power Consumption 0.075 Watt<br />

ARM7<br />

Frequency 100000000 Cyles per Second<br />

Context Switching Time 1200 Cycles<br />

Power Consumption 0.07 Watt<br />

TriMedia<br />

Frequency 200000000 Cyles per Second<br />

Context Switching Time 3200 Cycles<br />

Power Consumption 0.085 Watt<br />

Table 3: Processor specifications.<br />

that is used when the processor is switching tasks or executing a task. Another parameter for<br />

each node is the type of scheduler in the operating system that runs on the processor. The<br />

alternatives are FCFS (representing a first-come first serve scheduling policy without preemption)<br />

and PB (denoting a priority-based scheduling policy with preemption). For the latter type of<br />

scheduler, each task of the application must have a priority assigned to it, ranging from 1 to 7<br />

(higher number means higher priority). The final parameter of a node is the voltage scaling factor<br />

(sV ), which scales the base frequency and power consumption of the processor. Valid voltage<br />

scaling factors for any processor type are 1/4, 1/3, 1/2, 2/3, 3/4 and 1/1. Although the platform<br />

has some other parameters (like bandwidth per connection realised by a communication unit or the<br />

power consumption per communicated and stored Byte), they should remain unchanged during the<br />

assignment. <strong>Assignment</strong> 2-1 only allows changing the number of nodes and the type of processors,<br />

while <strong>Assignment</strong> 2-2 also allows altering the operating system type (including priorities) and<br />

voltage scaling factor.<br />

Operating System The two types of schedulers that can be used for operating system running<br />

on a processor are discussed in a bit more detail. A FCFS scheduler uses a FIFO queue to store ready<br />

tasks, which denote the execution requests that are obtained from the different tasks that are ready<br />

to execute. These requests are put into the queue in the order of reception. In case a request<br />

is available in the queue, the first execution request is granted by executing the corresponding<br />

task without interruptions. An advantage of this type of scheduler is that it is relatively easy to<br />

implement. A disadvantage is that a higher priority task from which an execution request was<br />

received later than a low priority task has to wait until execution of the lower priority task is<br />

finished. The PB scheduler overcomes the latter disadvantage by allowing preemption of any lower<br />

priority task. Instead of a FIFO queue, a PB scheduler maintains a list of execution requests that<br />

is ordered 2 according to the priorities of the involved tasks. Though being more influenceable by<br />

a programmer, a PB scheduler is slightly more difficult to implement. Moreover, preempting a<br />

running task comes at the price of additional context switching, while the memory used by a task<br />

is only freed after completing its execution. Many other schedulers exist for operating systems,<br />

often being much more complex. In the first part of the assignment, only PB schedulers are used,<br />

where the priority of Taski is set equal to i.<br />

Voltage Scaling Changing the supply voltage of the processor units allows trading the energy<br />

used for a given computation load against the execution time. To explain voltage scaling in<br />

more detail, consider that the clock frequency f at which an integrated circuit can operate is<br />

roughly proportional to the supply voltage V for some constant Kf , that is f = Kf · V . Recall<br />

that the energy stored in a capacitor C equals 1<br />

2 · C · V 2 (i.e., proportional to the square of the<br />

2 The PB scheduler orders multiple tasks with equal priority in a (non-preemptive) FIFO fashion.<br />

5


voltage over the capacitor). Therefore, the power P consumed by a processor as the result of<br />

charging and discharging connections is proportional to the square of its supply voltage as well<br />

as to the frequency at which this takes place: P = KP · V 2 · f for some constant KP . As a<br />

result, it can be advantageous to reduce the supply voltage to a processor if the processor has idle<br />

time; frequency (execution speed) can be exchanged for smaller power consumption. Suppose a<br />

processor needs to perform a certain workload of W cycles as indicated in Table 1. The time t<br />

to perform this workload is t = W/f = W/(Kf · V ). The energy E used during that time equals<br />

E = P · t = KP · V 2 · f · W/f = KP · V 2 · W . Now, assume that VB denotes the base supply voltage<br />

and sV the voltage scaling factor, that is V = sV · VB. Then, E = KP · (sV · VB) 2 · W = s2 V · EB,<br />

where EB = KP · V 2 B · W being the base energy used for workload W . The time it takes to execute<br />

the task is t = W/(Kf · V ) = W/(Kf · sV · VB) = tB/sV , where tB = W/(Kf · VB) is the execution<br />

time for the base frequency and supply voltage. Thus, scaling the voltage by a factor sV increases<br />

the execution time by a factor 1/sV and saves energy by a factor of s2 V . From an energy point of<br />

view, a parallel implementation on multiple, slower processors is therefore preferred over a single<br />

very fast processor. This is one of the primary motivations for using MPSoCs. Note also that a<br />

solution that is too fast can be improved by decreasing the voltage scaling factor to make it slower,<br />

while also reducing power consumption. Similarly, a solution that is too slow can be made faster<br />

by increasing the voltage scale factor at the expense of additional power consumption.<br />

4 Mapping<br />

Given a particular (set of) application(s) and platform, the mapping describes which processor<br />

unit, communication unit and storage unit is/are used to realise the execution of tasks and the<br />

communication of tokens. For the considered multiprocessor system, it is only necessary to specify<br />

which tasks of the application are mapped onto which processor nodes since the mapping of<br />

buffers as well as which storage units are used to store what information can be derived from<br />

that automatically. As discussed in Section 3, processor units and communication units have<br />

private memories to store the tasks and tokens respectively. Moreover, each node includes a<br />

communication unit that realises all dependencies between tasks that are mapped on that node,<br />

while all dependencies between tasks mapped on different nodes must be realised by the NoC.<br />

5 Performance Metrics<br />

Application Next to the task graph, an application is characterised by a number of performance<br />

constraints. The following constraints are to be satisfied:<br />

• Latency: ≤ 1/500 seconds;<br />

• Throughput: ≥ 800 Firings of Task7 per second;<br />

• Deadline Miss Probability: ≤ 5%.<br />

Latency is defined here as the time between starting the first firing of Task1 and the corresponding<br />

completion of Task7. In general, the latency can be larger than the time between two<br />

executions of Task1 in case of streaming (pipelined) execution. The throughput indicates the<br />

average number of firings of Task7 per second in such case. Since the application can operate<br />

in different scenarios, there is some variation in the latency possible. Hence, the deadline for the<br />

next completion of Task7 may be missed, but on average, not more than 5% of those completions<br />

are allowed to be late.<br />

Platform Apart from the constraints for the application, there are some optimisation criteria for<br />

the platform. The optimisation criteria for the platform are (in order of importance): (1) energy or<br />

power consumption and (2) amount of resources. The latter refers, amongst others, to the number<br />

of processor units, the size of storage units and the number of concurrent connections to be served<br />

6


y communication units. The overall goal therefore is to look for the minimal configuration of<br />

the system that satisfies the constraints, thereby consuming as little power as possible. Hence, it<br />

makes sense to define some performance metrics for the platform to identify bottlenecks:<br />

• Processor Unit: average utilisation;<br />

• Storage Unit: maximum and average occupancy;<br />

• Communication Unit: maximum and average number of concurrently served connections;<br />

• Battery: peak and average power consumption, where the latter equals the amount of<br />

energy consumed in 1 second.<br />

6 <strong>Assignment</strong><br />

This assignment consists of two parts. The provided POOSL model, which is discussed in more<br />

detail in Section 7, forms the starting point for both parts.<br />

Part 1 <strong>Assignment</strong> 2-1 aims at exploring the trade-off between energy consumption, latency<br />

and number of processor nodes for alternative mappings and processor nodes of different types,<br />

considering a single execution of each task of the application in scenario S1. Use the provided<br />

POOSL model to explore the space of potential solutions that satisfy the latency and deadline<br />

miss constraints for the application and determine a set of feasible, Pareto-optimal configurations<br />

(platform description + mapping).<br />

Recall that in this part of the assignment, the processor units may only use the PB scheduler<br />

type of operating system (where the priority of Taski equals i) and the voltage scaling factors<br />

equal 1/1. Transformation of the task graph is also not considered in this part.<br />

Part 2 In <strong>Assignment</strong> 2-2, the design space is extended with the possibility to transform the<br />

task graph, alternative operating systems (including priorities) and voltage scaling options (which<br />

can be set individually for each of the different nodes). The goal is to find a single optimal solution<br />

in terms of power consumption and amount of resources, subject to the throughput and deadline<br />

miss constraints for the application, which is now executed in a streaming fashion.<br />

Ultimately, 2 June at 9.00h AM, you need to have delivered a short report on your findings<br />

(max. 8 pages, excluding graphs) together with at least the POOSL models that represent your<br />

optimal design solution(s) for both assignment parts, by means of submission to StudyWeb. Please<br />

submit your work as a single zip archive via the <strong>Assignment</strong>2/Submit folder. Studyweb<br />

confirms the correct submission of your work. It is then timestamped and moved to a folder not<br />

visible to you. Please do not resubmit if Studyweb confirmsthe sucessful submission of you file.<br />

Taking the guidelines for documenting experimental research into account, the (individually)<br />

written report should elaborate on at least following aspects for both parts of the assignment:<br />

• The tradeoffs that you have discovered and the resulting design solution(s)<br />

• The approach you used to search the design space for finding your design solution(s), including<br />

an argumentation why you selected this approach<br />

• The impact of streaming execution, task graph transformations, different operating system<br />

types and voltage scaling (Discuss your hypotheses about these aspects in relation to results<br />

you obtained for <strong>Assignment</strong> 2-1)<br />

• The adequacy of abstractions made in the provided model regarding the platform. So,<br />

how well does the model represent a real battery-powered NoC-based MPSoC? One may<br />

for example investigate what abstractions are made in the NoC model for <strong>Assignment</strong> 2 in<br />

7


comparison to those used in <strong>Assignment</strong> 1 and what impact these abstractions may have on<br />

the obtained performance results<br />

If you still have questions, you may contact Bart Theelen (B.D.Theelen@tue.nl) or Marc<br />

Geilen (M.C.W.Geilen@tue.nl). You may also exploit the office hours every Wednesday between<br />

12:30h and 13:30h in PT 9.19. Please announce your visit via B.D.Theelen@tue.nl.<br />

7 Provided Model<br />

This section discusses the provided POOSL model and how it can be used to complete the assignment.<br />

The screen shot of SHESim in Figure 5 shows that it includes a representation of<br />

the Application and the NoC-based MPSoC platform. The SimulationController process is<br />

concerned with terminating the simulation whenever the estimation results for all average performance<br />

metrics have become accurate and is not really a part of the system. It also terminates the<br />

simulation if not all estimations have become accurate after simulating 50 units of model time.<br />

7.1 Application Model<br />

Figure 5: Top-level diagram for the provided model.<br />

The Application cluster models the task graph of Figure 2 and has the following instantiation<br />

parameters that are relevant for performing the assignment:<br />

• Iterate - controls whether to execute the task graph in a streaming fashion or not. Iterate<br />

should be set to false for <strong>Assignment</strong> 2-1 and to true in <strong>Assignment</strong> 2-2;<br />

• MapTaskiTo - indicates on which processor node task i is mapped and hence, the possible<br />

values are "Node1", "Node2", "Node3" and "Node4" (Note that these are Strings; don’t<br />

forget to include the double quotes);<br />

• PriorityTaski - is the integer priority of task i, where a higher number refers to a higher<br />

priority.<br />

The Application cluster captures the structure of processes modelling the tasks and buffers<br />

of Figure 2. The process Task7 includes a monitor to evaluate the percentage of deadline misses<br />

and the throughput metric defined in Section 5. In addition, the latency metric is evaluated in<br />

8


case the Iterate instantiation parameters equals false. The monitoring results can be observed<br />

by inspecting the instance variable Status. On the other hand, the estimation results are logged<br />

to a file named Application.log. Note that no throughput estimation can be given for a single<br />

iteration of the application (i.e., for <strong>Assignment</strong> 2-1).<br />

Modelling Combined Tasks When performing task graph transformations, it is necessary to<br />

adapt process class Task1 and define a new process class for the combined task before instantiating<br />

it in the Application cluster. To enable doing so, the modelled protocols for communicating with<br />

buffers and processor nodes must be obeyed. Note that you must have opened the cluster<br />

class browser on Application when modifying class Task1 because of a bug in SHESim.<br />

Modifying Task1 is relatively easy and only involves changes to the NotifyBuffersAbout-<br />

Mapping, Reserve<strong>Space</strong>ForWrites and PerformWrites methods to ensure that communication<br />

through the control dependency in Figure 2 that has become obsolete is not performed anymore.<br />

For any task, the method NotifyBuffersAboutMapping enables automatic determination<br />

of the mapping of the buffers connected to it. Hence, for Task1, it is necessary to remove the<br />

concurrent activity on sending of the MappedTo message to the involved obsolete buffer. In the<br />

Reserve<strong>Space</strong>ForWrites, the concurrent activity on sending the ReserveRoom message and consecutively<br />

receiving the ReservationSuccessful message must be removed. Finally, in method<br />

PerformWrites, the statement for sending the WriteToken message must be removed.<br />

Defining a process class (let’s assume that it is called TaskX) to model the new combined<br />

task can best be based on defining a subclass of Task (and using an existing Task class definition<br />

as a reference). This process class provides a template that is used for defining all other<br />

tasks (except for Task1) in the provided model. When doing so, defining class TaskX only involves<br />

overloading methods NotifyBuffersAboutMapping, CheckTokenAvailabilityForReads,<br />

Reserve<strong>Space</strong>ForWrites, Release<strong>Space</strong>ForReads and PerformWrites while the reception of<br />

the token from the Control port and the communication with the processor node is already<br />

defined in the superclass Task. Method NotifyBuffersAboutMapping must include a concurrent<br />

activity for informing each buffer connected to TaskX about its mapping. This concurrent<br />

activity concerns the sending of a message MappedTo with MapTo as parameter. The method<br />

CheckTokenAvailabilityForReads models the request and acknowledgement with the buffer at<br />

each input of TaskX about the availability of a token. The protocol prescribes to send the request as<br />

a message InspectTokenAvailability and then receive the acknowledgement by means of a message<br />

TokenAvailable with some parameter of class Token. The method Reserve<strong>Space</strong>ForWrites<br />

involves the specification of a concurrent activity for informing the buffer at each output of TaskX<br />

about the intention to write a token after completing execution. Such a concurrent activity models<br />

the request and acknowledgement regarding the reservation of buffer space. The protocol<br />

prescribes to send the request by means of a message Reserve<strong>Space</strong> with a parameter that represents<br />

the token to be written (or more precisely, the size of the token to be written). To this<br />

end, a new object of class Token must be created and its size in Bytes (see also Table 2) must be<br />

initialised using data method setSize. The acknowledgement is modelled by receiving message<br />

ReservationSuccessful. The method Release<strong>Space</strong>ForReads models informing the buffer at<br />

all inputs about the completion of the execution of the task and hence that the space for the<br />

read token can be released by sending a message Release<strong>Space</strong> to all inputs. Finally, method<br />

PerformWrites informs the buffer at all outputs about the fact that the produced tokens have<br />

become available for consumption by the target tasks.<br />

After defining process class TaskX, an instance can be created in the Application. What remains<br />

to be done is properly connecting the new process with the involved buffers and initialising its<br />

instantiation parameters. Instantiation parameter Name should be a String indicating the task’s<br />

name, that must match with the new entry in the processor specification files MIPS.txt, ARM7.txt<br />

and TriMedia.txt, see also Section 7.2. The instantiation parameters MapTo and Priority initialise<br />

the processor node to which the new task is mapped and the priority of the task respectively.<br />

9


7.2 Platform Model<br />

Figure 6: Cluster class Platform.<br />

The MPSoC cluster in Figure 5 models the considered battery-powered NoC-based MPSoC platform<br />

with at maximum four processor nodes. Figure 6 gives a screen dump of the Platform cluster<br />

class definition indicating the models of the battery, NoC and four processor nodes. Note that<br />

although four processor nodes (i.e., Node1, Node2, Node3 and Node4) are instantiated in the<br />

Platform model, the mapping determines how many nodes are actually used 3 . The process<br />

PlatformMonitor is concerned with terminating the simulation whenever the estimation results<br />

for all average performance metrics have become accurate and is not really a part of the system.<br />

The ProcessorNode cluster class has the following relevant instantiation parameters:<br />

• ProcessorType - denotes the type of processor unit in the node. The possible values are<br />

"MIPS", "ARM7" and "TriMedia" (Note that these are Strings);<br />

• OSPolicy - indicates the type of scheduler for the operating system running on the processor<br />

unit. The possible values are "PB" and "FCFS" (Note that these are Strings);<br />

• VoltageScaleFactor - concerns the voltage scaling factor. Possible values are 1/4, 1/3,<br />

1/2, 2/3, 3/4 and 1/1 (Note that these must be Reals and hence, use e.g. 1/1 or 1.0).<br />

The Battery includes a monitor to evaluate the performance metrics defined in Section 5<br />

by means of the Status instance variable. The estimation results for these performance metrics<br />

are also logged in the file Battery.log. Note that a single iteration of the application will not<br />

provide sufficient information to properly estimate the average power consumption with a detailed<br />

accuracy indication. In this case, only the average power consumption that was observed during<br />

the particular simulation run is provided without any accuracy details. The other resources of<br />

the platform also include monitors, though whether they produce results depends on whether<br />

something is mapped onto the involved resource. The processor utilisation and data memory<br />

occupancy results are logged in files named ProcessorNodei.log, where i is the number of the<br />

node. On the other hand, the utilisation of the communication unit and buffer memory occupancy<br />

results are logged in files named CommunicationNodei.log, where i is again the number of the<br />

3 The instantiated processor, memory and communication resources of an unused node will not consume power.<br />

10


node. Finally, the utilisation of the communication unit and buffer memory occupancy results for<br />

the NoC are logged in file named CommunicationNoC.log.<br />

Processor Specifications The specification of the processor types and corresponding profiling<br />

information for the different tasks in Tables 3 and 1 are stored in the files MIPS.txt, ARM7.txt<br />

and TriMedia.txt. When simulating with SHESim, these 3 processor specification files must be<br />

located in the directory where the image file (originally named SHESim.im) is located. When using<br />

the Rotalumis tool, the 3 processor specification files must be located in the directory where the<br />

exported model file with extension .p4r (default name is model.p4r) is located.<br />

Specifying Combined Tasks When performing task graph transformations, it is necessary to<br />

add the profiling information derived for the combined task as illustrated in Figure 4 to the files<br />

MIPS.txt, ARM7.txt and TriMedia.txt in a similar way as for the original tasks. Ensure that<br />

you use the same task name (between double quotes) in the first column as you used to initialise<br />

the instantiation parameter of the process modelling the new task in the Application.<br />

11

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!