SimRisk: An Integrated Open-Source Tool for Agent-Based ...
[Figure 5: A cluster of quad-core servers. Each server holds a quad-core processor (Cores 1-4, each with its own L2 cache) and local memory; the servers connect through PCI-e adapters to a Fiber Optical Channel.]
a hardware architecture and an agent-based model, how to distribute threads and processes to cores and processors for better performance. This issue takes on special meaning in a cluster environment: a multi-core cluster supports both shared-memory and message-passing communication, which have very different characteristics. We will study the optimal distribution of threads and processes for minimizing communication overhead in the context of agent-based supply-chain simulation. Specifically, we will study the following methods:
(a) Exploit model structure and data dependency to improve load balancing. For example, consider the supply chain in Figure 1(b), and assume that the simulation will run on a cluster of quad-core processors whose architecture is shown in Figure 5. Figure 6 shows a distribution of threads and processes derived from heuristics based on model structure and data dependency. Supply-chain elements communicate through shipments and messages, and we assume that messages can only be passed along routes. As a general principle, the threads for a sub-network of closely coupled elements will be placed on the cores of the same processor. Closely coupled elements communicate more frequently with each other, and this communication can be implemented with less overhead using shared memory. As an example, in Figure 6 the threads for the elements of the sub-networks of w^a_21 and w^b_21 are assigned to the same processor, while the processes for the sub-networks of s^a and s^b are allocated to different processors. In general, the higher elements sit in the network hierarchy, the less they communicate with each other, since the operations of higher-level elements are planned over a much longer planning horizon. We therefore reserve shared memory for communication among closely coupled low-level elements and use message passing for communication among high-level elements.
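The placement heuristic above can be sketched as follows. This is a minimal illustration, not SimRisk's implementation: the sub-network ids, element names, core count, and greedy packing strategy are all hypothetical.

```python
# Sketch: place threads for closely coupled sub-networks on the same
# processor, so that their frequent communication goes through shared
# memory, while separate sub-networks communicate via message passing.
# Assumes quad-core processors as in Figure 5; all names illustrative.
from collections import defaultdict

CORES_PER_PROCESSOR = 4

def assign_to_processors(sub_networks):
    """Greedily pack each sub-network's elements onto one processor.

    sub_networks: dict mapping a sub-network id (e.g. 'w_a21') to the
    list of its supply-chain elements.
    Returns: element -> (processor index, core index).
    """
    placement = {}
    next_proc = 0
    for net_id, elements in sub_networks.items():
        # Keep one tightly coupled sub-network together; spill onto
        # additional processors only if it exceeds the core count.
        for i, elem in enumerate(elements):
            proc = next_proc + i // CORES_PER_PROCESSOR
            core = i % CORES_PER_PROCESSOR
            placement[elem] = (proc, core)
        # Start the next sub-network on a fresh processor; traffic
        # between sub-networks then uses message passing.
        next_proc = proc + 1
    return placement

def comm_mechanism(placement, a, b):
    """Shared memory within a processor, message passing across."""
    return ("shared-memory" if placement[a][0] == placement[b][0]
            else "message-passing")
```

A real generative engine would weigh coupling strength and load estimates rather than packing greedily, but the principle is the same: co-locate the sub-networks that exchange the most traffic.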
(b) Profile threads and optimize thread scheduling. To further improve the performance of the generated parallel simulators, we will profile the execution time and the overhead of threads, and use this information to optimize thread scheduling. Using the profiling results, the generative simulation engine will express the thread-scheduling problem as a linear programming problem and use the optimization result to define the scheduling policy for threads.
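To make the formulation concrete, the sketch below casts a toy version of this scheduling problem as a small integer program: minimize makespan plus the communication overhead paid when frequently communicating threads land on different cores. The profiled times and overhead weights are invented for illustration, and a real engine would hand the LP relaxation to a solver; exhaustive enumeration here just keeps the example self-contained.

```python
# Toy thread-scheduling problem: choose a core assignment minimizing
# makespan + cross-core communication overhead. Enumeration stands in
# for an LP/ILP solver on this deliberately tiny instance.
from itertools import product

def schedule(exec_time, comm, n_cores):
    """exec_time[i] : profiled run time of thread i
    comm[(i, j)]    : overhead paid if threads i and j sit on
                      different cores (message passing vs shared memory)
    Returns (best assignment tuple, its cost)."""
    n = len(exec_time)
    best_cost, best = float("inf"), None
    for assign in product(range(n_cores), repeat=n):
        # Makespan: the most heavily loaded core bounds the run time.
        makespan = max(sum(exec_time[i] for i in range(n)
                           if assign[i] == c) for c in range(n_cores))
        # Overhead: sum the costs of separated communicating pairs.
        overhead = sum(w for (i, j), w in comm.items()
                       if assign[i] != assign[j])
        cost = makespan + overhead
        if cost < best_cost:
            best_cost, best = cost, assign
    return best, best_cost
```

On a small instance with two heavy-communication pairs, the optimum co-locates each pair on one core and balances the pairs across cores, which is exactly the policy item (a) argues for by heuristic.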