
Online proceedings - EDA Publishing Association



24-26 September 2008, Rome, Italy

Multithreading and Strassen’s algorithms in SUNRED field solver

László Pohl
pohl@eet.bme.hu
Tel: +36-1-463-2704, Fax: +36-1-463-2973
Budapest University of Technology and Economics (BME), Department of Electron Devices
1521 Budapest, Hungary

Abstract - Complex structures can be modeled well by simulation using appropriate field solvers; however, investigating detailed models is a time-consuming process even on the latest computers. This article surveys the new developments that reduce the execution time of the thermal and electro-thermal field solver, which takes a Finite Differences Method (FDM) based model as input. This tool is based on the vectorized version of the SUNRED algorithm. One part of the developments is architectural optimization: multithreading and memory and cache system optimization. The other part is the implementation of matrix multiplication and inversion acceleration algorithms. The result is a significantly faster field solver program.

I. INTRODUCTION

Field simulation can be an extremely time-consuming examination method, especially in complex cases such as transient analysis. Transient analysis is based on a series of steady-state (DC) simulations, often with hundreds of steps. There are two ways for software engineers to reduce computation time: decreasing the number of elements in the simulated field or, when this is not possible, optimizing the software with special algorithms. An example of the first case is our earlier published algorithm [1], which lets SUNRED users simulate not only cuboid fields but fields of any shape they wish.
The number of elements can also be reduced by decreasing the resolution and detail of the field, but this is not always applicable, and the benefit is questionable.

This paper presents the new algorithms introduced in the SUNRED algorithm that result in a significant speed increase while offering almost the same accuracy when simulating the same problem as with the older version of the program. The new algorithms can be split into two groups:

• Taking advantage of the merits of modern PC architectures:
  o Multithreading on multi-CPU or multi-core systems. The POSIX Threads library [2], which is available for UNIX/Linux and Windows platforms, is applied for this purpose.
  o Other important algorithmic changes concerning memory and cache usage [3,4].
• The second group of algorithmic developments consists of the implementation of Strassen’s special matrix multiplication and inversion algorithm [5], and the fast complex matrix multiplication method [6].

II. MULTITHREADING AND OTHER ARCHITECTURAL OPTIMIZATIONS

The current version of the SUNRED algorithm is capable of steady-state simulation of finite differences equations describing thermal or coupled electro-thermal fields. The simulation model of the structure under investigation is turned into an electrical network, which is treated by the SUccessive Network REDuction algorithm (hence the name SUNRED). The purpose of the computation is to determine the node voltages of the electrical model network by applying boundary conditions and excitations. The solution is obtained by the successive algorithm shown in Fig. 1: first, the electrical network is divided into elementary cells, which are represented by their admittance matrices and inhomogeneous current vectors. Details can be found in our earlier publications, e.g. [1,7]. The elementary cells are merged by successive reduction steps (left to right in the figure).
In the final step, when only two cells remain, the voltages of the common nodes are calculated, and then the voltages of the eliminated nodes are determined in backward substitution steps.

In a reduction or substitution step, the merger or separation of two cells does not depend on the other cells of the network, so the operations are parallelizable. There is no point in starting a new thread for every reduction, because thread administration consumes time. Theoretically, the optimal case is when the number of threads equals the number of physical or logical (Hyper-Threading [8]) processing units. In practice, in the case of SUNRED, it is recommended to start more threads, because one thread can be slower than another, and more threads can even out the differences; see the benchmarking details in Section IV. In SUNRED, the number of threads can be controlled externally in the problem definition files. Fig. 1 represents the case when the node reduction runs using four threads; each thread deals with one quarter of the cells. After a reduction step the threads join, and in the next step new threads are started (because this required the least modification to the algorithm).

As shown in Fig. 1, in the last steps there are fewer threads than set, which means that in these steps the processor cores are not fully utilized. The computation time demand is similar in

© EDA Publishing/THERMINIC 2008 137 ISBN: 978-2-35500-008-9
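The per-step thread handling described in Section II can be illustrated with a minimal sketch. It is not the SUNRED implementation, which uses the POSIX Threads library; Python's standard `threading` module is swapped in here only to show the scheduling structure, and no speed-up should be expected from it. All names (`merge_cells`, `reduce_level`, `successive_reduction`) are hypothetical, and the cell merge itself is replaced by a placeholder (tuple concatenation), since the actual admittance-matrix operation is described in [1,7] rather than here.

```python
import threading


def merge_cells(a, b):
    """Hypothetical stand-in for merging two cells (here: tuple concatenation)."""
    return a + b


def reduce_level(cells, num_threads):
    """Merge neighbouring cells pairwise; return (merged cells, threads started)."""
    n_pairs = len(cells) // 2
    merged = [None] * n_pairs
    # Never start more threads than there are independent pairs to merge.
    n_active = max(1, min(num_threads, n_pairs))

    def worker(first, last):
        # Each thread handles a contiguous block of independent pairs.
        for i in range(first, last):
            merged[i] = merge_cells(cells[2 * i], cells[2 * i + 1])

    # Split the pairs as evenly as possible among the active threads.
    bounds = [n_pairs * t // n_active for t in range(n_active + 1)]
    threads = [
        threading.Thread(target=worker, args=(bounds[t], bounds[t + 1]))
        for t in range(n_active)
    ]
    for th in threads:
        th.start()
    for th in threads:
        th.join()  # all threads join before the next reduction step begins

    if len(cells) % 2:
        merged.append(cells[-1])  # an unpaired cell is carried to the next level
    return merged, n_active


def successive_reduction(cells, num_threads=4):
    """Reduce until one cell remains; also report the threads used per step."""
    threads_per_level = []
    while len(cells) > 1:
        cells, used = reduce_level(cells, num_threads)  # new threads every step
        threads_per_level.append(used)
    return cells[0], threads_per_level
```

With 16 elementary cells and four threads, `successive_reduction` reports 4, 4, 2 and 1 threads for the four reduction steps, which mirrors the observation that the last steps cannot keep all cores busy.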
