Quantum Chemistry with GAMESS - Materials Computation Center

… with high performance and potentially intelligent interconnect networks like Gigabit Ethernet [9], Myrinet [10], SCI [11], or InfiniBand [12]. A similar trend is also evident in dedicated supercomputers, where, for example, large-scale IBM SP and HP SC systems now use SMP nodes. Indeed, very large shared-memory computers, like the SGI Origin 3000 or HP GS, usually have Non-Uniform Memory Access (NUMA) architectures that can be viewed as a cluster of uniform-memory SMPs linked via a network, albeit a very good network.

With this move away from single-processor toward multi-processor based clusters, we are confronted with a considerably more complicated memory model than the one that was present when either DDI or GA was originally conceived. Now small groups of processes have equally fast access to chunks of memory, while accessing memory between groups of processes is slower. Recognizing this, plus the success and popularity of these programming models, it is pertinent to consider how they might be extended to better exploit SMP clusters. The aim of this paper is to begin to address this issue, presenting an enhanced version of DDI that includes new functionality specifically targeting SMP clusters. Using both the new and original versions of DDI, performance results are presented and discussed for a typical GAMESS computation run on a variety of MPP systems. First, however, we begin with a brief discussion of the existing DDI data-server model used in GAMESS.

- Modeled on the Global Array framework.
- The Distributed Data Interface provides a pseudo global shared-memory interface for a portion of a node's memory.
- The normal MPI version uses 2 processes per processor: 1 compute process and 1 data server.
- Sockets are used for interrupts on data servers because MPI often polls in receive.
- SHMEM and LAPI versions are also available.
- Also provides processor subgroup support.

Distributed memory is the memory reserved by all the remaining parallel processes for their portions of the distributed data. Every process in a parallel job is allowed to access or modify any element in the distributed-memory segment, regardless of its physical location; however, access to local distributed memory is assumed to be faster than access to remote distributed memory. Thus the DDI programming strategy aims to maximize the use of local distributed data while minimizing remote data requests. Note that the performance penalty for accessing distributed memory (local or remote) is completely dependent on the underlying machine.

Figure 1: The virtual shared-memory model. Each large box (grey) represents the memory available to a given CPU. The inner boxes represent the memory used by the parallel processes (rank in lower right). The gold region depicts the memory reserved for the storage of distributed data. The arrows indicate memory access (through any means) for the distributed operations: get, put and accumulate.
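The get/put/accumulate model in Figure 1 maps closely onto one-sided communication. The sketch below is not the DDI API itself; it is a minimal MPI-3 one-sided analogue (assuming an MPI-3 implementation is available), in which each process exposes a chunk of distributed memory through an MPI window and any process may get from or accumulate into any chunk, local or remote. The chunk size and the element being updated are arbitrary choices for illustration.

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, nproc;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nproc);

    /* Each process exposes a chunk of "distributed" memory (the gold
       region in Figure 1); together the chunks form one logical array. */
    const int chunk = 4;                /* elements owned per process */
    double *local;                      /* this process's chunk       */
    MPI_Win win;
    MPI_Win_allocate(chunk * sizeof(double), sizeof(double),
                     MPI_INFO_NULL, MPI_COMM_WORLD, &local, &win);

    for (int i = 0; i < chunk; ++i)     /* initialise the local chunk */
        local[i] = 100.0 * rank + i;

    MPI_Win_fence(0, win);              /* open an access epoch       */

    /* "Get": read one element that physically lives on another rank. */
    double remote_val = 0.0;
    int target = (rank + 1) % nproc;
    MPI_Get(&remote_val, 1, MPI_DOUBLE, target, /*disp=*/0,
            1, MPI_DOUBLE, win);

    /* "Accumulate": add a contribution into element 1 of rank 0's chunk. */
    double contrib = 1.0;
    MPI_Accumulate(&contrib, 1, MPI_DOUBLE, /*target=*/0, /*disp=*/1,
                   1, MPI_DOUBLE, MPI_SUM, win);

    MPI_Win_fence(0, win);              /* close the epoch: ops complete */

    printf("rank %d read %.1f from rank %d; local[1] is now %.1f\n",
           rank, remote_val, target, local[1]);

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}
```

In DDI itself the remote side of such a request is serviced differently: in the MPI/sockets version by the data-server process running alongside each compute process, and in the SHMEM and LAPI versions by the one-sided communication library, rather than by an MPI window as sketched here.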
