
Scientific and Technical Aerospace Reports Volume 38 July 28, 2000


cal constraints of cost, complexity, and power consumption. The emerging superconductor Rapid Single Flux Quantum electronics can operate at 100 GHz (the record is 770 GHz) at one percent of the power required by conventional semiconductor logic. Wave Division Multiplexing optical communications can approach a peak per-fiber bandwidth of 1 Tbps, and the new Data Vortex network topology employing this technology can connect tens of thousands of ports, providing a bisection bandwidth on the order of a Petabyte per second with latencies well below 100 nanoseconds, even under heavy loads. Processor-in-Memory (PIM) technology combines logic and memory on the same chip, exposing the internal bandwidth of the memory row buffers at low latency. And holographic photorefractive storage technologies provide high-density memory with access a thousand times faster than conventional disk technologies. Together these technologies enable a new class of shared memory system architecture with a peak performance in the range of a Petaflops but size and power requirements comparable to today’s largest Teraflops-scale systems. To achieve high sustained performance, HTMT combines an advanced multithreading processor architecture with a memory-driven coarse-grained latency management strategy called “percolation”, yielding high efficiency while reducing much of the parallel programming burden. This paper will present the basic system architecture characteristics made possible through this series of advanced technologies and then give a detailed description of the new percolation approach to runtime latency management.
Author
Architecture (Computers); Computer Networks; Computer Programs; Holography; Parallel Computers; Parallel Processing (Computers); Parallel Programming; Quantum Electronics; Supercomputers
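
As a concrete reading of the percolation idea described above, the following C sketch stages a task’s working set from slow, high-capacity memory into a fast, processor-near buffer before the compute phase runs, so the computation itself never stalls on long-latency loads. The three-phase task structure, all names, and the buffer sizes are assumptions made for illustration only; this is not the HTMT runtime interface.

#include <stdlib.h>
#include <string.h>

/* Illustrative percolation-style latency management: the runtime copies
 * a task's working set from slow (remote) memory into a fast local
 * buffer *before* the task is scheduled, so the compute phase never
 * waits on long-latency loads. All names here are hypothetical. */

typedef struct {
    double *slow_data;   /* operands in high-capacity, slow memory   */
    double *fast_buf;    /* staging buffer in fast, near memory      */
    size_t  n;           /* size of this task's working set          */
} task_t;

/* Stage phase: the working set "percolates" toward the processor. */
static void percolate_in(task_t *t) {
    memcpy(t->fast_buf, t->slow_data, t->n * sizeof(double));
}

/* Compute phase: runs entirely out of fast memory. */
static void compute(task_t *t) {
    for (size_t i = 0; i < t->n; i++)
        t->fast_buf[i] *= 2.0;           /* placeholder computation */
}

/* Drain phase: results percolate back out to slow memory. */
static void percolate_out(task_t *t) {
    memcpy(t->slow_data, t->fast_buf, t->n * sizeof(double));
}

int main(void) {
    size_t n = 1 << 20;
    task_t t = { malloc(n * sizeof(double)), malloc(n * sizeof(double)), n };
    for (size_t i = 0; i < n; i++) t.slow_data[i] = (double)i;

    percolate_in(&t);    /* in HTMT's scheme, staging for one task would
                            overlap with computation of other ready tasks */
    compute(&t);
    percolate_out(&t);

    free(t.slow_data);
    free(t.fast_buf);
    return 0;
}

The point of the sketch is the separation of concerns: because staging is coarse-grained and driven by the memory system, the processor’s threads only ever see data that has already arrived, which is what lets the approach hide latency without per-load programmer effort.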

20000064606 MRJ Technology Solutions, Moffett Field, CA USA
Multithreaded Implementation of a Dynamic Irregular Application
Biswas, Rupak, MRJ Technology Solutions, USA; Oliker, Leonid, California Univ., Lawrence Berkeley Lab., USA; February 2000; In English; See also 20000064579; No Copyright; Abstract Only; Available from CASI only as part of the entire parent document

The success of parallel computing in solving realistic computational applications relies on their efficient mapping and execution on large-scale multiprocessor architectures. When the algorithms and data structures corresponding to these problems are unstructured or dynamic in nature, efficient implementation on parallel machines offers considerable challenges. Unstructured applications are characterized by irregular data access patterns, while dynamic mesh adaptation causes computational workloads to grow or shrink at runtime. For such applications, dynamic load balancing is required to achieve algorithmic scaling on parallel machines. This talk describes a multithreaded implementation of a dynamic unstructured mesh adaptation algorithm on the Tera Multithreaded Architecture (MTA). Multithreaded machines can tolerate memory latency and utilize substantially more of their computing power by processing several threads of computation. For example, the MTA processors each have hardware support for up to 128 threads, and are therefore especially well suited for irregular and dynamic applications. Unlike traditional parallel machines, the MTA has a large uniform shared memory, no data cache, and is insensitive to data placement. Parallel programmability is significantly simplified since users have a global view of the memory, and need not be concerned with partitioning and load balancing issues. Performance is compared with an MPI implementation on the T3E and the Origin2000, and a shared-memory directives-based implementation on the Origin2000. A standard computational mesh simulating flow over an airfoil was used for our experiments to compare the three parallel architectures. The initial mesh, consisting of more than 28,000 triangles, was refined a total of five times to generate a mesh 45 times larger. Performance by platform and programming paradigm is presented in Table 1. It is important to note that different parallel versions use different dynamic load balancing strategies. The multithreaded implementation of the adaptation algorithm required adding a trivial amount of code to the original serial version and had little memory overhead. In contrast, the MPI version doubled the size of the code and required significant additional memory for the communication buffers. The simulation on the 8-processor MTA at SDSC using 100 threads per processor was almost 10 times faster than that obtained on 160 processors of a T3E and more than 100 times faster than that on a 64-processor Origin2000 running a directives-based version. Results indicate that multithreaded systems offer tremendous potential for quickly and efficiently solving some of the most challenging real-life problems on parallel computers.
Author
Multiprocessing (Computers); Parallel Computers; Parallel Processing (Computers); Computerized Simulation
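
The programming-model contrast this abstract draws (a global view of memory with no explicit partitioning, versus hand-coded message passing) can be pictured with a shared-memory directives analogue. The C/OpenMP fragment below marks mesh triangles for refinement in a single parallel loop; the error field, threshold, and refinement test are invented for illustration, and the Tera MTA used its own compiler directives rather than OpenMP.

#include <stdio.h>
#include <omp.h>

#define NTRI 28000           /* initial mesh size quoted in the abstract */

/* Hypothetical refinement-marking step: with a global view of memory,
 * each triangle's error indicator is tested in parallel with no domain
 * partitioning and no communication buffers. This is an OpenMP analogue
 * of the shared-memory style the abstract describes, not MTA code. */
int main(void) {
    static double error[NTRI];
    static int    refine[NTRI];
    double threshold = 0.5;      /* assumed refinement threshold */
    long   marked = 0;

    for (int i = 0; i < NTRI; i++)           /* synthetic error field */
        error[i] = (double)(i % 100) / 100.0;

    /* The loop is simply declared parallel; no data placement needed. */
    #pragma omp parallel for reduction(+:marked)
    for (int i = 0; i < NTRI; i++) {
        refine[i] = (error[i] > threshold);
        marked += refine[i];
    }

    printf("%ld of %d triangles marked for refinement\n", marked, NTRI);
    return 0;
}

An MPI version of the same step would additionally need an explicit mesh partition, halo exchange of boundary data, and communication buffers, which is the code-size and memory overhead the abstract attributes to its MPI implementation.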

20000064623 NASA Ames Research Center, Moffett Field, CA USA
Testing New Programming Paradigms with NAS Parallel Benchmarks
Jin, H., NASA Ames Research Center, USA; Frumkin, M., NASA Ames Research Center, USA; Schultz, M., NASA Ames Research Center, USA; Yan, J., NASA Ames Research Center, USA; February 2000; In English; See also 20000064579
Contract(s)/Grant(s): NAS2-14303; No Copyright; Abstract Only; Available from CASI only as part of the entire parent document

