ANNUAL REPORT 2012
HTM - Hierarchical Temporal Memory on Many-cores<br />
T. Nordström, D. Hammerstrom, Zain-ul-Abdin, J. Duracz, and B. Svensson<br />
Centre for Research on Embedded Systems<br />
Introduction<br />
An increasingly important aspect of embedded computing is<br />
the processing and understanding of noisy real-world data,<br />
and then making decisions and taking timely actions based on<br />
these data. Consequently, various kinds of intelligent computing<br />
structures are being investigated as important building blocks<br />
in embedded system design.<br />
One very promising algorithm is Hierarchical Temporal Memory<br />
(HTM), which is being developed by Numenta, Inc. and<br />
which is already used in a number of real applications.<br />
Being based on biologically inspired models, HTM is massively<br />
parallel, but it is also computationally intensive, and as<br />
it is integrated into real applications it is starting to run into<br />
performance limitations. The goal of this CERES project is to<br />
explore suitable hardware support for the acceleration of the<br />
Hierarchical Temporal Memory algorithm.<br />
HTM Learning<br />
The Cortical Learning Algorithm (CLA) is a memory system that<br />
learns sequences of patterns and makes predictions. When an<br />
HTM model is exposed to a stream of data, the CLA predicts<br />
what is likely to happen next, similar to how you predict the<br />
next note in a familiar song or the next word someone is likely<br />
to say in a common phrase. In addition, the CLA modifies<br />
its memory with each new record. Thus, HTM models are<br />
continually adapting to reflect the most recent patterns.<br />
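The predict-then-adapt loop described above can be illustrated with a toy online sequence model. This is a hypothetical sketch using simple transition counts, not Numenta's actual CLA (which uses distributed columns of cells); the class and method names are our own.<br />

```python
from collections import Counter, defaultdict

class OnlinePredictor:
    """Toy stand-in for a sequence memory: predicts the next symbol,
    then updates its memory with each new record (hypothetical sketch;
    the real CLA uses sparse distributed representations, not counts)."""

    def __init__(self):
        # previous symbol -> counts of symbols that followed it
        self.transitions = defaultdict(Counter)
        self.prev = None

    def step(self, symbol):
        """Return the prediction made *before* seeing `symbol`,
        then learn the observed transition -- mirroring the CLA's
        continual adaptation to each new record."""
        prediction = None
        if self.prev is not None and self.transitions[self.prev]:
            prediction = self.transitions[self.prev].most_common(1)[0][0]
        if self.prev is not None:
            self.transitions[self.prev][symbol] += 1
        self.prev = symbol
        return prediction

model = OnlinePredictor()
for s in "abcabcab":      # feed a repeating "song"
    model.step(s)
print(model.step("c"))    # prints "c": the familiar sequence is predicted
```

As in the text's song analogy, once the model has seen the sequence repeat, it predicts the next element before it arrives, while still updating its memory on every record.<br />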
Next Steps<br />
As memory is critical in HTM, as in most artificial neural network<br />
models, we will focus our efforts on many-core mapping<br />
strategies that optimize memory management.<br />
In our cooperation with Portland State University, PSU has<br />
been focusing on FPGA and GPU implementations and HH<br />
on multi-core and Ambric/Adapteva “many-core” style parallelism.<br />
As a next step we hope to compare these different implementations.<br />
Project Key Data<br />
Partners: Halmstad University, Portland State University, Numenta,<br />
Inc., Nethra Imaging, Inc., and Adapteva, Inc.<br />
Duration: Sep. 2011 – Dec. 2013<br />
Funding: CERES+ project, Volume: 820 kSEK<br />
Contact: Tomas Nordström, HH<br />
HTM Structure<br />
The HTM-CLA is a highly detailed model of a layer of cells in<br />
the neocortex. In a typical CLA implementation there are 2000<br />
columns of simulated neurons (one per output bit of the spatial<br />
memory structure) and twenty simulated neurons per column,<br />
giving each CLA over 40,000 neurons. Each neuron has dozens<br />
of non-linear dendrite segments and potentially thousands of<br />
synapses.<br />
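The sizing above follows from simple arithmetic. In the sketch below, the column and cell counts come from the text, while the per-cell segment and synapse counts are illustrative assumptions (the text only says “dozens” and “potentially thousands”).<br />

```python
# Back-of-the-envelope sizing of a typical CLA region.
columns = 2000              # one column per output bit of the spatial memory
cells_per_column = 20       # simulated neurons per column
segments_per_cell = 30      # "dozens" of dendrite segments (assumed value)
synapses_per_segment = 40   # gives "potentially thousands" per cell (assumed value)

neurons = columns * cells_per_column          # 40,000 neurons per CLA
segments = neurons * segments_per_cell
synapses = segments * synapses_per_segment
print(neurons, segments, synapses)            # 40000 1200000 48000000
```

Counts of this order explain why memory dominates an HTM implementation: the synapse state, not the arithmetic, is the bulk of the workload.<br />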
HTM on Adapteva<br />
Master students Zhou Xi and Luo Yaoyao have implemented the<br />
CLA on our Adapteva development board.<br />
They have investigated how to map the HTM-CLA onto Adapteva’s<br />
Epiphany many-core architecture and have run<br />
experiments to measure the speedup and efficiency of this<br />
mapping. Preliminary<br />
results show almost perfect scalability when mapping HTM<br />
onto Adapteva.<br />
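The speedup and efficiency metrics mentioned above are conventionally defined as follows; the timing numbers in the example are purely illustrative, not the students' measured results.<br />

```python
def speedup(t_serial, t_parallel):
    """Ratio of single-core runtime to many-core runtime."""
    return t_serial / t_parallel

def efficiency(t_serial, t_parallel, cores):
    """Speedup divided by core count; 1.0 means perfect scaling."""
    return speedup(t_serial, t_parallel) / cores

# Illustrative example for a 16-core Epiphany chip: "almost perfect
# scalability" means the parallel runtime approaches t_serial / 16.
print(speedup(16.0, 1.0))          # 16.0
print(efficiency(16.0, 1.0, 16))   # 1.0
```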
CERES Annual Report <strong>2012</strong><br />