The design of the presented method is focused around the possibility to support and execute high-level optimizations and abstractions on the whole program. The graph-based software layout of the method provides the possibility to execute graph algorithms on the software architecture itself. The graph algorithms operate on the software's logical graph, not the execution graph. This provides the possibility for higher-level optimizations (super-optimization). The architecture is designed to be easily modelable with a domain-specific language. This domain-specific language eases the development of the software, but its primary purpose is to provide information for higher-level optimizations. It can be viewed as the logical description and documentation of the software. Based on the description language it is possible to generate the low-level execution of the software, which means that it is not necessary to work at a low level during development. The development is concentrated around the logic of the application: it focuses on what is to be achieved instead of the small steps that need to be taken in order to get there.

[Figure 1. The data flow of a software]

…and determine how to respond to the next message received. Figure 1 demonstrates the most basic but essential problem in concurrent programming. Each number represents one process or one operation to be performed. The main goal is not to find the resources for parallel computing but to find a way to pass intermediate results between the numbered nodes.

C. Advantages
- Increased application throughput: the number of tasks completed in a given time period increases.
- High responsiveness for input/output: input/output-intensive applications mostly wait for input or output operations to complete. Concurrent programming allows the time that would be spent waiting to be used for another task.
- More appropriate program structures: some problems and problem domains are well suited to representation as concurrent tasks or processes.

II. COMMUNICATION

In the case of distributed systems the performance of parallelization largely depends on the performance of the communication between the peers of the system. Two peers communicate by sending data to each other; therefore the performance of the peers depends on the processing of the data sent and received. The communication data contains the application data as well as the transfer-layer data. It is important for the transfer layer to operate with small overhead and provide fast processing. Embedded systems have specific requirements, and it is important that the communication meets these requirements.

III. REALIZATION IN EMBEDDED SYSTEMS

The architecture of modern embedded systems is based on multi-core or multi-processor setups. This makes concurrent computing an important problem in the case of these systems as well. The existing algorithms and solutions for concurrency were not designed for embedded systems with resource constraints. In the case of real-time embedded systems it is necessary to meet time and resource constraints, so it is important to create algorithms which prioritize these requirements. It is also vital to take the human factor into consideration, to simplify the development of concurrent applications as much as possible, and to help the transition from the sequential world to the parallel world. It is also important to have the possibility to trace and verify the created concurrent applications. The traditional methods used for parallel programming are not suitable for embedded systems because of the possibility of deadlocks. Deadlocks pose a serious problem for embedded systems [5], because they can cause huge losses. The methods presented in [6] (the Actor model and STM), which do not have deadlocks, have increased memory and processing requirements; this also means that achieving real-time execution becomes harder due to the use of garbage collection.
Using these methods and taking into account the requirements of embedded systems, one can create a method which is easier to use than low-level threading and whose resource requirements are negligible. In the development of concurrent software the primary affecting factor is not the method used for parallelization, but the possibility to parallelize the algorithms and the software itself. To create an efficient method for parallel programming, it is important to ease the process of parallelizing software and algorithms. To achieve this, the method must force the user into a correct, concurrent approach to developing software. This has its drawbacks as well, since the user has to follow the rules set by the method. The presented method has a steep learning curve due to its requirements toward its usage (software architecture, algorithm implementations, data structures, resource management). On the other hand, these strict rules provide advantages to the users as well, both in the correctness of the application and in the speed of development. The created applications can be checked by verification algorithms, and the integration of parts created by other users is provided by the method itself. The requirements of the method provide a solid base for the users. In the case of sequential applications the development, optimization and management are easier than in the case of concurrent applications. Imperative applications have a state when executed. This state can be viewed as the context of the application. The results produced by imperative applications are context-dependent: imperative applications can produce different results for the same input because of different contexts. Sequential applications execute one action at a given moment with a given context. In the case of concurrent applications, at a given moment, one or more actions are executed within one or more contexts, where the contexts may affect each other. Concurrent applications can be decomposed into sequential applications which communicate with each other through their input, but whose contexts are independent. This is the simplest and cleanest form of concurrent programming.

IV. MAIN PROBLEMS

Embedded systems are designed to execute specific tasks in a specific field. The tasks can range from processing to peripheral control. In the case of peripheral control, concurrent execution is not as important; in most cases the use of event-driven asynchronous execution or collective IO is a better solution [7]. In the case of data- and signal-processing systems the parallelization of processing tasks and algorithms is important. It provides a significant advantage in scaling and increasing the processing capabilities of the system. The importance of peripheral and resource management is present in data processing systems as well. The processing of the data and peripheral management need to be synchronized. If we fail to synchronize data acquisition with data processing, the processing will be blocked until the necessary data are acquired, which means that the available resources are not being used effectively. The idea of the presented method is to separate the execution, data management and resource handling parts of the application. The presented method emphasizes data processing and is made up of separate modules.
Every module has a specific task and can only communicate with one other module. These modules are the peripheral/resource management module, the data management module and the execution module. The execution module is a lightweight thread; it does not have its own stack or heap. This is a requirement due to the resource constraints of embedded systems. If required, a stack or heap can be added through the possibility of extending the components of the execution thread with user-defined data structures. The main advantage of lightweight threads is that they have small resource requirements and fast task-switching capabilities [8][9]. The execution module interacts with the data manager module, which converts raw data to a specific data type and provides input for the execution module. The connection between the data manager and the execution module is based on the Actor model [10], which can be optimally implemented in this case due to the restrictions put on the execution module, which can only read and create new data (types) and cannot modify them. The execution module can be monolithic or modular. The modular composition is required for complex threads where processing is coupled with actions (IO). The execution threads can be built up from two kinds of components: processing and execution/action components.

[Figure 2. The software development process]

The component used in the execution module is a type which for a given input type 'a' creates a given type 'b'. This operation will always give the same result for the same input. The processing component is referentially transparent, meaning it does not support destructive actions [11]. The type variables 'a' and 'b' can have the same types. The action component is similar to the processing component; it is usable in cases where one needs to support destructive actions. These components request the execution of specific actions, which are received and executed by a transactional unit.
The design of the transactional mechanism is based on transactions, just as in software transactional memory. The threads in the execution module are not connected to each other, but it is possible to achieve interaction between them: one or more execution threads can be joined with the use of the reduce component. The reduce component iterates through the values of the given threads, merging them into one component or value. The merging algorithm is specified by the user, as is the order of the merging. The joining of the threads follows the MapReduce model, where the map functions correspond to the threads and the reduce function corresponds to the merging algorithm provided by the user [12]. The method introduced in this paper is usable for concurrent programming in real-time embedded systems as well. The complexities of the algorithms used in the method are linear in the worst case. The priority of threads can be specified, which means that the order of execution can be predetermined. It is possible to calculate the amount of time required to execute a specific action. This way the created systems can be deterministic. Threads can be separated into two parts. The two parts create a client-server architecture, where the server is the data manager and the client is the actions/steps of the thread. The job of the server (producer) is to provide the client (consumer) with data. The server part sends the data to the client part. The server part protects the system from possible collisions due to concurrent access or requests to resources. The client part has a simple design: it is made up of processing steps and actions.
The job of the asynchronous resource manager is to provide safe access to resources for the server part of the threads. The resource manager does not check the integrity of data; its only job is to provide the execution threads' server part with raw data. Parallelization of software is not trivial in most cases. The method presented in this paper takes this fact into consideration.
It is important that the parallelizable and sequential parts of the software can be easily synchronized. The presented view of software (as seen in Figure 2) is easily implementable in the model of the presented method. Based on the data