tel-00553143, version 1 - 6 Jan 2011

[Figure 4.5: Architectural template of customizable ALU block present in hardware micro-task datapath (shown in Figure 4.6). Signals: op1, op2, cIn, aluOp, aluOut, cOut, Lt_GT0, Lt_GT1, Lt_GT2. Operators: ADD, NOT, CMP, AND, OR, XOR, SHL, SHR, Lt_GT.]

to save the leakage power. More details about the memory management are provided in Section 5.4.4, where the system-level execution model is presented. Moreover, since the hardware micro-tasks are power-gated (turned on only when needed), we do not need arbiters, tri-state or multiplexer logic at the input of the shared resources. This also leads to an overall reduction in power and area.

4.1.2 Generic architecture

Figure 4.4 shows the micro-architecture of a generic hardware micro-task. In this figure, the hardware micro-task consists of an 8-bit datapath, whereas our design-flow is capable of generating both 8-bit and 16-bit datapaths according to the designer's choice. The possible trade-offs in terms of energy, power and area consumption between 8-bit and 16-bit hardware micro-tasks implementing the same function are discussed in Section 6.4.1. The main components of a hardware micro-task are:

• FSM: The control part of the application task is directly micro-coded in the form of an FSM that controls the underlying datapath.

• Register file: The register file is implemented in the form of a dual-port RAM. The size of the register file can be customized according to the application by the design-flow. It contains two read ports and one write port, which enables reg-reg-type instruction patterns to be executed in one clock cycle by our datapath.
A hardware micro-task having a dual-port memory benefits from an approximately 50% reduction in the size of the control FSM compared to a hardware micro-task having a single-port memory. This potential reduction comes from the fact that a single-port memory has to be accessed twice consecutively to fetch two operands from the register file. Hence, if the control FSM is very large and its power consumption is dominant, a hardware micro-task containing a dual-port register file would consume less power than one containing a single-port register file. In addition, since the register file needed by a hardware micro-task is relatively small (as discussed in Section 4.2.4), it does not consume much power in the resulting circuit.

• Immediate ROM: The datapath also contains a ROM that stores all the constants present in the application code. Since the constants stored in the ROM are hardwired and the ROM can be turned off without data loss, this approach consumes less static and dynamic power for storing constants.

• ALU: The ALU block contains several arithmetic and logic operations (such as add, sub, or, shl, etc.). A generic template of the ALU block implemented in our hardware micro-task is shown in Figure 4.5, where different operators can be added or removed and the size of the multiplexer is customized according to the application at hand.

• I/O interface: The final major component of the hardware micro-task is an I/O interface module that provides the interface to external data memories and I/O peripherals (which may be shared among multiple micro-tasks).

According to the legend of Figure 4.4, dotted lines represent control signals generated and exchanged between the control FSM and the datapath components, whereas solid lines represent data-flow connections between datapath components.
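To make the interplay of these components more concrete, the following is a minimal behavioural sketch in Python, not the RTL produced by the design-flow. It models an 8-bit ALU template restricted to the operators an application uses, and counts control states for a sequence of reg-reg instructions under the assumption that each register-file read cycle costs one FSM state. All names (`ALU_OPS`, `make_alu`, `fsm_states`, `run`) and the tuple encoding of instructions are hypothetical, and only a subset of the operators of Figure 4.5 is modelled.

```python
# Illustrative model only; names and encodings are assumptions,
# not taken from the design-flow described in the text.

MASK8 = 0xFF  # 8-bit datapath, as in Figure 4.4

# Operator template: a subset of the operators listed in Figure 4.5.
ALU_OPS = {
    "ADD": lambda a, b, cin: a + b + cin,
    "AND": lambda a, b, cin: a & b,
    "OR":  lambda a, b, cin: a | b,
    "XOR": lambda a, b, cin: a ^ b,
    "NOT": lambda a, b, cin: a ^ MASK8,
}


def make_alu(used_ops):
    """Build an ALU that contains only the operators the application needs."""
    ops = {name: ALU_OPS[name] for name in used_ops}

    def alu(op, a, b=0, cin=0):
        raw = ops[op](a & MASK8, b & MASK8, cin & 1)
        alu_out = raw & MASK8      # result truncated to the datapath width
        c_out = (raw >> 8) & 1     # carry out of bit 7
        return alu_out, c_out

    return alu


def fsm_states(program, read_ports):
    """Count control states, assuming one state per register-file read cycle
    and two source operands per reg-reg instruction."""
    cycles_per_instr = -(-2 // read_ports)  # ceil(2 / read_ports)
    return cycles_per_instr * len(program)


def run(program, regs, alu):
    """Execute (op, dst, src1, src2) instructions on a small register file."""
    for op, dst, src1, src2 in program:
        regs[dst], _carry = alu(op, regs[src1], regs[src2])
    return regs


program = [("ADD", 2, 0, 1), ("XOR", 3, 2, 0)]
alu = make_alu(["ADD", "XOR"])          # multiplexer sized for two operators
result = run(program, [200, 100, 0, 0], alu)
dual = fsm_states(program, read_ports=2)
single = fsm_states(program, read_ports=1)
```

Under these assumptions the dual-port variant needs half as many control states as the single-port one for this program, which is in line with the roughly 50% reduction stated above. Real FSMs also contain states for memory and I/O accesses, so the actual ratio depends on the instruction mix.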
A more detailed view of the hardware micro-task architecture is shown in Figure 4.6, where we can find different control-flow multiplexers (such as those connecting different types of operands to the ALU block). To generate such an architecture of a hardware micro-task from a high-level C specification of the application, we developed a design-flow that is a hybrid of High-Level Synthesis (HLS) and retargetable Application Specific Instruction-Set Processor (ASIP) design-flows.

4.2 Proposed design-flow for micro-task generation

Of course, even the soundest proposal for hardware specialization is useless without a supporting design-flow that allows the programmer to proceed directly from a specification written in a high-level language (e.g. C) to an executable specification, which in our case consists of an RTL description of each specialized hardware micro-task.