02.06.2015 Views

STM32 Journal - Digikey


<strong>STM32</strong> <strong>Journal</strong><br />

Volume 1, Issue 2<br />

In this Issue:<br />

〉〉 Bringing 32-bit Performance to 8- and 16-bit Applications<br />

〉〉 Developing High-Quality Audio for Consumer Electronics Applications<br />

〉〉 Bringing Floating-Point Performance and Precision to Embedded Applications<br />

〉〉 Achieving Ultra-Low-Power Efficiency for Portable Medical Devices<br />

〉〉 Accelerating Time-to-Market Through the ARM Cortex-M Ecosystem<br />

〉〉 Introducing a Graphical User Interface to Your Embedded Application


<strong>STM32</strong> <strong>Journal</strong><br />

Volume 2, Issue 2<br />

We Are Not Alone<br />

By Nicholas Cravotta, Technical Editor<br />

Table of Contents<br />

Editorial<br />

Bringing 32-bit Performance to 8- and 16-bit Applications<br />

Developing High-Quality Audio for Consumer Electronics Applications<br />

Bringing Floating-Point Performance and Precision to Embedded Applications<br />

Achieving Ultra-Low-Power Efficiency for Portable Medical Devices<br />

Accelerating Time-to-Market Through the ARM Cortex-M Ecosystem<br />

Introducing a Graphical User Interface to Your Embedded Application<br />

When I got my first paycheck as<br />

an engineer nearly three decades<br />

ago, coding and layout weren’t<br />

exactly social activities. While<br />

there was a certain amount of<br />

team collaboration to decide what<br />

I would work on, the majority of<br />

what I did was by myself. When I<br />

decided to shed the ten pounds<br />

I had gained as a freshman, it<br />

was a similar story. I never found<br />

someone willing to go consistently<br />

to the gym with me, so I pressed<br />

those weights alone as well.<br />

It’s quite a different world<br />

today. Take a look at the Nike+<br />

FuelBand on the cover of this<br />

issue’s <strong>STM32</strong> <strong>Journal</strong>. Worn on<br />

your wrist, it records your every<br />

activity, not just when you’re on<br />

the treadmill.<br />

What makes the FuelBand such<br />

a ground-breaking product is<br />

how it brings people together.<br />

It doesn’t matter whether you<br />

work out at 2am or are in a<br />

strange city on travel, with<br />

this next-generation exercise<br />

monitor, you are never alone.<br />

Connected to your phone via<br />

Bluetooth, you can be in touch<br />

with exercise buddies all around<br />

the world through the Nike+<br />

online community.<br />

The Nike+ FuelBand is quite<br />

a feat of engineering. To<br />

differentiate between simple<br />

gestures and active motions<br />

requires complex signal<br />

processing capabilities. The<br />

device must also be constantly<br />

on since even you don’t know<br />

when you might jump into action.<br />

120 LEDs comprise the display<br />

and “Fuel” indicator, and the<br />

device can operate for up to four<br />

full days without recharging. It<br />

also weighs less than one ounce,<br />

including the batteries. Now<br />

that’s an efficient design.<br />

At the heart of the FuelBand is<br />

ST’s ultra-low power <strong>STM32</strong> L1<br />

microcontroller. In addition to<br />

providing the 32-bit performance<br />

and processing capacity required<br />

for advanced signal processing,<br />

the <strong>STM32</strong> architecture offers the<br />

real-time responsiveness, power<br />

efficiency, and highly integrated<br />

peripherals and memory required<br />

for even the most demanding<br />

embedded applications.<br />

With innovations like FuelBand<br />

and Nike+ technology, Nike has<br />

leveraged social networking to<br />

change the way we live together.<br />

Exercise, as a result, is no longer<br />

a solo endeavor.<br />

Neither, it turns out, is<br />

engineering. The network<br />

supporting the <strong>STM32</strong><br />

architecture enables a whole new<br />

level of collaboration. Design<br />

tools from companies like Keil,<br />

IAR Systems, and Micriµm are<br />

like having a team of experts<br />

sitting right next to you. Need<br />

to extend a design by adding<br />

audio or a capacitive touch GUI-based<br />

interface? Just call upon<br />

partners like DSP Concepts and<br />

GeeseWare. And with the <strong>STM32</strong><br />

architecture based on the ARM<br />

Cortex-M0, M3, and M4 cores,<br />

you have access to a global<br />

ecosystem second to none.<br />

You can even ask questions of<br />

your fellow engineers at 2am<br />

or share your own hard-won<br />

experience through forums,<br />

blogs, and tweets.<br />

It truly is a different world we live,<br />

play, exercise, and work in.<br />


Bringing 32-bit Performance<br />

to 8- and 16-bit Applications<br />

By Reinhard Keil, Director of MCU Tools, ARM Germany GmbH<br />

Shawn Prestridge, Senior Field Applications Engineer, IAR Systems<br />

Sean Newton, Field Applications Engineering Manager, STMicroelectronics<br />

Today’s embedded applications<br />

are being called upon to<br />

provide an increasing number<br />

of capabilities. More and more<br />

devices need to be connected,<br />

require greater precision, must<br />

offer a graphics-based interface<br />

with touch capabilities, utilize<br />

sophisticated signal processing,<br />

and support multimedia playback.<br />

In the past, developers were<br />

compelled by cost constraints<br />

to base their designs on 8- and<br />

16-bit architectures that limited<br />

performance. Now, with the<br />

availability of next-generation<br />

MCUs like the <strong>STM32</strong> F0 that<br />

provide 32-bit performance<br />

at 8-bit budget pricing, OEMs<br />

can bring substantial value to<br />

end-users without having to<br />

compromise functionality. In<br />

addition, powerful development<br />

tools like Keil’s MDK-ARM and<br />

IAR Embedded Workbench<br />

enable developers new to 32-bit<br />

programming to immediately<br />

exploit the full capabilities of the<br />

<strong>STM32</strong> F0 architecture.<br />

The 32-bit Advantage<br />

There are several ways in which<br />

the <strong>STM32</strong> F0 lowers product<br />

cost compared to 8- and 16-bit-based<br />

designs. Specifically,<br />

because 8- and 16-bit MCUs tend to be<br />

based on legacy architectures,<br />

they have many limitations that<br />

slow development by forcing<br />

designers to work around the<br />

architecture, so to speak. For<br />

example, to complete a 16 x 16<br />

multiplication for a processing<br />

algorithm, a 16-bit CPU requires<br />

four multiplies and several<br />

additions, depending upon the<br />

implementation. An 8-bit CPU<br />

would require significantly more<br />

cycles. With the <strong>STM32</strong> F0, this<br />

takes a single instruction.<br />
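To see where the extra multiplies and additions come from, here is a minimal sketch of a 16 x 16 multiplication built from narrower partial products, as a core whose widest native multiply is 8 x 8 would have to do it. The function names are illustrative and the code is plain portable C, not taken from any vendor library.

```c
#include <stdint.h>

/* 8 x 8 -> 16 multiply: the widest product assumed to be available natively. */
static uint16_t mul8x8(uint8_t a, uint8_t b)
{
    return (uint16_t)((uint16_t)a * (uint16_t)b);
}

/* 16 x 16 -> 32 multiply assembled from four 8 x 8 partial products. */
uint32_t mul16_from_8bit_parts(uint16_t a, uint16_t b)
{
    uint8_t a_lo = (uint8_t)(a & 0xFFu), a_hi = (uint8_t)(a >> 8);
    uint8_t b_lo = (uint8_t)(b & 0xFFu), b_hi = (uint8_t)(b >> 8);

    uint32_t p0 = mul8x8(a_lo, b_lo);   /* weight 2^0  */
    uint32_t p1 = mul8x8(a_lo, b_hi);   /* weight 2^8  */
    uint32_t p2 = mul8x8(a_hi, b_lo);   /* weight 2^8  */
    uint32_t p3 = mul8x8(a_hi, b_hi);   /* weight 2^16 */

    /* Four multiplies plus shifted additions to recombine the parts. */
    return p0 + (p1 << 8) + (p2 << 8) + (p3 << 16);
}

/* On a 32-bit core the same product is a single multiply. */
uint32_t mul16_native(uint16_t a, uint16_t b)
{
    return (uint32_t)a * (uint32_t)b;
}
```

Both functions return the same value for every input; the difference is only in how much work the core has to do to get there.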

The result is code that makes<br />

better utilization of MCU<br />

resources, leading to faster<br />

operation, more performance per<br />

MHz, higher code density, and<br />

greater power efficiency. Since<br />

each instruction does more per<br />

clock cycle, applications can be<br />

written using less code. In addition<br />

to accelerating development,<br />

shorter code is easier to debug as<br />

well. Together, all of these benefits<br />

lead to lower system cost.<br />

Cost, however, is only one of<br />

the numerous advantages the<br />

<strong>STM32</strong> F0 has over 8- and 16-bit<br />

architectures. The <strong>STM32</strong><br />

F0 is a full embedded MCU<br />

built using the same <strong>STM32</strong><br />

DNA that the rest of the <strong>STM32</strong><br />

family has, including excellent<br />

real-time performance, DMA,<br />

high-resolution ADC and DAC<br />

peripherals, motor control<br />

timers, and connectivity<br />

interfaces. These integrated<br />

capabilities bring tremendous<br />

efficiency to cost-sensitive<br />

designs in a way that limited<br />

8- and 16-bit MCU architectures<br />

cannot (see Figure 1).<br />

For example, the availability of<br />

a 32-bit bus not only speeds<br />

data transfers and increases<br />

computing performance, it<br />

improves system reliability.<br />

Consider the challenge of reading<br />

a 12-bit ADC using an 8-bit bus<br />

where the CPU has to read the<br />

ADC twice to capture the entire<br />

sample. If an interrupt occurs<br />

between these reads, the ADC<br />

data may be overwritten by the<br />

next sample before the interrupt<br />

is completed and the second<br />

read can be executed. To prevent<br />

this, developers have to manually<br />

disable interrupts for every<br />

such “atomic” operation in an<br />

application. If even one instance<br />

is missed, this creates a potential<br />

for an intermittent error that will<br />

be extremely difficult to resolve.<br />
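The hazard can be reproduced on a desktop machine with a small model. The sketch below is an assumption-laden simulation, not device code: a 12-bit converter result is held in two 8-bit registers, the interrupt is imitated by an ordinary function call placed between the two reads, and every name (`latch_sample`, `read_sample_protected`, and so on) is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 12-bit result exposed as two 8-bit registers. */
volatile uint8_t result_lo;          /* bits 7..0  */
volatile uint8_t result_hi;          /* bits 11..8 */

uint16_t incoming_sample;            /* value the next update will deliver */
static bool irq_masked;              /* models the global interrupt mask   */
static bool irq_pending;             /* update arrived while masked        */

void latch_sample(uint16_t sample)
{
    result_lo = (uint8_t)(sample & 0xFFu);
    result_hi = (uint8_t)((sample >> 8) & 0x0Fu);
}

/* Models an interrupt arriving at the worst possible moment. */
static void sample_update_event(void)
{
    if (irq_masked) {
        irq_pending = true;          /* deferred until interrupts return */
    } else {
        latch_sample(incoming_sample);
    }
}

static void disable_interrupts(void) { irq_masked = true; }

static void enable_interrupts(void)
{
    irq_masked = false;
    if (irq_pending) {
        irq_pending = false;
        latch_sample(incoming_sample);
    }
}

/* Two 8-bit reads with nothing guarding the pair: the value can tear. */
uint16_t read_sample_unprotected(void)
{
    uint8_t lo = result_lo;
    sample_update_event();           /* new sample lands between the reads */
    uint8_t hi = result_hi;
    return (uint16_t)(((uint16_t)hi << 8) | lo);
}

/* The same reads inside a critical section: both halves match. */
uint16_t read_sample_protected(void)
{
    disable_interrupts();
    uint8_t lo = result_lo;
    sample_update_event();
    uint8_t hi = result_hi;
    enable_interrupts();
    return (uint16_t)(((uint16_t)hi << 8) | lo);
}
```

With the registers holding 0x0FF and an update to 0x100 arriving mid-read, the unprotected version returns 0x1FF, a value that was never sampled. A bus wide enough to fetch the whole sample in one access removes the need for the critical section altogether.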

DMA: Moving Data Efficiently<br />

The <strong>STM32</strong> F0 is a modern architecture integrating the latest in processing, power, and debugging technology. For example, multiple low-power modes give developers greater control over power consumption, extending operating life for battery-operated and portable devices. In addition, the <strong>STM32</strong> F0 offers advanced features, including full Direct Memory Access (DMA) and the ability to shut down the ADC between samples, that further increase performance while lowering power consumption.<br />

In general, 8-bit MCUs don’t have the powerful peripherals that higher performance MCUs tend to have. For example, DMA has become an essential peripheral for applications that need to move a great deal of data, whether as part of a processing algorithm, receiving data from an interface, playing back audio, or transferring graphics to the display. In a traditional 8-bit architecture, each word of data has to be moved by the CPU. In addition, pointers need to be updated and a loop managed. Thus, every 8 bits of data takes several cycles of CPU time to move.<br />

<strong>STM32</strong> F0 Series: Key Features<br />

〉〉 Real-time performance: 48 MHz/38 DMIPS; 5-channel DMA mapped on 11 IPs; bus matrix allows Flash execution in parallel with DMA transfer<br />

〉〉 Power efficiency: 5 µA in Stop mode; 2 µA in Standby mode; 0.4 µA Vbat with RTC; 1.8 V or 2 to 3.6 V supply; fast wake-up time<br />

〉〉 Superior and innovative peripherals: 1 Mbps I2C Fast mode+; SPI with 4- to 16-bit data frame; HDMI CEC; 16-bit 3-phase MC timer<br />

〉〉 Maximum integration: calendar RTC with independent supply; battery-backed RAM; separate analog supply; safety<br />

〉〉 Extensive tools and software: ARM + ST ecosystem (eval board, discovery, SW library)<br />

〉〉 L1, F0, F1, F2, F4 series: seamless migration amongst 300 pin-to-pin compatible part #s<br />

Figure 1 The <strong>STM32</strong> F0 is a full embedded MCU built using the same <strong>STM32</strong> DNA that the rest of the <strong>STM32</strong> family has and offers tremendous efficiency to cost-sensitive designs in a way that limited 8- and 16-bit MCU architectures cannot.<br />

With the DMA in the <strong>STM32</strong><br />

F0, an entire block of data can<br />

be moved without involving<br />

the CPU. After the program<br />

configures the transfer, the<br />

DMA manages moving the data<br />

in the background. In fact, the<br />

CPU can drop into a low power<br />

sleep mode while it waits for<br />

the transfer to complete. As<br />

a result, data transfers do not<br />

consume unnecessary CPU<br />

cycles and require less power to<br />

complete than for 8- and 16-bit<br />

architectures.<br />
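The configure-then-wait pattern described above can be sketched with a small host-side model. Everything here is an assumption made for illustration: `dma_channel_t` and its functions are hypothetical and do not reflect the STM32 F0 register interface, and the "hardware" is stepped by a function call so the example can run anywhere.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical DMA channel model; not the STM32 F0 register map. */
typedef struct {
    const uint8_t *src;
    uint8_t       *dst;
    size_t         remaining;
    bool           transfer_complete;
} dma_channel_t;

/* CPU-driven copy: every byte costs a load, a store, two pointer
   updates, and a loop test, all executed by the CPU. */
void cpu_copy(uint8_t *dst, const uint8_t *src, size_t len)
{
    while (len--) {
        *dst++ = *src++;
    }
}

/* One-time setup by the CPU; afterwards the channel runs on its own. */
void dma_configure(dma_channel_t *ch, uint8_t *dst,
                   const uint8_t *src, size_t len)
{
    ch->src = src;
    ch->dst = dst;
    ch->remaining = len;
    ch->transfer_complete = (len == 0);
}

/* Stands in for the hardware moving one item. */
void dma_hardware_tick(dma_channel_t *ch)
{
    if (ch->remaining > 0) {
        *ch->dst++ = *ch->src++;
        if (--ch->remaining == 0) {
            ch->transfer_complete = true;  /* hardware would raise an interrupt */
        }
    }
}

/* The CPU's side of a DMA transfer: configure, then idle until done.
   Returns how many ticks the CPU spent idle. */
size_t dma_copy_and_wait(dma_channel_t *ch, uint8_t *dst,
                         const uint8_t *src, size_t len)
{
    size_t idle_ticks = 0;

    dma_configure(ch, dst, src, len);
    while (!ch->transfer_complete) {
        dma_hardware_tick(ch);  /* on a device this needs no CPU involvement */
        idle_ticks++;           /* a real CPU could sleep here instead */
    }
    return idle_ticks;
}
```

In `cpu_copy` the processor executes every iteration itself; in `dma_copy_and_wait` its only real work is the call to `dma_configure`, and the loop marks time it could spend asleep or on other tasks.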

The availability of a DMA<br />

controller can also greatly<br />

simplify and accelerate product<br />

development. Consider reading<br />

data off of a high-speed data<br />

interface such as I2C. Because<br />

of the load on the CPU, 8-bit<br />

developers have to work around<br />

the MCU’s architecture, using<br />

many interrupts to utilize the time<br />

between data reads. With the<br />

<strong>STM32</strong> F0, the CPU operates<br />

independently of the interface,<br />

allowing developers to program<br />

the CPU for other tasks without<br />

having to worry about missing an<br />

interrupt or losing data.<br />

Because the <strong>STM32</strong> architecture<br />

uses an internal bus matrix, the<br />

DMA can be used in conjunction<br />

with each of the different on-chip<br />

memories as well as many<br />

of the peripherals. For example,<br />

the DMA can be configured to<br />

sample the ADC regularly over<br />

a period of time: a timer triggers<br />

the DMA to read the ADC and<br />

store the result in memory<br />

without involving the CPU.<br />

Once the operation is complete,<br />

the ADC shuts down until the<br />


next sample time. In fact, the<br />

bus matrix combined with a<br />

5-channel DMA enables the<br />

<strong>STM32</strong> F0 to support execution<br />

of code from Flash in parallel<br />

with other memory-memory,<br />

peripheral-memory, or memory-peripheral<br />

DMA transfers.<br />

There are many tools to<br />

assist developers in taking<br />

advantage of the <strong>STM32</strong> F0’s<br />

DMA capabilities without<br />

requiring them to become<br />

DMA experts. The ARM Cortex<br />

Microcontroller Software Interface<br />

Standard (CMSIS) DSP<br />

library, for example, provides<br />

signal processing functionality<br />

that has been optimized for<br />

the <strong>STM32</strong> F0 and takes full<br />

advantage of the DMA.<br />

An intelligent compiler can<br />

also help developers exploit<br />

DMA technology to its fullest<br />

advantage. IAR Embedded<br />

Workbench, for example, offers<br />

a feature that will automatically<br />

rearrange program data to<br />

maximize the use of the DMA.<br />

This enables developers to<br />

achieve high efficiency without<br />

having to put much forethought<br />

into how to lay out the data<br />

space. The compiler achieves<br />

this by analyzing how data<br />

is used by the application.<br />

Consider a program that<br />

copies two different data<br />

structures using DMA. Each<br />

copy operation requires a<br />

separate DMA operation.<br />

However, after the compiler<br />

collocates the data structures<br />

in memory, they can be copied<br />

with a single DMA transfer.<br />
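The effect of collocating data can be shown by doing the layout change by hand. This is a sketch under stated assumptions: the payload types are invented, `memcpy` stands in for one DMA block transfer, and a counter records how many transfers were issued. It does not show how any particular compiler performs the rearrangement.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical payloads used only for illustration. */
typedef struct { int16_t x, y, z; } accel_t;
typedef struct { uint32_t timestamp; } stamp_t;

/* Counts block transfers; memcpy stands in for one DMA transfer. */
unsigned transfers_issued;

static void block_transfer(void *dst, const void *src, size_t len)
{
    memcpy(dst, src, len);
    transfers_issued++;
}

/* Separate objects may be placed anywhere in memory, so each one
   needs its own transfer. */
void copy_separately(accel_t *a_dst, const accel_t *a_src,
                     stamp_t *s_dst, const stamp_t *s_src)
{
    block_transfer(a_dst, a_src, sizeof *a_src);
    block_transfer(s_dst, s_src, sizeof *s_src);
}

/* Collocated in one enclosing object: the members are contiguous,
   so a single transfer covers both. */
typedef struct {
    accel_t accel;
    stamp_t stamp;
} frame_t;

void copy_collocated(frame_t *dst, const frame_t *src)
{
    block_transfer(dst, src, sizeof *src);
}
```

The single transfer also copies any padding between the members, which is usually a fair trade for halving the number of transfer setups.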

Note that each MCU may use<br />

the DMA in a slightly different<br />

manner. Keil’s MDK-ARM,<br />

for example, abstracts how<br />

the DMA is used from the<br />

application through an API<br />

that prevents code from being<br />

tied to a particular processor.<br />

This enables developers to<br />

migrate applications to other<br />

<strong>STM32</strong> devices and know that<br />

code utilizing the DMA will still<br />

perform optimally.<br />

Writing 32-bit Code<br />

Moving from 8-bit to 32-bit<br />

assembly is not trivial, given<br />

the vastly different instructions<br />

32-bit architectures offer; for example,<br />

single-instruction, multiple-data<br />

(SIMD) instructions operate on<br />

several data items at once to accelerate<br />

processing. Even moving<br />

between 16-bit architectures<br />

is challenging given that the<br />

peripherals can differ and impact<br />

how application code is written.<br />


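<table>
<tr><th>Device</th><th>Class</th><th>8-bit Math</th><th>8-bit Matrix</th><th>8-bit Switch</th><th>16-bit Math</th><th>16-bit Matrix</th><th>16-bit Switch</th><th>32-bit Math</th><th>Floating-Point Math</th><th>Matrix Multiplication</th><th>FIR Filter</th></tr>
<tr><td>MSP430</td><td>16-bit</td><td>178</td><td>86</td><td>198</td><td>126</td><td>90</td><td>198</td><td>222</td><td>1102</td><td>136</td><td>980</td></tr>
<tr><td>dsPIC</td><td>16-bit</td><td>236</td><td>420</td><td>424</td><td>224</td><td>552</td><td>424</td><td>424</td><td>2020</td><td>464</td><td>2256</td></tr>
<tr><td>PIC24</td><td>16-bit</td><td>236</td><td>420</td><td>416</td><td>224</td><td>552</td><td>416</td><td>424</td><td>2020</td><td>464</td><td>2256</td></tr>
<tr><td>H8/300H</td><td>16-bit</td><td>344</td><td>412</td><td>444</td><td>352</td><td>482</td><td>478</td><td>574</td><td>1104</td><td>482</td><td>1392</td></tr>
<tr><td>MaxQ20</td><td>16-bit</td><td>230</td><td>252</td><td>192</td><td>204</td><td>328</td><td>184</td><td>288</td><td>1172</td><td>398</td><td>1478</td></tr>
<tr><td>HCS12</td><td>16-bit</td><td>83</td><td>188</td><td>162</td><td>76</td><td>262</td><td>174</td><td>323</td><td>2082</td><td>219</td><td>1917</td></tr>
<tr><td>ATxmega64A1</td><td>8-bit</td><td>118</td><td>398</td><td>338</td><td>174</td><td>490</td><td>350</td><td>300</td><td>1080</td><td>584</td><td>1362</td></tr>
<tr><td>ARM7TDMI</td><td>32-bit</td><td>636</td><td>392</td><td>452</td><td>636</td><td>396</td><td>452</td><td>620</td><td>1832</td><td>428</td><td>1528</td></tr>
<tr><td>8051</td><td>8-bit</td><td>233</td><td>398</td><td>305</td><td>452</td><td>504</td><td>493</td><td>909</td><td>2190</td><td>536</td><td>2056</td></tr>
<tr><td>PIC18F242</td><td>8-bit</td><td>170</td><td>324</td><td>208</td><td>286</td><td>692</td><td>282</td><td>542</td><td>1400</td><td>676</td><td>2006</td></tr>
<tr><td>ATmega8</td><td>8-bit</td><td>134</td><td>354</td><td>350</td><td>198</td><td>434</td><td>382</td><td>342</td><td>1088</td><td>490</td><td>1358</td></tr>
<tr><td><strong>STM32</strong> F0 (compiler option: speed)</td><td>32-bit</td><td>110</td><td>68</td><td>98</td><td>114</td><td>68</td><td>98</td><td>112</td><td>640</td><td>268</td><td>550</td></tr>
<tr><td><strong>STM32</strong> F0 (compiler option: size)</td><td>32-bit</td><td>94</td><td>52</td><td>120</td><td>106</td><td>52</td><td>120</td><td>108</td><td>636</td><td>84</td><td>550</td></tr>
</table>

Figure 2 The <strong>STM32</strong> F0 delivers a substantial code size reduction compared to 8- and 16-bit architectures. This grid shows the code size in bytes for various benchmark applications. Source: Benchmark applications and results for 8/16-bit: TI MSP430 Competitive Benchmarking (www.ti.com/lit/an/slaa205c/slaa205c.pdf). <strong>STM32</strong> F0 code generated with MDK V4.23, MicroLib, and compiler optimization for execution speed or code size.<br />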

The <strong>STM32</strong> F0 architecture<br />

facilitates a smooth migration to<br />

32 bits. The ability to develop<br />

in embedded C reduces the<br />

learning curve of moving to<br />

a new architecture. In many<br />

cases, engineers are already<br />

familiar with the ARM Cortex-M<br />

architecture. Developers can<br />

further ease migration by using<br />

a tool chain they are already<br />

familiar with, such as IAR<br />

Embedded Workbench or Keil’s<br />

MDK-ARM. Finally, developing<br />

for the <strong>STM32</strong> F0 is simplified<br />

through the use of the ARM<br />

CMSIS libraries that abstract<br />

much of the underlying hardware<br />

from the application.<br />

Moving to the <strong>STM32</strong> F0 will<br />

result in a substantial reduction<br />

in code size because of the<br />

density possible with 32-bit<br />

instructions, on the order of<br />

30% (see Figure 2). With its<br />

32-bit address space, the<br />

<strong>STM32</strong> F0 also eliminates<br />

addressing and paging<br />

limitations that complicate<br />

memory management in 8-bit<br />

designs. For example, data<br />


sets can be larger than a single<br />

page and there are no longer<br />

“far” addressing penalties.<br />

The use of object-oriented<br />

constructs, as is common with<br />

modern programming and<br />

modeling tools, can also be<br />

implemented without disruptive<br />

fragmentation.<br />

Without question, the best<br />

compiler is the human brain.<br />

Given enough time, a person<br />

can create a highly optimized<br />

program that no compiler can<br />

beat. Programming in assembly<br />

can also be more efficient than a<br />

C version of the same program.<br />

Time, however, is one of the<br />

resources of which developers<br />

don’t have a surplus. In addition,<br />

hand-written code can be<br />

extremely fragile; if the product<br />

specs change in a material<br />

way, many of a programmer’s<br />

optimizations will need to be<br />

completely reevaluated.<br />

The reality is that Keil’s MDK-<br />

ARM and IAR Embedded<br />

Workbench are smart enough<br />

to make excellent coding<br />

choices that might take a<br />

person weeks to evaluate. For<br />

example, how data is laid out<br />

impacts performance. There’s<br />

also the challenge of balancing<br />

optimization techniques like<br />

loop unrolling against memory<br />

footprint. A compiler can make<br />

these decisions for an entire<br />

program in just minutes. Each<br />

of these tools offers numerous<br />

optimization options it can<br />

perform automatically for the<br />

<strong>STM32</strong> F0 architecture that are<br />

significantly different than those<br />

typical with 8- and 16-bit MCUs.<br />

These options include data-flow<br />

optimizations such as common<br />

sub-expression elimination and<br />

loop optimizations such as loop<br />

combining and distribution.<br />

They also include advanced<br />

techniques like branch<br />

speculation and executing code<br />

out of sequence.<br />
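Two of the optimizations named above, common sub-expression elimination and loop combining, are easy to show by applying them by hand. The sketch below is illustrative only: the function names are invented, and `restrict` is used to state the no-aliasing assumption a compiler would have to establish before making the same change.

```c
#include <stddef.h>
#include <stdint.h>

/* As written: gain * offset is recomputed for every element, and the
   input array is walked twice. */
void scale_naive(int32_t *restrict out_a, int32_t *restrict out_b,
                 const int32_t *restrict in, size_t n,
                 int32_t gain, int32_t offset)
{
    for (size_t i = 0; i < n; i++) {
        out_a[i] = in[i] + gain * offset;
    }
    for (size_t i = 0; i < n; i++) {
        out_b[i] = in[i] - gain * offset;
    }
}

/* After common sub-expression elimination and loop combining. */
void scale_optimized(int32_t *restrict out_a, int32_t *restrict out_b,
                     const int32_t *restrict in, size_t n,
                     int32_t gain, int32_t offset)
{
    const int32_t k = gain * offset;   /* shared sub-expression, computed once */

    for (size_t i = 0; i < n; i++) {   /* one pass instead of two */
        const int32_t v = in[i];       /* each input loaded once */
        out_a[i] = v + k;
        out_b[i] = v - k;
    }
}
```

Both versions produce identical results; the second halves the loop overhead and removes the repeated multiply. An optimizing compiler can apply this kind of rewrite across a whole program on every build.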

These development tools for<br />

the <strong>STM32</strong> F0 give excellent<br />

results. Compiler efficiency<br />

compared to human coding<br />

has been estimated at 97%.<br />

Put another way, the cost of<br />

achieving that last 3% is on the<br />

order of weeks to months of<br />

development time. In addition,<br />

if a major design change<br />

is required, the compiler<br />

can complete a new set of<br />

optimizations with just a simple<br />

recompile.<br />

As a modern architecture, the<br />

<strong>STM32</strong> F0 is supported by<br />

similarly modern tools that<br />

utilize the latest advancements<br />

in compiler, debugger, and<br />

middleware technology to<br />

reduce development time and<br />

effort considerably. Being based<br />

on the Cortex-M architecture,<br />

the <strong>STM32</strong> F0 is backed by a<br />

larger ecosystem of tools and<br />

production-ready software than<br />

any other MCU architecture on<br />

the market. In addition, for many<br />

applications where the code<br />

base is small, the tools may be<br />

effectively free. For example,<br />

both IAR Embedded Workbench<br />

and Keil’s MDK-ARM are<br />

free when used for programs<br />

under 32 KB, thus enabling<br />

32-bit design with a low initial<br />

investment.<br />

Advanced Debugging<br />

While the ability to design<br />

demanding applications quickly<br />

is important, developers need<br />

debugging capabilities that<br />

can abstract the complexity of<br />

applications while still providing<br />

full visibility and control during<br />

run-time operation. In addition,<br />

many embedded markets,<br />

including medical and industrial,<br />

require that application software<br />

be certified as well.<br />

The integrated debug<br />

capabilities of the <strong>STM32</strong><br />

F0 provide many advanced<br />

capabilities that offer a superior<br />

debug experience compared<br />

to old-fashioned 8- and 16-bit<br />

architectures. For example, the<br />

<strong>STM32</strong> F0 architecture features<br />

ARM’s CoreSight technology<br />

to help developers analyze,<br />

optimize, and verify program<br />

execution with minimal effort<br />

and cost.<br />

CoreSight represents the<br />

latest in advanced debugging<br />

technology. Traditional MCUs<br />

offer only limited run/stop debug<br />

capabilities. To achieve greater<br />

visibility, an in-circuit emulator<br />

on the order of $1000s may be<br />

required, and a different pod will<br />

be required for each MCU in use.<br />

A few of the benefits CoreSight<br />

provides that other MCU<br />

architectures lack are<br />

on-the-fly read/write access<br />

and trace capabilities at the<br />

instruction, data, and application<br />

level. As implemented in the<br />

<strong>STM32</strong> F0, CoreSight also<br />

supports up to 4 hardware<br />

breakpoints and 2 watchpoints<br />

without requiring the use of<br />

intrusive monitoring techniques<br />

that can skew performance.<br />

Developers also have a choice<br />

of many low-cost debug<br />

adapters for the <strong>STM32</strong> F0. For<br />

example, the ST-LINK in-circuit<br />


debugger and programmer,<br />

which links the <strong>STM32</strong> F0<br />

target board to a PC via USB,<br />

is $25. For more advanced<br />

debugging, IAR Systems has the<br />

I-jet debugger while Keil offers<br />

developers its ULINK2 and<br />

ULINKpro debuggers.<br />

These debuggers offer<br />

powerful capabilities that are<br />

often not available for 8- and<br />

16-bit designs. Keil MDK-<br />

ARM tools, for example,<br />

enable comprehensive code<br />

coverage, execution profiling,<br />

and performance analysis to<br />

ensure maximum performance<br />

efficiency. With the I-jet<br />

debugger, IAR Systems is able<br />

to offer non-intrusive power<br />

consumption monitoring at<br />

the board- and chip-level.<br />

Such “power debugging”<br />

enables developers to uncover<br />

opportunities to utilize and tune<br />

hardware to achieve the highest<br />

power efficiency.<br />

<strong>STM32</strong> F0 Features<br />

<strong>STM32</strong> F0 MCUs have been<br />

designed with real-time<br />

operating system (RTOS) and<br />

kernel support in mind to<br />

enable much tighter integration<br />

with RTOSes like Keil’s royalty-free<br />

RTX. In a typical 8- or<br />

16-bit MCU, for example, the<br />

RTOS and application share<br />

the stack, and complex<br />

nesting problems can arise<br />

that overflow the stack and<br />

crash the system. The only<br />

way to avoid such issues is to<br />

overprovision the stack. The<br />

<strong>STM32</strong> F0, in contrast, has two<br />

stacks: one for the application<br />

and one for the RTOS. This<br />

prevents applications from<br />

compromising RTOS integrity.<br />

In addition, RAM overhead is<br />

much lower.<br />

Double Sourcing<br />

In recent years, the industry has seen shortages when devices are manufactured in a single location. To ensure its customers will always have uninterrupted access to product, ST employs a double sourcing strategy in which all <strong>STM32</strong> devices are manufactured in at least two fabs in different parts of the world. This prevents product supply from being vulnerable to environmental factors that shut down a particular production fab. It also enables ST to meet any unanticipated rise in demand more easily by shifting production among multiple fabs.<br />

Other companies basing MCUs on the Cortex-M0 architecture integrate only the minimum capabilities an MCU requires. ST is the only company to offer Cortex-M0-based MCUs with:<br />

〉〉 Easy Communication: Using the integrated DMA controller, the <strong>STM32</strong> F0 can support continuous I2C at a rate of 1 Mbps without bogging down the CPU. This data rate isn’t possible to achieve on an 8- or 16-bit MCU that does not support DMA.<br />

〉〉 Advanced Digital and Analog Capabilities: The <strong>STM32</strong> F0 integrates a wide range of IP to facilitate the design of sensing and control systems. For example, advanced timers enable the accurate output of complex AC waveforms. On-chip comparators simplify the design of sensors. The 12-bit, multi-channel ADC operating at up to 1 MSample/s allows for fast and precise data acquisition and improves system responsiveness to external events. Advanced timing control is enabled using the 32-bit and 16-bit PWM timers with 17 capture/compare I/O mapped onto up to 28 pins.<br />

〉〉 Safety Ready: With shrinking process technologies and larger memories combined with frequently changing data, bit errors from cosmic rays can occur. For systems that must meet stringent safety compliance standards, the <strong>STM32</strong> F0 performs real-time, hardware-based RAM parity checking and 16-bit CRC verification for Flash to ensure the integrity of memory. RAM checks are performed automatically whenever memory is accessed. Flash verification is self-managed, enabling developers to confirm program integrity upon startup and when updating firmware to verify that no bits have been flipped since they were written.<br />

〉〉 Reliability: The <strong>STM32</strong> F0 integrates two watchdog timers, one of which is a windowed watchdog timer. These timers, which can operate in low-power modes as well, provide a higher level of reliability not available in most 8- and 16-bit MCUs. A Clock Security System (CSS) enables systems to switch to internal RC-based clocking in case of external clock failure to ensure systems can shut down gracefully rather than catastrophically.<br />

〉〉 Optimized Communications: The <strong>STM32</strong> F0 supports the HDMI Consumer Electronics Control (CEC) protocol. Important for devices targeted for consumer markets, this peripheral enables devices to have smart control over multiple HDMI lines. For devices needing remote control capabilities, ST provides a full infrared firmware library.<br />

〉〉 Memory: Memory capacity ranges from 16 KB to 128 KB Flash.<br />

〉〉 1.8V Ready: The <strong>STM32</strong> F0 can interface directly to 1.8 to 3.6 V-based devices. This eliminates the need for the additional conditioning circuitry that 8- and 16-bit MCUs require.<br />

〉〉 Capacitive Touch Sensing: To add touch to 8- and 16-bit MCU-based designs, a second processor is typically required. With the <strong>STM32</strong> F0, developers can easily introduce capacitive touch sensing to applications, with up to 18 keys and slider/wheel configurations, all with a single chip. In addition, touch sensing can be implemented with zero CPU loading when using the charge transfer method.<br />

Figure 3 <strong>STM32</strong> F0 Benchmark Positioning: CoreMark/MHz for Competitor A (8/16-bit), Competitor B (8-bit), Competitor C (16-bit), Competitor D (16-bit), and the <strong>STM32</strong> F0 (Cortex-M0). The <strong>STM32</strong> F0 provides an optimal balance of cost, performance, and peripherals for embedded applications.<br />
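The startup integrity check described under Safety Ready can be sketched in software. This is a minimal illustration, not the device's hardware CRC unit: the text does not name the polynomial, so the common CRC-16/CCITT-FALSE parameters (polynomial 0x1021, initial value 0xFFFF) are assumed here, and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-16 using polynomial 0x1021 and initial value 0xFFFF.
   These parameters are an assumption chosen for the example. */
uint16_t crc16_ccitt(const uint8_t *data, size_t len)
{
    uint16_t crc = 0xFFFFu;

    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)((uint16_t)data[i] << 8);
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 0x8000u) {
                crc = (uint16_t)((crc << 1) ^ 0x1021u);
            } else {
                crc = (uint16_t)(crc << 1);
            }
        }
    }
    return crc;
}

/* Compare the checksum of an image against the value stored when the
   image was written; any flipped bit changes the result. */
bool image_is_intact(const uint8_t *image, size_t len, uint16_t expected_crc)
{
    return crc16_ccitt(image, len) == expected_crc;
}
```

At startup or after a firmware update, an application would run `image_is_intact` over the programmed region and refuse to continue if it returns false. A hardware CRC unit performs the same calculation without spending CPU cycles on the inner bit loop.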

Overall, the <strong>STM32</strong> F0 provides<br />

an optimal balance of cost,<br />

performance, and peripherals<br />

for embedded applications<br />

(see Figure 3). Rather than tie<br />

developers to a proprietary<br />

architecture with limited tools<br />

and support, ST offers the<br />

industry’s widest Cortex-M<br />

portfolio with more than 300<br />

compatible devices across the<br />

entire <strong>STM32</strong> family.<br />

With code, pin, and peripheral<br />

compatibility across the <strong>STM32</strong><br />

family, developers can carry<br />

Cortex-M0-based designs over to<br />

M3- and M4-based MCUs<br />

with unparalleled flexibility. For<br />

example, applications designed<br />

using the <strong>STM32</strong> F0 are easily<br />

migrated to the <strong>STM32</strong> F2 and<br />

<strong>STM32</strong> F4. With Keil’s MDK-<br />

ARM and IAR Embedded<br />

Workbench, developers just need<br />

to change the MCU selection<br />

and the compiler handles all of<br />

the details by recompiling the<br />

code. This enables developers<br />

to easily migrate to an MCU with<br />

more performance, memory,<br />

and peripherals without rewriting<br />

the application. As a result,<br />

developers can leverage the<br />

same application and tool chain<br />

across an entire product line and<br />

a variety of MCUs.<br />

Similarly, developers have the<br />

option of designing code on<br />

the <strong>STM32</strong> F2 or F4 with the<br />

intention of later downsizing<br />

to the <strong>STM32</strong> F0. This enables<br />

design to take place on a<br />

platform with the highest<br />

performance and memory to<br />

accelerate proof-of-concept<br />

design. Once the design has<br />

settled, developers can optimize<br />

it for the <strong>STM32</strong> F0.<br />

With the <strong>STM32</strong> F0, ST offers a<br />

compelling alternative to 8- and<br />

16-bit devices. For the same<br />

price, developers get more<br />

performance, higher resolution<br />

peripherals, better tools,<br />

wider support, accelerated<br />

development, and faster time-to-market.<br />

To explore how the<br />

new <strong>STM32</strong> F0 can bring the<br />

benefits of 32-bit technology<br />

to your designs, the <strong>STM32</strong> F0<br />

Discovery Kit is available now<br />

for less than $10.<br />


Developing High-Quality Audio for<br />

Consumer Electronics Applications<br />

By Paul Beckmann, CEO/CTO, DSP Concepts<br />

Dragos Davidescu, Chief System Architect, STMicroelectronics<br />

John Knab, Application Engineer, STMicroelectronics<br />

With the falling cost of<br />

high-performance MCUs,<br />

manufacturers are considering<br />

adding digital audio functionality<br />

to more and more consumer<br />

devices and other embedded<br />

applications. Their goal is to<br />

support the wide variety of media<br />

sources users want to access<br />

such as an iPhone, Internet<br />

radio, external USB devices, and<br />

SD cards.<br />

Achieving high-quality sound<br />

output, however, is nontrivial.<br />

Sound quality depends<br />

greatly upon the final system<br />

configuration, making it difficult<br />

to design even when prototype<br />

hardware is available. In<br />

addition, implementing real-time<br />

digital signal processing<br />

algorithms introduces a<br />

whole new set of concerns<br />

for developers used to MCU-based<br />

design. These include<br />

implementing advanced filters<br />

and processing algorithms,<br />

handling fixed-point issues,<br />

using DSP-like instructions,<br />

and optimizing complex<br />

algorithms for speed, MIPS,<br />

memory, and power.<br />

In this article, we’ll show how<br />

developers can leverage MCU-based<br />

digital signal processing<br />

(DSP) and floating-point unit<br />

(FPU) capabilities to enable<br />

real-time audio playback,<br />

implement enhanced algorithms,<br />

convert between multiple<br />

clock domains, manage high-speed<br />

communications without<br />

impacting audio quality, optimize<br />

designs to balance quality and<br />

cost, and manage other system<br />

tasks such as a graphical<br />

user interface, all with a single<br />

microcontroller.<br />

Consumer Audio<br />

Traditionally, introducing audio<br />

to an embedded application<br />

requires digital signal-processing<br />

capabilities beyond the capabilities<br />

of most MCUs. Even a “simple”<br />

product like an iPod speaker dock<br />

requires a significant number of<br />

advanced audio algorithms to<br />

achieve full performance:<br />

Spatial enhancement: In<br />

an iPod docking station, the<br />

speakers may be only 12-18<br />

inches apart. To create a more<br />

spacious, rich sound, spatial<br />

enhancement is required<br />

to compensate for the close<br />

proximity of the speakers.<br />

Multi-channel audio: For<br />

systems supporting more than<br />

two speakers, the stereo input<br />

signal requires processing to<br />

create the additional audio<br />

channels.<br />

Equalization: Speakers need to<br />

be equalized to achieve better<br />

sound quality. If the speakers<br />

in use change, the equalization<br />

needs to be adjusted as well.<br />

Developers can employ a<br />

variety of equalization methods,<br />

including graphic and parametric<br />

equalization. For higher-end<br />

applications, developers may<br />

even want to design their own<br />

equalization algorithms using a<br />

tool like MATLAB.<br />

Peak limiting: Speakers exhibit<br />

nonlinearity at louder sound<br />

levels. By applying a time-varying<br />

gain and carefully controlling the<br />

peak levels, the system can play<br />

louder with a minimum amount<br />

of distortion.<br />

Boost: When listening to music<br />

at low volume levels, much<br />

detail, and therefore depth, can<br />

be lost. Boosting of the bass and<br />

certain other frequencies at low<br />

volume levels using loudness<br />

compensation or perceptual<br />

volume-control techniques can<br />

significantly improve perceived<br />

sound quality.<br />

Level matching: Level matching<br />

eliminates the need for users to<br />

adjust the volume for each song<br />


when shuffling through a large<br />

library of albums.<br />
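The peak-limiting item above describes a time-varying gain that keeps peaks under control.<br />
A minimal sketch of that idea follows. The names limiter_t and limiter_process, the instant-attack<br />
and slow-release behavior, and the constants are illustrative assumptions, not taken from any ST or<br />
DSP Concepts library.<br />

```c
#include <stddef.h>

/* Hypothetical peak limiter state; all names and values are illustrative. */
typedef struct {
    float threshold;  /* maximum allowed absolute sample value */
    float release;    /* smoothing factor, 0 < release < 1; closer to 1 recovers more slowly */
    float gain;       /* current time-varying gain, normally starts at 1.0f */
} limiter_t;

static float abs_f(float x) { return x < 0.0f ? -x : x; }

/* Process a block in place: cut the gain at once when a sample would exceed
   the threshold, then let the gain recover gradually toward unity. */
void limiter_process(limiter_t *lim, float *buf, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        float peak = abs_f(buf[i]);
        /* Gain that would bring this sample exactly to the threshold. */
        float needed = (peak > lim->threshold) ? lim->threshold / peak : 1.0f;

        if (needed < lim->gain) {
            lim->gain = needed;                                         /* instant attack */
        } else {
            lim->gain += (1.0f - lim->release) * (needed - lim->gain);  /* slow release */
        }
        buf[i] *= lim->gain;
    }
}
```

With a threshold of 0.5, a sample of 2.0 is scaled down to 0.5, and quieter samples that follow are<br />
attenuated less and less as the gain recovers.<br />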

Digital audio has commonly<br />

been implemented in consumer<br />

electronics and embedded<br />

applications using a second<br />

processor dedicated to this task.<br />

To meet market cost pressures,<br />

however, manufacturers need to<br />

be able to process audio on the<br />

host CPU.<br />

In general, it is easier to<br />

implement audio on an MCU<br />

than it is to implement real-time<br />

responsiveness and connectivity<br />

on a DSP. DSPs, while excellent<br />

at processing audio, don’t have<br />

the peripherals or interrupt<br />

responsiveness required for real-time<br />

systems. DSP architectures<br />

are also typically designed for<br />

high-end signal processing<br />

and massive parallelism that<br />

exceeds the requirements of the<br />

typical consumer application. In<br />

addition, DSPs are not designed<br />

to support communication<br />

interfaces like USB, SD cards, or<br />

Wi-Fi, so a DSP-based docking<br />

station would still require a<br />

second processor to handle<br />

connectivity.<br />

With the introduction of DSP<br />

capabilities to MCU instruction<br />

sets, MCUs now have the<br />

advanced math processing<br />

capabilities required to handle<br />

not only basic audio processing<br />

but the advanced algorithms<br />

required to improve quality as<br />

well. In addition, rather than<br />

requiring developers to hand-code<br />

assembly as is typical for<br />

DSP-based designs, MCUs offer<br />

ease-of-use and faster time-to-market<br />

through C programming<br />

and application libraries.<br />

MCUs are also specifically<br />

architected to provide short and<br />

deterministic interrupt latency as<br />

well as ultra low-power operation<br />

for battery-powered applications.<br />

The <strong>STM32</strong> MCU + Audio<br />

Architecture<br />

The <strong>STM32</strong> architecture from<br />

ST has been designed to<br />

bring 32-bit MCU capabilities<br />

to a wide range of consumer<br />

audio applications, including<br />

multimedia speakers, docking<br />

stations, and headphones. The<br />

<strong>STM32</strong> F4, based on an ARM<br />

Cortex-M4 core operating at<br />

up to 168 MHz, also integrates<br />

capabilities such as DSP<br />

instructions and a floating-point<br />

unit to allow manufacturers<br />

to produce consumer audio<br />

applications offering quality<br />

playback at the lowest cost<br />

(see Figure 1).<br />

Figure 1 block diagram labels:<br />

System: power supply, 1.2 V regulator, POR/PDR/PVD; Xtal oscillators, 32 kHz + 4 to 26 MHz; internal RC oscillators, 32 kHz + 16 MHz; PLL; clock control; RTC/AWU; SysTick timer; 2x watchdogs (independent and window); 51/82/114/140 I/Os; cyclic redundancy check (CRC)<br />

Control: 2x 16-bit motor control PWM; synchronized AC timer; 10x 16-bit timers; 2x 32-bit timers<br />

<strong>STM32</strong> F4: ARM Cortex-M4, 168 MHz; ART Accelerator; floating-point unit (FPU); nested vector interrupt controller (NVIC); MPU; JTAG/SW debug/ETM; multi-AHB bus matrix; 16-channel DMA<br />

Crypto/hash processor: 3DES, AES 256; SHA-1, MD5, HMAC; true random number generator (RNG)<br />

Up to 1-Mbyte Flash memory; up to 192-Kbyte SRAM; FSMC/SRAM/NOR/NAND/CF/LCD parallel interface; 80-byte + 4-Kbyte backup SRAM; 512 OTP bytes<br />

Connectivity: camera interface; 3x SPI, 2x I2S, 3x I2C; Ethernet MAC 10/100 with IEEE 1588; 2x CAN 2.0B; 1x USB 2.0 OTG FS/HS; 1x USB 2.0 OTG FS; SDIO; 6x USART (LIN, smartcard, IrDA, modem control)<br />

Analog: 2-channel 2x 12-bit DAC; 3x 12-bit ADC, 24 channels / 2.4 MSPS; temperature sensor<br />

Figure 1 ST has expanded its <strong>STM32</strong> MCUs beyond the base Cortex-M architecture with a variety of integrated peripherals<br />

to create a wide range of MCUs that optimize performance, memory, and cost for nearly every embedded application.<br />


Figure 2 With its Cortex-M4 core, the <strong>STM32</strong> F4 offers excellent audio processing capabilities<br />

that exceed the performance of many general-purpose MCUs and discrete DSPs. (Chart: DSP application example, MP3 audio playback. It plots the minimum and maximum MHz required for MP3 decode, on a 0 to 30 MHz scale where smaller is better, for general-purpose MCUs, discrete DSPs, the Cortex-M4, and specialised audio DSPs. Source: DSP Concepts.)<br />

The <strong>STM32</strong> F4 offers excellent<br />

audio processing capabilities<br />

(see Figure 2). With its rich<br />

peripheral integration, a single<br />

<strong>STM32</strong> F4 can provide a cost-effective,<br />

single-chip solution for<br />

implementing embedded audio<br />

that combines performance,<br />

ease-of-use, connectivity, and<br />

signal processing to achieve<br />

quality audio playback. Key<br />

capabilities of the <strong>STM32</strong> F4<br />

for accelerating audio design,<br />

enhancing performance, and<br />

lowering system cost include:<br />

Digital Signal Processing<br />

Instructions: With the <strong>STM32</strong><br />

F4, developers have access<br />

to up to 105 DSP-specific<br />

instructions. These instructions<br />

include single-cycle multiply-and-accumulate<br />

(MAC),<br />

saturated arithmetic, and both<br />

8- and 16-bit SIMD integer<br />

operations. Its architecture is<br />

designed to enable high-quality<br />

audio in consumer electronics<br />

and embedded applications in a<br />

more cost-effective manner than<br />

is possible with DSPs.<br />
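To make saturated arithmetic and 16-bit SIMD concrete, the sketch below shows in portable C what a<br />
saturating add does, first on one 16-bit value and then on two 16-bit lanes packed into a 32-bit word.<br />
The function names are illustrative. On a Cortex-M4, a dual 16-bit saturating add of this kind maps to<br />
a single DSP instruction such as QADD16; the C version here only models the behavior.<br />

```c
#include <stdint.h>

/* Saturating 16-bit add: clamps at the limits instead of wrapping around. */
int16_t sat_add16(int16_t a, int16_t b)
{
    int32_t sum = (int32_t)a + (int32_t)b;   /* widen so the sum cannot overflow */
    if (sum > INT16_MAX) return INT16_MAX;
    if (sum < INT16_MIN) return INT16_MIN;
    return (int16_t)sum;
}

/* Two 16-bit lanes packed in one 32-bit word, added lane by lane with
   saturation. The narrowing casts assume the usual two's-complement
   conversion performed by common compilers. */
uint32_t sat_add16x2(uint32_t x, uint32_t y)
{
    int16_t lo = sat_add16((int16_t)(x & 0xFFFFu), (int16_t)(y & 0xFFFFu));
    int16_t hi = sat_add16((int16_t)(x >> 16), (int16_t)(y >> 16));
    return ((uint32_t)(uint16_t)hi << 16) | (uint16_t)lo;
}
```

For audio, clamping matters because a sample that wraps from full positive to full negative scale is<br />
heard as a loud click, while a clamped sample is only mildly distorted.<br />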


Floating-Point Unit: All <strong>STM32</strong><br />

F4 devices also have an<br />

integrated floating-point unit<br />

(FPU). While signal-processing<br />

algorithms can be implemented<br />

using fixed-point arithmetic,<br />

this approach adds complexity<br />

in that underflow and overflow<br />

need to be manually managed. In<br />

addition, fixed-point processing<br />

offers less dynamic range than<br />

floating-point, which impacts<br />

many audio functions. With<br />

the integrated FPU, there is no<br />

penalty for retaining this precision.<br />

Code based on floating-point can<br />

also be substantially faster and<br />

require less memory than fixed-point<br />

code.<br />
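The difference in bookkeeping can be sketched with a dot product written both ways. The Q15 format<br />
(16-bit values representing x / 32768) and the function names are assumptions chosen for illustration.<br />
The fixed-point version must widen, rescale, and saturate by hand, while the floating-point version is<br />
a plain loop.<br />

```c
#include <stdint.h>
#include <stddef.h>

/* Q15 dot product: each int16_t represents value / 32768, in [-1, 1). */
int16_t dot_q15(const int16_t *a, const int16_t *b, size_t n)
{
    int64_t acc = 0;                            /* wide accumulator so sums cannot overflow */
    for (size_t i = 0; i < n; ++i)
        acc += (int32_t)a[i] * (int32_t)b[i];   /* Q15 * Q15 gives Q30 */

    acc >>= 15;                                 /* rescale Q30 to Q15 (arithmetic shift assumed) */
    if (acc > INT16_MAX) acc = INT16_MAX;       /* overflow is managed manually */
    if (acc < INT16_MIN) acc = INT16_MIN;
    return (int16_t)acc;
}

/* The same computation in floating point: no scaling or saturation code. */
float dot_f32(const float *a, const float *b, size_t n)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; ++i)
        acc += a[i] * b[i];
    return acc;
}
```

In Q15, 0.5 * 0.5 + 0.5 * 0.5 yields 16384 (0.5), but a result that should be close to 2.0 is clamped<br />
to 32767, just under 1.0. The floating-point version simply returns 2.0.<br />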

32-bit Efficiency: The bus<br />

size of the processor has a<br />

tremendous impact on both<br />

performance and power<br />

efficiency. Even if audio<br />

samples are streaming at 16<br />

bits, the system still needs<br />

32 bits to store intermediate<br />

computations. A 16-bit MCU<br />

or DSP, for example, requires<br />

seven operations (4 multiplies<br />

and 3 additions) just to complete<br />

a single 32 x 32 multiplication.<br />

The <strong>STM32</strong> F4 can execute<br />

a 32-bit MAC (multiply and<br />

accumulate) with only one<br />

single-cycle operation.<br />
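The count of four multiplies and three additions follows from splitting each 32-bit operand into two<br />
16-bit halves. The sketch below works through it for unsigned operands. The unsigned case and the<br />
function name are assumptions made for illustration, and a real 16-bit machine would also spend<br />
instructions on carries and shifts.<br />

```c
#include <stdint.h>

/* 32 x 32 -> 64-bit unsigned multiply built only from 16 x 16 products,
   the way a 16-bit multiplier has to do it. */
uint64_t mul32_via_16(uint32_t a, uint32_t b)
{
    uint32_t al = a & 0xFFFFu, ah = a >> 16;   /* low and high halves of a */
    uint32_t bl = b & 0xFFFFu, bh = b >> 16;   /* low and high halves of b */

    uint64_t ll = (uint64_t)al * bl;   /* multiply 1 */
    uint64_t lh = (uint64_t)al * bh;   /* multiply 2 */
    uint64_t hl = (uint64_t)ah * bl;   /* multiply 3 */
    uint64_t hh = (uint64_t)ah * bh;   /* multiply 4 */

    /* Three additions combine the partial products at their bit positions. */
    return ll + ((lh + hl) << 16) + (hh << 32);
}
```

The result matches a native 64-bit multiplication, which a 32-bit core with a hardware multiplier<br />
obtains without any of this decomposition.<br />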

Multi-Layer Bus Fabric:<br />

The key to real-time signal<br />

processing is maintaining<br />

efficient data flow. In a<br />

consumer audio device,<br />

however, the MCU must<br />

not only move signal data but<br />

also manage program memory,<br />

communication ports, and other<br />

system tasks. The complexity<br />

and real-time nature of audio<br />

algorithms also require<br />

them to be integrated with<br />

application code to ensure that<br />

neither core application tasks<br />

nor audio playback compromise<br />

each other.<br />

Figure 3 The multi-layer matrix that interconnects <strong>STM32</strong> MCUs with peripherals and memory enables simultaneous transfer between multiple<br />

masters and slaves without requiring involvement from the CPU. This provides <strong>STM32</strong> MCUs with a tremendous interconnect<br />

capacity that eliminates peripheral and memory access bottlenecks for the highest operating performance.<br />

(Diagram: 7-layer 32-bit multi-AHB bus matrix. Bus masters: Cortex-M4 CPU with instruction, data, and system buses and 64 Kbytes SRAM; general-purpose DMA1 and DMA2 with 8 channels each; Ethernet MAC 10/100 DMA at 100 Mbit/s, 12.5 MByte/s; USB OTG HS DMA at 480 Mbit/s, 60 MByte/s. Bus slaves: FSMC, AHB2 peripheral, AHB1 peripheral, SRAM 16 Kbytes, SRAM 112 Kbytes, 1-Mbyte Flash through the ART Accelerator at 672 MByte/s on each of the I and D buses, AHB1/APB1, and AHB1/APB2.)<br />

The <strong>STM32</strong> architecture is<br />

designed to minimize this<br />

problem so that developers<br />

do not need to spend time<br />

resolving potential conflicts.<br />

This is achieved through the low<br />

interrupt service overhead of<br />

<strong>STM32</strong> devices combined with<br />

the multi-layer bus fabric that<br />

allows multiple DMA transactions<br />

to occur simultaneously without<br />

burdening the CPU. Figure<br />

3 shows the high level of<br />

parallelism that can be achieved<br />

through simultaneous transfers<br />

over a multi-layer fabric:<br />

〉〉 Program code is executed<br />

from Flash with data stored in<br />

SRAM (red)<br />

〉〉 The compressed audio stream<br />

is received over USB and<br />

stored in SRAM (green).<br />

〉〉 CPU with DSP and FPU<br />

functionality accesses the<br />


compressed audio stream for<br />

decompression and signal<br />

processing (green)<br />

〉〉 Decompressed MP3 data is<br />

sent from the CPU to SRAM<br />

(yellow)<br />

〉〉 Audio data is output to I2S<br />

through DMA (orange)<br />

〉〉 Graphical icons are transferred<br />

from Flash to the display<br />

through DMA (blue)<br />
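The data flow listed above depends on the CPU and the DMA controller never touching the same memory<br />
at the same time. One common way to arrange this is a ping-pong (double) buffer, sketched below as a<br />
hardware-independent model. The type pingpong_t, the function names, and the tiny buffer size are<br />
hypothetical. On real hardware the swap would be triggered by the DMA controller's transfer-complete<br />
event rather than called directly.<br />

```c
#include <stdint.h>
#include <stddef.h>

#define HALF_SAMPLES 4   /* illustrative size; real audio buffers are much larger */

/* Two halves of one buffer: the DMA drains one while the CPU fills the other. */
typedef struct {
    int16_t data[2][HALF_SAMPLES];
    int dma_half;        /* half currently owned by the (simulated) DMA: 0 or 1 */
} pingpong_t;

/* Half the CPU may safely write decoded samples into right now. */
int16_t *pingpong_cpu_half(pingpong_t *pp)
{
    return pp->data[1 - pp->dma_half];
}

/* Swap ownership when the DMA finishes its half; returns the half that
   should be sent to the audio interface next. */
const int16_t *pingpong_swap(pingpong_t *pp)
{
    pp->dma_half = 1 - pp->dma_half;
    return pp->data[pp->dma_half];
}
```

The decoder writes into the half returned by pingpong_cpu_half, and after each swap that half is<br />
handed to the output stream while the other half becomes writable.<br />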

Communications Interfaces:<br />

Users want to be able to access<br />

audio data from different sources<br />

and over different interfaces.<br />

With the right mix of interfaces—<br />

including USB (host and device),<br />

Ethernet for Internet Radio,<br />

SDIO, and external memory—<br />

developers can create flexible<br />

devices that support a wide<br />

range of usage models.<br />

In addition to being able to<br />

receive data without loading<br />

the CPU, developers need to<br />

be able to address the many<br />

issues related to streaming<br />

audio, including lost packets and<br />

lack of feedback controls. For<br />

example, USB feedback controls<br />

to prevent underflow and overflow of<br />

the audio buffer are not always<br />

used or well implemented. This<br />

can result in lost or dropped<br />

packets that impact audio<br />

quality. To overcome this<br />

limitation, developers can utilize<br />

sample-rate conversion (SRC).<br />

SRC is also useful for converting<br />

between audio speeds (i.e., clock<br />

domains) while maintaining audio<br />

fidelity, compensating for slight<br />

mismatches in clock speed, or<br />

for mixing audio from different<br />

sources. For applications that<br />

need SRC, the <strong>STM32</strong> F4<br />

requires only 10% CPU utilization,<br />

leaving plenty of headroom for<br />

other signal-processing tasks.<br />
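The principle behind sample-rate conversion can be sketched with linear interpolation, the simplest<br />
possible resampler. This is an illustrative assumption, not the algorithm used in the ST or DSP<br />
Concepts libraries. Production converters typically use polyphase filters to preserve fidelity.<br />
The function name and parameters are hypothetical.<br />

```c
#include <stddef.h>

/* Resample `in` (n_in samples) into `out` by linear interpolation.
   step = input_rate / output_rate, e.g. 44100.0 / 48000.0 to convert
   44.1 kHz material to 48 kHz. Returns the number of samples written,
   which never exceeds max_out. */
size_t src_linear(const float *in, size_t n_in,
                  float *out, size_t max_out, double step)
{
    size_t n_out = 0;
    double pos = 0.0;                      /* fractional read position in the input */

    while (n_out < max_out) {
        size_t i = (size_t)pos;            /* input sample just before the position */
        if (i + 1 >= n_in)
            break;                         /* need a sample on both sides to interpolate */
        double frac = pos - (double)i;     /* distance past sample i, in [0, 1) */
        out[n_out++] = (float)((1.0 - frac) * in[i] + frac * in[i + 1]);
        pos += step;
    }
    return n_out;
}
```

Nudging step slightly above or below its nominal value is one way to absorb a small mismatch between<br />
the source clock and the output clock before the audio buffer underflows or overflows.<br />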

Multiple Clock Sources:<br />

Consumer audio systems require<br />

a number of different clock<br />

domains—including the CPU,<br />

USB, and I2S—that have fixed<br />

frequencies and need to be<br />

accurate as well as free of jitter.<br />

Trying to use the same clock<br />

for each of these can impact<br />

precision. For example, it is<br />

straightforward to achieve a clean<br />

clock at 168 MHz for the CPU,<br />

44.1 kHz for an I2S interface, or<br />

48 MHz for USB but not for all<br />

three using a single clock source.<br />

The <strong>STM32</strong> F4 integrates two<br />

PLLs for increased clocking<br />

flexibility. The main PLL is used<br />

to generate the system clock<br />

and the second PLL is available<br />

to generate the accurate clocks<br />

needed for high-quality audio.<br />
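A short calculation shows why a single clock source struggles. The sketch assumes, for illustration<br />
only, that the audio interface needs a master clock of 256 times the sample rate obtained by dividing<br />
a source clock by an integer. The actual divider structure of the I2S peripheral is not described<br />
here, and the function names are hypothetical.<br />

```c
#include <stdint.h>

/* Assumption for illustration: master clock = 256 x sample rate,
   derived from the source clock through an integer divider. */
#define MCLK_PER_SAMPLE 256u

/* Integer divider closest to the ideal ratio (never less than 1). */
uint32_t best_divider(uint32_t src_hz, uint32_t fs_hz)
{
    uint32_t target = fs_hz * MCLK_PER_SAMPLE;
    uint32_t div = (src_hz + target / 2u) / target;   /* round to nearest */
    return div ? div : 1u;
}

/* Resulting sample-rate error in parts per million (signed, rounded). */
int32_t fs_error_ppm(uint32_t src_hz, uint32_t fs_hz)
{
    uint32_t div = best_divider(src_hz, fs_hz);
    double actual = (double)src_hz / ((double)div * MCLK_PER_SAMPLE);
    double err = (actual - (double)fs_hz) / (double)fs_hz * 1e6;
    return (int32_t)(err >= 0.0 ? err + 0.5 : err - 0.5);
}
```

Under these assumptions, a 48 MHz clock suited to USB misses 44.1 kHz by roughly 6%, while a clock<br />
of 45.1584 MHz (4 x 256 x 44.1 kHz) from a second PLL hits it exactly.<br />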

Complete audio system | <strong>STM32</strong> F2 CPU load | <strong>STM32</strong> F4 CPU load | Flash footprint | RAM footprint<br />

MP3 decoder | 17% | 6% | 23k | 12344<br />

MP3 encoder | 22.5% | 9% | 25k | 16060<br />

WMA decoder | 17.5% | 6% | 45k | 36076<br />

AAC+ v2 decoder | 25% | 11% | 54k | 87000<br />

Channel mixer | 2.5% | 2% | 0.6k | 16<br />

Parametric Equalizer | 16% | 12% | 2k | 300<br />

Loudness Control | 4.5% | 3.5% | 3.25k | 632<br />

SRC | 22.5% | 10% | 17.5k | 1880<br />

Figure 4 When computing 16- and 32-bit DSP functions, the <strong>STM32</strong> F4 offers a 25–70%<br />

improvement. As a result, systems can drop into sleep mode faster to conserve<br />

power or run more algorithms to further improve audio quality.<br />


The ability to source different<br />

clock domains enables designs<br />

based on the <strong>STM32</strong> architecture<br />

to maintain a permanent USB<br />

connection and avoid audio<br />

synchronization issues.<br />

Integrated Audio Interfaces:<br />

The <strong>STM32</strong> F4 has two full-duplex<br />

I2S standard stereo<br />

interfaces offering less than<br />

0.5% sampling frequency error.<br />

There is also an external clock<br />

input to the I2S peripheral if<br />

an external high-quality audio<br />

PLL is preferred. In addition to<br />

simplifying design, integrating<br />

the I2S interfaces reduces<br />

component count, board size,<br />

and system cost.<br />

MCU Peripherals: The <strong>STM32</strong><br />

architecture includes all of the<br />

real-time peripherals required for<br />

even the most demanding MCU-based<br />

application.<br />

The combination of the <strong>STM32</strong><br />

F4’s capabilities brings a new<br />

level of performance to audio<br />

applications. Performing<br />

a long 32-bit multiply or<br />

multiply-accumulate (MAC)<br />

operation on an <strong>STM32</strong> F2, for<br />

example, takes 3-7 cycles. With<br />

the <strong>STM32</strong> F4, this operation<br />

is performed in a single cycle.<br />

When computing 16- and 32-bit<br />

DSP functions, the <strong>STM32</strong><br />

F4 offers a 25-70% (see Figure<br />

4) improvement. As a result,<br />

systems can drop into sleep<br />

mode faster to conserve power<br />

or run more algorithms to further<br />

improve audio quality.<br />

In addition to the integrated DSP<br />

capabilities of the <strong>STM32</strong> F4,<br />

developers have access to the<br />

CMSIS DSP library to accelerate<br />

development. The CMSIS<br />

DSP library includes a large<br />

number of DSP and floating-point<br />

functions optimized for<br />

the algorithms commonly used<br />

in audio applications. This<br />

library is supplied by ARM for<br />

processors built around the<br />

Cortex-M4 processor. DSP<br />

Concepts is the company that<br />

wrote the CMSIS DSP library.<br />

They have leveraged their<br />

intimate knowledge of the library<br />

to create the audio blocks that<br />

make up their signal processing<br />

design tool, Audio Weaver.<br />

Figure 5 Audio Weaver from DSP Concepts offers a GUI-based development environment that<br />

enables developers to design the signal flow for their application by selecting processing<br />

blocks and connecting them using a drag-and-drop editor.<br />

Audio Algorithm Design<br />

Audio Weaver enables<br />

developers to quickly design<br />

the audio processing portion<br />

of their system; i.e., everything<br />

that goes on between receiving<br />

an audio signal and outputting<br />

it. Audio Weaver offers a<br />

GUI-based development<br />

environment that enables<br />

developers to design the signal<br />

flow for their application by<br />

selecting processing blocks<br />

and connecting them using<br />

a drag-and-drop editor (see<br />

Figure 5). Each block has hand-optimized<br />

code behind it, and<br />

the tool automatically creates<br />

the required data structures.<br />


Because complex functions are<br />

built from base audio functions,<br />

the final code executes with<br />

no performance or efficiency<br />

losses compared to hand-coding<br />

from scratch.<br />

When algorithm code is written<br />

by hand, each design iteration<br />

requires substantial time<br />

investment since the code must<br />

be optimized and tuned to<br />

see what its actual impact on<br />

sound quality and processing<br />

load is. With Audio Weaver,<br />

the design cycle is much faster,<br />

giving developers the ability to<br />

explore more configurations in<br />

their efforts to increase sound<br />

quality while reducing system<br />

cost. Code is highly optimized<br />

for MIPS and memory usage,<br />

supports floating-point<br />

processors such as the <strong>STM32</strong><br />

F4, offers flexible deployment<br />

modules, and does not require<br />

an RTOS to operate. The library<br />

includes over 150 different<br />

audio blocks, including third-party<br />

IP.<br />

With tools like Audio Weaver, it<br />

has become possible to create<br />

highly tuned audio applications<br />

without engineers needing<br />

to have a deep knowledge<br />

of audio processing. For<br />

companies new to audio,<br />

complete reference designs are<br />

available, with assistance from<br />

DSP Concepts to tune them<br />

for the final production system.<br />

Companies that are comfortable<br />

with audio processing can<br />

work with individual audio<br />

blocks that provide basic<br />

functionality and build them<br />

into higher-level processing<br />

algorithms. Even sophisticated<br />

companies can accelerate<br />

design using Audio Weaver as<br />

it provides a framework with<br />

core components that not only<br />

jumpstarts design with highly<br />

optimized code but provides a<br />

development environment that<br />

facilitates fast prototyping and<br />

tuning. For these companies the<br />

value-add of Audio Weaver is<br />

faster time-to-market.<br />

Accelerating Optimization<br />

To speed design, Audio Weaver<br />

supports cross-platform<br />

development. The ability to run<br />

the same algorithms on a PC as<br />

on the <strong>STM32</strong> F4 gives engineers<br />

a powerful environment in<br />

which to design and tune the<br />

software in parallel with hardware<br />

development. Once target<br />

hardware is available, the code<br />

can be retargeted for the <strong>STM32</strong><br />

F4 and final optimizations made,<br />

resulting in significant time-to-market<br />

savings.<br />

During final optimization,<br />

developers can profile the<br />

MIPS and memory required by<br />

each audio processing stage.<br />

This enables engineers to<br />

measure how much a particular<br />

improvement in sound quality will<br />

cost in terms of CPU utilization<br />

to determine the most efficient<br />

use of processing resources<br />

when many functions have to<br />

operate simultaneously.<br />
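The arithmetic behind such a profile is simple: an audio block must be processed before the next one<br />
arrives, so the cycle budget is the CPU clock multiplied by the block duration. The sketch below<br />
assumes the cycle count for a stage has already been measured, for example with a hardware cycle<br />
counter or a tool's profiler. The function names and the numbers are illustrative.<br />

```c
#include <stdint.h>

/* CPU cycles available to process one audio block before the next arrives. */
double cycles_per_block(double cpu_hz, double sample_rate_hz,
                        uint32_t block_samples)
{
    return cpu_hz * (double)block_samples / sample_rate_hz;
}

/* Percentage of the CPU consumed by a stage that used `cycles_used`
   cycles to process one block. */
double stage_load_percent(uint32_t cycles_used, double cpu_hz,
                          double sample_rate_hz, uint32_t block_samples)
{
    return 100.0 * (double)cycles_used
           / cycles_per_block(cpu_hz, sample_rate_hz, block_samples);
}
```

At 168 MHz with 48-sample blocks at 48 kHz, the budget is 168,000 cycles per block, so a stage<br />
that takes 16,800 cycles accounts for 10% of the CPU.<br />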


Consider the use of different<br />

order filters to equalize the<br />

speakers. A lower-order filter,<br />

for example, may provide a<br />

frequency response that is 3 dB<br />

off of the ideal response while a<br />

higher order filter is off by only<br />

1 dB. The relative difference<br />

in CPU utilization between<br />

these two filters can be used<br />


to determine where to allocate<br />

CPU resources to maximize<br />

sound quality.<br />
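The cost side of this tradeoff can be estimated directly. Equalization filters are commonly built as<br />
a cascade of second-order sections (biquads), so each increase in order adds a fixed amount of work<br />
per sample. The sketch below is a plain C model with hypothetical names. It is not the CMSIS DSP or<br />
Audio Weaver implementation, and the figure of five multiply-accumulates per section applies to<br />
this direct form I structure.<br />

```c
#include <stddef.h>

/* One second-order section (biquad), direct form I, normalized so a0 = 1. */
typedef struct {
    float b0, b1, b2, a1, a2;   /* filter coefficients */
    float x1, x2, y1, y2;       /* previous two inputs and outputs */
} biquad_t;

/* Filter one sample through one section. */
float biquad_step(biquad_t *s, float x)
{
    float y = s->b0 * x + s->b1 * s->x1 + s->b2 * s->x2
            - s->a1 * s->y1 - s->a2 * s->y2;   /* 5 multiply-accumulates */
    s->x2 = s->x1; s->x1 = x;                  /* shift the input history */
    s->y2 = s->y1; s->y1 = y;                  /* shift the output history */
    return y;
}

/* Filter one sample through a cascade; the filter order is 2 * n_stages. */
float cascade_step(biquad_t *stages, size_t n_stages, float x)
{
    for (size_t i = 0; i < n_stages; ++i)
        x = biquad_step(&stages[i], x);
    return x;
}

/* Multiply-accumulate operations per second for a cascade of n_stages. */
unsigned long macs_per_second(size_t n_stages, unsigned long sample_rate_hz)
{
    return 5ul * (unsigned long)n_stages * sample_rate_hz;
}
```

At 48 kHz, a 4th-order filter (two sections) needs 480,000 multiply-accumulates per second per<br />
channel and an 8th-order filter needs 960,000. That doubling is what a closer match to the ideal<br />
response costs in this model.<br />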

At the end of the day, however,<br />

audio quality is not about<br />

response graphs but how it<br />

actually sounds to people. With<br />

many development systems,<br />

engineers have to make<br />

adjustments to code, recompile,<br />

and download code before they<br />

can hear a new configuration.<br />

However, to assess the impact of<br />

a lower-order filter on quality, for<br />

example, developers need to be<br />

able to hear both configurations<br />

right after each other.<br />

Audio Weaver solves this<br />

problem by supporting a<br />

tuning interface that can<br />

change filter characteristics<br />

in real-time. With the ability to<br />

configure and switch between<br />

multiple settings with the click<br />

of a button, developers can<br />

compare two sets of speaker<br />

equalizations or different spatial<br />

processing. Note that the<br />

tuning interface is seamless<br />

and transparent, compared to<br />

instrumenting code that can<br />

impact quality because of extra<br />

loading on the CPU.<br />

The ability to tune quickly and<br />

easily without recompiling can<br />

substantially shorten the time<br />

it takes to optimize a system.<br />

Flexible tuning also simplifies<br />

the optimization process for<br />

developers new to audio.<br />

Note that audio applications<br />

do not consist solely of audio<br />

processing. To accelerate system<br />

design, DSP Concepts also<br />

provides a broad range of<br />

software functionality beyond its<br />

extensive audio module library,<br />

including:<br />

〉〉 Real-time kernel<br />

〉〉 Audio I/O management<br />

〉〉 PC/host control interface<br />

〉〉 Boot loader<br />

〉〉 Update manager<br />

〉〉 Flash file system<br />

System-level Design<br />

One of the challenges to<br />

adding audio to embedded<br />

designs is that while many MCU<br />

manufacturers offer reference<br />

designs, audio is typically<br />

not one of the applications<br />

supported.<br />

To address this shortcoming,<br />

ST has invested significantly in<br />

creating digital audio resources<br />

for its customers in order to<br />

offer complete audio reference<br />

designs as well as tools that<br />

enable the design of quality<br />

audio optimized for the <strong>STM32</strong><br />

architecture. For example,<br />

ST and its partners offer a<br />

variety of evaluation boards<br />

with audio capabilities. ST also<br />

offers several docking station<br />

reference designs that provide<br />

a representative design that<br />

can be used in a wide range of<br />

embedded applications.<br />

For Apple Made for iPod (MFI)<br />

licensees, ST offers the Apple<br />

iAP application, a complete<br />

solution based on <strong>STM32</strong> F2 and<br />

<strong>STM32</strong> F4 devices to deliver a<br />

high-quality music experience.<br />

The Apple iAP application<br />

supports both simple accessories<br />

and audio streaming accessories<br />

for iPod, iPhone, and iPad<br />

devices. Components include:<br />

〉〉 Either the <strong>STM32</strong>2xG-EVAL<br />

or <strong>STM32</strong>4xG-EVAL board<br />

to which developers connect<br />

their Apple Authentication<br />

Coprocessor (ACP) circuit<br />

〉〉 Free Apple “iPod Accessory<br />

Protocol” (iAP) firmware with<br />

Lingoes for authentication and<br />

control/information data<br />

〉〉 Free USB Host Library with<br />

USB Host HID class for control<br />

and information data<br />

For audio streaming accessories,<br />

the Apple iAP application also<br />

supports:<br />

〉〉 Free USB Host Library with USB<br />

Host Audio classes<br />

〉〉 Remote iPod/iPhone/iPad<br />

control<br />

〉〉 Digital audio streaming<br />

〉〉 Music tag extraction<br />

〉〉 Flash card reader capabilities,<br />

such as using an SD card or<br />

MMC, that can decode audio<br />

files from this media. Optimized<br />

decoders are provided for this<br />

purpose free of charge<br />

Today’s consumer audio devices<br />

are complex systems that<br />

require both high performance<br />

to support quality playback<br />

and flexibility to meet rapidly<br />

changing market expectations.<br />

With its high-performance<br />

core, efficient multi-layer bus<br />

fabric enabling simultaneous<br />

data transactions, and the right<br />

mix of MCU peripherals and<br />

connectivity, the <strong>STM32</strong> F4 is<br />

an ideal architecture for many<br />

embedded and consumer audio<br />

applications. Developers can<br />

now design systems offering<br />

synchronized digital audio<br />

playback of the highest quality<br />

using a single MCU.<br />


Bringing Floating-Point Performance and<br />

Precision to Embedded Applications<br />

By Olivier Ferrand, Application Engineer, STMicroelectronics<br />

Stephane Rainsard, Application Engineer, STMicroelectronics<br />

Abdelhamid Ghith, Application Engineer, STMicroelectronics<br />

Hardware-based floating-point<br />

capabilities have long been<br />

an option on high-end MPUs<br />

and DSPs designed to serve<br />

as computational workhorses.<br />

Embedded systems based<br />

on MCUs, however, have<br />

classically used a fixed-point<br />

implementation.<br />

There are many reasons for<br />

this. The types of simple<br />

calculations embedded<br />

systems needed to make<br />

could be handled sufficiently<br />

using fixed-point math. The<br />

resulting code was not only<br />

well-suited to an MCU’s native<br />

mathematical capabilities, it<br />

resulted in faster execution and<br />

was more memory efficient than<br />

an equivalent floating-point<br />

implementation would be. The<br />

only disadvantage of using<br />

fixed-point was the tradeoff<br />

between range and precision,<br />

which could be tolerated in<br />

most cases.<br />

Some of today’s embedded<br />

systems are completely different<br />

from their early predecessors.<br />

They operate at high clock<br />

speeds and need to perform<br />

more complex calculations than<br />

ever before. While processing<br />

and memory efficiency are still<br />

primary design considerations,<br />

a more useful range and higher<br />

precision have become essential<br />

in many applications.<br />

For example, to output<br />

high-quality audio, MCUs need<br />

to support a wide dynamic<br />

range so that the system<br />

sounds good at both low and<br />

high frequencies. Medical<br />

applications, such as heart-rate<br />

monitors and glucose meters,<br />

need to measure more subtle<br />

signals and quantities. For<br />

industrial applications, higher<br />

precision in the MCU enables<br />

developers to use lower quality<br />

components in other parts of<br />

the system, reducing overall<br />

system cost. And with the<br />

increasing use of capacitive-sensing<br />

technology, nearly<br />

every embedded application<br />

can benefit from detecting user<br />

inputs with greater accuracy,<br />

especially on displays with<br />

limited surface area.<br />

Floating-Point vs<br />

Fixed-Point<br />

The balanced combination of<br />

range and precision in a floating-point<br />

number comes from its<br />

separation of the exponent and<br />

mantissa. When a number is<br />

stored in a fixed-point format,<br />

the position of the radix point is<br />

assumed and all of the bits are<br />

reserved for the mantissa (i.e., the<br />

actual digits used to represent<br />

the number). The issues that<br />

arise with fixed-point formats<br />

are similar to those associated<br />

with reading an ADC: the<br />

wider the range of values to be<br />

represented, the less resolution<br />

there is at the lower end of the<br />

range. If very small changes in<br />

value need to be captured, the<br />

range must be narrower.<br />

Developers, then, must choose<br />

between range and resolution.<br />

Floating-point addresses these<br />

issues by dedicating a few bits to<br />

an exponent to track the decimal<br />

point. For example, the single-precision<br />

format supported<br />

by the <strong>STM32</strong> F4 uses one bit<br />

for the sign and 8 bits for the<br />

exponent, leaving 23+1 bits for<br />

the mantissa (the normalized<br />

format used adds an implicit<br />

bit to the 23 bits stored in the<br />

floating-point number). As such,<br />

a single-precision floating-point<br />

number gives a more useful<br />

range and precision combination<br />

compared to 8-, 16-, or even<br />

32-bit fixed-point numbers<br />

which either give a wide range<br />

with greatly reduced precision<br />

or higher precision with a much<br />

smaller range.<br />
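The field layout described above (1 sign bit, 8 exponent bits, 23 stored mantissa bits) can be inspected directly from C. The following is a minimal sketch, assuming that `float` is a 32-bit IEEE 754 single-precision value on the target; the type `f32_fields` and the function `f32_unpack` are illustrative names, not part of any ST library.<br />

```c
#include <stdint.h>
#include <string.h>

/* Fields of an IEEE 754 single-precision value (illustrative type). */
typedef struct {
    uint32_t sign;      /* 1 bit: 0 = positive, 1 = negative        */
    uint32_t exponent;  /* 8 bits, stored with a bias of 127        */
    uint32_t fraction;  /* 23 stored mantissa bits (implicit 1 not
                           included for normalized numbers)         */
} f32_fields;

/* Split a float into its three bit fields. */
f32_fields f32_unpack(float value)
{
    uint32_t bits;
    f32_fields f;

    /* memcpy reinterprets the bytes without violating aliasing rules. */
    memcpy(&bits, &value, sizeof bits);

    f.sign     = bits >> 31;
    f.exponent = (bits >> 23) & 0xFFu;
    f.fraction = bits & 0x7FFFFFu;
    return f;
}
```

For example, -6.5 is -1.625 x 2^2, so `f32_unpack(-6.5f)` yields a sign of 1, a stored exponent of 129 (2 plus the bias of 127), and a fraction of 0x500000 (the bits of 0.625).<br />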


The 32-bit single-precision format<br />

is part of the IEEE 754 Floating-<br />

Point Arithmetic standard. This<br />

standard represents decades<br />

of experience and provides a<br />

common approach for supporting<br />

floating-point arithmetic that<br />

unifies processors, coding tools,<br />

and high-level design tools.<br />

Specifically, the 754 standard is<br />

the basis for the floating-point<br />

data types used in C. In turn,<br />

these floating-point C formats are<br />

used in code generated by high-level<br />

modeling/meta-language<br />

tools like MATLAB and Scilab.<br />

One of the key benefits of using<br />

floating-point is that it enables<br />

developers to make more<br />

efficient use of C and powerful<br />

algorithm development tools that<br />

have previously been reserved<br />

for DSP- and MPU-based<br />

design. Generally speaking,<br />

floating-point numbers are easy<br />

to manipulate in C. High-level<br />

tools further accelerate and<br />

simplify development by enabling<br />

developers to describe complex<br />

algorithms in equation form and<br />

then quickly generate efficient<br />

C code rather than requiring<br />

them to hand-code algorithms<br />

in assembly. In addition, these<br />

high-level tools offer tremendous<br />

flexibility in allowing developers<br />

to rapidly implement changes<br />

to algorithms without having<br />

to completely rewrite and re-optimize<br />

algorithm code. The<br />

end result is significant savings<br />

in development time and<br />

cost. Such tools also offer a<br />

powerful means for developers<br />

to test and validate applications.<br />

Software-based<br />

Floating-Point<br />

In the past, developers had<br />

a limited number of ways in<br />

which they could utilize floating-point<br />

technology in MCU-based<br />

designs to increase<br />

computational precision.<br />

Introducing a second processor<br />

to handle calculations was<br />

often too expensive to consider.<br />

Alternatively, while the design<br />

could be migrated to a powerful<br />

MPU or DSP, neither type of<br />

architecture is as well-suited<br />

for the real-time demands of<br />

embedded systems as an MCU.<br />

The availability of high-performance<br />

MCUs made it<br />

possible for developers to bring<br />

in a floating-point library to<br />

perform calculations in software.<br />

While such libraries enabled<br />

developers to use floating-point<br />

algorithms, they came at the<br />

expense of significantly reducing<br />

system throughput.<br />

Such libraries also tended to<br />

be quite large and introduced<br />

significant processing overhead.<br />

For example, every operation<br />

between two numbers had to<br />

first align the numbers to have<br />

the same exponent and then,<br />

after performing the operation,<br />

round the result and code it<br />

back into the floating-point format.<br />

While all of these actions were<br />

performed by the floating-point<br />

library and were not directly<br />

visible to developers, they still<br />

resulted in significantly degraded<br />

performance, greater processing<br />

latency, and an increased<br />

memory footprint.<br />
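The align, operate, and re-encode steps described above can be sketched in a few lines of C. This is a deliberately simplified illustration, not the code of any real floating-point library: the hypothetical function `soft_add_positive` handles only positive, normalized, finite inputs, truncates instead of rounding to nearest, and does not check for overflow. A production library must also handle signs, zeros, denormals, infinities, NaNs, and correct rounding, which is where much of the overhead comes from.<br />

```c
#include <stdint.h>
#include <string.h>

static uint32_t f32_to_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

static float bits_to_f32(uint32_t u)
{
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* Add two positive, normalized floats using integer operations only. */
float soft_add_positive(float x, float y)
{
    uint32_t a = f32_to_bits(x);
    uint32_t b = f32_to_bits(y);

    /* For positive floats the bit patterns sort like the values,
       so this makes 'a' the operand with the larger exponent. */
    if (a < b) {
        uint32_t t = a;
        a = b;
        b = t;
    }

    uint32_t exp_a  = (a >> 23) & 0xFFu;
    uint32_t exp_b  = (b >> 23) & 0xFFu;
    uint32_t mant_a = (a & 0x7FFFFFu) | 0x800000u;  /* restore implicit 1 */
    uint32_t mant_b = (b & 0x7FFFFFu) | 0x800000u;

    /* Step 1: align the smaller operand to the larger exponent. */
    uint32_t shift = exp_a - exp_b;
    mant_b = (shift < 32u) ? (mant_b >> shift) : 0u;

    /* Step 2: perform the operation on the aligned mantissas. */
    uint32_t mant = mant_a + mant_b;

    /* Step 3: renormalize if the sum carried into bit 24. */
    if (mant & 0x1000000u) {
        mant >>= 1;
        exp_a += 1u;
    }

    /* Step 4: re-encode (bits shifted out were truncated, not rounded). */
    return bits_to_f32((exp_a << 23) | (mant & 0x7FFFFFu));
}
```

Even in this reduced form, one addition costs a dozen or more integer operations and several branches, which gives a feel for why software libraries reduce throughput.<br />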

If none of these alternatives was<br />

feasible and developers still<br />

needed the flexibility that high-level<br />

modeling tools bring to the<br />

design process, they had the<br />

alternative to manually convert<br />

the floating-point operations in<br />

the C code generated by the tool<br />

into a fixed-point implementation.<br />

The downside of this approach<br />

is the significant time investment<br />

required to adapt the code<br />

combined with the inflexibility to<br />

easily modify the algorithm later in<br />

the design cycle.<br />
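To give a sense of what a manual conversion involves, the sketch below replaces a single floating-point multiply with a fixed-point equivalent. The Q16.16 format (16 integer bits, 16 fractional bits) and the names `q16_16`, `q16_from_float`, `q16_to_float`, and `q16_mul` are assumptions chosen for illustration; a real design would pick the format per variable based on its expected range.<br />

```c
#include <stdint.h>

/* Q16.16 fixed-point: value = raw / 65536.
   Range is roughly -32768 to +32767.99998, resolution is 1/65536. */
typedef int32_t q16_16;

#define Q16_ONE 65536

/* Convert from float (fractional bits beyond 1/65536 are truncated). */
q16_16 q16_from_float(float x)
{
    return (q16_16)(x * (float)Q16_ONE);
}

/* Convert back to float. */
float q16_to_float(q16_16 x)
{
    return (float)x / (float)Q16_ONE;
}

/* Multiply two Q16.16 values. The 64-bit intermediate keeps the full
   product; shifting right by 16 restores the Q16.16 scaling.
   Note: right-shifting a negative value is implementation-defined in C,
   and results outside the Q16.16 range overflow silently. */
q16_16 q16_mul(q16_16 a, q16_16 b)
{
    return (q16_16)(((int64_t)a * (int64_t)b) >> 16);
}
```

Every multiply, divide, and constant in the generated code needs this kind of treatment, along with a range analysis to confirm that no intermediate value overflows. That is the time investment, and the loss of flexibility, that the paragraph above refers to.<br />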

The <strong>STM32</strong> F4<br />

Advantage: Integrated<br />

Floating-Point Technology<br />

To stay competitive, developers of<br />

precision and high-performance<br />

embedded systems need to<br />

have access to floating-point<br />

functionality without sacrificing<br />

performance or memory efficiency<br />

through the use of a software-based<br />

floating-point library or<br />

having to adapt the code by hand<br />

to a fixed-point implementation.<br />

With the <strong>STM32</strong> F4 MCU<br />

architecture, developers have<br />


the option of bringing floating-point<br />

efficiency to an extensive<br />

range of low-cost embedded<br />

applications. The <strong>STM32</strong> F4<br />

integrates a floating-point unit<br />

(FPU) to execute these operations<br />

natively in hardware. The FPU is<br />

fully compliant with the IEEE 754<br />

standard and has its own 32-<br />

bit single-precision registers to<br />

handle operands and results.<br />

These registers can be viewed<br />

as double-word registers to<br />

enable more efficient load and<br />

store operations. The context<br />

of the FPU can be saved to<br />

the CPU stack using several<br />

methods based on the application<br />

architecture and whether registers<br />

need to be preserved or not.<br />

The FPU supports the five different<br />

classes of numbers defined by<br />

the 754 standard—normalized,<br />

denormalized, zeros, infinities,<br />

and NaNs (Not-a-Number). It also<br />

supports the five exceptions of the<br />

standard—overflow, underflow,<br />

inexact, divide by zero and invalid<br />

operation—allowing applications<br />

to handle operations such as trying<br />

to compute the square root of a<br />

negative number (i.e., resulting in<br />

NaN + invalid operation exception).<br />

Exceptions are “untrapped”,<br />

meaning that the FPU will return<br />

the result as specified by the 754<br />

standard and raise an exception<br />

flag. If needed, developers can also<br />

use the <strong>STM32</strong> F4 floating-point<br />

global interrupt to address the issue.<br />

The integrated FPU of the<br />

<strong>STM32</strong> F4 offers a number<br />

of advantages to embedded<br />

designers:<br />

〉〉 Access to the more useful<br />

range and precision that<br />

floating-point brings<br />

〉〉 Reduced coding complexity by<br />

being able to work with numbers<br />

in a more natural format<br />

〉〉 Greater throughput compared to<br />

software floating-point libraries<br />

〉〉 Accelerated application<br />

development as C code<br />

generated by high-level<br />

tools can be used without<br />

modification or wrappers<br />

〉〉 Smaller code footprint since<br />

instructions that used to be<br />

multiple lines of code in software<br />

libraries are now implemented<br />

with a single instruction<br />

〉〉 Simplified debugging as macro<br />

calls in floating-point libraries<br />

are eliminated<br />

C source:<br />
float function1(float number1, float number2)<br />
{<br />
    float temp1, temp2;<br />
<br />
    temp1 = number1 + number2;<br />
    temp2 = number1/temp1;<br />
    return temp2;<br />
}<br />

FPU assembly code generation (1 assembly instruction per operation):<br />
# float function1(float number1, float number2)<br />
# {<br />
# float temp1, temp2;<br />
#<br />
# temp1 = number1 + number2;<br />
VADD.F32 S1,S0,S1<br />
# temp2 = number1/temp1;<br />
VDIV.F32 S0,S0,S1<br />
#<br />
# return temp2;<br />
BX LR<br />
# }<br />

Assembly code generation without an FPU (calls to the soft-FPU library):<br />
# float function1(float number1, float number2)<br />
# {<br />
PUSH {R4,LR}<br />
MOVS R4,R0<br />
MOVS R0,R1<br />
MOVS R1,R4<br />
BL __aeabi_fadd<br />
MOVS R1,R0<br />
MOVS R0,R4<br />
BL __aeabi_fdiv<br />
POP {R4,PC}<br />

Figure 1 There is a significant reduction in code size when an integrated FPU is available (first assembly listing) compared to when one is not (second assembly listing).<br />

Effectively, the <strong>STM32</strong> F4’s FPU<br />

reverses the value proposition<br />

between fixed- and floating-point<br />

for many MCU-based designs.<br />

Seamless Integration<br />

Figure 1 shows the difference<br />

between the assembly code<br />

generated when an FPU<br />

is available on an MCU as<br />


void GenerateJulia_fpu(uint16_t size_x, uint16_t<br />

size_y, uint16_t offset_x, uint16_t offset_y, uint16_t<br />

zoom, uint8_t * buffer)<br />

{<br />

float tmp1, tmp2;<br />

float num_real, num_img;<br />

float radius;<br />

uint8_t i;<br />

uint16_t x,y;<br />

for (y=0; y



Frame | Zoom | Duration with FPU | Duration without FPU (microlib) | Ratio | Duration without FPU (no microlib) | Ratio<br />
0 | 120 | 222 | 3783 | 17.04 | 2597 | 11.70<br />
1 | 110 | 194 | 3276 | 16.89 | 2243 | 11.56<br />
2 | 100 | 167 | 2794 | 16.73 | 1906 | 11.41<br />
3 | 150 | 298 | 5156 | 17.30 | 3558 | 11.94<br />
4 | 200 | 312 | 5412 | 17.35 | 3740 | 11.99<br />
5 | 275 | 296 | 5124 | 17.31 | 3540 | 11.96<br />
6 | 350 | 284 | 4905 | 17.27 | 3389 | 11.93<br />
7 | 450 | 289 | 4989 | 17.26 | 3448 | 11.93<br />
8 | 600 | 273 | 4705 | 17.23 | 3251 | 11.91<br />
9 | 800 | 267 | 4592 | 17.20 | 3173 | 11.88<br />
10 | 1000 | 261 | 4485 | 17.18 | 3100 | 11.88<br />
11 | 1200 | 255 | 4374 | 17.15 | 3023 | 11.85<br />
12 | 1500 | 242 | 4138 | 17.10 | 2860 | 11.82<br />
13 | 2000 | 210 | 3555 | 16.93 | 2455 | 11.69<br />
14 | 1500 | 242 | 4138 | 17.10 | 2860 | 11.82<br />
15 | 1200 | 255 | 4374 | 17.15 | 3023 | 11.85<br />
16 | 1000 | 261 | 4485 | 17.18 | 3100 | 11.88<br />
17 | 800 | 267 | 4592 | 17.20 | 3173 | 11.88<br />
18 | 600 | 273 | 4705 | 17.23 | 3251 | 11.91<br />
19 | 450 | 289 | 4989 | 17.26 | 3448 | 11.93<br />
20 | 350 | 284 | 4905 | 17.27 | 3389 | 11.93<br />
21 | 275 | 296 | 5123 | 17.31 | 3540 | 11.96<br />
22 | 200 | 312 | 5412 | 17.35 | 3740 | 11.99<br />
23 | 150 | 298 | 5156 | 17.30 | 3558 | 11.94<br />
24 | 100 | 167 | 2794 | 16.73 | 1906 | 11.41<br />
25 | 110 | 194 | 3276 | 16.89 | 2243 | 11.56<br />

Figure 3 This table shows the time spent for the calculation of the Julia set using different<br />

zooming factors. The presence of an FPU yields an increase in performance on<br />

the order of 11.5 to 17 times, with no modification to code required.<br />
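Each Ratio value in the table is the duration without the FPU divided by the duration with the FPU for the same frame. As a small worked check, the hypothetical helper below reproduces the frame 0 entries: 3783 / 222 is about 17.04 and 2597 / 222 is about 11.70.<br />

```c
/* Speedup of the FPU build relative to a software floating-point build.
   Both durations must be in the same time unit. */
double speedup_ratio(unsigned duration_without_fpu, unsigned duration_with_fpu)
{
    return (double)duration_without_fpu / (double)duration_with_fpu;
}
```

The same calculation applied to the other rows gives the range quoted in the caption.<br />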

[Bar chart: FIR filter execution time (using CMSIS DSP library), relative scale from 0 to 100. Bars: 32-bit float, no FPU; 32-bit float, FPU (10X improvement; best compromise of development time vs. performance); 16-bit fixed-point, SIMD optimized (17.9X improvement; best performance, requires effort for proper data management).]<br />

Figure 4 The relative time it takes to execute a FIR filter is 10X better when utilizing the <strong>STM32</strong><br />

F4's FPU. If more headroom is needed, the use of the 16-bit fixed-point SIMD (single<br />

instruction, multiple data) optimized instructions that are part of the DSP capabilities<br />

of the <strong>STM32</strong> F4 increases performance by 17.9X.<br />

implement the wide variety<br />

of algorithms embedded<br />

applications require. Each<br />

accelerates different types<br />

of processing and together<br />

they complement each other<br />

to enable the most optimized<br />

implementation based on<br />

performance, memory, and ease<br />

of programming.<br />

The ease of migrating an<br />

existing application to<br />

seamlessly take advantage<br />

of the <strong>STM32</strong> F4’s integrated<br />

FPU capabilities is important<br />

as well. Consider an <strong>STM32</strong> F2<br />

application executing a floating-point<br />

FIR filter based on the<br />

CMSIS DSP library available<br />

through ARM. Figure 4 shows<br />

the relative time it takes to<br />

perform the FIR filter 100 times<br />

on an <strong>STM32</strong> F2 with no FPU or<br />

DSP capabilities using a purely<br />

software implementation to<br />

make the necessary floating-point<br />

calculations.<br />

Figure 5 A top-level view of the Julia set.<br />

When the same code is compiled<br />

for the <strong>STM32</strong> F4 to leverage<br />

the FPU in hardware, there is<br />

an immediate 10X improvement<br />

in relative performance (i.e.,<br />

factoring out the impact of<br />

clock speed). Just by changing<br />

processors and activating<br />

the FPU, there is a significant<br />

performance advantage. This<br />

one change to the design is<br />

enough to yield a tremendous<br />

amount of headroom for<br />

additional tasks, depending upon<br />

the application.<br />

If more headroom is needed,<br />

the performance of the FIR<br />

filter can be further improved<br />

through the use of the 16-<br />

bit fixed-point SIMD (single<br />

instruction, multiple data)<br />

optimized instructions that are<br />

part of the DSP capabilities of<br />

the <strong>STM32</strong> F4. For example,<br />

when using the SIMD<br />

optimized FIR algorithm of the<br />

CMSIS library from ARM, the<br />

performance improvement is<br />

17.9X, again not considering<br />

clock speed.<br />
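For readers unfamiliar with the workload being measured, the sketch below shows a direct-form FIR filter written in plain C on `float` data. It is not the CMSIS DSP implementation (that library provides its own optimized routines such as `arm_fir_f32`); the function name `fir_f32` and its parameters are illustrative. The point is that the inner loop is a chain of floating-point multiply-accumulate operations, which is exactly what runs as single FPU instructions on the <strong>STM32</strong> F4 and as library calls on a device without an FPU.<br />

```c
#include <stddef.h>

/* Direct-form FIR filter:
   output[n] = sum over k of coeffs[k] * input[n - k].
   Input samples before the start of the buffer are treated as zero. */
void fir_f32(const float *coeffs, size_t num_taps,
             const float *input, float *output, size_t num_samples)
{
    for (size_t n = 0; n < num_samples; n++) {
        float acc = 0.0f;

        /* One multiply-accumulate per tap. */
        for (size_t k = 0; k < num_taps && k <= n; k++) {
            acc += coeffs[k] * input[n - k];
        }
        output[n] = acc;
    }
}
```

With the two-tap averaging coefficients {0.5, 0.5} and the input {1, 3, 5, 7}, the output is {0.5, 2, 4, 6}. The same source compiles for either device; only the compiler's floating-point settings change.<br />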

Unparalleled Flexibility<br />

Traditionally, when designing a<br />

DSP-based system, developers<br />

have to choose between fixed- and<br />

floating-point architectures.<br />

Floating-point comes at a price<br />

premium, and manufacturers<br />

have had to balance this<br />

added cost against the extra<br />

complexity in application<br />

development that comes when<br />

using fixed-point. While the<br />

FPU is an optional component<br />

of the ARM Cortex-M4<br />

architecture, ST has designed<br />

its <strong>STM32</strong> F4 family so that<br />

every device offers an FPU. In<br />

this way, developers always<br />

have access to floating-point<br />

performance and precision<br />

without compromise.<br />

The bottom line is that the<br />

<strong>STM32</strong> F4 provides developers<br />

with a flexible, high-performance<br />

architecture that offers the real-time<br />

responsiveness of an MCU<br />

with the precise floating-point<br />

and digital signal processing<br />

capabilities that today’s<br />

embedded designs need. The<br />

ability of the <strong>STM32</strong> F4’s FPU<br />

to perform fast mathematical<br />

computations on C-based float<br />

data is a key benefit for many<br />

application tasks that require<br />

precision, including loop control,<br />

audio processing, sensor signal<br />

conditioning, digital media<br />

decoding, and digital filtering, to<br />

name just a few.<br />

The availability of an FPU<br />

speeds complex algorithm<br />

development, all the way from<br />

high-level design tools down to<br />

software generation. Hardware<br />

native support for floating-point<br />

operations simplifies coding and<br />

substantially accelerates product<br />

development by enabling the<br />

most efficient implementation for<br />

any mathematical calculation.<br />

Code that has been generated<br />

by such tools to be used directly<br />

by the FPU will offer the highest<br />

level of performance.<br />

The faster processing an FPU<br />

brings to applications offers<br />

either more headroom to<br />

support new functionality or<br />

faster time-to-sleep for power-sensitive<br />

applications. It also<br />

enables developers to introduce<br />

more complex processing and<br />

functionality to applications<br />

than was previously possible<br />

with traditional MCUs. As a<br />

result, when implementing a<br />

mathematical algorithm on an<br />

<strong>STM32</strong> F4, developers never have<br />

to choose between performance<br />

and development time.<br />


Achieving Ultra-Low-Power Efficiency<br />

for Portable Medical Devices<br />

By Jean J. Labrosse, Founder, CEO and President, Micriµm<br />

Jim Lombard, Application Engineer, STMicroelectronics<br />

The availability of low-cost, full-featured<br />

microcontrollers has<br />

created a revolution in the health<br />

industry leading to equipment<br />

migrating from the hospital and<br />

into patients’ homes. Designing<br />

these portable medical devices<br />

presents new challenges for<br />

engineers, such as implementing<br />

precision analog processing<br />

requiring complex calculations<br />

and creating a system that is<br />

simple and comfortable to operate<br />

even for physically-challenged<br />

users. In addition, these devices<br />

have to be able to run as long as<br />

possible without having to change<br />

or recharge batteries, even after<br />

sitting on a warehouse shelf for up<br />

to 18 months. To meet the unique<br />

requirements of portable medical<br />

devices, developers need an<br />

advanced processor architecture<br />

that combines performance,<br />

ultra-low-power process<br />

technology, low cost, and<br />

efficient power management and<br />

communications capabilities.<br />

Designing for Medical<br />

Applications<br />

Next-generation medical<br />

equipment is evolving along<br />

two primary paths: sports and<br />

personal health. Both markets<br />

require innovation that enables<br />

portable devices to collect more<br />

run-time data and complete<br />

advanced calculations to create<br />

a comprehensive profile of a<br />

person’s current condition. Within<br />

personal health, developers<br />

also need to understand the<br />

special requirements of emerging<br />

disposable devices.<br />

The portable medical device<br />

market is extremely strong.<br />

Consider that with 8.3% of the<br />

US population suffering from<br />

diabetes, a portable glucose<br />

meter is an essential tool for<br />

individuals who need to monitor<br />

their glucose for conceivably the<br />

rest of their lives. Similarly, an<br />

electrocardiogram (ECG) monitor<br />

enables people at risk for any<br />

number of conditions to record<br />

their ECG waveform at home. In<br />

addition to eliminating the need<br />

for multiple office visits, home-based<br />

testing gives doctors a<br />

more comprehensive patient<br />

history to work with.<br />

Portable handheld medical<br />

devices have a stringent set<br />

of requirements (see Figure 1).<br />

Given that prospective users can<br />

be from all walks of life, devices<br />

must be as simple to use as<br />

possible, with limited or no setup<br />

required. They also need to be<br />

comfortable to use and operable<br />

even by physically-challenged<br />

users. In general, the display<br />

must be large for ease of reading<br />

and utilize a minimum number of<br />

buttons to avoid confusing users.<br />

Ideally, as many functions as<br />

possible need to be automated<br />

so that users don’t have to be<br />

trained how to use the device.<br />

If the device has a touchscreen,<br />

the GUI must be intuitive<br />

and have a limited number of<br />

operating modes. Both low cost<br />

and low power consumption<br />

are critical as well, and devices<br />

need to be able to run as long<br />

as possible without having to<br />

change or recharge the battery.<br />

〉〉 Easy and simple to operate<br />

〉〉 Large display<br />

〉〉 Minimal number of large<br />

operator buttons<br />

〉〉 Batteries are easy to<br />

change, easy to recharge,<br />

or are sufficient to last<br />

for the operating life of<br />

the device<br />

〉〉 Safe and accurate<br />

operation<br />

〉〉 Low cost<br />

〉〉 Low power<br />

〉〉 Simple connectivity<br />

〉〉 Audio feedback<br />

Figure 1 Common Product Requirements for<br />

Portable Medical Devices<br />


An important trend in medical<br />

applications is the use of<br />

disposable devices. High-volume<br />

devices such as heart-rate<br />

monitors can provide<br />

medical professionals with<br />

important data that is useful in<br />

identifying issues before they<br />

become full-blown problems;<br />

e.g., by having a patient track<br />

his or her heart rate for a few<br />

weeks after an operation, a<br />

doctor can verify the patient’s<br />

successful recovery.<br />

The use of disposable devices<br />

offers many benefits. Rather than<br />

require patients to purchase a<br />

monitoring device designed to<br />

operate for years, the hospital<br />

can provide a “disposable”<br />

version that can perform the<br />

task reliably for weeks at a<br />

significantly lower cost. This<br />

strategy also enables hospitals<br />

to leverage innovations in<br />

technology faster.<br />

Designing a medical device<br />

for limited use, however,<br />

substantially shifts the design<br />

mindset. For example, device<br />

cost becomes significantly more<br />

important when the revenue<br />

stream from consumables is<br />

weeks rather than years. Devices<br />

also have to be ready to operate<br />

out-of-the-box, so parameters<br />

such as which test strips are<br />

going to be used need to be<br />

preprogrammed into devices by<br />

manufacturers. Power efficiency<br />

becomes more critical as well.<br />

Even though they will only be<br />

used for a handful of weeks,<br />

devices may first sit on the shelf<br />

for up to 18 months. During this<br />

time, the device is in a low power<br />

mode with the real-time clock<br />

running. With an ultra-low-power<br />

MCU, it is possible to achieve<br />

this without requiring the user to<br />

change batteries.<br />
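A rough budget shows why standby current dominates this requirement. The sketch below estimates the charge drawn during storage; the function name `charge_used_mah` and the example figures (a 1 µA standby current with the real-time clock running, compared against a coin cell of a little over 200 mAh) are hypothetical values chosen for illustration, not specifications from ST.<br />

```c
/* Charge consumed at a constant average current over a storage period.
   current_ua : average current in microamperes
   months     : storage time in months (730 hours per month,
                i.e. 8760 hours per year divided by 12)
   returns    : charge in milliampere-hours */
double charge_used_mah(double current_ua, double months)
{
    const double hours_per_month = 730.0;

    /* uA * h gives uAh; divide by 1000 to get mAh. */
    return current_ua * hours_per_month * months / 1000.0;
}
```

At 1 µA, 18 months on the shelf costs about 13 mAh, a small fraction of such a cell. At 10 µA the same period costs about 131 mAh, which would consume most of the battery before the device is ever used.<br />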

The <strong>STM32</strong> Architecture<br />

With the <strong>STM32</strong> architecture,<br />

ST offers developers a variety<br />

of options for balancing cost,<br />

power, and performance.<br />

With its smaller die, reduced<br />

instruction set, and smaller<br />

memory footprint, the <strong>STM32</strong><br />

F0 provides excellent power<br />

efficiency at a very low cost.<br />

This is an ideal MCU for<br />

applications that don’t need<br />

to operate longer than six<br />

months or that use rechargeable<br />

batteries. For applications where<br />

power is paramount, such as<br />

devices operating on a coin cell,<br />

the <strong>STM32</strong> L1 is optimized for<br />

ultra-low power performance.<br />

For devices needing more<br />

processing capabilities and<br />


connectivity, the <strong>STM32</strong> F1,<br />

<strong>STM32</strong> F2, and <strong>STM32</strong> F4<br />

offer an increasing range of<br />

capabilities.<br />

The <strong>STM32</strong> F0 is based on the<br />

ARM Cortex-M0 core. While<br />

other manufacturers also offer<br />

MCUs based on this core,<br />

ST has integrated its “<strong>STM32</strong><br />

DNA” into the <strong>STM32</strong> F0 to<br />

create an MCU that provides<br />

efficient data processing with<br />

the key peripherals MCU-based<br />

applications require.<br />

The <strong>STM32</strong> F0 also offers DMA<br />

capabilities to accelerate data<br />

processing and enable the<br />

lowest power operation even<br />

when sampling the ADC at<br />

a high data rate. With MCUs<br />

that have a lower level of<br />

integration, developers have to<br />

make compromises, such as<br />

settling for an 8-bit ADC rather<br />

than having access to the 12-<br />

bit ADC of the <strong>STM32</strong> F0.<br />

For complex medical systems<br />

that involve extensive<br />

computations, power<br />

consumption can be reduced<br />

by introducing the <strong>STM32</strong> F0<br />

as a second processor. When<br />

a system is in a low power<br />

mode, for example, it must still<br />

manage the UI and incoming<br />

data over its interfaces. Using<br />

the primary host processor to<br />

perform these operations will<br />

use substantially more power<br />

than having a more power-efficient<br />

<strong>STM32</strong> F0 dedicated<br />

to these tasks.<br />

[Block diagram: <strong>STM32</strong> L15x]<br />
System: Power supply; Internal regulator; POR/PDR/PVD/BOR; Xtal oscillators 32 kHz + 1 ~24 MHz; Internal RC oscillators 37 kHz + 16 MHz; Internal multispeed ULP RC oscillator 64 kHz to 4 MHz; PLL; Clock control; RTC/AWU; 2x watchdogs (independent and window); 37/51/83/109/115 I/Os; Cyclic redundancy check (CRC); Voltage scaling 3 modes<br />
Control: 6 to 8x 16-bit timer; 1x 32-bit timer<br />
ARM Cortex-M3 CPU, 32 MHz: Nested vector interrupt controller (NVIC); JTAG/SW debug; Embedded Trace Macrocell (ETM); Memory protection unit (MPU); AHB bus matrix; Up to 12-channel DMA<br />
Touch sensing: Analog I/O groups; Up to 39 touchkeys<br />
Display: LCD driver (8x40 / 4x44)<br />
Encryption: AES (128 bits)*<br />
32- to 384-Kbyte Flash memory, dual bank, RWW; 10- to 48-Kbyte SRAM; Up to 128-byte backup data; 4- to 12-byte EEPROM; Boot ROM<br />
Connectivity: FSMC; USB 2.0 FS; 3 to 5x USART; 2 to 3x SPI; 2x I2C; SDIO<br />
Analog: 2x 12-bit DAC; 12-bit ADC, up to 40 channels; 2x comparators; 3x op-amps; Temperature sensor<br />
Note: * <strong>STM32</strong>L16x only<br />
Abbreviations: AWU: Auto wakeup from halt; BOR: Brown out reset; I2C: Inter integrated circuit; PDR: Power down reset; POR: Power on reset; PVD: Programmable voltage detector; RTC: Real time clock; SPI: Serial peripheral interface; USART: Universal sync/async receiver transmitter<br />

Figure 2 The <strong>STM32</strong> L1 family is built on ST’s 130 nm ultra-low leakage process technology that minimizes node capacitance<br />

for ultra-low-power operation and efficiency.<br />

For the highest power<br />

efficiency, ST offers its <strong>STM32</strong><br />

L1 family (see Figure 2). Based<br />

on the ARM Cortex-M3, the<br />

<strong>STM32</strong> L1 is built on ST’s 130<br />

nm ultra-low leakage process<br />

technology that minimizes node<br />

capacitance. This, combined<br />

with how the ARM Cortex<br />

RISC architecture reduces the<br />

overall number of active nodes,<br />

creates a powerful combination<br />

for ultra-low-power operation.<br />

Providing 32-bit performance at<br />

up to 32 MHz, <strong>STM32</strong> L1 MCUs<br />

offer up to 384 Kbytes of Flash and 48<br />

Kbytes of SRAM, as well as from 48 to<br />

144 pins to support high I/O<br />

applications.<br />

With the <strong>STM32</strong> F1, <strong>STM32</strong><br />

F2, and <strong>STM32</strong> F4 families,<br />

developers have access to a<br />

tremendous range of processing<br />

capacity and connectivity.<br />

The <strong>STM32</strong> F1 is based on<br />

a Cortex-M3 core running at<br />

up to 72 MHz. For higher-end<br />

applications, the <strong>STM32</strong> F4<br />

offers a 168 MHz Cortex-M4<br />

core with both integrated<br />

floating-point (FPU) and<br />

digital signal processing (DSP)<br />

capabilities. In addition, these<br />

MCUs have multiple DMAs<br />

and a multi-layer bus matrix to<br />

offload the CPU and maximize<br />

data throughput.<br />

Power Efficiency<br />

There are many ways the <strong>STM32</strong><br />

architecture optimizes power<br />

consumption. For example,<br />

the <strong>STM32</strong> L1 has an internal<br />

LDO regulator that can be<br />

programmed to three discrete<br />

voltage levels. Since energy<br />

consumed is proportional to the<br />

square of the core voltage, even<br />

a very small change in voltage<br />

can produce dramatic results.<br />

This allows developers to keep<br />

the <strong>STM32</strong> L1 running at 1.2 V<br />

and only switch to a higher<br />

performance range when a<br />

specific task needs to run faster.<br />
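As a rough worked example of the square-law relationship above, assume the three regulator levels are the 1.8 V, 1.5 V, and 1.2 V settings listed for the regulator in Figure 4, and that the switched capacitance and the work performed stay the same. Under those assumptions, the dynamic energy per operation scales as:<br />

```latex
E_{\text{dyn}} \propto C\,V_{\text{core}}^{2}
\qquad\Rightarrow\qquad
\frac{E_{1.2\,\text{V}}}{E_{1.8\,\text{V}}} = \left(\frac{1.2}{1.8}\right)^{2} \approx 0.44,
\qquad
\frac{E_{1.5\,\text{V}}}{E_{1.8\,\text{V}}} = \left(\frac{1.5}{1.8}\right)^{2} \approx 0.69
```

In other words, under this idealized model, dropping from 1.8 V to 1.2 V cuts dynamic energy per operation by roughly 56%. Leakage and the lower maximum clock available at the lower voltage are not captured here.<br />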

Integrated DSP capabilities<br />

in the <strong>STM32</strong> L1 also speed<br />

complex algorithm processing.<br />

This is in addition to the<br />

power/performance advantage<br />

32-bit architectures have<br />

compared to 8- or 16-bit<br />

MCUs. With its wider bus, the<br />

<strong>STM32</strong> can perform tasks such<br />

as moving memory or complex<br />

mathematics much faster<br />

and more efficiently as well.<br />


The <strong>STM32</strong> L1 also provides<br />

multiple, industry-leading low-power<br />

modes (see Figure 3),<br />

dynamic voltage scaling, and<br />

the ability to run out of RAM<br />

and disable the Flash controller<br />

to conserve power. It also<br />

integrates a programmable<br />

voltage detector that can<br />

monitor battery voltage using<br />

less current than an ADC.<br />

<strong>STM32</strong>L15x Ultra-Low Power Consumption<br />

Run from Flash (Range 3): 230 µA/MHz<br />

Run from RAM (Range 3): 186 µA/MHz<br />

Low Power Run @ 32 kHz: 10.4 µA<br />

Low Power Sleep @ 32 kHz: 6.1 µA<br />

Stop with or without RTC: 1.30 µA / 0.43 µA<br />

Standby with or without RTC: 1.0 µA / 0.27 µA<br />

Legend: CPU on; peripherals activated; RAM & context preserved; backup registers preserved<br />

Figure 3 The <strong>STM32</strong> L1 provides multiple, industry-leading low-power modes to maximize device operating power efficiency.<br />

The <strong>STM32</strong> L1’s high level of<br />

integration enables it to provide<br />

a single-chip solution, with<br />

the exception of a few analog<br />

signal conditioning circuits,<br />

for many portable medical<br />

applications, including glucose<br />

meters, heart-rate monitors,<br />

and pulse oximeters. For<br />

example, in a glucose meter<br />

(see Figure 4), the <strong>STM32</strong><br />

L1 can automatically wake<br />

from sleep when a test strip<br />

is inserted into the device. Its<br />

2-channel DAC can be used to<br />

generate a strip bias, enable<br />

strip calibration, and output<br />

audible instructions and test<br />

results. Timers accurately<br />

control the ADC trigger for<br />

sample measurement, onboard<br />

comparators measure correct<br />

sample staging, and the<br />

temperature sensor logs the<br />

ambient temperature for use in<br />

calculating results. Developers<br />

can also use the integrated<br />

comparators to create a power-efficient<br />

analog watchdog that<br />

monitors an input and wakes<br />

the <strong>STM32</strong> L1 when either the<br />

upper or lower threshold is<br />

exceeded (see Figure 5).<br />

A key manner in which the <strong>STM32</strong> architecture conserves power is through the ability of the CPU to sleep during ADC sample capture. To achieve this, the entire analog data capture chain needs to be completely automated, with no need for CPU intervention at any point after the chain is initiated. Specifically, after the CPU configures and starts the auto sample capture, it enters sleep mode. Between samples, the ADC enters an automatic shut down mode until a timer triggers it. The ADC captures the current sample, using the DMA to store the data in SRAM, and then shuts down again. This process repeats until the entire capture sequence is complete. The DMA will then wake the CPU to process the samples that have been stored in SRAM.<br />

Figure 4 block diagram, <strong>STM32</strong> L1 blocks: Cortex-M3 CPU, 32 MHz, with MPU; 64- to 128-Kbyte Flash memory; up to 16-Kbyte SRAM; 4-Kbyte data EEPROM; DMA controllers; timers; PWM; 1 x 12-bit ADC, 26 channels/1 Msps; 2 x comparators; 2 x 12-bit DAC; 8x40 segment LCD; 2 x I²C; GPIOs; USB 2.0 FS; RTC/AWU + 80-byte backup registers; power supply regulator 1.8 V/1.5 V/1.2 V; POR/PDR/PVD/BOR<br />

Figure 4 block diagram, external signals and blocks: glucose test strip; Vout; Cal; Reference; Temp Sensor; Strip Detect; 2 AA battery; Internet; Host PC; LCD Panel 8 x 40; USER Buttons; Voice<br />

Figure 4 The <strong>STM32</strong> L1’s high level of integration enables it to provide a single-chip solution, with the exception of a few analog signal conditioning circuits, for many portable medical applications such as blood glucose meters.<br />

Figure 5 diagram labels: input voltage; COMP1; COMP2; upper threshold; lower threshold; VREFINT = 1.22 V; multiple source; window comparator configuration switch<br />

Figure 5 Developers can use the integrated comparators to create a power-efficient analog watchdog that monitors an input and wakes the <strong>STM32</strong> L1 when either the upper or lower threshold is exceeded.<br />

ADC current consumption (in µA), with the ADC in continuous mode, a delay of 15 cycles between channel conversions, and conversions lasting 1 µs (16 cycles at 16 MHz)<br />

CPU running at | ADC is running in Normal Mode ** | ADC is On in Power Saving Mode ***<br />

16 MHz (from HSI) | 1453 µA | 630 µA<br />

4 MHz (from MSI) * | 1453 µA | 445 µA<br />

1 MHz (from MSI) * | 1000 µA | 258 µA<br />

32 kHz (from MSI) * | 900 µA | 150 µA<br />

* : HSI is On ** : PDI=PDD=0 *** : PDI=PDD=1<br />

Figure 6 The automatic shut down mode of the <strong>STM32</strong> L1 turns off the ADC for the majority of time between samples to provide dramatic savings in power consumption compared to when the ADC must be left on continuously.<br />

The power savings can be<br />

significant using the unique<br />

ability of the <strong>STM32</strong> L1 to<br />

power down the ADC between<br />

samples. Consider an ECG<br />

monitor where the sampling rate<br />

is 1 kHz or one sample every 1<br />

ms. The ADC of the <strong>STM32</strong> L1<br />

consumes at most 900 µA and<br />

has a capture time of only 1 µs<br />

for a power duty cycle of 0.1%.<br />

If the <strong>STM32</strong> L1 were a typical<br />

MCU, the ADC would consume<br />

full power even when it is not<br />

capturing samples. Thus, if<br />

the system were using a 1 ms<br />

measurement window, the ADC<br />

would draw 900 µA during a<br />

power duty cycle of 100%.<br />
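The duty-cycle arithmetic in the ECG example can be checked with a few lines of C. This is a minimal sketch: the function names are illustrative, and it assumes an idealized ADC that draws no current at all while powered down, which real hardware will not quite achieve.<br />

```c
/* Fraction of each sampling period during which the ADC is powered. */
double adc_duty_cycle(double capture_us, double period_us)
{
    return capture_us / period_us;
}

/* Average ADC current in microamps, assuming zero draw while powered down
 * (an idealization; real parts have a small residual current). */
double adc_average_ua(double active_ua, double duty)
{
    return active_ua * duty;
}
```

With the article's figures (a 1 µs capture every 1000 µs at 900 µA), `adc_duty_cycle(1.0, 1000.0)` gives 0.001, that is 0.1%, and `adc_average_ua(900.0, 0.001)` gives an ideal average of about 0.9 µA, against 900 µA for an ADC that is left on.<br />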

With the automatic power down<br />

control mode of the <strong>STM32</strong><br />

L1, however, the ADC is able<br />

to power down for the majority<br />

of time between samples.<br />

Figure 6 shows actual power<br />

consumption numbers for<br />

various operating conditions. The<br />

difference in power consumption<br />

is dramatic between when the<br />

ADC is left on continuously<br />

versus using the automatic power<br />

down control mode,<br />

dropping in the lowest power<br />

consumption example from 900<br />

µA to 150 µA at 32 kHz.<br />

Note that the <strong>STM32</strong> L1 has<br />

the flexibility to scale the CPU<br />

clock while keeping a constant<br />

16 MHz clock available for ADC<br />

conversion. This means that the<br />

CPU can run at a frequency of 32<br />

kHz while the ADC maintains a 1<br />

MSPS sampling rate while only<br />

drawing 150 µA.<br />

Precision Analog<br />

Many medical applications require<br />

the ability to read very small<br />

signals. For example, an ECG<br />

monitor has to capture the very<br />

small electrical signal generated by<br />

the heart muscle while removing<br />

the 50 or 60 Hz noise signals<br />

that are common in electrodes<br />

attached to the human body.<br />

MCUs typically offer an 8- or 10-bit<br />

ADC. With their 12-bit ADCs,<br />

<strong>STM32</strong> MCUs enable developers<br />

to achieve greater precision when<br />

measuring the weak signals<br />

typical of the human body. In<br />

addition, with sample rates up to<br />

1 Msample/second, developers<br />

have the headroom to access<br />

even more accuracy through<br />

oversampling. This enables<br />

devices to operate in noisy<br />

environments and applications<br />

that could not be served by an<br />

MCU with only an 8-bit ADC.<br />
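One common way to trade sample rate for resolution is oversampling and decimation: summing 4^n raw readings and shifting the sum right by n bits yields a result with n additional bits. The sketch below illustrates that general technique; it is not taken from ST's libraries, and the function name and the limit on `extra_bits` are assumptions. The gain is only realized when the input carries roughly 1 LSB or more of noise to act as dither.<br />

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Oversample and decimate: sum 4^extra_bits raw 12-bit readings, then shift
 * right by extra_bits. The result is (12 + extra_bits) bits wide.
 * The caller must supply at least 4^extra_bits samples; keeping extra_bits
 * at 4 or below keeps the sum well inside 32 bits.
 */
uint32_t oversample(const uint16_t *raw, unsigned extra_bits)
{
    size_t n = (size_t)1 << (2u * extra_bits);   /* 4^extra_bits samples */
    uint32_t sum = 0;

    for (size_t i = 0; i < n; i++) {
        sum += raw[i];
    }
    return sum >> extra_bits;
}
```

For example, asking for 2 extra bits consumes 16 raw samples per result, so a 1 Msample/second ADC would deliver 14-bit results at 62.5 ksamples/second.<br />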

Power-Efficient Communications<br />

The <strong>STM32</strong> architecture offers<br />

a wide range of interfaces,<br />


including Ethernet (available<br />

on the <strong>STM32</strong> F1, F2, and F4),<br />

USB (available on the <strong>STM32</strong><br />

L1, F1, F2, and F4), SPI, SDIO,<br />

and CAN. For applications that<br />

only transmit data a few times<br />

a day, interface current has<br />

minimal impact on operating<br />

life. However, for an application<br />

like a heart-rate monitor which<br />

may be constantly transmitting<br />

data, the efficiency of the<br />

<strong>STM32</strong> L1’s peripherals can<br />

provide substantial benefit.<br />

With integrated USB and/or<br />

Ethernet, developers can support<br />

connectivity at the lowest cost.<br />

Some devices may need<br />

to support both USB and<br />

Ethernet. By abstracting the<br />

communications interface, the<br />

application won’t need to know<br />

which type of link it is using.<br />

Developers can achieve this by<br />

creating a data in/out function<br />

that determines the actual data<br />

interface to be used in any<br />

particular operating circumstance<br />

and applies the appropriate<br />

communications APIs.<br />
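A data in/out function of this kind can be sketched with a table of function pointers. Everything named here (`link_t`, `data_out`, and the stub USB and Ethernet back ends) is hypothetical and stands in for whatever driver or stack calls a real design would use.<br />

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical transport descriptor: one entry per physical link. */
typedef struct {
    const char *name;
    int (*is_up)(void);                           /* nonzero when usable  */
    int (*send)(const uint8_t *buf, size_t len);  /* bytes sent, or -1    */
} link_t;

/* Stub back ends standing in for real USB and Ethernet driver calls. */
int usb_connected = 0;
int eth_connected = 0;
size_t usb_bytes = 0;
size_t eth_bytes = 0;

int usb_is_up(void) { return usb_connected; }
int eth_is_up(void) { return eth_connected; }

int usb_send(const uint8_t *buf, size_t len)
{
    (void)buf;
    usb_bytes += len;
    return (int)len;
}

int eth_send(const uint8_t *buf, size_t len)
{
    (void)buf;
    eth_bytes += len;
    return (int)len;
}

/* Links in order of preference; USB is tried first in this sketch. */
const link_t links[] = {
    { "usb",      usb_is_up, usb_send },
    { "ethernet", eth_is_up, eth_send },
};

/* Application-facing call: the caller never learns which link was used. */
int data_out(const uint8_t *buf, size_t len)
{
    for (size_t i = 0; i < sizeof links / sizeof links[0]; i++) {
        if (links[i].is_up()) {
            return links[i].send(buf, len);
        }
    }
    return -1;   /* no link available */
}
```

The application calls only `data_out()`; adding or reordering transports means editing the table rather than the application code.<br />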

Developers can also take steps<br />

at the application level to ensure<br />

power efficient transmissions.<br />

For example, collecting data for<br />

a short time and then bursting it<br />

will conserve power compared to<br />

continuous transmission.<br />
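The collect-then-burst approach can be sketched as a small buffer that only powers up the link when it is full. The names and the capacity of 8 samples are illustrative assumptions, and `link_send_burst()` is a placeholder for the real transmit path.<br />

```c
#include <stddef.h>
#include <stdint.h>

#define BURST_CAPACITY 8u   /* samples per burst; illustrative value */

typedef struct {
    uint16_t samples[BURST_CAPACITY];
    size_t   count;         /* samples waiting to be sent             */
    unsigned bursts_sent;   /* number of times the link was powered up */
} burst_buffer_t;

/* Placeholder for the real path: power up the link, send, power down. */
void link_send_burst(burst_buffer_t *bb)
{
    bb->bursts_sent++;
    bb->count = 0;
}

/* Queue one sample; returns 1 if this call triggered a burst. */
int burst_push(burst_buffer_t *bb, uint16_t sample)
{
    bb->samples[bb->count++] = sample;
    if (bb->count == BURST_CAPACITY) {
        link_send_burst(bb);
        return 1;
    }
    return 0;
}
```

With this capacity, 20 samples cause two link activations instead of 20, with four samples still waiting in the buffer.<br />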

For devices that will connect<br />

directly to the Internet and<br />

transmit personal health data,<br />

security is essential. This<br />

will impact power efficiency<br />

since security handshaking<br />

will increase the time the<br />

communications link must be<br />

active. The <strong>STM32</strong> F2 and<br />

<strong>STM32</strong> F4 minimize transmit<br />

processing latency with an<br />

integrated security engine that<br />

accelerates AES-256, 3DES,<br />

SHA-1, MD5, and HMAC<br />

processing.<br />

Using an off-the-shelf stack<br />

substantially speeds time-to-market.<br />

For example, the 40<br />

standard APIs in the Micriµm<br />

µC/TCP-IP stack encapsulate<br />

more than 150,000 lines of code.<br />

This stack has been designed<br />

to minimize memory usage<br />

and so further reduce power<br />

consumption. Extensions to the<br />

stack are also available, such as<br />

SSL support.<br />

Micriµm is also introducing<br />

IPv6 over IPv4 support to its<br />

µC/TCP-IP stack. IPv6 is an<br />

important enabling technology<br />

for medical applications. With the<br />

virtually unlimited number of IPv6<br />

addresses available, individual<br />

devices can be assigned a<br />

unique IP address. This enables<br />

devices and patients to be<br />

identified automatically and<br />

without interaction from users<br />

who may not be tech-savvy<br />

enough to register a device<br />

online. It also future-proofs<br />

devices so they remain usable<br />

in the years to come as IPv4 is<br />

phased out.<br />

To further simplify device<br />

connectivity, ST offers the<br />

ST Healthcare Library that is<br />

certified for the USB Personal<br />

Health Device specification.<br />

This spec supports different<br />

subclasses of common medical<br />

devices and enables doctors<br />

and patients to connect<br />

seamlessly over the Internet.<br />

The library brings remote<br />

monitoring capabilities to<br />

every <strong>STM32</strong>-based design<br />

and accelerates time-to-market.<br />

In addition, the<br />

library components have<br />

been tested and certified,<br />

enabling developers to bypass<br />

several certification tests with<br />

confidence that the stack<br />

will perform as required. The<br />

stack footprint is very small<br />

at just 9.2 Kbytes of Flash and 2.9 Kbytes<br />

of RAM for a typical thermometer<br />

application. A thermometer<br />

reference demo is available on<br />

the <strong>STM32</strong> L1 evaluation board.<br />

Real-time Integration<br />

To enable a portable device to<br />

support a wireless interface<br />

with only three AAA batteries, it is<br />

critical to be able to intelligently<br />

manage power. Using an<br />

OS like Linux, for example,<br />

significantly increases startup<br />

time from low power modes.<br />

In contrast, tight integration<br />

between the stack and a real-time<br />

kernel ensures that the<br />

radio can be turned on and off<br />

as quickly as possible.<br />

Running a real-time kernel also<br />

helps simplify power mode<br />

management. Without a kernel,<br />

the typical embedded application<br />

manages tasks using a main<br />

loop. Effectively, the loop polls<br />

a series of tasks. Even if the<br />

device is not doing anything,<br />

the CPU still needs to poll tasks<br />

continuously to see if any of<br />

them need attention.<br />

Because developers never<br />

know which tasks will be<br />

executed during any loop<br />

iteration, it becomes difficult to<br />

determine the optimal power<br />

saving mode in which to place<br />

the system after any particular<br />

task has completed since<br />

each mode offers different<br />

responsiveness. In addition,<br />

as the main loop increases in<br />


size, managing power becomes<br />

more convoluted.<br />

With a real-time kernel, the<br />

CPU returns to the IDLE task<br />

whenever none of the tasks<br />

need attention. Because no<br />

tasks are currently running, it is<br />

much more straightforward to<br />

manage power. When the IDLE<br />

task is interrupted, the system<br />

can quickly be powered up. This<br />

gives developers the ability to<br />

take the system context into<br />

account when selecting a power<br />

mode and clock speed that<br />

provides the appropriate level<br />

of responsiveness (i.e., how<br />

fast the system can wake to an<br />

active state).<br />
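As a minimal sketch of this idea, the routine below could be called from a kernel's idle hook to choose the deepest low-power mode that still meets the application's wake-up latency budget. The function name, the mode enum, and the wake-up times are all assumptions for illustration; the times are placeholders, not datasheet values, and the sketch assumes that Standby does not retain RAM contents.<br />

```c
#include <stdint.h>

typedef enum { MODE_SLEEP, MODE_STOP, MODE_STANDBY } power_mode_t;

/* Placeholder wake-up times in microseconds; substitute datasheet values. */
#define WAKE_US_STOP      10u
#define WAKE_US_STANDBY  100u

/*
 * Pick the deepest mode whose wake-up time fits the latency the application
 * can tolerate. Standby is only allowed when RAM contents may be discarded.
 */
power_mode_t select_idle_mode(uint32_t latency_budget_us, int keep_ram)
{
    if (!keep_ram && latency_budget_us >= WAKE_US_STANDBY) {
        return MODE_STANDBY;
    }
    if (latency_budget_us >= WAKE_US_STOP) {
        return MODE_STOP;
    }
    return MODE_SLEEP;
}
```

Keeping the decision in one function like this means the thresholds can be retuned for a different device without touching application tasks.<br />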

Using the IDLE function to<br />

manage power also simplifies<br />

any migration of applications<br />

between <strong>STM32</strong> processors.<br />

For example, tuning an<br />

application based on the <strong>STM32</strong><br />

L1 for power may require a<br />

different approach compared<br />

to an <strong>STM32</strong> F0 or <strong>STM32</strong><br />

F4. By consolidating power<br />

management in a single routine,<br />

developers can<br />

optimize efficiency<br />

without having to completely<br />

rewrite the application.<br />

Designing your own real-time<br />

kernel is an option. However,<br />

the use of a commercial<br />

real-time kernel like µC/OS-III<br />

from Micriµm provides a<br />

wide range of multitasking<br />

capabilities that make it easier<br />

to add new functionality to<br />

a system without impacting<br />

system reliability. For example,<br />

when a new feature like a<br />

GUI is introduced, it can be<br />

given a low priority. Thus, a<br />

critical system task such as<br />

driving the pump on a blood<br />

pressure monitor is always<br />

executed when it needs to be<br />

and not disrupted because<br />

a USB packet arrived at a<br />

critical moment. In this way,<br />

developers can extend the<br />

UI of a device, as well as its<br />

connectivity options, without<br />

negatively impacting reliability.<br />
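The effect of task priorities can be illustrated with a toy scheduler decision. This is not µC/OS-III code; it is a sketch that assumes a bitmask of ready tasks and the common RTOS convention that a lower number means a higher priority. The priority values are hypothetical.<br />

```c
#include <stdint.h>

/* Hypothetical priority assignments; a lower number is more urgent. */
enum { PRIO_PUMP = 1, PRIO_USB = 5, PRIO_GUI = 10 };

/*
 * Return the priority of the most urgent ready task, or -1 when nothing is
 * ready (the point at which a kernel would run its idle task).
 * Bit n of ready_mask is set when the task at priority n is ready to run.
 */
int next_task(uint32_t ready_mask)
{
    for (int prio = 0; prio < 32; prio++) {
        if (ready_mask & (UINT32_C(1) << prio)) {
            return prio;
        }
    }
    return -1;
}
```

If the GUI and USB tasks are both ready, USB runs first; as soon as the pump task becomes ready, it is selected ahead of both, regardless of when the USB packet arrived.<br />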

A real-time kernel also<br />

simplifies design and power<br />

management as more complex<br />

functions are added to a<br />

system. For example, while<br />

executing a time-intensive<br />

function without a kernel,<br />

developers need to manually<br />

check within the function if<br />

shorter, higher priority tasks<br />

need attention. With µC/OS-III,<br />

multitasking and priority<br />

management are handled<br />

automatically, and developers can<br />

break their system into separate<br />

and distinct pieces without<br />

having to manually manage<br />

how these pieces interact. This<br />

also enables developers to<br />

more accurately measure power<br />

consumption during early<br />

design stages as well as verify<br />

where in their applications<br />

power spikes may be occurring.<br />

Certification is a concern for<br />

many manufacturers of medical<br />

devices as well. Given that it is<br />

the complete system that must<br />

be certified, the Micriµm µC/OS-III<br />

kernel and µC/TCP-IP stack<br />

are provided with source code<br />

and are fully documented so<br />

as to facilitate the certification<br />

process. Using the Micriµm<br />

kernel and stack can simplify<br />

certification compared to an<br />

in-house kernel implementation<br />

that does not have the maturity<br />

and years of industry testing<br />

commercial code offers.<br />

Rather than base a design on a<br />

specialized MCU that can serve<br />

only one price point, the <strong>STM32</strong><br />

F0 and <strong>STM32</strong> L1, along with<br />

the rest of the <strong>STM32</strong> families,<br />

provide a flexible architecture<br />

that can enable developers to<br />

leverage their code and other<br />

IP across an entire product<br />

line. In addition, code- and pin-compatibility<br />

between devices<br />

allows for a nearly seamless<br />

migration between processors.<br />

Designing with the <strong>STM32</strong><br />

architecture is also faster<br />

compared to 8- or 16-bit MCUs.<br />

Specifically, 8- and 16-bit MCUs<br />

tend to have a limited tool chain,<br />

especially when the MCU vendor<br />

is the only source for tools.<br />

In contrast, the <strong>STM32</strong><br />

architecture is based on the ARM<br />

Cortex-M architecture. As such,<br />

it has the largest ecosystem<br />

available for any MCU. These<br />

tools provide more run-time<br />

information and advanced<br />

debugging capabilities to ensure<br />

system robustness and power<br />

efficiency, as well as verify device<br />

operation to speed certification.<br />

Creating portable medical<br />

devices requires an MCU<br />

architecture that provides<br />

both performance and power<br />

efficiency. With their tightly<br />

integrated architecture, <strong>STM32</strong><br />

MCUs offer a single-chip<br />

solution that balances cost<br />

and ultra-low-power operation.<br />

Combined with the powerful<br />

capabilities of production-ready<br />

software such as Micriµm’s<br />

µC/OS-III kernel and µC/TCP-IP<br />

stack, developers can bring<br />

reliable medical products to<br />

market faster.<br />


Accelerating Time-to-Market<br />

Through the ARM Cortex-M Ecosystem<br />

By Reinhard Keil, Director of MCU Tools, ARM Germany GmbH<br />

Shawn Prestridge, Senior Field Applications Engineer, IAR Systems<br />

Laurent Desseignes, Embedded Software Marketing Manager, STMicroelectronics<br />

Before ARM, the MCU market<br />

was fragmented with a multitude<br />

of vendors offering proprietary<br />

architectures and tool chains.<br />

Now, with open architectures<br />

like the Cortex-M, developers<br />

are able to base products on<br />

an entire architecture rather<br />

than having to tie designs to<br />

a specific MCU. The result<br />

is that designs won’t be tied<br />

to an outdated architecture<br />

in just a few years. Rather,<br />

designs will always be current<br />

with the evolving Cortex-M<br />

architecture and extensible<br />

across a wide range of devices<br />

that can leverage the same<br />

application code. For example,<br />

ARM has recently shown its<br />

commitment to keeping the<br />

Cortex-M architecture relevant to<br />

embedded design as well as on<br />

the leading edge of technology<br />

by introducing the Cortex-M0<br />

core. This core extends the<br />

Cortex-M architecture into<br />

traditionally 8- and 16-bit<br />

applications while offering 32-bit<br />

performance.<br />

The <strong>STM32</strong> Ecosystem<br />

ST is the MCU industry leader<br />

with the broadest portfolio<br />

of Cortex-M-based devices<br />

from a single company. The<br />

<strong>STM32</strong> product line offers an<br />

extraordinary variety of options<br />

with more than 300 devices<br />

all based on the Cortex-M<br />

cores (M0, M3, and M4) to give<br />

developers unparalleled flexibility<br />

in finding the optimal MCU for<br />

their application (see Figure 1).<br />

Part of ST’s strength can be<br />

found in the software and tools<br />

available to developers for<br />

the <strong>STM32</strong> architecture. ST<br />

provides a range of evaluation<br />

boards, including the <strong>STM32</strong><br />

Discovery Kits, which enable<br />

developers to evaluate each<br />

of the <strong>STM32</strong> MCU product<br />

lines. The Discovery kits are<br />

the cheapest and quickest way<br />

to discover the <strong>STM32</strong> family<br />

while the evaluation boards<br />

are complete development<br />

platforms providing access to<br />

all peripherals in each <strong>STM32</strong><br />

product line and extension<br />

headers to make it easy to<br />

connect a daughterboard<br />

or wrapping board for the<br />

target specific application.<br />

Development tools include<br />

in-circuit debuggers and<br />

programmers, reference<br />

designs, and application notes.<br />

<strong>STM32</strong> F4 (Cortex-M4): High-Performance DSP MCUs; 168 MHz Cortex-M4; up to 2-Mbyte Flash; up to 256-Kbyte SRAM<br />

<strong>STM32</strong> F2 (Cortex-M3): High-performance MCUs; 120 MHz Cortex-M3; 256-Kbyte to 1-Mbyte Flash; up to 128-Kbyte SRAM<br />

<strong>STM32</strong> F1 (Cortex-M3): Mainstream MCUs; 24 to 72 MHz Cortex-M3; 16-Kbyte to 1-Mbyte Flash; up to 96-Kbyte SRAM<br />

<strong>STM32</strong> L1 (Cortex-M3): Ultra-low-power MCUs; 32 MHz Cortex-M3; 32- to 384-Kbyte Flash; up to 48-Kbyte SRAM<br />

<strong>STM32</strong> F0 (Cortex-M0): Entry-level MCUs; 48 MHz Cortex-M0; 16- to 128-Kbyte Flash; up to 20-Kbyte SRAM<br />

Application ranges shown: 32-bit/DSC applications; 16/32-bit applications; 8/16-bit applications<br />

Positioning labels shown: Highest Performance MCU with DSP and FPU; Performance Efficiency; Feature-rich Connectivity; Full-featured 32-bit MCU; Budget Price<br />

Figure 1 The <strong>STM32</strong> product line offers more than 300 devices all based on the Cortex-M to give developers unparalleled flexibility in finding the optimal MCU for their application.<br />

ST also offers a wide variety<br />

of firmware libraries, many of<br />

which are available at no cost,<br />

for implementing motor control,<br />

DMA, timers, interfaces, audio,<br />

LCDs, communications, GUIs,<br />

all standard peripherals, touch<br />

sensing, and in-application<br />

programming, to name a few.<br />

ST also recognizes the benefit of<br />

working with other leaders in the<br />

industry to expand the tools and<br />

software available to developers.<br />

The <strong>STM32</strong> architecture is<br />

based on the ARM Cortex-M<br />

architecture, which has the most<br />

comprehensive ecosystem of<br />

tools, software, and engineering<br />

resources of any microcontroller<br />

in the world. ST has worked<br />

directly with a wide range of<br />

partners to create offerings<br />

that specifically accelerate the<br />

development of applications<br />

based on the <strong>STM32</strong> architecture,<br />

thereby minimizing product<br />

development investment and<br />

speeding time-to-market.<br />

The global ecosystem available for<br />

the <strong>STM32</strong> architecture also offers<br />

many cost savings for developers:<br />

〉〉 The open architecture of the<br />

ARM Cortex-M cores gives<br />

developers access to the largest<br />

MCU ecosystem in the world<br />

〉〉 The vast diversity of the<br />

Cortex-M ecosystem ensures<br />

that developers can find<br />

the tools, software, and<br />

middleware they need to<br />

speed the design of nearly any<br />

application<br />

〉〉 With so many OEMs<br />

standardizing on ARM, the<br />

pool of engineers already<br />

experienced with the Cortex-M<br />

architecture is large, making it<br />

easier to find developers. Many<br />

colleges are teaching to the<br />

ARM architecture as well<br />

〉〉 Architectural and tool chain<br />

familiarity reduce the developer<br />

learning curve and speed time-to-market<br />

The ecosystem for the <strong>STM32</strong><br />

consists of a complete range of<br />

tools and software specifically<br />

designed to speed design. The<br />

comprehensive tool chain is<br />

just the beginning. Developers<br />

also have access to the ARM<br />

CMSIS libraries, ST peripheral<br />

drivers, extensive middleware,<br />

and production-ready<br />

application software.<br />

Developer’s Foundation: The Tool Chain<br />

There are several key<br />

components comprising an<br />

efficient development tool chain:<br />


Compiler: Each compiler offers<br />

different strengths. Key factors<br />

to consider are code size and<br />

performance (typically measured<br />

in CoreMark scores). In addition,<br />

compilers like IAR Embedded<br />

Workbench automatically use<br />

the <strong>STM32</strong> architecture to<br />

its best advantage, both in<br />

terms of leveraging silicon and<br />

middleware.<br />

Assembler: Assemblers have<br />

become more of a specialty<br />

tool over the years given the<br />

performance and time-to-market<br />

advantages of coding in C.<br />

Today, if an assembler is used<br />

at all, it will likely be to code an<br />

application’s critical control loop.<br />

Linker: The linker pulls<br />

together all the code required<br />

to create the final application.<br />

It needs to supply program<br />

and data information to the<br />

debugger.<br />

Debugger: The debugger is<br />

the tool most frequently used<br />

by developers. It provides<br />

visibility into systems to<br />

verify program operation and<br />

troubleshoot issues.<br />

IDE: The Integrated Development Environment (IDE) defines the workspace and design flow. Some tool chains support the option of replacing the IDE with one of a developer’s own choice.<br />

Given how much time a developer will spend with the debugger, using a debugger that is well-integrated with the compiler and middleware can substantially simplify troubleshooting. For example, a debugger that is RTOS-aware, USB-aware, TCP/IP-aware, etc. will facilitate faster problem identification and resolution. The usefulness of the debugger, however, is limited by the ability of the MCU to provide visibility into its real-time operation. MCUs without integrated debugging capabilities will require developers to rely upon outdated and intrusive debugging techniques such as instrumenting code that can interfere materially with run-time execution.<br />

Figure 2 block diagram labels: Cortex-M Processor; Cortex-M CPU Core; Debug Interface (Run Control, Breakpoint Unit, Memory Access Unit); JTAG or Serial Wire Debug; Cortex Debug 10-pin or ARM JTAG 20-pin Connector; Serial Wire Viewer; ETM Instruction Trace (optional); ITM Instrumentation Trace; DWT Data Watchpoint & Trace Unit; CPU & Interrupt Events; Trace Port Interface; 4-pin Trace; Cortex Debug + ETM 20-pin Connector (optional). Trace (ETM, ITM, DWT) not available on Cortex-M0.<br />

Figure 2 The <strong>STM32</strong> family of MCUs utilizes ARM’s CoreSight Debug Technology to provide leading-edge troubleshooting capabilities including streaming trace, full code coverage testing, timing analysis, and performance profiling.<br />

The <strong>STM32</strong> family of MCUs<br />

utilizes ARM’s CoreSight Debug<br />

Technology to provide leading-edge<br />

troubleshooting capabilities<br />

(see Figure 2). For example, with<br />

streaming trace, developers can<br />

capture trace data limited only<br />

by the size of a PC’s hard drive.<br />

This enables full code coverage<br />

testing, timing analysis, and<br />

performance profiling.<br />

CMSIS provides a consistent software interface that simplifies software reuse across the entire <strong>STM32</strong> product line of 300+ devices. Each of the CMSIS libraries reduces the learning curve behind using new software and tools to significantly accelerate design and lower development costs.<br />

Figure 3 The I-jet in-circuit debugging probe from IAR Systems offers seamless integration with IAR Embedded Workbench and provides a wide range of real-time debugging capabilities, including power debugging with ~200 µA resolution at 200 kHz, for fine-tuning performance while achieving the highest power efficiency.<br />

Because CoreSight is<br />

implemented in hardware, it<br />

imposes no software overhead<br />

on the CPU and requires no<br />

external emulation hardware.<br />

In addition, breakpoints can be<br />

set and variable values changed<br />

while the system is running.<br />

Extensive trace records<br />

capture the program counter<br />

and read/write accesses while<br />

noting time delays and lost<br />

cycles. Statistical information<br />

about exceptions, interrupts,<br />

and events is also collected.<br />

Signals the MCU is processing<br />

can be monitored graphically,<br />

and CoreSight provides<br />

instrumented trace capabilities<br />

enabling developers to write<br />

data to specific memory<br />

locations at run-time.<br />
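As a rough illustration of instrumented trace, the sketch below writes one byte to a trace register, but only when tracing has been enabled by the debugger. The `trace_port_t` structure and the `trace_putc` function are hypothetical stand-ins so that the logic can be read and run anywhere; the register layout is illustrative and does not match real hardware. On a Cortex-M3 or Cortex-M4 device, CMSIS-CORE provides `ITM_SendChar` for this purpose.<br />

```c
#include <stdint.h>

/* Hypothetical stand-in for a trace unit's registers. On real hardware these
 * would be memory-mapped registers defined by the device header files. */
typedef struct {
    volatile uint32_t control;   /* bit 0: trace unit enabled            */
    volatile uint32_t enable;    /* bit 0: stimulus port 0 enabled       */
    volatile uint32_t stimulus;  /* reads non-zero when ready for a byte */
} trace_port_t;

/* Emit one byte through stimulus port 0.
 * Returns 1 if the byte was written, 0 if tracing is switched off. */
int trace_putc(trace_port_t *tp, uint8_t byte)
{
    if ((tp->control & 1u) == 0u || (tp->enable & 1u) == 0u) {
        return 0;                /* no debugger listening: do nothing */
    }
    while (tp->stimulus == 0u) {
        /* wait until the port can accept data */
    }
    tp->stimulus = byte;
    return 1;
}
```

Because the function returns immediately when tracing is disabled, calls like this can usually stay in production code at very little cost.<br />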

With proprietary architectures,<br />

typically only one or two debug<br />

platforms are available. With<br />

<strong>STM32</strong> MCUs, developers can<br />

choose from more than a dozen<br />

vendors. In addition, the options<br />

include low-end tools for those<br />

with simple applications and<br />

tight budgets as well as high-end<br />

tools that provide sophisticated<br />

capabilities for accelerating<br />

development and simplifying<br />

verification and certification<br />

processes.<br />

For example, IAR Systems offers<br />

its new I-jet in-circuit debugging<br />

probe for use with the <strong>STM32</strong><br />

architecture (see Figure 3). It<br />

offers seamless integration with<br />

IAR Embedded Workbench and<br />

provides a wide range of real-time<br />

debugging capabilities. The<br />

I-jet is also capable of measuring<br />

target power consumption with<br />

~200 µA resolution at 200 kHz.<br />

This enables developers to<br />

debug applications in terms of<br />

power to fine-tune performance<br />

while achieving the highest<br />

power efficiency.<br />

CMSIS:<br />

Faster Time-to-Market<br />

Part of the value of the <strong>STM32</strong><br />

architecture is the Cortex<br />

Microcontroller Software<br />

Interface Standard (CMSIS).<br />

Created by ARM, CMSIS<br />

provides a consistent software<br />

interface that simplifies<br />

software reuse across the<br />

entire <strong>STM32</strong> product line of<br />

300+ devices. Each of the<br />

CMSIS libraries reduces the<br />

learning curve behind using<br />

new software and tools to<br />

significantly accelerate design<br />

and lower development costs.<br />

Development tools and<br />

middleware that are CMSIS-compliant<br />

can accelerate<br />

product design in multiple<br />

ways. For example, developers<br />

can create a proof-of-concept<br />

using a high-performance<br />

<strong>STM32</strong> MCU with the maximum<br />

memory. This provides full<br />

flexibility during early design<br />

when the core product is still<br />

being defined. Once the design<br />

is proven, CMSIS makes<br />

it easier for designs to be<br />

transparently implemented on a<br />

different, more optimal <strong>STM32</strong><br />

MCU, even one based on a<br />

different Cortex-M core.<br />

CMSIS comprises several parts<br />

(see Figure 4):<br />

〉〉 CMSIS-CORE is a standard API<br />

for all Cortex-M-based devices<br />

and abstracts key functionality<br />

〉〉 CMSIS-RTOS provides an API<br />

for RTOS integration with other<br />

tools and middleware<br />

〉〉 CMSIS-DSP is a collection of<br />

61 digital signal processing<br />

functions that take advantage<br />

of the <strong>STM32</strong>’s integrated<br />

floating-point and DSP<br />

capabilities. The library is<br />

designed for block processing<br />

to reduce interrupt overhead,<br />

optimize DMA utilization, and<br />

facilitate easy integration with<br />

an RTOS or kernel<br />

〉〉 CMSIS-SVD provides a<br />

System View Description for all<br />

peripherals to abstract actual<br />

peripheral implementations<br />

from application code and<br />

enable peripheral-awareness<br />

for debuggers<br />
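To make the block-processing idea in the CMSIS-DSP item concrete, here is a minimal portable C sketch of a FIR filter that handles a whole buffer of samples per call, for example one buffer filled by DMA. The names `fir_t` and `fir_block` are hypothetical and are not the library's API; CMSIS-DSP offers its own filter functions such as `arm_fir_f32`.<br />

```c
#include <stddef.h>

#define NUM_TAPS 4u

/* Hypothetical FIR instance: coefficients plus the previous NUM_TAPS-1 inputs. */
typedef struct {
    float coeffs[NUM_TAPS];
    float history[NUM_TAPS - 1u];  /* history[0] is the most recent input */
} fir_t;

/* Filter block_size samples in one call: y[n] = sum of coeffs[k] * x[n-k]. */
void fir_block(fir_t *f, const float *in, float *out, size_t block_size)
{
    for (size_t n = 0; n < block_size; ++n) {
        float acc = f->coeffs[0] * in[n];
        for (size_t k = 1; k < NUM_TAPS; ++k) {
            acc += f->coeffs[k] * f->history[k - 1u];
        }
        /* Shift the history so the newest sample sits at the front. */
        for (size_t k = NUM_TAPS - 2u; k > 0; --k) {
            f->history[k] = f->history[k - 1u];
        }
        f->history[0] = in[n];
        out[n] = acc;
    }
}
```

Since the filter state lives in the instance, consecutive blocks can be processed back to back, and the CPU is interrupted once per block rather than once per sample.<br />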

With the release of CMSIS 3.0, ARM is standardizing on an RTOS API that makes it easier for middleware components from different vendors to interoperate. This will facilitate the propagation of application-specific components, such as a TCP/IP stack, as well as enable the development tool chain to be RTOS-aware.<br />

The CMSIS DSP library gives developers a substantial head start on the development of complex algorithms. It also allows developers to create complex application code that can be carried between MCUs that may not have hardware-based DSP instructions. This means that developers can write code for the <strong>STM32</strong> F2 and, when they compile it for the <strong>STM32</strong> F4, automatically get the full advantage of the <strong>STM32</strong> F4’s DSP capabilities.<br />

[Figure 4 block labels. Layers: USER; CMSIS; MCU. Blocks: Application Code; CMSIS-DSP (DSP-Library); CMSIS-RTOS API; Real Time Kernel (3rd Party); CMSIS-CORE (Core Peripheral Functions; Peripheral Register & Interrupt Vector Definitions); Device Peripheral Functions (Silicon Vendor); CMSIS-SVD (System View Description, XML); Debugger (3rd Party); Peripheral View; SIMD (Cortex-M4); Cortex CPU; SysTick (RTOS Kernel Timer); NVIC (Nested Vectored Interrupt Controller); DEBUG + Trace; Other Peripherals.]<br />

Figure 4 The ARM Cortex Microcontroller Software Interface Standard (CMSIS) provides a consistent software interface that accelerates product development as well as simplifies software reuse across the entire <strong>STM32</strong> product line of 300+ devices.<br />
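The recompile-and-benefit behavior described for the <strong>STM32</strong> F2 (Cortex-M3) and <strong>STM32</strong> F4 (Cortex-M4) can be sketched with a single function that has two bodies selected at compile time. This is an illustration of the general technique, not code taken from CMSIS-DSP. The function name `sat_add32` is hypothetical, and the sketch assumes a toolchain that defines the ACLE macro `__ARM_FEATURE_DSP` and supplies the `__qadd` intrinsic through `arm_acle.h` when targeting a core with DSP instructions.<br />

```c
#include <stdint.h>

#if defined(__ARM_FEATURE_DSP)
#include <arm_acle.h>  /* assumption: the toolchain provides __qadd here */
#endif

/* Saturating 32-bit addition: identical source, different code per core. */
int32_t sat_add32(int32_t a, int32_t b)
{
#if defined(__ARM_FEATURE_DSP)
    /* Cores with DSP extensions: the intrinsic maps to a saturating add. */
    return __qadd(a, b);
#else
    /* Cores without DSP extensions (and host builds): portable C fallback. */
    int64_t sum = (int64_t)a + (int64_t)b;
    if (sum > INT32_MAX) { return INT32_MAX; }
    if (sum < INT32_MIN) { return INT32_MIN; }
    return (int32_t)sum;
#endif
}
```

Application code calls `sat_add32` the same way on either device, so moving to a part with DSP instructions is a matter of changing the compiler target.<br />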

Accelerated Peripheral<br />

Configuration and<br />

Driver Design<br />

One time-consuming part<br />

of product development is<br />

configuring an MCU’s peripherals<br />

and then writing drivers for<br />

the application to utilize them.<br />

To facilitate faster application<br />

design, ST supplies a complete<br />

peripheral library for use with all<br />

<strong>STM32</strong> MCUs. Instead of writing<br />

to peripherals directly and locking<br />

code to a particular device,<br />

developers use APIs that abstract<br />

the use of peripherals. This<br />

approach offers several benefits:<br />

〉〉 Faster time-to-market as<br />

developers do not need to<br />

write peripheral drivers or<br />

debug them<br />

〉〉 More reliable code since the<br />

peripheral drivers supplied are<br />

mature and industry-proven<br />

〉〉 Easy migration between<br />

<strong>STM32</strong> devices, even ones<br />

based on different Cortex-M<br />

cores. Because the peripherals,<br />

pin-outs, and code are the<br />

same or similar among the<br />

<strong>STM32</strong> family, migrating<br />

between devices really is just a<br />

recompile<br />

〉〉 Simple customization as C<br />

source code is provided for all<br />

peripherals<br />
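The benefit of calling an API rather than writing to registers directly can be shown with a small sketch. Everything here is hypothetical and for illustration only: the `gpio_port_t` layout, the helper names, and `led_write` are not ST's peripheral library, although that library follows a similar pattern of passing a peripheral handle plus a pin mask.<br />

```c
#include <stdint.h>

/* Hypothetical register block for one GPIO port (layout is illustrative). */
typedef struct {
    volatile uint32_t output_data;
} gpio_port_t;

/* Library-style helpers: only these functions touch the register. */
void gpio_set_bits(gpio_port_t *port, uint16_t pins)
{
    port->output_data |= pins;
}

void gpio_reset_bits(gpio_port_t *port, uint16_t pins)
{
    port->output_data &= ~(uint32_t)pins;
}

/* Application code is written against the helpers, not the hardware. */
void led_write(gpio_port_t *port, uint16_t pin_mask, int on)
{
    if (on) {
        gpio_set_bits(port, pin_mask);
    } else {
        gpio_reset_bits(port, pin_mask);
    }
}
```

If a different device moves the LED to another port or pin, only the arguments passed to `led_write` change; the application logic stays the same.<br />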

Middleware: Production-<br />

Ready Software Solutions<br />

One of the factors that<br />

substantially accelerates product<br />

development and time-to-market<br />

is the extensive range<br />

of middleware available for the<br />

<strong>STM32</strong> architecture, including:<br />

〉〉 Real-time kernels and<br />

operating systems (RTOS)<br />

〉〉 Stacks for all major<br />

communications protocols,<br />

including USB, TCP/IP, and<br />

Bluetooth<br />

〉〉 Advanced design and modeling<br />

tools such as those used for<br />

signal processing<br />

〉〉 Display/GUI solutions<br />

〉〉 Touch sensing<br />

〉〉 Java<br />

The <strong>STM32</strong> architecture has<br />

wide industry support, with<br />

ST’s partners providing a<br />

multitude of optimized solutions<br />

for a diversity of applications,<br />

including audio, industrial, and<br />

motor control, to name a few.<br />

For example, developers have<br />

the option of selecting a full-featured<br />

RTOS for applications<br />

needing high reliability or a<br />

simpler kernel for a low-cost<br />

consumer device.<br />

To accelerate design,<br />

however, the development<br />

tool chain needs to be able<br />

to integrate well with<br />

middleware offerings. IAR<br />

Embedded Workbench, for<br />

example, has hooks into the<br />

C-SPY software debugger that<br />

enable RTOS vendors to<br />

create drivers that make the<br />

debugger kernel-aware.<br />

Effectively, this approach<br />

maximizes the flow of information<br />

available to developers to<br />

speed problem resolution.<br />

Since the kernel/RTOS is the<br />

primary interface between the<br />

application and middleware such<br />

as communications stacks, it<br />

is important that the debugger<br />

understand how middleware is<br />

being managed by the system.<br />

When a debugger is RTOS-aware<br />

and USB-aware, for example, it<br />

can let developers look directly into<br />

frame buffers and display protocol<br />

information in its native format.<br />

Without visibility into the RTOS,<br />

developers would have to<br />

manually collect this information<br />

themselves, introducing<br />

potential error and delay to the<br />

design process. Rather than<br />

having to manually resolve<br />

headers to find the payload, the<br />

debugger can interpret packets<br />

and present them in a format<br />

that allows developers to debug<br />

at the link level. Complete<br />

visibility into the system also<br />

enables developers to more<br />

thoroughly verify program<br />

execution for both ensuring<br />

reliable operation and for<br />

certification processes.<br />
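To give a sense of the manual work a protocol-aware debugger saves, the sketch below resolves one header by hand: it finds where the payload starts inside an IPv4 packet. IPv4 is chosen here only as a familiar example, and the function name `ipv4_payload_offset` is hypothetical.<br />

```c
#include <stddef.h>
#include <stdint.h>

/* Return the offset of the payload within an IPv4 packet, or 0 if the
 * buffer does not look like a valid IPv4 header. */
size_t ipv4_payload_offset(const uint8_t *pkt, size_t len)
{
    if (pkt == NULL || len < 20u) {
        return 0u;                       /* shorter than a minimal header */
    }
    if ((pkt[0] >> 4) != 4u) {
        return 0u;                       /* version field is not 4 */
    }
    /* Header length is stored in 32-bit words in the low nibble. */
    size_t header_len = (size_t)(pkt[0] & 0x0Fu) * 4u;
    if (header_len < 20u || header_len > len) {
        return 0u;
    }
    return header_len;
}
```

A developer without protocol-aware tools would repeat this kind of decoding for every layer of every captured frame, which is where errors and delays tend to creep in.<br />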

IAR Embedded Workbench<br />

directly supports many<br />

middleware companies<br />

and nearly every hardware<br />

debugger. This enables<br />

developers not only to<br />

seamlessly take advantage<br />

of components from different<br />

companies but also select<br />

best-in-class components<br />

for their applications. IAR<br />

Systems is the world’s oldest<br />

embedded C compiler company<br />

and understands the needs<br />

of developers to be able to<br />

choose the elements of the<br />

ARM ecosystem that are best<br />

for their application.<br />

Alternatively, Keil’s MDK-ARM<br />

development system offers a<br />

comprehensive approach to<br />

embedded design by bringing<br />

together an integrated set of<br />

development tools, middleware,<br />

and debug hardware.<br />

MDK-ARM is available in<br />

four offerings, from Lite to<br />

Professional, to match the<br />

varying needs of development<br />

teams (see Figure 5). Some of<br />

its components include:<br />

〉〉 Version 5 of the MDK compiler<br />

has recently been released<br />

and offers best-in-class<br />

performance with full support<br />

for the CMSIS libraries<br />

〉〉 The Keil µVision IDE provides<br />

a wide range of features from<br />

trace view with source code<br />

synchronization and editing to<br />

detailed peripheral debugging<br />

to comprehensive project<br />

management<br />

〉〉 RTX is a full-featured RTOS<br />

supporting multiple scheduling<br />

options, low interrupt latency,<br />

unlimited tasks, and a memory<br />

footprint of less than 5 KB. Full<br />

source code is available<br />

〉〉 Extensive middleware libraries<br />

optimized for the Cortex-M<br />

architecture enable developers<br />

to quickly add support for<br />

CAN, USB Host/Device, TCP/<br />

IP, GUI development, and<br />

file system management<br />

to embedded applications.<br />

Intuitive configuration wizards<br />

simplify the use of middleware<br />

〉〉 The ULINK2 and ULINKpro<br />

debuggers support high-speed<br />

streaming trace for non-intrusive,<br />

real-time visibility into<br />

embedded systems<br />

〉〉 ETM Trace enables full<br />

code coverage testing and<br />

performance analysis<br />

Frameworks and Libraries<br />

Part of the <strong>STM32</strong> value<br />

proposition is the availability of<br />

a tremendous amount of off-the-shelf<br />

software. Because<br />

of the huge volumes of ARM<br />

processors shipped, even niche<br />

markets are large enough to<br />

incent third parties to create<br />

specialized tools. Since the<br />

market can support software that<br />

is highly application-specific,<br />

developers have a greater<br />

chance of finding exactly what<br />

they need already available.<br />

Some of the application-specific<br />

libraries offered for the <strong>STM32</strong><br />

include audio, motor control, and<br />

industrial control.<br />

Using a library with an <strong>STM32</strong> MCU is a straightforward process: just include it in the application workspace. Depending upon the compiler, each function will be recognized as a keyword, and right-clicking will provide fast access to the function’s definition and parameters. Many compilers and linkers will also automatically pare out those parts of the library you are not using to conserve code space. Many libraries are provided as binary files created for a specific <strong>STM32</strong> MCU and tool chain. Others are offered with source code to enable developers to customize code.<br />

[Figure 5 block labels: MDK-ARM Professional; ARM C/C++ Compiler; µVision Project Manager, Editor & Debugger; RTX Real-Time Operating System; TCP/IP Networking Suite; File System; GUI Library; CAN Interface; USB Host; USB Device.]<br />

Figure 5 Keil’s MDK-ARM development system offers a comprehensive approach to embedded design by bringing together an integrated set of development tools, middleware, and debug hardware to match the varying needs of development teams.<br />

The availability of so much<br />

production-ready software<br />

accelerates design by enabling<br />

developers to work with<br />

code that provides the base<br />

functionality for their application.<br />

Rather than having to learn<br />

the <strong>STM32</strong> architecture from<br />

scratch, developers can let the<br />

tool chain abstract low-level<br />

implementation details.<br />

For example, IAR Embedded<br />

Workbench offers over 3000<br />

example projects. With these,<br />

developers can load a project<br />

onto an evaluation board such<br />

as the <strong>STM32</strong> F4 evaluation<br />

board and see how a complete<br />

system has been implemented.<br />

These projects provide a working<br />

framework for a wide variety of<br />

applications ranging from simply<br />

blinking an LED to operating<br />

the LCD screen to creating a<br />

lightweight IP stack to implement<br />

a web server. Effectively, these<br />

projects give developers a<br />

template upon which to base their<br />

own design. The startup code<br />

gets developers into their design<br />

immediately, rather than forcing<br />

them to spend time figuring out<br />

how to initialize the MCU.<br />

Combined with the CMSIS<br />

peripheral library, many of the<br />

projects can be transferred<br />

between devices with minimal<br />

reworking. Thus, developers can<br />

take projects built for another<br />

MCU altogether and quickly pull<br />

the core of the application over to<br />

the <strong>STM32</strong> device of their choice.<br />

The 80%+ Advantage<br />

With the extensive variety of<br />

software, middleware, and<br />

reference designs available,<br />

developers can expect to<br />

substantially jumpstart their<br />

designs. Depending upon the<br />

application, it is not uncommon<br />

to find a combination of off-the-shelf<br />

software and middleware<br />

within the <strong>STM32</strong> ecosystem that<br />

provides 80% or more of the code<br />

required for a design. This enables<br />

developers to focus their design<br />

efforts on the last 20%, leading to<br />

faster time-to-market as well as a<br />

highly differentiated product.<br />

This 80% figure is not unrealistic.<br />

For example, with the availability<br />

of off-the-shelf USB and TCP/<br />

IP stacks, a communications link<br />

truly becomes a component that<br />

can easily be added to a system<br />

rather than a subsystem that has<br />

to be designed and fine-tuned. In<br />

addition, there are many support<br />

resources available to assist<br />

developers in customizing code for<br />

applications that have a specific<br />

need. ST’s support engineers and<br />

Field Application Engineers (FAE),<br />

for example, can provide valuable<br />

insight in overcoming a variety of<br />

design challenges.<br />

With the rise of social<br />

networking, peer support has<br />

become another important<br />

element of an MCU architecture.<br />

The size of the ARM community<br />

is tremendous, and there are<br />

a variety of forums where<br />

engineers can provide peer-to-peer<br />

support. This support can<br />

be product-based, such as the<br />

support communities hosted<br />

by Keil, IAR Systems, and ST<br />

for their own products. Free<br />

software, such as FreeRTOS,<br />

tends to have its own support<br />

network as well.<br />

Many independent forums have<br />

also come into being in the<br />

last decade to assist engineers<br />

by hosting discussions about<br />

silicon manufacturers and their<br />

middleware partners. Engineers<br />

can search for specific MCUs<br />

and tools to hear how useful<br />

others in the industry find them.<br />

They can also post issues and<br />

share solutions.<br />

Software the Way<br />

You Want It<br />

ST embraces the benefits that an<br />

extensive ecosystem offers its<br />

customers. By giving developers<br />

the choice of software developed<br />

by ST, by a commercial third party,<br />

or through open source,<br />

ST ensures that developers<br />

have the broadest variety of<br />

options available to match their<br />

application needs.<br />

For example, with its<br />

cryptographic library, ST<br />

provides developers with<br />

security capabilities free of<br />

charge that have been fully<br />

optimized for the <strong>STM32</strong><br />

architecture. ST also offers a<br />

number of other application-specific<br />

libraries, including<br />

USB, motor control, and<br />

HDMI Consumer Electronics<br />

Control (CEC).<br />

ST has focused on working with<br />

numerous partners to ensure that<br />

there are many solutions available<br />

for the <strong>STM32</strong> architecture. Their<br />

goal is that developers should be<br />

able to find whatever software<br />

and tools they need and, in many<br />

cases, have a choice between<br />

several vendors as well as<br />

between open and commercial<br />

versions. For example,<br />

developers can select from a<br />

range of MP3 codecs, from<br />

the open source Helix codec to ST’s<br />

own codec to commercial codecs<br />

with extensions such as mixers<br />

and equalization. This allows<br />

developers to determine for<br />

themselves the balance between<br />

cost, support, efficiency, and<br />

time-to-market.<br />

The <strong>STM32</strong> family of 32-bit<br />

MCUs offers developers the<br />

most extensive and complete<br />

portfolio of Cortex-M-based<br />

devices. Combined with the<br />

comprehensive ecosystem<br />

around the <strong>STM32</strong> architecture,<br />

this portfolio enables developers to simplify<br />

development while bringing<br />

products to market more quickly<br />

and at the lowest cost.<br />

Introducing a Graphical User Interface<br />

to Your Embedded Application<br />

By Gérard Bouvet, Marketing and Sales Manager, GeeseWare<br />

Dominique Jugnon, Microcontrollers Development Tools Manager, STMicroelectronics<br />

Smart phones and similar<br />

devices have redefined the way<br />

we interact with technology and<br />

created a new set of consumer<br />

expectations as to how a<br />

Graphical User Interface (GUI)<br />

should appear. The complex<br />

layout of Windows that was<br />

thought to define how a GUI<br />

should be has given way to<br />

streamlined interfaces that can<br />

give users access to all of a<br />

system’s functionality on the<br />

smaller displays often used with<br />

handheld devices.<br />

Manufacturers have recognized<br />

that touch-based GUIs can<br />

bring value to a wide range of<br />

embedded applications beyond<br />

consumer electronics, including<br />

industrial automation, appliances,<br />

meters, HVAC, security, access<br />

control, military, automotive,<br />

and infotainment devices. For<br />

example, traditional push-button<br />

interfaces have mechanical<br />

parts which can fail. Moving to a<br />

capacitive touch sensing display<br />

enables not only a more robust<br />

interface but one that offers more<br />

flexibility and extensibility.<br />

32-bit Processing<br />

Adding a GUI to a system,<br />

however, is not like adding a<br />

few more buttons or controls to<br />

a device’s front panel. With the<br />

nearly ubiquitous availability of<br />

touchscreens in mobile handsets,<br />

consumers have come to expect<br />

electronic devices of all types to<br />

have a sophisticated user interface<br />

utilizing 3D objects, perceived<br />

depth, animated transitions,<br />

textures, and complex background<br />

lighting. To create an intuitive<br />

interface that adds value as well<br />

as flair to an application, GUIs also<br />

need to support gestures such<br />

as tap, drag, fling, and slide that<br />

consumers are quickly learning<br />

to consider as a natural aspect of<br />

any touch-based interface.<br />

Applications based on 8- and<br />

16-bit processors simply don’t<br />

have the horsepower to handle<br />

even simple graphics. The<br />

availability of high-performance<br />

MCUs like the <strong>STM32</strong> family<br />

that can provide complete GUI<br />

functionality with capacitive<br />

touch sensing on top of the<br />

primary application has been<br />

a key enabling factor in the<br />

proliferation of advanced user<br />

interfaces. For example, the<br />

<strong>STM32</strong>-F0 provides 32-bit<br />

processing at 8- and 16-bit<br />

pricing. For more graphics-intensive<br />

applications, the<br />

<strong>STM32</strong>-F2 and <strong>STM32</strong>-F4<br />

provide sufficient Flash to store<br />

large graphic images. With<br />

devices up to 168 MHz/210<br />

DMIPS, the <strong>STM32</strong> family is<br />

also fast enough to provide the<br />

responsiveness consumers are<br />

used to from GUI-based devices.<br />

However, as the cost of<br />

hardware has dropped, software<br />

complexity has continued to<br />

increase. In fact, application<br />

software has become the leading<br />

development cost in embedded<br />

systems, both in terms of<br />

money and time-to-market.<br />

To maintain their competitive<br />

edge, companies need to be<br />

able to introduce complex<br />

GUI functionality while tightly<br />

controlling software development<br />

cost. Achieving this goal requires<br />

access to a GUI framework,<br />

the ability to define the look<br />

and feel in Java rather than C,<br />

rapid prototyping capabilities<br />

to enable consumers to provide<br />

feedback while target hardware<br />

is still being designed, and tools<br />

optimized for the tight memory<br />

and processing constraints of the<br />

typical embedded application.<br />

GUI Framework<br />

There are two primary phases<br />

to designing a GUI. The first is<br />

the creation of the underlying<br />

software code that provides<br />

basic UI functionality. Once this<br />

GUI framework is in place, the<br />

look-and-feel of the GUI must be<br />

designed. To keep system cost<br />

down, developers must minimize<br />

the expense for both of these<br />

processes as well as eliminate<br />

any unnecessary design delays.<br />

In the past, UIs for embedded systems were designed specifically for the hardware on which they were going to be run. With increasing pressure to shorten product design cycles, IP reuse has become an important consideration in UI design. Ideally, developers need to be able to carry a UI across different products using MCUs that may be from different families as well.<br />

[Figure 1 block labels: Java Application; Java Class/API; Wrapper C Java; API C; Driver C; External Port.]<br />

Figure 1 Abstracting the hardware through a Hardware Abstraction Layer (HAL) frees the GUI framework from handling specific low-level details, leading to faster code development and greater reusability.<br />

To achieve this, GUI application<br />

code is abstracted above<br />

the hardware. A Hardware<br />

Abstraction Layer (HAL) handles<br />

specific low-level details such<br />

as how graphical data is stored<br />

in memory and transferred to<br />

the display (see Figure 1). By<br />

interacting with the HAL using<br />

APIs, the GUI application code<br />

becomes a framework that can<br />

be ported across an MCU family<br />

with minimal rewriting required.<br />
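A minimal sketch of this layering in C is shown below. All names are hypothetical: `display_hal_t` stands for the HAL interface, `gui_fill_rect` for a framework routine written purely against that interface, and `fb_set_pixel` for one possible driver that stores 16-bit pixels in a RAM frame buffer.<br />

```c
#include <stdint.h>

/* Hypothetical HAL: all that the GUI code knows about the display. */
typedef struct {
    uint16_t width;
    uint16_t height;
    void (*set_pixel)(void *ctx, uint16_t x, uint16_t y, uint16_t color);
    void *ctx;  /* driver-private data, for example a frame buffer */
} display_hal_t;

/* Framework-level drawing routine; clips to the display size. */
void gui_fill_rect(const display_hal_t *hal, uint16_t x0, uint16_t y0,
                   uint16_t w, uint16_t h, uint16_t color)
{
    uint32_t x_end = (uint32_t)x0 + w;
    uint32_t y_end = (uint32_t)y0 + h;
    for (uint32_t y = y0; y < y_end && y < (uint32_t)hal->height; ++y) {
        for (uint32_t x = x0; x < x_end && x < (uint32_t)hal->width; ++x) {
            hal->set_pixel(hal->ctx, (uint16_t)x, (uint16_t)y, color);
        }
    }
}

/* One possible driver: a 4x4 frame buffer of 16-bit pixels held in RAM. */
#define FB_W 4u
#define FB_H 4u

void fb_set_pixel(void *ctx, uint16_t x, uint16_t y, uint16_t color)
{
    uint16_t *fb = (uint16_t *)ctx;
    fb[(uint32_t)y * FB_W + x] = color;
}
```

Porting to a display that is driven over a serial bus would mean supplying a different `set_pixel` function; `gui_fill_rect` and everything built on it would not change.<br />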

Creating an extensible GUI<br />

framework is an involved<br />

process. The HAL enables<br />

developers to build the<br />

framework using a language<br />

other than assembly, leading to<br />

faster code development and<br />

greater reusability. Designing the<br />

framework using C, however,<br />

can still require substantial<br />

development resources.<br />

Ideally, rather than design<br />

a framework from scratch,<br />

developers can use off-the-shelf<br />

software to minimize development<br />

investment. With the right tools,<br />

the GUI design cycle can be<br />

shortened from months to weeks.<br />

GWStudio from GeeseWare,<br />

for example, is a Java Framework<br />

with a wide array of preexisting<br />

GUI libraries providing a complete<br />

human-machine interface (HMI)<br />

development environment. Its Java<br />

engine based on IS2T MicroEJ®<br />

technology is specifically optimized<br />

for embedded applications that<br />

have limited memory, constrained<br />

peripherals, restricted network<br />

connectivity, and low power<br />

consumption requirements.<br />

Java brings many advantages to<br />

GUI-based design compared to<br />

working in C. Java was designed<br />

to facilitate GUI creation with an<br />

emphasis on reuse. In addition, its<br />

flexibility ensures a simple revision<br />

process that enables developers<br />

to quickly implement changes<br />

to existing designs. With the<br />

availability of Java for embedded<br />

applications, developers can<br />

leverage the benefits of Java in<br />

many applications:<br />

〉〉 Improved code portability<br />

and reuse<br />

〉〉 Development up to<br />

3 to 5 times faster than<br />

working in C<br />

〉〉 Equivalent performance to<br />

C-based designs; for example, less<br />

than 1 ms responsiveness for<br />

machine-to-machine (M2M)<br />

processes (e.g., Ethernet) or<br />

touchscreen latency<br />

〉〉 Greater functionality in a<br />

smaller footprint<br />

〉〉 Higher reliability and<br />

robustness by eliminating<br />

manual management of<br />

memory and exceptions<br />

〉〉 Operating system<br />

independence<br />

〉〉 Large development community<br />

Note that even though the GUI is<br />

written in Java, the main application<br />

can be based on C. This enables<br />

developers to introduce a GUI to an<br />

existing design without having to<br />

rewrite the application.<br />

Even with the framework in<br />

place, however, only half the job<br />

is done. Now the look and feel of<br />

the GUI needs to be designed.<br />

Intuitive Look and Feel<br />

Developing an effective GUI can<br />

be one of the most challenging<br />

aspects of system design. GUI<br />

design involves much more<br />

than simply arranging icons on<br />

a screen. To be intuitive, a user<br />

interface has to anticipate how a<br />

variety of different types of people<br />

are going to use the device.<br />

However, it can be very difficult<br />

to know how an end-customer<br />

is going to use a device from<br />

a developer’s isolated position<br />

behind his or her lab bench.<br />

For example, while it may be<br />

logical to a developer to group<br />

functions based on what part of<br />

the system they impact, users<br />

will interact with devices based<br />

on what they want it to do. If the<br />

function a person wants to use<br />

most frequently is buried under<br />

a series of icons, the overall<br />

experience will be frustrating. The<br />

UI has become the core factor<br />

determining the user experience.<br />

In today’s market where<br />

consumers have become quite<br />

sophisticated, a poorly designed<br />

GUI can mean failure for a product<br />

regardless of its other qualities.<br />

The reality is that developers<br />

don’t always know how<br />

prospective users will try to<br />

interact with a system. Ideally,<br />

the hardware should not come<br />

between users and how they<br />

want to use the device. In<br />

addition, to keep the interface<br />

from being cluttered and difficult<br />

to navigate, the screen needs<br />

to display as little information<br />

as possible while still displaying<br />

enough data to allow the user<br />

to easily and quickly make<br />

decisions. Tappable UI objects/<br />

elements must also be kept<br />

small yet remain large enough to be<br />

comfortable to select.<br />
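One common way to reconcile small on-screen elements with comfortable selection is to make the touch-sensitive area larger than the drawn icon. The sketch below illustrates this under stated assumptions: the 40-pixel minimum, the `rect_t` type, and the `widget_hit` function are all hypothetical, and a suitable minimum depends on the display's pixel density and the intended users.<br />

```c
#include <stdint.h>

#define MIN_TOUCH_PX 40  /* assumed minimum comfortable target size, in pixels */

typedef struct {
    int16_t x, y, w, h;  /* top-left corner, width, height */
} rect_t;

/* Returns 1 if a touch at (tx, ty) should select the widget. Widgets drawn
 * smaller than MIN_TOUCH_PX get an enlarged touch area centered on them. */
int widget_hit(rect_t r, int16_t tx, int16_t ty)
{
    if (r.w < MIN_TOUCH_PX) {
        r.x = (int16_t)(r.x - (MIN_TOUCH_PX - r.w) / 2);
        r.w = MIN_TOUCH_PX;
    }
    if (r.h < MIN_TOUCH_PX) {
        r.y = (int16_t)(r.y - (MIN_TOUCH_PX - r.h) / 2);
        r.h = MIN_TOUCH_PX;
    }
    return tx >= r.x && tx < r.x + r.w && ty >= r.y && ty < r.y + r.h;
}
```

With this approach a 20-pixel icon still occupies little screen space, while touches that land slightly outside its drawn edge continue to register.<br />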

The placement of icons and<br />

ordering of GUI elements is, at<br />

least at first, a fairly arbitrary<br />

process. However, a slider may<br />

end up being in an inconvenient<br />

location or be improperly sized<br />

for reasons that couldn’t be<br />

predicted during initial design<br />

stages. This won’t be clear until<br />

users are actually given a chance<br />

to use the interface.<br />

Designing an effective GUI<br />

involves many such intangible<br />

considerations that require direct<br />

feedback. With limited screen real<br />

estate, the UI must be selective<br />

and display only content that is<br />

relevant to the choices a user<br />

is currently considering. The<br />

application’s main function should<br />

be accessible quickly upon startup<br />

and always simple to return to.<br />

The final test for an intuitive GUI<br />

is that it must be obvious to users<br />

how to use it efficiently without a<br />

steep learning curve or more than<br />

a few minutes of training.<br />

It may take extensive testing<br />

with users for developers to<br />

understand how the GUI should<br />

be laid out. This likely won’t<br />

happen after bringing in just a<br />

single focus group but rather<br />

will involve many rounds that<br />

iteratively improve the ease-of-use<br />

of the interface. The design<br />

schedule needs to take into<br />

consideration that the GUI may<br />

need to be redesigned several<br />

times. The sooner accurate user<br />

feedback can be incorporated<br />

into the GUI design, the more<br />

confidence developers can<br />

have that major changes will<br />

not be required after significant<br />

engineering resources have<br />

been invested in implementing<br />

the design.<br />

GUI Testing<br />

The iterative aspect of GUI design<br />

is an important consideration when<br />

selecting a GUI toolset. The speed<br />

and ease with which developers<br />

can modify an existing GUI will<br />

determine how many design<br />

iterations the schedule will allow<br />

and, consequently, how well the GUI<br />

will capture actual user behaviors.<br />

Any testing process needs to<br />

enable stakeholders and end users<br />

to provide timely input<br />

into GUI design, preferably<br />

as early in the design cycle<br />

as possible. This means that<br />

developers need access to rapid<br />

prototyping capabilities before<br />

target hardware is available. The<br />

testing process should facilitate<br />

the capture of user behaviors<br />

and generation of use cases.<br />

To achieve this, GUI tools must<br />

accelerate design to shorten the<br />

time between test iterations.<br />

Traditionally, developers have<br />

created simulated environments<br />

for users to test. Often these<br />

“wireframe” simulators are<br />

independent tools that allow<br />

developers to put together a<br />

GUI but not necessarily one that<br />

accurately reflects how the final<br />

product will look or operate. For<br />

example, because the simulator<br />

is running on a high-speed<br />

PC, screen updates can be<br />

near instantaneous. Unless the<br />

simulator is able to emulate the<br />

MCU that will be used in the actual<br />

product, developers will be unable<br />

to verify whether the system is<br />

responsive enough to satisfy users.<br />

In fact, feedback from such testing<br />

may actually mislead developers<br />

and result in launch delays.<br />
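
One way a PC-based simulator might avoid misleadingly fast screen updates is to pace each frame to a time budget measured or estimated for the target MCU. The following is a minimal sketch of that idea, not a description of how any particular simulator works: the `FramePacer` class and the frame budget passed to it are hypothetical.<br />

```java
/** Hypothetical frame pacer that slows PC rendering to a target frame time. */
final class FramePacer {
    private final long targetFrameNanos;

    /** @param targetFrameMillis assumed time the target MCU needs per frame */
    FramePacer(long targetFrameMillis) {
        this.targetFrameNanos = targetFrameMillis * 1_000_000L;
    }

    /** Time still to wait after rendering took elapsedNanos; never negative. */
    long remainingNanos(long elapsedNanos) {
        return Math.max(0L, targetFrameNanos - elapsedNanos);
    }

    /** Renders one frame, then sleeps so the frame takes at least the target time. */
    void renderPaced(Runnable renderFrame) {
        long start = System.nanoTime();
        renderFrame.run();
        long remaining = remainingNanos(System.nanoTime() - start);
        if (remaining > 0) {
            try {
                Thread.sleep(remaining / 1_000_000L, (int) (remaining % 1_000_000L));
            } catch (InterruptedException e) {
                // Preserve the interrupt for the caller instead of swallowing it.
                Thread.currentThread().interrupt();
            }
        }
    }
}
```

Pacing of this kind only approximates responsiveness. It does not model memory limits or peripheral behavior, which is why emulating the target MCU itself gives more trustworthy feedback.<br />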

To ensure that the simulator<br />

matches how the interface will<br />

operate in production hardware<br />

as accurately as possible, the<br />

simulator needs to emulate the<br />

operation of the target MCU.<br />

Developing an embedded GUI<br />

on a PC capable of accurate<br />

emulation offers several benefits<br />

to developers (see Figure 2). In<br />

addition to speeding testing by<br />

eliminating the need to download<br />

new firmware to a target, the<br />

simulator provides several<br />

analysis capabilities to facilitate<br />

optimization and debugging:<br />

〉〉 Static and run-time analysis of<br />

timing and memory footprint<br />

〉〉 Functional code coverage<br />

〉〉 Task profiling and scheduling<br />

Figure 2: GUI design tools like the IS2T MicroEJ® simulator that is part of GWPack from GeeseWare emulate the target MCU to ensure consumer testing matches the operation of the production GUI as closely as possible.<br />

Ideally, testers need to have<br />

as realistic an experience as<br />

possible. This means working<br />

on a small screen the same size<br />

as the one that will be used for<br />

production. When running on a<br />

PC, the user may need to use<br />

a mouse rather than be able to<br />

touch the device screen. Even if<br />

a touchscreen is available, it is<br />

likely the wrong size or a monitor<br />

that the user cannot hold in his<br />

or her hand.<br />

To enable realistic testing,<br />

GeeseWare offers its GWPack<br />

development system. The<br />

GWPack includes a standalone,<br />

small-form-factor board<br />

based on either the <strong>STM32</strong>-F2<br />

or <strong>STM32</strong>-F4 that is prequalified<br />

and production-ready. With a 4.3”<br />

resistive or capacitive touchscreen,<br />

10 ms response time, and access<br />

in Java to all of the <strong>STM32</strong><br />

architecture’s peripherals and<br />

interfaces of the pack, the GWPack<br />

gives developers a fully operational<br />

Java platform upon which to carry<br />

applications from proof-of-concept<br />

to production faster than has been<br />

possible before.<br />

A complete framework is<br />

provided as part of GWPack<br />

that allows developers to dive<br />

immediately into interface<br />

development rather than writing<br />

low-level drivers and managing<br />

system resources to develop<br />

the Board Support Package<br />

(BSP). The IS2T MicroEJ®<br />

simulator for GeeseWare<br />

GWPack further allows<br />

developers to simulate their<br />

designs on a PC while emulating<br />

the operation of an <strong>STM32</strong> MCU.<br />

Key features include:<br />

〉〉 Built-in display tree and events<br />

management<br />

〉〉 Window and clickable area<br />

definition and management<br />

〉〉 Support for single- and two-finger<br />

gestures<br />

〉〉 Driving of low-level events to<br />

closest tree nodes<br />

〉〉 Sub-level display and event<br />

management at the node level<br />

〉〉 Fully customizable<br />

〉〉 Support for TCP/IP, UDP, and<br />

HTTP via the Java framework<br />

〉〉 The ability to reference all<br />

<strong>STM32</strong> peripherals directly<br />

in Java<br />

Because events are object<br />

dependent, developers can easily<br />

define—and redefine—interface<br />

operation, thus accelerating the<br />

ability to implement feedback<br />

into designs.<br />
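
The display tree and per-object events listed above can be pictured with a small sketch. This is not the GWPack or MicroEJ API: `UiNode`, `onTap`, and `dispatchTap` are hypothetical names, and the example assumes a tap goes to the deepest node under the touch point that has a handler, falling back to the nearest ancestor otherwise.<br />

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical display-tree node; not the GWPack or MicroEJ API. */
final class UiNode {
    final String name;
    final int x, y, width, height;           // bounds in screen coordinates
    final List<UiNode> children = new ArrayList<>();
    Consumer<String> onTap;                   // per-object handler, may be reassigned

    UiNode(String name, int x, int y, int width, int height) {
        this.name = name;
        this.x = x;
        this.y = y;
        this.width = width;
        this.height = height;
    }

    UiNode add(UiNode child) {
        children.add(child);
        return this;
    }

    boolean contains(int px, int py) {
        return px >= x && px < x + width && py >= y && py < y + height;
    }

    /**
     * Delivers a tap to the deepest node under the point that has a handler.
     * Returns the name of the node that handled it, or null if none did.
     */
    String dispatchTap(int px, int py) {
        if (!contains(px, py)) {
            return null;
        }
        // Children added later are treated as drawn on top, so search them first.
        for (int i = children.size() - 1; i >= 0; i--) {
            String handledBy = children.get(i).dispatchTap(px, py);
            if (handledBy != null) {
                return handledBy;
            }
        }
        if (onTap != null) {
            onTap.accept(name);
            return name;
        }
        return null;
    }
}
```

Because the handler lives on the object rather than in a central switch statement, changing what a button does after a round of user feedback means reassigning one `onTap` field while the rest of the tree stays untouched.<br />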

With the GWPack, developers<br />

can begin GUI development<br />

immediately with a low initial<br />

investment. The standalone board<br />

is also available in preproduction<br />

volumes to enable manufacturers<br />

to quickly mock up a large<br />

number of systems that can be<br />

put out into the field to capture<br />

customer feedback while target<br />

hardware is still being defined and<br />

built. Finally, for applications with<br />

volumes under 10,000 production<br />

units, the board is a cost-effective,<br />

off-the-shelf alternative<br />

to designing your own.<br />

For volume applications,<br />

GWPack enables<br />

manufacturers to complete a<br />

proof-of-design before investing<br />

in implementing a design in<br />

hardware and software. This<br />

ability to design the GUI in parallel<br />

with hardware can substantially<br />

reduce time-to-market.<br />

Critical Design<br />

Constraints<br />

Applications like industrial control,<br />

metering, home automation,<br />

medical, and smart energy don’t<br />

have the memory or processing<br />

capabilities of smart phones. As<br />

a consequence, they need a GUI<br />

that is specifically designed to<br />

minimize processing requirements<br />

and memory footprint. For<br />

example, implementing a GUI in<br />

an embedded Linux environment<br />

can be quite expensive: the Linux<br />

OS needs 32 MB RAM and<br />

8 MB Flash, and graphics libraries<br />

push Flash requirements up to<br />

50-60 MB, adding on the order of<br />

$100 to system cost.<br />

The framework supplied with<br />

GWPack keeps system<br />

memory requirements under<br />

tight control: a full GUI can<br />

fit in just 128 KB RAM and<br />

1 MB Flash. In addition, the<br />

framework does not require<br />

a real-time operating system<br />

(RTOS) or kernel to operate. For<br />

applications with more intensive<br />

graphical requirements, more<br />

Flash may be required.<br />
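
A rough calculation shows why graphics-heavy applications can outgrow a 1 MB Flash budget. The sketch below is an illustration under stated assumptions: `FlashBudget` is a hypothetical helper, and the 480 x 272 resolution, 16-bit color depth, and 512 KB code size in the usage note are invented for the example.<br />

```java
/** Hypothetical Flash budgeting helper; only the 1 MB budget comes from the article. */
final class FlashBudget {
    private FlashBudget() {}

    /** Bytes needed to store one uncompressed bitmap, rounded up to a whole byte. */
    static long bitmapBytes(int widthPx, int heightPx, int bitsPerPixel) {
        return ((long) widthPx * heightPx * bitsPerPixel + 7) / 8;
    }

    /** True when the code image plus all assets fit within the Flash budget. */
    static boolean fits(long budgetBytes, long codeBytes, long... assetBytes) {
        long total = codeBytes;
        for (long asset : assetBytes) {
            total += asset;
        }
        return total <= budgetBytes;
    }
}
```

Under these assumptions one uncompressed full-screen image occupies 261,120 bytes (255 KB). With 512 KB of code, two such images still fit in 1 MB of Flash, but a third does not, so compression or additional Flash would be needed.<br />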

The <strong>STM32</strong> architecture, with<br />

its 32-bit bus, DSP and FPU<br />

capabilities, and multi-layer bus<br />

fabric supporting simultaneous<br />

data transfers, provides more<br />

than enough processing capacity<br />

to handle the GUI, application,<br />

drivers, touchscreen, and<br />

communication ports all with<br />

a single chip. With its broad<br />

portfolio of MCUs based on the<br />

ARM Cortex-M0, M3, and M4<br />

cores, the <strong>STM32</strong> architecture<br />

enables developers to select<br />

a processor with the optimal<br />

blend of performance, memory,<br />

and peripherals for their<br />

application. ST also provides a<br />

variety of tools to accelerate the<br />

development of general-purpose<br />

Java-based systems with the ST<br />

Evaluation Kit.<br />

Touch-based GUIs can bring<br />

substantial value to a wide range<br />

of applications by bringing the<br />

look and feel of products up to<br />

date while also capturing more<br />

value through the user interface.<br />

In addition, they can improve<br />

system robustness by eliminating<br />

mechanical components while<br />

increasing device flexibility, since<br />

the UI can be modified over time to<br />

introduce new features without<br />

redesigning the enclosure.<br />

With the <strong>STM32</strong> family of MCUs<br />

and GUI design tools such<br />

as those from GeeseWare,<br />

manufacturers are able to<br />

cost-effectively introduce<br />

next-generation GUIs to new<br />

designs as well as to legacy<br />

products built on 8-, 16-, and<br />

32-bit MCUs. With the ability to<br />

design a fully functional system<br />

in weeks instead of months,<br />

developers can accelerate GUI<br />

design to enable a dramatic<br />

reduction in time-to-market.<br />
