STM32 Journal - Digikey
STM32 Journal - Digikey
STM32 Journal - Digikey
You also want an ePaper? Increase the reach of your titles
YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.
<strong>STM32</strong> <strong>Journal</strong><br />
Volume 1, Issue 2<br />
In this Issue:<br />
〉〉 Bringing 32-bit Performance to 8- and 16-bit Applications<br />
〉〉 Developing High-Quality Audio for Consumer Electronics Applications<br />
〉〉 Bringing Floating-Point Performance and Precision to Embedded Applications<br />
〉〉 Achieving Ultra-Low-Power Efficiency for Portable Medical Devices<br />
〉〉 Accelerating Time-to-Market Through the ARM Cortex-M Ecosystem<br />
〉〉 Introducing a Graphical User Interface to Your Embedded Application
<strong>STM32</strong> <strong>Journal</strong><br />
<strong>STM32</strong> <strong>Journal</strong><br />
Volume 2, Issue 2<br />
We Are Not Alone<br />
By Nicholas Cravotta, Technical Editor<br />
Table of Contents<br />
2 Editorial<br />
3<br />
11<br />
19<br />
26<br />
34<br />
42<br />
Bringing 32-bit<br />
Performance to 8- and<br />
16-bit Applications<br />
Developing High-<br />
Quality Audio for<br />
Consumer Electronics<br />
Applications<br />
Bringing Floating-<br />
Point Performance and<br />
Precision to Embedded<br />
Applications<br />
Achieving Ultra-Low-<br />
Power Efficiency for<br />
Portable Medical<br />
Devices<br />
Accelerating Timeto-Market<br />
Through<br />
the ARM Cortex-M<br />
Ecosystem<br />
Introducing a Graphical<br />
User Interface to Your<br />
Embedded Application<br />
When I got my first paycheck as<br />
an engineer nearly three decades<br />
ago, coding and layout weren’t<br />
exactly social activities. While<br />
there was a certain amount of<br />
team collaboration to decide what<br />
I would work on, the majority of<br />
what I did was by myself. When I<br />
decided to shed the ten pounds<br />
I had gained as a freshman, it<br />
was a similar story. I never found<br />
someone willing to go consistently<br />
to the gym with me, so I pressed<br />
those weights alone as well.<br />
It’s quite a different world<br />
today. Take at look at the Nike+<br />
FuelBand on the cover of this<br />
issue’s <strong>STM32</strong> <strong>Journal</strong>. Worn on<br />
your wrist, it records your every<br />
activity, not just when you’re on<br />
the treadmill.<br />
What makes the FuelBand such<br />
a ground-breaking product is<br />
how it brings people together.<br />
It doesn’t matter whether you<br />
work out at 2am or are in a<br />
strange city on travel, with<br />
this next-generation exercise<br />
monitor, you are never alone.<br />
Connected to your phone via<br />
Bluetooth, you can be in touch<br />
with exercise buddies all around<br />
the world through the Nike+<br />
online community.<br />
The Nike+ FuelBand is quite<br />
a feat of engineering. To<br />
differentiate between simple<br />
gestures and active motions<br />
requires complex signal<br />
processing capabilities. The<br />
device must also be constantly<br />
on since even you don’t know<br />
when you might jump into action.<br />
120 LEDs comprise the display<br />
and “Fuel” indicator, and the<br />
device can operate for up to four<br />
full days without recharging. It<br />
also weighs less than one ounce,<br />
including the batteries. Now<br />
that’s an efficient design.<br />
At the heart of the FuelBand is<br />
ST’s ultra-low power <strong>STM32</strong> L1<br />
microcontroller. In addition to<br />
providing the 32-bit performance<br />
and processing capacity required<br />
for advanced signal processing,<br />
the <strong>STM32</strong> architecture offers the<br />
real-time responsiveness, power<br />
efficiency, and highly integrated<br />
peripherals and memory required<br />
for even the most demanding<br />
embedded applications.<br />
With innovations like FuelBand<br />
and Nike+ technology, Nike has<br />
leveraged social networking to<br />
change the way we live together.<br />
Exercise, as a result, is no longer<br />
a solo endeavor.<br />
Neither, it turns out, is<br />
engineering. The network<br />
supporting the <strong>STM32</strong><br />
architecture enables a whole new<br />
level of collaboration. Design<br />
tools from companies like Keil,<br />
IAR Systems, and Micriµm are<br />
like having a team of experts<br />
sitting right next you. Need<br />
to extend a design by adding<br />
audio or a capacitive touch GUIbased<br />
interface? Just call upon<br />
partners like DSP Concepts and<br />
GeeseWare. And with the <strong>STM32</strong><br />
architecture based on the ARM<br />
Cortex-M0, M3, and M4 cores,<br />
you have access to a global<br />
ecosystem second to none.<br />
You can even ask questions of<br />
your fellow engineers at 2am<br />
or share your own hard-won<br />
experience through forums,<br />
blogs, and tweets.<br />
It truly is a different world we live,<br />
play, exercise, and work in.<br />
2
<strong>STM32</strong> <strong>Journal</strong><br />
Bringing 32-bit Performance<br />
to 8- and 16-bit Applications<br />
By Reinhard Keil, Director of MCU Tools, ARM Germany GmbH<br />
Shawn Prestridge, Senior Field Applications Engineer, IAR Systems<br />
Sean Newton, Field Applications Engineering Manager, STMicroelectronics<br />
Today’s embedded applications<br />
are being called upon to<br />
provide an increasing number<br />
of capabilities. More and more<br />
devices need to be connected,<br />
require greater precision, must<br />
offer a graphics-based interface<br />
with touch capabilities, utilize<br />
sophisticated signal processing,<br />
and support multimedia playback.<br />
In the past, developers were<br />
compelled by cost constraints<br />
to base their designs on 8- and<br />
16-bit architectures that limited<br />
performance. Now, with the<br />
availability of next-generation<br />
MCUs like the <strong>STM32</strong> F0 that<br />
provide 32-bit performance<br />
at 8-bit budget pricing, OEMs<br />
can bring substantial value to<br />
end-users without having to<br />
compromise functionality. In<br />
addition, powerful development<br />
tools like Keil’s MDK-ARM and<br />
IAR Embedded Workbench<br />
enable developers new to 32-<br />
bit programming to immediately<br />
exploit the full capabilities of the<br />
<strong>STM32</strong> F0 architecture.<br />
The 32-bit Advantage<br />
There are several ways in which<br />
the <strong>STM32</strong> F0 lowers product<br />
cost compared to 8- and 16-bitbased<br />
designs. Specifically,<br />
because these MCUs tend to be<br />
based on legacy architectures,<br />
they have many limitations that<br />
slow development by forcing<br />
designers to work around the<br />
architecture, so to speak. For<br />
example, to complete a 16 x 16<br />
multiplication for a processing<br />
algorithm, a 16-bit CPU requires<br />
four multiplies and several<br />
additions, depending upon the<br />
implementation. An 8-bit CPU<br />
would require significantly more<br />
cycles. With the <strong>STM32</strong> F0, this<br />
takes a single instruction.<br />
The result is code that makes<br />
better utilization of MCU<br />
resources, leading to faster<br />
operation, more performance per<br />
MHz, higher code density, and<br />
greater power efficiency. Since<br />
each instruction does more per<br />
clock cycle, applications can be<br />
written using less code. In addition<br />
to accelerating development,<br />
shorter code is easier to debug as<br />
well. Together, all of these benefits<br />
lead to lower system cost.<br />
Cost, however, is only one of<br />
the numerous advantages the<br />
<strong>STM32</strong> F0 has over 8- and 16-<br />
bit architectures. The <strong>STM32</strong><br />
F0 is a full embedded MCU<br />
built using the same <strong>STM32</strong><br />
DNA that the rest of the <strong>STM32</strong><br />
family has, including excellent<br />
real-time performance, DMA,<br />
high-resolution ADC and DAC<br />
peripherals, motor control<br />
timers, and connectivity<br />
interfaces. These integrated<br />
capabilities bring tremendous<br />
efficiency to cost-sensitive<br />
designs in a way that limited<br />
8- and 16-bit MCU architectures<br />
cannot (see Figure 1).<br />
For example, the availability of<br />
a 32-bit bus not only speeds<br />
data transfers and increases<br />
computing performance, it<br />
improves system reliability.<br />
Consider the challenge of reading<br />
a 12-bit DAC using an 8-bit bus<br />
where the CPU has to read the<br />
DAC twice to capture the entire<br />
sample. If an interrupt occurs<br />
between these reads, the DAC<br />
data may be overwritten by the<br />
next sample before the interrupt<br />
is completed and the second<br />
read can be executed. To prevent<br />
this, developers have to manually<br />
disable interrupts for every<br />
such “atomic” operation in an<br />
application. If even one instance<br />
is missed, this creates a potential<br />
for an intermittent error that will<br />
be extremely difficult to resolve.<br />
DMA: Moving Data<br />
Efficiently<br />
The <strong>STM32</strong> F0 is a modern<br />
architecture integrating the<br />
3
<strong>STM32</strong> <strong>Journal</strong><br />
Real-Time<br />
performance<br />
48 MHz/38 DMIPS,<br />
5 channels DMA<br />
mapped on 11 IPs +<br />
Bus Matrix allows Flash<br />
execution in parallel<br />
with DMA transfer<br />
Power<br />
efficiency<br />
5 µA in Stop mode<br />
2 µA in standby mode<br />
0.4 µA Vbat with RTC,<br />
1.8 V or 2...3.6 V supply,<br />
Fast wake-up time<br />
Figure 1 The <strong>STM32</strong> F0 is a full embedded MCU built using the same <strong>STM32</strong> DNA that the rest of the <strong>STM32</strong> family has and offers<br />
tremendous efficiency to cost-sensitive designs in a way that limited 8- and 16-bit MCU architectures cannot.<br />
latest in processing, power,<br />
and debugging technology. For<br />
example, multiple low power<br />
modes extend greater control<br />
over power consumption to<br />
achieve longer operating life for<br />
battery-operated and portable<br />
devices. In addition, the <strong>STM32</strong><br />
F0 offers advanced features,<br />
including full Direct Memory<br />
Access (DMA) and the ability to<br />
shut down the ADC between<br />
samples to further increase<br />
performance while lowering<br />
power consumption.<br />
<strong>STM32</strong> F0 Series: Key Features<br />
Superior and<br />
innovative<br />
peripherals<br />
1 Mbps I 2 C Fast mode+<br />
SPI with 4- to 16-bit data<br />
frame, HDMI CEC,<br />
16-bit 3-phase MC timer<br />
In general, 8-bit MCUs don’t<br />
have the powerful peripherals<br />
that higher performance MCUs<br />
tend to have. For example,<br />
DMA has become an essential<br />
peripheral for applications that<br />
need to move a great deal<br />
of data, whether as part of a<br />
processing algorithm, receiving<br />
data from an interface, playing<br />
back audio, or transferring<br />
graphics to the display. In a<br />
traditional 8-bit architecture,<br />
each word of data has to be<br />
moved by the CPU. In addition,<br />
Maximum<br />
integration<br />
Calendar RTC with<br />
independent supply,<br />
battery-backed RAM,<br />
separate analog supply,<br />
safety<br />
Extensive tools<br />
and software<br />
ARM + ST ecosystem<br />
(eval board, discovery,<br />
SW library)<br />
<strong>STM32</strong> F0 series<br />
L1, F0, F1, F2, F4 series: seamless migration amongst 300 pin-to-pin compatible part #s<br />
pointers need to be updated<br />
and a loop managed. Thus,<br />
every 8-bits of data takes<br />
several cycles of CPU time<br />
to move.<br />
With the DMA in the <strong>STM32</strong><br />
F0, an entire block of data can<br />
be moved without involving<br />
the CPU. After the program<br />
configures the transfer, the<br />
DMA manages moving the data<br />
in the background. In fact, the<br />
CPU can drop into a low power<br />
sleep mode while it waits for<br />
the transfer to complete. As<br />
a result, data transfers do not<br />
consume unnecessary CPU<br />
cycles and require less power to<br />
complete than for 8- and 16-bit<br />
architectures.<br />
The availability of a DMA<br />
controller can also greatly<br />
simplify and accelerate product<br />
development. Consider reading<br />
data off of a high-speed data<br />
interface such as I2C. Because<br />
of the load on the CPU, 8-bit<br />
developers have to work around<br />
the MCU’s architecture, using<br />
many interrupts to utilize the time<br />
between data reads. With the<br />
<strong>STM32</strong> F0, the CPU operates<br />
independently of the interface,<br />
allowing developers to program<br />
the CPU for other tasks without<br />
having to worry about missing an<br />
interrupt or losing data.<br />
Because the <strong>STM32</strong> architecture<br />
uses an internal bus matrix, the<br />
DMA can be used in conjunction<br />
with each of the different onchip<br />
memories as well as many<br />
of the peripherals. For example,<br />
the DMA can be configured to<br />
sample the ADC regularly over<br />
a period of time: a timer triggers<br />
the DMA to read the ADC and<br />
store the result in memory<br />
without involving the CPU.<br />
Once the operation is complete,<br />
the ADC shuts down until the<br />
4
<strong>STM32</strong> <strong>Journal</strong><br />
next sample time. In fact, the<br />
bus matrix combined with a<br />
5-channel DMA enables the<br />
<strong>STM32</strong> F0 to support execution<br />
of code from Flash in parallel<br />
with other memory-memory,<br />
peripheral-memory, or memoryperipheral<br />
DMA transfers.<br />
There are many tools to<br />
assist developers in taking<br />
advantage of the <strong>STM32</strong> F0’s<br />
DMA capabilities without<br />
requiring them to become<br />
DMA experts. The ARM DSP<br />
Cortex Microcontroller Software<br />
Interface Standard (CMSIS)<br />
library, for example, provides<br />
signal processing functionality<br />
that has been optimized for<br />
the <strong>STM32</strong> F0 and takes full<br />
advantage of the DMA.<br />
An intelligent compiler can<br />
also help developers exploit<br />
DMA technology to its fullest<br />
advantage. IAR Embedded<br />
Workbench, for example, offers<br />
a feature that will automatically<br />
rearrange program data to<br />
maximize the use of the DMA.<br />
This enables developers to<br />
achieve high efficiency without<br />
having to put much forethought<br />
into how to layout the data<br />
space. The compiler achieves<br />
this by analyzing how data<br />
is used by the application.<br />
Consider a program that<br />
copies two different data<br />
structures using DMA. Each<br />
copy operation requires a<br />
separate DMA operation.<br />
However, after the compiler<br />
collocates the data structures<br />
in memory, they can be copied<br />
with a single DMA transfer.<br />
Note that each MCU may use<br />
the DMA in a slightly different<br />
manner. Keil’s MDK-ARM,<br />
for example, abstracts how<br />
the DMA is used from the<br />
application through an API<br />
that prevents code from being<br />
tied to a particular processor.<br />
This enables developers to<br />
migrate applications to other<br />
<strong>STM32</strong> devices and know that<br />
code utilizing the DMA will still<br />
perform optimally.<br />
Writing 32-bit Code<br />
Moving from 8-bit to 32-bit<br />
assembly is not trivial, given<br />
the vastly different instructions<br />
32-bit architectures offer; i.e.,<br />
single-instruction, multiple data<br />
(SIMD) instructions work on<br />
multiple data to vastly accelerate<br />
processing. Even moving<br />
between 16-bit architectures<br />
is challenging given that the<br />
peripherals can differ and impact<br />
how application code is written.<br />
5
<strong>STM32</strong> <strong>Journal</strong><br />
Benchmark<br />
Application 8-bit Math 8-bit Matrix 8-bit Switch 16-bit Math 16-bit Matrix 16-bit Switch 32-bit Math<br />
Floating-Point<br />
Math<br />
Matrix<br />
Multiplication<br />
FIR filter<br />
MSP430 178 86 198 126 90 198 222 1102 136 980<br />
dsPIC 236 420 424 224 552 424 424 2020 464 2256<br />
PIC24 236 420 416 224 552 416 424 2020 464 2256<br />
16-bit<br />
H8/300H 344 412 444 352 482 478 574 1104 482 1392<br />
MaxQ20 230 252 192 204 328 184 288 1172 398 1478<br />
HCS12 83 188 162 76 262 174 323 2082 219 1917<br />
ATxmega64A1 118 398 338 174 490 350 300 1080 584 1362<br />
ARM7TDMI 636 392 452 636 396 452 620 1832 428 1528<br />
8051 233 398 305 452 504 493 909 2190 536 2056<br />
8-bit<br />
PIC18F242 170 324 208 286 692 282 542 1400 676 2006<br />
ATmega8 134 354 350 198 434 382 342 1088 490 1358<br />
<strong>STM32</strong> F0<br />
Compiler Option Speed Size Speed Size Speed Size Speed Size Speed Size Speed Size Speed Size Speed Size Speed Size Speed Size<br />
Code Size 110 94 68 52 98 120 114 106 68 52 98 120 112 108 640 636 268 84 550 550<br />
Figure 2 The <strong>STM32</strong> F0 delivers substantial code size reduction when comparing to other 8/16 architectures. This grid show the code size in bytes for various benchmark<br />
applications. Source: Benchmark applications and results for 8/16-bit: TI MSP430 Competitive Benchmarking (www.ti.com/lit/an/slaa205c/slaa205c.pdf). <strong>STM32</strong> F0<br />
code generated with MDK V4.23, MicroLib, and compiler optimization for execution speed or code size.<br />
The <strong>STM32</strong> F0 architecture<br />
facilitates a smooth migration to<br />
32-bits. The ability to develop<br />
in embedded C reduces the<br />
learning curve of moving to<br />
a new architecture. In many<br />
cases, engineers are already<br />
familiar with the ARM Cortex-M<br />
architecture. Developers can<br />
further ease migration by using<br />
a tool chain they are already<br />
familiar with, such as IAR<br />
Embedded Workbench and Keil's<br />
MDK-ARM. Finally, developing<br />
for the <strong>STM32</strong> F0 is simplified<br />
through the use of the ARM<br />
CMSIS libraries that abstract<br />
much of the underlying hardware<br />
from the application.<br />
Moving to the <strong>STM32</strong> F0 will<br />
result in a substantial reduction<br />
in code size because of the<br />
density possible with 32-bit<br />
instructions, on the order of<br />
30% (see Figure 2). With its<br />
32-bit address space, the<br />
<strong>STM32</strong> F0 also eliminates<br />
addressing and paging<br />
limitations that complicate<br />
memory management in 8-bit<br />
designs. For example, data<br />
6
<strong>STM32</strong> <strong>Journal</strong><br />
sets can be larger than a single<br />
page and there are no longer<br />
“far” addressing penalties.<br />
The use of object-oriented<br />
constructs, as is common with<br />
modern programming and<br />
modeling tools, can also be<br />
implemented without disruptive<br />
fragmentation.<br />
Without question, the best<br />
compiler is the human brain.<br />
Given enough time, a person<br />
can create a highly optimized<br />
program that no compiler can<br />
beat. Programming in assembly<br />
can also be more efficient than a<br />
C version of the same program.<br />
Time, however, is one of the<br />
resources of which developers<br />
don’t have a surplus. In addition,<br />
hand-written code can be<br />
extremely fragile; if the product<br />
specs change in a material<br />
way, many of a programmer’s<br />
optimizations will need to be<br />
completely reevaluated.<br />
The reality is that Keil’s MDK-<br />
ARM and IAR Embedded<br />
Workbench are smart enough<br />
to make excellent coding<br />
choices that might take a<br />
person weeks to evaluate. For<br />
example, how data is laid out<br />
impacts performance. There’s<br />
also the challenge of balancing<br />
optimization techniques like<br />
loop unrolling to memory<br />
footprint. A compiler can make<br />
these decisions for an entire<br />
program in just minutes. Each<br />
of these tools offers numerous<br />
optimization options it can<br />
perform automatically for the<br />
<strong>STM32</strong> F0 architecture that are<br />
significantly different than those<br />
typical with 8- and 16-bit MCUs.<br />
These options include data-flow<br />
optimizations such as common<br />
sub-expression elimination and<br />
loop optimizations such as loop<br />
combining and distribution.<br />
They also include advanced<br />
techniques like branch<br />
speculation and executing code<br />
out of sequence.<br />
These development tools for<br />
the <strong>STM32</strong> F0 give excellent<br />
results. Compiler efficiency<br />
compared to human coding<br />
has been estimated at 97%.<br />
Put another way, the cost of<br />
achieving that last 3% is on the<br />
order of weeks to months of<br />
development time. In addition,<br />
if a major design change<br />
is required, the compiler<br />
can complete a new set of<br />
optimizations with just a simple<br />
recompile.<br />
As a modern architecture, the<br />
<strong>STM32</strong> F0 is supported by<br />
similarly modern tools that<br />
utilize the latest advancements<br />
in compiler, debugger, and<br />
middleware technology to<br />
reduce development time and<br />
effort considerably. Being based<br />
on the Cortex-M architecture,<br />
the <strong>STM32</strong> F0 is backed by a<br />
larger ecosystem of tools and<br />
production-ready software than<br />
any other MCU architecture on<br />
the market. In addition, for many<br />
applications where the code<br />
base is small, the tools may be<br />
effectively free. For example,<br />
both IAR Embedded Workbench<br />
and Keil’s MDK-ARM are<br />
free when used for programs<br />
under 32 KB, thus enabling<br />
32-bit design with a low initial<br />
investment.<br />
Advanced Debugging<br />
While the ability to design<br />
demanding applications quickly<br />
is important, developers need<br />
debugging capabilities that<br />
can abstract the complexity of<br />
applications while still providing<br />
full visibility and control during<br />
run-time operation. In addition,<br />
many embedded markets,<br />
including medical and industrial,<br />
require that application software<br />
be certified as well.<br />
The integrated debug<br />
capabilities of the <strong>STM32</strong><br />
F0 provide many advanced<br />
capabilities that offer a superior<br />
debug experience compared<br />
to old-fashioned 8- and 16-bit<br />
architectures. For example, the<br />
<strong>STM32</strong> F0 architecture features<br />
ARM’s Coresight technology<br />
to help developers analyze,<br />
optimize, and verify program<br />
execution with minimal effort<br />
and cost.<br />
Coresight represents the<br />
latest in advanced debugging<br />
technology. Traditional MCUs<br />
offer only limited run/stop debug<br />
capabilities. To achieve greater<br />
visibility, an in-circuit emulator<br />
on the order of $1000s may be<br />
required, and a different pod will<br />
be required for each MCU in use.<br />
A few of the benefits Coresight<br />
provides which other MCU<br />
architectures do not include<br />
on-the-fly read/write access<br />
and trace capabilities at the<br />
instruction, data, and application<br />
level. As implemented in the<br />
<strong>STM32</strong> F0, Coresight also<br />
supports up to 4 hardware<br />
breakpoints and 2 watchpoints<br />
without requiring the use of<br />
intrusive monitoring techniques<br />
that can skew performance.<br />
Developers also have a choice<br />
of many low-cost debug<br />
adapters for the <strong>STM32</strong> F0. For<br />
example, the STLink in-circuit<br />
7
<strong>STM32</strong> <strong>Journal</strong><br />
debugger and programmer,<br />
which links the <strong>STM32</strong> F0<br />
target board to a PC via USB,<br />
is $25. For more advanced<br />
debugging, IAR Systems has the<br />
I-Jet debugger while Keil offers<br />
developers its ULINK2 and<br />
ULINKpro debuggers.<br />
These debuggers offer<br />
powerful capabilities that are<br />
often not available for 8- and<br />
16-bit designs. Keil MDK-<br />
ARM tools, for example,<br />
enable comprehensive code<br />
coverage, execution profiling,<br />
and performance analysis to<br />
ensure maximum performance<br />
efficiency. With the I-jet<br />
debugger, IAR Systems is able<br />
to offer non-intrusive power<br />
consumption monitoring at<br />
the board- and chip-level.<br />
Such “power debugging”<br />
enables developers to uncover<br />
opportunities to utilize and tune<br />
hardware to achieve the highest<br />
power efficiency.<br />
<strong>STM32</strong> F0 Features<br />
<strong>STM32</strong> F0 MCUs have been<br />
designed with real-time<br />
operating system (RTOS) and<br />
kernel support in mind to<br />
enable much tighter integration<br />
with RTOSes like Keil’s royaltyfree<br />
RTX. In a typical 8- or<br />
16-bit MCU, for example, the<br />
RTOS and application share<br />
the stack, and complex<br />
nesting problems can arise<br />
that overflow the stack and<br />
crash the system. The only<br />
way to avoid such issues is to<br />
overprovision the stack. The<br />
<strong>STM32</strong> F0, in contrast, has two<br />
stacks: one for the application<br />
and one for the RTOS. This<br />
prevents applications from<br />
compromising RTOS integrity.<br />
In addition, RAM overhead is<br />
much lower.<br />
Other companies basing MCUs<br />
on the Cortex-M0 architecture<br />
integrate only the minimum<br />
capabilities an MCU requires.<br />
ST is the only company to offer<br />
Cortex-M0-based MCUs with:<br />
〉〉 Easy Communication: Using<br />
the integrated DMA controller,<br />
the <strong>STM32</strong> F0 can support<br />
continuous I 2 C at a rate of 1<br />
Mbps without bogging down<br />
the CPU. This data rate isn’t<br />
possible to achieve on an 8-<br />
or 16-bit MCU that does not<br />
support DMA.<br />
〉〉 Advanced Digital and Analog<br />
Capabilities: The <strong>STM32</strong> F0<br />
integrates a wide range of<br />
IP to facilitate the design of<br />
sensing and control systems.<br />
For example, advanced timers<br />
enable the accurate output<br />
of complex AC waveforms.<br />
On-chip comparators simplify<br />
the design of sensors.<br />
The 12-bit, multi-channel<br />
ADC operating at up to 1<br />
MSample/s allows for fast<br />
and precise data acquisition,<br />
as well as improves system<br />
responsiveness to external<br />
events. Advanced timing<br />
control is enabled using the<br />
32-bit and 16-bit PWM timers<br />
with 17 capture/compare I/O<br />
mapped onto up to 28 pins.<br />
〉〉 Safety Ready: With shrinking<br />
process technologies and<br />
larger memories combined<br />
with frequently changing data,<br />
bit errors from cosmic rays<br />
can occur. For systems that<br />
must meet stringent safety<br />
compliance standards, the<br />
<strong>STM32</strong> F0 performs realtime,<br />
hardware-based RAM<br />
parity checking and 16-bit<br />
CRC verification for Flash to<br />
ensure the integrity of memory.<br />
RAM checks are performed<br />
automatically whenever<br />
memory is accessed. Flash<br />
verification is self-managed,<br />
enabling developers to confirm<br />
program integrity upon startup<br />
and when updating firmware<br />
to verify that no bits have been<br />
flipped since they were written.<br />
Double Sourcing<br />
In recent years, the<br />
industry has seen<br />
shortages when devices<br />
are manufactured in a<br />
single location. To ensure<br />
its customers will always<br />
have uninterrupted<br />
access to product,<br />
ST employs a double<br />
sourcing strategy in<br />
which all <strong>STM32</strong> devices<br />
are manufactured in at<br />
least two fabs in different<br />
parts of the world. This<br />
prevents product supply<br />
from being vulnerable to<br />
environmental factors that<br />
shut down a particular<br />
production fab. It also<br />
enables ST to meet any<br />
unanticipated rise in<br />
demand more easily by<br />
shifting production among<br />
multiple fabs.<br />
〉〉 Reliability: The <strong>STM32</strong> F0<br />
integrates two watchdog<br />
timers, one of which is a<br />
windowed watchdog timer.<br />
These timers, which can<br />
operate in low power modes<br />
as well, provide a higher level<br />
9
<strong>STM32</strong> <strong>Journal</strong><br />
1.8<br />
1.6<br />
1.4<br />
1.2<br />
1<br />
0.8<br />
0.6<br />
0.4<br />
0.2<br />
0<br />
<strong>STM32</strong> F0 Benchmark Positioning<br />
CoreMark/MHz<br />
of reliability not available in<br />
most 8- and 16-bit MCUs. A<br />
Clock Security System (CSS)<br />
enables systems to switch to<br />
internal RC-based clocking in<br />
case of external clock failure<br />
to ensure systems can shut<br />
down gracefully rather than<br />
catastrophically.<br />
〉〉 Optimized Communications:<br />
The <strong>STM32</strong> F0 supports the<br />
HDMI Consumer Electronics<br />
Communication (CEC)<br />
protocol. Important for<br />
devices targeted for consumer<br />
markets, this peripheral<br />
enables devices to have smart<br />
control over multiple HDMI<br />
lines. For devices needing<br />
Competitor A (8/16-bit)<br />
Competitor B (8-bit)<br />
Competitor C (16-bit)<br />
Competitor D (16-bit)<br />
<strong>STM32</strong> F0 (Cortex-M0)<br />
Figure 3 The <strong>STM32</strong> F0 provides an optimal balance of cost, performance, and peripherals for<br />
embedded applications.<br />
remote control capabilities,<br />
ST provides a full infrared<br />
firmware library.<br />
〉〉 Memory: Memory capacity<br />
ranges from 16 KB to<br />
128 KB Flash<br />
〉〉 1.8V Ready: The <strong>STM32</strong><br />
F0 can interface directly to<br />
1.8 to 3.6 V-based devices.<br />
This eliminates the need<br />
for additional conditioning<br />
circuitry 8- and 16-bit<br />
MCUs require.<br />
〉〉 Capacitive Touch Sensing:<br />
To add touch to 8- and 16-<br />
bit MCU-based designs, a<br />
second processor is typically<br />
required. With the <strong>STM32</strong><br />
F0, developers can easily<br />
introduce capacitive touch<br />
sensing to applications,<br />
with up to 18 keys and<br />
slider/wheel configurations,<br />
all with a single chip. In<br />
addition, touch sensing can<br />
be implemented with zero<br />
CPU loading when using the<br />
charge transfer method.<br />
Overall, the <strong>STM32</strong> F0 provides<br />
an optimal balance of cost,<br />
performance, and peripherals<br />
for embedded applications<br />
(see Figure 3). Rather than tie<br />
developers to a proprietary<br />
architecture with limited tools<br />
and support, ST offers the<br />
industry’s widest Cortex-M<br />
portfolio with more than 300<br />
compatible devices across the<br />
entire <strong>STM32</strong> family.<br />
With code-, pin-, and peripheralcompatibility<br />
across the <strong>STM32</strong><br />
family, developers can leverage<br />
Cortex-M0-based designs to<br />
M3- and M4-based MCUs<br />
with unparalleled flexibility. For<br />
example, applications designed<br />
using the <strong>STM32</strong> F0 are easily<br />
migrated to the <strong>STM32</strong> F2 and<br />
<strong>STM32</strong> F4. With Keil’s MDK-<br />
ARM and IAR Embedded<br />
Workbench, developers just need<br />
to change the MCU selection<br />
and the compiler handles all of<br />
the details by recompiling the<br />
code. This enables developers<br />
to easily migrate to an MCU with<br />
more performance, memory,<br />
and peripherals without rewriting<br />
the application. As a result,<br />
developers can leverage the<br />
same application and tool chain<br />
across an entire product line and<br />
a variety of MCUs.<br />
Similarly, developers have the<br />
option of designing code on<br />
the <strong>STM32</strong> F2 or F4 with the<br />
intention of later downsizing<br />
to the <strong>STM32</strong> F0. This enables<br />
design to take place on a<br />
platform with the highest<br />
performance and memory to<br />
accelerate proof-of-concept<br />
design. Once the design has<br />
settled, developers can optimize<br />
it for the <strong>STM32</strong> F0.<br />
With the <strong>STM32</strong> F0, ST offers a<br />
compelling alternative to 8- and<br />
16-bit devices. For the same<br />
price, developers get more<br />
performance, higher resolution<br />
peripherals, better tools,<br />
wider support, accelerated<br />
development, and faster timeto-market.<br />
To explore how the<br />
new <strong>STM32</strong> F0 can bring the<br />
benefits of 32-bit technology<br />
to your designs, the <strong>STM32</strong> F0<br />
Discovery Kit is available now<br />
for less than $10.<br />
10
<strong>STM32</strong> <strong>Journal</strong><br />
Developing High-Quality Audio for<br />
Consumer Electronics Applications<br />
By Paul Beckmann, CEO/CTO, DSP Concepts<br />
Dragos Davidescu, Chief System Architect, STMicroelectronics<br />
John Knab, Application Engineer, STMicroelectronics<br />
With the falling cost of<br />
high-performance MCUs,<br />
manufacturers are considering<br />
adding digital audio functionality<br />
to more and more consumer<br />
devices and other embedded<br />
applications. Their goal is to<br />
support the wide variety of media<br />
sources users want to access<br />
such as an iPhone, Internet<br />
radio, external USB devices, and<br />
SD cards.<br />
Achieving high-quality sound<br />
output, however, is nontrivial.<br />
Sound quality depends<br />
greatly upon the final system<br />
configuration, making it difficult<br />
to design even when prototype<br />
hardware is available. In<br />
addition, implementing realtime<br />
digital signal processing<br />
algorithms introduces a<br />
whole new set of concerns<br />
for developers used to MCUbased<br />
design. These include<br />
implementing advanced filters<br />
and processing algorithms,<br />
handling fixed-point issues,<br />
using DSP-like instructions,<br />
and optimizing complex<br />
algorithms for speed, MIPS,<br />
memory, and power.<br />
In this article, we’ll show how<br />
developers can leverage MCUbased<br />
digital signal processing<br />
(DSP) and floating-point unit<br />
(FPU) capabilities to enable<br />
real-time audio playback,<br />
implement enhanced algorithms,<br />
convert between multiple<br />
clock domains, manage highspeed<br />
communications without<br />
impacting audio quality, optimize<br />
designs to balance quality and<br />
cost, and manage other system<br />
tasks such as a graphical<br />
user interface, all with a single<br />
microcontroller.<br />
Consumer Audio<br />
Traditionally, introducing audio<br />
to an embedded application<br />
requires digital signal-processing<br />
capabilities beyond the capabilities<br />
of most MCUs. Even a “simple”<br />
product like an iPod speaker dock<br />
requires a significant number of<br />
advanced audio algorithms to<br />
achieve full performance:<br />
Spatial enhancement: In<br />
an iPod docking station, the<br />
speakers may be only 12-18<br />
inches apart. To create a more<br />
spacious, rich sound, spatial<br />
enhancement is required is<br />
to compensate for the close<br />
proximity of the speakers.<br />
Multi-channel audio: For<br />
systems supporting more than<br />
two speakers, the stereo input<br />
signal requires processing to<br />
create the additional audio<br />
channels.<br />
Equalization: Speakers need to<br />
be equalized to achieve better<br />
sound quality. If the speakers<br />
in use change, the equalization<br />
needs to be adjusted as well.<br />
Developers can employ a<br />
variety of equalization methods,<br />
including graphic and parametric<br />
equalization. For higher-end<br />
applications, developers may<br />
even want to design their own<br />
equalization algorithms using a<br />
tool like MATLAB.<br />
Peak limiting: Speakers exhibit<br />
nonlinearity at louder sound<br />
levels. By applying a time-varying<br />
gain and carefully controlling the<br />
peak levels, the system can play<br />
louder with a minimum amount<br />
of distortion.<br />
Boost: When listening to music<br />
at low volume levels, much<br />
detail, and therefore depth, can<br />
be lost. Boosting of the bass and<br />
certain other frequencies at low<br />
volume levels using loudness<br />
compensation or perceptual<br />
volume-control techniques can<br />
significantly improve perceived<br />
sound quality.<br />
Level matching: Level matching<br />
eliminates the need for users to<br />
adjust the volume for each song<br />
11
<strong>STM32</strong> <strong>Journal</strong><br />
when shuffling through a large<br />
library of albums.<br />
Digital audio has commonly<br />
been implemented in consumer<br />
electronics and embedded<br />
applications using a second<br />
processor dedicated to this task.<br />
To meet market cost pressures,<br />
however, manufacturers need to<br />
be able to process audio on the<br />
host CPU.<br />
In general, it is easier to<br />
implement audio on an MCU<br />
than it is to implement real-time<br />
responsiveness and connectivity<br />
on a DSP. DSPs, while excellent<br />
at processing audio, don’t have<br />
the peripherals or interrupt<br />
responsiveness required for realtime<br />
systems. DSP architectures<br />
are also typically designed for<br />
high-end signal processing<br />
and massive parallelism that<br />
exceeds the requirements of the<br />
typical consumer application. In<br />
addition, DSPs are not designed<br />
to support communication<br />
interfaces like USB, SD cards, or<br />
Wi-Fi, so a DSP-based docking<br />
station would still require a<br />
second processor to handle<br />
connectivity.<br />
With the introduction of DSP<br />
capabilities to MCU instruction<br />
sets, MCUs now have the<br />
advanced math processing<br />
System<br />
Power supply<br />
1.2 V regulator<br />
POR/PDR/PVD<br />
Xtal oscillators<br />
32 kHz + 4 ~26 MHz<br />
Internal RC oscillators<br />
32 kHz + 16 MHz<br />
PLL<br />
Clock control<br />
RTC/AWU<br />
SysTick timer<br />
2x watchdogs<br />
(independent and window)<br />
51/82/114/140 I/Os<br />
Cyclic redundancy<br />
check (CRC)<br />
Control<br />
2x 16-bit motor control<br />
PWM<br />
Synchronized AC timer<br />
10x 16-bit timers<br />
2x 32-bit timers<br />
capabilities required to handle<br />
not only basic audio processing<br />
but the advanced algorithms<br />
required to improve quality as<br />
well. In addition, rather than<br />
requiring developers to handcode<br />
assembly as is typical for<br />
DSP-based designs, MCUs offer<br />
ease-of-use and faster time-tomarket<br />
through C programming<br />
and application libraries.<br />
MCUs are also specifically<br />
ART Accelerator<br />
<strong>STM32</strong> F4<br />
ARM Cortex-M4<br />
168 MHz<br />
Floating-point unit (FPU)<br />
Nested vector<br />
interrupt<br />
controller (NVIC)<br />
MPU<br />
JTAG/SW debug/ETM<br />
Multi-AHB bus matrix<br />
16-channel DMA<br />
Crypto/hash processor<br />
3DES, AES 256<br />
SHA-1, MD5, HMAC<br />
True random number generator (RNG)<br />
architected to provide short and<br />
deterministic interrupt latency as<br />
well as ultra low-power operation<br />
for battery-powered applications.<br />
The <strong>STM32</strong> MCU + Audio<br />
Architecture<br />
The <strong>STM32</strong> architecture from<br />
ST has been designed to<br />
bring 32-bit MCU capabilities<br />
to a wide range of consumer<br />
audio applications, including<br />
Up to 1-Mbyte Flash memory<br />
Up to 192-Kbyte SRAM<br />
FSMC/SRAM/NOR/NAND/CF/LCD<br />
parallel interface<br />
80-byte + 4-Kbyte backup SRAM<br />
512 OTP bytes<br />
Connectivity<br />
Camera interface<br />
3x SPI, 2x I 2 S, 3x I 2 C<br />
Ethernet MAC 10/100<br />
with IEEE 1588<br />
2x CAN 2.0B<br />
1x USB 2.0 OTG FS/HS<br />
1x USB 2.0 OTG FS<br />
SDIO<br />
6x USART<br />
LIN, smartcard, lrDA,<br />
modem control<br />
Analog<br />
2-channel 2x 12-bit DAC<br />
3x 12-bit ADC<br />
24 channels / 2.4 MSPS<br />
Temperature sensor<br />
Figure 1 ST has expanded its <strong>STM32</strong> MCUs beyond the base Cortex-M architecture with a variety of integrated peripherals<br />
to create a wide range of MCUs that optimize performance, memory, and cost for nearly every embedded application.<br />
multimedia speakers, docking<br />
stations, and headphones. The<br />
<strong>STM32</strong> F4, based on an ARM<br />
Cortex-M4 core operating at<br />
up to 168 MHz, also integrates<br />
capabilities such as DSP<br />
instructions and a floating-point<br />
unit to allow manufacturers<br />
to produce consumer audio<br />
applications offering quality<br />
playback at the lowest cost<br />
(see Figure 1).<br />
12
<strong>STM32</strong> <strong>Journal</strong><br />
DSP application example: MP3 audio playback<br />
General Purpose MCUs<br />
Min<br />
Max<br />
Discrete DSPs<br />
Cortex-M4<br />
Specialised Audio DSPs<br />
0 5 10 15 20 25 30<br />
MHz required for MP3 decode (smaller is better!)<br />
DSP Concept<br />
Figure 2 With its Cortex-M4 core, the <strong>STM32</strong> F4 offers excellent audio processing capabilities<br />
that exceed the performance of many general-purpose MCUs and discrete DSPs.<br />
The <strong>STM32</strong> F4 offers excellent<br />
audio processing capabilities<br />
(see Figure 2). With its rich<br />
peripheral integration, a single<br />
<strong>STM32</strong> F4 can provide a costeffective,<br />
single-chip solution for<br />
implementing embedded audio<br />
that combines performance,<br />
ease-of-use, connectivity, and<br />
signal processing to achieve<br />
quality audio playback. Key<br />
capabilities of the <strong>STM32</strong> F4<br />
for accelerating audio design,<br />
enhancing performance, and<br />
lowering system cost include:<br />
Digital Signal Processing<br />
Instructions: With the <strong>STM32</strong><br />
F4, developers have access<br />
to up to 105 DSP-specific<br />
instructions. These instructions<br />
include single-cycle multiplyand-accumulate<br />
(MAC),<br />
saturated arithmetic, and both<br />
8- and 16-bit SIMD integer<br />
operations. Its architecture is<br />
designed to enable high-quality<br />
audio in consumer electronics<br />
and embedded applications in a<br />
more cost-effective manner than<br />
is possible with DSPs.<br />
13
<strong>STM32</strong> <strong>Journal</strong><br />
Floating-Point Unit: All <strong>STM32</strong><br />
F4 devices also have an<br />
integrated floating-point unit<br />
(FPU). While signal-processing<br />
algorithms can be implemented<br />
using fixed-point arithmetic,<br />
this approach adds complexity<br />
in that underflow and overflow<br />
need to be manually managed. In<br />
addition, fixed-point processing<br />
offers less dynamic range than<br />
floating-point, which impacts<br />
many audio functions. With<br />
the integrated FPU, there is no<br />
penalty for retaining this precision.<br />
Code based on floating-point can<br />
also be substantially faster and<br />
requires less memory than fixedpoint<br />
code.<br />
32-bit Efficiency: The bus<br />
size of the processor has a<br />
tremendous impact on both<br />
performance and power<br />
efficiency. Even if audio<br />
samples are streaming at 16<br />
bits, the system still needs<br />
32-bits to store intermediate<br />
computations. A 16-bit MCU<br />
or DSP, for example, requires<br />
seven operations (4 multiplies<br />
and 3 additions) just to complete<br />
a single 32 x 32 multiplication.<br />
The <strong>STM32</strong> F4 can execute<br />
a 32-bit MAC (multiply and<br />
accumulate) with only one<br />
single-cycle operation.<br />
7-layer 32-bit multi-AHB bus matrix<br />
Instructions<br />
Cortex-M4<br />
with CPU<br />
120 168 MHz<br />
Data<br />
64 Kbytes SRAM<br />
System<br />
Generalpurpose<br />
DMA1<br />
8 channels<br />
DMA_MEM1<br />
DMA_P1<br />
Multi-Layer Bus Fabric:<br />
The key to real-time signal<br />
processing is maintaining<br />
efficient data flow. In a<br />
consumer audio device,<br />
however, the MCU must<br />
move not only signal data but<br />
manage program memory,<br />
communication ports, and other<br />
system tasks. The complexity<br />
and real-time nature of audio<br />
algorithms also requires<br />
them to be integrated with<br />
application code to ensure that<br />
Generalpurpose<br />
DMA2<br />
8 channels<br />
DMA_MEM2<br />
DMA_P2<br />
Ethernet<br />
MAC 10/100<br />
DMA<br />
USB OTG<br />
HS<br />
DMA<br />
100 Mbit/s 480 Mbit/s<br />
12.5 MByte/s 60 MByte/s<br />
Bus masters<br />
neither core application tasks<br />
nor audio playback compromise<br />
each other.<br />
The <strong>STM32</strong> architecture is<br />
designed to minimize this<br />
problem so that developers<br />
do not need to spend time<br />
resolving potential conflicts.<br />
This is achieved through the low<br />
interrupt service overhead of<br />
<strong>STM32</strong> devices combined with<br />
the multi-layer bus fabric that<br />
allows multiple DMA transactions<br />
to occur simultaneously without<br />
Bus Slaves<br />
FSMC<br />
AHB2 peripheral<br />
AHB1 peripheral<br />
SRAM 16 Kbytes<br />
SRAM 112 Kbytes<br />
672 MByte/s D D<br />
Flash<br />
672 MByte/s I I 1 Mbyte<br />
ART<br />
Accelerator<br />
AHB1/APB1<br />
AHB1/APB2<br />
Figure 3 The multi-layer matrix that interconnects <strong>STM32</strong> MCUs with peripherals and memory enables simultaneous transfer between multiple<br />
masters and slaves without requiring involvement from the CPU. This provides <strong>STM32</strong> MCUs with a tremendous interconnect<br />
capacity that eliminates peripheral and memory access bottlenecks for the highest operating performance.<br />
burdening the CPU. Figure<br />
3 shows the high level of<br />
parallelism that can be achieved<br />
through simultaneous transfers<br />
over a multi-layer fabric:<br />
〉〉 Program code is executed<br />
from Flash with data stored in<br />
SRAM (red)<br />
〉〉 The compressed audio stream<br />
is received over USB and<br />
stored in SRAM (green).<br />
〉〉 CPU with DSP and FPU<br />
functionality accesses the<br />
14
<strong>STM32</strong> <strong>Journal</strong><br />
compressed audio stream for<br />
decompression and signal<br />
processing (green)<br />
〉〉 Decompressed MP3 data is<br />
sent from the CPU to SRAM<br />
(yellow)<br />
〉〉 Audio data is output to I2S<br />
through DMA (orange)<br />
〉〉 Graphical icons are transferred<br />
from Flash to the display<br />
through DMA (blue)<br />
Communications Interfaces:<br />
Users want to be able to access<br />
audio data from different sources<br />
and over different interfaces.<br />
With the right mix of interfaces—<br />
including USB (host and device),<br />
Ethernet for Internet Radio,<br />
SDIO, and external memory—<br />
developers can create flexible<br />
devices that support a wide<br />
range of usage models.<br />
In addition to being able to<br />
receive data without loading<br />
the CPU, developers need to<br />
be able to address the many<br />
issues related to streaming<br />
audio, including lost packets and<br />
lack of feedback controls. For<br />
example, USB feedback controls<br />
to prevent under and overflow of<br />
the audio buffer are not always<br />
used or well implemented. This<br />
can result in lost or dropped<br />
packets that impact audio<br />
quality. To overcome this<br />
limitation, developers can utilize<br />
sample-rate conversion (SRC).<br />
SRC is also useful for converting<br />
between audio speeds (i.e., clock<br />
domains) while maintaining audio<br />
fidelity, compensating for slight<br />
mismatches in clock speed, or<br />
for mixing audio from different<br />
sources. For applications that<br />
need SRC, the <strong>STM32</strong> F4<br />
requires only 10% utilization,<br />
leaving plenty of headroom for<br />
other signal-processing tasks.<br />
Multiple Clock Sources:<br />
Consumer audio systems require<br />
a number of different clock<br />
domains—including the CPU,<br />
USB, and I2S—that have fixed<br />
frequencies and need to be<br />
accurate as well as free of jitter.<br />
Trying to use the same clock<br />
for each of these can impact<br />
precision. For example, it is<br />
straightforward to achieve a clean<br />
clock at 168 MHz for the CPU,<br />
44.1 KHz for an I2S interface or<br />
48 MHz for USB but not for all<br />
three using a single clock source.<br />
The <strong>STM32</strong> F4 integrates two<br />
PLLs for increased clocking<br />
flexibility. The main PLL is used<br />
to generate the system clock<br />
and the second PLL is available<br />
to generate the accurate clocks<br />
needed for high-quality audio.<br />
Complete audio system<br />
<strong>STM32</strong> F2<br />
CPU load<br />
<strong>STM32</strong> F4<br />
CPU load<br />
Flash<br />
footprint<br />
RAM<br />
footprint<br />
MP3 decoder 17% 6% 23k 12344<br />
MP3 encoder 22.5% 9% 25k 16060<br />
WMA decoder 17.5% 6% 45k 36076<br />
AAC+ v2 decoder 25% 11% 54k 87000<br />
Channel mixer 2.5% 2% 0.6k 16<br />
Parametric Equalizer 16% 12% 2k 300<br />
Loudness Control 4.5% 3.5% 3.25k 632<br />
SRC 22.5% 10% 17.5k 1880<br />
Figure 4 When computing 16- and 32-bit DSP functions, the<strong>STM32</strong> F4 offers a 25–70%<br />
improvement. As a result, systems can drop into sleep mode faster to conserve<br />
power or run more algorithms to further improve audio quality.<br />
In addition to being able to receive data<br />
without loading the CPU, developers need<br />
to be able to address the many issues<br />
related to streaming audio, including lost<br />
packets and lack of feedback controls.<br />
For example, USB feedback controls to<br />
prevent under and overflow of the audio<br />
are not always used or well implemented.<br />
15
<strong>STM32</strong> <strong>Journal</strong><br />
CMSIS DSP library to accelerate<br />
development. The CMSIS<br />
DSP library includes a large<br />
number of DSP and floatingpoint<br />
functions optimized for<br />
the algorithms commonly used<br />
in audio applications. This<br />
library is supplied by ARM for<br />
processors built around the<br />
Cortex-M4 processor. DSP<br />
Concepts is the company that<br />
wrote the CMSIS DSP library.<br />
They have leveraged their<br />
intimate knowledge of the library<br />
to create the audio blocks that<br />
make up their signal processing<br />
design tool, Audio Weaver.<br />
Figure 5 Audio Weaver from DSP Concepts offers a GUI-based development environment that<br />
enables developers to design the signal flow for their application by selecting processing<br />
blocks and connecting them using a drag-and-drop editor.<br />
The ability to source different<br />
clock domains enables designs<br />
based on the <strong>STM32</strong> architecture<br />
to maintain a permanent USB<br />
connection and avoid audio<br />
synchronization issues.<br />
Integrated Audio Interfaces:<br />
The <strong>STM32</strong> F4 has two fullduplex<br />
I2S standard stereo<br />
interfaces offering less than<br />
0.5% sampling frequency error.<br />
There is also an external clock<br />
input to the I2S peripheral if<br />
an external high-quality audio<br />
PLL is preferred. In addition to<br />
simplifying design, integrating<br />
the I2S interfaces reduces<br />
component count, board size,<br />
and system cost.<br />
MCU Peripherals: The <strong>STM32</strong><br />
architecture includes all of the<br />
real-time peripherals required for<br />
even the most demanding MCUbased<br />
application.<br />
The combination of the <strong>STM32</strong><br />
F4’s capabilities brings a new<br />
level of performance to audio<br />
applications. Performing<br />
a long 32-bit multiply or<br />
multiply-accumulate (MAC)<br />
operation on an <strong>STM32</strong> F2, for<br />
example, takes 3-7 cycles. With<br />
the <strong>STM32</strong> F4, this operation<br />
is performed in a single cycle.<br />
When computing 16- and 32-<br />
bit DSP functions, the <strong>STM32</strong><br />
F4 offers a 25-70% (see Figure<br />
4) improvement. As a result,<br />
systems can drop into sleep<br />
mode faster to conserve power<br />
or run more algorithms to further<br />
improve audio quality.<br />
In addition to the integrated DSP<br />
capabilities of the <strong>STM32</strong> F4,<br />
developers have access to the<br />
Audio Algorithm Design<br />
Audio Weaver enables<br />
developers to quickly design<br />
the audio processing portion<br />
of their system; i.e., everything<br />
that goes on between receiving<br />
an audio signal and outputting<br />
it. Audio Weaver offers a<br />
GUI-based development<br />
environment that enables<br />
developers to design the signal<br />
flow for their application by<br />
selecting processing blocks<br />
and connecting them using<br />
a drag-and-drop editor (see<br />
Figure 5). Each block has handoptimized<br />
code behind it, and<br />
the tool automatically creates<br />
the required data structures.<br />
16
<strong>STM32</strong> <strong>Journal</strong><br />
Because complex functions are<br />
built from base audio functions,<br />
the final code executes with<br />
no performance or efficiency<br />
losses compared to handcoding<br />
from scratch.<br />
When algorithm code is written<br />
by hand, each design iteration<br />
requires substantial time<br />
investment since the code must<br />
be optimized and tuned to<br />
see what its actual impact on<br />
sound quality and processing<br />
load are. With Audio Weaver,<br />
the design cycle is much faster,<br />
giving developers the ability to<br />
explore more configurations in<br />
their efforts to increase sound<br />
quality while reducing system<br />
cost. Code is highly optimized<br />
for MIPS and memory usage,<br />
supports floating-point<br />
processors such as the <strong>STM32</strong><br />
F4, offers flexible deployment<br />
modules, and does not require<br />
an RTOS to operate. The library<br />
includes over 150 different<br />
audio blocks, including thirdparty<br />
IP.<br />
With tools like Audio Weaver, it<br />
has become possible to create<br />
highly tuned audio applications<br />
without engineers needing<br />
to have a deep knowledge<br />
of audio processing. For<br />
companies new to audio,<br />
complete reference designs are<br />
available, with assistance from<br />
DSP Concepts to tune them<br />
for the final production system.<br />
Companies that are comfortable<br />
with audio processing can<br />
work with individual audio<br />
blocks that provide basic<br />
functionality and build them<br />
into higher-level processing<br />
algorithms. Even sophisticated<br />
companies can accelerate<br />
design using Audio Weaver as<br />
it provides a framework with<br />
core components that not only<br />
jumpstarts design with highly<br />
optimized code but provides a<br />
development environment that<br />
facilitates fast prototyping and<br />
tuning. For these companies the<br />
value-add of Audio Weaver is<br />
faster time-to-market.<br />
Accelerating Optimization<br />
To speed design, Audio Weaver<br />
supports cross-platform<br />
development. The ability to run<br />
the same algorithms on a PC as<br />
on the <strong>STM32</strong> F4 gives engineers<br />
a powerful environment in<br />
which to design and tune the<br />
software in parallel with hardware<br />
development. Once target<br />
hardware is available, the code<br />
can be retargeted for the <strong>STM32</strong><br />
F4 and final optimizations made,<br />
resulting in significant time-tomarket<br />
savings.<br />
During final optimization,<br />
developers can profile the<br />
MIPS and memory required by<br />
each audio processing stage.<br />
This enables engineers to<br />
measure how much a particular<br />
improvement in sound quality will<br />
cost in terms of CPU utilization<br />
to determine the most efficient<br />
use of processing resources<br />
when many functions have to<br />
operate simultaneously.<br />
When algorithm code is written by hand, each design<br />
iteration requires substantial time investment since the<br />
code must be optimized and tuned to see what its actual<br />
impact on sound quality and processing load are. With Audio<br />
Weaver, the design cycle is much faster, giving developers<br />
the ability to explore more configurations in their efforts to<br />
increase sound quality while reducing system cost.<br />
Consider the use of different<br />
order filters to equalize the<br />
speakers. A lower-order filter,<br />
for example, may provide a<br />
frequency response that is 3 dB<br />
off of the ideal response while a<br />
higher order filter is off by only<br />
1 dB. The relative difference<br />
in CPU utilization between<br />
these two filters can be used<br />
17
<strong>STM32</strong> <strong>Journal</strong><br />
to determine where to allocate<br />
CPU resources to maximize<br />
sound quality.<br />
At the end of the day, however,<br />
audio quality is not about<br />
response graphs but how it<br />
actually sounds to people. With<br />
many development systems,<br />
engineers have to make<br />
adjustments to code, recompile,<br />
and download code before they<br />
can hear a new configuration.<br />
However, to assess the impact of<br />
a lower-order filter on quality, for<br />
example, developers need to be<br />
able to hear both configurations<br />
right after each other.<br />
Audio Weaver solves this<br />
problem by supporting a<br />
tuning interface that can<br />
change filter characteristics<br />
in real-time. With the ability to<br />
configure and switch between<br />
multiple settings with the click<br />
of a button, developers can<br />
compare two sets of speaker<br />
equalizations or different spatial<br />
processing. Note that the<br />
tuning interface is seamless<br />
and transparent, compared to<br />
instrumenting code that can<br />
impact quality because of extra<br />
loading on the CPU.<br />
The ability to tune quickly and<br />
easily without recompiling can<br />
substantially shorten the time<br />
it takes to optimize a system.<br />
Flexible tuning also simplifies<br />
the optimization process for<br />
developers new to audio.<br />
Note that audio applications are<br />
not comprised solely of audio<br />
processing. To accelerate system<br />
design, DSP Concepts also<br />
provides an extensive range of<br />
software functionality beyond its<br />
extensive audio module library,<br />
including:<br />
〉〉 Real-time kernel<br />
〉〉 Audio I/O management<br />
〉〉 PC/host control interface<br />
〉〉 Boot loader<br />
〉〉 Update manager<br />
〉〉 Flash file system<br />
System-level Design<br />
One of the challenges to<br />
adding audio to embedded<br />
designs is that while many MCU<br />
manufacturers offer reference<br />
designs, audio is typically<br />
not one of the applications<br />
supported.<br />
To address this shortcoming,<br />
ST has invested significantly in<br />
creating digital audio resources<br />
for its customers in order to<br />
offer complete audio reference<br />
designs as well as tools that<br />
enable the design of quality<br />
audio optimized for the <strong>STM32</strong><br />
architecture. For example,<br />
ST and its partners offer a<br />
variety of evaluation boards<br />
with audio capabilities. ST also<br />
offers several docking station<br />
reference designs that provide<br />
a representative design that<br />
can be used in a wide range of<br />
embedded applications.<br />
For Apple Made for iPod (MFI)<br />
licensees, ST offers the Apple<br />
iAP application, a complete<br />
solution based on <strong>STM32</strong> F2 and<br />
<strong>STM32</strong> F4 devices to deliver a<br />
high-quality music experience.<br />
The Apple iAP application<br />
support both simple accessory<br />
and audio streaming accessory<br />
for iPod, iPhone, and iPad<br />
devices. Components include:<br />
〉〉 Either the <strong>STM32</strong>2xG-EVAL<br />
or <strong>STM32</strong>4xG-EVAL board<br />
to which developers connect<br />
their Apple Authentication<br />
Coprocessor (ACP) circuit<br />
〉〉 Free Apple “iPod Accessory<br />
Protocol” (iAP) firmware with<br />
Lingoes for authentication and<br />
control/information data<br />
〉〉 Free USB Host Library with<br />
USB Host HID class for control<br />
and information data<br />
For audio streaming accessories,<br />
the Apple iAP application also<br />
supports:<br />
〉〉 Free USB Host Library USB<br />
Host Audio classes<br />
〉〉 Remote iPod/iPhone/iPAD<br />
control<br />
〉〉 Digital audio streaming<br />
〉〉 Music tag extraction<br />
〉〉 Flash card reader capabilities,<br />
such as using an SD card or<br />
MMC, that can decode audio<br />
files from this media. Optimized<br />
decoders are provided for this<br />
purpose free of charge<br />
Today’s consumer audio devices<br />
are complex systems that<br />
require both high performance<br />
to support quality playback<br />
and flexibility to meet rapidly<br />
changing market expectations.<br />
With its high performance<br />
core, efficient multi-layer bus<br />
fabric enabling simultaneous<br />
data transactions, and the right<br />
mix of MCU peripherals and<br />
connectivity, the <strong>STM32</strong> F4 is<br />
an ideal architecture for many<br />
embedded and consumer audio<br />
applications. Developers can<br />
now design systems offering<br />
synchronized digital audio<br />
playback of the highest quality<br />
using a single MCU.<br />
18
<strong>STM32</strong> <strong>Journal</strong><br />
Bringing Floating-Point Performance and<br />
Precision to Embedded Applications<br />
By Olivier Ferrand, Application Engineer, STMicroelectronics<br />
Stephane Rainsard, Application Engineer, STMicroelectronics<br />
Abdelhamid Ghith, Application Engineer, STMicroelectronics<br />
Hardware-based floating-point<br />
capabilities have long been<br />
an option on high-end MPUs<br />
and DSPs designed to serve<br />
as computational workhorses.<br />
Embedded systems based<br />
on MCUs, however, have<br />
classically used a fixed-point<br />
implementation.<br />
There are many reasons for<br />
this. The types of simple<br />
calculations embedded<br />
systems needed to make<br />
could be handled sufficiently<br />
using fixed-point math. The<br />
resulting code was not only<br />
well-suited to an MCU’s native<br />
mathematical capabilities, it<br />
resulted in faster execution and<br />
was more memory efficient than<br />
an equivalent floating-point<br />
implementation would be. The<br />
only disadvantage of using<br />
fixed-point was the tradeoff<br />
between range and precision,<br />
which could be tolerated in<br />
most cases.<br />
Some of today’s embedded<br />
systems are completely different<br />
from their early predecessors.<br />
They operate at high clock<br />
speeds and need to perform<br />
more complex calculations than<br />
ever before. While processing<br />
and memory efficiency are still<br />
primary design considerations,<br />
a more useful range and higher<br />
precision have become essential<br />
in many applications.<br />
For example, to output high<br />
quality audio, MCUs need<br />
to support a wide dynamic<br />
range so that the system<br />
sounds good at both low and<br />
high frequencies. Medical<br />
applications, such as heart-rate<br />
monitors and glucose meters,<br />
need to measure more subtle<br />
signals and quantities. For<br />
industrial applications, higher<br />
precision in the MCU enables<br />
developers to use lower quality<br />
components in other parts of<br />
the system, reducing overall<br />
system cost. And with the<br />
increasing use of capacitivesensing<br />
technology, nearly<br />
every embedded application<br />
can benefit from detecting user<br />
inputs with greater accuracy,<br />
especially on displays with<br />
limited surface area.<br />
Floating-Point vs<br />
Fixed-Point<br />
The balanced combination of<br />
range and precision in a floatingpoint<br />
number comes from its<br />
separation of the exponent and<br />
mantissa. When a number is<br />
stored in a fixed-point format,<br />
the position of the exponent is<br />
assumed and all of the bits are<br />
reserved for the mantissa (i.e., the<br />
actual digits used to represent<br />
the number). The issues that<br />
arise with fixed-point formats<br />
are similar to those associated<br />
with reading an ADC: the<br />
wider the range of values to be<br />
represented, the less resolution<br />
there is at the lower end of the<br />
range. If very small changes in<br />
value need to be captured, the<br />
range must be more narrow.<br />
Developers, then, must choose<br />
between range and resolution.<br />
Floating-point addresses these<br />
issues by dedicating a few bits to<br />
an exponent to track the decimal<br />
point. For example, the singleprecision<br />
format supported<br />
by the <strong>STM32</strong> F4 uses one bit<br />
for the sign and 8 bits for the<br />
exponent, leaving 23+1 bits for<br />
the mantissa (the normalized<br />
format used adds an implicit<br />
bit to the 23 bits stored in the<br />
floating-point number). As such,<br />
a single-precision floating-point<br />
number gives a more useful<br />
range and precision combination<br />
compared to 8-, 16-, or even<br />
32-bit fixed-point numbers<br />
which either give a wide range<br />
with greatly reduced precision<br />
or higher precision with a much<br />
smaller range.<br />
19
<strong>STM32</strong> <strong>Journal</strong><br />
The 32-bit single-precision format<br />
is part of the IEEE 754 Floating-<br />
Point Arithmetic standard. This<br />
standard represents decades<br />
of experience and provides a<br />
common approach for supporting<br />
floating-point arithmetic that<br />
unifies processors, coding tools,<br />
and high-level design tools.<br />
Specifically, the 754 standard is<br />
the basis for the floating-point<br />
data types used in C. In turn,<br />
these floating-point C formats are<br />
used in code generated by highlevel<br />
modeling/Meta language<br />
tools like MATLAB and Scilab.<br />
One of the key benefits of using<br />
floating-point is that it enables<br />
developers to make more<br />
efficient use of C and powerful<br />
algorithm development tools that<br />
have previously been reserved<br />
for DSP- and MPU-based<br />
design. Generally speaking,<br />
floating-point numbers are easy<br />
to manipulate in C. High-level<br />
tools further accelerate and<br />
simplify development by enabling<br />
developers to describe complex<br />
algorithms in equation form and<br />
then quickly generate efficient<br />
C code rather than requiring<br />
them to hand-code algorithms<br />
in assembly. In addition, these<br />
high-level tools offer tremendous<br />
flexibility in allowing developers<br />
to rapidly implement changes<br />
to algorithms without having<br />
to completely rewrite and reoptimize<br />
algorithm code. The<br />
end result is significant savings<br />
in development time and cost<br />
savings. Such tools also offer a<br />
powerful means for developers<br />
to test and validate applications.<br />
Software-based<br />
Floating-Point<br />
In the past, developers had<br />
a limited number of ways in<br />
which they could utilize floatingpoint<br />
technology in MCUbased<br />
designs to increase<br />
computational precision.<br />
Introducing a second processor<br />
to handle calculations was<br />
often too expensive to consider.<br />
Alternatively, while the design<br />
could be migrated to a powerful<br />
MPU or DSP, neither type of<br />
architecture is as well-suited<br />
for the real-time demands of<br />
embedded systems as an MCU.<br />
The availability of highperformance<br />
MCUs made it<br />
possible for developers to bring<br />
in a floating-point library to<br />
perform calculations in software.<br />
While such libraries enabled<br />
developers to use floating-point<br />
algorithms, they came at the<br />
expense of significantly reducing<br />
system throughput.<br />
Such libraries also tended to<br />
be quite large and introduced<br />
significant processing overhead.<br />
For example, every operation<br />
between two numbers had to<br />
first align the numbers to have<br />
the same exponent and then,<br />
after performing the operation,<br />
round the result and code it<br />
back into the fixed-point format.<br />
While all of these actions were<br />
performed by the floating-point<br />
library and were not directly<br />
visible to developers, they still<br />
resulted in significantly degraded<br />
performance, greater processing<br />
latency, and an increased<br />
memory footprint.<br />
If none of these alternatives was<br />
feasible and developers still<br />
needed the flexibility that highlevel<br />
modeling tools bring to the<br />
design process, they had the<br />
alternative to manually convert<br />
the floating-point operations in<br />
the C code generated by the tool<br />
into a fixed-point implementation.<br />
The downside of this approach<br />
is the significant time investment<br />
required to adapt the code<br />
combined with the inflexibility to<br />
easily modify the algorithm later in<br />
the design cycle.<br />
The <strong>STM32</strong> F4<br />
Advantage: Integrated<br />
Floating-Point Technology<br />
To stay competitive, developers of<br />
precision and high-performance<br />
embedded systems need to<br />
have access to floating-point<br />
functionality without sacrificing<br />
performance or memory efficiency<br />
To stay competitive, developers of<br />
precision and high-performance embedded<br />
systems need to have access to floatingpoint<br />
functionality without sacrificing<br />
performance or memory efficiency.<br />
through the use of a softwarebased<br />
floating-point library or<br />
having to adapt the code by hand<br />
to a fixed-point implementation.<br />
With the <strong>STM32</strong> F4 MCU<br />
architecture, developers have<br />
20
<strong>STM32</strong> <strong>Journal</strong><br />
the option of bringing floatingpoint<br />
efficiency to an extensive<br />
range of low-cost embedded<br />
applications. The <strong>STM32</strong> F4<br />
integrates a floating-point unit<br />
(FPU) to execute these operations<br />
natively in hardware. The FPU is<br />
fully compliant with the IEEE.754<br />
standard and has its own 32-<br />
bit single-precision registers to<br />
handle operands and results.<br />
These registers can be viewed<br />
as double-word registers to<br />
enable more efficient load and<br />
store operations. The context<br />
of the FPU can be saved to<br />
the CPU stack using several<br />
methods based on the application<br />
architecture and whether registers<br />
need to be preserved or not.<br />
The FPU supports the five different<br />
classes of numbers defined by<br />
the 754 standard—normalized,<br />
denormalized, zeros, infinites,<br />
and NaNs (Not-a-Number). It also<br />
supports the five exceptions of the<br />
standard—overflow, underflow,<br />
inexact, divide by zero and invalid<br />
operation—allowing applications<br />
to handle operations such as trying<br />
to compute the square root of a<br />
negative number (i.e., resulting in<br />
NaN + invalid operation exception).<br />
Exceptions are “untrapped”,<br />
meaning that the FPU will return<br />
the result as specified by the 754<br />
standard and raise an exception<br />
float function1(float number1, float number2)<br />
{<br />
float temp1, temp2;<br />
}<br />
temp1 = number1 + number2;<br />
temp2 = number1/temp1;<br />
return temp2;<br />
# float function1(float number1, float number2)<br />
# {<br />
# float temp1, temp2;<br />
#<br />
# temp1 = number1 + number2;<br />
VADD.F32 S1,S0,S1<br />
# temp2 = number1/temp1;<br />
VDIV.F32 S0,S0,S1<br />
#<br />
# return temp2;<br />
BX LR<br />
# }<br />
Figure 1 There is a significant reduction in code size when an integrated FPU is available (code on left) than when one is not (code on right).<br />
flag. If needed, developers can also<br />
use the <strong>STM32</strong> F4 floating-point<br />
global interrupt to address the issue.<br />
The integrated FPU of the<br />
<strong>STM32</strong> F4 offers a number<br />
of advantages to embedded<br />
designers:<br />
〉〉 Access to the more useful<br />
range and precision that<br />
floating-point brings<br />
〉〉 Reduced coding complexity by<br />
being able to work with numbers<br />
in a more natural format<br />
FPU assembly code generation<br />
1 assembly instruction<br />
〉〉 Greater throughput compared to<br />
software floating-point libraries.<br />
〉〉 Accelerated application<br />
development as C code<br />
generated by high-level<br />
tools can be used without<br />
modification or wrappers<br />
〉〉 Smaller code footprint since<br />
instructions that used to be<br />
multiple lines of code in software<br />
libraries are now implemented<br />
with a single instruction<br />
# float function1(float number1, float number2)<br />
# {<br />
PUSH {R4,LR}<br />
MOVS R4,R0<br />
MOVS R0,R1<br />
MOVS R1,R4<br />
BL __aeabi_fadd<br />
MOVS R1,R0<br />
MOVS R0,R4<br />
BL __aeabi_fdiv<br />
POP {R4,PC}<br />
Call Soft-FPU<br />
〉〉 Simplified debugging as macro<br />
calls in floating-point libraries<br />
are eliminated<br />
Effectively, the <strong>STM32</strong> F4’s FPU<br />
reverses the value proposition<br />
between fixed- and floating-point<br />
for many MCU-based designs.<br />
Seamless Integration<br />
Figure 1 shows the difference<br />
between the assembly code<br />
generated when an FPU<br />
is available on an MCU as<br />
22
<strong>STM32</strong> <strong>Journal</strong><br />
void GenerateJulia_fpu(uint16_t size_x, uint16_t<br />
size_y, uint16_t offset_x, uint16_t offset_y, uint16_t<br />
zoom, uint8_t * buffer)<br />
{<br />
float tmp1, tmp2;<br />
float num_real, num_img;<br />
float radius;<br />
uint8_t i;<br />
uint16_t x,y;<br />
for (y=0; y
<strong>STM32</strong> <strong>Journal</strong><br />
Frame Zoom Duration<br />
with FPU<br />
Duration<br />
without<br />
FPU (microlib)<br />
Ratio<br />
Duration<br />
without<br />
FPU (no<br />
microlib)<br />
Ratio<br />
0 120 222 3783 17.04 2597 11.70<br />
1 110 194 3276 16.89 2243 11.56<br />
2 100 167 2794 16.73 1906 11.41<br />
3 150 298 5156 17.30 3558 11.94<br />
4 200 312 5412 17.35 3740 11.99<br />
5 275 296 5124 17.31 3540 11.96<br />
6 350 284 4905 17.27 3389 11.93<br />
7 450 289 4989 17.26 3448 11.93<br />
8 600 273 4705 17.23 3251 11.91<br />
9 800 267 4592 17.20 3173 11.88<br />
10 1000 261 4485 17.18 3100 11.88<br />
11 1200 255 4374 17.15 3023 11.85<br />
12 1500 242 4138 17.10 2860 11.82<br />
13 2000 210 3555 16.93 2455 11.69<br />
14 1500 242 4138 17.10 2860 11.82<br />
15 1200 255 4374 17.15 3023 11.85<br />
16 1000 261 4485 17.18 3100 11.88<br />
17 800 267 4592 17.20 3173 11.88<br />
18 600 273 4705 17.23 3251 11.91<br />
19 450 289 4989 17.26 3448 11.93<br />
20 350 284 4905 17.27 3389 11.93<br />
21 275 296 5123 17.31 3540 11.96<br />
22 200 312 5412 17.35 3740 11.99<br />
23 150 298 5156 17.30 3558 11.94<br />
24 100 167 2794 16.73 1906 11.41<br />
25 110 194 3276 16.89 2243 11.56<br />
Figure 3 This table shows the time spent for the calculation of the Julia set using different<br />
zooming factors. The presence of an FPU yields an increase in performance on<br />
the order of 11.5 to 17 faster with no modification to code required.<br />
100<br />
80<br />
60<br />
40<br />
20<br />
0<br />
FIR filter execution time (using CMSIS DSP library)<br />
32-bit float<br />
no FPU<br />
10X improvement<br />
Best compromise<br />
Development time vs.<br />
performance<br />
implement the wide variety<br />
of algorithms embedded<br />
applications require. Each<br />
accelerates different types<br />
of processing and together<br />
they complement each other<br />
to enable the most optimized<br />
implementation based on<br />
performance, memory, and ease<br />
of programming.<br />
The ease of migrating an<br />
existing application to<br />
seamlessly take advantage<br />
of the <strong>STM32</strong> F4’s integrated<br />
FPU capabilities is important<br />
32-bit float<br />
FPU<br />
17.9X improvement<br />
Best performance<br />
requires effort for proper<br />
data management<br />
16-bit fixed-point<br />
SIMD optimized<br />
Figure 4 The relative time it takes to execute a FIR filter is 10X better when utilizing the <strong>STM32</strong><br />
F4's FPU. If more headroom is needed, the use of the 16-bit fixed-point SIMD (single<br />
instruction, multiple data) optimized instructions that are part of the DSP capabilities<br />
of the <strong>STM32</strong> F4 increase performance by 17.9X.<br />
as well. Consider an <strong>STM32</strong> F2<br />
application executing a floatingpoint<br />
FIR filter based on the<br />
CMSIS DSP library available<br />
through ARM. Figure 4 shows<br />
the relative time it takes to<br />
perform the FIR filter 100 times<br />
on an <strong>STM32</strong> F2 with no FPU or<br />
DSP capabilities using a purely<br />
software implementation to<br />
make the necessary floatingpoint<br />
calculations.<br />
When the same code is compiled<br />
for the <strong>STM32</strong> F4 to leverage<br />
the FPU in hardware, there is<br />
24
<strong>STM32</strong> <strong>Journal</strong><br />
Figure 5 A top-level view of the Julia set.<br />
an immediate 10X improvement<br />
in relative performance (i.e.,<br />
factoring out the impact of<br />
clock speed). Just by changing<br />
processors and activating<br />
the FPU, there is a significant<br />
performance advantage. This<br />
one change to the design is<br />
enough to yield a tremendous<br />
amount of headroom for<br />
additional tasks, depending upon<br />
the application.<br />
If more headroom is needed,<br />
the performance of the FIR<br />
filter can be further improved<br />
through the use of the 16-<br />
bit fixed-point SIMD (single<br />
instruction, multiple data)<br />
optimized instructions that are<br />
part of the DSP capabilities of<br />
the <strong>STM32</strong> F4. For example,<br />
when using the SIMD<br />
optimized FIR algorithm of the<br />
CMSIS library from ARM, the<br />
performance improvement is<br />
17.9X, again not considering<br />
clock speed.<br />
Unparalleled Flexibility<br />
Traditionally, when designing a<br />
DSP-based system, developers<br />
have to choose between fixedand<br />
floating-point architectures.<br />
Floating-point comes at a price<br />
premium, and manufacturers<br />
have had to balance this<br />
added cost against the extra<br />
complexity in application<br />
development that comes when<br />
using fixed-point. While the<br />
FPU is an optional component<br />
of the ARM Cortex-M4<br />
architecture, ST has designed<br />
its <strong>STM32</strong> F4 family so that<br />
every device offers an FPU. In<br />
this way, developers always<br />
have access to floating-point<br />
performance and precision<br />
without compromise.<br />
The bottom line is that the<br />
<strong>STM32</strong> F4 provides developers<br />
with a flexible, high-performance<br />
architecture that offers the realtime<br />
responsiveness of an MCU<br />
with the precise floating-point<br />
and digital signal processing<br />
capabilities that today’s<br />
embedded designs need. The<br />
ability of the <strong>STM32</strong> F4’s FPU<br />
to perform fast mathematical<br />
computations on C-based float<br />
data is a key benefit for many<br />
application tasks that require<br />
precision, including loop control,<br />
audio processing, sensor signal<br />
conditioning, digital media<br />
decoding, and digital filtering, to<br />
name just a few.<br />
The availability of an FPU<br />
speeds complex algorithm<br />
development, all the way from<br />
high-level design tools down to<br />
software generation. Hardware<br />
native support for floating-point<br />
operations simplifies coding and<br />
substantially accelerates product<br />
development by enabling the<br />
most efficient implementation for<br />
any mathematical calculation.<br />
Code that has been generated<br />
by such tools to be used directly<br />
by the FPU will offer the highest<br />
level of performance.<br />
The faster processing an FPU<br />
brings to applications offers<br />
either more headroom to<br />
support new functionality or<br />
faster time-to-sleep for powersensitive<br />
applications. It also<br />
enables developers to introduce<br />
more complex processing and<br />
functionality to applications<br />
than was previously possible<br />
with traditional MCUs. As a<br />
result, when implementing a<br />
mathematical algorithm on an<br />
<strong>STM32</strong> F4, developers never have<br />
to choose between performance<br />
and development time.<br />
25
<strong>STM32</strong> <strong>Journal</strong><br />
Achieving Ultra-Low-Power Efficiency<br />
for Portable Medical Devices<br />
By Jean J. Labrosse, Founder, CEO and President, Micriµm<br />
Jim Lombard, Application Engineer, STMicroelectronics<br />
The availability of low-cost, fullfeatured<br />
microcontrollers has<br />
created a revolution in the health<br />
industry leading to equipment<br />
migrating from the hospital and<br />
into patients’ homes. Designing<br />
these portable medical devices<br />
presents new challenges for<br />
engineers, such as implementing<br />
precision analog processing<br />
requiring complex calculations<br />
and creating a system that is<br />
simple and comfortable to operate<br />
even for physically-challenged<br />
users. In addition, these devices<br />
have to be able to run as long as<br />
possible without having to change<br />
or recharge batteries, even after<br />
sitting on a warehouse shelf for up<br />
to 18 months. To meet the unique<br />
requirements of portable medical<br />
devices, developers need an<br />
advanced processor architecture<br />
that combines performance,<br />
ultra-low-power process<br />
technology, low-cost, and<br />
efficient power management and<br />
communications capabilities.<br />
Designing for Medical<br />
Applications<br />
Next-generation medical<br />
equipment is evolving along<br />
two primary paths: sports and<br />
personal health. Both markets<br />
require innovation that enables<br />
portable devices to collect more<br />
run-time data and complete<br />
advanced calculations to create<br />
a comprehensive profile of a<br />
person’s current condition. Within<br />
personal health, developers<br />
also need to understand the<br />
special requirements of emerging<br />
disposable devices.<br />
The portable medical device<br />
market is extremely strong.<br />
Consider that with 8.3% of the<br />
US population suffering from<br />
diabetes, a portable glucose<br />
meter is an essential tool for<br />
individuals who need to monitor<br />
their glucose for conceivably the<br />
rest of their lives. Similarly, an<br />
electrocardiogram (ECG) monitor<br />
enables people at risk for any<br />
number of conditions to record<br />
their ECG waveform at home. In<br />
addition to eliminating the need<br />
for multiple office visits, homebased<br />
testing gives doctors a<br />
more comprehensive patient<br />
history to work with.<br />
Portable handheld medical<br />
devices have a stringent set<br />
of requirements (see Figure 1.)<br />
Given that prospective users can<br />
be from all walks of life, devices<br />
must be as simple to use as<br />
possible, with limited or no setup<br />
required. They also need to be<br />
comfortable to use and operable<br />
even by physically-challenged<br />
users. In general, the display<br />
must be large for ease of reading<br />
and utilize a minimum number of<br />
buttons to avoid confusing users.<br />
Ideally, as many functions as<br />
possible need to be automated<br />
so that users don’t have to be<br />
trained how to use the device.<br />
If the device has a touchscreen,<br />
the GUI must be intuitive<br />
and have a limited number of<br />
〉〉 Easy and simple to operate<br />
〉〉 Large display<br />
〉〉 Minimal number of large<br />
operator buttons<br />
〉〉 Batteries are easy to<br />
change, easy to recharge,<br />
or are sufficient to last<br />
for the operating life of<br />
the device<br />
〉〉 Safe and accurate<br />
operation<br />
〉〉 Low cost<br />
〉〉 Low power<br />
〉〉 Simple connectivity<br />
〉〉 Audio feedback<br />
Figure 1 Common Product Requirements for<br />
Portable Medical Devices<br />
operating modes. Both low cost<br />
and low power consumption<br />
are critical as well, and devices<br />
need to be able to run as long<br />
as possible without having to<br />
change or recharge the battery.<br />
26
<strong>STM32</strong> <strong>Journal</strong><br />
An important trend in medical<br />
applications is the use of<br />
disposable devices. Highvolume<br />
devices such as heartrate<br />
monitors can provide<br />
medical professionals with<br />
important data that is useful in<br />
identifying issues before they<br />
become full-blown problems;<br />
e.g., by having a patient track<br />
his or her heart rate for a few<br />
weeks after an operation, a<br />
doctor can verify the patient’s<br />
successful recovery.<br />
The use of disposable devices<br />
offers many benefits. Rather than<br />
require patients to purchase a<br />
monitoring device designed to<br />
operate for years, the hospital<br />
can provide a “disposable”<br />
version that can perform the<br />
task reliably for weeks at a<br />
significantly lower cost. This<br />
strategy also enables hospitals<br />
to leverage innovations in<br />
technology faster.<br />
Designing a medical device<br />
for limited use, however,<br />
substantially shifts the design<br />
mindset. For example, device<br />
cost becomes significantly more<br />
important when the revenue<br />
stream from consumables is<br />
weeks rather than years. Devices<br />
also have to be ready to operate<br />
out-of-the-box, so parameters<br />
such as which test strips are<br />
going to be used need to be<br />
preprogrammed into devices by<br />
manufacturers. Power efficiency<br />
becomes more critical as well.<br />
Even though they will only be<br />
used for a handful of weeks,<br />
devices may first sit on the shelf<br />
for up to 18 months. During this<br />
time, the device is in a low power<br />
mode with the real-time clock<br />
running. With an ultra-low-power<br />
MCU, it is possible to achieve<br />
this without requiring the user to<br />
change batteries.<br />
The <strong>STM32</strong> Architecture<br />
With the <strong>STM32</strong> architecture,<br />
ST offers developers a variety<br />
of options for balancing cost,<br />
power, and performance.<br />
With its smaller die, reduced<br />
instruction set, and smaller<br />
memory footprint, the <strong>STM32</strong><br />
F0 provides excellent power<br />
efficiency at a very low cost.<br />
This is an ideal MCU for<br />
applications that don’t need<br />
to operate longer than six<br />
months or that use rechargeable<br />
batteries. For applications where<br />
power is tantamount, such as<br />
devices operating on a coin cell,<br />
the <strong>STM32</strong> L1 is optimized for<br />
ultra-low power performance.<br />
For devices needing more<br />
processing capabilities and<br />
27
<strong>STM32</strong> <strong>Journal</strong><br />
connectivity, the <strong>STM32</strong> F1,<br />
<strong>STM32</strong> F2, and <strong>STM32</strong> F4<br />
offer an increasing range of<br />
capabilities.<br />
The <strong>STM32</strong> F0 is based on the<br />
ARM Cortex-M0 core. While<br />
other manufacturers also offer<br />
MCUs based on this core,<br />
ST has integrated its “<strong>STM32</strong><br />
DNA” into the <strong>STM32</strong> F0 to<br />
create an MCU that provides<br />
efficient data processing with<br />
the key peripherals MCUbased<br />
applications require.<br />
The <strong>STM32</strong> F0 also offers DMA<br />
capabilities to accelerate data<br />
processing and enable the<br />
lowest power operation even<br />
when sampling the ADC at<br />
a high data rate. With MCUs<br />
that have a lower level of<br />
integration, developers have to<br />
make compromises, such as<br />
settling for an 8-bit ADC rather<br />
than having access to the 12-<br />
bit ADC of the <strong>STM32</strong> F0.<br />
For complex medical systems<br />
that involve extensive<br />
computations, power<br />
consumption can be reduced<br />
by introducing the <strong>STM32</strong> F0<br />
as a second processor. When<br />
a system is in a low power<br />
mode, for example, it must still<br />
manage the UI and incoming<br />
data over its interfaces. Using<br />
<strong>STM32</strong> L15x<br />
System<br />
Power supply<br />
Internal regulator<br />
POR/PDR/PVD/BOR<br />
Xtal oscillators<br />
32 kHz + 1 ~24 MHz<br />
Internal RC oscillators<br />
37 kHz + 16 MHz<br />
Internal multispeed<br />
ULP RC oscillator<br />
64 kHz to 4 MHz<br />
PLL<br />
Clock control<br />
RTC/AWU<br />
2x watchdogs<br />
(independent and window)<br />
37/51/83/109/115 I/Os<br />
Cyclic redundancy<br />
check (CRC)<br />
Voltage scaling 3 modes<br />
Control<br />
6 to 8x 16-bit timer<br />
1x 32-bit timer<br />
Abbreviations:<br />
AWU: Auto wakeup from halt<br />
BOR: Brown out reset<br />
I 2 C: Inter integrated circuit<br />
ARM Cortex-M3 CPU<br />
32 MHz<br />
Nested vector<br />
interrupt<br />
controller (NVIC)<br />
JTAG/SW debug<br />
Embedded Trace<br />
Macrocell (ETM)<br />
Memory protection<br />
unit (MPU)<br />
AHB bus matrix<br />
Up to 12-channel DMA<br />
Touch sensing<br />
Analog I/O groups<br />
Up to 39 touchkeys<br />
Display<br />
LCD driver<br />
(8x40 / 4x44)<br />
Encryption<br />
AES (128 bits)*<br />
PDR: Power down reset<br />
POR: Power on reset<br />
PVD: Programmable voltage detector<br />
32- to 384-Kbyte<br />
Flash memory, dual bank, RWW<br />
10- to 48-Kbyte SRAM<br />
Up to 128-byte backup data<br />
4- to 12-byte EEPROM<br />
Boot ROM<br />
Connectivity<br />
FSMC<br />
USB 2.0 FS<br />
3 to 5x USART<br />
2 to 3x SPI<br />
2x I 2 C<br />
SDIO<br />
Analog<br />
2x 12-bit DAC<br />
12-bit ADC<br />
Up to 40 channels<br />
2x comparators<br />
3x op-amps<br />
Temperature sensor<br />
Note: * <strong>STM32</strong>L16x only<br />
RTC: Real time clock<br />
SPI: Serial peripheral interface<br />
USART: Universal sync/async receiver transmitter<br />
Figure 2 The <strong>STM32</strong> L1 family is built on ST’s 130 nm ultra-low leakage process technology that minimizes node capacitance<br />
for ultra-low-power operation and efficiency.<br />
28
<strong>STM32</strong> <strong>Journal</strong><br />
the primary host processor to<br />
perform these operations will<br />
use substantially more power<br />
than having a more powerefficient<br />
<strong>STM32</strong> F0 dedicated<br />
to these tasks.<br />
For the highest power<br />
efficiency, ST offers its <strong>STM32</strong><br />
L1 family (see Figure 2). Based<br />
on the ARM Cortex-M3, the<br />
<strong>STM32</strong> L1 is built on ST’s 130<br />
nm ultra-low leakage process<br />
technology that minimizes node<br />
capacitance. This, combined<br />
with how the ARM Cortex<br />
RISC architecture reduces the<br />
overall number of active nodes,<br />
creates a powerful combination<br />
for ultra-low-power operation.<br />
Providing 32-bit performance at<br />
up to 32 MHz, <strong>STM32</strong> L1 MCUs<br />
offer up to 384 K Flash and 48<br />
K SRAM, as well as from 48 to<br />
144 pins to support high I/O<br />
applications.<br />
With the <strong>STM32</strong> F1, <strong>STM32</strong><br />
F2, and <strong>STM32</strong> F4 families,<br />
developers have access to a<br />
tremendous range of processing<br />
capacity and connectivity.<br />
The <strong>STM32</strong> F1 is based on<br />
a Cortex-M3 core running at<br />
up to 72 MHz. For higher-end<br />
applications, the <strong>STM32</strong> F4<br />
offers a 168 MHz Cortex-M4<br />
core with both integrated<br />
floating-point (FPU) and<br />
digital signal processing (DSP)<br />
capabilities. In addition, these<br />
MCUs have multiple DMAs<br />
and a multi-layer bus matrix to<br />
offload the CPU and maximize<br />
data throughput.<br />
Power Efficiency<br />
There are many ways the <strong>STM32</strong><br />
architecture optimizes power<br />
consumption. For example,<br />
the <strong>STM32</strong> L1 has an internal<br />
LDO regulator that can be<br />
programmed to three discrete<br />
voltage levels. Since energy<br />
consumed is proportional to the<br />
square of the core voltage, even<br />
a very small change in voltage<br />
can produce dramatic results.<br />
This allows developers to keep<br />
the <strong>STM32</strong> L1 running at 1.2V<br />
and only switch to a higher<br />
performance range when a<br />
specific task needs to run faster.<br />
Integrated DSP capabilities<br />
in the <strong>STM32</strong> L1 also speed<br />
complex algorithm processing.<br />
This is in addition to the<br />
power/performance advantage<br />
32-bit architectures have<br />
compared to an 8- or 16-bit<br />
MCUs. With its wider bus, the<br />
<strong>STM32</strong> can perform tasks such<br />
as moving memory or complex<br />
mathematics much faster<br />
and more efficiently as well.<br />
The <strong>STM32</strong> L1's high level of integration<br />
enables it to provide a single-chip solution,<br />
with the exception of a few analog signal<br />
conditioning circuits, for many portable medical<br />
applications, including glucose meters, heartrate<br />
monitors, and pulse oximeters.<br />
The <strong>STM32</strong> L1 also provides<br />
multiple, industry-leading lowpower<br />
modes (see Figure 3),<br />
dynamic voltage scaling, and<br />
the ability to run out of RAM<br />
and disable the Flash controller<br />
Run<br />
230µA/MHz<br />
From<br />
FLASH<br />
Range 3<br />
Run<br />
186µA/MHz<br />
From<br />
RAM<br />
Range 3<br />
Low Power<br />
Run<br />
@ 32 KHz<br />
to conserve power. It also<br />
integrates a programmable<br />
voltage detector that can<br />
monitor battery voltage using<br />
less current than an ADC.<br />
<strong>STM32</strong>L15x Ultra-Low Power Consumption<br />
CPU on<br />
Peripherals activated<br />
RAM & context preserved<br />
Backup registers preserved<br />
10.4µA<br />
6.1µA<br />
Low Power<br />
Sleep<br />
@ 32 KHz<br />
1.30µA/<br />
0.43µA<br />
Stop with<br />
or without<br />
RTC<br />
1.0µA/<br />
0.27µA<br />
Standby<br />
with or<br />
without<br />
RTC<br />
Figure 3 The <strong>STM32</strong> L1 provides multiple, industry-leading low-power modes to maximize<br />
device operating power efficiency.<br />
29
<strong>STM32</strong> <strong>Journal</strong><br />
The <strong>STM32</strong> L1’s high level of<br />
integration enables it to provide<br />
a single-chip solution, with<br />
the exception of a few analog<br />
signal conditioning circuits,<br />
for many portable medical<br />
applications, including glucose<br />
meters, heart-rate monitors,<br />
and pulse oximeters. For<br />
example, in a glucose meter<br />
(see Figure 4), the <strong>STM32</strong><br />
L1 can automatically wake<br />
from sleep when a test strip<br />
is inserted into the device. Its<br />
2-channel DAC can be used to<br />
generate a strip bias, enable<br />
strip calibration, and output<br />
audible instructions and test<br />
results. Timers accurately<br />
control the ADC trigger for<br />
sample measurement, onboard<br />
comparators measure correct<br />
sample staging, and the<br />
temperature sensor logs the<br />
ambient temperature for use in<br />
calculating results. Developers<br />
can also use the integrated<br />
comparators to create a powerefficient<br />
analog watchdog that<br />
monitors an input and wakes<br />
the <strong>STM32</strong> L1 when either the<br />
upper or lower threshold is<br />
exceeded (see Figure 5).<br />
A key manner in which the<br />
<strong>STM32</strong> architecture conserves<br />
power is through the ability of<br />
the CPU to sleep during ADC<br />
Glucose<br />
Test Strip<br />
Vout<br />
Cal<br />
Reference<br />
Temp<br />
Sensor<br />
Strip Detect<br />
<strong>STM32</strong> L1<br />
DMA Controllers<br />
Timers<br />
PWM<br />
1 x 12-bit ADC<br />
26 channels/<br />
1Msps<br />
2 x Comparators<br />
2 x 12-bit DAC<br />
sample capture. To achieve this,<br />
the entire analog data capture<br />
chain needs to be completely<br />
automated, with no need for<br />
CPU intervention at any point<br />
after the chain is initiated.<br />
Specifically, after the CPU<br />
configures and starts the auto<br />
Data EEPROM 4KB<br />
Up to 16KB SRAM<br />
64KB-128KB<br />
Flash Memory<br />
CORTEX TM -M3<br />
CPU 32 MHz<br />
With MPU<br />
8x40<br />
Segment LCD<br />
2 x I 2 C<br />
GPIOs<br />
sample capture, it enters sleep<br />
mode. Between samples, the<br />
ADC enters an automatic shut<br />
down mode until a timer triggers<br />
it. The ADC captures the current<br />
sample, using the DMA to store<br />
the data in SRAM, and then<br />
shuts down again. This process<br />
Power Supply<br />
Reg 1.8V/1.5V/1.2V<br />
POR/PDR/PVD/BOR<br />
RTC/AWU +<br />
80B Backup Regs<br />
USB 2.0 FS<br />
Figure 4 The <strong>STM32</strong> L1’s high level of integration enables it to provide a single-chip solution, with the exception of a few<br />
analog signal conditioning circuits, for many portable medical applications such as blood glucose meters.<br />
2 AA battery<br />
Internet<br />
Host PC<br />
LCD Panel<br />
8 x 40<br />
USER Buttons<br />
Voice<br />
repeats until the entire capture<br />
sequence is complete. The<br />
DMA will then wake the CPU to<br />
process the samples that have<br />
been stored in SRAM.<br />
The power savings can be<br />
significant using the unique<br />
ability of the <strong>STM32</strong> L1 to<br />
30
<strong>STM32</strong> <strong>Journal</strong><br />
Window comparator<br />
configuration switch<br />
Figure 5 Developers can use the integrated comparators to create a power-efficient analog<br />
watchdog that monitors an input and wakes the <strong>STM32</strong> L1 when either the upper or<br />
lower threshold is exceeded.<br />
CPU running at<br />
Input voltage<br />
Upper threshold:<br />
V REFINT<br />
= 1.22V<br />
Lower threshold:<br />
Multiple source<br />
COMP1<br />
–<br />
+<br />
COMP2<br />
+<br />
–<br />
ADC current consumption (in µA)<br />
(ADC in continuous mode, delay of 15 cycles between channel<br />
conversions, conversions last 1µs (16 cycles at 16MHz)<br />
ADC is running<br />
in Normal Mode **<br />
ADC is On<br />
in Power Saving Mode***<br />
16MHz (from HSI) 1453µA 630µA<br />
4MHz (from MSI)* 1453µA 445µA<br />
1MHz (from MSI)* 1000µA 258µA<br />
32kHz (from MSI)* 900µA 150µA<br />
* : HSI is On ** : PDI=PDD=0 *** : PDI=PDD=1<br />
Figure 6 The automatic shut down mode of the <strong>STM32</strong> L1 turns off the ADC for the majority of<br />
time between samples to provide dramatic savings in power consumption compared<br />
to when the ADC must be left on continuously.<br />
power down the ADC between<br />
samples. Consider an ECG<br />
monitor where the sampling rate<br />
is 1 KHz or one sample every 1<br />
ms. The ADC of the <strong>STM32</strong> L1<br />
consumes at most 900 µA and<br />
has a capture time of only 1 µs<br />
for a power duty cycle of 0.1%.<br />
If the <strong>STM32</strong> L1 were a typical<br />
MCU, the ADC would consume<br />
full power even when it is not<br />
capturing samples. Thus, if<br />
the system were using a 1 ms<br />
measurement window, the ADC<br />
would draw 900 µA during a<br />
power duty cycle of 100%.<br />
With the automatic power down<br />
control mode of the <strong>STM32</strong><br />
L1, however, the ADC is able<br />
to power down for the majority<br />
of time between samples.<br />
Figure 6 shows actual power<br />
consumption numbers for<br />
various operating conditions. The<br />
difference in power consumption<br />
is dramatic between when the<br />
ADC is left on continuously<br />
versus using automatic power<br />
down control saving mode,<br />
dropping in the lowest power<br />
consumption example from 900<br />
µA to 150 µA at 32 kHz.<br />
Note that the <strong>STM32</strong> L1 has<br />
the flexibility to scale the CPU<br />
clock while keeping a constant<br />
16 MHz clock available for ADC<br />
conversion. This means that the<br />
CPU can run at a frequency of 32<br />
kHz while the ADC maintains a 1<br />
MSPS sampling rate while only<br />
drawing 150 µA.<br />
Precision Analog<br />
Many medical applications require<br />
the ability to read very small<br />
signals. For example, an ECG<br />
monitor has to capture the very<br />
small electrical signal generated by<br />
the heart muscle while removing<br />
the 50 or 60 Hz noise signals<br />
that are common in electrodes<br />
attached to the human body.<br />
MCUs typically offer an 8- or 10-<br />
bit ADC. With its 12-bit ADCs,<br />
<strong>STM32</strong> MCUs enable developers<br />
to achieve greater precision when<br />
measuring the weak signals<br />
typical of the human body. In<br />
addition, with sample rates up to<br />
1 Msample/second, developers<br />
have the headroom to access<br />
even more accuracy through<br />
oversampling. This enables<br />
devices to operate in noisy<br />
environments and applications<br />
that could not be served by an<br />
MCU with only an 8-bit ADC.<br />
Power-Efficient<br />
Communications<br />
The <strong>STM32</strong> architecture offers<br />
a wide range of interfaces,<br />
31
<strong>STM32</strong> <strong>Journal</strong><br />
including Ethernet (available<br />
on the <strong>STM32</strong> F1, F2, and F4),<br />
USB (available on the <strong>STM32</strong><br />
L1, F1, F2, and F4), SPI, SDIO,<br />
and CAN. For applications that<br />
only transmit data a few times<br />
a day, interface current has<br />
minimal impact on operating<br />
life. However, for an application<br />
like a heart-rate monitor which<br />
may be constantly transmitting<br />
data, the efficiency of the<br />
<strong>STM32</strong> L1’s peripherals can<br />
provide substantial benefit.<br />
With integrated USB and/or<br />
Ethernet, developers can support<br />
connectivity at the lowest cost.<br />
Some devices may need<br />
to support both USB and<br />
Ethernet. By abstracting the<br />
communications interface, the<br />
application won’t need to know<br />
which type of link it is using.<br />
Developers can achieve this by<br />
creating a data in/out function<br />
that determines the actual data<br />
interface to be used in any<br />
particular operating circumstance<br />
and applies the appropriate<br />
communications APIs.<br />
Developers can also take steps<br />
at the application level to ensure<br />
power efficient transmissions.<br />
For example, collecting data for<br />
a short time and then bursting it<br />
will conserve power compared to<br />
continuous transmission.<br />
For devices that will connect<br />
directly to the Internet and<br />
transmit personal health data,<br />
security is essential. This<br />
will impact power efficiency<br />
since security handshaking<br />
will increase the time the<br />
communications link must be<br />
active. The <strong>STM32</strong> F2 and<br />
<strong>STM32</strong> F4 minimize transmit<br />
processing latency with an<br />
integrated security engine that<br />
accelerates AES 256, 3DES,<br />
SHA-1, MD-5, and HMAC<br />
processing.<br />
Using an off-the-shelf stack<br />
substantially speeds time-tomarket.<br />
For example, the 40<br />
standard APIs in the Micriµm<br />
µC/TCP-IP stack encapsulate<br />
more than 150,000 lines of code.<br />
This stack has been designed<br />
to minimize memory usage<br />
and so further reduce power<br />
consumption. Extensions to the<br />
stack are also available, such as<br />
SSL support.<br />
Micriµm is also introducing<br />
IPv6 over IPv4 support to its<br />
µC/TCP-IP stack. IPv6 is an<br />
important enabling technology<br />
for medical applications. With the<br />
virtually unlimited number of IPv6<br />
addresses available, individual<br />
devices can be assigned a<br />
unique IP address. This enables<br />
devices and patients to be<br />
identified automatically and<br />
without interaction from users<br />
who may not be tech-savvy<br />
enough to register a device<br />
online. It also future-proofs<br />
devices to be able to be used<br />
in the years to come as IPv4 is<br />
phased out.<br />
To further simplify device<br />
connectivity, ST offers the<br />
ST Healthcare Library that is<br />
certified for the USB Personal<br />
Health Device specification.<br />
This spec supports different<br />
subclasses of common medical<br />
devices and enables doctors<br />
and patients to connect<br />
seamlessly over the Internet.<br />
The library brings remote<br />
monitoring capabilities to<br />
every <strong>STM32</strong>-based design,<br />
as well as accelerates timeto-market.<br />
In addition, the<br />
library components have<br />
been tested and certified,<br />
enabling developers to bypass<br />
several certification tests with<br />
confidence that the stack<br />
will perform as required. The<br />
stack footprint is very small<br />
at just 9.2 K Flash and 2.9 K<br />
RAM for a typical thermometer<br />
application. A thermometer<br />
reference demo is available on<br />
the <strong>STM32</strong> L1 evaluation board.<br />
Real-time Integration<br />
To enable a portable device to<br />
support a wireless interface<br />
with only 3AAA batteries, it is<br />
critical to be able to intelligently<br />
manage power. Using an<br />
OS like Linux, for example,<br />
significantly increases startup<br />
time from low power modes.<br />
In contrast, tight integration<br />
between the stack and a realtime<br />
kernel ensures that the<br />
radio can be turned on and off<br />
as quickly as possible.<br />
Running a real-time kernel also<br />
helps simplify power mode<br />
management. Without a kernel,<br />
the typical embedded application<br />
manages tasks using a main<br />
loop. Effectively, the loop polls<br />
a series of tasks. Even if the<br />
device is not doing anything,<br />
the CPU still needs to poll tasks<br />
continuously to see if any of<br />
them need attention.<br />
Because developers never<br />
know which tasks will be<br />
executed during any loop<br />
iteration, it becomes difficult to<br />
determine the optimal power<br />
saving mode in which to place<br />
the system after any particular<br />
task has completed since<br />
each mode offers different<br />
responsiveness. In addition,<br />
as the main loop increases in<br />
32
<strong>STM32</strong> <strong>Journal</strong><br />
size, managing power becomes<br />
more convoluted.<br />
With a real-time kernel, the<br />
CPU returns to the IDLE task<br />
whenever none of the tasks<br />
need attention. Because no<br />
tasks are currently running, it is<br />
much more straightforward to<br />
manage power. When the IDLE<br />
task is interrupted, the system<br />
can quickly be powered up. This<br />
gives developers the ability to<br />
take the system context into<br />
account when selecting a power<br />
mode and clock speed that<br />
provides the appropriate level<br />
of responsiveness (i.e., how<br />
fast the system can wake to an<br />
active state).<br />
Using the IDLE function to<br />
manage power also simplifies<br />
any migration of applications<br />
between <strong>STM32</strong> processors.<br />
For example, tuning an<br />
application based on the <strong>STM32</strong><br />
L1 for power may require a<br />
different approach compared<br />
to an <strong>STM32</strong> F0 or <strong>STM32</strong><br />
F4. By consolidating power<br />
management in a single routine,<br />
developers can be sure to be<br />
able to optimize efficiency<br />
without having to completely<br />
rewrite the application.<br />
Designing your own real-time<br />
kernel is an option. However,<br />
the use of a commercial<br />
real-time kernel like µC/OS-<br />
III from Micriµm provides a<br />
wide range of multitasking<br />
capabilities that make it easier<br />
to add new functionality to<br />
a system without impacting<br />
system reliability. For example,<br />
when a new feature like a<br />
GUI is introduced, it can be<br />
given a low priority. Thus, a<br />
critical system task such as<br />
driving the pump on a blood<br />
pressure monitor is always<br />
executed when it needs to be<br />
and not disrupted because<br />
a USB packet arrived at a<br />
critical moment. In this way,<br />
developers can extend the<br />
UI of a device, as well as its<br />
connectivity options, without<br />
negatively impacting reliability.<br />
A real-time kernel also<br />
simplifies design and power<br />
management as more complex<br />
functions are added to a<br />
system. For example, while<br />
executing a time-intensive<br />
function without a kernel,<br />
developers need to manually<br />
check within the function if<br />
shorter, higher priority tasks<br />
need attention. With µC/OS-<br />
III, multitasking and priority<br />
management is automatically<br />
managed, and developers can<br />
break their system into separate<br />
and distinct pieces without<br />
having to manually manage<br />
how these pieces interact. This<br />
also enables developers to<br />
more accurately measure power<br />
consumption during early<br />
design stages as well as verify<br />
where in their applications<br />
power spikes may be occurring.<br />
Certification is a concern for<br />
many manufacturers of medical<br />
devices as well. Given that it is<br />
the complete system that must<br />
be certified, the Micriµm µC/OS-<br />
III kernel and µC/TCP-IP stack<br />
are provided with source code<br />
and are fully documented so<br />
as to facilitate the certification<br />
process. Using the Micriµm<br />
kernel and stack can simplify<br />
certification compared to an<br />
in-house kernel implementation<br />
that does not have the maturity<br />
and years of industry testing<br />
commercial code offers.<br />
Rather than base a design on a<br />
specialized MCU that can serve<br />
only one price point, the <strong>STM32</strong><br />
F0 and <strong>STM32</strong> L1, along with<br />
the rest of the <strong>STM32</strong> families,<br />
provide a flexible architecture<br />
that can enable developers to<br />
leverage their code and other<br />
IP across an entire product<br />
line. In addition, code- and pincompatibility<br />
between devices<br />
allows for a nearly seamless<br />
migration between processors.<br />
Designing with the <strong>STM32</strong><br />
architecture is also faster<br />
compared to 8- or 16-bit MCUs.<br />
Specifically, 8- and 16-bit MCUs<br />
tend to have a limited tool chain,<br />
especially when the MCU vendor<br />
is the only source for tools.<br />
In contrast, the <strong>STM32</strong><br />
architecture is based on the ARM<br />
Cortex-M architecture. As such,<br />
it has the largest ecosystem<br />
available for any MCU. These<br />
tools provide more run-time<br />
information and advanced<br />
debugging capabilities to ensure<br />
system robustness and power<br />
efficiency, as well as verify device<br />
operation to speed certification.<br />
Creating portable medical<br />
devices requires an MCU<br />
architecture that provides<br />
both performance and power<br />
efficiency. With its tightly<br />
integrated architecture, <strong>STM32</strong><br />
MCUs offer a single-chip<br />
solution that balances cost<br />
and ultra-low-power operation.<br />
Combined with the powerful<br />
capabilities of production-ready<br />
software such as Micriµm’s µC/<br />
OS-III kernel and µC/TCP-IP<br />
stack, developers can bring<br />
reliable medical products to<br />
market faster.<br />
33
<strong>STM32</strong> <strong>Journal</strong><br />
Accelerating Time-to-Market<br />
Through the ARM Cortex-M Ecosystem<br />
By Reinhard Keil, Director of MCU Tools, ARM Germany GmbH<br />
Shawn Prestridge, Senior Field Applications Engineer, IAR Systems<br />
Laurent Desseignes, Embedded Software Marketing Manager, STMicroelectronics<br />
Before ARM, the MCU market<br />
was fragmented with a multitude<br />
of vendors offering proprietary<br />
architectures and tool chains.<br />
Now, with open architectures<br />
like the Cortex-M, developers<br />
are able to base products on<br />
an entire architecture rather<br />
than have to tie designs to<br />
a specific MCU. The result<br />
is that designs won’t be tied<br />
to an outdated architecture<br />
in just a few years. Rather,<br />
designs will always be current<br />
with the evolving Cortex-M<br />
architecture and extensible<br />
across a wide range of devices<br />
that can leverage the same<br />
application code. For example,<br />
ARM has recently shown its<br />
commitment to keeping the<br />
Cortex-M architecture relevant to<br />
embedded design as well as on<br />
the leading edge of technology<br />
by introducing the Cortex-M0<br />
core. This core extends the<br />
Cortex-M architecture into<br />
traditionally 8- and 16-bit<br />
applications while offering 32-bit<br />
performance.<br />
The <strong>STM32</strong> Ecosystem<br />
ST is the MCU industry leader<br />
with the broadest portfolio<br />
of Cortex-M-based devices<br />
from a single company. The<br />
<strong>STM32</strong> product line offers an<br />
extraordinary variety of options<br />
with more than 300 devices<br />
all based on the Cortex-M<br />
cores (M0, M3, and M4) to give<br />
developers unparalleled flexibility<br />
in finding the optimal MCU for<br />
their application (see Figure 1).<br />
Part of ST’s strength can be<br />
found in the software and tools<br />
available to developers for<br />
the <strong>STM32</strong> architecture. ST<br />
provides a range of evaluation<br />
boards, including the <strong>STM32</strong><br />
Discovery Kits, which enable<br />
developers to evaluate each<br />
of the <strong>STM32</strong> MCU product<br />
lines. The Discovery kits are<br />
the cheapest and quickest way<br />
to discover the <strong>STM32</strong> family<br />
while the evaluation boards<br />
are complete development<br />
platforms providing access to<br />
all peripherals in each <strong>STM32</strong><br />
product line and extension<br />
Cortex-M4<br />
Cortex-M3<br />
Cortex-M0<br />
High-Performance DSP MCUs<br />
168 MHz Cortex-M4<br />
Up to 2-Mbyte Flash<br />
Up to 256 -Kbyte SRAM<br />
Mainstream MCUs<br />
24 to 72 MHz Cortex-M3<br />
16-Kbyte to 1-Mbyte Flash<br />
Up to 96-Kbyte SRAM<br />
<strong>STM32</strong> F1<br />
Entry-level MCUs<br />
48 MHz Cortex-M0<br />
16- to 128-Kbyte Flash<br />
Up to 20-Kbyte SRAM<br />
<strong>STM32</strong> F2<br />
headers to make it easy to<br />
connect a daughterboard<br />
or wrapping board for the<br />
target specific application.<br />
Development tools include<br />
in-circuit debuggers and<br />
eprogrammers, reference<br />
<strong>STM32</strong> F4<br />
32-bit/DSC applications<br />
High-performance MCUs<br />
120 MHz Cortex-M3<br />
256-Kbyte to1-Mbyte Flash<br />
Up to 128-Kbyte SRAM<br />
<strong>STM32</strong> L1<br />
Ultra-low-power MCUs<br />
32 MHz Cortex-M3<br />
32-to 384-Kbyte Flash<br />
Up to 48-Kbyte SRAM<br />
16/32-bit applications<br />
<strong>STM32</strong> F0<br />
8/16-bit applications<br />
Highest Performance<br />
MCU with DSP and FPU<br />
Performance Efficiency<br />
Feature-rich<br />
Connectivity<br />
Full-featured 32-bit MCU<br />
Budget Price<br />
Figure 1 The <strong>STM32</strong> product line offers more than 300 devices all based on the Cortex-M to<br />
give developers unparalleled flexibility in finding the optimal MCU for their application.<br />
34
AD329 Keil <strong>STM32</strong> <strong>Journal</strong>(2).pdf 1 23/03/2012 16:39<br />
<strong>STM32</strong> <strong>Journal</strong><br />
C<br />
M<br />
Y<br />
CM<br />
MY<br />
CY<br />
CMY<br />
K<br />
Leading Embedded<br />
Development Tools<br />
The complete development environment<br />
for ARM ® processor-based<br />
microcontroller designs<br />
1-800-348-8051<br />
www.keil.com<br />
designs, and application notes.<br />
ST also offers a wide variety<br />
of firmware libraries, many of<br />
which are available at no cost,<br />
for implementing motor control,<br />
DMA, timers, interfaces, audio,<br />
LCDs, communications, GUIs,<br />
all standard peripherals, touch<br />
sensing, and in-application<br />
programming, to name a few.<br />
ST also recognizes the benefit of<br />
working with other leaders in the<br />
industry to expand the tools and<br />
software available to developers.<br />
The <strong>STM32</strong> architecture is<br />
based on the ARM Cortex-M<br />
architecture which has the most<br />
comprehensive ecosystem of<br />
tools, software, and engineering<br />
resources of any microcontroller<br />
in the world. ST has worked<br />
directly with a wide range of<br />
partners to create offerings<br />
that specifically accelerate the<br />
development of applications<br />
based on the <strong>STM32</strong> architecture,<br />
thereby minimizing product<br />
development investment and<br />
speeding time-to-market.<br />
The global ecosystem available for<br />
the <strong>STM32</strong> architecture also offers<br />
many cost savings for developers:<br />
〉〉 The open architecture of the<br />
ARM Cortex-M cores gives<br />
developers access to the largest<br />
MCU ecosystem in the world<br />
〉〉 The vast diversity of the<br />
Cortex-M ecosystem ensures<br />
that developers can find<br />
the tools, software, and<br />
middleware they need to<br />
speed the design of nearly any<br />
application<br />
〉〉 With so many OEMs<br />
standardizing on ARM, the<br />
pool of engineers already<br />
experienced with the Cortex-M<br />
architecture is large, making it<br />
easier to find developers. Many<br />
colleges are teaching to the<br />
ARM architecture as well<br />
〉〉 Architectural and tool chain<br />
familiarity reduce the developer<br />
learning curve and speed timeto-market<br />
The ecosystem for the <strong>STM32</strong><br />
consists of a complete range of<br />
tools and software specifically<br />
designed to speed design. The<br />
comprehensive tool chain is<br />
just the beginning. Developers<br />
also have access to the ARM<br />
CMSIS libraries, ST peripheral<br />
drivers, extensive middleware,<br />
and production-ready<br />
application software.<br />
Developer’s Foundation:<br />
The Tool Chain<br />
There are several key<br />
components comprising an<br />
efficient development tool chain:<br />
35
<strong>STM32</strong> <strong>Journal</strong><br />
Compiler: Each compiler offers<br />
different strengths. Key factors<br />
to consider are code size and<br />
performance (typically measured<br />
in Coremarks). In addition,<br />
compilers like IAR Embedded<br />
Workbench automatically use<br />
the <strong>STM32</strong> architecture to<br />
its best advantage, both in<br />
terms of leveraging silicon and<br />
middleware.<br />
Assembler: Assemblers have<br />
become more of a specialty<br />
tool over the years given the<br />
performance and time-to-market<br />
advantages of coding in C.<br />
Today, if an assembler is used<br />
at all, it will likely be to code an<br />
application’s critical control loop.<br />
Linker: The linker pulls<br />
together all the code required<br />
to create the final application.<br />
It needs to supply program<br />
and data information to the<br />
debugger.<br />
Debugger: The debugger is<br />
the tool most frequently used<br />
by developers. It provides<br />
visibility into systems to<br />
verify program operation and<br />
troubleshoot issues.<br />
IDE: The Integrated<br />
Development Environment<br />
(IDE) defines the workspace<br />
and design flow. Some tool<br />
Cortex-M Processor<br />
Debug<br />
Interface<br />
Run Control<br />
Breakpoint<br />
Unit<br />
Memory<br />
Access Unit<br />
Cortex Debug 10-pin or<br />
ARM JTAG 20-pin Connector<br />
chains support the option of<br />
replacing the IDE with one of a<br />
developer’s own choice.<br />
Given how much time a<br />
developer will spend with the<br />
debugger, using a debugger<br />
that is well-integrated with<br />
the compiler and middleware<br />
can substantially simplify<br />
troubleshooting. For example, a<br />
debugger that is RTOS-aware,<br />
USB-aware, TCP/IP-aware,<br />
etc. will facilitate faster problem<br />
Cortex-M<br />
CPU Core<br />
Serial Wire Viewer<br />
JTAG or Serial Wire Debug<br />
identification and resolution.<br />
The usefulness of the debugger,<br />
however, is limited by the ability<br />
of the MCU to provide visibility<br />
into its real-time operation.<br />
MCUs without integrated<br />
debugging capabilities will<br />
require developers to rely<br />
upon outdated and intrusive<br />
debugging techniques such as<br />
instrumenting code that can<br />
interfere materially with run-time<br />
execution.<br />
ETM Instruction Trace<br />
(optional)<br />
ITM Instrumentation<br />
Trace<br />
DWT Data Watchpoint &<br />
Trace Unit<br />
CPU & Interrupt Events<br />
Trace Point<br />
Interface<br />
4-pin Trace<br />
Cortex Debug + ETM<br />
20-pin Connector (optional)<br />
Trace (ETM, ITM, DWT) not available on Cortex-M0<br />
Figure 2 The <strong>STM32</strong> family of MCUs utilizes ARM’s Coresight Debug Technology to provide leading-edge troubleshooting<br />
capabilities including streaming trace, full code coverage testing, timing analysis, and performance profiling.<br />
The <strong>STM32</strong> family of MCUs<br />
utilizes ARM’s Coresight Debug<br />
Technology to provide leadingedge<br />
troubleshooting capabilities<br />
(see Figure 2). For example, with<br />
streaming trace, developers can<br />
capture trace data limited only<br />
by the size of a PC’s hard drive.<br />
This enables full code coverage<br />
testing, timing analysis, and<br />
performance profiling.<br />
Because Coresight is<br />
implemented in hardware, it<br />
36
<strong>STM32</strong> <strong>Journal</strong><br />
CMSIS provides a consistent software<br />
interface that simplifies software reuse across<br />
the entire <strong>STM32</strong> product line of 300+ devices.<br />
Each of the CMSIS libraries reduces the<br />
learning curve behind using new software and<br />
tools to significantly accelerate design and<br />
lower development costs.<br />
Figure 3 The I-jet in-circuit debugging probe from IAR Systems offers seamless integration<br />
with IAR Embedded Workbench and provides a wide range of real-time debugging<br />
capabilities, including power debugging with ~200 µA resolution at 200 kHz, for finetuning<br />
performance while achieving the highest power efficiency.<br />
imposes no software overhead<br />
on the CPU and requires no<br />
external emulation hardware.<br />
In addition, breakpoints can be<br />
set and variable values changed<br />
while the system is running.<br />
Extensive trace records<br />
capture the program counter<br />
and read/write accesses while<br />
noting time delays and lost<br />
cycles. Statistical information<br />
about exceptions, interrupts,<br />
and events is also collected.<br />
Signals the MCU is processing<br />
can be monitored graphically,<br />
and Coresight provides<br />
instrumented trace capabilities<br />
enabling developers to write<br />
data to specific memory<br />
locations at run-time.<br />
With proprietary architectures,<br />
typically only one or two debug<br />
platforms are available. With<br />
<strong>STM32</strong> MCUs, developers can<br />
chose from more than a dozen<br />
vendors. In addition, the options<br />
include low-end tools for those<br />
with simple applications and<br />
tight budgets as well as high-end<br />
tools that provide sophisticated<br />
capabilities for accelerating<br />
development and simplifying<br />
verification and certification<br />
processes.<br />
For example, IAR Systems offers<br />
its new I-jet in-circuit debugging<br />
probe for use with the <strong>STM32</strong><br />
architecture (see Figure 3). It<br />
offers seamless integration with<br />
IAR Embedded Workbench and<br />
provides a wide range of realtime<br />
debugging capabilities. The<br />
I-jet is also capable of measuring<br />
target power consumption with<br />
~200 µA resolution at 200 kHz.<br />
This enables developers to<br />
debug applications in terms of<br />
power to fine-tune performance<br />
while achieving the highest<br />
power efficiency.<br />
CMSIS:<br />
Faster Time-to-Market<br />
Part of the value of the <strong>STM32</strong><br />
architecture is the Cortex<br />
Microcontroller Software<br />
Interface Standard (CMSIS).<br />
Created by ARM, CMSIS<br />
provides a consistent software<br />
interface that simplifies<br />
software reuse across the<br />
entire <strong>STM32</strong> product line of<br />
300+ devices. Each of the<br />
CMSIS libraries reduces the<br />
learning curve behind using<br />
new software and tools to<br />
significantly accelerate design<br />
and lower development costs.<br />
Development tools and<br />
middleware that are CMSIScompliant<br />
can accelerate<br />
product design in multiple<br />
ways. For example, developers<br />
can create a proof-of-concept<br />
using a high-performance<br />
<strong>STM32</strong> MCU with the maximum<br />
memory. This provides full<br />
flexibility during early design<br />
when the core product is still<br />
being defined. Once the design<br />
is proven, CMSIS makes<br />
it easier for designs to be<br />
transparently implemented on a<br />
different, more optimal <strong>STM32</strong><br />
MCU, even one based on a<br />
different Cortex-M core.<br />
37
<strong>STM32</strong> <strong>Journal</strong><br />
CMSIS comprises several parts<br />
(see Figure 4):<br />
〉〉 CMSIS-CORE is a standard API<br />
for all Cortex-M-based devices<br />
and abstracts key functionality<br />
〉〉 CMSIS-RTOS provides an API<br />
for RTOS integration with other<br />
tools and middleware<br />
〉〉 CMSIS-DSP is a collection of<br />
61 digital signal processing<br />
functions that take advantage<br />
of the <strong>STM32</strong>’s integrated<br />
floating-point and DSP<br />
capabilities. The library is<br />
designed for block processing<br />
to reduce interrupt overhead,<br />
optimize DMA utilization, and<br />
facilitate easy integration with<br />
an RTOS or kernel<br />
〉〉 CMSIS-SVD provides a<br />
System View Description for all<br />
peripherals to abstract actual<br />
peripheral implementations<br />
from application code and<br />
enable peripheral-awareness<br />
for debuggers<br />
With the release of CMSIS 3.0,<br />
ARM is standardizing on an<br />
RTOS API that makes it easier<br />
for middleware components from<br />
different vendors to interoperate.<br />
This will facilitate the propagation<br />
of application-specific<br />
components, such as a TCP/<br />
IP stack, as well as enable the<br />
USER<br />
CMSIS<br />
MCU<br />
CMSIS-DSP<br />
DSP-Library<br />
CMSIS-CORE<br />
SIMD<br />
Cortex-M4<br />
Cortex<br />
CPU<br />
development tool chain to be<br />
RTOS-aware.<br />
The CMSIS DSP library gives<br />
developers a substantial head<br />
start on the development of<br />
complex algorithms. It also<br />
allows developers to create<br />
complex application code that<br />
can be carried between MCUs<br />
Application Code<br />
CMSIS-RTOS<br />
API<br />
Real Time Kernel<br />
(3 rd Party)<br />
Core Peripheral Functions<br />
Peripheral Register & Interrupt Vector Definitions<br />
SysTick<br />
RTOS Kernel<br />
Timer<br />
NVIC<br />
Nested Vectored<br />
Interrupt Controller<br />
DEBUG<br />
+ Trace<br />
that may not have hardwarebased<br />
DSP instructions. This<br />
means that developers can write<br />
code for the <strong>STM32</strong> F2 and,<br />
when they compile it for the<br />
<strong>STM32</strong> F4, automatically get the<br />
full advantage of the <strong>STM32</strong> F4’s<br />
DSP capabilities.<br />
Device Peripheral<br />
Functions<br />
(Silicon Vendor)<br />
Other<br />
Peripherals<br />
Debugger<br />
(3 rd Party)<br />
System View<br />
Description (XML)<br />
CMSIS-<br />
SVD<br />
Figure 4 The ARM Cortex Microcontroller Software Interface Standard (CMSIS) provides a consistent software interface that<br />
accelerates product development as well as simplifies software reuse across the entire <strong>STM32</strong> product line of 300+ devices.<br />
Peripheral<br />
View<br />
Accelerated Peripheral<br />
Configuration and<br />
Driver Design<br />
One time-consuming part<br />
of product development is<br />
configuring an MCU’s peripherals<br />
and then writing drivers for<br />
the application to utilize them.<br />
To facilitate faster application<br />
38
<strong>STM32</strong> <strong>Journal</strong><br />
design, ST supplies a complete<br />
peripheral library for use with all<br />
<strong>STM32</strong> MCUs. Instead of writing<br />
to peripherals directly and locking<br />
code to a particular device,<br />
developers use APIs that abstract<br />
the use of peripherals. This<br />
approach offers several benefits:<br />
〉〉 Faster time-to-market as<br />
developers do not need to<br />
write peripherals drivers or<br />
debug them<br />
〉〉 More reliable code since the<br />
peripheral drivers supplied are<br />
mature and industry-proven<br />
〉〉 Easy migration between<br />
<strong>STM32</strong> devices, even ones<br />
based on different Cortex-M<br />
cores. Because the peripherals,<br />
pin-outs, and code are the<br />
same or similar among the<br />
<strong>STM32</strong> family, migrating<br />
between devices really is just a<br />
recompile<br />
〉〉 Simple customization as C<br />
source code is provided for all<br />
peripherals<br />
Middleware: Production-<br />
Ready Software Solutions<br />
One of the factors that<br />
substantially accelerates product<br />
development and time-tomarket<br />
is the extensive range<br />
of middleware available for the<br />
<strong>STM32</strong> architecture, including:<br />
〉〉 Real-time kernels and<br />
operating systems (RTOS)<br />
〉〉 Stacks for all major<br />
communications protocols,<br />
including USB, TCP/IP, and<br />
Bluetooth<br />
〉〉 Advanced design and modeling<br />
tools such as those used for<br />
signal processing<br />
〉〉 Display/GUI solutions<br />
〉〉 Touch sensing<br />
〉〉 Java<br />
The <strong>STM32</strong> architecture has<br />
wide industry support, with<br />
ST’s partners providing a<br />
multitude of optimized solutions<br />
for a diversity of applications,<br />
including audio, industrial, and<br />
motor control, to name a few.<br />
For example, developers have<br />
the option of selecting a fullfeatured<br />
RTOS for applications<br />
needing high reliability or a more<br />
simple kernel for a low-cost<br />
consumer device.<br />
To accelerate design,<br />
however, the development<br />
tool chain needs to be able<br />
to integrate well with<br />
middleware offerings. IAR<br />
Embedded Workbench, for<br />
example, has hooks into the<br />
CSpy software debugger that<br />
enables RTOS vendors to<br />
create drivers that make the<br />
debugger kernel-aware.<br />
Effectively, this approach<br />
maximizes the flow of information<br />
available to developers to<br />
speed problem resolution.<br />
Since the kernel/RTOS is the<br />
primary interface between the<br />
application and middleware such<br />
as communications stacks, it<br />
is important that the debugger<br />
understand how middleware is<br />
being managed by the system.<br />
When a debugger is RTOS-aware<br />
and USB-aware, for example, it<br />
can show developers directly into<br />
frame buffers and display protocol<br />
information in its native format.<br />
Without visibility into the RTOS,<br />
developers would have to<br />
manually collect this information<br />
themselves, introducing<br />
potential error and delay to the<br />
design process. Rather than<br />
having to manually resolve<br />
headers to find the payload, the<br />
debugger can interpret packets<br />
and present them in a format<br />
that allows developers to debug<br />
at the link level. Complete<br />
visibility into the system also<br />
enables developers to more<br />
thoroughly verify program<br />
execution for both ensuring<br />
reliable operation and for<br />
certification processes.<br />
One of the factors that substantially<br />
accelerates product development and time-tomarket<br />
is the extensive range of middleware<br />
available for the <strong>STM32</strong> architecture.<br />
IAR Embedded Workbench<br />
directly supports many<br />
middleware companies<br />
and nearly every hardware<br />
debugger. This enables<br />
developers not only to<br />
seamlessly take advantage<br />
of components from different<br />
companies but also select<br />
best-in-class components<br />
for their applications. IAR<br />
Systems is the world’s oldest<br />
embedded C compiler company<br />
and understands the needs<br />
of developers to be able to<br />
choose the elements of the<br />
ARM ecosystem that are best<br />
for their application.<br />
39
<strong>STM32</strong> <strong>Journal</strong><br />
Alternatively, Keil’s MDK-ARM<br />
development system offers a<br />
comprehensive approach to<br />
embedded design by bringing<br />
together an integrated set of<br />
development tools, middleware,<br />
and debug hardware.<br />
MDK-ARM is available in<br />
four offerings, from Lite to<br />
Professional, to match the<br />
varying needs of development<br />
teams (see Figure 5). Some of<br />
its components include:<br />
〉〉 Version 5 of the MDK compiler<br />
has recently been released<br />
and offers best-in-class<br />
performance with full support<br />
for the CMSIS libraries<br />
〉〉 The Keil µVision IDE provides<br />
a wide range of features from<br />
trace view with source code<br />
synchronization and editing to<br />
detailed peripheral debugging<br />
to comprehensive project<br />
management<br />
〉〉 RTX is a full-featured RTOS<br />
supporting multiple scheduling<br />
options, low interrupt latency,<br />
unlimited tasks, and a memory<br />
footprint of less than 5 KB. Full<br />
source code is available<br />
〉〉 Extensive middleware libraries<br />
optimized for the Cortex-M<br />
architecture enable developers<br />
to quickly add support for<br />
CAN, USB Host/Device, TCP/<br />
IP, GUI development, and<br />
file system management<br />
to embedded applications.<br />
Intuitive configuration wizards<br />
simplify the use of middleware<br />
〉〉 The ULINK2 and ULINKpro<br />
debuggers support high-speed<br />
streaming trace for nonintrusive,<br />
real-time visibility into<br />
embedded systems<br />
〉〉 ETM Trace enables full<br />
code coverage testing and<br />
performance analysis<br />
Frameworks and Libraries<br />
Part of the <strong>STM32</strong> value<br />
proposition is the availability of<br />
a tremendous amount of offthe-shelf<br />
software. Because<br />
of the huge volumes of ARM<br />
processors shipped, even niche<br />
markets are large enough to<br />
incent third parties to create<br />
specialized tools. Since the<br />
market can support software that<br />
is highly application-specific,<br />
developers have a greater<br />
chance of finding exactly what<br />
they need already available.<br />
Some of the application-specific<br />
libraries offered for the <strong>STM32</strong><br />
include audio, motor control, and<br />
industrial control.<br />
Using a library with an <strong>STM32</strong><br />
MCU is a straightforward process:<br />
ARM C/C++ Compiler<br />
CAN Interface<br />
USB Host<br />
TCP/IP Networking Suite<br />
just include it into the application<br />
workspace. Depending upon the<br />
compiler, each function will be<br />
recognized as a keyword and<br />
right-clicking will provide fast<br />
access to the function’s definition<br />
and parameters. Many compilers<br />
and linkers will also automatically<br />
pare out those parts of the library<br />
you are not using to conserve<br />
code space. Many libraries are<br />
provided as binary files created<br />
for a specific <strong>STM32</strong> MCU and<br />
tool chain. Others are offered with<br />
source code to enable developers<br />
to customize code.<br />
MDK-ARM Professional<br />
µVision<br />
Project Manager, Editor & Debugger<br />
RTX Real-Time Operating System<br />
File System<br />
USB Device<br />
GUI Library<br />
Figure 5 Keil’s MDK-ARM development system offers a comprehensive approach to<br />
embedded design by bringing together an integrated set of development tools,<br />
middleware, and debug hardware to match the varying needs of development teams.<br />
The availability of so much<br />
production-ready software<br />
accelerates design by enabling<br />
developers to work with<br />
code that provides the base<br />
functionality for their application.<br />
Rather than having to learn<br />
the <strong>STM32</strong> architecture from<br />
scratch, developers can let the<br />
tool chain abstract low-level<br />
implementation details.<br />
For example, IAR Embedded<br />
Workbench offers over 3000<br />
example projects. With these,<br />
developers can load a project<br />
onto an evaluation board such<br />
40
<strong>STM32</strong> <strong>Journal</strong><br />
as the <strong>STM32</strong> F4 evaluation<br />
board and see how a complete<br />
system has been implemented.<br />
These projects provide a working<br />
framework for a wide variety of<br />
applications ranging from simply<br />
blinking an LED to operating<br />
the LCD screen to creating a<br />
lightweight IP stack to implement<br />
a web server. Effectively, these<br />
projects give developers a<br />
template upon which to base their<br />
own design. The startup code<br />
gets developers into their design<br />
immediately, rather than forcing<br />
them to spend time figuring out<br />
how to initialize the MCU.<br />
Combined with the CMSIS<br />
peripheral library, many of the<br />
projects can be transferred<br />
between devices with minimal<br />
reworking. Thus, developers can<br />
take projects built for another<br />
MCU altogether and quickly pull<br />
the core of the application over to<br />
the <strong>STM32</strong> device of their choice.<br />
The 80%+ Advantage<br />
With the extensive variety of<br />
software, middleware, and<br />
reference designs available,<br />
developers can expect to<br />
substantially jumpstart their<br />
designs. Depending upon the<br />
application, it is not uncommon<br />
to find a combination of off-theshelf<br />
software and middleware<br />
within the <strong>STM32</strong> ecosystem that<br />
provides 80% or more of the code<br />
required for a design. This enables<br />
developers to focus their design<br />
efforts on the last 20%, leading to<br />
faster time-to-market as well as a<br />
highly differentiated product.<br />
This 80% figure is not unrealistic.<br />
For example, with the availability<br />
of off-the-shelf USB and TCP/<br />
IP stacks, a communications link<br />
truly becomes a component that<br />
can easily be added to a system<br />
rather than a subsystem that has<br />
to be designed and fine-tuned. In<br />
addition, there are many support<br />
resources available to assist<br />
developers in customizing code for<br />
applications that have a specific<br />
need. ST’s support engineers and<br />
Field Application Engineers (FAE),<br />
for example, can provide valuable<br />
insight in overcoming a variety of<br />
design challenges.<br />
With the rise of social<br />
networking, peer support has<br />
become another important<br />
element of an MCU architecture.<br />
The size of the ARM community<br />
is tremendous, and there are<br />
a variety of forums where<br />
engineers can provide peer-topeer<br />
support. This support can<br />
be product-based, such as the<br />
support communities hosted<br />
by Keil, IAR Systems, and ST<br />
for their own products. Free<br />
software, such as FreeRTOS,<br />
tends to have its own support<br />
network as well.<br />
Many independent forums have<br />
also come into being in the<br />
last decade to assist engineers<br />
by hosting discussions about<br />
silicon manufacturers and their<br />
middleware partners. Engineers<br />
can search for specific MCUs<br />
and tools to hear how useful<br />
others in the industry find them.<br />
They can also post issues and<br />
share solutions.<br />
Software the Way<br />
You Want It<br />
ST embraces the benefits that an<br />
extensive ecosystem offers its<br />
customers. By giving developers<br />
the choice of software developed<br />
by ST, by a commercial thirdparty,<br />
or through open source,<br />
ST ensures that developers<br />
have the broadest variety of<br />
options available to match their<br />
application needs.<br />
For example, with its<br />
cryptographic library, ST<br />
provides developers with<br />
security capabilities free-ofcharge<br />
that have been 100%<br />
optimized for the <strong>STM32</strong><br />
architecture. ST also offers a<br />
number of other applicationspecific<br />
libraries,including<br />
USB, motor control, and<br />
HDMI Consumer Electronics<br />
Communication (CEC).<br />
ST has focused on working with<br />
numerous partners to ensure that<br />
there are many solutions available<br />
for the <strong>STM32</strong> architecture. Their<br />
goal is that developers should be<br />
able to find whatever software<br />
and tools they need and, in many<br />
cases, have a choice between<br />
several vendors as well as<br />
between open and commercial<br />
versions. For example,<br />
developers can select from a<br />
range of MP3 codecs ranging<br />
from the open source Helix, ST’s<br />
codec, and commercial codecs<br />
with extensions including mixers,<br />
equalization, etc. This allows<br />
developers to determine for<br />
themselves the balance between<br />
cost, support, efficiency, and<br />
time-to-market.<br />
The <strong>STM32</strong> family of 32-bit<br />
MCUs offers developers the<br />
most extensive and complete<br />
portfolio of Cortex-M-based<br />
devices. Combined with the<br />
comprehensive ecosystem<br />
around the <strong>STM32</strong> architecture,<br />
developers are able to simplify<br />
development while bringing<br />
products to market more quickly<br />
and at the lowest cost.<br />
41
<strong>STM32</strong> <strong>Journal</strong><br />
Introducing a Graphical User Interface<br />
to Your Embedded Application<br />
By Gérard Bouvet, Marketing and Sales Manager, GeeseWare<br />
Dominique Jugnon, Microcontrollers Development Tools Manager, STMicroelectronics<br />
Smart phones and similar<br />
devices have redefined the way<br />
we interact with technology and<br />
created a new set of consumer<br />
expectations as to how a<br />
Graphical User Interface (GUI)<br />
should appear. The complex<br />
layout of Windows that was<br />
thought to define how a GUI<br />
should be has given way to<br />
streamlined interfaces that can<br />
give users access to all of a<br />
system’s functionality on the<br />
smaller displays often used with<br />
handheld devices.<br />
Manufacturers have recognized<br />
that touch-based GUIs can<br />
bring value to a wide range of<br />
embedded applications beyond<br />
consumer electronics, including<br />
industrial automation, appliances,<br />
meters, HVAC, security, access<br />
control, military, automotive,<br />
and infotainment devices. For<br />
example, traditional push-button<br />
interfaces have mechanical<br />
parts which can fail. Moving to a<br />
capacitive touch sensing display<br />
enables not only a more robust<br />
interface but one that offers more<br />
flexibility and extensibility.<br />
32-bit Processing<br />
Adding a GUI to a system,<br />
however, is not like adding a<br />
few more buttons or controls to<br />
a device’s front panel. With the<br />
nearly ubiquitous availability of<br />
touchscreens in mobile handsets,<br />
consumers have come to expect<br />
electronic devices of all types to<br />
have a sophisticated user interface<br />
utilizing 3D objects, perceived<br />
depth, animated transitions,<br />
textures, and complex background<br />
lighting. To create an intuitive<br />
interface that adds value as well<br />
as flair to an application, GUIs also<br />
need to support gestures such<br />
as tap, drag, fling, and slide that<br />
consumers are quickly learning<br />
to consider as a natural aspect of<br />
any touch-based interface.<br />
Applications based on 8- and<br />
16-bit processors simply don’t<br />
have the horsepower to handle<br />
even simple graphics. The<br />
availability of high-performance<br />
MCUs like the <strong>STM32</strong> family<br />
that can provide complete GUI<br />
functionality with capacitive<br />
touch sensing on top of the<br />
primary application has been<br />
a key enabling factor in the<br />
proliferation of advanced user<br />
interfaces. For example, the<br />
<strong>STM32</strong>-F0 provides 32-bit<br />
processing at 8- and 16-bit<br />
pricing. For more graphicsintensive<br />
applications, the<br />
<strong>STM32</strong>-F2 and <strong>STM32</strong>-F4<br />
provide sufficient Flash to store<br />
large graphic images. With<br />
devices up to 168 MHz/210<br />
DMIPs, the <strong>STM32</strong> family is<br />
also fast enough to provide the<br />
responsiveness consumers are<br />
used to from GUI-based devices.<br />
However, as the cost of<br />
hardware has dropped, software<br />
complexity has continued to<br />
increase. In fact, application<br />
software has become the leading<br />
development cost in embedded<br />
systems, both in terms of<br />
money and time-to-market.<br />
To maintain their competitive<br />
edge, companies need to be<br />
able to introduce complex<br />
GUI functionality while tightly<br />
controlling software development<br />
cost. Achieving this goal requires<br />
access to a GUI framework,<br />
the ability to define the look<br />
and feel in Java rather than C,<br />
rapid prototyping capabilities<br />
to enable consumers to provide<br />
feedback while target hardware<br />
is still being designed, and tools<br />
optimized for the tight memory<br />
and processing constraints of the<br />
typical embedded application.<br />
GUI Framework<br />
There are two primary phases<br />
to designing a GUI. The first is<br />
the creation of the underlying<br />
software code that provides<br />
basic UI functionality. Once this<br />
GUI framework is in place, the<br />
look-and-feel of the GUI must be<br />
designed. To keep system cost<br />
42
<strong>STM32</strong> <strong>Journal</strong><br />
down, developers must minimize<br />
the expense for both of these<br />
processes as well as eliminate<br />
any unnecessary design delays.<br />
In the past, UIs for embedded<br />
systems were designed<br />
specifically for the hardware on<br />
which they were going to be<br />
run. With increasing pressure to<br />
shorten product design cycles, IP<br />
Java Application<br />
Java Class/API<br />
Wrapper C Java<br />
API C<br />
Driver C<br />
External<br />
Port<br />
Figure 1 Abstracting the hardware through a<br />
Hardware Abstraction Layer (HAL)<br />
frees the GUI framework from<br />
handling specific low-level details,<br />
leading to faster code development<br />
and greater reusability.<br />
reuse has become an important<br />
consideration in UI design.<br />
Ideally, developers need to be<br />
able to carry a UI across different<br />
products using MCUs that may<br />
be from different families as well.<br />
To achieve this, GUI application<br />
code is abstracted above<br />
the hardware. A Hardware<br />
Abstraction Layer (HAL) handles<br />
specific low-level details such<br />
as how graphical data is stored<br />
in memory and transferred to<br />
the display (see Figure 1). By<br />
interacting with the HAL using<br />
APIs, the GUI application code<br />
becomes a framework that can<br />
be ported across an MCU family<br />
with minimal rewriting required.<br />
Creating an extensible GUI<br />
framework is an involved<br />
process. The HAL enables<br />
developers to build the<br />
framework using a language<br />
other than assembly, leading to<br />
faster code development and<br />
greater reusability. Designing the<br />
framework using C, however,<br />
can still require substantial<br />
development resources.<br />
Ideally, rather than design<br />
a framework from scratch,<br />
developers can use off-the-shelf<br />
software to minimize development<br />
investment. With the right tools,<br />
the GUI design cycle can be<br />
shortened from months to weeks.<br />
GWStudio from GeeseWare,<br />
for example, is a Java Framework<br />
with a wide array of preexisting<br />
GUI libraries providing a complete<br />
human-machine interface (HMI)<br />
development environment. Its Java<br />
engine based on IS2T MicroEJ ®<br />
technology is specifically optimized<br />
for embedded applications that<br />
have limited memory, constrained<br />
peripherals, restricted network<br />
connectivity, and low power<br />
consumption requirements.<br />
Java brings many advantages to<br />
GUI-based design compared to<br />
working in C. Java was designed<br />
to facilitate GUI creation with an<br />
emphasis on reuse. In addition, its<br />
flexibility ensures a simple revision<br />
process that enables developers<br />
to quickly implement changes<br />
to existing designs. With the<br />
availability of Java for embedded<br />
applications, developers can<br />
leverage the benefits of Java in<br />
many applications:<br />
〉〉 Improved code portability<br />
and reuse<br />
〉〉 Accelerated development up<br />
to 3-5 times faster than<br />
working in C<br />
〉〉 Equivalent performance to<br />
C-based designs; i.e., less<br />
than 1 ms responsiveness for<br />
machine-to-machine (M2M)<br />
processes (i.e., Ethernet) or<br />
touchscreen latency<br />
〉〉 Greater functionality in a<br />
smaller footprint<br />
〉〉 Higher reliability and<br />
robustness by eliminating<br />
manual management of<br />
memory and exceptions<br />
〉〉 Operating system<br />
independence<br />
〉〉 Large development community<br />
Note that even though the GUI is<br />
written in Java, the main application<br />
can be based on C. This enables<br />
developers to introduce a GUI to an<br />
existing design without having to<br />
rewrite the application.<br />
Even with the framework in<br />
place, however, only half the job<br />
is done. Now the look and feel of<br />
the GUI needs to be designed.<br />
Intuitive Look and Feel<br />
Developing an effective GUI can<br />
be one of the most challenging<br />
aspects of system design. GUI<br />
design involves much more<br />
than simply arranging icons on<br />
a screen. To be intuitive, a user<br />
interface has to anticipate how a<br />
variety of different types of people<br />
are going to use the device.<br />
However, it can be very difficult<br />
to know how an end-customer<br />
is going to use a device from<br />
43
<strong>STM32</strong> <strong>Journal</strong><br />
a developer’s isolated position<br />
behind his or her lab bench.<br />
For example, while it may be<br />
logical to a developer to group<br />
functions based on what part of<br />
the system they impact, users<br />
will interact with devices based<br />
on what they want it to do. If the<br />
function a person wants to use<br />
most frequently is buried under<br />
a series of icons, the overall<br />
experience will be frustrating. The<br />
UI has become the core factor<br />
determining the user experience.<br />
In today’s market where<br />
consumers have become quite<br />
sophisticated, a poorly designed<br />
GUI can mean failure for a product<br />
regardless of its other qualities.<br />
The reality is that developers<br />
don’t always know how<br />
perspective users will try to<br />
interact with a system. Ideally,<br />
the hardware should not come<br />
between users and how they<br />
want to use the device. In<br />
addition, to keep the interface<br />
from being cluttered and difficult<br />
to navigate, the screen needs<br />
to display as little information<br />
as possible while still displaying<br />
enough data to allow the user<br />
to easily and quickly make<br />
decisions. Tappable UI objects/<br />
elements must also be of a<br />
minimum size but yet also<br />
comfortable to select.<br />
The placement of icons and<br />
ordering of GUI elements is, at<br />
least at first, a fairly arbitrary<br />
process. However, a slider may<br />
end up being in an inconvenient<br />
location or be improperly sized<br />
for reasons that couldn’t be<br />
predicted during initial design<br />
stages. This won’t be clear until<br />
users are actually given a chance<br />
to use the interface.<br />
Designing an effective GUI<br />
involves many such intangible<br />
considerations that require direct<br />
feedback. With limited screen real<br />
estate, the UI must be selective<br />
and display only content that is<br />
relevant to the choices a user<br />
is currently considering. The<br />
application’s main function should<br />
be accessible quickly upon startup<br />
and always simple to return to.<br />
The final test for an intuitive GUI<br />
is that it must be obvious to users<br />
how to use it efficiently without a<br />
steep learning curve or more than<br />
a few minutes of training.<br />
It may take extensive testing<br />
with users for developers to<br />
understand how the GUI should<br />
be laid out. This likely won’t<br />
happen after bringing in just a<br />
single focus group but rather<br />
will involve many rounds that<br />
iteratively improve the ease-ofuse<br />
of the interface. The design<br />
schedule needs to take into<br />
44
<strong>STM32</strong> <strong>Journal</strong><br />
consideration that the GUI may<br />
need to be redesigned several<br />
times. The sooner accurate user<br />
feedback can be incorporated<br />
into the GUI design, the more<br />
confidence developers can<br />
have that major changes will<br />
not be required after significant<br />
engineering resources have<br />
been invested in implementing<br />
the design.<br />
GUI Testing<br />
The iterative aspect of GUI design<br />
is an important consideration when<br />
selecting a GUI toolset. The speed<br />
and ease with which developers<br />
can modify an existing GUI will<br />
determine how many design<br />
iterations the schedule will allow<br />
and, consequently, how well the GUI<br />
will capture actual user behaviors.<br />
Any testing process needs to<br />
enable stakeholders and endusers<br />
to provide timely input<br />
into GUI design, preferably<br />
as early in the design cycle<br />
as possible. This means that<br />
developers need access to rapid<br />
prototyping capabilities before<br />
target hardware is available. The<br />
testing process should facilitate<br />
the capture of user behaviors<br />
and generation of use cases.<br />
To achieve this, GUI tools must<br />
accelerate design to shorten the<br />
time between test iterations.<br />
Traditionally, developers have<br />
created simulated environments<br />
for users to test. Often these<br />
“wireframe” simulators are<br />
independent tools that allow<br />
developers to put together a<br />
GUI but not necessarily one that<br />
accurately reflects how the final<br />
product will look or operate. For<br />
example, because the simulator<br />
is running on a high-speed<br />
PC, screen updates can be<br />
near instantaneous. Unless the<br />
simulator is able to emulate the<br />
MCU that will be used in the actual<br />
product, developers will be unable<br />
to verify whether the system is<br />
responsive enough to satisfy users.<br />
In fact, feedback from such testing<br />
may actually mislead developers<br />
and result in launch delays.<br />
To ensure that the simulator<br />
matches how the interface will<br />
operate in production hardware<br />
as accurately as possible, the<br />
simulator needs to emulate the<br />
operation of the target MCU.<br />
Developing an embedded GUI<br />
on a PC capable of accurate<br />
emulation offers several benefits<br />
to developers (see Figure 2). In<br />
addition to speeding testing by<br />
eliminating the need to download<br />
new firmware to a target, the<br />
simulator provides several<br />
analysis capabilities to facilitate<br />
optimization and debugging:<br />
Figure 2 GUI design tools like the IS2T MicroEJ® simulator that is part of GWPack from<br />
GeeseWare emulate the target MCU to ensure consumer testing matches the<br />
operation of the production GUI as closely as possible.<br />
〉〉 Static and run-time analysis of<br />
timing and memory footprint<br />
〉〉 Functional code coverage<br />
〉〉 Task profiling and scheduling<br />
Ideally, testers need to have<br />
as realistic an experience as<br />
possible. This means working<br />
on a small screen the same size<br />
as the one that will be used for<br />
production. When running on a<br />
PC, the user may need to use<br />
a mouse rather than be able to<br />
touch the device screen. Even if<br />
a touchscreen is available, it is<br />
likely the wrong size or a monitor<br />
that the user cannot hold in his<br />
or her hand.<br />
To enable realistic testing,<br />
GeeseWare offers its GWPack<br />
development system. The<br />
GWPack includes a standalone<br />
and small form factor board<br />
based on either the <strong>STM32</strong>-F2<br />
or <strong>STM32</strong>-F4 that is prequalified<br />
and production-ready. With a 4.3”<br />
resistive or capacitive touchscreen,<br />
10 ms response time, and access<br />
in Java to all of the <strong>STM32</strong><br />
architectures peripherals and<br />
interfaces of the pack, the GWPack<br />
gives developers a fully operational<br />
Java platform upon which to carry<br />
applications from proof-of-concept<br />
to production faster than has been<br />
possible before.<br />
45
<strong>STM32</strong> <strong>Journal</strong><br />
A complete framework is<br />
provided as part of GWPack<br />
that allows developers to dive<br />
immediately into interface<br />
development rather than writing<br />
low-level drivers and managing<br />
system resources to develop<br />
the Board Support Package<br />
(BSP). The IS2T MicroEJ ®<br />
simulator for GeeseWare<br />
GWPack further allows<br />
developers to simulate their<br />
designs on a PC while emulating<br />
the operation of an <strong>STM32</strong> MCU.<br />
Key features include:<br />
〉〉 Built-in display tree and events<br />
management<br />
〉〉 Window and clickable area<br />
definition and management<br />
〉〉 Support for single and twofinger<br />
gestures<br />
〉〉 Driving of low-level events to<br />
closest tree nodes<br />
〉〉 Sub-level display and event<br />
management at the node level<br />
〉〉 Fully customizable<br />
〉〉 Support for TCP/IP, UDP, and<br />
HTTP via the Java framework<br />
〉〉 The ability to reference all<br />
<strong>STM32</strong> peripherals directly<br />
in Java<br />
Because events are object<br />
dependent, developers can easily<br />
define—and redefine—interface<br />
operation, thus accelerating the<br />
ability to implement feedback<br />
into designs.<br />
With the GWPack, developers<br />
can begin GUI development<br />
immediately with a low initial<br />
investment. The standalone board<br />
is also available in preproduction<br />
volumes to enable manufacturers<br />
to quickly mock up a large<br />
number of systems that can be<br />
put out into the field to capture<br />
customer feedback while target<br />
hardware is still be defined and<br />
built. Finally, for applications with<br />
volumes under 10,000 production<br />
units, the board is a costeffective,<br />
off-the-shelf alternative<br />
to designing your own.<br />
For volume applications,<br />
GWPack enables<br />
manufacturers to complete a<br />
proof-of-design before investing<br />
in implementing a design in<br />
hardware and software. This<br />
ability to design the GUI in parallel<br />
with hardware can substantially<br />
reduce time-to-market.<br />
Critical Design<br />
Constraints<br />
Applications like industrial control,<br />
metering, home automation,<br />
medical, and smart energy don’t<br />
have the memory or processing<br />
capabilities of smart phones. As<br />
a consequence, they need a GUI<br />
that is specifically designed to<br />
minimize processing requirements<br />
and memory footprint. For<br />
example, to implement a GUI in<br />
an embedded Linux environment<br />
can be quite expensive: 32 MB<br />
RAM + 8 MB Flash for the Linux<br />
OS plus graphics libraries push<br />
Flash requirements up to 50-60<br />
MB, adding on the order of $100<br />
to system cost.<br />
The framework supplied with<br />
GWPack keeps system<br />
memory requirements under<br />
tight control: a full GUI can<br />
fit in just 128 KB RAM and<br />
1 MB Flash. In addition, the<br />
framework does not require<br />
a real-time operating system<br />
(RTOS) or kernel to operate. For<br />
applications with more intensive<br />
graphical requirements, more<br />
Flash may be required.<br />
The <strong>STM32</strong> architecture, with<br />
its 32-bit bus, DSP and FPU<br />
capabilities, and multi-layer bus<br />
fabric supporting simultaneous<br />
data transfers provides more<br />
than enough processing capacity<br />
to handle the GUI, application,<br />
drivers, touchscreen, and<br />
communication ports all with<br />
a single chip. With its broad<br />
portfolio of MCUs based on the<br />
ARM Cortex-M0, M3, and M4<br />
cores, the <strong>STM32</strong> architecture<br />
enables developers to select<br />
a processor with the optimal<br />
blend of performance, memory,<br />
and peripherals for their<br />
application. ST also provides a<br />
variety of tools to accelerate the<br />
development of general-purpose<br />
Java-based systems with the ST<br />
Evaluation Kit.<br />
Touch-based GUIs can bring<br />
substantial value to a wide range<br />
of applications by bringing the<br />
look and feel of products up to<br />
date while also by capturing more<br />
value through the user interface.<br />
In addition, they can improve<br />
system robustness by eliminating<br />
mechanical components while<br />
increasing device flexibility<br />
to modify the UI over time to<br />
introduce new features without<br />
redesigning the enclosure.<br />
With the <strong>STM32</strong> family of MCUs<br />
and GUI design tools such<br />
as those from GeeseWare,<br />
manufacturers are able to<br />
cost-effectively introduce<br />
next-generation GUIs to new<br />
designs as well as to legacy<br />
products built on 8-, 16-, and<br />
32-bit MCUs. With the ability to<br />
design a fully-functional system<br />
in weeks instead of months,<br />
developers can accelerate GUI<br />
design to enable a dramatic<br />
reduction in time-to-market.<br />
46