The almanac has long been one of the key factors in the success of farmers, ranchers, hunters, and
fishermen. Historical data about past weather patterns, phases of the moon, and rain and drought
measurements were all critical elements used by the authors to give their readership strong
guidance for the coming year about the best times to plant, harvest, and hunt.
Fast-forward to modern times. One of the best examples of the power, practicality, and tremendous
cost savings of machine learning is the U.S. Postal Service,
specifically the ability of its machines to perform optical character recognition (OCR) accurately enough to interpret the postal
addresses on the millions of pieces of mail that are processed every hour. In 2013
alone, the U.S. Postal Service handled more than 158.4 billion pieces of mail. That means that every day,
the Postal Service correctly interprets addresses and ZIP codes for hundreds of millions of pieces of mail. As
you can imagine, this amount of mail is far too much for humans to process manually.
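To put the 2013 figure in perspective, the short sketch below converts the annual volume into rough daily and hourly averages. It assumes, purely for illustration, that mail is spread evenly over 365 calendar days and 24 hours a day, which is not how the Postal Service actually operates; the variable names are hypothetical.

```python
# Back-of-the-envelope averages from the 2013 volume quoted in the text.
# Assumption (not from the text): mail is spread evenly across all
# 365 calendar days and all 24 hours of each day.
PIECES_IN_2013 = 158_400_000_000  # "more than 158.4 billion pieces"

pieces_per_day = PIECES_IN_2013 / 365
pieces_per_hour = pieces_per_day / 24

print(f"Average per day:  {pieces_per_day:,.0f}")
print(f"Average per hour: {pieces_per_hour:,.0f}")
```

Under that assumption, the averages come to roughly 434 million pieces a day and about 18 million pieces an hour.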
Back in the early days, the postal sorting process was performed entirely by hand by thousands of
postal workers nationwide. In the late 1980s and early 1990s, the Postal Service started to introduce
early handwriting recognition algorithms and patterns, along with rules-based processing techniques to
help “prefilter” the steady streams of mail.
The problem of character recognition for the Postal Service is actually a very difficult one when you
consider the many different letter formats, shapes, and sizes. Add to that complexity all the different
potential handwriting styles and writing instruments that could be used to address an envelope—from
pens to crayons—and you have a real appreciation for the magnitude of the problem that faced the
Postal Service. Despite all the technological advances, by 1997 only 10 percent of the nation’s mail was
being sorted automatically. Pieces that could not be scanned automatically were routed to
manual processing centers for humans to interpret.
In the late 1990s, the U.S. Postal Service started to treat this automation problem as a machine
learning problem, using examples of scanned characters as input data, paired with the known results
from the human interpretations of those same examples. Over time, this method provided a
wealth of training data that helped create the first highly accurate OCR prediction models. The
models were then fine-tuned by adding character noise reduction algorithms, along with random rotations, to
increase effectiveness.
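The idea of pairing character images with human-supplied labels and then adding random rotations can be sketched in a few lines. The example below is only an illustration, not the Postal Service's actual method. It assumes that "random rotations" means rotating training images by small random angles so that a model sees more variation. The 5x5 bitmap and the function names `rotate_bitmap` and `augment` are hypothetical, and noise reduction is not covered.

```python
import math
import random

def rotate_bitmap(bitmap, degrees):
    """Rotate a square 0/1 bitmap about its center (nearest-neighbor sampling)."""
    n = len(bitmap)
    center = (n - 1) / 2.0
    theta = math.radians(degrees)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    out = [[0] * n for _ in range(n)]
    for y in range(n):
        for x in range(n):
            # Inverse mapping: find the source pixel that lands on (x, y).
            dx, dy = x - center, y - center
            sx = round(cos_t * dx + sin_t * dy + center)
            sy = round(-sin_t * dx + cos_t * dy + center)
            if 0 <= sx < n and 0 <= sy < n:
                out[y][x] = bitmap[sy][sx]
    return out

def augment(examples, copies=3, max_degrees=15.0, seed=0):
    """Return the original (bitmap, label) pairs plus randomly rotated copies.

    The label is unchanged by rotation: a tilted "1" is still a "1".
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    augmented = []
    for bitmap, label in examples:
        augmented.append((bitmap, label))
        for _ in range(copies):
            angle = rng.uniform(-max_degrees, max_degrees)
            augmented.append((rotate_bitmap(bitmap, angle), label))
    return augmented

# A hypothetical hand-labeled example: a crude 5x5 digit "1".
digit_one = [
    [0, 0, 1, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 1, 1, 0],
]

training_set = augment([(digit_one, "1")], copies=3)
print(len(training_set), "labeled examples from 1 original")
```

Each human-labeled image yields several slightly different training examples at no extra labeling cost, which is one common reason to add rotations when training a character recognizer.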
Today, the U.S. Postal Service is the world leader in OCR technology, with machines reading nearly
98 percent of all hand-addressed letter mail and 99.5 percent of all machine-printed mail. This is an
amazing achievement, especially when you consider that only 10 percent of the volume was processed
automatically in 1997. The author is happy to note that all letters addressed to “Santa Claus” are still
carefully routed to a processing center in Alaska, where they are manually answered by volunteers.
Here are a few more interesting factoids on just how much impact machine learning has had on
driving efficiency at one of the oldest and largest U.S. government agencies:
523 million: Number of mail pieces processed and delivered each day.