DM1901

CASE STUDY: NEW YORK TIMES

The Times to protect one of their most unique assets migrating from steel filing cabinets to a cloud-based platform where journalists can bring visual storytelling to a whole new level."

Simply storing high-resolution images is not enough to create a system that photo editors can easily use. A working asset management system must let users browse and search for photos easily. The Times built a processing pipeline that stores and processes the photos, and it will use cloud technology to recognise text, handwriting and other details found in the images.

MACHINE LEARNING ADDS INSIGHTS 

Storing the images is only half of the story. To make an archive like The Times' morgue even more accessible and useful, it's beneficial to leverage additional GCP features. In the case of The Times, one of the bigger challenges in scanning their photo archive has been adding data about the contents of the images. Google's Cloud Vision API helps fill that gap.

The photo (above) of the old Penn Station from The Times is an example. Without additional context, it's not clear from the front of the photo what it contains, but the back of the photo contains a wealth of useful information, and the Cloud Vision API can help process, store, and read it. When we submit the back of the image to the API with no additional processing, the Cloud Vision API detects the text and recognises data such as dates, descriptions ('The scene in Pennsylvania Station yesterday afternoon') and context ('Pub NYT Sun 5/2/93', 'RECEIVED DEC 25 1942'). Of course, the digital text transcription isn't perfect, but it's faster and more cost-effective than alternatives for processing millions of images.
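Once the text on the back of a photo has been transcribed, stamps like the two quoted above can be turned into structured dates for search. The following is a minimal sketch in Python using only the standard library; it is not The Times' pipeline. The function name extract_dates, the two patterns, and the rule that a two-digit year means 19xx are all assumptions made for illustration, and real transcriptions would need more tolerant matching.

```python
import re
from datetime import date

# Month abbreviations as they appear on the rubber stamp in the article.
MONTHS = {"JAN": 1, "FEB": 2, "MAR": 3, "APR": 4, "MAY": 5, "JUN": 6,
          "JUL": 7, "AUG": 8, "SEP": 9, "OCT": 10, "NOV": 11, "DEC": 12}

# Hypothetical patterns modelled on the two examples quoted in the text.
RECEIVED_RE = re.compile(r"RECEIVED\s+([A-Z]{3})\s+(\d{1,2})\s+(\d{4})")
PUBLISHED_RE = re.compile(r"Pub\s+NYT\s+\w{3}\s+(\d{1,2})/(\d{1,2})/(\d{2})\b")


def extract_dates(ocr_text):
    """Return a dict with 'received' and/or 'published' dates found in the text."""
    found = {}

    match = RECEIVED_RE.search(ocr_text)
    if match and match.group(1) in MONTHS:
        month_name, day, year = match.groups()
        found["received"] = date(int(year), MONTHS[month_name], int(day))

    match = PUBLISHED_RE.search(ocr_text)
    if match:
        month, day, short_year = (int(g) for g in match.groups())
        # Assumption: two-digit years in this archive belong to the 1900s.
        found["published"] = date(1900 + short_year, month, day)

    return found


if __name__ == "__main__":
    back_of_photo = "RECEIVED DEC 25 1942\nPub NYT Sun 5/2/93"
    for label, value in extract_dates(back_of_photo).items():
        print(label, value.isoformat())
```

Keeping the patterns separate from the parsing logic makes it easy to add further stamp formats as they turn up in the archive.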

This is only the start of what's possible for companies with physical archives. They can use the Vision API to identify objects, places and images. For example, if we pass the black and white photo above through the Cloud Vision API with Logo Detection, it recognises Pennsylvania Station. Furthermore, AutoML can be used to better identify images in collections using a corpus of already captioned images.
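To make the idea of asking the Vision API for several kinds of detection more concrete, here is a rough sketch of the JSON body that the API's REST method images:annotate accepts. It only builds the request and does not send it, since a real call needs a Google Cloud project and credentials. The helper name build_annotate_request is hypothetical, and the choice of features is an assumption based on the text and logo detection described above.

```python
import base64
import json

# REST endpoint of the Cloud Vision API's annotate method.
VISION_ENDPOINT = "https://vision.googleapis.com/v1/images:annotate"


def build_annotate_request(image_bytes, feature_types):
    """Build the JSON body for one image and a list of detection features."""
    return {
        "requests": [
            {
                # Image bytes are sent inline as base64-encoded text.
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                "features": [{"type": name} for name in feature_types],
            }
        ]
    }


if __name__ == "__main__":
    # Placeholder bytes stand in for a scanned photo; a real pipeline
    # would read the scan from storage instead.
    body = build_annotate_request(
        b"placeholder image bytes",
        ["DOCUMENT_TEXT_DETECTION", "LOGO_DETECTION"],
    )
    print("POST", VISION_ENDPOINT)
    print(json.dumps(body, indent=2))
```

Requesting several features in one call means each scan only has to be uploaded once, which matters when an archive runs to millions of images.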

The Cloud Natural Language API could also be used to add semantic information to recognised text. For example, when the text "THE WAY IT WAS - Crowded Penn Station in 1942, an era when only the brave flew - to Washington, Miami and assorted way stations." is passed through the Cloud Natural Language API, it correctly identifies "Penn Station," "Washington," and "Miami" as locations, and classifies the entire sentence into the category "travel" and the subcategory "bus & rail."
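The step described above could be sketched as follows. The code prepares the request body used by the Natural Language API's analyzeEntities and classifyText REST methods, then shows how location entities and a category path might be read from the responses. No request is sent. The helper names are hypothetical, and the two response dictionaries are hand-written stubs that mirror the result reported in this article; they are not captured API output.

```python
import json

ANALYZE_ENTITIES_URL = "https://language.googleapis.com/v1/documents:analyzeEntities"
CLASSIFY_TEXT_URL = "https://language.googleapis.com/v1/documents:classifyText"

CAPTION = ("THE WAY IT WAS - Crowded Penn Station in 1942, an era when only "
           "the brave flew - to Washington, Miami and assorted way stations.")


def build_document_body(text):
    """Request body accepted by both analyzeEntities and classifyText."""
    return {"document": {"type": "PLAIN_TEXT", "content": text}}


def location_names(entities_response):
    """Names of all entities the API typed as LOCATION."""
    return [entity["name"]
            for entity in entities_response.get("entities", [])
            if entity.get("type") == "LOCATION"]


def split_category(category_name):
    """Split a category path such as '/Travel/Bus & Rail' into its levels."""
    return [part for part in category_name.split("/") if part]


# Hand-written stubs shaped like API responses (NOT real output); they
# only restate the entities and category reported in the article.
STUB_ENTITIES = {"entities": [
    {"name": "Penn Station", "type": "LOCATION"},
    {"name": "Washington", "type": "LOCATION"},
    {"name": "Miami", "type": "LOCATION"},
]}
STUB_CATEGORIES = {"categories": [{"name": "/Travel/Bus & Rail"}]}

if __name__ == "__main__":
    print(json.dumps(build_document_body(CAPTION), indent=2))
    print(location_names(STUB_ENTITIES))
    print(split_category(STUB_CATEGORIES["categories"][0]["name"]))
```

Storing the category as separate levels lets an archive search filter on the broad category alone or on the full category and subcategory pair.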

Going forward, the NYT newsroom will use the digitised archives to inspire stories for Past Tense, a body of coverage dedicated to revisiting history. The first package from The Times newsroom to utilise the digitised archives will focus on how The Times covered California in the 20th century, examining how California's free-spiritedness and culture of recreation and innovation appeared to Times journalists 3,000 miles away.

More info: cloud.google.com 

www.document-manager.com 

January/February 2019 

@DMMagAndAwards 

27
