
Document Metadata, The Silent Killer - PaulDotCom


Office 2007

• Changed metadata storage format to XML
• XML parsing with shell scripting is like herding cats
• The new document format is just a ZIP archive
• The best goodies are typically located in docProps/core.xml
• Wrote my first Perl script to extract author metadata
  http://www.pauldotcom.com/2007XMLextract.pl
• Yes, the unzipping can be done in Perl as well...

unzip -e -j TestingMetadata2007.docx docProps/core.xml | perl ./2007XMLextract.pl core.xml | tr '[:space:]' '\n' | sort | uniq > 2007users.txt
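The unzip-and-parse steps above can also be done in a single script. What follows is a minimal sketch in Python, not the 2007XMLextract.pl script linked on the slide (its contents are not shown here). The function name `extract_users` is hypothetical, and the sketch assumes the user names of interest are the `dc:creator` and `cp:lastModifiedBy` elements of docProps/core.xml.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by docProps/core.xml in Office 2007 (OOXML) packages.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def extract_users(docx_path):
    """Return sorted, de-duplicated author names from a .docx file.

    Hypothetical helper: it reads only dc:creator and cp:lastModifiedBy,
    which is an assumption about which fields matter.
    """
    # A .docx is a ZIP archive, so core.xml can be read without unpacking to disk.
    with zipfile.ZipFile(docx_path) as package:
        core_xml = package.read("docProps/core.xml")

    root = ET.fromstring(core_xml)
    names = set()
    for tag in ("dc:creator", "cp:lastModifiedBy"):
        element = root.find(tag, NS)
        # Skip fields that are missing or empty.
        if element is not None and element.text and element.text.strip():
            names.add(element.text.strip())
    return sorted(names)
```

Calling `extract_users("TestingMetadata2007.docx")` would return a list of names that can be written to 2007users.txt one per line. Unlike the `tr '[:space:]' '\n'` step in the pipeline, this sketch keeps multi-word names intact rather than splitting them on whitespace.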

Wednesday, March 11, 2009
