Document Metadata, The Silent Killer - PaulDotCom
Office 2007
• Changed metadata storage format to XML
• XML parsing with shell scripting is like herding cats
• New document is just a ZIP archive
• The best goodies are typically located in docProps/core.xml
• Wrote my first Perl script to extract author metadata
http://www.pauldotcom.com/2007XMLextract.pl
• Yes, the unzip step can be done in Perl as well...
unzip -e -j TestingMetadata2007.docx docProps/core.xml | perl ./2007XMLextract.pl core.xml |
tr '[:space:]' '\n' | sort | uniq > 2007users.txt
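For readers without the Perl script at hand, the same idea can be sketched in Python using only the standard library. This is not the author's 2007XMLextract.pl. It is a minimal illustration that assumes the usernames of interest live in the dc:creator and cp:lastModifiedBy elements of docProps/core.xml. The function name extract_users is hypothetical. Unlike the pipeline above, it keeps each name whole rather than splitting on whitespace.

```python
import sys
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by docProps/core.xml in Office 2007 packages.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def extract_users(docx):
    """Return sorted, de-duplicated names from a .docx file's core.xml.

    Assumption: only dc:creator and cp:lastModifiedBy are inspected.
    """
    # A .docx file is a ZIP archive, so no external unzip step is needed.
    with zipfile.ZipFile(docx) as package:
        core_xml = package.read("docProps/core.xml")

    root = ET.fromstring(core_xml)
    names = set()
    for tag in ("dc:creator", "cp:lastModifiedBy"):
        element = root.find(tag, NS)
        if element is not None and element.text and element.text.strip():
            names.add(element.text.strip())
    return sorted(names)


if __name__ == "__main__":
    # Print one name per line for every document given on the command line.
    for path in sys.argv[1:]:
        for name in extract_users(path):
            print(name)
```

Saved under a name of your choosing, it can be run against one or more documents and redirected to a file such as 2007users.txt, much like the shell pipeline.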
Wednesday, March 11, 2009