25.01.2015 Views

Working with XML in POCO.

Working with XML in POCO.

Working with XML in POCO.

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

<strong>XML</strong><br />

<strong>Work<strong>in</strong>g</strong> <strong>with</strong> <strong>XML</strong> <strong>in</strong> <strong>POCO</strong>.


Overview<br />

> <strong>XML</strong><br />

> Simple API for <strong>XML</strong> (SAX)<br />

> Document Object Model (DOM)<br />

> Creat<strong>in</strong>g <strong>XML</strong> documents


<strong>XML</strong><br />

> eXtensible Markup Language, free, open standard<br />

> is a general-purpose specification for creat<strong>in</strong>g custom markup<br />

languages<br />

> purpose is to aid <strong>in</strong>formation systems <strong>in</strong> shar<strong>in</strong>g structured data,<br />

expecially over the net<br />

> documents must be well-formed (e.g. a start tag () must<br />

always be closed by an end tag ()<br />

> valid: the document can be checked aga<strong>in</strong>st a schema<br />

(not supported by <strong>POCO</strong>)


<strong>XML</strong> – Vocabulary<br />

> <strong>XML</strong> declaration (optional)<br />

<br />

> root element<br />

the top most element start<strong>in</strong>g a <strong>XML</strong> document<br />

> elements and attributes:<br />

<br />

Element Content<br />

<br />

> Comments<br />


<strong>XML</strong> – Vocabulary (cont)<br />

> Process<strong>in</strong>g Instruction<br />

is actually <strong>in</strong>formation for the application. Not really of <strong>in</strong>terest to<br />

the <strong>XML</strong> parser (exception: <strong>XML</strong> declaration)<br />

<br />

> CDATA<br />

used to escape blocks of text conta<strong>in</strong><strong>in</strong>g characters which would<br />

otherwise be recognized as markup<br />

]]>


<strong>XML</strong> Programm<strong>in</strong>g Interfaces<br />

> <strong>POCO</strong> supports two <strong>in</strong>terfaces for work<strong>in</strong>g <strong>with</strong> (read<strong>in</strong>g and<br />

writ<strong>in</strong>g) <strong>XML</strong> data:<br />

> The Simple API for <strong>XML</strong>, Version 2<br />

> The Document Object Model


The Simple API for <strong>XML</strong> (SAX)<br />

> SAX was orig<strong>in</strong>ally a Java-only API for read<strong>in</strong>g <strong>XML</strong> data.<br />

> The API has been developed by a group of volunteers, not by an<br />

"official" standardization group.<br />

> The current version of the API is 2.0.2 (s<strong>in</strong>ce April 2004)<br />

> <strong>POCO</strong> supports a C++ variant of the orig<strong>in</strong>al Java API.<br />

> For more <strong>in</strong>formation: http://www.saxproject.org


Event-driven Pars<strong>in</strong>g<br />

> SAX is an event-driven <strong>in</strong>terface.<br />

> The <strong>XML</strong> document is not loaded <strong>in</strong>to memory as a whole for<br />

pars<strong>in</strong>g.<br />

> Instead, the parser scans the <strong>XML</strong> document, and for every <strong>XML</strong><br />

construct (element, text, process<strong>in</strong>g <strong>in</strong>struction, etc.) it f<strong>in</strong>ds, calls<br />

a certa<strong>in</strong> member function of a handler object.<br />

> SAX basically def<strong>in</strong>es the <strong>in</strong>terfaces of these handler objects, as<br />

well as the <strong>in</strong>terface you use to start and configure the parser.


startDocument()<br />

startElement()<br />

<br />

<br />

<br />

some text<br />

<br />

<br />

<br />

startElement()<br />

characters()<br />

endElement()<br />

startElement()<br />

endElement()<br />

endElement()<br />

endDocument()


SAX Interfaces<br />

> Attributes<br />

(access attributes values by <strong>in</strong>dex or name)<br />

> ContentHandler<br />

(startElement(), endElement(), characters(), ...)<br />

> DeclHandler<br />

(partly supported for report<strong>in</strong>g entity declarations)<br />

> DTDHandler<br />

(notationDecl(), unparsedEntityDecl())<br />

> LexicalHandler<br />

(startDTD(), endDTD(), startCDATA(), endCDATA(), comment())


#<strong>in</strong>clude "Poco/SAX/ContentHandler.h"<br />

class MyHandler: public Poco::<strong>XML</strong>::ContentHandler<br />

{<br />

public:<br />

MyHandler();<br />

void setDocumentLocator(const Locator* loc);<br />

void startDocument();<br />

void endDocument();<br />

void startElement(<br />

const <strong>XML</strong>Str<strong>in</strong>g& namespaceURI,<br />

const <strong>XML</strong>Str<strong>in</strong>g& localName,<br />

const <strong>XML</strong>Str<strong>in</strong>g& qname,<br />

const Attributes& attributes);<br />

void endElement(<br />

const <strong>XML</strong>Str<strong>in</strong>g& uri,<br />

const <strong>XML</strong>Str<strong>in</strong>g& localName,<br />

const <strong>XML</strong>Str<strong>in</strong>g& qname);


void characters(const <strong>XML</strong>Char ch[], <strong>in</strong>t start, <strong>in</strong>t length);<br />

void ignorableWhitespace(const <strong>XML</strong>Char ch[], <strong>in</strong>t start, <strong>in</strong>t len);<br />

void process<strong>in</strong>gInstruction(<br />

const <strong>XML</strong>Str<strong>in</strong>g& target,<br />

const <strong>XML</strong>Str<strong>in</strong>g& data);<br />

void startPrefixMapp<strong>in</strong>g(<br />

const <strong>XML</strong>Str<strong>in</strong>g& prefix,<br />

const <strong>XML</strong>Str<strong>in</strong>g& uri);<br />

void endPrefixMapp<strong>in</strong>g(const <strong>XML</strong>Str<strong>in</strong>g& prefix);<br />

};<br />

void skippedEntity(const <strong>XML</strong>Str<strong>in</strong>g& name);


Decl Handler<br />

> optional handler for DTD declarations <strong>in</strong> an <strong>XML</strong> file<br />

> Document Type Declaration<br />

> sort of simple schema language<br />

handles DTD declarations <strong>in</strong> an <strong>XML</strong> file<br />

<br />

<br />

<br />

<br />


<br />

<br />

<br />

Peter Schojer<br />

15/03/1976<br />

Male<br />

<br />


#<strong>in</strong>clude “Poco/SAX/DeclHandler.h”<br />

class MyDeclHandler: public Poco::<strong>XML</strong>::DeclHandler<br />

{<br />

public:<br />

MyDeclHandler();<br />

void attributeDecl(<br />

const <strong>XML</strong>Str<strong>in</strong>g& eName,<br />

const <strong>XML</strong>Str<strong>in</strong>g& aName,<br />

const <strong>XML</strong>Str<strong>in</strong>g* valueDefault,<br />

const <strong>XML</strong>Str<strong>in</strong>g* value);<br />

void elementDecl(const <strong>XML</strong>Str<strong>in</strong>g& name, const <strong>XML</strong>Str<strong>in</strong>g& model);<br />

void externalEntityDecl(<br />

const <strong>XML</strong>Str<strong>in</strong>g& name,<br />

const <strong>XML</strong>Str<strong>in</strong>g* publicId,<br />

const <strong>XML</strong>Str<strong>in</strong>g& systemId);<br />

};<br />

void <strong>in</strong>ternalEntityDecl(<br />

const <strong>XML</strong>Str<strong>in</strong>g& name,<br />

const <strong>XML</strong>Str<strong>in</strong>g& value);


DTDHandler<br />

> handles DTD not handled by DeclHandler<br />

> unparsed entities<br />

> notations<br />

> all reported between startDocument and first startElement


DTD Entity<br />

> variables used to def<strong>in</strong>e shortcuts to text<br />

> either as <strong>in</strong>ternal entity<br />

<br />

&owner;<br />

> or as an external entity<br />


DTD Notation<br />

> allows you to <strong>in</strong>clude data <strong>in</strong> your <strong>XML</strong> file which is not <strong>XML</strong><br />

<br />

<br />

>


#<strong>in</strong>clude “Poco/SAX/DTDHandler.h”<br />

class MyDTDHandler: public Poco::<strong>XML</strong>::DTDHandler<br />

{<br />

public:<br />

MyDTDHandler();<br />

void notationDecl(<br />

const <strong>XML</strong>Str<strong>in</strong>g& name,<br />

const <strong>XML</strong>Str<strong>in</strong>g* publicId,<br />

const <strong>XML</strong>Str<strong>in</strong>g* systemId);<br />

};<br />

void unparsedEntityDecl(<br />

const <strong>XML</strong>Str<strong>in</strong>g& name,<br />

const <strong>XML</strong>Str<strong>in</strong>g* publicId,<br />

const <strong>XML</strong>Str<strong>in</strong>g& systemId,<br />

const <strong>XML</strong>Str<strong>in</strong>g& notationName);


LexicalHandler<br />

> optional extension handler for SAX to provide lexical <strong>in</strong>formation<br />

about an <strong>XML</strong> document<br />

> comments<br />

> CDATA sections<br />

> reports starts and end of DTD sections and entities


#<strong>in</strong>clude “Poco/SAX/LexicalHandler.h”<br />

class MyLexicalHandler: public Poco::<strong>XML</strong>::LexicalHandler {<br />

public:<br />

MyLexicalHandler();<br />

void startDTD(<br />

const <strong>XML</strong>Str<strong>in</strong>g& name,<br />

const <strong>XML</strong>Str<strong>in</strong>g& publicId,<br />

const <strong>XML</strong>Str<strong>in</strong>g& systemId);<br />

void endDTD();<br />

void startEntity(const <strong>XML</strong>Str<strong>in</strong>g& name);<br />

void endEntity(const <strong>XML</strong>Str<strong>in</strong>g& name);<br />

void startCDATA();<br />

void endCDATA();<br />

};<br />

void comment(const <strong>XML</strong>Char ch[], <strong>in</strong>t start, <strong>in</strong>t length);


SAX Parser Configuration<br />

> <strong>XML</strong>Reader def<strong>in</strong>es the <strong>in</strong>terface of the parser.<br />

> Methods for register<strong>in</strong>g handlers<br />

(setContentHandler(), etc.)<br />

> Methods for parser configuration:<br />

> setFeature(), getFeature()<br />

e.g. for enabl<strong>in</strong>g/disabl<strong>in</strong>g namespaces support<br />

> setProperty(), getProperty()<br />

e.g. for register<strong>in</strong>g LexicalHandler, DeclHandler


SAX Namespaces Support<br />

feature<br />

namespaces<br />

feature<br />

namespaceprefixes<br />

Namespace<br />

URI<br />

Local<br />

Name<br />

QName<br />

false false – – ✔<br />

true false ✔ ✔ –<br />

true true ✔ ✔ ✔<br />

false true – – ✔


class MyHandler: public ContentHandler, public LexicalHandler<br />

{<br />

[...]<br />

};<br />

MyHandler handler;<br />

SAXParser parser;<br />

parser.setFeature(<strong>XML</strong>Reader::FEATURE_NAMESPACES, true);<br />

parser.setFeature(<strong>XML</strong>Reader::FEATURE_NAMESPACE_PREFIXES, true);<br />

parser.setContentHandler(&handler);<br />

parser.setProperty(<strong>XML</strong>Reader::PROPERTY_LEXICAL_HANDLER,<br />

static_cast(&handler));<br />

try<br />

{<br />

parser.parse(“test.xml”);<br />

}<br />

catch (Poco::Exception& e)<br />

{<br />

std::cerr


The Document Object Model<br />

> The Document Object Model is an API specified by the World<br />

Wide Web Consortium (W3C)<br />

> DOM uses a tree representation of the <strong>XML</strong> document<br />

> The entire document has to be loaded <strong>in</strong>to memory<br />

> You can modify the <strong>XML</strong> document directly


<br />

<br />

some text<br />

<br />

<br />

<br />

Document<br />

Element<br />

Text<br />

Element<br />

Element<br />

Attr<br />

Attr


EventTarget<br />

Node<br />

Document Element CharacterData Process<strong>in</strong>gInstruction<br />

Text<br />

Comment<br />

CDATASection


Navigat<strong>in</strong>g the DOM<br />

> Node has<br />

> parentNode()<br />

> firstChild(), lastChild()<br />

> nextSibl<strong>in</strong>g(), previousSibl<strong>in</strong>g()<br />

> NodeIterator for document-order traversal:<br />

nextNode(), previousNode()<br />

> TreeWalker for arbitraty navigation:<br />

parentNode(), firstChild(), lastChild(), etc.<br />

> NodeIterator and TreeWalker support node filter<strong>in</strong>g


Memory Management <strong>in</strong> the DOM<br />

> DOM Nodes are reference counted.<br />

> If you create a new node and add it to a document, the<br />

document <strong>in</strong>crements its reference count. So use an AutoPtr.<br />

> You only get ownership of non-tree objects implement<strong>in</strong>g the<br />

NamedNodeMap and NodeList <strong>in</strong>terface.<br />

You have to release them (or use an AutoPtr).<br />

> The document keeps ownership of nodes you remove from the<br />

tree. These nodes end up <strong>in</strong> the document's AutoReleasePool.


#<strong>in</strong>clude "Poco/DOM/DOMParser.h"<br />

#<strong>in</strong>clude "Poco/DOM/Document.h"<br />

#<strong>in</strong>clude "Poco/DOM/NodeIterator.h"<br />

#<strong>in</strong>clude "Poco/DOM/NodeFilter.h"<br />

#<strong>in</strong>clude "Poco/DOM/AutoPtr.h"<br />

#<strong>in</strong>clude "Poco/SAX/InputSource.h"<br />

[...]<br />

std::ifstream <strong>in</strong>(“test.xml”);<br />

Poco::<strong>XML</strong>::InputSource src(<strong>in</strong>);<br />

Poco::<strong>XML</strong>::DOMParser parser;<br />

Poco::AutoPtr pDoc = parser.parse(&src);<br />

Poco::<strong>XML</strong>::NodeIterator it(pDoc, Poco::<strong>XML</strong>::NodeFilter::SHOW_ELEMENTS);<br />

Poco::<strong>XML</strong>::Node* pNode = it.nextNode();<br />

while (pNode)<br />

{<br />

std::cout


Creat<strong>in</strong>g <strong>XML</strong> Documents<br />

> You can create an <strong>XML</strong> document by:<br />

> build<strong>in</strong>g a DOM document from scratch, or<br />

> by us<strong>in</strong>g the <strong>XML</strong>Writer class,<br />

> or by generat<strong>in</strong>g the <strong>XML</strong> yourself.<br />

> <strong>XML</strong>Writer supports a SAX <strong>in</strong>terface for generat<strong>in</strong>g <strong>XML</strong> data.


#<strong>in</strong>clude "Poco/DOM/Document.h"<br />

#<strong>in</strong>clude "Poco/DOM/Element.h"<br />

#<strong>in</strong>clude "Poco/DOM/Text.h"<br />

#<strong>in</strong>clude "Poco/DOM/AutoPtr.h" //typedef to Poco::AutoPtr<br />

#<strong>in</strong>clude "Poco/DOM/DOMWriter.h"<br />

#<strong>in</strong>clude "Poco/<strong>XML</strong>/<strong>XML</strong>Writer.h"<br />

us<strong>in</strong>g namespace Poco::<strong>XML</strong>;<br />

AutoPtr pDoc = new Document;<br />

AutoPtr pRoot = pDoc->createElement("root");<br />

pDoc->appendChild(pRoot);<br />

AutoPtr pChild1 = pDoc->createElement("child1");<br />

AutoPtr pText1 = pDoc->createTextNode("text1");<br />

pChild1->appendChild(pText1);<br />

pRoot->appendChild(pChild1);<br />

AutoPtr pChild2 = pDoc->createElement("child2");<br />

AutoPtr pText2 = pDoc->createTextNode("text2");<br />

pChild2->appendChild(pText2);<br />

pRoot->appendChild(pChild2);<br />

DOMWriter writer;<br />

writer.setNewL<strong>in</strong>e("\n");<br />

writer.setOptions(<strong>XML</strong>Writer::PRETTY_PRINT);<br />

writer.writeNode(std::cout, pDoc);


#<strong>in</strong>clude "Poco/<strong>XML</strong>/<strong>XML</strong>Writer.h"<br />

#<strong>in</strong>clude "Poco/SAX/AttributesImpl.h"<br />

std::ofstream str(“test.xml”)<br />

<strong>XML</strong>Writer writer(str, <strong>XML</strong>Writer::WRITE_<strong>XML</strong>_DECLARATION |<br />

<strong>XML</strong>Writer::PRETTY_PRINT);<br />

writer.setNewL<strong>in</strong>e("\n");<br />

writer.startDocument();<br />

AttributesImpl attrs;<br />

attrs.addAttribute("", "", "a1", "", "v1");<br />

attrs.addAttribute("", "", "a2", "", "v2");<br />

writer.startElement("urn:mynamespace", "root", "", attrs);<br />

writer.startElement("", "", "sub");<br />

writer.endElement("", "", "sub");<br />

writer.endElement("urn:mynamespace", "root", "");<br />

writer.endDocument();


Copyright © 2006-2010 by Applied Informatics Software Eng<strong>in</strong>eer<strong>in</strong>g GmbH.<br />

Some rights reserved.<br />

www.app<strong>in</strong>f.com | <strong>in</strong>fo@app<strong>in</strong>f.com<br />

T +43 4253 32596 | F +43 4253 32096

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!