27.12.2013 Views

python.pdf



and libxslt installed and run "python setup.py build install" in the module tree.

The distribution includes a set of examples and regression tests for the Python bindings in the python/tests directory. Here are some excerpts from those tests:

tst.py:

This is a basic test of the file interface an[…]

import sys
import libxml2

# Parse the file; the document node is named after it.
doc = libxml2.parseFile("tst.xml")
if doc.name != "tst.xml":
    print("doc.name failed")
    sys.exit(1)
# The first child of the document is the root element.
root = doc.children
if root.name != "doc":
    print("root.name failed")
    sys.exit(1)
# Descend one more level to the first child of the root.
child = root.children
if child.name != "foo":
    print("child.name failed")
    sys.exit(1)

doc.freeDoc()

The Python module is called libxml2; parseFile is the equivalent of xmlParseFile (most of the bindings are automatically generated: the xml prefix is removed and the casing convention is kept). All nodes seen at the binding level share the same subset of accessors:

name: returns the n[…]
type: returns a string indicating the node type
content: returns the content of the node; it is based on xmlNodeGetContent() and hence is recursive.
parent, children, last, next, prev, doc, properties: point to the associated element in the tree; these may return None if no such link exists.
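Because the link accessors return None when no such link exists, walking the direct children of a node comes down to following children once and then next until None appears. The sketch below is an illustration, not code from the distribution. Node is a hypothetical stand-in that only mimics the accessor names listed above, so the loop runs without libxml2 installed, and iter_children is an invented helper name.

```python
class Node:
    """Hypothetical stand-in exposing the same link names as a libxml2 node."""

    def __init__(self, name):
        self.name = name
        self.children = None  # first child, or None when there is none
        self.next = None      # next sibling, or None when there is none


def iter_children(node):
    """Yield each direct child by following children, then next, until None."""
    child = node.children
    while child is not None:
        yield child
        child = child.next


# Build a tiny tree by hand: <doc> with two children, <foo> and <bar>.
root = Node("doc")
foo, bar = Node("foo"), Node("bar")
root.children = foo
foo.next = bar

names = [child.name for child in iter_children(root)]
leaf_children = list(iter_children(bar))
```

The same loop should carry over to a node returned by libxml2.parseFile(), assuming the accessors behave as described in the list above.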

Also note the need to explicitly deallocate documents with freeDoc(). Reference counting for libxml2 trees would need quite a lot of work to function properly, and rather than risk memory leaks if it is not implemented correctly, it seems safer to have an explicit function to free a tree. The wrapper Python objects like doc, root or child are then automatically garbage collected.

validate.py:

This test checks the validation interfaces and redirectio[…] messages:

import libxml2

# Deactivate error messages from the validation.
def noerr(ctx, msg):
    pass

libxml2.registerErrorHandler(noerr, None)

# Create a parser context and turn validation on before parsing.
ctxt = libxml2.createFileParserCtxt("invalid.xml")
ctxt.validate(1)
ctxt.parseDocument()
doc = ctxt.doc()
valid = ctxt.isValid()
doc.freeDoc()
# The input is invalid, so isValid() is expected to return 0.
if valid != 0:

    print("validity check failed")

The first thing to notice is the call to registerErr[…] defines a new error handler global to the library. It is used to avoid seeing the error messages when trying to validate the invalid document.

The main interest of th[…] createFileParserCtxt() and how the behaviour can be changed before calling parseDocument(). Similarly, the information resulting from the parsing phase is also available using context methods.

Contexts, like nodes, are defined as classes and t[…] C function interfaces in terms of object methods as much as possible. The best way to get a complete view of what methods are supported is to look at the libxml2.py module containing all the wrappers.

push.py:

This test shows how to ac[…]


import libxml2

# No SAX handler is given, so the parser builds a document from the pushed data.
ctxt = libxml2.createPushParser(None, "<foo", 4, "test.xml")
ctxt.parseChunk("/>", 2, 1)
doc = ctxt.doc()

doc.freeDoc()

The context is created with a special call based on the xmlCreatePushParser() from the C library. The first argument is an optional SAX callback object, then the initial set of data, the length and the name of the resource in case URI-References need to be computed by the parser.

Then the data are[…] setting the third argument terminate to 1.

pushSAX.py:

This test shows the use of[…] the parser does not build a document, but provides callback information as the parser makes progress analyzing the data being provided:

import libxml2

log = ""

class callback:
    # Each SAX callback appends a record of the event to the global log.
    def startDocument(self):
        global log
        log = log + "startDocument:"

    def endDocument(self):
        global log
        log = log + "endDocument:"

    def startElement(self, tag, attrs):
        global log
        log = log + "startElement %s %s:" % (tag, attrs)

    def endElement(self, tag):
        global log
        log = log + "endElement %s:" % (tag)

    def characters(self, data):
        global log
        log = log + "characters: %s:" % (data)

    def warning(self, msg):
        global log
        log = log + "warning: %s:" % (msg)

    def error(self, msg):
        global log
        log = log + "error: %s:" % (msg)

    def fatalError(self, msg):
        global log
        log = log + "fatalError: %s:" % (msg)

handler = callback()

# Push the document in three pieces; the text "bar" is split across two chunks.
ctxt = libxml2.createPushParser(handler, "<foo", 4, "test.xml")
chunk = " url='tst'>b"
ctxt.parseChunk(chunk, len(chunk), 0)
chunk = "ar</foo>"
ctxt.parseChunk(chunk, len(chunk), 1)

reference = "startDocument:startElement foo {'url': 'tst'}:" + \
            "characters: bar:endElement foo:endDocument:"
if log != reference:
    print("Error got: %s" % log)

    print("Expected: %s" % reference)

The key object in that test is the handler, it pr[…] points which can be called by the parser as it makes progress to indicate the information set obtained. The full set of callbacks is larger than what the callback class in that specific example implements (see the SAX definition for a complete list). The wrapper will only call those supplied by the object when activated. The startElement callback receives the name of the element and a dictionary containing the attributes carried by this element.

Also note that the r[…] single character call even though the string "bar" is passed to the parser from 2 different calls to parseChunk().

xpath.py:

This is a basic test of XPath wr[…]

import sys
import libxml2

doc = libxml2.parseFile("tst.xml")
ctxt = doc.xpathNewContext()
# "//*" selects every element; the test expects exactly <doc> and <foo>.
res = ctxt.xpathEval("//*")
if len(res) != 2:
    print("xpath query: wrong node set size")
    sys.exit(1)
if res[0].name != "doc" or res[1].name != "foo":
    print("xpath query: wrong node set value")
    sys.exit(1)
# The node set is no longer needed, so the document can be freed.
doc.freeDoc()

ctxt.xpathFreeContext()

This test parses a file, then creates an XPath context to evalu[…] expression on it. The xpathEval() method executes an XPath query and returns the result mapped in a Python way. Strings and numbers are natively converted, and node sets are returned as a tuple of libxml2 Python node wrappers. Like the document, the XPath context needs to be freed explicitly. Also note that the result of the XPath query may point back to the document tree, and hence the document must be freed only after the result of the query has been used.

xpathext.py:

T[…] python:

import libxml2

# Extension function: receives the XPath context and one argument.
def foo(ctx, x):
    return x + 1

doc = libxml2.parseFile("tst.xml")
ctxt = doc.xpathNewContext()
# Make foo() callable from XPath expressions evaluated in this context.
libxml2.registerXPathFunction(ctxt._o, "foo", None, foo)
res = ctxt.xpathEval("foo(1)")
if res != 2:
    print("xpath extension failure")
doc.freeDoc()

ctxt.xpathFreeContext()

Note how the extension function is registered with the context[…] part is not yet finalized, this may change slightly in the future).

tstxpath.py:

[…] calls dumpMemory() which saves that list in a .memdump file.
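The excerpts above release documents and XPath contexts by hand with freeDoc() and xpathFreeContext(), so an early exit or an exception between allocation and release would skip the cleanup. One way to make the release unconditional is a small context manager. This is only a sketch and not part of the bindings: freeing is an invented helper name, and FakeDoc is a hypothetical stand-in that lets the example run without libxml2 installed.

```python
from contextlib import contextmanager


@contextmanager
def freeing(obj, free_method="freeDoc"):
    """Yield obj, then call obj.<free_method>() even if the body raises."""
    try:
        yield obj
    finally:
        getattr(obj, free_method)()


class FakeDoc:
    """Hypothetical stand-in for a libxml2 document, used only for this demo."""

    def __init__(self):
        self.freed = False

    def freeDoc(self):
        self.freed = True


# Simulate a failure while the document is in use.
demo = FakeDoc()
try:
    with freeing(demo) as doc:
        raise ValueError("simulated failure while using the document")
except ValueError:
    pass

# Normal path: the document stays usable inside the block.
other = FakeDoc()
with freeing(other) as doc:
    freed_inside_block = other.freed
```

With the real bindings the same helper would presumably be used as `with freeing(libxml2.parseFile("tst.xml")) as doc:`, and as `freeing(ctxt, "xpathFreeContext")` for an XPath context. Nesting the context block inside the document block releases the context first and the document last.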
