and libxslt installed and run "python setup.py build install" in the
module tree.

The distribution includes a set of examples and regression tests for the
python bindings in the python/tests directory. Here are some excerpts from
those tests:

tst.py: This is a basic test of the file interface an[...]

    import sys
    import libxml2

    # parse the file and check the document and node names
    doc = libxml2.parseFile("tst.xml")
    if doc.name != "tst.xml":
        print("doc.name failed")
        sys.exit(1)
    root = doc.children
    if root.name != "doc":
        print("root.name failed")
        sys.exit(1)
    child = root.children
    if child.name != "foo":
        print("child.name failed")
        sys.exit(1)
    # documents must be freed explicitly
    doc.freeDoc()

The Python module is called libxml2; parseFile is the equivalent of
xmlParseFile (most of the bindings are automatically generated: the xml
prefix is removed and the casing convention is kept). All nodes seen at the
binding level share the same subset of accessors:

name : returns the n[...]
type : returns a string indicating the node type
content : returns the content of the node; it is based on
  xmlNodeGetContent() and hence is recursive.
parent, children, last, next, prev, doc, properties : point to the
  associated element in the tree; these may return None if no such link
  exists.
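
As an illustration of how these accessors combine, here is a minimal sketch of a depth-first walk that relies only on name, type, children and next, and on the fact that a missing link is None. The names walk and make_node are hypothetical and not part of the bindings; the demo uses plain stand-in objects so that it runs without libxml2 installed.

```python
from types import SimpleNamespace


def walk(node, depth=0):
    """Yield (depth, type, name) for node, its following siblings and
    all of their descendants, in document order."""
    while node is not None:
        yield depth, node.type, node.name
        # children points to the first child, or None for a leaf
        for item in walk(node.children, depth + 1):
            yield item
        # next points to the following sibling, or None at the end
        node = node.next


def make_node(name, type="element", children=None, next=None):
    """Hypothetical stand-in exposing the same accessor names as a node."""
    return SimpleNamespace(name=name, type=type, children=children, next=next)


# Stand-in tree: <doc><foo/><bar/></doc>
bar = make_node("bar")
foo = make_node("foo", next=bar)
root = make_node("doc", children=foo)

for depth, node_type, name in walk(root):
    print("%s%s (%s)" % ("  " * depth, name, node_type))
```

With the real bindings, the same generator would presumably be started from doc.children of a parsed document; that usage is an assumption and is not exercised here.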

Also note the need to explicitly deallocate documents with freeDoc().
Reference counting for libxml2 trees would need quite a lot of work to
function properly, and rather than risk memory leaks if not implemented
correctly it sounds safer to have an explicit function to free a tree. The
wrapper python objects like doc, root or child are then automatically garbage
collected.

validate.py: This test checks the validation interfaces and redirectio[...]
messages:

    import libxml2

    # deactivate error messages from the validation
    def noerr(ctx, msg):
        pass

    libxml2.registerErrorHandler(noerr, None)

    ctxt = libxml2.createFileParserCtxt("invalid.xml")
    ctxt.validate(1)
    ctxt.parseDocument()
    doc = ctxt.doc()
    valid = ctxt.isValid()
    doc.freeDoc()
    # invalid.xml is expected to fail validation
    if valid != 0:
        print("validity check failed")

The first thing to notice is the call to registerErr[...] defines a new error
handler global to the library. It is used to avoid seeing the error messages
when trying to validate the invalid document.

The main interest of th[...] createFileParserCtxt() and how the behaviour can
be changed before calling parseDocument(). Similarly, the information
resulting from the parsing phase is also available using context methods.

Contexts, like nodes, are defined as class and t[...] C function interfaces
in terms of object methods as much as possible. The best way to get a
complete view of what methods are supported is to look at the libxml2.py
module containing all the wrappers.

push.py: This test shows how to ac[...]

    import libxml2

    ctxt = libxml2.createPushParser(None, "<foo", 4, "test.xml")
    ctxt.parseChunk("/>", 2, 1)
    doc = ctxt.doc()
    doc.freeDoc()

The context is created with a special call based on xmlCreatePushParser()
from the C library. The first argument is an optional SAX callback object,
followed by the initial set of data, its length, and the name of the resource
in case URI-References need to be computed by the parser.

Then the data are[...] setting the third argument terminate to 1.

pushSAX.py: this test shows the use of[...] the parser does not build a
document, but provides callback information as the parser makes progress
analyzing the data being provided:

    import libxml2

    log = ""

    class callback:
        def startDocument(self):
            global log
            log = log + "startDocument:"

        def endDocument(self):
            global log
            log = log + "endDocument:"

        def startElement(self, tag, attrs):
            global log
            log = log + "startElement %s %s:" % (tag, attrs)

        def endElement(self, tag):
            global log
            log = log + "endElement %s:" % (tag)

        def characters(self, data):
            global log
            log = log + "characters: %s:" % (data)

        def warning(self, msg):
            global log
            log = log + "warning: %s:" % (msg)

        def error(self, msg):
            global log
            log = log + "error: %s:" % (msg)

        def fatalError(self, msg):
            global log
            log = log + "fatalError: %s:" % (msg)

    handler = callback()

    # feed the document in three pieces; the last call sets terminate to 1
    ctxt = libxml2.createPushParser(handler, "<foo", 4, "test.xml")
    chunk = " url='tst'>b"
    ctxt.parseChunk(chunk, len(chunk), 0)
    chunk = "ar</foo>"
    ctxt.parseChunk(chunk, len(chunk), 1)

    reference = "startDocument:startElement foo {'url': 'tst'}:" + \
                "characters: bar:endElement foo:endDocument:"
    if log != reference:
        print("Error got: %s" % log)
        print("Expected: %s" % reference)

The key object in that test is the handler, it pr[...] points which can be
called by the parser as it makes progress to indicate the information set
obtained. The full set of callbacks is larger than what the callback class in
that specific example implements (see the SAX definition for a complete
list). The wrapper will only call those supplied by the object when
activated. The startElement callback receives the name of the element and a
dictionary containing the attributes carried by this element.

Also note that the r[...] single character call even though the string "bar"
is passed to the parser from 2 different calls to parseChunk().

xpath.py: This is a basic test of XPath wr[...]

    import sys
    import libxml2

    doc = libxml2.parseFile("tst.xml")
    ctxt = doc.xpathNewContext()
    res = ctxt.xpathEval("//*")
    if len(res) != 2:
        print("xpath query: wrong node set size")
        sys.exit(1)
    if res[0].name != "doc" or res[1].name != "foo":
        print("xpath query: wrong node set value")
        sys.exit(1)
    # free the document only after the query result has been used
    doc.freeDoc()
    ctxt.xpathFreeContext()

This test parses a file, then creates an XPath context to evalu[...]
expression on it. The xpathEval() method executes an XPath query and returns
the result mapped in a Python way. Strings and numbers are natively
converted, and node sets are returned as a tuple of libxml2 Python node
wrappers. Like the document, the XPath context needs to be freed explicitly.
Also note that the result of the XPath query may point back to the document
tree, and hence the document must be freed after the result of the query is
used.

xpathext.py: T[...] python:

    import libxml2

    # extension function callable from XPath as foo()
    def foo(ctx, x):
        return x + 1

    doc = libxml2.parseFile("tst.xml")
    ctxt = doc.xpathNewContext()
    libxml2.registerXPathFunction(ctxt._o, "foo", None, foo)
    res = ctxt.xpathEval("foo(1)")
    if res != 2:
        print("xpath extension failure")
    doc.freeDoc()
    ctxt.xpathFreeContext()

Note how the extension function is registered with the context[...] part is
not yet finalized, this may change slightly in the future).

tstxpath.py: [...] calls dumpMemory() which saves that list in a .memdump file.
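
Since the excerpts above free documents and XPath contexts by hand, one possible way to make sure the release call still happens when an exception is raised is a small context manager. This is only a sketch: freeing and FakeDoc are hypothetical names that are not part of the bindings, and the stand-in class is used so that the example runs without libxml2 installed.

```python
from contextlib import contextmanager


@contextmanager
def freeing(obj, method="freeDoc"):
    """Yield obj, then call its release method even if the body raises."""
    try:
        yield obj
    finally:
        getattr(obj, method)()


class FakeDoc(object):
    """Hypothetical stand-in for a parsed document; records the freeDoc() call."""

    def __init__(self):
        self.freed = False

    def freeDoc(self):
        self.freed = True


doc = FakeDoc()
try:
    with freeing(doc):
        raise ValueError("simulated failure while using the tree")
except ValueError:
    pass

print(doc.freed)  # prints True: freeDoc() ran despite the exception
```

With the real bindings this would presumably wrap the result of libxml2.parseFile(), and an XPath context could be handled the same way by passing "xpathFreeContext" as the method name. Any query results still have to be used before the document is freed, as noted for xpath.py.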