XPathNavigatorReader
I recently worked on a project where I replaced the existing XML/XSLT engine with a faster one. The existing system took a large XML document, in some cases larger than 80 MB, and performed multiple XSL transformations against it.
It was written in VB6 and used MSXML. The main work was done by the transformNode method, but several steps were needed beforehand to reach that point, such as generating the XSLT from the main XML. These transformations usually produced tens, even hundreds, of HTML documents.
It wasn't performant, and it used a lot of memory, a consequence of DOM-based processing, especially for large documents. Processing time was typically in the minutes, and tens of minutes in some instances. As the process was an asynchronous publishing system triggered by results becoming available, performance wasn't the most important thing. The exception was when lots of requests came in at once: the system would *really* slow down, and there were occasions where a client expected some results to be available but they hadn't finished yet.
Analysing the application, I could see that a major performance hit came from loading the XML documents into memory. C# is the main language I work with these days, and I thought I'd use the XPathDocument class along with the XslTransform class. It was a simple choice, really: the process is about transforming the document, so I don't need a DOM, as I'm not altering the original XML tree in any way.
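As a rough illustration of that pairing (not the project's actual code), the sketch below loads a document into a read-only XPathDocument and runs one stylesheet over it. The sample XML, the stylesheet and the `TransformSketch` name are made up for the example, and it uses XslCompiledTransform, the Framework 2.0 successor to XslTransform, which is an assumption about what you would reach for today.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.XPath;
using System.Xml.Xsl;

public static class TransformSketch
{
    // Hypothetical sample input; the real documents were far larger.
    const string Xml =
        "<results><item>alpha</item><item>beta</item></results>";

    // Hypothetical stylesheet that turns each item into a list entry.
    const string Xslt =
        "<xsl:stylesheet version='1.0' " +
        "xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>" +
        "<xsl:output method='html' indent='no'/>" +
        "<xsl:template match='/results'>" +
        "<ul><xsl:apply-templates select='item'/></ul>" +
        "</xsl:template>" +
        "<xsl:template match='item'>" +
        "<li><xsl:value-of select='.'/></li>" +
        "</xsl:template>" +
        "</xsl:stylesheet>";

    public static string Run()
    {
        // XPathDocument is a read-only store built for XPath and XSLT,
        // so no editable DOM is constructed.
        XPathDocument doc;
        using (var reader = new StringReader(Xml))
        {
            doc = new XPathDocument(reader);
        }

        // Compile the stylesheet once; the same instance can then be
        // reused for many transformations.
        var transform = new XslCompiledTransform();
        using (XmlReader xsltReader = XmlReader.Create(new StringReader(Xslt)))
        {
            transform.Load(xsltReader);
        }

        // Run the transformation and capture the HTML as a string.
        using (var output = new StringWriter())
        {
            transform.Transform(doc, null, output);
            return output.ToString();
        }
    }

    public static void Main()
    {
        Console.WriteLine(Run());
    }
}
```

In a publishing job like the one described, the same loaded document would be passed to many compiled stylesheets, so the cost of parsing the large input is paid only once.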
My first experiments got the process down to performing about 700 XSL transforms in 27 seconds. This was a vast improvement, and memory usage was lower too. Still, I could see that a lot of time was spent moving between nodes using the MoveNext() method. Also, awkwardly, I still needed to get at the XML of certain nodes within the document, something that the XPathDocument/XPathNavigator combo doesn't really cater for easily.
I did some research and discovered the XPathNavigatorReader (in the Mvp.Xml library). Apparently, it adopts many of the features of the XML classes found in .NET Framework 2.0.
It's cool, as it allows me to move efficiently through the XPathDocument and get at the XML when I need to. Implementing this over the XPathDocument increased performance further, to about 700 transformations in 3 seconds.
Which was nice.
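To give a feel for "getting at the XML" of a node, here is a minimal sketch that uses only built-in Framework 2.0 members rather than XPathNavigatorReader itself: XPathNavigator.ReadSubtree() hands back an XmlReader over the current node, which is the same general idea of a reader layered over a navigator. The sample document and the `NodeXmlSketch` name are invented for the example, and this is not the code from the project.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.XPath;

public static class NodeXmlSketch
{
    // Hypothetical sample input.
    const string Xml =
        "<results><item id='1'>alpha</item><item id='2'>beta</item></results>";

    public static List<string> ItemMarkup()
    {
        XPathDocument doc;
        using (var reader = new StringReader(Xml))
        {
            doc = new XPathDocument(reader);
        }

        XPathNavigator nav = doc.CreateNavigator();
        var fragments = new List<string>();

        // Walk the matching nodes with the iterator's MoveNext().
        XPathNodeIterator items = nav.Select("/results/item");
        while (items.MoveNext())
        {
            // ReadSubtree() exposes the current node and its descendants
            // as an XmlReader, without copying the tree into a DOM.
            using (XmlReader sub = items.Current.ReadSubtree())
            {
                sub.MoveToContent();                 // position on <item>
                fragments.Add(sub.ReadOuterXml());   // serialise that element
            }
        }

        return fragments;
    }

    public static void Main()
    {
        foreach (string fragment in ItemMarkup())
        {
            Console.WriteLine(fragment);
        }
    }
}
```

Each entry in the returned list is the markup of one item element, which can then be handed to whatever step needs the raw XML.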