XML_SPLIT

LANSA Composer

This activity can split an XML document into multiple documents at a specified element name, with certain constraints and caveats as further described below.

For most routine XML document processing, this approach is NOT necessary or recommended.  This activity is provided primarily for use in cases involving XML documents that are so exceptionally large that they cannot be efficiently processed in one step due to memory constraints.  An exceptionally large XML file will still take a long time to process with this activity.  You should verify the performance with your files and in your operating environment before designing a solution that relies on this activity.

In order to optimise the activity, it processes the source XML document without reference to any associated schema or DTD.  The split documents created by this activity should be well-formed (*) XML (provided the source XML document is well-formed).  However, it cannot be guaranteed that the split documents are valid (*) with reference to the source document schema or DTD.  In addition, depending on the structure of the source document and upon the split element specified, the split documents may not always contain all the available data in the source document.

(*) This description assumes you understand the distinction between the terms "well-formed" and "valid" in relation to XML documents.  If in doubt, please refer to appropriate XML references.
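
As an illustration of the distinction, the following Python sketch checks only whether a document is well-formed, which is the property the split documents are expected to have.  It is not part of LANSA Composer; the function name is_well_formed is hypothetical, and the check uses Python's standard library parser, which does not validate against a schema or DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as well-formed XML.

    This does NOT check validity against any schema or DTD.
    """
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Properly nested tags: well-formed.
good = "<Orders><Order id='1'/></Orders>"
# <Order> is never closed: not well-formed.
bad = "<Orders><Order></Orders>"

results = (is_well_formed(good), is_well_formed(bad))
```

A document that passes this check may still be invalid, for example if the schema requires an element that a split document no longer contains.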

In addition, please understand that XML is capable of representing an infinite variety of document structures.  This activity may not be capable of splitting every source document in the way that you intended or expected.  In order to keep the activity simple to use while enabling it to be useful for the largest range of likely scenarios, the activity makes certain assumptions about how to split documents and what antecedent data to retain or duplicate in the split documents.  You should thoroughly test it with representative samples of the XML documents with which you intend to use it before designing a solution that relies on this activity.

In addition, you must be aware of the following further limitations and caveats:

  • The DOCTYPE of the source XML document (for XML documents that reference a DTD) is NOT output to the split documents.
  • Certain other parts of an XML document may similarly NOT be output to the split documents, in particular CDATA sections.
  • White space (including line feeds and carriage returns) in the source XML document will be lost from the split documents.  In most cases, this simply makes the XML less readable when viewed with a non-XML-aware text editor or viewer.
  • Finally, it should be evident that the XML_SPLIT activity cannot and does not adjust any XML element or attribute values that represent an aggregation of other data in the source XML document.  For example, if an <ORDERS> element includes values that represent an aggregation of the order lines contained within it (for example, a count of the number of order lines or a total of the order quantities or values), then that data would be repeated without modification in each split document that arises from that <ORDERS> element.

The output split document paths and names are generated according to the values specified or assumed for the XMLSPLITPATH and XMLSPLITROOT parameters.  If documents of the generated name(s) exist in the target location, they will be replaced by this activity.

INPUT Parameters:

XMLFILE : Required

This parameter specifies the path and name of the XML document file to be split.

XMLSPLITELEMENT : Optional

This parameter may specify the name of the XML element at which the source XML document is to be split.  If not specified, a default of *FIRST is assumed.  *FIRST means the name of the first XML element INSIDE the root element (i.e. NOT the root element itself) is used as the split element name.
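
To preview which element name *FIRST would be likely to select for a given file, a streaming check along the following lines may help.  This is a minimal Python sketch, not the activity's own implementation; the function name first_child_element_name is hypothetical.  It stops reading as soon as the first element inside the root is seen, so it does not load a large file into memory.

```python
import io
import xml.etree.ElementTree as ET
from typing import Optional


def first_child_element_name(xml_source) -> Optional[str]:
    """Return the tag of the first element inside the root element.

    xml_source may be a file name or a binary file object.
    Returns None if the root element contains no child elements.
    """
    depth = 0
    for event, elem in ET.iterparse(xml_source, events=("start", "end")):
        if event == "start":
            depth += 1
            if depth == 2:  # depth 1 is the root; depth 2 is its first child
                return elem.tag
        else:
            depth -= 1
    return None


sample = io.BytesIO(b"<Orders><Header/><Order/><Order/></Orders>")
first_name = first_child_element_name(sample)  # "Header"
```

In this sample the first element inside the root is <Header>, not <Order>, which shows why it can be safer to specify the split element name explicitly.  Note that ElementTree reports namespaced tags in the form '{uri}name'.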

Note that, for convenience, the element name comparisons are performed without regard to case.  (In strict XML terms, this is potentially ambiguous as XML element names are case-sensitive.)  Also, the undecorated element name (that is, the name excluding any namespace qualifier) is used for comparisons.
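
The comparison rule described above can be sketched in Python as follows.  This is an assumption about how such a comparison might be written, not the activity's actual code; the names local_name and matches_split_element are hypothetical.

```python
def local_name(tag: str) -> str:
    """Strip any namespace qualifier from an element name.

    Handles both the '{uri}Order' form used by ElementTree and the
    'prefix:Order' form that appears in the XML text.
    """
    if tag.startswith("{"):
        tag = tag.split("}", 1)[-1]
    return tag.rsplit(":", 1)[-1]


def matches_split_element(tag: str, split_name: str) -> bool:
    """Compare undecorated element names without regard to case."""
    return local_name(tag).casefold() == local_name(split_name).casefold()


checks = [
    matches_split_element("ns:Order", "ORDER"),                       # True
    matches_split_element("{http://example.com/ns}order", "Order"),   # True
    matches_split_element("OrderLine", "Order"),                      # False
]
```

Under this rule, <Order> and <order> are treated as the same split element even though XML regards them as different names.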

Each instance of an element (other than the root element) that matches the specified name (and that is not contained within another instance of an element with the same name) may trigger a new split XML document, according to the value specified for the XMLSPLITELEMENTSMAX parameter.  Each split document will contain an XML structure that includes:

  • all the antecedent elements of the split element and all preceding elements that they contain
  • PRECEDING sibling elements, i.e. elements with the same parent element that precede the split element (other than preceding instances of the split element itself)
  • the instance(s) of the split element and all elements that they contain.

XMLSPLITELEMENTSMAX : Optional

This parameter may specify the maximum number of split element instances per split document.  If not specified, a default of 1 is assumed.  The activity will place "sibling" instances of the split element into the same split document, up to the number specified by this parameter.  When the parent element is closed, the split document is closed whether or not the maximum is reached.  Further instances of the split element will trigger another split document for which the count restarts at one.
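
The grouping behaviour described above can be sketched in Python as follows.  This is a simplified model, not the activity's implementation: it assumes the split element instances have already been collected per parent element, and the function name group_split_instances is hypothetical.

```python
from typing import Iterable, List, Sequence


def group_split_instances(
    instances_by_parent: Iterable[Sequence[str]],
    elements_max: int = 1,
) -> List[List[str]]:
    """Group split element instances into split documents.

    Each inner sequence holds the sibling instances under one parent.
    A document is closed when it reaches elements_max instances or when
    the parent ends, whichever comes first.
    """
    if elements_max < 1:
        raise ValueError("elements_max must be at least 1")
    documents: List[List[str]] = []
    for siblings in instances_by_parent:
        for start in range(0, len(siblings), elements_max):
            documents.append(list(siblings[start:start + elements_max]))
    return documents


# Parent A holds three instances, parent B holds one; maximum of 2 per document.
docs = group_split_instances([["A1", "A2", "A3"], ["B1"]], elements_max=2)
# docs == [["A1", "A2"], ["A3"], ["B1"]]
```

In this example the second document contains only one instance because its parent element closed before the maximum was reached.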

XMLSPLITPATH : Optional

This parameter may specify the path in which the split XML document files are to be created.  If not specified, a default of *SAME is assumed.  *SAME means the split XML document files will be created in the same location as the input XML document file.

XMLSPLITROOT : Optional

This parameter may specify the root file name and the file extension for the split XML document files.  If not specified, a default of *SAME is assumed.  *SAME means the activity will use the file name and extension of the input XML document file as the root file name and the file extension for the split XML document files.  The activity will append a sequential number to the root file name to make each split XML document file name.  For example, if you specify 'ORDER.xml' as the value for this parameter and the input file is split into three XML document files, then they would have the names 'ORDER1.xml', 'ORDER2.xml' and 'ORDER3.xml'.
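
The naming scheme in the example above can be reproduced with a short Python sketch, which may be useful when a later processing step needs to predict the split file names.  The function name split_file_names is hypothetical, and the sketch assumes numbering starts at 1 with no zero padding, as the example suggests.

```python
import os.path
from typing import List


def split_file_names(root_name: str, count: int) -> List[str]:
    """Build split file names by appending 1..count to the root file name.

    The number is inserted between the file name and its extension.
    """
    stem, ext = os.path.splitext(root_name)
    return [f"{stem}{n}{ext}" for n in range(1, count + 1)]


names = split_file_names("ORDER.xml", 3)
# names == ["ORDER1.xml", "ORDER2.xml", "ORDER3.xml"]
```

Where possible, prefer the XMLSPLITLIST output parameter over predicted names, since it contains the full paths of the documents the activity actually created.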

XMLSPLITNOTRACE : Optional

This activity uses LANSA Integrator's XMLReaderService and XMLWriterService.  These services may generate particularly large LANSA Integrator trace files, especially as this activity is typically used to process exceptionally large XML files.  More so than many activities, the performance of this activity will be dramatically improved if LANSA Integrator tracing is not in effect while it is executing.  The default value for this parameter, if not specified, turns the LANSA Integrator tracing OFF.

If you want to use LANSA Integrator tracing, then specify 'NO' for this parameter.  You are advised to do this only during design and testing and only with relatively small sample input XML files.  Note that this does NOT necessarily enable tracing - it simply makes it subject once more to LANSA Composer's System Settings.

OUTPUT Parameters:

XMLSPLITCOUNT:

Upon successful completion this parameter will contain the count of split documents created.

XMLSPLITLIST:

Upon successful completion this parameter will contain a list of the full file paths of the split documents created.