alpcentaur
/
permapp

# saxes

A sax-style non-validating parser for XML.
Saxes is a fork of [sax](https://github.com/isaacs/sax-js) 1.2.4. All mentionsof sax in this project's documentation are references to sax 1.2.4.
Designed with [node](http://nodejs.org/) in mind, but should work fine in thebrowser or other CommonJS implementations.
Saxes does not support Node versions older than 10.
## Notable Differences from Sax.

* Saxes aims to be much stricter than sax with regards to XML  well-formedness. Sax, even in its so-called "strict mode", is not strict. It  silently accepts structures that are not well-formed XML. Projects that need  better compliance with well-formedness constraints cannot use sax as-is.
  Consequently, saxes does not support HTML, or pseudo-XML, or bad XML. Saxes  will report well-formedness errors in all these cases but it won't try to  extract data from malformed documents like sax does.
* Saxes is much much faster than sax, mostly because of a substantial redesign  of the internal parsing logic. The speed improvement is not merely due to  removing features that were supported by sax. That helped a bit, but saxes  adds some expensive checks in its aim for conformance with the XML  specification. Redesigning the parsing logic is what accounts for most of the  performance improvement.
* Saxes does not aim to support antiquated platforms. We will not pollute the  source or the default build with support for antiquated platforms. If you want  support for IE 11, you are welcome to produce a PR that adds a *new build*  transpiled to ES5.
* Saxes handles errors differently from sax: it provides a default onerror  handler which throws. You can replace it with your own handler if you want. If  your handler does nothing, there is no `resume` method to call.
* There's no `Stream` API. A revamped API may be introduced later. (It is still  a "streaming parser" in the general sense that you write a character stream to  it.)
* Saxes does not have facilities for limiting the size the data chunks passed to  event handlers. See the FAQ entry for more details.
## Conformance

Saxes supports:
* [XML 1.0 fifth edition](https://www.w3.org/TR/2008/REC-xml-20081126/)* [XML 1.1 second edition](https://www.w3.org/TR/2006/REC-xml11-20060816/)* [Namespaces in XML 1.0 (Third Edition)](https://www.w3.org/TR/2009/REC-xml-names-20091208/).* [Namespaces in XML 1.1 (Second Edition)](https://www.w3.org/TR/2006/REC-xml-names11-20060816/).
## Limitations

This is a non-validating parser so it only verifies whether the document iswell-formed. We do aim to raise errors for all malformed constructsencountered. However, this parser does not thorougly parse the contents ofDTDs. So most malformedness errors caused by errors **in DTDs** cannot bereported.
## Regarding `<!DOCTYPE` and `<!ENTITY`

The parser will handle the basic XML entities in text nodes and attributevalues: `&amp; &lt; &gt; &apos; &quot;`. It's possible to define additionalentities in XML by putting them in the DTD. This parser doesn't do anything withthat. If you want to listen to the `doctype` event, and then fetch thedoctypes, and read the entities and add them to `parser.ENTITIES`, then be myguest.
## Documentation

The source code contains JSDOC comments. Use them. What follows is a briefsummary of what is available. The final authority is the source code.
**PAY CLOSE ATTENTION TO WHAT IS PUBLIC AND WHAT IS PRIVATE.**
The move to TypeScript makes it so that everything is now formally private,protected, or public.
If you use anything not public, that's at your own peril.
If there's a mistake in the documentation, raise an issue. If you just assume,you may assume incorrectly.
## Summary Usage Information

### Example

```javascriptvar saxes = require("./lib/saxes"),  parser = new saxes.SaxesParser();
parser.on("error", function (e) {  // an error happened.});parser.on("text", function (t) {  // got some text.  t is the string of text.});parser.on("opentag", function (node) {  // opened a tag.  node has "name" and "attributes"});parser.on("end", function () {  // parser stream is done, and ready to have more stuff written to it.});
parser.write('<xml>Hello, <who name="world">world</who>!</xml>').close();```
### Constructor Arguments

Settings supported:
* `xmlns` - Boolean. If `true`, then namespaces are supported. Default  is `false`.
* `position` - Boolean. If `false`, then don't track line/col/position. Unset is  treated as `true`. Default is unset. Currently, setting this to `false` only  results in a cosmetic change: the errors reported do not contain position  information. sax-js would literally turn off the position-computing logic if  this flag was set to false. The notion was that it would optimize  execution. In saxes at least it turns out that continually testing this flag  causes a cost that offsets the benefits of turning off this logic.
* `fileName` - String. Set a file name for error reporting. This is useful only  when tracking positions. You may leave it unset.
* `fragment` - Boolean. If `true`, parse the XML as an XML fragment. Default is  `false`.
* `additionalNamespaces` - A plain object whose key, value pairs define   namespaces known before parsing the XML file. It is not legal to pass   bindings for the namespaces `"xml"` or `"xmlns"`.
* `defaultXMLVersion` - The default version of the XML specification to use if  the document contains no XML declaration. If the document does contain an XML  declaration, then this setting is ignored. Must be `"1.0"` or `"1.1"`. The  default is `"1.0"`.
* `forceXMLVersion` - Boolean. A flag indicating whether to force the XML  version used for parsing to the value of ``defaultXMLVersion``. When this flag  is ``true``, ``defaultXMLVersion`` must be specified. If unspecified, the  default value of this flag is ``false``.
  Example: suppose you are parsing a document that has an XML declaration  specifying XML version 1.1.
  If you set ``defaultXMLVersion`` to ``"1.0"`` without setting  ``forceXMLVersion`` then the XML declaration will override the value of  ``defaultXMLVersion`` and the document will be parsed according to XML 1.1.
  If you set ``defaultXMLVersion`` to ``"1.0"`` and set ``forceXMLVersion`` to  ``true``, then the XML declaration will be ignored and the document will be  parsed according to XML 1.0.
### Methods

`write` - Write bytes onto the stream. You don't have to pass the whole documentin one `write` call. You can read your source chunk by chunk and call `write`with each chunk.
`close` - Close the stream. Once closed, no more data may be written until it isdone processing the buffer, which is signaled by the `end` event.
### Properties

The parser has the following properties:
`line`, `column`, `columnIndex`, `position` - Indications of the position in theXML document where the parser currently is looking. The `columnIndex` propertycounts columns as if indexing into a JavaScript string, whereas the `column`property counts Unicode characters.
`closed` - Boolean indicating whether or not the parser can be written to.  Ifit's `true`, then wait for the `ready` event to write again.
`opt` - Any options passed into the constructor.
`xmlDecl` - The XML declaration for this document. It contains the fields`version`, `encoding` and `standalone`. They are all `undefined` beforeencountering the XML declaration. If they are undefined after the XMLdeclaration, the corresponding value was not set by the declaration. There is noevent associated with the XML declaration. In a well-formed document, the XMLdeclaration may be preceded only by an optional BOM. So by the time any eventgenerated by the parser happens, the declaration has been processed if presentat all. Otherwise, you have a malformed document, and as stated above, youcannot rely on the parser data!
### Error Handling

The parser continues to parse even upon encountering errors, and does its bestto continue reporting errors. You should heed all errors reported. After anerror, however, saxes may interpret your document incorrectly. For instance``<foo a=bc="d"/>`` is invalid XML. Did you mean to have ``<foo a="bc=d"/>`` or``<foo a="b" c="d"/>`` or some other variation?  For the sake of continuing toprovide errors, saxes will continue parsing the document, but the structure itreports may be incorrect. It is only after the errors are fixed in the documentthat saxes can provide a reliable interpretation of the document.
That leaves you with two rules of thumb when using saxes:
* Pay attention to the errors that saxes report. The default `onerror` handler  throws, so by default, you cannot miss errors.
* **ONCE AN ERROR HAS BEEN ENCOUNTERED, STOP RELYING ON THE EVENT HANDLERS OTHER  THAN `onerror`.** As explained above, when saxes runs into a well-formedness  problem, it makes a guess in order to continue reporting more errors. The guess  may be wrong.
### Events

To listen to an event, override `on<eventname>`. The list of supported eventsare also in the exported `EVENTS` array.
See the JSDOC comments in the source code for a description of each supportedevent.
### Parsing XML Fragments

The XML specification does not define any method by which to parse XMLfragments. However, there are usage scenarios in which it is desirable to parsefragments. In order to allow this, saxes provides three initialization options.
If you pass the option `fragment: true` to the parser constructor, the parserwill expect an XML fragment. It essentially starts with a parsing stateequivalent to the one it would be in if `parser.write("<foo">)` had been calledright after initialization. In other words, it expects content which isacceptable inside an element. This also turns off well-formedness checks thatare inappropriate when parsing a fragment.
The option `additionalNamespaces` allows you to define additional prefix-to-URIbindings known before parsing starts. You would use this over `resolvePrefix` ifyou have at the ready a series of namespaces bindings to use.
The option `resolvePrefix` allows you to pass a function which saxes will use ifit is unable to resolve a namespace prefix by itself. You would use this over`additionalNamespaces` in a context where getting a complete list of definednamespaces is onerous.
Note that you can use `additionalNamespaces` and `resolvePrefix` together if youwant. `additionalNamespaces` applies before `resolvePrefix`.
The options `additionalNamespaces` and `resolvePrefix` are really meant to beused for parsing fragments. However, saxes won't prevent you from using themwith `fragment: false`. Note that if you do this, your document may parsewithout errors and yet be malformed because the document can refer to namespaceswhich are not defined *in* the document.
Of course, `additionalNamespaces` and `resolvePrefix` are used only if `xmlns`is `true`. If you are parsing a fragment that does not use namespaces, there'sno point in setting these options.
### Performance Tips

* saxes works faster on files that use newlines (``\u000A``) as end of line  markers than files that use other end of line markers (like ``\r`` or  ``\r\n``). The XML specification requires that conformant applications behave  as if all characters that are to be treated as end of line characters are  converted to ``\u000A`` prior to parsing. The optimal code path for saxes is a  file in which all end of line characters are already ``\u000A``.
* Don't split Unicode strings you feed to saxes across surrogates. When you  naively split a string in JavaScript, you run the risk of splitting a Unicode  character into two surrogates. e.g.  In the following example ``a`` and ``b``  each contain half of a single Unicode character: ``const a = "\u{1F4A9}"[0];  const b = "\u{1F4A9}"[1]`` If you feed such split surrogates to versions of  saxes prior to 4, you'd get errors. Saxes version 4 and over are able to  detect when a chunk of data ends with a surrogate and carry over the surrogate  to the next chunk. However this operation entails slicing and concatenating  strings. If you can feed your data in a way that does not split surrogates,  you should do it. (Obviously, feeding all the data at once with a single write  is fastest.)
* Don't set event handlers you don't need. Saxes has always aimed to avoid doing  work that will just be tossed away but future improvements hope to do this  more aggressively. One way saxes knows whether or not some data is needed is  by checking whether a handler has been set for a specific event.
## FAQ

Q. Why has saxes dropped support for limiting the size of data chunks passed toevent handlers?
A. With sax you could set ``MAX_BUFFER_LENGTH`` to cause the parser to limit thesize of data chunks passed to event handlers. So if you ran into a span of textabove the limit, multiple ``text`` events with smaller data chunks were firedinstead of a single event with a large chunk.
However, that functionality had some problematic characteristics. It had anarbitrary default value. It was library-wide so all parsers created from asingle instance of the ``sax`` library shared it. This could potentially causeconflicts among libraries running in the same VM but using sax for differentpurposes.
These issues could have been easily fixed, but there were larger issues. Thebuffer limit arbitrarily applied to some events but not others. It would split``text``, ``cdata`` and ``script`` events. However, if a ``comment``,``doctype``, ``attribute`` or ``processing instruction`` were more than thelimit, the parser would generate an error and you were left picking up thepieces.
It was not intuitive to use. You'd think setting the limit to 1K would preventchunks bigger than 1K to be passed to event handlers. But that was not thecase. A comment in the source code told you that you might go over the limit ifyou passed large chunks to ``write``. So if you want a 1K limit, don't pass 64Kchunks to ``write``. Fair enough. You know what limit you want so you cancontrol the size of the data you pass to ``write``. So you limit the chunks to``write`` to 1K at a time. Even if you do this, your event handlers may get datachunks that are 2K in size. Suppose on the previous ``write`` the parser hasjust finished processing an open tag, so it is ready for text. Your ``write``passes 1K of text. You are not above the limit yet, so no event is generatedyet. The next ``write`` passes another 1K of text. It so happens that sax checksbuffer limits only once per ``write``, after the chunk of data has beenprocessed. Now you've hit the limit and you get a ``text`` event with 2K ofdata. So even if you limit your ``write`` calls to the buffer limit you've set,you may still get events with chunks at twice the buffer size limit you'vespecified.
We may consider reinstating an equivalent functionality, provided that itaddresses the issues above and does not cause a huge performance drop foruse-case scenarios that don't need it.