- Metadata-Version: 2.1
- Name: parsel
- Version: 1.5.1
- Summary: Parsel is a library to extract data from HTML and XML using XPath and CSS selectors
- Home-page: https://github.com/scrapy/parsel
- Author: Scrapy project
- Author-email: info@scrapy.org
- License: BSD
- Keywords: parsel
- Platform: UNKNOWN
- Classifier: Development Status :: 5 - Production/Stable
- Classifier: Intended Audience :: Developers
- Classifier: License :: OSI Approved :: BSD License
- Classifier: Natural Language :: English
- Classifier: Topic :: Text Processing :: Markup
- Classifier: Topic :: Text Processing :: Markup :: HTML
- Classifier: Topic :: Text Processing :: Markup :: XML
- Classifier: Programming Language :: Python :: 2
- Classifier: Programming Language :: Python :: 2.7
- Classifier: Programming Language :: Python :: 3
- Classifier: Programming Language :: Python :: 3.4
- Classifier: Programming Language :: Python :: 3.5
- Classifier: Programming Language :: Python :: 3.6
- Classifier: Programming Language :: Python :: 3.7
- Classifier: Programming Language :: Python :: Implementation :: CPython
- Classifier: Programming Language :: Python :: Implementation :: PyPy
- Requires-Dist: w3lib (>=1.19.0)
- Requires-Dist: lxml (>=2.3)
- Requires-Dist: six (>=1.5.2)
- Requires-Dist: cssselect (>=0.9)
- Requires-Dist: functools32; python_version<'3.0'
-
- ===============================
- Parsel
- ===============================
-
- .. image:: https://img.shields.io/travis/scrapy/parsel/master.svg
-    :target: https://travis-ci.org/scrapy/parsel
-    :alt: Build Status
-
- .. image:: https://img.shields.io/pypi/v/parsel.svg
-    :target: https://pypi.python.org/pypi/parsel
-    :alt: PyPI Version
-
- .. image:: https://img.shields.io/codecov/c/github/scrapy/parsel/master.svg
-    :target: http://codecov.io/github/scrapy/parsel?branch=master
-    :alt: Coverage report
-
-
- Parsel is a library to extract data from HTML and XML using XPath and CSS selectors.
-
- * Free software: BSD license
- * Documentation: https://parsel.readthedocs.org.
-
- Features
- --------
-
- * Extract text using CSS or XPath selectors
- * Regular expression helper methods
-
- Example::
-
-     >>> from parsel import Selector
-     >>> sel = Selector(text=u"""<html>
-             <body>
-                 <h1>Hello, Parsel!</h1>
-                 <ul>
-                     <li><a href="http://example.com">Link 1</a></li>
-                     <li><a href="http://scrapy.org">Link 2</a></li>
-                 </ul>
-             </body>
-         </html>""")
-     >>>
-     >>> sel.css('h1::text').get()
-     'Hello, Parsel!'
-     >>>
-     >>> sel.css('h1::text').re(r'\w+')
-     ['Hello', 'Parsel']
-     >>>
-     >>> for e in sel.css('ul > li'):
-     ...     print(e.xpath('.//a/@href').get())
-     http://example.com
-     http://scrapy.org
-
-
-
-
- History
- -------
-
- 1.5.1 (2018-10-25)
- ~~~~~~~~~~~~~~~~~~
-
- * ``has-class`` XPath function handles newlines and other separators
- in class names properly (see the short sketch after this list);
- * fixed parsing of HTML documents with null bytes;
- * documentation improvements;
- * Python 3.7 tests are run on CI; other test improvements.
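-
- A short, illustrative sketch of the ``has-class`` fix above (the markup is
- made up for the example; the output is indicative)::
-
-     >>> from parsel import Selector
-     >>> sel = Selector(text=u'<p class="lead\nintro">Hi</p>')
-     >>> len(sel.xpath('//p[has-class("lead", "intro")]'))
-     1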
-
- 1.5.0 (2018-07-04)
- ~~~~~~~~~~~~~~~~~~
-
- * New ``Selector.attrib`` and ``SelectorList.attrib`` properties which make
- it easier to get attributes of HTML elements (sketched after this list).
- * CSS selectors became faster: compilation results are cached
- (LRU cache is used for ``css2xpath``), so there is
- less overhead when the same CSS expression is used several times.
- * ``.get()`` and ``.getall()`` selector methods are documented and recommended
- over ``.extract_first()`` and ``.extract()``.
- * Various documentation tweaks and improvements.
-
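- A minimal illustrative sketch of the ``attrib`` property and of the
- ``get()`` / ``getall()`` methods (the markup here is invented for the
- example)::
-
-     >>> from parsel import Selector
-     >>> sel = Selector(text=u'<a href="http://example.com" class="ext">Link</a>')
-     >>> sel.css('a').attrib['href']
-     'http://example.com'
-     >>> sel.css('a::attr(href)').get()
-     'http://example.com'
-     >>> sel.css('a::text').getall()
-     ['Link']
-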
- One more change is that the ``.extract()`` and ``.extract_first()`` methods
- are now implemented on top of ``.get()`` and ``.getall()``, not the other
- way around, and all other methods now call ``Selector.get`` internally
- instead of ``Selector.extract``. This can be **backwards incompatible** for
- custom Selector subclasses which override ``Selector.extract`` without doing
- the same for ``Selector.get``. If you have such a subclass, make sure the
- ``get`` method is also overridden. For example, this::
-
-     class MySelector(parsel.Selector):
-         def extract(self):
-             return super().extract() + " foo"
-
- should be changed to this::
-
-     class MySelector(parsel.Selector):
-         def get(self):
-             return super().get() + " foo"
-         extract = get
-
-
- 1.4.0 (2018-02-08)
- ~~~~~~~~~~~~~~~~~~
-
- * ``Selector`` and ``SelectorList`` can't be pickled because
- pickling/unpickling doesn't work for ``lxml.html.HtmlElement``;
- parsel now raises TypeError explicitly instead of allowing pickle to
- silently produce wrong output. This is technically backwards-incompatible
- if you're using Python < 3.6.
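-
- For illustration, pickling a selector now fails loudly instead of silently
- producing broken output (a sketch only; the exact error message may differ)::
-
-     >>> import pickle
-     >>> from parsel import Selector
-     >>> sel = Selector(text=u'<html><body></body></html>')
-     >>> try:
-     ...     pickle.dumps(sel)
-     ... except TypeError:
-     ...     print('selectors are not picklable')
-     selectors are not picklable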
-
-
- 1.3.1 (2017-12-28)
- ~~~~~~~~~~~~~~~~~~
-
- * Fix artifact uploads to pypi.
-
-
- 1.3.0 (2017-12-28)
- ~~~~~~~~~~~~~~~~~~
-
- * ``has-class`` XPath extension function (see the sketch after this list);
- * ``parsel.xpathfuncs.set_xpathfunc`` is a simplified way to register
- XPath extensions;
- * ``Selector.remove_namespaces`` now removes namespace declarations;
- * Python 3.3 support is dropped;
- * ``make htmlview`` command for easier Parsel docs development.
- * CI: PyPy installation is fixed; parsel now runs tests for PyPy3 as well.
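-
- A sketch of the ``has-class`` extension function (markup invented for the
- example; custom functions can be registered in a similar spirit with
- ``parsel.xpathfuncs.set_xpathfunc``)::
-
-     >>> from parsel import Selector
-     >>> sel = Selector(text=u'<p class="lead intro">Hello</p><p class="lead">Bye</p>')
-     >>> sel.xpath('//p[has-class("lead", "intro")]/text()').getall()
-     ['Hello']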
-
-
- 1.2.0 (2017-05-17)
- ~~~~~~~~~~~~~~~~~~
-
- * Add ``SelectorList.get`` and ``SelectorList.getall``
- methods as aliases for ``SelectorList.extract_first``
- and ``SelectorList.extract`` respectively
- * Add a default value parameter to the ``SelectorList.re_first`` method
- (see the example after this list)
- * Add ``Selector.re_first`` method
- * Add ``replace_entities`` argument to ``.re()`` and ``.re_first()``
- to allow turning off the replacement of character entity references
- * Bug fix: detect ``None`` result from lxml parsing and fall back to an empty document
- * Rearrange XML/HTML examples in the selectors usage docs
- * Travis CI:
-
-   * Test against Python 3.6
-   * Test against PyPy using "Portable PyPy for Linux" distribution
-
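- A small sketch of the new ``get`` / ``getall`` aliases and of ``re_first``
- with a default value (example data is invented)::
-
-     >>> from parsel import Selector
-     >>> sel = Selector(text=u'<p>Price: 19 USD</p>')
-     >>> sel.css('p::text').get()
-     'Price: 19 USD'
-     >>> sel.css('p::text').re_first(r'(\d+) USD')
-     '19'
-     >>> sel.css('p::text').re_first(r'(\d+) EUR', default='n/a')
-     'n/a'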
-
- 1.1.0 (2016-11-22)
- ~~~~~~~~~~~~~~~~~~
-
- * Change default HTML parser to `lxml.html.HTMLParser <http://lxml.de/api/lxml.html.HTMLParser-class.html>`_,
- which makes it easier to use some HTML-specific features
- * Add ``css2xpath`` function to translate CSS to XPath (see the sketch
- after this list)
- * Add support for ad-hoc namespace declarations
- * Add support for XPath variables
- * Documentation improvements and updates
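-
- Illustrative sketches of ``css2xpath`` and of XPath variables (outputs are
- indicative)::
-
-     >>> from parsel import Selector, css2xpath
-     >>> css2xpath('h1')
-     'descendant-or-self::h1'
-     >>> sel = Selector(text=u'<p class="a">one</p><p class="b">two</p>')
-     >>> sel.xpath('//p[@class=$cls]/text()', cls='b').get()
-     'two'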
-
-
- 1.0.3 (2016-07-29)
- ~~~~~~~~~~~~~~~~~~
-
- * Add BSD-3-Clause license file
- * Re-enable PyPy tests
- * Integrate py.test runs with setuptools (needed for Debian packaging)
- * Changelog is now called ``NEWS``
-
-
- 1.0.2 (2016-04-26)
- ~~~~~~~~~~~~~~~~~~
-
- * Fix a bug in exception handling that caused the original traceback to be lost
- * Added docstrings and other doc fixes
-
-
- 1.0.1 (2015-08-24)
- ~~~~~~~~~~~~~~~~~~
-
- * Updated PyPI classifiers
- * Added docstrings for csstranslator module and other doc fixes
-
-
- 1.0.0 (2015-08-22)
- ~~~~~~~~~~~~~~~~~~
-
- * Documentation fixes
-
-
- 0.9.6 (2015-08-14)
- ~~~~~~~~~~~~~~~~~~
-
- * Updated documentation
- * Extended test coverage
-
-
- 0.9.5 (2015-08-11)
- ~~~~~~~~~~~~~~~~~~
-
- * Support for extending SelectorList
-
-
- 0.9.4 (2015-08-10)
- ~~~~~~~~~~~~~~~~~~
-
- * Try workaround for travis-ci/dpl#253
-
-
- 0.9.3 (2015-08-07)
- ~~~~~~~~~~~~~~~~~~
-
- * Add base_url argument
-
-
- 0.9.2 (2015-08-07)
- ~~~~~~~~~~~~~~~~~~
-
- * Rename module ``unified`` -> ``selector`` and promote the ``root`` attribute
- * Add create_root_node function
-
-
- 0.9.1 (2015-08-04)
- ~~~~~~~~~~~~~~~~~~
-
- * Setup Sphinx build and docs structure
- * Build universal wheels
- * Rename some leftovers from package extraction
-
-
- 0.9.0 (2015-07-30)
- ~~~~~~~~~~~~~~~~~~
-
- * First release on PyPI.
-
-