Metadata-Version: 2.1
Name: parsel
Version: 1.5.1
Summary: Parsel is a library to extract data from HTML and XML using XPath and CSS selectors
Home-page: https://github.com/scrapy/parsel
Author: Scrapy project
Author-email: info@scrapy.org
License: BSD
Keywords: parsel
Platform: UNKNOWN
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: BSD License
Classifier: Natural Language :: English
Classifier: Topic :: Text Processing :: Markup
Classifier: Topic :: Text Processing :: Markup :: HTML
Classifier: Topic :: Text Processing :: Markup :: XML
Classifier: Programming Language :: Python :: 2
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.4
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Programming Language :: Python :: Implementation :: PyPy
Requires-Dist: w3lib (>=1.19.0)
Requires-Dist: lxml (>=2.3)
Requires-Dist: six (>=1.5.2)
Requires-Dist: cssselect (>=0.9)
Requires-Dist: functools32; python_version<'3.0'

===============================
Parsel
===============================

.. image:: https://img.shields.io/travis/scrapy/parsel/master.svg
   :target: https://travis-ci.org/scrapy/parsel
   :alt: Build Status

.. image:: https://img.shields.io/pypi/v/parsel.svg
   :target: https://pypi.python.org/pypi/parsel
   :alt: PyPI Version

.. image:: https://img.shields.io/codecov/c/github/scrapy/parsel/master.svg
   :target: http://codecov.io/github/scrapy/parsel?branch=master
   :alt: Coverage report

Parsel is a library to extract data from HTML and XML using XPath and CSS selectors.

* Free software: BSD license
* Documentation: https://parsel.readthedocs.org.

Features
--------

* Extract text using CSS or XPath selectors
* Regular expression helper methods

Example::

    >>> from parsel import Selector
    >>> sel = Selector(text=u"""<html>
            <body>
                <h1>Hello, Parsel!</h1>
                <ul>
                    <li><a href="http://example.com">Link 1</a></li>
                    <li><a href="http://scrapy.org">Link 2</a></li>
                </ul>
            </body>
            </html>""")
    >>>
    >>> sel.css('h1::text').get()
    'Hello, Parsel!'
    >>>
    >>> sel.css('h1::text').re(r'\w+')
    ['Hello', 'Parsel']
    >>>
    >>> for e in sel.css('ul > li'):
    ...     print(e.xpath('.//a/@href').get())
    http://example.com
    http://scrapy.org

History
-------

1.5.1 (2018-10-25)
~~~~~~~~~~~~~~~~~~

* ``has-class`` XPath function handles newlines and other separators
  in class names properly;
* fixed parsing of HTML documents with null bytes;
* documentation improvements;
* Python 3.7 tests are run on CI; other test improvements.

1.5.0 (2018-07-04)
~~~~~~~~~~~~~~~~~~

* New ``Selector.attrib`` and ``SelectorList.attrib`` properties which make
  it easier to get attributes of HTML elements.
* CSS selectors became faster: compilation results are cached
  (LRU cache is used for ``css2xpath``), so there is
  less overhead when the same CSS expression is used several times.
* ``.get()`` and ``.getall()`` selector methods are documented and recommended
  over ``.extract_first()`` and ``.extract()``.
* Various documentation tweaks and improvements.

One more change is that the ``.extract()`` and ``.extract_first()`` methods
are now implemented using ``.get()`` and ``.getall()``, not the other
way around, and instead of calling ``Selector.extract`` all other methods
now call ``Selector.get`` internally. This can be **backwards incompatible**
for custom Selector subclasses which override ``Selector.extract``
without doing the same for ``Selector.get``. If you have such a Selector
subclass, make sure the ``get`` method is also overridden. For example, this::

    class MySelector(parsel.Selector):
        def extract(self):
            return super().extract() + " foo"

should be changed to this::

    class MySelector(parsel.Selector):
        def get(self):
            return super().get() + " foo"
        extract = get

1.4.0 (2018-02-08)
~~~~~~~~~~~~~~~~~~

* ``Selector`` and ``SelectorList`` can't be pickled because
  pickling/unpickling doesn't work for ``lxml.html.HtmlElement``;
  parsel now raises TypeError explicitly instead of allowing pickle to
  silently produce wrong output. This is technically backwards-incompatible
  if you're using Python < 3.6.

1.3.1 (2017-12-28)
~~~~~~~~~~~~~~~~~~

* Fix artifact uploads to pypi.

1.3.0 (2017-12-28)
~~~~~~~~~~~~~~~~~~

* ``has-class`` XPath extension function;
* ``parsel.xpathfuncs.set_xpathfunc`` is a simplified way to register
  XPath extensions;
* ``Selector.remove_namespaces`` now removes namespace declarations;
* Python 3.3 support is dropped;
* ``make htmlview`` command for easier Parsel docs development;
* CI: PyPy installation is fixed; parsel now runs tests for PyPy3 as well.

1.2.0 (2017-05-17)
~~~~~~~~~~~~~~~~~~

* Add ``SelectorList.get`` and ``SelectorList.getall``
  methods as aliases for ``SelectorList.extract_first``
  and ``SelectorList.extract`` respectively
* Add default value parameter to ``SelectorList.re_first`` method
* Add ``Selector.re_first`` method
* Add ``replace_entities`` argument on ``.re()`` and ``.re_first()``
  to turn off replacing of character entity references
* Bug fix: detect ``None`` result from lxml parsing and fall back to an empty document
* Rearrange XML/HTML examples in the selectors usage docs
* Travis CI:

  * Test against Python 3.6
  * Test against PyPy using "Portable PyPy for Linux" distribution

1.1.0 (2016-11-22)
~~~~~~~~~~~~~~~~~~

* Change default HTML parser to `lxml.html.HTMLParser <http://lxml.de/api/lxml.html.HTMLParser-class.html>`_,
  which makes it easier to use some HTML-specific features
* Add css2xpath function to translate CSS to XPath
* Add support for ad-hoc namespace declarations
* Add support for XPath variables
* Documentation improvements and updates

1.0.3 (2016-07-29)
~~~~~~~~~~~~~~~~~~

* Add BSD-3-Clause license file
* Re-enable PyPy tests
* Integrate py.test runs with setuptools (needed for Debian packaging)
* Changelog is now called ``NEWS``

1.0.2 (2016-04-26)
~~~~~~~~~~~~~~~~~~

* Fix bug in exception handling causing original traceback to be lost
* Added docstrings and other doc fixes

1.0.1 (2015-08-24)
~~~~~~~~~~~~~~~~~~

* Updated PyPI classifiers
* Added docstrings for csstranslator module and other doc fixes

1.0.0 (2015-08-22)
~~~~~~~~~~~~~~~~~~

* Documentation fixes

0.9.6 (2015-08-14)
~~~~~~~~~~~~~~~~~~

* Updated documentation
* Extended test coverage

0.9.5 (2015-08-11)
~~~~~~~~~~~~~~~~~~

* Support for extending SelectorList

0.9.4 (2015-08-10)
~~~~~~~~~~~~~~~~~~

* Try workaround for travis-ci/dpl#253

0.9.3 (2015-08-07)
~~~~~~~~~~~~~~~~~~

* Add base_url argument

0.9.2 (2015-08-07)
~~~~~~~~~~~~~~~~~~

* Rename module unified -> selector and promoted root attribute
* Add create_root_node function

0.9.1 (2015-08-04)
~~~~~~~~~~~~~~~~~~

* Setup Sphinx build and docs structure
* Build universal wheels
* Rename some leftovers from package extraction

0.9.0 (2015-07-30)
~~~~~~~~~~~~~~~~~~

* First release on PyPI.