Metadata-Version: 2.1
Name: bleach
Version: 3.0.2
Summary: An easy safelist-based HTML-sanitizing tool.
Home-page: https://github.com/mozilla/bleach
Maintainer: Will Kahn-Greene
Maintainer-email: willkg@mozilla.com
License: Apache Software License
Platform: UNKNOWN
Classifier: Development Status :: 5 - Production/Stable
Classifier: Environment :: Web Environment
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 2
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.4
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Programming Language :: Python :: Implementation :: PyPy
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*
Requires-Dist: six
Requires-Dist: webencodings
|
|
|
|
======
|
|
Bleach
|
|
======
|
|
|
|
.. image:: https://travis-ci.org/mozilla/bleach.svg?branch=master
   :target: https://travis-ci.org/mozilla/bleach

.. image:: https://badge.fury.io/py/bleach.svg
   :target: http://badge.fury.io/py/bleach
|
|
|
|
Bleach is an allowed-list-based HTML sanitizing library that escapes or strips
|
|
markup and attributes.
|
|
|
|
Bleach can also linkify text safely, applying filters that Django's ``urlize``
|
|
filter cannot, and optionally setting ``rel`` attributes, even on links already
|
|
in the text.
|
|
|
|
Bleach is intended for sanitizing text from *untrusted* sources. If you find
|
|
yourself jumping through hoops to allow your site administrators to do lots of
|
|
things, you're probably outside the use cases. Either trust those users, or
|
|
don't.
|
|
|
|
Because it relies on html5lib_, Bleach is as good as modern browsers at dealing
|
|
with weird, quirky HTML fragments. And *any* of Bleach's methods will fix
|
|
unbalanced or mis-nested tags.
|
|
|
|
The version on GitHub_ is the most up-to-date and contains the latest bug
|
|
fixes. You can find full documentation on `ReadTheDocs`_.
|
|
|
|
:Code: https://github.com/mozilla/bleach
|
|
:Documentation: https://bleach.readthedocs.io/
|
|
:Issue tracker: https://github.com/mozilla/bleach/issues
|
|
:IRC: ``#bleach`` on irc.mozilla.org
|
|
:License: Apache License v2; see LICENSE file
|
|
|
|
|
|
Reporting Bugs
|
|
==============
|
|
|
|
For regular bugs, please report them `in our issue tracker
|
|
<https://github.com/mozilla/bleach/issues>`_.
|
|
|
|
If you believe that you've found a security vulnerability, please `file a secure
|
|
bug report in our bug tracker
|
|
<https://bugzilla.mozilla.org/enter_bug.cgi?assigned_to=nobody%40mozilla.org&product=Webtools&component=Bleach-security&groups=webtools-security>`_
|
|
or send an email to *security AT mozilla DOT org*.
|
|
|
|
For more information on security-related bug disclosure and the PGP key to use
|
|
for sending encrypted mail or to verify responses received from that address,
|
|
please read our wiki page at
|
|
`<https://www.mozilla.org/en-US/security/#For_Developers>`_.
|
|
|
|
|
|
Security
|
|
========
|
|
|
|
Bleach is a security-focused library.
|
|
|
|
We have a responsible security vulnerability reporting process. Please use
|
|
that if you're reporting a security issue.
|
|
|
|
Security issues are fixed in private. After we land such a fix, we'll do a
|
|
release.
|
|
|
|
For every release, we list the security issues we've fixed in the
**Security fixes** section of ``CHANGES``. We include any relevant CVE links.
|
|
|
|
|
|
Installing Bleach
|
|
=================
|
|
|
|
Bleach is available on PyPI_, so you can install it with ``pip``::

    $ pip install bleach
|
|
|
|
|
|
Upgrading Bleach
|
|
================
|
|
|
|
.. warning::

   Before doing any upgrades, read through `Bleach Changes
   <https://bleach.readthedocs.io/en/latest/changes.html>`_ for backwards
   incompatible changes, newer versions, etc.
|
|
|
|
|
|
Basic use
|
|
=========
|
|
|
|
The simplest way to use Bleach is:
|
|
|
|
.. code-block:: python

    >>> import bleach

    >>> bleach.clean('an <script>evil()</script> example')
    u'an &lt;script&gt;evil()&lt;/script&gt; example'

    >>> bleach.linkify('an http://example.com url')
    u'an <a href="http://example.com" rel="nofollow">http://example.com</a> url'
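
As a slightly fuller sketch of the allowed-list approach, the helper below
contrasts escaping with stripping. The function name ``sanitize_comment`` and
the tag and attribute lists are illustrative assumptions, not Bleach defaults;
``tags``, ``attributes`` and ``strip`` are keyword arguments of
``bleach.clean``.

.. code-block:: python

    # Illustrative allowed lists (assumptions for this sketch, not Bleach defaults).
    ALLOWED_TAGS = [u'a', u'b', u'em', u'strong']
    ALLOWED_ATTRIBUTES = {u'a': [u'href', u'title']}


    def sanitize_comment(text, strip=False):
        """Clean untrusted ``text`` against the allowed lists above.

        With ``strip=False`` disallowed markup is escaped; with ``strip=True``
        the disallowed tags are removed instead.
        """
        import bleach  # imported here so Bleach is only loaded when needed

        return bleach.clean(
            text,
            tags=ALLOWED_TAGS,
            attributes=ALLOWED_ATTRIBUTES,
            strip=strip,
        )

Escaping keeps the offending markup visible as text, which is usually the
safer default for debugging; stripping gives cleaner output when the markup
itself is of no interest to readers.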
|
|
|
|
|
|
Code of conduct
|
|
===============
|
|
|
|
This project and repository are governed by Mozilla's code of conduct and
|
|
etiquette guidelines. For more details please see the `Mozilla Community
|
|
Participation Guidelines
|
|
<https://www.mozilla.org/about/governance/policies/participation/>`_ and
|
|
`Developer Etiquette Guidelines
|
|
<https://bugzilla.mozilla.org/page.cgi?id=etiquette.html>`_.
|
|
|
|
|
|
.. _html5lib: https://github.com/html5lib/html5lib-python
|
|
.. _GitHub: https://github.com/mozilla/bleach
|
|
.. _ReadTheDocs: https://bleach.readthedocs.io/
|
|
.. _PyPI: http://pypi.python.org/pypi/bleach
|
|
|
|
|
|
Bleach changes
|
|
==============
|
|
|
|
Version 3.0.2 (October 11th, 2018)
|
|
----------------------------------
|
|
|
|
**Security fixes**
|
|
|
|
None
|
|
|
|
**Backwards incompatible changes**
|
|
|
|
None
|
|
|
|
**Features**
|
|
|
|
None
|
|
|
|
**Bug fixes**
|
|
|
|
* Merge ``Characters`` tokens after sanitizing them. This fixes issues in the
|
|
``LinkifyFilter`` where it was only linkifying parts of urls. (#374)
|
|
|
|
|
|
Version 3.0.1 (October 9th, 2018)
|
|
---------------------------------
|
|
|
|
**Security fixes**
|
|
|
|
None
|
|
|
|
**Backwards incompatible changes**
|
|
|
|
None
|
|
|
|
**Features**
|
|
|
|
* Support Python 3.7. It supported Python 3.7 just fine, but we added 3.7 to
|
|
the list of Python environments we test so this is now officially supported.
|
|
(#377)
|
|
|
|
**Bug fixes**
|
|
|
|
* Fix ``list`` object has no attribute ``lower`` in ``clean``. (#398)
|
|
* Fix ``abbr`` getting escaped in ``linkify``. (#400)
|
|
|
|
|
|
Version 3.0.0 (October 3rd, 2018)
|
|
---------------------------------
|
|
|
|
**Security fixes**
|
|
|
|
None
|
|
|
|
**Backwards incompatible changes**
|
|
|
|
* A bunch of functions were moved from one module to another.
|
|
|
|
These were moved from ``bleach.sanitizer`` to ``bleach.html5lib_shim``:
|
|
|
|
* ``convert_entity``
|
|
* ``convert_entities``
|
|
* ``match_entity``
|
|
* ``next_possible_entity``
|
|
* ``BleachHTMLSerializer``
|
|
* ``BleachHTMLTokenizer``
|
|
* ``BleachHTMLParser``
|
|
|
|
These functions and classes weren't documented and aren't part of the
|
|
public API, but people read code and might be using them so we're
|
|
considering it an incompatible API change.
|
|
|
|
If you're using them, you'll need to update your code.
|
|
|
|
**Features**
|
|
|
|
* Bleach no longer depends on html5lib. html5lib==1.0.1 is now vendored into
|
|
Bleach. You can remove it from your requirements file if none of your other
|
|
requirements require html5lib.
|
|
|
|
This means Bleach will now work fine with other libraries that depend on
|
|
html5lib regardless of what version of html5lib they require. (#386)
|
|
|
|
**Bug fixes**
|
|
|
|
* Fixed tags getting added when using clean or linkify. This was a
|
|
long-standing regression from the Bleach 2.0 rewrite. (#280, #392)
|
|
|
|
* Fixed ``<isindex>`` getting replaced with a string. Now it gets escaped or
|
|
stripped depending on whether it's in the allowed tags or not. (#279)
|
|
|
|
|
|
Version 2.1.4 (August 16th, 2018)
|
|
---------------------------------
|
|
|
|
**Security fixes**
|
|
|
|
None
|
|
|
|
**Backwards incompatible changes**
|
|
|
|
* Dropped support for Python 3.3. (#328)
|
|
|
|
**Features**
|
|
|
|
None
|
|
|
|
**Bug fixes**
|
|
|
|
* Handle ambiguous ampersands correctly. (#359)
|
|
|
|
|
|
Version 2.1.3 (March 5th, 2018)
|
|
-------------------------------
|
|
|
|
**Security fixes**
|
|
|
|
* Attributes that have URI values weren't properly sanitized if the
|
|
values contained character entities. Using character entities, it
|
|
was possible to construct a URI value with a scheme that was not
|
|
allowed that would slide through unsanitized.
|
|
|
|
This security issue was introduced in Bleach 2.1. Anyone using
|
|
Bleach 2.1 is highly encouraged to upgrade.
|
|
|
|
https://bugzilla.mozilla.org/show_bug.cgi?id=1442745
|
|
|
|
**Backwards incompatible changes**
|
|
|
|
None
|
|
|
|
**Features**
|
|
|
|
None
|
|
|
|
**Bug fixes**
|
|
|
|
* Fixed some other edge cases for attribute URI value sanitizing and
|
|
improved testing of this code.
|
|
|
|
|
|
Version 2.1.2 (December 7th, 2017)
|
|
----------------------------------
|
|
|
|
**Security fixes**
|
|
|
|
None
|
|
|
|
**Backwards incompatible changes**
|
|
|
|
None
|
|
|
|
**Features**
|
|
|
|
None
|
|
|
|
**Bug fixes**
|
|
|
|
* Support html5lib-python 1.0.1. (#337)
|
|
|
|
* Add deprecation warning for supporting html5lib-python < 1.0.
|
|
|
|
* Switch to semver.
|
|
|
|
|
|
Version 2.1.1 (October 2nd, 2017)
|
|
---------------------------------
|
|
|
|
**Security fixes**
|
|
|
|
None
|
|
|
|
**Backwards incompatible changes**
|
|
|
|
None
|
|
|
|
**Features**
|
|
|
|
None
|
|
|
|
**Bug fixes**
|
|
|
|
* Fix ``setup.py`` opening files when ``LANG=``. (#324)
|
|
|
|
|
|
Version 2.1 (September 28th, 2017)
|
|
----------------------------------
|
|
|
|
**Security fixes**
|
|
|
|
* Convert control characters (backspace particularly) to "?" preventing
|
|
malicious copy-and-paste situations. (#298)
|
|
|
|
See `<https://github.com/mozilla/bleach/issues/298>`_ for more details.
|
|
|
|
This affects all previous versions of Bleach. Check the comments on that
|
|
issue for ways to alleviate the issue if you can't upgrade to Bleach 2.1.
|
|
|
|
|
|
**Backwards incompatible changes**
|
|
|
|
* Redid versioning. ``bleach.VERSION`` is no longer available. Use the string
|
|
version at ``bleach.__version__`` and parse it with
|
|
``pkg_resources.parse_version``. (#307)
|
|
|
|
* clean, linkify: linkify and clean should only accept text types; thank you,
|
|
Janusz! (#292)
|
|
|
|
* clean, linkify: accept only unicode or utf-8-encoded str (#176)
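
Code that used to compare the ``bleach.VERSION`` tuple can compare the parsed
version string instead. The sketch below is a stdlib-only stand-in for
``pkg_resources.parse_version`` that handles plain dotted numeric versions
such as ``2.1`` or ``3.0.2`` only. The helper names are hypothetical, and
``pkg_resources.parse_version`` remains the better choice for anything more
complex, such as pre-release versions.

.. code-block:: python

    def parse_simple_version(version):
        """Turn a dotted numeric version string like '3.0.2' into a tuple of ints."""
        return tuple(int(part) for part in version.split('.'))


    def version_at_least(version, minimum):
        """Return True if ``version`` is greater than or equal to ``minimum``."""
        current = parse_simple_version(version)
        required = parse_simple_version(minimum)
        # Pad the shorter tuple with zeros so that '2.1' compares equal to '2.1.0'.
        width = max(len(current), len(required))
        current += (0,) * (width - len(current))
        required += (0,) * (width - len(required))
        return current >= required

A caller would then write something like
``version_at_least(bleach.__version__, '2.1')`` where it previously compared
``bleach.VERSION`` against a tuple.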
|
|
|
|
|
|
**Features**
|
|
|
|
|
|
**Bug fixes**
|
|
|
|
* ``bleach.clean()`` no longer unescapes entities, including ones that are
  missing a ``;`` at the end, which can happen in URLs and other places. (#143)
|
|
|
|
* linkify: fix http links inside of mailto links; thank you, sedrubal! (#300)
|
|
|
|
* clarify security policy in docs (#303)
|
|
|
|
* fix dependency specification for html5lib 1.0b8, 1.0b9, and 1.0b10; thank you,
|
|
Zoltán! (#268)
|
|
|
|
* add Bleach vs. html5lib comparison to README; thank you, Stu Cox! (#278)
|
|
|
|
* fix KeyError exceptions on tags without href attr; thank you, Alex Defsen!
|
|
(#273)
|
|
|
|
* add test website and scripts to test ``bleach.clean()`` output in browser;
|
|
thank you, Greg Guthe!
|
|
|
|
|
|
Version 2.0 (March 8th, 2017)
|
|
-----------------------------
|
|
|
|
**Security fixes**
|
|
|
|
* None
|
|
|
|
|
|
**Backwards incompatible changes**
|
|
|
|
* Removed support for Python 2.6. #206
|
|
|
|
* Removed support for Python 3.2. #224
|
|
|
|
* Bleach no longer supports html5lib < 0.99999999 (8 9s).
|
|
|
|
This version is a rewrite to use the new sanitizing API since the old
|
|
one was dropped in html5lib 0.99999999 (8 9s).
|
|
|
|
If you're using 0.9999999 (7 9s) upgrade to 0.99999999 (8 9s) or higher.
|
|
|
|
If you're using 1.0b8 (equivalent to 0.9999999 (7 9s)), upgrade to 1.0b9
|
|
(equivalent to 0.99999999 (8 9s)) or higher.
|
|
|
|
* ``bleach.clean`` and friends were rewritten
|
|
|
|
``clean`` was reimplemented as an html5lib filter and happens at a different
|
|
step in the HTML parsing -> traversing -> serializing process. Because of
|
|
that, there are some differences in clean's output as compared with previous
|
|
versions.
|
|
|
|
Amongst other things, this version will add end tags even if the tag in
|
|
question is to be escaped.
|
|
|
|
* ``bleach.clean`` and friends attribute callables now take three arguments:
|
|
tag, attribute name and attribute value. Previously they only took attribute
|
|
name and attribute value.
|
|
|
|
All attribute callables will need to be updated.
|
|
|
|
* ``bleach.linkify`` was rewritten
|
|
|
|
``linkify`` was reimplemented as an html5lib Filter. As such, it no longer
|
|
accepts a ``tokenizer`` argument.
|
|
|
|
The callback functions for adjusting link attributes now take a namespaced
|
|
attribute.
|
|
|
|
Previously you'd do something like this::

    def check_protocol(attrs, is_new):
        if not attrs.get('href', '').startswith(('http:', 'https:')):
            return None
        return attrs
|
|
|
|
Now it's more like this::

    def check_protocol(attrs, is_new):
        if not attrs.get((None, u'href'), u'').startswith(('http:', 'https:')):
            #            ^^^^^^^^^^^^^^^
            return None
        return attrs
|
|
|
|
Further, you need to make sure you're always using unicode values. If you
|
|
don't then html5lib will raise an assertion error that the value is not
|
|
unicode.
|
|
|
|
All linkify filters will need to be updated.
|
|
|
|
* ``bleach.linkify`` and friends had a ``skip_pre`` argument--that's been
|
|
replaced with a more general ``skip_tags`` argument.
|
|
|
|
Before, you might do::

    bleach.linkify(some_text, skip_pre=True)

The equivalent with Bleach 2.0 is::

    bleach.linkify(some_text, skip_tags=['pre'])
|
|
|
|
You can skip other tags, too, like ``style`` or ``script`` or other places
|
|
where you don't want linkification happening.
|
|
|
|
All uses of linkify that use ``skip_pre`` will need to be updated.
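
To illustrate the two callable signatures described above, here is a minimal
sketch of an attribute callable and a linkify callback written in the Bleach
2.0 style. Both function names, and the specific policy each one implements,
are assumptions made for the example.

.. code-block:: python

    def allow_img_attribute(tag, name, value):
        """Attribute callable: receives the tag, attribute name and value.

        Allows ``alt`` on ``img`` tags, and ``src`` only when it points at
        an https URL. Everything else is rejected.
        """
        if tag != u'img':
            return False
        if name == u'alt':
            return True
        if name == u'src':
            return value.startswith(u'https://')
        return False


    def open_new_links_in_new_tab(attrs, is_new):
        """Linkify callback: attribute keys are (namespace, name) tuples."""
        if is_new:
            # Only touch links that linkify created, not ones already in the text.
            attrs[(None, u'target')] = u'_blank'
        return attrs

These would be passed as
``bleach.clean(text, tags=[u'img'], attributes=allow_img_attribute)`` and
``bleach.linkify(text, callbacks=[open_new_links_in_new_tab])`` respectively.
Note that all keys and values are unicode, as required above.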
|
|
|
|
|
|
**Changes**
|
|
|
|
* Supports Python 3.6.
|
|
|
|
* Supports html5lib >= 0.99999999 (8 9s).
|
|
|
|
* There's a ``bleach.sanitizer.Cleaner`` class that you can instantiate with your
|
|
favorite clean settings for easy reuse.
|
|
|
|
* There's a ``bleach.linkifier.Linker`` class that you can instantiate with your
|
|
favorite linkify settings for easy reuse.
|
|
|
|
* There's a ``bleach.linkifier.LinkifyFilter`` which is an html5lib filter that
|
|
you can pass as a filter to ``bleach.sanitizer.Cleaner`` allowing you to clean
|
|
and linkify in one pass.
|
|
|
|
* ``bleach.clean`` and friends can now take a callable as an attributes arg value.
|
|
|
|
* Tons of bug fixes.
|
|
|
|
* Cleaned up tests.
|
|
|
|
* Documentation fixes.
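
The ``Cleaner`` and ``LinkifyFilter`` classes listed above can be combined to
clean and linkify in a single pass. The following is a minimal sketch; the
factory name ``build_cleaner`` and the chosen tag list are assumptions for
illustration.

.. code-block:: python

    from functools import partial


    def build_cleaner():
        """Return a reusable Cleaner that also linkifies, except inside <pre>."""
        # Imported here so Bleach is only loaded when a cleaner is built.
        from bleach.linkifier import LinkifyFilter
        from bleach.sanitizer import Cleaner

        return Cleaner(
            tags=[u'pre'],
            # Filters are instantiated with the token stream as ``source``,
            # so extra settings are bound up front with functools.partial.
            filters=[partial(LinkifyFilter, skip_tags=[u'pre'])],
        )

A cleaner built this way is meant to be created once and reused, for example
``cleaner = build_cleaner()`` at import time followed by
``cleaner.clean(text)`` for each piece of untrusted input.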
|
|
|
|
|
|
Version 1.5 (November 4th, 2016)
|
|
--------------------------------
|
|
|
|
**Security fixes**
|
|
|
|
* None
|
|
|
|
**Backwards incompatible changes**
|
|
|
|
* clean: The list of ``ALLOWED_PROTOCOLS`` now defaults to http, https and
|
|
mailto.
|
|
|
|
Previously it was a long list of protocols something like ed2k, ftp, http,
|
|
https, irc, mailto, news, gopher, nntp, telnet, webcal, xmpp, callto, feed,
|
|
urn, aim, rsync, tag, ssh, sftp, rtsp, afs, data. #149
|
|
|
|
**Changes**
|
|
|
|
* clean: Added ``protocols`` to arguments list to let you override the list of
|
|
allowed protocols. Thank you, Andreas Malecki! #149
|
|
|
|
* linkify: Fix a bug involving periods at the end of an email address. Thank you,
|
|
Lorenz Schori! #219
|
|
|
|
* linkify: Fix linkification of non-ascii ports. Thank you, Alexandre Macabies!
|
|
#207
|
|
|
|
* linkify: Fix linkify inappropriately removing node tails when dropping nodes.
|
|
#132
|
|
|
|
* Fixed a test that failed periodically. #161
|
|
|
|
* Switched from nose to py.test. #204
|
|
|
|
* Add test matrix for all supported Python and html5lib versions. #230
|
|
|
|
* Limit to html5lib ``>=0.999,!=0.9999,!=0.99999,<0.99999999`` because 0.9999
|
|
and 0.99999 are busted.
|
|
|
|
* Add support for ``python setup.py test``. #97
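
For sites that relied on one of the protocols dropped from the default list,
the ``protocols`` argument added to ``clean`` in this release can restore it
explicitly. A minimal sketch, assuming a hypothetical helper
``clean_with_ftp`` and illustrative tag and attribute lists:

.. code-block:: python

    # http, https and mailto are the new defaults; ftp is added back here.
    EXTENDED_PROTOCOLS = [u'http', u'https', u'mailto', u'ftp']


    def clean_with_ftp(text):
        """Clean ``text`` while still allowing ftp: links in ``href`` values."""
        import bleach  # imported here so Bleach is only loaded when needed

        return bleach.clean(
            text,
            tags=[u'a'],
            attributes={u'a': [u'href']},
            protocols=EXTENDED_PROTOCOLS,
        )

Listing the protocols explicitly, rather than restoring the old long list,
keeps the allowed set as small as the site actually needs.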
|
|
|
|
|
|
Version 1.4.3 (May 23rd, 2016)
|
|
------------------------------
|
|
|
|
**Security fixes**
|
|
|
|
* None
|
|
|
|
**Changes**
|
|
|
|
* Limit to html5lib ``>=0.999,<0.99999999`` because of impending change to
|
|
sanitizer api. #195
|
|
|
|
|
|
Version 1.4.2 (September 11, 2015)
|
|
----------------------------------
|
|
|
|
**Changes**
|
|
|
|
* linkify: Fix hang in linkify with ``parse_email=True``. #124
|
|
|
|
* linkify: Fix crash in linkify when removing a link that is a first-child. #136
|
|
|
|
* Updated TLDs.
|
|
|
|
* linkify: Don't remove exterior brackets when linkifying. #146
|
|
|
|
|
|
Version 1.4.1 (December 15, 2014)
|
|
---------------------------------
|
|
|
|
**Changes**
|
|
|
|
* Consistent order of attributes in output.
|
|
|
|
* Python 3.4 support.
|
|
|
|
|
|
Version 1.4 (January 12, 2014)
|
|
------------------------------
|
|
|
|
**Changes**
|
|
|
|
* linkify: Update linkify to use etree type Treewalker instead of simpletree.
|
|
|
|
* Updated html5lib to version ``>=0.999``.
|
|
|
|
* Update all code to be compatible with Python 3 and 2 using six.
|
|
|
|
* Switch to Apache License.
|
|
|
|
|
|
Version 1.3
|
|
-----------
|
|
|
|
* Used by Python 3-only fork.
|
|
|
|
|
|
Version 1.2.2 (May 18, 2013)
|
|
----------------------------
|
|
|
|
* Pin html5lib to version 0.95 for now due to major API break.
|
|
|
|
|
|
Version 1.2.1 (February 19, 2013)
|
|
---------------------------------
|
|
|
|
* ``clean()`` no longer considers ``feed:`` an acceptable protocol due to
|
|
inconsistencies in browser behavior.
|
|
|
|
|
|
Version 1.2 (January 28, 2013)
|
|
------------------------------
|
|
|
|
* ``linkify()`` has changed considerably. Many keyword arguments have been
|
|
replaced with a single callbacks list. Please see the documentation for more
|
|
information.
|
|
|
|
* Bleach will no longer consider unacceptable protocols when linkifying.
|
|
|
|
* ``linkify()`` now takes a tokenizer argument that allows it to skip
|
|
sanitization.
|
|
|
|
* ``delinkify()`` is gone.
|
|
|
|
* Removed exception handling from ``_render``. ``clean()`` and ``linkify()`` may
|
|
now throw.
|
|
|
|
* ``linkify()`` correctly ignores case for protocols and domain names.
|
|
|
|
* ``linkify()`` correctly handles markup within an <a> tag.
|
|
|
|
|
|
Version 1.1.5
|
|
-------------
|
|
|
|
|
|
Version 1.1.4
|
|
-------------
|
|
|
|
|
|
Version 1.1.3 (July 10, 2012)
|
|
-----------------------------
|
|
|
|
* Fix parsing bare URLs when parse_email=True.
|
|
|
|
|
|
Version 1.1.2 (June 1, 2012)
|
|
----------------------------
|
|
|
|
* Fix hang in style attribute sanitizer. (#61)
|
|
|
|
* Allow ``/`` in style attribute values.
|
|
|
|
|
|
Version 1.1.1 (February 17, 2012)
|
|
---------------------------------
|
|
|
|
* Fix tokenizer for html5lib 0.9.5.
|
|
|
|
|
|
Version 1.1.0 (October 24, 2011)
|
|
--------------------------------
|
|
|
|
* ``linkify()`` now understands port numbers. (#38)
|
|
|
|
* Documented character encoding behavior. (#41)
|
|
|
|
* Add an optional target argument to ``linkify()``.
|
|
|
|
* Add ``delinkify()`` method. (#45)
|
|
|
|
* Support subdomain whitelist for ``delinkify()``. (#47, #48)
|
|
|
|
|
|
Version 1.0.4 (September 2, 2011)
|
|
---------------------------------
|
|
|
|
* Switch to SemVer git tags.
|
|
|
|
* Make ``linkify()`` smarter about trailing punctuation. (#30)
|
|
|
|
* Pass ``exc_info`` to logger during rendering issues.
|
|
|
|
* Add wildcard key for attributes. (#19)
|
|
|
|
* Make ``linkify()`` use the ``HTMLSanitizer`` tokenizer. (#36)
|
|
|
|
* Fix URLs wrapped in parentheses. (#23)
|
|
|
|
* Make ``linkify()`` UTF-8 safe. (#33)
|
|
|
|
|
|
Version 1.0.3 (June 14, 2011)
|
|
-----------------------------
|
|
|
|
* ``linkify()`` works with 3rd level domains. (#24)
|
|
|
|
* ``clean()`` supports vendor prefixes in style values. (#31, #32)
|
|
|
|
* Fix ``linkify()`` email escaping.
|
|
|
|
|
|
Version 1.0.2 (June 6, 2011)
|
|
----------------------------
|
|
|
|
* ``linkify()`` supports email addresses.
|
|
|
|
* ``clean()`` supports callables in attributes filter.
|
|
|
|
|
|
Version 1.0.1 (April 12, 2011)
|
|
------------------------------
|
|
|
|
* ``linkify()`` doesn't drop trailing slashes. (#21)
|
|
* ``linkify()`` won't linkify 'libgl.so.1'. (#22)
|
|
|
|
|