======
Bleach
======

.. image:: https://travis-ci.org/mozilla/bleach.png?branch=master
    :target: https://travis-ci.org/mozilla/bleach

.. image:: https://badge.fury.io/py/Bleach.svg
    :target: http://badge.fury.io/py/Bleach

Bleach is a whitelist-based HTML sanitizing library that escapes or strips
markup and attributes.

Bleach can also linkify text safely, applying filters that Django's ``urlize``
filter cannot, and optionally setting ``rel`` attributes, even on links already
in the text.

Bleach is intended for sanitizing text from *untrusted* sources. If you find
yourself jumping through hoops to allow your site administrators to do lots of
things, you're probably outside the use cases. Either trust those users, or
don't.

Because it relies on html5lib_, Bleach is as good as modern browsers at dealing
with weird, quirky HTML fragments. And *any* of Bleach's methods will fix
unbalanced or mis-nested tags.

The version on GitHub_ is the most up-to-date and contains the latest bug
fixes. You can find full documentation on `ReadTheDocs`_.

:Code:           https://github.com/mozilla/bleach
:Documentation:  https://bleach.readthedocs.io/
:Issue tracker:  https://github.com/mozilla/bleach/issues
:IRC:            ``#bleach`` on irc.mozilla.org
:License:        Apache License v2; see LICENSE file


Reporting Bugs
==============

For regular bugs, please report them `in our issue tracker
<https://github.com/mozilla/bleach/issues>`_.

If you believe that you've found a security vulnerability, please `file a secure
bug report in our bug tracker
<https://bugzilla.mozilla.org/enter_bug.cgi?assigned_to=nobody%40mozilla.org&product=Webtools&component=Bleach-security&groups=webtools-security>`_
or send an email to *security AT mozilla DOT org*.

For more information on security-related bug disclosure and the PGP key to use
for sending encrypted mail or to verify responses received from that address,
please read our wiki page at
`<https://www.mozilla.org/en-US/security/#For_Developers>`_.


Installing Bleach
=================

Bleach is available on PyPI_, so you can install it with ``pip``::

    $ pip install bleach

Or with ``easy_install``::

    $ easy_install bleach

Or by cloning the repo from GitHub_::

    $ git clone https://github.com/mozilla/bleach.git

Then install it by running::

    $ python setup.py install


Upgrading Bleach
================

.. warning::

   Before doing any upgrades, read through `Bleach Changes
   <https://bleach.readthedocs.io/en/latest/changes.html>`_ for backwards
   incompatible changes, newer versions, etc.


Basic use
=========

The simplest way to use Bleach is:

.. code-block:: python

    >>> import bleach

    >>> bleach.clean('an <script>evil()</script> example')
    u'an &lt;script&gt;evil()&lt;/script&gt; example'

    >>> bleach.linkify('an http://example.com url')
    u'an <a href="http://example.com" rel="nofollow">http://example.com</a> url'


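
As a rough illustration of the difference between *escaping* and *stripping*
disallowed markup, the sketch below uses only the Python 3 standard library.
It is not Bleach's implementation (Bleach parses with html5lib), it drops every
attribute, and it is not safe for real input. ``AllowlistSketch`` and
``sketch_clean`` are hypothetical names invented for this example.

.. code-block:: python

    from html import escape
    from html.parser import HTMLParser


    class AllowlistSketch(HTMLParser):
        """Toy allowlist filter for illustration only; do not use on real input."""

        def __init__(self, allowed_tags, strip=False):
            super().__init__(convert_charrefs=True)
            self.allowed_tags = set(allowed_tags)
            self.strip = strip
            self.out = []

        def _emit(self, tag, raw):
            if tag in self.allowed_tags:
                self.out.append(raw)          # allowed tag: keep as markup
            elif not self.strip:
                self.out.append(escape(raw))  # disallowed tag: show as text
            # with strip=True a disallowed tag is dropped, its content is kept

        def handle_starttag(self, tag, attrs):
            # Attributes are discarded entirely in this sketch.
            self._emit(tag, "<%s>" % tag)

        def handle_endtag(self, tag):
            self._emit(tag, "</%s>" % tag)

        def handle_data(self, data):
            self.out.append(escape(data))


    def sketch_clean(text, allowed_tags=("b", "i", "em", "strong"), strip=False):
        parser = AllowlistSketch(allowed_tags, strip=strip)
        parser.feed(text)
        parser.close()  # flush any buffered text
        return "".join(parser.out)


    print(sketch_clean('an <script>evil()</script> example'))
    print(sketch_clean('an <script>evil()</script> example', strip=True))

The first call prints the disallowed tag as escaped text; the second removes
the tag but keeps the text inside it.
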
.. _html5lib: https://github.com/html5lib/html5lib-python
.. _GitHub: https://github.com/mozilla/bleach
.. _ReadTheDocs: https://bleach.readthedocs.io/
.. _PyPI: http://pypi.python.org/pypi/bleach


Bleach Changes
==============

Version 1.5 (November 4th, 2016)
--------------------------------

**Backwards incompatible changes**

- clean: The list of ``ALLOWED_PROTOCOLS`` now defaults to http, https and
  mailto. Previously it was a long list of protocols, something like ed2k, ftp,
  http, https, irc, mailto, news, gopher, nntp, telnet, webcal, xmpp, callto,
  feed, urn, aim, rsync, tag, ssh, sftp, rtsp, afs, data. #149

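
To illustrate what a protocol allowlist does, here is a minimal sketch using
only the Python 3 standard library. It is not how Bleach checks URLs, and a real
sanitizer must also cope with obfuscated schemes. ``DEFAULT_PROTOCOLS`` and
``scheme_allowed`` are hypothetical names, and treating scheme-less (relative)
URLs as allowed is an assumption of this sketch.

.. code-block:: python

    from urllib.parse import urlparse

    # Mirrors the new default list described above.
    DEFAULT_PROTOCOLS = ("http", "https", "mailto")


    def scheme_allowed(url, protocols=DEFAULT_PROTOCOLS):
        """Return True if the URL's scheme is empty or in the allowlist."""
        scheme = urlparse(url).scheme.lower()
        return scheme == "" or scheme in protocols


    for url in ("https://example.com/", "mailto:someone@example.com",
                "ftp://example.com/file", "javascript:alert(1)"):
        print(url, scheme_allowed(url))

    # Widening the list, as the ``protocols`` argument is meant to allow:
    print(scheme_allowed("ftp://example.com/file",
                         protocols=DEFAULT_PROTOCOLS + ("ftp",)))
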
**Changes**

- clean: Added ``protocols`` to arguments list to let you override the list of
  allowed protocols. Thank you, Andreas Malecki! #149
- linkify: Fix a bug involving periods at the end of an email address. Thank you,
  Lorenz Schori! #219
- linkify: Fix linkification of non-ascii ports. Thank you, Alexandre Macabies!
  #207
- linkify: Fix linkify inappropriately removing node tails when dropping nodes.
  #132
- Fixed a test that failed periodically. #161
- Switched from nose to py.test. #204
- Add test matrix for all supported Python and html5lib versions. #230
- Limit to html5lib ``>=0.999,!=0.9999,!=0.99999,<0.99999999`` because 0.9999
  and 0.99999 are busted.
- Add support for ``python setup.py test``. #97


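
The html5lib constraint above can look odd if the version numbers are read as
decimal fractions. Version specifiers compare release segments as integers (as
in PEP 440), so ``0.9999`` is a later release than ``0.999``. The sketch below
shows that ordering for plain dotted numeric versions only; ``release`` and
``html5lib_allowed`` are hypothetical helpers, and real tooling uses a full
version parser.

.. code-block:: python

    def release(version):
        """Split a plain dotted version such as '0.999' into integer segments."""
        return tuple(int(part) for part in version.split("."))


    EXCLUDED = {release("0.9999"), release("0.99999")}


    def html5lib_allowed(version):
        """Check a version against >=0.999,!=0.9999,!=0.99999,<0.99999999."""
        r = release(version)
        in_range = release("0.999") <= r < release("0.99999999")
        return in_range and r not in EXCLUDED


    for candidate in ("0.95", "0.999", "0.9999", "0.999999", "0.99999999"):
        print(candidate, html5lib_allowed(candidate))
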
Version 1.4.3 (May 23rd, 2016)
------------------------------

**Changes**

- Limit to html5lib ``>=0.999,<0.99999999`` because of impending change to
  sanitizer api. #195


Version 1.4.2 (September 11, 2015)
----------------------------------

**Changes**

- linkify: Fix hang in linkify with ``parse_email=True``. #124
- linkify: Fix crash in linkify when removing a link that is a first-child. #136
- Updated TLDs.
- linkify: Don't remove exterior brackets when linkifying. #146


Version 1.4.1 (December 15, 2014)
---------------------------------

**Changes**

- Consistent order of attributes in output.
- Python 3.4 support.


Version 1.4 (January 12, 2014)
------------------------------

**Changes**

- linkify: Update linkify to use etree type Treewalker instead of simpletree.
- Updated html5lib to version ``>=0.999``.
- Update all code to be compatible with Python 3 and 2 using six.
- Switch to Apache License.


Version 1.3
-----------

- Used by Python 3-only fork.


Version 1.2.2 (May 18, 2013)
----------------------------

- Pin html5lib to version 0.95 for now due to major API break.


Version 1.2.1 (February 19, 2013)
---------------------------------

- clean() no longer considers ``feed:`` an acceptable protocol due to
  inconsistencies in browser behavior.


Version 1.2 (January 28, 2013)
------------------------------

- linkify() has changed considerably. Many keyword arguments have been
  replaced with a single callbacks list. Please see the documentation
  for more information.
- Bleach will no longer consider unacceptable protocols when linkifying.
- linkify() now takes a tokenizer argument that allows it to skip
  sanitization.
- delinkify() is gone.
- Removed exception handling from _render. clean() and linkify() may now
  throw.
- linkify() correctly ignores case for protocols and domain names.
- linkify() correctly handles markup within an <a> tag.


Version 1.1.5
-------------


Version 1.1.4
-------------


Version 1.1.3 (July 10, 2012)
-----------------------------

- Fix parsing bare URLs when parse_email=True.


Version 1.1.2 (June 1, 2012)
----------------------------

- Fix hang in style attribute sanitizer. (#61)
- Allow '/' in style attribute values.


Version 1.1.1 (February 17, 2012)
---------------------------------

- Fix tokenizer for html5lib 0.9.5.


Version 1.1.0 (October 24, 2011)
--------------------------------

- linkify() now understands port numbers. (#38)
- Documented character encoding behavior. (#41)
- Add an optional target argument to linkify().
- Add delinkify() method. (#45)
- Support subdomain whitelist for delinkify(). (#47, #48)


Version 1.0.4 (September 2, 2011)
---------------------------------

- Switch to SemVer git tags.
- Make linkify() smarter about trailing punctuation. (#30)
- Pass exc_info to logger during rendering issues.
- Add wildcard key for attributes. (#19)
- Make linkify() use the HTMLSanitizer tokenizer. (#36)
- Fix URLs wrapped in parentheses. (#23)
- Make linkify() UTF-8 safe. (#33)


Version 1.0.3 (June 14, 2011)
-----------------------------

- linkify() works with 3rd level domains. (#24)
- clean() supports vendor prefixes in style values. (#31, #32)
- Fix linkify() email escaping.


Version 1.0.2 (June 6, 2011)
----------------------------

- linkify() supports email addresses.
- clean() supports callables in attributes filter.


Version 1.0.1 (April 12, 2011)
------------------------------

- linkify() doesn't drop trailing slashes. (#21)
- linkify() won't linkify 'libgl.so.1'. (#22)