"""
This is an implementation of wcwidth() and wcswidth().
Defined in IEEE Std 1003.1-2001.
https://github.com/jquast/wcwidth
Adapted from Markus Kuhn's C code at:
    http://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c

This is an implementation of wcwidth() and wcswidth() (defined in
IEEE Std 1003.1-2001) for Unicode.

http://www.opengroup.org/onlinepubs/007904975/functions/wcwidth.html
http://www.opengroup.org/onlinepubs/007904975/functions/wcswidth.html

In fixed-width output devices, Latin characters all occupy a single
"cell" position of equal width, whereas ideographic CJK characters
occupy two such cells. Interoperability between terminal-line
applications and (teletype-style) character terminals using the
UTF-8 encoding requires agreement on which character should advance
the cursor by how many cell positions. No established formal
standards exist at present on which Unicode character shall occupy
how many cell positions on character terminals. These routines are
a first attempt at defining such behavior based on simple rules
applied to data provided by the Unicode Consortium.
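
A hypothetical sketch (not part of this module) makes the difference between
code points and cells concrete by padding text to a fixed number of cells. It
assumes, for simplicity, that only East Asian Wide (W) and Fullwidth (F)
characters take two cells, and it reads that property from the standard-library
unicodedata module rather than from this package's tables.

```python
import unicodedata


def cells(text):
    # Simplifying assumption: W and F characters take two cells, all others one.
    return sum(2 if unicodedata.east_asian_width(ch) in ('W', 'F') else 1
               for ch in text)


def pad_to_cells(text, width):
    # Append spaces until the text fills `width` terminal cells.
    return text + ' ' * max(0, width - cells(text))


cjk = chr(0x65E5) + chr(0x672C)  # two CJK ideographs, class W
print(len(cjk), cells(cjk))  # 2 4
print(cells(cjk.ljust(6)), cells(pad_to_cells(cjk, 6)))  # 8 6
```

str.ljust() pads by code points, so the two ideographs plus four spaces
occupy eight cells rather than the intended six.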

For some graphical characters, the Unicode standard explicitly
defines a character-cell width via the definition of the East Asian
Fullwidth (F), Wide (W), Halfwidth (H), and Narrow (Na) classes.
In all these cases, there is no ambiguity about which width a
terminal shall use. For characters in the East Asian Ambiguous (A)
class, the width choice depends purely on a preference of backward
compatibility with either historic CJK or Western practice.
Choosing single-width for these characters is easy to justify as
the appropriate long-term solution, as the CJK practice of
displaying these characters as double-width comes from historic
implementation simplicity (8-bit encoded characters were displayed
single-width and 16-bit ones double-width, even for Greek,
Cyrillic, etc.) and not from any typographic considerations.
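
The width policy above can be sketched as a lookup on the class returned by
the standard-library unicodedata.east_asian_width(). The names class_width and
ambiguous_wide are illustrative assumptions, not part of this module: the flag
selects historic CJK practice for class A, while the default follows the
single-width choice argued for here.

```python
import unicodedata

# Assumed cell widths per East Asian Width class (UAX #11 property values).
_CLASS_WIDTH = {
    'F': 2,   # Fullwidth
    'W': 2,   # Wide
    'H': 1,   # Halfwidth
    'Na': 1,  # Narrow
    'N': 1,   # Neutral: one cell, for simplicity
}


def class_width(ch, ambiguous_wide=False):
    # Width of one character judged only by its East Asian Width class.
    eaw = unicodedata.east_asian_width(ch)
    if eaw == 'A':
        # Ambiguous: two cells only when emulating historic CJK practice.
        return 2 if ambiguous_wide else 1
    return _CLASS_WIDTH[eaw]


greek_alpha = chr(0x03B1)  # GREEK SMALL LETTER ALPHA, class A
print(class_width(greek_alpha), class_width(greek_alpha, ambiguous_wide=True))  # 1 2
```

Keeping the ambiguous case behind a single flag reflects the argument in the
text: for class A the width is a compatibility preference rather than a
property fixed by the standard.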

Much less clear is the choice of width for the Not East Asian
(Neutral) class. Existing practice does not dictate a width for any
of these characters. It would nevertheless make sense
typographically to allocate two character cells to characters such
as, for instance, EM SPACE or VOLUME INTEGRAL, which cannot be
represented adequately with a single-width glyph. The following
routines at present merely assign a single-cell width to all
neutral characters, in the interest of simplicity. This is not
entirely satisfactory and should be reconsidered before
establishing a formal standard in this area. At the moment, the
question of which Not East Asian (Neutral) characters should be
represented by double-width glyphs cannot yet be answered by
applying a simple rule from the Unicode database content. Setting
up a proper standard for the behavior of UTF-8 character terminals
will require a careful analysis not only of each Unicode character,
but also of each presentation form, something the author of these
routines has so far avoided doing.

http://www.unicode.org/unicode/reports/tr11/

Markus Kuhn -- 2007-05-26 (Unicode 5.0)

Permission to use, copy, modify, and distribute this software
for any purpose and without fee is hereby granted. The author
disclaims all warranties with regard to this software.

Latest version: http://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c"""

from __future__ import division

from .table_wide import WIDE_EASTASIAN
from .table_zero import ZERO_WIDTH

def _bisearch(ucs, table):
    """
    Auxiliary function for binary search in interval table.
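
    As a cross-check, the same interval lookup can be written with the
    standard-library bisect module. This sketch assumes what _bisearch
    assumes (inclusive ranges, sorted by start, non-overlapping); the name
    in_table is hypothetical.

    ```python
    import bisect


    def in_table(ucs, table):
        # table: sorted, non-overlapping [(start, end), ...] inclusive ranges.
        starts = [start for start, _ in table]
        # Index of the last range whose start is <= ucs (-1 if none).
        i = bisect.bisect_right(starts, ucs) - 1
        return 1 if i >= 0 and ucs <= table[i][1] else 0


    table = [(0x0300, 0x036F), (0x0483, 0x0489)]
    print(in_table(0x0301, table), in_table(0x0400, table))  # 1 0
    ```

    Rebuilding ``starts`` on every call costs O(n), so a real version would
    precompute it once per table; the hand-written loop here avoids that by
    comparing against the ranges directly.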

    :arg int ucs: Ordinal value of unicode character.
    :arg list table: List of starting and ending ranges of ordinal values,
        in form of ``[(start, end), ...]``.
    :rtype: int
    :returns: 1 if ordinal value ucs is found within lookup table, else 0.
    """
    lbound = 0
    ubound = len(table) - 1

    if ucs < table[0][0] or ucs > table[ubound][1]:
        return 0
    while ubound >= lbound:
        mid = (lbound + ubound) // 2
        if ucs > table[mid][1]:
            lbound = mid + 1
        elif ucs < table[mid][0]:
            ubound = mid - 1
        else:
            return 1

    return 0

def wcwidth(wc):
    r"""
    Given one unicode character, return its printable length on a terminal.

    The wcwidth() function returns 0 if the wc argument has no printable effect
    on a terminal (such as NUL '\0'), -1 if wc is not printable, or has an
    indeterminate effect on the terminal, such as a control character.
    Otherwise, the number of column positions the character occupies on a
    graphic terminal (1 or 2) is returned.
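
    A rough approximation of the rules listed below, using only the
    standard-library unicodedata module, might look like this sketch. The
    name approx_wcwidth is hypothetical. Unlike wcwidth(), it omits the
    hand-listed zero-width format characters (U+200B through U+200F and the
    other ranges enumerated below), and its combining-mark and East Asian
    Width data come from whatever Unicode version the running Python ships,
    so results can differ from this module's tables.

    ```python
    import unicodedata


    def approx_wcwidth(ch):
        # Hypothetical stdlib-only approximation; not this module's API.
        ucs = ord(ch)
        if ucs == 0:
            return 0  # NUL has no printable effect
        if ucs < 32 or 0x7F <= ucs < 0xA0:
            return -1  # C0/C1 control characters and DEL
        if unicodedata.category(ch) in ('Mn', 'Me'):
            return 0  # non-spacing and enclosing combining marks
        if unicodedata.east_asian_width(ch) in ('W', 'F'):
            return 2  # East Asian Wide or Fullwidth
        return 1


    print(approx_wcwidth('a'), approx_wcwidth(chr(0x4E00)),
          approx_wcwidth(chr(0x0301)))  # 1 2 0
    ```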

    The following have a column width of -1:

        - C0 control characters (U+0001 through U+001F).
        - C1 control characters and DEL (U+007F through U+009F).

    The following have a column width of 0:

        - Non-spacing and enclosing combining characters (general
          category code Mn or Me in the Unicode database).
        - NULL (U+0000, 0).
        - COMBINING GRAPHEME JOINER (U+034F).
        - ZERO WIDTH SPACE (U+200B) through
          RIGHT-TO-LEFT MARK (U+200F).
        - LINE SEPARATOR (U+2028) and
          PARAGRAPH SEPARATOR (U+2029).
        - LEFT-TO-RIGHT EMBEDDING (U+202A) through
          RIGHT-TO-LEFT OVERRIDE (U+202E).
        - WORD JOINER (U+2060) through
          INVISIBLE SEPARATOR (U+2063).

    The following have a column width of 1:

        - SOFT HYPHEN (U+00AD).
        - All remaining characters (including all printable
          ISO 8859-1 and WGL4 characters, Unicode control characters,
          etc.).

    The following have a column width of 2:

        - Spacing characters in the East Asian Wide (W) or East Asian
          Fullwidth (F) category as defined in Unicode Technical
          Report #11.
    """
    # pylint: disable=C0103
    #         Invalid argument name "wc"
    ucs = ord(wc)

    # NOTE: created by hand, there isn't anything identifiable other than
    # general Cf category code to identify these, and some characters in Cf
    # category code are of non-zero width.

    # pylint: disable=too-many-boolean-expressions
    #          Too many boolean expressions in if statement (7/5)
    if (ucs == 0 or
            ucs == 0x034F or
            0x200B <= ucs <= 0x200F or
            ucs == 0x2028 or
            ucs == 0x2029 or
            0x202A <= ucs <= 0x202E or
            0x2060 <= ucs <= 0x2063):
        return 0

    # C0/C1 control characters
    if ucs < 32 or 0x07F <= ucs < 0x0A0:
        return -1

    # combining characters with zero width
    if _bisearch(ucs, ZERO_WIDTH):
        return 0

    return 1 + _bisearch(ucs, WIDE_EASTASIAN)

def wcswidth(pwcs, n=None):
    """
    Given a unicode string, return its printable length on a terminal.

    Return the width, in cells, necessary to display the first ``n``
    characters of the unicode string ``pwcs``. When ``n`` is None (default),
    return the width of the entire string.
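
    The summation can be sketched without this module. char_width is a
    hypothetical, deliberately simplified stand-in for wcwidth(): every C0/C1
    control, NUL included, counts as non-printable, W and F characters take
    two cells and everything else one. string_width mirrors the ``n``
    handling with itertools.islice.

    ```python
    import unicodedata
    from itertools import islice


    def char_width(ch):
        # Simplified stand-in for wcwidth(): controls -1, W/F 2, others 1.
        ucs = ord(ch)
        if ucs < 32 or 0x7F <= ucs < 0xA0:
            return -1
        return 2 if unicodedata.east_asian_width(ch) in ('W', 'F') else 1


    def string_width(text, n=None):
        # Sum the widths of the first n characters (all of them if n is None).
        total = 0
        for ch in islice(text, n):
            w = char_width(ch)
            if w < 0:
                return -1  # one non-printable character invalidates the total
            total += w
        return total


    print(string_width('ab' + chr(0x4E00)), string_width('a' + chr(7)))  # 4 -1
    ```

    As in wcswidth(), a non-printable character among the first ``n`` makes
    the whole result ``-1`` rather than being skipped.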

    Returns ``-1`` if a non-printable character is encountered.
    """
    # pylint: disable=C0103
    #         Invalid argument name "n"

    end = len(pwcs) if n is None else n
    idx = slice(0, end)
    width = 0
    for char in pwcs[idx]:
        wcw = wcwidth(char)
        if wcw < 0:
            return -1
        else:
            width += wcw
    return width