Metadata-Version: 2.1
Name: blis
Version: 0.4.1
Summary: The Blis BLAS-like linear algebra library, as a self-contained C-extension.
Home-page: https://github.com/explosion/cython-blis
Author: Matthew Honnibal
Author-email: matt@explosion.ai
License: BSD
Platform: UNKNOWN
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Information Technology
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: POSIX :: Linux
Classifier: Operating System :: MacOS :: MacOS X
Classifier: Programming Language :: Cython
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.6
Classifier: Topic :: Scientific/Engineering
Description-Content-Type: text/markdown
Requires-Dist: numpy (>=1.15.0)
<a href="https://explosion.ai"><img src="https://explosion.ai/assets/img/logo.svg" width="125" height="125" align="right" /></a>

# Cython BLIS: Fast BLAS-like operations from Python and Cython, without the tears

This repository provides the [Blis linear algebra](https://github.com/flame/blis)
routines as a self-contained Python C-extension.

Currently, we only support single-threaded execution, as this is actually best for our workloads (ML inference).

[![Travis](https://img.shields.io/travis/explosion/cython-blis/master.svg?style=flat-square&logo=travis)](https://travis-ci.org/explosion/cython-blis)
[![Appveyor](https://img.shields.io/appveyor/ci/explosion/cython-blis/master.svg?style=flat-square&logo=appveyor)](https://ci.appveyor.com/project/explosion/cython-blis)
[![pypi Version](https://img.shields.io/pypi/v/blis.svg?style=flat-square)](https://pypi.python.org/pypi/blis)
[![conda](https://img.shields.io/conda/vn/conda-forge/cython-blis.svg?style=flat-square)](https://anaconda.org/conda-forge/cython-blis)
[![Python wheels](https://img.shields.io/badge/wheels-%E2%9C%93-4c1.svg?longCache=true&style=flat-square&logo=python&logoColor=white)](https://github.com/explosion/wheelwright/releases)
## Installation

You can install the package via pip:

```bash
pip install blis
```

Wheels should be available, so installation should be fast. If you want to install from source and you're on Windows, you'll need to install LLVM.
### Building BLIS for alternative architectures

The provided wheels should work on x86_64 architectures. Unfortunately we do not currently know a way to provide different wheels for alternative architectures, and we cannot provide a single binary that works everywhere. So if the wheel doesn't work for your CPU, you'll need to install from the source distribution, and tell Blis your CPU architecture using the `BLIS_ARCH` environment variable.

#### a) Installing with generic arch support

```bash
BLIS_ARCH="generic" pip install spacy --no-binary blis
```
#### b) Building specific support

To compile Blis, `cython-blis` bundles makefile scripts for specific architectures. These scripts are produced by running the Blis build system and logging the commands. We do not yet have logs for every architecture, as there are some architectures we have not had access to.

[See here](https://github.com/flame/blis/blob/0.5.1/config_registry) for the list of
architectures. For example, here's how to build support for the ARM architecture `cortexa57`:

```bash
git clone https://github.com/explosion/cython-blis && cd cython-blis
git pull && git submodule init && git submodule update && git submodule status
python3 -m venv env3.6
source env3.6/bin/activate
pip install -r requirements.txt
./bin/generate-make-jsonl linux cortexa57
BLIS_ARCH="cortexa57" python setup.py build_ext --inplace
BLIS_ARCH="cortexa57" python setup.py bdist_wheel
```

Fingers crossed, this will build you a wheel that supports your platform. You
could then [submit a PR](https://github.com/explosion/cython-blis/pulls) with
the `blis/_src/make/linux-cortexa57.jsonl` and
`blis/_src/include/linux-cortexa57/blis.h` files so that you can run:

```bash
BLIS_ARCH=cortexa57 pip install spacy --no-binary=blis
```
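As a rough illustration of the file naming used above, the two per-architecture files can be derived from the platform name and `BLIS_ARCH`. This is only a sketch: the helper `arch_build_files` is hypothetical and not part of `cython-blis`, the path patterns are copied from the `linux-cortexa57` example, and the fallback to `generic` is an assumption made for the sketch.

```python
import os
from pathlib import PurePosixPath


def arch_build_files(platform_name, arch=None, root="blis/_src"):
    """Return the (jsonl, header) paths expected for a platform/arch pair.

    Hypothetical helper: it only mirrors the file naming shown above
    (e.g. linux-cortexa57) and does not touch the file system.
    """
    if arch is None:
        # Assumption for this sketch: fall back to "generic" when unset.
        arch = os.environ.get("BLIS_ARCH", "generic")
    name = f"{platform_name}-{arch}"
    base = PurePosixPath(root)
    return base / "make" / f"{name}.jsonl", base / "include" / name / "blis.h"


jsonl_path, header_path = arch_build_files("linux", "cortexa57")
print(jsonl_path)   # blis/_src/make/linux-cortexa57.jsonl
print(header_path)  # blis/_src/include/linux-cortexa57/blis.h
```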
### Running the benchmark

After installation, run a small matrix multiplication benchmark:

```bash
$ export OMP_NUM_THREADS=1 # Tell Numpy to only use one thread.
$ python -m blis.benchmark
Setting up data nO=384 nI=384 batch_size=2000. Running 1000 iterations
Blis...
Total: 11032014.6484
7.35 seconds
Numpy (Openblas)...
Total: 11032016.6016
16.81 seconds
Blis einsum ab,cb->ca
8.10 seconds
Numpy einsum ab,cb->ca
Total: 5510596.19141
83.18 seconds
```

The low `numpy.einsum` performance is
expected, but the low `numpy.dot` performance is surprising. Linking numpy
against MKL gives better performance:

```bash
Numpy (mkl_rt) gemm...
Total: 11032011.71875
5.21 seconds
```

These figures refer to performance on a Dell XPS 13 i7-7500U. Running the
same benchmark on a 2015 MacBook Air gives:

```bash
Blis...
Total: 11032014.6484
8.89 seconds
Numpy (Accelerate)...
Total: 11032012.6953
6.68 seconds
```

Clearly the Dell's numpy+OpenBLAS performance is the outlier, so it's likely
something has gone wrong in the compilation and architecture detection.
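To reproduce the numpy side of such a comparison on your own machine, a timing loop along the following lines may help. It is a minimal sketch and not the code of `blis.benchmark`: the function `time_matmul`, the random data and the small default iteration count are assumptions; only the shapes (`nO=384`, `nI=384`, `batch_size=2000`) come from the output above.

```python
import os

# Thread-count variables are generally read when the BLAS backend loads,
# so set this before importing numpy.
os.environ.setdefault("OMP_NUM_THREADS", "1")

import time

import numpy as np


def time_matmul(n_out=384, n_in=384, batch_size=2000, iterations=10):
    """Time `iterations` float32 products X @ W.T; return (total, seconds)."""
    rng = np.random.default_rng(0)
    X = rng.standard_normal((batch_size, n_in)).astype("float32")
    W = rng.standard_normal((n_out, n_in)).astype("float32")
    total = 0.0
    start = time.perf_counter()
    for _ in range(iterations):
        # Summing the result keeps a checksum, like the "Total" lines above.
        total += float(np.dot(X, W.T).sum())
    return total, time.perf_counter() - start


total, seconds = time_matmul()
print(f"Total: {total:.4f}")
print(f"{seconds:.2f} seconds")
```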
## Usage

Two APIs are provided: a high-level Python API, and direct
[Cython](http://cython.org) access. The best part of the Python API is the
[einsum function](https://obilaniu6266h16.wordpress.com/2016/02/04/einstein-summation-in-numpy/),
which works like numpy's, but with some restrictions that allow
a direct mapping to Blis routines. Example usage:

```python
from blis.py import einsum
from numpy import ndarray, zeros

dim_a = 500
dim_b = 128
dim_c = 300
# ndarray() allocates uninitialized arrays; only the shapes matter here.
arr1 = ndarray((dim_a, dim_b))
arr2 = ndarray((dim_b, dim_c))
out = zeros((dim_a, dim_c))
einsum('ab,bc->ac', arr1, arr2, out=out)
# Change dimension order of output
out = einsum('ab,bc->ca', arr1, arr2)
assert out.shape == (dim_c, dim_a)
# Matrix vector product, with transposed output
arr2 = ndarray((dim_b,))
out = einsum('ab,b->ba', arr1, arr2)
assert out.shape == (dim_b, dim_a)
```
The Einstein summation format is really awesome, so it's always been
disappointing that it's so much slower than equivalent calls to `tensordot`
in numpy. The `blis.einsum` function gives up the numpy version's generality,
so that calls can be easily mapped to Blis:

- Only two input tensors
- Maximum two dimensions
- Dimensions must be labelled `a`, `b` and `c`
- The first argument's dimensions must be `'a'` (for 1d inputs) or `'ab'` (for 2d inputs).
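
The four restrictions can be expressed as a small string check. The sketch below is illustrative only: `satisfies_restrictions` is a hypothetical helper, not a function provided by the library, and passing it is a necessary condition rather than a guarantee that the library accepts the string.

```python
import re

# Two comma-separated inputs and one output, each with one or two
# dimensions drawn from the labels a, b and c.
_EQUATION = re.compile(r"^([abc]{1,2}),([abc]{1,2})->([abc]{1,2})$")


def satisfies_restrictions(equation):
    """Check an einsum string against the four restrictions listed above.

    These checks are necessary but not sufficient: a string that passes
    may still not be one of the combinations the library accepts.
    """
    match = _EQUATION.match(equation)
    if match is None:
        # Not exactly two inputs, more than two dimensions, or a label
        # other than a, b or c.
        return False
    # The first argument must be labelled 'a' (1d) or 'ab' (2d).
    return match.group(1) in ("a", "ab")


print(satisfies_restrictions("ab,bc->ac"))  # True
print(satisfies_restrictions("ba,bc->ac"))  # False: first operand is not 'ab'
print(satisfies_restrictions("ab,cd->ad"))  # False: label 'd' is not allowed
print(satisfies_restrictions("abc,c->ab"))  # False: three dimensions
```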
With these restrictions, there are only 15 valid combinations, which
correspond to all the things you would otherwise do with the `gemm`, `gemv`,
`ger` and `axpy` functions. You can therefore forget about all the other
functions and just use the `einsum`. Here are the valid einsum strings, the
calls they correspond to, and the numpy equivalents:

| Equation      | Maps to                                  | Numpy            |
| ------------- | ---------------------------------------- | ---------------- |
| `'a,a->a'`    | `axpy(A, B)`                             | `A+B`            |
| `'a,b->ab'`   | `ger(A, B)`                              | `outer(A, B)`    |
| `'a,b->ba'`   | `ger(B, A)`                              | `outer(B, A)`    |
| `'ab,a->ab'`  | `batch_axpy(A, B)`                       | `A*B`            |
| `'ab,a->ba'`  | `batch_axpy(A, B, trans1=True)`          | `(A*B).T`        |
| `'ab,b->a'`   | `gemv(A, B)`                             | `A*B`            |
| `'ab,a->b'`   | `gemv(A, B, trans1=True)`                | `A.T*B`          |
| `'ab,ac->cb'` | `gemm(B, A, trans1=True, trans2=False)`  | `dot(B.T, A)`    |
| `'ab,ac->bc'` | `gemm(A, B, trans1=True, trans2=False)`  | `dot(A.T, B)`    |
| `'ab,bc->ac'` | `gemm(A, B, trans1=False, trans2=False)` | `dot(A, B)`      |
| `'ab,bc->ca'` | `gemm(B, A, trans1=True, trans2=True)`   | `dot(B.T, A.T)`  |
| `'ab,ca->bc'` | `gemm(A, B, trans1=True, trans2=True)`   | `dot(A.T, B.T)`  |
| `'ab,ca->cb'` | `gemm(B, A, trans1=False, trans2=False)` | `dot(B, A)`      |
| `'ab,cb->ac'` | `gemm(A, B, trans1=False, trans2=True)`  | `dot(A, B.T)`    |
| `'ab,cb->ca'` | `gemm(B, A, trans1=False, trans2=True)`  | `dot(B, A.T)`    |
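
For the eight matrix-matrix equations, the `dot` form can be read directly off the index labels and checked against `numpy.einsum`. The sketch below uses numpy only, so it says nothing about what `blis` itself computes; the small dimensions and the `cases` dictionary are made up for the check.

```python
import numpy as np

rng = np.random.default_rng(0)
dim_a, dim_b, dim_c = 4, 3, 5

# Each entry: equation -> (shape of A, shape of B, dot-product equivalent).
cases = {
    "ab,ac->bc": ((dim_a, dim_b), (dim_a, dim_c), lambda A, B: np.dot(A.T, B)),
    "ab,ac->cb": ((dim_a, dim_b), (dim_a, dim_c), lambda A, B: np.dot(B.T, A)),
    "ab,bc->ac": ((dim_a, dim_b), (dim_b, dim_c), lambda A, B: np.dot(A, B)),
    "ab,bc->ca": ((dim_a, dim_b), (dim_b, dim_c), lambda A, B: np.dot(B.T, A.T)),
    "ab,ca->bc": ((dim_a, dim_b), (dim_c, dim_a), lambda A, B: np.dot(A.T, B.T)),
    "ab,ca->cb": ((dim_a, dim_b), (dim_c, dim_a), lambda A, B: np.dot(B, A)),
    "ab,cb->ac": ((dim_a, dim_b), (dim_c, dim_b), lambda A, B: np.dot(A, B.T)),
    "ab,cb->ca": ((dim_a, dim_b), (dim_c, dim_b), lambda A, B: np.dot(B, A.T)),
}

for equation, (shape_a, shape_b, dot_equivalent) in cases.items():
    A = rng.standard_normal(shape_a)
    B = rng.standard_normal(shape_b)
    expected = np.einsum(equation, A, B)
    # Fails with the equation as the message if the two forms disagree.
    assert np.allclose(dot_equivalent(A, B), expected), equation
print("all eight matrix-matrix equations agree with numpy.einsum")
```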
We also provide fused-type, nogil Cython bindings to the underlying
Blis linear algebra library. Fused types are a simple template mechanism,
allowing just a touch of compile-time generic programming:

```python
cimport blis.cy
from libc.stdlib cimport calloc

# nN, nI, nO, nr_b0 and nr_b1 are assumed to be integer dimensions
# declared elsewhere.
A = <float*>calloc(nN * nI, sizeof(float))
B = <float*>calloc(nO * nI, sizeof(float))
C = <float*>calloc(nr_b0 * nr_b1, sizeof(float))
blis.cy.gemm(blis.cy.NO_TRANSPOSE, blis.cy.NO_TRANSPOSE,
             nO, nI, nN,
             1.0, A, nI, 1, B, nO, 1,
             1.0, C, nO, 1)
```

Bindings have been added as we've needed them. Please submit pull requests if
the library is missing some functions you require.
## Development

To build the source package, you should run the following command:

```bash
./bin/copy-source-files.sh
```

This populates the `blis/_src` folder for the various architectures, using the
`flame-blis` submodule.
## Updating the build files

In order to compile the Blis sources, we use jsonl files that provide the
explicit compiler flags. We build these jsonl files by running Blis's build
system, and then converting the log. This avoids us having to replicate the
build system within Python: we just use the jsonl to make a bunch of subprocess
calls. To support a new OS/architecture combination, we have to provide the
jsonl file and the header.
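
The idea of replaying a jsonl log as subprocess calls can be sketched as follows. This is not the project's `setup.py` logic: the function `run_jsonl_commands` and the record schema (one JSON object per line with an `argv` list) are assumptions chosen for illustration, and the real files may be structured differently.

```python
import json
import subprocess


def run_jsonl_commands(jsonl_path, key="argv"):
    """Run one subprocess per non-empty line of a JSON-lines file.

    Assumed, hypothetical schema: each line is a JSON object whose `key`
    entry is a list of strings, e.g. {"argv": ["gcc", "-c", "foo.c"]}.
    Returns the number of commands that were run.
    """
    completed = 0
    with open(jsonl_path, encoding="utf8") as file_:
        for line in file_:
            line = line.strip()
            if not line:
                continue
            record = json.loads(line)
            # check=True raises CalledProcessError at the first failure.
            subprocess.run(record[key], check=True)
            completed += 1
    return completed
```

Stopping at the first failing command keeps a broken compile step from being buried under later output.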
### Linux

The Linux build files need to be produced from within the manylinux1 docker
container, so that they will be compatible with the wheel building process.

First, install docker. Then do the following to start the container:

    sudo docker run -it quay.io/pypa/manylinux1_x86_64:latest

Once within the container, the following commands should check out the repo and
build the jsonl files for the generic arch:

    mkdir /usr/local/repos
    cd /usr/local/repos
    git clone https://github.com/explosion/cython-blis && cd cython-blis
    git pull && git submodule init && git submodule update && git submodule status
    /opt/python/cp36-cp36m/bin/python -m venv env3.6
    source env3.6/bin/activate
    pip install -r requirements.txt
    ./bin/generate-make-jsonl linux generic --export
    BLIS_ARCH=generic python setup.py build_ext --inplace
    # N.B.: don't copy to /tmp, docker cp doesn't work from there.
    cp blis/_src/include/linux-generic/blis.h /linux-generic-blis.h
    cp blis/_src/make/linux-generic.jsonl /

Then from a new terminal, retrieve the two files we need out of the container:

    sudo docker ps -l # Get the container ID
    # When I'm in Vagrant, I need to go via cat -- but then I end up with dummy
    # lines at the top and bottom. Sigh. If you don't have that problem and
    # sudo docker cp just works, just copy the file.
    sudo docker cp aa9d42588791:/linux-generic-blis.h - | cat > linux-generic-blis.h
    sudo docker cp aa9d42588791:/linux-generic.jsonl - | cat > linux-generic.jsonl
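
When the destination is `-`, `docker cp` writes a tar archive to standard output rather than the raw file, which is a likely source of the extra lines mentioned in the comment above. A sketch of unpacking such a stream with the Python standard library follows; the function `extract_single_file` and the file name `linux-generic.tar` are made up for illustration, and the member name inside the archive is assumed to be the file's base name.

```python
import io
import tarfile


def extract_single_file(tar_bytes, member_name):
    """Return the contents of `member_name` from an in-memory tar archive."""
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:*") as archive:
        member = archive.extractfile(member_name)
        if member is None:
            raise ValueError(f"{member_name} is not a regular file")
        return member.read()


# Usage sketch, assuming the stream was saved with
#   sudo docker cp <container>:/linux-generic.jsonl - > linux-generic.tar
# with open("linux-generic.tar", "rb") as file_:
#     data = extract_single_file(file_.read(), "linux-generic.jsonl")
```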
  214. sudo docker cp aa9d42588791:/linux-generic.jsonl - | cat > linux-generic.jsonl