  1. Metadata-Version: 2.1
  2. Name: srsly
  3. Version: 1.0.1
  4. Summary: Modern high-performance serialization utilities for Python
  5. Home-page: https://explosion.ai
  6. Author: Explosion AI
  7. Author-email: contact@explosion.ai
  8. License: MIT
  9. Platform: UNKNOWN
  10. Classifier: Development Status :: 5 - Production/Stable
  11. Classifier: Environment :: Console
  12. Classifier: Intended Audience :: Developers
  13. Classifier: Intended Audience :: Science/Research
  14. Classifier: License :: OSI Approved :: MIT License
  15. Classifier: Operating System :: POSIX :: Linux
  16. Classifier: Operating System :: MacOS :: MacOS X
  17. Classifier: Operating System :: Microsoft :: Windows
  18. Classifier: Programming Language :: Cython
  19. Classifier: Programming Language :: Python :: 2
  20. Classifier: Programming Language :: Python :: 2.7
  21. Classifier: Programming Language :: Python :: 3
  22. Classifier: Programming Language :: Python :: 3.4
  23. Classifier: Programming Language :: Python :: 3.5
  24. Classifier: Programming Language :: Python :: 3.6
  25. Classifier: Programming Language :: Python :: 3.7
  26. Classifier: Programming Language :: Python :: 3.8
  27. Classifier: Topic :: Scientific/Engineering
  28. Description-Content-Type: text/markdown
  29. Requires-Dist: pathlib (==1.0.1); python_version < "3.4"
  30. <a href="https://explosion.ai"><img src="https://explosion.ai/assets/img/logo.svg" width="125" height="125" align="right" /></a>
  31. # srsly: Modern high-performance serialization utilities for Python
  32. This package bundles some of the best Python serialization libraries into one
  33. standalone package, with a high-level API that makes it easy to write code
  34. that's correct across platforms and Pythons. This allows us to provide all the
  35. serialization utilities we need in a single binary wheel.
  36. [![Azure Pipelines](https://img.shields.io/azure-devops/build/explosion-ai/public/4/master.svg?logo=azure-pipelines&style=flat-square)](https://dev.azure.com/explosion-ai/public/_build?definitionId=4)
  37. [![PyPi](https://img.shields.io/pypi/v/srsly.svg?style=flat-square&logo=pypi&logoColor=white)](https://pypi.python.org/pypi/srsly)
  38. [![conda](https://img.shields.io/conda/vn/conda-forge/srsly.svg?style=flat-square&logo=conda-forge&logoColor=white)](https://anaconda.org/conda-forge/srsly)
  39. [![GitHub](https://img.shields.io/github/release/explosion/srsly/all.svg?style=flat-square&logo=github)](https://github.com/explosion/srsly)
  40. [![Python wheels](https://img.shields.io/badge/wheels-%E2%9C%93-4c1.svg?longCache=true&style=flat-square&logo=python&logoColor=white)](https://github.com/explosion/wheelwright/releases)
  41. ## Motivation
  42. Serialization is hard, especially across Python versions and multiple platforms.
  43. After dealing with many subtle bugs over the years (encodings, locales, large
  44. files) our libraries like [spaCy](https://github.com/explosion/spaCy) and
  45. [Prodigy](https://prodi.gy) have steadily grown a number of utility functions to
  46. wrap the multiple serialization formats we need to support (especially `json`,
  47. `msgpack` and `pickle`). These wrapping functions ended up duplicated across our
  48. codebases, so we wanted to put them in one place.
  49. At the same time, we noticed that having a lot of small dependencies was making
  50. maintenance harder, and making installation slower. To solve this, we've made
  51. `srsly` standalone, by including the component packages directly within it. This
  52. way we can provide all the serialization utilities we need in a single binary
  53. wheel.
  54. `srsly` currently includes forks of the following packages:
  55. - [`ujson`](https://github.com/esnme/ultrajson)
  56. - [`msgpack`](https://github.com/msgpack/msgpack-python)
  57. - [`msgpack-numpy`](https://github.com/lebedov/msgpack-numpy)
  58. - [`cloudpickle`](https://github.com/cloudpipe/cloudpickle)
  59. ## Installation
  60. `srsly` can be installed from pip:
  61. ```bash
  62. pip install srsly
  63. ```
  64. Or from conda via conda-forge:
  65. ```bash
  66. conda install -c conda-forge srsly
  67. ```
  68. Alternatively, you can also compile the library from source. You'll need to make
  69. sure that you have a development environment consisting of a Python distribution
  70. including header files, a compiler (Xcode command-line tools on macOS / OS X or
  71. Visual C++ build tools on Windows), pip, virtualenv and git installed.
  72. ```bash
  73. pip install -r requirements.txt # install development dependencies
  74. python setup.py build_ext --inplace # compile the library
  75. ```
  76. ## API
  77. ### JSON
  78. > 📦 The underlying module is exposed via `srsly.ujson`. However, we normally
  79. > interact with it via the utility functions only.
  80. #### <kbd>function</kbd> `srsly.json_dumps`
  81. Serialize an object to a JSON string. Takes care of Python 2/3 compatibility and
  82. falls back to `json` if `sort_keys=True` is used (until it's fixed in `ujson`).
  83. ```python
  84. data = {"foo": "bar", "baz": 123}
  85. json_string = srsly.json_dumps(data)
  86. ```
  87. | Argument | Type | Description |
  88. | ----------- | ------- | ------------------------------------------------------ |
  89. | `data` | - | The JSON-serializable data to output. |
  90. | `indent` | int | Number of spaces used to indent JSON. Defaults to `0`. |
  91. | `sort_keys` | bool | Sort dictionary keys. Defaults to `False`. |
  92. | **RETURNS** | unicode | The serialized string. |
  93. #### <kbd>function</kbd> `srsly.json_loads`
  94. Deserialize unicode or bytes to a Python object.
  95. ```python
  96. data = '{"foo": "bar", "baz": 123}'
  97. obj = srsly.json_loads(data)
  98. ```
  99. | Argument | Type | Description |
  100. | ----------- | --------------- | ------------------------------- |
  101. | `data` | unicode / bytes | The data to deserialize. |
  102. | **RETURNS** | - | The deserialized Python object. |
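
As a rough illustration of the two string helpers above, the sketch below reproduces the documented behaviour with Python's built-in `json` module rather than with `srsly` itself. The names `dumps_sketch` and `loads_sketch` are hypothetical, and treating `indent=0` as "no pretty-printing" is an assumption based on the documented default.

```python
import json


def dumps_sketch(data, indent=0, sort_keys=False):
    """Hypothetical stand-in for srsly.json_dumps, built on the stdlib."""
    # Assumption: indent=0 means compact output, so map it to None.
    return json.dumps(data, indent=indent or None, sort_keys=sort_keys)


def loads_sketch(data):
    """Hypothetical stand-in for srsly.json_loads: accepts str or bytes."""
    if isinstance(data, bytes):
        data = data.decode("utf8")
    return json.loads(data)


data = {"foo": "bar", "baz": 123}
compact = dumps_sketch(data, sort_keys=True)
# Round-trip through bytes to show that both input types are handled.
restored = loads_sketch(compact.encode("utf8"))
```

With `sort_keys=True` the key `"baz"` is emitted before `"foo"`, and the round trip returns a dict equal to the input.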
  103. #### <kbd>function</kbd> `srsly.write_json`
  104. Create a JSON file and dump contents or write to standard output.
  105. ```python
  106. data = {"foo": "bar", "baz": 123}
  107. srsly.write_json("/path/to/file.json", data)
  108. ```
  109. | Argument | Type | Description |
  110. | ---------- | ---------------- | ------------------------------------------------------ |
  111. | `location` | unicode / `Path` | The file path or `"-"` to write to stdout. |
  112. | `data` | - | The JSON-serializable data to output. |
  113. | `indent` | int | Number of spaces used to indent JSON. Defaults to `2`. |
  114. #### <kbd>function</kbd> `srsly.read_json`
  115. Load JSON from a file or standard input.
  116. ```python
  117. data = srsly.read_json("/path/to/file.json")
  118. ```
  119. | Argument | Type | Description |
  120. | ----------- | ---------------- | ------------------------------------------ |
  121. | `location` | unicode / `Path` | The file path or `"-"` to read from stdin. |
  122. | **RETURNS** | dict / list | The loaded JSON content. |
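
The `"-"` convention for `location` can be pictured with a small stdlib-only sketch. The functions `write_json_sketch` and `read_json_sketch` are hypothetical and are not how `srsly` is implemented; they only mirror the arguments listed in the tables above.

```python
import json
import sys
import tempfile
from pathlib import Path


def write_json_sketch(location, data, indent=2):
    """Hypothetical helper: write JSON to a file, or to stdout for "-"."""
    text = json.dumps(data, indent=indent)
    if location == "-":
        sys.stdout.write(text + "\n")
    else:
        Path(location).write_text(text, encoding="utf8")


def read_json_sketch(location):
    """Hypothetical helper: read JSON from a file, or from stdin for "-"."""
    if location == "-":
        return json.loads(sys.stdin.read())
    return json.loads(Path(location).read_text(encoding="utf8"))


# Round-trip through a temporary file; str and Path locations both work.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "file.json"
    write_json_sketch(path, {"foo": "bar", "baz": 123})
    loaded = read_json_sketch(path)
```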
  123. #### <kbd>function</kbd> `srsly.write_gzip_json`
  124. Create a gzipped JSON file and dump contents.
  125. ```python
  126. data = {"foo": "bar", "baz": 123}
  127. srsly.write_gzip_json("/path/to/file.json.gz", data)
  128. ```
  129. | Argument | Type | Description |
  130. | ---------- | ---------------- | ------------------------------------------------------ |
  131. | `location` | unicode / `Path` | The file path. |
  132. | `data` | - | The JSON-serializable data to output. |
  133. | `indent` | int | Number of spaces used to indent JSON. Defaults to `2`. |
  134. #### <kbd>function</kbd> `srsly.read_gzip_json`
  135. Load gzipped JSON from a file.
  136. ```python
  137. data = srsly.read_gzip_json("/path/to/file.json.gz")
  138. ```
  139. | Argument | Type | Description |
  140. | ----------- | ---------------- | ------------------------ |
  141. | `location` | unicode / `Path` | The file path. |
  142. | **RETURNS** | dict / list | The loaded JSON content. |
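
For comparison, a gzipped JSON round trip can be sketched with the standard library alone. This is only an approximation of what the two functions above do, not `srsly`'s own code; the file name is illustrative.

```python
import gzip
import json
import tempfile
from pathlib import Path

data = {"foo": "bar", "baz": 123}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "file.json.gz"
    # "wt" opens the gzip stream in text mode so json.dump can write str.
    with gzip.open(path, "wt", encoding="utf8") as f:
        json.dump(data, f, indent=2)
    # "rt" transparently decompresses and decodes on the way back in.
    with gzip.open(path, "rt", encoding="utf8") as f:
        loaded = json.load(f)
    # Every gzip file starts with the two magic bytes 1f 8b.
    magic = path.read_bytes()[:2]
```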
  143. #### <kbd>function</kbd> `srsly.write_jsonl`
  144. Create a JSONL file (newline-delimited JSON) and dump contents line by line, or
  145. write to standard output.
  146. ```python
  147. data = [{"foo": "bar"}, {"baz": 123}]
  148. srsly.write_jsonl("/path/to/file.jsonl", data)
  149. ```
  150. | Argument | Type | Description |
  151. | ---------- | ---------------- | ---------------------------------------------------------------------------------------------------------------------- |
  152. | `location` | unicode / `Path` | The file path or `"-"` to write to stdout. |
  153. | `lines` | iterable | The JSON-serializable lines. |
  154. | `append` | bool | Append to an existing file. Will open it in `"a"` mode and insert a newline before writing lines. Defaults to `False`. |
  155. | `append_new_line` | bool | Defines whether a new line should first be written when appending to an existing file. Defaults to `True`. |
  156. #### <kbd>function</kbd> `srsly.read_jsonl`
  157. Read a JSONL file (newline-delimited JSON), or JSONL data from standard
  158. input, and yield contents line by line. Blank lines will always be skipped.
  159. ```python
  160. data = srsly.read_jsonl("/path/to/file.jsonl")
  161. ```
  162. | Argument | Type | Description |
  163. | ---------- | -------------- | -------------------------------------------------------------------- |
  164. | `location` | unicode / `Path` | The file path or `"-"` to read from stdin. |
  165. | `skip` | bool | Skip broken lines and don't raise `ValueError`. Defaults to `False`. |
  166. | **YIELDS** | - | The loaded JSON contents of each line. |
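
The `append`, `append_new_line` and `skip` arguments are easier to follow with a concrete case. The sketch below imitates the documented behaviour using only the standard library. `write_jsonl_sketch` and `read_jsonl_sketch` are hypothetical names and may differ from `srsly` in edge cases.

```python
import json
import tempfile
from pathlib import Path


def write_jsonl_sketch(location, lines, append=False, append_new_line=True):
    """Hypothetical helper: write one JSON document per line."""
    mode = "a" if append else "w"
    with open(location, mode, encoding="utf8") as f:
        if append and append_new_line:
            f.write("\n")  # guards against a missing trailing newline
        for line in lines:
            f.write(json.dumps(line) + "\n")


def read_jsonl_sketch(location, skip=False):
    """Hypothetical helper: yield parsed lines, optionally skipping bad ones."""
    with open(location, encoding="utf8") as f:
        for line in f:
            line = line.strip()
            if not line:  # blank lines are always skipped
                continue
            try:
                yield json.loads(line)
            except ValueError:  # json.JSONDecodeError subclasses ValueError
                if not skip:
                    raise


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "file.jsonl"
    write_jsonl_sketch(path, [{"foo": "bar"}])
    write_jsonl_sketch(path, [{"baz": 123}], append=True)
    # Add a deliberately broken line to show the effect of skip=True.
    with open(path, "a", encoding="utf8") as f:
        f.write("{not valid json\n")
    records = list(read_jsonl_sketch(path, skip=True))
```

Here the blank line written before the appended record and the broken final line are both dropped, so `records` holds only the two valid objects. Because the reader is a generator, nothing is parsed until it is iterated.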
  167. #### <kbd>function</kbd> `srsly.is_json_serializable`
  168. Check if a Python object is JSON-serializable.
  169. ```python
  170. assert srsly.is_json_serializable({"hello": "world"}) is True
  171. assert srsly.is_json_serializable(lambda x: x) is False
  172. ```
  173. | Argument | Type | Description |
  174. | ----------- | ---- | ---------------------------------------- |
  175. | `obj` | - | The object to check. |
  176. | **RETURNS** | bool | Whether the object is JSON-serializable. |
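
One plausible way to implement such a check is to attempt serialization and catch the failure. The sketch below does this with the built-in `json` module. `is_json_serializable_sketch` is a hypothetical name, and the real function may use a different strategy.

```python
import json


def is_json_serializable_sketch(obj):
    """Hypothetical check: try to dump the object and report success."""
    try:
        json.dumps(obj)
        return True
    except (TypeError, OverflowError, ValueError):
        # TypeError: unsupported type; ValueError: e.g. circular reference.
        return False


ok = is_json_serializable_sketch({"hello": "world"})
not_ok_function = is_json_serializable_sketch(lambda x: x)
not_ok_set = is_json_serializable_sketch({1, 2, 3})
```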
  177. ### msgpack
  178. > 📦 The underlying module is exposed via `srsly.msgpack`. However, we normally
  179. > interact with it via the utility functions only.
  180. #### <kbd>function</kbd> `srsly.msgpack_dumps`
  181. Serialize an object to a msgpack byte string.
  182. ```python
  183. data = {"foo": "bar", "baz": 123}
  184. msg = srsly.msgpack_dumps(data)
  185. ```
  186. | Argument | Type | Description |
  187. | ----------- | ----- | ---------------------- |
  188. | `data` | - | The data to serialize. |
  189. | **RETURNS** | bytes | The serialized bytes. |
  190. #### <kbd>function</kbd> `srsly.msgpack_loads`
  191. Deserialize msgpack bytes to a Python object.
  192. ```python
  193. msg = b"\x82\xa3foo\xa3bar\xa3baz{"
  194. data = srsly.msgpack_loads(msg)
  195. ```
  196. | Argument | Type | Description |
  197. | ----------- | ----- | --------------------------------------------------------------------------------------- |
  198. | `data` | bytes | The data to deserialize. |
  199. | `use_list` | bool | Deserialize arrays as lists rather than tuples. Can make deserialization slower. Defaults to `True`. |
  200. | **RETURNS** | - | The deserialized Python object. |
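
To make the example bytes above less opaque, here is a deliberately tiny decoder for the handful of msgpack type bytes they contain. It is a teaching sketch only: `unpack_sketch` is a hypothetical function that covers positive fixint, fixmap, fixarray and fixstr, and it is not a substitute for `srsly.msgpack_loads`.

```python
def unpack_sketch(buf, pos=0, use_list=True):
    """Decode one value from buf at pos; return (value, next_pos)."""
    tag = buf[pos]
    pos += 1
    if tag <= 0x7F:  # positive fixint: the byte is the value
        return tag, pos
    if 0x80 <= tag <= 0x8F:  # fixmap: low 4 bits give the number of pairs
        result = {}
        for _ in range(tag & 0x0F):
            key, pos = unpack_sketch(buf, pos, use_list)
            value, pos = unpack_sketch(buf, pos, use_list)
            result[key] = value
        return result, pos
    if 0x90 <= tag <= 0x9F:  # fixarray: low 4 bits give the length
        items = []
        for _ in range(tag & 0x0F):
            item, pos = unpack_sketch(buf, pos, use_list)
            items.append(item)
        return (items if use_list else tuple(items)), pos
    if 0xA0 <= tag <= 0xBF:  # fixstr: low 5 bits give the byte length
        length = tag & 0x1F
        return buf[pos:pos + length].decode("utf8"), pos + length
    raise ValueError("type byte 0x%02x not handled by this sketch" % tag)


msg = b"\x82\xa3foo\xa3bar\xa3baz{"
data, end = unpack_sketch(msg)
# The same two-element array decoded as a list and as a tuple.
as_list, _ = unpack_sketch(b"\x92\x01\x02")
as_tuple, _ = unpack_sketch(b"\x92\x01\x02", use_list=False)
```

Reading the message byte by byte: `\x82` opens a map with two pairs, each `\xa3` introduces a three-byte string, and the trailing `{` is byte `0x7b`, the integer 123.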
  201. #### <kbd>function</kbd> `srsly.write_msgpack`
  202. Create a msgpack file and dump contents.
  203. ```python
  204. data = {"foo": "bar", "baz": 123}
  205. srsly.write_msgpack("/path/to/file.msg", data)
  206. ```
  207. | Argument | Type | Description |
  208. | ---------- | ---------------- | ---------------------- |
  209. | `location` | unicode / `Path` | The file path. |
  210. | `data` | - | The data to serialize. |
  211. #### <kbd>function</kbd> `srsly.read_msgpack`
  212. Load a msgpack file.
  213. ```python
  214. data = srsly.read_msgpack("/path/to/file.msg")
  215. ```
  216. | Argument | Type | Description |
  217. | ----------- | ---------------- | --------------------------------------------------------------------------------------- |
  218. | `location` | unicode / `Path` | The file path. |
  219. | `use_list` | bool | Deserialize arrays as lists rather than tuples. Can make deserialization slower. Defaults to `True`. |
  220. | **RETURNS** | - | The loaded and deserialized content. |
  221. ### pickle
  222. > 📦 The underlying module is exposed via `srsly.cloudpickle`. However, we
  223. > normally interact with it via the utility functions only.
  224. #### <kbd>function</kbd> `srsly.pickle_dumps`
  225. Serialize a Python object with pickle.
  226. ```python
  227. data = {"foo": "bar", "baz": 123}
  228. pickled_data = srsly.pickle_dumps(data)
  229. ```
  230. | Argument | Type | Description |
  231. | ----------- | ----- | ------------------------------------------------------ |
  232. | `data` | - | The object to serialize. |
  233. | `protocol` | int | Protocol to use. `-1` for highest. Defaults to `None`. |
  234. | **RETURNS** | bytes | The serialized object. |
  235. #### <kbd>function</kbd> `srsly.pickle_loads`
  236. Deserialize bytes with pickle.
  237. ```python
  238. pickled_data = b"\x80\x04\x95\x19\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x03foo\x94\x8c\x03bar\x94\x8c\x03baz\x94K{u."
  239. data = srsly.pickle_loads(pickled_data)
  240. ```
  241. | Argument | Type | Description |
  242. | ----------- | ----- | ------------------------------- |
  243. | `data` | bytes | The data to deserialize. |
  244. | **RETURNS** | - | The deserialized Python object. |
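
The effect of the `protocol` argument can be seen with the standard-library `pickle` module, used here as a stand-in. This is an assumption-laden sketch: `srsly` exposes `cloudpickle`, which can serialize more kinds of objects than plain `pickle`, so the two are not interchangeable in general.

```python
import pickle

data = {"foo": "bar", "baz": 123}

# protocol=-1 selects the highest protocol the running Python supports.
pickled = pickle.dumps(data, protocol=-1)
restored = pickle.loads(pickled)

# Pickles using protocol 2 or later start with the PROTO opcode (0x80)
# followed by one byte holding the protocol number.
proto_opcode = pickled[:1]
proto_number = pickled[1]
```

The example bytes shown for `srsly.pickle_loads` above begin with `\x80\x04`, which by the same reading marks a protocol 4 pickle.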