Metadata-Version: 2.1
Name: pdfminer3k
Version: 1.3.1
Summary: PDF parser and analyzer
Home-page: https://github.com/jaepil/pdfminer3k
Author: Yusuke Shinyama
Author-email: yusuke at cs dot nyu dot edu
Maintainer: Jaepil Jeong, Virgil Dupras
Maintainer-email: jaepil@kaist.ac.kr, hsoft@hardcoded.net
License: MIT/X
Keywords: pdf parser,pdf converter,layout analysis,text mining
Platform: UNKNOWN
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Topic :: Text Processing
Requires-Dist: pytest (>=2.0)
Requires-Dist: ply (>=3.4)

pdfminer3k is a Python 3 port of pdfminer.
PDFMiner is a tool for extracting information from PDF documents.
Unlike other PDF-related tools, it focuses entirely on getting
and analyzing text data. PDFMiner lets you obtain
the exact location of text on a page, as well as
other information such as fonts or lines.
It includes a PDF converter that can transform PDF files
into other text formats (such as HTML). It has an extensible
PDF parser that can be used for purposes other than text analysis.

Changes
=======

Version 1.3.1 -- 2016/11/05
---------------------------

* Replaced root loggers with module-wide loggers. This allows users to disable the log messages from pdfminer3k.
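
As a rough illustration of what the 1.3.1 change makes possible: once a library logs through module-wide loggers, its records can be filtered by logger name with the standard `logging` module. The sketch below assumes the loggers are named after their modules inside the `pdfminer` package, so that their names begin with `pdfminer`. That naming is an assumption, not something this changelog states. The `pdfminer.pdfparser` logger is a stand-in created for the demonstration, and `myapp` is a hypothetical application logger.

```python
import io
import logging

# Capture all log output in memory so the effect of the filter is visible.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
root = logging.getLogger()
root.addHandler(handler)
root.setLevel(logging.DEBUG)

# Assumption: pdfminer3k modules log under names such as "pdfminer.pdfparser".
# This stand-in logger imitates one of them; it is not taken from the library.
library_logger = logging.getLogger("pdfminer.pdfparser")
app_logger = logging.getLogger("myapp")  # hypothetical application logger

# Drop everything below ERROR for the whole (assumed) "pdfminer" namespace.
# Child loggers without their own level inherit this threshold.
logging.getLogger("pdfminer").setLevel(logging.ERROR)

library_logger.debug("xref table parsed")  # filtered out
library_logger.error("corrupt stream")     # still reported
app_logger.info("starting conversion")     # unaffected by the filter

captured = stream.getvalue().splitlines()
print(captured)  # ['corrupt stream', 'starting conversion']
```

Raising the threshold to `logging.CRITICAL + 1` would suppress the namespace entirely. With root loggers this kind of selective filtering is not possible, because library and application messages share one logger.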

Version 1.3.0 -- 2012/07/20
---------------------------

* Added `pdfexplore`, a tool to debug PDFs by exploring their data.
* Don't try to group textboxes when there are too many (it takes too long).
* Support object references as filters in streams.
* Parse every object as soon as an objectid can't be found.
* Improved the `STRICT`-based error handling idiom.

Version 1.2.4 -- 2011/10/07
---------------------------

* When xref tables are corrupt, parse and cache all objects as a fallback.
* Fixed a bogus assertion in layouts.

Version 1.2.3 -- 2011/09/05
---------------------------

* Fixed a crash on uneven cmap codes.
* Fixed a meta-crash caused by bad PSParser repr.

Version 1.2.2 -- 2011/08/30
---------------------------

* Fixed a crash on corrupt LZW data.
* Ignore lines with no text for textlines grouping.
* Don't crash on invalid dictionary constructs when parsing PostScript.

Version 1.2.1 -- 2011/08/22
---------------------------

* Fixed a crash on corrupted inline images.
* Tweaked the layout detection algorithm.

Version 1.2.0 -- 2011/08/09
---------------------------

* There wasn't a changelog until now. Starting it.
* Removed the old PostScript lexer and replaced it with a PLY-based one.
* Added a couple of heuristic layout features.
* Fixed a couple of crashes on opening PDFs.