Metadata-Version: 2.1
Name: pdfminer3k
Version: 1.3.1
Summary: PDF parser and analyzer
Home-page: https://github.com/jaepil/pdfminer3k
Author: Yusuke Shinyama
Author-email: yusuke at cs dot nyu dot edu
Maintainer: Jaepil Jeong, Virgil Dupras
Maintainer-email: jaepil@kaist.ac.kr, hsoft@hardcoded.net
License: MIT/X
Keywords: pdf parser,pdf converter,layout analysis,text mining
Platform: UNKNOWN
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Topic :: Text Processing
Requires-Dist: pytest (>=2.0)
Requires-Dist: ply (>=3.4)

pdfminer3k is a Python 3 port of pdfminer.
PDFMiner is a tool for extracting information from PDF documents.
Unlike other PDF-related tools, it focuses entirely on getting
and analyzing text data. PDFMiner lets you obtain
the exact location of text on a page, as well as
other information such as fonts or lines.
It includes a PDF converter that can transform PDF files
into other text formats (such as HTML). It has an extensible
PDF parser that can be used for purposes other than text analysis.

Changes
=======

Version 1.3.1 -- 2016/11/05
---------------------------

* Replaced root loggers with module-wide loggers. This allows users to disable the log messages from pdfminer3k.
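
With module-wide loggers, an application can silence a library by raising the level of the package's top-level logger. The following is a minimal sketch using only the standard `logging` module. It assumes the loggers live under the `pdfminer` namespace (as they would if each module calls `logging.getLogger(__name__)`); the name `pdfminer.pdfparser` below is only a stand-in for such a module logger, and `silence_package_logging` is a hypothetical helper, not part of pdfminer3k.

```python
import logging

# Assumption: the library's module loggers are children of "pdfminer".
PACKAGE_LOGGER_NAME = "pdfminer"


def silence_package_logging(name: str = PACKAGE_LOGGER_NAME) -> logging.Logger:
    """Raise the threshold of a package logger so its child loggers stay quiet."""
    package_logger = logging.getLogger(name)
    package_logger.setLevel(logging.CRITICAL)
    return package_logger


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG)
    silence_package_logging()
    # Stand-in for a logger created inside a library module (hypothetical name).
    module_logger = logging.getLogger("pdfminer.pdfparser")
    module_logger.warning("suppressed: below the package threshold")
    # Loggers outside the package namespace are unaffected.
    logging.getLogger("myapp").warning("application messages still appear")
```

This works because a logger with no level of its own inherits the effective level of its nearest configured ancestor. When a library logs through the root logger instead, the only way to quiet it is to change the root level, which also affects the application's own messages.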

Version 1.3.0 -- 2012/07/20
---------------------------

* Added `pdfexplore`, a tool to debug PDFs by exploring their data.
* Don't try to group textboxes when there are too many (it takes too long).
* Support object references as filters in streams.
* Parse every object as soon as an objectid can't be found.
* Improved the `STRICT`-based error handling idiom.
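
For readers unfamiliar with a `STRICT`-style idiom, the general pattern is a module-level flag that decides whether malformed input raises an exception or falls back to a default. The sketch below illustrates that pattern only. The names `int_value` and `PDFTypeError`, the `strict` override parameter, and the default value of the flag are assumptions made for illustration and may not match pdfminer3k's code.

```python
from typing import Optional

STRICT = False  # assumed module-level switch; the real default may differ


class PDFTypeError(Exception):
    """Illustrative error raised for malformed values in strict mode."""


def int_value(x, default: int = 0, strict: Optional[bool] = None) -> int:
    """Return x if it is an int; otherwise raise (strict) or fall back (lenient)."""
    if strict is None:
        strict = STRICT  # defer to the module-level flag unless overridden
    if isinstance(x, int):
        return x
    if strict:
        raise PDFTypeError("Integer required: %r" % (x,))
    return default
```

In lenient mode a damaged PDF still yields partial output, while strict mode surfaces the first malformed value, which is useful when debugging a parser.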

Version 1.2.4 -- 2011/10/07
---------------------------

* When xref tables are corrupt, parse and cache all objects as a fallback.
* Fixed a bogus assertion in layouts.
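
One common way to recover from a corrupt xref table is to scan the raw file for `<id> <generation> obj` headers and rebuild the offset table from what is found. The sketch below shows that general technique under simplifying assumptions. It is not pdfminer3k's implementation, and `scan_object_offsets` is a hypothetical helper.

```python
import re
from typing import Dict

# Matches "<objid> <generation> obj" where the object id starts a token.
OBJ_HEADER = re.compile(rb"(?<!\d)(\d+)\s+(\d+)\s+obj\b")


def scan_object_offsets(data: bytes) -> Dict[int, int]:
    """Map each object id to the byte offset of its last definition in data."""
    offsets: Dict[int, int] = {}
    for match in OBJ_HEADER.finditer(data):
        objid = int(match.group(1))
        # Later definitions override earlier ones, as with incremental updates.
        offsets[objid] = match.start()
    return offsets


if __name__ == "__main__":
    sample = (
        b"%PDF-1.4\n"
        b"1 0 obj\n<< /Type /Catalog >>\nendobj\n"
        b"2 0 obj\n<< >>\nendobj\n"
    )
    print(scan_object_offsets(sample))
```

A plain byte scan like this can report false positives when stream data happens to contain a matching byte sequence, so a real parser would validate each candidate before caching it.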

Version 1.2.3 -- 2011/09/05
---------------------------

* Fixed a crash on uneven cmap codes.
* Fixed a meta-crash caused by bad PSParser repr.

Version 1.2.2 -- 2011/08/30
---------------------------

* Fixed a crash on corrupt LZW data.
* Ignore lines with no text for textlines grouping.
* Don't crash on invalid dictionary constructs when parsing PostScript.

Version 1.2.1 -- 2011/08/22
---------------------------

* Fixed a crash on corrupted inline images.
* Tweaked the layout detection algorithm.

Version 1.2.0 -- 2011/08/09
---------------------------

* There wasn't a changelog until now. Starting it.
* Removed the old PostScript lexer and replaced it with a PLY-based one.
* Added a couple of heuristic layout features.
* Fixed a couple of crashes on opening PDFs.