
  __     _ _                     _     _
 / _| __| | |__        ___ _ __ (_) __| | ___ _ __
| |_ / _` | '_ \ _____/ __| '_ \| |/ _` |/ _ | '__|
|  _| (_| | |_) |_____\__ | |_) | | (_| |  __| |
|_|  \__,_|_.__/      |___| .__/|_|\__,_|\___|_|
                          |_|

Configure fdb-spider in a YAML file. Spider multi-page databases of links. Filter and serialize content to JSON.

Filter either with XPath syntax or with the help of artificial neural networks (work in progress).
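
To illustrate the XPath-based filtering and JSON serialization described above, here is a minimal sketch. It is not fdb-spider's actual code: the configuration keys (`entry_xpath`, `link_xpath`), the sample page, and the function name `extract_entries` are all assumptions made for this example. It uses only Python's standard library, whose `xml.etree.ElementTree` module supports a limited subset of XPath.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical configuration: these key names are illustrative only and
# are not taken from fdb-spider's real YAML schema.
CONFIG = {
    "entry_xpath": ".//li[@class='entry']",  # selects each database entry
    "link_xpath": "./a",                     # selects the link inside an entry
}

# Made-up, well-formed sample page standing in for one page of a link database.
PAGE = """
<html><body><ul>
  <li class="entry"><a href="/item/1">First item</a></li>
  <li class="entry"><a href="/item/2">Second item</a></li>
  <li class="ad"><a href="/promo">Promo</a></li>
</ul></body></html>
"""


def extract_entries(page, config):
    """Return one dict per entry matched by the configured XPath expressions."""
    root = ET.fromstring(page.strip())
    entries = []
    for node in root.findall(config["entry_xpath"]):
        link = node.find(config["link_xpath"])
        if link is None:
            continue  # skip entries that contain no link
        entries.append({
            "title": (link.text or "").strip(),
            "link": link.get("href"),
        })
    return entries


entries = extract_entries(PAGE, CONFIG)
print(json.dumps(entries, indent=2))
```

The `li` element with class `ad` is filtered out because it does not match `entry_xpath`. Real web pages are often not well-formed XML, so a tolerant HTML parser with full XPath support would likely be needed in practice.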