spiders added further handling for javascript links not being urls, made config for giz work 2023-11-28 15:27:39 +00:00
.gitignore first function works, actuall xml parser has still problems with certain xml types 2023-11-06 19:19:31 +00:00
main.py added further database in config.yaml, added new exception for downloading js generated html pages 2023-11-27 15:10:11 +00:00
README.md update README.md 2023-11-20 16:38:18 +01:00
requirements.txt added trafilatura to requirements 2023-11-22 00:07:59 +00:00

  __     _ _                     _     _
 / _| __| | |__        ___ _ __ (_) __| | ___ _ __
| |_ / _` | '_ \ _____/ __| '_ \| |/ _` |/ _ \ '__|
|  _| (_| | |_) |_____\__ \ |_) | | (_| |  __/ |
|_|  \__,_|_.__/      |___/ .__/|_|\__,_|\___|_|
                          |_|

Configure fdb-spider in a YAML file. Spider multi-page databases of links. Filter the content and serialize it to JSON.

Filter either with XPath expressions or with the help of artificial neural networks (work in progress).
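
To illustrate the XPath route, here is a minimal sketch of filtering links out of a page and serializing them to JSON. It is not fdb-spider's own code: the sample page, the `extract_links` helper, and the XPath expression are all hypothetical, and it relies only on the Python standard library, which supports a limited XPath subset. Real-world HTML is rarely well-formed XML, so an HTML-aware parser such as lxml would likely be needed in practice.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample page with one list of results
# and one navigation list that should be filtered out.
PAGE = """
<html><body>
  <ul class="results">
    <li><a href="/entry/1">First entry</a></li>
    <li><a href="/entry/2">Second entry</a></li>
  </ul>
  <ul class="nav">
    <li><a href="/about">About</a></li>
  </ul>
</body></html>
"""


def extract_links(page, xpath):
    """Return title/href pairs for every element matched by the XPath expression."""
    root = ET.fromstring(page.strip())
    return [
        {"title": (a.text or "").strip(), "href": a.get("href")}
        for a in root.findall(xpath)
    ]


# Keep only the anchors inside the "results" list.
links = extract_links(PAGE, ".//ul[@class='results']/li/a")

# Serialize the filtered content to JSON.
serialized = json.dumps(links, indent=2)
print(serialized)
```

In this sketch the XPath expression is the only thing that decides what is kept, which is why it is a natural candidate for a per-database setting in a configuration file.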