fdb-spider

alpcentaur ece5cf1301 added better logic for getting the right link of entry		2023-12-15 13:34:23 +01:00
spiders	added better logic for getting the right link of entry	2023-12-15 13:34:23 +01:00
.gitignore	first function works, actuall xml parser has still problems with certain xml types	2023-11-06 19:19:31 +00:00
main.py	added last resort exception for entry page downloading with wget, also implemented some further logic regarding getting the right links	2023-12-15 11:33:50 +00:00
README.md	update README.md	2023-11-20 16:38:18 +01:00
requirements.txt	last commit in detached head	2023-12-13 16:20:27 +01:00

README.md

  __     _ _                     _     _
 / _| __| | |__        ___ _ __ (_) __| | ___ _ __
| |_ / _` | '_ \ _____/ __| '_ \| |/ _` |/ _ | '__|
|  _| (_| | |_) |_____\__ | |_) | | (_| |  __| |
|_|  \__,_|_.__/      |___| .__/|_|\__,_|\___|_|
                          |_|

Configure fdb-spider in a YAML file. Spider multi-page databases of links. Filter the content and serialize it to JSON.

Filter either with XPath syntax or with the help of artificial neural networks (work in progress).
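
To illustrate the XPath filtering and JSON serialization described above, here is a minimal sketch. It is not taken from the fdb-spider code base: the `PAGE` snippet, the `extract_entries` helper, and the XPath expression are hypothetical. It uses only the limited XPath subset of Python's standard library, so it needs well-formed markup as input.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical, well-formed snippet standing in for one downloaded list page.
PAGE = """
<html><body>
  <ul class="entries">
    <li><a href="/entry/1">First funding programme</a></li>
    <li><a href="/entry/2">Second funding programme</a></li>
  </ul>
  <a href="/imprint">Imprint</a>
</body></html>
"""


def extract_entries(page_source, entry_xpath):
    """Return a name/link dict for every element matched by entry_xpath.

    Assumption: entry_xpath selects the <a> elements of the entries.
    """
    root = ET.fromstring(page_source.strip())
    entries = []
    for anchor in root.findall(entry_xpath):
        entries.append(
            {
                "name": (anchor.text or "").strip(),
                "link": anchor.get("href"),
            }
        )
    return entries


# Only links inside the entries list match; the imprint link is filtered out.
entries = extract_entries(PAGE, ".//ul[@class='entries']/li/a")
print(json.dumps(entries, indent=2))
```

In a setup like the one the README describes, the XPath expression would presumably come from the YAML configuration rather than being hard-coded. Real-world HTML is often not well-formed, so a more tolerant parser would likely be needed in practice.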