spiders added functions for uniform and not uniform entry end points - non uniform endpoints are generally parsed as text from any paragraph xml element p 2023-11-20 15:28:04 +00:00
.gitignore first function works, actuall xml parser has still problems with certain xml types 2023-11-06 19:19:31 +00:00
main.py added functions for uniform and not uniform entry end points - non uniform endpoints are generally parsed as text from any paragraph xml element p 2023-11-20 15:28:04 +00:00
README.md update README.md 2023-11-20 16:38:18 +01:00
requirements.txt specifying the links, new exception clause if soupparser does not work 2023-11-07 14:55:05 +00:00

  __     _ _                     _     _
 / _| __| | |__        ___ _ __ (_) __| | ___ _ __
| |_ / _` | '_ \ _____/ __| '_ \| |/ _` |/ _ | '__|
|  _| (_| | |_) |_____\__ | |_) | | (_| |  __| |
|_|  \__,_|_.__/      |___| .__/|_|\__,_|\___|_|
                          |_|

Configure fdb-spider in a YAML file. Spider multi-page databases of links. Filter the content and serialize it to JSON.

Filter either with XPath expressions or with the help of artificial neural networks (work in progress).
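
To illustrate the XPath route, here is a minimal sketch of filtering entries from a page and serializing them to JSON. It is not fdb-spider's own code: the sample page, the `FIELD_XPATHS` mapping and the `extract_entries` helper are hypothetical, and it uses the limited XPath subset of Python's standard `xml.etree.ElementTree` instead of the project's actual parser.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical sample page; real input would come from the spidered database.
SAMPLE_PAGE = """
<html>
  <body>
    <div class="entry">
      <a href="/entry/1">First entry</a>
      <p>Deadline: 2023-12-01</p>
    </div>
    <div class="entry">
      <a href="/entry/2">Second entry</a>
      <p>Deadline: 2024-01-15</p>
    </div>
  </body>
</html>
"""

# Hypothetical filter configuration: field name -> XPath relative to one entry.
FIELD_XPATHS = {
    "title": "./a",
    "text": "./p",
}


def extract_entries(page, entry_xpath, field_xpaths):
    """Return one dict per node matched by entry_xpath."""
    root = ET.fromstring(page)
    entries = []
    for node in root.findall(entry_xpath):
        entry = {}
        for field, xpath in field_xpaths.items():
            match = node.find(xpath)
            has_text = match is not None and match.text
            entry[field] = match.text.strip() if has_text else None
        # Keep the link target so the entry page can be visited later.
        link = node.find("./a")
        entry["link"] = link.get("href") if link is not None else None
        entries.append(entry)
    return entries


entries = extract_entries(SAMPLE_PAGE, ".//div[@class='entry']", FIELD_XPATHS)
serialized = json.dumps(entries, indent=2)
print(serialized)
```

Keeping the XPath expressions in a mapping separate from the extraction logic mirrors the idea of configuring the spider from a file: only the mapping would change per database, not the code.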