alpcentaur 69c517292b Update 'README.md' 11 months ago
spiders added functions for uniform and not uniform entry end points - non uniform endpoints are generally parsed as text from any paragraph xml element p 11 months ago
.gitignore first function works, actuall xml parser has still problems with certain xml types 1 year ago
README.md Update 'README.md' 11 months ago
main.py added functions for uniform and not uniform entry end points - non uniform endpoints are generally parsed as text from any paragraph xml element p 11 months ago
requirements.txt specifying the links, new exception clause if soupparser does not work 1 year ago

README.md

  __     _ _                     _     _
 / _| __| | |__        ___ _ __ (_) __| | ___ _ __
| |_ / _` | '_ \ _____/ __| '_ \| |/ _` |/ _ | '__|
|  _| (_| | |_) |_____\__ | |_) | | (_| |  __| |
|_|  \__,_|_.__/      |___| .__/|_|\__,_|\___|_|
                          |_|

Configure fdb-spider in a YAML file. Spider multi-page databases of links. Filter and serialize the content to JSON.

Filter either with XPath syntax or with the help of artificial neural networks.
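As a rough illustration of the XPath-filter-then-serialize idea, here is a minimal sketch. It is not fdb-spider's own code: it uses only the Python standard library (whose `xml.etree.ElementTree` supports a limited XPath subset), and the sample page, the `entry` class name, the `filter_entries` helper, and the output keys are all assumptions made for this example.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample page. A real spider would fetch pages
# over HTTP and would need a more forgiving HTML parser.
SAMPLE_PAGE = """
<html><body>
  <div class="entry"><a href="/item/1">First entry</a><p>Some text.</p></div>
  <div class="entry"><a href="/item/2">Second entry</a><p>More text.</p></div>
  <div class="footer"><a href="/imprint">Imprint</a></div>
</body></html>
"""


def filter_entries(page):
    """Select entry blocks with an XPath expression and return plain dicts."""
    root = ET.fromstring(page.strip())
    entries = []
    # ElementTree's XPath subset supports attribute predicates like this one.
    for div in root.findall(".//div[@class='entry']"):
        link = div.find("a")
        # Collect the text of every paragraph element inside the entry.
        paragraphs = [p.text or "" for p in div.findall(".//p")]
        entries.append(
            {
                "name": link.text,
                "link": link.get("href"),
                "text": " ".join(paragraphs),
            }
        )
    return entries


entries = filter_entries(SAMPLE_PAGE)
serialized = json.dumps(entries, indent=2)
print(serialized)
```

Only the two `entry` blocks match the XPath expression; the footer link is filtered out before serialization.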