
  __     _ _                     _     _
 / _| __| | |__        ___ _ __ (_) __| | ___ _ __
| |_ / _` | '_ \ _____/ __| '_ \| |/ _` |/ _ | '__|
|  _| (_| | |_) |_____\__ | |_) | | (_| |  __| |
|_|  \__,_|_.__/      |___| .__/|_|\__,_|\___|_|
                          |_|

Configure fdb-spider in a YAML file. Spider multi-page databases of links. Filter the content and serialize it to JSON.

Filter either with XPath expressions or with the help of artificial neural networks (work in progress).
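As a rough illustration of XPath filtering followed by JSON serialization, here is a minimal sketch. It is not fdb-spider's own code: the page snippet, the `extract_entries` helper, the `entry` class name and the example.org base URL are all made up for the example, and it uses the limited XPath subset of the standard library's `xml.etree.ElementTree` rather than whatever parser the project actually uses.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical, well-formed listing page; a real spider would fetch this.
PAGE = """
<html><body>
  <ul>
    <li class="entry"><a href="/calls/1.pdf">Call one</a></li>
    <li class="entry"><a href="/calls/2.html">Call two</a></li>
    <li class="nav"><a href="/next">Next page</a></li>
  </ul>
</body></html>
"""


def extract_entries(page, base_url):
    """Select the entry links with an XPath expression and return dicts."""
    root = ET.fromstring(page)
    entries = []
    # Only <a> elements inside <li class="entry"> match; the nav link is skipped.
    for link in root.findall(".//li[@class='entry']/a"):
        entries.append(
            {"name": link.text.strip(), "link": base_url + link.get("href")}
        )
    return entries


entries = extract_entries(PAGE, "https://example.org")
print(json.dumps(entries, indent=2))
```

Real-world HTML is rarely well-formed XML, so a tolerant HTML parser with full XPath support would be the more likely choice in practice.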

To run this, create a python3 virtualenv, run pip install -r requirements.txt, and install the following system packages:

apt install xvfb
apt install chromium
apt install chromium-webdriver
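
After installing, a quick check like the following can confirm that the tools are reachable from the virtualenv. This is only a sketch: the executable names `Xvfb`, `chromium` and `chromedriver` are assumptions and may differ between distributions, and `find_missing` is a hypothetical helper, not part of fdb-spider.

```python
import shutil

# Assumed executable names; adjust them to match your distribution.
REQUIRED_BINARIES = ("Xvfb", "chromium", "chromedriver")


def find_missing(binaries=REQUIRED_BINARIES):
    """Return the names from `binaries` that are not found on the PATH."""
    return [name for name in binaries if shutil.which(name) is None]


missing = find_missing()
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("All required binaries found.")
```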