
  __     _ _                     _     _
 / _| __| | |__        ___ _ __ (_) __| | ___ _ __
| |_ / _` | '_ \ _____/ __| '_ \| |/ _` |/ _ | '__|
|  _| (_| | |_) |_____\__ | |_) | | (_| |  __| |
|_|  \__,_|_.__/      |___| .__/|_|\__,_|\___|_|
                          |_|

Configure fdb-spider in a YAML file. Spider multi-page databases of links. Filter and serialize the content to JSON.

Filter either with XPath expressions or with the help of artificial neural networks (work in progress).
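
As a rough illustration of the XPath route, here is a minimal sketch that selects links from one page and serializes them to JSON. It is not fdb-spider's own code: the page markup, the `entry` class and the `extract_links` helper are hypothetical, and it uses the limited XPath subset of Python's standard-library `xml.etree.ElementTree`, so it only works on well-formed markup. A real spider would more likely use a full XPath engine on real-world HTML.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for one page of a link database.
PAGE = """
<html>
  <body>
    <ul>
      <li class="entry"><a href="/item/1">First entry</a></li>
      <li class="entry"><a href="/item/2">Second entry</a></li>
      <li class="ad"><a href="/promo">Sponsored</a></li>
    </ul>
  </body>
</html>
"""


def extract_links(page, xpath):
    """Return one dict per <a> element matched by `xpath`."""
    root = ET.fromstring(page.strip())
    return [
        {"title": (a.text or "").strip(), "href": a.get("href")}
        for a in root.findall(xpath)
    ]


# Keep only anchors inside list items marked as entries; the "ad" item is dropped.
links = extract_links(PAGE, ".//li[@class='entry']/a")
serialized = json.dumps(links, indent=2)
print(serialized)
```

Changing the XPath expression is enough to change what gets kept, which is the reason to hold such expressions in a config file rather than in code.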

To run this, create a python3 virtualenv, run pip install -r requirements.txt, and install these system packages:

apt install xvfb
apt install chromium
apt install chromium-webdriver