spiders added changed logic for link addition regarding entry links 2023-12-18 21:26:53 +00:00
.gitignore added instruction for downloading chromium driver for python selenium to README.md 2023-12-15 14:13:41 +01:00
main.py added last resort exception for entry page downloading with wget, also implemented some further logic regarding getting the right links 2023-12-15 11:33:50 +00:00
README.md added instruction for downloading chromium driver for python selenium to README.md 2023-12-15 14:13:41 +01:00
requirements.txt last commit in detached head 2023-12-13 16:20:27 +01:00

  __     _ _                     _     _
 / _| __| | |__        ___ _ __ (_) __| | ___ _ __
| |_ / _` | '_ \ _____/ __| '_ \| |/ _` |/ _ | '__|
|  _| (_| | |_) |_____\__ | |_) | | (_| |  __| |
|_|  \__,_|_.__/      |___| .__/|_|\__,_|\___|_|
                          |_|

Configure fdb-spider in a YAML file. Spider multi-page databases of links. Filter and serialize content to JSON.

Filter either with XPath expressions or with the help of artificial neural networks (work in progress).
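
To illustrate the XPath filtering and JSON serialization idea, here is a minimal sketch using only the Python standard library. It is not fdb-spider's actual code: the sample page, the `extract_entries` helper, the `ENTRY_XPATH` expression and the output keys `name` and `link` are all hypothetical. `xml.etree.ElementTree` supports only a subset of XPath and needs well-formed markup, so real-world HTML would require a full XPath engine (for example lxml).

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample page (not taken from fdb-spider).
SAMPLE_PAGE = """
<html>
  <body>
    <ul>
      <li class="entry"><a href="/entry/1">First entry</a></li>
      <li class="entry"><a href="/entry/2">Second entry</a></li>
      <li class="nav"><a href="/next">Next page</a></li>
    </ul>
  </body>
</html>
"""

# Hypothetical XPath expression selecting only the entry links.
ENTRY_XPATH = ".//li[@class='entry']/a"


def extract_entries(page: str, entry_xpath: str) -> list[dict]:
    """Return one dict per element matched by entry_xpath."""
    root = ET.fromstring(page.strip())
    entries = []
    for link in root.findall(entry_xpath):
        entries.append(
            {
                "name": (link.text or "").strip(),
                "link": link.get("href"),
            }
        )
    return entries


entries = extract_entries(SAMPLE_PAGE, ENTRY_XPATH)

# Serialize the filtered entries to JSON.
serialized = json.dumps(entries, indent=2)
print(serialized)
```

The navigation link is dropped because its parent `li` does not carry the `entry` class, so only the two entry links end up in the JSON output.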

To run this, create a python3 virtualenv, run pip install -r requirements.txt, and install these system packages:

apt install xvfb
apt install chromium
apt install chromium-webdriver