fdb-spider


  __     _ _                     _     _
 / _| __| | |__        ___ _ __ (_) __| | ___ _ __
| |_ / _` | '_ \ _____/ __| '_ \| |/ _` |/ _ \ '__|
|  _| (_| | |_) |_____\__ \ |_) | | (_| |  __/ |
|_|  \__,_|_.__/      |___/ .__/|_|\__,_|\___|_|
                          |_|

Configure fdb-spider in a YAML file. Spider multi-page databases of links. Filter the content and serialize it to JSON.

Filter either with XPath syntax or with the help of artificial neural networks (work in progress).
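
To illustrate what filtering by XPath and serializing to JSON can look like, here is a minimal sketch. It is not fdb-spider's own code: the sample page, the function name filter_links, and the XPath expression are all hypothetical, and it uses the limited XPath subset of Python's standard library, which only handles well-formed markup. Real pages usually need a full HTML parser.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample page (assumption, not taken from fdb-spider).
PAGE = """
<html><body>
  <div class="entry"><a href="/entry/1">First call</a></div>
  <div class="entry"><a href="/entry/2">Second call</a></div>
  <div class="nav"><a href="/page/2">next</a></div>
</body></html>
""".strip()


def filter_links(page, xpath):
    """Return text and href of every element matched by the XPath expression."""
    root = ET.fromstring(page)
    return [{"text": a.text, "href": a.get("href")} for a in root.findall(xpath)]


# Keep only the links inside entry blocks; the navigation link is filtered out.
entries = filter_links(PAGE, ".//div[@class='entry']/a")
print(json.dumps(entries, indent=2))
```

The XPath expression selects only anchors inside elements with class "entry", so the pagination link is dropped before serialization.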

To run this, create a python3 virtualenv, run pip install -r requirements.txt, and

apt install xvfb
apt install chromium
apt install chromium-webdriver