

  __     _ _                     _     _
 / _| __| | |__        ___ _ __ (_) __| | ___ _ __
| |_ / _` | '_ \ _____/ __| '_ \| |/ _` |/ _ | '__|
|  _| (_| | |_) |_____\__ | |_) | | (_| |  __| |
|_|  \__,_|_.__/      |___| .__/|_|\__,_|\___|_|
                          |_|

Configure fdb-spider in a YAML file. Spider multi-page databases of links. Filter and serialize the content to JSON.

Filter either with XPath syntax or with the help of artificial neural networks (work in progress).
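
To give a rough idea of what XPath filtering and JSON serialization look like, here is a minimal sketch. It is not fdb-spider's own code or config format: the page snippet, the `extract_links` helper and the field names `name` and `link` are made up for illustration, and it uses the limited XPath subset of Python's standard library `xml.etree.ElementTree`, which only works on well-formed markup.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical, well-formed listing page. A real spider would first
# download the page and would need a parser that tolerates broken HTML.
PAGE = """
<html><body>
  <ul class="results">
    <li><a href="/entry/1">First entry</a></li>
    <li><a href="/entry/2">Second entry</a></li>
  </ul>
  <ul class="nav">
    <li><a href="/about">About</a></li>
  </ul>
</body></html>
"""


def extract_links(page, link_xpath):
    """Return one dict per element matched by link_xpath (hypothetical helper)."""
    root = ET.fromstring(page.strip())
    return [
        {"name": (a.text or "").strip(), "link": a.get("href")}
        for a in root.findall(link_xpath)
    ]


# Only links inside the result list match; the navigation list is filtered out.
entries = extract_links(PAGE, ".//ul[@class='results']/li/a")
print(json.dumps(entries, indent=2))
```

The XPath expression is the only part that would change from one database to another, which is why it makes sense to keep such expressions in a config file rather than in the code.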

To run this, create a python3 virtualenv, run pip install -r requirements.txt, and install the following system packages:

apt install xvfb
apt install chromium
apt install chromium-webdriver