fdb-spider
Configure fdb-spider in a YAML file. Spider multi-page databases of links. Filter and serialize the content to JSON.
Filter either with XPath syntax or with the help of artificial neural networks (work in progress).
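To illustrate the XPath filtering and JSON serialization described above, here is a minimal sketch. It is not taken from fdb-spider's code: the sample page, the `extract_links` helper and the XPath expression are hypothetical, and it uses the limited XPath subset of Python's standard-library `xml.etree.ElementTree`, which only accepts well-formed markup.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample page (an assumption for illustration).
# Real-world HTML is often not well-formed and would need an HTML parser.
SAMPLE_PAGE = """
<html>
  <body>
    <ul class="results">
      <li><a href="/entry/1">First entry</a></li>
      <li><a href="/entry/2">Second entry</a></li>
    </ul>
    <a href="/imprint">Imprint</a>
  </body>
</html>
"""


def extract_links(page, xpath):
    """Return a {"text", "href"} dict for every element matching xpath."""
    root = ET.fromstring(page)
    return [
        {"text": (node.text or "").strip(), "href": node.get("href")}
        for node in root.findall(xpath)
    ]


# Select only the links inside the result list, not the imprint link.
links = extract_links(SAMPLE_PAGE, ".//ul[@class='results']/li/a")

# Serialize the filtered content to JSON.
serialized = json.dumps(links, indent=2)
print(serialized)
```

The XPath expression decides which links are kept, so changing the filter does not require touching the serialization step.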
To run this, create a python3 virtualenv, run pip install -r requirements.txt, and install the system dependencies:
apt install xvfb
apt install chromium
apt install chromium-webdriver
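
After installing, a quick check can show whether the system dependencies are reachable from Python. This is a sketch only: the executable names `Xvfb`, `chromium` and `chromedriver` are assumptions about what the packages above provide and may differ between distributions, and `find_binaries` is a hypothetical helper, not part of fdb-spider.

```python
import shutil

# Assumed executable names; they can differ between distributions.
REQUIRED_BINARIES = ("Xvfb", "chromium", "chromedriver")


def find_binaries(names=REQUIRED_BINARIES):
    """Map each executable name to its location on PATH, or None if missing."""
    return {name: shutil.which(name) for name in names}


# Print one line per dependency so that missing ones are easy to spot.
for name, path in find_binaries().items():
    print(f"{name}: {path or 'not found'}")
```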