
update var javascriptlink in README.md

master
alpcentaur 6 months ago
parent
commit
0d7728240e
1 changed file with 3 additions and 1 deletion

README.md (+3 -1)

@@ -275,7 +275,9 @@ In future it will be possible to define any variables anywhere and get that fed
#### var javascript link
-In case the whole website to spider is JavaScript-generated gibberish, you need to download the HTML pages behind the links while downloading the entry list HTML pages with JavaScript. Then define the JavaScript link that needs to be clicked in XPath syntax. The click either opens a pop-up whose source code is processed, loads an actual page whose source code is parsed, or fetches a PDF that gets downloaded and parsed to text.
+In case the whole website to spider is JavaScript-generated gibberish, there is another possibility. To find out whether the site is generated gibberish that does not contain your payload, search the output pages for the child name and similar values. If you do not find them, or if the HTML pages visibly contain no real XML, try downloading your pages with jsdomain and go back to the paragraphs above. For the actual link child, the spider can click JavaScript-style and download the HTML pages behind the links while it downloads the entry list HTML pages with JavaScript.
+For that to happen, define the JavaScript link that needs to be clicked in XPath syntax. Whether the click opens a pop-up whose source code needs to be processed, loads an actual page whose source code is parsed, or points to a PDF, the spider handles all three cases and writes the resulting text as JSON under spiders/output.
#### var slow downloading
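
The payload check described in the `var javascript link` text of this diff (searching the downloaded pages for the child name) could be sketched roughly as follows. This is only an illustration and not code from the repository: the function names `missing_payload_terms` and `needs_javascript_download` and both sample pages are hypothetical.

```python
def missing_payload_terms(html: str, terms: list[str]) -> list[str]:
    """Return the expected terms that do not occur in the HTML (case-insensitive)."""
    haystack = html.lower()
    return [term for term in terms if term.lower() not in haystack]


def needs_javascript_download(html: str, terms: list[str]) -> bool:
    """Heuristic: if none of the expected terms is found, the payload is
    probably rendered by JavaScript and a plain download is not enough."""
    if not terms:
        return False  # nothing to look for, so no conclusion can be drawn
    return len(missing_payload_terms(html, terms)) == len(terms)


# Hypothetical sample pages, made up for this sketch.
static_page = "<ul><li><a href='/call/1'>Funding call 2024</a></li></ul>"
js_shell = "<div id='app'></div><script src='bundle.js'></script>"

print(needs_javascript_download(static_page, ["Funding call 2024"]))  # False
print(needs_javascript_download(js_shell, ["Funding call 2024"]))     # True
```

A plain substring search is used on purpose: it mirrors the manual check described in the README text and makes no assumptions about how the spider itself parses pages.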
