Part 2 of Step by Step Guide
parent e4fa13d29d
commit 5fd6b7f781
1 changed files with 36 additions and 2 deletions
README.md
@@ -147,8 +147,42 @@ for n in iteration var list:
So if you are on the no-JavaScript side of reality, you are lucky. That's all that is needed to get the collection of links.

We can just come to

An example to make this clearer:
Let's say we go to greenjobs.de.
We run a search without a search query to get the largest possible output, in the best case a table of everything the site has listed.

https://www.greenjobs.de/angebote/index.html?s=&loc=&countrycode=de&dist=10&lng=&lat=
is the resulting URL.

So now we navigate through the pages.
In this case everything is displayed and scrollable on exactly this URL, which means we leave link2 and iteration var list empty and put the resulting URL into link1.

Another example:
This time we go to giz. There we have https://ausschreibungen.giz.de/Satellite/company/welcome.do as our URL for a general search. If we click the "next page" button of the displayed table, a new URL pattern appears on the next page:

https://ausschreibungen.giz.de/Satellite/company/welcome.do?method=showTable&fromSearch=1&tableSortPROJECT_RESULT=2&tableSortAttributePROJECT_RESULT=publicationDate&selectedTablePagePROJECT_RESULT=2

Going to the next page again, we get the URL:

https://ausschreibungen.giz.de/Satellite/company/welcome.do?method=showTable&fromSearch=1&tableSortPROJECT_RESULT=2&tableSortAttributePROJECT_RESULT=publicationDate&selectedTablePagePROJECT_RESULT=3

So now we can already see the pattern that no machine-generated output can hide: only the number at the end of the URL changes from page to page.
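
The pattern in the two giz URLs can also be found mechanically. The following is only a rough sketch, not part of the tool described here: `split_on_difference` is a hypothetical helper name, and it assumes that the two URLs differ in exactly one place. It returns the shared start (a candidate for link1), the two differing parts, and the shared end (a candidate for link2).

```python
import os


def split_on_difference(url_a, url_b):
    """Split two page URLs into (prefix, part_a, part_b, suffix)."""
    # Longest shared start of both URLs (compared character by character).
    prefix = os.path.commonprefix([url_a, url_b])
    rest_a, rest_b = url_a[len(prefix):], url_b[len(prefix):]
    # Longest shared end of what is left, found by reversing the remainders.
    suffix = os.path.commonprefix([rest_a[::-1], rest_b[::-1]])[::-1]
    cut = len(suffix)
    part_a = rest_a[:len(rest_a) - cut]
    part_b = rest_b[:len(rest_b) - cut]
    return prefix, part_a, part_b, suffix


# The two URLs from the giz example above (page 2 and page 3).
base = (
    "https://ausschreibungen.giz.de/Satellite/company/welcome.do"
    "?method=showTable&fromSearch=1&tableSortPROJECT_RESULT=2"
    "&tableSortAttributePROJECT_RESULT=publicationDate"
    "&selectedTablePagePROJECT_RESULT="
)
page2 = base + "2"
page3 = base + "3"

link1, a, b, link2 = split_on_difference(page2, page3)
print("link1:", link1)
print("changing part:", a, "->", b)
print("link2:", repr(link2))
```

It is best to compare two pages whose numbers share no digits, such as 2 and 3. With pages 12 and 13 the shared "1" would end up in the prefix.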

If we set that number to 1 (...RESULT=1) and put the URL into the address bar of the browser

https://ausschreibungen.giz.de/Satellite/company/welcome.do?method=showTable&fromSearch=1&tableSortPROJECT_RESULT=2&tableSortAttributePROJECT_RESULT=publicationDate&selectedTablePagePROJECT_RESULT=1

we get to the first page.
This leads to the following variables, assuming there are 6 pages:

* link1 = "https://ausschreibungen.giz.de/Satellite/company/welcome.do?method=showTable&fromSearch=1&tableSortPROJECT_RESULT=2&tableSortAttributePROJECT_RESULT=publicationDate&selectedTablePagePROJECT_RESULT="
* link2 = ""
* iteration-var-list = "[1,2,3,4,5,6]"
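
To see what these three variables amount to, here is a minimal sketch of how they could be expanded into page URLs. How the actual tool consumes them is not shown in this section, so `build_page_urls` is a hypothetical name, and treating an empty list as "use link1 + link2 once" is an assumption based on the greenjobs example.

```python
import json


def build_page_urls(link1, link2, iteration_var_list):
    """Return one URL per list entry, or a single URL if the list is empty."""
    # The list is written as a string such as "[1,2,3,4,5,6]";
    # an empty string is treated as "no paging" (assumption).
    values = json.loads(iteration_var_list) if iteration_var_list.strip() else []
    if not values:
        return [link1 + link2]
    # Each page URL is link1 + iteration value + link2.
    return [f"{link1}{n}{link2}" for n in values]


link1 = (
    "https://ausschreibungen.giz.de/Satellite/company/welcome.do"
    "?method=showTable&fromSearch=1&tableSortPROJECT_RESULT=2"
    "&tableSortAttributePROJECT_RESULT=publicationDate"
    "&selectedTablePagePROJECT_RESULT="
)
link2 = ""
iteration_var_list = "[1,2,3,4,5,6]"

urls = build_page_urls(link1, link2, iteration_var_list)
for url in urls:
    print(url)
```

For the greenjobs case, calling the same function with the search URL as link1 and the other two values left empty would give back just that one URL.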

Having done the configuration, we can move on to

#### var parent
Oi
The parent means