From 5fd6b7f781eb2ff8d7268ea21801fb2f3156b29a Mon Sep 17 00:00:00 2001
From: alpcentaur
Date: Wed, 28 Feb 2024 17:34:57 +0100
Subject: [PATCH] Part 2 of Step by Step Guide

---
 README.md | 38 ++++++++++++++++++++++++++++++++++++--
 1 file changed, 36 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index 970103e..c4c19fb 100644
--- a/README.md
+++ b/README.md
@@ -147,8 +147,42 @@ for n in iteration var list:
 
 So if you are on the no javascript side of reality, you are lucky.
 Thats all needed to get the collection of links.
-We can just come to
+
+An example to make this clearer:
+Let's say we go to greenjobs.de.
+We run a search with an empty query to get the largest possible output, in the best case a table of everything the site has listed.
+
+https://www.greenjobs.de/angebote/index.html?s=&loc=&countrycode=de&dist=10&lng=&lat=
+is the resulting URL.
+
+So now we navigate through the pages.
+In this case everything is displayed and scrollable on exactly this URL, which means we leave link2 and iteration var list empty and put the resulting URL into link1.
+
+Another example:
+This time we go to giz. There we have https://ausschreibungen.giz.de/Satellite/company/welcome.do as our URL for a general search. If we click the "nextpage" button of the displayed table, a new URL pattern appears on the next page:
+
+https://ausschreibungen.giz.de/Satellite/company/welcome.do?method=showTable&fromSearch=1&tableSortPROJECT_RESULT=2&tableSortAttributePROJECT_RESULT=publicationDate&selectedTablePagePROJECT_RESULT=2
+
+Going to the next page again, we get the URL:
+
+https://ausschreibungen.giz.de/Satellite/company/welcome.do?method=showTable&fromSearch=1&tableSortPROJECT_RESULT=2&tableSortAttributePROJECT_RESULT=publicationDate&selectedTablePagePROJECT_RESULT=3
+
+So now we can already see the pattern, which no machine-generated output can hide.
+
+RESULT=1 ....
+we put it in the URL bar of the browser
+
+https://ausschreibungen.giz.de/Satellite/company/welcome.do?method=showTable&fromSearch=1&tableSortPROJECT_RESULT=2&tableSortAttributePROJECT_RESULT=publicationDate&selectedTablePagePROJECT_RESULT=1
+
+and get to the first page.
+This leads to the following variables, assuming there are 6 pages:
+
+* link1 = "https://ausschreibungen.giz.de/Satellite/company/welcome.do?method=showTable&fromSearch=1&tableSortPROJECT_RESULT=2&tableSortAttributePROJECT_RESULT=publicationDate&selectedTablePagePROJECT_RESULT="
+* link2 = ""
+* iteration-var-list = "[1,2,3,4,5,6]"
+
+
+With the configuration done, we can move on to
 
 #### var parent
 
-Oi
\ No newline at end of file
+The parent means
\ No newline at end of file
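
As an illustration of the configuration described in this patch, here is a minimal Python sketch of how link1, link2 and iteration-var-list could expand into the list of page URLs. It assumes the URLs are built as link1 + n + link2 for each n, which the `for n in iteration var list:` hunk context suggests but this patch does not confirm. The helper name `build_page_urls` is hypothetical and not taken from the repository.

```python
import ast


def build_page_urls(link1, link2="", iteration_var_list=""):
    """Hypothetical helper: expand the three config values into page URLs.

    Assumption: each page URL is link1 + n + link2, with n taken from
    iteration_var_list, which is stored as a string such as "[1,2,3]".
    """
    # An empty list means the site shows everything on one page,
    # so link1 alone is the complete URL (the greenjobs.de case).
    if not iteration_var_list:
        return [link1]
    # Parse the string form of the list without executing arbitrary code.
    values = ast.literal_eval(iteration_var_list)
    return [f"{link1}{n}{link2}" for n in values]


# The giz values from the README text, assuming 6 result pages.
GIZ_LINK1 = (
    "https://ausschreibungen.giz.de/Satellite/company/welcome.do"
    "?method=showTable&fromSearch=1&tableSortPROJECT_RESULT=2"
    "&tableSortAttributePROJECT_RESULT=publicationDate"
    "&selectedTablePagePROJECT_RESULT="
)
GIZ_URLS = build_page_urls(GIZ_LINK1, "", "[1,2,3,4,5,6]")
```

With these values `GIZ_URLS` holds six URLs that differ only in the trailing page number, and the greenjobs.de case (empty link2 and empty list) yields link1 unchanged.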