From c52ea0cf0a290d5090cd6cf4154ef80e21550dac Mon Sep 17 00:00:00 2001
From: alpcentaur
Date: Mon, 4 Mar 2024 16:46:57 +0100
Subject: [PATCH] added example1 for js configuration in README.md

---
 README.md | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/README.md b/README.md
index 289fe3c..f00e29a 100644
--- a/README.md
+++ b/README.md
@@ -24,6 +24,7 @@
     - [javascript](#javascript)
     - [var jsdomain](#var-jsdomain)
     - [vars jslink and jsiteration](#vars-jslink-and-jsiteration)
+    - [example1 jslink and jsiteration](#example1-jslink-and-jsiteration)
     - [var parent](#var-parent)
     - [example1 parent](#example1-parent)
     - [vars children](#vars-children)
@@ -221,7 +222,35 @@ You can run the spider with display=1 instead of display=0 in the python line of

 With running the spider while watching the automated mouse moves and clicks, you will be able to find the right xpath for every step and element.

+#### example1 jslink and jsiteration
+So let us consider evergabe-online as an example.
+
+```
+evergabe-online:
+  domain: 'https://www.evergabe-online.de/'
+  entry-list:
+    link1: 'https://www.evergabe-online.de/search.html?101-1.-searchPanel>
+    link2: '-pageLink'
+    jsdomain: 'https://www.evergabe-online.de/search.html'
+    jslink1: '/html/body/div[8]/main/div[4]/div/div/div[2]/table/thead/tr[1]/td/div[2]/div/span['
+    jslink2: ']'
+    jsiteration-var-list: "[1, 2, 3, 4, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6]"
+    iteration-var-list: "[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]"
+```
+
+Go to the jsdomain https://www.evergabe-online.de/search.html.
+You will see the table we want to spider.
+Open the inspector and look at the button that leads to the next page.
+Its xpath is '/html/body/div[8]/main/div[4]/div/div/div[2]/table/thead/tr[1]/td/div[2]/div/span[1]'
+
+Now click on it. On page two, the button that leads to page three has the xpath:
+'/html/body/div[8]/main/div[4]/div/div/div[2]/table/thead/tr[1]/td/div[2]/div/span[2]'
+
+From page 5 on, the xpath of the button that leads to the next page stays
+'/html/body/div[8]/main/div[4]/div/div/div[2]/table/thead/tr[1]/td/div[2]/div/span[6]'
+
+until the end.


 #### var parent
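
Note on the example in this patch: the way jslink1, jsiteration-var-list and jslink2 fit together can be sketched in a few lines of Python. This is only an illustration, assuming the spider builds the xpath of the button to click on each page by concatenating jslink1, the current entry of jsiteration-var-list, and jslink2. The helper name `build_js_xpaths` is hypothetical and not taken from the spider's code.

```python
import ast

# Values copied from the evergabe-online example in the patch.
JSLINK1 = "/html/body/div[8]/main/div[4]/div/div/div[2]/table/thead/tr[1]/td/div[2]/div/span["
JSLINK2 = "]"
# The config stores the list as a quoted string, so it has to be parsed first.
JSITERATION_VAR_LIST = "[1, 2, 3, 4, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6]"


def build_js_xpaths(jslink1, jslink2, jsiteration_var_list):
    """Hypothetical helper: return one button xpath per list entry."""
    # literal_eval only accepts Python literals, so it is safe on config text.
    indices = ast.literal_eval(jsiteration_var_list)
    return [f"{jslink1}{index}{jslink2}" for index in indices]


xpaths = build_js_xpaths(JSLINK1, JSLINK2, JSITERATION_VAR_LIST)
for page, xpath in enumerate(xpaths, start=1):
    # Show only the tail of each xpath to keep the output readable.
    print(page, xpath[-7:])
```

Under this assumption, the first entries click span[1], span[2] and so on, and every later entry keeps clicking span[6], which matches the repeated 6 in jsiteration-var-list.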