@ -24,6 +24,7 @@ |

- [javascript](#javascript)
- [var jsdomain](#var-jsdomain)
- [vars jslink and jsiteration](#vars-jslink-and-jsiteration)
- [example1 jslink and jsiteration](#example1-jslink-and-jsiteration)
- [var parent](#var-parent)
- [example1 parent](#example1-parent)
- [vars children](#vars-children)

@ -221,7 +222,35 @@ You can run the spider with display=1 instead of display=0 in the python line of |

By running the spider and watching the automated mouse movements and clicks, you can find the right xpath for every step and element.

#### example1 jslink and jsiteration

Let us consider evergabe-online as an example.

```
evergabe-online:
  domain: 'https://www.evergabe-online.de/'
  entry-list:
    link1: 'https://www.evergabe-online.de/search.html?101-1.-searchPanel>
    link2: '-pageLink'
    jsdomain: 'https://www.evergabe-online.de/search.html'
    jslink1: '/html/body/div[8]/main/div[4]/div/div/div[2]/table/thead/tr[1]/td/div[2]/div/span['
    jslink2: ']'
    jsiteration-var-list: "[1, 2, 3, 4, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6]"
    iteration-var-list: "[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]"
```

Go to the jsdomain, https://www.evergabe-online.de/search.html.
You will see the table we want to spider.
Open the browser inspector and look at the button that leads to the next page.
Its xpath is '/html/body/div[8]/main/div[4]/div/div/div[2]/table/thead/tr[1]/td/div[2]/div/span[1]'

Now click on it. On page two, the button that leads to page three has the xpath:
'/html/body/div[8]/main/div[4]/div/div/div[2]/table/thead/tr[1]/td/div[2]/div/span[2]'

From page 5 on, the xpath of the button that leads to the next page stays
'/html/body/div[8]/main/div[4]/div/div/div[2]/table/thead/tr[1]/td/div[2]/div/span[6]'

until the end.
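
The xpaths listed above all follow one pattern: jslink1, then one value from jsiteration-var-list, then jslink2. The following Python sketch illustrates that pattern with the values from the evergabe-online config. It assumes the parts are simply concatenated, which matches the xpaths shown here but is not taken from the spider's source; the helper name `build_click_xpaths` is hypothetical.

```python
import ast

# Values copied from the evergabe-online config in this example.
jslink1 = "/html/body/div[8]/main/div[4]/div/div/div[2]/table/thead/tr[1]/td/div[2]/div/span["
jslink2 = "]"
jsiteration_var_list = "[1, 2, 3, 4, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6]"


def build_click_xpaths(prefix, suffix, iteration_vars):
    """Hypothetical helper: build one xpath per click as prefix + value + suffix.

    Assumption: the list is stored as a string in the config, so it is
    parsed with ast.literal_eval before use.
    """
    values = ast.literal_eval(iteration_vars)
    return [f"{prefix}{value}{suffix}" for value in values]


xpaths = build_click_xpaths(jslink1, jslink2, jsiteration_var_list)
for click, xpath in enumerate(xpaths, start=1):
    print(f"click {click}: {xpath}")
```

The repeated 6 at the end of the list is what keeps the xpath fixed at span[6] for all later clicks.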

#### var parent
