
added explanation of display variable in the spiders code

master
alpcentaur 8 months ago
parent
commit
0fa420d74c
1 changed files with 15 additions and 1 deletions
README.md

@ -218,7 +218,7 @@ Sometimes the xpath changes, after the new js content got loaded. That is where
This means we emulate a whole virtual "user" using a virtual "browser" on his or her virtual "screen". In the end the clickable elements are defined by xpath too, so put these in the jslink and jsiteration variables accordingly.
You can run the spider with visible=1 instead of visible=0 in the python line that creates the virtual display the chromium driver is running on. I will put that in the initialization of the spider. How to do this is described in the paragraph display; it applies whenever you set any of the js related variables to something other than "NONE" (important: not "None"). It is very useful for debugging js related configs.
By running the spider while watching the automated mouse moves and clicks, you will be able to find the right xpath for every step and element.
@ -252,6 +252,20 @@ From page 5 on, the button to get to the next pages stays
until the end.
#### display
When you run the spider with js spidering enabled, a display gets created in fdb_spider.py. If you open that file in nano and press ctrl+w, you can type display and press enter. This brings you to the lines of code that generate the display.
Watch out for the line
```
display = Display(visible=0, size=(1200, 800))
```
If you change visible=0 to visible=1 here, the spider will actually open a viewable browser on your workspace.
This line appears twice in the code: once for downloading the pages with the links, and once for downloading the pages behind the links.
After finding and jumping to "display" with ctrl+w, go down some lines and issue ctrl+w again. Or look up in the nano manual how to jump to the next occurrence (in recent nano versions, alt+w repeats the last search); I knew it once.
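Instead of editing both lines by hand each time, the flag could be read from one place. The following is only a sketch of that idea, not what fdb_spider.py currently does: the environment variable `SPIDER_VISIBLE` and the helpers `display_kwargs` and `start_display` are made-up names for illustration. Only `Display(visible=..., size=...)` is taken from the line shown above.

```python
import os


def display_kwargs(env=os.environ):
    """Build the arguments for Display from a (hypothetical) env variable.

    SPIDER_VISIBLE=1 asks for a viewable browser; anything else keeps
    the display hidden, which matches the visible=0 default in the code.
    """
    visible = 1 if env.get("SPIDER_VISIBLE") == "1" else 0
    return {"visible": visible, "size": (1200, 800)}


def start_display():
    """Create and start the display; both call sites could share this."""
    # Imported here so the helper above works without pyvirtualdisplay.
    from pyvirtualdisplay import Display

    display = Display(**display_kwargs())
    display.start()
    return display
```

With something like this, running the spider with `SPIDER_VISIBLE=1` set in the shell would switch both displays at once, and leaving it unset keeps the headless behaviour.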
#### var parent
The parent is the last xml element which contains the entry links. Point the Inspector at the entry, or at one of the links, click on it, and click in the code view afterwards. Now use the arrow up key to move up to the last child before you reach the parent. You can see the element currently selected highlighted in blue on the rendered html.
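To make the idea of the parent more concrete, here is a small sketch that does the same walk in code. It reads "parent" as the innermost element that still contains all entry links, which is an interpretation of the description above. The markup is an invented example page, and the spider itself may locate the parent differently; only Python's standard library is used.

```python
import xml.etree.ElementTree as ET

# Invented listing page, reduced to well-formed markup for this sketch.
PAGE = """
<html><body>
  <div id="results">
    <ul class="entries">
      <li><a href="/entry/1">First</a></li>
      <li><a href="/entry/2">Second</a></li>
    </ul>
  </div>
</body></html>
"""

root = ET.fromstring(PAGE)
# ElementTree keeps no parent pointer, so build a child -> parent map.
parent_of = {child: parent for parent in root.iter() for child in parent}


def ancestors(element):
    """Return the ancestors of element, nearest first, up to the root."""
    chain = []
    while element in parent_of:
        element = parent_of[element]
        chain.append(element)
    return chain


links = root.findall(".//a")
# Walk up from the first link and stop at the first ancestor that
# contains every entry link: that is the "parent" in the sense above.
parent = next(
    node
    for node in ancestors(links[0])
    if all(link in list(node.iter()) for link in links)
)
print(parent.tag, parent.attrib)  # ul {'class': 'entries'}
```

Here the `li` elements are the last children before the parent, and the `ul` is the element whose xpath would go into the parent variable.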
