Start of Step by Step Guide

Oi

Commit e4fa13d29d (parent 7ba196b0c2): 1 changed file (README.md) with 41 additions and 0 deletions.

3. [Usage](#usage)
    * [Configuration File Syntax](#configuration-file-syntax)
    * [Efficient Xpath Copying](#efficient-xpath-copying)
    * [Step By Step Guide](#step-by-step-guide)

# Introduction

…slashes. That will make the spider more stable in case the website's HTML/XML gets changed for maintenance or other reasons.

## Step By Step Guide

Start with an old configuration that is similar to what you need.

There are three types of configurations:

1. Purely path based. An example is greenjobs.de.
2. A mixture of paths and JavaScript functions. An example of this type is giz.
3. Purely JavaScript based. An example is ted.europe.eu.

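To make "path based" more concrete, here is a minimal Python sketch. It assumes the paths are XPath-style expressions (as the Efficient Xpath Copying section suggests) and uses the standard library's ElementTree, which supports only a subset of XPath and only parses well-formed markup. The page snippet and the `extract_links` helper are hypothetical; real pages would normally need a proper HTML parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed snippet of an entry-list page.
PAGE = """<html><body>
  <div class="results">
    <a class="job" href="/jobs/1">Job 1</a>
    <a class="job" href="/jobs/2">Job 2</a>
    <a class="nav" href="/about">About</a>
  </div>
</body></html>"""


def extract_links(page: str, path: str) -> list[str]:
    """Return the href attribute of every element matched by `path`.

    `path` uses ElementTree's limited XPath syntax. This is a
    hypothetical helper, not part of the spider.
    """
    root = ET.fromstring(page)
    return [el.get("href") for el in root.findall(path)]


if __name__ == "__main__":
    # Only the two job links match; the navigation link is skipped.
    print(extract_links(PAGE, ".//a[@class='job']"))  # ['/jobs/1', '/jobs/2']
```

In a type 1 configuration, the point is that a path alone is enough to locate the data, with no JavaScript involved.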
Type 1:

Start by collecting every variable, working from the top of the configuration to the bottom.

### var domain

The domain variable holds the root of the website. When links have to be glued together, they are glued onto this root.

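Gluing a link onto the root can be pictured with the standard library's `urljoin`. This is a minimal sketch, assuming that "glued" means resolving relative links against the domain; the spider may instead concatenate strings. The URLs and the `glue` helper are hypothetical.

```python
from urllib.parse import urljoin

# Hypothetical value for the domain variable.
DOMAIN = "https://www.example.com"


def glue(domain: str, href: str) -> str:
    """Resolve a scraped link against the site root.

    Relative links are attached to the domain; absolute links are
    returned unchanged.
    """
    return urljoin(domain, href)


if __name__ == "__main__":
    print(glue(DOMAIN, "/jobs/1"))  # https://www.example.com/jobs/1
    print(glue(DOMAIN, "https://other.example.org/jobs/2"))  # https://other.example.org/jobs/2
```

Unlike plain concatenation, `urljoin` neither doubles nor drops the slash between the root and the path, and it leaves links that already point to another host alone.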
### var entry-list

Next come all the variables for the entry-list pages.

#### var link1, link2 and iteration-var-list

In pseudocode, this is what happens with these three variables:

```
for n in iteration-var-list:
    get(link1 + n + link2)
```

So if the website works without JavaScript, you are lucky: these three variables are all it takes to collect the links.

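The pseudocode translates directly into Python. This is a minimal sketch, assuming the values are simply concatenated as strings; the example values and the `entry_list_urls` helper are hypothetical, and the spider would request each URL rather than print it.

```python
# Hypothetical values for the three configuration variables.
LINK1 = "https://www.example.com/jobs?page="
LINK2 = "&sort=date"
ITERATION_VAR_LIST = [1, 2, 3]


def entry_list_urls(link1: str, link2: str, iteration_var_list: list) -> list[str]:
    """Build one entry-list URL per value: link1 + n + link2."""
    # str(n) lets the list hold numbers as well as strings.
    return [link1 + str(n) + link2 for n in iteration_var_list]


if __name__ == "__main__":
    for url in entry_list_urls(LINK1, LINK2, ITERATION_VAR_LIST):
        # The spider would fetch each page here; this sketch only prints the URL.
        print(url)
```

With the values above, the first URL is `https://www.example.com/jobs?page=1&sort=date`.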
That means we can move straight on to the next variable.

#### var parent

Oi