Start of Step by Step Guide
Oi
parent 7ba196b0c2
commit e4fa13d29d
1 changed files with 41 additions and 0 deletions
README.md
@@ -15,6 +15,7 @@
3. [Usage](#usage)
    * [Configuration File Syntax](#configuration-file-syntax)
    * [Efficient Xpath Copying](#efficient-xpath-copying)
    * [Step By Step Guide](#step-by-step-guide)

# Introduction
@@ -111,3 +112,43 @@ slashes. That will make the spider more stable, in case the website's
HTML/XML gets changed for maintenance or other reasons.
## Step By Step Guide
Start with an existing configuration that is similar to what you need.

There are three types of configurations:

1. The first type is purely path based. An example is greenjobs.de.
2. The second type is a mixture of paths and JavaScript functions. An example is giz.
3. The third type is purely JavaScript based. An example is ted.europe.eu.

Type 1:

Start by collecting every variable, from top to bottom.
### var domain
domain is the variable for the root of the website.
If links have to be glued together, they are glued onto this root.
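As a rough illustration of this gluing, here is a minimal Python sketch using the standard library's `urllib.parse.urljoin`. The root URL, the example links and the helper name `glue` are made up for this example, and the spider's own joining logic may well differ.

```python
from urllib.parse import urljoin

# Hypothetical root, standing in for the `domain` variable of a configuration.
domain = "https://www.example.org/"


def glue(domain: str, href: str) -> str:
    """Join a link found on a page onto the site root.

    Relative links are resolved against `domain`;
    absolute links come back unchanged.
    """
    return urljoin(domain, href)


print(glue(domain, "/jobs/123"))
print(glue(domain, "https://other.example.com/x"))
```

The point of keeping the root in one variable is that every relative link collected later can be completed in the same way.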
### var entry-list
Next come all the variables for the entry list pages.
#### var link1, link2 and iteration-var-list
In pseudocode, what happens with these three variables is:
```
for n in iteration-var-list:
    get(link1 + n + link2)
```
If the site works without JavaScript, you are in luck: that is all that is needed to get the collection of links.
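To make the loop concrete, here is a small Python sketch of how the entry list URLs could be put together. The function name `entry_list_urls` and the example values are assumptions for illustration only, and the hyphenated configuration names are written with underscores because hyphens are not valid in Python identifiers.

```python
def entry_list_urls(link1: str, link2: str, iteration_var_list: list) -> list:
    """Build one URL per iteration value: link1 + n + link2."""
    return [f"{link1}{n}{link2}" for n in iteration_var_list]


# Hypothetical values for a paginated job listing.
urls = entry_list_urls(
    "https://www.example.org/jobs?page=",
    "&sort=date",
    [1, 2, 3],
)

for url in urls:
    # The spider would request each of these entry list pages in turn.
    print(url)
```

The iteration values do not have to be numbers; anything that can be turned into a string and placed between the two link parts would work in this sketch.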
We can move straight on to
#### var parent
Oi