
Start of Step by Step Guide

Oi
master
alpcentaur 8 months ago
parent
commit
e4fa13d29d
1 changed files with 41 additions and 0 deletions
README.md: +41, -0

@@ -15,6 +15,7 @@
3. [Usage](#usage)
* [Configuration File Syntax](#configuration-file-syntax)
* [Efficient Xpath Copying](#efficient-xpath-copying)
* [Step By Step Guide](#step-by-step-guide)
# Introduction
@@ -111,3 +112,43 @@ slashes. That will make the spider more stable, in case the websites
html/xml gets changed for maintenance or other reasons.
## Step By Step Guide
Start from an existing configuration that is similar to what you need.
There are three types of configurations:
* Type 1 is purely path based. An example is greenjobs.de.
* Type 2 is a mixture of paths and javascript functions. An example is giz.
* Type 3 is purely javascript based. An example is ted.europe.eu.

Type 1:
Start by collecting every variable, working from top to bottom.
### var domain
domain is the variable that holds the root of the website.
Whenever links have to be glued together into full URLs, they are glued onto this root.
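
To illustrate what gluing onto the root means, here is a minimal Python sketch. It is not the spider's own code: the helper `glue` and the example URLs are hypothetical, and it assumes the gluing behaves like a standard URL join, which this guide does not confirm.

```python
from urllib.parse import urljoin

# Hypothetical root, for illustration only; not taken from a real configuration.
domain = "https://www.example.org/"


def glue(domain: str, link: str) -> str:
    """Join a link found on a page onto the root of the website.

    Assumption: the join follows standard URL resolution rules.
    """
    return urljoin(domain, link)


# A relative link is resolved against the root.
print(glue(domain, "/jobs/123.html"))
# A link that is already absolute is returned unchanged.
print(glue(domain, "https://other.example.net/x"))
```

If the spider glues by plain string concatenation instead, a leading slash in the link or a trailing slash in domain would matter, so it is worth checking one glued link by hand.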
### var entry-list
Next come all the variables that concern the entry list pages.
#### var link1, link2 and iteration-var-list
In pseudo code, what happens with these three variables is:
```
for n in iteration-var-list:
    get(link1 + n + link2)
```
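
The same loop as a small runnable Python sketch that only builds the URLs and does not fetch them. The function name `build_entry_list_urls` and the example values are hypothetical and only illustrate how link1, link2 and iteration-var-list fit together.

```python
from typing import Iterable, List


def build_entry_list_urls(link1: str, link2: str, iteration_var_list: Iterable) -> List[str]:
    """Return link1 + n + link2 for every n in the iteration var list."""
    # str(n) lets the list hold page numbers as well as strings.
    return [link1 + str(n) + link2 for n in iteration_var_list]


# Hypothetical example: three result pages of a paginated entry list.
urls = build_entry_list_urls(
    "https://www.example.org/jobs?page=",
    "&sort=date",
    range(1, 4),
)
for url in urls:
    print(url)
```

Printing the generated URLs and opening one or two in a browser is a quick way to check the three variables before running the spider.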
So if the website works without javascript, you are lucky: that is all that is needed to get the collection of links.
We can move straight on to
#### var parent
Oi
