alpcentaur
|
2aa1134b48
|
updated gitignore
|
2024-03-05 14:56:03 +00:00 |
|
alpcentaur
|
a9c2346c04
|
first change to also click the English Accept button if it appears, for js spidering functionality
|
2024-03-05 14:50:34 +00:00 |
|
alpcentaur
|
0808e5a42d
|
main.py and config.yaml are left out from updates, only examples are provided. Change in Readme too
|
2024-03-05 14:42:30 +00:00 |
|
alpcentaur
|
4ec9f76080
|
added xorg-server-xephyr as dep to install
|
2024-03-05 14:52:02 +01:00 |
|
alpcentaur
|
10cdab6f60
|
updated README with new and working install order
|
2024-03-05 14:43:43 +01:00 |
|
alpcentaur
|
ccfe20044f
|
added another tip to README.md, header for display and another tip added too
|
2024-03-05 12:34:32 +01:00 |
|
alpcentaur
|
0fa420d74c
|
added explanation of display variable in the spiders code
|
2024-03-05 12:30:29 +01:00 |
|
alpcentaur
|
0d7728240e
|
update var javascriptlink in README.md
|
2024-03-04 17:13:33 +01:00 |
|
alpcentaur
|
c52ea0cf0a
|
added example1 for js configuration in README.md
|
2024-03-04 16:46:57 +01:00 |
|
alpcentaur
|
5000dca314
|
Update README.md with better explanation of how to js spider
|
2024-03-04 16:30:31 +01:00 |
|
alpcentaur
|
0908ccf6e5
|
clarifications for javascript link and js link plus js iteration
|
2024-03-03 18:24:07 +01:00 |
|
alpcentaur
|
ff0fe5193d
|
fixed the links for the clickable content summary
|
2024-03-03 17:57:48 +01:00 |
|
alpcentaur
|
49d5c2ffa9
|
third try ordering
|
2024-03-03 17:53:05 +01:00 |
|
alpcentaur
|
f489106ea0
|
second try ordering
|
2024-03-03 17:50:12 +01:00 |
|
alpcentaur
|
32fceffd01
|
searchable headers for step by step guide started
|
2024-03-03 17:48:52 +01:00 |
|
alpcentaur
|
eca77f9b63
|
Step by Step Guide continuation of describing the variables
|
2024-03-01 00:09:38 +01:00 |
|
alpcentaur
|
483eaec26e
|
changed domain for new configuration dtvp
|
2024-02-29 14:19:45 +01:00 |
|
alpcentaur
|
c33dbc37e6
|
Merge remote-tracking branch 'refs/remotes/origin/master'
Merging local changes to the code with changes to the README.md on gitea instance
|
2024-02-29 13:16:48 +00:00 |
|
alpcentaur
|
a07d2e93f6
|
changes for new database dtvp, new exceptions trying to click away cookie pop ups
|
2024-02-29 13:15:34 +00:00 |
|
alpcentaur
|
d284fef015
|
changes for new database dtvp, new exceptions trying to click away cookie pop ups
|
2024-02-29 13:15:01 +00:00 |
|
alpcentaur
|
5fd6b7f781
|
Part 2 of Step by Step Guide
|
2024-02-28 17:34:57 +01:00 |
|
alpcentaur
|
e4fa13d29d
|
Start of Step by Step Guide
Oi
|
2024-02-28 17:17:27 +01:00 |
|
alpcentaur
|
7ba196b0c2
|
changed size of virtual window, added some scrolling and shortened the time for js lazy loading enforced slow downloading
|
2024-02-11 17:08:33 +00:00 |
|
alpcentaur
|
a56569712e
|
another small change to config.yaml before pushing
|
2024-02-11 16:43:44 +00:00 |
|
alpcentaur
|
a0dd469f25
|
added new database ted.europe.eu, created new case of slow downloading, integrated scrolling into entrylistpagesdownload
|
2024-02-09 18:38:49 +00:00 |
|
alpcentaur
|
094f092291
|
deleted fdb entry that was a ghost for syntax reasons, but same syntax should be in other fdb anyway
|
2024-01-23 17:17:40 +01:00 |
|
alpcentaur
|
d7d157bf42
|
added further documentation to README.md
|
2024-01-21 14:07:38 +00:00 |
|
alpcentaur
|
0500f5853d
|
full working example from localhost
|
2024-01-15 21:08:23 +00:00 |
|
alpcentaur
|
0411d74936
|
deleted config.yaml.save
|
2024-01-15 19:12:04 +00:00 |
|
alpcentaur
|
cf3bb52684
|
corrected link glueing for pdf links for loop
|
2024-01-15 19:09:28 +00:00 |
|
alpcentaur
|
af8374f715
|
added another exception for unitrue var text not being found; previously, saving index 0 to the variable caused an error for the whole execution
|
2024-01-10 15:28:41 +00:00 |
|
alpcentaur
|
20db0028e1
|
added first changes to fix js related bug for giz db
|
2024-01-10 15:18:36 +01:00 |
|
alpcentaur
|
dec60f9bf5
|
added changed logic for link addition regarding entry links
|
2023-12-18 21:26:53 +00:00 |
|
alpcentaur
|
5d17f4e421
|
corrected error which arose in logic of wget backup get
|
2023-12-15 14:36:08 +01:00 |
|
alpcentaur
|
92c238a2ed
|
added instruction for downloading chromium driver for python selenium to README.md
|
2023-12-15 14:13:41 +01:00 |
|
alpcentaur
|
ece5cf1301
|
added better logic for getting the right link of entry
|
2023-12-15 13:34:23 +01:00 |
|
alpcentaur
|
0e58756600
|
added last resort exception for entry page downloading with wget, also implemented some further logic regarding getting the right links
|
2023-12-15 11:33:50 +00:00 |
|
alpcentaur
|
16199256e3
|
javascript on highest level done better
|
2023-12-14 23:37:10 +00:00 |
|
alpcentaur
|
5627c80177
|
merged onlinkgen with master, and added more universal chrome driver initialization to the beginning of the javascript entries gothrough function in download_entry_list_pages_of_funding_databases()
|
2023-12-14 12:38:14 +00:00 |
|
alpcentaur
|
14b8db7941
|
started adding javascript handling on highest spider level
|
2023-12-14 12:07:14 +00:00 |
|
alpcentaur
|
fbee5d6229
|
last commit in detached head
|
2023-12-13 16:20:27 +01:00 |
|
alpcentaur
|
953f85ee5b
|
added new lines to chromedriver, to make it work on other systems
|
2023-12-13 16:05:26 +01:00 |
|
alpcentaur
|
d2324d265a
|
added pdf child text downloading and parse to json exceptions/cases for javascript entry data and normal data
|
2023-12-06 13:46:54 +00:00 |
|
alpcentaur
|
885c210971
|
added selenium for pop up entry links
|
2023-12-05 22:19:00 +00:00 |
|
alpcentaur
|
ec180bed0a
|
added flow for selenium grabbing popup instead of links for entries
|
2023-12-05 22:16:07 +00:00 |
|
alpcentaur
|
b4fd385c5d
|
did some changes to main.py for using sys.argv
|
2023-12-05 18:23:57 +01:00 |
|
alpcentaur
|
99c74dcbad
|
updated requirements.txt
|
2023-12-05 16:59:13 +00:00 |
|
alpcentaur
|
54daad8dfa
|
started sys arguments for main.py, to be able to control spider from interface
|
2023-12-05 17:51:16 +01:00 |
|
alpcentaur
|
89dcca2031
|
added further handling for javascript links not being urls, made config for giz work
|
2023-11-28 15:27:39 +00:00 |
|
alpcentaur
|
a0075e429d
|
added further database in config.yaml, added new exception for downloading js generated html pages
|
2023-11-27 15:10:11 +00:00 |
|