60 Commits (49d5c2ffa939fcdc9a6c901b056f2834b460e324)
 

Author SHA1 Message Date
  alpcentaur 49d5c2ffa9 third try ordering 8 months ago
  alpcentaur f489106ea0 second try ordering 8 months ago
  alpcentaur 32fceffd01 searchable headers for step by step guide started 8 months ago
  alpcentaur eca77f9b63 Step by Step Guide continuation of describing the variables 8 months ago
  alpcentaur 483eaec26e changed domain for new configuration dtvp 8 months ago
  alpcentaur c33dbc37e6 Merge remote-tracking branch 'refs/remotes/origin/master' 8 months ago
  alpcentaur a07d2e93f6 changes for new database dtvp, new exceptions trying to click away cookie pop ups 8 months ago
  alpcentaur d284fef015 changes for new database dtvp, new exceptions trying to click away cookie pop ups 8 months ago
  alpcentaur 5fd6b7f781 Part 2 of Step by Step Guide 8 months ago
  alpcentaur e4fa13d29d Start of Step by Step Guide 8 months ago
  alpcentaur 7ba196b0c2 changed size of virtual window, added some scrolling and shortened the time for js lazy loading enforced slow downloading 9 months ago
  alpcentaur a56569712e another small change to config.yaml before pushing 9 months ago
  alpcentaur a0dd469f25 added new database ted.europe.eu, created new case of slow downloading, integrated scrolling into entrylistpagesdownload 9 months ago
  alpcentaur 094f092291 deleted fdb entry that was a ghost for syntax reasons, but same syntax should be in other fdb anyway 9 months ago
  alpcentaur d7d157bf42 added further documentation to README.md 9 months ago
  alpcentaur 0500f5853d full working example from localhost 9 months ago
  alpcentaur 0411d74936 deleted config.yaml.save 9 months ago
  alpcentaur cf3bb52684 corrected link glueing for pdf links for loop 9 months ago
  alpcentaur af8374f715 added other exception for unitrue var text not being found, before saving index 0 to variable produced error to whole execution 10 months ago
  alpcentaur 20db0028e1 added first changes to fix js related bug for giz db 10 months ago
  alpcentaur dec60f9bf5 added changed logic for link addition regarding entry links 10 months ago
  alpcentaur 5d17f4e421 corrected error which arose in logic of wget backup get 10 months ago
  alpcentaur 92c238a2ed added instruction for downloading chromium driver for python selenium to README.md 10 months ago
  alpcentaur ece5cf1301 added better logic for getting the right link of entry 10 months ago
  alpcentaur 0e58756600 added last resort exception for entry page downloading with wget, also implemented some further logic regarding getting the right links 10 months ago
  alpcentaur 16199256e3 javascript on highest level done better 11 months ago
  alpcentaur 5627c80177 merged onlinkgen with master, and added more universal chrome driver initialization to the beginning of the javascript entries gothrough function in download_entry_list_pages_of_funding_databases() 11 months ago
  alpcentaur 14b8db7941 started adding javascript handling on highest spider level 11 months ago
  alpcentaur fbee5d6229 last commit in detached head 11 months ago
  alpcentaur 953f85ee5b added new lines to chromedriver, to make it work on other systems 11 months ago
  alpcentaur d2324d265a added pdf child text downloading and parse to json exceptions/cases for javascript entry data and normal data 11 months ago
  alpcentaur 885c210971 added selenium for pop up entry links 11 months ago
  alpcentaur ec180bed0a added flow for selenium grabbing popup instead of links for entries 11 months ago
  alpcentaur b4fd385c5d did some changes to main.py for using sys.argv 11 months ago
  alpcentaur 99c74dcbad updated requirements.txt 11 months ago
  alpcentaur 54daad8dfa started sys arguments for main.py, to be able to control spider from interface 11 months ago
  alpcentaur 89dcca2031 added further handling for javascript links not being urls, made config for giz work 11 months ago
  alpcentaur a0075e429d added further database in config.yaml, added new exception for downloading js generated html pages 11 months ago
  alpcentaur df4a8289b8 added pdf parser if entry link is direct pdf 11 months ago
  alpcentaur 677e54c0c2 added trafilatura to requirements 11 months ago
  alpcentaur 9ceaa28a82 Merge remote-tracking branch 'refs/remotes/origin/master' 11 months ago
  alpcentaur d3335f203b added trafilatura exception 11 months ago
  alpcentaur 61f9ba67fb update README.md 11 months ago
  alpcentaur 69c517292b Update 'README.md' 11 months ago
  alpcentaur 14ece9bceb added functions for uniform and not uniform entry end points - non uniform endpoints are generally parsed as text from any paragraph xml element p 11 months ago
  alpcentaur b2cf4b67ce added first config parameters for search on not uniform entries 11 months ago
  alpcentaur 42841ee650 added some exceptions for bad encoding and get errors 1 year ago
  alpcentaur 317ef99720 changed code in entrylist data2dictionary to handle empty or missing xml elements 1 year ago
  alpcentaur ff23c22e3c added working bund.de-bekanntmachungen config with new example of xpath contains 1 year ago
  alpcentaur 06fa81e549 added function find config parameter and changed core spider 1 year ago