66 Commits (0fa420d74c181cccd0ed903727e7d089a2c67ece)
 

Author SHA1 Message Date
  alpcentaur 0fa420d74c added explanation of display variable in the spiders code 6 months ago
  alpcentaur 0d7728240e update var javascriptlink in README.md 6 months ago
  alpcentaur c52ea0cf0a added example1 for js configuration in README.md 6 months ago
  alpcentaur 5000dca314 Update README.md with better explanation how to js spider 6 months ago
  alpcentaur 0908ccf6e5 clarifications for javascript link and js link plus js iteration 6 months ago
  alpcentaur ff0fe5193d fixed the links for the clickable content summary 6 months ago
  alpcentaur 49d5c2ffa9 third try ordering 6 months ago
  alpcentaur f489106ea0 second try ordering 6 months ago
  alpcentaur 32fceffd01 searchable headers for step by step guide started 6 months ago
  alpcentaur eca77f9b63 Step by Step Guide continuation of describing the variables 6 months ago
  alpcentaur 483eaec26e changed domain for new configuration dtvp 6 months ago
  alpcentaur c33dbc37e6 Merge remote-tracking branch 'refs/remotes/origin/master' 6 months ago
  alpcentaur a07d2e93f6 changes for new database dtvp, new exceptions trying to click away cookie pop ups 6 months ago
  alpcentaur d284fef015 changes for new database dtvp, new exceptions trying to click away cookie pop ups 6 months ago
  alpcentaur 5fd6b7f781 Part 2 of Step by Step Guide 6 months ago
  alpcentaur e4fa13d29d Start of Step by Step Guide 6 months ago
  alpcentaur 7ba196b0c2 changed size of virtual window, added some scrolling and shortened the time for js lazy loading enforced slow downloading 7 months ago
  alpcentaur a56569712e another small change to config.yaml before pushing 7 months ago
  alpcentaur a0dd469f25 added new database ted.europe.eu, created new case of slow downloading, integrated scrolling into entrylistpagesdownload 7 months ago
  alpcentaur 094f092291 deleted fdb entry that was a ghost for syntax reasons, but same syntax should be in other fdb anyway 8 months ago
  alpcentaur d7d157bf42 added further documentation to README.md 8 months ago
  alpcentaur 0500f5853d full working example from localhost 8 months ago
  alpcentaur 0411d74936 deleted config.yaml.save 8 months ago
  alpcentaur cf3bb52684 corrected link glueing for pdf links for loop 8 months ago
  alpcentaur af8374f715 added other exception for unitrue var text not being found, before saving index 0 to variable produced error to whole execution 8 months ago
  alpcentaur 20db0028e1 added first changes to fix js related bug for giz db 8 months ago
  alpcentaur dec60f9bf5 added changed logic for link addition regarding entry links 9 months ago
  alpcentaur 5d17f4e421 corrected error which arose in logic of wget backup get 9 months ago
  alpcentaur 92c238a2ed added instruction for downloading chromium driver for python selenium to README.md 9 months ago
  alpcentaur ece5cf1301 added better logic for getting the right link of entry 9 months ago
  alpcentaur 0e58756600 added last resort exception for entry page downloading with wget, also implemented some further logic regarding getting the right links 9 months ago
  alpcentaur 16199256e3 javascript on highest level done better 9 months ago
  alpcentaur 5627c80177 merged onlinkgen with master, and added more universal chrome driver initialization to the beginning of the javascript entries gothrough function in download_entry_list_pages_of_funding_databases() 9 months ago
  alpcentaur 14b8db7941 started adding javascript handling on highest spider level 9 months ago
  alpcentaur fbee5d6229 last commit in detached head 9 months ago
  alpcentaur 953f85ee5b added new lines to chromedriver, to make it work on other systems 9 months ago
  alpcentaur d2324d265a added pdf child text downloading and parse to json exceptions/cases for javascript entry data and normal data 9 months ago
  alpcentaur 885c210971 added selenium for pop up entry links 9 months ago
  alpcentaur ec180bed0a added flow for selenium grabbing popup instead of links for entries 9 months ago
  alpcentaur b4fd385c5d did some changes to main.py for using sys.argv 9 months ago
  alpcentaur 99c74dcbad updated requirements.txt 9 months ago
  alpcentaur 54daad8dfa started sys arguments for main.py, to be able to control spider from interface 9 months ago
  alpcentaur 89dcca2031 added further handling for javascript links not being urls, made config for giz work 9 months ago
  alpcentaur a0075e429d added further database in config.yaml, added new exception for downloading js generated html pages 9 months ago
  alpcentaur df4a8289b8 added pdf parser if entry link is direct pdf 10 months ago
  alpcentaur 677e54c0c2 added trafilatura to requirements 10 months ago
  alpcentaur 9ceaa28a82 Merge remote-tracking branch 'refs/remotes/origin/master' 10 months ago
  alpcentaur d3335f203b added trafilatura exception 10 months ago
  alpcentaur 61f9ba67fb update README.md 10 months ago
  alpcentaur 69c517292b Update 'README.md' 10 months ago