20 Commits (10cdab6f604f39764087b35683c675d579188820)

Author SHA1 Message Date
  alpcentaur d284fef015 changes for new database dtvp, new exceptions trying to click away cookie pop ups 8 months ago
  alpcentaur a0dd469f25 added new database ted.europe.eu, created new case of slow downloading, integrated scrolling into entrylistpagesdownload 9 months ago
  alpcentaur 0500f5853d full working example from localhost 10 months ago
  alpcentaur cf3bb52684 corrected link gluing in the for loop for pdf links 10 months ago
  alpcentaur af8374f715 added another exception for unitrue var text not being found; previously, saving index 0 to the variable caused an error that stopped the whole execution 10 months ago
  alpcentaur 20db0028e1 added first changes to fix js related bug for giz db 10 months ago
  alpcentaur 0e58756600 added last resort exception for entry page downloading with wget, also implemented some further logic regarding getting the right links 11 months ago
  alpcentaur fbee5d6229 last commit in detached head 11 months ago
  alpcentaur d2324d265a added pdf child text downloading and parse to json exceptions/cases for javascript entry data and normal data 11 months ago
  alpcentaur ec180bed0a added flow for selenium grabbing popup instead of links for entries 11 months ago
  alpcentaur b4fd385c5d did some changes to main.py for using sys.argv 11 months ago
  alpcentaur 54daad8dfa started sys arguments for main.py, to be able to control spider from interface 11 months ago
  alpcentaur a0075e429d added further database in config.yaml, added new exception for downloading js generated html pages 11 months ago
  alpcentaur df4a8289b8 added pdf parser if entry link is direct pdf 11 months ago
  alpcentaur 14ece9bceb added functions for uniform and not uniform entry end points - non uniform endpoints are generally parsed as text from any paragraph xml element p 1 year ago
  alpcentaur 42841ee650 added some exceptions for bad encoding and get errors 1 year ago
  alpcentaur ff23c22e3c added working bund.de-bekanntmachungen config with new example of xpath contains 1 year ago
  alpcentaur c078ee4b1b first function works, actual xml parser still has problems with certain xml types 1 year ago
  alpcentaur 59838bb8e1 added main.py importing and using the spider functions 1 year ago