25 Commits (54daad8dfa62c561c91923de3c737362abf16b0c)
 

Author SHA1 Message Date
  alpcentaur 54daad8dfa started sys arguments for main.py, to be able to control spider from interface 1 year ago
  alpcentaur 89dcca2031 added further handling for javascript links not being urls, made config for giz work 1 year ago
  alpcentaur a0075e429d added further database in config.yaml, added new exception for downloading js generated html pages 1 year ago
  alpcentaur df4a8289b8 added pdf parser if entry link is direct pdf 1 year ago
  alpcentaur 677e54c0c2 added trafilatura to requirements 1 year ago
  alpcentaur 9ceaa28a82 Merge remote-tracking branch 'refs/remotes/origin/master' 1 year ago
  alpcentaur d3335f203b added trafilatura exception 1 year ago
  alpcentaur 61f9ba67fb update README.md 1 year ago
  alpcentaur 69c517292b Update 'README.md' 1 year ago
  alpcentaur 14ece9bceb added functions for uniform and non-uniform entry endpoints - non-uniform endpoints are generally parsed as text from any paragraph xml element p 1 year ago
  alpcentaur b2cf4b67ce added first config parameters for search on non-uniform entries 1 year ago
  alpcentaur 42841ee650 added some exceptions for bad encoding and get errors 1 year ago
  alpcentaur 317ef99720 changed code in entrylist data2dictionary to handle empty or missing xml elements 1 year ago
  alpcentaur ff23c22e3c added working bund.de-bekanntmachungen config with new example of xpath contains 1 year ago
  alpcentaur 06fa81e549 added function find config parameter and changed core spider 1 year ago
  alpcentaur a846ce04cc specifying the links, new exception clause if soupparser does not work 1 year ago
  alpcentaur a99881796a first function works, actual xml parser still has problems with certain xml types 1 year ago
  alpcentaur c078ee4b1b first function works, actual xml parser still has problems with certain xml types 1 year ago
  alpcentaur 8b20bc178f added multi pages configuration and code 1 year ago
  alpcentaur 7aa903883b update to config.yaml 1 year ago
  alpcentaur 59838bb8e1 added main.py importing and using the spider functions 1 year ago
  alpcentaur 5ac07d151a added first config.yaml template and started creating folder structure 1 year ago
  alpcentaur b3011efc73 small change of naming in error message added 1 year ago
  alpcentaur 687d40f156 first change of naming, first commit for the actual spider based on importPEP 1 year ago
  alpcentaur 8783251133 first commit 1 year ago