Automated pipeline for parsing profiles of politically exposed persons (PEP) into Wikidata

mexico
  member list of the Cámara de Diputados is here:
  https://web.diputados.gob.mx/inicio/tusDiputados/listadoDiputadosBuscador;nombre=;estado=;cabeceraMunicipal=;grupoParlamentario=;mail=
  the profile links are JavaScript events, so Selenium is needed
  the list of senators also needs Selenium (in Python) to click the JavaScript links
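
A minimal sketch of how JavaScript-driven links like these could be followed with Selenium in Python. The function name `collect_profile_urls`, the XPath, and the click-then-go-back approach are assumptions for illustration, not the pipeline's actual code; a real run would probably also need explicit waits.

```python
def collect_profile_urls(driver, list_url, link_xpath):
    """Click every element matching link_xpath and record where it leads.

    `driver` is expected to behave like a Selenium WebDriver,
    e.g. selenium.webdriver.Firefox().
    """
    driver.get(list_url)
    # "xpath" is the string value of selenium.webdriver.common.by.By.XPATH.
    count = len(driver.find_elements("xpath", link_xpath))
    urls = []
    for index in range(count):
        # Re-locate the links on every pass: element references
        # go stale once the browser has navigated away and back.
        links = driver.find_elements("xpath", link_xpath)
        if index >= len(links):
            break
        links[index].click()
        urls.append(driver.current_url)
        driver.back()
    return urls


def run_with_firefox(list_url, link_xpath):
    """Not called automatically: needs the selenium package and geckodriver."""
    from selenium import webdriver

    browser = webdriver.Firefox()
    try:
        return collect_profile_urls(browser, list_url, link_xpath)
    finally:
        browser.quit()
```

Calling `run_with_firefox(list_url, "//table//a")` would start a real browser; the XPath shown is a placeholder and has to be read off the actual page.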
el salvador
  should work with the same machinery
costa rica
  will work with the same machinery
guatemala
  currently not reachable? --> it is, via the Internet Archive
  this snapshot (jan 22) also works for the diputado links:
  https://web.archive.org/web/20221007084458/https://www.congreso.gob.gt/buscador_diputados#gsc.tab=0
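
Falling back to the Internet Archive can be sketched as plain URL handling. The helper names below are hypothetical; the only thing taken from the notes is the snapshot URL pattern (`https://web.archive.org/web/<timestamp>/<original url>`).

```python
WAYBACK_PREFIX = "https://web.archive.org/web"


def wayback_url(original_url, timestamp):
    """Build a snapshot URL; timestamp has the form YYYYMMDDhhmmss."""
    return f"{WAYBACK_PREFIX}/{timestamp}/{original_url}"


def split_wayback_url(archived_url):
    """Return (timestamp, original_url) for a snapshot URL."""
    prefix = WAYBACK_PREFIX + "/"
    if not archived_url.startswith(prefix):
        raise ValueError("not a Wayback Machine snapshot URL")
    timestamp, _, original_url = archived_url[len(prefix):].partition("/")
    return timestamp, original_url


# The snapshot referenced in the notes, rebuilt from its two parts.
GUATEMALA_SNAPSHOT = wayback_url(
    "https://www.congreso.gob.gt/buscador_diputados#gsc.tab=0",
    "20221007084458",
)
```

Keeping the timestamp and the original URL separate would make it easy to switch back to the live site once it is reachable again.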
honduras
  not to be found
  found after all, but not all on one page, rather doubly by
  https://resultadosgenerales2021.cne.hn/Integracion/?id=4
  it's 120 people
general notes on the parser YAML
  general Selenium YAML config for
    xpaths to click and get data for lists
    Selenium or not
  merging duplicates will be necessary too
    merge or not
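
One way the per-country parser config and the duplicate merge could fit together, as a rough sketch. All field names (`use_selenium`, `click_xpath`, `data_xpaths`, `merge_duplicates`) and the merge rule (same name means same person, first non-empty value wins) are assumptions; the mapping passed to `load_source` stands in for what a YAML loader would return.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SourceConfig:
    """Hypothetical shape of one country's entry in the parser YAML."""
    name: str
    list_url: str
    use_selenium: bool = False          # "selenium or not"
    click_xpath: Optional[str] = None   # xpath to click (Selenium only)
    data_xpaths: Dict[str, str] = field(default_factory=dict)
    merge_duplicates: bool = False      # "merge or not"


def load_source(mapping: dict) -> SourceConfig:
    """Build a config from a parsed YAML mapping (a plain dict)."""
    return SourceConfig(**mapping)


def merge_records(records: List[dict], key: str = "name") -> List[dict]:
    """Merge records sharing `key`, keeping the first non-empty value per field."""
    merged: Dict[str, dict] = {}
    for record in records:
        target = merged.setdefault(record[key], {})
        for field_name, value in record.items():
            if value not in (None, "") and field_name not in target:
                target[field_name] = value
    return list(merged.values())


def postprocess(config: SourceConfig, records: List[dict]) -> List[dict]:
    """Apply the merge step only when the config asks for it."""
    return merge_records(records) if config.merge_duplicates else list(records)
```

Matching on the name alone is the simplest possible rule and would wrongly merge two different people with the same name; a real merge would likely need a stronger key.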
write/read Wikidata
  There is a hard query deadline configured, which is set to 60 seconds.
  example query:
  https://query.wikidata.org/sparql?query=SELECT%20?dob%20WHERE%20{wd:Q42%20wdt:P569%20?dob.}&explain=details
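
A small sketch of running the example query from Python with only the standard library. The function names, the User-Agent string, and the client-side timeout (a few seconds above the 60-second server deadline) are assumptions for illustration; `run_query` needs network access.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://query.wikidata.org/sparql"
SERVER_DEADLINE_SECONDS = 60  # hard query deadline mentioned in the notes

# The example query from the notes: date of birth (P569) of Q42.
EXAMPLE_QUERY = "SELECT ?dob WHERE {wd:Q42 wdt:P569 ?dob.}"


def build_query_url(sparql):
    """Percent-encode the SPARQL text into a GET URL for the endpoint."""
    return ENDPOINT + "?" + urllib.parse.urlencode({"query": sparql})


def run_query(sparql, timeout=SERVER_DEADLINE_SECONDS + 5):
    """Send the query and return the parsed JSON result (network required)."""
    request = urllib.request.Request(
        build_query_url(sparql),
        headers={
            "Accept": "application/sparql-results+json",
            "User-Agent": "pep-pipeline-example/0.1",  # placeholder value
        },
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def first_values(result, variable):
    """Pull the plain values of one variable out of a SPARQL JSON result."""
    bindings = result["results"]["bindings"]
    return [row[variable]["value"] for row in bindings if variable in row]
```

`first_values(run_query(EXAMPLE_QUERY), "dob")` would then return the matching date strings; queries that run longer than the server deadline are cut off on the server side regardless of the client timeout.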
  instance of (P31)
    Q5 is human
  given name (P735)
    Christian has Q18001597
  family name (P734)
    Lindner also has a Q-ID
  occupation (P106)
    politician Q82955
  position held (P39)
    member of the German Bundestag Q1939555
    start time (P) 27 October 2009
    end time (P) 10 July 2012
  member of political party (P102)
    the party has a Q-ID
  official website (P856)
    plain URL (http etc.), no Q-ID
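
The property list above can be written down as a small data structure before anything is sent to Wikidata. This is only a sketch: the `Statement` class and `person_statements` function are hypothetical, the Q-IDs the notes leave open (family name, party) are parameters, and P580/P582 for start and end time are filled in here from Wikidata because the notes leave those property numbers blank.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Statement:
    prop: str                     # property ID, e.g. "P31"
    value: str                    # Q-ID or a plain value such as a URL
    qualifiers: Dict[str, str] = field(default_factory=dict)


def is_item(value: str) -> bool:
    """True for Q-IDs such as "Q5", False for plain values such as URLs."""
    return re.fullmatch(r"Q\d+", value) is not None


def person_statements(given_name_qid, family_name_qid, party_qid,
                      position_qid, start, end, website) -> List[Statement]:
    """Statements for one politician, following the property list in the notes."""
    return [
        Statement("P31", "Q5"),                # instance of: human
        Statement("P735", given_name_qid),     # given name
        Statement("P734", family_name_qid),    # family name
        Statement("P106", "Q82955"),           # occupation: politician
        Statement("P39", position_qid,         # position held, with qualifiers
                  {"P580": start, "P582": end}),  # start time / end time (assumed IDs)
        Statement("P102", party_qid),          # member of political party
        Statement("P856", website),            # official website, plain URL
    ]
```

For the example in the notes, `given_name_qid` would be `"Q18001597"`, `position_qid` would be `"Q1939555"`, and the dates `"2009-10-27"` and `"2012-07-10"`; the family-name and party Q-IDs still have to be looked up.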