Automated pipeline for parsing profiles of politically exposed persons (PEP) into Wikidata
Mexico
the Mexican member list of the Cámara de Diputados is not available
https://web.diputados.gob.mx/inicio/tusDiputados/listadoDiputadosBuscador;nombre=;estado=;cabeceraMunicipal=;grupoParlamentario=;mail=
the link is a JavaScript event, so Selenium is needed
the Mexican list of senators also needs Selenium in Python to click the JavaScript links
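
A minimal sketch of how such a JavaScript-driven list could be clicked through with Selenium in Python. The function names, the headless Firefox choice, the timeout, and both XPath parameters are assumptions for illustration; the real XPaths for the Mexican pages are not given in these notes. The Selenium imports sit inside the function so that the pure link helper can be used without a browser.

```python
from urllib.parse import urljoin


def normalize_links(hrefs, base_url):
    """Make links absolute and drop empty values and repeats, keeping order."""
    seen = set()
    result = []
    for href in hrefs:
        if not href:
            continue
        absolute = urljoin(base_url, href)
        if absolute not in seen:
            seen.add(absolute)
            result.append(absolute)
    return result


def collect_profile_links(list_url, click_xpath, link_xpath, timeout=20):
    """Open the list page, click the JavaScript control, return profile links.

    click_xpath and link_xpath are hypothetical parameters; they have to be
    worked out per site.
    """
    # Imported here so normalize_links works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.FirefoxOptions()
    options.add_argument("-headless")
    driver = webdriver.Firefox(options=options)
    try:
        driver.get(list_url)
        wait = WebDriverWait(driver, timeout)
        # Wait until the JavaScript control is clickable, then click it.
        wait.until(EC.element_to_be_clickable((By.XPATH, click_xpath))).click()
        # Wait for the result links that the click renders.
        anchors = wait.until(
            EC.presence_of_all_elements_located((By.XPATH, link_xpath))
        )
        hrefs = [a.get_attribute("href") for a in anchors]
    finally:
        driver.quit()
    return normalize_links(hrefs, list_url)
```
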
El Salvador
El Salvador should work with the same machine
Costa Rica
will work with the same machine
Guatemala
currently not reachable? --> it is, via the Internet Archive
this one (jan 22) also works for the deputado links
https://web.archive.org/web/20221007084458/https://www.congreso.gob.gt/buscador_diputados#gsc.tab=0
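
Since the Guatemalan page is only reached through the Internet Archive, a small helper for building and splitting Wayback Machine URLs may be handy. This is a sketch; the function names are made up here, and the only assumption about the archive is the `web.archive.org/web/<timestamp>/<url>` layout visible in the link above.

```python
WAYBACK_PREFIX = "https://web.archive.org/web/"


def wayback_url(timestamp, original_url):
    """Build a Wayback Machine URL from a snapshot timestamp and a page URL."""
    return f"{WAYBACK_PREFIX}{timestamp}/{original_url}"


def split_wayback_url(archived_url):
    """Return (timestamp, original_url) for a Wayback Machine URL."""
    if not archived_url.startswith(WAYBACK_PREFIX):
        raise ValueError(f"not a Wayback Machine URL: {archived_url}")
    rest = archived_url[len(WAYBACK_PREFIX):]
    # The timestamp is everything up to the first slash after the prefix.
    timestamp, _, original_url = rest.partition("/")
    return timestamp, original_url


snapshot = wayback_url(
    "20221007084458",
    "https://www.congreso.gob.gt/buscador_diputados#gsc.tab=0",
)
```
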
Honduras
not found
found after all, but not all on one page; instead doubled by
https://resultadosgenerales2021.cne.hn/Integracion/?id=4
it's 120 people
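
When a source lists the same person more than once, the scraped records have to be collapsed. A minimal sketch, assuming each record is a dict with a `name` field and that two records belong together when their names match after removing accents, case, and extra whitespace. That matching rule, the field names, and the sample records are all invented for illustration; real data may need a stricter key.

```python
import unicodedata


def name_key(name):
    """Normalize a name: strip accents, ignore case and repeated spaces."""
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(stripped.casefold().split())


def merge_duplicates(records):
    """Merge records with the same name key; the first non-empty value wins."""
    merged = {}
    for record in records:
        target = merged.setdefault(name_key(record["name"]), {})
        for field_name, value in record.items():
            if value and not target.get(field_name):
                target[field_name] = value
    return list(merged.values())


# Made-up sample records, not taken from any real list.
sample = [
    {"name": "Ana  María López", "party": ""},
    {"name": "ana maria lopez", "party": "Example Party"},
    {"name": "Juan Ejemplo", "party": "Example Party"},
]
people = merge_duplicates(sample)
```
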
general notes on the parser yaml
selenium general yaml conf for
xpaths to click and get data for lists
selenium or not
merging of duplicates will be necessary too
merge or not
write/read wikidata
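
The options listed above could be captured per source roughly as follows. This is only a sketch of one possible shape: every key name, the `read`/`write` values, and the XPaths are assumptions, not an existing config format. The input is a plain mapping of the kind a YAML loader would return, so the example runs without a YAML library.

```python
from dataclasses import dataclass, field


@dataclass
class SourceConfig:
    """Per-country parser settings (all field names are hypothetical)."""
    name: str
    list_url: str
    use_selenium: bool = False                          # selenium or not
    click_xpaths: list = field(default_factory=list)    # xpaths to click
    data_xpaths: dict = field(default_factory=dict)     # xpaths to get data
    merge_duplicates: bool = False                      # merge or not
    wikidata_mode: str = "read"                         # "read" or "write"

    def __post_init__(self):
        if self.wikidata_mode not in ("read", "write"):
            raise ValueError(f"unknown wikidata_mode: {self.wikidata_mode}")
        if self.click_xpaths and not self.use_selenium:
            raise ValueError("click_xpaths only make sense with use_selenium")


def load_sources(raw):
    """Turn a parsed YAML mapping {country: settings} into SourceConfig objects."""
    return [SourceConfig(name=name, **settings) for name, settings in raw.items()]


# Equivalent of a parsed YAML file; the XPaths are placeholders.
EXAMPLE = {
    "honduras": {
        "list_url": "https://resultadosgenerales2021.cne.hn/Integracion/?id=4",
        "use_selenium": True,
        "click_xpaths": ["//button[@id='placeholder']"],
        "data_xpaths": {"name": "//td[1]"},
        "merge_duplicates": True,
    },
}
sources = load_sources(EXAMPLE)
```

Validating in `__post_init__` means a typo in the config fails at load time instead of in the middle of a scraping run.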
There is a hard query deadline configured which is set to 60 seconds.
example of query
https://query.wikidata.org/sparql?query=SELECT%20?dob%20WHERE%20{wd:Q42%20wdt:P569%20?dob.}&explain=details
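
The same query can be sent from Python with the standard library. A sketch, assuming JSON output is wanted and reusing the 60 second deadline as the client-side timeout; the function names and the User-Agent string are placeholders.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

ENDPOINT = "https://query.wikidata.org/sparql"
QUERY_DEADLINE_SECONDS = 60  # matches the hard query deadline noted above

DOB_QUERY = "SELECT ?dob WHERE { wd:Q42 wdt:P569 ?dob. }"


def build_query_url(sparql):
    """Percent-encode the SPARQL text into a GET URL asking for JSON."""
    return f"{ENDPOINT}?{urlencode({'query': sparql, 'format': 'json'})}"


def run_query(sparql, timeout=QUERY_DEADLINE_SECONDS):
    """Send the query and return the list of result bindings."""
    request = Request(
        build_query_url(sparql),
        # Placeholder User-Agent; replace with a real contact string.
        headers={"User-Agent": "pep-pipeline-notes/0.1 (example)"},
    )
    with urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    return payload["results"]["bindings"]


if __name__ == "__main__":
    for row in run_query(DOB_QUERY):
        print(row["dob"]["value"])
```
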
instance of (P31)
Q5 is human
given name (P735)
Christian has Q18001597
family name (P734)
Lindner also has a Q
occupation (P106)
Politician Q82955
position held (P39)
Member of German Bundestag Q1939555
start time (P) 27 October 2009
end time (P) 10 July 2012
member of political party (P102)
party has Q
official website (P856)
a plain URL (http etc.), without a Q
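
Before writing a profile, it can help to check whether a matching item already exists. The sketch below turns property/value pairs like the ones listed above into a SPARQL lookup; the function names are made up, and only the three claims with known Q-IDs from these notes are used. Values that look like a Q-ID become `wd:` terms, anything else is treated as a URL, which is a simplification.

```python
def to_sparql_term(value):
    """Q-IDs become wd: terms; other values are treated as plain URLs."""
    if value.startswith("Q") and value[1:].isdigit():
        return f"wd:{value}"
    return f"<{value}>"


def build_lookup_query(claims, limit=10):
    """Build a query for items that carry all given (property, value) claims."""
    lines = [
        f"  ?item wdt:{prop} {to_sparql_term(value)} ."
        for prop, value in claims
    ]
    body = "\n".join(lines)
    return f"SELECT ?item WHERE {{\n{body}\n}} LIMIT {limit}"


# instance of human, occupation politician, position held: member of Bundestag
CLAIMS = [("P31", "Q5"), ("P106", "Q82955"), ("P39", "Q1939555")]
lookup_query = build_lookup_query(CLAIMS)
```

Broad patterns such as "all humans" can run into the 60 second query deadline, so adding the most selective claim available and keeping a `LIMIT` is advisable.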