The corpora are split into three directories:
# Construction of the dataset

We collected links from Google News in Portuguese and Spanish between July and September 2016. These links redirect to international news sites in Spanish (La Jornada, Milenio, El Economista, BBC Mundo, El Colombiano, El País, CNN en español, etc.) and in Portuguese (G1, Uol Notícias, Estadão, O Globo, etc.). Each cluster is composed of related sentences describing a specific event in Science, Sports, Economy, Health, Business, Technology, Accidents/Catastrophes, General Information, or other subjects (see our paper for more details).
Page updated on 10 March 2018