A Corpus of Emotions in Spanish Literary Sentences

The LiSSS Literary Corpus of Emotions in Spanish


LiSSS is a corpus for evaluating emotions in Spanish literary texts. It consists of clusters of related literary sentences, phrases and paragraphs, each labelled with one or more of five reference emotions selected by human annotators:

  1. Anger (A) : cluster of anger sentences
  2. Love (L) : cluster of love sentences
  3. Fear (F) : cluster of fear sentences
  4. Happiness (H) : cluster of happiness sentences
  5. Sadness/Pain (S) : cluster of sadness/pain sentences

This corpus is intended not as a training corpus but as a test corpus for algorithms that process literary text.

Construction of the dataset

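We collected Spanish literary sentences and phrases (poetry, narrative, stories, etc.) from books and the Internet. Each cluster is composed of related sentences that express one specific emotion: Love, Fear, Happiness, Anger or Sadness/Pain (see our paper for more details). Each sentence is labelled with an emotion code built from the letters A, L, F, H and S. A sentence may belong to several clusters, in which case its code combines several letters (for example, LS).

In the text format, each entry gives a unique ID, the emotion code, the sentence and the author, with tabs as separators. In the sample below, the ID and the emotion code are written together (for example, 3LS) and the author is preceded by #. For example: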


1A	El odio es la venganza de un cobarde intimidado.	# George Bernard Shaw
2A	El desprecio debe ser el más misterioso de nuestros sentimientos.	# Antoine de Rivarol
3LS	Lo único que me duele de morir, es que no sea de amor.	# Gabriel García Márquez
4F	Cuando se viaja en avión solamente existen dos clases de emociones: el aburrimiento y el terror.	# Orson Welles
5A	La ira es una locura corta.	# Horacio
6A	El odio es el invierno del corazón.	# Victor Hugo
7AF	Intenta no ocupar tu vida en odiar y tener miedo.	# Stendhal
8HS	No somos felices: nuestra felicidad es el silencio de la desgracia.	# Jules Renard
9S	He cometido el peor pecado que un hombre puede cometer. No he sido feliz	# Jorge Luis Borges
10H	Felicidad no es hacer lo que uno quiere, sino querer lo que uno hace.	# Jean Paul Sartre
11A	Si los hombres se odian, nada se puede hacer.	# José Saramago
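
The sample above can be read with a few lines of Python. This is only a minimal sketch based on the lines shown here, not an official loader: it assumes the ID and the emotion code share the first tab-separated field (it also accepts them as two separate fields), that the author field starts with #, and that codes use only the letters A, L, F, H and S. The names Entry, parse_line and clusters are hypothetical.

```python
import re
from collections import defaultdict
from typing import NamedTuple

# Emotion letters as listed in the corpus description.
EMOTIONS = {"A": "Anger", "L": "Love", "F": "Fear",
            "H": "Happiness", "S": "Sadness/Pain"}

# Assumption: the first field looks like "3LS" (numeric ID, then one or more letters).
_HEAD = re.compile(r"^(\d+)\s*([ALFHS]+)$")


class Entry(NamedTuple):
    sent_id: int
    codes: frozenset
    sentence: str
    author: str


def parse_line(line: str) -> Entry:
    """Parse one tab-separated line of the text format."""
    fields = [f.strip() for f in line.rstrip("\n").split("\t")]
    # Also accept the ID and the code as two separate tab fields.
    if len(fields) >= 2 and fields[0].isdigit():
        fields = [fields[0] + fields[1]] + fields[2:]
    match = _HEAD.match(fields[0])
    if match is None or len(fields) < 2:
        raise ValueError(f"unrecognised line: {line!r}")
    # Assumption: the author field is written as "# Name"; it may be absent.
    author = fields[2].lstrip("#").strip() if len(fields) > 2 else ""
    return Entry(int(match.group(1)), frozenset(match.group(2)), fields[1], author)


def clusters(entries):
    """Group sentence IDs by emotion letter; a sentence may appear in several groups."""
    by_emotion = defaultdict(list)
    for entry in entries:
        for code in entry.codes:
            by_emotion[code].append(entry.sent_id)
    return dict(by_emotion)


sample = (
    "1A\tEl odio es la venganza de un cobarde intimidado.\t# George Bernard Shaw\n"
    "3LS\tLo único que me duele de morir, es que no sea de amor.\t# Gabriel García Márquez\n"
    "7AF\tIntenta no ocupar tu vida en odiar y tener miedo.\t# Stendhal\n"
)
entries = [parse_line(line) for line in sample.splitlines() if line.strip()]
groups = clusters(entries)
for code in sorted(groups):
    print(EMOTIONS[code], groups[code])
```

Storing the codes as a set keeps multi-cluster sentences such as 3LS in both the Love and the Sadness/Pain groups, which matches the statement that a sentence may belong to several clusters.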

In the XML format, each field is delimited by suitable XML tags.

The corpus is available in single-annotated versions (more than 42 000 tokens; the number of sentences grows with each new version) and in multi-annotated versions (several annotators and two voting strategies; the number of annotators may increase). Versions tagged with Freeling 4.1 are also available.
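
This page does not say which two voting strategies are used for the multi-annotated versions. Purely as an illustration of how several annotators' emotion codes could be merged into one, the sketch below shows two common options: a per-letter majority vote and a union of all letters. Both functions, their names and the sample annotations are hypothetical and are not taken from the LiSSS distribution.

```python
from collections import Counter

CODES = "ALFHS"  # letter order used when writing the merged code


def _letter_counts(annotations):
    # Count, for each letter, how many annotators used it (once per annotator).
    return Counter(code for ann in annotations for code in set(ann))


def majority_vote(annotations):
    """Keep a letter only if more than half of the annotators chose it."""
    counts = _letter_counts(annotations)
    needed = len(annotations) / 2
    return "".join(c for c in CODES if counts[c] > needed)


def union_vote(annotations):
    """Keep every letter chosen by at least one annotator."""
    counts = _letter_counts(annotations)
    return "".join(c for c in CODES if counts[c] >= 1)


# Hypothetical annotations of one sentence by three annotators.
votes = ["A", "AF", "S"]
print(majority_vote(votes))  # A
print(union_vote(votes))     # AFS
```

A majority vote yields fewer, more consensual labels, while a union keeps every emotion any annotator perceived; which of these, if either, corresponds to the corpus's own strategies should be checked against the paper.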

The LiSSS corpus is distributed in XML and text formats (UTF-8 encoding, GNU/Linux end-of-line) under the LGPL license. New versions, with more literary sentences, phrases and annotations, will be added periodically.

Download the LiSSS corpus


Multi-annotated corpora (202 authors, 500 sentences; output/tagged Freeling 4.1; XML/text format)

Single-annotated corpora (output/tagged Freeling 4.1; XML/text format)


Would you like to collaborate with the LiSSS project? Do you want to send us new classified sentences? Have you found mistakes in the corpus? Please contact us.

How to cite this corpus? If you use the LiSSS corpus, please cite:

Contact : Juan-Manuel Torres & Luis-Gil Moreno Jiménez
http://lia.univ-avignon.fr / Université d'Avignon, France
juan *-* manuel *dot* torres *at* univ-avignon *dot* fr | luis-gil *dot* moreno-jimenez *at* univ-avignon *dot* fr
Updated 2020.June.19