Learner Corpus of Portuguese L2 - COPLE2

The Learner Corpus of Portuguese as Second/Foreign Language (COPLE2) is a corpus of written and oral texts produced by students attending Portuguese as a Foreign/Second Language courses at the Instituto de Cultura e Língua Portuguesa (ICLP – FLUL) and by applicants for examinations at the Centro de Avaliação de Português Língua Estrangeira (CAPLE – FLUL).

The corpus contains texts from learners with 15 different native languages (L1s) and proficiency levels ranging from A1 to C1, and covers a variety of topics and task types. It is encoded in TEI format using the TEITOK environment.

Each learner text is encoded with complete metadata on the learner profile, the type of task and the circumstances in which the text was produced. The corpus also provides the original handwritten texts and oral productions, and encodes the modifications made by the student and the teacher. It is annotated for part of speech, lemma and learner errors. All encoded information is searchable using the CQP query language.

The corpus is currently funded by Fundação Calouste Gulbenkian within the RECAP project. It was previously funded by the Associação para o Desenvolvimento da Faculdade de Letras da Universidade de Lisboa (ADFLUL), Instituto de Cultura e Língua Portuguesa (ICLP) and Fundação Calouste Gulbenkian within the LeCIEPLE project. It results from a partnership between several institutions, including ICLP, CAPLE and Centro de Linguística da Universidade de Lisboa (CLUL).


How to cite the corpus

Mendes, Amália, Sandra Antunes, Maarten Janssen & Anabela Gonçalves (2016) The COPLE2 Corpus: A Learner Corpus for Portuguese. In: Proceedings of the Tenth Language Resources and Evaluation Conference – LREC’16, 23-28 May 2016, Portoroz, Slovenia, 3207-3214.