1. Introduction

1.1. Introductory remarks

This manual proposes a unified syntactic annotation system for the corpora of Portuguese built along the lines initially defined for the Penn Corpora of Historical English, namely the Tycho Brahe Parsed Corpus of Historical Portuguese (TYCHO BRAHE), the Syntax-oriented Corpus of Portuguese Dialects (CORDIAL-SIN), the Post Scriptum Corpus (POST SCRIPTUM) and the Word Order Change in Western European Languages Corpus (WOChWEL).

Together, these corpora contain around 3 million words and cover the old, the middle, the classical and the modern periods of European Portuguese, as well as 19th- and 20th-century Brazilian Portuguese.

The general principles for the analysis and representation of constituents were adopted from the Penn annotation system. However, in order to accommodate the grammatical differences between Middle English and Portuguese, as well as the specificity of the dialectal spoken texts of the CORDIAL-SIN corpus, we adapted the original annotation system in some respects. This task was initially carried out by the TYCHO BRAHE corpus team. Further revisions and the enlargement of the annotation system to fulfill the requirements of the different sets of Portuguese data result from the collaborative work of the researchers of all the projects mentioned above.

In this unified Manual, we present the general parsing principles and the details of the annotation system for Portuguese. The examples are drawn from the four corpora mentioned above and identified by the code of the source file.

This Manual is intended to be a tool for both the annotators and the users of the corpora.

We acknowledge the support of:

  • FAPESP (grant 2012/06078-9)
  • CNPq (grant 309764/2014-9)
  • FCT (grants PRAXIS XXI/P/PLP/13046/1998, PTDC/LIN/71559/2006, POCTI/LIN/46980/2002, POSI/PLP/33275/1999, PTDC/CLE-LIN/121707/2010)
  • ERC (7FP/ERC advanced grant - GA 295562)