WEB-BASED DATA

Web-based syntactically annotated corpora constitute an important body of information, one that can be seen as a growing digital network in its own right for the investigation of word order. Researchers working on word order phenomena become active participants in this network when they contribute new resources to it. They also do so when they draw on the resources already available to build the empirical support needed for topic-specific, theory-oriented research.

The list below is restricted to Western European languages and is not intended to be exhaustive. We will continue to improve it.

PARSED CORPORA – PENN-HELSINKI ANNOTATION SYSTEM

Penn Corpora of Historical English
http://www.ling.upenn.edu/histcorpora/

Tycho Brahe Parsed Corpus of Historical Portuguese
http://www.tycho.iel.unicamp.br/~tycho/corpus/

MCVF Corpus (Modéliser le changement: les voies du français)
http://www.voies.uottawa.ca/corpus_pg_en.html

Icelandic Parsed Historical Corpus
http://linguist.is/icelandic_treebank/Icelandic_Parsed_Historical_Corpus_(IcePaHC)

CORDIAL-SIN – Syntax Oriented Corpus of Portuguese Dialects
http://www.clul.ul.pt/en/resources/411-cordial-corpus

INPOLDER – Integrated Parser and Lemmatizer Dutch in Retrospect
http://depot.knaw.nl/8914/1/Morphology_in_INPOLDER.pdf
http://www.meertens.knaw.nl/cms/en/technologie
http://www.clarin.nl/page/about/projects/162#INPOLDER

OTHER PARSED CORPORA

Geoffrey Sampson's Corpora (SUSANNE, SEMi-SUSANNE, CHRISTINE, LUCY)
http://www.grsampson.net/Resources.html

The Lancaster Parsed Corpus
http://icame.uib.no/lanpeks.html

ICE-GB
http://www.ucl.ac.uk/english-usage/projects/ice-gb/
http://www.ucl.ac.uk/english-usage/projects/ice-gb/sampler/index.htm
http://ice-corpora.net/ice/icegb.htm

DCPSE – Diachronic Corpus of Present-Day Spoken English
http://www.ucl.ac.uk/english-usage/projects/ice-gb/

The Penn Treebank
http://www.cis.upenn.edu/~treebank/home.html

BLLIP 1987-89 WSJ Corpus
http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2000T43

Copenhagen Dependency Treebank
http://code.google.com/p/copenhagen-dependency-treebank/

Alpino Dependency Treebank
http://odur.let.rug.nl/~vannoord/trees/

NEGRA Corpus
http://www.coli.uni-saarland.de/projects/sfb378/negra-corpus/

TIGER Corpus
http://www.ims.uni-stuttgart.de/projekte/TIGER/TIGERCorpus/

LASSY – Large Scale Syntactic Annotation of written Dutch
http://www.let.rug.nl/vannoord/Lassy/

Tübingen Treebank of written German
https://www.clarin.eu/node/379

Nordic Treebank Network
http://w3.msi.vxu.se/~nivre/research/nt.html

Talbanken05
http://w3.msi.vxu.se/~nivre/research/Talbanken05.html

Floresta Sin(c)tática
http://www.linguateca.pt/floresta/

TUT – Turin University Treebank
http://www.di.unito.it/~tutreeb/

French Treebank
http://www.llf.cnrs.fr/Gens/Abeille/French-Treebank-fr.php

RST Spanish Treebank
http://corpus.iingen.unam.mx/rst/index_en.html

3LB
http://www.dlsi.ua.es/projectes/3lb/index_en.html

UAM Spanish Treebank
http://elvira.lllf.uam.es/~sandoval/UAMTreebank.html

VIT – Venice Italian Treebank
http://www.elda.org/catalogue/en/text/W0040.html

The ISST Italian Treebank at CoNLL-2007
http://www.elda.org/catalogue/en/text/W0040.html

PAISÀ
http://www.corpusitaliano.it/en/index.html
http://www.corpusitaliano.it/

Colonia – Corpus of Historical Portuguese
http://corporavm.uni-koeln.de/colonia/index.html

OTHER PROJECTS AND RESOURCES

WALS – The World Atlas of Language Structures Online
http://wals.info/

Edisyn – European Dialect Syntax
http://www.dialectsyntax.org/wiki/Main_Page

CRPC – Corpus de Referência do Português Contemporâneo
http://www.clul.ul.pt/pt/recursos/183-reference-corpus-of-contemporary-portuguese-crpc

Corpus do Português
http://www.corpusdoportugues.org/

Corpus del Español
http://www.corpusdelespanol.org/

CIPM – Corpus Informatizado do Português Medieval
http://cipm.fcsh.unl.pt/

TMILG – Tesouro Medieval Informatizado da Lingua Galega
http://ilg.usc.es/tmilg/corpus.html

TILG – Tesouro Informatizado da Lingua Galega
http://sli.uvigo.es/TILG/

CREA – Corpus de Referencia del Español Actual
http://corpus.rae.es/creanet.html

CORDE – Corpus Diacrónico del Español
http://corpus.rae.es/cordenet.html

AnCora
http://clic.ub.edu/corpus/en

Corpus Textual Informatitzat de la Llengua Catalana
http://ctilc.iec.cat/

CICA – Corpus Informatitzat del Català Antic
http://cica.cat/

CORIS/CODIS – Corpus of written Italian
http://dslo.unibo.it/coris_eng.html

Corpus Français – Université de Leipzig
http://wortschatz.uni-leipzig.de/ws_fra/

Corpus du français parlé
http://sites.univ-provence.fr/delic/corpus/index.html

Corpus du français parlé au Québec
http://recherche.flsh.usherbrooke.ca/cfpq/

Richard Xiao, "Well-known and influential corpora: a survey"
http://www.lancs.ac.uk/staff/xiaoz/papers/corpus%20survey.htm
http://cw.routledge.com/textbooks/0415286239/resources/corpa3.htm