Latest news

3.02.2012

The course page is online.

13.03.2012

The first lecture takes place on Monday, Feb 13.

Contact

Iris Hendrickx
E-mail: iris@clul.ul.pt

Introduction

This course is called Corpus Processing, and its aim is to provide students with an overview of corpus linguistics in all its aspects. What is a corpus? Why do linguists need corpora? How do you create a corpus? How do you enhance it with linguistic information? What processing tools are there for corpus linguistics? This is a practical course and will take place in the computer room, sala TIC, in the basement of the faculty. The course will be taught in English.
Students will be graded on the basis of several assignments given during the course (70%) and a final exam (30%).

Preliminary Course Schedule

Lecture 1 Introduction

Content: General introduction to this course and to corpus linguistics.
Date: Monday Feb 13, 2012 at 8.00-10.00
Room: TIC

Lecture 2

Content: Corpus design and corpus compilation
Date: Wednesday Feb 15, 2012 at 8.00-10.00
Room: TIC

--

Carnival holidays (Férias de Carnaval) on Feb 20

----

Lecture 3

Content: Different types of corpora for Portuguese
Date: Feb 22, 2012 at 8.00-10.00
Room: TIC

Lecture 4

Content: Corpus cleaning and preparation: encoding, metadata and XML
Date: Feb 27, 2012 at 8.00-10.00
Room: TIC

Lecture 5

Content: Practice with XML
Date: Feb 29, 2012 at 8.00-10.00
Room: TIC

Lecture 6

Content: Linguistic annotation: part-of-speech tagging
Date: March 5, 2012 at 8.00-10.00
Room: TIC

Lecture 7

Content: Linguistic annotation: chunking and parsing
Date: March 7, 2012 at 8.00-10.00
Room: TIC

Lecture 8

Content: Treebanks
Date: March 12, 2012 at 8.00-10.00
Room: TIC

Lecture 9

Content: Semantic annotation overview
Date: March 15, 2012 at 8.00-10.00
Room: TIC

Lecture 10

Content: Lexical semantics: words and their meaning
Date: March 19, 2012 at 8.00-10.00
Room: TIC

Lecture 11

Content: Semantics at the sentence level
Date: March 21, 2012 at 8.00-10.00
Room: TIC

Lecture 12

Content: Semantics at the discourse level
Date: March 26, 2012 at 8.00-10.00
Room: TIC

Lecture 13

Content: Annotation tools and annotation evaluation
Date: March 28, 2012 at 8.00-10.00
Room: TIC

--

Easter holidays from April 2 until April 9, 2012

----

Lecture 14

Content: Practice with annotation tools
Date: April 11, 2012 at 8.00-10.00
Room: TIC

Lecture 15

Content: Speech corpora
Date: April 16, 2012 at 8.00-10.00
Room: TIC

Lecture 16

Content: Practice with the Exmaralda tool
Date: April 18, 2012 at 8.00-10.00
Room: TIC

Lecture 17

Content: Knowledge representations, taxonomy, ontology
Date: April 23, 2012 at 8.00-10.00
Room: TIC

Lecture 18

Content: Metadata and TEI
Date: April 25, 2012 at 8.00-10.00
Room: TIC

Lecture 19

Content: Learner corpora
Date: April 30, 2012 at 8.00-10.00
Room: TIC

Lecture 20

Content: Multi-word expressions
Date: May 2, 2012 at 8.00-10.00
Room: TIC

Lecture 21

Content: Practice with multi-word expressions
Date: May 7, 2012 at 8.00-10.00
Room: TIC

Lecture 22

Content: Register, genre and style; experiments with the BNC
Date: May 9, 2012 at 8.00-10.00
Room: TIC

Lecture 23

Content: Case study: modality annotation
Date: May 14, 2012 at 8.00-10.00
Room: TIC

Lecture 24

Content: Historical corpora
Date: May 16, 2012 at 8.00-10.00
Room: TIC

Lecture 25

Content: Course summary
Date: May 21, 2012 at 8.00-10.00
Room: TIC

Exam period: 09-07-2012 until 21-07-2012

Literature

  • Kennedy, G. (1998) An Introduction to Corpus Linguistics, London/New York, Longman. Chapter 1 (Introduction) + sections 2.5 (Issues in corpus design and compilation) and 2.6 (Compiling a corpus).
  • McEnery, T. & A. Wilson (1996/2001) Corpus Linguistics, Edinburgh, Edinburgh University Press.
  • A Gentle Introduction to XML (available online).
  • Wynne, M. (2005) Developing Linguistic Corpora: a Guide to Good Practice.
    The book is available online.
  • Using Computers in Linguistics
  • Chapter (Unit) A4, Corpus annotation.
    Book: Corpus-Based Language Studies, Anthony McEnery, Richard Xiao, Yukio Tono, 2005
  • Sag, I., T. Baldwin, F. Bond, A. Copestake & D. Flickinger (2002) “Multiword Expressions: A Pain in the Neck for NLP”. In Gelbukh, A. (ed.), Proceedings of CICLING-2002.
    Available online.