ASIAN 222: Seminar in Corpus Linguistics

 

  Fall 2005   UCLA

 

Instructor: Hongyin Tao 

 

Time: 12:30-1:45.

Location: CDH PC Lab, B01 Luvalle Commons.

 

Office hours: Thursdays 10:00-11:00, 2:00-3:00 & by appointment

Office: 280B Royce Hall                                                                        

Office phone: 310-206-6872; Department: 310-206-8235

Email: tao at humnet ucla edu

______________________________________________________________________________

 

Course Description       Requirements      Readings      Weekly Schedule    Class Notes

______________________________________________________________________________

  

Course Description

This graduate seminar introduces some of the commonly used methods and techniques for working with large computerized corpora of spoken and written language. It covers the ways in which language corpora can be gathered, tagged/coded, and exploited for research into patterns of language use.

 

We will read chapters from the forthcoming book by McEnery, Xiao and Tono, Corpus-based Language Studies: An Advanced Resource Book (Routledge 2005). Chapters from other books and a set of research papers will also be discussed. These readings will provide the necessary theoretical foundation. Equally important, however, are hands-on experience with corpus data and the development of analytical skills. To this end, we will spend a considerable amount of time working with real data and software tools and investigating concrete research questions.

 

By the end of this seminar, students are expected to 1) have a solid theoretical foundation in corpus-based approaches to language research; 2) be able to construct a mini-corpus for research; 3) be familiar with well-known corpora and common corpus analysis tools such as WordSmith Tools and Concordance; and 4) conduct a substantial research project on the basis of natural language corpora.

 

Requirements

  • Participation and assignments: 60%                        

    • 4 Mini-written assignments   

    • 2 oral reports                        

  • Final paper: 40%

 

Note:

1) Projects can be in either English or a language of your choice.

2) All work is required to be done on the basis of well-defined corpus data, whether in small or large quantities.

3) Mini-written assignments are expected to be 2-3 pages in length. The final paper should not exceed 25 pages, double-spaced.

4) According to departmental policy, there should be no incompletes for this class. You will have two chances to discuss your research plans, with the whole class providing input for each participant. 

  

Readings

 

Textbook:

McEnery, Tony, Richard Xiao & Yukio Tono. (forthcoming) Corpus-based language studies: An advanced resource book. London: Routledge. (Sample chapters) - (MXT)

Books on reserve at the Young Research Library:

(1) Biber, Douglas, Susan Conrad, and Randi Reppen. 1998. Corpus linguistics: Investigating language structure and use. Cambridge: Cambridge University Press. - (BCR)

(2) Bybee, Joan. 2001. Phonology and language use. Cambridge: Cambridge University Press. Chpts 1-2.

(3) Stubbs, Michael. 1996. Text and corpus analysis: Computer assisted studies of language and culture. Oxford: Blackwell Publishers.
 

Papers:

1) Banks, David. 2005. The Case of Perrin and Thomson: An Example of the Use of a Mini-Corpus. English for Specific Purposes 24.2: 201-211.

 

2) Biber, Douglas and Jena Burges. 2000. Historical Change in the Language Use of Women and Men: Gender Differences in Dramatic Dialogue. Journal of English Linguistics 28.1: 21-37.

 

3) Carter, Ronald and Michael McCarthy. 2004. Talking, Creating: Interactional Language, Creativity, and Context. Applied Linguistics 25.1: 62-88.

 

4) Church, K., P. Hanks, D. Hindle, and W. Gale. 1991. Using Statistics in Lexical Analysis. In Uri Zernik (ed.), Lexical Acquisition: Using On-line Resources to Build a Lexicon. Lawrence Erlbaum. 115-164.

 

5) Gabrielatos, Costas. 2005. Corpora and Language Teaching: Just a fling or wedding bells? TESL-EJ 8.4 (March 2005).

 

6) Gries, Stefan Th. and Anatol Stefanowitsch. 2004. Extending Collostructional Analysis: A Corpus-Based Perspective on 'Alternations'. International Journal of Corpus Linguistics 9.1: 97-129.

 

7) Holmes, Janet. 1994. Inferring Language Change from Computer Corpora: Some Methodological Problems. ICAME Journal 18: 27-40.

 

8) Hoey, Michael. 2004. Lexical priming and the properties of text. In Louann Haarman, John Morley and Alan Partington (eds.), Corpora and Discourse. Bern: Peter Lang. 385-412.

 

9) Louw, B. 2000. Contextual prosodic theory: Bringing semantic prosodies to life. In C. Heffer, H. Sauntson and G. Fox (eds.), Words in Context: A Tribute to John Sinclair on his Retirement. English Language Research Discourse Analysis Monograph No. 18 (CD-ROM). Birmingham: University of Birmingham.

 

10) Newman, John and Sally Rice. 2004. Patterns of usage for English SIT, STAND, and LIE: A cognitively inspired exploration in corpus linguistics. Cognitive Linguistics 15.3: 351-396.

 

11) Stevens, Vance. 1995. Concordancing with Language Learners: Why? When? What? CAELL Journal 6.2: 2-10.

 

12) Tao, Hongyin. 2000. Adverbs of Absolute Time and Assertiveness in Vernacular Chinese: A Corpus-Based Study. Journal of the Chinese Language Teachers Association 2000.3: 53-73.

 

13) Tao, Hongyin and Michael J. McCarthy. 2001. Understanding Non-Restrictive Which-Clauses in Spoken English, Which Is Not an Easy Thing. Language Sciences 23.6: 651-677.

 

14) Tao, Hongyin. 2003a. A Usage-Based Approach to Argument Structure: 'Remember' and 'Forget' in Spoken English. International Journal of Corpus Linguistics 8.1: 75-95.

 

15) Tao, Hongyin. 2003b. Turn Initiators in Spoken English: A Corpus-Based Approach to Interaction and Grammar. In Pepi Leistyna and Charles F. Meyer (eds.), Corpus Analysis: Language Structure and Language Use. Amsterdam: Rodopi. 187-207.

 

16) Thompson, Sandra A. 2002. "Object complements" and conversation: Towards a realistic account. Studies in Language 26.1: 125-164.

 



 

 Weekly Schedule

(as of Sept 12, 2005)

     

 

Lecture  

Lab Activities, Assignments, etc.

Week 0   9/29

 

1) Organizational matters;

2) Introduction: corpora and corpus linguistics;

3) Lab resources and settings.

 

Week 1 

 10/4;6

1) Theoretical foundation: Emergent Grammar; grammaticization; frequency effects on language; discourse constructions

 

2) Corpus building; types of corpora, construction methods, relevant issues 

 

Reading assignments:

MXT: Units 1-2, 7-8;

BCR: Ch. 1, M-Boxes 1-2;
Bybee 2001, Chpts. 1-2.

Assignments: 1) Explore different types of corpora. 2) Think of some research questions that are suitable for study with different types of corpora, and describe a potential project and the corpus you would use in two written pages; due 10/11. 3) Start compiling a mini-corpus of your own with at least 50,000 words or characters. You will continue to work on this corpus as we move along. Your research topic may or may not be based solely on this mini-corpus. Consult me for ideas for this project.
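Whether a mini-corpus has reached the 50,000-word or 50,000-character mark can be checked with a short script. The Python sketch below is illustrative only and assumes plain UTF-8 text files in a single folder; the function names `corpus_size` and `folder_size` are hypothetical. It counts whitespace-delimited words for languages written with spaces (e.g. English or Korean) and non-space, non-punctuation characters for unsegmented scripts such as Chinese or Japanese.

```python
import unicodedata
from pathlib import Path


def corpus_size(text: str, unit: str = "words") -> int:
    """Count running words or characters in one text (hypothetical helper)."""
    if unit == "words":
        # Whitespace-delimited tokens; suits English or Korean.
        return len(text.split())
    if unit == "chars":
        # Non-space characters whose Unicode category is not punctuation (P*);
        # suits unsegmented Chinese or Japanese text.
        return sum(
            1
            for ch in text
            if not ch.isspace() and not unicodedata.category(ch).startswith("P")
        )
    raise ValueError("unit must be 'words' or 'chars'")


def folder_size(folder: str, unit: str = "words") -> int:
    """Total size of all .txt files in `folder` (assumed UTF-8 encoded)."""
    return sum(
        corpus_size(path.read_text(encoding="utf-8"), unit)
        for path in Path(folder).glob("*.txt")
    )


if __name__ == "__main__":
    print(corpus_size("The cat sat on the mat."))     # 6 words
    print(corpus_size("我们今天去图书馆。", "chars"))  # 8 characters
```

For example, `folder_size("my_corpus", "chars") >= 50_000` (folder name hypothetical) would confirm that a Chinese mini-corpus meets the target.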

Week 2 

 10/11;13

Corpora and Lexicography:

Token, type, lemma, n-grams;

Word frequency, meaning, and genre
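The distinctions among tokens, types, lemmas, and n-grams can be made concrete with a few lines of code. The sketch below is a simplification under stated assumptions: tokenization is English-only (lowercased runs of letters and digits), the lemma table `LEMMAS` is a hypothetical hand-made fragment standing in for a real lemma list, and bigrams are allowed to cross sentence boundaries.

```python
import re
from collections import Counter

# Hypothetical miniature lemma table; a real project would use a published
# lemma list or a lemmatizer rather than hand-coding forms.
LEMMAS = {"sat": "sit", "sits": "sit", "cats": "cat", "slept": "sleep"}


def tokenize(text):
    """Lowercase and keep runs of letters/digits (English-only simplification)."""
    return re.findall(r"[a-z0-9]+(?:'[a-z]+)?", text.lower())


def ngrams(tokens, n):
    """All contiguous sequences of n tokens (crossing sentence breaks)."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def profile(text):
    """Token, type, and lemma counts, type/token ratio, and bigram counts."""
    tokens = tokenize(text)
    types = set(tokens)
    lemmas = {LEMMAS.get(tok, tok) for tok in tokens}
    return {
        "tokens": len(tokens),
        "types": len(types),
        "lemmas": len(lemmas),
        "ttr": len(types) / len(tokens) if tokens else 0.0,
        "bigrams": Counter(ngrams(tokens, 2)),
    }


stats = profile("The cat sat on the mat. The cat sits.")
print(stats["tokens"], stats["types"], stats["lemmas"])  # 9 6 5
print(stats["bigrams"][("the", "cat")])                  # 2
```

Here "sat" and "sits" are two types but one lemma (SIT), which is why the lemma count is lower than the type count.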

 

Reading assignments:

MXT: Units 3-4; 10.2

BCR: Chpt. 2;

 

Due: Initial project description.

 

Concord; Wordlist; Clusters

Annotation & tagging

 

Windows programs:

 

Mac: see the concordance software list, especially Conc; a more recent program is TextStat.

 

A comparison of AntConc, TextStat, and the Compleat Lexical Tutor.

 

Assignment:

Assignment (II): Write a short paper based on the concordance lines from your corpus.

Due 10/18.
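For readers curious about what a concordancer such as Concord does internally, the following is a minimal keyword-in-context (KWIC) sketch in Python. It is not how WordSmith is implemented; the function names `kwic` and `show` and the fixed character-width context are assumptions for illustration. For unsegmented Chinese or Japanese text, word boundaries should be switched off, since `\b` does not fall between adjacent CJK characters.

```python
import re


def kwic(text, node, width=30, word_boundaries=True):
    """Return (left, hit, right) triples for every occurrence of `node`."""
    flat = re.sub(r"\s+", " ", text)  # collapse line breaks: one hit per line
    core = re.escape(node)
    pattern = re.compile(rf"\b{core}\b" if word_boundaries else core, re.IGNORECASE)
    rows = []
    for m in pattern.finditer(flat):
        left = flat[max(0, m.start() - width):m.start()]
        right = flat[m.end():m.end() + width]
        rows.append((left, m.group(0), right))
    return rows


def show(rows, width=30, sort_right=False):
    """Print aligned concordance lines, optionally sorted on the right context."""
    if sort_right:
        rows = sorted(rows, key=lambda r: r[2].lower())
    for left, hit, right in rows:
        print(f"{left:>{width}} {hit} {right}")


show(kwic("I remember it. Do you Remember?", "remember", width=10), width=10)
```

Sorting on the right-hand context (`sort_right=True`) roughly approximates sorting by the first word to the right of the node, which often makes recurring patterns easier to spot.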

 

Week 3 

  10/18;20

 

Corpora and Lexicography: Key words; collocations, semantic prosody 

 

Reading assignments:

MXT: Unit 6;

Tao 2000 ('adverbs');

Louw 2000;

Hoey 2004.

 

Due: Sample concordance analysis.

 

 

WS Tools: Concord (continued). WordList; KeyWords;

Statistical analyses of collocations (Church et al. 1991)

The Web as corpus: KWiCFinder
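Church et al. (1991) discuss statistical measures for collocation, among them mutual information and the t-score. The Python sketch below is not their implementation but a simplified illustration: it counts collocates within ±`span` words of a node word and compares the observed count O with the count E expected by chance. Two simplifying assumptions should be noted: E is scaled by the window size (2 × span), which is one common convention among several, and no correction is made for windows truncated at the edges of the corpus. The function name `collocates` is hypothetical.

```python
import math
from collections import Counter


def collocates(tokens, node, span=4, min_freq=2):
    """Rank collocates of `node` by mutual information (MI) and t-score.

    O  = observed co-occurrences of node and collocate within +/- span words
    E  = f(node) * f(collocate) * (2 * span) / N   (expected by chance)
    MI = log2(O / E);  t = (O - E) / sqrt(O)
    """
    freq = Counter(tokens)
    n = len(tokens)
    observed = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            window = tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span]
            observed.update(window)
    results = []
    for word, o in observed.items():
        if o < min_freq:  # MI is unreliable for very rare pairs
            continue
        e = freq[node] * freq[word] * (2 * span) / n
        mi = math.log2(o / e)
        t = (o - e) / math.sqrt(o)
        results.append((word, o, round(mi, 2), round(t, 2)))
    return sorted(results, key=lambda r: r[2], reverse=True)


toks = "strong tea is strong tea and weak coffee is weak".split()
print(collocates(toks, "strong", span=1))  # [('tea', 2, 1.32, 0.85)]
```

In practice MI tends to reward rare, exclusive pairings and the t-score frequent ones, which is why the two are often reported side by side.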

 

Assignments:

(1) Explore some of the pre-compiled word frequency/lemma lists: Chinese word list; English; Japanese list; for Korean, try the CLID Web site. Pay attention to how the lists are compiled and what stands out in them. (2) Compare the pre-compiled lists with the word list from your own corpus and see how they differ, if at all. (3) Preliminary oral report on research topic selection.
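One way to quantify how your own word list differs from a pre-compiled reference list is the log-likelihood (G²) keyness statistic, widely used in keyword analysis. The sketch below is a minimal illustration assuming both word lists are available as Python `Counter` objects of raw frequencies; the function names are hypothetical, and only words relatively more frequent in the study corpus ("positive" keywords) are kept. The default threshold of 3.84 is the chi-square critical value for p < 0.05 with one degree of freedom.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """G2 for a word with frequency a in a study corpus of c tokens and
    frequency b in a reference corpus of d tokens."""
    e1 = c * (a + b) / (c + d)  # expected frequency in the study corpus
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference corpus
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll


def keywords(study, reference, top=20, threshold=3.84):
    """Words significantly more frequent in `study` than in `reference`."""
    c, d = sum(study.values()), sum(reference.values())
    scored = []
    for word, a in study.items():
        b = reference.get(word, 0)
        if a / c <= b / d:
            continue  # skip words that are not overused in the study corpus
        ll = log_likelihood(a, b, c, d)
        if ll >= threshold:
            scored.append((word, a, b, round(ll, 2)))
    return sorted(scored, key=lambda r: r[3], reverse=True)[:top]


mine = Counter({"tea": 20, "the": 80})
ref = Counter({"tea": 5, "the": 995})
print([w for w, *_ in keywords(mine, ref)])  # ['tea']
```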

Week 4 

  10/25;27

 

Corpora and lexical grammar: from morphology to grammatical constructions
 

Reading assignments:  
MXT: Unit 10.3-4; 
Newman & Rice 2004;

Tao 2003a ('remember/forget').

WS Tools: Utilities;

 

Segmentation & tagging tools: 

 

Work assignments:

(1) Run a tagger/segmenter on your own corpus and describe the strengths and weaknesses of such systems; (2) Save your segmented/tagged corpus; (3) Written assignment (III): Describe a morphological phenomenon based on your POS-tagged/segmented corpus. Due 11/1.
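Once a corpus has been tagged, simple morphological counts can be drawn from it directly. The sketch below assumes each token is written as `word_TAG` (the horizontal format produced by taggers such as CLAWS); many Chinese segmenters write `word/tag` instead, hence the `sep` parameter. The tag labels in the example are illustrative only, and the helper names are hypothetical. The example asks what proportion of tokens tagged as general adverbs end in *-ly*.

```python
from collections import Counter


def parse_tagged(line, sep="_"):
    """Split 'word_TAG' items into (word, tag) pairs.

    rpartition splits on the last separator, so words that contain the
    separator survive; items with no separator are skipped.
    """
    pairs = []
    for item in line.split():
        word, found, tag = item.rpartition(sep)
        if found and word:
            pairs.append((word, tag))
    return pairs


def words_with_tag(pairs, tag_prefix):
    """Frequency of (lowercased) word forms whose tag starts with `tag_prefix`."""
    return Counter(w.lower() for w, t in pairs if t.startswith(tag_prefix))


def suffix_share(counter, suffix):
    """Proportion of tokens in `counter` whose form ends in `suffix`."""
    total = sum(counter.values())
    hits = sum(n for w, n in counter.items() if w.endswith(suffix))
    return hits / total if total else 0.0


line = "She_PPHS1 quickly_RR ran_VVD home_RL and_CC quietly_RR left_VVD soon_RR"
adverbs = words_with_tag(parse_tagged(line), "RR")
print(adverbs)                        # counts for quickly, quietly, soon
print(suffix_share(adverbs, "ly"))    # 2 of 3 adverb tokens end in -ly
```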

Week 5 

  11/1;3

 

Corpora and Grammar: Syntactic variation and grammatical constructions

 

Reading assignments:

 

Gries & Stefanowitsch 2004.

Tao & McCarthy 2001

Thompson 2002.

WS Tools: Working with tags

  

Interactive tagging tool: ACWT

 

Work assignment:

Try NoteTab Light/ACWT for interactive coding of a syntactic/text property in your corpus.

 

Week 6 

  11/8;10

 

Corpora & discourse-pragmatics

 

Reading assignments:

MXT: Units 10.10-14;

Carter & McCarthy 2004;

Tao 2003b.

1) WS Tools: Working with tags (continued);

2) Interactive text processing (with NoteTab Light);

3) Technology consultation.

 

Work assignment:

Continue working with NoteTab Light/ACWT to code syntactic/text properties in your corpus. This will become part of your final corpus data set.

Week 7 

 11/15;17
 

Corpora and diachronic change

 

Reading assignments:

MXT: Unit 10.7;

Banks 2005;

Holmes 1994; 

Biber & Burges 2000.

 

Relational database programs: Importing and building a database. (Explore Susanna Cumming's Databases for Linguists)
From concordance to database

Introduction to MS Access (part of the MS Office suite); coding and querying

 

Work assignments:

(1) Coding your data with a DB program (e.g. MS Access).

(2) Continue working on your mini-corpus. Your data set should include: a) the raw data; b) a POS- and/or morphologically tagged data set; c) a subset with textually marked data; and d) a DB data table. Due 11/29.
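The step from concordance to database can also be illustrated without MS Access. The sketch below substitutes SQLite through Python's built-in `sqlite3` module (a swapped-in tool, not the one used in class). The table layout, the `complement` coding field, the five-column tab-separated export assumed by `load_tsv`, and the sample rows are all hypothetical. Each concordance hit becomes one row, a coding column holds the analyst's category, and an SQL `GROUP BY` query tallies the codes, much as a grouping query in Access would.

```python
import csv
import sqlite3


def build_db(rows, path=":memory:"):
    """Create a table of coded concordance hits and load `rows` into it."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS hits (
               id INTEGER PRIMARY KEY,
               text_id TEXT,
               left_ctx TEXT,
               node TEXT,
               right_ctx TEXT,
               complement TEXT  -- hypothetical coding category
           )"""
    )
    conn.executemany(
        "INSERT INTO hits (text_id, left_ctx, node, right_ctx, complement) "
        "VALUES (?, ?, ?, ?, ?)",
        rows,
    )
    conn.commit()
    return conn


def load_tsv(path):
    """Read rows from a tab-separated export with five columns (assumed layout)."""
    with open(path, encoding="utf-8", newline="") as f:
        return [tuple(r) for r in csv.reader(f, delimiter="\t") if len(r) == 5]


def tally(conn):
    """Count hits per node word and coding category."""
    return conn.execute(
        "SELECT node, complement, COUNT(*) FROM hits "
        "GROUP BY node, complement ORDER BY node, complement"
    ).fetchall()


sample = [
    ("conv01", "I don't", "remember", "that at all", "NP"),
    ("conv01", "do you", "remember", "when we met", "clause"),
    ("conv02", "oh I", "forget", "", "none"),
]
conn = build_db(sample)
for node, code, n in tally(conn):
    print(node, code, n)
```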

Week 8 

  11/22

 

Corpora and social ideology: linguistic coding of responsibility

 

Reading assignments:

Stubbs 1996, Chpt 6

 

11/24 Thanksgiving - no class

 

Week 9 

  11/29;12/1
 

Corpora and language teaching

 

Reading assignments:

MXT: Unit 10.8;

Stevens 1995;

Gabrielatos 2005.

 

Due: Data set.

 

Vance Stevens's page

More examples in a number of foreign languages along with other kinds of information

Classroom concordancing: sample plans for teaching grammar and writing (TALC handout)

 

Work assignment:

Prepare for your final project report next week (12/6;8).

 

Week 10 

  12/6;8
 

Class Presentations: Final report  


 

Thursday, 12/15, 12:00: Final paper due

 

 


 

 Last revised: Sept. 16, 2005.