A Corpus Worker’s Toolkit

© Hongyin Tao 2005
A Corpus Worker's Toolkit (ACWT) is a collection of NoteTab clips, Perl scripts,
and other utilities for Chinese and English text processing. They support
quick-and-dirty corpus and discourse linguistic work for those who cannot
afford sophisticated but expensive commercial software. Most of these tools
function like macros in a word processor, but they can do much more, and they
run in a relatively simple text-processing environment. Major tools included
in the Toolkit so far:

Text Utilities 文本处理
- Merge Files
- HTML<-->Text Conversion
- Tagged Text --> Plain Text Conversion
- File comparison/sizes/counts/split/join
- Character Spacing/Word Segmentation/POS Tagging

Search & Analysis 检索统计
- Basic Chinese Concordance
- Basic English Concordance
- Word List/Frequency
- Mutual Info/T-Scores/Z-Score/Log-likelihood
- Normed Freq/Ratio/Lexical Density

Interactive Text Tagging 互动加码
- L2 Errors - The CLEC Tags
- Discourse Structure - Samples
- Semantics & Pragmatics - Samples
- Sociolinguistics - Samples
- Syntax - Samples

Discourse Transcription 口语转写
- The Du Bois (updated) System (Aug-2005)
- Header Info
- Intonation Units/Sequence
- Manners: Voice/Prosody
- Metatranscription

- User Guide (Eng, Chin-GB, Chin-Big5)
- Download
- Support forum: http://www.corpus4u.com
- Email: < ht_ling at sbcglobal dot net >
- Last updated:
- Upgrade Information (