Using corpus analysis software to analyse specialised texts
1. What is a corpus?
In corpus linguistics, a corpus (plural "corpora") can be broadly defined as "... a collection of naturally-occurring texts in a computer-readable format which can be retrieved and analyzed using corpus analysis software".
2. Sources of language corpora
- Subscribe to a large corpus provider such as the British National Corpus (BNC).
- Use web concordancers:
  - http://corpus.leeds.ac.uk/protected/query.html/ (general corpus; English)
  - http://corpus.byu.edu/ (general corpus; American/British English)
  - http://lextutor.ca/conc/eng/ (general and specialised corpora; English)
  - http://www.arts.chula.ac.th/~ling/TNCII/ (general corpus; Thai)
  - http://www.arts.chula.ac.th/~ling/ParaConc/ (English-Thai parallel concordance)
- Compile your own corpora and analyse the data using corpus analysis software:
  - AntConc (http://www.antlab.sci.waseda.ac.jp/software.html/) for monolingual corpora
  - WordSmith (http://www.lexically.net/wordsmith/) for monolingual corpora
  - ParaConc (http://www.athel.com/para.html) for multilingual corpora
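Web concordancers and tools such as AntConc are built around the key-word-in-context (KWIC) display. The minimal Python sketch below shows the idea for a single plain-text string; the regular-expression tokenizer, the function names and the `width` parameter (words of context on each side) are illustrative assumptions, not how AntConc or WordSmith work internally.

```python
import re


def tokenize(text):
    # Assumed tokenizer: lower-cased runs of word characters, hyphens and apostrophes
    return re.findall(r"[\w'-]+", text.lower())


def kwic(text, node, width=4):
    """Return (left context, node word, right context) for each occurrence of `node`."""
    tokens = tokenize(text)
    node = node.lower()
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines


if __name__ == "__main__":
    sample = ("The corpus was compiled from journal articles. "
              "A specialised corpus is often small. Each corpus needs a clear design.")
    # Right-align the left context so the node words line up in one column
    for left, node, right in kwic(sample, "corpus", width=3):
        print(f"{left:>30} [{node}] {right}")
```

Sorting the returned tuples by their right-hand context, e.g. `sorted(lines, key=lambda line: line[2])`, roughly mimics the sorting options that concordancers provide.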
3. Designing a specialised corpus
Corpus size
- There are no fixed rules; the size depends on the research purpose and on the data and time available.
- Large, general corpora may be less useful than small, focused corpora when the searches target context-specific terms.
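Corpus size is usually reported in tokens (running words) and types (distinct word forms). A minimal sketch, assuming a self-compiled corpus stored as UTF-8 `.txt` files in one folder and a simple regular-expression tokenizer, so the counts will differ slightly from those reported by AntConc or WordSmith:

```python
import re
from pathlib import Path


def tokenize(text):
    # Assumed tokenizer: lower-cased runs of word characters, hyphens and apostrophes
    return re.findall(r"[\w'-]+", text.lower())


def load_corpus(folder):
    """Read every .txt file in `folder` (assumed UTF-8) into {filename: text}."""
    return {path.name: path.read_text(encoding="utf-8")
            for path in sorted(Path(folder).glob("*.txt"))}


def corpus_size(texts):
    """Return (number of texts, total tokens, distinct types) for {name: text}."""
    all_tokens = [tok for text in texts.values() for tok in tokenize(text)]
    return len(texts), len(all_tokens), len(set(all_tokens))
```

For example, `corpus_size(load_corpus("my_corpus"))` returns the three counts for a folder; `my_corpus` is a hypothetical folder name.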
Text extracts vs full texts
- The choice depends on the aim of corpus compilation.
- Whole texts offer more coverage, because the words or terms under study may be scattered unevenly throughout a text, so an extract can miss them.
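To check how a term is spread through a full text, and therefore what an extract might miss, one option is to split the text into equal segments and count the hits in each. This is similar in spirit to AntConc's Concordance Plot; the segment count, tokenizer and function names below are illustrative assumptions.

```python
import re


def tokenize(text):
    # Assumed tokenizer: lower-cased runs of word characters, hyphens and apostrophes
    return re.findall(r"[\w'-]+", text.lower())


def dispersion(text, term, segments=5):
    """Count occurrences of `term` in each of `segments` equal-length slices of the text."""
    tokens = tokenize(text)
    counts = [0] * segments
    if not tokens:
        return counts
    term = term.lower()
    for i, tok in enumerate(tokens):
        if tok == term:
            # Map the token position 0..n-1 onto a segment index 0..segments-1
            counts[i * segments // len(tokens)] += 1
    return counts
```

For the ten-token text `"term a b c d e f g h term"`, `dispersion(text, "term", 5)` gives `[1, 0, 0, 0, 1]`: an extract taken from the first fifth of the text would find only one of the two occurrences.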
Number of texts
- A choice can be made between collecting a few large texts or a larger number of smaller texts.
Writers
- A choice can also be made between texts written by one or two key sources and texts retrieved from different sources or written by different authors.
- This depends on your research focus, e.g. whether you want to study overall language use or the idiosyncratic linguistic choices preferred by particular writers.
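When deciding between many short texts or a few long ones, and between one writer and many, it can help to compare normalised frequencies per text together with the range (the number of texts a word occurs in). A term with a high frequency but a range of 1 may reflect one writer's preference rather than general usage in the field. A minimal sketch; the per-1,000-token normalisation, the tokenizer and the function names are assumptions:

```python
import re


def tokenize(text):
    # Assumed tokenizer: lower-cased runs of word characters, hyphens and apostrophes
    return re.findall(r"[\w'-]+", text.lower())


def frequency_by_text(texts, term, per=1000):
    """For {name: text}, return {name: occurrences of `term` per `per` tokens}."""
    term = term.lower()
    result = {}
    for name, text in texts.items():
        tokens = tokenize(text)
        hits = tokens.count(term)
        # Normalise so texts of different lengths can be compared
        result[name] = hits / len(tokens) * per if tokens else 0.0
    return result


def text_range(texts, term):
    """Number of texts in which `term` occurs at least once."""
    term = term.lower()
    return sum(1 for text in texts.values() if term in tokenize(text))
```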
Medium
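- Texts can be spoken, written or a mixture of both.
- The choice depends on the research questions.
- Practical factors should also be considered; for example, spoken texts have to be transcribed before they can be analysed with corpus analysis software.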
Subject and text type
- The corpus should mainly focus on the specialised texts under investigation, although this is less clear-cut for multidisciplinary subjects.
- Texts may come from different subjects if the research focuses on particular language features rather than on term extraction.
- Text types within a specialised subject field may range from 'expert-to-expert' texts to 'expert-to-non-expert' texts.
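For term extraction, one common approach is a keyword list: words that are significantly more frequent in the specialised corpus than in a general reference corpus. The sketch below scores words with the log-likelihood (G2) statistic, a keyness measure offered by tools such as AntConc and WordSmith. The tokenizer, the function names and the choice to keep only positive keywords are assumptions, and real tools apply further thresholds (minimum frequency, significance cut-offs).

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumed tokenizer: lower-cased runs of word characters, hyphens and apostrophes
    return re.findall(r"[\w'-]+", text.lower())


def log_likelihood(a, b, c, d):
    """G2 for a word seen a times in a study corpus of c tokens
    and b times in a reference corpus of d tokens."""
    e1 = c * (a + b) / (c + d)  # expected frequency in the study corpus
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference corpus
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll


def keywords(study_text, reference_text, top=10):
    """Return the `top` (word, G2) pairs that are overused in the study text."""
    study = Counter(tokenize(study_text))
    ref = Counter(tokenize(reference_text))
    c, d = sum(study.values()), sum(ref.values())
    if c == 0 or d == 0:
        return []
    scored = []
    for word, a in study.items():
        b = ref.get(word, 0)
        # Keep only words relatively more frequent in the study corpus
        if a / c > b / d:
            scored.append((word, log_likelihood(a, b, c, d)))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top]
```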
Other considerations
- Authorship
- Language
- Publication date
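Authorship, language, publication date and text type are easier to control if each text in the corpus carries a small metadata record. A minimal sketch using a Python dataclass; the field names and the `select` helper are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class TextRecord:
    """Metadata for one corpus text (hypothetical fields)."""
    filename: str
    author: str
    language: str
    year: int       # publication year
    text_type: str  # e.g. "expert-to-expert" or "expert-to-non-expert"


def select(records, author=None, language=None, from_year=None, text_type=None):
    """Keep the records that satisfy every criterion that is not None."""
    chosen = []
    for r in records:
        if author is not None and r.author != author:
            continue
        if language is not None and r.language != language:
            continue
        if from_year is not None and r.year < from_year:
            continue
        if text_type is not None and r.text_type != text_type:
            continue
        chosen.append(r)
    return chosen
```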
4. Sources of specialised texts
- Printed materials (these must first be converted into computer-readable text, e.g. by scanning with OCR software)
- Word document texts
- CD-ROMs
- Texts on the Web
- Online databases
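Texts on the Web usually have to be stripped of HTML markup before a concordancer can use them. A minimal standard-library sketch, assuming the page is UTF-8 encoded; it keeps all visible text, so navigation menus and other page furniture will still need manual cleaning, and the function names are illustrative:

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class _TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0  # depth inside script/style elements

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())


def html_to_text(html):
    """Return the visible text of an HTML string, one text chunk per line."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


def fetch_text(url):
    """Download a page (assumed UTF-8) and return its plain text."""
    with urlopen(url) as response:
        html = response.read().decode("utf-8", errors="replace")
    return html_to_text(html)
```

The result of `html_to_text` can be saved as a `.txt` file and loaded into AntConc like any other corpus file.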
5. Getting started with AntConc
- Download the latest version of AntConc and watch the YouTube tutorials from http://www.antlab.sci.waseda.ac.jp/antconc_index.html
- Create a specialised corpus profile.
- Do small-scale research on your own specialised corpora, e.g. using corpora to do research in ESP (English for Specific Purposes).
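A corpus profile could summarise the number of texts, tokens and types, the type/token ratio and the most frequent words, much like the figures AntConc's Word List tool reports. A minimal sketch under a simple regular-expression tokenizer assumption; the figures will not match AntConc exactly, and the `top` parameter and function name are illustrative:

```python
import re
from collections import Counter


def tokenize(text):
    # Assumed tokenizer: lower-cased runs of word characters, hyphens and apostrophes
    return re.findall(r"[\w'-]+", text.lower())


def corpus_profile(texts, top=10):
    """Summarise {name: text}: texts, tokens, types, type/token ratio, top words."""
    counts = Counter()
    for text in texts.values():
        counts.update(tokenize(text))
    tokens = sum(counts.values())
    types = len(counts)
    return {
        "texts": len(texts),
        "tokens": tokens,
        "types": types,
        "ttr": types / tokens if tokens else 0.0,
        "top_words": counts.most_common(top),  # (word, frequency) pairs
    }
```

Because the type/token ratio tends to fall as a corpus grows, it is most meaningful when comparing corpora of similar size.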