
CORPORA AND TRANSLATION STUDIES



The use of corpora in the discipline of translation studies is enjoying increasing popularity. Van Doorslaer (1995) investigates the requirements for compiling corpora that will be useful for descriptive translation studies. Researchers such as Baker (1996), Laviosa (1998) and Kenny (this volume) are using corpora to study the phenomenon of translation by carrying out descriptive studies which have resulted in the identification of a variety of universal features of translated texts. McEnery and Wilson (1993) discuss the potential of corpora in applications such as machine translation and other areas of natural language processing. Malmkjaer (this volume), Peters and Picchi (1998), and Danielsson and Ridings (1996) consider the potential of bilingual and parallel corpora as a source of translation equivalents.

While descriptive translation studies and computer applications for corpora are interesting and important areas of research, they cannot directly help us to remedy the problems outlined above.5 The merits of using bilingual and parallel corpora for translation purposes are obvious; however, the problem here is that there are relatively few corpora of this type available (certainly not enough to cover the wide variety of subject fields that translators have to deal with), and it is more difficult to create this type of corpus because it needs to be aligned.6 Therefore, we decided to conduct a pilot study to see whether a specialized monolingual native-language corpus, which is much simpler to compile, could be a useful translation resource.

 

 

3. PILOT STUDY

This pilot study consists of a translation experiment, carried out at Dublin City University in March 1997, that used a specialized monolingual native-language corpus.

 

3.1. Hypotheses

We hypothesize that a monolingual native-language corpus composed of texts in a specialized subject field can be a useful resource for translators translating into their native language. We expect it to be more useful than conventional resources (e.g. specialized monographs, journals, or monolingual dictionaries) in the following ways:

 

a) It will help translators to increase their understanding of the subject field and thereby make fewer errors resulting from lack of subject comprehension.


 

 

4 Meta, XLIII, 4, 1998

 

 

Conventionally, translators have been encouraged to do some background reading in order to become familiar with a subject field before starting the translation. In this respect, an electronic corpus offers several advantages over a conventional printed corpus.

Firstly, it is often time-consuming to physically gather together a printed corpus (trips to various libraries and documentation centres, hours at the photocopier, etc.), with the result that translators often have time to gather only a handful of documents. An electronic corpus containing hundreds or even thousands of documents can often be compiled in a shorter time, thus giving the translator more information to consult. Furthermore, once compiled, an electronic corpus has the advantage of being reusable for other translations in the same subject area.

Once gathered, a conventional corpus also takes longer to consult. As observed by Miller (1993: 8):

 

Popular science texts and magazines are sometimes helpful in providing explanations of key concepts in a given field. Specialized publications — including books, textbooks, journals, manuals and papers — can be valuable sources of specific technical information. The main problem with these latter sources is determining and then locating the most appropriate items. The bibliography or references contained in the source text can sometimes help narrow down the search, but all too often the translator still finds himself rapidly scanning technical material, hoping to stumble onto a discussion of a relevant point.

 

In contrast, translators using an electronic corpus coupled with a corpus analysis tool can go directly to areas of text containing key words.7 They can then read as much or as little of the discussion as desired, from a sentence to a couple of paragraphs to the entire text. Moreover, as mentioned above, they have a larger selection of texts available for consultation, so if one explanation is not particularly helpful, they can quickly move on to the next.
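The idea of going directly to the areas of text that contain a key word, and then reading as much or as little context as desired, can be illustrated with a minimal keyword-in-context sketch. This is not the corpus analysis tool used in the experiment; the function name `kwic`, the `width` parameter, and the sample sentence are assumptions made purely for illustration.

```python
import re


def kwic(text, keyword, width=40):
    """Return one keyword-in-context line per occurrence of `keyword`.

    `width` is the number of characters of context shown on each side;
    widening it corresponds to reading more of the surrounding discussion.
    """
    lines = []
    # Match the keyword as a whole word, ignoring case.
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so the keywords line up in a column.
        lines.append(f"{left:>{width}} [{match.group(0)}] {right}")
    return lines


# Invented sample text, not taken from the corpus described in this paper.
sample = (
    "A flatbed scanner moves a light source beneath the glass. "
    "The scanner then converts reflected light into digital data."
)

for line in kwic(sample, "scanner", width=25):
    print(line)
```

Calling `kwic` with a larger `width` returns more of each passage, which mirrors the choice between reading a single sentence and reading a couple of paragraphs.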

It is worth noting that specialized dictionaries are generally easy to find and consult; however, a common drawback here is that concepts are often treated in isolation, and it is difficult for the non-expert (i.e. the translator) to draw out the implicit relations between them in order to get an overview of the subject field as a whole.8 In a cohesive text, such as that found in a corpus, it is generally easier to understand how the concepts in a subject field relate to one another.

 

b) It will help translators to make fewer errors related to target language production.

 

Something often overlooked is that it is not enough for translators to understand the concepts and identify the equivalent terms: they also need information about how to properly use specialized terminology. One of the simplest and most useful types of usage information is collocational information. Collocations are combinations of words that, according to the conventions of a given language, are habitually associated (Sinclair 1991: 170). Some specialized dictionaries do provide limited collocational or contextual information; however, the translator inevitably wants or needs more. If translators really want to know what the usage conventions are for a term in the domain they are working in, professional literature is the best source for supplying the answer.

 

Putting a word in context means breathing life into it. Taking a word out of context is like stuffing an animal. If you want to know something about animals you may learn a lot from looking at a stuffed specimen. You may even learn more by dissecting it, but if you want to know about the behaviour of animals you must study them in their natural environment. If you want to know how words behave you must study them in their natural environment too, and the natural environment of words is text, context. Finding information in "live"


 

 

NATIVE-LANGUAGE CORPORA AS A TRANSLATION RESOURCE 5

 

 

text means plodding through reams of books, newspaper, and more than anything else, professional literature. Not the kind of task that combines well with completing a translation on a tight deadline. (Roumen and van der Ster 1993: 215-216)

 

In addition to being time-consuming, detecting linguistic patterns is actually more difficult when working with a conventional corpus: the translator may simply not notice a pattern when its occurrences are spread over several pages, or even several documents. The concordancing feature of a corpus analysis tool (see section 3.5), however, can quickly bring together all occurrences of a given pattern. Moreover, through statistical information, corpus analysis tools can also provide an indication as to the centrality of the pattern (i.e. whether it is just one author's idiosyncratic usage, or a well-accepted pattern in expert discourse).
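One simple way to approximate the "centrality" of a pattern is to count not only how often a word occurs near a term, but also in how many different documents it does so. The sketch below makes that distinction explicit. It is an illustration only: the function name `collocates`, the `window` parameter, the crude tokenizer, and the three sample documents are all assumptions, and real corpus analysis tools typically report more refined statistics.

```python
import re
from collections import Counter


def collocates(documents, node, window=3):
    """Count the words found within `window` tokens of `node`.

    Returns two counters: total occurrences of each neighbouring word,
    and the number of documents in which that word appears near `node`.
    """
    total = Counter()
    doc_spread = Counter()
    for doc in documents:
        # Crude tokenizer (an assumption): lowercase alphabetic strings only.
        tokens = re.findall(r"[a-z]+", doc.lower())
        seen_here = set()
        for i, tok in enumerate(tokens):
            if tok != node:
                continue
            start = max(0, i - window)
            neighbours = tokens[start:i] + tokens[i + 1:i + 1 + window]
            for word in neighbours:
                total[word] += 1
                seen_here.add(word)
        # Each document contributes at most 1 to a word's spread.
        doc_spread.update(seen_here)
    return total, doc_spread


# Invented documents, not taken from the corpus described in this paper.
docs = [
    "the flatbed scanner captures the image",
    "a flatbed scanner is slower than a handheld scanner",
    "this scanner digitizes pages",
]

total, spread = collocates(docs, "scanner", window=1)
for word, count in total.most_common():
    print(word, count, "occurrence(s) in", spread[word], "document(s)")
```

A collocate that occurs often but in only one document may reflect a single author's habit, whereas one spread across many documents is more likely to be an accepted usage.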

 

3.2. Participants

When choosing participants for our experiment, we had several criteria in mind:

 

a) We are involved in translator training and one of the aims of this study is to try and improve the quality of our students' translations; therefore, we wanted the participants to be students.

b) However, because we wanted to test the potential of the corpus for improving specific skills (i.e. subject-field understanding and specialized native-language competence), we felt the participants should not be entirely novice, but rather should be familiar with the basic principles of translation and should have followed several courses in translation practice and specialized translation.

c) Because one of our objectives is to see if the corpus can help translators improve their specialized native-language competence, we wanted all the participants to be native speakers of English.

d) Because the experiment is not intended to help students improve their foreign language skills, we wanted to ensure that the participants already had a reasonably good grasp of the source language (in this case, French). We hoped that this would minimize the number of errors caused by linguistic comprehension problems.

e) In order to reflect what we consider to be the prototypical situation for most translators, we felt that the participants should be comfortable with the general subject area, but should not be true experts on the specific subject matter of the text (see section 3.3).

f) Because part of the experiment involved using an electronic corpus and corpus analysis tool, we wanted the participants to be computer literate and able to manipulate a corpus using a corpus analysis tool.

 

With these criteria in mind, we asked fourteen students from the Applied Computational Linguistics programme at DCU to participate in the experiment. We felt these students would be ideal candidates for the experiment for the following reasons: all of the students were native speakers of English; all were in their final year of undergraduate studies and had followed a minimum of 3 courses in translation and 6 courses in French language; all had spent a year at a university in either France or Belgium; all were familiar with the general field of computing, but none were experts in the subfield of optical scanning technology; all had followed a course in corpus linguistics where they had learned to manipulate corpora using a corpus analysis tool.


 

 


 

 

3.3. Texts

Similarly, we had several criteria in mind when choosing the texts for translation.

 

a) We felt that the texts should treat a subject field that falls within both our general area of interest and expertise (because we had to evaluate the translations), and also that of the students (as described above).

b) We felt that the texts should be specialized, but not highly technical because the participants were students and not professional translators.

c) Because we were relying on the good will of the students who volunteered their time to participate, we felt the texts should cover a subject coherently without exceeding 300 words in length.

 

Bearing these criteria in mind, we chose two extracts from an article on optical scanners which appeared in the French journal Science et Vie Micro in January 1991. This is a popular science magazine, so although its contents are specialized, they are not highly technical. The first text (282 words) was a description of the way that a flatbed scanner operates. The second text (301 words) contained a brief description of three different types of scanners, and a discussion about the advantages and disadvantages of each. We felt these texts were suitable because they dealt with a general area with which the students were comfortable and familiar (i.e. computing), but not with a subject that they had dealt with specifically in either their computing or translation classes. The concepts, terms, and discourse style would therefore not be immediately familiar to them, and we anticipated that they would have to make use of the resources provided. Moreover, optical scanning is an area in which we have conducted terminological research in the past (Bowker 1995), so we felt competent to evaluate their work. The source texts used in the experiment can be found in appendix A.

 

3.4. Resources9

Any issues relating to the students' foreign-language skills were beyond the scope of this study; therefore, all students, whether using the conventional monolingual resources or the corpus, were allowed to use general bilingual dictionaries during the experiment. The dictionaries provided included Robert-Collins, Oxford-Hachette, and Harrap's.

 

3.4.1. Conventional monolingual resources

The conventional monolingual resources which we allowed the students to consult included both lexicographic and non-lexicographic resources.

 

3.4.1.1. Lexicographic resources

 

• Oxford English Dictionary;

• five specialized dictionaries and glossaries relating to computing.

 

3.4.1.2. Non-lexicographic resources

 

• an encyclopaedia of electronics;

• a user manual for an optical scanner;

• an article on scanners and optical character-recognition from a popularized computer journal;

• a monograph on desktop publishing.


 

 


 

 

3.4.2. Electronic corpus

Our electronic corpus was extracted from a series of commercially available CD-ROMs called Computer Select (Ziff-Davis Publishing, Computer Library, NY).10 Each disc contains English-language articles from several hundred publications dealing with a wide range of computer-related topics.11

In Bowker (1996), we discussed several criteria that must be met in order to compile a specialized corpus that is balanced and representative for the purposes of terminography, and we feel that these criteria can also be extended to the case of specialized translation. Table 1 outlines these criteria and evaluates the potential of Computer Select for meeting them. The potential of Computer Select for fulfilling the specific criterion text type is further elaborated in tables 2 and 3.

Although Computer Select does not perfectly meet all the criteria required to make a corpus a completely balanced and representative source for the purposes of specialized translation, we feel that it meets enough of these criteria to justify using it as a source for our corpus. As pointed out by Atkins et al. (1992: 6):

 

... we have found any corpus — however "unbalanced" — to be a source of information and indeed inspiration. Knowing that your corpus is unbalanced is what counts. It would be short-sighted to wait until one can scientifically balance a corpus before starting to use one, and hasty to dismiss the results of corpus analysis as "unreliable" or "irrelevant" simply because the corpus used cannot be proved to be balanced.

 

3.4.2.1. Compilation procedure for the electronic corpus

As mentioned above, each Computer Select disc comprises thousands of articles taken from hundreds of journals dealing with a wide range of computer-related topics. All the articles in Computer Select are indexed according to key words.

We had previously done some terminology work in the subject field of optical scanners (Bowker 1995), so as a first step to compiling our corpus, we consulted our previous research to see what important terms we had identified in the field. Next, we used these terms as key words, instructing the computer to select all articles indexed according to these key words and to concatenate them into one file.14 We repeated this step for each of the discs spanning the time period May 1989 to February 1995. As illustrated in table 4, the end result was four files which contained a total of approximately 1.5 million words and constituted almost 10 MB of data.
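The selection-and-concatenation step described above can be sketched in a few lines. The retrieval interface of Computer Select is not described in this paper, so everything in the sketch is hypothetical: the record structure (a dictionary with `keywords` and `text` fields), the function name `compile_corpus`, and the three sample articles are assumptions chosen only to show the logic of selecting indexed articles and joining them into one file.

```python
def compile_corpus(articles, key_terms):
    """Concatenate the text of every article indexed under any key term.

    `articles` is a list of dicts with hypothetical "keywords" and "text"
    fields; `key_terms` is the list of terms used as search keys.
    """
    wanted = {term.lower() for term in key_terms}
    selected = [
        article["text"]
        for article in articles
        # Keep the article if its index terms overlap with the search terms.
        if wanted & {keyword.lower() for keyword in article["keywords"]}
    ]
    # A blank line separates articles in the concatenated output.
    return "\n\n".join(selected)


# Invented records, not taken from Computer Select.
articles = [
    {"keywords": ["optical scanner", "OCR"],
     "text": "Flatbed scanners use a moving light source."},
    {"keywords": ["spreadsheet"],
     "text": "Spreadsheets organise figures in cells."},
    {"keywords": ["Optical Scanner"],
     "text": "Handheld scanners are dragged across the page."},
]

corpus = compile_corpus(articles, ["optical scanner"])
print(len(corpus.split()), "words selected")
```

Repeating the call once per disc and writing each result to its own file would mirror the per-disc procedure described in the text.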

 






