Translation Systems
  Major historical developments
  Architectures
  Hybrid and interactive machine translation systems
  Online machine translation systems
  Commercial machine translation systems
  Reasons for using machine translation systems
  Conclusion
4 Computer-Aided Translation Tools and Resources
  Workbenches
  Translation support tools and resources
  Localization tools
  Commercial computer-aided translation tools
  Standards for data interchange
  Conclusion
5 Evaluating Translation Tools
  Machine translation systems
  Computer-aided translation tools
  Stakeholders
  Evaluation methods
  General frameworks for evaluating translation tools
  Conclusion
6 Recent Developments and Future Directions
  Machine translation systems
  Computer-aided translation tools
  Translation systems with speech technology
  Translation systems for minority languages
  Translation on the web
  Machine translation systems and the semantic web
  The localization industry
  Conclusion
7 Translation Types Revisited
  Relationships between topics and translation types
  Machine translation systems
  Computer-aided translation tools
  Conclusion
Appendices
References
Index

List of Figures, Tables and Boxes

Figures
1.1 Classification of translation types
1.2 Machine translation model
1.3 Machine translation system based on usage
1.4 Human-aided machine translation model
1.5 Machine-aided human translation model
2.1 Chronology of translation theories
2.2 Translation process model
2.3 Example of sentence representations
2.4 Holmes schema of translation studies
2.5 A schema of applied translation studies
2.6 A model of the translation process including pre- and post-editing tasks
2.7 Example of an English SL text and its pre-edited version
2.8 Unedited and post-edited Spanish machine translation output
2.9 Example of natural and controlled languages
2.10 Example of original English text and its AECMA simplified English version
2.11 Example of natural English, simplified English and simplified Arabic texts
2.12 Example of an English controlled language text and its translations
2.13 Illustration of the translation process using a machine translation system
3.1 Chronology of machine translation development
3.2 Example of structural representations
3.3 Machine translation architectures
3.4 Direct translation model
3.5 Interlingua model
3.6 Interlingua multilingual machine translation system model
3.7 Transfer model
3.8 Transfer using tree-to-tree parsing
3.9 Transfer multilingual machine translation system model
3.10 Statistical-based model
3.11 Probabilities workflow in the statistical-based approach
3.12 Example-based model
3.13 Translations by online machine translation systems
4.1 Example of HTML code in a web page
4.2 Example of the web page without HTML code
4.3 Example of a translation workflow using a translation memory system
4.4 Example of an English source text
4.5 Pre-translation 1
4.6 Database model in translation memory systems
4.7 Reference model in translation memory systems
4.8 Flowchart to illustrate how to build a parallel corpus
4.9 Example of a text header in a corpus
4.10 Example of part-of-speech tagging
4.11 Example of a concordance for the word round
4.12 Types of tool used in a localization project
4.13 Example of the translation process using a machine translation system, a translation database and a terminology database
4.14 Example of TMX data-sharing
4.15 Example of a header in TMX
4.16 Example of a body in TMX
4.17 Example of a header in TBX
4.18 Example of a body in TBX
4.19 Example of XLIFF in the localization process
4.20 Example of a header in XLIFF
4.21 Example of a body in XLIFF
4.22 Example of an alternate translation element in XLIFF
5.1 Example of a glass-box evaluation
5.2 Example of a black-box evaluation
5.3 Example of an evaluation process
5.4 Standardization projects for evaluating machine translation systems
5.5 EAGLES general evaluation framework
6.1 Future-use model of translation technology
6.2 Speech technology in translation

Tables
1.1 An example of a table for describing translation types
3.1 Example of a word entry in KAMI
3.2 Imitation in the example-based approach
3.3 Semantic similarity in the example-based approach
3.4 Classification of commercial machine translation systems
4.1 Example of perfect matching
4.2 Examples of fuzzy matching
4.3 Higher and lower threshold percentages for fuzzy matching
4.4 Examples of matching suggestions for bow
4.5 Example of segments
4.6 Example of translation units
4.7 Example of English-French translation units from a database
4.8 Classification of commercial computer-aided translation tools
7.1 Degree of automation
7.2 Human intervention
7.3 Integrated tools
7.4 Application of theory
7.5 Application of theory in machine translation systems
7.6 Source-language texts
7.7 Target-language texts
7.8 Stages of the translation process
7.9 Types of text
7.10 Language dependency
7.11 Types of source language
7.12 Data interchange standards in translation
7.13 Translation groups and data interchange standards
7.14 Levels of evaluation
7.15 Methods of evaluation
7.16 Features in a machine translation system
7.17 Language coverage in machine translation systems
7.18 Texts and computer-aided translation tools
7.19 Language dependency in computer-aided translation tools
7.20 Number of languages in computer-aided translation tools

Boxes
1.1 A translator at work
5.1 FEMTI evaluation framework

Series Editors' Preface

Recent years have witnessed momentous changes in the study of Modern Languages, globally as well as nationally. On the one hand, the rapid growth of English as a universal lingua franca has rendered the command of other languages a less compelling commodity.
On the other hand, the demand for intercultural mediators including translators and interpreters has grown as a result of many recent social, political and economic developments; these include legislative changes, the emergence of supranational organisations, the ease of travel, telecommunications, commercial pressures raising awareness of local needs, migration and employment mobility, and a heightened awareness of linguistic and human rights.
Today, linguistically oriented students wishing to pursue a career in which they are able to further their interest in languages and cultures would be more inclined to choose vocationally relevant courses in which translation and interpreting play an important part rather than traditional Modern Language degrees. Thus the possibilities for professional work in translation and interpreting have been extended, particularly as a result of developments in technology, whether as facilitating the translation process or as a means of dissemination and broadening access to communications in a range of media.
The role of translation is, for example, becoming increasingly important in the context of modern media such as television and cinema, whether for documentary or entertainment purposes.
And the technological possibilities for providing interpreting services, whether to the police officer on the beat or to the businessperson on a different continent, have extended the previously physically confined nature of mediating the spoken word. Not only do these new vistas open up opportunities for the professional linguist, they also point to expanding areas of research in Translation and Interpreting Studies. Practice and theory are of mutual benefit, especially in the case of a relatively young discipline such as Translation Studies.
As a result, the first aim of this series, written primarily for the MA and advanced undergraduate student, is to highlight contemporary issues and concerns in order to provide informed, theoretically based, accounts of developments in translation and interpretation.
The second aim is to provide ready access for students interested in the study and pursuit of Modern Languages to vocational issues which are of relevance to the contemporary world of translating and interpreting. The final aim is to offer informed updates to practising professionals on recent developments in the field impacting on their discipline.
Linguistic, Culture and Translation Studies
University of Surrey
Guildford
UK

GUNILLA ANDERMAN
MARGARET ROGERS

Acknowledgements

I am indebted to three individuals for their contributions. This book would have taken more time to complete if it had not been for Chooi Tsien Yeo who researched background information for me.
Words cannot express my gratitude to Stephen Moore for putting his experiences as a professional translator into writing in between translation deadlines. I am extremely indebted to Paul Marriott for his comments and suggestions, particularly for helping to visualize a new way to depict the multidimensional classification of translation types in Chapter 7.
I would especially like to acknowledge the Duke University Libraries and Institute of Statistics and Decision Science at Duke University for providing me with the environment and research facilities where most of this book was written. My thanks also go to the National University of Singapore Libraries, George Edward Library at the University of Surrey, and the Department of Statistics and Actuarial Science at the University of Waterloo for their help.
I would also like to acknowledge the following authors, publishers and organizations for allowing the use of copyright material in this book: John Hutchins, Harold Somers and Elsevier (Academic Press Ltd) for the classification of translation types in Chapter 1; Eugene Nida and the Linguistic Society of America for the translation process in Chapter 2; John Smart and Smart Communications, Inc.
for the controlled and simplified English samples in Chapter 2; Francis Bond and Takefumi Yamazaki for the KAMI Malay-English dictionary entry in Chapter 3; Paolo Dongilli and Johann Gamper for the building of a parallel corpus in Chapter 4; Tony Jewtushenko and Peter Reynolds of OASIS for XLIFF in Chapter 4; Enrique de Argaez at Internet World Stats for the statistical figure on the Internet population in Chapter 6; Michael Carl, Reinhard Schäler, Andy Way, Springer Science and Business Media, and Kluwer Academic Publishers for the model of the future use of translation technology in Chapter 6.
To Antonio Ribeiro, Tessadit Lagab, Margaret Rogers and Chooi Tsien Yeo, my most sincere thanks for translating from English into Portuguese, French, German and Chinese respectively. I am solely responsible for any translation errors that occurred. A special thank you goes to Elsie Lee, Shaun Yeo, Angeliki Petrits, Mirko Plitt and Ken Seng Tan for answering some of my queries. To Caroline, Elizabeth, Gillian and Lyndsay, thank you for helping out with keying in corrections on the earlier drafts. Lastly, to my sifu and friend Peter Newmark, a big thank-you for all the translation discussions we had during our coffee-biscuit sessions years ago.
If it had not been for the series editors, Gunilla Anderman and Margaret Rogers, this book would not have been written. I am forever grateful to both of them for their feedback and comments. Thanks to Jill Lake of Palgrave Macmillan for her patience and understanding due to my country-hopping from Southeast Asia to North America during the writing of this book.

Waterloo, Canada
CHIEW KIN QUAH

List of Abbreviations

ACRoTERMITE  Terminology of Telecommunications
AECMA  European Association of Aerospace Industries
AIA  Aerospace Industries Association of America
ALPAC  Automatic Language Processing Advisory Committee
ALPS  Automatic Language Processing System
ALT-J/C  Automatic Language Translator Japanese to Chinese
ALT-J/E  Automatic Language Translator Japanese to English
ALT-J/M  Automatic Language Translator Japanese to Malay
AMTA  Association of Machine Translation in the Americas
ASCC  Automatic Spelling Checker Checker
ASD  AeroSpace and Defence
ATA  American Translators Association
BASIC  British American Scientific International, Commercial
BLEU  Bilingual Evaluation Understudy
BSO  Buro voor Systeemontwikkeling
CAT  Computer-Aided Translation
CAT2  Constructors, Atoms and Translators
CESTA  Campagne d'Evaluation de Systèmes de Traduction Automatique
CFE  Caterpillar Fundamental English
CIA  Central Intelligence Agency
CICC  Center of International Cooperation for Computerization
CRATER  Corpus Resources and Terminology Extraction
CTE  Caterpillar Technical English
CULT  Chinese University Language Translator
DARPA  Defense Advanced Research Projects Agency
DBMT  Dialogue-based Machine Translation
DIPLOMAT  Distributed Intelligent Processing of Language for Operational Machine Aided Translation
DLT  Distributed Language Translation
DTS  Descriptive Translation Studies
EAGLES  Expert Advisory Group on Language Engineering Standards
EARS  Effective, Affordable Reusable Speech-to-Text
EDIG  European Defence Industries Group
ELDA  Evaluations and Language resources Distribution Agency
ELRA  European Language Resources Association
ENGSPAN  English Spanish Machine Translation System
ENIAC  Electronic Numerical Integrator and Computer
EURODICAUTUM  European Terminology Database
EUROSPACE  Aerospace and Defence Industries Association of Europe
EUROTRA  European Translation
EVALDA  Infrastructure d'EVALuation à ELDA
EWG  Evaluation Working Group
FAHQT/FAHQMT  Fully Automatic High Quality (Machine) Translation
FEMTI  A Framework for the Evaluation of Machine Translation in ISLE
GENETER  Generic Model for Terminology
GETA  Groupe d'Etude pour la Traduction Automatique
HAMT  Human-Aided/Assisted Machine Translation
HICATS  Hitachi Computer Aided Translation System
HT  Human Translation
HTML  HyperText Markup Language
IAMT  International Association of Machine Translation
IATE  Inter-Agency Terminology Exchange
INTERSECT  International Sample of English Contrastive Texts
ISI  International Statistical Institute
ISLE  International Standards for Language Engineering
ISO  International Organization for Standardization
JEIDA  Japan Electronic Industry Development Association
JEITA  Japan Electronics and Information Technology Association
JICST-E  Japan Information Center of Science and Technology
KAMI  Kamus Melayu-Inggeris (Malay-English Dictionary)
KANT  Knowledge-based Accurate Translation
KGB  Komitet Gosudarstvennoi Bezopasnosti
LDC  Linguistic Data Consortium
LISA  Localisation Industry and Standards Association
LMT  Logic-based Machine Translation
LTC  Language Technology Centre
LTRAC  Language Translation Resources Automatic Console
MAHT  Machine-Aided/Assisted Human Translation
MANTRA  Machine Assisted Translation
MARTIF  Machine Readable Terminology Interchange Format
MASTOR  Multilingual Automatic Speech-to-Speech Translator
MAT  Machine-Aided/Assisted Translation
METAL  Mechanical Translation and Analysis of Language
METU  Middle East Technical University
MLIR  MultiLingual Information Retrieval
MT  Machine Translation
NAATI  National Accreditation Authority for Translators and Interpreters Ltd.
NIST  National Institute of Standards and Technology
OASIS  Organization for the Advancement of Structured Information Standards
OCP  Oxford Concordance Programme
OCR  Optical Character Recognition
OLIF  Open Lexicon Interchange Format
OS  Operating System
OSCAR  Open Standards for Container/Content Allowing Re-use
PaTrans  Patent Translation
PAHO  Pan-American Health Organization
PDA  Personal Digital Assistant
PESA  Portuguese-English Sentence Alignment
RDF  Resource Description Framework
RFC  Request for Comments
SALT  Standards-based Access to Lexicographical & Terminological Multilingual Resources
SGML  Standard Generalised Markup Language
SPANAM  Spanish American Machine Translation System
SUSY  Saarbrücker ÜbersetzungsSYstem
SYSTRAN  System Translation
TAP  Think-Aloud Protocols
TAUM  Traduction automatique à l'Université de Montréal
TBX  TermBase eXchange
TEMAA  Testbed Study of Evaluation Methodologies: Authoring Aids
TGT-1  Text-into-Gesture Translator
THETOS  Text into Sign Language Automatic Translator for Polish
TMF  Terminological Markup Framework
TMX  Translation Memory eXchange
TOLL  Thai On-Line Library
TONGUES  Act II Audio Voice Translation Guide Systems
TS  Translation Studies
TTS  Theoretical Translation Studies
WebDIPLOMAT  Web Distributed Intelligent Processing of Language for Operational Machine Aided Translation
WebOnt  Web Ontology
WWW  World Wide Web
W3C  WWW Consortium
XLIFF  XML Localisation Interchange File Format
XLT  XML Representation of Lexicons and Terminologies
XML  Extensible or Extensive Markup Language

Introduction

For over half a century, the demand for a variety of translations by different groups of end-users has enabled many types of translation tools to be developed. This is reflected in the systems that will be discussed in this book, ranging from machine translation systems to computer-aided translation tools and translation resources.
The majority of books and articles on translation technology focusing on the development of these systems and tools have been written from the point of view of researchers and developers. More recent publications written with translators in mind have focused on the use of particular tools. This book is intended as an introduction to translation technology for students of translation. It can also be useful to professional translators and those interested in knowing about translation technology. A different approach is taken here: descriptions of particular tools are not provided; instead, the development of different machine translation and computer-aided translation tools and their uses are discussed.
Programming details and mathematical equations are not considered, except in the discussion of the statistical approach to machine translation where minimal essential formulae are included. Descriptions are given to allow readers to further investigate specific approaches or issues that might interest them, using references cited throughout the book. It is also important to note that no particular approach or design is deemed to be better than any other. Each has its own strengths and weaknesses. In many cases, readers will find that examples of systems and tools are given but this does not suggest that they are the best; they are simply examples to illustrate the points made.
While researching this book, I discovered that the majority of publications from the literature on translation technology are about the development of machine translation systems, primarily involving experimental systems developed or being developed at a number of universities and large commercial corporations across the globe. The book will show that many of these systems never achieved their commercial potential and remained as experimental tools, while some others served as tools for other natural-language processing applications. By contrast, not much literature seems to be available on computer-aided tools such as translation memory systems.
As we shall see in this book, most computer-aided translation tools are developed by commercial companies and, as a result, progress reports on these tools are rarely published in the public domain. Furthermore, to cater to different needs and demands, a tool like a translation memory system comes in many versions from the most basic to the most advanced. Insights into the use of these tools can be found in translator magazines and occasionally also posted on the World Wide Web (WWW). The evaluation of translation tools falls into a field that is well-researched. Again we will see that most of the literature focuses on the evaluation of machine translation systems.
Furthermore, the extensive use of translation tools and translation processes involved in the localization industry tends to be discussed separately, giving the impression that they are not related to translation. These two areas are, however, directly relevant to translation technology. Hence they are also included in this book. Essentially, the book contains what is felt should be included in order to provide an overview of translation technology. In order to keep the book at the given length, the topics have been carefully selected with some described in greater detail than others.
In some chapters, an abbreviated historical background has been deemed necessary in order to provide a better understanding of the topics discussed, especially in the description of the development of machine translation systems and their evaluation.
However, in all cases, references have been provided which readers may choose to pursue at a later time. Suggestions for further reading are provided at the end of every chapter (Chapters 1 to 6). The first chapter discusses the definitions of terms referring to the use of computers in translation activities. Some of the terms can be confusing to anyone who is unfamiliar with translation tools.
In some cases, the same translation tools are given different names depending on what they are used for; in other cases, a tool may be differently classified depending on the perspective of those who have developed that tool. The aim in this chapter is therefore to clarify these terminological and related matters.
An alternative perspective to the four basic translation types (fully automated high-quality machine translation, human-aided machine translation, machine-aided human translation, and human translation) first proposed by Hutchins and Somers (1992) is introduced to reflect current developments in translation technology. This will be explored in more detail in the final chapter where the four translation types are reviewed in relation to topics described in the book.
The second chapter discusses technology within the larger framework of Translation Studies as a discipline, focusing on the relationship between the engineering of translation technology, on the one hand, and Translation Studies including translation theory, on the other hand. The relationship between academic and professional groups involved in translation is also examined.
This in turn leads to a discussion of the involvement of a particular approach in linguistic theories known as formalisms in natural-language processing, especially in the design of machine translation systems. A different perspective on the translation process involving pre- and post-editing tasks using a special variety of language called controlled language is also presented.
This translation process is described using the translation model proposed by Jakobson (1959/2000), a translation model that differs significantly from the one proposed by Nida (1969). The third chapter gives detailed descriptions of different machine translation system designs also known as architectures.
The development of machine translation over several decades, its capabilities and the different types of machine translation systems, past and present, are also included. Both experimental and commercial systems are discussed, although the focus is on the experimental systems.
Even though machine translation has been well-documented elsewhere, a discussion is deemed to be important for this book. It is felt that modern-day professional translators should be informed about machine translation systems because there is every reason to believe, as we shall discover in Chapter 6, that future trends in translation technology are moving towards integrated systems where at least one translation tool is combined with another, as is already the case in the integration of machine translation with translation memory.
The fourth chapter describes the architectures and uses of several computer-aided translation tools, such as translation memory systems, as well as resources such as parallel corpora. Unlike machine translation systems, which are largely developed by universities, most computer-aided translation tools are developed by commercial companies. Thus, information about such tools is harder to obtain. This chapter will also show that computer-aided translation tools are becoming more advanced and using different operating systems, and so standards for data interchange have been created. Three different standards are described. Currently available commercial translation tools are also discussed.
In addition, this chapter presents an overview of other commercially available tools such as those used in the localization industry. The fifth chapter touches on the evaluation of translation technology. The discussion focuses on different groups of stakeholders from research sponsors to end-users. Also included in the discussion are the different methods of evaluation: human, machine, and a combination of human and machine as evaluator. The choice of method used depends on who the evaluation is for and its purpose. It also depends on whether an entire tool or only some components are evaluated. Also described in this chapter is the general framework of evaluation offered by various research groups in the USA and Europe.
The literature on evaluation concentrates on the evaluation of machine translation systems either during the developmental stage or after the process of development is completed. Less information is available on the evaluation of computer-aided translation tools. What is available is found mainly in translation journals, magazines and newsletters.
The sixth chapter presents some recent developments and shows the direction in which translation technology is heading, in particular regarding the future of machine translation systems that are now incorporating speech technology features. The integration of speech technology and traditional machine translation systems allows translation not only between texts or between stretches of speech, but also between text and speech.
This integration is proving to be useful in many specific situations around the globe especially in international relations and trade. This chapter also looks at research projects in countries that are involved in the development of translation tools for minority languages and discusses the problems encountered in developing machine translation systems for languages that are less well-known and not widely spoken.
Another form of technology called the Semantic Web that has the potential to improve the performance of certain machine translation systems is also described. Included in this chapter, too, are issues such as linguistic dominance and translation demands on the WWW that are already shaping parts of the translation industry.
The book concludes by presenting an expanded version of the four basic classifications of translation types as suggested by Hutchins and Somers (1992) and introduced in Chapter 1. It is concluded that the one-dimensional linear continuum originally proposed is no longer able to accurately reflect current developments in translation technology.
Translation tools today come in different versions and types depending on the purposes for which they are built. Some are multifunctional while others remain monofunctional. An alternative way must therefore be found to depict the complexities and multidimensional relationships between the four translation types and the topics discussed in this book.
It is not possible to put every single subject discussed here into one diagram or figure, and so, in order to gain a better understanding of how the issues are related to one another, they are divided into groups. Topics or issues in each group have a common theme that links them together, and are presented in a series of tables. However, it is important to bear in mind that not all topics can be presented neatly and easily even in this way.
This clearly shows the complexity and multidimensionality of translation activities in the modern technological world. At the end of the book, several Appendices provide information on the various Internet sites for many different translation tools and translation support tools such as monolingual, bilingual, trilingual and multilingual dictionaries, glossaries, thesauri and encyclopaedias.
Only a selected few are listed here, and as a result the lists are not exhaustive. It is also important to note that some Internet sites may not be permanent; at the time of writing, every effort has been made to ensure that all sites are accessible.

1 Definition of Terms

In translation technology, terms commonly used to describe translation tools are as follows:

• machine translation (MT);
• machine-aided/assisted human translation (MAHT);
• human-aided/assisted machine translation (HAMT);
• computer-aided/assisted translation (CAT);
• machine-aided/assisted translation (MAT);
• fully automatic high-quality (machine) translation (FAHQT/FAHQMT).

Distinctions between some of these terms are not always clear.
For example, computer-aided translation (CAT) is often the term used in Translation Studies (TS) and the localization industry (see the second part of this chapter), while the software community which develops this type of tool prefers to call it machine-aided translation (MAT). As the more familiar term among professional translators and in the field of Translation Studies, computer-aided translation is used throughout the book to represent both computer-aided translation and machine-aided translation tools, and the term 'aided' is chosen instead of 'assisted', as also in human-aided machine translation and machine-aided human translation. Figure 1.1 distinguishes four types of translation relating human and machine involvement in a classification along a linear continuum introduced by Hutchins and Somers (1992: 148).
This classification, now more than a decade old, will become harder to sustain as more tools become multifunctional, as we shall see in Chapters 3, 4 and 6. Nevertheless, the concept in Figure 1.1 remains useful as a point of reference for classifying translation in relation to technology.

[Figure 1.1 Classification of translation types. A linear continuum running from Machine to Human: fully automated high-quality (machine) translation (FAHQT/FAHQMT), human-aided machine translation (HAMT), machine-aided human translation (MAHT), human translation (HT). MT = machine translation; CAT = computer-aided translation. Source: Hutchins and Somers (1992): 148.]

The initial goal of machine translation was to build a fully automatic high-quality machine translation system that did not require any human intervention.
At a 1952 conference, however, Bar-Hillel reported that building a fully automatic translation system was unrealistic and years later still remained convinced that a fully automatic high-quality machine translation system was essentially unattainable (Bar-Hillel 1960/2003: 45). Instead, what has emerged in its place is machine translation, placed between FAHQT and HAMT on the continuum of Figure 1.1. The main aim of machine translation is still to generate translation automatically, but the output is no longer required to be of high quality, only to be fit for purpose (see Chapters 2 and 3). As for human-aided machine translation and machine-aided human translation, the boundary between these two areas is especially unclear.