Public Works and Government Services Canada
Translation Bureau - The Pavel Terminology Tutorial

4.7.2. Electronic Texts and OCR

Government organizations, research institutions, universities, and private-sector organizations distribute a growing number of electronic documents through their Web sites, with permission to download them. You can also access documentation on the sites of professional associations, the press, and television networks, although access is not always free. These documentary sources are currently the most commonly used for term extraction in terminology work. It is simply a matter of locating them with the help of Internet search and navigation guides, then indexing and retrieving them with tools such as Isys Desktop 6 or AltaVista Discovery. Some documentation may be available only in hardcopy.
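To make the indexing and retrieval step concrete, here is a minimal Python sketch of a word index over downloaded texts. It does not reproduce how Isys Desktop 6 or AltaVista Discovery work internally. The file names, the sample texts, and the function names (tokenize, build_index, search) are all assumptions made up for the illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase a text and return its words (letters, with internal hyphens)."""
    return re.findall(r"[^\W\d_]+(?:-[^\W\d_]+)*", text.lower())


def build_index(documents):
    """Map each word to the set of document names that contain it."""
    index = defaultdict(set)
    for name, text in documents.items():
        for token in tokenize(text):
            index[token].add(name)
    return index


def search(index, query):
    """Return the names of the documents that contain every word of the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


# Hypothetical downloaded documents (names and contents invented for the example).
documents = {
    "ocr_notes.txt": "Optical character recognition converts scanned pages into text.",
    "term_notes.txt": "Term extraction software proposes candidate terms from text.",
}
index = build_index(documents)
matches = search(index, "scanned text")
```

With this sample collection, a query for "scanned text" matches only ocr_notes.txt, while a query for "text" alone matches both documents.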

If only a hardcopy of the document is available, you can have the text scanned and processed with optical character recognition (OCR) to obtain an electronic version. Once the text is available in computer-readable form, you can use computer-assisted or automatic term-extraction software such as Nomino, MultiTrans, or EdiTerm.
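As a rough illustration of what automatic term extraction does with computer-readable text, the following Python sketch counts recurring word sequences and proposes them as candidate terms. It is a simple frequency heuristic under stated assumptions, not the method used by Nomino, MultiTrans, or EdiTerm. The stopword list, the thresholds, the sample text, and the function names are invented for the example.

```python
import re
from collections import Counter

# Illustrative stopword list (an assumption; real tools rely on far richer resources).
STOPWORDS = {"a", "an", "and", "the", "of", "in", "to", "is", "are",
             "for", "with", "on", "or", "by", "be", "can"}


def tokenize(text):
    """Lowercase a text and return its words (letters, with internal hyphens)."""
    return re.findall(r"[^\W\d_]+(?:-[^\W\d_]+)*", text.lower())


def candidate_terms(text, max_words=3, min_count=2):
    """Return (sequence, count) pairs for word sequences seen at least min_count times.

    Sequences that start or end with a stopword are discarded.
    """
    counts = Counter()
    # Work sentence by sentence so that no candidate spans a sentence boundary.
    for sentence in re.split(r"[.!?;:]+", text):
        tokens = tokenize(sentence)
        for n in range(1, max_words + 1):
            for i in range(len(tokens) - n + 1):
                gram = tokens[i:i + n]
                if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                    continue
                counts[" ".join(gram)] += 1
    return [(term, c) for term, c in counts.most_common() if c >= min_count]


# Sample text invented for the example.
sample = (
    "Optical character recognition converts a scanned page into text. "
    "The quality of optical character recognition depends on the layout. "
    "Term extraction works on the recognized text."
)
candidates = candidate_terms(sample)
```

The output is only a list of candidates. A terminologist still has to decide which sequences are genuine terms of the subject field, which is why such software is described as assisting the extraction rather than replacing it.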

Given the current state of the art in OCR, scanning is not recommended for very large documents or for documents with complicated layouts (graphics, diagrams, tables, multiple languages, etc.).




Maintained by the Client Services
Last Updated:  2008-12-19