Local Languages Initiative
06 Dec 2017
The ICT Agency of Sri Lanka (ICTA) has been promoting the use of ICT in Sinhala and Tamil, and has been addressing issues relating to standard fonts and keyboards in Sinhala and Tamil.
The objective is to bring the benefits of ICT to the majority of the population of Sri Lanka, which includes people who would prefer to use ICT in Sinhala or Tamil if given the choice.
Previously, software applications used their own fonts, and there was no standard font for the industry. Documents produced using one application could therefore be accessed and used only through that application. For example, when accessing a Sinhala website, various legacy fonts had to be downloaded; otherwise the website was displayed as indecipherable gibberish. This was a major problem when a person tried to use a document that someone else had produced using a different font. The font had to be sent to the recipient together with the Sinhala document, unless the recipient already had the font. This made Sinhala email impractical and slowed the use of Sinhala on the web. Also, specific applications such as word processors did not integrate with other applications, and functions such as sorting were not standardized across applications. There was no way in which Sinhala content could be developed for the Internet. It was not possible to search or to sort.
This gave rise to private, non-standard solutions and to a large number of proprietary font encodings. Today it is possible to type in Sinhala and Tamil, exchange information in Sinhala and Tamil using computers, and browse the web in Sinhala and Tamil. New avenues are now open in the use of ICT for most people in Sri Lanka.
Standardization was the key to escaping the disorder caused by the use of numerous non-standard solutions. The international standard for character encoding is Unicode, which aims to cover the scripts of all the world’s written languages, including Sinhala and Tamil. As a result, key operating systems can now be used in Sinhala and Tamil.
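As a small illustration of what a single standard encoding provides, the following Python sketch looks up the Unicode name of a Sinhala and a Tamil letter and checks which Unicode block each belongs to. The block ranges (Sinhala U+0D80 to U+0DFF, Tamil U+0B80 to U+0BFF) come from the Unicode Standard; the helper name `script_of` is hypothetical and is used here only for illustration.

```python
import unicodedata

# Unicode block ranges for the two scripts.
SINHALA_BLOCK = range(0x0D80, 0x0E00)  # U+0D80..U+0DFF
TAMIL_BLOCK = range(0x0B80, 0x0C00)    # U+0B80..U+0BFF


def script_of(ch: str) -> str:
    """Return 'Sinhala', 'Tamil' or 'other' based on the code point's block."""
    cp = ord(ch)
    if cp in SINHALA_BLOCK:
        return "Sinhala"
    if cp in TAMIL_BLOCK:
        return "Tamil"
    return "other"


# One Sinhala letter, one Tamil letter and one Latin letter.
SAMPLE = ["අ", "அ", "A"]

for ch in SAMPLE:
    # Every application that supports Unicode sees the same code point,
    # regardless of which font is used to draw the letter.
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), script_of(ch))
```

Because the code point identifies the letter independently of any font, the same text can be stored, searched and exchanged between applications without shipping a font alongside it.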
Standards
ICTA, in partnership with key stakeholders, especially the Sri Lanka Standards Institution, the University of Colombo School of Computing and the University of Moratuwa, worked on standardization. The latest version of the Sinhala standard includes encoding for Sinhala numerals, which were hitherto not known to most people in Sri Lanka. Standards have also been developed for testing and certifying Sinhala and Tamil ICT products. Companies which develop Sinhala and Tamil products such as keyboards and fonts can have these tested and certified by the Sri Lanka Standards Institution and obtain an SLS mark.
Sinhala and Tamil Sri Lanka Standards (SLS):
- Sri Lanka Sinhala Character Code for Information Interchange, SLS 1134 : 2004 and SLS 1134 : 2011.
- Sri Lanka Tamil Character Code for Information Interchange, SLS 1326 : 2008.
Sorting (collation sequence)
Collation refers to the order in which lists of words or phrases are sorted, and is a key function in computer systems. With a collation sequence implemented, words and names can be found easily and reliably in dictionaries and databases.
The sorting orders for Sinhala and Tamil were important issues that have now been resolved. A sorting sequence was needed in order to set up databases and other lists of information in Sinhala and Tamil, especially for government organizations which maintain such lists. The Sinhala collation sequence is Part 1 of SLS 1134: 2004. The Tamil collation sequence is Part 1 of SLS 1326: 2008.
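The following Python sketch shows why an explicit collation sequence is needed rather than sorting by raw code point. It uses the traditional order of the 18 Tamil consonants as an illustrative subset; it is not a reproduction of the collation sequence in SLS 1326, and the names `TAMIL_CONSONANTS` and `collation_key` are hypothetical.

```python
# Traditional order of the 18 Tamil consonants. Illustrative subset only:
# this is NOT the full SLS 1326 collation sequence.
TAMIL_CONSONANTS = "கஙசஞடணதநபமயரலவழளறன"

# Position of each letter in the chosen sequence.
RANK = {ch: i for i, ch in enumerate(TAMIL_CONSONANTS)}


def collation_key(text: str) -> list:
    """Build a sort key from the sequence above.

    Letters in the table sort by their rank; any other character sorts
    after them, by code point.
    """
    return [(0, RANK[ch]) if ch in RANK else (1, ord(ch)) for ch in text]


letters = ["ன", "ப", "ற", "ல", "க"]

# Default sorting compares Unicode code points.
by_code_point = sorted(letters)
# Sorting with the key follows the explicit sequence instead.
by_tradition = sorted(letters, key=collation_key)

print(by_code_point)  # ['க', 'ன', 'ப', 'ற', 'ல']
print(by_tradition)   # ['க', 'ப', 'ல', 'ற', 'ன']
```

The two results differ because the positions of some letters in the Unicode code chart do not match their traditional alphabetical position, which is the gap a standardized collation sequence fills for databases and dictionaries.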
Keyboard layouts
The keyboard layouts for Sinhala and Tamil are standardized. Both are based on the “type as you write” method, and the two layouts work in a similar way.
Keyboard input
Software (input methods) necessary for using computers in accordance with the two standard keyboard layouts has been developed. A keyboard driver and a Sinhala and Tamil font have been developed for Mac OS X.
Fonts – milestone in using ICT in local languages
ICTA spearheaded the development of Sinhala and Tamil Unicode fonts. At ICTA’s inception, awareness programs were held for font developers. Font developers were trained and given the knowledge and expertise needed to develop standards-based, aesthetically correct Sinhala and Tamil fonts, in order to upgrade skills in this area. The trainees produced several stylized Sinhala fonts. Unicode fonts are needed to discourage users from using legacy, non-Unicode fonts.
The industry has followed and developed Unicode-compliant fonts. ICTA has also ensured the development of a Unicode Sinhala font, Bhashitha, and a Tamil font, SriTamil. ICTA holds the intellectual property rights (IPR) to both, and they are made available free to users. The font rules are given free to font designers. The development of Unicode fonts has resulted in a multitude of Sinhala content.
Terminology, glossaries, user interfaces
Sinhala and Tamil IT glossaries have been developed and updated over the years by CINTEC (ICTA’s predecessor), ICTA, the Department of Official Languages, UCSC, the University of Moratuwa, and the private sector. ICTA, in partnership with Microsoft, developed a 2,000-word Sinhala glossary as one of the outputs of the project to develop a Sinhala user interface for Windows Vista and Windows 7.
User interfaces in Sinhala are available for Windows Vista, Windows 7, Microsoft Office, and for Linux distributions.
Country top level domain .LK in Sinhala and Tamil
Locale information refers to linguistic and cultural preferences. Software developers need it when localizing software to suit the requirements of different locales and to adapt to the conventions of different countries and languages. Examples include date and time formats, digit grouping and so on. The locale information for Sinhala and for Tamil with respect to Sri Lanka has been defined and uploaded to the Unicode site’s Common Locale Data Repository (CLDR).
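A minimal Python sketch of how software might consume locale information is given below. The `LocaleData` record, the two example locales and all of their values are hypothetical placeholders chosen only to show the mechanism; the actual conventions for Sinhala and Tamil in Sri Lanka should be taken from CLDR.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class LocaleData:
    """A tiny, hypothetical subset of the data a locale record carries."""
    decimal_sep: str
    group_sep: str
    group_size: int
    date_pattern: str  # strftime-style pattern


# Placeholder values for illustration only; NOT taken from CLDR.
EXAMPLE_A = LocaleData(decimal_sep=".", group_sep=",", group_size=3,
                       date_pattern="%Y-%m-%d")
EXAMPLE_B = LocaleData(decimal_sep=",", group_sep=".", group_size=3,
                       date_pattern="%d.%m.%Y")


def format_number(value: float, loc: LocaleData, decimals: int = 2) -> str:
    """Format a number using the locale's separators and digit grouping."""
    whole, _, frac = f"{abs(value):.{decimals}f}".partition(".")
    groups = []
    # Peel off groups of digits from the right-hand side.
    while len(whole) > loc.group_size:
        groups.insert(0, whole[-loc.group_size:])
        whole = whole[:-loc.group_size]
    groups.insert(0, whole)
    text = loc.group_sep.join(groups)
    if frac:
        text += loc.decimal_sep + frac
    return ("-" if value < 0 else "") + text


def format_date(day: date, loc: LocaleData) -> str:
    """Format a date using the locale's date pattern."""
    return day.strftime(loc.date_pattern)


print(format_number(1234567.5, EXAMPLE_A), format_date(date(2017, 12, 6), EXAMPLE_A))
print(format_number(1234567.5, EXAMPLE_B), format_date(date(2017, 12, 6), EXAMPLE_B))
```

The same application code produces different output for each record, which is why a shared repository of locale data saves developers from hard-coding conventions for each country and language.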
Outcome
- Using Unicode Sinhala and Tamil is the norm.
- Over 400 Unicode tri-lingual Government websites.
- Proliferation of Sinhala and Tamil blogs.
- Extensive Sinhala content now available.
Challenges
- Embedding input methods in operating systems, so that users do not need to install additional software in order to type.
- Mobiles:
- iOS: include an aesthetically correct Sinhala font.
- ICTA keypad: standardize.
In 2017 and 2018, two projects are being implemented under the Local Languages Initiative: Text to Speech and Speech to Text, and Optical Character Recognition. Details are as follows:
Text to Speech for Sinhala and Tamil and Speech to Text for Sinhala:
The objective of this project is to develop a text-to-speech (TTS) system specifically for Sinhala and for Sri Lankan Tamil. The TTS system will convert normal text into speech. The quality will be such that the output closely resembles human speech and is understood with ease. A speech-to-text system for Sinhala will also be developed. This project will be especially useful for the visually impaired and will also help senior citizens to use computers and other devices.
Optical Character Recognition:
The objective is to develop an Optical Character Recognition system for Sinhala, thus facilitating the development of digital content. This would enable the conversion of paper documents, PDF documents and images into editable formats, and enable the migration of existing systems to Unicode. The expected results are less duplication of information, more productive and better utilized staff, and reduced storage requirements.
IN ESSENCE
- The Local Languages Initiative aims to ensure that ICT can be used in Sinhala and Tamil.
- This program also ensures that ICT in local languages is easy to use.
- ICTA has been addressing issues relating to standard fonts and keyboards in Sinhala and Tamil.
- The objective is to bring the benefits of ICT to the majority of the population of Sri Lanka.
- Standardization was the key to escaping all the disorder caused by the use of numerous non-standard solutions.