There is a dearth of good Unicode fonts for Malayalam script. Most publishing houses and desktop publishing agencies still rely on outdated ASCII-era fonts. This not only causes problems when typesetting with present-day technologies, it also makes the ‘document’ or ‘data’ created using these fonts and tools absolutely useless, because the underlying ‘document/data’ is still Latin, not Malayalam.
Rachana Institute of Typography (rachana.org.in) has designed and published a new ornamental Unicode font for Malayalam script in traditional orthography, for use in headings, captions and titles. It is named after Sundar, who was a relentless advocate of open fonts, open standards and open publishing. He dreamed of making several good quality Malayalam fonts, particularly those created by Narayana Bhattathiri with his unique calligraphic and typographic signature, freely and openly available to users. The font is licensed under the OFL.
The font follows traditional orthography for Malayalam, rather than the unpleasing reformed orthography which was introduced solely because of the technical limitations of typewriters in the ’70s. Such restrictions do not apply to computers and present technology, so it is possible to render the classic beauty of Malayalam script using Unicode and OpenType technologies.
‘Sundar’ is designed by K.H. Hussain, known for his work on the Rachana and Meera fonts, which come pre-installed with most Linux distributions, and Narayana Bhattathiri, known for his beautiful calligraphy and lettering in Malayalam script. Graphic engineers of STM Docs (stmdocs.in) did the vectoring and glyph creation. Yours truly took care of the OpenType feature programming. The font can be freely downloaded from rachana.org.in.
At SMC, we’ve been continuously working on improving the fonts for Malayalam: updating to the newer OpenType spec (mlm2), adding new glyphs, supporting new Unicode code points, fixing shaping issues, reducing complexity and the compiled font size, involving new contributors etc.
Recently, scratching my own itch, I decided that it was high time to fix an annoyance: when Virama (U+0D4D ് ) is followed by a quote mark (‘ ’ “ ” etc.), the two used to overlap into an ugly amalgam in all our fonts. Usually Virama (് ) connects two consonants and all three are shaped into a new glyph; for example, സ+്+ന is shaped into സ്ന. (Note that you need a traditional orthography font installed to see the distinction in this example. Many of them are available here.) The root of the problem is that Virama (് ) sometimes appears as an individual glyph. In a word such as “സ്വപ്നം”, where it connects the two consonants പ and ന, it is positioned above the x-height of most glyphs and must not have much left or right bearing, to avoid ugly spacing between പ and ന. Because of these small side bearings, and in fact a negative right bearing (് protrudes beyond its advance width), a quote mark that follows it runs into the Virama glyph and renders rather badly. The issue is quite prominent when you professionally typeset a book or article in Malayalam using XeTeX or SILE.
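To make the character sequence concrete, here is a minimal Python sketch (standard library only) that lists the code points behind സ്ന. The helper name `describe` is made up for this illustration.

```python
import unicodedata

def describe(text):
    """Return (code point, Unicode name) pairs for each character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]

# Sa + Virama + Na, written with escapes so the source stays font-independent
conjunct = "\u0D38\u0D4D\u0D28"

for code, name in describe(conjunct):
    print(code, name)
```

The text stores three separate code points; joining them into one glyph is entirely the job of the font and the shaping engine.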
Fontforge’s tools made it easy to write OpenType lookup rules for horizontal pair kerning, which add more space between Virama (് ) and quote marks. You can see the effect of the change, before and after, with the Rachana font in the screenshot.
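The actual rules are not shown in this post, so the following is only a rough sketch of what such pair kerning could look like when written out in Adobe feature-file syntax. The glyph name `virama`, the list of quote glyph names and the value of 120 font units are all assumptions for illustration, not the values used in the SMC fonts.

```python
def kern_rules(left, right_glyphs, value):
    """Build feature-file pair positioning statements.

    Each statement adds `value` font units of advance to `left`
    when it is immediately followed by one of `right_glyphs`.
    """
    lines = ["feature kern {"]
    for right in right_glyphs:
        lines.append(f"    pos {left} {right} {value};")
    lines.append("} kern;")
    return "\n".join(lines)

# Hypothetical glyph names; real fonts may name these glyphs differently.
QUOTES = ["quoteleft", "quoteright", "quotedblleft", "quotedblright"]

print(kern_rules("virama", QUOTES, 120))
```

A positive value moves the quote mark away from the Virama; the right amount depends on the design of each font.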
Update 26/03/2016: Many applications already support the kerning feature out of the box, including Firefox, SILE and VLC (3.0.0-git, for subtitles), all of which I have tested. Many others, such as LibreOffice and Kwrite, still lack support. Here is a screenshot of VLC (3.0.0-git) taking the kerning rules into account while displaying a Malayalam subtitle.
Other fonts like AnjaliOldLipi, Meera and Chilanka also got this feature, and those will be available with the new release in the pipeline. I plan to extend this to the post-base forms of വ(്വ) and യ(്യ) in combination with the abundant stacked glyphs that Malayalam has.
If you have been following VLC development (hey, you should follow the awesome Jean-Baptiste Kempf’s weekly updates!), you might have noticed some recent improvements in how VLC handles subtitle text rendering. In May 2015, the freetype module was improved to use Harfbuzz for text shaping. In the week of August 4, it was mentioned that the internals of VLC subtitle handling were completely rewritten. And last week’s (October 26) update mentioned that Salah-Eddin added support for font fallback in the freetype module, which means there is no need to set a specific font to display a particular script or language.
All this combined should mean that complex text shaping and rendering for subtitles works fine out of the box. To test this, I built the VLC 3.0.0-git master branch by checking out the code, creating a tarball and adapting the spec file from RPMFusion to build an RPM package. NOTE: don’t remove '.git*' files while creating the tarball, otherwise the build will fail. I then edited/translated one of the .srt subtitle files and used that to play a movie. The result: Malayalam subtitles are shaped and rendered beautifully!
Jean-Baptiste Kempf tells me that this should also work fine on Android (since version 1.6.90) as well as on Windows. Totem (GNOME Videos) has been displaying complex text correctly for years, but VLC lacked that feature till now. This is awesome news for people who were limited in enjoying world movies in their own language. There are collectives like MSone where volunteers translate world movies’ subtitles to Malayalam and help them reach a wider audience.
This post is a promised follow-up from last November, documenting intricacies of the OpenType specification for Indic languages, specifically for Malayalam. There is an initiative to document similar details in the IndicFontbook, and this series might make its way into it. A Malayalam Unicode font supporting traditional orthography is required to correctly display most of the examples described in this article; some can be obtained from here.
Malayalam has a complex script, which in general means that the shape and position of glyphs are determined in relation to the surrounding glyphs. For example, a single glyph can be formed out of a combination of independent glyphs in a specific sequence, forming a conjunct. Take an example: ക + ് + ത + ് + ര => ക്ത്ര in traditional orthography. Note that in almost all cases, a change in glyph shaping and positioning such as this one is due to the involvement of the Virama diacritic “ ് ”. The important rules on glyph forming are:
1. When Virama is used to combine two Consonants, it usually forms a Conjunct, such as ക + ് + ത => ക്ത. This is known as C₁ conjoining, as a half form of the first consonant is joined with the second consonant.
2. The notable exceptions to point 1 are when the following Consonant is one of യ, ര, ല, വ. In those cases, they form the ‘Mark’ shapes of യ, ര, ല, വ => ്യ, ്ര, ്ല, ്വ. This is known as C₂ conjoining, as a modified form of the second consonant is attached to the first consonant.
3. When Virama is used to combine a Consonant with Vowel, the Vowel forms a Vowel Mark => such as ാ, ി, ീ.
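The first two rules can be sketched in a few lines of Python. This is only an illustration of the classification described above, not how a shaping engine is implemented; the function name `conjoining_type` is made up, and the sketch ignores doubled forms such as യ്യ, വ്വ and ല്ല, which the fonts treat as conjuncts.

```python
VIRAMA = "\u0D4D"

# Ya, Ra, La, Va: consonants that take a 'Mark' shape after a Virama
C2_CONSONANTS = {"\u0D2F", "\u0D30", "\u0D32", "\u0D35"}

def conjoining_type(sequence):
    """Classify a three-character 'consonant + Virama + consonant' string."""
    if len(sequence) != 3 or sequence[1] != VIRAMA:
        raise ValueError("expected consonant + Virama + consonant")
    second = sequence[2]
    return "C2" if second in C2_CONSONANTS else "C1"

print(conjoining_type("\u0D15\u0D4D\u0D24"))  # Ka + Virama + Ta
print(conjoining_type("\u0D15\u0D4D\u0D30"))  # Ka + Virama + Ra
```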
OpenType organizes this glyph forming and shaping logic as a sequence of ‘Lookup tables (or rules)’ defined in the font. This first part gives an overview of the relevant lookup rules used for glyph processing by a shaping engine such as Harfbuzz or Uniscribe.
Only those OpenType features applicable to Malayalam are discussed. The features (or lookups) are applied in the following order:
akhn (Akhand – used for conjuncts like ക്ക, ക്ഷ, ല്ക്ക, യ്യ, വ്വ, ല്ല etc)
pref (Pre-base form – used for pre base form of Ra – ് + ര = ്ര)
blwf (Below base form – used for below base form of La – virama+La – ് + ല = ്ല)
half (Half form – Not used in mlm2 spec by Rachana and Meera, but used in mlym spec and might be useful later. For now, ignore)
pstf (Post base form – used for post base forms of Ya and Va – ് +യ = ്യ, ് + വ = ്വ. Note that യ്യ & വ്വ are under akhn rule)
pres (Pre-base substitution – mostly used for ligatures involving pref Ra – like ക്ര, പ്ര, ക്ത്ര, ഗ്ദ്ധ്ര etc)
blws (Below base substitution – used for ligatures involving blwf La – like ക്ല, പ്ല, ത്സ്ല etc. Note that ല്ല is under akhn rule)
psts (Post base substitution – used for ligatures involving post base Matras – like കു, ക്കൂ, മൃ etc)
abvm (Above base Mark positioning – used for dot Reph – ൎ)
The last three features before abvm (pres, blws, psts) are presentation forms; they have lower priority in glyph formation and usually account for the large number of secondary glyphs. The final one (abvm) is not a GSUB (glyph substitution) lookup but a GPOS (glyph positioning) lookup; it is used to position the dot Reph correctly above the glyphs.
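As a compact summary of the ordering above, here is a small Python sketch that records the feature order and which OpenType table each feature belongs to. The helper names `table_for` and `runs_before` are made up for this illustration.

```python
# Order in which the features are applied, as listed in this post.
FEATURE_ORDER = ["akhn", "pref", "blwf", "half", "pstf",
                 "pres", "blws", "psts", "abvm"]

# abvm is a positioning (GPOS) feature; the rest are substitutions (GSUB).
GPOS_FEATURES = {"abvm"}

def table_for(tag):
    """Return the OpenType table that holds the lookups for this feature."""
    if tag not in FEATURE_ORDER:
        raise ValueError(f"not a feature discussed here: {tag}")
    return "GPOS" if tag in GPOS_FEATURES else "GSUB"

def runs_before(first, second):
    """True if `first` is applied earlier than `second`."""
    return FEATURE_ORDER.index(first) < FEATURE_ORDER.index(second)

for tag in FEATURE_ORDER:
    print(tag, table_for(tag))
```

For instance, `runs_before("akhn", "pres")` is true, which is why a conjunct formed by akhn is already in place when the presentation forms are considered.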
akhn: Use this for conjuncts (കൂട്ടക്ഷരങ്ങള്) like ക്ക, ട്ട, ണ്ണ, ക്ഷ, യ്യ, വ്വ, ല്ല, മ്പ. This rule has the highest priority, so akhn glyphs won’t be broken by the shaping engine.
pref: Used only for pre-base form of Ra ര – ്ര
blwf: Used only for below base form of La ല – ്ല
pstf: Used for the post base forms of Ya, Va യ, വ – ്യ, ്വ
pres: One of the presentation forms, mostly used for ligatures/glyphs with pref Ra ര – like ക്ര, പ്ര, ക്ത്ര, ഗ്ദ്ധ്ര etc. This could also be used together with the ‘half’ forms in certain situations, but that is for later.
blws: Used for ligatures/glyphs with blwf La ല – like ക്ല, പ്ല, ത്സ്ല etc.
psts: Used by a large number of ligatures/glyphs due to the post base Matras (ു,ൂ,ൃ etc) – like കു, ക്കൂ, മൃ etc. Other Matras (ാ,ി,ീ,േ,ൈ,ൊ,ോ,ൌ,ൗ) are implicitly handled by the shaping engine based on their Unicode properties (pre-base, post-base etc.). As they don’t form a different glyph together with a consonant, there is no need to define lookup rules for those matras in the font.
I will discuss these lookup rules and how they fit in the glyph shaping sequence, with detailed examples, in the next episodes.
(P.S: WordPress tells me I started this blog 7 years ago on this day. How time flies.)
Swathanthra Malayalam Computing is a free software collective engaged in language computing, development, localization, standardization and popularization of various Free and Open Source Software in the Malayalam language. SMC developers have contributed to various Indian language computing efforts including fonts, spell checkers, hyphenation patterns (used by TeX, LibreOffice, Firefox), input methods etc. Last year, SMC was selected as a mentoring organization for the Google Summer of Code program and we successfully mentored 3 student projects: a web application to store and process bibliography data of books with i18n support; porting SILPA to a Flask application and restructuring it into standalone modules; and Automated Shaping&Rendering testing, primarily for HarfBuzz.
Together with Santhosh, I mentored the Automated Shaping&Rendering testing framework, which we use to test Malayalam font changes against HarfBuzz. It can also be used to test the Uniscribe shaping engine if compiled on Windows, or used against HarfBuzz with the Uniscribe backend.
SMC has been selected as a mentoring organization for GSoC again this year. If you are a student who wants to work on interesting problems, look at our project ideas. One of the problems I am particularly interested in mentoring is adding Indic shaping support to ConTeXt. Apart from the listed ideas, you can propose other ideas as well. Read the FAQ; you can reach us on the mailing list or via IRC at #smc-project on freenode.net.
The Unicode fonts for Malayalam maintained by Swathanthra Malayalam Computing were last updated almost 2 years ago. They all supported only the v1 Indic OpenType spec, and there were rendering problems with the fonts under Harfbuzz.
I was fortunate to attend the Open Source Language Summit 2012 (last year!) organized by Wikimedia Foundation and Red Hat (thank you, guys!), where many of the Indic language experts came together to work on the issues at hand. The two-day workshop helped me greatly to gain much more insight into fonts, the OpenType spec and Harfbuzz in general. Since then I have been spending a lot of effort on updating and fixing the Malayalam fonts, and also on testing git snapshots of Harfbuzz and reporting issues to the Harfbuzz development list.
In the meantime, Harfbuzz matured and fixed many rendering issues. Thanks to the last Udupi hackfest by Behdad and Jonathan Kew, all known issues with Malayalam shaping have been addressed. And we were busy updating the fonts and OpenType lookup rules and fixing bugs to work with old shapers (old pango, Qt, ICU Layout Engine, Windows XP) as well as the new ones (Harfbuzz, Uniscribe, Adobe). The v1 Indic OpenType spec was a mess due to ‘undesirable’ Halant reordering (Consonant+Halant forms were ligated while it should have been Halant+Consonant). It caused a lot of grief for font developers and shaping engine developers alike. With the v2 spec (mlm2 script tag for Malayalam), this has been changed and shaping engines no longer need to perform Halant shifting. I was leading the effort of porting the Malayalam fonts to the mlm2 spec. We could port only Meera and Rachana for now, and RaghuMalayalam was taken care of by a few sed scripts.
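The difference in Halant ordering can be illustrated with a tiny Python sketch. It only models the order of glyphs that a form rule matches under each script tag, as described above; the glyph names `halant` and `ra` and the function `rule_input` are hypothetical and not taken from any SMC font source.

```python
HALANT = "halant"  # hypothetical glyph name for Virama

def rule_input(consonant, script_tag):
    """Return the glyph sequence a form rule matches under a script tag."""
    if script_tag == "mlym":
        # v1 spec: the shaping engine shifts Halant after the consonant
        return (consonant, HALANT)
    if script_tag == "mlm2":
        # v2 spec: the logical order Halant + Consonant is kept
        return (HALANT, consonant)
    raise ValueError(f"unknown script tag: {script_tag}")

print(rule_input("ra", "mlym"))
print(rule_input("ra", "mlm2"))
```

Porting a font from v1 to v2 therefore largely means swapping the input order in such rules, which is the kind of mechanical change a few sed scripts can handle.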
During the 12th anniversary celebrations of the Swathanthra Malayalam Computing group, new versions of the fonts (5.1 supporting old shapers and 6.0 supporting new shapers) were released. See the email to smc-discuss for details. The remaining fonts also need to be updated, and there is interest from the community to collaborate on that. The new release will show up in Fedora 20.
In the process, I have learned quite a few intricacies of the Indic OpenType spec and will try to document them in a series of posts.
Meera is the default font for Malayalam in Fedora. Lately, a few bugs causing wrong rendering of some complex conjuncts were identified and reported – 1, 2. There was another bug reported in Red Hat Bugzilla. As Unicode 5.1 is not supported by smc-fonts, the glyph Ⓡ was being displayed for Atomic Chillu letters. To make the font Unicode compliant, it was suggested to remove this.
Yesterday a new release of Meera fixing all these bugs was made available by Suresh, and I’ve uploaded a new upstream source file to the Savannah repository. The RPMs were then rebuilt for rawhide, and can be found here.
And the ChangeLog reads:
* The glyphs(an R inside a circle) at unallocated Unicode code points are removed.
* Wrong glyph for 'th1s1r3u1' (ത്സ്രു) fixed.
* Breaking up of conjunct 'l3k1k1' (ല്ക്ക) fixed.
* Shaping issues for 'r3cil'+'l3l3' (ര്ല്ല) context fixed.
Thanks to Suresh, Ani Peter, Santhosh Thottingal and Pravin Satpute!