VLC now renders subtitles in South Asian scripts!

If you have been following VLC development (hey, you should follow the awesome Jean-Baptiste Kempf’s weekly updates!), you might have noticed some recent improvements in how VLC handles subtitle text rendering. In May 2015, the freetype module was improved to use Harfbuzz for text shaping. In the week of August 4, it was mentioned that the internals of VLC subtitle handling had been completely rewritten. And last week’s (October 26) update mentioned that Salah-Eddin added support for font fallback in the freetype module, which means there is no need to set a specific font to display a particular script or language.

All this combined should mean that complex text shaping and rendering for subtitles work fine out of the box. To test this, I built the VLC 3.0.0-git master branch by checking out the code, creating a tarball and adapting the spec file from RPMFusion to build an RPM package. NOTE: don’t remove the '.git*' files while creating the tarball, otherwise the build will fail. Then I edited/translated one of the .srt subtitle files and used it to play a movie. The result: Malayalam subtitles are shaped and rendered beautifully!
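For anyone who wants to repeat the test, a subtitle file is easy to produce by hand or by script. The snippet below is only a minimal sketch of writing a UTF-8 .srt file in Python; the helper names, the sample cue and the output file name are my own assumptions for illustration, not something taken from VLC or from the file I actually used.

```python
def format_timestamp(seconds):
    """Format a time offset in seconds as the SRT timestamp HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def format_srt(cues):
    """Render (start_seconds, end_seconds, text) tuples as SRT text.

    Each cue is a running number, a time range and the subtitle text;
    cues are separated by a blank line.
    """
    blocks = []
    for number, (start, end, text) in enumerate(cues, start=1):
        time_range = f"{format_timestamp(start)} --> {format_timestamp(end)}"
        blocks.append(f"{number}\n{time_range}\n{text}\n")
    return "\n".join(blocks)


def write_srt(path, cues):
    """Write the cues to `path`, encoded as UTF-8."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(format_srt(cues))


# Hypothetical sample cue with Malayalam text ("namaskaram").
SAMPLE_CUES = [(1.0, 4.0, "നമസ്കാരം")]

# Usage (the file name is an assumption):
# write_srt("movie.ml.srt", SAMPLE_CUES)
```

Saving the file as UTF-8 matters here: the shaping and font fallback work happens on Unicode text, so a file in a legacy ASCII font encoding would not benefit from it.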

VLC rendering Malayalam subtitle

Jean-Baptiste Kempf tells me that this should also work fine on Android (since version 1.6.90) as well as on Windows. Totem (GNOME Videos) has been displaying complex text correctly for years, but VLC lacked that feature until now. This is awesome news for people who were limited in enjoying world movies in their own language. There are collectives like MSone where volunteers translate world movies’ subtitles into Malayalam and help them reach a wider audience.

Kudos to the awesome Videolan team!

smc-fonts (Meera) updated

Meera is the default font for Malayalam in Fedora. Lately, a few bugs causing wrong rendering of some complex conjuncts were identified and reported – 1, 2. There was another bug reported in Red Hat Bugzilla. As Unicode 5.1 is not supported by smc-fonts, the glyph Ⓡ was being displayed for Atomic Chillu letters. To make the font Unicode compliant, it was suggested to remove this glyph.
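For context, the Atomic Chillu letters are the six code points U+0D7A to U+0D7F that Unicode 5.1 added; older text writes a chillu as consonant + virama + ZERO WIDTH JOINER instead. Here is a small Python sketch that lists both forms. It only inspects code points and says nothing about how Meera maps them to glyphs; the helper names are mine.

```python
import unicodedata

# The six atomic chillu letters introduced in Unicode 5.1.
ATOMIC_CHILLUS = [chr(cp) for cp in range(0x0D7A, 0x0D80)]

VIRAMA = "\u0d4d"  # MALAYALAM SIGN VIRAMA (chandrakkala)
ZWJ = "\u200d"     # ZERO WIDTH JOINER


def legacy_chillu(base_consonant):
    """Build the pre-5.1 chillu sequence: consonant + virama + ZWJ."""
    return base_consonant + VIRAMA + ZWJ


def code_points(text):
    """Return the code points of `text` in U+XXXX notation."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


if __name__ == "__main__":
    for letter in ATOMIC_CHILLUS:
        print(code_points(letter), unicodedata.name(letter))
    # The sequence form built on RA (U+0D30).
    print(code_points(legacy_chillu("\u0d30")))
```

A font that does not support the atomic code points can still render the sequence form, which is the form the 'r3cil' context in the ChangeLog below refers to.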

Yesterday a new release of Meera fixing all these bugs was made available by Suresh, and I’ve uploaded a new upstream source file to the Savannah repository. The RPMs were then rebuilt for rawhide, and can be found here.

And the ChangeLog reads:

* The glyphs (an R inside a circle) at unallocated Unicode code points are removed.
* Wrong glyph for 'th1s1r3u1' (ത്സ്രു) fixed.
* Breaking up of conjunct 'l3k1k1' (ല്ക്ക) fixed.
* Shaping issues for 'r3cil'+'l3l3' (ര്‍ല്ല) context fixed.

Thanks to Suresh, Ani Peter, Santhosh Thottingal and Pravin Satpute!

Kuttans – another frontend for Payyans

A few months ago I released Chathans, a frontend to the Payyans ASCII <=> Unicode converter. A few weeks later, Santhosh forwarded a mail from Rahul with another neat implementation of a frontend, written completely in Qt4. He named it Kuttans as a pun on Qt+Payyans.

I liked the user interface at first look. But it was using a “system()” call to interact with Payyans, so we decided to reimplement it in PyQt4. Based on the UI designed in Qt Creator, the Python UI class is generated with pyuic4, and the resources (icons…) with pyrcc4 (both from the PyQt4-devel package).

Features include:

  • Support for Creating, Displaying, Editing and Saving ASCII/Unicode files
  • Support for all available fonts. Useful for identifying the ASCII font when the font of the original ASCII document is not known
  • All the standard Cut/Copy/Paste/Undo/Redo functionalities

kuttans

The RPM package and source tarball can be obtained from the Savannah repository. The source can be browsed in the Savannah git repo.

Future improvements:

  • Support for displaying PDF files (using python bindings for poppler-qt4)
  • .deb package for Debian/Ubuntu

Chathans

A few months ago, one fine morning, I logged into the #smc-project IRC channel. An unusual number of members were present that day, and we were completing the process of getting aspell-ml into Fedora. Meanwhile, Santhosh and Nishan were in the process of creating an ASCII-to-Unicode converter. The discussions and talks went light, and we went into a myriad of topics. Somehow, the discussion turned to Ani Peter’s “blog inauguration”, and then Santhosh pointed to my Malayalam blog. That prompted him to ask if I am a fan of V.K.N. I said yes, and he replied that the feeling is mutual. But a few guys in the list didn’t know who “Payyan” or “Chathans” is.

Payyans and Chathans are arguably the most famous characters created by the exceptional genius V.K.N.

A few days later, Santhosh and Nishan released the ASCII-to-Unicode converter, and they named the software “Payyans”!

A couple of weeks later, I was trying to learn GTK+ programming and Glade. As an experiment, I wrote a small GUI for Payyans in GTK, which resulted in a patch to Payyans and a tiny GTK+ application, which I duly named “Chathans”.

A few months withered away; I learned a little PyGTK programming and rewrote Chathans in PyGTK. Payyans was also improved substantially, gaining internal APIs (so that its services can be used directly inside Python applications) and bidirectional conversion, among other things. I later added translation support to Chathans.

So we have released Chathans and documented the steps to obtain, install and use it in the SMC wiki.

chathans

As always – comments, bug reports and patches are welcome.

English-Malayalam Dictionary beta release

As mentioned earlier, we are ready with the beta release of the English-Malayalam dictionary in DICT format. The RPM, DEB and source packages can be found here in the Savannah repository. The steps for installing, configuring and using it are documented in the SMC Wiki.
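Once dictd is serving the dictionary, any DICT client can query it. To give a rough idea of what a frontend does over the wire, here is a minimal Python sketch of a DEFINE lookup as described in RFC 2229. The function names are mine, the host defaults to a local dictd on the standard DICT port 2628, and "*" asks the server to search all its databases; treat it as an illustration, not as the client we ship.

```python
import socket


def build_define_command(word, database="*"):
    """Return the DEFINE command line for `word` (RFC 2229)."""
    # Quote the word so that multi-word headwords stay a single argument.
    escaped = word.replace("\\", "\\\\").replace('"', '\\"')
    return f'DEFINE {database} "{escaped}"\r\n'


def parse_definitions(lines):
    """Extract definition texts from decoded response lines (no CRLF)."""
    definitions, current = [], None
    for line in lines:
        if current is None:
            # A 151 status line announces that a definition text follows.
            if line.startswith("151 "):
                current = []
            continue
        if line == ".":
            # A lone period terminates the text.
            definitions.append("\n".join(current))
            current = None
        else:
            # Text lines starting with a period arrive with it doubled.
            current.append(line[1:] if line.startswith("..") else line)
    return definitions


def define(word, database="*", host="localhost", port=2628):
    """Look up `word` on a DICT server and return the definition texts."""
    with socket.create_connection((host, port), timeout=10) as sock:
        stream = sock.makefile("rwb")
        stream.readline()  # skip the 220 greeting banner
        stream.write(build_define_command(word, database).encode("utf-8"))
        stream.flush()
        lines, in_text = [], False
        while True:
            raw = stream.readline()
            if not raw:
                break
            line = raw.decode("utf-8").rstrip("\r\n")
            lines.append(line)
            if in_text:
                in_text = line != "."
            elif line.startswith("151 "):
                in_text = True
            elif line.startswith("250") or line[:1] in ("4", "5"):
                break  # final status: ok, no match, or an error
        stream.write(b"QUIT\r\n")
        stream.flush()
    return parse_definitions(lines)


# Usage, assuming dictd is running locally:
# for text in define("apple"):
#     print(text)
```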

The next step is Malayalam-English and English-Malayalam dictionaries, though their feasibility depends greatly on a favourable reply from CDAC, who own the required data. We are expecting and hoping that they will make the data available under a free software compatible license.

English-Malayalam Dict [RFC2229]

When Santhosh Thottingal sent out the task of creating an English-Malayalam/Malayalam-English dictionary conforming to the Dict Protocol, I didn’t care much. I just took a look and left it there. But later, when he pinged and urged me to take it up, providing many of the required resources, I thought I’d take a look at it. And thus I started scratching another itch.

The Govt of Kerala is well known for its support for Free/Open Source Software. And they’ve been doing a pretty good job. But I was surprised when I got the link to an English-Malayalam Dictionary with a Python frontend. And the best part is this – it is GPL’ed.

And I set out to convert the data found inside to suit the Dict Protocol [RFC2229]. An ugly shell script turned out to be a nice one after 3 days of carving and craving.

This is how it is done:

  1. Format the input file as: {headword\n\tdefinitions}.
  2. Use dictfmt to convert it to Dict format: dictfmt -f --utf8 -s Dict-English-Malayalam -u smc.org.in dict-en-ml < <input_file> && dictzip dict-en-ml.dict
  3. This will create two files: dict-en-ml.dict.dz and dict-en-ml.index.
  4. Install “dictd”.
  5. Create the folder “/usr/share/dictd” if it doesn’t exist.
  6. Copy dict-en-ml.dict.dz and dict-en-ml.index to “/usr/share/dictd”.
  7. Create the file “/etc/dict.conf” and put “server localhost” in it.
  8. Create the file “/etc/dictd.conf” and put in it: database Eng-Mal { data "/usr/share/dictd/dict-en-ml.dict.dz" \n\t index "/usr/share/dictd/dict-en-ml.index" }
  9. Start the dictd service with “/etc/rc.d/init.d/dictd start”.
  10. Use your favourite dictionary frontend and look up!
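The first two steps can be sketched in Python as well. This is not the shell script I actually used; it is a minimal illustration of the input layout from step 1 (headword in column 0, each definition line indented with a tab) and of the commands from step 2. The function names and the sample entries in the usage note are made up.

```python
import subprocess


def format_entries(entries):
    """Render (headword, [definition lines]) pairs in the step 1 layout."""
    lines = []
    for headword, definitions in entries:
        lines.append(headword)  # headword starts in column 0
        lines.extend("\t" + text for text in definitions)  # tab-indented
    return "\n".join(lines) + "\n"


def build_commands(basename="dict-en-ml"):
    """Return the dictfmt and dictzip argument lists from step 2."""
    dictfmt = ["dictfmt", "-f", "--utf8",
               "-s", "Dict-English-Malayalam",
               "-u", "smc.org.in", basename]
    dictzip = ["dictzip", basename + ".dict"]
    return dictfmt, dictzip


def build_dictionary(input_path, basename="dict-en-ml"):
    """Run dictfmt on the formatted file, then compress the result."""
    dictfmt, dictzip = build_commands(basename)
    with open(input_path, "rb") as source:
        # dictfmt reads the formatted entries from standard input.
        subprocess.run(dictfmt, stdin=source, check=True)
    subprocess.run(dictzip, check=True)
```

For example, format_entries([("apple", ["n. a fruit"])]) gives the text to save as the input file, and build_dictionary("input.txt") then produces the two files from step 3, provided dictfmt and dictzip are installed.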

And, here’s a preview as well:

dictionary

There’s still some more work to do, viz. incorporating the grammatical components (such as noun, verb, etc.).

We at Swathanthra Malayalam Computing hope to release it soon, along with an RPM package as proposed by Sankharshan Mukhopadyay.

Stay tuned.