
Soliloquies

  • GSoC 2009, Payyans and PyUNO

    February 28th, 2009

    SMC has proposals for Google Summer of Code 2009, and the initial ideas are put in the wiki page.

    A lot of documents, books and other information in India exist as ASCII data. Quite a lot of them are in proprietary formats, usually Microsoft Word. Migrating them to Unicode is a daunting task, despite the existing tools that aid it. We already have Payyans, which does a decent job of converting Malayalam ASCII data (in either text or PDF form) to Unicode. The project plan is to enhance Payyans so that it handles:

    1. A bunch of Indic languages. We need to incorporate the language-specific grammatical rules (prebase, postbase, etc.) wherever they differ from the generic implementation.
    2. All the document formats supported by OpenOffice. Be it .DOC, .ODT, .DOCX… Payyans should be able to read them and convert them to Unicode.

    I did a feasibility study and some research on the second feature. As Payyans is written in Python, interaction with OpenOffice can be implemented using PyUNO. What we need is to load an input file in .DOC or .ODT format, extract the text, and convert it to Unicode based on the ASCII font map.
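    To make the font-map step concrete, here is a minimal sketch of a map-driven conversion that also moves prebase vowel signs behind the consonant they belong to. The map entries and the names FONT_MAP, PREBASE_SIGNS and ascii_to_unicode are hypothetical illustrations, not taken from a real Payyans map file or its API, and the sketch ignores conjuncts and two-part vowel signs, which a real converter has to handle.

```python
# Hypothetical font map: the ASCII keys are illustrative only and are
# not copied from any real font's map file.
FONT_MAP = {
    "I": "\u0d15",  # MALAYALAM LETTER KA
    "a": "\u0d2e",  # MALAYALAM LETTER MA
    "s": "\u0d46",  # MALAYALAM SIGN E (written before the consonant)
    "m": "\u0d3e",  # MALAYALAM SIGN AA (written after the consonant)
}

# Vowel signs that ASCII fonts place before the base consonant but
# Unicode stores after it: signs E, EE and AI.
PREBASE_SIGNS = {"\u0d46", "\u0d47", "\u0d48"}


def ascii_to_unicode(text, font_map=FONT_MAP, prebase=PREBASE_SIGNS):
    """Map each ASCII glyph to Unicode, reordering prebase signs."""
    out = []
    pending = ""  # prebase signs waiting for their base consonant
    for ch in text:
        uni = font_map.get(ch)
        if uni is None:
            # Character not in the map: flush any dangling sign and
            # pass the character through unchanged.
            out.append(pending + ch)
            pending = ""
        elif uni in prebase:
            pending += uni
        else:
            # Base character: emit it, then the signs that preceded it.
            out.append(uni + pending)
            pending = ""
    out.append(pending)
    return "".join(out)
```

    With this toy map, the input "sI" (sign first, consonant second) comes out as the consonant followed by the sign, which is the order Unicode expects.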

    There are some examples of how to interact with OpenOffice using Python – how to start OpenOffice in listening mode, how to connect to the running instance, how to load a document, and how to write some text to the current document or search and replace a phrase.

    Disappointingly, though, there is no reference on how to extract only the text, stripping all the tags and formatting. Something that simple is not available anywhere in the wiki or forums! There are one or two code snippets to do this, but they didn’t work. I tried using the Enumeration technique, but the output text was cryptic. No luck.
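    For context, the paragraph-enumeration route mentioned above looks roughly like the sketch below. It assumes `document` is a Writer document obtained through PyUNO, and the function names are made up for illustration. It has not been verified against OpenOffice 3.0 and is not the approach that ended up working here.

```python
def iter_paragraph_strings(document):
    """Yield the plain string of each paragraph in a Writer document.

    `document` is assumed to be a text document loaded through PyUNO;
    any object offering the same methods works for experimenting.
    """
    enumeration = document.getText().createEnumeration()
    while enumeration.hasMoreElements():
        element = enumeration.nextElement()
        # The enumeration also returns tables; only paragraphs are
        # read here, so table contents are skipped in this sketch.
        if element.supportsService("com.sun.star.text.Paragraph"):
            yield element.getString()


def extract_plain_text(document):
    """Join all paragraph strings with newlines."""
    return "\n".join(iter_paragraph_strings(document))
```

    Because the functions only call methods on whatever object they are given, they can be tried out with simple stand-in objects before pointing them at a running office.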

    After a couple of days of exhaustive searching through reference documents, I found a way to convert the files to text files. The code snippet below does exactly that. First, an instance of OpenOffice has to be started. It can be started in “headless” mode, where you won’t see the window. Perfect. Do it this way:
    openoffice.org "-accept=socket,host=localhost,port=2002;urp;StarOffice.ServiceManager" -nologo -headless &
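    If the office instance is to be launched from Python rather than from a shell, something along these lines could work. This is a sketch under assumptions: the helper names are invented here, the launcher is assumed to be on PATH as openoffice.org, and on some installations the launcher hands off to another process, so the returned handle may not be the office itself.

```python
import socket
import subprocess
import time


def build_office_command(host="localhost", port=2002, binary="openoffice.org"):
    """Return the argument list for a headless, listening office."""
    accept = "-accept=socket,host=%s,port=%d;urp;StarOffice.ServiceManager" % (host, port)
    return [binary, accept, "-nologo", "-headless"]


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Poll until something accepts TCP connections on host:port."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            socket.create_connection((host, port), timeout=interval).close()
            return True
        except OSError:
            # Not listening yet; wait a little and retry.
            time.sleep(interval)
    return False


def start_office(host="localhost", port=2002):
    """Launch the office and block until its socket is reachable."""
    process = subprocess.Popen(build_office_command(host, port))
    if not wait_for_port(host, port):
        process.terminate()
        raise RuntimeError("office did not start listening on %s:%d" % (host, port))
    return process
```

    Waiting for the port before connecting avoids the race where the converter script runs before the office has finished starting up.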

    Pass the input file name (the file can be in any format OpenOffice understands) and the output file name (the output will be in Text format) as command-line arguments to this Python program:

    #!/usr/bin/env python
    #
    # convertTotext.py
    #
    # Copyright (c) 2009 Rajeesh K Nambiar <rajeeshknambiar@gmail.com>
    #
    # This program is free software; you can redistribute it and/or modify
    # it under the terms of the GNU General Public License as published by
    # the Free Software Foundation; either version 3 of the License, or
    # (at your option) any later version.
    #
    # This program is distributed in the hope that it will be useful,
    # but WITHOUT ANY WARRANTY; without even the implied warranty of
    # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    # GNU General Public License for more details.
    import sys
    #
    # This hack is required since pyuno installed is of standalone OOO-3.0 instead of Fedora's
    #sys.path.append('/opt/openoffice.org/basis3.0/program/')
    #import pyuno
    #
    # This bloody hack is required due to the PyUno bug
    import os
    #os.putenv('URE_BOOTSTRAP','vnd.sun.star.pathname:/opt/openoffice.org3/program/fundamentalrc')
    import uno
    #
    # get the uno component context from the PyUNO runtime
    localContext = uno.getComponentContext()
    #
    # create the UnoUrlResolver
    resolver = localContext.ServiceManager.createInstanceWithContext("com.sun.star.bridge.UnoUrlResolver", localContext)
    # connect to the running office
    ctx = resolver.resolve( "uno:socket,host=localhost,port=2002;urp;StarOffice.ComponentContext" )
    smgr = ctx.ServiceManager
    #
    # get the central desktop object
    desktop = smgr.createInstanceWithContext( "com.sun.star.frame.Desktop",ctx)
    #
    # load the input document; systemPathToFileUrl escapes spaces and
    # other special characters that a bare "file://" prefix leaves as-is
    infile = uno.systemPathToFileUrl(os.path.abspath(sys.argv[1]))
    outfile = uno.systemPathToFileUrl(os.path.abspath(sys.argv[2]))
    document = desktop.loadComponentFromURL(infile, "_blank", 0, ())
    #import uno
    # Needed for FilterName - to export to TXT
    from com.sun.star.beans import PropertyValue
    TXT = PropertyValue()
    TXT.Name = "FilterName"
    TXT.Value = "Text"
    document.storeAsURL(outfile, (TXT,))
    #
    # Close the document
    document.dispose()
    #
    # Do a nasty thing before exiting the python process. In case the
    # last call is a oneway call (e.g. see idl-spec of insertString),
    # it must be forced out of the remote-bridge caches before python
    # exits the process. Otherwise, the oneway call may or may not reach
    # the target object.
    # I do this here by making a cheap synchronous call (reading the
    # ServiceManager of the remote context).
    ctx.ServiceManager
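
    OpenOffice expects file URLs rather than plain paths, and paths containing spaces or non-ASCII characters need escaping. As a small standalone illustration using only the standard library, the sketch below converts an absolute POSIX path to a file URL and checks the argument count. The helper names to_file_url and parse_args are hypothetical and not part of the script above.

```python
from pathlib import PurePosixPath


def to_file_url(absolute_path):
    """Convert an absolute POSIX path to an escaped file:// URL.

    Raises ValueError for relative paths, so callers should pass the
    result of os.path.abspath() or similar.
    """
    return PurePosixPath(absolute_path).as_uri()


def parse_args(argv):
    """Return (input, output) or exit with a usage message."""
    if len(argv) != 3:
        raise SystemExit("usage: %s INPUT OUTPUT" % argv[0])
    return argv[1], argv[2]
```

    For example, to_file_url("/tmp/my doc.odt") gives "file:///tmp/my%20doc.odt", with the space escaped.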

    I need a break!

  • VLC crashing… Qt bug?

    February 8th, 2009

    My dry weekends used to be brightened up by some old classics. But two days back I received my copy of Gran Torino, which settles what I am up to on Sunday evening.

    Alas, after the opening scene, VLC simply disappeared. I just fired up a terminal to see what happens, and this is what I saw:
    No accelerated IMDCT transform found
    QPainter::begin: Paint device returned engine == 0, type: 1
    QPainter::setClipRegion: Painter not active
    QPainter::setClipping: Painter not active, state will be reset by begin
    QPainter::begin: Paint device returned engine == 0, type: 1
    Segmentation fault

    MPlayer plays the movie well, but only VLC has the provision to raise the volume to 400% (I later found that MPlayer can also do it, thanks Mace), which comes in handy with weak laptop speakers. So I ran “gdb vlc” and saw this:
    [Thread 0xac5f3b90 (LWP 3627) exited]
    Program received signal SIGSEGV, Segmentation fault.
    [Switching to Thread 0xb2b20b90 (LWP 3607)]
    0x00b377ec in _int_malloc () from /lib/libc.so.6
    (gdb) bt
    #0  0x00b377ec in _int_malloc () from /lib/libc.so.6
    #1  0x00b39765 in malloc () from /lib/libc.so.6
    #2  0x00542297 in operator new () from /usr/lib/libstdc++.so.6
    #3  0xb2d4b526 in QMutex::QMutex () from /usr/lib/libQtCore.so.4
    #4  0xb2d4de94 in QThreadData::QThreadData () from /usr/lib/libQtCore.so.4
    #5  0xb2d5044d in QThreadData::current () from /usr/lib/libQtCore.so.4
    #6  0xb2e5168d in QObject::QObject () from /usr/lib/libQtCore.so.4
    #7  0xb2d4d5c0 in QThread::QThread () from /usr/lib/libQtCore.so.4
    #8  0xb2d4e17e in ?? () from /usr/lib/libQtCore.so.4
    #9  0xb2d50479 in QThreadData::current () from /usr/lib/libQtCore.so.4
    #10 0xb2e5168d in QObject::QObject () from /usr/lib/libQtCore.so.4
    #11 0xb2d4d5c0 in QThread::QThread () from /usr/lib/libQtCore.so.4
    #12 0xb2d4e17e in ?? () from /usr/lib/libQtCore.so.4
    #13 0xb2d50479 in QThreadData::current () from /usr/lib/libQtCore.so.4
    #14 0xb2e5168d in QObject::QObject () from /usr/lib/libQtCore.so.4
    #15 0xb2d4d5c0 in QThread::QThread () from /usr/lib/libQtCore.so.4
    #16 0xb2d4e17e in ?? () from /usr/lib/libQtCore.so.4
    #17 0xb2d50479 in QThreadData::current () from /usr/lib/libQtCore.so.4
    #18 0xb2e5168d in QObject::QObject () from /usr/lib/libQtCore.so.4
    #19 0xb2d4d5c0 in QThread::QThread () from /usr/lib/libQtCore.so.4

    I guess it was all running fine a day back. I updated from the Fedora updates repo this morning; I don’t know if it’s related. Some other movies I have play fine, but some crash in a similar fashion. Need to investigate more.

    Update:

    Apparently, the issue occurs when dealing with subtitles. That is why I didn’t see the problem with many other movie files. This bug entry has a solution: updating to the packages in the updates-testing branch fixes the issue.

  • Chathans

    February 7th, 2009

    A few months ago, one fine morning, I logged into the #smc-project IRC channel. An unusual number of members were present that day, and we were completing the process of getting aspell-ml into Fedora. Meanwhile, Santhosh and Nishan were in the process of creating an ASCII-to-Unicode converter. The talk turned light, and we wandered through a myriad of topics. Somehow the discussion turned to Ani Peter’s “blog inauguration”, and then Santhosh pointed to my Malayalam blog, which prompted him to ask whether I am a fan of V.K.N. I said yes, and he replied that the feeling is mutual. But a few people present didn’t know who “Payyan” or “Chathans” is.

    Payyans and Chathans are arguably the most famous characters created by the exceptional genius V.K.N.

    A few days later, Santhosh and Nishan released the ASCII-to-Unicode converter, and they named the software “Payyans”!

    A couple of weeks later, I was trying to learn GTK+ programming and Glade. As an experiment, I wrote a small GUI for Payyans in GTK+, which resulted in a patch to Payyans and a tiny GTK+ application, which I duly named “Chathans”.

    A few months passed. I learned a little PyGTK programming and rewrote Chathans in PyGTK. Payyans was also improved substantially, gaining internal APIs so that its services can be used directly inside Python applications, a feature for bidirectional conversion, and so on. Later, I added the ability for translation to Chathans.

    So, we have released Chathans and documented the steps to obtain, install and use it in the SMC wiki.


    As always – comments, bug reports and patches are welcome.
