Pakistan: Internet and the challenge of language

This post is part of our special coverage Languages and the Internet.

Pakistan today would seem primed for rapid growth in internet use. The country has seen explosive growth in FM radio, satellite and cable TV, set in motion by regulatory changes that allow non-state ownership of mass media. Cell phone use has also skyrocketed, with over 90 million subscribers. With a growing middle class that numbers some 30-40 million in a country of some 180 million people, internet use should see similar growth.

However, there are several constraints that limit that expansion, both structural, as in chronic electricity shortages, and social, particularly focused on language. Literacy hovers at around 50% in Pakistan, but while most people understand Urdu, Pakistan’s national language, less than 10% of the population speaks and writes it as a native language. Provincial languages such as Sindhi, Punjabi, Pashto, and Balochi, as well as regional languages such as Seraiki and Kashmiri, are native languages for the majority of the population, and English is the official language of governance.

This language fragmentation has consequences for internet use. No one Pakistani language effectively serves both the reading and content creation needs of Pakistan’s netizens. As a consequence, English remains the popular choice online. In an interview, Adnan Rehmat of Intermedia Pakistan says that this is because English is an “aspirational” language, a marker of education and access to resources, and because it provides access to a global linguistic community. Additionally, several regional Pakistani languages such as Punjabi are primarily oral languages, without strong literary cultures.

Fouad Bajwa, writing on Internet’s Governance, describes the problem further:

A key pressing issue with relevance to both the local Internet and Mobile Technology scenario in Pakistan has been availability of local content and making the local content widely accessible to the community at large across Pakistan and the entire world using a variety of currently available technology platforms.

There have been few concerted efforts to create Unicode fonts for Pakistani language scripts. Nastaliq, the popular script style for Urdu, is not yet widely supported by Unicode fonts. Most online writing either uses an Arabic font, as with the relatively popular BBC and Google fonts, or it uses image files pasted into text.
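The difference between an encoding and a font can be shown with a rough sketch. It assumes only Python's standard unicodedata module; the sample word and headline are chosen here purely for illustration and do not come from any of the sites mentioned. Urdu text is stored as code points from Unicode's Arabic block, including Urdu letters such as gaf, and whether it appears in Nastaliq or in a Naskh-style Arabic face is decided by the font when the text is displayed.

```python
import unicodedata

# The word "Urdu" in Urdu script (alef, reh, dal, waw), written with
# escapes so the exact code points are unambiguous.
word = "\u0627\u0631\u062f\u0648"

# A hypothetical headline, "Urdu blog"; the last letter, gaf (U+06AF),
# is used in Urdu and Persian but not in Arabic itself.
headline = word + " \u0628\u0644\u0627\u06af"


def describe(text):
    """Return (code point, official Unicode name) for each non-space character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch))
        for ch in text
        if not ch.isspace()
    ]


for code_point, name in describe(headline):
    print(code_point, name)

# Encoded text can be searched like any other string, which is not
# possible when the same words are published as an image file.
print(word in headline)
```

Every name printed begins with "ARABIC LETTER", which is why an Arabic font can display Urdu text even when no Nastaliq font is installed. The final line shows the practical benefit of encoded text: it can be searched, linked, and indexed.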

There is not yet a broadly accepted font in use for either mass media or citizen media production. Many mainstream media still use image files, which requires that the text be composed on another platform and discourages hyperlinking, as with a recent issue of the Daily Jang online.

Screenshot of Daily Jang ePaper, May 2010

The Pakistani government has provided little policy guidance for language use. In an interview, Ahmad Shahzad of Bytes for All notes that the National Language Authority of Pakistan lacks resources, knowledge of digital issues, and a sense of urgency or policy priorities for expanding Pakistani languages online.

A number of projects have been working to fix this problem over the past decade. Perhaps the most comprehensive comes out of the Centre for Research in Urdu Language Processing (CRULP), at Lahore’s National University of Computer & Emerging Sciences. The Centre’s director, Professor Sarmad Hussain, has been working to support Nastaliq in Unicode since 2002. The Centre describes its objective as to “conduct research for the evolution of computational models of Urdu and Pakistan’s other regional languages.” Its projects develop standard character sets, localize popular software and online applications such as Microsoft Word, Firefox, and Open Office, and build script processing for fonts that can support all Pakistani languages.

They are also working on optical character recognition, speech processing tools such as screen readers for illiterate and blind users, language processing tools such as spell checkers and machine translation, CMS platforms in Nastaliq, and mobile scripts.

Additionally, CRULP’s PAN Localization project is working to develop local language computing capacity in a dozen Asian languages, including Urdu, Pashto, and Bangla. The project seeks to develop tools to facilitate the localization of advanced applications.

These scripts and their wider promotion, as well as the availability of content management systems in Urdu and language processing tools, have gone some way to making Urdu a functional language of content creation.

Other tools now available facilitate the shift from English to Urdu, including Google’s Urdu transliteration tool and the Dynamic Language Tools Bookmarklet, which supports transliteration of Urdu to both English and Hindi. Syed Ghulam Akbar, the bookmarklet’s creator, describes his motivation in a post on the Pakistani science blog STEP:

The main inspiration behind this tool development was not actually Urdu writing. In fact, there are many existing tools and applications which let users type Urdu either using a special keyboard layout or by using roman script transliteration. What actually inspired me to develop this tool was to provide a way to easily convert the roman content on all the existing web-pages to Urdu script so that it is more readable.

Together, advances in Urdu scripts, applications, and platforms will go some way toward fostering a culture of online production in Urdu. The relative lag in their availability does, however, reinforce the general sense that English will continue to be the language of choice for many in Pakistan’s online world.

This lag can be addressed in several ways, including wide promotion of available tools and their application, support for both mass media and citizen media communities to discover, learn about, and implement creative use of these tools, and support to build bridges and networks among communities. For this reason, Fouad Bajwa is seeking to build an Online Urdu Encyclopedia:

It will create a converged environment overtime for presenting updated knowledge that is usable through reading, listening and visuals for both social and economic awareness, education, knowledge application in various fields, higher education, competitive exams, expert resources and endless Urdu language options.

At present there is no Urdu Wikipedia community, there are few Urdu-language blog aggregators (such as http://urdublogs.co.cc/), mainstream media have little capacity to produce searchable-text, Unicode-based online media, and mobile telephony platforms and applications for Urdu are lacking.

11 comments

  • […] May 5, 2010 in General | Tags: ICT4D, pakistan This post was originally published on Global Voices. […]

  • I wish you would have researched a little more before writing this piece because some of the statements don’t seem to reconcile with the facts.

    1. Urdu Wikipedia is an active community ur dot wikipedia dot org
    2. There’s the Urdu Mehfil forum, which is very active no matter how you define “ACTIVE”.
    3. Several Urdu blogs are written/updated on a daily basis (Urdu Web Planet, Mawarai Feed, Urdu ke sab rang are the aggregators, to name a few).

    The Urdu blogging community has been using a Nastaliq Unicode font for a long time now; Nafees and Naskh are also widely used, and almost every Linux distribution and newer operating system has support for Urdu built in.

    Jahan-e-Qalam has added tens of Unicode-based Urdu fonts to its archive and is still an active community when it comes to fonts and all.

    The newspaper screenshot in the post is from the eEdition of the published paper; otherwise almost every news outlet has its own Unicode-based Urdu web site. Jang, Express, Aaj TV, to name a few.

    Yes there are problems, yes there are issues related to machine translation and transliteration, but the situation is not as grim as was painted in the article.

    Note: Links were skipped to avoid spam filters, but any of these resources pretty much links to them all.

  • Thanks for your comments and corrections Rashid:

    1. Why is there no link to the Urdu Wikipedia site on the wikipedia.org language list?
    2. I didn’t mean to imply that there’s only one Urdu blog aggregator; only that there are considerably more in English.

    My comments on blogger and mainstream media use of Unicode Nastaliq come from interviews with numerous publications, TV stations, and bloggers – the point is not that it doesn’t exist, but that the energy in the Pakistani blogosphere is visibly in English, and is felt to be such by the people I spoke to. I’m glad to hear that not everyone feels the situation is so difficult.

    Yes, many papers have Unicode-based sites – I absolutely agree; but not that many have Nastaliq Unicode – Express is in the process of building their own, for instance. The epaper image (note I said it’s an epaper) was simply there to illustrate the standard font.

    Happy to learn more and appreciate your corrections.

  • “1. Why is there no link to the Urdu Wikipedia site on the wikipedia.org language list?”

    I think you are missing something again. The link to the Urdu Wikipedia site does exist on wikipedia.org. Please find the word “اردو”.

  • We have quite a few Urdu bloggers in Pakistan, but they are discouraged because Urdu blogging is not very user friendly. It’s because all of us focused on English sites when the internet was launched in Pakistan; then we jumped into blogging, which has boomed in the recent past. Now the Urdu bloggers have realized that developing Urdu sites might help promote their work as well. I hope in a year or so we will have numerous Urdu blog aggregators and sites as well.

  • Ivan, I really appreciate that you spared some time to respond. Just in case anyone is following the comments:

    1. There is indeed a link to Urdu Wikipedia on Wiki’s home page; they don’t list too many languages in the main search drop down though. But search does work on Urdu Wiki.

    2. Why papers don’t use Nastaliq; their choice. Technology is available and out there. Still without any fancy fonts the apps are equally useful.

    Plus we have already added an iPhone app (Urdu Blog Reader) to aggregate and link urdu blogs on this exploding mobile platform.

    Urdu blogging is relatively new and, I agree, is slow to respond to global events, but lately a lot of people have started taking Urdu blogging very seriously, and now we can say with confidence that the entire political and ideological spectrum is well represented and anything from politics to technology is debated. Some well-known columnists and some young journalists have also started their Urdu blogs, and this could make it mainstream rather quickly. Things are going to change sooner rather than later.

    I thank you for bringing these things to this platform and I hope readers who didn’t know about this aspect of Urdu’s internet presence can now explore it further.

  • I’m wondering what the linguistic and technological constraints are for content that’s not too dependent on writing systems, such as video and audio. I can’t seem to find many video or audio blogs in Urdu or Pashto. Maybe because posting video and audio takes more steps than simply tapping text on a keyboard? Or maybe because my search phrases are in English and non-Roman-alphabet blogs are indexed in whatever script they are written in? Or is it a chicken-and-egg problem where the lack of online communities communicating and sharing in these languages only increases the motivation for people to blog in English where they can find more community – the social need trumping available technical tools in most cases? CMS’s like Joomla, Drupal and WordPress have modules for Urdu and/or Pashto translations. Maybe we can learn from the growth of Global Voices? When do online communities achieve critical mass? How can media development organizations use the research of linguistic organizations like the Frontier Language Institute – http://www.fli-online.org/ – or SIL – http://www.sil.org/?

  • Thank you for a great review of Pakistani media. I would like to tell you that we have a rich Urdu blogging community. In my opinion English is always confusing for Pakistanis. Whatever the literacy rate is in Pakistan, a person who can write his name in English cannot necessarily understand a blog written entirely in English. Urdu blogging is very easy for Pakistanis to understand and discuss, but the problem is that it is not getting any attention from the government or from other media like television and radio. Urdu blogging has been doing well in the last few years: arranging contests among bloggers, celebrating blogging weeks, issuing awards to bloggers whose writing is outstanding, and publishing monthly reviews. One important community pillar is Manzarnama dot com; the Urdu Blogs aggregator is also a great effort to gather thoughts in one place, and there is a great community at Urdu Wikipedia too.

  • يا سر ~ ɹısɐʎ

    * Thanks, Ivan, for the article which brings the issues together concisely. I am happy to hear great comments.
    * The energy that urdu blogs are generating is obvious. some of my friends who are writers in urdu have taken up blogging recently.
    * But most importantly it’s the tools, and Ivan is right that we need to spread the tools that are available and to focus on developing some. While the transliteration tool does help, we still need the translate tool urgently.
    * Since Google has worked out the Hindi, Persian and Arabic translate tools, and since Urdu’s grammar and syntax are like Hindi’s, its script is a Persian variant, and some of its vocabulary is shared with Hindi and Persian, an Urdu tool for accessing translated web content on the fly should not be something we have to wait for.
    * Best wishes. Yasir, Karachi.

  • يا سر ~ ɹısɐʎ

    Urdu has just been added, along with 4 other languages, to Google Translate, which means you can now automatically translate websites, blogs, Gmail, or any other text to and from Urdu.
    http://googletranslate.blogspot.com/2010/05/five-more-languages-on.html
