
Urdu Search Engine being developed to meet linguistic needs

Published August 4, 2017

ISLAMABAD: The authorities concerned are developing an Urdu search engine to help address national and linguistic needs and to incubate much-needed expertise in this area of research and development.

The project has three areas of focus: high-performance distributed computing, content search optimization and local content management.

Online content will be crawled and can optionally be filtered according to users' opt-in requests. This will require developing both language identification and filtering algorithms.

Once the information is sifted, the indexing scheme will be tuned for efficient retrieval. The content will also be summarized for quicker access through computers and mobile phones, and will be stored initially on Amazon Web Services and later on local computing infrastructure. Presentation of the Urdu content will be tuned for access through various devices.

The National ICT Research and Development Fund, Ministry of Information Technology and Telecommunications, is funding the project, which is being developed by the Al-Khwarizmi Institute of Computer Science, University of Engineering and Technology, Lahore, at a cost of Rs. 33.22 million. The project is expected to be completed during 2019.

Official sources said on Friday that, although the work is based on open-source search technology, it still presents multifaceted challenges.

Automatic language identification is needed to ensure that Urdu content is appropriately tagged after crawling and not mixed with content in Persian, Arabic, Pushto and other languages that share vocabulary with Urdu. Further, search should be linguistically intelligent: ignoring Urdu stop words, providing proper tokenization and searching through the different morphologically relevant forms of Urdu keywords.
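The steps named above can be illustrated with a minimal Python sketch. It is not the project's implementation: the stop-word list is a tiny illustrative sample, the function names are hypothetical, and the language check is a crude script heuristic (looking for letters that occur in Urdu but not in standard Arabic or Persian spelling) rather than the statistical identification a real system would likely need. The same letters also appear in some other Pakistani languages, so the heuristic cannot separate Urdu from those.

```python
import re

# Illustrative sample only (assumption): a real stop-word list would be much longer.
URDU_STOP_WORDS = {"کا", "کی", "کے", "میں", "ہے", "اور", "سے", "کو", "پر"}

# Letters used in Urdu but not in standard Arabic or Persian orthography:
# U+0679 (ٹ), U+0688 (ڈ), U+0691 (ڑ), U+06BA (ں), U+06D2 (ے).
URDU_SPECIFIC_LETTERS = set("\u0679\u0688\u0691\u06ba\u06d2")


def tokenize(text):
    """Split text into word tokens.

    In Python 3, \\w is Unicode-aware, so Arabic-script letters are matched,
    while punctuation such as the Urdu full stop (U+06D4) and the Arabic
    comma (U+060C) is dropped. Diacritic marks are not handled here.
    """
    return re.findall(r"\w+", text)


def remove_stop_words(tokens):
    """Drop tokens that appear in the sample stop-word list."""
    return [t for t in tokens if t not in URDU_STOP_WORDS]


def looks_like_urdu(text):
    """Crude heuristic: True if any Urdu-specific letter occurs in the text."""
    return any(ch in URDU_SPECIFIC_LETTERS for ch in text)
```

A crawler could, for instance, call `looks_like_urdu` on a page before indexing it, then pass the page text through `tokenize` and `remove_stop_words` to obtain the keywords to index. Matching morphological variants of a keyword would need an additional stemming or lemmatization step that this sketch does not attempt.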

The sources said that, in addition, the resulting pages need to be ranked and ordered according to the optional user-initiated filtering. Finally, both summarizing the results for mobile phone access and determining users' choices and linguistically acceptable presentation forms require detailed analysis before implementation.

To build an initial user base for the search engine, a marketing campaign will be organized. As the user base grows, online contextual advertising and other services will also be introduced to generate revenue for the sustainability and growth of the project.

There will also be an opportunity to make user search trend data available for commercial use and policy development. In addition, the language technology for language identification, text summarization, content filtering, etc. can be commercialized independently.

The ability to get relevant Urdu information from online sources, with access through mobile phones, offers a great opportunity to the general public across Pakistan. It also opens further research opportunities to provide similar services in other Pakistani languages. The data will be crucial for enabling better online marketing at more affordable rates and for driving policy on online content development and its presentation. Thus, the project holds both social and economic promise at a national scale.

Research indicates that indigenously developed search engines are more successful in communities accessing localized content, primarily because they offer language- and culture-specific services.

For example, as of 2012 Google held only 8%, 22% and 31% of the search market in South Korea, China and Japan respectively, considerably smaller shares than those of locally developed search engines.

Copyright APP (Associated Press of Pakistan), 2017
