Thursday, March 03, 2011

Jeopardy Computer, Sanskrit, Artificial Intelligence

mar 1st, 2011 CE

thanks for your kind note. 

personally, having been a computer scientist at one point in my life, i am quite familiar with machine understanding of grammar.

panini's grammar of 2500 years ago is the first-ever 'context-free grammar', and it can be represented in the panini-backus form (also known as the backus-naur form). each statement in paninian sanskrit is unambiguous and has a single meaning. such statements can be broken into tokens by a lexical analyzer like lex and then parsed, so that a semantic tree of the meaning is built up automatically.
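As a rough illustration of what parsing with a context-free grammar means, here is a minimal Python sketch. The grammar is a hypothetical toy written in BNF style for the ASCII-transliterated sentence "ramah vanam gacchati" ("Rama goes to the forest"). It is not Panini's rule system, and instead of lex it uses a small backtracking recursive-descent parser, which is one common way to turn a context-free grammar into a parse tree.

```python
"""Toy context-free grammar plus a backtracking recursive-descent parser."""

# Hypothetical toy grammar in BNF style. Each nonterminal maps to a list of
# alternatives; each alternative is a sequence of symbols. Any symbol that is
# not a key of GRAMMAR is a terminal string that must appear literally.
#   <sentence> ::= <word> " " <sentence> | <word>
#   <word>     ::= <stem> <ending>
GRAMMAR = {
    "sentence": [["word", " ", "sentence"], ["word"]],
    "word": [["stem", "ending"]],
    "stem": [["rama"], ["vana"], ["gaccha"]],
    "ending": [["ti"], ["h"], ["m"]],
}


def parse_symbol(symbol, text, pos):
    """Yield (tree, next_pos) for every way `symbol` can match text at pos."""
    if symbol not in GRAMMAR:  # terminal: match the literal string
        if text.startswith(symbol, pos):
            yield symbol, pos + len(symbol)
        return
    for alternative in GRAMMAR[symbol]:
        for children, end in parse_sequence(alternative, text, pos):
            yield (symbol, children), end


def parse_sequence(symbols, text, pos):
    """Yield (list_of_trees, next_pos) for a sequence of symbols."""
    if not symbols:
        yield [], pos
        return
    for first_tree, mid in parse_symbol(symbols[0], text, pos):
        for rest_trees, end in parse_sequence(symbols[1:], text, mid):
            yield [first_tree] + rest_trees, end


def parse(text, start="sentence"):
    """Return every parse tree that covers the whole input."""
    return [tree for tree, end in parse_symbol(start, text, 0) if end == len(text)]


trees = parse("ramah vanam gacchati")
print(len(trees))      # 1
print(trees[0][1][0])  # ('word', [('stem', ['rama']), ('ending', ['h'])])
```

Each tree is a nested (nonterminal, children) tuple, and counting the complete parses is one simple way to check whether a grammar is ambiguous for a given input. A left-recursive rule would make this naive parser loop forever, which is why the toy grammar is written right-recursively.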

the concept of context-free grammars was reinvented around 1960 by john backus of IBM and peter naur, who used the notation to specify the ALGOL programming language, so that computers could be given unambiguous instructions.

the key question regarding the jeopardy computer, though, is whether it actually *understands* anything. here's a very interesting article on that by a philosopher at uc berkeley: http://online.wsj.com/article/SB10001424052748703407304576154313126987674.html?mod=googlenews_wsj

---------- Forwarded message ----------
From: A


Rajeev:

I wanted to post the following in the comments section of the post where you asked for ideas about the Jeopardy computer. I have a long post on this, and Blogger does not let me post more than 4000 words in one comment. I have many interesting points on this topic. If possible, could you please put the following in a separate post on the main blog? I think it deserves attention.

Also, please make sure my name does not appear in the post. I want to remain anonymous. You could perhaps use only my first name.

Thanks for supporting the cause.

A

__________________________________________________________________________________________________


The concept of the Jeopardy computer was imagined quite a while ago. It is nothing but Artificial Intelligence. And as Indians, we are expected to know that Sanskrit is an ideal language for implementing Artificial Intelligence. You may ask why. I will explain below. It is a bit long, but I think it is very insightful.

(If your audience is Indian, I think you should emphasize that the Sanskrit language would make the ultimate Jeopardy computer.)

The big feature of the Sanskrit language is that changing the order of words in a sentence will:

1. not make the sentence lose its meaning, and
2. leave the reordered sentence with the same meaning as the original.

(Many Indian languages share this feature too. Try it in Malayalam and you will see.)
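The claim that reordering words leaves the meaning intact can be stated as a property a program can check: if each inflected form already signals its role, every permutation of the sentence yields the same role assignment. The sketch below assumes a hypothetical, hand-tagged three-word lexicon in ASCII transliteration; it is not a real morphological analyser.

```python
from itertools import permutations

# Hypothetical hand-tagged lexicon: inflected form -> (stem, role signalled
# by its ending). "ramah vanam gacchati" = "Rama goes to the forest".
LEXICON = {
    "ramah": ("rama", "agent"),     # nominative singular
    "vanam": ("vana", "goal"),      # accusative singular
    "gacchati": ("gam", "action"),  # present tense, 3rd person singular
}


def interpret(words):
    """Map each role to its stem; the position of a word is never consulted."""
    return {LEXICON[w][1]: LEXICON[w][0] for w in words}


sentence = ["ramah", "vanam", "gacchati"]
readings = {frozenset(interpret(order).items()) for order in permutations(sentence)}
print(len(readings))  # 1: all six word orders give the same interpretation
```

The order-independence here comes entirely from the lexicon: `interpret` never looks at where a word sits, so an English-style parser that relies on position would have to track word indices instead.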

This feature of Indian languages, which makes the order of words within a sentence irrelevant, is greatly underrated. It needs to be exploited, and the benefits could be very significant. Why? I will try to explain some of my ideas below.

Today, to get a computer to comprehend human instructions, the sentences we feed it have to follow a fixed format, partly because English requires the words of a sentence to appear in a particular order (and partly because of the way the compilers were programmed). If enough programming effort went into creating a compiler that could read Sanskrit, no such limitation would exist. We could get a computer to accept the words of a Sanskrit sentence in any order (as in real life), identify every part of speech, and obtain the meaning.

The suffix added to the end of every root word (which could be a noun, verb, adjective, etc.) changes the word so that its role within the sentence is uniquely identifiable. No additional words have to be placed in the sentence in a particular order to explain what that word does, unlike in English, where prepositions and word order do that job and cause major confusion.
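The role-marking idea can be sketched as suffix stripping: look at how a word ends and read its case, and hence a simplified role, off a table. This is a minimal Python sketch assuming a tiny hypothetical ending table for masculine a-stem nouns plus one present-tense verb ending, in ASCII without length marks. The role labels are simplifications, and a real analyser would need far larger tables and sandhi (sound-combination) rules. The example sentence "ramah ravanam banena tadayati" is meant as "Rama strikes Ravana with an arrow".

```python
# Hypothetical ending table for masculine a-stem nouns (like rama) and one
# verb ending, in ASCII transliteration without length marks. Each ending
# maps to (grammatical case or form, simplified role label).
ENDINGS = {
    "ah": ("nominative", "agent"),
    "am": ("accusative", "object"),
    "ena": ("instrumental", "instrument"),
    "aya": ("dative", "recipient"),
    "asya": ("genitive", "possessor"),
    "e": ("locative", "location"),
    "ati": ("present 3rd singular", "action"),
}

# Check longer endings first so that, if two endings could both match,
# the longer (more specific) one wins.
BY_LENGTH = sorted(ENDINGS, key=len, reverse=True)


def analyse(word):
    """Split a word into (base, case, role) using its ending alone."""
    for ending in BY_LENGTH:
        if word.endswith(ending) and len(word) > len(ending):
            case, role = ENDINGS[ending]
            return word[: -len(ending)], case, role
    raise ValueError(f"no known ending on {word!r}")


def roles(sentence):
    """Return {role: base} for a whitespace-separated sentence in any order."""
    return {role: base for base, _, role in map(analyse, sentence.split())}


print(roles("ramah ravanam banena tadayati"))
print(roles("ramah ravanam banena tadayati") == roles("banena tadayati ravanam ramah"))
```

The "base" returned is simply what is left after stripping the ending, not a dictionary root.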

Because of this, a word in an Indian language carries much more information than the corresponding word in English. A verb describing an action in an Indian language will also tell us whether the action is performed by a male or a female, by one person or by a group of people, and in the past or in the future. A verb in an English sentence tells us little beyond the type of action performed. The same is true of other parts of speech, such as nouns and adverbs.
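To illustrate how much a single verb ending can encode, the sketch below reads person, number and tense off the present-tense active endings of a thematic Sanskrit verb, using bhu "to be" (bhavati, bhavanti, ...) in ASCII transliteration without length marks. It covers only this one paradigm and leaves gender out: in Sanskrit, gender is marked on participles such as gatah/gata ("gone", masculine/feminine) rather than on finite verbs like these.

```python
# Present-tense, active-voice (parasmaipada) endings of a thematic Sanskrit
# verb such as bhu "to be" (bhavati), in ASCII transliteration without
# vowel-length or visarga marks. Each entry is (ending, person, number).
PRESENT_ENDINGS = [
    ("ati", 3, "singular"), ("atah", 3, "dual"), ("anti", 3, "plural"),
    ("asi", 2, "singular"), ("athah", 2, "dual"), ("atha", 2, "plural"),
    ("ami", 1, "singular"), ("avah", 1, "dual"), ("amah", 1, "plural"),
]


def decode_verb(word):
    """Read person, number and tense off the ending of a present-tense verb."""
    for ending, person, number in PRESENT_ENDINGS:
        if word.endswith(ending):
            return {"stem": word[: -len(ending)], "person": person,
                    "number": number, "tense": "present"}
    raise ValueError(f"not a present-tense form this table knows: {word!r}")


for form in ["bhavati", "bhavanti", "bhavamah"]:
    print(form, decode_verb(form))
```

For this paradigm no form ends in more than one of the listed endings, so a simple first-match scan is enough.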
 
Such rules would reduce the programming effort needed to build an Artificial Intelligence computer like the one that competed on Jeopardy.

These rules also keep ambiguity out of the Sanskrit language. There is little doubt about the purpose of each word in a sentence, wherever it is placed. This would greatly improve a computer's ability to interpret real-life sentences correctly and let us design more tools to increase productivity.

The effectiveness of Jeopardy computers will ultimately be limited by the restrictions of the English language. These restrictions do not apply to Sanskrit.

Imagine that you have 100 pages of text in Sanskrit, and you want to feed them to a computer program so that it comprehends the content and you can query it for information. This can only be done effectively with a language like Sanskrit.
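Once sentences have been reduced to roles, querying the text becomes a matter of matching roles. The sketch below assumes, hypothetically, that each sentence of the text has already been analysed into a mapping from grammatical role to an English-glossed word; the facts and the `ask` helper are illustrative only.

```python
# Hypothetical facts, assuming each sentence of the text has already been
# analysed into a mapping from grammatical role to (English-glossed) word.
FACTS = [
    {"agent": "Rama", "action": "go", "goal": "forest"},
    {"agent": "Sita", "action": "go", "goal": "forest"},
    {"agent": "Ravana", "action": "abduct", "object": "Sita"},
]


def ask(facts, wanted, **known):
    """Return the `wanted` role of every fact that matches all `known` roles."""
    return [fact[wanted] for fact in facts
            if wanted in fact and all(fact.get(r) == v for r, v in known.items())]


print(ask(FACTS, "agent", action="go", goal="forest"))        # ['Rama', 'Sita']
print(ask(FACTS, "object", agent="Ravana", action="abduct"))  # ['Sita']
```

Because the facts are keyed by role rather than by word position, the same question works no matter how the original sentences were ordered.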

For more on why Sanskrit is an ideal language for computers, see the link below:

http://www.aaai.org/ojs/index.php/aimagazine/article/view/466/402

Features of the Sanskrit language will also enable translation between Indian languages that is more effective than anything achieved before.

Many Indian languages trace their roots to Sanskrit, the language Panini's grammar describes, and they are similar in structure. The closer the grammars of two languages are to Panini's grammar, the easier and more accurate computer translation between them becomes. We know that a word in an Indian language consists of a root word plus a meaning-altering word ending (or suffix). Therefore, to convert a sentence from one Indian language to another, one simply replaces the root word in the first language with the corresponding root word in the second, and then replaces the word ending with the corresponding word ending in the second language. When this is done for every word in the sentence, the new sentence will still be meaningful, and it will have the same meaning as the original. There will be no ambiguity, and no meaning will be lost in translation. You could repeat this process for every sentence in an essay and automatically translate it into as many compatible languages as you want.
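The root-and-ending substitution described above can be sketched in a few lines of Python. To avoid putting words in any real language's mouth, the two "languages" here are invented (tam, vel, ul, ir and dor, mak, en, as are made up), and each morpheme maps to a hypothetical language-neutral label such as HOUSE or LOCATIVE.

```python
# Two invented toy languages (not real Indian languages). Each root and each
# ending maps to a language-neutral label, so translation is a lookup.
LANG_A = {"roots": {"tam": "HOUSE", "vel": "GO"},
          "endings": {"ul": "LOCATIVE", "ir": "PRESENT.3SG"}}
LANG_B = {"roots": {"dor": "HOUSE", "mak": "GO"},
          "endings": {"en": "LOCATIVE", "as": "PRESENT.3SG"}}


def split_word(word, lang):
    """Split a word into (root meaning, ending meaning) for the given language."""
    for root, meaning in lang["roots"].items():
        ending = word[len(root):]
        if word.startswith(root) and ending in lang["endings"]:
            return meaning, lang["endings"][ending]
    raise ValueError(f"cannot analyse {word!r}")


def build_word(meaning, feature, lang):
    """Join the target language's root and ending for the given labels."""
    root = next(r for r, m in lang["roots"].items() if m == meaning)
    ending = next(e for e, f in lang["endings"].items() if f == feature)
    return root + ending


def translate(sentence, source, target):
    """Replace every root and every ending, word by word."""
    return " ".join(build_word(*split_word(w, source), target)
                    for w in sentence.split())


print(translate("tamul velir", LANG_A, LANG_B))  # doren makas
```

The sketch works only because every root and every ending has exactly one counterpart in the other language, which is the assumption the paragraph above relies on; where that correspondence breaks down, a lookup like this cannot produce a translation.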

How could computer translation of this kind be put to use? I think in many ways. A computer could read a body of text in one Indian language and automatically output text in a different Indian language with no grammatical errors. A novelist writing in one Indian language could instantly convert the novel into 20 other languages. The body of knowledge in one Indian language could quickly be extended to every language in India. A webpage in one Indian language could be translated into any other Indian language at the touch of a button. An engineer could convert an industrial drawing of a spacecraft from one Indian language into another with a single click. This would promote collaboration between teams that speak different languages and may be hundreds of kilometers apart.
 
Source code for software written in one Indian language could be automatically converted into source code in a second Indian language. The new, computer-generated source code would be accepted by a compiler for the second language and produce an executable file that runs as intended, without any errors.
 
You could make it possible to submit any government form in any language at any location. Let's say a Bengali woman moves to Kerala. She could log on to the government website to file her taxes and enter the information in Bengali. The website would automatically convert her forms into Malayalam, store them, and send her a refund. She keeps using her own language. The government of Kerala is happy because it can collect all information in the language of its choice. Nobody is threatened, and nobody's language is threatened. Everyone is happy.

When something like this is implemented, language barriers will become irrelevant. We could take this concept even further. An important feature of Indian languages is that they are phonetic (there is no ambiguity between how words are written and how they are pronounced). For this reason, they are especially well suited to voice recognition, and we could combine voice recognition with language translation to get very good results.
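The phonetic point can be illustrated with a letter-by-letter Devanagari transliterator: every consonant letter carries an inherent a, a dependent vowel sign replaces it, and the virama (्) removes it. This is a minimal sketch over a small, hypothetical subset of letters, producing IAST-style output. It reads words the Sanskrit way, keeping the final a, whereas Hindi, for example, usually drops it (राम is pronounced rām), so a real system would need per-language rules.

```python
# A small, hypothetical subset of Devanagari. Each consonant letter carries an
# inherent "a"; a dependent vowel sign replaces that "a"; the virama removes it.
CONSONANTS = {"क": "k", "ग": "g", "त": "t", "न": "n", "म": "m", "र": "r", "स": "s"}
VOWELS = {"अ": "a", "आ": "ā", "इ": "i", "उ": "u"}  # independent vowel letters
VOWEL_SIGNS = {"ा": "ā", "ि": "i", "ी": "ī", "ु": "u", "ू": "ū", "े": "e", "ो": "o"}
VIRAMA = "्"


def to_sounds(word):
    """Transliterate Devanagari letter by letter, Sanskrit-style (final 'a' kept)."""
    syllables = []
    for ch in word:
        if ch in CONSONANTS:
            syllables.append(CONSONANTS[ch] + "a")  # inherent vowel
        elif ch in VOWEL_SIGNS:
            syllables[-1] = syllables[-1][:-1] + VOWEL_SIGNS[ch]
        elif ch == VIRAMA:
            syllables[-1] = syllables[-1][:-1]  # bare consonant
        elif ch in VOWELS:
            syllables.append(VOWELS[ch])
        else:
            raise ValueError(f"letter {ch!r} is outside this small table")
    return "".join(syllables)


for word in ["नमस्ते", "गुरु", "राम"]:
    print(word, to_sounds(word))
```

Each letter or sign maps to exactly one sound here, which is the property the paragraph above points to; no lookup of whole-word spellings is needed.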

We could create a mobile device that would:

1. accept voice input in one language,
2. convert that input into text,
3. translate that text into a second language, and
4. play back the text in the second language.
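The four steps can be wired together as a simple pipeline in which each stage's output feeds the next. In this Python sketch the stages are hypothetical stubs (no real speech or translation engine is called), the "audio" is just UTF-8 bytes, and the one-entry phrase table maps the Telugu "ela unnaru" to the Gujarati "kem cho", both meaning "how are you". Step 1 is the device's input and step 4 is its returned audio.

```python
from typing import Callable, List

# Hypothetical stand-ins for the stages. A real device would call
# speech-recognition, translation and speech-synthesis engines here; these
# stubs only pass strings and bytes along so the wiring can be tested.
PHRASES = {"ela unnaru": "kem cho"}  # toy table: Telugu -> Gujarati, "how are you"


def speech_to_text(audio: bytes) -> str:
    return audio.decode("utf-8")  # stub: pretend the audio is its transcript


def translate_text(text: str) -> str:
    return PHRASES.get(text, text)  # stub: one-entry phrase table


def text_to_speech(text: str) -> bytes:
    return text.encode("utf-8")  # stub: pretend the bytes are audio


def make_device(stages: List[Callable]) -> Callable:
    """Chain the stages so the output of each one feeds the next."""
    def run(audio_in):
        data = audio_in
        for stage in stages:
            data = stage(data)
        return data
    return run


device = make_device([speech_to_text, translate_text, text_to_speech])
print(device(b"ela unnaru"))  # b'kem cho'
```

Keeping the stages as plain functions means a real recognizer, translator or synthesizer could replace any one stub without touching the others.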
 
For instance, when this happens, a Telugu man on a six-month work assignment in Gujarat would not need to learn Gujarati. Instead, he would carry a device that translates spoken Gujarati into spoken Telugu, and what he says in Telugu into Gujarati. The fact that the country has no common language would be of no consequence.
 
With such a system, everyone in the country would enjoy the benefits of living in a large country (and economy) while its diversity is maintained. Speakers of every language would be able to access any knowledge and information available in any other Indian language. People would not avoid a language just because its body of literature is small. No population would feel left behind or find cause to revolt. There would be no loss of communication for lack of a common language. Students would not need to burden themselves with learning three different languages, which would free up their energy to become better professionals in less time. This would boost productivity and raise people's wages and standard of living.

You could tell me that computer translation has already been attempted by many websites, and that tools to translate from English to Japanese already exist. But the results of these attempts have not been encouraging. As I said before, English is a very flawed language. Indian languages, with their roots in Panini's grammar, are very well suited to computerized translation from one language to another. Translation between Indian languages would be of much more practical use than translation between any other languages of our world. (I am not just bragging.) I believe it is only a matter of time before someone starts a company selling software to translate between Indian languages. The people ruling the country need to have such a vision to make this happen sooner. The current rulers are more interested in IPL matches and in making money from outdated 2G spectrum mobile services.

8 comments:

Unknown said...

It is indeed sad that the useless UPA government has set aside money in the budget for Islamic institutions propagating Arabic while spending nothing to promote Sanskrit.

Pagan said...

@Chitrakutdesh:
Well, even Murli Manohar Joshi, HRD Min in NDA govt, did more to promote Urdu than Sanskrit or even other Indian languages. Let us not forget that. And they foolishly used the ELM for their India Shining campaign. ELM followers account for a tiny fraction of vote share and they hardly go out to vote. No wonder 'India Shining' boomeranged on them.

daisies said...

the problem with sanskrit is that it is methodical to the extreme. therefore, it is unnatural and difficult to use.

or perhaps, we could say - it is highly left-brained.

for the more right-brained, artistic, creative types, less structure and more simplicity is preferred.

in any case - all languages in the world have two versions - the spoken (vernacular) and the formal (text-book).

it just means that the human brain, which is left+right combo, cannot stay with fixed rigid rules.

best use of sanskrit is perhaps in shlokas and chanting - the sounds have great (and proven) effects on the mind-body system.

JS said...

so why don't you wipe them out? They are certainly wiping you out.

Unknown said...

@inferno

Who is discussing the NDA here? The NDA was a compromise government, not a true RSS/BJP government for the people of India. It certainly was better than having an illiterate and Indophobic Italian ruling the country.

@daisies

Sanskrit alone should be the national language. Talk of it being too difficult should not be a hindrance in this age of computers. Are you trying to say we should remain backward only because progressing and using better tools takes too much effort? If our leaders think in this manner, India will never be a prosperous nation.

Contrast your attitude toward Sanskrit with the Japanese attitude toward the Kanji script. Kanji are extremely difficult and take several years of effort to master, but the Japanese still use them in everyday life and in technical and administrative documents. Nobody in Japan says Kanji need to go because they are too difficult to learn.

Sanskrit alone can unite the country with a common national language, and it should be done ASAP.

Unknown said...

@daisies


actually, if you ask why we should choose Sanskrit given its difficulty, it means you did not understand the message of the post.

Sanskrit is an ideal language for implementing artificial intelligence. It is an essential tool for progress, and you want to tell us India should avoid it because of its complexity.

This means you want India to remain in the dump instead of innovating, improving its people's lives, and leading the world in science and technology. (Are you a UPA supporter?? You certainly sound like one now.)

I suggest you re-read the post until you get the gist of it. Then you will not ask such a stupid question.

drisyadrisya said...

Hold conversations in Sanskrit too. It is simple!

Raja said...

Daisies,
Is Sanskrit only fit for chanting "shlokas and mantras"? You are horribly ignorant! If you had ever programmed in a modern computer language, you would know about the BNF grammar that forms the basis for all these modern computer languages. Do you know where this grammar came from? It is a direct adaptation of Panini's Sanskrit grammar! Now don't call me a Chaddi or a Sanghi. I am a computer scientist, and I did my master's in computer science in the USA. The course I took on computer languages referred to Panini's grammar and how Backus and Naur adapted it. Unfortunately, as most whites do, they shamelessly did not acknowledge it, though amendments have been made recently.