Swarajya


Why Sanskrit Is Important For Indian Language Computing

Prof K Ramasubramanian | Mar 02, 2015, 12:32 AM | Updated Feb 24, 2016, 04:26 PM IST


That Sanskrit has a Context Free Grammar (CFG) is a myth. But there are other things about Sanskrit that make the language very important from the perspective of computer applications, especially for Indian languages.

In recent times we have had a lot of debate on the nature and importance of Sanskrit. Sriram Shankar’s piece (‘Is Sanskrit really the most scientific language’) is a welcome addition to this debate.

Besides effectively bringing out certain interesting features of the Sanskrit language, such as the “one-to-one mapping between orthography (what you write) and phonography (what you say)”, Sriram Shankar (SS) primarily raises two questions in his article: Is Sanskrit the most scientific language? Is it best suited for computer science?

Unfortunately, SS seems to have no clear definition of what a “scientific language” is, or of what it means to be “suited for computer science”. Moreover, he has confounded the issue by introducing yet another misconception: that Sanskrit grammar is a Context Free Grammar (CFG). This misconception has been in the air for quite some time, and SS has only added to it. In truth, there is NO ‘natural’ language whose grammar can be declared a CFG, and Sanskrit is certainly no exception (see section 2). For clarity, we present this note in three sections:

1. Review of Sriram Shankar’s article.

2. Illustration of why Sanskrit grammar is NOT a Context Free Grammar

3. General observations and specific suggestions to move forward

Review of Sriram Shankar’s article

While the grammars of all known ‘natural’ languages fall under the category of Context Sensitive Grammars (CSG), SS informs the reader that Sanskrit has a CFG. This misconception is surely the outcome of (i) not having direct, first-hand exposure to Panini’s works, and (ii) reading unreliable secondary material from the web and other sources. It is strange that SS did not realize that if the grammars of Sanskrit or other natural languages were CFGs, our fond hope of having computer programs that can analyse sentences in natural languages (such as English) would have been realized long ago. Even today, commercial products for analysing the grammar and style of any natural language fall short of the ideal. Despite this, SS states:

Sanskrit comes very close to having a context-free grammar, thanks to Panini,

which clearly shows that he does not know what a CFG is. Moreover, consider the following statement made by SS:

In the paper, Briggs describes . . . he goes on to state that Sanskrit is most suited for this task, because of its context-free grammar.

This shows that SS has thoroughly misunderstood Briggs’s paper. In fact, the statement makes one wonder whether SS has read the paper at all, since Briggs does not mention CFGs anywhere in it.

Towards the end of his article, after making the guarded statement “Sanskrit is an ancient language with a rich literature and one of the first formally described grammar structures”, SS concludes: “most of the commentary that abounds on the internet is half-educated”. Ironically, his own article falls into the same category!

Having pointed out some of the factual errors in SS’s article, we would now like to draw attention to certain other aspects of it. It is not clear why SS chose to open his article, on a supposedly serious topic, with an unnecessary dig at the NDA government, immediately followed by sarcastic remarks about certain events organized (or statements made) at the Indian Science Congress 2015. Such statements, besides diverting the readers’ attention, also give the article a political slant, which is highly undesirable, particularly when one wants to engage the reader on a serious topic.

Sanskrit Grammar is not a CFG: Illustration

In this section, we use two simple examples of sandhi (euphonic combination) rules to illustrate why Sanskrit grammar is not a CFG.

Example 1:

Suppose we would like to combine the two words

एक + एकम्          (1)

Here the last letter of the first word is अ, and the first letter of the second word is ए. As per the rule given by the sutra वृद्धिरेचि (6.1.88), the two letters are replaced by the single letter ऐ. That is,

अ+ए = ऐ            (2)

Hence (1) becomes,

एक् + ऐ + कम् = एकैकम्        (3)

Example 2:

Suppose we would like to combine the two words

प्र + एजते                    (4)

As in the previous case (1), here again the last letter of the first word is अ, and the first letter of the second word is ए. Since the conditions appear identical, one would expect rule (2) to come into operation, giving the combined form प्रैजते. But that is not the case!

Actually, another rule, given by the sutra एङि पररूपम् (6.1.94), comes into operation and blocks the above form. Without getting into finer details, it essentially states that if the first word is a prefix (upasarga) such as प्र and the second word happens to be a ‘verb’ commencing with ए, then the two letters have to be combined as prescribed by this sutra, which may be written as

अ + ए = ए   (choose the latter form) (5)

Hence (4) becomes,

प्र + ए + जते = प्रेजते                                   (6)
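The two examples can be mimicked in a few lines of Python. This is only a toy sketch under stated assumptions, not an implementation of Panini’s rules: the function names (`join_a_plus_e`, `ends_in_inherent_a`) are hypothetical, the code handles nothing but a word ending in the inherent अ followed by a word beginning with ए, and the caller has to say whether the second word is a verb following a prefix, because that grammatical fact cannot be read off the two letters themselves.

```python
# Toy model of the two sandhi rules in Examples 1 and 2 (an illustrative
# assumption, not a faithful implementation of the Ashtadhyayi).
# Words are Devanagari strings.

INDEPENDENT_E = "\u090F"  # ए as a standalone vowel letter
SIGN_E = "\u0947"         # े  dependent vowel sign for ए
SIGN_AI = "\u0948"        # ै  dependent vowel sign for ऐ


def ends_in_inherent_a(word: str) -> bool:
    """True if the word ends in a bare consonant (क..ह), i.e. carries the inherent अ."""
    return "\u0915" <= word[-1] <= "\u0939"


def join_a_plus_e(first: str, second: str, verb_after_prefix: bool) -> str:
    """Combine a word ending in अ with a word beginning with ए.

    verb_after_prefix encodes the grammatical context: True when `first` is a
    prefix such as प्र and `second` is a verb form.
    """
    if not (ends_in_inherent_a(first) and second.startswith(INDEPENDENT_E)):
        raise ValueError("this toy model only handles अ + ए")
    if verb_after_prefix:
        sign = SIGN_E   # एङि पररूपम् (6.1.94): अ + ए = ए
    else:
        sign = SIGN_AI  # वृद्धिरेचि (6.1.88): अ + ए = ऐ
    # The vowel sign replaces the inherent अ; the initial ए of `second` is absorbed.
    return first + sign + second[1:]


print(join_a_plus_e("एक", "एकम्", verb_after_prefix=False))  # एकैकम्
print(join_a_plus_e("प्र", "एजते", verb_after_prefix=True))   # प्रेजते
```

The letters being replaced are अ and ए in both calls, yet the output differs; the only thing that decides between the two rules is the extra `verb_after_prefix` argument, i.e. information from outside the two letters.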

Thus, although the left-hand sides of (2) and (5) are identical, entirely different rules operate on (1) and (4), which clearly demonstrates that Sanskrit grammar is context-sensitive and not context-free. One can present umpteen such examples to prove the point. Be that as it may, let us now make some general observations prompted by SS’s article, and also offer some specific suggestions.

General observations and specific suggestions to move forward

The advantage of Sanskrit is that it has a completely formulated grammar, Panini’s Ashtadhyayi. No other natural language can claim to have such a ‘full-fledged’ grammar. Moreover, Sanskrit grammar is an example (perhaps the only example?) of what is called a generative grammar, in contrast to the descriptive grammars that most other languages of the world have.

Generative grammars are those which, on the basis of a well-developed lexicon (कोश) and a set of rules (विधि), can generate ‘all’ and ‘only’ the acceptable expressions of a language. Descriptive grammars, by contrast, simply try to describe or state regularities, and do not aim to generate ‘all’ and ‘only’ the accepted expressions of the language. This being the case, several of Panini’s concepts and methods have inspired similar concepts and methods in modern linguistics and computer science.
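To make the distinction concrete, here is a toy generative grammar in Python. Everything in it is an assumption for illustration: a two-word lexicon, three nominative endings, and just enough sandhi to join them, all in IAST transliteration. Panini’s grammar works at a vastly larger scale, but the shape is similar: a lexicon plus rules yields ‘all’ the licensed forms, and a membership test accepts ‘only’ those.

```python
from itertools import product

# Hypothetical toy lexicon (कोश): two masculine a-stems, in IAST transliteration.
LEXICON = ["rāma", "deva"]

# Hypothetical toy rule set (विधि): nominative endings by number.
NOMINATIVE_ENDINGS = {"singular": "s", "dual": "au", "plural": "as"}


def attach(stem: str, ending: str) -> str:
    """Join an a-stem and an ending, applying only the sandhi steps needed here."""
    if ending == "au":
        form = stem[:-1] + "au"               # a + au -> au (vṛddhi)
    elif ending.startswith("a"):
        form = stem[:-1] + "ā" + ending[1:]   # a + a -> ā
    else:
        form = stem + ending
    if form.endswith("s"):
        form = form[:-1] + "ḥ"                # word-final s -> visarga (ḥ)
    return form


def generate() -> set[str]:
    """Every form licensed by the lexicon and the rules ('all')."""
    return {attach(stem, ending)
            for stem, ending in product(LEXICON, NOMINATIVE_ENDINGS.values())}


def is_acceptable(word: str) -> bool:
    """Nothing outside the generated set is accepted ('only')."""
    return word in generate()


print(sorted(generate()))
```

A descriptive grammar would correspond, roughly, to a list of observed forms with no procedure that produces them. Here `is_acceptable("rāmas")` is False because the rules always turn a word-final s into ḥ, so the unsandhied form is never generated.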

Further, the Sanskrit grammarians have arrived at a very systematic method of analysing the meaning of a given sentence (शाब्दबोध). It is this point that Briggs was trying to highlight in his paper. In fact, he even argues that the method adopted by Sanskrit grammarians for paraphrasing a statement is akin to what is referred to as knowledge representation in Artificial Intelligence. With sufficient research, I am sure we can discover several novel techniques by seriously contemplating and investigating the concepts, methods and techniques of the Indian grammarians. The question is: how are they relevant to computer science?

At this point, we can only say that they are potentially of great relevance, a fact borne out by the several concepts spawned by the study of Sanskrit grammar by Europeans in the 19th and 20th centuries! But we are yet to harness any of this potential. One way to do so is by initiating a national program for developing all the standard computer applications for processing Indian languages. By this we mean applications such as string search, spell check, grammar check, and finding synonyms and antonyms, which are widely available for English, Chinese, French, German, Russian, or even Spanish.

As of today, even though several texts in various local languages of India (Sanskrit, Tamil, Malayalam, Telugu, Bengali, Marathi, etc.) are available in digital format (thanks to Google!), most of them are scanned images, and we do not have any text processing facilities (searching for a string, obtaining the meaning, and so on) available for them. We also do not have ‘satisfactory’ software applications that can read digital texts aloud, converting them into speech, whether for Sanskrit or for other local languages. When even these basic processing facilities are unavailable, there is little point in speaking of machine translation tools that can translate a sentence from one language into another! Thus, there is indeed a need to develop:

1. Text processing tools to handle Sanskrit and other Indian languages.

2. Text-to-speech and speech-to-text converters that can read texts aloud to people, as well as transcribe what they dictate, in Sanskrit and other regional languages.

3. Machine translation tools to move from one language to the other.

Of the three applications mentioned above, machine translation is one of the most important applications of Natural Language Processing (NLP), and perhaps the most challenging one too. Since it takes text from one language, generally called the source language (SL), to another, called the target language (TL), the machine has to understand the meaning of the text in the SL thoroughly. This is the most difficult part, and it is precisely here that the wonderful and highly systematic structure of grammar (‘scientific’, if one may call it so) created by Panini has been found extremely useful. This is what one means by saying that Sanskrit, or Sanskrit grammar, is ‘useful for computer science’, and certainly not in the sense of developing a programming language like Java or C.

While attempting machine translation, instead of moving from SL to TL directly, it was felt that it would be convenient to employ a pseudo-interlingua approach, in which an attempt is made to remove all ambiguities. Here too, the highly systematic and logical approach adopted by Panini is found extremely useful, and linguists have developed morphological analysers, as well as karaka analysers, based on Paninian grammar. Though there has been reasonable success in this regard from the attempts made so far, there is a lot more to be done.
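A toy version of the karaka-based pseudo-interlingua idea can be sketched in Python. All names and the four-form lexicon are hypothetical, the input is assumed to be already split into words with sandhi undone, and the case-to-karaka mapping is simplified to active-voice sentences; real morphological and karaka analysers built on Paninian grammar are far more elaborate. What the sketch shows is why the intermediate representation helps: Sanskrit marks roles by case endings rather than word order, so different orderings of the same sentence yield the same role frame, from which a fixed-order English sentence can be generated.

```python
# Hypothetical mini-lexicon: inflected form -> (English gloss, case).
# Input is assumed to be split into words with sandhi already undone.
NOUN_FORMS = {
    "rāmaḥ": ("Rama", "nominative"),
    "odanam": ("rice", "accusative"),
    "sthālyām": ("pot", "locative"),
}
VERB_FORMS = {"pacati": "cook"}  # present tense, 3rd person singular, active

# Simplified case -> karaka mapping, valid only for active-voice sentences.
CASE_TO_KARAKA = {
    "nominative": "kartā",      # agent
    "accusative": "karma",      # object
    "locative": "adhikaraṇa",   # location
}


def analyse(sentence: str) -> dict[str, str]:
    """Map each word to its role in a frame; word order does not matter."""
    frame: dict[str, str] = {}
    for word in sentence.split():
        if word in VERB_FORMS:
            frame["kriyā"] = VERB_FORMS[word]
        elif word in NOUN_FORMS:
            gloss, case = NOUN_FORMS[word]
            frame[CASE_TO_KARAKA[case]] = gloss
        else:
            raise KeyError(f"form not in toy lexicon: {word}")
    return frame


def to_english(frame: dict[str, str]) -> str:
    """Generate a fixed-order English sentence from the role frame."""
    text = f"{frame['kartā']} {frame['kriyā']}s {frame['karma']}"
    if "adhikaraṇa" in frame:
        text += f" in the {frame['adhikaraṇa']}"
    return text


print(to_english(analyse("rāmaḥ odanam pacati")))           # Rama cooks rice
print(to_english(analyse("sthālyām odanam pacati rāmaḥ")))  # Rama cooks rice in the pot
```

Because the frame, not the word order, carries the meaning, the generation step for a target language only has to know how to order roles in that language.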

Needless to say, we need brilliant computer scientists to take up this problem in order to accomplish the task of machine translation. But computer scientists alone will not be able to move much further without the help of grammarians and linguists. What is most desirable is a blend of all these skills in the same individual. To begin with, we at least need groups consisting of both sets of people, who can understand each other’s language.

It is high time that we focused our attention on training a large group of scholars who have high competence in modern computer science, Sanskrit grammar and modern Indian languages, most of which would have their base in Sanskrit. In the process of developing such a group, we may also produce a set of experts who could become leaders of innovation!
