How hard would it be to translate a programming language to another human language?
Question
Let me explain. Suppose I want to teach Python to someone who only speaks Spanish. As you know, in most programming languages all keywords are in English. How complex would it be to create a program that will find all keywords in a given source code and translate them? Would I need to use a parser and stuff, or will a couple of regexes and string functions be enough?
If it depends on the source programming language, then Python and JavaScript would be the most important.
What I mean by "how complex would it be" is: would it be enough to have a list of keywords and scan the source code for keywords that are not inside quotes? Or are there enough syntactic oddities that something more complicated is required?
Recommended answer
If all you want is to translate keywords, then the task is quite simple (although you definitely DO need a proper parser, as otherwise avoiding any change in strings, comments, etc. becomes a nightmare). For example, since you mentioned Python:
# Python 2 code: cStringIO and the print statement do not exist in Python 3.
import cStringIO
import keyword
import token
import tokenize

# Sample source to translate.
samp = '''\
for x in range(8):
    if x%2:
        y = x
        while y>0:
            print y,
            y -= 3
        print
'''

# Very partial English-to-Italian keyword mapping.
translate = {'for': 'per', 'if': 'se', 'while': 'mentre', 'print': 'stampa'}

def toks(tokens):
    # Replace keyword tokens; pass every other token through unchanged.
    for tt, ts, src, erc, ll in tokens:
        if tt == token.NAME and keyword.iskeyword(ts):
            ts = translate.get(ts, ts)
        yield tt, ts

def main():
    rl = cStringIO.StringIO(samp).readline
    toki = toks(tokenize.generate_tokens(rl))
    print tokenize.untokenize(toki)

main()
I hope it's obvious how to generalize this to "translate" any Python source into any language (I'm supplying only a very partial Italian keyword translation dict). This emits:
per x in range (8 ):
    se x %2 :
        y =x
        mentre y >0 :
            stampa y ,
            y -=3
        stampa
(The whitespace is strange, though correct, and that could be remedied easily enough.) As an Italian speaker I can tell you this is terrible to read, but that's par for the course for any "programming language translation" of the kind you want. Worse, NON-keywords such as range remain untranslated (as per your specs). Of course, you don't have to constrain your translation to keywords only: it's easy enough to remove the keyword.iskeyword test from the if above ;-).
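For readers on Python 3, here is a minimal sketch of the same tokenizer-based idea, not a definitive implementation. It makes a few assumptions: the names translate_source, KEYWORDS and NAMES are hypothetical, and so is the Italian word chosen for range. In Python 3, print is an ordinary function name rather than a keyword, so it goes in the optional mapping for non-keyword names. Passing the original token positions back to tokenize.untokenize should also keep the original spacing instead of the odd whitespace shown above.

```python
import io
import keyword
import tokenize

# Hypothetical, very partial English-to-Italian mappings (illustration only).
KEYWORDS = {'for': 'per', 'if': 'se', 'while': 'mentre'}
# Non-keyword names such as builtins; translated only when passed in.
NAMES = {'print': 'stampa', 'range': 'intervallo'}


def translate_source(source, keywords=KEYWORDS, names=None):
    """Return `source` with NAME tokens replaced according to the mappings.

    String literals and comments are separate token types, so they are
    passed through untouched.
    """
    names = names or {}
    out = []
    readline = io.StringIO(source).readline
    for tok in tokenize.generate_tokens(readline):
        text = tok.string
        if tok.type == tokenize.NAME:
            if keyword.iskeyword(text):
                text = keywords.get(text, text)
            else:
                text = names.get(text, text)
        # Keep the original start/end positions so that untokenize()
        # reproduces the spacing of the original source.
        out.append((tok.type, text, tok.start, tok.end, tok.line))
    return tokenize.untokenize(out)


sample = '''\
for x in range(8):
    if x % 2:
        print("if x is odd")  # for and while stay untouched here
'''

print(translate_source(sample, names=NAMES))
```

Calling translate_source(sample) without the names argument translates keywords only, which matches the behaviour asked for in the question. The words "if", "for" and "while" inside the string literal and the comment are left alone in both cases, which is the part a regex-only approach tends to get wrong.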