Code for identifying programming language in a text file

Views: 139 | Published: 2015/11/30 14:58:50 | Tags: c++, algorithm, text-processing, language-recognition

Problem description

I'm supposed to write code which, when given a text file (source code) as input, will output which programming language it is written in. This is the most basic definition of the problem. More constraints follow:

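I must write it in C++.
A wide variety of languages should be recognized - HTML, PHP, Perl, Ruby, C, C++, Java, C#...
The number of false positives (wrong recognitions) should be low - it is better to output "unknown" than a wrong result. (It will appear in the list of probabilities, for example as unknown: 100%, see below.)
The output should be a list of probabilities for each language the code knows, so if it knows C, Java and Perl, the output should be, for example: C: 70%, Java: 50%, Perl: 30% (note that the probabilities do not need to sum to 100%).
It should have a good accuracy/speed ratio (speed is favored a bit more).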
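To make the output shape from the list above concrete, here is a minimal sketch. It assumes each language already has an independent confidence in [0, 1]. The `Report` alias, the `make_report` function and the `threshold` parameter are hypothetical names chosen for illustration, not part of any existing library. The sketch adds an "unknown" entry that is 100% when no language is confident enough, and it does not normalize the scores.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical report type: language name -> independent confidence in [0, 1].
using Report = std::map<std::string, double>;

// Copies the per-language confidences and adds an "unknown" entry.
// If no language reaches `threshold` (an assumed tuning knob), "unknown" is
// reported as 1.0 (100%), so callers see "unknown" instead of a weak guess.
Report make_report(const Report& confidences, double threshold) {
    assert(threshold >= 0.0 && threshold <= 1.0);
    Report out;
    bool any_confident = false;
    for (const auto& entry : confidences) {
        out[entry.first] = entry.second;  // independent scores, need not sum to 1
        if (entry.second >= threshold) any_confident = true;
    }
    out["unknown"] = any_confident ? 0.0 : 1.0;
    return out;
}
```

With the example from the list (C: 0.7, Java: 0.5, Perl: 0.3) and a threshold of 0.6, the report keeps all three scores and sets "unknown" to 0. If every score were below the threshold, "unknown" would be 1.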

It would be very nice if the code could be written in such a way that adding new languages for recognition is fairly easy and involves just adding "settings/data" for that particular language. I can use anything available: a heuristic, a neural network, black magic. Anything. I am even allowed to use existing solutions, but the solution must be free, open source, and allow commercial usage. It must come in the form of easily integrable source code or as a static library; no DLL. However, I prefer writing my own code or just using fragments of another solution; I'm fed up with integrating other people's code. Last note: maybe some of you will suggest FANN (Fast Artificial Neural Network Library). This is the only thing I cannot use, since it is what we ALREADY use and we want to replace it.

Now the question is: how would you handle such a task, what would you do? Any suggestions on how to implement this or what to use?

EDIT: Based on the comments and answers, I must emphasize some things I forgot: speed is crucial, since this will get thousands of files and is supposed to answer fast, so looking at a thousand files should produce answers for all of them in a few seconds at most (the files will be small, of course, a few kB each). So trying to compile each one is out of the question. The thing is that I really want probabilities for each language, so I would rather know that the file is likely to be C or C++ but that the chance it is a bash script is very low. Due to code obfuscation, comments, etc., I think that aiming for 100% accuracy is a bad idea, and in fact it is not the goal here.
