Automated-refactoring tool to find similar duplicate source code for Java/JavaScript?


Question

I'm looking for a tool to find duplicate or similar code in Java/JavaScript. I can't give an exact definition of "similar", but I'd like the tool to be smart enough to give me advice on refactoring the code, e.g.:

(1) class A and class B have similar methods (e.g., 5 methods with the same name, the same arguments, and similar implementations appear in both classes); the tool should then advise moving these similar methods into a base class.
(2) class A contains similar lines of code in several different places; the tool should advise moving these similar lines into a single method.

I tried PMD, which can find duplicate code lines, but it's not clever enough. It did not find the similar code that is widespread in one of my projects.

Is there such a tool?

Answer

Our CloneDR tool finds duplicated code by comparing abstract syntax trees from parsers. (It comes in language-specific versions for many languages, including Java and JavaScript).

This means it can find cloned code in spite of format changes and modifications of the body of the clone, both of which are often done while cloning. Found clones match language concepts such as expressions, declarations, statements, functions, and even classes. Clones that are similar are reported along with the differences/variation points as proposed parameters.

It can find clone sets with multiple instances (we've some applications with hundreds of clones of a single bit of code), and it can find clones across many source files.

It produces HTML reports that are directly readable by people, and XML reports that can be processed by other downstream tools. (You can see some sample HTML reports via the link).

Similarity is hard to define, and in fact you can define it in many ways. CloneDR defines it as the number of identical elements (technically, AST nodes) across a clone set divided by the total number of elements across the clone set. This ratio is a value between 0 and 1. It is compared against a threshold; we've found that 95% is surprisingly robust as a threshold in terms of the quality of reported clones.
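To make the ratio concrete, here is a minimal sketch for a clone set of two instances. It is not CloneDR's actual implementation: the `Node` type, the position-by-position `shared` count, and the example trees are all assumptions made for illustration. It counts the nodes that match in both trees, counts each of those once per instance, and divides by the total number of nodes in the two trees.

```java
import java.util.List;

public class CloneSimilarity {
    // Hypothetical minimal AST node: a label plus ordered children.
    record Node(String label, List<Node> children) {
        static Node of(String label, Node... kids) {
            return new Node(label, List.of(kids));
        }

        // Number of nodes in the subtree rooted here.
        int size() {
            int n = 1;
            for (Node c : children) n += c.size();
            return n;
        }
    }

    // Counts nodes that are identical when the two trees are walked
    // top-down, position by position (an assumed, simplified matching).
    static int shared(Node a, Node b) {
        if (!a.label().equals(b.label())) return 0;
        int s = 1;
        int n = Math.min(a.children().size(), b.children().size());
        for (int i = 0; i < n; i++) {
            s += shared(a.children().get(i), b.children().get(i));
        }
        return s;
    }

    // Identical elements across the pair (each shared node appears in both
    // instances) divided by the total number of elements across the pair.
    static double similarity(Node a, Node b) {
        return (2.0 * shared(a, b)) / (a.size() + b.size());
    }

    public static void main(String[] args) {
        // total = price * qty + tax   versus   total = price * qty + fee
        Node left = Node.of("=", Node.of("id:total"),
                Node.of("+", Node.of("*", Node.of("id:price"), Node.of("id:qty")),
                        Node.of("id:tax")));
        Node right = Node.of("=", Node.of("id:total"),
                Node.of("+", Node.of("*", Node.of("id:price"), Node.of("id:qty")),
                        Node.of("id:fee")));

        double threshold = 0.95; // threshold value quoted in the answer
        double sim = similarity(left, right);
        System.out.println("similarity = " + sim + ", reported = " + (sim >= threshold));
    }
}
```

With a fragment this small, a single differing leaf already pushes the ratio well below 95%, which suggests why the same threshold is far more forgiving on larger clones.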

It is useful to establish a minimum size for interesting clones. a*b is a clone of x*y (with 2 parameters) but isn't useful to report because it is too small. CloneDR also uses a size threshold, which we call "line count" but which is in fact the size of the clone in elements divided by the average number of elements per line across the entire code base. This produces clones which usually have more lines than the threshold, but it will also find clones for enormous expressions that sit within a single line. We've found that 5-6 "lines" is also fairly robust in terms of reported clone quality.
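The "line count" arithmetic can be sketched as follows. The class name, method names, and the code-base figures in `main` are made up for illustration; only the formula (clone size in elements divided by average elements per line) and the 5-6 "line" threshold come from the description above.

```java
public class CloneSizeThreshold {
    // Average number of elements (AST nodes) per source line over the code base.
    static double averageElementsPerLine(long totalElements, long totalLines) {
        if (totalLines <= 0) {
            throw new IllegalArgumentException("code base must have at least one line");
        }
        return (double) totalElements / totalLines;
    }

    // Size of a clone expressed in "lines": its element count divided by
    // the code-base-wide average, regardless of how many physical lines it spans.
    static double effectiveLines(long cloneElements, double avgElementsPerLine) {
        return cloneElements / avgElementsPerLine;
    }

    static boolean isLargeEnough(long cloneElements, double avgElementsPerLine, double minLines) {
        return effectiveLines(cloneElements, avgElementsPerLine) >= minLines;
    }

    public static void main(String[] args) {
        // Assumed figures: 120,000 elements spread over 10,000 lines.
        double avg = averageElementsPerLine(120_000, 10_000);
        double minLines = 6.0;

        // a*b has 3 elements: far too small to report.
        System.out.println("a*b reported: " + isLargeEnough(3, avg, minLines));

        // A 90-element expression written on one physical line still qualifies.
        System.out.println("90-element expression reported: " + isLargeEnough(90, avg, minLines));
    }
}
```

Measuring size in elements rather than physical lines is what lets a single very long expression count as a clone while a trivial `a*b` is filtered out.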

This table shows how effective the AST matching approach of CloneDR is compared to many other clone detection tools (ranking it "very well"). The only one that comes close is CCDIML …. which is an academic re-implementation of the CloneDR approach. There are other approaches (namely PDG-based approaches) which can more effectively detect clones that are scattered about, but in practice, in my personal experience, people who clone code don't usually cut the cloned part into a bunch of separate pieces and scatter them about; they are just too lazy. YMMV.

[Table from: Roy, Cordy, Koschke: Comparison and Evaluation of Code Clone Detection Techniques and Tools: A Qualitative Approach, Science of Computer Programming, Volume 74, Issue 7, May 2009. This paper sketches many different clone detection approaches and evaluates their effectiveness.]

[PMD isn't listed, but it apparently uses Rabin-Karp string matching, which is "text based" according to the above table, rather than AST matching.]

Re OP's requirements:

CloneDR (and in fact any tool I know of) will NOT find a set of similar methods across multiple classes if those methods occur in different orders in the different classes. In this case, CloneDR is more likely to report the individual methods as clones; the net result is the same. It will find such a set if the members occur sequentially in the same order in the different classes, as happens when one class body has been copied wholesale from another.

Similar code blocks across multiple methods are quite commonly detected. The generated report shows how the similar code blocks are related, including an abstracted version of the code, which is essentially the parameterized code block you need for a method body.
