What algorithm is used in the Spark decision tree (ID3, C4.5, or CART)?
Question
I have a question about the decision tree in MLlib. Which algorithm does Spark use: ID3, C4.5, or CART?
Recommended answer
Spark MLlib's decision tree is best described as CART, built on the same greedy top-down tree induction that ID3 uses.
ID3 handles only categorical variables, whereas CART can also handle continuous variables. Spark decision trees handle continuous variables as well as categorical ones, so the implementation is CART (the Jira ticket mentioned below shows that C4.5 had not been implemented at the time of this answer).
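To make the distinction concrete, here is a minimal sketch of the kind of binary threshold split that CART performs on a continuous feature, scored with Gini impurity. It is plain Python for illustration only and is not Spark's code: the function names `gini` and `best_threshold` and the toy data are assumptions of this example, and Spark's actual implementation differs (for instance, it evaluates candidate splits over binned feature values rather than every midpoint).

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_threshold(values, labels):
    """Find the binary split `value <= threshold` with the lowest weighted Gini.

    Returns (threshold, weighted_gini). The threshold is None when no split
    improves on the impurity of the unsplit node.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))
    for i in range(1, n):
        low, high = pairs[i - 1][0], pairs[i][0]
        if low == high:
            continue  # identical values cannot be separated by a threshold
        threshold = (low + high) / 2  # midpoint between neighbouring values
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical toy data: a continuous feature and a binary class label.
ages = [22, 25, 30, 35, 40, 52]
bought = ["no", "no", "no", "yes", "yes", "yes"]

threshold, impurity = best_threshold(ages, bought)
print(threshold, impurity)
```

Plain ID3 has no equivalent of this threshold search: it branches once per category value, which is why it cannot use a numeric column such as `ages` without discretizing it first.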
In this blog post you can find some information about the different algorithms and it is where I got the answer from.
You can find a discussion on extending it to C4.5 in this Jira ticket.
More information about the difference between the algorithms here.