Why is my Spark SVM always predicting the same label?
Question
I'm having trouble getting my SVM to predict 0's and 1's where I would expect it to. It seems that after I train it and give it more data, it always wants to predict a 1 or a 0, but it will predict all 1's or all 0's, and never a mix of the two. I'm wondering if one of you could tell me what I'm doing wrong.
I've searched for "svm always predicting same value" and similar problems, and it looks like this is pretty common for those of us new to machine learning. I'm afraid though that I don't understand the answers that I've come across.
So I start off with this, and it more or less works:
from pyspark.mllib.regression import LabeledPoint
cooked_rdd = sc.parallelize([LabeledPoint(0, [0]), LabeledPoint(1, [1])])
from pyspark.mllib.classification import SVMWithSGD
model = SVMWithSGD.train(cooked_rdd)
I say "more or less" because
model.predict([0])
Out[47]: 0
is what I would expect, and...
model.predict([1])
Out[48]: 1
is also what I would expect, but...
model.predict([0.000001])
Out[49]: 1
is definitely not what I expected. I think that whatever is causing that is at the root of my problems.
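For a linear SVM the prediction is just the sign of a weighted sum, so a tiny positive input can land on the 1 side of the boundary. Here's a minimal plain-Python sketch (not the MLlib internals; the weight and intercept are hypothetical) of how predict([0.000001]) can come out as 1:

```python
# Sketch of a linear SVM decision rule: predict 1 when the margin
# w.x + b is above the threshold, else 0. The weight and intercept
# below are hypothetical, not values MLlib actually learned.
def svm_predict(features, weights, intercept, threshold=0.0):
    margin = sum(w * x for w, x in zip(weights, features)) + intercept
    return 1 if margin > threshold else 0

weights, intercept = [1.0], -1e-9  # hypothetical fitted parameters

print(svm_predict([0], weights, intercept))         # 0
print(svm_predict([1], weights, intercept))         # 1
print(svm_predict([0.000001], weights, intercept))  # 1 -- any x above 1e-9 crosses the boundary
```

With a threshold of 0, anything even slightly on the positive side of the hyperplane gets labeled 1, which is consistent with the surprising result above.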
Here I start by cooking my data...
from random import random

def cook_data():
    x = random()
    y = random()
    dice = 0.25 + (random() * 0.5)
    if x**2 + y**2 > dice:
        category = 0
    else:
        category = 1
    return LabeledPoint(category, [x, y])

cooked_data = []
for i in range(0, 5000):
    cooked_data.append(cook_data())
... and I get a beautiful cloud of points. When I plot them I get a division with a little bit of a muddled area, but any kindergartner could draw a line to separate them. So why is that when I try drawing a line to separate them...
cooked_rdd = sc.parallelize(cooked_data)
training, testing = cooked_rdd.randomSplit([0.9, 0.1], seed = 1)
model = SVMWithSGD.train(training)
prediction_and_label = testing.map(lambda p: (model.predict(p.features), p.label))
...I can only lump them into one group, and not two? (Below is a list that shows tuples of what the SVM predicted, and what the answer should have been.)
prediction_and_label.collect()
Out[54]:
[(0, 1.0),
(0, 0.0),
(0, 0.0),
(0, 1.0),
(0, 0.0),
(0, 0.0),
(0, 1.0),
(0, 0.0),
(0, 1.0),
(0, 1.0),
...
And so on. It only ever guesses 0, when there should be a pretty obvious division where it should start guessing 1. Can anyone tell me what I'm doing wrong? Thanks for your help.
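Since prediction_and_label is just (prediction, label) pairs, it's easy to quantify how bad this is. A plain-Python sketch over the ten collected pairs shown above (no Spark needed):

```python
# Accuracy over the collected (prediction, label) pairs listed above.
pairs = [(0, 1.0), (0, 0.0), (0, 0.0), (0, 1.0), (0, 0.0),
         (0, 0.0), (0, 1.0), (0, 0.0), (0, 1.0), (0, 1.0)]
correct = sum(1 for pred, label in pairs if pred == label)
accuracy = correct / len(pairs)
print(accuracy)  # 0.5 -- the model is right only on the 0-labeled points
```

On the RDD itself the same count can be done without collecting, e.g. prediction_and_label.filter(lambda pl: pl[0] == pl[1]).count() divided by testing.count().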
I don't think it's a problem with scale, as was suggested in some other posts with similar problems. I've tried multiplying everything by 100, and I still get the same problem. I also tried playing with how I calculate my "dice" variable, but all I can do is change the SVM's guesses from all 0's to all 1's.
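One more sanity check worth doing (plain Python, no Spark; cook_point mirrors the cook_data function above but returns a bare label): confirm the generated data really does contain both classes, so the all-0 output can't be blamed on the data itself.

```python
from random import random, seed

def cook_point():
    # Same labeling rule as cook_data above, minus the LabeledPoint wrapper.
    x, y = random(), random()
    dice = 0.25 + (random() * 0.5)
    return 0 if x**2 + y**2 > dice else 1

seed(1)  # fixed seed so the check is repeatable
labels = [cook_point() for _ in range(5000)]
print(labels.count(0), labels.count(1))  # both classes are well represented
```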
Answer
I figured out why it's always predicting either all 1's or all 0's. I need to add this line:
model.setThreshold(0.5)
That fixes it. I figured it out after using
model.clearThreshold()
clearThreshold, followed by predicting the test data. That showed me the raw floating-point scores the model was producing, rather than just the binary 0 or 1 I'm ultimately looking for. I could see that the SVM was making what I considered a counterintuitive rounding decision. By using setThreshold, I'm now able to get much better results.
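The effect can be sketched in plain Python: after clearThreshold() the model hands back raw scores, and setThreshold(t) is what maps a score to 0 or 1. The scores below are made-up examples, not actual model output:

```python
# Thresholding sketch: raw score -> binary label, as setThreshold does.
def apply_threshold(raw_score, threshold):
    return 1 if raw_score > threshold else 0

raw_scores = [0.37, 0.62, 0.48, 1.05]  # hypothetical raw margins
labels = [apply_threshold(s, 0.5) for s in raw_scores]
print(labels)  # [0, 1, 0, 1]
```

Inspecting the raw scores this way makes it obvious where the cutoff should sit, instead of guessing why every prediction collapses to the same label.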