Why is my Spark SVM always predicting the same label?
Problem description
I'm having trouble getting my SVM to predict 0's and 1's where I would expect it to. It seems that after I train it and give it more data, it always wants to predict a 1 or a 0, but it will predict all 1's or all 0's, and never a mix of the two. I'm wondering if one of you could tell me what I'm doing wrong.
I've searched for "svm always predicting same value" and similar problems, and it looks like this is pretty common for those of us new to machine learning. I'm afraid though that I don't understand the answers that I've come across.
So I start off with this, and it more or less works:
from pyspark.mllib.regression import LabeledPoint
cooked_rdd = sc.parallelize([LabeledPoint(0, [0]), LabeledPoint(1, [1])])
from pyspark.mllib.classification import SVMWithSGD
model = SVMWithSGD.train(cooked_rdd)
I say "more or less" because
model.predict([0])
Out[47]: 0
is what I would expect, and...
model.predict([1])
Out[48]: 1
is also what I would expect, but...
model.predict([0.000001])
Out[49]: 1
is definitely not what I expected. I think that whatever is causing that is at the root of my problems.
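For intuition (a minimal plain-Python sketch, not MLlib's actual internals; the weight `w`, bias `b`, and threshold values are hypothetical): a linear SVM predicts by comparing the margin w·x + b against a threshold, so the cutoff between 0 and 1 lands wherever training put it, which can be arbitrarily close to one of the training points.

```python
# Illustrative sketch of a linear SVM's decision rule. The weight, bias,
# and threshold here are made up, not values MLlib actually learned.
def svm_predict(x, w=1.0, b=0.0, threshold=0.0):
    margin = w * x + b
    # The label flips as soon as the margin crosses the threshold.
    return 1 if margin > threshold else 0

print(svm_predict(0.0))       # 0: the margin sits exactly on the threshold
print(svm_predict(0.000001))  # 1: any positive margin crosses it
```

With these hypothetical values the boundary sits at x = 0, so even 0.000001 lands on the 1 side, mirroring the surprising prediction above.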
Here I start by cooking my data...
from random import random

def cook_data():
    x = random()
    y = random()
    dice = 0.25 + (random() * 0.5)
    if x**2 + y**2 > dice:
        category = 0
    else:
        category = 1
    return LabeledPoint(category, [x, y])

cooked_data = []
for i in range(0, 5000):
    cooked_data.append(cook_data())
... and I get a beautiful cloud of points. When I plot them I get a division with a little bit of a muddled area, but any kindergartner could draw a line to separate them. So why is that when I try drawing a line to separate them...
cooked_rdd = sc.parallelize(cooked_data)
training, testing = cooked_rdd.randomSplit([0.9, 0.1], seed = 1)
model = SVMWithSGD.train(training)
prediction_and_label = testing.map(lambda p : (model.predict(p.features), p.label))
...I can only lump them into one group, and not two? (Below is a list that shows tuples of what the SVM predicted, and what the answer should have been.)
prediction_and_label.collect()
Out[54]:
[(0, 1.0),
(0, 0.0),
(0, 0.0),
(0, 1.0),
(0, 0.0),
(0, 0.0),
(0, 1.0),
(0, 0.0),
(0, 1.0),
(0, 1.0),
...
And so on. It only ever guesses 0, when there should be a pretty obvious division where it should start guessing 1. Can anyone tell me what I'm doing wrong? Thanks for your help.
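One way to quantify this (a sketch in plain Python; a short hard-coded list of tuples stands in for the real `prediction_and_label` RDD):

```python
# A few (prediction, label) tuples like the output above, hard-coded
# so the arithmetic is easy to follow without a Spark context.
pairs = [(0, 1.0), (0, 0.0), (0, 0.0), (0, 1.0), (0, 0.0)]

errors = sum(1 for pred, label in pairs if pred != label)
error_rate = errors / len(pairs)
print(error_rate)  # 0.4 -- and every error is a missed 1
```

On the real RDD the equivalent count is `prediction_and_label.filter(lambda pl: pl[0] != pl[1]).count() / float(testing.count())`.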
I don't think it's a problem with scale, as was suggested in some other posts with similar problems. I've tried multiplying everything by 100, and I still get the same problem. I also tried playing with how I calculate my "dice" variable, but all I could do was change the SVM's guesses from all 0's to all 1's.
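A cheap sanity check on the data itself (plain Python, no Spark, repeating the same sampling logic as `cook_data` but returning only the category) confirms both classes are well represented, so the single-label output isn't just class imbalance:

```python
from random import random, seed

# Same sampling logic as cook_data, minus the LabeledPoint wrapper.
def cook_category():
    x = random()
    y = random()
    dice = 0.25 + (random() * 0.5)
    return 0 if x**2 + y**2 > dice else 1

seed(1)  # fixed seed so the check is repeatable
categories = [cook_category() for _ in range(5000)]
print(categories.count(0), categories.count(1))  # both counts are in the thousands
```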
Recommended answer
I figured out why it's always predicting either all 1's or all 0's. I need to add this line:
model.setThreshold(0.5)
That fixes it. I figured it out after using
model.clearThreshold()
followed by predicting the test data. That told me what the computer was predicting down to a floating-point value, not just the binary 0 or 1 I'm ultimately looking for. I could see that the SVM was making what I considered a counterintuitive rounding decision. By using setThreshold, I'm now able to get much better results.
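The mechanics, sketched in plain Python with made-up margin values (in MLlib, `model.clearThreshold()` makes `predict` return the raw score as a float, and `model.setThreshold(t)` restores binary output at cutoff `t`):

```python
# Hypothetical raw scores, like what predict returns after clearThreshold().
raw_scores = [0.12, 0.48, 0.51, 0.73, 0.95]

def classify(scores, threshold):
    # Binary output: 1 whenever the raw score exceeds the threshold.
    return [1 if s > threshold else 0 for s in scores]

print(classify(raw_scores, 0.0))  # [1, 1, 1, 1, 1] -- everything rounds to 1
print(classify(raw_scores, 0.5))  # [0, 0, 1, 1, 1] -- a mix, as hoped for
```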