Building SVM with tensorflow's LinearClassifier and Panda's Dataframes
Problem description
I'm aware of this question, but it is for an outdated function.
Let's say I'm trying to predict whether a person will visit country 'X' given the countries they have already visited and their income.
I have a training data set in a pandas DataFrame that's in the following format.
- Each row represents a different person, each unrelated to the others in the matrix.
- The first 10 columns are all names of countries and the values in the column are binary (1 if they have visited that country or 0 if they haven't).
- Column 11 is their income. It's a continuous decimal variable.
- Lastly, column 12 is another binary column that says whether or not they have visited 'X'.
So essentially, if I have 100,000 people in my dataset, then I have a dataframe of dimensions 100,000 x 12. I want to be able to properly pass this into a linear classifier using tensorflow, but I'm not even sure how to approach this.
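I am trying to pass the data into this function:
```python
# sparse_column_a, sparse_feature_a_x_sparse_feature_b, n_classes and label_keys are defined elsewhere
estimator = LinearClassifier(
    n_classes=n_classes,
    feature_columns=[sparse_column_a, sparse_feature_a_x_sparse_feature_b],
    label_keys=label_keys)
```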
(If there's a better suggestion on which estimator to use, I'd be open to trying that.)
And I'm passing data as:
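```python
import numpy as np
import pandas as pd
import tensorflow as tf  # TF 1.x estimator API

df = pd.DataFrame(np.random.randint(0, 2, size=(100, 12)), columns=list('ABCDEFGHIJKL'))
# Columns 1-10 (iloc 0:10) as features, column 12 (iloc 11) as the label
tf_val = tf.estimator.inputs.pandas_input_fn(df.iloc[:, 0:10], df.iloc[:, 11], shuffle=True)
```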
However, I'm not sure how to take this output and properly pass it into a classifier. Am I setting up the problem properly? I'm not coming from a data science background, so any guidance would be very helpful!
Concerns
- Column 11 is a covariate. Hence, I don't think it can just be passed in as a feature, can it?
- How can I incorporate column 11 into the classifier as well, since column 11 is a completely different type of feature than columns 1 through 10?
- At the very least, even if I ignore column 11, how do I fit columns 1 through 10, with label = column 12, and pass this into a classifier?
(working code needed for bounty)
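As a starting point for the last concern, the features and the label can be separated by position with pandas `iloc`. This is a minimal sketch assuming the DataFrame has exactly the 12 columns described above, in that order; the column names `country_0` ... `country_9`, `income` and `visited_X` and the random data are hypothetical placeholders.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data in the layout described in the question:
# 10 binary country columns, 1 continuous income column, 1 binary label.
rng = np.random.default_rng(0)
n_people = 100
df = pd.DataFrame(rng.integers(0, 2, size=(n_people, 10)),
                  columns=[f"country_{i}" for i in range(10)])
df["income"] = rng.uniform(0, 100_000, size=n_people)
df["visited_X"] = rng.integers(0, 2, size=n_people)

# Positional slicing: columns 1-10 -> iloc 0:10, column 11 -> iloc 10, column 12 -> iloc 11
countries = df.iloc[:, 0:10]   # binary features
income = df.iloc[:, 10]        # continuous feature
label = df.iloc[:, 11]         # binary target

# Everything except the label, for a model that also uses income
features = df.iloc[:, 0:11]
```

Ignoring income then just means passing `countries` instead of `features` as the model input.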
Linear SVM
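SVM is a max-margin classifier, i.e. it maximizes the width of the margin separating the positive class from the negative class. The loss function of a linear SVM for binary classification, with labels $y_i \in \{-1, +1\}$, is given below.
$$L = \sum_{i} \max\left(0,\ 1 - y_i\, w^\top x_i\right)$$
It can be derived from the more general multiclass linear SVM loss (also called hinge loss) shown below (with $\Delta = 1$), where $y_i$ is the index of the correct class and $w_j$ is the weight vector of class $j$:
$$L_i = \sum_{j \neq y_i} \max\left(0,\ w_j^\top x_i - w_{y_i}^\top x_i + \Delta\right)$$
Note: in all the above equations, the weight vector $w$ includes the bias $b$.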
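A NumPy sketch of the multiclass hinge loss for a single example, $L_i = \sum_{j \ne y_i} \max(0, s_j - s_{y_i} + \Delta)$ with class scores $s = W x_i$ and $\Delta = 1$. The function name and the toy scores are illustrative, not taken from any library.

```python
import numpy as np

def multiclass_hinge_loss(scores, true_class, delta=1.0):
    """Multiclass SVM (hinge) loss for one example.

    scores: 1-D array of class scores s = W @ x (bias folded into W).
    true_class: index of the correct class y_i.
    """
    margins = np.maximum(0.0, scores - scores[true_class] + delta)
    margins[true_class] = 0.0  # the sum skips j == y_i
    return margins.sum()

# Three classes, the correct class is 0
scores = np.array([3.2, 5.1, -1.7])
loss = multiclass_hinge_loss(scores, true_class=0)  # approximately 2.9
```

Only class 1 contributes here: its score beats the correct class by more than the margin $\Delta$.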
How on earth did someone come up with this loss? Let's dig in.
The image above shows data points belonging to the positive class separated from data points belonging to the negative class by a separating hyperplane (shown as a solid line). However, there can be many such separating hyperplanes. SVM finds the separating hyperplane such that the distance from the hyperplane to the nearest positive data point and to the nearest negative data point is maximized (shown as dotted lines).
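To make the geometry concrete: the distance of a point $x$ to the hyperplane $w^\top x + b = 0$ is $|w^\top x + b| / \lVert w \rVert$, and the quantity SVM maximizes is the smallest such distance on each side. A sketch with a hand-picked hyperplane and four toy points (all values illustrative):

```python
import numpy as np

def distances_to_hyperplane(X, w, b):
    """Unsigned Euclidean distance of each row of X to the hyperplane w.x + b = 0."""
    return np.abs(X @ w + b) / np.linalg.norm(w)

# Toy 2-D data: two positive and two negative points
X = np.array([[2.0, 2.0], [3.0, 3.0], [0.0, 0.0], [-1.0, -1.0]])
y = np.array([1, 1, -1, -1])

# Candidate separating hyperplane x1 + x2 - 2 = 0
w, b = np.array([1.0, 1.0]), -2.0
d = distances_to_hyperplane(X, w, b)
nearest_pos = d[y == 1].min()   # distance to the nearest positive point
nearest_neg = d[y == -1].min()  # distance to the nearest negative point
```

Both nearest distances equal $\sqrt{2}$, so this hyperplane sits exactly midway between the closest points of the two classes.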
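Mathematically, SVM finds the weight vector $w$ (bias included) such that
$$w^\top x_i \geq 1 \ \text{ if } x_i \text{ is in the positive class}, \qquad w^\top x_i \leq -1 \ \text{ if } x_i \text{ is in the negative class}.$$
If the labels ($y_i$) of the +ve class and -ve class are $+1$ and $-1$ respectively, then SVM finds $w$ such that
$$y_i\, w^\top x_i \geq 1 \quad \text{for all } i.$$
• If a data point is on the correct side of the hyperplane and outside the margin (correctly classified), then $y_i\, w^\top x_i \geq 1$.
• If a data point is on the wrong side (misclassified) or inside the margin, then $y_i\, w^\top x_i < 1$.
So the loss for a data point, which is a measure of misclassification (including margin violations), can be written as
$$\max\left(0,\ 1 - y_i\, w^\top x_i\right)$$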
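A sketch of this per-point loss, folding the bias into $w$ by appending a constant 1 feature to every $x_i$ (the same convention as the note on the bias above). The three points and the weights are illustrative.

```python
import numpy as np

def hinge_loss_per_point(X, y, w):
    """Binary hinge loss max(0, 1 - y_i * w.x_i) for each row of X.

    X already contains a trailing column of ones, so the last entry of w is the bias b.
    y must contain +1 / -1 labels.
    """
    return np.maximum(0.0, 1.0 - y * (X @ w))

X_raw = np.array([[2.0, 2.0],   # positive, outside the margin
                  [1.2, 1.2],   # positive, correct sign but inside the margin
                  [2.0, 2.0]])  # labelled negative but on the positive side
X = np.hstack([X_raw, np.ones((len(X_raw), 1))])  # append the bias feature
y = np.array([1.0, 1.0, -1.0])
w = np.array([1.0, 1.0, -2.0])  # w = [w1, w2, b]

losses = hinge_loss_per_point(X, y, w)  # approximately [0.0, 0.6, 3.0]
```

The first point costs nothing, the second is penalized for being inside the margin even though its sign is right, and the misclassified third point is penalized most.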
Regularization
If a weight vector $w$ classifies the data $X$ with zero loss, then any multiple $\lambda w$ with $\lambda > 1$ will also classify the data with zero loss. This is because the transformation $\lambda w$ stretches all score magnitudes and hence also their absolute differences. L2 regularization penalizes large weights by adding a regularization loss to the hinge loss.
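A quick numerical check of the scaling argument: multiplying a zero-loss $w$ by $\lambda > 1$ keeps the hinge loss at zero while the squared L2 norm grows by $\lambda^2$, which is exactly what the penalty discourages. The toy data and weights are illustrative.

```python
import numpy as np

# Toy data with a trailing column of ones (bias folded into w)
X = np.array([[2.0, 2.0, 1.0], [3.0, 1.0, 1.0], [0.0, 0.0, 1.0]])
y = np.array([1.0, 1.0, -1.0])
w = np.array([1.0, 1.0, -2.0])  # separates the data with zero hinge loss

def total_hinge_loss(w):
    return np.maximum(0.0, 1.0 - y * (X @ w)).sum()

# (lambda, hinge loss of lambda * w, squared L2 norm of lambda * w)
results = [(lam, total_hinge_loss(lam * w), np.sum((lam * w) ** 2))
           for lam in (1.0, 2.0, 10.0)]
# -> hinge loss stays 0.0 while the squared norm goes 6.0, 24.0, 600.0
```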
For example, if $x = [1, 1, 1, 1]$ and the two weight vectors are $w_1 = [1, 0, 0, 0]$ and $w_2 = [0.25, 0.25, 0.25, 0.25]$, then $w_1^\top x = w_2^\top x = 1$, i.e. both weight vectors lead to the same dot product and hence the same hinge loss. But the L2 penalty of $w_1$ is 1.0, while the L2 penalty of $w_2$ is only 0.25. Hence L2 regularization prefers $w_2$ over $w_1$. The classifier is encouraged to take all input dimensions into account by small amounts, rather than relying very strongly on a few input dimensions. This improves the generalization of the model and leads to less overfitting.
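The arithmetic of this example, checked in NumPy (the penalty here is the squared L2 norm $\lVert w \rVert_2^2$):

```python
import numpy as np

x = np.array([1.0, 1.0, 1.0, 1.0])
w1 = np.array([1.0, 0.0, 0.0, 0.0])
w2 = np.array([0.25, 0.25, 0.25, 0.25])

# Same score, hence the same hinge loss for any label
score1, score2 = w1 @ x, w2 @ x                        # both 1.0

# Squared L2 penalty ||w||^2
penalty1, penalty2 = np.sum(w1 ** 2), np.sum(w2 ** 2)  # 1.0 and 0.25
```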
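The L2 penalty leads to the max-margin property of SVMs. If the SVM is expressed as a constrained quadratic optimization problem, then its generalized Lagrangian is as below (here the bias $b$ is written separately and $\alpha_i \geq 0$ are the Lagrange multipliers):
$$\mathcal{L}(w, b, \alpha) = \frac{1}{2}\lVert w \rVert^2 - \sum_{i} \alpha_i \left[ y_i \left(w^\top x_i + b\right) - 1 \right]$$
The $\frac{1}{2}\lVert w \rVert^2$ term plays the role of the L2 penalty: the margin width is $2 / \lVert w \rVert$, so minimizing $\lVert w \rVert$ maximizes the margin.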
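Setting the derivative of the hard-margin Lagrangian with respect to $w$ to zero gives $w = \sum_i \alpha_i y_i x_i$, so only the points with $\alpha_i > 0$ (the support vectors) determine $w$. For a linear kernel, scikit-learn's `SVC` stores $\alpha_i y_i$ in `dual_coef_` and the corresponding points in `support_vectors_`, so the identity can be checked directly. A sketch on toy `make_blobs` data (illustrative only):

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two clusters, labels mapped to -1 / +1
X, y01 = make_blobs(n_samples=60, centers=2, random_state=0)
y = np.where(y01 == 1, 1, -1)

clf = SVC(kernel="linear", C=1.0).fit(X, y)

# dual_coef_ holds alpha_i * y_i for each support vector, shape (1, n_SV)
w_from_dual = clf.dual_coef_ @ clf.support_vectors_   # shape (1, n_features)
w_primal = clf.coef_                                   # weights reported by the model
```

With `C=1.0` this is the soft-margin problem, so each $\alpha_i$ is additionally bounded by $C$; the stationarity condition for $w$ is the same.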
Now that we know the loss function of a linear SVM, we can use gradient descent (or other optimizers) to find the weight vector that minimizes the loss.
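A framework-free sketch of that idea: full-batch subgradient descent on the mean hinge loss plus $\lambda \lVert w \rVert^2$, in plain NumPy. This is swapped in for illustration (the Keras code in this answer uses Adam); the learning rate, $\lambda$, epoch count and toy blobs are assumptions, and unlike Keras' `kernel_regularizer` this sketch also regularizes the bias for simplicity.

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=500):
    """Subgradient descent on mean(max(0, 1 - y * w.x)) + lam * ||w||^2.

    X: (n, d) features, y: (n,) labels in {-1, +1}.
    A column of ones is appended so the last weight acts as the bias.
    """
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        margins = y * (Xb @ w)
        active = margins < 1  # points with non-zero hinge loss
        # Subgradient of the mean hinge loss: -(1/n) * sum over active points of y_i x_i
        grad = -(y[active][:, None] * Xb[active]).sum(axis=0) / len(y) + 2 * lam * w
        w -= lr * grad
    return w

def predict(X, w):
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return np.where(Xb @ w >= 0, 1, -1)

# Two Gaussian blobs as toy, linearly separable data
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(2.0, 0.5, size=(50, 2)), rng.normal(-2.0, 0.5, size=(50, 2))])
y = np.concatenate([np.ones(50), -np.ones(50)])

w = train_linear_svm(X, y)
accuracy = (predict(X, w) == y).mean()
```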
Code
```python
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets

# Load Data: first two features, classes 0 and 1 only
iris = datasets.load_iris()
X = iris.data[:, :2][iris.target != 2]
y = iris.target[iris.target != 2]

# Change labels to +1 and -1; float so that y_true * y_pred works in TF 2.x
y = np.where(y == 1, 1, -1).astype(np.float32)

# Linear Model with L2 regularization on the weights (the bias is not regularized)
model = tf.keras.Sequential()
model.add(tf.keras.layers.Dense(1, activation='linear', kernel_regularizer=tf.keras.regularizers.l2()))

# Hinge loss: max(0, 1 - y * score)
def hinge_loss(y_true, y_pred):
    y_true = tf.cast(y_true, y_pred.dtype)
    return tf.maximum(0., 1 - y_true * y_pred)

# Train the model
model.compile(optimizer='adam', loss=hinge_loss)
model.fit(X, y, epochs=50000, verbose=False)

# Plot the learned decision boundary (colour = raw linear score)
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.01),
                     np.arange(y_min, y_max, 0.01))
Z = model.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
cs = plt.contourf(xx, yy, Z, cmap=plt.cm.coolwarm, alpha=0.8)
plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.Set1)
plt.show()
```
SVM can also be expressed as a constrained quadratic optimization problem. The advantage of this formulation is that we can use the kernel trick to classify non-linearly separable data (using different kernels). LIBSVM implements the Sequential Minimal Optimization (SMO) algorithm for kernelized support vector machines (SVMs); scikit-learn's `SVC` is built on LIBSVM.
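A small illustration of the kernel trick using scikit-learn's `SVC`: concentric circles cannot be separated by a line, but an RBF kernel separates them. The `make_circles` parameters are illustrative choices, and the scores are training accuracies only.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Concentric circles: not linearly separable in the original 2-D space
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

linear_acc = SVC(kernel="linear").fit(X, y).score(X, y)
rbf_acc = SVC(kernel="rbf", gamma="scale").fit(X, y).score(X, y)
```

The linear kernel stays far from perfect here, while the RBF kernel separates the two rings almost completely.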
Code
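```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.svm import SVC

# Same two-class iris data as above
iris = datasets.load_iris()
X = iris.data[:, :2][iris.target != 2]
y = np.where(iris.target[iris.target != 2] == 1, 1, -1)

# SVM with linear kernel
clf = SVC(kernel='linear')
clf.fit(X, y)

# Plot the learned decision boundary (colour = SVC decision function score)
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.01),
                     np.arange(y_min, y_max, 0.01))
Z = clf.decision_function(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
cs = plt.contourf(xx, yy, Z, cmap=plt.cm.coolwarm, alpha=0.8)
plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.Set1)
plt.show()
```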
Finally
The linear SVM model in TensorFlow that you can use for your problem statement is:
```python
import numpy as np
import pandas as pd
import tensorflow as tf
from sklearn.preprocessing import StandardScaler

# Prepare Data
# 10 binary features (columns 0-9)
df = pd.DataFrame(np.random.randint(0, 2, size=(1000, 10)))
# 1 floating-point feature (income)
df[11] = np.random.uniform(0, 100000, size=(1000))
# True label
df[12] = np.random.randint(0, 2, size=(1000))

# Convert features to zero mean, unit variance (puts income on the same scale as the binary columns)
scaler = StandardScaler().fit(df[df.columns.drop(12)])
X = scaler.transform(df[df.columns.drop(12)])
y = np.array(df[12])
# Convert label to +1 and -1 (needed for hinge loss); float so it can be multiplied with y_pred
y = np.where(y == 1, +1, -1).astype(np.float32)

# Model
model = tf.keras.Sequential()
model.add(tf.keras.layers.Dense(1, activation='linear',
                                kernel_regularizer=tf.keras.regularizers.l2()))

# Hinge Loss
def my_loss(y_true, y_pred):
    y_true = tf.cast(y_true, y_pred.dtype)
    return tf.maximum(0., 1 - y_true * y_pred)

# Train model
model.compile(optimizer='adam', loss=my_loss)
model.fit(X, y, epochs=100, verbose=True)
```
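If TensorFlow is not a hard requirement, the same recipe (standardize the 10 binary columns plus income, then fit an L2-regularized linear SVM) can be sketched with scikit-learn's `LinearSVC`, swapped in here as an alternative that uses its own solver instead of Adam. The random data mirrors the toy DataFrame above and is not meaningful; the `C` and `max_iter` values are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Toy data in the question's layout (random, for illustration only)
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 2, size=(1000, 10)))
df[11] = rng.uniform(0, 100000, size=1000)  # income
df[12] = rng.integers(0, 2, size=1000)      # visited 'X' (label)

X = df.drop(columns=12).to_numpy()
y = df[12].to_numpy()

# Scaling lives inside the pipeline, so it is re-fit on whatever data .fit() sees
clf = make_pipeline(StandardScaler(),
                    LinearSVC(loss="hinge", dual=True, C=1.0, max_iter=10000))
clf.fit(X, y)
predictions = clf.predict(X[:5])  # 0/1 labels for the first five people
```

Keeping the scaler in the pipeline also means new people are scaled with the training statistics automatically when calling `clf.predict`.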
K-Fold cross validation and making predictions
```python
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.model_selection import KFold
from sklearn.metrics import roc_curve, auc, roc_auc_score

# Load Data
iris = datasets.load_iris()
X = iris.data[:, :2][iris.target != 2]
y_ = iris.target[iris.target != 2]
# Change labels to +1 and -1 (float for the TF loss)
y = np.where(y_ == 1, +1, -1).astype(np.float32)

# Hinge loss
def hinge_loss(y_true, y_pred):
    y_true = tf.cast(y_true, y_pred.dtype)
    return tf.maximum(0., 1 - y_true * y_pred)

def get_model():
    # Linear Model with L2 regularization
    model = tf.keras.Sequential()
    model.add(tf.keras.layers.Dense(1, activation='linear', kernel_regularizer=tf.keras.regularizers.l2()))
    model.compile(optimizer='adam', loss=hinge_loss)
    return model

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

predict = lambda model, x: sigmoid(model.predict(x).reshape(-1))
predict_class = lambda model, x: np.where(predict(model, x) > 0.5, 1, 0)

kf = KFold(n_splits=2, shuffle=True)

# K Fold cross validation: keep the model with the best held-out AUC
best = (None, -1)
for i, (train_index, test_index) in enumerate(kf.split(X)):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
    model = get_model()
    model.fit(X_train, y_train, epochs=5000, verbose=False, batch_size=128)
    # AUC from continuous scores (Sequential.predict_classes no longer exists in recent TF)
    val = roc_auc_score(y_test, predict(model, X_test))
    print("CV Fold {0}: AUC: {1}".format(i + 1, val))
    if best[1] < val:
        best = (model, val)

# ROC Curve using the best model
y_score = predict(best[0], X)
fpr, tpr, _ = roc_curve(y_, y_score)
roc_auc = auc(fpr, tpr)
print(roc_auc)

# Plot ROC
plt.figure()
lw = 2
plt.plot(fpr, tpr, color='darkorange',
         lw=lw, label='ROC curve (area = %0.2f)' % roc_auc)
plt.plot([0, 1], [0, 1], color='navy', lw=lw, linestyle='--')
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.legend(loc="lower right")
plt.show()

# Make predictions
y_score = predict_class(best[0], X)
```
Making predictions
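Since the output of the model is a linear score, we have to map it into the range [0, 1] to make probability-like predictions. For binary classification we can use the sigmoid, or for multiclass classification the softmax. Note that the sigmoid of a hinge-loss score is not a calibrated probability; thresholding it at 0.5 is equivalent to thresholding the raw score at 0. The code below is for binary classification: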
```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

predict = lambda model, x: sigmoid(model.predict(x).reshape(-1))
predict_class = lambda model, x: np.where(predict(model, x) > 0.5, 1, 0)
```
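For the multiclass case mentioned above, a numerically stable softmax over per-class scores could look like the following sketch; the score matrix stands in for the output of a hypothetical `Dense(n_classes)` layer.

```python
import numpy as np

def softmax(scores):
    """Row-wise softmax; subtracting the row max avoids overflow in exp."""
    shifted = scores - scores.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

# Raw linear scores for 2 samples and 3 classes
scores = np.array([[2.0, 1.0, 0.1],
                   [-1.0, 3.0, 0.5]])
probs = softmax(scores)
predicted_class = probs.argmax(axis=1)  # [0, 1]
```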
Update 1:
To make the code compatible with TF 2.0, the datatype of `y` should be the same as that of `X`. To do this, add the line `y = y.astype(np.float64)` after the line starting with `y = np.where(`.