Building an SVM with TensorFlow's LinearClassifier and Pandas DataFrames


Question


I'm aware of this question, but it is for an outdated function.

Let's say I'm trying to predict whether a person will visit country 'X' given the countries they have already visited and their income.

I have a training data set in a pandas DataFrame that's in the following format.

  1. Each row represents a different person, each unrelated to the others in the matrix.
  2. The first 10 columns are all names of countries and the values in the column are binary (1 if they have visited that country or 0 if they haven't).
  3. Column 11 is their income. It's a continuous decimal variable.
  4. Lastly, column 12 is another binary column indicating whether or not they have visited 'X'.

So essentially, if I have 100,000 people in my dataset, then I have a dataframe of dimensions 100,000 x 12. I want to be able to pass this properly into a linear classifier using tensorflow, but I'm not sure even how to approach this.

I am trying to pass the data into this function

estimator = LinearClassifier(
    n_classes=n_classes,
    feature_columns=[sparse_column_a, sparse_feature_a_x_sparse_feature_b],
    label_keys=label_keys)

(If there's a better suggestion on which estimator to use, I'd be open to trying that.)

And I'm passing data as:

df = pd.DataFrame(np.random.randint(0,2,size=(100, 12)), columns=list('ABCDEFGHIJKL'))
tf_val = tf.estimator.inputs.pandas_input_fn(X.iloc[:, 0:9], X.iloc[:, 11], shuffle=True)

However, I'm not sure how to take this output and pass it properly into a classifier. Am I setting up the problem properly? I'm not coming from a data science background, so any guidance would be very helpful!

Concerns

  1. Column 11 is a covariate. Hence, I don't think it can just be passed in as a feature, can it?
  2. How can I also incorporate column 11 into the classifier, since column 11 is a completely different type of feature than columns 1 through 10?
  3. At the very least, even if I ignore column 11, how do I at least fit columns 1 through 10 with label = column 12 and pass this into a classifier?

(working code needed for bounty)

Solution

Linear SVM

SVM is a max-margin classifier, i.e. it maximizes the width of the margin separating the positive class from the negative class. The loss function of a linear SVM for binary classification is given below.
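In standard notation (a sketch; λ is the regularization strength and, as noted below, the bias is folded into w):

$$ L(w) \;=\; \frac{1}{N}\sum_{i=1}^{N}\max\big(0,\; 1 - y_i\, w^\top x_i\big) \;+\; \lambda\,\lVert w\rVert_2^2 $$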

It can be derived from the more general multi-class linear SVM loss (also called the hinge loss) shown below, with Δ = 1.
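Following the CS231n convention cited in the references, with class scores s_j = w_jᵀ x_i, the multi-class hinge loss for example i is (a sketch of the standard form):

$$ L_i \;=\; \sum_{j \neq y_i}\max\big(0,\; s_j - s_{y_i} + \Delta\big) $$

With two classes and Δ = 1 this reduces to the binary hinge loss above.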

Note: In all the above equations, the weight vector w includes the bias b.
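Concretely, this is the usual bias trick: append a constant 1 to every input so the bias becomes just one more weight:

$$ w \leftarrow \begin{bmatrix} w \\ b \end{bmatrix}, \qquad x \leftarrow \begin{bmatrix} x \\ 1 \end{bmatrix} $$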

How on earth did someone come up with this loss? Let's dig in.

Consider data points belonging to the positive class separated from the data points belonging to the negative class by a separating hyperplane. There can be many such separating hyperplanes. SVM finds the separating hyperplane whose distance to the nearest positive data point and to the nearest negative data point is maximal; those nearest points are the support vectors.

Mathematically, SVM finds the weight vector w (bias included) that maximizes this margin; a standard way to write the optimization is sketched below.
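A common hard-margin formulation (assuming the data are separable; maximizing the margin 2/‖w‖ is the same as minimizing ½‖w‖²):

$$ \min_{w}\; \tfrac{1}{2}\lVert w\rVert^2 \quad \text{subject to} \quad y_i\, w^\top x_i \;\ge\; 1 \;\;\; \forall i $$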

If the labels (y) of the +ve class and the -ve class are +1 and -1 respectively, then SVM finds w such that (see the sketch after this list):

• if a data point is on the correct side of the hyperplane (correctly classified), then its score y·wᵀx is at least 1 and it incurs zero loss;

• if a data point is on the wrong side (misclassified), then its score y·wᵀx falls below 1.

So the loss for a data point, which is a measure of misclassification, can be written as the amount by which its score falls short of 1 (and zero if it does not fall short).
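In symbols, a sketch of those conditions and the resulting per-point loss (bias folded into w as before):

$$ y_i\, w^\top x_i \;\ge\; 1 \;\;\text{(correct side: zero loss)}, \qquad y_i\, w^\top x_i \;<\; 1 \;\;\text{(wrong side or inside the margin)} $$

$$ L_i \;=\; \max\big(0,\; 1 - y_i\, w^\top x_i\big) $$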

Regularization

If a weight vector w correctly classifies the data (X), then any multiple λw of that weight vector, where λ > 1, will also correctly classify the data (zero loss). This is because the transformation λw stretches all score magnitudes and hence also their absolute differences. L2 regularization penalizes large weights by adding a regularization loss to the hinge loss.

For example, suppose x=[1,1,1,1] and consider two weight vectors w1=[1,0,0,0] and w2=[0.25,0.25,0.25,0.25]. Then dot(w1,x) = dot(w2,x) = 1, i.e. both weight vectors lead to the same dot product and hence the same hinge loss. But the L2 penalty of w1 is 1.0 while the L2 penalty of w2 is only 0.25, so L2 regularization prefers w2 over w1. The classifier is encouraged to take all input dimensions into account in small amounts rather than a few input dimensions very strongly. This improves the generalization of the model and leads to less overfitting.
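As a quick numeric check of the example above (plain NumPy, independent of any model):

import numpy as np

x  = np.array([1., 1., 1., 1.])
w1 = np.array([1., 0., 0., 0.])
w2 = np.array([0.25, 0.25, 0.25, 0.25])

print(np.dot(w1, x), np.dot(w2, x))   # 1.0 1.0  -> same score, same hinge loss
print(np.sum(w1**2), np.sum(w2**2))   # 1.0 0.25 -> L2 penalty prefers w2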

The L2 penalty leads to the max-margin property of SVMs. If the SVM is expressed as an optimization problem, then the generalized Lagrangian form of the constrained quadratic optimization problem is as below.
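A sketch of that generalized Lagrangian, with one multiplier α_i ≥ 0 per margin constraint (bias written separately here for clarity):

$$ \mathcal{L}(w, b, \alpha) \;=\; \frac{1}{2}\lVert w\rVert^2 \;-\; \sum_i \alpha_i\Big[y_i\big(w^\top x_i + b\big) - 1\Big] $$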

Now that we know the loss function of the linear SVM, we can use gradient descent (or another optimizer) to find the weight vector that minimizes the loss.

Code

import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets

# Load Data
iris = datasets.load_iris()
X = iris.data[:, :2][iris.target != 2]
y = iris.target[iris.target != 2]

# Change labels to +1 and -1 
y = np.where(y==1, y, -1)

# Linear Model with L2 regularization
model = tf.keras.Sequential()
model.add(tf.keras.layers.Dense(1, activation='linear', kernel_regularizer=tf.keras.regularizers.l2()))

# Hinge loss
def hinge_loss(y_true, y_pred):    
    return tf.maximum(0., 1- y_true*y_pred)

# Train the model
model.compile(optimizer='adam', loss=hinge_loss)
model.fit(X, y,  epochs=50000, verbose=False)

# Plot the learned decision boundary 
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.01),
                     np.arange(y_min, y_max, 0.01))
Z = model.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
cs = plt.contourf(xx, yy, Z, cmap=plt.cm.coolwarm, alpha=0.8)
plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.Set1)
plt.show()
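If you want to sanity-check the learned hyperplane, the parameters of the single Dense unit can be read back from the model trained above:

# Learned weight vector and bias of the Dense(1) layer
w, b = model.layers[0].get_weights()
print("w:", w.ravel(), "b:", b)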

SVM can also be expressed as a constrained quadratic optimization problem. The advantage of this formulation is that we can use the kernel trick to classify non-linearly separable data (using different kernels). LIBSVM implements the Sequential Minimal Optimization (SMO) algorithm for kernelized support vector machines (SVMs).

Code

from sklearn.svm import SVC
# SVM with linear kernel
clf = SVC(kernel='linear')
clf.fit(X, y) 

# Plot the learned decision boundary 
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.01),
                     np.arange(y_min, y_max, 0.01))
Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
cs = plt.contourf(xx, yy, Z, cmap=plt.cm.coolwarm, alpha=0.8)
plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.Set1)
plt.show() 
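For the linear kernel, the fitted SVC exposes the separating hyperplane and the support vectors directly (a quick inspection of the clf fitted above):

# coef_ / intercept_ are only available for kernel='linear'
print("w:", clf.coef_, "b:", clf.intercept_)
print("support vectors per class:", clf.n_support_)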

Finally

The linear SVM model using tf that you can use for your problem statement is:

import numpy as np
import pandas as pd
import tensorflow as tf
from sklearn.preprocessing import StandardScaler

# Prepare Data
# 10 binary features (columns 0-9)
df = pd.DataFrame(np.random.randint(0, 2, size=(1000, 10)))
# 1 floating value feature (income)
df[11] = np.random.uniform(0, 100000, size=1000)
# True label
df[12] = np.random.randint(0, 2, size=1000)

# Convert data to zero mean unit variance
scaler = StandardScaler().fit(df[df.columns.drop(12)])
X = scaler.transform(df[df.columns.drop(12)])
y = np.array(df[12])

# convert label to +1 and -1. Needed for hinge loss
y = np.where(y==1, +1, -1)

# Model 
model = tf.keras.Sequential()
model.add(tf.keras.layers.Dense(1, activation='linear', 
                                kernel_regularizer=tf.keras.regularizers.l2()))
# Hinge Loss
def my_loss(y_true, y_pred):    
    return tf.maximum(0., 1- y_true*y_pred)

# Train model 
model.compile(optimizer='adam', loss=my_loss)
model.fit(X, y,  epochs=100, verbose=True)
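If your real DataFrame uses named columns rather than integer positions, the same preprocessing applies. The column names below (ten country columns, 'income', and the label 'visited_X') are placeholders for whatever your frame actually contains:

# Hypothetical column names -- substitute your own
country_cols = ['country_{}'.format(i) for i in range(10)]
feature_cols = country_cols + ['income']

X = StandardScaler().fit_transform(df[feature_cols])
y = np.where(df['visited_X'].values == 1, 1.0, -1.0)  # float +1/-1 labels for the hinge loss

model.fit(X, y, epochs=100, verbose=True)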

K-Fold cross validation and making predictions

import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.model_selection import KFold
from sklearn.metrics import roc_curve, auc, roc_auc_score

# Load Data
iris = datasets.load_iris()
X = iris.data[:, :2][iris.target != 2]
y_ = iris.target[iris.target != 2]

# Change labels to +1 and -1 
y = np.where(y_==1, +1, -1)


# Hinge loss
def hinge_loss(y_true, y_pred):    
    return tf.maximum(0., 1- y_true*y_pred)

def get_model():
    # Linear Model with L2 regularization
    model = tf.keras.Sequential()
    model.add(tf.keras.layers.Dense(1, activation='linear', kernel_regularizer=tf.keras.regularizers.l2()))
    model.compile(optimizer='adam', loss=hinge_loss)
    return model

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

predict = lambda model, x : sigmoid(model.predict(x).reshape(-1))
predict_class = lambda model, x : np.where(predict(model, x)>0.5, 1, 0)


kf = KFold(n_splits=2, shuffle=True)

# K Fold cross validation
best = (None, -1)

for i, (train_index, test_index) in enumerate(kf.split(X)):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]

    model = get_model()
    model.fit(X_train, y_train, epochs=5000, verbose=False, batch_size=128)
    y_pred = predict(model, X_test)   # sigmoid scores; AUC works on scores rather than hard classes
    val = roc_auc_score(y_test, y_pred)
    print("CV Fold {0}: AUC: {1}".format(i + 1, val))
    if best[1] < val:
        best = (model, val)

# ROC Curve using the best model
y_score = predict(best[0], X)
fpr, tpr, _ = roc_curve(y_, y_score)
roc_auc = auc(fpr, tpr)
print (roc_auc)

# Plot ROC
plt.figure()
lw = 2
plt.plot(fpr, tpr, color='darkorange',
         lw=lw, label='ROC curve (area = %0.2f)' % roc_auc)
plt.plot([0, 1], [0, 1], color='navy', lw=lw, linestyle='--')
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.legend(loc="lower right")
plt.show()

# Make predictions
y_score = predict_class(best[0], X)

Making predictions

Since the output of the model is linear, we have to normalize it into probabilities to make predictions. For binary classification we can use a sigmoid; for multiclass classification we can use a softmax. The code below is for binary classification.

predict = lambda model, x : sigmoid(model.predict(x).reshape(-1))
predict_class = lambda model, x : np.where(predict(model, x)>0.5, 1, 0)
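For example, with the trained model and a feature matrix X from above:

probs  = predict(model, X)         # sigmoid-squashed scores in (0, 1)
labels = predict_class(model, X)   # hard 0/1 decisions at the 0.5 threshold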

References

  1. CS231n

Update 1:

To make the code compatible with tf 2.0, the datatype of y should be the same as that of X. To do this, after the line y = np.where(....., add the line y = y.astype(np.float64).
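That is, immediately after the label conversion:

y = np.where(y_ == 1, +1, -1)
y = y.astype(np.float64)   # give y the same float dtype as X for tf 2.x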
