Training TensorFlow for Predicting a Column in a csv file

Question

I have data structured in a csv file. I want to predict whether column 1 will be a 1 or a 0 given all the other columns. How do I go about training the program (preferably using neural networks) to use all of the given data to make that prediction? Is there code that someone can show me? I've tried feeding it a numpy.ndarray, a FIFOQueue, and a DataFrame; nothing has worked yet. Here is the code I am running up to the point where I get the error:

import tensorflow as tf
import numpy as np
from numpy import genfromtxt

data = genfromtxt('cs-training.csv',delimiter=',')

x = tf.placeholder("float", [None, 11])
W = tf.Variable(tf.zeros([11,2]))
b = tf.Variable(tf.zeros([2]))

y = tf.nn.softmax(tf.matmul(x,W) + b)
y_ = tf.placeholder("float", [None,2])

cross_entropy = -tf.reduce_sum(y_*tf.log(y))

train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)

init = tf.initialize_all_variables()

sess = tf.Session()
sess.run(init)

for i in range(1000):
    batch_xs, batch_ys = data.train.next_batch(100)
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})

At this point I get this error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-128-b48741faa01b> in <module>()
      1 for i in range(1000):
----> 2     batch_xs, batch_ys = data.train.next_batch(100)
      3     sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})

AttributeError: 'numpy.ndarray' object has no attribute 'train'

Any help is greatly appreciated. All I need to do is predict whether column 1 is going to be a 1 or a 0. Even if all you do is get me past this one error, I should be able to take it from there.

Edit: This is what the csv looks like when I print it out.

[[1,0.766126609,45,2,0.802982129,9120,13,0,6,0,2],
[0,0.957151019,40,0,0.121876201,2600,4,0,0,0,1],
[0,0.65818014,38,1,0.085113375,3042,2,1,0,0,0],
[0,0.233809776,30,0,0.036049682,3300,5,0,0,0,0]]

I'm trying to predict the first column.

Recommended answer

The following reads from a CSV file and builds a TensorFlow program. The example uses the Iris data set, since that may be a more meaningful example. However, it should probably work for your data as well.

Please note, the first column will be [0,1 or 2], since there are 3 species of iris.
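
As a small aside, the label column has to be turned into one-hot rows before it can be compared against the softmax output; the script below does this in convertOneHot. Here is a minimal NumPy sketch of the same idea. The helper name `one_hot_first_column` is hypothetical and the rows are only illustrative:

```python
import numpy as np

def one_hot_first_column(data, num_classes=None):
    """Split rows into (labels, one-hot labels, features); column 0 holds the label."""
    data = np.asarray(data, dtype=float)
    labels = data[:, 0].astype(int)
    if num_classes is None:
        # Assumption: labels are 0..K-1, so K is the largest label plus one.
        num_classes = labels.max() + 1
    one_hot = np.eye(num_classes)[labels]  # row i of the identity matrix encodes class i
    features = data[:, 1:]
    return labels, one_hot, features

# Illustrative rows in the same layout as the CSV: label first, then 4 features.
rows = [[0, 5.1, 3.5, 1.4, 0.2],
        [2, 6.3, 3.3, 6.0, 2.5],
        [1, 7.0, 3.2, 4.7, 1.4]]
labels, one_hot, features = one_hot_first_column(rows)
print(one_hot)
```

For a 0/1 label like the one in the question, the same helper would produce two columns instead of three.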

#!/usr/bin/env python
import tensorflow as tf
import numpy as np
from numpy import genfromtxt

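# Build example data in CSV format, using the Iris data set
from sklearn import datasets
# train_test_split lives in sklearn.model_selection
# (sklearn.cross_validation was removed in scikit-learn 0.20)
from sklearn.model_selection import train_test_split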
def buildDataFromIris():
    iris = datasets.load_iris()
    X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.33, random_state=42)
    f=open('cs-training.csv','w')
    for i,j in enumerate(X_train):
        k=np.append(np.array(y_train[i]),j   )
        f.write(",".join([str(s) for s in k]) + '\n')
    f.close()
    f=open('cs-testing.csv','w')
    for i,j in enumerate(X_test):
        k=np.append(np.array(y_test[i]),j   )
        f.write(",".join([str(s) for s in k]) + '\n')
    f.close()


# Convert to one hot
def convertOneHot(data):
    y=np.array([int(i[0]) for i in data])
    y_onehot=[0]*len(y)
    for i,j in enumerate(y):
        y_onehot[i]=[0]*(y.max() + 1)
        y_onehot[i][j]=1
    return (y,y_onehot)


buildDataFromIris()


data = genfromtxt('cs-training.csv',delimiter=',')  # Training data
test_data = genfromtxt('cs-testing.csv',delimiter=',')  # Test data

x_train=np.array([ i[1::] for i in data])
y_train,y_train_onehot = convertOneHot(data)

x_test=np.array([ i[1::] for i in test_data])
y_test,y_test_onehot = convertOneHot(test_data)


#  A number of features, 4 in this example
#  B = 3 species of Iris (setosa, virginica and versicolor)
A=data.shape[1]-1 # Number of features, Note first is y
B=len(y_train_onehot[0])
tf_in = tf.placeholder("float", [None, A]) # Features
tf_weight = tf.Variable(tf.zeros([A,B]))
tf_bias = tf.Variable(tf.zeros([B]))
tf_softmax = tf.nn.softmax(tf.matmul(tf_in,tf_weight) + tf_bias)

# Training via backpropagation
tf_softmax_correct = tf.placeholder("float", [None,B])
tf_cross_entropy = -tf.reduce_sum(tf_softmax_correct*tf.log(tf_softmax))

# Train using tf.train.GradientDescentOptimizer
tf_train_step = tf.train.GradientDescentOptimizer(0.01).minimize(tf_cross_entropy)

# Add accuracy checking nodes
tf_correct_prediction = tf.equal(tf.argmax(tf_softmax,1), tf.argmax(tf_softmax_correct,1))
tf_accuracy = tf.reduce_mean(tf.cast(tf_correct_prediction, "float"))

# Initialize and run
init = tf.global_variables_initializer()  # replaces the deprecated tf.initialize_all_variables()
sess = tf.Session()
sess.run(init)

print("...")
# Run the training
for i in range(30):
    sess.run(tf_train_step, feed_dict={tf_in: x_train, tf_softmax_correct: y_train_onehot})
    # Print accuracy on the test set after each training step
    result = sess.run(tf_accuracy, feed_dict={tf_in: x_test, tf_softmax_correct: y_test_onehot})
    print("Run {},{}".format(i, result))


"""
Below is the output
  ...
  Run 0,0.319999992847
  Run 1,0.300000011921
  Run 2,0.379999995232
  Run 3,0.319999992847
  Run 4,0.300000011921
  Run 5,0.699999988079
  Run 6,0.680000007153
  Run 7,0.699999988079
  Run 8,0.680000007153
  Run 9,0.699999988079
  Run 10,0.680000007153
  Run 11,0.680000007153
  Run 12,0.540000021458
  Run 13,0.419999986887
  Run 14,0.680000007153
  Run 15,0.699999988079
  Run 16,0.680000007153
  Run 17,0.699999988079
  Run 18,0.680000007153
  Run 19,0.699999988079
  Run 20,0.699999988079
  Run 21,0.699999988079
  Run 22,0.699999988079
  Run 23,0.699999988079
  Run 24,0.680000007153
  Run 25,0.699999988079
  Run 26,1.0
  Run 27,0.819999992847
  ...

 Ref:
 https://gist.github.com/mchirico/bcc376fb336b73f24b29#file-tensorflowiriscsv-py
"""

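As for the AttributeError in the question: genfromtxt returns a plain numpy.ndarray, which has no train.next_batch. That call, as far as I can tell, belongs to the dataset object used in the MNIST tutorial. The script above sidesteps it by feeding the whole training set on every step. If mini-batches are wanted, a small helper can stand in. The sketch below is NumPy-only; `next_batch` here is a hypothetical helper of my own, and the four rows are the sample printed in the question:

```python
import numpy as np

# The four sample rows printed in the question (label in column 0).
data = np.array([
    [1, 0.766126609, 45, 2, 0.802982129, 9120, 13, 0, 6, 0, 2],
    [0, 0.957151019, 40, 0, 0.121876201, 2600, 4, 0, 0, 0, 1],
    [0, 0.65818014, 38, 1, 0.085113375, 3042, 2, 1, 0, 0, 0],
    [0, 0.233809776, 30, 0, 0.036049682, 3300, 5, 0, 0, 0, 0],
])

features = data[:, 1:]             # the 10 feature columns
labels = data[:, 0].astype(int)    # 0 or 1
labels_onehot = np.eye(2)[labels]  # shape (n_rows, 2)

def next_batch(xs, ys, batch_size, rng):
    """Hypothetical stand-in for data.train.next_batch: a random batch drawn without replacement."""
    idx = rng.choice(len(xs), size=min(batch_size, len(xs)), replace=False)
    return xs[idx], ys[idx]

rng = np.random.default_rng(0)
batch_xs, batch_ys = next_batch(features, labels_onehot, 2, rng)
print(batch_xs.shape, batch_ys.shape)  # (2, 10) (2, 2)
```

Assuming the real file has the same 11 columns as the printed sample, only 10 of them are features, so the placeholder in the question would be [None, 10] rather than [None, 11] and the weights [10, 2]. The batches can then be passed through feed_dict as in the original loop.
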
I hope this helps.
