Indexing a CSV and running into an inconsistent number of samples for logistic regression


Problem description

I'm currently indexing a CSV with the values below and running into this error:

ValueError: Found input variables with inconsistent numbers of samples: [1, 514]

It is treating the input as 1 row with 514 columns, which suggests that I have passed a parameter incorrectly. Or is it caused by my replacing the NaNs (which is what most of the data would be coerced to)?

"Classification","DGMLEN","IPLEN","TTL","IP"
"1","0.000000","192.168.1.5","185.60.216.35","TLSv1.2"
"2","0.000160","192.168.1.5","185.60.216.35","TCP"
"3","0.000161","192.168.1.5","185.60.216.35","TLSv1.2"


import pandas  
df = pandas.read_csv('wcdemo.csv', header=0,
                  names = ["Classification", "DGMLEN", "IPLEN", "TTL", "IP"], 
                  na_values='.')

df = df.apply(pandas.to_numeric, errors='coerce')
#Data=pd.read_csv ('wcdemo.csv').reset_index()#index_col='false')
feature_cols=['Classification','DGMLEN','IPLEN','IP']

X=df[feature_cols]


    #datanewframe = pandas.Series(['Classification', 'DGMLEN', 'IPLEN', 'TTL', 'IP'], dtype='object')

#df = pandas.read_csv('wcdemo.csv')
#indexed_df = df.set_index(['Classification', 'DGMLEN','IPLEN','TTL','IP']


df['IPLEN'] = pandas.to_numeric(df['IPLEN'], errors='coerce').fillna(0)
df['TTL'] = pandas.to_numeric(df['TTL'], errors='coerce').fillna(0)

#DEFINE X TRAIN
X_train = df['IPLEN']
y_train = df['TTL']

#s = pandas.Series(['Classification', 'DGMLEN', 'IPLEN', 'TTL', 'IP'])

Y=df['TTL'] 

from sklearn.linear_model import LogisticRegression

logreg=LogisticRegression()
# The error is raised on this line
logreg.fit(X_train,y_train,).fillna(0.0)

Recommended answer

Because X_train contains only one feature, it is one-dimensional, with shape (n_samples,). scikit-learn estimators require X to be two-dimensional, with shape (n_samples, n_features). Older scikit-learn versions interpret a one-dimensional X as a single sample with 514 features, which is presumably why the error reports [1, 514] while y has 514 samples. So you need to reshape your data.
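To make the shape difference concrete, here is a minimal sketch. The DataFrame is a made-up stand-in for the asker's data (the numbers are assumptions, not values from wcdemo.csv); it only shows how selecting a single column changes the shape.

```python
import pandas as pd

# Hypothetical stand-in for the CSV; the values are invented for illustration.
df = pd.DataFrame({"IPLEN": [40, 52, 60, 1500], "TTL": [64, 64, 128, 128]})

X_1d = df["IPLEN"]                             # Series, one-dimensional
X_2d = df[["IPLEN"]]                           # DataFrame, two-dimensional
X_np = df["IPLEN"].to_numpy().reshape(-1, 1)   # NumPy array, two-dimensional

print(X_1d.shape)  # (4,)
print(X_2d.shape)  # (4, 1)
print(X_np.shape)  # (4, 1)
```

Either of the two-dimensional forms should be accepted as X by a scikit-learn estimator; selecting with a list of column names (double brackets) avoids the explicit reshape.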

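Use this:

logreg.fit(X_train.values.reshape(-1, 1), y_train)

Note that current pandas versions have no Series.reshape, so the reshape is applied to the underlying NumPy array via .values. The trailing .fillna(0.0) is dropped because fit returns the fitted estimator, which has no fillna method.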
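As a rough end-to-end sketch of the fix, the following fits a LogisticRegression on a single feature passed as a two-dimensional input. The data is invented for illustration (it is not the asker's wcdemo.csv), and TTL is treated as a class label here purely as an assumption to mirror the question's variable names. LogisticRegression needs at least two distinct classes in y.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical data: one numeric feature and a two-class target.
df = pd.DataFrame({
    "IPLEN": [40, 52, 60, 1400, 1460, 1500],
    "TTL":   [64, 64, 64, 128, 128, 128],
})

X_train = df[["IPLEN"]]   # double brackets keep X two-dimensional: (6, 1)
y_train = df["TTL"]       # the target stays one-dimensional: (6,)

logreg = LogisticRegression()
logreg.fit(X_train, y_train)   # fit returns the estimator itself

# New samples must have the same two-dimensional layout as X_train.
X_new = pd.DataFrame({"IPLEN": [45, 1480]})
predictions = logreg.predict(X_new)
print(predictions)
```

If missing values need to be filled, that would be done on the DataFrame before calling fit (for example with df.fillna(0)), not on the result of fit.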
