Dataframe into numpy array with values comma separated

Views: 366  Published: 2020/5/24 1:19:33  Tags: python arrays pandas numpy


Problem description

I've read a CSV file (tab-separated) into a DataFrame, and I now need it as a numpy array for clustering, without changing the column types.

So far, following the references I tried (listed below), I have failed to get the required output. The values of the two columns I'm trying to fetch are int64 / float64, as shown below:

         uid   iid       rat
0        196   242  3.000000
1        186   302  3.000000
2         22   377  1.000000

For the moment I'm interested only in iid and rat, which I want to pass to the Kmeans.fit() method, and without EPSILON in the values. I need them in the following format:

Expected format

[[242, 3.000000],
[302, 3.000000],
[377, 1.000000]]

Failed attempt

X = values[:, 1:2]
Y = values[:, 2:3]
someArray = np.array([X,Y])
print someArray

This does not give the required format; on execution it prints:

[[[  2.42000000e+02]
  [  3.02000000e+02]
  [  3.77000000e+02]
  ..., 
  [  1.35200000e+03]
  [  1.62600000e+03]
  [  1.65900000e+03]]
 [[  3.00000000e+00]
  [  3.00000000e+00]
  [  1.00000000e+00]
  ..., 
  [  1.00000000e+00]
  [  1.00000000e+00]
  [  1.00000000e+00]]]

References that have not helped so far

This one
This two
This three
This four

Edit 1

Tried np_df = np.genfromtxt('AllData.csv', delimiter='\t', unpack=True) and got

[[             nan   1.96000000e+02   1.86000000e+02 ...,   4.79000000e+02
    4.79000000e+02   4.79000000e+02]
 [             nan   2.42000000e+02   3.02000000e+02 ...,   1.36000000e+03
    1.39400000e+03   1.65200000e+03]
 [             nan   3.00000000e+00   3.00000000e+00 ...,   2.00000000e+00
    1.92803605e+00   1.00000000e+00]]
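A note on this output: the leading nan in each row is most likely the header line, which genfromtxt cannot parse as a number, and unpack=True transposes the result so that each row holds one column of the file. Below is a minimal sketch of a call that avoids both. It assumes the file has a single header row and the column order uid, iid, rat; the in-memory sample is a stand-in for AllData.csv and uses the three rows shown in the question.

```python
from io import StringIO

import numpy as np

# Stand-in for AllData.csv (assumption: one header row, tab-separated).
sample = "uid\tiid\trat\n196\t242\t3.0\n186\t302\t3.0\n22\t377\t1.0\n"

np_df = np.genfromtxt(
    StringIO(sample),
    delimiter="\t",
    skip_header=1,    # skip the header line instead of parsing it as nan
    usecols=(1, 2),   # keep only iid and rat
)                     # no unpack=True, so rows stay as samples

print(np_df.shape)    # (3, 2)
```

With the real file, StringIO(sample) would be replaced by the file name.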

Recommended answer

It seems you need read_csv to create the DataFrame first, using usecols to keep only the second and third columns, and then convert it to a numpy array with values:

import pandas as pd
from sklearn.cluster import KMeans
# pandas.compat.StringIO is not available in recent pandas versions; use io.StringIO
from io import StringIO

temp=u"""col,iid,rat
4,1,0
5,2,4
6,3,3
7,4,1"""
# after testing, replace StringIO(temp) with 'filename.csv' (add sep='\t' for a tab-separated file)
df = pd.read_csv(StringIO(temp), usecols = [1,2])
print (df)
   iid  rat
0    1    0
1    2    4
2    3    3
3    4    1

X = df.values 
print (X)
[[1 0]
 [2 4]
 [3 3]
 [4 1]]

kmeans = KMeans(n_clusters=2)
a = kmeans.fit(X)
print (a)
KMeans(algorithm='auto', copy_x=True, init='k-means++', max_iter=300,
    n_clusters=2, n_init=10, n_jobs=1, precompute_distances='auto',
    random_state=None, tol=0.0001, verbose=0)
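Applied to the data in the question, the same approach might look like the sketch below. It assumes a tab-separated file with a header row uid, iid, rat; the in-memory string stands in for the real file and holds the three sample rows from the question. The values n_clusters=2 and random_state=0 are arbitrary choices for illustration. The sketch also shows why the failed attempt came out wrong: wrapping two (n, 1) slices in np.array gives shape (2, n, 1), while slicing both columns at once gives the (n, 2) layout that KMeans.fit() expects.

```python
from io import StringIO

import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Stand-in for the real tab-separated file (sample rows from the question).
raw = "uid\tiid\trat\n196\t242\t3.0\n186\t302\t3.0\n22\t377\t1.0\n"

# Select the two columns by name and convert to a 2-D array.
df = pd.read_csv(StringIO(raw), sep="\t", usecols=["iid", "rat"])
X = df.to_numpy()
print(X.shape)   # (3, 2)
print(X.dtype)   # float64: one array holds one dtype, so iid is upcast

# The failed attempt: two (n, 1) slices stacked along a new first axis.
values = pd.read_csv(StringIO(raw), sep="\t").to_numpy()
stacked = np.array([values[:, 1:2], values[:, 2:3]])
print(stacked.shape)   # (2, 3, 1)

# Slicing both columns in one step keeps one row per sample.
both = values[:, 1:3]
print(both.shape)      # (3, 2)

# Display only: print floats without scientific notation.
np.set_printoptions(suppress=True)
print(X)

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)
```

Two caveats. A 2-D numpy array has a single dtype, so mixing an int64 column with a float64 column yields float64 for both. And if "EPSILON" in the question refers to the e+02 style of the printed output, that is only how numpy displays the numbers; np.set_printoptions(suppress=True) changes the display and leaves the stored values untouched.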
