How to use np.genfromtxt and fill in missing columns?
Question
I am trying to use np.genfromtxt to load data that looks something like this into a matrix:
0.79 0.10 0.91 -0.17 0.10 0.33 -0.90 0.10 -0.19 -0.00 0.10 -0.99 -0.06 0.10 -0.42 -0.66 0.10 -0.79 0.21 0.10 0.93 0.79 0.10 0.91 -0.72 0.10 0.25 0.64 0.10 -0.27 -0.36 0.10 -0.66 -0.52 0.10 0.92 -0.39 0.10 0.43 0.63 0.10 0.25 -0.58 0.10 -0.03 0.59 0.10 0.02 -0.69 0.10 0.79 0.30 0.10 0.09 0.70 0.10 0.67 -0.04 0.10 -0.65 -0.07 0.10 0.70 -0.06 0.10 0.08 7 566 112 32 163 615 424 543 424 422 490 47 499 595 94 515 163 535
0.79 0.10 0.91 -0.17 0.10 0.33 -0.90 0.10 -0.19 -0.00 0.10 -0.99 -0.06 0.10 -0.42 -0.66 0.10 -0.79 0.21 0.10 0.93 0.79 0.10 0.91 -0.72 0.10 0.25 0.64 0.10 -0.27 -0.36 0.10 -0.66 -0.52 0.10 0.92 -0.39 0.10 0.43 0.63 0.10 0.25 -0.58 0.10 -0.03 0.59 0.10 0.02 -0.69 0.10 0.79 0.30 0.10 0.09 0.70 0.10 0.67 -0.04 0.10 -0.65 -0.07 0.10 0.70 -0.06 0.10 0.08 263 112 32 30 163 366 543 457 424 422 556 55 355 485 112 515 163 509 112 535
0.79 0.10 0.91 -0.17 0.10 0.33 -0.90 0.10 -0.19 -0.00 0.10 -0.99 -0.06 0.10 -0.42 -0.66 0.10 -0.79 0.21 0.10 0.93 0.79 0.10 0.91 -0.72 0.10 0.25 0.64 0.10 -0.27 -0.36 0.10 -0.66 -0.52 0.10 0.92 -0.39 0.10 0.43 0.63 0.10 0.25 -0.58 0.10 -0.03 0.59 0.10 0.02 -0.69 0.10 0.79 0.30 0.10 0.09 0.70 0.10 0.67 -0.04 0.10 -0.65 -0.07 0.10 0.70 -0.06 0.10 0.08 311 112 32 543 457 77 639 355 412 422 509 112 535 163 77 125 30 412 422 556 55 355 485 112 515
Suppose I want to import the data into a matrix of shape (4, 5). If a row has fewer than 5 columns, its missing entries should be filled with "" on import. For example, if the data were simpler, the result would look like this:
1,2,3,4,5
6,7,8,9,10
11,12,13,14,15
16,"","","",""
In other words, I want the number of imported columns to match the column count of the longest row, and any row with fewer columns should be padded with "". I am reading from a file called "data.txt".
This is what I have tried so far:
trainData = np.genfromtxt('data.txt', usecols = range(0, 5), invalid_raise=False, missing_values = "", filling_values="")
However, it reports errors like this:
Line #4 (got 1 columns instead of 5)
How can I fix this?
Thanks!
Recommended answer
I managed to figure out a solution: split each line into fields myself and let pandas pad the shorter rows.
import numpy as np
import pandas
df = pandas.DataFrame([line.strip().split() for line in open('data.txt', 'r')])  # short rows are padded with missing values
data = np.array(df)
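Note that the DataFrame approach pads short rows with missing values (None or NaN) rather than "". If the empty-string fill from the question is what you need, the same padding can be done without pandas. The following is only a minimal sketch, assuming whitespace-separated fields; the in-memory `sample` is a hypothetical stand-in for data.txt, and the names `rows`, `width`, and `padded` are illustrative.

```python
import io

import numpy as np

# Hypothetical stand-in for data.txt: whitespace-separated rows of unequal length.
# For a real file, replace this with: open('data.txt', 'r')
sample = io.StringIO("1 2 3 4 5\n6 7 8 9 10\n11 12 13 14 15\n16\n")

# Split each non-blank line into its fields (all fields stay strings).
with sample as f:
    rows = [line.split() for line in f if line.strip()]

# Pad every row with "" up to the length of the longest row.
width = max(len(row) for row in rows)
padded = [row + [""] * (width - len(row)) for row in rows]

# Every row now has the same length, so this is a regular 2-D array of strings.
data = np.array(padded)

print(data.shape)
print(data[-1])
```

Because "" cannot be stored in a float array, the result holds strings; convert the columns you need to numbers afterwards if every entry in them is filled.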