Load CSV file to NumPy and access columns by name
I have a CSV file with headers. Given this test.csv file:
"A","B","C","D","E","F","timestamp"
611.88243,9089.5601,5133.0,864.07514,1715.37476,765.22777,1.291111964948E12
611.88243,9089.5601,5133.0,864.07514,1715.37476,765.22777,1.291113113366E12
611.88243,9089.5601,5133.0,864.07514,1715.37476,765.22777,1.291120650486E12
I simply want to load it as a matrix/ndarray with 3 rows and 7 columns, and I also want to access the column vectors by column name. If I use genfromtxt (as shown below), I get an ndarray with 3 rows (one per line) and no columns.
import numpy as np
r = np.genfromtxt('test.csv', delimiter=',', dtype=None, names=True)
print(r)
print(r.shape)
[ (611.88243, 9089.5601000000006, 5133.0, 864.07514000000003, 1715.3747599999999, 765.22776999999996, 1291111964948.0)
(611.88243, 9089.5601000000006, 5133.0, 864.07514000000003, 1715.3747599999999, 765.22776999999996, 1291113113366.0)
(611.88243, 9089.5601000000006, 5133.0, 864.07514000000003, 1715.3747599999999, 765.22776999999996, 1291120650486.0)]
(3,)
I can get column vectors from column names like this:
print(r['A'])
[ 611.88243 611.88243 611.88243]
If I use loadtxt, I get an array with 3 rows and 7 columns, but I cannot access the columns by name (as shown below).
numpy.loadtxt(open("test.csv","rb"),delimiter=",",skiprows=1)
I get
[ [611.88243,9089.5601,5133.0,864.07514,1715.37476,765.22777,1.291111964948E12]
[611.88243,9089.5601,5133.0,864.07514,1715.37476,765.22777,1.291113113366E12]
[611.88243,9089.5601,5133.0,864.07514,1715.37476,765.22777,1.291120650486E12] ]
Is there any approach in Python that achieves both requirements together (access columns by column name, as with np.genfromtxt, and have a matrix, as with np.loadtxt)?
Using NumPy alone, the options you show are your only options: either use an ndarray of homogeneous dtype with shape (3,7), or a structured array of (potentially) heterogeneous dtype and shape (3,).
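To make the relationship between the two layouts concrete, here is a minimal sketch that loads the data as a structured array and then builds a plain 2-D array from it with numpy.lib.recfunctions.structured_to_unstructured (available in NumPy 1.16 and later). The inline CSV text, the unquoted header line, and the variable names are assumptions made for illustration; they are not part of the original question.

```python
import io

import numpy as np
from numpy.lib import recfunctions as rfn

# Inline copy of the sample data (header written without quotes for simplicity).
CSV_TEXT = """A,B,C,D,E,F,timestamp
611.88243,9089.5601,5133.0,864.07514,1715.37476,765.22777,1.291111964948E12
611.88243,9089.5601,5133.0,864.07514,1715.37476,765.22777,1.291113113366E12
611.88243,9089.5601,5133.0,864.07514,1715.37476,765.22777,1.291120650486E12
"""

# Structured array: one record per row, fields addressable by name.
r = np.genfromtxt(io.StringIO(CSV_TEXT), delimiter=",", names=True)

# Plain 2-D float array built from the same records.
m = rfn.structured_to_unstructured(r)

print(r.shape)        # shape of the structured array
print(m.shape)        # shape of the unstructured 2-D array
print(r.dtype.names)  # field names taken from the header line
```

Here r keeps access by name (r['A']) while m supports ordinary 2-D indexing; they are two separate objects, so you still have to carry both around.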
If you really want a data structure with labeled columns and shape (3,7), (and lots of other goodies) you could use a pandas DataFrame:
In [67]: import pandas as pd
In [68]: df = pd.read_csv('data'); df
Out[68]:
           A          B     C          D           E          F     timestamp
0  611.88243  9089.5601  5133  864.07514  1715.37476  765.22777  1.291112e+12
1  611.88243  9089.5601  5133  864.07514  1715.37476  765.22777  1.291113e+12
2  611.88243  9089.5601  5133  864.07514  1715.37476  765.22777  1.291121e+12
In [70]: df['A']
Out[70]:
0 611.88243
1 611.88243
2 611.88243
Name: A, dtype: float64
In [71]: df.shape
Out[71]: (3, 7)
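If a plain NumPy array is still needed alongside the DataFrame, one possible sketch is to call to_numpy() (pandas 0.24 and later) on the frame or on a single column. The inline CSV text and the variable names below are illustrative assumptions.

```python
import io

import pandas as pd

# Inline copy of the sample data; pandas strips the quotes around header names.
CSV_TEXT = '''"A","B","C","D","E","F","timestamp"
611.88243,9089.5601,5133.0,864.07514,1715.37476,765.22777,1.291111964948E12
611.88243,9089.5601,5133.0,864.07514,1715.37476,765.22777,1.291113113366E12
611.88243,9089.5601,5133.0,864.07514,1715.37476,765.22777,1.291120650486E12
'''

df = pd.read_csv(io.StringIO(CSV_TEXT))

col_a = df["A"].to_numpy()  # 1-D ndarray for the column labeled "A"
m = df.to_numpy()           # 2-D ndarray holding every column

print(col_a.shape)
print(m.shape)
```

Because every column in this file is numeric, the resulting array has a single floating-point dtype; with mixed column types to_numpy() would fall back to a more general dtype.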
A pure NumPy/Python alternative would be to use a dict to map the column names to indices:
import numpy as np
import csv
# filename is the path to the CSV file, e.g. 'test.csv'
with open(filename) as f:
    reader = csv.reader(f)
    columns = next(reader)  # header row
    colmap = dict(zip(columns, range(len(columns))))  # column name -> index

arr = np.matrix(np.loadtxt(filename, delimiter=",", skiprows=1))
print(arr[:, colmap['A']])
yields
[[ 611.88243]
[ 611.88243]
[ 611.88243]]
This way, arr is a NumPy matrix with columns that can be accessed by label using the syntax
arr[:, colmap[column_name]]
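The same idea also works with a plain ndarray, which the NumPy documentation now recommends over np.matrix. The sketch below wraps it in a small helper; load_with_colmap is a hypothetical name introduced here, and it takes the CSV text directly so that the example is self-contained.

```python
import csv
import io

import numpy as np

CSV_TEXT = '''"A","B","C","D","E","F","timestamp"
611.88243,9089.5601,5133.0,864.07514,1715.37476,765.22777,1.291111964948E12
611.88243,9089.5601,5133.0,864.07514,1715.37476,765.22777,1.291113113366E12
611.88243,9089.5601,5133.0,864.07514,1715.37476,765.22777,1.291120650486E12
'''


def load_with_colmap(text):
    """Hypothetical helper: return (2-D array, {column name: column index})."""
    # csv.reader handles the quoted header names.
    header = next(csv.reader(io.StringIO(text)))
    colmap = {name: i for i, name in enumerate(header)}
    # Skip the header row and parse the remaining lines as floats.
    arr = np.loadtxt(io.StringIO(text), delimiter=",", skiprows=1)
    return arr, colmap


arr, colmap = load_with_colmap(CSV_TEXT)
print(arr.shape)
print(arr[:, colmap["timestamp"]])
```

With an ndarray, arr[:, colmap['A']] is a 1-D vector of length 3 instead of the 3x1 matrix shown above.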