Load a CSV file into NumPy and access columns by name

Views: 261 | Posted: 2016/5/31 20:05:00 | Tags: python arrays csv numpy

Problem description

I have a CSV file with a header row. Given this test.csv file:

"A","B","C","D","E","F","timestamp"
611.88243,9089.5601,5133.0,864.07514,1715.37476,765.22777,1.291111964948E12
611.88243,9089.5601,5133.0,864.07514,1715.37476,765.22777,1.291113113366E12
611.88243,9089.5601,5133.0,864.07514,1715.37476,765.22777,1.291120650486E12

I simply want to load it as a matrix/ndarray with 3 rows and 7 columns, and I also want to access the column vectors by column name. If I use genfromtxt (as shown below), I get a one-dimensional ndarray with 3 elements (one per line) and no column axis.

import numpy as np

r = np.genfromtxt('test.csv', delimiter=',', dtype=None, names=True)
print(r)
print(r.shape)

[ (611.88243, 9089.5601000000006, 5133.0, 864.07514000000003, 1715.3747599999999, 765.22776999999996, 1291111964948.0)
 (611.88243, 9089.5601000000006, 5133.0, 864.07514000000003, 1715.3747599999999, 765.22776999999996, 1291113113366.0)
 (611.88243, 9089.5601000000006, 5133.0, 864.07514000000003, 1715.3747599999999, 765.22776999999996, 1291120650486.0)]
(3,)

I can get column vectors from column names like this:

print(r['A'])
  [ 611.88243  611.88243  611.88243]

If I use loadtxt instead, I get an array with 3 rows and 7 columns, but I cannot access the columns by name (as shown below).

np.loadtxt("test.csv", delimiter=",", skiprows=1)

I get

  [ [611.88243,9089.5601,5133.0,864.07514,1715.37476,765.22777,1.291111964948E12]
    [611.88243,9089.5601,5133.0,864.07514,1715.37476,765.22777,1.291113113366E12]
    [611.88243,9089.5601,5133.0,864.07514,1715.37476,765.22777,1.291120650486E12] ]

Is there any approach in Python that meets both requirements together (access columns by column name, as with np.genfromtxt, and have a matrix, as with np.loadtxt)?

Solution

Using NumPy alone, the options you show are your only options. Either use an ndarray of homogeneous dtype with shape (3,7), or a structured array of (potentially) heterogeneous dtype and shape (3,).
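
To make the two layouts concrete, here is a minimal sketch that loads the same data as a structured array and then produces a plain 2-D copy of it. It assumes NumPy 1.16 or later (for `numpy.lib.recfunctions.structured_to_unstructured`) and uses a hypothetical in-memory string `CSV_TEXT` with an unquoted header row in place of test.csv, so the field names come out as plain `A`, `B`, and so on.

```python
import io

import numpy as np
from numpy.lib import recfunctions as rfn

# Hypothetical stand-in for test.csv (header written without quotes).
CSV_TEXT = """A,B,C,D,E,F,timestamp
611.88243,9089.5601,5133.0,864.07514,1715.37476,765.22777,1.291111964948E12
611.88243,9089.5601,5133.0,864.07514,1715.37476,765.22777,1.291113113366E12
611.88243,9089.5601,5133.0,864.07514,1715.37476,765.22777,1.291120650486E12
"""

# Structured array: one record per row, fields named after the header.
structured = np.genfromtxt(io.StringIO(CSV_TEXT), delimiter=",", names=True)
print(structured.shape)         # (3,)
print(structured.dtype.names)   # field names taken from the header row

# Homogeneous 2-D array built from the same records.
plain = rfn.structured_to_unstructured(structured)
print(plain.shape)              # (3, 7)
```

The two arrays are separate objects with separate roles: `structured['A']` gives access by name, while `plain` supports ordinary 2-D indexing and matrix operations. Neither one offers both at once.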

If you really want a data structure with labeled columns and shape (3,7) (and lots of other goodies), you could use a pandas DataFrame:

In [67]: import pandas as pd
In [68]: df = pd.read_csv('data'); df
Out[68]: 
           A          B     C          D           E          F     timestamp
0  611.88243  9089.5601  5133  864.07514  1715.37476  765.22777  1.291112e+12
1  611.88243  9089.5601  5133  864.07514  1715.37476  765.22777  1.291113e+12
2  611.88243  9089.5601  5133  864.07514  1715.37476  765.22777  1.291121e+12    

In [70]: df['A']
Out[70]: 
0    611.88243
1    611.88243
2    611.88243
Name: A, dtype: float64

In [71]: df.shape
Out[71]: (3, 7)
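
If a plain ndarray is still needed for numerical work, the DataFrame can hand one back. The following is a small sketch under the assumption that pandas 0.24 or later is installed (for `DataFrame.to_numpy`); `CSV_TEXT` is a hypothetical in-memory copy of the file used only to keep the example self-contained.

```python
import io

import pandas as pd

# Hypothetical in-memory copy of test.csv.
CSV_TEXT = '''"A","B","C","D","E","F","timestamp"
611.88243,9089.5601,5133.0,864.07514,1715.37476,765.22777,1.291111964948E12
611.88243,9089.5601,5133.0,864.07514,1715.37476,765.22777,1.291113113366E12
611.88243,9089.5601,5133.0,864.07514,1715.37476,765.22777,1.291120650486E12
'''

df = pd.read_csv(io.StringIO(CSV_TEXT))

# Access by label on the DataFrame.
col_a = df["A"].to_numpy()

# Plain 2-D ndarray holding the same values.
values = df.to_numpy()
print(values.shape)  # (3, 7)

# Positional index of a label, for indexing into the ndarray.
idx = df.columns.get_loc("timestamp")
print(values[:, idx])
```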

A pure NumPy/Python alternative would be to use a dict to map the column names to indices:

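import csv
import numpy as np

filename = 'test.csv'

# Read only the header row and map each column name to its index.
with open(filename, newline='') as f:
    reader = csv.reader(f)
    columns = next(reader)
    colmap = dict(zip(columns, range(len(columns))))

# Load the numeric rows, skipping the header line.
arr = np.matrix(np.loadtxt(filename, delimiter=",", skiprows=1))
print(arr[:, colmap['A']])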

yields

[[ 611.88243]
 [ 611.88243]
 [ 611.88243]]

This way, arr is a NumPy matrix, with columns that can be accessed by label using the syntax

arr[:, colmap[column_name]]
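
The NumPy documentation no longer recommends the `np.matrix` class, and the same name-to-index idea works with a regular ndarray. Below is a sketch of that variant; the helper name `load_with_colmap` and the in-memory `CSV_TEXT` are illustrative assumptions, not part of the original answer.

```python
import csv

import numpy as np

# Hypothetical in-memory copy of test.csv.
CSV_TEXT = '''"A","B","C","D","E","F","timestamp"
611.88243,9089.5601,5133.0,864.07514,1715.37476,765.22777,1.291111964948E12
611.88243,9089.5601,5133.0,864.07514,1715.37476,765.22777,1.291113113366E12
611.88243,9089.5601,5133.0,864.07514,1715.37476,765.22777,1.291120650486E12
'''


def load_with_colmap(text):
    """Return (2-D float ndarray, dict mapping column name -> column index)."""
    lines = text.strip().splitlines()
    # csv.reader strips the quotes around the header names.
    columns = next(csv.reader(lines[:1]))
    colmap = {name: i for i, name in enumerate(columns)}
    # np.loadtxt accepts a list of strings as well as a file name.
    arr = np.loadtxt(lines[1:], delimiter=",")
    return arr, colmap


arr, colmap = load_with_colmap(CSV_TEXT)
print(arr.shape)            # (3, 7)
print(arr[:, colmap["A"]])  # 1-D column vector of length 3
```

With a plain ndarray, `arr[:, colmap[name]]` returns a 1-D array of shape (3,) rather than the (3, 1) column that `np.matrix` produces.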
