Select specific CSV columns (Filtering) - Python/pandas

Question

I have a very large CSV File with 100 columns. In order to illustrate my problem I will use a very basic example.

Let's suppose that we have a CSV file.

in  value   d     f
0    975   f01    5
1    976   F      4
2    977   d4     1
3    978   B6     0
4    979   2C     0

I want to select specific columns.

import pandas
data = pandas.read_csv("ThisFile.csv")

To select the first 2 columns I used:

data.ix[:,:2]

How should I select other columns, such as the 2nd and the 4th?

Another way to solve this would be to rewrite the CSV file, but it is a huge file, so I would rather avoid that.

Recommended answer

This selects the second and fourth columns (since Python uses 0-based indexing):

In [272]: df.iloc[:, [1, 3]]
Out[272]: 
   value  f
0    975  5
1    976  4
2    977  1
3    978  0
4    979  0

[5 rows x 2 columns]

df.ix can select by position or by label, which makes it ambiguous; it was deprecated in pandas 0.20 and removed in pandas 1.0. df.iloc always selects by position. When indexing by position, use df.iloc to make your intention explicit. It is also a bit faster, since pandas does not have to check whether your index is using labels.
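To make the position-versus-label distinction concrete, here is a minimal sketch. It assumes a small in-memory frame that mirrors the sample data from the question (the frame is built by hand here rather than read from ThisFile.csv), and it uses df.loc for the label-based case, which the original answer does not mention.

```python
import pandas as pd

# Hand-built frame mirroring the sample data in the question (an assumption;
# the real data would come from the CSV file).
df = pd.DataFrame({
    "in": [0, 1, 2, 3, 4],
    "value": [975, 976, 977, 978, 979],
    "d": ["f01", "F", "d4", "B6", "2C"],
    "f": [5, 4, 1, 0, 0],
})

# Position-based: columns at integer positions 1 and 3 (2nd and 4th).
by_position = df.iloc[:, [1, 3]]

# Label-based: the same two columns, selected by name.
by_label = df.loc[:, ["value", "f"]]

print(by_position)
print(by_position.equals(by_label))
```

Selecting by label is usually more robust when the column order of the file might change; selecting by position is handy when the columns have no meaningful names.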

Another possibility is to use the usecols parameter:

data = pandas.read_csv("ThisFile.csv", usecols=[1,3])

This will load only the second and fourth columns into the data DataFrame.
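As a rough, self-contained sketch of the usecols approach, the snippet below reads from an in-memory string instead of ThisFile.csv. The comma-separated layout of csv_text is an assumption about what the file looks like. usecols also accepts column names, which is shown alongside the positional form.

```python
import io

import pandas as pd

# Assumed comma-separated version of the sample data from the question.
csv_text = """in,value,d,f
0,975,f01,5
1,976,F,4
2,977,d4,1
3,978,B6,0
4,979,2C,0
"""

# Load only the 2nd and 4th columns, by integer position.
by_position = pd.read_csv(io.StringIO(csv_text), usecols=[1, 3])

# Load the same columns by header name.
by_name = pd.read_csv(io.StringIO(csv_text), usecols=["value", "f"])

print(by_position)
print(by_position.equals(by_name))
```

Because the unwanted columns are skipped while parsing, this avoids holding all 100 columns in memory, which matters for a very large file.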
