Creating new pandas dataframe from certain columns of existing dataframe

Question

I have loaded a CSV file into a pandas DataFrame and want to do some simple manipulations on it. I cannot figure out how to create a new DataFrame based on selected columns from my original DataFrame. My attempt:

import pandas

names = ['A','B','C','D']
dataset = pandas.read_csv('file.csv', names=names)
new_dataset = dataset['A','D']  # fails with KeyError: ('A', 'D') is looked up as one column label

I would like to create a new DataFrame with the columns A and D from the original DataFrame.

Recommended answer

This is called subsetting: pass a list of column names inside []:

dataset = pandas.read_csv('file.csv', names=names)

new_dataset = dataset[['A','D']]

This is the same as:

new_dataset = dataset.loc[:, ['A','D']]
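
To make the two forms concrete, here is a minimal, self-contained sketch. It assumes a small in-memory DataFrame whose values are made up for illustration and stand in for the contents of file.csv; the names by_brackets and by_loc are hypothetical.

```python
import pandas

# Hypothetical data standing in for the contents of file.csv.
dataset = pandas.DataFrame(
    {'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9], 'D': [10, 11, 12]}
)

# A list of labels inside [] selects several columns and returns a DataFrame.
by_brackets = dataset[['A', 'D']]

# .loc with ":" (all rows) and a list of column labels selects the same data.
by_loc = dataset.loc[:, ['A', 'D']]

print(list(by_brackets.columns))
print(by_brackets.equals(by_loc))
```

Note the double brackets: the outer pair is the indexing operator and the inner pair is an ordinary Python list of column names.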

If you only need the filtered columns, add the usecols parameter to read_csv:

new_dataset = pandas.read_csv('file.csv', names=names, usecols=['A','D'])
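
A runnable sketch of the same call, assuming a headerless CSV. The text in csv_text is made up for illustration and is read through io.StringIO in place of file.csv:

```python
import io

import pandas

# Hypothetical headerless CSV text standing in for file.csv.
csv_text = "1,2,3,4\n5,6,7,8\n"

names = ['A', 'B', 'C', 'D']

# names labels all four columns; usecols keeps only A and D while parsing.
new_dataset = pandas.read_csv(io.StringIO(csv_text), names=names, usecols=['A', 'D'])

print(new_dataset)
```

With this approach the unwanted columns are never loaded into the DataFrame, which can help with large files.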

If you use only:

new_dataset = dataset[['A','D']]

and then modify the data in new_dataset, you may get the following warning:

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

If you modify values in new_dataset later, you will find that the modifications do not propagate back to the original data (dataset), and that pandas emits a warning.

As EdChum pointed out, add copy to remove the warning:

new_dataset = dataset[['A','D']].copy()
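
The effect of the explicit copy can be checked with a short sketch. It assumes a small in-memory DataFrame with made-up values in place of file.csv:

```python
import pandas

# Hypothetical data standing in for the contents of file.csv.
dataset = pandas.DataFrame({'A': [1, 2], 'B': [3, 4], 'C': [5, 6], 'D': [7, 8]})

# An explicit copy makes new_dataset independent of dataset.
new_dataset = dataset[['A', 'D']].copy()

# Modify the copy in two common ways.
new_dataset['A'] = new_dataset['A'] * 10
new_dataset.loc[0, 'D'] = 0

print(new_dataset)
print(dataset)  # the original values are unchanged
```

The changes appear only in new_dataset; dataset keeps its original values.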
