Creating new pandas dataframe from certain columns of existing dataframe
Problem description
I have loaded a CSV file into a pandas DataFrame and want to do some simple manipulations on it. I cannot figure out how to create a new DataFrame from selected columns of my original DataFrame. My attempt:
import pandas

names = ['A','B','C','D']
dataset = pandas.read_csv('file.csv', names=names)
new_dataset = dataset['A','D']
I would like to create a new DataFrame with the columns A and D from the original DataFrame.
Recommended answer
This is called subsetting: pass a list of columns inside []:
dataset = pandas.read_csv('file.csv', names=names)
new_dataset = dataset[['A','D']]
This is the same as:
new_dataset = dataset.loc[:, ['A','D']]
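The difference between the failing attempt and the two working forms can be checked with a minimal sketch. The small in-memory DataFrame below is a hypothetical stand-in for file.csv, which is not shown in the question:

```python
import pandas as pd

# Hypothetical sample data standing in for file.csv (an assumption).
dataset = pd.DataFrame({'A': [1, 2], 'B': [3, 4], 'C': [5, 6], 'D': [7, 8]})

# A list of column labels inside [] selects several columns at once.
by_list = dataset[['A', 'D']]

# The same selection written with .loc: all rows, columns A and D.
by_loc = dataset.loc[:, ['A', 'D']]
same = by_list.equals(by_loc)

# dataset['A', 'D'] passes the tuple ('A', 'D') as a single column label,
# which does not exist here, so pandas raises KeyError.
try:
    dataset['A', 'D']
    tuple_key_failed = False
except KeyError:
    tuple_key_failed = True
```

The inner brackets build an ordinary Python list, so the order of labels in that list is also the column order of the result.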
If you only need the filtered output, add the usecols parameter to read_csv:
new_dataset = pandas.read_csv('file.csv', names=names, usecols=['A','D'])
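As a rough, self-contained illustration of usecols, the sketch below reads from an in-memory string instead of file.csv. The CSV contents are made up, and it assumes the file has no header row, as the names argument in the question suggests:

```python
import io
import pandas as pd

# Hypothetical header-less CSV contents standing in for file.csv.
csv_text = "1,2,3,4\n5,6,7,8\n"
names = ['A', 'B', 'C', 'D']

# names labels the four columns; usecols keeps only A and D while parsing,
# so B and C are never loaded into the DataFrame.
new_dataset = pd.read_csv(io.StringIO(csv_text), names=names, usecols=['A', 'D'])
```

This avoids holding the unused columns in memory, which may matter for large files.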
If you use only:
new_dataset = dataset[['A','D']]
and then modify the data, you may get this warning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
If you modify values in new_dataset later, you will find that the modifications do not propagate back to the original data (dataset), and that pandas issues a warning.
As EdChum pointed out, add .copy() to remove the warning:
new_dataset = dataset[['A','D']].copy()
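A small sketch of the effect of .copy(), again using made-up data in place of file.csv: changes to the copy leave the original untouched.

```python
import pandas as pd

# Hypothetical sample data standing in for file.csv (an assumption).
dataset = pd.DataFrame({'A': [1, 2], 'B': [3, 4], 'C': [5, 6], 'D': [7, 8]})

# An explicit copy makes new_dataset an independent DataFrame.
new_dataset = dataset[['A', 'D']].copy()

# Modify the copy only.
new_dataset.loc[0, 'A'] = 100
new_dataset['D'] = new_dataset['D'] * 10

original_a = dataset['A'].tolist()
copied_a = new_dataset['A'].tolist()
```

Because the copy is explicit, pandas has no ambiguity about whether the assignment was meant to reach dataset, so the warning is not expected here.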