Work with a row in a pandas dataframe without incurring chained indexing (not copying, just indexing)
Question
My data is organized in a dataframe:
import pandas as pd
import numpy as np
data = {'Col1' : [4,5,6,7], 'Col2' : [10,20,30,40], 'Col3' : [100,50,-30,-50], 'Col4' : ['AAA', 'BBB', 'AAA', 'CCC']}
df = pd.DataFrame(data=data, index = ['R1','R2','R3','R4'])
Which looks like this (only much bigger):
    Col1  Col2  Col3 Col4
R1     4    10   100  AAA
R2     5    20    50  BBB
R3     6    30   -30  AAA
R4     7    40   -50  CCC
My algorithm loops through the rows of this table and performs a set of operations on each one.
For the sake of cleanliness (and laziness), I would like to work on a single row at each iteration without typing df.loc['row index', 'column name'] to get each cell value.
I have tried to follow the recommended style, for example:
row_of_interest = df.loc['R2', :]
However, I still get the warning when I do:
row_of_interest['Col2'] = row_of_interest['Col2'] + 1000
SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame
And it does not work as I intended: the assignment is made on a copy, so the dataframe is unchanged:
print(df)
    Col1  Col2  Col3 Col4
R1     4    10   100  AAA
R2     5    20    50  BBB
R3     6    30   -30  AAA
R4     7    40   -50  CCC
Any advice on the proper way to do it? Or should I just stick to working with the dataframe directly?
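For reference, "working with the dataframe directly" could look like the minimal sketch below. It assumes the same example data as above and is only an illustration: a single .loc (or .at) call that names both the row and the column writes into df itself, so no intermediate row object is created and no copy is involved.

```python
import pandas as pd

data = {'Col1': [4, 5, 6, 7], 'Col2': [10, 20, 30, 40],
        'Col3': [100, 50, -30, -50], 'Col4': ['AAA', 'BBB', 'AAA', 'CCC']}
df = pd.DataFrame(data=data, index=['R1', 'R2', 'R3', 'R4'])

# One .loc call with row and column labels assigns into df in place.
df.loc['R2', 'Col2'] = df.loc['R2', 'Col2'] + 1000

# .at is the scalar accessor; it updates a single cell the same way.
df.at['R3', 'Col2'] += 1000

print(df)
```

The cost is exactly the verbosity the question wants to avoid, since every cell access repeats the row label.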
Update: using the replies provided, the warning is removed from the code, but the original dataframe is still not modified. The "row of interest" Series is a copy, not part of the original dataframe. For example:
import pandas as pd
import numpy as np
data = {'Col1' : [4,5,6,7], 'Col2' : [10,20,30,40], 'Col3' : [100,50,-30,-50], 'Col4' : ['AAA', 'BBB', 'AAA', 'CCC']}
df = pd.DataFrame(data=data, index = ['R1','R2','R3','R4'])
row_of_interest = df.loc['R2']
row_of_interest.is_copy = False
new_cell_value = row_of_interest['Col2'] + 1000
row_of_interest['Col2'] = new_cell_value
print(row_of_interest)
Col1       5
Col2    1020
Col3      50
Col4     BBB
Name: R2, dtype: object
print(df)
    Col1  Col2  Col3 Col4
R1     4    10   100  AAA
R2     5    20    50  BBB
R3     6    30   -30  AAA
R4     7    40   -50  CCC
This is an example of the functionality I would like to replicate. In Python, a list of lists looks like:
a = [[1,2,3],[4,5,6]]
Now I can create a "label":
b = a[0]
And if I change an entry in b:
b[0] = 7
Both a and b change:
print(a, b)
[[7, 2, 3], [4, 5, 6]] [7, 2, 3]
Can this behavior be replicated between a pandas dataframe and a pandas Series that labels one of its rows?
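A minimal sketch of why the list analogy does not carry over directly, assuming the same example data: the columns have different dtypes (integers and strings), so the row has to be assembled into a new object-dtype Series rather than aliasing the dataframe's storage. The explicit .copy() below is an addition for illustration; it only makes the copy visible in the code and keeps the example free of warnings.

```python
import pandas as pd

data = {'Col1': [4, 5, 6, 7], 'Col2': [10, 20, 30, 40],
        'Col3': [100, 50, -30, -50], 'Col4': ['AAA', 'BBB', 'AAA', 'CCC']}
df = pd.DataFrame(data=data, index=['R1', 'R2', 'R3', 'R4'])

# Selecting one row across mixed-dtype columns builds a new Series.
row = df.loc['R2'].copy()
print(row.dtype)  # object: the only dtype that can hold ints and strings

# Changing the Series changes the copy only.
row['Col2'] = row['Col2'] + 1000
print(row['Col2'])           # value in the copy
print(df.loc['R2', 'Col2'])  # value in the dataframe, still the original
```

Unlike b = a[0] for nested lists, the Series here is not a second name for the same object, so changes have to be written back explicitly.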
Recommended answer
This should work:
row_of_interest = df.loc['R2', :]
row_of_interest.is_copy = False
row_of_interest['Col2'] = row_of_interest['Col2'] + 1000
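Setting .is_copy = False is the trick. It only silences the warning, though: the Series is still a copy, so assign it back to the dataframe once you are done with it: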
import pandas as pd
import numpy as np
data = {'Col1' : [4,5,6,7], 'Col2' : [10,20,30,40], 'Col3' : [100,50,-30,-50], 'Col4' : ['AAA', 'BBB', 'AAA', 'CCC']}
df = pd.DataFrame(data=data, index = ['R1','R2','R3','R4'])
row_of_interest = df.loc['R2']
row_of_interest.is_copy = False
new_cell_value = row_of_interest['Col2'] + 1000
row_of_interest['Col2'] = new_cell_value
print(row_of_interest)
# write the modified row back into the original dataframe
df.loc['R2'] = row_of_interest
print(df)
df:
    Col1  Col2  Col3 Col4
R1     4    10   100  AAA
R2     5  1020    50  BBB
R3     6    30   -30  AAA
R4     7    40   -50  CCC
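Applied to the loop described in the question, the same write-back pattern might look like the sketch below. It is an illustration under assumptions, not part of the original answer: it uses an explicit .copy() in place of .is_copy = False (as far as I know, the is_copy attribute is no longer available in recent pandas releases), and the per-row operations are made up for the example.

```python
import pandas as pd

data = {'Col1': [4, 5, 6, 7], 'Col2': [10, 20, 30, 40],
        'Col3': [100, 50, -30, -50], 'Col4': ['AAA', 'BBB', 'AAA', 'CCC']}
df = pd.DataFrame(data=data, index=['R1', 'R2', 'R3', 'R4'])

for label in df.index:
    # Explicit copy: safe to modify and raises no SettingWithCopyWarning.
    row = df.loc[label].copy()

    # Hypothetical per-row operations, using short row['...'] access.
    row['Col2'] = row['Col2'] + 1000
    if row['Col4'] == 'AAA':
        row['Col3'] = 0

    # Write the finished row back into the dataframe.
    df.loc[label] = row

print(df)
```

Each row is typed out in full only twice, once to read it and once to write it back, which keeps the body of the loop short.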