Pandas adds columns to non-referenced dataframe
Problem description
This one has been blowing my mind for hours. Perhaps there's some arcane 'gotcha' I'm missing, but if so it must be incredibly counter-intuitive.
'trial_unq' is a two-column dataframe and 'trial_unq2' is meant to be an identical copy. The for loop iterates over all strings in 'unique_in'. If a string from 'unique_in' appears in the text of trial_unq fewer than 280 times (but at least once), a boolean column is inserted at the end of trial_unq. If it appears more than 10000 times, the boolean column is inserted at the end of trial_unq2.
trial_unq2 = trial_unq
for i in range(len(unique_in)):  # for each individual word
    unq_count = trial_unq.brief_title.str.contains(unique_in[i]).sum()  # count trial occurrences
    print(unique_in[i], ' ', unq_count)
    if unq_count < 280 and unq_count > 0:
        colname = unique_in[i]
        colpos = len(trial_unq.columns)
        boolcol = trial_unq.brief_title.str.contains(unique_in[i])
        trial_unq.insert(colpos, colname, boolcol)
    if unq_count > 10000:
        colname2 = unique_in[i]
        colpos2 = len(trial_unq2.columns)
        boolcol2 = trial_unq2.brief_title.str.contains(unique_in[i])
        trial_unq2.insert(colpos2, colname2, boolcol2)
print(trial_unq.columns)
print(trial_unq2.columns)
Output
['depressive', 'disorder', 'depressive disorder', 'therapy']
depressive 257
disorder 2190
depressive disorder 167
therapy 12236
Index(['NCT', 'brief_title', 'depressive', 'depressive disorder', 'therapy'], dtype='object')
Index(['NCT', 'brief_title', 'depressive', 'depressive disorder', 'therapy'], dtype='object')
From the output it is clear that both the small-count trial_unq dataframe and the large-count trial_unq2 dataframe have had all three columns added to them.
Recommended answer
In Python, several names can refer to the same object, e.g.
l1 = [1, 2, 3]
l2 = l1 # now both, l1 and l2 refer to the same object!
l2[1] = 100
Now both l1 and l2 look like this:
[1, 100, 3]
The same happens with your two dataframes: trial_unq2 = trial_unq does not copy anything, it only gives the same dataframe a second name.
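To make the dataframe case concrete, here is a minimal sketch. The two-row frame and its values are made up for illustration and are not the asker's data; only the column names NCT and brief_title come from the question.

```python
import pandas as pd

# Hypothetical stand-in for the asker's dataframe (values are invented)
trial_unq = pd.DataFrame({
    "NCT": ["NCT001", "NCT002"],
    "brief_title": ["depressive disorder study", "therapy trial"],
})

trial_unq2 = trial_unq  # binds a second name to the same object; no data is copied

print(trial_unq2 is trial_unq)  # True: one dataframe, two names

# A column inserted through one name is visible through the other
trial_unq.insert(
    len(trial_unq.columns),
    "therapy",
    trial_unq.brief_title.str.contains("therapy"),
)
print(list(trial_unq2.columns))  # ['NCT', 'brief_title', 'therapy']
```

The `is` check is a quick way to find out whether two names point at the same dataframe before debugging anything else.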
In this case, you can simply use .copy():
l3 = l1.copy()
l3[1] = 0
l1
[1, 100, 3]
l3
[1, 0, 3]
So, to fix your issue, all you need is:
trial_unq2 = trial_unq.copy()
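As a rough check that the copy really is independent, the sketch below inserts a column into the copy only. The sample rows are again invented for illustration. `DataFrame.copy()` makes a deep copy by default, so the original keeps its two columns.

```python
import pandas as pd

# Hypothetical stand-in for the asker's dataframe (values are invented)
trial_unq = pd.DataFrame({
    "NCT": ["NCT001", "NCT002"],
    "brief_title": ["depressive disorder study", "therapy trial"],
})

trial_unq2 = trial_unq.copy()  # a new, independent dataframe

# Insert a boolean column into the copy only
trial_unq2.insert(
    len(trial_unq2.columns),
    "therapy",
    trial_unq2.brief_title.str.contains("therapy"),
)

print(trial_unq2 is trial_unq)    # False: two separate objects
print(list(trial_unq.columns))    # ['NCT', 'brief_title']
print(list(trial_unq2.columns))   # ['NCT', 'brief_title', 'therapy']
```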