Pandas adds columns to non-referenced dataframe

Question

This one has been blowing my mind for hours. Perhaps there's some arcane 'gotcha' I'm missing but it must be incredibly counter-intuitive.

'trial_unq' is a two-column dataframe and 'trial_unq2' is meant to be an identical copy. The for loop iterates over all strings in 'unique_in'. If a string from 'unique_in' appears in the text of trial_unq fewer than 280 times (but at least once), a boolean column is inserted at the end of trial_unq. If it appears in the text of trial_unq more than 10000 times, the boolean column is inserted at the end of trial_unq2.

trial_unq2 = trial_unq

for i in range(len(unique_in)):  # for each individual word
    unq_count = trial_unq.brief_title.str.contains(unique_in[i]).sum()  # count titles containing the word
    print(unique_in[i], ' ', unq_count)
    if unq_count < 280 and unq_count > 0:
        colname = unique_in[i]
        colpos = len(trial_unq.columns)
        boolcol = trial_unq.brief_title.str.contains(unique_in[i])
        trial_unq.insert(colpos, colname, boolcol) 
    if unq_count > 10000:
        colname2 = unique_in[i]
        colpos2 = len(trial_unq2.columns)
        boolcol2 = trial_unq2.brief_title.str.contains(unique_in[i])
        trial_unq2.insert(colpos2, colname2, boolcol2) 

print(trial_unq.columns)
print(trial_unq2.columns)

Output

['depressive', 'disorder', 'depressive disorder', 'therapy']
depressive   257
disorder   2190
depressive disorder   167
therapy   12236
Index(['NCT', 'brief_title', 'depressive', 'depressive disorder', 'therapy'], dtype='object')
Index(['NCT', 'brief_title', 'depressive', 'depressive disorder', 'therapy'], dtype='object')

From the output it is clear that both the small-count trial_unq dataframe and the large-count trial_unq2 dataframe have all three columns added to them.

Recommended answer

In Python, several names can refer to the same object, e.g.

l1 = [1, 2, 3]
l2 = l1  # now both, l1 and l2 refer to the same object!
l2[1] = 100

Now both l1 and l2 look like this:

[1, 100, 3]
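
A quick way to check whether two names are bound to the same object is the `is` operator. The following is a minimal sketch using the same lists; `l_copy` and the three result variables are illustrative names that do not appear in the original answer.

```python
l1 = [1, 2, 3]
l2 = l1            # plain assignment: binds a second name, copies nothing
l_copy = list(l1)  # builds a new list holding the same elements

same_object = l2 is l1        # one object reachable under two names
copy_is_same = l_copy is l1   # a distinct object
copy_is_equal = l_copy == l1  # equal contents at this point

l2[1] = 100  # mutating through l2 is visible through l1, but not through l_copy

print(same_object, copy_is_same, copy_is_equal)
print(l1, l2, l_copy)
```

Equality (`==`) compares contents, while `is` compares identity, which is why the copy starts out equal to `l1` without being the same object.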

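The same happens with your two dataframes: trial_unq2 = trial_unq does not copy anything, it only binds a second name to the same DataFrame, so every insert() shows up under both names.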

In this case, you can simply use .copy():

l3 = l1.copy()
l3[1] = 0

l1
[1, 100, 3]

l3
[1, 0, 3]

So, to fix your issue, all you need is:

trial_unq2 = trial_unq.copy()
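
To see the difference on a dataframe, here is a small sketch. The three rows of data are made up for illustration and `alias` is a hypothetical name; only `trial_unq`, `trial_unq2`, and the column names come from the question.

```python
import pandas as pd

# Toy stand-in for the asker's data; these rows are invented for the example.
trial_unq = pd.DataFrame({
    "NCT": ["NCT001", "NCT002", "NCT003"],
    "brief_title": [
        "depressive disorder study",
        "therapy trial",
        "therapy for depressive symptoms",
    ],
})

alias = trial_unq              # same object under a second name
trial_unq2 = trial_unq.copy()  # independent DataFrame (copy() is deep by default)

# Insert a boolean column into the original only.
mask = trial_unq["brief_title"].str.contains("depressive", regex=False)
trial_unq.insert(len(trial_unq.columns), "depressive", mask)

print(alias is trial_unq)        # the alias is the very same object
print(list(alias.columns))       # so the new column is visible through it
print(list(trial_unq2.columns))  # the copy keeps its original columns
```

With the copy in place, columns inserted into trial_unq no longer appear in trial_unq2, and vice versa.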
