Dropna isn't dropping, fillna isn't filling and my list comprehension can't comprehend how to get rid of nans (python)

Question

I have a case where I am adding data from one dataframe to another, but I can't get rid of the nan values.

Sample data

import numpy as np
import pandas as pd

df1 = pd.DataFrame(
        {
        'Journal' : ['US Drug standards.','Acta veterinariae.','Bulletin of big toe science.','The UK journal of dermatology.'],
        'ISSN_1': ['0096-0225', '0567-8315','0007-4977','0007-0963'],
        'ISSN_2': ['0096-0225','nan','0007-4977','0007-0963'],
        'ISSN_3': ['nan','1820-7448','nan','0366-077X'],
        'ISSN_4': ['nan','0567-8315','nan','1365-2133']
        }
        )

#put 'Journal' first, followed by the ISSN columns, whatever order the dict produced
df1 = df1[['Journal'] + [c for c in df1.columns if c != 'Journal']]

df2 = pd.DataFrame(
    {
    'Full Journal Title': ['Drug standards.','Acta veterinaria.','Bulletin of marine science.','The British journal of dermatology.'],
    'Abbreviated Title': ['DStan','Avet','Marsci','BritSkin'],
    'Total Cites': ['223','444','324','166'],
    'ISSN': ['0096-0225','0567-8315','0007-4977','0007-0963']                           
     })

#this makes list of ISSNs from df1 to combine into a column to add to df2
xx=df1.set_index('Journal').values.tolist() 
df2['New']=df2.ISSN.apply(lambda x : [y for y in xx if x in y] )
df2=df2[df2.New.apply(len)>0]
df2['New']=df2.New.apply(pd.Series)[0].apply(lambda x : ','.join(x))

I have tried a replace: df2 = df2.replace(np.nan, '', regex=True)

I have tried dropna: print(df2.dropna(subset=['New']))

I have tried fillna: print(df2.fillna(''))

I have tried a list comprehension: xx = [value for value in xx if str(value) != 'nan']

No matter what I try, the "New" column is still full of nans.

0                0096-0225,0096-0225,nan,nan
1          0567-8315,nan,1820-7448,0567-8315
2                0007-4977,0007-4977,nan,nan
3    0007-0963,0007-0963,0366-077X,1365-2133

I want them skipped or dropped. I only want the valid ISSNs.

Thanks in advance for your help.

Recommended answer

There are a few things going on here. The first is that the question shows the string 'nan' in the dataframe, whereas the comment suggests these should actually be real nan values (string versus null).

The second is that you are storing lists, and then strings built from those lists, in a dataframe. This is typically discouraged for precisely the reason you are running into: it often leads to unexpected behavior.

I will address the question as it was posed, although you should be able to adapt this to real nans as well.

The code causing the problem is:

xx=df1.set_index('Journal').values.tolist() 
df2['New']=df2.ISSN.apply(lambda x : [y for y in xx if x in y] )
df2=df2[df2.New.apply(len)>0]
df2['New']=df2.New.apply(pd.Series)[0].apply(lambda x : ','.join(x))

The second line here puts every value from the matching row of xx, including the 'nan' strings, into df2['New']; the subsequent lines then turn these into a list and finally into a single string. Once those values exist inside a string or list, you are not going to be able to access them with normal pandas methods.
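To make that concrete, here is a small sketch (the three-element Series is made up for illustration and is not part of the question's data). It suggests why dropna and fillna appear to do nothing: they act only on cells that are real nulls, not on the text 'nan' or on a longer string that merely contains it.

```python
import numpy as np
import pandas as pd

# Hypothetical column: a real null, the text 'nan', and a joined string.
s = pd.Series([np.nan, 'nan', '0096-0225,nan,nan'])

is_null = s.isna().tolist()      # only the first cell counts as missing
dropped = s.dropna().tolist()    # both strings survive
filled = s.fillna('').tolist()   # only the first cell is replaced

print(is_null)
print(dropped)
print(filled)
```

In the question's case every cell in the "New" column is a non-null string, so from pandas' point of view there is nothing to drop or fill.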

My suggestion would be to remove them from xx, and then they won't show up in df2 at all:

xx=df1.set_index('Journal').values.tolist()
#get rid of nans here
xx=[[y for y in x if y != 'nan'] for x in xx]
df2['New']=df2.ISSN.apply(lambda x : [y for y in xx if x in y] )
df2=df2[df2.New.apply(len)>0]
df2['New']=df2.New.apply(pd.Series)[0].apply(lambda x : ','.join(x))

Note that the line under the comment removes the 'nan's at a time when they are easily accessible.

This should get you what you need, though once again I would caution against storing lists in dataframes if possible, and be sure to use nan and not 'nan'. Hope this helps!
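For the "use nan and not 'nan'" advice, one possible adaptation is sketched below. It is not the approach from the answer above: it swaps in a different technique (melt plus groupby) and uses a trimmed two-row copy of the sample data. The names long, joined and match are made up for this example. Like the original code, it does not remove repeated ISSNs.

```python
import numpy as np
import pandas as pd

# Trimmed copy of the sample data (assumption: two journals, three ISSN columns).
df1 = pd.DataFrame({
    'Journal': ['US Drug standards.', 'Acta veterinariae.'],
    'ISSN_1': ['0096-0225', '0567-8315'],
    'ISSN_2': ['0096-0225', 'nan'],
    'ISSN_3': ['nan', '1820-7448'],
})
df2 = pd.DataFrame({'ISSN': ['0096-0225', '0567-8315']})

# Turn the 'nan' strings into real nulls, reshape to one ISSN per row,
# and drop the nulls while each value still sits in its own cell.
long = (df1.replace('nan', np.nan)
           .melt(id_vars='Journal', value_name='ISSN')
           .dropna(subset=['ISSN']))

# One comma-joined string of valid ISSNs per journal.
joined = long.groupby('Journal')['ISSN'].agg(','.join)

# Map each ISSN to its journal, then each journal to its joined string.
match = long.drop_duplicates('ISSN').set_index('ISSN')['Journal']
df2['New'] = df2['ISSN'].map(match).map(joined)

print(df2)
```

Because the nulls are dropped before anything is joined, no list is ever stored in a dataframe cell, and dropna does the work it was expected to do in the question.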
