How to split and categorize values in a column of a pandas DataFrame

Views: 56 Published: 2020/10/15 21:32:05 Tags: python pandas dataframe data-analysis

Question

I have a DataFrame df:

    keys
0   one
1   two,one
2   " "
3   five,one
4   " "
5   two,four
6   four
7   four,five

and two lists:

 actual=["one","two"]
 syn=["four","five"]

I am creating a new column df["val"] that holds the category of each cell in df["keys"]. If any of the keys in a cell is present in actual, I want "actual" in the new column for that row. If none of the keys is present in actual, I want the corresponding df["val"] to be "syn". Cells that contain only whitespace should be left untouched.

The output I want is:

output_df

    keys      val
0   one       actual
1   two,one   actual
2   " "        
3   five,one  actual
4   " "
5   two,four  actual
6   four      syn
7   four,five syn

Please help, thanks!

Recommended answer

Use numpy.select with two conditions, built by intersecting each row's set of keys with actual and with syn:

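import numpy as np

# split each cell on commas into a list of keys
s = df['keys'].str.split(',')
# intersect each row's keys with actual / syn; a non-empty intersection marks a match
m1 = s.apply(set) & set(actual)
m2 = s.apply(set) & set(syn)

# the first matching condition wins, so 'actual' takes priority over 'syn'
df['part'] = np.select([m1, m2], ['actual','syn'], default='')
print(df)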
        keys    part
0        one  actual
1    two,one  actual
2                   
3   five,one  actual
4                   
5   two,four  actual
6       four     syn
7  four,five     syn
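For anyone who wants to run the answer end to end, here is a minimal self-contained sketch. It rebuilds the question's df, actual and syn, and it spells the two masks out as explicit booleans with set.isdisjoint instead of relying on how & is evaluated between a Series of sets and a plain set, which is an assumption on my part about what is safest across pandas versions. The names ks, m1 and m2 are only illustrative.

```python
import numpy as np
import pandas as pd

# Inputs copied from the question
df = pd.DataFrame({"keys": ["one", "two,one", " ", "five,one", " ",
                            "two,four", "four", "four,five"]})
actual = ["one", "two"]
syn = ["four", "five"]

# Split each cell into a list of keys, e.g. "two,one" -> ["two", "one"]
s = df["keys"].str.split(",")

# Explicit boolean masks: True where the row shares at least one key with the list
m1 = s.apply(lambda ks: not set(actual).isdisjoint(ks))
m2 = s.apply(lambda ks: not set(syn).isdisjoint(ks))

# np.select takes the first condition that is True, so "actual" wins over "syn";
# rows matching neither list (the whitespace cells) get the empty default
df["part"] = np.select([m1, m2], ["actual", "syn"], default="")
print(df)
```

Because the conditions are checked in order, a row such as five,one is labelled actual even though it also contains a syn key.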

Timings:

df = pd.concat([df] * 10000, ignore_index=True)


In [143]: %%timeit 
     ...: s = df['keys'].str.split(',')
     ...: m1 = s.apply(set) & set(actual)
     ...: m2 = s.apply(set) & set(syn)
     ...: 
1 loop, best of 3: 160 ms per loop

#cᴏʟᴅsᴘᴇᴇᴅ's solution
In [144]: %%timeit
     ...: v = df['keys'].str.split(',',expand=True)
     ...: m1 = v.isin(["one","two"]).any(axis=1)
     ...: m2 = v.isin(["four","five"]).any(axis=1)
     ...: 
1 loop, best of 3: 193 ms per loop
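The timing above only builds the masks for the second approach. As a rough sketch of how those masks could be turned into the final column, the block below feeds them to np.select in the same way as the first approach; pairing the two is my assumption, not something the timed snippet shows.

```python
import numpy as np
import pandas as pd

# Inputs copied from the question
df = pd.DataFrame({"keys": ["one", "two,one", " ", "five,one", " ",
                            "two,four", "four", "four,five"]})
actual = ["one", "two"]
syn = ["four", "five"]

# expand=True spreads the split keys across columns;
# rows with fewer keys are padded with missing values
v = df["keys"].str.split(",", expand=True)

# isin is evaluated per cell; any(axis=1) collapses each row to one boolean
m1 = v.isin(actual).any(axis=1)
m2 = v.isin(syn).any(axis=1)

# Same selection step as before: first True condition wins, '' otherwise
df["part"] = np.select([m1, m2], ["actual", "syn"], default="")
print(df)
```

The padded cells never match either list, so they do not affect the masks.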

Note:

Performance really depends on the data.
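Since the numbers depend on the data, it may be worth measuring both approaches on your own frame. The following is a small sketch using the standard timeit module outside IPython. The helper names masks_with_sets and masks_with_isin, the repeat factor of 1000 and the number of runs are all arbitrary choices of mine, and the set-based helper uses explicit isdisjoint checks rather than the exact expression timed above.

```python
import timeit

import pandas as pd

# Sample data from the question, repeated to get a larger frame
base = pd.DataFrame({"keys": ["one", "two,one", " ", "five,one", " ",
                              "two,four", "four", "four,five"]})
df_big = pd.concat([base] * 1000, ignore_index=True)
actual = ["one", "two"]
syn = ["four", "five"]


def masks_with_sets(frame):
    """Build both masks by checking each row's keys against the lists."""
    s = frame["keys"].str.split(",")
    m1 = s.apply(lambda ks: not set(actual).isdisjoint(ks))
    m2 = s.apply(lambda ks: not set(syn).isdisjoint(ks))
    return m1, m2


def masks_with_isin(frame):
    """Build both masks from the expanded split using isin."""
    v = frame["keys"].str.split(",", expand=True)
    return v.isin(actual).any(axis=1), v.isin(syn).any(axis=1)


runs = 5
for fn in (masks_with_sets, masks_with_isin):
    seconds = timeit.timeit(lambda: fn(df_big), number=runs)
    print(f"{fn.__name__}: {seconds / runs:.4f} s per call")
```

Swapping df_big for your real DataFrame, or changing the repeat factor, should show how the ranking shifts with row count and keys per cell.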
