Add ID found in list to new column in pandas dataframe

Views: 41 Posted: 2020/10/17 0:53:38 Tags: python python-3.x pandas dataframe


Problem description

Say I have the following dataframe (a column of integers and a column of lists of integers):

      ID                   Found_IDs
0  12345        [15443, 15533, 3433]
1  15533  [2234, 16608, 12002, 7654]
2   6789      [43322, 876544, 36789]

And also a separate list of IDs:

bad_ids = [15533, 876544, 36789, 11111]

Given that, and ignoring the df['ID'] column and any index, I want to see whether any of the IDs in the bad_ids list appear in the df['Found_IDs'] column. The code I have so far is:
df['bad_id'] = [c in l for c, l in zip(bad_ids, df['Found_IDs'])]

This works, but only if the bad_ids list is at least as long as the dataframe, and for the real dataset the bad_ids list is going to be a lot shorter than the dataframe. If I set the bad_ids list to only two elements...
bad_ids = [15533, 876544]

I get a very common error (I have read many questions about the same error)...
ValueError: Length of values does not match length of index

I have tried converting the list to a Series (no change in the error). I have also tried adding the new column and setting all its values to False before running the comprehension line (again, no change in the error).
Two questions:

1. How do I get my code (below) to work for a list that is shorter than the dataframe?
2. How would I get the code to write the actual ID found back to the df['bad_id'] column (more useful than True/False)?

Expected output for bad_ids = [15533, 876544]:
      ID                   Found_IDs  bad_id
0  12345        [15443, 15533, 3433]    True
1  15533  [2234, 16608, 12002, 7654]   False
2   6789      [43322, 876544, 36789]    True

Ideal output for bad_ids = [15533, 876544] (the ID(s) are written to a new column or columns):
      ID                   Found_IDs  bad_id
0  12345        [15443, 15533, 3433]   15533
1  15533  [2234, 16608, 12002, 7654]   False
2   6789      [43322, 876544, 36789]  876544

Code:
import pandas as pd

result_list = [[12345,[15443,15533,3433]],
        [15533,[2234,16608,12002,7654]],
        [6789,[43322,876544,36789]]]

df = pd.DataFrame(result_list,columns=['ID','Found_IDs'])

# works if list has four elements
# bad_ids = [15533, 876544, 36789, 11111]

# fails if list has two elements (fewer elements than the dataframe)
# ValueError: Length of values does not match length of index
bad_ids = [15533, 876544]

# converting to Series doesn't change things
# bad_ids = pd.Series(bad_ids)
# print(type(bad_ids))

# setting up a new column of false values doesn't change things
# df['bad_id'] = False

print(df)

df['bad_id'] = [c in l for c, l in zip(bad_ids, df['Found_IDs'])]

print(bad_ids)

print(df)

 
 
Recommended answer
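As background on the error itself, here is a minimal sketch that rebuilds the question's data and isolates the failing step. zip stops at its shorter input, so two bad IDs paired with three rows produce only two values, and assigning those to a three-row column raises the ValueError. The names paired, reordered, positional and error_message are illustrative only and do not come from the question.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [12345, 15533, 6789],
    "Found_IDs": [
        [15443, 15533, 3433],
        [2234, 16608, 12002, 7654],
        [43322, 876544, 36789],
    ],
})
bad_ids = [15533, 876544]

# zip stops at its shorter input: 2 bad IDs against 3 rows gives 2 values.
paired = [c in l for c, l in zip(bad_ids, df["Found_IDs"])]
print(len(paired), len(df))

# Assigning 2 values to a 3-row column is what raises the ValueError.
try:
    df["bad_id"] = paired
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# Even with a longer list, row i is only tested against the i-th bad ID,
# so a bad ID sitting at a different position in the list is missed.
reordered = [876544, 15533, 36789, 11111]  # hypothetical reordering
positional = [c in l for c, l in zip(reordered, df["Found_IDs"])]
print(positional)
```

The last part suggests that the four-element list in the question only appears to work because of the order of its elements: each row has to be compared against the whole of bad_ids, not against one element of it.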
Use np.intersect1d to get the intersection of each row's list with bad_ids:
import numpy as np

df['bad_id'] = df['Found_IDs'].apply(lambda x: np.intersect1d(x, bad_ids))

      ID                   Found_IDs    bad_id
0  12345        [15443, 15533, 3433]   [15533]
1  15533  [2234, 16608, 12002, 7654]        []
2   6789      [43322, 876544, 36789]  [876544]
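For reference, a self-contained sketch of the same approach, rebuilt from the question's data. np.intersect1d returns a sorted array of the unique values common to both inputs, so each cell holds a NumPy array rather than a list. The extra has_bad_id column is an assumption added here to show how the True/False flag from the first question could be derived; it is not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "ID": [12345, 15533, 6789],
    "Found_IDs": [
        [15443, 15533, 3433],
        [2234, 16608, 12002, 7654],
        [43322, 876544, 36789],
    ],
})
bad_ids = [15533, 876544]

# Each row's list is intersected with the full bad_ids list,
# so the length of bad_ids no longer has to match the dataframe.
df["bad_id"] = df["Found_IDs"].apply(lambda x: np.intersect1d(x, bad_ids))

# Hypothetical extra column: a row is flagged when its intersection is non-empty.
df["has_bad_id"] = df["bad_id"].apply(len) > 0

print(df)
```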

Or with plain Python, using set intersection:

bad_ids_set = set(bad_ids)
df['bad_id'] = df['Found_IDs'].apply(lambda x: list(set(x) & bad_ids_set))
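Both variants above store a collection in every row. To get closer to the ideal output in the question (the bare ID, or False when nothing matches), one possible sketch is shown below. The helper name matched_ids and its rule for rows with several matches (return them all as a list) are assumptions, not something the question or answer specifies.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [12345, 15533, 6789],
    "Found_IDs": [
        [15443, 15533, 3433],
        [2234, 16608, 12002, 7654],
        [43322, 876544, 36789],
    ],
})
bad_ids = [15533, 876544]
bad_ids_set = set(bad_ids)  # set membership tests are fast


def matched_ids(found):
    """Return the bad ID(s) present in `found`, or False when there are none."""
    hits = [i for i in found if i in bad_ids_set]  # keeps the row's own order
    if not hits:
        return False
    # Assumed rule: a single match is unwrapped, several matches stay a list.
    return hits[0] if len(hits) == 1 else hits


df["bad_id"] = df["Found_IDs"].apply(matched_ids)
print(df)
```

Mixing integers, False and possibly lists gives the column an object dtype, which is fine for inspection but awkward for later numeric work; keeping the list-valued column may be the safer choice if the result feeds further processing.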
