Add ID found in list to new column in pandas dataframe
Question
Say I have the following dataframe (a column of integers and a column with a list of integers)...
      ID                   Found_IDs
0  12345        [15443, 15533, 3433]
1  15533  [2234, 16608, 12002, 7654]
2   6789      [43322, 876544, 36789]
And also a separate list of IDs...
bad_ids = [15533, 876544, 36789, 11111]
Given that, and ignoring the df['ID'] column and any index, I want to see if any of the IDs in the bad_ids list are mentioned in the df['Found_IDs'] column. The code I have so far is:
df['bad_id'] = [c in l for c, l in zip(bad_ids, df['Found_IDs'])]
This works, but only if the bad_ids list is longer than the dataframe, and for the real dataset the bad_ids list is going to be a lot shorter than the dataframe. If I set the bad_ids list to only two elements...
bad_ids = [15533, 876544]
I get a very popular error (I have read many questions with the same error)...
ValueError: Length of values does not match length of index
I have tried converting the list to a Series (no change in the error). I have also tried adding the new column and setting all values to False before doing the comprehension line (again no change in the error).
Two questions:
- How do I get my code (below) to work for a list that is shorter than a dataframe?
- How would I get the code to write the actual ID found back to the df['bad_id'] column (more useful than True/False)?
Expected output for bad_ids = [15533, 876544]:
      ID                   Found_IDs  bad_id
0  12345        [15443, 15533, 3433]    True
1  15533  [2234, 16608, 12002, 7654]   False
2   6789      [43322, 876544, 36789]    True
Ideal output for bad_ids = [15533, 876544] (ID(s) are written to a new column or columns):
      ID                   Found_IDs  bad_id
0  12345        [15443, 15533, 3433]   15533
1  15533  [2234, 16608, 12002, 7654]   False
2   6789      [43322, 876544, 36789]  876544
Code:
import pandas as pd
result_list = [[12345, [15443, 15533, 3433]],
               [15533, [2234, 16608, 12002, 7654]],
               [6789, [43322, 876544, 36789]]]
df = pd.DataFrame(result_list,columns=['ID','Found_IDs'])
# works if list has four elements
# bad_ids = [15533, 876544, 36789, 11111]
# fails if list has two elements (fewer elements than the dataframe has rows)
# ValueError: Length of values does not match length of index
bad_ids = [15533, 876544]
# converting to Series doesn't change things
# bad_ids = pd.Series(bad_ids)
# print(type(bad_ids))
# setting up a new column of false values doesn't change things
# df['bad_id'] = False
print(df)
df['bad_id'] = [c in l for c, l in zip(bad_ids, df['Found_IDs'])]
print(bad_ids)
print(df)
Recommended answer
Using np.intersect1d to get the intersection of the two lists:
import numpy as np
df['bad_id'] = df['Found_IDs'].apply(lambda x: np.intersect1d(x, bad_ids))
      ID                   Found_IDs    bad_id
0  12345        [15443, 15533, 3433]   [15533]
1  15533  [2234, 16608, 12002, 7654]        []
2   6789      [43322, 876544, 36789]  [876544]
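For anyone who wants to run the np.intersect1d approach end to end, here is a minimal sketch that rebuilds the dataframe from the question. The has_bad_id column name is only an illustrative assumption (it is not in the question); it shows one way the True/False output could be derived from the same result.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[12345, [15443, 15533, 3433]],
     [15533, [2234, 16608, 12002, 7654]],
     [6789, [43322, 876544, 36789]]],
    columns=["ID", "Found_IDs"],
)
bad_ids = [15533, 876544]

# Each row's list is compared against the whole bad_ids list, so the
# length of bad_ids does not have to match the number of rows.
matches = df["Found_IDs"].apply(
    lambda ids: np.intersect1d(ids, bad_ids).tolist()
)

df["bad_id"] = matches
# Assumed extra column: True where at least one bad ID was found.
df["has_bad_id"] = matches.apply(len) > 0

print(df)
```

Note that np.intersect1d returns sorted, unique values, so the order of IDs in a row's result may differ from their order in Found_IDs.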
Or with just vanilla Python, using set intersection:
bad_ids_set = set(bad_ids)
df['bad_id'] = df['Found_IDs'].apply(lambda x: list(set(x) & bad_ids_set))
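As for why the original comprehension raised the ValueError: zip() stops at the shorter of its inputs, so with two bad IDs and three rows it produces only two values for a three-row column. The sketch below checks that, then uses the set-based idea to produce the "ideal output" from the question (a single ID, or False when nothing matches). The first_bad_id helper is a hypothetical name introduced here for illustration, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame(
    [[12345, [15443, 15533, 3433]],
     [15533, [2234, 16608, 12002, 7654]],
     [6789, [43322, 876544, 36789]]],
    columns=["ID", "Found_IDs"],
)
bad_ids = [15533, 876544]

# zip() truncates to the shorter input: only two values come out,
# which cannot be assigned to a three-row column.
paired = [c in l for c, l in zip(bad_ids, df["Found_IDs"])]
assert len(paired) == 2 and len(df) == 3

bad_ids_set = set(bad_ids)  # set membership tests are fast


def first_bad_id(found_ids):
    """Hypothetical helper: first bad ID in the row, or False if none."""
    for value in found_ids:
        if value in bad_ids_set:
            return value
    return False


df["bad_id"] = df["Found_IDs"].apply(first_bad_id)
print(df)
```

This keeps only the first match per row. If a row can contain several bad IDs, keeping the list from the set intersection is probably the safer choice, and it also avoids mixing integers and False in one column.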