Df groupby set comparison
Problem description
I have a list of words that I want to test for anagrams. I want to use pandas so that I don't have to use computationally wasteful for loops. Given a .txt list of words, say:
"acb" "bca" "foo" "oof" "spaniel"
I want to put them in a DataFrame and then group them by their lists of anagrams. I can remove duplicate rows later.
So far I have this code:
import pandas as pd
wordlist = pd.read_csv('data/example.txt', sep='\r', header=None, index_col=None, names=['word'])
wordlist = wordlist.drop_duplicates(keep='first')
wordlist['split'] = ''
wordlist['anagrams'] = ''
for index, row in wordlist.iterrows():
    row['split'] = list(row['word'])
wordlist = wordlist.groupby('word')[('split')].apply(list)
print(wordlist)
How do I group by a set so that it knows that
[[a, b, c]]
[[b, a, c]]
are the same?
Recommended answer
I think you can use sorted lists:
df['a'] = df['word'].apply(lambda x: sorted(list(x)))
print (df)
      word                      a
0      acb              [a, b, c]
1      bca              [a, b, c]
2      foo              [f, o, o]
3      oof              [f, o, o]
4  spaniel  [a, e, i, l, n, p, s]
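The sorted lists make anagrams compare equal, but a column of Python lists generally cannot be used directly as a groupby key because lists are not hashable. A minimal sketch of one way to finish the grouping, assuming the sorted letters are joined back into a string; the column names `key` and `anagrams` and the inline sample data are chosen here for illustration and are not from the original answer:

```python
import pandas as pd

# Sample words from the question, built inline instead of read from a file
df = pd.DataFrame({"word": ["acb", "bca", "foo", "oof", "spaniel"]})

# Hashable anagram key: the word's letters, sorted and joined into a string
df["key"] = df["word"].apply(lambda w: "".join(sorted(w)))

# One list of words per key
groups = df.groupby("key")["word"].apply(list)

# Attach each word's anagram group to its row (hypothetical column name)
df["anagrams"] = df["key"].map(groups)

print(groups)
print(df)
```

A tuple of the sorted letters would work as a key as well; a joined string is used here only because it prints compactly.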
Another solution, which finds only words whose exact reverse is also in the list (as is the case for every anagram pair in this sample):
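# reverse each word
df['reversed'] = df['word'].str[::-1]
# stack the columns into one Series indexed by (row, column name)
s = df.stack()
# keep every value that occurs more than once, i.e. words whose reverse is also present
s1 = s[s.duplicated(keep=False)]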
print (s1)
0  word        acb
   reversed    bca
1  word        bca
   reversed    acb
2  word        foo
   reversed    oof
3  word        oof
   reversed    foo
dtype: object
# to select only the values under the second-level label 'word'
s2 = s1.loc[pd.IndexSlice[:, 'word']]
print (s2)
0 acb
1 bca
2 foo
3 oof
dtype: object
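Reversing a string only matches words that are mirror images of each other, so the same duplicate check may be more general when it is keyed on sorted letters instead. A minimal sketch under that assumption; the extra word `cab` is added here for illustration and is not in the original data, and the variable names are hypothetical:

```python
import pandas as pd

# "cab" is an anagram of "acb" and "bca", but its reverse ("bac") is not in the list
df = pd.DataFrame({"word": ["acb", "bca", "cab", "foo", "oof", "spaniel"]})

# Anagram key: sorted letters joined into a string
key = df["word"].apply(lambda w: "".join(sorted(w)))

# True for every word whose key is shared with at least one other word
has_anagram = key.duplicated(keep=False)

# Words that have at least one anagram in the list
anagrams = df.loc[has_anagram, "word"]
print(anagrams)
```

Here `spaniel` is dropped because no other word shares its key, while `cab` is kept even though its reverse never appears in the list.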