Joining a list of tuples within a pandas dataframe
Question
I want to join a list of tuples within a dataframe.
I have tried several ways of doing this within the dataframe, with join and with lambda:
import pandas as pd
from nltk import word_tokenize, pos_tag, pos_tag_sents

data = {'Categories': ['animal', 'plant', 'object'],
        'Type': ['tree', 'dog', 'rock'],
        'Comment': ['The NYC tree is very big', 'NY The cat from the UK is small',
                    'The rock was found in LA.']}

def posTag(data):
    data = pd.DataFrame(data)
    comments = data['Comment'].tolist()
    # One list of (word, tag) tuples per comment
    taggedComments = pos_tag_sents(map(word_tokenize, comments))
    data['taggedComment'] = taggedComments
    print(data['taggedComment'])
    # This is the line that raises the TypeError shown below
    data['taggedComment'].apply(lambda x: (' '.join(x)))
    return data

taggedData = posTag(data)
print(data)
Some other methods of tuple joining that I have tried are:
(' '.join(['_'.join(x) for x in data['taggedComment']]))
[''.join(x) for x in data['taggedComment']]
['_'.join(str(x)) for x in data['taggedComment']]
No matter what I do, I arrive at the same error:
TypeError: sequence item 0: expected string, tuple found
My desired result is for each list in the dataframe, such as
[('A', 'B'), ('B', 'C'), ('C', 'B')]
to be written to outPutFile as
'A_B B_C C_B'
Any suggestions as to what is going wrong?
Recommended answer
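You can use a double list comprehension and assign the output back to the column. Each value in taggedComment is a list of (word, tag) tuples, so ' '.join(x) is handed tuples rather than strings, which is what raises the TypeError; the result of apply is also never assigned back to the column.
So instead of: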
data['taggedComment'].apply(lambda x: (' '.join(x)))
use the following in your posTag(data) method:
data['taggedComment'] = [' '.join(['_'.join(y) for y in x]) for x in data['taggedComment']]
taggedData = posTag(data)
print (taggedData)
Categories Comment Type \
0 animal The NYC tree is very big tree
1 plant NY The cat from the UK is small dog
2 object The rock was found in LA. rock
taggedComment
0 The_DT NYC_NNP tree_NN is_VBZ very_RB big_JJ
1 NY_NNP The_DT cat_NN from_IN the_DT UK_NNP is_...
2 The_DT rock_NN was_VBD found_VBN in_IN LA_NNP ._.
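To see why the original join fails and what the nested comprehension does, here is a minimal sketch in plain Python, using the sample list from the question. The helper name join_tagged is hypothetical and introduced only for illustration; it is not part of the original code.

```python
# Hypothetical helper name; not part of the original question or answer.
def join_tagged(tagged):
    """Turn [('A', 'B'), ('B', 'C')] into 'A_B B_C'."""
    # Inner step: each (word, tag) tuple becomes 'word_tag'.
    # Outer step: the resulting strings are joined with spaces.
    return ' '.join('_'.join(pair) for pair in tagged)

tagged = [('A', 'B'), ('B', 'C'), ('C', 'B')]

# Joining the list directly fails because its items are tuples, not strings.
try:
    ' '.join(tagged)
    failed = False
except TypeError:
    failed = True

result = join_tagged(tagged)
print(failed)   # True
print(result)   # A_B B_C C_B
```

The inner join handles one tuple and the outer join handles one list, which is why two levels of joining are needed.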
All together:
def posTag(data):
    data = pd.DataFrame(data)
    comments = data['Comment'].tolist()
    print(pos_tag_sents(map(word_tokenize, comments)))
    taggedComments = pos_tag_sents(map(word_tokenize, comments))
    data['taggedComment'] = [' '.join(['_'.join(y) for y in x]) for x in taggedComments]
    return data

taggedData = posTag(data)
print(taggedData)
Categories Comment Type \
0 animal The NYC tree is very big tree
1 plant NY The cat from the UK is small dog
2 object The rock was found in LA. rock
taggedComment
0 The_DT NYC_NNP tree_NN is_VBZ very_RB big_JJ
1 NY_NNP The_DT cat_NN from_IN the_DT UK_NNP is_...
2 The_DT rock_NN was_VBD found_VBN in_IN LA_NNP ._.
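If you prefer to stay with apply, as in the question, the same joining logic should also work there, provided the result is assigned back to the column. The following is only a sketch under that assumption: the tagged pairs are written by hand to stand in for pos_tag_sents output, so it runs without NLTK, and the names df and joined are illustrative.

```python
import pandas as pd

# Hand-written (word, tag) pairs standing in for pos_tag_sents output,
# so the sketch runs without NLTK or its data files.
df = pd.DataFrame({
    'taggedComment': [
        [('The', 'DT'), ('rock', 'NN')],
        [('A', 'B'), ('B', 'C'), ('C', 'B')],
    ]
})

# apply returns a new Series; assigning it back is what updates the column.
df['taggedComment'] = df['taggedComment'].apply(
    lambda pairs: ' '.join('_'.join(pair) for pair in pairs)
)

joined = df['taggedComment'].tolist()
print(joined)
```

Both forms do the same joining; the list comprehension in the answer simply skips the per-row function call that apply makes.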