Joining a list of tuples within a pandas dataframe

Problem description

I want to join a list of tuples within a dataframe. I have tried several ways of doing this within the dataframe, using join and a lambda:

import pandas as pd
from nltk import word_tokenize, pos_tag, pos_tag_sents

data = {'Categories': ['animal', 'plant', 'object'],
        'Type': ['tree', 'dog', 'rock'],
        'Comment': ['The NYC tree is very big', 'NY The cat from the UK is small',
                    'The rock was found in LA.']}
def posTag(data):
    data = pd.DataFrame(data)
    comments = data['Comment'].tolist()
    taggedComments = pos_tag_sents(map(word_tokenize,comments))
    data['taggedComment'] = taggedComments
    print(data['taggedComment'])
    data['taggedComment'].apply(lambda x: (' '.join(x)))
    return data
taggedData = posTag(data)
print(taggedData)

Some other methods of tuple joining that I have tried are:

(' '.join(['_'.join(x) for x in data['taggedComment']]))
[''.join(x) for x in data['taggedComment']]
['_'.join(str(x)) for x in data['taggedComment']]

No matter what I do, I arrive at the same error:

TypeError: sequence item 0: expected string, tuple found
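A short sketch of where the error comes from, assuming a single hard-coded tagged comment in place of the real pos_tag_sents output. str.join only accepts an iterable of strings, and each item of a tagged comment is a (word, tag) tuple, so the first two attempts fail on the very first item. The third attempt does not raise at all: str() turns the whole list into one string, and '_'.join then puts '_' between its individual characters.

```python
# One tagged comment in the shape pos_tag_sents returns: a list of (word, tag) tuples.
tagged = [('The', 'DT'), ('rock', 'NN')]

# First two attempts: str.join needs str items, but item 0 is a tuple.
try:
    ' '.join(tagged)
except TypeError as exc:
    error_message = str(exc)
print(error_message)  # sequence item 0: expected str instance, tuple found

# Third attempt: str() flattens the whole list into one string first,
# so '_'.join inserts '_' between every single character of that string.
garbled = '_'.join(str(tagged))
print(garbled[:12])  # [_(_'_T_h_e_
```

Under Python 2 the same failure is reported as "expected string, tuple found", as in the question.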

My desired result: each list such as

[('A', 'B'),  ('B', 'C'),  ('C', 'B')]

in the dataframe should become the following string in outPutFile:

'A_B B_C C_B'

Any suggestions as to what is going wrong?

Recommended answer

You can use a nested (double) list comprehension and assign the output back to the column.

So instead of:

data['taggedComment'].apply(lambda x: (' '.join(x)))

use the following in your posTag(data) function:

data['taggedComment'] = [' '.join(['_'.join(y) for y in x]) for x in data['taggedComment']] 


taggedData = posTag(data)
print (taggedData)
  Categories                          Comment  Type  \
0     animal         The NYC tree is very big  tree   
1      plant  NY The cat from the UK is small   dog   
2     object        The rock was found in LA.  rock   

                                       taggedComment  
0       The_DT NYC_NNP tree_NN is_VBZ very_RB big_JJ  
1  NY_NNP The_DT cat_NN from_IN the_DT UK_NNP is_...  
2  The_DT rock_NN was_VBD found_VBN in_IN LA_NNP ._. 
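The apply call in the question had two problems: the lambda joined the tuples directly, and its result was discarded, because Series.apply returns a new Series rather than modifying the column in place. A minimal sketch of an apply-based equivalent of the list comprehension, assuming a hard-coded list stands in for the pos_tag_sents output (so it runs without NLTK or its tagger models); join_tags is a hypothetical helper name, not part of pandas or NLTK:

```python
import pandas as pd

# Hypothetical stand-in for pos_tag_sents output: one list of (word, tag) tuples per row.
tagged = [
    [('The', 'DT'), ('rock', 'NN')],
    [('A', 'B'), ('B', 'C'), ('C', 'B')],
]
df = pd.DataFrame({'taggedComment': tagged})

def join_tags(sentence):
    """Turn [(word, tag), ...] into 'word_tag word_tag ...'."""
    return ' '.join('_'.join(pair) for pair in sentence)

# The list-comprehension version from the answer, for comparison.
via_comprehension = [' '.join(['_'.join(y) for y in x]) for x in df['taggedComment']]

# apply returns a new Series, so assign it back to the column.
df['taggedComment'] = df['taggedComment'].apply(join_tags)
print(df['taggedComment'].tolist())  # ['The_DT rock_NN', 'A_B B_C C_B']
```

Both forms produce the same strings; the list comprehension simply avoids the per-row function call that apply makes.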

All together:

def posTag(data):
    data = pd.DataFrame(data)
    comments = data['Comment'].tolist()
    # Tag each tokenized comment: one list of (word, tag) tuples per comment.
    taggedComments = pos_tag_sents(map(word_tokenize, comments))
    # Join each (word, tag) tuple with '_', then the resulting tokens with spaces.
    data['taggedComment'] = [' '.join(['_'.join(y) for y in x]) for x in taggedComments]
    return data

taggedData = posTag(data)
print (taggedData)

  Categories                          Comment  Type  \
0     animal         The NYC tree is very big  tree   
1      plant  NY The cat from the UK is small   dog   
2     object        The rock was found in LA.  rock   

                                       taggedComment  
0       The_DT NYC_NNP tree_NN is_VBZ very_RB big_JJ  
1  NY_NNP The_DT cat_NN from_IN the_DT UK_NNP is_...  
2  The_DT rock_NN was_VBD found_VBN in_IN LA_NNP ._.
