Use dictionary to replace a string within a string in Pandas columns

Views: 34 Published: 2021/12/25 9:17:09 Tags: python pandas dictionary dataframe replace


Problem description

I am trying to use a dictionary to replace strings in a pandas column: wherever a key appears, it should be replaced with its value. However, the column contains sentences. Therefore, I must first tokenize each sentence, detect whether a word in the sentence corresponds to a key in my dictionary, and then replace that word with the corresponding value.
However, the result I keep getting is None. Is there a better, more Pythonic way to approach this problem?
Here is my minimal example so far. In the comments, I point out where the issue occurs.
import pandas as pd

data = {'Categories': ['animal','plant','object'],
        'Type': ['tree','dog','rock'],
        'Comment': ['The NYC tree is very big','The cat from the UK is small','The rock was found in LA.']
}

ids = {'Id':['NYC','LA','UK'],
       'City':['New York City','Los Angeles','United Kingdom']}


df = pd.DataFrame(data)
ids = pd.DataFrame(ids)

def col2dict(ids):
    data = ids[['Id', 'City']]
    idDict = data.set_index('Id').to_dict()['City']
    return idDict

def replaceIds(data,idDict):
    ids = idDict.keys()
    types = idDict.values()
    data['commentTest'] = data['Comment']
    words = data['commentTest'].apply(lambda x: x.split())
    for (i,word) in enumerate(words):
        #Here we can see that the words appear
        print word
        print ids
        if word in ids:
        #Here we can see that they are not being recognized. What happened?
            print ids
            print word
            words[i] = idDict[word]
            data['commentTest'] = ' '.apply(lambda x: ''.join(x))
    return data

idDict = col2dict(ids)
results = replaceIds(df, idDict)

Result:
None

I am using Python 2.7, and when I print the dict, the strings are shown with the u' Unicode prefix.
My expected result is:
  Categories                       Comment  Type                               commentTest
0     animal      The NYC tree is very big  tree        The New York City tree is very big
1      plant  The cat from the UK is small   dog  The cat from the United Kingdom is small
2     object     The rock was found in LA.  rock        The rock was found in Los Angeles.


Recommended answer
You can create a dictionary and then use replace:
ids = {'Id':['NYC','LA','UK'],
      'City':['New York City','Los Angeles','United Kingdom']}

ids = dict(zip(ids['Id'], ids['City']))
print (ids)
{'UK': 'United Kingdom', 'LA': 'Los Angeles', 'NYC': 'New York City'}

df['commentTest'] = df['Comment'].replace(ids, regex=True)
print (df)
  Categories                       Comment  Type  
0     animal      The NYC tree is very big  tree   
1      plant  The cat from the UK is small   dog   
2     object     The rock was found in LA.  rock   

                                commentTest  
0        The New York City tree is very big  
1  The cat from the United Kingdom is small  
2        The rock was found in Los Angeles.  
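One caveat with `replace(ids, regex=True)` is that each key is treated as a regular expression and matched anywhere in the string, so a key such as `LA` would also match inside a longer word. The following is a minimal sketch, not part of the original answer, that restricts matches to whole words by combining the keys into one pattern with `\b` word boundaries. The extra row `'LATER is not a city'` and the name `pattern` are assumptions added for illustration.

```python
import re
import pandas as pd

# Same comments as in the question, plus one hypothetical row
# ('LATER is not a city') added to show the substring problem.
df = pd.DataFrame({'Comment': ['The NYC tree is very big',
                               'The cat from the UK is small',
                               'The rock was found in LA.',
                               'LATER is not a city']})

ids = {'NYC': 'New York City', 'LA': 'Los Angeles', 'UK': 'United Kingdom'}

# Build a single alternation of all keys. re.escape protects keys that
# contain regex metacharacters, longer keys are listed first, and \b
# keeps a key from matching inside a longer word.
keys = sorted(ids, key=len, reverse=True)
pattern = re.compile(r'\b(?:' + '|'.join(re.escape(k) for k in keys) + r')\b')

# The callable looks up the replacement for whichever key matched.
df['commentTest'] = df['Comment'].str.replace(
    pattern, lambda m: ids[m.group(0)], regex=True)

# For comparison: the plain dictionary replace also rewrites 'LA' in 'LATER'.
df['plainReplace'] = df['Comment'].replace(ids, regex=True)

print(df[['commentTest', 'plainReplace']])
```

Whether the word-boundary version is preferable depends on the data: if the keys can only ever appear as standalone tokens, the shorter `replace(ids, regex=True)` form is sufficient.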

