Use dictionary to replace a string within a string in Pandas columns
Problem description
I am trying to use the keys of a dictionary to replace strings in a pandas column with the corresponding values. However, the column contains full sentences. Therefore, I must first tokenize the sentences, detect whether a word in a sentence matches a key in my dictionary, and then replace that word with the corresponding value.
However, the result I keep getting is None. Is there a more Pythonic way to approach this problem?
Here is my minimal example (MCVE) so far. In the comments, I have marked where the issue occurs.
import pandas as pd

data = {'Categories': ['animal','plant','object'],
        'Type': ['tree','dog','rock'],
        'Comment': ['The NYC tree is very big','The cat from the UK is small','The rock was found in LA.']
       }
ids = {'Id':['NYC','LA','UK'],
       'City':['New York City','Los Angeles','United Kingdom']}
df = pd.DataFrame(data)
ids = pd.DataFrame(ids)

def col2dict(ids):
    data = ids[['Id', 'City']]
    idDict = data.set_index('Id').to_dict()['City']
    return idDict

def replaceIds(data,idDict):
    ids = idDict.keys()
    types = idDict.values()
    data['commentTest'] = data['Comment']
    words = data['commentTest'].apply(lambda x: x.split())
    for (i,word) in enumerate(words):
        #Here we can see that the words appear
        print word
        print ids
        if word in ids:
            #Here we can see that they are not being recognized. What happened?
            print ids
            print word
            words[i] = idDict[word]
    data['commentTest'] = ' '.apply(lambda x: ''.join(x))
    return data

idDict = col2dict(ids)
results = replaceIds(df, idDict)
Result:
None
I am using Python 2.7, and when I print out the dict, the strings carry the u' Unicode prefix.
My expected result is:
  Categories                       Comment  Type                               commentTest
0     animal      The NYC tree is very big  tree        The New York City tree is very big
1      plant  The cat from the UK is small   dog  The cat from the United Kingdom is small
2     object     The rock was found in LA.  rock        The rock was found in Los Angeles.
Recommended answer
You can create a dictionary and then use replace:
ids = {'Id':['NYC','LA','UK'],
       'City':['New York City','Los Angeles','United Kingdom']}

# Build a plain {Id: City} mapping
ids = dict(zip(ids['Id'], ids['City']))
print (ids)
{'UK': 'United Kingdom', 'LA': 'Los Angeles', 'NYC': 'New York City'}

# With regex=True each key is treated as a pattern and replaced wherever it occurs in the string
df['commentTest'] = df['Comment'].replace(ids, regex=True)
print (df)
  Categories                       Comment  Type
0     animal      The NYC tree is very big  tree
1      plant  The cat from the UK is small   dog
2     object     The rock was found in LA.  rock

                                commentTest
0        The New York City tree is very big
1  The cat from the United Kingdom is small
2        The rock was found in Los Angeles.
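
One caveat: with regex=True the keys are matched as substrings, so a key such as LA would also be replaced inside a longer word like LAX. If whole-word matching is wanted, as the question's tokenizing approach suggests, one option is to build a single pattern with word boundaries. The following is only a sketch under that assumption; the extra fourth comment and the names keys and pattern are made up for illustration and are not part of the original post. (In the question's loop, each word is the whole list of tokens for a row rather than a single token, which is why word in ids was never true.)

```python
import re
import pandas as pd

# Sample data from the question, plus one hypothetical row ('LAX ...')
# added here only to show the whole-word behaviour.
df = pd.DataFrame({'Comment': ['The NYC tree is very big',
                               'The cat from the UK is small',
                               'The rock was found in LA.',
                               'LAX is not in the UK']})
ids = {'NYC': 'New York City', 'LA': 'Los Angeles', 'UK': 'United Kingdom'}

# Escape each key and try longer keys first, then wrap the alternation in
# \b word boundaries so that only whole words are matched.
keys = sorted(ids, key=len, reverse=True)
pattern = re.compile(r'\b(?:' + '|'.join(re.escape(k) for k in keys) + r')\b')

# Replace every matched key with its dictionary value, row by row.
df['commentTest'] = df['Comment'].apply(
    lambda s: pattern.sub(lambda m: ids[m.group(0)], s))

print(df['commentTest'].tolist())
```

Here LA. is still replaced because the period is not a word character, while LAX is left untouched.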