Importing a txt file with a dictionary script and applying it to a dataframe to replace words


Problem description

I am trying to replace certain strings within a column in a dataframe using a txt file.

I have a dataframe that looks like the following (this is a very small version of a massive dataframe that I have).

coffee_directions_df

Utterance                         Frequency   

Directions to Starbucks           1045
Directions to Tullys              1034
Give me directions to Tullys      986
Directions to Seattles Best       875
Show me directions to Dunkin      812
Directions to Daily Dozen         789
Show me directions to Starbucks   754
Give me directions to Dunkin      612
Navigate me to Seattles Best      498
Display navigation to Starbucks   376
Direct me to Starbucks            201

The DF shows utterances made by people and the frequency of utterances.

I.e., "Directions to Starbucks" was uttered 1045 times.

I have another DataFrame, stored in xlsx format as coffee_donut.xlsx, that I want to import and use to replace certain strings (similar to what the question "Replace words by checking from pandas dataframe" asked).

coffee_donut

Token              Synonyms

Starbucks          Coffee
Tullys             Coffee
Seattles Best      Coffee
Dunkin             Donut
Daily Dozen        Donut

And ultimately, I want the dataframe to look like this:

coffee_donut_df

Utterance                        Frequency   

Directions to Coffee             1045
Directions to Coffee             1034
Give me directions to Coffee     986
Directions to Coffee             875
Show me directions to Donut      812
Directions to Donut              789
...

I followed the previous question's steps, but I got stuck at the last part:

import re
import pandas as pd

# Raw string so the backslash in the Windows path is taken literally
sdf = pd.read_excel(r'C:\coffee_donut.xlsx')
rep = dict(zip(sdf.Token, sdf.Synonyms))  # convert into dictionary

# Escape the tokens so they are matched literally in the regex
rep = dict((re.escape(k), v) for k, v in rep.items())
pattern = re.compile("|".join(rep.keys()))
# Stuck here: pattern.sub expects a string, not a DataFrame
rep = pattern.sub(lambda m: rep[re.escape(m.group(0))], coffee_directions_df)

print(rep)
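
For context, here is a minimal sketch of what the pattern and lambda do when applied to a single string rather than a DataFrame. The dictionary below is a hypothetical in-memory copy of the Token/Synonyms table, and `escaped` and `replace_tokens` are names introduced only for this illustration. It shows why the lookup re-escapes the matched text: `re.escape` changes keys that contain spaces, such as "Seattles Best".

```python
import re

# Hypothetical in-memory copy of the Token -> Synonyms table (assumption)
rep = {
    "Starbucks": "Coffee",
    "Tullys": "Coffee",
    "Seattles Best": "Coffee",
    "Dunkin": "Donut",
    "Daily Dozen": "Donut",
}

# Escape each token so it is matched literally, then join into one alternation
escaped = {re.escape(k): v for k, v in rep.items()}
pattern = re.compile("|".join(escaped.keys()))


def replace_tokens(text):
    # The matched text is unescaped, so re-escape it before the dictionary lookup
    return pattern.sub(lambda m: escaped[re.escape(m.group(0))], text)


print(replace_tokens("Directions to Seattles Best"))
```

`pattern.sub` works on one string at a time, which is why passing the whole DataFrame to it fails.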

How do I apply rep to the dataframe? I'm sorry if this is a noob question. I really appreciate your help.

Thanks!!

Solution

You almost had it! Here's a solution that reuses the regex object and lambda function in your current code.

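Instead of your last line (rep = pattern.sub(...)), run this:

# regex=True tells pandas to treat the compiled pattern as a regex.
# The dictionary keys were escaped earlier, so re-escape the match for the lookup.
coffee_directions_df['Utterance'] = coffee_directions_df['Utterance'].str.replace(
    pattern, lambda m: rep[re.escape(m.group(0))], regex=True)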

# Confirm replacement
coffee_directions_df
                          Utterance  Frequency
0          Directions to Coffee       1045
1          Directions to Coffee       1034
2  Give me directions to Coffee        986
3          Directions to Coffee        875
...

This works because pd.Series.str.replace accepts a compiled regex object as the pattern and a callable as the replacement; see the pandas documentation for Series.str.replace for more.
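
As a rough end-to-end sketch of the same idea, the example below builds both tables in memory instead of reading coffee_donut.xlsx, so the data and the names `mapping` and `tokens` are assumptions made for illustration. It keeps the dictionary keyed by the raw tokens and escapes them only when building the pattern, which avoids any mismatch between escaped keys and matched text. It also passes `regex=True` explicitly, since the default of that parameter has differed between pandas versions.

```python
import re
import pandas as pd

# Hypothetical in-memory stand-ins for the two tables in the question
coffee_directions_df = pd.DataFrame({
    "Utterance": [
        "Directions to Starbucks",
        "Directions to Tullys",
        "Give me directions to Tullys",
        "Directions to Seattles Best",
        "Show me directions to Dunkin",
        "Directions to Daily Dozen",
    ],
    "Frequency": [1045, 1034, 986, 875, 812, 789],
})
sdf = pd.DataFrame({
    "Token": ["Starbucks", "Tullys", "Seattles Best", "Dunkin", "Daily Dozen"],
    "Synonyms": ["Coffee", "Coffee", "Coffee", "Donut", "Donut"],
})

# Keep the mapping keyed by the raw tokens; escape only when building the pattern
mapping = dict(zip(sdf["Token"], sdf["Synonyms"]))
# Longest tokens first, so a longer token wins over a shorter one that is its prefix
tokens = sorted(mapping, key=len, reverse=True)
pattern = re.compile("|".join(re.escape(t) for t in tokens))

# Work on a copy so the original utterances stay available
coffee_donut_df = coffee_directions_df.copy()
coffee_donut_df["Utterance"] = coffee_donut_df["Utterance"].str.replace(
    pattern, lambda m: mapping[m.group(0)], regex=True
)
print(coffee_donut_df)
```

Sorting the tokens by length is a defensive choice rather than something this particular data requires; it only matters when one token is a prefix of another.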
