Importing a txt file to replace certain strings in a dataframe (pandas)


Question

I am trying to replace certain strings within a column of a dataframe using a txt file.

My dataframe looks like this:

coffee_directions_df

Utterance                         Frequency   

Directions to Starbucks           1045
Directions to Tullys              1034
Give me directions to Tullys      986
Directions to Seattles Best       875
Show me directions to Dunkin      812
Directions to Daily Dozen         789
Show me directions to Starbucks   754
Give me directions to Dunkin      612
Navigate me to Seattles Best      498
Display navigation to Starbucks   376
Direct me to Starbucks            201

The dataframe shows utterances made by people and how often each one was uttered.

For example, "Directions to Starbucks" was uttered 1045 times.

I understand that I can create a dictionary to replace strings such as "Starbucks", "Tullys", and "Seattles Best", like this:

# define dictionary of mappings
rep_dict = {'Starbucks': 'Coffee', 'Tullys': 'Coffee', 'Seattles Best': 'Coffee'}

# apply substring mapping

df['Utterance'] = df['Utterance'].replace(rep_dict, regex=True).str.lower()
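
To make the effect of the dictionary approach concrete, here is a minimal sketch on a three-row sample of the dataframe above. The sample rows are copied from the table; everything else follows the snippet as written. With regex=True each key is matched as a substring, so only the brand name is swapped and the rest of the utterance is kept.

```python
import pandas as pd

# Three rows taken from the coffee_directions_df table in the question
df = pd.DataFrame({
    "Utterance": [
        "Directions to Starbucks",
        "Give me directions to Tullys",
        "Navigate me to Seattles Best",
    ],
    "Frequency": [1045, 986, 498],
})

# define dictionary of mappings
rep_dict = {"Starbucks": "Coffee", "Tullys": "Coffee", "Seattles Best": "Coffee"}

# regex=True treats each key as a pattern, so it matches inside the string
# instead of requiring the whole cell to equal the key
df["Utterance"] = df["Utterance"].replace(rep_dict, regex=True).str.lower()

print(df["Utterance"].tolist())
# ['directions to coffee', 'give me directions to coffee', 'navigate me to coffee']
```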

However, my dataframe is pretty big, and I am wondering whether there is a way to save rep_dict as a .txt file, import that file, and apply or map the words in it to coffee_directions_df.Utterance.

Ultimately, I don't want to create a bunch of dictionaries within the script; I would rather import a txt file that contains these mappings.

Thanks!!

Recommended answer

I mean something as simple as this:

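import pandas as pd
from io import StringIO  # pd.compat.StringIO was removed in newer pandas versions

data = '''\
Starbucks,Coffee
Tullys,Coffee
Seattles Best,Coffee'''

# Create a map from a file (StringIO stands in for the txt file here;
# pass the file path to read_csv instead when reading from disk)
m = pd.read_csv(StringIO(data), header=None, index_col=[0])[1]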

Then:

df['Utterance'] = df['Utterance'].replace(m, regex=True).str.lower()
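
For an actual file on disk, the same idea could look roughly like the sketch below. It is only an illustration under a few assumptions: the file name replacements.txt and the column names old and new are made up for the example, the file holds one "original,replacement" pair per line with no header, and a temporary directory is used so the snippet runs on its own. The keys are passed through re.escape because regex=True treats them as patterns, which matters if a name ever contains characters such as "." or "+".

```python
import os
import re
import tempfile

import pandas as pd

# Hypothetical mapping file contents: one "original,replacement" pair per line
mapping_text = "Starbucks,Coffee\nTullys,Coffee\nSeattles Best,Coffee\n"

with tempfile.TemporaryDirectory() as tmp_dir:
    # "replacements.txt" is an assumed name; use your own path here
    mapping_path = os.path.join(tmp_dir, "replacements.txt")
    with open(mapping_path, "w", encoding="utf-8") as fh:
        fh.write(mapping_text)

    # header=None because the file has no header row; names are illustrative
    pairs = pd.read_csv(mapping_path, header=None, names=["old", "new"], dtype=str)

# Build a plain dict and escape the keys so they are matched literally
rep_dict = {re.escape(old): new for old, new in zip(pairs["old"], pairs["new"])}

# Small sample of the dataframe from the question
df = pd.DataFrame({
    "Utterance": [
        "Directions to Starbucks",
        "Show me directions to Dunkin",
        "Navigate me to Seattles Best",
    ],
    "Frequency": [1045, 812, 498],
})

df["Utterance"] = df["Utterance"].replace(rep_dict, regex=True).str.lower()
print(df)
```

Rows whose utterance contains none of the keys (the "Dunkin" row here) are only lowercased, so adding a new brand later means adding one line to the txt file rather than editing the script.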
