Replace strings in text based on dictionary

Views: 90 | Published: 2020/5/9 0:40:10 | Tags: r, performance, merge, dataframe

Problem description

I am new to R and need suggestions. I have a data frame with one text field. I need to fix the misspelled words in that field. To help with that, I have a second file (a dictionary) with two columns: the misspelled words and the correct words to replace them with.

How would you recommend doing this? I wrote a simple for loop, but performance is an issue: the file has ~120K rows, the dictionary has ~5K rows, and the program has been running for hours. Each text value can be up to 2000 characters long.

Here is the code:

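output <- source_file$MEMO_MANUAL_TXT
for (i in seq_len(nrow(fix_file))) {  # one pass per dictionary row
  # pad with spaces so only whole, space-delimited words are replaced
  target  <- paste0(" ", fix_file$change_to_target[i], " ")  # text to search for
  replace <- paste0(" ", fix_file$target[i], " ")            # text to substitute
  output  <- gsub(target, replace, output, fixed = TRUE)
}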

Recommended answer

I would try agrep. I'm not sure how well it scales, though.

For example:

> agrep("laysy", c("1 lazy", "1", "1 LAZY"), max.distance = 2, value = TRUE)
[1] "1 lazy"
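
To connect the agrep call to the spelling-correction task, here is a minimal sketch and not a tested solution for the 120K-row file. The vocabulary `correct_words`, the helper `fix_word`, and the sample `words` are all hypothetical names invented for illustration. Note that agrep looks for an approximate match anywhere inside each candidate string, so short words can match loosely when the allowed distance is large.

```r
# Hypothetical vocabulary of correctly spelled words (not from the original post).
correct_words <- c("lazy", "quick", "jumps")

# Hypothetical helper: return the first vocabulary entry that approximately
# contains `word`, or `word` itself when agrep finds no candidate.
fix_word <- function(word, vocab, max_dist = 2) {
  hits <- agrep(word, vocab, max.distance = max_dist, value = TRUE)
  if (length(hits) > 0) hits[1] else word
}

# Hypothetical input tokens: two misspellings and one word with no close match.
words <- c("laysy", "quik", "fox")
fixed <- vapply(words, fix_word, character(1),
                vocab = correct_words, USE.NAMES = FALSE)
print(fixed)
```

Calling the helper once per unique word, rather than once per row of text, would keep the number of agrep calls down, although whether that is fast enough at this scale is untested here.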

Also check out pmatch and charmatch, although I feel they won't be as useful to you.
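
A short sketch of what these two functions do may show why they are a weaker fit here: both match on leading characters only, so a misspelling in the middle of a word finds nothing. The vector `vocab` and the result names below are hypothetical, chosen only for this illustration.

```r
# Hypothetical vocabulary (not from the original post).
vocab <- c("lazy", "quick", "jumps")

# pmatch returns the index of the unique entry that starts with the input.
prefix_hit  <- pmatch("qui", vocab)     # index of "quick"
# A misspelling is not a prefix of any entry, so there is no match (NA).
prefix_miss <- pmatch("laysy", vocab)

# With an ambiguous prefix, charmatch returns 0 while pmatch returns NA.
ambiguous_charmatch <- charmatch("qu", c("quick", "quiet"))
ambiguous_pmatch    <- pmatch("qu", c("quick", "quiet"))

print(c(prefix_hit, prefix_miss, ambiguous_charmatch, ambiguous_pmatch))
```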
