Remove both English and non-English names from a dataframe
Problem description
I am working with several hundred rows of junk data. A dummy example looks like this:
foo_data <- c("Mary Smith is not here", "Wiremu Karen is not a nice person",
"Rawiri Herewini is my name", "Ajibade Smith is my man", NA)
I need to remove all names (both English and non-English first names and family names), so that my desired output is:
[1] "is not here" " is not a nice person" " is my name"
[4] "is my man" NA
However, using the textclean package, I was only able to remove the English names, leaving the non-English names in place:
library(textclean)
textclean::replace_names(foo_data)
[1] " is not here" "Wiremu is not a nice person" "Rawiri Herewini is my name"
[4] "Ajibade is my man" NA
Any help would be appreciated.
Recommended answer
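You can run textclean first, then use hunspell to pick out the remaining words that are not in its dictionary and strip them: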
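# Remove the names that textclean recognises
s <- textclean::replace_names(foo_data)
# Treat every word hunspell flags as unknown as a leftover name and delete it
trimws(gsub(sprintf('\\b(%s)\\b',
paste0(unlist(hunspell::hunspell(s)), collapse = '|')), '', s))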
[1] "is not here" "is not a nice person" "is my name" "is my man" NA
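To make the pattern-building step easier to follow, here is a minimal base R sketch of the same idea. It assumes the flagged words are already known: the `flagged` vector is a hypothetical stand-in for `unlist(hunspell::hunspell(s))`, and `s` is copied from the `replace_names()` output shown in the question, so neither package is needed to run it.

```r
# Hypothetical stand-in for unlist(hunspell::hunspell(s)):
# the words assumed to be flagged as not in the dictionary
flagged <- c("Wiremu", "Rawiri", "Herewini", "Ajibade")

# Copied from the textclean::replace_names() output in the question
s <- c(" is not here", "Wiremu is not a nice person",
       "Rawiri Herewini is my name", "Ajibade is my man", NA)

# Join the flagged words into one alternation, anchored on word boundaries
pattern <- sprintf("\\b(%s)\\b", paste0(flagged, collapse = "|"))

# Delete every match, then trim the whitespace left behind; NA stays NA
cleaned <- trimws(gsub(pattern, "", s))

print(pattern)
print(cleaned)
```

Note that hunspell flags any word missing from its dictionary, not only names, so on real data this approach may also delete misspelled or unusual ordinary words. It is worth inspecting the flagged words before removing them.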