Remove both English and Non-English names from a dataframe

Views: 71 Published: 2021/7/7 19:47:51 Tags: r string replace text-mining data-cleaning


Question

I am working with several hundred rows of junk data. A dummy example looks like this:

   foo_data <- c("Mary Smith is not here", "Wiremu Karen is not a nice person",
                  "Rawiri Herewini is my name", "Ajibade Smith is my man", NA)

I need to remove all names (both English and non-English first names and family names), so that my desired output is:

[1] "is not here"         " is not a nice person" " is my name"  
[4] "is my man"           NA

However, using the textclean package, I was only able to remove the English names; the non-English names were left in place:

library(textclean)
textclean::replace_names(foo_data)

[1] "  is not here"     "Wiremu  is not a nice person"    "Rawiri Herewini is my name"  
[4] "Ajibade  is my man"           NA

Any help would be greatly appreciated.

Recommended answer

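You can remove the names that textclean recognises first, then treat any word that hunspell reports as misspelled as a leftover name and strip it as well: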

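# Step 1: strip the names that textclean already knows about
s <- textclean::replace_names(foo_data)
# Step 2: hunspell() returns, per element, the words missing from its dictionary;
# join them into one alternation regex, delete the matches, and trim whitespace
trimws(gsub(sprintf('\\b(%s)\\b', 
      paste0(unlist(hunspell::hunspell(s)), collapse = '|')), '', s))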

[1] "is not here"          "is not a nice person" "is my name"           "is my man"            NA
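The regex step in the answer above can be looked at on its own. The following is only a base R sketch, not the answerer's code: `remove_words` and `leftover` are hypothetical names, and `leftover` is hard-coded here as a stand-in for `unlist(hunspell::hunspell(s))` so the example runs without extra packages. It adds two guards the one-liner does not have: an empty word list is handled explicitly, and each word is quoted with `\Q...\E` (PCRE, hence `perl = TRUE`) so that regex metacharacters are matched literally. Keep in mind that hunspell flags any word missing from its dictionary, so typos or jargon in real data would be removed along with the names.

```r
# Input as it looks after textclean::replace_names() (copied from the question)
s <- c("  is not here", "Wiremu  is not a nice person",
       "Rawiri Herewini is my name", "Ajibade  is my man", NA)

# Assumption: hard-coded stand-in for unlist(hunspell::hunspell(s))
leftover <- c("Wiremu", "Rawiri", "Herewini", "Ajibade")

# Hypothetical helper: delete every whole-word occurrence of `words` from `x`
remove_words <- function(x, words) {
  # Drop NA, empty, and duplicate entries from the word list
  words <- unique(words[!is.na(words) & nzchar(words)])
  if (length(words) == 0) return(trimws(x))  # nothing to remove
  # \Q...\E makes PCRE treat each word literally; \b anchors at word boundaries
  pattern <- sprintf("\\b(%s)\\b", paste0("\\Q", words, "\\E", collapse = "|"))
  trimws(gsub(pattern, "", x, perl = TRUE))
}

cleaned <- remove_words(s, leftover)
print(cleaned)
# [1] "is not here"          "is not a nice person" "is my name"
# [4] "is my man"            NA
```

With real data, `leftover` would come from the spell checker rather than being typed in, and it may be worth inspecting that list before deleting anything.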
