Preprocessing: text analysis on many columns from a dataframe

Views: 41 | Published: 2021/5/9 20:04:22 | Tags: r, function, dataframe


Question

Using the following lines, I can preprocess the text in a specific column of my dataframe:

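library(stringr)  # provides str_trim()
# convert text to lower case
df$name <- tolower(df$name)
# replace all punctuation characters with a space
df$name <- gsub("[[:punct:]]", " ", df$name)
# trim leading/trailing whitespace, then collapse runs of whitespace into one space
df$name <- gsub("\\s+", " ", str_trim(df$name))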

I would like to apply these preprocessing rules to all columns (except id) of a dataframe like this:

df  <- data.frame(id = c("A","B","C"), D = c("mytext 11","mytext +", "!!"), E = c("text","stg","1.2"), F = c("press","remove","22"))

Recommended answer

If you want to do something multiple times, it is often useful to define a function.

For example, you could do the following:

library(stringr)
df  <- data.frame(id = c("A","B","C"), D = c("mytext 11","mytext +", "!!"), 
                  E = c("text","stg","1.2"), F = c("press","remove","22"))

# Create a function so we can apply the same steps to any column.
process <- function(my_vector) {
  # convert to lower case
  my_vector <- tolower(my_vector)
  # replace all punctuation characters with a space
  my_vector <- gsub("[[:punct:]]", " ", my_vector)
  # trim leading/trailing whitespace, then collapse runs of whitespace
  my_vector <- gsub("\\s+", " ", str_trim(my_vector))
  # return the cleaned vector
  return(my_vector)
}

# For all columns except 'id', apply our function.
for (x in setdiff(colnames(df), "id")) {
  df[[x]] <- process(df[[x]])
}
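
As a possible alternative to the explicit loop, the same function can be applied to every non-id column in one step with lapply(). The sketch below is not part of the original answer. It assumes base R only, so trimws() stands in for stringr::str_trim(), and the name process_base is made up for this illustration.

```r
# Same example data as in the question.
df <- data.frame(id = c("A", "B", "C"),
                 D = c("mytext 11", "mytext +", "!!"),
                 E = c("text", "stg", "1.2"),
                 F = c("press", "remove", "22"),
                 stringsAsFactors = FALSE)

# Hypothetical base-R variant of process(): trimws() replaces str_trim().
process_base <- function(my_vector) {
  my_vector <- tolower(my_vector)
  # replace punctuation with a space
  my_vector <- gsub("[[:punct:]]", " ", my_vector)
  # trim the ends, then collapse runs of whitespace into one space
  gsub("\\s+", " ", trimws(my_vector))
}

# Select every column except 'id' and overwrite them in a single assignment.
cols <- setdiff(colnames(df), "id")
df[cols] <- lapply(df[cols], process_base)
```

Assigning the list returned by lapply() back into df[cols] keeps the result a data frame and leaves the id column untouched. Note that a value made up only of punctuation, such as "!!", ends up as an empty string.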
