Create new column with dplyr mutate and substring of existing column

Published: 2020/10/26 2:55:43 · Tags: r, dplyr, strsplit

Question

I have a dataframe with a column of strings and want to extract substrings of those into a new column.

Here is some sample code and data. I want to take the substring after the final underscore in the id column and put it in a new new_id column. Every id entry contains exactly two underscores, and the part I want is always the final substring.

df = data.frame( id = I(c("abcd_123_ABC","abc_5234_NHYK")), x = c(1.0,2.0) )

require(dplyr)

df = df %>% dplyr::mutate(new_id = strsplit(id, split="_")[[1]][3])

I was expecting strsplit to act on each row in turn.

However, the new_id column only contains ABC in each row, whereas I would like ABC in row 1 and NHYK in row 2. Do you know why this fails and how to achieve what I want?
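On the "why" part: strsplit() is already vectorised. Applied to the whole id column it returns a list with one character vector per row, so [[1]] keeps only the first row's pieces and [3] yields the single value "ABC", which mutate() then recycles down every row. The sketch below shows this and one base-R way to keep the strsplit approach working row by row. It assumes, as the question states, that every id has at least three underscore-separated parts, and it rebuilds the example data with stringsAsFactors = FALSE instead of I() (an assumption, so that id stays character on R versions before 4.0).

```r
library(dplyr)

# Same example data as the question; stringsAsFactors = FALSE keeps id as character
df <- data.frame(id = c("abcd_123_ABC", "abc_5234_NHYK"),
                 x = c(1.0, 2.0),
                 stringsAsFactors = FALSE)

# strsplit() returns a list: one character vector of pieces per input string
parts <- strsplit(df$id, split = "_")
str(parts)

# [[1]][3] is the third piece of the FIRST string only, i.e. "ABC";
# mutate() recycles this length-1 value to every row
parts[[1]][3]

# Per-row version: pull the third piece out of every list element
df <- df %>%
  mutate(new_id = vapply(strsplit(id, split = "_"),
                         function(p) p[3],
                         character(1)))
df
```

vapply() is used rather than sapply() because it guarantees a character vector of the same length as id, which is what mutate() needs.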

Recommended answer

You can use stringr::str_extract:

library(dplyr)
library(stringr)

df %>%
  dplyr::mutate(new_id = str_extract(id, "[^_]+$"))


#>              id x new_id
#> 1  abcd_123_ABC 1    ABC
#> 2 abc_5234_NHYK 2   NHYK

The regex says: match one or more (+) characters that are not _ (the negated character class [^_]), followed by the end of the string ($).
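The same pattern can be checked without stringr. A minimal base-R sketch: regexpr() plus regmatches() applies [^_]+$ directly, while sub(".*_", "", ...) is a different technique (greedily deleting everything up to and including the last underscore) that gives the same result on this data. One caveat: regmatches() silently drops strings that have no match, so its result can be shorter than the input, whereas str_extract() returns NA for those, which makes str_extract() the safer fit inside mutate().

```r
ids <- c("abcd_123_ABC", "abc_5234_NHYK")

# regexpr() locates the first match of the pattern in each string;
# regmatches() extracts the matched text (non-matching strings are dropped)
via_regmatches <- regmatches(ids, regexpr("[^_]+$", ids))

# Alternative technique: remove everything up to and including the last "_"
# (".*" is greedy, so it runs to the final underscore)
via_sub <- sub(".*_", "", ids)

via_regmatches
via_sub
```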
