Replacing the nth character in a string only if it is a particular character in R


Question

I am importing a series of surveys as .csv files and combining them into one data set. The problem is that in one of the seven files some of the variables are imported slightly differently. The data set is huge, so I would like to write a function that I can run over the data set that is giving me trouble.

In some of the variable names there is an underscore where there should be a dot. The variables do not all share the same format, but the incorrect ones do have something in common: the underscore is always the 6th character of the column name.

I want R to look at the 6th character and, if it is an underscore, replace it with a dot. Here is a made-up example:

col_names <- c("s1.help_needed",
               "s1.Q2_im_stuck",
               "s1.Q2.im_stuck",
               "s1.Q3.regex",
               "s1.Q3_regex",
               "s2.Q1.is_confusing",
               "s2.Q2.answer_please",
               "s2.Q2_answer_please",
               "s2.someone_knows_the answer",
               "s3.appreciate_the_help")

I assume there is a regex answer to this, but I am struggling to find one. Perhaps there is also a tidyr answer?

Recommended answer

As @thelatemail pointed out, none of your names actually has an underscore in the fifth position, but some have one in the sixth position (where others have a dot). A base R approach would be to use gsub():

result <- gsub("^(.{5})_", "\\1.", col_names)

> result
 [1] "s1.help_needed"              "s1.Q2.im_stuck"             
 [3] "s1.Q2.im_stuck"              "s1.Q3.regex"                
 [5] "s1.Q3.regex"                 "s2.Q1.is_confusing"         
 [7] "s2.Q2.answer_please"         "s2.Q2.answer_please"        
 [9] "s2.someone_knows_the answer" "s3.appreciate_the_help"
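
For comparison, the same fix can be sketched without a regular expression by inspecting the 6th character directly with substr(). This is only an illustrative alternative under the assumptions of the question: the shortened `col_names` vector and the toy data frame `df` below are made up for the example.

```r
# Shortened, made-up sample of column names
col_names <- c("s1.help_needed", "s1.Q2_im_stuck", "s1.Q3.regex", "s2.Q2_answer_please")

# Positions of the names whose 6th character is an underscore
# (which() also drops any NA comparisons)
needs_fix <- which(substr(col_names, 6, 6) == "_")

# Overwrite only that one character in the flagged names
fixed <- col_names
substr(fixed[needs_fix], 6, 6) <- "."

# Applying the regex from the answer to the names of a (hypothetical) data frame
df <- data.frame(a = 1:2, b = 3:4)
names(df) <- c("s1.Q2_im_stuck", "s1.Q3.regex")
names(df) <- sub("^(.{5})_", "\\1.", names(df))
```

Because the pattern is anchored with ^, it can match at most once per string, so sub() behaves the same as gsub() here.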

Here is an explanation of the regex:

^         from the start of the string
(.{5})    match AND capture any five characters
_         followed by an underscore

The quantity in parentheses is called a capture group, and it can be referred to in the replacement via \\1. So the regex says: when the first five characters are followed by an underscore, replace those six characters with the five captured characters plus a dot. Names whose sixth character is not an underscore do not match and are left unchanged.
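
To reuse the same idea for an arbitrary position n, the pattern can be built from n. The sketch below is one possible way to do that; the function name `replace_nth_underscore` is hypothetical, and `perl = TRUE` is an assumption made here so that the `{0}` quantifier for n = 1 is handled by PCRE.

```r
# Hypothetical helper: replace the n-th character with a dot,
# but only when that character is an underscore.
replace_nth_underscore <- function(x, n) {
  stopifnot(length(n) == 1, n >= 1)
  # Capture the first n - 1 characters, then require an underscore
  pattern <- sprintf("^(.{%d})_", as.integer(n) - 1L)
  # Put the captured prefix back and use a dot in place of the underscore
  sub(pattern, "\\1.", x, perl = TRUE)
}

col_names <- c("s1.help_needed", "s1.Q2_im_stuck", "s2.Q2.answer_please")
result <- replace_nth_underscore(col_names, 6)
print(result)
```

With n = 6 this builds the same pattern as the answer above, ^(.{5})_, so only names with an underscore in the sixth position change.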
