Rename Data Frame Column Names in R Using the Previous Column Name and a Regex Pattern


Question

I am working in R for the first time and am having difficulty renaming the columns of a data frame (Grade.Data). The dataset was imported from a CSV file and has column names like this:

Student.ID

Grade

Interactive.Exercises.1..Health

Interactive.Exercises.2..Fitness

Quizzes.1..Week.1.Quiz

Quizzes.2..Week.2.Quiz

Case.Studies.1..Case.Study1

Case.Studies.2..Case.Study2

I would like to simplify the variable names, e.g. change Interactive.Exercises.1..Health to Interactive.Exercises.1, or Quizzes.1..Week.1.Quiz to Quizzes.1.

So far, I have tried this:

grep(".*[0-9]", names(Grade.Data))

But this is what gets returned:

[1]  3  4  5  6  7  8  9 11 12 13 14 15 16 17 19 20 21 22 23 24 25

Can anyone help me figure out what is going on, and write a better regular expression? Thank you so much.

Recommended answer

It seems you want to truncate the column names after the first chunk of digits.
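As for the output in the question: grep() does not rename anything. It returns the positions of the elements that match the pattern, which is why a vector of integers came back. Here is a minimal sketch using a few of the column names from the question (the vector col_names is made up for illustration, since the real Grade.Data is not shown):

```r
# A few column names copied from the question; col_names is a hypothetical stand-in.
col_names <- c("Student.ID", "Grade",
               "Interactive.Exercises.1..Health",
               "Quizzes.1..Week.1.Quiz")

# grep() reports WHICH elements match, as integer positions.
idx <- grep(".*[0-9]", col_names)
idx
# [1] 3 4

# With value = TRUE it returns the matching elements, still unmodified.
matched <- grep(".*[0-9]", col_names, value = TRUE)
matched
```

To change the names, a replacement function such as sub() is needed rather than grep().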

You may use the following sub() solution:

names(Grade.Data) <- sub("^(.*?\\d+).*$", "\\1", names(Grade.Data))

See the regex demo.

Details
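To try the line on something runnable, here is a small sketch that builds a stand-in for Grade.Data. The one-row data frame and the helper name old_names are assumptions for illustration; only the column names come from the question.

```r
# Column names as listed in the question.
old_names <- c("Student.ID", "Grade",
               "Interactive.Exercises.1..Health",
               "Interactive.Exercises.2..Fitness",
               "Quizzes.1..Week.1.Quiz",
               "Quizzes.2..Week.2.Quiz",
               "Case.Studies.1..Case.Study1",
               "Case.Studies.2..Case.Study2")

# Hypothetical one-row data frame standing in for the imported CSV.
Grade.Data <- as.data.frame(matrix(0, nrow = 1, ncol = length(old_names)))
names(Grade.Data) <- old_names

# Keep everything up to and including the first run of digits.
names(Grade.Data) <- sub("^(.*?\\d+).*$", "\\1", names(Grade.Data))

names(Grade.Data)
# [1] "Student.ID"              "Grade"
# [3] "Interactive.Exercises.1" "Interactive.Exercises.2"
# [5] "Quizzes.1"               "Quizzes.2"
# [7] "Case.Studies.1"          "Case.Studies.2"
```

Names without any digits (Student.ID, Grade) do not match the pattern, so sub() leaves them unchanged.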

  • ^ - start of string
  • (.*?\\d+) - Group 1 (referred to as \1 in the replacement pattern): any 0+ chars, as few as possible (.*?), followed by 1 or more digits (\d+)
  • .* - any 0+ chars, as many as possible
  • $ - end of string
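The effect of "as few as possible" is easiest to see next to a greedy version. The sketch below compares three patterns on names from the question. The greedy and negated-class variants are not part of the answer above; they are added here only for contrast, and the vector x is illustrative.

```r
x <- c("Quizzes.1..Week.1.Quiz", "Case.Studies.2..Case.Study2", "Student.ID")

# Pattern from the answer: lazy .*? stops at the FIRST run of digits.
lazy <- sub("^(.*?\\d+).*$", "\\1", x)
lazy
# [1] "Quizzes.1"      "Case.Studies.2" "Student.ID"

# Greedy .* (for contrast only): group 1 extends to the LAST run of digits.
greedy <- sub("^(.*\\d+).*$", "\\1", x)
greedy
# [1] "Quizzes.1..Week.1"           "Case.Studies.2..Case.Study2"
# [3] "Student.ID"

# Alternative sketch without a lazy quantifier: non-digits, then digits.
negated <- sub("^([^0-9]*[0-9]+).*$", "\\1", x)
negated
# [1] "Quizzes.1"      "Case.Studies.2" "Student.ID"
```

The negated character class gives the same result as the lazy pattern here and may be easier to read, since it does not depend on how the regex engine handles lazy quantifiers.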
