How to convert a vector of strings to Title Case

Question

I have a vector of strings in lower case. I'd like to change them to title case, meaning the first letter of every word would be capitalized. I've managed to do it with a double loop, but I'm hoping there's a more efficient and elegant way to do it, perhaps a one-liner with gsub and a regex.

Here's some sample data, along with the double loop that works, followed by other things I tried that didn't work.

strings = c("first phrase", "another phrase to convert",
            "and here's another one", "last-one")

# For each string in the strings vector, find the start position of each
#  run of lower-case letters that begins at a word boundary
matches = gregexpr("\\b[a-z]+", strings)

# For each string in the strings vector, convert the first letter 
#  of each word to upper case
for (i in 1:length(strings)) {

  # Extract the position of each regex match for the string in row i
  #  of the strings vector.
  match.positions = matches[[i]][1:length(matches[[i]])] 

  # Convert the letter in each match position to upper case
  for (j in 1:length(match.positions)) {

    substr(strings[i], match.positions[j], match.positions[j]) = 
      toupper(substr(strings[i], match.positions[j], match.positions[j]))
  }
}

This worked, but it seems inordinately complicated. I resorted to it only after experimenting unsuccessfully with more straightforward approaches. Here are some of the things I tried, along with the output:

# Google search suggested \\U might work, but evidently not in R
gsub("(\\b[a-z]+)", "\\U\\1", strings)
## [1] "Ufirst Uphrase"                "Uanother Uphrase Uto Uconvert"
## [3] "Uand Uhere'Us Uanother Uone"   "Ulast-Uone"

# I tried this on a lark, but to no avail
gsub("(\\b[a-z]+)", toupper("\\1"), strings)
## [1] "first phrase"              "another phrase to convert"
## [3] "and here's another one"    "last-one"

The regex captures the correct positions in each string as shown by a call to gregexpr, but the replacement string is clearly not working as desired.

If you can't already tell, I'm relatively new to regexes and would appreciate help on how to get the replacement to work correctly. I'd also like to learn how to structure the regex so as to avoid capturing a letter after an apostrophe, since I don't want to change the case of those letters.

Recommended answer

The main problem is that you're missing perl=TRUE (and your regex is slightly wrong, although that may be a result of flailing around to try to fix the first problem).

Using [:lower:] instead of [a-z] is slightly safer in case your code ends up being run in some weird (sorry, Estonians) locale where z is not the last letter of the alphabet ...

re_from <- "\\b([[:lower:]])([[:lower:]]+)"
strings <- c("first phrase", "another phrase to convert",
             "and here's another one", "last-one")
gsub(re_from, "\\U\\1\\L\\2", strings, perl=TRUE)
## [1] "First Phrase"              "Another Phrase To Convert"
## [3] "And Here's Another One"    "Last-One"    
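
The question also asks how to avoid capitalizing a letter after an apostrophe. The pattern above leaves the lone s in "here's" untouched only because it requires at least two lower-case letters per match; a longer suffix, as in "they're", would still be capitalized, and one-letter words such as "a" are skipped. A minimal sketch of one way around both, assuming a PCRE negative lookbehind is acceptable (re_apos and to_title are hypothetical names, not part of the original answer):

```r
# Hypothetical pattern: a word boundary that is NOT preceded by an apostrophe,
# then one lower-case letter, then zero or more lower-case letters.
# Lookbehind and \\U / \\L both require perl = TRUE.
re_apos <- "(?<!')\\b([[:lower:]])([[:lower:]]*)"

# Hypothetical helper wrapping the gsub call so it can be reused on any vector
to_title <- function(x) gsub(re_apos, "\\U\\1\\L\\2", x, perl = TRUE)

to_title(c("and here's another one", "they're last-one"))
## [1] "And Here's Another One" "They're Last-One"
```

Whether hyphenated parts such as "one" in "last-one" should be capitalized is a style decision; this sketch keeps the behaviour of the answer above and capitalizes them.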

You may prefer to use \\E (stop capitalization) rather than \\L (start lowercase), depending on what rules you want to follow, e.g.:

string2 <- "using AIC for model selection"
gsub(re_from, "\\U\\1\\E\\2", string2, perl=TRUE)
## [1] "Using AIC For Model Selection"
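
Note that with re_from as defined above, the second group can only ever contain lower-case letters, so \\L and \\E produce the same result on this input. The difference becomes visible once the pattern is allowed to match upper-case letters as well. A small sketch, assuming a broader pattern (re_any, lower_rest and keep_rest are hypothetical names introduced here for illustration):

```r
# Hypothetical broader pattern: any letter followed by zero or more letters,
# so words that are already upper case (e.g. "AIC") are matched too.
re_any  <- "\\b([[:alpha:]])([[:alpha:]]*)"
string2 <- "using AIC for model selection"

# \\L forces the rest of each word to lower case
lower_rest <- gsub(re_any, "\\U\\1\\L\\2", string2, perl = TRUE)
# \\E only ends the upper-casing, leaving the rest of each word as it was
keep_rest  <- gsub(re_any, "\\U\\1\\E\\2", string2, perl = TRUE)

lower_rest
## [1] "Using Aic For Model Selection"
keep_rest
## [1] "Using AIC For Model Selection"
```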
