Split strings at the first colon

Question

I am reading data files in text format using readLines. The first 'column' is complicated text that I do not need. The following columns contain the data that I do need. The first 'column' and the data are separated by a colon (:). I would like to split each row at the first colon and discard the leading text, keeping only the data.

Below is an example data file. One potential complication is that one line of data contains multiple colons. That line may at some point become my header. So I probably should not split at every colon, only at the first one.

my.data <- "first string of text..:  aa : bb : cc 
            next string ........  :   2    0    2
            third string......1990:   7    6    5
            last string           :   4    2    3"

my.data2 <- readLines(textConnection(my.data))
my.data2
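To illustrate the complication described above, here is a small sketch of what splitting at every colon does. The vector `lines` is a hypothetical stand-in for `my.data2` with the leading indentation omitted; it is not part of the original question.

```r
# Hypothetical stand-in for my.data2 (leading indentation omitted).
lines <- c("first string of text..:  aa : bb : cc",
           "next string ........  :   2    0    2",
           "third string......1990:   7    6    5",
           "last string           :   4    2    3")

# Split at every literal colon.
parts <- strsplit(lines, ":", fixed = TRUE)

# Number of pieces per line: the first line breaks into four pieces,
# while the other lines break into two.
n_pieces <- lengths(parts)
print(n_pieces)
```

Because the rows end up with different numbers of pieces, a plain split on ":" does not directly give the 'text before' and 'data after' pair for every row.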

I have tried code presented here:

Split on first comma in string

and here:

R: removing the last three dots from a string

Code at the first link above seems to split only at the first colon of the first row. Code at the second link will probably do what I want, but so far it has been too complex for me to modify successfully.

Here are the data I hope to obtain. At that point I can replace the remaining colons in the first row with spaces using a very simple gsub statement:

   aa : bb : cc 
    2    0    2
    7    6    5
    4    2    3

Sorry if this is a duplicate of a post I have not located, and thank you for any advice or assistance.

Recommended answer

The following starts at the beginning of the string, grabs everything up to and including the first colon plus any whitespace that follows it, and replaces that with nothing (essentially just removing it):

gsub("^[^:]+:\\s*", "", my.data2)

If you don't want to remove the spaces after the colon, you could do

gsub("^[^:]+:", "", my.data2)
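As a cross-check, the same result can be sketched without a regular expression by locating the first colon with `regexpr` and keeping what follows it with `substring`. This is an alternative technique, not the answer's own method, and `lines` is again a hypothetical stand-in for `my.data2`. The sketch also shows that `sub` gives the same result as `gsub` here, since a pattern anchored with `^` can match at most once per string.

```r
# Hypothetical stand-in for my.data2 (leading indentation omitted).
lines <- c("first string of text..:  aa : bb : cc",
           "next string ........  :   2    0    2",
           "third string......1990:   7    6    5",
           "last string           :   4    2    3")

# Position of the first colon in each line (-1 if there is none).
pos <- as.integer(regexpr(":", lines, fixed = TRUE))

# Keep everything after that position. For a line without a colon,
# pos + 1 is 0 and substring() returns the whole line unchanged.
rest <- substring(lines, pos + 1L)

# The anchored pattern from the answer, applied with sub() instead of gsub().
with_sub <- sub("^[^:]+:", "", lines)

print(rest)
```

One difference to keep in mind: the pattern `^[^:]+:` needs at least one character before the colon, so a line that begins with a colon is left untouched by the regular expression but is trimmed by the `regexpr` version.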


Here is some clarification of what the original regular expression is doing, starting at the beginning:

^ restricts the match to the start of the string

[^:] matches any single character that is not a colon

+ matches the preceding element one or more times (so it matches as many non-colon characters as possible)

: matches the colon itself

\\s matches a whitespace character (the regular expression is \s; the backslash is doubled inside an R string)

* matches the preceding element zero or more times (so any whitespace after the colon is removed as well)
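The pieces listed above can be checked one at a time. The following is a minimal sketch that grows the pattern step by step and uses `regmatches` with `regexpr` to show which part of a single line each version matches. The string `s` is one row of the example data with the indentation omitted, used here only for illustration.

```r
# One row of the example data (leading indentation omitted).
s <- "third string......1990:   7    6    5"

# Step 1: from the start, as many non-colon characters as possible.
m1 <- regmatches(s, regexpr("^[^:]+", s))

# Step 2: the same, plus the first colon.
m2 <- regmatches(s, regexpr("^[^:]+:", s))

# Step 3: the same, plus any whitespace that follows the colon.
m3 <- regmatches(s, regexpr("^[^:]+:\\s*", s))

# Removing the full match leaves only the data.
remainder <- sub("^[^:]+:\\s*", "", s)

print(m1)
print(m2)
print(nchar(m3) - nchar(m2))  # number of whitespace characters consumed
print(remainder)
```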

So, putting it all together: we start at the beginning of the string, match as many non-colon characters as possible, then grab the first colon and any whitespace after it, and replace all of that with nothing (essentially removing all of the junk we don't want).
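For completeness, here is a sketch of how the cleaned lines might be turned into a data frame. It assumes, as the question suggests, that the first row is meant to become the header and that the remaining colons in it should be replaced with spaces. The use of `read.table` and the vector `lines` (a stand-in for `my.data2`) are assumptions for illustration and not part of the original answer.

```r
# Hypothetical stand-in for my.data2 (leading indentation omitted).
lines <- c("first string of text..:  aa : bb : cc",
           "next string ........  :   2    0    2",
           "third string......1990:   7    6    5",
           "last string           :   4    2    3")

# Remove everything up to and including the first colon and the
# whitespace that follows it.
cleaned <- gsub("^[^:]+:\\s*", "", lines)

# Replace the colons left in the first row with spaces so that it can
# serve as a whitespace-separated header.
cleaned[1] <- gsub(":", " ", cleaned[1], fixed = TRUE)

# Parse the cleaned text; header = TRUE is an assumption about how the
# first row should be treated.
dat <- read.table(text = cleaned, header = TRUE)
print(dat)
```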
