Concatenate alternate characters from different columns in R programming
Problem description
I have a df with 2 columns. I need to combine Col1 and Col2 into Col3, pairing the alternate pieces of text that are separated by ">": a1-b1; a2-b2; a3-b3; ...
Example
| Col1 | Col2 | Col3 |
| abcd > de > efg | ppppp > ppt > pp | abcd-ppppp > de-ppt > efg-pp |
| hij > kl > iiii | aaa > bbb > hhh | hij-aaa > kl-bbb > iiii-hhh |
| aa | fff | aa-fff |
| a > bbb | pp > a | a-pp > bbb-a |
....
How can I do that in R? Thanks.
Recommended answer
This was a pain in the ass to solve. In the future, for our sanity, please consider how you output your data. This could have been easily solved if downstream analysis had been considered when the data was generated. Anyway, enough whinging; here is the solution.
Let's generate your data:
Col1 <- c("abcd > de > efg", "hij > kl > iiii", "aa", "a > bbb")
Col2 <- c("ppppp > ppt > pp", "aaa > bbb > hhh", "fff", "pp > a")
dat <- data.frame(Col1, Col2, stringsAsFactors = FALSE)
Next, using apply, we strip, separate and flatten Col1 and Col2, and add the first separator "-":
l1 <- apply(dat, 2, function(x) trimws(unlist(strsplit(x, split = ">"))))
l2 <- apply(l1, 1, function(x) paste0(x[1], "-", x[2]))
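To make the intermediate objects easier to follow, here is a small self-contained check on the sample data. It relies on the same assumption as the lines above: Col1 and Col2 must contain the same total number of pieces, otherwise apply would not return a matrix.

```r
Col1 <- c("abcd > de > efg", "hij > kl > iiii", "aa", "a > bbb")
Col2 <- c("ppppp > ppt > pp", "aaa > bbb > hhh", "fff", "pp > a")
dat <- data.frame(Col1, Col2, stringsAsFactors = FALSE)

# Each column is split on ">" and flattened, giving one matrix row per piece
l1 <- apply(dat, 2, function(x) trimws(unlist(strsplit(x, split = ">"))))
dim(l1)  # 9 rows (3 + 3 + 1 + 2 pieces) and 2 columns

# Pair the pieces row by row with "-"
l2 <- apply(l1, 1, function(x) paste0(x[1], "-", x[2]))
length(l2)  # one combined string per piece
```

At this point the row boundaries of the original data frame are lost, which is why the following steps have to recover them.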
The next part was surprisingly difficult. After much googling I found a solution (a hack) to split a list of characters by a numeric vector.
#thanks: https://techoverflow.net/2012/11/10/r-count-occurrences-of-character-in-string/
#gets occurrences of ">" for later use
countCharOccurrences <- function(char, s) {
  s2 <- gsub(char, "", s)
  return(nchar(s) - nchar(s2))
}
o <- countCharOccurrences(">", dat$Col1)+1
df <- as.data.frame(l2, stringsAsFactors = FALSE)
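As a cross-check on the counting step, the number of pieces per row can also be obtained directly from strsplit. This is a sketch of an alternative rather than part of the original answer; the name o2 is introduced here only for illustration.

```r
Col1 <- c("abcd > de > efg", "hij > kl > iiii", "aa", "a > bbb")

# Helper from the answer: number of ">" characters in each string
countCharOccurrences <- function(char, s) {
  s2 <- gsub(char, "", s)
  return(nchar(s) - nchar(s2))
}
o <- countCharOccurrences(">", Col1) + 1

# Alternative (hypothetical name o2): count the pieces after splitting
o2 <- lengths(strsplit(Col1, ">", fixed = TRUE))

all(o == o2)
```

Both give the number of pieces in each original row, which is what the split in the next step needs.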
Split df by the occurrences of ">" (i.e. the values of o):
# Thanks to this SO answer:
# https://stackoverflow.com/questions/27132290/split-dataframe-by-row-number-in-r
l2a <- split(df, cumsum(c(TRUE,(1:nrow(df) %in% cumsum(o))[-nrow(df)])))
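The grouping expression in that split call is dense, so here it is unpacked step by step for the sample data. The names n, ends, grp and grp2 are introduced only for this illustration, and the last line is a suggested equivalent that assumes every value of o is at least 1.

```r
o <- c(3, 3, 1, 2)  # pieces per original row, as computed for the sample data
n <- sum(o)         # total number of pieces (9 here)

# TRUE marks the last piece of each original row
ends <- seq_len(n) %in% cumsum(o)

# Shift by one so TRUE marks the first piece of each row, then number the groups
grp <- cumsum(c(TRUE, ends[-n]))
grp
# 1 1 1 2 2 2 3 4 4

# Equivalent grouping vector: repeat each row index once per piece
grp2 <- rep(seq_along(o), times = o)
```

Either vector can be passed to split as the grouping factor.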
Finally, we collapse the list of data frames and add the final separator ">":
l3 <- lapply(l2a, function(x) paste(x[,1], collapse = " > "))
Then combine with your starting data frame:
dat$Col3 <- l3
Col1 Col2 Col3
1 abcd > de > efg ppppp > ppt > pp abcd-ppppp > de-ppt > efg-pp
2 hij > kl > iiii aaa > bbb > hhh hij-aaa > kl-bbb > iiii-hhh
3 aa fff aa-fff
4 a > bbb pp > a a-pp > bbb-a
Tada!
Edit: I had forgotten that l3 is a list of objects. You need to use unlist to flatten it, like this:
dat$Col3 <- unlist(l3)
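For comparison, the same result can be reached more directly by splitting each row separately and pairing the pieces with mapply. This is a sketch of an alternative, not the method used above; p1 and p2 are hypothetical names, and it assumes Col1 and Col2 have the same number of pieces in every row (otherwise paste would silently recycle the shorter side).

```r
Col1 <- c("abcd > de > efg", "hij > kl > iiii", "aa", "a > bbb")
Col2 <- c("ppppp > ppt > pp", "aaa > bbb > hhh", "fff", "pp > a")
dat <- data.frame(Col1, Col2, stringsAsFactors = FALSE)

# Split both columns into lists of trimmed pieces, one list element per row
p1 <- lapply(strsplit(dat$Col1, ">", fixed = TRUE), trimws)
p2 <- lapply(strsplit(dat$Col2, ">", fixed = TRUE), trimws)

# Pair the pieces within each row with "-" and rejoin them with " > "
dat$Col3 <- mapply(function(a, b) paste(a, b, sep = "-", collapse = " > "),
                   p1, p2, USE.NAMES = FALSE)
dat$Col3
```

Because the pieces never leave their original row, there is no need to count separators or rebuild the row boundaries afterwards.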