dplyr: vectorisation of substr


Question

Referring to the question "substr in dplyr %>% mutate" and to @akrun's answer there, why do the two columns created below give the same result?

library(dplyr)

# tibble() replaces the deprecated data_frame(); the length-1 t is recycled to 5 rows
df <- tibble(t = '1234567890ABCDEFG', a = 1:5, b = 6:10)
df %>% mutate(u = substr(t, a, a + b), v = substring(t, a, a + b))

I can't see how this differs from the situation in the original question. Thank you!

Recommended answer

The difference is in the vectorisation.

substr("1234567890ABCDEFG", df$a, df$a+df$b)
#[1] "1234567"
substring("1234567890ABCDEFG", df$a, df$a+df$b)
#[1] "1234567"     "23456789"    "34567890A"   "4567890ABC"  "567890ABCDE"

Here substr returns only a single value, while substring returns a vector whose length equals the number of rows in 'df'. This is because the result of substr has the length of x, whereas the result of substring has the length of its longest argument. Since substr produces a single value, that value gets recycled in mutate. However, if we pass multiple values for x, i.e.

substr(rep("1234567890ABCDEFG", nrow(df)), df$a, df$a+df$b)
#[1] "1234567"     "23456789"    "34567890A"   "4567890ABC"  "567890ABCDE"
substring(rep("1234567890ABCDEFG", nrow(df)), df$a, df$a+df$b)
#[1] "1234567"     "23456789"    "34567890A"   "4567890ABC"  "567890ABCDE"
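The length rule behind both pairs of calls can be checked directly. The following is a minimal base-R sketch, not part of the original answer; the variable names (x, a, b, u_scalar and so on) are chosen here for illustration only.

```r
x <- "1234567890ABCDEFG"
a <- 1:5
b <- 6:10

# substr(): the result has the length of x, so a length-1 x gives one string
u_scalar <- substr(x, a, a + b)
# substring(): the result has the length of the longest argument
v_scalar <- substring(x, a, a + b)

# Repeat x so that it is as long as start and stop
x_rep <- rep(x, length(a))
u_rep <- substr(x_rep, a, a + b)
v_rep <- substring(x_rep, a, a + b)

length(u_scalar)         # length of the substr() result for a scalar x
length(v_scalar)         # length of the substring() result for a scalar x
identical(u_rep, v_rep)  # do the two functions agree once x is repeated?
```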

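When x is repeated to the same length as start and stop, substr and substring give the same output. In the OP's example the two columns match for exactly this reason: inside mutate, the x passed to substr is the column t, which has one entry per row. We can reproduce the first output, where substr returns a single recycled value, by passing the literal string instead: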

df %>%
    mutate(u = substr("1234567890ABCDEFG", a, a + b),
           v = substring("1234567890ABCDEFG", a, a + b))
#                 t     a     b       u           v
#              (chr) (int) (int)   (chr)       (chr)
#1 1234567890ABCDEFG     1     6 1234567     1234567
#2 1234567890ABCDEFG     2     7 1234567    23456789
#3 1234567890ABCDEFG     3     8 1234567   34567890A
#4 1234567890ABCDEFG     4     9 1234567  4567890ABC
#5 1234567890ABCDEFG     5    10 1234567 567890ABCDE
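The contrast between passing the column and passing the literal string can also be seen without dplyr. The sketch below uses a base-R data.frame as a stand-in, on the assumption that column assignment there recycles a length-1 value much as mutate does; the column names u and w are illustrative.

```r
df <- data.frame(t = "1234567890ABCDEFG", a = 1:5, b = 6:10,
                 stringsAsFactors = FALSE)

# The column t has one entry per row, so substr() sees an x of length 5
# and returns five different substrings
df$u <- substr(df$t, df$a, df$a + df$b)

# A literal string has length 1, so substr() returns a single value,
# which is recycled to every row on assignment
df$w <- substr("1234567890ABCDEFG", df$a, df$a + df$b)

print(df)
```

Column u matches what substring produces, while column w repeats the first substring in every row.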
