tidyr separate only first n instances


Problem description

I have a data.frame in R, which, for simplicity, has one column that I want to separate. It looks like this:

V1
Value_is_the_best_one
This_is_the_prettiest_thing_I've_ever_seen
Here_is_the_next_example_of_what_I_want

My real data is very large (millions of rows), so I'd like to use tidyr's separate function (because it's amazingly fast) to separate out JUST the first few instances. I'd like the result to be the following:

V1       V2     V3     V4 
Value    is     the    best_one
This     is     the    prettiest_thing_I've_ever_seen
Here     is     the    next_example_of_what_I_want

As you can see, the separator is _, and the V4 column can contain a varying number of separators. I want to keep V4 (not discard it), but not have to worry about how much stuff is in there. There will always be four columns (i.e. none of my rows have only V1-V3).

Here is my starting tidyr command I've been working with:

separate(df, V1, c("V1", "V2", "V3", "V4"), sep="_")

This keeps only the first piece of V4 and drops the rest (and spits out warnings, which isn't the biggest deal).

Recommended answer

You need the extra argument with the "merge" option. This allows only as many splits as you have new columns defined, so any remaining separators stay inside the last column.

separate(df, V1, c("V1", "V2", "V3", "V4"), extra = "merge")

     V1 V2  V3                             V4
1 Value is the                       best_one
2  This is the prettiest_thing_I've_ever_seen
3  Here is the    next_example_of_what_I_want
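
For anyone who wants to reproduce this, here is a minimal self-contained sketch. The data frame is rebuilt from the three sample rows in the question. Passing sep = "_" explicitly is an addition of mine, not part of the answer above: the default separator for separate() is a regular expression matching any run of non-alphanumeric characters, so an apostrophe or other punctuation in one of the early pieces would also count as a split point.

```r
library(tidyr)

# Sample data copied from the question
df <- data.frame(
  V1 = c(
    "Value_is_the_best_one",
    "This_is_the_prettiest_thing_I've_ever_seen",
    "Here_is_the_next_example_of_what_I_want"
  ),
  stringsAsFactors = FALSE
)

# sep = "_" splits on underscores only (assumption: that is the only
# intended delimiter); extra = "merge" stops after length(into) pieces
# and leaves the unsplit remainder in the last column
result <- separate(
  df, V1,
  into = c("V1", "V2", "V3", "V4"),
  sep = "_",
  extra = "merge"
)

print(result)
```

With extra = "merge" no warning is emitted, since nothing is discarded.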
