Unnest (separate) multiple column values into new rows using Sparklyr
Question
I am trying to split comma-separated column values into new rows, keeping the id of each original row. I know how to do this in R using dplyr and tidyr, but I am looking to solve the same problem in sparklyr.
id <- c(1,1,1,1,1,2,2,2,3,3,3)
name <- c("A,B,C","B,F","C","D,R,P","E","A,Q,W","B,J","C","D,M","E,X","F,E")
value <- c("1,2,3","2,4,43,2","3,1,2,3","1","1,2","26,6,7","3,3,4","1","1,12","2,3,3","3")
dt <- data.frame(id,name,value)
R solution:
separate_rows(dt, name, sep=",") %>%
separate_rows(value, sep=",")
Desired output as a Spark DataFrame (sparklyr package):
> final_result
id name value
1 1 A 1
2 1 A 2
3 1 A 3
4 1 B 1
5 1 B 2
6 1 B 3
7 1 C 1
8 1 C 2
9 1 C 3
10 1 B 2
11 1 B 4
12 1 B 43
13 1 B 2
14 1 F 2
15 1 F 4
16 1 F 43
17 1 F 2
18 1 C 3
19 1 C 1
20 1 C 2
21 1 C 3
22 1 D 1
23 1 R 1
24 1 P 1
25 1 E 1
26 1 E 2
27 2 A 26
28 2 A 6
29 2 A 7
30 2 Q 26
31 2 Q 6
32 2 Q 7
33 2 W 26
34 2 W 6
35 2 W 7
36 2 B 3
37 2 B 3
38 2 B 4
39 2 J 3
40 2 J 3
41 2 J 4
42 2 C 1
43 3 D 1
44 3 D 12
45 3 M 1
46 3 M 12
47 3 E 2
48 3 E 3
49 3 E 3
50 3 X 2
51 3 X 3
52 3 X 3
53 3 F 3
54 3 E 3
Note:
- I have approximately 1000 columns with nested values, so I need a function that can loop over each column.
- I know there is an sdf_unnest() function in the sparklyr.nested package, but I am not sure how to split the strings of multiple columns and then apply this function. I am quite new to sparklyr.
Any help would be appreciated.
Recommended answer
You have to combine explode and split:
sdt %>%
mutate(name = explode(split(name, ","))) %>%
mutate(value = explode(split(value, ",")))
# Source: lazy query [?? x 3]
# Database: spark_connection
id name value
<dbl> <chr> <chr>
1 1.00 A 1
2 1.00 A 2
3 1.00 A 3
4 1.00 B 1
5 1.00 B 2
6 1.00 B 3
7 1.00 C 1
8 1.00 C 2
9 1.00 C 3
10 1.00 B 2
# ... with more rows
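The answer uses `sdt` without showing where it comes from. A minimal sketch of one way to obtain it, assuming a local Spark installation is available and that `sdt` is simply the question's `dt` copied to Spark with `copy_to()` (the table name "dt" and the `result` variable are illustrative choices, not from the original):

```r
library(sparklyr)
library(dplyr)

# Assumption: Spark is installed locally (for example via spark_install()).
sc <- spark_connect(master = "local")

# Sample data from the question.
id <- c(1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3)
name <- c("A,B,C", "B,F", "C", "D,R,P", "E", "A,Q,W", "B,J", "C", "D,M", "E,X", "F,E")
value <- c("1,2,3", "2,4,43,2", "3,1,2,3", "1", "1,2", "26,6,7", "3,3,4", "1", "1,12", "2,3,3", "3")
dt <- data.frame(id, name, value, stringsAsFactors = FALSE)

# Copy the local data frame to Spark; "dt" is an arbitrary table name.
sdt <- copy_to(sc, dt, "dt", overwrite = TRUE)

# split() and explode() are not R functions here: they are passed through
# to Spark SQL. Each mutate() is a separate step, as in the answer.
result <- sdt %>%
  mutate(name = explode(split(name, ","))) %>%
  mutate(value = explode(split(value, ",")))

# Print the SQL that will be sent to Spark.
show_query(result)
```

Calling `collect(result)` would bring the rows back into a local data frame for comparison with the tidyr output.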
Please note that lateral views have to be expressed as separate subqueries, so this:
sdt %>%
mutate(
name = explode(split(name, ",")),
value = explode(split(value, ",")))
will not work.
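The question also asks for a function that loops over many columns. The following is only a sketch of how the one-explode-per-mutate pattern could be generalized; `explode_columns` is a hypothetical helper, not part of sparklyr, and it assumes every listed column is a delimited string column. Because each exploded column multiplies the row count, applying this to a large number of columns may produce a very large result.

```r
library(sparklyr)
library(dplyr)
library(rlang)

# Hypothetical helper (not part of sparklyr): explode each column in `cols`
# in its own mutate() step, mirroring the answer's one-explode-per-mutate form.
explode_columns <- function(tbl, cols, sep = ",") {
  for (col in cols) {
    tbl <- tbl %>%
      mutate(!!col := explode(split(!!sym(col), !!sep)))
  }
  tbl
}

# Usage with the question's sample data.
# Assumption: Spark is installed locally.
sc <- spark_connect(master = "local")

dt <- data.frame(
  id = c(1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3),
  name = c("A,B,C", "B,F", "C", "D,R,P", "E", "A,Q,W", "B,J", "C", "D,M", "E,X", "F,E"),
  value = c("1,2,3", "2,4,43,2", "3,1,2,3", "1", "1,2", "26,6,7", "3,3,4", "1", "1,12", "2,3,3", "3"),
  stringsAsFactors = FALSE
)
sdt <- copy_to(sc, dt, "dt", overwrite = TRUE)

# Explode every column except the id column.
cols_to_explode <- setdiff(colnames(sdt), "id")
exploded <- explode_columns(sdt, cols_to_explode)
```

The `!!` operators substitute the column name and separator immediately on each iteration, so the loop does not rely on the value of `col` at the time the query is finally executed.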