Unnest (separate) multiple column values into new rows using sparklyr


Question

I am trying to split comma-separated column values into new rows, keeping each row's id. I know how to do this in R using dplyr and tidyr, but I am looking to solve the same problem in sparklyr.

id <- c(1,1,1,1,1,2,2,2,3,3,3)
name <- c("A,B,C","B,F","C","D,R,P","E","A,Q,W","B,J","C","D,M","E,X","F,E")
value <- c("1,2,3","2,4,43,2","3,1,2,3","1","1,2","26,6,7","3,3,4","1","1,12","2,3,3","3")
dt <- data.frame(id,name,value)

R solution:

library(dplyr)  # provides %>%
library(tidyr)  # provides separate_rows()
separate_rows(dt, name, sep=",") %>%
  separate_rows(value, sep=",")

Desired output as a Spark DataFrame (sparklyr package):

> final_result
   id name value
1   1    A     1
2   1    A     2
3   1    A     3
4   1    B     1
5   1    B     2
6   1    B     3
7   1    C     1
8   1    C     2
9   1    C     3
10  1    B     2
11  1    B     4
12  1    B    43
13  1    B     2
14  1    F     2
15  1    F     4
16  1    F    43
17  1    F     2
18  1    C     3
19  1    C     1
20  1    C     2
21  1    C     3
22  1    D     1
23  1    R     1
24  1    P     1
25  1    E     1
26  1    E     2
27  2    A    26
28  2    A     6
29  2    A     7
30  2    Q    26
31  2    Q     6
32  2    Q     7
33  2    W    26
34  2    W     6
35  2    W     7
36  2    B     3
37  2    B     3
38  2    B     4
39  2    J     3
40  2    J     3
41  2    J     4
42  2    C     1
43  3    D     1
44  3    D    12
45  3    M     1
46  3    M    12
47  3    E     2
48  3    E     3
49  3    E     3
50  3    X     2
51  3    X     3
52  3    X     3
53  3    F     3
54  3    E     3

Note:

  1. I have approximately 1000 columns with nested values, so I need a function that can loop over each column.
  2. I know there is an sdf_unnest() function in the sparklyr.nested package, but I am not sure how to split the strings of multiple columns and apply this function. I am quite new to sparklyr.

Any help would be appreciated.

Recommended answer

You have to combine explode and split:

sdt %>% 
  mutate(name = explode(split(name, ","))) %>% 
  mutate(value = explode(split(value, ",")))



# Source:   lazy query [?? x 3]
# Database: spark_connection
      id name  value
   <dbl> <chr> <chr>
 1  1.00 A     1    
 2  1.00 A     2    
 3  1.00 A     3    
 4  1.00 B     1    
 5  1.00 B     2    
 6  1.00 B     3    
 7  1.00 C     1    
 8  1.00 C     2    
 9  1.00 C     3    
10  1.00 B     2   
# ... with more rows   
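
For the many-column case mentioned in the question, the same explode/split step can be applied in a loop, one mutate() call per column. The following is a minimal sketch and not part of the original answer: explode_columns() is a hypothetical helper, the connection assumes a local Spark installation, and it relies on each mutate() call being rendered as its own subquery, which can be checked with show_query().

```r
library(sparklyr)
library(dplyr)

# Assumption: Spark is installed locally and reachable with master = "local".
sc <- spark_connect(master = "local")

# Sample data from the question.
id <- c(1,1,1,1,1,2,2,2,3,3,3)
name <- c("A,B,C","B,F","C","D,R,P","E","A,Q,W","B,J","C","D,M","E,X","F,E")
value <- c("1,2,3","2,4,43,2","3,1,2,3","1","1,2","26,6,7","3,3,4","1","1,12","2,3,3","3")
dt <- data.frame(id, name, value, stringsAsFactors = FALSE)

# Copy the local data frame to Spark; "sdt" is an arbitrary table name.
sdt <- copy_to(sc, dt, "sdt", overwrite = TRUE)

# Hypothetical helper: explode every column listed in `cols`, one at a time.
# Each column gets its own mutate() call so that only one explode()
# appears per generated SELECT.
explode_columns <- function(tbl, cols, sep = ",") {
  for (col in cols) {
    col_sym <- rlang::sym(col)
    tbl <- mutate(tbl, !!col := explode(split(!!col_sym, !!sep)))
  }
  tbl
}

exploded <- explode_columns(sdt, c("name", "value"))
show_query(exploded)          # inspect the generated Spark SQL
result <- collect(exploded)   # bring the result back into R

spark_disconnect(sc)
```

Because the columns are exploded one after another, the result is the cross product of the split values within each original row, as in the desired output above. With a very large number of columns the row count can grow quickly, so it may be worth testing on a few columns first.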

Please note that lateral views have to be expressed as separate subqueries, so this:

sdt %>% 
  mutate(
    name = explode(split(name, ",")),
     value = explode(split(value, ",")))

won't work.
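
The restriction reflects how Spark SQL treats generator functions such as explode: only one is allowed per SELECT clause. As an illustration, the same result can be written directly in Spark SQL with one LATERAL VIEW per column. This is a sketch under assumptions, not the answer's method: the table name "sdt", the aliases t1, t2, n and v, the small sample data, and the local Spark connection are all chosen here for the example.

```r
library(sparklyr)
library(DBI)

# Assumption: Spark is installed locally and reachable with master = "local".
sc <- spark_connect(master = "local")

# Small illustrative data set (not the question's full data).
small <- data.frame(
  id    = c(1, 2),
  name  = c("A,B", "C"),
  value = c("1,2,3", "4"),
  stringsAsFactors = FALSE
)
dplyr::copy_to(sc, small, "sdt", overwrite = TRUE)

# One LATERAL VIEW per exploded column; n and v are the exploded values.
query <- "
  SELECT id, n AS name, v AS value
  FROM sdt
  LATERAL VIEW explode(split(name, ',')) t1 AS n
  LATERAL VIEW explode(split(value, ',')) t2 AS v
"
lateral_result <- dbGetQuery(sc, query)

spark_disconnect(sc)
```

Writing the SQL by hand makes the one-generator-per-step structure explicit, while chaining separate mutate() calls leaves the query construction to dplyr.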
