Wide to long data table transformation with variables in columns and rows


Question

I have a csv containing multiple tables, with variables stored in both rows and columns.

About this csv:


  1. I want to go from wide to long

  2. There are multiple data frames in one csv

  3. Each data frame has different variable types







> df3
     V1          V2    V3     V4      V5     V6      V7    V8
1   nyc 123 main st month      1       2      3       4     5
2   nyc 123 main st     x  58568  567567 567909   35876 56943
3   nyc 123 main st     y   5345    3673   3453    3467   788
4   nyc 123 main st     z  53223  563894 564456   32409 56155
5                                                            
6    la  63 main st month      1       2      3       4     5
7    la  63 main st     a  87035 7467456   3363     863 43673
8    la  63 main st     b    345     456    345     678   345
9    la  63 main st     c  86690 7467000   3018     185 43328
10                                                           
11   sf 953 main st month      1       2      3       4     5
12   sf 953 main st     x 457456    3455 345345   56457  3634
13   sf 953 main st     b   5345    3673   3453    3467   788
14   sf 953 main st     z 452111    -218 341892   52990  2846







> df4
city     address month      x       y      z       a     b       c
 nyc 123 main st     1  58568    5345  53223    null  null    null
 nyc 123 main st     2 567567    3673 563894    null  null    null
 nyc 123 main st     3 567909    3453 564456    null  null    null
 nyc 123 main st     4  35876    3467  32409    null  null    null
 nyc 123 main st     5  56943     788  56155    null  null    null
  la  63 main st     1   null    null   null   87035   345   86690
  la  63 main st     2   null    null   null 7467456   456 7467000
  la  63 main st     3   null    null   null    3363   345    3018
  la  63 main st     4   null    null   null     863   678     185
  la  63 main st     5   null    null   null   43673   345   43328
  sf 953 main st     1 457456    null 452111    null  5345    null
  sf 953 main st     2   3455    null   -218    null  3673    null
  sf 953 main st     3 345345    null 341892    null  3453    null
  sf 953 main st     4  56457    null  52990    null  3467    null
  sf 953 main st     5   3634    null   2846    null   788    null

The top is the data I have; the bottom is the transformation I want.

I'm most comfortable in R, but I'm practicing Python, so any approach works.

Recommended answer

It would help first if your df had proper column names; please assign column names once you have read in the data.
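Since the question also mentions Python, the naming step could look roughly like the sketch below. The sample text, the `FIELDNAMES` list, and the `X1`..`X5` names are illustrative assumptions (they mirror the names used in the R data frame in this answer), not something taken from the actual csv.

```python
import csv
import io

# Hypothetical excerpt of the csv (no header row), matching the nyc block of df3.
SAMPLE = """\
nyc,123 main st,month,1,2,3,4,5
nyc,123 main st,x,58568,567567,567909,35876,56943
nyc,123 main st,y,5345,3673,3453,3467,788
"""

# Assumed column names: the first three are descriptive, X1..X5 are placeholders.
FIELDNAMES = ["city", "address", "month", "X1", "X2", "X3", "X4", "X5"]

# Passing fieldnames makes DictReader treat every line as data, not as a header.
named_rows = list(csv.DictReader(io.StringIO(SAMPLE), fieldnames=FIELDNAMES))

# Values are still strings at this point; convert them later as needed.
print(named_rows[1]["month"], named_rows[1]["X1"])
```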

I used the dplyr, tidyr, and stringr libraries for this analysis (gather and spread come from tidyr), and also renamed the first three columns:

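library(dplyr)    # %>%, mutate, select
library(tidyr)    # gather, spread
library(stringr)  # str_sub

# The nyc block of the data, with the first three columns renamed
df <- data.frame(stringsAsFactors=FALSE,
        city = c("nyc", "nyc", "nyc"),
     address = c("123 main st", "123 main st", "123 main st"),
       month = c("x", "y", "z"),
          X1 = c(58568L, 5345L, 53223L),
          X2 = c(567567L, 3673L, 563894L),
          X3 = c(567909L, 3453L, 564456L),
          X4 = c(35876L, 3467L, 32409L),
          X5 = c(56943L, 788L, 56155L)
)

df %>% gather(Type, Value, -c(city:month)) %>%   # wide to long: one row per variable and X column
        spread(month, Value) %>%                 # the x, y, z labels become columns
        mutate(month = str_sub(Type, 2, 2)) %>%  # "X1" -> "1"
        select(-Type) %>%
        select(c(city, address, month, x:z))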

  city     address month      x    y      z
1  nyc 123 main st     1  58568 5345  53223
2  nyc 123 main st     2 567567 3673 563894
3  nyc 123 main st     3 567909 3453 564456
4  nyc 123 main st     4  35876 3467  32409
5  nyc 123 main st     5  56943  788  56155
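
The R code above reshapes only the nyc block. Since the question also allows Python, here is a minimal sketch of how the same idea might extend to every block in df3, using only the standard library. The raw text layout (comma-separated, no header row, blank separator rows between blocks) and the function name `wide_to_long` are assumptions for illustration. Combinations that do not exist come out as `None`, corresponding to the null cells in df4.

```python
import csv
import io

# Assumed csv layout mirroring df3: no header row, blank rows between blocks.
RAW = """\
nyc,123 main st,month,1,2,3,4,5
nyc,123 main st,x,58568,567567,567909,35876,56943
nyc,123 main st,y,5345,3673,3453,3467,788
nyc,123 main st,z,53223,563894,564456,32409,56155
,,,,,,,
la,63 main st,month,1,2,3,4,5
la,63 main st,a,87035,7467456,3363,863,43673
la,63 main st,b,345,456,345,678,345
la,63 main st,c,86690,7467000,3018,185,43328
,,,,,,,
sf,953 main st,month,1,2,3,4,5
sf,953 main st,x,457456,3455,345345,56457,3634
sf,953 main st,b,5345,3673,3453,3467,788
sf,953 main st,z,452111,-218,341892,52990,2846
"""


def wide_to_long(text):
    """Return (columns, rows) with one row per city/address/month."""
    records = {}    # (city, address, month) -> {variable: value}, in insertion order
    variables = []  # variable names in order of first appearance
    months = []     # month labels of the block currently being read
    for row in csv.reader(io.StringIO(text)):
        if not any(cell.strip() for cell in row):
            continue  # blank separator row between blocks
        city, address, name, *values = (cell.strip() for cell in row)
        if name == "month":
            months = [int(v) for v in values]  # header row of a new block
            continue
        if name not in variables:
            variables.append(name)
        for month, value in zip(months, values):
            records.setdefault((city, address, month), {})[name] = int(value)
    columns = ["city", "address", "month"] + variables
    rows = [list(key) + [vals.get(v) for v in variables]
            for key, vals in records.items()]
    return columns, rows


columns, rows = wide_to_long(RAW)
print(columns)  # ['city', 'address', 'month', 'x', 'y', 'z', 'a', 'b', 'c']
print(rows[0])  # ['nyc', '123 main st', 1, 58568, 5345, 53223, None, None, None]
```

With pandas, the same reshaping would usually be expressed with `melt` followed by `pivot`; the plain-Python version is shown only to keep the sketch free of dependencies.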
