Reframing magic on data.frame

Views: 401 | Published: 2017/3/26 0:01:35 | Tags: r, dataframe, reshape


Problem description


I am currently learning to work with data.frames and am quite confused about how to reshape them.

At the moment, I have a data.frame that shows:

column 1: a shop name
column 2: a product
column 3: the number of purchases of this product at this shop

or visually something like this:

+---+-----------+-------+----------+
|   | Shop.Name | Items | Product  |
+---+-----------+-------+----------+
| 1 | Shop1     |     2 | Product1 |
| 2 | Shop1     |     4 | Product2 |
| 3 | Shop2     |     3 | Product1 |
| 4 | Shop3     |     2 | Product1 |
| 5 | Shop3     |     1 | Product4 |
+---+-----------+-------+----------+

What I would like to achieve is the following "shop-centric" structure:

column 1: a shop name
column 2: Items sold for product1
column 3: Items sold for product2
column 4: Items sold for product3 ...

When there is no line for a specific shop/product (because of no sales), I would like to create a 0.

+---+-------+-------+-------+-------+-------+-----+
|   | Shop  | Prod1 | Prod2 | Prod3 | Prod4 | ... |
+---+-------+-------+-------+-------+-------+-----+
| 1 | Shop1 |     2 |     4 |     0 |     0 | ... |
| 2 | Shop2 |     3 |     0 |     0 |     0 | ... |
| 3 | Shop3 |     2 |     0 |     0 |     1 | ... |
+---+-------+-------+-------+-------+-------+-----+

Thanks a lot in advance!

Solution

The answers so far work to a certain degree, but don't fully answer your question. In particular, they don't handle the case in which no shop sold a particular product. In your example input and desired output, no shop sold "Product3". Indeed, "Product3" does not even appear in your source data.frame. Additionally, they do not address the possibility of having more than one row for a given Shop + Product combination.

Here's a modified version of your data and the two solutions so far. I've added another row for a combination of "Shop1" and "Product1". Notice that I have converted your products to a factor variable that includes the levels that the variable can take, even if none of the cases actually has that level.

mydf <- data.frame(
  Shop.Name = c("Shop1", "Shop1", "Shop2", "Shop3", "Shop3", "Shop1"),
  Items = c(2, 4, 3, 2, 1, 2),
  Product = factor(
    c("Product1", "Product2", "Product1", "Product1", "Product4", "Product1"),
    levels = c("Product1", "Product2", "Product3", "Product4")))
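
To see what the explicit levels buy you, here is a minimal sketch on a hypothetical vector (the name product is mine, not from the question). A factor remembers levels that never occur, so tabulating it still reports them, with a count of zero:

```r
# Hypothetical vector for illustration: no element is "Product3"
product <- factor(
  c("Product1", "Product2", "Product4"),
  levels = c("Product1", "Product2", "Product3", "Product4"))

levels(product)   # all four levels, including the unused "Product3"
table(product)    # the unused level is tabulated with a count of 0
```

This is what lets the reshaping functions below produce a "Product3" column even though no row mentions it.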

dcast from "reshape2"

library(reshape2)
dcast(mydf, formula = Shop.Name ~ Product, value="Items", fill=0)
# Using Product as value column: use value.var to override.
# Aggregation function missing: defaulting to length
# Error in .fun(.value[i], ...) : 
#   2 arguments passed to 'length' which requires 1

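Wha? Suddenly it does not work. dcast takes value.var rather than value, so the stray value = "Items" appears to be handed on to the default aggregation function, length, which is what the error complains about. Do this instead, with sum as the aggregation function (there is more than one row for Shop1 + Product1) and drop = FALSE to keep the empty "Product3" level: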

dcast(mydf, formula = Shop.Name ~ Product, 
      fill = 0, value.var = "Items", 
      fun.aggregate = sum, drop = FALSE)
#   Shop.Name Product1 Product2 Product3 Product4
# 1     Shop1        4        4        0        0
# 2     Shop2        3        0        0        0
# 3     Shop3        2        0        0        1
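
As a quick check of why an aggregation function is needed at all, this base R sketch looks for repeated Shop + Product pairs and sums their Items. The data is repeated from above so the snippet runs on its own, and agg is just an illustrative name:

```r
mydf <- data.frame(
  Shop.Name = c("Shop1", "Shop1", "Shop2", "Shop3", "Shop3", "Shop1"),
  Items = c(2, 4, 3, 2, 1, 2),
  Product = factor(
    c("Product1", "Product2", "Product1", "Product1", "Product4", "Product1"),
    levels = c("Product1", "Product2", "Product3", "Product4")))

# Non-zero when some Shop.Name + Product pair occurs more than once
anyDuplicated(mydf[c("Shop.Name", "Product")])

# One row per pair that actually occurs, with Items summed
agg <- aggregate(Items ~ Shop.Name + Product, data = mydf, FUN = sum)
agg
```

Note that aggregate, as called here, only returns the combinations that occur in the data, so the zero cells still have to come from the reshaping step.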

Let's be oldschool. cast from "reshape"

library(reshape)
cast(mydf, formula = Shop.Name ~ Product, value="Items", fill=0)
# Aggregation requires fun.aggregate: length used as default
#   Shop.Name Product1 Product2 Product4
# 1     Shop1        2        1        0
# 2     Shop2        1        0        0
# 3     Shop3        1        0        1

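Eh. Not what you wanted again... Without fun.aggregate, cast falls back to length, so it counts rows instead of summing Items, and the "Product3" column is missing. Try this instead: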

cast(mydf, formula = Shop.Name ~ Product, 
     value = "Items", fill = 0, 
     add.missing = TRUE, fun.aggregate = sum)
#   Shop.Name Product1 Product2 Product3 Product4
# 1     Shop1        4        4        0        0
# 2     Shop2        3        0        0        0
# 3     Shop3        2        0        0        1

Let's get back to basics. xtabs from base R

xtabs(Items ~ Shop.Name + Product, mydf)
#          Product
# Shop.Name Product1 Product2 Product3 Product4
#     Shop1        4        4        0        0
#     Shop2        3        0        0        0
#     Shop3        2        0        0        1

Or, if you prefer a data.frame (note that your "Shop.Name" variable has been converted to the row.names of the data.frame):

as.data.frame.matrix(xtabs(Items ~ Shop.Name + Product, mydf))
#       Product1 Product2 Product3 Product4
# Shop1        4        4        0        0
# Shop2        3        0        0        0
# Shop3        2        0        0        1
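
If you would rather have the shop names as a regular column, as in your desired output, one possible follow-up is sketched below. The data is repeated so the snippet runs on its own, and the names tab and wide are only for illustration:

```r
mydf <- data.frame(
  Shop.Name = c("Shop1", "Shop1", "Shop2", "Shop3", "Shop3", "Shop1"),
  Items = c(2, 4, 3, 2, 1, 2),
  Product = factor(
    c("Product1", "Product2", "Product1", "Product1", "Product4", "Product1"),
    levels = c("Product1", "Product2", "Product3", "Product4")))

tab  <- xtabs(Items ~ Shop.Name + Product, mydf)
wide <- as.data.frame.matrix(tab)

# Move the row names into a proper first column, then reset the row names
wide <- cbind(Shop.Name = rownames(wide), wide)
rownames(wide) <- NULL
wide
```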
