Reframing magic on data.frame
Problem description
I am currently learning to work with data.frames and am quite confused about how to reorder them.
At the moment, I have a data.frame that shows:
- column 1: a shop name
- column 2: a product
- column 3: the number of purchases of this product at this shop
or visually something like this:
+---+-----------+-------+----------+--+
| | Shop.Name | Items | Product | |
+---+-----------+-------+----------+--+
| 1 | Shop1 | 2 | Product1 | |
| 2 | Shop1 | 4 | Product2 | |
| 3 | Shop2 | 3 | Product1 | |
| 4 | Shop3 | 2 | Product1 | |
| 5 | Shop3 | 1 | Product4 | |
+---+-----------+-------+----------+--+
What I would like to achieve is the following "shop-centric" structure:
- column 1: a shop name
- column 2: Items sold for product1
- column 3: Items sold for product2
- column 4: Items sold for product3 ...
When there is no line for a specific shop/product (because of no sales), I would like to create a 0.
or
+---+-------+-------+-------+-------+-------+-----+--+--+
| | Shop | Prod1 | Prod2 | Prod3 | Prod4 | ... | | |
+---+-------+-------+-------+-------+-------+-----+--+--+
| 1 | Shop1 | 2 | 4 | 0 | 0 | ... | | |
| 2 | Shop2 | 3 | 0 | 0 | 0 | ... | | |
| 3 | Shop3 | 2 | 0 | 0 | 1 | ... | | |
+---+-------+-------+-------+-------+-------+-----+--+--+
Thanks a lot in advance!
The answers so far work to a certain degree, but don't fully answer your question. In particular, they don't address the case in which no shop sold a particular product. In your example input and desired output, no shop sold "Product3". Indeed, "Product3" does not even appear in your source data.frame. Additionally, they do not address the possible situation of having more than one row for each Shop + Product combination.
Here's a modified version of your data and the two solutions so far. I've added another row for a combination of "Shop1" and "Product1". Notice that I have converted your products to a factor variable that includes the levels that the variable can take, even if none of the cases actually has that level.
mydf <- data.frame(
Shop.Name = c("Shop1", "Shop1", "Shop2", "Shop3", "Shop3", "Shop1"),
Items = c(2, 4, 3, 2, 1, 2),
Product = factor(
c("Product1", "Product2", "Product1", "Product1", "Product4", "Product1"),
levels = c("Product1", "Product2", "Product3", "Product4")))
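As a small illustration of why the explicit levels matter, here is a minimal sketch on a made-up vector (the name `product` and its values are hypothetical, not the data from the question). It shows that base R keeps a declared but unused level around, so tabulating the factor still reports it with a count of zero, while `droplevels()` would discard it.

```r
# Hypothetical mini-example, not the data from the question
product <- factor(
  c("Product1", "Product2", "Product4"),
  levels = c("Product1", "Product2", "Product3", "Product4"))

# All four declared levels are kept, including the unused "Product3"
levels(product)

# Tabulating the factor reports the unused level with a count of 0
counts <- table(product)
counts

# droplevels() removes levels that no observation uses
dropped <- droplevels(product)
levels(dropped)
```

This is what lets the reshaping functions below produce a "Product3" column at all: the level exists in the factor even though no row uses it.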
dcast from "reshape2"

library(reshape2)
dcast(mydf, formula = Shop.Name ~ Product, value="Items", fill=0)
# Using Product as value column: use value.var to override.
# Aggregation function missing: defaulting to length
# Error in .fun(.value[i], ...) :
#   2 arguments passed to 'length' which requires 1
Wha? Suddenly does not work. Do this instead:
dcast(mydf, formula = Shop.Name ~ Product,
      fill = 0, value.var = "Items",
      fun.aggregate = sum, drop = FALSE)
#   Shop.Name Product1 Product2 Product3 Product4
# 1     Shop1        4        4        0        0
# 2     Shop2        3        0        0        0
# 3     Shop3        2        0        0        1
Let's be oldschool.
cast from "reshape"

library(reshape)
cast(mydf, formula = Shop.Name ~ Product, value="Items", fill=0)
# Aggregation requires fun.aggregate: length used as default
#   Shop.Name Product1 Product2 Product4
# 1     Shop1        2        1        0
# 2     Shop2        1        0        0
# 3     Shop3        1        0        1
Eh. Not what you wanted again... Try this instead:
cast(mydf, formula = Shop.Name ~ Product,
     value = "Items", fill = 0,
     add.missing = TRUE, fun.aggregate = sum)
#   Shop.Name Product1 Product2 Product3 Product4
# 1     Shop1        4        4        0        0
# 2     Shop2        3        0        0        0
# 3     Shop3        2        0        0        1
Let's get back to basics.
xtabs from base R

xtabs(Items ~ Shop.Name + Product, mydf)
#          Product
# Shop.Name Product1 Product2 Product3 Product4
#     Shop1        4        4        0        0
#     Shop2        3        0        0        0
#     Shop3        2        0        0        1
Or, if you prefer a data.frame (note that your "Shop.Name" variable has been converted to the row.names of the data.frame):

as.data.frame.matrix(xtabs(Items ~ Shop.Name + Product, mydf))
#       Product1 Product2 Product3 Product4
# Shop1        4        4        0        0
# Shop2        3        0        0        0
# Shop3        2        0        0        1
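If you would rather have "Shop.Name" as an ordinary first column, as in the layout requested in the question, one possible follow-up is to copy the row names into a column and reset them. The sketch below uses only base R and the same `mydf` as above; the object name `wide` is my own choice for illustration.

```r
# Same example data as above, repeated so the snippet runs on its own
mydf <- data.frame(
  Shop.Name = c("Shop1", "Shop1", "Shop2", "Shop3", "Shop3", "Shop1"),
  Items = c(2, 4, 3, 2, 1, 2),
  Product = factor(
    c("Product1", "Product2", "Product1", "Product1", "Product4", "Product1"),
    levels = c("Product1", "Product2", "Product3", "Product4")))

# Cross-tabulate (summing Items) and convert to a data.frame;
# at this point the shop names are only stored as row names
wide <- as.data.frame.matrix(xtabs(Items ~ Shop.Name + Product, mydf))

# Copy the row names into a regular column; passing row.names = NULL
# explicitly resets the row names to 1, 2, 3, ...
wide <- data.frame(Shop.Name = rownames(wide), wide, row.names = NULL)
wide
```

The result has one row per shop, a "Shop.Name" column, and one column per product level, including the all-zero "Product3".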