Column with multiple values in data.frame
I would like to make a data.frame in R in which some columns hold multiple values (the same number of values in every row). For example, here is a data frame with two columns, cars and price; note that the price column has three values in each row.
cars price
F 1000,2000,3000
GM 2000, 500, 1000
The second question:
Now I want to apply the same function to each value in the price column. How can I do that? Let's say I want to create another column holding the doubled values of the price column.
data.frames are simply lists, and as such, they can also be lists of lists.
cars <- c("FORD", "GM")
price <- list( c(1000, 2000, 3000), c(2000, 500, 1000))
myDF <- data.frame(cars=cars, price=cbind(price))
myDF
# cars price
# 1 FORD 1000, 2000, 3000
# 2 GM 2000, 500, 1000
Then, to execute a function on all values of price in a given row:
# execute on ALL PRICES at once
mean(unlist(myDF$price))
# [1] 1583.333
# execute on each set of PRICES per row:
lapply(myDF$price, mean)
# [[1]]
# [1] 2000
#
# [[2]]
# [1] 1166.667
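The doubling asked about in the second question can be handled the same way. The following is a minimal sketch, not the construction used above: it attaches the list column with `$<-` (an assumption made here to keep the example self-contained), and the column name `doubled` is made up for illustration.

```r
# Rebuild the example data. The list column is attached with `$<-`,
# which is an assumption of this sketch, not the construction shown above.
cars  <- c("FORD", "GM")
price <- list(c(1000, 2000, 3000), c(2000, 500, 1000))
myDF  <- data.frame(cars = cars)
myDF$price <- price

# Double every price, row by row; the result is again a list column.
# `doubled` is a hypothetical column name.
myDF$doubled <- lapply(myDF$price, function(p) p * 2)

myDF$doubled[[1]]
myDF$doubled[[2]]
```

Each element of `myDF$doubled` is a numeric vector of the same length as the matching element of `myDF$price`.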
That being said, I would recommend against this approach.
It gets cumbersome and there are usually better ways to accomplish the same goal.
One alternative is to simply use the price list as your dataset and name the elements according to the cars column:
names(price) <- cars
price
# $FORD
# [1] 1000 2000 3000
#
# $GM
# [1] 2000 500 1000
In this case, your *ply statements would have the names of the cars already assigned to them and it would be slightly less typing:
lapply(price, mean)
# $FORD
# [1] 2000
#
# $GM
# [1] 1166.667
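If a plain named vector is more convenient than a list, `vapply` may be an option. This is only a sketch: the variable names `avg` and `doubled` are introduced here for illustration and do not appear in the answer.

```r
cars  <- c("FORD", "GM")
price <- list(c(1000, 2000, 3000), c(2000, 500, 1000))
names(price) <- cars

# vapply() checks that each result is a single number and
# returns a numeric vector named after the list elements.
avg <- vapply(price, mean, numeric(1))
avg[["FORD"]]

# Doubling every value keeps the car names on the resulting list as well.
doubled <- lapply(price, function(p) p * 2)
doubled$GM
```

Because the names travel with the list, results can be looked up by car name instead of by position.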
Another alternative is to use a long data.frame or data.table:
# transforming to long:
# sapply() returns an integer vector of lengths, which is what rep() expects for times
myDF <- data.frame("cars"=rep(cars, times=sapply(price, length)), "price"=unlist(price, use.names=FALSE))
myDF
Then you can use the by function to execute functions across all prices in a group:
by(data=myDF$price, INDICES=myDF$cars, FUN=mean)
# or using with:
with(myDF, by(price, cars, mean))
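In the long layout, the doubling from the second question becomes an ordinary vectorised column operation. The sketch below assumes the same example data; `longDF`, `doubled` and `avg` are names chosen here for illustration, and `tapply` is swapped in as one possible base R alternative to `by`.

```r
cars  <- c("FORD", "GM")
price <- list(c(1000, 2000, 3000), c(2000, 500, 1000))

# One row per (car, price) pair.
longDF <- data.frame(cars  = rep(cars, times = sapply(price, length)),
                     price = unlist(price, use.names = FALSE))

# Doubling is plain vectorised arithmetic on the column.
longDF$doubled <- longDF$price * 2

# tapply() computes the per-group mean, indexed by car name.
avg <- tapply(longDF$price, longDF$cars, mean)
avg["FORD"]
```

No list columns are involved here, so the usual column-wise tools apply directly.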
Here is the same approach, but using data.table (which has by built in):
library(data.table)
myDT <- data.table(myDF, key="cars")
myDT[, mean(price), by=cars]
# cars V1
# 1: FORD 2000.000
# 2: GM 1166.667