Add selection criteria to read.table
Question
Let's take the following simplified version of a dataset that I import using read.table:
# Build the example data frame directly
df <- data.frame(Sex = c("M", "M", "F", "F", "F"),
                 Age = c(25, 22, 33, 17, 18))
In reality my dataset is extremely large and I'm only interested in a small proportion of the data, i.e. the data concerning females aged 18 or under. In the example above this would be just the last two observations.
My question is: can I import just these observations directly, without importing the rest of the data and then using subset to refine it? My computer's capacity is limited, so I have been using scan to import my data in chunks, but this is extremely time-consuming.
Is there a better solution?
Recommended answer
This is almost the same as @Drew75's answer, but I'm including it to illustrate some gotchas with SQLite:
# example: large-ish data.frame
df <- data.frame(Sex = sample(c("M", "F"), 1e6, replace = TRUE),
                 Age = sample(18:75, 1e6, replace = TRUE))
write.csv(df, "myData.csv", quote = FALSE, row.names = FALSE)  # note: non-quoted strings
library(sqldf)
myData <- read.csv.sql(file="myData.csv", # looks for char M (no quotes)
sql="select * from file where Sex='M'", eol = "\n")
nrow(myData)
# [1] 500127
write.csv(df, "myData.csv", row.names = FALSE)  # quoted strings...
myData <- read.csv.sql(file="myData.csv", # this fails: the stored values include the quotes
sql="select * from file where Sex='M'", eol = "\n")
nrow(myData)
# [1] 0
myData <- read.csv.sql(file="myData.csv", # need quotes in the char literal
sql="select * from file where Sex='\"M\"'", eol = "\n")
nrow(myData)
# [1] 500127
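Applied to the criteria in the question (females aged 18 or under), the same approach might look like the sketch below. It assumes the file was written with unquoted strings and that the columns are named Sex and Age as in the example; the sample data, the temporary file and the names csv_path, young_f and expected are made up for illustration.

```r
library(sqldf)

# Hypothetical sample file; column names follow the question's example
set.seed(1)
df <- data.frame(Sex = sample(c("M", "F"), 1000, replace = TRUE),
                 Age = sample(15:75, 1000, replace = TRUE))
csv_path <- tempfile(fileext = ".csv")
write.csv(df, csv_path, quote = FALSE, row.names = FALSE)  # unquoted strings

# Filter while importing: only rows matching the WHERE clause are returned to R
young_f <- read.csv.sql(file = csv_path,
                        sql = "select * from file where Sex = 'F' and Age <= 18",
                        eol = "\n")

# Cross-check against filtering the full data frame in memory
expected <- subset(df, Sex == "F" & Age <= 18)
nrow(young_f) == nrow(expected)
```

The cross-check against subset is only there to confirm the WHERE clause on a small file; with a genuinely large file you would skip it, since the point is to avoid loading the full data into R.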