How to read head(-er)less ASCII, identify specific rows using R?


Question

Doing this in Excel using VBA is straightforward; however, for extended data analysis and plotting, R has to be used. The instrument's software only outputs .csv files, without headers. Each file has to be read separately for analysis.

What I am trying to do (need to do):

  • Read multiple .csv files without headers (4 columns). The files can be distinguished only by name.
  • Pick a few specific rows from each file, e.g. the row whose first-column value is 101, the row whose first-column value is 201, etc. Each file has only 500 rows. I sense a loop is needed somewhere here to read the specific rows(?)
  • Store the above values in one data frame for all the files
  • Calculate some stats on it
  • Plot the stat results
  • Finally, output the data frame, stat analysis and plot to Excel / .csv

I have already worked out how to read a .csv using R, store it in a data frame, analyse the data frame items and plot them. This process flow has to be repeated daily for each instrument run. I run two instruments. I want to save a code snippet that can be run each time.

The only issue I am facing is that I am not sure how to locate those specific rows. The next issue is how to go about the output to Excel.

.csv input file format:

V1, V2, V3, V4
numeric (0 decimals), numeric (with 4 decimals), alpha num, alpha num
001, 12.8975, XY03, XY05
...
485, 89.2134, XY02, XY7

Update 1: Due to some changes in the instrument software, the files are now in ASCII format instead of CSV. This brings many limitations. I am having trouble installing the raster package to support ASCII, so I am using the following instead. I installed the sqldf package to support SQL; however, it gives the warning shown below.

> fl1 <- read.table("~/rcodes/dt07042017/fileone.ASC", sep=",") 
--read ok
> sv1 <- sqldf('select FNAME "F1", V2, V3, V4 from fl1 where V1 in (101, 201, 301)') 
Warning message: Quoted identifiers should have class SQL, use DBI::SQL() 
if the caller performs the quoting.

Then I installed the DBI package. I am not sure what it does, but the SQL worked. This is just one file; I need to read thousands of files, pick specific row values and save them into one table/data frame. I wrote the following broken lines. They fail because I can't find a way to inject the file name.

> df <- NULL
> fn <- NULL
> n <- 1
> for (f in files) {
+     fn <- file_path_sans_ext(f)
+     df0 <-read.table(f, sep=",")
+     n <- n + 1
+     df <- sqldf('select n, fn, V1, V2 from df0 
where V1 IN (101, 201, 301, 401)') 
-- thought R could read fn just as df0
+     df <- rbind(df) --further fn and n only reads two files
+ }
Error in rsqlite_send_query(conn@ptr, statement) : 
no such column: n, df1

I am not quite happy using SQL in R. Is there any other way in R to do this without SQL packages and get the above done?

Instead of getting the individual values for rows 2:20 (19 values) in that column, I want to get the mean of rows 2 to 20. How can I do that?

# set the working directory to the folder where the csv files are located
files <- list.files(pattern = "\\.csv$")
m <- data.frame()
for (k in seq_along(files)) {
  csv <- read.csv(files[k], header = FALSE)
  # pick rows 2:20 (consecutive) plus rows 50, 120, 150 and so on; columns 1 and 2
  data <- csv[c(2:20, 50, 120, 150, 175, 200), c(1, 2)]
  # pivot (transpose col/row) attempt: data <- as.data.frame(t(data))
  # but that line screwed up the data
  # when the selected values contain NA/blanks
  data$file <- files[k]
  m <- rbind(m, data)
}
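
One way to get a single mean for rows 2 to 20 instead of the individual values is to summarise that block per file before binding the results. The following is only a minimal sketch, assuming column 2 is numeric and every file has at least 200 rows; `summarise_file` and the `label` column are hypothetical names introduced for illustration.

```r
# Hypothetical helper: returns one row for the mean of the `block` rows
# and one row for each individually picked row in `single`.
# Assumes column 2 is numeric and the file has no header line.
summarise_file <- function(path,
                           block = 2:20,
                           single = c(50, 120, 150, 175, 200)) {
  csv <- read.csv(path, header = FALSE)
  data.frame(
    file  = basename(path),
    label = c(paste0("mean_", min(block), "_", max(block)),
              paste0("row_", single)),
    # na.rm = TRUE skips blank/NA cells when averaging the block
    V2    = c(mean(csv[block, 2], na.rm = TRUE), csv[single, 2])
  )
}

# Same file discovery as in the loop above; one summary per file, stacked.
files <- list.files(pattern = "\\.csv$")
m <- do.call(rbind, lapply(files, summarise_file))
```

Keeping the result in long form (one row per file and label) avoids the transpose step that caused trouble when some of the selected values were NA.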

Recommended answer

I don't quite get what you mean by "ASCII files". I hope that the following is sufficiently similar to your problem. Say that you have the following files:

File 1.csv

101 0.8117 AFWSK QSZDP
099 0.4594 MDKKK DHVAH
301 0.8730 HMRCR PQSMB
012 0.4705 OHHZL CIKVM
012 0.1335 KKEHQ YFNJU
201 0.2967 XFIPZ PMNYT
301 0.2470 WKLIV TBHZP
101 0.9182 SSEHT QYBUW
101 0.8076 VKRSO BUGJL
301 0.5861 PICFN UFTWC

File 2.csv

201 0.0767 FHWLL FZKQF
301 0.3176 HDBVJ EIUNJ
301 0.9606 LHODD XRYFE
301 0.8327 NKGVT LYTUJ
012 0.8073 KOBXD GBEFA
099 0.9414 ZCOCX ENFAV
101 0.3683 UTBUG RKIQU
012 0.1515 XXJZI MEWVI
012 0.0383 IUKQQ PAIFV
301 0.2304 GKIKJ GEOZT

stored in a directory called path/to/my/dir. Say that this directory also contains other files, but you know that the ones containing data are named in the [0-9]+.csv format (i.e. some number, followed by .csv). Of course you have a large number of them and their names may differ; since the question doesn't contain any hint about this, I'm just providing an example, hoping that you can adapt this step to your real problem.

Now:

#you get the name of all files. No manual typing.
filenames<-list.files(path="path/to/my/dir",pattern="[0-9]+\\.csv",full.names=TRUE)
#reading them all
dataFiltered<-sapply(filenames,function(x) {
                    y<-read.table(x)
                    y[y[[1]] %in% c(101, 201, 301),]},simplify=FALSE)
finalRes<-cbind(do.call(rbind,dataFiltered),
           fileOrigin=rep(names(dataFiltered),vapply(dataFiltered,nrow,1L)))

To get something like:

#    V1     V2    V3    V4 fileOrigin
#1  101 0.8117 AFWSK QSZDP      1.csv
#2  301 0.8730 HMRCR PQSMB      1.csv
#3  201 0.2967 XFIPZ PMNYT      1.csv
#4  301 0.2470 WKLIV TBHZP      1.csv
#5  101 0.9182 SSEHT QYBUW      1.csv
#6  101 0.8076 VKRSO BUGJL      1.csv
#7  301 0.5861 PICFN UFTWC      1.csv
#8  201 0.0767 FHWLL FZKQF      2.csv
#9  301 0.3176 HDBVJ EIUNJ      2.csv
#10 301 0.9606 LHODD XRYFE      2.csv
#11 301 0.8327 NKGVT LYTUJ      2.csv
#12 101 0.3683 UTBUG RKIQU      2.csv
#13 301 0.2304 GKIKJ GEOZT      2.csv

The above should work, but could be memory expensive. If you have a large number of files, I suggest reading them in batches and applying the above to each batch. This means that you don't pass the whole filenames object to sapply; instead you pass a subset of it each time until you are done.

Finally, rbind all the finalRes together.
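
The batching described above could look roughly like the sketch below. It is only an illustration under assumptions: `read_filtered`, `filter_in_batches`, the batch size of 100 and the output name `finalRes.csv` are hypothetical, and the files are assumed to be whitespace-separated with no header, as in the sample data.

```r
# Hypothetical helper: read one headerless file and keep only the rows
# whose first column is one of the wanted values.
read_filtered <- function(path, keep = c(101, 201, 301)) {
  y <- read.table(path)
  y <- y[y[[1]] %in% keep, ]
  y$fileOrigin <- rep(basename(path), nrow(y))  # also safe for 0 rows
  y
}

# Hypothetical helper: process the files batch by batch, then bind the
# per-batch results into one data frame.
filter_in_batches <- function(filenames, batch_size = 100) {
  batches <- split(filenames, ceiling(seq_along(filenames) / batch_size))
  per_batch <- lapply(unname(batches), function(batch) {
    do.call(rbind, lapply(batch, read_filtered))
  })
  res <- do.call(rbind, per_batch)
  rownames(res) <- NULL  # drop the row names inherited from each file
  res
}

filenames <- list.files(path = "path/to/my/dir", pattern = "[0-9]+\\.csv",
                        full.names = TRUE)
finalRes <- filter_in_batches(filenames)

# A plain .csv can be opened directly in Excel.
if (!is.null(finalRes)) write.csv(finalRes, "finalRes.csv", row.names = FALSE)
```

Only the filtered rows of each file are kept once it has been read, so the full contents of all files never sit in memory at the same time.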
