How to automatically plot many CSV files with the same number of rows and columns?

Question

I have many (more than 100) CSV files with the same table structure: in every file the table headers are in row 4, there are 6 columns, and the data run from row 5 to row 400001.

I need to draw these data as scatter plots, with the first column (40001 time units) on the x axis and the other columns as the Y values for different variables. Ideally I would be able to format the plot (colours, ranges, titles, legends, ...), read the CSV files in automatically, and export PNG, PDF, or anything else that might be useful. I have both Excel and R, but I don't know how to do this plotting efficiently. (Naming is also important: each plot should carry the name of its CSV file.)

Any ideas on how I can do this with less effort?

Thanks

Recommended answer

Your question is a bit light on specific detail, so I'm going to make some assumptions to get started on a kind of skeleton of an answer.

Let's make some fake CSV files to use as example data.

Set the working directory to the folder containing the data...

setwd("C:/my-csv-files")

Make 100 data frames of six columns by 500 rows (to keep things quick)...

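# draw all 2500 values needed for the 500 x 5 block instead of recycling 1000 of them
df <- lapply(1:100, function(i) data.frame(cbind(1:500, matrix(sample(1000, 2500, replace = TRUE), 500, 5))))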

Make 100 CSV files from these data frames in the working directory...

# invisible() suppresses the list of NULLs that lapply would otherwise print
invisible(lapply(seq_along(df), function(i) write.csv(df[[i]], file = paste("df", i, "csv", sep = "."))))

Now we can reproduce your problem and quickly read many CSV files into R like so...

# create a list of all CSV files in all the folders 
files <- (dir("C:/my-csv-files", recursive=TRUE, full.names=TRUE, pattern="\\.(csv|CSV)$"))
# read in the CSV files and add the filename of each file as a column to
# each dataset so we can trace back dodgy data 
# so, create a function to read the CSV and get filenames
read.tables <- function(file.names, ...) {
  require(plyr)
  ldply(file.names, function(fn) data.frame(Filename=fn, read.csv(fn, ...)),.progress = 'text')
}
# execute function to read in data from each CSV, including file names of file that data comes from
mydata <- read.tables(files, stringsAsFactors = FALSE)
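The question mentions that the real headers sit in row 4, whereas the fake files above have them in row 1. Since `read.tables` forwards `...` to `read.csv`, passing `skip = 3` should take care of that. Here is a minimal base-R sketch; the three preamble lines and the column names `time`, `a`, `b` are made up for illustration, as the question does not say what rows 1 to 3 contain.

```r
# Hypothetical file layout: three preamble lines, headers on row 4, data from row 5
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "instrument log",   # row 1: preamble (made up for illustration)
  "run 001",          # row 2: preamble
  "units: seconds",   # row 3: preamble
  "time,a,b",         # row 4: column headers
  "1,10,100",         # row 5 onwards: data
  "2,20,200"
), tmp)

# skip = 3 discards the first three lines, so row 4 is read as the header
one_file <- read.csv(tmp, skip = 3, stringsAsFactors = FALSE)
str(one_file)
```

With the helper above, the equivalent call would be `read.tables(files, skip = 3, stringsAsFactors = FALSE)`. The columns would then carry the real header names rather than X1, X2, ..., so the column names used in the later steps would need adjusting to match.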

Now plot the data. You say you just want one plot of all the data in the CSV files...

Melt the data into a long format for plotting. Here X1 is your time variable and X2 to X5 are the other variables in your CSV files.

require(reshape2)
dat <- melt(mydata, id.vars = c("X1"), measure.vars = c("X2", "X3", "X4", "X5"))

And here's a single scatter plot of your time variable against the other variables (colour-coded). It's not clear from your question exactly what you want to plot, so do ask another question with more details.

require(ggplot2)
ggplot(dat, aes(X1, value)) +
  geom_point(aes(colour = factor(variable)))

Now save it as a PDF or PNG; see ?ggsave for the numerous options here...

# ggsave() saves the last plot displayed; the argument's full name is filename
ggsave(filename = "myplot.pdf")
ggsave(filename = "myplot.png")

Find where these files were saved:

getwd()
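The question also asks about formatting (colours, ranges, titles, legends). One way this might look in ggplot2 is sketched below on a small made-up data set shaped like the output of `melt()`. The colour choices, axis limits, labels, and output file name are all placeholders, not values taken from the question.

```r
library(ggplot2)

# Small made-up data set in the same long format that melt() produces
set.seed(1)
demo <- data.frame(
  X1       = rep(1:50, times = 2),
  variable = rep(c("X2", "X3"), each = 50),
  value    = runif(100, 0, 1000)
)

p <- ggplot(demo, aes(X1, value, colour = variable)) +
  geom_point(size = 1) +
  # one fixed colour per variable, plus a legend title
  scale_colour_manual(values = c(X2 = "steelblue", X3 = "firebrick"),
                      name = "Variable") +
  # axis ranges (this zooms the view without dropping any points)
  coord_cartesian(xlim = c(0, 50), ylim = c(0, 1000)) +
  labs(title = "Example title", x = "Time", y = "Value") +
  theme_bw()

# Passing the plot explicitly avoids relying on the last plot displayed
out_file <- file.path(tempdir(), "formatted-example.png")
ggsave(filename = out_file, plot = p, width = 8, height = 5, dpi = 150)
```

The same layers can be added to any of the plots in this answer. `width` and `height` are in inches by default.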

To draw one plot per CSV file, here is one approach:

listcsvs <- lapply(files,function(i) read.csv(i,  stringsAsFactors = FALSE))
names(listcsvs) <- files
require(reshape2)
require(ggplot2)
for (i in 1:length(files)) { 
  tmp <- melt(listcsvs[[i]], id.vars = "X1", measure.vars = c("X2", "X3", "X4", "X5"))
  print(ggplot(tmp,aes(X1, value)) + 
          geom_point(aes(colour = factor(variable))) +
          ggtitle(names(listcsvs[i]))
        )
}
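Since the question asks for each plot to be exported automatically under the name of its CSV file, the loop can be extended to save as it goes. Below is a rough sketch under a few assumptions: `plot_csv_to_png` is a hypothetical helper (not part of any package), the columns are named X1, X2, ... as in the fake data, and the demo runs on two small files written to a temporary folder.

```r
library(reshape2)
library(ggplot2)

# Hypothetical helper: plot one CSV and save a PNG named after it.
# Assumes the time column is X1 and the other variables are X2, X3, ...
plot_csv_to_png <- function(csv_path, out_dir) {
  d <- read.csv(csv_path, stringsAsFactors = FALSE)
  # every column named X<number> except X1 is treated as a Y variable
  y_cols <- setdiff(grep("^X[0-9]+$", names(d), value = TRUE), "X1")
  long <- melt(d, id.vars = "X1", measure.vars = y_cols)
  stem <- tools::file_path_sans_ext(basename(csv_path))  # "df.1.csv" -> "df.1"
  p <- ggplot(long, aes(X1, value, colour = variable)) +
    geom_point(size = 0.5) +
    ggtitle(stem)
  out_file <- file.path(out_dir, paste0(stem, ".png"))
  ggsave(filename = out_file, plot = p, width = 8, height = 5, dpi = 150)
  out_file
}

# Demo on two small made-up files in a temporary folder
demo_dir <- tempfile("csv-demo-")
dir.create(demo_dir)
for (i in 1:2) {
  fake <- data.frame(cbind(1:20, matrix(sample(1000, 100), 20, 5)))
  write.csv(fake, file.path(demo_dir, paste("df", i, "csv", sep = ".")))
}
demo_files <- dir(demo_dir, pattern = "\\.csv$", full.names = TRUE)
saved <- vapply(demo_files, plot_csv_to_png, character(1), out_dir = demo_dir)
basename(saved)
```

For the real files, something like `lapply(files, plot_csv_to_png, out_dir = "C:/my-csv-files/plots")` should work once that folder exists. With around 400,000 rows per file, PNG is probably the better choice, because a PDF stores every point as a vector object and can become very large.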

If you are using RStudio, you can scroll through the plots and use Export to save the ones you want as a PDF or PNG.

So that covers the main parts of your question:

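  1. Reading a large number of CSV files into R
  2. Plotting the data as one scatter plot showing several variables against one variable
  3. Plotting the data as one scatter plot per CSV file
  4. Saving the plots as PDF or PNG files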

And as a bonus you've got code for creating example data, which you can use in your future questions. In general, the better the quality of your example data, the better the quality of the answers you'll get (as Thomas suggests in his comment).
