Loop through a .csv file in R, computing relative frequencies?

Question

I'm new to R and I'm trying to write a .R script that opens a .csv file of mine and computes some frequencies. The file has headers, and the values under each header are 1, 0, NA, or -4. What I want to do is go through each column and compute the frequency of each value in it. I'm sure this is an easy script, but I'm not sure how R's syntax works yet. Can anyone get me started on this, please?

Recommended answer

The exact script will vary with your input and with the kind of output you'd like (just printed to the interactive console? written to a .csv?), but here's my attempt:

# Read the data from the .csv file; read.csv() assumes a header row by default
dat <- read.csv(file = "yourfile.csv")

# For right now, use this fake data instead
dat <- data.frame(x = c(-4, 0, 1, 1, -4, NA, NA, 0), y = c(1, 1, 1, 0, -4, NA, 0, NA))

# Get the frequency of values for each column, assuming every column consists of data
apply(X = dat, MARGIN = 2, FUN = function(x) {summary(factor(x))})

The apply function applies the function you give it (FUN) over a margin (1 = rows, 2 = columns) of the data you pass in. You can give it any function you like. Passing FUN = summary would give you the mean, min, max, etc. of each column, because the columns are numeric. The summary() method for factors, however, returns frequencies, which is what you need. So instead of passing summary directly, trick R into treating your numbers as a factor: define an anonymous function, function(x) (apply passes the columns to it one at a time as x), that first converts x to a factor with factor(x) and then summarizes that factor. Missing values are counted in a separate NA's entry. When every column contains the same set of values, as in the fake data, the result is a matrix with the frequencies for each column; if the columns contain different sets of values, apply returns a list with one element per column instead.
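The code above returns counts, while the title of the question asks for relative frequencies. One way to get proportions is sketched below, assuming NA should count as its own category. The helper name rel_freq is hypothetical and not part of the original answer; table() and prop.table() are base R functions.

```r
# Same fake data as in the answer
dat <- data.frame(x = c(-4, 0, 1, 1, -4, NA, NA, 0),
                  y = c(1, 1, 1, 0, -4, NA, 0, NA))

# Hypothetical helper: relative frequency of each distinct value in a vector.
# Assumption: NA is treated as a category of its own.
rel_freq <- function(x) {
  counts <- table(x, useNA = "ifany")  # absolute counts, NA included if present
  prop.table(counts)                   # divide by the total so the shares sum to 1
}

# lapply() visits the columns of the data frame one at a time and always
# returns a list, even when the columns hold different sets of values
rel <- lapply(dat, rel_freq)
print(rel)
```

With the fake data, each of -4, 0, 1 and NA occurs twice among the eight entries of column x, so each has a relative frequency of 0.25. Dropping useNA = "ifany" would instead compute the shares among the non-missing values only.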

Not the most elegant code ever, but I think it'll get you what you need.
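If the output should go to a .csv rather than the console, something along these lines might work. It is only a sketch under stated assumptions: the set of possible values (-4, 0, 1, plus NA) is taken from the question, and the names vals, count_values and out_file are made up for illustration. Counting every column against the same fixed set of values keeps the result a matrix even when a value is absent from some column.

```r
# Same fake data as in the answer
dat <- data.frame(x = c(-4, 0, 1, 1, -4, NA, NA, 0),
                  y = c(1, 1, 1, 0, -4, NA, 0, NA))

# Assumption: these are the only non-missing values that can occur
vals <- c(-4, 0, 1)

# Hypothetical helper: count each expected value, then the missing values
count_values <- function(x) {
  counts <- c(sapply(vals, function(v) sum(x == v, na.rm = TRUE)),
              sum(is.na(x)))
  names(counts) <- c(vals, "NA")
  counts
}

# One column of counts per column of dat, one row per value
counts <- sapply(dat, count_values)

# Relative frequencies: every column of dat has nrow(dat) entries
rel <- counts / nrow(dat)

# Hypothetical output location; point this at wherever the file should go
out_file <- file.path(tempdir(), "relative_frequencies.csv")
write.csv(rel, file = out_file)
```

The columns of rel sum to 1 only as long as the data contain nothing outside vals and NA, so it is worth checking colSums(rel) on real input.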
