Standardize data columns in R

Question

I have a dataset called spam which contains 58 columns and approximately 3500 rows of data related to spam messages.

I plan on running some linear regression on this dataset in the future, but I'd like to do some pre-processing beforehand and standardize the columns to have zero mean and unit variance.

I've been told the best way to go about this is with R, so I'd like to ask: how can I achieve this normalization in R? I've already got the data properly loaded, and I'm just looking for some packages or methods to perform this task.

Recommended answer

I have to assume you meant that you want a mean of 0 and a standard deviation of 1. If your data is in a data frame and all the columns are numeric, you can simply call the scale function on the data to do what you want.
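For reference, here is a short sketch of what scale computes with its default arguments, column by column. The symbols are notation introduced here for illustration: x_ij is the value in row i of column j, and n is the number of rows.

$$
z_{ij} = \frac{x_{ij} - \bar{x}_j}{s_j}, \qquad
\bar{x}_j = \frac{1}{n}\sum_{i=1}^{n} x_{ij}, \qquad
s_j = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_{ij} - \bar{x}_j\right)^2}
$$

Here s_j is the sample standard deviation, the same quantity sd returns, so each standardized column ends up with mean 0 and standard deviation 1.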

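# example data: two numeric columns on different scales
dat <- data.frame(x = rnorm(10, 30, .2), y = runif(10, 3, 5))
# scale() centers each column and divides it by its sd; the result is a matrix
scaled.dat <- scale(dat)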

# check that we get mean of 0 and sd of 1
colMeans(scaled.dat)  # faster version of apply(scaled.dat, 2, mean)
apply(scaled.dat, 2, sd)
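
The approach above relies on every column being numeric, because scale needs numeric input. If the data frame also holds a non-numeric column such as a class label, one option is to standardize only the numeric columns and leave the rest untouched. The following is a minimal sketch under that assumption; dat2, its label column, and num.cols are hypothetical names made up for illustration and are not part of the original question.

```r
# Hypothetical example data: two numeric columns plus a non-numeric label
set.seed(1)
dat2 <- data.frame(
  x = rnorm(10, 30, 0.2),
  y = runif(10, 3, 5),
  label = factor(rep(c("spam", "ham"), 5))
)

# Names of the numeric columns only
num.cols <- names(dat2)[vapply(dat2, is.numeric, logical(1))]

# Standardize those columns in a copy; the result stays a data frame
scaled.dat2 <- dat2
scaled.dat2[num.cols] <- scale(dat2[num.cols])

# Column means should be ~0 and standard deviations 1 (up to rounding error)
colMeans(scaled.dat2[num.cols])
sapply(scaled.dat2[num.cols], sd)
```

When scale is called on its own, as in scaled.dat above, the means and standard deviations it used are stored in the "scaled:center" and "scaled:scale" attributes of the returned matrix, which is handy if new data has to be transformed the same way later.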

Using built-in functions is classy. Like this cat:
