Standardize data columns in R
Question
I have a dataset called spam, which contains 58 columns and approximately 3500 rows of data related to spam messages.
I plan on running some linear regression on this dataset in the future, but I'd like to do some pre-processing beforehand and standardize the columns to have zero mean and unit variance.
I've been told the best way to go about this is with R, so I'd like to ask: how can I achieve this normalization with R? I've already got the data properly loaded, and I'm just looking for some packages or methods to perform this task.
Recommended answer
I have to assume you meant to say that you wanted a mean of 0 and a standard deviation of 1. If your data is in a data frame and all the columns are numeric, you can simply call the scale function on the data to do what you want.
dat <- data.frame(x = rnorm(10, 30, .2), y = runif(10, 3, 5))
scaled.dat <- scale(dat)
# check that we get mean of 0 and sd of 1
colMeans(scaled.dat) # faster version of apply(scaled.dat, 2, mean)
apply(scaled.dat, 2, sd)
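The answer above assumes every column is numeric, and scale returns a matrix rather than a data frame. If the data also holds a non-numeric column (for example a spam/ham label, which is an assumption here, not something stated in the question), one possible approach is to standardize only the numeric columns and keep the result as a data frame. The following is a minimal sketch; the `label` column and the toy data are hypothetical stand-ins for the real dataset.

```r
# Hypothetical stand-in for the real data: two numeric columns plus a factor label
set.seed(1)
dat <- data.frame(
  x = rnorm(10, 30, .2),
  y = runif(10, 3, 5),
  label = factor(rep(c("spam", "ham"), 5))
)

# Find the numeric columns and standardize only those
num.cols <- vapply(dat, is.numeric, logical(1))
scaled.dat <- dat
scaled.dat[num.cols] <- as.data.frame(scale(dat[num.cols]))

# scale() subtracts each column's mean and divides by its standard deviation,
# so it should agree with the manual computation
manual.x <- (dat$x - mean(dat$x)) / sd(dat$x)
all.equal(scaled.dat$x, manual.x)

# check the numeric columns: means close to 0, standard deviations of 1
colMeans(scaled.dat[num.cols])
sapply(scaled.dat[num.cols], sd)
```

Since a standard deviation of 1 implies a variance of 1, this also gives the unit variance asked for in the question.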
Using built-in functions is classy.