Replace value with the average of its column - many columns
Problem description
I have an Excel sheet with over 1,000 columns and 11,000 rows, all containing numeric data. Missing values in the data are represented by '*'.
I would like to replace all of the '*' values with the average of the column that it is in.
Doing this manually would take a long time, so is there a formula that would achieve this?
Thanks so much in advance for any help.
As you mentioned machine learning, I thought I would show you how you could do this with Azure Machine Learning Studio (AML) using a free account.
By using AML you gain access to a number of fast methods for replacing missing values. AML has a Clean Missing Data module that offers replacement methods such as Multivariate Imputation using Chained Equations (MICE), mean, median and several others. A useful feature is that you can visualize the dataset's columns by right-clicking the dataset and see which columns are skewed. You can then choose, column by column, which replacement method to use; for heavily skewed columns, for instance, you might use the median instead of the mean. AML also offers good options for data normalization (scaling and reducing), and you can work with your dataset in Python and R.
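To make the per-column choice concrete, here is a minimal standard-library Python sketch, not AML's own implementation: it fills missing entries (represented as None) with the column median when the column's skewness exceeds a threshold, and with the column mean otherwise. The function names and the cut-off of 1.0 are hypothetical choices for illustration, and each column is assumed to contain at least one value.

```python
import statistics

SKEW_THRESHOLD = 1.0  # hypothetical cut-off: |skewness| above this -> use the median


def skewness(values):
    """Population (Fisher-Pearson) moment coefficient of skewness."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return 0.0  # a constant column has no skew
    return statistics.fmean([(v - m) ** 3 for v in values]) / sd ** 3


def fill_value(column):
    """Median for heavily skewed columns, mean otherwise; None marks a missing entry."""
    present = [v for v in column if v is not None]  # assumes at least one value
    if abs(skewness(present)) > SKEW_THRESHOLD:
        return statistics.median(present)
    return statistics.fmean(present)


def impute(columns):
    """Return a copy of {name: values} with every None replaced by its column's fill value."""
    filled = {}
    for name, column in columns.items():
        fill = fill_value(column)
        filled[name] = [fill if v is None else v for v in column]
    return filled
```

For example, impute({"a": [1.0, 2.0, None, 3.0], "b": [1.0, 1.0, 1.0, None, 100.0]}) fills column a with its mean (2.0), but fills column b, which is pulled upward by the single 100.0, with its median (1.0).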
I don't know whether there is a way to treat "*" directly as a missing value (I am still trying to find out), but a little processing before loading solves the problem. Before loading:
- Export the sheet as a CSV and save it.
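- In the CSV, press Ctrl+H (or Ctrl+F and switch to the Replace tab) to bring up the Find and Replace dialog, enter "~*" for Find and leave Replace blank, then click Replace All and save the file. The tilde tells Excel to match a literal asterisk; a bare "*" is a wildcard and would match every cell.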
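If you would rather not do the replacement by hand in Excel, the same preprocessing can be sketched with Python's standard csv module. This is an illustrative equivalent of the Find and Replace step, not part of the original answer: it copies every row and empties any cell whose content is just "*". The function name blank_asterisks is hypothetical.

```python
import csv


def blank_asterisks(src, dst):
    """Copy CSV rows from src to dst, emptying cells that contain only '*'."""
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    for row in reader:
        # A cell holding just '*' (ignoring surrounding spaces) becomes an empty cell.
        writer.writerow(["" if cell.strip() == "*" else cell for cell in row])
```

To run it on real files, open the exported CSV for reading and a new file for writing, both with newline="" as the csv module documentation recommends, and pass them as src and dst.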
Then log in to AML and click + New at the bottom of the screen.
Select New > DATASET > FROM LOCAL FILE and choose your file.
When selecting the type, choose CSV with no header if your data has no header row, or CSV with header if it does.
Your dataset will start uploading, as shown by the progress bar at the bottom of the screen, and will then appear in the SAVED DATASETS collection.
Click the + New button again and select EXPERIMENT > BLANK EXPERIMENT
Drag and drop your saved dataset onto the canvas.
In the Search experiment items box, type Clean Missing Data, then drag the module that appears onto the canvas.
Connect the two modules by clicking the dot at the bottom of the top module (the dataset) and dragging it to the input at the top of the Clean Missing Data module.
Select the bottom module and set its parameters in the pane on the right. This is where you choose how to handle missing values, e.g. replace missing values with the mean, or with the median if your column data is skewed.
Right-click the bottom module and select Run selected.
Right-click it again and select Cleaned dataset > Save as Dataset.
The progress bar at the bottom will tell you when it is complete.
Type convert to csv in the Search experiment items box again, drag the Convert to CSV module onto the canvas, and connect the left-hand output at the bottom of the second module (the cleaned dataset) to the input at the top of the newly added third module.
Select the bottom module, right-click it and choose Run selected.
Wait for the progress bar to complete.
Right-click the bottom module and click Download. Done.
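For reference, the whole pipeline (treat "*" as missing, replace it with the column mean, write a CSV) can also be sketched without AML in standard-library Python. This is a minimal sketch under stated assumptions, not a reproduction of the Clean Missing Data module: it assumes a purely numeric CSV with at least one data row, treats both "*" and blank cells as missing, and leaves a column blank if it has no values at all. The function name fill_with_column_means and its has_header flag are hypothetical.

```python
import csv
import statistics

MISSING = {"*", ""}  # cell contents treated as missing: '*' and blank cells


def fill_with_column_means(src, dst, has_header=True):
    """Read a numeric CSV from src and write it to dst with missing cells
    replaced by the mean of their column."""
    rows = list(csv.reader(src))
    header, body = (rows[0], rows[1:]) if has_header else (None, rows)
    n_cols = max(len(row) for row in body)  # assumes at least one data row

    # Column means, computed only from the cells that hold a value.
    means = []
    for j in range(n_cols):
        present = [float(row[j]) for row in body
                   if j < len(row) and row[j].strip() not in MISSING]
        # None if the column is entirely missing; csv writes None as "".
        means.append(statistics.fmean(present) if present else None)

    writer = csv.writer(dst, lineterminator="\n")
    if header is not None:
        writer.writerow(header)
    for row in body:
        padded = row + [""] * (n_cols - len(row))  # short rows get blank cells
        writer.writerow([means[j] if cell.strip() in MISSING else cell
                         for j, cell in enumerate(padded)])
```

If pandas is available, roughly the same result comes from pd.read_csv(path, na_values=["*"]), then df.fillna(df.mean()), then df.to_csv(out_path, index=False).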