Data augmentation techniques for general datasets?


Question


I am working on a machine learning problem and want to build neural-network-based classifiers for it in MATLAB. One problem is that the data is given in the form of features and the number of samples is fairly small. I know about data augmentation techniques for images, such as rotation, translation, and affine transformation.

I would like to know whether there are data augmentation techniques available for general datasets. For example, is it possible to use randomness to generate more data? I read the answer here but I did not understand it.

Please provide answers with working details if possible.

Any help will be appreciated.

Answer

You need to look into autoencoders. Effectively, you pass your data through a small neural network that learns a compressed, PCA-like representation of it, and you can subsequently use that network to generate more data.

MATLAB has an Autoencoder class as well as a trainAutoencoder function that will do all of this for you. The following example is from the MATLAB help files.

Generate the training data.

rng(0,'twister'); % For reproducibility
n = 1000;
r = linspace(-10,10,n)';
x = 1 + r*5e-2 + sin(r)./r + 0.2*randn(n,1);

Train autoencoder using the training data.

hiddenSize = 25;
autoenc = trainAutoencoder(x',hiddenSize,...
        'EncoderTransferFunction','satlin',...
        'DecoderTransferFunction','purelin',...
        'L2WeightRegularization',0.01,...
        'SparsityRegularization',4,...
        'SparsityProportion',0.10);

Generate the test data.

n = 1000;
r = sort(-10 + 20*rand(n,1));
xtest = 1 + r*5e-2 + sin(r)./r + 0.4*randn(n,1);

Predict the test data using the trained autoencoder, autoenc.

xReconstructed = predict(autoenc,xtest');

Plot the actual test data and the predictions.

figure;
plot(xtest,'r.');
hold on
plot(xReconstructed,'go');

You can see the green circles, which represent the additional data generated with the autoencoder.
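The help-file example reconstructs existing test points. For a feature-based dataset, one possible way to get genuinely new samples is to perturb the hidden representation and decode it again. The following is only a minimal sketch under stated assumptions, not the method from the help files: it assumes the Deep Learning Toolbox, MATLAB R2016b or later (for implicit expansion), and a feature matrix with one column per sample. The toy data X and the values of noiseScale and copiesPerSample are hypothetical and would need tuning for real data.

```matlab
% Sketch: create extra feature vectors by jittering the autoencoder's hidden code.
% X, noiseScale and copiesPerSample are illustrative assumptions, not from the help files.
rng(0,'twister');                       % For reproducibility

% Toy feature matrix, one column per sample (the layout trainAutoencoder expects).
numFeatures = 8;
numSamples  = 60;
X = randn(numFeatures, numSamples);

% Train a small autoencoder on the available samples.
hiddenSize = 4;
autoenc = trainAutoencoder(X, hiddenSize, ...
        'MaxEpochs', 200, ...
        'ShowProgressWindow', false);

% Encode every sample into the hidden representation.
Z = encode(autoenc, X);                 % hiddenSize-by-numSamples

% Replicate the codes and add Gaussian noise scaled by each hidden unit's spread.
copiesPerSample = 3;
noiseScale = 0.1;
Zrep = repmat(Z, 1, copiesPerSample);
Znew = Zrep + noiseScale * std(Z, 0, 2) .* randn(size(Zrep));

% Decode the perturbed codes back into feature space.
Xnew = decode(autoenc, Znew);           % numFeatures-by-(copiesPerSample*numSamples)

% Augmented training set: original samples followed by the synthetic ones.
Xaug = [X, Xnew];
```

For a classifier, each synthetic column would normally inherit the label of the sample it was derived from, so a row vector of labels could be extended with repmat in the same column order. Keeping noiseScale small keeps the new points close to the originals; whether they actually help should be checked on held-out data.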
