How to do machine learning when the inputs are of different sizes?

Question

In standard cookbook machine learning, we operate on a rectangular matrix; that is, all of our data points have the same number of features. How do we cope with situations in which our data points have different numbers of features? For example, we may want to do visual classification when our pictures have different dimensions, sentiment analysis when our sentences have different numbers of words, or stellar classification when each star has been observed a different number of times.

I think the normal way would be to extract features of regular size from these irregularly sized data. But I recently attended a talk on deep learning where the speaker emphasized that, instead of hand-crafting features from data, deep learners are able to learn the appropriate features themselves. How do we use, for example, a neural network if the input layer is not of a fixed size?

Recommended answer

Since you are asking about deep learning, I assume you are more interested in end-to-end systems than in feature design. Neural networks that can handle variable-sized inputs include:

1) Convolutional neural networks with pooling layers. They are usually used in image recognition, but more recently they have also been applied to modeling sentences. (I think they should also be good at classifying stars.)

2) Recurrent neural networks. (Good for sequential data such as time series and sequence labeling tasks, and also good for machine translation.)

3) Tree-based autoencoders (also called recursive autoencoders) for data arranged in tree-like structures (they can be applied to sentence parse trees).
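
To illustrate option 1, here is a minimal sketch of why convolution followed by pooling copes with inputs of different sizes. It is not taken from any particular paper or library: the function name `conv1d_global_max`, the filter width, and the random (untrained) filter weights are all assumptions made for illustration, and it uses a 1-D input in plain Python rather than real images. The point is only that the same filters slide over an input of any length, and pooling over all positions yields one value per filter.

```python
import random

def conv1d_global_max(sequence, filters, width):
    """Slide each filter over the sequence and keep its maximum response.

    sequence: list of numbers, any length >= width
    filters:  list of weight lists, each of length `width`
    Returns one value per filter, so the output size is len(filters)
    regardless of len(sequence).
    """
    if len(sequence) < width:
        raise ValueError("sequence is shorter than the filter width")
    pooled = []
    for weights in filters:
        # Response of this filter at every valid position in the input.
        responses = [
            sum(w * x for w, x in zip(weights, sequence[start:start + width]))
            for start in range(len(sequence) - width + 1)
        ]
        # Global max pooling: collapse all positions into a single number.
        pooled.append(max(responses))
    return pooled

# Hypothetical, untrained filters; a real network would learn these weights.
rng = random.Random(0)
WIDTH, NUM_FILTERS = 3, 4
filters = [[rng.uniform(-1, 1) for _ in range(WIDTH)] for _ in range(NUM_FILTERS)]

short_input = [rng.random() for _ in range(5)]
long_input = [rng.random() for _ in range(50)]

short_features = conv1d_global_max(short_input, filters, WIDTH)
long_features = conv1d_global_max(long_input, filters, WIDTH)
print(len(short_features), len(long_features))  # prints: 4 4
```

Both inputs end up as a feature vector of length 4, which a fixed-size classification layer could then consume.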

Many papers describing example applications can readily be found by googling.

For uncommon tasks, you can select one of these based on the structure of your data, or you can design variants and combinations of these systems.
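
As an example of choosing by data structure: if each data point is a sequence, such as a star observed a different number of times, a recurrent network is the natural pick. The sketch below is a simple Elman-style recurrent cell written in plain Python. The function name `rnn_encode`, the hidden size, and the random (untrained) weights are assumptions for illustration only. It shows that reusing the same weights at every step lets the network read a sequence of any length and end with a fixed-size hidden state.

```python
import math
import random

def rnn_encode(sequence, w_in, w_rec, bias):
    """Run a simple recurrent cell over a sequence of scalar observations.

    The same weights are applied at every step, so the sequence may have
    any length; the final hidden state is a fixed-size summary of it.
    """
    hidden_size = len(bias)
    hidden = [0.0] * hidden_size  # initial state before any observation
    for x in sequence:
        # New state depends on the current input and the previous state.
        hidden = [
            math.tanh(
                w_in[i] * x
                + sum(w_rec[i][j] * hidden[j] for j in range(hidden_size))
                + bias[i]
            )
            for i in range(hidden_size)
        ]
    return hidden

# Hypothetical, untrained parameters; a real model would learn them.
rng = random.Random(0)
HIDDEN_SIZE = 3
w_in = [rng.uniform(-1, 1) for _ in range(HIDDEN_SIZE)]
w_rec = [[rng.uniform(-1, 1) for _ in range(HIDDEN_SIZE)] for _ in range(HIDDEN_SIZE)]
bias = [0.0] * HIDDEN_SIZE

few_observations = [0.2, 0.9]
many_observations = [rng.random() for _ in range(40)]

summary_few = rnn_encode(few_observations, w_in, w_rec, bias)
summary_many = rnn_encode(many_observations, w_in, w_rec, bias)
print(len(summary_few), len(summary_many))  # prints: 3 3
```

In practice the final hidden state would be passed to a classifier and the weights trained end to end; a deep learning framework would normally handle batching of sequences with different lengths.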
