How can I one-hot encode in MATLAB?
Question
Often you are given a vector of integer values representing your labels (also known as classes), for example:
[2; 1; 3; 3; 2]
and you would like to one-hot encode this vector, so that each row contains a single 1 in the column indicated by the corresponding entry of the labels vector, for example:
[0 1 0;
1 0 0;
0 0 1;
0 0 1;
0 1 0]
Recommended answer
For speed and memory savings, you can use bsxfun combined with eq to accomplish the same thing. While your eye solution may work, it first builds a max(X)-by-max(X) identity matrix, so its memory usage grows quadratically with the number of unique values in X.
Y = bsxfun(@eq, X(:), 1:max(X));
Or as an anonymous function if you prefer:
hotone = @(X)bsxfun(@eq, X(:), 1:max(X));
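As a quick check, the sketch below applies both forms to the label vector from the question. The names Y1 and Y2 are only illustrative. Note that the result is a logical matrix; wrap it in double(...) if numeric zeros and ones are needed.

```matlab
% Label vector from the question
X = [2; 1; 3; 3; 2];

% Direct call: compare each label against the row vector 1:max(X)
Y1 = bsxfun(@eq, X(:), 1:max(X));

% Same computation through the anonymous function
hotone = @(X) bsxfun(@eq, X(:), 1:max(X));
Y2 = hotone(X);

% Both are 5-by-3 logical matrices with a single true entry per row
disp(double(Y1))
disp(isequal(Y1, Y2))
```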
Or if you're on Octave (or MATLAB R2016b and later), you can take advantage of automatic broadcasting and simply do the following, as suggested by @Tasos:
Y = X == 1:max(X);
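One caveat with this form: it relies on X being a column vector. With a row vector, the comparison either fails with a size error or, if the lengths happen to match, silently compares element by element. A minimal sketch that guards against this with X(:) is shown below; the decoding step with max is an addition for illustration and is not part of the original answer.

```matlab
% Requires Octave, or MATLAB R2016b or later (automatic broadcasting)
X = [2 1 3 3 2];              % a row vector this time

% X(:) forces a column, so the comparison expands to numel(X)-by-max(X)
Y = X(:) == 1:max(X);

% Recover the labels: column index of the single 1 in each row
[~, labels] = max(double(Y), [], 2);

disp(isequal(labels, X(:)))
```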
Benchmark
Here is a quick benchmark showing the performance of the various answers with a varying number of elements in X and a varying number of unique values in X.
function benchit()
    % Grid of problem sizes: number of distinct labels and vector lengths
    nUnique = round(linspace(10, 1000, 10));
    nElements = round(linspace(10, 1000, 12));

    % One timing matrix per approach (rows: nUnique, columns: nElements)
    times1 = zeros(numel(nUnique), numel(nElements));
    times2 = zeros(numel(nUnique), numel(nElements));
    times3 = zeros(numel(nUnique), numel(nElements));
    times4 = zeros(numel(nUnique), numel(nElements));
    times5 = zeros(numel(nUnique), numel(nElements));

    for m = 1:numel(nUnique)
        for n = 1:numel(nElements)
            % A fresh random label vector is drawn before each measurement
            X = randi(nUnique(m), nElements(n), 1);
            times1(m,n) = timeit(@()bsxfunApproach(X));
            X = randi(nUnique(m), nElements(n), 1);
            times2(m,n) = timeit(@()eyeApproach(X));
            X = randi(nUnique(m), nElements(n), 1);
            times3(m,n) = timeit(@()sub2indApproach(X));
            X = randi(nUnique(m), nElements(n), 1);
            times4(m,n) = timeit(@()sparseApproach(X));
            X = randi(nUnique(m), nElements(n), 1);
            times5(m,n) = timeit(@()sparseFullApproach(X));
        end
    end

    % Plot the timings (in milliseconds) as semi-transparent surfaces
    colors = get(0, 'defaultaxescolororder');
    figure;
    surf(nElements, nUnique, times1 * 1000, 'FaceColor', colors(1,:), 'FaceAlpha', 0.5);
    hold on
    surf(nElements, nUnique, times2 * 1000, 'FaceColor', colors(2,:), 'FaceAlpha', 0.5);
    surf(nElements, nUnique, times3 * 1000, 'FaceColor', colors(3,:), 'FaceAlpha', 0.5);
    surf(nElements, nUnique, times4 * 1000, 'FaceColor', colors(4,:), 'FaceAlpha', 0.5);
    surf(nElements, nUnique, times5 * 1000, 'FaceColor', colors(5,:), 'FaceAlpha', 0.5);
    view([46.1000 34.8000])
    grid on
    xlabel('Elements')
    ylabel('Unique Values')
    zlabel('Execution Time (ms)')
    legend({'bsxfun', 'eye', 'sub2ind', 'sparse', 'full(sparse)'}, 'Location', 'Northwest')
end
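function Y = bsxfunApproach(X)
    % Compare each label against 1:max(X); the result is logical
    Y = bsxfun(@eq, X(:), 1:max(X));
end

function Y = eyeApproach(X)
    % Build a max(X)-by-max(X) identity matrix and pick one row per label
    tmp = eye(max(X));
    Y = tmp(X, :);
end

function Y = sub2indApproach(X)
    % Set a single 1 per row through linear indexing
    LinearIndices = sub2ind([length(X), max(X)], (1:length(X))', X);
    Y = zeros(length(X), max(X));
    Y(LinearIndices) = 1;
end

function Y = sparseApproach(X)
    % Sparse result: one nonzero per row, at column X(i)
    Y = sparse(1:numel(X), X, 1);
end

function Y = sparseFullApproach(X)
    % Same as sparseApproach, then converted to a full matrix
    Y = full(sparse(1:numel(X), X, 1));
end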
Results
If you need a non-sparse output, bsxfun performs the best, but if you can use a sparse matrix (without conversion to a full matrix), then that is the fastest and most memory-efficient option.
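A minimal sketch of the sparse route follows, assuming the number of classes is known in advance (nClasses is a hypothetical variable, not from the original answer). Passing the size to sparse explicitly gives every class a column even when the largest label does not occur in X, which the 1:max(X) variants would not do.

```matlab
X = [2; 1; 3; 3; 2];
nClasses = 4;                 % assumed: total number of classes is known

% One nonzero per row, at column X(i); size fixed to numel(X)-by-nClasses
Ys = sparse((1:numel(X))', X, 1, numel(X), nClasses);

% Only the numel(X) nonzero entries are stored explicitly
disp(nnz(Ys))
disp(size(Ys))

% Convert to a full matrix only when a dense result is really required
Yf = full(Ys);
```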