How can I one-hot encode in MATLAB?


Question

Often you are given a vector of integer values representing your labels (also known as classes), for example:

[2; 1; 3; 3; 2]

and you would like to one-hot encode this vector, so that each row contains a 1 in the column indicated by the label value in the corresponding row of the labels vector, for example:

[0 1 0;
 1 0 0;
 0 0 1;
 0 0 1;
 0 1 0]

Recommended answer

For speed and memory savings, you can use bsxfun combined with eq to accomplish the same thing. While your eye solution may work, it first builds a max(X)-by-max(X) identity matrix, so its memory usage grows quadratically with the largest label in X (which equals the number of unique values when the labels run from 1 to K).

Y = bsxfun(@eq, X(:), 1:max(X));

Or as an anonymous function, if you prefer:

hotone = @(X)bsxfun(@eq, X(:), 1:max(X));
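As a quick check, the anonymous function can be applied to the label vector from the question. This is only a minimal sketch: the names Ynum and labels are illustrative and not part of the original answer, and it assumes the labels are positive integers.

```matlab
% Labels from the question (column vector of positive integers).
X = [2; 1; 3; 3; 2];

% Anonymous function from the answer above.
hotone = @(X) bsxfun(@eq, X(:), 1:max(X));

Y = hotone(X);        % 5-by-3 logical matrix, one row per label
Ynum = double(Y);     % numeric copy, in case a logical matrix is not wanted

% Recover the labels: the column index of the single 1 in each row.
[~, labels] = max(Ynum, [], 2);
```

Note that the result of the comparison is a logical matrix; the double() conversion is only needed if later code expects numeric input.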

Or, if you're on Octave (or MATLAB R2016b or later), you can take advantage of automatic broadcasting (called implicit expansion in MATLAB) and simply do the following, as suggested by @Tasos. Note that this form assumes X is a column vector:

Y = X == 1:max(X);
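One thing to watch with 1:max(X) is that the number of columns depends on the data: if the highest class happens to be absent from a sample, the encoded matrix is narrower than expected. The sketch below assumes the class count is known in advance; numClasses is a hypothetical variable introduced here and does not appear in the original answer.

```matlab
% Hypothetical setup: 4 classes exist, but class 4 is absent from this sample.
X = [2; 1; 3; 3; 2];
numClasses = 4;   % assumed to be known in advance

% Validate the labels before encoding: integers in the range 1..numClasses.
assert(all(X >= 1 & X <= numClasses & X == round(X)), ...
    'Labels must be integers in 1..numClasses');

% Implicit expansion (Octave, or MATLAB R2016b and later).
% X(:) forces a column vector so a row-vector input also works.
Y = X(:) == 1:numClasses;   % 5-by-4 logical; the last column is all false
```

On older MATLAB versions the same idea should work with bsxfun(@eq, X(:), 1:numClasses).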

Benchmark

Here is a quick benchmark showing the performance of the various answers with a varying number of elements in X and a varying number of unique values in X.

function benchit()

    nUnique = round(linspace(10, 1000, 10));
    nElements = round(linspace(10, 1000, 12));

    times1 = zeros(numel(nUnique), numel(nElements));
    times2 = zeros(numel(nUnique), numel(nElements));
    times3 = zeros(numel(nUnique), numel(nElements));
    times4 = zeros(numel(nUnique), numel(nElements));
    times5 = zeros(numel(nUnique), numel(nElements));

    for m = 1:numel(nUnique)
        for n = 1:numel(nElements)
            X = randi(nUnique(m), nElements(n), 1);
            times1(m,n) = timeit(@()bsxfunApproach(X));

            X = randi(nUnique(m), nElements(n), 1);
            times2(m,n) = timeit(@()eyeApproach(X));

            X = randi(nUnique(m), nElements(n), 1);
            times3(m,n) = timeit(@()sub2indApproach(X));

            X = randi(nUnique(m), nElements(n), 1);
            times4(m,n) = timeit(@()sparseApproach(X));

            X = randi(nUnique(m), nElements(n), 1);
            times5(m,n) = timeit(@()sparseFullApproach(X));
        end
    end

    colors = get(0, 'defaultaxescolororder');

    figure;

    surf(nElements, nUnique, times1 * 1000, 'FaceColor', colors(1,:), 'FaceAlpha', 0.5);
    hold on
    surf(nElements, nUnique, times2 * 1000, 'FaceColor', colors(2,:), 'FaceAlpha', 0.5);
    surf(nElements, nUnique, times3 * 1000, 'FaceColor', colors(3,:), 'FaceAlpha', 0.5);
    surf(nElements, nUnique, times4 * 1000, 'FaceColor', colors(4,:), 'FaceAlpha', 0.5);
    surf(nElements, nUnique, times5 * 1000, 'FaceColor', colors(5,:), 'FaceAlpha', 0.5);

    view([46.1000   34.8000])

    grid on
    xlabel('Elements')
    ylabel('Unique Values')
    zlabel('Execution Time (ms)')

    legend({'bsxfun', 'eye', 'sub2ind', 'sparse', 'full(sparse)'}, 'Location', 'Northwest')
end

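function Y = bsxfunApproach(X)
    % Compare each label with 1:max(X); the result is a logical matrix.
    Y = bsxfun(@eq, X(:), 1:max(X));
end

function Y = eyeApproach(X)
    % Pick rows of an identity matrix; allocates max(X)^2 elements first.
    tmp = eye(max(X));
    Y = tmp(X, :);
end

function Y = sub2indApproach(X)
    % Set the entries (row k, column X(k)) through their linear indices.
    LinearIndices = sub2ind([numel(X), max(X)], (1:numel(X))', X);
    Y = zeros(numel(X), max(X));
    Y(LinearIndices) = 1;
end

function Y = sparseApproach(X)
    % Store only the nonzero entries; the output stays sparse.
    Y = sparse(1:numel(X), X, 1);
end

function Y = sparseFullApproach(X)
    % Same as sparseApproach, then converted to a dense matrix.
    Y = full(sparse(1:numel(X), X, 1));
end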

Results

If you need a non-sparse output, bsxfun performs the best, but if you can use a sparse matrix (without conversion to a full matrix), then that is the fastest and most memory-efficient option.
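For the sparse route, a minimal sketch on the label vector from the question might look like the following. Passing the matrix size explicitly is an addition made here rather than something the benchmark does, and the variable names n, numClasses, S and F are illustrative.

```matlab
X = [2; 1; 3; 3; 2];
n = numel(X);
numClasses = max(X);   % or a class count known in advance (an assumption)

% Row k gets a 1 in column X(k). Giving the size explicitly keeps the
% column count fixed even if the largest label is missing from X.
S = sparse((1:n)', X(:), 1, n, numClasses);

% Only n nonzeros are stored, however many classes there are.
fprintf('nonzeros stored: %d of %d entries\n', nnz(S), numel(S));

% Convert only when a dense matrix is really required.
F = full(S);
```

Keeping S sparse is what gives the memory advantage; calling full() brings back the dense n-by-numClasses storage cost.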
