Fast way to map scalars to colors in Python

Question

I'm looking for a fast way to map scalars to hex colors in Python:

import numpy as np
import pandas as pd
import matplotlib
import matplotlib.cm as cm
import matplotlib.colors as mcol

np.random.seed(0) 
df = pd.DataFrame(np.random.rand(20000,1))
df.head()

    0
0   0.548814
1   0.715189
2   0.602763
3   0.544883
4   0.423655

I only have 20 colors, so I wonder whether matplotlib is the best solution or whether a simple lookup table would be better.

colors = ["#084594", "#0F529E", "#1760A8", "#1F6EB3", "#2979B9", "#3484BE", "#3E8EC4",
                "#4A97C9", "#57A0CE", "#64A9D3", "#73B2D7", "#83BBDB", "#93C4DE", "#A2CBE2",
                "#AED1E6", "#BBD6EB", "#C9DCEF", "#DBE8F4", "#EDF3F9", "#FFFFFF"]
values = df[0].values

@profile
def apply_method(): # 6.9 sec
    cm1 = mcol.ListedColormap(colors)
    norm = matplotlib.colors.Normalize(vmin=np.min(values), vmax=np.max(values), clip=True)
    mapper = cm.ScalarMappable(norm=norm, cmap=cm1)

    return df[0].apply(lambda row: mcol.to_hex(mapper.to_rgba(row)))

%time apply_method()

From the profiler I see that to_rgba() is the most expensive method (6.5 s for only 20,000 values).

So I'm looking for a way to bypass the to_rgba() method. Is there a way to get the color ranges from cm.ScalarMappable and then look up the right hex color?

Recommended answer

The most expensive part of the code from the question is not to_rgba() itself but the apply call (df[0].apply), because it invokes the function separately for every value. to_rgba() only shows up in the profiler because it is called 20,000 times, each time on a single scalar.
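To illustrate the point about per-element calls, here is a minimal sketch (not part of the original answer; `scale` and `counter` are made-up names) that counts how often pandas invokes a Python function passed to apply and compares the result with a single NumPy operation on the whole array:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
s = pd.Series(rng.random(1000))

# Hypothetical helper: counts how many times it is called.
counter = {"calls": 0}

def scale(x):
    counter["calls"] += 1
    return x * 19

# apply() makes one Python-level function call per element.
per_element = s.apply(scale)

# The same arithmetic as one vectorized NumPy operation.
vectorized = s.values * 19

print(counter["calls"])                               # number of Python calls made
print(np.allclose(per_element.values, vectorized))    # same numbers either way
```

The per-call overhead, rather than the arithmetic itself, is what dominates when the function is as heavy as to_rgba().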

A comparison between different methods using matplotlib colormaps is given in my answer to this question: How do I map df column values to hex color in one go?

The upshot is that using a lookup table (LUT) is indeed much faster (by a factor of 400 in the case investigated there).

However, note that in the case of this question there is no need to use matplotlib at all. Since you already have the list of possible colors in hex format, there is no reason to convert the hex colors to a colormap and then back to hex colors.

Instead, using the list of colors directly as a lookup table (LUT) is far faster. For a dataframe with 10,000 entries (to keep it comparable to the timings in the other answer), the code from this question takes 2.7 seconds.

The following code takes 380 µs, an improvement by a factor of about 7000.
Compared with the best matplotlib-based method from the linked question's answer (7.7 ms), it is still a factor of 20 faster.

import numpy as np; np.random.seed(0)
import pandas as pd

def create_df(n=10000):
    return pd.DataFrame(np.random.rand(n,1), columns=['some_value'])

def apply(df):
    colors = ["#084594", "#0F529E", "#1760A8", "#1F6EB3", "#2979B9", "#3484BE", "#3E8EC4",
              "#4A97C9", "#57A0CE", "#64A9D3", "#73B2D7", "#83BBDB", "#93C4DE", "#A2CBE2",
              "#AED1E6", "#BBD6EB", "#C9DCEF", "#DBE8F4", "#EDF3F9", "#FFFFFF"]
    colors = np.array(colors)
    v = df['some_value'].values
    v = ((v-v.min())/(v.max()-v.min())*(len(colors)-1)).astype(np.int16)
    return pd.Series(colors[v])

df = create_df()
%timeit apply(df)

# 376 µs
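
One detail of the snippet above: because it scales by len(colors)-1 and then truncates, only the maximum value receives the last color. The following is a hedged variant, not the answer's own code, that instead uses equal-width bins (which, as far as I know, is how a ListedColormap assigns indices) and guards against a constant column. The function name `map_to_hex` is made up for illustration, and the input is assumed to contain no NaNs:

```python
import numpy as np
import pandas as pd

def map_to_hex(values, colors):
    """Map scalars to hex colors using equal-width bins over [min, max].

    Assumes `values` contains no NaNs.
    """
    lut = np.asarray(colors)
    v = np.asarray(values, dtype=float)
    vmin, vmax = v.min(), v.max()
    span = vmax - vmin
    if span == 0:
        # Constant input: avoid division by zero, use the first color.
        idx = np.zeros(v.shape, dtype=np.intp)
    else:
        # Scale to [0, n_colors] and truncate to an integer bin index.
        idx = ((v - vmin) / span * len(lut)).astype(np.intp)
        # The maximum lands exactly on n_colors; fold it into the last bin.
        idx = np.clip(idx, 0, len(lut) - 1)
    return pd.Series(lut[idx])

demo_colors = ["#000000", "#888888", "#FFFFFF"]
demo = map_to_hex([0.0, 0.2, 0.5, 0.9, 1.0], demo_colors)
print(demo.tolist())
```

Which binning is preferable depends on whether the top color should cover a full bin or only the maximum value; the indexing step itself stays a single NumPy fancy-index operation in both cases.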
