cuDF - Not leveraging GPU cores

Problem description

I am using the below piece of code in Python with cuDF to speed up the process, but I do not see any difference in speed compared to my 4-core local machine CPU. The GPU configuration is 4 x NVIDIA Tesla T4.

import ast

import cudf
import numpy as np
import pandas as pd
import pmdarima as pm


def arima(train):
    # Fit an auto-ARIMA model on each stringified series and keep the one-step-ahead forecast.
    h = []
    for each in train:
        model = pm.auto_arima(np.array(ast.literal_eval(each)))
        p = model.predict(1).item(0)
        h.append(p)
    return h


for t_df in pd.read_csv("testset.csv", chunksize=1000):
    t_df = cudf.DataFrame.from_pandas(t_df)
    t_df['predicted'] = arima(t_df['prev_sales'])

What am I missing here?

Recommended answer

While I'll help you with your issue of not accessing all the GPUs, I'll also share a performance tip: if all of your data fits on a single GPU, then you should stick with single-GPU processing using cudf, as it is much faster and doesn't require any orchestration overhead. If not, then read on :)
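For that single-GPU path, a minimal sketch (assuming the same testset.csv file from the question) is simply:

import cudf

# Parse the CSV directly into GPU memory; no pandas round trip needed.
gdf = cudf.read_csv("testset.csv")
print(gdf.head())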

The reason you're not utilizing the 4 GPUs is that you're not using dask-cudf. cudf is a single-GPU library; dask-cudf allows you to scale out to multiple GPUs and multiple nodes, or to process datasets larger than GPU memory.

Here is a great place to start: https://docs.rapids.ai/api/cudf/stable/10min.html

As for your speed issue, you should read the CSV directly into the GPU through cudf, if possible. In your code, you're reading the data twice: once to host [CPU] with pandas and once into cudf [GPU] from pandas. That's unnecessary, and you lose all the benefits of GPU acceleration on the read. On large datasets, cudf will give you a very nice file-read speedup compared to pandas.

import dask_cudf

# Split the file across partitions - use a multiple of the number of GPUs you have.
df = dask_cudf.read_csv("testset.csv", npartitions=4)

and then go from there. Be sure to set up a client: https://docs.rapids.ai/api/cudf/stable/10min.html#Dask-Performance-Tips. That information is also on the same page linked above. No for loops required :).
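As a minimal sketch of that client setup (assuming dask-cuda is installed alongside dask-cudf; it starts one worker per visible GPU):

import dask_cudf
from dask_cuda import LocalCUDACluster
from dask.distributed import Client

# Start one worker per GPU and attach a client so dask-cudf work runs on the cluster.
cluster = LocalCUDACluster()
client = Client(cluster)

df = dask_cudf.read_csv("testset.csv")
print(df.head())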

For the rest of it, I am assuming you're using cuML for your machine learning algorithms, like ARIMA: https://docs.rapids.ai/api/cuml/stable/api.html?highlight=arima#cuml.tsa.ARIMA. Here is an example notebook: https://github.com/rapidsai/cuml/blob/branch-0.14/notebooks/arima_demo.ipynb
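As a rough sketch of that batched workflow (using a fixed ARIMA order instead of auto_arima's order search, and synthetic series in place of the prev_sales data), it could look something like this:

import cudf
import numpy as np
from cuml.tsa.arima import ARIMA

# Each column of the cuDF DataFrame is one series; cuML fits the whole batch on the GPU.
rng = np.random.default_rng(0)
batch = cudf.DataFrame({
    "s0": np.cumsum(rng.normal(size=100)),
    "s1": np.cumsum(rng.normal(size=100)),
})

model = ARIMA(batch, order=(1, 1, 1))
model.fit()
print(model.forecast(1))  # one-step-ahead forecast for every series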
