A Method of Counting Floating Point Operations in a C++/CUDA Program Using PTX

Views: 629 | Posted: 2017/3/5 19:04:10 | Tags: c++ cuda ptx

Problem description

I have a somewhat large CUDA application and I need to calculate the attained GFLOPs. I'm looking for an easy and perhaps generic way of counting the number of floating point operations.

Is it possible to count floating point operations from the generated PTX code (as shown below), using a predefined list of the PTX instructions that count as floating point operations (FPOs)? Based on the code, can the counting be made generic? For example, does add.s32 %r58, %r8, -2; count as one floating point operation?

Example:

BB3_2:
.loc 2 108 1
mov.u32         %r8, %r79;
setp.ge.s32     %p1, %r78, %r16;
setp.lt.s32     %p2, %r78, 0;
or.pred         %p3, %p2, %p1;
@%p3 bra        BB3_5;

add.s32         %r58, %r8, -2;
setp.lt.s32     %p4, %r58, 0;
setp.ge.s32     %p5, %r58, %r15;
or.pred         %p6, %p4, %p5;
@%p6 bra        BB3_5;

.loc 2 112 1
ld.global.u8    %rc1, [%rd17];
cvt.rn.f32.u8   %f11, %rc1;
mul.wide.u32    %rd12, %r80, 4;
add.s64         %rd13, %rd7, %rd12;
ld.local.f32    %f12, [%rd13];
fma.rn.f32      %f14, %f11, %f12, %f14;
.loc 2 113 1
add.f32         %f15, %f15, %f12;

Or are there far simpler ways of counting FPOs, making this approach a waste of time?
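To make the idea in the question concrete, here is a minimal C++ sketch of a static counter that scans PTX text and sums a weight per instruction. The function names flopWeight and countStaticFlops are hypothetical, and the weighting is an assumed convention rather than a standard: fma/mad on .f32 or .f64 counts as 2, add/sub/mul/div on those types counts as 1, and everything else counts as 0. Under that convention add.s32 contributes nothing, because the .s32 suffix marks it as a signed 32-bit integer add.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: FLOP weight of a single PTX opcode such as "fma.rn.f32".
// Assumed convention (not a standard): fma/mad = 2, add/sub/mul/div = 1,
// all other instructions (integer ops, moves, loads, conversions, compares) = 0.
int flopWeight(const std::string& opcode) {
    // Only opcodes whose final type suffix is .f32 or .f64 are considered.
    const bool isFloat =
        opcode.size() >= 4 &&
        (opcode.compare(opcode.size() - 4, 4, ".f32") == 0 ||
         opcode.compare(opcode.size() - 4, 4, ".f64") == 0);
    if (!isFloat) return 0;

    // The base operation is everything before the first '.'.
    const std::string base = opcode.substr(0, opcode.find('.'));
    if (base == "fma" || base == "mad") return 2;
    if (base == "add" || base == "sub" || base == "mul" || base == "div") return 1;
    return 0;
}

// Hypothetical helper: sums flopWeight over every line of a PTX listing.
// This is a static count: each instruction is counted once, no matter how
// many times it actually executes at run time.
long countStaticFlops(const std::string& ptx) {
    std::istringstream lines(ptx);
    std::string line;
    long total = 0;
    while (std::getline(lines, line)) {
        std::istringstream tokens(line);
        std::string tok;
        if (!(tokens >> tok)) continue;      // skip blank lines
        if (tok[0] == '@') {                 // skip a predicate guard like "@%p3"
            if (!(tokens >> tok)) continue;
        }
        total += flopWeight(tok);            // labels and directives weigh 0
    }
    return total;
}
```

Applied to the listing above, only fma.rn.f32 and add.f32 contribute under this convention. A static count like this is not yet a FLOP count for the program: instructions inside loops or behind predicated branches execute a data-dependent number of times, so each weight would still have to be multiplied by how often the instruction actually runs. PTX is also an intermediate representation, so the machine code that finally executes may differ from it.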
