Floating divisions terribly slow on Intel I5 & E5 Xeon processors

Problem description

I ran a performance test on my computer (Windows 10) with an Intel Xeon CPU E5-1620 v3 @ 3.5 GHz and got results similar to a Raspberry Pi's. My beard grew while waiting.

I obtained:

integer sums: 2184 Mops (megaoperations/second) as expected
double divisions: 15.6 - 18.32 Mops
double multiplications: 344 - 430 Mops
double sums: 881 - 1178 Mops
float divisions: 17.3 - 19.1 Mops

Update: I tested on an i5 and divisions were slower than 21 Mops.

The question is: does the Intel E5 have a coprocessor?
Can I use a compiler directive to make it run faster?
Would it run faster on an i7 processor?

What I have tried:

This is my code. Please run it on any computer, as it runs very well:

#include <cstdio>	// getchar()
#include <ctime>	// clock(), clock_t, CLOCKS_PER_SEC
#include <iostream>
using namespace std;

clock_t start, stop;	// clock() returns clock_t, not time_t

// Prints the time elapsed since the previous call and the resulting rate,
// then restarts the clock. Call timer() with no arguments to reset the start time only.
void timer(const char *title = "", int data_size = 1)
{
	stop = clock();
	if (*title) {
		double seconds = (double)(stop - start) / (double)CLOCKS_PER_SEC;
		cout << title << " time = " << seconds << " = " << 1e-6 * data_size / seconds << " Mops/s" << endl;
	}
	start = clock();
}

int main()
{
	cout << "Perform test in Release mode. Results will be wrong in Debug mode." << endl;
	const int size = 100 * 1024 * 1024;
	unsigned int isum = 0;	// unsigned: the sum wraps around instead of overflowing a signed int
	timer();	// no title: only resets the timer
	for (int i = 0; i < size; i++)
		isum += i;
	timer("Time for 100 Mega int sums", size);

	double dsum = 1.0;
	for (int i = 0; i < size; i++)
		dsum = dsum / 1.1111;	// division by a literal constant
	timer("Time for 100 Mega double divisions", size);

	double d2 = 1.111;
	dsum += 0.1;
	for (int i = 0; i < size; i++)
		dsum /= d2;	// division by a variable
	timer("Time for 100 Mega double divisions-2", size);
	for (int i = 0; i < size; i++)
		dsum = dsum * d2;
	timer("Time for 100 Mega double multiplications", size);
	for (int i = 0; i < size; i++)
		dsum = dsum + d2;
	timer("Time for 100 Mega double sums", size);

	float fsum = 1.0f;
	for (int i = 0; i < size; i++)
		fsum = fsum / 1.1111f;
	timer("Time for 100 Mega float divisions", size);

	// Printing the results keeps the compiler from optimizing the loops away.
	cout << endl << "Ignore the following line (printed only to force the loops to run after compiler optimizations):" << endl;
	cout << isum << " " << dsum << " " << fsum << endl;
	cout << "=== FIN ===" << endl;
	getchar();
	return 0;
}

Solution

Quote:

The question is: does the Intel E5 have a coprocessor?

Yes. All current x86 CPUs have a built-in x87 FPU and vector units (SSE, AVX).

Quote:

Can I use a compiler directive to make it run faster?

Yes, but it depends on the compiler and on whether you can accept reduced error handling and results that are not strictly IEEE compliant. Most compilers have some kind of fast-math option for this purpose (for example /fp:fast in MSVC or -ffast-math in GCC and Clang). Depending on the CPU, you can also enable scalar SSE instructions instead of the x87 FPU.

Quote:

Would it run faster on an i7 processor?

It depends on the clock rate of the CPU, which drives both the x87 FPU and the SSE units. Each instruction requires a defined number of clock cycles.

Floating-point divisions require far more clock cycles than additions or multiplications (8 to 20 times as many as a multiplication). This applies to all kinds of FPUs, not only to x86 ones. Avoid divisions when high performance is required, for example by multiplying by the reciprocal value inside loops.
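The reciprocal trick mentioned above can be sketched as follows. This is a minimal illustration, not code from the question: the function names `scale_by_division` and `scale_by_reciprocal` are made up for the example, and it assumes the divisor stays constant for the whole loop.

```cpp
#include <cassert>
#include <vector>

// Baseline: one floating-point division per element.
void scale_by_division(std::vector<double>& values, double divisor)
{
    assert(divisor != 0.0);
    for (double& v : values)
        v = v / divisor;
}

// Same operation with the division hoisted out of the loop:
// one division up front, then one multiplication per element.
// The result can differ from the baseline in the last bit, because
// v * (1 / d) is rounded twice while v / d is rounded once.
void scale_by_reciprocal(std::vector<double>& values, double divisor)
{
    assert(divisor != 0.0);
    const double reciprocal = 1.0 / divisor;
    for (double& v : values)
        v = v * reciprocal;
}
```

Calling `scale_by_reciprocal(data, 1.1111)` on a `std::vector<double>` replaces every per-element division with a multiplication. A compiler only does this substitution on its own when a fast-math option is enabled, since the two forms are not bit-for-bit identical.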

From the Intel® 64 and IA-32 Architectures Optimization Reference Manual:

Quote:

Assembly/Compiler Coding Rule 4. (M impact, M generality) Favor SSE floating-point instructions over x87 floating point instructions.
Assembly/Compiler Coding Rule 5. (MH impact, M generality) Run with exceptions masked and the DAZ and FTZ flags set (whenever possible).
Tuning Suggestion 5. Use the perfmon counters MACHINE_CLEARS.FP_ASSIST to see if floating exceptions are impacting program performance
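Coding Rule 5 can be tried with the SSE control-register macros that the common x86 compilers ship in `<xmmintrin.h>` and `<pmmintrin.h>`. The sketch below assumes an x86 build that uses SSE for floating point (the default for 64-bit targets). The helper names `enable_ftz_daz` and `third_of_smallest_normal` are made up for the example. The setting applies to the calling thread only and does not affect x87 code.

```cpp
#include <cfloat>
#include <pmmintrin.h>  // _MM_SET_DENORMALS_ZERO_MODE
#include <xmmintrin.h>  // _MM_SET_FLUSH_ZERO_MODE

// Sets the FTZ (flush-to-zero) and DAZ (denormals-are-zero) bits in MXCSR
// for the calling thread. Results are then no longer strictly IEEE 754:
// denormal results and denormal inputs are treated as zero.
void enable_ftz_daz()
{
    _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
    _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);
}

// Divides the smallest normal float by 3. Without FTZ the result is a
// denormal number; with FTZ enabled the hardware returns zero instead.
// volatile keeps the compiler from folding the division at compile time.
float third_of_smallest_normal()
{
    volatile float x = FLT_MIN;
    volatile float three = 3.0f;
    return x / three;
}
```

Calling `enable_ftz_daz()` once at the start of each worker thread is enough. Whether it helps a given program depends on how often denormal values occur, which is what the MACHINE_CLEARS.FP_ASSIST counter from the tuning suggestion is meant to reveal.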

