**忙碌**如何使用sun.misc.Unsafe加快byte []查找的速度,以使其更快? [英] **BUSTED** How to speed up a byte[] lookup to be faster using sun.misc.Unsafe?
问题描述
我正在尝试使用Unsafe遍历内存,而不是遍历byte []中的值.使用不安全分配内存块.该存储器足以容纳65536个字节的值.
I am experimenting with Unsafe to iterate over memory instead of iterating over the values in a byte[]. A memory block is allocated using unsafe. The memory is sufficient to hold 65536 byte values.
我正在尝试:
char aChar = some character
if ((byte) 0 == (unsafe.getByte(base_address + aChar) & mask)){
// do something
}
代替:
char aChar = some character
if ((byte) 0 == ( lookup[aChar] & mask )){
// do something
}
我认为不安全可以比使用常规数组访问对每个索引执行的索引检查更快地访问内存...
I thought Unsafe could access the memory faster than using a regular array access with the index check it does for each index...
只是一厢情愿地认为jvm将具有一个特殊的op(不安全),它将以某种方式使常规数组访问和迭代更快.在我看来,jvm可以与普通的byte []迭代一起正常工作,并且可以使用普通的纯净Java代码尽可能快地完成它们.
It was only wishful thinking that the jvm would have a special op (unsafe) that would somehow make regular array access and iteration faster. The jvm, it seems to me, works fine with normal byte[] iterations and does them, fast as can be, using normal, unadulterated, vanilla java code.
@millimoose击中了众所周知的钉在头上"
@millimoose hits the proverbial 'nail on the head'
不安全可能对很多事情都有用,但是这种微优化级别不是其中之一.–毫摩尔"
"Unsafe might be useful for a lot of things, but this level of microoptimisation isn't one of them. – millimoose"
在非常严格的有限情况下,使用不安全"会更快:
Using Unsafe is faster in a very strict limited set of circumstances:
- (仅适用于64位jvm),单个65535字节[]查找的速度更快.每次测试精确完成一次.在这种情况下,64位jvm上的UnsafeLookup_8B速度提高了24%.如果测试重复进行,以便每个测试执行两次,则正常方法现在比不安全方法快30%.在冷jvm上的纯解释模式下,Unsafe到目前为止要快得多,但是这只是第一次,并且仅适用于较小的阵列大小.在32位标准Oracle JVM 7.x上,正常速度比不安全模式快三倍.
- (64bit jvm only) faster for a single 65535 byte[] lookup done exactly once for each test. In this case UnsafeLookup_8B on 64_bit jvm is 24% faster. If the test repeats itself so that each test is done twice, the normal method is now 30% faster than unsafe. In pure interpreted mode on a cold jvm, the Unsafe is faster by far --- but only the first time and only for a small array size. ON a 32 bit standard Oracle JVM 7.x, the normal is three times faster than using unsafe.
(在我的测试中)使用不安全"的速度较慢:
Using Unsafe (in my tests) is slower:
- 在Oracle Java 64位和32位虚拟机上都变慢
- 无论操作系统和机器架构(32位和64位)如何,都较慢
-
变慢,即使调用
server
jvm选项
Unsafe从9%或更高的速度变慢(下面代码在32位jvm上的速度为1_GB阵列和UnsafeLookup_8B(最快的速度)(64位甚至更慢?))
Unsafe is slower from 9% or more ( 1_GB array and UnsafeLookup_8B(fastest one) in code below on 32 bit jvm (64bit was even slower??))
有什么原因吗?**
当我运行下面发布的代码yellowB(检查1GB字节[])时,正常速度还是最快的:
When I run the code yellowB posted below (checks a 1GB byte[]), the normal is also still the fastest:
C:\Users\wilf>java -Xms1600m -Xprof -jar "S:\wilf\testing\dist\testing.jar"
initialize data...
initialize data done!
use normalLookup()...
Not found '0'
time : 1967737 us.
use unsafeLookup_1B()...
Not found '0'
time : 2923367 us.
use unsafeLookup_8B()...
Not found '0'
time : 2495663 us.
Flat profile of 26.35 secs (2018 total ticks): main
Interpreted + native Method
0.0% 1 + 0 test.StackOverflow.main
0.0% 1 + 0 Total interpreted
Compiled + native Method
67.8% 1369 + 0 test.StackOverflow.main
11.7% 236 + 0 test.StackOverflow.unsafeLookup_8B
11.2% 227 + 0 test.StackOverflow.unsafeLookup_1B
9.1% 184 + 0 test.StackOverflow.normalLookup
99.9% 2016 + 0 Total compiled
Stub + native Method
0.0% 0 + 1 sun.misc.Unsafe.getLong
0.0% 0 + 1 Total stub
Flat profile of 0.00 secs (1 total ticks): DestroyJavaVM
Thread-local ticks:
100.0% 1 Blocked (of total)
Global summary of 26.39 seconds:
100.0% 2023 Received ticks
C:\Users\wilf>java -version
java version "1.7.0_07"
Java(TM) SE Runtime Environment (build 1.7.0_07-b11)
Java HotSpot(TM) Client VM (build 23.3-b01, mixed mode, sharing)
CPU是:Intel Core 2 Duo E4600 @ 2.4GHZ 4.00GB(可用3.25GB)操作系统:Windows 7(32)
CPU is: Intel Core 2 Duo E4600 @ 2.4GHZ 4.00GB (3.25GB usable) OS: Windows 7 (32)
在具有Windows 7_64、32位Java的4核AMD64上运行测试:
Running the test on an 4 Core AMD64 with Windows 7_64, 32 bit java:
initialize data...
initialize data done!
use normalLookup()...
Not found '0'
time : 1631142 us.
use unsafeLookup_1B()...
Not found '0'
time : 2365214 us.
use unsafeLookup_8B()...
Not found '0'
time : 1783320 us.
在具有Windows 7_64、64位Java的4核AMD64上运行测试:
Running the test on an 4 Core AMD64 with Windows 7_64, 64 bit java:
use normalLookup()...
Not found '0'
time : 655146 us.
use unsafeLookup_1B()...
Not found '0'
time : 904783 us.
use unsafeLookup_8B()...
Not found '0'
time : 764427 us.
Flat profile of 6.34 secs (13 total ticks): main
Interpreted + native Method
23.1% 3 + 0 java.io.PrintStream.println
23.1% 3 + 0 test.StackOverflow.unsafeLookup_8B
15.4% 2 + 0 test.StackOverflow.main
7.7% 1 + 0 java.io.DataInputStream.<init>
69.2% 9 + 0 Total interpreted
Compiled + native Method
7.7% 0 + 1 test.StackOverflow.unsafeLookup_1B
7.7% 0 + 1 test.StackOverflow.main
7.7% 0 + 1 test.StackOverflow.normalLookup
7.7% 0 + 1 test.StackOverflow.unsafeLookup_8B
30.8% 0 + 4 Total compiled
Flat profile of 0.00 secs (1 total ticks): DestroyJavaVM
Thread-local ticks:
100.0% 1 Blocked (of total)
Global summary of 6.35 seconds:
100.0% 14 Received ticks
42.9% 6 Compilation
推荐答案
我认为您发布的两个函数基本相同,因为它们只读取1个字节,然后将其转换为int并进行进一步的比较.
I think the two functions you posted are basically the same because they read only 1 byte and then convert it to int and do the futher comparing.
每次读取4字节int或8字节长的代码要有效得多.我编写了两个函数来做同样的事情:比较两个byte []的内容以查看它们是否相同:
Reading 4-Byte int or 8-Byte long every time is much more effective.I wrote two function to do the same thing:compare the content of two byte[] to see if they are the same:
功能1:
public static boolean hadoopEquals(byte[] b1, byte[] b2)
{
if(b1 == b2)
{
return true;
}
if(b1.length != b2.length)
{
return false;
}
// Bring WritableComparator code local
for(int i = 0;i < b1.length; ++i)
{
int a = (b1[i] & 0xff);
int b = (b2[i] & 0xff);
if (a != b)
{
return false;
}
}
return true;
}
功能2:
public static boolean goodEquals(byte[] b1,byte[] b2)
{
if(b1 == b2)
{
return true;
}
if(b1.length != b2.length)
{
return false;
}
int baseOffset = UnSafe.arrayBaseOffset(byte[].class);
int numLongs = (int)Math.ceil(b1.length / 8.0);
for(int i = 0;i < numLongs; ++i)
{
long currentOffset = baseOffset + (i * 8);
long l1 = UnSafe.getLong(b1, currentOffset);
long l2 = UnSafe.getLong(b2, currentOffset);
if(0L != (l1 ^ l2))
{
return false;
}
}
return true;
}
我在笔记本电脑上运行了这两个功能(corei7 2630QM,8GB DDR3、64bit win 7、64bit Hotspot JVM),并比较了两个400MB字节[],结果如下:
I ran these two functions on my laptop(corei7 2630QM , 8GB DDR3 , 64bit win 7 , 64bit Hotspot JVM),and compare two 400MB byte[],result is below:
功能1:〜670ms
function 1: ~670ms
功能2:〜80ms
2更快.
所以我的建议是每次读取8字节并使用XOR运算符(^):
So my suggestion is to read 8-byte every time and use the XOR operator(^):
long l1 = UnSafe.getLong(byteArray, offset); //8 byte
if(0L == l1 ^ 0xFF) //if the lowest byte == 0?
/* do something */
if(0L == l1 ^ 0xFF00) //if the 2nd lowest byte == 0?
/* do something */
/* go on... */
============================================================================
============================================================================
威尔夫,我使用您的代码制作一个如下的测试类,该类在查找字节数组中的第一个0时比较3个函数之间的速度:
Hi Wilf, I use your code to make a test class as below,this class compare the speed among 3 functions in looking up the 1st 0 in a byte array:
package test;
import java.lang.reflect.Field;
import sun.misc.Unsafe;
/**
* Test the speed in looking up the 1st 0 in a byte array
* Set -Xms the same as -Xms to avoid Heap reallocation
*
* @author yellowb
*
*/
public class StackOverflow
{
public static Unsafe UnSafe;
public static Unsafe getUnsafe() throws SecurityException,
NoSuchFieldException, IllegalArgumentException,
IllegalAccessException
{
Field theUnsafe = Unsafe.class.getDeclaredField("theUnsafe");
theUnsafe.setAccessible(true);
Unsafe unsafe = (Unsafe) theUnsafe.get(null);
return unsafe;
}
/**
* use 'byte[index]' form to read 1 byte every time
* @param buf
*/
public static void normalLookup(byte[] buf)
{
for (int i = 0; i < buf.length; ++i)
{
if ((byte) 0 == buf[i])
{
System.out.println("The 1st '0' is at position : " + i);
return;
}
}
System.out.println("Not found '0'");
}
/**
* use Unsafe.getByte to read 1 byte every time directly from the memory
* @param buf
*/
public static void unsafeLookup_1B(byte[] buf)
{
int baseOffset = UnSafe.arrayBaseOffset(byte[].class);
for (int i = 0; i < buf.length; ++i)
{
byte b = UnSafe.getByte(buf, (long) (baseOffset + i));
if (0 == ((int) b & 0xFF))
{
System.out.println("The 1st '0' is at position : " + i);
return;
}
}
System.out.println("Not found '0'");
}
/**
* use Unsafe.getLong to read 8 byte every time directly from the memory
* @param buf
*/
public static void unsafeLookup_8B(byte[] buf)
{
int baseOffset = UnSafe.arrayBaseOffset(byte[].class);
//The first (numLongs * 8) bytes will be read by Unsafe.getLong in below loop
int numLongs = buf.length / 8;
long currentOffset = 0L;
for (int i = 0; i < numLongs; ++i)
{
currentOffset = baseOffset + (i * 8); //the step is 8 bytes
long l = UnSafe.getLong(buf, currentOffset);
//Compare each byte(in the 8-Byte long) to 0
//PS:x86 cpu is little-endian mode
if (0L == (l & 0xFF))
{
System.out.println("The 1st '0' is at position : " + (i * 8));
return;
}
if (0L == (l & 0xFF00L))
{
System.out.println("The 1st '0' is at position : " + (i * 8 + 1));
return;
}
if (0L == (l & 0xFF0000L))
{
System.out.println("The 1st '0' is at position : " + (i * 8 + 2));
return;
}
if (0L == (l & 0xFF000000L))
{
System.out.println("The 1st '0' is at position : " + (i * 8 + 3));
return;
}
if (0L == (l & 0xFF00000000L))
{
System.out.println("The 1st '0' is at position : " + (i * 8 + 4));
return;
}
if (0L == (l & 0xFF0000000000L))
{
System.out.println("The 1st '0' is at position : " + (i * 8 + 5));
return;
}
if (0L == (l & 0xFF000000000000L))
{
System.out.println("The 1st '0' is at position : " + (i * 8 + 6));
return;
}
if (0L == (l & 0xFF00000000000000L))
{
System.out.println("The 1st '0' is at position : " + (i * 8 + 7));
return;
}
}
//If some rest bytes exists
int rest = buf.length % 8;
if(0 != rest)
{
currentOffset = currentOffset + 8;
//Because the length of rest bytes < 8,we have to read them one by one
for(; currentOffset < (baseOffset + buf.length); ++currentOffset)
{
byte b = UnSafe.getByte(buf, (long)currentOffset);
if (0 == ((int) b & 0xFF))
{
System.out.println("The 1st '0' is at position : " + (currentOffset - baseOffset));
return;
}
}
}
System.out.println("Not found '0'");
}
public static void main(String[] args) throws SecurityException,
NoSuchFieldException, IllegalArgumentException,
IllegalAccessException
{
UnSafe = getUnsafe();
int len = 1024 * 1024 * 1024; //1G
long startTime = 0L;
long endTime = 0L;
System.out.println("initialize data...");
byte[] byteArray1 = new byte[len];
for (int i = 0; i < len; ++i)
{
byteArray1[i] = (byte) (i % 128 + 1); //No byte will equal to 0
}
//If you want to set one byte to 0,uncomment the below statement
// byteArray1[2500] = (byte)0;
System.out.println("initialize data done!");
System.out.println("use normalLookup()...");
startTime = System.nanoTime();
normalLookup(byteArray1);
endTime = System.nanoTime();
System.out.println("time : " + ((endTime - startTime) / 1000) + " us.");
System.out.println("use unsafeLookup_1B()...");
startTime = System.nanoTime();
unsafeLookup_1B(byteArray1);
endTime = System.nanoTime();
System.out.println("time : " + ((endTime - startTime) / 1000) + " us.");
System.out.println("use unsafeLookup_8B()...");
startTime = System.nanoTime();
unsafeLookup_8B(byteArray1);
endTime = System.nanoTime();
System.out.println("time : " + ((endTime - startTime) / 1000) + " us.");
}
}
输出为:
initialize data...
initialize data done!
use normalLookup()...
Not found '0'
time : 1271781 us.
use unsafeLookup_1B()...
Not found '0'
time : 716898 us.
use unsafeLookup_8B()...
Not found '0'
time : 591689 us.
结果表明,即使每次使用Unsafe.getByte()读取1个字节,也比定期迭代byte []快得多.而读取8字节长的速度最快.
the result shows that even reading 1 byte every time by Unsafe.getByte() is much faster than iterating the byte[] regularly.And reading 8-byte long is the fastest.
这篇关于**忙碌**如何使用sun.misc.Unsafe加快byte []查找的速度,以使其更快?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!