Find indexOf a byte array within another byte array


Question

Given a byte array, how can I find the position of a (smaller) byte array within it?

This documentation looked promising, using ArrayUtils, but if I'm reading it correctly, it would only let me find an individual byte within the array to be searched.

(I can't see it mattering, but just in case: sometimes the search byte array will contain regular ASCII characters, and other times control characters or extended ASCII characters, so String operations would not always be appropriate.)

The large array could be between 10 and about 10,000 bytes, and the smaller array around 10 bytes. In some cases I will have several smaller arrays that I want to find within the larger array in a single search. At times I will also want the last index of a match rather than the first.

Recommended answer

Java strings are composed of 16-bit chars, not 8-bit bytes. A char can hold any byte value, so you can convert your byte arrays to strings and use indexOf, provided you decode with a single-byte charset such as ISO-8859-1 that maps each byte to exactly one char (UTF-8 would corrupt bytes above 0x7F). With that in place, ASCII characters, control characters, extended characters, and even zero bytes all work fine.

Here is a demo:

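import java.nio.charset.StandardCharsets;

byte[] big = new byte[] {1,2,3,0,4,5,6,7,0,8,9,0,0,1,2,3,4};
byte[] small = new byte[] {7,0,8,9,0,0,1};
// ISO_8859_1 maps every byte value to exactly one char; UTF_8 would mangle bytes above 0x7F
String bigStr = new String(big, StandardCharsets.ISO_8859_1);
String smallStr = new String(small, StandardCharsets.ISO_8859_1);
System.out.println(bigStr.indexOf(smallStr));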

This prints 7.
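The question also asks for the last match and mentions extended ASCII bytes. The following is a minimal sketch of how the same string-based approach might cover both, assuming ISO_8859_1 decoding so that every byte maps to exactly one char. The class name Latin1Search and its two helper methods are hypothetical names chosen for illustration, not part of any library.

```java
import java.nio.charset.StandardCharsets;

class Latin1Search {
    // Hypothetical helper: first position of needle in haystack, or -1 if absent.
    static int indexOf(byte[] haystack, byte[] needle) {
        String h = new String(haystack, StandardCharsets.ISO_8859_1);
        String n = new String(needle, StandardCharsets.ISO_8859_1);
        return h.indexOf(n);
    }

    // Hypothetical helper: last position of needle in haystack, or -1 if absent.
    static int lastIndexOf(byte[] haystack, byte[] needle) {
        String h = new String(haystack, StandardCharsets.ISO_8859_1);
        String n = new String(needle, StandardCharsets.ISO_8859_1);
        return h.lastIndexOf(n);
    }

    public static void main(String[] args) {
        // Bytes above 0x7F and zero bytes, with the pattern occurring twice.
        byte[] big = {(byte) 0xFF, 0, (byte) 0x80, 7, (byte) 0xFF, 0, (byte) 0x80};
        byte[] small = {(byte) 0xFF, 0, (byte) 0x80};
        System.out.println(indexOf(big, small));     // prints 0
        System.out.println(lastIndexOf(big, small)); // prints 4
    }
}
```

Because each char index corresponds to one byte index under this charset, the returned positions can be used directly as offsets into the original byte array.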

However, considering that your large array could be up to 10,000 bytes and the small array is only about ten bytes, this solution may not be the most efficient, for two reasons:


  • It requires copying your big array into a char array that takes twice as much memory (same length, but with 16-bit chars instead of 8-bit bytes), so the original plus the copy triples your memory requirements. (On Java 9 and later, compact strings store Latin-1 text at one byte per char, so the copy is typically the same size as the original.)
  • Java's string search algorithm is not the fastest one available. You may get significantly faster results if you implement one of the more advanced algorithms, for example Knuth–Morris–Pratt. This could potentially reduce the execution time by a factor of up to ten (the length of the small array), and requires additional memory proportional to the length of the small array, not the big one.
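To make the second point concrete, here is a minimal sketch of Knuth–Morris–Pratt applied directly to byte arrays, which avoids the string copy entirely. The class name KmpByteSearch and its method names are assumptions chosen for illustration; only the first match is returned, and whether this actually beats String.indexOf on inputs this small would need to be measured.

```java
class KmpByteSearch {
    // Builds the KMP failure table: fail[i] is the length of the longest proper
    // prefix of pattern[0..i] that is also a suffix of pattern[0..i].
    static int[] failureTable(byte[] pattern) {
        int[] fail = new int[pattern.length];
        int k = 0; // length of the currently matched prefix
        for (int i = 1; i < pattern.length; i++) {
            while (k > 0 && pattern[i] != pattern[k]) {
                k = fail[k - 1]; // fall back to the next shorter border
            }
            if (pattern[i] == pattern[k]) {
                k++;
            }
            fail[i] = k;
        }
        return fail;
    }

    // Returns the index of the first occurrence of pattern in text, or -1.
    static int indexOf(byte[] text, byte[] pattern) {
        if (pattern.length == 0) {
            return 0; // same convention as String.indexOf("")
        }
        int[] fail = failureTable(pattern); // extra memory: O(pattern.length)
        int k = 0; // number of pattern bytes matched so far
        for (int i = 0; i < text.length; i++) {
            while (k > 0 && text[i] != pattern[k]) {
                k = fail[k - 1]; // reuse the partial match instead of rescanning text
            }
            if (text[i] == pattern[k]) {
                k++;
            }
            if (k == pattern.length) {
                return i - k + 1; // start index of the match
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] big = {1, 2, 3, 0, 4, 5, 6, 7, 0, 8, 9, 0, 0, 1, 2, 3, 4};
        byte[] small = {7, 0, 8, 9, 0, 0, 1};
        System.out.println(indexOf(big, small)); // prints 7
    }
}
```

The text index i never moves backwards, so the search does a number of comparisons linear in the combined length of both arrays, and the only allocation is the failure table sized to the small array.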
