Java NIO scan through ByteBuffer for certain bytes and work with sections


Question

Okay, so I'm trying to do something that seemed like it should be fairly simple, but these new NIO interfaces are confusing the hell out of me! I need to scan through a file as bytes until I encounter certain bytes. When I do, I need to grab that segment of the data, do something with it, and then move on and repeat. I would have thought that with all the marks, positions and limits in ByteBuffer I'd be able to do this, but I can't seem to make it work! Here's what I have so far:

test.txt:

this is a line of text a
this is line 2b
line 3
line 4
line etc.etc.etc.

Test.java:


import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.Charset;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class Test {
    public static final Charset ENCODING = Charset.forName("UTF-8");
    public static final byte[] NEWLINE_BYTE = {0x0A, 0x0D};

    public Test() {

        String pathString = "test.txt";

        //the path to the file
        Path path = Paths.get(pathString);

        try (FileChannel fc = FileChannel.open(path, 
                StandardOpenOption.READ, StandardOpenOption.WRITE, StandardOpenOption.CREATE)) {            
            if (fc.size() > 0) {
                int n;
                ByteBuffer buffer = ByteBuffer.allocate((int) fc.size());
                do {                    
                    n = fc.read(buffer);
                } while (n != -1 && buffer.hasRemaining());
                buffer.flip();
                int pos = 0;
                System.out.println("FILE LOADED: |" + new String(buffer.array(), ENCODING) + "|");
                do {
                    byte b = buffer.get();
                    if (b == NEWLINE_BYTE[0] || b == NEWLINE_BYTE[1]) {
                        System.out.println("POS: " + pos);
                        System.out.println("POSITION: " + buffer.position());
                        System.out.println("LENGTH: " + Integer.toString(buffer.position() - pos));
                        ByteBuffer lineBuffer = ByteBuffer.wrap(buffer.array(), pos + 1, buffer.position() - pos);
                        System.out.println("LINE: |" + new String(lineBuffer.array(), ENCODING) + "|");
                        pos = buffer.position();
                    }
                } while (buffer.hasRemaining());
            } 
        } catch (IOException ioe) {
           ioe.printStackTrace();
        }
    }
    public static void main(String args[]) {
        Test t = new Test();
    }
}

So the first part is working: the fc.read(buffer) call only ever runs once and pulls the entire file into the ByteBuffer. Then, in the second do loop, I'm able to loop through byte by byte just fine, and it does hit the if statement when it reaches a \n (or \r). But I can't figure out how to get the PORTION of the bytes I've just looked through into a separate byte array to work with! I've tried slice and various flips, and I've tried wrap as shown in the code above, but I can't make it work: both buffers always contain the complete file, and so does anything I slice or wrap off it!

I just need to loop through the file byte by byte, looking at one section at a time. My end goal is that, once I've found the right spot, I can insert some data there. I need the lineBuffer printed at "LINE: " to contain ONLY the portion of the bytes I've looped through so far! Help, and thank you!

Recommended answer

Here is the solution I ended up with, using the bulk relative get function of ByteBuffer to copy out each chunk. I think I'm using the mark() functionality as intended, though I use an additional variable (pos) to remember where the scan stopped, since I can't find a function in ByteBuffer that returns the position of the mark itself. I've also got explicit handling for \r, \n, or both in sequence. Keep in mind this code assumes UTF-8 (or another encoding in which \r and \n are the single bytes 0x0D and 0x0A); it will not work on UTF-16 data, for example. I hope this helps someone else.

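import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class Test {
    public static final Charset ENCODING = StandardCharsets.UTF_8;
    //NEWLINE_BYTES[0] is \n (LF), NEWLINE_BYTES[1] is \r (CR)
    public static final byte[] NEWLINE_BYTES = {0x0A, 0x0D};

    public Test() {
        //test text file: a sequence of any strings, each followed by a newline
        String pathString = "test.txt";
        Path path = Paths.get(pathString);

        try (FileChannel fc = FileChannel.open(path,
                StandardOpenOption.READ, StandardOpenOption.WRITE, StandardOpenOption.CREATE)) {

            if (fc.size() > 0) {
                int n;
                //assumes the whole file fits in one buffer (size <= Integer.MAX_VALUE)
                ByteBuffer buffer = ByteBuffer.allocate((int) fc.size());
                do {
                    n = fc.read(buffer);
                } while (n != -1 && buffer.hasRemaining());
                buffer.flip();
                //mark the start of the first line
                buffer.mark();
                while (buffer.hasRemaining()) {
                    //get one byte at a time
                    byte b = buffer.get();

                    if (b == NEWLINE_BYTES[0] || b == NEWLINE_BYTES[1]) {
                        int newlineByteCount = 1;

                        //treat \r\n as one line break; only peek if a byte is left,
                        //otherwise get() throws BufferUnderflowException at end of file
                        if (b == NEWLINE_BYTES[1] && buffer.hasRemaining()) {
                            if (buffer.get() == NEWLINE_BYTES[0]) {
                                newlineByteCount++;
                            } else {
                                buffer.position(buffer.position() - 1);
                            }
                        }

                        int pos = buffer.position();
                        //reset the buffer back to the mark() position
                        buffer.reset();
                        //create an array just the right length and get the bytes we just measured out
                        int length = pos - buffer.position() - newlineByteCount;
                        byte[] lineBytes = new byte[length];
                        buffer.get(lineBytes, 0, length);

                        String lineString = new String(lineBytes, ENCODING);
                        System.out.println("LINE: " + lineString);

                        //skip the newline bytes and mark the start of the next line
                        buffer.position(buffer.position() + newlineByteCount);
                        buffer.mark();
                    }
                }
                //emit a final line that has no trailing newline, if any
                buffer.reset();
                if (buffer.hasRemaining()) {
                    byte[] lastBytes = new byte[buffer.remaining()];
                    buffer.get(lastBytes);
                    System.out.println("LINE: " + new String(lastBytes, ENCODING));
                }
            }
        } catch (IOException ioe) {
            ioe.printStackTrace();
        }
    }

    public static void main(String[] args) {
        new Test();
    }
}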
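
For anyone wondering why the wrap() attempt in the question always printed the whole file: wrap(array, offset, length) only sets the new buffer's position and limit, and array() still returns the entire backing array. The sketch below shows that on a small in-memory buffer, along with the mark / reset / bulk get pattern in isolation. The class name SegmentDemo and the helper copySinceMark are made up for illustration and are not part of the code above.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class SegmentDemo {
    // Hypothetical helper: copies the bytes between the buffer's mark and its
    // current position. Assumes mark() was called earlier on this buffer.
    static byte[] copySinceMark(ByteBuffer buffer) {
        int end = buffer.position();
        buffer.reset();                        // position goes back to the mark
        byte[] segment = new byte[end - buffer.position()];
        buffer.get(segment);                   // bulk relative get; position is 'end' again
        return segment;
    }

    public static void main(String[] args) {
        byte[] data = "ab\ncd\n".getBytes(StandardCharsets.UTF_8);

        // wrap(array, offset, length) changes position and limit only;
        // array() hands back the complete backing array.
        ByteBuffer wrapped = ByteBuffer.wrap(data, 3, 2);
        System.out.println(wrapped.array().length);  // 6
        System.out.println(wrapped.remaining());     // 2

        ByteBuffer buffer = ByteBuffer.wrap(data);
        buffer.mark();                         // the segment starts at index 0
        buffer.position(2);                    // pretend the scan stopped at the '\n'
        byte[] first = copySinceMark(buffer);
        System.out.println(new String(first, StandardCharsets.UTF_8));  // ab
    }
}
```

If a copy is not needed, slice() or duplicate() with an adjusted position and limit gives a view over the same bytes, but calling array() on such a view runs into the same whole-array behavior.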

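The question's end goal, inserting data once the right spot is found, is not covered by the code above. A file has no in-place insert operation, so one possible approach is to read everything after the insertion offset, write the new bytes at that offset, and then write the saved tail back. This is only a rough sketch under stated assumptions: FileInsertDemo and insertAt are hypothetical names, and the tail of the file is assumed to fit in memory.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class FileInsertDemo {
    // Hypothetical helper: inserts 'data' at byte 'offset' by rewriting the
    // rest of the file. Assumes the bytes after 'offset' fit in memory.
    static void insertAt(Path path, long offset, byte[] data) throws IOException {
        try (FileChannel fc = FileChannel.open(path,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // save everything from 'offset' to the end of the file
            ByteBuffer tail = ByteBuffer.allocate((int) (fc.size() - offset));
            long readPos = offset;
            while (tail.hasRemaining()) {
                int n = fc.read(tail, readPos);  // absolute read, channel position unchanged
                if (n < 0) {
                    break;
                }
                readPos += n;
            }
            tail.flip();

            // write the new bytes at 'offset', then the saved tail after them
            fc.position(offset);
            ByteBuffer insert = ByteBuffer.wrap(data);
            while (insert.hasRemaining()) {
                fc.write(insert);
            }
            while (tail.hasRemaining()) {
                fc.write(tail);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("insert-demo", ".txt");
        Files.write(tmp, "line 1\nline 3\n".getBytes(StandardCharsets.UTF_8));
        // offset 7 is the first byte after "line 1\n"
        insertAt(tmp, 7, "line 2\n".getBytes(StandardCharsets.UTF_8));
        System.out.print(new String(Files.readAllBytes(tmp), StandardCharsets.UTF_8));
        Files.delete(tmp);
    }
}
```

The offset passed to insertAt would be the buffer position recorded during the scan, for example the position just after a line's newline bytes.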