Filter (search and replace) array of bytes in an InputStream

Question

I have an InputStream that takes an HTML file as its input. I have to get the bytes from the input stream.

I have a string: "XYZ". I'd like to convert this string to bytes and check whether it matches anywhere in the byte sequence I obtained from the InputStream. If it does, I have to replace the match with the byte sequence for some other string.

Can anyone help me with this? I have used regex to find and replace in strings. However, I don't know how to find and replace within a byte stream.

Previously, I used jsoup to parse the HTML and replace the string; however, due to some UTF encoding problems, the file appeared corrupted when I did that.

TL;DR: My question is:

Is there a way to find and replace a string in byte format in a raw InputStream in Java?

Answer

I'm not sure you have chosen the best approach to solve your problem.

That said, I don't like to (and have a policy not to) answer questions with "don't", so here goes...

Have a look at FilterInputStream.

From the documentation:

A FilterInputStream contains some other input stream, which it uses as its basic source of data, possibly transforming the data along the way or providing additional functionality.
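
To make the "transforming the data along the way" part concrete, here is a minimal sketch of a FilterInputStream subclass. The class names UpperCaseAsciiInputStream and UpperCaseDemo are made up for illustration, and the transformation (upper-casing ASCII letters) is deliberately trivial; it only shows where the hook points are.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical example class: upper-cases ASCII letters as they pass through.
class UpperCaseAsciiInputStream extends FilterInputStream {

    UpperCaseAsciiInputStream(InputStream in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        int b = super.read();                 // 0..255, or -1 at end of stream
        return (b >= 'a' && b <= 'z') ? b - ('a' - 'A') : b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);    // n is -1 at end of stream
        for (int i = off; i < off + n; i++)   // loop body is skipped when n <= 0
            if (buf[i] >= 'a' && buf[i] <= 'z')
                buf[i] -= 'a' - 'A';
        return n;
    }
}

class UpperCaseDemo {
    public static void main(String[] args) throws IOException {
        InputStream in = new UpperCaseAsciiInputStream(
                new ByteArrayInputStream("hello".getBytes(StandardCharsets.US_ASCII)));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != -1)
            out.write(b);
        System.out.println(new String(out.toByteArray(), StandardCharsets.US_ASCII));
    }
}
```

A one-to-one byte mapping like this needs no buffering. Search and replace is harder because a match can span several bytes, so the filter has to look ahead before it knows what to emit.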

It was a fun exercise to write it up. Here's a complete example:

import java.io.*;
import java.util.*;

class ReplacingInputStream extends FilterInputStream {

    // Look-ahead bytes read from the wrapped stream (-1 marks end of stream).
    private final LinkedList<Integer> inQueue = new LinkedList<Integer>();
    // Bytes already decided on, waiting to be returned by read().
    private final LinkedList<Integer> outQueue = new LinkedList<Integer>();
    private final byte[] search, replacement;

    protected ReplacingInputStream(InputStream in,
                                   byte[] search,
                                   byte[] replacement) {
        super(in);
        // An empty pattern would match everywhere without consuming input.
        if (search.length == 0)
            throw new IllegalArgumentException("search must not be empty");
        this.search = search;
        this.replacement = replacement;
    }

    private boolean isMatchFound() {
        Iterator<Integer> inIter = inQueue.iterator();
        for (int i = 0; i < search.length; i++)
            // Mask to 0..255: read() returns unsigned values, byte is signed.
            if (!inIter.hasNext() || (search[i] & 0xFF) != inIter.next())
                return false;
        return true;
    }

    private void readAhead() throws IOException {
        // Work up some look-ahead.
        while (inQueue.size() < search.length) {
            int next = super.read();
            inQueue.offer(next);
            if (next == -1)
                break;
        }
    }

    @Override
    public int read() throws IOException {
        // Loop rather than test once: a match with an empty replacement
        // produces no output, so another round is needed.
        while (outQueue.isEmpty()) {
            readAhead();

            if (isMatchFound()) {
                for (int i = 0; i < search.length; i++)
                    inQueue.remove();

                // Mask so that bytes >= 0x80 are not mistaken for -1 (EOF).
                for (byte b : replacement)
                    outQueue.offer(b & 0xFF);
            } else
                outQueue.add(inQueue.remove());
        }

        return outQueue.remove();
    }

    // TODO: Override the other read methods.
}
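
The TODO matters in practice: FilterInputStream's read(byte[], int, int) delegates straight to the wrapped stream, so callers using bulk reads would bypass the replacement entirely. Below is one possible way to close that gap, as a sketch rather than a tuned implementation. It repeats the same queue-based idea in a self-contained class with a hypothetical name (BulkReplacingInputStream), uses an explicit end-of-stream flag instead of queuing -1, and routes bulk reads through the single-byte read().

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.Iterator;

// Hypothetical name; same look-ahead idea as above plus a bulk read override.
class BulkReplacingInputStream extends FilterInputStream {

    private final ArrayDeque<Integer> inQueue = new ArrayDeque<>();
    private final ArrayDeque<Integer> outQueue = new ArrayDeque<>();
    private final byte[] search, replacement;
    private boolean eof;

    BulkReplacingInputStream(InputStream in, byte[] search, byte[] replacement) {
        super(in);
        if (search.length == 0)
            throw new IllegalArgumentException("search must not be empty");
        this.search = search;
        this.replacement = replacement;
    }

    private boolean matches() {
        if (inQueue.size() < search.length)
            return false;
        Iterator<Integer> it = inQueue.iterator();
        for (byte s : search)
            if ((s & 0xFF) != it.next())      // compare as unsigned values
                return false;
        return true;
    }

    @Override
    public int read() throws IOException {
        while (outQueue.isEmpty()) {
            // Fill the look-ahead window until it is full or the source ends.
            while (!eof && inQueue.size() < search.length) {
                int next = super.read();
                if (next == -1) eof = true;
                else inQueue.offer(next);
            }
            if (inQueue.isEmpty())
                return -1;
            if (matches()) {
                for (int i = 0; i < search.length; i++)
                    inQueue.remove();
                for (byte b : replacement)
                    outQueue.offer(b & 0xFF);
            } else {
                outQueue.offer(inQueue.remove());
            }
        }
        return outQueue.remove();
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        if (off < 0 || len < 0 || len > b.length - off)
            throw new IndexOutOfBoundsException();
        if (len == 0)
            return 0;
        int n = 0;
        int c;
        // Pull filtered bytes one at a time until the buffer is full or EOF.
        while (n < len && (c = read()) != -1)
            b[off + n++] = (byte) c;
        return n == 0 ? -1 : n;
    }

    @Override
    public boolean markSupported() {
        return false;   // the queues are not rewound by mark/reset
    }
}

class BulkReadDemo {
    public static void main(String[] args) throws IOException {
        InputStream in = new BulkReplacingInputStream(
                new ByteArrayInputStream("hello xyz world.".getBytes(StandardCharsets.UTF_8)),
                "xyz".getBytes(StandardCharsets.UTF_8),
                "abc".getBytes(StandardCharsets.UTF_8));
        byte[] buf = new byte[64];
        int n = in.read(buf, 0, buf.length);
        System.out.println(new String(buf, 0, n, StandardCharsets.UTF_8)); // hello abc world.
    }
}
```

FilterInputStream.read(byte[]) calls read(b, 0, b.length), so it is covered by the override. skip() and available() still delegate to the wrapped stream and are left untouched in this sketch; a production version would need to handle them as well.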

Example usage

class Test {
    public static void main(String[] args) throws Exception {

        byte[] bytes = "hello xyz world.".getBytes("UTF-8");

        ByteArrayInputStream bis = new ByteArrayInputStream(bytes);

        byte[] search = "xyz".getBytes("UTF-8");
        byte[] replacement = "abc".getBytes("UTF-8");

        InputStream ris = new ReplacingInputStream(bis, search, replacement);

        ByteArrayOutputStream bos = new ByteArrayOutputStream();

        // Copy the filtered stream byte by byte until end of stream.
        int b;
        while (-1 != (b = ris.read()))
            bos.write(b);

        // Decode with the same charset that was used to encode the input.
        System.out.println(new String(bos.toByteArray(), "UTF-8"));

    }
}

Given the bytes for the string "hello xyz world." it prints:

hello abc world.
