Removing ASCII characters in a string with encoding


Question

I have a byte array that is filled in a serial port event handler. The code is shown below:

private InputStream input = null; 
......
......
public void serialEvent(SerialPortEvent se){
  if(se.getEventType() == SerialPortEvent.DATA_AVAILABLE){
    try {
      int length = input.available();
      if(length > 0){
        byte[] array = new byte[length];
        int numBytes = input.read(array);
        String text = new String(array);
      }
    } catch (IOException e) { // serialEvent cannot throw IOException
      e.printStackTrace();
    }
  }
}

The variable text contains the following escape sequences:

"\033[K", "\033[m", "\033[H2J", "\033[6;1H", "\033[?12l", "\033[?25h", "\033[5i", "\033[4i", "\033i" and similar sequences.

Currently, I use String.replace to remove all of these sequences from the string.
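For comparison, the replace-based approach described above amounts to one String.replace call per known sequence. A minimal sketch; the class name ReplaceStrip and the array KNOWN_SEQUENCES are hypothetical, and the list is the one given in the question. It also shows the weakness of this approach: any sequence that is not in the list, such as a different cursor position, survives.

```java
public class ReplaceStrip {

    // Hypothetical list: the sequences from the question, written with
    // Java's octal escape \033 (= ESC, 0x1B).
    static final String[] KNOWN_SEQUENCES = {
        "\033[K", "\033[m", "\033[H2J", "\033[6;1H", "\033[?12l",
        "\033[?25h", "\033[5i", "\033[4i", "\033i"
    };

    // Removes every occurrence of each known sequence, one replace per entry.
    static String strip(String text) {
        for (String seq : KNOWN_SEQUENCES) {
            text = text.replace(seq, "");
        }
        return text;
    }

    public static void main(String[] args) {
        System.out.println(strip("\033[KHello\033[6;1H"));       // Hello
        // "\033[7;1H" is not in the list, so the ESC byte is still there.
        System.out.println(strip("\033[7;1Hx").contains("\033")); // true
    }
}
```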

I have tried new String(array, charset) with every available charset, but none of them removed these sequences.
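Changing the charset cannot help: ESC (byte 0x1B) is a valid US-ASCII control character, so US-ASCII, ISO-8859-1 and UTF-8 all decode it to U+001B unchanged. A minimal check, assuming Java 7 or later for StandardCharsets; the class name EscSurvivesDecoding is only illustrative.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EscSurvivesDecoding {

    // True if decoding the bytes with the given charset still yields
    // the ESC control character (U+001B).
    static boolean keepsEscape(byte[] bytes, Charset charset) {
        String text = new String(bytes, charset);
        return text.indexOf(0x1B) >= 0;
    }

    public static void main(String[] args) {
        // Raw bytes for "\033[K" followed by ordinary text.
        byte[] bytes = {0x1B, '[', 'K', 'o', 'k'};
        Charset[] charsets = {
            StandardCharsets.US_ASCII,
            StandardCharsets.ISO_8859_1,
            StandardCharsets.UTF_8
        };
        for (Charset cs : charsets) {
            // Prints "true" for all three charsets.
            System.out.println(cs.name() + " keeps ESC: " + keepsEscape(bytes, cs));
        }
    }
}
```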

Is there any way to remove these sequences without using the replace method?

Recommended answer

I gave an unsatisfying answer earlier; thanks to @OlegEstekhin for pointing that out. As no one else has answered yet, and a solution is not a two-liner, here it is.

Make a wrapping InputStream that throws away escape sequences. I used a PushbackInputStream so that the bytes of a partially read sequence that turns out not to be an escape sequence can be pushed back and read again. A FilterInputStream would also suffice here.
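The push-back mechanism on its own: PushbackInputStream.unread returns bytes that were read speculatively to the front of the stream, so the next read sees them again. A minimal sketch; PushbackDemo and peekIsEscape are hypothetical names, not part of the answer's class.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.PushbackInputStream;

public class PushbackDemo {

    // Hypothetical helper: reads up to two bytes, reports whether they
    // start a CSI escape ("ESC ["), then pushes them back so the caller
    // can still read them.
    static boolean peekIsEscape(PushbackInputStream in) throws IOException {
        byte[] probe = new byte[2];
        int n = in.read(probe, 0, 2);
        if (n <= 0) {
            return false;
        }
        in.unread(probe, 0, n); // the stream now looks untouched
        return n == 2 && probe[0] == 0x1B && probe[1] == '[';
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {0x1B, '[', 'K', 'o', 'k'};
        // The push-back buffer must be large enough for the probe (2 bytes).
        PushbackInputStream in =
            new PushbackInputStream(new ByteArrayInputStream(data), 2);
        System.out.println(peekIsEscape(in)); // true
        System.out.println(in.read());        // 27: the ESC byte was pushed back
    }
}
```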

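import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;
import java.util.logging.Level;
import java.util.logging.Logger;
import java.util.regex.Pattern;

public class EscapeRemovingInputStream extends PushbackInputStream {

    public static void main(String[] args) {
        String s = "\u001B[KHello \u001B[H12JWorld!"; // prints "Hello World!"
        byte[] buf = s.getBytes(StandardCharsets.ISO_8859_1);
        ByteArrayInputStream bais = new ByteArrayInputStream(buf);
        EscapeRemovingInputStream bin = new EscapeRemovingInputStream(bais);
        try (InputStreamReader in = new InputStreamReader(bin,
                StandardCharsets.ISO_8859_1)) {
            int c;
            while ((c = in.read()) != -1) {
                System.out.print((char) c);
            }
            System.out.println();
        } catch (IOException ex) {
            Logger.getLogger(EscapeRemovingInputStream.class.getName()).log(
                Level.SEVERE, null, ex);
        }
    }

    // ESC followed either by '[' and a command letter with optional numeric
    // arguments (K, m, H<n>J, <n>;<n>H, ?<n><letter>, <n>i), or by a bare 'i'.
    private static final Pattern ESCAPE_PATTERN = Pattern.compile(
        "\u001B(\\[(K|m|H\\d+J|\\d+;\\d+H|\\?\\d+[a-zA-Z]|\\d*i)|i)");
    private static final int MAX_ESCAPE_LENGTH = 20;

    private final byte[] escapeSequence = new byte[MAX_ESCAPE_LENGTH];
    private int escapeLength = 0;
    private boolean eof = false;

    public EscapeRemovingInputStream(InputStream in) {
        // The push-back buffer must hold a whole unrecognized sequence.
        super(in, MAX_ESCAPE_LENGTH);
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        for (int i = 0; i < len; ++i) {
            // Return the bytes we already have instead of blocking for more.
            if (i > 0 && available() == 0) {
                return i;
            }
            int c = read();
            if (c == -1) {
                return i == 0 ? -1 : i;
            }
            b[off + i] = (byte) c;
        }
        return len;
    }

    @Override
    public int read() throws IOException {
        // A loop rather than recursion, so a long run of escape sequences
        // cannot overflow the stack.
        while (true) {
            int c = eof ? -1 : super.read();
            if (c == -1) { // Throw away a trailing half escape sequence.
                eof = true;
                return -1;
            }
            if (escapeLength == 0 && c != 0x1B) {
                return c; // ordinary byte outside an escape sequence
            }
            escapeSequence[escapeLength] = (byte) c;
            ++escapeLength;
            String esc = new String(escapeSequence, 0, escapeLength,
                    StandardCharsets.ISO_8859_1);
            if (ESCAPE_PATTERN.matcher(esc).matches()) {
                escapeLength = 0; // complete sequence: drop it
            } else if (escapeLength == MAX_ESCAPE_LENGTH) {
                // Not a known sequence: push the bytes back and return the
                // ESC byte unfiltered, no longer registering the escape.
                escapeLength = 0;
                unread(escapeSequence);
                return super.read();
            }
        }
    }
}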




  • The user calls EscapeRemovingInputStream.read.
  • This read may itself call read several times to fill the byte buffer escapeSequence.
  • (If the buffered bytes do not form a known sequence, they are pushed back by calling unread.)
  • The original read returns.

The recognition of an escape sequence seems grammatical: a command letter with numerical argument(s). Hence I use a regular expression.
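Because read() re-checks matches() on every prefix, a stray ESC that starts no known sequence is only given up after MAX_ESCAPE_LENGTH bytes. One possible refinement, swapping in the standard JDK method java.util.regex.Matcher.hitEnd(), decides as soon as the prefix can no longer grow into a match. A sketch using a pattern adapted to the question's sequences; EscapePrefix, Status and classify are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EscapePrefix {

    enum Status { COMPLETE, PARTIAL, NOT_ESCAPE }

    // ESC, then '[' with a command letter and optional numeric arguments,
    // or a bare 'i' (assumed grammar, taken from the question's examples).
    static final Pattern ESCAPE_PATTERN = Pattern.compile(
        "\\x1B(\\[(K|m|H\\d+J|\\d+;\\d+H|\\?\\d+[a-zA-Z]|\\d*i)|i)");

    // Classifies the bytes collected so far, decoded as ISO-8859-1.
    static Status classify(String prefix) {
        Matcher m = ESCAPE_PATTERN.matcher(prefix);
        if (m.matches()) {
            return Status.COMPLETE;
        }
        // hitEnd() is true when more input could have changed the result,
        // i.e. the prefix may still grow into a known escape sequence.
        return m.hitEnd() ? Status.PARTIAL : Status.NOT_ESCAPE;
    }

    public static void main(String[] args) {
        System.out.println(classify("\u001B["));     // PARTIAL
        System.out.println(classify("\u001B[6;1H")); // COMPLETE
        System.out.println(classify("\u001Bx"));     // NOT_ESCAPE
    }
}
```

In read(), a NOT_ESCAPE result could take the unread path immediately instead of waiting for the buffer to fill.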
