Removing ASCII characters in a string with encoding
Problem description
I have a byte array that is filled by a serial port event. The code is shown below:
private InputStream input = null;
......
......
public void serialEvent(SerialPortEvent se) {
    if (se.getEventType() == SerialPortEvent.DATA_AVAILABLE) {
        try {
            int length = input.available();
            if (length > 0) {
                byte[] array = new byte[length];
                int numBytes = input.read(array);
                String text = new String(array);
            }
        } catch (IOException e) {
            // available() and read() can both fail on a serial stream.
            e.printStackTrace();
        }
    }
}
The variable text contains the following characters:
"\033[K", "\033[m", "\033[H2J", "\033[6;1H", "\033[?12l", "\033[?25h", "\033[5i", "\033[4i", "\033i" and similar sequences.
As of now, I use String.replace to remove all these characters from the string.
I have tried new String(array, "CharSet") with all of the available charset options, but I was not able to remove them.
Is there any way to remove those characters without using the replace method?
Recommended answer
I gave an unsatisfying answer, thanks to @OlegEstekhin for pointing that out. As no one else has answered yet, and a solution is not a two-liner, here it goes.
Make a wrapping InputStream that throws away escape sequences. I have used a PushbackInputStream, so that a partially read sequence that turns out not to be an escape sequence can be pushed back and read again first. Here a FilterInputStream would suffice.
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;
import java.util.logging.Level;
import java.util.logging.Logger;
import java.util.regex.Pattern;

public class EscapeRemovingInputStream extends PushbackInputStream {

    public static void main(String[] args) {
        String s = "\u001B[kHello \u001B[H12JWorld!";
        byte[] buf = s.getBytes(StandardCharsets.ISO_8859_1);
        ByteArrayInputStream bais = new ByteArrayInputStream(buf);
        EscapeRemovingInputStream bin = new EscapeRemovingInputStream(bais);
        try (InputStreamReader in = new InputStreamReader(bin,
                StandardCharsets.ISO_8859_1)) {
            int c;
            while ((c = in.read()) != -1) {
                System.out.print((char) c);
            }
            System.out.println();
        } catch (IOException ex) {
            Logger.getLogger(EscapeRemovingInputStream.class.getName()).log(
                    Level.SEVERE, null, ex);
        }
    }

    // ESC '[' followed by a command letter with optional numeric arguments.
    // Arguments are separated by ';', as in "\033[6;1H" from the question.
    private static final Pattern ESCAPE_PATTERN = Pattern.compile(
            "\u001B\\[([Kk]|m|H\\d+J|\\d+;\\d+H|\\?\\d+\\w|\\d*i)");
    private static final int MAX_ESCAPE_LENGTH = 20;

    // Bytes collected so far of a candidate escape sequence.
    private final byte[] escapeSequence = new byte[MAX_ESCAPE_LENGTH];
    private int escapeLength = 0;
    private boolean eof = false;

    public EscapeRemovingInputStream(InputStream in) {
        // The push-back buffer must be able to hold a whole rejected candidate.
        super(in, MAX_ESCAPE_LENGTH);
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        // Route bulk reads through read() so that they are filtered too.
        for (int i = 0; i < len; ++i) {
            int c = read();
            if (c == -1) {
                return i == 0 ? -1 : i;
            }
            b[off + i] = (byte) c;
        }
        return len;
    }

    @Override
    public int read() throws IOException {
        int c = eof ? -1 : super.read();
        if (c == -1) { // Throw away a trailing half escape sequence.
            eof = true;
            return c;
        }
        if (escapeLength == 0 && c != 0x1B) {
            return c; // Ordinary byte outside any escape sequence.
        } else {
            escapeSequence[escapeLength] = (byte) c;
            ++escapeLength;
            String esc = new String(escapeSequence, 0, escapeLength,
                    StandardCharsets.ISO_8859_1);
            if (ESCAPE_PATTERN.matcher(esc).matches()) {
                escapeLength = 0; // Complete escape sequence: drop it.
            } else if (escapeLength == MAX_ESCAPE_LENGTH) {
                // Too long to be an escape sequence: push everything back
                // and return the first byte unfiltered.
                escapeLength = 0;
                unread(escapeSequence);
                return super.read(); // No longer registering the escape
            }
            return read();
        }
    }
}
- The user calls EscapeRemovingInputStream.read.
- This read may itself call read several times to fill the byte buffer escapeSequence.
- (A push-back may be done by calling unread.)
- The original read returns.
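The push-back step can be tried in isolation. The following is only a small sketch of how PushbackInputStream.unread behaves, not part of the class above. The class name PushbackDemo and the method peekThenReadAll are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.PushbackInputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: reads two bytes, pushes them back, then reads everything.
public class PushbackDemo {

    public static String peekThenReadAll(String text) {
        byte[] data = text.getBytes(StandardCharsets.ISO_8859_1);
        // The second constructor argument is the push-back buffer size.
        try (PushbackInputStream in =
                new PushbackInputStream(new ByteArrayInputStream(data), 2)) {
            byte[] peeked = new byte[2];
            int n = in.read(peeked, 0, 2);
            if (n > 0) {
                // The pushed-back bytes are returned first by the next read.
                in.unread(peeked, 0, n);
            }
            StringBuilder sb = new StringBuilder();
            int c;
            while ((c = in.read()) != -1) {
                sb.append((char) c);
            }
            return sb.toString();
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(String[] args) {
        // Nothing is lost: the two peeked bytes come back at the front.
        System.out.println(peekThenReadAll("Hello World!"));
    }
}
```

Note that unread throws an IOException when more bytes are pushed back than the buffer can hold, so the buffer size passed to the constructor has to cover the longest candidate sequence.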
The recognition of an escape sequence seems grammatical: a command letter with numerical argument(s). Hence I use a regular expression.
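If the whole text is already in a String, the same kind of regular expression can be applied directly. This is a minimal sketch under that assumption, not a replacement for the stream wrapper: it cannot handle a sequence that is split across two serial reads. The class name AnsiStripper and the method strip are made up for illustration, and the pattern assumes that ';' separates the arguments, as in "\033[6;1H" from the question.

```java
import java.util.regex.Pattern;

// Hypothetical helper: strips the escape sequences listed in the question from a String.
public class AnsiStripper {

    // ESC '[' followed by one of the known command forms.
    private static final Pattern ESCAPE_PATTERN = Pattern.compile(
            "\u001B\\[([Kk]|m|H\\d+J|\\d+;\\d+H|\\?\\d+\\w|\\d*i)");

    public static String strip(String text) {
        return ESCAPE_PATTERN.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        String raw = "\u001B[KHello \u001B[6;1HWorld\u001B[?25h!\u001B[5i";
        System.out.println(strip(raw)); // prints "Hello World!"
    }
}
```

Compiling the pattern once and reusing it avoids one String.replace call per known sequence, but any sequence that is not covered by the alternatives is left in the text.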