Text cleaning and replacement: delete \n from a text in Java


Question

I'm cleaning an incoming text in my Java code. The text includes a lot of "\n", but not as in a new line, but literally "\n". I was using replaceAll() from the String class, but haven't been able to delete the "\n". This doesn't seem to work:

String string;
string = string.replaceAll("\\n", "");

Neither does this:

String string;
string = string.replaceAll("\n", "");

I guess this last one is identified as an actual new line, so all the new lines from the text would be removed.

Also, what would be an effective way to remove different patterns of unwanted text from a String? I'm using regular expressions to detect them (HTML reserved characters and the like) together with replaceAll, but every time I call replaceAll, the whole String is read, right?

UPDATE: Thanks for your great answers. I've extended this question here:
Text replacement efficiency
I'm asking specifically about efficiency :D

Recommended answer

Hooknc is right. I'd just like to post a little explanation:

"\\n" translates to "\n" after the compiler is done (since you escape the backslash). So the regex engine sees "\n" and thinks new line, and would remove those (and not the literal "\n" you have).

"\n" is translated to a real new line by the compiler. So the new line character itself is sent to the regex engine, which again matches actual new lines.

"\\\\n" is ugly, but right. The compiler removes the escape sequences, so the regex engine sees "\\n". The regex engine sees the two backslashes and knows that the first one escapes it so that translates to checking for the literal characters '\' and 'n', giving you the desired result.

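To make the three cases concrete, here is a minimal sketch. The class name, method names and sample input are made up for illustration; only String.replaceAll comes from the question. The input holds one real line break and one literal backslash followed by 'n'.

```java
// Hypothetical demo class; names and sample text are illustrative only.
final class EscapeDemo {
    // One real line break (after "one") and one literal backslash + 'n' (after "two").
    static final String INPUT = "line one\nline two\\nline three";

    // Source "\\n" -> regex \n -> matches the real line break.
    static String twoBackslashes(String s) {
        return s.replaceAll("\\n", "");
    }

    // Source "\n" -> a line-break character -> also matches the real line break.
    static String oneBackslash(String s) {
        return s.replaceAll("\n", "");
    }

    // Source "\\\\n" -> regex \\n -> matches a literal backslash followed by 'n'.
    static String fourBackslashes(String s) {
        return s.replaceAll("\\\\n", "");
    }

    // Make real line breaks visible when printing.
    static String show(String s) {
        return s.replace("\n", "<LF>");
    }

    public static void main(String[] args) {
        System.out.println(show(twoBackslashes(INPUT)));  // line oneline two\nline three
        System.out.println(show(oneBackslash(INPUT)));    // line oneline two\nline three
        System.out.println(show(fourBackslashes(INPUT))); // line one<LF>line twoline three
    }
}
```

Only the four-backslash version removes the literal \n and leaves the real line break alone.
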
Java is nice (it's the language I work in) but having to think to basically double-escape regexes can be a real challenge. For extra fun, it seems StackOverflow likes to try to translate backslashes too.
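
If the double escaping gets too painful, one way around it (not part of the explanation above, just a sketch) is to avoid writing a regex for the literal text at all. String.replace(CharSequence, CharSequence) treats its first argument as plain text, and Pattern.quote builds a regex that matches a string literally. The class and method names below are invented for the example.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; names are illustrative only.
final class LiteralReplaceDemo {
    // String.replace does no regex interpretation, so only the compiler's
    // escape applies: "\\n" is a backslash followed by 'n'.
    static String stripWithReplace(String s) {
        return s.replace("\\n", "");
    }

    // Pattern.quote wraps the text so the regex engine matches it literally.
    static String stripWithQuote(String s) {
        return s.replaceAll(Pattern.quote("\\n"), "");
    }

    public static void main(String[] args) {
        String input = "a\\nb\\nc"; // the characters: a \ n b \ n c
        System.out.println(stripWithReplace(input)); // abc
        System.out.println(stripWithQuote(input));   // abc
    }
}
```

Both variants still need the one backslash escape that the Java compiler requires, but not the second one for the regex engine.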
