Trim() in Java not working the way I expect?


Problem description


Possible Duplicate:
Query about the trim() method in Java

I am parsing a site's usernames and other information, and each one has a bunch of spaces after it (in addition to the spaces between the words). For example: "Bob the Builder " or "Sam the welder ". The number of spaces varies from name to name. I figured I'd just use .trim(), since I've used it before. However, it's giving me trouble. My code looks like this:

for (int i = 0; i < splitSource3.size(); i++) {
    splitSource3.set(i, splitSource3.get(i).trim());
}

The result is just the same; no spaces are removed at the end. Thank you in advance for your excellent answers!

UPDATE:

The full code is a bit more complicated, since there are HTML tags that are parsed out first. It goes exactly like this:

for (String s : splitSource2) {
    if (s.length() > "<td class=\"dddefault\">".length() && s.substring(0, "<td class=\"dddefault\">".length()).equals("<td class=\"dddefault\">")) {
        splitSource3.add(s.substring("<td class=\"dddefault\">".length()));
    }
}

System.out.println("\n");
for (int i = 0; i < splitSource3.size(); i++) {
    splitSource3.set(i, splitSource3.get(i).substring(0, splitSource3.get(i).length() - 5));
    splitSource3.set(i, splitSource3.get(i).trim());
    System.out.println(i + ": " + splitSource3.get(i));
}
}

UPDATE:

Calm down. I never said the fault lay with Java, and I never said it was a bug or broken or anything. I simply said I was having trouble with it and posted my code for you to collaborate on and help solve my issue. Note the phrase "my issue" and not "java's issue". I have actually had the code printing out

System.out.println(i + ": " + splitSource3.get(i) + "*");

in a for each loop afterward.

This is how I knew I had a problem. By the way, the problem has still not been fixed.

UPDATE:

Sample output (minus single quotes):

'0: Olin D. Kirkland                                          '
'1: Sophomore                                          '
'2: Someplace, Virginia  12345<br />VA SomeCity<br />'
'3: Undergraduate                                          '

EDIT: The OP rephrased his question at "Query about the trim() method in Java", where the issue was found to be Unicode whitespace characters that String.trim() does not match.
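
To illustrate the point of the EDIT, here is a minimal sketch of a Unicode-aware trim. It assumes the trailing characters are something like U+00A0 (no-break space), which the thread does not confirm; the class name `UnicodeTrim`, the helper `trimUnicode`, and the sample string are hypothetical and not taken from the OP's code. `String.trim()` only strips characters up to U+0020, so a no-break space at the end survives it.

```java
class UnicodeTrim {
    // Strips leading and trailing characters that count as blank according to
    // Character.isWhitespace or Character.isSpaceChar (the latter covers U+00A0).
    static String trimUnicode(String s) {
        int start = 0;
        int end = s.length();
        while (start < end && isBlank(s.charAt(start))) {
            start++;
        }
        while (end > start && isBlank(s.charAt(end - 1))) {
            end--;
        }
        return s.substring(start, end);
    }

    static boolean isBlank(char c) {
        return Character.isWhitespace(c) || Character.isSpaceChar(c);
    }

    public static void main(String[] args) {
        // Hypothetical scraped value: a name followed by two no-break spaces and one ordinary space.
        String name = "Bob the Builder\u00A0\u00A0 ";
        // trim() removes only the final ordinary space; the no-break spaces remain.
        System.out.println("[" + name.trim() + "]");
        // trimUnicode removes all three trailing characters.
        System.out.println("[" + trimUnicode(name) + "]");
    }
}
```

If the real data turns out to contain a different character, the same loop works as long as `isBlank` is adjusted to recognise it.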

Solution

It just occurred to me that I used to have this sort of issue when I worked on a screen-scraping project. The key is that sometimes the downloaded HTML source contains non-printable characters that are not whitespace characters either. These are very difficult to copy-paste into a browser. I assume that this could have happened to you.

If my assumption is correct then you've got two choices:

  1. Use a binary reader to figure out what those characters are, then delete them with String.replace(). E.g.:

    private static String cutCharacters(String fromHtml) {
        String result = fromHtml;
        char[] problematicCharacters = {'\000', '\001', '\003'}; // this could be a private static final constant too
        for (char ch : problematicCharacters) {
            // String.replace(char, String) does not exist, so convert the char to a String first
            result = result.replace(String.valueOf(ch), "");
        }
        return result;
    }
    

  2. If you find some sort of recurring pattern in the HTML to be parsed, then you can use regexes and substrings to cut out the unwanted parts. E.g.:

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    private String getImportantParts(String fromHtml) {
        // Keep only runs of word characters and whitespace; everything else is dropped
        Pattern p = Pattern.compile("(\\w*\\s*)"); // this could be a private static final constant as well
        Matcher m = p.matcher(fromHtml);
        StringBuilder buff = new StringBuilder();
        while (m.find()) {
            buff.append(m.group(1));
        }
        return buff.toString().trim();
    }
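
For the "figure out what those characters are" step in option 1, a small diagnostic can stand in for a binary reader when the text is already in a Java String. This is only a sketch: the class name `CodePointDump`, the method `describe`, and the sample input are hypothetical. It prints every code point as `U+XXXX`, so invisible characters show up by number.

```java
class CodePointDump {
    // Formats every code point of the input as U+XXXX, separated by single spaces.
    static String describe(String s) {
        StringBuilder out = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(String.format("U+%04X", cp));
        });
        return out.toString();
    }

    public static void main(String[] args) {
        // Hypothetical scraped value: "Hi" followed by a no-break space and an ordinary space.
        System.out.println(describe("Hi\u00A0 "));
        // prints: U+0048 U+0069 U+00A0 U+0020
    }
}
```

Anything in the output other than U+0020, U+0009, U+000A or U+000D at the end of a value is a candidate for the list of characters to remove.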
    
