About the CharMatcher.WHITESPACE implementation


Question

While looking at the implementation of CharMatcher, I noticed the field WHITESPACE_MULTIPLIER = 1682554634. I set this value to 1582554634 and ran the test case CharMatcherTest#testWhitespaceBreakingWhitespaceSubset; of course it failed.

After that I changed testWhitespaceBreakingWhitespaceSubset to only invoke WHITESPACE.apply((char) c), without asserting, and printed the index computed in WHITESPACE.matches:

int index = (WHITESPACE_MULTIPLIER * c) >>> WHITESPACE_SHIFT;

This was after changing WHITESPACE_MULTIPLIER from 1682554634 to 1582554634.
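To make the index formula concrete, here is a toy sketch of the same lookup scheme. The table, multiplier, and shift below are hypothetical values chosen so the arithmetic can be checked by hand; they are not Guava's constants. With a multiplier of 1 << 30 and a shift of 30, the index reduces to c % 4, so 'a', 'b', and 'c' land in slots 1, 2, and 3, and the unused slot 0 holds a filler that is itself a matching character.

```java
final class ToyTableLookup {
    // Toy values for illustration only (assumptions, not Guava's constants).
    static final String TABLE = "cabc";    // slot 0 is unused and holds the filler 'c'
    static final int MULTIPLIER = 1 << 30; // makes the index equal to c % 4
    static final int SHIFT = 30;           // keep the top 2 bits, i.e. 4 slots

    static int index(char c) {
        // int multiplication wraps modulo 2^32; >>> keeps the top (32 - SHIFT) bits.
        return (MULTIPLIER * c) >>> SHIFT;
    }

    static boolean matches(char c) {
        // A character matches only if its own slot contains exactly that character.
        return TABLE.charAt(index(c)) == c;
    }

    public static void main(String[] args) {
        for (char c : new char[] {'a', 'b', 'c', 'd', 'e'}) {
            System.out.println(c + " -> slot " + index(c) + ", matches=" + matches(c));
        }
    }
}
```

A non-matching character such as 'd' still maps to some slot, but the character stored there differs from it, so the comparison rejects it. This is why the multiplier only has to keep the matching characters from colliding with each other.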

No doubt 1682554634 is well designed. My question is: how can I derive this "magic number"?

Following Martin Grajcar's proposal, I tried writing a "magic number generator" as follows, and it worked:

// WHITESPACE_TABLE and WHITESPACE_SHIFT are the constants from CharMatcher.
char[] charsReq = WHITESPACE_TABLE.toCharArray();
Arrays.sort(charsReq);
OUTER:
for (int candidate = 1682553701; candidate <= 1682554834; candidate++) {
    for (int c = 0; c <= Character.MAX_VALUE; c++) {
        boolean isWhitespace = Arrays.binarySearch(charsReq, (char) c) >= 0;
        char found = WHITESPACE_TABLE.charAt((candidate * c) >>> WHITESPACE_SHIFT);
        // The table lookup must hit c exactly when c is a whitespace character.
        if (isWhitespace != (found == c)) {
            continue OUTER;
        }
    }
    // Every char value was classified correctly, so this multiplier works.
    System.out.println(candidate);
}

If I change the order of the characters in WHITESPACE_TABLE (swapping the positions of \u2001 and \u2002), the algorithm finds no solution, even with the loop end condition changed to Integer.MAX_VALUE.



The IntMath.gcd implementation refers to http://en.wikipedia.org/wiki/Binary_GCD_algorithm.
My question is: where can I find material on the CharMatcher.WHITESPACE.matches implementation?

Recommended answer

I'm not sure if the generator still exists somewhere, but it can be recreated easily. The class Result contains the data used in the implementation of CharMatcher.WHITESPACE:

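static class Result {
    private final int shift;
    private final int multiplier;
    private final String table;

    // Constructor needed by generate(), which calls new Result(shift, multiplier, table).
    Result(int shift, int multiplier, String table) {
        this.shift = shift; this.multiplier = multiplier; this.table = table;
    }
}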

// No duplicates allowed.
private final String allMatchingString = "\u2002\r\u0085\u200A\u2005\u2000"
        + "\u2029\u000B\u2008\u2003\u205F\u1680"
        + "\u0009\u0020\u2006\u2001\u202F\u00A0\u000C\u2009"
        + "\u2004\u2028\n\u2007\u3000";

public Result generate(String allMatchingString) {
    final char[] allMatching = allMatchingString.toCharArray();
    final char filler = allMatching[allMatching.length - 1];
    final int shift = Integer.numberOfLeadingZeros(allMatching.length);
    final char[] table = new char[1 << (32 - shift)];
    OUTER: for (int i=0; i>=0; ++i) {
        final int multiplier = 123456789 * i; // Jumping a bit makes the search faster.
        Arrays.fill(table, filler);
        for (final char c : allMatching) {
            final int index = (multiplier * c) >>> shift;
            if (table[index] != filler) continue OUTER; // Conflict found.
            table[index] = c;
        }
        return new Result(shift, multiplier, new String(table));
    }
    return null; // No solution exists.
}

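It generates a different multiplier than the one in Guava, but this doesn't matter: any multiplier that maps the matching characters to distinct table slots works, as long as it is used together with the table generated alongside it.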

If no solution exists for a given allMatchingString, you can decrement shift (which doubles the table size) and try again.
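The retry idea can be sketched as a self-contained program. This is only an illustration under stated assumptions: the class name WhitespaceTableSearch, the helpers generateWithFallback, matches, and agreesOnAllChars, and the lower bound of 16 on shift are all hypothetical, and the search loop is adapted from the generator above rather than taken from Guava. The sketch also checks the generated table against plain string membership for every char value.

```java
import java.util.Arrays;

final class WhitespaceTableSearch {
    /** Hypothetical holder mirroring the answer's Result class. */
    static final class Result {
        final int shift;
        final int multiplier;
        final String table;

        Result(int shift, int multiplier, String table) {
            this.shift = shift;
            this.multiplier = multiplier;
            this.table = table;
        }
    }

    // The 25 characters from the answer's allMatchingString (no duplicates).
    static final String ALL_MATCHING = "\u2002\r\u0085\u200A\u2005\u2000"
            + "\u2029\u000B\u2008\u2003\u205F\u1680"
            + "\u0009\u0020\u2006\u2001\u202F\u00A0\u000C\u2009"
            + "\u2004\u2028\n\u2007\u3000";

    /** Searches multipliers for one fixed shift; returns null if none is collision-free. */
    static Result generate(String allMatchingString, int shift) {
        char[] allMatching = allMatchingString.toCharArray();
        char filler = allMatching[allMatching.length - 1];
        char[] table = new char[1 << (32 - shift)];
        OUTER:
        for (int i = 0; i >= 0; ++i) {
            int multiplier = 123456789 * i; // same stride as the generator above
            Arrays.fill(table, filler);
            for (char c : allMatching) {
                int index = (multiplier * c) >>> shift;
                if (table[index] != filler) continue OUTER; // slot already taken
                table[index] = c;
            }
            return new Result(shift, multiplier, new String(table));
        }
        return null;
    }

    /** Starts with the smallest table and doubles it whenever the search fails. */
    static Result generateWithFallback(String allMatchingString) {
        int shift = Integer.numberOfLeadingZeros(allMatchingString.length());
        // Assumed bound: give up once the table would exceed 2^16 slots.
        for (; shift >= 16; shift--) {
            Result r = generate(allMatchingString, shift);
            if (r != null) return r;
        }
        return null;
    }

    /** The lookup from the question: one multiply, one shift, one comparison. */
    static boolean matches(Result r, char c) {
        return r.table.charAt((r.multiplier * c) >>> r.shift) == c;
    }

    /** Checks the lookup against plain membership for every char value. */
    static boolean agreesOnAllChars(Result r, String allMatchingString) {
        for (int c = 0; c <= Character.MAX_VALUE; c++) {
            boolean expected = allMatchingString.indexOf(c) >= 0;
            if (matches(r, (char) c) != expected) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        Result r = generateWithFallback(ALL_MATCHING);
        if (r == null) {
            System.out.println("no solution found");
            return;
        }
        System.out.println("shift=" + r.shift + " multiplier=" + r.multiplier
                + " verified=" + agreesOnAllChars(r, ALL_MATCHING));
    }
}
```

Note that an exhaustive search at a shift with no solution walks through roughly 2^31 candidate multipliers before falling back, so the first fallback can take a while.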
