TryParse without Actual Parsing, or Any Other Alternative for Checking Text Format with a Performance Benefit
Question
I am currently writing my own library, called TextCheckerExtension, which checks the format of a text before further processing (a short code snippet is shown below).
Now, I know that what I am doing is quite similar to Parse or TryParse. The only difference is that my code does not generate any parsed object. It simply checks the string.
My questions are:
- Both Parse and TryParse generate a parsed object. When we only want to check the validity of the string input, does the overhead of generating the parsed object really affect the performance of the methods (is there any example of this)? That is, would a self-written checking method that generates no parsed object be noticeably faster?
- Is there any alternative (built-in) way in C# to check the validity of various string formats without generating a parsed object?
- Could Regex be an alternative option?
Any input on this matter will be very much appreciated.
public static bool IsPureHex(string str) {
    return IsPureHex(str, int.MaxValue); //assuming very high value!
}
public static bool IsPureHex(string str, int maxNibble) {
    if (str.Length > maxNibble) //if the length is violated, it is considered failed
        return false;
    for (int i = 0; i < Math.Min(maxNibble, str.Length); i++)
        if (!((char.IsDigit(str, i)) || ((str[i] >= 'A') && (str[i] <= 'F')) || ((str[i] >= 'a') && (str[i] <= 'f'))))
            return false;
    return true;
}
public static bool IsHex(string str) {
    if (str.Length <= 2 || (str[0] != '0') || !((str[1] == 'x') || (str[1] == 'X'))) //Check input validity
        return false;
    for (int i = 2; i < str.Length; i++)
        if (!((char.IsDigit(str, i)) || ((str[i] >= 'A') && (str[i] <= 'F')) || ((str[i] >= 'a') && (str[i] <= 'f'))))
            return false;
    return true;
}
public static bool IsFloat(string str) { //another criterion for float, giving "f" in the last part?
    int dotCounter = 0;
    for (int i = 0; i < str.Length; i++) { //Check if it is float
        if (!(char.IsDigit(str, i)) && (str[i] != '.'))
            return false;
        else if (str[i] == '.')
            ++dotCounter; //Increase the dotCounter whenever dot is found
        if (dotCounter > 1) //If there is more than one dot for whatever reason, return error
            return false;
    }
    return dotCounter == 1 && str.Length > 1;
}
public static bool IsDigitsOnly(string str) {
    foreach (char c in str)
        if (c < '0' || c > '9')
            return false;
    return str.Length >= 1; //there must be at least one character here to continue
}
public static bool IsInt(string str) { //is not designed to handle null input or empty string
    return str[0] == '-' && str.Length > 1 ? IsDigitsOnly(str.Substring(1)) : IsDigitsOnly(str);
}
Recommended answer
It does make a difference.
To my surprise, as I continued this project out of curiosity, I found that actually parsing a string and simply checking whether it is of a certain format differ significantly in time performance.
In my experiment below, a checker that does no parsing took 33.77% to 58.26% less time than the built-in TryParse. In addition, I also compared my extension with the VB.Net IsNumeric method (Information.IsNumeric in the Microsoft.VisualBasic namespace).
Here are the (1) tested code, (2) testing scenarios, (3) testing code, and (4) testing results (notes are added in each part where necessary):
Here is the tested code, my extension named Extension.Checker.Text. So far I have only tested scenarios for generic integer and float/double (with or without a dot, perhaps better termed a fractional number). By generic integer I mean that the value range (such as -128 to 127 for an 8-bit signed integer) is not checked. The code only determines whether a text is an integer as a human understands it, regardless of its range. The same goes for float/double.
Judging from this post, whose answer had 400+ upvotes at the time this answer was posted, I believe it is safe to assume that people generally reach for int.TryParse first to test whether a text is an integer (even though its range is limited to roughly -2e9 to 2e9). Some other posts show the same trend. Another approach seen in those posts is to check with Visual Basic IsNumeric, so I included that method in the benchmark too.
public static bool IsFloatOrDoubleByDot(string str) { //another criterion for float, giving "f" in the last part?
    if (string.IsNullOrWhiteSpace(str))
        return false;
    int dotCounter = 0;
    for (int i = str[0] == '-' ? 1 : 0; i < str.Length; i++) { //Check if it is float
        if (!(char.IsDigit(str, i)) && (str[i] != '.'))
            return false;
        else if (str[i] == '.')
            ++dotCounter; //Increase the dotCounter whenever dot is found
        if (dotCounter > 1) //If there is more than one dot for whatever reason, return error
            return false;
    }
    //note: && binds tighter than ||, so this reads dotCounter == 0 || (dotCounter == 1 && str.Length > 1)
    return dotCounter == 0 || dotCounter == 1 && str.Length > 1;
}
public static bool IsDigitsOnly(string str) {
    foreach (char c in str)
        if (c < '0' || c > '9')
            return false;
    return str.Length >= 1; //there must be at least one character here to continue
}
public static bool IsInt(string str) { //returns false for null, empty, or whitespace-only input
    if (string.IsNullOrWhiteSpace(str))
        return false;
    return str[0] == '-' && str.Length > 1 ? IsDigitsOnly(str.Substring(1)) : IsDigitsOnly(str);
}
So far, I have tested four different scenarios:
- integer (in the range parseable by int.TryParse)
- float text containing a dot (max of 7-digit precision, in the range accurately parseable by float.TryParse)
- double text containing a dot (max of 11-digit precision, in the range accurately parseable by double.TryParse)
- integer text read as float/double text (in the range parseable by double.TryParse)
And for each scenario, I have four cases to test:
- valid text with a positive value
- valid text with a negative value
- invalid text with a positive value
- invalid text with a negative value
And for each case, I measured the time needed to do the check with:
- the suitable TryParse
- the suitable Extension.Checker.Text method
- Visual Basic IsNumeric
- other type-specific tricks, such as string.All(char.IsDigit) for integer
To test the above scenarios, I use the following data:
string intpos = "1342517340";
string intneg = "-1342517340";
string intfalsepos = "134251734u";
string intfalseneg = "-134251734u";
string floatpos = "56.34251";
string floatneg = "-56.34251";
string floatfalsepos = "56.3425h";
string floatfalseneg = "-56.3425h";
string doublepos = "56.342515312";
string doubleneg = "-56.342515312";
string doublefalsepos = "56.34251531y";
string doublefalseneg = "-56.34251531y";
List<string> liststr = new List<string>() {
    intpos, intneg, intfalsepos, intfalseneg,
    floatpos, floatneg, floatfalsepos, floatfalseneg,
    doublepos, doubleneg, doublefalsepos, doublefalseneg
};
List<string> liststrcode = new List<string>() {
    "i+", "i-", "if+", "if-",
    "f+", "f-", "ff+", "ff-",
    "d+", "d-", "df+", "df-"
};
bool parsed = false; //to store checking result
int intval; //for int.TryParse result
float fval; //for float.TryParse result
double dval; //for double.TryParse result
The text code encodes the type, the validity, and the sign of the test text. Examples:
- if+ = integer, false, positive
- f- = float, negative
Then I use the following testing loop to get the time performance of each method per case:
//time snap
for (int i = 0; i < 10000000; ++i) //for integer case
    parsed = int.TryParse(str, out intval); //built-in TryParse
//time snap
//Print the result
//time snap
for (int i = 0; i < 10000000; ++i)
    parsed = Extension.Checker.Text.IsInt(str); //extension Text checker
//time snap
//Print the result
//time snap
for (int i = 0; i < 10000000; ++i)
    parsed = Information.IsNumeric(str); //Microsoft.VisualBasic
//time snap
//Print the result
//time snap
for (int i = 0; i < 10000000; ++i)
    parsed = str[0] == '-' ? str.Substring(1).All(char.IsDigit) : str.All(char.IsDigit); //misc methods
//time snap
//Print the result
//Print the result difference
I ran 10 million iterations per test case per method, on my laptop.
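The "time snap" comments in the loop above are placeholders. One way to fill them in is with System.Diagnostics.Stopwatch. The sketch below is an assumption about how the timing could be done, not the code that produced the results in this answer. BenchSketch and TimeMs are hypothetical names, and going through a delegate adds call overhead that the plain loops above do not have.

```csharp
using System;
using System.Diagnostics;

static class BenchSketch {
    // Hypothetical helper: runs `check` on `text` `iterations` times and
    // returns the elapsed wall-clock time in milliseconds.
    public static long TimeMs(Func<string, bool> check, string text, int iterations, out bool result) {
        result = false;
        Stopwatch sw = Stopwatch.StartNew(); //time snap (start)
        for (int i = 0; i < iterations; ++i)
            result = check(text);
        sw.Stop(); //time snap (end)
        return sw.ElapsedMilliseconds;
    }

    static void Main() {
        string str = "1342517340";
        bool parsed;
        // Built-in TryParse wrapped in a lambda so it fits the helper's signature.
        long ms = TimeMs(s => { int v; return int.TryParse(s, out v); }, str, 10000000, out parsed);
        Console.WriteLine("TryParse i+: " + ms + " ms Result: " + parsed + " Text: " + str);
    }
}
```

The same helper can be called once per method and per test string, so that every method is timed through the same code path.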
Note: the behavior of my Extension.Checker.Text is not completely equivalent to the built-in TryParse. For example, TryParse checks the numeric range of the string, and it accepts some other string formats that my checker does not. This is because the main purpose of Extension.Checker.Text is not to convert the given text into a particular C# data type, as the built-in TryParse does. That is the very point of Extension.Checker.Text. The comparison here is made only to compare, in terms of time performance, (1) the popular way of checking a text format with (2) an extension method we could write when we do not need the result of TryParse but only need to know whether a text is of a certain format. The same goes for the comparison with VB IsNumeric.
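To make the note above concrete, here is a small sketch of two inputs on which the checker and int.TryParse disagree. IsDigitsOnly and IsInt are copied from the tested code. The class name CheckerVsTryParse and the two sample strings are my own choices for illustration.

```csharp
using System;

static class CheckerVsTryParse {
    static bool IsDigitsOnly(string str) {
        foreach (char c in str)
            if (c < '0' || c > '9')
                return false;
        return str.Length >= 1;
    }

    public static bool IsInt(string str) {
        if (string.IsNullOrWhiteSpace(str))
            return false;
        return str[0] == '-' && str.Length > 1 ? IsDigitsOnly(str.Substring(1)) : IsDigitsOnly(str);
    }

    static void Main() {
        int v;
        // Outside the Int32 range: still an integer to a human reader,
        // so the checker accepts it while int.TryParse rejects it.
        string big = "99999999999";
        Console.WriteLine(IsInt(big) + " " + int.TryParse(big, out v)); // True False

        // Surrounding whitespace: int.TryParse allows it by default
        // (NumberStyles.Integer), the checker does not.
        string padded = " 42 ";
        Console.WriteLine(IsInt(padded) + " " + int.TryParse(padded, out v)); // False True
    }
}
```

Whether either difference matters depends on what the caller does next. If the text will eventually be parsed into an int, the range check has to happen somewhere.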
I printed out the parse/check result to make sure that my extension gives the same result as the built-in TryParse, VB.Net IsNumeric, and the other alternative tricks for the given cases. I also print the original text for easy reading and checking. Then, from the time snaps taken around each test, I get the time performance and the time difference for each test case, which I also print. The time gain, however, is only computed against TryParse. Here is the complete result.
[2016-01-05 06:04:25.466 UTC] Integer:
[2016-01-05 06:04:26.999 UTC] TryParse i+: 1531 ms Result: True Text: 1342517340
[2016-01-05 06:04:27.639 UTC] Extension i+: 639 ms Result: True Text: 1342517340
[2016-01-05 06:04:30.345 UTC] VB.IsNumeric i+: 2705 ms Result: True Text: 1342517340
[2016-01-05 06:04:31.468 UTC] All is digit i+: 1124 ms Result: True Text: 1342517340
[2016-01-05 06:04:31.469 UTC] Gain on TryParse i+: 892 ms Percent: -58.26%
[2016-01-05 06:04:31.469 UTC]
[2016-01-05 06:04:32.996 UTC] TryParse i-: 1527 ms Result: True Text: -1342517340
[2016-01-05 06:04:33.846 UTC] Extension i-: 849 ms Result: True Text: -1342517340
[2016-01-05 06:04:36.413 UTC] VB.IsNumeric i-: 2566 ms Result: True Text: -1342517340
[2016-01-05 06:04:37.693 UTC] All is digit i-: 1280 ms Result: True Text: -1342517340
[2016-01-05 06:04:37.694 UTC] Gain on TryParse i-: 678 ms Percent: -44.40%
[2016-01-05 06:04:37.694 UTC]
[2016-01-05 06:04:39.058 UTC] TryParse if+: 1364 ms Result: False Text: 134251734u
[2016-01-05 06:04:39.845 UTC] Extension if+: 786 ms Result: False Text: 134251734u
[2016-01-05 06:04:42.436 UTC] VB.IsNumeric if+: 2590 ms Result: False Text: 134251734u
[2016-01-05 06:04:43.540 UTC] All is digit if+: 1103 ms Result: False Text: 134251734u
[2016-01-05 06:04:43.540 UTC] Gain on TryParse if+: 578 ms Percent: -42.38%
[2016-01-05 06:04:43.540 UTC]
[2016-01-05 06:04:44.937 UTC] TryParse if-: 1397 ms Result: False Text: -134251734u
[2016-01-05 06:04:45.745 UTC] Extension if-: 807 ms Result: False Text: -134251734u
[2016-01-05 06:04:48.275 UTC] VB.IsNumeric if-: 2530 ms Result: False Text: -134251734u
[2016-01-05 06:04:49.541 UTC] All is digit if-: 1267 ms Result: False Text: -134251734u
[2016-01-05 06:04:49.542 UTC] Gain on TryParse if-: 590 ms Percent: -42.23%
[2016-01-05 06:04:49.542 UTC]
[2016-01-05 06:04:49.542 UTC] Float by Dot:
[2016-01-05 06:04:51.136 UTC] TryParse f+: 1594 ms Result: True Text: 56.34251
[2016-01-05 06:04:51.967 UTC] Extension f+: 830 ms Result: True Text: 56.34251
[2016-01-05 06:04:54.328 UTC] VB.IsNumeric f+: 2360 ms Result: True Text: 56.34251
[2016-01-05 06:04:54.329 UTC] Time Gain f+: 764 ms Percent: -47.93%
[2016-01-05 06:04:54.329 UTC]
[2016-01-05 06:04:55.962 UTC] TryParse f-: 1634 ms Result: True Text: -56.34251
[2016-01-05 06:04:56.790 UTC] Extension f-: 827 ms Result: True Text: -56.34251
[2016-01-05 06:04:59.102 UTC] VB.IsNumeric f-: 2313 ms Result: True Text: -56.34251
[2016-01-05 06:04:59.103 UTC] Time Gain f-: 807 ms Percent: -49.39%
[2016-01-05 06:04:59.103 UTC]
[2016-01-05 06:05:00.623 UTC] TryParse ff+: 1519 ms Result: False Text: 56.3425h
[2016-01-05 06:05:01.429 UTC] Extension ff+: 802 ms Result: False Text: 56.3425h
[2016-01-05 06:05:03.730 UTC] VB.IsNumeric ff+: 2301 ms Result: False Text: 56.3425h
[2016-01-05 06:05:03.730 UTC] Time Gain ff+: 717 ms Percent: -47.20%
[2016-01-05 06:05:03.731 UTC]
[2016-01-05 06:05:05.312 UTC] TryParse ff-: 1581 ms Result: False Text: -56.3425h
[2016-01-05 06:05:06.147 UTC] Extension ff-: 835 ms Result: False Text: -56.3425h
[2016-01-05 06:05:08.485 UTC] VB.IsNumeric ff-: 2337 ms Result: False Text: -56.3425h
[2016-01-05 06:05:08.486 UTC] Time Gain ff-: 746 ms Percent: -47.19%
[2016-01-05 06:05:08.486 UTC]
[2016-01-05 06:05:08.487 UTC] Double by Dot:
[2016-01-05 06:05:10.341 UTC] TryParse d+: 1854 ms Result: True Text: 56.342515312
[2016-01-05 06:05:11.492 UTC] Extension d+: 1151 ms Result: True Text: 56.342515312
[2016-01-05 06:05:14.035 UTC] VB.IsNumeric d+: 2541 ms Result: True Text: 56.342515312
[2016-01-05 06:05:14.035 UTC] Time Gain d+: 703 ms Percent: -37.92%
[2016-01-05 06:05:14.036 UTC]
[2016-01-05 06:05:15.916 UTC] TryParse d-: 1879 ms Result: True Text: -56.342515312
[2016-01-05 06:05:17.051 UTC] Extension d-: 1133 ms Result: True Text: -56.342515312
[2016-01-05 06:05:19.542 UTC] VB.IsNumeric d-: 2492 ms Result: True Text: -56.342515312
[2016-01-05 06:05:19.543 UTC] Time Gain d-: 746 ms Percent: -39.70%
[2016-01-05 06:05:19.543 UTC]
[2016-01-05 06:05:21.210 UTC] TryParse df+: 1667 ms Result: False Text: 56.34251531y
[2016-01-05 06:05:22.315 UTC] Extension df+: 1104 ms Result: False Text: 56.34251531y
[2016-01-05 06:05:24.797 UTC] VB.IsNumeric df+: 2481 ms Result: False Text: 56.34251531y
[2016-01-05 06:05:24.798 UTC] Time Gain df+: 563 ms Percent: -33.77%
[2016-01-05 06:05:24.798 UTC]
[2016-01-05 06:05:26.509 UTC] TryParse df-: 1711 ms Result: False Text: -56.34251531y
[2016-01-05 06:05:27.596 UTC] Extension df-: 1086 ms Result: False Text: -56.34251531y
[2016-01-05 06:05:30.039 UTC] VB.IsNumeric df-: 2442 ms Result: False Text: -56.34251531y
[2016-01-05 06:05:30.040 UTC] Time Gain df-: 625 ms Percent: -36.53%
[2016-01-05 06:05:30.041 UTC]
[2016-01-05 06:05:30.041 UTC] Integer as Double by Dot:
[2016-01-05 06:05:31.794 UTC] TryParse (doubled) i+: 1752 ms Result: True Text: 1342517340
[2016-01-05 06:05:32.904 UTC] Extension (doubled) i+: 1109 ms Result: True Text: 1342517340
[2016-01-05 06:05:35.590 UTC] VB.IsNumeric (doubled) d+: 2684 ms Result: True Text: 1342517340
[2016-01-05 06:05:35.590 UTC] Time Gain d+: 643 ms Percent: -36.70%
[2016-01-05 06:05:35.591 UTC]
[2016-01-05 06:05:37.390 UTC] TryParse (doubled) i-: 1799 ms Result: True Text: -1342517340
[2016-01-05 06:05:38.515 UTC] Extension (doubled) i-: 1125 ms Result: True Text: -1342517340
[2016-01-05 06:05:41.139 UTC] VB.IsNumeric (doubled) d-: 2623 ms Result: True Text: -1342517340
[2016-01-05 06:05:41.139 UTC] Time Gain d-: 674 ms Percent: -37.47%
[2016-01-05 06:05:41.140 UTC]
[2016-01-05 06:05:42.840 UTC] TryParse (doubled) if+: 1700 ms Result: False Text: 134251734u
[2016-01-05 06:05:43.933 UTC] Extension (doubled) if+: 1092 ms Result: False Text: 134251734u
[2016-01-05 06:05:46.575 UTC] VB.IsNumeric (doubled) df+: 2642 ms Result: False Text: 134251734u
[2016-01-05 06:05:46.576 UTC] Time Gain df+: 608 ms Percent: -35.76%
[2016-01-05 06:05:46.577 UTC]
[2016-01-05 06:05:48.328 UTC] TryParse (doubled) if-: 1750 ms Result: False Text: -134251734u
[2016-01-05 06:05:49.434 UTC] Extension (doubled) if-: 1106 ms Result: False Text: -134251734u
[2016-01-05 06:05:52.042 UTC] VB.IsNumeric (doubled) df-: 2607 ms Result: False Text: -134251734u
[2016-01-05 06:05:52.042 UTC] Time Gain df-: 644 ms Percent: -36.80%
[2016-01-05 06:05:52.043 UTC]
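The "Percent" column in the log can be reproduced from the two timings on the lines above it. The sketch below assumes the gain is the time saved divided by the TryParse time, which matches the logged numbers. GainSketch and GainPercent are hypothetical names. The log prints the value with a minus sign (time reduced), while this helper returns it as a positive number.

```csharp
using System;
using System.Globalization;

static class GainSketch {
    // Percentage of the TryParse time that the extension checker saves.
    public static double GainPercent(long tryParseMs, long extensionMs) {
        return 100.0 * (tryParseMs - extensionMs) / tryParseMs;
    }

    static void Main() {
        // i+ case from the log: TryParse 1531 ms, Extension 639 ms.
        Console.WriteLine(GainPercent(1531, 639).ToString("F2", CultureInfo.InvariantCulture)); // 58.26
        // df+ case from the log: TryParse 1667 ms, Extension 1104 ms.
        Console.WriteLine(GainPercent(1667, 1104).ToString("F2", CultureInfo.InvariantCulture)); // 33.77
    }
}
```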
The conclusions I draw from the results so far:
- The best performance gain from an extension method such as the above is obtained when the text is a valid positive integer. The time gain is as much as 58.26% for the given case. Perhaps this is due to the simplicity of valid positive integer text.
- The worst performance gain from an extension method such as the above is obtained when the text is an invalid positive double. The time gain is only 33.77% for the given case.
- For integer and float/double (with or without a dot) text formats, if we only need to check whether a text is of those formats without actually parsing it yet, we can speed up the check by building our own text checker instead of using the built-in TryParse. VB IsNumeric is slower than the rest in all cases (this also surprised me, because according to the benchmarking in this post, VB seems to be pretty fast, though not the best).
One possible use of this kind of checker is when you receive a string that you know can be of more than one format (say, integer or double), and you want to determine the actual text type first without parsing it at that point. In such a case, an extension method may speed up the process.
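As a rough illustration of that use case, the sketch below classifies a string as integer-like, fraction-like, or neither in a single pass, without producing a parsed value. It is a variant written for this example, not the tested Extension.Checker.Text code. TextKind, TextKindSketch, and Classify are hypothetical names, and unlike the tested code it requires at least one digit.

```csharp
using System;

enum TextKind { Integer, Fraction, Other }

static class TextKindSketch {
    // Single pass over the text: optional leading '-', then digits with at most one dot.
    public static TextKind Classify(string str) {
        if (string.IsNullOrEmpty(str))
            return TextKind.Other;
        int digits = 0, dots = 0;
        for (int i = str[0] == '-' ? 1 : 0; i < str.Length; i++) {
            char c = str[i];
            if (c >= '0' && c <= '9')
                ++digits;
            else if (c == '.') {
                if (++dots > 1) //more than one dot is never a number
                    return TextKind.Other;
            } else
                return TextKind.Other; //any other character fails the check
        }
        if (digits == 0) //rejects "-", "." and "-."
            return TextKind.Other;
        return dots == 0 ? TextKind.Integer : TextKind.Fraction;
    }

    static void Main() {
        Console.WriteLine(Classify("1342517340")); // Integer
        Console.WriteLine(Classify("-56.34251"));  // Fraction
        Console.WriteLine(Classify("56.3425h"));   // Other
    }
}
```

The caller can then decide which TryParse to run, or skip parsing entirely, based on the returned kind.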
Another use is in computational linguistics, where you often want to know the type of a text without actually parsing it into a value for computation.