Fastest way to check if a string can be parsed


Problem description


I am parsing CSV files to lists of objects with strongly-typed properties. This involves parsing each string value from the file to an IConvertible type (int, decimal, double, DateTime, etc.) using TypeDescriptor.

I am using a try catch to handle situations when parsing fails. The exact details of where and why this exception occurs are then logged for further investigation. Below is the actual parsing code:

try
{
    parsedValue = TypeDescriptor.GetConverter(type).ConvertFromString(dataValue);
}
catch (Exception ex)
{
    // Log failure
}

Problem:

When values are successfully parsed, the process is quick. When the data contains lots of invalid values, the process can be thousands of times slower (due to catching the exception).

I've been testing this with parsing to DateTime. These are the performance figures:

  • Successful parsing: average of 32 ticks per parse
  • Failed parsing: average of 146296 ticks per parse

That's more than 4500 times slower.
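A comparison like this could be reproduced with a small timing harness. The sketch below is only an illustration: the class name ParseTiming and both method names are hypothetical, and it assumes the figures are measured in Stopwatch ticks, which the question does not actually state. Absolute numbers will vary by machine and runtime.

```csharp
using System;
using System.ComponentModel;
using System.Diagnostics;

// Hypothetical timing harness (not part of the original question).
public static class ParseTiming
{
    // Average Stopwatch ticks per call for the TypeConverter + try/catch approach.
    public static double AverageTicksTypeConverter(Type type, string input, int iterations)
    {
        TypeConverter converter = TypeDescriptor.GetConverter(type);
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            try
            {
                converter.ConvertFromString(input);
            }
            catch (Exception)
            {
                // Swallowed on purpose: only the cost of the failure is measured.
            }
        }
        sw.Stop();
        return (double)sw.ElapsedTicks / iterations;
    }

    // Average Stopwatch ticks per call for DateTime.TryParse on the same input.
    public static double AverageTicksTryParse(string input, int iterations)
    {
        DateTime parsed;
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            DateTime.TryParse(input, out parsed);
        }
        sw.Stop();
        return (double)sw.ElapsedTicks / iterations;
    }
}
```

Calling both methods with an invalid string such as "not a date" and a few thousand iterations should show the gap between the exception path and the TryParse path.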

Question:

Is it possible for me to check to see if a string value can be successfully parsed without having to use my expensive try catch method? Or perhaps there is another way I should be doing this?

EDIT: I need to use TypeDescriptor (and not DateTime.TryParse) because the type is determined at runtime.

Solution

If you have a known set of types to convert, you can do a series of if/elseif/elseif/else (or switch/case on the type name) to essentially distribute it to specialized parsing methods. This should be pretty fast. This is as described in @Fabio's answer.
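A minimal sketch of that if/else dispatch might look like the following. The class name TypeDispatch and the method name TryConvert are made up for illustration, and the set of supported types is an assumption based on the types listed in the question.

```csharp
using System;

// Hypothetical helper: dispatches on the runtime type to the built-in TryParse methods.
public static class TypeDispatch
{
    public static bool TryConvert(Type type, string input, out object value)
    {
        if (type == typeof(int))
        {
            int parsed;
            bool ok = int.TryParse(input, out parsed);
            value = parsed; // holds default(int) when parsing fails
            return ok;
        }
        if (type == typeof(decimal))
        {
            decimal parsed;
            bool ok = decimal.TryParse(input, out parsed);
            value = parsed;
            return ok;
        }
        if (type == typeof(double))
        {
            double parsed;
            bool ok = double.TryParse(input, out parsed);
            value = parsed;
            return ok;
        }
        if (type == typeof(DateTime))
        {
            DateTime parsed;
            bool ok = DateTime.TryParse(input, out parsed);
            value = parsed;
            return ok;
        }
        if (type == typeof(string))
        {
            value = input; // strings need no parsing
            return true;
        }
        // An unknown type is a programming error, not a data error.
        throw new NotSupportedException(
            String.Format("The specified type \"{0}\" is not supported.", type.FullName));
    }
}
```

Invalid data then costs a failed TryParse call rather than a thrown and caught exception; only an unregistered type still throws.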

If you still have performance issues, you can also create a lookup table which will let you add new parsing methods as you need to support them:

Given some basic parsing wrappers:

public delegate bool TryParseMethod<T>(string input, out T value);

public interface ITryParser
{
    bool TryParse(string input, out object value);
}

public class TryParser<T> : ITryParser
{
    private TryParseMethod<T> ParsingMethod;

    public TryParser(TryParseMethod<T> parsingMethod)
    {
        this.ParsingMethod = parsingMethod;
    }

    public bool TryParse(string input, out object value)
    {
        T parsedOutput;
        bool success = ParsingMethod(input, out parsedOutput);
        value = parsedOutput;
        return success;
    }
}

You can then set up a conversion helper which does the lookup and calls the appropriate parser:

public static class DataConversion
{
    private static Dictionary<Type, ITryParser> Parsers;

    static DataConversion()
    {
        Parsers = new Dictionary<Type, ITryParser>();
        AddParser<DateTime>(DateTime.TryParse);
        AddParser<int>(Int32.TryParse);
        AddParser<double>(Double.TryParse);
        AddParser<decimal>(Decimal.TryParse);
        AddParser<string>((string input, out string value) => {value = input; return true;});
    }

    public static void AddParser<T>(TryParseMethod<T> parseMethod)
    {
        Parsers.Add(typeof(T), new TryParser<T>(parseMethod));
    }

    public static bool Convert<T>(string input, out T value)
    {
        object parseResult;
        bool success = Convert(typeof(T), input, out parseResult);
        if (success)
            value = (T)parseResult;
        else
            value = default(T);
        return success;
    }

    public static bool Convert(Type type, string input, out object value)
    {
        ITryParser parser;
        if (Parsers.TryGetValue(type, out parser))
            return parser.TryParse(input, out value);
        else
            throw new NotSupportedException(String.Format("The specified type \"{0}\" is not supported.", type.FullName));
    }
}

Then usage might be like:

//for a known type at compile time
int value;
if (!DataConversion.Convert<int>("3", out value))
{
    //log failure
}

//or for unknown type at compile time:
object value;
if (!DataConversion.Convert(myType, dataValue, out value))
{
    //log failure
}

This could probably have the generics expanded to avoid boxing and type casting, but as it stands this works fine; optimize that aspect only if you can measure a performance problem caused by it.
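One possible way to expand the generics, shown here only as a sketch: keep one parser per closed generic type in a nested static class, so the compile-time-typed path needs neither a dictionary lookup nor boxing. The names TypedConversion, Cache<T> and Register are hypothetical, and the delegate is redeclared only so the snippet compiles on its own. This only helps when T is known at compile time; the runtime Type path still needs the object-based lookup.

```csharp
using System;

// Same shape as the delegate in the answer, repeated so this sketch is self-contained.
public delegate bool TryParseMethod<T>(string input, out T value);

// Hypothetical boxing-free variant of the generic Convert<T> path.
public static class TypedConversion
{
    // The runtime creates one Parser field per closed type (Cache<int>, Cache<DateTime>, ...).
    private static class Cache<T>
    {
        public static TryParseMethod<T> Parser;
    }

    static TypedConversion()
    {
        Register<int>(int.TryParse);
        Register<double>(double.TryParse);
        Register<decimal>(decimal.TryParse);
        Register<DateTime>(DateTime.TryParse);
    }

    public static void Register<T>(TryParseMethod<T> parser)
    {
        Cache<T>.Parser = parser;
    }

    public static bool Convert<T>(string input, out T value)
    {
        TryParseMethod<T> parser = Cache<T>.Parser;
        if (parser == null)
            throw new NotSupportedException(
                String.Format("The specified type \"{0}\" is not supported.", typeof(T).FullName));
        // The value is written straight into the typed out parameter: no object, no cast.
        return parser(input, out value);
    }
}
```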

EDIT: You can update the DataConversion.Convert method so that, if the specified converter is not registered, it either falls back to your TypeConverter method or throws an appropriate exception. It's up to you whether to have a catch-all or to stick to a predefined set of supported types and avoid reintroducing your try/catch. As it stands, the code has been updated to throw a NotSupportedException with a message indicating the unsupported type. Feel free to tweak it as makes sense. Performance-wise, the catch-all may be reasonable, since those cases should be few and far between once you register specialized parsers for the most commonly used types.
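The catch-all variant could be sketched roughly as below. This is an assumption-laden illustration rather than the answer's actual code: FallbackConversion, TryParseObject and FastParsers are hypothetical names, and only two fast parsers are registered to keep it short. Types without a fast parser go through TypeDescriptor and still pay the exception cost on bad input.

```csharp
using System;
using System.Collections.Generic;
using System.ComponentModel;

// Hypothetical catch-all variant: fast TryParse path first, TypeConverter as fallback.
public static class FallbackConversion
{
    public delegate bool TryParseObject(string input, out object value);

    private static readonly Dictionary<Type, TryParseObject> FastParsers =
        new Dictionary<Type, TryParseObject>
        {
            { typeof(int), (string s, out object v) =>
                { int r; bool ok = int.TryParse(s, out r); v = r; return ok; } },
            { typeof(DateTime), (string s, out object v) =>
                { DateTime r; bool ok = DateTime.TryParse(s, out r); v = r; return ok; } },
        };

    public static bool TryConvert(Type type, string input, out object value)
    {
        TryParseObject fast;
        if (FastParsers.TryGetValue(type, out fast))
            return fast(input, out value); // no exception on invalid data

        // Slow path for unregistered types: the original try/catch approach.
        try
        {
            value = TypeDescriptor.GetConverter(type).ConvertFromString(input);
            return true;
        }
        catch (Exception)
        {
            value = null;
            return false;
        }
    }
}
```

With this shape, a type that shows up often in the failure logs can be promoted to the fast path by adding one dictionary entry.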
