Can you propose a more elegant way to 'tokenize' C# code for HTML formatting?


Question

(This question about refactoring F# code got me one down vote, but also some interesting and useful answers. And 62 F# questions out of the 32,000+ on SO seems pitiful, so I'm going to take the risk of more disapproval!)

I was trying to post a bit of code on a Blogger blog yesterday, and turned to this site, which I had found useful in the past. However, the Blogger editor ate all the style declarations, so that turned out to be a dead end.

So (like any hacker), I thought "how hard can it be?" and rolled my own in <100 lines of F#.

Here is the 'meat' of the code, which turns an input string into a list of 'tokens'. Note that these tokens aren't to be confused with the lexing/parsing-style tokens. I did look at those briefly, and though I hardly understood anything, I did understand that they would give me only tokens, whereas I want to keep my original string.

The question is: is there a more elegant way of doing this? I don't like the n re-definitions of s required to remove each token string from the input string, but it's difficult to split the string into potential tokens in advance, because of things like comments, strings and the #region directive (which contains a non-word character).

open System
open System.Text.RegularExpressions

//Types of tokens we are going to detect
type Token = 
    | Whitespace of string
    | Comment of string
    | Strng of string
    | Keyword of string
    | Text of string
    | EOF

//turn a string into a list of recognised tokens
let tokenize (s:String) = 
    //this is the 'parser' - should we look at compiling the regexes in advance?
    let nexttoken (st:String) = 
        match st with
        | st when Regex.IsMatch(st, "^\s+") -> Whitespace(Regex.Match(st, "^\s+").Value)
        | st when Regex.IsMatch(st, "^//.*?\r?\n") -> Comment(Regex.Match(st, "^//.*?\r?\n").Value) //this is double slash-style comments
        | st when Regex.IsMatch(st, "^/\*(.|[\r?\n])*?\*/") -> Comment(Regex.Match(st, "^/\*(.|[\r?\n])*?\*/").Value) // /* */ style comments http://ostermiller.org/findcomment.html
        | st when Regex.IsMatch(st, @"^""([^""\\]|\\.|"""")*""") -> Strng(Regex.Match(st, @"^""([^""\\]|\\.|"""")*""").Value) // unescaped = "([^"\\]|\\.|"")*" http://wordaligned.org/articles/string-literals-and-regular-expressions
        | st when Regex.IsMatch(st, "^#(end)?region") -> Keyword(Regex.Match(st, "^#(end)?region").Value)
        | st when st <> "" -> 
                match Regex.Match(st, @"^[^""\s]*").Value with //all text until next whitespace or quote (this may be wrong)
                | x when iskeyword x -> Keyword(x)  //iskeyword uses Microsoft.CSharp.CSharpCodeProvider.IsValidIdentifier - a bit fragile...
                | x -> Text(x)
        | _ -> EOF

    //tail-recursive use of next token to transform string into token list
    let tokeneater s = 
        let rec loop s acc = 
            let t = nexttoken s
            match t with
            | EOF -> List.rev acc //return accumulator (have to reverse it because built backwards with tail recursion)
            | Whitespace(x) | Comment(x) 
            | Keyword(x) | Text(x) | Strng(x) -> 
                loop (s.Remove(0, x.Length)) (t::acc)  //tail recursive
        loop s []

    tokeneater s

(If anyone is really interested, I am happy to post the rest of the code)

EDIT Using the excellent suggestion of active patterns by kvb, the central bit looks like this, much better!

let nexttoken (st:String) = 
    match st with
    | Matches "^\s+" s -> Whitespace(s)
    | Matches "^//.*?\r?(\n|$)" s -> Comment(s) //this is double slash-style comments
    | Matches "^/\*(.|[\r?\n])*?\*/" s -> Comment(s)  // /* */ style comments http://ostermiller.org/findcomment.html
    | Matches @"^@?""([^""\\]|\\.|"""")*""" s -> Strng(s) // unescaped regexp = ^@?"([^"\\]|\\.|"")*" http://wordaligned.org/articles/string-literals-and-regular-expressions
    | Matches "^#(end)?region" s -> Keyword(s) 
    | Matches @"^[^""\s]+" s ->   //all text until next whitespace or quote (this may be wrong)
            match s with
            | IsKeyword x -> Keyword(s)
            | _ -> Text(s)
    | _ -> EOF
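
The `IsKeyword` pattern used above is not shown in the question. A minimal sketch of how such a partial active pattern might look follows. It assumes a small hard-coded keyword set (`csharpKeywords`, a hypothetical and deliberately incomplete list) rather than the `CSharpCodeProvider.IsValidIdentifier` check the question mentions, and `classify` is only an illustrative helper.

```fsharp
// Hypothetical, deliberately incomplete keyword list (an assumption for illustration;
// the question's own iskeyword helper is based on CSharpCodeProvider instead).
let csharpKeywords =
    set [ "class"; "int"; "namespace"; "public"; "return"; "static"; "string"; "using"; "void" ]

// Partial active pattern: succeeds with the word itself when it is a known keyword.
let (|IsKeyword|_|) (word: string) =
    if csharpKeywords.Contains word then Some word else None

// Illustrative helper showing the pattern in a match expression.
let classify (word: string) =
    match word with
    | IsKeyword k -> sprintf "keyword:%s" k
    | other -> sprintf "text:%s" other
```

With these definitions, `classify "public"` takes the keyword branch and `classify "foo"` falls through to the text branch.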

Answer

I'd use an active pattern to encapsulate the Regex.IsMatch and Regex.Match pairs, like so:

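// Partial active pattern: succeeds with the matched text when re matches s.
// (A partial pattern returns Some value directly; the case name is not a constructor.)
let (|Matches|_|) (re:string) (s:string) =
  let m = Regex(re).Match(s)
  if m.Success then
    Some(m.Value)
  else
    None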

Then your nexttoken function can look like:

let nexttoken (st:String) =
  match st with
  | Matches "^\s+" s -> Whitespace(s)
  | Matches "^//.*?\r?\n" s -> Comment(s)
  ...
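
Putting the pieces together, a self-contained sketch of the whole tokenizer could look like the following. It reuses the `Token` type and the regular expressions from the question's edited version. The `keywords` set is a hypothetical stand-in for the question's `iskeyword` helper, and the sample input is made up for illustration.

```fsharp
open System
open System.Text.RegularExpressions

type Token =
    | Whitespace of string
    | Comment of string
    | Strng of string
    | Keyword of string
    | Text of string
    | EOF

// Partial active pattern wrapping the Regex.IsMatch / Regex.Match pair.
let (|Matches|_|) (re: string) (s: string) =
    let m = Regex.Match(s, re)
    if m.Success then Some m.Value else None

// Hypothetical keyword set (an assumption; not the question's iskeyword helper).
let keywords = set [ "class"; "int"; "public"; "return"; "void" ]

// Recognise the token at the start of the remaining input.
let nexttoken (st: string) =
    match st with
    | Matches @"^\s+" s -> Whitespace s
    | Matches @"^//.*?\r?(\n|$)" s -> Comment s            // double-slash comments
    | Matches @"^/\*(.|[\r\n])*?\*/" s -> Comment s        // /* */ comments
    | Matches @"^@?""([^""\\]|\\.|"""")*""" s -> Strng s   // string literals
    | Matches @"^#(end)?region" s -> Keyword s
    | Matches @"^[^""\s]+" s -> if keywords.Contains s then Keyword s else Text s
    | _ -> EOF  // empty input, or nothing recognisable left

// Repeatedly strip the recognised token from the front of the input.
let tokenize (input: string) =
    let rec loop (rest: string) acc =
        let t = nexttoken rest
        match t with
        | EOF -> List.rev acc
        | Whitespace x | Comment x | Strng x | Keyword x | Text x ->
            loop (rest.Substring x.Length) (t :: acc)
    loop input []

// Example usage on a made-up input line.
let tokens = tokenize "int x // note"
tokens |> List.iter (printfn "%A")
```

Every rule except the last one returns a non-empty match, so the loop always makes progress. Note that input starting with an unterminated quote matches none of the rules and is treated as `EOF`, which silently drops the remainder; a real formatter would probably want a fallback rule for that case.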
