Can you propose a more elegant way to 'tokenize' C# code for HTML formatting?

Views: 252 | Posted: 2018/2/4 11:41:54 | Tags: html regex f# formatting


Question


(This question about refactoring F# code got me one down vote, but also some interesting and useful answers. And 62 F# questions out of the 32,000+ on SO seems pitiful, so I'm going to take the risk of more disapproval!)

I was trying to post a bit of code on a blogger blog yesterday, and turned to this site, which I had found useful in the past. However, the blogger editor ate all the style declarations, so that turned out to be a dead end.

So (like any hacker), I thought "how hard can it be?" and rolled my own in <100 lines of F#.

Here is the 'meat' of the code, which turns an input string into a list of 'tokens'. Note that these tokens aren't to be confused with the lexing/parsing-style tokens. I did look at those briefly, and though I hardly understood anything, I did understand that they would give me only tokens, whereas I want to keep my original string.

The question is: is there a more elegant way of doing this? I don't like the n re-definitions of s required to remove each token string from the input string, but it's difficult to split the string into potential tokens in advance, because of things like comments, strings and the #region directive (which contains a non-word character).

    //Types of tokens we are going to detect
    type Token =
        | Whitespace of string
        | Comment of string
        | Strng of string
        | Keyword of string
        | Text of string
        | EOF

    //turn a string into a list of recognised tokens
    let tokenize (s:String) =
        //this is the 'parser' - should we look at compiling the regexs in advance?
        let nexttoken (st:String) =
            match st with
            | st when Regex.IsMatch(st, "^\s+") -> Whitespace(Regex.Match(st, "^\s+").Value)
            | st when Regex.IsMatch(st, "^//.*?\r?\n") -> Comment(Regex.Match(st, "^//.*?\r?\n").Value) //this is double slash-style comments
            | st when Regex.IsMatch(st, "^/\*(.|[\r?\n])*?\*/") -> Comment(Regex.Match(st, "^/\*(.|[\r?\n])*?\*/").Value) // /* */ style comments http://ostermiller.org/findcomment.html
            | st when Regex.IsMatch(st, @"^""([^""\\]|\\.|"""")*""") -> Strng(Regex.Match(st, @"^""([^""\\]|\\.|"""")*""").Value) // unescaped = "([^"\\]|\\.|"")*" http://wordaligned.org/articles/string-literals-and-regular-expressions
            | st when Regex.IsMatch(st, "^#(end)?region") -> Keyword(Regex.Match(st, "^#(end)?region").Value)
            | st when st <> "" ->
                match Regex.Match(st, @"^[^""\s]*").Value with //all text until next whitespace or quote (this may be wrong)
                | x when iskeyword x -> Keyword(x) //iskeyword uses Microsoft.CSharp.CSharpCodeProvider.IsValidIdentifier - a bit fragile...
                | x -> Text(x)
            | _ -> EOF

        //tail-recursive use of next token to transform string into token list
        let tokeneater s =
            let rec loop s acc =
                let t = nexttoken s
                match t with
                | EOF -> List.rev acc //return accumulator (have to reverse it because built backwards with tail recursion)
                | Whitespace(x) | Comment(x)
                | Keyword(x) | Text(x) | Strng(x) ->
                    loop (s.Remove(0, x.Length)) (t::acc) //tail recursive
            loop s []

        tokeneater s

(If anyone is really interested, I am happy to post the rest of the code)

EDIT: Using the excellent suggestion of active patterns by kvb, the central bit looks like this, much better!

    let nexttoken (st:String) =
        match st with
        | Matches "^\s+" s -> Whitespace(s)
        | Matches "^//.*?\r?(\n|$)" s -> Comment(s) //this is double slash-style comments
        | Matches "^/\*(.|[\r?\n])*?\*/" s -> Comment(s) // /* */ style comments http://ostermiller.org/findcomment.html
        | Matches @"^@?""([^""\\]|\\.|"""")*""" s -> Strng(s) // unescaped regexp = ^@?"([^"\\]|\\.|"")*" http://wordaligned.org/articles/string-literals-and-regular-expressions
        | Matches "^#(end)?region" s -> Keyword(s)
        | Matches @"^[^""\s]+" s -> //all text until next whitespace or quote (this may be wrong)
            match s with
            | IsKeyword x -> Keyword(s)
            | _ -> Text(s)
        | _ -> EOF
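
For anyone who wants to run the edited version, here is a minimal self-contained sketch. It reuses the patterns from the edit above (written as verbatim strings) and the `Matches` pattern from kvb's answer. `IsKeyword` is not shown in the question, so the version here is a hypothetical stand-in that checks a small hard-coded set of C# keywords; the `csharpKeywords` name and its contents are assumptions, not the author's code.

```fsharp
open System.Text.RegularExpressions

type Token =
    | Whitespace of string
    | Comment of string
    | Strng of string
    | Keyword of string
    | Text of string
    | EOF

// Partial active pattern: yields the matched text when `re` matches `s`.
let (|Matches|_|) (re: string) (s: string) =
    let m = Regex.Match(s, re)
    if m.Success then Some m.Value else None

// Hypothetical stand-in for the question's keyword test (assumption).
let csharpKeywords =
    set [ "class"; "int"; "namespace"; "public"; "return"; "static"; "string"; "using"; "void" ]

let (|IsKeyword|_|) (s: string) =
    if csharpKeywords.Contains s then Some s else None

let nexttoken (st: string) =
    match st with
    | Matches @"^\s+" s -> Whitespace s
    | Matches @"^//.*?\r?(\n|$)" s -> Comment s
    | Matches @"^/\*(.|[\r?\n])*?\*/" s -> Comment s
    | Matches @"^@?""([^""\\]|\\.|"""")*""" s -> Strng s
    | Matches @"^#(end)?region" s -> Keyword s
    | Matches @"^[^""\s]+" s ->
        match s with
        | IsKeyword _ -> Keyword s
        | _ -> Text s
    | _ -> EOF

// Repeatedly take the next token and drop its text from the front of the input.
let tokenize (s: string) =
    let rec loop (rest: string) acc =
        let t = nexttoken rest
        match t with
        | EOF -> List.rev acc
        | Whitespace x | Comment x | Strng x | Keyword x | Text x ->
            loop (rest.Substring x.Length) (t :: acc)
    loop s []

let tokens = tokenize "int x = 1; // one\n"
```

Because every token keeps its original text, concatenating the token strings gives back the input, which is the property the question asks for. Note that an unterminated string literal matches none of the patterns, so this sketch stops at that point and returns the tokens found so far.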

Answer
I'd use an active pattern to encapsulate the Regex.IsMatch and Regex.Match pairs, like so:

    let (|Matches|_|) re s =
        let m = Regex(re).Match(s)
        if m.Success then
            Some(Matches (m.Value))
        else
            None

Then your nexttoken function can look like:

    let nexttoken (st:String) =
        match st with
        | Matches "^\s+" s -> Whitespace(s)
        | Matches "^//.*?\r?\n" s -> Comment(s)
        ...
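
The question also asks about compiling the regexes in advance and about avoiding the repeated re-definition of the input string. One possible variation, sketched here under my own assumptions rather than taken from the answer, is to build each `Regex` once and match at an offset with the instance method `Regex.Match(input, startat)`, using the `\G` anchor so a match must begin exactly at that offset. The `rules` list, the string labels, and the `scan` function are hypothetical names for illustration, and the keyword and #region rules are left out to keep it short.

```fsharp
open System.Text.RegularExpressions

// Each rule pairs a label with a regex built once; \G anchors at the start offset.
let rules =
    [ ("whitespace", Regex(@"\G\s+", RegexOptions.Compiled))
      ("comment", Regex(@"\G//.*?\r?(\n|$)", RegexOptions.Compiled))
      ("comment", Regex(@"\G/\*(.|[\r\n])*?\*/", RegexOptions.Compiled))
      ("string", Regex(@"\G@?""([^""\\]|\\.|"""")*""", RegexOptions.Compiled))
      ("text", Regex(@"\G[^""\s]+", RegexOptions.Compiled)) ]

// Returns (label, text) pairs. Stops at the end of the input or at the
// first position where no rule matches.
let scan (input: string) =
    let rec loop pos acc =
        let hit =
            rules
            |> List.tryPick (fun (kind, re) ->
                let m = re.Match(input, pos)
                if m.Success && m.Length > 0 then Some (kind, m.Value) else None)
        match hit with
        | Some (kind, text) -> loop (pos + text.Length) ((kind, text) :: acc)
        | None -> List.rev acc
    loop 0 []

let pieces = scan "/* a */ x"
```

The input string is never copied or shortened here; only the integer position advances. Whether that is worth the loss of the tidy pattern-match syntax is a judgment call for inputs the size of a blog snippet.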
