Ruby: Parse simple markdown files (having a similar, but not identical, structure) and fill their contents into an object's attributes

Question

I have a folder full of markdown files, and I want to read each of them into the following Ruby object:

class File
  attr_accessor :title, :description, :content
end

The markdown files usually look like this:

# This is the title

This is some description.

And even more description.

## This is an h2

Bla bla.

## This is another h2

More bla bla.

### This is even an h3

Again, more bla bla.

## Again, an h2

etc. etc.

This should result in this Ruby object:

File:
  h1: "This is the title"
  description: "This is some description.\n\nAnd even more description."
  content: "## This is an h2...etc. etc."

To fill the Ruby object's attributes from such a file, I could simply use a regular expression that extracts the title (the first H1), the description (the text between the H1 and the following H2), and the content (everything else).
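For the usual layout alone, a single regex is indeed enough. The following is a minimal sketch, assuming every file starts with an H1 line, then an optional description, then at least one H2; `SIMPLE_LAYOUT` and `parse_usual_layout` are hypothetical names. It returns `nil` for any file that does not follow that layout, which is exactly where the exceptions listed below come in.

```ruby
# Minimal sketch for the "usual" layout only: "# title", then a description,
# then content starting at the first line that begins with "##".
SIMPLE_LAYOUT = /\A#[ \t]+([^\n]*)\n(.*?)(^##.*)\z/m

def parse_usual_layout(text)
  m = SIMPLE_LAYOUT.match(text)
  return nil unless m # layout not matched, e.g. the file has no H1

  { title: m[1].strip, description: m[2].strip, content: m[3].strip }
end
```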

But the files do not always look exactly like this:

  • Sometimes, there is no H1
    • (in that case, the file name will be used as the title)
  • Sometimes, there is no description
  • Sometimes, there is no content
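The file-name fallback can be handled as a small step of its own. A minimal sketch; `title_or_filename` is a hypothetical helper that keeps the H1 when there is one and otherwise uses the file name without its directory and extension.

```ruby
# Hypothetical helper: prefer the H1, otherwise derive a title from the path.
def title_or_filename(h1, path)
  return h1 unless h1.nil? || h1.strip.empty?

  # "notes/my-post.md" -> "my-post"; a path without extension is kept whole.
  File.basename(path, File.extname(path))
end
```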

These exceptions can occur in combination, e.g. a file without an H1 and without a description:

## This is an h2

Bla bla.

## This is another h2

More bla bla.

This should result in this Ruby object:

File:
  h1: nil
  description: nil
  content: "## This is an h2...More bla bla."

Or a file with an H1 but no description:

# This is the title

## This is an h2

Bla bla.

This should result in this Ruby object:

File:
  h1: "This is the title"
  description: nil
  content: "## This is an h2...Bla bla."

Or a file with no H1, but a description:

This is a description.

Some more description.

## This is an h2

Bla bla.

This should result in this Ruby object:

File:
  h1: nil
  description: "This is a description...Some more description."
  content: "## This is an h2...Bla bla."

I wonder whether I can do this with a single fancy regular expression (I'm no expert in that), or whether I should split it into several processing steps. I asked a similar question here: "Markdown: Regex to find all content following an heading #2 (but stop at another heading #2)", but I couldn't get the regex to work in Ruby with the exceptions described above.
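A multi-step alternative walks the lines instead of using one regex. The following is a minimal sketch, assuming ATX headings ("# ...", "## ...") and no lines starting with "#" inside fenced code blocks; `split_markdown` is a hypothetical name, and it returns `[title, description, content]` with empty parts as `nil`.

```ruby
# Hypothetical multi-step splitter; returns [title, description, content].
def split_markdown(text)
  lines = text.lines

  # Step 0: ignore blank lines at the very top of the file.
  lines.shift while lines.first&.strip&.empty?

  # Step 1: an H1 is a first line starting with a single "#" plus whitespace.
  title = nil
  title = lines.shift.sub(/\A#\s*/, "").strip if lines.first&.match?(/\A#\s/)

  # Step 2: the description is everything before the next heading line.
  heading_at = lines.index { |line| line.match?(/\A#+\s/) } || lines.size
  description = lines.take(heading_at).join.strip

  # Step 3: that heading and everything after it is the content.
  content = lines.drop(heading_at).join.strip

  # Empty parts become nil, matching the expected objects above.
  [title, description, content].map { |s| s.nil? || s.empty? ? nil : s }
end
```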

Any ideas on how to solve this problem are highly welcome. Thank you.

PS: I also thought about parsing the markdown with a markdown parser and then using Nokogiri or something similar to parse the result. But that feels like far too much overhead for such a basically simple requirement.

Solution

Given your examples:

examples = []

examples << <<-EOS
# This is the title
This is some description.
And even more description.
## This is an h2
Bla bla.
## This is another h2
More bla bla.
### This is even an h3
Again, more bla bla.
## Again, an h2
etc. etc.
EOS

examples << <<-EOS
## This is an h2
Bla bla.
## This is another h2
More bla bla.
EOS

examples << <<-EOS
# This is the title
## This is an h2
Bla bla.
EOS

examples << <<-EOS
# This is the title
This is some description.
EOS

examples << <<-EOS
This is a description.
Some more description.
## This is an h2
Bla bla.
EOS

You can do this:

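examples.each do |text|
  # \A                      start of the string
  # (?:^#(?!#)([^\n]*))?    optional H1: a line starting with a single "#"; rest of the line is $1
  # (.*?)(?=^#|\z)          description: as little as possible, up to the next line starting with "#" or the end ($2)
  # (.*)\z                  content: everything that remains ($3)
  # With /m, "." also matches newlines; in Ruby "^" always matches at line starts.
  text =~ /\A(?:(?:^#(?!#)([^\n]*))?(.*?)(?=^#|\z))?(.*)\z/m
  title, description, content = [$1, $2, $3].map { |s|
    s.strip! if s              # strip in place (strip! itself may return nil)
    s unless (s && s.empty?)   # empty captures become nil
  }

puts <<-EOS
File:
  title: #{title.inspect}
  description: #{description.inspect}
  content: #{content.inspect}
EOS
end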

Note: The regexp doesn't care about the number of consecutive newlines.
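The note about newlines can be checked directly: the same regex applied to the question's original example, which separates every block with a blank line, still splits it correctly, and the blank line inside the description is kept. A minimal sketch; `SECTION_REGEX` and `parse_sections` are illustrative names, and it assumes Ruby 2.3+ for `&.` and `<<~`.

```ruby
# The answer's regex, unchanged.
SECTION_REGEX = /\A(?:(?:^#(?!#)([^\n]*))?(.*?)(?=^#|\z))?(.*)\z/m

# Returns [title, description, content]; empty parts become nil.
def parse_sections(text)
  SECTION_REGEX.match(text).captures.map { |s|
    s = s&.strip
    s unless s.nil? || s.empty?
  }
end

# The question's original example, with blank lines between blocks.
sample = <<~MD
  # This is the title

  This is some description.

  And even more description.

  ## This is an h2

  Bla bla.
MD

title, description, content = parse_sections(sample)
p title        # => "This is the title"
p description  # => "This is some description.\n\nAnd even more description."
p content      # => "## This is an h2\n\nBla bla."
```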

This gives you:

File:
  title: "This is the title"
  description: "This is some description.\nAnd even more description."
  content: "## This is an h2\nBla bla.\n## This is another h2\nMore bla bla.\n### This is even an h3\nAgain, more bla bla.\n## Again, an h2\netc. etc."
File:
  title: nil
  description: nil
  content: "## This is an h2\nBla bla.\n## This is another h2\nMore bla bla."
File:
  title: "This is the title"
  description: nil
  content: "## This is an h2\nBla bla."
File:
  title: "This is the title"
  description: "This is some description."
  content: nil
File:
  title: nil
  description: "This is a description.\nSome more description."
  content: "## This is an h2\nBla bla."
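To go from these printed values to one object per file, the captures can be loaded into a small struct, with the file-name fallback from the question applied when there is no H1. A minimal sketch, assuming the files end in `.md`; `MarkdownDoc`, `load_markdown`, and `load_folder` are hypothetical names. It uses a `Struct` instead of the question's `class File`, because `File` is Ruby's built-in class and `class File ... end` reopens it rather than defining a new class.

```ruby
require "tmpdir"

# Hypothetical container for one file; a Struct avoids reopening the built-in File class.
MarkdownDoc = Struct.new(:title, :description, :content)

# The answer's regex: optional H1 ($1), description ($2), content ($3).
SECTIONS = /\A(?:(?:^#(?!#)([^\n]*))?(.*?)(?=^#|\z))?(.*)\z/m

# Reads one markdown file and fills a MarkdownDoc.
def load_markdown(path)
  match = SECTIONS.match(File.read(path)) # always matches: every part is optional
  title, description, content = match.captures.map { |s|
    s = s&.strip
    s unless s.nil? || s.empty? # empty parts become nil
  }
  # No H1: fall back to the file name without directory and extension.
  title ||= File.basename(path, File.extname(path))
  MarkdownDoc.new(title, description, content)
end

# Loads every *.md file in a folder, in file-name order.
def load_folder(dir)
  Dir.glob(File.join(dir, "*.md")).sort.map { |path| load_markdown(path) }
end

# Usage with a throwaway folder:
Dir.mktmpdir do |dir|
  File.write(File.join(dir, "no-h1.md"), "Some description.\n\n## An h2\n\nBla bla.\n")
  load_folder(dir).each { |doc| p doc }
end
```

Keeping the regex step and the fallback step separate means the fallback can change (for example, to a front-matter field) without touching the regex.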