Use SLIM/HAML etc. in a Ruby script?


Question

I am currently writing a script that analyses some genetic data and then writes the output to a coloured Word document. The script works, but one method in it is badly written: the method that creates the Word document.

That method creates a standalone HTML file and saves it with a 'docx' extension, which lets me give different parts of the document different styles.

Below is the bare minimum needed to get this to work. It includes the necessary methods and some sample input data, which in the real script would be created by a different method just before the final step and stored in a hash.

require 'bio'

def make_hash(input_file)
  input_read = Hash.new
  biofastafile = Bio::FlatFile.open(Bio::FastaFormat, input_file) 
  biofastafile.each_entry do |entry|
    input_read[entry.definition] = entry.aaseq
  end
  return input_read
end

def to_doc(hash, output, motif)
  output_file = File.new(output, "w")
  output_file.puts "<!DOCTYPE html><html><head><style> .id{font-weight: bold;} .signalp{color:#000099; font-weight: bold;} .motif{color:#FF3300; font-weight: bold;} h3 {word-wrap: break-word;} p {word-wrap: break-word; font-family:Courier New, Courier, Mono;}</style></head><body>"
  hash.each do |id, seq|
    sequence = seq.to_s.gsub("\[\"", "").gsub("\"\]", "")
    id.scan(/(\w+)(.*)/) do |id_start, id_end|
      output_file.puts "<p><span class=\"id\"> >#{id_start}</span><span>#{id_end}</span><br>"
      output_file.puts "<span class=\"signalp\">"
      sequence.scan(/(\w+)-(\w+)/) do |signalp, seq_end|
        output_file.puts signalp + "</span>" + seq_end.gsub(/#{motif}/, '<span class="motif">\0</span>')
        output_file.puts "</p>"
      end
    end
  end
  output_file.puts "</body></html>"
  output_file.close   
end

hash = make_hash("./sample.txt")
to_doc(hash, "output.docx", "WL|KK|RR|KR|R..R|R....R")

This is some sample data. In reality, when analysing the genetic data from a species, the input can be made up of hundreds of thousands of sequences:

>isotig00001_f4_14 - Signal P Cleavage Site => 11:12
MMHLLCIVLLL-KWWLLL
>isotig00001_f4_15 - Signal P Cleavage Site => 10:11
MHLLCIVLLL-KWWLLL
>isotig00003_f6_8 - Signal P Cleavage Site => 11:12
MMHLLCIVLLL-KWWLLL
>isotig00003_f6_9 - Signal P Cleavage Site => 10:11
MHLLCIVLLL-KWWLLL
>isotig00004_f6_8 - Signal P Cleavage Site => 11:12
MMHLLCIVLLL-KWWLLL
>isotig00004_f6_9 - Signal P Cleavage Site => 10:11
MHLLCIVLLL-KWWLLL
>isotig00009_f2_3 - Signal P Cleavage Site => 22:23
MLKCFSIIMGLILLLEIGGGCA-IYFYRAQIQAQFQKSLTDVTITDYRENADFQDLIDALQSGLSCCGVNSYEDWDNNIYFNCSGPANNPEALWCAFLLLYTGSSKRSSQHPVRLWSSFPRTTKYFPHKDLHHWLCGYVYNVD
>isotig00009_f3_9 - Signal P Cleavage Site => 16:17
MKTGIIIFISTVVVLP-ITLKPCGVPFSCCIPDQASGVANTQCGYGVRSPEQQNTFHTKIYTTGCADMFTMWINRYLYYIAGIAGVIVLVELFGFCFAHSLINDIKRQKARWAHR
>isotig00009_f6_13 - Signal P Cleavage Site => 11:12
MMHLLCIVLLL-KWWLLL
>isotig00009_f6_14 - Signal P Cleavage Site => 10:11
MHLLCIVLLL-KWWLLL

Each read is made of two parts: the seq id (the line starting with a >) and the sequence. These are split and stored in a hash by the make_hash method. This example:

>isotig00001_f4_14 - Signal P Cleavage Site => 11:12

MMHLLCIVLLL-KWWLLL 

is made up of:

>isotig00001_f4_14  (the first part of the id - class="id")

Signal P Cleavage Site => 11:12 (the second part of the id - normal writing)

(new line)

MMHLLCIVLLL (first part of the sequence - class="signalp")

KW WL LL  (the second part of the sequence - the motif WL will be class="motif")

In HTML it would produce:

<p>
  <span class="id"> >isotig00001_f4_14</span><span>Signal P Cleavage Site => 11:12</span>
<br>
  <span class="signalp">MMHLLCIVLLL</span><span>KW</span><span class="motif">WL</span><span>LL</span>
</p>

Basically, I would like to rewrite the to_doc method using a proper HTML templating tool such as SLIM/HAML/NOKOGIRI/ERB. I have tried to do this myself.

For some reason, a loop within a loop didn't work, and creating a global variable to store these values didn't work either.

The script above works: just save the sample data as "sample.txt" and then run the script.

I would be very grateful for any help.

Recommended answer

Here's a starting point:

require 'haml'

haml_doc = <<EOT
%html
  %head
    :css
      .id {font-weight: bold;}
      .signalp {color:#000099; font-weight: bold;}
      .motif {color:#FF3300; font-weight: bold;}
      h3 {word-wrap: break-word;}
      p {word-wrap: break-word; font-family:Courier New, Courier, Mono;}
  %body
EOT

# Haml::Engine#render is the API of Haml 5 and earlier; Haml 6 replaced it with Haml::Template.
engine = Haml::Engine.new(haml_doc)
puts engine.render

Which, when run, outputs:

<html>
  <head>
    <style>
      .id {font-weight: bold;}
      .signalp {color:#000099; font-weight: bold;}
      .motif {color:#FF3300; font-weight: bold;}
      h3 {word-wrap: break-word;}
      p {word-wrap: break-word; font-family:Courier New, Courier, Mono;}
    </style>
  </head>
  <body></body>
</html>

From there, you can easily write to a file using:

File.write(output, engine.render)

instead of using puts to output it to the console.
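As a small illustration of that call, the sketch below writes a rendered string to disk. The html string is only a stand-in for engine.render, and the temporary output path is a hypothetical choice made so the example does not touch the working directory.

```ruby
require 'tmpdir'

# Stand-in for engine.render; any String of HTML works here.
html = "<html><body><p>MMHLLCIVLLL</p></body></html>\n"

# Hypothetical output location inside a fresh temporary directory.
output = File.join(Dir.mktmpdir, 'output.html')

# File.write opens the file, writes the string, and closes the file in one
# call. It returns the number of bytes written.
bytes_written = File.write(output, html)
```

Compared with File.new followed by puts and close, there is no file handle left to close by hand.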

To use this, you need to flesh out haml_doc with additional Haml that loops over your input data. Massage the data beforehand into an array or hash that you can iterate over cleanly, without embedding all sorts of scan and conditional logic in the template. A view should primarily be used to output content, not to manipulate data.

Just above the engine = Haml... line, you'd want to read your input data, massage it, and store it in a variable that Haml can iterate over. You have the basic idea in your original code, but instead of trying to output HTML, create an object or sub-hash that you can pass to Haml.
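The same idea can be sketched with ERB from Ruby's standard library, which the question lists as an option. This is a swapped-in illustration rather than the Haml approach itself: the records array, its keys, and the template text are assumptions made for the example. It assumes Ruby 2.6 or later for the trim_mode keyword and for result_with_hash. The data is prepared first, and the view only loops, including a loop within a loop.

```ruby
require 'erb'

# Hypothetical pre-parsed records; the shape is an assumption for illustration.
records = [
  { id_start: '>isotig00001_f4_14', id_end: 'Signal P Cleavage Site => 11:12',
    signalp: 'MMHLLCIVLLL', motifs: ['WL'] },
  { id_start: '>isotig00009_f2_3', id_end: 'Signal P Cleavage Site => 22:23',
    signalp: 'MLKCFSIIMGLILLLEIGGGCA', motifs: ['KR', 'WL'] }
]

# The view only loops and prints. ERB::Util.h escapes <, > and & in the data.
template = <<~'HTML'
  <body>
  <% records.each do |r| -%>
  <p>
  <span class="id"><%= ERB::Util.h(r[:id_start]) %></span>
  <span><%= ERB::Util.h(r[:id_end]) %></span>
  <br>
  <span class="signalp"><%= ERB::Util.h(r[:signalp]) %></span>
  <% r[:motifs].each do |m| -%>
  <span class="motif"><%= ERB::Util.h(m) %></span>
  <% end -%>
  </p>
  <% end -%>
  </body>
HTML

# result_with_hash exposes each hash key as a local variable in the template.
html = ERB.new(template, trim_mode: '-').result_with_hash(records: records)
puts html
```

Passing the data in explicitly keeps the template free of global state, which sidesteps the global-variable problem mentioned in the question.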

Normally this would all be split into separate files for the model, the view and the controller, as in Rails or big Sinatra apps, but this really isn't a big app, so you can put it all in one file. Keep your logic clean and it'll be fine.

Without sample input data and an expected output it's hard to do more, but that'll give you a starting point.

Based on the data samples, here's something that gets you in the ballpark. I won't polish it because, after all, you have to do some of it yourself, but this is a reasonable start. The first part mocks up something reasonably like the Bio you reference in your code, which I've never seen. You don't need this part, but you might want to look through it:

module Bio

  FastaFormat = 1

SAMPLE_DATA = <<-EOT
>isotig00001_f4_14 - Signal P Cleavage Site => 11:12
MMHLLCIVLLL-KWWLLL
>isotig00001_f4_15 - Signal P Cleavage Site => 10:11
MHLLCIVLLL-KWWLLL
>isotig00003_f6_8 - Signal P Cleavage Site => 11:12
MMHLLCIVLLL-KWWLLL
>isotig00003_f6_9 - Signal P Cleavage Site => 10:11
MHLLCIVLLL-KWWLLL
>isotig00004_f6_8 - Signal P Cleavage Site => 11:12
MMHLLCIVLLL-KWWLLL
>isotig00004_f6_9 - Signal P Cleavage Site => 10:11
MHLLCIVLLL-KWWLLL
>isotig00009_f2_3 - Signal P Cleavage Site => 22:23
MLKCFSIIMGLILLLEIGGGCA-IYFYRAQIQAQFQKSLTDVTITDYRENADFQDLIDALQSGLSCCGVNSYEDWDNNIYFNCSGPANNPEALWCAFLLLYTGSSKRSSQHPVRLWSSFPRTTKYFPHKDLHHWLCGYVYNVD
>isotig00009_f3_9 - Signal P Cleavage Site => 16:17
MKTGIIIFISTVVVLP-ITLKPCGVPFSCCIPDQASGVANTQCGYGVRSPEQQNTFHTKIYTTGCADMFTMWINRYLYYIAGIAGVIVLVELFGFCFAHSLINDIKRQKARWAHR
>isotig00009_f6_13 - Signal P Cleavage Site => 11:12
MMHLLCIVLLL-KWWLLL
>isotig00009_f6_14 - Signal P Cleavage Site => 10:11
MHLLCIVLLL-KWWLLL
EOT

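  class FlatFile
    # Enumerable supplies map, select, etc. on top of each.
    include Enumerable

    class Entry
      attr_reader :definition, :aaseq

      def initialize(definition, aaseq)
        @definition = definition
        @aaseq = aaseq
      end
    end

    def initialize(entries)
      @entries = entries
    end

    # Mock: ignores filetype and filename and parses SAMPLE_DATA instead.
    def self.open(filetype, filename)
      new(SAMPLE_DATA.split("\n").each_slice(2).map { |seq_id, sequence| Entry.new(seq_id, sequence) })
    end

    def each_entry(&block)
      @entries.each(&block)
    end
    alias each each_entry

  end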
end

Here's where the fun begins. I modified your make_hash routine to parse the strings how I'd do it. Instead of a hash, it returns an array of hashes. Each sub-hash is ready to be used; in other words, the data is parsed and ready to be output:

include Bio

def make_array_of_hashes(input_file)
  Bio::FlatFile.open(
    Bio::FastaFormat,
    input_file
  ).map { |entry|

    id_start, id_end = entry.definition.split('-').map(&:strip)
    signalp, seq_end = entry.aaseq.split('-')
    motif = seq_end.scan(/(?:WL|KK|RR|KR|R..R|R....R)/)

    {
      :id_start => id_start,
      :id_end => id_end,
      :signalp => signalp,
      :motif => motif
    }
  }
end
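The routine above keeps only the matched motifs. If the view also needs the stretches between motifs, as in the expected HTML in the question, one possible sketch is to split on a capturing pattern. The split_sequence helper and the :segments key are hypothetical names, and the sketch assumes the motif string contains no capture groups of its own.

```ruby
# Splits "SIGNALP-REST" into the signal peptide and a list of segments,
# each flagged as motif or plain text.
def split_sequence(aaseq, motif)
  signalp, seq_end = aaseq.split('-', 2)

  # With one capture group, String#split keeps the matched text, so the
  # result alternates: plain, motif, plain, motif, ...
  parts = seq_end.to_s.split(/(#{motif})/)

  segments = parts.each_with_index
                  .map { |text, i| { text: text, motif: i.odd? } }
                  .reject { |segment| segment[:text].empty? }

  { signalp: signalp, segments: segments }
end

result = split_sequence('MMHLLCIVLLL-KWWLLL', 'WL|KK|RR|KR|R..R|R....R')
result[:segments].each do |segment|
  puts "#{segment[:motif] ? 'motif' : 'plain'}: #{segment[:text]}"
end
```

A template can then loop over the segments and add the motif class only where the flag is true, with no scan calls inside the view.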

This is a simple way to define the HAML document inside the body of a script. The template only outputs; it contains no logic apart from the loops. Everything else was handled before the view is processed:

haml_doc = <<EOT
!!!
%html
  %head
    :css
      .id {font-weight: bold;}
      .signalp {color:#000099; font-weight: bold;}
      .motif {color:#FF3300; font-weight: bold;}
      h3 {word-wrap: break-word;}
      p {word-wrap: break-word; font-family:Courier New, Courier, Mono;}
  %body
    - data.each do |d|
      %p
        %span.id= d[:id_start]
        %span= d[:id_end]
        %br/
        %span.signalp= d[:signalp]
        - d[:motif].each do |m|
          %span= m
EOT

Here is how to use it:

require 'haml'

data = make_array_of_hashes('sample.txt')

engine = Haml::Engine.new(haml_doc)
puts engine.render(Object.new, :data => data)

Which, when run, outputs:

<!DOCTYPE html>
<html>
  <head>
    <style>
      .id {font-weight: bold;}
      .signalp {color:#000099; font-weight: bold;}
      .motif {color:#FF3300; font-weight: bold;}
      h3 {word-wrap: break-word;}
      p {word-wrap: break-word; font-family:Courier New, Courier, Mono;}
    </style>
  </head>
  <body>
    <p>
      <span class='id'>>isotig00001_f4_14</span>
      <span>Signal P Cleavage Site => 11:12</span>
      <br>
      <span class='signalp'>MMHLLCIVLLL</span>
      <span>WL</span>
    </p>
    <p>
      <span class='id'>>isotig00001_f4_15</span>
      <span>Signal P Cleavage Site => 10:11</span>
      <br>
      <span class='signalp'>MHLLCIVLLL</span>
      <span>WL</span>
    </p>
    <p>
      <span class='id'>>isotig00003_f6_8</span>
      <span>Signal P Cleavage Site => 11:12</span>
      <br>
      <span class='signalp'>MMHLLCIVLLL</span>
      <span>WL</span>
    </p>
    <p>
      <span class='id'>>isotig00003_f6_9</span>
      <span>Signal P Cleavage Site => 10:11</span>
      <br>
      <span class='signalp'>MHLLCIVLLL</span>
      <span>WL</span>
    </p>
    <p>
      <span class='id'>>isotig00004_f6_8</span>
      <span>Signal P Cleavage Site => 11:12</span>
      <br>
      <span class='signalp'>MMHLLCIVLLL</span>
      <span>WL</span>
    </p>
    <p>
      <span class='id'>>isotig00004_f6_9</span>
      <span>Signal P Cleavage Site => 10:11</span>
      <br>
      <span class='signalp'>MHLLCIVLLL</span>
      <span>WL</span>
    </p>
    <p>
      <span class='id'>>isotig00009_f2_3</span>
      <span>Signal P Cleavage Site => 22:23</span>
      <br>
      <span class='signalp'>MLKCFSIIMGLILLLEIGGGCA</span>
      <span>KR</span>
      <span>WL</span>
    </p>
    <p>
      <span class='id'>>isotig00009_f3_9</span>
      <span>Signal P Cleavage Site => 16:17</span>
      <br>
      <span class='signalp'>MKTGIIIFISTVVVLP</span>
      <span>KR</span>
    </p>
    <p>
      <span class='id'>>isotig00009_f6_13</span>
      <span>Signal P Cleavage Site => 11:12</span>
      <br>
      <span class='signalp'>MMHLLCIVLLL</span>
      <span>WL</span>
    </p>
    <p>
      <span class='id'>>isotig00009_f6_14</span>
      <span>Signal P Cleavage Site => 10:11</span>
      <br>
      <span class='signalp'>MHLLCIVLLL</span>
      <span>WL</span>
    </p>
  </body>
</html>
