File system crawler - iteration bugs

Problem description

I'm currently building a file system crawler with the following code:

require 'find'
require 'spreadsheet'
Spreadsheet.client_encoding = 'UTF-8'

count = 0

Find.find('/Users/Anconia/crawler/') do |file|           
  if file =~ /\b.xls$/                                            # check if filename ends in desired format
    contents =  Spreadsheet.open(file).worksheets
    contents.each do |row|
      if row =~ /regex/
        puts file
        count += 1
      end
    end
  end
end

puts "#{count} files were found"

I am receiving the following output: 0 files were found

The regex is tested and correct - I currently use it in another crawler that works.

The output of row.inspect is:

#<Spreadsheet::Excel::Worksheet:0x003ffa5d418538 @row_addresses= @default_format= @selected= @dimensions= @name=Sheet1 @workbook=#<Spreadsheet::Excel::Workbook:0x007ff4bb147140> @rows=[] @columns=[] @links={} @merged_cells=[] @protected=false @password_hash=0 @changes={} @offsets={} @reader=#<Spreadsheet::Excel::Reader:0x007ff4bb1f3b98> @ole=#<Ole::Storage::RangesIOMigrateable:0x007ff4bb126fa8> @offset=15341 @guts={} @rows[3]> - certainly nothing to iterate over.

Recommended answer

As Diego mentioned, I should have been iterating over contents - really appreciate the clarification! It should also be noted that row must be converted to a string before any iteration takes place.
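The fix described above can be sketched roughly as follows. This is only one possible reading of the answer, not the author's actual code: the row.inspect output in the question shows that each element of contents is a Worksheet, so the sketch walks every worksheet, then every row, and joins the row's cells into a string before matching. The helper names row_matches?, workbook_matches? and count_matching_files are hypothetical, the extension check uses File.extname instead of the original regex, and each file is counted once even if several of its rows match (the original incremented the counter per matching row).

```ruby
require 'find'

# Hypothetical helper: true when the row's cells, joined into one string,
# match the pattern. A spreadsheet row behaves like an Array, so it is turned
# into a String before being compared against the Regexp.
def row_matches?(row, pattern)
  row.to_a.join(' ').match?(pattern)
end

# Hypothetical helper: true when any row of any worksheet matches.
# `workbook` only needs to respond to #worksheets, and each worksheet to #each.
def workbook_matches?(workbook, pattern)
  workbook.worksheets.each do |sheet|
    sheet.each do |row|
      return true if row_matches?(row, pattern)
    end
  end
  false
end

# Hypothetical helper: walks `root`, prints every .xls file that contains at
# least one matching row, and returns how many such files were found.
def count_matching_files(root, pattern)
  require 'spreadsheet' # loaded here so the helpers above work without the gem
  Spreadsheet.client_encoding = 'UTF-8'

  count = 0
  Find.find(root) do |path|
    next unless File.file?(path) && File.extname(path) == '.xls'

    if workbook_matches?(Spreadsheet.open(path), pattern)
      puts path
      count += 1
    end
  end
  count
end
```

With these definitions, the original script's last line would become something like puts "#{count_matching_files('/Users/Anconia/crawler/', /regex/)} files were found", where /regex/ stands for the pattern that the question leaves unspecified.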
