Fetching zero cells in XLS file using Spreadsheet::ParseExcel
Question
First, please refer to the Stack Overflow question "Using Spreadsheet::WriteExcel".
The data structure looks like the following:
col1 col2 col3 col4 col5
row1 School 1
row2 Dean John
row3 No.stu. 55
row4 some irrelevant stuff
row5 School2 2
row6 Dean Tony
row7 No. stu. 60
row8 some irrelevant stuff
row9 School 3
row10 Dean James
row11 No.stu. 56
row12 No. teacher 20
row13 School 4
row14 Dean Tom
row15 No.stu. 79
row16 No. teacher 21
row17 course
row18 math 2
row19 eng 4
row20 teacher name age gender race
row21 Jane 20 female white
row22 student name Lee
row23 SAT 1434
row24 gender male
The output I want to achieve is:
col1 col2 col3 col4 col5 col6 col7 col8 col9
row1 School Dean No.stu. No. teacher course_math course_eng teacher_name teacher_age teacher_gender teacher_race student_name student_SAT student_gender
row2 1 John 55
row3 2 Tony 60
row4 3 James 56 20
row5 4 Tom 79 21 2 4 Jane 20 female white Lee 1434 male
Thanks to gangabass, the code I have so far is:
use strict;
use warnings;
use Spreadsheet::ParseExcel;
use FindBin qw($Bin);
my ($infile) = @ARGV;
my $parser = Spreadsheet::ParseExcel->new();
my $workbook = $parser->parse("$Bin/Test.xls");
die $parser->error unless defined $workbook;
my ($worksheet) = $workbook->worksheets();
my %data;
my $row = 0;
my $school = "";
while (1) {
my $cell = $worksheet->get_cell($row, 0);
last unless defined($cell);
my $key = $cell->value();
my $value = $worksheet->get_cell($row++, 1)->value();
if ($key eq "School") {
$school = $value;
next;
}
$data{$school}->{$key} = $value;
}
sleep 1;
I started by parsing row17-row19. The first problem I came across was the empty cell at (row17, col3): the code errors out when it gets there. Knowing that Excel distinguishes between 'empty' and 'blank' cells, I can work around it by setting the cell format in the original XLS file to something other than 'General'. However, that is only a temporary solution. I wonder whether there is any command I could use to fetch the empty cell. I already tried unformatted() by adding:
my $unformattedvalue = $worksheet->get_cell( $row++, 1 )->unformatted();
However, that does not solve the problem.
I then tried to handle the data under a 'course' state using the following code:
my %data;
my $row = 0;
my $school = "";
my $course = ""; #Initial value for the state of course
while (1) {
my $cell = $worksheet->get_cell($row, 0);
last unless defined($cell);
my $key = $cell->value();
my $value = $worksheet->get_cell( $row++, 1 )->value();
my $value1 = $worksheet->get_cell( $row++, 2 )->value(); #Fetching the value in column 3
if ($key eq "School") {
$school = $value;
next;
}
if ($key eq "course") { #Just mimicking the how we construct the structure of 'School'
$course = $value1;
next;
}
$data{$school}->{$key} = $value; #Must be something wrong here, but can not figure out
}
The code fails with: Can't call method "value" on an undefined value at xxx, pointing at the line 'my $value1 = $worksheet->get_cell( $row++, 2 )->value()';
So in short, my questions are:
- In general, how do I fetch empty cells in XLS without intervening in the process?
- How do I parse a structure like the following using Spreadsheet::ParseExcel?
row17 course
row18 math 2
row19 eng 4
Recommended answer
You are doing a $row++ when you shouldn't be.
Try this code instead as your starting point:
my %data;
my $state = "";
my $school = "";
my $student = "";
my ( $row_min, $row_max ) = $worksheet->row_range();
my $row = $row_min;
while ($row <= $row_max) {
my $cell0 = $worksheet->get_cell( $row, 0 );
my $cell1 = $worksheet->get_cell( $row, 1 );
if (defined($cell0)) {
my $key = $cell0->value();
if ($key eq 'School') {
$state = 'school';
$school = $cell1->value();
} elsif ($key eq 'course') {
$state = 'course';
} elsif ($key eq 'teacher') {
$state = 'teacher';
} elsif ($key eq 'student') {
$state = 'student';
$student = $worksheet->get_cell( $row, 2 )->value();
} else {
# column 1 may be empty, in which case get_cell() returned undef
$data{$school}{$key} = defined($cell1) ? $cell1->value() : undef;
}
} elsif ($state eq 'course') {
# process columns for course
} elsif ($state eq 'teacher') {
# process columns for teacher
} elsif ($state eq 'student') {
# process columns for student
}
$row++;
}
use Data::Dumper;
print Dumper(\%data);
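The "undefined value" error from the question arises because get_cell() returns undef for a cell that does not exist, and value() is then called on that undef. Below is a minimal sketch of a guard. It assumes a hypothetical helper named cell_value (not part of Spreadsheet::ParseExcel) and uses small stub classes in place of a real worksheet so that the snippet runs on its own. The row layout in the stub is also an assumption.

```perl
use strict;
use warnings;

# Stub classes standing in for a Spreadsheet::ParseExcel worksheet
# (illustration only): get_cell() returns a cell object, or undef if empty.
package StubCell {
    sub new   { my ($class, $v) = @_; return bless { v => $v }, $class }
    sub value { return $_[0]{v} }
}
package StubSheet {
    sub new { my ($class, $rows) = @_; return bless { rows => $rows }, $class }
    sub get_cell {
        my ($self, $row, $col) = @_;
        my $v = $self->{rows}[$row][$col];
        return defined $v ? StubCell->new($v) : undef;
    }
}
package main;

# Hypothetical helper: the cell's value, or a default when the cell is missing.
sub cell_value {
    my ($sheet, $row, $col, $default) = @_;
    my $cell = $sheet->get_cell($row, $col);
    return defined $cell ? $cell->value() : $default;
}

# Assumed layout for the 'course' rows from the question.
my $sheet = StubSheet->new([
    [ 'course' ],            # columns 1 and 2 are empty on this row
    [ undef, 'math', 2 ],
    [ undef, 'eng',  4 ],
]);

my $empty = cell_value($sheet, 0, 2, '');   # empty cell falls back to ''
my $math  = cell_value($sheet, 1, 2, '');   # populated cell returns its value
print "empty='$empty' math='$math'\n";
```

With a real workbook, the same helper should work unchanged when passed the object returned by $workbook->worksheets().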
Update:
To process the teacher rows, I would first declare a %teacher_columns hash outside the first while loop, along with $col_min and $col_max obtained from the col_range() method. Then, inside the $key eq 'teacher' elsif clause, I would do something like this:
%teacher_columns = (); # clear it out in case column names are different for this school
for my $col (2 .. $col_max) {
my $cell = $worksheet->get_cell( $row, $col );
$teacher_columns{$cell->value()} = $col if defined($cell) and $cell->value();
}
Then, in the $state eq 'teacher' elsif clause, I would do something like this:
foreach my $key (keys %teacher_columns) {
my $cell = $worksheet->get_cell($row, $teacher_columns{$key});
if (defined($cell)) {
# store cell data into proper location of your data hash
}
}
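One way the "store cell data" step could be filled in is sketched below. Everything about the storage layout is an assumption: teachers are kept as a list of hashes under $data{$school}{teacher}, the header names start in column 2 as in the loop above, and stub classes replace the real worksheet so the snippet runs by itself.

```perl
use strict;
use warnings;

# Stub classes standing in for a Spreadsheet::ParseExcel worksheet
# (illustration only): get_cell() returns a cell object, or undef if empty.
package StubCell {
    sub new   { my ($class, $v) = @_; return bless { v => $v }, $class }
    sub value { return $_[0]{v} }
}
package StubSheet {
    sub new { my ($class, $rows) = @_; return bless { rows => $rows }, $class }
    sub get_cell {
        my ($self, $row, $col) = @_;
        my $v = $self->{rows}[$row][$col];
        return defined $v ? StubCell->new($v) : undef;
    }
}
package main;

# Assumed layout: row 0 is the 'teacher' header row, row 1 is one teacher.
my $sheet = StubSheet->new([
    [ 'teacher', undef, 'name', 'age', 'gender', 'race'  ],
    [ undef,     undef, 'Jane', 20,    'female', 'white' ],
]);
my $col_max = 5;   # with the real module this would come from col_range()
my $school  = 4;   # the school currently being processed
my (%data, %teacher_columns);

# Header row: remember which column holds which field name.
for my $col (2 .. $col_max) {
    my $cell = $sheet->get_cell(0, $col);
    $teacher_columns{ $cell->value() } = $col if defined $cell and $cell->value();
}

# Data row: copy each known field, skipping cells that do not exist.
my %teacher;
for my $key (keys %teacher_columns) {
    my $cell = $sheet->get_cell(1, $teacher_columns{$key});
    $teacher{$key} = $cell->value() if defined $cell;
}
push @{ $data{$school}{teacher} }, \%teacher;

print join(', ', map { "$_=$teacher{$_}" } sort keys %teacher), "\n";
```

Keeping teachers in an array allows a school to have more than one teacher row. A hash keyed by teacher name would work as well if names are unique.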