Parsing OpenOffice Spreadsheets


Hi all,

I've written a utility that reads OpenOffice Spreadsheet data into a Perl
structure: a hash (the keys correspond to worksheet names) of arrays of
arrays (the latter correspond to rows and cells, respectively).
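
To make the shape concrete, here is a minimal sketch of such a structure and how it can be walked. The sheet names and cell values are invented for illustration; they do not come from the utility itself.

```perl
use strict;
use warnings;

# Hypothetical result: worksheet name => array of rows => array of cells.
my %workbook = (
    'Sheet1' => [
        [ 'id', 'name',  'city'   ],
        [ 1,    'Alice', 'Berlin' ],
        [ 2,    'Bob',   undef    ],    # empty cell represented as undef
    ],
    'Sheet2' => [],                     # a worksheet without any rows
);

# Walk every worksheet, row by row, printing undef cells as NULL.
for my $sheet (sort keys %workbook) {
    my $rows = $workbook{$sheet};
    printf "%s: %d row(s)\n", $sheet, scalar @$rows;
    for my $row (@$rows) {
        print join("\t", map { defined $_ ? $_ : 'NULL' } @$row), "\n";
    }
}

# A single cell is addressed by sheet name, row index and column index
# (both indices are zero-based).
my $cell = $workbook{'Sheet1'}[1][1];
print "Cell [1][1] of Sheet1: $cell\n";
```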

It's a simple program that currently relies on XML::Parser::Lite::Tree for  
parsing the XML content of sxc files.

The reason I'm asking for advice is that I'm unsure whether it's too close  
to the existing OpenOffice::Parse::SXC module which is based on  
XML::Parser. The main differences in the module I have in mind are: a) It
returns the different worksheets as hash elements. In
OpenOffice::Parse::SXC you have to write a handler to achieve this. b) It
returns undef for empty cells, where OpenOffice::Parse::SXC returns an
empty string. Undef is better suited for importing data into a database
(which is what I've written the code for). c) It honors the
"number-rows-repeated" attribute of SXC files, which is ignored by
OpenOffice::Parse::SXC. d) It optionally returns data in any of several
encodings. On the other hand, my code is much less sophisticated and
doesn't allow flexible use of the module through handlers, as
OpenOffice::Parse::SXC does.
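
Points b) and c) can be sketched roughly as follows. This is only an illustration under assumptions: the intermediate @parsed_rows form, the "repeated" key and the expand_rows helper are hypothetical names, not part of either module, and the XML parsing step is left out entirely.

```perl
use strict;
use warnings;

# Hypothetical intermediate form: one entry per row element in the file,
# carrying the value of its number-rows-repeated attribute (1 if absent)
# and the raw cell strings.
my @parsed_rows = (
    { repeated => 1, cells => [ 'a', '',  'c' ] },
    { repeated => 3, cells => [ '',  'x', ''  ] },
);

# Expand repeated rows and turn empty strings into undef.
sub expand_rows {
    my @rows;
    for my $r (@_) {
        my $n = $r->{repeated} || 1;
        for my $i (1 .. $n) {
            # Build a fresh array per repetition so that the expanded
            # rows do not share a single array reference.
            push @rows,
                [ map { defined $_ && length $_ ? $_ : undef } @{ $r->{cells} } ];
        }
    }
    return \@rows;
}

my $rows = expand_rows(@parsed_rows);
print scalar(@$rows), " rows after expansion\n";
```

With undef in place of empty strings, a row can be passed straight to DBI placeholders, which bind undef as SQL NULL.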

I haven't yet uploaded anything to CPAN, and although I really needed
something different from the existing modules, I'm unsure whether it's
wise to add yet another one. Regarding the namespace, I'm leaning away from
OpenOffice::Foo, as my code is less about OpenOffice than it is about
importing spreadsheet data, so I am thinking of Spreadsheet::ParseSXC.

Please let me know what you think.

Christoph Terhechte
