Moving from delimited to XML

In most of my programs I use small tab- or |-delimited text files to store
data (mostly because the data came from Excel files to begin with).
These "databases" usually have only 10-15 fields per record and about
100 records, so the overhead of using a real database is not worth it.

Recently I have started using XML in other areas and realize that this
format would be more easily maintained than the text files. So, the
question of the day: can someone point me to a simple XML
implementation in Perl that won't take days to learn and includes some
commentary on how it works that is geared more to the layman?

As a more detailed example of what I am hoping to accomplish, say I
have a simple tab-delimited file in which each record contains a person's
name and address:

First Name\tLast Name\tStreet Address\tCity\tState\tZip

If I wanted to get all the zip codes in this file, I would do:

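my @data = <FILE>;
my @zips;

foreach my $record (@data) {
    chomp $record;                      # drop the newline so it doesn't end up in the zip
    my @temp = split /\t/, $record;
    $zips[@zips] = $temp[5];            # Zip is the sixth field (index 5)
}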

Now, if I later decided to have two street addresses and a middle name,
the above code would have to be altered. But I was thinking that if the
data were in XML and each entry looked like this:

<record firstname="John" lastname="Doe" address="123 Main"
city="Anytown" state="NN" zip="12345" />

and the pseudocode would be something like

open xml file
get number of records (elements)
for each record
load record
$zips[@zips] = value of attribute named "zip"
next record

This would allow the XML file to be changed, more information added,
etc., without changing the code.

I know one of you is going to tell me which module to include; that's
fine, I just hope it comes with a good description.

Bill H

Re: Moving from delimited to XML

Errr, like the XML::Simple module?


You will need to understand Perl's references and data structures
to use XML::Simple; start with perlreftut.pod.
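To give a feel for what that means in practice: XMLin() returns a reference to a hash, and repeated elements typically show up in it as a reference to an array of hashes. The minimal core-Perl sketch below builds such a structure by hand (the "person" and "zip" keys and the second record are made up for illustration, not taken from any real XML::Simple output) and walks it the same way code using XML::Simple would:

```perl
use strict;
use warnings;

# A hand-built structure shaped like what XMLin() might return for
# several <person> elements. Key names and data are hypothetical.
my $ref = {
    person => [
        { firstname => 'John', lastname => 'Doe', zip => '12345' },
        { firstname => 'Jane', lastname => 'Roe', zip => '67890' },
    ],
};

my @zips;

# $ref->{person} is an array reference; @{ ... } dereferences it
# ("Use Rule 1" in perlreftut).
foreach my $person ( @{ $ref->{person} } ) {
    # $person is a hash reference; -> looks up one field by name.
    push @zips, $person->{zip};
}

print "@zips\n";    # prints "12345 67890"
```

Because each field is looked up by name, adding a "middlename" key to every record would not affect the loop.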


You could also represent the "fields" in elements rather than
in attributes (as I've done below).

See the XML FAQ: /


I fail to see how switching to an XML representation would
obviate the need to change the code though...


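use warnings;
use strict;
use XML::Simple;

my $xml = join '', <DATA>; # No Uri, I don't want to use File::Slurp here :-)
# ForceArray keeps $ref->{person} an array ref even with a single <person>
my $ref = XMLin($xml, ForceArray => ['person'], KeyAttr => []);

foreach my $person ( @{ $ref->{person} } ) { # "Use Rule 1" from perlreftut
   print "$person->{address}\n";
}

__DATA__
<?xml version='1.0' encoding='UTF-8'?>
<people>
  <person>
   <firstname>John</firstname>
   <lastname>Doe</lastname>
   <address>123 Main</address>
   <city>Anytown</city>
   <state>NN</state>
   <zip>12345</zip>
  </person>
</people>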



Tad McClellan
email: perl -le "print scalar reverse qq/moc.noitatibaher0cmdat/"

Re: Moving from delimited to XML


The code to access the attribute "zip" doesn't change just because there
is an additional attribute "middlename".

OTOH, if you access a table by column number, the code has to change if you
add a column before the one you want to access (and presumably you would
want to add the middle name between the first and last names).

However, you can access the columns by name. For example, in one script
I've done it like this:

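use Text::CSV_XS;
use Encode qw(decode);

# $fh is an already-open input filehandle (sep_char => "\t" handles tab-delimited data)
my $csv = Text::CSV_XS->new({ binary => 1 });
my $h = $csv->getline($fh); # get header line
while (my $d = $csv->getline($fh)) { # data lines
    my $dh = mkhash($h, $d);
    # ... work with $dh->{zip} etc. here, if the header has a "zip" column ...
}

sub mkhash {
    my ($k, $v) = @_;
    my $h;
    for my $i (0 .. $#$k) {
        $h->{$k->[$i]} = decode('utf8', $v->[$i]); # key each value by its column name
    }
    return $h;
}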
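For plain tab-delimited data like the original poster's, the same "look columns up by the header name" idea can be sketched with core Perl only. This assumes the first line holds the column names and that no field contains an embedded tab or newline; read_tab_records() and both sample files are hypothetical:

```perl
use strict;
use warnings;

# Read tab-delimited text whose first line names the columns, and
# return a list of hash references, one per data line.
sub read_tab_records {
    my ($text) = @_;
    my @lines       = split /\n/, $text;
    my $header_line = shift @lines;
    my @header      = split /\t/, $header_line;
    my @records;
    for my $line (@lines) {
        next if $line eq '';
        # The -1 limit keeps trailing empty fields instead of dropping them.
        my @fields = split /\t/, $line, -1;
        my %rec;
        @rec{@header} = @fields;    # hash slice: column name => value
        push @records, \%rec;
    }
    return @records;
}

# Two hypothetical versions of the same file; the second one has a
# middlename column inserted, which shifts every later column index.
my $old = "firstname\tlastname\taddress\tcity\tstate\tzip\n"
        . "John\tDoe\t123 Main\tAnytown\tNN\t12345\n";
my $new = "firstname\tmiddlename\tlastname\taddress\tcity\tstate\tzip\n"
        . "John\tQ\tDoe\t123 Main\tAnytown\tNN\t12345\n";

# The lookup by column name is the same for both layouts.
my @old_zips = map { $_->{zip} } read_tab_records($old);
my @new_zips = map { $_->{zip} } read_tab_records($new);

print "@old_zips / @new_zips\n";    # prints "12345 / 12345"
```

Indexing with $temp[5] would silently pick up the state instead of the zip in the second file; the name-based lookup keeps working.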


Re: Moving from delimited to XML


While XML is a decent format for many things, particularly because it is
so widely supported, YAML may be a better fit for storing Perlish data
structures in a text file. In particular, YAML::Tiny makes YAML an easy
serialization format to use from Perl.
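A minimal sketch of a YAML::Tiny round trip, assuming YAML::Tiny is installed from CPAN (it is not a core module). The record fields are borrowed from the original question; write() and read() do the same job with a file name instead of a string:

```perl
use strict;
use warnings;
use YAML::Tiny;

# A list of person records, the same shape as the XML examples.
my @people = (
    { firstname => 'John', lastname => 'Doe', address => '123 Main',
      city => 'Anytown', state => 'NN', zip => '12345' },
);

# Serialize: one YAML document containing the list of records.
my $text = YAML::Tiny->new( \@people )->write_string;

# Read it back; element 0 of a YAML::Tiny object is the first document.
my $loaded = YAML::Tiny->read_string($text)->[0];
my @zips   = map { $_->{zip} } @$loaded;

print $text;
print "@zips\n";    # prints "12345"
```

The loaded data is an ordinary array of hashes, so adding a middlename field to the file does not change the zip-collecting line.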


The Earth is degenerating these days. Bribery and corruption abound.
Children no longer mind their parents, every man wants to write a book,
and it is evident that the end of the world is fast approaching.
       Assyrian stone tablet, c.2800 BC              

Re: Moving from delimited to XML


The difference between a CSV format and an XML format is that the
latter does not have to be normalized. As you point out, in XML a
person may have zero or more names, zero or more streets, zero or more
emails, zero or more phones, etc. You can't do this with a CSV format
because you need to have a 'column' for every value. The same is
true for an RDBMS.
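A small Perl sketch of the "zero or more" point, using made-up data: each hypothetical person carries a list of street lines, which a nested data structure holds naturally, while a fixed-column layout has to reserve as many street columns as the longest record needs and pad all the others:

```perl
use strict;
use warnings;
use List::Util qw(max);

# Hypothetical records: each person has zero or more street lines,
# stored as an array reference.
my @people = (
    { name => 'John Doe', streets => ['123 Main'] },
    { name => 'Jane Roe', streets => ['1 Elm', 'Apt 4'] },
    { name => 'Sam Poe',  streets => [] },
);

# Nested data handles the varying counts directly.
for my $p (@people) {
    printf "%s: %d street line(s)\n", $p->{name}, scalar @{ $p->{streets} };
}

# A fixed-column (CSV-style) layout needs as many street columns as the
# longest record, padding everyone else with empty fields.
my $street_cols = max map { scalar @{ $_->{streets} } } @people;
my @rows;
for my $p (@people) {
    my @streets = @{ $p->{streets} };
    push @streets, '' while @streets < $street_cols;
    push @rows, join "\t", $p->{name}, @streets;
}
print "$_\n" for @rows;
```

With a hundred possible values per field instead of two, the padded columns are what produce the ragged, scrambled spreadsheet described below.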

Both formats are good, but for different things. It strikes me that,
if you are interested in XML, you should spend time learning XSLT and
Java before you start to use Perl to manipulate the XML. You can
certainly do your zip code stuff more easily with XSLT than with Perl.
(Java's a different story, but Java has XML classes and parsers built
in.)
In my job, I use a lot of data files, some of them very big, and write
a lot of scripts manipulating these files. We extract data from our
databases both in a CSV format and as XML.* I use both CSV and XML.
These formats are not interchangeable, just as a hammer and a screwdriver
are not interchangeable. Which tool you use depends on your task: if
you have nails you use a hammer, and if you have screws you use a
screwdriver.
You might also want to check out eXist, an XML database. /
It uses XQuery and XPath. You can see how it handles free-form data
that would otherwise be extremely difficult to store in an RDB.

IMO, Perl isn't really suited for XML. This is just my opinion, but
it's based on several years of practical experience of dealing with
XML files.


*Our big database uses an IBM product called UniData, which is a non-
first normal form relational database technology, and MOST of our
fields are multi-valued which explains why I find XML so useful. For
example, a student can have from zero to over a hundred classes to his
credit, which are all stored in one 'field'. So ... if I extract the
data as CSV and import them into Excel, I get a very ragged right
margin and very scrambled columns.

Re: Moving from delimited to XML


WTF are you talking about?
You should get "normalized"...


Re: Moving from delimited to XML


Pardon me, but your ignorance is showing.


Tad McClellan
email: perl -le "print scalar reverse qq/moc.noitatibaher0cmdat/"
