
Moving from delimited to XML

Subject Author Date
Moving from delimited to XML Bill H 06-09-2008
Posted by Bill H on June 9, 2008, 7:53 pm
In most of my programs I use small tab- or |-delimited text files to store
data (mostly because the data came from Excel files to begin with).
These "databases" usually have only 10-15 fields per record and about
100 records, so the overhead of using a real database is not
needed.

Recently I have started using XML in other areas and realize that this
format would be more easily maintained than the text files. So, the
question of the day: can someone point me to some simple XML
implementation in Perl that won't take days to learn and includes some
commentary on how it works that is geared more to the layman?

As a more detailed example of what I am hoping to accomplish, say I
have a simple tab-delimited record in a file that contains a person's
address:

First Name\tLast Name\tStreet Address\tCity\tState\tZip

If I wanted to get all the zip codes in this file, I would write:

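use strict;
use warnings;

my @zips;
open(my $fh, '<', 'thefile.txt') or die "Cannot open thefile.txt: $!";
while (my $record = <$fh>)
{
    chomp $record;                 # drop the newline, or it ends up in the zip
    my @temp = split /\t/, $record;
    push @zips, $temp[5];          # zip is the sixth field
}
close($fh);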

Now if I later decided to have two street addresses and a middle name,
the above code would have to be altered. But suppose the data were
in XML and each entry looked like this:

<record firstname="John" lastname="Doe" address="123 Main"
city="Anytown" state="NN" zip="12345" />

Then the pseudocode would be something like:

open xml file
get number of records (elements)
for each record
load record
$zips[@zips] = value of attribute named "zip"
next record

This would allow the XML file to be changed, with more information added,
etc., without changing the code.

I know one of you is going to tell me which module to include; that's
fine, I just hope it comes with a good description.

Bill H

Posted by Tad J McClellan on June 9, 2008, 11:10 pm

> Recently I have started using XML in other areas and realize that this
> format would be more easily maintained then the text files. So the
> question of the day, can someone point me to some simple XML
^^^^^^^^^^
^^^^^^^^^^
Errr, like the XML::Simple module?


> implementation in perl that wont take days to learn and includes some
> commentary on how it works that is geared more to the layman?


You will need to understand Perl's references and data structures
to use XML::Simple; start with perlreftut.pod.
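To make the perlreftut advice concrete, here is a sketch of the kind of structure XMLin() hands back for the <addressbook> sample in the program further down: a hash ref whose person key holds an array ref, with one hash ref per person. The data is typed in by hand as an assumption about that default output shape, so this runs without XML::Simple installed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hand-written data, assumed to mirror XMLin()'s default output for the
# <addressbook> sample: { person => [ {...}, {...} ] }
my $ref = {
    person => [
        { firstname => 'John', lastname => 'Doe', address => '123 Main',
          city => 'Anytown', state => 'NN', zip => '12345' },
        { firstname => 'Bill', lastname => 'Aitch',
          city => 'Dunno', state => 'Confusion', zip => '67890' },
    ],
};

# perlreftut "Use Rule 1": write @{ $ref->{person} } wherever you would
# write @array for a plain array.
my @zips = map { $_->{zip} } @{ $ref->{person} };

# A field that is missing from the XML is simply absent from that hash.
for my $person ( @{ $ref->{person} } ) {
    my $street = exists $person->{address} ? $person->{address} : '(no address)';
    print "$person->{firstname}: $street, $person->{zip}\n";
}
print "zips: @zips\n";
```

Each person is just a hash, so code that asks for $person->{zip} does not care how many other fields exist or in what order they appear.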


> As a more detailed example of what I am hoping to accomplish, say I
> have a simple tab delimited record in a file that contains a persons
> address:
>
> First Name\tLast Name\tStreet Address\tCity\tState\t\Zip
>
> and I wanted to get all the zip codes in this file I would load:
>
> open(FILE,"thefile.txt");
> @data = <FILE>;
> close(FILE);
>
> foreach $record (@data)
> {
> @temp = split("\t",$record);
> $zips[@zips] = $temp[5];
> }
>
> Now if I decided later to have 2 street addresses and a middle name
> the above code would have to be altered. But I was thinking if it was
> in XML and each entry looked like this:
>
><record firstname="John" lastname="Doe" address="123 Main"
> city="Anytown" state="NN" zip="12345" />


You could also represent the "fields" in elements rather than
in attributes (as I've done below).

See the XML FAQ:

http://xml.silmaril.ie/developers/attributes/


> and the psuedo code would be something like
>
> open xml file
> get number of records (elements)
> for each record
> load record
> $zips[@zips] = value of attribute named "zip"
> next record
>
> This would allow the XML file to be changed, more information added
> etc without changing the code.


I fail to see how switching to an XML representation would
obviate the need to change the code though...


> I know one of your are going to tell me what module to include - thats
> fine, just hopefully it includes a good description.


----------------------------
#!/usr/bin/perl
use warnings;
use strict;
use XML::Simple;

my $xml = join '', <DATA>; # No Uri, I don't want to use File::Slurp here :-)
my $ref = XMLin($xml);

foreach my $person ( @{ $ref->{person} } ) { # "Use Rule 1" from perlreftut
    print "$person->{zip}\n";
}


__DATA__
<?xml version='1.0' encoding='UTF-8'?>
<addressbook>
<person>
<firstname>John</firstname>
<lastname>Doe</lastname>
<address>123 Main</address>
<city>Anytown</city>
<state>NN</state>
<zip>12345</zip>
</person>

<person>
<firstname>Bill</firstname>
<lastname>Aitch</lastname>
<city>Dunno</city>
<state>Confusion</state>
<zip>67890</zip>
</person>


</addressbook>
----------------------------
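For the attribute-style records in Bill's original post, the same kind of loop works, because XML::Simple turns attributes into hash keys just as it does child elements. This sketch also passes two real XMLin() options worth knowing: ForceArray, so that a file with a single <record> still yields an array ref (without it, a one-person file would make Tad's loop over the person elements die with "Not an ARRAY reference"), and KeyAttr => [], so records are not folded into a hash keyed on a name, key or id attribute. The middlename attribute is made up for illustration, and the sketch assumes XML::Simple and one of its parser backends are installed.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Simple;

# Bill's attribute-style records, wrapped in a root element.
my $xml = <<'END_XML';
<addressbook>
  <record firstname="John" lastname="Doe" address="123 Main"
          city="Anytown" state="NN" zip="12345" />
  <record firstname="Bill" lastname="Aitch" middlename="H"
          city="Dunno" state="Confusion" zip="67890" />
</addressbook>
END_XML

# ForceArray: always give a list of <record>s, even if there is only one.
# KeyAttr => []: keep records as a list instead of a hash keyed on name/key/id.
my $ref = XMLin($xml, ForceArray => ['record'], KeyAttr => []);

# Attributes become hash keys; the extra middlename needs no code change.
my @zips = map { $_->{zip} } @{ $ref->{record} };
print "@zips\n";
```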


--
Tad McClellan
email: perl -le "print scalar reverse qq/moc.noitatibaher0cmdat/"

Posted by Peter J. Holzer on June 10, 2008, 3:46 pm
>> As a more detailed example of what I am hoping to accomplish, say I
>> have a simple tab delimited record in a file that contains a persons
>> address:
>>
>> First Name\tLast Name\tStreet Address\tCity\tState\t\Zip
>>
>> and I wanted to get all the zip codes in this file I would load:
>>
>> open(FILE,"thefile.txt");
>> @data = <FILE>;
>> close(FILE);
>>
>> foreach $record (@data)
>> {
>> @temp = split("\t",$record);
>> $zips[@zips] = $temp[5];
>> }
>>
>> Now if I decided later to have 2 street addresses and a middle name
>> the above code would have to be altered. But I was thinking if it was
>> in XML and each entry looked like this:
>>
>><record firstname="John" lastname="Doe" address="123 Main"
>> city="Anytown" state="NN" zip="12345" />
[...]
>> This would allow the XML file to be changed, more information added
>> etc without changing the code.
>
>
> I fail to see how switching to an XML representation would
> obviate the need to change the code though...

The code to access the attribute "zip" doesn't change just because there
is an additional attribute "middlename".

OTOH, if you access a table by column number, the code has to change if you
add a column before the one you want to access (and presumably you would
want to add the middle name between the first and last name).

However, you can access the columns by name. For example, in one script
I've done it like this:

use Text::CSV_XS;
use Encode qw(decode);

# $fh is assumed to be an already-open filehandle on the CSV file
my $csv = Text::CSV_XS->new({ binary => 1 });
my $h = $csv->getline($fh); # get header line
while (my $d = $csv->getline($fh)) { # data lines
    my $dh = mkhash($h, $d);
    ...
}

sub mkhash {
    my ($k, $v) = @_;
    my $h;
    for my $i (0 .. $#$k) {
        $h->{$k->[$i]} = decode('utf8', $v->[$i]);
    }
    return $h;
}
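Text::CSV_XS is the right tool when fields may contain quoted delimiters. For plain tab- or |-delimited files like Bill's, where no field ever contains the separator, the same look-up-by-name idea can be sketched with core Perl alone, assuming the first line of the file holds the column names. read_delimited is a hypothetical helper, and the sample data (including the Middle Name column) is invented for the demonstration; a plain split does no quote handling at all.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: the first line of $fh names the columns; every later
# line becomes a hash ref keyed by those names.
sub read_delimited {
    my ($fh, $sep) = @_;    # $sep is a compiled regex, e.g. qr/\t/ or qr/\|/
    my $header = <$fh>;
    die "Empty file\n" unless defined $header;
    chomp $header;
    my @names = split $sep, $header, -1;
    my @rows;
    while (my $line = <$fh>) {
        chomp $line;
        next if $line eq '';                    # skip blank lines
        my %row;
        @row{@names} = split $sep, $line, -1;   # hash slice: name => value
        push @rows, \%row;
    }
    return \@rows;
}

# Demo data in an in-memory file; a real script would open a path instead.
my $text = "First Name\tMiddle Name\tLast Name\tZip\n"
         . "John\tQ\tDoe\t12345\n"
         . "Bill\t\tAitch\t67890\n";
open(my $fh, '<', \$text) or die "Cannot open in-memory file: $!";
my $rows = read_delimited($fh, qr/\t/);
close($fh);

# Zip is found by name, so inserting "Middle Name" did not break this line.
my @zips = map { $_->{Zip} } @$rows;
print "@zips\n";
```

For a |-delimited file, pass qr/\|/ instead of qr/\t/; the pipe has to be escaped because it is a regex metacharacter. The -1 limit on split keeps trailing empty fields, so a blank last column still gets a key.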

        hp

Posted by Ben Morrow on June 9, 2008, 11:49 pm

>
> Recently I have started using XML in other areas and realize that this
> format would be more easily maintained then the text files. So the
> question of the day, can someone point me to some simple XML
> implementation in perl that wont take days to learn and includes some
> commentary on how it works that is geared more to the layman?

While XML is a decent format for many things, particularly because it is
so widely supported, for storing Perlish data structures in a text file
YAML may be a better fit. In particular, YAML read and written with
YAML::Tiny makes a good serialization format.
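A minimal sketch of that suggestion, assuming YAML::Tiny is installed from CPAN: the same two address-book entries written as YAML, parsed with YAML::Tiny's read_string(), and written back out with write_string(). YAML::Tiny implements only a subset of YAML, which is enough for a simple list of records like this one.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use YAML::Tiny;

# The address book as YAML: a list of records, each a mapping of field => value.
my $text = <<'END_YAML';
---
- firstname: John
  lastname: Doe
  address: 123 Main
  city: Anytown
  state: NN
  zip: 12345
- firstname: Bill
  lastname: Aitch
  city: Dunno
  state: Confusion
  zip: 67890
END_YAML

# read_string() returns a YAML::Tiny object, which is an array ref of documents.
my $yaml = YAML::Tiny->read_string($text)
    or die "Could not parse the YAML\n";
my $people = $yaml->[0];    # first (and only) document

my @zips = map { $_->{zip} } @$people;
print "@zips\n";

# Writing the (possibly modified) data back out is the reverse operation.
print YAML::Tiny->new($people)->write_string;
```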

Ben

--
The Earth is degenerating these days. Bribery and corruption abound.
Children no longer mind their parents, every man wants to write a book,
and it is evident that the end of the world is fast approaching.
Assyrian stone tablet, c.2800 BC ben@morrow.me.uk

Posted by Tad J McClellan on June 10, 2008, 11:32 pm

>>*Our big database uses an IBM product called UniData, which is a non-
>>first normal form relational database technology,


> WTF are you talking about?


http://en.wikipedia.org/wiki/Database_normalization


> You should get "normalized"...


Pardon me, but your ignorance is showing.

Again.


--
Tad McClellan
email: perl -le "print scalar reverse qq/moc.noitatibaher0cmdat/"

Similar Threads (Posted)
Moving to a new machine January 5, 2005, 6:53 pm
Moving C code from 32 to 64 bit June 23, 2006, 9:39 pm
Footnote Moving Problem March 2, 2007, 6:36 am
Computer Moving Companies December 8, 2007, 1:58 pm
California Moving Companies November 21, 2007, 2:20 pm
Re: moving unused of a website April 15, 2008, 9:47 pm
Moving data structure around better than globals? October 26, 2005, 1:20 am
Sorting and moving files to dir for DVD burn October 11, 2006, 10:27 pm
IMAP Mail Filtering / Moving ?? February 15, 2007, 9:22 am
moving binary data from one RDBMS to Other February 6, 2008, 11:50 am
