
Help with validating XML (DTD or Schema) with PERL

Subject Author Date
Help with validating XML (DTD or Schema) with PERL 1234marlon 06-25-2007
Posted by Keith on June 28, 2007, 9:53 am


On Jun 25, 5:06 pm, 1234mar...@gmail.com wrote:
> Can someone tell me if a reliable production PERL package was ever
> created to handle either W3C DTD or Schema standards? I realize
> libxml2 based packages are very good, but I am not willing to tackle
> the maintenance required for compiled libxml2 program in our
> production environment.
>
> It is the year 2007 for heavens sake; did anyone ever create something
> other than poorly tested, partial W3C standards, barely usable module
> to handle DTD or Schemas? I have been searching CPAN and testing modules
> frantically for months without finding anything that is good (other than
> the libxml2 stuff, which looks good).
I've said as much before on this group but, .....

Amongst the many things I've happily & successfully done with Perl &
CPAN modules in the last 15 years or so, parsing XML is not one. Pick up
"Java & XML" by O'Reilly, and with little background in the language
you'll probably be able to solve your problem in hours.

Perl SHOULD be good at this sort of thing (and as others have suggested,
the Perl community would love some help on this, I'm sure) but issues w/
recursion and lack of an adequate threading model seem to put it at a
disadvantage vis-a-vis Java, in my opinion, for tasks like this, while
one of Perl's great strengths over Java (built-in regular expression
matching) is somewhat wasted on data as structured as XML.

If you don't want to re-invent Perl wheels here, and can't or don't want
to use Java, why not go back to VB (if you're running on Windows only)?

I've had to write my own lexers & parsers for even the lightest of XML
tasks in Perl. As good as it is for other things, XML seems to be an
Achilles heel.

Keith
------
"If all you have is a hammer, everything's a nail"




>
> I am using XML::SAX::ParserFactory (PurePerl) as my parser and
> originally thought I would easily find a FILTER to handle DTD or Schema
> validation; instead, after months of research, I find myself pleading
> for help in the newsgroup. Perl is really treating me badly on this
> XML project; I did a similar project with VB6 and it was a breeze.
>
> Anyway, right now I am trying to test XML-DTD-0.06. However the
> documentation is gibberish, it keeps referring to $rt without
> explaining what $rt is...omg!!! If someone has a working example of
> using XML-DTD-0.06, I would love to see it.
>
> Thank you in advance for any guidance you can give me!!!!!!



Posted by MM on June 28, 2007, 7:52 pm



> I've said as much before on this group but, .....
>
> Amongst the many things I've happily & successfully done with Perl &
> CPAN modules in the last 15 years
> or so, parsing XML is not one. Pickup "Java & XML" by O'Reilly, and
> with little background in the
> language you'll probably be able to solve your problem in hours.
>

I actually tried using Java for XML processing last year and found the
same situation that I am having with Perl looking for XML/XSLT/Schema/
DTD/XPath support (for a production environment). I thought about
buying a Java-XML book, but did not know which one to get or if it
would help - Java support for stable XML processing looked horrible to
me. However, thanks for the "Java & XML" by O'Reilly tip, I will
definitely look at it!!!!

> Perl SHOULD be good at this sort of thing (and as others have
> suggested, the Perl community would love some
> help on this, I'm sure) but issues w/ recursion and lack of an
> adequate threading model seem to put it
> at a disadvantage vis a vis Java, in my opinion, for tasks like this,
> while one of Perl's great
> strengths over Java (built-in regular expression matching) are
> somewhat wasted on data as structured as XML.
>
Yes, yes, yes... libxml2-based modules are the Perl solution that
everyone is using. I am not sure about this, but it appears that all
other modules for core XML processing on CPAN were practically
abandoned, maybe due to the superior libxml2-based modules.

> If you don't want to re-invent Perl wheels here, and can't or don't
> want to use Java, why not go back
> to VB (if you're running on Windows only)?
>
I wanted to use Java for this project, but was afraid about XML module
support and development time. I am unable to write Java code as fast
as I can write Perl code, which is funny because this validation thing
is taking me forever to resolve. I am doing this for Unix, otherwise,
yes, VB (6/.net).

> I've had to write my own lexers & parsers for even the lightest of XML
> tasks in Perl. As good as it
> is for other things, XML seems to be an Achilles heel.
>
> Keith

I need this done by next week, so I will probably write my own
(garbage) code to handle XML validation. As for now, I am still
looking around CPAN and dissecting weird modules like
XML::DTD::Parser.

Thanks for your feedback!


Posted by Peter J. Holzer on July 1, 2007, 6:36 am


> On Jun 25, 5:06 pm, 1234mar...@gmail.com wrote:
>> Can someone tell me if a reliable production PERL package was ever
>> created to handle either W3C DTD or Schema standards? I realize
>> libxml2 based packages are very good, but I am not willing to tackle
>> the maintenance required for compiled libxml2 program in our
>> production environment.
>>
>> It is the year 2007 for heavens sake; did anyone ever create something
>> other than poorly tested, partial W3C standards, barely usable module
>> to handle DTD or Schemas? I have been searching CPAN and testing modules
>> frantically for months without finding anything that is good (other than
>> the libxml2 stuff, which looks good).
> I've said as much before on this group but, .....
>
> Amongst the many things I've happily & successfully done with Perl &
> CPAN modules in the last 15 years or so, parsing XML is not one.

I've used both XML::Parser and XML::LibXML successfully (often
indirectly through XML::Simple). Yes, both rely on a C library to do the
actual parsing, but I don't see that as a big problem (I need
other external libraries anyway, especially database stuff, so I
couldn't run a "pure perl" environment anyway. And from a programmer's
perspective there is no difference).
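
For readers who do accept the libxml2 dependency, here is a minimal sketch of DTD validation with XML::LibXML. It is not from the original post: the DTD, the sample documents, and the sub name `is_valid_xml` are made up for illustration, and the module must be installed separately.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical DTD, inlined so the sketch is self-contained.
my $dtd_text = <<'DTD';
<!ELEMENT note (to, body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT body (#PCDATA)>
DTD

my $dtd = XML::LibXML::Dtd->parse_string($dtd_text);

# Returns 1 if the XML string is valid against the DTD above, 0 otherwise.
# load_xml() itself dies if the string is not well-formed.
sub is_valid_xml {
    my ($xml) = @_;
    my $doc = XML::LibXML->load_xml(string => $xml);
    return $doc->is_valid($dtd) ? 1 : 0;
}

print is_valid_xml('<note><to>MM</to><body>hi</body></note>'), "\n";
print is_valid_xml('<note><body>hi</body></note>'), "\n";
```

`$doc->validate($dtd)` is the variant that dies with an error message instead of returning false, which is probably more useful when you need to report why a document failed.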


> Perl SHOULD be good at this sort of thing (and as others have
> suggested, the Perl community would love some help on this, I'm sure)
> but issues w/ recursion and lack of an adequate threading model seem
> to put it at a disadvantage vis a vis Java, in my opinion, for tasks
> like this,

I don't think perl has "issues w/ recursion". I've certainly written
some deeply recursive stuff (so I know about "no warnings 'recursion'").
Function calls may be slower in perl than in Java, and they are
certainly a lot slower than in C.
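
A minimal sketch of what that pragma is for, assuming a made-up nested array structure and a hypothetical sub named `depth`. With warnings enabled, perl emits a "Deep recursion" warning once a sub is nested more than 100 calls deep; the recursion itself still works.

```perl
use strict;
use warnings;

# Recursively compute the nesting depth of an array reference.
sub depth {
    my ($node) = @_;
    no warnings 'recursion';    # silence the warning past 100 levels
    return 0 unless ref($node) eq 'ARRAY';
    my $max = 0;
    for my $child (@$node) {
        my $d = depth($child);
        $max = $d if $d > $max;
    }
    return 1 + $max;
}

# Build a structure nested 5000 levels deep (an arbitrary figure).
my $tree = [];
$tree = [$tree] for 1 .. 4999;
print depth($tree), "\n";
```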

Threading in Perl is broken IMHO. But I don't see what this has to do
with parsing. That task doesn't seem to be parallelizable to me.

> while one of Perl's great strengths over Java (built-in regular
> expression matching) are somewhat wasted on data as structured as XML.

True. XML is designed to be parsed a character at a time. You can do
that in perl of course, but you have all that overhead of interpreting a
bytecode which makes things slow. So you want to do that in a language
which is designed for dealing with data a byte at a time, like C. (Java
is somewhere in between).


> I've had to write my own lexers & parsers for even the lightest of XML
> tasks in Perl.

If you wrote your own lexers & parsers in Perl, Perl is obviously good
enough for you for parsing XML.

        hp


--
_ | Peter J. Holzer | I know I'd be respectful of a pirate
|_|_) | Sysadmin WSR | with an emu on his shoulder.
| | | hjp@hjp.at |
__/ | http://www.hjp.at/ |        -- Sam in "Freefall"

Posted by Keith on July 2, 2007, 12:19 am


>
>
>
> > On Jun 25, 5:06 pm, 1234mar...@gmail.com wrote:
> >> Can someone tell me if a reliable production PERL package was ever
> >> created to handle either W3C DTD or Schema standards? I realize
> >> libxml2 based packages are very good, but I am not willing to tackle
> >> the maintenance required for compiled libxml2 program in our
> >> production environment.
>
> >> It is the year 2007 for heavens sake; did anyone ever create something
> >> other than poorly tested, partial W3C standards, barely usable module
> >> to handle DTD or Schemas? I have been searching CPAN and testing modules
> >> frantically for months without finding anything that is good (other than
> >> the libxml2 stuff, which looks good).
> > I've said as much before on this group but, .....
>
> > Amongst the many things I've happily & successfully done with Perl &
> > CPAN modules in the last 15 years or so, parsing XML is not one.
>
> I've used both XML::Parser and XML::LibXML successfully (often
> indirectly through XML::Simple). Yes, both rely on a C library to do the
> actual parsing, but I don't see that as a big problem (I need
> other external libraries anyway, especially database stuff, so I
> couldn't run a "pure perl" environment anyway. And from a programmer's
> perspective there is no difference).
>
> > Perl SHOULD be good at this sort of thing (and as others have
> > suggested, the Perl community would love some help on this, I'm sure)
> > but issues w/ recursion and lack of an adequate threading model seem
> > to put it at a disadvantage vis a vis Java, in my opinion, for tasks
> > like this,
>
> I don't think perl has "issues w/ recursion". I've certainly written
> some deeply recursive stuff (so I know about "no warnings 'recursion'").
> Function calls may be slower in perl than in Java, and they are
> certainly a lot slower than in C.
>
> Threading in Perl is broken IMHO. But I don't see what this has to do
> with parsing. That task doesn't seem to be parallelizable to me.

I should have been clearer. One thing is a recursive subroutine, another
is a recursive Perl script. I've had issues w/ recursive Perl scripts,
but for parsing XML you're not likely to need these. As for
subroutines...

Because by default everything in Perl is global to the package, this can
create issues for recursive code. Someone who takes a very disciplined
approach to writing OO Perl might not have the same issues; but writing
recursive Perl subs usually requires me to go back and clean up after
myself where I've taken shortcuts to get work done in a hurry. Of
course, languages which don't allow such shortcuts to begin with (e.g.
Java) force you to pay this price up front, whether you need to pay it
or not. Your point is well-taken.

To the point below; again I was not clear. A lot of the work I need to
do in XML involves parsing a master XML document (SAX-style) with
pointers to other XML documents. The parallelization here should be
obvious: in Java fooParser instantiates one or more barParsers, each of
which can run in its own thread. It's likely that not everyone's XML
needs match that pattern, and might not benefit as much from a robust
threading model.

>
> > while one of Perl's great strengths over Java (built-in regular
> > expression matching) are somewhat wasted on data as structured as XML.
>
> True. XML is designed to be parsed a character at a time. You can do
> that in perl of course, but you have all that overhead of interpreting a
> bytecode which makes things slow. So you want to do that in a language
> which is designed for dealing with data a byte at a time, like C. (Java
> is somewhere in between).
>
> > I've had to write my own lexers & parsers for even the lightest of XML
> > tasks in Perl.
>
> If you wrote your own lexers & parsers in Perl, Perl is obviously good
> enough for you for parsing XML.
>
> hp
>
> --
> _ | Peter J. Holzer | I know I'd be respectful of a pirate
> |_|_) | Sysadmin WSR | with an emu on his shoulder.
> | | | h...@hjp.at |
> __/ |http://www.hjp.at/| -- Sam in "Freefall"



Posted by Peter J. Holzer on July 2, 2007, 2:11 am


>> > Amongst the many things I've happily & successfully done with Perl &
>> > CPAN modules in the last 15 years or so, parsing XML is not one.
>>
>> I've used both XML::Parser and XML::LibXML successfully (often
>> indirectly through XML::Simple). Yes, both rely on a C library to do the
>> actual parsing, but I don't see that as a big problem (I need
>> other external libraries anyway, especially database stuff, so I
>> couldn't run a "pure perl" environment anyway. And from a programmer's
>> perspective there is no difference).
>>
>> > Perl SHOULD be good at this sort of thing (and as others have
>> > suggested, the Perl community would love some help on this, I'm sure)
>> > but issues w/ recursion and lack of an adequate threading model seem
>> > to put it at a disadvantage vis a vis Java, in my opinion, for tasks
>> > like this,
>>
>> I don't think perl has "issues w/ recursion". I've certainly written
>> some deeply recursive stuff (so I know about "no warnings 'recursion'").
>> Function calls may be slower in perl than in Java, and they are
>> certainly a lot slower than in C.
>>
> I should have been clearer. One thing is a recursive subroutine,
> another
> is a recursive Perl script. I've had issues w/ recursive Perl
> scripts,

[please fix the line width in your newsreader: Alternating long and
short lines isn't pretty. I've reformatted the rest of your posting]

What kind of issues? A perl script is just a program like any other.
Whether you call a perl script, compiled C program, java program or
anything else from a perl script, compiled C program, java program or
anything else doesn't matter (on Unix anyway, except that you can't
directly invoke java programs but usually need a wrapper shellscript).

> but for parsing XML you're not likely to need these. As for
> subroutines...
>
> Because by default everything in Perl is global to the package, this
> can create issues for recursive code.

It has long been best practice to "use strict" in perl code. Then there
is no default as you have to declare every variable explicitly. Of
course you can still declare variables outside of any sub, but that's a
conscious decision of the programmer (just like a Java programmer needs
to decide whether a variable should be local to the method, the
instance, or the class).
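
To illustrate the difference, a small sketch with two hypothetical subs that count the leaves of a nested array. The one using a package variable (declared with "our", so it still passes "use strict") shares a single counter across all recursive calls and can return a wrong total; the one using "my" gets a fresh counter per call.

```perl
use strict;
use warnings;

our $count_global;    # package variable: one copy shared by all calls

# Fragile: every nested call resets and reuses the same counter.
sub count_leaves_global {
    my ($node) = @_;
    return 1 unless ref($node) eq 'ARRAY';
    $count_global = 0;
    $count_global += count_leaves_global($_) for @$node;
    return $count_global;
}

# Safe: "my" gives each call its own private counter.
sub count_leaves_lexical {
    my ($node) = @_;
    return 1 unless ref($node) eq 'ARRAY';
    my $count = 0;
    $count += count_leaves_lexical($_) for @$node;
    return $count;
}

my $tree = [ 'a', [ 'b', 'c' ], 'd' ];    # four leaves
print "lexical: ", count_leaves_lexical($tree), "\n";
print "global:  ", count_leaves_global($tree), "\n";
```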

>> Threading in Perl is broken IMHO. But I don't see what this has to do
>> with parsing. That task doesn't seem to be parallelizable to me.
>
> To the point below; again I was not clear. A lot of the work I need
> to do in XML involves parsing a master XML document (SAX - style) with
> pointers to other XML documents. The parallelization here should be
> obvious: in Java fooParser instantiates one or more barParsers, each
> of which can run in it's own thread.

Which assumes that parsing the main document can continue before the
referenced document is parsed. But yes, if this is true, then
parallelization may be a win. In (non-threaded) perl you could probably
do that with a select-loop (which uses only a single CPU but works well
if you read from network resources) or by forking off children which use
Storable to pass the result back to the parent.
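
A rough sketch of the forking variant, using only core modules. The sub `parse_document` is a stand-in for real parsing and the file names are invented; each child serializes its result with Storable's `store_fd` over a pipe, and the parent reads it back with `fd_retrieve`.

```perl
use strict;
use warnings;
use POSIX ();
use Storable qw(store_fd fd_retrieve);

# Stand-in for parsing one referenced document (hypothetical helper).
sub parse_document {
    my ($name) = @_;
    return { name => $name, length => length($name) };
}

# Fork one child per document; collect the results in the parent.
sub parse_in_children {
    my @names = @_;
    my @jobs;
    for my $name (@names) {
        pipe(my $reader, my $writer) or die "pipe: $!";
        my $pid = fork();
        die "fork: $!" unless defined $pid;
        if ($pid == 0) {                    # child
            close $reader;
            store_fd(parse_document($name), $writer);
            close $writer;                  # flushes the serialized data
            POSIX::_exit(0);                # skip END blocks and destructors
        }
        close $writer;                      # parent keeps only the read end
        push @jobs, { pid => $pid, reader => $reader };
    }
    my @results;
    for my $job (@jobs) {
        push @results, fd_retrieve($job->{reader});
        close $job->{reader};
        waitpid($job->{pid}, 0);
    }
    return @results;
}

my @results = parse_in_children('a.xml', 'bb.xml');
print "$_->{name}: $_->{length}\n" for @results;
```

The parent reads the pipes in order, so a slow first child delays collection of the others even though they run concurrently; a select-loop over the read ends would avoid that.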

        hp


--
_ | Peter J. Holzer | I know I'd be respectful of a pirate
|_|_) | Sysadmin WSR | with an emu on his shoulder.
| | | hjp@hjp.at |
__/ | http://www.hjp.at/ |        -- Sam in "Freefall"
