Problem Parsing Huge XML file using XML::Twig

Posted by vikrant on April 23, 2007, 10:24 pm

Hi,
I am trying to parse a huge XML file using XML::Twig. A sample of the
XML file follows:
-------------------------------------------------------------------------------------------------------------------------
<?xml version='1.0'?>
<StoreInfo>
<StoreName>AEC</StoreName>
<Products>
        <Product>
                <ProductID>21CR10.2</ProductID>
                <ProductInfo name="abc" category="xyz">HUGE</ProductInfo>
                <SupplierID>AEC</SupplierID>
                <PurchasePrice>10.99</PurchasePrice>
                <links>
                        <link>http://www.example.com</link>
                        <link>http://www.example2.com</link>
                </links>
        </Product>
        <Product>
                <ProductID>21CR11.2</ProductID>
                <ProductInfo name="abcd" category="xyzd">ARROW</ProductInfo>
                <SupplierID>AEC</SupplierID>
                <PurchasePrice>10.49</PurchasePrice>
                <links>
                        <link>http://www.example.com</link>
                        <link>http://www.example2.com</link>
                </links>
        </Product>
</Products>
</StoreInfo>
------------------------------------------------------------------------------------------------------------------------------------
Here, the Product tag repeats 2000 times in the original file.

I am able to get the values of ProductID, SupplierID and
PurchasePrice using the following code. But how do I get the values of
the "link" nodes, and the attribute values and text of the ProductInfo
node? I know we can use XPath with XML::Twig, but unfortunately I have
not been able to get it to work. So please help me with any documents,
links or references related to it. I searched a lot but failed to find any.
-----------------------------------------------------------------------------------------------------------------------------
#!/bin/perl -w
use strict;
use XML::Twig;

my $t= new XML::Twig( TwigHandlers=> { Product => \&product});
$t->parsefile( 'sample.xml');
exit;

sub product
{ my ($t, $product)= @_;
  # store each field under its own hash key (key names are illustrative)
  my %product;
  $product{id}=       $product->field( 'ProductID');
  $product{supplier}= $product->field( 'SupplierID');
  $product{price}=    $product->field( 'PurchasePrice');

  print "$product{id}: $product{supplier} :$product{price}\n";
  $product->delete;  # discard the element once it has been processed
}
------------------------------------------------------------------------------------------------------------------------------
One strange thing I found by accident: when I remove the "StoreInfo"
tag from the above XML, the following error appears on screen.
Error:-
junk after document element at line 5, column 0, byte 53 at /usr/lib/perl5/site_perl/5.8.8/i386-linux-thread-multi/XML/Parser.pm line 187

Any comments?

Sorry for my bad English :)

Regards,
Vikrant


Posted by mirod on April 24, 2007, 12:54 am

vikrant wrote:
> I am able to get the values of ProductID, SupplierID and
> PurchasePrice using the following code. But how do I get the values of
> the "link" nodes, and the attribute values and text of the ProductInfo
> node? I know we can use XPath with XML::Twig, but unfortunately I have
> not been able to get it to work. So please help me with any documents,
> links or references related to it. I searched a lot but failed to find any.

'field' is not the only method to get data from an element.
In your case you would use:

my $name= $product->first_child( 'ProductInfo')->att( 'name');

my $links= $product->first_child( 'links'); # the element links
my @links= map { $_->text } $links->children( 'link');

The tutorial at http://www.xmltwig.com/xmltwig/tutorial/index.html
(referenced in the README and at the top of the doc of the module)
gives more info about those methods.
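
To show how these calls fit together with the handler from the original post, here is a minimal sketch. It assumes the Product layout from the sample above; the helper name extract_products and the hash keys are made up for illustration, and it calls $t->purge instead of $product->delete to release memory after each Product.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical helper: parse an XML string, return one hash ref per Product.
sub extract_products {
    my ($xml) = @_;
    my @products;

    my $twig = XML::Twig->new(
        twig_handlers => {
            Product => sub {
                my ($t, $product) = @_;
                my %p;

                # simple child elements: field() returns the child's text
                $p{id}       = $product->field('ProductID');
                $p{supplier} = $product->field('SupplierID');
                $p{price}    = $product->field('PurchasePrice');

                # attributes and text of ProductInfo (skipped if absent)
                if (my $info = $product->first_child('ProductInfo')) {
                    $p{name}     = $info->att('name');
                    $p{category} = $info->att('category');
                    $p{info}     = $info->text;
                }

                # text of every link element under links
                my $links = $product->first_child('links');
                my @links = $links
                    ? (map { $_->text } $links->children('link'))
                    : ();
                $p{links} = \@links;

                push @products, \%p;
                $t->purge;    # release what has been parsed so far
            },
        },
    );

    $twig->parse($xml);
    return @products;
}

# Usage on a one-product sample in the same shape as the original file.
my $sample = <<'XML';
<StoreInfo><StoreName>AEC</StoreName><Products>
  <Product>
    <ProductID>21CR10.2</ProductID>
    <ProductInfo name="abc" category="xyz">HUGE</ProductInfo>
    <SupplierID>AEC</SupplierID>
    <PurchasePrice>10.99</PurchasePrice>
    <links><link>http://www.example.com</link><link>http://www.example2.com</link></links>
  </Product>
</Products></StoreInfo>
XML

for my $p (extract_products($sample)) {
    print "$p->{id} ($p->{name}/$p->{category}, $p->{info}): @{ $p->{links} }\n";
}
```

For a file on disk, parsefile( 'sample.xml') can take the place of parse, as in the original script.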

> One strange thing I found by accident: when I remove the "StoreInfo"
> tag from the above XML, the following error appears on screen.
> Error:-
> junk after document element at line 5, column 0, byte 53 at /usr/lib/perl5/site_perl/5.8.8/i386-linux-thread-multi/XML/Parser.pm line 187

If you remove the StoreInfo tag then the parser sees
<StoreName>AEC</StoreName> as the entire document, then dies, with an
appropriate error message, when it finds the rest of your original
document, and has no way of dealing with it, as it has already seen a
complete tree.
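
The behaviour described above can be reproduced with a small sketch. It assumes XML::Twig is installed; the helper name parse_error and the two sample strings are made up for illustration. The parse is wrapped in eval so the error text can be inspected instead of ending the script.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical helper: returns '' if $xml parses, else the parser's error text.
sub parse_error {
    my ($xml) = @_;
    my $ok = eval { XML::Twig->new->parse($xml); 1 };
    return $ok ? '' : "$@";
}

# Two top-level elements: the document ends after </StoreName>,
# so everything that follows is "junk" to the parser.
my $two_roots = "<StoreName>AEC</StoreName>\n<Products></Products>\n";

# The same content wrapped in a single root element is well-formed.
my $one_root = "<StoreInfo>\n$two_roots</StoreInfo>\n";

print "two roots: ", parse_error($two_roots) || "ok\n";
print "one root:  ", parse_error($one_root)  || "ok\n";
```

Keeping a single root element such as StoreInfo around the whole file is what avoids the error.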

--
mirod

Posted by vikrant on April 24, 2007, 12:23 pm

Thanks for the answer.

Regards,
Vikrant

