How to increase BerkeleyDB performance?

Posted by palexvs@gmail.com on November 14, 2007, 7:25 am
I have a BDB with about 10 million records, and about 100K more are added
every day. My Perl script that uses this DB is now very slow: the program
spends 4-5 minutes looking up 40K records.
The script starts every 10 minutes, gets client identifiers from some logs
(such as Apache's), looks up each UUID in the BDB, and adds it if it does
not exist.
I use perl 5.8.8, p5-BerkeleyDB-0.31, FreeBSD 6.2.

Settings:
Algorithm: B-Tree
Key: 67-72 bytes
Value: 1 byte to 1 KB
'bt_ndata' => 10500248,
'bt_int_pgfree' => 826208,
'bt_pagesize' => 16384,
'bt_free' => 0,
'bt_over_pgfree' => 0,
'bt_leaf_pg' => 197947,
'bt_dup_pg' => 0,
'bt_levels' => 3,
'bt_version' => 9,
'bt_dup_pgfree' => 0,
'bt_flags' => 0,
'bt_minkey' => 2,
'bt_re_pad' => 32,
'bt_nkeys' => 10500248,
'bt_magic' => 340322,
'bt_leaf_pgfree' => 1542999750,
'bt_metaflags' => 0,
'bt_maxkey' => 0,
'bt_re_len' => 0,
'bt_int_pg' => 348,
'bt_over_pg' => 0
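
As a rough reading of the statistics above, the sketch below works out the on-disk size of the leaf pages, how much of that space is free, and how the 200000000-byte -Cachesize from the posted script compares with the file. It assumes bt_leaf_pgfree is a byte count and bt_leaf_pg a page count, as the Berkeley DB statistics are usually documented; the variable names are made up for illustration.

```perl
use strict;
use warnings;

# Values copied from the statistics in the post.
my %stat = (
    bt_pagesize    => 16384,
    bt_leaf_pg     => 197947,
    bt_int_pg      => 348,
    bt_leaf_pgfree => 1542999750,
    bt_nkeys       => 10500248,
);

# -Cachesize value taken from the posted script.
my $cachesize = 200_000_000;

# Bytes occupied by leaf pages, and by leaf plus internal pages.
my $leaf_bytes  = $stat{bt_leaf_pg} * $stat{bt_pagesize};
my $total_bytes = $leaf_bytes + $stat{bt_int_pg} * $stat{bt_pagesize};

# Fraction of leaf-page space that is free (assumes a byte count).
my $free_ratio = $stat{bt_leaf_pgfree} / $leaf_bytes;

# Fraction of the B-tree pages that could fit in the cache at once.
my $cache_ratio = $cachesize / $total_bytes;

# Average number of keys stored on each leaf page.
my $keys_per_leaf = $stat{bt_nkeys} / $stat{bt_leaf_pg};

printf "leaf pages:    %.2f GB\n",   $leaf_bytes / 1e9;
printf "leaf free:     %.1f%%\n",    100 * $free_ratio;
printf "cache / file:  %.1f%%\n",    100 * $cache_ratio;
printf "keys per leaf: %.1f\n",      $keys_per_leaf;
```

Under those assumptions the leaf pages come to roughly 3.2 GB, a little under half of that is free space, and the cache covers about 6% of the file. That may mean lookups of randomly distributed keys mostly go to disk, but it is only a guess from the numbers, not a diagnosis.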

How can I increase performance?

P.S. I have tried:
- changing the page size to 1K, 6K, 8K, 16K, 32K, 64K
- splitting the one big BDB into 16 by the first character of the key
- using Hash instead of B-Tree
...but none of these had any effect.


#### SCRIPT
#!/usr/bin/perl

use strict;
use warnings;
use 5.8.8;

use BerkeleyDB;

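# Open the database read-only with a cache of about 200 MB.
tie my %bdbh, 'BerkeleyDB::Btree',
    -Filename  => 'uniq.db',
    -Cachesize => 200000000,
    -Flags     => DB_RDONLY
    or die "Cannot open uniq.db: $! $BerkeleyDB::Error\n";
open(FH,'<','UUID.list') or die "$!\n";
while(my $key=<FH>) {
    chomp($key);
    # Look up the key read from the list in the tied hash.
    if(exists($bdbh{$key})) {
        ### Key found
    }
    else {
        ### Key not found
    }
}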
close(FH);
untie %bdbh;

##### UUID.list (40K records)
00000000000000000000000000000000_00000000000000000000000000000000_000000
....
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF_FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF_999999

All keys in UUID.list already exist in uniq.db.


Posted by Mark Clements on November 15, 2007, 4:03 pm
palexvs@gmail.com wrote:
> I have a BDB with about 10 million records, and about 100K more are added
> every day. My Perl script that uses this DB is now very slow: the program
> spends 4-5 minutes looking up 40K records.
> The script starts every 10 minutes, gets client identifiers from some logs
> (such as Apache's), looks up each UUID in the BDB, and adds it if it does
> not exist.

<snip>
I've just had a quick play with this and can do 40000 lookups on a 626MB
BerkeleyDB file containing 10 million records (the keys being UUIDs) in
a matter of seconds.

I suggest you identify the bottlenecks in your code using

Benchmark::Timer
Devel::DProf

Do you have limited RAM? Is the data on a network filesystem? Is the
machine heavily loaded?

Mark
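
A minimal sketch of the kind of timing suggested above, using the core Time::HiRes module rather than Benchmark::Timer or Devel::DProf so that it runs without extra installs. The timed() helper, the phase names, and the in-memory %db hash standing in for the tied BerkeleyDB hash are all hypothetical; in the real script the two phases would be reading UUID.list and the exists() loop over the tied hash.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval);

# Accumulated wall-clock seconds per named phase.
my %elapsed;

# Run a code block, add its wall-clock time to the named phase,
# and pass its result back to the caller.
sub timed {
    my ($phase, $code) = @_;
    my $t0     = [gettimeofday];
    my @result = $code->();
    $elapsed{$phase} += tv_interval($t0);
    return wantarray ? @result : $result[0];
}

# Stand-in for the tied BerkeleyDB hash (made-up data).
my %db = map { sprintf('%032X', $_) => 1 } 1 .. 1000;

# Phase 1: build the key list (reading UUID.list in the real script).
my @keys = timed(read_keys => sub {
    map { sprintf('%032X', $_) } 1 .. 2000;
});

# Phase 2: the lookups themselves; count how many keys are present.
my $found = timed(lookups => sub {
    scalar grep { exists $db{$_} } @keys;
});

printf "%-10s %.6f s\n", $_, $elapsed{$_} for sort keys %elapsed;
print "found $found of ", scalar(@keys), " keys\n";
```

If nearly all the time lands in the lookup phase, the questions about RAM, filesystem, and machine load become the next thing to check; if it lands elsewhere, the database is probably not the bottleneck.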
