Posted by palexvs@gmail.com on November 14, 2007, 7:25 am
I have a Berkeley DB (BDB) with about 10 million records, and ~100K records
are added every day. Now my Perl script that uses this DB is very slow:
it takes 4-5 minutes to find 40K records.
The script runs every 10 minutes, gets client identifiers from some logs
(such as Apache's), looks up each UUID in the BDB and adds it if it does not exist.
I use perl 5.8.8, p5-BerkeleyDB-0.31 and FreeBSD 6.2.
Settings:
Algorithm: B-Tree
Key: 67-72 bytes
Value: 1 byte to 1 KB
'bt_ndata' => 10500248,
'bt_int_pgfree' => 826208,
'bt_pagesize' => 16384,
'bt_free' => 0,
'bt_over_pgfree' => 0,
'bt_leaf_pg' => 197947,
'bt_dup_pg' => 0,
'bt_levels' => 3,
'bt_version' => 9,
'bt_dup_pgfree' => 0,
'bt_flags' => 0,
'bt_minkey' => 2,
'bt_re_pad' => 32,
'bt_nkeys' => 10500248,
'bt_magic' => 340322,
'bt_leaf_pgfree' => 1542999750,
'bt_metaflags' => 0,
'bt_maxkey' => 0,
'bt_re_len' => 0,
'bt_int_pg' => 348,
'bt_over_pg' => 0
How can I improve performance?
P.S. I've tried:
- changing the page size to 1K, 6K, 8K, 16K, 32K and 64K
- splitting the one big BDB into 16 databases by the first character of the key
- using Hash instead of B-Tree
...but none of it had any effect.
#### SCRIPT
#!/usr/bin/perl
use strict;
use warnings;
use 5.8.8;
use BerkeleyDB;

# Open the existing B-tree read-only with a ~200 MB cache.
tie my %bdbh, 'BerkeleyDB::Btree',
    -Filename  => 'uniq.db',
    -Cachesize => 200_000_000,
    -Flags     => DB_RDONLY
  or die "Cannot open uniq.db: $BerkeleyDB::Error\n";

open(my $fh, '<', 'UUID.list') or die "Cannot open UUID.list: $!\n";
while (my $key = <$fh>) {
    chomp($key);
    if (exists $bdbh{$key}) {    # look up this key in the tied hash
        ### Found key
    }
    else {
        ### Key not found
    }
}
close($fh);
untie %bdbh;
##### UUID.list (40K records)
00000000000000000000000000000000_00000000000000000000000000000000_000000
....
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF_FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF_999999
All keys in UUID.list already exist in uniq.db.
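A technique often suggested for this access pattern, though neither poster mentions it, is to de-duplicate and sort the keys before looking them up, so that consecutive lookups in the B-tree tend to land on the same or neighbouring leaf pages that are already in the cache. A minimal sketch, assuming the database is opened writable (unlike the `DB_RDONLY` script above) so that missing keys can be added, and assuming the default Btree key order (no custom `-Compare`), which for byte strings matches Perl's default string `sort`. The helper `lookup_or_add` and the placeholder value `1` are hypothetical; the helper takes any hash reference, so it works with the tied `%bdbh` or a plain hash:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: look up every key in $db (a tied BerkeleyDB::Btree
# hash or a plain hash) and add the ones that are missing.
# Returns (number found, number added).
sub lookup_or_add {
    my ($db, $keys, $value) = @_;
    $value = 1 unless defined $value;    # hypothetical placeholder value

    # Drop duplicate keys, keeping the first occurrence.
    my %seen;
    my @unique = grep { !$seen{$_}++ } @$keys;

    # Plain string order; for byte strings this matches the default
    # Btree key order, so neighbouring lookups hit nearby leaf pages.
    my @sorted = sort @unique;

    my ($found, $added) = (0, 0);
    for my $key (@sorted) {
        if (exists $db->{$key}) {
            $found++;
        }
        else {
            $db->{$key} = $value;    # requires a writable database
            $added++;
        }
    }
    return ($found, $added);
}
```

With BerkeleyDB the hash would be tied with `-Flags => DB_CREATE` instead of `DB_RDONLY` and passed as `\%bdbh` together with a reference to the array of keys read from the log. Whether sorting helps noticeably depends on how much of the tree fits in the cache, so it is worth timing both orders.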
Posted by Mark Clements on November 15, 2007, 4:03 pm
palexvs@gmail.com wrote:
> I have BDB with about 10 million records and every day add ~100K.
> Now my perl-script that use it DB work very slow: program spend 4-5
> minutes to find 40K records.
> Script start every 10 minute, get client identification from some logs
> (such as apache), find UUID in BDB and add if it not exists.
<snip>
I've just had a quick play with this and can do 40000 lookups on a 626MB
BerkeleyDB file containing 10 million records (the keys being UUIDs) in
a matter of seconds.
I suggest you identify the bottlenecks in your code using:
- Benchmark::Timer
- Devel::DProf
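Mark's first suggestion can be tried with very little code. A minimal sketch that times the lookup phase on its own, using the core Time::HiRes module in place of Benchmark::Timer (a swap, not what Mark named). `time_lookups` is a hypothetical helper that takes the tied `%bdbh` or any hash reference, plus a reference to the array of keys:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval);

# Hypothetical helper: time each lookup in $db and return a hash ref with
# the number of hits, the total seconds spent in lookups and the slowest
# single lookup.
sub time_lookups {
    my ($db, $keys) = @_;
    my %r = (hits => 0, total => 0, slowest => 0);
    for my $key (@$keys) {
        my $t0 = [gettimeofday];            # start time for this lookup
        $r{hits}++ if exists $db->{$key};
        my $dt = tv_interval($t0);          # seconds elapsed since $t0
        $r{total} += $dt;
        $r{slowest} = $dt if $dt > $r{slowest};
    }
    return \%r;
}
```

If `total` accounts for nearly all of the script's run time and individual lookups take several milliseconds each, the time is most likely going to disk reads rather than to Perl. For a whole-program view, Devel::DProf is run as `perl -d:DProf script.pl` followed by `dprofpp` on the resulting `tmon.out`.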
Do you have limited RAM? Is the data on a network filesystem? Is the
machine heavily loaded?
Mark
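Part of Mark's RAM question can be estimated from the statistics listed in the first post. A rough back-of-the-envelope sketch, assuming (as the Berkeley DB statistics documentation describes these fields) that `bt_leaf_pg` and `bt_int_pg` count leaf and internal pages and `bt_leaf_pgfree` is the number of unused bytes across all leaf pages, and that the 200,000,000-byte `-Cachesize` in the script is the only cache in play:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Figures copied from the statistics in the first post.
my %stats = (
    bt_pagesize    => 16384,
    bt_leaf_pg     => 197947,
    bt_int_pg      => 348,
    bt_leaf_pgfree => 1542999750,
);
my $cachesize = 200_000_000;    # -Cachesize from the original script (bytes)

# Bytes occupied by leaf and by internal pages.
my $leaf_bytes = $stats{bt_leaf_pg} * $stats{bt_pagesize};
my $int_bytes  = $stats{bt_int_pg}  * $stats{bt_pagesize};

# Fraction of leaf-page space actually holding data.
my $leaf_fill = 1 - $stats{bt_leaf_pgfree} / $leaf_bytes;

# Fraction of the leaf pages that can be resident in the cache at once.
my $cache_cover = $cachesize / $leaf_bytes;

printf "leaf pages:     %.2f GB\n", $leaf_bytes / 1e9;
printf "internal pages: %.2f MB\n", $int_bytes / 1e6;
printf "leaf fill:      %.1f%%\n", 100 * $leaf_fill;
printf "cache covers:   %.1f%% of leaf pages\n", 100 * $cache_cover;
```

On those assumptions the leaf pages come to roughly 3.2 GB and are only about half full, while the cache can hold only about 6% of them; the internal pages (under 6 MB) fit easily. Lookups in random key order would then mostly miss the cache, and at several milliseconds per random disk read, 40K misses alone would be on the order of minutes (40,000 × 6 ms = 240 s). This is an estimate from the posted figures, not a measurement.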