Click here to get back home

finding duplicate records

 HomeNewsGroups | Search | About
 mailing.database.mysql    Post an article   get this group's latest topics as an RSS feed add this group's latest topics to your My MSN content add this group's latest topics to your My Yahoo content
Subject Author Date
finding duplicate records rgurganus 03-18-2007
Posted by rgurganus on March 18, 2007, 11:49 pm
Please log in for more thread options
I'm trying to setup a way to find possible duplicate or near-duplicate
records of contact people. I have a table of 50k records doing an
inner join with itself, something like this:

SELECT p1.fields, p2.fields
FROM table AS p1 INNER JOIN table AS p2 ON p1.lastname = p2.lastname
WHERE p1.id <> p2.id AND
(other criteria)

The id and name and other criteria fields are indexed as well as they
can be. It works fine as long as I add a LIMIT statement to keep it
down to just 10 or 20 records, but if I take this off, it runs on
forever -- so I can't even tell how many potentail records there will
be. If the table grows to 100k or 200k, it'll only get worse.

So my question is -- are there any better ways to go about this?


Posted by strawberry on March 19, 2007, 4:29 am
Please log in for more thread options
On Mar 19, 3:49 am, rgurga...@citidc.com wrote:
> I'm trying to setup a way to find possible duplicate or near-duplicate
> records of contact people. I have a table of 50k records doing an
> inner join with itself, something like this:
>
> SELECT p1.fields, p2.fields
> FROM table AS p1 INNER JOIN table AS p2 ON p1.lastname = p2.lastname
> WHERE p1.id <> p2.id AND
> (other criteria)
>
> The id and name and other criteria fields are indexed as well as they
> can be. It works fine as long as I add a LIMIT statement to keep it
> down to just 10 or 20 records, but if I take this off, it runs on
> forever -- so I can't even tell how many potentail records there will
> be. If the table grows to 100k or 200k, it'll only get worse.
>
> So my question is -- are there any better ways to go about this?


I think '<' would be better than '<>'.


Similar ThreadsPosted
finding duplicate fields June 9, 2006, 9:39 am
Best way for finding duplicate entries in table? June 1, 2005, 4:25 am
How do I get rid of duplicate records? March 2, 2006, 2:47 pm
Finding Duplicates Help July 10, 2006, 9:21 pm
finding if a table exists November 21, 2005, 8:43 am
Is 'finding' a MySQL keyword or such? June 15, 2006, 3:17 am
All Records From Table A - All Records From Table B - Join Alike Records October 3, 2006, 11:24 am
Help: rejecting duplicate rows September 6, 2005, 2:19 pm
Help: rejecting duplicate rows September 6, 2005, 2:31 pm
Stopping duplicate fields March 2, 2006, 3:03 pm

Our other projects:

Art Dolls, Fairies and Mermaids - Sunnyfaces.net

Roy's Linux, Programming and Search Engines messages

1-Script XML SitemapXML Sitemap