Vyger Irish Genealogy

Our search to further our Eggleton, Surgeoner, Smiley & Gracey
ancestry in Ireland and around the world building new
family relationships in sharing what we have learnt.

 

 

 

Duplicate search merge problem

Files included: (all zipped for faster download) 
1033812.ged
Merge test 103.rmgc  (duplicates color coded before merge)
Merge test 104.rmgc  (results of duplicate search merge)

Written by Helen (Rootsmagic user)

I noticed in my database some individuals of the same name and birthday but with different record numbers were not offered in a duplicate search merge. This is especially a problem when you download your own or someone else’s tree from Ancestry. Their process results in many duplicates in their trees.

To check this I used a gedcom which I downloaded from Ancestry. I knew it was full of duplicates. The gedcom is “1033812.ged”

I put this gedcom into a RM4 database called “Merge test 101.rmgc”  It had :-
1555 people
391 families
7318 citations.

I did a full automatic merge. That cleaned out a few duplicates. Properties now had :-
1203 people
354 families
6328 citations.
I color coded the remaining duplicates. That file is called “Merge test 103.rmgc”

I did duplicate search merge, and merged a lot more of the duplicates. Properties show :-
1168 people
347 families
5917 citations.
Called that database “Merge test 104.rmgc” 

But I found that a number of duplicates with the same birth year still existed, about 140 of them. Those duplicates were never offered in the duplicate search merge, no matter how often I repeated the search.

When I tested the results by putting gedcom “1033812.ged” into RM3, there were no duplicates left after all the merges were done. And the properties were :-
984 people
286 families
854 citations.

It is possible to merge people manually one pair at a time but it is very time consuming and I have lots of info on ancestry that would be nice to avoid typing in.  But I can’t use it if the merges don’t work well. 


I ran a comparison test on Helen's file, using just the RM automerge functions in the three RM versions I currently have to hand. All should work exactly the same, but they don't. The main observations from this are :-

1. RM4093 is not as effective as RM326 on Automerges
2. RM4096 is not as effective on Automerges as RM4093.

This leaves more hands-on work for the user, which is certainly not progress. In the table below the best scores are highlighted green and the worst red; as you can see, the current version (4.0.9.6) is the least effective at automatically merging duplicate individuals.

               Origin file   RM326   RM4093   RM4096
People                1555    1036     1111     1203
Families               391     302      325      354
Events                2784    1822     2013     2198
Places                 647     647      647      647
Sources                  1       1        1        1
Citations             1424     905      980     1072
Repositories             1       1        1        1

One idea has been put forward: allow some user input into what qualifies as a match, so that the user can effectively set where the bar should be. Although very useful, this is also dangerous, and I would suggest that the user be forced to back up the file before any such routine is run and also be encouraged, through a popup, to examine the resulting file afterwards.

I know from experience when using Duplicate Search Merge that RM-generated scores of 45 and above are invariably true matches. I would like to be able to scan down through the presented possible matches, decide my safe-point score, set this break point and run the merge again based on that input, thereby removing a lot of manual work.
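To make the break-point idea concrete, here is a minimal sketch in Python. It is purely illustrative: RootsMagic does not expose anything like this as far as I know, and the `Candidate` structure, the `split_by_threshold` function, the record numbers and the sample scores are all hypothetical. It only shows how a user-chosen score could divide the offered pairs into those merged automatically and those left for manual review.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One possible duplicate pair with its match score (hypothetical structure)."""
    person_a: int  # record number of the first individual
    person_b: int  # record number of the second individual
    score: int     # match score as shown in the duplicate list


def split_by_threshold(candidates, threshold):
    """Split candidate pairs at the user's safe-point score.

    Pairs scoring at or above the threshold go to the automatic merge list;
    everything below it is left for the user to review by hand.
    """
    auto_merge = [c for c in candidates if c.score >= threshold]
    manual_review = [c for c in candidates if c.score < threshold]
    return auto_merge, manual_review


# Made-up sample pairs, not taken from Helen's file.
candidates = [
    Candidate(101, 202, 62),
    Candidate(103, 310, 45),
    Candidate(104, 415, 38),
]

auto_merge, manual_review = split_by_threshold(candidates, threshold=45)
print(len(auto_merge), "pairs to merge automatically")
print(len(manual_review), "pairs left for manual review")
```

With a break point of 45, the first two sample pairs would be merged and the third would still be offered for a manual decision. A backup before the merge step would sit naturally just ahead of the call that acts on `auto_merge`.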

It would seem from the results above that the speeding up of these operations, which was introduced in version 4.0.9.5, has more to do with some logic being removed than with slicker programming. This has resulted in 8.3% more individuals being left after automerge in 4.0.9.6 than in 4.0.9.3, and 16% more remaining unmerged when compared to RM3.
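For anyone wanting to check the arithmetic, the two percentages appear to come from the People counts left after automerge in each version (1036 for RM326, 1111 for RM4093, 1203 for RM4096). A short Python sketch, with the function name being my own choice:

```python
def percent_more(remaining, baseline):
    """Percentage by which `remaining` exceeds `baseline`."""
    return (remaining - baseline) / baseline * 100


# People left after automerge, taken from the comparison table.
rm326, rm4093, rm4096 = 1036, 1111, 1203

vs_4093 = percent_more(rm4096, rm4093)  # 4.0.9.6 compared with 4.0.9.3
vs_rm3 = percent_more(rm4096, rm326)    # 4.0.9.6 compared with RM3

print(f"4.0.9.6 vs 4.0.9.3: {vs_4093:.1f}% more people remaining")
print(f"4.0.9.6 vs RM3:     {vs_rm3:.1f}% more people remaining")
```

These work out to roughly 8.3% and 16.1%, matching the figures quoted above.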

As files get larger, programmed help in finding possible duplicates and merging them needs to get better rather than less effective, and I do hope the developers can take note.