FuzzyDupes 5.7 Help
Project Wizard
Select "File->New Project" in the main menu. This launches the Project Wizard.
Database Connection
Data Source
Select one of the following data sources:
You can connect to any database that provides an ODBC driver or OLE DB provider, e.g. Oracle, MySQL and many more.
Table
Then select the table or view/query that contains the data.
Special Fields
Identity Column
Choose a column from your table that contains distinct values (identity column). Preferably, this column is the table's primary key.
Duplicate Fields
Cluster
Select 2-4 columns for cluster creation. These columns should be well populated with data. Only select columns of type character/string. ZIP codes are unsuitable for cluster creation. With address data, select e.g. LastName, Street, City.
Duplicate Search Columns
Select some columns for the duplicate search. Select at least all columns that are marked for cluster creation, plus some more. With address data you typically select
Using fuzzy comparison algorithms, the program will calculate the correspondence in each of the selected fields. The program then uses these results to calculate the average correspondence between two records.
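The calculation described above can be illustrated with a minimal Python sketch. It is not the program's actual algorithm: `difflib.SequenceMatcher` merely stands in for the fuzzy comparison, and the function names and sample records are hypothetical.

```python
from difflib import SequenceMatcher

def field_similarity(a: str, b: str) -> float:
    """Correspondence of two field values as a percentage (0-100).
    SequenceMatcher is a stand-in for the fuzzy comparison algorithm."""
    return 100.0 * SequenceMatcher(None, a, b).ratio()

def record_similarity(rec_a: dict, rec_b: dict, columns: list) -> float:
    """Average the per-field correspondence over the selected columns."""
    scores = [field_similarity(rec_a[c], rec_b[c]) for c in columns]
    return sum(scores) / len(scores)

# Hypothetical sample records that differ in one letter of the last name
a = {"LastName": "MEYER", "Street": "MAIN STREET 12", "City": "ZUG"}
b = {"LastName": "MEIER", "Street": "MAIN STREET 12", "City": "ZUG"}

score = record_similarity(a, b, ["LastName", "Street", "City"])
print(round(score, 1))
```

In this sketch the two identical fields pull the average up, so a small difference in a single field still yields a high overall correspondence.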
Weights
Most of the time, leave all weights on Normal. If you want, you can place more or less emphasis on an individual column.
Select Identical if exact correspondence is required for a particular column. The "Identical" option is especially useful with grouped data sets: select Identical for the group column, and duplicates can then only appear within the defined groups.
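One way to picture the effect of weights is a weighted average in which an Identical column acts as a hard requirement. The sketch below rests on assumptions: the numeric weight values, the `IDENTICAL` marker, the choice to leave Identical columns out of the average, and the use of `difflib.SequenceMatcher` are all illustrative and not taken from the program.

```python
from difflib import SequenceMatcher

IDENTICAL = "identical"  # hypothetical marker: column must match exactly

def field_similarity(a: str, b: str) -> float:
    """Stand-in fuzzy comparison, returns a percentage (0-100)."""
    return 100.0 * SequenceMatcher(None, a, b).ratio()

def weighted_similarity(rec_a: dict, rec_b: dict, weights: dict) -> float:
    """Weighted average of field similarities.
    A mismatch in an IDENTICAL column rules the pair out entirely."""
    total = 0.0
    weight_sum = 0.0
    for column, weight in weights.items():
        if weight == IDENTICAL:
            if rec_a[column] != rec_b[column]:
                return 0.0
            continue  # assumption: exact-match columns are not averaged
        total += weight * field_similarity(rec_a[column], rec_b[column])
        weight_sum += weight
    return total / weight_sum if weight_sum else 100.0

# Hypothetical weights: Region is the group column, LastName counts double
weights = {"Region": IDENTICAL, "LastName": 2.0, "City": 1.0}
a = {"Region": "NORTH", "LastName": "MEYER", "City": "ZUG"}
b = {"Region": "NORTH", "LastName": "MEIER", "City": "ZUG"}
c = {"Region": "SOUTH", "LastName": "MEIER", "City": "ZUG"}

same_group = weighted_similarity(a, b, weights)
other_group = weighted_similarity(a, c, weights)
print(round(same_group, 1), other_group)
```

Records `a` and `c` differ only in the group column, yet the pair is excluded, which mirrors the statement that duplicates can only appear within the defined groups.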
NULL Comparison
Select NULL Comparison for columns that contain values in (nearly) all rows (e.g. Last Name, Street, ZIP, City). Do not select NULL Comparison for columns that may contain NULLs in many rows (e.g. First Name, Phone, Fax, ...).
With NULL Comparison, empty values (NULL values) are included in the calculation of the average correspondence between two records.
Click "Next" to continue.
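The effect of the NULL Comparison setting on the average can be sketched as follows. This is an illustration under assumptions, not the program's implementation: it assumes that a column without NULL Comparison is simply left out of the average when either value is empty, and that an empty value compared with a filled value scores 0. All names are hypothetical.

```python
from difflib import SequenceMatcher

def field_similarity(a, b) -> float:
    """Stand-in fuzzy comparison; None is treated as an empty string."""
    return 100.0 * SequenceMatcher(None, a or "", b or "").ratio()

def average_similarity(rec_a: dict, rec_b: dict, null_comparison: dict) -> float:
    """Average field similarity; null_comparison maps column -> bool."""
    scores = []
    for column, compare_nulls in null_comparison.items():
        a, b = rec_a.get(column), rec_b.get(column)
        if (not a or not b) and not compare_nulls:
            continue  # assumption: empty value is left out of the average
        scores.append(field_similarity(a, b))
    return sum(scores) / len(scores) if scores else 0.0

a = {"LastName": "MEYER", "City": "ZUG", "Fax": None}
b = {"LastName": "MEYER", "City": "ZUG", "Fax": "0417001234"}

# Fax without NULL Comparison: the empty value does not lower the average
skipped = average_similarity(a, b, {"LastName": True, "City": True, "Fax": False})
# Fax with NULL Comparison: the empty value counts as a mismatch
counted = average_similarity(a, b, {"LastName": True, "City": True, "Fax": True})
print(skipped, round(counted, 1))
```

Under these assumptions, enabling NULL Comparison on a sparsely filled column such as Fax would push many genuine duplicates below the threshold, which is consistent with the recommendation above.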
Normalization
Standard
Standard normalization converts all characters to uppercase and replaces special characters, umlauts etc. Standard normalization should usually be checked for all columns.
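As a rough illustration of this kind of normalization, the Python sketch below uppercases a value, transliterates German umlauts, and replaces other special characters with spaces. The exact replacement table used by the program is not documented here, so the mapping and the function name are assumptions.

```python
import re

# Assumed transliteration table; the program's real rules may differ
UMLAUTS = {"Ä": "AE", "Ö": "OE", "Ü": "UE"}

def standard_normalize(value: str) -> str:
    """Uppercase, transliterate umlauts, replace special characters."""
    text = value.upper()  # Python's upper() also turns "ß" into "SS"
    for umlaut, replacement in UMLAUTS.items():
        text = text.replace(umlaut, replacement)
    # Replace every run of characters other than A-Z and 0-9 with one space
    text = re.sub(r"[^A-Z0-9]+", " ", text)
    return text.strip()

print(standard_normalize("Müller-Straße 5"))
```

After normalization, spelling variants such as "Müller" and "Mueller" compare as equal, which is why it is usually worth enabling for every column.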
Normalization 1..3
Select up to 3 different normalization rules for each column. Use "default" for address data when you are not sure and have no user-defined normalization rules. Use the Normalization Rules Editor to customize rules or add new rules.
Options
Threshold Cluster
This slider influences the cluster size. In most cases, leave it at the normal position. Setting it to the rightmost position is not a good idea, because this only slows down the search without giving better results. With large databases, you may want to move it slightly toward "faster" to speed up the search.
Threshold Duplicates
This setting has the greatest effect on the search results. The default is 90. You can repeat the duplicate search later with a different threshold value. Increase the value if too many duplicates were found; lower it if not enough duplicates were found.
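How the threshold trades off precision against recall can be seen in a small Python sketch. It assumes that a pair is reported when its correspondence is at or above the threshold, and it uses `difflib.SequenceMatcher` as a stand-in for the program's comparison; the sample names are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations

def similarity(a: str, b: str) -> float:
    """Stand-in fuzzy comparison, returns a percentage (0-100)."""
    return 100.0 * SequenceMatcher(None, a, b).ratio()

def find_duplicates(values: list, threshold: float = 90.0) -> list:
    """Return all pairs whose similarity reaches the threshold."""
    return [
        (a, b)
        for a, b in combinations(values, 2)
        if similarity(a, b) >= threshold
    ]

names = ["MEYER", "MEIER", "MEYERS", "SCHMIDT"]

strict = find_duplicates(names, threshold=90.0)  # fewer, safer matches
loose = find_duplicates(names, threshold=75.0)   # more matches, more noise
print(strict)
print(loose)
```

Lowering the threshold in this sketch adds the MEYER/MEIER pair to the result, which shows why repeating the search with different values is a practical way to tune the outcome.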
Read more -> Duplicate Search
Copyright (c) by Kroll-Software, Zug/CH 2002-2010, All Rights Reserved