FuzzyDupes Frequently Asked Questions

During the duplicate search I get the error “Not enough memory” (Out Of Memory)

FuzzyDupes requires quite a lot of memory for the duplicate search. The maximum size of the data that can be searched is limited by the free memory available. On a 32-bit system, at most approximately 2.5 GB of memory can be addressed, even if more physical RAM is installed. FuzzyDupes 5 was successfully tested with up to 1.5 million data records. This value can fluctuate, as it depends on the free memory, the number of columns and the redundancies in the data. If you run into the error, first close other applications that require a lot of memory. Next, create a new query in your database that returns only the relevant columns. If this is still not enough, you can pre-select your data and break it into smaller chunks according to a specific criterion (e.g. the first digit of the ZIP code). In addition to the 32-bit version, a more powerful 64-bit Parallel Edition is now available.
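The two data-reduction steps above (selecting only the relevant columns, then splitting by the first digit of the ZIP code) can be sketched outside of FuzzyDupes as follows. This is only an illustration under assumptions: the table `customers`, its columns and the helper `chunk_by_zip_digit` are hypothetical, and an in-memory SQLite database stands in for your real data source.

```python
import sqlite3
from collections import defaultdict


def chunk_by_zip_digit(rows, zip_index):
    """Group rows by the first character of the ZIP code column.

    Rows with a missing or empty ZIP code end up in the chunk with key "".
    """
    chunks = defaultdict(list)
    for row in rows:
        zip_code = (row[zip_index] or "").strip()
        key = zip_code[0] if zip_code else ""
        chunks[key].append(row)
    return dict(chunks)


# Hypothetical sample data; replace with a connection to your own database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER, name TEXT, street TEXT, zip TEXT, notes TEXT)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Anna Schmidt", "Hauptstr. 1", "10115", "long free text"),
        (2, "Ana Schmitt", "Hauptstrasse 1", "10117", "more free text"),
        (3, "Bob Miller", "Main St 5", "80331", None),
        (4, "No Zip", "Unknown", None, None),
    ],
)

# Step 1: return only the columns that matter for the duplicate search.
rows = conn.execute("SELECT id, name, street, zip FROM customers").fetchall()

# Step 2: split the result into smaller chunks by the first ZIP digit.
chunks = chunk_by_zip_digit(rows, zip_index=3)
for digit in sorted(chunks):
    print(digit or "(no ZIP)", len(chunks[digit]))
```

Keep in mind that duplicates whose ZIP codes start with different digits (for example because one ZIP code was mistyped) land in different chunks and will not be compared with each other.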

What is the best practice for calculating a negative match against a second list (blacklist, Robinson list)?

FuzzyDupes offers several approaches. One of them is as follows. Create a new project for your database. Perform a search for duplicates in order to find favorable parameters and thresholds for your data. Then select the menu Duplicate search->Fuzzy Import with the output option Duplicates only. The result is shown in a grouped view, which allows you to review the found matches manually. Remove false positives with the context menu (right mouse button). In the next step, the result of the Duplicates only option can be used for a positive or negative match against the first data table.

I am getting a timeout from a long-running SQL query against a view. Is there a setting I am missing?

FuzzyDupes was designed to work with arbitrary data sources. Unfortunately, each has its own options. For MS SQL Server you could also try the last option when selecting a data source: “Other (OLEdb / ODBC)”. This dialog allows you to connect to many different data sources. Please try “Microsoft SQL Server” with the “.NET Framework” provider (SqlClient). There is a button for extended options, where you can find everything from pooling, connection timeout, load balancing and implicit transactions up to MARS (Multiple Active Result Sets). Only a command timeout option seems to be missing. You may be able to set a corresponding timeout on the server itself (in SQL Server Management Studio).
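For orientation, several of the extended options mentioned above correspond to keywords in a SqlClient connection string. The sketch below only assembles such a string to show what the dialog settings look like in text form. The helper `build_connection_string` and the server and database names are hypothetical; `Connect Timeout`, `Pooling` and `MultipleActiveResultSets` are standard SqlClient keywords. Whether FuzzyDupes lets you edit the string directly is not confirmed here.

```python
def build_connection_string(options):
    """Join key/value pairs into a 'Key=Value;Key=Value' connection string."""
    return ";".join(f"{key}={value}" for key, value in options.items())


conn_str = build_connection_string(
    {
        "Data Source": "myserver",        # hypothetical server name
        "Initial Catalog": "mydatabase",  # hypothetical database name
        "Integrated Security": True,
        "Connect Timeout": 60,            # seconds to wait while opening the connection
        "Pooling": True,
        "MultipleActiveResultSets": True,  # MARS
    }
)
print(conn_str)
```

Note that `Connect Timeout` only limits how long the client waits while opening the connection. It does not limit how long a query may run, which is why raising it alone is unlikely to cure a timeout on a long-running query.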

My question is not answered here.

We're here to help. Contact us by email or telephone, or simply use the support form.