The problem

A client of ours wanted to clean a large product dataset (~100,000 items) that contained duplicates and near-duplicates. Checking each product for duplicates by hand was projected to take more than four months of effort.

The managed functions that solved the problem

EQ8R provided the client with two managed functions.

The first is a service that takes the client's original dataset and returns it grouped into sets of similar products. For this dataset, it returned 40,000 groups of similar products, each with a recommended canonical product.

The client's analyst could then review each canonical product. If it was acceptable, no action was needed. If it was not, the analyst could simply mark the product as "New".

The second managed function then took the entire dataset and regrouped it, using the products marked "New" as canonical products.

The end result was a list of 41,000 canonical products that the client could load into their database.
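The two-pass workflow described above can be sketched in a few lines of Python. This is only an illustration, not the EQ8R implementation: the function names `group_products` and `regroup`, the 0.8 threshold, the sample product names, and the use of name-only matching via the standard library's `difflib` are all assumptions made for the example. It also assumes that the second pass keeps the first-pass canonical products alongside those marked "New".

```python
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio between two product names, in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def group_products(names, threshold=0.8):
    """Pass 1 (hypothetical): greedy grouping of similar names.

    The first product placed in a group is treated as its canonical product.
    """
    groups = {}  # canonical name -> list of member names
    for name in names:
        best = max(groups, key=lambda c: name_similarity(name, c), default=None)
        if best is not None and name_similarity(name, best) >= threshold:
            groups[best].append(name)
        else:
            groups[name] = [name]
    return groups


def regroup(names, canonicals):
    """Pass 2 (hypothetical): assign every product to its closest canonical."""
    groups = {c: [] for c in canonicals}
    for name in names:
        best = max(canonicals, key=lambda c: name_similarity(name, c))
        groups[best].append(name)
    return groups


# Made-up sample data, for illustration only.
names = [
    "Acme Cola 330ml",
    "Acme Cola 330 ml",
    "Acme Cola Zero 330ml",
    "Garden Hose 15m",
]

first_pass = group_products(names)

# The analyst decides "Acme Cola Zero 330ml" is a distinct product and
# marks it as "New"; it becomes an extra canonical for the second pass.
marked_new = ["Acme Cola Zero 330ml"]
second_pass = regroup(names, list(first_pass) + marked_new)

for canonical, members in second_pass.items():
    print(canonical, "<-", members)
```

In this toy run the first pass folds the "Zero" variant into the regular cola group, and the second pass separates it once it has been marked as "New", which mirrors the review step in the workflow.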

Why this problem was well-suited to a managed function solution

The Equator managed function's matching capabilities are second to none. To identify similar products, it considered factors such as the name and supplier of the product, as well as price, unit size and carton characteristics.
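As a rough illustration of how several factors can be combined into one match score, here is a minimal Python sketch. It is not the Equator matching logic: the `Product` fields, the weights, the sample products, and the choice of `difflib` for text comparison are assumptions made for the example, and carton characteristics are left out for brevity.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class Product:
    name: str
    supplier: str
    price: float  # assumed non-negative
    unit_size: str


def text_similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1], ignoring case and surrounding whitespace."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def price_similarity(a: float, b: float) -> float:
    """1.0 for identical prices, falling toward 0.0 as they diverge."""
    highest = max(a, b)
    if highest <= 0:
        return 1.0
    return 1.0 - abs(a - b) / highest


# Hypothetical weights, chosen only for illustration; they sum to 1.
WEIGHTS = {"name": 0.5, "supplier": 0.2, "price": 0.15, "unit_size": 0.15}


def similarity(p: Product, q: Product) -> float:
    """Weighted average of the per-field similarities, in [0, 1]."""
    return (
        WEIGHTS["name"] * text_similarity(p.name, q.name)
        + WEIGHTS["supplier"] * text_similarity(p.supplier, q.supplier)
        + WEIGHTS["price"] * price_similarity(p.price, q.price)
        + WEIGHTS["unit_size"] * text_similarity(p.unit_size, q.unit_size)
    )


# Made-up sample products.
cola = Product("Acme Cola 330ml Can", "Acme Beverages", 0.99, "330ml")
cola_variant = Product("ACME Cola 330 ml can", "Acme Beverages Ltd", 1.00, "330 ml")
hose = Product("Garden Hose 15m", "GreenThumb", 24.50, "15m")

print(round(similarity(cola, cola_variant), 3))
print(round(similarity(cola, hose), 3))
```

A near-duplicate pair scores much closer to 1 than an unrelated pair, so a threshold on this score could decide which products land in the same group. In practice the weights and threshold would need tuning against reviewed examples.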