AUTOMATE DEDUPLICATION IN GOOGLE SHEETS

Tags: data cleaning google sheets google sheets macros google sheets remove duplicates flookup

WHY AUTOMATE DEDUPLICATION?

Duplicates silently reduce trust in your data and cost teams time. Manual cleanup is error-prone and does not scale. Automation makes deduplication repeatable, auditable and schedulable.

SAMPLE DATASET AND GOALS

We will use a contacts sheet with name, email and phone columns. The goal: find likely duplicate rows, mark them with a confidence score, and optionally merge or flag them for review.

APPS SCRIPT WORKFLOW (COPY-PASTE)

Drop this script into Extensions → Apps Script in your sheet. The example below reads rows from a Contacts sheet, computes match signals (name normalisation, email similarity and optional phonetic keys), and writes back a match confidence score and suggested merge action in adjacent columns.

Key ideas:


/**
* Dedupe runner: read rows, compute signals, call Flookup API for scored matches.
* Adjust column indices to match your sheet. This writes results to columns F..H.
*/
function runDedupe() {
const ss = SpreadsheetApp.getActive();
const sh = ss.getSheetByName('Contacts');
const lastRow = sh.getLastRow();
if (lastRow < 2) return; // nothing to process
const rows = sh.getRange(2,1,Math.max(0,lastRow-1),4).getValues();
const out = [];
for (let i=0;i<rows.length;i++){
const [name,email,phone,notes] = rows[i];
const key = (name||"").toString().toLowerCase().replace(/[^a-z0-9]/g, "");
// naive local score placeholder
let localScore = (email && email.toString().trim()) ? 0.6 : 0.2;
// call Flookup API for a better score (example - replace URL and key)
try{
const payload = { record: { name: name||"", email: email||"", phone: phone||"" }, top: 3 };
const res = UrlFetchApp.fetch('https://api.getflookup.com/v1/match', {
method: "POST",
contentType: "application/json",
headers: { "Authorization": "Bearer YOUR_API_KEY" },
payload: JSON.stringify(payload),
muteHttpExceptions: true
});
const json = JSON.parse(res.getContentText()||"{}");
const best = (json.matches && json.matches[0]) ? json.matches[0].score : 0;
const finalScore = Math.max(localScore, best || 0);
out.push([finalScore, best || "", JSON.stringify(json.matches || [])]);
} catch(e) {
out.push([localScore, "", "error"]);
}
}
// write confidence (col F), apiScore (col G), rawMatches (col H)
if (out.length > 0) {
sh.getRange(2,6,out.length,3).setValues(out);
}
}

INTEGRATING WITH FLOOKUP

Flookup provides higher-quality similarity scores and contextual matching than simple key comparisons. Use the add-on for quick, UI-driven workflows or the Flookup API for scheduled, programmatic deduplication.

When calling the API from Apps Script: keep requests small (batch 10-50 records), cache responses where possible, and store the raw match payload in a review sheet so humans can inspect suggested merges before applying them.


// Example: minimal UrlFetchApp call (already shown in runDedupe above)
// Remember to protect your API key and store it in PropertiesService rather than inlined.

SCHEDULING, NOTIFICATIONS AND APPROVALS

Add a time-based trigger to run the dedupe regularly. Best practice: run a conservative scoring pass daily that writes suggestions to a Review sheet, then run a weekly manual approval that applies merges. You can also email a short summary when the number of high-confidence matches exceeds a threshold.

function scheduleDedupe(){
ScriptApp.newTrigger('runDedupe').timeBased().everyDays(1).atHour(2).create();
}

For notifications, generate a CSV summary and email it via MailApp.sendEmail() or integrate with Slack using an incoming webhook.

TESTING, ROLLBACK AND METRICS

Always run the script on a copy first. Store suggested edits in a separate sheet with a proposed_action column. Keep a change log (timestamp, user, rows affected) and provide a one-click rollback that replays the log to revert changes.

Track metrics: matches found, merges applied, false positive rate (from manual review). These numbers inform threshold tuning.

FINAL THOUGHTS

Automating deduplication saves time and reduces risk, but start slow. Use suggested matches in a review workflow, monitor false positives, and tune thresholds. When ready, automate the merge step for high-confidence matches only.

YOU MIGHT ALSO LIKE