Hybrid Fuzzy Matching With Embeddings in Google Sheets and Excel

Key Takeaways


Why Hybrid Matching?

Quick Checklist

| Step | Action | Why It Matters |
| --- | --- | --- |
| 1 | Normalise text fields in the source sheet | Consistent casing and formatting improve both lexical and semantic match quality |
| 2 | Run a lexical fuzzy pass for initial filtering | Quickly discard clear non-matches before invoking expensive embedding calls |
| 3 | Generate embeddings for shortlisted candidates | Capture semantic meaning where surface-level character overlap is low |
| 4 | Score semantic similarity against a confidence threshold | Translate embedding distance into an actionable match or reject decision |
| 5 | Merge hybrid scores into the final match verdict | Combine lexical and semantic signals for higher overall accuracy |
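Step 1 of the checklist can be sketched as a small helper. The name `normaliseText` and the specific rules (strip accents, lowercase, turn punctuation into spaces, collapse whitespace) are illustrative assumptions rather than the behaviour of any particular tool, and the sketch assumes Latin-script text.

```javascript
// Hypothetical helper: normalise a cell value before lexical or semantic matching.
// Assumes Latin-script input; other scripts would need different rules.
function normaliseText(value) {
  return String(value)
    .normalize('NFKD')                // split accented characters into base + combining mark
    .replace(/[\u0300-\u036f]/g, '')  // drop the combining marks
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, ' ')     // replace punctuation and symbols with spaces
    .replace(/\s+/g, ' ')             // collapse runs of whitespace
    .trim();
}
```

Applying the same normalisation to both the query column and the candidate column keeps the later lexical and semantic scores comparable.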

Classical fuzzy matching remains efficient for typographical errors and close variants; embeddings capture semantic similarity across phrasing, abbreviations and synonyms. Combining both reduces calls to embedding services while preserving precision. The hybrid approach is particularly suitable for spreadsheet audiences who require low-friction, low-cost integration.

A practical scenario: you are comparing supplier names from a procurement database. A simple edit-distance check catches "Walmart Inc" versus "WalMart Inc". But "Walmart Inc" and "Wal Mart Stores" share little character overlap. An embedding approach maps both to the same semantic region, returning a high match score. By running the lexical filter first, you only pay for embedding API calls on the small subset of candidates that survive the initial pass.
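The lexical pass in this scenario can be sketched with a plain Levenshtein distance. The function names and the cut-off passed as `minScore` are assumptions for illustration; a spreadsheet add-on would normally supply this step.

```javascript
// Levenshtein edit distance, computed with two rolling rows.
function levenshtein(a, b) {
  let prev = Array.from({length: b.length + 1}, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Similarity in [0, 1]; 1 means the strings are identical.
function lexicalSimilarity(a, b) {
  const longest = Math.max(a.length, b.length);
  return longest === 0 ? 1 : 1 - levenshtein(a, b) / longest;
}

// Keep only candidates whose case-insensitive similarity reaches minScore (assumed cut-off).
function lexicalShortlist(query, candidates, minScore) {
  const q = query.toLowerCase();
  return candidates.filter(c => lexicalSimilarity(q, c.toLowerCase()) >= minScore);
}

const shortlist = lexicalShortlist(
  'Walmart Inc',
  ['WalMart Inc', 'Wal Mart Stores', 'Acme Ltd'],
  0.4
);
```

With a deliberately loose cut-off, the shortlist keeps both Walmart variants and drops the unrelated name, so only the surviving candidates need an embedding call.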


High-level Pattern

  1. Normalise text in the sheet (for example with Standardize Data).
  2. Use Flookup functions such as Match and Merge, or the Remove Duplicates tool, to produce a short candidate list per row.
  3. Compute embeddings for candidates only and score with a semantic service.
  4. Combine semantic score with a lexical check (for example Compare Text) and apply thresholds to auto-accept, reject or flag for review.
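Step 4 of the pattern above can be sketched as a weighted blend of the two scores followed by two thresholds. The weight, the thresholds and the names `HYBRID_CONFIG` and `hybridVerdict` are assumptions for illustration; they would need tuning on labelled pairs from your own data.

```javascript
// Cosine similarity between two equal-length embedding vectors.
function cosineSimilarity(u, v) {
  let dot = 0, normU = 0, normV = 0;
  for (let i = 0; i < u.length; i++) {
    dot += u[i] * v[i];
    normU += u[i] * u[i];
    normV += v[i] * v[i];
  }
  return normU === 0 || normV === 0 ? 0 : dot / Math.sqrt(normU * normV);
}

// Assumed weight and thresholds, not recommended values.
const HYBRID_CONFIG = {semanticWeight: 0.6, acceptAt: 0.85, rejectBelow: 0.5};

// Blend lexical and semantic scores (both in [0, 1]) and map the result to a verdict.
function hybridVerdict(lexicalScore, semanticScore, config = HYBRID_CONFIG) {
  const score = config.semanticWeight * semanticScore
              + (1 - config.semanticWeight) * lexicalScore;
  if (score >= config.acceptAt) return {score, verdict: 'accept'};
  if (score < config.rejectBelow) return {score, verdict: 'reject'};
  return {score, verdict: 'review'};
}
```

Pairs that land between the two thresholds are flagged for manual review rather than decided automatically, which keeps borderline matches out of the auto-accepted set.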

Google Sheets via Apps Script

The snippet below is a minimal, production-aware Apps Script that batches candidate requests, calls a hypothetical embedding match endpoint and writes scores back to the sheet. Adapt the endpoint and authentication to the chosen provider.


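function batchSemanticMatch() {
  var ss = SpreadsheetApp.getActiveSpreadsheet();
  var sheet = ss.getSheetByName('Matches');
  var lastRow = sheet.getLastRow();
  if (lastRow < 2) return; // header only, nothing to score
  // Columns A to C: id, query text, JSON array of candidates from the lexical pass.
  var data = sheet.getRange(2, 1, lastRow - 1, 3).getValues();
  var batchSize = 50;
  for (var i = 0; i < data.length; i += batchSize) {
    var batch = data.slice(i, i + batchSize);
    var payload = batch.map(function (r) {
      // An empty candidates cell is sent as an empty list instead of failing to parse.
      return {id: r[0], query: r[1], candidates: r[2] ? JSON.parse(r[2]) : []};
    });
    var resp = UrlFetchApp.fetch('https://your-embedding-service.example/v1/match', {
      method: 'post',
      contentType: 'application/json',
      payload: JSON.stringify({items: payload, top_k: 5}),
      muteHttpExceptions: true
    });
    if (resp.getResponseCode() !== 200) {
      // Log the failed batch so it can be retried, then move on.
      Logger.log('Batch starting at row ' + (i + 2) + ' failed: HTTP ' + resp.getResponseCode());
      continue;
    }
    // Assumes the service returns one result per item, in request order.
    var results = JSON.parse(resp.getContentText());
    var out = results.map(function (r) {
      return [JSON.stringify(r.matches), r.top_score];
    });
    // Write columns D and E for the whole batch in a single call.
    if (out.length > 0) sheet.getRange(i + 2, 4, out.length, 2).setValues(out);
  }
}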

Notes:


Excel via Office Script

An Office Script can perform an equivalent flow in Excel Online. The following is a concise example using the fetch API available in the Office Scripts runtime.


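// Assumed response shape: one result per item, in request order.
interface MatchResult { matches: object[]; top_score: number; }

async function main(workbook: ExcelScript.Workbook) {
  const sheet = workbook.getWorksheet('Matches');
  // Columns A to C: id, query text, JSON array of candidates (fixed 100-row window).
  const values = sheet.getRange('A2:C101').getValues();
  const items = values.map(r => ({
    id: r[0],
    query: r[1],
    // Cells are typed string | number | boolean, so convert before parsing;
    // an empty cell is sent as an empty candidate list.
    candidates: (r[2] ? JSON.parse(String(r[2])) : []) as string[]
  }));
  const resp = await fetch('https://your-embedding-service.example/v1/match', {
    method: 'POST',
    headers: {'Content-Type': 'application/json'},
    body: JSON.stringify({items: items, top_k: 5})
  });
  if (!resp.ok) {
    console.log(`Request failed: HTTP ${resp.status}`);
    return;
  }
  const results: MatchResult[] = await resp.json();
  if (results.length === 0) return;
  // Write columns D and E in one call instead of two writes per row.
  const out = results.map(r => [JSON.stringify(r.matches), r.top_score]);
  sheet.getRange(`D2:E${out.length + 1}`).setValues(out);
}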

Performance and Cost Benchmarks

Worked example: embedding calls and cost for 10,000 rows, comparing a naive one-embedding-per-row approach with two hybrid candidate approaches.

| Approach | Embedding calls | Cost (@ $0.0005/call) |
| --- | --- | --- |
| Naive: one embedding per row | 10,000 | $5.00 |
| Hybrid: FLOOKUP with ~20 candidates (no optimisation) | 200,000 | $100.00 |
| Optimised hybrid: blocking + caching (avg 5 candidates) | 50,000 | $25.00 |

Interpretation: Unoptimised hybrid approaches can actually increase embedding costs due to candidate volume. Candidate reduction and caching are essential to keep costs manageable. The cost figures above are illustrative; adapt pricing to your provider and model.
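The arithmetic behind the table is rows multiplied by average candidates per row, multiplied by the price per call. The sketch below reproduces the three figures; the function name `estimateEmbeddingCost` is hypothetical and the price is the illustrative $0.0005 used above.

```javascript
// Estimated spend, assuming one embedding call per (row, candidate) pair.
function estimateEmbeddingCost(rows, avgCandidates, pricePerCall) {
  const calls = rows * avgCandidates;
  return {calls, cost: calls * pricePerCall};
}

const PRICE_PER_CALL = 0.0005; // illustrative price, not a quote from any provider
const scenarios = [
  ['naive', 1],
  ['hybrid', 20],
  ['optimised', 5]
].map(([name, avgCandidates]) => ({
  name,
  ...estimateEmbeddingCost(10000, avgCandidates, PRICE_PER_CALL)
}));

for (const s of scenarios) {
  console.log(`${s.name}: ${s.calls} calls, $${s.cost.toFixed(2)}`);
}
```

Swapping in your own row count, candidate average and provider price gives a quick estimate before any API calls are made.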

A good rule of thumb: if your Flookup lexical step produces fewer than 10 candidates per row on average, the semantic scoring step remains cost effective at typical embedding API rates. When candidate counts rise above 20, consider adding blocking fields such as industry code or geographic region to narrow the pool before sending candidates to the embedding service.
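Blocking can be sketched as grouping candidate records by a shared field and comparing each query only within its own group. The record shape (`name`, `region`) and the helper names are assumptions for illustration.

```javascript
// Group candidate records by a blocking key (for example region or industry code).
function buildBlocks(records, keyFn) {
  const blocks = new Map();
  for (const rec of records) {
    const key = keyFn(rec);
    if (!blocks.has(key)) blocks.set(key, []);
    blocks.get(key).push(rec);
  }
  return blocks;
}

// Return only the candidates that share the query's blocking key.
function candidatesFor(query, blocks, keyFn) {
  return blocks.get(keyFn(query)) || [];
}

const byRegion = rec => rec.region;
const suppliers = [
  {name: 'Wal Mart Stores', region: 'US'},
  {name: 'Walmart Canada', region: 'CA'},
  {name: 'Acme Ltd', region: 'US'}
];
const blocks = buildBlocks(suppliers, byRegion);
const pool = candidatesFor({name: 'Walmart Inc', region: 'US'}, blocks, byRegion);
```

Here the query is compared against two records instead of three; on real data the reduction depends on how evenly the blocking field splits the candidate list.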


ANN and Production Notes


UX and Review Workflows

Recommended columns in the review sheet:

Ready to Implement Hybrid Semantic Matching?

Start by defining your candidate reduction strategy and exploring how Flookup's tools can bridge the gap to high-precision embedding analysis.


Frequently Asked Questions

What is hybrid fuzzy matching?

Hybrid fuzzy matching combines traditional string similarity algorithms (such as Levenshtein or Jaro-Winkler) with semantic embedding models to improve matching accuracy. The lexical component catches typographical variations while the semantic component understands meaning, so "car" matches "automobile" even though the two strings share almost no characters.

When should I use embeddings instead of traditional fuzzy matching?

Embeddings are preferred when data contains synonyms, paraphrases or conceptually equivalent terms that share few characters. Examples include matching job titles ("CEO" vs "Chief Executive Officer") and matching product descriptions across different catalogues. Traditional edit-distance algorithms remain better for typo correction and short-code matching.

Does Flookup support embedding-based matching?

Flookup integrates with embedding services to provide semantic matching alongside its built-in fuzzy matching algorithms. This allows you to combine lexical and semantic scores in a single decision framework, getting the best of both approaches within your Google Sheets workflow.


Further Reading