INTELLIGENT DATA CLEANING FOR GOOGLE SHEETS
Introduction to Semantic Matching
Semantic matching allows you to compare text based on its meaning rather than just its surface-level similarity. This is useful for finding matches between texts that are conceptually similar but use different wording.
All the functions discussed here are accessible from your spreadsheet menu under Extensions > Flookup Data Wrangler > Matching and Analysis.
Compare Text by Semantic Similarity
- Open the sidebar: Navigate to the Compare by meaning tool within the Flookup Data Wrangler sidebar.
- Select the range containing the lookup value: Highlight a single column and click Grab selected range.
- Select a second range to compare: Highlight another single column and click Grab selected range.
- Specify output location: Click an empty cell to mark where the results should appear.
- Get semantic similarities: Click the Get semantic similarities button.
Notes on Comparing Text by Semantic Similarity
- Matches reflect meaning rather than just surface similarity, e.g. “I love dogs” and “I adore canines” can be identified as similar even though they share few words.
- Only the first column of each selected range is used; ensure relevant text is in that column.
- This feature consumes credits. Your processing limit is determined by your available credit balance.
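Semantic similarity tools commonly work by converting each text into a numeric vector (an embedding) and scoring pairs of vectors with cosine similarity. This article does not say how the Compare by meaning tool computes its scores, so the Python sketch below only illustrates that general technique. The `cosine_similarity` function and the three-dimensional vectors are hypothetical stand-ins: real embeddings come from a language model and have hundreds of dimensions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine of the angle between vectors a and b (range -1 to 1)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cannot compare a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical toy "embeddings"; in practice an embedding model produces these.
embeddings = {
    "I love dogs": [0.9, 0.8, 0.1],
    "I adore canines": [0.85, 0.75, 0.2],
    "Quarterly tax filing": [0.05, 0.1, 0.95],
}

# Texts with similar meaning get vectors pointing in similar directions,
# so their cosine similarity is close to 1.
for a, b in [("I love dogs", "I adore canines"), ("I love dogs", "Quarterly tax filing")]:
    print(f"{a!r} vs {b!r}: {cosine_similarity(embeddings[a], embeddings[b]):.3f}")
```

With these made-up vectors, the two dog sentences score far higher than the unrelated pair, which is the behaviour a meaning-based comparison aims for.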
Fuzzy Match by Semantic Similarity
- Open the sidebar: Navigate to the Fuzzy match by meaning tool within the Flookup Data Wrangler sidebar.
- Select the range containing the lookup value: Highlight a single column of data and click Grab selected range to read it into Primary range.
- Select the range where the lookup is potentially located: Highlight a separate range of one or more columns and click Grab selected range to read it into Secondary range.
- Specify the lookup column: Enter the index of the column in Secondary range to compare with values in Primary range.
- Specify the return column: Enter the index of the column in Secondary range from which you want values to be returned.
- Specify output location: Click an empty cell within the spreadsheet to mark where the results should be displayed.
- Get fuzzy matches: Click the Get semantic matches button to finish.
Notes on Fuzzy Matching by Semantic Similarity
- Lookup_column only processes one column. If you select more than one, only the leftmost will be analysed.
- For best results, correct spelling errors before running this function. To do so, go to Tools > Spelling > Spell check in the spreadsheet menu.
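The Primary range, Secondary range, lookup column and return column settings behave like a lookup that returns the best-scoring row instead of an exact match. The Python sketch below illustrates that idea under several assumptions: column indexes are 1-based and counted from the left edge of Secondary range, `min_score` is a hypothetical cut-off, and `token_overlap` is a hypothetical word-overlap score used only so the example runs without an embedding model. It measures shared words, not meaning, and stands in for whatever semantic score the add-on actually uses.

```python
from typing import Callable, Optional


def token_overlap(a: str, b: str) -> float:
    """Hypothetical stand-in score: Jaccard overlap of lowercase words (NOT semantic)."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


def fuzzy_lookup(
    primary: list[str],
    secondary: list[list[str]],
    lookup_col: int,
    return_col: int,
    score: Callable[[str, str], float] = token_overlap,
    min_score: float = 0.0,
) -> list[Optional[str]]:
    """For each primary value, return return_col from the best-scoring secondary row.

    lookup_col and return_col are 1-based indexes into each secondary row (assumption).
    A value gets None when no row scores above min_score.
    """
    results: list[Optional[str]] = []
    for value in primary:
        best_row, best_score = None, min_score
        for row in secondary:
            s = score(value, row[lookup_col - 1])  # compare against the lookup column only
            if s > best_score:
                best_row, best_score = row, s
        results.append(best_row[return_col - 1] if best_row is not None else None)
    return results


secondary = [
    ["Acme Corporation Ltd", "London"],
    ["Globex Industries", "Paris"],
]
print(fuzzy_lookup(["acme corporation", "globex", "initech"], secondary, 1, 2))
# → ['London', 'Paris', None]
```

Because the scoring function is a parameter, swapping `token_overlap` for a meaning-based score (such as cosine similarity over embeddings) changes which rows match without changing the lookup logic.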
Benefits of Semantic Matching
Semantic matching represents a significant advancement in data comparison and analysis. Unlike traditional text matching methods that rely on character-by-character comparisons, semantic matching understands the underlying meaning of text. This enables you to identify relationships and patterns that remain invisible to surface-level similarity algorithms.
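To see why character-based comparison struggles with reworded text, the Python sketch below scores two sentence pairs with the standard library's `difflib.SequenceMatcher`, a surface-level measure that compares character sequences. It is not part of Flookup; it only illustrates the limitation described above.

```python
from difflib import SequenceMatcher


def surface_similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] based on matching subsequences."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


# Same meaning, different words: few characters line up, so the score is low.
same_meaning = surface_similarity("I love dogs", "I adore canines")
# A simple typo: almost every character lines up, so the score is high.
near_typo = surface_similarity("I love dogs", "I lvoe dogs")
print(f"reworded: {same_meaning:.2f}, typo: {near_typo:.2f}")
```

The reworded sentence scores well below the misspelled one, even though it is the one with the same meaning.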
Key Advantages
- Contextual Understanding: Semantic matching compares meaning, allowing you to match "I love dogs" with "I adore canines" and similar conceptually equivalent phrases.
- Reduced Manual Work: Automating complex matching tasks removes the need to identify similar entries by hand, saving substantial time on large datasets.
- Improved Data Quality: By discovering hidden relationships and duplicate entries that use different wording, you can maintain cleaner, more consistent datasets.
- Enhanced Analytics: Better data quality leads to more accurate analysis and reporting, enabling you to make informed decisions based on reliable information.
- Deduplication at Scale: Efficiently identify and consolidate duplicate records across your dataset, even when they use varying terminology or formatting.
Why Semantic Matching Matters for Your Data
Data quality directly impacts business intelligence and decision-making. Organisations often struggle with datasets in which the same entity is described in different ways. Traditional deduplication approaches often miss these variations, leaving your analysis incomplete. Semantic matching fills this gap by recognising conceptual equivalence, helping to make your data analysis more complete and reliable.