STANDARDIZE DATA

Overview

The Standardize Data tool helps you clean and normalise your data by removing noise such as stop words, punctuation and diacritical marks. This is a crucial pre-processing step that significantly improves the accuracy of fuzzy matching and data reconciliation by focusing on the substantive content of your text.

To begin, open the tool by navigating to Extensions > Flookup Data Wrangler > Matching and Analysis > Standardize data in your Google Sheets menu.


How to Use the Standardize Data Tool

  1. Select the cleaning operation
    Choose the type of normalization to perform, such as Remove punctuation marks, Remove unwanted words, Remove diacritical marks, Keep URL path only, or Keep URL domain only.
  2. Select the data to normalize
    Highlight the column or range of data you wish to process and click the Grab selected range button for the Range of entries to normalize.
  3. Select entries to remove (if applicable)
    For punctuation or word removal, highlight a range containing the specific characters or words you wish to exclude and click the Grab selected range button for the Range with entries to be removed.
  4. Specify the output location
    Click an empty cell within your spreadsheet to mark the starting position where the cleaned results should be displayed.
  5. Set the threshold (if applicable)
    If you selected Remove unwanted words, you can adjust the Threshold value (default 0.75) to account for slight variations in the words you wish to remove.
  6. Execute the normalization
    Click the Normalize text entries button to finish.

Notes on Standardizing Data


Visual Learning Resources

Labels might differ slightly but the steps remain the same.


You Might Also Like