remove_words

remove_words(string, words_to_remove)

Description

Removes every word listed in words_to_remove from string.

Usage

Call or Deploy remove_words?
Call remove_words directly

The easiest way to use bigfunctions

  • The remove_words function is deployed in 39 public datasets, one for each of the 39 BigQuery regions.
  • Anyone can call it: copy and paste the examples below into your BigQuery console.
  • Use the public dataset located in the same region as your own datasets; otherwise you may get a "function not found" error.

Public BigFunctions Datasets

Region        Dataset
eu            bigfunctions.eu
us            bigfunctions.us
europe-west1  bigfunctions.europe_west1
asia-east1    bigfunctions.asia_east1
...           ...
Deploy remove_words in your project

Why deploy?

  • You may prefer to deploy remove_words in your own project to build and manage your own catalog of functions.
  • This is particularly useful if you want to create private functions (for example calling your internal APIs).
  • Get started by reading the framework page.

Deployment

The remove_words function can be deployed with:

pip install bigfunctions
bigfun get remove_words
bigfun deploy remove_words

Examples

select bigfunctions.eu.remove_words("I can eat candies", ['can', 'eat'])
select bigfunctions.us.remove_words("I can eat candies", ['can', 'eat'])
select bigfunctions.europe_west1.remove_words("I can eat candies", ['can', 'eat'])
+----------------+
| cleaned_string |
+----------------+
| I  candies     |
+----------------+
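
The sample output keeps the spaces that surrounded the removed words, so "I  candies" contains two consecutive spaces. If single spacing matters downstream, a minimal sketch, assuming the function returns the string exactly as shown in the sample output, is to collapse whitespace runs with the standard BigQuery functions regexp_replace and trim:

```sql
-- Assumption: remove_words returns 'I  candies' (two spaces), as in the sample output.
-- regexp_replace collapses each run of whitespace into one space;
-- trim drops any leading or trailing space left by a removed first or last word.
select
  trim(regexp_replace(
    bigfunctions.eu.remove_words("I can eat candies", ['can', 'eat']),
    r'\s+', ' '
  )) as cleaned_string
```

Under that assumption the result would be "I candies". Use the dataset that matches your region in place of bigfunctions.eu.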

Use cases

A common use case for the remove_words function is cleaning text data by removing stop words or unwanted terms.

Example: Product Review Analysis

Imagine you have a dataset of product reviews and you want to perform sentiment analysis. Stop words such as "a", "the", "and", and "is" contribute little to the sentiment and can even skew the analysis. You can use remove_words to eliminate them:

SELECT bigfunctions.us.remove_words(review_text, ['a', 'the', 'and', 'is', 'this', 'it', 'to', 'in', 'of', 'for', 'on', 'with', 'at', 'by', 'that', 'from']) AS cleaned_review
FROM `your_project.your_dataset.product_reviews`;

This query will process each review_text and return a cleaned_review with the specified stop words removed. This cleaned text can then be used for more accurate sentiment analysis or other text processing tasks.
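
The stop-word list in the query is all lowercase, while reviews often begin sentences with "The" or "This". This page does not state whether remove_words matches case-insensitively, so the sketch below assumes it may not and lowercases the text first with the standard LOWER function. The table and column names are the same placeholders as in the query above, and the word list is shortened for readability.

```sql
-- Assumption: matching may be case-sensitive (not confirmed by this page).
-- Lowercasing the input lets the lowercase stop-word list also match
-- capitalized forms such as 'The' or 'This'.
SELECT
  bigfunctions.us.remove_words(
    LOWER(review_text),
    ['a', 'the', 'and', 'is', 'this', 'it']
  ) AS cleaned_review
FROM `your_project.your_dataset.product_reviews`;
```

Lowercasing discards capitalization, which some sentiment models treat as a signal, so keep the original column alongside the cleaned one if that matters for your analysis.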

Other Use Cases:

  • Data Preprocessing for Machine Learning: Removing irrelevant or noisy words from text data before feeding it into a machine learning model can improve performance.
  • Spam Filtering: Identifying and removing common spam words from emails or messages.
  • Content Filtering: Blocking inappropriate or offensive language from user-generated content.
  • Keyword Extraction: Removing common words to identify the most important keywords in a piece of text.
  • Search Optimization: Cleaning search queries by removing unnecessary terms.

By customizing the words_to_remove array, you can tailor the remove_words function to various text cleaning and preprocessing tasks.
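
One way to customize words_to_remove without hard-coding the list in every query is to keep the words in a table and aggregate them into an array. The sketch below assumes a hypothetical table `your_project.your_dataset.stop_words` with a single STRING column named word; this table is an illustration only and is not provided by BigFunctions.

```sql
-- Hypothetical table: your_project.your_dataset.stop_words(word STRING)
-- Build the array once, then pass it to remove_words for every review.
WITH stop_list AS (
  SELECT ARRAY_AGG(word IGNORE NULLS) AS words
  FROM `your_project.your_dataset.stop_words`
)
SELECT
  bigfunctions.us.remove_words(r.review_text, stop_list.words) AS cleaned_review
FROM `your_project.your_dataset.product_reviews` AS r
CROSS JOIN stop_list;
```

The CTE returns exactly one row, so the CROSS JOIN attaches the same array to every review. If the stop_words table is empty, ARRAY_AGG yields NULL, and how remove_words handles a NULL array is not documented on this page.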


Need help or Found a bug?
Get help using remove_words

The community can help! Join the conversation on Slack.

We also provide professional support.

Report a bug about remove_words

If the function does not work as expected, please:

  • report a bug so that it can be improved, or
  • open a discussion with the community on Slack.
