remove_words
remove_words(string, words_to_remove)
Description
Removes every word listed in words_to_remove from string.
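To make the behavior concrete, here is a minimal Python sketch of what the example further down suggests the function does. It assumes the string is split on whitespace, that words are matched exactly (case-sensitive, with punctuation left attached), and that the surviving words are re-joined with single spaces. These are assumptions; the actual BigQuery implementation may differ, for example in how it treats punctuation or repeated spaces.

```python
def remove_words(string: str, words_to_remove: list[str]) -> str:
    """Sketch of remove_words semantics (assumed: whitespace tokens, exact match)."""
    # A set gives constant-time membership checks for each token.
    to_remove = set(words_to_remove)
    # Split on runs of whitespace, keep tokens not in the removal set,
    # then re-join the survivors with single spaces.
    return " ".join(word for word in string.split() if word not in to_remove)


print(remove_words("I can eat candies", ["can", "eat"]))  # I candies
```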
Usage
Call or deploy remove_words?
Call remove_words directly
The easiest way to use bigfunctions:
- The remove_words function is deployed in 39 public datasets, one for each of the 39 BigQuery regions.
- It can be called by anyone. Just copy and paste the examples below into your BigQuery console. It just works!
- (You need to use the dataset in the same region as your datasets; otherwise you may get a "function not found" error.)
Public BigFunctions Datasets
Region | Dataset |
---|---|
eu | bigfunctions.eu |
us | bigfunctions.us |
europe-west1 | bigfunctions.europe_west1 |
asia-east1 | bigfunctions.asia_east1 |
... | ... |
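The rows listed above follow a visible pattern: the dataset is named bigfunctions. followed by the region, with hyphens replaced by underscores (BigQuery dataset IDs may only contain letters, digits and underscores). The hypothetical helper below builds a fully qualified function name from that pattern. The rule is inferred from the rows shown, not taken from an official specification, so check the full dataset list before relying on it for other regions.

```python
def qualified_function(region: str, function: str = "remove_words") -> str:
    """Hypothetical helper: build `bigfunctions.<dataset>.<function>` for a region.

    The naming rule (hyphens -> underscores) is inferred from the table above.
    """
    dataset = region.replace("-", "_")
    return f"bigfunctions.{dataset}.{function}"


print(qualified_function("europe-west1"))  # bigfunctions.europe_west1.remove_words
```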
Deploy remove_words in your project
Why deploy?
- You may prefer to deploy remove_words in your own project to build and manage your own catalog of functions.
- This is particularly useful if you want to create private functions (for example, functions that call your internal APIs).
- Get started by reading the framework page.
Deployment
The remove_words function can be deployed with:
pip install bigfunctions
bigfun get remove_words
bigfun deploy remove_words
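Examples
Call the function from the dataset in your region; each of the queries below returns the same result:
select bigfunctions.eu.remove_words("I can eat candies", ['can', 'eat']) as cleaned_string
select bigfunctions.us.remove_words("I can eat candies", ['can', 'eat']) as cleaned_string
select bigfunctions.europe_west1.remove_words("I can eat candies", ['can', 'eat']) as cleaned_string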
+----------------+
| cleaned_string |
+----------------+
| I candies |
+----------------+
Use cases
A common use case for the remove_words function is cleaning text data by removing stop words or unwanted terms.
Example: Product Review Analysis
Imagine you have a dataset of product reviews and you want to perform sentiment analysis. Common words such as "a", "the", "and" and "is" (stop words) contribute little to the sentiment and can even skew the analysis. You can use remove_words to eliminate them:
SELECT bigfunctions.us.remove_words(review_text, ['a', 'the', 'and', 'is', 'this', 'it', 'to', 'in', 'of', 'for', 'on', 'with', 'at', 'by', 'that', 'from']) AS cleaned_review
FROM `your_project.your_dataset.product_reviews`;
This query processes each review_text and returns a cleaned_review with the specified stop words removed. The cleaned text can then be used for more accurate sentiment analysis or other text processing tasks.
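One detail worth checking before running this on real reviews is letter case. The Python sketch below assumes exact, case-sensitive matching (an assumption this page does not confirm). Under that assumption, a capitalized "This" or "The" at the start of a sentence survives the filter. Lowercasing the text first, which in BigQuery would mean passing LOWER(review_text), sidesteps the issue. The two reviews are made-up illustrations, and the stop-word list is the one from the query above.

```python
STOP_WORDS = ["a", "the", "and", "is", "this", "it", "to", "in", "of",
              "for", "on", "with", "at", "by", "that", "from"]


def remove_words(string: str, words_to_remove: list[str]) -> str:
    # Assumed semantics: whitespace tokens, exact (case-sensitive) match.
    to_remove = set(words_to_remove)
    return " ".join(w for w in string.split() if w not in to_remove)


# Made-up sample reviews for illustration only.
reviews = [
    "This is the best blender in the shop",
    "The lid broke after a week",
]

for review in reviews:
    as_is = remove_words(review, STOP_WORDS)            # capitalized stop words survive
    lowered = remove_words(review.lower(), STOP_WORDS)  # lowercase first, then filter
    print(f"{as_is!r} vs {lowered!r}")
```

With these inputs, the first review comes back as "This best blender shop" when filtered as-is, and as "best blender shop" when lowercased first.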
Other Use Cases:
- Data Preprocessing for Machine Learning: Removing irrelevant or noisy words from text data before feeding it into a machine learning model can improve performance.
- Spam Filtering: Identifying and removing common spam words from emails or messages.
- Content Filtering: Blocking inappropriate or offensive language from user-generated content.
- Keyword Extraction: Removing common words to identify the most important keywords in a piece of text.
- Search Optimization: Cleaning search queries by removing unnecessary terms.
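For the keyword-extraction case, removing common words is typically followed by counting what remains. The Python sketch below does both. It strips a small, illustrative stop-word list using the same assumed semantics (lowercased text, whitespace tokens, exact match), then ranks the remaining words with collections.Counter. In BigQuery, the counting step could be a GROUP BY over the split words. The sample texts are invented for the example.

```python
from collections import Counter

# Illustrative stop-word list (an assumption, not a recommended list).
STOP_WORDS = ["a", "the", "and", "is", "it", "to", "of", "for", "with"]


def remove_words(string: str, words_to_remove: list[str]) -> str:
    # Assumed semantics: whitespace tokens, exact (case-sensitive) match.
    to_remove = set(words_to_remove)
    return " ".join(w for w in string.split() if w not in to_remove)


def top_keywords(texts: list[str], n: int = 3) -> list[tuple[str, int]]:
    """Count the words left after stop-word removal and return the n most common."""
    counts = Counter()
    for text in texts:
        # Lowercase first so "The" and "the" are treated alike.
        counts.update(remove_words(text.lower(), STOP_WORDS).split())
    # Ties keep the order in which words were first encountered.
    return counts.most_common(n)


texts = [
    "The battery is great and the screen is great",
    "Battery life is short for a phone",
]
print(top_keywords(texts))  # [('battery', 2), ('great', 2), ('screen', 1)]
```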
By customizing the words_to_remove array, you can tailor the remove_words function to various text cleaning and preprocessing tasks.
Need help or found a bug?
Get help using remove_words
The community can help! Join the conversation on Slack.
We also provide professional support.
Report a bug about remove_words
If the function does not work as expected, please:
- report a bug so that it can be improved,
- or open a discussion with the community on Slack.
We also provide professional support.