
replace_special_characters

replace_special_characters(string, replacement)

Description

Replace most common special characters in a string with replacement

Usage

Call or Deploy replace_special_characters?
Call replace_special_characters directly

The easiest way to use bigfunctions

  • The replace_special_characters function is deployed in 39 public datasets, one for each of the 39 BigQuery regions.
  • Anyone can call it. Just copy and paste the examples below into your BigQuery console.
  • Use the dataset located in the same region as your own datasets; otherwise you may get a "function not found" error.

Public BigFunctions Datasets

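+--------------+---------------------------+
| Region       | Dataset                   |
+--------------+---------------------------+
| eu           | bigfunctions.eu           |
| us           | bigfunctions.us           |
| europe-west1 | bigfunctions.europe_west1 |
| asia-east1   | bigfunctions.asia_east1   |
| ...          | ...                       |
+--------------+---------------------------+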
Deploy replace_special_characters in your project

Why deploy?

  • You may prefer to deploy replace_special_characters in your own project to build and manage your own catalog of functions.
  • This is particularly useful if you want to create private functions (for example, functions that call your internal APIs).
  • Get started by reading the framework page.

Deployment

The replace_special_characters function can be deployed with:

pip install bigfunctions
bigfun get replace_special_characters
bigfun deploy replace_special_characters

Examples

select bigfunctions.eu.replace_special_characters("%\u2665!Hello!*\u2665#", "")
select bigfunctions.us.replace_special_characters("%\u2665!Hello!*\u2665#", "")
select bigfunctions.europe_west1.replace_special_characters("%\u2665!Hello!*\u2665#", "")
+----------------+
| cleaned_string |
+----------------+
| Hello          |
+----------------+

Use cases

A use case for the replace_special_characters function is cleaning user-generated data before storing or processing it. Imagine you have a website where users can submit product reviews. These reviews might contain special characters like emoticons, punctuation marks beyond the standard set, or even unintended HTML entities. These characters can cause problems when:

  • Storing data in a database: Some databases may not handle certain special characters correctly, leading to errors or data corruption.
  • Displaying data: Special characters may not render correctly on different browsers or devices, leading to a poor user experience.
  • Performing text analysis: Special characters can interfere with natural language processing tasks like sentiment analysis or topic modeling.

Using the replace_special_characters function, you could clean the user-submitted reviews before storing them in your database. For example:

SELECT bigfunctions.us.replace_special_characters(review_text, ' ') AS cleaned_review
FROM `your_project.your_dataset.user_reviews`;

This query replaces the most common special characters in the review_text column with spaces, producing a version of the review text that is better suited to storage, display, and analysis. Applying the same cleaning to every row keeps the data consistent for downstream tasks.
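Replacing each special character with a space can leave runs of consecutive spaces in the result. A minimal sketch of one way to tidy them up, reusing the table and column from the query above and relying only on BigQuery's built-in REGEXP_REPLACE and TRIM functions (the raw_review alias and the whitespace-collapsing step are illustrative assumptions, not part of the function itself):

```sql
-- Sketch only: collapses whitespace left behind by the replacement.
-- The whitespace handling is an illustrative addition, not a feature of
-- replace_special_characters.
SELECT
  review_text AS raw_review,
  TRIM(
    REGEXP_REPLACE(
      bigfunctions.us.replace_special_characters(review_text, ' '),
      r'\s+',  -- one or more whitespace characters
      ' '
    )
  ) AS cleaned_review
FROM `your_project.your_dataset.user_reviews`;
```

Keeping the original column next to the cleaned one makes it easy to spot-check what was replaced before overwriting anything.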

Here's another example, focusing on creating URL-friendly strings (slugs):

SELECT bigfunctions.us.replace_special_characters('This is a product title with special characters!@#$%^&*()', '-') AS url_slug

This would output This-is-a-product-title-with-special-characters-------, which, after collapsing the repeated hyphens and trimming the trailing one, could be used as a URL slug.
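The hyphen clean-up is not done by replace_special_characters itself. One possible way to do it in the same query is sketched below, using BigQuery's built-in REGEXP_REPLACE and TRIM; the title alias and the single-row UNNEST input are hypothetical and only there to make the query self-contained:

```sql
-- Sketch only: post-processes the function's output into a slug.
SELECT
  TRIM(
    REGEXP_REPLACE(
      bigfunctions.us.replace_special_characters(title, '-'),
      r'-+',  -- one or more consecutive hyphens
      '-'     -- collapse them into a single hyphen
    ),
    '-'       -- strip leading and trailing hyphens
  ) AS url_slug
FROM UNNEST(['This is a product title with special characters!@#$%^&*()']) AS title;
```

If the function returns the string shown above, this yields This-is-a-product-title-with-special-characters.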

In essence, the replace_special_characters BigQuery function assists in data sanitization and preparation for various uses by removing or replacing characters that could otherwise cause issues.


Need help or Found a bug?
Get help using replace_special_characters

The community can help! Join the conversation on Slack.

We also provide professional support.

Report a bug about replace_special_characters

If the function does not work as expected, please

  • report a bug so that it can be improved,
  • or open a discussion with the community on Slack.

We also provide professional support.

