run_python

Call or deploy run_python?

✅ You can call this run_python bigfunction directly from your Google Cloud Project (no install required).

  • This run_python function is deployed in the bigfunctions GCP project, in 39 datasets covering all 39 BigQuery regions. Use the dataset located in the same region as your own datasets (otherwise you may get a "function not found" error).
  • The function is public, so anyone can call it. Just copy and paste the examples below into your BigQuery console. It just works!
  • You may prefer to deploy the BigFunction in your own project if you want to build and manage your own catalog of functions. This is particularly useful if you want to create private functions (for example, functions that call your internal APIs).

Public BigFunctions Datasets:

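+--------------+-----------------------------+
| Region       | Dataset                     |
+--------------+-----------------------------+
| eu           | bigfunctions.eu             |
| us           | bigfunctions.us             |
| europe-west1 | bigfunctions.europe_west1   |
| asia-east1   | bigfunctions.asia_east1     |
| ...          | ...                         |
+--------------+-----------------------------+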
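
The rows above suggest a naming rule: the dataset is the region name with hyphens replaced by underscores. A minimal sketch of that rule follows. The helper name `bigfunction_name` is hypothetical, and the rule is inferred only from the rows shown, so treat it as an assumption for regions not listed.

```python
def bigfunction_name(region, function="run_python", project="bigfunctions"):
    """Build a fully qualified function name for a BigQuery region.

    Assumption: dataset = region with '-' replaced by '_',
    as inferred from the rows of the table above.
    """
    dataset = region.lower().replace("-", "_")
    return f"{project}.{dataset}.{function}"


print(bigfunction_name("eu"))
print(bigfunction_name("europe-west1"))
```

The two calls print the names used in the examples further down this page for the eu and europe-west1 regions.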

Description

Signature

run_python(python_code, requirements, kwargs)

Description

Run any python_code.

For security reasons (sandboxing):

  • this function is rather slow (a new Python environment is created for each query). You may prefer to create a dedicated Python function for your use case. You can suggest a new bigfunction here if you want someone to create your function.
  • your Python code has no access to the internet
  • not all Python packages can be installed

Param: Possible values

  • python_code: Arbitrary Python code (indented with 4 spaces).
  • requirements: Requirements as you would pass them to pip install (separated by spaces). Note that, for security reasons, not all Python packages can be installed.
  • kwargs: A JSON dict of variables. These variables will be defined and usable in your Python code.

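
To make the roles of python_code and kwargs concrete, here is a minimal local sketch of one way such a snippet could be executed. It is not the actual implementation: the names `run_snippet` and `_snippet` are hypothetical, there is no sandbox and no package installation, and it assumes the snippet is wrapped in a function (which is what makes a top-level `return` legal) with the kwargs keys passed in as variables.

```python
import json
import textwrap


def run_snippet(python_code, kwargs_json=None):
    """Hypothetical local stand-in for run_python (no sandbox, no pip)."""
    # kwargs arrive as a JSON dict; each key becomes a variable name,
    # so keys are assumed to be valid Python identifiers.
    variables = json.loads(kwargs_json) if kwargs_json else {}

    # Wrap the snippet in a function so that `return` works at top level.
    body = textwrap.indent(textwrap.dedent(python_code).strip(), "    ")
    source = f"def _snippet({', '.join(variables)}):\n{body}\n"

    namespace = {}  # fresh namespace, so nothing leaks between calls
    exec(source, namespace)
    return namespace["_snippet"](**variables)


print(run_snippet("return sum(range(10))"))
print(run_snippet("return len(text.split())", '{"text": "care cared and caring"}'))
```

A helper like this can be handy for checking a snippet's logic on your machine before sending it to BigQuery.
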
How sandboxing is done

The provided python_code runs in Pyodide, a Python distribution that runs in a headless Chrome browser.

This simplifies the implementation of:

  • isolation between function calls,
  • installation of python packages,
  • isolation from the internet.

For every function call, we:

  • create a new browser context,
  • download Pyodide,
  • install the Python packages,
  • run the code.
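
The steps above trade speed for isolation. As a loose local analogy only (the real function uses a browser context and Pyodide, not a subprocess), the sketch below starts a brand-new Python interpreter for every call, so no state survives from one call to the next. The helper name `run_isolated` is hypothetical, and unlike the real sandbox this does nothing to block network access.

```python
import subprocess
import sys


def run_isolated(python_code, timeout=30):
    """Run code in a fresh interpreter and return what it printed."""
    completed = subprocess.run(
        [sys.executable, "-I", "-c", python_code],  # -I: isolated mode
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,
    )
    return completed.stdout.strip()


# The variable defined in the first call is gone in the second call.
first = run_isolated("counter = 1; print(counter)")
second = run_isolated("print('counter' in globals())")
print(first, second)
```

Starting a new environment each time is also why every call pays a fixed startup cost, which matches the slowness noted in the description.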

Examples

1. Basic Example

-- eu
select bigfunctions.eu.run_python(
  '''
  return sum(range(10))
  ''',
  null,
  null
);

-- us
select bigfunctions.us.run_python(
  '''
  return sum(range(10))
  ''',
  null,
  null
);

-- europe-west1
select bigfunctions.europe_west1.run_python(
  '''
  return sum(range(10))
  ''',
  null,
  null
);

+--------+
| result |
+--------+
| 45     |
+--------+

2. Some packages such as pandas can be installed and used.

-- eu
select bigfunctions.eu.run_python(
  '''
  import pandas as pd
  return pd.Series(range(10)).sum()
  ''',
  'pandas',
  null
);

-- us
select bigfunctions.us.run_python(
  '''
  import pandas as pd
  return pd.Series(range(10)).sum()
  ''',
  'pandas',
  null
);

-- europe-west1
select bigfunctions.europe_west1.run_python(
  '''
  import pandas as pd
  return pd.Series(range(10)).sum()
  ''',
  'pandas',
  null
);

+--------+
| result |
+--------+
| 45     |
+--------+

3. Replace words passed as a variable with their stems

-- eu
select bigfunctions.eu.run_python(
  '''
  import snowballstemmer
  stemmer = snowballstemmer.stemmer('english')
  stems = stemmer.stemWords(text.split())
  return ' '.join(stems)
  ''',
  'snowballstemmer',
  to_json(struct(
    'care cared and caring' as text
  ))
);

-- us
select bigfunctions.us.run_python(
  '''
  import snowballstemmer
  stemmer = snowballstemmer.stemmer('english')
  stems = stemmer.stemWords(text.split())
  return ' '.join(stems)
  ''',
  'snowballstemmer',
  to_json(struct(
    'care cared and caring' as text
  ))
);

-- europe-west1
select bigfunctions.europe_west1.run_python(
  '''
  import snowballstemmer
  stemmer = snowballstemmer.stemmer('english')
  stems = stemmer.stemWords(text.split())
  return ' '.join(stems)
  ''',
  'snowballstemmer',
  to_json(struct(
    'care cared and caring' as text
  ))
);

+--------------------+
| result             |
+--------------------+
| care care and care |
+--------------------+

Need help using run_python?

The community can help! Join the conversation on Slack.

For professional support, don't hesitate to chat with us.

Found a bug using run_python?

If the function does not work as expected, please:

  • report a bug so that it can be improved,
  • or start a discussion with the community on Slack.


Use cases

This run_python function allows you to execute arbitrary Python code within BigQuery. Here's a breakdown of potential use cases and how it addresses them:

1. Text Preprocessing/Natural Language Processing (NLP):

  • Stemming/Lemmatization: The provided example stems words using the snowballstemmer library. This is useful for NLP tasks like text analysis, where you want to reduce words to a common root form (e.g., "running" and "runs" both become "run"). For example, if you have a BigQuery table of product reviews, you could use run_python to stem the review text directly within BigQuery before feeding it into a sentiment analysis model.
  • Regular Expressions: You can use Python's powerful re module for complex pattern matching and string manipulation in your data. For instance, extract specific information from text fields, validate data formats, or clean up inconsistent data.
  • Other NLP tasks: Tokenization, part-of-speech tagging, named entity recognition – any Python NLP library that can be installed in the sandbox can be leveraged.
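
As an illustration of the regular-expression use case above, here is a small sketch that pulls identifiers out of free text. The `ORD-` followed by six digits format and the helper name `extract_order_ids` are made up for the example. Inside run_python, the body of the function would be the snippet and `text` would come from kwargs.

```python
import re

# Hypothetical ID format: "ORD-" followed by exactly six digits.
ORDER_ID = re.compile(r"\bORD-\d{6}\b")


def extract_order_ids(text):
    """Return every order ID found in the text, in order of appearance."""
    return ORDER_ID.findall(text)


print(extract_order_ids("Refund for ORD-123456 and ORD-000042, not ORD-12."))
```

Only the standard library is used here, so the requirements argument could stay null for a snippet like this.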

2. Data Cleaning and Transformation:

  • Custom logic: Implement data transformations that are too complex for standard SQL functions. This could include handling missing values in a specific way, recoding variables based on complex criteria, or applying custom business rules.
  • Date/Time manipulation: Python's datetime module offers more flexibility than standard SQL for working with dates and times. You might use it to parse dates in unusual formats, calculate time differences, or handle time zones.
  • Numerical computations: Perform complex calculations beyond basic arithmetic, such as using the math or NumPy libraries.
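
For the date/time case above, a minimal sketch: parsing timestamps in an unusual format and computing the difference across UTC offsets. The format string and the helper name `hours_between` are assumptions chosen for illustration.

```python
from datetime import datetime


def hours_between(start, end, fmt="%d.%m.%Y %H:%M %z"):
    """Parse two timestamps in the given format and return the gap in hours."""
    start_dt = datetime.strptime(start, fmt)
    end_dt = datetime.strptime(end, fmt)
    # Both values are timezone-aware, so the subtraction accounts for offsets.
    return (end_dt - start_dt).total_seconds() / 3600


print(hours_between("01.03.2024 09:00 +0100", "01.03.2024 12:30 +0000"))
```

Because the parsed values carry their UTC offsets, the result reflects elapsed time rather than the difference between the wall-clock readings.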

3. User-Defined Functions (UDFs) with Python Flexibility:

  • Code Reusability: While slower than native SQL or JavaScript UDFs, run_python offers a quick way to prototype UDF-like functionality without a separate deployment step.
  • Complex logic encapsulation: Package up complex logic within the function, making your SQL queries cleaner and easier to understand.

4. Prototyping and Experimentation:

  • Quick tests: Quickly test Python code snippets against your BigQuery data without leaving the BigQuery environment. This is great for exploratory data analysis or testing different transformations.
  • Library exploration: Experiment with different Python libraries to see how they might be applied to your data.

Example: Sentiment Analysis Preprocessing

Let's say you have a table called product_reviews with a column review_text. You could use run_python to perform basic text preprocessing ahead of sentiment analysis:

SELECT
  review_id,
  bigfunctions.us.run_python(
    -- Raw string (r''') so that backslashes in the regex reach Python unchanged
    r'''
    import re
    from snowballstemmer import stemmer
    # Remove punctuation and lowercase
    cleaned = re.sub(r'[^\w\s]', '', text).lower()
    stemmer_en = stemmer('english')
    return ' '.join(stemmer_en.stemWords(cleaned.split()))
    ''',
    -- re is part of the standard library, so only snowballstemmer is installed
    'snowballstemmer',
    TO_JSON(STRUCT(review_text AS text))
  ) AS processed_review_text
FROM
  `your_project.your_dataset.product_reviews`;

This query removes punctuation, lowercases the text, and stems the words, preparing the review_text for further sentiment analysis.
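
The cleaning step of the query above can be checked locally before running it in BigQuery. The sketch below covers only the punctuation removal and lowercasing. Stemming is left out because snowballstemmer is a third-party package, and the helper name `clean_review` is hypothetical.

```python
import re


def clean_review(text):
    """Strip punctuation and lowercase, as in the first step of the query."""
    return re.sub(r"[^\w\s]", "", text).lower()


print(clean_review("Great product!! Works, as advertised."))
```

Note that `\w` also matches underscores and digits, so those characters are kept.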

Key Considerations:

  • Performance: As noted in the description, run_python is rather slow because a new sandboxed Python environment is created for each query. For production or high-throughput workloads, consider a dedicated bigfunction for your use case, or a native SQL or JavaScript UDF, instead.
  • Security: The sandboxed environment limits network access and available libraries for security reasons.

This function provides a powerful way to bridge the gap between SQL and Python within BigQuery, enabling more complex data manipulation and analysis directly within your data warehouse. However, be mindful of the performance implications and security constraints.

Spread the word

BigFunctions is fully open-source. Help make it a success by spreading the word!
