run_python¶
Call or Deploy run_python?
✅ You can call this run_python bigfunction directly from your Google Cloud Project (no install required).
- This run_python function is deployed in the bigfunctions GCP project in 39 datasets, one for each of the 39 BigQuery regions. Use the dataset in the same region as your own datasets (otherwise you may get a "function not found" error).
- The function is public, so anyone can call it. Just copy and paste the examples below into your BigQuery console. It just works!
- You may prefer to deploy the BigFunction in your own project if you want to build and manage your own catalog of functions. This is particularly useful if you want to create private functions (for example, functions that call your internal APIs). Discover the framework
Public BigFunctions Datasets:
Region | Dataset |
---|---|
eu | bigfunctions.eu |
us | bigfunctions.us |
europe-west1 | bigfunctions.europe_west1 |
asia-east1 | bigfunctions.asia_east1 |
... | ... |
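The rows above suggest a simple naming rule: the dataset is the region name, with hyphens replaced by underscores, inside the bigfunctions project. Here is a minimal Python sketch of that rule. The helper name dataset_for_region is hypothetical, and the rule is inferred from the four rows listed rather than confirmed for all 39 regions.

```python
def dataset_for_region(region: str) -> str:
    """Return the public BigFunctions dataset name for a BigQuery region.

    Assumption: every region follows the pattern of the rows listed above
    (lowercase region name, hyphens replaced by underscores).
    """
    return "bigfunctions." + region.strip().lower().replace("-", "_")


for region in ("eu", "us", "europe-west1", "asia-east1"):
    print(region, "->", dataset_for_region(region))
```

If the computed dataset does not exist for your region, refer to the full list of public datasets rather than relying on this rule.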
Description¶
Signature
run_python(python_code, requirements, kwargs)
Description
Run any python_code.
For security reasons (sandboxing):
- this function is rather slow (a new Python environment is created for each query). You may prefer to create a dedicated Python function for your use case. You can suggest a new bigfunction here if you want someone to create your function.
- your Python code won't have access to the internet
- not all Python packages can be installed
Param | Possible values |
---|---|
python_code | Arbitrary Python code (indented with 4 spaces). |
requirements | Requirements as you would pass them to pip install (separated by spaces). Note that, for security reasons, not all Python packages can be installed. |
kwargs | A JSON dict of variables. These variables will be defined and usable in your Python code. |
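To make the calling convention concrete, here is a minimal local sketch of how the parameters could fit together. It treats python_code as the body of a function (which would explain why a top-level return works in the examples) and exposes each key of kwargs as a variable to that code. This illustrates the interface only and is not the actual implementation: the names run_python_locally and _user_function are hypothetical, requirements is ignored, and plain exec provides none of the sandboxing described on this page, so do not run untrusted code with it.

```python
import json
import textwrap


def run_python_locally(python_code, kwargs_json=None):
    """Emulate the run_python calling convention locally (no sandbox)."""
    # kwargs arrives as a JSON dict; null (None) means "no variables".
    variables = json.loads(kwargs_json) if kwargs_json else {}

    # Wrap the snippet in a function so that a top-level `return` is valid.
    body = textwrap.dedent(python_code).strip("\n") or "pass"
    source = "def _user_function():\n" + textwrap.indent(body, "    ") + "\n"

    # The kwargs become global names visible to the wrapped function.
    namespace = dict(variables)
    exec(source, namespace)  # NOT sandboxed: for trusted snippets only
    return namespace["_user_function"]()


print(run_python_locally("return sum(range(10))"))  # 45
print(run_python_locally("return text.upper()", json.dumps({"text": "care"})))  # CARE
```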
How sandboxing is done
The provided python_code will run in Pyodide: a Python distribution that runs in a headless Chrome browser.
This simplifies the implementation of:
- isolation between function calls,
- installation of Python packages,
- isolation from the internet.
For every function call, we:
- init a new browser context,
- download Pyodide,
- install the Python packages,
- run the code.
Examples¶
1. Basic Example
select bigfunctions.eu.run_python(
'''
return sum(range(10))
''',
null,
null
)

select bigfunctions.us.run_python(
'''
return sum(range(10))
''',
null,
null
)

select bigfunctions.europe_west1.run_python(
'''
return sum(range(10))
''',
null,
null
)
+--------+
| result |
+--------+
| 45 |
+--------+
2. Some packages, such as pandas, can be installed and used.
select bigfunctions.eu.run_python(
'''
import pandas as pd
return pd.Series(range(10)).sum()
''',
'pandas',
null
)

select bigfunctions.us.run_python(
'''
import pandas as pd
return pd.Series(range(10)).sum()
''',
'pandas',
null
)

select bigfunctions.europe_west1.run_python(
'''
import pandas as pd
return pd.Series(range(10)).sum()
''',
'pandas',
null
)
+--------+
| result |
+--------+
| 45 |
+--------+
3. Replace each word of a text passed as a variable by its stem
select bigfunctions.eu.run_python(
'''
import snowballstemmer
stemmer = snowballstemmer.stemmer('english')
stems = stemmer.stemWords(text.split())
return ' '.join(stems)
''',
'snowballstemmer',
to_json(struct(
'care cared and caring' as text
))
)

select bigfunctions.us.run_python(
'''
import snowballstemmer
stemmer = snowballstemmer.stemmer('english')
stems = stemmer.stemWords(text.split())
return ' '.join(stems)
''',
'snowballstemmer',
to_json(struct(
'care cared and caring' as text
))
)

select bigfunctions.europe_west1.run_python(
'''
import snowballstemmer
stemmer = snowballstemmer.stemmer('english')
stems = stemmer.stemWords(text.split())
return ' '.join(stems)
''',
'snowballstemmer',
to_json(struct(
'care cared and caring' as text
))
)
+--------------------+
| result             |
+--------------------+
| care care and care |
+--------------------+
Need help using run_python?
The community can help! Join the conversation on Slack.
For professional support, don't hesitate to chat with us.
Found a bug using run_python?
If the function does not work as expected, please
- report a bug so that it can be improved,
- or open a discussion with the community on Slack.
For professional support, don't hesitate to chat with us.
Use cases¶
The run_python function allows you to execute arbitrary Python code within BigQuery. Here's a breakdown of potential use cases and how it addresses them:
1. Text Preprocessing/Natural Language Processing (NLP):
- Stemming/Lemmatization: The provided example demonstrates stemming words using the snowballstemmer library. This is useful for NLP tasks like text analysis, where you want to reduce words to their root form (e.g., "running" and "runs" both become "run"). Imagine you have a BigQuery table with product reviews. You could use run_python to stem the review text directly within BigQuery before feeding it into a sentiment analysis model.
- Regular Expressions: You can use Python's re module for complex pattern matching and string manipulation in your data. For instance, extract specific information from text fields, validate data formats, or clean up inconsistent data.
- Other NLP tasks: Tokenization, part-of-speech tagging, named entity recognition, and so on. Any Python NLP library that can be installed in the sandbox can be leveraged.
2. Data Cleaning and Transformation:
- Custom logic: Implement data transformations that are too complex for standard SQL functions. This could include handling missing values in a specific way, recoding variables based on complex criteria, or applying custom business rules.
- Date/Time manipulation: Python's datetime module offers more flexibility than standard SQL for working with dates and times. You might use it to parse dates in unusual formats, calculate time differences, or handle time zones.
- Numerical computations: Perform complex calculations beyond basic arithmetic, for example with the math or NumPy libraries.
3. User-Defined Functions (UDFs) with Python Flexibility:
- Code Reusability: While slower than native SQL or JavaScript UDFs, run_python offers a quick way to prototype UDF-like functionality without a separate deployment step.
- Complex logic encapsulation: Package up complex logic within the function, making your SQL queries cleaner and easier to understand.
4. Prototyping and Experimentation:
- Quick tests: Quickly test Python code snippets against your BigQuery data without leaving the BigQuery environment. This is great for exploratory data analysis or testing different transformations.
- Library exploration: Experiment with different Python libraries to see how they might be applied to your data.
Example: Sentiment Analysis Preprocessing
Let's say you have a table called product_reviews with a column review_text. You could use run_python to perform basic sentiment preprocessing:
SELECT
review_id,
bigfunctions.us.run_python(
'''
import re
from snowballstemmer import stemmer
text = re.sub(r'[^\w\s]', '', text).lower() # Remove punctuation and lowercase
stemmer_en = stemmer('english')
stemmed_text = ' '.join(stemmer_en.stemWords(text.split()))
return stemmed_text
''',
'snowballstemmer',
TO_JSON(STRUCT(review_text as text))
) AS processed_review_text
FROM
`your_project.your_dataset.product_reviews`;
This query removes punctuation, lowercases the text, and stems the words, preparing the review_text for further sentiment analysis.
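Because every run_python call pays the sandbox start-up cost, it can help to check the cleaning logic locally before running the query. The sketch below reproduces only the regular-expression step from the query, using the standard library. The function name clean_text and the sample review are made up for illustration, and the stemming step is left out because it needs the third-party snowballstemmer package.

```python
import re


def clean_text(text: str) -> str:
    # Same expression as in the query: drop every character that is neither
    # a word character nor whitespace, then lowercase the result.
    return re.sub(r"[^\w\s]", "", text).lower()


print(clean_text("Great product, works REALLY well!"))
# great product works really well
```

Note that this expression also removes apostrophes, so "don't" becomes "dont". Adjust the pattern if contractions matter for your analysis.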
Key Considerations:
- Performance: As noted in the documentation, run_python is relatively slow because a new sandboxed Python environment is created for each query. For production-level, high-performance scenarios, consider a dedicated function instead.
- Security: For security reasons, the sandboxed environment has no internet access and restricts which packages can be installed.
This function provides a powerful way to bridge the gap between SQL and Python within BigQuery, enabling more complex data manipulation and analysis directly within your data warehouse. However, be mindful of the performance implications and security constraints.
Spread the word¶
BigFunctions is fully open-source. Help make it a success by spreading the word!