
detect_sensitive_info

detect_sensitive_info(text)

Description

Detect sensitive information in text using Cloud Data Loss Prevention (DLP).

Usage

Call or deploy detect_sensitive_info?
Call detect_sensitive_info directly

The easiest way to use BigFunctions

  • The detect_sensitive_info function is deployed in 39 public datasets, one for each of the 39 BigQuery regions.
  • Anyone can call it. Just copy and paste the examples below into your BigQuery console. It just works!
  • (Use the dataset in the same region as your own datasets; otherwise you may get a "function not found" error.)

Public BigFunctions Datasets

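+--------------+---------------------------+
| Region       | Dataset                   |
+--------------+---------------------------+
| eu           | bigfunctions.eu           |
| us           | bigfunctions.us           |
| europe-west1 | bigfunctions.europe_west1 |
| asia-east1   | bigfunctions.asia_east1   |
| ...          | ...                       |
+--------------+---------------------------+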
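Because the public dataset must match the region of your data, it can help to derive the dataset name programmatically. The Python sketch below assumes that the naming pattern visible in the table (lowercase region, hyphens replaced by underscores) holds for every region; that pattern is inferred from the rows listed, not documented here. The helpers `public_dataset_for_region` and `detect_sensitive_info_query` are hypothetical.

```python
# Sketch: pick the public BigFunctions dataset for a BigQuery region.
# ASSUMPTION: the naming pattern (lowercase region, "-" replaced by "_")
# is inferred from the rows of the public datasets table.


def public_dataset_for_region(region: str) -> str:
    """Return the public dataset, e.g. "europe-west1" -> "bigfunctions.europe_west1"."""
    return "bigfunctions." + region.strip().lower().replace("-", "_")


def detect_sensitive_info_query(region: str) -> str:
    """Build a query calling detect_sensitive_info on a named parameter @text.

    Passing the text as a query parameter avoids escaping quotes or
    newlines inside a SQL string literal.
    """
    return f"select {public_dataset_for_region(region)}.detect_sensitive_info(@text)"


if __name__ == "__main__":
    print(detect_sensitive_info_query("europe-west1"))
    # select bigfunctions.europe_west1.detect_sensitive_info(@text)
```

The query uses a named parameter `@text` instead of an inline string literal, so the input never needs escaping. With the `google-cloud-bigquery` client, the value could be supplied through `bigquery.QueryJobConfig(query_parameters=[bigquery.ScalarQueryParameter("text", "STRING", value)])` passed to `client.query(...)`.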
Deploy detect_sensitive_info in your project

Why deploy?

  • You may prefer to deploy detect_sensitive_info in your own project to build and manage your own catalog of functions.
  • This is particularly useful if you want to create private functions (for example, functions that call your internal APIs).
  • Get started by reading the framework page.

Deployment

The detect_sensitive_info function can be deployed with:

pip install bigfunctions
bigfun get detect_sensitive_info
bigfun deploy detect_sensitive_info

Examples

1. String containing an email address.

select bigfunctions.eu.detect_sensitive_info("My email is shivam@google.co.in")
select bigfunctions.us.detect_sensitive_info("My email is shivam@google.co.in")
select bigfunctions.europe_west1.detect_sensitive_info("My email is shivam@google.co.in")
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| sensitive_info                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| [{"string": "shivam", "info_type": "PERSON_NAME", "confidence": "POSSIBLE"}, {"string": "shivam", "info_type": "FIRST_NAME", "confidence": "POSSIBLE"}, {"string": "shivam", "info_type": "FEMALE_NAME", "confidence": "POSSIBLE"}, {"string": "shivam", "info_type": "MALE_NAME", "confidence": "POSSIBLE"}, {"string": "google", "info_type": "ORGANIZATION_NAME", "confidence": "POSSIBLE"}, {"string": "shivam@google.co.in", "info_type": "EMAIL_ADDRESS", "confidence": "VERY_LIKELY"}, {"string": "google.co.in", "info_type": "DOMAIN_NAME", "confidence": "LIKELY"}] |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
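The output above reports the same substring several times with low-confidence types ("shivam" is tagged as four different name types at `POSSIBLE`). The minimal Python sketch below keeps only stronger findings. It assumes the result has been fetched as a JSON string with the `string`, `info_type` and `confidence` keys shown above, and that `confidence` takes one of the Cloud DLP likelihood values; `findings_at_least` is a hypothetical helper, not part of BigFunctions.

```python
import json

# Cloud DLP likelihood levels, from least to most likely.
LIKELIHOOD_ORDER = ["VERY_UNLIKELY", "UNLIKELY", "POSSIBLE", "LIKELY", "VERY_LIKELY"]


def findings_at_least(result_json: str, min_confidence: str = "LIKELY") -> list:
    """Keep only findings whose confidence is at or above min_confidence."""
    threshold = LIKELIHOOD_ORDER.index(min_confidence)
    return [
        finding
        for finding in json.loads(result_json)
        if LIKELIHOOD_ORDER.index(finding["confidence"]) >= threshold
    ]


# Output of example 1, copied from the result table.
EXAMPLE_1 = (
    '[{"string": "shivam", "info_type": "PERSON_NAME", "confidence": "POSSIBLE"}, '
    '{"string": "shivam", "info_type": "FIRST_NAME", "confidence": "POSSIBLE"}, '
    '{"string": "shivam", "info_type": "FEMALE_NAME", "confidence": "POSSIBLE"}, '
    '{"string": "shivam", "info_type": "MALE_NAME", "confidence": "POSSIBLE"}, '
    '{"string": "google", "info_type": "ORGANIZATION_NAME", "confidence": "POSSIBLE"}, '
    '{"string": "shivam@google.co.in", "info_type": "EMAIL_ADDRESS", "confidence": "VERY_LIKELY"}, '
    '{"string": "google.co.in", "info_type": "DOMAIN_NAME", "confidence": "LIKELY"}]'
)

if __name__ == "__main__":
    for finding in findings_at_least(EXAMPLE_1):
        print(finding["info_type"], finding["string"])
    # EMAIL_ADDRESS shivam@google.co.in
    # DOMAIN_NAME google.co.in
```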

2. String containing a phone number.

select bigfunctions.eu.detect_sensitive_info("My phone number is 0123456789")
select bigfunctions.us.detect_sensitive_info("My phone number is 0123456789")
select bigfunctions.europe_west1.detect_sensitive_info("My phone number is 0123456789")
+---------------------------------------------------------------------------------+
| sensitive_info                                                                  |
+---------------------------------------------------------------------------------+
| [{"string": "0123456789", "info_type": "PHONE_NUMBER", "confidence": "LIKELY"}] |
+---------------------------------------------------------------------------------+

Use cases

The detect_sensitive_info BigQuery function, which relies on Google Cloud DLP, has several practical use cases, particularly when working with large datasets stored in BigQuery:

1. Data Discovery and Classification:

  • Understanding Data Content: Before applying specific data governance policies or anonymization techniques, you need to know what sensitive data you have. detect_sensitive_info can scan through text fields in your BigQuery tables to identify various types of sensitive information like PII (Personally Identifiable Information), including names, email addresses, phone numbers, credit card numbers, and more.
  • Compliance Auditing: Regularly scanning your data with this function supports compliance with data privacy regulations such as GDPR, CCPA and HIPAA by showing where regulated data is stored. You can then identify potential violations and take corrective action.
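For discovery, per-row results can be rolled up into a count of rows per info type. This Python sketch assumes each row's result arrives as a JSON string shaped like the examples above, with NULL (or an empty array) when nothing is found; the helper `info_type_summary` and the sample rows are illustrative only.

```python
import json
from collections import Counter
from typing import Iterable, Optional


def info_type_summary(results: Iterable[Optional[str]]) -> Counter:
    """Count, for each info type, how many rows contain at least one finding of it."""
    counts = Counter()
    for result_json in results:
        if not result_json:  # NULL or empty result: nothing to count
            continue
        # Use a set so a type found twice in the same row is counted once.
        counts.update({finding["info_type"] for finding in json.loads(result_json)})
    return counts


if __name__ == "__main__":
    # One detect_sensitive_info result per row of a hypothetical text column.
    rows = [
        '[{"string": "0123456789", "info_type": "PHONE_NUMBER", "confidence": "LIKELY"}]',
        "[]",
        None,
    ]
    print(info_type_summary(rows).most_common())
    # [('PHONE_NUMBER', 1)]
```

Counting each type once per row answers "how many records contain an email address", which is usually the figure an audit needs, rather than the raw number of matches. The same roll-up could be done in SQL; Python is used here only to keep the example self-contained.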

2. Data Masking and Anonymization:

  • Pre-processing for Data Sharing: Before sharing datasets with third parties or making them publicly available, use detect_sensitive_info to pinpoint sensitive data. Then, you can apply appropriate masking or anonymization techniques (like redaction or pseudonymization) based on the detected information types.
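A simple form of redaction is to replace every detected string with a placeholder naming its info type. The Python sketch below does exact substring replacement and assumes the JSON output format shown in the examples; `mask_findings` is a hypothetical helper, not Cloud DLP's own de-identification feature.

```python
import json

# Cloud DLP likelihood levels, from least to most likely.
LIKELIHOOD_ORDER = ["VERY_UNLIKELY", "UNLIKELY", "POSSIBLE", "LIKELY", "VERY_LIKELY"]


def mask_findings(text: str, result_json: str, min_confidence: str = "LIKELY") -> str:
    """Replace each detected string with an [INFO_TYPE] placeholder."""
    threshold = LIKELIHOOD_ORDER.index(min_confidence)
    findings = [
        f
        for f in json.loads(result_json)
        if LIKELIHOOD_ORDER.index(f["confidence"]) >= threshold
    ]
    # Longest strings first, so a finding nested inside another one
    # (e.g. a domain inside an email address) does not split the outer match.
    for finding in sorted(findings, key=lambda f: len(f["string"]), reverse=True):
        text = text.replace(finding["string"], f"[{finding['info_type']}]")
    return text


if __name__ == "__main__":
    findings = '[{"string": "0123456789", "info_type": "PHONE_NUMBER", "confidence": "LIKELY"}]'
    print(mask_findings("My phone number is 0123456789", findings))
    # My phone number is [PHONE_NUMBER]
```

The output shown in the examples contains only the matched strings, not their positions, so this sketch replaces every occurrence of a matched string. Replacing longer matches first means that, in example 1, `google.co.in` cannot break up the surrounding email address before it is masked.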

3. Security Monitoring and Threat Detection:

  • Identifying Data Breaches: Implement continuous monitoring by periodically running detect_sensitive_info on critical datasets. Unusual patterns or sudden appearances of sensitive information in unexpected locations might indicate a data breach or unauthorized access.
  • Vulnerability Assessment: By scanning data entering your BigQuery tables, you can assess vulnerabilities related to sensitive data exposure. For example, if a free-text field intended for product descriptions suddenly contains credit card numbers, it indicates a potential security flaw in your data ingestion process.
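The product-description scenario can be expressed as a per-column allow-list: any finding whose info type is not expected for that column is flagged. In this Python sketch the column names, the `ALLOWED_INFO_TYPES` mapping and the `unexpected_findings` helper are all hypothetical, and results are assumed to be JSON strings in the format shown in the examples.

```python
import json
from typing import Dict, Iterable, List, Optional, Set, Tuple

# Hypothetical configuration: info types each column is expected to contain.
ALLOWED_INFO_TYPES: Dict[str, Set[str]] = {
    "contact_email": {"EMAIL_ADDRESS", "DOMAIN_NAME"},
    "product_description": set(),  # free text: no sensitive data expected
}


def unexpected_findings(
    rows: Iterable[Tuple[str, str, Optional[str]]],
    allowed: Dict[str, Set[str]] = ALLOWED_INFO_TYPES,
) -> List[Tuple[str, str, str, str]]:
    """Return (row_id, column, info_type, confidence) for findings a column should not contain."""
    alerts = []
    for row_id, column, result_json in rows:
        for finding in json.loads(result_json or "[]"):
            if finding["info_type"] not in allowed.get(column, set()):
                alerts.append((row_id, column, finding["info_type"], finding["confidence"]))
    return alerts


if __name__ == "__main__":
    scanned = [
        ("row-1", "product_description",
         '[{"string": "0123456789", "info_type": "PHONE_NUMBER", "confidence": "LIKELY"}]'),
        ("row-2", "product_description", None),
    ]
    print(unexpected_findings(scanned))
    # [('row-1', 'product_description', 'PHONE_NUMBER', 'LIKELY')]
```

Columns missing from the mapping get an empty allow-list, so every finding in them is flagged; this errs on the side of raising alerts.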

4. Data Governance and Policy Enforcement:

  • Automated Policy Enforcement: Integrate detect_sensitive_info into automated data governance workflows. When sensitive data is detected, trigger alerts, block data ingestion, or automatically apply remediation steps.
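One way to wire detection into a governance workflow is a policy table that maps info types to actions and applies the most severe action triggered by any finding. The policy contents, action names and `decide_action` helper below are hypothetical; the Python sketch assumes the JSON output format shown in the examples.

```python
import json

# Hypothetical policy: action to take when an info type is detected.
POLICY = {
    "EMAIL_ADDRESS": "mask",
    "PHONE_NUMBER": "mask",
    "CREDIT_CARD_NUMBER": "block",
}
# Actions ordered from least to most severe.
SEVERITY = {"allow": 0, "alert": 1, "mask": 2, "block": 3}


def decide_action(result_json: str, policy: dict = POLICY, default: str = "alert") -> str:
    """Return the most severe action triggered by any finding ("allow" if none)."""
    actions = [policy.get(f["info_type"], default) for f in json.loads(result_json)]
    return max(actions, key=SEVERITY.__getitem__, default="allow")


if __name__ == "__main__":
    phone = '[{"string": "0123456789", "info_type": "PHONE_NUMBER", "confidence": "LIKELY"}]'
    print(decide_action(phone))  # mask
    print(decide_action("[]"))  # allow
```

Unlisted info types fall back to `alert`, so a new kind of finding is surfaced rather than silently allowed. In practice this could be combined with a confidence threshold, since low-confidence matches such as `PERSON_NAME` at `POSSIBLE` may be frequent.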

Example Scenario:

Imagine a company storing customer feedback in a BigQuery table. They want to run sentiment analysis on the feedback but need to protect customer privacy.

  1. They use detect_sensitive_info to scan the feedback text column.
  2. The function identifies email addresses and phone numbers mentioned in some feedback entries.
  3. Based on this, they apply a masking function to replace the identified sensitive information with placeholders or pseudonyms before sharing the data with their analytics team.

This ensures that the analytics team can still perform sentiment analysis on the data without having access to the customers' private information.


Need help or found a bug?
Get help using detect_sensitive_info

The community can help! Join the conversation on Slack.

We also provide professional support.

Report a bug about detect_sensitive_info

If the function does not work as expected, please:

  • report a bug so that it can be improved,
  • or start a discussion with the community on Slack.

We also provide professional support.

