faker

Call or Deploy faker?

✅ You can call this faker bigfunction directly from your Google Cloud Project (no install required).

  • The faker function is deployed in the bigfunctions GCP project in 39 datasets, one for each of the 39 BigQuery regions. Use the dataset in the same region as your own datasets; otherwise you may get a "function not found" error.
  • The function is public, so anyone can call it. Just copy and paste the examples below into your BigQuery console. It just works!
  • You may prefer to deploy the BigFunction in your own project if you want to build and manage your own catalog of functions (read Getting Started). This is particularly useful if you want to create private functions, for example functions that call your internal APIs.
  • For any questions or difficulties, please read Getting Started.
  • Found a bug? Please raise an issue here

Public BigFunctions datasets are named after their region:

Region        Dataset
eu            bigfunctions.eu
us            bigfunctions.us
europe-west1  bigfunctions.europe_west1
asia-east1    bigfunctions.asia_east1
...           ...
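
As a small illustration of the naming pattern in the table, a project whose data lives in asia-east1 would call the function through bigfunctions.asia_east1 (the hyphen in the region name becomes an underscore in the dataset name). The what and locale arguments here are arbitrary picks from the parameter table further down, not a recommended setting.

```sql
-- Data in asia-east1, so the matching public dataset is bigfunctions.asia_east1.
-- "city" and "ja_JP" are arbitrary example arguments.
select bigfunctions.asia_east1.faker("city", "ja_JP")
```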

Description

Signature

faker(what, locale)

Description

Generates fake data of type what, localized according to the locale parameter (using the Faker Python library).

Param Possible values
what aba, address, administrative_unit, am_pm, android_platform_token, ascii_company_email, ascii_email, ascii_free_email, ascii_safe_email, bank_country, bban, binary, boolean, bothify, bs, building_number, catch_phrase, century, chrome, city, city_prefix, city_suffix, color, color_name, company, company_email, company_suffix, coordinate, country, country_calling_code, country_code, credit_card_expire, credit_card_full, credit_card_number, credit_card_provider, credit_card_security_code, cryptocurrency, cryptocurrency_code, cryptocurrency_name, csv, currency, currency_code, currency_name, currency_symbol, current_country, current_country_code, date, date_between, date_between_dates, date_object, date_of_birth, date_this_century, date_this_decade, date_this_month, date_this_year, date_time, date_time_ad, date_time_between, date_time_between_dates, date_time_this_century, date_time_this_decade, date_time_this_month, date_time_this_year, day_of_month, day_of_week, dga, domain_name, domain_word, dsv, ean, ean13, ean8, ein, email, emoji, file_extension, file_name, file_path, firefox, first_name, first_name_female, first_name_male, first_name_nonbinary, fixed_width, free_email, free_email_domain, future_date, future_datetime, get_providers, hex_color, hexify, hostname, http_method, iana_id, iban, image_url, internet_explorer, invalid_ssn, ios_platform_token, ipv4, ipv4_network_class, ipv4_private, ipv4_public, ipv6, isbn10, isbn13, iso8601, items, itin, job, json, json_bytes, language_code, language_name, last_name, last_name_female, last_name_male, last_name_nonbinary, latitude, latlng, lexify, license_plate, linux_platform_token, linux_processor, local_latlng, locale, localized_ean, localized_ean13, localized_ean8, location_on_land, longitude, mac_address, mac_platform_token, mac_processor, md5, military_apo, military_dpo, military_ship, military_state, mime_type, month, month_name, msisdn, name, name_female, name_male, name_nonbinary, nic_handle, nic_handles, null_boolean, numerify, opera, paragraph, paragraphs, password, past_date, past_datetime, phone_number, port_number, postalcode, postalcode_in_state, postalcode_plus4, postcode, postcode_in_state, prefix, prefix_female, prefix_male, prefix_nonbinary, pricetag, profile, psv, pybool, pydecimal, pydict, pyfloat, pyint, pyiterable, pylist, pyobject, pyset, pystr, pystr_format, pystruct, pytimezone, pytuple, random_choices, random_digit, random_digit_not_null, random_digit_not_null_or_empty, random_digit_or_empty, random_element, random_elements, random_int, random_letter, random_letters, random_lowercase_letter, random_number, random_sample, random_uppercase_letter, randomize_nb_elements, rgb_color, rgb_css_color, ripe_id, safari, safe_color_name, safe_domain_name, safe_email, safe_hex_color, sbn9, secondary_address, seed_instance, sentence, sentences, sha1, sha256, simple_profile, slug, ssn, state, state_abbr, street_address, street_name, street_suffix, suffix, suffix_female, suffix_male, suffix_nonbinary, swift, swift11, swift8, tar, text, texts, time, time_delta, time_object, time_series, timezone, tld, tsv, unix_device, unix_partition, unix_time, upc_a, upc_e, uri, uri_extension, uri_page, uri_path, url, user_agent, user_name, uuid4, windows_platform_token, word, words, year, zip, zipcode, zipcode_in_state, zipcode_plus4
locale null, ar_AA, ar_AE, ar_BH, ar_EG, ar_JO, ar_PS, ar_SA, az_AZ, bg_BG, bn_BD, bs_BA, cs_CZ, da_DK, de, de_AT, de_CH, de_DE, dk_DK, el_CY, el_GR, en, en_AU, en_CA, en_GB, en_IE, en_IN, en_NZ, en_PH, en_TH, en_US, es, es_AR, es_CA, es_CL, es_CO, es_ES, es_MX, et_EE, fa_IR, fi_FI, fil_PH, fr_BE, fr_CA, fr_CH, fr_FR, fr_QC, ga_IE, he_IL, hi_IN, hr_HR, hu_HU, hy_AM, id_ID, it_CH, it_IT, ja_JP, ka_GE, ko_KR, la, lb_LU, lt_LT, lv_LV, mt_MT, ne_NP, nl_BE, nl_NL, no_NO, or_IN, pl_PL, pt_BR, pt_PT, ro_RO, ru_RU, sk_SK, sl_SI, sq_AL, sv_SE, ta_IN, th, th_TH, tl_PH, tr_TR, tw_GH, uk_UA, vi_VN, zh_CN, zh_TW

Examples

1. Generate a fake Italian name

-- Run only the line whose dataset matches your region:
select bigfunctions.eu.faker("name", "it_IT")
select bigfunctions.us.faker("name", "it_IT")
select bigfunctions.europe_west1.faker("name", "it_IT")
+------------------+
| fake_data        |
+------------------+
| Michela Beccaria |
+------------------+

2. Generate a fake private IPv4 address (without specifying a locale)

-- Run only the line whose dataset matches your region:
select bigfunctions.eu.faker("ipv4_private", null)
select bigfunctions.us.faker("ipv4_private", null)
select bigfunctions.europe_west1.faker("ipv4_private", null)
+---------------+
| fake_data     |
+---------------+
| 10.52.207.187 |
+---------------+

Use cases

This faker BigQuery function has several practical use cases, primarily centered around generating realistic test data:

  1. Populating Test Databases: When developing or testing applications that interact with BigQuery, you often need a substantial amount of data to simulate real-world scenarios. Instead of manually creating this data, you can use faker to automatically generate a large volume of realistic fake data for various data types like names, addresses, emails, dates, etc. This ensures your application is tested under realistic conditions.

  2. Data Anonymization and Privacy: When you need to share data but protect sensitive information, faker can replace real values with plausible fake ones, for instance real names with fake names or real addresses with fake addresses. The replaced columns keep a realistic format and the untouched columns keep their original values, but the fake values are generated at random from what and locale only, so they do not preserve the distribution of the original values or their relationships to other columns.

  3. Demonstrations and Mockups: When demonstrating a new application or creating mockups, you may not have access to real data. faker provides a quick and easy way to generate realistic data to populate your demos and make them more compelling.

  4. Load Testing: To test the performance of your BigQuery queries and applications under stress, you can use faker to generate large datasets with specific characteristics. This helps you identify potential bottlenecks and optimize your queries for better performance.

  5. Training Machine Learning Models: Some machine learning models require large amounts of data for training. faker can supplement real data or even be used to generate entirely synthetic datasets for training purposes, especially when real data is scarce or expensive to obtain.

  6. Data Analysis and Exploration: When exploring a new dataset or developing new data analysis techniques, faker can be used to generate datasets with known properties. This allows you to test your analysis methods and understand how they perform under different conditions.
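
Use case 2 in the list above can be sketched as follows. The table `my-project.my_dataset.customers` and its columns customer_name and email are hypothetical. The query keeps every other column and overwrites the two sensitive ones with fake values. It assumes the table lives in the eu multi-region and that faker is evaluated once per row.

```sql
-- Hypothetical source table and column names; adjust to your own schema.
select
  * except (customer_name, email),
  bigfunctions.eu.faker("name", "fr_FR") as customer_name,  -- fake French-style name
  bigfunctions.eu.faker("safe_email", null) as email         -- fake email, default locale
from `my-project.my_dataset.customers`
```

Because faker takes no input value, a real customer who appears in two rows would generally receive two different fake names, so this approach suits sharing samples more than preserving joins.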

Example Scenario:

Imagine you are developing a new e-commerce application and need to test its reporting features. You could use faker to generate a dataset of fake customer orders with realistic order dates, product names, prices, shipping addresses, and so on. This would allow you to thoroughly test your reporting dashboard and ensure it can handle a large volume of data and accurately calculate metrics like sales by region, average order value, and customer lifetime value.
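
A minimal sketch of this scenario, assuming the eu dataset: generate_array produces 1,000 row numbers and each row gets its own set of fake values. The column names are hypothetical, and product names are left out because the what list has no obvious product type. Depending on the type the function returns, values such as the date and the price may need casting before they are useful in a report.

```sql
-- Sketch: 1,000 fake orders; column names are hypothetical.
select
  order_id,
  bigfunctions.eu.faker("name", "en_GB") as customer_name,
  bigfunctions.eu.faker("address", "en_GB") as shipping_address,
  bigfunctions.eu.faker("date_this_year", null) as order_date,
  bigfunctions.eu.faker("pricetag", null) as price
from unnest(generate_array(1, 1000)) as order_id
```

Prefixing the query with create table ... as would persist the rows in a dataset of your own for repeated test runs.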

By leveraging the various data types and locales supported by faker, you can tailor the generated data to your specific needs and create highly realistic test scenarios.