faker
Call or Deploy faker
✅ You can call this faker bigfunction directly from your Google Cloud Project (no install required).
- This faker function is deployed in the bigfunctions GCP project, in 39 datasets covering all 39 BigQuery regions. Use the dataset in the same region as your own datasets (otherwise you may get a "function not found" error).
- The function is public, so anyone can call it. Just copy and paste the examples below into your BigQuery console. It just works!
- You may prefer to deploy the BigFunction in your own project if you want to build and manage your own catalog of functions: read Getting Started. This is particularly useful if you want to create private functions (for example, functions calling your internal APIs).
- For any questions or difficulties, please read Getting Started.
- Found a bug? Please raise an issue.
Public BigFunctions datasets are named after their region:
Region | Dataset |
---|---|
eu | bigfunctions.eu |
us | bigfunctions.us |
europe-west1 | bigfunctions.europe_west1 |
asia-east1 | bigfunctions.asia_east1 |
... | ... |
Description
Signature
faker(what, locale)
Description
Generates fake data of type what, localized with the locale parameter (using the Python faker library).
Param | Possible values |
---|---|
what | aba, address, administrative_unit, am_pm, android_platform_token, ascii_company_email, ascii_email, ascii_free_email, ascii_safe_email, bank_country, bban, binary, boolean, bothify, bs, building_number, catch_phrase, century, chrome, city, city_prefix, city_suffix, color, color_name, company, company_email, company_suffix, coordinate, country, country_calling_code, country_code, credit_card_expire, credit_card_full, credit_card_number, credit_card_provider, credit_card_security_code, cryptocurrency, cryptocurrency_code, cryptocurrency_name, csv, currency, currency_code, currency_name, currency_symbol, current_country, current_country_code, date, date_between, date_between_dates, date_object, date_of_birth, date_this_century, date_this_decade, date_this_month, date_this_year, date_time, date_time_ad, date_time_between, date_time_between_dates, date_time_this_century, date_time_this_decade, date_time_this_month, date_time_this_year, day_of_month, day_of_week, dga, domain_name, domain_word, dsv, ean, ean13, ean8, ein, email, emoji, file_extension, file_name, file_path, firefox, first_name, first_name_female, first_name_male, first_name_nonbinary, fixed_width, free_email, free_email_domain, future_date, future_datetime, get_providers, hex_color, hexify, hostname, http_method, iana_id, iban, image_url, internet_explorer, invalid_ssn, ios_platform_token, ipv4, ipv4_network_class, ipv4_private, ipv4_public, ipv6, isbn10, isbn13, iso8601, items, itin, job, json, json_bytes, language_code, language_name, last_name, last_name_female, last_name_male, last_name_nonbinary, latitude, latlng, lexify, license_plate, linux_platform_token, linux_processor, local_latlng, locale, localized_ean, localized_ean13, localized_ean8, location_on_land, longitude, mac_address, mac_platform_token, mac_processor, md5, military_apo, military_dpo, military_ship, military_state, mime_type, month, month_name, msisdn, name, name_female, name_male, name_nonbinary, nic_handle, nic_handles, null_boolean, numerify, opera, paragraph, paragraphs, password, past_date, past_datetime, phone_number, port_number, postalcode, postalcode_in_state, postalcode_plus4, postcode, postcode_in_state, prefix, prefix_female, prefix_male, prefix_nonbinary, pricetag, profile, psv, pybool, pydecimal, pydict, pyfloat, pyint, pyiterable, pylist, pyobject, pyset, pystr, pystr_format, pystruct, pytimezone, pytuple, random_choices, random_digit, random_digit_not_null, random_digit_not_null_or_empty, random_digit_or_empty, random_element, random_elements, random_int, random_letter, random_letters, random_lowercase_letter, random_number, random_sample, random_uppercase_letter, randomize_nb_elements, rgb_color, rgb_css_color, ripe_id, safari, safe_color_name, safe_domain_name, safe_email, safe_hex_color, sbn9, secondary_address, seed_instance, sentence, sentences, sha1, sha256, simple_profile, slug, ssn, state, state_abbr, street_address, street_name, street_suffix, suffix, suffix_female, suffix_male, suffix_nonbinary, swift, swift11, swift8, tar, text, texts, time, time_delta, time_object, time_series, timezone, tld, tsv, unix_device, unix_partition, unix_time, upc_a, upc_e, uri, uri_extension, uri_page, uri_path, url, user_agent, user_name, uuid4, windows_platform_token, word, words, year, zip, zipcode, zipcode_in_state, zipcode_plus4 |
locale | null, ar_AA, ar_AE, ar_BH, ar_EG, ar_JO, ar_PS, ar_SA, az_AZ, bg_BG, bn_BD, bs_BA, cs_CZ, da_DK, de, de_AT, de_CH, de_DE, dk_DK, el_CY, el_GR, en, en_AU, en_CA, en_GB, en_IE, en_IN, en_NZ, en_PH, en_TH, en_US, es, es_AR, es_CA, es_CL, es_CO, es_ES, es_MX, et_EE, fa_IR, fi_FI, fil_PH, fr_BE, fr_CA, fr_CH, fr_FR, fr_QC, ga_IE, he_IL, hi_IN, hr_HR, hu_HU, hy_AM, id_ID, it_CH, it_IT, ja_JP, ka_GE, ko_KR, la, lb_LU, lt_LT, lv_LV, mt_MT, ne_NP, nl_BE, nl_NL, no_NO, or_IN, pl_PL, pt_BR, pt_PT, ro_RO, ru_RU, sk_SK, sl_SI, sq_AL, sv_SE, ta_IN, th, th_TH, tl_PH, tr_TR, tw_GH, uk_UA, vi_VN, zh_CN, zh_TW |
Examples
1. Generate a fake Italian name (run the query that matches the region of your datasets)
select bigfunctions.eu.faker("name", "it_IT")
select bigfunctions.us.faker("name", "it_IT")
select bigfunctions.europe_west1.faker("name", "it_IT")
+------------------+
| fake_data |
+------------------+
| Michela Beccaria |
+------------------+
2. Generate a fake private IPv4 address without specifying a locale (run the query that matches the region of your datasets)
select bigfunctions.eu.faker("ipv4_private", null)
select bigfunctions.us.faker("ipv4_private", null)
select bigfunctions.europe_west1.faker("ipv4_private", null)
+---------------+
| fake_data |
+---------------+
| 10.52.207.187 |
+---------------+
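The examples above each return a single value. To get several fake values in one query, one option is to call the function once per row of a generated series. The query below is a sketch rather than part of the original examples: it assumes the eu dataset, and the column names row_id and fake_name and the row count of 10 are illustrative. It also assumes the function is evaluated separately for every row, so the names should differ from row to row.

```sql
-- Sketch: one fake Italian name per generated row (10 rows).
-- Assumption: replace bigfunctions.eu with the dataset of your own region.
select
  row_id,
  bigfunctions.eu.faker("name", "it_IT") as fake_name
from unnest(generate_array(1, 10)) as row_id
```

Changing the second argument of generate_array changes how many rows come back.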
Use cases
The faker BigQuery function has several practical use cases, mostly centered on generating realistic test data:
- Populating Test Databases: When developing or testing applications that interact with BigQuery, you often need a substantial amount of data to simulate real-world scenarios. Instead of creating this data manually, you can use faker to generate a large volume of realistic fake data of various types, such as names, addresses, emails, and dates. This lets you test your application under realistic conditions.
- Data Anonymization and Privacy: When you need to share data but protect sensitive information, you can use faker to replace real data with plausible fake data, for instance real names with fake names and real addresses with fake addresses. This keeps the structure of the dataset, and the statistical properties of the columns you leave untouched, while protecting individual privacy.
- Demonstrations and Mockups: When demonstrating a new application or creating mockups, you may not have access to real data. In that case, faker provides a quick way to generate realistic data to populate your demos and make them more compelling.
- Load Testing: To test the performance of your BigQuery queries and applications under stress, you can use faker to generate large datasets with specific characteristics. This helps you identify bottlenecks and optimize your queries.
- Training Machine Learning Models: Some machine learning models require large amounts of training data. You can use faker to supplement real data, or even to generate entirely synthetic datasets, when real data is scarce or expensive to obtain.
- Data Analysis and Exploration: When exploring a new dataset or developing new analysis techniques, you can use faker to generate datasets with known properties. This lets you test your analysis methods and understand how they perform under different conditions.
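As a minimal sketch of the anonymization use case, the query below swaps two sensitive columns for fake values and leaves every other column as it is. The table my_project.my_dataset.customers and its columns full_name and email are hypothetical, and the eu dataset is assumed; only the faker calls themselves come from this page.

```sql
-- Sketch: replace sensitive columns with fake values.
-- Hypothetical table and column names; adapt them to your own schema.
select
  * replace (
    bigfunctions.eu.faker("name", "fr_FR") as full_name,
    bigfunctions.eu.faker("safe_email", null) as email
  )
from `my_project.my_dataset.customers`
```

Because the fake values are presumably generated at random on each call, the same real name would map to different fake names on different rows or runs. Joins or group-bys on the replaced columns would therefore not survive this kind of replacement.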
Example Scenario:
Imagine you are developing a new e-commerce application and need to test its reporting features. You could use faker to generate a dataset of fake customer orders with realistic order dates, product names, prices, shipping addresses, and so on. This would allow you to thoroughly test your reporting dashboard and ensure it can handle a large volume of data and accurately calculate metrics like sales by region, average order value, and customer lifetime value.
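A scenario like this could be sketched as a single query that writes fake orders to a table. The destination table my_project.my_dataset.fake_orders, the column names, and the row count of 1,000 are hypothetical, and the destination dataset is assumed to be in the same region as bigfunctions.eu. Every what value comes from the list on this page; catch_phrase is only a stand-in for a product name, since no product type appears in that list.

```sql
-- Sketch: 1,000 fake orders for testing a reporting dashboard.
-- Hypothetical destination table; adapt the region dataset to your own.
create or replace table `my_project.my_dataset.fake_orders` as
select
  bigfunctions.eu.faker("uuid4", null) as order_id,
  bigfunctions.eu.faker("date_this_year", null) as order_date,
  bigfunctions.eu.faker("name", "en_US") as customer_name,
  bigfunctions.eu.faker("catch_phrase", "en_US") as product_name,  -- stand-in for a product name
  bigfunctions.eu.faker("pricetag", "en_US") as price,
  bigfunctions.eu.faker("address", "en_US") as shipping_address
from unnest(generate_array(1, 1000))
```

This page does not state the return type of each call, so the generated columns may need parsing or casting before you compute metrics such as average order value.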
By leveraging the various data types and locales supported by faker, you can tailor the generated data to your specific needs and create highly realistic test scenarios.