

get_github_data

Call or deploy get_github_data?

✅ You can call this get_github_data bigfunction directly from your Google Cloud Project (no install required).

  • The get_github_data function is deployed in the bigfunctions GCP project in 39 datasets, one for each of the 39 BigQuery regions. Use the dataset in the same region as your own datasets; otherwise you may get a "function not found" error.
  • The function is public, so anyone can call it. Copy and paste the examples below into your BigQuery console to run them.
  • You may prefer to deploy the BigFunction in your own project if you want to build and manage your own catalog of functions. This is particularly useful for creating private functions (for example, functions that call your internal APIs). Discover the framework.

Public BigFunctions Datasets:

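+--------------+---------------------------+
| Region       | Dataset                   |
+--------------+---------------------------+
| eu           | bigfunctions.eu           |
| us           | bigfunctions.us           |
| europe-west1 | bigfunctions.europe_west1 |
| asia-east1   | bigfunctions.asia_east1   |
| ...          | ...                       |
+--------------+---------------------------+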


Signature

get_github_data(public_repo, destination_dataset, streams)

Description

Gets data from public_repo into destination_dataset, using the GitHub Airbyte connector with Airbyte-Serverless.

Data is appended in raw format to destination_dataset, in one table per stream. When a stream supports it, data is extracted incrementally: the next execution only retrieves new rows.

Before calling this function, you must create destination_dataset and grant the roles/bigquery.dataEditor role on it to bigfunction@bigfunctions.iam.gserviceaccount.com. You can do this by running:

-- Create Destination Dataset
create schema `your_project.your_dataset`;

-- Grant Access to Destination Dataset
grant `roles/bigquery.dataEditor`
on schema `your_project.your_dataset`
to 'serviceAccount:bigfunction@bigfunctions.iam.gserviceaccount.com';
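To confirm that the grant above took effect, you can query BigQuery's OBJECT_PRIVILEGES metadata view. This is a sketch using placeholder names: replace `region-eu` with the region of your dataset, and note that the LIKE filter is used because the exact format of the grantee string is assumed rather than documented here.

```sql
-- Check which principals hold roles on the destination dataset.
-- Placeholders: your_project, your_dataset, region-eu (use your dataset's region).
select object_name, privilege_type, grantee
from `your_project`.`region-eu`.INFORMATION_SCHEMA.OBJECT_PRIVILEGES
where object_name = 'your_dataset'
  and grantee like '%bigfunction@bigfunctions.iam.gserviceaccount.com';
```

A row with privilege_type roles/bigquery.dataEditor indicates the service account can write to the dataset.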

While the function is running (or after it finishes), you can explore the logs in the table your_project.your_dataset._airbyte_logs.
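After a run, a quick way to see what landed in the destination dataset is to list its tables and preview the log table. This sketch relies only on standard BigQuery metadata views; your_project and your_dataset are placeholders, and because the column layout of _airbyte_logs is not described on this page, the second query selects every column.

```sql
-- List the tables in the destination dataset: one per extracted stream,
-- plus internal tables such as _airbyte_logs.
select table_name, creation_time
from `your_project`.your_dataset.INFORMATION_SCHEMA.TABLES
order by table_name;

-- Preview log rows, capping the number of rows returned.
select *
from `your_project.your_dataset._airbyte_logs`
limit 100;
```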

Examples

1. Get stargazers from the airbytehq/airbyte repository

-- Run the query that matches your dataset's region:
select bigfunctions.eu.get_github_data('airbytehq/airbyte', 'your_project.your_dataset', 'stargazers');
select bigfunctions.us.get_github_data('airbytehq/airbyte', 'your_project.your_dataset', 'stargazers');
select bigfunctions.europe_west1.get_github_data('airbytehq/airbyte', 'your_project.your_dataset', 'stargazers');
+--------+
| result |
+--------+
| ok     |
+--------+

2. Get stargazers and releases from the airbytehq/airbyte repository

-- Run the query that matches your dataset's region:
select bigfunctions.eu.get_github_data('airbytehq/airbyte', 'your_project.your_dataset', 'stargazers, releases');
select bigfunctions.us.get_github_data('airbytehq/airbyte', 'your_project.your_dataset', 'stargazers, releases');
select bigfunctions.europe_west1.get_github_data('airbytehq/airbyte', 'your_project.your_dataset', 'stargazers, releases');
+--------+
| result |
+--------+
| ok     |
+--------+

3. To get the list of available streams, set the streams parameter to null

-- Run the query that matches your dataset's region:
select bigfunctions.eu.get_github_data('airbytehq/airbyte', 'your_project.your_dataset', null);
select bigfunctions.us.get_github_data('airbytehq/airbyte', 'your_project.your_dataset', null);
select bigfunctions.europe_west1.get_github_data('airbytehq/airbyte', 'your_project.your_dataset', null);
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| result                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| issue_timeline_events,assignees,branches,collaborators,comments,commit_comment_reactions,commit_comments,commits,contributor_activity,deployments,events,issue_comment_reactions,issue_events,issue_labels,issue_milestones,issue_reactions,issues,organizations,project_cards,project_columns,projects,pull_request_comment_reactions,pull_request_commits,pull_request_stats,projects_v2,pull_requests,releases,repositories,review_comments,reviews,stargazers,tags,teams,team_members,users,workflows,workflow_runs,workflow_jobs,team_memberships |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
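Because the stream list comes back as a single comma-separated string, it can be easier to read as one row per stream. This is a minimal sketch that only relies on the result format shown above; it assumes the eu dataset, so swap in the dataset that matches your region.

```sql
-- Split the comma-separated list of available streams into one row per stream.
select stream
from unnest(split(
  bigfunctions.eu.get_github_data('airbytehq/airbyte', 'your_project.your_dataset', null),
  ','
)) as stream
order by stream;
```

Any of the returned names can then be passed back, comma-separated, as the streams argument.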

Need help using get_github_data?

The community can help! Join the conversation on Slack.

For professional support, don't hesitate to chat with us.

Found a bug using get_github_data?

If the function does not work as expected, please

  • report a bug so that it can be improved.
  • or start a discussion with the community on Slack.

For professional support, don't hesitate to chat with us.

Use cases

A data analyst wants to analyze the activity on the dbt-labs/dbt-core GitHub repository. They are particularly interested in tracking the number of stargazers over time, identifying key contributors through commits, and understanding the release history. They can use the get_github_data BigQuery function to import this data directly into BigQuery for analysis.

Here's how they would use the function:

  1. Prepare the destination dataset:
-- Create the dataset in the US multi-region, matching the bigfunctions.us dataset used below
CREATE SCHEMA `my_project.github_data` OPTIONS (location = 'US');

-- Grant access to the BigFunctions service account
GRANT `roles/bigquery.dataEditor`
ON SCHEMA `my_project.github_data`
TO 'serviceAccount:bigfunction@bigfunctions.iam.gserviceaccount.com';
  2. Import the data using the function:
SELECT bigfunctions.us.get_github_data('dbt-labs/dbt-core', 'my_project.github_data', 'stargazers, commits, releases');

This call imports data for the specified streams (stargazers, commits, and releases) into tables in the my_project.github_data dataset. For instance, the stargazer data will likely be in a table named my_project.github_data.stargazers.

  3. Analyze the data in BigQuery:

Now the analyst can use standard SQL queries to analyze the imported data. For example:

-- Track stargazer growth over time
SELECT DATE(starred_at) AS star_date, COUNT(*) AS num_stars
FROM `my_project.github_data.stargazers`
GROUP BY star_date
ORDER BY star_date;

-- Identify top contributors
SELECT author.login, COUNT(*) AS num_commits
FROM `my_project.github_data.commits`
GROUP BY author.login
ORDER BY num_commits DESC;

-- Explore release history
SELECT name, tag_name, published_at
FROM `my_project.github_data.releases`
ORDER BY published_at DESC;
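The daily stargazer counts can also be turned into a running total, which is usually what "stargazer growth" charts show. This sketch assumes, like the queries above, that the stargazers table exposes a starred_at timestamp column; it aggregates per day first, then applies a cumulative window sum.

```sql
-- Cumulative stargazer count per day
WITH daily AS (
  SELECT DATE(starred_at) AS star_date, COUNT(*) AS new_stars
  FROM `my_project.github_data.stargazers`
  GROUP BY star_date
)
SELECT
  star_date,
  new_stars,
  -- Running total of stars up to and including each day
  SUM(new_stars) OVER (ORDER BY star_date) AS total_stars
FROM daily
ORDER BY star_date;
```

Aggregating in a CTE before windowing keeps the window function simple and avoids mixing aggregate and analytic logic in one SELECT.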

This use case demonstrates how the get_github_data function simplifies the process of importing and analyzing GitHub repository data within BigQuery, enabling data-driven insights into project activity and community engagement.

Spread the word

BigFunctions is fully open-source. Help make it a success by spreading the word!
