Get Started!
Build a custom data catalog in minutes
1. What is CatalogBuilder?
- CatalogBuilder is a simple tool to generate & deploy a documentation website for your data assets.
- It enables anyone at your company to quickly find the trusted data they are looking for.
2. Why CatalogBuilder?
There are many open-source projects (Amundsen, OpenMetadata, DataHub, Metacat, Atlas) for building such a catalog in-house. But because they offer many advanced features, they are hard to deploy and manage if you're not a tech expert, and even harder to customize.
dbt docs is great for generating a documentation website on top of your dbt assets, but:
- it focuses on dbt only (while you may also want to document other sources and metadata)
- it is very hard to customize (unless you're an Angular expert)
- it can be slow.
CatalogBuilder aims to offer a lightweight alternative for generating a documentation website on top of your data assets. It focuses on read-only data discovery and:
- can be easily customized and deployed by less technical people
- can therefore handle the very specific needs of your company
- is fast and lightweight
- is built on top of the widely used mkdocs-material Python package, which powers the documentation of projects such as FastAPI.
3. Getting Started with the catalog CLI
catalog is the CLI (command-line interface) of CatalogBuilder, used to generate, serve, and deploy the documentation.
3.1 Install the catalog CLI
pip install catalog-builder
3.2 Create your first documentation configuration
catalog download bigquery_public_data
To get started, let's download a catalog configuration example from the GitHub repo and play with it. The command above downloads the catalogs/bigquery_public_data folder to your laptop.
You will find in the folder:
- assets file: a file containing the list of the assets you want to put in your documentation. It can be a Parquet file named assets.parquet or a JSON Lines file named assets.jsonl. Each asset in the file must have the following fields:
  - asset_type: for example table.
  - documentation_path: the path of the asset page in the generated documentation. For example dataset_name/table_name.
  - data: a dict of attributes used to generate the documentation. For example {"name": "foo"}.
- generate_assets_file.py: the Python script used to (re)generate the assets file.
- requirements.txt: the Python requirements needed by generate_assets_file.py.
- templates: a folder which includes a Jinja-template markdown file for each asset_type. These templates are used to generate a markdown documentation file for each asset.
- mkdocs.yml: the mkdocs configuration file used by mkdocs to build the documentation website from the generated markdown files.
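Because each line of assets.jsonl is one JSON object with these three fields, a generate_assets_file.py script mostly boils down to building a list of dicts and writing it out. The sketch below is a minimal illustration under assumptions, not the script shipped in the example folder: write_assets_file and EXAMPLE_ASSETS are hypothetical names, the sample metadata is made up, and only the JSON Lines format is covered (writing assets.parquet would need a library such as pandas or pyarrow).

```python
from __future__ import annotations

import json
from pathlib import Path

# The three fields every asset must have.
REQUIRED_FIELDS = ("asset_type", "documentation_path", "data")

# Made-up example (hypothetical); a real script would read your warehouse metadata.
EXAMPLE_ASSETS = [
    {
        "asset_type": "table",
        "documentation_path": "dataset_name/table_name",
        "data": {"name": "table_name", "description": "An example table"},
    },
]


def write_assets_file(assets: list[dict], path: Path) -> None:
    """Validate the assets, then write them as JSON Lines (one object per line).

    Hypothetical helper for illustration; it is not part of the catalog CLI.
    """
    for asset in assets:
        missing = [field for field in REQUIRED_FIELDS if field not in asset]
        if missing:
            raise ValueError(f"asset {asset!r} is missing fields: {missing}")
        if not isinstance(asset["data"], dict):
            raise TypeError(f"'data' must be a dict, got {type(asset['data']).__name__}")
    # Only open (and truncate) the file once every asset has been validated.
    with path.open("w", encoding="utf-8") as f:
        for asset in assets:
            f.write(json.dumps(asset) + "\n")
```

For example, write_assets_file(EXAMPLE_ASSETS, Path("assets.jsonl")) writes a one-line assets.jsonl. Validating every asset before opening the file means a malformed record raises an error without truncating an existing assets file.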
3.3 Build your catalog website
catalog build bigquery_public_data
- For each asset of the assets file, the Jinja template of its asset_type will be rendered using the asset data to generate a markdown file, which will be written into catalogs/bigquery_public_data/docs/ at documentation_path.
- Mkdocs will then build the documentation website from the markdown files into catalogs/bigquery_public_data/site (using the mkdocs.yml configuration file).
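The rendering step of the build can be approximated in a few lines of Python. This is a sketch under several assumptions, not CatalogBuilder's implementation: it only reads assets.jsonl, it assumes each template is stored as templates/<asset_type>.md, it appends .md to documentation_path, and render_assets is a hypothetical name. To stay dependency-free it also swaps Jinja for Python's built-in string.Template, whose placeholders look like $name; real Jinja templates use {{ name }} and support loops and conditionals.

```python
from __future__ import annotations

import json
from pathlib import Path
from string import Template


def render_assets(catalog_dir: Path) -> list[Path]:
    """Render one markdown page per asset into <catalog_dir>/docs/.

    Assumptions (hypothetical, for illustration): templates live at
    templates/<asset_type>.md and use string.Template placeholders
    instead of Jinja syntax.
    """
    docs_dir = catalog_dir / "docs"
    written = []
    with (catalog_dir / "assets.jsonl").open(encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # tolerate blank lines in the assets file
            asset = json.loads(line)
            # Pick the template matching the asset's type.
            template_path = catalog_dir / "templates" / f"{asset['asset_type']}.md"
            template = Template(template_path.read_text(encoding="utf-8"))
            # Fill the placeholders with the asset's data dict.
            page = template.substitute(asset["data"])
            # Write the page at documentation_path inside docs/.
            out_path = docs_dir / f"{asset['documentation_path']}.md"
            out_path.parent.mkdir(parents=True, exist_ok=True)
            out_path.write_text(page, encoding="utf-8")
            written.append(out_path)
    return written
```

MkDocs then turns the docs/ folder into the static site/ folder; outside of the catalog CLI, the equivalent is running mkdocs build in the folder that contains mkdocs.yml.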
3.4 Run your catalog website locally
catalog serve bigquery_public_data
You can now see the generated documentation website at http://localhost:8000.
3.5 Deploy the documentation website!
A. To deploy on GitHub Pages:
catalog gh-deploy bigquery_public_data
Mkdocs will deploy the site to GitHub Pages (this only works if your catalog lives in a GitHub repository).
B. To deploy elsewhere:
You can follow the deployment instructions from the mkdocs documentation.
4. Generate your dbt documentation
To generate a documentation website for your own dbt project, do the following:
- Change directory to your dbt project directory.
- Download the catalogs/dbt documentation example by running catalog download dbt.
- Run dbt docs generate to compute target/manifest.json and target/catalog.json.
- Generate the assets file by running python catalogs/dbt/generate_assets_file.py. The script will parse target/manifest.json and target/catalog.json to generate the assets file in the expected format.
- Run catalog serve dbt to build the website and show it locally.
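To illustrate what parsing target/manifest.json and target/catalog.json involves, here is a minimal sketch; it is not the generate_assets_file.py from catalogs/dbt. It relies on the standard layout of dbt artifacts: both files have a top-level nodes object keyed by unique ID (for example model.my_project.orders), manifest nodes carry resource_type, name, schema, description and documented columns, and catalog nodes carry the column types read from the warehouse. The function name dbt_assets, the set of documented resource types, the <schema>/<name> documentation path and the shape of data are all hypothetical choices.

```python
from __future__ import annotations

import json
from pathlib import Path

# Resource types treated as documentable assets (a hypothetical choice).
DOCUMENTED_TYPES = ("model", "seed", "snapshot")


def dbt_assets(target_dir: Path) -> list[dict]:
    """Merge dbt's manifest.json and catalog.json into CatalogBuilder-style assets."""
    manifest = json.loads((target_dir / "manifest.json").read_text(encoding="utf-8"))
    catalog = json.loads((target_dir / "catalog.json").read_text(encoding="utf-8"))
    assets = []
    for unique_id, node in manifest["nodes"].items():
        if node["resource_type"] not in DOCUMENTED_TYPES:
            continue  # skip tests, analyses, operations, ...
        # Descriptions come from the manifest (your YAML files); match column
        # names case-insensitively because some warehouses upper-case them.
        described = {name.lower(): col for name, col in node.get("columns", {}).items()}
        # Column types come from catalog.json (what dbt read from the warehouse).
        warehouse_columns = catalog["nodes"].get(unique_id, {}).get("columns", {})
        columns = [
            {
                "name": name,
                "type": col.get("type"),
                "description": described.get(name.lower(), {}).get("description", ""),
            }
            for name, col in warehouse_columns.items()
        ]
        assets.append(
            {
                "asset_type": node["resource_type"],
                "documentation_path": f"{node['schema']}/{node['name']}",
                "data": {
                    "name": node["name"],
                    "description": node.get("description", ""),
                    "columns": columns,
                },
            }
        )
    return assets
```

Run from the dbt project directory, dbt_assets(Path("target")) returns a list that can be written one JSON object per line to produce an assets file. Sources are left out here: dbt stores them under a separate sources key in both files.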
Keep in touch
Join our Slack if you have a question, need help getting started, want to report a bug or suggest improvements, or simply want to chat.
Contribute
Any contribution is more than welcome!
- Add a ⭐ to the repo to show your support
- Join our Slack and talk with us
- Open an issue to report a bug or suggest improvements
- Open a PR!