Get Started!
Build a custom data catalog in minutes
1. What is CatalogBuilder?
- CatalogBuilder is a simple tool to generate & deploy a documentation website for your data assets.
- It enables anyone at your company to quickly find the trusted data they are looking for.
2. Why CatalogBuilder?
There are many open-source projects (Amundsen, OpenMetadata, DataHub, Metacat, Atlas) for building such a catalog in-house. But because they offer many advanced features, they are hard to manage and deploy unless you are a tech expert, and even harder to customize.
dbt docs is great for generating a documentation website on top of your dbt assets, but:
- it focuses on dbt only (while you are also interested in other sources and metadata)
- it is very hard to customize (unless you're an Angular expert)
- it can be slow.
CatalogBuilder aims to offer a lightweight alternative for generating a documentation website on top of your data assets. It focuses on read-only data discovery and:
- can be easily customized and deployed by non-technical people
- can therefore handle the very specific needs of your company
- is fast and lightweight
- is built on top of mkdocs-material, a very popular Python library that many projects (such as FastAPI) use to deploy their documentation.
3. Getting Started with catalog CLI
catalog is the CLI (command-line interface) of CatalogBuilder, used to generate, show and deploy the documentation.
3.1 Install catalog CLI
pip install catalog-builder
3.2 Create your first documentation configuration
catalog download bigquery_public_data
To get started, let's download a catalog configuration example from the GitHub repo and play with it. The command above downloads the catalogs/bigquery_public_data folder to your laptop.
You will find in the folder:
- assets file: a file containing the list of the assets you want to put in your documentation. It can be a parquet file named assets.parquet or a JSON Lines file named assets.jsonl. Each asset in the file must have the following fields:
  - asset_type: for example, table.
  - documentation_path: the path of the asset page in the generated documentation. For example, dataset_name/table_name.
  - data: a dict of attributes used to generate the documentation. For example, {"name": "foo"}.
- generate_assets_file.py: the Python script used to (re)generate the assets file.
- requirements.txt: the Python requirements needed by generate_assets_file.py.
- templates: a folder which includes a Jinja-template markdown file for each asset_type. These templates are used to generate a markdown documentation file for each asset.
- mkdocs.yml: the mkdocs configuration file used by mkdocs to build the documentation website from the generated markdown files.
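To make the expected shape of the assets file concrete, here is a minimal Python sketch that writes assets as JSON Lines, one JSON object per line. It is not the repository's generate_assets_file.py: the function name write_assets_jsonl and the example values are hypothetical, and only the three required fields (asset_type, documentation_path, data) come from the description above.

```python
import json
from pathlib import Path

# Hypothetical example values; only the three field names come from the docs.
example_assets = [
    {
        "asset_type": "table",
        "documentation_path": "dataset_name/table_name",
        "data": {"name": "table_name", "description": "An example table"},
    },
]

REQUIRED_FIELDS = {"asset_type", "documentation_path", "data"}


def write_assets_jsonl(assets, path):
    """Write assets as JSON Lines (one JSON object per line).

    Raises ValueError if an asset lacks a required field and TypeError if
    its "data" attribute is not a dict.
    """
    # Validate everything first so a bad asset never leaves a half-written file.
    for asset in assets:
        missing = REQUIRED_FIELDS - set(asset)
        if missing:
            raise ValueError(f"asset is missing fields: {sorted(missing)}")
        if not isinstance(asset["data"], dict):
            raise TypeError("'data' must be a dict")
    with open(path, "w", encoding="utf-8") as f:
        for asset in assets:
            f.write(json.dumps(asset) + "\n")
```

Calling write_assets_jsonl(example_assets, Path("assets.jsonl")) inside a catalog folder would produce a one-line assets file. Producing assets.parquet instead would need an extra dependency such as pyarrow.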
3.3 Build your catalog website
catalog build bigquery_public_data
- For each asset of the assets file, the Jinja template of its asset_type will be rendered using the asset data to generate a markdown file, which will be written into catalogs/bigquery_public_data/docs/ at documentation_path.
- Mkdocs will then build the documentation website from the markdown files into catalogs/bigquery_public_data/site (using the mkdocs.yml configuration file).
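The rendering step can be pictured with the following sketch. It is an illustration under stated assumptions, not CatalogBuilder's code: it assumes assets are stored in assets.jsonl, that the template for an asset_type is templates/<asset_type>.md, and that each page is written to docs/<documentation_path>.md. To stay dependency-free it swaps Jinja for Python's standard-library string.Template, so placeholders are written $name rather than {{ name }}.

```python
import json
from pathlib import Path
from string import Template


def render_assets(catalog_dir):
    """Render one markdown page per asset into <catalog_dir>/docs/.

    Assumed layout (hypothetical): assets.jsonl, templates/<asset_type>.md,
    output pages at docs/<documentation_path>.md.
    """
    catalog_dir = Path(catalog_dir)
    written = []
    with open(catalog_dir / "assets.jsonl", encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # skip blank lines
            asset = json.loads(line)
            template_path = catalog_dir / "templates" / f"{asset['asset_type']}.md"
            template = Template(template_path.read_text(encoding="utf-8"))
            # substitute() raises KeyError if a placeholder has no key in data.
            page = template.substitute(asset["data"])
            out_path = catalog_dir / "docs" / f"{asset['documentation_path']}.md"
            out_path.parent.mkdir(parents=True, exist_ok=True)
            out_path.write_text(page, encoding="utf-8")
            written.append(out_path)
    return written
```

With a real Jinja template, the rendering line would instead be jinja2.Template(text).render(**asset["data"]). Mkdocs then only has to turn the resulting docs/ folder into the site.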
3.4 Run your catalog website locally
catalog serve bigquery_public_data
You can now see the generated documentation website at http://localhost:8000.
3.5 Deploy the documentation website!
A. To deploy on GitHub pages:
catalog gh-deploy bigquery_public_data
Mkdocs will deploy the site on GitHub Pages (this only works if you are in a GitHub repository).
B. To deploy elsewhere:
You can follow the deployment instructions in the mkdocs documentation.
4. Generate your dbt documentation
To generate a documentation website for your own dbt project, do the following:
- Change directory to your dbt project directory.
- Download the catalogs/dbt documentation example by running catalog download dbt.
- Run dbt docs generate to compute target/manifest.json and target/catalog.json.
- Generate the assets file by running python catalogs/dbt/generate_assets_file.py. The script will parse target/manifest.json and target/catalog.json to generate the assets file in the expected format.
- Run catalog serve dbt to build the website and show it locally.
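As a rough picture of what the dbt script has to do, the sketch below merges the two dbt artifacts into assets. It relies on the layout of dbt artifacts (a "nodes" mapping keyed by unique_id in both manifest.json and catalog.json), but the asset_type value "model", the documentation_path scheme <schema>/<name>, the keys inside data, the function names and the default output path are hypothetical choices; the real catalogs/dbt/generate_assets_file.py may differ.

```python
import json
from pathlib import Path


def build_dbt_assets(manifest, catalog):
    """Merge dbt's manifest.json and catalog.json into a list of assets.

    Hypothetical mapping: only models are kept, asset_type is "model",
    documentation_path is "<schema>/<name>", and the keys in "data" are
    illustrative, not those of the real catalogs/dbt script.
    """
    assets = []
    catalog_nodes = catalog.get("nodes", {})
    for unique_id, node in manifest.get("nodes", {}).items():
        if node.get("resource_type") != "model":
            continue  # skip tests, seeds, snapshots and other node types
        # Descriptions come from the YAML docs (manifest), matched case-insensitively
        # because some warehouses report upper-case column names.
        described = {name.lower(): col for name, col in node.get("columns", {}).items()}
        # Column names and types come from the warehouse (catalog), in table order.
        warehouse_cols = catalog_nodes.get(unique_id, {}).get("columns", {}).values()
        columns = [
            {
                "name": col["name"],
                "type": col.get("type"),
                "description": described.get(col["name"].lower(), {}).get("description", ""),
            }
            for col in sorted(warehouse_cols, key=lambda c: c.get("index", 0))
        ]
        assets.append(
            {
                "asset_type": "model",
                "documentation_path": f"{node['schema']}/{node['name']}",
                "data": {
                    "name": node["name"],
                    "description": node.get("description", ""),
                    "columns": columns,
                },
            }
        )
    return assets


def write_dbt_assets(target_dir="target", output="catalogs/dbt/assets.jsonl"):
    """Read the dbt artifacts from target_dir and write assets as JSON Lines."""
    manifest = json.loads(Path(target_dir, "manifest.json").read_text(encoding="utf-8"))
    catalog = json.loads(Path(target_dir, "catalog.json").read_text(encoding="utf-8"))
    with open(output, "w", encoding="utf-8") as f:
        for asset in build_dbt_assets(manifest, catalog):
            f.write(json.dumps(asset) + "\n")
```

Taking column names and types from catalog.json means undocumented columns still appear; descriptions are filled in only where the YAML docs provide them.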
Keep in touch
Join our Slack to ask a question, get help getting started, report a bug, suggest improvements, or simply have a chat.
Contribute
Any contribution is more than welcome!
- Add a star on the repo to show your support
- Join our Slack and talk with us
- Open an issue to report a bug or suggest improvements
- Open a PR!