Data Modelling Conventions

We need to agree on a set of modelling conventions to use when creating Depot schemas for each component. The naming needs to feel consistent when viewed at project level and give the impression that it was designed by a single hand, even when teams are working independently.

Schemas

General

  • Use singular names for objects

Description

Include a meaningful description with every schema.

example.report.Item:
  type: object
  description: Report the details, known references and royalty value for an item over a time window

Properties

General

  • Keep the property names as short as possible without losing the meaning

ID

When the primary key for an object schema is also the unique identifier for the entity, use the special id field that is automatically added to every object schema. There’s no need to declare it, although you can for clarity.

example.object.Work:
  type: object
  properties:
    title:
      type: string
    status:
      type: string

When the primary key is a composite key, name each component property in the form <entity>Id (for example batchId and objectId) and explicitly declare the id.

public.object.ReportItem:
  type: object
  id:
    expression: this.batchId + '_' + this.objectId
  properties:
    batchId:
      type: string
    objectId:
      type: string

You can generate an ID using a UUID with a prefix:

example.object.WellIdentifiedThing:
  type: object
  id:
    expression: "'example:thing:' + sys.uuidv4()"
  properties:
    title:
      type: string
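
To make the two ID patterns above concrete, here is a small Python sketch of the strings those expressions would produce. It is not Depot code: the helper names are hypothetical, and it assumes sys.uuidv4() yields a standard random UUID string, which this page does not state.

```python
import uuid


def composite_id(batch_id: str, object_id: str) -> str:
    """Mirror of the expression: this.batchId + '_' + this.objectId"""
    return batch_id + "_" + object_id


def prefixed_uuid(prefix: str = "example:thing:") -> str:
    """Mirror of: 'example:thing:' + sys.uuidv4() (assumed to be a random UUID)."""
    return prefix + str(uuid.uuid4())


if __name__ == "__main__":
    print(composite_id("batch-7", "obj-42"))  # batch-7_obj-42
    print(prefixed_uuid())                    # example:thing:<random uuid>
```

One thing the sketch makes visible: a composite ID is only unambiguous if the separator cannot occur inside the parts. For example, composite_id("a_b", "c") and composite_id("a", "b_c") both give "a_b_c".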

Optional / Required

Properties are optional unless explicitly declared as mandatory by adding ! to the type. Before declaring a field as optional, consider whether it would make life easier for consumers of the data if the field were mandatory with a default value.

duration:
  type: string?  # explicitly optional
duration1:
  type: string  # implicitly optional
duration3:
  type: string!  # required
duration4:
  type: string
  required: true
value:
  type: number
  default: 0
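
The advice to prefer a mandatory field with a default is easiest to see from the consumer's side. The sketch below is plain Python, not Depot: the two record classes are hypothetical stand-ins for rows read from a schema where value is optional and from one where it defaults to 0.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OptionalValueItem:
    # Stand-in for a schema where value is optional and may be missing.
    value: Optional[float] = None


@dataclass
class DefaultedValueItem:
    # Stand-in for a schema where value is always present and defaults to 0.
    value: float = 0.0


def total_optional(items: list[OptionalValueItem]) -> float:
    # Every consumer has to remember to skip the missing values.
    return sum(i.value for i in items if i.value is not None)


def total_defaulted(items: list[DefaultedValueItem]) -> float:
    # No special casing is needed when the field always has a value.
    return sum(i.value for i in items)
```

Both functions return the same total for equivalent data, but only the first one breaks if a consumer forgets the None check.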

Dates and times

Use the appropriate date and datetime types, which conform to the ISO 8601 standard.

startTime:
  description: The start of the time window
  type: datetime
endTime:
  description: The end of the time window
  type: datetime
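
As an illustration of why ISO 8601 values are convenient for consumers, the following Python sketch parses a time window with the standard library only. The sample timestamps are made up, and whether Depot requires a timezone offset on datetime values is not stated here, so the explicit +00:00 is an assumption.

```python
from datetime import date, datetime

# Explicit UTC offsets are used because datetime.fromisoformat only accepts
# a trailing "Z" from Python 3.11 onwards.
start = datetime.fromisoformat("2024-01-01T00:00:00+00:00")
end = datetime.fromisoformat("2024-01-31T23:59:59+00:00")
day = date.fromisoformat("2024-01-01")

# Parsed values support arithmetic and comparison directly.
window = end - start
print(start.isoformat())  # 2024-01-01T00:00:00+00:00
print(window.days)        # 30
print(start < end)        # True
```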

Namespace

We’re frequently dealing with identifiers for the same entities from different providers, each with its own ID scheme. It’s essential that we take a consistent approach to storing identifiers so that we can link across components.

  • Use the common list of project-wide namespaces
  • Namespaces are lowercase with hyphens
  • All identifiers are prefixed with their namespace and a colon (namespace:identifier)
  • Where a provider’s formatting isn’t always consistent, normalize the ID before storing it (e.g. ISWC)

Info: When we include the namespace as part of the identifier we can join on a single field, rather than always having to join on both namespace and identifier.

Info: See the Platform > Data Standard page on Confluence.

Examples

Type | Value | ID
ExampleObject | EO-12345 | eo:12345
Work Object | 4db32dz5G2dXf2 | wo:4db32dz5G2dXf2
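
The namespace rules above could be enforced with a small helper along these lines. This is a hedged sketch, not project code: the NAMESPACES set, the iswc namespace name and the compact ISWC form (T followed by ten digits, no punctuation) are assumptions made for illustration. The real namespace list and normalization rules live in the data standard.

```python
import re

# Hypothetical project-wide namespace list; the real one is maintained elsewhere.
NAMESPACES = {"eo", "wo", "iswc"}

# Lowercase words or digits separated by single hyphens.
NAMESPACE_RE = re.compile(r"[a-z0-9]+(-[a-z0-9]+)*")


def normalize_iswc(raw: str) -> str:
    """Reduce display variants such as 'T-034.524.680-1' to 'T0345246801'."""
    compact = re.sub(r"[^0-9A-Za-z]", "", raw).upper()
    if not re.fullmatch(r"T\d{10}", compact):
        raise ValueError(f"not an ISWC: {raw!r}")
    return compact


def namespaced_id(namespace: str, identifier: str) -> str:
    """Build 'namespace:identifier' for a known, well-formed namespace."""
    if namespace not in NAMESPACES or not NAMESPACE_RE.fullmatch(namespace):
        raise ValueError(f"unknown or malformed namespace: {namespace!r}")
    return f"{namespace}:{identifier}"


if __name__ == "__main__":
    print(namespaced_id("iswc", normalize_iswc("T-034.524.680-1")))  # iswc:T0345246801
    print(namespaced_id("wo", "4db32dz5G2dXf2"))                     # wo:4db32dz5G2dXf2
```

Because both spellings of the same ISWC normalize to one string, two components that store the namespaced ID can be joined on that single field.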

Constraints

Use constraints wherever possible to ensure that data is formatted correctly and falls within the expected bounds.

value:
  type: number
  constraints:
    - min: 0
duration:
  type: string
  constraints:
    - pattern: PT\dM\d(\.\d+)?S
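
To show what the two constraints above accept and reject, here is a Python sketch that applies the same rules outside Depot. It assumes that min is inclusive and that pattern must match the whole string. Neither is confirmed on this page, and check_item is a hypothetical helper.

```python
import re

# Same pattern as the duration constraint above.
DURATION_RE = re.compile(r"PT\dM\d(\.\d+)?S")


def check_item(value: float, duration: str) -> list[str]:
    """Return a list of constraint violations (empty when the item is valid)."""
    errors = []
    if value < 0:  # assumed meaning of "min: 0"
        errors.append("value must be >= 0")
    if not DURATION_RE.fullmatch(duration):  # assumed full-match semantics
        errors.append("duration does not match pattern")
    return errors


if __name__ == "__main__":
    print(check_item(1.0, "PT3M5.5S"))  # []
    print(check_item(-1, "PT3M5S"))     # ['value must be >= 0']
    print(check_item(0, "PT12M30S"))    # ['duration does not match pattern']
```

Under these assumptions the pattern as written accepts only a single digit for minutes and for whole seconds, so PT12M30S is rejected. Allowing longer values would need \d+ in place of \d.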

Description

Include a meaningful description with every property.

startTime:
  description: The start of the time window
  type: datetime
endTime:
  description: The end of the time window
  type: datetime
value:
  description: The (approximate) royalty value in euros over the time window
  type: number

Packages

Public

Where a schema is part of the interface to your component, use public as the first part of your package name.

public.object.ReportItem:

Warning: Based on our Snowflake naming conventions, this will create the Snowflake schema and table PUBLIC_MATCH.REPORT_TRACK.

This doesn’t exactly match the current conventions, but does keep the Depot schemas relatively tidy and more importantly unique within an environment.

Enums

Values

  • Uppercase
  • Full words separated with underscores

example.report.ExampleRole:
  type: enum
  values:
    - CREATOR
    - EDITOR
    - ADMIN
    - READER
    - GUEST
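
The enum value rules above could be checked mechanically, for example in a review script. The following Python sketch is one possible check. The function name is hypothetical, and it only tests the shape of a value (uppercase letters in underscore-separated groups); it cannot tell whether a value uses full words rather than abbreviations.

```python
import re

# One or more groups of uppercase letters joined by single underscores.
ENUM_VALUE_RE = re.compile(r"[A-Z]+(_[A-Z]+)*")


def is_conventional_enum_value(value: str) -> bool:
    """True when the value is uppercase with underscore-separated words."""
    return ENUM_VALUE_RE.fullmatch(value) is not None


if __name__ == "__main__":
    for candidate in ["CREATOR", "READ_ONLY", "reader", "READ-ONLY"]:
        print(candidate, is_conventional_enum_value(candidate))
```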