Data Modelling Conventions
We need to agree on a set of modelling conventions to use when creating Depot schemas for each component. The naming should be consistent when viewed at the project level and read as if it had been designed by a single hand, even when teams work independently.
Schemas
General
- Use singular names for objects
Description
Include a meaningful description with the schema.
example.report.Item:
  type: object
  description: Report the details, known references and royalty value for an item over a time window
Properties
General
- Keep the property names as short as possible without losing the meaning
ID
When the primary key for an object schema is also the unique identifier for the entity, use the special id field that is automatically added to every object schema. There is no need to declare it, although you can do so for clarity.
example.object.Work:
  type: object
  properties:
    title:
      type: string
    status:
      type: string
When the primary key is a composite key, name each component property in the form entityId (for example batchId and objectId) and explicitly declare the id.
public.object.ReportItem:
  type: object
  id:
    expression: this.batchId + '_' + this.objectId
  properties:
    batchId:
      type: string
    objectId:
      type: string
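For intuition, the id expression above amounts to joining the key parts with an underscore. The following is a minimal Python sketch, assuming the expression is plain string concatenation; `composite_id` is a hypothetical helper for illustration and not part of Depot.

```python
def composite_id(batch_id: str, object_id: str) -> str:
    """Mirror the expression this.batchId + '_' + this.objectId."""
    # Assumption: both parts must be non-empty, otherwise the key is ambiguous.
    if not batch_id or not object_id:
        raise ValueError("both key parts are required")
    return f"{batch_id}_{object_id}"


print(composite_id("batch-42", "wo:4db32dz5G2dXf2"))
```

Note that if either part can itself contain an underscore, two different pairs can produce the same id (`a_b` + `c` and `a` + `b_c`), so it is worth checking that the separator cannot occur in the parts.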
You can generate an ID using a uuid with a prefix:
example.object.WellIdentifiedThing:
  type: object
  id:
    expression: "'example:thing:' + sys.uuidv4()"
  properties:
    title:
      type: string
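The shape of the resulting identifier can be illustrated outside Depot. This Python sketch assumes that `sys.uuidv4()` yields a random version 4 UUID in the usual hyphenated text form; `prefixed_uuid` is a hypothetical name used only here.

```python
import uuid


def prefixed_uuid(prefix: str = "example:thing:") -> str:
    """Build an id of the form '<prefix><uuid4>'."""
    # uuid.uuid4() returns a random version 4 UUID; str() gives the hyphenated form.
    return prefix + str(uuid.uuid4())


new_id = prefixed_uuid()
print(new_id)
```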
Optional / Required
Properties are optional unless explicitly declared as mandatory, either by adding ! to the type or by setting required: true. Rather than declaring a field as optional, consider whether it would make life easier for consumers of the data to make the field mandatory with a default value.
duration:
  type: string?  # explicitly optional
duration1:
  type: string  # implicitly optional
duration3:
  type: string!  # required
duration4:
  type: string
  required: true
value:
  type: number
  default: 0
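The optional/required rules above can be summarised as a small predicate. This is a sketch in Python over plain dictionaries that mirror the example; `is_required` is a hypothetical helper and does not reflect how Depot itself evaluates schemas.

```python
def is_required(prop: dict) -> bool:
    """Return True when a property definition is mandatory under the rules above."""
    type_name = str(prop.get("type", ""))
    # A trailing "!" marks the property as required; "?" or no suffix leaves it optional.
    if type_name.endswith("!"):
        return True
    # The required flag is the alternative spelling shown in the example.
    return prop.get("required") is True


properties = {
    "duration": {"type": "string?"},
    "duration1": {"type": "string"},
    "duration3": {"type": "string!"},
    "duration4": {"type": "string", "required": True},
}
required = [name for name, prop in properties.items() if is_required(prop)]
print(required)  # ['duration3', 'duration4']
```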
Dates and times
Use the appropriate date and datetime types that conform to the ISO 8601 standard.
startTime:
  description: The start of the time window
  type: datetime
endTime:
  description: The end of the time window
  type: datetime
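On the producing side, ISO 8601 text for such fields can be generated with the Python standard library. The sketch below uses made-up window boundaries and assumes timezone-aware UTC values; which ISO 8601 variants Depot accepts (for example a trailing Z) is not stated here and should be checked.

```python
from datetime import datetime, timezone

# Hypothetical time window, expressed as timezone-aware UTC datetimes.
start_time = datetime(2024, 1, 1, 0, 0, 0, tzinfo=timezone.utc)
end_time = datetime(2024, 1, 31, 23, 59, 59, tzinfo=timezone.utc)

# isoformat() produces ISO 8601 text including the UTC offset.
start_text = start_time.isoformat()
end_text = end_time.isoformat()
print(start_text)  # 2024-01-01T00:00:00+00:00
print(end_text)    # 2024-01-31T23:59:59+00:00

# fromisoformat() parses the same text back into an equal datetime.
round_tripped = datetime.fromisoformat(start_text)
```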
Namespace
We frequently deal with identifiers for the same entities from different providers, each with its own ID scheme. We need a consistent approach to storing identifiers so that we can link across components.
- Use the common list of project-wide namespaces
- Namespaces are lowercase with hyphens
- All identifiers are prefixed with namespace:
- Where the source formatting is not always consistent, normalize the ID (e.g. ISWC)
When the namespace is part of the identifier we can join on a single field, rather than always having to join on both namespace and identifier.
See the Platform > Data Standard page on Confluence.
Examples
| Type | Value | ID |
|---|---|---|
| ExampleObject | EO-12345 | eo:12345 |
| Work Object | 4db32dz5G2dXf2 | wo:4db32dz5G2dXf2 |
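The namespace rules can be sketched as a pair of small functions. This is an illustrative Python sketch, not the project's implementation: the `iswc` namespace, the compact `T` plus ten digits form, and the helper names are assumptions, and the authoritative namespace list lives on the Confluence page mentioned above.

```python
import re

# Lowercase words separated by single hyphens; allowing digits is an assumption.
NAMESPACE_RE = re.compile(r"[a-z0-9]+(-[a-z0-9]+)*")


def namespaced_id(namespace: str, value: str) -> str:
    """Prefix a provider identifier with its namespace, e.g. 'wo:4db32dz5G2dXf2'."""
    if NAMESPACE_RE.fullmatch(namespace) is None:
        raise ValueError(f"invalid namespace: {namespace!r}")
    return f"{namespace}:{value}"


def normalize_iswc(raw: str) -> str:
    """Strip separators so 'T-034.524.680-1' and 'T0345246801' compare equal."""
    compact = re.sub(r"[\s.\-]", "", raw).upper()
    if re.fullmatch(r"T\d{10}", compact) is None:
        raise ValueError(f"not an ISWC: {raw!r}")
    return compact


print(namespaced_id("wo", "4db32dz5G2dXf2"))                     # wo:4db32dz5G2dXf2
print(namespaced_id("iswc", normalize_iswc("T-034.524.680-1")))  # iswc:T0345246801
```

Because both spellings of the ISWC normalize to the same string, two components that store the namespaced form can join on that single field.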
Constraints
Use constraints everywhere to ensure that data is formatted correctly and falls within the expected bounds.
value:
  type: number
  constraints:
    - min: 0
duration:
  type: string
  constraints:
    - pattern: PT\dM\d(\.\d+)?S
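It can help to try a constraint against sample values before committing it to a schema. The Python sketch below checks the two constraints from the example; it assumes the pattern must match the whole string, which may differ from how Depot applies it, and the function names are hypothetical.

```python
import re

# The pattern from the example, copied verbatim.
DURATION_PATTERN = re.compile(r"PT\dM\d(\.\d+)?S")


def check_value(value: float) -> bool:
    """Apply the min: 0 constraint."""
    return value >= 0


def check_duration(text: str) -> bool:
    """Apply the pattern constraint, requiring a match over the whole string."""
    return DURATION_PATTERN.fullmatch(text) is not None


print(check_duration("PT3M5.5S"))  # True
print(check_duration("PT12M30S"))  # False: each \d matches exactly one digit
print(check_value(-1))             # False
```

As written, the pattern accepts only single-digit minutes and seconds. If longer durations are valid for a property, the digits would need a quantifier such as `\d+`.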
Description
Include a meaningful description with every property.
startTime:
  description: The start of the time window
  type: datetime
endTime:
  description: The end of the time window
  type: datetime
value:
  description: The (approximate) royalty value in euros over the time window
  type: number
Packages
Public
Where a schema is part of the interface to your component, use public as the first part of your package name.
public.object.ReportItem:
Based on our Snowflake naming conventions, this will create the Snowflake schema and table PUBLIC_MATCH.REPORT_TRACK.
This does not exactly match the current conventions, but it does keep the Depot schemas relatively tidy and, more importantly, unique within an environment.
Enums
Values
- Uppercase
- Full words separated with underscores
example.report.ExampleRole:
  type: enum
  values:
    - CREATOR
    - EDITOR
    - ADMIN
    - READER
    - GUEST
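The casing rule for enum values is easy to check mechanically. This Python sketch mirrors the example enum and tests each value against an uppercase-with-underscores pattern; `follows_convention` is a hypothetical helper, and the "full words" rule still needs a human reviewer.

```python
import re
from enum import Enum


class ExampleRole(Enum):
    """Python mirror of example.report.ExampleRole, for illustration only."""
    CREATOR = "CREATOR"
    EDITOR = "EDITOR"
    ADMIN = "ADMIN"
    READER = "READER"
    GUEST = "GUEST"


# Uppercase words separated by single underscores.
ENUM_VALUE_RE = re.compile(r"[A-Z]+(_[A-Z]+)*")


def follows_convention(value: str) -> bool:
    """Return True when the value is uppercase with underscore separators."""
    return ENUM_VALUE_RE.fullmatch(value) is not None


print(all(follows_convention(role.value) for role in ExampleRole))  # True
print(follows_convention("readOnly"))                               # False
```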