Open Data Contract Standard (ODCS)

The Open Data Contract Standard (ODCS) was formerly known as the Data Contract Template, which PayPal used to specify datasets. Now, it is governed by Bitol, a Linux Foundation AI & Data project.

We are members of Bitol's Technical Steering Committee, and we are committed to supporting the ODCS standard in our products.

Entropy Data supports ODCS for defining data contracts, starting with v3.0.0.

Details of a data contract in Entropy Data

Open Data Contract Standard Example

Let's start with an example of a data contract in ODCS v3 format:

apiVersion: v3.0.0
kind: DataContract

id: c176de03-8503-4859-bd0f-218cc413d958
name: Shipments
version: 1.0.0
domain: checkout
status: development

description:
  purpose: This data can be used for analytical purposes

schema:
- name: my_table
  physicalType: table
  properties:
    - name: shipment_id
      description: Unique identifier for each shipment.
      logicalType: string
      logicalTypeOptions:
        format: uuid
      physicalType: uuid
      primaryKey: true
      examples:
        - 03c35ea7-9a26-475f-a38a-0dad96f6de10
    - name: order_id
      description: Identifier for the order associated with the shipment.
      logicalType: string
      physicalType: text
      required: true
      unique: false
      examples:
        - ORD789012
    - name: delivery_date
      description: "The actual or expected delivery date of the shipment."
      logicalType: date
      physicalType: timestamp_tz
      required: false
      examples:
        - "2024-09-05T17:00:00Z"
      quality:
      - type: text
        description: Must be set when status is "delivered".
    - name: carrier
      description: "The shipping carrier used for the delivery."
      logicalType: string
      physicalType: text
      examples:
        - DHL
        - UPS
    - name: tracking_number
      description: Tracking number provided by the carrier.
      logicalType: string
      logicalTypeOptions:
        minLength: 10
        maxLength: 36
      physicalType: text
      classification: restricted
      quality:
      - rule: duplicateCount
        mustBeLessThan: 1
        unit: percent
      examples:
        - 1Z9999W99999999999
    - name: status
      description: "Current status of the shipment."
      logicalType: string
      physicalType: text
      examples: ["pending", "shipped", "in_transit", "delivered", "returned", "canceled"]
  quality:
    - rule: rowCount
      name: Verify row count range
      mustBeBetween: [1000000, 5000000]

team:
  - username: john.doe@example.com
    role: Data Product Owner

servers:
  - server: production
    environment: production
    type: bigquery
    project: acme_shipments_prod
    dataset: shipments_v1

roles:
  - role: analyst_us_read
    access: read
  - role: analyst_eu_read
    access: read

If you are familiar with the v2 format, you will notice some significant changes. The ODCS v3 format is more flexible and can be used for a broader range of data products.

Content

These are the building blocks of an ODCS data contract:

Fundamentals

The general information about the data contract, such as the ID, name, version, owner, and description.

Schema

The schema specifies the logical and, optionally, the physical representation of the data model. ODCS v3 also supports complex data structures (e.g., JSON and Avro models).
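
As a rough sketch of what a complex structure could look like, the fragment below nests an object and an array inside the table from the example above. The properties destination_address and tracking_events are hypothetical and not part of the original contract. The use of logicalType: object with nested properties, and logicalType: array with items, reflects our reading of the ODCS v3.0.0 specification and should be checked against it.

```yaml
schema:
  - name: my_table
    physicalType: table
    properties:
      # Hypothetical nested object, e.g. taken from a JSON payload
      - name: destination_address
        logicalType: object
        properties:
          - name: city
            logicalType: string
          - name: postal_code
            logicalType: string
      # Hypothetical array whose elements are objects
      - name: tracking_events
        logicalType: array
        items:
          logicalType: object
          properties:
            - name: event_time
              logicalType: date
            - name: location
              logicalType: string
```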

Data Quality

Data quality guarantees can now be defined as plain text, as SQL, or with rules from a maintained library of commonly used predefined quality attributes, such as rowCount, unique, and freshness.
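
The example contract above already shows a plain-text check and two library rules. A SQL check might look like the following sketch, which restates the plain-text rule on delivery_date as a query. The query itself is illustrative. The ${object} placeholder and the mustBe operator follow our understanding of the ODCS v3.0.0 specification, and whether a given tool substitutes the placeholder is an assumption to verify.

```yaml
schema:
  - name: my_table
    quality:
      - type: sql
        description: delivery_date must be set when status is "delivered".
        # Counts delivered shipments that have no delivery date
        query: |
          SELECT COUNT(*) FROM ${object}
          WHERE status = 'delivered' AND delivery_date IS NULL
        mustBe: 0
```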

Pricing

The price that data consumers have to pay for using the data product. Optional.
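
A minimal sketch of a pricing section, assuming the price fields of ODCS v3.0.0 as we understand them. The amount, currency, and unit are made-up values.

```yaml
price:
  priceAmount: 9.95       # illustrative amount
  priceCurrency: USD
  priceUnit: megabyte     # unit the amount is charged per
```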

Servers

The physical location of the data set, such as the actual host, database, and schema. Most technologies and data platforms are supported. Multiple servers can be defined for different environments or data product versions.
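
To illustrate multiple servers, the sketch below extends the servers section of the example contract with a second entry. The development server and its project name acme_shipments_dev are hypothetical. The production entry is taken unchanged from the example above.

```yaml
servers:
  - server: production
    environment: production
    type: bigquery
    project: acme_shipments_prod
    dataset: shipments_v1
  # Hypothetical second environment for the same data product
  - server: development
    environment: development
    type: bigquery
    project: acme_shipments_dev
    dataset: shipments_v1
```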

Roles

A list of roles that data consumers can apply for to access the data. Different roles may provide different access rights for role-based access control (RBAC).

SLAs

Service-Level Agreements (SLAs) can be defined to specify the expected availability and performance of the data product.
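
As a hedged sketch, SLAs for the example contract could be written as below. The property names (latency, frequency, retention), the unit abbreviations, and slaDefaultElement follow our reading of the ODCS v3.0.0 specification. All values are illustrative and not taken from a real contract.

```yaml
slaDefaultElement: my_table.delivery_date
slaProperties:
  # Data is expected to be at most 4 days behind
  - property: latency
    value: 4
    unit: d
    element: my_table.delivery_date
  # Data is expected to be updated once per day
  - property: frequency
    value: 1
    unit: d
  # Data is expected to be kept for 3 years
  - property: retention
    value: 3
    unit: y
```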

Custom Properties

For custom needs or tooling-specific requirements, additional properties can be added.
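
A minimal sketch of custom properties, assuming the property/value list form used in ODCS v3.0.0. Both property names and their values are hypothetical.

```yaml
customProperties:
  - property: costCenter        # hypothetical, organization-specific
    value: checkout-analytics
  - property: catalogSyncEnabled  # hypothetical, tooling-specific flag
    value: true
```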

Entropy Data

Entropy Data is a frontend for managing data contracts and data products in an organization. It uses data contracts to create a data product marketplace with advanced features for data discovery, version-controlled data contract editing, contract testing, and automated data governance.

Screenshot of the data contract editor for ODCS in Entropy Data

Sign up now for free, or explore the clickable demo of Entropy Data.