Open Data Contract Standard (ODCS)

The Open Data Contract Standard (ODCS) was formerly known as the Data Contract Template, which PayPal used to specify datasets. Now, it is governed by Bitol, a Linux Foundation AI & Data project.

We are members of Bitol's Technical Steering Committee, and we are committed to supporting ODCS in our products.

Starting with v3.0.0, ODCS is supported by Data Contract Manager to define data contracts as an alternative to the Data Contract Specification.

Details of a data contract in Data Contract Manager

Open Data Contract Standard Example

Let's start with an example of a data contract in ODCS v3 format:

apiVersion: v3.0.0
kind: DataContract

id: c176de03-8503-4859-bd0f-218cc413d958
name: Shipments
version: 1.0.0
domain: checkout
status: development

description:
  purpose: This data can be used for analytical purposes

schema:
- name: my_table
  physicalType: table
  properties:
    - name: shipment_id
      description: Unique identifier for each shipment.
      logicalType: string
      logicalTypeOptions:
        format: uuid
      physicalType: uuid
      primaryKey: true
      examples:
        - 03c35ea7-9a26-475f-a38a-0dad96f6de10
    - name: order_id
      description: Identifier for the order associated with the shipment.
      logicalType: string
      physicalType: text
      required: true
      unique: false
      examples:
        - ORD789012
    - name: delivery_date
      description: "The actual or expected delivery date of the shipment."
      logicalType: date
      physicalType: timestamp_tz
      required: false
      examples:
        - "2024-09-05T17:00:00Z"
      quality:
      - type: text
        description: Must be set, when status is "delivered".
    - name: carrier
      description: "The shipping carrier used for the delivery."
      logicalType: string
      physicalType: text
      examples:
        - DHL
        - UPS
    - name: tracking_number
      description: Tracking number provided by the carrier.
      logicalType: string
      logicalTypeOptions:
        minLength: 10
        maxLength: 36
      physicalType: text
      classification: restricted
      quality:
      - rule: duplicateCount
        mustBeLessThan: 1
        unit: percent
      examples:
        - 1Z9999W99999999999
    - name: status
      description: "Current status of the shipment."
      logicalType: string
      physicalType: text
      examples: ["pending", "shipped", "in_transit", "delivered", "returned", "canceled"]
  quality:
    - rule: rowCount
      name: Verify row count range
      mustBeBetween: [1000000, 5000000]

team:
  - username: john.doe@example.com
    role: Data Product Owner

servers:
  - server: production
    environment: production
    type: bigquery
    project: acme_shipments_prod
    dataset: shipments_v1

roles:
  - role: analyst_us_read
    access: read
  - role: analyst_eu_read
    access: read

If you are familiar with the v2 format, you will notice some significant changes. The ODCS v3 format is more flexible and can be used for a broader range of data products.

Content

These are the building blocks of an ODCS data contract:

Fundamentals

The general information about the data contract, such as the ID, name, version, owner, and description.

Schema

The schema specifies the logical and, optionally, the physical representation of the data model. ODCS v3 also supports complex data structures (e.g., JSON and Avro models).
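
As a rough sketch of how a nested structure might be expressed, the fragment below adds an object property and an array property to the my_table schema from the example above. The property names (recipient_address, street, city, package_ids) are hypothetical and not part of the original contract, and the use of nested properties and items reflects our reading of ODCS v3, so check it against the version of the standard you are using.

```yaml
schema:
  - name: my_table
    physicalType: table
    properties:
      # Hypothetical nested object: an object property carries its own properties.
      - name: recipient_address
        logicalType: object
        properties:
          - name: street
            logicalType: string
          - name: city
            logicalType: string
      # Hypothetical array: items describes the type of each element.
      - name: package_ids
        logicalType: array
        items:
          logicalType: string
```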

Data Quality

Data quality guarantees can now be defined as plain text, as SQL, or with a maintained library of commonly used predefined quality attributes such as rowCount, unique, freshness, and more.
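
To illustrate the three styles side by side, here is a minimal sketch of a table-level quality section. The text and rowCount entries are taken from the example above. The SQL check, including its query and the mustBe threshold, is an assumption added for illustration.

```yaml
quality:
  # Plain text: documents an expectation for human readers.
  - type: text
    description: Must be set, when status is "delivered".
  # SQL: the query result is compared against a threshold (illustrative check).
  - type: sql
    description: No delivered shipment may lack a delivery date.
    query: |
      SELECT COUNT(*) FROM my_table
      WHERE status = 'delivered' AND delivery_date IS NULL
    mustBe: 0
  # Library: a predefined rule, here the row count range from the example above.
  - type: library
    rule: rowCount
    mustBeBetween: [1000000, 5000000]
```

Text rules are documentation only, whereas SQL and library rules are meant to be evaluated by tooling.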

Pricing

The price that data consumers have to pay for using the data product. Optional.
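
A pricing section could look roughly like the following sketch. The amount, currency, and unit are made-up values, and the field names follow our reading of ODCS v3.

```yaml
price:
  priceAmount: 9.95      # illustrative value
  priceCurrency: USD
  priceUnit: megabyte    # what the amount is charged per
```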

Servers

The physical location of the data set, such as the actual host, database, and schema. Most technologies and data platforms are supported. Multiple servers can be defined for different environments or data product versions.
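
As a sketch of the multi-server case, the fragment below extends the servers section from the example above with a second environment. The development entry and its project name (acme_shipments_dev) are hypothetical.

```yaml
servers:
  # Production server, as in the example above.
  - server: production
    environment: production
    type: bigquery
    project: acme_shipments_prod
    dataset: shipments_v1
  # Hypothetical second server for a development environment.
  - server: development
    environment: development
    type: bigquery
    project: acme_shipments_dev   # assumed project name
    dataset: shipments_v1
```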

Roles

A list of roles that data consumers can apply for to access the data. Different roles may provide different access rights for role-based access control (RBAC).
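
The following sketch shows how roles with different access rights might be listed. The analyst_us_read role comes from the example above. The pipeline_write role and the approver value are hypothetical, and the firstLevelApprovers field reflects our reading of ODCS v3.

```yaml
roles:
  # Read-only role from the example above, with an assumed approver.
  - role: analyst_us_read
    access: read
    firstLevelApprovers: Data Product Owner
  # Hypothetical role with write access.
  - role: pipeline_write
    access: write
```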

SLAs

Service-Level Agreements (SLAs) can be defined to specify the expected availability and performance of the data product.
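
A minimal sketch of an SLA section, assuming the slaProperties structure of ODCS v3. The values, units, and the referenced element (my_table.delivery_date, borrowed from the example above) are illustrative only.

```yaml
slaProperties:
  # Data is expected to arrive no later than 4 days after the delivery date.
  - property: latency
    value: 4
    unit: d
    element: my_table.delivery_date
  # Data is retained for 3 years.
  - property: retention
    value: 3
    unit: y
    element: my_table.delivery_date
```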

Custom Properties

For custom needs or tooling-specific requirements, additional properties can be added.
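
Custom properties are expressed as a list of property and value pairs. In the sketch below, both entries (costCenter, exportToCatalog) are hypothetical names chosen for illustration.

```yaml
customProperties:
  - property: costCenter        # hypothetical organization-specific property
    value: "4711"
  - property: exportToCatalog   # hypothetical tooling-specific flag
    value: true
```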

ODCS and Data Contract Specification

How does the Open Data Contract Standard (ODCS) relate to the Data Contract Specification (datacontract.com)? The Data Contract Specification is a format that we initially developed for tooling support, in particular for the Data Contract CLI and Data Contract Manager.

Let's start with the similarities: There are no fundamental or conceptual differences between these two major formats. Both are open standards, use YAML, and specify data sets in a similar way.

We are striving for harmonization, and we are an active member of Bitol's Technical Steering Committee. We have contributed many insights from our consulting work and from maintaining the Data Contract CLI and Data Contract Manager, and we are actively contributing to the design of ODCS v3 and future versions.

Shaping a standard in a governed committee has many advantages, but also some limitations, particularly with regard to scope, velocity, and simplicity. So, until we have a future version of a unified standard, we will continue to support both formats in our products.

We recommend considering Open Data Contract Standard (ODCS) if these aspects are important to you:

  • Using a standard that is governed through a Linux Foundation project
  • Vendor-neutral decision-making process

We recommend considering Data Contract Specification if these aspects are important to you:

  • Support by Data Contract CLI for data contract testing and code generation
  • Multi-platform support with bindings to different data platforms for providers and consumers
  • Business Definitions
  • OpenLineage support

There is no wrong decision: We are committed to supporting both standards in our products, and we will provide migration tooling for an upcoming unified standard.

Data Contract Manager

Data Contract Manager is a frontend to manage data contracts and data products in an organization. It uses data contracts to create an enterprise data marketplace with advanced features for data discovery, version-controlled data contract editing, contract testing, and automated data governance.

Screenshot of the data contract editor for ODCS in Data Contract Manager

Data Contract Manager now supports both the Data Contract Specification and the Open Data Contract Standard (ODCS) v3.

Sign up now for free, or explore the clickable demo of Data Contract Manager.