1
0
mirror of https://github.com/oceanprotocol/docs.git synced 2024-11-02 08:20:22 +01:00
docs/content/concepts/ddo-metadata.md
Trent McConaghy 5716b49077
Update ddo-metadata.md
tweak
2021-08-24 08:42:36 +02:00

350 lines
22 KiB
Markdown

---
title: DDO Metadata
description: Specification of the DDO subset dedicated to asset metadata
slug: /concepts/ddo-metadata/
section: concepts
---
## Overview
This page defines the schema for asset _metadata_. Metadata is the subset of an Ocean DDO that holds information about the asset.
The schema is based on public schema.org [DataSet schema](https://schema.org/Dataset).
Standardizing labels is key to effective searching, sorting and filtering (discovery).
This page specifies metadata attributes that _must_ be included, and that _may_ be included. These attributes are organized hierarchically, from top-layer attributes like `"main"` to sub-level attributes like `"main.type"`. This page also provides DDO metadata examples.
## Rules for Metadata Storage and Control in Ocean
The publisher publishes an asset DDO (including metadata) onto the chain.
The publisher may be the asset owner, or a marketplace acting on behalf of the owner.
Most metadata fields may be modified after creation. The blockchain records the provenance of changes.
DDOs (including metadata) are found in two places:
- _Remote_ - main storage, on-chain. File URLs are always encrypted. One may actually encrypt all metadata, at a severe cost to discoverability.
- _Local_ - local cache. All fields are in plaintext.
Ocean Aquarius helps manage metadata. It can be used to write DDOs to the chain, read from the chain, and has a local cache of the DDO in plaintext with fast search.
## Fields for Metadata
An asset represents a resource in Ocean, e.g. a dataset or an algorithm.
A `metadata` object has the following attributes, all of which are objects. Some are only required for local or remote, and are specified as such.
| Attribute | Required | Description |
| --------------------------- | -------- | ---------------------------------------------------------- |
| **`main`** | **Yes** | Main attributes |
| **`encryptedFiles`** | Remote | Encrypted string of the `attributes.main.files` object. |
| **`encryptedServices`** | Remote | Encrypted string of the `attributes.main.services` object. |
| **`status`** | No | Status attributes |
| **`additionalInformation`** | No | Optional attributes |
The `main` and `additionalInformation` attributes are independent of the asset type.
## Fields for `attributes.main`
The `main` object has the following attributes.
| Attribute | Type | Required | Description |
| ------------------- | --------------------- | -------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **`name`** | Text |**Yes** | Descriptive name or title of the asset. |
| **`type`** | Text |**Yes** | Asset type. Includes `"dataset"` (e.g. csv file), `"algorithm"` (e.g. Python script). Each type needs a different subset of metadata attributes. |
| **`author`** | Text |**Yes** | Name of the entity generating this data (e.g. Tfl, Disney Corp, etc.). |
| **`license`** | Text |**Yes** | Short name referencing the license of the asset (e.g. Public Domain, CC-0, CC-BY, No License Specified, etc. ). If it's not specified, the following value will be added: "No License Specified". |
| **`files`** | Array of files object |**Yes** | Array of `File` objects including the encrypted file urls. |
| **`dateCreated`** | DateTime |**Yes** | The date on which the asset was created by the originator. ISO 8601 format, Coordinated Universal Time, e.g. `2019-01-31T08:38:32Z`. |
| **`datePublished`** | DateTime | Remote | The date on which the asset DDO is registered into the metadata store (Aquarius) |
## Fields for `attributes.main.files`
The `files` object has a list of `file` objects.
Each `file` object has the following attributes, with the details necessary to consume and validate the data.
| Attribute | Required | Description |
| -------------------- | -------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **`index`** |**Yes** | Index number starting from 0 of the file. |
| **`contentType`** |**Yes** | File format. |
| **`url`** | Local | Content URL. Omitted from the remote metadata. Supports `http(s)://` and `ipfs://` URLs. |
| **`name`** | No | File name. |
| **`checksum`** | No | Checksum of the file using your preferred format (i.e. MD5). Format specified in `checksumType`. If it's not provided can't be validated if the file was not modified after registering. |
| **`checksumType`** | No | Format of the provided checksum. Can vary according to server (i.e Amazon vs. Azure) |
| **`contentLength`** | No | Size of the file in bytes. |
| **`encoding`** | No | File encoding (e.g. UTF-8). |
| **`compression`** | No | File compression (e.g. no, gzip, bzip2, etc). |
| **`encrypted`** | No | Boolean. Is the file encrypted? If is not set is assumed the file is not encrypted |
| **`encryptionMode`** | No | Encryption mode used. Just valid if `encrypted=true` |
| **`resourceId`** | No | Remote identifier of the file in the external provider. It is typically the remote id in the cloud provider. |
| **`attributes`** | No | Key-Value hash map with additional attributes describing the asset file. It could include details like the Amazon S3 bucket, region, etc. |
## Fields for `attributes.status`
A `status` object has the following attributes.
| Attribute | Type | Required | Description |
| --------------------- | ------- | -------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **`isListed`** | Boolean | No | Use to flag unsuitable content. True by default. If it's false, the content must not be returned. |
| **`isRetired`** | Boolean | No | Flag retired content. False by default. If it's true, the content may either not be returned, or returned with a note about retirement. |
| **`isOrderDisabled`** | Boolean | No | For temporarily disabling ordering assets, e.g. when file host is in maintenance. False by default. If it's true, no ordering of assets for download or compute should be allowed. |
## Fields for `attributes.additionalInformation`
All the additional information will be stored as part of the `additionalInformation` section.
| Attribute | Type | Required |
| --------------------- | ------------- | -------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **`tags`** | Array of Text | No | Array of keywords or tags used to describe this content. Empty by default. |
| **`description`** | Text | No | Details of what the resource is. For a dataset, this attribute explains what the data represents and what it can be used for. |
| **`copyrightHolder`** | Text | No | The party holding the legal copyright. Empty by default. |
| **`workExample`** | Text | No | Example of the concept of this asset. This example is part of the metadata, not an external link. |
| **`links`** | Array of Link | No | Mapping of links for data samples, or links to find out more information. Links may be to either a URL or another Asset. We expect marketplaces to converge on agreements of typical formats for linked data: The Ocean Protocol itself does not mandate any specific formats as these requirements are likely to be domain-specific. The links array can be an empty array, but if there is a link object in it, then an "url" is required in that link object. |
| **`inLanguage`** | Text | No | The language of the content. Please use one of the language codes from the [IETF BCP 47 standard](https://tools.ietf.org/html/bcp47). |
| **`categories`** | Array of Text | No | Optional array of categories associated to the asset. Note: recommended to use `"tags"` instead of this. |
## Fields - Other Suggestions
Here are example attributes to help an asset's discoverability.
| Attribute | Description |
| ---------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **`updateFrequency`** | An indication of update latency - i.e. How often are updates expected (seldom, annually, quarterly, etc.), or is the resource static that is never expected to get updated. |
| **`structuredMarkup`** | A link to machine-readable structured markup (such as ttl/json-ld/rdf) describing the dataset. |
## DDO Metadata Example - Local
This is what the DDO metadata looks like. All fields are in plaintext. This is before it's stored on-chain or when it's retrieved and decrypted into a local cache.
```json
{
"main": {
"name": "Madrid Weather forecast",
"dateCreated": "2019-05-16T12:36:14.535Z",
"author": "Norwegian Meteorological Institute",
"type": "dataset",
"license": "Public Domain",
"price": "123000000000000000000",
"files": [
{
"index": 0,
"url": "https://example-url.net/weather/forecast/madrid/350750305731.xml",
"contentLength": "0",
"contentType": "text/xml",
"compression": "none"
}
]
},
"additionalInformation": {
"description": "Weather forecast of Europe/Madrid in XML format",
"copyrightHolder": "Norwegian Meteorological Institute",
"categories": ["Other"],
"links": [],
"tags": [],
"updateFrequency": null,
"structuredMarkup": []
},
"status": {
"isListed": true,
"isRetired": false,
"isOrderDisabled": false
}
}
```
## DDO Metadata Example - Remote
The previous example was for a local cache, with all fields in plaintext.
Here's the same example, for remote on-chain storage. That is, it's how metadata looks as a response to querying Aquarius (remote metadata).
How remote is changed, compared to local:
- `url` is removed from all objects in the `files` array
- `encryptedFiles` is added.
```json
{
"service": [
{
"index": 0,
"serviceEndpoint": "http://aquarius:5000/api/v1/aquarius/assets/ddo/{did}",
"type": "metadata",
"attributes": {
"main": {
"type": "dataset",
"name": "Madrid Weather forecast",
"dateCreated": "2019-05-16T12:36:14.535Z",
"author": "Norwegian Meteorological Institute",
"license": "Public Domain",
"files": [
{
"contentLength": "0",
"contentType": "text/xml",
"compression": "none",
"index": 0
}
],
"datePublished": "2019-05-16T12:41:01Z"
},
"encryptedFiles": "0x7a0d1c66ae861…df43aa9",
"additionalInformation": {
"description": "Weather forecast of Europe/Madrid in XML format",
"copyrightHolder": "Norwegian Meteorological Institute",
"categories": ["Other"],
"links": [],
"tags": [],
"updateFrequency": null,
"structuredMarkup": []
},
"status": {
"isListed": true,
"isRetired": false,
"isOrderDisabled": false
}
}
}
]
}
```
## Fields when `attributes.main.type = algorithm`
An asset of type `algorithm` has the following additional attributes under `main.algorithm`:
| Attribute | Type | Required | Description |
| --------------- | -------- | -------- | --------------------------------------------- |
| **`container`** | `Object` |**Yes** | Object describing the Docker container image. |
| **`language`** | `string` | No | Language used to implement the software |
| **`format`** | `string` | No | Packaging format of the software. |
| **`version`** | `string` | No | Version of the software. |
The `container` object has the following attributes:
| Attribute | Type | Required | Description |
| ---------------- | -------- | -------- | ----------------------------------------------------------------- |
| **`entrypoint`** | `string` |**Yes** | The command to execute, or script to run inside the Docker image. |
| **`image`** | `string` |**Yes** | Name of the Docker image. |
| **`tag`** | `string` |**Yes** | Tag of the Docker image. |
| **`checksum`** | `string` |**Yes** | Checksum of the Docker image. |
```json
{
"index": 0,
"serviceEndpoint": "http://localhost:5000/api/v1/aquarius/assets/ddo/{did}",
"type": "metadata",
"attributes": {
"main": {
"author": "John Doe",
"dateCreated": "2019-02-08T08:13:49Z",
"license": "CC-BY",
"name": "My super algorithm",
"type": "algorithm",
"algorithm": {
"language": "scala",
"format": "docker-image",
"version": "0.1",
"container": {
"entrypoint": "node $ALGO",
"image": "node",
"tag": "10",
"checksum": "efb2c764274b745f5fc37f97c6b0e761"
}
},
"files": [
{
"name": "build_model",
"url": "https://raw.gith ubusercontent.com/oceanprotocol/test-algorithm/master/javascript/algo.js",
"index": 0,
"checksum": "efb2c764274b745f5fc37f97c6b0e761",
"contentLength": "4535431",
"contentType": "text/plain",
"encoding": "UTF-8",
"compression": "zip"
}
]
},
"additionalInformation": {
"description": "Workflow to aggregate weather information",
"tags": ["weather", "uk", "2011", "workflow", "aggregation"],
"copyrightHolder": "John Doe"
}
}
}
```
## Fields when `attributes.main.type = compute`
An asset with a service of type `compute` has the following additional attributes under `main.privacy`:
| Attribute | Type | Required | Description |
| --------------------------------- | ------------------ | -------- | ---------------------------------------------------------- |
| **`allowRawAlgorithm`** | `boolean` |**Yes** | If True, a drag & drop algo can be runned |
| **`allowNetworkAccess`** | `boolean` |**Yes** | If True, the algo job will have network access (stil WIP) |
| **`publisherTrustedAlgorithms `** | Array of `Objects` |**Yes** | If Empty , then any published algo is allowed. (see below) |
The `publisherTrustedAlgorithms ` is an array of objects with the following structure:
| Attribute | Type | Required | Description |
| ------------------------------ | -------- | -------- | ------------------------------------------------------------------ |
| **`did`** | `string` |**Yes** | The did of the algo which is trusted by the publisher. |
| **`filesChecksum`** | `string` |**Yes** | Hash of ( algorithm's encryptedFiles + files section (as string) ) |
| **`containerSectionChecksum`** | `string` |**Yes** | Hash of the algorithm container section (as string) |
To produce `filesChecksum`:
```javascript
sha256(
algorithm_ddo.service['metadata'].attributes.encryptedFiles +
JSON.Stringify(algorithm_ddo.service['metadata'].attributes.main.files)
)
```
To produce `containerSectionChecksum`:
```javascript
sha256(
JSON.Stringify(
algorithm_ddo.service['metadata'].attributes.main.algorithm.container
)
)
```
### Example of a compute service
```json
{
"type": "compute",
"index": 1,
"serviceEndpoint": "https://provider.oceanprotocol.com",
"attributes": {
"main": {
"name": "dataAssetComputingService",
"creator": "0xA32C84D2B44C041F3a56afC07a33f8AC5BF1A071",
"datePublished": "2021-02-17T06:31:33Z",
"cost": "1",
"timeout": 3600,
"privacy": {
"allowRawAlgorithm": true,
"allowNetworkAccess": false,
"publisherTrustedAlgorithms": [
{
"did": "0xxxxx",
"filesChecksum": "1234",
"containerSectionChecksum": "7676"
},
{
"did": "0xxxxx",
"filesChecksum": "1232334",
"containerSectionChecksum": "98787"
}
]
}
}
}
}
```