mirror of
https://github.com/oceanprotocol/docs.git
synced 2024-10-31 23:35:36 +01:00
GITBOOK-7: No subject
parent
7d29546d49
commit
3b894df0d4
BIN
.gitbook/assets/image (1).png
Normal file
Binary file not shown.
After Width: | Height: | Size: 52 KiB |
Binary file not shown.
Before Width: | Height: | Size: 52 KiB After Width: | Height: | Size: 122 KiB |
@ -27,6 +27,7 @@
* [Architecture Overview](developers/architecture.md)
* [Ocean Node](developers/ocean-node/README.md)
* [Node Architecture](developers/ocean-node/node-architecture.md)
* [Compute-to-data (C2D)](developers/compute-to-data-c2d.md)
* [Page 1](developers/page-1.md)
* [Contracts](developers/contracts/README.md)
* [Data NFTs](developers/contracts/data-nfts.md)
178
developers/compute-to-data-c2d.md
Normal file
@ -0,0 +1,178 @@
# Compute-to-data (C2D)

C2Dv2 continues the concept of bringing algorithms to the data, allowing both public and private datasets to be used with algorithms. While previous versions relied on external components (Provider -> Operator Service running in Kubernetes -> multiple Operator-Engines, each running in its own Kubernetes namespace), C2Dv2 is embedded entirely within ocean-node.

It takes a modular approach, allowing multiple compute engines to be connected to the same ocean-node. These compute engines can be internal (Docker, or Kubernetes if ocean-node runs in a Kubernetes environment) or external (future integration with projects such as Bacalhau or iExec is possible).

### Additional Features

* Allow multiple C2D engines to connect to the same ocean-node
* Support multiple jobs (stages) in a workflow
* Jobs can be dependent on or independent of previous stages, allowing for parallel or serial job execution

### Workflows

A workflow defines one or more jobs to be executed. Each job may depend on a previous job.

```json
[
  {
    "index": "number",
    "jobId": "generated by orchestrator",
    "runAfter": "if defined, wait for specific jobId to finish",
    "input": [
      {
        "index": "number",
        "did": "optional",
        "serviceId": "optional",
        "files": "filesObject, optional"
      }
    ],
    "algorithm": {
      "did": "optional",
      "serviceId": "optional",
      "files": "filesObject, optional",
      "rawcode": "optional",
      "container": {
        "entrypoint": "string",
        "image": "string",
        "tag": "string"
      }
    }
  }
]
```
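
As an illustration of how `runAfter` could drive parallel or serial execution, the sketch below groups jobs into batches whose members may run concurrently. It assumes each job carries a unique `jobId` and at most one `runAfter` reference, as in the workflow structure shown here; the function name `execution_batches` is hypothetical and not part of ocean-node.

```python
def execution_batches(workflow):
    """Group jobs into batches; every job in a batch has its dependency met.

    Hypothetical helper, not an ocean-node API. Each job is a dict with a
    unique "jobId" and an optional "runAfter" naming another jobId.
    """
    by_id = {job["jobId"]: job for job in workflow}
    dependents = {job_id: [] for job_id in by_id}
    waiting = set()

    for job_id, job in by_id.items():
        parent = job.get("runAfter")
        if parent is None:
            continue
        if parent not in by_id:
            raise ValueError(f"{job_id} waits for unknown job {parent}")
        dependents[parent].append(job_id)
        waiting.add(job_id)

    # Jobs without runAfter can start immediately, in parallel.
    ready = sorted(job_id for job_id in by_id if job_id not in waiting)
    batches = []
    scheduled = 0
    while ready:
        batches.append(ready)
        scheduled += len(ready)
        # A job becomes ready once the single job it waits for has run.
        ready = sorted(child for job_id in ready for child in dependents[job_id])

    if scheduled != len(by_id):
        raise ValueError("workflow contains a runAfter cycle")
    return batches


workflow = [
    {"index": 0, "jobId": "a"},
    {"index": 1, "jobId": "b", "runAfter": "a"},
    {"index": 2, "jobId": "c"},
]
batches = execution_batches(workflow)
```

In this example, jobs `a` and `c` have no dependency and land in the first batch, while `b` waits for `a` and lands in the second.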

### Orchestration Layer

Formerly known as the "operator-service," this layer handles interactions between the ocean-node core layer and the different execution environments.

In summary, it should:

* Expose a list of compute environments for all engines
* Expose a list of running jobs and limits (e.g., max concurrent jobs)
* Take on new jobs (created by the startJob core handler)
* Determine which module to use (Docker, Kubernetes, Bacalhau, etc.)
* Insert the workflow into the database
* Signal the engine handler to take over job execution
* Read the workflow status when the C2D getStatus core command is called
* Serve job results when the C2D getJobResult core command is called

Due to technical constraints, both internal modules (Docker and Kubernetes) will use Docker images for data provisioning (previously pod-configuration) and results publishing (previously pod-publishing). The orchestration layer will also expose two new core commands:

* `c2dJobStatusUpdate` (called by both pod-configuration and pod-publishing to update job status)
* `c2dJobPublishResult` (called by pod-publishing when results need to be uploaded)

When any pod-\* calls one of these endpoints, we must verify the signature and respond accordingly.
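
The signature check could look roughly like the sketch below. It is an illustration under stated assumptions: the document does not specify a signature scheme, so HMAC-SHA256 over `jobId:status`, keyed with the per-job random key that the orchestrator stores, stands in for whatever scheme ocean-node actually uses. The names `sign_update` and `verify_update` are hypothetical.

```python
import hashlib
import hmac


def sign_update(job_key: bytes, job_id: str, status: str) -> str:
    """Sign a status update with the job's key (HMAC is a stand-in scheme)."""
    message = f"{job_id}:{status}".encode()
    return hmac.new(job_key, message, hashlib.sha256).hexdigest()


def verify_update(job_keys: dict, job_id: str, status: str, signature: str) -> bool:
    """Accept the update only if it was signed with the key stored for job_id."""
    job_key = job_keys.get(job_id)
    if job_key is None:
        return False  # unknown job: reject
    expected = sign_update(job_key, job_id, status)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected, signature)


# Hypothetical key store; the orchestrator would read this from its database.
job_keys = {"job-1": b"per-job-random-key"}
good = sign_update(job_keys["job-1"], "job-1", "provision finished")
```

A request signed with a different key, for a different job, or with an altered status fails verification and can be rejected before any state changes.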

### Payment Flow in Orchestration

Payment will be based on an escrow contract. The orchestrator will:

* Compute the sum of maxDuration from all jobs in the workflow
* Calculate the required fee (depending on the previous step, token, environment, etc.)
* Lock the amount in the escrow contract
* Wait until all jobs are finished (successfully or not)
* Calculate the actual duration spent
* Compute proof
* Withdraw payment & provide proof, releasing the difference back to the customer
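
The arithmetic behind these steps can be sketched as follows. This is a minimal illustration that assumes `maxDuration` is expressed in seconds and that the fee is a flat price per second in integer token units; the real fee also depends on token and environment, and the helper names `required_lock` and `settle` are hypothetical. No escrow contract is called here.

```python
def required_lock(workflow, price_per_second: int) -> int:
    """Amount to lock up front: sum of maxDuration across jobs times the price."""
    total_max_duration = sum(job["maxDuration"] for job in workflow)
    return total_max_duration * price_per_second


def settle(locked: int, actual_seconds: int, price_per_second: int):
    """Split the locked amount into (payment, refund) once all jobs finished."""
    # Never charge more than was locked, even if a job overran.
    payment = min(actual_seconds * price_per_second, locked)
    refund = locked - payment
    return payment, refund


# Two jobs with an assumed maxDuration in seconds, priced at 2 units/second.
workflow = [{"maxDuration": 600}, {"maxDuration": 300}]
locked = required_lock(workflow, price_per_second=2)
payment, refund = settle(locked, actual_seconds=450, price_per_second=2)
```

With these numbers, 1800 units are locked, 900 are paid for the 450 seconds actually used, and 900 are released back to the customer.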

### C2D Engines

A C2D Engine is a piece of code that handles C2D jobs running on a specific orchestration implementation. This document focuses on internal compute engines: Docker-based (a host with Docker installed) and Kubernetes-based (if ocean-node runs inside a Kubernetes cluster).

An engine that uses external services (like Bacalhau) follows the same logic but will likely interact with remote APIs.

An engine is responsible for:

* Storing workflows and each job's status (so that on restart, it can resume or continue running flows)
* Queueing new jobs

#### Docker Engine

This module requires the Docker service to be installed on the host. It leverages the Docker API to:

* Create job volumes (with quotas)
* Start the provisioning container (pod-configuration)
* Monitor its status
* Create YAML for algorithms with hardware constraints (CPU, RAM)
* Pass devices for GPU environments
* Start the algorithm container
* Monitor algorithm health & timeout constraints
* Stop the algorithm if the quota is exceeded
* Start the publishing container
* Delete job volumes
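
The monitoring steps in this list (health, timeout, quota) reduce to a small decision made on every poll. The sketch below shows one possible shape for that decision as a pure function; it does not call the Docker API, and the `Action` names, the parameters, and the choice to check the timeout before the quota are all assumptions made for illustration.

```python
from enum import Enum


class Action(Enum):
    KEEP_RUNNING = "keep_running"
    STOP_TIMEOUT = "stop_timeout"
    STOP_QUOTA = "stop_quota"
    FINISHED = "finished"


def check_algorithm(running: bool, elapsed_s: float, max_duration_s: float,
                    used_bytes: int, quota_bytes: int) -> Action:
    """Decide what the engine should do with the algorithm container."""
    if not running:
        # Container already exited; move on to the publishing step.
        return Action.FINISHED
    if elapsed_s >= max_duration_s:
        return Action.STOP_TIMEOUT
    if used_bytes > quota_bytes:
        return Action.STOP_QUOTA
    return Action.KEEP_RUNNING


# Example poll: 10 minutes into a 1 hour limit, volume over its 1 GiB quota.
decision = check_algorithm(True, 600, 3600, 2 * 1024**3, 1024**3)
```

Keeping the decision separate from the calls that inspect or stop containers makes it reusable by a Kubernetes-based engine, which follows the same steps.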

```
title C2Dv2 message flow for docker module
User -> Ocean-node: start c2d job
Ocean-node -> Orchestration-class: start c2d job
Orchestration-class -> Orchestration-class: determine module and insert workflow, random private key in db
Orchestration-class -> Docker-engine: queue job
Docker-engine -> Docker_host_api: create job volume
Docker-engine -> Docker-engine: create yaml for pod-configuration, set private key
Docker-engine -> Docker_host_api: start pod-configuration
Pod_configuration -> Pod_configuration: starts ocean-node as pod-config
Pod_configuration -> Ocean-node: call c2dJobProvision
Ocean-node -> Pod_configuration: return workflow
Pod_configuration -> Pod_configuration : download inputs & algo
Pod_configuration -> Ocean-node: call c2dJobStatusUpdate
Ocean-node -> Docker-engine: download success, start algo
Docker-engine -> Docker-engine: create yaml for algo
Docker-engine -> Docker_host_api: start algo container
Docker-engine -> Docker-engine: monitor algo container, stop if timeout
Docker-engine -> Docker-engine: create yaml for pod-publishing, set private key
Docker-engine -> Docker_host_api: start pod-publishing
Docker_host_api -> Pod-Publishing: start as docker container
Pod-Publishing -> Pod-Publishing : prepare output
Pod-Publishing -> Ocean-node: call c2dJobPublishResult
Pod-Publishing -> Ocean-node: call c2dJobStatusUpdate
```

<figure><img src="../.gitbook/assets/image.png" alt=""><figcaption><p>C2Dv2 flow diagram</p></figcaption></figure>

#### Kubernetes Engine

This module requires access to Kubernetes credentials (or autodetects them if ocean-node already runs in a Kubernetes cluster). It leverages the Kubernetes API to:

* Create job volumes (with quotas)
* Start the provisioning container (pod-configuration)
* Monitor its status
* Create YAML for algorithms with hardware constraints (CPU, RAM)
* Pass devices for GPU environments
* Start the algorithm container
* Monitor algorithm health & timeout constraints
* Stop the algorithm if the quota is exceeded
* Start the publishing container
* Delete job volumes

#### POD-\* Common Description

The simplest way for ocean-node to communicate with the two containers is through its P2P/HTTP API. Thus, all pod-\* instances will run an ocean-node instance (each with a randomly generated, job-specific key) and connect to the main ocean-node instance. The main ocean-node instance's peerNodeId or HTTP API endpoint will be inserted in the YAML. Each pod-\* will use a private key, also exposed in the YAML.

Each pod-\* YAML will contain the following environment variables:

* nodePeerId
* nodeHttpApi
* privateKey
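
A minimal sketch of how an engine might assemble these three variables for one job is given below. Only the variable names come from the list; the helper `build_pod_env`, the 32-byte hex key format, and the example peer ID and URL are assumptions for illustration.

```python
import secrets


def build_pod_env(node_peer_id: str, node_http_api: str) -> dict:
    """Build the environment for one pod-* container (hypothetical helper)."""
    # A fresh random key per job; 32 bytes rendered as 0x-prefixed hex is an
    # assumed format, not one the document specifies.
    private_key = "0x" + secrets.token_hex(32)
    return {
        "nodePeerId": node_peer_id,
        "nodeHttpApi": node_http_api,
        "privateKey": private_key,
    }


# Placeholder values; a real engine would use the main node's own identity.
env = build_pod_env("example-peer-id", "http://ocean-node.example:8000")
```

The engine would keep a copy of `privateKey` alongside the job record so that later calls from the pod can be matched to the job that issued the key.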

#### Pod-configuration

Previously, pod-configuration was a standalone repository built as a Docker image. In this implementation, it will be ocean-node with a different entrypoint (entry\_configuration.js).

Implementation:

* Call ocean-node/c2dJobProvision and get the workflow's input section
* Download all assets
* Call the ocean-node/c2dJobStatusUpdate core command to update status (provision finished or errors)

#### Pod-publishing

Previously, pod-publishing was a standalone repository built as a Docker image. In this implementation, it will be ocean-node with a different entrypoint (entry\_publishing.js).

Implementation:

* Read the output folder
* If multiple files or folders are detected, create a zip with all those files/folders
* Call the ocean-node/c2dJobPublishResult core command and let ocean-node handle storage
* Call the ocean-node/c2dJobStatusUpdate core command to mark the job as done
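
The first two publishing steps can be sketched as below. This is an illustration rather than the pod-publishing implementation (which the document describes as an ocean-node entrypoint): the helper name `prepare_output` is hypothetical, and zipping a lone sub-folder as well as multiple entries is an assumption.

```python
import zipfile
from pathlib import Path
from typing import Optional


def prepare_output(output_dir: Path, archive_path: Path) -> Optional[Path]:
    """Return the single result file, or a zip of everything in output_dir.

    archive_path must live outside output_dir so the archive does not
    try to include itself.
    """
    entries = sorted(output_dir.iterdir())
    if not entries:
        return None  # nothing to publish
    if len(entries) == 1 and entries[0].is_file():
        return entries[0]  # a single file is published as-is

    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(output_dir.rglob("*")):
            if path.is_file():
                # Store paths relative to the output folder.
                archive.write(path, path.relative_to(output_dir).as_posix())
    return archive_path
```

The returned path is what would then be handed to the `c2dJobPublishResult` call, leaving storage to ocean-node.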
@ -1,5 +1,5 @@
---
-description: Monetise your data while preserving privacy
+description: Compute to data version 2 (C2dv2)
---

# Compute to data
@ -29,14 +29,14 @@ We suggest reading these guides to get an understanding of how compute-to-data w

### User Guides

-* [How to write compute to data algorithms](broken-reference)
-* [How to publish a compute-to-data algorithm](broken-reference)
-* [How to publish a dataset for compute to data](broken-reference)
+* [How to write compute to data algorithms](broken-reference/)
+* [How to publish a compute-to-data algorithm](broken-reference/)
+* [How to publish a dataset for compute to data](broken-reference/)

### Developer Guides

* [How to use compute to data with ocean.js](../ocean.js/cod-asset.md)
-* [How to use compute to data with ocean.py](../../data-scientists/ocean.py/)
+* [How to use compute to data with ocean.py](../../data-scientists/ocean.py)

### Infrastructure Deployment Guides
@ -14,7 +14,7 @@ The Node stack is divided into the following layers:
* Components layer (Indexer, Provider)
* Modules layer

-<figure><img src="../../.gitbook/assets/image.png" alt=""><figcaption><p>Ocean Node Infrastructure diagram</p></figcaption></figure>
+<figure><img src="../../.gitbook/assets/image (1).png" alt=""><figcaption><p>Ocean Node Infrastructure diagram</p></figcaption></figure>

### Features