diff --git a/.gitbook/assets/image (1).png b/.gitbook/assets/image (1).png
new file mode 100644
index 00000000..4f0abba4
Binary files /dev/null and b/.gitbook/assets/image (1).png differ
diff --git a/.gitbook/assets/image.png b/.gitbook/assets/image.png
index 4f0abba4..1efb3b3b 100644
Binary files a/.gitbook/assets/image.png and b/.gitbook/assets/image.png differ
diff --git a/SUMMARY.md b/SUMMARY.md
index cbed9a25..c8e8d7e8 100644
--- a/SUMMARY.md
+++ b/SUMMARY.md
@@ -27,6 +27,7 @@
 * [Architecture Overview](developers/architecture.md)
 * [Ocean Node](developers/ocean-node/README.md)
   * [Node Architecture](developers/ocean-node/node-architecture.md)
+  * [Compute-to-data (C2D)](developers/compute-to-data-c2d.md)
 * [Page 1](developers/page-1.md)
 * [Contracts](developers/contracts/README.md)
   * [Data NFTs](developers/contracts/data-nfts.md)
diff --git a/developers/compute-to-data-c2d.md b/developers/compute-to-data-c2d.md
new file mode 100644
index 00000000..fa23850e
--- /dev/null
+++ b/developers/compute-to-data-c2d.md
@@ -0,0 +1,178 @@
# Compute-to-data (C2D)

C2Dv2 continues the concept of bringing algorithms to the data, allowing both public and private datasets to be used with algorithms. While previous versions relied on external components (Provider -> Operator Service running in Kubernetes -> multiple Operator-Engines, each running in its own Kubernetes namespace), C2Dv2 is embedded entirely within the ocean-node.

It takes a modular approach, allowing multiple compute engines to be connected to the same ocean-node. These compute engines can be internal (Docker, or Kubernetes if ocean-node runs in a Kubernetes environment) or external (in the future, integration with projects like Bacalhau, iExec, etc. is possible).
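To make the modular approach concrete, an engine abstraction might look like the following sketch. None of these names (`C2DEngine`, `ComputeEnvironment`, `listAllEnvironments`) are the actual ocean-node API; they are illustrative assumptions, and the methods are kept synchronous for brevity (a real engine interface would be asynchronous):

```typescript
// Hypothetical sketch of the modular engine abstraction: several compute
// engines plugged into one ocean-node. Not the actual ocean-node API.

interface ComputeEnvironment {
  id: string
  cpuNumber: number
  ramGB: number
  maxJobDuration: number // seconds
}

interface C2DEngine {
  getComputeEnvironments(): ComputeEnvironment[]
  startJob(workflow: object): string // returns a jobId
}

// One ocean-node can aggregate environments from several engines
// (internal Docker/Kubernetes engines, or external ones later).
function listAllEnvironments(engines: C2DEngine[]): ComputeEnvironment[] {
  return engines.flatMap((e) => e.getComputeEnvironments())
}
```

The point of the abstraction is that the node core never needs to know whether a job ends up on Docker, Kubernetes, or a remote service.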

### Additional Features

* Allow multiple C2D engines to connect to the same ocean-node
* Support multiple jobs (stages) in a workflow
* Jobs can be dependent on or independent of previous stages, allowing for serial or parallel job execution

### Workflows

A workflow defines one or more jobs to be executed. Each job may depend on a previous job.

```json
[
  {
    "index": "number",
    "jobId": "generated by orchestrator",
    "runAfter": "if defined, wait for specific jobId to finish",
    "input": [
      {
        "index": "number",
        "did": "optional",
        "serviceId": "optional",
        "files": "filesObject, optional"
      }
    ],
    "algorithm": {
      "did": "optional",
      "serviceId": "optional",
      "files": "filesObject, optional",
      "rawcode": "optional",
      "container": {
        "entrypoint": "string",
        "image": "string",
        "tag": "string"
      }
    }
  }
]
```

### Orchestration Layer

Formerly known as the "operator-service," this layer handles interactions between the ocean-node core layer and the different execution environments.

In summary, it should:

* Expose a list of compute environments for all engines
* Expose a list of running jobs and limits (e.g., max concurrent jobs)
* Accept new jobs (created by the startJob core handler)
* Determine which module to use (Docker, Kubernetes, Bacalhau, etc.)
* Insert the workflow into the database
* Signal the engine handler to take over job execution
* Read the workflow status when the C2D getStatus core handler is called
* Serve job results when the C2D getJobResult core handler is called

Due to technical constraints, both internal modules (Docker and Kubernetes) will use Docker images for data provisioning (previously pod-configuration) and results publishing (previously pod-publishing).

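The serial/parallel semantics of `runAfter` can be sketched as grouping jobs into execution "waves": jobs with no dependency run first (possibly in parallel), and a job with `runAfter` set runs only after the referenced job's wave completes. The types below follow the workflow schema in this document, but `executionWaves` is a hypothetical helper, not ocean-node code:

```typescript
// Illustrative sketch: group workflow jobs into execution waves based on
// the runAfter field from the workflow schema. Not the actual orchestrator.

interface WorkflowJob {
  index: number
  jobId: string
  runAfter?: string
}

function executionWaves(jobs: WorkflowJob[]): WorkflowJob[][] {
  const waveOf = new Map<string, number>() // jobId -> wave index
  const waves: WorkflowJob[][] = []
  let remaining = [...jobs]
  while (remaining.length > 0) {
    // A job is ready when it has no dependency, or its dependency
    // was already scheduled in an earlier wave.
    const ready = remaining.filter(
      (j) => j.runAfter === undefined || waveOf.has(j.runAfter)
    )
    if (ready.length === 0) throw new Error('unresolvable runAfter dependency')
    for (const j of ready) waveOf.set(j.jobId, waves.length)
    waves.push(ready)
    remaining = remaining.filter((j) => !waveOf.has(j.jobId))
  }
  return waves
}
```

Jobs inside one wave are independent and could run in parallel; waves themselves run serially.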
The orchestration layer will also expose two new core commands:

* `c2dJobStatusUpdate` (called by both pod-configuration and pod-publishing to update the job status)
* `c2dJobPublishResult` (called by pod-publishing when results need to be uploaded)

When any pod-\* calls one of these endpoints, we must verify the signature and respond accordingly.

### Payment Flow in Orchestration

Payment will be based on an escrow contract. The orchestrator will:

* Compute the sum of maxDuration values from all jobs in the workflow
* Calculate the required fee (depending on the previous step, token, environment, etc.)
* Lock the amount in the escrow contract
* Wait until all jobs have finished (successfully or not)
* Calculate the actual duration spent
* Compute the proof
* Withdraw the payment and provide the proof, releasing the difference back to the customer

### C2D Engines

A C2D engine is a piece of code that handles C2D jobs running on a specific orchestration implementation. This document focuses on the internal compute engines: Docker-based (a host with a Docker environment installed) and Kubernetes-based (if ocean-node runs inside a Kubernetes cluster).

An engine that uses external services (like Bacalhau) follows the same logic but will likely interact with remote APIs.

An engine is responsible for:

* Storing workflows and each job's status (so that on restart, we can resume or continue running flows)
* Queueing new jobs

#### Docker Engine

This module requires the Docker service to be installed at the host level.
It leverages the Docker API to:

* Create job volumes (with quotas)
* Start the provisioning container (pod-configuration)
* Monitor its status
* Create YAML for algorithms with hardware constraints (CPU, RAM)
* Pass devices for GPU environments
* Start the algorithm container
* Monitor algorithm health & timeout constraints
* Stop the algorithm if the quota is exceeded
* Start the publishing container
* Delete job volumes

```
title C2Dv2 message flow for docker module
User -> Ocean-node: start c2d job
Ocean-node -> Orchestration-class: start c2d job
Orchestration-class -> Orchestration-class: determine module, insert workflow and random private key in db
Orchestration-class -> Docker-engine: queue job
Docker-engine -> Docker_host_api: create job volume
Docker-engine -> Docker-engine: create yaml for pod-configuration, set private key
Docker-engine -> Docker_host_api: start pod-configuration
Pod_configuration -> Pod_configuration: starts ocean-node as pod-config
Pod_configuration -> Ocean-node: call c2dJobProvision
Ocean-node -> Pod_configuration: return workflow
Pod_configuration -> Pod_configuration: download inputs & algo
Pod_configuration -> Ocean-node: call c2dJobStatusUpdate
Ocean-node -> Docker-engine: download success, start algo
Docker-engine -> Docker-engine: create yaml for algo
Docker-engine -> Docker_host_api: start algo container
Docker-engine -> Docker-engine: monitor algo container, stop if timeout
Docker-engine -> Docker-engine: create yaml for pod-publishing, set private key
Docker-engine -> Docker_host_api: start pod-publishing
Docker_host_api -> Pod-Publishing: start as docker container
Pod-Publishing -> Pod-Publishing: prepare output
Pod-Publishing -> Ocean-node: call c2dJobPublishResult
Pod-Publishing -> Ocean-node: call c2dJobStatusUpdate
```

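The "monitor algo container, stop if timeout" step above boils down to a quota check that the engine evaluates while polling the container. A minimal sketch, with hypothetical names (the real engine would call the Docker API to actually stop the container):

```typescript
// Illustrative timeout check for the Docker engine's monitoring loop.
// RunningJob and shouldStopJob are assumed names, not ocean-node code.

interface RunningJob {
  jobId: string
  startedAt: number // epoch seconds when the algorithm container started
  maxDuration: number // seconds allowed by the compute environment / payment
}

// True when the job has exceeded its time quota and must be stopped.
function shouldStopJob(job: RunningJob, nowEpochSeconds: number): boolean {
  return nowEpochSeconds - job.startedAt >= job.maxDuration
}
```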
[Figure: C2Dv2 flow diagram]
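Returning to the payment flow described in the Orchestration section, the escrow accounting amounts to locking a fee for the summed maxDuration up front and refunding the unused share once the jobs finish. A minimal sketch, assuming a flat per-second price (the function names and the pricing model are illustrative, not the actual escrow contract interface):

```typescript
// Illustrative escrow accounting for the C2Dv2 payment flow.
// A flat per-second price is an assumption of this sketch; the real fee
// depends on the token, environment, and resources requested.

// Fee locked up front: sum of maxDuration over all jobs, times the price.
function computeLockedFee(maxDurations: number[], pricePerSecond: number): number {
  const totalSeconds = maxDurations.reduce((sum, d) => sum + d, 0)
  return totalSeconds * pricePerSecond
}

// After the jobs finish, payment for the actual duration is withdrawn
// and the difference is released back to the customer.
function computeRefund(lockedFee: number, actualSeconds: number, pricePerSecond: number): number {
  const owed = actualSeconds * pricePerSecond
  return Math.max(lockedFee - owed, 0)
}
```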

#### Kubernetes Engine

This module requires access to Kubernetes credentials (or autodetects them if ocean-node already runs in a Kubernetes cluster). It leverages the Kubernetes API to:

* Create job volumes (with quotas)
* Start the provisioning container (pod-configuration)
* Monitor its status
* Create YAML for algorithms with hardware constraints (CPU, RAM)
* Pass devices for GPU environments
* Start the algorithm container
* Monitor algorithm health & timeout constraints
* Stop the algorithm if the quota is exceeded
* Start the publishing container
* Delete job volumes

#### POD-\* Common Description

For efficient communication between ocean-node and the two containers, the simplest approach is to use the p2p/HTTP API. Thus, all pod-\* instances will run an ocean-node instance (each with a job-generated random key) and connect to the main ocean-node instance. The main ocean-node instance's peerNodeId or HTTP API endpoint will be inserted into the YAML. Each pod-\* will use a private key, also exposed in the YAML.

Each pod-\* YAML will contain the following environment variables:

* nodePeerId
* nodeHttpApi
* privateKey

#### Pod-configuration

Previously, pod-configuration was a standalone repository built as a Docker image. In this implementation, it will be ocean-node with a different entrypoint (entry\_configuration.js).

Implementation:

* Call ocean-node/c2dJobProvision and get the workflow's input section
* Download all assets
* Call the ocean-node/c2dJobStatusUpdate core command to update the status (provisioning finished or errors)

#### Pod-publishing

Previously, pod-publishing was a standalone repository built as a Docker image. In this implementation, it will be ocean-node with a different entrypoint (entry\_publishing.js).

Implementation:

* Read the output folder
* If multiple files or folders are detected, create a zip with all those files/folders
* Call the ocean-node/c2dJobPublishResult core command and let ocean-node handle storage
* Call the ocean-node/c2dJobStatusUpdate core command to mark the job as done
diff --git a/developers/compute-to-data/README.md b/developers/compute-to-data/README.md
index 34b9027b..a127c1ad 100644
--- a/developers/compute-to-data/README.md
+++ b/developers/compute-to-data/README.md
@@ -1,5 +1,5 @@
 ---
-description: Monetise your data while preserving privacy
+description: Compute to data version 2 (C2Dv2)
 ---
 
 # Compute to data
@@ -29,14 +29,14 @@ We suggest reading these guides to get an understanding of how compute-to-data w
 
 ### User Guides
 
-* [How to write compute to data algorithms](broken-reference)
-* [How to publish a compute-to-data algorithm](broken-reference)
-* [How to publish a dataset for compute to data](broken-reference)
+* [How to write compute to data algorithms](broken-reference/)
+* [How to publish a compute-to-data algorithm](broken-reference/)
+* [How to publish a dataset for compute to data](broken-reference/)
 
 ### Developer Guides
 
 * [How to use compute to data with ocean.js](../ocean.js/cod-asset.md)
-* [How to use compute to data with ocean.py](../../data-scientists/ocean.py/)
+* [How to use compute to data with ocean.py](../../data-scientists/ocean.py)
 
 ### Infrastructure Deployment Guides
 
diff --git a/developers/ocean-node/node-architecture.md b/developers/ocean-node/node-architecture.md
index 28adc5c1..6a060f57 100644
--- a/developers/ocean-node/node-architecture.md
+++ b/developers/ocean-node/node-architecture.md
@@ -14,7 +14,7 @@ The Node stack is divided into the following layers:
 * Components layer (Indexer, Provider)
 * Modules layer
 
[Figure: Ocean Node Infrastructure diagram]
### Features