mirror of https://github.com/oceanprotocol/docs.git synced 2024-11-26 19:49:26 +01:00

GITBOOK-7: No subject

This commit is contained in:
Jamie Hewitt 2024-06-18 10:45:45 +00:00 committed by gitbook-bot
parent 7d29546d49
commit 3b894df0d4
No known key found for this signature in database
GPG Key ID: 07D2180C7B12D0FF
6 changed files with 185 additions and 6 deletions


@@ -27,6 +27,7 @@
* [Architecture Overview](developers/architecture.md)
* [Ocean Node](developers/ocean-node/README.md)
* [Node Architecture](developers/ocean-node/node-architecture.md)
* [Compute-to-data (C2D)](developers/compute-to-data-c2d.md)
* [Page 1](developers/page-1.md)
* [Contracts](developers/contracts/README.md)
* [Data NFTs](developers/contracts/data-nfts.md)


@@ -0,0 +1,178 @@
# Compute-to-data (C2D)
C2Dv2 continues the concept of bringing algorithms to the data, allowing both public and private datasets to be used for compute. While previous versions relied on external components (Provider -> Operator Service running in Kubernetes -> multiple Operator-Engines, each running in its own Kubernetes namespace), C2Dv2 is embedded entirely within the ocean-node.
C2Dv2 takes a modular approach, allowing multiple compute engines to connect to the same ocean-node. These compute engines can be internal (Docker, or Kubernetes if ocean-node runs in a Kubernetes environment) or external (in the future, integration with projects like Bacalhau, iExec, etc., is possible).
### Additional Features
* Allow multiple C2D engines to connect to the same ocean-node
* Support multiple jobs (stages) in a workflow
* Jobs can be dependent or independent of previous stages, allowing for parallel or serial job execution
### Workflows
A workflow defines one or more jobs to be executed. Each job may have dependencies on a previous job.
```json
[
  {
    "index": number,
    "jobId": "generated by orchestrator",
    "runAfter": "if defined, wait for specific jobId to finish",
    "input": [
      {
        "index": number,
        "did": "optional",
        "serviceId": "optional",
        "files": "filesObject, optional"
      }
    ],
    "algorithm": {
      "did": "optional",
      "serviceId": "optional",
      "files": "filesObject, optional",
      "rawcode": "optional",
      "container": {
        "entrypoint": "string",
        "image": "string",
        "tag": "string"
      }
    }
  }
]
```
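The schema above can be sketched as TypeScript types, together with the `runAfter` dependency rule (a job becomes runnable only once the job it depends on has finished). These type names and the helper are illustrative assumptions, not the actual ocean-node definitions.

```typescript
// Illustrative shapes mirroring the workflow JSON schema in this document.
interface ComputeInput {
  index: number;
  did?: string;
  serviceId?: string;
  files?: unknown; // filesObject
}

interface ComputeAlgorithm {
  did?: string;
  serviceId?: string;
  files?: unknown;
  rawcode?: string;
  container?: { entrypoint: string; image: string; tag: string };
}

interface ComputeJob {
  index: number;
  jobId: string;      // generated by the orchestrator
  runAfter?: string;  // if defined, wait for this jobId to finish
  input: ComputeInput[];
  algorithm: ComputeAlgorithm;
}

// A job is runnable once its runAfter dependency (if any) has finished,
// which is what allows serial or parallel execution of stages.
function runnableJobs(workflow: ComputeJob[], finished: Set<string>): ComputeJob[] {
  return workflow.filter((j) => !j.runAfter || finished.has(j.runAfter));
}
```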
### Orchestration Layer
Formerly known as the "operator-service," this layer handles interactions between the ocean-node core layer and different execution environments.
In summary, it should:
* Expose a list of compute environments for all engines
* Expose a list of running jobs and limits (e.g., max concurrent jobs)
* Take on new jobs (created by the startJob core handler)
* Determine which module to use (Docker, Kubernetes, Bacalhau, etc.)
* Insert workflow into the database
* Signal the engine handler to take over job execution
* Read workflow status when the C2D `getStatus` core command is called
* Serve job results when the C2D `getJobResult` command is called
Due to technical constraints, both internal modules (Docker and Kubernetes) will use Docker images for data provisioning (previously pod-configuration) and results publishing (previously pod-publishing). The orchestration layer will also expose two new core commands:
* `c2dJobStatusUpdate` (called by both pod-configuration and pod-publishing to update job status)
* `c2dJobPublishResult` (called by pod-publishing when results need to be uploaded)
When any pod-\* calls one of these endpoints, we must verify the signature and respond accordingly.
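That signature check can be sketched as follows. The payload shape, the Ed25519 choice, and the function names are illustrative assumptions; the idea is simply that the orchestrator stores a job-specific keypair when it creates the workflow, the pod-\* container signs its calls with the private key it received, and the node verifies against the stored public key.

```typescript
import { generateKeyPairSync, sign, verify, KeyObject } from "node:crypto";

// A pod-* container signs its c2dJobStatusUpdate payload with the
// job-specific private key it received via the job YAML.
function signStatusUpdate(jobId: string, status: string, privateKey: KeyObject): Buffer {
  return sign(null, Buffer.from(`${jobId}:${status}`), privateKey);
}

// The node verifies the call against the public key stored for that job
// before accepting the status update.
function verifyStatusUpdate(
  jobId: string,
  status: string,
  signature: Buffer,
  publicKey: KeyObject,
): boolean {
  return verify(null, Buffer.from(`${jobId}:${status}`), publicKey, signature);
}
```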
### Payment Flow in Orchestration
This will be based on an escrow contract. The orchestrator will:
* Compute the sum of maxDuration from all jobs in the workflow
* Calculate the required fee (depending on the previous step, token, environment, etc.)
* Lock the amount in the escrow contract
* Wait until all jobs are finished (successfully or not)
* Calculate the actual duration spent
* Compute proof
* Withdraw payment & provide proof, releasing the difference back to the customer
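The arithmetic behind the escrow steps above can be sketched as follows. The per-second pricing model and the field names are assumptions for illustration; the real fee depends on the token, the environment, and other factors.

```typescript
interface JobQuote {
  maxDuration: number;    // seconds, agreed up front
  actualDuration: number; // seconds, measured after the job finishes
}

// Amount locked in the escrow contract up front: the summed maxDuration
// of all jobs in the workflow, times the environment's price per second.
function lockAmount(jobs: JobQuote[], pricePerSecond: number): number {
  return jobs.reduce((sum, j) => sum + j.maxDuration, 0) * pricePerSecond;
}

// After all jobs finish: the provider withdraws payment for the time
// actually spent, and the difference is released back to the customer.
function settlement(jobs: JobQuote[], pricePerSecond: number) {
  const locked = lockAmount(jobs, pricePerSecond);
  const spent =
    jobs.reduce((sum, j) => sum + Math.min(j.actualDuration, j.maxDuration), 0) *
    pricePerSecond;
  return { payout: spent, refund: locked - spent };
}
```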
### C2D Engines
A C2D Engine is a piece of code that handles C2D jobs running on a specific orchestration implementation. This document focuses on internal compute engines: Docker-based (host with Docker environment installed) and Kubernetes-based (if ocean-node runs inside a Kubernetes cluster).
An engine that uses external services (like Bacalhau) follows the same logic but will likely interact with remote APIs.
An engine is responsible for:
* Storing workflows and each job's status (so that on restart, the engine can resume or continue running flows)
* Queueing new jobs
#### Docker Engine
This module requires the Docker service to be installed at the host level. It leverages the Docker API to:
* Create job volumes (with quotas)
* Start the provisioning container (pod-configuration)
* Monitor its status
* Create YAML for algorithms with hardware constraints (CPU, RAM)
* Pass devices for GPU environments
* Start the algorithm container
* Monitor algorithm health & timeout constraints
* Stop the algorithm if the quota is exceeded
* Start the publishing container
* Delete job volumes
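The monitoring loop in the list above boils down to a small decision: keep waiting, stop the container because the quota is exceeded, or move on to publishing once the algorithm exits. A minimal sketch of that decision, with illustrative names (the real engine drives the Docker API for the surrounding create/start/inspect/stop calls):

```typescript
type EngineAction = "wait" | "stopContainer" | "startPublishing";

interface RunningJob {
  startedAt: number;   // epoch ms when the algorithm container started
  maxDuration: number; // seconds, from the compute environment / payment
  exited: boolean;     // as reported by a container inspect call
}

// Decide what the engine should do on each monitoring tick.
function nextAction(job: RunningJob, now: number): EngineAction {
  if (job.exited) return "startPublishing"; // algo finished: publish results
  const elapsed = (now - job.startedAt) / 1000;
  if (elapsed > job.maxDuration) return "stopContainer"; // quota exceeded
  return "wait";
}
```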
```
title C2Dv2 message flow for docker module
User -> Ocean-node: start c2d job
Ocean-node -> Orchestration-class: start c2d job
Orchestration-class -> Orchestration-class: determine module, insert workflow & random private key in db
Orchestration-class -> Docker-engine: queue job
Docker-engine -> Docker_host_api: create job volume
Docker-engine -> Docker-engine: create yaml for pod-configuration, set private key
Docker-engine -> Docker_host_api: start pod-configuration
Pod_configuration -> Pod_configuration: starts ocean-node as pod-config
Pod_configuration -> Ocean-node: call c2dJobProvision
Ocean-node -> Pod_configuration: return workflow
Pod_configuration -> Pod_configuration : download inputs & algo
Pod_configuration -> Ocean-node: call c2dJobStatusUpdate
Ocean-node -> Docker-engine: download success, start algo
Docker-engine -> Docker-engine: create yaml for algo
Docker-engine -> Docker_host_api: start algo container
Docker-engine -> Docker-engine: monitor algo container, stop if timeout
Docker-engine -> Docker-engine: create yaml for pod-publishing, set private key
Docker-engine -> Docker_host_api: start pod-publishing
Docker_host_api -> Pod-Publishing: start as docker container
Pod-Publishing -> Pod-Publishing : prepare output
Pod-Publishing -> Ocean-node: call c2dJobPublishResult
Pod-Publishing -> Ocean-node: call c2dJobStatusUpdate
```
<figure><img src="../.gitbook/assets/image.png" alt=""><figcaption><p>C2Dv2 flow diagram</p></figcaption></figure>
#### Kubernetes Engine
This module requires access to Kubernetes credentials (or autodetects them if ocean-node already runs in a Kubernetes cluster). It leverages the Kubernetes API to:
* Create job volumes (with quotas)
* Start the provisioning container (pod-configuration)
* Monitor its status
* Create YAML for algorithms with hardware constraints (CPU, RAM)
* Pass devices for GPU environments
* Start the algorithm container
* Monitor algorithm health & timeout constraints
* Stop the algorithm if the quota is exceeded
* Start the publishing container
* Delete job volumes
#### POD-\* Common Description
For efficient communication between ocean-node and the two containers, the easiest approach is to use the p2p/HTTP API. Thus, all pod-\* instances will run an ocean-node instance (each with a job-generated random key) and connect to the main ocean-node instance. The main ocean-node instance's peerNodeId or HTTP API endpoint will be inserted in the YAML, as will the private key each pod-\* uses.
Each YAML of pod-\* will contain the following environment variables:
* nodePeerId
* nodeHttpApi
* privateKey
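These three variables are all a pod-\* container needs to call back into the main node. A minimal sketch of reading them at entrypoint start (the validation and return shape are assumptions for illustration):

```typescript
interface PodConfig {
  nodePeerId: string;  // peer id of the main ocean-node instance
  nodeHttpApi: string; // HTTP API endpoint of the main ocean-node instance
  privateKey: string;  // job-generated key this pod signs its calls with
}

// Read and validate the pod-* environment injected via the job YAML.
function readPodConfig(env: Record<string, string | undefined>): PodConfig {
  const { nodePeerId, nodeHttpApi, privateKey } = env;
  if (!nodePeerId || !nodeHttpApi || !privateKey) {
    throw new Error("pod-* container started without required environment variables");
  }
  return { nodePeerId, nodeHttpApi, privateKey };
}
```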
#### Pod-configuration
Previously, pod-configuration was a standalone repository built as a Docker image. In this implementation, it will be ocean-node with a different entrypoint (entry\_configuration.js).
Implementation:
* Call ocean-node/c2dJobProvision and get the workflow's input section
* Download all assets
* Call the ocean-node/c2dJobStatusUpdate core command to update status (provision finished or errors)
#### Pod-publishing
Previously, pod-publishing was a standalone repository built as a Docker image. In this implementation, it will be ocean-node with a different entrypoint (entry\_publishing.js).
Implementation:
* Read the output folder
* If multiple files or folders are detected, create a zip with all those files/folders
* Call the ocean-node/c2dJobPublishResult core command and let ocean-node handle storage
* Call the ocean-node/c2dJobStatusUpdate core command to update the job as done


@@ -1,5 +1,5 @@
---
-description: Monetise your data while preserving privacy
+description: Compute to data version 2 (C2dv2)
---
# Compute to data
@@ -29,14 +29,14 @@ We suggest reading these guides to get an understanding of how compute-to-data w
### User Guides
-* [How to write compute to data algorithms](broken-reference)
-* [How to publish a compute-to-data algorithm](broken-reference)
-* [How to publish a dataset for compute to data](broken-reference)
+* [How to write compute to data algorithms](broken-reference/)
+* [How to publish a compute-to-data algorithm](broken-reference/)
+* [How to publish a dataset for compute to data](broken-reference/)
### Developer Guides
* [How to use compute to data with ocean.js](../ocean.js/cod-asset.md)
-* [How to use compute to data with ocean.py](../../data-scientists/ocean.py/)
+* [How to use compute to data with ocean.py](../../data-scientists/ocean.py)
### Infrastructure Deployment Guides


@@ -14,7 +14,7 @@ The Node stack is divided into the following layers:
* Components layer (Indexer, Provider)
* Modules layer
-<figure><img src="../../.gitbook/assets/image.png" alt=""><figcaption><p>Ocean Node Infrastructure diagram</p></figcaption></figure>
+<figure><img src="../../.gitbook/assets/image (1).png" alt=""><figcaption><p>Ocean Node Infrastructure diagram</p></figcaption></figure>
### Features