GITBOOK-7: No subject

2024-11-26 19:49:26 +01:00 · 2024-06-18 10:45:45 +00:00 · 2024-06-18 10:45:45 +00:00 · 3b894df0d4
commit 3b894df0d4
parent 7d29546d49
6 changed files with 185 additions and 6 deletions
--- a/.gitbook/assets/image
+++ b/.gitbook/assets/image
--- a/.gitbook/assets/image.png
+++ b/.gitbook/assets/image.png
--- a/SUMMARY.md
+++ b/SUMMARY.md
@ -27,6 +27,7 @@
  * [Architecture Overview](developers/architecture.md)
  * [Ocean Node](developers/ocean-node/README.md)
    * [Node Architecture](developers/ocean-node/node-architecture.md)
  * [Compute-to-data (C2D)](developers/compute-to-data-c2d.md)
  * [Page 1](developers/page-1.md)
  * [Contracts](developers/contracts/README.md)
    * [Data NFTs](developers/contracts/data-nfts.md)
--- a/developers/compute-to-data-c2d.md
+++ b/developers/compute-to-data-c2d.md
@ -0,0 +1,178 @@
 # Compute-to-data (C2D)
 C2Dv2 continues the concept of bringing algorithms to the data, allowing both public and private datasets to be used with algorithms. While previous versions relied on external components (Provider -> Operator Service running in Kubernetes -> multiple Operator-Engines each running in their own Kubernetes namespace), C2Dv2 is embedded entirely within the ocean-node.
 It has a modular approach, allowing multiple compute engines to be connected to the same ocean-node instance. These compute engines can be internal (Docker, or Kubernetes if ocean-node runs in a Kubernetes environment) or external (future integration with projects such as Bacalhau or iExec is possible).
 ### Additional Features
 * Allow multiple C2D engines to connect to the same ocean-node
 * Support multiple jobs (stages) in a workflow
 * Jobs can be dependent or independent of previous stages, allowing for parallel or serial job execution
 ### Workflows
 A workflow defines one or more jobs to be executed. Each job may have dependencies on a previous job.
 ```json
 [
  {
    "index": "number",
    "jobId": "generated by orchestrator",
    "runAfter": "if defined, wait for specific jobId to finish",
    "input": [
      {
        "index": "number",
        "did": "optional",
        "serviceId": "optional",
        "files": "filesObject, optional"
      }
    ],
    "algorithm": {
      "did": "optional",
      "serviceId": "optional",
      "files": "filesObject, optional",
      "rawcode": "optional",
      "container": {
        "entrypoint": "string",
        "image": "string",
        "tag": "string"
      }
    }
  }
 ]
 ```
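As a rough illustration of how `runAfter` could drive execution order, the Python sketch below groups jobs into waves: jobs in the same wave have no unmet dependency and could run in parallel, while later waves run serially after them. It assumes `runAfter` holds a single `jobId`, as the schema above suggests. The function name `plan_waves` and the sample job IDs are hypothetical and not part of ocean-node.

```python
from typing import Dict, List, Optional


def plan_waves(workflow: List[dict]) -> List[List[str]]:
    """Group job IDs into waves; each wave only depends on earlier waves."""
    # Map each jobId to the jobId it must wait for (None if independent).
    waits_for: Dict[str, Optional[str]] = {
        job["jobId"]: job.get("runAfter") for job in workflow
    }
    for job_id, dep in waits_for.items():
        if dep is not None and dep not in waits_for:
            raise ValueError(f"{job_id} waits for unknown job {dep}")

    done: set = set()
    waves: List[List[str]] = []
    while len(done) < len(waits_for):
        # A job is ready when it has no dependency or its dependency finished.
        ready = [
            job_id
            for job_id, dep in waits_for.items()
            if job_id not in done and (dep is None or dep in done)
        ]
        if not ready:
            raise ValueError("circular runAfter dependency")
        waves.append(ready)
        done.update(ready)
    return waves


# Hypothetical workflow: two independent jobs, then one that waits for "job-a".
example = [
    {"index": 0, "jobId": "job-a"},
    {"index": 1, "jobId": "job-b"},
    {"index": 2, "jobId": "job-c", "runAfter": "job-a"},
]
waves = plan_waves(example)
```

Here `job-a` and `job-b` land in the first wave and `job-c` in the second. A real orchestrator would more likely start each job as soon as its own dependency finishes rather than wait for a whole wave.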
 ### Orchestration Layer
 Formerly known as the "operator-service," this layer handles interactions between the ocean-node core layer and different execution environments.
 In summary, it should:
 * Expose a list of compute environments for all engines
 * Expose a list of running jobs and limits (e.g., max concurrent jobs)
 * Take on new jobs (created by the startJob core handler)
 * Determine which module to use (Docker, Kubernetes, Bacalhau, etc.)
 * Insert workflow into the database
 * Signal the engine handler to take over job execution
 * Read workflow status when the C2D `getStatus` core command is called
 * Serve job results when the C2D `getJobResult` core command is called
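The responsibilities listed above can be sketched as a small in-memory Python class. Everything here is an assumption made for illustration: the names `Orchestrator`, `Engine`, `start_job` and `get_status`, the environment IDs, selecting the engine by environment, and the dict that stands in for the database. Treating the queue length as the concurrency limit is also a simplification.

```python
import uuid
from typing import Dict, List


class Engine:
    """Hypothetical engine: owns some compute environments and a job queue."""

    def __init__(self, name: str, environments: List[str], max_jobs: int):
        self.name = name
        self.environments = environments
        self.max_jobs = max_jobs
        self.queue: List[str] = []

    def queue_job(self, job_id: str) -> None:
        self.queue.append(job_id)


class Orchestrator:
    def __init__(self, engines: List[Engine]):
        self.engines = engines
        self.db: Dict[str, dict] = {}  # stand-in for the real database

    def list_environments(self) -> List[str]:
        # Expose the compute environments of all engines.
        return [env for e in self.engines for env in e.environments]

    def start_job(self, environment: str, workflow: list) -> str:
        # Determine which module serves the requested environment.
        engine = next(
            (e for e in self.engines if environment in e.environments), None
        )
        if engine is None:
            raise ValueError(f"unknown environment {environment}")
        if len(engine.queue) >= engine.max_jobs:
            raise RuntimeError(f"{engine.name} is at its job limit")
        job_id = uuid.uuid4().hex
        # Insert the workflow into the database, then hand over to the engine.
        self.db[job_id] = {
            "engine": engine.name,
            "workflow": workflow,
            "status": "queued",
        }
        engine.queue_job(job_id)
        return job_id

    def get_status(self, job_id: str) -> str:
        return self.db[job_id]["status"]


orchestrator = Orchestrator([
    Engine("docker", ["docker-cpu"], max_jobs=2),
    Engine("kubernetes", ["k8s-gpu"], max_jobs=4),
])
job_id = orchestrator.start_job("k8s-gpu", workflow=[])
```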
 Due to technical constraints, both internal modules (Docker and Kubernetes) will use Docker images for data provisioning (previously pod-configuration) and results publishing (previously pod-publishing). The orchestration layer will also expose two new core commands:
 * `c2dJobStatusUpdate` (called by both pod-configuration and pod-publishing to update job status)
 * `c2dJobPublishResult` (called by pod-publishing when results need to be uploaded)
 When any pod-\* calls one of these endpoints, we must verify the signature and respond accordingly.
 ### Payment Flow in Orchestration
 Payment will be based on an escrow contract. The orchestrator will:
 * Compute the sum of maxDuration from all jobs in the workflow
 * Calculate the required fee (depending on the previous step, token, environment, etc.)
 * Lock the amount in the escrow contract
 * Wait until all jobs are finished (successfully or not)
 * Calculate the actual duration spent
 * Compute proof
 * Withdraw payment & provide proof, releasing the difference back to the customer
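The arithmetic behind these steps can be illustrated with a short Python sketch. It assumes a flat price per second and integer token units, neither of which is stated above, and it leaves out proof computation and the escrow contract calls entirely. The function names and sample numbers are hypothetical.

```python
from typing import List, Tuple


def amount_to_lock(max_durations: List[int], price_per_second: int) -> int:
    """Fee locked in escrow up front: price times the summed maxDuration."""
    return sum(max_durations) * price_per_second


def settle(locked: int, actual_seconds: int, price_per_second: int) -> Tuple[int, int]:
    """Return (payment to the node, refund to the customer)."""
    # Never charge more than was locked, even if a job overran.
    payment = min(actual_seconds * price_per_second, locked)
    return payment, locked - payment


# Hypothetical workflow with three jobs and a price of 2 token units per second.
locked = amount_to_lock([600, 300, 100], price_per_second=2)
payment, refund = settle(locked, actual_seconds=450, price_per_second=2)
```

With these numbers, 2000 units are locked for a summed `maxDuration` of 1000 seconds. After 450 seconds of actual compute, 900 units are withdrawn and 1100 units are released back to the customer.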
 ### C2D Engines
 A C2D Engine is a piece of code that handles C2D jobs running on a specific orchestration implementation. This document focuses on internal compute engines: Docker-based (host with Docker environment installed) and Kubernetes-based (if ocean-node runs inside a Kubernetes cluster).
 An engine that uses external services (like Bacalhau) follows the same logic but will likely interact with remote APIs.
 An engine is responsible for:
 * Storing workflows and each job's status (so that, on restart, running flows can be resumed)
 * Queueing new jobs
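One way to picture the storage responsibility is a store that writes every status change to disk, so that a restarted engine can find the jobs it still has to run. The Python sketch below is a minimal illustration under assumptions: the `JobStore` class, the JSON file, and the status names (`running`, `finished`, `failed`) are all hypothetical.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, List


class JobStore:
    """Persist job statuses to a JSON file so a restarted engine can resume."""

    def __init__(self, path: Path):
        self.path = path
        self.jobs: Dict[str, str] = (
            json.loads(path.read_text()) if path.exists() else {}
        )

    def set_status(self, job_id: str, status: str) -> None:
        self.jobs[job_id] = status
        self.path.write_text(json.dumps(self.jobs))

    def unfinished(self) -> List[str]:
        # Jobs to re-queue after a restart.
        return [j for j, s in self.jobs.items() if s not in ("finished", "failed")]


with tempfile.TemporaryDirectory() as tmp:
    state_file = Path(tmp) / "jobs.json"
    store = JobStore(state_file)
    store.set_status("job-a", "finished")
    store.set_status("job-b", "running")
    # Simulate a restart by loading a new store from the same file.
    resumed = JobStore(state_file).unfinished()
```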
 #### Docker Engine
 This module requires the Docker service to be installed at the host level. It leverages the Docker API to:
 * Create job volumes (with quotas)
 * Start the provisioning container (pod-configuration)
 * Monitor its status
 * Create YAML for algorithms with hardware constraints (CPU, RAM)
 * Pass devices for GPU environments
 * Start the algorithm container
 * Monitor algorithm health & timeout constraints
 * Stop the algorithm if the quota is exceeded
 * Start the publishing container
 * Delete job volumes
 ```
 title C2Dv2 message flow for docker module
 User -> Ocean-node: start c2d job
 Ocean-node -> Orchestration-class: start c2d job
 Orchestration-class -> Orchestration-class: determine module and insert workflow, random private key in db
 Orchestration-class -> Docker-engine: queue job
 Docker-engine -> Docker_host_api:  create job volume
 Docker-engine -> Docker-engine: create yaml for pod-configuration, set private key
 Docker-engine -> Docker_host_api: start pod-configuration
 Pod_configuration -> Pod_configuration: starts ocean-node as pod-config
 Pod_configuration -> Ocean-node: call c2dJobProvision
 Ocean-node -> Pod_configuration: return workflow
 Pod_configuration -> Pod_configuration : download inputs & algo
 Pod_configuration -> Ocean-node: call c2dJobStatusUpdate
 Ocean-node -> Docker-engine: download success, start algo
 Docker-engine -> Docker-engine: create yaml for algo
 Docker-engine -> Docker_host_api: start algo container
 Docker-engine -> Docker-engine: monitor algo container, stop if timeout
 Docker-engine -> Docker-engine: create yaml for pod-publishing, set private key
 Docker-engine -> Docker_host_api: start pod-publishing
 Docker_host_api -> Pod-Publishing: start as docker container
 Pod-Publishing -> Pod-Publishing : prepare output
 Pod-Publishing -> Ocean-node: call c2dJobPublishResult
 Pod-Publishing -> Ocean-node: call c2dJobStatusUpdate
 ```
 <figure><img src="../.gitbook/assets/image.png" alt=""><figcaption><p>C2Dv2 flow diagram</p></figcaption></figure>
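The stage order in the message flow above can be sketched in Python against a fake host API. This is an illustration only: `FakeDockerHost` records calls instead of talking to Docker, the timeout is counted in polls rather than seconds, and provisioning is modeled as a blocking call even though the real pod-configuration reports back through `c2dJobStatusUpdate`. Publishing after a timeout follows the order shown in the diagram. All class and method names are hypothetical.

```python
from typing import List


class FakeDockerHost:
    """Stand-in for the Docker host API; records calls instead of running containers."""

    def __init__(self, algo_polls_needed: int):
        self.calls: List[str] = []
        self._polls_left = algo_polls_needed

    def create_volume(self, job_id: str) -> None:
        self.calls.append("create_volume")

    def run_to_completion(self, container: str) -> None:
        self.calls.append(f"run:{container}")

    def start(self, container: str) -> None:
        self.calls.append(f"start:{container}")

    def is_running(self, container: str) -> bool:
        self._polls_left -= 1
        return self._polls_left > 0

    def stop(self, container: str) -> None:
        self.calls.append(f"stop:{container}")

    def delete_volume(self, job_id: str) -> None:
        self.calls.append("delete_volume")


def run_job(host: FakeDockerHost, job_id: str, max_polls: int) -> str:
    """Walk one job through provisioning, algorithm, publishing and cleanup."""
    host.create_volume(job_id)
    try:
        host.run_to_completion("pod-configuration")
        host.start("algorithm")
        status = "finished"
        polls = 0
        while host.is_running("algorithm"):
            polls += 1
            if polls >= max_polls:
                # Timeout reached: stop the algorithm but still publish what exists.
                host.stop("algorithm")
                status = "timeout"
                break
        host.run_to_completion("pod-publishing")
        return status
    finally:
        # Volumes are removed even if a stage raised.
        host.delete_volume(job_id)


host = FakeDockerHost(algo_polls_needed=3)
status = run_job(host, "job-1", max_polls=10)
```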
 #### Kubernetes Engine
 This module requires access to Kubernetes credentials (or autodetects them if ocean-node already runs in a Kubernetes cluster). It leverages the Kubernetes API to:
 * Create job volumes (with quotas)
 * Start the provisioning container (pod-configuration)
 * Monitor its status
 * Create YAML for algorithms with hardware constraints (CPU, RAM)
 * Pass devices for GPU environments
 * Start the algorithm container
 * Monitor algorithm health & timeout constraints
 * Stop the algorithm if the quota is exceeded
 * Start the publishing container
 * Delete job volumes
 #### POD-\* Common Description
 For efficient communication between ocean-node and the two containers, the easiest approach is to use the P2P or HTTP API. Thus, every pod-\* instance will run an ocean-node instance and connect to the main ocean-node instance. The main instance's peerNodeId or HTTP API endpoint will be inserted in the YAML. Each pod-\* will use a random private key generated for the job, also exposed in the YAML.
 Each YAML of pod-\* will contain the following environment variables:
 * nodePeerId
 * nodeHttpApi
 * privateKey
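A minimal Python sketch of assembling these variables for one container follows. The three variable names come from the list above. The key format (32 random bytes, hex-encoded with a `0x` prefix), the function name `build_pod_env`, and the sample peer ID and endpoint are assumptions.

```python
import secrets
from typing import Dict


def build_pod_env(node_peer_id: str, node_http_api: str) -> Dict[str, str]:
    """Environment variables for one pod-* container of a job."""
    return {
        "nodePeerId": node_peer_id,
        "nodeHttpApi": node_http_api,
        # Fresh random 32-byte key per call, hex-encoded with a 0x prefix.
        "privateKey": "0x" + secrets.token_hex(32),
    }


# Hypothetical peer ID and endpoint of the main ocean-node instance.
env = build_pod_env("example-peer-id", "http://ocean-node:8000")
```

Because the message flow stores the random key in the database before the containers start, a real implementation would presumably generate the key once per job and reuse it rather than create a new one per container.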
 #### Pod-configuration
 Previously, pod-configuration was a standalone repository built as a Docker image. In this implementation, it will be ocean-node with a different entrypoint (entry\_configuration.js).
 Implementation:
 * Call ocean-node/c2dJobProvision and get the workflow's input section
 * Download all assets
 * Call the ocean-node/c2dJobStatusUpdate core command to update status (provision finished or errors)
 #### Pod-publishing
 Previously, pod-publishing was a standalone repository built as a Docker image. In this implementation, it will be ocean-node with a different entrypoint (entry\_publishing.js).
 Implementation:
 * Read the output folder
 * If multiple files or folders are detected, create a zip with all those files/folders
 * Call the ocean-node/c2dJobPublishResult core command and let ocean-node handle storage
 * Call the ocean-node/c2dJobStatusUpdate core command to update the job as done
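The first two steps can be sketched in Python with the standard `zipfile` module. The function name `prepare_output` and the file names are hypothetical, and the archive path is assumed to lie outside the output folder so that the archive does not include itself. The calls to `c2dJobPublishResult` and `c2dJobStatusUpdate` are not shown.

```python
import tempfile
import zipfile
from pathlib import Path


def prepare_output(output_dir: Path, archive_path: Path) -> Path:
    """Return the single file to upload, zipping the folder when needed."""
    entries = sorted(output_dir.iterdir())
    if len(entries) == 1 and entries[0].is_file():
        # A single file is published as-is.
        return entries[0]
    # Multiple files or any folder: bundle everything, keeping relative paths.
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(output_dir.rglob("*")):
            if path.is_file():
                archive.write(path, path.relative_to(output_dir).as_posix())
    return archive_path


# Usage with a throwaway directory holding two hypothetical result files.
with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "outputs"
    out.mkdir()
    (out / "model.bin").write_text("weights")
    (out / "log.txt").write_text("done")
    result = prepare_output(out, Path(tmp) / "results.zip")
    with zipfile.ZipFile(result) as archive:
        names = sorted(archive.namelist())
```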
--- a/developers/compute-to-data/README.md
+++ b/developers/compute-to-data/README.md
@ -1,5 +1,5 @@
 ---
-description: Monetise your data while preserving privacy
+description: Compute to data version 2 (C2Dv2)
 ---
 # Compute to data
@ -29,14 +29,14 @@ We suggest reading these guides to get an understanding of how compute-to-data w
 ### User Guides
-* [How to write compute to data algorithms](broken-reference)
+* [How to write compute to data algorithms](broken-reference/)
-* [How to publish a compute-to-data algorithm](broken-reference)
+* [How to publish a compute-to-data algorithm](broken-reference/)
-* [How to publish a dataset for compute to data](broken-reference)
+* [How to publish a dataset for compute to data](broken-reference/)
 ### Developer Guides
 * [How to use compute to data with ocean.js](../ocean.js/cod-asset.md)
-* [How to use compute to data with ocean.py](../../data-scientists/ocean.py/)
+* [How to use compute to data with ocean.py](../../data-scientists/ocean.py)
 ### Infrastructure Deployment Guides
--- a/developers/ocean-node/node-architecture.md
+++ b/developers/ocean-node/node-architecture.md
@ -14,7 +14,7 @@ The Node stack is divided into the following layers:
 * Components layer (Indexer, Provider)
 * Modules layer
-<figure><img src="../../.gitbook/assets/image.png" alt=""><figcaption><p>Ocean Node Infrastructure diagram</p></figcaption></figure>
+<figure><img src="../../.gitbook/assets/image (1).png" alt=""><figcaption><p>Ocean Node Infrastructure diagram</p></figcaption></figure>
 ### Features