C2Dv2 continues the concept of bringing algorithms to the data, allowing both public and private datasets to be used with algorithms. While previous versions relied on external components (Provider -> Operator Service running in Kubernetes -> multiple Operator-Engines each running in their own Kubernetes namespace), C2Dv2 is embedded entirely within the ocean-node.
It has a modular approach, allowing multiple compute engines to be connected to the same ocean-node. These compute engines can be internal (Docker, or Kubernetes if ocean-node runs in a Kubernetes environment) or external (in the future, integration with projects like Bacalhau, iExec, etc., is possible).
### Additional Features
* Allow multiple C2D engines to connect to the same ocean-node
* Support multiple jobs (stages) in a workflow
* Jobs can be dependent or independent of previous stages, allowing for parallel or serial job execution
### Workflows
A workflow defines one or more jobs to be executed. Each job may have dependencies on a previous job.
```json
[
  {
    "index": number,
    "jobId": "generated by orchestrator",
    "runAfter": "if defined, wait for specific jobId to finish",
    "input": [
      {
        "index": number,
        "did": "optional",
        "serviceId": "optional",
        "files": "filesObject, optional"
      }
    ],
    "algorithm": {
      "did": "optional",
      "serviceId": "optional",
      "files": "filesObject, optional",
      "rawcode": "optional",
      "container": {
        "entrypoint": "string",
        "image": "string",
        "tag": "string"
      }
    }
  }
]
```
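The schema above can be expressed as TypeScript types, together with a small helper that orders jobs by their `runAfter` dependencies (independent jobs keep their relative order and could run in parallel). The type and function names here are illustrative sketches, not ocean-node's actual definitions.

```typescript
interface ComputeInput {
  index: number;
  did?: string;
  serviceId?: string;
  files?: unknown; // filesObject
}

interface ComputeJob {
  index: number;
  jobId: string;     // generated by the orchestrator
  runAfter?: string; // jobId this job waits for, if defined
  input: ComputeInput[];
  algorithm: {
    did?: string;
    serviceId?: string;
    files?: unknown;
    rawcode?: string;
    container?: { entrypoint: string; image: string; tag: string };
  };
}

// Order jobs so every job comes after the job named in its runAfter field.
// Throws on cyclic or unknown dependencies.
function orderJobs(jobs: ComputeJob[]): ComputeJob[] {
  const done = new Set<string>();
  const ordered: ComputeJob[] = [];
  let remaining = [...jobs];
  while (remaining.length > 0) {
    const ready = remaining.filter((j) => !j.runAfter || done.has(j.runAfter));
    if (ready.length === 0) throw new Error('cyclic or unknown runAfter');
    for (const j of ready) {
      ordered.push(j);
      done.add(j.jobId);
    }
    remaining = remaining.filter((j) => !done.has(j.jobId));
  }
  return ordered;
}
```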
### Orchestration Layer
Formerly known as the "operator-service," this layer handles interactions between the ocean-node core layer and different execution environments.
In summary, it should:
* Expose a list of compute environments for all engines
* Expose a list of running jobs and limits (e.g., max concurrent jobs)
* Take on new jobs (created by the startJob core handler)
* Determine which module to use (Docker, Kubernetes, Bacalhau, etc.)
* Insert workflow into the database
* Signal the engine handler to take over job execution
* Read workflow status when the C2D getStatus core command is called
* Serve job results when the C2D getJobResult core command is called
Due to technical constraints, both internal modules (Docker and Kubernetes) will use Docker images for data provisioning (previously pod-configuration) and results publishing (previously pod-publishing). The orchestration layer will also expose two new core commands:
* `c2dJobStatusUpdate` (called by both pod-configuration and pod-publishing to update job status)
* `c2dJobPublishResult` (called by pod-publishing when results need to be uploaded)
When any pod-\* calls one of these endpoints, we must verify the signature and respond accordingly.
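The signature check could look like the sketch below. The payload shape and field names are assumptions, and the actual ocean-node keys are likely secp256k1 (Ethereum-style); Ed25519 via `node:crypto` is used here only to keep the sketch dependency-free.

```typescript
import { generateKeyPairSync, sign, verify, KeyObject } from 'node:crypto';

// Hypothetical payload for c2dJobStatusUpdate; real field names may differ.
interface JobStatusUpdate {
  jobId: string;
  status: string;    // e.g. 'provisioningFinished' | 'error'
  signature: Buffer; // signature over `${jobId}:${status}`
}

// Called by a pod-* container, which holds the job's private key.
function signUpdate(jobId: string, status: string, privateKey: KeyObject): JobStatusUpdate {
  const message = Buffer.from(`${jobId}:${status}`);
  return { jobId, status, signature: sign(null, message, privateKey) };
}

// The orchestrator looks up the public key it stored for this job and
// rejects any update whose signature does not match the payload.
function verifyUpdate(update: JobStatusUpdate, publicKey: KeyObject): boolean {
  const message = Buffer.from(`${update.jobId}:${update.status}`);
  return verify(null, message, publicKey, update.signature);
}
```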
### Payment Flow in Orchestration
This will be based on an escrow contract. The orchestrator will:
* Compute the sum of maxDuration from all jobs in the workflow
* Calculate the required fee (depending on the previous step, token, environment, etc.)
* Lock the amount in the escrow contract
* Wait until all jobs are finished (successfully or not)
* Calculate the actual duration spent
* Compute proof
* Withdraw payment & provide proof, releasing the difference back to the customer
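The escrow arithmetic in the steps above can be sketched as follows. The names (`maxDuration`, `pricePerSecond`) and the flat per-second pricing are assumptions for illustration; the real fee depends on token, environment, and the actual contract interface.

```typescript
interface JobQuote {
  maxDuration: number; // seconds, per job in the workflow
}

// Step 1-3: sum maxDuration over all jobs and lock fee = duration * price.
function computeLockAmount(jobs: JobQuote[], pricePerSecond: bigint): bigint {
  const totalSeconds = jobs.reduce((sum, j) => sum + j.maxDuration, 0);
  return pricePerSecond * BigInt(totalSeconds);
}

// Steps 5-7: after all jobs finish, charge only for time actually used
// and release the remainder back to the customer.
function settle(
  locked: bigint,
  actualSeconds: number,
  pricePerSecond: bigint
): { payment: bigint; refund: bigint } {
  const charged = pricePerSecond * BigInt(actualSeconds);
  const payment = charged < locked ? charged : locked; // never exceed the lock
  return { payment, refund: locked - payment };
}
```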
### C2D Engines
A C2D Engine is a piece of code that handles C2D jobs running on a specific orchestration implementation. This document focuses on internal compute engines: Docker-based (host with Docker environment installed) and Kubernetes-based (if ocean-node runs inside a Kubernetes cluster).
An engine that uses external services (like Bacalhau) follows the same logic but will likely interact with remote APIs.
An engine is responsible for:
* Storing workflows and each job status (so on restart, we can resume or continue running flows)
* Queueing new jobs
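A minimal sketch of that engine-side bookkeeping: a job-status map that would in practice be backed by the node's database, so unfinished jobs can be resumed after a restart. The class and state names are illustrative, not ocean-node's actual types.

```typescript
type JobState = 'queued' | 'running' | 'finished' | 'failed';

class JobQueue {
  private jobs = new Map<string, JobState>();

  // Queue a new job; enqueueing an existing job is a no-op.
  enqueue(jobId: string): void {
    if (!this.jobs.has(jobId)) this.jobs.set(jobId, 'queued');
  }

  setState(jobId: string, state: JobState): void {
    if (!this.jobs.has(jobId)) throw new Error(`unknown job ${jobId}`);
    this.jobs.set(jobId, state);
  }

  // On restart, anything not yet finished or failed must be resumed.
  pendingAfterRestart(): string[] {
    return [...this.jobs]
      .filter(([, s]) => s === 'queued' || s === 'running')
      .map(([id]) => id);
  }
}
```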
#### Docker Engine
This module requires Docker service installed at the host level. It leverages the Docker API to:
* Create job volumes (with quotas)
* Start the provisioning container (pod-configuration)
* Monitor its status
* Create YAML for algorithms with hardware constraints (CPU, RAM)
* Pass devices for GPU environments
* Start the algorithm container
* Monitor algorithm health & timeout constraints
* Stop the algorithm if the quota is exceeded
* Start the publishing container
* Delete job volumes
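Translating the hardware constraints above into a container spec might look like this. The `Memory`, `NanoCpus`, and `DeviceRequests` fields are from the Docker Engine API; the surrounding helper and its parameters are an illustrative sketch, not ocean-node's actual code.

```typescript
interface HardwareConstraints {
  cpus: number;  // e.g. 1.5 means one and a half CPUs
  ramGb: number; // RAM limit in gigabytes
  gpu?: boolean; // request GPU devices (GPU environments)
}

function buildContainerSpec(
  image: string,
  tag: string,
  entrypoint: string,
  hw: HardwareConstraints
): Record<string, unknown> {
  const hostConfig: Record<string, unknown> = {
    Memory: hw.ramGb * 1024 ** 3,        // Docker expects bytes
    NanoCpus: Math.round(hw.cpus * 1e9)  // 1e9 nano-CPUs = one full CPU
  };
  if (hw.gpu) {
    // Pass all available NVIDIA GPUs through to the container.
    hostConfig.DeviceRequests = [
      { Driver: 'nvidia', Count: -1, Capabilities: [['gpu']] }
    ];
  }
  return {
    Image: `${image}:${tag}`,
    Entrypoint: [entrypoint],
    HostConfig: hostConfig
  };
}
```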
```
title C2Dv2 message flow for docker module
User -> Ocean-node: start c2d job
Ocean-node -> Orchestration-class: start c2d job
Orchestration-class -> Orchestration-class: determine module and insert workflow, random private key in db
```

#### Kubernetes Engine
This module requires access to Kubernetes credentials (or autodetects them if ocean-node already runs in a Kubernetes cluster). It leverages the Kubernetes API to:
* Create job volumes (with quotas)
* Start the provisioning container (pod-configuration)
* Monitor its status
* Create YAML for algorithms with hardware constraints (CPU, RAM)
* Pass devices for GPU environments
* Start the algorithm container
* Monitor algorithm health & timeout constraints
* Stop the algorithm if the quota is exceeded
* Start the publishing container
* Delete job volumes
#### POD-\* Common Description
For efficient communication between ocean-node and the two containers, the simplest approach is to reuse the p2p/HTTP API. Each pod-\* instance will therefore run an ocean-node instance with a job-generated random key and connect to the main ocean-node instance. The main instance's peerNodeId or HTTP API endpoint is inserted into the YAML, along with the private key each pod-\* uses.
Each YAML of pod-\* will contain the following environment variables:
* nodePeerId
* nodeHttpApi
* privateKey
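A sketch of generating that environment block for a pod-\* YAML. The variable names come from the list above; the helper itself and the Kubernetes-style `{ name, value }` entry shape are assumptions for illustration.

```typescript
import { randomBytes } from 'node:crypto';

// Build the env section injected into a pod-* container spec.
// The orchestrator generates a fresh random private key per job.
function podEnvironment(
  nodePeerId: string,
  nodeHttpApi: string
): { name: string; value: string }[] {
  return [
    { name: 'nodePeerId', value: nodePeerId },
    { name: 'nodeHttpApi', value: nodeHttpApi },
    { name: 'privateKey', value: randomBytes(32).toString('hex') }
  ];
}
```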
#### Pod-configuration
Previously, pod-configuration was a standalone repository built as a Docker image. In this implementation, it will be ocean-node with a different entrypoint (entry\_configuration.js).
Implementation:
* Call ocean-node/c2dJobProvision and get the workflow's input section
* Download all assets
* Call the ocean-node/c2dJobStatusUpdate core command to update status (provision finished or errors)
#### Pod-publishing
Previously, pod-publishing was a standalone repository built as a Docker image. In this implementation, it will be ocean-node with a different entrypoint (entry\_publishing.js).
Implementation:
* Read the output folder
* If multiple files or folders are detected, create a zip with all those files/folders
* Call the ocean-node/c2dJobPublishResult core command and let ocean-node handle storage
* Call the ocean-node/c2dJobStatusUpdate core command to update the job as done
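The zip-or-not decision in the first two steps can be sketched as a small helper. The function name and the `results.zip` archive name are hypothetical; only the rule itself (single file uploaded as-is, multiple entries bundled into one zip) comes from the steps above.

```typescript
// Decide what pod-publishing should upload from the output folder:
// one entry is uploaded directly, multiple entries are zipped first.
function resultUploadPlan(outputEntries: string[]): { zip: boolean; upload: string } {
  if (outputEntries.length === 0) throw new Error('empty output folder');
  if (outputEntries.length === 1) return { zip: false, upload: outputEntries[0] };
  return { zip: true, upload: 'results.zip' }; // hypothetical archive name
}
```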