docs/compute-to-data.md at df7327c0f11b76d4f4311b2af5b2abbf3ba7787b

mirror of https://github.com/oceanprotocol/docs.git synced 2024-11-02 00:05:35 +01:00

alexcos20 df7327c0f1 update core C2D concepts

2021-04-28 00:20:22 -07:00

3.3 KiB


| title | description | slug | section |
| --- | --- | --- | --- |
| Compute-to-Data | Providing access to data in a privacy-preserving fashion | /concepts/compute-to-data/ | concepts |

## Motivation

The most basic scenario for a Publisher is to provide access to the datasets they own or manage. In addition, a Publisher may offer a service that executes computation on top of their data. This has several benefits:

- The data never leaves the Publisher enclave.
- It's not necessary to move the data; the algorithm is sent to the data.
- Having only one copy of the data and not moving it makes it easier to comply with data protection regulations.

This page elaborates on the benefits.

## Architecture

### Enabling Publisher Services using Ocean Provider

Interacting directly with the infrastructure where the data resides requires a component that the Publisher runs. This component interacts with users, manages the basics of the Publisher's infrastructure, and contains the business logic behind these additional Publisher services.

The key component introduced for this purpose is Ocean Provider. Publishers run Ocean Provider to offer extended data services, and it holds the credentials needed to interact with the infrastructure (initially cloud providers, though it could also be on-premise).
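
As a rough illustration of this split of responsibilities, the following Python sketch models a Provider-like component that alone holds the Publisher's infrastructure credentials and hands them to service handlers, so users never see them. `ProviderSketch`, `InfraCredentials`, and the `start_compute` handler are hypothetical names chosen for illustration; they are not Ocean Provider's actual API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass(frozen=True)
class InfraCredentials:
    """Hypothetical credentials for the Publisher's infrastructure."""
    kind: str                       # e.g. "cloud" or "on-premise"
    endpoint: str                   # placeholder infrastructure API URL
    token: str = field(repr=False)  # secret; excluded from repr


Handler = Callable[[InfraCredentials, dict], dict]


@dataclass
class ProviderSketch:
    """Hypothetical stand-in for Ocean Provider: the only component that
    holds the Publisher's credentials and talks to the infrastructure."""
    credentials: InfraCredentials
    services: Dict[str, Handler] = field(default_factory=dict)

    def register(self, name: str, handler: Handler) -> None:
        self.services[name] = handler

    def handle(self, name: str, request: dict) -> dict:
        """Route a user request to a service; credentials stay server-side."""
        handler = self.services.get(name)
        if handler is None:
            return {"status": 404, "error": f"unknown service {name!r}"}
        return {"status": 200, "result": handler(self.credentials, request)}


def start_compute(creds: InfraCredentials, request: dict) -> dict:
    # Illustrative only: a real handler would call the infrastructure API
    # at creds.endpoint, authenticating with creds.token.
    return {"infra": creds.kind, "algorithm": request["algorithm"]}


provider = ProviderSketch(InfraCredentials("cloud", "https://infra.example", "s3cret"))
provider.register("compute", start_compute)
print(provider.handle("compute", {"algorithm": "count-rows"}))
# -> {'status': 200, 'result': {'infra': 'cloud', 'algorithm': 'count-rows'}}
```

Keeping the credentials inside one component means that swapping cloud infrastructure for on-premise infrastructure only changes what the Provider holds, not what users send.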

### Compute-to-Data Environment (Operator-Service)

The Operator Service is a microservice that implements part of the Compute-to-Data specification, OEP-12. It manages the workflow for executing compute requests.

Typically, the Operator Service is called from Ocean Provider, but it can also be called independently.

The Operator Service establishes communication with the Kubernetes (K8s) cluster, which allows it to:

- Register new compute jobs
- List the current compute jobs
- Get a detailed result for a given job
- Stop a running job

The Operator Service doesn't provide any storage capability; all state is stored directly in the K8s cluster.
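
The following Python sketch illustrates the four operations and the stateless design under simplifying assumptions. `FakeCluster` is a hypothetical in-memory stand-in for the K8s cluster, and `OperatorServiceSketch` and its method names are illustrative, not the Operator Service's real interface. Because all job state lives in the cluster object, two service instances sharing the same cluster see the same jobs.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class FakeCluster:
    """Hypothetical in-memory stand-in for the K8s cluster; in this sketch it
    is the only place where job state is kept."""
    jobs: Dict[str, dict] = field(default_factory=dict)


class OperatorServiceSketch:
    """Hypothetical Operator Service: it keeps no state of its own and reads
    or writes every job through the cluster."""

    def __init__(self, cluster: FakeCluster) -> None:
        self.cluster = cluster  # a handle to the cluster, not a local copy

    def register_job(self, owner: str, algorithm: str) -> str:
        """Register a new compute job and return its id."""
        job_id = f"job-{len(self.cluster.jobs) + 1}"
        self.cluster.jobs[job_id] = {"owner": owner, "algorithm": algorithm,
                                     "status": "running"}
        return job_id

    def list_jobs(self, owner: str) -> List[str]:
        """List the ids of the current jobs belonging to `owner`."""
        return [jid for jid, job in self.cluster.jobs.items() if job["owner"] == owner]

    def get_job(self, job_id: str) -> Optional[dict]:
        """Return a detailed view of one job, or None if it does not exist."""
        job = self.cluster.jobs.get(job_id)
        return None if job is None else {"jobId": job_id, **job}

    def stop_job(self, job_id: str) -> bool:
        """Stop a running job; return False if it is unknown or not running."""
        job = self.cluster.jobs.get(job_id)
        if job is None or job["status"] != "running":
            return False
        job["status"] = "stopped"
        return True


cluster = FakeCluster()
svc_a, svc_b = OperatorServiceSketch(cluster), OperatorServiceSketch(cluster)
job = svc_a.register_job("0xConsumer", "count-rows")
print(svc_b.list_jobs("0xConsumer"))  # -> ['job-1']
svc_b.stop_job(job)
print(svc_a.get_job(job)["status"])   # -> stopped
```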

### Responsibilities

The main responsibilities are:

- Expose an HTTP API allowing for the execution of data access and compute endpoints.
- Interact with the infrastructure (cloud/on-premise) using the Publisher's credentials.
- Start, stop, and execute computing instances with the algorithms provided by users.
- Retrieve the logs generated during executions.
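
To make the HTTP API responsibility concrete, here is a minimal WSGI sketch using only the Python standard library. The `/compute` and `/compute/logs` routes, the `jobId` query parameter, and the in-memory `JOBS` dictionary are assumptions made for illustration; they do not describe Ocean Provider's actual endpoints, and a real implementation would start and stop instances on the Publisher's infrastructure instead of updating a dictionary.

```python
import json
from io import BytesIO
from urllib.parse import parse_qs
from wsgiref.util import setup_testing_defaults

# Hypothetical in-memory job store; a real Provider would instead act on the
# Publisher's infrastructure using the Publisher's credentials.
JOBS = {}


def _respond(start_response, status, body):
    """Serialize `body` as JSON and send it with the given HTTP status."""
    payload = json.dumps(body).encode("utf-8")
    start_response(status, [("Content-Type", "application/json"),
                            ("Content-Length", str(len(payload)))])
    return [payload]


def app(environ, start_response):
    """WSGI app exposing hypothetical start, stop, and logs endpoints."""
    method, path = environ["REQUEST_METHOD"], environ["PATH_INFO"]
    job_id = parse_qs(environ.get("QUERY_STRING", "")).get("jobId", [None])[0]

    if (method, path) == ("POST", "/compute"):
        # Start a job with the algorithm named in the JSON request body.
        size = int(environ.get("CONTENT_LENGTH") or 0)
        request = json.loads(environ["wsgi.input"].read(size) or b"{}")
        job_id = f"job-{len(JOBS) + 1}"
        JOBS[job_id] = {"algorithm": request.get("algorithm"),
                        "status": "running", "logs": ["started"]}
        return _respond(start_response, "201 Created", {"jobId": job_id})

    if (method, path) not in {("DELETE", "/compute"), ("GET", "/compute/logs")}:
        return _respond(start_response, "404 Not Found", {"error": "no such endpoint"})

    job = JOBS.get(job_id)
    if job is None:
        return _respond(start_response, "404 Not Found", {"error": "unknown job"})

    if method == "DELETE":
        # Stop the job and record the event in its logs.
        job["status"] = "stopped"
        job["logs"].append("stopped")
        return _respond(start_response, "200 OK", {"jobId": job_id, "status": "stopped"})

    # GET /compute/logs: return the logs collected for this job.
    return _respond(start_response, "200 OK", {"jobId": job_id, "logs": job["logs"]})


def call(method, path, query="", body=None):
    """Invoke `app` in-process (no network) and return (status, decoded JSON)."""
    raw = json.dumps(body).encode("utf-8") if body is not None else b""
    environ = {"REQUEST_METHOD": method, "PATH_INFO": path, "QUERY_STRING": query,
               "CONTENT_LENGTH": str(len(raw)), "wsgi.input": BytesIO(raw)}
    setup_testing_defaults(environ)  # fill in the remaining required WSGI keys
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    chunks = app(environ, start_response)
    return captured["status"], json.loads(b"".join(chunks))


status, body = call("POST", "/compute", body={"algorithm": "count-rows"})
call("DELETE", "/compute", query="jobId=" + body["jobId"])
status, body = call("GET", "/compute/logs", query="jobId=job-1")
print(status, body)  # -> 200 OK {'jobId': 'job-1', 'logs': ['started', 'stopped']}
```

The `call` helper invokes the app in-process, so the sketch can be exercised without starting a server; in a deployment, the same `app` could be served by any WSGI server.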

### Flow

![Sequence Diagram for computing services](images/Starting%20New%20Compute%20Job.png)

The diagram above shows the initial supported integration. It involves the following components/actors:

- Consumers - The end users who need to use computing services offered by the same Publisher that published the data.
- Operator-Service - The microservice that handles compute requests.
- Operator-Engine - The computing systems where the compute jobs are executed.
- Kubernetes - A K8s cluster.

Before the flow can begin, the following preconditions must be met:

- The Asset DDO must have a compute service.
- The Asset DDO must specify a provider endpoint exposed by the Publisher.
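
A Consumer-side check of these two preconditions might look like the following Python sketch. It assumes a DDO whose `service` list holds entries with `type` and `serviceEndpoint` fields; treat that shape, and the hypothetical `check_compute_preconditions` helper, as illustrative assumptions rather than the exact DDO schema.

```python
from typing import List
from urllib.parse import urlparse


def check_compute_preconditions(ddo: dict) -> List[str]:
    """Return the unmet preconditions; an empty list means the flow can start.

    Assumes (for illustration) a DDO shaped like
    {"service": [{"type": ..., "serviceEndpoint": ...}, ...]}.
    """
    problems = []
    compute = [s for s in ddo.get("service", []) if s.get("type") == "compute"]
    if not compute:
        problems.append("asset DDO has no compute service")
        return problems
    # The compute service must point at a provider endpoint (an absolute URL).
    parsed = urlparse(compute[0].get("serviceEndpoint", ""))
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        problems.append("compute service does not specify a provider endpoint")
    return problems


ddo = {"id": "did:op:example", "service": [
    {"type": "compute", "serviceEndpoint": "https://provider.publisher.example"}]}
print(check_compute_preconditions(ddo))               # -> []
print(check_compute_preconditions({"service": []}))  # -> ['asset DDO has no compute service']
```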
