In the Ocean Protocol stack, algorithms are recognized as distinct asset types, alongside datasets. When it comes to Compute-to-Data, an algorithm comprises the following key components:
* **Algorithm Code**: The algorithm code refers to the specific instructions and logic that define the computational steps to be executed on a dataset. It encapsulates the algorithms' functionalities, calculations, and transformations.
* **Docker Image**: A Docker image plays a crucial role in encapsulating the algorithm code and its runtime dependencies. It consists of a base image, which provides the underlying environment for the algorithm, and a corresponding tag that identifies a specific version or variant of the image.
* **Entry Point**: The entry point serves as the starting point for the algorithm's execution within the compute environment. It defines the initial actions to be performed when the algorithm is invoked, such as loading necessary libraries, setting up configurations, or calling specific functions.
Collectively, these components form the foundation of an algorithm in the context of Compute-to-Data.
When creating an algorithm asset in Ocean Protocol, it is essential to include the additional algorithm object in its metadata service. This algorithm object plays a crucial role in defining the Docker container environment associated with the algorithm. By specifying the necessary details within the algorithm object, such as the base image, tags, runtime configurations, and dependencies, the metadata service ensures that the algorithm asset is properly configured for execution within a Docker container.
<table><thead><tr><thwidth="212">Variable</th><th>Usage</th></tr></thead><tbody><tr><td><code>image</code></td><td>The Docker image name the algorithm will run with.</td></tr><tr><td><code>tag</code></td><td>The Docker image tag that you are going to use.</td></tr><tr><td><code>entrypoint</code></td><td>The Docker entrypoint. <code>$ALGO</code> is a macro that gets replaced inside the compute job, depending where your algorithm code is downloaded.</td></tr></tbody></table>
Define your entry point according to your dependencies. E.g. if you have multiple versions of Python installed, use the appropriate command `python3.6 $ALGO`.
There are plenty of Docker containers that work out of the box. However, if you have custom dependencies, you may want to configure your own Docker Image. To do so, create a Dockerfile with the appropriate instructions for dependency management and publish the container, e.g. using Dockerhub.
| `/data/inputs` | read | Storage for input data sets, accessible only to the algorithm running in the pod. Contents will be the files themselves, inside indexed folders e.g. `/data/inputs/{did}/{service_id}`. |
| `/data/ddos` | read | Storage for all DDOs involved in compute job (input data set + algorithm). Contents will json files containing the DDO structure. |
| `/data/outputs` | read/write | Storage for all of the algorithm's output files. They are uploaded on some form of cloud storage, and URLs are sent back to the consumer. |
| `/data/logs/` | read/write | All algorithm output (such as `print`, `console.log`, etc.) is stored in a file located in this folder. They are stored and sent to the consumer as well. |
Please note that when using local Providers or Metatata Caches, the ddos might not be correctly transferred into c2d, but inputs are still available. If your algorithm relies on contents from the DDO json structure, make sure to use a public Provider and Metadata Cache (Aquarius instance).
<table><thead><tr><thwidth="296">Variable</th><th>Usage</th></tr></thead><tbody><tr><td><code>DIDS</code></td><td>An array of DID strings containing the input datasets.</td></tr><tr><td><code>TRANSFORMATION_DID</code></td><td>The DID of the algorithm.</td></tr></tbody></table>
The following is a simple JavaScript/Node.js algorithm, doing a line count for ALL input datasets. The algorithm is not using any environment variables, but instead it's scanning the `/data/inputs` folder.
An asset of type `algorithm` has additional attributes under `metadata.algorithm`, describing the algorithm and the Docker environment it is supposed to be run under.
<table><thead><tr><th>Attribute</th><thwidth="221.33333333333331">Type</th><th>Description</th></tr></thead><tbody><tr><td><strong><code>language</code></strong></td><td><code>string</code></td><td>Language used to implement the software.</td></tr><tr><td><strong><code>version</code></strong></td><td><code>string</code></td><td>Version of the software preferably in <ahref="https://semver.org">SemVer</a> notation. E.g. <code>1.0.0</code>.</td></tr><tr><td><strong><code>consumerParameters</code></strong></td><td><ahref="did-ddo.md#consumer-parameters">Consumer Parameters</a></td><td>An object that defines required consumer input before running the algorithm</td></tr><tr><td><strong><code>container</code></strong>*</td><td><code>container</code></td><td>Object describing the Docker container image. See below</td></tr></tbody></table>
<table><thead><tr><thwidth="210">Attribute</th><thwidth="164.33333333333331">Type</th><th>Description</th></tr></thead><tbody><tr><td><strong><code>entrypoint</code></strong>*</td><td><code>string</code></td><td>The command to execute, or script to run inside the Docker image.</td></tr><tr><td><strong><code>image</code></strong>*</td><td><code>string</code></td><td>Name of the Docker image.</td></tr><tr><td><strong><code>tag</code></strong>*</td><td><code>string</code></td><td>Tag of the Docker image.</td></tr><tr><td><strong><code>checksum</code></strong>*</td><td><code>string</code></td><td>Digest of the Docker image. (ie: sha256:xxxxx)</td></tr></tbody></table>