mirror of
https://github.com/oceanprotocol/docs.git
synced 2024-11-26 19:49:26 +01:00
add section c2d algos
parent
686b963a32
commit
5194b96c5a
206
content/tutorials/write-c2d-algorithms.md
Normal file
@ -0,0 +1,206 @@
---
title: Writing C2D Algorithms
description: Writing C2D Algorithms
---

## Overview

A C2D algorithm is composed of the following:
- a Docker image (base image)
- the algorithm code
- an entry point

That's why, when creating the algorithm asset in Ocean, we need the additional "algorithm" object defined in the metadata service:

```
"algorithm": {
  "container": {
    "entrypoint": "node $ALGO",
    "image": "node",
    "tag": "latest"
  }
}
```
The most important attributes are the following:
- image: the Docker image that you are going to use
- tag: the tag of the Docker image that you are going to use
- entrypoint: the command that starts the algorithm. $ALGO is a macro that gets replaced inside C2D, depending on where your algorithm code is downloaded
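
To make the role of `$ALGO` more concrete, the following Python sketch shows what such a macro substitution could look like. It is only an illustration: the helper name `resolve_entrypoint` and the path `/tmp/example/algorithm.js` are hypothetical, and the real substitution happens inside C2D, which decides where the algorithm code is placed.

```python
import shlex
from typing import List


def resolve_entrypoint(entrypoint: str, algo_path: str) -> List[str]:
    """Replace the $ALGO macro and split the command into an argv list."""
    # Quote the path so that a path containing spaces stays a single argument
    command = entrypoint.replace("$ALGO", shlex.quote(algo_path))
    return shlex.split(command)


if __name__ == "__main__":
    # Hypothetical download location, used for illustration only
    print(resolve_entrypoint("node $ALGO", "/tmp/example/algorithm.js"))
```

The practical consequence for algorithm authors is that the entrypoint should only name the interpreter (and any flags) and leave the location of the code to the macro.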

Here are some examples:
- To run a JS algorithm, based on Node 14:
```
"algorithm": {
  "container": {
    "entrypoint": "node $ALGO",
    "image": "node",
    "tag": "14"
  }
}
```
- To run a Python algorithm, based on python:3.9.4-alpine3.13:
```
"algorithm": {
  "container": {
    "entrypoint": "python3.9 $ALGO",
    "image": "python",
    "tag": "3.9.4-alpine3.13"
  }
}
```

Be aware that your algorithm might need a lot of dependencies, so it is a lot faster to build your own images with those dependencies already included (we build some [here](https://github.com/oceanprotocol/algo_dockers)).

## Data structure

Every algorithm pod has the following volumes mounted:

- /data/inputs (read only) - this is where the input datasets are stored
- /data/ddos (read only) - this is where all DDOs (input datasets + algorithm) are stored
- /data/outputs - this is where the algorithm should store all output files (they are uploaded to storage, and their URLs are sent back to the consumer)

All algorithm output (such as print, console.log, etc.) is stored in a file located in /data/logs/. This log is stored and sent to the consumer as well.
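
As a rough illustration of how an algorithm can work with these volumes, the Python sketch below lists every file under the inputs folder and writes a summary into the outputs folder. The `root` parameter and the file name `summary.txt` are assumptions made for this example, so that it can also be tried outside a pod; inside a pod the root is simply `/data`.

```python
import os
import tempfile


def list_input_files(inputs_dir):
    """Return every regular file below inputs_dir, sorted for stable output."""
    found = []
    for current, _dirs, files in os.walk(inputs_dir):
        for name in files:
            found.append(os.path.join(current, name))
    return sorted(found)


def write_result(outputs_dir, name, text):
    """Write one result file into outputs_dir and return its path."""
    os.makedirs(outputs_dir, exist_ok=True)
    target = os.path.join(outputs_dir, name)
    with open(target, "w") as handle:
        handle.write(text)
    return target


def run(root="/data"):
    """List the inputs found under root and store the list as an output file."""
    files = list_input_files(os.path.join(root, "inputs"))
    # Anything printed here becomes part of the algorithm log
    print(f"Found {len(files)} input file(s)")
    # "summary.txt" is a hypothetical output name chosen for this sketch
    return write_result(os.path.join(root, "outputs"), "summary.txt",
                        "\n".join(files))


if __name__ == "__main__":
    if os.path.isdir("/data/inputs"):
        root = "/data"
    else:
        # Outside a pod /data does not exist, so simulate the layout
        root = tempfile.mkdtemp()
        os.makedirs(os.path.join(root, "inputs"))
    print(run(root))
```

Keeping the root folder as a parameter is a design choice for local testing only; the algorithm itself never needs to write anywhere except the outputs folder.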

## ENV variables available to algorithms

For every algorithm pod, C2D provides the following ENVs:
- DIDS: a JSON-encoded array containing the DIDs of the input datasets
- TRANSFORMATION_DID: the DID of the algorithm
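
A minimal Python sketch of reading these variables is shown below. It assumes that `DIDS` holds a JSON-encoded array and that either variable may be missing when the script is run outside C2D. The helper name `read_job_env` and the sample identifiers are made up for illustration.

```python
import json
import os


def read_job_env(environ=None):
    """Collect the C2D environment variables into a plain dict."""
    env = os.environ if environ is None else environ
    raw_dids = env.get("DIDS")
    # Assumption: DIDS is a JSON-encoded array; treat a missing value as empty
    dids = json.loads(raw_dids) if raw_dids else []
    return {"dids": dids, "algo_did": env.get("TRANSFORMATION_DID")}


if __name__ == "__main__":
    # Made-up identifiers, used for illustration only
    sample = {"DIDS": '["did:op:1234"]', "TRANSFORMATION_DID": "did:op:abcd"}
    print(read_job_env(sample))
```

Passing the environment in as an argument keeps the function easy to try locally, where the real variables are not set.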

# Sample Algorithms

## JS example

The following is a simple JS algorithm that counts the lines of ALL input datasets. The algorithm does not use any ENVs; instead, it scans the /data/inputs folder.

```js
const fs = require("fs")
const path = require("path")

const input_folder = "/data/inputs"
const output_folder = "/data/outputs"

// Walk a folder recursively and count the rows of every file found.
// All file system calls are synchronous, so no async/await is needed.
function processfolder(folder) {
  const files = fs.readdirSync(folder)
  for (const file of files) {
    const fullpath = path.join(folder, file)
    if (fs.statSync(fullpath).isDirectory()) {
      processfolder(fullpath)
    } else {
      countrows(fullpath)
    }
  }
}

// Count the newline characters in a file and append the result to the output
function countrows(file) {
  console.log("Start counting for " + file)
  const fileBuffer = fs.readFileSync(file)
  const to_string = fileBuffer.toString()
  const split_lines = to_string.split("\n")
  const rows = split_lines.length - 1
  fs.appendFileSync(path.join(output_folder, "output.log"), file + "," + rows + "\r\n")
  console.log("Finished. We have " + rows + " lines")
}

processfolder(input_folder)
```

To run this, use the following container object:
```
"algorithm": {
  "container": {
    "entrypoint": "node $ALGO",
    "image": "node",
    "tag": "10"
  }
}
```

## Advanced Python example

A more advanced Python line-counting algorithm, which relies on ENVs and builds a job object containing all the input files and DDOs:

```python
import json
import os


def get_job_details():
    """Reads in metadata information about assets used by the algo"""
    job = dict()
    # DIDS holds a JSON array; keep None when the variable is not set
    dids = os.getenv('DIDS', None)
    job['dids'] = json.loads(dids) if dids else None
    job['metadata'] = dict()
    job['files'] = dict()
    job['algo'] = dict()
    job['secret'] = os.getenv('secret', None)
    algo_did = os.getenv('TRANSFORMATION_DID', None)
    if job['dids'] is not None:
        for did in job['dids']:
            # get the ddo from disk
            filename = '/data/ddos/' + did
            print(f'Reading json from {filename}')
            with open(filename) as json_file:
                ddo = json.load(json_file)
                # search for metadata service
                for service in ddo['service']:
                    if service['type'] == 'metadata':
                        job['files'][did] = list()
                        index = 0
                        for file in service['attributes']['main']['files']:
                            job['files'][did].append(
                                '/data/inputs/' + did + '/' + str(index))
                            index = index + 1
    if algo_did is not None:
        job['algo']['did'] = algo_did
        job['algo']['ddo_path'] = '/data/ddos/' + algo_did
    return job


def line_counter(job_details):
    """Executes the line counter based on inputs"""
    print('Starting compute job with the following input information:')
    print(json.dumps(job_details, sort_keys=True, indent=4))

    # Now, count the lines of the first file in first did
    first_did = job_details['dids'][0]
    filename = job_details['files'][first_did][0]
    non_blank_count = 0
    with open(filename) as infp:
        for line in infp:
            if line.strip():
                non_blank_count += 1
    print('number of non-blank lines found %d' % non_blank_count)
    # Print that number to output to generate algo output
    with open("/data/outputs/result", "w") as f:
        f.write(str(non_blank_count))


if __name__ == '__main__':
    line_counter(get_job_details())

```

To run this, use the following container object:
```
"algorithm": {
  "container": {
    "entrypoint": "python3.6 $ALGO",
    "image": "oceanprotocol/algo_dockers",
    "tag": "python-sql"
  }
}
```
@ -26,6 +26,8 @@
  items:
    - title: Run a Compute-to-Data Environment
      link: /tutorials/compute-to-data/
    - title: Writing C2D Algorithms
      link: /tutorials/write-c2d-algorithms/

- group: Storage Setup
  items: