
GITBOOK-467: change request with no subject merged in GitBook

This commit is contained in:
Christian Casazza 2023-06-08 18:53:56 +00:00 committed by gitbook-bot
parent 8024900d09
commit 7547a9c421
No known key found for this signature in database
GPG Key ID: 07D2180C7B12D0FF
5 changed files with 17 additions and 16 deletions


@@ -93,9 +93,9 @@
* [Compute Endpoints](developers/provider/compute-endpoints.md)
* [Authentication Endpoints](developers/provider/authentication-endpoints.md)
* [📊 Data Science](data-science/README.md)
* [Composable Data Flows](data-science/composable-data-flows.md)
* [The Data Value Creation Loop](data-science/the-data-value-creation-loop.md)
* [Benefits of Ocean for Data Science](data-science/benefits-of-ocean-for-data-science.md)
* [Data Engineers](data-science/data-engineers.md)
* [Examples of valuable data](data-science/data-engineers.md)
* [Data Scientists](data-science/data-scientists.md)
* [🔨 Infrastructure](infrastructure/README.md)
* [Setup a Server](infrastructure/setup-server.md)


@@ -5,7 +5,7 @@ coverY: 0
# 📊 Data Science
Ocean Protocol is built to serve the data science space. This guide links you to the most important tutorials for data scientists working with Ocean Protocol. 
Ocean Protocol was built to serve the data science space. This guide links you to the most important tutorials for data scientists working with Ocean Protocol. 


@@ -1,9 +1,10 @@
# Benefits of Ocean for Data Science
* **Tokenized Ownership and Access Control:** The core benifet of Ocean Protocol is tokenizing the ownership and access control of data assets and services using data NFTs and datatokens. Tokenizing your assets allows them t be nteroperable  
Ocean Protocol builds on crypto and our Compute-to-Data engine to provide significant advantages to the data science space. Some of those advantages are explored in more detail below.
* **Tokenized Ownership and Access Control:** The core benefit of Ocean Protocol is tokenizing the ownership and access control of data assets and services using data NFTs and datatokens. Tokenizing your assets allows them to be interoperable across the ecosystem (see the first sketch after this list).
* **Privacy-Preserving Data Sharing:** Consider a scenario where abundant data is waiting to unveil its potential, but privacy concerns keep it hidden. Ocean's Compute-to-Data (C2D) engine solves this dilemma. It lets data publishers open their treasure troves for computational access without exposing the data itself, creating a novel income avenue while fostering a wider talent pool. Plus, it makes deploying models a breeze! For more information, check out our [Compute-to-Data section](../developers/compute-to-data/).
* **Fine-Grained Access Control:** Ocean Protocol's access control is like a well-tailored suit, offering a perfect fit for each user. Publishers can personalize access to their data, creating an exclusive access list and ensuring data interacts only with approved partners. To learn more, check out our [fine-grained access control section](../developers/Fine-Grained-Permissions.md).
* **Crypto-Native Payments:** Ocean Protocol's contracts offer the efficiency of crypto payments lower transaction fees and immediate settlements. It makes transacting as effortless as sipping a cup of coffee, and with zero counterparty risk, its a win-win. To learn more, check out our our [asset pricing](../developers/asset-pricing.md) and [contracts ](../developers/contracts/)sections.
* **Crypto-Native Payments:** Ocean Protocol's contracts offer the efficiency of crypto payments: lower transaction fees and immediate settlements. It makes transacting as effortless as sipping a cup of coffee, and with zero counterparty risk, it's a win-win. To learn more, check out our [asset pricing](../developers/asset-pricing.md) and [contracts](../developers/contracts/) sections.
* **Provenance of Data:** Knowing your data's origin story is invaluable, and Ocean Protocol takes full advantage of blockchain's auditing ability. It facilitates a detailed trace of data publishing, metadata alterations, and computational jobs, encouraging trust in data quality. To learn more, check out our [Subgraph section](../developers/subgraph/).
* **Verified Usage Statistics:** With Ocean Protocol, accessing product information is as straightforward as reading a book. Composable subgraphs provide comprehensive details like asset access history, total revenue, and more. It's a two-way street: publishers understand their customers better, and consumers can trust in the asset quality. To learn more, check out our [Subgraph section](../developers/subgraph/) (and see the subgraph query sketch after this list).
* **Global Discovery of Assets:** Finding relevant data and models should be as simple as browsing a well-organized bookshelf. Unlike web2 platforms where the host dictates asset discoverability, Ocean Protocol promotes transparency and permissionless discovery of assets. Anyone can tailor their own marketplace, ensuring an open and democratic system. To learn more, check out our [Aquarius](../developers/aquarius/) and [Build a Marketplace](../developers/build-a-marketplace/) sections.
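To make the tokenization bullet concrete, here is a minimal sketch of publishing a tokenized dataset with ocean.py. It assumes a v4-style `create_url_asset` convenience call, a Polygon RPC endpoint, and a private key in an environment variable; module paths and signatures vary across ocean.py releases, so treat this as a shape, not a reference implementation.

```python
# A minimal sketch of tokenized ownership with ocean.py (v4-style API assumed).
# The RPC URL, key handling, and create_url_asset signature are assumptions;
# consult the current ocean.py docs for exact names.
import os

from eth_account import Account
from ocean_lib.example_config import get_config_dict
from ocean_lib.ocean.ocean import Ocean

config = get_config_dict("https://polygon-rpc.com")  # assumed network RPC
ocean = Ocean(config)

alice = Account.from_key(os.environ["PRIVATE_KEY"])  # publisher wallet

# One call mints a data NFT (ownership), a datatoken (access control), and
# publishes the DDO metadata for a URL-hosted dataset.
data_nft, datatoken, ddo = ocean.assets.create_url_asset(
    "Coffee shop sales",              # asset name
    "https://example.com/sales.csv",  # hypothetical data URL
    {"from": alice},
)
print(f"data NFT: {data_nft.address}  datatoken: {datatoken.address}")
```

Whoever holds the data NFT controls the asset; anyone holding (or buying) a datatoken can redeem it for access.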
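And for the provenance and usage-statistics bullets, a hedged sketch of querying the Ocean subgraph over plain HTTP. The endpoint URL and the `orders` entity fields are assumptions based on typical Ocean subgraph deployments; check the Subgraph section for the current schema.

```python
# A sketch of reading verified usage statistics from the Ocean subgraph.
# The URL and field names are assumptions; verify against the live schema.
import requests

SUBGRAPH_URL = (
    "https://v4.subgraph.mainnet.oceanprotocol.com"
    "/subgraphs/name/oceanprotocol/ocean-subgraph"
)

query = """
{
  orders(first: 5, orderBy: createdTimestamp, orderDirection: desc) {
    id
    createdTimestamp
    consumer { id }
    datatoken { symbol }
  }
}
"""

resp = requests.post(SUBGRAPH_URL, json={"query": query}, timeout=30)
resp.raise_for_status()
for order in resp.json()["data"]["orders"]:
    # Each order is an on-chain access event: who consumed which datatoken.
    print(order["datatoken"]["symbol"], order["consumer"]["id"])
```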


@@ -1,10 +1,6 @@
# Data Engineers
# Examples of valuable data
Data engineers play a pivotal role in driving data value creation. If you're a data scientist looking to build useful dashboards or cutting-edge machine-learning models, you understand the importance of having access to well-curated data. That's where friendly and skilled data engineers come in!
Ocean allows data engineers to unleash their creativity. 
Data engineers can contribute numerous types of data to the Ocean Protocol ecosystem. Some examples are below.
The data value creation loop begins with a us
* **Government Open Data:** Governments serve as a rich and reliable source of data. However, this data often lacks proper documentation or poses challenges for data scientists to work with effectively. Establishing robust Extract, Transform, Load (ETL) pipelines enhances accessibility to government open data. This way, others can tap into this wealth of information without unnecessary hurdles. For example, in one of our [data challenges](https://desights.ai/shared/challenge/8) we leveraged public real estate data from Dubai to build use cases for understanding and predicting valuations and rents. Local, state, and federal governments around the world provide access to valuable data. Build pipelines to make consuming that data easier and help others build useful products to help your local community.
* **Public APIs:** A wide range of freely available public APIs covers various data verticals. Leveraging these APIs, data engineers can construct pipelines that enable efficient access and utilization of the data. [This](https://github.com/public-apis/public-apis) is a public repository of APIs for a wide range of topics, from weather to gaming to finance. The sketch after this list shows the shape of such a pipeline.
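As a sketch only, here is the skeleton of such an ETL pipeline in Python. The endpoint URL and the field names (`date`, `city`, `temperature_c`) are hypothetical stand-ins for whatever public API you target; the cleaned file it writes is the kind of artifact you would then tokenize on Ocean.

```python
# A minimal ETL sketch for curating public-API data (hypothetical endpoint).
import requests
import pandas as pd

def extract(url: str) -> list[dict]:
    """Pull raw JSON records from a public API."""
    resp = requests.get(url, timeout=30)
    resp.raise_for_status()
    return resp.json()

def transform(records: list[dict]) -> pd.DataFrame:
    """Keep the useful columns, enforce types, and drop bad rows."""
    df = pd.DataFrame(records)
    df = df[["date", "city", "temperature_c"]].dropna()  # assumed fields
    df["date"] = pd.to_datetime(df["date"])
    return df

def load(df: pd.DataFrame, path: str) -> None:
    """Write the cleaned dataset; this file is what you would publish."""
    df.to_csv(path, index=False)

if __name__ == "__main__":
    raw = extract("https://api.example.com/daily-weather")  # hypothetical URL
    load(transform(raw), "daily_temperature_clean.csv")
```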


@@ -1,18 +1,22 @@
# Composable Data Flows
# The Data Value Creation Loop
Data is the fuel that drives ML and AI. The popular expression "garbage in, garbage out" holds true as the best way to improve data science effectiveness is to have better data. Data can exit in several different forms throughout the entire AI/ML value creation loop. 
The Data Value Creation Loop lives at the heart of Ocean Protocol. It refers to the process in which data gains value as it progresses from business problem, to raw data, to cleaned data, to trained model, to its use in applications. At each step of the way, additional work is done on the data so that it accrues greater value.
* **Raw Data**: This is the unprocessed, untouched data, fresh from the source. Example: a sales spreadsheet from a coffee shop or a corpus of internet text.
* **Business Problem:** Identifying the business problem that can be addressed with data science is the critical first step. Example: reducing customer churn rate, predicting token prices, or predicting drought risk.
* **Raw Data**: This is the unprocessed, untouched data, fresh from the source. It can be a static file or pulled dynamically from an API. Example: user profiles, historical prices, or daily temperature.
* **Cleaned Data and Feature Vectors**: The raw data, now polished and transformed into numerical representations - the feature vectors. Example: the coffee shop sales data, now cleaned and organized, or preprocessed text data transformed into word embeddings.
* **Trained Models**: Machine learning models that have been trained on feature vectors, learning to decode data's patterns and relationships. Example: a random forest model predicting coffee sales or GPT-3 trained on a vast text corpus.
* **Data to Tune Models**: Additional data introduced to further refine and enhance model performance. Example: a new batch of sales data for the coffee shop model, or specific domain text data for GPT-3.
* **Tuned Models**: Models that have been optimized for high performance, robustness, and accuracy. Example: a tuned random forest model forecasting the coffee shop's busiest hours, or a fine-tuned GPT-3 capable of generating expert-level text.
* **Model Prediction Inputs**: Inputs provided to the models to generate insights. Example: inputting today's date and weather into the sales model, or a text prompt for GPT-3 to generate a blog post.
* **Model Prediction Outputs**: The models' predictions or insights based on the inputs. Example: the sales model's forecast of a spike in iced coffee sales due to an incoming heatwave, or GPT-3's generated blog post on sustainability in business.
* **Application:** The final stage, where model predictions are put to work inside a product or service that end users interact with. (The full loop is sketched below.)
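As an illustration only (these names are ours, not an Ocean API), the loop can be written down as an ordered pipeline in which each stage's artifact is a candidate asset to tokenize:

```python
# Illustrative only: the value creation loop as an ordered list of stages.
# Each stage's output artifact can be published as its own Ocean asset.
from enum import Enum

class Stage(Enum):
    BUSINESS_PROBLEM = "problem statement"
    RAW_DATA = "raw data"
    CLEANED_DATA = "cleaned data / feature vectors"
    TRAINED_MODEL = "trained model"
    TUNING_DATA = "data to tune models"
    TUNED_MODEL = "tuned model"
    PREDICTION_INPUTS = "model prediction inputs"
    PREDICTION_OUTPUTS = "model prediction outputs"
    APPLICATION = "application"

stages = list(Stage)
for earlier, later in zip(stages, stages[1:]):
    print(f"{earlier.value}  ->  {later.value}")  # value accrues left to right
```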
With Ocean Protocol, data can be tokenized at every stage of the value creation loop. By leveraging the standards of Ocean, individuals can work together to mix and match valuable assets to build powerful flows. For example, when building a model instead of starting from scratch a data scientist may find a dataset that has already been cleaned and prepped for model building. To fine-tune a foundation model instead of building the dataset from scratch, they can find an already prepared one shared on Ocean.
Ocean Protocol allows each stage of the data value creation loop to be tokenized as an asset. Using Ocean Protocol, you can thus unlock _composable data science_. Instead of a single data scientist needing to conduct each stage of the pipeline themselves, they can work together, build off of each other's components, and focus on what they are best at. 
For example, an insurance provider may want to offer a parametric insurance product to protect against drought in a particular region. Those with a strong skill set in data engineering may focus on the beginning of the loop and publish curated datasets on Ocean. Then, a data scientist may build