GITBOOK-434: change request with no subject merged in GitBook

Christian Casazza 2023-06-07 13:56:31 +00:00 committed by gitbook-bot
parent 9179f8ef01
commit 4637f36673
No known key found for this signature in database
GPG Key ID: 07D2180C7B12D0FF
12 changed files with 57 additions and 19 deletions

View File

Binary image file changed (size unchanged: 320 KiB).

View File

Binary image file changed (size unchanged: 78 KiB).

View File

Binary image file changed (size unchanged: 77 KiB).

View File

Binary image file changed (size unchanged: 77 KiB).

View File

@@ -1,6 +1,6 @@
---
description: Help for wherever you are on your Ocean Protocol journey.
cover: .gitbook/assets/cover/contribute (1).png
cover: .gitbook/assets/cover/contribute (1) (1) (2).png
coverY: 0
layout: landing
---

View File

@@ -41,7 +41,7 @@
* [Asset Pricing](developers/asset-pricing.md)
* [Fees](developers/fees.md)
* [Metadata](developers/contracts/metadata.md)
* [Fractional Ownership](developers/datanft-and-datatoken/fractional-ownership.md)
* [Fractional Ownership](developers/contracts/fractional-ownership.md)
* [Community Monetization](developers/community-monetization.md)
* [Identifiers & Metadata](developers/Identifiers-Metadata.md)
* [DDO Specification](developers/ddo-specification.md)
@@ -93,6 +93,8 @@
* [Compute Endpoints](developers/provider/compute-endpoints.md)
* [Authentication Endpoints](developers/provider/authentication-endpoints.md)
* [📊 Data Science](data-science/README.md)
* [Composable Data Flows](data-science/composable-data-flows.md)
* [Benefits of Ocean for Data Science](data-science/benefits-of-ocean-for-data-science.md)
* [Data Engineers](data-science/data-engineers.md)
* [Data Scientists](data-science/data-scientists.md)
* [🔨 Infrastructure](infrastructure/README.md)

View File

@@ -1,17 +1,25 @@
---
description: Data Science
cover: ../.gitbook/assets/cover/data_science.png
coverY: 0
---
# 📊 Data Science
Ocean Protocol, a playground for data science, is all about standardization and convenience. Built with the data and AI era in mind, it offers an interactive ecosystem for digital assets, allowing developers to focus more on their creativity and less on technical roadblocks.
The world runs on data. From social media to online shopping, healthcare, and financial planning, data drives our everyday interactions. Access to more data can create a flywheel of value creation: better data leads to better insights, which lead to greater profits.
Unfortunately, today's data infrastructure is broken. Data lives in silos, unable to interact with each other. Sharing data is hard because ownership and access control are managed through a hodgepodge of different methods across many different service providers and applications. Data privacy problems also loom over data sharing; once data is duplicated, the owner loses control over their asset.
Ocean Protocol was created to build a better system for how we manage and share our data assets. It repurposes the standards created within crypto and DeFi to facilitate a new paradigm of _self-custodial_ ownership and access control for data: NFTs become a permissionless standard of ownership, ERC20 tokens act as a permissionless standard for flexible access control rights, and crypto wallets like MetaMask become self-custodial holders of our assets.
Ocean's Compute-to-Data engine resolves the trade-off between the benefits of open data and data privacy risks. Using the engine, algorithms can be run on private data without the data itself ever being exposed or leaving the publisher's control.
Data scientists who prefer Python can work with Ocean using [Ocean.py](../developers/ocean.py/), a Python library that interacts with all Ocean contracts and tools. To get started with the library, check out our guides. They cover installation and setup as well as popular workflows such as [publishing an asset](../developers/ocean.py/publish-flow.md) and starting a [compute job](../developers/ocean.py/compute-flow.md).
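As a feel for what this looks like in practice, below is a minimal publish sketch using Ocean.py. It follows the library's documented `create_url_asset` helper, but the RPC endpoint, the wallet setup, and the file URL are placeholders and details may differ by Ocean.py version, so treat it as a sketch rather than a copy-paste recipe.

```python
# Minimal Ocean.py publish sketch. The RPC URL, private key handling, and file URL
# are assumptions for illustration; check the publish-flow guide for your version.
import os

from brownie.network import accounts  # Ocean.py uses Brownie accounts for signing
from ocean_lib.example_config import get_config_dict
from ocean_lib.ocean.ocean import Ocean

# Point the library at an RPC endpoint for the chain you want to publish on.
config = get_config_dict("https://polygon-rpc.com")
ocean = Ocean(config)

# A wallet funded with gas tokens (network connection steps omitted here).
alice = accounts.add(os.environ["PRIVATE_KEY"])

# Publish a URL-hosted file as a data NFT + datatoken + DDO in one call.
name = "Example dataset"  # placeholder asset name
url = "https://example.com/data.csv"  # placeholder URL
data_nft, datatoken, ddo = ocean.assets.create_url_asset(name, url, {"from": alice})
print("Published DID:", ddo.did)
```

The same `ocean` object is the entry point for the consume and compute flows covered in the linked guides.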
1. Privacy-Preserving Data Sharing: Consider a scenario where abundant data is waiting to unveil its potential, but privacy concerns keep it hidden. Ocean's Compute-to-Data (C2D) engine solves this dilemma. It lets data publishers open their treasure troves for computational access without exposing the data itself, creating a novel income avenue while fostering a wider talent pool. Plus, it makes deploying models a breeze! For more information, check out our [Compute-to-Data section](../developers/compute-to-data/).
2. Fine-Grained Access Control: Ocean Protocol's access control is like a well-tailored suit, offering a perfect fit for each user. Publishers can personalize access to their data, creating an exclusive access list and ensuring data interacts only with approved partners. To learn more, check out our [fine-grained access control section](../developers/Fine-Grained-Permissions.md); a small allow/deny sketch follows this list.
3. Crypto-Native Payments: Ocean Protocol's contracts offer the efficiency of crypto payments: lower transaction fees and immediate settlements. It makes transacting as effortless as sipping a cup of coffee, and with zero counterparty risk, it's a win-win. To learn more, check out our [asset pricing](../developers/asset-pricing.md) and [contracts](../developers/contracts/) sections.
4. Provenance of Data: Knowing your data's origin story is invaluable, and Ocean Protocol takes full advantage of blockchain's auditability. It facilitates a detailed trace of data publishing, metadata alterations, and computational jobs, encouraging trust in data quality. To learn more, check out our [Subgraph section](../developers/subgraph/).
5. Verified Usage Statistics: With Ocean Protocol, accessing product information is as straightforward as reading a book. Composable subgraphs provide comprehensive details like asset access history, total revenue, and more. It's a two-way street: publishers understand their customers better, and consumers can trust in the asset quality. To learn more, check out our [Subgraph section](../developers/subgraph/).
6. Global Discovery of Assets: Finding relevant data and models should be as simple as browsing a well-organized bookshelf. Unlike web2 platforms where the host dictates asset discoverability, Ocean Protocol promotes transparency and permissionless discovery of assets. Anyone can tailor their own marketplace, ensuring an open and democratic system. To learn more, check out our [Aquarius](../developers/aquarius/) and [Build a Marketplace](../developers/build-a-marketplace/) sections.
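To make the access-control item above concrete, here is roughly what an asset-level allow/deny list looks like in a DDO's `credentials` section, shown as a Python dict. The wallet addresses are placeholders, and the exact schema should be checked against the [DDO Specification](../developers/ddo-specification.md) and [fine-grained access control](../developers/Fine-Grained-Permissions.md) pages.

```python
# Sketch of the `credentials` section of a DDO, expressed as a Python dict.
# Wallet addresses are placeholders; verify the exact schema against the DDO spec.
credentials = {
    "allow": [
        # Only these wallet addresses may order and consume the asset.
        {"type": "address", "values": ["0x1abc...", "0x2def..."]},
    ],
    "deny": [
        # These wallet addresses are explicitly blocked.
        {"type": "address", "values": ["0x3bad..."]},
    ],
}
```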

View File

@@ -0,0 +1,10 @@
# Benefits of Ocean for Data Science
* **Tokenized Ownership and Access Control:** The core benefit of Ocean Protocol is tokenizing the ownership and access control of data assets and services using data NFTs and datatokens. Tokenizing your assets allows them to be interoperable with the broader web3 ecosystem of wallets, marketplaces, and DeFi tools.
* **Privacy-Preserving Data Sharing:** Consider a scenario where abundant data is waiting to unveil its potential, but privacy concerns keep it hidden. Ocean's Compute-to-Data (C2D) engine solves this dilemma. It lets data publishers open their treasure troves for computational access without exposing the data itself, creating a novel income avenue while fostering a wider talent pool. Plus, it makes deploying models a breeze! For more information, check out our [Compute-to-Data section](../developers/compute-to-data/).
* **Fine-Grained Access Control:** Ocean Protocol's access control is like a well-tailored suit, offering a perfect fit for each user. Publishers can personalize access to their data, creating an exclusive access list and ensuring data interacts only with approved partners. To learn more, check out our [fine-grained access control section](../developers/Fine-Grained-Permissions.md).
* **Crypto-Native Payments:** Ocean Protocol's contracts offer the efficiency of crypto payments: lower transaction fees and immediate settlements. It makes transacting as effortless as sipping a cup of coffee, and with zero counterparty risk, it's a win-win. To learn more, check out our [asset pricing](../developers/asset-pricing.md) and [contracts](../developers/contracts/) sections.
* **Provenance of Data:** Knowing your data's origin story is invaluable, and Ocean Protocol takes full advantage of blockchain's auditability. It facilitates a detailed trace of data publishing, metadata alterations, and computational jobs, encouraging trust in data quality. To learn more, check out our [Subgraph section](../developers/subgraph/).
* **Verified Usage Statistics:** With Ocean Protocol, accessing product information is as straightforward as reading a book. Composable subgraphs provide comprehensive details like asset access history, total revenue, and more. It's a two-way street: publishers understand their customers better, and consumers can trust in the asset quality. To learn more, check out our [Subgraph section](../developers/subgraph/); a small query sketch follows this list.
* **Global Discovery of Assets:** Finding relevant data and models should be as simple as browsing a well-organized bookshelf. Unlike web2 platforms where the host dictates asset discoverability, Ocean Protocol promotes transparency and permissionless discovery of assets. Anyone can tailor their own marketplace, ensuring an open and democratic system. To learn more, check out our [Aquarius](../developers/aquarius/) and [Build a Marketplace](../developers/build-a-marketplace/) sections.
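As a sketch of how those usage statistics can be pulled programmatically, the snippet below posts a small GraphQL query to the Ocean subgraph. The endpoint follows the Subgraph docs for Polygon, but both the URL and the entity/field names are assumptions here; check the [Subgraph section](../developers/subgraph/) for the current schema.

```python
# Query the Ocean subgraph for recent orders. The endpoint URL and entity/field
# names are assumptions based on the Subgraph docs; verify them before relying on this.
import requests

SUBGRAPH_URL = (
    "https://v4.subgraph.polygon.oceanprotocol.com"
    "/subgraphs/name/oceanprotocol/ocean-subgraph"
)

query = """
{
  orders(first: 5, orderBy: createdTimestamp, orderDirection: desc) {
    id
    createdTimestamp
    consumer { id }
    datatoken { symbol }
  }
}
"""

response = requests.post(SUBGRAPH_URL, json={"query": query}, timeout=30)
response.raise_for_status()
for order in response.json()["data"]["orders"]:
    print(order["datatoken"]["symbol"], order["consumer"]["id"], order["createdTimestamp"])
```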

View File

@@ -0,0 +1,20 @@
# Composable Data Flows
Data is the fuel that drives ML and AI. The popular expression "garbage in, garbage out" holds true: the best way to improve data science effectiveness is to have better data. Data can exist in several different forms throughout the AI/ML value creation loop.
* **Raw Data**: This is the unprocessed, untouched data, fresh from the source. Example: a sales spreadsheet from a coffee shop or a corpus of internet text.
* **Cleaned Data and Feature Vectors**: The raw data, now polished and transformed into numerical representations - the feature vectors. Example: the coffee shop sales data, now cleaned and organized, or preprocessed text data transformed into word embeddings.
* **Trained Models**: Machine learning models that have been trained on feature vectors, learning to decode data's patterns and relationships. Example: a random forest model predicting coffee sales or GPT-3 trained on a vast text corpus.
* **Data to Tune Models**: Additional data introduced to further refine and enhance model performance. Example: a new batch of sales data for the coffee shop model, or specific domain text data for GPT-3.
* **Tuned Models**: Models that have been optimized for high performance, robustness, and accuracy. Example: a tuned random forest model forecasting the coffee shop's busiest hours, or a fine-tuned GPT-3 capable of generating expert-level text.
* **Model Prediction Inputs**: Inputs provided to the models to generate insights. Example: inputting today's date and weather into the sales model, or a text prompt for GPT-3 to generate a blog post.
* **Model Prediction Outputs**: The models' predictions or insights based on the inputs. Example: the sales model's forecast of a spike in iced coffee sales due to an incoming heatwave, or GPT-3's generated blog post on sustainability in business.
With Ocean Protocol, data can be tokenized at every stage of the value creation loop. By leveraging Ocean's standards, individuals can work together to mix and match valuable assets to build powerful flows. For example, when building a model, a data scientist may find a dataset that has already been cleaned and prepped for model building instead of starting from scratch. Likewise, to fine-tune a foundation model, they can use an already prepared dataset shared on Ocean instead of building one themselves.
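As a sketch of what this composability looks like with Ocean.py, the snippet below publishes two stages of the same pipeline, a cleaned dataset and a trained model artifact, as separate assets so that others can pick up either one. It assumes an `ocean` client and a funded `alice` wallet configured as in the Data Science publish sketch, and the asset names and URLs are placeholders.

```python
# Publish two stages of one value-creation loop as separate Ocean assets.
# Assumes `ocean` (Ocean.py client) and `alice` (funded wallet) are already set up;
# the asset names and URLs below are placeholders.
stages = {
    "Coffee sales - cleaned feature vectors": "https://example.com/coffee_features.parquet",
    "Coffee sales - trained model": "https://example.com/coffee_model.pkl",
}

published_dids = {}
for name, url in stages.items():
    data_nft, datatoken, ddo = ocean.assets.create_url_asset(name, url, {"from": alice})
    published_dids[name] = ddo.did
    print(f"{name}: {ddo.did}")

# Anyone can now start from the cleaned features instead of the raw data,
# or pick up the trained model and fine-tune it further.
```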

View File

@@ -2,16 +2,14 @@
Data engineers play a pivotal role in driving data value creation. If you're a data scientist looking to build useful dashboards or cutting-edge machine-learning models, you understand the importance of having access to well-curated data. That's where friendly and skilled data engineers come in!
Data engineers specialize in creating robust data pipelines that enable seamless data ingestion from diverse source systems. The expertise lies in conducting essential transformations to ensure data cleanliness and aggregation, ultimately making the data readily available for downstream use cases. With data engineer support, data scientists can focus on unleashing their creativity and innovation, knowing that the data they need is reliably curated and accessible.
Ocean gives data engineers a direct way to share and monetize that curation work.
Data engineers can contribute numerous types of data to the Ocean Protocol ecosystem. Some examples are below.
* Government Open Data: Governments serve as a rich and reliable source of data. However, this data often lacks proper documentation or poses challenges for data scientists to work with effectively. Establishing robust Extract, Transform, Load (ETL) pipelines enhance accessibility to government open data. This way, others can tap into this wealth of information without unnecessary hurdles. For example, in one of our [data challenges](https://desights.ai/shared/challenge/8) we leveraged public real estate data from Dubai to build use cases for understanding and predicting valuations and rents. Local, state, and federal governments around the world provide access to valuable data. Build pipelines to make consuming that data easier and help others build useful products to help your local community.
* **Government Open Data:** Governments serve as a rich and reliable source of data. However, this data often lacks proper documentation or poses challenges for data scientists to work with effectively. Establishing robust Extract, Transform, Load (ETL) pipelines enhances accessibility to government open data, so others can tap into this wealth of information without unnecessary hurdles. For example, in one of our [data challenges](https://desights.ai/shared/challenge/8) we leveraged public real estate data from Dubai to build use cases for understanding and predicting valuations and rents. Local, state, and federal governments around the world provide access to valuable data. Build pipelines that make consuming that data easier and help others build useful products for your local community.
* **Public APIs:** A wide range of freely available public APIs covers various data verticals. Leveraging these APIs, data engineers can construct pipelines that enable efficient access and utilization of the data (see the pipeline sketch after this list). [This](https://github.com/public-apis/public-apis) is a public repository of public APIs for a wide range of topics, from weather to gaming to finance.
* **On-Chain Data:** Blockchain data presents a unique opportunity for data engineers to curate high-quality data. Whether it's connecting directly to the blockchain or utilizing alternative data providers, there is tremendous value in simplifying data usability in this emerging field. There is consistent demand for well-curated decentralized finance (DeFi) data and an emerging need for curated data in other domains, such as decentralized social data.
* **Datasets for training foundation models:** Foundation models such as LLMs, like GPT-4, are some of the most exciting technologies today. Building these models requires access to vast amounts of unstructured data, and new models will need access to even more. Building pipelines that assemble these datasets and structure them in a format suitable for training is a strong opportunity.
* **Datasets for fine-tuning foundation models:** To make a foundation model like GPT-4 work well in a customer-facing application like ChatGPT, it usually needs to be fine-tuned on a dataset of example prompts and answers. Data engineers can curate such high-quality datasets by labeling which outputs are good and which are bad, and can leverage industry knowledge to build fine-tuning datasets for every vertical in the world.
* Public APIs: A wide range of freely available public APIs covers various data verticals. Leveraging these APIs, data engineers can construct pipelines that enable efficient access and utilization of the data. [This ](https://github.com/public-apis/public-apis)is a public repository of public APIs for a wide range of topics, from weather to gaming to finance.
* On-Chain Data: Blockchain data presents a unique opportunity for data engineers to curate high-quality data. Whether it's connecting directly to the blockchain or utilizing alternative data providers, there is tremendous value for simplifying data usability in this emerging field. There is consistent demand for well-curated decentralized finance (DeFi) data and an emerging need for curated data in other domains, such as decentralized social data. 
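As a minimal sketch of the kind of pipeline described above, the snippet below pulls JSON from a public API, tidies it with pandas, and writes a CSV that is ready to be hosted and published as an Ocean asset. The API URL and column names are placeholders, not a real endpoint.

```python
# Minimal extract-transform-load sketch: fetch from a public API, clean with pandas,
# and write a consumption-ready CSV. The API URL and column names are placeholders.
import pandas as pd
import requests

API_URL = "https://example.com/api/v1/listings"  # placeholder public API endpoint

# Extract: fetch raw records as JSON.
response = requests.get(API_URL, timeout=30)
response.raise_for_status()
records = response.json()

# Transform: keep a tidy subset of columns, fix types, and drop incomplete rows.
df = pd.DataFrame(records)
df = df[["date", "district", "price"]].dropna()
df["date"] = pd.to_datetime(df["date"])
df["price"] = df["price"].astype(float)

# Load: write a clean file that can be hosted and then published with Ocean.
df.to_csv("listings_clean.csv", index=False)
print(f"Wrote {len(df)} cleaned rows to listings_clean.csv")
```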