mirror of https://github.com/oceanprotocol/docs.git synced 2024-11-01 15:55:34 +01:00

GITBOOK-471: change request with no subject merged in GitBook

This commit is contained in:
Veronica Manuel 2023-06-09 03:02:37 +00:00 committed by gitbook-bot
parent 2c5956dba2
commit a50735ba2b
No known key found for this signature in database
GPG Key ID: 07D2180C7B12D0FF
6 changed files with 17 additions and 23 deletions

View File

@@ -96,8 +96,8 @@
* [📊 Data Science](data-science/README.md)
* [The Data Value Creation Loop](data-science/the-data-value-creation-loop.md)
* [Examples of valuable data](data-science/data-engineers.md)
* [Data Scientists](data-science/data-scientists.md)
* [Benefits of Ocean for Data Science](benefits-of-ocean-for-data-science.md)
* [Examples of useful models](data-science/data-scientists.md)
* [Benefits of Ocean for Data Science](data-science/benefits-of-ocean-for-data-science.md)
* [🔨 Infrastructure](infrastructure/README.md)
* [Setup a Server](infrastructure/setup-server.md)
* [Deploying Marketplace](infrastructure/deploying-marketplace.md)

View File

@@ -7,25 +7,23 @@ coverY: 0
Ocean Protocol was built to serve the data science space. 
Data Value Creation Loop stage 
 
With Ocean, each [Data Value Creation Loop](the-data-value-creation-loop.md) stage is tokenized with data NFTs and datatokens. Leveraging tokenized standards unlocks several unique benefits for the ecosystem. Together, stakeholders can build sophisticated products by combining assets posted onto Ocean.
Data engineers can publish pipelines for curated data, allowing data scientists to conduct feature engineering and build models on top. The models can be deployed with Compute-to-Data and leveraged by app developers building the last-mile distribution of model outputs into business practices.
Ocean Protocol unlocks _composable data science, ._ Instead of a data scientists needing to conduct each stage of the pipeline themselves, they can work together and build off of each other's components and focus on what they are best at. 
Ocean Protocol unlocks _composable data science_, mixing and matching assets to build end-to-end solutions. On-chain assets ensure participants can trustlessly work together while sharing in the upside of the entire system.
This guide links you to the most important tutorials for data scientists working with Ocean Protocol. 
**Core Components for Data Scientists:**
**Key Links for Data Scientists:**
* [Ocean data NFTs](../developers/contracts/data-nfts.md) and [datatokens](../developers/contracts/datatokens.md) are core building blocks of Ocean Protocol. They allow individuals and businesses to define ownership of their assets and create flexible access control tokens.
* Ocean's [Compute-to-Data](../developers/compute-to-data/) engine resolves the trade-off between the benefits of open data and data privacy risks. Using the engine, algorithms can be run on data without exposing the underlying data, so data can be widely shared and monetized without compromising privacy.
* [Ocean.py](../developers/ocean.py/) a python library that interacts with all Ocean contracts and tools. To get started with the library, check out our guides. They will teach installation and set-up and several popular workflows such as[ publishing an asset](../developers/ocean.py/publish-flow.md) and starting a [compute job](../developers/ocean.py/compute-flow.md).
* [Ocean.py](../developers/ocean.py/) is our Python library for interacting with Ocean contracts and tools. To get started with the library, check out our guides. They cover installation and set-up, plus several popular workflows such as [publishing an asset](../developers/ocean.py/publish-flow.md) and starting a [compute job](../developers/ocean.py/compute-flow.md). A minimal publish sketch follows this list.
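For orientation, here is a minimal publish-flow sketch with ocean.py. Treat it as illustrative rather than definitive: it assumes a funded wallet whose key sits in a `PRIVATE_KEY` environment variable, the dataset URL is a placeholder, and helper names such as `get_config_dict` and `create_url_asset` follow the ocean.py guides but can differ between library versions, so defer to the linked publish-flow page.

```python
# Minimal ocean.py publish sketch (illustrative; names/signatures vary by version).
import os

from eth_account import Account
from ocean_lib.example_config import get_config_dict
from ocean_lib.ocean.ocean import Ocean

# Point the library at a network RPC endpoint and create the Ocean handle.
config = get_config_dict("https://polygon-rpc.com")
ocean = Ocean(config)

# Publisher wallet; assumes PRIVATE_KEY holds a funded account's key.
alice = Account.from_key(os.environ["PRIVATE_KEY"])

# Publish a URL-hosted dataset: one call mints a data NFT, a datatoken,
# and the DDO (metadata record) that consumers later discover and order.
data_nft, datatoken, ddo = ocean.assets.create_url_asset(
    "Example dataset",                      # asset name
    "https://example.com/data.csv",         # hypothetical dataset URL
    {"from": alice},
)
print(f"Published asset with DID: {ddo.did}")
```

The compute flow follows the same pattern, with the datatoken gating who may run algorithms against the data via Compute-to-Data.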

View File

@@ -1,11 +1,11 @@
# Examples of valuable data
There is opportunity for ths
There is a unique opportunity to begin building these value-creation loops.
Here is the 
* **Government Open Data:** Governments serve as a rich and reliable source of data. However, this data often lacks proper documentation or poses challenges for data scientists to work with effectively. Establishing robust Extract, Transform, Load (ETL) pipelines enhances accessibility to government open data. This way, others can tap into this wealth of information without unnecessary hurdles. For example, in one of our [data challenges](https://desights.ai/shared/challenge/8) we leveraged public real estate data from Dubai to build use cases for understanding and predicting valuations and rents. Local, state, and federal governments around the world provide access to valuable data. Build pipelines to make consuming that data easier and help others build useful products that help your local community.
* **Public APIs:** A wide range of freely available public APIs covers various data verticals. Leveraging these APIs, data engineers can construct pipelines that enable efficient access and utilization of the data (see the short ETL sketch after this list). [This](https://github.com/public-apis/public-apis) is a public repository of public APIs for a wide range of topics, from weather to gaming to finance.
* **On-Chain Data:** Blockchain data presents a unique opportunity for data engineers to curate high-quality data. Whether it's connecting directly to the blockchain or utilizing alternative data providers, there is tremendous value for simplifying data usability in this emerging field. There is consistent demand for well-curated decentralized finance (DeFi) data and an emerging need for curated data in other domains, such as decentralized social data.
* **Datasets for training foundation models:** Foundation models such as LLMs are some of the most exciting technologies today, such as GPT4. Building these models requires access to vast amounts of unstructured [data ](#user-content-fn-1)[^1]to build, and new models will need access to even more data. Building pipelines for building these datasets and structuring them in a format for training is a strong opportunity. 
* **Datasets for fine-tuning foundation models:** Making a foundation model like GPT4 work best in an application like ChatGPT To make these models suitable for customer-facing applications, they often are best when fine-tuned on a dataset with example structures and answers. Data engineers can curate high-quality datasets by labeling which outputs are good and which are bad. Leveraing industry knowledge can be used to build datasets to fine-tune models for every vertical in the world.
[^1]:
* **On-Chain Data:** Blockchain data presents a unique opportunity for data engineers to curate high-quality data. There is consistent demand for well-curated decentralized finance (DeFi) data and an emerging need for curated data in other domains, such as decentralized social data. Build datasets for a range of use cases, from trading to customer analytics.
* **Datasets for training foundation models:** Build pipelines for training foundation models. Conduct web scraping and aggregate existing commercially licensed datasets.
* **Datasets for fine-tuning foundation models:** Curate high-quality labeled datasets to fine-tune foundation models. Label the outputs of models as good or bad, or create example datasets that capture industry knowledge for a specific vertical.
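To make the pipeline idea above concrete, here is a rough extract-transform-load sketch in Python. Everything specific in it is hypothetical: the API endpoint, query parameters, and response fields are placeholders for whichever public data source you choose; the point is the shape of the workflow (fetch, clean, write an artifact worth publishing).

```python
# Toy ETL sketch: fetch raw records from a (hypothetical) public API,
# clean them with pandas, and write a curated CSV ready to publish on Ocean.
import pandas as pd
import requests

resp = requests.get(
    "https://api.example.com/v1/weather/daily",   # placeholder endpoint
    params={"city": "Dubai", "days": 30},         # placeholder parameters
    timeout=30,
)
resp.raise_for_status()

df = pd.DataFrame(resp.json()["records"])         # assumed response shape
df["date"] = pd.to_datetime(df["date"])           # normalize types
df = df.dropna(subset=["temperature_c"])          # drop incomplete rows
df = df.sort_values("date").reset_index(drop=True)

df.to_csv("curated_weather.csv", index=False)     # artifact to publish as a dataset
```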

View File

@@ -1,8 +1,6 @@
# Data Scientists
# Examples of useful models
Data scientists are integral to the process of extracting insights and generating value from data. Their expertise lies in applying statistical analysis, machine learning algorithms, and domain knowledge to uncover patterns, make predictions, and derive meaningful insights from complex datasets.
There are a few different ways that data scientists can add value to the Ocean Protocol ecosystem by building on datasets published on Ocean. 
Data scientists armed with access to high-quality datasets are now primed to extract insights and build useful models on top of them.
1. **Visualizations of Correlations between Features**: Data scientists excel at exploring and visualizing data to uncover meaningful patterns and relationships. By creating heatmaps and visualizations of correlations between features, they provide insights into the interdependencies and associations within the dataset. These visualizations help stakeholders understand the relationships between variables, identify influential factors, and make informed decisions based on data-driven insights. By publishing the results on Ocean, you can allow others to build on your work.
2. **Conducting Feature Engineering**: Feature engineering is a critical step in the data science workflow. Data scientists leverage their domain knowledge and analytical skills to engineer new features or transform existing ones, creating more informative representations of the data. By identifying and creating relevant features, data scientists enhance the predictive power of models and improve their accuracy. This process often involves techniques such as scaling, normalization, one-hot encoding, and creating interaction or polynomial features; a short sketch of both steps follows this list.
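As a minimal sketch of both steps, the snippet below draws a correlation heatmap and then derives a couple of simple engineered features with pandas and seaborn. The file name and column names (`sales`, `weekday`) are placeholders for whatever dataset you consume from Ocean.

```python
# Sketch of the two workflows above on a generic tabular dataset.
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

df = pd.read_csv("curated_dataset.csv")            # e.g. an asset downloaded via Ocean

# 1. Visualize correlations between numeric features.
corr = df.select_dtypes("number").corr()
sns.heatmap(corr, annot=True, cmap="coolwarm")
plt.title("Feature correlations")
plt.tight_layout()
plt.savefig("correlations.png")

# 2. Simple feature engineering: scaling and one-hot encoding.
df["sales_scaled"] = (df["sales"] - df["sales"].mean()) / df["sales"].std()
df = pd.get_dummies(df, columns=["weekday"])       # one-hot encode a categorical column
df.to_csv("engineered_dataset.csv", index=False)   # publish back to Ocean for reuse
```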

View File

@@ -10,13 +10,11 @@ The Data Value Creation Loop represents the data journey as it progresses from a
* **Tuned Models**: Models that have been optimized for high performance, robustness, and accuracy. Example: a tuned random forest model forecasting the coffee shop's busiest hours, or a fine-tuned GPT-3 capable of generating expert-level text.
* **Model Prediction Inputs**: Inputs provided to the models to generate insights. Example: inputting today's date and weather into the sales model, or a text prompt for GPT-3 to generate a blog post.
* **Model Prediction Outputs**: The models' predictions or insights based on the inputs. Example: the sales model's forecast of a spike in iced coffee sales due to an incoming heatwave, or GPT-3's generated blog post on sustainability in business.
* **Application:** 
* **Application:** Once the models have been deployed and can generate results, they must be packaged into an application so that they can impact real-world scenarios. Build composable user experiences around the underlying data and model assets.
Ocean Protocol allows each data value creation loop stage to be tokenized as an asset. Using Ocean Protocol, you can thus unlock _composable data science_. Instead of data scientists needing to conduct each stage of the pipeline themselves, they can work together, build off of each other's components, and focus on what they are best at.
For example, an insurance provider may want to offer a parametric insurance product to protect against drought in a particular region. those with a strong skillset in data engineering may focus on the beginning of the loop and publish curated datasets from Ocean. Then, a data scientist may build
For example, an insurance provider may want to offer a parametric insurance product for farmers to protect against drought in a particular region. Those with a strong skillset in data engineering may focus on the beginning of the value creation loop. They can create pipelines to ingest climate and local weather data and aggregate them together. Data scientists can then build their predictive models on top of these curated datasets.