The Data Value Creation Loop

The Data Value Creation Loop represents the data journey as it progresses from a business problem to raw data, undergoes cleansing and refinement, is used to train models, and finally finds its application in real-world scenarios. Data assets accrue more value at each stage of the loop as they get closer to real-world deployment. This value is created when a variety of different skillsets work together: business stakeholders, data engineers, data scientists, MLOps engineers, and application developers.

  • Business Problem: Identifying the business problem that can be addressed with data science is the critical first step. Example: reducing customer churn rate, predicting token prices, or predicting drought risk.
  • Raw Data: This is the unprocessed, untouched data, fresh from the source. This data can be static, or dynamic when served from an API. Example: user profiles, historical prices, or daily temperature.
  • Cleaned Data and Feature Vectors: The raw data, now polished and transformed into numerical representations: the feature vectors. Example: a coffee shop's sales data, now cleaned and organized, or preprocessed text data transformed into word embeddings.
  • Trained Models: Machine learning models that have been trained on feature vectors, learning the patterns and relationships in the data. Example: a random forest model predicting coffee sales, or GPT-3 trained on a vast text corpus.
  • Data to Tune Models: Additional data introduced to further refine and enhance model performance. Example: a new batch of sales data for the coffee shop model, or domain-specific text data for GPT-3.
  • Tuned Models: Models that have been optimized for high performance, robustness, and accuracy. Example: a tuned random forest model forecasting the coffee shop's busiest hours, or a fine-tuned GPT-3 capable of generating expert-level text.
  • Model Prediction Inputs: Inputs provided to the models to generate insights. Example: inputting today's date and weather into the sales model, or a text prompt for GPT-3 to generate a blog post.
  • Model Prediction Outputs: The models' predictions or insights based on the inputs. Example: the sales model's forecast of a spike in iced coffee sales due to an incoming heatwave, or GPT-3's generated blog post on sustainability in business.
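  • Application: The deployment of the models and their predictions in real-world scenarios, where they address the original business problem.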
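To make the stages concrete, here is a minimal Python sketch that walks the coffee shop example through the loop, from raw records to a prediction. It is illustrative only: the daily records, the single temperature feature, and the straight-line least-squares model are assumptions (the text mentions a random forest), and "tuning" is simplified here to refitting on the original data plus a new batch.

```python
from dataclasses import dataclass
from statistics import mean

# Raw data (hypothetical): daily temperature in degrees C and iced coffees sold.
# None marks a missing reading, a typical gap in raw data.
RAW_DATA = [
    {"temp_c": 18.0, "iced_sold": 40},
    {"temp_c": 22.0, "iced_sold": 52},
    {"temp_c": None, "iced_sold": 47},
    {"temp_c": 26.0, "iced_sold": 64},
    {"temp_c": 30.0, "iced_sold": 76},
]


def clean(records):
    """Cleaned data and feature vectors: drop incomplete rows, emit (feature, target) pairs."""
    return [
        (r["temp_c"], float(r["iced_sold"]))
        for r in records
        if r["temp_c"] is not None and r["iced_sold"] is not None
    ]


@dataclass
class LinearModel:
    slope: float
    intercept: float

    def predict(self, temp_c):
        """Model prediction: map an input (today's temperature) to a sales forecast."""
        return self.slope * temp_c + self.intercept


def train(pairs):
    """Trained model: ordinary least-squares fit of sales = slope * temp + intercept."""
    xs, ys = zip(*pairs)
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return LinearModel(slope, y_bar - slope * x_bar)


def tune(pairs, new_batch):
    """Tuned model (simplified): refit on the original pairs plus a new batch of data."""
    return train(pairs + new_batch)


if __name__ == "__main__":
    pairs = clean(RAW_DATA)
    model = train(pairs)
    print(model.predict(35.0))  # 91.0 for a 35 degree heatwave day
    tuned = tune(pairs, [(32.0, 84.0)])
    print(round(tuned.predict(35.0), 1))
```

Each function corresponds to one stage of the loop, so in a composable setting each output (the cleaned pairs, the fitted model, the tuned model) could be produced by a different contributor.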

Ocean Protocol allows each stage of the Data Value Creation Loop to be tokenized as an asset. This unlocks composable data science: instead of a single data scientist conducting every stage of the pipeline alone, contributors can build on each other's components and focus on what they do best.
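One way to picture composability is as a graph of assets, where each asset records the upstream assets it was built from. The sketch below is a hypothetical in-memory model, not the Ocean Protocol API: `Asset`, `Registry`, and the role assigned to each stage are illustrative assumptions. It shows how a downstream asset can reuse upstream work while every contributor stays traceable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Asset:
    """A hypothetical tokenized asset: one stage of the loop, published by one contributor."""
    asset_id: str
    stage: str
    publisher: str
    inputs: tuple = ()  # ids of upstream assets this one was built from


class Registry:
    """A toy, in-memory stand-in for a marketplace. NOT the Ocean Protocol API."""

    def __init__(self):
        self._assets = {}

    def publish(self, asset):
        # Only allow building on assets that have already been published.
        missing = [i for i in asset.inputs if i not in self._assets]
        if missing:
            raise KeyError(f"unknown upstream assets: {missing}")
        self._assets[asset.asset_id] = asset

    def lineage(self, asset_id):
        """Upstream asset ids, ordered so each appears after the assets it was built from."""
        seen, order = set(), []

        def visit(aid):
            for parent in self._assets[aid].inputs:
                if parent not in seen:
                    seen.add(parent)
                    visit(parent)
                    order.append(parent)

        visit(asset_id)
        return order

    def contributors(self, asset_id):
        """Every publisher whose work the given asset depends on, including its own."""
        ids = self.lineage(asset_id) + [asset_id]
        return sorted({self._assets[i].publisher for i in ids})


if __name__ == "__main__":
    reg = Registry()
    reg.publish(Asset("raw-sales", "Raw Data", "data engineer"))
    reg.publish(Asset("sales-features", "Cleaned Data and Feature Vectors",
                      "data engineer", ("raw-sales",)))
    reg.publish(Asset("sales-model", "Trained Models", "data scientist",
                      ("sales-features",)))
    print(reg.lineage("sales-model"))       # ['raw-sales', 'sales-features']
    print(reg.contributors("sales-model"))  # ['data engineer', 'data scientist']
```

Refusing to publish an asset whose inputs are unknown keeps the lineage graph complete, which is what lets a downstream user see whose components a model was built on.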

For example, an insurance provider may want to offer a parametric insurance product that protects against drought in a particular region. Those with a strong data engineering skillset may focus on the beginning of the loop and publish curated datasets on Ocean. Then, a data scientist may build