
---
description: When you have problems, but then you solve them 💁‍♀️
---
# The Data Value Creation Loop
<figure><img src="../.gitbook/assets/gif/tell-me-more.gif" alt=""><figcaption><p>Tell me more.</p></figcaption></figure>
### What is the Data Value Creation Loop?
The Data Value Creation Loop is a journey in which **data progresses from a business problem to valuable insights**. It involves collaboration among people with different skill sets, such as business stakeholders, data engineers, data scientists, MLOps engineers, and application developers.
Here's a condensed breakdown of the loop:
1. Business Problem: Identify the specific problem to solve using data science, such as reducing customer churn or predicting token prices.
2. Raw Data: Gather unprocessed data directly from sources, whether static or dynamic, like user profiles or historical prices.
3. Cleaned Data and Feature Vectors: Transform raw data into organized numerical representations, like cleaned sales data or preprocessed text transformed into word embeddings.
4. Trained Models: Train machine learning models on feature vectors to learn patterns and relationships, such as a random forest predicting coffee sales or GPT-3 trained on a text corpus.
5. Data to Tune Models: Introduce additional data to further refine and enhance model performance, like new sales data for the coffee shop model or domain-specific text data for GPT-3.
6. Tuned Models: Optimize models for high performance, accuracy, and robustness, such as a tuned random forest predicting busy hours for the coffee shop or a fine-tuned GPT-3 generating expert-level text.
7. Model Prediction Inputs: Provide inputs to the models to generate insights, like today's date and weather for the sales model or a text prompt for GPT-3 to generate a blog post.
8. Model Prediction Outputs: Obtain predictions or insights from the models based on the inputs, such as the sales model forecasting a spike in iced coffee sales or GPT-3 generating a blog post on sustainability in business.
9. Application: Package the models into applications that can impact real-world scenarios. Build user experiences around the data and model assets to make them usable and valuable.
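
To make the stages concrete, here is a minimal sketch of the loop for the coffee shop example, using only the Python standard library. Everything in it is an illustrative assumption rather than part of any Ocean Protocol tooling: the sample records, the field names `temp_c` and `iced_coffees`, and the helper functions are hypothetical, a one-feature least-squares line stands in for the random forest mentioned above, and "tuning" is approximated by refitting on the original data plus new data.

```python
from statistics import mean

# Stage 2: raw data (hypothetical records; None marks a missing reading).
raw_sales = [
    {"temp_c": 12, "iced_coffees": 20},
    {"temp_c": 18, "iced_coffees": 35},
    {"temp_c": None, "iced_coffees": 40},  # incomplete row, dropped in cleaning
    {"temp_c": 25, "iced_coffees": 52},
    {"temp_c": 30, "iced_coffees": 65},
]

# Stage 5: additional data that arrives later and is used to tune the model.
new_sales = [
    {"temp_c": 22, "iced_coffees": 50},
    {"temp_c": 28, "iced_coffees": 70},
]


def clean(records):
    """Stage 3: drop incomplete rows and return (feature, target) pairs."""
    return [
        (float(r["temp_c"]), float(r["iced_coffees"]))
        for r in records
        if r["temp_c"] is not None and r["iced_coffees"] is not None
    ]


def fit_line(pairs):
    """Stage 4: ordinary least squares for y = slope * x + intercept."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in pairs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def predict(model, temp_c):
    """Stages 7 and 8: turn a prediction input into a prediction output."""
    slope, intercept = model
    return slope * temp_c + intercept


def forecast_message(model, temp_c):
    """Stage 9: a tiny 'application' that wraps the model for a user."""
    return f"Expected iced coffee sales at {temp_c:.0f} C: {predict(model, temp_c):.0f}"


# Stage 4: train on the cleaned raw data.
base_model = fit_line(clean(raw_sales))

# Stage 6: "tune" by refitting on the old and new data together (a stand-in
# for real fine-tuning or hyperparameter search).
tuned_model = fit_line(clean(raw_sales + new_sales))

print(forecast_message(tuned_model, 27.0))
```

In a real project each stage would typically be owned by a different role from the list above, and the intermediate artifacts (cleaned data, trained model, tuned model) are what can be shared or published as assets.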
### What is an example of a Data Value Creation Loop?
Let's walk through an example of the Data Value Creation Loop. Imagine a healthcare organization that wants to develop a predictive model for early detection of diseases. The organization collaborates with data engineers to collect and preprocess various medical datasets, including patient demographics, lab results, and medical imaging. These datasets are tokenized and made available on the Ocean Protocol platform for secure computation. Data scientists use the tokenized data to train machine learning models that can accurately identify early warning signs of diseases. These models are then published as compute assets on Ocean Market. Application developers work with the healthcare organization to integrate the models into its existing patient management system, allowing doctors to receive automated risk assessments and personalized recommendations for preventive care. As a result, patients benefit from early detection, doctors can make more informed decisions, and the healthcare organization generates insights to improve patient outcomes while fostering collaboration around data and model assets. Et voilà!