GITBOOK-218: Christian's Data Science May 22 changes

This commit is contained in:
Ana Loznianu 2023-05-24 13:24:02 +00:00 committed by gitbook-bot
parent 20545358e0
commit b4114386ad
No known key found for this signature in database
GPG Key ID: 07D2180C7B12D0FF
16 changed files with 147 additions and 1 deletions


View File

@@ -73,6 +73,13 @@
    * [Compute Endpoints](developers/provider/compute-endpoints.md)
    * [Authentication Endpoints](developers/provider/authentication-endpoints.md)
* [📊 Data Science](data-science/README.md)
  * [Data Challenges](data-science/data-challenges/README.md)
    * [Participating in a data challenge](data-science/data-challenges/participating-in-a-data-challenge.md)
    * [Hosting a data challenge](data-science/data-challenges/hosting-a-data-challenge.md)
  * [How to contribute](data-science/how-to-contribute/README.md)
    * [Data Engineers](data-science/how-to-contribute/data-engineers.md)
    * [Data Scientists](data-science/how-to-contribute/data-scientists/README.md)
      * [Creating a new docker image for C2D](data-science/how-to-contribute/data-scientists/creating-a-new-docker-image-for-c2d.md)
* [🔨 Infrastructure](infrastructure/README.md)
  * [Setup a Server](infrastructure/setup-server.md)
  * [Deploying Marketplace](infrastructure/deploying-marketplace.md)

View File

@@ -0,0 +1,13 @@
# Data Challenges
Data challenges present an exciting opportunity for data scientists to hone their skills, work on actual business problems, and earn some income along the way. They operate as data science competitions in which participants are tasked with solving a business problem. Data challenges come in several different formats, cover a range of topics, and are run by a variety of sponsors. One of the main advantages of these data challenges is that participants retain ownership of their work and the ability to further monetize it outside of the competition.
See below for the winners of past data challenges:
{% embed url="https://blog.oceanprotocol.com/air-quality-data-challenge-winners-2ae7e3a72bc3" %}
{% embed url="https://blog.oceanprotocol.com/here-are-the-winners-of-the-predict-eth-round-4-data-challenge-1672b36c0af9" %}
{% embed url="https://blog.oceanprotocol.com/presenting-the-winners-of-the-dex-liquidity-data-challenge-56feb3bc86bd" %}

View File

@@ -0,0 +1,10 @@
# Hosting a data challenge
Here are the main steps for creating a data challenge:
1. Establish the business problem you want to solve. The first step in building a data solution is understanding what you want to solve. For example, you may want to predict the drought risk in an area to help price parametric insurance, or predict the price of ETH to optimize Uniswap LPing.
2. Curate the dataset for the challenge. The key to hosting a good data challenge is providing an exciting and thorough dataset that participants can use to build their solutions. Do your research to understand what data is available, whether it is free from an API, available for download, or requires any transformations. For a first challenge, it is fine if the curated dataset is a static file. However, it is best to ensure there is a path to making the data available from a dynamic endpoint so that entries can eventually be applied to real-world use cases (a minimal endpoint sketch follows this list).
3. Decide how the judging process will work. This includes how long the review period will be, how submissions will be scored, and how any prizes will be divided among participants.
4. Work with OPF to gather participants for your data challenge. Creating blog posts and hosting Twitter Spaces are good ways to spread the word about your data challenge.
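As an illustration of step 2, the sketch below shows one minimal way a static file could later be exposed through a dynamic endpoint, assuming a small Flask service and a local `challenge_data.csv` file; the file name and route are purely illustrative, not an official Ocean component.
```python
# Minimal sketch of a dynamic endpoint serving a curated challenge dataset.
# The file name and route are illustrative placeholders.
from flask import Flask, jsonify
import pandas as pd

app = Flask(__name__)

@app.route("/data")
def get_data():
    # Re-read the file on each request so dataset updates are picked up automatically
    df = pd.read_csv("challenge_data.csv")
    return jsonify(df.to_dict(orient="records"))

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```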

View File

@@ -0,0 +1,10 @@
# Participating in a data challenge
Here is the typical flow for a data challenge:
1. On Ocean Market, a dataset (or several) is published along with its description and schema. The dataset is provided either by the data challenge sponsor partner or by OPF.
2. Participants download the dataset(s), or a sample if the data is private. This allows them to perform exploratory analysis to understand the dataset (a minimal sketch follows this list).
3. Depending on the data challenge, there are several different ways participants may be tasked with entering the competition. One way is to build a report that combines data visualization and written explanations so that business stakeholders can gain actionable insights into the data. Another is to build a machine-learning model that predicts a specific target value.
4. Participants typically submit their entries directly using Ocean technology. They can post their reports or algorithms onto the Ocean Market. Those who produce strong submissions can monetize their work by using Ocean's Compute-to-Data engine so that the model can be run by others in the future for a price.
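As an illustration of step 2, the sketch below shows a minimal exploratory pass over a downloaded dataset with pandas. The file name and columns are hypothetical placeholders for whichever dataset a given challenge publishes.
```python
# Minimal exploratory-analysis sketch for a downloaded challenge dataset.
# The file name and columns are hypothetical; adjust to the actual dataset.
import pandas as pd

df = pd.read_csv("challenge_dataset.csv")

print(df.shape)          # number of rows and columns
print(df.dtypes)         # column types
print(df.isna().sum())   # missing values per column
print(df.describe())     # summary statistics for numeric columns

# A quick look at the first few rows helps validate the schema against the description
print(df.head())
```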

View File

@@ -0,0 +1,2 @@
# How to contribute

View File

@@ -0,0 +1,11 @@
# Data Engineers
Data engineers are a key part of data value creation. Building any useful dashboards or machine-learning models requires access to curated data. Data engineers help provide this by creating robust data pipelines that ingest data from source systems, conduct transformations to clean and aggregate the data, and then make the data available for downstream use cases.
Some examples of useful sources of data can be found below; a minimal pipeline sketch follows the list.
* **Government Open Data:** Governments serve as one of the most reliable sources of data, which, although abundant in information, often suffer from inadequate documentation or pose challenges for data scientists to work with effectively. Establishing a robust Extract, Transform, Load (ETL) pipeline to enhance accessibility to such data is crucial.
* **Public APIs:** Similarly to government open data, a wide array of freely available public APIs covering various data verticals are at the disposal of data engineers. Leveraging these APIs, data engineers can construct pipelines that enable others to efficiently access and utilize the available data.
* **On-Chain Data:** Blockchain data presents an excellent opportunity for data engineers to curate high-quality data. Whether connecting directly to the blockchain or utilizing alternative data providers, simplifying data usability holds significant value. While there is a consistent demand for well-curated decentralized finance (DeFi) data, there is also an emerging need for curated data in other domains, including social data.
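As one illustration, the sketch below outlines a minimal extract-transform-load pass over a public API. The URL, field names, and output path are hypothetical placeholders; a production pipeline would add scheduling, validation, and error handling.
```python
# Minimal ETL sketch: extract from a public API, clean, and persist for downstream use.
# The API URL, field names, and output file are hypothetical placeholders.
import requests
import pandas as pd

API_URL = "https://example.com/api/v1/air-quality"  # placeholder endpoint

# Extract: pull raw records from the source API
records = requests.get(API_URL, timeout=30).json()

# Transform: normalize into a flat table, fix types, drop obviously bad rows
df = pd.json_normalize(records)
df["timestamp"] = pd.to_datetime(df["timestamp"], errors="coerce")
df = df.dropna(subset=["timestamp"]).sort_values("timestamp")

# Aggregate to a daily granularity that downstream users can work with directly
daily = df.set_index("timestamp").resample("D").mean(numeric_only=True)

# Load: write the curated table somewhere consumers (or a later publish step) can pick it up
daily.to_csv("curated_air_quality_daily.csv")
```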

View File

@@ -0,0 +1,8 @@
# Data Scientists
Data scientists are integral to the process of extracting insights and generating value from data. Their expertise lies in applying statistical analysis, machine learning algorithms, and domain knowledge to uncover patterns, make predictions, and derive meaningful insights from complex datasets. The points below outline typical contributions; a minimal illustrative sketch follows the list.
1. **Heatmaps and Visualizations of Correlations between Features**: Data scientists excel at exploring and visualizing data to uncover meaningful patterns and relationships. By creating heatmaps and visualizations of correlations between features, they provide insights into the interdependencies and associations within the dataset. These visualizations help stakeholders understand the relationships between variables, identify influential factors, and make informed decisions based on data-driven insights. By publishing the results on Ocean, you can allow others to build on your work.
2. **Conducting Feature Engineering**: Feature engineering is a critical step in the data science workflow. Data scientists leverage their domain knowledge and analytical skills to engineer new features or transform existing ones, creating more informative representations of the data. By identifying and creating relevant features, data scientists enhance the predictive power of models and improve their accuracy. This process often involves techniques such as scaling, normalization, one-hot encoding, and creating interaction or polynomial features.
3. **Conducting Experiments to Find the Best Model**: Data scientists perform rigorous experiments to identify the most suitable machine learning models for a given problem. They evaluate multiple algorithms, considering factors like accuracy, precision, recall, and F1-score, among others. By systematically comparing different models, data scientists select the one that performs best in terms of predictive performance and generalization. This process often involves techniques such as cross-validation, hyperparameter tuning, and model selection based on evaluation metrics. 
4. **Testing Out Different Models**: Data scientists explore various machine learning models to identify the optimal approach for a specific problem. They experiment with algorithms such as linear regression, decision trees, random forests, support vector machines, neural networks, and more. By testing out different models, data scientists gain insights into the strengths and weaknesses of each approach, allowing them to select the most appropriate model for the given dataset and problem domain.
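As an illustration of points 1, 3, and 4, the sketch below computes a correlation heatmap and compares two candidate models with cross-validation. The dataset file, feature columns, and target column are hypothetical placeholders, and the metric choice would depend on the actual challenge.
```python
# Minimal sketch: correlation heatmap plus a small model comparison with cross-validation.
# File name, feature columns, and target column are hypothetical placeholders.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

df = pd.read_csv("challenge_dataset.csv")
features = df.drop(columns=["target"])
target = df["target"]

# Heatmap of feature correlations to surface interdependencies between variables
sns.heatmap(features.corr(numeric_only=True), annot=True, cmap="coolwarm")
plt.title("Feature correlations")
plt.show()

# Compare candidate models with 5-fold cross-validation on a shared metric
models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=42),
}
for name, model in models.items():
    scores = cross_val_score(model, features, target, cv=5, scoring="f1")
    print(f"{name}: mean F1 = {scores.mean():.3f}")
```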

View File

@@ -0,0 +1,85 @@
# Creating a new docker image for C2D
Docker is widely used to run containerized applications with Ocean Compute-to-Data. Ocean Compute-to-Data allows computations to be brought to the data, preserving data privacy and enabling the use of private data without exposing it. Docker is a crucial part of this infrastructure, allowing applications to run in a secure, isolated manner.
The best way to sell access to models using C2D is by creating a docker image. Docker is an open-source platform designed to automate the deployment, scaling, and management of applications. It uses containerization technology to bundle up an application along with all of its related configuration files, libraries, and dependencies into a single package. This means your applications can run uniformly and consistently on any infrastructure. Docker helps solve the problem of "it works on my machine" by providing a consistent environment from development to production.
**Main value:**
* Consistency across multiple development and release cycles, ensuring that your application (and its full environment) can be replicated and reliably moved from one environment to another.
* Rapid deployment of applications. Docker containers are lightweight, featuring fast startup times as they don't include the unnecessary binaries and libraries of full-fledged virtual machines.
* Isolation of applications and resources, allowing for safe testing and effective use of system resources.
**Step by Step Guide to Creating a Docker Image**
1. **Install Docker**
First, you need to install Docker on your machine. Visit Docker's official website for installation instructions based on your operating system.
* [Docker for Windows](https://docs.docker.com/desktop/windows/install/)
* [Docker for Mac](https://docs.docker.com/desktop/mac/install/)
* [Docker for Linux](https://docs.docker.com/engine/install/)
2. **Write a Dockerfile**
Docker images are created using Dockerfiles. A Dockerfile is a text document that contains all the commands needed to assemble an image. Create a new file in your project directory named `Dockerfile` (no file extension).
3. **Configure Your Dockerfile**
Here is a basic example of what a Dockerfile might look like for a simple Python Flask application (a minimal sketch of the `app.py` and `requirements.txt` it refers to follows below):
```docker
# Use an official Python runtime as a parent image
FROM python:3.7-slim
# Set the working directory in the container to /app
WORKDIR /app
# Add the current directory contents into the container at /app
ADD . /app
# Install any needed packages specified in requirements.txt
RUN pip install --no-cache-dir -r requirements.txt
# Make port 80 available to the world outside this container
EXPOSE 80
# Define environment variable
ENV NAME World
# Run app.py when the container launches
CMD ["python", "app.py"]
```
For a more detailed explanation of Dockerfile instructions, check the [Docker documentation](https://docs.docker.com/engine/reference/builder/).
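For completeness, here is a minimal, illustrative sketch of the `app.py` the Dockerfile above refers to; it is only an assumption of what such an app might contain, and the accompanying `requirements.txt` would simply list `flask`.
```python
# app.py - minimal Flask app matching the Dockerfile above (illustrative sketch only)
import os

from flask import Flask

app = Flask(__name__)

@app.route("/")
def hello():
    # Reads the NAME environment variable defined in the Dockerfile (ENV NAME World)
    return f"Hello, {os.environ.get('NAME', 'World')}!"

if __name__ == "__main__":
    # Listen on port 80 to match the EXPOSE instruction
    app.run(host="0.0.0.0", port=80)
```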
4. **Build the Docker Image**
Navigate to the directory that houses your Dockerfile in the terminal. Build your Docker image using the `docker build` command. The `-t` flag lets you tag your image so it's easier to find later.
```shell
docker build -t friendlyhello .
```
The `.` sets the build context to your current directory, which is where the Docker daemon looks for the Dockerfile by default.
5. **Verify Your Docker Image**
Use the `docker images` command to verify that your image was created correctly.
```shell
docker images
```
6. **Run a Container from Your Image**
Now you can run a Docker container based on your new image:
```shell
docker run -p 4000:80 friendlyhello
```
The `-p` flag maps port 4000 on your machine to port 80 inside the Docker container.
You've just created and run your first Docker image! For more in-depth information about Docker and its various uses, refer to the [official Docker documentation](https://docs.docker.com/).

View File

@@ -14,7 +14,7 @@ Liquidity pools and dynamic pricing used to be supported in previous versions of
4\. Go to field `20. balanceOf` and insert your ETH address. This will retrieve your pool share token balance in wei.
<figure><img src="../.gitbook/assets/liquidity/remove-liquidity-2 (1).png" alt=""><figcaption><p>Balance Of</p></figcaption></figure>
<figure><img src="../.gitbook/assets/wallet/balance-of.png" alt=""><figcaption><p>Balance Of</p></figcaption></figure>
5\. Copy this number, as you will use it later as the `poolAmountIn` parameter.