# Examples of useful data

Here is the game: Go get useful data and publish it on Ocean. Curating useful datasets is the critical early step in the data value creation loop. The first ones to publish datasets will be well positioned to have others build on their work and to accrue greater value to the system. Those who start this process can kickstart value creation by working with Ocean Protocol to sponsor a data challenge.

Below is a short list of potential areas for curating useful data.

* Government Open Data: Governments are a rich and reliable source of data. However, this data often lacks proper documentation or is difficult for data scientists to work with effectively. Establishing robust Extract, Transform, Load (ETL) pipelines enhances accessibility to government open data, so others can tap into this wealth of information without unnecessary hurdles. For example, in one of our data challenges we leveraged public real estate data from Dubai to build use cases for understanding and predicting valuations and rents. Local, state, and federal governments around the world provide access to valuable data. Build pipelines that make consuming that data easier and help others build useful products for your local community (see the ETL sketch after this list).
* Public APIs: A wide range of freely available public APIs covers various data verticals. By leveraging these APIs, data engineers can construct pipelines that enable efficient access to and use of the data. Public catalogs of free APIs span a wide range of topics, from weather to gaming to finance.
* On-Chain Data: Blockchain data presents a unique opportunity for data engineers to curate high-quality data. There is consistent demand for well-curated decentralized finance (DeFi) data and an emerging need for curated data in other domains, such as decentralized social data. Build datasets for a range of use cases, from trading to customer analytics (see the on-chain sketch after this list).
* Datasets for training foundation models: Build pipelines for training foundation models. Conduct web scraping and aggregate existing commercially licensed datasets (a scraping sketch follows this list).
* Datasets for fine-tuning foundation models: Curate high-quality labeled datasets to fine-tune foundation models. Label the outputs of existing models or create new labeled examples from scratch (see the prompt/completion sketch after this list).
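
The sketch below illustrates the kind of ETL pipeline the first two items describe, in Python. It is a minimal example under stated assumptions: the endpoint URL, column names, and output path are placeholders, not a real government or public API.

```python
# Minimal ETL sketch: pull a CSV from an open-data endpoint, clean it, write Parquet.
# The URL and column names are placeholders for illustration only.
import io

import pandas as pd
import requests

OPEN_DATA_URL = "https://example.gov/api/real-estate/transactions.csv"  # placeholder


def extract(url: str) -> pd.DataFrame:
    """Download the raw CSV and parse it into a DataFrame."""
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    return pd.read_csv(io.StringIO(response.text))


def transform(df: pd.DataFrame) -> pd.DataFrame:
    """Standardize column names, drop duplicates, and parse dates."""
    df = df.rename(columns=str.lower).drop_duplicates()
    if "transaction_date" in df.columns:  # placeholder column name
        df["transaction_date"] = pd.to_datetime(df["transaction_date"], errors="coerce")
    return df


def load(df: pd.DataFrame, path: str = "real_estate_clean.parquet") -> None:
    """Write the cleaned dataset to Parquet, ready to publish as an Ocean asset."""
    df.to_parquet(path, index=False)


if __name__ == "__main__":
    load(transform(extract(OPEN_DATA_URL)))
```

The same extract/transform/load split works for a public API that returns JSON; only the extract step changes.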
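
For on-chain data, a similar pattern applies. The sketch below assumes the web3.py v6 API and a placeholder JSON-RPC endpoint; it collects basic transaction fields from recent Ethereum blocks into a table that could be published as a dataset.

```python
# On-chain data curation sketch using web3.py (v6 API assumed).
# RPC_URL is a placeholder; any Ethereum JSON-RPC endpoint would work.
import pandas as pd
from web3 import Web3

RPC_URL = "https://rpc.example.org"  # placeholder endpoint
w3 = Web3(Web3.HTTPProvider(RPC_URL))


def extract_transactions(n_blocks: int = 10) -> pd.DataFrame:
    """Collect basic transaction fields from the most recent n_blocks."""
    latest = w3.eth.block_number
    rows = []
    for number in range(latest - n_blocks + 1, latest + 1):
        block = w3.eth.get_block(number, full_transactions=True)
        for tx in block.transactions:
            rows.append({
                "block": number,
                "timestamp": block.timestamp,
                "from": tx["from"],
                "to": tx["to"],
                "value_eth": float(w3.from_wei(tx["value"], "ether")),
            })
    return pd.DataFrame(rows)


if __name__ == "__main__":
    extract_transactions().to_parquet("eth_transactions_sample.parquet", index=False)
```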
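
For assembling raw text to train foundation models, a minimal scraping loop might look like the following. The seed URLs are placeholders, and any real pipeline should respect robots.txt and the licensing of the scraped content.

```python
# Web-scraping sketch: collect paragraph text from seed pages into a JSONL corpus.
# SEED_URLS are placeholders; check robots.txt and licensing before scraping real sites.
import json

import requests
from bs4 import BeautifulSoup

SEED_URLS = [
    "https://example.org/article-1",  # placeholder
    "https://example.org/article-2",  # placeholder
]


def scrape_text(url: str) -> str:
    """Fetch a page and return its visible paragraph text."""
    html = requests.get(url, timeout=30).text
    soup = BeautifulSoup(html, "html.parser")
    return "\n".join(p.get_text(strip=True) for p in soup.find_all("p"))


if __name__ == "__main__":
    # One JSON record per document is a common layout for training corpora.
    with open("corpus.jsonl", "w", encoding="utf-8") as f:
        for url in SEED_URLS:
            f.write(json.dumps({"url": url, "text": scrape_text(url)}) + "\n")
```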
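
Labeled examples for fine-tuning are often packaged as prompt/completion records in JSONL. The field names and source file below are illustrative, not a fixed standard.

```python
# Sketch: package labeled rows as prompt/completion records for fine-tuning.
# "labeled_examples.csv" and its "input"/"label" columns are placeholders.
import json

import pandas as pd

labels = pd.read_csv("labeled_examples.csv")

with open("finetune.jsonl", "w", encoding="utf-8") as f:
    for _, row in labels.iterrows():
        record = {"prompt": row["input"], "completion": row["label"]}
        f.write(json.dumps(record) + "\n")
```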