mirror of https://github.com/oceanprotocol/docs.git synced 2024-11-01 15:55:34 +01:00
docs/data-science/data-engineers.md

# Examples of valuable data
There is a unique opportunity to begin building these value-creation loops.
Here are some examples of valuable data:
* **Government Open Data:** Governments are a rich and reliable source of data. However, this data is often poorly documented or difficult for data scientists to work with effectively. Establishing robust Extract, Transform, Load (ETL) pipelines makes government open data more accessible, so others can tap into this wealth of information without unnecessary hurdles. For example, in one of our [data challenges](https://desights.ai/shared/challenge/8) we leveraged public real estate data from Dubai to build use cases for understanding and predicting valuations and rents. Local, state, and federal governments around the world provide access to valuable data. Build pipelines that make this data easier to consume and help others build useful products for your local community.
* **Public APIs:** A wide range of freely available public APIs covers various data verticals. Using these APIs, data engineers can construct pipelines that make the data efficient to access and use. [This](https://github.com/public-apis/public-apis) is a public repository of public APIs for a wide range of topics, from weather to gaming to finance.
* **On-Chain Data:** Blockchain data presents a unique opportunity for data engineers to curate high-quality data. There is consistent demand for well-curated decentralized finance (DeFi) data and an emerging need for curated data in other domains, such as decentralized social data. Build datasets for a range of use cases, from trading to customer analytics.
* **Datasets for training foundation models:** Build pipelines for training foundation models. Conduct web scraping and aggregate existing commercially licensed datasets.
* **Datasets for fine-tuning foundation models:** Curate high-quality labeled datasets to fine-tune foundation models. Label the outputs of models or create
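
To make the ETL idea from the government open data example more concrete, here is a minimal sketch in Python that uses only the standard library. Everything in it is an assumption made for illustration: the sample CSV, its column names, the `listings` table, and the helper functions are hypothetical and are not taken from any real government portal or from the data challenge mentioned above. A real pipeline would fetch the file from the publisher and handle many more edge cases.

```python
import csv
import io
import sqlite3

# Hypothetical raw export standing in for a poorly documented open-data file.
# Headers have stray spaces, prices use thousands separators, one price is missing.
RAW_CSV = """Area Name , Property Type,Price (AED),Size sqm
Marina,Apartment,"1,200,000",85
Downtown,Apartment,,70
Marina,Villa,"4,500,000",310
"""


def extract(raw_text):
    """Extract: read raw CSV text into a list of dicts keyed by the original headers."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def normalize_header(name):
    """Turn a messy header such as 'Price (AED)' into 'price_aed'."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in name)
    return "_".join(cleaned.lower().split())


def transform(rows):
    """Transform: normalize headers, parse numbers, drop rows without a price."""
    clean = []
    for row in rows:
        record = {normalize_header(k): (v or "").strip() for k, v in row.items()}
        if not record["price_aed"]:
            continue  # skip incomplete records instead of guessing a value
        record["price_aed"] = float(record["price_aed"].replace(",", ""))
        record["size_sqm"] = float(record["size_sqm"])
        record["price_per_sqm"] = round(record["price_aed"] / record["size_sqm"], 2)
        clean.append(record)
    return clean


def load(records, conn):
    """Load: write the cleaned records into a SQLite table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS listings ("
        "area_name TEXT, property_type TEXT, "
        "price_aed REAL, size_sqm REAL, price_per_sqm REAL)"
    )
    conn.executemany(
        "INSERT INTO listings VALUES "
        "(:area_name, :property_type, :price_aed, :size_sqm, :price_per_sqm)",
        records,
    )
    conn.commit()


def run_pipeline(raw_text=RAW_CSV):
    """Run extract, transform, and load against an in-memory database."""
    conn = sqlite3.connect(":memory:")
    load(transform(extract(raw_text)), conn)
    return conn


if __name__ == "__main__":
    connection = run_pipeline()
    for row in connection.execute("SELECT * FROM listings ORDER BY price_aed"):
        print(row)
```

Keeping the three stages as separate functions means the same `transform` and `load` steps could be reused when the source changes, for example when `extract` reads from a public API instead of a CSV file.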