mirror of https://github.com/oceanprotocol/docs.git synced 2024-11-02 00:05:35 +01:00
docs/data-science/how-to-contribute/data-engineers.md

Data Engineers

Data engineers play a pivotal role in driving data value creation. If you're a data scientist looking to build useful dashboards or cutting-edge machine-learning models, you understand the importance of having access to well-curated data. That's where friendly and skilled data engineers come in!

Data engineers specialize in building robust data pipelines that ingest data from diverse source systems. Their expertise lies in applying the transformations needed to clean and aggregate that data, so that it is readily available for downstream use cases. With a data engineer's support, data scientists can focus on creativity and innovation, knowing that the data they need is reliably curated and accessible.
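The ingest, clean, and aggregate steps described above can be sketched in a few lines of Python. This is only an illustration using the standard library: the sample rows, the column names (`region`, `price`), and the function names are all hypothetical, and a real pipeline would read from an actual source system rather than an inline string.

```python
import csv
import io
from collections import defaultdict

# Hypothetical raw extract (made-up values); a real pipeline would pull
# this from a source system instead of an inline string.
RAW_CSV = """region,price
 north ,100
North,300
south,
South,250
"""


def extract(text):
    """Read CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Normalise region names and drop rows that have no price."""
    cleaned = []
    for row in rows:
        price = (row.get("price") or "").strip()
        if not price:
            continue  # skip incomplete records
        cleaned.append({"region": row["region"].strip().title(),
                        "price": float(price)})
    return cleaned


def aggregate(rows):
    """Compute the average price per region."""
    totals = defaultdict(lambda: [0.0, 0])  # region -> [sum, count]
    for row in rows:
        totals[row["region"]][0] += row["price"]
        totals[row["region"]][1] += 1
    return {region: total / count for region, (total, count) in totals.items()}


def run_pipeline(text):
    """Chain the three stages: extract, transform, aggregate."""
    return aggregate(transform(extract(text)))


averages = run_pipeline(RAW_CSV)
```

Keeping each stage as a separate function makes it straightforward to swap the inline sample for a real source, or to test the cleaning rules on their own.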

Data engineers can contribute numerous types of data to the Ocean Protocol ecosystem. Some examples are below.

  • Government Open Data: Governments are a rich and reliable source of data. However, this data is often poorly documented or otherwise difficult for data scientists to work with. Establishing robust Extract, Transform, Load (ETL) pipelines makes government open data more accessible, so others can tap into it without unnecessary hurdles. For example, in one of our data challenges we leveraged public real estate data from Dubai to build use cases for understanding and predicting valuations and rents. Local, state, and federal governments around the world provide access to valuable data. Build pipelines that make this data easier to consume, and help others build products that benefit your local community.

  • Public APIs: A wide range of freely available public APIs covers various data verticals. Using these APIs, data engineers can build pipelines that make the data efficient to access and use. Public repositories that catalog such APIs cover a wide range of topics, from weather to gaming to finance.

  • On-Chain Data: Blockchain data presents a unique opportunity for data engineers to curate high-quality data. Whether by connecting directly to the blockchain or by using alternative data providers, there is tremendous value in making this data easier to use in an emerging field. There is consistent demand for well-curated decentralized finance (DeFi) data and an emerging need for curated data in other domains, such as decentralized social data.
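Much of the curation work for sources like these comes down to turning nested API responses into flat, tabular records. The following is a minimal sketch of that step, assuming a JSON payload; the payload, its field names, and the helper functions are hypothetical and do not correspond to any specific API.

```python
import csv
import io
import json

# Hypothetical payload, shaped like a typical nested JSON API response.
SAMPLE_RESPONSE = """
{"results": [
  {"id": 1, "station": {"name": "A", "country": "X"}, "reading": {"temp_c": 21.5}},
  {"id": 2, "station": {"name": "B", "country": "Y"}, "reading": {"temp_c": null}}
]}
"""


def flatten(record, parent_key="", sep="_"):
    """Flatten nested dictionaries into one level, joining keys with `sep`."""
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


def to_csv(rows):
    """Serialise flat rows to CSV text with a stable, sorted header."""
    fieldnames = sorted({key for row in rows for key in row})
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)  # missing (None) values are written as empty cells
    return buffer.getvalue()


rows = [flatten(item) for item in json.loads(SAMPLE_RESPONSE)["results"]]
csv_text = to_csv(rows)
```

Sorting the column names keeps the output schema stable between runs, which matters when downstream consumers depend on a fixed layout.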