Open Data

Open Data[1] is a core pillar of our Research Integrity Strategy and reflects our commitment to transparent, collaborative, and impactful research.

Open Data refers to research datasets that are available for others to access and use without unnecessary restrictions. This openness allows researchers, industry partners, and the wider scientific community to use, modify, and share data for any purpose.

This practice transforms how scientific knowledge is built and validated, moving away from siloed data ownership and towards collaborative research ecosystems that accelerate discovery, enhance research quality, and benefit society as a whole.

The three pillars of open data[2]

1. Availability and access

Data must be made available as a complete dataset at no more than a reasonable reproduction cost, preferably through online download. It must also be provided in convenient formats that others can easily modify and meaningfully use. This means avoiding formats that require expensive software, as well as other technical barriers that limit accessibility.

2. Reuse and redistribution

Open data must be provided under licensing terms that explicitly permit reuse and redistribution, including the ability to combine datasets with other sources. This enables researchers to build on existing work, conduct meta-analyses, and develop new insights through data integration.

3. Universal participation

Everyone must be able to use, reuse, and redistribute the data without discrimination against fields of endeavour, individuals, or groups. This means avoiding restrictions such as the NC (non-commercial use only) attribute in Creative Commons licences, or limitations to specific purposes such as education, as these create artificial barriers to the advancement of knowledge.
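As an illustration of the licensing point, the following minimal Python sketch flags licence identifiers that carry the NC element. It assumes SPDX-style identifiers such as CC-BY-NC-4.0; the function name and the set of flagged elements are hypothetical and would need to match your own policy.

```python
# Licence elements treated here as incompatible with universal participation.
# Only NC is named in the text above; extend this set if your policy differs.
RESTRICTIVE_ELEMENTS = frozenset({"NC"})


def restrictive_elements(licence_id):
    """Return the restrictive elements found in an SPDX-style licence identifier."""
    # "CC-BY-NC-4.0" splits into ["CC", "BY", "NC", "4.0"].
    parts = licence_id.upper().split("-")
    return sorted(RESTRICTIVE_ELEMENTS.intersection(parts))


for licence in ("CC-BY-4.0", "CC-BY-NC-4.0", "CC0-1.0"):
    found = restrictive_elements(licence)
    status = "restricted by " + ", ".join(found) if found else "no NC restriction"
    print(f"{licence}: {status}")
```

A check like this only inspects the identifier string. It does not replace reading the licence terms themselves.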

The FAIR Principles Framework

Open Data implementation is guided by the FAIR principles[3], which aim to make scholarly materials Findable, Accessible, Interoperable, and Reusable. These principles provide a structured approach to data sharing:

Findable

  • Datasets are assigned persistent identifiers (DOIs, URNs)
  • Rich metadata descriptions enable discovery through search engines and catalogues
  • Clear indexing in data repositories and institutional systems
  • Proper documentation that allows researchers to locate relevant datasets
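The Findable points above can be sketched as a simple pre-deposit check on a metadata record. This is an illustrative sketch only: the field names, the example record, and the placeholder DOI are assumptions, and the DOI pattern is a loose shape check rather than a full validator. It also covers DOIs only, not URNs.

```python
import re

# Hypothetical minimum set of descriptive fields; real repositories define their own schemas.
REQUIRED_FIELDS = ("identifier", "title", "description", "keywords")

# Loose pattern for the common "10.<registrant>/<suffix>" DOI shape.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def findability_problems(record):
    """Return a list of human-readable problems; an empty list means none were found."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if not record.get(name)]
    identifier = record.get("identifier", "")
    if identifier and not DOI_PATTERN.match(identifier):
        problems.append("identifier does not look like a DOI")
    return problems


# Placeholder record for illustration; the DOI below is not a real dataset.
example_record = {
    "identifier": "10.1234/example-dataset",
    "title": "Example survey dataset",
    "description": "Anonymised responses collected for an illustrative study.",
    "keywords": ["survey", "example"],
}

print(findability_problems(example_record))
```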

Accessible

  • Materials are stored in suitable locations (e.g. in data repositories)
  • Data can be retrieved through standardised protocols
  • Clearly defined authentication and authorisation procedures
  • Long-term preservation ensures continued access
  • Appropriate access controls for sensitive data while maintaining openness where possible

Interoperable

  • Data formats are chosen with care, taking into account how formats might change in the future
  • Use standard file formats that can be read across different software platforms
  • Consistent variable naming and coding schemes
  • Clear documentation of data structure and relationships
  • Compatibility with existing data integration tools and workflows

Reusable

  • Comprehensive documentation and metadata to enable proper interpretation
  • Clear licensing that specifies how data can be used
  • Quality assurance processes for data integrity
  • Contextual information for meaningful reanalysis
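One common building block for the data integrity point above is a published checksum that reusers can verify after download. The sketch below uses SHA-256 from the Python standard library; the function names are hypothetical, and the choice of SHA-256 is an assumption rather than a stated requirement.

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large datasets do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_file(path, expected_hex):
    """Return True if the file's digest matches the published checksum."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

A data publisher would record the output of sha256_of_file alongside the dataset, and a reuser would call verify_file with that value to confirm the download is intact.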

Why work with us?

We have a strong track record of delivering high-quality Open Data initiatives, including flagship programmes such as MalariaGEN, one of the world’s most influential open genomics collaborations. Our experience in managing, sharing, and stewarding complex datasets, while ensuring ethical, secure, and responsible research practices, positions us as a global leader in the open research space.

[1] https://forrt.org/glossary/english/open_data/

[2] https://opendatahandbook.org/guide/en/what-is-open-data/

[3] https://www.go-fair.org/fair-principles/