Data sharing in Australia

- Updates from the Australian Research Data Commons

Research data is of vital importance in scholarly communication. How do researchers share their data? Where do they store it? Keith Russell, Manager of Engagements at the Australian Research Data Commons (ARDC), delivered a keynote address at the Melbourne Scholarly Summit 2019, giving insights into data sharing in Australia.

What’s data sharing

What’s data?

Data is a broad term. It’s not just tables or databases, but all the underlying material that’s used to create the research. It could include text, images, video, spreadsheets, transcriptions, audio/video recordings, drawings, or notes captured in lab notebooks. ARDC defines data as “the materials that are actually useful for others to understand how the research findings were created and to be able to reproduce or build on them”.

What motivates researchers to share their data?

According to Figshare’s State of Open Data 2019 report, the main motivations for researchers to share data are:

Increasing impact and visibility of their research
Making it available to the public
Increasing transparency and reuse of their research
Getting proper credit for the data by making a citation available for it

There are other external drivers for sharing data, including:

Funders are starting to push researchers to share research data that’s publicly funded.
Research councils, such as the ARC and NHMRC in Australia, strongly encourage data sharing.
Publishers, such as Taylor & Francis, are implementing data sharing policies.
Governments are setting up open data initiatives in the US, Australia and Europe.

Data sharing in Science, Technology, Engineering and Medicine (STEM)

Data sharing within STEM fuels machine learning and artificial intelligence. It creates huge opportunities to bring data together and combine it to produce novel research and findings. This involves making sure that data can be read by both machines and humans, and can be transferred back and forth, which is challenging but transformative. Data sharing in STEM also increases collaboration across disciplines, as data produced in different disciplines can be combined.

Data sharing in Arts, Humanities and Social Sciences (AHSS)

What does data really mean in humanities and social sciences? It’s a frequently asked question. A starting point is to see data from a broader perspective. Data can be all sorts of materials and assets. In digital humanities, we’ve seen quite a strong push towards open data principles. A main goal is to use visualization to understand data and present it in a meaningful way. As scientific workflows don’t usually apply directly to humanities and social sciences, it’s the creativity within these fields that comes up with different ways and routes to findings.

One of the initiatives ARDC has supported is the Humanities, Arts & Social Sciences Data Enhanced Virtual Lab (Tinker). It aims to establish workflows across national capabilities and research institutions for a more cohesive and interoperable landscape within AHSS. The project will also give the AHSS community the ability to publish their own tools for use through the Virtual Laboratory, Tinker.

FAIR data principles

What does FAIR stand for?

Findable. Data should have a DOI and be stored in a discipline or institutional repository that feeds into international collection mechanisms.

Accessible. Data should be made available through appropriate mechanisms and routes so that researchers can get access to it. The data should also be accessible in a standard format that’s machine readable. This doesn’t mean that data has to be open, especially in the case of sensitive datasets.

Interoperable. Data should follow discipline standards as far as possible, so that other researchers can easily read it and combine it with different types of data without having to find proprietary software to read or use it.

Reusable. It’s important that data also has a clear licence and associated information about how it was created or selected, so researchers know how they can reuse it. For instance, if you have a series of interviews, under which conditions were those interviews conducted, and why did you select those respondents? Provenance information of this kind is very important for understanding the data and reusing it.
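
The four principles above can be read as a simple checklist. The sketch below is illustrative only: the record fields, the list of open formats and the wording of the suggested actions are assumptions made for this example, and it is not ARDC's self-assessment tool or any official FAIR metric.

```python
from dataclasses import dataclass


@dataclass
class DatasetRecord:
    """Hypothetical metadata record; every field name here is an assumption."""
    doi: str = ""            # persistent identifier for the dataset
    repository: str = ""     # discipline or institutional repository
    access_route: str = ""   # how to get the data (it need not be open)
    file_format: str = ""    # e.g. "csv"
    licence: str = ""        # e.g. "CC BY"
    provenance: str = ""     # how the data was created or selected


# Formats treated as standard and machine readable in this sketch (an assumption).
OPEN_FORMATS = {"csv", "json", "xml", "txt"}


def fair_gaps(record: DatasetRecord) -> dict[str, list[str]]:
    """Return, for each FAIR principle, the actions still needed for this record."""
    gaps: dict[str, list[str]] = {
        "Findable": [],
        "Accessible": [],
        "Interoperable": [],
        "Reusable": [],
    }
    if not record.doi:
        gaps["Findable"].append("assign a DOI")
    if not record.repository:
        gaps["Findable"].append("deposit in a discipline or institutional repository")
    if not record.access_route:
        gaps["Accessible"].append("describe how researchers can obtain access")
    if record.file_format.lower() not in OPEN_FORMATS:
        gaps["Interoperable"].append("provide a standard, machine-readable format")
    if not record.licence:
        gaps["Reusable"].append("attach a clear licence")
    if not record.provenance:
        gaps["Reusable"].append("document how the data was created or selected")
    return gaps


if __name__ == "__main__":
    # Placeholder values, including the DOI, are invented for illustration.
    record = DatasetRecord(
        doi="10.1234/example",
        repository="Example Institutional Repository",
        access_route="mediated access on request",
        file_format="xlsx",
    )
    for principle, actions in fair_gaps(record).items():
        print(principle, actions or "no gaps found")
```

Note that the example record uses mediated access rather than open download and still passes the Accessible check, reflecting the point that accessible data does not have to be open.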

Taylor & Francis data-sharing policies are tiered, with the five standard policies setting increasing levels of expectation around how and when data should be shared. At the more progressive end, the open and FAIR policy mandates that data be made open under a CC BY, CC0 or equivalent licence and be aligned with the FAIR principles. Find out about our journals with an open and FAIR data policy.

How are FAIR principles implemented?

AGU Commitment Statement

The AGU Commitment Statement has now been signed by a range of repository managers, researchers, research institutions, and publishers (including Taylor & Francis), who have committed themselves to making data available. ARDC strongly encourages authors not to include data as an appendix to the article (which doesn’t make it very reusable), but to deposit it in a repository where it can be made FAIR. This means the data has an appropriate DOI and citation, so researchers can reference it and the associated credit is given.

CoreTrustSeal

Making data accessible for the longer term requires storing it in a certified and trusted repository. CoreTrustSeal, an international certification organization, can certify a repository as a CoreTrustSeal certified repository. ARDC has been working with a number of data repositories in Australia, such as the Australian Data Archive, to have them certified by CoreTrustSeal, and they were successful.

Data self-assessment tool

ARDC provides a self-assessment tool that researchers or research support staff can use to score how FAIR their data is, see what they can do to make it more FAIR, and turn those findings into tangible actions. ARDC has also developed top 10 FAIR data and software lists for different disciplines. They are now publicly available, so anybody can use them to untangle “what does it mean to make my data FAIR” in a specific area.

Actions from different stakeholders

Making data FAIR requires action from a number of different stakeholders in the scholarly ecosystem.

Researchers can make sure that they have an ORCID and know their institutional or discipline repository. They can also make sure that their published data has a DOI so it can be referenced.
Editors can promote the use of trusted repositories and standard journal data policies so data can be published in a consistent way.
Research support staff can help make the process easy and seamless, without too much effort required from researchers.

Visit the Taylor & Francis data sharing hub to understand our policies.