Research data is of vital importance in scholarly communication. How do researchers share their data? Where do they store data? Keith Russell, Manager of Engagements at the Australian Research Data Commons (ARDC), delivered a keynote address at the Melbourne Scholarly Summit 2019, giving insights on data sharing in Australia.
Data is a broad term. It’s not just tables or databases, but all the underlying material that’s used to create the research. It could include text, images, video, spreadsheets, transcriptions, audio/video recordings, drawings, or notes captured in lab notebooks. ARDC defines data as “the materials that are actually useful for others to understand how the research findings were created and to be able to reproduce or build on them”.
According to the Figshare State of Open Data 2019 report, the main motivations for researchers to share data are:
There are other external drivers for sharing data, including:
Data sharing within STEM supports the growth of machine learning and artificial intelligence. It creates huge opportunities to bring data together and combine it to produce novel research and findings. This process involves making sure that data can be read by both machines and humans, and can be transferred backwards and forwards, which is challenging but revolutionary. Data sharing in STEM also increases collaboration across disciplines, as data produced in different disciplines can be combined and brought together.
What does data really mean in humanities and social sciences? It’s a frequently asked question. A starting point is to see data from a broader perspective. Data can be all sorts of materials and assets. In digital humanities, we’ve seen quite a strong push towards open data principles. The main goal is to use visualization to understand data and present it in a meaningful way. As scientific workflows don’t usually apply directly to humanities and social sciences, it’s the creativity within these fields that comes up with different ways and routes to findings.
One of the initiatives ARDC has supported is the Humanities, Arts & Social Sciences Data Enhanced Virtual Lab (Tinker). It aims to establish workflows across national capabilities and research institutions for a more cohesive and interoperable landscape within AHSS. The project will also give the AHSS community the ability to publish their own tools for use through the Virtual Laboratory, Tinker.
Findable. Data should have a DOI and be stored in a discipline or institutional repository that feeds into international collection mechanisms.
Accessible. Data should be made available through appropriate mechanisms and routes so that researchers can get access to it. The data should also be accessible in a standard format that’s machine readable. This doesn’t mean that data has to be open, especially in the case of sensitive datasets.
Interoperable. Data should use as many discipline standards as possible, so that other researchers can easily read it and combine it with different types of data without having to find proprietary software to read or use it.
Reusable. It’s important that data has a clear license and associated information about how it was created or selected, so researchers know how they can reuse it. For instance, if you have a series of interviews, under which conditions were those interviews conducted, and why did you select those respondents? Provenance information is very important for understanding the data and reusing it.
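As a rough illustration of the four points above, here is a minimal Python sketch that checks a dataset's metadata record for obvious gaps. It is not ARDC's self-assessment tool or any official scoring method; the record fields, the list of open formats, and the function name fair_gaps are all assumptions made up for this example, and the DOI shown is a placeholder.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class DatasetRecord:
    """Hypothetical metadata record; field names are illustrative only."""
    doi: str = ""
    repository: str = ""
    file_format: str = ""
    access_route: str = ""  # e.g. "open" or "mediated"; data need not be open
    licence: str = ""
    provenance: str = ""


# Illustrative, non-exhaustive set of formats readable without proprietary software.
OPEN_FORMATS = {"csv", "json", "xml", "txt"}


def fair_gaps(record: DatasetRecord) -> dict[str, list[str]]:
    """Return the gaps found for each of the four principles (empty list = none found)."""
    gaps: dict[str, list[str]] = {
        "Findable": [],
        "Accessible": [],
        "Interoperable": [],
        "Reusable": [],
    }
    if not record.doi:
        gaps["Findable"].append("no DOI")
    if not record.repository:
        gaps["Findable"].append("not deposited in a repository")
    if not record.access_route:
        gaps["Accessible"].append("no documented access route")
    if record.file_format.lower() not in OPEN_FORMATS:
        gaps["Interoperable"].append("format may need proprietary software")
    if not record.licence:
        gaps["Reusable"].append("no licence")
    if not record.provenance:
        gaps["Reusable"].append("no provenance information")
    return gaps


# Example: interview transcripts with mediated access but no licence recorded.
interviews = DatasetRecord(
    doi="10.1234/example",  # placeholder DOI, not a real dataset
    repository="Example Institutional Repository",
    file_format="csv",
    access_route="mediated",
    provenance="Semi-structured interviews; respondents selected by role.",
)
print(fair_gaps(interviews)["Reusable"])  # prints ['no licence']
```

A checklist like this only flags missing information. Whether a dataset is genuinely reusable still depends on the quality of the licence and provenance details a researcher provides.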
Taylor & Francis data-sharing policies are tiered, with the five standard policies setting increasing levels of expectation around how and when data should be shared. At the more progressive end, the open and FAIR policy mandates that data be made open under a CC BY, CC0 or equivalent licence and be aligned with the FAIR principles. Find out about our journals with an open and FAIR data policy.
The AGU Commitment Statement has now been signed by a range of repository managers, researchers, research institutions, and publishers (including Taylor & Francis), who have committed themselves to making data available. ARDC very strongly encourages authors not to include the data as an appendix to the article (that doesn’t make it very reusable), but to deposit it in a repository where it can be made FAIR. This means the data has an appropriate DOI and citation, so researchers can reference the data and its creators receive the associated credit.
Making data accessible for the longer term requires storing it in a certified and trusted repository. CoreTrustSeal, an international standard certification organization, can certify a repository as a CoreTrustSeal certified repository. ARDC has been working with a number of data repositories in Australia, such as the Australian Data Archive, to have them certified by CoreTrustSeal, and they were successful.
ARDC provides a self-assessment tool that researchers or research support staff can use to score how FAIR their data is, see what they can do to make it more FAIR, and turn those findings into tangible actions. ARDC has also developed top 10 FAIR data and software in different disciplines, which are now publicly available so anybody can use them to untangle “what does it mean to make my data FAIR” in a specific area.
Making data FAIR requires actions from a number of different stakeholders in the scholarly ecosystem.
Visit the Taylor & Francis data sharing hub to understand our policies.