Chapter 4 Open Access Reproducible Science

In physics, the observations of physical experiments are, by definition, reproducible by anyone. The verifiability of those results is a cornerstone of the scientific process. Historically, notebooks and scientific journals have played a crucial role in the development of new technology. Nowadays, much of fundamental physics research is funded by national organizations, such as the National Science Foundation (NSF), and proposals now require a data management plan. There has been a recent movement in physics towards open access and reproducible science: sharing data advances science, helps avoid errors, and makes research more efficient. In practice, this calls for data management together with open-source analysis and visualization tools.

Some journals are emphasizing reproducibility. The journal Nature requires manuscript authors to complete a checklist (“Announcement: Reducing Our Irreproducibility” 2013). Authors can opt to submit supplemental materials and data. Data sets have long been available for large high-energy physics projects (e.g., CERN Open Data), for astrophysics (e.g., the Sloan Digital Sky Survey), for life sciences (e.g., the DOE Data Explorer), and for national science laboratories (e.g., the NIST Physical Reference Data repository). More recently, several important tools have emerged for small groups in condensed matter physics:

  • Zenodo: citable repository for open-source code, posters, and presentations
  • Dataverse: open-source research data repository
  • Harvard Dataverse: 2000+ physics datasets
  • FigShare: repository for research outputs
  • GitHub: general-purpose repository for code and data

Research data repositories are springing up and enabling more reproducibility and exploration; see, for example, the re3data repository search engine.
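Before a data set is deposited in a repository, it can help to record which files it contains and a checksum for each, so that others can confirm they are analyzing the same data. The following is a minimal sketch of that idea using only the Python standard library. The function names (sha256_of, build_manifest, verify_manifest) and the manifest layout are illustrative assumptions, not a format required by any of the repositories listed above.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_dir: Path) -> dict:
    """Map each file under data_dir (by relative path) to its size and checksum."""
    manifest = {}
    for path in sorted(data_dir.rglob("*")):
        if path.is_file():
            manifest[path.relative_to(data_dir).as_posix()] = {
                "bytes": path.stat().st_size,
                "sha256": sha256_of(path),
            }
    return manifest


def verify_manifest(data_dir: Path, manifest: dict) -> list:
    """Return the relative paths that are missing or whose checksum has changed."""
    changed = []
    for name, entry in manifest.items():
        path = data_dir / name
        if not path.is_file() or sha256_of(path) != entry["sha256"]:
            changed.append(name)
    return changed
```

The dictionary returned by build_manifest can be saved next to the data (for instance with json.dump) and checked later with verify_manifest, which returns an empty list when nothing has changed.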


Exercise: Review the 34 slides of “Intro to Reproducible Science”


4.1 Open Knowledge Network Roadmap

In 2022, the National Science Foundation and the White House Office of Science and Technology Policy, together with industrial players in the United States, developed the Open Knowledge Network (OKN) Roadmap. Its premise is that a future powered by artificial intelligence (AI) requires a network of open knowledge. Modern society increasingly relies on vast amounts of diverse data to establish services.

The roadmap covers the design of data structures, data ingestion, data storage, and data consumption. The system aims to promote transparency and an ethical approach towards societal gains.

Models for such a system are Wikipedia and Wikidata. The data can be understood through schemas or other data structures and entities. The OKN strives to provide a public data infrastructure that enhances access to resources created by government-supported programs.
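To make the notions of entities and schemas more concrete, the sketch below stores facts as subject-predicate-object triples, loosely in the spirit of Wikidata statements, and walks through ingestion, storage, and consumption in a few lines. It is a toy illustration only: the class TinyGraph, the SCHEMA set, and the example predicates are hypothetical and are not part of the OKN Roadmap or of any Wikidata interface.

```python
from collections import defaultdict
from typing import NamedTuple


class Triple(NamedTuple):
    """One statement about an entity: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


# Hypothetical schema: the only predicates this toy graph accepts.
SCHEMA = {"instance_of", "measured_by", "has_unit"}


class TinyGraph:
    """In-memory triple store indexed by subject."""

    def __init__(self):
        self._by_subject = defaultdict(list)

    def ingest(self, rows):
        """Validate rows against SCHEMA, store the valid ones, return how many were kept."""
        accepted = 0
        for row in rows:
            triple = Triple(*row)
            if triple.predicate not in SCHEMA:
                continue  # skip statements the schema does not describe
            self._by_subject[triple.subject].append(triple)
            accepted += 1
        return accepted

    def query(self, subject, predicate):
        """Return all objects stored for the given subject and predicate."""
        return [
            t.obj
            for t in self._by_subject.get(subject, [])
            if t.predicate == predicate
        ]


graph = TinyGraph()
kept = graph.ingest([
    ("critical_temperature", "instance_of", "physical_quantity"),
    ("critical_temperature", "has_unit", "kelvin"),
    ("critical_temperature", "favorite_color", "blue"),  # rejected by the schema
])
units = graph.query("critical_temperature", "has_unit")
```

Here the schema acts as a gatekeeper at ingestion time, so consumers of the graph can rely on every stored predicate having an agreed meaning.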

When machine learning (ML) and AI are applied to the OKN, improvements in the following areas are expected: equity and justice issues, climate change, disaster prevention, energy systems, health communication, innovation in research, and financial risk analysis.

4.2 Tools

Some open-source tools are useful in this context:

Getting familiar with the concept of data management and its tools is essential for physics research and for related industry projects.

References

“Announcement: Reducing Our Irreproducibility.” 2013. Nature 496 (7446): 398–98. https://doi.org/10.1038/496398a.