Data management

References and useful links

  • The starting point for most of the info: https://researchdata.epfl.ch
  • The slides of Pablo’s presentation on RDM can be found at: https://github.com/lcbc-epfl/data_management
  • A nice tool for writing data management plans: https://dmp.opidor.fr

Data repositories

  • The zenodo data repository: https://zenodo.org
  • The Journal of Physical and Chemical Reference Data: https://aip.scitation.org/journal/jpr
  • Figshare: https://figshare.com/ (Note that Figshare is a commercial service. It is operated by Digital Science, which belongs to Holtzbrinck Publishing Group, the majority owner of Springer Nature.)
  • Open Science Framework: https://osf.io/

How to write README files

  • https://www.dataone.org/best-practices/metadata
  • https://data.research.cornell.edu/content/readme
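
As a starting point, a README skeleton can be generated with a few lines of Python. This is only a minimal sketch: the field names and hints in README_FIELDS are assumptions based on common practice, not a list taken from the pages linked above, and should be adapted to your project.

```python
from datetime import date

# Hypothetical field list; adapt it to the guidance in the links above.
README_FIELDS = [
    ("Title", "Short descriptive title of the dataset"),
    ("Authors", "Names, affiliations, and contact e-mail"),
    ("Date", None),  # filled in automatically below
    ("Description", "What the data are and how they were generated"),
    ("File overview", "One line per file or directory: name, format, content"),
    ("Software", "Programs and versions needed to read or reproduce the data"),
    ("License", "For example CC BY 4.0 for data or MIT for code"),
]


def make_readme_template(today=None):
    """Return a plain-text README skeleton with one section per field."""
    today = today or date.today()
    lines = []
    for name, hint in README_FIELDS:
        # The date is inserted directly; every other field gets a placeholder.
        value = today.isoformat() if name == "Date" else f"<{hint}>"
        lines.append(f"{name}:")
        lines.append(f"  {value}")
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    print(make_readme_template())
```

Redirecting the output to a file named README.txt in the top-level data directory gives a template that can then be filled in by hand.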

File Formats

  • DataCite Metadata Schema: https://schema.datacite.org
  • https://www.force11.org
  • http://dublincore.org/documents/dces
  • HDF5 format: https://www.hdfgroup.org
  • Automated Interactive Infrastructure and Database for Computational Science, AiiDA: http://www.aiida.net
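
To illustrate what a metadata record based on the Dublin Core element set might look like, here is a minimal sketch in Python. The helper build_record and all values in the example record are hypothetical placeholders; only the fifteen element names come from the Dublin Core Metadata Element Set.

```python
import json

# The fifteen element names of the Dublin Core Metadata Element Set.
DCES_ELEMENTS = {
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
}


def build_record(**fields):
    """Return a metadata dict, rejecting keys that are not DCES elements."""
    unknown = set(fields) - DCES_ELEMENTS
    if unknown:
        raise ValueError(f"Not Dublin Core elements: {sorted(unknown)}")
    return dict(fields)


# All values below are hypothetical placeholders.
record = build_record(
    title="Example molecular dynamics trajectories",
    creator=["A. Researcher"],
    date="2019-01-31",
    format="HDF5",
    rights="CC BY 4.0",
)

if __name__ == "__main__":
    # Store the record as JSON next to the data files.
    print(json.dumps(record, indent=2))
```

Checking the keys against a fixed set catches misspelled element names early; a real deposit would follow the schema required by the chosen repository.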

Examples of papers publishing their data

Molecular dynamics data

The Dynamic Conformational Landscapes of the Protein Methyltransferase SETD8, Shi Chen, Rafal P. Wiewiora, Fanwang Meng, Nicolas Babault, Anqi Ma, Wenyu Yu, Kun Qian, Hao Hu, Hua Zou, Junyi Wang, Shijie Fan, Gil Blum, Fabio Pittella-Silva, Kyle A. Beauchamp, Wolfram Tempel, Hualiang Jiang, Kaixian Chen, Robert Skene, Y. George Zheng, Peter J. Brown, Jian Jin, Cheng Luo, John D. Chodera, Minkui Luo, bioRxiv 438994, DOI: https://doi.org/10.1101/438994

Text in the paper: The molecular dynamics datasets generated and analyzed in this study are available via the Open Science Framework at https://osf.io/2h6p4. The code used for the generation and analysis of the molecular dynamics data is available via a Github repository at https://github.com/choderalab/SETD8-materials

Quantum Chemistry

Random versus Systematic Errors in Reaction Enthalpies Computed Using Semiempirical and Minimal Basis Set Methods, Jimmy C. Kromann, Alexander Welford, Anders S. Christensen, and Jan H. Jensen, ACS Omega, 2018, 3 (4), pp 4372–4377, DOI: 10.1021/acsomega.8b00189

Text in the paper: The Cartesian coordinates of the molecules used in this study can be found here: https://doi.org/10.6084/m9.figshare.5822061

Software

Graph-based genetic algorithm and generative model/Monte Carlo tree search for the exploration of chemical space, Jan H. Jensen, Preprint, 2019, https://doi.org/10.26434/chemrxiv.7240751.v2

Text in the paper: SUPPLEMENTARY INFORMATION: The codes used in this study can be found on GitHub: github.com/jensengroup/GB-GA/tree/v0.0 and github.com/jensengroup/GB-GM/tree/v0.0

Recommended licenses

  • MIT License: A permissive license that allows others to do whatever they want with the software or data. The material has to be redistributed together with the license file and the copyright notice bearing your name. It is compatible with most other licenses; for example, MIT-licensed code can be included in a GPL software package. https://en.wikipedia.org/wiki/MIT_License
  • GNU GPLv3: A free-software license that is somewhat more restrictive than the MIT License. The MIT License is permissive, whereas the GPL has strong copyleft: derived works that are distributed must also be released under the GPL. https://www.gnu.org/licenses/gpl-3.0.de.html
  • Creative Commons BY 4.0: The recommended license for data. Others are free to reuse and alter your work, but may only share it with attribution to you. https://creativecommons.org/licenses/by/4.0/deed.en

© Copyright 2019, LCBC Revision fbd0ab21.