NIH Data Management Guidelines

The following text is from the UNMC PowerPoint presentation entitled, "NIH Data Sharing Policy."


  • Data should be made as widely and freely available as possible while safeguarding the privacy of participants, and protecting confidential and proprietary data.
  • Applies to researchers seeking $500K or more in direct costs in any year of the proposed research.

Applicability – Data Sharing Applies to ...

  • Sharing of final research data for research purposes.
  • Basic research, clinical studies, surveys, and other types of research supported by NIH.
  • Research that involves human and non-human subjects. It is especially important to share unique data that cannot be readily replicated.
  • If data sharing is not possible, state the reasons.

Data Sharing Across Countries

  • Investigators from foreign institutions and U.S. investigators collecting data in other countries should familiarize themselves with the policies governing data sharing in the countries in which they plan to work, and should address any specific limitations in the data-sharing plan in their application.

Data Sharing Policy Implementation

  • For most studies, final research data will be a computerized dataset upon which the accepted publication was based, not the underlying pathology reports and other clinical source documents.
  • For some but not all scientific areas, the final dataset might include both raw data and derived variables, which would be described in the documentation associated with the dataset.
  • NIH stipulates neither the precise content of the data documentation nor the formatting, presentation, or transport mode for the data.
  • However, NIH recommends following the best practices established by professional societies.
  • If an application describes a data-sharing plan, NIH expects that plan to be enacted.
  • If progress has been made with the data-sharing plan, then note this in the progress report.
  • In the final progress report, the PI should note what steps have been taken with respect to the data-sharing plan. In the case of noncompliance (depending on its severity and duration) NIH can take various actions to protect the Federal Government's interests. In some instances, for example, NIH may make data sharing an explicit term and condition of subsequent awards.

How Long to Keep the Data

  • Depends on:
    • The nature of the data (human subjects, non-human, etc.)
    • The data's ongoing research value
    • Whether there are specific policies governing that particular research (e.g., Genome-wide Association Studies Policy and so on)
  • As a rule of thumb, the PI is required to keep the data for a minimum of 3 years following closeout of the grant or contract agreement, or from the date of the last expenditure report filed with the granting agency.
  • Often, the PI's institution has additional policies and procedures regarding the custody, distribution, and required retention period for data produced under research awards.
  • The Office for Human Research Protections (OHRP) requires research records to be retained for at least 3 years after the completion of the research. 
  • Further, records from any research that involved collecting identifiable health information must be retained for a minimum of 6 years after each subject signed an authorization.
  • Note that these are minimum times.
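As a rough illustration of how the minimum periods above combine, the sketch below takes the latest date produced by the three rules listed in this section. The function and parameter names are hypothetical, the logic is a simplification, and institutional or program-specific policies may require longer retention.

```python
from datetime import date


def add_years(d: date, years: int) -> date:
    """Return the same calendar day `years` later (Feb 29 falls back to Feb 28)."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def earliest_disposal_date(grant_closeout, research_completed, authorization_dates=()):
    """Latest of the minimum retention dates described in this section.

    Assumption: only the three rules listed above are modeled.
    """
    candidates = [
        add_years(grant_closeout, 3),      # 3 years after closeout or last expenditure report
        add_years(research_completed, 3),  # OHRP: 3 years after completion of the research
    ]
    # 6 years after each subject signed an authorization
    candidates += [add_years(signed, 6) for signed in authorization_dates]
    return max(candidates)
```

Because these are minimums, the result is the earliest date on which disposal could be considered, not a recommended disposal date.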

Timeliness of Data Sharing

  • Data sharing should occur in a timely fashion, no later than the acceptance for publication of the main findings from the final dataset.
  • If data from large epidemiologic or longitudinal studies are collected over several discrete time periods or waves, it is reasonable to expect that the data would be released in waves as data become available or main findings from waves of the data are published.

Human Subjects and Privacy Issues

  • The PI, the IRB, and the institution are responsible for protecting the rights of subjects and the confidentiality of the data.
  • De-identify the data before sharing.
  • Researchers should also consider removing indirect identifiers and other information that could lead to "deductive disclosure" of participants' identities.
  • Researchers who seek access to individual level data are typically required to enter into a data-sharing agreement.
  • Researchers who are planning clinical trials and intend to share the resulting data should think carefully about the study design, the informed consent documents, and the structure of the resulting dataset prior to the initiation of the study.
  • Investigators who are working for or who are themselves covered entities under HIPAA must consider issues related to the Privacy Rule. The Department of Health and Human Services (DHHS) provides guidance on research and the Privacy Rule (
  • If research participants are promised that their de-identified data will not be shared with other researchers, the application should explain the reasons for such promises. Such promises should not be made routinely and without adequate justification.
  • For the most part, it is not appropriate for the initial PI to place limits on the research questions or methods other PIs might pursue with the data.
  • It is also not appropriate for the investigator who produced the data to require co-authorship as a condition for sharing the data.
  • Research datasets from studies that do not include human subjects generally should not be constrained by the limitations deemed necessary and appropriate for human subjects.
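The de-identification steps described in this section (removing direct identifiers, then reducing indirect identifiers that could allow deductive disclosure) might be sketched as follows. All field names, the 10-year age bands, the 3-digit ZIP prefix, and the threshold k are illustrative assumptions, not requirements stated in the policy, and the sketch does not by itself establish compliance with the Privacy Rule.

```python
from collections import Counter

# Hypothetical field names, chosen only for illustration.
DIRECT_IDENTIFIERS = {"name", "ssn", "email"}
QUASI_IDENTIFIERS = ("age_band", "zip3")


def deidentify(record):
    """Drop direct identifiers and coarsen two indirect identifiers."""
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    age = out.pop("age")
    low = (age // 10) * 10
    out["age_band"] = f"{low}-{low + 9}"   # exact age -> 10-year band
    out["zip3"] = out.pop("zip")[:3]       # full ZIP code -> 3-digit prefix
    return out


def small_cells(records, k=5):
    """Return indirect-identifier combinations shared by fewer than k records."""
    counts = Counter(tuple(r[q] for q in QUASI_IDENTIFIERS) for r in records)
    return {combo: n for combo, n in counts.items() if n < k}
```

Combinations returned by `small_cells` mark records that may still be identifiable by deduction and may need further coarsening or suppression before release.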

Methods for Data Sharing

  • Under the auspices of the PI
  • Data archive
  • Data enclave
  • Mixed mode sharing

Investigator’s Choice

  • PI choice is likely to depend on several factors, including the sensitivity of the data, the size and complexity of the dataset, and the volume of requests anticipated.
  • Data sharing channels may include mailing a CD with the data, posting the data on an institutional or personal website, using a collaborative network area shared with other investigators, or referring data seekers to a data archive facility. Whichever channel is chosen, investigators should also maintain the associated documentation and meet reporting requirements.
  • Data archives can be particularly attractive for investigators concerned about a large volume of requests, vetting frivolous or inappropriate requests, or providing technical assistance for users seeking help with analyses.
  • Investigators should consider using a data-sharing agreement to impose appropriate limitations on users. Such an agreement usually indicates the criteria for data access, whether or not there are any conditions for research use, and can incorporate privacy and confidentiality standards to ensure data security at the recipient site and prohibit manipulation of data for the purposes of identifying subjects.

Data Archive & Enclave

  • Data Archive - A place where machine-readable data are acquired, manipulated, documented, and finally distributed to the scientific community for further analysis.
  • Data Enclave - A controlled, secure environment in which eligible researchers can perform analyses using restricted data resources.

Data Enclave

  • Datasets that cannot be distributed to the general public, for example, because of participant confidentiality concerns, third-party licensing or use agreements that prohibit redistribution, or national security considerations, can be accessed through a data enclave.

Mixed Mode Data Sharing

  • This method allows for more than one version of the dataset and provides different levels of access depending on the version.
  • For example, a de-identified dataset could be made available for general use, but stricter controls through a data enclave would be applied if access to more sensitive data were required.
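One way to picture mixed mode sharing is as a small decision rule that routes each request to a dataset version. The sketch below is hypothetical: the tier names and the two yes/no inputs are assumptions made for illustration, and real access decisions would follow the terms of the study's data-sharing agreement.

```python
from enum import Enum


class Access(Enum):
    PUBLIC_USE = "de-identified public-use file"
    ENCLAVE = "restricted file, analyzed inside the data enclave"
    DENIED = "no access"


def access_for(needs_sensitive_fields: bool, has_signed_agreement: bool) -> Access:
    """Pick a dataset version for a request (illustrative rule only)."""
    if not needs_sensitive_fields:
        return Access.PUBLIC_USE      # general use: de-identified version
    if has_signed_agreement:
        return Access.ENCLAVE         # sensitive data: stricter enclave controls
    return Access.DENIED              # assumption: no agreement, no sensitive data
```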

Data Sharing Workbook

  • Investigators will need to determine which method of data sharing is best for their particular dataset.
  • The Data Sharing Workbook provides information and examples of how others have shared data.

Data Documentation

  • Proper documentation is needed to ensure that others can use the dataset and to prevent misuse, misinterpretation, and confusion.
  • Documentation should provide information about the methodology and procedures used to collect the data, details about codes, definitions of variables, variable field locations, frequencies, and the like. The precise content of documentation will vary by scientific area, study design, the type of data collected, and characteristics of the dataset.
  • It is appropriate for scientific authors to acknowledge the source of data upon which their manuscript is based.
  • The acknowledgement could appear in the methods and/or reference sections, or in the acknowledgement section of the manuscript.
  • Authors using shared data should check the policies of the journal to which they plan to submit to determine the precise location in the manuscript for such acknowledgement.
  • Most journals now expect that DNA and amino acid sequences that appear in articles will be submitted to a sequence database before publication.
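To make the documentation guidance in this section concrete, the following minimal sketch builds a simple codebook listing each variable's definition, field position, and value frequencies. The function name, the input layout (a list of dictionaries), and the output structure are assumptions; as noted above, the precise content of documentation varies by scientific area and study design.

```python
from collections import Counter


def build_codebook(rows, definitions):
    """Build one codebook entry per documented variable.

    rows: list of dicts, one per record (assumed layout).
    definitions: dict mapping variable name -> plain-language definition,
                 listed in the order the fields appear in the dataset.
    """
    codebook = []
    for position, (variable, definition) in enumerate(definitions.items(), start=1):
        values = [row.get(variable) for row in rows]
        codebook.append({
            "variable": variable,
            "position": position,                  # variable field location
            "definition": definition,
            "frequencies": dict(Counter(values)),  # value -> number of records
        })
    return codebook
```

The resulting entries can be written out alongside the dataset so that secondary users can interpret codes without contacting the original investigator.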

Funds for Data Sharing

  • Applicants can request funds for data sharing and archiving in their grant application.
  • Investigators who incorporate data sharing in the initial design of the study may more readily and economically establish adequate procedures for protecting the identities of participants and share a useful dataset with appropriate documentation.