
Audit of National Institutes of Health's Data Integrity Controls for the Sequence Read Archive Data

The National Center for Biotechnology Information (NCBI), part of the National Library of Medicine, hosts one of the National Institutes of Health's (NIH's) largest and most diverse datasets, the Sequence Read Archive (SRA). SRA is a broad collection of experimental DNA and RNA sequences that represent genome diversity. In 2019, SRA held 9 million records in 2 formats. NCBI receives data from submitters in their original format (23 petabytes), which is specific to the instrument and experiment; these data are stored on tape. NCBI then transforms the original data into the standard SRA normalized format (12.7 petabytes) for redistribution. Through this cloud-based SRA normalized database, accessed via NCBI servers, researchers can search metadata to locate sequence reads for further analysis. SRA usage follows International Nucleotide Sequence Database Collaboration principles, which state that data are shared without restriction, that the individual submitting the data must be the owner of the data, and that ownership of the data remains with the submitter even after submission. This audit will concentrate on system integrity controls, including malicious code protection and data input validation, as well as other Federal requirements for normalizing and archiving SRA data. The audit objective will be to determine whether NIH has implemented adequate system integrity controls to ensure the reliability of SRA data.

Announced or Revised: Revised
Agency: National Institutes of Health
Title: Audit of National Institutes of Health's Data Integrity Controls for the Sequence Read Archive Data
Component: Office of Audit Services
Report Number(s): WA-22-0005 (W-00-22-42043)
Expected Issue Date (FY): 2024