Quality Assurance Checks and Data-Processing Activities Performed by CDIAC

An important part of the data documentation and dissemination process at CDIAC is the quality assurance (QA) of data before distribution. Data received at CDIAC are rarely in perfect condition for immediate distribution, regardless of the source. To guarantee data of the highest possible quality, CDIAC conducts extensive QA reviews, which involve examining the data for completeness, reasonableness, and accuracy. Although these reviews have common objectives, they are tailored to each data set, often requiring extensive programming efforts. This time-consuming process is an important component in the value-added concept of assuring accurate, usable data for researchers.

The NOAA/CMDL flask CO2 database contains CO2 measurements and other parameters from many sites. That CDIAC discovered only a few minor problems reflects the considerable effort and scrutiny exerted by the NOAA/CMDL Carbon Cycle Group in providing high-quality, well-documented, consistently formatted data to an international scientific audience. The few problems CDIAC encountered were quickly addressed and resolved by the NOAA/CMDL Carbon Cycle Group. The following summarizes the QA checks and data-processing activities performed by CDIAC.

QA Checks

CDIAC obtained the original NOAA/CMDL flask CO2 database from the NOAA/CMDL Carbon Cycle Group anonymous FTP area as two UNIX "tar" files. These files were transferred to CDIAC using FTP commands and extracted (i.e., untarred). Working copies of the files were created and processed in the following ways:
  1. All data files contributed by the NOAA/CMDL Carbon Cycle Group were checked to ensure that each was formatted as stated, contained the data described, and contained the period of record specified.

  2. Each file was checked to ensure that the prescribed missing value conventions (i.e., -999.99 for CO2 mixing ratios and 99 for date parameters) were consistent throughout all files and that no other missing value designations were used in the files.

  3. Frequencies of occurrence were generated for the instrument codes and data selection codes to assess the abundances of each code, check for bogus codes, and permit documentation of all possible codes.

  4. Mean values were generated for each numeric variable in each data file, and these values were checked for reasonableness (e.g., a range of month values from 1 to 12).

  5. All data were plotted. Extreme values were identified, and these values were traced to the original data files to ensure that nonbackground flag codes were associated with each value.
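
Checks 2 through 4 above can be sketched in a few lines of Python. This is only an illustration, not the programs CDIAC used: the record layout (the keys 'month', 'co2', and 'flag') and the function name check_records are hypothetical, and the real NOAA/CMDL file format may differ. Only the missing-value conventions (-999.99 and 99) and the 1-to-12 month range come from the text.

```python
from collections import Counter

# Missing-value conventions stated in the text.
MISSING_CO2 = -999.99
MISSING_DATE = 99


def check_records(records):
    """Return (code_counts, problems) for a list of record dicts.

    Each record is assumed (hypothetically) to carry the keys
    'month', 'co2', and 'flag'; the actual file layout may differ.
    """
    # Check 3: frequency of occurrence of each code, so that
    # unexpected codes stand out and all codes can be documented.
    code_counts = Counter(r["flag"] for r in records)

    problems = []
    for i, r in enumerate(records):
        # Check 4: month values must fall between 1 and 12
        # unless they carry the prescribed missing value.
        month = r["month"]
        if month != MISSING_DATE and not 1 <= month <= 12:
            problems.append((i, "month out of range"))

        # Check 2: a non-positive mixing ratio other than -999.99
        # suggests a different missing-value designation was used.
        co2 = r["co2"]
        if co2 != MISSING_CO2 and co2 <= 0:
            problems.append((i, "unexpected missing-value code"))
    return code_counts, problems
```

Calling check_records on the parsed contents of one file returns the code tallies and a list of (record index, message) pairs that a reviewer could then trace back to the original data file.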

Data Processing

  1. CDIAC did not alter the format of the NOAA/CMDL flask CO2 database files. The files distributed by CDIAC are identical in format to the files distributed by NOAA/CMDL.

  2. To assist users wishing to retrieve and process fewer files, two files were created by CDIAC from the >100 files distributed by NOAA/CMDL. One (all.co2) contains CO2 mixing ratios from all individual flask air samples for all sites except the shipboard measurements. The second file (allmm.co2) contains the monthly atmospheric CO2 measurements for all sites, again excluding the shipboard measurements.

  3. The annual values shown in the data listings in Appendix B were generated by CDIAC for those wishing to have annual atmospheric CO2 mixing ratios. Annual values were calculated arithmetically for years having all 12 monthly values. These values are not provided in the machine-readable data files.
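
The annual calculation in item 3 might look like the following Python sketch. It assumes that "calculated arithmetically" means an unweighted arithmetic mean of the 12 monthly values, which the text does not state explicitly. The function name annual_means and the input layout (a dictionary keyed by year and month) are hypothetical.

```python
# Missing-value convention for CO2 mixing ratios, as stated in the text.
MISSING_CO2 = -999.99


def annual_means(monthly):
    """Compute annual means from monthly CO2 mixing ratios.

    monthly: dict mapping (year, month) -> mixing ratio (hypothetical layout).
    Returns {year: mean} only for years having all 12 monthly values.
    Assumes an unweighted arithmetic mean of the monthly values.
    """
    by_year = {}
    for (year, month), value in monthly.items():
        # Months flagged as missing do not count toward the 12 required.
        if value != MISSING_CO2:
            by_year.setdefault(year, {})[month] = value

    return {
        year: sum(values.values()) / 12
        for year, values in sorted(by_year.items())
        if set(values) == set(range(1, 13))
    }
```

A year with even one missing or absent month is omitted from the result rather than averaged over fewer months, which matches the stated requirement of all 12 monthly values.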

