Source: EPA 1994 Toxics Release Inventory
Public Data Release, Appendix C

The goals of EPA's data quality program for TRI are to: (1) identify and assist facilities that must report so that the data submitted will be of the highest quality; (2) ensure high-quality data entry; (3) correct and normalize as much of the submitted data as possible in order to maximize the utility of the data; (4) accurately assess the relative validity of release estimates and other data; and (5) ensure completeness of the database through compliance and enforcement measures.

Identification and Assistance to Facilities

Through mass mailings to all facilities within the manufacturing sector of the economy, work with a wide variety of trade associations, local and national seminars, training courses, and enforcement activities, EPA has endeavored to locate all facilities required to report under section 313 of EPCRA and inform them of their obligations. In addition, EPA has prepared various materials to assist facilities in complying with EPCRA. These include detailed reporting instructions, a question-and-answer document, magnetic media reporting instructions, general technical guidance, and sixteen industry-specific guidance documents. EPA also maintains a toll-free hotline to answer facilities' regulatory and technical questions.

Data Entry Quality Activities

EPA continues to place a high emphasis on data entry accuracy within the Toxics Release Inventory database. EPA's internal review of 3% of the records showed a data entry accuracy rate of over 99.9%, up from 97.5% for the 1987 reporting year. EPA continued the computerized edit checks at the point of data entry, including a high percentage of verification and formalization of data reconciliation activities. EPA mailed copies of the release and transfer estimates to all reporting facilities to allow them to verify the entered data. EPA also received 62% of the 1994 submissions on magnetic media, which guards against EPA data entry errors; the comparable figure for 1993 was 53%. EPA is continuing to encourage the use of magnetic media by all submitters.

Correction and Normalization of Data

Because Congress has required that EPA make the TRI data available to the public through computer telecommunications, EPA has found it necessary to undertake a variety of activities to make the data more usable. Computers retrieve data only in exactly the format requested (e.g., if asked for "Los Angeles," the computer will not identify facilities listed under "LA"), and facilities report their data in a wide variety of ways. As a result, EPA has adopted a consistent name for each county, applied a variety of nomenclature standards to names within the database, added the latitude and longitude of the center of the zip code area in which each facility is found, and taken other steps to make the data easier to use.
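
The "Los Angeles" versus "LA" mismatch described above can be illustrated with a minimal sketch of name normalization before exact-match retrieval. The alias table and the function name `normalize_city` are hypothetical; the source does not give EPA's actual nomenclature rules.

```python
import re

# Hypothetical alias table for illustration only; EPA's real
# nomenclature standards are not described in the source.
CITY_ALIASES = {
    "LA": "LOS ANGELES",
    "L.A.": "LOS ANGELES",
}

def normalize_city(raw: str) -> str:
    """Return a canonical upper-case city name suitable for exact matching."""
    # Trim, collapse runs of whitespace, and upper-case the name so that
    # "los  angeles" and "Los Angeles" compare equal.
    cleaned = re.sub(r"\s+", " ", raw.strip()).upper()
    # Map known abbreviations to the canonical spelling; unknown names
    # pass through unchanged.
    return CITY_ALIASES.get(cleaned, cleaned)
```

With names stored in this canonical form, a query for "Los Angeles" would also retrieve facilities that reported their city as "LA".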

EPA generates a facility identification number at the time of data entry. Linkage between all years of reports has been made to the best of EPA's ability. This allows easy retrieval of cross-year data, even when a facility is sold or changes its name. The identification number has been sent to all facilities. Facilities are required to use this number on all future Form R reports submitted to the Agency. Use of this number facilitates data quality and cross-year analysis.

In 1995, EPA provided all states with a listing of facilities that reported for 1994 to verify that both the state and federal government received the same data. States that responded found cases where facilities had not reported to one or the other government. States provided copies of forms to the EPA where EPA had not received copies, and vice-versa. This activity has provided a critical step to assist EPA in coordinating the data collection with the states.
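
The state and federal comparison described above amounts to a two-way set difference over facility identifiers. The sketch below assumes each government's filings can be reduced to a set of facility identification numbers; the function name and the sample identifiers are hypothetical.

```python
def reconcile(epa_ids: set, state_ids: set) -> dict:
    """Compare the facilities known to EPA with those known to a state."""
    return {
        # Reported to the state but not received by EPA.
        "missing_from_epa": state_ids - epa_ids,
        # Reported to EPA but not received by the state.
        "missing_from_state": epa_ids - state_ids,
        # Present in both lists.
        "matched": epa_ids & state_ids,
    }

# Made-up identifiers, for illustration only.
result = reconcile({"FAC-001", "FAC-002"}, {"FAC-002", "FAC-003"})
```

Each non-empty "missing" set identifies forms that one government would need to request from the other.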

Every year EPA issues Notices of Noncompliance (NONs) to facilities that use invalid forms or provide incomplete forms, incomplete facility identification, or incorrect or missing chemical identification. These facilities are also notified by telephone to make sure their follow-up revisions correct these errors. A facility that does not comply with a NON may be subject to civil penalties.

For reporting year 1994, EPA has again issued Notices of Technical Error (NOTEs) for missing required data or for incorrect information, such as facility identification numbers or invalid codes. The response rate to the NONs and NOTEs has been very good and has prevented errors from recurring in following years. To help facilities avoid these types of errors, a list of common errors was provided in the 1989 through 1994 reporting year instructions. Due to lack of a final regulation for the pollution prevention data elements and budget cuts for the TRI program, EPA did not issue NOTEs for the 1991 and 1992 reporting years.

Accuracy Evaluation

The accuracy of the release data can vary. Some releases can be estimated fairly easily, just by knowing how much of the chemical was used during the reporting year or by weighing drums of solid or liquid waste. Where monitoring of release streams or wastes has been done, release estimates may be within 20% of the actual amount released, although infrequent, non-representative sampling may lead to much less accuracy. Estimates of fugitive air emissions and complex wastewaters for which monitoring data are not available may be off by one or even two orders of magnitude, particularly when the release is a small percentage of the amount of the chemical actually processed.

For the 1987 and 1988 reporting years, EPA conducted audits at facilities to determine how well facilities complied with the law and estimated release quantities. These audits did not "confirm" estimates through monitoring, but determined how well facilities used available data and estimation techniques to calculate releases.

Overall, based on the audit of 156 facilities, 1987 total annual releases appeared to have been underestimated by 2%, representing the net effect of overestimates and underestimates. For non-zero release estimates, more than three-quarters were within a factor of two of EPA's best estimate. About 15% were in error by an order of magnitude or more.
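
The factor-of-two and order-of-magnitude bands above, and the net 2% figure, can be expressed as simple ratio calculations. This is a sketch only: the function names, the band labels, and the treatment of the exact boundaries (2 and 10 inclusive) are assumptions, not the audit's documented method.

```python
def error_band(reported: float, best_estimate: float) -> str:
    """Classify a non-zero release estimate against the auditor's best estimate."""
    if reported <= 0 or best_estimate <= 0:
        raise ValueError("both values must be positive (non-zero estimates only)")
    # Symmetric ratio: over- and underestimates are treated alike.
    ratio = max(reported / best_estimate, best_estimate / reported)
    if ratio <= 2:
        return "within a factor of two"
    if ratio >= 10:
        return "order of magnitude or more"
    return "between"

def net_underestimate(reported: list, best_estimates: list) -> float:
    """Net fraction by which total reported releases fall short of the best estimates."""
    total_best = sum(best_estimates)
    # Over- and underestimates cancel in the totals, giving the net effect.
    return (total_best - sum(reported)) / total_best
```

Because `net_underestimate` works on totals, a small net figure such as 2% can coexist with large errors on individual estimates, which is why the two measures are reported separately.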

The survey of the 1988 data focused on facilities in Standard Industrial Classification (SIC) codes 28 (chemical manufacturing), 29 (petroleum refining), and 34 (metal finishing and fabrication). Ninety facilities were visited. The aggregate 1988 release estimates in these industries were more accurate than their 1987 estimates, since their aggregate 1988 estimates were found to be approximately equal to the estimates calculated by the EPA contractor.

EPA is conducting another data quality survey of the 1994 data for SIC codes 28 (chemical manufacturing), 25 (furniture manufacturing), and 30 (rubber and plastic products). This survey should be completed by the end of 1996, with results available by early 1997.

For the 1987 and 1988 reporting years, in a different type of survey, EPA also identified approximately 1,800 forms with suspect release data and telephoned facilities to discuss how to improve and correct their estimates. The information from this survey was also used to improve the reporting instructions and technical guidance.

Compliance Activities

EPA has taken steps to make data quality a priority in its enforcement program. EPA conducted approximately 70 inspections during 1995 that focused on data quality in addition to nonreporting violations. EPA has developed a guidance manual for EPA Regional inspectors outlining what to look for when auditing an EPCRA reporting facility. The manual contains detailed guidance on how to determine if a facility has identified all reportable chemicals, made proper threshold determinations, and provided reasonable release estimates.

In fiscal year 1990, $1 million was awarded to 11 states to develop and implement TRI data quality assurance programs. These projects focused on one or more broad data quality assurance objectives: 1) verification of the accuracy of the estimates and other data submitted by the facilities; 2) identification of facilities that should have reported but did not; and 3) identification of discrepancies between TRI data reported to EPA and to the state. Quality assurance activities included facility site visits and telephone audits, cross-checking TRI data against other state data, such as permit data, using computer algorithms to identify suspect estimates, and comparing TRI data across reporting years.
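
One way a computer algorithm might compare TRI data across reporting years is to flag facility and chemical pairs whose reported release changes by a large factor from one year to the next. The sketch below is illustrative only: the source does not describe the states' actual algorithms, and the function name, the factor-of-ten threshold, and the sample records are all assumptions.

```python
def flag_suspect(prev: dict, curr: dict, factor: float = 10.0) -> list:
    """Return (facility_id, chemical) keys whose release changed by >= factor.

    prev and curr map (facility_id, chemical) to pounds released in two
    consecutive reporting years.
    """
    suspects = []
    # Only pairs reported in both years can be compared.
    for key in sorted(prev.keys() & curr.keys()):
        a, b = prev[key], curr[key]
        if a <= 0 or b <= 0:
            continue  # zero releases need a separate check; skipped in this sketch
        if max(a / b, b / a) >= factor:
            suspects.append(key)
    return suspects

# Made-up records, for illustration only.
year_1 = {("FAC-001", "TOLUENE"): 1000.0, ("FAC-002", "LEAD"): 50.0}
year_2 = {("FAC-001", "TOLUENE"): 1200.0, ("FAC-002", "LEAD"): 5000.0}
flagged = flag_suspect(year_1, year_2)
```

A flagged pair is only a candidate for follow-up, such as a telephone audit or site visit; a large change may reflect a real change in production rather than an estimation error.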

