Web Feature 
Data Preservation: A Global Perspective
Christopher M. Keane

By the early 1990s, petroleum companies had spent decades exploring the North Sea for oil. The Norwegian government realized that the data these companies had been collecting held value of their own, not only for the development at hand, but also for future development and other potential applications. The government also saw a need to ensure the quality and longevity of the data.

The Norwegian government formed the first national geoscience data repository. Run by the Norwegian Petroleum Directorate, the data repository is both a system for managing seismic data and a core store, where anyone can access the hundreds of drill cores that petroleum companies have taken from the bottom of the North Sea. To ensure that the repository’s data are readily available and of high quality, the government mandated that companies submit their data (seismic surveys, well logs and cores) to the national data repository. More importantly, the government decided that the repository should function not merely as a reference library of earth materials, but as a data management system integral to the operations of both industry and government agencies.

A key concept intrinsic to supporting national data repositories is that geoscience data are a valuable national asset. Though the exploration and production staff of most companies recognize this value in geoscience data, many senior leaders in industry view data storage and management as a cost center. In Norway and England, for example, companies offload the management of seismic data to DISKOS, the component of Norway’s repository where well logs and seismic data are stored, and offload well logs to Common Data Access, a repository formed in England that also houses North Sea data. As a result, companies reduce the burden of a direct cost center. In addition, because the national data repositories give ready access to the data they hold, companies are able to provide equal access to their data for all appropriate parties. For example, in 1996, the Norwegian Parliament changed the reporting requirements for petroleum producers to an on-request system. Companies no longer need to worry about meeting some of their reporting deadlines, because the licensing and tax authorities can build the reports on demand directly from the data held in the repository. With these systems, data transfers during lease sales also now take minutes rather than days or weeks, because all that is required is a reassignment of access rights within the data system. This approach not only maximizes the value of the data repository concept, but also ensures the long-term value of the data through extremely effective access and quality control.
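
The lease-sale point can be made concrete with a small sketch. It assumes a toy in-memory model in which each dataset carries a list of authorized companies; the class, field and company names below are hypothetical and are not taken from DISKOS, Common Data Access or any real repository system. The idea it illustrates is only that a transfer changes who may read the data, and moves no data at all.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Dataset:
    """One item held by the repository, e.g. a seismic survey or a well log."""
    dataset_id: str
    lease_id: str
    # Companies currently allowed to read the data (hypothetical field).
    authorized: set[str] = field(default_factory=set)


def transfer_lease(datasets: list[Dataset], lease_id: str,
                   seller: str, buyer: str) -> int:
    """Move access for every dataset on a lease from seller to buyer.

    No data are copied or shipped; only the access list changes.
    Returns the number of datasets whose rights were reassigned.
    """
    moved = 0
    for ds in datasets:
        if ds.lease_id == lease_id and seller in ds.authorized:
            ds.authorized.discard(seller)
            ds.authorized.add(buyer)
            moved += 1
    return moved


# Illustrative holdings: two datasets on lease-A, one on lease-B.
holdings = [
    Dataset("survey-001", "lease-A", {"SellerCo"}),
    Dataset("welllog-007", "lease-A", {"SellerCo"}),
    Dataset("survey-002", "lease-B", {"SellerCo"}),
]
moved = transfer_lease(holdings, "lease-A", "SellerCo", "BuyerCo")
```

Here the two lease-A datasets become readable by the buyer, and the lease-B survey is left untouched. A production system would also need authentication, audit trails and regulator access, which this sketch leaves out.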

The concept of creating data repositories has since spread across the world, and many countries have either recently established data repositories or are in the process of starting operations. In March 2002, the Norwegian Petroleum Directorate, the United Kingdom’s Department of Trade and Industry, and the Petrotechnical Open Software Corporation hosted the fourth meeting of National Geoscience Data Repositories in Stavanger, Norway. Bringing together data repository personnel from 18 of the approximately 50 countries that house data repositories, the meeting revealed that national geoscience data repositories in all countries face similar issues.

A data repository operates as more than a simple “library” of data. It is an opportunity for government institutions and industry to cooperate for mutual benefit. The issues driving the specific functionality and design of any given country’s repository are who owns the data, the priorities of national investment, and the nature of the geoscience industries in that country.

Petroleum geoscience activities drive the formation of national geoscience data repositories. In most countries, the focus is on well logs, seismic data, and cores and cuttings. Other geoscience data, such as maps, scout tickets, paleontological samples or materials gathered for mineral exploration, are also managed, but often at a lower priority. Depending on the particular country’s needs, some repositories cover all subsurface information, including cores from mineral and groundwater drilling.

Ownership rights of geoscience data vary from country to country. Most European and North American countries grant initial data ownership to the acquiring entity. These data, however, generally must be reported to the national data repository and are held proprietary for a set amount of time or until the company abandons its exploration and production leases. This open-market approach made some companies reluctant to participate in establishing national data repositories in these areas. In most other countries, including South Africa and New Zealand, all geoscience data are the property of the government and are “borrowed” by a company for exploration and development. These countries also hold the data proprietary during the company’s operations, but companies otherwise have little control over their acquired data, particularly during lease sales or transfers.
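
The proprietary-hold rule described for European and North American countries can be sketched as a simple check. This is an illustration under stated assumptions, not any country’s actual regulation: the function name, the `hold_years` parameter and the example dates are hypothetical, since the length of the hold differs from country to country.

```python
from datetime import date
from typing import Optional


def is_publicly_released(reported_on: date, today: date, hold_years: int,
                         lease_relinquished_on: Optional[date] = None) -> bool:
    """Data leave proprietary status when the hold period has run out or
    the company has given up its lease, whichever comes first."""
    if lease_relinquished_on is not None and lease_relinquished_on <= today:
        return True
    try:
        hold_ends = reported_on.replace(year=reported_on.year + hold_years)
    except ValueError:
        # Reported on 29 February and the end year is not a leap year.
        hold_ends = date(reported_on.year + hold_years, 3, 1)
    return today >= hold_ends


# Hypothetical five-year hold on data reported on 1 January 2000.
still_held = is_publicly_released(date(2000, 1, 1), date(2004, 6, 1), 5)
released = is_publicly_released(date(2000, 1, 1), date(2005, 1, 1), 5)
```

In countries where the government owns the data outright, the same check would apply only to the confidentiality period, not to ownership.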

Most national data repositories function as government or quasi-government organizations serving three sets of clients. They must provide data management services to the operating companies, regulatory data management for the licensing and tax authorities, and access for economic development agencies that try to promote future exploration of the country’s resources. Thus, the data repositories are responsible to multiple clients, all of whom have distinctly different interests in the geoscience data and value them differently. The only need these groups share, beyond basic access, is reasonable data quality.

During the March meeting, most of the representatives said the largest issue they encounter is the quality of the data provided by the companies. Most national data repositories spend a substantial proportion of their non-fixed costs on quality control, including evaluation and data correction. The quality issue is so critical that in some cases, such as New Zealand, the national data repository will return the data to the acquiring company until those data are sufficiently improved. The most common quality control problem lies in the metadata, the data about the data. What is often lacking is sufficient information about where and at what depth each core was taken, or sufficient navigational information about seismic data. However, all of the national data repository representatives who attended the March meeting said that the ultimate responsibility for quality control rests not with the companies acquiring the data, but with the organizations running the repositories.
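
A completeness check of the kind described above might look like the following sketch. The required field names are illustrative assumptions, not a standard used by any particular repository, and the accept-or-return rule is only loosely modeled on the New Zealand practice mentioned in the text.

```python
# Required metadata per data type. These field names are illustrative
# assumptions, not a standard used by any particular repository.
REQUIRED_FIELDS = {
    "core": ("well_id", "latitude", "longitude", "top_depth_m", "bottom_depth_m"),
    "seismic": ("survey_id", "navigation_file", "coordinate_reference_system"),
}


def missing_metadata(data_type: str, metadata: dict) -> list:
    """Return the required fields that are absent or empty for this submission."""
    required = REQUIRED_FIELDS[data_type]
    return [name for name in required if metadata.get(name) in (None, "")]


def review_submission(data_type: str, metadata: dict) -> str:
    """Accept a complete submission; otherwise return it to the company."""
    missing = missing_metadata(data_type, metadata)
    if missing:
        return "returned: missing " + ", ".join(missing)
    return "accepted"


# A core submitted with a location and a top depth, but no bottom depth.
core = {"well_id": "W-1", "latitude": 60.5, "longitude": 2.8, "top_depth_m": 2100.0}
result = review_submission("core", core)
```

The core above would go back to the company because its bottom depth is missing. A real check would also validate the values themselves, for example that coordinates fall inside the licensed area, which is where much of the correction effort presumably goes.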

The United States is an exception, as it does not have a centralized data repository system. The data at risk are also different in the United States. Whereas other countries make preservation of well log and seismic data the priority, the data most at risk in the United States are drill cores. Most countries prioritize the active management of seismic and well log data, because these data are considered easier to manage, can be made digital, and are the critical first-order data for exploration. Most countries actively preserve cores and cuttings, but they consider these data the details supporting the use of well logs and seismic data. The American Geological Institute, with support from the Department of Energy, is working to establish a National Geoscience Data Repository System. The National Research Council also recently published a report calling for preservation of geoscience data (see the story on page 16). Both of these projects focus on preserving physical data, such as cores and cuttings. The application of the free market to data preservation in the United States has created large commercial markets for seismic and well log resales, but limited commercial interest in cores and cuttings. Just as in other countries, the cost of preserving the physical data is high compared to the near-term return on investment. However, while most other countries do recognize that cores and cuttings have substantial, long-term value, it is difficult for a free-market system to address such needs.

Digital data management is also an area of concern, for which two distinct perspectives exist. The largest data management issue is the retention of original or field data. Unanimous agreement exists that paper records, such as well logs, should be scanned and stored in digital databases and the paper files then disposed of, and every country except the United States does so. At the same time, particularly with seismic data, no consensus exists about retaining field data. A number of representatives said their countries only want to preserve the stacked seismic data that are ready for interpretation. Yet some representatives, such as those from Brazil, argued that unprocessed field data must be retained because their value will likely increase significantly with improvements in processing. However, storing field data is very expensive because they are produced in large volumes, they must be transcribed periodically to new media, and they are vulnerable to becoming worthless if their metadata are lost.

The third major issue is determining whether the perceived need for massive amounts of digital data to be online is real, or whether a more cost-effective “near-line” system, in which most digital data can be made available in hours or days, would suffice. Both DISKOS in Norway and CDA in the United Kingdom provide complete online access to seismic data and well logs, respectively. Although these systems are exemplary, those running most other national data repositories do not perceive that their customers have an immediate need for such rapid access, and their clients probably would not appreciate the costs they would incur to enable it. The eventual need to provide more instantaneously accessible online data is widely recognized, but multi-terabyte data centers require guaranteed funding, and funding has varied with the level of exploration activity in each country.

The final major issue all national data repositories face is ensuring the preservation and accessibility of the data. Key to preservation and accessibility are valid and extensive metadata, including ownership, location, methodologies and other pertinent parameters. Regrettably, poor metadata are a common problem throughout the world, particularly for data that were taken in areas eventually regarded as uneconomic for exploitation. Without sufficient metadata, quality control efforts often lead to the disposal of the data, and the fundamental loss of a potential asset.

A geoscience data repository is more than a dusty building full of rocks and paper. National data repositories are centers of active data management and distribution focused on supporting a country’s economic and environmental needs. The United States lacks such a centralized function and, given the independent natures of companies, states and federal agencies, it is difficult to envision one evolving. Many states house world-class state data repositories. But all the domestic data repositories should look beyond U.S. borders and understand not only data management best practices, but also the vision of a dynamic data archive and management approach seen throughout the world.

Keane is Director of Development, Communications, and Technology for the American Geological Institute. He works regularly on AGI’s National Geoscience Data Repository System project.

© 2019 American Geological Institute. All rights reserved. Any copying, redistribution or retransmission of any of the contents of this service without the express written consent of the American Geological Institute is expressly prohibited.