GEOINFORMATICS: A DEFINING OPPORTUNITY FOR EARTH SCIENCE RESEARCH
Written by the Interim Steering Committee on Geoinformatics of the American Geophysical Union, chaired by Krishna Sinha, Virginia Tech
This document was converted to Web format and edited slightly by Ramon Arrowsmith, Arizona State University
August 4, 2000
The future research directions and opportunities in earth sciences will be significantly affected by both the availability and the utilization of information technology. Researchers in earth sciences are deeply involved in discovering the relationships between the observed geologic record and the complex processes that have shaped it, and recognize the uniqueness of the geologic evolution of the Earth within the solar system. The Earth has a complex record of the dynamic interaction of plates and earth materials that provides clues to the physical and chemical evolution of continents and oceans. The rock record, which preserves nearly 4.5 billion years of history, has been meticulously gathered through observations over the centuries. As the complexities of processes are only now being recognized through the application of new technologies, it is evident that new knowledge can be gleaned only if the multidisciplinary data can be evaluated numerically and geospatially through the utilization of information technology.

The Need for an Earth Science Data System

Ever-growing understanding and acceptance that the Earth functions as a complex system composed of myriad interrelated mechanisms have made Earth scientists realize that existing information systems and techniques are often inadequate. Currently, the unmanaged distribution of available data sets, a lack of documentation about them, and the lack of easy-to-use access tools and computer codes are major obstacles for scientists and educators alike. These obstacles have hindered scientists and educators in the access and full use of available data and information, and hence have limited scientific productivity and the quality of education. Recent technological advances, however, provide practical means to overcome such problems. Advances in computer design, software, and disk storage systems, as well as the growth of the World Wide Web (WWW), now permit, for the first time, the management of gigabytes to terabytes of data and the uniform distribution of information to scientists, educators, students, and the general public.

Earth Science is a discipline that is strongly data driven, and large data sets are often developed by researchers and government agencies. The complexity of the fundamental scientific questions being addressed requires integrative and innovative approaches employing these data sets if we are to find understanding. Although a number of databases exist, none of them is truly complete, as error free as is practical, easily accessible, and simple to use. The ultimate goal of the Earth Science community is a fully integrated data system populated with high quality, freely available data, as well as a robust set of software to analyze and interpret the data. This system would feature rich and deep databases and convenient access. These capabilities are needed to attack a variety of basic and applied Earth Science problems.

The development of the capability to construct, organize, and verify an Earth Science data system is a natural, and indeed essential, step for the Earth Sciences to move forward so that we can understand the Earth as a system as well as meet societal needs. Most Earth Science problems are inherently 4-D (x, y, z, t) in nature, involving the subsurface and variation with time. Thus, their solution requires data analysis that is far more complex than that provided by traditional geographic information systems (GIS). The extent, complexity, and sometimes primitive form of existing data sets and databases, as well as the need to optimize the collection of new data, dictate that only a large, cooperative, well-coordinated, and sustained effort will allow the community to attain its scientific goals. With a strong emphasis on ease of access and use, the resulting data system would be a very powerful scientific tool to reveal new relationships in space and time and would be an important resource for students, teachers, the public at large, governmental agencies, and industry.
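As a rough illustration of what 4-D (x, y, z, t) data handling involves beyond a 2-D map layer, the minimal Python sketch below stores observations with position, depth, and age, and selects those that fall inside a space-time window. The `Observation` fields, the units, the `query_4d` helper, and the sample values are all hypothetical choices made for this example; they are not part of any design proposed by the ISC.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

Range = Tuple[float, float]  # inclusive (min, max) bounds


@dataclass(frozen=True)
class Observation:
    """One spatially and temporally referenced measurement (hypothetical schema)."""
    lon: float       # x: longitude, decimal degrees
    lat: float       # y: latitude, decimal degrees
    depth_km: float  # z: depth below the surface, km
    age_ma: float    # t: age, millions of years before present
    value: float     # measured quantity
    kind: str        # label for the type of measurement


def _inside(v: float, bounds: Range) -> bool:
    lo, hi = bounds
    return lo <= v <= hi


def query_4d(obs: Iterable[Observation], lon: Range, lat: Range,
             depth_km: Range, age_ma: Range) -> List[Observation]:
    """Return the observations inside a box in x, y, z, and t."""
    return [
        o for o in obs
        if _inside(o.lon, lon) and _inside(o.lat, lat)
        and _inside(o.depth_km, depth_km) and _inside(o.age_ma, age_ma)
    ]


# Made-up values, for illustration only.
observations = [
    Observation(-80.4, 37.2, 5.0, 300.0, 62.0, "heat_flow"),
    Observation(-80.1, 37.0, 40.0, 300.0, 55.0, "heat_flow"),   # too deep
    Observation(-111.9, 33.4, 2.0, 15.0, 80.0, "heat_flow"),    # outside region
]

hits = query_4d(observations, lon=(-81.0, -79.0), lat=(36.0, 38.0),
                depth_km=(0.0, 10.0), age_ma=(250.0, 350.0))
print(len(hits))
```

A linear scan like this is only practical for small collections; a real system would presumably need spatial and temporal indexing, which is beyond the scope of this sketch.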

Fundamental new discoveries will require the availability of databases that encompass a variety of temporal and spatial scales. Because of the need to integrate heterogeneous data sets and the tools to analyze them, the Geoinformatics program provides the focus for community participation in a national experiment to enhance and retain the pre-eminent role of the United States in Earth Sciences research worldwide. It will also be the catalyst for the creation of a global database.

The Interim Steering Committee (ISC) has both identified the procedural details for community participation and recommended the most exciting research frontiers for the near future that require construction and utilization of databases. However, the most important Earth Science problems to be attacked using this data system and software are probably not yet known, because the creative energies of people getting together to explore relationships among the data and test ideas will lead to unanticipated insights. The two recommendations are described separately, and the benefits to the entire Earth Sciences community are presented in the summary section.

Creation of a National Consortium of Academic Institutions

Although it has never been tried before, having all information and knowledge, along with access, modeling, and visualization tools, at the fingertips of a user has great potential for advancing science, accelerating the discovery process, and enhancing the quality of Earth Science education. One of our goals is to bring this power to all scientists and interested parties by forming a virtual center consisting of a number of nodes that develop and maintain elements of the data system. Broad input and participation from the Earth Science community are sought, and the ultimate goal would be to form a consortium modeled after IRIS and UNAVCO. The membership would consist of all interested academic organizations in the U.S. and could eventually exceed 100. Each member institution would appoint a representative to the governing body, which would in turn populate a series of committees to address key issues such as standards, data management, software arrangements, publication strategies, personnel, and system architecture. Only a small staff could be hired initially.

Initial Organization Structure






The ISC recommends the establishment of a consortium of academic institutions through

1. invitations, sent by mail, to all Earth Sciences institutions to participate,

2. announcements in national journals and news magazines,

3. notification of all Earth Science societies.

In order to take the first step in this process, an initial group would be formed and would propose to design and develop selected nodes and the core of the first comprehensive Earth information system for research and education, covering scales from global to local and spatial to temporal. This system will ultimately contain not only multidisciplinary data sets, but also data manipulation, analysis, visualization, and plotting tools and modeling codes to exploit the digital data, all accessible on-line in real time via the World Wide Web. It will be built to handle not only 3D spatial but also temporal changes. There are countless data sets that could be developed into nodes on this system, but the anticipated funding levels and prudence dictate that the initial implementation be modest. The details of this plan will be discussed at a workshop scheduled for the fall of 2000, but the initial implementation will be limited to about six nodes and a central node that provides coordination, technical support, and facilities for needs such as backups. The nodes could be based on type of data, topic, or region.
 

Frontiers of Earth Science Research and Geoinformatics

The ISC, at its second workshop (May 22, 2000), identified three major research categories that are likely to bring opportunities for new discoveries in the immediate future through the creation of multidisciplinary, geospatially referenced databases.
 



Two centuries of observational and analytical data collection and analysis are available to construct databases. As it is unlikely that all the data can be verified and digitally cataloged, the ISC recommends the creation of databases utilizing a progressive growth model based on near-term research needs. A representation of our vision, which provides for full community participation and identifies data sources and the expert working groups responsible for formulating quality control methods and creating attributes for all disciplinary data, is shown below. The structure of the database will be constructed by experts in Earth Sciences who have significant expertise in both GIS and database management techniques. Additional help will be requested as needed from the computer science community.
 
 

Creating the Earth Science database

A. Expert working groups

Expert working groups represent expertise in research categories as defined by the programs within the Division of Earth Sciences (EAR) of the National Science Foundation. Responsibilities of the expert working groups include

(1) defining criteria for quality control within subdisciplines,

(2) locating databases available in the various subdisciplines,

(3) cataloging available software for data reduction or modeling,

(4) providing the attributes of data to be entered into the databases,

(5) promoting the utilization of geospatial data.
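To make responsibilities (1) and (4) more concrete, the following minimal Python sketch shows one way a working group's attribute list and quality control criteria could be expressed as an automated check on a record before it enters a database. The attribute names in `REQUIRED_ATTRIBUTES`, the `check_record` function, and the sample records are assumptions invented for this illustration; the actual attributes and criteria would be defined by the expert working groups.

```python
from typing import Dict, List

# Hypothetical attribute list; a real one would come from an expert working group.
REQUIRED_ATTRIBUTES = ("sample_id", "latitude", "longitude", "method", "source")


def check_record(record: Dict[str, object]) -> List[str]:
    """Return a list of quality control problems; an empty list means the record passes."""
    problems = []

    # Every required attribute must be present and non-empty.
    for name in REQUIRED_ATTRIBUTES:
        if record.get(name) in (None, ""):
            problems.append(f"missing attribute: {name}")

    # Coordinates, when given as numbers, must be geographically valid.
    lat = record.get("latitude")
    if isinstance(lat, (int, float)) and not -90 <= lat <= 90:
        problems.append("latitude out of range")
    lon = record.get("longitude")
    if isinstance(lon, (int, float)) and not -180 <= lon <= 180:
        problems.append("longitude out of range")

    return problems


# Made-up records, for illustration only.
good = {"sample_id": "S-001", "latitude": 37.2, "longitude": -80.4,
        "method": "XRF", "source": "published"}
bad = {"sample_id": "S-002", "latitude": 137.2, "longitude": -80.4, "method": ""}

print(check_record(good))
print(check_record(bad))
```

Returning a list of problems, rather than a simple pass or fail, lets a contributor see everything that needs correcting in one pass.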

B. Data Sources and considerations

1. Published data will form the main component of the databases.

2. All unpublished (non-proprietary) data that meet standards of quality (i.e., would be published if submitted to a national journal) as defined by the expert working groups.

3. Data available from other agencies and programs.

C. Creating the database

Competitive proposals funded by NSF will provide the initial stages for construction of databases. The ISC recommends funding multiple requests in as many disciplines as possible to create the nascent interdisciplinary database. As oversight and management of the growing database are required, the ISC recommends a progressive growth model, whereby the senior principal investigator will be responsible for nodal data management until a more centralized clearinghouse is established. The individual PI will then turn data over to the clearinghouse facility for permanent storage and distribution to the entire community.

Well-crafted initial projects are critical to the success of the data system and ultimately to the formation of the consortium. A fundamental objective of the initiative must be the implementation of a visible change within the community by adding a geospatial component to the geologic culture. This initiative must be perceived as a significant contribution to the community at large. If the proposed data and information system is not regarded as an exciting and useful tool, members of the community will not expend the resources (money and time) required to access and ultimately contribute to the data system. The initiative requires several exciting, well-integrated, and easily accessible examples of data system construction to establish the infrastructure as an indispensable community utility. To achieve this goal, the initial projects must address fundamental earth processes and make possible significant contributions to scientific understanding. It is not necessary to collect new data for this to be successful; rather, the emphasis should be on mining existing data resources for the development and integration of data sets in a spatially and temporally referenced framework. In the development of initial projects, this initiative must be sensitive to existing data infrastructure (IRIS, UNAVCO, NASA, USGS, NOAA/NGDC) and the anticipated needs of EarthScope.

SOME SUGGESTED INITIAL RESEARCH DATABASES AND STRUCTURAL ASPECTS FOR GEOINFORMATICS INITIATIVE

D. Available and needed Toolbox of software

The various expert working groups will be responsible for identifying all available software (academic and commercial) for data reduction, manipulation and modeling. The expert working groups will also be responsible for recommending the development of new software to enhance the utilization of the databases.

This requires development and maintenance of a well-designed front end for a variety of programs needed to extract, interface, and model data available from the data system (e.g., the GPS community has a good model to review: scripts and a data structure for access to raw information; UNAVCO working groups to make velocities available to the non-GPS community). The development of a toolbox is a vital consideration. Using existing data sets to construct and verify databases is a major task and, without the needed software, virtually impossible. Thus these tools are an absolute necessity for the success of the data system. In an environment characterized by access to rapidly evolving data sets developed to address specific problems (curiosity-driven research), the modification of data sets, the addition of information, and the reorientation of their structure to address a new motive for data set development will require an evolving system of software applications.

E. Dynamic models

The creation of numerical models with graphic and visualization capabilities will be significant for the growth of the Geoinformatics program. The ISC recognizes the increased opportunities for fundamental breakthroughs in EAR research if the right software is available to integrate and analyze multidisciplinary data with physical process-based tools.

F. Linkages to available databases

Many federal and state agencies, as well as academic institutions and industry, have national, regional, or thematic databases. Fusion of these databases with those created through the Geoinformatics program will require development of new software and new protocols, as well as interagency agreements.

Summary and Long Term Vision

Our approach is central to the vitality and longevity of the data system. Simplicity and flexibility are crucial in developing a system that can respond to changing technologies and user needs. At the early stages of development, regional data system nodes will be crucial to gathering and maintaining regional contributions, and as an interface with the local community. The evolving data system would require the establishment of an interim facility where fundamental data sets are housed (e.g., available tools and programs) and linked via broad bandwidth Internet connections needed to handle data access and transfer. The data system must be flexible and have minimal infrastructure requirements (PC versus UNIX based systems; various data management protocols; peripheral hardware requirements) and a minimum of mandated data structure requirements. Metadata are needed and development efforts must be fully supported. For the data system to be successful, there must be an incentive for users to contribute data to the community system. NSF and the community could develop a system of rewards via some formal system of citation of produced data sets. A mechanism for publication of data sets (with or without interpretation) may interface with emerging digital publication systems and it is conceivable that the data system initiative may be able to enlist different societies (AGU, GSA, AAPG, etc.) to support electronic publication of data sets, depending upon the contents.

In the long term, the goal of this effort is to build the initial organization into a consortium overseeing a comprehensive, effective national program in support of Geoinformatics. This effort will take a number of years to mature, and will require considerable thought and deliberation. The funding required is substantial and will probably require interagency cooperation.

The benefits of this program would include the continued scientific leadership of the United States in Earth Sciences, as well as the opportunity to construct a global database that would uniquely characterize our planet.