Extract from: Malafant, K.W.J. and Radke, S. (1995). The Terabyte problem in Environmental Databases. In: Recent Advances in Marine Science and Technology '94. Eds. Bellwood, O., Choat, H. and Saxena, N. ISBN 0 86443 540 1, Pp. 615-623.
The Terabyte Problem in Environmental Databases
The size and complexity of environmental issues are paralleled by the size and complexity of the databases used to comprehend them. Many environmental issues now demand the management and processing of huge volumes of data. Indeed, terabytes of data (10^12 bytes or one million megabytes) are now required to understand some problems effectively. Remotely sensed data in general, and satellite imagery and seismic data in particular, generate datasets of such sizes. This imposes demands on the way we compute, the way we communicate, the way we use storage management techniques to maintain and manage the store, and the way we deliver information to end users. Each of these issues needs to be addressed for the next generation of environmental databases to be used effectively for sustainable development.
Hierarchical Storage Management (HSM) systems manage and store files, extending the capacity of the file system by migrating files from fast, expensive devices to slower, more economical ones. They are based on a layered hierarchy of storage devices that trades device and media cost against performance.
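The migration behaviour described above can be illustrated with a small sketch. It is not the system described in the paper: the `Tier` and `Hierarchy` classes, the tier names, the capacities, and the least-recently-used migration rule are all assumptions chosen for illustration. It shows only the general idea that files move down to cheaper tiers when a faster tier fills, and are recalled to the top tier on access without the caller naming a device.

```python
from dataclasses import dataclass, field


@dataclass
class Tier:
    """One level of the hierarchy (hypothetical model, not a real HSM API)."""
    name: str
    capacity: int                                  # bytes this tier may hold
    files: dict = field(default_factory=dict)      # file name -> (size, last access tick)

    def used(self) -> int:
        return sum(size for size, _ in self.files.values())


class Hierarchy:
    """Tiers ordered from fastest and most expensive to slowest and cheapest."""

    def __init__(self, tiers):
        self.tiers = tiers
        self.clock = 0                             # logical time, advanced on each access

    def _rebalance(self):
        # Assumed policy: when a tier overflows, push its least recently
        # used files down one level. The last tier is never drained.
        for upper, lower in zip(self.tiers, self.tiers[1:]):
            while upper.used() > upper.capacity:
                victim = min(upper.files, key=lambda n: upper.files[n][1])
                lower.files[victim] = upper.files.pop(victim)

    def store(self, name, size):
        self.clock += 1
        self.tiers[0].files[name] = (size, self.clock)
        self._rebalance()

    def read(self, name):
        # Transparent recall: the caller never says which tier holds the file.
        self.clock += 1
        for tier in self.tiers:
            if name in tier.files:
                size, _ = tier.files.pop(name)
                self.tiers[0].files[name] = (size, self.clock)
                self._rebalance()
                return size
        raise FileNotFoundError(name)

    def locate(self, name):
        for tier in self.tiers:
            if name in tier.files:
                return tier.name
        return None


if __name__ == "__main__":
    hsm = Hierarchy([Tier("disk", 100), Tier("optical", 1000), Tier("tape", 10**6)])
    hsm.store("survey_a", 60)
    hsm.store("survey_b", 60)      # disk now overflows, so survey_a migrates down
    print(hsm.locate("survey_a"), hsm.locate("survey_b"))
    hsm.read("survey_a")           # recall to disk, which displaces survey_b
    print(hsm.locate("survey_a"), hsm.locate("survey_b"))
```

A production HSM would also have to address reliability, scalability and distribution, which this single-process sketch ignores.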
To manage storage effectively, especially for large, terabyte-plus data storage needs, a hierarchical storage management scheme must satisfy five important requirements. The scheme must be automatic, transparent, reliable, scalable and distributed (Malafant and Radke, 1995).
Hierarchical Storage Management provides a flexible approach to storage management that is not currently available to network and systems administrators. HSM treats both tape and optical subsystems as primary storage devices.
Petroleum exploration and development in Australia requires digital data to be lodged with the Australian Archives in Villawood, NSW. A storage system for 6 to 10 terabytes was needed to provide efficient storage and retrieval of data transcribed to high density media. Initially, the aim was to store around 1 terabyte of data in an easily accessible form to allow the companies and researchers rapid access to the data. Other constraints on the solution included:
The solution chosen to provide this innovative storage management was a hierarchical storage management system based on five levels of storage (Malafant and Radke, 1994):
The use of optical and tape technology to provide hierarchical storage solutions is set to continue. At least four developing technologies will sustain this trend:
One other technology that will be used to provide larger storage solutions will be CD-ROM (Dvorak, 1994). This will almost certainly be used for distribution of collateral or value-added products after analysis of our terabyte datasets. The emergence of CD-ROM jukeboxes, holding more than a dozen disks, allows the storage of gigabytes of such products.
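As a rough check on the scale involved, the arithmetic can be sketched as follows. The 650 MB capacity per disc is an assumption (a common nominal CD-ROM size that the text does not state), and the function names are illustrative only.

```python
TERABYTE = 10**12                 # bytes, decimal definition used in the text
MEGABYTE = 10**6
CD_ROM_BYTES = 650 * MEGABYTE     # assumption: nominal 650 MB per disc


def jukebox_capacity(discs, disc_bytes=CD_ROM_BYTES):
    """Total bytes held by a jukebox with the given number of discs."""
    return discs * disc_bytes


def discs_needed(total_bytes, disc_bytes=CD_ROM_BYTES):
    """Smallest number of discs that can hold total_bytes (integer ceiling)."""
    return -(-total_bytes // disc_bytes)


if __name__ == "__main__":
    print(TERABYTE // MEGABYTE)               # megabytes in one terabyte
    print(jukebox_capacity(12) / 10**9)       # gigabytes in a twelve-disc jukebox
    print(discs_needed(TERABYTE))             # discs needed to hold one terabyte
```

Under that assumption a dozen discs hold 7.8 gigabytes, whereas a full terabyte would need 1539 discs. This is consistent with CD-ROM being suited to distributing derived products rather than storing the primary datasets.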
Dvorak, J.C. (1994). Dvorak Predicts: An Insider's Look at the Computer Industry. McGraw-Hill, Berkeley, California, USA.
Malafant, K.W.J. and Radke, S. (1995). The Terabyte problem in Environmental Databases. In: Recent Advances in Marine Science and Technology '94. Eds. Bellwood, O., Choat, H. and Saxena, N. ISBN 0 86443 540 1, Pp. 615-623.
Copyright © 1998 by Kim Malafant. All rights reserved.