Bits to Data:


As outlined in the figure below, one can think of the "stuff" that comes out of sensors as "bits" (or bytes or words) that have to be scaled and calibrated before they can be viewed in commonly used units (such as temperature in degrees or phase angle in radians). Once these "corrections" have been made, the resulting numbers can be evaluated for "reasonableness" (for want of a better term). The output of this validation step can be thought of as data.
 

Block diagram of a typical data system
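The chain from bits to data can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a description of any particular shipboard system: it assumes a hypothetical 16-bit sensor whose counts scale linearly to volts, a made-up calibration polynomial to degrees Celsius, and an assumed "reasonable" range for sea-water temperature.

```python
from dataclasses import dataclass

# Hypothetical sensor constants (illustrative only, not a real instrument).
FULL_SCALE_COUNTS = 65535         # 16-bit unsigned converter
FULL_SCALE_VOLTS = 5.0            # converter input range 0..5 V
CAL_COEFFS = (-5.0, 10.0)         # degC = c0 + c1*V, lowest order first
VALID_RANGE_DEGC = (-2.0, 35.0)   # assumed "reasonable" sea-water temperatures


@dataclass
class Sample:
    raw: int      # bits as they came off the sensor
    value: float  # calibrated engineering value (degC)
    valid: bool   # result of the reasonableness check


def scale(raw: int) -> float:
    """Convert raw converter counts to volts (linear scaling)."""
    return raw * FULL_SCALE_VOLTS / FULL_SCALE_COUNTS


def calibrate(volts: float, coeffs=CAL_COEFFS) -> float:
    """Apply a calibration polynomial, evaluated with Horner's rule."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * volts + c
    return result


def bits_to_data(raw: int) -> Sample:
    """Scale, calibrate, and validate one raw reading."""
    value = calibrate(scale(raw))
    low, high = VALID_RANGE_DEGC
    return Sample(raw=raw, value=value, valid=low <= value <= high)
```

Keeping the raw count next to the calibrated value in each sample means a later change in coefficients can be applied without going back to the sensor.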

Once there is "data", it is appropriate to add it to a "database" of some sort. Historically, this has most likely been a flat file, although lately some have come to think of geographic information systems (GIS) as a kind of database.

I've (slowly) come to believe that it's most effective for the metadata to reside in a relational database that can be searched effectively and efficiently using standard query languages (perhaps best embedded in a suitable client). Such queries return pointers (persistent resource locators) from which the underlying ("real") data can be obtained.
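As a sketch of that arrangement, the Python below keeps only metadata in a relational table (here an in-memory SQLite database, chosen for illustration) and answers a query with locators for the underlying files rather than with the data itself. The table layout, column names, cruise identifier, and URLs are all hypothetical.

```python
import sqlite3


def build_catalog(conn: sqlite3.Connection) -> None:
    # One row per data file: descriptive metadata plus a locator for the file.
    # Times are ISO 8601 strings in one fixed format, so text order equals time order.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS dataset (
               cruise      TEXT NOT NULL,
               instrument  TEXT NOT NULL,
               start_utc   TEXT NOT NULL,
               end_utc     TEXT NOT NULL,
               locator     TEXT NOT NULL UNIQUE
           )"""
    )


def find_locators(conn, instrument, start_utc, end_utc):
    """Return locators of one instrument's files that overlap [start_utc, end_utc]."""
    rows = conn.execute(
        """SELECT locator FROM dataset
           WHERE instrument = ? AND start_utc <= ? AND end_utc >= ?
           ORDER BY start_utc""",
        (instrument, end_utc, start_utc),
    )
    return [locator for (locator,) in rows]


# Hypothetical catalog contents.
conn = sqlite3.connect(":memory:")
build_catalog(conn)
conn.executemany(
    "INSERT INTO dataset VALUES (?, ?, ?, ?, ?)",
    [
        ("XX9901", "TSG", "1999-03-01T00:00:00Z", "1999-03-02T00:00:00Z",
         "https://data.example.org/XX9901/tsg/day060.raw"),
        ("XX9901", "TSG", "1999-03-02T00:00:00Z", "1999-03-03T00:00:00Z",
         "https://data.example.org/XX9901/tsg/day061.raw"),
        ("XX9901", "IMET", "1999-03-01T00:00:00Z", "1999-03-03T00:00:00Z",
         "https://data.example.org/XX9901/imet/leg1.raw"),
    ],
)
```

A call such as `find_locators(conn, "TSG", "1999-03-01T12:00:00Z", "1999-03-01T18:00:00Z")` returns only the day-060 locator; the client then fetches the file itself.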

Users working with the data may raise further questions about its validity. There should be a clear and responsive path for feeding this information back to the maintainers so that the data quality can be improved.

It is crucial that the raw data (bits) be archived with accurate time stamps, so that it is possible to recover from any of a wide variety of errors, such as those caused by firmware upgrades to the sensors or by changes in calibration coefficients.
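The payoff of time-stamped raw archives is that the record can be regenerated when, say, a calibration turns out to have been wrong. The Python sketch below assumes a hypothetical calibration history in which each linear (offset, gain) pair takes effect at a given UTC time; each archived raw count is recalibrated with the coefficients that were in force when it was recorded.

```python
from bisect import bisect_right
from datetime import datetime, timezone

# Hypothetical calibration history: (effective time, offset, gain per count),
# sorted by effective time. Values are illustrative only.
CAL_HISTORY = [
    (datetime(1999, 3, 1, tzinfo=timezone.utc), -5.0, 0.001),
    (datetime(1999, 3, 15, tzinfo=timezone.utc), -5.1, 0.00101),  # after re-calibration
]
EFFECTIVE_TIMES = [t for t, _, _ in CAL_HISTORY]


def coeffs_at(when: datetime):
    """Return the (offset, gain) pair in force at time `when`."""
    i = bisect_right(EFFECTIVE_TIMES, when) - 1
    if i < 0:
        raise ValueError(f"no calibration in effect at {when.isoformat()}")
    _, offset, gain = CAL_HISTORY[i]
    return offset, gain


def reprocess(raw_archive):
    """Recompute engineering values from (timestamp, raw count) pairs."""
    result = []
    for when, counts in raw_archive:
        offset, gain = coeffs_at(when)
        result.append((when, offset + gain * counts))
    return result
```

Because the raw archive itself is never modified, correcting one entry in the calibration history is enough to regenerate every affected value.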

The database must be backed up in order to recover from various kinds of disasters (such as fire, flood, or failed updates).
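If the metadata catalog were kept in SQLite (an assumption for illustration only), Python's standard `sqlite3` module could take a consistent copy of a live database with `Connection.backup`. The function name and the file-naming scheme below are hypothetical, and a copy on the same disk does not protect against fire or flood; the resulting file still has to be moved off site.

```python
import sqlite3
from datetime import datetime, timezone
from pathlib import Path


def backup_database(src: sqlite3.Connection, backup_dir: Path) -> Path:
    """Write a copy of `src` to a UTC-timestamped file in `backup_dir`."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    target_path = backup_dir / f"catalog-{stamp}.sqlite"
    target = sqlite3.connect(target_path)
    try:
        # Uses SQLite's online backup API; copies all pages in one step.
        src.backup(target)
    finally:
        target.close()
    return target_path
```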

It is important that questions about data validity be fed back to the data maintainers by some path other than direct user manipulation of the database.
 

Data Exchange:

It is also important that the content of the "database" be transferable to others. To this end, a discussion of data exchange formats is under way under the auspices of the Research Vessel Technical Enhancement Committee (RVTEC), which is part of the University-National Oceanographic Laboratory System (UNOLS).

The following diagram may help illuminate the discussion:

Block diagram of creating an exchange file and extracting data from it.

For discussion purposes, let's assume that we are not going to have a substantial impact on the behavior of the existing shipboard data systems. However, we can elect to define the exchange file format to include required as well as optional "fields". For instance, we could require that the file contain the raw (as generated) binary data and the calibration coefficients, and allow optional fields for scaled (engineering) units.
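One way to make the required/optional distinction concrete is a checker that an extraction tool could run before trusting a file. The Python sketch below models an exchange file as a plain dictionary rather than real NetCDF, and the field names (`raw_counts`, `calibration_coefficients`, `scaled_values`, and so on) are hypothetical, not part of any agreed format.

```python
# Hypothetical field sets for an exchange record.
REQUIRED_FIELDS = {"time", "raw_counts", "calibration_coefficients"}
OPTIONAL_FIELDS = {"scaled_values", "units"}


def check_exchange_record(record: dict) -> list[str]:
    """Return a list of problems; an empty list means the record is acceptable."""
    problems = []
    keys = set(record)
    missing = REQUIRED_FIELDS - keys
    if missing:
        problems.append("missing required field(s): " + ", ".join(sorted(missing)))
    unknown = keys - REQUIRED_FIELDS - OPTIONAL_FIELDS
    if unknown:
        problems.append("unrecognized field(s): " + ", ".join(sorted(unknown)))
    # Per-sample fields must line up one-to-one with the raw data.
    n = len(record.get("raw_counts", []))
    for name in ("time", "scaled_values"):
        if name in record and len(record[name]) != n:
            problems.append(f"{name} has {len(record[name])} entries, raw_counts has {n}")
    return problems
```

Returning a list of problems, rather than stopping at the first one, lets a data maintainer see everything wrong with a file in a single pass.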

With regard to the RVTEC effort, it appears that we have agreed that:
  1. the exchange file will be in NetCDF format,
  2. the exchange file will contain appropriate metadata,
  3. one (or more) extraction tool(s) will be created, and
  4. the initial effort will be based on widely used physical/chemical oceanographic data (IMET meteorological or TSG thermosalinograph data).
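To illustrate what an extraction tool might do, the Python sketch below turns an exchange file into CSV text in engineering units, using the optional scaled values when present and otherwise applying the stored calibration polynomial to the raw counts. A real tool would read NetCDF (for example with the netCDF4 Python package); here the file is stood in for by a dictionary of global attributes and variables, and every name in it is hypothetical.

```python
import csv
import io


def engineering_values(variables: dict) -> list[float]:
    """Prefer the optional scaled field; otherwise calibrate the raw counts."""
    if "scaled_values" in variables:
        return list(variables["scaled_values"])
    coeffs = variables["calibration_coefficients"]  # lowest order first
    return [sum(c * raw ** k for k, c in enumerate(coeffs))
            for raw in variables["raw_counts"]]


def extract_to_csv(exchange: dict) -> str:
    """Render an exchange 'file' as CSV, with its metadata as leading comment lines."""
    out = io.StringIO()
    for key, value in sorted(exchange["attributes"].items()):
        out.write(f"# {key}: {value}\n")
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["time", "value"])
    variables = exchange["variables"]
    for when, value in zip(variables["time"], engineering_values(variables)):
        writer.writerow([when, f"{value:.3f}"])
    return out.getvalue()
```

Carrying the global metadata into the output as comment lines is one way to keep provenance attached to the extracted values.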
There is an open list server for the RVTEC discussion. You can subscribe by sending an email message to "majordomo@ldeo.columbia.edu". In the body of the message include the following line: "subscribe rvtecdata-d".
 



Created:
Posted:     September 9, 1999
Updated:  October 30, 1999 (content)

Updated:  December 22, 2004 (spelling)


By:  Dale Chayes, LDEO/CU, (c) LDEO/Columbia University

Questions, comments etc. to:  Dale Chayes