Development and production use of efficient space-time indexing

Kumar, Amit Ranjan

Geospatial database solutions have achieved considerable efficiencies by doing away with coordinate systems and implementing custom spatial datatypes. The same efficiencies can be extended to the time dimension by adding a scalar for time to the n-dimensional spatial model.
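As a rough illustration of the idea of adding a time scalar to an n-dimensional spatial model, the following Python sketch treats time as one more coordinate. It is not the datatype described in this abstract: the names SpaceTimePoint, as_vector and in_box are hypothetical, and representing time as a float is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SpaceTimePoint:
    """Hypothetical space-time value: n spatial coordinates plus one time scalar."""
    coords: Tuple[float, ...]  # spatial position, e.g. (x, y) or (x, y, z)
    t: float                   # time as a scalar (assumed: seconds since some epoch)

    def as_vector(self) -> Tuple[float, ...]:
        # Time is appended as one more dimension of the spatial vector.
        return self.coords + (self.t,)


def in_box(p: SpaceTimePoint, lower: SpaceTimePoint, upper: SpaceTimePoint) -> bool:
    """Return True if p lies inside the closed (n+1)-dimensional box [lower, upper]."""
    return all(
        lo <= v <= hi
        for lo, v, hi in zip(lower.as_vector(), p.as_vector(), upper.as_vector())
    )


# Usage: a 2-D position with a timestamp, tested against a space-time box.
p = SpaceTimePoint((1.0, 2.0), 100.0)
lower = SpaceTimePoint((0.0, 0.0), 50.0)
upper = SpaceTimePoint((5.0, 5.0), 150.0)
inside = in_box(p, lower, upper)
```

Because space and time share one vector here, a single range test covers both a spatial window and a time window, which is the property an (n+1)-dimensional index would exploit.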

In recording meteorological data we are still at the mercy of bandwidth and storage capacity. The gap between the data currently available and what can be stored for processing, while shrinking, remains massive. To help close this gap, the space-time datatype aims to raise the maximum practical number of database records by allowing more efficient indexing and thus shorter query and processing times.

With this approach, a space-time datatype and its associated methods were implemented in PL/Python. This architecture outperformed the following conventional architectures: MySQL Spatial, PostgreSQL with PostGIS, Oracle Spatial, Microsoft SQL Server, and MongoDB with spatial extensions.

Experiments were run against a database of events, each containing space-time information and three variables of associated measurement data. The database held 100 million records during testing. Average query times were recorded and compiled for two cases: retrieving a single record, and retrieving all records within a time range.
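The two query types measured above can be sketched in plain Python against a small in-memory stand-in for the event table. This is only an assumed illustration of the measurement procedure, not the benchmark used in the experiments: the synthetic record layout, the sorted-list index, and the helper names build_events, single_record, time_range and average_seconds are all hypothetical.

```python
import bisect
import time
from statistics import mean


def build_events(n):
    # Synthetic events sorted by time: (t, x, y, v1, v2, v3),
    # i.e. space-time information plus three measurement variables.
    return [(float(i), i % 360 - 180.0, i % 180 - 90.0, 0.0, 0.0, 0.0) for i in range(n)]


def single_record(events, times, t):
    # Look up the one event with timestamp t, or None if absent.
    i = bisect.bisect_left(times, t)
    return events[i] if i < len(times) and times[i] == t else None


def time_range(events, times, t0, t1):
    # Return all events with t0 <= t <= t1 using the sorted time index.
    lo = bisect.bisect_left(times, t0)
    hi = bisect.bisect_right(times, t1)
    return events[lo:hi]


def average_seconds(query, repeats=100):
    # Run the query several times and average the wall-clock duration.
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        query()
        samples.append(time.perf_counter() - start)
    return mean(samples)


events = build_events(1000)
times = [e[0] for e in events]
avg_single = average_seconds(lambda: single_record(events, times, 10.0))
avg_range = average_seconds(lambda: time_range(events, times, 10.0, 19.0))
```

Averaging over repeated runs smooths out timer noise; a real comparison across database systems would also need to control for caching and connection overhead, which this sketch ignores.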

The improvement of the new architecture over the alternatives varied widely, but on average it was 20% faster, with the exception of MongoDB with spatial extensions. MongoDB was faster, by 10% on average, when retrieving single records, but slower, by 30% on average, when retrieving all relevant records within a time range.
