Array DBMS
An array database management system (array DBMS) provides database services specifically for arrays (also called raster data), that is, homogeneous collections of data items (often called pixels, voxels, etc.) sitting on a regular grid of one, two, or more dimensions. Arrays are often used to represent sensor, simulation, image, or statistics data. Such arrays tend to be Big Data, with single objects frequently ranging into terabyte and soon petabyte sizes; for example, today's Earth and space observation archives typically grow by terabytes a day. Array databases aim to offer flexible, scalable storage and retrieval for this category of information.
Just as standard database systems do for sets, Array DBMSs offer scalable, flexible storage and retrieval/manipulation of arrays of (conceptually) unlimited size. Because arrays never appear standalone in practice, such an array model is normally embedded into some overall data model, such as the relational model. Some systems implement arrays as an analogy to tables; others introduce arrays as an additional attribute type.
Managing arrays requires novel techniques, mainly because traditional database tuples and objects tend to fit well into a single database page (a unit of disk access on the server, typically 4 KB), while array objects can easily span several media. The prime task of the array storage manager is to give fast access to large arrays and sub-arrays. To this end, arrays are partitioned during insertion into so-called tiles or chunks of convenient size, which then act as units of access during query evaluation.
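The tiling idea can be sketched in a few lines. This is a minimal illustration assuming regular, axis-aligned tiles of a fixed shape; the function names `tile_grid` and `tiles_for_box` are hypothetical and not taken from any particular Array DBMS. It shows how a sub-array request is mapped to the small set of tiles that need to be read.

```python
from itertools import product

def tile_grid(shape, tile_shape):
    """Number of tiles along each axis (ceiling division)."""
    return tuple(-(-n // t) for n, t in zip(shape, tile_shape))

def tiles_for_box(lo, hi, tile_shape):
    """Coordinates of all tiles touched by the half-open box [lo, hi)."""
    ranges = [
        range(l // t, (h - 1) // t + 1)
        for l, h, t in zip(lo, hi, tile_shape)
    ]
    return list(product(*ranges))

# Hypothetical 1000 x 1000 array stored as 256 x 256 tiles.
shape = (1000, 1000)
tile_shape = (256, 256)

print(tile_grid(shape, tile_shape))                       # (4, 4)
print(tiles_for_box((200, 300), (300, 520), tile_shape))  # [(0, 1), (0, 2), (1, 1), (1, 2)]
```

In this sketch, a query on a 100 x 220 cell region touches only 4 of the 16 tiles, which is the sense in which tiles act as units of access.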
Array DBMSs offer query languages that give declarative access to such arrays, allowing users to create, manipulate, search, and delete them. As with SQL, for example, expressions of arbitrary complexity can be built on top of a set of core array operations. Because of the extensions made to the data and query model, Array DBMSs are sometimes subsumed under the NoSQL category, in the sense of "not only SQL". Query optimization and parallelization are important for achieving scalability; in fact, many array operators lend themselves well to parallel evaluation, with each tile processed on a separate node or core.
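Tile-wise parallel evaluation can be illustrated with a simple aggregate. The sketch below assumes tiles are held as nested Python lists and uses worker threads as a stand-in for separate nodes or cores; `tile_partial` and `parallel_stats` are hypothetical names. Each tile yields a partial result, and the partials are then combined into the final answer.

```python
from concurrent.futures import ThreadPoolExecutor

def tile_partial(tile):
    """Partial aggregate for one tile: (sum, cell count, max)."""
    cells = [v for row in tile for v in row]
    return sum(cells), len(cells), max(cells)

def parallel_stats(tiles, workers=4):
    """Evaluate each tile independently, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(tile_partial, tiles))
    total = sum(p[0] for p in partials)
    count = sum(p[1] for p in partials)
    return total / count, max(p[2] for p in partials)

# Three hypothetical 2 x 2 tiles of one array.
tiles = [
    [[1, 2], [3, 4]],
    [[5, 6], [7, 8]],
    [[9, 10], [11, 12]],
]

mean, peak = parallel_stats(tiles)
print(mean, peak)  # 6.5 12
```

Operators that decompose this way (sums, counts, minima, maxima, cell-wise arithmetic) need no communication between tiles until the final merge step.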
Important application domains of Array DBMSs include the Earth, space, life, and social sciences, as well as related commercial applications (such as hydrocarbon exploration in industry and OLAP in business). The variety can be seen, for example, in geo data, which includes 1-D environmental sensor time series, 2-D satellite images, 3-D x/y/t image time series and x/y/z geophysics data, and 4-D x/y/z/t climate and ocean data.
The relational data model, which prevails today, does not directly support the array paradigm to the same extent as sets and tuples. ISO SQL lists an array-valued attribute type, but this is only one-dimensional, has almost no operational support, and is not usable for the application domains of Array DBMSs. Another option is to resort to BLOBs ("binary large objects"), which are the equivalent of files: byte strings of (conceptually) unlimited length, but again without any query language functionality such as multi-dimensional subsetting.
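To make the BLOB limitation concrete, the following sketch shows the bookkeeping an application has to do itself when a 2-D array is stored as a plain byte string. It assumes a row-major layout of unsigned 16-bit little-endian cells; the layout, the cell type, and the function name `blob_subset` are all assumptions chosen for illustration.

```python
import struct

def blob_subset(blob, width, row_lo, row_hi, col_lo, col_hi, fmt="<H"):
    """Cut rows [row_lo, row_hi) and columns [col_lo, col_hi) out of a
    row-major BLOB by computing byte offsets by hand."""
    size = struct.calcsize(fmt)
    out = []
    for r in range(row_lo, row_hi):
        start = (r * width + col_lo) * size
        end = (r * width + col_hi) * size
        out.append([v[0] for v in struct.iter_unpack(fmt, blob[start:end])])
    return out

# Hypothetical 3 x 4 array with cell values 0..11, packed into bytes.
width = 4
blob = struct.pack("<12H", *range(12))

print(blob_subset(blob, width, 1, 3, 1, 3))  # [[5, 6], [9, 10]]
```

The database sees only opaque bytes here: the array's width, cell type, and byte order live in application code, which is exactly the knowledge an array query language would otherwise manage.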
The first significant work going beyond BLOBs was PICDMS. This system offered a precursor of a 2-D array query language, albeit still procedural and without suitable storage support.
A first declarative query language suitable for multiple dimensions and with algebra-based semantics was published by Baumann, together with a scalable architecture. Another array database language, constrained to 2-D, was presented by Marathe and Salem. Seminal theoretical work was done by Libkin et al.; their model, called NCRA, extends a nested relational calculus with multidimensional arrays, and its results include important contributions to array query complexity analysis. A map algebra suitable for 2-D and 3-D spatial raster data was published by Mennis et al.
