Data orientation

Data orientation is the representation of tabular data in a linear memory model, whether on disk or in memory. The two most common representations are column-oriented (columnar format) and row-oriented (row format).

The choice of data orientation is a trade-off and an architectural decision in databases, query engines, and numerical simulations. As a result of these trade-offs, row-oriented formats are more commonly used in online transaction processing (OLTP), and column-oriented formats are more commonly used in online analytical processing (OLAP).

Examples of column-oriented formats include Apache ORC, Apache Parquet, Apache Arrow, and the formats used by BigQuery, Amazon Redshift and Snowflake. Predominant examples of row-oriented formats include CSV, the formats used in most relational databases (such as Oracle and MySQL), the in-memory format of Apache Spark, and Apache Avro.

Tabular data is two-dimensional: it is modeled as rows and columns. However, computer systems represent data in a linear memory model, both on disk and in memory. Storing a table therefore requires mapping its two-dimensional structure onto a one-dimensional space. Data orientation is the decision taken in this mapping. There are two prominent mappings: row-oriented and column-oriented.
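The two mappings can be illustrated with a minimal Python sketch. The table contents and the function names `to_row_oriented` and `to_column_oriented` are hypothetical, chosen only for this illustration; real storage formats add encoding, compression, and metadata on top of this basic idea.

```python
# Hypothetical 2x3 table used only for illustration.
table = [
    ["r0c0", "r0c1", "r0c2"],
    ["r1c0", "r1c1", "r1c2"],
]


def to_row_oriented(rows):
    """Flatten a table by writing each row in full, one after another."""
    return [value for row in rows for value in row]


def to_column_oriented(rows):
    """Flatten a table by writing each column in full, one after another."""
    # zip(*rows) regroups the values by column instead of by row.
    return [value for column in zip(*rows) for value in column]


row_layout = to_row_oriented(table)
column_layout = to_column_oriented(table)

print(row_layout)
print(column_layout)
```

Both lists hold the same six values; only the order in which they are placed in the one-dimensional space differs.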

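In a row-oriented database, also known as a rowstore, the elements of the table

$$\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}$$

are stored linearly as

$$a_{11}, a_{12}, a_{13}, a_{21}, a_{22}, a_{23}, a_{31}, a_{32}, a_{33}$$

That is, the rows of the table are stored one after another. In this orientation, values in the same row are close together in space (for example, at nearby addresses in an addressable space).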
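The locality property of the row-oriented layout can be sketched as a position calculation. This assumes every cell occupies one fixed-size slot and uses 0-based indices; the function name `row_oriented_offset` and the three-column table are hypothetical, and real row formats with variable-length values need additional bookkeeping.

```python
def row_oriented_offset(row, col, num_cols):
    """Position of cell (row, col) in a row-oriented layout (0-based)."""
    return row * num_cols + col


num_cols = 3  # assumed table width for this illustration

# Two neighbouring values in the same row sit next to each other.
same_row_gap = (
    row_oriented_offset(1, 1, num_cols) - row_oriented_offset(1, 0, num_cols)
)

# Two neighbouring values in the same column are a full row apart.
same_col_gap = (
    row_oriented_offset(2, 0, num_cols) - row_oriented_offset(1, 0, num_cols)
)

print(same_row_gap, same_col_gap)
```

Under these assumptions, the distance between values in the same column grows with the number of columns, while values in the same row always stay adjacent.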

In a column-oriented database, also known as a columnstore, the elements of the table
