PostgreSQL
from Wikipedia

PostgreSQL
Developer: PostgreSQL Global Development Group[2]
Initial release: 8 July 1996[3]
Stable release: 18.0[4] / 25 September 2025
Written in: C (and C++ for the LLVM dependency)
Type: RDBMS
License: PostgreSQL License (free and open-source, permissive)[5][6][7]
Website: www.postgresql.org

PostgreSQL License[5]
Publisher: PostgreSQL Global Development Group; Regents of the University of California
Debian FSG compatible: Yes[8][9]
FSF approved: Yes[10]
OSI approved: Yes[7]
GPL compatible: Yes
Copyleft: No
Linking from code with a different licence: Yes
Website: postgresql.org/about/licence

PostgreSQL (/ˌpoʊstɡrɛskjuˈɛl/ POHST-gres-kew-EL),[11][12] also known as Postgres, is a free and open-source relational database management system (RDBMS) emphasizing extensibility and SQL compliance. PostgreSQL features transactions with atomicity, consistency, isolation, durability (ACID) properties, automatically updatable views, materialized views, triggers, foreign keys, and stored procedures.[13] It is supported on all major operating systems, including Windows, Linux, macOS, FreeBSD, and OpenBSD, and handles a range of workloads from single machines to data warehouses, data lakes,[14] or web services with many concurrent users.

The PostgreSQL Global Development Group focuses only on developing a database engine and closely related components. Technically, this core is what constitutes PostgreSQL itself, but an extensive developer community and ecosystem provides other important feature sets that might traditionally be supplied by a proprietary software vendor. These include special-purpose database engine features, like those needed to support a geospatial[15] or temporal[16] database, or features which emulate other database products.[17][18][19][20] Also available from third parties is a wide variety of user and machine interface features, such as graphical user interfaces[21][22][23] or load-balancing and high-availability toolsets.[24] The large third-party support network of people, companies, products, and projects, though not part of the PostgreSQL Global Development Group, is essential to the database engine's adoption and use, and makes up the wider PostgreSQL ecosystem.[25]

PostgreSQL was originally named POSTGRES, referring to its origins as a successor to the Ingres database developed at the University of California, Berkeley.[26][27] In 1996, the project was renamed PostgreSQL to reflect its support for SQL. After a review in 2007, the development team decided to keep the name PostgreSQL and the alias Postgres.[28]

History

Ingres and University POSTGRES (1982–1994)

PostgreSQL evolved from the Ingres project at the University of California, Berkeley. In 1982, the leader of the Ingres team, Michael Stonebraker, left Berkeley to make a proprietary version of Ingres.[26] He returned to Berkeley in 1985 and began a post-Ingres project to address the problems with contemporary database systems that had become increasingly clear during the early 1980s. He won the Turing Award in 2014 for these and other projects[29] and the techniques pioneered in them.

The new project, POSTGRES, aimed to add the fewest features needed to completely support data types.[30] These features included the ability to define types and to fully describe relationships – something used widely, but maintained entirely by the user. In POSTGRES, the database understood relationships, and could retrieve information in related tables in a natural way using rules. POSTGRES used many of the ideas of Ingres, but not its code.[31]

Starting in 1986, published papers described the basis of the system, and a prototype version was shown at the 1988 ACM SIGMOD Conference. The team released version 1 to a small number of users in June 1989, followed by version 2 with a re-written rules system in June 1990. Version 3, released in 1991, again re-wrote the rules system, and added support for multiple storage managers[32] and an improved query engine. By 1993, the number of users began to overwhelm the project with requests for support and features. After releasing version 4.2[33] on June 30, 1994 – primarily a cleanup – the project ended. Berkeley released POSTGRES under an MIT License variant, which enabled other developers to use the code for any purpose. At the time, POSTGRES used an Ingres-influenced POSTQUEL query language interpreter, which could be used interactively with a console application named monitor.

Postgres95 (1994–1996)

In 1994, Berkeley graduate students Andrew Yu and Jolly Chen replaced the POSTQUEL query language interpreter with one for the SQL query language, creating Postgres95. The monitor console was also replaced by psql. Yu and Chen announced the first version (0.01) to beta testers on May 5, 1995. Version 1.0 of Postgres95 was announced on September 5, 1995, with a more liberal license that enabled the software to be freely modifiable.

On July 8, 1996, Marc Fournier at Hub.org Networking Services provided the first non-university development server for the open-source development effort.[3] With the participation of Bruce Momjian and Vadim B. Mikheev, work began to stabilize the code inherited from Berkeley.

PostgreSQL (1996–present)

In 1996, the project was renamed to PostgreSQL to reflect its support for SQL. The online presence at the website PostgreSQL.org began on October 22, 1996.[34] The first PostgreSQL release formed version 6.0 on January 29, 1997. Since then developers and volunteers around the world have maintained the software as The PostgreSQL Global Development Group.[2]

The project continues to make releases available under its free and open-source software PostgreSQL License. Code comes from contributions from proprietary vendors, support companies, and open-source programmers.

Multiversion concurrency control (MVCC)

PostgreSQL manages concurrency through multiversion concurrency control (MVCC), which gives each transaction a "snapshot" of the database, allowing changes to be made without affecting other transactions. This largely eliminates the need for read locks, and ensures the database maintains ACID principles. PostgreSQL offers four levels of transaction isolation: Read Uncommitted, Read Committed, Repeatable Read and Serializable. Because PostgreSQL is immune to dirty reads, requesting Read Uncommitted transaction isolation provides Read Committed instead. PostgreSQL supports full serializability via the serializable snapshot isolation (SSI) method.[35] The PostgreSQL MVCC implementation is prone to performance issues that require tuning when under a heavy write load which updates existing rows.[36]
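
The isolation level can be requested per transaction; a minimal psql-style sketch (the accounts table is hypothetical):

BEGIN;
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SELECT count(*) FROM accounts;  -- snapshot is taken at this first query
-- Rows committed by other sessions after this point remain invisible
-- to this transaction until it ends.
SELECT count(*) FROM accounts;  -- same result as above
COMMIT;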

Storage and replication

Replication

PostgreSQL includes built-in binary replication based on shipping the changes (write-ahead logs (WAL)) to replica nodes asynchronously, with the ability to run read-only queries against these replicated nodes. This allows splitting read traffic among multiple nodes efficiently. Earlier replication software that allowed similar read scaling normally relied on adding replication triggers to the master, increasing load.

PostgreSQL includes built-in synchronous replication[37] that ensures that, for each write transaction, the master waits until at least one replica node has written the data to its transaction log. Unlike other database systems, the durability of a transaction (whether it is asynchronous or synchronous) can be specified per-database, per-user, per-session or even per-transaction. This is useful for workloads that do not require such guarantees: synchronous durability may not be wanted for all data, as it slows performance by requiring confirmation that the transaction reached the synchronous standby.

Standby servers can be synchronous or asynchronous. The configuration lists which standby servers are candidates for synchronous replication; the first in the list that is actively streaming is used as the current synchronous server, and when it fails the system fails over to the next in line.
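
A sketch of relaxing durability for a single transaction, assuming a synchronous standby is configured via synchronous_standby_names; the metrics table is hypothetical:

BEGIN;
SET LOCAL synchronous_commit = off;  -- applies to this transaction only
INSERT INTO metrics (ts, value) VALUES (now(), 42);
COMMIT;  -- returns without waiting for the standby's confirmation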

PostgreSQL's replication can trigger conflicts between the primary and standby servers, particularly when tuples are prematurely deleted from the standby server while ongoing queries still access them.[38] To mitigate this type of conflict, PostgreSQL includes a feedback flag[39] with which the standby server proactively informs the primary server of any ongoing queries.
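
Assuming the flag referred to is the standard hot_standby_feedback parameter, a sketch of enabling it on the standby:

ALTER SYSTEM SET hot_standby_feedback = on;  -- run on the standby server
SELECT pg_reload_conf();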

Synchronous multi-master replication is not included in the PostgreSQL core. Postgres-XC, which is based on PostgreSQL, provides scalable synchronous multi-master replication.[40] It is licensed under the same license as PostgreSQL. A related project is called Postgres-XL, and Postgres-R is yet another fork.[41] Bidirectional replication (BDR) is an asynchronous multi-master replication system for PostgreSQL.[42]

Tools such as repmgr make managing replication clusters easier.

Several asynchronous trigger-based replication packages are available. These remain useful even after introduction of the expanded core abilities, for situations where binary replication of a full database cluster is inappropriate.

Indexes

PostgreSQL includes built-in support for regular B-tree and hash table indexes, and four index access methods: generalized search trees (GiST), generalized inverted indexes (GIN), Space-Partitioned GiST (SP-GiST)[44] and Block Range Indexes (BRIN). In addition, user-defined index methods can be created, although this is quite an involved process. Indexes in PostgreSQL also support the following features:

  • Expression indexes can be created with an index of the result of an expression or function, instead of simply the value of a column.
  • Partial indexes, which only index part of a table, can be created by adding a WHERE clause to the end of the CREATE INDEX statement, allowing a smaller index to be created (both are shown in the sketch after this list).
  • The planner is able to use multiple indexes together to satisfy complex queries, using temporary in-memory bitmap index operations (useful for data warehouse applications for joining a large fact table to smaller dimension tables such as those arranged in a star schema).
  • k-nearest neighbors (k-NN) indexing (also referred to as KNN-GiST[45]) provides efficient searching of the "closest values" to that specified, useful for finding similar words, or close objects or locations in geospatial data. This is achieved without exhaustive matching of values.
  • Index-only scans often allow the system to fetch data from indexes without ever having to access the main table.
  • Block Range Indexes (BRIN).
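
A brief sketch of the expression and partial index features mentioned above; the users and orders tables are hypothetical:

-- Expression index: supports fast case-insensitive lookups.
CREATE INDEX users_lower_email_idx ON users (lower(email));

-- Partial index: only rows matching the WHERE clause are indexed,
-- keeping the index small.
CREATE INDEX orders_open_idx ON orders (created_at) WHERE status = 'open';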

Schemas

PostgreSQL schemas are namespaces, allowing objects of the same kind and name to co-exist in a single database. They are not to be confused with a database schema—the abstract, structural, organizational specification which defines how every table's data relates to data within other tables. All PostgreSQL database objects, except for a few global objects such as roles and tablespaces, exist within a schema. Schemas cannot be nested: a schema cannot contain another schema. The permission system controls access to schemas and their content. By default, newly created databases have only a single schema, called public, but other schemas can be added, and the public schema is not mandatory.

A search_path setting determines the order in which PostgreSQL checks schemas for unqualified objects (those without a prefixed schema). By default, it is set to $user, public ($user refers to the currently connected database user). This default can be set on a database or role level, but as it is a session parameter, it can be freely changed (even multiple times) during a client session, affecting that session only.

Non-existent schemas, or other schemas not accessible to the logged-in user, that are listed in search_path are silently skipped during object lookup.

New objects are created in whichever valid schema (one that can be accessed) appears first in the search_path.
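
A short sketch tying these rules together; the sales schema and invoices table are hypothetical:

CREATE SCHEMA sales;
CREATE TABLE sales.invoices (id serial PRIMARY KEY);  -- schema-qualified name
SET search_path TO sales, public;  -- session-local change
SELECT * FROM invoices;            -- unqualified name resolves to sales.invoices
SHOW search_path;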

Data types

A wide variety of native data types are supported.

In addition, users can create their own data types which can usually be made fully indexable via PostgreSQL's indexing infrastructures – GiST, GIN, SP-GiST. Examples of these include the geographic information system (GIS) data types from the PostGIS project for PostgreSQL.

There is also a data type called a domain, which is the same as any other data type but with optional constraints defined by the creator of that domain. This means any data entered into a column using the domain will have to conform to whichever constraints were defined as part of the domain.
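
A sketch of a domain of the kind described above; the names are illustrative:

CREATE DOMAIN us_postal_code AS text
    CHECK (VALUE ~ '^\d{5}(-\d{4})?$');
CREATE TABLE addresses (zip us_postal_code);
INSERT INTO addresses VALUES ('12345');  -- accepted
INSERT INTO addresses VALUES ('1234');   -- rejected by the domain's check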

Range types represent a range of values of some element type. These can be discrete ranges (e.g., all integer values 1 to 10) or continuous ranges (e.g., any time between 10:00 am and 11:00 am). The built-in range types available include ranges of integers, big integers, decimal numbers, time stamps (with and without time zone) and dates.

Custom range types can be created to make new types of ranges available, such as IP address ranges using the inet type as a base, or float ranges using the float data type as a base. Range types support inclusive and exclusive range boundaries using the [] and () characters respectively (e.g., [4,9) represents all integers starting from and including 4 up to but not including 9). Range types are also compatible with existing operators used to check for overlap, containment, position (e.g., strictly right of), and so on.
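
A few illustrative statements (floatrange is a hypothetical custom type name):

SELECT int4range(4, 9) @> 7;                     -- containment: true
SELECT numrange(1.0, 2.0) && numrange(1.5, 3.0); -- overlap: true
CREATE TYPE floatrange AS RANGE (subtype = float8);  -- custom range type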

User-defined objects

New types of almost all objects inside the database can be created, including:

  • Casts
  • Conversions
  • Data types
  • Data domains
  • Functions, including aggregate functions and window functions
  • Indexes including custom indexes for custom types
  • Operators (existing ones can be overloaded)
  • Procedural languages

Inheritance

Tables can be set to inherit their characteristics from a parent table. Data in child tables will appear to exist in the parent tables, unless data is selected from the parent table using the ONLY keyword, i.e. SELECT * FROM ONLY parent_table;. Adding a column in the parent table will cause that column to appear in the child table.
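
A minimal sketch of table inheritance; the cities and capitals tables are illustrative:

CREATE TABLE cities (name text, population int);
CREATE TABLE capitals (country text) INHERITS (cities);
SELECT * FROM cities;       -- includes rows from capitals
SELECT * FROM ONLY cities;  -- excludes child-table rows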

Inheritance can be used to implement table partitioning, using either triggers or rules to direct inserts to the parent table into the proper child tables.

This feature is not fully supported: check constraints and not-null constraints on a parent table are automatically inherited by its children, but other types of table constraints (unique, primary key, and foreign key constraints) are not inherited.

Inheritance provides a way to map the features of generalization hierarchies depicted in entity–relationship diagrams (ERDs) directly into the PostgreSQL database.

Other storage features

  • Referential integrity constraints including foreign key constraints, column constraints, and row checks
  • Binary and textual large-object storage
  • Tablespaces
  • Per-column collation
  • Online backup
  • Point-in-time recovery, implemented using write-ahead logging
  • In-place upgrades with pg_upgrade for less downtime

Control and connectivity

Foreign data wrappers

PostgreSQL can link to other systems to retrieve data via foreign data wrappers (FDWs).[48] These can take the form of any data source, such as a file system, another relational database management system (RDBMS), or a web service. This means that regular database queries can use these data sources like regular tables, and even join multiple data sources together.
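
A sketch using the postgres_fdw extension that ships with PostgreSQL; server, credentials, and table names are hypothetical:

CREATE EXTENSION postgres_fdw;
CREATE SERVER remote_srv FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'db.example.com', dbname 'inventory');
CREATE USER MAPPING FOR CURRENT_USER SERVER remote_srv
    OPTIONS (user 'app', password 'secret');
CREATE FOREIGN TABLE items (id int, name text)
    SERVER remote_srv OPTIONS (table_name 'items');
SELECT * FROM items JOIN local_orders USING (id);  -- remote joined with local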

Interfaces

PostgreSQL supports a binary communication protocol that allows applications to connect to the database server. The protocol is versioned (version 3.0, introduced in PostgreSQL 7.4) and has a detailed specification.[49]

The official client implementation of this communication protocol is a C API, libpq.[50] In addition, the officially supported ECPG tool allows SQL commands to be embedded in C code.[51] Both are part of the standard PostgreSQL distribution.[52]

Third-party libraries for connecting to PostgreSQL are available for many programming languages, including C++,[53] Java,[54] Julia,[55][56][57] Python,[58] Node.js,[59] Go,[60] and Rust.[61]

Procedural languages

Procedural languages allow developers to extend the database with custom subroutines (functions), often called stored procedures. These functions can be used to build database triggers (functions invoked on modification of certain data) and custom data types and aggregate functions.[62] Procedural languages can also be invoked without defining a function, using a DO command at SQL level.[63]
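
A sketch of an anonymous code block run with the DO command; the reporting role is hypothetical:

DO $$
BEGIN
    IF NOT EXISTS (SELECT FROM pg_roles WHERE rolname = 'reporting') THEN
        CREATE ROLE reporting;  -- executed only when the role is missing
    END IF;
END
$$;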

Languages are divided into two groups: procedures written in safe languages are sandboxed and can be safely created and used by any user, while procedures written in unsafe languages can only be created by superusers, because they allow bypassing a database's security restrictions and can access sources external to the database. Some languages, like Perl, provide both safe and unsafe versions.

PostgreSQL has built-in support for three procedural languages:

  • Plain SQL (safe). Simpler SQL functions can get expanded inline into the calling (SQL) query, which saves function call overhead and allows the query optimizer to "see inside" the function (see the sketch after this list).
  • Procedural Language/PostgreSQL (PL/pgSQL) (safe), which resembles Oracle's Procedural Language for SQL (PL/SQL) procedural language and SQL/Persistent Stored Modules (SQL/PSM).
  • C (unsafe), which allows loading one or more custom shared libraries into the database. Functions written in C offer the best performance, but bugs in code can crash and potentially corrupt the database. Most built-in functions are written in C.
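
A sketch contrasting the first two languages in this list; function names are illustrative:

-- Plain SQL function; simple enough for the planner to inline.
CREATE FUNCTION add_tax(amount numeric) RETURNS numeric
    AS $$ SELECT amount * 1.2 $$
    LANGUAGE sql IMMUTABLE;

-- PL/pgSQL function with control flow.
CREATE FUNCTION clamp(v int, lo int, hi int) RETURNS int AS $$
BEGIN
    RETURN GREATEST(lo, LEAST(v, hi));
END;
$$ LANGUAGE plpgsql IMMUTABLE;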

In addition, PostgreSQL allows procedural languages to be loaded into the database through extensions. Three language extensions are included with PostgreSQL to support Perl, Tcl, and Python. For Python, the current Python 3 is used; the discontinued Python 2 is no longer supported as of PostgreSQL 15. Earlier releases supported both, defaulting to Python 2, though the old and new versions could not be used in the same session.[64] External projects provide support for many other languages,[65] including PL/Java, JavaScript (PL/V8), PL/Julia,[57] PL/R,[66] PL/Ruby, and others.

Triggers

Triggers are functions invoked by the action of SQL data manipulation language (DML) statements. For example, an INSERT statement might activate a trigger that checks if the values of the statement are valid. Most triggers are only activated by either INSERT or UPDATE statements.

Triggers are fully supported and can be attached to tables. Triggers can be per-column and conditional, in that UPDATE triggers can target specific columns of a table, and triggers can be told to execute under a set of conditions as specified in the trigger's WHEN clause. Triggers can be attached to views by using the INSTEAD OF condition. Multiple triggers are fired in alphabetical order. In addition to calling functions written in the native PL/pgSQL, triggers can also invoke functions written in other languages like PL/Python or PL/Perl.
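
A sketch of a per-column, conditional trigger; the orders table and its columns are hypothetical:

CREATE FUNCTION touch_updated_at() RETURNS trigger AS $$
BEGIN
    NEW.updated_at := now();
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER orders_touch
    BEFORE UPDATE OF status ON orders               -- per-column
    FOR EACH ROW
    WHEN (OLD.status IS DISTINCT FROM NEW.status)   -- conditional
    EXECUTE FUNCTION touch_updated_at();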

Asynchronous notifications

PostgreSQL provides an asynchronous messaging system that is accessed through the NOTIFY, LISTEN and UNLISTEN commands. A session can issue a NOTIFY command, along with a user-specified channel and an optional payload, to mark a particular event occurring. Other sessions can detect these events by issuing a LISTEN command for a particular channel. This functionality can be used for a wide variety of purposes, such as letting other sessions know when a table has been updated or letting separate applications detect when a particular action has been performed. Such a system avoids the need for continuous polling by applications to see if anything has changed, reducing unnecessary overhead. Notifications are fully transactional, in that messages are not sent until the transaction they were sent from is committed. This eliminates the problem of messages being sent for an action which is then rolled back.
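
A minimal sketch of the commands involved; the channel name and payload are illustrative:

LISTEN inventory_changed;                       -- in a listening session
NOTIFY inventory_changed, 'item 42 restocked';  -- in another session
SELECT pg_notify('inventory_changed', 'item 42 restocked');  -- equivalent, usable from triggers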

Many connectors for PostgreSQL provide support for this notification system (including libpq, JDBC, Npgsql, psycopg and node.js) so it can be used by external applications.

PostgreSQL can act as an effective, persistent "pub/sub" server or job server by combining LISTEN with FOR UPDATE SKIP LOCKED.[67][68][69]
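
A sketch of such a job-queue pattern; the jobs table and its columns are hypothetical:

-- Each worker atomically claims one pending job; rows locked by
-- other workers are skipped rather than waited on.
UPDATE jobs SET state = 'running'
WHERE id = (
    SELECT id FROM jobs
    WHERE state = 'pending'
    ORDER BY created_at
    LIMIT 1
    FOR UPDATE SKIP LOCKED
)
RETURNING id;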

Rules

Rules allow the "query tree" of an incoming query to be rewritten; they are an automatically invoked macro language for SQL. "Query Re-Write Rules" are attached to a table/class and "re-write" the incoming DML (select, insert, update, and/or delete) into one or more queries that either replace the original DML statement or execute in addition to it. Query re-writing occurs after DML statement parsing and before query planning.

The functionality that rules provide was, in almost every way, later duplicated by the introduction of newer types of triggers. Triggers are usually preferred over rules because it is easier to reason about trigger behavior and interactions than about equivalent rules.
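
A sketch of a rule that executes an extra query alongside the original statement; the tables are hypothetical:

CREATE TABLE audit_log (tbl text, at timestamptz DEFAULT now());
CREATE RULE log_insert AS ON INSERT TO accounts
    DO ALSO INSERT INTO audit_log (tbl) VALUES ('accounts');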

Other querying features

  • Transactions
  • Full-text search
  • Views
    • Materialized views[70]
    • Updateable views[71]
    • Recursive views[72]
  • Inner, outer (full, left, and right), and cross joins
  • Sub-selects
    • Correlated sub-queries[73]
  • Regular expressions[74]
  • Common table expressions and writable common table expressions
  • Encrypted connections via Transport Layer Security (TLS); current versions do not use vulnerable SSL, even with that configuration option[75]
  • Domains
  • Savepoints
  • Two-phase commit
  • The Oversized-Attribute Storage Technique (TOAST) is used to transparently store large table attributes (such as big MIME attachments or XML messages) in a separate area, with automatic compression.
  • Embedded SQL is implemented using a preprocessor: SQL statements are first written embedded in C code, the code is run through the ECPG preprocessor, which replaces the SQL with calls to a code library, and the result can then be compiled with a C compiler. Embedding also works with C++, although the preprocessor does not recognize all C++ constructs.

Concurrency model

PostgreSQL server is process-based (not threaded), and uses one operating system process per database session. Multiple sessions are automatically spread across all available CPUs by the operating system. Many types of queries can also be parallelized across multiple background worker processes, taking advantage of multiple CPUs or cores.[76] Client applications can use threads and create multiple database connections from each thread.[77]

Security

PostgreSQL manages its internal security on a per-role basis. A role is generally regarded to be a user (a role that can log in), or a group (a role of which other roles are members). Permissions can be granted or revoked on any object down to the column level, and can allow or prevent the visibility/creation/alteration/deletion of objects at the database, schema, table, and row levels.
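
A sketch of role and permission management down to the column level; names are illustrative:

CREATE ROLE analyst LOGIN PASSWORD 'secret';
CREATE ROLE readers NOLOGIN;          -- acts as a group
GRANT readers TO analyst;
GRANT USAGE ON SCHEMA sales TO readers;
GRANT SELECT (id, total) ON sales.invoices TO readers;  -- column-level grant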

PostgreSQL's SECURITY LABEL feature (an extension to the SQL standard) allows for additional security, with a bundled loadable module that supports label-based mandatory access control (MAC) based on Security-Enhanced Linux (SELinux) security policy.[78][79]

PostgreSQL natively supports a broad number of external authentication mechanisms.

The GSSAPI, SSPI, Kerberos, peer, ident and certificate methods can also use a specified "map" file that lists which users matched by that authentication system are allowed to connect as a specific database user.

These methods are specified in the cluster's host-based authentication configuration file (pg_hba.conf), which determines what connections are allowed. This allows control over which user can connect to which database, where they can connect from (IP address, IP address range, domain socket), which authentication system will be enforced, and whether the connection must use Transport Layer Security (TLS).
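
Illustrative pg_hba.conf entries (the database, user, and address values are hypothetical):

# TYPE  DATABASE  USER  ADDRESS        METHOD
local   all       all                  peer
host    appdb     app   10.0.0.0/24    scram-sha-256
hostssl all       all   0.0.0.0/0      cert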

Standards compliance

PostgreSQL claims high, but not complete, conformance with the latest SQL standard ("as of the version 17 release in September 2024, PostgreSQL conforms to at least 170 of the 177 mandatory features for SQL:2023 Core conformance", and no other database fully conforms to it[81]). One exception is the handling of unquoted identifiers like table or column names. In PostgreSQL they are folded internally to lower-case characters,[82] whereas the standard says that unquoted identifiers should be folded to upper case. Thus, Foo should be equivalent to FOO, not foo, according to the standard. Other shortcomings concern the absence of temporal tables, which allow automatic logging of row versions during transactions with the possibility of browsing in time (FOR SYSTEM TIME predicate),[citation needed] although relatively SQL-compliant third-party extensions are available.[16]
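
A short illustration of the folding behavior:

CREATE TABLE Foo (id int);  -- identifier folded to foo
SELECT * FROM FOO;          -- works: also folded to foo
SELECT * FROM "Foo";        -- ERROR: relation "Foo" does not exist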

Benchmarks and performance

Many informal performance studies of PostgreSQL have been done.[83] Performance improvements aimed at improving scalability began heavily with version 8.1. Simple benchmarks between version 8.0 and version 8.4 showed that the latter was more than ten times faster on read-only workloads and at least 7.5 times faster on both read and write workloads.[84]

The first industry-standard and peer-validated benchmark was completed in June 2007, using the Sun Java System Application Server (proprietary version of GlassFish) 9.0 Platform Edition, UltraSPARC T1-based Sun Fire server and PostgreSQL 8.2.[85] This result of 778.14 SPECjAppServer2004 JOPS@Standard compares favourably with the 874 JOPS@Standard with Oracle 10 on an Itanium-based HP-UX system.[83]

In August 2007, Sun submitted an improved benchmark score of 813.73 SPECjAppServer2004 JOPS@Standard. With the system under test at a reduced price, the price/performance improved from $84.98/JOPS to $70.57/JOPS.[86]

The default configuration of PostgreSQL uses only a small amount of dedicated memory for performance-critical purposes such as caching database blocks and sorting. This limitation is primarily because older operating systems required kernel changes to allow allocating large blocks of shared memory.[87] PostgreSQL.org provides advice on basic recommended performance practice in a wiki.[88]

In April 2012, Robert Haas of EnterpriseDB demonstrated PostgreSQL 9.2's linear CPU scalability using a server with 64 cores.[89]

Matloob Khushi performed benchmarking between PostgreSQL 9.0 and MySQL 5.6.15 for their ability to process genomic data. In his performance analysis he found that PostgreSQL extracts overlapping genomic regions eight times faster than MySQL using two datasets of 80,000 each forming random human DNA regions. Insertion and data uploads in PostgreSQL were also better, although general searching ability of both databases was almost equivalent.[90]

Platforms

PostgreSQL is available for the following operating systems: Linux (all recent distributions); macOS version 10.14 and newer, with 64-bit ARM and x86-64 installers available and tested;[91] Windows (with installers available and tested for 64-bit Windows Server 2022 and 2016[92]); FreeBSD; OpenBSD;[93] NetBSD; DragonFlyBSD; and, without official binary executables (though unofficial ones are likely available), Solaris[94] and illumos.

PostgreSQL can be expected to work on any of the following instruction set architectures: 64-bit x86-64 and 32-bit x86 on Windows and other operating systems; and, on operating systems other than Windows, 64-bit ARM[95] and the older 32-bit ARM (including older revisions such as ARMv6 in the Raspberry Pi[96]), RISC-V, z/Architecture, S/390, PowerPC (including 64-bit Power ISA), SPARC (including 64-bit), MIPS, and PA-RISC. It was also known to work on some other platforms, though these have not been tested for years (i.e., for the latest versions).[97]

Database administration

Open source front-ends and tools for administering PostgreSQL include:

psql Session Example[98]
regression=# select foo;
ERROR:  column "foo" does not exist
CONTEXT:  PL/pgSQL function "test1" while casting return value to function's return type
LINE 1: select foo;
               ^
regression=# \q
peter@localhost testdb=> \a \t \x
Output format is aligned.
Tuples only is off.
Expanded display is on.
regression=# select '\x';
WARNING:  nonstandard use of escape in a string literal
LINE 1: select '\x';
               ^
HINT:  Use the escape string syntax for escapes, e.g., E'\r\n'.
 ?column? 
----------
 x
(1 row)

regression=# select E'\x';
piro=> \set foo 30;
piro=> select * from test where foo <= :foo;
 foo | bar 
-----+-----
  10 | 
  20 | 
(2 rows)
testdb=> \set foo 'my_table'
testdb=> SELECT * FROM :"foo";

testdb=> \set content `cat my_file.txt`
testdb=> INSERT INTO my_table VALUES (:'content');
regression=# select (
regression(# 1);
 ?column? 
----------
        1
(1 row)
piro=> select (
piro(> '
piro'> ' || $$
piro$> $$)
piro-> from "
piro"> foo";
ERROR:  relation "
foo" does not exist
LINE 5: from "
             ^
testdb=> CREATE TABLE my_table (
first integer not null default 0,
second text) ; -- end of command
CREATE TABLE

=# SELECT '0x10'::mpz AS "hex", '10'::mpz AS "dec",
-#        '010'::mpz AS oct, '0b10'::mpz AS bin; -- Table output
 hex | dec | oct | bin
-----+-----+-----+-----
 16  | 10  | 8   | 2
(1 row)
regression=# select schemaname from  pg_tables limit 3; -- One field output
 schemaname 
------------
 pg_catalog
 pg_catalog
 pg_catalog
(3 rows)

=# select 10.0, 1e-6, 1E+6;
 ?column? | ?column? | ?column? 
----------+----------+----------
     10.0 | 0.000001 |  1000000
(1 row)

regression=# begin;
BEGIN
regression=# create table asdf (foo serial primary key);
NOTICE:  CREATE TABLE will create implicit sequence "asdf_foo_seq" for serial column "asdf.foo"
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "asdf_pkey" for table "asdf"
CREATE TABLE
regression=# insert into asdf values (10) returning foo;
 foo 
-----
  10
(1 row)

INSERT 0 1
regression=# ROLLBACK ;
ROLLBACK
psql
The primary front-end for PostgreSQL is the psql command-line program, which can be used to enter SQL queries directly, or execute them from a file. In addition, psql provides a number of meta-commands and various shell-like features to facilitate writing scripts and automating a wide variety of tasks; for example tab completion of object names and SQL syntax.
pgAdmin
The pgAdmin package is a free and open-source graphical user interface (GUI) administration tool for PostgreSQL, which is supported on many computer platforms.[99] The program is available in more than a dozen languages. The first prototype, named pgManager, was written for PostgreSQL 6.3.2 in 1998, and was rewritten and released as pgAdmin under the GNU General Public License (GPL) in later months. The second incarnation (named pgAdmin II) was a complete rewrite, first released on January 16, 2002. The third version, pgAdmin III, was originally released under the Artistic License and then released under the same license as PostgreSQL. Unlike prior versions that were written in Visual Basic, pgAdmin III is written in C++, using the wxWidgets[100] framework, allowing it to run on most common operating systems. The query tool includes a scripting language called pgScript for supporting admin and development tasks. In December 2014, Dave Page, the pgAdmin project founder and primary developer,[101] announced that with the shift towards web-based models, work had begun on pgAdmin 4 with the aim of facilitating cloud deployments.[102] In 2016, pgAdmin 4 was released. The pgAdmin 4 backend is written in Python, using Flask and the Qt framework.[103]
phpPgAdmin
phpPgAdmin is a web-based administration tool for PostgreSQL written in PHP and based on the popular phpMyAdmin interface originally written for MySQL administration.[104]
PostgreSQL Studio
PostgreSQL Studio allows users to perform essential PostgreSQL database development tasks from a web-based console. PostgreSQL Studio allows users to work with cloud databases without the need to open firewalls.[105]
TeamPostgreSQL
AJAX/JavaScript-driven web interface for PostgreSQL. Allows browsing, maintaining and creating data and database objects via a web browser. The interface offers tabbed SQL editor with autocompletion, row editing widgets, click-through foreign key navigation between rows and tables, favorites management for commonly used scripts, among other features. Supports SSH for both the web interface and the database connections. Installers are available for Windows, Macintosh, and Linux, and a simple cross-platform archive that runs from a script.[106]
LibreOffice, OpenOffice.org
LibreOffice and OpenOffice.org Base can be used as a front-end for PostgreSQL.[107][108]
pgBadger
The pgBadger PostgreSQL log analyzer generates detailed reports from a PostgreSQL log file.[109]
pgDevOps
pgDevOps is a suite of web tools to install & manage multiple PostgreSQL versions, extensions, and community components, develop SQL queries, monitor running databases and find performance problems.[110]
Adminer
Adminer is a simple web-based administration tool for PostgreSQL and others, written in PHP.
pgBackRest
pgBackRest is a backup and restore tool for PostgreSQL that provides support for full, differential, and incremental backups.[111]
pgaudit
pgaudit is a PostgreSQL extension that provides detailed session and/or object audit logging via the standard logging facility provided by PostgreSQL.[112]
WAL-E
WAL-E is a backup and restore tool for PostgreSQL that provides support for physical (WAL-based) backups, written in Python.[113]
DBeaver
DBeaver is a free and open-source GUI administration tool for PostgreSQL with visual entity diagrams and IntelliSense-style autocompletion. A commercial PRO license is also available.

A number of companies offer proprietary tools for PostgreSQL. They often consist of a universal core that is adapted for various specific database products. These tools mostly share the administration features with the open source tools but offer improvements in data modeling, importing, exporting or reporting.

Notable users

Many notable organizations and products use PostgreSQL as their primary database.

Service implementations

Some notable vendors offer PostgreSQL as software as a service:

  • Heroku, a platform as a service provider, has supported PostgreSQL since its start in 2007.[140] It offers value-added features like full database rollback (the ability to restore a database from any specified time),[141] which is based on WAL-E, open-source software developed by Heroku.[142]
  • In January 2012, EnterpriseDB released a cloud version of both PostgreSQL and their own proprietary Postgres Plus Advanced Server with automated provisioning for failover, replication, load-balancing, and scaling. It runs on Amazon Web Services.[143] Since 2015, Postgres Advanced Server has been offered as ApsaraDB for PPAS, a relational database as a service on Alibaba Cloud.[144]
  • VMware has offered vFabric Postgres (also termed vPostgres[145]) for private clouds on VMware vSphere since May 2012.[146] The company announced End of Availability (EOA) of the product in 2014.[147]
  • In November 2013, Amazon Web Services announced the addition of PostgreSQL to their Relational Database Service offering.[148][149]
  • In November 2016, Amazon Web Services announced the addition of PostgreSQL compatibility to their cloud-native Amazon Aurora managed database offering.[150]
  • In May 2017, Microsoft Azure announced Azure Databases for PostgreSQL.[151]
  • In May 2019, Alibaba Cloud announced PolarDB for PostgreSQL.[152]
  • Jelastic Multicloud Platform as a Service has provided container-based PostgreSQL support since 2011. It also offers automated asynchronous master-slave replication of PostgreSQL.[153]
  • In June 2019, IBM Cloud announced IBM Cloud Hyper Protect DBaaS for PostgreSQL.[154]
  • In September 2020, Crunchy Data announced Crunchy Bridge.[155]
  • In June 2022, Neon.tech announced Neon Serverless Postgres.[156]
  • In December 2022, Google Cloud Platform announced general availability of AlloyDB as fully managed PostgreSQL cloud service.[157]
  • In October 2023, Nile announced Nile Postgres Platform.[158]
  • In April 2024, Google Cloud Platform announced general availability of AlloyDB Omni, a downloadable version of AlloyDB designed to run on any infrastructure, including on-premises, other clouds, or edge environments.[159]

Release history

Release | First release | Latest minor version | Latest release | End of life[160] | Milestones
6.0 | 1997-01-29 | — | — | — | First formal release of PostgreSQL, unique indexes, pg_dumpall utility, ident authentication
6.1 | 1997-06-08 | Unsupported: 6.1.1 | 1997-07-22 | — | Multicolumn indexes, sequences, money data type, GEQO (GEnetic Query Optimizer)
6.2 | 1997-10-02 | Unsupported: 6.2.1 | 1997-10-17 | — | JDBC interface, triggers, server programming interface, constraints
6.3 | 1998-03-01 | Unsupported: 6.3.2 | 1998-04-07 | 2003-03-01 | SQL-92 subselect ability, PL/pgTCL
6.4 | 1998-10-30 | Unsupported: 6.4.2 | 1998-12-20 | 2003-10-30 | VIEWs (then only read-only) and RULEs, PL/pgSQL
6.5 | 1999-06-09 | Unsupported: 6.5.3 | 1999-10-13 | 2004-06-09 | MVCC, temporary tables, more SQL statement support (CASE, INTERSECT, and EXCEPT)
7.0 | 2000-05-08 | Unsupported: 7.0.3 | 2000-11-11 | 2004-05-08 | Foreign keys, SQL-92 syntax for joins
7.1 | 2001-04-13 | Unsupported: 7.1.3 | 2001-08-15 | 2006-04-13 | Write-ahead log, outer joins
7.2 | 2002-02-04 | Unsupported: 7.2.8 | 2005-05-09 | 2007-02-04 | PL/Python, OIDs no longer required, internationalization of messages
7.3 | 2002-11-27 | Unsupported: 7.3.21 | 2008-01-07 | 2007-11-27 | Schema, table function, prepared query[161]
7.4 | 2003-11-17 | Unsupported: 7.4.30 | 2010-10-04 | 2010-10-01 | Optimization on JOINs and data warehouse functions[162]
8.0 | 2005-01-19 | Unsupported: 8.0.26 | 2010-10-04 | 2010-10-01 | Native server on Microsoft Windows, savepoints, tablespaces, point-in-time recovery[163]
8.1 | 2005-11-08 | Unsupported: 8.1.23 | 2010-12-16 | 2010-11-08 | Performance optimization, two-phase commit, table partitioning, index bitmap scan, shared row locking, roles
8.2 | 2006-12-05 | Unsupported: 8.2.23 | 2011-12-05 | 2011-12-05 | Performance optimization, online index builds, advisory locks, warm standby[164]
8.3 | 2008-02-04 | Unsupported: 8.3.23 | 2013-02-07 | 2013-02-07 | Heap-only tuples, full text search,[165] SQL/XML, ENUM types, UUID types
8.4 | 2009-07-01 | Unsupported: 8.4.22 | 2014-07-24 | 2014-07-24 | Window functions, column-level permissions, parallel database restore, per-database collation, common table expressions and recursive queries[166]
9.0 | 2010-09-20 | Unsupported: 9.0.23 | 2015-10-08 | 2015-10-08 | Built-in binary streaming replication, hot standby, in-place upgrade ability, 64-bit Windows[167]
9.1 | 2011-09-12 | Unsupported: 9.1.24 | 2016-10-27 | 2016-10-27 | Synchronous replication, per-column collations, unlogged tables, serializable snapshot isolation, writeable common table expressions, SELinux integration, extensions, foreign tables[168]
9.2 | 2012-09-10[169] | Unsupported: 9.2.24 | 2017-11-09 | 2017-11-09 | Cascading streaming replication, index-only scans, native JSON support, improved lock management, range types, pg_receivexlog tool, space-partitioned GiST indexes
9.3 | 2013-09-09 | Unsupported: 9.3.25 | 2018-11-08 | 2018-11-08 | Custom background workers, data checksums, dedicated JSON operators, LATERAL JOIN, faster pg_dump, new pg_isready server monitoring tool, trigger features, view features, writeable foreign tables, materialized views, replication improvements
9.4 | 2014-12-18 | Unsupported: 9.4.26 | 2020-02-13 | 2020-02-13 | JSONB data type, ALTER SYSTEM statement for changing config values, ability to refresh materialized views without blocking reads, dynamic registration/start/stop of background worker processes, Logical Decoding API, GiN index improvements, Linux huge page support, database cache reloading via pg_prewarm, reintroducing Hstore as the column type of choice for document-style data[170]
9.5 | 2016-01-07 | Unsupported: 9.5.25 | 2021-02-11 | 2021-02-11 | UPSERT, row level security, TABLESAMPLE, CUBE/ROLLUP, GROUPING SETS, and new BRIN index[171]
9.6 | 2016-09-29 | Unsupported: 9.6.24 | 2021-11-11 | 2021-11-11 | Parallel query support, PostgreSQL foreign data wrapper (FDW) improvements with sort/join pushdown, multiple synchronous standbys, faster vacuuming of large tables
10 | 2017-10-05 | Unsupported: 10.23 | 2022-11-10 | 2022-11-10 | Logical replication,[172] declarative table partitioning, improved query parallelism
11 | 2018-10-18 | Unsupported: 11.22 | 2023-11-09 | 2023-11-09 | Increased robustness and performance for partitioning, transactions supported in stored procedures, enhanced abilities for query parallelism, just-in-time (JIT) compiling for expressions[173][174]
12 | 2019-10-03 | Unsupported: 12.22 | 2024-11-21 | 2024-11-21 | Improvements to query performance and space utilization; SQL/JSON path expression support; generated columns; improvements to internationalization and authentication; new pluggable table storage interface[175]
13 | 2020-09-24 | Supported: 13.22 | 2025-08-14 | 2025-11-13 | Space savings and performance gains from de-duplication of B-tree index entries, improved performance for queries that use aggregates or partitioned tables, better query planning when using extended statistics, parallelized vacuuming of indexes, incremental sorting[176][177]
14 | 2021-09-30 | Supported: 14.19 | 2025-08-14 | 2026-11-12 | Added SQL-standard SEARCH and CYCLE clauses for common table expressions, allow DISTINCT to be added to GROUP BY[178][179]
15 | 2022-10-13 | Supported: 15.14 | 2025-08-14 | 2027-11-11 | Implements SQL-standard MERGE statement. PL/Python now only supports current Python 3, and plpythonu now means Python 3, no longer the discontinued Python 2.
16 | 2023-09-14 | Supported: 16.10 | 2025-08-14 | 2028-11-09 | Improvements to logical replication, pg_stat_io view (for I/O metrics)[180]
17 | 2024-09-26 | Supported: 17.6 | 2025-08-14 | 2029-11-08 | Performance boosts to the vacuum process, I/O layer, and query execution, expanding JSON functionality, more features to MERGE and improving COPY; enhances logical replication for high availability and upgrades, improvements to security, operations, monitoring, and analysis[181]
18 | 2025-09-25 | Latest version: 18.0 | 2025-09-25 | 2030-11-14 | New I/O subsystem and asynchronous I/O enhancements[182]

Ecosystem and Derivatives

Due to its permissive open-source license and extensible architecture, a broad ecosystem has developed around PostgreSQL. This includes numerous companies offering dedicated support and hosting, as well as several forks and derivative databases that adapt PostgreSQL for specific workloads. Notable derivatives include:

  • Greenplum Database: A massively parallel processing (MPP) data warehouse based on an older version of PostgreSQL, designed for large-scale analytics.[183]
  • TimescaleDB: A time-series database delivered as a PostgreSQL extension, optimized for handling fast ingest and complex queries of time-series data.[184]
  • Amazon Aurora: A cloud-native relational database offered by Amazon Web Services that provides a PostgreSQL-compatible edition.[185]
  • Neon: An open-source, serverless implementation of PostgreSQL that separates storage and compute to offer modern development features like database branching.[186]

from Grokipedia
PostgreSQL is a powerful, object-relational database management system (ORDBMS) that uses and extends the SQL language combined with many features that safely store and scale complex data workloads. It originated in 1986 as the POSTGRES project at the University of California, Berkeley, led by Michael Stonebraker and sponsored by organizations including DARPA, ARO, NSF, and ESL, Inc., with the first operational "demoware" system demonstrated at the 1988 ACM-SIGMOD Conference. In 1994, developers Andrew Yu and Jolly Chen added an SQL interpreter, leading to the release of Postgres95, which replaced the original PostQUEL with SQL and introduced tools like the psql interactive terminal. The project was renamed PostgreSQL in 1996 with version 6.0, shifting focus toward feature enhancements while maintaining its commitment to robustness and standards compliance.

Developed by a global community under the PostgreSQL Global Development Group, PostgreSQL has undergone nearly 40 years of active, volunteer-driven evolution, with major releases occurring annually. The latest version, PostgreSQL 18, was released on September 25, 2025, introducing improvements such as a new version (3.2) of the wire protocol—the first update since 2003—and enhanced performance for logical replication and query optimization. Licensed under the permissive PostgreSQL License (a variant similar to the BSD or MIT licenses), it is free and open-source software, allowing unrestricted use, modification, and distribution.

PostgreSQL emphasizes ACID (Atomicity, Consistency, Isolation, Durability) compliance, achieved since 2001, ensuring reliable transaction processing even under high concurrency. Key features include support for advanced data types such as JSON/JSONB for semi-structured data and geospatial data via extensions like PostGIS, alongside robust indexing methods (e.g., B-tree, GiST, GIN), full-text search, and row-level security for fine-grained access control. Its extensibility allows users to define custom functions, data types, operators, and procedural languages like PL/pgSQL, making it highly adaptable for applications ranging from web services to scientific computing. PostgreSQL conforms to at least 170 of the 177 mandatory features in the SQL:2023 Core standard, positioning it as one of the most standards-compliant relational databases available. With built-in replication, partitioning, and parallel query execution, it scales effectively for enterprise environments while remaining suitable for embedded and mobile deployments.

History and Development

Origins and Early History

The PostgreSQL database management system traces its origins to the POSTGRES project, initiated in 1986 at the University of California, Berkeley, under the leadership of Professor Michael Stonebraker. Sponsored by the Defense Advanced Research Projects Agency (DARPA), the Army Research Office (ARO), the National Science Foundation (NSF), and ESL, Inc., the project aimed to extend the relational model beyond the capabilities of its predecessor, the Ingres system, which Stonebraker had also led in the 1970s. Drawing on Ingres' foundational relational architecture, POSTGRES introduced innovations such as abstract data types (ADTs), complex objects, and rule-based query rewriting to support emerging application needs such as spatial data handling.

Key milestones marked the project's early progress. Development began with a prototype in 1986, followed by the first operational "demoware" system in 1987, which was demonstrated at the 1988 ACM-SIGMOD Conference. The initial public release, Version 1, occurred in June 1989 and was distributed to a limited number of external users, featuring the PostQUEL query language inspired by Ingres. Subsequent versions built on this foundation: Version 2 in June 1990 introduced a redesigned rule system, while Version 3 in 1991 added support for multiple storage managers and an enhanced query executor. A major rewrite came in 1994, when graduate students Andrew Yu and Jolly Chen integrated an SQL interpreter, replacing PostQUEL and resulting in the Postgres95 release, which improved performance by 30–50% over prior versions.

Early development faced significant challenges, including reliance on academic funding, which constrained resources and scalability. The growing maintenance burden from user support further strained the team, leading to the official end of the Berkeley POSTGRES project with Version 4.2 in 1994. To facilitate broader adoption, the code was released under a permissive license—initially a variant of the MIT License for Version 4.2—transitioning from academic distribution to community-driven development. In 1996, the project was renamed PostgreSQL to reflect its SQL compatibility, with version numbering restarting at 6.0, marking the shift to an independent open-source effort.

Following its open-source release and renaming to PostgreSQL in 1996, the database began gaining adoption in web development stacks during the late 1990s and 2000s. It emerged as an alternative to MySQL within the popular LAMP stack (Linux, Apache, MySQL, PHP/Perl/Python), leading to the creation of the LAPP stack (Linux, Apache, PostgreSQL, PHP/Perl/Python). While MySQL was favored for its simplicity and performance in basic web applications, PostgreSQL offered advanced features such as superior SQL standards compliance, extensibility, and support for complex queries and data types, making it suitable for more demanding applications.

Release History

The PostgreSQL Global Development Group oversees the project's development through a community-driven process that includes quarterly commit fests, where contributors submit, review, and refine patches for upcoming major releases. Major versions are released annually, typically in September, following a nine-month development cycle that culminates in beta testing and final stabilization.

The first major public release, PostgreSQL 6.0, arrived on January 29, 1997, marking the system's transition to full SQL compliance and its renaming from Postgres95 to PostgreSQL. Subsequent releases built incrementally on this foundation; for instance, version 7.0, released May 8, 2000, introduced significant improvements to Multi-Version Concurrency Control (MVCC), enhancing transaction isolation and performance. In 2005, PostgreSQL 8.1, released November 8, brought autovacuum into the core system, automating table maintenance to prevent transaction ID wraparound without manual intervention. A pivotal advancement came with PostgreSQL 9.0 on September 20, 2010, which added built-in streaming replication for high-availability setups, allowing continuous shipping of write-ahead log data between primary and standby servers.

More recent versions have focused on replication and maintenance efficiencies: PostgreSQL 16, released September 14, 2023, enhanced logical decoding by enabling it on standby servers, supporting active-active configurations and reducing primary load. PostgreSQL 17, released September 26, 2024, improved vacuum operations with a new memory structure that reduces memory consumption by up to 20 times, accelerating cleanup and enabling better handling of large datasets. The latest major release, PostgreSQL 18 on September 25, 2025, introduced performance optimizations including asynchronous I/O for sequential scans and vacuums, parallel GIN index builds for faster JSONB querying, and expanded parallelism in query execution.

PostgreSQL follows a consistent support lifecycle: each major version receives five years of full support with regular minor releases for bug fixes and security updates, followed by a final minor release before end-of-life (EOL). For example, version 13, released September 24, 2020, reaches EOL in November 2025, after which no further updates will be provided. This policy ensures stability for production environments while encouraging timely upgrades.

Core Architecture

Storage Model

PostgreSQL employs a physical storage model based on fixed-size pages, typically 8 kB, to organize database files on disk. This page-based approach allows efficient I/O and management of data, with each database cluster stored in a directory containing subdirectories for per-database files, global objects, tablespaces, and write-ahead log (WAL) segments. Tables and indexes are represented as heap files, where data is stored in segments that split when exceeding 1 GB, ensuring scalability for large relations.

The core of heap file organization consists of pages filled with tuples (rows), each preceded by item pointers that reference their location and length within the page. A page header tracks free space, the number of items, WAL-related metadata, and a checksum for data integrity (when enabled, which is the default for new clusters since PostgreSQL 18), while line pointers (ItemIdData structures) form an array pointing to tuple positions, enabling compact storage without fixed offsets. Heap tuples include a header with transaction identifiers for visibility (supporting MVCC, as detailed in the Concurrency Control section), a null bitmap if needed, and the user data payload. To handle oversized attributes that could exceed page limits, PostgreSQL uses TOAST (The Oversized-Attribute Storage Technique), which compresses or externalizes large varlena data types into a separate TOAST table, storing chunks up to approximately 2 kB each, with a total limit of 1 GB per value.

For efficient space management, PostgreSQL maintains a Free Space Map (FSM) for each heap and index relation, stored in a dedicated fork file with a _fsm suffix. The FSM is structured as a three-level tree of pages, where leaf nodes use one byte per heap page to encode available free space (categorized as full, mostly full, mostly empty, or empty), and upper levels aggregate the maximum values from children to quickly locate insertion points. Complementing this, the Visibility Map (VM), stored in a _vm fork, uses two bits per heap page to indicate if all tuples are visible to every transaction (all-visible bit) or fully frozen against transaction ID wraparound (all-frozen bit), optimizing vacuum operations and enabling index-only scans by skipping unnecessary heap fetches.

Tablespaces provide a logical abstraction for mapping database objects to physical storage locations, allowing administrators to distribute files across disks for performance or capacity reasons. Created via the CREATE TABLESPACE command, they point to user-specified directories (e.g., SSDs for high-I/O tables), with the default tablespaces pg_default and pg_global handling user databases and shared catalogs, respectively. Objects like tables can be assigned to specific tablespaces during creation, facilitating I/O optimization without altering the underlying page-based layout.

Write-Ahead Logging (WAL) integrates deeply with the storage model to ensure ACID properties, particularly durability, by recording all changes sequentially in pre-allocated 16 MB segment files before applying them to heap or index pages. This approach minimizes random I/O during transactions, as WAL writes are append-only and flushed durably, deferring data page updates until checkpoints. Checkpoints, triggered periodically (default every 5 minutes via checkpoint_timeout) or when WAL reaches max_wal_size (default 1 GB), flush dirty pages to disk and record the redo point in WAL, spreading I/O load over time with checkpoint_completion_target set to 0.9 by default.

In crash recovery, the system replays WAL from the last checkpoint's redo location, reapplying committed changes to reconstruct the database state, with segment recycling preventing unbounded growth. This WAL-driven recovery mechanism supports efficient restarts, typically completing in seconds to minutes depending on the interval since the last checkpoint.
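
A few introspection commands that expose these storage details; my_table is hypothetical:

SHOW block_size;           -- page size, typically 8192 bytes
SHOW checkpoint_timeout;   -- default 5min
SHOW max_wal_size;         -- default 1GB
SELECT pg_relation_filepath('my_table');  -- main heap file on disk
SELECT pg_relation_size('my_table', 'fsm') AS fsm_bytes,
       pg_relation_size('my_table', 'vm')  AS vm_bytes;  -- per-fork sizes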

Concurrency Control

PostgreSQL employs (MVCC) as its primary mechanism for managing concurrent access to , allowing multiple transactions to read and write simultaneously without blocking each other for read operations. In this model, each row in a table maintains multiple versions rather than overwriting the original , enabling readers to access a consistent snapshot of the database without acquiring locks on the being read. This approach contrasts with traditional locking systems by prioritizing non-blocking reads, which enhances performance in read-heavy workloads while ensuring the properties of transactions. At the core of MVCC is row versioning, where every row includes two key fields: xmin, the transaction ID (XID) of the transaction that inserted or last updated the row, and xmax, the XID of the transaction that deleted or updated it. When a transaction begins, it receives a snapshot consisting of the current active transaction IDs and the IDs of recently committed transactions, which determines row visibility based on comparisons with xmin and xmax. For instance, a row is visible to a transaction if its xmin is committed before the snapshot and its xmax is either absent or from a transaction after the snapshot. This snapshot isolation ensures that each transaction operates on a consistent view of the database as of its start time, preventing dirty reads and allowing writers to proceed without waiting for readers. PostgreSQL supports three transaction isolation levels—Read Committed, Repeatable Read, and Serializable—that build upon MVCC to control the degree of consistency and anomaly prevention. The default Read Committed level uses MVCC to ensure each SQL statement sees only committed data from before the query starts, but it permits nonrepeatable reads and phantom reads within a transaction since each statement gets a fresh snapshot. Repeatable Read, achieved through a single snapshot taken at the transaction's first non-transaction-control statement, provides snapshot isolation that blocks dirty and nonrepeatable reads but may allow anomalies, requiring application-level retries in some cases. Serializable offers the highest isolation by emulating serial transaction execution, using MVCC combined with predicate locking to detect and prevent all concurrency anomalies, including phantoms; it tracks access dependencies without traditional table locks and signals failures via errors when conflicts arise. To maintain storage efficiency and prevent issues from accumulated row versions, PostgreSQL relies on the vacuuming process, which reclaims space from dead tuples—obsolete row versions no longer visible to any transaction. The autovacuum daemon, enabled by default, automates this by launching worker processes that periodically scan tables, mark dead tuples for reuse, and update query planner statistics via ANALYZE. A critical aspect of vacuuming is preventing transaction ID wraparound, where XIDs, being 32-bit values, could cycle and cause ; autovacuum enforces freezing of old rows by assigning them a special FrozenTransactionId during regular vacuums, triggered when a table's oldest unfrozen XID approaches the threshold (default 200 million transactions). 
Failure to perform timely anti-wraparound vacuuming can leave very old unfrozen tuples behind, resulting in errors such as "found xmin from before relfrozenxid" (indicating tuples with xmin older than the relation's relfrozenxid) and "MultiXactId has not been created yet -- apparent wraparound" (signaling MultiXact ID issues), both of which point to transaction ID wraparound problems and potential database inconsistency. Since PostgreSQL 18, normal vacuums can also eagerly freeze some all-visible pages to proactively manage freezing, controlled by the vacuum_max_eager_freeze_failure_rate parameter (default 0.05), further guarding against wraparound issues. This process is essential to MVCC, as it balances concurrency benefits with disk space management without interrupting ongoing transactions.

For write operations, PostgreSQL uses its lock manager to coordinate access, acquiring table-level locks such as ROW EXCLUSIVE for INSERT, UPDATE, and DELETE, while leveraging page-level share and exclusive locks to control read/write access to pages in the shared buffer pool. Notably, SELECT statements acquire no row-level locks, allowing them to proceed concurrently with writes via MVCC visibility rules, though writes may contend for shared buffers during page modifications. The lock manager maintains these locks in memory, supporting fine-grained row-level locking to minimize contention. Deadlock detection is handled automatically by the lock manager, which checks for cycles in the wait graph once a lock acquisition has been blocked for longer than the deadlock_timeout parameter (default 1 second). Upon detecting a deadlock—such as two transactions waiting on each other's row locks—PostgreSQL resolves it by aborting one of the involved transactions, rolling it back and releasing its locks so the others can proceed; exactly which transaction is aborted is difficult to predict and should not be relied upon. Applications should implement retry logic for aborted transactions to handle such resolutions gracefully.
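
The MVCC machinery described above can be observed directly from SQL. A minimal sketch (the accounts table is a hypothetical example):

  -- Inspect MVCC row versions via system columns
  SELECT xmin, xmax, ctid, * FROM accounts LIMIT 5;

  -- Monitor transaction ID age per database to gauge wraparound risk
  SELECT datname, age(datfrozenxid) AS xid_age
  FROM pg_database ORDER BY xid_age DESC;

  -- Request stricter isolation for a single transaction
  BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
  -- ... all statements here operate on one snapshot ...
  COMMIT;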

Data Definition

Data Types

PostgreSQL provides a rich set of built-in data types that extend beyond standard SQL, supporting both primitive and advanced structures for versatile data storage and manipulation. These types are designed to handle a wide range of applications, from simple numerical computations to complex geospatial and semi-structured data processing.

Among the primitive types, PostgreSQL offers robust numeric support, including fixed-precision integers such as smallint (2 bytes, range -32,768 to 32,767), integer (4 bytes, range -2,147,483,648 to 2,147,483,647), and bigint (8 bytes, range -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807). For floating-point numbers, it includes real (single precision, approximately 6 decimal digits) and double precision (approximately 15 decimal digits), both of which can represent special values like NaN (Not a Number) and infinity. Exact decimal arithmetic is handled by the numeric type, which allows user-defined precision and scale (e.g., numeric(10,2) for 10 total digits with 2 after the decimal point). Character types encompass character(n) for fixed-length strings padded with spaces, varchar(n) for variable-length strings up to n characters, and text for unlimited variable-length strings. Date and time types include timestamp for storing date and time values (with or without time zone) and interval for representing time spans, such as '1 day 2 hours'.

Advanced types in PostgreSQL enable handling of specialized data. Geometric types support spatial data, with point storing x,y coordinates (e.g., '(1,2)') and circle representing a center point and radius (e.g., '((0,0),5)'). Network types include inet for IP addresses (e.g., '192.168.1.0') and cidr for IP networks (e.g., '192.168.1.0/24'). For semi-structured data, JSON types consist of json (textual storage) and jsonb (binary format for efficient processing and indexing support). Additionally, PostgreSQL includes uuid for universally unique identifiers (e.g., '550e8400-e29b-41d4-a716-446655440000') and xml for XML documents.

PostgreSQL supports array types for collections of the same element type, including multidimensional arrays such as integer[][] for a 2D array. For example, a 2x3 array can be declared as integer[2][3] and initialized with values like {{1,2,3},{4,5,6}}, though PostgreSQL does not enforce the declared dimensions. Arrays can be of any built-in or user-defined type and support operations like indexing and slicing.

A key strength of PostgreSQL is its extensibility for custom data types via the CREATE TYPE command. This allows creation of composite types (e.g., CREATE TYPE complex AS (r float8, i float8); for a pair of real and imaginary components), domain types (constraints on existing types), and enum types (ordered sets of values, e.g., CREATE TYPE mood AS ENUM ('sad', 'ok', 'happy');). Custom types can define associated operators and functions to integrate seamlessly with the query engine.
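
A brief sketch pulling several of these types together; all table and column names are hypothetical:

  CREATE TYPE mood AS ENUM ('sad', 'ok', 'happy');
  CREATE TYPE complex AS (r float8, i float8);

  CREATE TABLE measurements (
      id          uuid        DEFAULT gen_random_uuid(),  -- built in since PostgreSQL 13
      taken_at    timestamptz NOT NULL,
      duration    interval,
      reading     numeric(10,2),
      location    point,
      client_addr inet,
      tags        text[],                                 -- one-dimensional array
      grid        integer[][],                            -- multidimensional array
      payload     jsonb,
      feeling     mood,
      value       complex
  );

  INSERT INTO measurements (taken_at, reading, tags, grid, payload, feeling)
  VALUES (now(), 12.34, ARRAY['a','b'], '{{1,2,3},{4,5,6}}', '{"k": 1}', 'happy');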

Schemas and User-Defined Objects

In PostgreSQL, schemas serve as namespaces that organize database objects such as tables, views, sequences, and functions into logical groups, enabling multiple users to share a database without naming conflicts. Each schema acts as a container where objects must have unique names within it, but the same name can be reused across different schemas, facilitating modular organization and isolation for applications or users. Schemas are created using the CREATE SCHEMA command, which can specify an owner via the AUTHORIZATION clause, and by default, every new database includes a public schema accessible to all users. The search_path parameter determines the order in which PostgreSQL looks for unqualified object names, defaulting to a schema named after the current user (if one exists) followed by public, allowing flexible resolution without always prefixing schema names. For instance, setting search_path to a custom schema followed by public ensures that objects in the custom schema take precedence. Privileges on schemas include USAGE, which grants access to contained objects, and CREATE, which allows new objects to be added; the public schema historically granted both to all users, though PostgreSQL 15 revoked its default CREATE privilege and the remainder can be revoked for tighter control.

Tables in PostgreSQL are the primary structures for storing relational data and are defined using the CREATE TABLE command, specifying columns with their data types and optional constraints. Constraints ensure data integrity: a primary key constraint enforces uniqueness and non-null values on one or more columns, automatically creating a unique index; a foreign key constraint links a column to a primary key or unique constraint in another table, maintaining referential integrity with options like ON DELETE CASCADE; and check constraints validate column values against a Boolean expression, such as ensuring a price exceeds zero. Tables can be created within specific schemas by qualifying the name, defaulting to the search_path's first schema if unspecified.

Views provide a virtual representation of query results without storing data physically, defined via CREATE VIEW as a SELECT query that executes on each reference, supporting updatability for simple cases like single-table queries without aggregates. Materialized views, in contrast, store query results in a table-like structure for faster access, created with CREATE MATERIALIZED VIEW and requiring explicit refresh via REFRESH MATERIALIZED VIEW to update contents, making them suitable for performance optimization in read-heavy scenarios. Both views and materialized views inherit privileges similar to tables, with SELECT allowing reads and additional privileges like INSERT for updatable views.

Sequences generate unique, sequential numbers, often for primary keys, implemented as special single-row tables managed by functions like nextval and currval. The SERIAL pseudo-type simplifies auto-increment by creating an integer column with a default drawn from an owned sequence, available in smallserial (2-byte), serial (4-byte), and bigserial (8-byte) variants, though sequences may have gaps due to transaction rollbacks. For SQL standard compliance, GENERATED AS IDENTITY creates an identity column tied to an implicit sequence, with ALWAYS mode preventing overrides and BY DEFAULT allowing manual inserts, ensuring automatic NOT NULL behavior but requiring explicit uniqueness constraints. Sequence privileges include USAGE for access and UPDATE for advancing the value.

Table inheritance allows a child table to inherit columns from one or more parent tables, creating hierarchies for modeling related data, such as a cities table inheriting from a general locations table.
Single inheritance links a child to one parent, while multiple inheritance combines columns from several; columns inherited under the same name from different parents must have matching types and are merged into a single column. Queries on the parent by default include child data unless ONLY is specified. This feature supports partitioning-like structures but does not propagate indexes, unique constraints, or foreign keys to children, and INSERT operations do not automatically route to appropriate children. PostgreSQL tracks dependencies between objects to prevent invalidation, such as refusing to drop a table referenced by a view unless CASCADE is used, which recursively removes the dependent objects along with it. Privileges on user-defined objects are granular: tables support SELECT, INSERT, UPDATE, DELETE, and REFERENCES, while sequences and views align with their usage patterns, all managed via GRANT and REVOKE to control access beyond schema-level permissions. The pg_class system catalog stores metadata for these objects, including tables, views, sequences, and materialized views, with details like relation kind (relkind), containing schema (relnamespace), and estimated row count (reltuples), serving as the foundational reference for querying object properties.
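
The following sketch exercises these objects together—schema, constraints, views, and an identity column. All names are hypothetical, and the app_owner role is assumed to exist:

  CREATE SCHEMA app AUTHORIZATION app_owner;
  SET search_path TO app, public;

  CREATE TABLE app.products (
      id    integer GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
      name  text NOT NULL,
      price numeric CHECK (price > 0)
  );

  CREATE TABLE app.orders (
      id         bigserial PRIMARY KEY,
      product_id integer REFERENCES app.products (id) ON DELETE CASCADE,
      qty        integer NOT NULL
  );

  CREATE VIEW app.order_lines AS
      SELECT o.id, p.name, o.qty
      FROM app.orders o JOIN app.products p ON p.id = o.product_id;

  CREATE MATERIALIZED VIEW app.daily_totals AS
      SELECT product_id, sum(qty) AS total FROM app.orders GROUP BY product_id;
  REFRESH MATERIALIZED VIEW app.daily_totals;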

Query Processing

Standards Compliance

PostgreSQL strives for high conformance with the SQL standard, targeting the latest iteration, SQL:2023, where it implements approximately 170 of the 177 mandatory features in the Core specification. No database management system currently claims full conformance to the SQL:2023 Core, but PostgreSQL covers significant portions of key standard parts, including Part 1 (Framework), Part 2 (Foundation), Part 9 (Management of External Data), Part 11 (Schemata), and Part 14 (XML). This level of support ensures robust handling of core SQL syntax and semantics, such as data definition, manipulation, and query operations, while providing a portable subset of SQL that works across compliant systems.

Among advanced features, PostgreSQL offers full support for Common Table Expressions (CTEs) as defined in SQL:1999, enabling modular query construction, and recursive queries, also from SQL:1999, for hierarchical data traversal. Window functions, introduced in SQL:2003 and enhanced in later standards, are comprehensively implemented, including advanced frame specification features such as the GROUPS frame unit (defining frames based on groups of peer rows with equivalent ORDER BY values) and frame exclusion clauses (EXCLUDE CURRENT ROW, EXCLUDE TIES, EXCLUDE GROUP, EXCLUDE NO OTHERS), which allow fine-grained control over which rows are included in the window frame. These features are absent in MySQL versions 8.0 through 8.4, which support only ROWS and RANGE frame units without exclusion options. However, support for some optional advanced features remains partial, such as certain aspects of SQL/MED for heterogeneous data access. Full-text search capabilities, while not part of the core SQL standard, are integrated via the tsvector and tsquery data types, which extend standard string matching for efficient text indexing and querying.

PostgreSQL includes several non-standard extensions that enhance usability but deviate from strict ANSI/ISO SQL, such as the ILIKE operator for case-insensitive pattern matching, which builds on the standard LIKE but adds portability challenges. Array operations provide multidimensional support for handling complex data structures, exceeding the basic array features in SQL:1999 and offering capabilities like indexing and slicing not found in the standard. Compared to other systems, PostgreSQL exhibits greater adherence to SQL standards than MySQL, which often prioritizes performance over full compliance in areas like subquery handling and joins, while differing from Oracle Database in syntax for procedural elements and proprietary extensions such as PL/SQL.

For interoperability, PostgreSQL complies with SQL/CLI (Part 3, Call-Level Interface) through its official ODBC driver, enabling standard connectivity for applications using ODBC-compliant tools. Similarly, the JDBC driver supports standard SQL queries via Java's Database Connectivity API, facilitating seamless integration with enterprise environments while preserving SQL standard portability. These interfaces ensure that core SQL:2023 features execute consistently across PostgreSQL and other compliant databases, though extensions like arrays may require vendor-specific handling.
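
An illustrative query using the GROUPS frame unit with frame exclusion, plus a recursive CTE; the scores and staff tables are hypothetical:

  -- Average of neighboring peer groups, excluding the current row
  SELECT player, score,
         avg(score) OVER (
             ORDER BY score
             GROUPS BETWEEN 1 PRECEDING AND 1 FOLLOWING
             EXCLUDE CURRENT ROW
         ) AS peer_avg
  FROM scores;

  -- Recursive CTE (SQL:1999) for hierarchical traversal
  WITH RECURSIVE subordinates AS (
      SELECT id, manager_id FROM staff WHERE id = 1
      UNION ALL
      SELECT s.id, s.manager_id
      FROM staff s JOIN subordinates x ON s.manager_id = x.id
  )
  SELECT * FROM subordinates;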

Procedural Languages and Extensions

PostgreSQL supports procedural languages that extend its SQL capabilities by allowing users to write functions, procedures, and triggers in languages beyond plain SQL, enabling complex computations and control structures directly within the database server. These languages are installed using the CREATE LANGUAGE command, which loads a handler—a function that interprets and executes the procedural code. The standard distribution includes four such languages: PL/pgSQL, PL/Tcl, PL/Perl, and PL/Python, each providing different paradigms for server-side scripting.

PL/pgSQL is a block-structured procedural language designed specifically for PostgreSQL, created so that functions and procedures can use all user-defined types, operators, and functions while adding control structures like loops and conditionals to SQL. It supports cursors for processing query results row by row and error handling via BEGIN ... EXCEPTION blocks, making it suitable for complex data manipulations. Similar to Oracle's PL/SQL, PL/pgSQL reduces client-server round trips by executing grouped computations server-side, improving performance and leveraging PostgreSQL's native data types and operators. Other procedural languages in the standard distribution include PL/Tcl, which embeds Tcl scripting for procedural extensions; PL/Perl, which integrates Perl for text processing and dynamic code; and PL/Python, which allows Python-based functions for advanced computations and integration with Python libraries. These languages are created via CREATE LANGUAGE, such as CREATE LANGUAGE plpython3u;, and can be used to define custom functions that interact with database objects.

Procedural languages are classified as trusted or untrusted based on security restrictions. Trusted languages, such as PL/pgSQL and the default PL/Perl (plperl), limit operations such as file system access, module loading, or external connections to prevent unprivileged users from compromising the server. Untrusted variants, such as plperlu or plpython3u, permit full language capabilities but are restricted to superusers for function creation, as they can execute arbitrary code with the privileges of the database server process. This distinction ensures that non-superusers can safely use trusted languages without risking server security.

Extensions in PostgreSQL add modular functionality through the CREATE EXTENSION command, which installs predefined packages of SQL objects like functions, data types, and operators into the database. For example, the PostGIS extension provides spatial data types and functions for geographic information systems, while pg_trgm adds trigram-based operators for efficient fuzzy text matching. An extension can be installed only once per database and installation often requires superuser privileges, though trusted extensions can be installed by users with appropriate object creation rights; extensions support atomic installation and removal via DROP EXTENSION.

User-defined functions in procedural languages or SQL can be created with CREATE FUNCTION, supporting aggregates for grouped computations, window functions for ordered analytics (marked with the WINDOW attribute), and custom operators defined via function logic. PostgreSQL's window function support includes advanced frame specifications in the OVER clause, such as the GROUPS frame unit (defining frames based on groups of peer rows with equivalent ORDER BY values) and frame exclusion clauses (EXCLUDE CURRENT ROW, EXCLUDE TIES, EXCLUDE GROUP, EXCLUDE NO OTHERS), which provide fine-grained control over the window frame; these features are absent in MySQL up to version 8.4, which supports only ROWS and RANGE frame units without exclusion options.
Functions are classified by volatility: IMMUTABLE for those always returning the same output for identical inputs without side effects, allowing query optimizer constant folding; STABLE for consistent results within a query but potentially varying across statements; and VOLATILE for functions like random() that may change outputs unpredictably. Security modes include SECURITY INVOKER (default), executing with the caller's privileges, and SECURITY DEFINER, running with the owner's privileges to enable controlled privilege escalation, though it requires a secure search_path to avoid injection risks.
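
A minimal PL/pgSQL sketch showing block structure, exception handling, and explicit volatility and security attributes; the accounts table is hypothetical:

  CREATE FUNCTION transfer(src int, dst int, amount numeric)
  RETURNS void
  LANGUAGE plpgsql
  VOLATILE            -- modifies data, so it cannot be IMMUTABLE or STABLE
  SECURITY INVOKER    -- runs with the caller's privileges (the default)
  AS $$
  BEGIN
      UPDATE accounts SET balance = balance - amount WHERE id = src;
      UPDATE accounts SET balance = balance + amount WHERE id = dst;
  EXCEPTION
      WHEN others THEN
          RAISE NOTICE 'transfer failed: %', SQLERRM;
          RAISE;  -- re-throw so the transaction aborts
  END;
  $$;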

Advanced Query Features

Triggers and Rules

PostgreSQL provides two primary mechanisms for event-driven behavior: triggers and rules. Triggers allow the automatic execution of functions in response to modification operations on tables, views, or foreign tables, enabling actions such as validation, auditing, or cascading updates. Rules, in contrast, operate at the query rewrite stage to modify or augment SQL statements before they reach the planner and executor, facilitating features like view updates and query redirection without direct function calls. These features support complex database behaviors while integrating with PostgreSQL's procedural extensions for function logic.

Triggers are defined using the CREATE TRIGGER command and fire on specific events: INSERT, UPDATE (optionally limited to certain columns), DELETE, or TRUNCATE. They can be classified by timing—BEFORE (executing prior to the operation, allowing row modification or skipping), AFTER (executing post-operation, with options for deferral in constraint triggers), or INSTEAD OF (replacing the operation entirely, supported only on views)—and by granularity: row-level (FOR EACH ROW, firing once per affected row) or statement-level (FOR EACH STATEMENT, firing once per SQL statement). TRUNCATE triggers are exclusively statement-level. Trigger functions, typically written in procedural languages such as PL/pgSQL, must return type trigger and take no arguments; they access row data via special variables like NEW (for inserted or updated rows) and OLD (for updated or deleted rows).

Advanced trigger features include transition tables, which capture entire sets of NEW or OLD rows for AFTER triggers at both row and statement levels, referenced via the REFERENCING clause (e.g., REFERENCING NEW TABLE AS new_rows). Conditional firing is enabled by an optional WHEN clause specifying a Boolean expression, which skips the trigger if false, though this is unavailable for INSTEAD OF triggers. Recursive trigger firing is permitted but unbounded by default, requiring careful design to avoid infinite loops in cascading scenarios; for instance, a BEFORE trigger on a table can modify the same row, potentially leading to repeated firings if not conditioned properly. Triggers require the TRIGGER privilege on the target object and EXECUTE on the function.

Common use cases for triggers include auditing changes by logging OLD and NEW values in an audit table via an AFTER trigger, or enforcing cascading deletes across related tables with an AFTER DELETE trigger that issues additional DELETE statements. BEFORE triggers are often used for data normalization, such as automatically setting timestamps on INSERT. However, developers must guard against performance overhead in high-volume operations, as row-level triggers scale linearly with affected rows.

The rule system, implemented via CREATE RULE, rewrites incoming queries at parse time to alter their behavior, such as substituting actions or generating additional queries. Rules apply to SELECT, INSERT, UPDATE, or DELETE on tables or views and can be conditional via a WHERE expression; they do not support TRUNCATE, but a rule's action can run instead of or in addition to the original query, emulating complex behaviors. For example, an ON INSERT rule on a view might rewrite the statement to target an underlying table, enabling updatable views. Logging can be achieved by appending an INSERT into a log table within the rule's action. When multiple rules apply to the same table and event, they are applied in alphabetical order by rule name.
Use cases for rules include maintaining materialized views by rewriting SELECTs to union base table queries or implementing transparent data redirection, such as routing inserts to partitions. However, the rule system is noted for its complexity, with subtle interactions in multi-rule scenarios potentially leading to unexpected rewrites or privilege issues; PostgreSQL documentation advises caution and recommends triggers for most modern applications due to rules' historical origins and maintenance challenges. Rules can interact with access privileges, sometimes bypassing default controls unless explicitly managed. Triggers and rules differ fundamentally in execution: triggers invoke functions synchronously per row or statement during operation execution, allowing procedural logic, whereas rules perform declarative query rewriting once per statement at an earlier parse phase, potentially generating multiple queries without per-row processing. For instance, deleting 2000 rows via a trigger might execute 2000 function calls, each scanning indexes, while a rule could issue a single bulk delete, offering better performance for large operations but risking complexity in joins or conditions. PostgreSQL favors triggers for their simplicity and predictability in contemporary usage.
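
A sketch of the audit-trigger pattern described above, assuming hypothetical accounts and accounts_audit tables:

  CREATE TABLE accounts_audit (
      changed_at timestamptz DEFAULT now(),
      old_row    jsonb,
      new_row    jsonb
  );

  CREATE FUNCTION audit_accounts() RETURNS trigger
  LANGUAGE plpgsql AS $$
  BEGIN
      INSERT INTO accounts_audit (old_row, new_row)
      VALUES (to_jsonb(OLD), to_jsonb(NEW));
      RETURN NEW;  -- ignored for AFTER triggers, but conventional
  END;
  $$;

  CREATE TRIGGER accounts_audit_trg
      AFTER UPDATE ON accounts
      FOR EACH ROW
      WHEN (OLD.* IS DISTINCT FROM NEW.*)  -- conditional firing: skip no-op updates
      EXECUTE FUNCTION audit_accounts();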

Asynchronous Notifications

PostgreSQL's asynchronous notification system, implemented via the LISTEN and NOTIFY commands, allows client applications to receive real-time alerts about database events without constant polling. This mechanism facilitates decoupled communication between database sessions, enabling efficient event-driven architectures. Introduced in early versions and refined over time, it operates at the session level within a single database, supporting interprocess signaling for various applications.

The core protocol begins with the LISTEN command, which registers the current session as a subscriber to a named channel—any valid SQL identifier serving as the channel name. Multiple LISTEN calls for the same channel are idempotent, and subscriptions take effect only after the transaction commits. To unsubscribe, the UNLISTEN command is used, either for a specific channel or all channels with UNLISTEN *. Sessions automatically unlisten upon disconnection. When a NOTIFY command is executed on a channel, it queues a notification event—including the channel name, the sending server process ID, and an optional payload string—for delivery to all active listeners on that channel. Notifications are deferred until the issuing transaction commits, and multiple NOTIFY calls within the same transaction to the same channel are coalesced into a single event to avoid redundancy. The payload, limited to 8000 bytes by default, can carry arbitrary data such as JSON-serialized details about the event. Delivery occurs asynchronously with no guarantees; if a listener disconnects before processing, the notification is lost, and there is no built-in retry mechanism. The notification queue is capped at 8 GB by default to prevent resource exhaustion.

Clients receive notifications through polling mechanisms provided by PostgreSQL's client libraries. In libpq, the C library, applications call PQnotifies() after processing query results to retrieve pending PGnotify structures containing the channel, process ID, and payload; this requires consuming input via PQconsumeInput() and monitoring the connection socket with select() for efficiency. Similar polling interfaces exist in other drivers, such as psycopg for Python. For programmatic sending, the built-in pg_notify(text, text) function offers a convenient alternative to the NOTIFY SQL command, accepting the channel and payload directly. Notifications can be triggered automatically by combining NOTIFY with database triggers, where an AFTER INSERT, UPDATE, or DELETE trigger on a table executes NOTIFY to signal changes, powering event-driven applications that respond to data modifications.

Common use cases include cache invalidation, where NOTIFY alerts application caches to refresh stale data upon updates, reducing unnecessary database queries and improving performance. It also supports real-time updates in collaborative tools or dashboards, notifying connected clients of new rows or changes instantly. For broader integration, LISTEN/NOTIFY connects to external systems, such as daemons that relay events to endpoints for browser push notifications, often via client libraries or dedicated extensions like pgws. These extensions bridge PostgreSQL events to protocols like WebSockets, enabling scalable real-time features in web applications without external message brokers.

PostgreSQL's native full text search enables efficient searching of large volumes of natural language text, supporting operations like stemming, ranking, and fuzzy matching for applications such as document retrieval or entity searches (e.g., restaurants by name or description).
Text is converted to a searchable tsvector representation using the to_tsvector function, which tokenizes input, removes stop words, and applies stemming based on configured dictionaries (e.g., English for morphological variants). Queries are parsed into tsquery format via to_tsquery and matched against tsvector values with the @@ operator, with support for prefix searches (:*), ranking (ts_rank), and headline generation (ts_headline). For performance, GIN or GiST indexes accelerate lookups on tsvector columns, enabling fast retrieval even on large datasets without full table scans.
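
A combined sketch of trigger-driven notifications and indexed full text search; the tasks and docs tables and the events channel are hypothetical:

  -- Alert listeners whenever rows change
  CREATE FUNCTION notify_change() RETURNS trigger
  LANGUAGE plpgsql AS $$
  BEGIN
      PERFORM pg_notify('events',
                        json_build_object('id', NEW.id, 'op', TG_OP)::text);
      RETURN NEW;
  END;
  $$;

  CREATE TRIGGER tasks_notify AFTER INSERT OR UPDATE ON tasks
      FOR EACH ROW EXECUTE FUNCTION notify_change();

  -- In a listening session:
  LISTEN events;

  -- Full text search: indexed lookup with ranking
  CREATE INDEX docs_fts_idx ON docs USING gin (to_tsvector('english', body));

  SELECT id, ts_rank(to_tsvector('english', body), q) AS rank
  FROM docs, to_tsquery('english', 'postgres & replic:*') AS q
  WHERE to_tsvector('english', body) @@ q
  ORDER BY rank DESC;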

Storage Optimization

Indexes

PostgreSQL supports several index types to accelerate query execution by providing efficient access paths to table data. These indexes are non-clustered by default, meaning they maintain a separate structure pointing to table rows without altering the physical order of the data. The choice of index type depends on the query patterns, such as equality checks, range scans, or specialized operations like spatial searches.

B-tree indexes serve as the default and most versatile option in PostgreSQL, suitable for a wide range of queries including equality (=), range comparisons (<, <=, >=, >), IS NULL, IS NOT NULL, and anchored pattern matching with LIKE or ILIKE. They support multicolumn indexing for composite keys, enabling sorted retrieval in ascending or descending order, and handle NULL values naturally without requiring special treatment. B-trees are self-balancing and efficient for both random and sequential access, making them ideal for general-purpose indexing on numeric, string, or timestamp columns. Since PostgreSQL 18, B-tree indexes support skip scans, which allow efficient querying when leading columns in a composite index are not selective, improving performance for certain multi-column queries.

For equality-only lookups, hash indexes store 32-bit hash codes of key values, offering faster performance than B-trees for exact matches but lacking support for range queries or sorting. GiST (Generalized Search Tree) indexes provide a flexible framework for complex data types, supporting geometric and spatial operations such as bounding-box overlap (&&), containment (@>), and nearest-neighbor searches, commonly used in GIS applications. SP-GiST (space-partitioned GiST) indexes support partitioned search trees for non-balanced structures like quadtrees and k-d trees, suitable for hierarchical data such as geometric points, polygons, or IP addresses, enabling operations like nearest-neighbor searches with lower storage overhead than GiST for certain datasets. GIN (Generalized Inverted Index) indexes excel with composite values like arrays or jsonb documents, enabling overlap (&&), containment (@>), and membership queries (<@) on multi-valued attributes; since PostgreSQL 18, GIN index creation supports parallelism for faster builds on large datasets. BRIN (Block Range Index) indexes are lightweight and suited for very large tables with naturally ordered data, storing summary information per block range to support range queries efficiently while minimizing storage overhead.

Indexes are created using the CREATE INDEX command, which by default builds a B-tree index on one or more columns: CREATE INDEX index_name ON table_name (column_name);. The CONCURRENTLY option allows index construction without blocking table writes, performing two scans of the table to capture changes during the build process, which is essential for production environments to avoid downtime. Partial indexes target a subset of rows via a WHERE clause, such as CREATE INDEX ON table_name (column_name) WHERE condition;, reducing size and improving selectivity for conditional queries. Expression indexes apply to computed values, like CREATE INDEX ON table_name ((upper(column_name)));, accelerating queries on functions or transformations of data. The UNIQUE qualifier enforces uniqueness constraints, integrating indexing with data integrity. Since PostgreSQL 18, non-B-tree unique indexes (such as hash) can be used as partition keys or in materialized views.
PostgreSQL optimizes query execution through advanced index access methods, including index-only scans, where the query planner retrieves all necessary data directly from the index structure without accessing the heap table, provided the visibility map confirms that the underlying pages contain only tuples visible to all transactions. Bitmap index scans combine results from multiple indexes by creating a bitmap of matching row locations, which is particularly efficient for queries with several restrictive conditions, as it minimizes random I/O by merging bitmaps before fetching rows. Index maintenance is handled via the REINDEX command, which rebuilds corrupted, bloated, or inefficient indexes to restore performance, either for a single index (REINDEX INDEX index_name;) or the entire database. Unlike some database systems with automatic clustered indexes that maintain physical row order on inserts and updates, PostgreSQL does not support persistent clustering; instead, the CLUSTER command performs a one-time reordering of table rows based on a specified index (CLUSTER table_name USING index_name;), which can improve sequential scan performance for range queries but requires manual reapplication after significant data changes.
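
Representative index definitions corresponding to the types above; all table, column, and index names are placeholders:

  CREATE INDEX CONCURRENTLY orders_created_idx ON orders (created_at);         -- B-tree, no write lock
  CREATE INDEX orders_open_idx ON orders (customer_id) WHERE status = 'open';  -- partial index
  CREATE INDEX users_email_lower_idx ON users ((lower(email)));                -- expression index
  CREATE INDEX events_payload_idx ON events USING gin (payload);               -- GIN on jsonb
  CREATE INDEX logs_ts_brin ON logs USING brin (logged_at);                    -- BRIN for append-only data

  -- One-time physical reordering by an index; must be repeated after heavy churn
  CLUSTER orders USING orders_created_idx;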

Other Storage Features

PostgreSQL supports declarative partitioning, introduced in version 10, which allows a table to be divided into smaller, more manageable partitions based on a partitioning key. This feature enables range partitioning for continuous ranges of values, such as dates; list partitioning for discrete sets of values; and hash partitioning using a modulus operation on a hash function for even distribution. Subpartitioning is available since version 10, permitting multi-level partitioning where partitions themselves can be further subdivided. Partitions must match the parent's column structure and can be attached or detached dynamically using ALTER TABLE ... ATTACH PARTITION or DETACH PARTITION commands, facilitating maintenance without downtime. Partition pruning removes irrelevant partitions during planning and execution (controlled by enable_partition_pruning, on by default), while the older constraint_exclusion parameter (default partition) provides a similar optimization for inheritance-based setups. Since PostgreSQL 18, VACUUM and ANALYZE can use the ONLY option to process partitioned tables without recursing to child partitions.

Constraints in PostgreSQL enforce data integrity beyond basic indexing. Unique constraints ensure that no two rows share the same values in specified columns, automatically creating a B-tree index and treating multiple NULL values as non-conflicting unless specified with NULLS NOT DISTINCT. Foreign key constraints maintain referential integrity by requiring referenced columns to match a primary key or unique constraint in another table, with cascading actions such as ON DELETE CASCADE to propagate deletions or ON UPDATE SET NULL to adjust dependent values. Since PostgreSQL 18, primary key, unique, and foreign key constraints support non-overlapping semantics for range types (e.g., time intervals), ensuring no overlaps similar to exclusion constraints but integrated into standard keys; additionally, CHECK and foreign key constraints can be specified as NOT ENFORCED, allowing validation without enforcement for data migration or federation scenarios. Exclusion constraints, implemented via GiST indexes, prevent overlapping values using custom operators (e.g., && for range overlaps), useful for scenarios like scheduling where no two events can conflict. These constraints apply to partitioned tables, with each partition inheriting or defining its own as needed.

TOAST (The Oversized-Attribute Storage Technique) manages large data fields exceeding the 8 KB page size limit by compressing or externalizing them. For values over 2 KB (TOAST_TUPLE_THRESHOLD), PostgreSQL first attempts compression using methods like pglz (default) or LZ4 (configurable via default_toast_compression since version 14), storing compressed data inline if it fits under 2 KB (TOAST_TUPLE_TARGET). If still too large, data is stored externally in a TOAST table as chunks of roughly 2 KB each, referenced by a pointer in the main tuple. Storage strategies include PLAIN (no compression or externalization), EXTERNAL (external only), EXTENDED (compression then external, the default for most types), and MAIN (inline compression with external storage as a fallback). Since PostgreSQL 18, very large expression indexes can reference TOAST tables in the pg_index system catalog. PostgreSQL does not provide built-in whole-table or row-level compression; compression for entire tables or rows can instead be achieved through extensions that support columnar storage, such as Citus Columnar or TimescaleDB, or via filesystem-level options like ZFS.
Tablespaces enhance partitioning by allowing individual partitions to be placed on different physical storage locations for optimized I/O performance. Administrators create tablespaces with CREATE TABLESPACE specifying a directory, then assign them to partitions during creation or alteration (e.g., CREATE TABLE partition1 TABLESPACE fastspace). This is particularly useful for large partitioned tables, enabling hot data on SSDs and archival partitions on slower disks, while the global default_tablespace parameter controls unassigned placements. Partitions can also inherit indexes from the parent table or define local ones for targeted query acceleration. Introduced in PostgreSQL 18, the asynchronous I/O (AIO) subsystem improves performance for storage-intensive operations such as sequential scans, bitmap heap scans, and vacuums by allowing non-blocking I/O, with demonstrated speedups of up to 3× in certain workloads.
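
A sketch of a range-partitioned table with hash subpartitions and a per-partition tablespace; all names, including the fastspace tablespace, are hypothetical:

  CREATE TABLE metrics (
      recorded_at timestamptz NOT NULL,
      device_id   integer NOT NULL,
      value       double precision
  ) PARTITION BY RANGE (recorded_at);

  CREATE TABLE metrics_2025 PARTITION OF metrics
      FOR VALUES FROM ('2025-01-01') TO ('2026-01-01')
      PARTITION BY HASH (device_id);                    -- subpartitioning

  CREATE TABLE metrics_2025_h0 PARTITION OF metrics_2025
      FOR VALUES WITH (MODULUS 2, REMAINDER 0) TABLESPACE fastspace;
  CREATE TABLE metrics_2025_h1 PARTITION OF metrics_2025
      FOR VALUES WITH (MODULUS 2, REMAINDER 1);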

Replication and Availability

Replication Mechanisms

PostgreSQL provides built-in replication mechanisms to distribute data across multiple servers for redundancy and scalability, primarily through physical and logical replication options. These mechanisms leverage the Write-Ahead Logging (WAL) system to capture and propagate database changes while maintaining consistency. Introduced in version 9.0, streaming replication forms the foundation of physical replication, enabling continuous data synchronization from a primary server to one or more standby servers.

Streaming replication operates by shipping WAL records in real-time from the primary to standbys, in either asynchronous or synchronous mode. In asynchronous mode, the primary commits transactions without waiting for standby acknowledgment, prioritizing performance but risking data loss if the primary fails. Synchronous mode, available since version 9.1, ensures zero data loss by requiring confirmation from at least one standby before committing, though it can impact throughput on networks with high latency. Standby servers support hot standby queries, permitting read-only access to replicated data for load balancing, a feature enabled since version 9.0.

Logical replication, introduced in PostgreSQL 10, complements physical replication by allowing selective replication of data objects and changes based on their logical identity, typically primary keys, rather than entire physical copies of the database. It uses a publish-subscribe model where publications define sets of tables on the publisher server, and subscriptions on subscriber servers pull an initial snapshot followed by incremental changes to maintain transactional consistency. This enables per-table granularity, cross-version upgrades, and multi-node topologies, but subscribers should treat replicated tables as read-only to avoid conflicts, as no built-in resolution mechanism exists—applications must handle any discrepancies manually.

Cascading replication extends both physical and logical setups by allowing intermediate standbys or subscribers to relay changes to downstream nodes, reducing load on the primary and supporting hierarchical topologies; this was added for streaming replication in version 9.2. Tools like pg_basebackup, introduced in version 9.1, facilitate initial full backups for setting up standbys, while pg_rewind, available since version 9.5, enables efficient resynchronization of diverged servers during failover without full data reloads. PostgreSQL's core replication does not support multi-master configurations out of the box, where multiple nodes accept writes simultaneously, due to conflict risks in physical setups and limitations in logical change application. Extensions such as BDR (Bi-Directional Replication), developed by 2ndQuadrant (now EDB), address this by building on logical replication for asynchronous multi-master scenarios with custom conflict handling.
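
A minimal logical-replication sketch under assumed host and table names; the publisher must run with wal_level = logical:

  -- On the publisher:
  CREATE PUBLICATION app_pub FOR TABLE orders, products;

  -- On the subscriber (connection string values are placeholders):
  CREATE SUBSCRIPTION app_sub
      CONNECTION 'host=pub.example.com dbname=app user=repl password=secret'
      PUBLICATION app_pub;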

High Availability Enhancements

PostgreSQL enhances high availability (HA) through a suite of features that build upon its core replication mechanisms, enabling robust recovery, resource management, and monitoring in clustered environments. These enhancements focus on minimizing downtime, scaling read operations, and ensuring data consistency during failures, allowing administrators to maintain service continuity in production deployments.

Point-in-time recovery (PITR) provides a critical mechanism for restoring databases to a specific moment using write-ahead log (WAL) archives combined with base backups, mitigating data loss from corruption or human error. To implement PITR, administrators configure WAL archiving by setting wal_level to replica or higher, enabling archive_mode, and defining an archive_command to copy WAL segments to a secure archive location, such as a shared filesystem or cloud storage. A base backup is then taken using pg_basebackup, capturing a consistent snapshot of the database files. For recovery, the server is stopped, the base backup restored, and a recovery.signal file created along with a restore_command to replay WAL files from the archive up to the desired recovery point, specified via recovery_target_time, recovery_target_xid, or recovery_target_name. This process allows precise rollbacks without full data reloads, supporting HA by enabling rapid restoration on standby nodes.

Connection pooling addresses the overhead of establishing numerous database connections in high-concurrency HA setups, where frequent failovers or load balancing can exhaust server resources. PostgreSQL integrates effectively with PgBouncer, a lightweight external pooler that maintains a pool of reusable connections, reducing latency and CPU usage by multiplexing client sessions onto fewer backend connections. Common integration patterns involve deploying PgBouncer in transaction-pooling mode for short-lived queries or session-pooling for stateful applications, configured via its pool_mode directive to balance performance and isolation. Additionally, prepared transactions via the two-phase commit (2PC) protocol enhance HA in distributed environments by allowing transactions to be durably prepared with PREPARE TRANSACTION and later committed or rolled back across replicas using COMMIT PREPARED or ROLLBACK PREPARED, ensuring atomicity during failover without data inconsistencies. The max_prepared_transactions parameter limits concurrent prepared states to prevent resource exhaustion.

Read replicas, implemented through streaming or logical replication, scale read workloads while bolstering HA by offloading queries from the primary and providing failover candidates. In HA clusters, standby servers configured with hot_standby = on accept read-only connections during WAL replay, distributing load to prevent primary overload and enabling seamless promotion via pg_ctl promote if the primary fails. This setup achieves horizontal scaling for analytics or reporting, with lag monitoring ensuring data freshness. Automatic failover scripts automate promotion by detecting primary failure—often via heartbeat checks or external tools—and executing steps like updating client connection strings and notifying replicas to switch targets, minimizing recovery time objectives (RTO) to seconds in well-tuned clusters. Monitoring tools are essential for maintaining HA, providing visibility into replication health and potential issues.
The pg_stat_replication system view tracks WAL sender processes for each standby, exposing metrics like sent_lsn, replay_lsn, and lag times (write_lag, flush_lag, replay_lag) to quantify synchronization delays and detect stalled replicas. For deeper analysis, pgBadger processes PostgreSQL log files to generate HTML reports on query performance, connection patterns, and errors, highlighting bottlenecks that could impact availability, such as long-running transactions blocking WAL shipping.

Recent PostgreSQL versions introduce enhancements tailored for HA resilience. PostgreSQL 17 adds support for incremental backups via pg_basebackup --incremental, which captures only changed blocks since the last full backup, reducing storage and transfer costs for frequent PITR-enabled archiving; the pg_combinebackup utility then reconstructs full backups from these increments for restores. It also includes failover control for logical replication, allowing subscribers to seamlessly switch publishers during primary failures without slot loss, improving continuity in multi-node setups. Furthermore, PostgreSQL 17's revamped VACUUM memory management reduces peak usage by up to 20 times while accelerating cleanup, enabling faster maintenance on active HA clusters without extended locks that could trigger failovers. Parallel VACUUM, refined in recent releases, distributes index scanning across workers to shorten table bloat resolution times, indirectly supporting HA by minimizing primary downtime during routine operations.

PostgreSQL 18, released on September 25, 2025, builds on these with further improvements, including logical replication of generated columns via the publish_generated_columns option, default parallel streaming for new subscriptions to enhance throughput, and idle_replication_slot_timeout to automatically clean up inactive replication slots and prevent resource leaks. It also adds WAL I/O activity tracking in pg_stat_io, per-backend WAL statistics via pg_stat_get_backend_wal(), new VACUUM controls like vacuum_truncate for file truncation and freezing of all-visible pages, and backup enhancements such as pg_combinebackup --link for hard linking and pg_verifybackup support for tar-format backups. These updates improve replication performance, monitoring precision, and recovery efficiency in HA environments.
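
An illustrative PITR configuration, with a placeholder archive path:

  # postgresql.conf on the primary: archive WAL continuously
  wal_level = replica
  archive_mode = on
  archive_command = 'cp %p /mnt/wal_archive/%f'

  # For recovery: restore a base backup, create recovery.signal, then set
  restore_command = 'cp /mnt/wal_archive/%f %p'
  recovery_target_time = '2025-09-25 12:00:00+00'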

Connectivity

Client Interfaces

PostgreSQL provides client interfaces through a standardized wire protocol that facilitates communication between frontend applications (clients) and the backend server. This frontend/backend protocol, currently at version 3.2 as of PostgreSQL 18, operates over TCP/IP or Unix-domain sockets and uses a message-based format to exchange queries, results, and control signals. The protocol supports both simple query mode for direct execution and extended query mode, which includes features like prepared statements to parameterize queries and mitigate SQL injection risks.

At the core of client connectivity is libpq, the official C library that implements the wire protocol and offers functions for establishing connections, sending queries, and processing results. Libpq handles asynchronous operations, error reporting, and notifications, enabling efficient interaction for C-based applications. It also supports secure connections via SSL/TLS encryption, configurable through server parameters like ssl in postgresql.conf, which protects data in transit using TLS protocols. Additionally, libpq includes the COPY protocol for high-speed bulk data transfer, allowing clients to stream large datasets to or from the server without intermediate parsing, significantly improving performance for import/export operations.

PostgreSQL supports a range of language-specific drivers built on or compatible with libpq, providing standardized APIs for integration. The JDBC driver enables Java applications to connect using standard database-independent code, supporting features like connection pooling and transaction management. For cross-platform access, the official ODBC driver (psqlODBC) allows Windows, Linux, and macOS applications to interface via ODBC standards. Python developers commonly use psycopg, a DB-API 2.0 compliant adapter that wraps libpq for efficient query execution and type handling. In the .NET ecosystem, Npgsql serves as the ADO.NET provider, offering full support for Entity Framework and async operations. For embedded or higher-level access in specific languages, libraries like libpqxx provide a C++ wrapper around libpq, emphasizing modern C++ idioms such as RAII for resource management and iterators for result traversal. Similarly, node-postgres (pg) is a popular non-blocking client for Node.js, with optional native libpq bindings for performance-critical applications, supporting promises and connection pooling.

Introduced in PostgreSQL 14, pipeline mode in libpq allows clients to send multiple queries without waiting for individual responses, reducing round-trip latency over high-latency networks by up to 2-3 times in batched workloads. This client-side feature batches commands into a single transaction context, flushing results only when needed, and is particularly beneficial for applications with sequential query patterns.
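
Bulk transfer over the COPY protocol is also exposed at the SQL level; a sketch with placeholder paths and table names (server-side COPY FROM a file requires superuser rights or pg_read_server_files membership, while client tools such as psql expose the same mechanism via \copy):

  COPY measurements (taken_at, reading)
  FROM '/var/lib/postgresql/import/measurements.csv'
  WITH (FORMAT csv, HEADER true);

  COPY measurements TO STDOUT WITH (FORMAT csv);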

Foreign Data Wrappers

Foreign Data Wrappers (FDWs) in PostgreSQL provide a framework for accessing external data sources, allowing users to query remote data as if it were stored in local tables, in accordance with the SQL/MED standard. This is achieved through libraries that handle connections to external systems, abstracting the details of data retrieval and integration. FDWs were introduced in PostgreSQL 9.1 with read-only support and expanded in version 9.3 to include write capabilities for certain operations.

To utilize an FDW, users first create a foreign server object with CREATE SERVER to define connection parameters such as host and port, followed by a user mapping with CREATE USER MAPPING for authentication details like passwords. Foreign tables are then defined using CREATE FOREIGN TABLE, specifying the foreign server, column definitions, and options relevant to the wrapper. The built-in postgres_fdw extension, installed via CREATE EXTENSION postgres_fdw, enables access to data in external PostgreSQL servers by creating foreign tables that mirror remote structures. For example, after setting up a server and mapping, a foreign table can be created to query a remote table directly within local SQL statements.

Several extensions extend the FDW framework to diverse data sources. The file_fdw wrapper allows querying local files, such as CSV data, by treating them as foreign tables without needing to import the data into PostgreSQL. For relational databases, jdbc_fdw uses Java Database Connectivity (JDBC) to connect to systems like Oracle or MySQL, supporting both reads and writes in compatible setups. Multicorn, a Python-based extension, facilitates custom FDWs by leveraging Python libraries for integration with sources like SQLAlchemy-compatible databases or even NoSQL stores.

Key concepts in FDWs include query pushdown, where conditions from WHERE clauses and joins are executed on the remote server to optimize performance and reduce data transfer. This pushdown is supported for built-in functions and operators in wrappers like postgres_fdw, though it depends on the remote server's capabilities. Additionally, IMPORT FOREIGN SCHEMA automates the creation of foreign tables by importing definitions from the remote schema, with options to include or exclude specific tables and handle collations.

FDWs are commonly used for data federation, enabling unified queries across multiple heterogeneous sources, and in ETL pipelines to extract and transform data from external systems without full ingestion. For instance, a PostgreSQL instance can federate data from a file-based archive and a remote database for analytics. Limitations include incomplete DML support across wrappers; while postgres_fdw handles INSERT, UPDATE, and DELETE, operations like INSERT ... ON CONFLICT DO UPDATE are not fully supported, and some wrappers like file_fdw are read-only. Security is managed through foreign server options and user mappings, with superusers able to bypass passwords but at risk of vulnerabilities if misconfigured, such as those noted in CVE-2007-3278. PostgreSQL 18 introduced SCRAM pass-through authentication for FDW connections, enabling secure authentication without storing user passwords by using SCRAM-hashed secrets passed to the remote server. Authentication often relies on the underlying protocol, like libpq for postgres_fdw, ensuring secure connections via SSL where specified.
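
A postgres_fdw sketch following the steps above; connection details and object names are placeholders:

  CREATE EXTENSION postgres_fdw;

  CREATE SERVER remote_pg FOREIGN DATA WRAPPER postgres_fdw
      OPTIONS (host 'remote.example.com', port '5432', dbname 'sales');

  CREATE USER MAPPING FOR CURRENT_USER SERVER remote_pg
      OPTIONS (user 'reporting', password 'secret');

  CREATE FOREIGN TABLE remote_orders (
      id    bigint,
      total numeric
  ) SERVER remote_pg OPTIONS (schema_name 'public', table_name 'orders');

  -- Or import remote definitions wholesale (the local_mirror schema must exist):
  IMPORT FOREIGN SCHEMA public LIMIT TO (orders, customers)
      FROM SERVER remote_pg INTO local_mirror;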

Security

Authentication and Access Control

PostgreSQL implements a robust authentication and access control system to manage user identities and permissions within the database cluster. Authentication verifies the identity of connecting clients, while access control determines the operations they can perform on database objects. This dual approach ensures secure data management, with authentication handled primarily through the pg_hba.conf configuration file and access control enforced via roles, privileges, and policies.

Roles in PostgreSQL serve as the foundation for both user accounts and group management, unifying the concepts of users and groups into a single entity. The CREATE ROLE command defines a new role, which can act as a login-enabled user or a non-login group for privilege delegation. Key attributes include LOGIN to allow direct connections (the default for CREATE USER, a synonym), SUPERUSER for unrestricted administrative access (default NOSUPERUSER), and INHERIT to enable automatic inheritance of privileges from member roles (default INHERIT). For example, CREATE ROLE analyst LOGIN; creates a basic user role, while group roles facilitate scalable permission assignment by granting membership via GRANT role TO group_role. Role membership can also be exercised explicitly through the SET ROLE command, allowing a session to temporarily adopt another role's privileges without relying on inheritance. Superusers or roles with the CREATEROLE attribute can create new roles.

Access control is managed through a privilege system that grants specific permissions on database objects such as tables, schemas, and functions. Privileges include SELECT for reading data, INSERT and UPDATE for modifications, DELETE and TRUNCATE for removal, REFERENCES for foreign keys, TRIGGER for event handling, CREATE for schema objects, CONNECT for databases, EXECUTE for functions, and USAGE for types or languages, among others tailored to object types. The GRANT command assigns these privileges to roles or PUBLIC (all roles), with the WITH GRANT OPTION allowing recipients to further delegate them. For instance, GRANT SELECT, INSERT ON table_name TO analyst; permits read and write access. Conversely, the REVOKE command removes privileges, such as REVOKE ALL PRIVILEGES ON table_name FROM analyst;, and supports CASCADE to handle dependencies. Only object owners, superusers, or roles with grant options can perform these operations, ensuring controlled delegation. Default privileges can be set via ALTER DEFAULT PRIVILEGES to apply automatically to future objects created by a role.

Row-level security (RLS), introduced in PostgreSQL 9.5, provides fine-grained control by restricting access to individual rows in a table based on the current role. To enable RLS, use ALTER TABLE table_name ENABLE ROW LEVEL SECURITY;, which applies a default-deny policy unless explicitly permitted. Policies are defined with CREATE POLICY, specifying commands like SELECT, INSERT, UPDATE, or ALL, and targeting roles or PUBLIC. The USING clause determines row visibility (e.g., USING (department = current_user_dept())), while WITH CHECK controls validation of new or modified rows for INSERT, UPDATE, and ALL policies covering those operations, ensuring inserted or updated rows comply with rules by evaluating the proposed new row values. If the expression evaluates to false or null, an error is thrown and the entire operation aborts; it does not apply to SELECT or DELETE-only policies.
For example, WITH CHECK (condition) enforces conditions on modifications (e.g., preventing updates that violate row ownership). Policies combine permissively with OR or restrictively with AND, and can be bypassed by superusers or roles with the BYPASSRLS attribute. This feature provides a secure foundation for enforcing tenant data isolation at the database level in multi-tenant applications, reducing the risk of bugs in application-layer access enforcement; it can be integrated with frameworks such as Supabase for simplified setup. As of PostgreSQL 18, the pg_dump utility includes a --no-policies option to disable processing of RLS policies during backups.

Host-based authentication is configured via the pg_hba.conf file, which defines rules for accepting connections based on user, database, IP address, and method. Each line specifies a connection type (e.g., local for Unix-domain socket connections, host for TCP/IP connections), database and user patterns (e.g., all or specific names), address ranges (e.g., 192.168.1.0/24), and an authentication method like trust, md5, or scram-sha-256 for passwords. Rules are evaluated top-to-bottom, with the first match applied, allowing layered security such as trusting local connections while requiring passwords remotely.

On Microsoft Windows, Unix-domain sockets are not supported, so all local connections (including to localhost) use TCP/IP loopback, and local entries in pg_hba.conf are ignored. To enable trust authentication (no password prompt) for local connections on Windows, use host entries for the loopback addresses instead, for example:

  host  all  all  127.0.0.1/32  trust
  host  all  all  ::1/128       trust

These entries must appear before any entries requiring passwords or other authentication methods, as the first matching rule wins. After editing pg_hba.conf, reload the configuration (e.g., via pg_ctl reload or SELECT pg_reload_conf();) or restart the PostgreSQL service for the changes to take effect. To ensure the connection uses TCP/IP (rather than attempting a socket connection), connect explicitly via TCP/IP (e.g., psql -h 127.0.0.1). This addresses a common issue where trust authentication appears to fail on Windows due to reliance on local entries.

As of PostgreSQL 18, the MD5 authentication method is deprecated, issuing warnings when used with CREATE or ALTER ROLE, to promote the more secure SCRAM-SHA-256. Additionally, PostgreSQL 18 increases the cancel request key size to 256 bits when using the new wire protocol version 3.2, enhancing security against certain attacks.

PostgreSQL integrates with external systems for enterprise authentication, including LDAP and Kerberos via GSSAPI. LDAP authentication, configured in pg_hba.conf with the ldap method, validates credentials against an LDAP server using simple bind (prefix/suffix DN construction) or search+bind modes, with parameters like ldapserver, ldapprefix, and ldapbasedn supplied as options on the pg_hba.conf entry. Users must pre-exist in the database. Kerberos support uses the gss method in pg_hba.conf, leveraging GSSAPI for single sign-on with Kerberos tickets, often integrated with Active Directory, and requires a compatible library like MIT Kerberos. As of PostgreSQL 18, OAuth authentication is supported via a new oauth method in pg_hba.conf, allowing token-based authentication with configurable validator libraries for integration with identity providers.
These methods enable centralized identity management without storing passwords in PostgreSQL. PostgreSQL 18 also adds SCRAM passthrough support for foreign data wrappers like postgres_fdw and dblink.
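As a brief sketch of adopting SCRAM-SHA-256 for a role (the role name and passwords are hypothetical placeholders):

sql

-- SCRAM-SHA-256 is the default password_encryption setting since PostgreSQL 14
SET password_encryption = 'scram-sha-256';

-- The password is stored as a SCRAM verifier, never as plaintext
CREATE ROLE app_user LOGIN PASSWORD 'change-me';

-- Confirm the stored format (pg_authid is readable by superusers only)
SELECT rolname, left(rolpassword, 13) AS hash_prefix
FROM pg_authid WHERE rolname = 'app_user';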

Encryption and Auditing

PostgreSQL provides several mechanisms for encrypting data both in transit and at rest, as well as tools for auditing database activities to ensure security and compliance. Encryption in transit is supported natively through Secure Sockets Layer (SSL) connections, which protect client-server communications by encrypting queries, results, and authentication data. To enable SSL, administrators configure the server with SSL certificates and specify hostssl in the pg_hba.conf file to require encrypted connections for specific hosts or users. As of PostgreSQL 18, the ssl_tls13_ciphers parameter allows specification of TLSv1.3 cipher suites, and the default for ssl_groups now includes the X25519 elliptic curve for improved security. Additionally, the Generic Security Services Application Program Interface (GSSAPI) offers encrypted authentication and data transmission, often used for Kerberos-based single sign-on without sending passwords over the network.

For data at rest, PostgreSQL does not include built-in Transparent Data Encryption (TDE) for database files or the Write-Ahead Log (WAL); instead, it relies on operating system-level solutions such as filesystem encryption with tools like eCryptfs or EncFS, or block-device encryption using dm-crypt with LUKS. These methods encrypt the entire storage volume, safeguarding against unauthorized access if physical media is stolen, though they offer no protection once the system is mounted and running.

Column-level encryption is available through the pgcrypto extension, which provides functions for symmetric and asymmetric encryption of specific data fields, such as encrypt(data bytea, key bytea, type text) for raw bytea encryption using algorithms like AES in CBC mode, and PGP-based functions like pgp_sym_encrypt for password-based symmetric encryption. Keys for pgcrypto are typically managed client-side by the application, ensuring that the database server never stores plaintext keys, though this requires careful application design to handle decryption securely. As of PostgreSQL 18, pgcrypto adds support for the sha256crypt and sha512crypt password hashing algorithms, as well as CFB mode for ciphers, expanding options for secure data handling.

Auditing in PostgreSQL is facilitated by built-in logging parameters that capture database activities for security monitoring and compliance. The log_statement parameter controls logging of SQL statements, with options like all to record every query, mod for data-modifying statements (INSERT, UPDATE, DELETE), or ddl for schema changes, enabling administrators to track user actions comprehensively. Similarly, log_min_duration_statement logs queries exceeding a specified duration (e.g., 250 ms), aiding in the identification of potentially abusive or inefficient operations, while log_min_error_statement records statements causing errors at or above a severity level such as ERROR. For more granular auditing, the pgaudit extension enhances these capabilities by providing session-level logging of all executed statements and object-level logging for specific tables, roles, or functions, configurable via the pgaudit.log parameter (e.g., to log READ, WRITE, or ROLE events). This extension outputs audit trails in PostgreSQL's standard log format, supporting rotation and analysis for compliance needs. As of PostgreSQL 18, the log_lock_failures parameter enables logging of lock acquisition failures, improving monitoring of concurrency-related security events.
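A minimal pgcrypto sketch, assuming the extension is available and keys are supplied by the application (the values below are placeholders):

sql

CREATE EXTENSION IF NOT EXISTS pgcrypto;

-- PGP-based symmetric encryption and the matching decryption
SELECT pgp_sym_encrypt('sensitive value', 'app-passphrase') AS ciphertext;
SELECT pgp_sym_decrypt(
         pgp_sym_encrypt('sensitive value', 'app-passphrase'),
         'app-passphrase') AS plaintext;

-- Raw AES encryption of bytea data (CBC mode with padding by default)
SELECT encrypt('secret'::bytea, 'sixteen-byte-key'::bytea, 'aes');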
These encryption and auditing features help PostgreSQL meet requirements of regulations such as GDPR and HIPAA, where data protection and activity logging are essential; for instance, SSL and column encryption address data confidentiality, while detailed logs from log_statement or pgaudit provide audit trails for access monitoring and breach investigations. Password storage has likewise hardened over time: SCRAM-SHA-256 became the default password_encryption method in PostgreSQL 14, and the deprecation of MD5 authentication in PostgreSQL 18 further encourages migration away from the older scheme. Key management for encryption remains a critical aspect, often handled externally through tools like HashiCorp Vault or application-level secrets, to prevent key exposure on the database server.
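As a practical footnote to the logging parameters described above, a hedged sketch of enabling them (the thresholds are illustrative):

sql

-- Log all data-modifying statements and any query slower than 250 ms
ALTER SYSTEM SET log_statement = 'mod';
ALTER SYSTEM SET log_min_duration_statement = '250ms';
SELECT pg_reload_conf();  -- apply without restarting the server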

Performance and Tuning

Benchmarks

PostgreSQL's performance is evaluated through standardized and community-driven benchmarks that assess its capabilities in transaction processing, analytical queries, and scalability. These benchmarks highlight PostgreSQL's strengths in online transaction processing (OLTP) and decision support systems, often using tools like pgbench for custom testing and implementations of industry standards such as TPC-C and TPC-H. Results demonstrate PostgreSQL's efficiency in handling complex workloads, with scalability improving through hardware upgrades and configuration tuning.

The TPC-C benchmark, focused on OLTP workloads, measures new-order transactions per minute (tpmC) under concurrent access, emphasizing scalability across multiple users and warehouses. Community implementations like HammerDB's TPROC-C variant show PostgreSQL achieving high throughput; for instance, on a tuned server with PostgreSQL 13, it delivered over 1 million NOPM (new orders per minute) on multi-socket hardware, outperforming PostgreSQL 12 by up to 20% in transaction rates due to enhanced parallel query execution. Scalability tests indicate near-linear improvements with additional CPU cores and memory.

For analytical processing, the TPC-H benchmark evaluates query performance on large datasets with ad-hoc business queries, measuring composite query-per-hour (QphH) metrics. PostgreSQL implementations using tools like DBT3 or pg_tpch demonstrate strong scalability; on a scale factor 100 dataset, PostgreSQL completed the full suite with a QphH@100 of approximately 43,700 on high-end hardware. Results show PostgreSQL's parallel processing sustaining performance as dataset size grows.

The pgbench tool, included in PostgreSQL distributions, facilitates custom OLTP benchmarks by simulating TPC-B-like transactions that mix SELECT, UPDATE, and INSERT operations. Default runs on a single-client setup yield 1,000-5,000 transactions per second (TPS) on modest hardware, scaling to 50,000+ TPS with 100+ clients and tuned parameters such as shared_buffers at 25% of RAM. In one published OLTP comparison, PostgreSQL executed SELECT queries with WHERE clauses roughly nine times faster than the competing system (0.09-0.13 ms vs. 0.9-1 ms) on datasets of up to 1 million rows, a result attributed to superior query optimization.

Version improvements have enhanced specific workloads. PostgreSQL 15 introduced SQL/JSON path language support and parallel processing for JSON operations. PostgreSQL 16 added improvements to query parallelism and up to 300% faster concurrent bulk data loading with COPY. PostgreSQL 17 enhanced I/O with streaming for sequential reads and up to 2x better write throughput under high concurrency. PostgreSQL 18 introduced an asynchronous I/O subsystem for up to 3x faster sequential scans and vacuums, along with better join planning and text processing performance.

Community benchmarks like YCSB evaluate NoSQL-like key-value operations, with PostgreSQL adaptations showing 10,000-20,000 operations per second in workload A (50% reads, 50% updates) on distributed setups, scaling horizontally across nodes. HammerDB also supports mixed OLTP/OLAP workloads, where PostgreSQL has achieved strong results in hybrid tests. Benchmark outcomes are influenced by hardware scaling, where additional CPU cores and NVMe storage can boost throughput by 4-6x, and by configuration settings such as work_mem and effective_cache_size, which optimize memory usage to reduce I/O by up to 50% in high-concurrency scenarios.
Benchmark | Key Metric | PostgreSQL Example Result | Comparison/Scale
TPC-C (via HammerDB) | tpmC/NOPM | >1M NOPM (PG 13, 64-core) | 20% better than PG 12; linear with cores
TPC-H (via DBT3) | QphH@100 | ≈43,700 | Under 1 min for SF100 queries
pgbench (TPC-B-like) | TPS | 50,000+ (100 clients) | ~9x faster SELECTs in one OLTP comparison
YCSB Workload A | Ops/sec | 10k-20k (distributed) | Horizontal scaling across nodes
HammerDB hybrid | TPM | Strong performance | Mixed OLTP/OLAP workloads
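The TPS figures in the table above reflect pgbench's default TPC-B-like transaction; the built-in script is essentially the following, with :aid, :bid, :tid, and :delta substituted randomly for each transaction:

sql

BEGIN;
UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
INSERT INTO pgbench_history (tid, bid, aid, delta, mtime)
    VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
END;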

Optimization Techniques

PostgreSQL employs a sophisticated query planner that generates execution plans for SQL statements by evaluating multiple strategies and selecting the most efficient one based on estimated costs. The planner uses cost-based optimization, where costs are arbitrary units primarily representing disk page fetches, with sequential page costs set to 1.0 by default and random page costs at 4.0 to account for seek times. It begins by creating plans for scanning individual relations, supporting sequential scans, index scans, and bitmap scans, then combines them using join methods such as nested loops, merge joins, and hash joins. For queries joining fewer relations than the geqo_threshold (default 12), it performs a near-exhaustive search of join orders; for larger queries, it activates the Genetic Query Optimizer (GEQO).

The EXPLAIN command allows users to inspect the chosen execution plan without running the query, displaying node types like sequential scans or index scans, along with estimated costs (startup and total), expected row counts, and row widths. EXPLAIN ANALYZE extends this by actually executing the query and reporting real runtimes in milliseconds, actual row counts, and buffer I/O statistics, enabling comparison between estimated and real performance to identify discrepancies in planner assumptions. These tools are essential for tuning, as they reveal inefficiencies such as excessive filtering or suboptimal index usage, guiding interventions like index creation or query rewrites.

GEQO addresses the complexity of optimizing large join queries by modeling the problem as a traveling salesman problem and applying a genetic algorithm to the search. It encodes potential join orders as integer strings, generates an initial population of random sequences, evaluates their costs using the standard planner, and evolves better plans through crossover and mutation operations, typically converging within a few generations. This nondeterministic approach reduces planning time for queries with many tables, though it may not always find the absolute optimum; reproducibility can be ensured via the geqo_seed parameter.

Performance tuning often involves adjusting resource consumption parameters in the postgresql.conf file. The shared_buffers setting allocates shared memory for caching data pages, with a default of 128 MB; for dedicated servers with over 1 GB of RAM, a common recommendation is 25% of total RAM, though not exceeding 40% so the operating system's cache retains room to work. work_mem limits the memory used per operation such as a sort or hash, defaulting to 4 MB; increasing it can prevent disk spills for complex queries but risks excessive memory use across concurrent sessions, so it should be tuned based on the maximum number of concurrent operations expected. effective_cache_size estimates the combined cache available from PostgreSQL's buffers and the operating system, typically set to 50-75% of system RAM; it informs the planner's index usage decisions without directly allocating memory.

Since PostgreSQL 9.6, parallel query execution enables the use of multiple CPUs for portions of a query, such as sequential scans, joins, and aggregates, particularly benefiting analytical workloads that scan large tables but return few rows. The planner inserts parallel-aware nodes into the plan tree, launching background workers to perform scans or computations in parallel while the leader process coordinates and combines results; this can yield speedups of two to four times or more on multi-core systems.
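A short sketch of these tools in practice (the table name is hypothetical, and the memory values are illustrative starting points for a dedicated 16 GB server):

sql

-- Compare the planner's estimates with actual execution
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders WHERE customer_id = 42;

-- Common starting points; tune against the real workload
ALTER SYSTEM SET shared_buffers = '4GB';         -- ~25% of RAM; requires a restart
ALTER SYSTEM SET effective_cache_size = '12GB';  -- planner hint only, no allocation
ALTER SYSTEM SET work_mem = '64MB';              -- per sort/hash operation
SELECT pg_reload_conf();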
Introduced in PostgreSQL 11, Just-in-Time (JIT) compilation converts interpreted expression evaluation into native machine code at runtime, accelerating CPU-bound operations like projections, filters, and sorts in long-running queries. It generates specialized functions for expressions in WHERE clauses or aggregates when the query's estimated cost exceeds jit_above_cost (default 100000), with inlining and more aggressive optimization applied above jit_inline_above_cost and jit_optimize_above_cost (each 500000 by default), primarily aiding analytical queries; EXPLAIN ANALYZE output includes a JIT section with generation times and function counts to assess its impact.

For ongoing performance monitoring, the pg_stat_statements extension tracks cumulative statistics on SQL statement planning and execution, including total and mean execution times, row counts, buffer hits and misses, and WAL usage metrics. Enabled by loading it into shared_preload_libraries, it populates a queryable view for identifying resource-intensive statements, such as those with high total_exec_time, allowing targeted optimizations; statistics can be reset with pg_stat_statements_reset for focused analysis periods. Complementing this, the auto_explain module automatically logs execution plans for queries exceeding a configurable duration threshold (e.g., 250 ms via auto_explain.log_min_duration), using formats like text or JSON and options akin to EXPLAIN ANALYZE, to capture slow-query details in server logs without manual intervention.

Vacuuming maintains table efficiency by reclaiming space from dead tuples and updating statistics, with tuning focused on cost-based delays that balance maintenance against concurrent workloads. The cost model assigns units for actions like page hits (1), misses (2 by default since PostgreSQL 14, adjustable via vacuum_cost_page_miss), and dirtying (20), accumulating until reaching vacuum_cost_limit (200), at which point the process sleeps for vacuum_cost_delay milliseconds (0 by default, i.e., disabled); enabling a small delay such as 0.1 ms throttles the I/O impact. Since PostgreSQL 13, cost-based vacuuming applies more granularly to phases like scanning and indexing. Parallel vacuum, available since version 13, employs multiple workers for index cleanup on tables with at least two indexes larger than min_parallel_index_scan_size, specified via the PARALLEL option, reducing maintenance time on multi-core systems without requiring VACUUM FULL. Autovacuum parameters like autovacuum_vacuum_scale_factor (0.2) and autovacuum_max_workers (3) further automate tuning based on table activity.

For high-volume time-series ingestion, at rates around 5,000 inserts per second, vanilla PostgreSQL encounters specific limitations. The Write-Ahead Log (WAL) often becomes the primary bottleneck because of the I/O demands of its sequential durability writes under sustained high throughput. Indexes, especially on high-cardinality fields like unique tags or series identifiers, impede write performance by requiring updates for each insertion, causing contention, larger index sizes, and added overhead. The absence of built-in time-series-specific compression leads to rapid storage growth, as general mechanisms like TOAST do not optimize for the repetitive patterns typical of time-series data. High-cardinality scenarios further compound these issues with index bloat, uneven partitioning, and resource-intensive maintenance.
Such workloads are feasible in the short term through meticulous tuning, including batching inserts, WAL configuration adjustments, and partition management, but prove maintenance-intensive long-term, frequently necessitating extensions like TimescaleDB for specialized compression, chunking, and ingestion enhancements.
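A hedged sketch of the pg_stat_statements workflow described above (the extension must first be added to shared_preload_libraries and the server restarted):

sql

CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Ten statements with the highest cumulative execution time
SELECT query, calls,
       round(total_exec_time::numeric, 1) AS total_ms,
       round(mean_exec_time::numeric, 2)  AS mean_ms,
       shared_blks_hit, shared_blks_read
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;

-- Start a fresh measurement window
SELECT pg_stat_statements_reset();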

Deployment and Administration

Supported Platforms

PostgreSQL is compatible with a wide range of operating systems, including major Linux distributions such as Debian, Ubuntu, Red Hat Enterprise Linux, and SUSE; Microsoft Windows; Apple macOS; and various Unix-like systems such as FreeBSD, OpenBSD, NetBSD, DragonFlyBSD, and Solaris. These platforms receive official binary packages or can be compiled from source, ensuring broad accessibility for deployment in diverse environments. The database supports multiple hardware architectures, encompassing x86 (both 32-bit and 64-bit variants), ARM (including ARM64 for modern servers and devices), PowerPC, SPARC, IBM Z (S/390), MIPS, and RISC-V, with compatibility for big-endian and little-endian byte orders where hardware permits.

For compilation from source, PostgreSQL requires an ISO/ANSI C compiler compliant with at least the C99 standard, such as recent versions of GCC, while C++ is used for features like just-in-time (JIT) compilation via LLVM. Additionally, since version 18, building with SSL support requires OpenSSL 1.1.1 or later.

Containerization and orchestration are facilitated through official Docker images available on Docker Hub, which provide pre-built, multi-architecture support for various PostgreSQL versions and simplify deployment in containerized workflows. A basic example to start a PostgreSQL container is:

bash

docker run --name some-postgres -e POSTGRES_PASSWORD=mysecretpassword -d postgres


This command names the container "some-postgres", sets the required superuser password via the POSTGRES_PASSWORD environment variable, and runs the container in detached mode using the default "latest" tag (pointing to the most recent stable PostgreSQL version). The container listens on port 5432 internally; to expose it to the host, add -p 5432:5432. For persistent data, mount a volume such as -v /path/to/local/data:/var/lib/postgresql/data.

On Kubernetes, community-developed operators such as CloudNativePG, Zalando's Postgres Operator, and Crunchy Data's PostgreSQL Operator enable automated management of highly available clusters, including scaling, backups, and failover.

In cloud environments, PostgreSQL is offered as a fully managed service across leading providers. Amazon Relational Database Service (RDS) for PostgreSQL delivers scalable instances with automated maintenance, multi-AZ deployments, and integration with AWS services. Google Cloud SQL for PostgreSQL provides serverless scaling, automatic backups, and built-in replication for global deployments. Azure Database for PostgreSQL supports flexible server configurations with AI-driven insights, zone-redundant high availability, and seamless Azure ecosystem integration.

Certain legacy platforms have reached end-of-life in recent versions; notably, support for the HP/Intel Itanium (IA-64) architecture was removed in PostgreSQL 16 and the HPPA/PA-RISC architecture in PostgreSQL 18 to streamline maintenance and focus on contemporary hardware.

Administrative Tools

PostgreSQL provides a suite of built-in command-line utilities and third-party tools for database administration tasks such as maintenance, backup, monitoring, and server management. These tools enable administrators to initialize clusters, perform backups and restores, clean up databases, rebuild indexes, analyze logs, and inspect write-ahead logs (WAL), supporting reliable operation and performance optimization.

The primary interactive interface for administration is psql, a terminal-based front-end that executes SQL queries interactively, displays results, and offers meta-commands for database introspection and scripting. Meta-commands, prefixed with a backslash, include \dt to list tables, \d to describe objects, and \i to execute scripts from files, facilitating efficient administrative workflows without leaving the shell.

For server control and initialization, pg_ctl is the standard utility for managing the PostgreSQL server process, supporting operations like starting (pg_ctl start), stopping (pg_ctl stop), restarting (pg_ctl restart), and displaying status (pg_ctl status). It complements initdb, which creates a new database cluster by setting up directories, initializing system catalogs, and configuring defaults such as locale, encoding, and, since version 18, data checksums (which can be disabled with --no-data-checksums).

Backup and restore operations rely on pg_dump and pg_restore. The pg_dump utility exports a database into a script or archive file, supporting plain, custom, directory, and tar formats for flexibility in selective or full backups. Complementing this, pg_restore selectively restores from custom or directory archives, for example using the --table option to restore a specific table (pg_restore --table=your_table -d your_db backup_file.dump), and can run in parallel with the -j option to speed up large restores.

Database maintenance tools address performance degradation from updates and deletes. vacuumdb automates vacuuming, reclaiming space from dead tuples and updating optimizer statistics to prevent transaction ID wraparound and improve query planning. New in version 18, the --missing-stats-only option processes only relations that lack statistics. Similarly, reindexdb rebuilds indexes to correct fragmentation or corruption, wrapping the SQL REINDEX command for database-wide or targeted operations.

For monitoring, pgAdmin offers a graphical user interface (GUI) for comprehensive administration, including visual query building, server monitoring via dashboards, and object management across multiple PostgreSQL instances on platforms like Linux, Windows, and macOS. Command-line monitoring includes check_postgres.pl, a Perl script that checks connectivity, bloat, locks, and replication status, often integrated with tools like Nagios for alerting. Log analysis is enhanced by pgBadger, a fast Perl-based analyzer that generates detailed HTML reports from PostgreSQL logs, highlighting slow queries, errors, and resource usage patterns. Introduced as pg_xlogdump in PostgreSQL 9.3 and renamed in version 10, pg_waldump provides a human-readable display of WAL files for replication, recovery, or crash analysis, decoding records such as inserts, updates, and checkpoints without requiring server access.
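The maintenance wrappers correspond directly to SQL commands, so the same operations can be run from psql; a small sketch with a hypothetical table and database name:

sql

VACUUM (VERBOSE, ANALYZE) accounts;  -- what vacuumdb issues per table
REINDEX TABLE accounts;              -- what reindexdb wraps
REINDEX DATABASE mydb;               -- database-wide, run while connected to mydb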

Ecosystem

Notable Users

PostgreSQL is widely adopted across industries and projects, ranking fourth in the DB-Engines popularity ranking of database management systems as of November 2025, behind Oracle, MySQL, and Microsoft SQL Server. It has also been the most admired and desired database in the Stack Overflow Developer Survey for multiple years, with 55.6% of respondents reporting its use in 2025.

Major technology companies rely on PostgreSQL for critical production workloads. Apple, for instance, uses PostgreSQL for various services, and other large firms run it behind microservices for internal tools and dashboards. Instagram has depended on PostgreSQL to power its social graph, storing and querying relationships among billions of users and posts while scaling through sharding techniques.

Open-source projects also highlight PostgreSQL's versatility. OpenStreetMap uses PostgreSQL with the PostGIS extension as its primary backend for importing, storing, and querying global geospatial data from millions of contributors. Ansible Automation Platform, developed by Red Hat, employs PostgreSQL as its core database to manage automation data, inventories, and job executions across enterprise environments. Discourse, the open-source forum software, is built on Ruby on Rails with PostgreSQL as the backing database for handling discussions, user interactions, and content moderation.

Case studies demonstrate PostgreSQL's scalability in large deployments. Skype, prior to its acquisition by Microsoft, used PostgreSQL to support real-time communication features, call detail records, and user presence data for hundreds of millions of users. In the finance sector, PostgreSQL supports high-stakes applications requiring compliance and auditing; Bank of Montreal (BMO) uses PostgreSQL for financial data management and analytics, benefiting from its robust security features and performance in handling financial ledgers. Government and scientific organizations also adopt it; CERN's Database on Demand service provides PostgreSQL instances for data management, experiment logging, and collaborative analysis across global teams. As of 2025, PostgreSQL also increasingly powers AI applications through extensions like pgvector.

Derivatives and Implementations

PostgreSQL has inspired a range of derivatives, including forks, specialized distributions, and extensions that adapt its core architecture for targeted workloads such as analytics, time-series processing, and distributed scaling. Additionally, PostgreSQL serves as the database component in the LAPP stack, a variant of the traditional LAMP (Linux, Apache, MySQL, PHP/Perl/Python) stack, where it replaces MySQL to provide enhanced capabilities for dynamic web applications. Compared to MySQL, PostgreSQL offers stronger SQL standards compliance, supporting a fuller range of SQL features including advanced subqueries and clauses like INTERSECT and OUTER JOIN, whereas MySQL is only partially compliant. PostgreSQL's extensibility allows custom data types, operators, and index types to be added without recompiling the server, where MySQL is less flexible. PostgreSQL also excels at complex queries, concurrency, and read-write operations on large datasets, making it preferable for intricate workloads, while MySQL's strengths lie in simpler, read-heavy scenarios.

Greenplum is a massively parallel processing (MPP) fork of PostgreSQL optimized for large-scale analytics and data warehousing, supporting petabyte-scale datasets through distributed query execution across multiple nodes. Originally developed as an open-source project, it emphasizes columnar storage and parallel data loading to accelerate complex analytical queries. EDB Postgres Advanced Server, formerly known as Postgres Plus, is an enterprise distribution built on PostgreSQL that adds Oracle-compatible features, advanced security capabilities such as Transparent Data Encryption (TDE), and high-availability options for mission-critical applications. It maintains full compatibility with standard PostgreSQL while adding multi-model support for relational, document, and vector data. TimescaleDB extends PostgreSQL as a time-series database, introducing hypertables for automatic partitioning of timestamped data to handle high-ingestion workloads efficiently, such as those in IoT monitoring and financial analytics.

Managed services build on PostgreSQL to offer hosted implementations with automated operations, reducing administrative overhead for users. Heroku Postgres provides a fully managed PostgreSQL instance with on-demand scaling up to 200 TB of storage, built-in data protection, and integration with Heroku's PaaS ecosystem for rapid application deployment. Supabase delivers PostgreSQL as the foundation for an open-source backend platform akin to Firebase, featuring realtime synchronization via WebSockets, row-level security, and instant RESTful APIs generated from database schemas. Aiven for PostgreSQL is a cloud-agnostic managed service supporting over 50 extensions, with a 99.99% uptime SLA, point-in-time recovery, and compatibility across major providers like AWS, GCP, and Azure for hybrid deployments.

The PostgreSQL extension ecosystem vastly expands core functionality, with over 1,200 known extensions addressing needs from geospatial analysis to vector search. Hosted on the PostgreSQL Extension Network (PGXN), this repository enables easy discovery and installation of open-source modules. Citus, for instance, transforms PostgreSQL into a distributed system by sharding data across nodes, enabling 20x to 300x query speedups for multi-tenant SaaS and real-time analytics without major application changes. Similarly, pgvector adds native vector data types and similarity search operators (e.g., cosine distance), supporting up to 16,000-dimensional embeddings for AI tasks like semantic search and recommendation engines.
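A minimal pgvector sketch (the extension is named vector; the three-dimensional embeddings are toy values, where real embeddings typically have hundreds of dimensions):

sql

CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE items (
    id bigserial PRIMARY KEY,
    embedding vector(3)
);

INSERT INTO items (embedding) VALUES ('[1,2,3]'), ('[4,5,6]');

-- Nearest neighbours by cosine distance (the <=> operator)
SELECT id, embedding <=> '[3,1,2]' AS cosine_distance
FROM items
ORDER BY embedding <=> '[3,1,2]'
LIMIT 5;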
Compatibility tools like ora_migrator streamline migrations from Oracle by using a single SQL command to copy schemas, data, indexes, and constraints via foreign data wrappers, ensuring transactional consistency. The broader community sustains these innovations through global PGConf events, such as annual conferences in New York and other cities, where developers present on extensions, forks, and best practices. Complementing this, PostgreSQL's mailing lists (over a dozen topic-specific forums) facilitate discussions on implementation details, from extension development to derivative troubleshooting.
