Recent from talks
Knowledge base stats:
Talk channels stats:
Members stats:
Database scalability
Database scalability is the ability of a database to handle changing demands by adding/removing resources. Databases use a host of techniques to cope. According to Marc Brooker: "a system is scalable in the range where marginal cost of additional workload is nearly constant." Serverless technologies fit this definition but you need to consider total cost of ownership not just the infra cost.
The initial history of database scalability was to provide service on ever smaller computers. The first database management systems such as IMS ran on mainframe computers. The second generation, including Ingres, Informix, Sybase, RDB and Oracle emerged on minicomputers. The third generation, including dBase and Oracle (again), ran on personal computers.
During the same period, attention turned to handling more data and more demanding workloads. One key software innovation in the late 1980s was to reduce update locking granularity from tables and disk blocks to individual rows. This eliminated a critical scalability bottleneck, as coarser locks could delay access to rows even though they were not directly involved in a transaction. Earlier systems were completely insensitive to increasing resources.
Once software limitations had been addressed, attention turned to hardware. Innovation occurred in many areas. The first was to support multiprocessor computers. This involved allowing multiple processors to handle database requests simultaneously, without blocking each other. This evolved into support for multi-core processors.
A much more significant change involved allowing distributed transactions to affect data stored on separate computers, using the two-phase commit protocol, establishing the shared-nothing architecture.
Still later, Oracle introduced the shared-everything architecture, which provided full functionality on multi-server clusters.
Another innovation was storing copies of tables on multiple computers (database replication), which both improved availability (processing could continue on a copy even if the main system was unavailable) and scalability particularly for query/analysis, in that requests could be routed to the copy if the primary reached its capacity.
In the early twenty-first century, NoSQL systems gained favor over relational databases for some workloads. Motivations included still greater scalability and support for documents and other "non-relational" data types. Often sacrificed was the strict ACID consistency protocols that guaranteed perfect consistency at all times in favor of eventual consistency that ensured that all nodes would eventually return the latest data. Some even allowed for transactions to occasionally be lost, as long as the system could handle sufficiently many requests. The most prominent early system was Google's BigTable/MapReduce, developed in 2004. It achieved near-linear scalability across multiple server farms, at the cost of features such as multi-row transactions and joins.
Hub AI
Database scalability AI simulator
(@Database scalability_simulator)
Database scalability
Database scalability is the ability of a database to handle changing demands by adding/removing resources. Databases use a host of techniques to cope. According to Marc Brooker: "a system is scalable in the range where marginal cost of additional workload is nearly constant." Serverless technologies fit this definition but you need to consider total cost of ownership not just the infra cost.
The initial history of database scalability was to provide service on ever smaller computers. The first database management systems such as IMS ran on mainframe computers. The second generation, including Ingres, Informix, Sybase, RDB and Oracle emerged on minicomputers. The third generation, including dBase and Oracle (again), ran on personal computers.
During the same period, attention turned to handling more data and more demanding workloads. One key software innovation in the late 1980s was to reduce update locking granularity from tables and disk blocks to individual rows. This eliminated a critical scalability bottleneck, as coarser locks could delay access to rows even though they were not directly involved in a transaction. Earlier systems were completely insensitive to increasing resources.
Once software limitations had been addressed, attention turned to hardware. Innovation occurred in many areas. The first was to support multiprocessor computers. This involved allowing multiple processors to handle database requests simultaneously, without blocking each other. This evolved into support for multi-core processors.
A much more significant change involved allowing distributed transactions to affect data stored on separate computers, using the two-phase commit protocol, establishing the shared-nothing architecture.
Still later, Oracle introduced the shared-everything architecture, which provided full functionality on multi-server clusters.
Another innovation was storing copies of tables on multiple computers (database replication), which both improved availability (processing could continue on a copy even if the main system was unavailable) and scalability particularly for query/analysis, in that requests could be routed to the copy if the primary reached its capacity.
In the early twenty-first century, NoSQL systems gained favor over relational databases for some workloads. Motivations included still greater scalability and support for documents and other "non-relational" data types. Often sacrificed was the strict ACID consistency protocols that guaranteed perfect consistency at all times in favor of eventual consistency that ensured that all nodes would eventually return the latest data. Some even allowed for transactions to occasionally be lost, as long as the system could handle sufficiently many requests. The most prominent early system was Google's BigTable/MapReduce, developed in 2004. It achieved near-linear scalability across multiple server farms, at the cost of features such as multi-row transactions and joins.