Block (data storage)
from Wikipedia

In computing (specifically data transmission and data storage), a block,[1] sometimes called a physical record, is a sequence of bytes or bits, usually containing some whole number of records and having a fixed length, the block size.[2] Data thus structured are said to be blocked. The process of putting data into blocks is called blocking, while deblocking is the process of extracting data from blocks. Blocked data is normally stored in a data buffer, and read or written a whole block at a time. Blocking reduces the overhead and speeds up the handling of the data stream.[3] For some devices, such as magnetic tape and CKD disk devices, blocking reduces the amount of external storage required for the data. Blocking is almost universally employed when storing data to 9-track magnetic tape, NAND flash memory, and rotating media such as floppy disks, hard disks, and optical discs.

Most file systems are based on a block device, which is a level of abstraction for the hardware responsible for storing and retrieving specified blocks of data, though the block size in file systems may be a multiple of the physical block size. This leads to space inefficiency due to internal fragmentation, since file lengths are often not integer multiples of block size, and thus the last block of a file may remain partially empty. This will create slack space. Some newer file systems, such as Btrfs and FreeBSD UFS2, attempt to solve this through techniques called block suballocation and tail merging. Other file systems such as ZFS support variable block sizes.[4][5]

Block storage is normally abstracted by a file system or database management system (DBMS) for use by applications and end users. The physical or logical volumes accessed via block I/O may be devices internal to a server, directly attached via SCSI or Fibre Channel, or distant devices accessed via a storage area network (SAN) using a protocol such as iSCSI or AoE. DBMSes often use their own block I/O for improved performance and recoverability as compared to layering the DBMS on top of a file system.

On Linux, the default block size for most file systems is 4096 bytes. The stat command, part of GNU Core Utilities, can be used to check the block size.
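
On Linux, the block size can also be checked programmatically. The short Python sketch below uses os.stat and os.statvfs from the standard library; the path example.bin is just a placeholder.

import os

path = "example.bin"
print(os.stat(path).st_blksize)     # preferred block size for efficient I/O on this file
print(os.statvfs(path).f_bsize)     # block size reported for the underlying file system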

In Rust a block can be read with the read_exact method.[6]

use std::fs::File;
use std::io::Read;

const BLOCK_SIZE: usize = 4096;

// Read one block from the start of the file into a fixed-size buffer.
if let Ok(mut file) = File::open("example.bin") {
    let mut buf = [0u8; BLOCK_SIZE];
    // read_exact returns a Result; fail loudly if a full block cannot be read.
    file.read_exact(&mut buf).expect("failed to read a full block");
}

In Python a block can be read with the read method.

BLOCK_SIZE = 4096

with open("example.bin", "rb") as file:
    block = file.read(BLOCK_SIZE)

In C# a block can be read with the FileStream class.[7]

using System.IO;

const int BLOCK_SIZE = 4096;

// Read one block from the start of the file.
using FileStream stream = File.Open("example.bin", FileMode.Open);
var block = new byte[BLOCK_SIZE];
// ReadAsync may return fewer bytes than requested, so capture the count.
int bytesRead = await stream.ReadAsync(block, 0, BLOCK_SIZE);

from Grokipedia
In computing, a block is a fixed-length sequence of bytes or bits that serves as the fundamental unit for storing, addressing, and transferring data on storage devices such as hard disk drives or solid-state drives. The physical sector size (the smallest unit a device can read or write) is commonly 4096 bytes (4 KiB) on modern storage devices, although 512-byte emulation (512e) is often used for backward compatibility. Logical block sizes used by operating systems and file systems are typically 4096 bytes or larger, such as 8192 bytes, to optimize performance and reduce overhead. Each block is assigned a unique identifier or address, enabling direct access without needing to traverse a file hierarchy.

Block storage architectures, which rely on these units, decouple data from specific user environments and store blocks independently across systems, often in storage area networks (SANs) or cloud infrastructures. This approach allows for high-speed, low-latency operations, making it ideal for applications requiring raw performance, such as relational databases, virtual machines, and enterprise workloads that demand consistent input/output operations per second (IOPS). Unlike file storage, which organizes data into hierarchical structures, or object storage, which treats data as discrete objects with metadata, block storage provides a raw, unstructured volume akin to a virtual hard drive.

The concept of blocks has evolved with storage technology; early magnetic tapes and disks used variable-length records, but fixed-size blocks became standard in modern systems to simplify addressing and error handling. Block sizes influence storage efficiency: larger blocks reduce metadata overhead but can waste space for small files (internal fragmentation), while smaller blocks minimize waste at the cost of increased management complexity. In contemporary cloud environments, block storage services such as Amazon Elastic Block Store (EBS) provision scalable volumes with configurable performance tiers.

Definition and Characteristics

Core Definition

In data storage, a block is defined as a fixed-length sequence of bytes or bits that serves as the fundamental unit for organizing and accessing data on storage media. This structure allows storage systems to treat the block as an indivisible, atomic entity during read and write operations, enabling efficient management of physical storage space. The primary purpose of using blocks is to minimize overhead in data handling, transmission, and storage by aggregating multiple smaller records or units into a single contiguous chunk, thereby reducing the frequency of individual I/O operations and associated control information. For instance, on magnetic tapes, blocking groups logical records to decrease the number of physical tape movements and inter-block gaps, optimizing capacity use and lowering processing time.

Unlike variable-length records or continuous data streams, which require dynamic sizing and metadata for each unit, blocks enforce a uniform, predetermined size that simplifies addressing and ensures predictable performance in both sequential and random access scenarios. Block sizes vary depending on the storage technology and system requirements, typically 512 bytes for physical sectors on hard disks or 4096 bytes for logical blocks in file systems. On traditional media such as magnetic tapes or hard disks, blocks facilitate reliable data retrieval by aligning with the device's native transfer mechanisms, such as sector groupings on disks for random access or record bundling on tapes for streaming.
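
The blocking and deblocking of fixed-length records can be illustrated with a short Python sketch. It is a minimal illustration only; the record size, block size, zero-byte padding, and function names are assumptions chosen for clarity, not taken from any particular system.

RECORD_SIZE = 80       # e.g., one punched-card-sized logical record
BLOCK_SIZE = 4096      # fixed physical block size
RECORDS_PER_BLOCK = BLOCK_SIZE // RECORD_SIZE

def block_records(records):
    """Blocking: pack fixed-length records into fixed-length blocks."""
    blocks = []
    for i in range(0, len(records), RECORDS_PER_BLOCK):
        chunk = b"".join(records[i:i + RECORDS_PER_BLOCK])
        # Pad the final block so every block has the same physical length.
        blocks.append(chunk.ljust(BLOCK_SIZE, b"\x00"))
    return blocks

def deblock_records(blocks, record_count):
    """Deblocking: extract the original records from the blocks."""
    data = b"".join(blocks)
    return [data[i * RECORD_SIZE:(i + 1) * RECORD_SIZE] for i in range(record_count)]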

Key Properties

Blocks in data storage are defined by their structural properties as contiguous sequences of bytes, forming fixed-granularity units that enable uniform access across storage media. This organization treats each block as a self-contained, equal-sized chunk with a unique address, allowing independent storage and retrieval without dependency on adjacent data. Operations on these blocks exhibit atomicity, where reads and writes occur entirely at the block level, preventing partial access or modification to maintain consistency during I/O transactions.

Error handling in block storage relies on blocks as the primary units for integrity mechanisms, including checksums and error-correcting codes (ECC). Checksums, such as CRC32, are calculated over each block to detect corruptions like bit rot or misdirected writes, with verification performed on every read. In redundant configurations like RAID, blocks within a stripe incorporate parity data for error correction, enabling reconstruction of corrupted blocks by XORing valid ones while re-verifying their checksums.

Blocks frequently encompass multiple logical records or file fragments, optimizing space utilization in file systems. To accommodate varying record sizes, unused portions of a block are filled with padding bytes or delimited to prevent overlap, thereby reducing internal fragmentation without altering the block's fixed structure. By grouping data into blocks, storage systems improve efficiency in I/O operations, particularly by minimizing seek times through reordered access patterns that prioritize contiguous or nearby block reads. This approach also lowers buffer overhead, as larger block transfers amortize the cost of disk head movements and memory allocations, enhancing throughput for sequential workloads. Alignment of blocks to hardware boundaries further contributes to performance gains by avoiding partial sector reads, though detailed optimization depends on specific system configurations.
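
The per-block checksum and parity ideas above can be sketched briefly in Python. This is an illustrative model only: it assumes a simple RAID-like stripe of equal-sized blocks and uses CRC32 from the standard zlib module, whereas real systems store checksums and parity in device- or file-system-specific metadata.

import zlib

BLOCK_SIZE = 4096

def checksum(block):
    # Per-block integrity check, verified on every read.
    return zlib.crc32(block)

def parity_block(data_blocks):
    # Parity for one stripe: byte-wise XOR of all data blocks.
    parity = bytearray(BLOCK_SIZE)
    for block in data_blocks:
        for i, byte in enumerate(block):
            parity[i] ^= byte
    return bytes(parity)

def reconstruct(surviving_blocks, parity):
    # Rebuild a lost block by XORing the parity with the surviving blocks.
    rebuilt = bytearray(parity)
    for block in surviving_blocks:
        for i, byte in enumerate(block):
            rebuilt[i] ^= byte
    return bytes(rebuilt)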

Historical Development

Origins in Early Computing

The concept of organizing data into fixed units emerged in the mid-20th century from mechanical data processing precedents, influencing early digital storage designs. In the 1950s, punch cards standardized by IBM at 80 columns per card provided a fixed-size unit for batch data entry and tabulation, ensuring mechanical compatibility and reliability in electromechanical systems like the IBM 407 accounting machine. Similarly, early magnetic tape systems, such as the IBM 726 drive introduced in 1952, grouped data into records mimicking punch card lengths, with blocks separated by interblock gaps of about 25 mm (1 inch) to allow reliable start-stop operations under high acceleration forces exceeding 500 g, thereby minimizing tape wear and errors in vacuum-tube-era hardware.

These approaches were motivated by the inefficiencies of byte-by-byte or character-level access in early computing hardware, constrained by memory limitations and nascent storage technology, which made frequent mechanical movements slow and prone to failure. Block organization enabled bulk data transfers that approximated natural record sizes, reducing overhead from error detection and correction while optimizing throughput in systems handling large volumes of sequential data.

The formal introduction of the "block" as a fundamental unit in digital data storage is credited to Werner Buchholz in IBM's 1962 Project Stretch documentation, where it was defined as the quantity of data, often a multiple of 64 bits, transferred to or from an input-output unit in a single operation, facilitating efficient processing by aligning with typical record lengths and avoiding partial-word transmissions. This definition supported high-speed applications, such as IBM's SABRE airline reservation system deployed in 1964, which relied on block transfers across tapes and disks for real-time transaction handling on IBM 7090 mainframes. One of the earliest standardized implementations of block-based storage was IBM's 9-track tape format, launched with the 2401 drive in 1964, which structured data into blocks of 8-bit bytes separated by gaps, providing a reliable framework for mainframe data interchange and compatibility with prior 7-track systems.

Evolution Across Storage Media

In the 1970s, the introduction of hard disk drives (HDDs) marked a significant shift toward more reliable and higher-capacity block storage, with IBM's 3340 model, known for its Winchester architecture, pioneering sealed, lubricated disk designs that reduced contamination and enabled consistent block access. This architecture influenced subsequent HDD developments, leading to the widespread adoption of 512-byte sectors as the fundamental block unit by the late 1970s and early 1980s, particularly as HDDs transitioned from mainframe environments to broader applications.

During the 1980s, removable media like floppy disks further adapted block-based storage to personal computing needs, with sector sizes varying by format: early single-density 5¼-inch disks often used 256-byte sectors with around 10 sectors per track, evolving to 512-byte sectors and 9 to 18 sectors per track in double- and high-density variants by the mid-to-late decade. Optical media emerged concurrently, with compact discs (CDs) introduced in 1982 for audio and adapted for data as CD-ROM in 1985; the ISO 9660 standard (1988) specified fixed 2048-byte logical blocks to optimize data density on the 120 mm discs and support up to 74 minutes of audio or equivalent data capacity. This approach carried over to digital versatile discs (DVDs) in the 1990s, maintaining 2048-byte blocks while accommodating layered structures for increased storage, up to 4.7 GB per single-layer disc.

The transition from mainframe to personal computing in the 1980s and 1990s saw operating systems like UNIX and MS-DOS integrate block storage as a core abstraction, treating disks and tapes as sequences of fixed-size blocks for efficient I/O operations, with MS-DOS's FAT file system, originating around 1980, managing clusters as multiples of 512-byte sectors on HDDs and floppies. UNIX variants similarly adopted block devices during this period to support portable file systems like ext, enabling seamless data handling across diverse hardware. Standardization efforts by ANSI and ISO reinforced these adaptations, with ANSI X3.27 (first issued in 1969 and revised through the 1980s and 1990s) defining magnetic tape labels and file structures for information interchange, including block-level record formats, with updates for compatibility with evolving densities. For optical discs, ISO standards like ISO 9660 (1988) and subsequent amendments through the 1990s established block-level protocols, ensuring interoperability in data storage and transfer.

Technical Implementation

Block Size and Alignment

In data storage systems, the block size refers to the fixed unit of data that storage devices and file systems manage for read and write operations. Historically, hard disk drives (HDDs) used a sector size of 512 bytes as the standard unit, established as the smallest addressable storage element since the inception of modern HDDs. This size facilitated compatibility across early computing hardware and software. In contemporary systems, block sizes have shifted to larger defaults for improved efficiency; for instance, the Linux ext4 file system typically employs a 4 KB (4096-byte) block size when created with default parameters on modern hardware. Similarly, Microsoft's NTFS file system defaults to a 4 KB cluster size for volumes up to 16 TB, balancing storage density and access speed.

Several factors determine block size selection, primarily hardware constraints and performance considerations. Hardware limitations, such as NAND flash memory page sizes ranging from 4 KB to 16 KB in solid-state drives (SSDs), necessitate block sizes that align with these to optimize erase and program operations. Performance trade-offs also play a key role: larger blocks reduce the overhead of metadata management and seek times on HDDs by amortizing fixed costs per I/O operation, but they can increase internal waste for small files. Conversely, smaller blocks minimize unused space per file but elevate the ratio of metadata to data, potentially degrading throughput in high-volume workloads.

Block alignment ensures that logical block boundaries in the file system or partition coincide with the physical sector boundaries of the underlying storage device, preventing inefficient access patterns. Misalignment occurs when a write operation spans partial sectors, triggering a read-modify-write (RMW) cycle: the device reads the entire sector, modifies the relevant portion, and rewrites the sector, which consumes additional bandwidth and latency. This penalty can significantly reduce I/O performance in misaligned scenarios, particularly on drives with 4 KB physical sectors emulating 512-byte logical sectors. Proper alignment, often achieved during partitioning or formatting, eliminates these cycles and enhances overall efficiency.

A key implication of fixed block sizes is slack space, the internal fragmentation left unused within the last allocated block of a file. For a file of size F stored with block size B, the slack is B - (F mod B) when F mod B is not zero, and zero otherwise. For example, with a 4 KB block size, a 5 KB file occupies two blocks (8 KB total), resulting in 3 KB of slack in the second block. A smaller 1 KB file in the same system leaves 3 KB of its single 4 KB block unused, highlighting how slack accumulates disproportionately for small files and contributes to storage inefficiency.
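
As a concrete illustration of the relationship above, the short Python sketch below computes the number of allocated blocks and the slack space for a given file size; the 4096-byte block size is simply the common default discussed in this section, and the function name is hypothetical.

BLOCK_SIZE = 4096  # common default block size

def allocation(file_size, block_size=BLOCK_SIZE):
    """Return (blocks_allocated, slack_bytes) for a file of file_size bytes."""
    blocks = -(-file_size // block_size)  # ceiling division
    remainder = file_size % block_size
    slack = 0 if remainder == 0 else block_size - remainder
    return blocks, slack

print(allocation(5 * 1024))   # (2, 3072): a 5 KB file wastes 3 KB of its second block
print(allocation(1 * 1024))   # (1, 3072): a 1 KB file wastes 3 KB of its only block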

Block Addressing and Allocation

In block storage systems, addressing schemes determine how data blocks are located on the underlying hardware. Logical block addressing (LBA) is the predominant method in modern storage devices, where blocks are sequentially numbered from 0 to n-1, abstracting away physical details and allowing the operating system to treat the device as a simple array of blocks. This contrasts with physical addressing, which directly references hardware-specific locations, and the legacy cylinder-head-sector (CHS) scheme used in early hard disk drives (HDDs), where blocks were identified by cylinder, head, and sector coordinates to navigate platter geometry. LBA emerged to overcome CHS limitations, such as the 8 GB addressable limit in older implementations, enabling scalable access without hardware reconfiguration.

Block allocation methods track and assign free space to ensure efficient use of storage. Bitmap allocation uses a compact bit array in which each bit represents the status of one block, typically 1 for free and 0 for allocated, allowing quick scans for available space with minimal overhead for large volumes. Linked allocation chains blocks via pointers stored within each block, forming a list for each allocated unit, which supports dynamic sizing but can degrade performance due to non-contiguous access. Extent-based allocation improves on this by grouping contiguous blocks into extents, reducing metadata overhead and enhancing sequential read/write efficiency compared to purely linked or bitmap approaches alone.

Metadata structures maintain mappings between logical addresses and allocated blocks. Superblocks store global information, such as total block count and free space summaries, while inodes serve as per-object descriptors containing block pointers. A representative example is the UNIX inode, which includes 12 direct pointers to data blocks for small allocations, plus single, double, and triple indirect pointers for larger files, enabling addressing of up to millions of blocks without excessive metadata bloat.

Deallocation processes reclaim space while preventing overlaps or leaks. In bitmap methods, deallocation flips the corresponding bit to indicate availability, updating free space counters in the superblock. For linked or extent allocations, pointers are severed, and chains are traversed to mark all affected blocks free. In managed storage like flash-based systems, garbage collection periodically identifies invalid blocks, relocates live data to new locations, and erases obsolete ones to maintain write performance and prevent wear imbalances. These mechanisms ensure atomic updates, often using journaling or copy-on-write to avoid partial failures during deallocation.
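
A minimal Python sketch of bitmap-style allocation is shown below. It is purely illustrative: the class name, the 1-means-free convention, the use of one byte per block instead of a packed bit array, and the volume size are assumptions rather than details of any particular file system, but the scan, allocate, and bit-flip cycle mirrors the description above.

class BitmapAllocator:
    """Toy free-space bitmap: one entry per block, 1 = free, 0 = allocated."""

    def __init__(self, total_blocks):
        self.bits = bytearray([1]) * total_blocks  # one byte per block for simplicity
        self.free_count = total_blocks             # summary counter, as a superblock would keep

    def allocate(self):
        # Scan for the first free block, mark it allocated, and return its LBA.
        for lba, free in enumerate(self.bits):
            if free:
                self.bits[lba] = 0
                self.free_count -= 1
                return lba
        raise RuntimeError("no free blocks")

    def deallocate(self, lba):
        # Flip the bit back to free and update the free-space summary.
        if self.bits[lba]:
            raise ValueError("block already free")
        self.bits[lba] = 1
        self.free_count += 1

allocator = BitmapAllocator(total_blocks=8)
first = allocator.allocate()   # LBA 0
second = allocator.allocate()  # LBA 1
allocator.deallocate(first)    # LBA 0 becomes available again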

Applications in Storage Systems

Block Devices and Interfaces

Block devices are hardware components that provide access to fixed-size units of data, known as blocks, and are typically used for mass storage such as hard disk drives (HDDs) and solid-state drives (SSDs). These devices abstract the underlying storage medium, allowing the operating system to interact with them as sequences of addressable blocks, often 512 bytes or multiples thereof, without regard to the physical layout. In Unix-like systems, block devices are represented as special files in the /dev directory, such as /dev/sda for the first SCSI or SATA disk, enabling direct addressing by logical block numbers.

Common interfaces for block devices include the Small Computer System Interface (SCSI), which uses command descriptor blocks (CDBs) to perform operations like reading or writing specific blocks via commands such as READ(10) and WRITE(10). For consumer-grade HDDs, the AT Attachment (ATA) or Integrated Drive Electronics (IDE) interface facilitates block I/O through parallel data transfer modes, supporting up to PIO mode 4 for speeds around 16.6 MB/s in earlier implementations. Networked block access is enabled by protocols like Fibre Channel, a high-speed serial interface for storage area networks (SANs) that encapsulates SCSI commands over fiber optic links for block-level transfers, and iSCSI, which transports SCSI commands over TCP/IP for IP-based SANs. Modern developments include NVMe over Fabrics (NVMe-oF), which provides low-latency, high-performance block access over networks like Ethernet or Fibre Channel for data center environments.

In contrast to character devices, which handle data as a continuous stream of bytes without inherent buffering, such as serial ports or keyboards, block devices employ kernel-level buffering to optimize I/O by caching blocks in memory, reducing direct hardware accesses. This distinction ensures efficient random access for storage operations, as block drivers manage requests in multiples of the block size, while character drivers process data byte-by-byte. Raw access to block devices bypasses higher-level caching in some cases; for instance, the dd command in Linux can copy blocks directly from a device like /dev/sda using syntax such as dd if=/dev/sda of=output.img bs=512, reading and writing fixed block sizes without filesystem intervention. Similarly, programming interfaces allow direct block I/O via the open() system call with the O_DIRECT flag, which enforces aligned buffer access to avoid the page cache and interact directly with the device driver.
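
The O_DIRECT pattern can be sketched in Python on Linux, where os.O_DIRECT is exposed by the os module. The device path and privileges are assumptions (reading /dev/sda normally requires root, and any regular file on a local file system behaves the same way), and the anonymous mmap allocation is used only because it yields a page-aligned buffer, which O_DIRECT requires.

import mmap
import os

BLOCK_SIZE = 4096
DEVICE = "/dev/sda"  # assumed path; typically requires elevated privileges

# Open the block device bypassing the kernel page cache.
fd = os.open(DEVICE, os.O_RDONLY | os.O_DIRECT)
try:
    # O_DIRECT requires the user buffer to be aligned to the logical block size;
    # anonymous mmap memory is page-aligned, which satisfies this on common systems.
    buf = mmap.mmap(-1, BLOCK_SIZE)
    os.preadv(fd, [buf], 0)        # read the first block directly from the device
    first_block = bytes(buf)
finally:
    os.close(fd)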

Role in File Systems and Databases

In file systems, blocks serve as the fundamental units for organizing and storing both file data and metadata on storage devices. File systems abstract the underlying block devices by mapping logical file structures to these fixed-size blocks, enabling efficient allocation and retrieval. For instance, the File Allocation Table (FAT) file system treats clusters, groups of blocks whose size is fixed per volume, as the basic allocation unit, where each cluster holds file data or directory entries, with allocation tracked via a table that links clusters sequentially for each file. Similarly, the ext4 file system employs blocks for metadata such as inodes, which store file attributes and pointers to data blocks, while supporting extents (contiguous sequences of blocks) for large files to reduce fragmentation in metadata.

In database management systems (DBMS), blocks form the core of data organization, particularly for storing table rows, indexes, and other structures within files or tablespaces. Oracle Database divides storage into data blocks, with a default size of 8 KB, which contain headers, row data, and row directory information; these blocks are grouped into extents and segments to manage table and index storage efficiently. MySQL's InnoDB storage engine uses pages (effectively 16 KB blocks) as the unit for B-tree indexes and clustered tables, where leaf pages hold row data and non-leaf pages store index keys, optimizing for both range scans and point queries. Log-structured merge-trees (LSM-trees), employed in certain NoSQL databases, organize data into immutable block-based sorted string tables (SSTables) that are periodically merged, allowing high write throughput by appending sequential blocks before compaction.

To address space inefficiency with small files that do not fill entire blocks, file systems implement suballocation techniques. ReiserFS (now obsolete and removed from the Linux kernel as of 2024) used tail packing, which stores the tail portions of small files, or entire small files, in unused space within formatted blocks (up to the available space in a tree node, typically several kilobytes for 4 KB blocks), thereby accommodating multiple small files per block and reducing internal fragmentation. In contrast, Btrfs leverages copy-on-write (COW) mechanisms, where modifications to small files create new blocks without overwriting originals, enabling efficient snapshotting and deduplication while packing small extents into larger block allocations to minimize waste.

Access patterns in file systems significantly influence block I/O efficiency, particularly when comparing journaling and non-journaling designs. Journaling file systems, such as ext4, commit metadata changes to a sequential log before applying them to the main structure, converting random metadata writes into sequential block appends for faster recovery after crashes, though data writes may still involve random block updates. Non-journaling systems, such as traditional FAT, rely on direct in-place block modifications, which can lead to slower random access for scattered updates but simpler sequential reads for contiguous files without the overhead of log maintenance.
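
The cluster-chaining idea used by FAT can be sketched as a simple table lookup in Python. The table contents and end-of-chain marker below are illustrative assumptions rather than the on-disk FAT encoding, but the traversal mirrors how a file's clusters are followed link by link.

END_OF_CHAIN = -1  # illustrative marker (real FAT variants use reserved end-of-chain values)

# fat[i] holds the number of the cluster that follows cluster i in a file's chain.
fat = {2: 5, 5: 7, 7: END_OF_CHAIN,   # a file occupying clusters 2 -> 5 -> 7
       3: 4, 4: END_OF_CHAIN}         # another file occupying clusters 3 -> 4

def cluster_chain(first_cluster):
    """Follow the allocation table from a file's first cluster to its last."""
    chain = []
    cluster = first_cluster
    while cluster != END_OF_CHAIN:
        chain.append(cluster)
        cluster = fat[cluster]
    return chain

print(cluster_chain(2))  # [2, 5, 7]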

Advanced Concepts and Modern Developments

Fragmentation, Optimization, and Performance

In block storage systems, two primary types of fragmentation affect efficiency and performance. Internal fragmentation, also known as slack space, occurs within allocated blocks when the fixed block size exceeds the actual data size, leaving unused portions wasted. This inefficiency arises because storage allocation rounds up to the nearest full block, resulting in underutilized space that cannot be allocated to other data. External fragmentation, in contrast, involves scattered free blocks across the device, which prevents the allocation of large contiguous regions despite sufficient total free space and increases mechanical seek overhead on traditional hard disk drives. External fragmentation's performance impact stems from its effect on I/O patterns, where non-contiguous access amplifies latency. In aged systems, fragmentation can cause significant degradation through elevated seek times and I/O bottlenecks, as observed in studies of file system workloads with scattered block access.

Optimization techniques address these issues by reorganizing storage and improving allocation strategies. Defragmentation tools consolidate scattered blocks to restore contiguity; for example, the Windows Defragment and Optimize Drives tool rearranges files to reduce access times. Extent allocation further mitigates external fragmentation by assigning contiguous ranges of blocks as single units rather than individual blocks, as implemented in file systems like ext4, which lowers metadata overhead and seek operations. This approach, building on traditional block allocation methods, groups related data to prevent scattering over time.

Block-level caching in operating system kernels provides another layer of performance enhancement by buffering data in memory. In Linux, the page cache stores file blocks as pages, enabling faster reads and writes by serving repeated requests from RAM instead of disk, with readahead mechanisms prefetching blocks to boost sequential I/O throughput by factors of 2-5 in benchmarks. This caching reduces the effective impact of fragmentation by minimizing physical disk accesses during common operations.
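
A small Python sketch makes the external-fragmentation distinction concrete: even when many blocks are free in total, the largest contiguous run may be too small to satisfy a large allocation. The free-block map below is an invented example used only for illustration.

# 1 = free block, 0 = allocated block; an invented, fragmented free-space map.
free_map = [1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1]

def largest_free_run(free_map):
    """Length of the longest contiguous run of free blocks."""
    best = current = 0
    for free in free_map:
        current = current + 1 if free else 0
        best = max(best, current)
    return best

total_free = sum(free_map)              # 10 blocks are free in total
largest = largest_free_run(free_map)    # but at most 3 of them are contiguous
print(total_free, largest)              # a 4-block contiguous request would fail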

Integration with Emerging Technologies

In solid-state drives (SSDs) based on NAND flash memory, block storage aligns with the underlying hardware architecture, where logical blocks map to physical pages typically sized between 4 KB and 16 KB to optimize read and write operations. This alignment ensures efficient data placement, as NAND flash operates on page-level reads and block-level erases, with blocks often comprising 128 to 256 pages. To manage the erase-before-write constraint of NAND cells, SSD controllers employ over-provisioning, reserving extra capacity (typically 7-28% beyond user-accessible space) to facilitate garbage collection and maintain performance. Wear leveling algorithms further distribute write operations evenly across blocks, preventing premature wear on frequently used cells and extending device endurance, often achieving drive writes per day (DWPD) ratings of 3-5 over five years with adequate over-provisioning.

The Non-Volatile Memory Express (NVMe) protocol enhances block storage access in high-speed environments by leveraging the Peripheral Component Interconnect Express (PCIe) interface, enabling direct low-latency communication between host systems and SSDs without the overhead of legacy protocols like SATA or AHCI. NVMe supports up to 64,000 parallel command queues with 64,000 commands per queue, allowing efficient handling of I/O-intensive workloads and reducing protocol latency to microseconds compared to milliseconds in traditional interfaces. This queueing mechanism minimizes CPU utilization and context-switching overhead, making it ideal for block-level operations in data centers where parallel access to storage blocks is critical.

Cloud computing has integrated block storage through virtualized services that abstract physical hardware, providing scalable block devices for virtual machines. Amazon Elastic Block Store (EBS) offers persistent block-level storage volumes attachable to EC2 instances, supporting file systems and databases with automatic replication within an Availability Zone for durability. Similarly, Azure Managed Disks deliver block storage as high-performance volumes for Azure Virtual Machines, with options for premium SSDs achieving up to 20,000 IOPS per disk and zone-redundant storage ensuring 99.999% availability across replicas. These services enable scalability by allowing dynamic volume resizing and multi-terabyte capacities, but distributed block management introduces challenges such as maintaining consistency across replicas and handling latency in cross-region replication.

As of 2025, emerging trends in block storage include Zoned Namespaces (ZNS) in NVMe SSDs, which partition namespaces into fixed-size zones (e.g., 256 MB) for sequential block writes, aligning host commands with NAND erase block boundaries to bypass internal fragmentation. ZNS reduces write amplification, minimizes over-provisioning needs, and improves tail latency and throughput by offloading zone management to the host, as specified in the NVMe ZNS Command Set Revision 1.4. Additionally, AI-driven storage systems are incorporating dynamic optimization for block management, using machine learning to adapt block sizing and allocation in real time based on workload patterns, enhancing efficiency in AI training environments with exabyte-scale data.
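
The zone discipline that ZNS imposes, in which each zone accepts writes only at its current write pointer and must be reset before its blocks can be reused, can be modeled in a few lines of Python. This is a conceptual sketch of the host-side bookkeeping, not an NVMe implementation; the zone capacity and class name are arbitrary assumptions.

ZONE_BLOCKS = 64  # assumed zone capacity in blocks (real zones are on the order of 256 MB)

class Zone:
    """Toy model of a ZNS zone: writes must land at the write pointer, in order."""

    def __init__(self, start_lba):
        self.start_lba = start_lba
        self.write_pointer = start_lba        # next LBA that may be written

    def append(self, n_blocks=1):
        if self.write_pointer + n_blocks > self.start_lba + ZONE_BLOCKS:
            raise RuntimeError("zone full: reset required before rewriting")
        lba = self.write_pointer
        self.write_pointer += n_blocks        # sequential-only write discipline
        return lba

    def reset(self):
        # Corresponds to erasing the underlying NAND blocks and reusing the zone.
        self.write_pointer = self.start_lba

zone = Zone(start_lba=0)
zone.append(8)      # writes LBAs 0-7
zone.append(8)      # writes LBAs 8-15; out-of-place updates are not possible
zone.reset()        # whole-zone reset before any block can be rewritten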
