Virtual Storage Access Method
from Wikipedia

Virtual Storage Access Method (VSAM)[1] is an IBM direct-access storage device (DASD) file storage access method, first used in the OS/VS1, OS/VS2 Release 1 (SVS) and Release 2 (MVS) operating systems, later used throughout the Multiple Virtual Storage (MVS) architecture and now in z/OS. Originally a record-oriented filesystem,[NB 2] VSAM comprises four[NB 2] data set organizations: key-sequenced (KSDS), relative record (RRDS), entry-sequenced (ESDS) and linear (LDS).[2] The KSDS, RRDS and ESDS organizations contain records, while the LDS organization (added later to VSAM) contains a sequence of pages with no intrinsic record structure, for use as a memory-mapped file.

Overview

An IBM Redbook named "VSAM PRIMER" (especially when used with the "Virtual Storage Access Method (VSAM) Options for Advanced Applications" manual) explains the concepts needed to make use of VSAM.[3] IBM uses the term data set in official documentation as a synonym for file, and direct-access storage device (DASD) for devices with random access to data locations, such as disk drives, as opposed to devices such as tape drives that can only be read sequentially.

VSAM records can be of fixed or variable length. They are organised in fixed-size blocks called control intervals (CIs),[4][5] which are in turn grouped into larger divisions called control areas (CAs). Control interval sizes are measured in bytes (for example, 4 kilobytes), while control area sizes are measured in disk tracks or cylinders. Control intervals are the units of transfer between disk and computer, so a read request will read one complete control interval. Control areas are the units of allocation, so when a VSAM data set is defined, an integral number of control areas is allocated.
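The allocation rule above can be sketched in a few lines of Python. This is a minimal illustration, not VSAM's actual space algorithm. It makes three assumptions: records are fixed-length and unspanned; each CI carries 10 bytes of overhead (a 4-byte control interval definition field plus one pair of 3-byte record definition fields, the layout VSAM uses for a run of equal-length records); and there is no free space or index. The function names and the figures in the usage lines are hypothetical.

```python
import math

# Assumed control information per CI for fixed-length records:
# a 4-byte CIDF plus one pair of 3-byte RDFs.
CI_OVERHEAD = 4 + 2 * 3


def records_per_ci(ci_size: int, record_len: int) -> int:
    """Number of whole fixed-length records that fit in one control interval."""
    return (ci_size - CI_OVERHEAD) // record_len


def control_areas_needed(num_records: int, record_len: int,
                         ci_size: int, cis_per_ca: int) -> int:
    """Control areas to allocate; allocation is always a whole number of CAs."""
    per_ci = records_per_ci(ci_size, record_len)
    if per_ci == 0:
        raise ValueError("record does not fit in one CI (spanning is not modelled)")
    cis = math.ceil(num_records / per_ci)   # CIs needed to hold every record
    return math.ceil(cis / cis_per_ca)      # round up to an integral number of CAs


if __name__ == "__main__":
    # 10,000 records of 100 bytes in 4 KB CIs, 180 CIs per CA (illustrative numbers).
    print(records_per_ci(4096, 100))                     # 40
    print(control_areas_needed(10_000, 100, 4096, 180))  # 2
```

In this example 250 CIs are needed. That is less than two full control areas, but the whole second CA is still allocated.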

The Access Method Services utility program IDCAMS is commonly used to manipulate ("delete and define") VSAM data sets. Custom programs can access VSAM datasets through Data Definition (DD) statements in Job Control Language (JCL), via dynamic allocation or in online regions such as in Customer Information Control System (CICS).
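As an illustration of the "delete and define" pattern, the following Python helper builds IDCAMS control statements for a KSDS. The data set name, key position, sizes and the helper itself are hypothetical. The statements use standard IDCAMS DEFINE CLUSTER parameters (INDEXED, KEYS, RECORDSIZE, CONTROLINTERVALSIZE, FREESPACE, TRACKS). A real job would likely add site-specific options such as volumes or an SMS storage class.

```python
def delete_define_ksds(name: str, key_len: int, key_off: int,
                       avg_rec: int, max_rec: int, ci_size: int = 4096,
                       freespace: tuple = (20, 10), tracks: tuple = (10, 5)) -> str:
    """Return IDCAMS statements that delete, then redefine, a KSDS cluster."""
    statements = [
        f" DELETE {name} CLUSTER PURGE",
        # Reset the condition code so a 'not found' on the first run is ignored.
        " SET MAXCC = 0",
        f" DEFINE CLUSTER (NAME({name}) -",
        "        INDEXED -",                                    # key-sequenced
        f"        KEYS({key_len} {key_off}) -",                 # key length, offset
        f"        RECORDSIZE({avg_rec} {max_rec}) -",           # average, maximum
        f"        CONTROLINTERVALSIZE({ci_size}) -",
        f"        FREESPACE({freespace[0]} {freespace[1]}) -",  # CI %, CA %
        f"        TRACKS({tracks[0]} {tracks[1]}))",            # primary, secondary
    ]
    # Each statement starts with a blank because IDCAMS reads columns 2-72.
    return "\n".join(statements)


if __name__ == "__main__":
    print(delete_define_ksds("HYPO.TEST.KSDS", key_len=8, key_off=0,
                             avg_rec=100, max_rec=200))
```

The generated text would be placed in the SYSIN input of an IDCAMS job step. The trailing hyphens are IDCAMS continuation characters.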

Both IMS/DB[citation needed] and Db2[6][7] are implemented on top of VSAM and use its underlying data structures.

Files

The physical organization of VSAM data sets differs considerably from the organizations supported by other access methods, as follows.

A VSAM file is defined as a cluster of VSAM components, e.g., for KSDS a DATA component and an INDEX component.

Control intervals and control areas

VSAM components consist of fixed-length physical blocks grouped into fixed-length control intervals[4][5] (CI) and control areas (CA). The size of the CI and CA is determined by Access Method Services (AMS), and the way in which they are used is normally not visible to the user. There is a fixed number of control intervals in each control area.

A control interval normally contains multiple records. The records are stored within the control interval starting from the low address upwards. Control information is stored at the other end of the control interval, starting from the high address and moving downwards. The space between the records and the control information is free space. The control information comprises two types of entry: a control interval definition field (CIDF), which is always present, and record definition fields (RDFs), which are present when there are records within the control interval and describe the length of the associated records. Free space within a CI is always contiguous.
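The layout described above can be modelled with a byte buffer. The Python sketch below is a simplification written for illustration, not VSAM's actual code. It assumes one 3-byte RDF (a flag byte and a 2-byte length) per record and a 4-byte CIDF holding the free-space offset and length, both in big-endian order. It does not handle spanned records or the RDF pairing VSAM uses for runs of equal-length records. The class and method names are hypothetical.

```python
import struct

CIDF_LEN = 4   # control interval definition field: free-space offset + length
RDF_LEN = 3    # record definition field: 1 flag byte + 2-byte record length


class ControlInterval:
    """Simplified CI: records grow upward from offset 0, control fields sit at the end.

    Assumes one RDF per record and ignores spanned records.
    """

    def __init__(self, size: int = 4096):
        self.buf = bytearray(size)
        self.records: list = []
        self._rewrite()

    def _rewrite(self) -> None:
        """Repack records and control fields so the free space stays contiguous."""
        size = len(self.buf)
        pos = 0
        for rec in self.records:                 # records from the low address upward
            self.buf[pos:pos + len(rec)] = rec
            pos += len(rec)
        for i, rec in enumerate(self.records):   # RDFs from the CIDF downward
            at = size - CIDF_LEN - (i + 1) * RDF_LEN
            self.buf[at:at + RDF_LEN] = struct.pack(">BH", 0, len(rec))
        control_start = size - CIDF_LEN - len(self.records) * RDF_LEN
        # CIDF: offset of the free space, then its length (big-endian halfwords).
        struct.pack_into(">HH", self.buf, size - CIDF_LEN, pos, control_start - pos)

    def free_space(self) -> int:
        _, length = struct.unpack_from(">HH", self.buf, len(self.buf) - CIDF_LEN)
        return length

    def insert(self, index: int, record: bytes) -> bool:
        """Insert at a position; returns False when the CI would need a split."""
        if len(record) + RDF_LEN > self.free_space():
            return False
        self.records.insert(index, record)       # later records move up
        self._rewrite()
        return True

    def delete(self, index: int) -> None:
        """Remove a record; later records move down so free space stays contiguous."""
        del self.records[index]
        self._rewrite()
```

Each insertion costs the record's length plus 3 bytes of RDF. A `False` return from `insert` is the point at which the control interval would have to be split.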

When records are inserted into a control interval, they are placed in the correct order relative to other records. This may require records to be moved out of the way inside the control interval. Conversely, when a record is deleted, later records are moved down so that the free space remains contiguous. If there is not enough free space in a control interval for a record to be inserted, the control interval is split. Roughly half the records are stored in the original control interval while the remaining records are moved into a new control interval. The new control interval is taken from a pool of free control intervals within the same control area as the original control interval. If there is no remaining free control interval within that control area, the control area itself is split and the control intervals are distributed equally between the old and the new control areas.
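The following toy simulation, with hypothetical names, sketches the split behaviour just described. A full CI moves roughly half its records into a free CI of the same CA. When the CA has no free CI left, half of its CIs first move into a new CA. The sketch counts capacity in records rather than bytes and reduces records to integer keys, so it illustrates the policy only.

```python
import bisect


class KeyedDataComponent:
    """Toy model of CI and CA splits; each CI holds at most `ci_capacity` keys.

    A CA may hold at most `cis_per_ca` CIs; the unused ones stand for free CIs.
    """

    def __init__(self, ci_capacity: int = 4, cis_per_ca: int = 4):
        if ci_capacity < 1 or cis_per_ca < 2:
            raise ValueError("need ci_capacity >= 1 and cis_per_ca >= 2")
        self.ci_capacity = ci_capacity
        self.cis_per_ca = cis_per_ca
        self.cas = [[[]]]          # list of CAs -> list of CIs -> sorted keys
        self.ci_splits = 0
        self.ca_splits = 0

    def _locate(self, key: int):
        """(CA, CI) of the first CI whose highest key >= key, else the last CI."""
        for a, ca in enumerate(self.cas):
            for i, ci in enumerate(ca):
                if ci and ci[-1] >= key:
                    return a, i
        return len(self.cas) - 1, len(self.cas[-1]) - 1

    def insert(self, key: int) -> None:
        a, i = self._locate(key)
        ci = self.cas[a][i]
        if len(ci) < self.ci_capacity:
            bisect.insort(ci, key)                 # room left: insert in key order
        elif len(self.cas[a]) < self.cis_per_ca:   # a free CI exists in this CA
            bisect.insort(ci, key)                 # insert, then split the CI
            half = len(ci) // 2
            self.cas[a].insert(i + 1, ci[half:])   # upper half moves to the free CI
            del ci[half:]
            self.ci_splits += 1
        else:                                      # no free CI: split the CA first
            ca = self.cas[a]
            half = len(ca) // 2
            self.cas.insert(a + 1, ca[half:])      # upper half of the CIs moves out
            del ca[half:]
            self.ca_splits += 1
            self.insert(key)                       # retry; a free CI now exists

    def keys(self):
        return [k for ca in self.cas for ci in ca for k in ci]
```

However the keys arrive, `keys()` returns them in ascending order. That matches the way VSAM keeps records in key sequence across both kinds of split.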

You can use three types of record-orientated file organization with VSAM (the contents of linear data sets have no record structure):

Sequential organization

An entry-sequenced data set (ESDS) is a type of data set organization supported by VSAM.[2] Records are accessed based on their sequential order, that is, the order in which they were written to the file;[8][9][10] this means that a particular record is located either by reading all the records sequentially until it is found, or by using a relative byte address[NB 3] (RBA), i.e., the offset in bytes from the beginning of the file at which the record starts.[11]
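To show why an RBA is a byte offset rather than a record number, here is a small Python sketch with hypothetical names. It assumes fixed-length records that never span control intervals and 10 bytes of control information at the end of each CI. Because that tail is skipped, the RBA sequence jumps at each CI boundary.

```python
class EntrySequencedDataSet:
    """Toy ESDS: fixed-length records appended in entry order, addressed by RBA.

    Assumptions for illustration: records never span CIs, and each CI reserves
    `overhead` bytes of control information at its end, so the first record of
    CI n starts at RBA n * ci_size.
    """

    def __init__(self, ci_size: int = 4096, record_len: int = 100, overhead: int = 10):
        self.ci_size = ci_size
        self.record_len = record_len
        self.per_ci = (ci_size - overhead) // record_len
        self.records = {}          # RBA -> record bytes
        self.count = 0

    def append(self, record: bytes) -> int:
        """Write a record after all existing ones and return its (permanent) RBA."""
        if len(record) != self.record_len:
            raise ValueError("this sketch handles fixed-length records only")
        ci, slot = divmod(self.count, self.per_ci)
        rba = ci * self.ci_size + slot * self.record_len
        self.records[rba] = record
        self.count += 1
        return rba

    def get(self, rba: int) -> bytes:
        """Direct retrieval by RBA; KeyError if no record starts at that byte."""
        return self.records[rba]

    def scan(self):
        """Sequential retrieval in entry (that is, RBA) order."""
        for rba in sorted(self.records):
            yield rba, self.records[rba]
```

With 4 KB CIs and 100-byte records, 40 records fit in each CI. The 40th record therefore has RBA 3900, and the 41st starts the next CI at RBA 4096 rather than 4000.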

Records are loaded irrespective of their contents and their byte addresses cannot be changed.

While an ESDS has no key, alternate indexes (AIXs) may be defined to permit the use of fields as keys.[12]  An alternate index is itself a KSDS.

Indexed organization

A key-sequenced data set (KSDS) is a type of data set supported by VSAM. Each record in a KSDS data file contains a unique key.[13] A KSDS consists of two parts: the data component and a separate index file known as the index component, which allows the system to physically locate the record in the data file by its key value.[14] Together, the data and index components are called a cluster.[15]

Records can be accessed randomly or in sequence and can be variable-length.

As a VSAM data set, the KSDS data and index components consist of control intervals[16] which are further organized in control areas.[17] As records are added at random to a KSDS, control intervals fill and need to be split into two new control intervals, each new control interval receiving roughly half of the records. Similarly, as the control intervals in a control area are used up, a control area will be split into two new control areas, each new control area receiving roughly half the control intervals.[18]

While a basic KSDS only has one key (the primary key), alternate indices may be defined to permit the use of additional fields as secondary keys.[19] An alternate index is itself a KSDS.[20]
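An alternate index can be pictured as a mapping from alternate-key values to the primary keys of base records. The sketch below is an in-memory illustration with hypothetical names, modelling the base cluster as a Python dictionary. For an ESDS base, the AIX would hold RBAs rather than primary keys.

```python
from collections import defaultdict


def build_alternate_index(base: dict, field: str, unique: bool = False) -> dict:
    """Map each alternate-key value to the primary keys of the base records holding it.

    `base` stands in for a KSDS (primary key -> record as a dict); the result is
    ordered by alternate key, as the AIX (itself a KSDS) would be.
    """
    aix = defaultdict(list)
    for pkey in sorted(base):              # read the base cluster in primary-key order
        alt = base[pkey][field]
        if unique and alt in aix:
            raise ValueError(f"duplicate alternate key {alt!r} in a unique AIX")
        aix[alt].append(pkey)
    return dict(sorted(aix.items()))


if __name__ == "__main__":
    customers = {"C003": {"city": "LEEDS"}, "C001": {"city": "YORK"},
                 "C002": {"city": "LEEDS"}}
    print(build_alternate_index(customers, "city"))
    # {'LEEDS': ['C002', 'C003'], 'YORK': ['C001']}
```

Retrieval through the AIX is then a two-step lookup: alternate key to primary keys, then primary key to base record.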

The data structure used by a KSDS is nowadays known as a B+ tree.[21][22]
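The lookup path through a KSDS index can be sketched in Python using binary search at each level. This is a simplified two-level model with hypothetical names. The sequence set keeps one entry (highest key, CI number) per data CI, and the index set keeps the highest key of each sequence-set record. Real features such as key compression, horizontal pointers and free-CI tracking are omitted, and every data CI is assumed to be non-empty.

```python
import bisect


def build_index(cis, fanout=3):
    """Build a two-level index over data CIs (each CI is a sorted list of keys).

    Sequence set: one (highest key, CI number) entry per data CI, grouped into
    records of `fanout` entries. Index set: the highest key of each such record.
    """
    seq_set = [(ci[-1], n) for n, ci in enumerate(cis)]
    seq_records = [seq_set[i:i + fanout] for i in range(0, len(seq_set), fanout)]
    index_set = [record[-1][0] for record in seq_records]
    return index_set, seq_records


def find(key, cis, index_set, seq_records):
    """Descend index set -> sequence set -> data CI; return (CI number, found)."""
    r = bisect.bisect_left(index_set, key)       # first index entry with high key >= key
    if r == len(index_set):
        return None, False                       # key is above every stored key
    highs = [high for high, _ in seq_records[r]]
    _, ci_no = seq_records[r][bisect.bisect_left(highs, key)]
    ci = cis[ci_no]
    j = bisect.bisect_left(ci, key)              # search inside the one CI read
    return ci_no, j < len(ci) and ci[j] == key
```

Only one data CI is examined per lookup, which is the property the B+ tree structure provides.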

Relative organization

A relative record data set (RRDS) is a type of data set organization supported by VSAM.[2] Records are accessed based on their ordinal position in the file (relative record number, RRN).[23] For example, the desired record to be accessed might be the 42nd record in the file out of 999 total. Like a sequential data set, an RRDS can be processed in record order, but it can also be accessed randomly by RRN, or dynamically by mixing sequential and random requests.

An RRDS consists of data records in sequence, with the record number indicating the record's logical position in the data set.[24] A program can access records randomly using this positional number or access records sequentially.[25] But unlike a Key Sequenced Data Set, an RRDS has no keys, so the program cannot access records by key value.
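RRN addressing reduces to integer arithmetic over fixed slots. The Python class below is a hypothetical in-memory sketch that assumes a fixed number of slots per CI. It shows the RRN-to-slot mapping, reuse of emptied slots, and sequential retrieval in RRN order.

```python
from typing import Optional


class RelativeRecordDataSet:
    """Toy fixed-length RRDS: numbered slots addressed by RRN (starting at 1)."""

    def __init__(self, slots_per_ci: int, num_cis: int):
        self.slots_per_ci = slots_per_ci
        self.slots: list = [None] * (slots_per_ci * num_cis)

    def locate(self, rrn: int):
        """Map an RRN to (CI number, slot within that CI), both counted from 0."""
        if not 1 <= rrn <= len(self.slots):
            raise IndexError(f"RRN {rrn} outside 1..{len(self.slots)}")
        return divmod(rrn - 1, self.slots_per_ci)

    def put(self, rrn: int, record: bytes) -> None:
        self.locate(rrn)                          # validates the RRN
        if self.slots[rrn - 1] is not None:
            raise KeyError(f"slot {rrn} is already occupied")
        self.slots[rrn - 1] = record

    def get(self, rrn: int) -> bytes:
        self.locate(rrn)
        record: Optional[bytes] = self.slots[rrn - 1]
        if record is None:
            raise KeyError(f"slot {rrn} is empty")
        return record

    def erase(self, rrn: int) -> None:
        self.get(rrn)                             # the record must exist first
        self.slots[rrn - 1] = None                # slot stays allocated for reuse

    def scan(self):
        """Sequential retrieval in ascending RRN order, skipping empty slots."""
        for i, record in enumerate(self.slots):
            if record is not None:
                yield i + 1, record
```

With 50 slots per CI (an illustrative figure), RRN 42 maps to CI 0, slot 41, and RRN 51 is the first slot of CI 1.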

Linear organization

A linear data set (LDS) is a type of data set organization supported by VSAM.[2] The LDS has a control interval size of 4096 bytes to 32768 bytes[26] in increments of 4096.[27] An LDS has no embedded control information; because it contains no control information, it cannot be accessed as if it contained individual records.[28]

Addressing within an LDS is by relative byte address (RBA), which allows it to be used by systems such as IBM Db2 or the operating system.[clarification needed] The benefit of this is that systems such as the OS can access multiple disk spindles and view them as a single storage implementation. The limitation, though, is that this byte-level view is not particularly useful for higher-level abstractions.[original research?] Data In Virtual[29] (DIV) and window services[30] provide an alternative to direct use of VSAM for accessing an LDS with a CI size of 4096.
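Because an LDS carries no control information, translating an RBA is pure arithmetic. This Python sketch, with hypothetical function names, checks the CI-size rule quoted above and maps an RBA to its CI, offset and 4 KB page.

```python
PAGE_SIZE = 4096   # DIV and window services work with 4 KB units


def valid_lds_ci_size(ci_size: int) -> bool:
    """LDS control intervals run from 4096 to 32768 bytes in 4096-byte steps."""
    return 4096 <= ci_size <= 32768 and ci_size % 4096 == 0


def locate_rba(rba: int, ci_size: int = 4096):
    """Map an LDS byte address to (CI number, offset in CI, 4 KB page number).

    An LDS holds no CIDFs or RDFs, so every byte of every CI is data and the
    mapping is plain arithmetic.
    """
    if not valid_lds_ci_size(ci_size):
        raise ValueError(f"invalid LDS CI size: {ci_size}")
    if rba < 0:
        raise ValueError("RBA must be non-negative")
    ci_number, offset = divmod(rba, ci_size)
    return ci_number, offset, rba // PAGE_SIZE
```

For example, RBA 10000 lies in CI 2 at offset 1808 with 4 KB CIs, or in CI 1 at the same offset with 8 KB CIs. In both cases it is on 4 KB page 2.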

Data access techniques

There are four types of access techniques for VSAM data:

  • Local Shared Resources (LSR) is optimised for "random" or direct access. LSR access is easy to achieve from CICS.[31]
  • Global Shared Resources (GSR)[32]
  • Non-Shared Resources (NSR), which is optimised for sequential access. NSR access has historically been easier to use than LSR for batch programs.[31]
  • Distributed File Management (DFM), an implementation of a Distributed Data Management Architecture server, enables programs on remote computers to create, manage, and access VSAM files.

Sharing data

Sharing of VSAM data between CICS regions can be done by VSAM Record-Level Sharing (RLS). This adds record caching and, more importantly, record locking. Logging and commit processing remain the responsibility of CICS, which means that sharing of VSAM data outside a CICS environment is severely restricted.

Sharing between CICS regions and batch jobs requires Transactional VSAM, DFSMStvs. This is an optional program that builds on VSAM RLS by adding logging and two-phase commit, using underlying z/OS system services. This permits generalised sharing of VSAM data.

History

VSAM was introduced as a replacement for older access methods[33] and was intended to add function, to be easier to use and to overcome problems of performance and device-dependence. VSAM was introduced in the 1970s when IBM announced virtual storage operating systems (DOS/VS, OS/VS1 and OS/VS2) for its new System/370 series, as successors of the DOS/360 and OS/360 operating systems running on its System/360 computer series. While backwards compatibility was maintained, the older access methods suffered from performance problems due to the address translation required for virtual storage.

The KSDS organization was designed to replace ISAM, the Indexed Sequential Access Method. Changes in disk technology had meant that searching for data in ISAM data sets had become very inefficient. It was also difficult to move ISAM data sets as there were embedded pointers to physical disk locations which became invalid if the data set was moved. IBM also provided a compatibility interface to allow programs coded to use ISAM to use a KSDS instead.

The RRDS organization was designed to replace BDAM, the Basic Direct Access Method. In some cases, BDAM data sets contained embedded pointers which prevented them from being moved. However, most BDAM data sets did not and the incentive to move from BDAM to VSAM RRDS was much less compelling than that to move from ISAM to VSAM KSDS.

Linear data sets were added later, followed by VSAM RLS and then Transactional VSAM.

Notes

References

from Grokipedia
Virtual Storage Access Method (VSAM) is an IBM data management and file access method designed for efficient organization, storage, and retrieval of records on direct-access storage devices (DASDs) in z/OS and related mainframe environments. It supports direct, sequential, and skip-sequential access to fixed- or variable-length records using index keys, relative record numbers, or relative byte addresses, with data sets cataloged for simplified location and management. Primarily used in enterprise applications such as DB2, IMS, and MQSeries, VSAM provides high-performance processing and scalability for batch and online transaction systems. Introduced in the 1970s as part of IBM's OS/VS1 and OS/VS2 operating systems for the System/370 series, VSAM replaced earlier methods like Indexed Sequential Access Method (ISAM) and Basic Direct Access Method (BDAM) to address the demands of virtual storage environments. Over the decades it has evolved, incorporating extended addressability (up to 128 TB per data set with 32-KB control intervals), compression, and support for up to 1 TB on extended address volumes. Key enhancements include Record Level Sharing (RLS) for concurrent sysplex access via Coupling Facility caching and locking, and transactional capabilities through DFSMStvs, enabling two-phase commit and recovery integration with z/OS Resource Recovery Services (RRS). These developments ensure VSAM's continued relevance in modern mainframe operations, supporting 24/7 availability and minimizing I/O contention. VSAM organizes data into five primary types of data sets, each suited to specific access patterns:
  • Key-Sequenced Data Set (KSDS): Records stored in ascending key order with an associated index for random or sequential retrieval, commonly used in database applications like IMS.
  • Entry-Sequenced Data Set (ESDS): Records appended in entry order, accessed sequentially or by relative byte address (RBA), supporting systems like DB2 and UNIX files.
  • Relative Record Data Set (RRDS): Fixed-length records accessed by relative record number for random processing.
  • Variable Relative Record Data Set (VRRDS): Similar to RRDS but for variable-length records.
  • Linear Data Set (LDS): Byte-stream storage without record boundaries, utilized by DB2 and z/OS subsystems.
Data sets are defined and managed using Access Method Services (AMS, or IDCAMS), which handles creation, deletion, and cataloging, or through Job Control Language (JCL) and dynamic allocation. Records are grouped into control intervals (default 4 KB, up to 32 KB) within control areas for optimized I/O. Features like striping (up to 16 stripes), system-managed buffering, and free space allocation enhance performance and update efficiency. In programming, VSAM employs macros from SYS1.MACLIB, including control block macros for access method control blocks (ACBs) and request macros like GET, PUT, POINT, and ERASE for record operations, supporting both 24-bit and 31-bit addressing modes. Buffering options such as Local Shared Resources (LSR), Global Shared Resources (GSR), and RLS provide varying levels of resource sharing and integrity across regions or systems. Robust recovery mechanisms, including backup-while-open, SMF type 60-69 records for auditing, and catalog verification, underscore VSAM's role in ensuring data reliability and business continuity in high-volume environments.

Fundamentals

Overview

Virtual Storage Access Method (VSAM) is a file storage access method designed for direct-access storage devices (DASD) on IBM mainframes, functioning as both a data set type and an access method to manage various user data. It supports both fixed-length and variable-length records, enabling the organization of complex data structures in a proprietary, non-human-readable format optimized for high-performance applications. The primary purposes of VSAM are to facilitate efficient random and sequential access to data sets stored on direct-access volumes, while replacing earlier access methods such as the Indexed Sequential Access Method (ISAM) and Basic Direct Access Method (BDAM). This allows applications to load, retrieve, update, and add records with greater flexibility and speed compared to legacy systems, making it suitable for database management systems like IMS and DB2. Key advantages of VSAM include enhanced performance through advanced indexing and buffering mechanisms, which reduce I/O operations and improve throughput for large-scale data sets. It also integrates seamlessly with virtual storage environments, supporting scalability and enabling efficient handling of voluminous data without the limitations of prior methods. At its core, VSAM comprises data sets for storing records (organized into types such as Key-Sequenced Data Sets and Entry-Sequenced Data Sets), clusters that logically combine data components with associated indexes, and catalogs that maintain metadata, volume information, and data set locations for management and retrieval.

Control Intervals and Control Areas

In Virtual Storage Access Method (VSAM), the control interval (CI) serves as the fundamental unit of data transfer between direct access storage devices (DASD) and the system's buffer storage. It enables efficient I/O operations by moving fixed blocks of data rather than individual records. Each CI encompasses one or more logical records along with associated control information and free space, with sizes ranging from 512 bytes to 32 kilobytes (32,768 bytes), though the default is typically 4 kilobytes to balance performance and space efficiency. This structure ensures that VSAM can manage integrity, support updates, and minimize fragmentation during access. The internal structure of a CI includes several key components to facilitate record management. Data records are stored from the beginning of the CI upward. The control interval definition field (CIDF), a 4-byte area, occupies the last four bytes of the CI and records the offset and length of the free space. Immediately in front of the CIDF are the record definition fields (RDFs), each 3 bytes long. They give the records' lengths (a pair of RDFs can describe a run of adjacent records of equal length) and carry flags indicating status (e.g., whether a segment is the first, intermediate, or last segment of a spanned record). Free space lies between the last record and the RDFs, reserving room for future insertions or expansions. This is particularly important for variable-length records, where insertions can shift subsequent data, and this free space is often specified as a percentage (e.g., 10-20%) during definition to optimize utilization. Control areas (CAs) represent the next level of organization, consisting of a contiguous group of one or more CIs that form VSAM's basic unit for space allocation and extension on DASD.
A CA typically spans one to several tracks (up to one cylinder, or 15 tracks on non-striped devices), providing a framework for managing overflow and ensuring that related CIs remain physically proximate to reduce seek times during I/O. In data sets that receive random insertions, a CA can also contain free CIs, reserved through the CA free-space percentage, that absorb CI splits before a CA split becomes necessary. CI sizes are chosen so that the underlying physical blocks fit efficiently on a DASD track, which is why values like 4 KB or 8 KB are common. Space utilization within a CI is influenced by overhead from the 4-byte CIDF and the 3-byte RDFs, as well as free space allocation. For a CI of size $S_{\mathrm{CI}}$ holding $n$ records of average length $\bar{r}$, each with its own RDF, the approximate number of records per CI satisfies

$$n = \frac{S_{\mathrm{CI}} - 4 - 3n}{\bar{r}} \quad\Longrightarrow\quad n = \left\lfloor \frac{S_{\mathrm{CI}} - 4}{\bar{r} + 3} \right\rfloor$$

This formula highlights the trade-off. Larger CIs improve I/O efficiency for sequential processing but transfer more data than needed when small records are read randomly, while overhead and free space reduce the effective capacity. Usage of CIs varies based on whether records are fixed-length or variable-length, affecting how space is managed across VSAM data organizations. For fixed-length records, CIs are packed with a predictable number of complete records, often using slot-based allocation to simplify addressing and minimize free space needs. In contrast, variable-length records require RDFs to track each record's length and incorporate more free space to accommodate insertions without frequent CI splits. They also support spanning across multiple CIs within the same CA if a single record exceeds the CI size (limited to 255 CIs per record).
These differences ensure adaptability: fixed-length setups prioritize density and predictability in sequential or relative organizations, while variable-length approaches enhance flexibility for keyed or entry-sequenced data where updates and growth are common.
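The estimate above translates directly into code. This Python sketch uses hypothetical names. For the variable-length case it assumes one 3-byte RDF per record (the worst case) and free space expressed as a percentage of the CI. For comparison it also computes the fixed-length case, where a single pair of RDFs describes a run of equal-length records.

```python
CIDF_LEN = 4   # one control interval definition field per CI
RDF_LEN = 3    # record definition field size


def records_per_ci(ci_size: int, avg_record: float, ci_free_pct: int = 0) -> int:
    """Variable-length estimate: floor((S - 4) / (r + 3)), one RDF per record.

    `ci_free_pct` models the share of the CI left empty at load time.
    """
    usable = ci_size * (100 - ci_free_pct) // 100
    return int((usable - CIDF_LEN) // (avg_record + RDF_LEN))


def fixed_records_per_ci(ci_size: int, record_len: int) -> int:
    """Fixed-length case: one pair of RDFs describes every record in the CI."""
    return (ci_size - CIDF_LEN - 2 * RDF_LEN) // record_len
```

For a 4 KB CI and 100-byte records, the variable-length estimate gives 39 records, or 31 with 20% CI free space. The fixed-length layout fits 40.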

Data Set Organizations

Entry-Sequenced Data Sets

An entry-sequenced data set (ESDS) in VSAM is a sequential file organization where records are stored and accessed in the order of their entry. It is similar to a traditional non-VSAM sequential data set but has enhanced management features. Each record is identified by its relative byte address (RBA), which serves as the primary access identifier starting from 0 for the first record. Unlike key-based organizations, an ESDS has no index component, and records are appended only at the end of the data set. The structure of an ESDS consists of records stored sequentially within control intervals (CIs), which are the basic units of data transfer between VSAM and the storage device. Records can be either nonspanned, fitting entirely within a single CI, or spanned, allowing larger records to extend across multiple CIs if necessary. The RBA for any record is its byte offset from the beginning of the data set, providing a direct means to locate it without relying on keys or slots. Control areas group multiple CIs, but the overall organization remains linear and entry-ordered. Creation of an ESDS involves the IDCAMS utility with the DEFINE CLUSTER command, specifying the NONINDEXED option to indicate the absence of an index. Key parameters include RECORDSIZE to define the average and maximum record lengths (e.g., RECORDSIZE(80 80) for fixed-length records of 80 bytes) and CONTROLINTERVALSIZE to set the CI size, typically 4096 bytes. After definition, the data set is loaded sequentially using the REPRO command from an input file, such as REPRO INFILE(INPUT) OUTDATASET(ESDS.NAME), where INPUT is the DD name of the input file; REPRO appends the records in entry order. No index is created during this process, keeping the structure simple and efficient for sequential operations. Access to an ESDS supports sequential reads forward or backward through the records in entry order, and new records are always added at the end of the data set.
Direct access to existing records is possible by specifying their RBA, but updates are limited to rewriting the record in place without changing its length. Records cannot be erased; applications that need deletion typically mark records as inactive with a flag of their own. Spanned records are managed automatically during access to ensure continuity across CIs. These patterns emphasize append-only and sequential processing, avoiding the overhead of indexed retrieval. ESDS organizations are particularly suited for applications where the sequence of record entry is critical, such as audit trails that log events in chronological order or queues that require appending new items without reordering. They serve as flat files for scenarios like transaction logging or queuing, where direct RBA access enables efficient retrieval of specific entries without key dependencies. Extended ESDS variants support data sets larger than 4 GB by using 64-bit extended RBAs (XRBAs) for modern high-volume use cases.
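The ESDS update rules can be expressed as a small class. The sketch below is illustrative only and uses hypothetical names. It ignores control interval boundaries, so an RBA is just an offset into one byte array. The "A:"/"D:" prefix in the usage lines is a hypothetical application-level status flag.

```python
class EsdsUpdateRules:
    """Toy model of ESDS update rules; CI boundaries are ignored for brevity."""

    def __init__(self):
        self.data = bytearray()
        self.lengths = {}                        # RBA -> record length

    def put(self, record: bytes) -> int:
        """New records always go at the end; the returned RBA never changes."""
        rba = len(self.data)
        self.data += record
        self.lengths[rba] = len(record)
        return rba

    def get(self, rba: int) -> bytes:
        return bytes(self.data[rba:rba + self.lengths[rba]])

    def update(self, rba: int, record: bytes) -> None:
        """Rewrite in place; a length change would move every later RBA."""
        if len(record) != self.lengths[rba]:
            raise ValueError("an ESDS record cannot change length on update")
        self.data[rba:rba + len(record)] = record

    def erase(self, rba: int) -> None:
        raise NotImplementedError("ESDS records cannot be erased; flag them instead")


if __name__ == "__main__":
    esds = EsdsUpdateRules()
    rba = esds.put(b"A:payload")
    esds.update(rba, b"D:payload")   # logical delete via a same-length rewrite
    print(esds.get(rba))             # b'D:payload'
```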

Key-Sequenced Data Sets

A Key-Sequenced Data Set (KSDS) is a type of Virtual Storage Access Method (VSAM) data set that organizes records in ascending collating sequence based on a user-defined key field, enabling both sequential and random access. Records are logically sequenced by this key, which serves as the primary identifier, making KSDS suitable for applications requiring efficient keyed lookups and ordered processing. The structure of a KSDS consists of two primary components: the data component and the index component. The data component stores the actual records within control intervals (CIs), grouped into control areas (CAs), with records maintained in key order to facilitate insertions and retrievals. The index component is a separate entity. Its sequence set has one entry per data CI, holding the highest key in that CI and a pointer to it, and higher-level index set records form a hierarchical B+-tree-like structure for rapid navigation across multiple levels. This separation lets VSAM locate a record by searching the compact index, whose keys are compressed, rather than the data records themselves. Keys in a KSDS are defined at creation time using parameters like KEYS (in IDCAMS) or KEYLEN (in JCL), with lengths ranging from 1 to 255 bytes and a fixed offset from the record's start. The primary key must be unique. Optional alternate keys, managed through alternate indexes (AIX), provide additional access paths and can be defined as unique (UNIQUEKEY) or non-unique (NONUNIQUEKEY), up to 255 bytes in length. Records are inserted into a KSDS in key sequence. VSAM allocates free space during cluster definition via the FREESPACE parameter (typically 10-20% within CIs and 10% across CAs) to accommodate growth and reduce reorganization frequency.
When a CI fills during insertion, a control interval split occurs, redistributing records (either at the insertion point or at the midpoint, depending on the insert strategy), and the index is updated accordingly. Control area splits handle overflow from full CAs and can take tens of milliseconds. Maintenance involves reclaiming space from deletions or record shortening. Utilities help preserve structural integrity and minimize splits over time: REPRO is used to unload and reload, and thereby reorganize, the data set, and VERIFY corrects catalog end-of-data information after an abnormal close. Access to KSDS records supports random retrieval by providing a full or generic key (a leading portion of the key), which traverses the index hierarchy to locate the data CI holding the record. Sequential access processes records in key order using the sequence set's pointers, or by entry sequence via RBAs, while updates and deletions are performed by key, reusing freed space where possible. The RBA mechanism builds on the addressing used in entry-sequenced data sets, adapting it for indexed operations.
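To see how FREESPACE shapes an initial load, here is a rough Python estimate with hypothetical names. It applies the CI percentage to a record count rather than to bytes and truncates to whole records and whole CIs. The figures are therefore approximations rather than what VSAM would report.

```python
import math


def load_layout(records: int, per_ci_full: int, cis_per_ca: int,
                ci_free_pct: int, ca_free_pct: int) -> dict:
    """Rough effect of FREESPACE(ci_free_pct ca_free_pct) on an initial load.

    per_ci_full: records that would fit in a CI with no free space reserved.
    """
    if not (0 <= ci_free_pct < 100 and 0 <= ca_free_pct < 100):
        raise ValueError("this sketch expects percentages in 0..99")
    per_ci = max(1, per_ci_full * (100 - ci_free_pct) // 100)  # records loaded per CI
    free_cis = cis_per_ca * ca_free_pct // 100                  # CIs left empty per CA
    cis = math.ceil(records / per_ci)
    cas = math.ceil(cis / (cis_per_ca - free_cis))
    return {"records_per_ci": per_ci, "free_cis_per_ca": free_cis,
            "cis_used": cis, "cas_used": cas}
```

With 40 records per full CI and 180 CIs per CA, FREESPACE(20 10) loads 32 records per CI and leaves 18 CIs free in each CA. 10,000 records then occupy 313 CIs rather than 250.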

Relative-Record Data Sets

A Relative-Record Data Set (RRDS) in VSAM is a data set organization designed for fixed-length records that are accessed directly by their relative record number (RRN). The RRN serves as a numeric position identifier, starting from 1 for the first record up to a predefined maximum. This organization treats the data set like a one-dimensional array, where each RRN corresponds to a specific slot, enabling efficient positional access without the need for keys or indexes. Unlike other VSAM organizations, RRDS does not maintain records in key-sorted order or as unstructured bytes, focusing instead on simple, slot-based storage. The internal structure of an RRDS consists of records stored in predefined fixed-length slots within control intervals (CIs), the basic unit of VSAM I/O. Each slot is sized to match the fixed record length, and each CI holds the same number of slots, so VSAM can map an RRN to a control interval and a slot within it; this mapping is handled transparently. Unused or deleted slots are marked as available for reuse but remain allocated, with no keys or index entries required, which simplifies the structure but can lead to space inefficiency in sparse scenarios. Control areas group multiple CIs, but the slot-based organization ensures that records are not relocated during insertions or deletions, preserving RRN stability. To create an RRDS, the IDCAMS utility is used with a DEFINE CLUSTER command specifying the NUMBERED option, the fixed record size using RECORDSIZE, and space allocation parameters (e.g., TRACKS or CYLINDERS) that determine the number of slots based on control interval size. For example, RECORDSIZE(80 80) with TRACKS(10 5) and 4 KB control intervals would allocate space for a number of 80-byte slots that depends on track capacity. Once created, access is primarily direct: applications supply the RRN as the search argument to insert, update, retrieve, or delete records, making it ideal for random-access patterns.
Sequential access is also supported by reading or writing in ascending RRN order. A variable-length variant, the Variable Relative Record Data Set (VRRDS), operates similarly but supports variable-length records. Each record's length is kept in the CI's control information, allowing records from the minimum to maximum defined lengths to occupy varying space in the CI while maintaining RRN positioning. Creation uses RECORDSIZE(average maximum) with the NUMBERED option in DEFINE CLUSTER, and access follows the same RRN-based methods, with VSAM handling variable sizing transparently. VRRDS suits applications needing flexible record sizes in positional storage, such as dynamic data arrays, but shares RRDS limitations like no alternate indexes and potential fragmentation from varying lengths or unused slots. RRDS and VRRDS are best suited for applications requiring sparse or dense fixed-position data, such as simple tables, queues, or arrays where records are referenced by ordinal position rather than content. Their limitations include the absence of alternate indexes and potential internal fragmentation from unused slots, which can waste space if the data set is not densely populated. These characteristics make them lightweight options for scenarios where direct, keyless access outperforms more complex organizations, but they are not recommended for applications needing key-based searching or dynamic record sizing beyond VRRDS capabilities.

Linear Data Sets

A Linear Data Set (LDS) in VSAM is a byte-addressable data set organization designed for storing unformatted, contiguous data without records, keys, indexes, or embedded control information such as control interval definition fields (CIDF) or record definition fields (RDF). Unlike other VSAM organizations, an LDS treats the entire space as a continuous stream of bytes, accessible via relative byte address (RBA) starting from zero, making it suitable for applications requiring simple, raw data storage similar to a flat file. It lacks record-level management, with all operations handled by the application, and does not support VSAM record-level sharing (RLS) in the same way as key-sequenced or entry-sequenced sets. The structure of an LDS consists of a sequence of control intervals (CIs) grouped into control areas (CAs). Each CI serves as the basic unit of direct access storage and ranges from 4096 bytes to 32 KB in 4096-byte increments, with 4 KB being common for many system applications. Data is stored contiguously across these CIs without any internal formatting or free space allocation for records, allowing the full CI capacity to be used for user data. LDS supports extended addressability (EA), enabling data sets up to 128 terabytes when using a 32-KB CI size, and is often allocated under System Managed Storage (SMS) with features like extended format for improved performance. As referenced in VSAM fundamentals, the CI acts as the fixed storage unit, but in an LDS it contains only raw bytes without the typical VSAM overhead. To create an LDS, the IDCAMS utility's DEFINE CLUSTER command is used with the LINEAR parameter (or the RECORG parameter in JCL). The definition specifies the name, volumes, space allocation in tracks or cylinders, CI size, and sharing options such as SHAREOPTIONS(1,3) for cross-system access. No record definitions or key ranges are required during creation, as the data set is initialized as empty space without predefined logical identifiers.
For example, a basic definition might allocate one track on a specific volume for initial testing or small-scale use. Access to an LDS occurs through VSAM, the Data-in-Virtual (DIV) macro, or window services, supporting both sequential and random (direct) methods via RBA offsets for reading or writing data. Updates require control interval access with authority, using routines like CSRSCOT and CSRSAVE to load and modify CIs, followed by overwriting bytes at the specified RBA without insert or delete logic. Sequential access processes data in physical order from the beginning, while random access jumps to any RBA, enabling efficient handling of large, non-structured content. LDS are commonly used for large, contiguous objects such as database table spaces in Db2, zFS file system aggregates, system logger staging data sets, and trace data output, where they perform better than sequential data sets. In environments like VSAM RLS, they serve as sharing control data sets (SHCDS) to manage access across systems, and their support for striping (up to 16 stripes) and duplexing enhances throughput for high-volume, non-record-oriented workloads. Introduced in later VSAM enhancements to support extended storage needs, LDS provide compatibility for legacy and modern mainframe applications requiring simple byte-stream management.
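The 128 TB figure follows from a simple rule. With extended addressability, a data set can hold up to 4 G (2^32) control intervals; without it, 4-byte RBAs limit the data set to 4 GB. A minimal Python sketch of that arithmetic, with a hypothetical function name:

```python
GB = 2 ** 30
TB = 2 ** 40


def max_vsam_size(ci_size: int, extended_addressability: bool) -> int:
    """Upper bound on a VSAM data set's size in bytes.

    Without extended addressability, 4-byte RBAs cap a data set at 4 GB.
    With it, the cap is 4 G control intervals, i.e. 2**32 * CI size.
    """
    if extended_addressability:
        return 2 ** 32 * ci_size
    return 4 * GB


if __name__ == "__main__":
    print(max_vsam_size(32768, True) // TB)   # 128
    print(max_vsam_size(4096, True) // TB)    # 16
```

The same arithmetic explains why 32-KB CIs are needed to reach the 128 TB ceiling: with 4 KB CIs, the limit is 16 TB.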

Access and Processing

Data Access Techniques

VSAM provides several primary techniques for accessing data sets, enabling efficient retrieval, modification, and management of records across its various organizations. Sequential access processes records in a forward or backward direction, typically by key in key-sequenced data sets (KSDS), by relative byte address (RBA) in entry-sequenced data sets (ESDS), or by relative record number (RRN) in relative-record data sets (RRDS). This method is optimized for workloads that traverse all or large portions of a data set in order, using read-ahead to minimize physical I/O operations. Random or direct access, in contrast, targets specific records without regard to sequence, using a search argument such as a key for indexed access or an address (RBA or RRN) for non-indexed types, which makes it suitable for transactional or query-based applications. For instance, in a KSDS, random access by key traverses the index to locate the record. The core operations in VSAM are performed through request macros that use control blocks to specify and execute data manipulations. The GET macro retrieves a logical record into a program buffer, in either sequential or random mode depending on the options provided. The PUT macro inserts a new record or updates an existing one, using strategies such as sequential insert (SIS) for ordered additions or non-sequential insert (NIS) for random placements to avoid index splits. ERASE removes a record from the data set and requires a prior GET for update, so that the correct record is targeted, while POINT positions the access pointer at a specific record without transferring data, often to establish a starting point for subsequent sequential operations.
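The request-macro semantics above can be illustrated with a toy, in-memory key-sequenced store. This is a hypothetical Python sketch, not VSAM's implementation: `ToyKSDS` and its methods only borrow the macro names, a sorted key list stands in for the index, and `point` followed by `get_next` shows the POINT-then-sequential-GET pattern.

```python
from __future__ import annotations

import bisect


class ToyKSDS:
    """In-memory stand-in for a key-sequenced data set (illustration only)."""

    def __init__(self) -> None:
        self._keys: list[str] = []          # sorted keys: plays the role of the index
        self._records: dict[str, str] = {}
        self._bound = ""                    # sequential position: next key >= (or >) bound
        self._inclusive = True
        self._held: str | None = None       # key most recently retrieved for update

    def put(self, key: str, record: str) -> None:
        """Insert a new record; a duplicate key is a logical error."""
        if key in self._records:
            raise KeyError(f"duplicate key {key!r}")
        bisect.insort(self._keys, key)
        self._records[key] = record

    def get_direct(self, key: str, for_update: bool = False) -> str:
        """Random GET by key."""
        if key not in self._records:
            raise KeyError(f"record not found: {key!r}")
        self._held = key if for_update else None
        return self._records[key]

    def point(self, key: str) -> None:
        """POINT: position at the first key >= key without transferring data."""
        self._bound, self._inclusive = key, True

    def get_next(self) -> str | None:
        """Sequential GET from the current position; None signals end of data."""
        find = bisect.bisect_left if self._inclusive else bisect.bisect_right
        i = find(self._keys, self._bound)
        if i >= len(self._keys):
            return None
        key = self._keys[i]
        self._bound, self._inclusive = key, False
        return self._records[key]

    def erase(self) -> None:
        """ERASE the record most recently retrieved for update."""
        if self._held is None:
            raise RuntimeError("ERASE requires a prior GET for update")
        self._keys.remove(self._held)
        del self._records[self._held]
        self._held = None


ds = ToyKSDS()
for key in ("A100", "B200", "C300", "D400"):
    ds.put(key, f"rec-{key}")
ds.point("B000")                  # position without reading any data
print(ds.get_next())              # rec-B200
print(ds.get_next())              # rec-C300
ds.get_direct("A100", for_update=True)
ds.erase()                        # allowed only after a GET for update
```

The sequential position is kept as a key bound rather than a list index, so erasing an earlier record does not disturb a browse in progress.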
These macros rely on two key control blocks: the Access Method Control Block (ACB), which defines the data set's processing attributes, such as access type (sequential, direct, or both) and buffering mode, and is generated with the GENCB or ACB macro; and the Request Parameter List (RPL), which parameterizes individual requests with details such as the processing options (OPTCD), key value, and buffer address, and is built with the GENCB or RPL macro. VSAM supports distinct processing modes to match different access patterns. Browse mode provides sequential processing, allowing forward or backward traversal of records in a controlled manner, which suits reporting or batch updates without random jumps. Locate mode returns the address of the record within VSAM's buffer through the RPL instead of copying the data to the user area, which is useful for validation or chained operations. Addressed mode provides direct access by RBA for byte-level positioning in an ESDS, or by RRN for slot-based retrieval in an RRDS, bypassing index structures for faster non-keyed lookups. These modes are specified in the RPL's OPTCD parameter, and combinations allow hybrid access, such as skip-sequential processing, where an initial random POINT is followed by sequential GETs. Error handling in VSAM relies on return codes and feedback codes to ensure robust program execution. Upon macro completion, register 15 contains a return code: 0 indicates success; 4 indicates that the request was not accepted because a request from another task is active on the same RPL; 8 denotes a logical error, such as end of data during sequential retrieval, a duplicate key on insert, or a record-not-found condition; and 12 denotes a physical I/O error. For these errors, including uncorrectable I/O errors (feedback code 184), detailed information is placed in the RPL's feedback field (RPLERRCD) or message area (MSGAREA), and programs can supply exit routines such as SYNAD for recovery. For conditions such as end of data, applications typically check the return and feedback codes after each GET (or supply an EODAD exit routine) and end the processing loop accordingly.
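The check-after-every-GET pattern can be sketched in Python. The register 15 values (0 success, 8 logical error, 12 physical error), the feedback values used for end of data (4), duplicate key (8), and record not found (16), and the names `classify` and `read_all` are assumptions for illustration and should be checked against IBM's return-code tables before use.

```python
# Assumed register 15 meanings for request macros (illustrative subset).
R15_MEANING = {
    0: "success",
    8: "logical error",
    12: "physical I/O error",
}

# Assumed logical-error feedback codes (RPLERRCD), illustrative subset only.
LOGICAL_FEEDBACK = {
    4: "end of data",
    8: "duplicate key",
    16: "record not found",
}


def classify(r15: int, feedback: int = 0) -> str:
    """Turn a (return code, feedback code) pair into a readable description."""
    meaning = R15_MEANING.get(r15, f"return code {r15}")
    if r15 == 8:
        detail = LOGICAL_FEEDBACK.get(feedback, f"feedback code {feedback}")
        return f"{meaning}: {detail}"
    return meaning


def read_all(results):
    """Consume (r15, feedback, record) tuples the way a sequential GET loop would."""
    records = []
    for r15, feedback, record in results:
        if r15 == 0:
            records.append(record)
        elif r15 == 8 and feedback == 4:
            break                                      # normal end of the loop
        else:
            raise RuntimeError(classify(r15, feedback))  # hand off to error handling
    return records


print(classify(8, 8))                                          # logical error: duplicate key
print(read_all([(0, 0, "r1"), (0, 0, "r2"), (8, 4, None)]))   # ['r1', 'r2']
```

End of data is treated as a normal loop exit, while every other non-zero result is escalated, mirroring the split between EODAD-style handling and SYNAD-style recovery.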
Performance considerations in VSAM access emphasize matching access techniques to workload patterns. Sequential access benefits from continuous read-ahead, but for non-sequential patterns it should give way to direct methods, which avoid unnecessary index traversals and I/O. In random-read scenarios, locate mode minimizes data movement, while addressed access avoids key searches entirely for the data set types that support it, potentially lowering EXCPs (Execute Channel Program requests) by up to 50% in high-hit-rate environments. Overall, choosing the mode and macro sequence that fit the access intent prevents inefficiencies such as excessive splits in indexed structures.

Buffering and I/O Management

VSAM employs a dynamic buffering mechanism to manage control intervals (CIs) in virtual storage, optimizing data and index access. Buffers are allocated through parameters in the Access Method Control Block (ACB), primarily BUFND (the number of data buffers, allocated by default based on STRNO and the buffering mode, e.g., STRNO+1 in NSR) and BUFNI (the number of index buffers, e.g., STRNO+2 in NSR). In z/OS 3.1 and later, VSAM supports dynamic buffer addition for non-shared resources (NSR) buffering, automatically adding buffers as needed to improve sequential I/O performance. Buffers can be shared, in Local Shared Resources (LSR) mode within an address space or Global Shared Resources (GSR) mode across address spaces, or private, in Non-Shared Resources (NSR) mode, with allocation occurring dynamically at open. For sequential processing, VSAM uses read-ahead to prefetch multiple CIs, anticipating subsequent requests via the sequence set or look-ahead processing, which increases throughput by reducing physical disk accesses. Direct processing, in contrast, reads CIs into buffers on demand to support direct record retrieval, and often finds the needed CI already resident in a buffer, so no additional I/O is required. CI prefetch complements these techniques by preloading anticipated intervals, while write-behind defers non-critical writes so they can be batched, minimizing synchronous overhead except in cases such as random updates in Record Level Sharing (RLS) mode, where writes are immediate to ensure consistency. These techniques integrate with request macros such as GET or POINT by staging CIs in buffers for rapid logical processing. Tuning parameters such as BUFND, BUFNI, and STRNO (the number of concurrent request strings, default 1) directly influence performance; for instance, increasing buffers reduces EXCPs (Execute Channel Program requests), each of which costs approximately 10,000 CPU instructions, thereby improving throughput in high-activity environments.
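The effect of read-ahead on EXCP counts is simple arithmetic, sketched below in Python. The read-ahead depth `cis_per_io` and the function name are hypothetical, and the cost of 10,000 instructions per EXCP is the approximation quoted above, not a measured value.

```python
from __future__ import annotations

import math

# Approximate CPU cost per EXCP quoted in the text (order of magnitude only).
INSTRUCTIONS_PER_EXCP = 10_000


def sequential_scan_cost(total_cis: int, cis_per_io: int) -> tuple[int, int]:
    """Return (EXCPs, approximate CPU instructions) for reading `total_cis`
    control intervals in order when each I/O transfers `cis_per_io` CIs.
    `cis_per_io` is a hypothetical read-ahead depth, not a VSAM parameter."""
    if total_cis < 0 or cis_per_io < 1:
        raise ValueError("need total_cis >= 0 and cis_per_io >= 1")
    excps = math.ceil(total_cis / cis_per_io)
    return excps, excps * INSTRUCTIONS_PER_EXCP


for depth in (1, 4, 8):
    excps, instructions = sequential_scan_cost(1000, depth)
    print(f"read-ahead depth {depth}: {excps} EXCPs, ~{instructions:,} instructions")
```

Reading 1,000 CIs one at a time costs 1,000 EXCPs; transferring eight CIs per I/O cuts that to 125, which is the kind of saving that additional data buffers make possible.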
Buffer space is calculated as BUFFERSPACE = (BUFND × data CI size) + (BUFNI × index CI size), and it can be overridden in JCL or the ACB to allocate total space across data sets, ensuring adequate buffer residency for the workload while avoiding excessive virtual storage consumption. Optimal settings, such as STRNO values of up to 255 for reads, balance I/O parallelism against resource limits. String I/O enhances efficiency by transferring multiple control areas (CAs) in a single operation, using STRNO to initiate concurrent channel programs for sequential or skip-sequential processing, which amortizes setup costs and improves data transfer rates compared with individual CI I/Os. In VSAM RLS for multi-user environments, buffering uses Coupling Facility (CF) caches for sysplex-wide CI sharing alongside local pools in SMSVSAM data spaces (default 100 MB, maximum 1.7 GB for 31-bit; tunable above the 2 GB bar). The Buffer Management Facility (BMF) uses an LRU algorithm with timestamps for aging, aiming for high hit ratios (50% or better) and supporting CI sizes up to 32 KB, though it enforces store-through writes to DASD for consistency, with no deferred-write option.
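The buffer-space formula and a least-recently-used replacement policy can both be sketched in a few lines of Python. The sizes below are illustrative (a 4 KB data CI and a 2 KB index CI, with STRNO=1 and the example NSR defaults quoted earlier), and `LRUBufferPool` is a plain LRU cache, not the Buffer Management Facility's timestamp-based aging.

```python
from __future__ import annotations

from collections import OrderedDict


def buffer_space(bufnd: int, data_ci_size: int, bufni: int, index_ci_size: int) -> int:
    """BUFFERSPACE = (BUFND x data CI size) + (BUFNI x index CI size)."""
    return bufnd * data_ci_size + bufni * index_ci_size


class LRUBufferPool:
    """Hypothetical pool of CI buffers with least-recently-used replacement."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.pool: OrderedDict[int, bytes] = OrderedDict()
        self.hits = 0
        self.misses = 0

    def access(self, ci_number: int) -> None:
        if ci_number in self.pool:
            self.hits += 1
            self.pool.move_to_end(ci_number)     # mark as most recently used
            return
        self.misses += 1                         # would require a physical read
        if len(self.pool) >= self.capacity:
            self.pool.popitem(last=False)        # evict the least recently used CI
        self.pool[ci_number] = b""               # placeholder for the CI contents

    @property
    def hit_ratio(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


strno = 1
print(buffer_space(strno + 1, 4096, strno + 2, 2048))   # 14336

pool = LRUBufferPool(capacity=3)
for ci in [1, 2, 3, 1, 2, 4, 1, 5]:
    pool.access(ci)
print(pool.hits, pool.misses, pool.hit_ratio)           # 3 5 0.375
```

The reference string revisits CIs 1 and 2 while they are still resident, so three of eight requests are satisfied from buffers without I/O.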

Sharing and Management

Data Sharing Mechanisms

VSAM supports multiple sharing modes to allow concurrent access to data sets while maintaining integrity, ranging from exclusive single-user access to multisystem sharing in z/OS Parallel Sysplex environments. In single-user mode, a data set is accessed exclusively by one task within an address space, typically specified with DISP=OLD in JCL, which prevents concurrent access and so avoids conflicts. Shared access within a single system allows multiple tasks or users to access the data set concurrently, with DISP=SHR and serialization through z/OS enqueue/dequeue (ENQ/DEQ) under Global Resource Serialization (GRS) or the Enqueue Manager; this mode uses the SYSDSN major name for resource naming and supports both read and update operations, with integrity managed by the user. Sharing between regions (address spaces) and across multiple z/OS images in a Parallel Sysplex is governed by the SHAREOPTIONS(cross-region, cross-system) parameter; values such as 3 permit multiple readers and writers, with buffers placed in the common service area (CSA) and serialization handled through GRS or coupling facility structures to ensure consistency. Record-level sharing (RLS) is an advanced multisystem sharing option introduced in DFSMS/MVS Release 1.3 in 1995 that enables full update capability for VSAM data sets across multiple systems in a Parallel Sysplex without requiring application-level serialization. RLS uses a coupling facility for centralized lock management, caching, and buffer invalidation, allowing records to be locked individually rather than locking the entire data set or control interval; it is activated with the MACRF=RLS parameter in the Access Method Control Block (ACB) and requires the SMSVSAM address space for coordination.
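The cross-region SHAREOPTIONS values can be illustrated with a small admission check. The semantics encoded here (1: many readers or a single writer, never both; 2: many readers plus at most one writer; 3 and 4: no enforcement by VSAM, with 4 also refreshing buffers) summarize the usual description of the parameter and should be verified for a given release; `OpenState` and `may_open` are hypothetical names, and cross-system values are not modeled.

```python
from dataclasses import dataclass


@dataclass
class OpenState:
    """Opens currently active against one data set (illustrative)."""
    readers: int = 0
    writers: int = 0


def may_open(cross_region: int, state: OpenState, for_output: bool) -> bool:
    """Decide whether one more open is admitted under a cross-region value."""
    if cross_region == 1:
        # Any number of readers, or a single writer, but not both at once.
        if for_output:
            return state.readers == 0 and state.writers == 0
        return state.writers == 0
    if cross_region == 2:
        # Any number of readers plus at most one writer.
        return state.writers == 0 if for_output else True
    if cross_region in (3, 4):
        # VSAM does not restrict opens; integrity is the applications' job
        # (option 4 additionally refreshes buffers, which is not modeled here).
        return True
    raise ValueError("cross-region SHAREOPTIONS must be 1-4")


print(may_open(1, OpenState(readers=2), for_output=True))             # False
print(may_open(2, OpenState(readers=2), for_output=True))             # True
print(may_open(2, OpenState(readers=2, writers=1), for_output=True))  # False
print(may_open(3, OpenState(writers=5), for_output=True))             # True
```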
Supported for key-sequenced (KSDS), entry-sequenced (ESDS), relative-record (RRDS), and variable relative-record (VRRDS) data sets, RLS integrates with transactional VSAM (TVS) for two-phase commit processing and uses LOG= parameters (NONE, UNDO, or ALL) to manage recovery. In RLS mode, local buffer pools interact with the coupling facility cache to minimize I/O, achieving high availability through structure-based data movement and rebuild capabilities during failures. To preserve data integrity during shared access, VSAM employs several locking mechanisms at different granularities. Control interval (CI) latches provide serialization at the CI level in both RLS and non-RLS modes, preventing concurrent modifications to the same physical storage unit. Record locks, managed primarily through the coupling facility in RLS, can be shared for read operations or exclusive for updates, ensuring that conflicting accesses are blocked until released. VSAM spheres define logical groupings of a base cluster, its alternate indexes, and path components, protected by ENQ/DEQ operations to maintain consistency across related structures during quiescing or recovery activities. Conflict resolution in VSAM sharing environments includes automated deadlock detection and configurable timeout handling to prevent indefinite waits. Deadlock detection operates locally every 15 seconds by default and globally after four cycles, configurable via the DEADLOCK_DETECTION parameter in IGDSMSxx or through ANALYZE commands, allowing the system to identify and resolve circular wait conditions in GRS or RLS structures. Timeouts are enforced via parameters such as DSSTIMEOUT (default 300 seconds, adjustable from 0 to 65536 seconds) for general VSAM operations and RLSTMOUT (0 to 9999 seconds) specifically for RLS, enabling applications to handle contention by aborting requests after the specified duration. 
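Circular-wait detection of the kind described above is usually explained with a wait-for graph. The following Python sketch is a generic illustration of that technique with shared (S) and exclusive (X) record locks; it is not how SMSVSAM implements deadlock detection, `ToyLockManager` and its methods are hypothetical names, and timer-driven detection intervals (such as the 15-second default) are not modeled.

```python
from __future__ import annotations

from collections import defaultdict


class ToyLockManager:
    """Record locks plus wait-for-graph deadlock detection (illustration only)."""

    def __init__(self) -> None:
        self.holders: defaultdict[str, dict[str, str]] = defaultdict(dict)  # record -> {owner: mode}
        self.waits_for: defaultdict[str, set[str]] = defaultdict(set)       # owner -> owners it waits on

    def request(self, owner: str, record: str, mode: str) -> bool:
        """Grant the lock (True) or record that the owner must wait (False)."""
        if mode not in ("S", "X"):
            raise ValueError("mode must be 'S' (shared) or 'X' (exclusive)")
        others = {o: m for o, m in self.holders[record].items() if o != owner}
        compatible = not others or (mode == "S" and all(m == "S" for m in others.values()))
        if compatible:
            self.holders[record][owner] = mode
            self.waits_for.pop(owner, None)
            return True
        self.waits_for[owner].update(others)
        return False

    def release_all(self, owner: str) -> None:
        """Drop every lock and wait held by `owner` (e.g. a deadlock victim)."""
        for held in self.holders.values():
            held.pop(owner, None)
        self.waits_for.pop(owner, None)
        for waiting in self.waits_for.values():
            waiting.discard(owner)

    def find_deadlock(self) -> list[str] | None:
        """Depth-first search for a cycle in the wait-for graph."""
        visiting: set[str] = set()
        done: set[str] = set()

        def dfs(node: str, path: list[str]) -> list[str] | None:
            visiting.add(node)
            path.append(node)
            for nxt in self.waits_for.get(node, ()):
                if nxt in visiting:
                    return path[path.index(nxt):]   # the circular wait
                if nxt not in done:
                    cycle = dfs(nxt, path)
                    if cycle:
                        return cycle
            visiting.discard(node)
            done.add(node)
            path.pop()
            return None

        for start in list(self.waits_for):
            if start not in done:
                cycle = dfs(start, [])
                if cycle:
                    return cycle
        return None


lm = ToyLockManager()
lm.request("TX1", "REC-A", "X")
lm.request("TX2", "REC-B", "X")
lm.request("TX1", "REC-B", "X")      # TX1 now waits for TX2
lm.request("TX2", "REC-A", "X")      # TX2 waits for TX1: circular wait
first_cycle = lm.find_deadlock()
print(first_cycle)                   # ['TX1', 'TX2']
lm.release_all(first_cycle[-1])      # choose TX2 as the victim
print(lm.find_deadlock())            # None
```

Once the victim's locks are released, the remaining waiter can be granted its lock and the cycle disappears.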
Despite these capabilities, VSAM sharing has limitations, particularly in supported data organizations; for instance, linear data sets (LDS) do not support RLS, restricting them to single-system or basic cross-region sharing without record-level granularity. Additionally, RLS requires a Parallel Sysplex environment with a coupling facility and is incompatible with certain legacy options like Hiperbatch or ISAM access methods.

Catalogs and Utilities

The Virtual Storage Access Method (VSAM) employs the Integrated Catalog Facility (ICF) to manage catalogs that store metadata for both VSAM and non-VSAM data sets. ICF catalogs consist of a Basic Catalog Structure (BCS), implemented as a VSAM key-sequenced data set (KSDS), and a VSAM Volume Data Set (VVDS), implemented as an entry-sequenced data set (ESDS). The BCS contains essential information such as data set names, volume locations, ownership, and attributes like average and maximum record lengths, while the VVDS holds volume-specific details, including dynamic attributes for SMS-managed data sets such as stripe counts and compression formats. Because VSAM data sets are self-describing, these catalogs record metadata such as the high-used relative byte address (HURBA), high-allocated relative byte address (HARBA), buffer space, and key ranges, so data sets can be located and managed without external tracking. ICF uses a hierarchical structure with one master catalog per system, which holds the data sets required at IPL and the aliases that point to user catalogs, and multiple user catalogs that hold application-specific metadata. User catalogs are best placed on dedicated volumes for performance, with control interval (CI) sizes typically set to multiples of 4096 bytes for data components and 4096 bytes for index components, and free space adjusted to the update frequency (e.g., 0% for read-only access). The master catalog requires at least one more qualifier than the system's alias level to ensure proper resolution. The primary utility for VSAM catalog and data set management is IDCAMS (Access Method Services), which defines, modifies, and maintains VSAM structures and ICF catalogs. Key IDCAMS commands include DEFINE, which creates VSAM clusters, components, paths, and alternate indexes from parameters such as name, volumes, cylinders, record sizes, and keys (e.g., DEFINE CLUSTER (NAME(VSAM.KSDS) VOLUMES(VOL001) CYLINDERS(1 1) RECORDSIZE(72 100) KEYS(9 8))).
ALTER modifies existing attributes, such as buffer counts or volume additions, while REPRO copies data between VSAM data sets or to and from sequential files, with options such as an error limit (e.g., REPRO INDATASET(SEQ.DS) OUTDATASET(VSAM.KSDS) ELIMIT(200)). PRINT dumps and displays the contents of VSAM data sets for inspection. Additional utilities complement IDCAMS for maintenance and portability. VERIFY checks and repairs structural consistency in key-sequenced data sets, addressing issues such as unclaimed control areas or interrupted splits after abnormal terminations, and can be invoked implicitly when a data set is opened or manually for recovery. EXPORT creates portable backups of VSAM data sets, preserving catalog entries and SMS classes, while IMPORT restores them to another environment. LISTCAT inventories catalog entries, providing details on data sets such as split counts, extents, and usage statistics (e.g., LISTCAT ENTRY('DS.NAME') ALL). Catalog recovery procedures rely on VSAM's self-describing features and regular backups to minimize outages. Daily backups of ICF catalogs with IDCAMS EXPORT are recommended, together with verification of all catalogs and testing of restore processes. Recovery involves restoring from backups and applying forward recovery with System Management Facilities (SMF) records (types 61, 65, and 66) through tools such as the Integrated Catalog Facility Recovery Utility (ICFRU). For structural issues, the IDCAMS EXAMINE command tests index and data integrity, while DIAGNOSE identifies synchronization errors between the BCS and VVDS; damaged entries can then be removed and redefined using DELETE with the TRUENAME or RECATALOG options. Sharing Control Data Sets (SHCDS) maintain lock integrity across sysplexes, with recovery commands such as FRSETRR and FRBIND to reset errors.
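REPRO's error limit can be illustrated with a short Python sketch that loads records into a key-sequenced target in ascending key order. Treating duplicate and out-of-sequence keys as countable errors, and stopping once the count exceeds ELIMIT, is a simplifying assumption about REPRO's behavior; `repro_into_ksds` and its parameters are hypothetical names.

```python
def repro_into_ksds(records, key_of, elimit):
    """Copy records into a KSDS-like dict in ascending key order (sketch).

    Records whose key is not higher than the previous accepted key are
    skipped and counted as errors; exceeding `elimit` errors ends the copy.
    """
    target = {}
    errors = []
    last_key = None
    for rec in records:
        key = key_of(rec)
        if last_key is not None and key <= last_key:
            kind = "duplicate key" if key == last_key else "key out of sequence"
            errors.append((key, kind))
            if len(errors) > elimit:
                raise RuntimeError(f"error limit {elimit} exceeded; copy terminated")
            continue
        target[key] = rec
        last_key = key
    return target, errors


target, errors = repro_into_ksds(
    ["0001A", "0003C", "0002B", "0003X", "0004D"],
    key_of=lambda rec: rec[:4],      # hypothetical 4-byte key at offset 0
    elimit=200,
)
print(list(target))   # ['0001', '0003', '0004']
print(errors)         # [('0002', 'key out of sequence'), ('0003', 'duplicate key')]
```

With a lower limit, such as `elimit=1`, the same input would stop at the second error instead of completing the copy.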
Integration with Job Control Language (JCL) supports automated catalog management: IDCAMS is invoked with an EXEC PGM=IDCAMS statement, commands are supplied through SYSIN, and allocation is handled through DD statements that reference cataloged names. For example, a job can define data sets with logging attributes (e.g., LOG(ALL) for full recoverability) and then allocate them dynamically from the catalog, ensuring seamless linkage during job execution.
| Utility/Command | Primary Function | Key Parameters/Options |
| --- | --- | --- |
| DEFINE | Create VSAM structures | NAME, VOLUMES, CYLINDERS, RECORDSIZE, KEYS |
| ALTER | Modify attributes | BUFNI, VOLUMES |
| REPRO | Copy data | INFILE, OUTFILE, ELIMIT |
| PRINT | Display contents | - |
| VERIFY | Repair consistency | RECOVER |
| EXPORT | Backup for portability | - |
| IMPORT | Restore from backup | - |
| LISTCAT | Catalog inventory | ENTRY, ALL |
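An IDCAMS job of the kind described above can be generated as text, which makes the required pieces explicit: an EXEC PGM=IDCAMS step, a SYSPRINT DD for messages, and in-stream SYSIN commands using hyphens for continuation. This Python sketch is illustrative only; `idcams_job` is a hypothetical helper, and the JOB statement's accounting field and classes are placeholders to be replaced with site values.

```python
from __future__ import annotations


def idcams_job(jobname: str, commands: list[str]) -> str:
    """Render a minimal job that runs IDCAMS with in-stream control statements."""
    if not 1 <= len(jobname) <= 8:
        raise ValueError("a job name is 1 to 8 characters")
    lines = [
        f"//{jobname:<8} JOB (ACCT),'IDCAMS',CLASS=A,MSGCLASS=X",  # placeholder JOB card
        "//STEP1    EXEC PGM=IDCAMS",
        "//SYSPRINT DD SYSOUT=*",        # IDCAMS messages and listings
        "//SYSIN    DD *",               # in-stream IDCAMS commands follow
    ]
    for command in commands:
        statement = "  " + command       # keep commands out of column 1
        if len(statement) > 72:
            raise ValueError(f"statement exceeds column 72: {command!r}")
        lines.append(statement)
    lines.append("/*")
    return "\n".join(lines)


define_ksds = [
    "DEFINE CLUSTER (NAME(VSAM.KSDS) -",   # a trailing hyphen continues the command
    "  VOLUMES(VOL001) CYLINDERS(1 1) -",
    "  RECORDSIZE(72 100) KEYS(9 8))",
    "LISTCAT ENTRIES(VSAM.KSDS) ALL",
]
job = idcams_job("DEFKSDS", define_ksds)
print(job)
```

Generating the stream from a list keeps each command within the column limits that IDCAMS expects for SYSIN records.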

History and Evolution

Origins and Development

The Virtual Storage Access Method (VSAM) was developed by IBM as part of the transition to virtual storage systems on the System/370 architecture, with the aim of providing a more advanced and unified approach to file management. It was first released with OS/VS1 in 1972 and subsequently with OS/VS2 in 1973, marking a significant evolution in IBM's data access methods for mainframe environments. This development aligned with the broader shift to virtual addressing, enabling larger data sets and more efficient resource utilization than earlier systems allowed. VSAM was created to unify and improve upon earlier access methods, including the Indexed Sequential Access Method (ISAM), Basic Sequential Access Method (BSAM), and Queued Sequential Access Method (QSAM), which suffered from inefficiencies such as ISAM's overflow handling and limited capacity under 24-bit addressing. VSAM addressed these problems by introducing device-independent data sets, automated block sizing, and distributed free space to reduce fragmentation and improve performance for both sequential and direct processing. It also made data easier to move between DOS/VS and OS/VS systems, with built-in utilities for converting legacy ISAM and SAM data sets, which simplified migration. Early implementations of VSAM focused on core data set organizations, providing initial support for Key-Sequenced Data Sets (KSDS), which used embedded indexes for keyed access, and Entry-Sequenced Data Sets (ESDS), which allowed sequential insertion and retrieval by relative byte address (RBA). A compatibility mode for the Basic Direct Access Method (BDAM) was also included to enable addressed access without immediate reprogramming of existing applications. These features emphasized long-term data stability and flexibility for database applications, distinguishing VSAM from the more rigid structures of its predecessors.
Key milestones in VSAM's early evolution included the 1974 Release 2 enhancements, which added support for Relative-Record Data Sets (RRDS) to permit direct access by relative record number, expanding the options for fixed-length record handling. This release also deepened integration with OS/VS2, ensuring seamless operation in multiprogramming environments. Adoption in enterprise settings was gradual: VSAM was phased in as a replacement for older methods through conversion tools and its superior handling of large data sets, particularly in sectors requiring reliable indexed data access. Later enhancements added Variable Relative Record Data Sets (VRRDS) for variable-length records with relative access.

Modern Usage and Updates

VSAM continues to serve as a foundational data access method in IBM z/OS environments, with full support in z/OS version 3.1, released in 2023, for managing large-scale data sets in mission-critical applications across industries such as banking and finance. In these sectors, VSAM holds extensive transaction logs, records, and operational data, contributing to systems that process billions of transactions daily while maintaining reliability. Its role persists because of the enduring demand for robust, high-performance storage on mainframes, which support petabyte-scale environments through aggregated data sets and advanced storage subsystems such as the DS8000. Key enhancements have sustained VSAM's relevance, including Record Level Sharing (RLS), introduced in version 2 release 1 in 1996, which provides sysplex-wide concurrent access to VSAM data sets with record-level locking through coupling facilities, reducing downtime in shared environments. Extended addressability, introduced in DFSMS/MVS 1.3 in 1995 and further enhanced in version 1 release 5 in 2000 and in version 1 release 10 (2008) to support extended address volumes (EAVs), allows individual VSAM clusters to exceed 4 GB, with capacities of up to 225 TB per data set using 64-bit addressing and extended format on EAVs. Compression for key-sequenced data sets (KSDS) through SMS-managed extended format, using algorithms such as Ziv-Lempel, improves storage efficiency, while data set encryption support, introduced in version 2 release 1 (2017), protects data without application changes through integration with RACF and ICSF. These features, together with system-managed buffering (SMB), introduced in DFSMS/MVS 1.4 in 1997, and control area (CA) reclaim, introduced in z/OS 1.12 (2010), improve I/O performance by reducing overhead and improving space utilization.
VSAM integrates deeply with core components, including Db2, which stores large table spaces in linear data sets; CICS, which uses RLS-enabled sharing for transaction processing; and IMS for database operations, often through tools such as DFSMStvs for backup-while-open and recovery. Linear Data Sets (LDS), introduced for byte-stream storage, also support subsystems such as Db2. Migration utilities, such as IDCAMS and third-party replication tools, ease transitions from non-VSAM formats such as QSAM or ISAM while preserving data during modernization efforts. Performance optimizations highlighted in the 2022 IBM Redbooks publication VSAM Demystified include striping across up to 16 volumes for faster I/O and Hiperbatch to minimize I/O contention in batch workloads, along with 64-bit buffer pools for efficiency in high-volume environments. In hybrid cloud contexts, VSAM remains accessible through IBM tools such as z/OS Connect, allowing cloud-native applications to reach VSAM data through APIs and SQL queries without relocating data sets. IBM has announced no plans to deprecate VSAM, affirming its continued support amid mainframe modernization initiatives, with ongoing enhancements that include integration with AI-driven workloads on IBM Z platforms.

References
