Spooling
In computing, spooling is a specialized form of multi-programming for the purpose of copying data between different devices. In contemporary systems,[a] it is usually used for mediating between a computer application and a slow peripheral, such as a printer. Spooling allows programs to "hand off" work to be done by the peripheral and then proceed to other tasks, or to not begin until input has been transcribed. A dedicated program, the spooler, maintains an orderly sequence of jobs for the peripheral and feeds it data at its own rate. Conversely, for slow input peripherals, such as a card reader, a spooler can maintain a sequence of computational jobs waiting for data, starting each job when all of the relevant input is available; see batch processing. The spool itself refers to the sequence of jobs, or the storage area where they are held. In many cases, the spooler is able to drive devices at their full rated speed with minimal impact on other processing.
Spooling is a combination of buffering and queueing.
Print spooling
Nowadays, the most common use of spooling is printing: documents formatted for printing are stored in a queue at the speed of the computer, then retrieved and printed at the speed of the printer. Multiple processes can write documents to the spool without waiting, and can then perform other tasks, while the "spooler" process operates the printer.[1]
For example, when a large organization prepares payroll cheques, the computation takes only a few minutes or even seconds, but the printing process might take hours. If the payroll program printed cheques directly, it would be unable to proceed to other computations until all the cheques were printed. Similarly, before spooling was added to PC operating systems, word processors were unable to do anything else, including interact with the user, while printing.
Spooler or print management software often includes a variety of related features, such as allowing priorities to be assigned to print jobs, notifying users when their documents have been printed, distributing print jobs among several printers, selecting appropriate paper for each document, etc.
A print server applies spooling techniques to allow many computers to share the same printer or group of printers.
Banner page
Print spoolers[b] can be configured to add a banner page, also called a burst page, job sheet, or printer separator, to the beginning and end of each document and job. These separate documents from each other, identify each document (e.g. with its title) and often also state who printed it (e.g. by username or job name). Banner pages are valuable in office environments where many people share a small number of printers. They are also valuable when a single job can produce multiple documents. Depending on the configuration, banner pages might be generated on each client computer, on a centralized print server, or by the printer itself.
On printers using fanfold continuous forms, a leading banner page would often be printed twice, so that one copy would always be face-up when the jobs were separated. The page might include lines printed over the fold, which would be visible along the edge of a stack of printed output, allowing the operator to easily separate the jobs. Some systems would also print a banner page at the end of each job, assuring users that they had collected all of their printout.
Other applications
Spooling is also used to mediate access to punched card readers and punches, magnetic tape drives, and other slow, sequential I/O devices. It allows the application to run at the speed of the CPU while operating peripheral devices at their full rated speed.
A batch processing system uses spooling to maintain a queue of ready-to-run tasks, which can be started as soon as the system has the resources to process them.
Some store and forward messaging systems, such as uucp, used "spool" to refer to their inbound and outbound message queues, and this terminology is still found in the documentation for email and Usenet software.
History
Peripheral devices have always been much slower than core processing units. This was an especially severe problem for early mainframes. For example, a job which read punched cards or generated printed output directly was forced to run at the speed of the slow mechanical devices. The first spooling programs, such as IBM's "SPOOL System" (7070-IO-076) copied data from punched cards to magnetic tape, and from tape back to punched cards and printers. Hard disks, which offered faster I/O speeds and support for random access, started to replace the use of magnetic tape for spooling in the middle 1960s, and by the 1970s had largely replaced it altogether.
Because the unit record equipment on IBM mainframes of the early 1960s was slow, it was common for larger systems to use a small offline computer such as an IBM 1401 instead of spooling.
The term "spool" may originate with the Simultaneous Peripheral Operations On-Line[2][3] (SPOOL) software;[4] this derivation is uncertain, however, as it may be a backronym.[5][verification needed] Another explanation is that it refers to "spools" or reels of magnetic tape, although "spool" is an uncommon term for a tape reel.
List of spooling systems
- IBM SPOOL System, 7070-IO-076
- Integrated facility of various operating systems, e.g., GCOS, OS/360
- Attached Support Processor (ASP)[6] in OS/360 and OS/VS2 (SVS).
- Houston Automatic Spooling Priority (HASP)[7] in OS/360 and SVS, prominent in the 1960s
- Job Entry Subsystem (JES, aka JES1) in OS/VS1
- Job Entry Subsystem 2 (JES2),[8] a follower of HASP
- Job Entry Subsystem 3 (JES3),[9] a follower of ASP
- Priority Output Writers, Execution Processors and Input Readers (POWER)[10][11]
- GRASP
- The Spooler, IBM DOS/360, DOS/VS, and DOS/VSE spooler, 1975–1980s
- The Berkeley printing system (lpr/lpd)
- CUPS
- CP-67
- VM Control Program (CP)
- VM/370 RSCS (Remote Spooling Communications Subsystem)
- Symbionts and Cooperatives in SDS Sigma series computers[12]
Notes
References
[edit]- ^ Lundin, Leigh; Stoneman, Don (1977). The Spooler User Guide (2 ed.). Harrisonburg: DataCorp of Virginia.
- ^ IBM 7070 SPOOL System, 7070 Data Processing System Bulletins (Second ed.), IBM, J28-6047-1
- ^ Donovan, John J. (1972). Systems Programming. p. 405. ISBN 0-07-085175-1.
- ^ James L. Peterson; Abraham Silberschatz (July 1984). "1.4.3 Spooling". Operating System Concepts. Addison-Wesley. p. 18. ISBN 0-201-06097-3.
- ^ Tanenbaum, Andrew S. Modern Operating Systems. 3rd Ed. Pearson Education, Inc., 2008. ISBN 978-0-13-600663-3
- ^ IBM System/360 and System/370 Asymmetric Multiprocessing System: General Information Manual, Program Number 360A-CX-15X, IBM, GH20-1173
- ^ The HASP System, HASP II (360D-05.1-014) V3M1, Version 3 Modification Level 1, IBM, February 26, 1971
- ^ z/OS V1R9.0 JES2 Introduction, IBM, SA22-7535-06
- ^ JES3 Overview (First ed.), IBM, December 1980, SC23-0040-0
- ^ DOS/VS POWER/VS Installation and Operations (PDF) (Second ed.), IBM, September 1974, GC33-5403-1
- ^ Virtual Storage Extended / Priority Output Writers, Execution Processors and Input Readers; VSE/POWER - 5686-CF9-03
- ^ CP-V Software: Concepts and Facilities Manual (PDF). Honeywell. 1976. p. 2-7. Retrieved December 6, 2023.
Spooling
Fundamentals
Definition and Purpose
Spooling, an acronym for Simultaneous Peripheral Operations On-Line, is a specialized buffering technique in computing that manages the transfer of data between processes and peripheral devices by temporarily storing it in an intermediary queue. The approach originated in early multi-programming systems to handle the disparity in processing speed between the central processing unit (CPU) and slower input/output (I/O) devices.[1] The primary purpose of spooling is to allow the CPU to proceed with other computations without waiting for I/O operations to complete, enabling asynchronous data handling that overlaps CPU execution with peripheral activity.[9]
Spooling encompasses both input spooling, where data from slow input devices (e.g., card readers) is buffered for faster CPU access, and output spooling, where CPU-generated data is queued for slower output devices (e.g., printers). By queuing data in a buffer, typically on disk or in memory, spooling prevents the CPU from idling during slow I/O tasks such as reading from or writing to tapes, disks, or printers.[9] This delivers several key benefits: it mitigates I/O bottlenecks that could otherwise halt system progress, improves utilization of system resources by maximizing CPU uptime, and facilitates multitasking in environments where multiple jobs compete for device access.[1]
Conceptually, spooling functions as a temporary storage layer that decouples data producers from consumers, ensuring smooth workflow even when device availability or speed varies. A classic example is print spooling, where output files are buffered before transmission to a printer, allowing immediate user feedback.
Basic Mechanism
Spooling operates by temporarily buffering data generated by a faster component, such as the CPU, to accommodate slower peripheral devices, enabling asynchronous I/O without halting execution. The process begins when an application generates output data or receives input data, which is directed to a spool area rather than sent directly to or from the device. This spool area, typically implemented as a directory or file on disk, serves as a temporary repository, allowing the CPU to continue with other tasks while the data awaits processing. A dedicated spooler process or daemon then manages the transfer of the buffered data to or from the target device at that device's operational speed.[1]
The data flow follows a producer-consumer model. First, data is written to the spool file or buffer in a structured format, usually a first-in, first-out (FIFO) queue, to maintain order and prevent overwriting; for input spooling, slow peripheral input is queued for the CPU, while for output spooling, CPU output is queued for the peripheral. The daemon then monitors the queue and retrieves the data sequentially, formatting it if necessary before passing it to the peripheral via device drivers. In a printing context, for instance, the spooler queues multiple jobs without interference and feeds them to the printer one at a time. This buffering decouples the rate at which data is produced from the rate at which it is consumed, optimizing resource utilization.[9]
Spool files play a critical role as temporary storage, residing on disk for persistence or in memory for faster access, and are organized into queues that support operations such as enqueueing, dequeueing, and prioritization to handle multiple concurrent requests efficiently. These files use standardized formats to encapsulate job metadata, such as job ID and size, ensuring integrity during transfer.
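The producer-consumer flow described above can be sketched in a few lines of Python. This is an illustrative in-memory model only: a thread-safe FIFO queue stands in for the on-disk spool file, a list stands in for the peripheral, and the job names are invented.

```python
import queue
import threading

spool = queue.Queue()   # FIFO spool area (in-memory stand-in for a spool file)
printed = []            # stand-in for the slow peripheral's output

def application(jobs):
    """Producer: hands jobs to the spool and returns immediately."""
    for job in jobs:
        spool.put(job)  # enqueue at CPU speed; no waiting on the device
    spool.put(None)     # sentinel: no more jobs

def spooler_daemon():
    """Consumer: drains the queue, feeding the device at its own rate."""
    while True:
        job = spool.get()
        if job is None:
            break
        printed.append(job)  # stand-in for driving the peripheral

t = threading.Thread(target=spooler_daemon)
t.start()
application(["payroll.txt", "report.pdf", "letter.doc"])
t.join()
print(printed)  # jobs emerge in FIFO submission order
```

Because `queue.Queue` handles locking internally, the producer and consumer never touch the buffer at the same time, which is the same mutual-exclusion role the kernel primitives play in a real spooler.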
Interaction with the operating system kernel occurs through synchronization primitives such as semaphores, which coordinate access between the producer (CPU/application) and consumer (I/O device or daemon) to avoid race conditions.[10] Conceptually, the spooling flow is a linear pipeline: the application submits data to the spool buffer (e.g., a FIFO queue on disk), the spooler daemon polls or is notified to process the queue, and the peripheral device consumes the data asynchronously, with kernel semaphores ensuring mutual exclusion during buffer access. This mechanism, rooted in early batch-processing systems, remains foundational to efficient I/O management in modern operating systems.[11]
Core Applications
Print Spooling
Print spooling manages print jobs by temporarily storing them in a buffer or queue on disk or in memory before transmitting them to a printer, enabling efficient handling of printing in multi-user environments. On Unix-like operating systems, the Common Unix Printing System (CUPS) serves as the primary spooler, accepting jobs submitted via commands such as lp or lpr, which generate control files and data files stored in the spool directory /var/spool/cups.[12] Similarly, on Windows, the print spooler architecture accepts jobs from applications through the Graphics Device Interface (GDI), spooling data as Enhanced Metafile (EMF) files or raw formats in the %SystemRoot%\System32\spool\PRINTERS directory.[13]
The workflow in print spooling begins with job submission, where an application generates print data that is captured by the spooler. This data undergoes rasterization or formatting through filters or drivers; for instance, in CUPS, a filter chain converts input formats like PostScript, PDF, or plain text into a printer-compatible raster or page description language (PDL) using helper programs that process the data and output it to standard output.[12] The formatted job is then queued in the spooler, where it awaits processing based on priority and availability, with the scheduler managing the order and dispatching jobs to the appropriate backend (e.g., USB or network) once the printer is ready.[12] In Windows, the spooler routes the job through print processors for any necessary conversions before queuing it for the port monitor to send to the printer.[13]
Printer daemons, such as the CUPS scheduler (cupsd), play a central role by monitoring queue status through logs and HTTP/IPP interfaces, allowing administrators to track job progress and printer availability.[12] These daemons prioritize jobs based on user-specified priorities or classes, ensuring higher-priority tasks are processed first, and support handling multiple printers by maintaining configurations in files like printers.conf and routing jobs accordingly.[12] In Windows, the spooler service similarly oversees queue monitoring, job prioritization via priority levels (1-99), and multi-printer management through the Print Management console.[13]
One key advantage of print spooling is that it allows users to submit print jobs asynchronously without waiting for immediate printer availability, freeing applications and users to continue other tasks while the job queues.[12] This also supports offline printing scenarios, where jobs are stored and processed once the printer reconnects, improving overall system responsiveness in shared environments.[13] Banner pages, a print-specific feature, can be automatically added to jobs in systems like CUPS to separate and identify multiple prints from the same queue.[14]
Common challenges in print spooling include job collisions, arising from the classic printer spooler synchronization problem where multiple processes attempt concurrent access to the shared queue, potentially leading to data corruption or lost jobs without proper semaphores or mutexes. Format conversions also pose issues, such as translating PostScript to PCL for compatibility with certain printers, which can fail due to incompatible drivers or complex document features, resulting in garbled output or stalled queues.[12][13] In Windows operating systems, particularly Windows 10 and 11, canceling a print job permanently removes it from the queue and deletes the associated temporary spool files from %SystemRoot%\System32\spool\PRINTERS, with no built-in feature to recover the job; to print the document again, the file must be reopened and resubmitted to the printer. In contrast, for stuck or stalled jobs (not canceled), users can clear the queue by restarting the Print Spooler service or by stopping the service, manually deleting files in the spool folder, and restarting the service.[15]
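The priority handling described above can be modeled with a small priority queue. The sketch below assumes the Windows-style convention of numeric priorities from 1 to 99 with larger values served first; the document names and priority values are invented, and ties fall back to submission order.

```python
import heapq
import itertools

class PrintQueue:
    """Toy prioritized print queue: higher number = more urgent."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker keeps FIFO within a priority

    def submit(self, document, priority=50):
        # heapq is a min-heap, so negate the priority to pop largest first
        heapq.heappush(self._heap, (-priority, next(self._seq), document))

    def next_job(self):
        _, _, document = heapq.heappop(self._heap)
        return document

q = PrintQueue()
q.submit("quarterly-report.pdf", priority=10)
q.submit("payroll-cheques.ps", priority=99)
q.submit("memo.txt", priority=10)

order = [q.next_job() for _ in range(3)]
print(order)  # payroll first, then the two priority-10 jobs in submission order
```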
Batch Job Spooling
Batch job spooling facilitates the queuing and management of non-interactive computational tasks in mainframe environments by buffering input data and capturing output on auxiliary storage, decoupling job execution from direct device access and enabling efficient resource sharing among jobs. Users submit batch jobs via Job Control Language (JCL), which specifies the program's execution steps, input requirements, and output destinations; the system then stages the input data, often inline or from external sources, into SYSIN datasets for sequential reading during processing.[16] After execution, output generated by the program, including reports and logs, is directed to SYSOUT datasets, where it is spooled for later retrieval, printing, or further processing without interrupting the system's primary workload.[17] This mechanism, integral to batch processing, ensures that jobs such as data sorting or report generation can run unattended, with the spooling subsystem handling data persistence across job completion.[18]
The historical foundation of batch job spooling traces to IBM's OS/360 operating system in the mid-1960s, where the Houston Automatic Spooling Priority (HASP) program was developed to address limitations in early batch environments lacking native asynchronous I/O support; HASP introduced disk-based queuing for job input streams and output, evolving into the Job Entry Subsystem (JES) with OS/VS2 in the early 1970s.[19] In OS/360-style systems, JCL statements such as //SYSIN DD * define inline input data terminated by /*, while //SYSPRINT DD SYSOUT=A routes output to a specific spool class (A-Z or 0-9) for prioritized handling, allowing SYSIN and SYSOUT datasets to be allocated dynamically by JES during job initiation.[16] These datasets, stored on direct-access volumes such as DASD, use default parameters such as UNIT=SYSDA and SPACE=(TRK,(50,10)) if unspecified, ensuring compatibility with the system's spooling architecture.[17]
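Put together, the statements discussed above form a complete job. The fragment below is an illustrative sketch: the job name PAYROLL and program name PAYCALC are invented, while the DD statements follow the forms quoted in the text.

```jcl
//PAYROLL  JOB (ACCT),'NIGHTLY RUN',CLASS=A,MSGCLASS=A
//STEP1    EXEC PGM=PAYCALC
//SYSPRINT DD  SYSOUT=A                 OUTPUT SPOOLED TO CLASS A
//WORK     DD  UNIT=SYSDA,SPACE=(TRK,(50,10))
//SYSIN    DD  *                        INLINE INPUT, STAGED BY JES
EMPLOYEE RECORD DATA ...
/*
```

JES reads the inline SYSIN data onto spool at submission time and holds the SYSOUT output on spool after the step completes, so neither the card reader nor the printer is tied up while PAYCALC runs.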
By enabling non-interactive execution, batch job spooling delivers significant efficiency gains, particularly for high-volume workloads such as monthly payroll processing or complex simulations, where it allows multiple jobs to share CPU and I/O resources without contention, reducing overall turnaround time from hours to minutes in multi-initiator configurations.[18] For instance, in a typical setup, JES can manage parallel execution across several initiators, buffering terabytes of transactional data overnight while online systems handle interactive queries.[18] This approach optimizes system utilization by deferring I/O-bound operations, such as output printing, to off-peak periods.[16]
Queue management in batch spooling incorporates priority levels to sequence jobs based on urgency or resource needs, with JES assigning classes via JCL parameters like MSGCLASS or PRTY to determine execution order within input and output queues.[18] Hold and release mechanisms further refine scheduling; for example, the TYPRUN=HOLD parameter in the JOB statement places a job in a held state upon submission, requiring operator or SDSF intervention to release it for processing, which prevents premature execution of dependent or resource-intensive tasks.[18] These controls, managed through JES queues (e.g., input, conversion, and output), ensure orderly flow in environments handling thousands of daily submissions, with tools like SDSF providing real-time monitoring and adjustment.[18]
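The hold/release and priority mechanics described above can be illustrated with a toy model. This is a conceptual sketch only, not the actual JES interface; the job names, priority values, and method names are invented.

```python
import heapq
import itertools

class InputQueue:
    """Toy model of a JES-style input queue with priorities and held jobs."""

    def __init__(self):
        self._ready = []
        self._held = {}
        self._seq = itertools.count()

    def submit(self, name, prty=7, hold=False):
        """hold=True mimics TYPRUN=HOLD: the job waits until released."""
        entry = (-prty, next(self._seq), name)  # min-heap, so negate priority
        if hold:
            self._held[name] = entry
        else:
            heapq.heappush(self._ready, entry)

    def release(self, name):
        """Operator intervention moves a held job into the ready queue."""
        heapq.heappush(self._ready, self._held.pop(name))

    def select(self):
        """An initiator picks the highest-priority ready job."""
        return heapq.heappop(self._ready)[2]

q = InputQueue()
q.submit("PAYROLL", prty=12, hold=True)  # dependent job, held at submission
q.submit("BACKUP", prty=5)
q.submit("REPORTS", prty=9)

first = q.select()   # "REPORTS": PAYROLL is held, and REPORTS outranks BACKUP
q.release("PAYROLL")
second = q.select()  # "PAYROLL": released and now the highest priority
```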
Extended Applications
Disk and Tape Spooling
Disk spooling employs hard disks as intermediate storage, functioning as virtual drums or dedicated files that buffer input and output operations. Because disks allow random access, systems can stage data temporarily without constant reliance on slower peripherals, which also extends the lifespan of magnetic tapes by minimizing repetitive read-write cycles. In practice, disk spooling prevents the "shoe-shining" phenomenon in tape drives, where frequent starts and stops cause excessive mechanical wear, by transferring data to disk first for processing or later archival.[20]
Tape spooling, in contrast, leverages magnetic tape for sequential access in scenarios requiring bulk data transfer or long-term archival, where entire datasets are written or read linearly. Systems automate tape mounting and unmounting to streamline operations, reducing manual intervention in high-volume environments and enabling efficient handling of immutable data streams. This approach is particularly suited to legacy systems where tapes serve as cost-effective, high-capacity storage for non-volatile data preservation.[21]
Technical optimizations in both disk and tape spooling focus on block sizes, access latencies, and buffering algorithms. For disks, block sizes are typically aligned with sector boundaries (e.g., 512 bytes or multiples thereof) to minimize fragmentation, while millisecond-range seek times motivate algorithms such as double or circular buffering, which overlap data transfer with computation to sustain higher transfer rates. Tape spooling relies on fixed block sizes matched to tape density (e.g., 800-6250 bits per inch in early formats) and sequential buffering to avoid repositioning overhead, ensuring continuous streaming and reducing latency in bulk operations.
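The double-buffering idea just described can be sketched as follows. This single-threaded simulation shows only the buffer-swap logic: one block is "in flight" while the previous one is consumed; a real implementation would fill one buffer concurrently (via DMA or a second thread) while the other is processed. The block size is illustrative.

```python
BLOCK = 4  # bytes per block; real systems align this with the device sector size

def read_blocks(data):
    """Stand-in for the device: yields fixed-size blocks sequentially."""
    for i in range(0, len(data), BLOCK):
        yield data[i:i + BLOCK]

def double_buffered_copy(data):
    out = bytearray()
    source = read_blocks(data)
    front = next(source, None)      # buffer A: first block "in flight"
    while front is not None:
        back = next(source, None)   # start filling buffer B ...
        out.extend(front)           # ... while buffer A is consumed
        front = back                # swap roles and continue
    return bytes(out)

print(double_buffered_copy(b"spooled data stream"))
```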
These mechanisms extend basic buffering principles by scaling them to persistent media for sustained performance.[22][21] Key use cases include data staging in scientific computing, where large simulation outputs are spooled to disk for intermediate analysis before tape archival, avoiding real-time I/O bottlenecks, and transaction logging, where non-critical logs are spooled to disk or tape for durability and audit trails, prioritizing reliability over immediate access in environments such as early database management systems.[20]
Network and Modern Spooling
Network spooling extends queue management across distributed systems, allowing jobs to be queued and processed remotely without direct device attachment. The Line Printer Daemon (LPD) protocol, defined in RFC 1179, provides a TCP/IP-based mechanism for submitting print jobs to remote printers: the client sends control files and data streams to a daemon listening on port 515.[23] Similarly, the Server Message Block (SMB) protocol supports spooling for file sharing and printing over networks, redirecting print jobs to a local spooler via shared queues on Windows systems.[24] These protocols decouple producers from consumers, buffering data in intermediate queues to absorb network variability.
In modern cloud environments, spooling has evolved into scalable message-queuing services that manage asynchronous workloads across distributed components. Amazon Simple Queue Service (SQS), a fully managed service, acts as a message spooler by storing and delivering messages between software components, supporting up to 120,000 in-flight messages per queue to ensure reliability without message loss.[25] This extends traditional spooling to much larger scales, such as microservice architectures in which queues buffer events for processing by serverless functions like AWS Lambda.[26] Virtualization adaptations improve spooling efficiency in hypervisor-based systems by leveraging memory-management techniques to minimize I/O bottlenecks, and for high-latency networks, spooling systems incorporate optimizations such as TCP autotuning, adjusting buffer sizes to counteract WAN delays without compromising queue integrity.[27]
Emerging work applies machine learning to spooling through predictive queuing, where models forecast workload patterns to preemptively allocate resources and reduce wait times. One reinforcement learning framework, for example, dynamically schedules jobs in queueing systems by predicting arrival rates, reporting up to a 20% improvement in average response time over static methods.[28] Security enhancements, such as encrypting spool files, protect sensitive data in transit and at rest; modern systems employ AES-256 encryption for print and job queues to prevent unauthorized access during network transmission.[29] These features address vulnerabilities in distributed spooling and support compliance with standards such as GDPR in cloud deployments.
Supporting Elements
Banner Pages
Banner pages, also referred to as separator sheets, burst pages, or job sheets, are specialized pages automatically generated and inserted by print spoolers at the start, and optionally at the end, of a print job to delineate and identify individual documents in a queue. These pages typically include key metadata such as the submitting user's ID, the job's request ID, the submission timestamp, and a customizable title or description of the document.[30][31] The feature originated as a practical solution in early multi-user computing environments for managing output from shared peripherals like line printers.[32]
The banner page is generated by the spooler software, which assembles the necessary information from the job request and formats it using predefined templates or dedicated programs before integrating it into the print stream. In Unix-like systems using the LP (Line Printer) utilities, for instance, the spooler daemon handles this insertion automatically, allowing customization of headers, footers, and content via configuration files or command options.[33][34] The banner is printed in a distinct format, often with bold or centered text, to make it easily distinguishable from the actual job content.
In shared printer setups, banner pages primarily serve to organize output by physically separating jobs, promoting accountability through recorded user and time details, and reducing errors such as document mix-ups in high-volume, multi-user scenarios.[31][30] Administrators can suppress banner printing entirely via options such as -o nobanner in LP commands or banner=never in printer administration settings, which is useful for conserving paper or in single-user contexts.[34][35] Advanced LP implementations also support multi-page banners, extending the separator beyond a single sheet when complex job information or custom formatting requires it.[33]
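Generating a banner page amounts to formatting job metadata into a separator sheet. The sketch below uses an invented layout and function name, not any particular spooler's format; the metadata fields follow those described above.

```python
from datetime import datetime

WIDTH = 60  # illustrative page width in characters

def banner_page(user, request_id, title, when=None):
    """Format job metadata into a distinct, easily spotted separator page."""
    when = when or datetime.now()
    lines = [
        "*" * WIDTH,
        f"User:      {user}".center(WIDTH),
        f"Request:   {request_id}".center(WIDTH),
        f"Submitted: {when:%Y-%m-%d %H:%M}".center(WIDTH),
        f"Title:     {title}".center(WIDTH),
        "*" * WIDTH,
    ]
    return "\n".join(lines)

print(banner_page("alice", "laser-42", "Q3 payroll summary",
                  when=datetime(2024, 5, 1, 9, 30)))
```

A spooler would emit this page into the print stream ahead of the job's data, and optionally again at the end.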
Error Handling and Management
In spooling systems, common errors include device-offline conditions, where the target peripheral (such as a printer) becomes unavailable due to power or connectivity failures, leading to stalled job processing, and buffer overflows, which occur when the spool storage reaches capacity.[1] These errors are detected primarily through status polling, in which the operating system or spooler daemon periodically queries device and queue states to identify anomalies such as unresponsiveness or full buffers.[36]
Recovery strategies emphasize fault tolerance: affected jobs can be paused for manual intervention while others in the queue remain active, then resumed once the issue is resolved.[37] For transient errors, such as a temporarily offline device, the spooler reattempts transmission after a configurable delay to avoid failing jobs unnecessarily.[38] Comprehensive logging captures diagnostics, including error timestamps, job IDs, and failure reasons, enabling post-incident analysis; banner pages provide contextual separation when troubleshooting multi-job queues.[39]
Management tools facilitate oversight and intervention. On Unix-like systems, the lpstat command lets administrators inspect queue status, identify stalled jobs, and check printer availability before cancelling or reconfiguring jobs.[37] On Windows, the Print Management console supports clearing corrupted spool files and restarting the Print Spooler service to restore operations;[40] as described under print spooling, stuck jobs can be cleared by restarting the service or by stopping it and deleting the temporary files in C:\Windows\System32\spool\PRINTERS, while a cancelled job is removed permanently and must be resubmitted.[15] Best practices for reliability include redundant spoolers in clustered configurations, where multiple nodes mirror queue state to handle failover without job loss.[1]
Historical Context
Origins and Early Development
Spooling emerged in the mid-20th century as a response to input/output (I/O) bottlenecks in early batch-processing mainframes, where central processing units (CPUs) frequently idled while awaiting data from slow peripherals such as punched-card readers and line printers. Systems like the IBM 1401, introduced in 1959, exemplified these challenges in commercial data processing, prompting techniques to overlap CPU computation with peripheral operations.[3]
The term "spooling," short for Simultaneous Peripheral Operations On-Line, first appeared in IBM's documentation for the 7070 series mainframes, announced in 1958; the SPOOL System (7070-IO-076) used magnetic tape to buffer data from punched cards to tape and from tape back to cards or printers, decoupling I/O from CPU processing.[41] This was an early standardized implementation of spooling to mitigate I/O slowdowns in batch environments. Separately, the SABRE airline reservation system, a joint American Airlines and IBM project operational from 1964, employed disk buffering on IBM 1301 storage units (announced 1961) and magnetic drums with dual IBM 7090 mainframes to handle real-time transaction data across remote terminals, demonstrating advanced buffering for high-volume interactive workloads.[42]
Key milestones in spooling's early adoption included its integration into IBM's operating systems for the 7000 series mainframes, such as IBSYS for the IBM 709/7090, which used magnetic tape for job queuing to further decouple I/O from processing. By 1964, spooling became a core feature of OS/360, IBM's landmark operating system for the System/360 family, introducing the first dedicated print spoolers for high-speed line printers like the IBM 1403. These advances allowed output data to be buffered on disk or tape, freeing the CPU for subsequent jobs and significantly improving throughput in batch environments.
Initial implementations relied on magnetic drums as spool media for rapid random access before disk storage became dominant.[5]
Evolution in Operating Systems
Building on concepts from early IBM mainframe environments, spooling in Unix operating systems advanced significantly with the Berkeley Software Distribution (BSD) in the late 1970s. The lpr command and associated tools, such as lpq for queue status and lprm for job removal, formed the core of the Berkeley printing system, enabling users to submit jobs to the line printer daemon (lpd) for asynchronous processing and network transmission via the Line Printer Daemon (LPD) protocol. This decoupled application execution from printer availability, supporting multi-user workloads on systems such as 4.2BSD, released in 1983.[43]
By the 1990s, the Common UNIX Printing System (CUPS) emerged as an evolution of BSD-style spooling, with development begun in 1997 by Michael Sweet at Easy Software Products and a first beta release in 1999. CUPS introduced support for the Internet Printing Protocol (IPP), a filter architecture for data conversion, and a web-based administration interface, standardizing printing across Unix-like systems and replacing older LPD implementations in many distributions. Acquired by Apple in 2007, CUPS became the de facto standard for open-source printing, emphasizing driverless and networked capabilities.[44]
In Microsoft Windows, spooling progressed with the introduction of the Print Spooler service in Windows NT 3.1 in 1993, which managed job queuing using Enhanced Metafile (EMF) formats and integrated with the Win32 printing API to handle diverse data types such as raw PostScript or PCL. The service operated as a core subsystem, routing jobs through drivers and monitors while supporting remote printing in enterprise networks.
Further enhancements came with Windows Management Instrumentation (WMI) integration starting in Windows 2000, allowing scripted management of print jobs, queues, and devices via classes such as Win32_PrintJob for querying status and enforcing policies in multi-server setups.[6][45]

In IBM mainframes, spooling evolved further with the Houston Automatic Spooling Priority (HASP) system, developed in the mid-1960s for OS/360 and OS/MVT. HASP enhanced job scheduling, I/O buffering, and output management, becoming a foundational component later incorporated into Job Entry Subsystem 2 (JES2).

Open-source developments in the 2010s extended spooling through tighter integration of CUPS with systemd, the init system adopted by major Linux distributions around 2015, where CUPS daemons run as socket-activated units for on-demand startup and dependency resolution. This improved reliability in containerized and cloud environments by automating service restarts and resource limits for spoolers. Overall, spooling shifted from hardware-dependent models tied to specific peripherals to software-defined abstractions, enhancing scalability for concurrent users and distributed systems without direct device intervention.[46][47]

Notable Systems
List of Spooling Systems
- IBM Job Entry Subsystem (JES): Manages batch jobs on z/OS mainframes, handling input, execution, and output spooling.[48]
- Line Printer Daemon (LPD): Unix print spooler using the LPR protocol for queue management on local and networked printers.[49]
- Windows Print Spooler: Core Windows service for queuing and routing print jobs to local or network devices.[6]
- Common Unix Printing System (CUPS): Modern Unix printing system with IPP support for networked and cloud-compatible printing.[44]
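The LPD/LPR entries above refer to the wire protocol specified in RFC 1179, whose commands are a single command byte followed by ASCII operands and a line feed, and are simple enough to construct by hand. The sketch below builds a few of those request framings; the function names are ours, and no network connection is made.

```python
LF = b"\n"

def receive_job(queue_name: str) -> bytes:
    """Daemon command 02: ask lpd to accept a new job on this queue (RFC 1179)."""
    return b"\x02" + queue_name.encode("ascii") + LF

def send_queue_state(queue_name: str) -> bytes:
    """Daemon command 03: short queue listing (what lpq uses)."""
    return b"\x03" + queue_name.encode("ascii") + LF

def receive_data_file(size: int, name: str) -> bytes:
    """Subcommand 03 within a job: announce a data file of `size` bytes."""
    return b"\x03" + str(size).encode("ascii") + b" " + name.encode("ascii") + LF

print(receive_job("lp"))   # b'\x02lp\n'
```

A real client would open a TCP connection to port 515, send `receive_job`, then send control- and data-file subcommands, checking the single-byte acknowledgement after each step. The protocol's lack of any authentication step is visible here, which is one reason IPP displaced it.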
General Job Spoolers
IBM's Job Entry Subsystem (JES) is a core component of the z/OS operating system, responsible for managing batch jobs, including input reading, job selection, output printing, and purging completed jobs from the system.[48] It supports supplementary functions such as data management and task management on IBM mainframe platforms, originating in the 1970s with the MVS operating system and evolving through the JES2 and JES3 variants.[50]

Print-Focused Spoolers
The Unix Line Printer Daemon (LPD) manages print queues by receiving print requests via the LPR protocol, transferring files to spool directories, and dispatching them to printers while handling queue status and job removal.[49] It is supported on various Unix-like systems, including BSD derivatives and Linux, dating back to early BSD Unix implementations in the 1980s.[51]

Windows Print Spooler is the service that oversees the printing process by loading printer drivers, queuing print jobs, and routing them to local or network printers, including support for print job management and error recovery.[6] It runs on Microsoft Windows operating systems from Windows NT onward, introduced in the early 1990s as part of the NT kernel architecture.[52]

The Common Unix Printing System (CUPS) provides comprehensive print spooling with features such as job queuing, filtering, backend device handling, and Internet Printing Protocol (IPP) support for networked printing.[44] It is the default printing system on most Unix-like platforms, including Linux distributions and macOS, developed in the late 1990s and first released in 1999.[12]

Key Implementations and Comparisons
Prominent spooling systems exhibit distinct features in security, scalability, and ease of configuration, influencing their suitability for different environments. The Common Unix Printing System (CUPS), leveraging the Internet Printing Protocol (IPP), incorporates authentication, authorization, and encryption capabilities, providing robust protection against unauthorized access and data interception; in contrast, the Line Printer Daemon (LPD), an older protocol, operates without inherent security mechanisms and is vulnerable to basic network exploits.[53] Scalability differs markedly between legacy on-premises systems like LPD, which are constrained by fixed hardware capacities and struggle with fluctuating workloads, and cloud-based approaches such as Microsoft Universal Print, which dynamically allocate resources to handle variable demand without infrastructure overprovisioning.[54] Ease of configuration favors CUPS, which offers a web-based interface on port 631 for intuitive queue management and printer setup without manual file edits, whereas LPD relies on command-line tools and lacks a graphical frontend, complicating deployment in diverse networks.[55]

| Feature | CUPS (IPP-based) | LPD (Legacy) |
|---|---|---|
| Security | Authentication, encryption (IPPS), access controls | No built-in auth or encryption; prone to interception |
| Scalability | Supports cloud backends for dynamic scaling | Limited to local hardware; poor for high-volume |
| Configuration Ease | Web interface, command tools like lpadmin | Command-line only; no GUI, steeper learning curve |
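Part of what the table calls CUPS's IPP advantage is that IPP is a structured binary format carried over HTTP (CUPS listens on port 631), rather than LPD's bare TCP commands. The sketch below assembles a Get-Printer-Attributes request per the IPP encoding rules in RFC 8010/8011; a real client would POST these bytes with Content-Type `application/ipp` to the printer's URI, which this example only constructs, never sends.

```python
import struct

GET_PRINTER_ATTRIBUTES = 0x000B  # operation-id per RFC 8011

def attr(value_tag: int, name: str, value: str) -> bytes:
    """One IPP attribute: value-tag, 2-byte name length, name, 2-byte value length, value."""
    n, v = name.encode(), value.encode()
    return bytes([value_tag]) + struct.pack(">H", len(n)) + n + struct.pack(">H", len(v)) + v

def get_printer_attributes(printer_uri: str, request_id: int = 1) -> bytes:
    msg = struct.pack(">BBHI", 1, 1, GET_PRINTER_ATTRIBUTES, request_id)  # IPP/1.1 header
    msg += b"\x01"                                    # operation-attributes-tag
    msg += attr(0x47, "attributes-charset", "utf-8")  # 0x47 = charset value-tag
    msg += attr(0x48, "attributes-natural-language", "en")
    msg += attr(0x45, "printer-uri", printer_uri)     # 0x45 = uri value-tag
    msg += b"\x03"                                    # end-of-attributes-tag
    return msg

req = get_printer_attributes("ipp://localhost:631/printers/lp")
print(req[:8].hex())  # 0101000b00000001 -> version 1.1, op 0x000b, request-id 1
```

Because the request rides on HTTP, it inherits HTTP's authentication and TLS (IPPS) for free, which is exactly the security gap the table notes in LPD.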
