Scheduling (computing)
In computing, scheduling is the action of assigning resources to perform tasks. The resources may be processors, network links or expansion cards. The tasks may be threads, processes or data flows.
The scheduling activity is carried out by a mechanism called a scheduler. Schedulers are often designed so as to keep all computer resources busy (as in load balancing), allow multiple users to share system resources effectively, or to achieve a target quality-of-service.
Scheduling is fundamental to computation itself, and an intrinsic part of the execution model of a computer system; the concept of scheduling makes it possible to have computer multitasking with a single central processing unit (CPU).
Goals
A scheduler may aim at one or more goals, for example:
- maximizing throughput (the total amount of work completed per time unit);
- minimizing wait time (time from work becoming ready until the first point it begins execution);
- minimizing latency or response time (time from work becoming ready until it is finished in case of batch activity,[1][2][3] or until the system responds and hands the first output to the user in case of interactive activity);[4]
- maximizing fairness (equal CPU time to each process, or more generally appropriate times according to the priority and workload of each process).
In practice, these goals often conflict (e.g. throughput versus latency), so a scheduler implements a suitable compromise. Which of the above concerns is given preference depends on the user's needs and objectives.
In real-time environments, such as embedded systems for automatic control in industry (for example robotics), the scheduler also must ensure that processes can meet deadlines; this is crucial for keeping the system stable. Scheduled tasks can also be distributed to remote devices across a network and managed through an administrative back end.
Types of operating system schedulers
The scheduler is an operating system module that selects the next jobs to be admitted into the system and the next process to run. Operating systems may feature up to three distinct scheduler types: a long-term scheduler (also known as an admission scheduler or high-level scheduler), a mid-term or medium-term scheduler, and a short-term scheduler. The names suggest the relative frequency with which their functions are performed.
Process scheduler
The process scheduler is a part of the operating system that decides which process runs at a certain point in time. It usually has the ability to pause a running process, move it to the back of the running queue and start a new process; such a scheduler is known as a preemptive scheduler, otherwise it is a cooperative scheduler.[5]
We distinguish between long-term scheduling, medium-term scheduling, and short-term scheduling based on how often decisions must be made.[6]
Long-term scheduling
The long-term scheduler, or admission scheduler, decides which jobs or processes are to be admitted to the ready queue (in main memory); that is, when an attempt is made to execute a program, its admission to the set of currently executing processes is either authorized or delayed by the long-term scheduler. Thus, this scheduler dictates what processes are to run on a system, the degree of concurrency to be supported at any one time – whether many or few processes are to be executed concurrently, and how the split between I/O-intensive and CPU-intensive processes is to be handled. The long-term scheduler is responsible for controlling the degree of multiprogramming.
In general, most processes can be described as either I/O-bound or CPU-bound. An I/O-bound process is one that spends more of its time doing I/O than it spends doing computations. A CPU-bound process, in contrast, generates I/O requests infrequently, using more of its time doing computations. It is important that a long-term scheduler selects a good process mix of I/O-bound and CPU-bound processes. If all processes are I/O-bound, the ready queue will almost always be empty, and the short-term scheduler will have little to do. On the other hand, if all processes are CPU-bound, the I/O waiting queue will almost always be empty, devices will go unused, and again the system will be unbalanced. The system with the best performance will thus have a combination of CPU-bound and I/O-bound processes. In modern operating systems, this is used to make sure that real-time processes get enough CPU time to finish their tasks.[7]
Long-term scheduling is also important in large-scale systems such as batch processing systems, computer clusters, supercomputers, and render farms. For example, in concurrent systems, coscheduling of interacting processes is often required to prevent them from blocking due to waiting on each other. In these cases, special-purpose job scheduler software is typically used to assist these functions, in addition to any underlying admission scheduling support in the operating system.
Some operating systems only allow new tasks to be added if the system can be sure that all real-time deadlines will still be met. The specific heuristic algorithm used by an operating system to accept or reject new tasks is the admission control mechanism.[8]
Medium-term scheduling
The medium-term scheduler temporarily removes processes from main memory and places them in secondary memory (such as a hard disk drive) or vice versa, which is commonly referred to as swapping out or swapping in (also incorrectly as paging out or paging in). The medium-term scheduler may decide to swap out a process that has not been active for some time, a process that has a low priority, a process that is page faulting frequently, or a process that is taking up a large amount of memory in order to free up main memory for other processes, swapping the process back in later when more memory is available, or when the process has been unblocked and is no longer waiting for a resource.
In many systems today (those that support mapping virtual address space to secondary storage other than the swap file), the medium-term scheduler may actually perform the role of the long-term scheduler, by treating binaries as swapped-out processes upon their execution. In this way, when a segment of the binary is required it can be swapped in on demand, or lazy loaded, also called demand paging.
Short-term scheduling
The short-term scheduler (also known as the CPU scheduler) decides which of the ready, in-memory processes is to be executed (allocated a CPU) after a clock interrupt, an I/O interrupt, an operating system call or another form of signal. Thus the short-term scheduler makes scheduling decisions much more frequently than the long-term or mid-term schedulers – a scheduling decision will at a minimum have to be made after every time slice, and these are very short. This scheduler can be preemptive, implying that it is capable of forcibly removing processes from a CPU when it decides to allocate that CPU to another process, or non-preemptive (also known as voluntary or co-operative), in which case the scheduler is unable to force processes off the CPU.
Dispatcher
Another component that is involved in the CPU-scheduling function is the dispatcher, which is the module that gives control of the CPU to the process selected by the short-term scheduler. It receives control in kernel mode as the result of an interrupt or system call. The functions of a dispatcher involve the following:
- Context switches, in which the dispatcher saves the state (also known as context) of the process or thread that was previously running; the dispatcher then loads the initial or previously saved state of the new process.
- Switching to user mode.
- Jumping to the proper location in the user program to restart that program indicated by its new state.
The dispatcher should be as fast as possible since it is invoked during every process switch. During the context switches, the processor is virtually idle for a fraction of time, thus unnecessary context switches should be avoided. The time it takes for the dispatcher to stop one process and start another is known as the dispatch latency.[7]: 155
Scheduling disciplines
A scheduling discipline (also called scheduling policy or scheduling algorithm) is an algorithm used for distributing resources among parties which simultaneously and asynchronously request them. Scheduling disciplines are used in routers (to handle packet traffic) as well as in operating systems (to share CPU time among both threads and processes), disk drives (I/O scheduling), printers (print spooler), most embedded systems, etc.
The main purposes of scheduling algorithms are to minimize resource starvation and to ensure fairness amongst the parties utilizing the resources. Scheduling deals with the problem of deciding which of the outstanding requests is to be allocated resources. There are many different scheduling algorithms. In this section, we introduce several of them.
In packet-switched computer networks and other statistical multiplexing, the notion of a scheduling algorithm is used as an alternative to first-come first-served queuing of data packets.
The simplest best-effort scheduling algorithms are round-robin, fair queuing (a max-min fair scheduling algorithm), proportional-fair scheduling and maximum throughput. If differentiated or guaranteed quality of service is offered, as opposed to best-effort communication, weighted fair queuing may be utilized.
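The weighted fair queuing mentioned above can be sketched with the classic virtual-finish-time bookkeeping: each packet receives a finish tag that grows with its length divided by its flow's weight, and packets are sent in order of increasing tag. The sketch below is a simplification (it assumes all packets are already queued, so the system virtual clock is omitted), and the flows, weights and packet sizes are invented for illustration.

```python
def wfq_order(packets, weights):
    """Sketch of weighted fair queuing for a batch of already-queued packets.

    packets: list of (flow, size) in arrival order; weights: dict flow -> weight.
    Returns packets in transmission order of increasing virtual finish time.
    """
    last_finish = {flow: 0.0 for flow in weights}
    tagged = []
    for seq, (flow, size) in enumerate(packets):
        # Finish tag advances per flow by size / weight, so heavier flows drain faster.
        finish = last_finish[flow] + size / weights[flow]
        last_finish[flow] = finish
        tagged.append((finish, seq, flow, size))
    return [(flow, size) for _, _, flow, size in sorted(tagged)]

# Flow "a" has twice the weight of flow "b", so its packets are scheduled about twice as often.
print(wfq_order([("a", 100), ("b", 100), ("a", 100), ("b", 100), ("a", 100)],
                {"a": 2, "b": 1}))
# [('a', 100), ('b', 100), ('a', 100), ('a', 100), ('b', 100)]
```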
In advanced packet radio wireless networks such as HSDPA (High-Speed Downlink Packet Access) 3.5G cellular system, channel-dependent scheduling may be used to take advantage of channel state information. If the channel conditions are favourable, the throughput and system spectral efficiency may be increased. In even more advanced systems such as LTE, channel-dependent scheduling is combined with packet-by-packet dynamic channel allocation, for example by assigning OFDMA multi-carriers or other frequency-domain equalization components to the users that can best utilize them.[9]
First come, first served
First in, first out (FIFO), also known as first come, first served (FCFS), is the simplest scheduling algorithm. FIFO simply queues processes in the order that they arrive in the ready queue. This is commonly used for a task queue; a minimal simulation sketch follows the list below.
- Since context switches only occur upon process termination, and no reorganization of the process queue is required, scheduling overhead is minimal.
- Throughput can be low, because long processes can hold the CPU, causing short processes to wait for a long time (known as the convoy effect).
- No starvation, because each process gets a chance to be executed after a definite time.
- Turnaround time, waiting time and response time depend on the order in which processes arrive and can be high for the same reasons above.
- No prioritization occurs, thus this system has trouble meeting process deadlines.
- The lack of prioritization means that as long as every process eventually completes, there is no starvation. In an environment where some processes might not complete, there can be starvation.
- It is based on queuing.
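To make the FIFO behavior above concrete, here is a minimal simulation sketch (an illustration, not taken from any particular operating system) that serves processes strictly in arrival order and reports their waiting and turnaround times; the process names and burst values are invented.

```python
from collections import deque

def fcfs(processes):
    """Simulate FIFO/FCFS on a single CPU.

    processes: list of (name, arrival_time, burst_time), assumed sorted by arrival.
    Returns per-process (waiting_time, turnaround_time).
    """
    queue = deque(processes)
    clock = 0
    results = {}
    while queue:
        name, arrival, burst = queue.popleft()
        clock = max(clock, arrival)        # CPU may idle until the next arrival
        waiting = clock - arrival          # time spent in the ready queue
        clock += burst                     # run to completion (non-preemptive)
        results[name] = (waiting, clock - arrival)
    return results

# Illustrative workload: a long job arriving first delays the short ones (convoy effect).
print(fcfs([("P1", 0, 24), ("P2", 0, 3), ("P3", 0, 3)]))
# {'P1': (0, 24), 'P2': (24, 27), 'P3': (27, 30)}
```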
Priority scheduling
Earliest deadline first (EDF) or least time to go is a dynamic scheduling algorithm used in real-time operating systems to place processes in a priority queue. Whenever a scheduling event occurs (a task finishes, new task is released, etc.), the queue will be searched for the process closest to its deadline, which will be the next to be scheduled for execution.
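To make the selection rule concrete, the sketch below keeps released tasks in a priority queue keyed by absolute deadline and pops the closest deadline at each scheduling event; it is an illustration with assumed task names and deadlines, not code from any real-time operating system.

```python
import heapq

class EDFQueue:
    """Minimal earliest-deadline-first ready queue: pop the task whose deadline is closest."""
    def __init__(self):
        self._heap = []

    def release(self, deadline, task_name):
        heapq.heappush(self._heap, (deadline, task_name))

    def next_task(self):
        # Called at each scheduling event (task finished, new task released, ...).
        return heapq.heappop(self._heap) if self._heap else None

rq = EDFQueue()
rq.release(deadline=50, task_name="logger")
rq.release(deadline=10, task_name="sensor_read")
rq.release(deadline=30, task_name="actuate")
print(rq.next_task())  # (10, 'sensor_read') -- the closest deadline runs first
```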
Shortest remaining time first
Similar to shortest job first (SJF). With this strategy the scheduler arranges processes with the least estimated processing time remaining to be next in the queue. This requires advanced knowledge or estimations about the time required for a process to complete.
- If a shorter process arrives during another process' execution, the currently running process is interrupted (known as preemption), dividing that process into two separate computing blocks. This creates excess overhead through additional context switching. The scheduler must also place each incoming process into a specific place in the queue, creating additional overhead.
- This algorithm is designed for maximum throughput in most scenarios.
- Waiting time and response time increase as the process's computational requirements increase. Since turnaround time is based on waiting time plus processing time, longer processes are significantly affected by this. Overall waiting time is smaller than in FIFO, however, since no process has to wait for the termination of the longest process.
- No particular attention is given to deadlines; the programmer can only attempt to make processes with deadlines as short as possible.
- Starvation is possible, especially in a busy system with many small processes being run.
- Using this policy requires at least two processes of different priority.
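The preemption behavior described in the list above can be seen in a small time-stepped simulation that always runs the process with the least remaining time; the arrival and burst figures are invented for illustration.

```python
def srtf(processes):
    """Shortest-remaining-time-first simulation (1 time unit per step).

    processes: list of (name, arrival, burst). Returns completion time per process.
    """
    remaining = {name: burst for name, _, burst in processes}
    arrival = {name: arr for name, arr, _ in processes}
    done = {}
    clock = 0
    while remaining:
        ready = [n for n in remaining if arrival[n] <= clock]
        if not ready:                       # CPU idles until the next arrival
            clock = min(arrival[n] for n in remaining)
            continue
        current = min(ready, key=lambda n: remaining[n])   # least remaining time wins
        remaining[current] -= 1             # run one time unit, then re-evaluate (preemption point)
        clock += 1
        if remaining[current] == 0:
            done[current] = clock
            del remaining[current]
    return done

# A short job arriving at t=1 preempts the long job that started at t=0.
print(srtf([("long", 0, 8), ("short", 1, 2)]))  # {'short': 3, 'long': 10}
```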
Fixed-priority pre-emptive scheduling
The operating system assigns a fixed-priority rank to every process, and the scheduler arranges the processes in the ready queue in order of their priority. Lower-priority processes get interrupted by incoming higher-priority processes.
- Overhead is not minimal, nor is it significant.
- FPPS has no particular advantage in terms of throughput over FIFO scheduling.
- If the number of rankings is limited, it can be characterized as a collection of FIFO queues, one for each priority ranking (see the sketch after this list). Processes in lower-priority queues are selected only when all of the higher-priority queues are empty.
- Waiting time and response time depend on the priority of the process. Higher-priority processes have smaller waiting and response times.
- Deadlines can be met by giving processes with deadlines a higher priority.
- Starvation of lower-priority processes is possible with large numbers of high-priority processes queuing for CPU time.
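As a sketch of the per-priority FIFO characterization noted in the list above, the ready queue below keeps one FIFO queue per rank and always dispatches from the highest non-empty rank; the rank count and task names are illustrative.

```python
from collections import deque

class FixedPriorityReadyQueue:
    """One FIFO queue per priority rank; dispatch always from the highest non-empty rank."""
    def __init__(self, num_ranks):
        self.queues = [deque() for _ in range(num_ranks)]  # index 0 = highest priority

    def enqueue(self, rank, task):
        self.queues[rank].append(task)

    def dispatch(self):
        for queue in self.queues:          # lower ranks are served only when higher ones are empty
            if queue:
                return queue.popleft()
        return None                        # nothing ready: CPU idle

rq = FixedPriorityReadyQueue(num_ranks=3)
rq.enqueue(2, "batch_report")
rq.enqueue(0, "interrupt_handler")
rq.enqueue(1, "ui_refresh")
print(rq.dispatch(), rq.dispatch(), rq.dispatch())
# interrupt_handler ui_refresh batch_report
```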
Round-robin scheduling
The scheduler assigns a fixed time unit per process, and cycles through them. If a process completes within its time slice, it terminates; otherwise, it is rescheduled after every other process has been given a chance to run.
- RR scheduling involves extensive overhead, especially with a small time unit.
- Balanced throughput between FCFS/FIFO and SJF/SRTF: shorter jobs are completed faster than in FIFO, and longer processes are completed faster than in SJF.
- Good average response time; waiting time depends on the number of processes, not on the average process length.
- Because of high waiting times, deadlines are rarely met in a pure RR system.
- Starvation can never occur, since no priority is given. Order of time unit allocation is based upon process arrival time, similar to FIFO.
- If the time slice is large, it degenerates into FCFS/FIFO; if it is very short, it approaches SJF/SRTF.
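The waiting-time behavior summarized in the list above can be checked with a small round-robin simulation; the quantum and burst values are invented for illustration.

```python
from collections import deque

def round_robin(processes, quantum):
    """Round-robin with a fixed time quantum; all processes assumed ready at t=0.

    processes: list of (name, burst). Returns finish time per process.
    """
    queue = deque(processes)
    clock = 0
    finish = {}
    while queue:
        name, remaining = queue.popleft()
        run = min(quantum, remaining)         # run for at most one time slice
        clock += run
        remaining -= run
        if remaining:
            queue.append((name, remaining))   # rescheduled after all other processes get a turn
        else:
            finish[name] = clock
    return finish

print(round_robin([("P1", 24), ("P2", 3), ("P3", 3)], quantum=4))
# {'P2': 7, 'P3': 10, 'P1': 30}
```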
Multilevel queue scheduling
This is used for situations in which processes are easily divided into different groups. For example, a common division is made between foreground (interactive) processes and background (batch) processes. These two types of processes have different response-time requirements and so may have different scheduling needs. It is very useful for shared memory problems.
Work-conserving schedulers
A work-conserving scheduler is a scheduler that always tries to keep the scheduled resources busy, if there are submitted jobs ready to be scheduled. In contrast, a non-work-conserving scheduler is a scheduler that, in some cases, may leave the scheduled resources idle despite the presence of jobs ready to be scheduled.
Scheduling optimization problems
There are several scheduling problems in which the goal is to decide which job goes to which station at what time, such that the total makespan is minimized:
- Job-shop scheduling – there are n jobs and m identical stations. Each job should be executed on a single machine. This is usually regarded as an online problem.
- Open-shop scheduling – there are n jobs and m different stations. Each job should spend some time at each station, in a free order.
- Flow-shop scheduling – there are n jobs and m different stations. Each job should spend some time at each station, in a pre-determined order; see the makespan sketch after this list.
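For the flow-shop variant just listed, the makespan of one fixed job order can be computed with the standard recurrence C[i][j] = max(C[i-1][j], C[i][j-1]) + p[i][j], where C[i][j] is the completion time of job i on station j; the processing times in this sketch are invented.

```python
def flow_shop_makespan(proc_times):
    """Makespan of a permutation flow shop for the given job order.

    proc_times[i][j] = processing time of job i on station j; every job visits
    the stations in the same, pre-determined order 0..m-1.
    """
    n, m = len(proc_times), len(proc_times[0])
    completion = [[0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            prev_job = completion[i - 1][j] if i > 0 else 0      # station j frees up
            prev_station = completion[i][j - 1] if j > 0 else 0  # job i leaves station j-1
            completion[i][j] = max(prev_job, prev_station) + proc_times[i][j]
    return completion[-1][-1]

# 3 jobs, 2 stations (illustrative times): makespan of this particular job order.
print(flow_shop_makespan([[3, 2], [1, 4], [2, 1]]))  # 10
```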
Manual scheduling
A very common method in embedded systems is to schedule jobs manually. This can for example be done in a time-multiplexed fashion. Sometimes the kernel is divided into three or more parts: manual scheduling, preemptive, and interrupt level. Exact methods for scheduling jobs are often proprietary.
- No resource starvation problems
- Very high predictability; allows implementation of hard real-time systems
- Almost no overhead
- May not be optimal for all applications
- Effectiveness is completely dependent on the implementation
Choosing a scheduling algorithm
When designing an operating system, a programmer must consider which scheduling algorithm will perform best for the use the system is going to see. There is no universal best scheduling algorithm, and many operating systems use extended versions or combinations of the scheduling algorithms above.
For example, Windows NT/XP/Vista uses a multilevel feedback queue, a combination of fixed-priority preemptive scheduling, round-robin, and first in, first out algorithms. In this system, threads can dynamically increase or decrease in priority depending on whether they have already been serviced or have been waiting extensively. Every priority level is represented by its own queue, with round-robin scheduling among the high-priority threads and FIFO among the lower-priority ones. In this sense, response time is short for most threads, and short but critical system threads get completed very quickly. Since threads can only use one time unit of the round-robin in the highest-priority queue, starvation can be a problem for longer high-priority threads.
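A generic multilevel feedback queue sketch (deliberately simplified; this is not the actual Windows NT algorithm, whose boosting and decay rules are more involved) shows the basic mechanism the paragraph describes: a task that uses its whole quantum is demoted one level, and higher levels are always served first.

```python
from collections import deque

class MLFQ:
    """Toy multilevel feedback queue: level 0 is the highest priority.

    A task that exhausts its quantum is demoted one level; higher levels are
    always drained before lower ones. (Generic sketch, not Windows NT's policy.)
    """
    def __init__(self, levels=3, quantum=4):
        self.levels = [deque() for _ in range(levels)]
        self.quantum = quantum

    def add(self, name, burst, level=0):
        self.levels[level].append((name, burst))

    def run(self):
        order = []
        while any(self.levels):
            level = next(i for i, q in enumerate(self.levels) if q)
            name, burst = self.levels[level].popleft()
            ran = min(self.quantum, burst)
            order.append((name, level, ran))
            if burst - ran > 0:                              # used the full slice: demote
                demoted = min(level + 1, len(self.levels) - 1)
                self.levels[demoted].append((name, burst - ran))
        return order

mlfq = MLFQ()
mlfq.add("interactive", 3)
mlfq.add("cpu_hog", 10)
print(mlfq.run())
# [('interactive', 0, 3), ('cpu_hog', 0, 4), ('cpu_hog', 1, 4), ('cpu_hog', 2, 2)]
```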
Operating system process scheduler implementations
The algorithm used may be as simple as round-robin in which each process is given equal time (for instance 1 ms, usually between 1 ms and 100 ms) in a cycling list. So, process A executes for 1 ms, then process B, then process C, then back to process A.
More advanced algorithms take into account process priority, or the importance of the process. This allows some processes to use more time than other processes. The kernel always uses whatever resources it needs to ensure proper functioning of the system, and so can be said to have infinite priority. In SMP systems, processor affinity is considered to increase overall system performance, even if it may cause a process itself to run more slowly. This generally improves performance by reducing cache thrashing.
OS/360 and successors
IBM OS/360 was available with three different schedulers. The differences were such that the variants were often considered three different operating systems:
- The Single Sequential Scheduler option, also known as the Primary Control Program (PCP) provided sequential execution of a single stream of jobs.
- The Multiple Sequential Scheduler option, known as Multiprogramming with a Fixed Number of Tasks (MFT) provided execution of multiple concurrent jobs. Execution was governed by a priority which had a default for each stream or could be requested separately for each job. MFT version II added subtasks (threads), which executed at a priority based on that of the parent job. Each job stream defined the maximum amount of memory which could be used by any job in that stream.
- The Multiple Priority Schedulers option, or Multiprogramming with a Variable Number of Tasks (MVT), featured subtasks from the start; each job requested the priority and memory it required before execution.
Later virtual storage versions of MVS added a Workload Manager feature to the scheduler, which schedules processor resources according to an elaborate scheme defined by the installation.
Windows
Very early MS-DOS and Microsoft Windows systems were non-multitasking, and as such did not feature a scheduler. Windows 3.1x used a non-preemptive scheduler, meaning that it did not interrupt programs. It relied on the program to end or tell the OS that it didn't need the processor so that it could move on to another process. This is usually called cooperative multitasking. Windows 95 introduced a rudimentary preemptive scheduler; however, for legacy support opted to let 16-bit applications run without preemption.[10]
Windows NT-based operating systems use a multilevel feedback queue. 32 priority levels are defined, 0 through 31, with priorities 0 through 15 being normal priorities and priorities 16 through 31 being soft real-time priorities, requiring privileges to assign. 0 is reserved for the Operating System. User interfaces and APIs work with priority classes for the process and the threads in the process, which are then combined by the system into the absolute priority level.
The kernel may change the priority level of a thread depending on its I/O and CPU usage and whether it is interactive (i.e. accepts and responds to input from humans), raising the priority of interactive and I/O bounded processes and lowering that of CPU bound processes, to increase the responsiveness of interactive applications.[11] The scheduler was modified in Windows Vista to use the cycle counter register of modern processors to keep track of exactly how many CPU cycles a thread has executed, rather than just using an interval-timer interrupt routine.[12] Vista also uses a priority scheduler for the I/O queue so that disk defragmenters and other such programs do not interfere with foreground operations.[13]
Classic Mac OS and macOS
Mac OS 9 uses cooperative scheduling for threads, where one process controls multiple cooperative threads, and also provides preemptive scheduling for multiprocessing tasks. The kernel schedules multiprocessing tasks using a preemptive scheduling algorithm. All Process Manager processes run within a special multiprocessing task, called the blue task. Those processes are scheduled cooperatively, using a round-robin scheduling algorithm; a process yields control of the processor to another process by explicitly calling a blocking function such as WaitNextEvent. Each process has its own copy of the Thread Manager that schedules that process's threads cooperatively; a thread yields control of the processor to another thread by calling YieldToAnyThread or YieldToThread.[14]
macOS uses a multilevel feedback queue, with four priority bands for threads – normal, system high priority, kernel mode only, and real-time.[15] Threads are scheduled preemptively; macOS also supports cooperatively scheduled threads in its implementation of the Thread Manager in Carbon.[14]
AIX
In AIX Version 4 there are three possible values for thread scheduling policy:
- First In, First Out: Once a thread with this policy is scheduled, it runs to completion unless it is blocked, it voluntarily yields control of the CPU, or a higher-priority thread becomes dispatchable. Only fixed-priority threads can have a FIFO scheduling policy.
- Round Robin: This is similar to the AIX Version 3 scheduler round-robin scheme based on 10 ms time slices. When a RR thread has control at the end of the time slice, it moves to the tail of the queue of dispatchable threads of its priority. Only fixed-priority threads can have a Round Robin scheduling policy.
- OTHER: This policy is defined by POSIX1003.4a as implementation-defined. In AIX Version 4, this policy is defined to be equivalent to RR, except that it applies to threads with non-fixed priority. The recalculation of the running thread's priority value at each clock interrupt means that a thread may lose control because its priority value has risen above that of another dispatchable thread. This is the AIX Version 3 behavior.
Threads are primarily of interest for applications that currently consist of several asynchronous processes. These applications might impose a lighter load on the system if converted to a multithreaded structure.
AIX 5 implements the following scheduling policies: FIFO, round robin, and a fair round robin. The FIFO policy has three different implementations: FIFO, FIFO2, and FIFO3. The round robin policy is named SCHED_RR in AIX, and the fair round robin is called SCHED_OTHER.[16]
Linux
Linux 1.2
Linux 1.2 used a round-robin scheduling policy.[17]
Linux 2.2
Linux 2.2 added scheduling classes and support for symmetric multiprocessing (SMP).[17]

Linux 2.4
In Linux 2.4,[17] an O(n) scheduler with a multilevel feedback queue with priority levels ranging from 0 to 140 was used; 0–99 are reserved for real-time tasks and 100–140 are considered nice task levels. For real-time tasks, the time quantum for switching processes was approximately 200 ms, and for nice tasks approximately 10 ms.[citation needed] The scheduler ran through the run queue of all ready processes, letting the highest priority processes go first and run through their time slices, after which they were placed in an expired queue. When the active queue was empty, the expired queue became the active queue and vice versa.
However, some enterprise Linux distributions such as SUSE Linux Enterprise Server replaced this scheduler with a backport of the O(1) scheduler (which was maintained by Alan Cox in his Linux 2.4-ac Kernel series) to the Linux 2.4 kernel used by the distribution.
Linux 2.6.0 to Linux 2.6.22
In versions 2.6.0 to 2.6.22, the kernel used an O(1) scheduler developed by Ingo Molnar and many other kernel developers during the Linux 2.5 development. For many kernels in that time frame, Con Kolivas developed patch sets that improved interactivity with this scheduler or even replaced it with his own schedulers.
Linux 2.6.23 to Linux 6.5
Con Kolivas' work, most significantly his implementation of fair scheduling named Rotating Staircase Deadline (RSDL), inspired Ingo Molnár to develop the Completely Fair Scheduler (CFS) as a replacement for the earlier O(1) scheduler, crediting Kolivas in his announcement.[18] CFS is the first implementation of a fair queuing process scheduler widely used in a general-purpose operating system.[19]
The CFS uses a well-studied, classic scheduling algorithm called fair queuing originally invented for packet networks. Fair queuing had been previously applied to CPU scheduling under the name stride scheduling. The fair queuing CFS scheduler has a scheduling complexity of O(log N), where N is the number of tasks in the runqueue. Choosing a task can be done in constant time, but reinserting a task after it has run requires O(log N) operations, because the run queue is implemented as a red–black tree.
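A heavily simplified sketch of the fair-queuing idea behind CFS (not the kernel's implementation, which uses a red-black tree, nanosecond accounting and many refinements): each task accumulates virtual runtime inversely proportional to its weight, and the task with the smallest virtual runtime runs next.

```python
import heapq

def cfs_like_schedule(tasks, slice_units=1, steps=8):
    """Pick the task with the smallest virtual runtime at each step.

    tasks: dict name -> weight (higher weight = larger CPU share).
    Virtual runtime advances by slice/weight, so heavy tasks accrue it more slowly.
    """
    heap = [(0.0, name) for name in tasks]   # (vruntime, task)
    heapq.heapify(heap)
    trace = []
    for _ in range(steps):
        vruntime, name = heapq.heappop(heap)                               # O(log N) extraction
        trace.append(name)
        heapq.heappush(heap, (vruntime + slice_units / tasks[name], name)) # O(log N) reinsertion
    return trace

# A task with weight 2 gets roughly twice as many slices as a task with weight 1.
print(cfs_like_schedule({"heavy": 2, "light": 1}))
# 'heavy' appears about twice as often as 'light' in the trace
```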
The Brain Fuck Scheduler, also created by Con Kolivas, is an alternative to the CFS.
Linux 6.6
In 2023, Peter Zijlstra proposed replacing CFS with an earliest eligible virtual deadline first scheduling (EEVDF) process scheduler.[20][21] The aim was to remove the need for CFS latency nice patches.[22]
Linux 6.12
Linux 6.12 added support for userspace scheduler extensions, also known as sched_ext.[23] These schedulers can be installed to replace the default scheduler.[24]
FreeBSD
FreeBSD uses a multilevel feedback queue with priorities ranging from 0–255. 0–63 are reserved for interrupts, 64–127 for the top half of the kernel, 128–159 for real-time user threads, 160–223 for time-shared user threads, and 224–255 for idle user threads. Also, like Linux, it uses the active queue setup, but it also has an idle queue.[25]
NetBSD
NetBSD uses a multilevel feedback queue with priorities ranging from 0–223. 0–63 are reserved for time-shared threads (default, SCHED_OTHER policy), 64–95 for user threads which entered kernel space, 96–128 for kernel threads, 128–191 for user real-time threads (SCHED_FIFO and SCHED_RR policies), and 192–223 for software interrupts.
Solaris
Solaris uses a multilevel feedback queue with priorities ranging between 0 and 169. Priorities 0–59 are reserved for time-shared threads, 60–99 for system threads, 100–159 for real-time threads, and 160–169 for low priority interrupts. Unlike Linux,[25] when a process is done using its time quantum, it is given a new priority and put back in the queue. Solaris 9 introduced two new scheduling classes, namely fixed-priority class and fair share class. The threads with fixed priority have the same priority range as that of the time-sharing class, but their priorities are not dynamically adjusted. The fair scheduling class uses CPU shares to prioritize threads for scheduling decisions. CPU shares indicate the entitlement to CPU resources. They are allocated to a set of processes, which are collectively known as a project.[7]
Summary
| Operating System | Preemption | Algorithm |
|---|---|---|
| Amiga OS | Yes | Prioritized round-robin scheduling |
| FreeBSD | Yes | Multilevel feedback queue |
| Linux kernel before 2.6.0 | Yes | Multilevel feedback queue |
| Linux kernel 2.6.0–2.6.23 | Yes | O(1) scheduler |
| Linux kernel 2.6.23–6.6 | Yes | Completely Fair Scheduler |
| Linux kernel 6.6 and later | Yes | Earliest eligible virtual deadline first scheduling (EEVDF) |
| classic Mac OS pre-9 | None | Cooperative scheduler |
| Mac OS 9 | Some | Preemptive scheduler for MP tasks, and cooperative for processes and threads |
| macOS | Yes | Multilevel feedback queue |
| NetBSD | Yes | Multilevel feedback queue |
| Solaris | Yes | Multilevel feedback queue |
| Windows 3.1x | None | Cooperative scheduler |
| Windows 95, 98, Me | Half | Preemptive scheduler for 32-bit processes, and cooperative for 16-bit processes |
| Windows NT (including 2000, XP, Vista, 7, and Server) | Yes | Multilevel feedback queue |
See also
- Activity selection problem
- Aging (scheduling)
- Automated planning and scheduling
- Cyclic executive
- Dynamic priority scheduling
- Foreground-background
- Interruptible operating system
- Least slack time scheduling
- Lottery scheduling
- Priority inversion
- Process states
- Queuing theory
- Rate-monotonic scheduling
- Scheduling (production processes)
- Stochastic scheduling
- Time-utility function
Notes
- ^ C. L., Liu; James W., Layland (January 1973). "Scheduling Algorithms for Multiprogramming in a Hard-Real-Time Environment". Journal of the ACM. 20 (1). ACM: 46–61. doi:10.1145/321738.321743. S2CID 207669821.
We define the response time of a request for a certain task to be the time span between the request and the end of the response to that request.
- ^ Kleinrock, Leonard (1976). Queueing Systems, Vol. 2: Computer Applications (1 ed.). Wiley-Interscience. p. 171. ISBN 047149111X.
For a customer requiring x sec of service, his response time will equal his service time x plus his waiting time.
- ^ Feitelson, Dror G. (2015). Workload Modeling for Computer Systems Performance Evaluation. Cambridge University Press. Section 8.4 (Page 422) in Version 1.03 of the freely available manuscript. ISBN 9781107078239. Retrieved 2015-10-17.
if we denote the time that a job waits in the queue by tw, and the time it actually runs by tr, then the response time is r = tw + tr.
- ^ Silberschatz, Abraham; Galvin, Peter Baer; Gagne, Greg (2012). Operating System Concepts (9 ed.). Wiley Publishing. p. 187. ISBN 978-0470128725.
In an interactive system, turnaround time may not be the best criterion. Often, a process can produce some output fairly early and can continue computing new results while previous results are being output to the user. Thus, another measure is the time from the submission of a request until the first response is produced. This measure, called response time, is the time it takes to start responding, not the time it takes to output the response.
- ^ Paul Krzyzanowski (2014-02-19). "Process Scheduling: Who gets to run next?". cs.rutgers.edu. Retrieved 2023-06-19.
- ^ Raphael Finkel (1988). "Chapter 2: Time Management". An Operating Systems Vade Mecum. Prentice Hall. p. 27.
- ^ a b c Abraham Silberschatz; Peter Baer Galvin; Greg Gagne (2013). Operating System Concepts. Vol. 9. John Wiley & Sons, Inc. ISBN 978-1-118-06333-0.
- ^ Robert Kroeger (2004). "Admission Control for Independently-authored Realtime Applications". UWSpace. http://hdl.handle.net/10012/1170 . Section "2.6 Admission Control". p. 33.
- ^ Guowang Miao; Jens Zander; Ki Won Sung; Ben Slimane (2016). Fundamentals of Mobile Data Networks. Cambridge University Press. ISBN 978-1107143210.
- ^ Early Windows at the Wayback Machine (archive index)
- ^ Sriram Krishnan. "A Tale of Two Schedulers Windows NT and Windows CE". Archived from the original on July 22, 2012.
- ^ "Windows Administration: Inside the Windows Vista Kernel: Part 1". Technet.microsoft.com. 2016-11-14. Retrieved 2016-12-09.
- ^ "Archived copy". blog.gabefrost.com. Archived from the original on 19 February 2008. Retrieved 15 January 2022.
- ^ a b "Technical Note TN2028: Threading Architectures". developer.apple.com. Retrieved 2019-01-15.
- ^ "Mach Scheduling and Thread Interfaces". developer.apple.com. Retrieved 2019-01-15.
- ^ [1] Archived 2011-08-11 at the Wayback Machine
- ^ a b c Jones, M. (2018-09-18) [first published on 2009-12-14]. "Inside the Linux 2.6 Completely Fair Scheduler". developer.ibm.com. Retrieved 2024-02-07.
- ^ Molnár, Ingo (2007-04-13). "[patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]". linux-kernel (Mailing list).
- ^ Tong Li; Dan Baumberger; Scott Hahn. "Efficient and Scalable Multiprocessor Fair Scheduling Using Distributed Weighted Round-Robin" (PDF). Happyli.org. Retrieved 2016-12-09.
- ^ "EEVDF Scheduler May Be Ready For Landing With Linux 6.6". Phoronix. Retrieved 2023-08-31.
- ^ "EEVDF Scheduler Merged For Linux 6.6, Intel Hybrid Cluster Scheduling Re-Introduced". www.phoronix.com. Retrieved 2024-02-07.
- ^ "An EEVDF CPU scheduler for Linux [LWN.net]". LWN.net. Retrieved 2023-08-31.
- ^ "Sched_ext Merged For Linux 6.12 - Scheduling Policies As BPF Programs". www.phoronix.com. Retrieved 2025-02-10.
- ^ "Pluggable CPU schedulers - openSUSE Wiki". en.opensuse.org. Retrieved 2025-02-10.
- ^ a b "Comparison of Solaris, Linux, and FreeBSD Kernels" (PDF). Archived from the original (PDF) on August 7, 2008.
References
- Błażewicz, Jacek; Ecker, K.H.; Pesch, E.; Schmidt, G.; Weglarz, J. (2001). Scheduling computer and manufacturing processes (2 ed.). Berlin [u.a.]: Springer. ISBN 3-540-41931-4.
- Stallings, William (2004). Operating Systems Internals and Design Principles (fourth ed.). Prentice Hall. ISBN 0-13-031999-6.
- Information on the Linux 2.6 O(1)-scheduler
Further reading
- Operating Systems: Three Easy Pieces by Remzi H. Arpaci-Dusseau and Andrea C. Arpaci-Dusseau. Arpaci-Dusseau Books, 2014. Relevant chapters: Scheduling: Introduction, Multi-level Feedback Queue, Proportional-share Scheduling, Multiprocessor Scheduling
- Brief discussion of Job Scheduling algorithms
- Understanding the Linux Kernel: Chapter 10 Process Scheduling
- Kerneltrap: Linux kernel scheduler articles
- AIX CPU monitoring and tuning
- Josh Aas' introduction to the Linux 2.6.8.1 CPU scheduler implementation
- Peter Brucker, Sigrid Knust. Complexity results for scheduling problems [2]
- TORSCHE Scheduling Toolbox for Matlab is a toolbox of scheduling and graph algorithms.
- A survey on cellular networks packet scheduling
- Large-scale cluster management at Google with Borg
Fundamentals
Definition and Scope
In computing, scheduling refers to the mechanism employed by an operating system to determine which process or task receives access to system resources, such as the central processing unit (CPU), and for what duration, thereby managing the execution order to optimize resource utilization.[6] This process involves selecting from a ready queue of competing entities and allocating time slices, but it differs from broader resource allocation, which encompasses assigning static resources like memory or I/O devices without regard to temporal sequencing.[7][8]

The scope of scheduling extends beyond CPU management to include I/O scheduling, which orders requests to peripherals like disks to minimize access delays, and task scheduling, which coordinates overall workload in diverse environments.[6] In batch systems, scheduling processes non-interactive jobs sequentially for efficient throughput; interactive systems prioritize quick response times for user-facing tasks via time-sharing; and real-time systems enforce strict deadlines to ensure timely completion, often in embedded or critical applications.[1][9]

Key terminology in scheduling includes processes, which are instances of executing programs with independent address spaces and resources, and threads, which are lightweight subunits within a process that share its address space for concurrent execution.[10] Context switching occurs when the operating system saves the state (registers, program counter) of the current process and restores that of another, enabling multitasking but incurring overhead.[11] Turnaround time measures the interval from a process's arrival in the system to its completion, reflecting overall efficiency.[12]

Consider a simple single-processor scenario where three processes—P1, P2, and P3—arrive simultaneously in the ready queue, each requiring distinct CPU bursts (e.g., 10 ms, 5 ms, and 8 ms respectively). The scheduler selects one to execute exclusively on the CPU, suspending others until the running process completes its burst or yields, then performs a context switch to resume the next, illustrating how contention for the single processor necessitates decisions on execution order.[2]

Historical Development
The origins of scheduling in computing trace back to the 1950s, when early systems like the IBM 701 operated primarily through batch processing. In these setups, jobs were grouped into batches and processed sequentially without automated scheduling mechanisms; programmers or operators manually loaded and executed programs one at a time, leading to significant downtime between jobs due to setup and I/O operations.[13][14] This approach, implemented by organizations such as General Motors Research Laboratories for the IBM 701 around 1956, marked the rudimentary beginnings of job management but lacked concurrency or prioritization.[13]

A pivotal advancement occurred in 1964 with the release of IBM's OS/360, the first widely adopted operating system supporting multiprogramming, which allowed multiple programs to reside in memory simultaneously and share system resources. OS/360 employed a basic first-come, first-served (FCFS) strategy for initiating jobs from the input queue, enabling more efficient CPU utilization by overlapping computation with I/O activities in its Multiprogramming with a Fixed Number of Tasks (MFT) variant.[15][16] This shift from single-job execution to multiprogramming laid the groundwork for modern scheduling by addressing resource contention in batch environments.[17]

The late 1960s and 1970s saw a transition to time-sharing systems, enabling interactive computing and more dynamic scheduling. Multics, operational from 1969, introduced time-slicing to support multiple users concurrently, allowing processes to share the CPU in short intervals for responsive terminal interactions.[18][19] Similarly, Unix, developed starting in 1969 and maturing through the 1970s, adopted time-sharing principles to facilitate multi-user environments, with its scheduler emphasizing fairness and interactivity for research and development workloads.[20] During this period, priority-based and round-robin scheduling emerged as key innovations; priority schemes assigned urgency levels to processes to meet varying needs, while round-robin allocated fixed time quanta in a cyclic manner to prevent indefinite delays, both refined in systems like Unix and VMS throughout the 1970s and 1980s.[21][22]

Real-time scheduling gained prominence in the 1980s, driven by applications in embedded and critical systems requiring predictable response times. The POSIX standards, initiated by IEEE in 1984 and formalized in IEEE 1003.1 by 1988, incorporated real-time extensions (later POSIX.1b in 1993) that standardized scheduling interfaces for priority inheritance and deadline-driven policies, promoting portability across Unix-like systems.[23][21] Influential work by Edsger Dijkstra, including his 1965 introduction of semaphores for process synchronization, indirectly shaped scheduling by providing primitives to coordinate concurrent executions and avoid issues like deadlock in multiprogrammed environments.

Goals and Criteria
Performance Objectives
The primary performance objectives of CPU scheduling revolve around optimizing resource usage and process execution efficiency. CPU utilization seeks to maximize the percentage of time the processor is actively executing processes rather than idling, often aiming for 40% to 90% in multiprogrammed environments to avoid wasted cycles.[2] Throughput measures the number of processes completed per unit time, emphasizing overall system productivity.[24] Turnaround time represents the total duration from process submission to completion, while waiting time quantifies the aggregate period processes spend in the ready queue awaiting allocation.[24] Response time, particularly relevant for interactive workloads, tracks the interval from request initiation to the first output delivery.[24] Fairness ensures equitable resource distribution, preventing indefinite delays for lower-priority processes and mitigating risks like starvation.[25]

These objectives frequently involve inherent trade-offs, as optimizing one can degrade others. For example, strategies that boost throughput by prioritizing CPU-intensive tasks may prolong response times in interactive settings, where users expect rapid feedback.[25] Similarly, minimizing waiting times for short jobs might extend turnaround for longer ones, requiring schedulers to balance efficiency against equity based on workload characteristics.[25]

Scheduling goals vary by system context to align with operational demands. In batch processing environments, the emphasis lies on maximizing throughput and reducing turnaround time to handle bulk jobs efficiently without user interaction.[26] Time-sharing systems, designed for multiple concurrent users, prioritize low response times to simulate dedicated processing and maintain user satisfaction.[26] Real-time systems focus on deadline adherence, ensuring critical tasks complete within specified time bounds to support applications like embedded controls, even if it limits flexibility in other metrics.[26]

Beyond core performance, scheduling incorporates non-performance considerations such as fairness mechanisms that avert starvation. In resource-constrained mobile and embedded platforms, energy efficiency emerges as a vital objective, with schedulers dynamically scaling CPU frequencies to curb power draw—achieving 7% to 72% energy reduction in multimedia workloads—while preserving deadline compliance.[27] As of 2025, additional goals include sustainability through reduced carbon footprint in data centers and integration of machine learning for adaptive policies that optimize for dynamic workloads.[28]

Evaluation Metrics
Evaluation metrics in computing scheduling assess the effectiveness of schedulers by quantifying how well they meet performance goals, such as efficiency and timeliness. These metrics are broadly divided into quantitative measures, which provide numerical evaluations, and qualitative measures, which address system behavior and equity. Quantitative metrics focus on time-based and resource usage indicators, while qualitative ones emphasize aspects like equity and reliability in specialized environments.

Key quantitative metrics include turnaround time, waiting time, and response time, which capture different facets of process execution delays. Turnaround time for a process is defined as the interval from its arrival to the system until its completion, calculated as turnaround time = completion time − arrival time. Waiting time represents the total time a process spends in the ready queue, excluding its actual execution, given by waiting time = turnaround time − burst time, where the burst time is the process's CPU execution time. Response time measures the duration from a process's arrival until its first response, particularly relevant for interactive systems, defined as the time to the initial output or action following input.

Throughput and CPU utilization evaluate overall system productivity and resource efficiency. Throughput is the number of processes completed per unit time, often expressed as processes per second or hour, indicating the scheduler's ability to handle workload volume. CPU utilization quantifies the percentage of time the CPU is actively executing processes rather than idling, computed as utilization = (busy time / total observed time) × 100%, where the busy time is the time the CPU is in use and the total observed time is the overall observation period. High utilization close to 100% signifies effective scheduling without excessive idle periods.

Qualitative metrics address non-numerical qualities essential for balanced and reliable scheduling. Fairness ensures that no process is indefinitely delayed, preventing issues like starvation where low-priority processes never execute; this is achieved through mechanisms like aging priorities to promote equitable resource allocation over time. In real-time systems, predictability is critical, referring to the consistency and boundability of response times to meet deadlines, often analyzed through schedulability tests that verify if all tasks can complete within specified intervals without variance due to workload fluctuations.[29]

Schedulers are evaluated using simulation and benchmarking methods to model real-world behaviors under controlled conditions. Simulation involves creating virtual environments to replay workloads and compute metrics like average waiting time across scenarios, allowing comparison of algorithms without hardware risks; tools often incorporate trace-driven or synthetic workloads to assess scalability. Benchmarking employs standardized suites, such as those with mixed interactive and batch processes, to measure metrics empirically on actual systems, providing insights into practical performance under varying loads.
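A short worked example of these quantitative metrics, using an invented non-preemptive schedule (an illustration, not data from the cited sources):

```python
def metrics(schedule):
    """Compute turnaround, waiting and response time per process.

    schedule: list of (name, arrival, start, completion, burst) for a
    non-preemptive run, where response time = start - arrival.
    """
    out = {}
    for name, arrival, start, completion, burst in schedule:
        turnaround = completion - arrival          # arrival -> completion
        waiting = turnaround - burst               # time spent in the ready queue
        response = start - arrival                 # arrival -> first time on the CPU
        out[name] = dict(turnaround=turnaround, waiting=waiting, response=response)
    return out

# P1 runs 0-10, P2 runs 10-15, P3 runs 15-23 (all arrive at t=0).
print(metrics([("P1", 0, 0, 10, 10), ("P2", 0, 10, 15, 5), ("P3", 0, 15, 23, 8)]))
```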
Scheduler Components
Process Scheduler
The process scheduler, also known as the short-term scheduler or CPU scheduler, is a critical kernel module in operating systems that selects a process from the ready queue for execution on the central processing unit (CPU).[30] Its primary function is to allocate CPU time to processes in a manner that optimizes system performance while adhering to defined policies, ensuring efficient resource utilization among concurrently executing programs.[30] This selection occurs frequently, often every few milliseconds, to maintain responsiveness and throughput in multiprogrammed environments.[1]

Key components of the process scheduler include the ready queue, which maintains a list of processes loaded in memory and awaiting CPU allocation, and the blocked queue, which holds processes suspended due to pending events such as I/O operations.[31] The scheduler enforces the operating system's scheduling policy, which dictates criteria like priority or arrival time for choosing the next process, thereby managing transitions between queues without delving into specific algorithmic details.[31] These structures are typically implemented using data structures like linked lists or priority queues to support efficient insertions and removals.[31]

The process scheduler interacts closely with the memory management subsystem to verify that candidate processes have adequate virtual or physical memory allocated before placement in the ready queue, preventing execution of under-provisioned tasks.[32] Similarly, it coordinates with the I/O subsystem: upon a process initiating an I/O request, the scheduler relocates it to the blocked queue to free the CPU for other tasks, and upon I/O completion signaled via interrupts, it requeues the process as ready.[1] These interactions ensure seamless integration across subsystems, avoiding resource contention and maintaining overall system balance.[1]

In multithreaded environments, the kernel's process scheduler treats lightweight threads as the schedulable unit rather than entire processes, allowing finer-grained control over execution within a shared address space.[33] This contrasts with user-level thread schedulers, which operate within a single process to multiplex threads onto fewer kernel threads without direct OS intervention, potentially reducing overhead but limiting kernel oversight of priorities.[33]

An example workflow illustrates these dynamics through process state transitions: a process enters the ready queue after creation or unblocking, where the scheduler evaluates it against the policy; upon selection, it transitions to the running state, executing until interruption, completion, or preemption, at which point it may return to ready or move to blocked.[34] This cycle repeats, with the scheduler being invoked at context switches to sustain continuous operation.[34]

Dispatcher
The dispatcher is a module within the operating system kernel responsible for giving control of the CPU to the process selected by the short-term scheduler.[35] This handover enables the selected process to execute on the CPU, marking the transition from scheduling decision to actual process execution.[36]

The core function of the dispatcher is to perform a context switch, which involves saving the state of the currently running process and restoring the state of the incoming process.[35] The saved context includes the contents of CPU registers, the program counter (indicating the next instruction to execute), and relevant portions of the process control block (PCB), a data structure that stores process metadata such as priority, memory allocation, and open files.[36] Once the new process's context is loaded, the dispatcher switches the CPU to user mode and jumps to the restored program counter location to resume execution.[35]

Dispatching incurs overhead due to the time and CPU cycles required for context switching, which is non-productive time during which no user process advances. Typical context switch times range from 1 to 10 microseconds on modern systems, depending on hardware architecture and the extent of state saved (e.g., registers versus full page table flushes).[35] This overhead scales with factors like cache invalidation and TLB (translation lookaside buffer) reloads but remains small relative to process quanta in most scheduling policies.[36]

Dispatch latency represents the total time from the scheduler's selection of a new process until it begins execution, comprising the time for the scheduler to select the process plus the context switch time performed by the dispatcher.[35] In real-time systems, minimizing this latency is critical, as excessive delays can violate timing constraints; for instance, it includes phases for process preemption and resource release if needed.[35]

During the dispatch process, particularly the context switch, the operating system typically disables interrupts to ensure the operation's atomicity and prevent interference from concurrent events.[37] This masking of interrupts and traps avoids race conditions, such as partial state saves being corrupted by an incoming interrupt, though it is re-enabled promptly after the switch to maintain system responsiveness.[38] In kernel-mode execution, traps (e.g., from invalid instructions) are similarly handled by saving state before invoking handlers, but the dispatcher ensures no nested switches occur mid-operation.[37]

Scheduling Levels
Long-term Scheduling
Long-term scheduling, also known as job scheduling or admission scheduling, refers to the operating system mechanism that decides which new processes or jobs to admit into the system from an external pool, thereby controlling the overall system load and resource utilization.[31] This scheduler operates infrequently, often only when a process terminates or significant resource changes occur, and its primary role is to select processes for loading into memory to maintain an efficient balance between computational demands and available resources, such as CPU, memory, and I/O devices.[39] By regulating entry, it prevents system overload that could degrade performance, ensuring that admitted processes can execute without excessive contention.[40]

A key aspect of long-term scheduling is managing the degree of multiprogramming, defined as the maximum number of processes that can reside in memory simultaneously for efficient execution on a single-processor system.[41] This degree is typically constrained by physical or virtual memory capacity, process resource requirements, and system throughput goals, with higher degrees improving CPU utilization through overlap of computation and I/O but risking resource exhaustion if exceeded.[41] Systems may employ a fixed degree of multiprogramming, where the number of concurrent processes is predetermined and static (e.g., limited to a set number of partitions), or a variable degree, which dynamically adjusts based on current resource availability and workload characteristics to optimize performance.[17]

Admission control policies in long-term scheduling evaluate incoming processes based on factors like resource availability, process priority, and estimated resource needs, such as expected execution time or I/O demands, to determine admissibility.[31] These policies ensure that only processes likely to complete without starving the system are accepted, often rejecting or queuing others until capacity allows.[39] In batch processing environments, the long-term scheduler interacts closely with job queues, selecting jobs from a backlog to create an optimal mix—typically a balance of CPU-intensive and I/O-intensive processes—to maximize resource usage and minimize idle time across devices.[40]

An illustrative example is IBM's OS/360, where long-term scheduling limited the degree of multiprogramming to prevent thrashing, a state of severe performance degradation due to excessive paging when memory demands outstrip capacity.[17] In its Multiprogramming with Fixed Tasks (MFT) variant, the system enforced a fixed degree by partitioning memory into 4 to 15 static regions, admitting only as many jobs as could fit without overlap.[17] The Multiprogramming with Variable Tasks (MVT) variant allowed a variable degree using dynamic allocation, further mitigating thrashing by aligning admissions with available contiguous space.[17] Seminal work by Peter Denning on the working set model influenced such controls, advocating for admission decisions that monitor a process's active memory footprint to sustain multiprogramming without inducing thrashing.[42]

Medium-term Scheduling
Medium-term scheduling manages the degree of multiprogramming by suspending and resuming processes, thereby adjusting the number of active processes in main memory to maintain system efficiency.[43] Its primary role is to remove processes from memory through swapping when the system experiences overload, preventing thrashing—a condition where excessive paging degrades performance as processes compete for limited memory resources.[44] By suspending low-priority or inactive processes, this level of scheduling reduces memory pressure and allows the system to focus resources on executing tasks, improving overall throughput without permanently terminating processes.[45]

Swapping in medium-term scheduling is triggered by conditions such as memory pressure, where the number of resident processes exceeds available physical memory, leading to frequent page faults; prolonged I/O waits, during which processes are idle and can be temporarily removed without disrupting progress; or explicit policy decisions to balance load across the system.[46] For instance, when thrashing is detected—often by monitoring low CPU utilization alongside high paging rates—the scheduler identifies and swaps out entire processes to secondary storage, freeing contiguous memory blocks.[47] Unlike paging, which involves demand-fetching individual pages on faults, swapping entails transferring the complete process image to disk, making it suitable for coarse-grained memory management but incurring higher overhead due to large data movements.[48]

Resuming swapped processes occurs based on criteria such as process priority, where higher-priority tasks are brought back into memory first to meet fairness or deadline requirements, or system-wide policies ensuring equitable resource allocation among waiting processes.[49] Once memory availability improves or thrashing subsides, the medium-term scheduler selects processes for reactivation, often prioritizing those that were swapped out due to temporary I/O blocks to optimize I/O throughput.[50]

Historically, medium-term scheduling via swapping was integral to early Unix systems for memory management, enabling multiprogramming on resource-constrained hardware like the PDP-7 and PDP-11.[51] In these implementations, the kernel used swapping primitives to move entire processes to disk during fork operations or overloads, supporting process suspension and resumption to handle multiple users without virtual memory paging, which was introduced later.[51] This approach persisted in Unix variants to mitigate thrashing even after demand paging became standard, providing a mechanism to control the multiprogramming level dynamically.[52]

Short-term Scheduling
Short-term Scheduling
Short-term scheduling, also known as CPU scheduling, is the operating system mechanism responsible for selecting processes from the ready queue and allocating CPU time to them, typically on timescales of milliseconds, to ensure efficient processor utilization.[1] This scheduler runs frequently, often invoked every few milliseconds via timer interrupts, to decide which ready process receives the next CPU burst, thereby managing the transition of processes from the ready state to the running state.[26] Its primary goal is to optimize CPU usage while balancing the demands of the processes competing for the processor.

Short-term scheduling can operate in preemptive or non-preemptive mode. In non-preemptive scheduling, once a process is allocated the CPU it runs until completion or until it voluntarily relinquishes the processor, for example through an I/O request or termination; this minimizes the overhead of frequent switches but can lead to long waits for other processes.[1] Conversely, preemptive scheduling allows the operating system to interrupt a running process at any time—typically via hardware interrupts—and reallocate the CPU to another ready process, enabling better fairness and adaptability to varying workloads; modern systems predominantly use preemptive approaches to support multitasking environments.[26]

The frequency of short-term scheduling decisions is shaped by the time quantum, a fixed or dynamic slice of CPU time (often 10-100 milliseconds) that bounds how long a process may run before potential preemption, trading throughput against context-switch overhead.[1] Context switches are driven by interrupts, such as clock timers that signal the end of a time quantum or external events like I/O completion, prompting the scheduler to save the current process state, select a new one from the ready queue, and restore its context via the dispatcher.[53] In interactive systems, short-term scheduling significantly impacts responsiveness: preemptive mechanisms with shorter quanta reduce the time from process initiation to first execution, ensuring quick feedback for user inputs and preventing any single process from monopolizing the CPU, which is critical for real-time user experiences.[1] This enhances overall system interactivity without compromising stability, though excessive switching can introduce overhead from state saving and cache invalidation.[26]
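The trade-off between quantum length and context-switch cost can be made concrete with a little arithmetic; the figures below are illustrative, assuming every quantum runs to expiry.

```python
def context_switch_overhead(quantum_ms, switch_us):
    """Fraction of CPU time lost to context switching when each quantum of
    quantum_ms milliseconds is followed by a switch costing switch_us microseconds."""
    switch_ms = switch_us / 1000.0
    return switch_ms / (quantum_ms + switch_ms)

print(context_switch_overhead(10, 50))   # ~0.005: a 10 ms quantum wastes about 0.5%
print(context_switch_overhead(1, 50))    # ~0.048: a 1 ms quantum wastes nearly 5%
```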
Scheduling Algorithms
First-Come, First-Served
First-Come, First-Served (FCFS) scheduling, also known as First-In, First-Out (FIFO), is the simplest CPU scheduling algorithm implemented in operating systems. In this non-preemptive approach, processes are executed strictly in the order of their arrival to the ready queue, treating the queue as a standard FIFO structure where the first process to enter is the first to be allocated CPU time until completion or blocking.[1][2] This method requires minimal overhead, as the scheduler simply dequeues the head process without evaluating priorities or estimated run times.[54]

The primary advantages of FCFS include its straightforward implementation, which involves no complex decision-making, and its fairness based on arrival order, ensuring that no process experiences starvation as long as it has a finite execution time and the system does not encounter indefinite blocking.[12][55] However, FCFS suffers from significant drawbacks, particularly the convoy effect, where a single long-running process at the front of the queue delays all subsequent shorter processes, leading to inefficient CPU utilization and substantially increased average waiting times.[2][1] This effect is especially pronounced in mixed workloads with varying process burst times, resulting in poor responsiveness for interactive or short jobs.[12]

To illustrate, consider three processes arriving in the order P1, P2, P3 with CPU burst times of 24 ms, 3 ms, and 3 ms, respectively. Under FCFS, the execution sequence is P1 followed by P2 and then P3, yielding the following Gantt chart:

| Time | 0-24 | 24-27 | 27-30 |
|---|---|---|---|
| Process | P1 | P2 | P3 |
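The convoy effect in this example can be quantified with a short calculation: in arrival order the waiting times are 0, 24 and 27 ms (average 17 ms), whereas running the two short jobs first would give 0, 3 and 6 ms (average 3 ms). A minimal sketch of that calculation:

```python
def fcfs_waiting_times(bursts):
    """Waiting times under FCFS for bursts listed in arrival order:
    each process waits for the sum of the bursts queued ahead of it."""
    waits, elapsed = [], 0
    for burst in bursts:
        waits.append(elapsed)
        elapsed += burst
    return waits

print(fcfs_waiting_times([24, 3, 3]))   # [0, 24, 27] -> average 17 ms
print(fcfs_waiting_times([3, 3, 24]))   # [0, 3, 6]   -> average 3 ms
```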
Shortest Job First
Shortest Job First (SJF) is a non-preemptive CPU scheduling algorithm that selects the process from the ready queue with the minimal estimated next CPU burst time for execution.[1] Once a process begins running, it continues until completion, after which the scheduler selects the next shortest job from the ready queue.[1] This approach assumes that burst times are known or can be accurately estimated, allowing the operating system to prioritize shorter jobs to reduce overall queue delays.[1]

SJF is optimal for minimizing the average waiting time among non-preemptive scheduling algorithms when burst times are known in advance.[59] The proof relies on an exchange argument: any schedule that deviates from ordering jobs by non-decreasing burst times can be improved by swapping adjacent inversions (a longer job placed before a shorter one), reducing total waiting time without increasing it elsewhere, until the SJF order is reached.[59] In practice, burst times are estimated using techniques like exponential averaging, where the predicted burst for the next interval is computed as τ_{n+1} = α t_n + (1 − α) τ_n; here, t_n is the actual length of the previous burst, τ_n is the prior prediction, and α (0 ≤ α ≤ 1) weights recent observations against historical estimates.[60]

A key advantage of SJF is its ability to significantly lower average waiting and turnaround times compared to simpler policies like first-come, first-served, especially when job lengths vary widely.[1] However, it requires reliable burst time predictions, which can be challenging if processes exhibit variable behavior, leading to suboptimal scheduling.[1] Additionally, SJF risks starvation for longer jobs, as continually arriving short jobs may indefinitely delay them in the ready queue.[1]

For illustration, consider four processes arriving at time 0 with burst times of 6 ms, 8 ms, 7 ms, and 3 ms. SJF schedules them in ascending order: the 3 ms process first (waiting time 0 ms), followed by the 6 ms (waiting time 3 ms), 7 ms (waiting time 9 ms), and 8 ms (waiting time 16 ms). The average waiting time is (0 + 3 + 9 + 16)/4 = 7 ms.[1] A preemptive variant known as Shortest Remaining Time First (SRTF) addresses some limitations by interrupting longer jobs for shorter arrivals but is discussed separately.[1]
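A minimal sketch of the two ideas above—exponential averaging of burst lengths and dispatching the shortest predicted job first—is shown below; the process representation is an assumption made for illustration.

```python
def predict_next_burst(previous_actual, previous_prediction, alpha=0.5):
    """Exponential averaging: tau_{n+1} = alpha * t_n + (1 - alpha) * tau_n."""
    return alpha * previous_actual + (1 - alpha) * previous_prediction

def sjf_order(ready):
    """Non-preemptive SJF: run ready processes in order of predicted burst."""
    return sorted(ready, key=lambda p: p["predicted_burst"])

# The four-process example above (bursts 6, 8, 7 and 3 ms, all arriving at t = 0)
# is ordered P4, P1, P3, P2, giving waiting times 0, 3, 9, 16 ms (average 7 ms).
print([p["name"] for p in sjf_order([
    {"name": "P1", "predicted_burst": 6},
    {"name": "P2", "predicted_burst": 8},
    {"name": "P3", "predicted_burst": 7},
    {"name": "P4", "predicted_burst": 3},
])])
```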
Priority Scheduling
Priority scheduling is a CPU scheduling algorithm in which each process is assigned a priority value, and the process with the highest priority is selected to run next.[61] This approach allows the operating system to favor processes deemed more important, such as those handling critical system tasks or user interactions. The concept of priority-based process scheduling was introduced in early multiprogramming systems, where priorities ensured quick response times for high-importance activities through preemptive allocation via clock interrupts.[62]

The algorithm can operate in either preemptive or non-preemptive modes. In preemptive priority scheduling, a higher-priority process arriving while a lower-priority one is running will interrupt and preempt it, ensuring immediate execution of urgent tasks.[61] Non-preemptive priority scheduling, conversely, allows the currently running process to complete before switching to a higher-priority one, which may introduce delays but reduces context-switching overhead.[61]

Priorities can be assigned in various ways: statically, where a fixed priority is set at process creation and remains unchanged; dynamically, where priorities adjust based on system conditions; or user-defined, allowing administrators or users to specify importance levels.[61] Dynamic assignment often incorporates aging, a technique that gradually increases the priority of a waiting process over time to prevent indefinite postponement, or starvation, of low-priority tasks.[63]

One advantage of priority scheduling is its ability to support important tasks by ensuring they receive CPU time promptly, improving overall system responsiveness for critical workloads.[61] However, without mechanisms like aging, low-priority processes risk starvation, as higher-priority arrivals can repeatedly defer them, leading to indefinite postponement.[61]

Example
Consider four processes, all arriving at time 0, with the following priorities (a lower number indicates higher priority) and burst times: P1 (priority 3, burst 8), P2 (priority 1, burst 4), P3 (priority 4, burst 2), P4 (priority 2, burst 6). The scheduler selects P2 first (highest priority) and runs it to completion, then P4, followed by P1, and finally P3, yielding a scheduling order of P2 → P4 → P1 → P3; because all processes arrive together, the preemptive and non-preemptive variants produce the same schedule here.[61]

Priority scheduling often integrates with multilevel feedback queues, where queue levels represent priority classes, and processes may move between queues based on behavior to balance responsiveness and fairness in time-sharing environments.[64]
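A minimal sketch of priority dispatch with aging follows; the aging step, the aging interval, and the process fields are illustrative assumptions, and lower numbers denote higher priority as in the example above.

```python
def pick_next(ready, now, aging_step=1, aging_interval=10):
    """Select the ready process with the best effective priority, where a
    process's effective priority improves (its number decreases) by
    aging_step for every aging_interval time units it has waited, so that
    low-priority work cannot starve indefinitely."""
    def effective_priority(p):
        boost = ((now - p["arrival"]) // aging_interval) * aging_step
        return p["priority"] - boost
    return min(ready, key=effective_priority)
```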
Round-Robin Scheduling
Round-robin (RR) scheduling is a preemptive CPU scheduling algorithm primarily used in time-sharing operating systems to ensure fair allocation of processor time among multiple processes. In this approach, the ready queue is treated as a circular queue, and each process is assigned a fixed unit of CPU time, called the time quantum or time slice, typically ranging from a few milliseconds to tens of milliseconds. The scheduler executes the first process in the queue for up to the quantum duration; if the process completes or blocks (e.g., for I/O), it exits the queue, but if it exceeds the quantum, it is preempted and moved to the end of the queue, allowing the next process to run. This cyclic execution continues until all processes finish, preventing any single process from monopolizing the CPU. The algorithm originated in early time-sharing systems like the Compatible Time-Sharing System (CTSS) developed at MIT in the 1960s, where it enabled multiple users to interact with the computer simultaneously by rapidly switching contexts.[65][2]

The choice of time quantum is critical to the algorithm's performance and is generally set between 10 and 100 milliseconds, balancing responsiveness and overhead. A very small quantum (e.g., 1 ms) results in excessive context switches, which consume CPU cycles and degrade throughput due to the overhead of saving and restoring process states—context switch times can be 10-100 microseconds on modern systems. Conversely, a large quantum (e.g., exceeding 100 ms) reduces preemption frequency but approximates first-come, first-served (FCFS) scheduling, leading to poor response times for interactive tasks as short processes wait behind long ones. Optimal quantum selection often depends on system workload; for instance, empirical studies suggest values around 20-50 ms for typical desktop environments to minimize average turnaround time while keeping context switch overhead below 5-10% of CPU utilization.[66][67]

RR scheduling excels in providing fairness and low average response times in multi-user, time-sharing scenarios, as every process receives equal opportunity regardless of burst length, eliminating starvation and supporting interactive applications with quick feedback—response times can be under 100 ms for short bursts. However, its fixed quanta can disadvantage long-running jobs, which suffer frequent preemptions and increased waiting if the quantum is small, potentially raising average waiting times by 20-50% compared to non-preemptive methods for CPU-intensive workloads. In practice, RR is simple to implement with low scheduling overhead (O(1) per dispatch using a queue), making it suitable for systems prioritizing equity over strict optimality. It can be enhanced with priority mechanisms to adjust effective quanta, though such variants are discussed separately.[68][24]

To illustrate, consider three processes—P1, P2, and P3—all arriving at time 0 with CPU burst times of 24 ms, 3 ms, and 3 ms, respectively, under a 4 ms quantum. The execution proceeds as follows:
- 0-4 ms: P1 runs (remaining: 20 ms); queue becomes P2, P3, P1.
- 4-7 ms: P2 runs (completes); queue becomes P3, P1.
- 7-10 ms: P3 runs (completes); queue becomes P1.
- 10-14 ms: P1 runs (remaining: 16 ms).
- 14-18 ms: P1 runs (remaining: 12 ms).
- 18-22 ms: P1 runs (remaining: 8 ms).
- 22-26 ms: P1 runs (remaining: 4 ms).
- 26-30 ms: P1 runs (completes).
| Time | 0-4 | 4-7 | 7-10 | 10-14 | 14-18 | 18-22 | 22-26 | 26-30 |
|---|---|---|---|---|---|---|---|---|
| Process | P1 | P2 | P3 | P1 | P1 | P1 | P1 | P1 |
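The trace above can be reproduced with a few lines of simulation; the sketch below assumes all processes arrive at time 0 and ignores context-switch cost.

```python
from collections import deque

def round_robin(bursts, quantum):
    """Return (start, end, process) slices for round-robin over the given
    bursts (a mapping of process name to CPU burst in ms, in arrival order)."""
    queue = deque(bursts.items())
    time, trace = 0, []
    while queue:
        name, remaining = queue.popleft()
        run = min(quantum, remaining)
        trace.append((time, time + run, name))
        time += run
        if remaining > run:                # quantum expired: requeue at the tail
            queue.append((name, remaining - run))
    return trace

print(round_robin({"P1": 24, "P2": 3, "P3": 3}, 4))
# [(0, 4, 'P1'), (4, 7, 'P2'), (7, 10, 'P3'), (10, 14, 'P1'), ..., (26, 30, 'P1')]
```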
Multilevel Queue Scheduling
Multilevel queue scheduling is a CPU scheduling algorithm that partitions the ready queue into multiple separate queues, with each queue dedicated to a specific class of processes, such as system processes, interactive processes, or batch processes.[71] Processes are permanently assigned to one queue based on their type or priority at creation, and each queue operates independently using its own scheduling discipline tailored to the needs of that process class—for instance, round-robin (RR) for interactive queues to ensure quick response times, or first-come, first-served (FCFS) for batch queues to handle long-running jobs efficiently.[71] This approach allows the operating system to optimize resource allocation for diverse workloads without applying a uniform policy to all processes.[72]

Scheduling among the queues themselves is managed through inter-queue policies, typically either fixed priority scheduling—where higher-priority queues (e.g., system or interactive) are always served before lower ones—or time-slice allocation, which divides CPU time proportionally across queues (e.g., 80% to an interactive queue using RR and 20% to a batch queue using FCFS).[71] In fixed priority, the scheduler exhaustively processes higher queues before moving to lower ones, while time-slice methods rotate service to prevent indefinite delays.[72] A common example involves two queues: a foreground queue for interactive processes scheduled with RR using a small time quantum (e.g., to prioritize responsiveness), and a background queue for batch processes using FCFS to minimize overhead for non-time-sensitive tasks.[71]

The primary advantages of multilevel queue scheduling include its ability to customize algorithms to the characteristics of different process types, thereby improving overall system performance for mixed workloads, and its relative simplicity in implementation for systems with well-defined process categories.[72] However, it suffers from rigidity, as processes cannot change queues, potentially leading to suboptimal assignments if process behavior varies; fixed priority schemes also risk starvation of lower-priority queues if higher ones remain busy.[71] Variants, such as multilevel feedback queues, address some limitations by permitting processes to move between queues based on observed behavior, like CPU usage, to better approximate dynamic priorities.[71]
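The two-queue example above can be sketched as follows; the class, the quantum value, and the convention that a background job runs to completion are assumptions chosen only to illustrate fixed-priority service between queues.

```python
from collections import deque

class TwoLevelQueue:
    """Foreground queue served round-robin with a small quantum; background
    queue served FCFS; fixed priority between them (background work runs
    only while the foreground queue is empty)."""

    def __init__(self, fg_quantum=2):
        self.foreground = deque()      # interactive processes
        self.background = deque()      # batch processes
        self.fg_quantum = fg_quantum

    def next_dispatch(self):
        """Return (process, time_limit); None as the limit means run to completion."""
        if self.foreground:
            return self.foreground.popleft(), self.fg_quantum
        if self.background:
            return self.background.popleft(), None
        return None
```

Under such a fixed-priority arrangement the background queue is starved whenever foreground work keeps arriving, which is exactly the drawback noted above.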
Real-time Scheduling
Real-time scheduling in computing prioritizes meeting deadlines for task execution over metrics like average throughput or fairness, ensuring timely responses in systems where timing constraints are critical. Unlike general-purpose scheduling, which often balances resource utilization across tasks, real-time approaches guarantee that tasks complete before their specified deadlines to prevent system failures or degraded performance. This discipline is essential in embedded systems, control applications, and safety-critical environments, where violations can lead to catastrophic outcomes.[73]

Real-time systems are classified into hard and soft types based on the consequences of deadline misses. In hard real-time systems, every deadline must be met without exception, as failures can result in total system breakdown; examples include avionics control and medical devices, where timing precision is paramount for safety. Soft real-time systems tolerate occasional misses, accepting degraded quality rather than failure, such as in multimedia streaming or network packet processing, where latency spikes may cause minor artifacts but do not halt operations. This distinction guides algorithm selection, with hard systems demanding stricter guarantees and soft ones allowing flexibility for higher utilization.[74][75]

Tasks in real-time scheduling are modeled as periodic or aperiodic to capture their invocation patterns. Periodic tasks recur at fixed intervals, each characterized by a period T_i, a worst-case execution time C_i, and a relative deadline D_i, forming the basis for predictable scheduling of recurring workloads like sensor sampling. Aperiodic tasks arrive irregularly without a fixed rate, often triggered by external events, requiring dynamic handling to integrate with periodic ones without violating guarantees; the sporadic variant, for instance, assumes a minimum inter-arrival time to bound worst-case behavior. These models enable schedulability analysis, ensuring the processor can handle the workload without overload.[73][76]

Key algorithms include Rate Monotonic (RM) and Earliest Deadline First (EDF), which differ in priority assignment. RM is a fixed-priority scheme where tasks with shorter periods receive higher priorities, assuming deadlines equal periods; it is optimal among static-priority algorithms for periodic tasks on uniprocessors. A sufficient schedulability test for RM states that a task set is feasible if its total utilization satisfies U = Σ_{i=1}^{n} C_i/T_i ≤ n(2^{1/n} − 1), where n is the number of tasks; the bound decreases toward approximately 0.693 for large n. EDF, a dynamic-priority approach, schedules the task with the nearest absolute deadline, achieving up to 100% utilization for feasible sets and optimality among dynamic policies; it recomputes priorities at runtime, making it suitable for varying workloads but potentially more overhead-intensive. Both were foundational in establishing real-time theory for hard constraints.[73]

A critical challenge in real-time scheduling is priority inversion, where a high-priority task is delayed by a low-priority one holding a shared resource, exacerbated by medium-priority tasks preempting the low-priority holder. To mitigate this, priority inheritance protocols temporarily elevate the low-priority task's priority to match the highest waiting task, ensuring bounded blocking times and preserving schedulability. The basic priority inheritance protocol, for example, propagates the highest pending priority to the resource holder until release, preventing unbounded delays in mutex-based synchronization. This mechanism is widely adopted in real-time kernels to maintain timing guarantees.[77]
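The RM utilization bound and the EDF selection rule translate directly into code; the sketch below is illustrative only (the RM test is sufficient but not necessary, and the task representation is an assumption).

```python
def rm_schedulable(tasks):
    """Sufficient rate-monotonic test: tasks is a list of (C, T) pairs
    (worst-case execution time, period, with deadline equal to period).
    Feasible if total utilization <= n * (2**(1/n) - 1)."""
    n = len(tasks)
    utilization = sum(c / t for c, t in tasks)
    return utilization <= n * (2 ** (1 / n) - 1)

def edf_pick(ready):
    """Earliest Deadline First: dispatch the job with the nearest absolute deadline."""
    return min(ready, key=lambda job: job["absolute_deadline"])

# Three tasks with (C, T) = (1, 4), (1, 5), (2, 10): U = 0.65 <= 3*(2**(1/3) - 1) ~ 0.78
print(rm_schedulable([(1, 4), (1, 5), (2, 10)]))   # True
```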
POSIX real-time extensions, defined in IEEE Std 1003.1b-1993, standardize scheduling for portable real-time applications on Unix-like systems. They introduce policies like SCHED_FIFO (first-in, first-out fixed priority) and SCHED_RR (round-robin within priority levels), alongside SCHED_OTHER for default time-sharing, enabling preemptive priority-based dispatching with at least 32 priority levels. These extensions support real-time signals, semaphores, and timers for precise control, facilitating deadline-driven implementations while ensuring compatibility across compliant operating systems.[78]
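On systems that implement these extensions, the policies can be selected through the standard interfaces; Python's os module exposes thin wrappers over them on Unix platforms, as in the sketch below (changing policy normally requires elevated privileges, e.g. CAP_SYS_NICE on Linux).

```python
import os

def enable_fifo(priority=None):
    """Put the calling process under SCHED_FIFO at the given (or minimum)
    real-time priority, using the POSIX scheduling interfaces."""
    if priority is None:
        priority = os.sched_get_priority_min(os.SCHED_FIFO)
    os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(priority))

def restore_timesharing():
    """Return the calling process to the default SCHED_OTHER policy."""
    os.sched_setscheduler(0, os.SCHED_OTHER, os.sched_param(0))
```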
Advanced Topics
Work-Conserving Schedulers
A work-conserving scheduler is defined as one that keeps the processor busy at all times when there are ready processes available to execute, idling only in the absence of such work.[79] This property contrasts with non-work-conserving schedulers, which may intentionally leave resources idle, for example to enforce minimum delay bounds or to hold reserved time slots for specific flows.[80] Work-conserving schedulers inherently maximize processor utilization by ensuring no unnecessary idle time occurs.[81]

Common examples include First-Come, First-Served (FCFS), which processes jobs in arrival order without idling if work remains; Shortest Job First (SJF), which prioritizes shorter jobs but continues execution without gaps; and Round-Robin (RR), which cycles through processes in fixed time slices while keeping the CPU occupied.[82][83][84] These schedulers generally reduce average response times compared to non-work-conserving alternatives by fully utilizing available capacity, though they can introduce greater variance in response times, particularly for longer jobs in priority-based schemes like SJF.[85]

Work-conserving principles also underpin approximations of Generalized Processor Sharing (GPS), an idealized fluid model introduced by Parekh and Gallager that allocates processor shares proportionally to weights while maintaining full utilization.[86] In networking contexts, Weighted Fair Queuing (WFQ) serves as a packet-based, work-conserving approximation of GPS, ensuring proportional bandwidth allocation without idling the link when packets are queued.[87]

In real-time systems, work-conserving schedulers may nonetheless fail to meet hard deadlines, because their focus on utilization can lead to unpredictable delays for high-priority tasks; where predictability matters more than utilization, non-work-conserving approaches such as resource reservations or deliberate idle insertion are used instead.[88]
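The work-conserving property is easy to see in a concrete discipline. Deficit round robin (DRR), a simpler packet-based approximation of fair queuing than WFQ, is used here purely as an illustration: the loop below never lets the link idle while any flow still holds a packet, yet serves flows roughly in proportion to their quanta.

```python
from collections import deque

def deficit_round_robin(queues, quanta):
    """queues: one deque of packet sizes per flow; quanta: per-flow quantum.
    Returns the transmission order as (flow_index, packet_size) pairs."""
    deficits = [0] * len(queues)
    order = []
    while any(queues):                        # work-conserving: serve while work exists
        for i, q in enumerate(queues):
            if not q:
                deficits[i] = 0               # idle flows accumulate no credit
                continue
            deficits[i] += quanta[i]          # grant this flow its quantum
            while q and q[0] <= deficits[i]:
                packet = q.popleft()
                deficits[i] -= packet
                order.append((i, packet))
    return order

print(deficit_round_robin([deque([300, 300]), deque([600])], [300, 300]))
# [(0, 300), (0, 300), (1, 600)]
```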
Scheduling Optimization
Scheduling optimization in computing treats task allocation as a combinatorial optimization problem, aiming to minimize objectives like makespan, total completion time, or lateness while respecting constraints such as processing times, deadlines, and resource limits. These problems are formalized using the standard three-field notation of scheduling theory, in which 1||C_max denotes single-machine makespan minimization without preemption and P||C_max denotes makespan minimization on identical parallel machines. On a single machine, 1||C_max is trivially solved by executing all tasks sequentially, yielding a makespan equal to the sum of the processing times, as no ordering affects the result.[89] In contrast, the multiprocessor variant P||C_max—assigning tasks to multiple identical processors to minimize the maximum completion time—is NP-hard even for two machines, and strongly NP-hard when the number of machines is part of the input, as shown by reduction from the 3-PARTITION problem.[90]

Approximation algorithms provide efficient heuristics for NP-hard scheduling problems like P||C_max, trading optimality for polynomial-time solvability. List scheduling assigns each task to the processor that can start it earliest, achieving an approximation ratio of 2 − 1/m relative to the optimal makespan, where m is the number of processors; this bound is tight in the worst case.[91] The longest processing time first (LPT) variant improves this by sorting tasks in decreasing order of processing time before applying list scheduling, yielding a tighter bound of 4/3 − 1/(3m), which demonstrates better performance on instances with varying task lengths.[91] These algorithms are particularly useful in practice for their simplicity and low overhead, often integrated into larger systems while maintaining work-conserving properties to avoid idle time (a short LPT sketch is given below).[89]

Many scheduling objectives fall into complexity classes beyond basic makespan minimization, highlighting the intractability of realistic scenarios. For instance, minimizing maximum lateness on a single machine with release times (1|r_j|L_max)—where each task has an earliest start time r_j and a due date d_j, and lateness is completion time minus due date—is NP-hard.[90] Parallel-machine extensions, such as P|r_j|L_max, inherit this hardness and require advanced techniques for approximation or exact solution in restricted cases. Weighted objectives, like minimizing total weighted completion time Σ_j w_j C_j, become NP-hard even on a single machine when precedence constraints are present (1|prec|Σ w_j C_j), necessitating optimization frameworks that handle dependencies.[89]

Integer programming (IP) models offer a precise way to formulate scheduling optimization, especially for offline scenarios where all task details are known in advance. A canonical IP for single-machine scheduling with precedence constraints to minimize Σ_j w_j C_j—where w_j is the weight and C_j the completion time of task j—uses binary ordering variables x_{jk} equal to 1 if task j is sequenced before task k:

  minimize    Σ_j w_j C_j
  subject to  C_k = p_k + Σ_{j≠k} p_j x_{jk}        for every task k
              x_{jk} + x_{kj} = 1                    for every pair j < k
              x_{ij} + x_{jk} − x_{ik} ≤ 1           for all distinct i, j, k
              x_{jk} = 1                             whenever j is a required predecessor of k
              x_{jk} ∈ {0, 1}

Here, p_j is the processing time of task j; the second and third constraint families force the ordering variables to describe a consistent total order (no overlaps), and the fourth enforces the precedence relation. This model captures sequencing and dependency enforcement and is solvable via commercial solvers for moderate-sized instances.[89] Extensions to parallel machines add assignment variables, increasing complexity but enabling global optimization.
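The LPT heuristic referenced above admits a very short implementation; the sketch below keeps machine loads in a min-heap and is offered as an illustration rather than a tuned scheduler.

```python
import heapq

def lpt_schedule(processing_times, m):
    """Longest Processing Time first for P||C_max: sort jobs by decreasing
    length, then always give the next job to the least-loaded machine.
    Returns (makespan, assignment of machine -> list of job indices)."""
    loads = [(0, machine) for machine in range(m)]
    heapq.heapify(loads)
    assignment = {machine: [] for machine in range(m)}
    for job, p in sorted(enumerate(processing_times), key=lambda jp: -jp[1]):
        load, machine = heapq.heappop(loads)
        assignment[machine].append(job)
        heapq.heappush(loads, (load + p, machine))
    return max(load for load, _ in loads), assignment

print(lpt_schedule([7, 6, 5, 4, 3, 3], 2))   # makespan 14 on two machines
```

By the bound quoted above, the makespan produced this way is always within 4/3 − 1/(3m) of the optimum.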
For tackling NP-hard offline scheduling instances, metaheuristic and exact methods like genetic algorithms and branch-and-bound are widely employed. Genetic algorithms evolve populations of candidate schedules using crossover, mutation, and selection operators to approximate optimal solutions, with early applications demonstrating effectiveness on job-shop problems by encoding permutations as chromosomes and evaluating fitness via makespan or weighted completion time. Branch-and-bound algorithms systematically explore the decision tree of possible schedules, pruning branches using lower and upper bounds (e.g., from Lagrangian relaxation), and have been foundational for exact solutions in flowshop and job-shop settings since their adaptation to sequencing problems. These tools excel in offline contexts, such as batch processing in data centers, where computation time is decoupled from runtime execution.[89]
Multiprocessor Scheduling
Multiprocessor scheduling extends single-processor techniques to systems with multiple CPUs or cores, addressing the allocation of tasks across processors to optimize utilization, throughput, and response times. In such systems, the scheduler must manage task migration between processors while contending with shared resources like caches and memory buses, which introduce overheads not present in uniprocessor environments. Key challenges arise from the need to balance computational load, preserve data locality, and scale efficiently as the number of processors increases.

One primary issue is load balancing, which ensures an even distribution of tasks so that no processor idles while others are overloaded. Without effective balancing, multiprocessor systems can suffer underutilization, as in multi-queue multiprocessor scheduling (MQMS), where each processor maintains its own run queue and imbalances can develop. Processor affinity addresses the performance penalty of frequent task migrations, which disrupt cache locality by forcing tasks to reload data on different processors; affinity scheduling prefers assigning a task to the processor it last ran on so that cached content can be reused and context-switch costs reduced. Scalability of the scheduling algorithm itself is another concern, particularly in single-queue multiprocessor scheduling (SQMS) with a global queue, where lock contention on the shared structure degrades performance as the processor count grows beyond a few cores.

Common approaches include symmetric multiprocessing (SMP) using global queues for simplicity, though this often trades off scalability for ease of implementation. For parallel jobs requiring coordinated execution, gang scheduling allocates resources to entire groups of tasks simultaneously, running all threads of a job in parallel across processors; this is particularly beneficial for applications with fine-grain synchronization. Algorithms like affinity scheduling enhance SMP by favoring local execution, improving cache hit rates by up to 20-30% in benchmark workloads. Dynamic load balancing employs techniques such as push and pull migration to redistribute tasks proactively: in push migration, a periodic process identifies overloaded processors and migrates tasks to idle ones, while in pull migration idle processors steal tasks from busy queues, as in work-stealing schedulers. These methods achieve near-optimal balance with low overhead in MQMS setups.

In non-uniform memory access (NUMA) systems, where memory latency varies by processor proximity, locality-aware scheduling minimizes remote accesses by binding tasks to nodes with local memory, reducing average latency by factors of 2-4 compared to oblivious placement. For example, the Linux Completely Fair Scheduler (CFS) handles multicore environments through per-CPU run queues with periodic load balancing via push/pull mechanisms, ensuring fair share allocation while respecting affinity to leverage cache hierarchies.
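Pull migration can be illustrated with a toy work-stealing loop; the deque layout, the random victim choice, and the simplification of running one task per step are assumptions made only for illustration.

```python
import collections
import random

def work_stealing(per_cpu_tasks):
    """Toy pull migration: each CPU drains its own deque from the front and,
    when it runs dry, steals a task from the opposite end of a randomly
    chosen busy peer. Returns the total cost executed by each CPU."""
    queues = [collections.deque(tasks) for tasks in per_cpu_tasks]
    busy = [0] * len(queues)
    while any(queues):
        for cpu, q in enumerate(queues):
            if not q:                                         # idle CPU: try to steal
                victims = [i for i, v in enumerate(queues) if v and i != cpu]
                if not victims:
                    continue
                q.append(queues[random.choice(victims)].pop())
            busy[cpu] += q.popleft()                          # run one task locally
    return busy

print(work_stealing([[5, 5, 5, 5], [1], [1]]))   # the two lightly loaded CPUs steal from the first
```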
Implementations
Unix-like Systems
Unix-like systems have evolved their scheduling mechanisms from simple designs in the early implementations to sophisticated, fairness-oriented schedulers in modern variants such as Linux, BSD, and Solaris. Early Unix, developed in the 1970s at Bell Labs, employed a basic multilevel feedback queue scheduler that prioritized interactive processes through dynamic priority adjustments based on recent CPU usage and wait times, using a default time quantum of approximately 1 second to balance responsiveness and throughput.[92] This approach allowed short jobs to complete quickly while preventing long-running processes from monopolizing the CPU, though it lacked advanced features like multiprocessor support.

The Berkeley Software Distribution (BSD) variants introduced refinements to process prioritization, particularly in 4.4BSD, where scheduling relied on a multilevel feedback queue with 64 priority levels ranging from -20 to 43, dynamically adjusted every 100 milliseconds based on CPU utilization and sleep time to favor interactive workloads.[52] Users could influence these priorities via the nice value, a user-settable parameter ranging from -20 (highest priority) to 19 (lowest), which offsets the computed priority and allows voluntary yielding of CPU resources; positive adjustments require no root privileges.[52] This mechanism ensured that processes with higher nice values received proportionally less CPU time, promoting fairness in multi-user environments.

Linux's scheduler has undergone significant evolution: the simple priority-based scheduler of kernel 1.2 (1995) used fixed time slices and nice-adjusted priorities; the O(1) scheduler, developed during the 2.5 series and shipped in kernel 2.6.0 (2003), provided constant-time selection; the Completely Fair Scheduler (CFS) was introduced in kernel 2.6.23 (2007); and most recently the Earliest Eligible Virtual Deadline First (EEVDF) scheduler replaced CFS in kernel 6.6 (2023).[93] CFS emphasized proportional fairness by tracking each process's virtual runtime—a normalized measure of CPU time consumed, adjusted for nice value and processor load—and used a red-black tree to select the process with the smallest virtual runtime for execution.[94] EEVDF builds on CFS by using virtual deadlines to approximate ideal fair scheduling more closely, improving latency for interactive tasks and scalability while maintaining fairness, with each process's share of CPU time determined by a weight derived from its nice value.
This design enhances desktop responsiveness and performance on multiprocessors as of 2025.[95]

In FreeBSD, the ULE scheduler, introduced in 2003 as part of the SMPng project, replaced the traditional 4BSD scheduler to better support symmetric multiprocessing (SMP) environments.[96] ULE uses per-CPU run queues and priority-based selection with nice adjustments, incorporating load balancing and CPU affinity to minimize migration overhead while maintaining interactive performance through dynamic priority boosts for recently woken processes.[96]

Solaris implements scheduling through distinct classes, including the time-sharing (TS) class for general-purpose workloads, which uses a multilevel feedback queue with priorities from 0 to 59 adjusted based on CPU usage and nice values, and the real-time (RT) class, which assigns fixed priorities from 100 to 159 with a configurable time quantum to guarantee deterministic execution for critical tasks.[97] The system class (SYS) handles kernel threads with priorities 60 to 99, ensuring non-preemptible operations, while the fixed-priority (FX) class provides static priorities within the TS range for specialized applications.[97]

Across Unix-like systems, scheduling adheres to POSIX standards for portability, supporting policies like SCHED_FIFO and SCHED_RR for real-time tasks alongside the default SCHED_OTHER, which incorporates nice adjustments to bias CPU allocation without violating fairness guarantees.[98] The nice command and renice utility enable users to set these values, with higher nice values yielding a proportionally smaller share of CPU time relative to peers, fostering cooperative resource sharing in diverse workloads.[98]
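The virtual-runtime bookkeeping at the heart of CFS can be sketched in a few lines; the weight function below is only an approximation of the kernel's precomputed table (nice 0 normalized to 1024, roughly a 1.25x change per nice step) and the task representation is assumed.

```python
def nice_to_weight(nice):
    """Approximate CFS-style load weight: nice 0 maps to 1024 and each nice
    step scales the weight by about 1.25 (illustrative approximation)."""
    return 1024 / (1.25 ** nice)

def charge_vruntime(task, ran_ns):
    """Advance a task's virtual runtime: actual CPU time is scaled by
    (nice-0 weight / task weight), so heavily weighted (low-nice) tasks
    accumulate vruntime more slowly and get picked again sooner."""
    task["vruntime"] += ran_ns * 1024 / nice_to_weight(task["nice"])

def pick_next(runnable):
    """CFS dispatches the runnable task with the smallest virtual runtime
    (kept in a red-black tree in the kernel; a plain min() suffices here)."""
    return min(runnable, key=lambda t: t["vruntime"])
```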
Windows Systems
Windows scheduling is implemented in the kernel by the dispatcher, which manages thread execution on one or more processors using a priority-based, preemptive, multilevel feedback queue algorithm.[99] Introduced with Windows NT 3.1 in 1993, the scheduler defines 32 discrete priority levels ranging from 0 (lowest, reserved for the zero-page thread) to 31 (highest), with levels 1-15 forming the variable (dynamic) range used by ordinary user-mode threads and 16-31 forming the real-time range, which requires elevated privileges.[100] The system employs one ready queue per priority level per processor, enabling O(1) scheduling by selecting the highest-priority ready thread via a bitmask summary.[99]

Scheduling in Windows occurs at the thread level rather than the process level, as threads represent the basic schedulable units while processes serve primarily as containers for resources and security contexts.[99] Each thread inherits a base priority from its process's priority class (e.g., normal at level 8, real-time at 24) but can have relative adjustments, and the kernel dispatcher handles all thread dispatching, including context switches upon timer interrupts or higher-priority thread readiness.[100] Time quanta, or slices, are variable and depend on system type and thread characteristics; for example, foreground threads on client systems receive approximately 20 ms (tripled from the base 6-7 ms), while background threads get shorter slices to favor interactivity.[99] Within the same priority, threads follow a round-robin discipline until their quantum expires or a higher-priority thread preempts them.[101]

In Windows 2000 and XP, the scheduler introduced dynamic priority boosts to enhance responsiveness, temporarily elevating thread priorities upon events like I/O completion (+1 to +2) or GUI thread foreground activation (+2), with boosted priorities capped at level 15 and starved threads periodically boosted after prolonged waits.[99] These boosts decay over time unless renewed, ensuring fairness while prioritizing interactive workloads. Starting with Windows Server 2008 R2, the scheduler can employ dynamic fair share scheduling (DFSS) to allocate CPU time proportionally among sessions, mitigating CPU monopolization in multi-user scenarios like Terminal Services.[99] Later releases also implement AutoBoost—a priority inheritance mechanism—for synchronization objects such as mutexes and critical sections, raising the priority of a resource-holding thread to match the highest waiting thread's priority until release, thereby preventing unbounded priority inversion.[102] Modern implementations in Windows 10 and 11 build on these foundations with optimizations for thread pools, including the ThreadPool API that dynamically manages worker threads to reduce overhead in I/O-bound and concurrent applications by reusing threads and balancing load across processors.
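The per-level ready queues and bitmask summary described above lend themselves to a compact sketch; this toy model is illustrative only and not the actual kernel data structure.

```python
class ReadyQueues:
    """32 priority levels with a bitmask summary: finding the highest-priority
    non-empty queue is a single bit scan, and threads within a level are
    served round-robin."""

    LEVELS = 32

    def __init__(self):
        self.queues = [[] for _ in range(self.LEVELS)]
        self.summary = 0                        # bit i set <=> queue i non-empty

    def make_ready(self, thread, priority):
        self.queues[priority].append(thread)
        self.summary |= 1 << priority

    def pick_next(self):
        if not self.summary:
            return None                         # no ready threads: idle
        top = self.summary.bit_length() - 1     # highest set bit = highest priority
        thread = self.queues[top].pop(0)
        if not self.queues[top]:
            self.summary &= ~(1 << top)
        return thread
```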
Real-time scheduling capabilities are enhanced in Windows Server editions, which support the fixed real-time priority range (16-31) with configurable timer resolution and affinity masking for dedicated cores, though fully deterministic behavior requires specialized configurations or variants such as Windows Embedded for critical systems.[100] The scheduler also integrates with the Multimedia Class Scheduler Service (MMCSS), reserving CPU bandwidth (e.g., 80% for high-priority multimedia tasks over 10 ms intervals) to ensure low-latency performance for audio and video streams.[99] Users can influence scheduling via tools like Task Manager, which allows setting process priority classes (e.g., real-time, high, normal, below normal, idle) that propagate to threads, though real-time settings require administrative privileges and can destabilize the system if misused.[100] Advanced tuning is possible through registry values such as Win32PrioritySeparation, which adjusts quantum length, but such changes are discouraged without expertise because of their potential impact on stability and fairness.[99]
Other Operating Systems
In classic Mac OS, which preceded OS X and ran from System 1 in 1984 to Mac OS 9 in 2001, scheduling relied on cooperative multitasking: applications voluntarily yielded control to the Process Manager through calls such as WaitNextEvent, allowing switching between tasks.[103] This model lacked preemption between applications, meaning a misbehaving application could monopolize the CPU and freeze the system, as there was no kernel-enforced time slicing for inter-application context switches.[103] Preemption was introduced in a limited form starting with Mac OS 8 in 1997, enabling it within applications for background threads via the Multiprocessing Services framework, though inter-application multitasking remained cooperative.[103]
macOS, introduced as OS X in 2001, employs the hybrid XNU kernel, which integrates Mach for core task management—including preemptive multitasking and symmetric multiprocessing—with a BSD subsystem providing Unix-like process scheduling and POSIX compliance.[104] The Mach component handles thread scheduling with support for real-time priorities to ensure predictable behavior in time-sensitive applications, such as multimedia processing, while the BSD layer overlays traditional Unix scheduling policies like priority-based dispatching.[104] This combination allows macOS to balance general-purpose computing with real-time extensions, evolving from earlier Mach-based systems to support modern multicore architectures.[104]
IBM's AIX, a Unix variant for Power Systems since 1986, uses a multilevel queue scheduler with 128 priority levels ranging from 0 (highest) to 127 (lowest), where threads are dispatched based on dynamic priority adjustments. Fixed priorities apply to system-critical threads that maintain constant levels to ensure stability, while variable priorities adjust for user processes based on factors like nice values and resource consumption, promoting fairness in multiprogrammed environments.[105] The Workload Manager (WLM), introduced in AIX 4.3.3 and enhanced in later versions, extends this by partitioning resources into hierarchical classes and tiers—up to 32 superclasses and 10 tiers—allocating CPU, memory, and I/O shares (1-65535) with minimum and maximum limits (0-100%) to isolate workloads like databases from batch jobs.[105] WLM integrates with the scheduler via Uniform Resource Access Priority (URAP), dynamically tuning thread priorities to meet service-level agreements without altering core kernel behavior.[105]
IBM OS/360, released in 1966 as the foundational mainframe operating system, pioneered multilevel scheduling through its Multiprogramming with a Fixed number of Tasks (MFT) and Variable number of Tasks (MVT) configurations, where jobs were organized into priority-based queues for efficient resource sharing on System/360 hardware.[106] The primary input queue held job control blocks ordered by priority (0-15, with lower numbers higher priority), from which the master scheduler selected ready tasks for initiation, supporting concurrent execution of up to 15 tasks in MFT.[106] A secondary output queue managed post-execution datasets for printing or punching, prioritized similarly to prevent bottlenecks, while the task dispatcher selected the highest-priority ready task from the resident queue for CPU allocation, laying groundwork for modern hierarchical dispatching in successors like z/OS.[106]
In embedded and real-time operating systems (RTOS), VxWorks from Wind River Systems employs a preemptive priority-based scheduler that supports rate monotonic scheduling (RMS), assigning fixed priorities inversely proportional to task periods to meet hard real-time deadlines in applications like aerospace and automotive controls.[107] RMS in VxWorks ensures higher-frequency tasks preempt lower ones, with schedulability tested via utilization bounds (e.g., up to 69% for n tasks), configurable through API calls like taskPrioritySet for deterministic behavior on multicore platforms.[107] Similarly, BlackBerry QNX Neutrino uses an adaptive partitioning scheduler (APS) to guarantee minimum CPU throughput percentages to thread groups, defining partitions with budgets (e.g., 20% CPU over a 100ms window) that dynamically redistribute unused cycles while enforcing hard limits to isolate critical partitions from faulty ones.[108] APS operates as an optional scheduler plugin, monitoring usage via control partitions and adjusting priorities to maintain fairness, widely adopted in safety-certified systems like medical devices.[108]
As of 2025, scheduling in mobile operating systems like iOS emphasizes energy-aware policies within the XNU kernel to optimize battery life on Apple Silicon, dynamically assigning threads to performance (P-cores) or efficiency (E-cores) based on workload intensity and power profiles.[109] The scheduler balances latency and consumption by promoting energy-efficient dispatching for background tasks, incorporating metrics from the Energy Efficiency Guide to reduce idle wakeups and favor low-power states, reflecting trends in heterogeneous computing for prolonged device usage.[110]