Hubbry Logo
search
logo

Pipeline (software)

logo
Community Hub0 Subscribers
Write something...
Be the first to start a discussion here.
Be the first to start a discussion here.
See all
Pipeline (software)

In software engineering, a pipeline consists of a chain of processing elements (processes, threads, coroutines, functions, etc.), arranged so that the output of each element is the input of the next. The concept is analogous to a physical pipeline. Usually some amount of buffering is provided between consecutive elements. The information that flows in these pipelines is often a stream of records, bytes, or bits, and the elements of a pipeline may be called filters. This is also called the pipe(s) and filters design pattern which is monolithic. Its advantages are simplicity and low cost while its disadvantages are lack of elasticity, fault tolerance and scalability. Connecting elements into a pipeline is analogous to function composition.

Narrowly speaking, a pipeline is linear and one-directional, though sometimes the term is applied to more general flows. For example, a primarily one-directional pipeline may have some communication in the other direction, known as a return channel or backchannel, as in the lexer hack, or a pipeline may be fully bi-directional. Flows with one-directional trees and directed acyclic graph topologies behave similarly to linear pipelines. The lack of cycles in such flows makes them simple, and thus they may be loosely referred to as "pipelines".

Pipelines are often implemented in a multitasking OS, by launching all elements at the same time as processes, and automatically servicing the data read requests by each process with the data written by the upstream process. This can be called a multiprocessed pipeline. In this way, the scheduler will naturally switch the CPU among the processes so as to minimize its idle time. In other common models, elements are implemented as lightweight threads or as coroutines to reduce the OS overhead often involved with processes. Depending on the OS, threads may be scheduled directly by the OS or by a thread manager. Coroutines are always scheduled by a coroutine manager of some form.

Read and write requests are usually blocking operations. This means that the execution of the source process, upon writing, is suspended until all data can be written to the destination process. Likewise, the execution of the destination process, upon reading, is suspended until at least some of the requested data can be obtained from the source process. This cannot lead to a deadlock, where both processes would wait indefinitely for each other to respond, since at least one of the processes will soon have its request serviced by the operating system, and continue to run.

For performance, most operating systems implementing pipes use pipe buffers, which allow the source process to provide more data than the destination process is currently able or willing to receive. Under most Unixes and Unix-like operating systems, a special command is also available, typically called "buffer", that implements a pipe buffer of potentially much larger and configurable size. This command can be useful if the destination process is significantly slower than the source process, but it is desired that the source process complete its task as soon as possible. E.g., if the source process consists of a command which reads an audio track from a CD and the destination process consists of a command which compresses the waveform audio data to a format like MP3. In this case, buffering the entire track in a pipe buffer would allow the CD drive to spin down more quickly, and enable the user to remove the CD from the drive before the encoding process has finished.

Such a buffer command can be implemented using system calls for reading and writing data. Wasteful busy waiting can be avoided by using facilities such as poll or select or multithreading.

Some notable examples of pipeline software systems include:

CMS Pipelines is a port of the pipeline idea to VM/CMS and z/OS systems. It supports much more complex pipeline structures than Unix shells, with steps taking multiple input streams and producing multiple output streams. (Such functionality is supported by the Unix kernel, but few programs use it as it makes for complicated syntax and blocking modes, although some shells do support it via arbitrary file descriptor assignment).

See all
User Avatar
No comments yet.