Shebang (Unix)
from Wikipedia

#!
shebang

In computing, a shebang is the character sequence #!, consisting of the characters number sign (also known as sharp or hash) and exclamation mark (also known as bang), at the beginning of a script. It is also called sharp-exclamation, sha-bang,[1][2] hashbang,[3][4] pound-bang,[5][6] or hash-pling.[7]

When a text file with a shebang is used as if it were an executable in a Unix-like operating system, the program loader mechanism parses the rest of the file's initial line as an interpreter directive. The loader executes the specified interpreter program, passing to it as an argument the path that was initially used when attempting to run the script, so that the program may use the file as input data.[8] For example, if a script is named with the path path/to/script, and it starts with the line #! /bin/sh, then the program loader is instructed to run the program /bin/sh, passing path/to/script as the first argument.

The shebang line is usually ignored by the interpreter, because the "#" character is a comment marker in many scripting languages; some language interpreters that do not use the hash mark to begin comments still may ignore the shebang line in recognition of its purpose.[9]

Syntax

The form of a shebang interpreter directive is as follows:[8]

#! interpreter [optional-one-arg-only]

in which interpreter is a path to an executable program. The space between #! and interpreter is optional. There may be any number of spaces or tabs before or after interpreter. The optional argument includes any extra spaces up to the end of the line.
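
The parsing rules above can be sketched in a few lines of Python. This is only an illustration, not any kernel's actual code: the function name parse_shebang is hypothetical, and it follows the text in keeping everything after the interpreter as one argument, including embedded and trailing blanks (real systems differ on trailing blanks).

```python
import re


def parse_shebang(first_line: bytes):
    """Split a script's first line into (interpreter, optional_arg).

    Returns None when the line does not begin with the two bytes '#!'.
    Blanks around the interpreter are skipped; whatever follows the
    interpreter is kept as ONE argument, as described in the text.
    """
    if not first_line.startswith(b"#!"):
        return None
    # Drop the '#!' marker and the line terminator, then leading blanks.
    rest = first_line[2:].rstrip(b"\n").lstrip(b" \t")
    if not rest:
        return None
    # Split once: interpreter, then the remainder as a single argument.
    parts = re.split(rb"[ \t]+", rest, maxsplit=1)
    interpreter = parts[0].decode()
    arg = parts[1].decode() if len(parts) > 1 and parts[1] else None
    return interpreter, arg


print(parse_shebang(b"#! /bin/sh\n"))
print(parse_shebang(b"#!/usr/bin/env python3 -c\n"))
```

The second call returns the interpreter /usr/bin/env together with the single argument "python3 -c", which is the non-splitting behavior discussed under portability.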

In Linux, the file specified by interpreter can be executed if it has execute permission and is one of the following:

  • a native executable, such as an ELF binary
  • any kind of file for which an interpreter was registered via the binfmt_misc mechanism (such as for executing Microsoft .exe binaries using wine)
  • another script starting with a shebang

On Linux and Minix, an interpreter can also be a script. A chain of shebangs and wrappers yields a directly executable file that gets the encountered scripts as parameters in reverse order. For example, if file /bin/A is an executable file in ELF format, file /bin/B contains the shebang #! /bin/A optparam, and file /bin/C contains the shebang #! /bin/B, then executing file /bin/C resolves to /bin/B /bin/C, which finally resolves to /bin/A optparam /bin/B /bin/C.
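
The /bin/A, /bin/B, /bin/C example can be traced with a small simulation. The dictionary below is a hypothetical in-memory stand-in for the file system, and max_depth is an arbitrary guard added for the sketch, not a value taken from any particular kernel.

```python
def resolve_chain(path, files, args=(), max_depth=8):
    """Expand nested shebangs into the final argument vector.

    `files` maps a path to (interpreter, optional_arg) for scripts and
    to None for native executables (a stand-in for the real file system).
    """
    argv = [path, *args]
    for _ in range(max_depth):
        entry = files[argv[0]]
        if entry is None:
            # Reached a native executable: nothing left to expand.
            return argv
        interpreter, optional_arg = entry
        prefix = [interpreter] if optional_arg is None else [interpreter, optional_arg]
        # The script path stays in place; the interpreter goes in front.
        argv = prefix + argv
    raise RuntimeError("too many levels of interpreters")


files = {
    "/bin/A": None,                    # ELF binary
    "/bin/B": ("/bin/A", "optparam"),  # starts with '#! /bin/A optparam'
    "/bin/C": ("/bin/B", None),        # starts with '#! /bin/B'
}
print(resolve_chain("/bin/C", files))
# ['/bin/A', 'optparam', '/bin/B', '/bin/C']
```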

In Solaris- and Darwin-derived operating systems (such as macOS), the file specified by interpreter must be an executable binary and cannot itself be a script.[10]

Examples

Some typical shebang lines:

  • #! /bin/sh – Execute the file using the Bourne shell, or a compatible shell, assumed to be in the /bin directory
  • #! /bin/bash – Execute the file using the Bash shell
  • #! /usr/bin/pwsh – Execute the file using PowerShell
  • #! /usr/bin/env python3 – Execute with a Python interpreter, using the env program search path to find it
  • #! /bin/false – Do nothing, but return a non-zero exit status, indicating failure. Used to prevent stand-alone execution of a script file intended for execution in a specific context, such as by the . command from sh/bash, source from csh/tcsh, or as a .profile, .cshrc, or .login file.

Shebang lines may include specific options that are passed to the interpreter. However, implementations vary in the parsing behavior of options; for portability, only one option should be specified without any embedded whitespace.[11] Further portability guidelines are found below.

Purpose

Interpreter directives allow scripts and data files to be used as commands, hiding the details of their implementation from users and other programs, by removing the need to prefix scripts with their interpreter on the command line.

For example, consider a script having the initial line #! /bin/sh -x. It may be invoked simply by giving its file path, such as some/path/to/foo,[12] and some parameters, such as bar and baz:

some/path/to/foo bar baz

In this case /bin/sh is invoked in its place, with parameters -x, some/path/to/foo, bar, and baz, as if the original command had been

/bin/sh -x some/path/to/foo bar baz

Most interpreters make any additional arguments available to the script. If /bin/sh is a POSIX-compatible shell, then bar and baz are presented to the script as the positional parameter array "$@", and individually as parameters "$1" and "$2" respectively.
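
The rewriting of the command line in this example can be expressed as a short Python sketch. The helper name equivalent_command is hypothetical; shlex.join is used only to render the resulting argument vector as a shell command line.

```python
import shlex


def equivalent_command(interpreter, optional_arg, script_path, user_args):
    """Return the command line the loader effectively runs for a script."""
    argv = [interpreter]
    if optional_arg is not None:
        argv.append(optional_arg)   # the single argument from the shebang line
    argv.append(script_path)        # the path used to invoke the script
    argv.extend(user_args)          # the arguments typed by the user
    return shlex.join(argv)


print(equivalent_command("/bin/sh", "-x", "some/path/to/foo", ["bar", "baz"]))
# /bin/sh -x some/path/to/foo bar baz
```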

Because the initial # is the character used to introduce comments in the POSIX shell language (and in the languages understood by many other interpreters), the whole shebang line is ignored by the interpreter. However, it is up to the interpreter to ignore the shebang line, and not all do so; thus, a script consisting of the following two lines simply outputs both lines when run:

#! /bin/cat
Hello world!

Strengths

When compared to the use of global association lists between file extensions and the interpreting applications, the interpreter directive method allows users to use interpreters not known at a global system level, and without administrator rights. It also allows specific selection of interpreter, without overloading the filename extension namespace (where one file extension refers to more than one file type), and allows the implementation language of a script to be changed without changing its invocation syntax by other programs. Invokers of the script need not know what the implementation language is as the script itself is responsible for specifying the interpreter to use.

Portability

Program location

Shebangs must specify absolute paths (or paths relative to current working directory) to system executables; this can cause problems on systems that have a non-standard file system layout. Even when systems have fairly standard paths, it is quite possible for variants of the same operating system to have different locations for the desired interpreter. Python, for example, might be in /usr/bin/python3, /usr/local/bin/python3, or even something like /home/username/bin/python3 if installed by an ordinary user.

A similar problem exists for the POSIX shell, since POSIX only required its name to be sh, but did not mandate a path. A common value is /bin/sh, but some systems such as Solaris have the POSIX-compatible shell at /usr/xpg4/bin/sh.[13] In many Linux systems, /bin/sh is a hard or symbolic link to /bin/bash, the Bourne Again shell (BASH). Using bash-specific syntax while maintaining a shebang pointing to sh is also not portable.[14]

As a result, the shebang line sometimes has to be edited after a script is copied from one computer to another, because the path coded into the script may not exist on the new machine, depending on how consistently the interpreter has been placed by convention. For this reason, and because POSIX does not standardize path names, POSIX does not standardize the feature.[15] The GNU Autoconf tool can test for system support with the macro AC_SYS_INTERPRETER.[16]

Often, the program /usr/bin/env can be used to circumvent this limitation by introducing a level of indirection. #! is followed by /usr/bin/env, followed by the desired command without full path, as in this example:

#!/usr/bin/env sh

This mostly works because the path /usr/bin/env is commonly used for the env utility, and it invokes the first sh found in the user's $PATH, typically /bin/sh.
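
The lookup that env performs can be approximated in Python with shutil.which, which also returns the first matching executable on a search path. This is a sketch of the idea only; resolve_like_env is a hypothetical helper name, and env itself is a separate program with its own implementation.

```python
import os
import shutil


def resolve_like_env(command, search_path=None):
    """Return the first executable called `command` on the search path.

    With search_path=None the current PATH is used, which mirrors what
    '#!/usr/bin/env sh' relies on. Returns None if nothing is found.
    """
    return shutil.which(command, path=search_path)


# The result depends on the machine; on many systems it is /bin/sh
# or /usr/bin/sh.
print(resolve_like_env("sh"))
```

Because the result depends on the caller's PATH, two users on the same machine may end up running different interpreters for the same script.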

This particular example (using sh) is of limited utility: neither /bin/sh nor /usr/bin/env is universal, and similar numbers of devices lack each. More broadly, using #!/usr/bin/env for any script still has portability issues on OpenServer 5.0.6 and Unicos 9.0.2, which have only /bin/env and no /usr/bin/env.

Using #!/usr/bin/env results in run-time indirection, which has the potential to degrade system security; for this reason some commentators recommend against its use[17] in packaged software, reserving it only for "educational examples".

Argument splitting

Command arguments are split in different ways across platforms. Some systems do not split up the arguments; for example, when running the script with the first line,

#!/usr/bin/env python3 -c

all text after the first space is treated as a single argument, that is, python3 -c will be passed as one argument to /usr/bin/env, rather than two arguments. Such systems include Linux[18][19] and Cygwin.

Another approach is the use of a wrapper. FreeBSD 6.0 (2005) introduced a -S option to its env as it changed the shebang-reading behavior to non-splitting. This option tells env to split the string itself.[20] The GNU env utility since coreutils 8.30 (2018) also includes this feature.[21] Although using this option mitigates the portability issue on the kernel end with splitting, it adds the requirement that env supports this particular extension.
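
The difference between the two loader behaviors, and the effect of env -S, can be sketched as follows. All three function names are hypothetical, and shlex.split is only a rough stand-in for the splitting that env -S performs; the real option has its own quoting and escape rules.

```python
import shlex


def argv_without_splitting(interpreter, rest, script):
    """Non-splitting loaders: the text after the interpreter is one argument."""
    return [interpreter, rest, script]


def argv_with_splitting(interpreter, rest, script):
    """Splitting loaders: each whitespace-separated word is its own argument."""
    return [interpreter, *rest.split(), script]


def split_like_env_s(single_argument):
    """Approximate how env handles one '-S ...' string by splitting it itself."""
    if not single_argument.startswith("-S"):
        raise ValueError("expected an argument beginning with -S")
    return shlex.split(single_argument[2:])


# '#!/usr/bin/env python3 -c' on a non-splitting system:
print(argv_without_splitting("/usr/bin/env", "python3 -c", "script.py"))
# The same line on a splitting system:
print(argv_with_splitting("/usr/bin/env", "python3 -c", "script.py"))
# '#!/usr/bin/env -S python3 -c': env receives one string and splits it.
print(split_like_env_s("-S python3 -c"))
```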

Character interpretation

Another problem is scripts containing a carriage return character immediately after the shebang line, perhaps as a result of being edited on a system that uses DOS line breaks, such as Microsoft Windows. Some systems interpret the carriage return character as part of the interpreter command, resulting in an error message.[22]
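
A script can be checked for this problem before it is deployed. The function below is a minimal sketch with a hypothetical name; it inspects only the first line, which is the part the program loader reads.

```python
def shebang_has_carriage_return(script_bytes: bytes) -> bool:
    """True if the shebang line ends with a carriage return (DOS line break)."""
    first_line = script_bytes.split(b"\n", 1)[0]
    return first_line.startswith(b"#!") and first_line.endswith(b"\r")


print(shebang_has_carriage_return(b"#!/bin/sh\r\necho hi\r\n"))  # True
print(shebang_has_carriage_return(b"#!/bin/sh\necho hi\n"))      # False
```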

Magic number

The shebang is actually a human-readable instance of a magic number in the executable file, the magic byte string being 0x23 0x21, the two-character encoding in ASCII of #!. This magic number is detected by the "exec" family of functions, which determine whether a file is a script or an executable binary. The presence of the shebang will result in the execution of the specified executable, usually an interpreter for the script's language. It has been claimed[23] that some old versions of Unix expect the normal shebang to be followed by a space and a slash (#! /), but this appears to be untrue;[11] rather, blanks after the shebang have traditionally been allowed, and sometimes documented with a space, as described in the 1980 historical email below.

The shebang characters are represented by the same two bytes in extended ASCII encodings, including UTF-8, which is commonly used for scripts and other text files on current Unix-like systems. However, UTF-8 files may begin with the optional byte order mark (BOM); if the "exec" function specifically detects the bytes 0x23 and 0x21, then the presence of the BOM (0xEF 0xBB 0xBF) before the shebang will prevent the script interpreter from being executed. Some authorities recommend against using the byte order mark in POSIX (Unix-like) scripts,[24] for this reason and for wider interoperability and philosophical concerns. Additionally, a byte order mark is not necessary in UTF-8, as that encoding does not have endianness issues; it serves only to identify the encoding as UTF-8.[24]
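
The byte-level check described above, and the way a BOM defeats it, can be illustrated with a short Python sketch. The function name and the returned labels are hypothetical; codecs.BOM_UTF8 is the standard library constant for the bytes 0xEF 0xBB 0xBF.

```python
import codecs


def classify_start(data: bytes) -> str:
    """Report whether a file begins with the shebang magic number."""
    if data.startswith(bytes([0x23, 0x21])):        # the ASCII bytes for '#!'
        return "shebang"
    if data.startswith(codecs.BOM_UTF8 + b"#!"):    # BOM pushes '#!' to offset 3
        return "shebang hidden by UTF-8 BOM"
    return "no shebang"


print(classify_start(b"#!/bin/sh\n"))                     # shebang
print(classify_start(codecs.BOM_UTF8 + b"#!/bin/sh\n"))   # shebang hidden by UTF-8 BOM
```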

Etymology

An executable file starting with an interpreter directive is simply called a script, often prefaced with the name or general classification of the intended interpreter. The name shebang for the distinctive two characters may have come from an inexact contraction of SHArp bang or haSH bang, referring to the two typical Unix names for them. Another theory on the sh in shebang is that it is from the default shell sh, usually invoked with shebang.[25] This usage was current by December 1989,[26] and probably earlier.

History

The shebang was introduced by Dennis Ritchie between Edition 7 and 8 at Bell Laboratories. It was also added to the BSD releases from Berkeley's Computer Systems Research Group (present in 2.8BSD[27] and enabled by default by 4.2BSD). As AT&T Bell Laboratories Edition 8 Unix, and later editions, were not released to the public, the first widely known appearance of this feature was on BSD.

The lack of an interpreter directive, but support for shell scripts, is apparent in the documentation from Version 7 Unix in 1979,[28] which describes instead a facility of the Bourne shell where files with execute permission would be handled specially by the shell, which would (sometimes depending on initial characters in the script, such as ":" or "#") spawn a subshell which would interpret and run the commands contained in the file. In this model, scripts would only behave as other commands if called from within a Bourne shell. An attempt to directly execute such a file via the operating system's own exec() system call would fail, preventing scripts from behaving uniformly as normal system commands.

Version 8 improved shell scripts

In later versions of Unix-like systems, this inconsistency was removed. Dennis Ritchie introduced kernel support for interpreter directives in January 1980, for Version 8 Unix, with the following description:[27]

From uucp Thu Jan 10 01:37:58 1980
>From dmr Thu Jan 10 04:25:49 1980 remote from research

The system has been changed so that if a file being executed
begins with the magic characters #! , the rest of the line is understood
to be the name of an interpreter for the executed file.
Previously (and in fact still) the shell did much of this job;
it automatically executed itself on a text file with executable mode
when the text file's name was typed as a command.
Putting the facility into the system gives the following
benefits.

1) It makes shell scripts more like real executable files,
because they can be the subject of 'exec.'

2) If you do a 'ps' while such a command is running, its real
name appears instead of 'sh'.
Likewise, accounting is done on the basis of the real name.

3) Shell scripts can be set-user-ID.[a]

4) It is simpler to have alternate shells available;
e.g. if you like the Berkeley csh there is no question about
which shell is to interpret a file.

5) It will allow other interpreters to fit in more smoothly.

To take advantage of this wonderful opportunity,
put

  #! /bin/sh
 
at the left margin of the first line of your shell scripts.
Blanks after ! are OK.  Use a complete pathname (no search is done).
At the moment the whole line is restricted to 16 characters but
this limit will be raised.

Unnamed shell script feature

The feature's creator didn't give it a name, however:[30]

From: "Ritchie, Dennis M (Dennis)** CTR **" <dmr@[redacted]>
To: <[redacted]@talisman.org>
Date: Thu, 19 Nov 2009 18:37:37 -0600
Subject: RE: What do -you- call your #!<something> line?

 I can't recall that we ever gave it a proper name.
It was pretty late that it went in--I think that I
got the idea from someone at one of the UCB conferences
on Berkeley Unix; I may have been one of the first to
actually install it, but it was an idea that I got
from elsewhere.

As for the name: probably something descriptive like
"hash-bang" though this has a specifically British flavor, but
in any event I don't recall particularly using a pet name
for the construction.

Kernel support for interpreter directives spread to other versions of Unix, and one modern implementation can be seen in the Linux kernel source in fs/binfmt_script.c.[31]

This mechanism allows scripts to be used in virtually any context normal compiled programs can be, including as full system programs, and even as interpreters of other scripts. As a caveat, though, some early versions of kernel support limited the length of the interpreter directive to roughly 32 characters (just 16 in its first implementation), would fail to split the interpreter name from any parameters in the directive, or had other quirks. Additionally, some modern systems allow the entire mechanism to be constrained or disabled for security purposes (for example, set-user-id support has been disabled for scripts on many systems).

Note that, even in systems with full kernel support for the #! magic number, some scripts lacking interpreter directives (although usually still requiring execute permission) are still runnable by virtue of the legacy script handling of the Bourne shell, still present in many of its modern descendants. Scripts are then interpreted by the user's default shell.

from Grokipedia
In Unix-like operating systems, a shebang (also known as a hashbang or sha-bang) is the two-character sequence consisting of a number sign (#) immediately followed by an exclamation mark (!), placed at the start of the first line of a script file to indicate the interpreter that should process the file's contents. This mechanism allows scripts written in various languages, such as shell, Perl, or Python, to be executed directly from the command line without explicitly invoking the interpreter, which improves usability on Linux, BSD, and other Unix derivatives.

The shebang line begins with #! followed by the absolute or relative path to the interpreter executable, optionally followed by a single argument passed to that interpreter, all on one line terminated by a newline character. For example, #!/bin/sh specifies the Bourne shell (or a compatible shell), while #!/usr/bin/env python3 uses the env utility to locate the Python 3 interpreter in the user's PATH for better portability across systems. When the operating system's execve system call (or equivalent) encounters a file starting with this sequence, it parses the line, invokes the specified interpreter, and passes the script's pathname and any command-line arguments to it. In Linux, the optional interpreter argument is treated as a single string that may include whitespace; some other systems split it on whitespace.

The mechanism originated at Bell Labs between the development of Unix Version 7 (1979) and the unreleased Version 8, with its first documented implementation appearing in Version 8 and early BSD releases around 1980, though it was not enabled by default in some variants such as SCO UNIX until later updates. It was introduced to streamline script execution in early Unix, replacing simpler methods in which shell scripts could only be run indirectly through the shell. It has since become standard practice on POSIX-influenced systems, although the POSIX specification itself leaves the behavior of files starting with #! "unspecified" to accommodate this extension. The term "shebang" derives from an informal contraction of "sharp bang" (# being known as "sharp" in some contexts) or "hash bang", possibly influenced by the American slang phrase "the whole shebang", meaning everything included.

Portability varies significantly across systems because implementation details differ, such as the maximum length of the shebang line (from 32 characters in early implementations to as many as 4096 on some later systems) and how arguments are parsed and passed to the interpreter (some systems treat everything after the interpreter as one argument, while others split on whitespace). These variations can lead to truncation, errors on long paths, or incorrect argument handling, so it is advisable to keep shebang lines short and to test scripts on each target environment. In addition, many systems restrict shebangs in setuid executables for security reasons. Despite these quirks, the shebang remains a fundamental feature of scripting on Unix systems. Linux kernels since version 2.6.28 support recursive interpretation up to four levels deep, though this is not portable to other systems.

Syntax and Usage

Syntax

The shebang line in Unix consists of the two-character sequence "#!" at the very start of a file, functioning as a magic number that the kernel's program loader recognizes in order to run the script through a specified interpreter. The general syntax is "#!", optionally followed by a space, then the path to the interpreter executable, which may be followed by a single optional argument passed to that interpreter; the kernel parses this structure during program loading. For instance:

#! /bin/sh

or

#! /usr/bin/env python3

The shebang must appear on the very first line of the file, with no whitespace such as spaces or tabs before the "#!", and the interpreter is normally given as an absolute pathname; some systems additionally require it to be a native binary rather than another script. In the Linux kernel, the length of the shebang line is bounded by the buffer the kernel reads from the start of the file (BINPRM_BUF_SIZE): 127 characters before kernel 5.1 and 255 characters from 5.1 onward, not counting the newline. Limits vary by system. The space immediately after "#!" is optional, because the kernel skips any spaces or tabs following the sequence before reading the interpreter path. The optional argument, if present, is separated from the path by a space or tab and is treated as a single unit by most kernels, which keep any embedded spaces as part of that one argument rather than splitting it further.

Examples

A common basic example of a shebang line specifies the Bourne shell as the interpreter for a shell script, allowing it to be executed directly without invoking the shell explicitly.

#!/bin/sh
echo "Hello, World!"

This script, when made executable (e.g., via chmod +x script.sh), runs its commands using /bin/sh. To improve portability across systems where interpreter paths vary, the shebang can invoke env to locate the interpreter dynamically through the PATH environment variable. For instance, the following Python script uses env to find Python 3:

#!/usr/bin/env python3
print("Hello, World!")

This approach avoids hardcoding paths like /usr/bin/python3, so the script works in diverse environments such as Linux, BSD, and macOS. Another idiom uses #!/bin/false to create a file that looks like a script but exits immediately with a non-zero status when run directly; it is often used for configuration files that are meant to be sourced by other programs.

#!/bin/false
# Configuration variables
CONFIG_VAR="value"

When executed, it produces no output and returns a non-zero exit status, ensuring the content is not run as a standalone script. In an edge case, shebangs can include flags for interpreters such as Node.js, but limitations apply: whatever follows the interpreter path is typically passed as a single string rather than as separate parameters. For example:

#!/usr/bin/env node --no-warnings
console.log("Hello from Node.js!");

Here, on systems that do not split shebang arguments, the text node --no-warnings reaches env as one argument, so env searches for a program with that literal name instead of running node with the flag; a workaround such as a wrapper script is needed on those systems. The shebang is confined to the first line of the file and ends at the newline character; multi-line constructs are not parsed, and the kernel interprets only that initial line.

Purpose and Benefits

Core Purpose

The shebang, the initial line beginning with the characters "#!", serves as a directive in Unix-like systems that enables script files to be executed directly as standalone programs. When a script with execute permission is invoked, for example as ./script.sh, the kernel handles the request through the execve system call, which loads and executes the file. While loading, execve inspects the first two bytes of the file for the magic number "#!" (bytes 0x23 0x21). If they are present, the kernel parses the remainder of that line to find the pathname of the interpreter, such as /bin/sh or /usr/bin/python3. The kernel then runs this interpreter with argv[0] set to the interpreter's pathname, argv[1] to the optional argument from the shebang line if there is one, and the next element to the script's pathname, followed by any arguments supplied by the user. The interpreter opens and reads the script file through this pathname and executes its contents. Scripts can therefore be run without the user prefixing the command with the interpreter, in contrast to manual invocations like sh script.sh that require knowledge of the runtime environment. By embedding the interpreter specification in the script itself, the shebang hides these implementation details and simplifies the distribution of self-contained script files across diverse Unix environments.

Strengths

The shebang mechanism simplifies invocation by allowing scripts to be executed directly by name, without requiring users to specify or even know the underlying interpreter, as they would when typing python script.py or perl script.pl. The script can be treated like a standalone executable, which streamlines its use from command lines or file managers.

Because the choice of interpreter is recorded in the shebang line, scripts are easier to move between systems during development and deployment; for instance, #!/usr/bin/env bash avoids hardcoding an absolute path to the interpreter, so the script adapts to varying system configurations without modification. This is particularly valuable in collaborative environments or when distributing scripts, as it reduces dependence on specific installation paths.

The shebang supports a diverse array of scripting languages in the Unix ecosystem, so interpreters such as Perl (#!/usr/bin/env perl), Python (#!/usr/bin/env python), or AWK (#!/usr/bin/awk -f) can be used within the same project or workflow without altering the execution model. Different components can rely on specialized tools while being invoked in a uniform way. The mechanism also helps automation by enabling scripts to run independently in contexts such as scheduled jobs, other scripts, and system tools, where direct executability matters for reliability and ease of scheduling.

Portability and Limitations

Interpreter Location

In Unix-like systems, the interpreter in a shebang is normally given as an absolute pathname so that the kernel can reliably locate and execute it when the script is invoked through the execve system call. For instance, #!/bin/sh succeeds because /bin/sh is a full path from the root directory, whereas a relative path like #!sh usually fails: the kernel does not search the PATH environment variable and instead resolves the name relative to the current working directory, which may not contain the interpreter.

This creates portability problems when scripts are distributed across diverse Unix-like systems, where interpreter locations differ because of variations in package management and installation practices. A common example is Python, which may reside at /usr/bin/python on some Linux distributions but at /usr/local/bin/python or even /opt/homebrew/bin/python on others, such as macOS with Homebrew, so a hardcoded path can cause execution to fail.

A widely adopted mitigation is to write #!/usr/bin/env interpreter in the shebang. The env utility is specified by POSIX, although its location is not, and it finds the interpreter by searching the PATH environment variable before invoking it. This lets scripts use whichever interpreter the environment provides without embedding system-specific paths, which eases deployment across Linux distributions, BSD variants, and macOS.

In containerized environments such as Docker, interpreter paths can vary further between base images; for example, a lightweight image may provide the shell at /bin/sh (implemented as BusyBox ash) and Python at /usr/bin/python3. Scripts then need either image-specific shebangs or /usr/bin/env, and in the latter case it is worth verifying that env is present in minimal images. Outside POSIX systems, shebangs are also honored on Windows inside environments such as Cygwin and the Windows Subsystem for Linux (WSL), where Unix-style absolute paths are resolved within the emulated file system, so scripts with standard shebangs such as #!/bin/bash run without Windows-specific changes.

Argument Processing

In most modern Unix-like systems, such as Linux and BSD variants, the kernel processes the shebang line by identifying the interpreter path and then treating everything after the first run of blanks, up to the newline, as a single argument passed to the interpreter through the execve system call, with the script's filename appended as the next argument. The interpreter therefore receives the script path correctly and one optional parameter, but any additional words are kept inside that single string without further splitting.

The consequence is that only one optional argument can be specified reliably. Multiple words (for example, interpreter flags like -e or -u) are not split, so the interpreter may receive an unexpected argument, leading to execution failures or incorrect behavior. For instance, the shebang #!/usr/bin/env python3 -u passes "python3 -u" as a single argument to /usr/bin/env on Linux, causing env to search for a nonexistent program named python3 -u rather than invoking python3 with the -u flag.

Implementations vary across Unix systems, particularly older ones. Some split the text after the interpreter on whitespace into separate arguments, which can cause errors if the interpreter does not expect them or if the line exceeds an implementation-specific limit. Many current systems use the single-argument treatment, but the mechanism is implementation-defined rather than standardized, so uniform behavior cannot be assumed.

When several interpreter arguments are needed, a script can supply them itself after the shebang line, for example with a self-re-executing second line such as exec python3 -u "$0" "$@", which restarts the interpreter with the desired options while passing along the original arguments. Alternatively, on systems with GNU coreutils 8.30 or later (released in 2018), the shebang #!/usr/bin/env -S interpreter arg1 arg2 asks env to split the string itself and pass multiple arguments to the interpreter.

Character Interpretation

In Unix-like systems, the shebang line is particularly sensitive to line ending characters, as the kernel parses it byte-by-byte to identify the interpreter path. When a script uses DOS or Windows-style line endings (CRLF, where CR is ASCII 13 or \r), the carriage return immediately following the shebang can be appended to the interpreter path, rendering it invalid. For instance, a shebang like #!/bin/bash\r\n is interpreted by the kernel as seeking /bin/bash\r, which does not exist, resulting in a "bad interpreter: No such file or directory" upon execution. This issue arises because the kernel expects Unix-style LF (line feed, ASCII 10 or \n) endings exclusively for proper parsing of the first line. Encoding artifacts further complicate shebang interpretation, requiring scripts to adhere strictly to Unix LF endings and avoid extraneous bytes at the file's start. A UTF-8 (BOM, the sequence EF BB BF) prefixed before the shebang disrupts recognition, as the kernel checks for the exact bytes 0x23 0x21 (#!) at the file's beginning; the BOM shifts this sequence, causing the script to be treated as a binary or fail execution entirely. Shebangs must thus use plain without BOM to ensure compatibility, as any leading non-shebang bytes invalidate the magic number detection. For setuid scripts, character interpretation in shebangs introduces significant risks, prompting kernels to enforce strict validation of interpreter paths to prevent . In setuid mode, where a script runs with elevated privileges, an attacker could exploit parsing ambiguities—such as manipulated line endings or encodings—to redirect execution to a malicious interpreter, potentially gaining unauthorized access. 
To mitigate this, most Unix kernels (including Linux) ignore the setuid bit on scripts interpreted via a shebang. The kernel closes the script after reading the directive and executes the named interpreter without the elevated privileges, which avoids race conditions in which the script could be swapped mid-execution. Some systems offer optional secure handling via /dev/fd/N references, but standard kernels prioritize safety by disallowing setuid shebang scripts altogether.

Common mitigations involve preprocessing scripts to enforce clean encodings and line endings. Tools like dos2unix convert CRLF to LF by default and can remove BOMs with the -r option, ensuring that the shebang starts precisely at byte offset 0; for example, dos2unix script.sh resolves CR artifacts while preserving script content. Editors should save files in Unix format without a BOM, and utilities such as tail can strip a leading BOM (e.g., tail -c +4 script.sh > clean.sh skips the first three bytes). Cleaning text files in this way prevents parsing errors across environments.

Support for non-ASCII characters in shebang interpreter paths remains limited and is rarely discussed. Modern kernels (post-2.6) support UTF-8 filenames broadly, but interpreter paths containing non-ASCII characters may trigger encoding mismatches or ENOEXEC errors on legacy systems that assume ASCII in path parsing, which leaves a portability gap across diverse deployments.
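
The CRLF and BOM checks described in this section can be sketched in Python as follows. The helper names shebang_problems and clean_shebang and the wording of the messages are hypothetical; the code is meant only to illustrate the byte-level conditions, not to replace a tool such as dos2unix.

```python
UTF8_BOM = b"\xef\xbb\xbf"


def shebang_problems(data):
    """Return a list of problems found at the start of a script's bytes."""
    problems = []
    if data.startswith(UTF8_BOM):
        # The kernel expects 0x23 0x21 at offset 0; a BOM pushes it to offset 3.
        problems.append("UTF-8 BOM before #!")
        data = data[len(UTF8_BOM):]
    first_line = data.split(b"\n", 1)[0]
    if not first_line.startswith(b"#!"):
        problems.append("first line does not start with #!")
    elif first_line.endswith(b"\r"):
        # The CR would become part of the interpreter path or its argument.
        problems.append("carriage return at end of shebang line")
    return problems


def clean_shebang(data):
    """Strip a leading UTF-8 BOM and convert CRLF line endings to LF."""
    if data.startswith(UTF8_BOM):
        data = data[len(UTF8_BOM):]
    return data.replace(b"\r\n", b"\n")


if __name__ == "__main__":
    sample = UTF8_BOM + b"#!/bin/bash\r\necho hello\r\n"
    print(shebang_problems(sample))
    print(shebang_problems(clean_shebang(sample)))
```

Operating on bytes rather than decoded text is a deliberate choice here, since the kernel also inspects raw bytes and a text-mode read could silently hide the BOM or translate line endings.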

Magic Number Recognition

The shebang mechanism relies on a specific magic number to identify scripts at the kernel level. This magic number consists of the two ASCII bytes 0x23 and 0x21, corresponding to the characters '#' and '!', in the first two positions of the file. When the execve(2) system call attempts to load a file, the kernel checks these initial bytes; if they match the magic number rather than a binary signature (such as the ELF header 0x7F 'E' 'L' 'F'), the kernel treats the file as a potential script.

Upon detecting the magic number, the kernel reads the remainder of the first line (up to the newline character) to parse the interpreter path and any optional single argument. It then constructs a new argument vector in which the interpreter is the program to execute, the optional argument (if present) follows as the first argument, the path of the original script follows as the next argument, and the caller's original arguments are appended after that. The kernel then restarts the exec procedure on the interpreter with this modified argument vector, effectively delegating execution while passing the script as input to the interpreter. Because the interpreter may itself be a script, this process can repeat; it supports up to four levels of nested script interpretation to prevent excessive nesting.

The POSIX standard acknowledges the shebang but leaves its effects unspecified, so support is implementation-defined, and Unix-like systems such as Linux implement it with specific constraints. In Linux, the shebang line is limited to 127 characters prior to kernel version 5.1 and 255 characters thereafter, excluding the newline; longer lines are truncated, potentially leading to invalid interpreter paths. The handling occurs in the kernel's binary format loader, specifically in the script binary format registered in fs/binfmt_script.c, which is invoked from the broader execution logic in fs/exec.c.
If the magic number is absent, the line is malformed, or the interpreter cannot be located or executed, the kernel returns an error such as ENOEXEC (exec format error) or ENOENT (no such file or directory), allowing the caller to handle the failure. When a file is not executed directly (for example, because it lacks execute permission and is passed to an interpreter by name), the kernel's magic number check does not apply and the interpreter generally ignores the shebang line; such a line may still serve as documentation of the intended interpreter.
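
The recognition and argument-vector steps described in this section can be modelled in a few lines of Python. This is an illustrative sketch only: the names classify and rewritten_argv are hypothetical, the max_line parameter is a simplified stand-in for the kernel's buffer limit, and error handling is reduced to raising ENOEXEC.

```python
import errno
import os

SHEBANG_MAGIC = b"\x23\x21"  # the bytes for '#' and '!'
ELF_MAGIC = b"\x7fELF"


def classify(header):
    """Classify a file by its leading bytes, as a loader might."""
    if header.startswith(ELF_MAGIC):
        return "elf"
    if header.startswith(SHEBANG_MAGIC):
        return "script"
    return "unknown"


def rewritten_argv(header, script_path, caller_args, max_line=255):
    """Build the argv handed to the interpreter for a shebang script.

    Only the first max_line bytes are considered, so an overlong line is
    truncated (a simplification of the limit described in the text).
    """
    if classify(header) != "script":
        raise OSError(errno.ENOEXEC, "Exec format error")
    line = header[:max_line].split(b"\n", 1)[0][2:]
    fields = line.strip(b" \t").split(None, 1)
    if not fields:
        raise OSError(errno.ENOEXEC, "Exec format error")
    interpreter = os.fsdecode(fields[0])
    # Everything after the interpreter is kept as one optional argument.
    optional = [os.fsdecode(fields[1])] if len(fields) > 1 else []
    return [interpreter, *optional, script_path, *caller_args]


if __name__ == "__main__":
    header = b"#! /usr/bin/awk -f\nBEGIN { print 1 }\n"
    print(classify(header))
    print(rewritten_argv(header, "./report", ["a", "b"]))
```

Running a script ./report that begins with #! /usr/bin/awk -f as ./report a b therefore corresponds, in this model, to invoking /usr/bin/awk with the arguments -f, ./report, a, and b.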

Etymology and History

Etymology

The term "shebang" for the "#!" directive in Unix scripts arose within the Unix community, where it emerged as informal jargon in the 1980s among programmers focused on shell scripting. Its etymology is uncertain, but it is commonly described as a contraction of "sharp bang," combining "sharp" for the "#" symbol with "bang," slang for the "!" character, or alternatively of "hash bang," using the computing name "hash" for "#". Another proposed origin is "shell bang," referencing the shell ("sh") and the directive's role in script execution, possibly influenced by the American slang phrase "the whole shebang," meaning "the entire thing." No definitive inventor is known for the term; it arose through organic adoption in informal Unix development circles rather than in formal documentation.

The word gained traction by the late 1980s, with early printed references appearing around 1989 in technical discussions. Alternative names include "hashbang," "bang line," "pound bang," and "sha-bang," reflecting variations in how developers referred to the two-character sequence in different contexts. These synonyms reflect the term's roots in spoken and written shell scripting practice within early Unix environments.

Early Development

The shebang mechanism was implemented by Dennis Ritchie at Bell Laboratories in January 1980, during the development period between the 7th Edition of Unix (released in 1979) and the 8th Edition, drawing on an idea discussed at a Unix conference. In a message dated January 10, 1980, Ritchie announced the addition of kernel support for interpreter directives, stating: "The system has been changed so that if a file being executed begins with the magic characters #!, the rest of the line is understood to be the name of an interpreter for the executed file."

The initial purpose was to let shell scripts and similar files be executed directly, like binaries, without requiring users to invoke the appropriate interpreter manually, for example by prefixing the command with /bin/sh. This allowed scripts to integrate seamlessly into the Unix execution model, showing the actual interpreter command in process listings such as those of ps and supporting features such as set-user-ID execution for privileged scripts.

The feature first appeared in internal research versions of Unix at Bell Labs, where it was implemented in the kernel's exec system call to parse the directive at the start of executable files. Initially, the interpreter path following the #! was limited to 16 characters and had to be a complete absolute pathname, since no path search was performed; whitespace after the exclamation mark was permitted but not required. Ritchie noted in his announcement that this limit would soon be expanded, reflecting the experimental nature of the early implementation, which aimed to make scripts usable with different interpreters, including the Berkeley C shell.

Adoption extended to Berkeley Software Distribution (BSD) variants, where the shebang was available starting with 4.0BSD in 1980, though it gained widespread use and default activation only with the release of 4.2BSD in 1983, subsequently spreading to other Unix derivatives. The magic number #! was selected as the identifier because it is a distinctive prefix unlikely to occur naturally at the beginning of typical text files or scripts, ensuring reliable recognition by the kernel while remaining visible in the file contents.

Version 8 Improvements

In Unix Version 8 (released in 1985), the shebang mechanism, which had been introduced during its development in 1980, provided significant enhancements to the execution of shell scripts, making it more robust and suitable for general use across different interpreters. Dennis Ritchie introduced kernel-level support for the #! directive, allowing the operating system to identify and invoke the specified interpreter for a script file automatically, thereby treating it as a directly executable program rather than requiring manual invocation. This addressed prior limitations in script handling by recognizing the shebang as a "magic number" at the beginning of the file, with the rest of the first line specifying the interpreter path.

Key changes included support for an interpreter argument following the path, enabling scripts to pass options directly to the interpreter (though in some implementations the rest of the line was processed as a single string, with a maximum shebang length of 32 bytes). Integration with shells such as sh (the Bourne shell) and csh (the C shell) improved, because a script could name either /bin/sh or /bin/csh explicitly, which made scripts portable between shell environments without changing how they were invoked. The feature was first publicly documented in the Version 8 manuals and in Ritchie's message of January 10, 1980, which announced the change.

These improvements had a lasting impact by establishing directly executable shell scripts as a standard feature, reducing the need for users to prepend interpreter commands manually (e.g., sh script.sh) and enabling execution in the same manner as compiled binaries.
As part of broader shell enhancements of the period, including refinements to process control and I/O redirection, the shebang facilitated more modular and user-friendly scripting. The legacy of Version 8's shebang support set a precedent for later systems and standardization efforts, including POSIX, and encouraged consistent interpreter handling across diverse systems.

Preceding Features

In Unix Version 7, released in 1979, shell scripts were executed by explicitly invoking the Bourne shell with the script file as an argument, as in sh scriptfile, where the shell served as the implicit interpreter without any special directive like #!. This approach relied on the Bourne shell's ability to read and process the text file directly. Even after the user set execute permission with chmod +x scriptfile, direct execution through the kernel's exec system call was not possible, because exec supported only a.out binary executables and returned an ENOEXEC error for anything else. To handle such cases, the Bourne shell itself would intervene upon receiving ENOEXEC during command execution, forking a new shell instance to interpret the text file as a script.

Prior to formal shebang support, early practice involved placing a manual comment on the first line, such as # /bin/sh or # !/bin/sh, to document the intended interpreter for human readers; the shell treated these as ordinary comments (ignored from # to the end of the line) rather than as directives, and the kernel performed no special parsing. Scripts still required explicit prefixing with sh, or the dot command (. scriptfile) for invocation in the current environment, so users could not run them directly like binaries without shell intervention.

In Versions 6 and 7, shell scripts typically assumed the standard /bin/sh interpreter, with reliance on environment variables such as PATH to locate the shell executable, and no variable specified an alternative interpreter for shell scripts themselves; other tools used explicit invocation for non-shell languages, as in awk -f script. These ad hoc methods and limitations, in which the shell acted as a workaround for the lack of kernel-level execution, provided the baseline that influenced Dennis Ritchie's design of kernel-supported interpreter directives in the subsequent version.
