Filename extension
A filename extension, file name extension or file extension is a suffix to the name of a computer file (for example, .txt, .mp3, .exe) that indicates a characteristic of the file contents or its intended use. A filename extension is typically delimited from the rest of the filename with a full stop (period), but in some systems[1] it is separated with spaces.
Some file systems, such as the FAT file system used in DOS, implement filename extensions as a feature of the file system itself and may limit the length and format of the extension, while others, such as Unix file systems, the VFAT file system, and NTFS, treat filename extensions as part of the filename without special distinction.
Operating system and file system support
The Multics file system stores the file name as a single string, not split into base name and extension components, allowing the "." to be just another character allowed in file names. It allows for variable-length filenames, permitting more than one dot, and hence multiple suffixes, as well as no dot, and hence no suffix. Some components of Multics, and applications running on it, use suffixes to indicate file types, but not all files are required to have a suffix; for example, executables and ordinary text files usually have no suffixes in their names.
File systems for UNIX-like operating systems also store the file name as a single string, with "." as just another character in the file name. A file with more than one suffix is sometimes said to have more than one extension, although terminology varies in this regard, and most authors define extension in a way that does not allow more than one in the same file name.[citation needed] More than one extension usually represents nested transformations, such as files.tar.gz (the .tar indicates that the file is a tar archive of one or more files, and the .gz indicates that the tar archive file is compressed with gzip). Programs transforming or creating files may add the appropriate extension to names inferred from input file names (unless explicitly given an output file name), but programs reading files usually ignore the information; it is mostly intended for the human user.
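The nested-suffix case can be illustrated with a short Python sketch using the standard `pathlib` module. The helper name `describe_suffixes` is hypothetical, and the split shown is `pathlib`'s own convention, not a rule imposed by any file system.

```python
from pathlib import PurePosixPath


def describe_suffixes(name: str) -> dict:
    """Split a file name the way pathlib does (a convention, not a file system rule)."""
    path = PurePosixPath(name)
    return {
        "last": path.suffix,    # text from the final dot onward
        "all": path.suffixes,   # every dot-delimited suffix, in order
        "stem": path.stem,      # the name with only the last suffix removed
    }


if __name__ == "__main__":
    for example in ("files.tar.gz", "README", "notes.txt"):
        print(example, describe_suffixes(example))
```

For `files.tar.gz`, the last suffix is `.gz` and the full list is `.tar` followed by `.gz`, mirroring the order in which the transformations were applied.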
It is more common, especially in binary files, for the file to contain internal or external metadata describing its contents.
This model generally requires the full filename to be provided in commands, whereas the metadata approach often allows the extension to be omitted.
CTSS was an early operating system in which the filename and file type were separately stored. Continuing this practice, and also using a dot as a separator for display and input purposes (while not storing the dot), were various DEC operating systems (such as RT-11), followed by CP/M and subsequently DOS.
In DOS and 16-bit Windows, file names have a maximum of 8 characters, a period, and an extension of up to three letters. The FAT file system for DOS and Windows stores file names as an 8-character name and a three-character extension. The period character is not stored.
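As a rough illustration of the 8.3 layout described here, the following Python sketch packs a name into an 11-byte field with no stored period. The function names `pack_83` and `unpack_83` are hypothetical, and the sketch deliberately ignores details of real FAT volumes such as the restricted character set, code pages, and the remaining directory-entry fields.

```python
def pack_83(filename: str) -> bytes:
    """Pack 'NAME.EXT' into an 11-byte, space-padded field; the dot is not stored."""
    base, dot, ext = filename.upper().rpartition(".")
    if not dot:
        # No period at all: the whole string is the base name.
        base, ext = ext, ""
    if "." in base or not 1 <= len(base) <= 8 or len(ext) > 3:
        raise ValueError(f"{filename!r} does not fit the 8.3 format")
    return (base.ljust(8) + ext.ljust(3)).encode("ascii")


def unpack_83(field: bytes) -> str:
    """Rebuild the displayed name, re-inserting the dot only for display."""
    base = field[:8].decode("ascii").rstrip()
    ext = field[8:11].decode("ascii").rstrip()
    return f"{base}.{ext}" if ext else base


if __name__ == "__main__":
    packed = pack_83("readme.txt")
    print(packed, unpack_83(packed))
```

A name such as `index.html` is rejected by this sketch because its extension has four characters, which matches the limitation the text describes.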
The High Performance File System (HPFS), used in Microsoft and IBM's OS/2, stores the file name as a single string, with the "." character as just another character in the file name. The convention of using suffixes continued, even though HPFS supports extended attributes for files, allowing a file's type to be stored in the file as an extended attribute.
Microsoft's Windows NT's native file system, NTFS, and the later ReFS, also store the file name as a single string; again, the convention of using suffixes to simulate extensions continued, for compatibility with existing versions of Windows. In Windows NT 3.5, a variant of the FAT file system called VFAT appeared; it supports longer file names, with the file name being treated as a single string.
Windows 95, with VFAT, introduced support for long file names, and removed the 8.3 name/extension split in file names from non-NT Windows.
The classic Mac OS disposed of filename-based extension metadata entirely; it used, instead, a distinct file type code to identify the file format. Additionally, a creator code was specified to determine which application would be launched when the file's icon was double-clicked.[2] macOS, however, uses filename suffixes as a consequence of being derived from the UNIX-like NeXTSTEP operating system, in addition to using type and creator codes.
In Commodore systems, files can only have four extensions: PRG, SEQ, USR, REL. However, these are used to separate data types used by a program and are irrelevant for identifying their contents.
With the advent of graphical user interfaces, the issue of file management and interface behavior arose. Microsoft Windows allowed multiple applications to be associated with a given extension, and different actions were available for selecting the required application, such as a context menu offering a choice between viewing, editing or printing the file. The assumption was still that any extension represented a single file type; there was an unambiguous mapping between extension and icon.
When the Internet age first arrived, those using Windows systems that were still restricted to 8.3 filename formats had to create web pages with names ending in .HTM, while those using Macintosh or UNIX computers could use the recommended .html filename extension. This also became a problem for programmers experimenting with the Java programming language, since it requires the four-letter suffix .java for source code files and the five-letter suffix .class for Java compiler object code output files.[3]
Content type
Filename extensions may be considered a type of metadata.[4] They are commonly used to imply information about the way data might be stored in the file. The exact definition, giving the criteria for deciding what part of the file name is its extension, belongs to the rules of the specific file system used; usually the extension is the substring which follows the last occurrence, if any, of the dot character (example: txt is the extension of the filename readme.txt, and html the extension of index.html).
On file systems of some mainframe systems such as CMS in VM, VMS, and of PC systems such as CP/M and derivative systems such as MS-DOS, the extension is a separate namespace from the filename. Under Microsoft's DOS and Windows, extensions such as EXE, COM or BAT indicate that a file is a program executable. In OS/360 and successors, the part of the dataset name following the last period, called the low level qualifier, is treated as an extension by some software, e.g., TSO EDIT, but it has no special significance to the operating system itself; the same applies to Unix files in MVS.
The filename extension was originally used to determine the file's generic type.[citation needed] The need to condense a file's type into three characters frequently led to abbreviated extensions. Examples include using .GFX for graphics files, .TXT for plain text, and .MUS for music. However, because many different software programs have been made that all handle these data types (and others) in a variety of ways, filename extensions started to become closely associated with certain products, and even with specific product versions. For example, early WordStar files used .WS or .WSn, where n was the program's version number. Also, conflicting uses of some filename extensions developed. One example is .rpm, used for both RPM Package Manager packages and RealPlayer Media files.[5] Others are .qif, shared by DESQview fonts, Quicken financial ledgers, and QuickTime pictures;[6] .gba, shared by GrabIt scripts and Game Boy Advance ROM images;[7] .sb, used for SmallBasic and Scratch; and .dts, used for Dynamix Three Space and DTS.
Compared to MIME type
In many Internet protocols, such as HTTP and MIME email, the type of a bitstream is stated as the media type, or MIME type, of the stream, rather than a filename extension. This is given in a line of text preceding the stream, such as Content-type: text/plain.
There is no standard mapping between filename extensions and media types, resulting in possible mismatches in interpretation between authors, web servers, and client software when transferring files over the Internet. For instance, a content author may specify the extension svgz for a compressed Scalable Vector Graphics file, but a web server that does not recognize this extension may not send the proper content type application/svg+xml and its required compression header, leaving web browsers unable to correctly interpret and display the image.
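How a server or client might derive a media type from an extension can be sketched with Python's standard `mimetypes` module. The wrapper name `media_type_for` and the fallback to `application/octet-stream` are illustrative choices, and the result for a less common suffix such as `.svgz` depends on the mapping tables available on the machine, which is exactly the kind of variation the paragraph above describes.

```python
import mimetypes


def media_type_for(filename: str) -> tuple:
    """Guess (media type, content encoding) from the file name alone."""
    media_type, encoding = mimetypes.guess_type(filename)
    # Fall back to a generic binary type when the suffix is not recognized
    # (the fallback value is an assumption of this sketch).
    return media_type or "application/octet-stream", encoding


if __name__ == "__main__":
    for name in ("notes.txt", "backup.tar.gz", "drawing.svgz"):
        print(name, media_type_for(name))
```

The second element of the result corresponds to a compression layer such as gzip, which a web server would normally report in a separate header from the content type.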
BeOS, whose BFS file system supports extended attributes, would tag a file with its media type as an extended attribute. Some desktop environments, such as KDE Plasma and GNOME, associate a media type with a file by examining both the filename suffix and the contents of the file, in the fashion of the file command, as a heuristic. They choose the application to launch when a file is opened based on that media type, reducing the dependency on filename extensions. macOS uses both filename extensions and media types, as well as file type codes, to select a Uniform Type Identifier by which to identify the file type internally.
Executable programs
A filename extension occasionally appears in a command name, usually as a side effect of the command having been implemented as a script (for example, for the Bourne shell or for Python) with the interpreter name suffixed to the command name. This practice is common on systems that rely on associations between filename extension and interpreter, but it is sharply deprecated[8] in Unix-like systems, such as Linux, Oracle Solaris, BSD-based systems, and Apple's macOS, where the interpreter is normally specified as a header in the script (a "shebang").
On association-based systems, the filename extension is generally mapped to a single, system-wide selection of interpreter for that extension (such as ".py" meaning to use Python), and the command itself is runnable from the command line even if the extension is omitted (assuming appropriate setup is done). If the implementation language is changed, the command name extension is changed as well, and the OS provides a consistent API by allowing the same extensionless version of the command to be used in both cases. This method suffers somewhat from the essentially global nature of the association mapping, as well as from developers' incomplete avoidance of extensions when calling programs, and that developers can not force that avoidance. Windows is the only remaining widespread employer of this mechanism.
On systems with interpreter directives, including virtually all versions of Unix, command name extensions have no special significance, and are by standard practice not used, since the primary method to set interpreters for scripts is to start them with a single line specifying the interpreter to use. In these environments, including the extension in a command name unnecessarily exposes an implementation detail which puts all references to the commands from other programs at future risk if the implementation changes. For example, it would be perfectly normal for a shell script to be reimplemented in Python or Ruby, and later in C or C++, all of which would change the name of the command were extensions used. Without extensions, a program always has the same extension-less name, with only the interpreter directive or magic number changing, and references to the program from other programs remain valid.
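The interpreter directive can be illustrated by a small Python sketch that reads the first line of a script the way a reader of the file might. The function name `parse_shebang` is hypothetical, and actual kernels differ in details such as maximum line length and how the optional argument is split, so this is only an approximation.

```python
def parse_shebang(first_line: str):
    """Return (interpreter, optional argument) for a '#!' line, else None."""
    if not first_line.startswith("#!"):
        return None
    # Split once: everything after the interpreter path is treated here
    # as a single argument, which is a simplifying assumption.
    parts = first_line[2:].strip().split(None, 1)
    if not parts:
        return None
    interpreter = parts[0]
    argument = parts[1] if len(parts) > 1 else None
    return interpreter, argument


if __name__ == "__main__":
    print(parse_shebang("#!/bin/sh\n"))
    print(parse_shebang("#!/usr/bin/env python3\n"))
    print(parse_shebang("echo no directive here\n"))
```

Because the interpreter is named inside the file, a command called `report` can move from a shell script to a Python script without its name changing.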
Security issues
File extensions alone are not a reliable indicator of a file's type, as the extension can be modified without changing the file's contents, such as to disguise malicious content. Therefore, especially in the context of cybersecurity, a file's true nature should be examined for its signature, which is a distinctive sequence of bytes affixed to a file's header. This is accomplished using file identification software or a hex editor, which provides a hex dump of a file's contents.[9] For example, on UNIX-like systems, it is not uncommon to find files with no extensions at all,[10] as commands such as file are meant to be used instead, and will read the file's header to determine its content.[citation needed]
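A minimal sketch of signature-based identification follows, in Python. The table covers only a handful of well-known signatures and the names `SIGNATURES` and `sniff` are hypothetical; real tools such as the file command consult a far larger database and apply more elaborate tests.

```python
# A few well-known leading byte sequences; this list is illustrative only.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"%PDF-", "PDF document"),
    (b"GIF87a", "GIF image"),
    (b"GIF89a", "GIF image"),
    (b"PK\x03\x04", "ZIP archive"),
    (b"MZ", "DOS/Windows executable"),
]


def sniff(header: bytes):
    """Return a description based on the leading bytes, ignoring the file name."""
    for magic, description in SIGNATURES:
        if header.startswith(magic):
            return description
    return None


def sniff_file(path: str):
    """Read only the first few bytes of a file and classify them."""
    with open(path, "rb") as handle:
        return sniff(handle.read(16))


if __name__ == "__main__":
    print(sniff(b"%PDF-1.7\n"))
    print(sniff(b"MZ\x90\x00"))
    print(sniff(b"plain text"))
```

A file named `invoice.txt` whose first two bytes are `MZ` would be reported as an executable by this check, regardless of its extension.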
Malware such as Trojan horses typically takes the form of an executable, but any file type that performs input/output operations may contain malicious code. A few data file types such as PDFs have been found to be vulnerable to exploits that cause buffer overflows.[11] There have been instances of malware crafted to exploit such vulnerabilities in some Windows applications when opening a file with an overly long, unhandled filename extension.
File managers may have an option to hide filename extensions. This is the case for File Explorer, the file browser provided with Microsoft Windows, which by default does not display extensions. Malicious users have tried to spread computer viruses and computer worms by using file names formed like LOVE-LETTER-FOR-YOU.TXT.vbs. The idea is that this will appear as LOVE-LETTER-FOR-YOU.TXT, a harmless text file, without alerting the user to the fact that it is a harmful computer program, in this case, written in VBScript.[11] The default behavior for ReactOS is to display filename extensions in ReactOS Explorer. Later Windows versions (starting with Windows XP Service Pack 2 and Windows Server 2003) included customizable lists of filename extensions that should be considered "dangerous" in certain "zones" of operation, such as when downloaded from the web or received as an e-mail attachment. Modern antivirus software systems also help to defend users against such attempted attacks where possible.[citation needed]
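The double-extension trick can be sketched in Python as follows. The set `SCRIPT_EXTENSIONS` is an illustrative assumption rather than any system's actual list of dangerous extensions, and the function names are hypothetical.

```python
from pathlib import PureWindowsPath

# Illustrative only: not the list used by any particular operating system.
SCRIPT_EXTENSIONS = {".vbs", ".exe", ".com", ".bat", ".scr", ".js"}


def name_with_extension_hidden(name: str) -> str:
    """What a file manager that hides the final extension would display."""
    return PureWindowsPath(name).stem


def looks_disguised(name: str) -> bool:
    """Flag names with two or more suffixes whose last one is executable."""
    suffixes = [s.lower() for s in PureWindowsPath(name).suffixes]
    return len(suffixes) >= 2 and suffixes[-1] in SCRIPT_EXTENSIONS


if __name__ == "__main__":
    sample = "LOVE-LETTER-FOR-YOU.TXT.vbs"
    print(name_with_extension_hidden(sample), looks_disguised(sample))
```

A heuristic like this produces false positives for legitimate names with several dots, so it is better read as an explanation of the attack than as a defence.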
A virus may couple itself with an executable without actually modifying the executable. These viruses, known as companion viruses, attach themselves in such a way that they are executed when the original file is requested. One way such a virus does this involves giving the virus the same name as the target file, but with a different extension to which the operating system gives priority, and often assigning the former a "hidden" attribute to conceal the malware's existence. The efficacy of this approach depends on whether the user attempts to open the intended file by entering a command and whether the user includes the extension. Later versions of DOS and Windows check for and attempt to run .COM files first by default, followed by .EXE and finally .BAT files. In this case, the infected file is the one with the .COM extension, which the user unwittingly executes.[10][11]
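The search order that companion viruses exploit can be sketched in Python. The function `resolve_command` is hypothetical and models only the .COM, .EXE, .BAT precedence stated above within a single directory; it does not model the search path or built-in commands.

```python
SEARCH_ORDER = (".COM", ".EXE", ".BAT")  # precedence as described in the text


def resolve_command(command: str, directory_listing):
    """Pick the file that would run for a command typed with or without an extension."""
    names = {name.upper() for name in directory_listing}
    typed = command.upper()
    if "." in typed:
        # The user typed the extension, so no precedence rule applies.
        return typed if typed in names else None
    for extension in SEARCH_ORDER:
        candidate = typed + extension
        if candidate in names:
            return candidate
    return None


if __name__ == "__main__":
    listing = ["GAME.EXE", "GAME.COM"]  # GAME.COM stands in for the companion file
    print(resolve_command("game", listing))      # extension omitted
    print(resolve_command("game.exe", listing))  # extension given explicitly
```

Typing the full name, extension included, bypasses the precedence rule, which is why the attack depends on whether the user includes the extension.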
Some viruses take advantage of the similarity between the ".com" top-level domain and the .COM filename extension by emailing malicious, executable command-file attachments under names superficially similar to URLs (e.g., "myparty.yahoo.com"), with the effect that unaware users click on email-embedded links that they think lead to websites but actually download and execute the malicious attachments.[citation needed]
References
[1] "What Is a File?" (PDF). z/VM 7.2 CMS Primer (PDF). IBM. 2021-12-05. p. 7. SC24-6265-01.
    "One thing you need to know about creating files with z/VM is that each file needs its own three-part identifier. The first part of the identifier is the file name. The second part is the file type. And the third part is the file mode. These three file identifiers are often abbreviated fn ft fm."
[2] "Mac Creator and File Type codes". livecode.byu.edu. Retrieved 2022-09-02.
[3] "javac – Java programming language compiler". Sun Microsystems, Inc. 2004. Retrieved 2009-05-31.
    "Source code file names must have .java suffixes, class file names must have .class suffixes, and both source and class files must have root names that identify the class."
[4] Stauffer, Todd; McElhearn, Kirk (2006). Mastering Mac OS X. John Wiley & Sons. pp. 95–96. ISBN 9780782151282. Retrieved 2 October 2017.
[5] File Extension .RPM Details from filext.com
[6] File Extension .QIF Details from filext.com
[7] File Extension .GBA Details from filext.com
[8] Commandname Extensions Considered Harmful
[9] Aquilina, James M.; Casey, Eoghan; Malin, Cameron H. (2008). Malware Forensics: Investigating and Analyzing Malicious Code. Syngress. pp. 211, 298–299. ISBN 978-1-59749-268-3. Retrieved 2025-02-25.
[10] Skoudis, Ed; Zeltser, Lenny (2004). Malware: Fighting Malicious Code. Prentice Hall. pp. 32–34, 253–254. ISBN 0-13-101405-6. Retrieved 2025-02-25.
[11] Grimes, Roger (August 2001). Malicious Mobile Code: Virus Protection for Windows. O'Reilly Media. pp. 41–42, 71–74, 221–222, 395–396, 422. ISBN 1-56592-682-X. Retrieved 2025-02-25.
Filename extension
Fundamentals
Definition and Purpose
A filename extension, also known as a file extension, is a suffix appended to the end of a filename, typically consisting of a period followed by a short string of characters, usually one to four letters or digits, such as ".txt" in the filename "document.txt".[6][7] This suffix serves as a conventional indicator of the file's type or format, helping both users and software systems to recognize and handle the file appropriately.[2]
The primary purposes of filename extensions include aiding operating systems and applications in identifying the file format to determine the appropriate software for opening, editing, or processing the file; facilitating the organization of files by type within directories for easier management; and providing user convenience by visually signaling the file's intended use through standardized conventions.[2][8] For instance, extensions promote interoperability across different computing environments by allowing files to retain type information even when metadata is not preserved during transfer.[8]
Common examples illustrate these roles: the ".jpg" extension denotes Joint Photographic Experts Group (JPEG) files, which are compressed raster images suitable for photographs and graphics in image viewing or editing applications; ".exe" identifies executable program files on Windows systems, executable by the operating system to run software; and ".pdf" signifies Portable Document Format files, designed for documents that preserve layout, fonts, and images across various platforms and devices without alteration.[2][9] The portion of the filename preceding the extension and period is known as the stem or base name, which uniquely identifies the file's content within its type category.[6]
Historical Development
The concept of filename extensions traces its roots to early time-sharing systems in the 1960s. In MIT's Compatible Time-Sharing System (CTSS), first demonstrated in 1961 on an IBM 709, each user file consisted of two separate names: a primary name up to six characters long and a secondary name of similar length, which described the file's type or processing requirements, such as "FAP" for assembly language source or "DATA" for data files.[10] These secondary names functioned as precursors to modern extensions, aiding the system in determining how to handle files, though without a dot separator or fixed length limit. By the mid-1960s, Digital Equipment Corporation (DEC) advanced this idea in systems like the PDP-6 multiprogramming monitor, released in 1964, which explicitly used "filename extensions" separated by a dot (e.g., filename.ext) to denote file types, directly influencing later designs.[11] This convention carried over to DEC's PDP-8 and other minicomputers, where extensions helped distinguish executables, sources, and data. In the 1970s, as personal computing emerged, early word processors like WordStar (1978) adopted extensions such as .WS for its proprietary format, standardizing their use for document interchange on microcomputers.[12] The rise of personal systems amplified extensions' role, enabling users to quickly identify file purposes amid growing software diversity. Control Program for Microcomputers (CP/M), developed by Gary Kildall and first released in 1974 (version 1.4 in 1975), formalized the 8.3 filename format—eight characters for the base name and three for the extension—drawing from DEC conventions to fit hardware constraints like limited directory space on 8-inch floppy disks.[13] This structure, where extensions like .COM for executables or .ASM for assembly code indicated types, became a de facto standard for microcomputers. 
Microsoft Disk Operating System (MS-DOS), launched in 1981 as 86-DOS adapted for IBM PC, directly cloned CP/M's 8.3 format, embedding it deeply into personal computing ecosystems.[14] Its influence persisted in Windows until 1995, when Windows 95 introduced long filenames via the VFAT extension, supporting up to 255 characters while maintaining backward compatibility with 8.3 short names.[15]
In contrast, Unix-like systems from the 1970s, such as Version 7 Unix (1979), eschewed enforced extensions, treating filenames as arbitrary strings up to 14 characters without inherent type semantics; instead, file types were determined by content inspection via the file command or, for executables, the shebang mechanism (#!) introduced by Dennis Ritchie around 1979-1980 to specify interpreters like /bin/sh.[16] This approach carried into Linux (1991) and macOS (2001, based on NeXTSTEP), where extensions remain optional conventions rather than OS mandates, though applications often rely on them for usability. By the 1990s, as computing shifted toward networking and multimedia, extensions facilitated cross-platform compatibility; the Internet Assigned Numbers Authority (IANA) began maintaining an informal association of extensions with MIME types via RFCs like 2046 (1996), aiding web browsers in content handling.
The 2000s marked a partial evolution beyond extensions, with embedded metadata gaining prominence for richer identification. The Exchangeable Image File Format (Exif), standardized in 1995 (version 1.0) by the Japan Electronic Industries Development Association (JEIDA), a predecessor of the Japan Electronics and Information Technology Industries Association (JEITA), and widely adopted by 2002 with digital cameras, embedded camera settings, timestamps, and GPS data directly in JPEG and TIFF files, reducing reliance on extensions alone for image processing.[17] This trend extended to other formats, emphasizing content-embedded details over filename suffixes for more robust identification in professional and archival contexts.
Technical Implementation
File System and OS Support
Filename extensions are integrated into the filename attribute across major file systems, serving as a suffix following a period (.) to denote file types, though their enforcement and length limits vary. In the File Allocation Table (FAT) file system, commonly used for removable media and legacy Windows installations, filenames adhere to the 8.3 convention, restricting the base name to 8 characters and the extension to up to 3 characters, for a total of up to 11 characters plus the dot.[6] This format ensures backward compatibility with MS-DOS but limits modern usage, with long filenames stored separately using Unicode while maintaining a short 8.3 alias.[6] The New Technology File System (NTFS), Windows' default, supports extended filenames up to 255 characters in total (including the extension), stored in Unicode without a rigid 8.3 constraint, allowing flexible extension lengths as part of the overall name.[6][18]
Linux's ext4 file system treats filenames, including extensions, as arbitrary byte strings (typically ASCII) stored within directory entries, with a maximum length of 255 bytes for the entire name.[19] These entries, formatted as struct ext4_dir_entry_2, include the full filename in a name field, where the extension follows the conventional dot separator but is not parsed separately by the file system itself.[19] Similarly, Apple's File System (APFS), the default for macOS and iOS, accommodates filenames up to 255 UTF-16 code units, incorporating extensions as part of the Unicode name without distinct length restrictions for the suffix.[20] In contrast, the legacy Hierarchical File System Plus (HFS+), predecessor to APFS, also supports 255 UTF-16 code units per filename, maintaining compatibility for extensions in macOS environments.[20][21]
Operating systems enforce filename extensions differently, influencing their role in file handling. Windows integrates extensions deeply into its ecosystem, using them to determine default application associations for opening files, with the registry mapping extensions like .docx to programs such as Microsoft Word.[22] This reliance makes extensions essential for user interactions in Explorer, where they trigger type-specific behaviors.[2] macOS employs extensions optionally for file identification, prioritizing Uniform Type Identifiers (UTIs) and content-based inspection via magic numbers or headers, while hiding them by default in Finder to simplify the interface—users can toggle visibility globally or per file.[23][24] Linux views extensions as mere conventions without enforcement, relying instead on magic numbers—unique byte sequences at file starts—for type detection through utilities like the file command, allowing robust identification even without extensions.[25][26]
Cross-platform file transfers introduce challenges due to differing conventions, particularly around dot usage and visibility. In Unix-like systems (including Linux and macOS), filenames starting with a dot (e.g., .bashrc) are hidden by default in directory listings, which can confuse Windows users mistaking them for extension-less files or files with leading-dot extensions, potentially leading to unintended modifications or access issues.[27] Case sensitivity in Unix file systems (e.g., File.txt differing from file.txt) contrasts with Windows' default case-insensitivity on NTFS, risking file overwrites or non-detection during portability.[28] Additionally, varying path separators (/ in Unix vs. \ in Windows) complicate scripting, though extensions themselves remain portable as dot-suffixed strings if lengths fit constraints.[28]
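The case-sensitivity hazard can be checked before a transfer with a short Python sketch. The function name `case_collisions` is hypothetical, and `str.casefold` is used only as an approximation of how a case-insensitive file system compares names; actual file systems apply their own comparison tables.

```python
from collections import defaultdict


def case_collisions(names):
    """Group names that differ only by letter case (approximate comparison)."""
    groups = defaultdict(list)
    for name in names:
        groups[name.casefold()].append(name)
    # Only groups with more than one member would clash on a
    # case-insensitive destination.
    return [group for group in groups.values() if len(group) > 1]


if __name__ == "__main__":
    listing = ["File.txt", "file.txt", "notes.md", ".bashrc"]
    print(case_collisions(listing))
```

Running such a check on a Unix directory tree before copying it to a case-insensitive volume identifies the files at risk of overwriting one another.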
Specific examples highlight these dynamics in mobile ecosystems. Android's external storage, often formatted as FAT32 for SD cards and USB drives, inherits FAT's 8.3 short-name convention, with long filenames stored alongside a short alias for compatibility with legacy devices, though internal storage (using ext4 or F2FS) has no such short-name layer.[6] In iOS and macOS, the Files app and Finder restrict visible extensions by default to reduce clutter, with users able to hide them individually via Get Info or globally in settings, but the system still processes them for type resolution alongside APFS metadata.[23][24] These behaviors underscore the need for tools like cross-platform archives (e.g., ZIP) to preserve extensions during transfers.[28]
Syntax and Conventions
Filename extensions are conventionally placed immediately after the last period (dot) in a filename, serving to denote the file type by appending a suffix to the base name, such as in "document.txt" where "txt" is the extension.[6] This structure is a general convention across most file systems, though the interpretation of the extension can vary. In practice, the extension follows the base name without spaces or additional separators beyond the dot.[6]
Case sensitivity for filename extensions depends on the underlying operating system and file system. Unix-like systems, including Linux, treat extensions as case-sensitive, meaning "file.TXT" and "file.txt" are distinct files, with a strong convention favoring lowercase letters for consistency and portability.[29] In contrast, Windows file systems like NTFS are case-preserving but case-insensitive, so "file.txt" and "FILE.TXT" refer to the same file, though mixed case is commonly used in practice.[6]
Extensions typically consist of 1 to 5 alphanumeric characters, though modern file systems impose no strict limit on length beyond the overall filename constraint of around 255 characters.[6] Allowed characters are generally letters (a-z, A-Z) and digits (0-9), with occasional use of symbols in specific contexts, but reserved characters such as forward slash (/), backslash (\), colon (:), asterisk (*), question mark (?), quotes ("), less than (<), greater than (>), and pipe (|) must be avoided to prevent parsing errors across systems.[30] Compound extensions, like ".tar.gz" for gzip-compressed tar archives, arise when multiple dots are used, with the portion after the final dot treated as the primary extension while earlier parts form part of the base name.[31]
The three-letter extension standard was popularized in the MS-DOS era with the 8.3 filename format, limiting base names to 8 characters and extensions to 3, as seen in legacy formats like ".doc" for documents and ".xls" for spreadsheets.[32] Modern conventions have evolved to include longer or multi-part extensions, such as ".7z" for 7-Zip archives, accommodating more complex file types without the DOS restrictions.[6]
International variations in filename extensions are influenced by character encoding support. Contemporary Unicode-based systems, prevalent in Linux and modern Windows, allow Unicode characters in extensions, enabling non-Latin scripts like Cyrillic or Hanzi for global compatibility.[33] However, legacy ASCII-limited systems from early Unix and DOS eras restricted extensions to 7-bit ASCII, causing compatibility issues with non-Latin characters that could lead to garbled names or rejection in cross-platform transfers.[34]
File Identification
Role in Determining Content Type
Filename extensions play a crucial role in enabling operating systems and applications to identify the type of content within a file and select the appropriate software for handling it. When a user or program interacts with a file, the extension serves as a quick indicator that triggers the lookup of associated parsers, viewers, or default applications through system configurations. For instance, a file ending in .mp3 is typically mapped to an audio player, allowing the system to launch media software automatically upon double-clicking the file.[35][36]
In Microsoft Windows, this mapping occurs primarily via the Windows Registry, where the HKEY_CLASSES_ROOT key stores associations between extensions and programmatic identifiers (ProgIDs). Each extension, such as .txt, is linked to a ProgID (e.g., txtfile) that defines the content type, default actions like opening with Notepad, and MIME equivalents for interoperability. This registry hive merges user-specific settings from HKEY_CURRENT_USER\Software\Classes with system-wide ones from HKEY_LOCAL_MACHINE\Software\Classes, ensuring consistent behavior across sessions. Applications register their supported extensions during installation to establish these links, enabling seamless file handling.[35][37]
On Linux and Unix-like systems, filename extensions are mapped to MIME types using configuration files like /etc/mime.types, which define rules for associating suffixes with media types recognized by desktop environments and applications. For example, the entry audio/mpeg mp3 directs the system to treat .mp3 files as MPEG audio, often launching a compatible player via desktop entry specifications in /usr/share/applications. This setup, maintained by packages like shared-mime-info, allows graphical interfaces such as GNOME or KDE to determine default handlers based on the extension.[36][38]
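The line format of a mime.types-style file, a media type followed by zero or more extensions, can be read with a few lines of Python. This is a sketch under the assumption that "#" starts a comment and fields are separated by whitespace; the function name `parse_mime_types` and the sample text are illustrative.

```python
def parse_mime_types(text: str) -> dict:
    """Build an extension-to-media-type mapping from mime.types-style text."""
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        media_type, *extensions = line.split()
        for extension in extensions:
            # Keep the first mapping seen if an extension is listed twice.
            mapping.setdefault("." + extension.lower(), media_type)
    return mapping


SAMPLE = """\
# media type        extensions
audio/mpeg          mp3 mpga
text/plain          txt
application/x-empty
"""

if __name__ == "__main__":
    table = parse_mime_types(SAMPLE)
    print(table[".mp3"], table[".txt"])
```

A line that names a media type without any extension, like the last sample entry, contributes nothing to the mapping.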
Despite their utility, filename extensions have inherent limitations as a sole mechanism for content type determination, since they are user-assigned and easily modifiable, potentially leading to mismatches between the extension and actual file contents. For example, renaming a malicious executable from .exe to .txt could bypass basic checks if only the extension is examined, allowing unintended execution. Systems and applications often supplement extensions with internal file signatures—known as "magic numbers"—which are byte patterns at the file's header that reliably identify formats regardless of the name; tools like the GNU file command prioritize these magic tests over extensions for accurate detection.[39][40]
Practical examples illustrate this role and its caveats. Image viewers like those in Windows or GIMP on Linux check a .png extension to invoke PNG parsers, but may fall back to signature verification if the content does not match, preventing errors with corrupted or disguised files. Similarly, web browsers handling local .js files use the extension to enable JavaScript execution in a secure context, though modern implementations increasingly validate content signatures to mitigate risks from renamed scripts.[35]
Comparison to MIME Types
MIME types, formally known as media types, are standardized identifiers used to specify the nature and format of a file or data stream in internet protocols such as email and the web. They consist of a main type and a subtype separated by a slash, such as text/plain for plain text files or image/jpeg for JPEG images, and were defined by the Internet Engineering Task Force (IETF) in RFC 2045, published in 1996.[41] These types can include additional parameters, like charset=utf-8 for character encoding, enabling precise handling of content across diverse systems.[41]
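The type/subtype structure and its optional parameters can be taken apart with Python's standard email package, which implements MIME header parsing. The helper name parse_media_type is hypothetical; this is a sketch of one way to split such a value, not a complete validator.

```python
from email.message import Message


def parse_media_type(value):
    """Split a media type string into (main type, subtype, parameters)."""
    msg = Message()
    msg["Content-Type"] = value
    # get_params() returns the media type itself as the first pair,
    # followed by one (name, value) pair per parameter.
    params = {name.lower(): val for name, val in msg.get_params()[1:]}
    return msg.get_content_maintype(), msg.get_content_subtype(), params


print(parse_media_type("text/plain; charset=utf-8"))
print(parse_media_type("image/jpeg"))
```

For the first call the result separates "text", "plain" and a parameter dictionary holding the charset; the second has no parameters, so the dictionary is empty.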
Filename extensions and MIME types both serve to identify file content for appropriate processing, but they differ fundamentally in scope and reliability. Extensions operate at the filesystem level as informal, human-readable suffixes (e.g., .txt conventionally mapping to text/plain), lacking a centralized authority and relying on operating system or application conventions.[42] In contrast, MIME types are protocol-oriented, hierarchical standards designed for network transmission, where the type/subtype structure and parameters provide explicit, machine-readable details about content semantics and handling requirements.[41] This makes MIME types more robust for interoperability in distributed environments, while extensions are simpler but prone to ambiguity due to their ad-hoc nature.
In practice, the two systems often interact through mapping mechanisms to bridge filesystem and protocol contexts. Web servers like Apache HTTP Server use modules such as mod_mime to derive MIME types from filename extensions during content delivery, consulting configuration files that associate suffixes like .html with text/html.[43] Similarly, web clients and browsers infer MIME types from extensions when handling downloads, falling back to operating system mappings if the server does not specify a Content-Type header, which helps maintain consistency in file association but can propagate errors if the extension is misleading.[42]
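The extension-to-type step that a server performs can be approximated with Python's standard mimetypes module. This is only an analogy to the behavior described above, not Apache's implementation; content_type_for is a hypothetical helper, and the application/octet-stream fallback is an assumption of this sketch.

```python
import mimetypes

# A private instance built from the module's default table, so the result
# does not depend on system-wide mapping files.
_db = mimetypes.MimeTypes()


def content_type_for(path, default="application/octet-stream"):
    """Pick a Content-Type value from the filename extension alone."""
    media_type, _encoding = _db.guess_type(path)
    return media_type or default


print(content_type_for("index.html"))       # text/html
print(content_type_for("data.unknownext"))  # application/octet-stream
```

Because only the name is inspected, a mislabelled file receives the media type of its extension, which is how a misleading suffix propagates into the Content-Type header.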
While filename extensions offer simplicity and ease of use for local file management, they are error-prone because they can be easily altered or omitted, leading to incorrect content interpretation without deeper inspection. MIME types provide greater precision and standardization, ensuring consistent behavior across protocols, but they demand proper server configuration and can fail if misapplied, as seen in cases where .html files containing XHTML are served as text/html instead of the stricter application/xhtml+xml, potentially causing parsing issues in compliant browsers.[44] Overall, MIME types prioritize accuracy in networked scenarios, whereas extensions suffice for basic, informal identification but risk mismatches without additional validation.[45]
Applications and Special Uses
Executable Files
Filename extensions play a crucial role in identifying executable files, which are programs designed to be run directly by an operating system or interpreter. On Windows, common extensions for executables include .exe for compiled binaries in Portable Executable (PE) format, .bat and .cmd for batch scripts, and .com for legacy command files.[2][46] In Unix-like systems such as Linux, executables often lack mandatory extensions, relying instead on file permissions, but conventions include .sh for shell scripts, .py for Python scripts, .bin for binary images, and .run for self-extracting installers.[46][47]
The execution mechanics vary by platform but frequently involve the extension as a cue for the appropriate loader or interpreter. On Windows, when a user launches a file via double-click or command line, the operating system checks the extension to determine the handler; for .exe files, the PE loader in the Windows kernel (ntoskrnl.exe) parses the file header to map it into memory and start execution, ensuring compatibility with the system's architecture. For batch files like .bat, the Command Prompt (cmd.exe) interprets the script line by line. In contrast, Unix-like systems prioritize the execute permission bit set via chmod +x over extensions; upon invocation, the kernel examines the first line for a shebang (e.g., #!/bin/sh for .sh files or #!/usr/bin/env python3 for .py scripts), invoking the specified interpreter if present, which then processes the file content.[47] This supplemental role of extensions in Unix aids in human readability and IDE associations but is not enforced by the loader.[48]
Cross-platform execution introduces additional layers, often requiring a compatibility layer or virtual machine. For instance, Wine, a compatibility layer for POSIX-compliant systems like Linux, translates Windows API calls to native equivalents, allowing .exe files to run without a full Windows installation by loading the PE format through its own loader.[49] Similarly, Java's .jar extension denotes an archive that can serve as a platform-independent executable; if the JAR manifest specifies a Main-Class, it launches via the Java Virtual Machine (JVM) with the command java -jar, abstracting hardware differences across Windows, Linux, and macOS.[50] These approaches mitigate extension-specific incompatibilities but may incur performance overhead due to translation or interpretation.[49]
Historically, the use of extensions for executables traces back to MS-DOS 1.0, released in 1981 for the IBM PC, which introduced .com for flat binary images limited to 64 KB and .exe for segmented, relocatable executables supporting larger code.[51] This convention influenced Windows development. In Unix-like systems, the evolution continued into the 1990s with the adoption of the Executable and Linkable Format (ELF) around 1992–1995, replacing the simpler a.out format; ELF files typically have no extension but use the same permission and shebang mechanisms for execution.[52][53]
Multiple or Hidden Extensions
Filename extensions can be compounded to indicate layered file processing, such as archiving followed by compression. For instance, a .tar.gz file represents a tar archive that has been compressed with gzip, where the .tar extension denotes the tape archive format for bundling multiple files, and .gz indicates the gzip compression applied afterward.[54] Similarly, .js.map files use a compound extension to denote source map files associated with JavaScript bundles, aiding in debugging minified code. Systems typically parse these by examining the extensions from right to left, handling the outermost (most recently applied) operation first, though custom handling may be required for accurate identification in software.[55]
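Right-to-left handling of a compound extension can be sketched with Python's pathlib. The helper name peel_extensions is hypothetical, and the sketch treats every dot-separated suffix as a layer, which is an assumption that does not hold for names such as report.v2.final.txt.

```python
from pathlib import PurePosixPath


def peel_extensions(filename):
    """Return (extension, remaining name) pairs, outermost layer first."""
    path = PurePosixPath(filename)
    layers = []
    while path.suffix:
        layers.append((path.suffix, path.stem))
        path = PurePosixPath(path.stem)  # continue with the name minus one layer
    return layers


print(peel_extensions("files.tar.gz"))
# [('.gz', 'files.tar'), ('.tar', 'files')]
print(peel_extensions("image-200x300.jpg"))
# [('.jpg', 'image-200x300')]
```

For files.tar.gz the gzip layer is reported first and the tar layer second, mirroring the order in which the file would be unpacked.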
In Windows, file extensions for known file types can be hidden by default through File Explorer settings, suppressing their display to simplify the user interface. This feature is enabled via the View tab in File Explorer options, where "File name extensions" is unchecked, causing a file like resume.docx to appear as resume. While intended for usability, this concealment poses risks, as it can mask malicious files, such as executables disguised with benign-looking names, potentially leading to unintended execution of malware.[2]
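The effect of hiding known extensions can be modelled with a toy function. This is not how File Explorer is implemented; KNOWN_EXTENSIONS and displayed_name are hypothetical and only illustrate what the user ends up seeing.

```python
import os

# Hypothetical set of "known" types whose extension the shell would hide.
KNOWN_EXTENSIONS = {".docx", ".jpg", ".txt", ".pdf", ".exe"}


def displayed_name(filename, hide_known=True):
    """Return the name as shown to the user under the hiding setting."""
    base, ext = os.path.splitext(filename)
    if hide_known and ext.lower() in KNOWN_EXTENSIONS:
        return base  # only the final extension is stripped
    return filename


print(displayed_name("resume.docx"))                      # resume
print(displayed_name("photo.jpg.exe"))                    # photo.jpg
print(displayed_name("photo.jpg.exe", hide_known=False))  # photo.jpg.exe
```

The second call shows why the setting is risky: stripping only the final extension leaves a name that looks like an image.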
Files without extensions are common in Unix-like systems, particularly for binaries and executables, as these operating systems do not rely on extensions for type identification. Instead, the file command uses magic numbers—unique byte sequences at the file's beginning—to determine content types, such as recognizing ELF binaries via their header signatures defined in standards like <elf.h>. This approach allows executables like Unix binaries to function without any suffix, emphasizing content over naming conventions.[56]
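How an extensionless file might be recognized from its first bytes can be sketched as below. The function names are hypothetical and only two cases are covered: the ELF signature (the byte 0x7F followed by "ELF") and a leading shebang line. The file command itself tests many more patterns.

```python
def classify_header(header):
    """Classify a file from its leading bytes: ELF binary, script, or unknown."""
    if header.startswith(b"\x7fELF"):
        return ("elf", None)
    if header.startswith(b"#!"):
        first_line = header.split(b"\n", 1)[0]
        interpreter = first_line[2:].strip().decode("utf-8", "replace")
        return ("script", interpreter)
    return ("unknown", None)


def classify_file(path):
    """Read a small prefix of the file and classify it by content, not name."""
    with open(path, "rb") as handle:
        return classify_header(handle.read(256))


print(classify_header(b"\x7fELF\x02\x01\x01"))       # ('elf', None)
print(classify_header(b"#!/bin/sh\necho hello\n"))  # ('script', '/bin/sh')
```

Neither branch consults the filename, so the result is the same whether or not the file carries a suffix.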
Double file extensions are a security risk in which a filename carries two extensions, such as photo.jpg.exe or document.pdf.exe, to disguise a malicious executable as a harmless image or document. The technique exploits operating systems that hide known file extensions by default, causing the file to appear benign (e.g., photo.jpg) to the user and potentially leading to unintended execution of malware. Adversaries commonly use this in phishing attacks and file upload vulnerabilities. The MITRE ATT&CK framework classifies this as a masquerading technique (T1036.007), with examples including PreviewReport.DOC.exe used by the Bazar malware for initial access via phishing.[5]
Note that this deceptive practice differs from legitimate naming conventions where dimensions or other attributes are incorporated into the base filename, such as image-200x300.jpg. In these cases, there is only one extension (.jpg), and "-200x300" forms part of the base filename to indicate the image dimensions. This is a standard convention in content management systems like WordPress for automatically generated resized or cropped image variants.[57]
Platform-specific conventions further illustrate non-standard extension use. On macOS, applications are distributed as bundles—directories structured as packages with the .app extension, such as Chess.app, which the Finder treats as a single file while hiding the suffix by default to maintain a clean appearance. This bundling organizes executables, resources, and metadata without altering core extension semantics. In contrast, Linux environments often avoid extensions for shell scripts, following best practices like those in Google's shell style guide, which recommend no extension for executables added to the PATH to enable direct invocation without suffixes, reserving .sh for non-executable library files.[58][59]
Security and Risks
Associated Vulnerabilities
Filename extension spoofing involves attackers renaming malicious files with innocuous extensions to deceive users and bypass security filters, such as changing an executable file from malware.exe to photo.jpg to appear as an image. This tactic exploits user trust in file extensions for quick identification, leading to unintended execution of harmful code when the file is opened. A notable example is the ILOVEYOU worm from 2000, which spread via email attachments named LOVELETTER-FOR-YOU.TXT.vbs; Windows' default setting to hide known file extensions made it appear as a harmless .txt file, prompting users to open it and triggering the Visual Basic script that infected systems worldwide.[60][61]
Double file extension exploits are a common vector in malware distribution through phishing emails and file upload vulnerabilities, where attackers disguise malicious executables to bypass filters or trick users. These exploits leverage systems that parse only the final extension in a filename, often combined with operating system behaviors that hide known file extensions (such as Windows' default settings), allowing attackers to append a benign-looking extension before the malicious one. For example, a file named photo.jpg.exe executes as an executable but may appear simply as photo.jpg, tricking users into believing it is a harmless image. Similarly, document.doc.exe displays as a document but executes as a program. This masquerading technique enables the delivery of malware disguised as safe documents or images, evading basic extension-based checks in applications or antivirus software. For instance, in web upload vulnerabilities, filenames like image.jpg.php can bypass filters expecting image files, permitting server-side script execution if the application overlooks the hidden executable extension.[5][62]
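A filename check that looks at every extension rather than only one of them can be sketched as follows. The allowlist, the blocklist and the name check_upload_name are hypothetical, and a name-based check of this kind complements content inspection rather than replacing it.

```python
from pathlib import PurePosixPath

# Hypothetical policy for an image upload endpoint.
ALLOWED_FINAL = {".jpg", ".jpeg", ".png", ".gif"}
EXECUTABLE = {".exe", ".php", ".bat", ".cmd", ".vbs", ".js", ".com"}


def check_upload_name(filename):
    """Accept or reject an uploaded filename based on all of its extensions."""
    # Keep only the final path component, whichever separator was used.
    name = PurePosixPath(filename.replace("\\", "/")).name
    suffixes = [s.lower() for s in PurePosixPath(name).suffixes]
    if not suffixes:
        return "reject: no extension"
    if any(s in EXECUTABLE for s in suffixes):
        return "reject: executable extension present"
    if suffixes[-1] not in ALLOWED_FINAL:
        return "reject: final extension not allowed"
    return "accept"


for candidate in ["image.jpg", "image.jpg.php", "photo.jpg.exe", "notes.txt"]:
    print(candidate, "->", check_upload_name(candidate))
```

Scanning all suffixes rejects both image.jpg.php and image.php.jpg, so the outcome does not depend on which end of the name a particular server or shell happens to inspect.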
Auto-execution risks arise from legacy behaviors in email clients and browsers that automatically launch associated applications or scripts upon detecting certain extensions, without user confirmation, potentially running malicious code directly. In older versions of Microsoft Outlook and Internet Explorer, extensions like .exe, .bat, or .vbs triggered immediate execution when attachments were previewed or downloaded, amplifying the impact of spoofed files. This vulnerability has historically facilitated rapid worm propagation, as seen in early 2000s email-based attacks where clicking a disguised executable led to system compromise without additional warnings.[63]
Case sensitivity attacks exploit discrepancies between case-insensitive systems like Windows NTFS and case-sensitive ones like Unix/Linux ext4, enabling name collisions that can overwrite files, alter permissions, or grant unauthorized access via filename extensions. For example, an attacker could create colliding files such as script.py and Script.PY (where the latter links to a sensitive location); on Windows, they resolve to the same file, potentially executing unintended code or exposing data during cross-platform operations. A real-world instance is CVE-2021-21300 in Git, where case-insensitive file systems allowed remote code execution by cloning repositories with colliding directory names and symlinks, such as a (symlink to .git/hooks/) and A/post-checkout (malicious script), bypassing access controls on mixed-sensitivity environments.[64]
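Name collisions of this kind can be detected before files are written to a case-insensitive volume. The sketch below is a minimal illustration, assuming forward-slash separated relative paths; find_case_collisions is a hypothetical helper, and str.casefold only approximates the folding rules a real file system applies.

```python
from collections import defaultdict


def find_case_collisions(paths):
    """Group path prefixes that differ only by letter case."""
    seen = defaultdict(set)
    for path in paths:
        parts = path.split("/")
        # Check every prefix so colliding directory names are caught too.
        for i in range(1, len(parts) + 1):
            prefix = "/".join(parts[:i])
            seen[prefix.casefold()].add(prefix)
    return sorted(sorted(group) for group in seen.values() if len(group) > 1)


print(find_case_collisions(["script.py", "Script.PY", "README"]))
# [['Script.PY', 'script.py']]
print(find_case_collisions(["a", "A/post-checkout"]))
# [['A', 'a']]
```

The second call mirrors the layout in the Git example: the entry a and the directory A are distinct on a case-sensitive system but refer to the same name on a case-insensitive one.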
