Recent from talks
Nothing was collected or created yet.
Natural sort order
View on WikipediaIn computing, natural sort order (or natural sorting) is a way of ordering strings that treats embedded numbers as whole numerical values rather than sequences of characters.
While standard alphabetical order compares strings character-by-character (where "10" sorts before "2" because "1" is less than "2"), natural sort order orders them by magnitude of the number, placing "2" before "10".
Natural sort order is designed to address the shortcoming of standard lexicographical order, which often produces counter-intuitive results for humans when dealing with numbered lists, filenames, or version numbers.[1]
| Natural | Lexicographical |
|---|---|
| 1.jpg
2.jpg 3.jpg 9.jpg 10.jpg 11.jpg |
1.jpg
10.jpg 11.jpg 2.jpg 3.jpg 9.jpg |
Problem with standard sorting
[edit]In standard alphabetical (lexicographical) sorting, strings are compared character by character from left to right. This causes numbers to be sorted based on the value of their first digit, rather than their whole numerical value.
For example, a computer using standard sorting will place the string "11" before "2". This occurs because the character "1" (the first digit of 11) has a lower code value than "2". While mathematically correct in terms of character codes, this ordering disrupts the logical sequence expected by users, particularly in file management and data lists.
Operation
[edit]Natural sorting algorithms generally operate by splitting strings into "chunks" of text and numbers.
- Text chunks are compared alphabetically (often case-insensitively).
- Numeric chunks are parsed into integer values and compared numerically.
Comparison of algorithms
[edit]| Standard Sorting (Lexicographical) | Natural Sorting |
|---|---|
| file1.txt
file10.txt file11.txt file12.txt file2.txt file20.txt file3.txt |
file1.txt
file2.txt file3.txt file10.txt file11.txt file12.txt file20.txt |
Handling edge cases
[edit]Different implementations of natural sort may handle edge cases differently:
- Leading zeros: Some algorithms treat "01" and "1" as identical, while others may enforce an ordering where "01" follows "1" (or vice versa) to ensure a deterministic sort.
- Whitespace: Most implementations ignore leading or trailing whitespace around the numbers to prevent sorting anomalies.
- Decimals and Version Numbers: A variation of natural sort, often called version sort, is designed to handle multiple numeric segments separated by dots (e.g., 1.2.10 vs 1.2.2). In standard sort, 1.2.10 precedes 1.2.2; in version sort, the segments are parsed individually, correctly placing 1.2.10 after 1.2.2.
History and implementations
[edit]Functionality to sort by natural sort order is now widely available in software libraries for many programming languages and operating systems.
The concept gained significant visibility in the Macintosh community. During the 1996 MacHack conference, the Natural Order Mac OS System Extension was conceived and implemented overnight as an entry for the Best Hack contest.[2][3] Subsequently, Dave Koelle published the "Alphanum Algorithm" in 1997,[4] a popular reference implementation that influenced many later libraries. Martin Pool published "Natural Order String Comparison" in 2000.[5]
Modern implementations include:
- PHP: The natsort() function is built into the standard library.[6]
- Python: The natsort library is a widely used third-party package.[7]
- Perl: The module Sort::Naturally is available via CPAN.[8]
- Unix/Linux: The GNU ls and sort commands support natural sorting via the -v (version sort) flag (for ls) or the -V flag (for sort).[9]
- .NET/C#: Various extensions exist, such as NaturalSort.Extension.[10]
File managers such as Windows Explorer (since Windows XP) and Midnight Commander utilize natural sorting by default to display file lists.
See also
[edit]References
[edit]- ^ "Sorting for Humans : Natural Sort Order". blog.codinghorror.com. 12 December 2007.
- ^ "Natural Order Numerical Sorting".
- ^ "TidBITS: The Natural Order of Things". 3 February 1997.
- ^ "Dave Koelle's Alphanum Algorithm".
- ^ "Martin Pool's Natural Order String Comparison".
- ^ "PHP: natsort - Manual". php.net.
- ^ Morton, Seth M. (23 December 2021). "natsort: Simple yet flexible natural sorting in Python" – via PyPI.
- ^ "Sort::Naturally - metacpan.org". metacpan.org.
- ^ "Version sort overview (GNU Coreutils 9.8)". www.gnu.org. Retrieved 2025-10-26.
- ^ Pažourek, Tomáš (1 April 2022). "NaturalSort.Extension: Support for natural sorting in .NET/C#". github.com.
Natural sort order
View on Grokipediaimg1.png
img10.png
img12.png
img2.png
img1.png
img10.png
img12.png
img2.png
img1.png
img2.png
img10.png
img12.png
img1.png
img2.png
img10.png
img12.png
natsort() and strnatcmp() functions, which maintain key-value associations during array sorting, and MariaDB's natural_sort_key() function for database indexing since version 10.7.[1][3] One foundational implementation is Martin Pool's Natural Order String Comparison routine, released around 2000, which influenced subsequent adoptions in languages like Python (via libraries such as natsort) and Ruby.[2]
Fundamentals
Definition
Natural sort order, also known as natural ordering or human sort, is a method for arranging strings that mimics intuitive human expectations by treating embedded numeric substrings as numerical values rather than sequences of characters. In this approach, alphanumeric strings are sorted such that sequences like "item2" appear before "item10", contrasting with pure character-by-character comparisons that would place "item10" ahead due to the initial '1' versus '2'. This sorting technique was first popularized by Stuart Cheshire in 1996 as a Macintosh system extension to improve file listing usability in the Finder.[4] The key characteristics of natural sort order involve parsing strings into alternating chunks of non-numeric (typically alphabetic) and numeric components. Non-numeric chunks are compared using standard lexicographical ordering, while numeric chunks are interpreted as integers and compared by their numerical value, ignoring leading zeros except as tie-breakers. Non-alphanumeric characters, such as spaces or punctuation, are generally preserved in their relative positions and compared lexicographically to maintain the overall string structure. This chunking process ensures that the sorting remains stable and intuitive for mixed-content strings, such as filenames or version numbers.[5][4] The primary motivation for natural sort order stems from the shortcomings of conventional lexicographical sorting in programming languages and file systems, where character-wise comparison leads to counterintuitive results—for instance, "file10" sorting before "file2" because "1" precedes "2" in ASCII order. By prioritizing numerical interpretation for digit sequences, natural sorting aligns more closely with how humans perceive and expect ordered lists, enhancing usability in applications like directory listings and data displays. This human-centric design has since been adopted in various software ecosystems, including PHP's strnatcmp function, which implements the core comparison logic.[4][6] At a basic level, the chunking mechanism in natural sort order can be described as iterating through the string to separate contiguous runs of digits from runs of non-digit characters, then evaluating these segments in sequence during comparison: non-digit runs via string comparison and numeric runs via integer conversion.[5]Comparison to Lexicographical Sorting
Lexicographical sorting, also known as dictionary or alphabetical order, compares strings character by character using their ASCII or Unicode code points, treating digits as individual characters rather than numerical values. For instance, the strings "10" and "2" are compared such that "10" precedes "2" because the first character '1' (code point 49) is less than '2' (code point 50). In contrast, natural sort order extends this by identifying and extracting contiguous numeric substrings within the text, interpreting them as integers for comparison while treating non-numeric parts lexicographically. This results in a more intuitive sequence for mixed alphanumeric data, such as ordering "1", "2", "10" as 1 < 2 < 10, rather than the lexicographical 1 < 10 < 2.[5] The primary advantage of natural sorting lies in its alignment with human perceptual ordering, particularly for lists involving version numbers, file names, or identifiers where numerical magnitude should dictate position over character precedence. This enhances readability and minimizes errors in human interpretation of sorted outputs, as demonstrated in applications like sorting filenames such as "rfc1.txt" < "rfc2.txt" < "rfc10.txt".[5][6] However, natural sorting incurs a slightly higher computational cost than pure lexicographical sorting due to the additional parsing required to separate and numerically evaluate digit sequences, increasing the constant factor in its linear O(n) time complexity per comparison—though both methods share the same asymptotic bounds. It is less suitable for purely textual data lacking numbers, where the simplicity and efficiency of lexicographical comparison suffice without the overhead of chunking.[5] The following table illustrates the differences using a sample list of strings:| Input List | Lexicographical Order | Natural Order |
|---|---|---|
| ["a1", "a10", "a2"] | a1, a10, a2 | a1, a2, a10 |
Algorithms and Implementation
Core Algorithm
The core algorithm for natural sort order tokenizes input strings into alternating segments of non-numeric text and numeric runs, then compares these segments sequentially by type: non-numeric segments via standard lexicographical string comparison, and numeric segments by converting to integers for numerical comparison. This approach ensures that embedded numbers are ordered by their mathematical value rather than character-by-character, addressing the limitations of pure lexicographical sorting where "file10" might precede "file2". The algorithm was inspired by early efforts to improve file sorting in graphical interfaces, such as Stuart Cheshire's Natural Order extension for Macintosh systems.[4] The step-by-step process begins with traversing each string character by character to identify contiguous runs: non-digit characters form text segments, while sequences of digits form numeric segments. For each pair of corresponding segments from the two strings being compared, if both are text, they are compared using the locale's standard string collation rules. If both are numeric, the digit sequences are compared numerically (discarding leading zeros for the numerical value, though zero-padding can influence ties by treating the padded form as a string indicator of significance in some variants), and compared arithmetically; for instance, "10" is greater than "2" because 10 > 2. When the current characters are not both digits, they are compared as individual characters using standard string comparison rules. This parsing occurs in a single pass per string during comparison.[2][7] If one string exhausts its segments before the other, the shorter string is deemed smaller, providing a natural tiebreaker based on length. For equal segments up to that point, the comparison continues with any remaining segments. Here is a representative pseudocode implementation of the core comparison function:function natural_compare(s1, s2):
i = 0 # index for s1
j = 0 # index for s2
while i < len(s1) and j < len(s2):
if s1[i].isdigit() and s2[j].isdigit():
function natural_compare(s1, s2):
i = 0 # index for s1
j = 0 # index for s2
while i < len(s1) and j < len(s2):
if s1[i].isdigit() and s2[j].isdigit():
# Extract numeric segments
num1 = extract_numeric_segment(s1, i)
num2 = extract_numeric_segment(s2, j)
# Compare numerically by padding shorter with leading zeros and string compare
len1 = len(num1)
len2 = len(num2)
if len1 < len2:
num1_padded = num1.zfill(len2)
num2_padded = num2
else:
num1_padded = num1
num2_padded = num2.zfill(len1)
if num1_padded < num2_padded:
return -1
if num1_padded > num2_padded:
return 1
# Advance indices past the segments
i += len1
j += len2
else:
# Compare non-numeric characters
if s1 < s2:
return -1
if s1 > s2:
return 1
i += 1
j += 1
# Tiebreaker: shorter string first
if i < len(s1):
return 1
if j < len(s2):
return -1
return 0
# Extract numeric segments
num1 = extract_numeric_segment(s1, i)
num2 = extract_numeric_segment(s2, j)
# Compare numerically by padding shorter with leading zeros and string compare
len1 = len(num1)
len2 = len(num2)
if len1 < len2:
num1_padded = num1.zfill(len2)
num2_padded = num2
else:
num1_padded = num1
num2_padded = num2.zfill(len1)
if num1_padded < num2_padded:
return -1
if num1_padded > num2_padded:
return 1
# Advance indices past the segments
i += len1
j += len2
else:
# Compare non-numeric characters
if s1 < s2:
return -1
if s1 > s2:
return 1
i += 1
j += 1
# Tiebreaker: shorter string first
if i < len(s1):
return 1
if j < len(s2):
return -1
return 0
This pseudocode illustrates the chunk extraction and conditional comparisons central to the algorithm.[](https://github.com/sourcefrog/natsort)[](https://www.c64os.com/post/naturalsort)
The time complexity of the core algorithm is O(n), where n is the total length of the strings being compared, as it involves a single linear pass to extract and evaluate segments without recursive or quadratic operations.[](https://github.com/sourcefrog/natsort)
### Handling Edge Cases
Natural sort order algorithms often require adaptations to handle leading zeros in numeric substrings, as these can affect perceived ordering. In the Alphanum algorithm, numbers with leading zeros are initially compared by their string length, potentially placing "0001" after "1" due to the longer chunk, though subsequent implementations have addressed this by stripping leading zeros for numerical comparison while preserving them for string-like treatment when lengths differ non-significantly.[](https://web.archive.org/web/20071211024003/http://www.davekoelle.com/alphanum.html) For instance, "file01" and "file1" may sort equivalently if padding is ignored, treating the numeric parts as 1 in both cases, or as distinct strings if zero-padding is preserved to maintain version-like distinctions.[](https://web.archive.org/web/20071211024003/http://www.davekoelle.com/alphanum.html)
Variable-length numbers pose another challenge, where shorter numbers must be implicitly padded during comparison to ensure numerical rather than lexical ordering. Algorithms like Martin Pool's Natural Order String Comparison achieve this by parsing digits sequentially and aligning them logically, such as treating "9" as equivalent to "09" when compared to "10", resulting in "9" preceding "10".[](http://sourcefrog.net/projects/natsort/) This digit-by-digit comparison extends to large numbers beyond integer limits, avoiding overflow by processing as strings while evaluating value.[](https://web.archive.org/web/20071211024003/http://www.davekoelle.com/alphanum.html)
For non-ASCII characters, natural sorting integrates locale-aware collation to order letters appropriately while maintaining numeric handling. Libraries such as Python's natsort use flags like `ns.LOCALE` to apply system locale rules for non-numeric parts, enabling proper sorting of accented characters (e.g., "café" after "cafe" in French locale) alongside numerical chunks, though full support often requires extensions like PyICU to handle UTF-8 non-ASCII correctly without fallback to ASCII order.[](https://natsort.readthedocs.io/en/7.1.1/locale_issues.html) Without such integration, non-ASCII elements default to byte-wise comparison, leading to unnatural results.[](https://natsort.readthedocs.io/en/7.1.1/locale_issues.html)
Special cases include empty strings, which typically sort to the beginning as the shortest element; all-numeric strings, treated entirely as numbers for direct numerical evaluation; and mixed case sensitivity, where algorithms default to case-sensitive ordering but can toggle to insensitive via flags (e.g., PHP's `natcasesort` uppercases both strings before comparison).[](https://www.php.net/manual/en/function.natsort.php) These adjustments ensure robustness, such as placing "" before "a1" or sorting "A10" before "a2" in case-insensitive modes.[](https://www.php.net/manual/en/function.natsort.php)
Performance considerations emphasize efficiency in edge-heavy datasets, avoiding unnecessary string-to-number conversions by parsing on-the-fly with state machines or simple loops rather than regex, which can introduce overhead. Pool's implementation scans each character at most once for linear time complexity, suitable for large lists.[](http://sourcefrog.net/projects/natsort/)
To incorporate these, the core algorithm can be adjusted with flags for zero-padding and locale support, as in this pseudocode extension:
This pseudocode illustrates the chunk extraction and conditional comparisons central to the algorithm.[](https://github.com/sourcefrog/natsort)[](https://www.c64os.com/post/naturalsort)
The time complexity of the core algorithm is O(n), where n is the total length of the strings being compared, as it involves a single linear pass to extract and evaluate segments without recursive or quadratic operations.[](https://github.com/sourcefrog/natsort)
### Handling Edge Cases
Natural sort order algorithms often require adaptations to handle leading zeros in numeric substrings, as these can affect perceived ordering. In the Alphanum algorithm, numbers with leading zeros are initially compared by their string length, potentially placing "0001" after "1" due to the longer chunk, though subsequent implementations have addressed this by stripping leading zeros for numerical comparison while preserving them for string-like treatment when lengths differ non-significantly.[](https://web.archive.org/web/20071211024003/http://www.davekoelle.com/alphanum.html) For instance, "file01" and "file1" may sort equivalently if padding is ignored, treating the numeric parts as 1 in both cases, or as distinct strings if zero-padding is preserved to maintain version-like distinctions.[](https://web.archive.org/web/20071211024003/http://www.davekoelle.com/alphanum.html)
Variable-length numbers pose another challenge, where shorter numbers must be implicitly padded during comparison to ensure numerical rather than lexical ordering. Algorithms like Martin Pool's Natural Order String Comparison achieve this by parsing digits sequentially and aligning them logically, such as treating "9" as equivalent to "09" when compared to "10", resulting in "9" preceding "10".[](http://sourcefrog.net/projects/natsort/) This digit-by-digit comparison extends to large numbers beyond integer limits, avoiding overflow by processing as strings while evaluating value.[](https://web.archive.org/web/20071211024003/http://www.davekoelle.com/alphanum.html)
For non-ASCII characters, natural sorting integrates locale-aware collation to order letters appropriately while maintaining numeric handling. Libraries such as Python's natsort use flags like `ns.LOCALE` to apply system locale rules for non-numeric parts, enabling proper sorting of accented characters (e.g., "café" after "cafe" in French locale) alongside numerical chunks, though full support often requires extensions like PyICU to handle UTF-8 non-ASCII correctly without fallback to ASCII order.[](https://natsort.readthedocs.io/en/7.1.1/locale_issues.html) Without such integration, non-ASCII elements default to byte-wise comparison, leading to unnatural results.[](https://natsort.readthedocs.io/en/7.1.1/locale_issues.html)
Special cases include empty strings, which typically sort to the beginning as the shortest element; all-numeric strings, treated entirely as numbers for direct numerical evaluation; and mixed case sensitivity, where algorithms default to case-sensitive ordering but can toggle to insensitive via flags (e.g., PHP's `natcasesort` uppercases both strings before comparison).[](https://www.php.net/manual/en/function.natsort.php) These adjustments ensure robustness, such as placing "" before "a1" or sorting "A10" before "a2" in case-insensitive modes.[](https://www.php.net/manual/en/function.natsort.php)
Performance considerations emphasize efficiency in edge-heavy datasets, avoiding unnecessary string-to-number conversions by parsing on-the-fly with state machines or simple loops rather than regex, which can introduce overhead. Pool's implementation scans each character at most once for linear time complexity, suitable for large lists.[](http://sourcefrog.net/projects/natsort/)
To incorporate these, the core algorithm can be adjusted with flags for zero-padding and locale support, as in this pseudocode extension:
This adds optional stripping for leading zeros and locale-based string comparison, building on standard tokenization.[](http://sourcefrog.net/projects/natsort/)[](https://natsort.readthedocs.io/en/7.1.1/locale_issues.html)
## Examples and Applications
### Illustrative Examples
To illustrate [natural sort order](/page/Natural_sort_order), consider a simple list of strings containing alphanumeric elements: ["apple2", "apple10", "apple1"]. In lexicographical sorting, this would order as ["apple1", "apple10", "apple2"] because "1" precedes "2" after the common prefix "apple", and "10" is treated as starting with "1". However, natural sorting chunks the strings into alternating non-numeric and numeric segments, comparing non-numeric parts lexicographically and numeric parts numerically. For these strings, the chunks are: "apple" (non-numeric) followed by "2" (numeric) for the first; "apple" + "10" (numeric) for the second; "apple" + "1" (numeric) for the third. Numerically, 1 < 2 < 10, yielding the output ["apple1", "apple2", "apple10"], which aligns with intuitive human expectations.[](https://www.php.net/manual/en/function.strnatcmp.php)
Version numbering provides another clear demonstration, where strings like ["v1.10", "v1.2", "v2.0"] are sorted. Natural order splits by delimiters such as dots to parse multi-level numeric segments, treating each as a separate numeric chunk while handling non-numeric parts (e.g., "v") lexicographically. Chunks for the first are "v" + "1" (numeric) + "." + "10" (numeric); for the second, "v" + "1" + "." + "2"; for the third, "v" + "2" + "." + "0". Comparing level-by-level: the first two share "v1.", but 2 < 10, so "v1.2" precedes "v1.10"; then "v2.0" follows as 2 > 1, resulting in ["v1.2", "v1.10", "v2.0"]. This avoids the lexicographical result ["v1.10", "v1.2", "v2.0"], where "1.10" sorts before "1.2" due to the "0" in "10" following "2".[](https://github.com/sourcefrog/natsort)
File names often mix text and numbers, as in ["report10.pdf", "report2.pdf", "report100.pdf"]. Natural sorting chunks "report" (non-numeric), followed by the numeric part ("10", "2", "100"), and ".pdf" (non-numeric). Numerically ordering the middle chunks (2 < 10 < 100) produces ["report2.pdf", "report10.pdf", "report100.pdf"], contrasting with lexicographical order ["report10.pdf", "report100.pdf", "report2.pdf"], where "10" and "100" precede "2" due to the initial "1". This logical progression mirrors how users expect files to appear in directories.[](https://www.php.net/manual/en/function.strnatcmp.php)
The following table summarizes additional examples, showing input lists, natural sort outputs, lexicographical outputs, and key chunking insights:
| Input List | Natural Sort Output | Lexicographical Output | Key Chunk Insights |
|-----------------------------|----------------------------------|----------------------------------|--------------------|
| ["a2", "a10", "a1"] | ["a1", "a2", "a10"] | ["a1", "a10", "a2"] | Chunks: "a" (non-numeric) + numeric ("1" < "2" < "10"); numbers parsed as integers.[](https://github.com/sourcefrog/natsort) |
| ["rfc1.txt", "rfc822.txt", "rfc2086.txt"] | ["rfc1.txt", "rfc822.txt", "rfc2086.txt"] | ["rfc1.txt", "rfc2086.txt", "rfc822.txt"] | Chunks: "rfc" + numeric ("1" < "822" < "2086") + ".txt"; full numbers compared numerically.[](https://github.com/sourcefrog/natsort) |
| ["img12.png", "img10.png", "img2.png", "img1.png"] | ["img1.png", "img2.png", "img10.png", "img12.png"] | ["img1.png", "img10.png", "img12.png", "img2.png"] | Chunks: "img" + numeric ("1" < "2" < "10" < "12") + ".png"; prevents "10" from sorting before "2".[](https://www.php.net/manual/en/function.strnatcmp.php) |
| ["x2-g8", "x2-y08", "x2-y7"] | ["x2-g8", "x2-y7", "x2-y08"] | ["x2-g8", "x2-y08", "x2-y7"] | Chunks: "x" + "2" (numeric) + "-" + non-numeric ("g8", "y") + numeric ("8", "7", "08"); "7" < "08" numerically, ignoring leading zeros in comparison.[](https://github.com/sourcefrog/natsort) |
For a visual representation of chunking in natural sorting, consider the string "apple10". The process can be depicted as a simple flowchart:
This adds optional stripping for leading zeros and locale-based string comparison, building on standard tokenization.[](http://sourcefrog.net/projects/natsort/)[](https://natsort.readthedocs.io/en/7.1.1/locale_issues.html)
## Examples and Applications
### Illustrative Examples
To illustrate [natural sort order](/page/Natural_sort_order), consider a simple list of strings containing alphanumeric elements: ["apple2", "apple10", "apple1"]. In lexicographical sorting, this would order as ["apple1", "apple10", "apple2"] because "1" precedes "2" after the common prefix "apple", and "10" is treated as starting with "1". However, natural sorting chunks the strings into alternating non-numeric and numeric segments, comparing non-numeric parts lexicographically and numeric parts numerically. For these strings, the chunks are: "apple" (non-numeric) followed by "2" (numeric) for the first; "apple" + "10" (numeric) for the second; "apple" + "1" (numeric) for the third. Numerically, 1 < 2 < 10, yielding the output ["apple1", "apple2", "apple10"], which aligns with intuitive human expectations.[](https://www.php.net/manual/en/function.strnatcmp.php)
Version numbering provides another clear demonstration, where strings like ["v1.10", "v1.2", "v2.0"] are sorted. Natural order splits by delimiters such as dots to parse multi-level numeric segments, treating each as a separate numeric chunk while handling non-numeric parts (e.g., "v") lexicographically. Chunks for the first are "v" + "1" (numeric) + "." + "10" (numeric); for the second, "v" + "1" + "." + "2"; for the third, "v" + "2" + "." + "0". Comparing level-by-level: the first two share "v1.", but 2 < 10, so "v1.2" precedes "v1.10"; then "v2.0" follows as 2 > 1, resulting in ["v1.2", "v1.10", "v2.0"]. This avoids the lexicographical result ["v1.10", "v1.2", "v2.0"], where "1.10" sorts before "1.2" due to the "0" in "10" following "2".[](https://github.com/sourcefrog/natsort)
File names often mix text and numbers, as in ["report10.pdf", "report2.pdf", "report100.pdf"]. Natural sorting chunks "report" (non-numeric), followed by the numeric part ("10", "2", "100"), and ".pdf" (non-numeric). Numerically ordering the middle chunks (2 < 10 < 100) produces ["report2.pdf", "report10.pdf", "report100.pdf"], contrasting with lexicographical order ["report10.pdf", "report100.pdf", "report2.pdf"], where "10" and "100" precede "2" due to the initial "1". This logical progression mirrors how users expect files to appear in directories.[](https://www.php.net/manual/en/function.strnatcmp.php)
The following table summarizes additional examples, showing input lists, natural sort outputs, lexicographical outputs, and key chunking insights:
| Input List | Natural Sort Output | Lexicographical Output | Key Chunk Insights |
|-----------------------------|----------------------------------|----------------------------------|--------------------|
| ["a2", "a10", "a1"] | ["a1", "a2", "a10"] | ["a1", "a10", "a2"] | Chunks: "a" (non-numeric) + numeric ("1" < "2" < "10"); numbers parsed as integers.[](https://github.com/sourcefrog/natsort) |
| ["rfc1.txt", "rfc822.txt", "rfc2086.txt"] | ["rfc1.txt", "rfc822.txt", "rfc2086.txt"] | ["rfc1.txt", "rfc2086.txt", "rfc822.txt"] | Chunks: "rfc" + numeric ("1" < "822" < "2086") + ".txt"; full numbers compared numerically.[](https://github.com/sourcefrog/natsort) |
| ["img12.png", "img10.png", "img2.png", "img1.png"] | ["img1.png", "img2.png", "img10.png", "img12.png"] | ["img1.png", "img10.png", "img12.png", "img2.png"] | Chunks: "img" + numeric ("1" < "2" < "10" < "12") + ".png"; prevents "10" from sorting before "2".[](https://www.php.net/manual/en/function.strnatcmp.php) |
| ["x2-g8", "x2-y08", "x2-y7"] | ["x2-g8", "x2-y7", "x2-y08"] | ["x2-g8", "x2-y08", "x2-y7"] | Chunks: "x" + "2" (numeric) + "-" + non-numeric ("g8", "y") + numeric ("8", "7", "08"); "7" < "08" numerically, ignoring leading zeros in comparison.[](https://github.com/sourcefrog/natsort) |
For a visual representation of chunking in natural sorting, consider the string "apple10". The process can be depicted as a simple flowchart:
This flowchart outlines the sequential parsing into non-numeric ("apple") and numeric ("10") chunks for comparison against other strings.[](https://www.php.net/manual/en/function.strnatcmp.php)
### Practical Uses
Natural sort order finds widespread application in file systems and graphical explorers, where it ensures that filenames containing numbers are ordered intuitively. In Windows Explorer, this is achieved through the StrCmpLogicalW function, which compares strings by treating embedded digits as numerical values rather than text, preventing sequences like "file2" from appearing after "file10".[](https://learn.microsoft.com/en-us/windows/win32/api/shlwapi/nf-shlwapi-strcmplogicalw)[](https://stackoverflow.com/questions/442429/windows-explorer-sort-method) Similarly, macOS Finder employs natural sorting for directory listings, sorting files such as sequentially numbered images (e.g., "image1.jpg", "image2.jpg", "image10.jpg") in numerical progression to align with user expectations.[](https://stackoverflow.com/questions/71121070/mac-os-finder-sorting)[](https://discussions.apple.com/thread/1431921)
In databases and querying systems, natural sort order enhances the organization of results for alphanumeric identifiers. PostgreSQL supports this via extensions like pg_natural_sort_order, which enables custom ORDER BY clauses to handle mixed text and numbers in ID columns, producing more readable outputs for user-facing reports such as inventory lists.[](https://github.com/Zeleo/pg_natural_sort_order) In MySQL, natural sorting is implemented by combining LENGTH() with the column in ORDER BY (e.g., ORDER BY LENGTH(id_column), id_column), effectively grouping strings by length before alphabetical ordering, which is particularly useful for querying product IDs or version numbers.[](https://www.copterlabs.com/natural-sorting-in-mysql/)[](https://stackoverflow.com/questions/153633/natural-sort-in-mysql) This approach improves the relevance of sorted query results in applications like customer reports.
User interfaces in domains such as e-commerce leverage natural sort order to present lists that match human intuition, avoiding disruptions in navigation. For instance, product catalogs with alphanumeric SKUs (e.g., "Part-A1", "Part-A2", "Part-A10") are sorted to place "Part-A10" after "Part-A2", facilitating easier browsing and selection in online stores.[](https://baymard.com/blog/essential-sort-types)[](https://ux.stackexchange.com/questions/40726/default-sort-order-of-a-product-list) Software version histories and directory listings also benefit, as natural ordering ensures chronological or logical progression without misleading placements.
In data processing workflows, natural sort order is essential for managing mixed alphanumeric datasets. ETL pipelines apply it during transformation stages to correctly sequence part numbers or codes (e.g., "Component-001", "Component-010"), preventing errors in downstream analysis or reporting.[](https://stackoverflow.com/questions/69355228/natural-sorting-when-characters-and-numbers-mixed) In spreadsheets like Microsoft Excel, custom sorts approximate natural order by creating helper columns to separate text and numeric components, then sorting hierarchically; this handles mixed data such as inventory tags effectively.[](https://www.ablebits.com/office-addins-blog/sort-mixed-multilevel-numbers-text-excel/)[](https://support.microsoft.com/en-us/office/sort-data-in-a-range-or-table-in-excel-62d0b95d-2a90-4610-a6ae-2e545c4a4654)
The practical benefits of natural sort order include reduced user confusion and improved efficiency in everyday interactions. A common case study is media file organization, where lexicographical sorting might place "episode10" before "episode2", but natural ordering corrects this to follow human sequencing, minimizing errors in playback or archiving.[](https://blog.codinghorror.com/sorting-for-humans-natural-sort-order/)
Despite these advantages, natural sort order has limitations, particularly in performance-critical environments. It incurs higher computational overhead than basic lexicographical sorting due to the need for string parsing and numerical extraction during comparisons, which can slow down operations on large-scale datasets.[](https://dev.to/grahamthedev/alphanumeric-natural-sort-in-mysql-30-years-and-we-still-cant-do-this-402c) Additionally, it is not the default behavior in many tools, requiring custom implementations or extensions that may introduce compatibility issues.[](https://stackoverflow.com/questions/9173558/postgresql-order-by-issue-natural-sort)
## Programming Implementations
### In Python
Python's built-in `sorted()` function and `list.sort()` method perform lexicographical sorting by default, which treats strings as sequences of characters based on their Unicode code points, leading to counterintuitive results for human-readable data containing embedded numbers, such as sorting "file10.txt" after "file2.txt". To implement natural sort order, developers must provide a custom key function that tokenizes strings into comparable chunks, separating alphanumeric and numeric parts for numerical comparison of digits.
A common standard library approach uses the `re` module for regex-based splitting combined with the `key` parameter of `sorted()`. The function `natural_key` processes each string by splitting it on sequences of one or more digits using the pattern `r'(\d+)'`, which captures numeric groups while preserving non-numeric parts as strings; numeric chunks are then converted to integers for proper ordering. This method can be applied as follows:
```python
import re
def natural_key(s):
return [int(c) if c.isdigit() else c for c in re.split(r'(\d+)', s)]
This flowchart outlines the sequential parsing into non-numeric ("apple") and numeric ("10") chunks for comparison against other strings.[](https://www.php.net/manual/en/function.strnatcmp.php)
### Practical Uses
Natural sort order finds widespread application in file systems and graphical explorers, where it ensures that filenames containing numbers are ordered intuitively. In Windows Explorer, this is achieved through the StrCmpLogicalW function, which compares strings by treating embedded digits as numerical values rather than text, preventing sequences like "file2" from appearing after "file10".[](https://learn.microsoft.com/en-us/windows/win32/api/shlwapi/nf-shlwapi-strcmplogicalw)[](https://stackoverflow.com/questions/442429/windows-explorer-sort-method) Similarly, macOS Finder employs natural sorting for directory listings, sorting files such as sequentially numbered images (e.g., "image1.jpg", "image2.jpg", "image10.jpg") in numerical progression to align with user expectations.[](https://stackoverflow.com/questions/71121070/mac-os-finder-sorting)[](https://discussions.apple.com/thread/1431921)
In databases and querying systems, natural sort order enhances the organization of results for alphanumeric identifiers. PostgreSQL supports this via extensions like pg_natural_sort_order, which enables custom ORDER BY clauses to handle mixed text and numbers in ID columns, producing more readable outputs for user-facing reports such as inventory lists.[](https://github.com/Zeleo/pg_natural_sort_order) In MySQL, natural sorting is implemented by combining LENGTH() with the column in ORDER BY (e.g., ORDER BY LENGTH(id_column), id_column), effectively grouping strings by length before alphabetical ordering, which is particularly useful for querying product IDs or version numbers.[](https://www.copterlabs.com/natural-sorting-in-mysql/)[](https://stackoverflow.com/questions/153633/natural-sort-in-mysql) This approach improves the relevance of sorted query results in applications like customer reports.
User interfaces in domains such as e-commerce leverage natural sort order to present lists that match human intuition, avoiding disruptions in navigation. For instance, product catalogs with alphanumeric SKUs (e.g., "Part-A1", "Part-A2", "Part-A10") are sorted to place "Part-A10" after "Part-A2", facilitating easier browsing and selection in online stores.[](https://baymard.com/blog/essential-sort-types)[](https://ux.stackexchange.com/questions/40726/default-sort-order-of-a-product-list) Software version histories and directory listings also benefit, as natural ordering ensures chronological or logical progression without misleading placements.
In data processing workflows, natural sort order is essential for managing mixed alphanumeric datasets. ETL pipelines apply it during transformation stages to correctly sequence part numbers or codes (e.g., "Component-001", "Component-010"), preventing errors in downstream analysis or reporting.[](https://stackoverflow.com/questions/69355228/natural-sorting-when-characters-and-numbers-mixed) In spreadsheets like Microsoft Excel, custom sorts approximate natural order by creating helper columns to separate text and numeric components, then sorting hierarchically; this handles mixed data such as inventory tags effectively.[](https://www.ablebits.com/office-addins-blog/sort-mixed-multilevel-numbers-text-excel/)[](https://support.microsoft.com/en-us/office/sort-data-in-a-range-or-table-in-excel-62d0b95d-2a90-4610-a6ae-2e545c4a4654)
The practical benefits of natural sort order include reduced user confusion and improved efficiency in everyday interactions. A common case study is media file organization, where lexicographical sorting might place "episode10" before "episode2", but natural ordering corrects this to follow human sequencing, minimizing errors in playback or archiving.[](https://blog.codinghorror.com/sorting-for-humans-natural-sort-order/)
Despite these advantages, natural sort order has limitations, particularly in performance-critical environments. It incurs higher computational overhead than basic lexicographical sorting due to the need for string parsing and numerical extraction during comparisons, which can slow down operations on large-scale datasets.[](https://dev.to/grahamthedev/alphanumeric-natural-sort-in-mysql-30-years-and-we-still-cant-do-this-402c) Additionally, it is not the default behavior in many tools, requiring custom implementations or extensions that may introduce compatibility issues.[](https://stackoverflow.com/questions/9173558/postgresql-order-by-issue-natural-sort)
## Programming Implementations
### In Python
Python's built-in `sorted()` function and `list.sort()` method perform lexicographical sorting by default, which treats strings as sequences of characters based on their Unicode code points, leading to counterintuitive results for human-readable data containing embedded numbers, such as sorting "file10.txt" after "file2.txt". To implement natural sort order, developers must provide a custom key function that tokenizes strings into comparable chunks, separating alphanumeric and numeric parts for numerical comparison of digits.
A common standard library approach uses the `re` module for regex-based splitting combined with the `key` parameter of `sorted()`. The function `natural_key` processes each string by splitting it on sequences of one or more digits using the pattern `r'(\d+)'`, which captures numeric groups while preserving non-numeric parts as strings; numeric chunks are then converted to integers for proper ordering. This method can be applied as follows:
```python
import re
def natural_key(s):
return [int(c) if c.isdigit() else c for c in re.split(r'(\d+)', s)]
Example usage
files = ['file2.txt', 'file10.txt', 'file1.txt'] sorted_files = sorted(files, key=natural_key)Result: ['file1.txt', 'file2.txt', 'file10.txt']
The regex pattern `r'(\d+)'` ensures that digits are isolated as capturable groups, allowing the split to produce alternating non-numeric and numeric tokens, which are then normalized into a tuple-like list for comparison during sorting. This avoids the need for `functools.cmp_to_key`, as the key function suffices for most cases and is more efficient.
For more robust and flexible natural sorting, third-party libraries like `natsort` extend this concept with optimized implementations. The `natsort` module, available via PyPI, provides the `natsorted()` function, which handles numbers as integers or floats, supports version sorting, and manages signed numbers and decimals. An example is:
```python
from natsort import natsorted
files = ['a10', 'a2', 'a1']
sorted_files = natsorted(files)
# Result: ['a1', 'a2', 'a10']
The regex pattern `r'(\d+)'` ensures that digits are isolated as capturable groups, allowing the split to produce alternating non-numeric and numeric tokens, which are then normalized into a tuple-like list for comparison during sorting. This avoids the need for `functools.cmp_to_key`, as the key function suffices for most cases and is more efficient.
For more robust and flexible natural sorting, third-party libraries like `natsort` extend this concept with optimized implementations. The `natsort` module, available via PyPI, provides the `natsorted()` function, which handles numbers as integers or floats, supports version sorting, and manages signed numbers and decimals. An example is:
```python
from natsort import natsorted
files = ['a10', 'a2', 'a1']
sorted_files = natsorted(files)
# Result: ['a1', 'a2', 'a10']
natsort employs regex to parse strings into numeric and non-numeric components, similar to the standard approach but with additional features like recursive handling for nested data structures.
Regarding performance, the regex-based key function adds computational overhead compared to plain lexicographical sorting due to repeated splitting and type conversions; benchmarks on lists of 1000 alphanumeric strings show it can be 25-50 times slower than native sorted(), though this is acceptable for most applications where correctness outweighs speed. Libraries like natsort mitigate this with caching and optimized parsing, reducing the gap for repeated sorts. Python's sorted() has been stable—preserving relative order of equal elements—since version 2.2, with explicit documentation of this guarantee strengthening in Python 3.x releases, including 3.6 onward, which also benefited from broader performance optimizations in the Timsort algorithm.
In Java and Other Languages
In PHP, natural sort order is supported natively through functions likenatsort() and strnatcmp(), introduced in PHP 4. To sort an array while preserving key-value associations, natsort() can be used directly:
$files = ['file2.txt', 'file10.txt', 'file1.txt'];
natsort($files);
// Result: ['file1.txt', 'file2.txt', 'file10.txt']
$files = ['file2.txt', 'file10.txt', 'file1.txt'];
natsort($files);
// Result: ['file1.txt', 'file2.txt', 'file10.txt']
strnatcmp() function provides a comparison callback for custom sorting, handling numeric segments as integers. These functions follow the natural order algorithm similar to Martin Pool's implementation and are efficient for typical use cases.
In Java, natural sort order for strings is typically implemented using a custom Comparator<String> that parses input strings to distinguish between textual and numeric segments, comparing the former lexicographically and the latter numerically to avoid issues like "file10" sorting before "file2". This approach often involves iterating over characters or using regular expressions to extract chunks, then applying Integer.parseInt to numeric portions for integer comparison. For example, the following comparator splits strings at transitions between letters and digits, compares non-numeric parts as strings, and numeric parts as integers:
import java.util.*;
public class NaturalSortComparator implements Comparator<String> {
public static int comparator(String s1, String s2) {
String[] pt1 = s1.split("((?<=[a-zA-Z])(?=[0-9]))|((?<=[0-9])(?=[a-zA-Z]))");
String[] pt2 = s2.split("((?<=[a-zA-Z])(?=[0-9]))|((?<=[0-9])(?=[a-zA-Z]))");
if (Arrays.equals(pt1, pt2)) return 0;
for (int i = 0; i < Math.min(pt1.length, pt2.length); i++) {
if (!pt1[i].equals(pt2[i])) {
if (!isNumber(pt1[i]) || !isNumber(pt2[i])) {
return pt1[i].compareTo(pt2[i]) > 0 ? 1 : -1;
} else {
int nu1 = Integer.parseInt(pt1[i]);
int nu2 = Integer.parseInt(pt2[i]);
return Integer.compare(nu1, nu2);
}
}
}
return pt1.length > pt2.length ? 1 : -1;
}
private static boolean isNumber(String s) {
return s.matches("-?\\d+");
}
@Override
public int compare(String s1, String s2) {
return comparator(s1, s2);
}
}
import java.util.*;
public class NaturalSortComparator implements Comparator<String> {
public static int comparator(String s1, String s2) {
String[] pt1 = s1.split("((?<=[a-zA-Z])(?=[0-9]))|((?<=[0-9])(?=[a-zA-Z]))");
String[] pt2 = s2.split("((?<=[a-zA-Z])(?=[0-9]))|((?<=[0-9])(?=[a-zA-Z]))");
if (Arrays.equals(pt1, pt2)) return 0;
for (int i = 0; i < Math.min(pt1.length, pt2.length); i++) {
if (!pt1[i].equals(pt2[i])) {
if (!isNumber(pt1[i]) || !isNumber(pt2[i])) {
return pt1[i].compareTo(pt2[i]) > 0 ? 1 : -1;
} else {
int nu1 = Integer.parseInt(pt1[i]);
int nu2 = Integer.parseInt(pt2[i]);
return Integer.compare(nu1, nu2);
}
}
}
return pt1.length > pt2.length ? 1 : -1;
}
private static boolean isNumber(String s) {
return s.matches("-?\\d+");
}
@Override
public int compare(String s1, String s2) {
return comparator(s1, s2);
}
}
Arrays.sort(files, new NaturalSortComparator()); for an array of filenames. Another widely adopted implementation is the AlphanumComparator, based on Dave Koelle's algorithm, which iterates through characters while switching between alphanumeric and numeric modes without regex for better performance, comparing digits chunk-by-chunk to handle variable lengths. Utility libraries provide pre-built comparators; for instance, the AlphanumComparator from open-source projects like JMRI offers a ready-to-use class for natural sorting, supporting case sensitivity options.
In JavaScript, natural sorting is commonly achieved by customizing Array.prototype.sort() with a comparison function that uses regex to pad numeric segments to a fixed length (e.g., 10 digits) with leading zeros, enabling lexical comparison to mimic numerical order for typical use cases. For example:
const naturalSort = (a, b) => {
const re = /(\d+)/g;
return a.replace(re, (n) => n.padStart(10, '0')).localeCompare(b.replace(re, (n) => n.padStart(10, '0')));
};
['file10', 'file2'].sort(naturalSort); // ['file2', 'file10']
const naturalSort = (a, b) => {
const re = /(\d+)/g;
return a.replace(re, (n) => n.padStart(10, '0')).localeCompare(b.replace(re, (n) => n.padStart(10, '0')));
};
['file10', 'file2'].sort(naturalSort); // ['file2', 'file10']
IComparer<string> is implemented similarly to Java, often using LINQ's OrderBy with the comparer for collections. The NaturalStringComparer class, for instance, iterates over characters to extract and compare numeric chunks numerically. Usage example: list.OrderBy(x => x, new NaturalStringComparer()).ToList();. Libraries like NaturalSort.Extension add extension methods for string comparisons supporting this order.
For SQL databases, natural sorting varies by system. In MySQL, it lacks a built-in function but can be approximated via extensions or user-defined functions (UDFs); for example, a custom NATURAL_SORT UDF might use string functions to separate and compare parts, as in SELECT * FROM files ORDER BY NATURAL_SORT(filename);, though standard queries often rely on regex replacements in ORDER BY clauses for basic cases. In contrast, MariaDB provides a built-in natural_sort_key() function for generating sort keys suitable for indexing, available since version 10.7 (released in 2022).
Across languages, implementations commonly rely on either regex-based splitting/padding for simplicity or manual character iteration with mode-switching for efficiency and precision. The table below compares representative syntax snippets:
| Language | Key Snippet Example |
|---|---|
| Java | Comparator<String> cmp = (s1, s2) -> { /* split and [Integer](/page/Integer).parseInt chunks */ }; Arrays.sort(arr, cmp); |
| JavaScript | arr.sort((a, b) => a.replace(/\d+/g, n => n.padStart(10, '0')).localeCompare(b.replace(/\d+/g, n => n.padStart(10, '0')))); |
| C# | list.OrderBy(x => x, new NaturalStringComparer()); where comparer iterates chars and compares numerically. |
Comparator implementations highlights challenges like string immutability, which necessitates creating temporary parsed objects, contrasting with more flexible key-extraction in dynamic languages. For internationalization, the International Components for Unicode (ICU) library is widely adopted across Java, JavaScript (via Intl), and C#, providing numeric collation attributes that enable natural sorting of mixed alphanumeric content in locale-aware manners, such as treating digits uniformly regardless of script.