Hubbry Logo
Natural sort orderNatural sort orderMain
Open search
Natural sort order
Community hub
Natural sort order
logo
7 pages, 0 posts
0 subscribers
Be the first to start a discussion here.
Be the first to start a discussion here.
Natural sort order
Natural sort order
from Wikipedia

In computing, natural sort order (or natural sorting) is a way of ordering strings that treats embedded numbers as whole numerical values rather than sequences of characters.

While standard alphabetical order compares strings character-by-character (where "10" sorts before "2" because "1" is less than "2"), natural sort order orders them by magnitude of the number, placing "2" before "10".

Natural sort order is designed to address the shortcoming of standard lexicographical order, which often produces counter-intuitive results for humans when dealing with numbered lists, filenames, or version numbers.[1]

Natural Lexicographical
1.jpg

2.jpg

3.jpg

9.jpg

10.jpg

11.jpg

1.jpg

10.jpg

11.jpg

2.jpg

3.jpg

9.jpg

Problem with standard sorting

[edit]

In standard alphabetical (lexicographical) sorting, strings are compared character by character from left to right. This causes numbers to be sorted based on the value of their first digit, rather than their whole numerical value.

For example, a computer using standard sorting will place the string "11" before "2". This occurs because the character "1" (the first digit of 11) has a lower code value than "2". While mathematically correct in terms of character codes, this ordering disrupts the logical sequence expected by users, particularly in file management and data lists.

Operation

[edit]

Natural sorting algorithms generally operate by splitting strings into "chunks" of text and numbers.

  • Text chunks are compared alphabetically (often case-insensitively).
  • Numeric chunks are parsed into integer values and compared numerically.

Comparison of algorithms

[edit]
Standard Sorting (Lexicographical) Natural Sorting
file1.txt

file10.txt

file11.txt

file12.txt

file2.txt

file20.txt

file3.txt

file1.txt

file2.txt

file3.txt

file10.txt

file11.txt

file12.txt

file20.txt

Handling edge cases

[edit]

Different implementations of natural sort may handle edge cases differently:

  • Leading zeros: Some algorithms treat "01" and "1" as identical, while others may enforce an ordering where "01" follows "1" (or vice versa) to ensure a deterministic sort.
  • Whitespace: Most implementations ignore leading or trailing whitespace around the numbers to prevent sorting anomalies.
  • Decimals and Version Numbers: A variation of natural sort, often called version sort, is designed to handle multiple numeric segments separated by dots (e.g., 1.2.10 vs 1.2.2). In standard sort, 1.2.10 precedes 1.2.2; in version sort, the segments are parsed individually, correctly placing 1.2.10 after 1.2.2.

History and implementations

[edit]

Functionality to sort by natural sort order is now widely available in software libraries for many programming languages and operating systems.

The concept gained significant visibility in the Macintosh community. During the 1996 MacHack conference, the Natural Order Mac OS System Extension was conceived and implemented overnight as an entry for the Best Hack contest.[2][3] Subsequently, Dave Koelle published the "Alphanum Algorithm" in 1997,[4] a popular reference implementation that influenced many later libraries. Martin Pool published "Natural Order String Comparison" in 2000.[5]

Modern implementations include:

  • PHP: The natsort() function is built into the standard library.[6]
  • Python: The natsort library is a widely used third-party package.[7]
  • Perl: The module Sort::Naturally is available via CPAN.[8]
  • Unix/Linux: The GNU ls and sort commands support natural sorting via the -v (version sort) flag (for ls) or the -V flag (for sort).[9]
  • .NET/C#: Various extensions exist, such as NaturalSort.Extension.[10]

File managers such as Windows Explorer (since Windows XP) and Midnight Commander utilize natural sorting by default to display file lists.

See also

[edit]

References

[edit]
Revisions and contributorsEdit on WikipediaRead on Wikipedia
from Grokipedia
Natural sort order is a string sorting algorithm designed to order alphanumeric text in a way that reflects human expectations, treating contiguous sequences of digits as numerical values for comparison rather than as individual characters in lexicographical sequence. This method ensures that items like filenames or version numbers are arranged intuitively, such as placing "file2" before "file10" instead of after it. In contrast to standard lexicographical sorting, which relies solely on character-by-character ASCII or code point comparisons, natural sort order parses strings into alternating chunks of non-numeric text and numbers, comparing text portions alphabetically and numeric portions by their or floating-point values. For example, consider the following lists: Lexicographical sort:

img1.png img10.png img12.png img2.png

img1.png img10.png img12.png img2.png

Natural sort:

img1.png img2.png img10.png img12.png

img1.png img2.png img10.png img12.png

The algorithm typically handles leading zeros minimally (ignoring them for numeric value but not as separators), supports decimal fractions in some implementations, and maintains linear by scanning each string once until a decisive difference is found. It is particularly useful in file systems, , and user interfaces where mixed text and numbers are common, such as sorting directory listings or query results. Natural sorting has been integrated into numerous programming languages and tools, including PHP's natsort() and strnatcmp() functions, which maintain key-value associations during array sorting, and MariaDB's natural_sort_key() function for database indexing since version 10.7. One foundational implementation is Martin Pool's Natural Order String Comparison routine, released around 2000, which influenced subsequent adoptions in languages like Python (via libraries such as natsort) and .

Fundamentals

Definition

Natural sort order, also known as natural ordering or human sort, is a method for arranging strings that mimics intuitive human expectations by treating embedded numeric substrings as numerical values rather than sequences of characters. In this approach, alphanumeric strings are sorted such that sequences like "item2" appear before "item10", contrasting with pure character-by-character comparisons that would place "item10" ahead due to the initial '1' versus '2'. This sorting technique was first popularized by Stuart Cheshire in 1996 as a Macintosh system extension to improve file listing usability in the Finder. The key characteristics of natural sort order involve parsing strings into alternating chunks of non-numeric (typically alphabetic) and numeric components. Non-numeric chunks are compared using standard lexicographical ordering, while numeric chunks are interpreted as integers and compared by their numerical value, ignoring leading zeros except as tie-breakers. Non-alphanumeric characters, such as spaces or , are generally preserved in their relative positions and compared lexicographically to maintain the overall string structure. This chunking process ensures that the sorting remains stable and intuitive for mixed-content strings, such as filenames or version numbers. The primary motivation for natural sort order stems from the shortcomings of conventional lexicographical sorting in programming languages and file systems, where character-wise comparison leads to counterintuitive results—for instance, "file10" sorting before "file2" because "1" precedes "2" in ASCII order. By prioritizing numerical interpretation for digit sequences, natural sorting aligns more closely with how humans perceive and expect ordered lists, enhancing in applications like directory listings and data displays. This human-centric design has since been adopted in various software ecosystems, including PHP's strnatcmp function, which implements the core comparison logic. At a basic level, the chunking mechanism in natural sort order can be described as iterating through the string to separate contiguous runs of digits from runs of non-digit characters, then evaluating these segments in sequence during comparison: non-digit runs via string comparison and numeric runs via conversion.

Comparison to Lexicographical Sorting

Lexicographical sorting, also known as dictionary or , compares strings character by character using their ASCII or code points, treating digits as individual characters rather than numerical values. For instance, the strings "10" and "2" are compared such that "10" precedes "2" because the first character '1' ( 49) is less than '2' ( 50). In contrast, natural sort order extends this by identifying and extracting contiguous numeric substrings within the text, interpreting them as integers for comparison while treating non-numeric parts lexicographically. This results in a more intuitive sequence for mixed alphanumeric data, such as ordering "1", "2", "10" as 1 < 2 < 10, rather than the lexicographical 1 < 10 < 2. The primary advantage of natural sorting lies in its alignment with human perceptual ordering, particularly for lists involving version numbers, file names, or identifiers where numerical magnitude should dictate position over character precedence. This enhances readability and minimizes errors in human interpretation of sorted outputs, as demonstrated in applications like sorting filenames such as "rfc1.txt" < "rfc2.txt" < "rfc10.txt". However, natural sorting incurs a slightly higher computational cost than pure lexicographical sorting due to the additional parsing required to separate and numerically evaluate digit sequences, increasing the constant factor in its linear O(n) time complexity per comparison—though both methods share the same asymptotic bounds. It is less suitable for purely textual data lacking numbers, where the simplicity and efficiency of lexicographical comparison suffice without the overhead of chunking. The following table illustrates the differences using a sample list of strings:
Input ListLexicographical OrderNatural Order
["a1", "a10", "a2"]a1, a10, a2a1, a2, a10
In this example, lexicographical sorting places "a10" before "a2" due to the initial '1' versus '2', while natural sorting recognizes "10" as greater than "2", yielding a numerically sensible result.

Algorithms and Implementation

Core Algorithm

The core algorithm for natural sort order tokenizes input strings into alternating segments of non-numeric text and numeric runs, then compares these segments sequentially by type: non-numeric segments via standard lexicographical string comparison, and numeric segments by converting to integers for numerical comparison. This approach ensures that embedded numbers are ordered by their mathematical value rather than character-by-character, addressing the limitations of pure lexicographical sorting where "file10" might precede "file2". The algorithm was inspired by early efforts to improve file sorting in graphical interfaces, such as Stuart Cheshire's Natural Order extension for Macintosh systems. The step-by-step process begins with traversing each character by character to identify contiguous runs: non-digit characters form text segments, while sequences of digits form numeric segments. For each pair of corresponding segments from the two strings being compared, if both are text, they are compared using the locale's standard collation rules. If both are numeric, the digit sequences are compared numerically (discarding leading zeros for the numerical value, though zero-padding can influence ties by treating the padded form as a string indicator of significance in some variants), and compared arithmetically; for instance, "10" is greater than "2" because 10 > 2. When the current characters are not both digits, they are compared as individual characters using standard comparison rules. This parsing occurs in a single pass per string during comparison. If one string exhausts its segments before the other, the shorter string is deemed smaller, providing a natural tiebreaker based on length. For equal segments up to that point, the comparison continues with any remaining segments. Here is a representative pseudocode implementation of the core comparison function:

function natural_compare(s1, s2): i = 0 # index for s1 j = 0 # index for s2 while i < len(s1) and j < len(s2): if s1[i].isdigit() and s2[j].isdigit():

function natural_compare(s1, s2): i = 0 # index for s1 j = 0 # index for s2 while i < len(s1) and j < len(s2): if s1[i].isdigit() and s2[j].isdigit():

# Extract numeric segments num1 = extract_numeric_segment(s1, i) num2 = extract_numeric_segment(s2, j) # Compare numerically by padding shorter with leading zeros and string compare len1 = len(num1) len2 = len(num2) if len1 < len2: num1_padded = num1.zfill(len2) num2_padded = num2 else: num1_padded = num1 num2_padded = num2.zfill(len1) if num1_padded < num2_padded: return -1 if num1_padded > num2_padded: return 1 # Advance indices past the segments i += len1 j += len2 else: # Compare non-numeric characters if s1 < s2: return -1 if s1 > s2: return 1 i += 1 j += 1 # Tiebreaker: shorter string first if i < len(s1): return 1 if j < len(s2): return -1 return 0

# Extract numeric segments num1 = extract_numeric_segment(s1, i) num2 = extract_numeric_segment(s2, j) # Compare numerically by padding shorter with leading zeros and string compare len1 = len(num1) len2 = len(num2) if len1 < len2: num1_padded = num1.zfill(len2) num2_padded = num2 else: num1_padded = num1 num2_padded = num2.zfill(len1) if num1_padded < num2_padded: return -1 if num1_padded > num2_padded: return 1 # Advance indices past the segments i += len1 j += len2 else: # Compare non-numeric characters if s1 < s2: return -1 if s1 > s2: return 1 i += 1 j += 1 # Tiebreaker: shorter string first if i < len(s1): return 1 if j < len(s2): return -1 return 0

function extract_numeric_segment(s, start): end = start while end < len(s) and s[end].isdigit(): end += 1 return s[start:end]

This pseudocode illustrates the chunk extraction and conditional comparisons central to the algorithm.[](https://github.com/sourcefrog/natsort)[](https://www.c64os.com/post/naturalsort) The time complexity of the core algorithm is O(n), where n is the total length of the strings being compared, as it involves a single linear pass to extract and evaluate segments without recursive or quadratic operations.[](https://github.com/sourcefrog/natsort) ### Handling Edge Cases Natural sort order algorithms often require adaptations to handle leading zeros in numeric substrings, as these can affect perceived ordering. In the Alphanum algorithm, numbers with leading zeros are initially compared by their string length, potentially placing "0001" after "1" due to the longer chunk, though subsequent implementations have addressed this by stripping leading zeros for numerical comparison while preserving them for string-like treatment when lengths differ non-significantly.[](https://web.archive.org/web/20071211024003/http://www.davekoelle.com/alphanum.html) For instance, "file01" and "file1" may sort equivalently if padding is ignored, treating the numeric parts as 1 in both cases, or as distinct strings if zero-padding is preserved to maintain version-like distinctions.[](https://web.archive.org/web/20071211024003/http://www.davekoelle.com/alphanum.html) Variable-length numbers pose another challenge, where shorter numbers must be implicitly padded during comparison to ensure numerical rather than lexical ordering. Algorithms like Martin Pool's Natural Order String Comparison achieve this by parsing digits sequentially and aligning them logically, such as treating "9" as equivalent to "09" when compared to "10", resulting in "9" preceding "10".[](http://sourcefrog.net/projects/natsort/) This digit-by-digit comparison extends to large numbers beyond integer limits, avoiding overflow by processing as strings while evaluating value.[](https://web.archive.org/web/20071211024003/http://www.davekoelle.com/alphanum.html) For non-ASCII characters, natural sorting integrates locale-aware collation to order letters appropriately while maintaining numeric handling. Libraries such as Python's natsort use flags like `ns.LOCALE` to apply system locale rules for non-numeric parts, enabling proper sorting of accented characters (e.g., "café" after "cafe" in French locale) alongside numerical chunks, though full support often requires extensions like PyICU to handle UTF-8 non-ASCII correctly without fallback to ASCII order.[](https://natsort.readthedocs.io/en/7.1.1/locale_issues.html) Without such integration, non-ASCII elements default to byte-wise comparison, leading to unnatural results.[](https://natsort.readthedocs.io/en/7.1.1/locale_issues.html) Special cases include empty strings, which typically sort to the beginning as the shortest element; all-numeric strings, treated entirely as numbers for direct numerical evaluation; and mixed case sensitivity, where algorithms default to case-sensitive ordering but can toggle to insensitive via flags (e.g., PHP's `natcasesort` uppercases both strings before comparison).[](https://www.php.net/manual/en/function.natsort.php) These adjustments ensure robustness, such as placing "" before "a1" or sorting "A10" before "a2" in case-insensitive modes.[](https://www.php.net/manual/en/function.natsort.php) Performance considerations emphasize efficiency in edge-heavy datasets, avoiding unnecessary string-to-number conversions by parsing on-the-fly with state machines or simple loops rather than regex, which can introduce overhead. Pool's implementation scans each character at most once for linear time complexity, suitable for large lists.[](http://sourcefrog.net/projects/natsort/) To incorporate these, the core algorithm can be adjusted with flags for zero-padding and locale support, as in this pseudocode extension:

This pseudocode illustrates the chunk extraction and conditional comparisons central to the algorithm.[](https://github.com/sourcefrog/natsort)[](https://www.c64os.com/post/naturalsort) The time complexity of the core algorithm is O(n), where n is the total length of the strings being compared, as it involves a single linear pass to extract and evaluate segments without recursive or quadratic operations.[](https://github.com/sourcefrog/natsort) ### Handling Edge Cases Natural sort order algorithms often require adaptations to handle leading zeros in numeric substrings, as these can affect perceived ordering. In the Alphanum algorithm, numbers with leading zeros are initially compared by their string length, potentially placing "0001" after "1" due to the longer chunk, though subsequent implementations have addressed this by stripping leading zeros for numerical comparison while preserving them for string-like treatment when lengths differ non-significantly.[](https://web.archive.org/web/20071211024003/http://www.davekoelle.com/alphanum.html) For instance, "file01" and "file1" may sort equivalently if padding is ignored, treating the numeric parts as 1 in both cases, or as distinct strings if zero-padding is preserved to maintain version-like distinctions.[](https://web.archive.org/web/20071211024003/http://www.davekoelle.com/alphanum.html) Variable-length numbers pose another challenge, where shorter numbers must be implicitly padded during comparison to ensure numerical rather than lexical ordering. Algorithms like Martin Pool's Natural Order String Comparison achieve this by parsing digits sequentially and aligning them logically, such as treating "9" as equivalent to "09" when compared to "10", resulting in "9" preceding "10".[](http://sourcefrog.net/projects/natsort/) This digit-by-digit comparison extends to large numbers beyond integer limits, avoiding overflow by processing as strings while evaluating value.[](https://web.archive.org/web/20071211024003/http://www.davekoelle.com/alphanum.html) For non-ASCII characters, natural sorting integrates locale-aware collation to order letters appropriately while maintaining numeric handling. Libraries such as Python's natsort use flags like `ns.LOCALE` to apply system locale rules for non-numeric parts, enabling proper sorting of accented characters (e.g., "café" after "cafe" in French locale) alongside numerical chunks, though full support often requires extensions like PyICU to handle UTF-8 non-ASCII correctly without fallback to ASCII order.[](https://natsort.readthedocs.io/en/7.1.1/locale_issues.html) Without such integration, non-ASCII elements default to byte-wise comparison, leading to unnatural results.[](https://natsort.readthedocs.io/en/7.1.1/locale_issues.html) Special cases include empty strings, which typically sort to the beginning as the shortest element; all-numeric strings, treated entirely as numbers for direct numerical evaluation; and mixed case sensitivity, where algorithms default to case-sensitive ordering but can toggle to insensitive via flags (e.g., PHP's `natcasesort` uppercases both strings before comparison).[](https://www.php.net/manual/en/function.natsort.php) These adjustments ensure robustness, such as placing "" before "a1" or sorting "A10" before "a2" in case-insensitive modes.[](https://www.php.net/manual/en/function.natsort.php) Performance considerations emphasize efficiency in edge-heavy datasets, avoiding unnecessary string-to-number conversions by parsing on-the-fly with state machines or simple loops rather than regex, which can introduce overhead. Pool's implementation scans each character at most once for linear time complexity, suitable for large lists.[](http://sourcefrog.net/projects/natsort/) To incorporate these, the core algorithm can be adjusted with flags for zero-padding and locale support, as in this pseudocode extension:

function naturalCompare(a, b, options = {ignoreLeadingZeros: true, localeAware: false}) { // Tokenize into chunks as in core algorithm let i = 0, j = 0; while (i < a.length && j < b.length) { let chunkA = getNextChunk(a, i); let chunkB = getNextChunk(b, j); if (isNumeric(chunkA) && isNumeric(chunkB)) { let numA = chunkA; let numB = chunkB; if (options.ignoreLeadingZeros) { numA = stripLeadingZeros(chunkA); numB = stripLeadingZeros(chunkB); } // Compare numerically by padding shorter with leading zeros let lenA = numA.length; let lenB = numB.length; if (lenA < lenB) { numA = numA.zfill(lenB); } else if (lenB < lenA) { numB = numB.zfill(lenA); } if (numA < numB) return -1; if (numA > numB) return 1; } else { let strCmp = options.localeAware ? localeCompare(chunkA, chunkB) : chunkA.localeCompare(chunkB); if (strCmp !== 0) return strCmp; } i += chunkA.length; j += chunkB.length; } // Handle remaining parts, empty strings sort first return (a.length - i) - (b.length - j); }

This adds optional stripping for leading zeros and locale-based string comparison, building on standard tokenization.[](http://sourcefrog.net/projects/natsort/)[](https://natsort.readthedocs.io/en/7.1.1/locale_issues.html) ## Examples and Applications ### Illustrative Examples To illustrate [natural sort order](/page/Natural_sort_order), consider a simple list of strings containing alphanumeric elements: ["apple2", "apple10", "apple1"]. In lexicographical sorting, this would order as ["apple1", "apple10", "apple2"] because "1" precedes "2" after the common prefix "apple", and "10" is treated as starting with "1". However, natural sorting chunks the strings into alternating non-numeric and numeric segments, comparing non-numeric parts lexicographically and numeric parts numerically. For these strings, the chunks are: "apple" (non-numeric) followed by "2" (numeric) for the first; "apple" + "10" (numeric) for the second; "apple" + "1" (numeric) for the third. Numerically, 1 < 2 < 10, yielding the output ["apple1", "apple2", "apple10"], which aligns with intuitive human expectations.[](https://www.php.net/manual/en/function.strnatcmp.php) Version numbering provides another clear demonstration, where strings like ["v1.10", "v1.2", "v2.0"] are sorted. Natural order splits by delimiters such as dots to parse multi-level numeric segments, treating each as a separate numeric chunk while handling non-numeric parts (e.g., "v") lexicographically. Chunks for the first are "v" + "1" (numeric) + "." + "10" (numeric); for the second, "v" + "1" + "." + "2"; for the third, "v" + "2" + "." + "0". Comparing level-by-level: the first two share "v1.", but 2 < 10, so "v1.2" precedes "v1.10"; then "v2.0" follows as 2 > 1, resulting in ["v1.2", "v1.10", "v2.0"]. This avoids the lexicographical result ["v1.10", "v1.2", "v2.0"], where "1.10" sorts before "1.2" due to the "0" in "10" following "2".[](https://github.com/sourcefrog/natsort) File names often mix text and numbers, as in ["report10.pdf", "report2.pdf", "report100.pdf"]. Natural sorting chunks "report" (non-numeric), followed by the numeric part ("10", "2", "100"), and ".pdf" (non-numeric). Numerically ordering the middle chunks (2 < 10 < 100) produces ["report2.pdf", "report10.pdf", "report100.pdf"], contrasting with lexicographical order ["report10.pdf", "report100.pdf", "report2.pdf"], where "10" and "100" precede "2" due to the initial "1". This logical progression mirrors how users expect files to appear in directories.[](https://www.php.net/manual/en/function.strnatcmp.php) The following table summarizes additional examples, showing input lists, natural sort outputs, lexicographical outputs, and key chunking insights: | Input List | Natural Sort Output | Lexicographical Output | Key Chunk Insights | |-----------------------------|----------------------------------|----------------------------------|--------------------| | ["a2", "a10", "a1"] | ["a1", "a2", "a10"] | ["a1", "a10", "a2"] | Chunks: "a" (non-numeric) + numeric ("1" < "2" < "10"); numbers parsed as integers.[](https://github.com/sourcefrog/natsort) | | ["rfc1.txt", "rfc822.txt", "rfc2086.txt"] | ["rfc1.txt", "rfc822.txt", "rfc2086.txt"] | ["rfc1.txt", "rfc2086.txt", "rfc822.txt"] | Chunks: "rfc" + numeric ("1" < "822" < "2086") + ".txt"; full numbers compared numerically.[](https://github.com/sourcefrog/natsort) | | ["img12.png", "img10.png", "img2.png", "img1.png"] | ["img1.png", "img2.png", "img10.png", "img12.png"] | ["img1.png", "img10.png", "img12.png", "img2.png"] | Chunks: "img" + numeric ("1" < "2" < "10" < "12") + ".png"; prevents "10" from sorting before "2".[](https://www.php.net/manual/en/function.strnatcmp.php) | | ["x2-g8", "x2-y08", "x2-y7"] | ["x2-g8", "x2-y7", "x2-y08"] | ["x2-g8", "x2-y08", "x2-y7"] | Chunks: "x" + "2" (numeric) + "-" + non-numeric ("g8", "y") + numeric ("8", "7", "08"); "7" < "08" numerically, ignoring leading zeros in comparison.[](https://github.com/sourcefrog/natsort) | For a visual representation of chunking in natural sorting, consider the string "apple10". The process can be depicted as a simple flowchart:

This adds optional stripping for leading zeros and locale-based string comparison, building on standard tokenization.[](http://sourcefrog.net/projects/natsort/)[](https://natsort.readthedocs.io/en/7.1.1/locale_issues.html) ## Examples and Applications ### Illustrative Examples To illustrate [natural sort order](/page/Natural_sort_order), consider a simple list of strings containing alphanumeric elements: ["apple2", "apple10", "apple1"]. In lexicographical sorting, this would order as ["apple1", "apple10", "apple2"] because "1" precedes "2" after the common prefix "apple", and "10" is treated as starting with "1". However, natural sorting chunks the strings into alternating non-numeric and numeric segments, comparing non-numeric parts lexicographically and numeric parts numerically. For these strings, the chunks are: "apple" (non-numeric) followed by "2" (numeric) for the first; "apple" + "10" (numeric) for the second; "apple" + "1" (numeric) for the third. Numerically, 1 < 2 < 10, yielding the output ["apple1", "apple2", "apple10"], which aligns with intuitive human expectations.[](https://www.php.net/manual/en/function.strnatcmp.php) Version numbering provides another clear demonstration, where strings like ["v1.10", "v1.2", "v2.0"] are sorted. Natural order splits by delimiters such as dots to parse multi-level numeric segments, treating each as a separate numeric chunk while handling non-numeric parts (e.g., "v") lexicographically. Chunks for the first are "v" + "1" (numeric) + "." + "10" (numeric); for the second, "v" + "1" + "." + "2"; for the third, "v" + "2" + "." + "0". Comparing level-by-level: the first two share "v1.", but 2 < 10, so "v1.2" precedes "v1.10"; then "v2.0" follows as 2 > 1, resulting in ["v1.2", "v1.10", "v2.0"]. This avoids the lexicographical result ["v1.10", "v1.2", "v2.0"], where "1.10" sorts before "1.2" due to the "0" in "10" following "2".[](https://github.com/sourcefrog/natsort) File names often mix text and numbers, as in ["report10.pdf", "report2.pdf", "report100.pdf"]. Natural sorting chunks "report" (non-numeric), followed by the numeric part ("10", "2", "100"), and ".pdf" (non-numeric). Numerically ordering the middle chunks (2 < 10 < 100) produces ["report2.pdf", "report10.pdf", "report100.pdf"], contrasting with lexicographical order ["report10.pdf", "report100.pdf", "report2.pdf"], where "10" and "100" precede "2" due to the initial "1". This logical progression mirrors how users expect files to appear in directories.[](https://www.php.net/manual/en/function.strnatcmp.php) The following table summarizes additional examples, showing input lists, natural sort outputs, lexicographical outputs, and key chunking insights: | Input List | Natural Sort Output | Lexicographical Output | Key Chunk Insights | |-----------------------------|----------------------------------|----------------------------------|--------------------| | ["a2", "a10", "a1"] | ["a1", "a2", "a10"] | ["a1", "a10", "a2"] | Chunks: "a" (non-numeric) + numeric ("1" < "2" < "10"); numbers parsed as integers.[](https://github.com/sourcefrog/natsort) | | ["rfc1.txt", "rfc822.txt", "rfc2086.txt"] | ["rfc1.txt", "rfc822.txt", "rfc2086.txt"] | ["rfc1.txt", "rfc2086.txt", "rfc822.txt"] | Chunks: "rfc" + numeric ("1" < "822" < "2086") + ".txt"; full numbers compared numerically.[](https://github.com/sourcefrog/natsort) | | ["img12.png", "img10.png", "img2.png", "img1.png"] | ["img1.png", "img2.png", "img10.png", "img12.png"] | ["img1.png", "img10.png", "img12.png", "img2.png"] | Chunks: "img" + numeric ("1" < "2" < "10" < "12") + ".png"; prevents "10" from sorting before "2".[](https://www.php.net/manual/en/function.strnatcmp.php) | | ["x2-g8", "x2-y08", "x2-y7"] | ["x2-g8", "x2-y7", "x2-y08"] | ["x2-g8", "x2-y08", "x2-y7"] | Chunks: "x" + "2" (numeric) + "-" + non-numeric ("g8", "y") + numeric ("8", "7", "08"); "7" < "08" numerically, ignoring leading zeros in comparison.[](https://github.com/sourcefrog/natsort) | For a visual representation of chunking in natural sorting, consider the string "apple10". The process can be depicted as a simple flowchart:

Start | v Scan characters: "a" (letter) → Non-numeric chunk: "apple" | v Next: "1" (digit) → Start numeric chunk: "10" | v End of string → Compare chunks: Non-numeric "apple" (equal), then numeric 10 | v End

This flowchart outlines the sequential parsing into non-numeric ("apple") and numeric ("10") chunks for comparison against other strings.[](https://www.php.net/manual/en/function.strnatcmp.php) ### Practical Uses Natural sort order finds widespread application in file systems and graphical explorers, where it ensures that filenames containing numbers are ordered intuitively. In Windows Explorer, this is achieved through the StrCmpLogicalW function, which compares strings by treating embedded digits as numerical values rather than text, preventing sequences like "file2" from appearing after "file10".[](https://learn.microsoft.com/en-us/windows/win32/api/shlwapi/nf-shlwapi-strcmplogicalw)[](https://stackoverflow.com/questions/442429/windows-explorer-sort-method) Similarly, macOS Finder employs natural sorting for directory listings, sorting files such as sequentially numbered images (e.g., "image1.jpg", "image2.jpg", "image10.jpg") in numerical progression to align with user expectations.[](https://stackoverflow.com/questions/71121070/mac-os-finder-sorting)[](https://discussions.apple.com/thread/1431921) In databases and querying systems, natural sort order enhances the organization of results for alphanumeric identifiers. PostgreSQL supports this via extensions like pg_natural_sort_order, which enables custom ORDER BY clauses to handle mixed text and numbers in ID columns, producing more readable outputs for user-facing reports such as inventory lists.[](https://github.com/Zeleo/pg_natural_sort_order) In MySQL, natural sorting is implemented by combining LENGTH() with the column in ORDER BY (e.g., ORDER BY LENGTH(id_column), id_column), effectively grouping strings by length before alphabetical ordering, which is particularly useful for querying product IDs or version numbers.[](https://www.copterlabs.com/natural-sorting-in-mysql/)[](https://stackoverflow.com/questions/153633/natural-sort-in-mysql) This approach improves the relevance of sorted query results in applications like customer reports. User interfaces in domains such as e-commerce leverage natural sort order to present lists that match human intuition, avoiding disruptions in navigation. For instance, product catalogs with alphanumeric SKUs (e.g., "Part-A1", "Part-A2", "Part-A10") are sorted to place "Part-A10" after "Part-A2", facilitating easier browsing and selection in online stores.[](https://baymard.com/blog/essential-sort-types)[](https://ux.stackexchange.com/questions/40726/default-sort-order-of-a-product-list) Software version histories and directory listings also benefit, as natural ordering ensures chronological or logical progression without misleading placements. In data processing workflows, natural sort order is essential for managing mixed alphanumeric datasets. ETL pipelines apply it during transformation stages to correctly sequence part numbers or codes (e.g., "Component-001", "Component-010"), preventing errors in downstream analysis or reporting.[](https://stackoverflow.com/questions/69355228/natural-sorting-when-characters-and-numbers-mixed) In spreadsheets like Microsoft Excel, custom sorts approximate natural order by creating helper columns to separate text and numeric components, then sorting hierarchically; this handles mixed data such as inventory tags effectively.[](https://www.ablebits.com/office-addins-blog/sort-mixed-multilevel-numbers-text-excel/)[](https://support.microsoft.com/en-us/office/sort-data-in-a-range-or-table-in-excel-62d0b95d-2a90-4610-a6ae-2e545c4a4654) The practical benefits of natural sort order include reduced user confusion and improved efficiency in everyday interactions. A common case study is media file organization, where lexicographical sorting might place "episode10" before "episode2", but natural ordering corrects this to follow human sequencing, minimizing errors in playback or archiving.[](https://blog.codinghorror.com/sorting-for-humans-natural-sort-order/) Despite these advantages, natural sort order has limitations, particularly in performance-critical environments. It incurs higher computational overhead than basic lexicographical sorting due to the need for string parsing and numerical extraction during comparisons, which can slow down operations on large-scale datasets.[](https://dev.to/grahamthedev/alphanumeric-natural-sort-in-mysql-30-years-and-we-still-cant-do-this-402c) Additionally, it is not the default behavior in many tools, requiring custom implementations or extensions that may introduce compatibility issues.[](https://stackoverflow.com/questions/9173558/postgresql-order-by-issue-natural-sort) ## Programming Implementations ### In Python Python's built-in `sorted()` function and `list.sort()` method perform lexicographical sorting by default, which treats strings as sequences of characters based on their Unicode code points, leading to counterintuitive results for human-readable data containing embedded numbers, such as sorting "file10.txt" after "file2.txt". To implement natural sort order, developers must provide a custom key function that tokenizes strings into comparable chunks, separating alphanumeric and numeric parts for numerical comparison of digits. A common standard library approach uses the `re` module for regex-based splitting combined with the `key` parameter of `sorted()`. The function `natural_key` processes each string by splitting it on sequences of one or more digits using the pattern `r'(\d+)'`, which captures numeric groups while preserving non-numeric parts as strings; numeric chunks are then converted to integers for proper ordering. This method can be applied as follows: ```python import re def natural_key(s): return [int(c) if c.isdigit() else c for c in re.split(r'(\d+)', s)]

This flowchart outlines the sequential parsing into non-numeric ("apple") and numeric ("10") chunks for comparison against other strings.[](https://www.php.net/manual/en/function.strnatcmp.php) ### Practical Uses Natural sort order finds widespread application in file systems and graphical explorers, where it ensures that filenames containing numbers are ordered intuitively. In Windows Explorer, this is achieved through the StrCmpLogicalW function, which compares strings by treating embedded digits as numerical values rather than text, preventing sequences like "file2" from appearing after "file10".[](https://learn.microsoft.com/en-us/windows/win32/api/shlwapi/nf-shlwapi-strcmplogicalw)[](https://stackoverflow.com/questions/442429/windows-explorer-sort-method) Similarly, macOS Finder employs natural sorting for directory listings, sorting files such as sequentially numbered images (e.g., "image1.jpg", "image2.jpg", "image10.jpg") in numerical progression to align with user expectations.[](https://stackoverflow.com/questions/71121070/mac-os-finder-sorting)[](https://discussions.apple.com/thread/1431921) In databases and querying systems, natural sort order enhances the organization of results for alphanumeric identifiers. PostgreSQL supports this via extensions like pg_natural_sort_order, which enables custom ORDER BY clauses to handle mixed text and numbers in ID columns, producing more readable outputs for user-facing reports such as inventory lists.[](https://github.com/Zeleo/pg_natural_sort_order) In MySQL, natural sorting is implemented by combining LENGTH() with the column in ORDER BY (e.g., ORDER BY LENGTH(id_column), id_column), effectively grouping strings by length before alphabetical ordering, which is particularly useful for querying product IDs or version numbers.[](https://www.copterlabs.com/natural-sorting-in-mysql/)[](https://stackoverflow.com/questions/153633/natural-sort-in-mysql) This approach improves the relevance of sorted query results in applications like customer reports. User interfaces in domains such as e-commerce leverage natural sort order to present lists that match human intuition, avoiding disruptions in navigation. For instance, product catalogs with alphanumeric SKUs (e.g., "Part-A1", "Part-A2", "Part-A10") are sorted to place "Part-A10" after "Part-A2", facilitating easier browsing and selection in online stores.[](https://baymard.com/blog/essential-sort-types)[](https://ux.stackexchange.com/questions/40726/default-sort-order-of-a-product-list) Software version histories and directory listings also benefit, as natural ordering ensures chronological or logical progression without misleading placements. In data processing workflows, natural sort order is essential for managing mixed alphanumeric datasets. ETL pipelines apply it during transformation stages to correctly sequence part numbers or codes (e.g., "Component-001", "Component-010"), preventing errors in downstream analysis or reporting.[](https://stackoverflow.com/questions/69355228/natural-sorting-when-characters-and-numbers-mixed) In spreadsheets like Microsoft Excel, custom sorts approximate natural order by creating helper columns to separate text and numeric components, then sorting hierarchically; this handles mixed data such as inventory tags effectively.[](https://www.ablebits.com/office-addins-blog/sort-mixed-multilevel-numbers-text-excel/)[](https://support.microsoft.com/en-us/office/sort-data-in-a-range-or-table-in-excel-62d0b95d-2a90-4610-a6ae-2e545c4a4654) The practical benefits of natural sort order include reduced user confusion and improved efficiency in everyday interactions. A common case study is media file organization, where lexicographical sorting might place "episode10" before "episode2", but natural ordering corrects this to follow human sequencing, minimizing errors in playback or archiving.[](https://blog.codinghorror.com/sorting-for-humans-natural-sort-order/) Despite these advantages, natural sort order has limitations, particularly in performance-critical environments. It incurs higher computational overhead than basic lexicographical sorting due to the need for string parsing and numerical extraction during comparisons, which can slow down operations on large-scale datasets.[](https://dev.to/grahamthedev/alphanumeric-natural-sort-in-mysql-30-years-and-we-still-cant-do-this-402c) Additionally, it is not the default behavior in many tools, requiring custom implementations or extensions that may introduce compatibility issues.[](https://stackoverflow.com/questions/9173558/postgresql-order-by-issue-natural-sort) ## Programming Implementations ### In Python Python's built-in `sorted()` function and `list.sort()` method perform lexicographical sorting by default, which treats strings as sequences of characters based on their Unicode code points, leading to counterintuitive results for human-readable data containing embedded numbers, such as sorting "file10.txt" after "file2.txt". To implement natural sort order, developers must provide a custom key function that tokenizes strings into comparable chunks, separating alphanumeric and numeric parts for numerical comparison of digits. A common standard library approach uses the `re` module for regex-based splitting combined with the `key` parameter of `sorted()`. The function `natural_key` processes each string by splitting it on sequences of one or more digits using the pattern `r'(\d+)'`, which captures numeric groups while preserving non-numeric parts as strings; numeric chunks are then converted to integers for proper ordering. This method can be applied as follows: ```python import re def natural_key(s): return [int(c) if c.isdigit() else c for c in re.split(r'(\d+)', s)]

Example usage

files = ['file2.txt', 'file10.txt', 'file1.txt'] sorted_files = sorted(files, key=natural_key)

Result: ['file1.txt', 'file2.txt', 'file10.txt']

The regex pattern `r'(\d+)'` ensures that digits are isolated as capturable groups, allowing the split to produce alternating non-numeric and numeric tokens, which are then normalized into a tuple-like list for comparison during sorting. This avoids the need for `functools.cmp_to_key`, as the key function suffices for most cases and is more efficient. For more robust and flexible natural sorting, third-party libraries like `natsort` extend this concept with optimized implementations. The `natsort` module, available via PyPI, provides the `natsorted()` function, which handles numbers as integers or floats, supports version sorting, and manages signed numbers and decimals. An example is: ```python from natsort import natsorted files = ['a10', 'a2', 'a1'] sorted_files = natsorted(files) # Result: ['a1', 'a2', 'a10']

The regex pattern `r'(\d+)'` ensures that digits are isolated as capturable groups, allowing the split to produce alternating non-numeric and numeric tokens, which are then normalized into a tuple-like list for comparison during sorting. This avoids the need for `functools.cmp_to_key`, as the key function suffices for most cases and is more efficient. For more robust and flexible natural sorting, third-party libraries like `natsort` extend this concept with optimized implementations. The `natsort` module, available via PyPI, provides the `natsorted()` function, which handles numbers as integers or floats, supports version sorting, and manages signed numbers and decimals. An example is: ```python from natsort import natsorted files = ['a10', 'a2', 'a1'] sorted_files = natsorted(files) # Result: ['a1', 'a2', 'a10']

Internally, natsort employs regex to parse strings into numeric and non-numeric components, similar to the standard approach but with additional features like recursive handling for nested data structures. Regarding performance, the regex-based key function adds computational overhead compared to plain lexicographical sorting due to repeated splitting and type conversions; benchmarks on lists of 1000 alphanumeric strings show it can be 25-50 times slower than native sorted(), though this is acceptable for most applications where correctness outweighs speed. Libraries like natsort mitigate this with caching and optimized parsing, reducing the gap for repeated sorts. Python's sorted() has been stable—preserving relative order of equal elements—since version 2.2, with explicit documentation of this guarantee strengthening in Python 3.x releases, including 3.6 onward, which also benefited from broader performance optimizations in the .

In Java and Other Languages

In PHP, natural sort order is supported natively through functions like natsort() and strnatcmp(), introduced in PHP 4. To sort an array while preserving key-value associations, natsort() can be used directly:

php

$files = ['file2.txt', 'file10.txt', 'file1.txt']; natsort($files); // Result: ['file1.txt', 'file2.txt', 'file10.txt']

$files = ['file2.txt', 'file10.txt', 'file1.txt']; natsort($files); // Result: ['file1.txt', 'file2.txt', 'file10.txt']

The strnatcmp() function provides a comparison callback for custom sorting, handling numeric segments as integers. These functions follow the natural order algorithm similar to Martin Pool's implementation and are efficient for typical use cases. In Java, natural sort order for strings is typically implemented using a custom Comparator<String> that parses input strings to distinguish between textual and numeric segments, comparing the former lexicographically and the latter numerically to avoid issues like "file10" sorting before "file2". This approach often involves iterating over characters or using regular expressions to extract chunks, then applying Integer.parseInt to numeric portions for integer comparison. For example, the following comparator splits strings at transitions between letters and digits, compares non-numeric parts as strings, and numeric parts as integers:

java

import java.util.*; public class NaturalSortComparator implements Comparator<String> { public static int comparator(String s1, String s2) { String[] pt1 = s1.split("((?<=[a-zA-Z])(?=[0-9]))|((?<=[0-9])(?=[a-zA-Z]))"); String[] pt2 = s2.split("((?<=[a-zA-Z])(?=[0-9]))|((?<=[0-9])(?=[a-zA-Z]))"); if (Arrays.equals(pt1, pt2)) return 0; for (int i = 0; i < Math.min(pt1.length, pt2.length); i++) { if (!pt1[i].equals(pt2[i])) { if (!isNumber(pt1[i]) || !isNumber(pt2[i])) { return pt1[i].compareTo(pt2[i]) > 0 ? 1 : -1; } else { int nu1 = Integer.parseInt(pt1[i]); int nu2 = Integer.parseInt(pt2[i]); return Integer.compare(nu1, nu2); } } } return pt1.length > pt2.length ? 1 : -1; } private static boolean isNumber(String s) { return s.matches("-?\\d+"); } @Override public int compare(String s1, String s2) { return comparator(s1, s2); } }

import java.util.*; public class NaturalSortComparator implements Comparator<String> { public static int comparator(String s1, String s2) { String[] pt1 = s1.split("((?<=[a-zA-Z])(?=[0-9]))|((?<=[0-9])(?=[a-zA-Z]))"); String[] pt2 = s2.split("((?<=[a-zA-Z])(?=[0-9]))|((?<=[0-9])(?=[a-zA-Z]))"); if (Arrays.equals(pt1, pt2)) return 0; for (int i = 0; i < Math.min(pt1.length, pt2.length); i++) { if (!pt1[i].equals(pt2[i])) { if (!isNumber(pt1[i]) || !isNumber(pt2[i])) { return pt1[i].compareTo(pt2[i]) > 0 ? 1 : -1; } else { int nu1 = Integer.parseInt(pt1[i]); int nu2 = Integer.parseInt(pt2[i]); return Integer.compare(nu1, nu2); } } } return pt1.length > pt2.length ? 1 : -1; } private static boolean isNumber(String s) { return s.matches("-?\\d+"); } @Override public int compare(String s1, String s2) { return comparator(s1, s2); } }

This can be used as Arrays.sort(files, new NaturalSortComparator()); for an array of filenames. Another widely adopted implementation is the AlphanumComparator, based on Dave Koelle's algorithm, which iterates through characters while switching between alphanumeric and numeric modes without regex for better performance, comparing digits chunk-by-chunk to handle variable lengths. Utility libraries provide pre-built comparators; for instance, the AlphanumComparator from open-source projects like JMRI offers a ready-to-use class for natural sorting, supporting case sensitivity options. In , natural sorting is commonly achieved by customizing Array.prototype.sort() with a comparison function that uses regex to pad numeric segments to a fixed length (e.g., 10 digits) with leading zeros, enabling lexical comparison to mimic numerical order for typical use cases. For example:

javascript

const naturalSort = (a, b) => { const re = /(\d+)/g; return a.replace(re, (n) => n.padStart(10, '0')).localeCompare(b.replace(re, (n) => n.padStart(10, '0'))); }; ['file10', 'file2'].sort(naturalSort); // ['file2', 'file10']

const naturalSort = (a, b) => { const re = /(\d+)/g; return a.replace(re, (n) => n.padStart(10, '0')).localeCompare(b.replace(re, (n) => n.padStart(10, '0'))); }; ['file10', 'file2'].sort(naturalSort); // ['file2', 'file10']

This method relies on string padding rather than full , trading precision for simplicity in browser environments. In C#, a custom IComparer<string> is implemented similarly to , often using LINQ's OrderBy with the comparer for collections. The NaturalStringComparer class, for instance, iterates over characters to extract and compare numeric chunks numerically. Usage example: list.OrderBy(x => x, new NaturalStringComparer()).ToList();. Libraries like NaturalSort.Extension add extension methods for string comparisons supporting this order. For SQL databases, natural sorting varies by system. In , it lacks a built-in function but can be approximated via extensions or user-defined functions (UDFs); for example, a custom NATURAL_SORT UDF might use string functions to separate and compare parts, as in SELECT * FROM files ORDER BY NATURAL_SORT(filename);, though standard queries often rely on regex replacements in ORDER BY clauses for basic cases. In contrast, provides a built-in natural_sort_key() function for generating sort keys suitable for indexing, available since version 10.7 (released in 2022). Across languages, implementations commonly rely on either regex-based splitting/padding for simplicity or manual character iteration with mode-switching for efficiency and precision. The table below compares representative syntax snippets:
LanguageKey Snippet Example
Comparator<String> cmp = (s1, s2) -> { /* split and [Integer](/page/Integer).parseInt chunks */ }; Arrays.sort(arr, cmp);
arr.sort((a, b) => a.replace(/\d+/g, n => n.padStart(10, '0')).localeCompare(b.replace(/\d+/g, n => n.padStart(10, '0'))));
C#list.OrderBy(x => x, new NaturalStringComparer()); where comparer iterates chars and compares numerically.
Java's emphasis on explicit Comparator implementations highlights challenges like string immutability, which necessitates creating temporary parsed objects, contrasting with more flexible key-extraction in dynamic languages. For internationalization, the International Components for Unicode (ICU) library is widely adopted across Java, JavaScript (via Intl), and C#, providing numeric collation attributes that enable natural sorting of mixed alphanumeric content in locale-aware manners, such as treating digits uniformly regardless of script.
Add your contribution
Related Hubs
User Avatar
No comments yet.