Comparison of programming languages (string functions)
from Wikipedia

String functions are used in computer programming languages to manipulate a string or query information about a string (some do both).

Most programming languages that have a string datatype will have some string functions, although there may be other low-level ways within each language to handle strings directly. In object-oriented languages, string functions are often implemented as properties and methods of string objects. In functional and list-based languages, a string is often represented as a list of character codes, so all list-manipulation procedures could be considered string functions. However, such languages may implement a subset of explicit string-specific functions as well.

For functions that manipulate strings, modern object-oriented languages such as C# and Java have immutable strings and return a copy (in newly allocated dynamic memory), while others, such as C, manipulate the original string unless the programmer copies the data to a new string. See, for example, Concatenation below.

The most basic example of a string function is the length(string) function. This function returns the length of a string, such as a string literal.

For example, length("hello world") would return 11.

Other languages may have string functions with similar, or exactly the same, syntax, parameters, or outcomes. For example, in many languages the length function is represented as len(string). The list of common functions below aims to help limit this confusion.

Common string functions (multi language reference)

String functions common to many languages are listed below, including the different names used. The list aims to help programmers find the equivalent function in a language. Note that string concatenation and regular expressions are covered on separate pages. Statements in guillemets (« … ») are optional.

CharAt

Definition charAt(string,integer) returns character.
Description Returns character at index in the string.
Equivalent See substring of length 1 character.
Format Languages Base index
string[i] ALGOL 68, APL, Julia, Pascal, Object Pascal (Delphi), Seed7 1
string[i] C, C++, C#, Cobra, D, FreeBASIC, Go, Python,[1] PHP, Ruby,[1] Windows PowerShell, JavaScript, APL 0
string{i} PHP (deprecated in 5.3) 0
string(i) Ada ≥1
Mid(string,i,1) VB 1
MID$(string,i,1) BASIC 1
string.Chars(i) VB.NET 0
string(i:i) Fortran 1
string.charAt(i) Java, JavaScript 0
string.[i] OCaml, F# 0
string.chars().nth(i) Rust[2] 0
string[i,1] Pick Basic 1
String.sub (string, i) Standard ML 0
string !! i Haskell 0
(string-ref string i) Scheme 0
(char string i) Common Lisp 0
(elt string i) ISLISP 0
(get string i) Clojure 0
substr(string, i, 1) Perl 5[1] 0
substr(string, i, 1) or string.substr(i, 1) Raku[3] 0
substr(string, i, 1) PL/I 1
string.at(i) C++ (STL) (w/ bounds checking) 0
lists:nth(i, string) Erlang 1
[string characterAtIndex:i] Objective-C (NSString * only) 0
string.sub(string, i, i) or (string):sub(i, i) Lua[1] 1
string at: i Smalltalk (w/ bounds checking) 1
string index string i Tcl 0
StringTake[string, {i}] Mathematica, Wolfram Language[1] 1
string@i Eiffel 1
string (i:1) COBOL 1
${string_param:i:1} Bash 0
i⌷string APL 0 or 1
{ Example in Pascal }
var
  MyStr: string = 'Hello, World';
  MyChar: Char;
begin
  MyChar := MyStr[2];          // 'e'
end.
# Example in ALGOL 68 #
"Hello, World"[2];             # 'e' #
// Example in C
#include <stdio.h>

int main(void) {
    char myStr1[] = "Hello, World";
    printf("%c\n", *(myStr1 + 1));        // 'e'
    printf("%c\n", *(myStr1 + 7));        // 'W'
    printf("%c\n", myStr1[11]);           // 'd'
    printf("%s\n", myStr1);               // 'Hello, World'
    printf("%s\n", "Hello(2), World(2)"); // 'Hello(2), World(2)'
    return 0;
}
// Example in C++ (C++23)
import std;

using std::string;

int main() {
    char myStr1[] = "Hello(1), World(1)";
    string myStr2 = "Hello(2), World(2)";
    std::println("Hello(3), World(3)");              // 'Hello(3), World(3)'
    std::println("{}", myStr2[6]);                   // '2'
    std::println("{}", string(myStr1).substr(5, 3)); // '(1)'
}
// Example in C#
"Hello, World"[2];             // 'l'
# Example in Perl 5
substr("Hello, World", 1, 1);  # 'e'
# Examples in Python
"Hello, World"[2]              #  'l'
"Hello, World"[-3]             #  'r'
# Example in Raku
"Hello, World".substr(1, 1);   # 'e'
' Example in Visual Basic
Mid("Hello, World",2,1)    '  "e"
' Example in Visual Basic .NET
"Hello, World".Chars(2)    '  "l"c
" Example in Smalltalk "
'Hello, World' at: 2.        "$e"
//Example in Rust
"Hello, World".chars().nth(2);   // Some('l')

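The table mixes 0-based and 1-based positions and differs in what happens when the index is out of range (compare C's unchecked string[i] with the bounds-checked string.at(i) in C++ or at: in Smalltalk). The following is a minimal Python sketch, not any language's built-in: the helper char_at and its base parameter are hypothetical, and an out-of-range index returns None instead of raising an error.

```python
from typing import Optional


def char_at(s: str, i: int, base: int = 0) -> Optional[str]:
    """Return the character at position i, or None if out of range.

    base=0 mimics C, Java or Python indexing; base=1 mimics Pascal, VB or Lua.
    Positions before the start are rejected rather than counted from the end.
    """
    offset = i - base  # convert to Python's 0-based offset
    if 0 <= offset < len(s):
        return s[offset]
    return None


print(char_at("Hello, World", 1))          # e    (0-based, like C or Java)
print(char_at("Hello, World", 2, base=1))  # e    (1-based, like Mid or Lua)
print(char_at("Hello, World", 12))         # None (past the last character)
```

Returning None keeps the sketch close to Rust's chars().nth(i), which yields an Option rather than failing.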
Compare (integer result)

Definition compare(string1,string2) returns integer.
Description Compares two strings to each other. If they are equivalent, a zero is returned. Otherwise, most of these routines return a positive or negative result according to whether string1 is lexicographically greater than or less than string2, respectively. The exceptions are the Scheme and Rexx routines, which return the index of the first mismatch, and Smalltalk, which answers a comparison code telling how the receiver sorts relative to the string parameter.
Format Languages
IF string1<string2 THEN -1 ELSE ABS (string1>string2) FI ALGOL 68
cmp(string1, string2) Python 2
(string1 > string2) - (string1 < string2) Python
strcmp(string1, string2) C, PHP
std.string.cmp(string1, string2) D
StrComp(string1, string2) VB, Object Pascal (Delphi)
string1 cmp string2 Perl, Raku
string1 compare: string2 Smalltalk (Squeak, Pharo)
string1 <=> string2 Ruby, C++ (STL, C++20)[4]
string1.compare(string2) C++ (STL), Swift (Foundation)
compare(string1, string2) Rexx, Seed7
compare(string1, string2, pad) Rexx
CompareStr(string1, string2) Pascal, Object Pascal (Delphi)
string1.compareTo(string2) Cobra, Java
string1.CompareTo(string2) VB .NET, C#, F#
(compare string1 string2) Clojure
(string= string1 string2) Common Lisp
(string-compare string1 string2 p< p= p>) Scheme (SRFI 13)
(string= string1 string2) ISLISP
compare string1 string2 OCaml
String.compare (string1, string2) Standard ML[5]
compare string1 string2 Haskell[6]
[string]::Compare(string1, string2) Windows PowerShell
[string1 compare:string2] Objective-C (NSString * only)
LLT(string1,string2), LLE(string1,string2), LGT(string1,string2) or LGE(string1,string2) Fortran[7]
string1.localeCompare(string2) JavaScript
bytes.Compare([]byte(string1), []byte(string2)) Go
string compare string1 string2 Tcl
compare(string1,string2,count) PL/I[8]
string1.cmp(string2) Rust[9]
# Example in Perl 5
"hello" cmp "world";       # returns -1
# Example in Python 2
cmp("hello", "world")      # returns -1
# Examples in Raku
"hello" cmp "world";       # returns Less
"world" cmp "hello";       # returns More
"hello" cmp "hello";       # returns Same
/** Example in Rexx */
compare("hello", "world")  /* returns index of mismatch: 1 */
; Example in Scheme
(use-modules (srfi srfi-13))
; returns index of mismatch: 0
(string-compare "hello" "world" values values values)
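The Python row above, (string1 > string2) - (string1 < string2), is the usual replacement for Python 2's cmp. Below is a minimal sketch wrapping it, plus a hypothetical first_mismatch helper that mimics the mismatch-index style of the Scheme and Rexx routines. It is 0-based here, as in the Scheme example; Rexx counts from 1 and returns 0 for equal strings.

```python
def compare(s1: str, s2: str) -> int:
    """Three-way comparison returning -1, 0 or 1."""
    # bool is a subclass of int, so True - False == 1 and False - True == -1.
    return (s1 > s2) - (s1 < s2)


def first_mismatch(s1: str, s2: str) -> int:
    """0-based index of the first differing position, or -1 if equal.

    If one string is a prefix of the other, the mismatch is reported at
    the end of the shorter string.
    """
    for i, (a, b) in enumerate(zip(s1, s2)):
        if a != b:
            return i
    return -1 if len(s1) == len(s2) else min(len(s1), len(s2))


print(compare("hello", "world"))        # -1
print(compare("world", "hello"))        # 1
print(compare("hello", "hello"))        # 0
print(first_mismatch("hello", "help"))  # 3
```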

Compare (relational operator-based, Boolean result)

Definition string1 OP string2 OR (compare string1 string2) returns Boolean.
Description Lexicographically compares two strings using a relational operator or function. Boolean result returned.
Format Languages
string1 OP string2, where OP can be any of =, <>, <, >, <= and >= Pascal, Object Pascal (Delphi), OCaml, Seed7, Standard ML, BASIC, VB, VB .NET, F#
string1 OP string2, where OP can be any of =, /=, ≠, <, >, <=, ≤, >= and ≥; Also: EQ, NE, LT, LE, GE and GT ALGOL 68
(stringOP? string1 string2), where OP can be any of =, -ci=, <, -ci<, >, -ci>, <=, -ci<=, >= and -ci>= (operators starting with '-ci' are case-insensitive) Scheme
(stringOP string1 string2), where OP can be any of =, -ci=, <>, -ci<>, <, -ci<, >, -ci>, <=, -ci<=, >= and -ci>= (operators starting with '-ci' are case-insensitive) Scheme (SRFI 13)
(stringOP string1 string2), where OP can be any of =, -equal, /=, -not-equal, <, -lessp, >, -greaterp, <=, -not-greaterp, >= and -not-lessp (the verbal operators are case-insensitive) Common Lisp
(stringOP string1 string2), where OP can be any of =, /=, <, >, <=, and >= ISLISP
string1 OP string2, where OP can be any of =, \=, <, >, <= and >=[10] Rexx
string1 OP string2, where OP can be any of =, ¬=, <, >, <=, >=, ¬< and ¬>[11] PL/I
string1 OP string2, where OP can be any of =, /=, <, >, <= and >= Ada
string1 OP string2, where OP can be any of ==, /=, <, >, =< and >= Erlang
string1 OP string2, where OP can be any of ==, /=, <, >, <= and >= Haskell
string1 OP string2, where OP can be any of eq, ne, lt, gt, le and ge Perl, Raku
string1 OP string2, where OP can be any of ==, !=, <, >, <= and >= C++ (STL), C#, D, Go, JavaScript, Python, PHP, Ruby, Rust,[12] Swift
string1 OP string2, where OP can be any of -eq, -ceq, -ne, -cne, -lt, -clt, -gt, -cgt, -le, -cle, -ge, and -cge (operators starting with 'c' are case-sensitive) Windows PowerShell
string1 OP string2, where OP can be any of ==, ~=, <, >, <= and >= Lua
string1 OP string2, where OP can be any of =, ~=, <, >, <= and >= Smalltalk
string1 OP string2, where OP can be any of ==, /=, <, >, <= and >=; Also: .EQ., .NE., .LT., .LE., .GT. and .GE. Fortran[13]
string1 OP string2 where OP can be any of =, <>, <, >, <=, >= as well as worded equivalents COBOL
string1 OP string2 where OP can be any of ==, <>, <, >, <= and >= Cobra
string1 OP string2 compiles (in Java only for == and !=), but compares the pointers or references to the strings, not the string contents. Use the Compare (integer result) function instead. C, Java
string1.METHOD(string2) where METHOD is any of eq, ne, gt, lt, ge, le Rust[12]
% Example in Erlang
"hello" > "world".            % returns false
# Example in Raku
"art" gt "painting";           # returns False
"art" lt "painting";           # returns True
# Example in Windows PowerShell
"hello" -gt "world"           # returns false
;; Example in Common Lisp
(string> "art" "painting")      ; returns nil
(string< "art" "painting")      ; returns non nil
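Most of the operator forms above compare by character code rather than by dictionary order, which can surprise readers when letter case differs. A short Python illustration, assuming plain code-point comparison; the helper less_ci is hypothetical and uses str.casefold, which is not the same as a locale-aware collation.

```python
# Relational operators on Python strings compare code points left to right.
print("art" < "painting")  # True: 'a' (U+0061) < 'p' (U+0070)
print("art" < "artist")    # True: a proper prefix sorts first
print("Zebra" < "apple")   # True: 'Z' (U+005A) < 'a' (U+0061)


def less_ci(s1: str, s2: str) -> bool:
    """Case-insensitive '<', similar in spirit to Scheme's string-ci<?."""
    return s1.casefold() < s2.casefold()


print(less_ci("Zebra", "apple"))  # False: "zebra" sorts after "apple"
```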

Concatenation

Definition concatenate(string1,string2) returns string.
Description Concatenates (joins) two strings, returning the combined string. Note that in languages with mutable strings, such as C, the second string may instead be appended to the first in place, in which case the modified first string is returned.
Format Languages
string1 adjacent_to string2 Rexx (abutment, equivalent to string1 || string2)
string1 whitespace string2 Rexx (equivalent to string1 || ' ' || string2)
string1 & string2 Ada, FreeBASIC, Seed7, BASIC, VB, VB .NET, COBOL (between literals only)
strcat(string1, string2) C, C++ (char * only)[14]
string1 . string2 Perl, PHP
string1 + string2 ALGOL 68, C++ (STL), C#, Cobra, FreeBASIC, Go, Pascal, Object Pascal (Delphi), Java, JavaScript, Windows PowerShell, Python, Ruby, Rust,[15] F#, Swift, Turing, VB
string1 ~ string2 D, Raku
(string-append string1 string2) Scheme, ISLISP
(concatenate 'string string1 string2) Common Lisp
(str string1 string2) Clojure
string1 || string2 Rexx, SQL, PL/I
string1 // string2 Fortran
string1 ++ string2 Erlang, Haskell
string1 ^ string2 OCaml, Standard ML, F#
[string1 stringByAppendingString:string2] Objective-C (NSString * only)
string1 .. string2 Lua
string1 , string2 Smalltalk, APL
string1 string2 SNOBOL
string1string2 Bash
string1 <> string2 Mathematica
concat string1 string2 Tcl
{ Example in Pascal }
'abc' + 'def';      // returns "abcdef"
// Example in C#
"abc" + "def";      // returns "abcdef"
' Example in Visual Basic
"abc" & "def"       '  returns "abcdef"
"abc" + "def"       '  returns "abcdef"
"abc" & Null        '  returns "abc"
"abc" + Null        '  returns Null
// Example in D
"abc" ~ "def";      // returns "abcdef"
;; Example in Common Lisp
(concatenate 'string "abc " "def " "ghi")  ; returns "abc def ghi"
# Example in Perl 5
"abc" . "def";      # returns "abcdef"
"Perl " . 5;        # returns "Perl 5"
/* Example in PL/I */
"abc" || "def"      /* returns "abcdef" */
# Example in Raku
"abc" ~ "def";      # returns "abcdef"
"Perl " ~ 6;        # returns "Perl 6"
/* Example in Rexx */
"Strike"2           /* returns "Strike2" */
"Strike"    2       /* returns "Strike 2" */

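The contrast between mutable and immutable strings described above can be seen in Python, whose strings are immutable like those of C# and Java. A minimal sketch; the variable names are illustrative only.

```python
s1 = "abc"
s2 = "def"

combined = s1 + s2       # + builds a new string; s1 and s2 are untouched
print(combined)          # abcdef
print(s1)                # abc

# There is no in-place strcat: += rebinds s1 to a new string object,
# so another name still bound to the old value keeps seeing "abc".
alias = s1
s1 += s2
print(s1, alias)         # abcdef abc

# For many pieces, join() can build the result without intermediate strings.
parts = ["abc ", "def ", "ghi"]
print("".join(parts))    # abc def ghi
```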
Contains

Definition contains(string,substring) returns boolean
Description Returns whether string contains substring as a substring. This is equivalent to using Find and then detecting that it does not result in the failure condition listed in the third column of the Find section. However, some languages have a simpler way of expressing this test.
Related Find
Format Languages
string in string(substring, loc int, string) ALGOL 68
ContainsStr(string, substring) Object Pascal (Delphi)
strstr(string, substring) != NULL C, C++ (char * only)
string.Contains(substring) C#, VB .NET, Windows PowerShell, F#
string.contains(substring) Cobra, Java (1.5+), Raku, Rust,[16] C++ (C++23)[17]
string.indexOf(substring) >= 0 JavaScript
strpos(string, substring) !== false PHP
str_contains(string, substring) PHP (8+)
pos(string, substring) <> 0 Seed7
substring in string Cobra, Python (2.3+)
string.find(string, substring) ~= nil Lua
string.include?(substring) Ruby
Data.List.isInfixOf substring string Haskell (GHC 6.6+)
string includesSubstring: substring Smalltalk (Squeak, Pharo, Smalltalk/X)
String.isSubstring substring string Standard ML
(search substring string) Common Lisp
(not (null (string-index substring string))) ISLISP
(substring? substring string) Clojure
! StringFreeQ[string, substring] Mathematica
index(string, substring, startpos)>0 Fortran, PL/I[18]
index(string, substring, occurrence)>0 Pick Basic
strings.Contains(string, substring) Go
string.find(substring) != string::npos C++
[string containsString:substring] Objective-C (NSString * only, iOS 8+/OS X 10.10+)
string.rangeOfString(substring) != nil Swift (Foundation)
∨/substring⍷string APL
¢ Example in ALGOL 68 ¢
string in string("e", loc int, "Hello mate");      ¢ returns true ¢
string in string("z", loc int, "word");            ¢ returns false ¢
// Example in C#
"Hello mate".Contains("e");      // returns true
"word".Contains("z");            // returns false
#  Example in Python
"e" in "Hello mate"              #  returns True
"z" in "word"                    #  returns False
#  Example in Raku
"Good morning!".contains('z')    #  returns False
"¡Buenos días!".contains('í');   #  returns True
"  Example in Smalltalk "
'Hello mate' includesSubstring: 'e'  " returns true "
'word' includesSubstring: 'z'        " returns false "
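The Description above defines Contains in terms of Find. A minimal Python sketch of that equivalence, with a hypothetical contains helper checked against the built-in in operator. Note the edge case: an empty substring is found at position 0, so it is contained in every string.

```python
def contains(string: str, substring: str) -> bool:
    """Contains expressed through Find: str.find returns -1 on failure."""
    return string.find(substring) != -1


for s, sub in [("Hello mate", "e"), ("word", "z"), ("word", "")]:
    # The Find-based test agrees with Python's dedicated `in` operator.
    assert contains(s, sub) == (sub in s)
    print(f"{sub!r} in {s!r}: {contains(s, sub)}")
```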

Equality


Tests if two strings are equal. See also #Compare (integer result) and #Compare (relational operator-based, Boolean result). Note that testing equality through a generic Compare with an integer result is not only confusing for the programmer but is often a significantly more expensive operation; this is especially true when using "C-strings".

Format Languages
string1 == string2 Python, C++ (STL), C#, Cobra, Go, JavaScript (loose equality), PHP (loose equality), Ruby, Rust,[12] Erlang, Haskell, Lua, D, Mathematica, Swift
string1 === string2 JavaScript, PHP
string1 == string2 or string1 .EQ. string2 Fortran
strcmp(string1, string2) == 0 C
(string=? string1 string2) Scheme
(string= string1 string2) Common Lisp, ISLISP
string1 = string2 ALGOL 68, Ada, Object Pascal (Delphi), OCaml, Pascal, Rexx, Seed7, Standard ML, BASIC, VB, VB .NET, F#, Smalltalk, PL/I, COBOL
test string1 = string2 or [ string1 = string2 ] Bourne Shell
string1 eq string2 Perl, Raku, Tcl
string1.equals(string2) Cobra, Java
string1.Equals(string2) C#
string1 -eq string2 or [string]::Equals(string1, string2) Windows PowerShell
[string1 isEqualToString:string2] or [string1 isEqual:string2] Objective-C (NSString * only)
string1≡string2 APL
string1.eq(string2) Rust[12]
// Example in C#
"hello" == "world"           // returns false
' Example in Visual Basic
"hello" = "world"            '  returns false
# Examples in Perl 5
'hello' eq 'world'           # returns false (the empty string "")
'hello' eq 'hello'           # returns true (1)
# Examples in Raku
'hello' eq 'world'           # returns False
'hello' eq 'hello'           # returns True
# Example in Windows PowerShell
"hello" -eq "world"          #  returns false
⍝ Example in APL
'hello' ≡ 'world'          ⍝  returns 0
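The note above, that a dedicated equality test can be cheaper than a general three-way Compare, can be illustrated with a sketch. When a string's length is stored alongside it (as in most languages other than C), strings of different lengths can be rejected without examining any characters; otherwise the scan stops at the first difference. The function equals is hypothetical and written out only to show the short-circuit; in real Python code the built-in == is the right tool.

```python
def equals(s1: str, s2: str) -> bool:
    """Equality with early exits; a sketch, not a replacement for ==."""
    if len(s1) != len(s2):    # no characters need to be examined at all
        return False
    for a, b in zip(s1, s2):  # stop at the first differing character
        if a != b:
            return False
    return True


print(equals("hello", "world"))  # False (differs at the first character)
print(equals("hello", "hell"))   # False (rejected on length alone)
print(equals("hello", "hello"))  # True
```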


Find

Definition find(string,substring) returns integer
Description Returns the position of the start of the first occurrence of substring in string. If the substring is not found, most of these routines return an invalid index value (-1 where indexes are 0-based, 0 where they are 1-based) or some value to be interpreted as Boolean FALSE.
Related instrrev
Format Languages If not found
string in string(substring, pos, string[startpos:]) ALGOL 68 returns BOOL: TRUE or FALSE, and position in REF INT pos.
InStr(«startpos,»string,substring) VB (positions start at 1) returns 0
INSTR$(string,substring) BASIC (positions start at 1) returns 0
index(string,substring) AWK returns 0
index(string,substring«,startpos») Perl 5 returns −1
index(string,substring«,startpos») or string.index(substring«,startpos») Raku returns Nil
instr(«startpos,»string,substring) FreeBASIC returns 0
strpos(string,substring«,startpos») PHP returns FALSE
locate(string, substring) Ingres returns string length + 1
strstr(string, substring) C, C++ (char * only, returns pointer to first character) returns NULL
std.string.indexOf(string, substring) D returns −1
pos(string, substring«, startpos») Seed7 returns 0
strings.Index(string, substring) Go returns −1
pos(substring, string) Pascal, Object Pascal (Delphi) returns 0
pos(substring, string«,startpos») Rexx returns 0
string.find(substring«,startpos») C++ (STL) returns std::string::npos
string.find(substring«,startpos«,endpos»») Python returns −1
string.index(substring«,startpos«,endpos»») Python raises ValueError
string.index(substring«,startpos») Ruby returns nil
string.indexOf(substring«,startpos») Java, JavaScript returns −1
string.IndexOf(substring«,startpos«, charcount»») VB .NET, C#, Windows PowerShell, F# returns −1
string:str(string, substring) Erlang returns 0
(string-contains string substring) Scheme (SRFI 13) returns #f
(search substring string) Common Lisp returns NIL
(string-index substring string) ISLISP returns nil
List.findIndex (List.isPrefixOf substring) (List.tails string) Haskell (returns Just index) returns Nothing
Str.search_forward (Str.regexp_string substring) string 0 OCaml raises Not_found
Substring.size (#1 (Substring.position substring (Substring.full string))) Standard ML returns string length
[string rangeOfString:substring].location Objective-C (NSString * only) returns NSNotFound
string.find(string, substring) or (string):find(substring) Lua returns nil
string indexOfSubCollection: substring startingAt: startpos ifAbsent: aBlock Smalltalk (Squeak, Pharo) evaluates aBlock, which is a block closure (or any object understanding value)
string findString: substring startingAt: startpos Smalltalk (Squeak, Pharo) returns 0
startpos = INDEX(string, substring «,back» «, kind») Fortran returns 0 if substring is not in string; returns LEN(string)+1 if substring is empty
POSITION(substring IN string) SQL returns 0 (positions start at 1)
index(string, substring, startpos ) PL/I[18] returns 0 (positions start at 1)
index(string, substring, occurrence ) Pick Basic returns 0 if occurrence of substring is not in string; (positions start at 1)
string.indexOf(substring«,startpos«, charcount»») Cobra returns −1
string first substring string startpos Tcl returns −1
(substring⍷string)⍳1 APL returns 1 + the last position in string
string.find(substring) Rust[19] returns None

Examples

  • Common Lisp
    (search "e" "Hello mate")             ;  returns 1
    (search "z" "word")                   ;  returns NIL
    
  • C#
    "Hello mate".IndexOf("e");            // returns 1
    "Hello mate".IndexOf("e", 4);         // returns 9
    "word".IndexOf("z");                  // returns -1
    
  • Raku
    "Hello, there!".index('e')           # returns 1
    "Hello, there!".index('z')           # returns Nil
    
  • Scheme
    (use-modules (srfi srfi-13))
    (string-contains "Hello mate" "e")    ;  returns 1
    (string-contains "word" "z")          ;  returns #f
    
  • Visual Basic
    InStr("Hello mate", "e")              '  returns 2
    InStr(5, "Hello mate", "e")           '  returns 10
    InStr("word", "z")                    '  returns 0
    
  • Smalltalk
    'Hello mate' indexOfSubCollection:'ate'  "returns 8"
    
    'Hello mate' indexOfSubCollection:'late' "returns 0"
    
    'Hello mate'
        indexOfSubCollection:'late'
        ifAbsent:[ 99 ]                      "returns 99"
    
    'Hello mate'
        indexOfSubCollection:'late'
        ifAbsent:[ self error ]              "raises an exception"
    


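The 0-based and 1-based conventions in the "If not found" column are two views of the same search. The Python sketch below writes out a naive left-to-right search (quadratic in the worst case; library implementations typically use faster algorithms) that returns -1 like C# or Java, and a hypothetical instr wrapper that converts to VB-style 1-based positions with 0 for "not found". Both helper names are illustrative, not part of any library.

```python
def find(string: str, substring: str, start: int = 0) -> int:
    """Naive left-to-right search; returns a 0-based index or -1 if absent."""
    width = len(substring)
    for i in range(start, len(string) - width + 1):
        if string[i:i + width] == substring:
            return i
    return -1


def instr(string: str, substring: str, start: int = 1) -> int:
    """1-based wrapper in the style of VB's InStr: 0 means 'not found'."""
    return find(string, substring, start - 1) + 1


print(find("Hello mate", "e"))      # 1   (as in the C# example)
print(find("Hello mate", "e", 4))   # 9
print(instr("Hello mate", "e", 5))  # 10  (as in the Visual Basic example)
print(instr("word", "z"))           # 0
```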
Find character

Definition find_character(string,char) returns integer
Description Returns the position of the start of the first occurrence of the character char in string. If the character is not found, most of these routines return an invalid index value (-1 where indexes are 0-based, 0 where they are 1-based) or some value to be interpreted as Boolean FALSE. This can be accomplished as a special case of #Find, with a string of one character, but it may be simpler or more efficient in many languages to locate just one character. Also, in many languages characters and strings are different types, so it is convenient to have such a function.
Related find
Format Languages If not found
char in string(char, pos, string[startpos:]) ALGOL 68 returns BOOL: TRUE or FALSE, and position in REF INT pos.
instr(«startpos,»string, any char) (char can contain more than one character, in which case the position of the first occurrence of any of them is returned) FreeBASIC returns 0
strchr(string,char) C, C++ (char * only, returns pointer to character) returns NULL
std.string.find(string, dchar) D returns −1
string.find(char«,startpos») C++ (STL) returns std::string::npos
pos(string, char«, startpos») Seed7 returns 0
strings.IndexRune(string,char) Go returns −1
string.indexOf(char«,startpos») Java, JavaScript returns −1
string.IndexOf(char«,startpos«, charcount»») VB .NET, C#, Windows PowerShell, F# returns −1
(position char string) Common Lisp returns NIL
(char-index char string) ISLISP returns nil
List.elemIndex char string Haskell (returns Just index) returns Nothing
String.index string char OCaml raises Not_found
position = SCAN (string, set «, back» «, kind») or position = VERIFY (string, set «, back» «, kind»)[a] Fortran returns zero
string indexOf: char ifAbsent: aBlock Smalltalk evaluates aBlock, which is a BlockClosure (or any object understanding value)
string indexOf: char Smalltalk returns 0
string includes: char Smalltalk returns true or false
index(string, char, startpos ) PL/I[20] returns 0 (positions start at 1)
string.index(?char) Ruby returns nil
strpos(string,char,startpos) PHP returns false
string.indexOf(char«,startpos«, charcount»») Cobra returns −1
string⍳char APL returns 1 + the last position in string
string.find(char) Rust[19] returns None
// Examples in C#
"Hello mate".IndexOf('e');              // returns 1
"word".IndexOf('z');                    // returns -1
; Examples in Common Lisp
(position #\e "Hello mate")             ;  returns 1
(position #\z "word")                   ;  returns NIL

^a Given a set of characters, SCAN returns the position of the first character found,[21] while VERIFY returns the position of the first character that does not belong to the set.[22]
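The footnote's description of SCAN and VERIFY translates directly into code. A Python sketch that mimics the two Fortran intrinsics (1-based result, 0 when nothing qualifies, optional backward search); scan and verify here are Python stand-ins rather than Fortran code, and the kind argument is omitted.

```python
def _positions(string: str, back: bool) -> range:
    """1-based positions, scanned from the end when back is True."""
    return range(len(string), 0, -1) if back else range(1, len(string) + 1)


def scan(string: str, chars: str, back: bool = False) -> int:
    """Position of the first (or last) character that IS in chars; 0 if none."""
    for pos in _positions(string, back):
        if string[pos - 1] in chars:
            return pos
    return 0


def verify(string: str, chars: str, back: bool = False) -> int:
    """Position of the first (or last) character NOT in chars; 0 if none."""
    for pos in _positions(string, back):
        if string[pos - 1] not in chars:
            return pos
    return 0


print(scan("Hello mate", "e"))             # 2
print(scan("Hello mate", "e", back=True))  # 10
print(verify("   abc", " "))               # 4  (first non-blank)
```

A typical use of verify is finding the first non-blank character, as in the last line.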

Format

Definition format(formatstring, items) returns string
Description Returns the formatted string representation of one or more items.
Format Languages Format string syntax
associate(file, string); putf(file, $formatstring$, items) ALGOL 68 ALGOL
Format(item, formatstring) VB
sprintf(formatstring, items) Perl, PHP, Raku, Ruby C
item.fmt(formatstring) Raku C
io_lib:format(formatstring, items) Erlang
sprintf(outputstring, formatstring, items) C C
std::format(formatstring, items) C++ (C++20) Python
std.string.format(formatstring, items) D C
Format(formatstring, items) Object Pascal (Delphi)
fmt.Sprintf(formatstring, items) Go C
printf formatstring items Unix C
formatstring % (items) Python, Ruby C
formatstring.format(items) Python .NET
f"formatstring" Python 3.6+
Printf.sprintf formatstring[23] items OCaml, F# C
Text.Printf.printf formatstring items Haskell (GHC) C
formatstring printf: items Smalltalk C
String.format(formatstring, items) Java C
String.Format(formatstring, items) VB .NET, C#, F# .NET
(format formatstring items) Scheme (SRFI 28) Lisp
(format nil formatstring items) Common Lisp Lisp
(format formatstring items) Clojure Lisp
formatstring -f items Windows PowerShell .NET
[NSString stringWithFormat:formatstring, items] Objective-C (NSString * only) C
String(format:formatstring, items) Swift (Foundation) C
string.format(formatstring, items) or (formatstring):format(items) Lua C
WRITE (outputstring, formatstring) items Fortran Fortran
put string(string) edit(items)(format) PL/I PL/I (similar to Fortran)
String.format(formatstring, items) Cobra .NET
format formatstring items Tcl C
formatnumbers ⍕ items or formatstring ⎕FMT items APL APL
format!(formatstring, items) Rust[24] Python
// Example in C#
String.Format("My {0} costs {1:C2}", "pen", 19.99); // returns "My pen costs $19.99"
// Example in Object Pascal (Delphi)
Format('My %s costs $%.2f', ['pen', 19.99]);        // returns "My pen costs $19.99"
// Example in Java
String.format("My %s costs $%.2f", "pen", 19.99);   // returns "My pen costs $19.99"
# Examples in Raku
sprintf "My %s costs \$%.2f", "pen", 19.99;          # returns "My pen costs $19.99"
1.fmt("%04d");                                       # returns "0001"
# Example in Python
"My %s costs $%.2f" % ("pen", 19.99)                 #  returns "My pen costs $19.99"
"My {0} costs ${1:.2f}".format("pen", 19.99)         #  returns "My pen costs $19.99"
# Example in Python 3.6+
pen = "pen"
f"My {pen} costs {19.99}"                            #  returns "My pen costs 19.99"
; Example in Scheme
(format "My ~a costs $~1,2F" "pen" 19.99)           ;  returns "My pen costs $19.99"
/* example in PL/I */
put string(some_string) edit('My ', 'pen', ' costs ', 19.99)(a,a,a,p'$$$V.99')
/* returns "My pen costs $19.99" */
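The three Python rows of the table can be compared side by side. A minimal sketch showing that the % operator, str.format and an f-string produce the same text here, and that in a printf-style specification a bare digit before f is a minimum field width, not a precision (%2f versus %.2f).

```python
item, price = "pen", 19.99

printf_style = "My %s costs $%.2f" % (item, price)   # C-style % operator
format_method = "My {0} costs ${1:.2f}".format(item, price)
f_string = f"My {item} costs ${price:.2f}"           # Python 3.6+

for result in (printf_style, format_method, f_string):
    print(result)                                    # My pen costs $19.99

# The digit before "f" is a minimum field width; precision needs a leading dot.
print("%2f" % price)    # 19.990000 (width 2, default precision 6)
print("%.2f" % price)   # 19.99
```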

Inequality


Tests if two strings are not equal. See also #Equality.

Format Languages
string1 ne string2 or string1 NE string2 ALGOL 68 (note: the operator "ne" is literally in bold type-font)
string1 /= string2 ALGOL 68, Ada, Erlang, Fortran, Haskell
string1 <> string2 BASIC, VB, VB .NET, Pascal, Object Pascal (Delphi), OCaml, PHP, Seed7, Standard ML, F#, COBOL, Cobra, Python 2 (deprecated)
string1 # string2 BASIC (some implementations)
string1 ne string2 Perl, Raku
(string<> string1 string2) Scheme (SRFI 13)
(string/= string1 string2) Common Lisp
(string/= string1 string2) ISLISP
(not= string1 string2) Clojure
string1 != string2 C++ (STL), C#, Go, JavaScript (loose inequality), PHP (loose inequality), Python, Ruby, Rust,[12] Swift, D, Mathematica
string1 !== string2 JavaScript, PHP
string1 \= string2 Rexx
string1 ¬= string2 PL/I
test string1 != string2 or [ string1 != string2 ] Bourne Shell
string1 -ne string2 or -not [string]::Equals(string1, string2) Windows PowerShell
string1 ~= string2 Lua, Smalltalk
string1≢string2 APL
string1.ne(string2) Rust[12]
// Example in C#
"hello" != "world"    // returns true
' Example in Visual Basic
"hello" <> "world"    '  returns true
;; Example in Clojure
(not= "hello" "world")  ; ⇒ true
# Example in Perl 5
'hello' ne 'world'      # returns 1
# Example in Raku
'hello' ne 'world'      # returns True
# Example in Windows PowerShell
"hello" -ne "world"   #  returns true

index


see #Find

indexof


see #Find

instr


see #Find

instrrev


see #rfind

join

Definition join(separator, list_of_strings) returns string
Description Joins the list of strings into a new string, with the separator string between each of the substrings. Opposite of split.
Related sprintf
Format Languages
std.string.join(array_of_strings, separator) D
string:join(list_of_strings, separator) Erlang
join(separator, list_of_strings) Perl, PHP, Raku
implode(separator, array_of_strings) PHP
separator.join(sequence_of_strings) Python, Swift 1.x
array_of_strings.join(separator) Ruby, JavaScript, Raku, Rust[25]
(string-join array_of_strings separator) Scheme (SRFI 13)
(format nil "~{~a~^separator~}" array_of_strings) Common Lisp
(clojure.string/join separator list_of_strings) or (apply str (interpose separator list_of_strings)) Clojure
strings.Join(array_of_strings, separator) Go
join(array_of_strings, separator) Seed7
String.concat separator list_of_strings OCaml
String.concatWith separator list_of_strings Standard ML
Data.List.intercalate separator list_of_strings Haskell (GHC 6.8+)
Join(array_of_strings, separator) VB
String.Join(separator, array_of_strings) VB .NET, C#, F#
String.join(separator, array_of_strings) Java 8+
&{$OFS=$separator; "$array_of_strings"} or array_of_strings -join separator Windows PowerShell
[array_of_strings componentsJoinedByString:separator] Objective-C (NSString * only)
table.concat(table_of_strings, separator) Lua
String streamContents: [ :stream | collectionOfAnything asStringOn: stream delimiter: separator ] or collectionOfAnything joinUsing: separator Smalltalk (Squeak, Pharo)
array_of_strings.join(separator«, final_separator») Cobra
sequence_of_strings.joinWithSeparator(separator) Swift 2.x
1↓∊separator,¨list_of_strings APL
// Example in C#
String.Join("-", new[] { "a", "b", "c" })  // "a-b-c"
" Example in Smalltalk "
#('a' 'b' 'c') joinUsing: '-'      " 'a-b-c' "
# Example in Perl 5
join( '-', ('a', 'b', 'c'));       # 'a-b-c'
# Example in Raku
<a b c>.join('-');       # 'a-b-c'
# Example in Python
"-".join(["a", "b", "c"])          #  'a-b-c'
# Example in Ruby
["a", "b", "c"].join("-")          #  'a-b-c'
; Example in Scheme
(use-modules (srfi srfi-13))
(string-join '("a" "b" "c") "-")   ;  "a-b-c"
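The Description calls join the opposite of split. A short Python sketch of when that holds: splitting on the separator recovers the original list only if no element itself contains the separator.

```python
parts = ["a", "b", "c"]
joined = "-".join(parts)
print(joined)                       # a-b-c

# split() reverses join() when no element contains the separator...
print(joined.split("-") == parts)   # True

# ...but the round trip is lossy once an element does contain it.
tricky = ["a-b", "c"]
print("-".join(tricky).split("-"))  # ['a', 'b', 'c']
```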

lastindexof


see #rfind

left

Definition left(string,n) returns string
Description Returns the leftmost n characters of a string. If n is greater than the length of the string, most implementations return the whole string (exceptions exist; see the code examples). Note that for variable-length encodings such as UTF-8, UTF-16 or Shift-JIS, cutting at an arbitrary code-unit position can split a multi-unit character, so it may be necessary to drop the trailing partial character to avoid producing an invalid string.
Format Languages
string (string'First .. string'First + n - 1) Ada
substr(string, 0, n) Perl, PHP, Raku
substr(string, 1, n) AWK
LEFT$(string,n) BASIC, VB
left(string,n) VB, FreeBASIC, Ingres, Pick Basic
strncpy(string2, string, n) C standard library (copies into string2; the result is not null-terminated if string has n or more characters)
string.substr(0,n) C++ (STL), Raku
[string substringToIndex:n] Objective-C (NSString * only)
(apply str (take n string)) Clojure
string[0 .. n] D[26]
string:substr(string, 1, n) Erlang
(subseq string 0 n) Common Lisp
string[:n] Cobra, Go, Python
left(string,n «,padchar») Rexx, Erlang
string[0, n] or string[0..n - 1] Ruby
string[1, n] Pick Basic
string[ .. n] Seed7
string.Substring(0,n) VB .NET, C#, Windows PowerShell, F#
leftstr(string, n) Pascal, Object Pascal (Delphi)
copy (string,1,n) Turbo Pascal
string.substring(0,n) Java,[27] JavaScript
(string-take string n) Scheme (SRFI 13)
take n string Haskell
String.extract (string, n, NONE) Standard ML
String.sub string 0 n OCaml[28]
string.[..n - 1] F#
string.sub(string, 1, n) or (string):sub(1, n) Lua
string first: n Smalltalk (Squeak, Pharo)
string(:n) Fortran
StringTake[string, n] Mathematica[29]
string(1:n) COBOL
string.substring(0, n) Cobra
n↑string APL
string[0..n], string[..n], string.get(0..n) or string.get(..n) Rust[30]
# Example in Raku
"Hello, there!".substr(0, 6);  # returns "Hello,"
/* Examples in Rexx */
left("abcde", 3)         /* returns "abc"      */
left("abcde", 8)         /* returns "abcde   " */
left("abcde", 8, "*")    /* returns "abcde***" */
; Examples in Scheme
(use-modules (srfi srfi-13))
(string-take "abcde" 3) ;  returns "abc"
(string-take "abcde" 8) ;  error
' Examples in Visual Basic
Left("sandroguidi", 3)   '  returns "san"
Left("sandroguidi", 100) '  returns "sandroguidi"
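The clamping behaviour described above can be sketched in Python, where slicing never raises for an out-of-range end index. The helper name `left` and its optional Rexx-style `pad` parameter are illustrative assumptions, not a standard Python API.

```python
from typing import Optional


def left(s: str, n: int, pad: Optional[str] = None) -> str:
    """Return the leftmost n characters of s (hypothetical helper).

    Slicing clamps an out-of-range end index, so a too-large n simply
    yields the whole string, as with VB's Left. If pad (one character)
    is given, the result is padded on the right to exactly n
    characters, mimicking Rexx's left(string, n, padchar).
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    result = s[:n]
    if pad is not None:
        result = result.ljust(n, pad)
    return result


print(left("sandroguidi", 3))    # san
print(left("sandroguidi", 100))  # sandroguidi
print(left("abcde", 8, "*"))     # abcde***
```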


len

see #length


length

Definition length(string) returns an integer number
Description Returns the length of a string (not counting the null terminator or any other internal structural information of the string). An empty string returns a length of 0.
Format Returns Languages
string'Length Ada
UPB string ALGOL 68
echo "${#string_param}" Bash
length(string) Ingres, Perl 5, Pascal, Object Pascal (Delphi), Rexx, Seed7, SQL, PL/I
len(string) BASIC, FreeBASIC, Python, Go, Pick Basic
length(string), string:len(string) Erlang
Len(string) VB, Pick Basic
string.Length Number of UTF-16 code units VB .NET, C#, Windows PowerShell, F#
chars(string) or string.chars Number of graphemes (NFG) Raku
codes(string) or string.codes Number of Unicode code points Raku
string.size or string.length Number of characters (number of bytes in Ruby 1.8)[31] Ruby
strlen(string) Number of bytes C, PHP
string.length() C++ (STL)
string.length Cobra, D, JavaScript
string.length() Number of UTF-16 code units Java
(string-length string) Scheme
(length string) Common Lisp, ISLISP
(count string) Clojure
String.length string OCaml
size string Standard ML
length string Number of Unicode code points Haskell
string.length Number of UTF-16 code units Objective-C (NSString * only)
string.characters.count Number of characters Swift (2.x)
count(string) Number of characters Swift (1.2)
countElements(string) Number of characters Swift (1.0–1.1)
string.len(string), (string):len() or #string Lua
string size Smalltalk
LEN(string) or LEN_TRIM(string) Fortran
StringLength[string] Mathematica
«FUNCTION» LENGTH(string) or «FUNCTION» BYTE-LENGTH(string) Number of characters and number of bytes, respectively COBOL
string length string a decimal string giving the number of characters Tcl
≢string APL
string.len() Number of bytes Rust[32]
string.chars().count() Number of Unicode code points Rust[33]
// Examples in C#
"hello".Length;      // returns 5
"".Length;           // returns 0
% Examples in Erlang
string:len("hello"). %  returns 5
string:len("").      %  returns 0
# Examples in Perl 5
length("hello");     #  returns 5
length("");          #  returns 0
# Examples in Raku
"".chars; chars "";          # both return 0
"".codes; codes "";          # both return 0
' Examples in Visual Basic
Len("hello")         '  returns 5
Len("")              '  returns 0
// Examples in Objective-C
[@"hello" length];   // returns 5
[@"" length];        // returns 0
-- Examples in Lua
("hello"):len() -- returns 5
#"" -- returns 0
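Because the table reports lengths in different units (bytes, UTF-16 code units, code points, graphemes), one string can have several "lengths". A short Python sketch that measures the same string three ways; Python's `len` counts code points, and the encoded lengths stand in for what `strlen` on UTF-8 data or a UTF-16-based `length` would report.

```python
s = "na\u00efve \U0001F600"  # "naïve " plus U+1F600; 'ï' is 2 bytes in UTF-8, U+1F600 is 4

code_points = len(s)                           # what Python's len() counts
utf8_bytes = len(s.encode("utf-8"))            # cf. strlen() on UTF-8 data
utf16_units = len(s.encode("utf-16-le")) // 2  # cf. Java/C#/JavaScript length

print(code_points, utf8_bytes, utf16_units)    # 7 11 8
```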

locate

see #Find


Lowercase

Definition lowercase(string) returns string
Description Returns the string in lower case.
Format Languages
LCase(string) VB
lcase(string) FreeBASIC
lc(string) Perl, Raku
string.lc Raku
tolower(char) C[34]
std.string.toLower(string) D
transform(string.begin(), string.end(), result.begin(), ::tolower)[35] C++[36]
lowercase(string) Object Pascal (Delphi)
strtolower(string) PHP
lower(string) Seed7
${string_param,,} Bash
echo "string" | tr 'A-Z' 'a-z' Unix
string.lower() Python
downcase(string) Pick Basic
string.downcase Ruby[37]
strings.ToLower(string) Go
(string-downcase string) Scheme (R6RS), Common Lisp
(lower-case string) Clojure
String.lowercase string OCaml
String.map Char.toLower string Standard ML
map Char.toLower string Haskell
string.toLowerCase() Java, JavaScript
to_lower(string) Erlang
string.ToLower() VB .NET, C#, Windows PowerShell, F#
string.lowercaseString Objective-C (NSString * only), Swift (Foundation)
string.lower(string) or (string):lower() Lua
string asLowercase Smalltalk
LOWER(string) SQL
lowercase(string) PL/I[8]
ToLowerCase[string] Mathematica
«FUNCTION» LOWER-CASE(string) COBOL
string.toLower Cobra
string tolower string Tcl
string.to_lowercase() Rust[38]
// Example in C#
"Wiki means fast?".ToLower();        // "wiki means fast?"
; Example in Scheme
(use-modules (srfi srfi-13))
(string-downcase "Wiki means fast?") ;  "wiki means fast?"
/* Example in C */
#include <ctype.h>
#include <stdio.h>

int main(void) {
    char s[] = "Wiki means fast?";
    for (size_t i = 0; s[i] != '\0'; ++i) {
        // transform characters in place, one by one; the cast to
        // unsigned char avoids undefined behavior for negative char values
        s[i] = (char) tolower((unsigned char) s[i]);
    }
    printf("%s\n", s); // "wiki means fast?"
    return 0;
}
# Example in Raku
"Wiki means fast?".lc;             # "wiki means fast?"

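Lowercasing both sides is not always enough for a case-insensitive comparison. In Python, `str.lower()` leaves the German "ß" unchanged, whereas `str.casefold()` applies Unicode full case folding and maps it to "ss":

```python
a = "Stra\u00dfe"   # "Straße"
b = "STRASSE"

print(a.lower() == b.lower())        # False: 'ß'.lower() is still 'ß'
print(a.casefold() == b.casefold())  # True: casefold() maps 'ß' to 'ss'
```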

mid

see #substring


partition

Definition <string>.partition(separator) returns the sub-string before the separator; the separator; then the sub-string after the separator.
Description Splits the given string by the separator and returns the three substrings that together make the original.
Format Languages Comments
string.partition(separator) Python, Ruby(1.9+)
lists:partition(pred, string) Erlang
split /(separator)/, string, 2 Perl 5
split separator, string, 2 or string.split(separator, 2) Raku Separator does not have to be a regular expression
# Examples in Python
"Spam eggs spam spam and ham".partition('spam')   # ('Spam eggs ', 'spam', ' spam and ham')
"Spam eggs spam spam and ham".partition('X')      # ('Spam eggs spam spam and ham', '', '')
# Examples in Perl 5 / Raku
split /(spam)/, 'Spam eggs spam spam and ham' ,2;   # ('Spam eggs ', 'spam', ' spam and ham');
split /(X)/, 'Spam eggs spam spam and ham' ,2;      # ('Spam eggs spam spam and ham');
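The three-part result can be built from a plain substring search. A minimal Python sketch; `partition_sketch` is a hypothetical name that reimplements what the built-in `str.partition` already does, only to show the logic.

```python
from typing import Tuple


def partition_sketch(s: str, sep: str) -> Tuple[str, str, str]:
    """Split s at the first occurrence of sep into (before, sep, after)."""
    if not sep:
        raise ValueError("empty separator")  # str.partition rejects this too
    i = s.find(sep)
    if i == -1:
        # Separator absent: the whole string comes first, then two empty strings.
        return (s, "", "")
    return (s[:i], sep, s[i + len(sep):])


print(partition_sketch("Spam eggs spam spam and ham", "spam"))
# ('Spam eggs ', 'spam', ' spam and ham')
```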


replace

Definition replace(string, find, replace) returns string
Description Returns a string in which occurrences of find (usually all of them) are changed to replace.
Format Languages
changestr(find, string, replace) Rexx
std.string.replace(string, find, replace) D
Replace(string, find, replace) VB
replace(string, find, replace) Seed7
change(string, find, replace) Pick Basic
string.Replace(find, replace) C#, F#, VB .NET
str_replace(find, replace, string) PHP
re:replace(string, find, replace, «{return, list}») Erlang
string.replace(find, replace) Cobra, Java (1.5+), Python, Rust[39]
string.replaceAll(find_regex, replace)[40] Java
string.gsub(find, replace) Ruby
string =~ s/find_regex/replace/g[40] Perl 5
string.subst(find, replace, :g) Raku
string.replace(find, replace, "g")[41] or string.replace(/find_regex/g, replace)[40] JavaScript
echo "string" | sed 's/find_regex/replace/g'[40] Unix
${string_param//find_pattern/replace} Bash
string.replace(find, replace) or string -replace find_regex, replace[40] Windows PowerShell
Str.global_replace (Str.regexp_string find) replace string OCaml
[string stringByReplacingOccurrencesOfString:find withString:replace] Objective-C (NSString * only)
string.stringByReplacingOccurrencesOfString(find, withString:replace) Swift (Foundation)
string.gsub(string, find, replace) or (string):gsub(find, replace) Lua
string copyReplaceAll: find with: replace Smalltalk (Squeak, Pharo)
string map {find replace} string Tcl
StringReplace[string, find -> replace] Mathematica
strings.Replace(string, find, replace, -1) Go
INSPECT string REPLACING ALL/LEADING/FIRST find BY replace COBOL
find_regex ⎕R replace_regex⊢string APL
// Examples in C#
"effffff".Replace("f", "jump");     // returns "ejumpjumpjumpjumpjumpjump"
"blah".Replace("z", "y");           // returns "blah"
// Examples in Java
"effffff".replace("f", "jump");     // returns "ejumpjumpjumpjumpjumpjump"
"effffff".replaceAll("f+", "jump"); // returns "ejump"
# Examples in Raku
"effffff".subst("f", "jump", :g);    # returns "ejumpjumpjumpjumpjumpjump"
"blah".subst("z", "y", :g);          # returns "blah"
' Examples in Visual Basic
Replace("effffff", "f", "jump")     '  returns "ejumpjumpjumpjumpjumpjump"
Replace("blah", "z", "y")           '  returns "blah"
# Examples in Windows PowerShell
"effffff" -replace "f", "jump"      #  returns "ejumpjumpjumpjumpjumpjump"
"effffff" -replace "f+", "jump"     #  returns "ejump"

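The examples above mix literal replacement (C#, VB, Java's `replace`) with regular-expression replacement (Java's `replaceAll`, PowerShell's `-replace`). In Python the two are separate functions: `str.replace` treats find literally and accepts an optional maximum count, while `re.sub` takes a pattern.

```python
import re

s = "effffff"
print(s.replace("f", "jump"))     # ejumpjumpjumpjumpjumpjump (literal, all occurrences)
print(s.replace("f", "jump", 2))  # ejumpjumpffff (only the first two)
print(re.sub(r"f+", "jump", s))   # ejump (one regex match covers the whole run)
print("a.b".replace(".", "!"))    # a!b (the dot is an ordinary character here)
print(re.sub(r".", "!", "a.b"))   # !!! (the dot matches any character)
```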
reverse

Definition reverse(string)
Description Reverses the order of the characters in the string.
Format Languages
scalar reverse string Perl 5
reverse string Haskell
flip string or string.flip Raku
lists:reverse(string) Erlang
strrev(string) PHP
string[::-1] Python
(string-reverse string) Scheme (SRFI 13)
(reverse string) Common Lisp
string.reverse Ruby, D (modifies string)
new StringBuilder(string).reverse().toString() Java
std::reverse(string.begin(), string.end()); C++ (std::string only, modifies string)
StrReverse(string) VB
string.Reverse() VB .NET, C#
implode (rev (explode string)) Standard ML
string.split("").reverse().join("") JavaScript
string.reverse(string) or (string):reverse() Lua
string reverse Smalltalk
StringReverse[string] Mathematica
reverse(string) PL/I
«FUNCTION» REVERSE(string) COBOL
string.toCharArray.toList.reversed.join() Cobra
String(string.characters.reverse()) Swift (2.x)
String(reverse(string)) Swift (1.2)
string reverse string Tcl
⌽string APL
string.chars().rev().collect::<String>() Rust[42]
echo string | rev Unix
" Example in Smalltalk "
'hello' reversed             " returns 'olleh' "
# Example in Perl 5
scalar reverse "hello"       # returns "olleh" (a plain reverse in list context returns "hello")
# Example in Raku
"hello".flip                 # returns "olleh"
# Example in Python
"hello"[::-1]                # returns "olleh"
; Example in Scheme
(use-modules (srfi srfi-13))
(string-reverse "hello")     ; returns "olleh"
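Reversing a string code point by code point, as most of the one-liners above do, can reorder combining characters. A Python illustration: "é" written as "e" plus U+0301 COMBINING ACUTE ACCENT ends up with the accent in front of the wrong letter. Normalizing to the composed form (NFC) first avoids the problem in this particular case, though not for every grapheme cluster.

```python
import unicodedata

s = "cafe\u0301"   # "café" spelled with a combining accent (5 code points)
naive = s[::-1]                                   # the accent now comes first
composed = unicodedata.normalize("NFC", s)[::-1]  # "éfac", as intended

print([hex(ord(c)) for c in naive])  # ['0x301', '0x65', '0x66', '0x61', '0x63']
print(composed == "\u00e9fac")       # True
```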

rfind

Definition rfind(string,substring) returns integer
Description Returns the position of the start of the last occurrence of substring in string. If the substring is not found most of these routines return an invalid index value – -1 where indexes are 0-based, 0 where they are 1-based – or some value to be interpreted as Boolean FALSE.
Related instr
Format Languages If not found
InStrRev(string,substring«,startpos») VB returns 0
instrrev(string,substring«,startpos») FreeBASIC returns 0
rindex(string,substring«,startpos») Perl 5 returns −1
rindex(string,substring«,startpos») or string.rindex(substring«,startpos») Raku returns Nil
strrpos(string,substring«,startpos») PHP returns FALSE
string.rfind(substring«,startpos») C++ (STL) returns std::string::npos
std.string.rfind(string, substring) D returns −1
string.rfind(substring«,startpos«, endpos»») Python returns −1
string.rindex(substring«,startpos«, endpos»») Python raises ValueError
rpos(string, substring«,startpos») Seed7 returns 0
string.rindex(substring«,startpos») Ruby returns nil
strings.LastIndex(string, substring) Go returns −1
string.lastIndexOf(substring«,startpos») Java, JavaScript returns −1
string.LastIndexOf(substring«,startpos«, charcount»») VB .NET, C#, Windows PowerShell, F# returns −1
(search substring string :from-end t) Common Lisp returns NIL
[string rangeOfString:substring options:NSBackwardsSearch].location Objective-C (NSString * only) returns NSNotFound
Str.search_backward (Str.regexp_string substring) string (Str.length string - 1) OCaml raises Not_found
string.match(string, '.*()'..substring) or string:match('.*()'..substring) Lua returns nil
Ada.Strings.Unbounded.Index(Source => string, Pattern => substring, Going => Ada.Strings.Backward) Ada returns 0
string.lastIndexOf(substring«,startpos«, charcount»») Cobra returns −1
string lastIndexOfString:substring Smalltalk returns 0
string last substring string startpos Tcl returns −1
(⌽<\substring'string')1 APL returns −1
string.rfind(substring) Rust[43] returns None
; Examples in Common Lisp
(search "e" "Hello mate" :from-end t)     ;  returns 9
(search "z" "word" :from-end t)           ;  returns NIL
// Examples in C#
"Hello mate".LastIndexOf("e");           // returns 9
"Hello mate".LastIndexOf("e", 4);        // returns 1
"word".LastIndexOf("z");                 // returns -1
# Examples in Perl 5
rindex("Hello mate", "e");               # returns 9
rindex("Hello mate", "e", 4);            # returns 1
rindex("word", "z");                     # returns -1
# Examples in Raku
"Hello mate".rindex("e");                # returns 9
"Hello mate".rindex("e", 4);             # returns 1
"word".rindex('z');                      # returns Nil
' Examples in Visual Basic
InStrRev("Hello mate", "e")              '  returns 10
InStrRev("Hello mate", "e", 5)           '  returns 2
InStrRev("word", "z")                    '  returns 0
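Python's `str` offers both failure conventions listed in the table: `rfind` returns −1, while `rindex` raises `ValueError`. The optional start and end arguments restrict the search to a slice, which reproduces the C# and Perl examples that search backwards from position 4 (Python's end bound is exclusive, so position 4 is covered by `end=5`).

```python
s = "Hello mate"
print(s.rfind("e"))        # 9
print(s.rfind("e", 0, 5))  # 1  (search only s[0:5], i.e. "Hello")
print("word".rfind("z"))   # -1

try:
    "word".rindex("z")
except ValueError:
    print("not found")     # rindex raises instead of returning -1
```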


right
Definition right(string,n) returns string
Description Returns the rightmost n characters of a string. If n is greater than the length of the string, most implementations return the whole string (exceptions exist; see the code examples).
Format Languages
string (string'Last - n + 1 .. string'Last) Ada
Right(string,n) VB
RIGHT$(string,n) BASIC
right(string,n) FreeBASIC, Ingres, Pick Basic
strcpy(string2, string + strlen(string) - n) (n must not be greater than the length of string) C
string.Substring(string.Length - n) C#
string[len(string)-n:] Go
string.substring(string.length()-n) Java
string.slice(-n) JavaScript[44]
right(string,n «,padchar») Rexx, Erlang
substr(string,-n) Perl 5, PHP
substr(string,*-n) or string.substr(*-n) Raku
string[-n:] Cobra, Python
${string_param: -n} (note the space after the colon) Bash
string[n] Pick Basic
(string-take-right string n) Scheme (SRFI 13)
string[-n..-1] Ruby
string[$-n .. $] D[45]
String.sub string (String.length string - n) n OCaml[28]
string.sub(string, -n) or (string):sub(-n) Lua
string last: n Smalltalk (Squeak, Pharo)
StringTake[string, -n] Mathematica[29]
string («FUNCTION» LENGTH(string) - n + 1:n) COBOL
¯n↑string APL
string[string.len() - n..] or string.get(string.len() - n..) Rust[30]
// Examples in Java; extract rightmost 4 characters
String str = "CarDoor";
str.substring(str.length() - 4); // returns "Door"
# Examples in Raku
"abcde".substr(*-3);          # returns "cde"
"abcde".substr(*-8);          # 'out of range' error
/* Examples in Rexx */
right("abcde", 3)              /* returns "cde"      */
right("abcde", 8)              /* returns "   abcde" */
right("abcde", 8, "*")         /* returns "***abcde" */
; Examples in Scheme
(use-modules (srfi srfi-13))
(string-take-right "abcde" 3) ;  returns "cde"
(string-take-right "abcde" 8) ;  error
' Examples in Visual Basic
Right("sandroguidi", 3)        '  returns "idi"
Right("sandroguidi", 100)      '  returns "sandroguidi"
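A pitfall with the negative-index form `string[-n:]` listed for Python and Cobra: when n is 0, `-0` is simply 0, so the slice returns the whole string instead of an empty one. A hypothetical `right` helper in Python that avoids this by computing the start position explicitly:

```python
def right(s: str, n: int) -> str:
    """Return the rightmost n characters of s (hypothetical helper)."""
    if n < 0:
        raise ValueError("n must be non-negative")
    # max() clamps the start index, so n > len(s) yields the whole string,
    # as with VB's Right.
    return s[max(len(s) - n, 0):]


print("abcde"[-0:])               # abcde  (the pitfall)
print(repr(right("abcde", 0)))    # ''
print(right("abcde", 3))          # cde
print(right("sandroguidi", 100))  # sandroguidi
```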


rpartition

Definition <string>.rpartition(separator) Searches for the separator from right-to-left within the string then returns the sub-string before the separator; the separator; then the sub-string after the separator.
Description Splits the given string by the right-most separator and returns the three substrings that together make the original.
Format Languages
string.rpartition(separator) Python, Ruby
# Examples in Python
"Spam eggs spam spam and ham".rpartition('spam')  # ('Spam eggs spam ', 'spam', ' and ham')
"Spam eggs spam spam and ham".rpartition('X')     # ('', '', 'Spam eggs spam spam and ham')

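`rpartition` mirrors partition but searches from the right. A minimal sketch built on `str.rfind`; `rpartition_sketch` is a hypothetical name. Note the asymmetry when the separator is missing: the whole string goes in the last slot rather than the first.

```python
from typing import Tuple


def rpartition_sketch(s: str, sep: str) -> Tuple[str, str, str]:
    """Split s at the last occurrence of sep into (before, sep, after)."""
    if not sep:
        raise ValueError("empty separator")
    i = s.rfind(sep)
    if i == -1:
        return ("", "", s)  # unlike partition, the string ends up last
    return (s[:i], sep, s[i + len(sep):])


print(rpartition_sketch("Spam eggs spam spam and ham", "spam"))
# ('Spam eggs spam ', 'spam', ' and ham')
```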
slice

see #substring


split

Definition <string>.split(separator[, limit]) splits a string on separator, optionally only up to a limited number of substrings
Description Splits the given string by occurrences of the separator (itself a string) and returns a list (or array) of the substrings. If limit is given, after limit – 1 separators have been read, the rest of the string is made into the last substring, regardless of whether it has any separators in it. The Scheme and Erlang implementations are similar but differ in several ways. JavaScript also differs: its limit truncates the resulting array instead of putting the rest of the string into the last element. The Cobra implementation splits on whitespace by default. Split is the opposite of join.
Format Languages
split(/separator/, string«, limit») Perl 5
split(separator, string«, limit») or string.split(separator, «limit») Raku
explode(separator, string«, limit») PHP
string.split(separator«, limit-1») Python
string.split(separator«, limit») JavaScript, Java, Ruby
string:tokens(string, sepchars) Erlang
strings.Split(string, separator) or strings.SplitN(string, separator, limit) Go
(string-tokenize string« charset« start« end»»») Scheme (SRFI 13)
Split(string, sepchars«, limit») VB
string.Split(sepchars«, limit«, options»») VB .NET, C#, F#
string -split separator«, limit«, options»» Windows PowerShell
Str.split (Str.regexp_string separator) string OCaml
std.string.split(string, separator) D
[string componentsSeparatedByString:separator] Objective-C (NSString * only)
string.componentsSeparatedByString(separator) Swift (Foundation)
TStringList.Delimiter, TStringList.DelimitedText Object Pascal
StringSplit[string, separator«, limit»] Mathematica
string.split«(sepchars«, limit«, options»»)» Cobra
split string separator Tcl
(separator≠string)⊂string in APL2, or separator(≠⊆⊢)string in Dyalog APL 16.0 APL
string.split(separator) or string.splitn(limit, separator) Rust[46]
// Example in C#
"abc,defgh,ijk".Split(',');                 // {"abc", "defgh", "ijk"}
"abc,defgh;ijk".Split(',', ';');            // {"abc", "defgh", "ijk"}
% Example in Erlang
string:tokens("abc;defgh;ijk", ";").        %  ["abc", "defgh", "ijk"]
// Examples in Java
"abc,defgh,ijk".split(",");                 // {"abc", "defgh", "ijk"}
"abc,defgh;ijk".split(",|;");               // {"abc", "defgh", "ijk"}
{ Example in Pascal }
var
  lStrings: TStringList;
  lStr: string;
begin
  lStrings := TStringList.Create;
  try
    lStrings.Delimiter := ',';
    lStrings.DelimitedText := 'abc,defgh,ijk';
    lStr := lStrings.Strings[0]; // 'abc'
    lStr := lStrings.Strings[1]; // 'defgh'
    lStr := lStrings.Strings[2]; // 'ijk'
  finally
    lStrings.Free; // release the list
  end;
end;
# Examples in Perl 5
split(/spam/, 'Spam eggs spam spam and ham'); # ('Spam eggs ', ' ', ' and ham')
split(/X/, 'Spam eggs spam spam and ham');    # ('Spam eggs spam spam and ham')
# Examples in Raku
'Spam eggs spam spam and ham'.split(/spam/);  # (Spam eggs     and ham)
split(/X/, 'Spam eggs spam spam and ham');    # (Spam eggs spam spam and ham)
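The two meanings of limit mentioned in the description can be contrasted in Python. `str.split` takes a maximum number of splits, so a limit of n corresponds to `maxsplit=n - 1` and the remainder stays in the last element (as in Java, Perl or Ruby). JavaScript's limit instead truncates the array, which in Python amounts to splitting everywhere and slicing:

```python
s = "a,b,c,d"
limit = 2

# Remainder kept in the last element (Python maxsplit = limit - 1):
print(s.split(",", limit - 1))  # ['a', 'b,c,d']

# JavaScript-style truncation, as with "a,b,c,d".split(",", 2):
print(s.split(",")[:limit])     # ['a', 'b']
```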


sprintf

see #Format

strip

see #trim


strcmp

see #Compare (integer result)


substring

Definition substring(string, startpos, endpos) returns string
substr(string, startpos, numChars) returns string
Description Returns the substring of string that starts at startpos and ends just before endpos, or that starts at startpos and is numChars characters long. The result is truncated if fewer than numChars characters follow the starting point. endpos is the index after the last character of the substring. Note that for variable-length encodings such as UTF-8, UTF-16 or Shift-JIS, it may be necessary to drop a partial character at the end in order to avoid producing an invalid string.
Format Languages
string[startpos:endpos] ALGOL 68 (changes base index)
string (startpos .. endpos) Ada (changes base index)
Mid(string, startpos, numChars) VB
mid(string, startpos, numChars) FreeBASIC
string[startpos+(⍳numChars)-~⎕IO] APL
MID$(string, startpos, numChars) BASIC
substr(string, startpos, numChars) AWK (1-based), Perl 5,[47][48] PHP[47][48]
substr(string, startpos, numChars) or string.substr(startpos, numChars) Raku[49][50]
substr(string, startpos «,numChars, padChar») Rexx
string[startpos:endpos] Cobra, Python,[47][51] Go
string[startpos, numChars] Pick Basic
string[startpos, numChars], string[startpos .. endpos-1] or string[startpos ... endpos] Ruby[47][51]
string[startpos .. endpos] or string[startpos len numChars] Seed7
string.slice(startpos«, endpos») JavaScript[47][51]
string.substr(startpos«, numChars») C++ (STL), JavaScript
string.Substring(startpos, numChars) VB .NET, C#, Windows PowerShell, F#
string.substring(startpos«, endpos») Java, JavaScript
copy(string, startpos, numChars) Object Pascal (Delphi)
(substring string startpos endpos) Scheme
(subseq string startpos endpos) Common Lisp
(subseq string startpos endpos) ISLISP
String.sub string startpos numChars OCaml
substring (string, startpos, numChars) Standard ML
string:sub_string(string, startpos, endpos) or string:substr(string, startpos, numChars) Erlang
strncpy(result, string + startpos, numChars); (result[numChars] must then be set to '\0' explicitly) C
string[startpos .. endpos] D
take numChars $ drop startpos string Haskell
[string substringWithRange:NSMakeRange(startpos, numChars)] Objective-C (NSString * only)
string.[startpos..endpos - 1] F# (slice bounds are inclusive)
string.sub(string, startpos, endpos) or (string):sub(startpos, endpos) Lua[47][51]
string copyFrom: startpos to: endpos Smalltalk
string(startpos:endpos) Fortran
SUBSTRING(string FROM startpos «FOR numChars») SQL
StringTake[string, {startpos, endpos}] Mathematica[47][51]
string (startpos:numChars) COBOL
${string_param:startpos:numChars} Bash
string range string startpos endpos Tcl
string[startpos..endpos] or string.get(startpos..endpos) Rust[30]
// Examples in C#
"abc".Substring(1, 1);      // returns "b"
"abc".Substring(1, 2);      // returns "bc"
"abc".Substring(1, 6);      // throws ArgumentOutOfRangeException
;; Examples in Common Lisp
(subseq "abc" 1 2)          ; returns "b"
(subseq "abc" 2)            ; returns "c"
% Examples in Erlang
string:substr("abc", 2, 1). %  returns "b"
string:substr("abc", 2).    %  returns "bc"
# Examples in Perl 5
substr("abc", 1, 1);       #  returns "b"
substr("abc", 1);          #  returns "bc"
# Examples in Raku
"abc".substr(1, 1);        #  returns "b"
"abc".substr(1);           #  returns "bc"
# Examples in Python
"abc"[1:2]                 #  returns "b"
"abc"[1:3]                 #  returns "bc"
/* Examples in Rexx */
substr("abc", 2, 1)         /* returns "b"      */
substr("abc", 2)            /* returns "bc"     */
substr("abc", 2, 6)         /* returns "bc    " */
substr("abc", 2, 6, "*")    /* returns "bc****" */
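The table mixes two argument conventions: (startpos, endpos) with an exclusive endpos, and (startpos, numChars). For 0-based indexing the two are related by endpos = startpos + numChars. A hypothetical Python helper for the numChars form; because slicing clamps, it truncates like Perl's substr rather than throwing like C#'s Substring.

```python
def substr(s: str, start: int, num_chars: int) -> str:
    """Return up to num_chars characters of s beginning at 0-based start.

    Hypothetical helper: converts (start, length) into the slice
    s[start:end] with end = start + num_chars.
    """
    if start < 0 or num_chars < 0:
        raise ValueError("start and num_chars must be non-negative")
    return s[start:start + num_chars]


print(substr("abc", 1, 1))  # b
print(substr("abc", 1, 2))  # bc
print(substr("abc", 1, 6))  # bc (truncated; C# would throw here)
```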


Uppercase

Definition uppercase(string) returns string
Description Returns the string in upper case.
Format Languages
UCase(string) VB
ucase(string) FreeBASIC
toupper(string) AWK
uc(string) Perl, Raku
string.uc Raku
toupper(char) C (operates on one character)
for (size_t i = 0, len = strlen(string); i < len; i++) string[i] = toupper(string[i]); or for (char* c = string; *c != '\0'; c++) *c = toupper(*c); C (string / char array)
std.string.toUpper(string) D
transform(string.begin(), string.end(), result.begin(), toupper)[35] C++[52]
uppercase(string) Object Pascal (Delphi)
upcase(char) Object Pascal (Delphi) (operates on one character)
strtoupper(string) PHP
upper(string) Seed7
${string_param^^} (mnemonic: ^ is pointing up) Bash
echo "string" | tr 'a-z' 'A-Z' Unix
translate(string), UPPER variables or PARSE UPPER VAR SrcVar DstVar Rexx
string.upper() Python
upcase(string) Pick Basic
string.upcase Ruby[37]
strings.ToUpper(string) Go
(string-upcase string) Scheme, Common Lisp
String.uppercase string OCaml
String.map Char.toUpper string Standard ML
map Char.toUpper string Haskell
string.toUpperCase() Java, JavaScript
string.uppercase() Kotlin[53]
to_upper(string) Erlang
string.ToUpper() VB .NET, C#, Windows PowerShell, F#
string.uppercaseString Objective-C (NSString * only), Swift (Foundation)
string.upper(string) or (string):upper() Lua
string asUppercase Smalltalk
UPPER(string) SQL
ToUpperCase[string] Mathematica
«FUNCTION» UPPER-CASE(string) COBOL
string.toUpper Cobra
string toupper string Tcl
string.to_uppercase() Rust[54]
// Example in C#
"Wiki means fast?".ToUpper();      // "WIKI MEANS FAST?"
# Example in Perl 5
uc("Wiki means fast?");             # "WIKI MEANS FAST?"
# Example in Raku
uc("Wiki means fast?");             # "WIKI MEANS FAST?"
"Wiki means fast?".uc;              # "WIKI MEANS FAST?"
/* Example in Rexx */
translate("Wiki means fast?")      /* "WIKI MEANS FAST?" */

/* Example #2 */
A='This is an example.'
UPPER A                            /* "THIS IS AN EXAMPLE." */

/* Example #3 */
A='upper using Translate Function.'
PARSE UPPER VAR A Z                /* Z="UPPER USING TRANSLATE FUNCTION." */
; Example in Scheme
(use-modules (srfi srfi-13))
(string-upcase "Wiki means fast?") ;  "WIKI MEANS FAST?"
' Example in Visual Basic
UCase("Wiki means fast?")          '  "WIKI MEANS FAST?"
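Unicode case mapping is not always one character to one character, which the per-character C loops in the table cannot express. In Python, `str.upper()` applies the full Unicode mapping, so the German "ß" becomes "SS" and the string grows by one character:

```python
s = "Stra\u00dfe"                 # "Straße", 6 characters
upper = s.upper()
print(upper, len(s), len(upper))  # STRASSE 6 7
```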

trim

trim or strip is used to remove whitespace from the beginning, end, or both beginning and end, of a string.

Example usage Languages
String.Trim([chars]) C#, VB.NET, Windows PowerShell
string.strip(); D
(.trim string) Clojure
sequence [ predicate? ] trim Factor
(string-trim '(#\Space #\Tab #\Newline) string) Common Lisp
(string-trim string) Scheme
string.trim() Java, JavaScript (1.8.1+, Firefox 3.5+), Rust[55]
Trim(String) Pascal,[56] QBasic, Visual Basic, Delphi
string.strip() Python
strings.Trim(string, chars) Go
LTRIM(RTRIM(String)) Oracle SQL, T-SQL
strip(string [,option, char]) REXX
string:strip(string [,option, char]) Erlang
string.strip, string.lstrip or string.rstrip Ruby
string.trim Raku
trim(string) PHP, Raku
[string stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]] Objective-C using Cocoa
string withBlanksTrimmed, string withoutSpaces or string withoutSeparators Smalltalk (Squeak, Pharo)
strip(string) SAS
string trim $string Tcl
TRIM(string) or TRIM(ADJUSTL(string)) Fortran
TRIM(string) SQL
TRIM(string), LTrim(string) or RTrim(string) ColdFusion
String.trim string OCaml 4+

Other languages

In languages without a built-in trim function, it is usually simple to create a custom function which accomplishes the same task.
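A sketch of such a custom function, written in Python for illustration (Python itself already provides `str.strip`). The name `trim` and the default character set are assumptions; the approach scans inwards from both ends and slices once, rather than building intermediate strings.

```python
def trim(s: str, chars: str = " \t\n\r") -> str:
    """Hypothetical trim: drop leading and trailing characters found in chars."""
    start, end = 0, len(s)
    while start < end and s[start] in chars:
        start += 1  # skip leading characters
    while end > start and s[end - 1] in chars:
        end -= 1    # skip trailing characters
    return s[start:end]


print(repr(trim("  hello world \n")))  # 'hello world'
```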

APL

APL can use regular expressions directly:

Trim←'^ +| +$'⎕R''

Alternatively, a functional approach combining Boolean masks that filter away leading and trailing spaces:

Trim←{⍵/⍨(∨\∧∘⌽∨\∘⌽)' '≠⍵}

Or reverse and remove leading spaces, twice:

Trim←{(∨\' '≠⍵)/⍵}∘⌽⍣2

AWK

In AWK, one can use regular expressions to trim:

 gsub(/^[ \t]+/, "", v)            # ltrim: remove leading blanks from v
 gsub(/[ \t]+$/, "", v)            # rtrim: remove trailing blanks from v
 gsub(/^[ \t]+|[ \t]+$/, "", v)    # trim: remove both in one call

or:

 function ltrim(s) { sub(/^[ \t]+/, "", s); return s }
 function rtrim(s) { sub(/[ \t]+$/, "", s); return s }
 function trim(s)  { return rtrim(ltrim(s)); }

C/C++

There is no standard trim function in C or C++. Most of the available string libraries[57] for C contain code which implements trimming, or functions that significantly ease an efficient implementation. In some non-standard C libraries the function is called EatWhitespace.

In C, programmers often combine a ltrim and rtrim to implement trim:

#include <ctype.h>
#include <string.h>

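void rtrim(char* str) {
    size_t n = strlen(str);
    /* step back over trailing whitespace without moving before str;
       the cast avoids undefined behavior for negative char values */
    while (n > 0 && isspace((unsigned char) str[n - 1])) {
        n--;
    }
    str[n] = '\0';
}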

void ltrim(char* str) {
    size_t n;
    n = 0;
    while (str[n] && isspace((unsigned char) str[n])) {
        n++;
    }
    memmove(str, str + n, strlen(str) - n + 1);
}

void trim(char* str) {
    rtrim(str);
    ltrim(str);
}

The open source C++ library Boost has several trim variants, including a standard one:[58]

#include <boost/algorithm/string/trim.hpp>
#include <string>

std::string trimmed = boost::algorithm::trim_copy(std::string("  string  "));

With Boost's function named simply trim, the input sequence is modified in place and no result is returned.

Qt, another open source C++ library, has several trim variants, including a standard one:[59]

#include <QString>

QString trimmed = s.trimmed();  // s is an existing QString

The Linux kernel also includes a strip function, strstrip(), since 2.6.18-rc1, which trims the string "in place". Since 2.6.33-rc1, the kernel uses strim() instead of strstrip() to avoid false warnings.[60]

Haskell

A trim algorithm in Haskell:

 import Data.Char (isSpace)
 trim      :: String -> String
 trim      = f . f
    where f = reverse . dropWhile isSpace

may be interpreted as follows: f drops the leading whitespace and then reverses the string. Applying f a second time to its own output removes the (originally trailing) whitespace and restores the original order. Note that the type signature (the second line) is optional.

J

The trim algorithm in J is a functional description:

     trim =. #~ [: (+./\ *. +./\.) ' '&~:

That is: filter (#~) for non-space characters (' '&~:) between leading (+./\) and (*.) trailing (+./\.) spaces.

JavaScript

There is a built-in trim function in JavaScript 1.8.1 (Firefox 3.5 and later), and the ECMAScript 5 standard. In earlier versions it can be added to the String object's prototype as follows:

String.prototype.trim = function() {
  return this.replace(/^\s+/g, "").replace(/\s+$/g, "");
};

Perl

Perl 5 had no built-in trim function before version 5.36, which added the experimental builtin::trim. However, the functionality is commonly achieved using regular expressions.

Example:

$string =~ s/^\s+//;            # remove leading whitespace
$string =~ s/\s+$//;            # remove trailing whitespace

or:

$string =~ s/^\s+|\s+$//g ;     # remove both leading and trailing whitespace

These examples modify the value of the original variable $string.

Also available for Perl is StripLTSpace in String::Strip from CPAN.

There are, however, two functions that are commonly used to strip whitespace from the end of strings, chomp and chop:

  • chop removes the last character from a string and returns it.
  • chomp removes the trailing newline character(s) from a string if present. (What constitutes a newline is $INPUT_RECORD_SEPARATOR dependent).

In Raku, the sister language of Perl, strings have a trim method.

Example:

$string = $string.trim;     # remove leading and trailing whitespace
$string .= trim;            # same thing

Tcl

The Tcl string command has three relevant subcommands: trim, trimright and trimleft. For each of those commands, an additional argument may be specified: a string that represents a set of characters to remove—the default is whitespace (space, tab, newline, carriage return).

Example of trimming vowels:

set string onomatopoeia
set trimmed [string trim $string aeiou]         ;# result is nomatop
set r_trimmed [string trimright $string aeiou]  ;# result is onomatop
set l_trimmed [string trimleft $string aeiou]   ;# result is nomatopoeia
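Python's `strip`, `rstrip` and `lstrip` accept the same kind of optional argument (a set of characters, not a prefix or suffix), so the Tcl example translates directly:

```python
s = "onomatopoeia"
print(s.strip("aeiou"))   # nomatop
print(s.rstrip("aeiou"))  # onomatop
print(s.lstrip("aeiou"))  # nomatopoeia
```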

XSLT

XSLT includes the function normalize-space(string) which strips leading and trailing whitespace, in addition to replacing any whitespace sequence (including line breaks) with a single space.

Example:

<xsl:variable name='trimmed'>
   <xsl:value-of select='normalize-space(string)'/>
</xsl:variable>

XSLT 2.0 includes regular expressions, providing another mechanism to perform string trimming.

Another XSLT technique for trimming is to utilize the XPath 2.0 substring() function.
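For comparison, the effect of normalize-space() can be approximated in Python by splitting on whitespace runs and rejoining with single spaces. The helper name is hypothetical, and the match is not exact: `str.split()` treats more characters as whitespace than XPath, which recognizes only space, tab, carriage return and line feed.

```python
def normalize_space(s: str) -> str:
    """Rough Python analogue of XPath normalize-space() (hypothetical helper).

    str.split() with no argument splits on runs of whitespace and drops
    leading and trailing whitespace; joining with " " collapses each run
    to a single space.
    """
    return " ".join(s.split())


print(normalize_space("  Wiki\n  means\tfast?  "))  # Wiki means fast?
```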

References

from Grokipedia
This comparison of string functions across programming languages examines the diverse built-in methods, operators, and libraries that languages provide for manipulating and querying strings, that is, sequences of characters representing text data. These functions encompass core operations such as determining length, concatenating multiple strings, extracting substrings, replacing characters or patterns, reversing or sorting content, searching for indices, converting case, checking equality, and applying regular expressions for advanced matching. Such comparisons reveal variations in syntax, semantics, and efficiency, influenced by language paradigms like imperative, functional, or object-oriented designs. This comparison covers a range of languages including procedural (e.g., C), object-oriented (e.g., Java, C++), and scripting (e.g., Python, Perl) paradigms, among others. A key distinction lies in how languages represent strings internally: low-level languages like C treat strings as null-terminated arrays of characters, requiring explicit memory management and lacking native high-level functions, whereas most higher-level languages provide immutable or mutable string objects with dedicated APIs for safe and expressive operations. For instance, languages like C++, Go, and Swift often achieve lower execution times in operations like concatenation and searching compared to interpreted languages like Python and Ruby, particularly for large strings exceeding 10 million characters, though performance varies by language and operation (e.g., Java's immutability impacts concatenation efficiency). Immutability in languages like Java and Python can lead to higher memory usage during repeated modifications, as new strings are created instead of altering existing ones, while mutable approaches in C++ or Perl enable in-place changes for efficiency. 
These differences affect practical applications, from text processing in data analysis to web development and software internationalization, where support for Unicode and encodings varies (e.g., built-in UTF-8 handling in modern languages like Go contrasts with manual byte manipulation in C). A 2019 empirical benchmark across languages including C, C++, Java, Python, Perl, Ruby, Go, and Swift found that no single language excels universally; for example, Perl outperformed the others in regular expression tasks, while C and C++ minimized memory consumption across most operations. Understanding these comparisons helps developers select appropriate languages for string-intensive tasks and informs language designers on API evolution.

Introduction

Overview of String Handling

In programming languages, strings are finite sequences of characters and serve as a fundamental data type for representing textual data such as words, sentences, or symbols. Unicode, which standardizes the encoding of characters from writing systems worldwide, is supported differently across languages to balance compatibility, efficiency, and internationalization needs. In C, strings are commonly managed as null-terminated byte arrays, often holding UTF-8 on modern systems, a variable-width encoding that extends ASCII while preserving backward compatibility for single-byte characters. By contrast, Java offers native Unicode integration through its String class, whose API models text as a sequence of UTF-16 code units and handles the full range of code points without external libraries.

A key distinction in string design lies between immutable and mutable implementations, which influences safety, concurrency, and efficiency. Immutable strings, prevalent in languages like Java and Python, cannot be altered after creation; any modification, such as appending text, produces a new string object, which improves thread safety by preventing unintended changes during shared access. Mutable strings, such as C++'s std::string, permit in-place changes to individual characters or sections, offering greater flexibility for performance-critical code with frequent edits, without the overhead of allocating a new object for every change.

The evolution of string handling traces back to the 1970s, when C introduced null-terminated character arrays: a lightweight but manual approach that marks the end of a string with a trailing null byte (\0), simplifying parsing while exposing programmers to risks such as buffer overflows. This model persisted through the 1980s and 1990s but gave way to more robust designs in object-oriented languages; Java (1995) and C# (2000) made strings first-class objects with encapsulated methods, native Unicode support, and automatic memory management to mitigate low-level errors.
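A minimal Python sketch of the immutable/mutable distinction described above, for illustration only. It assumes CPython semantics: item assignment on a str raises TypeError, "modifying" methods such as upper() return a new object, and the built-in bytearray (used here merely as a stand-in for a mutable buffer, not as a general string type) can be changed in place. The helper name try_mutate_str is hypothetical.

```python
# Contrast an immutable str with a mutable bytearray buffer.

def try_mutate_str(s: str) -> bool:
    """Hypothetical helper: return True if in-place assignment on a str succeeded."""
    try:
        s[0] = "J"  # str does not support item assignment
        return True
    except TypeError:
        return False

original = "hello"
upper = original.upper()         # a new str object; original is untouched
print(original, upper)           # hello HELLO
print(try_mutate_str(original))  # False

buf = bytearray(b"hello")        # mutable sequence of bytes
buf[0] = ord("J")                # modified in place, no new object created
print(buf.decode("ascii"))       # Jello
```

The same pattern (build in a mutable buffer, convert to an immutable value at the end) reappears in most languages with immutable strings.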
String handling presents persistent challenges, particularly encoding inconsistencies that can corrupt data (such as interpreting UTF-8 bytes as ISO-8859-1, which produces garbled text known as mojibake, or opening the door to injection vulnerabilities) and performance trade-offs arising from mutability choices. Immutable designs reduce mutation-related bugs in multithreaded code but may degrade efficiency through repeated allocations in common tasks like concatenation, whereas mutable variants enable optimized in-place updates at the cost of greater risk from shared, concurrently modified state.
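The encoding-mismatch problem can be reproduced in a few lines. This Python sketch is illustrative only: it encodes "café" as UTF-8 and then decodes the same bytes once correctly and once as ISO-8859-1, which maps every byte to exactly one character and therefore never fails, silently producing garbled text instead.

```python
# Decoding bytes with the wrong encoding garbles text without raising an error.

text = "café"
utf8_bytes = text.encode("utf-8")           # 'é' becomes two bytes: 0xC3 0xA9
wrong = utf8_bytes.decode("iso-8859-1")     # each byte read as one Latin-1 character
right = utf8_bytes.decode("utf-8")

print(len(utf8_bytes))  # 5
print(wrong)            # cafÃ©
print(right == text)    # True
```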

Languages and Paradigms Covered

This comparison covers a diverse set of programming languages, selected for their current popularity, historical influence, and ability to represent key paradigms in string handling. Python, C, C++, Java, and JavaScript are included because of their high rankings in the TIOBE Programming Community Index for November 2025, where Python holds the top position at 23.37%, followed by C++ at 10.03% and C at 8.89%, reflecting their widespread adoption in general-purpose, systems, and web development. Perl, now in the top 10, is featured for its enduring role in text processing. To capture paradigm diversity beyond mainstream usage, Haskell (around 28th at 0.8%), APL/J, AWK, and Tcl are also included; these lower-ranked languages (APL/J, AWK, and Tcl fall outside the TIOBE top 50, each below 0.1%) exemplify functional, array-oriented, domain-specific text processing, and pure scripting approaches, respectively. Ultra-low-level options such as assembly are omitted for conciseness. The selection prioritizes coverage of imperative, object-oriented, functional, and scripting paradigms to illustrate how foundational design choices affect string representation and operations.

In the imperative and procedural paradigm, C serves as a cornerstone: strings are null-terminated arrays of characters (char *), manipulated through standard library functions declared in <string.h>, with memory managed manually. C++, a multi-paradigm extension of C, adds the std::string class from the <string> header, which provides dynamic allocation and safer abstractions while remaining compatible with C-style strings. These languages emphasize low-level efficiency and often require explicit length management to avoid buffer errors.
Object-oriented paradigms are represented by Java, which uses the immutable String class (a final class in the java.lang package) to encapsulate UTF-16 character sequences and provides methods for all manipulations, promoting thread safety and integration with garbage collection. This approach contrasts with procedural styles by treating strings as objects with built-in methods rather than raw arrays.

Scripting and multi-paradigm languages such as Python, JavaScript, Perl, and Tcl prioritize ease of use for dynamic text processing, with strings as core built-in types and automatic memory management. Python's str type is an immutable sequence of Unicode code points with rich built-in methods for common operations. JavaScript uses immutable string primitives, encoded as UTF-16 sequences, with prototype methods for web-centric tasks. Perl stores strings in scalar variables and offers extensive built-in operators and regular-expression integration tailored for text manipulation. Tcl is fundamentally a "string-only" language that represents all values as strings, with commands such as string providing operations and allowing data types to be mixed seamlessly. AWK, a domain-specific tool for text processing, treats strings as simple, dynamically sized values with built-in functions optimized for pattern scanning and field extraction in data streams.

Functional and array-oriented paradigms introduce abstract, declarative string handling. Haskell, a pure functional language, defines String as a type synonym for [Char], a lazy list of characters, and relies on higher-order functions for side-effect-free transformations. APL and J, array-oriented languages, represent strings as rank-1 character arrays (vectors), so primitive array operations such as indexing and reshaping apply uniformly to text, which enables concise, mathematical-style manipulation.
Across these paradigms, string functions tend to be built into scripting and functional languages, while procedural languages rely more on libraries, underscoring how design philosophy influences accessibility and safety, including the prevalence of immutable string types in modern languages.

Inspection Functions

Determining String Length

Determining the length of a string is a fundamental operation in programming languages, returning the number of characters, code units, or bytes the string contains. It lets developers validate input, allocate memory, or iterate over contents efficiently. Implementations vary in syntax, performance, and handling of Unicode, reflecting differences in how strings are represented internally, such as immutable sequences in high-level languages versus null-terminated byte arrays in low-level ones. Common variants are built-in functions or methods suited to each language's paradigm. In Python, the built-in len() function returns the length of a string as the number of Unicode code points. For example:

python

len("hello") # Returns 5

This requires no imports and works on any sequence type. In Java, the length() method on the String class returns the count of 16-bit Unicode code units, invoked as an instance method. For example:

java

"hello".length() // Returns 5

No additional imports are needed, as String is part of the core java.lang package. In C, the strlen() function from the <string.h> header computes the number of bytes before the null terminator. For example:

c

#include <string.h>
strlen("hello"); // Returns 5

This scans memory until it reaches the terminating \0 byte. Specialized languages like APL use the rho function (⍴, shape) to obtain the length of a vector, including strings treated as character vectors; for instance, ⍴'hello' yields 5. Similarly, AWK's length() function returns the number of characters in a string (multibyte-aware in gawk), or the length of the current record if no argument is given; length("hello") returns 5.

Edge cases, particularly empty or null strings, reveal implementation differences that can lead to errors if unhandled. An empty string has length 0 in Python (len("") == 0), Java ("".length() == 0), C (strlen("") == 0), APL (⍴'' yields 0), and AWK (length("") == 0, as does the length of an unset variable). Null references behave differently: Python raises a TypeError on len(None), Java throws a NullPointerException when length() is called on null, passing NULL to C's strlen() is undefined behavior (often a crash or a garbage result), APL reports an error for non-array inputs, and AWK has no null reference at all, since unset variables are treated as empty strings of length 0.

Performance typically favors constant-time O(1) operations in modern languages, which store the length as metadata rather than scanning the string. CPython's len() on a str is O(1) because it reads a length field stored in the PyUnicodeObject structure. Java's length() is also O(1), returning a stored length rather than scanning. APL and AWK likewise provide O(1) access via array shape or built-in tracking. In contrast, C's strlen() is O(n), since it iterates byte by byte to find the null terminator, making repeated calls on long strings inefficient without caching.

Unicode affects what is counted. Python 3's len() counts code points, so len("café") == 4 even though "café".encode('utf-8') is 5 bytes long. Java counts UTF-16 code units, so characters outside the Basic Multilingual Plane, which are stored as surrogate pairs, inflate the length (e.g., "👍".length() == 2 despite being a single code point).
C's strlen() measures raw bytes regardless of encoding (e.g., 5 bytes for "café" in UTF-8), which may not match user expectations in multilingual contexts. APL and AWK count characters or bytes depending on the implementation, and modern implementations usually align this with code points.
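| Language | Function/Method | Time complexity | Counts |
| --- | --- | --- | --- |
| Python | len(s) | O(1) | Unicode code points |
| Java | s.length() | O(1) | UTF-16 code units |
| C | strlen(s) | O(n) | Bytes before the null terminator |
| APL | ⍴s | O(1) | Vector length (characters) |
| AWK | length(s) | O(1) | Characters (multibyte-aware in gawk) |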
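To make the three counting conventions in the table concrete, here is a small Python sketch. It only approximates the other languages: counting UTF-8 bytes stands in for C's strlen() on UTF-8 data (assuming no embedded NUL bytes), and counting 2-byte UTF-16 units stands in for Java's String.length(). The function names are hypothetical.

```python
# Compare three notions of "length" for the same text.

def count_code_points(s: str) -> int:
    return len(s)  # Python's len() counts Unicode code points

def count_utf8_bytes(s: str) -> int:
    # Approximates C's strlen() on UTF-8 data (assumes no embedded NUL bytes).
    return len(s.encode("utf-8"))

def count_utf16_units(s: str) -> int:
    # Approximates Java's String.length(): each UTF-16 code unit is 2 bytes.
    # "utf-16-le" is used so that no byte-order mark is added.
    return len(s.encode("utf-16-le")) // 2

for sample in ["hello", "café", "\U0001F44D", ""]:
    print(repr(sample), count_code_points(sample),
          count_utf8_bytes(sample), count_utf16_units(sample))
```

For "café" this gives 4 code points, 5 UTF-8 bytes, and 4 UTF-16 units; for the thumbs-up emoji (U+1F44D) it gives 1, 4, and 2.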

Accessing Individual Characters

Accessing individual characters in strings is a fundamental operation in most programming languages, allowing developers to retrieve specific code units or characters by position. This typically involves indexing mechanisms, where positions are specified relative to the start of the string. The majority of languages employ zero-based indexing, where the first character is at position 0, facilitating efficient pointer arithmetic and alignment with memory addressing conventions established in early languages like C. However, exceptions exist, such as Lua and APL, which default to one-based indexing to align more closely with mathematical notation and human counting intuition. Some dialects of BASIC, like Visual Basic, support both zero- and one-based access depending on the function, though .NET strings fundamentally use zero-based indexing. Common functions for character access vary by language but often include dedicated methods or operator overloading. In Java, the charAt(int index) method retrieves the character at the specified zero-based index, returning a primitive char type representing a UTF-16 code unit. Similarly, JavaScript's String.prototype.charAt(index) returns a string containing the single UTF-16 code unit at the zero-based position, or an empty string if the index is out of bounds. Python uses square bracket notation s[index] for zero-based access, yielding a string of length 1 rather than a distinct character type, as Python treats single characters as short strings. In C++, std::string::operator[](size_type pos) provides zero-based access to a char reference without bounds checking, while the safer at(pos) method performs validation. APL employs bracket indexing [index], defaulting to one-based via the system variable ⎕IO←1, to extract elements from character vectors. Lua's strings also support one-based indexing through functions like string.byte(s, i), where i=1 accesses the first byte. 
Bounds checking is a critical safety feature that prevents invalid memory access, but its implementation differs significantly across languages, affecting security and performance. Java's charAt throws an IndexOutOfBoundsException for negative indices or indices not less than the string's length. Python's indexing raises an IndexError for out-of-range indices, enforcing runtime validation (negative indices within range count from the end). In contrast, C++'s operator[] has undefined behavior for positions beyond the string's size, potentially leading to crashes or exploits, whereas at() throws std::out_of_range. C treats strings as null-terminated char arrays with no inherent bounds checking; direct indexing like s[i] can overrun the buffer if i exceeds the allocated size, a common source of vulnerabilities that can allow arbitrary code execution. Languages without built-in checks, like C, require manual validation to mitigate such risks, at the cost of added complexity.

Return types for single-character access reflect underlying type systems and design philosophies. C and C++ return a char (C++'s operator[] returns a reference to one), enabling direct manipulation of byte values. Java likewise returns its char primitive, a 16-bit unsigned integer representing a UTF-16 code unit. Higher-level languages such as Python and JavaScript return strings of length 1 instead of a separate character type, favoring uniformity; in Python, for a non-empty string s, s[0] equals s[0:1], consistent with its sequence model in which characters are not distinct from strings.

Unicode handling introduces further nuances, because strings are often encoded in UTF-16 or UTF-8, where a "character" may not correspond to a user-perceived unit. Most access functions operate on code units (or code points) rather than grapheme clusters, which are visually single symbols that may span multiple code points (e.g., 'é' written as 'e' followed by a combining acute accent).
In JavaScript, charAt indexes UTF-16 code units, so characters in the supplementary planes (U+10000–U+10FFFF), such as many emoji, are split into two surrogates; for example, '💩'.charAt(0) yields the lone high surrogate '\uD83D' rather than the full glyph. Java's charAt similarly accesses code units, treating each supplementary character as two positions. JavaScript's codePointAt provides code-point access, while grapheme-aware access generally requires higher-level APIs (such as Intl.Segmenter) or libraries; standard indexing prioritizes code-unit efficiency over semantic completeness. Python indexes Unicode code points, but finding true grapheme boundaries still requires additional libraries.
| Language | Indexing | Access syntax | Return type | Bounds check | Unicode notes |
| --- | --- | --- | --- | --- | --- |
| Java | 0-based | charAt(index) | char | Yes (IndexOutOfBoundsException) | UTF-16 code units; surrogates split |
| JavaScript | 0-based | charAt(index) | string (length 1) | Partial (empty string when out of bounds) | UTF-16; lone surrogates possible |
| Python | 0-based | s[index] | str (length 1) | Yes (IndexError) | Code points; graphemes need extra handling |
| C++ | 0-based | s[pos] or s.at(pos) | char& | No / Yes (std::out_of_range) | Bytes; no native Unicode support |
| C | 0-based | s[i] | char | No (buffer overflow risk) | ASCII/bytes; manual Unicode handling |
| Lua | 1-based | string.byte(s, i) | number (byte value) | Manual | Bytes; UTF-8 requires care |
| APL | 1-based by default (set via ⎕IO) | s[index] | Character | Yes (INDEX ERROR) | Vector elements; Unicode via implementation |
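A short Python sketch of the access rules summarized above, for illustration: indexing returns a one-character str, out-of-range access raises IndexError, and indices refer to code points rather than graphemes. The last lines encode a character as UTF-16 only to emulate what Java's or JavaScript's charAt(0) would see; char_at is a hypothetical helper, not a standard Python function.

```python
from typing import Optional

s = "hello"
print(s[0], s[0] == s[0:1])   # h True: a "character" is a str of length 1
print(s[-1])                  # o: negative indices count from the end

def char_at(text: str, index: int) -> Optional[str]:
    """Hypothetical helper: return text[index], or None if out of range.
    Unlike Java's charAt, negative indices are accepted and count from the end."""
    try:
        return text[index]
    except IndexError:
        return None

print(char_at(s, 10))  # None (s[10] on its own would raise IndexError)

# Indexing works on code points, not user-perceived characters (graphemes).
decomposed = "e\u0301"                  # 'e' + COMBINING ACUTE ACCENT, rendered as é
print(len(decomposed), decomposed[0])   # 2 e

# Emulate a UTF-16 view (as Java/JavaScript charAt uses) to show surrogate splitting.
units = "\U0001F4A9".encode("utf-16-be")  # 4 bytes: one surrogate pair
high = int.from_bytes(units[:2], "big")
print(hex(high))                          # 0xd83d
```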

Comparison Functions

Equality and Inequality Checks

In most programming languages, string equality checks determine whether two strings contain identical sequences of characters, while inequality checks test the opposite. These operations typically use built-in operators or functions and are case-sensitive by default, so uppercase and lowercase letters are distinct (e.g., "Apple" is not equal to "apple"). In Python, the == operator compares string contents, returning True only if the sequences match exactly, including case. JavaScript's strict equality operator (===) compares string primitives by value without type coercion, also case-sensitively. Inequality is tested with != or !== respectively, negating the equality result. In C, equality is tested with the standard library function strcmp(), which returns 0 when two null-terminated strings have identical contents. C++'s std::string overloads the == operator to compare character sequences, again case-sensitively by default. Java distinguishes reference from value comparison: the == operator checks whether two String references point to the same object, while the equals() method compares contents and returns true only if the character sequences match. For inequality in Java, != tests reference inequality, and !a.equals(b) negates content equality. Case-insensitive variants also exist, such as POSIX strcasecmp() (declared in <strings.h>) and Java's equalsIgnoreCase(), though they are less common for basic equality checks. Null handling varies significantly and requires care. In Java, comparing two null references with == returns true and a.equals(null) returns false, but invoking equals() on a null reference throws a NullPointerException.
Python treats None (its null equivalent) as distinct from the empty string, so "" == None is False; comparing any string with None using == returns False, and tests for None should use is, which cannot be affected by a custom __eq__ implementation. In C, strings are accessed through pointers, and passing a NULL pointer to strcmp() is undefined behavior, so explicit null checks are required beforehand. JavaScript treats null and undefined as distinct from strings: null == undefined is true under the loose-equality rules, but null === undefined is false, and 'null' == null is false. Two empty strings compare equal ("" === "" is true). Content-based string equality generally takes O(n) time, where n is the length of the shorter string, because implementations scan characters sequentially until a mismatch or the end is reached. This holds for Python, Java, and C++: early termination on a mismatch speeds up the average case, but confirming equality requires a full traversal. Reference comparisons (e.g., Java's ==) are O(1), but they do not verify content equality.
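| Language | Equality operator/function | Inequality | Case sensitivity | Null handling notes |
| --- | --- | --- | --- | --- |
| Python | == (content) | != | Yes (default) | == with None returns False ("" != None); use is for None checks |
| Java | equals() (content); == (reference) | !equals(); != | Yes (default); equalsIgnoreCase() ignores case | null == null is true; a.equals(null) is false; calling equals() on null throws NullPointerException |
| C | strcmp() == 0 | strcmp() != 0 | Yes (default); POSIX strcasecmp() ignores case | Undefined for NULL pointers |
| C++ | std::string operator== (content) | != | Yes (default) | Dereferencing a null pointer is undefined |
| JavaScript | === (strict value) | !== | Yes | === is false for null vs. any string; == applies coercion |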
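A brief Python sketch of the equality rules above, for illustration. It shows content-based ==, case-sensitive defaults, and one common way to compare case-insensitively, str.casefold(), which is stronger than lower() for some scripts (German 'ß' folds to "ss"). The helper equals_ignore_case is hypothetical, loosely analogous to Java's equalsIgnoreCase rather than identical to it.

```python
# Equality in Python compares content; `is` compares identity.

a = "hello"
b = "".join(["hel", "lo"])   # built at runtime; may or may not be the same object
print(a == b)                # True: same characters
print(a != "Hello")          # True: comparison is case-sensitive

def equals_ignore_case(x: str, y: str) -> bool:
    """Hypothetical helper: case-insensitive equality via Unicode case folding."""
    return x.casefold() == y.casefold()

print(equals_ignore_case("Straße", "STRASSE"))  # True ('ß' folds to 'ss')
print("Straße".lower() == "strasse")            # False: lower() keeps 'ß'

value = None
print(value is None, "" == None)  # True False
```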

Lexicographical Ordering

Lexicographical ordering compares two strings character by character, typically by the numeric values of their characters (Unicode code points or code units), to determine whether one precedes, follows, or equals the other. In Java, String.compareTo(String anotherString) returns a negative integer if the current string is lexicographically less than the argument, zero if they are equal, and a positive integer if it is greater. It scans to the first differing index k and returns this.charAt(k) - anotherString.charAt(k), or, if one string is a prefix of the other, the difference of their lengths. For example:

java

"apple".compareTo("apricot") // Returns -2: 'p' (U+0070) minus 'r' (U+0072) at index 2

A zero result from compareTo indicates that the strings are equal, consistent with the equals method. The C standard library function strcmp(const char *lhs, const char *rhs) likewise returns an int: negative if lhs precedes rhs, zero if they are equal, and positive otherwise, with the sign determined by the first pair of differing bytes interpreted as unsigned char values. It operates on null-terminated byte strings and performs a plain byte-wise comparison without locale awareness (the locale-aware counterpart is strcoll()). In Python, relational operators such as < and > return booleans for lexicographical ordering, comparing Unicode code points. For instance:

python

"apple" < "apricot" # True, due to 'p' < 'r' at the differing position

If one string is a proper prefix of the other, the shorter string is considered smaller (so "app" < "apple"). Haskell's Ord instance for String (a synonym for [Char]) supports operators like < and > (returning Bool) and the compare function (returning an Ordering: LT, EQ, or GT). Ordering follows the list structure, comparing elements in turn until a difference is found, with the empty list (empty string) preceding any non-empty one. For example, compare "apple" "apricot" == LT.
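| Language | Function/Operator | Return type | Notes |
| --- | --- | --- | --- |
| Java | compareTo | int | Sign indicates order; value is the char difference, or the length difference for a prefix |
| C | strcmp | int | Byte-wise; sign of the first difference |
| Python | <, > | bool | Direct relational operators; a proper prefix sorts first |
| Haskell | compare, < | Ordering or Bool | Lexicographic on lists; empty < non-empty |
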
By default, these functions use simple code-point (or code-unit) ordering, comparing characters by their numeric values (e.g., "a" (U+0061) < "b" (U+0062)) without considering linguistic rules. This binary-style comparison may not produce natural-language order; for example, "Zebra" sorts before "apple" because every ASCII uppercase letter precedes every lowercase letter. For locale-sensitive collation, Java provides the java.text.Collator class, which tailors comparisons to specific locales. Obtained via Collator.getInstance(Locale), it supports strength levels (e.g., primary for base letters, tertiary for case) and decomposition of accented characters, as an alternative to the locale-agnostic String.compareTo. The Unicode Collation Algorithm (UCA) defines this kind of multi-level weighting (primary for base letters, secondary for accents, tertiary for case) and allows locale-specific tailorings, such as French backward accent ordering or Slovak contractions like "ch". Edge cases follow the same rules across the listed languages: when one string is a prefix of another, the shorter one precedes it (e.g., "apple" < "apples"), and the empty string precedes every non-empty string (e.g., "" < "a").
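Python has no built-in strcmp-style three-way comparison, but one can be sketched from the relational operators. The helper compare below is hypothetical and returns -1, 0, or 1 by code-point order (unlike Java's compareTo, which returns the actual difference). The sort at the end uses str.casefold as a key, which gives case-insensitive order but is not full locale-aware collation as described for Collator or the UCA.

```python
# Three-way comparison in the style of strcmp()/compareTo(), plus sorting.

def compare(a: str, b: str) -> int:
    """Hypothetical helper: -1, 0, or 1 by code-point order."""
    return (a > b) - (a < b)

print(compare("apple", "apricot"))  # -1 ('p' < 'r' at index 2)
print(compare("apple", "apples"))   # -1 (a proper prefix sorts first)
print(compare("", "a"))             # -1 (the empty string is smallest)

words = ["banana", "Apple", "apple", "Zebra"]
print(sorted(words))                    # ['Apple', 'Zebra', 'apple', 'banana']
print(sorted(words, key=str.casefold))  # ['Apple', 'apple', 'banana', 'Zebra']
```

The first sort shows the code-point effect mentioned above ("Zebra" before "apple"); the second is stable, so "Apple" keeps its place ahead of "apple".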

Searching Functions

Forward Substring Location

Forward substring location refers to the operation of searching for the first occurrence of a specified substring or character within a string, starting from the beginning (or an optional offset position), and returning its zero-based index if found. This functionality is essential for tasks such as parsing, validation, and text processing in programming languages. Most languages provide built-in methods that handle both single characters and longer substrings uniformly, treating characters as single-character strings, and perform case-sensitive searches by default. In imperative languages like Java, JavaScript, and Python, the common approach uses methods such as indexOf or find that return an integer representing the position of the first match, or a sentinel value like -1 if no match is found. For example, in Java, the String.indexOf(String str) method searches from the start and returns the index or -1, with an overload indexOf(String str, int fromIndex) allowing specification of a starting offset for subsequent searches within the same string. Similarly, JavaScript's String.prototype.indexOf(searchString[, position]) supports an optional position parameter and returns -1 on failure, enabling efficient forward scans without regex overhead. Python's str.find(sub[, start[, end]]) mirrors this, returning -1 if the substring is absent, and allows both start and end bounds for more precise substring searches. Perl's built-in index(STRING, SUBSTRING[, POSITION]) function operates analogously, returning the position of the first occurrence starting from the optional POSITION (default 0) or -1 if not found, and it unifies character and substring searches by accepting any scalar as the substring. 
In C++, std::string::find(const std::string& str, size_t pos = 0) returns the position of the first match or std::string::npos (a special constant) on failure, and the pos parameter enables offset-based searches; returning npos rather than a negative value fits C++'s unsigned size_type indexing. C#'s String.IndexOf(string value[, int startIndex]) follows the -1 convention for not-found cases and has overloads for startIndex and StringComparison options; for string arguments the default comparison is case-sensitive but culture-sensitive (using the current culture), so StringComparison.Ordinal is needed for a plain ordinal search. Languages with option types often use them to represent absence without a sentinel value. In Rust, str::find returns Option<usize>: Some(index) if the pattern is found, or None otherwise. It has no offset parameter, but searching a slice achieves the same effect, and single characters are searched by passing a char pattern. Go's strings.Index(s, substr string) int returns the byte index of the first match or -1, performing a simple forward search without built-in offset support (slicing can be used instead). In Haskell, the text package provides breakOn :: Text -> Text -> (Text, Text), which splits a text (the haystack) at the first occurrence of a substring (the needle), returning the prefix before the match and the remainder starting with the match; the match index is length (fst result) when the remainder begins with the needle, and a Maybe Int result (Nothing when absent) can be built by composition. The base String type ([Char]) has no breakOn; similar searches are composed from Data.List functions such as isPrefixOf and tails, in keeping with safe handling in pure functional code. The following table summarizes return behaviors for not-found cases across these languages:
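| Language | Method/Function | Return on not found | Offset support |
| --- | --- | --- | --- |
| Python | str.find() | -1 | Yes (start, end) |
| Java | String.indexOf() | -1 | Yes (fromIndex) |
| JavaScript | String.prototype.indexOf() | -1 | Yes (position) |
| Perl | index() | -1 | Yes (POSITION) |
| C++ | std::string::find() | std::string::npos | Yes (pos) |
| C# | String.IndexOf() | -1 | Yes (startIndex) |
| Rust | str::find() | None | No (use slicing) |
| Go | strings.Index() | -1 | No |
| Haskell | breakOn (text package, composed) | Nothing | Via composition |
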
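The table shows two not-found conventions: sentinels (-1, npos) and option types (None, Nothing). As an illustration, this Python sketch wraps str.find in a hypothetical find_opt helper that returns None instead of -1, mirroring the Option/Maybe style without claiming to reproduce any particular library. Python's own str.index shows a third convention: raising an exception.

```python
from typing import Optional

def find_opt(haystack: str, needle: str, start: int = 0) -> Optional[int]:
    """Hypothetical wrapper: like str.find, but returns None instead of -1,
    in the spirit of Option/Maybe-returning APIs such as Rust's str::find."""
    index = haystack.find(needle, start)
    return None if index == -1 else index

print(find_opt("Hello, World!", "World"))  # 7
print(find_opt("Hello, World!", "world"))  # None (search is case-sensitive)
print(find_opt("banana", "a", 2))          # 3 (search starts at offset 2)

try:
    "banana".index("x")                    # str.index raises instead of returning -1
except ValueError:
    print("not found")
```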
All listed methods are case-sensitive by default, so case-insensitive searches require normalizing case first (e.g., converting both strings to lowercase) or, where available, a dedicated option such as C#'s StringComparison.OrdinalIgnoreCase. For instance, in Python:

python

text = "Hello, World!"
print(text.find("World"))  # Output: 7
print(text.find("world"))  # Output: -1 (case-sensitive)

This uniformity facilitates portability when implementing forward searches across languages.
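A common use of the offset parameter is finding every occurrence by restarting the search just past the previous match. The Python sketch below is a hypothetical find_all helper built only on str.find; the overlapping flag (an assumption of this sketch) controls whether the next search starts one position after the previous match or after its end.

```python
from typing import List

def find_all(text: str, sub: str, overlapping: bool = True) -> List[int]:
    """Hypothetical helper: return every start index of sub in text, in order,
    by restarting str.find() after each match."""
    if not sub:
        raise ValueError("empty substring matches at every position")
    positions = []
    start = text.find(sub)
    while start != -1:
        positions.append(start)
        step = 1 if overlapping else len(sub)
        start = text.find(sub, start + step)
    return positions

print(find_all("banana", "ana"))                     # [1, 3]
print(find_all("banana", "ana", overlapping=False))  # [1]
print(find_all("aaaa", "aa"))                        # [0, 1, 2]
```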

Reverse Substring Location

Reverse substring location functions identify the last occurrence of a substring within a string by searching from the end toward the beginning. They typically return the zero-based index of the match's starting position, measured from the left end of the string, or a sentinel value such as -1 if no match is found. This contrasts with forward substring location methods, which find the first occurrence. In Python, str.rfind(sub[, start[, end]]) returns the highest index at which sub is found within the optional slice s[start:end], or -1 otherwise. Java's String.lastIndexOf(String str) similarly returns the index of the last occurrence of str, and the overload lastIndexOf(String str, int fromIndex) searches backward starting at fromIndex. JavaScript provides String.prototype.lastIndexOf(searchString[, position]), which searches backward from the optional position (defaulting to the end of the string) and returns the index of the last match or -1. In VBA, InStrRev(stringcheck, stringmatch[, start[, compare]]) returns the one-based position of the last occurrence of stringmatch in stringcheck, searching backward from the optional start position (defaulting to the end), or 0 if it is absent. Perl's built-in rindex(STRING, SUBSTRING[, POSITION]) returns the position of the last occurrence beginning at or before the optional POSITION (default: the end), or -1 if not found, mirroring index. In C++, std::string::rfind(const std::string& str, size_t pos = npos) returns the position of the last occurrence beginning at or before pos (default: the end), or std::string::npos on failure. Rust's str::rfind returns Some(index) for the last match or None, and can be combined with slicing for scoped searches.
Go's strings.LastIndex(s, substr string) int returns the byte index of the last occurrence or -1; it has no offset parameter, but slicing achieves the same effect. In Haskell, the text package's breakOnEnd :: Text -> Text -> (Text, Text) searches from the end, returning the prefix up to and including the last match and the remainder after it; when the needle occurs, the match's starting index is length (fst result) - length needle, and a Maybe Int can be produced by composition. Base String searches are likewise composed from Data.List functions. Position semantics are consistent across these languages: despite the reverse search direction, the returned index always refers to the match's starting position counted from the beginning of the string, which simplifies extraction and further processing. For instance, in the string "banana", searching for "ana" yields index 3, the start of the last match. Overloads that accept a starting index, as in the Java and JavaScript implementations, allow scoped reverse searches without traversing the whole string. Overlapping potential matches are resolved by choosing the rightmost possible starting index: in Python, "aaa".rfind("aa") returns 1 (the match at positions 1-2) rather than 0. Empty substrings are a special case: Java's lastIndexOf(""), JavaScript's lastIndexOf(""), and Python's rfind("") all return the string's length, because the empty string matches at every position, including the end. Not every language provides built-in reverse substring search; the C standard library, for example, has no strrstr or equivalent (strrchr finds only the last occurrence of a single character), so C programmers must implement it manually, for instance with loops or repeated calls to strstr. This absence reflects a paradigm difference: lower-level languages emphasize explicit control over string operations.
Language | Function | Return type | Not-found value | Start-index overload | Index base
Python | str.rfind(sub[, start[, end]]) | int | -1 | Yes (start, end) | 0-based
Java | String.lastIndexOf(str[, fromIndex]) | int | -1 | Yes (fromIndex) | 0-based
JavaScript | String.prototype.lastIndexOf(searchString[, position]) | number | -1 | Yes (position) | 0-based
Perl | rindex(STRING, SUBSTRING[, POSITION]) | integer | -1 | Yes (POSITION) | 0-based
C++ | std::string::rfind(str[, pos]) | size_t | std::string::npos | Yes (pos) | 0-based
C# | String.LastIndexOf(value[, startIndex]) | int | -1 | Yes (startIndex) | 0-based
Rust | str::rfind() | Option<usize> | None | No (use slicing) | 0-based (byte offset)
Go | strings.LastIndex() | int | -1 | No (use slicing) | 0-based (byte offset)
Haskell | breakOnEnd (text package), composed | Maybe Int (when composed) | Nothing | Via composition | 0-based
VBA | InStrRev(stringcheck, stringmatch[, start[, compare]]) | Long | 0 | Yes (start) | 1-based
C | None (manual implementation) | N/A | N/A | N/A | N/A
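As a sketch of the manual approach that the C standard library leaves to the programmer, the Python function below emulates a reverse substring search using only a forward search. This mirrors how a strrstr-style helper might be built on strstr. The name last_index is hypothetical. Python's built-in str.rfind is used only to check the results against the behaviors described above, including the overlapping "aaa" case and the empty-substring case.

```python
def last_index(haystack: str, needle: str) -> int:
    """Start index of the last occurrence of needle in haystack, or -1.

    Hypothetical helper: repeats a forward search, restarting one position
    past each hit, and remembers the last hit found.
    """
    if needle == "":
        # An empty needle matches at every position, including the end.
        return len(haystack)
    last = -1
    pos = haystack.find(needle)
    while pos != -1:
        last = pos
        # Restart at pos + 1 (not pos + len(needle)) so overlapping matches count.
        pos = haystack.find(needle, pos + 1)
    return last


for text, sub in [("banana", "ana"), ("aaa", "aa"), ("abc", ""), ("abc", "x")]:
    # The manual search should agree with the built-in reverse search.
    print(repr(text), repr(sub), last_index(text, sub), text.rfind(sub))

# A scoped search: only "banan" (indices 0-4) is considered, so the match is at 1.
print("banana".rfind("ana", 0, 5))
```

Restarting one position past each hit, rather than past the whole match, is what makes the rightmost overlapping match win.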

Modification Functions

Concatenation and Appending

In programming languages, string concatenation combines two or more strings into a single string, while appending adds content to the end of an existing string. These operations are fundamental for building dynamic text. Their implementation varies with whether strings are immutable or mutable, and this affects efficiency, especially in repeated operations such as loops. Languages with immutable strings, such as Java and Python, typically create a new object for each concatenation, which can cause performance problems if not handled carefully. Mutable strings in C++, by contrast, allow in-place modification. In Java, the + operator is the primary means of concatenating strings, as in String result = "Hello" + " " + "World";. The compiler optimizes a single concatenation expression: it was historically translated to StringBuilder calls, and since Java 9 it uses invokedynamic. However, using + repeatedly in a loop can lead to quadratic time complexity (O(n²)), because each iteration copies the growing immutable String into a new object. The concat() method provides an alternative for pairwise concatenation, as in "Hello".concat(" World"). It returns a new String without modifying the original. It is less flexible than + because it accepts only a String argument, and it has the same immutability drawbacks. For efficient repeated appending, StringBuilder is recommended. Its append() method runs in amortized constant time per appended character thanks to dynamic capacity growth, which avoids the cost of copying immutable strings in loops. Python uses the + operator for concatenation, as in result = "Hello" + " " + "World". This creates a new object each time because str is immutable, so repeated concatenation in a loop (e.g., building a string from many fragments) can take quadratic time. CPython optimizes the += operator so that appending in a loop often runs in linear time, but this is an implementation detail. For portability across implementations and for clarity, str.join() is preferred for combining many strings, because it guarantees O(n) time (joining is covered under Joining Collections).
Because Python has no built-in mutable string type, the idiomatic efficient pattern is to accumulate fragments in a list and join them at the end. JavaScript uses the + operator for string concatenation, as in let result = "Hello" + " " + "World";. When either operand is a string, the other is converted to a string, and the result is a new string. For typical workloads there is little efficiency difference between approaches, because engines optimize concatenation. V8, for example, represents concatenation results as ropes that are flattened only when needed. Template literals, introduced in ES6, offer a more readable alternative for concatenation and interpolation, as in let result = `Hello ${name}`;. They support multiline strings and embedded expressions without explicit + chains, though engines implement them with similar underlying operations. The concat() method exists but is rarely used, as + is more idiomatic and generally performs at least as well. In C++, std::string supports the += operator for efficient appending, as in std::string result = "Hello"; result += " World";. This modifies the string in place in amortized constant time per appended character, thanks to capacity growth similar to std::vector. The + operator creates a new string, as in std::string result = std::string("Hello") + " World";. At least one operand must be a std::string, since two string literals cannot be added, and + is less efficient for repeated use because of the temporary objects it creates. For stream-based appending, std::ostringstream uses the << operator, as in std::ostringstream oss; oss << "Hello" << " World";. This formats mixed types in a type-safe way but carries more overhead than direct std::string operations. The append() member function is another in-place alternative with the same efficiency as +=. Perl uses the dot operator . for concatenation, as in $result = "Hello" . " " . "World";, which produces a new string value and is efficient for small operations. Perl scalars are mutable, so for repeated appending the .= operator extends the string in place, much like C++'s +=, giving linear total time in loops. join() is also available for concatenating lists.
Ruby uses + for concatenation, as in result = "Hello" + " " + "World", which creates a new string each time and can be quadratic in loops. Ruby strings are mutable unless frozen. The << method appends in place and achieves linear total time. String#concat also modifies the receiver in place and accepts multiple arguments, while Array#join is convenient for combining many parts. Go strings are immutable, so + creates a new string each time and is inefficient in loops. For efficient appending, strings.Builder provides WriteString (along with WriteByte, WriteRune, and Write) with amortized constant time per appended byte, similar to Java's StringBuilder, and it is recommended for building large strings. Swift uses + for concatenation and += for appending. String is a value type with copy-on-write storage. A string declared with let cannot change, but a var string is mutable, and += (equivalent to append(_:)) extends it in place when its storage is not shared. For combining an array of parts, joined(separator:) is also available.
Language | Primary operator/method | Mutability | Efficiency in loops
Java | +, concat() | Immutable | Quadratic with +; linear with StringBuilder.append()
Python | +, += | Immutable | += often linear in CPython (an optimization); use join() for portability
JavaScript | +, template literals | Immutable | Optimized by engines; near-linear in practice
C++ | +=, append() | Mutable | Amortized O(1) per appended character
Perl | ., .= | Mutable | Linear with .=
Ruby | +, <<, concat | Mutable (unless frozen) | Linear with << or Array#join
Go | +, strings.Builder.WriteString | Immutable | Linear with strings.Builder
Swift | +, += | Value type (mutable when declared with var) | += appends in place; joined(separator:) for arrays
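The following Python sketch contrasts three ways of building a string from many fragments: repeated += on an immutable str, a list of fragments joined once, and an io.StringIO buffer. The function names are illustrative only, and the sketch makes no timing claims. It shows that the three approaches produce the same result. They differ in how much copying a Python implementation may perform.

```python
import io


def build_with_plus_equals(parts: list[str]) -> str:
    # Each += conceptually creates a new str; CPython can often extend the
    # string in place, but other implementations may copy every time.
    result = ""
    for part in parts:
        result += part
    return result


def build_with_join(parts: list[str]) -> str:
    # Collect fragments in a mutable list, then copy them into one str.
    fragments = []
    for part in parts:
        fragments.append(part)
    return "".join(fragments)


def build_with_stringio(parts: list[str]) -> str:
    # io.StringIO acts as an in-memory, appendable text buffer.
    buffer = io.StringIO()
    for part in parts:
        buffer.write(part)
    return buffer.getvalue()


words = ["Hello", " ", "World"]
print(build_with_plus_equals(words), build_with_join(words), build_with_stringio(words))
```

In real code, the loop in build_with_join usually generates fragments on the fly. The list-and-join form is the portable choice described above.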

Case Conversion

Case conversion functions in programming languages transform the alphabetic characters of a string to uppercase or lowercase and leave other characters unchanged. These operations are fundamental for data normalization, text processing, and user interface consistency. Their implementations vary with string mutability and Unicode support. In object-oriented languages like Java, the String class provides toUpperCase() and toLowerCase() methods, which return a new string because strings are immutable. For example, "hello".toUpperCase() yields "HELLO". Similarly, Python's str type offers upper() and lower(), which also create new strings, as in "hello".upper() producing "HELLO". In contrast, C provides character-level functions such as toupper() from <ctype.h>. These convert one character at a time, so the programmer must loop over the string, typically modifying a mutable buffer in place. Perl provides uc() for uppercase and lc() for lowercase, which return new strings. To change a variable in place, the \U and \L escapes can be used in the replacement part of a substitution (s///). Beyond ASCII, correct case conversion must follow Unicode's special-casing rules and language-specific rules. Java's toUpperCase(), for instance, converts the German "ß" (sharp S) to the two-character "SS", so the result can be longer than the input. toUpperCase(Locale) and toLowerCase(Locale) apply language-specific rules for the given locale. Python's string methods apply Unicode's default, locale-independent mappings, including "ß".upper() == "SS". Its casefold() method performs more aggressive folding for caseless matching. Locale-specific casing, such as Turkish, requires third-party libraries such as PyICU. In C, the towupper() function from <wctype.h> handles wide characters according to the current locale, which supports internationalized applications. Haskell's Data.Char provides toUpper and toLower, which convert a single Char. Because a String is a list of Char, applying them with map (or fmap) directly yields a new String.
Edge cases show why Unicode compliance matters. Non-letter characters, such as punctuation and digits, are left unchanged by all of these functions; for example, uppercasing "Hello123!" yields "HELLO123!". Turkish is a well-known special case. Under Turkish rules, the dotted capital "İ" (U+0130) lowercases to "i" (U+0069), and the plain capital "I" lowercases to the dotless "ı" (U+0131). Locale-independent lowercasing, by contrast, maps "İ" to "i" followed by a combining dot above (U+0307). Java's toLowerCase(Locale) applies the Turkish rules when given a Turkish locale. Code that relies on the default locale can therefore behave differently depending on system settings. In-place conversion is uncommon, since returning a copy is simple. In C++, however, std::transform with std::toupper can modify a std::string in place for efficiency in performance-critical code.
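Language | Uppercase function | Lowercase function | Returns new string? | Locale support
Java | toUpperCase() | toLowerCase() | Yes (immutable) | Yes, via Locale parameter
Python | upper() | lower() | Yes (immutable) | No (locale-independent Unicode mappings; casefold() for caseless matching; third-party libraries for locale rules)
Perl | uc() | lc() | Yes (by default) | Yes, under use locale
C | toupper() (per char) | tolower() (per char) | N/A (manual loop) | Yes, per current C locale; towupper()/towlower() for wide characters
Haskell | toUpper (per char) | toLower (per char) | Yes (via map) | No (simple per-character Unicode mappings in Data.Char)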
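A short Python sketch of the Unicode behaviors discussed above follows. It relies only on the built-in, locale-independent str methods. It therefore illustrates Unicode's default mappings rather than Turkish or other locale-specific rules. The helper name caseless_equal is hypothetical.

```python
def caseless_equal(a: str, b: str) -> bool:
    # casefold() is more aggressive than lower(): it maps "ß" to "ss",
    # so it suits caseless comparison better than lower().
    return a.casefold() == b.casefold()


dotted_capital_i = "\u0130"  # LATIN CAPITAL LETTER I WITH DOT ABOVE

print("Hello123!".upper())  # digits and punctuation are left unchanged
print("ß".upper())          # one character becomes two: "SS"
# Default (non-Turkish) lowercasing yields "i" plus U+0307 COMBINING DOT ABOVE.
print(ascii(dotted_capital_i.lower()), len(dotted_capital_i.lower()))
print("Straße".lower() == "STRASSE".lower())  # lower() keeps "ß", so not equal
print(caseless_equal("Straße", "STRASSE"))    # casefold() makes them equal
```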

Extraction and Slicing Functions

Substring Extraction

Substring extraction refers to operations that retrieve a contiguous portion of a string based on specified indices, without altering the original string. This functionality is fundamental for parsing, data processing, and text analysis. Implementations vary in syntax, parameter handling, and error semantics. Most languages provide methods or operators that take a starting position and either an ending position or a length, and they return a new string containing the extracted characters. Common syntax patterns include method calls such as substring(start, end) or substr(start, length), and operator-based slicing such as [start:end]. Java's String class, for instance, provides substring(int beginIndex, int endIndex), which extracts from the inclusive beginIndex to the exclusive endIndex. Similarly, Python uses slicing notation s[start:stop], where start is inclusive and stop is exclusive. In JavaScript, the preferred slice(start, end) method follows the same inclusive-exclusive convention. The deprecated substr(start, length) takes a length instead of an end index. Perl's substr(EXPR, OFFSET, LENGTH) function performs length-based extraction, and OFFSET and LENGTH may be negative to count from the end of the string. C++'s std::string::substr(pos, count) extracts up to count characters starting at pos and continues to the end of the string if count is omitted. The treatment of index boundaries differs significantly. In Python, Java, and JavaScript's slice, the end index (when provided) is exclusive: it points to the first character not included in the result, so Python's "abc"[1:3] yields "bc". In contrast, length-based functions such as AWK's 1-based substr(string, start, length) take no end index at all. They extract exactly length characters, or fewer if the end of the string is reached.
JavaScript's deprecated substr and Visual Basic's Mid(string, start, length) likewise take a length, so the extracted span is defined by a count rather than by an end position. Default behaviors improve usability by allowing partial specifications. Python defaults start to 0 and stop to the string length when they are omitted, so s[2:] extracts from index 2 to the end, and s[:3] takes the first three characters. Java provides an overload substring(beginIndex) that extends to the end of the string. JavaScript's slice mirrors Python, with defaults of 0 and the string length. Perl and C++ extract the rest of the string when LENGTH or count is omitted. AWK's substr(string, start) likewise extracts to the end when length is omitted, using 1-based indexing in which position 1 is the first character. Error handling for invalid indices balances safety against convenience. Java throws an IndexOutOfBoundsException if an index is negative, if beginIndex > endIndex, or if endIndex exceeds the length. C++ throws std::out_of_range if the starting position exceeds the string size. Perl returns undef, and issues a warning when warnings are enabled, if the requested substring lies entirely outside the string; a partially overlapping request returns the portion that exists. Python and JavaScript's slice never raise on out-of-range indices. Out-of-bounds values are clamped to 0 or the string length, so the result is a partial or empty string, as in Python's "abc"[10:] yielding "". AWK returns an empty string if start exceeds the length, and Visual Basic's Mid does the same when start is greater than the length. JavaScript's substr returns an empty string if start is at or beyond the length. Some languages also provide specialized variants for prefixes, suffixes, or middle sections. Visual Basic provides Left(string, length) for the first length characters and Right(string, length) for the last length characters. Both return the whole string if length is at least the string's length, and Mid covers middle sections. AWK achieves left or right extraction through substr parameters, such as substr($1, 1, 3) for the first three characters of the first field, or substr($1, length($1) - 2) for its last three.
Perl supports negative offsets in substr for extraction relative to the end, as in substr($s, -3) for the last three characters. These variants simplify frequent operations but can usually be emulated with general substring functions.
Language | Syntax example | Index convention | Defaults | Out-of-bounds handling
Python | s[start:stop] | Start incl., stop excl. | start=0, stop=len(s) | Clamps; no exception (may return empty string)
Java | str.substring(begin, end) | Begin incl., end excl. | end=length() in one-argument overload | IndexOutOfBoundsException
JavaScript | str.slice(start, end) | Start incl., end excl. | start=0, end=length | Clamps; no exception (may return empty string)
Perl | substr($s, offset, length) | Offset plus length; negatives count from end | length = rest of string | undef plus warning if entirely outside the string
C++ | str.substr(pos, count) | Position plus count | pos=0, count = rest of string | std::out_of_range if pos > size()
AWK | substr(str, start, length) | Start plus length, 1-based | length = rest of string | Empty string if start > length
Visual Basic | Mid(str, start, length) | Start plus length, 1-based | length = rest of string | Empty string if start > length
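The sketch below makes the two conventions concrete by implementing 1-based, length-based functions in the style of Visual Basic on top of Python's 0-based, end-exclusive slicing. The names left, right, and mid are hypothetical helpers modeled on the behavior described for Visual Basic. They are not a standard Python API. Raising ValueError for negative lengths or a start below 1 is an assumption of this sketch.

```python
from typing import Optional


def left(s: str, length: int) -> str:
    """First `length` characters; slicing clamps, so a long length returns all of s."""
    if length < 0:
        raise ValueError("length must be >= 0")
    return s[:length]


def right(s: str, length: int) -> str:
    """Last `length` characters, computed from an explicit start index."""
    if length < 0:
        raise ValueError("length must be >= 0")
    # s[-length:] would misbehave for length == 0 (s[-0:] is the whole string).
    return s[max(0, len(s) - length):]


def mid(s: str, start: int, length: Optional[int] = None) -> str:
    """1-based start; omitting length extracts to the end of the string."""
    if start < 1 or (length is not None and length < 0):
        raise ValueError("start must be >= 1 and length must be >= 0")
    begin = start - 1  # convert the 1-based position to a 0-based index
    end = None if length is None else begin + length
    return s[begin:end]  # clamping yields "" when start is past the end


text = "Hello World"
print(left(text, 5), right(text, 5), mid(text, 7, 5), repr(mid(text, 20)))
print(text[1:3], repr(text[20:]), text[-3:])  # Python's own 0-based slicing
```

Building on slicing inherits Python's clamping. An out-of-range start therefore produces an empty string instead of an error, matching the Visual Basic behavior in the table.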

Trimming Whitespace

Trimming whitespace is a fundamental operation in many programming languages, used to clean input data, normalize strings for comparison, or prepare text for further processing. It removes leading whitespace, trailing whitespace, or both from the ends of a string. In most languages strings are immutable, so the original is left unchanged. Built-in methods differ in their definition of whitespace, their support for custom character sets, and the availability of one-sided variants. In Java, the String.trim() method removes leading and trailing characters whose code points are less than or equal to U+0020 (the space character). This strips spaces, tabs, newlines, and the other ASCII control characters. Python's str.strip() trims both ends and, by default, removes all Unicode whitespace characters (those for which str.isspace() is true), including spaces, tabs, newlines, carriage returns, form feeds, and vertical tabs. JavaScript's String.prototype.trim() also trims both ends. It uses the broader ECMAScript definition, which covers all WhiteSpace characters (e.g., spaces, tabs, no-break spaces) and line terminators. These functions return a new string and handle edge cases consistently. An empty or all-whitespace input yields an empty string, and a string with no leading or trailing whitespace is returned unchanged. For finer control, languages such as Python and PHP offer left-only and right-only variants. Python provides lstrip() for leading characters and rstrip() for trailing ones. Like strip(), both accept an optional chars argument naming a set of characters to remove, which need not be whitespace. They remove characters from the respective end until they reach one that is not in the set.
For example, ' hello '.lstrip() yields 'hello ', while 'hello!'.strip('!') yields 'hello'. PHP similarly provides trim() for both ends, ltrim() for the beginning, and rtrim() for the end. All three default to a fixed set of characters (space, tab, newline, carriage return, null byte, and vertical tab). A second argument can define a custom character set, optionally with ranges (e.g., trim($str, " \t.")). These functions also return new strings. Perl long lacked a dedicated built-in trim function, so programmers relied on regular expressions such as $str =~ s/^\s+|\s+$//g. Perl 5.36 introduced builtin::trim(), which removes leading and trailing whitespace, including ordinary spaces, tabs, newlines, carriage returns, and all other Unicode whitespace characters. It returns a modified copy of its argument and has been non-experimental since Perl 5.40. For one-sided trimming, Perl developers typically use regex substitutions such as s/^\s+// for the left and s/\s+$// for the right, as the core language has no standard ltrim() or rtrim(). The following table summarizes the key differences among these languages:
Language | Function(s) | Sides trimmed | Default characters removed | Customizable? | Returns | Notes
Java | trim() | Both | Code points ≤ U+0020 (space, tab, newline, other ASCII controls) | No | New String (or the same string if nothing is trimmed) | Fixed definition misses other Unicode whitespace; Java 11 added strip(), stripLeading(), and stripTrailing(), which use Character.isWhitespace().
Python | strip(), lstrip(), rstrip() | Both, left, right | All Unicode whitespace (str.isspace()), including ' \t\n\r\f\v' | Yes (chars argument) | New str | chars is treated as a set; all leading and/or trailing members are removed.
JavaScript | trim(), trimStart(), trimEnd() | Both, left, right | ECMAScript WhiteSpace and LineTerminator characters (e.g., space, tab, \u00A0, \n, \r, \u2028, \u2029) | No | New string | Polyfillable in older environments; follows the lexical grammar's definitions.
Perl | builtin::trim() | Both | Space, tab, newline, carriage return, and all other Unicode whitespace (\s class) | No | New scalar | Available since 5.36; one-sided trimming via regex (e.g., s/^\s+//); no core ltrim/rtrim.
PHP | trim(), ltrim(), rtrim() | Both, left, right | Space, tab, newline, carriage return, null byte, vertical tab (" \t\n\r\0\v") | Yes (character mask) | New string | Mask supports ranges (e.g., "\x00..\x1F"); multibyte-safe variants in the mbstring extension.
These implementations scan inward from each end, so only the characters being removed are examined before the result is produced. Some, such as Java's trim(), return the original string when there is nothing to remove. Because whitespace definitions vary (e.g., Java's narrower trim() versus Perl's Unicode-aware builtin::trim()), code being ported between languages may need adjustment.
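To illustrate how whitespace definitions differ, the Python sketch below compares str.strip() with a hypothetical java_like_trim(). That helper imitates the rule described for Java's trim() by scanning inward from both ends and removing only code points up to U+0020. It is an emulation for comparison, not Java's actual implementation. The last line shows that Python's chars argument is a set of characters rather than a prefix or suffix.

```python
def java_like_trim(s: str) -> str:
    # Scan inward from each end while the code point is <= U+0020.
    start, end = 0, len(s)
    while start < end and ord(s[start]) <= 0x20:
        start += 1
    while end > start and ord(s[end - 1]) <= 0x20:
        end -= 1
    return s[start:end]


padded = "\u00a0 hello \t\n"         # begins with a NO-BREAK SPACE (U+00A0)
print(repr(padded.strip()))          # Python treats U+00A0 as whitespace
print(repr(java_like_trim(padded)))  # U+00A0 is above U+0020, so it stays
print(repr(" hello ".lstrip()), repr(" hello ".rstrip()))
print("www.example.com".strip("cmowz."))  # chars is a set of characters
```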

Decomposition Functions

Splitting Strings

Splitting divides a string into a sequence of substrings based on specified delimiters, commonly returning an array or list. This operation is fundamental in text processing, enabling tasks such as parsing delimited data or tokenizing input. Languages differ in the delimiters they support, in how they handle edge cases such as consecutive delimiters, and in optional limits on the number of splits. In Python, the str.split() method splits a string on a separator string (by default, runs of whitespace) and returns a list of substrings. It does not accept regular expressions; the re.split() function handles pattern-based splitting. For example, "a,,b".split(",") yields ['a', '', 'b'], with an empty string between the consecutive delimiters. The default whitespace splitting instead collapses runs of whitespace, so "a   b".split() returns ['a', 'b']. The optional maxsplit parameter limits the number of splits; for example, "a,,b".split(",", maxsplit=1) produces ['a', ',b']. Java's String.split(String regex, int limit) method always interprets the delimiter as a regular expression, even for simple characters, so "." must be escaped as "\\.". It returns a String[] array. Consecutive delimiters produce empty strings, as in "a,,b".split(",") resulting in {"a", "", "b"}. The limit parameter controls the array size. A positive value caps the number of elements and keeps the remaining text in the last element. Zero, the default in the one-argument overload, applies the pattern as many times as possible and discards trailing empty strings. A negative value also applies the pattern as many times as possible but keeps trailing empty strings. For instance, "a,b,c".split(",", 2) yields {"a", "b,c"}. This reliance on regular expressions can add pattern-compilation overhead. JavaScript's String.prototype.split(separator, limit) accepts either a string or a regular expression as the separator and returns an array of substrings. An empty-string separator splits the string into UTF-16 code units. Consecutive separators produce empty strings: "a,,b".split(",") gives ["a", "", "b"]. The limit, a non-negative integer, caps the number of elements and discards any text beyond it, and a limit of 0 returns an empty array.
If a regular-expression separator contains capturing groups, the captured text is included in the result array. PHP's explode(string $separator, string $string, int $limit = PHP_INT_MAX) function splits on a literal string separator; regular expressions require preg_split(). It returns an array and, as of PHP 8.0, throws a ValueError for an empty separator. Consecutive separators yield empty elements: explode(",", "a,,b") produces ["a", "", "b"]. The limit behaves differently from the other languages. A positive value caps the number of elements and puts the remainder in the last one. A negative value returns all elements except the last |limit|. Zero is treated as 1. An empty input string returns [""]. In AWK, the split(string, array [, fieldsep]) function divides a string into the given array using an extended regular expression (ERE) as the delimiter. The delimiter defaults to the field separator FS, which is normally a single space. The function clears any previous contents of the array and returns the number of elements. Consecutive delimiters produce empty elements, as in split("a,,b", a, ",") setting a[1]="a", a[2]="", and a[3]="b". The exception is the default single-space separator, which treats runs of blanks, tabs, and newlines as one delimiter and ignores leading and trailing ones. Splitting input records into fields via FS (e.g., FS=",") works the same way and accepts multi-character separators or EREs such as FS="[;,]".
Language | Function | Delimiter type | Limit parameter | Consecutive delimiters | Return type
Python | str.split() | String (whitespace by default); no regex (use re.split()) | maxsplit: caps splits, remainder kept in last element | Empty strings for an explicit separator; collapsed for default whitespace | list
Java | String.split() | Regex only | limit: positive caps elements with remainder in last; 0 discards trailing empties; negative keeps all, including empties | Empty strings | String[]
JavaScript | String.prototype.split() | String or regex | Non-negative: caps elements, discards the rest; 0 returns [] | Empty strings | Array
PHP | explode() | String only (use preg_split() for regex) | Positive: caps elements, remainder in last; negative: drops last |limit| elements; 0 treated as 1 | Empty strings | array
AWK | split() | ERE; defaults to FS | None | Empty strings (except default whitespace, which collapses) | Populates array; returns element count
These splitting functions typically operate in O(n) time relative to the input string length, with additional costs for regex evaluation in languages like Java, JavaScript, and AWK. As the inverse of joining operations, splitting facilitates decomposition for further processing.
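Python's str.split() and Java's String.split() treat limits and trailing empty strings differently. The sketch below shows Python's own behaviors and then approximates Java's semantics on top of Python's re.split(). The helper java_style_split is hypothetical. It assumes a pattern without capturing groups that cannot match the empty string. Python's re.split would include captured groups in the result, and Java has extra rules for zero-width matches.

```python
import re


def java_style_split(s: str, pattern: str, limit: int = 0) -> list[str]:
    """Approximate Java's String.split(regex, limit) using re.split()."""
    if limit == 1:
        return [s]  # Java applies the pattern at most limit - 1 = 0 times
    # re.split treats maxsplit=0 as "no limit", matching Java's limit <= 0.
    maxsplit = limit - 1 if limit > 1 else 0
    parts = re.split(pattern, s, maxsplit=maxsplit)
    # A single element means no match; Java then returns the input unchanged.
    if limit == 0 and len(parts) > 1:
        while parts and parts[-1] == "":
            parts.pop()  # limit 0 discards trailing empty strings
    return parts


print("a,,b".split(","), "a   b".split(), "a,,b".split(",", maxsplit=1))
print(java_style_split("a,b,,", ","))      # trailing empties dropped
print(java_style_split("a,b,,", ",", -1))  # trailing empties kept
print(java_style_split("a,b,c", ",", 2))   # remainder kept in last element
```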

Joining Collections

Joining concatenates the strings in an iterable or array into a single string, typically inserting a specified separator between adjacent elements. This operation is fundamental for formatting lists, building CSV data, or reconstructing strings from parsed components. Pairwise concatenation builds a string incrementally. A joining function instead processes the entire collection in one call, which in many implementations avoids intermediate string allocations. In Python, the str.join(iterable) method performs this operation. The string on which it is called is the separator, and the iterable must contain only strings. Any non-string element raises a TypeError, so other values must be converted first, for example with map(str, items). For example, ", ".join(["apple", "banana", "cherry"]) yields "apple, banana, cherry". An empty iterable produces an empty string, and a single-element iterable produces that element without a separator. This strictness surfaces type errors early, and join accepts any iterable, such as a list, tuple, or generator. JavaScript provides Array.prototype.join([separator]), which converts each element to a string and joins them with the optional separator (a comma by default). undefined and null elements become empty strings. For instance, ["a", "b", "c"].join("-") produces "a-b-c". An empty array yields an empty string, and a single-element array yields that element converted to a string. This makes the method convenient for the mixed-type arrays common in web development. PHP's implode(separator, array), or its alias join, concatenates array values into a string and converts non-string values automatically. If only the array is passed, the separator is an empty string. For example, implode(",", ["foo", "bar"]) results in "foo,bar". An empty array yields an empty string, and a single-element array yields that element without a separator.
Only the values of an associative array are used, and objects implementing __toString() are converted, which makes implode flexible for dynamic data. In contrast, C has no joining function in its standard library. The programmer must iterate over an array of strings (e.g., a char **), size or grow the result buffer, and append each element and separator with functions such as strcat or snprintf. Memory allocation and protection against buffer overflows must also be handled explicitly. The following table summarizes the key differences among these languages:
Language | Function | Separator placement | Input handling | Edge cases
Python | str.join(iterable) | Between elements only | Strict: strings only, else TypeError | Empty: ""; single: no separator
JavaScript | Array.prototype.join([sep]) | Between elements only | Converts elements to strings; null/undefined become "" | Empty: ""; single: no separator
PHP | implode(sep, array) | Between elements only | Converts automatically; supports objects with __toString() | Empty: ""; single: no separator
C | None (manual loop) | Manual | Manual handling of string arrays | Requires custom implementation
This functionality often serves as the inverse of splitting strings into collections, enabling round-trip transformations.
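The sketch below shows in Python the loop that a C programmer must write by hand, which places the separator only between elements. It checks the result against str.join(), including the TypeError for non-string items described above. The name manual_join is hypothetical. It uses io.StringIO as a growable buffer, standing in for the manual memory management C would require.

```python
import io
from typing import Iterable


def manual_join(sep: str, items: Iterable[str]) -> str:
    buffer = io.StringIO()
    first = True
    for item in items:
        if not first:
            buffer.write(sep)  # separator precedes every element but the first
        buffer.write(item)
        first = False
    return buffer.getvalue()


fruits = ["apple", "banana", "cherry"]
print(manual_join(", ", fruits), "|", ", ".join(fruits))
print(repr(manual_join(",", [])), repr(manual_join(",", ["solo"])))

try:
    ", ".join(["a", 1])  # str.join does not convert non-strings
except TypeError as exc:
    print("TypeError:", exc)
print(", ".join(map(str, ["a", 1])))  # convert explicitly first
```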

Replacement and Formatting Functions

Substring Replacement

Substring replacement functions locate substrings within a string and substitute new content for them, supporting text-processing tasks such as data cleaning and pattern-based modification. They typically distinguish between replacing all occurrences and replacing only the first match. Many support both literal strings and regular expressions (regex) for more flexible matching. Most return a new string, in keeping with string immutability, though some modify the original in place. Matching is case-sensitive by default, and regex-based variants usually offer a flag to ignore case. In Python, the str.replace(old, new[, count]) method replaces all occurrences of the literal substring old with new. If count is given, only the first count occurrences are replaced. Matching is case-sensitive. For example, "hello world".replace("o", "a") yields "hella warld". For regex-based replacement, the re module provides re.sub(pattern, repl, string, count=0, flags=0). It returns a new string, can limit the number of substitutions, and supports case-insensitive matching via the re.IGNORECASE flag. Unlike literal replacement, re.sub supports pattern matching, such as replacing each digit in "abc123def" with an asterisk to produce "abc***def". Java's String class offers replace(CharSequence target, CharSequence replacement). It substitutes all non-overlapping occurrences of target using literal, case-sensitive matching and returns a new String. For instance, "hello world".replace("o", "a") produces "hella warld". For regular expressions, Java provides replaceFirst(String regex, String replacement), which replaces only the first match, and replaceAll(String regex, String replacement), which replaces every match. Case-insensitive matching is enabled with an embedded flag such as (?i).
These methods offer no numeric count limit beyond the choice between the first match and all matches. Perl uses the substitution operator s/pattern/replacement/flags, which always goes through Perl's regex engine. Without flags it replaces only the first match; the /g flag replaces all occurrences, and /i makes matching case-insensitive. For example, after $str = "hello world";, the statement $str =~ s/o/a/g; changes $str to "hella warld" and returns the number of substitutions made (2). Unlike Python's and Java's methods, the operator modifies the variable directly. With the /r flag (Perl 5.14 and later), it instead leaves the original unchanged and returns the modified copy. Because Perl routes all substitutions through regexes, literal replacement is handled by quoting the pattern (e.g., with \Q...\E) rather than by a dedicated non-regex function. In AWK, the gsub(regex, replacement [, target]) function performs global regex-based replacement, modifying target (or $0 if omitted) and returning the number of substitutions. Matching is case-sensitive, and POSIX AWK has no ignore-case flag. Bracket expressions such as /[oO]/, or gawk's IGNORECASE variable, can provide case-insensitive matching. For instance, gsub(/o/, "a", $0) on the line "hello world" changes it to "hella warld" and returns 2. AWK has no literal-only replacement function, but sub(regex, replacement [, target]) replaces only the first match, in keeping with AWK's text-processing role in Unix environments. Like Perl, it modifies its target in place and returns a count rather than a new string. Across these languages, Python and Java favor returning new immutable strings, which are safe to share between threads, while Perl and AWK mutate variables directly, which suits scripting. Regex support is full-featured in Perl, Python's re module, and Java's java.util.regex, and AWK follows POSIX extended regular expressions. Regex-based functions can ignore case through flags or pattern syntax, whereas the literal methods are strictly case-sensitive.
Control over the number of replacements is most granular in Python (any count N). Java, Perl, and AWK offer only a choice between the first match and all matches.
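A brief Python sketch of these replacement options follows, reusing the strings from the text. It shows literal replacement with and without a count, and regex replacement with re.sub() and the re.IGNORECASE flag. It also shows re.subn(), which returns the substitution count alongside the new string, much like the count that Perl's s///g and AWK's gsub() report. The re.escape() line illustrates how literal text containing regex metacharacters can be matched safely.

```python
import re

s = "hello world"
print(s.replace("o", "a"))     # every occurrence
print(s.replace("o", "a", 1))  # only the first occurrence
print(re.sub(r"\d", "*", "abc123def"))
print(re.sub("o", "0", "Foo BOO", flags=re.IGNORECASE))

new_text, count = re.subn("o", "a", s)  # new string plus number of replacements
print(new_text, count)

# Literal text containing regex metacharacters must be escaped for re.sub.
print(re.sub(re.escape("1+1"), "2", "1+1=2"))
```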

String Formatting and Interpolation

String formatting and interpolation embed dynamic values into strings using templates, producing formatted output such as logs, user interfaces, or reports. These techniques range from traditional C-style functions that substitute arguments into placeholders to modern interpolation syntax that embeds expressions directly in string literals, which improves readability and reduces boilerplate. In C, the sprintf function from <stdio.h> exemplifies the classic approach. A format string contains conversion specifications such as %s for strings, %d for integers, or %f for floating-point numbers, and the following arguments fill them. For instance, sprintf(buf, "%s %d", "hello", 42); writes the string "hello 42" into the buffer buf. Specifications support options such as alignment (e.g., %-10s for a left-justified field) and precision (e.g., %.2f for two decimal places). This function, defined in the ISO C standard, does not check the buffer size, so the caller must ensure the buffer is large enough. The bounded snprintf, standardized in C99, helps avoid overflows. sprintf also provides no type safety, and arguments that do not match their conversion specifications cause undefined behavior. Java adopts a similar printf-inspired model through the static String.format method. It uses format specifiers such as %s for strings and %d for integers and supports explicitly indexed arguments (e.g., %1$s). For example, String.format("Hello %s, age %d", "Alice", 30) yields "Hello Alice, age 30". The method was introduced in Java 5 and is built on the java.util.Formatter class. It supports flags for width, precision, and padding, such as "%10.2f" for a right-aligned number with two decimal places. Unlike C's sprintf, Java detects invalid format strings and mismatched argument types at runtime and throws an exception (a subclass of IllegalFormatException). Static-analysis tools such as Error Prone can additionally check format strings at compile time, for example for methods annotated with @FormatMethod. An overload of String.format accepts a Locale for locale-aware number formatting, and java.text.MessageFormat offers pattern-based, locale-sensitive message construction.
Python introduced formatted string literals, or f-strings, in version 3.6 via PEP 498. A string prefixed with f can embed expressions directly in braces, as in f"{name} is {age}", where name and age are variables. A format specifier may follow a colon inside the braces (e.g., f"{value:10.2f}" for a right-aligned float with two decimal places). The embedded expressions are parsed when the surrounding code is compiled, so syntax errors are reported early, but they are evaluated at runtime and are not type-checked; a specifier that does not suit the value's type raises an exception at runtime. F-strings improve on Python's older %-formatting (modeled on C's printf) and the str.format() method (which uses {} placeholders) by offering concise syntax without a method call, and they delegate the formatting of dates and custom objects to the __format__ protocol.
JavaScript provides template literals, introduced in ECMAScript 2015 (ES6), which use backticks and ${} for interpolation, as in `Hello ${name}, you are ${age} years old.`. Template literals also support multiline strings and tagged templates for custom processing. Placeholders are evaluated at runtime and may contain arbitrary expressions such as ${age > 18 ? 'adult' : 'minor'}; every substituted value is converted to a string, and most environments perform no compile-time type checking. The Intl API adds locale-specific number formatting (e.g., new Intl.NumberFormat('de-DE').format(value)). Compared with concatenation using the + operator, template literals make assembling a string from several pieces less error-prone.
Type safety therefore varies across these languages: a specifier mismatch is undefined behavior in C but a runtime exception in Java and Python, while JavaScript template literals have no specifiers and simply convert each value to a string. Locale support aids internationalization, as seen in Java's MessageFormat for pattern-based substitution with cultural adaptations (e.g., comma vs. period as the decimal separator), in Python's strftime, whose month and weekday names follow the current locale, and in JavaScript's Intl API, allowing formatted strings to adapt to user locales without hardcoded adjustments.
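A minimal sketch of the Python features described above follows: f-string interpolation, a format specifier, an inline expression, the equivalent str.format() call, and the __format__ protocol. The Temperature class is hypothetical, written only to show how a custom object can interpret a format specifier.

```python
from datetime import date


class Temperature:
    """Hypothetical example type that understands numeric format specifiers."""

    def __init__(self, celsius: float) -> None:
        self.celsius = celsius

    def __format__(self, spec: str) -> str:
        # Delegate the specifier (e.g. ".1f") to float formatting, then add a unit
        return format(self.celsius, spec) + " °C"


name, age, value = "Alice", 30, 3.14159

greeting = f"Hello {name}, age {age}"              # expressions evaluated at runtime
aligned = f"{value:10.2f}"                          # width 10, two decimal places
status = f"{'adult' if age > 18 else 'minor'}"      # arbitrary expressions are allowed
via_format = "Hello {}, age {}".format(name, age)   # older str.format() equivalent
day = f"{date(2024, 1, 15):%Y-%m-%d}"               # date.__format__ uses strftime codes
reading = f"{Temperature(21.456):.1f}"              # custom __format__

print(greeting)  # Hello Alice, age 30
print(reading)   # 21.5 °C
```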

Advanced or Specialized Functions

String Partitioning

String partitioning functions divide a string into three parts based on the first occurrence of a specified separator: the substring before the separator, the separator itself, and the substring after it. This approach provides a structured way to extract the components around a delimiter without splitting the string into an arbitrary number of pieces. Such functions are particularly useful in languages that emphasize readability and rapid prototyping, though they are not universally available. In Python, the str.partition(sep) method implements this functionality by returning a tuple containing the three parts. For instance, "a:b:c".partition(":") yields ("a", ":", "b:c"), splitting only at the first colon and leaving the remainder intact. If the separator is not found, the method returns a tuple containing the original string followed by two empty strings, so "abc".partition(":") produces ("abc", "", ""). An empty separator raises a ValueError. This behavior makes the method convenient for simple parsing in scripts.
In Ruby, the String#partition(sep) method works similarly, returning a three-element array with the part before the match, the match itself, and the part after it. For example, "a:b:c".partition(":") yields ["a", ":", "b:c"]. If the separator is not found, it returns the original string followed by two empty strings: ["abc", "", ""]. Ruby's implementation also accepts a regular expression as the separator. Unlike Python and Ruby, many languages lack a built-in partitioning function, requiring manual implementation. In Java, the String class provides split(regex, limit), which achieves a similar result when the limit is 2, but it discards the separator and returns an array rather than including the separator explicitly; for example, "a:b:c".split(":", 2) gives ["a", "b:c"], so additional logic is needed to reconstruct the separator.
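The Python behavior described above can be checked directly. The last call in this sketch uses split with maxsplit=1, which is Python's closest analogue to Java's split(":", 2) and likewise drops the separator.

```python
# partition splits at the first occurrence only and keeps the separator
head, sep, tail = "a:b:c".partition(":")    # ('a', ':', 'b:c')

# separator not found: the original string followed by two empty strings
missing = "abc".partition(":")              # ('abc', '', '')

# an empty separator is rejected
try:
    "abc".partition("")
    empty_rejected = False
except ValueError:
    empty_rejected = True

# split with maxsplit=1 mirrors Java's "a:b:c".split(":", 2): the separator is lost
split_once = "a:b:c".split(":", 1)          # ['a', 'b:c']
```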
In C, string partitioning must be implemented manually, using a function such as strstr from <string.h> to locate the delimiter, followed by pointer arithmetic or strncpy to extract the parts, as no standard library function returns the three components directly. Python also offers str.rpartition(sep), a variant that splits at the last occurrence of the separator: "a:b:c".rpartition(":") returns ("a:b", ":", "c"), and if the separator is not found, the result is two empty strings followed by the original string. Ruby provides a similar rpartition method for the last occurrence. Common use cases for partitioning include parsing simple key-value pairs in configuration files or protocols, and taking apart URLs: "scheme://host/path".partition("://") yields ("scheme", "://", "host/path"), after which partitioning the remainder on "/" isolates the hostname. In languages without a dedicated function, such as Perl, split with a limit can mimic partitioning; if the pattern contains capturing parentheses, as in split(/(:)/, "a:b:c", 2), Perl also returns the separator, yielding ("a", ":", "b:c"). This supports text processing without the overhead of fully decomposing the string.
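Where no built-in exists, partitioning reduces to finding the separator and slicing around it, which is what the strstr-based C approach does with pointers. The sketch below expresses that idea in Python; manual_partition and manual_rpartition are hypothetical helper names chosen to mirror str.partition and str.rpartition, not part of any library. The final lines apply the built-in method to the URL example from the text.

```python
from typing import Tuple


def manual_partition(s: str, sep: str) -> Tuple[str, str, str]:
    """Hypothetical helper: locate the first separator and slice around it."""
    if not sep:
        raise ValueError("empty separator")
    i = s.find(sep)  # -1 when absent (strstr would return NULL)
    if i == -1:
        return (s, "", "")
    return (s[:i], sep, s[i + len(sep):])


def manual_rpartition(s: str, sep: str) -> Tuple[str, str, str]:
    """Hypothetical helper: same idea, searching from the end."""
    if not sep:
        raise ValueError("empty separator")
    i = s.rfind(sep)
    if i == -1:
        return ("", "", s)  # the original string goes last when nothing matches
    return (s[:i], sep, s[i + len(sep):])


# URL example: first separate the scheme, then the host from the path
scheme, _, rest = "scheme://host/path".partition("://")
host, _, path = rest.partition("/")
```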

String Reversal

String reversal is a fundamental string operation that inverts the order of characters within a string. Its implementation differs across programming languages because of variations in string mutability and built-in support: in languages with immutable strings, reversal typically produces a new string, while mutable strings may allow in-place modification for efficiency. The operation is not universally standardized; it appears as a dedicated built-in in some languages but requires manual implementation or auxiliary data structures in others, particularly in algorithmic exercises and palindrome checks.
Perl provides a built-in reverse function that reverses a string in scalar context, returning a new string with the characters in inverted order; for example, scalar reverse "abc" yields "cba". In list context, the same function reverses the order of a list's elements instead, so print reverse "abc" prints "abc" unchanged unless scalar context is forced. In Ruby, the reverse method returns a new string with the characters reversed, so "abc".reverse yields "cba"; the reverse! method reverses the receiver in place. Python has no string reversal method, but the built-in reversed() function returns an iterator over the characters from the end, and ''.join(reversed(s)) assembles a reversed copy; the slice s[::-1] produces the same result, and neither approach modifies the original immutable string. Java's String class is immutable, so efficient reversal uses the mutable StringBuilder, whose reverse() method inverts the contents in place and returns the builder itself, which toString() converts back to a string. For instance, new StringBuilder("abc").reverse().toString() produces "cba". Similarly, C++ supports in-place reversal of a mutable std::string with the std::reverse algorithm from <algorithm>, which swaps elements pairwise from both ends of the range given by the string's begin() and end() iterators; because no new object is allocated, this is efficient for large strings.
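A short sketch of reversal in Python: ''.join(reversed(s)) as described above, the equivalent slice s[::-1], and an in-place reversal of a mutable list of characters as a rough analogue of StringBuilder.reverse() or std::reverse (Python strings themselves cannot be modified in place).

```python
s = "abc"

by_iterator = "".join(reversed(s))  # reversed() walks the string from the end
by_slice = s[::-1]                  # a step of -1 builds a new reversed string

# Python strings are immutable, so in-place reversal needs a mutable container;
# list.reverse() modifies the list and returns None
chars = list(s)
chars.reverse()
from_list = "".join(chars)

print(by_slice, s)  # cba abc
```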
In C, where strings are null-terminated character arrays, the standard library has no reversal function, so a manual loop typically swaps characters from both ends toward the center using two pointers, in O(n) time. Edge cases include empty strings, which remain empty upon reversal, and strings of odd length, whose middle character stays in place without special handling in most implementations. Unicode introduces further complications. Reversing code points, as Python does, detaches combining marks from their base characters (for example, an "e" followed by U+0301 COMBINING ACUTE ACCENT), and reversing UTF-16 code units can split surrogate pairs; normalizing to the precomposed form NFC fixes some of these cases, but reversing user-perceived characters in general requires segmenting the string into grapheme clusters. With bidirectional text (e.g., mixing Latin and Arabic scripts), naive reversal may also disrupt the logical reading order from which the Unicode Bidirectional Algorithm derives the displayed order. Despite its utility in puzzles, data processing, and algorithms such as palindrome checks, dedicated string reversal is relatively rare as a core built-in, and many languages (e.g., JavaScript, which uses s.split('').reverse().join(''), operating on UTF-16 code units) rely on decomposing the string into an array and reversing it with generic collection methods. This scarcity reflects a design philosophy that prioritizes general-purpose operations over specialized ones, often leaving reversal to user-defined functions.
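The two-pointer loop used in C can be sketched in Python as follows. reverse_two_pointer and is_palindrome are hypothetical names, and the function works on a list copy because Python strings are immutable. The last lines illustrate one Unicode pitfall, combining marks: reversing code points moves the accent onto a different letter, while NFC normalization first merges "e" and U+0301 into the single code point U+00E9.

```python
import unicodedata


def reverse_two_pointer(s: str) -> str:
    """Hypothetical helper: swap from both ends toward the center, in O(n) time."""
    buf = list(s)  # mutable copy, since Python strings are immutable
    i, j = 0, len(buf) - 1
    while i < j:   # empty strings skip the loop; an odd-length middle stays put
        buf[i], buf[j] = buf[j], buf[i]
        i += 1
        j -= 1
    return "".join(buf)


def is_palindrome(s: str) -> bool:
    return s == reverse_two_pointer(s)


decomposed = "e\u0301x"                              # "e" + COMBINING ACUTE ACCENT + "x"
naive = reverse_two_pointer(decomposed)              # "x\u0301e": accent now follows "x"
composed = unicodedata.normalize("NFC", decomposed)  # "\u00e9x": two code points
after_nfc = reverse_two_pointer(composed)            # "x\u00e9": accent stays on the "e"
```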
