Comparison of programming languages (string functions)
String functions are used in computer programming languages to manipulate a string or query information about a string (some do both).
Most programming languages that have a string datatype will have some string functions although there may be other low-level ways within each language to handle strings directly. In object-oriented languages, string functions are often implemented as properties and methods of string objects. In functional and list-based languages a string is represented as a list (of character codes), therefore all list-manipulation procedures could be considered string functions. However such languages may implement a subset of explicit string-specific functions as well.
For functions that manipulate strings, modern object-oriented languages such as C# and Java have immutable strings and return a copy (in newly allocated dynamic memory), while others, such as C, manipulate the original string unless the programmer copies the data to a new string. See, for example, Concatenation below.
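The difference between returning a copy and mutating in place can be illustrated with a minimal Python sketch. Python's str is immutable, much like strings in C# and Java; bytearray is used here only as a stand-in for a mutable C-style buffer, and the variable names are illustrative.

```python
# Immutable string: methods return a new object, the original is untouched.
original = "hello"
shouted = original.upper()

# Mutable buffer (a stand-in for a C-style char array): modified in place.
buffer = bytearray(b"hello")
buffer[0:1] = b"J"

print(original)         # hello
print(shouted)          # HELLO
print(buffer.decode())  # Jello
```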
The most basic example of a string function is the length(string) function, which returns the length of a string.
- e.g.
length("hello world") would return 11.
Other languages may have string functions with similar or exactly the same syntax or parameters or outcomes. For example, in many languages the length function is usually represented as len(string). The below list of common functions aims to help limit this confusion.
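What "length" counts also differs between languages (bytes, UTF-16 code units, or Unicode code points). The following Python sketch shows the three counts for one sample string; the sample text and variable names are assumptions chosen for illustration.

```python
# len() on a Python str counts Unicode code points.
print(len("hello world"))  # 11

text = "na\u00efve \U0001F600"  # "naïve" plus one emoji outside the BMP

code_points = len(text)                           # what Python or Haskell report
utf8_bytes = len(text.encode("utf-8"))            # what C strlen sees for UTF-8 data
utf16_units = len(text.encode("utf-16-le")) // 2  # what Java or C# report

print(code_points, utf8_bytes, utf16_units)  # 7 11 8
```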
Common string functions (multi language reference)
String functions common to many languages are listed below, including the different names used. The list aims to help programmers find the equivalent function in a language. Note that string concatenation and regular expressions are handled in separate pages. Statements in guillemets (« … ») are optional.
CharAt
| Definition | charAt(string,integer) returns character.
|
|---|---|
| Description | Returns character at index in the string. |
| Equivalent | See substring of length 1 character. |
| Format | Languages | Base index |
|---|---|---|
string[i]
|
ALGOL 68, APL, Julia, Pascal, Object Pascal (Delphi), Seed7 | 1 |
string[i]
|
C, C++, C#, Cobra, D, FreeBASIC, Go, Python,[1] PHP, Ruby,[1] Windows PowerShell, JavaScript, APL | 0 |
string{i}
|
PHP (deprecated in 5.3) | 0 |
string(i)
|
Ada | ≥1 |
Mid(string,i,1)
|
VB | 1 |
MID$(string,i,1)
|
BASIC | 1 |
string.Chars(i)
|
VB.NET | 0 |
string(i:i)
|
Fortran | 1 |
string.charAt(i)
|
Java, JavaScript | 0 |
string.[i]
|
OCaml, F# | 0 |
string.chars().nth(i)
|
Rust[2] | 0 |
string[i,1]
|
Pick Basic | 1 |
String.sub (string, i)
|
Standard ML | 0 |
string !! i
|
Haskell | 0 |
(string-ref string i)
|
Scheme | 0 |
(char string i)
|
Common Lisp | 0 |
(elt string i)
|
ISLISP | 0 |
(get string i)
|
Clojure | 0 |
substr(string, i, 1)
|
Perl 5[1] | 0 |
substr(string, i, 1)
string.substr(i, 1)
|
Raku[3] | 0 |
substr(string, i, 1)
|
PL/I | 1 |
string.at(i)
|
C++ (STL) (w/ bounds checking) | 0 |
lists:nth(i, string)
|
Erlang | 1 |
[string characterAtIndex:i]
|
Objective-C (NSString * only)
|
0 |
string.sub(string, i, i)
(string):sub(i, i)
|
Lua[1] | 1 |
string at: i
|
Smalltalk (w/ bounds checking) | 1 |
string index string i
|
Tcl | 0 |
StringTake[string, {i}]
|
Mathematica, Wolfram Language[1] | 1 |
string@i
|
Eiffel | 1 |
string (i:1)
|
COBOL | 1 |
${string_param:i:1}
|
Bash | 0 |
i⌷string
|
APL | 0 or 1 |
{ Example in Pascal }
var
MyStr: string = 'Hello, World';
MyChar: Char;
begin
MyChar := MyStr[2]; // 'e'
# Example in ALGOL 68 #
"Hello, World"[2]; // 'e'
// Example in C
#include <stdio.h>
char myStr1[] = "Hello, World";
printf("%c", *(myStr1 + 1)); // 'e'
printf("%c", *(myStr1 + 7)); // 'W'
printf("%c", myStr1[11]); // 'd'
printf("%s", myStr1); // 'Hello, World'
printf("%s", "Hello(2), World(2)"); // 'Hello(2), World(2)'
// Example in C++
import std;
using std::string;
char myStr1[] = "Hello(1), World(1)";
string myStr2 = "Hello(2), World(2)";
std::println("Hello(3), World(3)"); // 'Hello(3), World(3)'
std::println("{}", myStr2[6]); // '2'
std::println("{}", string(myStr1).substr(5, 3)); // '(1)'; a char array has no substr, so it is converted first
// Example in C#
"Hello, World"[2]; // 'l'
# Example in Perl 5
substr("Hello, World", 1, 1); # 'e'
# Examples in Python
"Hello, World"[2] # 'l'
"Hello, World"[-3] # 'r'
# Example in Raku
"Hello, World".substr(1, 1); # 'e'
' Example in Visual Basic
Mid("Hello, World",2,1) ' "e"
' Example in Visual Basic .NET
"Hello, World".Chars(2) ' "l"c
" Example in Smalltalk "
'Hello, World' at: 2. "$e"
//Example in Rust
"Hello, World".chars().nth(2); // Some('l')
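Since the base index differs between the languages above, the same position yields different characters. The sketch below is a hypothetical Python helper (char_at and its base parameter are not part of any standard library) that makes the base explicit.

```python
def char_at(string: str, i: int, base: int = 0) -> str:
    """Return the character at position i, counted from the given base index."""
    offset = i - base
    if not 0 <= offset < len(string):
        raise IndexError(f"position {i} is out of range for base {base}")
    return string[offset]


print(char_at("Hello, World", 2))          # l  (0-based, as in C or Python)
print(char_at("Hello, World", 2, base=1))  # e  (1-based, as in Pascal or Lua)
```

Unlike plain Python indexing, this helper rejects negative positions, which mirrors the bounds-checked variants listed in the table.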
Compare (integer result)
| Definition | compare(string1,string2) returns integer.
|
|---|---|
| Description | Compares two strings to each other. If they are equivalent, zero is returned. Otherwise, most of these routines return a positive or negative result, corresponding to whether string1 is lexicographically greater than or less than string2, respectively. The exceptions are the Scheme and Rexx routines, which return the index of the first mismatch, and Smalltalk, which answers a comparison code telling how the receiver sorts relative to the string parameter. |
| Format | Languages |
|---|---|
IF string1<string2 THEN -1 ELSE ABS (string1>string2) FI
|
ALGOL 68 |
cmp(string1, string2)
|
Python 2 |
(string1 > string2) - (string1 < string2)
|
Python |
strcmp(string1, string2)
|
C, PHP |
std.string.cmp(string1, string2)
|
D |
StrComp(string1, string2)
|
VB, Object Pascal (Delphi) |
string1 cmp string2
|
Perl, Raku |
string1 compare: string2
|
Smalltalk (Squeak, Pharo) |
string1 <=> string2
|
Ruby, C++ (STL, C++20)[4] |
string1.compare(string2)
|
C++ (STL), Swift (Foundation) |
compare(string1, string2)
|
Rexx, Seed7 |
compare(string1, string2, pad)
|
Rexx |
CompareStr(string1, string2)
|
Pascal, Object Pascal (Delphi) |
string1.compareTo(string2)
|
Cobra, Java |
string1.CompareTo(string2)
|
VB .NET, C#, F# |
(compare string1 string2)
|
Clojure |
(string= string1 string2)
|
Common Lisp |
(string-compare string1 string2 p< p= p>)
|
Scheme (SRFI 13) |
(string= string1 string2)
|
ISLISP |
compare string1 string2
|
OCaml |
String.compare (string1, string2)
|
Standard ML[5] |
compare string1 string2
|
Haskell[6] |
[string]::Compare(string1, string2)
|
Windows PowerShell |
[string1 compare:string2]
|
Objective-C (NSString * only)
|
LLT(string1,string2)
LLE(string1,string2)
LGT(string1,string2)
LGE(string1,string2)
|
Fortran[7] |
string1.localeCompare(string2)
|
JavaScript |
strings.Compare(string1, string2)
|
Go |
string compare string1 string2
|
Tcl |
compare(string1,string2,count)
|
PL/I[8] |
string1.cmp(string2)
|
Rust[9] |
# Example in Perl 5
"hello" cmp "world"; # returns -1
# Example in Python
cmp("hello", "world") # returns -1 (Python 2 only; cmp was removed in Python 3)
# Examples in Raku
"hello" cmp "world"; # returns Less
"world" cmp "hello"; # returns More
"hello" cmp "hello"; # returns Same
/** Example in Rexx */
compare("hello", "world") /* returns index of mismatch: 1 */
; Example in Scheme
(use-modules (srfi srfi-13))
; returns index of mismatch: 0
(string-compare "hello" "world" values values values)
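Both result styles described in this section can be sketched in Python 3, which no longer has cmp. The function names compare and first_mismatch are illustrative; first_mismatch uses 0-based positions like the Scheme example and returns -1 for equal strings, which is an assumption of this sketch rather than the behavior of any listed routine.

```python
def compare(string1: str, string2: str) -> int:
    """Three-way comparison: -1, 0 or 1, using the idiom from the table."""
    return (string1 > string2) - (string1 < string2)


def first_mismatch(string1: str, string2: str) -> int:
    """0-based index of the first differing position, or -1 if equal."""
    for i, (a, b) in enumerate(zip(string1, string2)):
        if a != b:
            return i
    if len(string1) != len(string2):
        return min(len(string1), len(string2))
    return -1


print(compare("hello", "world"))         # -1
print(compare("world", "hello"))         # 1
print(compare("hello", "hello"))         # 0
print(first_mismatch("hello", "world"))  # 0
print(first_mismatch("hello", "help"))   # 3
```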
Compare (relational operator-based, Boolean result)
| Definition | string1 OP string2 OR (compare string1 string2) returns Boolean.
|
|---|---|
| Description | Lexicographically compares two strings using a relational operator or function. Boolean result returned. |
| Format | Languages |
|---|---|
string1 OP string2, where OP can be any of =, <>, <, >, <= and >=
|
Pascal, Object Pascal (Delphi), OCaml, Seed7, Standard ML, BASIC, VB, VB .NET, F# |
string1 OP string2, where OP can be any of =, /=, ≠, <, >, <=, ≤ and ≥; Also: EQ, NE, LT, LE, GE and GT
|
ALGOL 68 |
(stringOP? string1 string2), where OP can be any of =, -ci=, <, -ci<, >, -ci>, <=, -ci<=, >= and -ci>= (operators starting with '-ci' are case-insensitive)
|
Scheme |
(stringOP string1 string2), where OP can be any of =, -ci=, <>, -ci<>, <, -ci<, >, -ci>, <=, -ci<=, >= and -ci>= (operators starting with '-ci' are case-insensitive)
|
Scheme (SRFI 13) |
(stringOP string1 string2), where OP can be any of =, -equal, /=, -not-equal, <, -lessp, >, -greaterp, <=, -not-greaterp, >= and -not-lessp (the verbal operators are case-insensitive)
|
Common Lisp |
(stringOP string1 string2), where OP can be any of =, /=, <, >, <=, and >=
|
ISLISP |
string1 OP string2, where OP can be any of =, \=, <, >, <= and >=[10]
|
Rexx |
string1 OP string2, where OP can be any of =, ¬=, <, >, <=, >=, ¬< and ¬>[11]
|
PL/I |
string1 OP string2, where OP can be any of =, /=, <, >, <= and >=
|
Ada |
string1 OP string2, where OP can be any of ==, /=, <, >, =< and >=
|
Erlang |
string1 OP string2, where OP can be any of ==, /=, <, >, <= and >=
|
Haskell |
string1 OP string2, where OP can be any of eq, ne, lt, gt, le and ge
|
Perl, Raku |
string1 OP string2, where OP can be any of ==, !=, <, >, <= and >=
|
C++ (STL), C#, D, Go, JavaScript, Python, PHP, Ruby, Rust,[12] Swift |
string1 OP string2, where OP can be any of -eq, -ceq, -ne, -cne, -lt, -clt, -gt, -cgt, -le, -cle, -ge, and -cge (operators starting with 'c' are case-sensitive)
|
Windows PowerShell |
string1 OP string2, where OP can be any of ==, ~=, <, >, <= and >=
|
Lua |
string1 OP string2, where OP can be any of =, ~=, <, >, <= and >=
|
Smalltalk |
string1 OP string2, where OP can be any of ==, /=, <, >, <= and >=; Also: .EQ., .NE., .LT., .LE., .GT. and .GE.
|
Fortran[13] |
string1 OP string2 where OP can be any of =, <>, <, >, <=, >= as well as worded equivalents
|
COBOL |
string1 OP string2 where OP can be any of ==, <>, <, >, <= and >=
|
Cobra |
string1 OP string2 is available in the syntax, but means comparison of the pointers pointing to the strings, not of the string contents. Use the Compare (integer result) function.
|
C, Java |
string1.METHOD(string2) where METHOD is any of eq, ne, gt, lt, ge, le
|
Rust[12] |
% Example in Erlang
"hello" > "world". % returns false
# Example in Raku
"art" gt "painting"; # returns False
"art" lt "painting"; # returns True
# Example in Windows PowerShell
"hello" -gt "world" # returns false
;; Example in Common Lisp
(string> "art" "painting") ; returns nil
(string< "art" "painting") ; returns non nil
Concatenation
| Definition | concatenate(string1,string2) returns string.
|
|---|---|
| Description | Concatenates (joins) two strings to each other, returning the combined string. Note that some languages like C have mutable strings, so really the second string is being appended to the first string and the mutated string is returned. |
| Format | Languages |
|---|---|
string1 adjacent_to string2
|
Rexx (abutment, equivalent to string1 || string2)
|
string1 whitespace string2
|
Rexx (equivalent to string1 || ' ' || string2)
|
string1 & string2
|
Ada, FreeBASIC, Seed7, BASIC, VB, VB .NET, COBOL (between literals only) |
strcat(string1, string2)
|
C, C++ (char * only)[14]
|
string1 . string2
|
Perl, PHP |
string1 + string2
|
ALGOL 68, C++ (STL), C#, Cobra, FreeBASIC, Go, Pascal, Object Pascal (Delphi), Java, JavaScript, Windows PowerShell, Python, Ruby, Rust,[15] F#, Swift, Turing, VB |
string1 ~ string2
|
D, Raku |
(string-append string1 string2)
|
Scheme, ISLISP |
(concatenate 'string string1 string2)
|
Common Lisp |
(str string1 string2)
|
Clojure |
string1 || string2
|
Rexx, SQL, PL/I |
string1 // string2
|
Fortran |
string1 ++ string2
|
Erlang, Haskell |
string1 ^ string2
|
OCaml, Standard ML, F# |
[string1 stringByAppendingString:string2]
|
Objective-C (NSString * only)
|
string1 .. string2
|
Lua |
string1 , string2
|
Smalltalk, APL |
string1 string2
|
SNOBOL |
string1string2
|
Bash |
string1 <> string2
|
Mathematica |
| concat string1 string2 | Tcl |
{ Example in Pascal }
'abc' + 'def'; // returns "abcdef"
// Example in C#
"abc" + "def"; // returns "abcdef"
' Example in Visual Basic
"abc" & "def" ' returns "abcdef"
"abc" + "def" ' returns "abcdef"
"abc" & Null ' returns "abc"
"abc" + Null ' returns Null
// Example in D
"abc" ~ "def"; // returns "abcdef"
;; Example in Common Lisp
(concatenate 'string "abc " "def " "ghi") ; returns "abc def ghi"
# Example in Perl 5
"abc" . "def"; # returns "abcdef"
"Perl " . 5; # returns "Perl 5"
/* Example in PL/I */
"abc" || "def" /* returns "abcdef" */
# Example in Raku
"abc" ~ "def"; # returns "abcdef"
"Perl " ~ 6; # returns "Perl 6"
/* Example in Rexx */
"Strike"2 /* returns "Strike2" */
"Strike" 2 /* returns "Strike 2" */
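Not every language converts non-string operands implicitly the way the Perl and Rexx examples above do. As a point of contrast, this Python sketch shows that + between a string and an integer raises TypeError, so the conversion has to be explicit; the variable names are illustrative.

```python
joined = "abc" + "def"

# Unlike Perl's "Perl " . 5, Python refuses to mix str and int.
try:
    "Perl " + 5
except TypeError:
    implicit_conversion_allowed = False
else:
    implicit_conversion_allowed = True

explicit = "Perl " + str(5)

# For many pieces, joining a list avoids building intermediate strings.
many = "".join(["abc", "def", "ghi"])

print(joined)                       # abcdef
print(implicit_conversion_allowed)  # False
print(explicit)                     # Perl 5
print(many)                         # abcdefghi
```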
Contains
| Definition | contains(string,substring) returns boolean
|
|---|---|
| Description | Returns whether string contains substring as a substring. This is equivalent to using Find and then detecting that it does not result in the failure condition listed in the third column of the Find section. However, some languages have a simpler way of expressing this test. |
| Related | Find |
| Format | Languages |
|---|---|
string_in_string(string, loc int, substring)
|
ALGOL 68 |
ContainsStr(string, substring)
|
Object Pascal (Delphi) |
strstr(string, substring) != NULL
|
C, C++ (char * only)
|
string.Contains(substring)
|
C#, VB .NET, Windows PowerShell, F# |
string.contains(substring)
|
Cobra, Java (1.5+), Raku, Rust,[16] C++ (C++23)[17] |
string.indexOf(substring) >= 0
|
JavaScript |
strpos(string, substring) !== false
|
PHP |
str_contains(string, substring)
|
PHP (8+) |
pos(string, substring) <> 0
|
Seed7 |
substring in string
|
Cobra, Python (2.3+) |
string.find(string, substring) ~= nil
|
Lua |
string.include?(substring)
|
Ruby |
Data.List.isInfixOf substring string
|
Haskell (GHC 6.6+) |
string includesSubstring: substring
|
Smalltalk (Squeak, Pharo, Smalltalk/X) |
String.isSubstring substring string
|
Standard ML |
(search substring string)
|
Common Lisp |
|
ISLISP |
(substring? substring string)
|
Clojure |
! StringFreeQ[string, substring]
|
Mathematica |
index(string, substring, startpos)>0
|
Fortran, PL/I[18] |
index(string, substring, occurrence)>0
|
Pick Basic |
strings.Contains(string, substring)
|
Go |
string.find(substring) != string::npos
|
C++ |
[string containsString:substring]
|
Objective-C (NSString * only, iOS 8+/OS X 10.10+)
|
string.rangeOfString(substring) != nil
|
Swift (Foundation) |
∨/substring⍷string
|
APL |
¢ Example in ALGOL 68 ¢
string in string("e", loc int, "Hello mate"); ¢ returns true ¢
string in string("z", loc int, "word"); ¢ returns false ¢
// Example In C#
"Hello mate".Contains("e"); // returns true
"word".Contains("z"); // returns false
# Example in Python
"e" in "Hello mate" # returns True
"z" in "word" # returns False
# Example in Raku
"Good morning!".contains('z') # returns False
"¡Buenos días!".contains('í'); # returns True
" Example in Smalltalk "
'Hello mate' includesSubstring: 'e' " returns true "
'word' includesSubstring: 'z' " returns false "
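The equivalence between Contains and Find mentioned in the description can be sketched in Python, where str.find returns -1 as its failure value. The helper name contains_via_find is hypothetical; in practice the in operator is the simpler form.

```python
def contains_via_find(string: str, substring: str) -> bool:
    """Contains expressed as 'Find did not return its failure value'."""
    return string.find(substring) != -1


print(contains_via_find("Hello mate", "e"))  # True
print(contains_via_find("word", "z"))        # False
print("e" in "Hello mate")                   # True
```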
Equality
Tests if two strings are equal. See also #Compare (integer result) and #Compare (relational operator-based, Boolean result). Note that doing equality checks via a generic Compare with integer result is not only confusing for the programmer but is often a significantly more expensive operation; this is especially true when using "C-strings".
| Format | Languages |
|---|---|
string1 == string2
|
Python, C++ (STL), C#, Cobra, Go, JavaScript (similarity), PHP (similarity), Ruby, Rust,[12] Erlang, Haskell, Lua, D, Mathematica, Swift |
string1 === string2
|
JavaScript, PHP |
string1 == string2
string1 .EQ. string2
|
Fortran |
strcmp(string1, string2) == 0
|
C |
(string=? string1 string2)
|
Scheme |
(string= string1 string2)
|
Common Lisp, ISLISP |
string1 = string2
|
ALGOL 68, Ada, Object Pascal (Delphi), OCaml, Pascal, Rexx, Seed7, Standard ML, BASIC, VB, VB .NET, F#, Smalltalk, PL/I, COBOL |
test string1 = string2
[ string1 = string2 ]
|
Bourne Shell |
string1 eq string2
|
Perl, Raku, Tcl |
string1.equals(string2)
|
Cobra, Java |
string1.Equals(string2)
|
C# |
string1 -eq string2
[string]::Equals(string1, string2)
|
Windows PowerShell |
[string1 isEqualToString:string2]
[string1 isEqual:string2]
|
Objective-C (NSString * only)
|
string1 ≡ string2
|
APL |
string1.eq(string2)
|
Rust[12] |
// Example in C#
"hello" == "world" // returns false
' Example in Visual Basic
"hello" = "world" ' returns false
# Examples in Perl 5
'hello' eq 'world' # returns 0
'hello' eq 'hello' # returns 1
# Examples in Raku
'hello' eq 'world' # returns False
'hello' eq 'hello' # returns True
# Example in Windows PowerShell
"hello" -eq "world" # returns false
⍝ Example in APL
'hello' ≡ 'world' ⍝ returns 0
Find
| Definition | find(string,substring) returns integer
|
|---|---|
| Description | Returns the position of the start of the first occurrence of substring in string. If the substring is not found most of these routines return an invalid index value – -1 where indexes are 0-based, 0 where they are 1-based – or some value to be interpreted as Boolean FALSE. |
| Related | instrrev |
| Format | Languages | If not found |
|---|---|---|
string in string(substring, pos, string[startpos:])
|
ALGOL 68 | returns BOOL: TRUE or FALSE, and position in REF INT pos. |
InStr(«startpos,»string,substring)
|
VB (positions start at 1) | returns 0 |
INSTR$(string,substring)
|
BASIC (positions start at 1) | returns 0 |
index(string,substring)
|
AWK | returns 0 |
index(string,substring«,startpos»)
|
Perl 5 | returns −1 |
index(string,substring«,startpos»)
string.index(substring«,startpos»)
|
Raku | returns Nil |
instr(«startpos,»string,substring)
|
FreeBASIC | returns 0 |
strpos(string,substring«,startpos»)
|
PHP | returns FALSE |
locate(string, substring)
|
Ingres | returns string length + 1 |
strstr(string, substring)
|
C, C++ (char * only, returns pointer to first character)
|
returns NULL |
std.string.indexOf(string, substring)
|
D | returns −1 |
pos(string, substring«, startpos»)
|
Seed7 | returns 0 |
strings.Index(string, substring)
|
Go | returns −1 |
pos(substring, string)
|
Pascal, Object Pascal (Delphi) | returns 0 |
pos(substring, string«,startpos»)
|
Rexx | returns 0 |
string.find(substring«,startpos»)
|
C++ (STL) | returns std::string::npos |
string.find(substring«,startpos«,endpos»»)
|
Python | returns −1 |
string.index(substring«,startpos«,endpos»»)
|
Python | raises ValueError |
string.index(substring«,startpos»)
|
Ruby | returns nil |
string.indexOf(substring«,startpos»)
|
Java, JavaScript | returns −1 |
string.IndexOf(substring«,startpos«, charcount»»)
|
VB .NET, C#, Windows PowerShell, F# | returns −1 |
string:str(string, substring)
|
Erlang | returns 0 |
(string-contains string substring)
|
Scheme (SRFI 13) | returns #f |
(search substring string)
|
Common Lisp | returns NIL |
(string-index substring string)
|
ISLISP | returns nil
|
List.findIndex (List.isPrefixOf substring) (List.tails string)
|
Haskell (returns Just index) | returns Nothing |
Str.search_forward (Str.regexp_string substring) string 0
|
OCaml | raises Not_found |
Substring.size (#1 (Substring.position substring (Substring.full string)))
|
Standard ML | returns string length |
[string rangeOfString:substring].location
|
Objective-C (NSString * only)
|
returns NSNotFound |
string.find(string, substring)
(string):find(substring)
|
Lua | returns nil |
string indexOfSubCollection: substring startingAt: startpos ifAbsent: aBlock
string findString: substring startingAt: startpos
|
Smalltalk (Squeak, Pharo) | evaluates aBlock, which is a block closure (or any object understanding value); returns 0 (respectively) |
startpos = INDEX(string, substring «,back» «, kind»)
|
Fortran | returns 0 if substring is not in string; returns LEN(string)+1 if substring is empty |
POSITION(substring IN string)
|
SQL | returns 0 (positions start at 1) |
index(string, substring, startpos )
|
PL/I[18] | returns 0 (positions start at 1) |
index(string, substring, occurrence )
|
Pick Basic | returns 0 if occurrence of substring is not in string; (positions start at 1) |
string.indexOf(substring«,startpos«, charcount»»)
|
Cobra | returns −1 |
string first substring string startpos
|
Tcl | returns −1 |
(substring⍷string)⍳1
|
APL | returns 1 + the last position in string |
string.find(substring)
|
Rust[19] | returns None
|
Examples
- Common Lisp
(search "e" "Hello mate") ; returns 1
(search "z" "word") ; returns NIL
- C#
"Hello mate".IndexOf("e"); // returns 1
"Hello mate".IndexOf("e", 4); // returns 9
"word".IndexOf("z"); // returns -1
- Raku
"Hello, there!".index('e') # returns 1
"Hello, there!".index('z') # returns Nil
- Scheme
(use-modules (srfi srfi-13))
(string-contains "Hello mate" "e") ; returns 1
(string-contains "word" "z") ; returns #f
- Visual Basic
InStr("Hello mate", "e") ' returns 2
InStr(5, "Hello mate", "e") ' returns 10
InStr("word", "z") ' returns 0
- Smalltalk
'Hello mate' indexOfSubCollection:'ate' "returns 8"
'Hello mate' indexOfSubCollection:'late' "returns 0"
'Hello mate' indexOfSubCollection:'late' ifAbsent:[ 99 ] "returns 99"
'Hello mate' indexOfSubCollection:'late' ifAbsent:[ self error ] "raises an exception"
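The failure conventions in the table (a sentinel value versus an exception, and 0-based versus 1-based positions) can be compared side by side in Python. str.find and str.index are standard methods; find_one_based is a hypothetical helper that mimics the 1-based convention of InStr, where 0 means "not found".

```python
def find_one_based(string: str, substring: str) -> int:
    """1-based position of substring, or 0 if absent (InStr-style convention)."""
    return string.find(substring) + 1


print("Hello mate".find("e"))     # 1
print("Hello mate".find("e", 4))  # 9
print("word".find("z"))           # -1

# str.index behaves like find but raises instead of returning a sentinel.
try:
    "word".index("z")
except ValueError:
    index_raised = True
else:
    index_raised = False
print(index_raised)               # True

print(find_one_based("Hello mate", "e"))  # 2
print(find_one_based("word", "z"))        # 0
```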
Find character
| Definition | find_character(string,char) returns integer
|
|---|---|
| Description | Returns the position of the start of the first occurrence of the character char in string. If the character is not found most of these routines return an invalid index value – -1 where indexes are 0-based, 0 where they are 1-based – or some value to be interpreted as Boolean FALSE. This can be accomplished as a special case of #Find, with a string of one character; but it may be simpler or more efficient in many languages to locate just one character. Also, in many languages, characters and strings are different types, so it is convenient to have such a function. |
| Related | find |
| Format | Languages | If not found |
|---|---|---|
char in string(char, pos, string[startpos:])
|
ALGOL 68 | returns BOOL: TRUE or FALSE, and position in REF INT pos. |
instr(string, any char«,startpos») (char can contain more than one character, in which case the position of the first appearance of any of them is returned.)
|
FreeBASIC | returns 0 |
strchr(string,char)
|
C, C++ (char * only, returns pointer to character)
|
returns NULL |
std.string.find(string, dchar)
|
D | returns −1 |
string.find(char«,startpos»)
|
C++ (STL) | returns std::string::npos |
pos(string, char«, startpos»)
|
Seed7 | returns 0 |
strings.IndexRune(string,char)
|
Go | returns −1 |
string.indexOf(char«,startpos»)
|
Java, JavaScript | returns −1 |
string.IndexOf(char«,startpos«, charcount»»)
|
VB .NET, C#, Windows PowerShell, F# | returns −1 |
(position char string)
|
Common Lisp | returns NIL |
(char-index char string)
|
ISLISP | returns nil |
List.elemIndex char string
|
Haskell (returns Just index)
|
returns Nothing |
String.index string char
|
OCaml | raises Not_found |
position = SCAN (string, set «, back» «, kind»)
position = VERIFY (string, set «, back» «, kind»)[a]
|
Fortran | returns zero |
string indexOf: char ifAbsent: aBlock
string indexOf: char
string includes: char
|
Smalltalk | evaluates aBlock, which is a BlockClosure (or any object understanding value); returns 0; returns true or false (respectively)
|
index(string, char, startpos )
|
PL/I[20] | returns 0 (positions start at 1) |
string.index(?char)
|
Ruby | returns nil |
strpos(string,char,startpos)
|
PHP | returns false |
string.indexOf(char«,startpos«, charcount»»)
|
Cobra | returns −1 |
string⍳char
|
APL | returns 1 + the last position in string |
string.find(substring)
|
Rust[19] | returns None |
// Examples in C#
"Hello mate".IndexOf('e'); // returns 1
"word".IndexOf('z') // returns -1
; Examples in Common Lisp
(position #\e "Hello mate") ; returns 1
(position #\z "word") ; returns NIL
^a Given a set of characters, SCAN returns the position of the first character found,[21] while VERIFY returns the position of the first character that does not belong to the set.[22]
Format
| Definition | format(formatstring, items) returns string
|
|---|---|
| Description | Returns the formatted string representation of one or more items. |
| Format | Languages | Format string syntax |
|---|---|---|
associate(file, string); putf(file, $formatstring$, items)
|
ALGOL 68 | ALGOL |
Format(item, formatstring)
|
VB |
|
sprintf(formatstring, items)
|
Perl, PHP, Raku, Ruby | C |
item.fmt(formatstring)
|
Raku | C |
io_lib:format(formatstring, items)
|
Erlang |
|
sprintf(outputstring, formatstring, items)
|
C | C |
std::format(formatstring, items)
|
C++ (C++20) | Python |
std.string.format(formatstring, items)
|
D | C |
Format(formatstring, items)
|
Object Pascal (Delphi) |
|
fmt.Sprintf(formatstring, items)
|
Go | C |
printf formatstring items
|
Unix | C |
formatstring % (items)
|
Python, Ruby | C |
formatstring.format(items)
|
Python | .NET |
f"formatstring"
|
Python 3 | |
Printf.sprintf formatstring[23] items
|
OCaml, F# | C |
Text.Printf.printf formatstring items
|
Haskell (GHC) | C |
formatstring printf: items
|
Smalltalk | C |
String.format(formatstring, items)
|
Java | C |
String.Format(formatstring, items)
|
VB .NET, C#, F# | .NET |
(format formatstring items)
|
Scheme (SRFI 28) | Lisp |
(format nil formatstring items)
|
Common Lisp | Lisp |
(format formatstring items)
|
Clojure | Lisp |
formatstring -f items
|
Windows PowerShell | .NET |
[NSString stringWithFormat:formatstring, items]
|
Objective-C (NSString * only)
|
C |
String(format:formatstring, items)
|
Swift (Foundation) | C |
string.format(formatstring, items)
(formatstring):format(items)
|
Lua | C |
WRITE (outputstring, formatstring) items
|
Fortran | Fortran |
put string(string) edit(items)(format)
|
PL/I | PL/I (similar to Fortran) |
String.format(formatstring, items)
|
Cobra | .NET |
format formatstring items
|
Tcl | C |
formatnumbers ⍕ items
formatstring ⎕FMT items
|
APL | APL |
format!(formatstring, items)
|
Rust[24] | Python |
// Example in C#
String.Format("My {0} costs {1:C2}", "pen", 19.99); // returns "My pen costs $19.99"
// Example in Object Pascal (Delphi)
Format('My %s costs $%2f', ['pen', 19.99]); // returns "My pen costs $19.99"
// Example in Java
String.format("My %s costs $%.2f", "pen", 19.99); // returns "My pen costs $19.99"
# Examples in Raku
sprintf "My %s costs \$%.2f", "pen", 19.99; # returns "My pen costs $19.99"
1.fmt("%04d"); # returns "0001"
# Example in Python
"My %s costs $%.2f" % ("pen", 19.99); # returns "My pen costs $19.99"
"My {0} costs ${1:.2f}".format("pen", 19.99); # returns "My pen costs $19.99"
#Example in Python 3.6+
pen = "pen"
f"My {pen} costs {19.99}" #returns "My pen costs 19.99"
; Example in Scheme
(format "My ~a costs $~1,2F" "pen" 19.99) ; returns "My pen costs $19.99"
/* example in PL/I */
put string(some_string) edit('My ', 'pen', ' costs', 19.99)(a,a,a,p'$$$V.99')
/* returns "My pen costs $19.99" */
Inequality
Tests if two strings are not equal. See also #Equality.
| Format | Languages |
|---|---|
string1 ne string2
string1 NE string2
|
ALGOL 68 – note: the operator "ne" is literally in bold type-font. |
string1 /= string2
|
ALGOL 68, Ada, Erlang, Fortran, Haskell |
string1 <> string2
|
BASIC, VB, VB .NET, Pascal, Object Pascal (Delphi), OCaml, PHP, Seed7, Standard ML, F#, COBOL, Cobra, Python 2 (deprecated) |
string1 # string2
|
BASIC (some implementations) |
string1 ne string2
|
Perl, Raku |
(string<> string1 string2)
|
Scheme (SRFI 13) |
(string/= string1 string2)
|
Common Lisp |
(string/= string1 string2)
|
ISLISP |
(not= string1 string2)
|
Clojure |
string1 != string2
|
C++ (STL), C#, Go, JavaScript (not similar), PHP (not similar), Python, Ruby, Rust,[12] Swift, D, Mathematica |
string1 !== string2
|
JavaScript, PHP |
string1 \= string2
|
Rexx |
string1 ¬= string2
|
PL/I |
test string1 != string2
[ string1 != string2 ]
|
Bourne Shell |
string1 -ne string2
|
Windows PowerShell |
string1 ~= string2
|
Lua, Smalltalk |
string1 ≢ string2
|
APL |
string1.ne(string2)
|
Rust[12] |
// Example in C#
"hello" != "world" // returns true
' Example in Visual Basic
"hello" <> "world" ' returns true
;; Example in Clojure
(not= "hello" "world") ; ⇒ true
# Example in Perl 5
'hello' ne 'world' # returns 1
# Example in Raku
'hello' ne 'world' # returns True
# Example in Windows PowerShell
"hello" -ne "world" # returns true
index
see #Find
indexof
see #Find
instr
see #Find
instrrev
see #rfind
join
| Definition | join(separator, list_of_strings) returns a string made of the list's strings joined with the separator
|
|---|---|
| Description | Joins the list of strings into a new string, with the separator string between each of the substrings. Opposite of split. |
| Related | sprintf |
| Format | Languages |
|---|---|
std.string.join(array_of_strings, separator)
|
D |
string:join(list_of_strings, separator)
|
Erlang |
join(separator, list_of_strings)
|
Perl, PHP, Raku |
implode(separator, array_of_strings)
|
PHP |
separator.join(sequence_of_strings)
|
Python, Swift 1.x |
array_of_strings.join(separator)
|
Ruby, JavaScript, Raku, Rust[25] |
(string-join array_of_strings separator)
|
Scheme (SRFI 13) |
(format nil "~{~a~^separator~}" array_of_strings)
|
Common Lisp |
(clojure.string/join separator list_of_strings)
(apply str (interpose separator list_of_strings))
|
Clojure |
strings.Join(array_of_strings, separator)
|
Go |
join(array_of_strings, separator)
|
Seed7 |
String.concat separator list_of_strings
|
OCaml |
String.concatWith separator list_of_strings
|
Standard ML |
Data.List.intercalate separator list_of_strings
|
Haskell (GHC 6.8+) |
Join(array_of_strings, separator)
|
VB |
String.Join(separator, array_of_strings)
|
VB .NET, C#, F# |
String.join(separator, array_of_strings)
|
Java 8+ |
&{$OFS=$separator; "$array_of_strings"}
array_of_strings -join separator
|
Windows PowerShell |
[array_of_strings componentsJoinedByString:separator]
|
Objective-C (NSString * only)
|
table.concat(table_of_strings, separator)
|
Lua |
collectionOfAnything joinUsing: separator
|
Smalltalk (Squeak, Pharo) |
array_of_strings.join(separator«, final_separator»)
|
Cobra |
sequence_of_strings.joinWithSeparator(separator)
|
Swift 2.x |
1↓∊separator,¨list_of_strings
|
APL |
// Example in C#
String.Join("-", new[] {"a", "b", "c"}) // "a-b-c"
" Example in Smalltalk "
#('a' 'b' 'c') joinUsing: '-' " 'a-b-c' "
# Example in Perl 5
join( '-', ('a', 'b', 'c')); # 'a-b-c'
# Example in Raku
<a b c>.join('-'); # 'a-b-c'
# Example in Python
"-".join(["a", "b", "c"]) # 'a-b-c'
# Example in Ruby
["a", "b", "c"].join("-") # 'a-b-c'
; Example in Scheme
(use-modules (srfi srfi-13))
(string-join '("a" "b" "c") "-") ; "a-b-c"
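The description calls join the opposite of split. A short Python sketch of that round trip follows, together with the case where it does not hold: when an element already contains the separator. The variable names are illustrative.

```python
parts = ["a", "b", "c"]
joined = "-".join(parts)
round_trip = joined.split("-")

print(joined)               # a-b-c
print(round_trip == parts)  # True

# The round trip is lossy if an element contains the separator itself.
tricky = ["a-b", "c"]
lossy = "-".join(tricky).split("-")
print(lossy)                # ['a', 'b', 'c']
```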
lastindexof
see #rfind
left
| Definition | left(string,n) returns string
|
|---|---|
| Description | Returns the left n part of a string. If n is greater than the length of the string then most implementations return the whole string (exceptions exist – see code examples). Note that for variable-length encodings such as UTF-8, UTF-16 or Shift-JIS, it can be necessary to remove string positions at the end, in order to avoid invalid strings. |
| Format | Languages |
|---|---|
string (string'First .. string'First + n - 1)
|
Ada |
substr(string, 0, n)
|
AWK (changes string), Perl, PHP, Raku |
LEFT$(string,n)
|
BASIC, VB |
left(string,n)
|
VB, FreeBASIC, Ingres, Pick Basic |
strncpy(string2, string, n)
|
C standard library |
string.substr(0,n)
|
C++ (STL), Raku |
[string substringToIndex:n]
|
Objective-C (NSString * only)
|
(subs string 0 n)
|
Clojure |
string[0 .. n]
|
D[26] |
string:substr(string, start, length)
|
Erlang |
(subseq string 0 n)
|
Common Lisp |
string[:n]
|
Cobra, Go, Python |
left(string,n «,padchar»)
|
Rexx, Erlang |
string[0, n]
string[0..n - 1]
|
Ruby |
string[1, n]
|
Pick Basic |
string[ .. n]
|
Seed7 |
string.Substring(0,n)
|
VB .NET, C#, Windows PowerShell, F# |
leftstr(string, n)
|
Pascal, Object Pascal (Delphi) |
copy (string,1,n)
|
Turbo Pascal |
string.substring(0,n)
|
Java,[27] JavaScript |
(string-take string n)
|
Scheme (SRFI 13) |
take n string
|
Haskell |
String.extract (string, n, NONE)
|
Standard ML |
String.sub string 0 n
|
OCaml[28] |
string.[..n]
|
F# |
string.sub(string, 1, n)
(string):sub(1, n)
|
Lua |
string first: n
|
Smalltalk (Squeak, Pharo) |
string(:n)
|
Fortran |
StringTake[string, n]
|
Mathematica[29] |
string («FUNCTION» LENGTH(string) - n:n)
|
COBOL |
string.substring(0, n)
|
Cobra |
n↑string.
|
APL |
string[0..n] or string[..n] or string.get(0..n) or string.get(..n)
|
Rust[30] |
# Example in Raku
"Hello, there!".substr(0, 6); # returns "Hello,"
/* Examples in Rexx */
left("abcde", 3) /* returns "abc" */
left("abcde", 8) /* returns "abcde   " */
left("abcde", 8, "*") /* returns "abcde***" */
; Examples in Scheme
(use-modules (srfi srfi-13))
(string-take "abcde" 3) ; returns "abc"
(string-take "abcde" 8) ; error
' Examples in Visual Basic
Left("sandroguidi", 3) ' returns "san"
Left("sandroguidi", 100) ' returns "sandroguidi"
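The clamping and padding behaviours listed above can be compared side by side. The following Python sketch is illustrative only: the helper name left and its padchar parameter are hypothetical, chosen to mirror the Rexx signature, and are not part of Python's standard library.

```python
def left(s: str, n: int, padchar: str = "") -> str:
    """Return the leftmost n characters of s.

    With an empty padchar (the default) the result is clamped to len(s),
    as in Visual Basic's Left. With a one-character padchar the result is
    padded on the right to exactly n characters, as in Rexx's left.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    head = s[:n]  # slicing never raises when n exceeds len(s)
    if padchar:
        head = head.ljust(n, padchar)
    return head


print(left("abcde", 3))       # abc
print(left("abcde", 8))       # abcde
print(left("abcde", 8, "*"))  # abcde***
```

Because slicing clamps silently, this sketch follows the "return the whole string" convention rather than raising an error as the Scheme example does.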
len
see #length
length
| Definition | length(string) returns an integer number
|
|---|---|
| Description | Returns the length of a string (not counting the null terminator or any other of the string's internal structural information). An empty string returns a length of 0. |
| Format | Returns | Languages |
|---|---|---|
string'Length
|
|
Ada |
UPB string
|
|
ALGOL 68 |
echo "${#string_param}"
|
|
Bash |
length(string)
|
|
Ingres, Perl 5, Pascal, Object Pascal (Delphi), Rexx, Seed7, SQL, PL/I |
len(string)
|
|
BASIC, FreeBASIC, Python, Go, Pick Basic |
length(string), string:len(string)
|
|
Erlang |
Len(string)
|
|
VB, Pick Basic |
string.Length
|
Number of UTF-16 code units | VB .NET, C#, Windows PowerShell, F# |
chars(string) or string.chars
|
Number of graphemes (NFG) | Raku |
codes(string) or string.codes
|
Number of Unicode code points | Raku |
string.size OR string.length
|
Number of bytes[31] | Ruby |
strlen(string)
|
Number of bytes | C, PHP |
string.length()
|
|
C++ (STL) |
string.length
|
|
Cobra, D, JavaScript |
string.length()
|
Number of UTF-16 code units | Java |
(string-length string)
|
|
Scheme |
(length string)
|
|
Common Lisp, ISLISP |
(count string)
|
|
Clojure |
String.length string
|
|
OCaml |
size string
|
|
Standard ML |
length string
|
Number of Unicode code points | Haskell |
string.length
|
Number of UTF-16 code units | Objective-C (NSString * only)
|
string.characters.count
|
Number of characters | Swift (2.x) |
count(string)
|
Number of characters | Swift (1.2) |
countElements(string)
|
Number of characters | Swift (1.0–1.1) |
string.len(string) or (string):len() or #string
|
|
Lua |
string size
|
|
Smalltalk |
LEN(string) or LEN_TRIM(string)
|
|
Fortran |
StringLength[string]
|
|
Mathematica |
«FUNCTION» LENGTH(string) or «FUNCTION» BYTE-LENGTH(string)
|
number of characters and number of bytes, respectively | COBOL |
string length string
|
a decimal string giving the number of characters | Tcl |
≢ string
|
APL | |
string.len()
|
Number of bytes | Rust[32] |
string.chars().count()
|
Number of Unicode code points | Rust[33] |
// Examples in C#
"hello".Length; // returns 5
"".Length; // returns 0
% Examples in Erlang
string:len("hello"). % returns 5
string:len(""). % returns 0
# Examples in Perl 5
length("hello"); # returns 5
length(""); # returns 0
# Examples in Raku
"".chars; chars ""; # both return 0
"".codes; codes ""; # both return 0
' Examples in Visual Basic
Len("hello") ' returns 5
Len("") ' returns 0
// Examples in Objective-C
[@"hello" length] // returns 5
[@"" length] // returns 0
-- Examples in Lua
("hello"):len() -- returns 5
#"" -- returns 0
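The "Returns" column above distinguishes code points, bytes and UTF-16 code units. A minimal Python sketch of how the three counts differ for the same text follows; the helper names are hypothetical, and the sample string is an assumption chosen to contain one accented letter and one character outside the Basic Multilingual Plane. Grapheme counting (as in Raku's chars) needs a library outside the standard one and is not shown.

```python
def code_points(s: str) -> int:
    """Number of Unicode code points (what Python's len reports)."""
    return len(s)


def utf8_bytes(s: str) -> int:
    """Number of bytes in the UTF-8 encoding."""
    return len(s.encode("utf-8"))


def utf16_code_units(s: str) -> int:
    """Number of UTF-16 code units."""
    # utf-16-le emits no byte-order mark, so each code unit is exactly 2 bytes
    return len(s.encode("utf-16-le")) // 2


sample = "h\u00e9llo\U0001F600"  # "héllo" followed by one emoji
print(code_points(sample))       # 6
print(utf8_bytes(sample))        # 10
print(utf16_code_units(sample))  # 7
```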
locate
see #Find
Lowercase
| Definition | lowercase(string) returns string
|
|---|---|
| Description | Returns the string in lower case. |
| Format | Languages |
|---|---|
LCase(string)
|
VB |
lcase(string)
|
FreeBASIC |
lc(string)
|
Perl, Raku |
string.lc
|
Raku |
tolower(char)
|
C[34] |
std.string.toLower(string)
|
D |
transform(string.begin(), string.end(), result.begin(), ::tolower)[35]
|
C++[36] |
lowercase(string)
|
Object Pascal (Delphi) |
strtolower(string)
|
PHP |
lower(string)
|
Seed7 |
${string_param,,}
|
Bash |
echo "string" | tr 'A-Z' 'a-z'
|
Unix |
string.lower()
|
Python |
downcase(string)
|
Pick Basic |
string.downcase
|
Ruby[37] |
strings.ToLower(string)
|
Go |
(string-downcase string)
|
Scheme (R6RS), Common Lisp |
(lower-case string)
|
Clojure |
String.lowercase string
|
OCaml |
String.map Char.toLower string
|
Standard ML |
map Char.toLower string
|
Haskell |
string.toLowerCase()
|
Java, JavaScript |
to_lower(string)
|
Erlang |
string.ToLower()
|
VB .NET, C#, Windows PowerShell, F# |
string.lowercaseString
|
Objective-C (NSString * only), Swift (Foundation)
|
string.lower(string) or (string):lower()
|
Lua |
string asLowercase
|
Smalltalk |
LOWER(string)
|
SQL |
lowercase(string)
|
PL/I[8] |
ToLowerCase[string]
|
Mathematica |
«FUNCTION» LOWER-CASE(string)
|
COBOL |
string.toLower
|
Cobra |
string tolower string
|
Tcl |
string.to_lowercase()
|
Rust[38] |
// Example in C#
"Wiki means fast?".ToLower(); // "wiki means fast?"
; Example in Scheme
(use-modules (srfi srfi-13))
(string-downcase "Wiki means fast?") ; "wiki means fast?"
/* Example in C */
#include <ctype.h>
#include <stdio.h>
int main(void) {
    char s[] = "Wiki means fast?";
    for (size_t i = 0; i < sizeof(s) - 1; ++i) {
        // transform characters in place, one by one;
        // the cast avoids undefined behaviour for negative char values
        s[i] = (char) tolower((unsigned char) s[i]);
    }
    printf("%s\n", s); // "wiki means fast?"
    return 0;
}
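Unlike the in-place loop of the C example, languages with immutable strings return a new string and leave the original untouched. A minimal Python sketch, using only the built-in str.lower method (the variable names are illustrative):

```python
original = "Wiki means fast?"
lowered = original.lower()  # returns a new string; str objects are immutable

print(lowered)   # wiki means fast?
print(original)  # Wiki means fast?
```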
# Example in Raku
"Wiki means fast?".lc; # "wiki means fast?"
mid
see #substring
partition
| Definition | <string>.partition(separator) returns the sub-string before the separator; the separator; then the sub-string after the separator. |
|---|---|
| Description | Splits the given string by the separator and returns the three substrings that together make the original. |
| Format | Languages | Comments |
|---|---|---|
string.partition(separator)
|
Python, Ruby(1.9+) | |
lists:partition(pred, string)
|
Erlang | |
split /(separator)/, string, 2
|
Perl 5 | |
split separator, string, 2 or string.split( separator, 2 )
|
Raku | Separator does not have to be a regular expression |
# Examples in Python
"Spam eggs spam spam and ham".partition('spam') # ('Spam eggs ', 'spam', ' spam and ham')
"Spam eggs spam spam and ham".partition('X') # ('Spam eggs spam spam and ham', "", "")
# Examples in Perl 5 / Raku
split /(spam)/, 'Spam eggs spam spam and ham' ,2; # ('Spam eggs ', 'spam', ' spam and ham');
split /(X)/, 'Spam eggs spam spam and ham' ,2; # ('Spam eggs spam spam and ham');
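For languages that lack a built-in partition, the operation can be expressed with a find and two slices. The following Python sketch re-implements it under that assumption; the free function partition is a hypothetical stand-in written for illustration, since Python already provides str.partition.

```python
from typing import Tuple


def partition(s: str, sep: str) -> Tuple[str, str, str]:
    """Split s at the first occurrence of sep into (before, sep, after)."""
    if not sep:
        raise ValueError("empty separator")
    i = s.find(sep)  # index of the first occurrence, or -1
    if i == -1:
        return s, "", ""
    return s[:i], sep, s[i + len(sep):]


text = "Spam eggs spam spam and ham"
print(partition(text, "spam"))  # ('Spam eggs ', 'spam', ' spam and ham')
print(partition(text, "X"))     # ('Spam eggs spam spam and ham', '', '')
```

In both cases the three parts concatenate back to the original string, which is the property the description above relies on.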
replace
| Definition | replace(string, find, replace) returns string
|
|---|---|
| Description | Returns a string in which the occurrences of find are changed to replace. |
| Format | Languages |
|---|---|
changestr(find, string, replace)
|
Rexx |
std.string.replace(string, find, replace)
|
D |
Replace(string, find, replace)
|
VB |
replace(string, find, replace)
|
Seed7 |
change(string, find, replace)
|
Pick Basic |
string.Replace(find, replace)
|
C#, F#, VB .NET |
str_replace(find, replace, string)
|
PHP |
re:replace(string, find, replace, «{return, list}»)
|
Erlang |
string.replace(find, replace)
|
Cobra, Java (1.5+), Python, Rust[39] |
string.replaceAll(find_regex, replace)[40]
|
Java |
string.gsub(find, replace)
|
Ruby |
string =~ s/find_regex/replace/g[40]
|
Perl 5 |
string.subst(find, replace, :g)
|
Raku |
string.replace(find, replace, "g") [41] or string.replace(/find_regex/g, replace)[40]
|
JavaScript |
echo "string" | sed 's/find_regex/replace/g'[40]
|
Unix |
${string_param//find_pattern/replace}
|
Bash |
string.replace(find, replace) or string -replace find_regex, replace[40]
|
Windows PowerShell |
Str.global_replace (Str.regexp_string find) replace string
|
OCaml |
[string stringByReplacingOccurrencesOfString:find withString:replace]
|
Objective-C (NSString * only)
|
string.stringByReplacingOccurrencesOfString(find, withString:replace)
|
Swift (Foundation) |
string.gsub(string, find, replace) or (string):gsub(find, replace)
|
Lua |
string copyReplaceAll: find with: replace
|
Smalltalk (Squeak, Pharo) |
string map {find replace} string
|
Tcl |
StringReplace[string, find -> replace]
|
Mathematica |
strings.Replace(string, find, replace, -1)
|
Go |
INSPECT string REPLACING ALL/LEADING/FIRST find BY replace
|
COBOL |
find_regex ⎕R replace_regex ⊢ string
|
APL |
// Examples in C#
"effffff".Replace("f", "jump"); // returns "ejumpjumpjumpjumpjumpjump"
"blah".Replace("z", "y"); // returns "blah"
// Examples in Java
"effffff".replace("f", "jump"); // returns "ejumpjumpjumpjumpjumpjump"
"effffff".replaceAll("f*", "jump"); // returns "jumpejumpjump" (f* also matches the empty string at the start and at the end)
# Examples in Raku
"effffff".subst("f", "jump", :g); # returns "ejumpjumpjumpjumpjumpjump"
"blah".subst("z", "y", :g); # returns "blah"
' Examples in Visual Basic
Replace("effffff", "f", "jump") ' returns "ejumpjumpjumpjumpjumpjump"
Replace("blah", "z", "y") ' returns "blah"
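Several rows in the table take a literal search string while others take a regular expression, and the two can give different results for the same arguments. The Python sketch below contrasts them; the wrapper names replace_literal and replace_regex are hypothetical, and the inputs are illustrative assumptions.

```python
import re


def replace_literal(s: str, find: str, repl: str) -> str:
    """Replace every literal occurrence of find."""
    return s.replace(find, repl)


def replace_regex(s: str, pattern: str, repl: str) -> str:
    """Replace every match of the regular expression pattern."""
    return re.sub(pattern, repl, s)


print(replace_literal("effffff", "f", "jump"))  # ejumpjumpjumpjumpjumpjump
print(replace_regex("effffff", "f+", "jump"))   # ejump
print(replace_literal("a.b axb", "a.b", "X"))   # X axb
print(replace_regex("a.b axb", "a.b", "X"))     # X X
# escaping the pattern makes the regex form behave like the literal form
print(replace_regex("a.b axb", re.escape("a.b"), "X"))  # X axb
```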
# Examples in Windows PowerShell
"effffff" -replace "f", "jump" # returns "ejumpjumpjumpjumpjumpjump"
"effffff" -replace "f*", "jump" # returns "jumpejumpjump" (f* also matches the empty string at the start and at the end)
reverse
| Definition | reverse(string)
|
|---|---|
| Description | Reverses the order of the characters in the string. |
| Format | Languages |
|---|---|
reverse string
|
Perl 5, Haskell |
flip string or string.flip
|
Raku |
lists:reverse(string)
|
Erlang |
strrev(string)
|
PHP |
string[::-1]
|
Python |
(string-reverse string)
|
Scheme (SRFI 13) |
(reverse string)
|
Common Lisp |
string.reverse
|
Ruby, D (modifies string) |
new StringBuilder(string).reverse().toString()
|
Java |
std::reverse(string.begin(), string.end());
|
C++ (std::string only, modifies string)
|
StrReverse(string)
|
VB |
string.Reverse()
|
VB .NET, C# |
implode (rev (explode string))
|
Standard ML |
string.split("").reverse().join("")
|
JavaScript |
string.reverse(string) or (string):reverse()
|
Lua |
string reverse
|
Smalltalk |
StringReverse[string]
|
Mathematica |
reverse(string)
|
PL/I |
|
COBOL |
string.toCharArray.toList.reversed.join()
|
Cobra |
String(string.characters.reverse())
|
Swift (2.x) |
String(reverse(string))
|
Swift (1.2) |
string reverse string
|
Tcl |
⌽string
|
APL |
string.chars().rev().collect::<String>()
|
Rust[42] |
echo string | rev
|
Unix |
" Example in Smalltalk "
'hello' reversed " returns 'olleh' "
# Example in Perl 5
scalar reverse "hello" # returns "olleh" (in list context, reverse reverses the argument list instead)
# Example in Raku
"hello".flip # returns "olleh"
# Example in Python
"hello"[::-1] # returns "olleh"
; Example in Scheme
(use-modules (srfi srfi-13))
(string-reverse "hello") ; returns "olleh"
rfind
| Definition | rfind(string,substring) returns integer
|
|---|---|
| Description | Returns the position of the start of the last occurrence of substring in string. If the substring is not found most of these routines return an invalid index value – -1 where indexes are 0-based, 0 where they are 1-based – or some value to be interpreted as Boolean FALSE. |
| Related | instr |
| Format | Languages | If not found |
|---|---|---|
InStrRev(string,substring«,startpos»)
|
VB | returns 0 |
instrrev(string,substring«,startpos»)
|
FreeBASIC | returns 0 |
rindex(string,substring«,startpos»)
|
Perl 5 | returns −1 |
rindex(string,substring«,startpos») or string.rindex(substring«,startpos»)
|
Raku | returns Nil |
strrpos(string,substring«,startpos»)
|
PHP | returns FALSE |
string.rfind(substring«,startpos»)
|
C++ (STL) | returns std::string::npos |
std.string.rfind(string, substring)
|
D | returns −1 |
string.rfind(substring«,startpos«, endpos»»)
|
Python | returns −1 |
string.rindex(substring«,startpos«, endpos»»)
|
raises ValueError | |
rpos(string, substring«,startpos»)
|
Seed7 | returns 0 |
string.rindex(substring«,startpos»)
|
Ruby | returns nil |
strings.LastIndex(string, substring)
|
Go | returns −1 |
string.lastIndexOf(substring«,startpos»)
|
Java, JavaScript | returns −1 |
string.LastIndexOf(substring«,startpos«, charcount»»)
|
VB .NET, C#, Windows PowerShell, F# | returns −1 |
(search substring string :from-end t)
|
Common Lisp | returns NIL |
[string rangeOfString:substring options:NSBackwardsSearch].location
|
Objective-C (NSString * only)
|
returns NSNotFound |
Str.search_backward (Str.regexp_string substring) string (Str.length string - 1)
|
OCaml | raises Not_found |
string.match(string, '.*()'..substring) or string:match('.*()'..substring)
|
Lua | returns nil |
Ada.Strings.Unbounded.Index(Source => string, Pattern => substring, Going => Ada.Strings.Backward)
|
Ada | returns 0 |
string.lastIndexOf(substring«,startpos«, charcount»»)
|
Cobra | returns −1 |
string lastIndexOfString:substring
|
Smalltalk | returns 0 |
string last substring string startpos
|
Tcl | returns −1 |
|
APL | returns −1 |
string.rfind(substring)
|
Rust[43] | returns None |
; Examples in Common Lisp
(search "e" "Hello mate" :from-end t) ; returns 9
(search "z" "word" :from-end t) ; returns NIL
// Examples in C#
"Hello mate".LastIndexOf("e"); // returns 9
"Hello mate".LastIndexOf("e", 4); // returns 1
"word".LastIndexOf("z"); // returns -1
# Examples in Perl 5
rindex("Hello mate", "e"); # returns 9
rindex("Hello mate", "e", 4); # returns 1
rindex("word", "z"); # returns -1
# Examples in Raku
"Hello mate".rindex("e"); # returns 9
"Hello mate".rindex("e", 4); # returns 1
"word".rindex('z'); # returns Nil
' Examples in Visual Basic
InStrRev("Hello mate", "e") ' returns 10
InStrRev("Hello mate", "e", 5) ' returns 2
InStrRev("word", "z") ' returns 0
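The "If not found" column matters in practice because a sentinel such as −1 can be mistaken for a valid position. The Python sketch below shows the two built-in variants and one way to turn the sentinel into an explicit absence; last_index is a hypothetical helper, not a standard function. Note that Python's optional startpos and endpos delimit the slice that is searched, which differs from the C# overload shown above that searches backwards from a position.

```python
from typing import Optional


def last_index(s: str, sub: str) -> Optional[int]:
    """Return the 0-based index of the last occurrence of sub, or None."""
    i = s.rfind(sub)
    return None if i == -1 else i


print("Hello mate".rfind("e"))        # 9
print("Hello mate".rfind("e", 0, 5))  # 1  (searches only "Hello")
print("word".rfind("z"))              # -1
print(last_index("word", "z"))        # None

try:
    "word".rindex("z")
except ValueError:
    print("rindex raised ValueError")
```

In Python −1 is itself a valid index ("word"[-1] is "d"), so an unchecked rfind result can silently point at the last character.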
right
| Definition | right(string,n) returns string
|
|---|---|
| Description | Returns the rightmost n characters of a string. If n is greater than the length of the string, most implementations return the whole string (exceptions exist – see code examples). |
| Format | Languages |
|---|---|
|
Ada |
Right(string,n)
|
VB |
RIGHT$(string,n)
|
BASIC |
right(string,n)
|
FreeBASIC, Ingres, Pick Basic |
strcpy(string2, string+strlen(string)-n) (n must not be greater than the length of string)
|
C |
string.Substring(string.Length-n)
|
C# |
string[len(string)-n:]
|
Go |
string.substring(string.length()-n)
|
Java |
string.slice(-n)
|
JavaScript[44] |
right(string,n «,padchar»)
|
Rexx, Erlang |
substr(string,-n)
|
Perl 5, PHP |
substr(string,*-n) or string.substr(*-n)
|
Raku |
string[-n:]
|
Cobra, Python |
${string_param: -n} (note the space after the colon)
|
Bash |
string[n]
|
Pick Basic |
(string-take-right string n)
|
Scheme (SRFI 13) |
string[-n..-1]
|
Ruby |
string[$-n .. $]
|
D[45] |
String.sub string (String.length string - n) n
|
OCaml[28] |
string.sub(string, -n) or (string):sub(-n)
|
Lua |
string last: n
|
Smalltalk (Squeak, Pharo) |
StringTake[string, -n]
|
Mathematica[29] |
string («FUNCTION» LENGTH(string) - n + 1:n)
|
COBOL |
¯n↑string.
|
APL |
string[string.len()-n..] or string.get(string.len()-n..)
|
Rust[30] |
// Examples in Java; extract rightmost 4 characters
String str = "CarDoor";
str.substring(str.length()-4); // returns 'Door'
# Examples in Raku
"abcde".substr(*-3); # returns "cde"
"abcde".substr(*-8); # 'out of range' error
/* Examples in Rexx */
right("abcde", 3) /* returns "cde" */
right("abcde", 8) /* returns "   abcde" */
right("abcde", 8, "*") /* returns "***abcde" */
; Examples in Scheme
(use-modules (srfi srfi-13))
(string-take-right "abcde" 3) ; returns "cde"
(string-take-right "abcde" 8) ; error
' Examples in Visual Basic
Right("sandroguidi", 3) ' returns "idi"
Right("sandroguidi", 100) ' returns "sandroguidi"
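The negative-slice form listed for Python has an edge case at n = 0 that the clamping description above does not cover. A minimal sketch of a guarded version follows; the helper name right is hypothetical.

```python
def right(s: str, n: int) -> str:
    """Return the rightmost n characters of s, or all of s if n >= len(s)."""
    if n < 0:
        raise ValueError("n must be non-negative")
    # s[-0:] is the same as s[0:], i.e. the whole string,
    # so n == 0 needs its own branch
    return s[-n:] if n > 0 else ""


print(right("sandroguidi", 3))        # idi
print(right("sandroguidi", 100))      # sandroguidi
print(repr(right("sandroguidi", 0)))  # ''
```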
rpartition
| Definition | <string>.rpartition(separator) Searches for the separator from right-to-left within the string then returns the sub-string before the separator; the separator; then the sub-string after the separator. |
|---|---|
| Description | Splits the given string by the right-most separator and returns the three substrings that together make the original. |
| Format | Languages |
|---|---|
string.rpartition(separator)
|
Python, Ruby |
# Examples in Python
"Spam eggs spam spam and ham".rpartition('spam') ### ('Spam eggs spam ', 'spam', ' and ham')
"Spam eggs spam spam and ham".rpartition('X') ### ("", "", 'Spam eggs spam spam and ham')
slice
see #substring
split
| Definition | <string>.split(separator[, limit]) splits a string on separator, optionally only up to a limited number of substrings |
|---|---|
| Description | Splits the given string by occurrences of the separator (itself a string) and returns a list (or array) of the substrings. If limit is given, after limit – 1 separators have been read, the rest of the string is made into the last substring, regardless of whether it has any separators in it. The Scheme and Erlang implementations are similar but differ in several ways. JavaScript also differs: its limit truncates the result, so the rest of the string is discarded rather than placed in the last element. The Cobra implementation defaults to splitting on whitespace. Opposite of join. |
| Format | Languages |
|---|---|
split(/separator/, string«, limit»)
|
Perl 5 |
split(separator, string«, limit») or string.split(separator, «limit»)
|
Raku |
explode(separator, string«, limit»)
|
PHP |
string.split(separator«, limit-1»)
|
Python |
string.split(separator«, limit»)
|
JavaScript, Java, Ruby |
string:tokens(string, sepchars)
|
Erlang |
strings.Split(string, separator) or strings.SplitN(string, separator, limit)
|
Go |
(string-tokenize string« charset« start« end»»»)
|
Scheme (SRFI 13) |
Split(string, sepchars«, limit»)
|
VB |
string.Split(sepchars«, limit«, options»»)
|
VB .NET, C#, F# |
string -split separator«, limit«, options»»
|
Windows PowerShell |
Str.split (Str.regexp_string separator) string
|
OCaml |
std.string.split(string, separator)
|
D |
[string componentsSeparatedByString:separator]
|
Objective-C (NSString * only)
|
string.componentsSeparatedByString(separator)
|
Swift (Foundation) |
TStringList.Delimiter, TStringList.DelimitedText
|
Object Pascal |
StringSplit[string, separator«, limit»]
|
Mathematica |
string.split«(sepchars«, limit«, options»»)»
|
Cobra |
split string separator
|
Tcl |
(separator≠string)⊂string in APL2; separator(≠⊆⊢)string in Dyalog APL 16.0
|
APL |
string.split(separator)
|
Rust[46] |
// Example in C#
"abc,defgh,ijk".Split(','); // {"abc", "defgh", "ijk"}
"abc,defgh;ijk".Split(',', ';'); // {"abc", "defgh", "ijk"}
% Example in Erlang
string:tokens("abc;defgh;ijk", ";"). % ["abc", "defgh", "ijk"]
// Examples in Java
"abc,defgh,ijk".split(","); // {"abc", "defgh", "ijk"}
"abc,defgh;ijk".split(",|;"); // {"abc", "defgh", "ijk"}
{ Example in Pascal }
var
lStrings: TStringList;
lStr: string;
begin
lStrings := TStringList.Create;
lStrings.Delimiter := ',';
lStrings.DelimitedText := 'abc,defgh,ijk';
lStr := lStrings.Strings[0]; // 'abc'
lStr := lStrings.Strings[1]; // 'defgh'
lStr := lStrings.Strings[2]; // 'ijk'
end;
# Examples in Perl 5
split(/spam/, 'Spam eggs spam spam and ham'); # ('Spam eggs ', ' ', ' and ham')
split(/X/, 'Spam eggs spam spam and ham'); # ('Spam eggs spam spam and ham')
# Examples in Raku
'Spam eggs spam spam and ham'.split(/spam/); # (Spam eggs and ham)
split(/X/, 'Spam eggs spam spam and ham'); # (Spam eggs spam spam and ham)
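The two limit conventions described above (keep the remainder in the last element, or discard it) can be compared directly. In the Python sketch below, both helper names are hypothetical; split_truncate only imitates the JavaScript-style result with a list slice and is not how JavaScript implements it.

```python
from typing import List, Optional


def split_keep_rest(s: str, sep: str, limit: Optional[int] = None) -> List[str]:
    """At most limit pieces; the unsplit remainder becomes the last piece."""
    if limit is None:
        return s.split(sep)
    if limit < 1:
        raise ValueError("limit must be at least 1")
    # Python's second argument counts splits, not pieces, hence limit - 1
    return s.split(sep, limit - 1)


def split_truncate(s: str, sep: str, limit: int) -> List[str]:
    """At most limit pieces; anything after them is discarded."""
    return s.split(sep)[:limit]


print(split_keep_rest("a,b,c,d", ",", 2))  # ['a', 'b,c,d']
print(split_truncate("a,b,c,d", ",", 2))   # ['a', 'b']
print(",".join("a,b,c,d".split(",")))      # a,b,c,d  (join undoes split)
```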
sprintf
see #Format
strip
see #trim
strcmp
substring
| Definition | substring(string, startpos, endpos) returns string; substr(string, startpos, numChars) returns string
|
|---|---|
| Description | Returns the substring of string that starts at startpos and ends just before endpos, or that starts at startpos and has length numChars. The resulting string is truncated if there are fewer than numChars characters beyond the starting point. endpos represents the index after the last character in the substring. Note that for variable-length encodings such as UTF-8, UTF-16 or Shift-JIS, it can be necessary to remove string positions at the end, in order to avoid invalid strings. |
| Format | Languages |
|---|---|
string[startpos:endpos]
|
ALGOL 68 (changes base index) |
string (startpos .. endpos)
|
Ada (changes base index) |
Mid(string, startpos, numChars)
|
VB |
mid(string, startpos, numChars)
|
FreeBASIC |
string[startpos+(⍳numChars)-~⎕IO]
|
APL |
MID$(string, startpos, numChars)
|
BASIC |
substr(string, startpos, numChars)
|
AWK (changes string), Perl 5,[47][48] PHP[47][48] |
substr(string, startpos, numChars) or string.substr(startpos, numChars)
|
Raku[49][50] |
substr(string, startpos «,numChars, padChar»)
|
Rexx |
string[startpos:endpos]
|
Cobra, Python,[47][51] Go |
string[startpos, numChars]
|
Pick Basic |
string[startpos, numChars] or string[startpos .. endpos-1] or string[startpos ... endpos]
|
Ruby[47][51] |
string[startpos .. endpos] or string[startpos len numChars]
|
Seed7 |
string.slice(startpos«, endpos»)
|
JavaScript[47][51] |
string.substr(startpos«, numChars»)
|
C++ (STL), JavaScript |
string.Substring(startpos, numChars)
|
VB .NET, C#, Windows PowerShell, F# |
string.substring(startpos«, endpos»)
|
Java, JavaScript |
copy(string, startpos, numChars)
|
Object Pascal (Delphi) |
(substring string startpos endpos)
|
Scheme |
(subseq string startpos endpos)
|
Common Lisp |
(subseq string startpos endpos)
|
ISLISP |
String.sub string startpos numChars
|
OCaml |
substring (string, startpos, numChars)
|
Standard ML |
string:sub_string(string, startpos, endpos) or string:substr(string, startpos, numChars)
|
Erlang |
strncpy(result, string + startpos, numChars);
|
C |
string[startpos .. endpos+1]
|
D |
take numChars $ drop startpos string
|
Haskell |
[string substringWithRange:NSMakeRange(startpos, numChars)]
|
Objective-C (NSString * only)
|
string.[startpos..endpos]
|
F# |
string.sub(string, startpos, endpos) or (string):sub(startpos, endpos)
|
Lua[47][51] |
string copyFrom: startpos to: endpos
|
Smalltalk |
string(startpos:endpos)
|
Fortran |
SUBSTRING(string FROM startpos «FOR numChars»)
|
SQL |
StringTake[string, {startpos, endpos}]
|
Mathematica[47][51] |
string (startpos:numChars)
|
COBOL |
${string_param:startpos:numChars}
|
Bash |
string range string startpos endpos
|
Tcl |
string[startpos..endpos] or string.get(startpos..endpos)
|
Rust[30] |
// Examples in C#
"abc".Substring(1, 1); // returns "b"
"abc".Substring(1, 2); // returns "bc"
"abc".Substring(1, 6); // error
;; Examples in Common Lisp
(subseq "abc" 1 2) ; returns "b"
(subseq "abc" 2) ; returns "c"
% Examples in Erlang
string:substr("abc", 2, 1). % returns "b"
string:substr("abc", 2). % returns "bc"
# Examples in Perl 5
substr("abc", 1, 1); # returns "b"
substr("abc", 1); # returns "bc"
# Examples in Raku
"abc".substr(1, 1); # returns "b"
"abc".substr(1); # returns "bc"
# Examples in Python
"abc"[1:2] # returns "b"
"abc"[1:3] # returns "bc"
/* Examples in Rexx */
substr("abc", 2, 1) /* returns "b" */
substr("abc", 2) /* returns "bc" */
substr("abc", 2, 6) /* returns "bc    " */
substr("abc", 2, 6, "*") /* returns "bc****" */
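The two signatures in the definition, (startpos, endpos) and (startpos, numChars), are easy to confuse because both take two integers. The Python sketch below implements each on top of slicing, assuming 0-based positions and an exclusive endpos as in the description; the helper names and the num_chars parameter are hypothetical.

```python
def substring(s: str, startpos: int, endpos: int) -> str:
    """Characters from startpos up to, but not including, endpos."""
    if startpos < 0 or endpos < startpos:
        raise ValueError("need 0 <= startpos <= endpos")
    return s[startpos:endpos]


def substr(s: str, startpos: int, num_chars: int) -> str:
    """At most num_chars characters beginning at startpos."""
    if startpos < 0 or num_chars < 0:
        raise ValueError("startpos and num_chars must be non-negative")
    return s[startpos:startpos + num_chars]  # truncated at the end of s


print(substring("abc", 1, 2))  # b
print(substr("abc", 1, 2))     # bc
print(substr("abc", 1, 6))     # bc  (truncated rather than an error)
```

Truncation here matches the description; the C# example above shows the alternative of raising an error.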
Uppercase
| Definition | uppercase(string) returns string
|
|---|---|
| Description | Returns the string in upper case. |
| Format | Languages |
|---|---|
UCase(string)
|
VB |
ucase(string)
|
FreeBASIC |
toupper(string)
|
AWK (changes string) |
uc(string)
|
Perl, Raku |
string.uc
|
Raku |
toupper(char)
|
C (operates on one character) |
|
C (string / char array) |
std.string.toUpper(string)
|
D |
transform(string.begin(), string.end(), result.begin(), toupper)[35]
|
C++[52] |
uppercase(string)
|
Object Pascal (Delphi) |
upcase(char)
|
Object Pascal (Delphi) (operates on one character) |
strtoupper(string)
|
PHP |
upper(string)
|
Seed7 |
${string_param^^} (mnemonic: ^ is pointing up)
|
Bash |
echo "string" | tr 'a-z' 'A-Z'
|
Unix |
translate(string) or UPPER variables or PARSE UPPER VAR SrcVar DstVar
|
Rexx |
string.upper()
|
Python |
upcase(string)
|
Pick Basic |
string.upcase
|
Ruby[37] |
strings.ToUpper(string)
|
Go |
(string-upcase string)
|
Scheme, Common Lisp |
String.uppercase string
|
OCaml |
String.map Char.toUpper string
|
Standard ML |
map Char.toUpper string
|
Haskell |
string.toUpperCase()
|
Java, JavaScript |
string.uppercase()
|
Kotlin[53] |
to_upper(string)
|
Erlang |
string.ToUpper()
|
VB .NET, C#, Windows PowerShell, F# |
string.uppercaseString
|
Objective-C (NSString * only), Swift (Foundation)
|
string.upper(string) or (string):upper()
|
Lua |
string asUppercase
|
Smalltalk |
UPPER(string)
|
SQL |
ToUpperCase[string]
|
Mathematica |
|
COBOL |
string.toUpper
|
Cobra |
string toupper string
|
Tcl |
string.to_uppercase()
|
Rust[54] |
// Example in C#
"Wiki means fast?".ToUpper(); // "WIKI MEANS FAST?"
# Example in Perl 5
uc("Wiki means fast?"); # "WIKI MEANS FAST?"
# Example in Raku
uc("Wiki means fast?"); # "WIKI MEANS FAST?"
"Wiki means fast?".uc; # "WIKI MEANS FAST?"
/* Example in Rexx */
translate("Wiki means fast?") /* "WIKI MEANS FAST?" */
/* Example #2 */
A='This is an example.'
UPPER A /* "THIS IS AN EXAMPLE." */
/* Example #3 */
A='upper using Parse Upper Var.'
PARSE UPPER VAR A Z /* Z="UPPER USING PARSE UPPER VAR." */
; Example in Scheme
(use-modules (srfi srfi-13))
(string-upcase "Wiki means fast?") ; "WIKI MEANS FAST?"
' Example in Visual Basic
UCase("Wiki means fast?") ' "WIKI MEANS FAST?"
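Upper-casing is not always a one-to-one character mapping, which matters when code assumes the result has the same length as the input. A short Python sketch using the built-in str.upper method (the sample word is an illustrative assumption):

```python
text = "Wiki means fast?"
print(text.upper())  # WIKI MEANS FAST?

word = "stra\u00dfe"  # "straße", containing the German sharp s
upper = word.upper()
print(upper)                  # STRASSE
print(len(word), len(upper))  # 6 7
```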
trim
trim or strip is used to remove whitespace from the beginning, end, or both beginning and end, of a string.
| Example usage | Languages |
|---|---|
String.Trim([chars])
|
C#, VB.NET, Windows PowerShell |
string.strip();
|
D |
(.trim string)
|
Clojure |
sequence [ predicate? ] trim
|
Factor |
|
Common Lisp |
(string-trim string)
|
Scheme |
string.trim()
|
Java, JavaScript (1.8.1+, Firefox 3.5+), Rust[55] |
Trim(String)
|
Pascal,[56] QBasic, Visual Basic, Delphi |
string.strip()
|
Python |
strings.Trim(string, chars)
|
Go |
LTRIM(RTRIM(String))
|
Oracle SQL, T-SQL |
strip(string [,option, char])
|
REXX |
string:strip(string [,option, char])
|
Erlang |
string.strip or string.lstrip or string.rstrip
|
Ruby |
string.trim
|
Raku |
trim(string)
|
PHP, Raku |
[string stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]]
|
Objective-C using Cocoa |
string withBlanksTrimmed or string withoutSpaces or string withoutSeparators
|
Smalltalk (Squeak, Pharo), Smalltalk |
strip(string)
|
SAS |
string trim $string
|
Tcl |
TRIM(string) or TRIM(ADJUSTL(string))
|
Fortran |
TRIM(string)
|
SQL |
TRIM(string) or LTrim(string) or RTrim(String)
|
ColdFusion |
String.trim string
|
OCaml 4+ |
Other languages
In languages without a built-in trim function, it is usually simple to create a custom function which accomplishes the same task.
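As an illustration of how small such a custom function can be, the following Python sketch trims with two indices and a single slice. It is written as if no built-in existed (Python does provide str.strip); the name trim and the is_space parameter are hypothetical.

```python
from typing import Callable


def trim(s: str, is_space: Callable[[str], bool] = str.isspace) -> str:
    """Return s without leading and trailing whitespace."""
    start, end = 0, len(s)
    while start < end and is_space(s[start]):    # skip leading whitespace
        start += 1
    while end > start and is_space(s[end - 1]):  # skip trailing whitespace
        end -= 1
    return s[start:end]


print(repr(trim("  hi there \t\n")))  # 'hi there'
print(repr(trim("   ")))              # ''
```

Scanning from both ends and slicing once avoids copying the string more than once, which is the same idea as the combined ltrim and rtrim approach used in C.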
APL
APL can use regular expressions directly:
Trim←'^ +| +$'⎕R''
Alternatively, a functional approach combining Boolean masks that filter away leading and trailing spaces:
Trim←{⍵/⍨(∨\∧∘⌽∨\∘⌽)' '≠⍵}
Or reverse and remove leading spaces, twice:
Trim←{(∨\' '≠⍵)/⍵}∘⌽⍣2
AWK
In AWK, one can use regular expressions to trim a variable v in place:
gsub(/^[ \t]+/, "", v)           # ltrim: remove leading blanks
gsub(/[ \t]+$/, "", v)           # rtrim: remove trailing blanks
gsub(/^[ \t]+|[ \t]+$/, "", v)   # trim: remove both
or:
function ltrim(s) { sub(/^[ \t]+/, "", s); return s }
function rtrim(s) { sub(/[ \t]+$/, "", s); return s }
function trim(s) { return rtrim(ltrim(s)); }
C/C++
There is no standard trim function in C or C++. Most of the available string libraries[57] for C contain code which implements trimming, or functions that significantly ease an efficient implementation. The function has also often been called EatWhitespace in some non-standard C libraries.
In C, programmers often combine ltrim and rtrim to implement trim:
#include <ctype.h>
#include <string.h>
void rtrim(char* str) {
    size_t n = strlen(str);
    /* step back over trailing whitespace without moving before str */
    while (n > 0 && isspace((unsigned char) str[n - 1])) {
        n--;
    }
    str[n] = '\0';
}
void ltrim(char* str) {
    size_t n = 0;
    /* count leading whitespace, then shift the rest (and the terminator) left */
    while (str[n] && isspace((unsigned char) str[n])) {
        n++;
    }
    memmove(str, str + n, strlen(str) - n + 1);
}
void trim(char* str) {
    rtrim(str);
    ltrim(str);
}
The open source C++ library Boost has several trim variants, including a standard one:[58]
#include <boost/algorithm/string/trim.hpp>
trimmed = boost::algorithm::trim_copy("string");
Boost's function named simply trim modifies the input sequence in place and returns no result.
Another open source C++ library, Qt, has several trim variants, including a standard one:[59]
#include <QString>
trimmed = s.trimmed();
The Linux kernel also includes a strip function, strstrip(), since 2.6.18-rc1, which trims the string "in place". Since 2.6.33-rc1, the kernel uses strim() instead of strstrip() to avoid false warnings.[60]
Haskell
A trim algorithm in Haskell:
import Data.Char (isSpace)
trim :: String -> String
trim = f . f
where f = reverse . dropWhile isSpace
This may be interpreted as follows: f drops the leading whitespace and reverses the string; f is then applied again to its own output. Note that the type signature (the second line) is optional.
J
The trim algorithm in J is a functional description:
trim =. #~ [: (+./\ *. +./\.) ' '&~:
That is: filter (#~) for non-space characters (' '&~:) between leading (+./\) and (*.) trailing (+./\.) spaces.
JavaScript
There is a built-in trim function in JavaScript 1.8.1 (Firefox 3.5 and later), and the ECMAScript 5 standard. In earlier versions it can be added to the String object's prototype as follows:
String.prototype.trim = function() {
return this.replace(/^\s+/g, "").replace(/\s+$/g, "");
};
Perl
Perl 5 has no built-in trim function. However, the functionality is commonly achieved using regular expressions.
Example:
$string =~ s/^\s+//; # remove leading whitespace
$string =~ s/\s+$//; # remove trailing whitespace
or:
$string =~ s/^\s+|\s+$//g ; # remove both leading and trailing whitespace
These examples modify the value of the original variable $string.
Also available for Perl is StripLTSpace in String::Strip from CPAN.
There are, however, two functions that are commonly used to strip whitespace from the end of strings, chomp and chop:
- chop removes the last character from a string and returns it.
- chomp removes the trailing newline character(s) from a string if present. (What constitutes a newline is $INPUT_RECORD_SEPARATOR dependent).
In Raku, the sister language of Perl, strings have a trim method.
Example:
$string = $string.trim; # remove leading and trailing whitespace
$string .= trim; # same thing
Tcl
The Tcl string command has three relevant subcommands: trim, trimright and trimleft. For each of those commands, an additional argument may be specified: a string that represents a set of characters to remove—the default is whitespace (space, tab, newline, carriage return).
Example of trimming vowels:
set string onomatopoeia
set trimmed [string trim $string aeiou] ;# result is nomatop
set r_trimmed [string trimright $string aeiou] ;# result is onomatop
set l_trimmed [string trimleft $string aeiou] ;# result is nomatopoeia
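Trimming by a set of characters rather than by whitespace is available in some other languages as well. For comparison, a Python sketch of the same vowel example using the optional argument of the built-in strip, rstrip and lstrip methods (the variable name is illustrative):

```python
word = "onomatopoeia"

print(word.strip("aeiou"))   # nomatop
print(word.rstrip("aeiou"))  # onomatop
print(word.lstrip("aeiou"))  # nomatopoeia
```

As in Tcl, the argument is treated as a set of characters, not as a prefix or suffix to match as a whole.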
XSLT
XSLT includes the function normalize-space(string) which strips leading and trailing whitespace, in addition to replacing any whitespace sequence (including line breaks) with a single space.
Example:
<xsl:variable name='trimmed'>
<xsl:value-of select='normalize-space(string)'/>
</xsl:variable>
XSLT 2.0 includes regular expressions, providing another mechanism to perform string trimming.
Another XSLT technique for trimming is to utilize the XPath 2.0 substring() function.
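The behaviour of normalize-space described above goes beyond plain trimming, since it also collapses interior whitespace. A rough Python approximation is sketched below; the function name normalize_space is hypothetical, and the sketch assumes that whitespace means the four XML whitespace characters (space, tab, carriage return, line feed).

```python
import re

# assumption: only the four XML whitespace characters count as whitespace
_XML_WHITESPACE = re.compile(r"[ \t\r\n]+")


def normalize_space(s: str) -> str:
    """Collapse whitespace runs to one space and strip both ends."""
    collapsed = _XML_WHITESPACE.sub(" ", s)
    return collapsed.strip(" ")


print(repr(normalize_space("  a \n\t b  ")))  # 'a b'
print(repr(normalize_space("\n\n")))          # ''
```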
References
- ^ a b c d e the index can be negative, which then indicates the number of places before the end of the string.
- ^ In Rust, the str::chars method iterates over code points and the std::iter::Iterator::nth method on iterators returns the zero-indexed nth value from the iterator, or None.
- ^ the index can not be negative, use *-N where N indicate the number of places before the end of the string.
- ^ In C++, the overloaded operator<=> method on a string returns a std::strong_ordering object (otherwise std::weak_ordering): less, equal (same as equivalent), or greater.
- ^ returns LESS, EQUAL, or GREATER
- ^ returns LT, EQ, or GT
- ^ returns .TRUE. or .FALSE.. These functions are based on the ASCII collating sequence.
- ^ a b IBM extension.
- ^ In Rust, the Ord::cmp method on a string returns an Ordering: Less, Equal, or Greater.
- ^ The original REXX used ¬ for Logical Not, ANSI Rexx uses \, some implementations accept ~ or ^, and non-EBCDIC implementations vary as to whether ¬ is at code point AA or AC.
- ^ The original PL/I used ¬ for Logical Not, some implementations expect ^, and non-EBCDIC implementations vary as to whether ¬ is at code point AA or AC.
- ^ a b c d e f In Rust, the operators == and != and the methods eq, ne are implemented by the PartialEq trait, and the operators <, >, <=, >= and the methods lt, gt, le, ge are implemented by the PartialOrd trait.
- ^ The operators use the compiler's default collating sequence.
- ^ modifies string1, which must have enough space to store the result
- ^ In Rust, the + operator is implemented by the Add trait.
- ^ See the str::contains method.
- ^ See the std::basic_string::contains method.
- ^ a b startpos is IBM extension.
- ^ a b See the str::find method.
- ^ startpos is IBM extension.
- ^ "scan in Fortran Wiki". Fortranwiki.org. 2009-04-30. Retrieved 2013-08-18.
- ^ "verify in Fortran Wiki". Fortranwiki.org. 2012-05-03. Retrieved 2013-08-18.
- ^ formatstring must be a fixed literal at compile time for it to have the correct type.
- ^ See std::format, which is imported by the Rust prelude so that it can be used under the name format.
- ^ See the slice::join method.
- ^ if n is larger than the length of the string, then in Debug mode ArrayRangeException is thrown, in Release mode, the behaviour is unspecified.
- ^ if n is larger than the length of the string, Java will throw an IndexOutOfBoundsException
- ^ a b if n is larger than length of string, raises Invalid_argument
- ^ a b if n is larger than length of string, throw the message "StringTake::take:"
- ^ a b c In Rust, strings are indexed in terms of byte offsets and there is a runtime panic if the index is out of bounds or if it would result in invalid UTF-8. A &str (string reference) can be indexed by various types of ranges, including Range (0..n), RangeFrom (n..), and RangeTo (..n) because they all implement the SliceIndex trait with str being the type being indexed. The str::get method is the non-panicking way to index. It returns None in the cases in which indexing would panic.
- ^ Ruby lacks Unicode support
- ^ See the str::len method.
- ^ In Rust, the str::chars method iterates over code points and the std::iter::Iterator::count method on iterators consumes the iterator and returns the total number of elements in the iterator.
- ^ operates on one character
- ^ a b The transform function exists in the std:: namespace. You must include the <algorithm> header file to use it. The tolower and toupper functions are in the global namespace, obtained by the <ctype.h> header file. The std::tolower and std::toupper names are overloaded and cannot be passed to std::transform without a cast to resolve a function overloading ambiguity, e.g. std::transform(string.begin(), string.end(), result.begin(), (int (*)(int))std::tolower);
- ^ std::string only, result is stored in string result which is at least as long as string, and may or may not be string itself
- ^ a b only ASCII characters as Ruby lacks Unicode support
- ^ See the str::to_lowercase method.
- ^ See the str::replace method.
- ^ a b c d e The "find" string in this construct is interpreted as a regular expression. Certain characters have special meaning in regular expressions. If you want to find a string literally, you need to quote the special characters.
- ^ third parameter is non-standard
- ^ In Rust, the str::chars method iterates over code points, the std::iter::Iterator::rev method on reversible iterators (std::iter::DoubleEndedIterator) creates a reversed iterator, and the std::iter::Iterator::collect method consumes the iterator and creates a collection (which here is specified as a String with the turbofish syntax) from the iterator's elements.
- ^ See the str::rfind method.
- ^ "Annotated ES5". Es5.github.com. Archived from the original on 2013-01-28. Retrieved 2013-08-18.
- ^ if n is larger than length of string, then in Debug mode ArrayRangeException is thrown, and unspecified behaviour in Release mode
- ^ See the str::split and str::rsplit methods.
- ^ a b c d e f g startpos can be negative, which indicates to start that number of places before the end of the string.
- ^ a b numChars can be negative, which indicates to end that number of places before the end of the string.
- ^ startpos can not be negative, use * - startpos to indicate to start that number of places before the end of the string.
- ^ numChars can not be negative, use * - numChars to indicate to end that number of places before the end of the string.
- ^ a b c d e endpos can be negative, which indicates to end that number of places before the end of the string.
- ^ std::string only, result is stored in string result which is at least as long as string, and may or may not be string itself
- ^ "uppercase - Kotlin Programming Language". Kotlin. Retrieved 9 November 2024.
- ^ In Rust, the str::to_uppercase method returns a newly allocated String with any lowercase characters changed to uppercase ones following the Unicode rules.
- ^ In Rust, the str::trim method returns a reference to the original &str.
- ^ "Trim – GNU Pascal priručnik". Gnu-pascal.de. Retrieved 2013-08-24.
- ^ "String library comparison". And.org. Retrieved 2013-08-24.
- ^ "Usage – 1.54.0". Boost.org. 2013-05-22. Retrieved 2013-08-24.
- ^ [1] Archived August 2, 2009, at the Wayback Machine
- ^ dankamongmen. "sprezzos-kernel-packaging/changelog at master · dankamongmen/sprezzos-kernel-packaging · GitHub". Github.com. Retrieved 2016-05-29.
Introduction
Overview of String Handling
In programming languages, strings are defined as finite sequences of characters, serving as a fundamental data type for representing textual data such as words, sentences, or symbols.[5] Unicode support, which standardizes the encoding of characters from diverse writing systems worldwide, exhibits variations across languages to balance compatibility, efficiency, and internationalization needs. In C, strings are commonly managed as null-terminated byte arrays, often using UTF-8 encoding in modern systems, a variable-width scheme that extends ASCII while preserving backward compatibility for single-byte characters. By contrast, Java offers native Unicode integration through its String class, internally employing UTF-16 encoding to natively handle a broader spectrum of code points without requiring external libraries.
A key distinction in string design lies between immutable and mutable implementations, influencing safety, concurrency, and efficiency. Immutable strings, prevalent in languages like Java and Python, cannot be altered after creation; any modification, such as appending text, produces a new string object, which enhances thread safety by preventing unintended changes during shared access.[6] Mutable strings, as seen in C++'s std::string, permit in-place alterations to individual characters or sections, offering greater flexibility for performance-critical applications where frequent edits occur without the overhead of allocation.[7]
The evolution of string handling traces back to the 1970s with C's introduction of null-terminated character arrays, a lightweight yet manual approach that delimited strings via a trailing null byte (\0) to simplify parsing while exposing programmers to risks like buffer overflows. This paradigm persisted into the 1980s and 1990s but gave way to more robust models in object-oriented languages; Java (1995) and C# (2000) elevated strings to first-class objects with encapsulated methods for operations, integrating Unicode natively and automating memory management to mitigate low-level errors.
String handling presents persistent challenges, particularly around encoding inconsistencies that can corrupt data (such as interpreting UTF-8 bytes as ISO-8859-1, resulting in mojibake or injection vulnerabilities) and performance trade-offs from mutability choices.[8] Immutable designs reduce mutation-related bugs in multithreaded code but may degrade efficiency through repeated allocations for common tasks like concatenation, whereas mutable variants enable optimized in-place updates at the cost of heightened aliasing risks.[9]
Languages and Paradigms Covered
This comparison encompasses a diverse set of programming languages selected based on their current popularity, historical influence, and ability to represent key paradigms in string handling, ensuring a balanced view of variations in string functions. Languages like Python, C, C++, Java, and JavaScript are included due to their high rankings in the TIOBE Programming Community Index for November 2025, where Python holds the top position at 23.37%, followed by C++ at 10.03% and C at 8.89%, reflecting their widespread adoption in general-purpose, systems, and web development.[10] Perl, now in the top 10, is featured for its enduring role in text processing. To capture paradigm diversity beyond mainstream usage, Haskell (around 28th at 0.8%), APL/J, AWK, and Tcl are incorporated; these lower-ranked languages (APL/J, AWK, and Tcl outside the top 50 per TIOBE metrics, each <0.1%) exemplify functional, array-oriented, domain-specific text processing, and pure scripting approaches, respectively, without delving into ultra-low-level options like assembly for conciseness.[10] This selection prioritizes coverage of imperative, object-oriented, functional, and scripting paradigms to illustrate how foundational design choices affect string representation and operations.
In the imperative and procedural paradigm, C serves as a cornerstone, where strings are implemented as null-terminated arrays of characters (char*), managed via standard library functions like those in <string.h> for basic operations, emphasizing manual memory control. C++, a multi-paradigm extension of C, augments this with the std::string class.
Inspection Functions
Determining String Length
Determining the length of a string is a fundamental operation in programming languages, providing the number of characters or units in a string data structure. This function enables developers to validate input, allocate memory, or iterate over string contents efficiently. Across languages, implementations vary in syntax, performance, and handling of Unicode, reflecting differences in how strings are represented internally, such as immutable sequences in high-level languages versus null-terminated byte arrays in low-level ones.[12][13]
Common variants include built-in functions or methods tailored to each language's paradigm. In Python, the len() built-in function returns the length of a string as the number of Unicode code points. For example:
len("hello") # Returns 5
In Java, the length() method on the String class returns the count of 16-bit Unicode code units, invoked as an instance method. For example:
"hello".length() // Returns 5
No import is required, because String is part of the core java.lang package.[13] In C, the strlen() function from the <string.h> header computes the number of bytes before the null terminator. For example:
#include <string.h>
strlen("hello"); // Returns 5
The string must be terminated by a null byte, \0. Specialized languages like APL use the rho operator ⍴ (shape function) to obtain the length of a vector, including strings treated as character vectors; for instance, ⍴'hello' yields 5.[14] Similarly, AWK's length() function returns the number of characters in a string (multibyte-aware in gawk) or the length of the current record if no argument is provided; length("hello") returns 5.[15]
Edge cases, particularly with empty or null strings, reveal implementation differences that can lead to errors if unhandled. An empty string returns 0 in Python (len("") == 0), Java ("".length() == 0), C (strlen("") == 0), APL (⍴'' ≡ 0), and AWK (length("") == 0 or length of unset variable).[12][13][14] However, null references behave differently: Python raises a TypeError on len(None), Java throws a NullPointerException when calling length() on null, C invokes undefined behavior (potentially crashing or returning garbage) if strlen(NULL) is passed, APL errors on non-array inputs like null equivalents, and AWK has no null reference equivalent—instead, unset variables are treated as empty strings yielding 0.[16][17]
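The Python behavior described above can be checked with a short sketch. The helper name safe_len is hypothetical and is shown only as one possible way to treat None like an empty string, similar to how AWK treats unset variables.

```python
def safe_len(value):
    """Return len(value), treating None as an empty string (hypothetical helper)."""
    return 0 if value is None else len(value)


empty_length = len("")  # an empty string has length 0

try:
    len(None)  # None has no length, so this raises TypeError
    none_error = None
except TypeError as exc:
    none_error = type(exc).__name__

print(empty_length, none_error, safe_len(None), safe_len("hello"))
# prints: 0 TypeError 0 5
```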
Performance typically favors constant-time O(1) operations in modern languages due to stored metadata, avoiding full scans. Python's len() for strings achieves O(1) by accessing an internal length field in the PyUnicodeObject structure.[18] Java's length() is O(1), derived from the length of the string's internal backing array (older JDK versions stored a separate count field).[19] APL and AWK also provide O(1) access via array shape or built-in tracking.[14] In contrast, C's strlen() is O(n), as it iterates byte-by-byte to find the null terminator, making repeated calls inefficient for long strings without caching.[19]
A key distinction arises in how lengths account for Unicode: Python 3's len() counts Unicode code points, so "café".encode('utf-8') has 5 bytes but len("café") == 4.[16] Java counts UTF-16 code units, where surrogate pairs for emojis inflate the length (e.g., "👍".length() == 2 despite one code point).[13] C's strlen() measures raw bytes, ignoring encoding (e.g., 5 bytes for "café" in UTF-8), which can mismatch user expectations in multilingual contexts. APL and AWK similarly count characters or bytes based on their array models, often aligning with code points in modern implementations.[14]
| Language | Function/Method | Time Complexity | Counts |
|---|---|---|---|
| Python | len(s) | O(1) | Unicode code points |
| Java | s.length() | O(1) | UTF-16 code units |
| C | strlen(s) | O(n) | Bytes until null |
| APL | ⍴s | O(1) | Vector length (characters) |
| AWK | length(s) | O(1) | Characters (multibyte-aware) |
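The three counting conventions in the table can be compared from Python alone. The sketch below is illustrative; the function name length_report is an assumption, and the UTF-16 figure is obtained by encoding the text rather than by calling Java.

```python
def length_report(text):
    """Return (code points, UTF-8 bytes, UTF-16 code units) for text."""
    code_points = len(text)
    utf8_bytes = len(text.encode("utf-8"))
    # Each UTF-16 code unit occupies two bytes; "utf-16-le" omits the BOM.
    utf16_units = len(text.encode("utf-16-le")) // 2
    return code_points, utf8_bytes, utf16_units


# "caf\u00e9" is "café" with a precomposed é; "\U0001F44D" is the thumbs-up emoji.
for sample in ("hello", "caf\u00e9", "\U0001F44D"):
    print(sample, length_report(sample))
# hello (5, 5, 5)
# café (4, 5, 4)
# 👍 (1, 4, 2)
```

The middle value corresponds to what C's strlen() would report for UTF-8 data, and the last value to what Java's length() reports.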
Accessing Individual Characters
Accessing individual characters in strings is a fundamental operation in most programming languages, allowing developers to retrieve specific code units or characters by position. This typically involves indexing mechanisms, where positions are specified relative to the start of the string. The majority of languages employ zero-based indexing, where the first character is at position 0, facilitating efficient pointer arithmetic and alignment with memory addressing conventions established in early languages like C.[20] However, exceptions exist, such as Lua and APL, which default to one-based indexing to align more closely with mathematical notation and human counting intuition.[21][22] Some dialects of BASIC, like Visual Basic, support both zero- and one-based access depending on the function, though .NET strings fundamentally use zero-based indexing.[23]
Common functions for character access vary by language but often include dedicated methods or operator overloading. In Java, the charAt(int index) method retrieves the character at the specified zero-based index, returning a primitive char type representing a UTF-16 code unit.[24] Similarly, JavaScript's String.prototype.charAt(index) returns a string containing the single UTF-16 code unit at the zero-based position, or an empty string if the index is out of bounds.[25] Python uses square bracket notation s[index] for zero-based access, yielding a string of length 1 rather than a distinct character type, as Python treats single characters as short strings.[26] In C++, std::string::operator[](size_type pos) provides zero-based access to a char reference without bounds checking, while the safer at(pos) method performs validation.[20] APL employs bracket indexing [index], defaulting to one-based via the system variable ⎕IO←1, to extract elements from character vectors.[22] Lua's strings also support one-based indexing through functions like string.byte(s, i), where i=1 accesses the first byte.[21]
Bounds checking is a critical safety feature that prevents invalid memory access, but its implementation differs significantly across languages, impacting security and performance. Java's charAt throws an IndexOutOfBoundsException for negative indices or those exceeding the string length minus one.[24] Python's indexing raises an IndexError for out-of-range attempts, enforcing runtime validation.[26] In contrast, C++'s operator[] exhibits undefined behavior for positions beyond the string size, potentially leading to crashes or exploits, though at() throws std::out_of_range.[20] C treats strings as null-terminated char arrays, offering no inherent bounds checking; direct indexing like s[i] can overrun buffers if i exceeds the allocated size, a common vector for buffer overflow vulnerabilities that allow arbitrary code execution.[27] Languages without built-in checks, like C, require manual validation to mitigate such risks, often at the cost of added complexity.[27]
Return types for single-character access reflect underlying type systems and design philosophies. Low-level languages like C and C++ return a primitive char, enabling direct manipulation of byte values.[20] Java follows suit with its char primitive, which is a 16-bit unsigned integer suitable for UTF-16.[24] Higher-level languages such as Python and JavaScript return strings of length 1, avoiding separate character types to simplify uniformity and immutability—Python explicitly states that s[0] equals s[0:1].[26][25] This approach in Python aligns with its sequence model, where characters are not distinct from strings.[26]
Unicode handling introduces nuances, as strings often encode text in UTF-16 or UTF-8, where "characters" may not align with user-perceived units. Most access functions operate on code units rather than grapheme clusters, which are visually distinct symbols potentially spanning multiple code points (e.g., accented letters like 'é' as 'e' + combining acute). In JavaScript, charAt indexes UTF-16 code units, so surrogate pairs for astral plane characters (U+10000–U+10FFFF) like emojis return individual surrogates, splitting the symbol—e.g., '💩'.charAt(0) yields the high surrogate '\uD83D', not the full glyph.[25][28] Java's charAt similarly accesses code units, treating supplementary characters as two positions.[24] For grapheme-aware access, languages recommend higher-level APIs or normalization (e.g., JavaScript's codePointAt for code points), but standard indexing prioritizes code unit efficiency over semantic completeness.[28] Python's indexing works on Unicode code points in its internal representation, but slicing or access may still require libraries for true grapheme boundaries.[26]
| Language | Indexing | Access Syntax | Return Type | Bounds Check | Unicode Notes |
|---|---|---|---|---|---|
| Java | 0-based | charAt(index) | char | Yes (IndexOutOfBoundsException) | UTF-16 code units; surrogates split |
| JavaScript | 0-based | charAt(index) | string (len 1) | Partial (empty string on out-of-bounds) | UTF-16; lone surrogates possible |
| Python | 0-based | s[index] | str (len 1) | Yes (IndexError) | Code points; graphemes need extra handling |
| C++ | 0-based | s[pos] or s.at(pos) | char& | No / Yes (out_of_range) | Bytes; no native Unicode support |
| C | 0-based | s[i] | char | No (buffer overflow risk) | ASCII/bytes; manual Unicode |
| Lua | 1-based | string.byte(s, i) | number (byte value) | Manual | Bytes; UTF-8 requires care |
| APL | 1-based (default) | s[index] | Character | Manual via ⎕IO | Vector elements; Unicode via implementation |
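As an illustration of the Python row, the sketch below wraps indexing in a hypothetical helper, char_at, that returns None instead of raising IndexError, and then shows why code point indexing is not the same as grapheme indexing. The normalization calls come from the standard unicodedata module.

```python
import unicodedata


def char_at(text, index):
    """Return the code point at index, or None if out of range (hypothetical helper)."""
    try:
        return text[index]
    except IndexError:
        return None


word = "hello"
print(char_at(word, 0))   # h
print(char_at(word, -1))  # o (negative indices count from the end)
print(char_at(word, 10))  # None (plain word[10] would raise IndexError)

# One user-perceived character can span several code points.
decomposed = unicodedata.normalize("NFD", "\u00e9")  # 'e' followed by U+0301
composed = unicodedata.normalize("NFC", decomposed)  # single code point U+00E9
print(len(decomposed), len(composed))  # 2 1
```

Indexing decomposed[0] yields a bare "e" without its accent, which is the code point level analogue of the surrogate splitting described for JavaScript and Java.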
Comparison Functions
Equality and Inequality Checks
In most programming languages, string equality checks determine whether two strings contain identical sequences of characters, while inequality checks verify the opposite. These operations typically use built-in operators or functions and are case-sensitive by default, meaning distinctions between uppercase and lowercase letters are preserved (e.g., "Apple" is not equal to "apple").[29][30][31] For example, in Python, the == operator performs a content-based comparison for strings, returning True if the sequences match exactly, including case.[29] Similarly, JavaScript's strict equality operator (===) compares string primitives by value without type coercion, ensuring case-sensitive equality.[32] Inequality is handled via != or !== in these languages, negating the equality result.[33]
In lower-level languages like C, equality is tested using the strcmp() function from the standard library, which returns 0 if the null-terminated strings are identical in content and length.[31] C++ extends this with the std::string class, where the overloaded == operator performs content comparison on character sequences, also case-sensitive by default. Java differs by distinguishing between reference and value comparisons: the == operator checks if two String objects reference the same instance in memory, while the equals() method compares content for equality, returning true only if the character sequences match.[34] For inequality in Java, != tests reference inequality, and !a.equals(b) negates content equality.[35] Case-insensitive variants exist in some environments, such as the POSIX strcasecmp() function, which compares strings while ignoring case, though these are less common for basic equality checks.[36]
Null handling varies significantly and requires caution to avoid errors. In Java, comparing two null references with == returns true, but invoking equals() on a null reference throws a NullPointerException, whereas calling equals(null) on a non-null string returns false.[34] Python treats None (its null equivalent) separately from empty strings: "" == None is False, as is any == comparison between a string and None. Identity checks against None should use is, which avoids surprises from custom __eq__ implementations.[37] C strings are pointers to null-terminated arrays, so passing a NULL pointer to strcmp() is undefined behavior and requires an explicit null check before the call.[31] JavaScript handles null and undefined distinctly from strings: null == undefined is true due to coercion, but null === undefined is false, and 'null' == null is false. Comparing two empty strings with "" === "" yields true.[32]
Performance for content-based string equality is generally O(n) time complexity, where n is the length of the shorter string, as implementations scan characters sequentially until a mismatch or end is reached.[38] This holds across languages like Python, Java, and C++, where early termination optimizes average cases but worst-case equality requires full traversal. Reference comparisons (e.g., Java's ==) are O(1), but they do not verify content equality.[35]
| Language | Equality Operator/Function | Inequality | Case Sensitivity | Null Handling Notes |
|---|---|---|---|---|
| Python | == (content) | != | Yes (default) | Comparisons with == return False; use 'is' for None checks; "" != None |
| Java | equals() (content); == (ref) | !equals(); != | Yes (default) | null == null is true; s.equals(null) is false; calling equals() on a null reference throws NPE |
| C | strcmp() == 0 | strcmp() != 0 | Yes (default); strcasecmp() ignores | Undefined for NULL pointers |
| C++ | operator== on std::string (content) | != | Yes (default) | Constructing std::string from a null pointer is undefined behavior |
| JavaScript | === (strict value) | !== | Yes | === false for null vs string; coercion in == |
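The Python row can be illustrated with a minimal sketch. The helper equal_ignoring_case is hypothetical; it uses str.casefold() as one possible approach to case-insensitive equality and treats None as equal only to None.

```python
def equal_ignoring_case(left, right):
    """Case-insensitive equality via casefold(); None only equals None."""
    if left is None or right is None:
        return left is right
    return left.casefold() == right.casefold()


first = "apple"
second = "".join(["app", "le"])  # same content, built at runtime

print(first == second)      # True: == compares content, not identity
print("Apple" == "apple")   # False: comparison is case-sensitive
print("" == None)           # False: the empty string is not None
print(equal_ignoring_case("Apple", "APPLE"))  # True
```

Whether first is second holds depends on interpreter internals such as interning, which is why identity checks are reserved for None and other singletons.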
Lexicographical Ordering
Lexicographical ordering of strings involves comparing their character sequences position by position, typically based on the Unicode code points of the characters, to determine which string precedes, follows, or equals another.[39][29][40]
In Java, the String.compareTo(String anotherString) method performs this comparison, returning a negative integer if the current string is lexicographically less than the argument, zero if equal, and a positive integer if greater.[39] The method iterates through characters until a difference is found, subtracting the Unicode values (this.charAt(k) - anotherString.charAt(k)), or compares lengths if one is a prefix of the other. For example:
"apple".compareTo("apricot") // Returns -2, since 'p' (U+0070) < 'r' (U+0072) at index 2
A result of 0 from compareTo indicates the strings are equal, aligning with the equals method.[39]
The C standard library function strcmp(const char *lhs, const char *rhs) similarly returns an int: negative if lhs precedes rhs, zero if equal, and positive otherwise, based on the difference of the first differing bytes interpreted as unsigned char values.[41] It assumes null-terminated byte strings and performs a basic byte-wise comparison without locale awareness.
In Python, relational operators like < and > provide boolean results for lexicographical ordering, using Unicode code points for comparison.[29] For instance:
"apple" < "apricot" # True, due to 'p' < 'r' at index 2
In Haskell, the Ord typeclass instance for String (defined as [Char]) supports operators like < and > (returning Bool) and the compare function (returning Ordering: LT, EQ, or GT).[42] Lexicographical ordering follows the list structure, comparing elements recursively until a difference, with empty lists (empty strings) preceding non-empty ones. For example, compare "apple" "apricot" == LT.[42]
| Language | Function/Operator | Return Type | Notes |
|---|---|---|---|
| Java | compareTo | int | Sign indicates order; value is the difference of the first differing UTF-16 code units, or of the lengths. |
| C | strcmp | int | Byte-wise; sign of first difference. |
| Python | <, > | bool | Direct relational; a proper prefix sorts before the longer string. |
| Haskell | compare, < | Ordering or Bool | Lexicographic on list; empty < non-empty. |
For locale-sensitive ordering, Java provides the Collator class, which tailors comparisons to specific locales using the Unicode Collation Algorithm (UCA).[43][40] Obtained via Collator.getInstance(Locale), it supports strength levels (e.g., primary for base letters, tertiary for case) and decomposition for accented characters, and serves as an alternative to the locale-agnostic String.compareTo.[43] The UCA enables multi-level weighting: primary for script order, secondary for accents, and tertiary for case, allowing custom tailorings for languages like French (backwards accents) or Slovak (contractions like "ch").[40]
Edge cases highlight these rules: for unequal lengths with matching prefixes, the shorter string precedes (e.g., "apple" < "apples" in all listed languages).[39][29][41][42] Empty strings are the smallest, as they precede any non-empty string (e.g., "" < "a").[39][29][42]
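The ordering rules above can be reproduced in Python with a small three-way comparison helper. The name compare is hypothetical and mimics the -1/0/1 convention of functions such as strcmp; the final lines show how plain code point order differs from a case-insensitive sort.

```python
def compare(left, right):
    """Three-way comparison returning -1, 0, or 1 (hypothetical helper)."""
    return (left > right) - (left < right)


print(compare("apple", "apricot"))  # -1: 'p' < 'r' at index 2
print(compare("apple", "apples"))   # -1: a proper prefix sorts first
print(compare("", "a"))             # -1: the empty string sorts first
print(compare("apple", "apple"))    # 0

# Code point order places uppercase ASCII letters before lowercase ones.
words = ["banana", "Cherry", "apple"]
print(sorted(words))                    # ['Cherry', 'apple', 'banana']
print(sorted(words, key=str.casefold))  # ['apple', 'banana', 'Cherry']
```

Sorting with key=str.casefold is only a case-insensitive approximation; it is not a substitute for locale-aware collation of the kind described for Java's Collator.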
Searching Functions
Forward Substring Location
Forward substring location refers to the operation of searching for the first occurrence of a specified substring or character within a string, starting from the beginning (or an optional offset position), and returning its zero-based index if found. This functionality is essential for tasks such as parsing, validation, and text processing in programming languages. Most languages provide built-in methods that handle both single characters and longer substrings uniformly, treating characters as single-character strings, and perform case-sensitive searches by default.[44][30][45]
In imperative languages like Java, JavaScript, and Python, the common approach uses methods such as indexOf or find that return an integer representing the position of the first match, or a sentinel value like -1 if no match is found. For example, in Java, the String.indexOf(String str) method searches from the start and returns the index or -1, with an overload indexOf(String str, int fromIndex) allowing specification of a starting offset for subsequent searches within the same string. Similarly, JavaScript's String.prototype.indexOf(searchString[, position]) supports an optional position parameter and returns -1 on failure, enabling efficient forward scans without regex overhead. Python's str.find(sub[, start[, end]]) mirrors this, returning -1 if the substring is absent, and allows both start and end bounds for more precise substring searches.[30][45][44]
Perl's built-in index(STRING, SUBSTRING[, POSITION]) function operates analogously, returning the position of the first occurrence starting from the optional POSITION (default 0) or -1 if not found, and it unifies character and substring searches by accepting any scalar as the substring. In C++, std::string::find(const std::string& str, size_t pos = 0) returns the position or std::string::npos (a special constant) on failure, with the pos parameter enabling offset-based searches; this design avoids negative returns to align with C++'s unsigned indexing conventions. C#'s String.IndexOf(string value[, int startIndex]) follows the -1 convention for not found cases and has overloads for startIndex and StringComparison options; for string arguments the default is a case-sensitive, culture-sensitive comparison, so StringComparison.Ordinal must be requested explicitly for ordinal matching.[46][47]
Some languages return optional types to handle absence more safely without sentinels. In Rust, str::find returns Option<usize>: Some(index) if the pattern is found starting from the beginning, or None otherwise. It lacks a direct offset parameter but can be composed with slicing for similar effects, and a single character can be searched for with the same method by passing a char as the pattern. Go, by contrast, keeps the sentinel convention: strings.Index(s, substr string) int returns the byte index or -1, performing a forward search without built-in offset support in this function, though slicing allows similar compositions. In Haskell, the text package provides breakOn :: Text -> Text -> (Text, Text) to find the first occurrence of a substring (needle) in a text (haystack), returning the prefix before the match and the remainder; the starting index is length (fst result) when the remainder is non-empty, and a Maybe Int result can be built by returning Nothing otherwise. For the base String type ([Char]), Data.List offers no breakOn, but the same search can be composed from isPrefixOf, tails, and findIndex, emphasizing safe handling in pure functional contexts.[48][49][50]
The following table summarizes return behaviors for not found cases across these languages:
| Language | Method/Function | Return on Not Found | Offset Support |
|---|---|---|---|
| Python | str.find() | -1 | Yes (start, end) |
| Java | String.indexOf() | -1 | Yes (fromIndex) |
| JavaScript | String.indexOf() | -1 | Yes (position) |
| Perl | index() | -1 | Yes (POSITION) |
| C++ | std::string::find() | std::string::npos | Yes (pos) |
| C# | String.IndexOf() | -1 | Yes (startIndex) |
| Rust | str::find() | None | No (use slicing) |
| Go | strings.Index() | -1 | No |
| Haskell | breakOn (text pkg, composed) | Nothing | Via composition |
For example, in Python:
text = "Hello, World!"
print(text.find("World")) # Output: 7
print(text.find("world")) # Output: -1 (case-sensitive)
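The offset parameters described in this section are what make repeated forward searches practical. The following sketch, with the hypothetical helper find_all, collects every match by restarting str.find one position past the previous hit; it is one straightforward approach, not the only one.

```python
def find_all(text, sub):
    """Return the start indices of every (possibly overlapping) match of sub."""
    if not sub:
        raise ValueError("sub must be non-empty")
    positions = []
    start = text.find(sub)
    while start != -1:
        positions.append(start)
        # Resume one position after the last hit so overlapping matches are kept.
        start = text.find(sub, start + 1)
    return positions


print(find_all("banana", "ana"))       # [1, 3]
print(find_all("Hello, World!", "o"))  # [4, 8]
print(find_all("banana", "xyz"))       # []
```

Advancing by len(sub) instead of 1 would report only non-overlapping matches, which is the behavior of methods such as str.count.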
Reverse Substring Location
Reverse substring location functions in programming languages enable the identification of the last occurrence of a specified substring within a string by searching from the end toward the beginning. These functions typically return the zero-based index of the starting position of the match, measured from the left end of the string, or a sentinel value like -1 if no match is found. This approach contrasts with forward substring location methods, which identify the first occurrence from the start.[51][52][53]
In Python, the str.rfind(sub[, start[, end]]) method performs this search, returning the highest index where the substring sub is found within the optional slice s[start:end], or -1 otherwise. Java's String.lastIndexOf(String str) method similarly returns the index of the last occurrence of str in the calling string, with an overloaded variant lastIndexOf(String str, int fromIndex) that initiates the backward search from the specified fromIndex position. JavaScript provides String.prototype.lastIndexOf(searchString[, position]), which searches backward from the optional position (defaulting to the string's length) and returns the index of the last match or -1. In VBA, the InStrRev(stringcheck, stringmatch[, start[, compare]]) function returns the one-based position of the last occurrence of stringmatch in stringcheck, starting from the optional start position (defaulting to the end), or 0 if absent.[51][52][54][53][55]
Perl's built-in rindex(STRING, SUBSTRING[, POSITION]) function returns the position of the last occurrence starting at or before the optional POSITION (default end) or -1 if not found, analogous to index but in reverse. In C++, std::string::rfind(const std::string& str, size_t pos = npos) returns the position of the last occurrence that starts at or before pos (default end) or std::string::npos on failure. Rust's str::rfind returns Option<usize>: Some(index) for the last match or None, composable with slicing for scoped searches. Go's strings.LastIndex(s, substr string) int returns the byte index of the last occurrence or -1, without a direct offset parameter but usable with slicing. In Haskell, the text package's breakOnEnd :: Text -> Text -> (Text, Text) splits at the last occurrence, returning the text up to and including the final match together with the remainder; when the first component is non-empty, the starting index is length (fst result) - length needle. The base String type can be searched with a composition of Data.List functions.[56][57][58][59][50]
The position semantics remain consistent across these languages: despite the reverse search direction, the returned index always references the match's starting position from the string's beginning, facilitating straightforward extraction or further processing. For instance, in the string "banana", searching for "ana" yields index 3, as that is the starting position of the last match. Overloads allowing a starting index for the search enhance flexibility, enabling scoped reverse searches without full-string traversal, as seen in Java and JavaScript implementations.[52][53][55]
Edge cases, such as overlapping potential matches, are handled by prioritizing the rightmost possible starting index. In Python, for the string "aaa" and substring "aa", rfind returns 1, corresponding to the overlap at positions 1-2 rather than 0-1, ensuring the last valid match is selected. An empty search string always matches at the end: Java's lastIndexOf(""), Python's rfind(""), and JavaScript's lastIndexOf("") all return the string's length.[51]
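The position semantics and the overlap rule described above can be checked with a short Python sketch. The variable names are illustrative only; str.rfind and str.find are the standard methods.

```python
# Reverse substring search with str.rfind; indexes are 0-based from the left.
text = "banana"

last_ana = text.rfind("ana")    # the last match starts at index 3
first_ana = text.find("ana")    # the forward search finds index 1
missing = text.rfind("xyz")     # -1 is the "not found" sentinel

# Overlapping candidates: the rightmost starting index wins.
overlap = "aaa".rfind("aa")

# Optional start/end restrict the search to the slice text[0:4] ("bana"),
# so only the match starting at index 1 fits.
scoped = text.rfind("ana", 0, 4)

print(last_ana, first_ana, missing, overlap, scoped)  # 3 1 -1 1 1
```

Even though the search runs right to left, every result is an offset from the start of the string, so it can be passed directly to slicing.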
Not all languages provide built-in reverse substring location. The C standard library, for example, has no standard strrstr (strrchr handles only single characters), so programmers implement the search manually, either by calling strstr repeatedly or by scanning candidate positions from the right. This reflects the lower-level style of C, in which string operations are often written out explicitly.
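A manual implementation of the kind a C programmer would write can be sketched as follows, using Python for consistency with the other examples. The helper name rfind_manual is hypothetical; it tries each candidate start position from right to left and is checked against the built-in str.rfind.

```python
def rfind_manual(haystack: str, needle: str) -> int:
    """Return the 0-based index of the last occurrence of needle, or -1.

    Hypothetical helper: tries each candidate start position from right
    to left, the way a hand-written loop in C would.
    """
    n, m = len(haystack), len(needle)
    # The range is empty when the needle is longer than the haystack.
    for start in range(n - m, -1, -1):
        if haystack[start:start + m] == needle:
            return start
    return -1


print(rfind_manual("banana", "ana"))  # 3
print(rfind_manual("banana", "xyz"))  # -1
```

This brute-force scan takes time proportional to the product of the two lengths in the worst case, which is usually acceptable for short strings.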
| Language | Function | Return Type | Sentinel Value | Overload for Start Index | Position Base |
|---|---|---|---|---|---|
| Python | str.rfind(sub[, start[, end]]) | int | -1 | Yes (start, end) | 0-based |
| Java | String.lastIndexOf(str[, fromIndex]) | int | -1 | Yes (fromIndex) | 0-based |
| JavaScript | String.lastIndexOf(searchString[, position]) | number | -1 | Yes (position) | 0-based |
| Perl | rindex(STRING, SUBSTRING[, POSITION]) | int | -1 | Yes (POSITION) | 0-based |
| C++ | std::string::rfind(str[, pos]) | size_t | std::string::npos | Yes (pos) | 0-based |
| C# | String.LastIndexOf(value[, startIndex]) | int | -1 | Yes (startIndex) | 0-based |
| Rust | str::rfind() | Option | None | No (use slicing) | 0-based |
| Go | strings.LastIndex() | int | -1 | No | 0-based |
| Haskell | breakOnEnd (text pkg, composed) | Maybe Int | Nothing | Via composition | 0-based |
| VBA | InStrRev(stringcheck, stringmatch[, start[, compare]]) | Long | 0 | Yes (start) | 1-based |
| C | None (manual) | N/A | N/A | N/A | N/A |
Modification Functions
Concatenation and Appending
In programming languages, string concatenation combines two or more strings into a single string, while appending adds content to the end of an existing string. These operations are fundamental for building dynamic text, but their implementation depends on whether strings are immutable or mutable, which affects efficiency in repeated operations such as loops. Languages with immutable strings, such as Java and Python, typically create a new object for each concatenation, which can hurt performance if not handled carefully, whereas mutable strings in C++ allow in-place modification.[60][29][61]
In Java, the + operator is the primary means of concatenating strings, as in String result = "Hello" + " " + "World";. The compiler optimizes simple expressions (traditionally by using StringBuilder), but + inside a loop can still take quadratic time (O(n²)) because each iteration creates a new immutable String. The concat() method provides an alternative for pairwise appending, as in "Hello".concat(" World"), returning a new String without modifying the original; however, it is less flexible than + since it accepts only String arguments, and it shares the same immutability drawbacks. For efficient repeated appending, StringBuilder is recommended: its append() method costs amortized O(1) per operation thanks to dynamic capacity growth, avoiding the overhead of immutable strings in loops.[60][62][63]
Python employs the + operator for concatenation, as in result = "Hello" + " " + "World", which creates a new string object each time because strings are immutable; repeated concatenation in a loop (e.g., building a string from many fragments) can therefore take quadratic time. CPython can often perform += in place when no other reference to the string exists, giving linear time in practice, but this is an implementation detail. For portability across implementations and for clarity, str.join() is preferred for multiple concatenations, ensuring O(n) time, though details on joining iterables are covered elsewhere. Python has no built-in mutable string type, so the usual idiom is to accumulate fragments in a list and join them at the end.[29][64]
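The list-accumulation idiom can be sketched as follows. The fragment data is made up for illustration, and io.StringIO is shown as one standard-library alternative that behaves like an in-memory text buffer.

```python
import io

fragments = ["alpha", "beta", "gamma"]

# Pattern 1: collect the pieces in a list and join once at the end.
parts = []
for fragment in fragments:
    parts.append(fragment.upper())
joined = "-".join(parts)

# Pattern 2: write the pieces into an in-memory text buffer.
buffer = io.StringIO()
for fragment in fragments:
    buffer.write(fragment)
buffered = buffer.getvalue()

print(joined)    # ALPHA-BETA-GAMMA
print(buffered)  # alphabetagamma
```

Both patterns copy each fragment a bounded number of times, so the total work grows linearly with the size of the output.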
JavaScript uses the + operator for string concatenation, as in let result = "Hello" + " " + "World";, which coerces non-string operands to strings and creates a new string; for small operations there is little practical difference among the available methods because modern engines optimize concatenation. Template literals, introduced in ES6, offer a more readable alternative for interpolation and concatenation, as in let result = `Hello ${name}`;, supporting multiline strings and embedded expressions without explicit + chaining. The concat() method exists but is rarely used, as + is more idiomatic.[65]
In C++, std::string supports the += operator for efficient appending, as in std::string result = "Hello"; result += " World";, which modifies the string in place with amortized constant time per character appended, thanks to capacity growth similar to std::vector. The + operator creates a new string, as in std::string result = std::string("Hello") + " World"; (at least one operand must be a std::string, because two string literals cannot be added), which is less efficient for repeated use due to temporary objects. For stream-based appending, std::ostringstream uses the << operator, as in std::ostringstream oss; oss << "Hello" << " World";, providing type-safe concatenation of mixed types at the cost of more overhead than direct std::string operations. The append() method is another in-place alternative with efficiency similar to +=.[66][67]
Perl uses the dot operator . for concatenation, as in $result = "Hello" . " " . "World";, which produces a new string value. Perl scalars are mutable, so for repeated appending the .= operator extends the string in place, similar to C++'s +=, with linear time in loops; join() is also available for concatenating a list.[68]
Ruby employs + for concatenation, as in result = "Hello" + " " + "World", which creates a new string each time and can be quadratic in loops. Ruby strings are mutable (unless frozen): the << operator and the concat method both append to the receiver in place, achieving linear time, and Array#join is used to combine many parts.[69]
Go strings are immutable, so the + operator creates a new string each time, which is inefficient in loops. For efficient appending, strings.Builder provides WriteString (along with WriteByte and WriteRune) with amortized O(1) cost per operation, similar to Java's StringBuilder, and is recommended for building large strings.[70]
Swift uses + for concatenation and += for appending. String is a value type: + produces a new value, while += and append(_:) modify a variable declared with var in place. When combining many pieces, string interpolation or joined(separator:) on an array is often clearer than repeated concatenation.[71]
| Language | Primary Operator/Method | Mutability | Efficiency in Loops |
|---|---|---|---|
| Java | +, concat() | Immutable | Quadratic with +; linear with StringBuilder.append() |
| Python | +, += | Immutable | Quadratic in general; += often linear in CPython (implementation detail); use join() for portability |
| JavaScript | +, template literals | Immutable | Optimized, near-linear in engines |
| C++ | +=, append() | Mutable | Amortized O(1) per append |
| Perl | ., .= | Mutable | Linear with .= |
| Ruby | +, << | Mutable | Linear with << or join |
| Go | +, strings.Builder.WriteString | Immutable | Linear with Builder |
| Swift | +, += | Mutable (var); value type | Linear with += or append(_:); joined() for sequences |
Case Conversion
Case conversion functions transform the alphabetic characters in a string to uppercase or lowercase, leaving non-alphabetic characters unchanged. These operations are fundamental for tasks such as data normalization, text processing, and user interface consistency, and their implementation varies across languages with string mutability and Unicode support.
In object-oriented languages like Java, the String class provides toUpperCase() and toLowerCase() methods, which return a new string instance since strings are immutable. For example, "hello".toUpperCase() yields "HELLO". Similarly, Python's str type offers upper() and lower() methods that also create new strings, as in "hello".upper() producing "HELLO". In contrast, procedural languages like C use character-level functions such as toupper() from <ctype.h>, which operate on individual characters and require manual string traversal, typically modifying a writable character array in place. Perl provides uc() for uppercase and lc() for lowercase, which return new strings; the \U and \L escape sequences apply the same conversions inside interpolated strings and substitution replacements.
Locale-aware variants extend these functions to handle language-specific rules, particularly for Unicode characters. Java's toUpperCase(Locale) method applies locale-sensitive mappings; the German "ß" (sharp S), for instance, uppercases to "SS", a Unicode special-casing rule that Java applies in every locale. Python's casefold() provides locale-independent folding for caseless matching; the str methods take no locale argument, so locale-specific casing requires a third-party library such as an ICU binding. In C, the towupper() function from <wctype.h> supports wide characters and follows the current locale. Haskell, employing a functional paradigm, uses toUpper and toLower from Data.Char, applied to each character with map; because a String is a list of characters, the result is itself a new String.
Edge cases highlight the importance of robust Unicode compliance. Non-letter characters, such as punctuation or digits, remain unaltered in all these functions; for example, "Hello123!" becomes "HELLO123!" in uppercasing across Java, Python, and Perl. Turkish is the classic special case: under a Turkish locale the dotted capital "İ" (U+0130) lowercases to "i" (U+0069), while the undotted capital "I" (U+0049) lowercases to the dotless "ı" (U+0131). Java's locale-specific toLowerCase(Locale) handles this correctly for Turkish. In-place case conversion is less common because creating a copy is simple, but in C++ std::transform with std::toupper can modify a std::string in place in performance-critical code.
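The behavior for non-letters and for one-to-many mappings can be illustrated in Python, whose str methods use locale-independent Unicode rules. The sample strings are chosen only for demonstration.

```python
sample = "Hello123!"
upper = sample.upper()   # digits and punctuation pass through unchanged
lower = sample.lower()

# Unicode full case mapping can turn one character into two.
sharp_s = "ß".upper()

# casefold() is intended for caseless comparison rather than for display.
same = "Straße".casefold() == "STRASSE".casefold()

print(upper, lower, sharp_s, same)  # HELLO123! hello123! SS True
```

Because the result can be longer than the input, code that assumes case conversion preserves length is fragile for Unicode text.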
| Language | Uppercase Function | Lowercase Function | Returns New String? | Locale Support |
|---|---|---|---|---|
| Java | toUpperCase() | toLowerCase() | Yes (immutable) | Yes, via Locale parameter |
| Python | upper() | lower() | Yes (immutable) | No locale parameter; casefold() for caseless matching |
| Perl | uc() | lc() | Yes (default) | Yes, under use locale |
| C | toupper() (per char) | tolower() (per char) | N/A (manual) | Yes, current C locale; towupper() for wide characters |
| Haskell | toUpper (per char) | toLower (per char) | Yes (functional map) | Basic Unicode via Data.Char |
Extraction and Slicing Functions
Substring Extraction
Substring extraction refers to operations that retrieve a contiguous portion of a string based on specified indices, enabling manipulation of text segments without altering the original string. This functionality is fundamental for tasks such as parsing, data processing, and text analysis, with implementations varying in syntax, parameter handling, and error semantics across languages. Most languages provide methods or operators that specify a starting position and either an ending position or a length, often returning a new string containing the extracted characters.
Common syntax patterns include method calls like substring(start, end) or substr(start, length), and operator-based slicing such as [start:end]. For instance, Java's String class uses the substring(int beginIndex, int endIndex) method, which extracts from the inclusive beginIndex to the exclusive endIndex.[72] Similarly, Python employs slicing notation s[start:stop], where start is inclusive and stop is exclusive.[26] In JavaScript, the preferred slice(start, end) method follows the same inclusive-exclusive convention, while the deprecated substr(start, length) uses a length parameter instead of an end index.[73] Perl's substr(EXPR, OFFSET, LENGTH) function supports length-based extraction, with OFFSET and LENGTH that can be negative for end-relative positioning.[74] C++'s std::string::substr(pos, count) extracts up to count characters starting from pos, defaulting to the end of the string if count is omitted.[75]
The treatment of index boundaries differs significantly. In Python, Java, and JavaScript's slice, the end index (when provided) is exclusive, meaning it points to the first character not included in the result; for example, Python's "abc"[1:3] yields "bc".[26][72][73] Length-based functions instead take a count of characters: C++'s substr and Perl's substr fall in this group, as does AWK's 1-based substr(string, start, length), which extracts exactly length characters, or fewer if the string ends first.[74][75][76] JavaScript's deprecated substr and Visual Basic's Mid(string, start, length) also take a length rather than an end index.[77][78]
Default behaviors enhance usability by allowing partial specifications. Python defaults start to 0 and stop to the string length if omitted, so s[2:] extracts from index 2 to the end, and s[:3] takes the first three characters.[26] Java provides an overload substring(beginIndex) that defaults the end to the string length.[72] JavaScript's slice mirrors Python with defaults of 0 and length, while Perl and C++ default LENGTH or count to the remaining string if unspecified.[73][74][75] AWK's substr(string, start) omits length to go to the end, using 1-based indexing where position 1 is the first character.[76]
Error handling for invalid indices varies to balance safety and convenience. Java throws an IndexOutOfBoundsException for negative indices, beginIndex > endIndex, or endIndex > length.[72] C++ raises std::out_of_range if the starting position exceeds the string size.[75] Perl returns only the part that lies inside the string when the requested range runs past the end, and returns undef with a warning when the offset itself lies outside the string.[74] In contrast, Python and JavaScript's slice avoid exceptions by clamping indices: out-of-bounds start or stop values are adjusted to the string length or 0, returning an empty string or partial result without error, as in Python's "abc"[10:] yielding "".[26][73] AWK returns an empty string if start exceeds the length, and Visual Basic's Mid does the same for start > length.[76][78] JavaScript's substr returns an empty string if start >= length.[77]
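Python's inclusive-exclusive convention, defaults, and clamping can be seen together in a brief sketch. The variable names are illustrative; the behavior shown is that of ordinary str slicing.

```python
s = "abcdef"

middle = s[1:3]      # start inclusive, stop exclusive
tail = s[2:]         # stop defaults to len(s)
head = s[:3]         # start defaults to 0
last_three = s[-3:]  # negative indexes count from the end

# Out-of-range slice bounds are clamped instead of raising.
beyond = s[10:]
clamped = s[4:100]

# Indexing a single position, unlike slicing, does raise.
try:
    s[10]
    raised = False
except IndexError:
    raised = True

print(middle, tail, head, last_three)  # bc cdef abc def
print(repr(beyond), clamped, raised)   # '' ef True
```

The clamping applies only to slices; code that needs strict bounds checking, as in Java, has to validate the indexes itself.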
Specialized variants exist for common extractions like prefixes, suffixes, or middles. Visual Basic provides Left(string, length) for the first length characters and Right(string, length) for the last; both return the full string if length is at least the string's length.[78] Its Mid function handles middle sections with a 1-based start. AWK achieves left or right extraction via substr parameters, such as substr($1, 1, 3) for the first three characters.[76] Perl supports negative offsets in substr for right-aligned starts, like substr($s, -3) for the last three characters.[74] These variants simplify frequent operations but can usually be emulated with general substring functions.
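As an illustration of such emulation, the following Python sketch defines hypothetical helpers left, right, and mid on top of slicing. They approximate the Visual Basic functions described above; as a simplifying assumption, they clamp invalid arguments rather than raising errors.

```python
from typing import Optional


def left(s: str, length: int) -> str:
    """First `length` characters (the whole string if it is shorter)."""
    return s[:max(length, 0)]


def right(s: str, length: int) -> str:
    """Last `length` characters (the whole string if it is shorter)."""
    return s[-length:] if length > 0 else ""


def mid(s: str, start: int, length: Optional[int] = None) -> str:
    """Up to `length` characters from the 1-based position `start`."""
    begin = max(start - 1, 0)  # convert to a 0-based index
    if length is None:
        return s[begin:]
    return s[begin:begin + max(length, 0)]


print(left("hello", 2), right("hello", 3), mid("hello", 2, 3))  # he llo ell
```

The explicit check in right is needed because s[-0:] would return the whole string rather than an empty one.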
| Language | Syntax Example | Inclusive/Exclusive | Defaults | Bounds Error Handling |
|---|---|---|---|---|
| Python | s[start:stop] | Start incl., stop excl. | start=0, stop=len(s) | Clamps; no exception, empty if invalid |
| Java | str.substring(begin, end) | Begin incl., end excl. | End=len(str) in overload | IndexOutOfBoundsException |
| JavaScript | str.slice(start, end) | Start incl., end excl. | start=0, end=len(str) | Clamps; no exception, empty if invalid |
| Perl | substr($s, offset, length) | Length-based | length=to end | Partial result past the end; undef + warning if offset is outside the string |
| C++ | str.substr(pos, count) | Length-based | count=to end | out_of_range if pos > size() |
| AWK | substr(str, start, length) | Length-based, 1-based | length=to end | Empty string if start > len |
| Visual Basic | Mid(str, start, length) | Length-based, 1-based | length=to end | Empty if start > len |
Trimming Whitespace
Trimming whitespace from strings is a fundamental operation in many programming languages, used to clean input data, normalize strings for comparison, or prepare text for further processing. It removes leading (prefix) whitespace, trailing (suffix) whitespace, or both from the ends of a string; in most languages the result is a new string, because strings are immutable. Common implementations provide built-in methods that automate this process, differing in their definitions of whitespace, support for customization, and available variants for one-sided trimming.
In Java, the String.trim() method removes leading and trailing characters whose Unicode codepoints are less than or equal to U+0020 (the space character), effectively stripping spaces, tabs, newlines, and other control characters up to that point.[79] Python's str.strip() serves a similar purpose for both ends, defaulting to whitespace characters such as spaces, tabs, newlines, carriage returns, formfeeds, and vertical tabs, while JavaScript's String.prototype.trim() targets both ends using a broader definition that includes all ECMAScript whitespace (e.g., spaces, tabs, line feeds, carriage returns) and line terminators.[80][81] These functions return a new string, leaving the original unchanged, and handle edge cases consistently: an empty input string yields an empty result, an all-whitespace string results in an empty string, and a string without leading or trailing whitespace is returned unchanged.[79][80][81]
For more granular control, languages like Python and PHP offer variants for left- or right-only trimming. Python provides lstrip() for leading characters and rstrip() for trailing, both customizable via an optional chars parameter that specifies any set of characters to remove (not limited to whitespace), where it strips all occurrences of those characters from the respective end until a non-matching character is found.[80] For example, ' hello '.lstrip() yields 'hello ', while 'hello!'.strip('!') results in 'hello'.[80] PHP similarly includes trim() for both ends, ltrim() for the beginning, and rtrim() for the end, all defaulting to a fixed set of whitespace characters (space, tab, newline, carriage return, null byte, vertical tab) but allowing customization with a second parameter to define a character set or range (e.g., trim($str, " \t.")).[82] These return new strings as well.[82]
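The Python variants can be sketched as follows. The sample strings are illustrative, and the last example emphasizes that the chars argument is a set of characters rather than a literal prefix or suffix.

```python
padded = "  hello  "

both = padded.strip()         # whitespace removed from both ends
left_only = padded.lstrip()   # leading whitespace only
right_only = padded.rstrip()  # trailing whitespace only

# chars is treated as a set: any run of 'x' or 'y' at either end is removed.
bang = "hello!".strip("!")
mixed = "xyxhelloyx".strip("xy")

print(repr(both), repr(left_only), repr(right_only))  # 'hello' 'hello  ' '  hello'
print(bang, mixed)  # hello hello
```

Each call returns a new string; padded itself is unchanged after all three calls.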
Perl, while lacking dedicated built-in trim functions in earlier versions (often relying on regular expressions like $str =~ s/^\s+|\s+$//g), introduced builtin::trim() in Perl 5.36 for both leading and trailing whitespace, which includes ordinary spaces, tabs, newlines, carriage returns, and all Unicode whitespace characters.[83] This function returns a modified copy of the input string and is non-experimental since Perl 5.40. For one-sided trimming, Perl developers typically use regex substitutions, such as s/^\s+// for left and s/\s+$// for right, though no standardized ltrim() or rtrim() equivalents exist in the core language.[83]
The following table summarizes key differences in trimming functions across these languages:
| Language | Function(s) | Sides Trimmed | Default Whitespace Characters | Customizable? | Return Type | Notes |
|---|---|---|---|---|---|---|
| Java | trim() | Both | Unicode codepoints ≤ U+0020 (e.g., space, tab, newline, other controls up to space) | No | New String | Uses internal substring for extraction; fixed definition excludes some Unicode whitespace.[79] |
| Python | strip(), lstrip(), rstrip() | Both, left, right | Whitespace, including space, tab, newline, carriage return, formfeed, vertical tab | Yes (via chars) | New str | chars is a set of characters, not a prefix or suffix; every matching character is removed from the chosen end(s).[80] |
| JavaScript | trim() (also trimStart(), trimEnd()) | Both (variants: left, right) | ECMAScript whitespace (spaces, tabs) and line terminators (e.g., \n, \r, \u2028, \u2029) | No | New String | Polyfillable in older environments; focuses on lexical grammar definitions.[81] |
| Perl | builtin::trim() | Both | Space, tab, newline, carriage return, all Unicode whitespace (\s class) | No | New scalar | Available since 5.36; one-sided via regex (e.g., s/^\s+//); no core ltrim/rtrim.[83] |
| PHP | trim(), ltrim(), rtrim() | Both, left, right | Space, tab, newline, carriage return, null byte, vertical tab (" \t\n\r\0\v") | Yes (via character mask) | New string | Supports character ranges (e.g., "\x00..\x1F"); multibyte-safe variants in mbstring extension.[82] |
Decomposition Functions
Splitting Strings
Splitting strings involves dividing a string into a sequence of substrings based on specified delimiters, commonly returning an array or list of elements. This operation is fundamental in text processing across programming languages, enabling tasks such as parsing delimited data or tokenizing input. Languages differ in delimiter support, handling of edge cases like consecutive delimiters, and optional limits on the number of splits.[84][85]
In Python, the str.split() method splits a string using a separator string (defaulting to any run of whitespace), returning a list of substrings; it does not support regular expressions, so pattern-based splitting requires the re.split() function. For example, "a,,b".split(",") yields ['a', '', 'b'], including an empty string between consecutive delimiters, while whitespace splitting collapses runs: "a   b".split() returns ['a', 'b']. The optional maxsplit parameter limits the number of splits, so "a,,b".split(",", 1) produces ['a', ',b'].[84]
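These Python behaviors can be verified with a short sketch. The inputs are illustrative; re.split is the standard-library function for pattern-based splitting.

```python
import re

explicit = "a,,b".split(",")           # empty string between consecutive commas
whitespace = "  a   b ".split()        # runs of whitespace collapse, ends ignored
limited = "a,,b".split(",", 1)         # at most one split; remainder kept intact
pattern = re.split(r"[;,]", "a;b,c")   # regex-based splitting

# An empty input behaves differently in the two modes.
empty_explicit = "".split(",")
empty_whitespace = "".split()

print(explicit, whitespace, limited, pattern)
# ['a', '', 'b'] ['a', 'b'] ['a', ',b'] ['a', 'b', 'c']
print(empty_explicit, empty_whitespace)  # [''] []
```

The difference between the two modes matters when parsing delimited records, where an empty field usually has to be preserved.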
Java's String.split(String regex, int limit) method uses a regular expression as the delimiter, always requiring regex syntax even for simple characters, and returns a String[] array. Consecutive delimiters produce empty strings, as in "a,,b".split(",") resulting in {"a", "", "b"}; the limit parameter controls the array size—positive values cap splits and retain trailing text, zero discards trailing empties, and negative allows all splits with empties. For instance, "a,b,c".split(",", 2) yields {"a", "b,c"}. This regex reliance can introduce pattern compilation overhead.[85]
JavaScript's String.prototype.split(separator, limit) accepts either a string or regular expression as separator, returning an array of substrings; an empty separator splits into UTF-16 code units. It includes empty strings for consecutive separators: "a,,b".split(",") gives ["a", "", "b"]. The limit (non-negative integer) restricts elements, omitting excess text beyond the limit, and a limit of 0 returns an empty array. Capturing groups in regex separators are included in the result array.[86]
PHP's explode(string $separator, string $string, int $limit = PHP_INT_MAX) function splits using a literal string separator (no regex; use preg_split() for patterns), returning an array and throwing a ValueError for empty separators in PHP 8.0+. Consecutive separators yield empty elements: explode(",", "a,,b") produces ["a", "", "b"]. The limit behaves differently—positive values cap elements with remainder in the last, negative excludes the last |limit| elements, and zero acts as 1. An empty input string returns [""].[87]
In AWK, the split(string, array [, fieldsep]) function divides a string into an array using an extended regular expression (ERE) delimiter (defaulting to the global FS field separator, often whitespace), returning the element count; prior array contents are cleared. It treats consecutive delimiters as separate unless FS is whitespace, which collapses them: split("a,,b", a, ",") populates a[1]="a", a[2]="", a[3]="b". Field splitting via FS (e.g., FS=",") handles input lines similarly, supporting multiple characters or EREs like FS="[;,]".[88]
| Language | Function | Delimiter Type | Limit Parameter | Consecutive Delimiters | Return Type |
|---|---|---|---|---|---|
| Python | str.split() | String (whitespace default); no regex | maxsplit: caps splits, retains remainder | Includes empties for explicit sep; collapses for whitespace | List |
| Java | String.split() | Regex only | limit: positive caps with remainder; 0 discards trailing empties; negative full with empties | Includes empties | String[] |
| JavaScript | String.split() | String or regex | Non-negative: caps elements, omits excess; 0 returns [] | Includes empties | Array |
| PHP | explode() | String only; no regex | Positive: caps with remainder in last; negative: excludes last abs(limit) elements; 0 treated as 1 | Includes empties | Array |
| AWK | split() | ERE (regex); defaults to FS | None | Includes empties (except whitespace collapse) | Array (populated, returns count) |
Joining Collections
Joining collections refers to the process of concatenating multiple strings from an iterable or array into a single string, typically inserting a specified separator between each pair of elements. This operation is fundamental in string manipulation for tasks such as formatting lists, building CSV data, or reconstructing strings from parsed components. Unlike pairwise concatenation, which builds strings incrementally, joining functions handle the entire collection in one call, avoiding intermediate string allocations in many implementations.[91][92][93]
In Python, the str.join(iterable) method performs this operation, where the string instance serves as the separator and the iterable must contain only strings. Non-string elements raise a TypeError, so other values must be converted explicitly, for example with map(str, items). For example, ", ".join(["apple", "banana", "cherry"]) yields "apple, banana, cherry". An empty iterable returns an empty string, while a single-element iterable returns that element without any separator. This design keeps conversions explicit and works with any iterable, such as a list, tuple, or generator.[91]
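The Python rules for join, including its edge cases and strict typing, can be sketched as follows. The lists are illustrative data.

```python
fruits = ["apple", "banana", "cherry"]

listed = ", ".join(fruits)
empty = ", ".join([])          # empty iterable gives an empty string
single = ", ".join(["apple"])  # one element, so no separator is inserted

# Non-string elements are rejected rather than converted implicitly.
numbers = [1, 2, 3]
try:
    ", ".join(numbers)
    failed = False
except TypeError:
    failed = True

converted = ", ".join(map(str, numbers))  # explicit conversion

print(listed)                   # apple, banana, cherry
print(repr(empty), single)      # '' apple
print(failed, converted)        # True 1, 2, 3
```

The separator appears only between elements, never before the first or after the last.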
JavaScript provides Array.prototype.join([separator]), which converts array elements to strings using toString() and joins them with the optional separator (defaulting to a comma). Undefined or null elements are treated as empty strings. For instance, ["a", "b", "c"].join("-") produces "a-b-c". Empty arrays return an empty string, and single-element arrays return the stringified element alone. This method is versatile for handling mixed-type arrays common in web development.[92]
PHP's implode(separator, array) (or its alias join) concatenates array values into a string, converting non-string elements automatically. The separator defaults to an empty string if omitted. An example is implode(",", ["foo", "bar"]) resulting in "foo,bar". Empty arrays yield an empty string, and single-element arrays return the element without separators. Associative arrays use values only, and objects with __toString() are supported, making it flexible for dynamic data.[93]
In contrast, low-level languages like C lack a built-in joining function in the standard library, requiring manual iteration over a null-terminated array of strings (e.g., char**) with functions like strcat or snprintf in a loop to allocate and build the result. This approach demands explicit memory management and error handling for buffer overflows. For example, developers might use a loop to append strings with separators, as no single standard function exists for the operation.[94]
The following table summarizes key differences across these languages:
| Language | Function | Separator Placement | Input Handling | Edge Cases |
|---|---|---|---|---|
| Python | str.join(iterable) | Between elements only | Strict: strings only, else TypeError | Empty: ""; Single: no separator |
| JavaScript | Array.join([sep]) | Between elements only | Auto-converts to strings; null/undefined → "" | Empty: ""; Single: no separator |
| PHP | implode(sep, array) | Between elements only | Auto-converts; supports objects | Empty: ""; Single: no separator |
| C | None (manual loop) | Manual | Manual string array handling | Requires custom implementation |
Replacement and Formatting Functions
Substring Replacement
Substring replacement functions allow developers to identify specific substrings within a string and substitute them with new content, supporting text processing tasks such as data cleaning and pattern-based modifications. These operations typically distinguish between replacing all occurrences and replacing only the first match, and they often support both literal strings and regular expressions (regex) for more flexible matching. Most languages return a new string and leave the original unchanged, though some modify the original in place. Case sensitivity is the default behavior, with optional flags to ignore it in regex-enabled variants.[95][96]
In Python, the str.replace(old, new[, count]) method replaces all occurrences of the substring old with new, or the first count occurrences if specified; it operates on literal strings and is case-sensitive. For example, "hello world".replace("o", "a") yields "hella warld", replacing all instances. Python's standard library also provides re.sub(pattern, repl, string[, count=0]) from the re module for regex-based replacements, supporting case-insensitive matching via the re.IGNORECASE flag; this function similarly returns a new string and can limit substitutions. Unlike literal replacement, re.sub allows advanced pattern matching, such as substituting digits with asterisks in "abc123def", resulting in "abc***def".[95][97]
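The literal and regex-based replacements described for Python can be sketched as follows. The patterns and sample text are illustrative only.

```python
import re

text = "hello world"

all_o = text.replace("o", "a")       # every occurrence
first_o = text.replace("o", "a", 1)  # only the first occurrence

# Regex-based replacement with re.sub.
masked = re.sub(r"\d", "*", "abc123def")
masked_once = re.sub(r"\d", "*", "abc123def", count=1)
caseless = re.sub("HELLO", "goodbye", text, flags=re.IGNORECASE)

print(all_o, "|", first_o)            # hella warld | hella world
print(masked, masked_once, caseless)  # abc***def abc*23def goodbye world
```

In every case text itself is unchanged, since str.replace and re.sub both return new strings.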
Java's String class offers replace(CharSequence target, CharSequence replacement) to substitute all non-overlapping occurrences of target with replacement using literal matching, which is case-sensitive and returns a new String instance due to string immutability. For instance, "hello world".replace("o", "a") produces "hella warld". Java also provides replaceFirst(String regex, String replacement) and replaceAll(String regex, String replacement) for regex support, where replaceFirst limits to the initial match and replaceAll handles all; case-insensitivity is achieved with regex flags like (?i). These methods do not support a direct count limit beyond the first/all distinction, emphasizing regex integration for complex substitutions over simple literals.[96]
Perl employs the substitution operator s/[pattern](/page/Pattern)/replacement/[gmi] for in-place modifications, where the /g flag replaces all occurrences, /i enables case-insensitivity, and no flag limits to the first match; it uses Perl's powerful regex engine by default. For example, $str = "hello world"; $str =~ s/o/a/g; changes $str to "hella warld", returning the number of substitutions (2 in this case). This operator modifies the string variable directly, unlike Python or Java, but can be used in list contexts to return modified copies. Perl's approach prioritizes regex for all substitutions, making literal replacements a special case without a dedicated non-regex function.[98]
In AWK, the gsub(regex, replacement [, target]) function performs global regex-based replacements, returning the number of substitutions while modifying target (or $0 if omitted); it is case-sensitive by default with no built-in ignore-case flag, though regex patterns can incorporate case-insensitivity. For instance, in a script, gsub(/o/, "a", $0) on "hello world" alters it to "hella warld" and returns 2. AWK lacks a literal-string-only replacement but offers sub(regex, replacement [, target]) for the first match only, aligning with its text-processing focus in Unix environments. Like Perl, it modifies in place but returns a count rather than the new string.
Across these languages, substring replacement emphasizes efficiency for common text tasks, with Python and Java returning new immutable strings, while Perl and AWK modify the target variable directly. Regex support varies in depth: extensive in Perl, Python's re module, and Java's java.util.regex package, and limited to POSIX-style extended regular expressions in AWK. In Perl, Python, and Java, regex-based operations can be made case-insensitive via flags, whereas the literal methods remain strictly case-sensitive. Limits on the replacement count are more granular in Python (arbitrary N) than the first-or-all choice offered by Java, Perl, and AWK.[95][96][98]
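The contrast between literal and regex-based replacement, and the arbitrary count limit attributed to Python above, can be illustrated with a short sketch that uses only the standard str.replace and re.sub calls. The sample string is made up for illustration.

```python
import re

s = "Hello hello HELLO"

# Literal, case-sensitive replacement; the optional third argument caps the count.
literal_all = s.replace("l", "L")
literal_two = s.replace("l", "L", 2)

# Regex replacement with a case-insensitive flag and a count limit of 2.
regex_two = re.sub(r"hello", "bye", s, count=2, flags=re.IGNORECASE)
```

The literal calls leave the upper-case "HELLO" untouched, while the regex call matches regardless of case but stops after two substitutions.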
String Formatting and Interpolation
String formatting and interpolation in programming languages provide mechanisms to embed dynamic values into strings using templates, enabling the construction of formatted output such as logs, user interfaces, or reports. These techniques range from traditional C-style functions that use placeholders for substitution to modern interpolation syntax that integrates expressions directly into string literals, offering improved readability and reduced boilerplate code.
In C, the sprintf function from the standard library (<stdio.h>) exemplifies the classic approach, where a format string specifies placeholders like %s for strings, %d for integers, or %f for floats, followed by arguments to fill them. For instance, sprintf(buf, "%s %d", "hello", 42); produces the string "hello 42" in a buffer buf, supporting advanced options such as alignment (e.g., %-10s for left-justified) and precision (e.g., %.2f for two decimal places). This method, defined in the ISO C standard, operates at runtime and requires careful buffer management to avoid overflows, with no built-in type safety beyond basic format matching.
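Python's % operator accepts largely the same conversion specifiers as C's sprintf, so the placeholders described above can be tried without any buffer management. A brief sketch with arbitrary values:

```python
# printf-style formatting; the specifiers mirror those of C's sprintf.
basic = "%s %d" % ("hello", 42)
left = "%-10s|" % "left"      # left-justified in a 10-character field
fixed = "%.2f" % 3.14159      # two digits after the decimal point
```

The trailing "|" in the second example only makes the padding visible.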
Java adopts a similar printf-inspired model through the String.format static method in the java.lang.String class, which uses printf-style format specifiers such as %s for strings and %d for integers, with support for numbered arguments (e.g., %1$s). For example, String.format("Hello %s, age %d", "Alice", 30); yields "Hello Alice, age 30". Introduced in Java 5, this API is built on the java.util.Formatter class and supports flags for width, precision, and padding, such as "%10.2f" for a float right-aligned in a ten-character field with two decimals. As with C's sprintf, the compiler does not validate the format string: a specifier that does not match its argument throws an exception such as IllegalFormatConversionException at runtime, although static-analysis tools (for example Error Prone, which provides a @FormatMethod annotation) and some IDEs can flag such mistakes earlier. Locale-aware number and date output is available through the String.format(Locale, String, Object...) overload, while java.text.MessageFormat offers a separate pattern-based API.
Python introduced formatted string literals, or f-strings, in version 3.6 via PEP 498, allowing direct expression evaluation within strings prefixed by f, such as f"{name} is {age}" where name and age are variables. This interpolation syntax supports format specifiers inside braces (e.g., f"{value:10.2f}" for aligned floats). The literal is parsed when the source is compiled, so a malformed f-string is a syntax error, but the embedded expressions are evaluated at runtime and receive no static type checking. F-strings improve upon Python's older %-formatting (similar to C) and str.format() method (using {} placeholders), offering concise syntax without method calls, and they handle nested formatting for dates or custom objects via the __format__ protocol.
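The three Python styles named above and the __format__ protocol can be compared side by side. In this sketch the Celsius class is a hypothetical type introduced only for illustration, and all values are arbitrary.

```python
from datetime import date

class Celsius:
    """Hypothetical type used only to illustrate the __format__ protocol."""
    def __init__(self, degrees):
        self.degrees = degrees

    def __format__(self, spec):
        # Delegate the format spec to the underlying float, then add a unit.
        return format(self.degrees, spec or ".1f") + " C"

name, age, value = "Alice", 30, 3.14159

via_percent = "%s is %d" % (name, age)
via_format = "{} is {}".format(name, age)
via_fstring = f"{name} is {age}"

aligned = f"{value:10.2f}"             # width 10, two decimals, right-aligned
temp = f"{Celsius(21.456):.2f}"        # calls Celsius.__format__(".2f")
day = f"{date(2024, 1, 31):%Y-%m-%d}"  # date.__format__ accepts strftime codes
```

All three basic forms produce the same text; the f-string simply places the expressions where their values appear.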
JavaScript employs template literals, introduced in ECMAScript 2015 (ES6), using backticks and ${} for interpolation, as in `Hello ${name}, you are ${age} years old.`, which substitutes variable values seamlessly. This feature, specified in the ECMAScript standard, supports tagged templates for custom processing and multiline strings, with runtime evaluation but no inherent compile-time type checking in most environments; placeholders can include expressions like ${age > 18 ? 'adult' : 'minor'}, and the built-in Intl API complements it with locale-specific number formatting (e.g., new Intl.NumberFormat('de-DE').format(value)). Compared to concatenation (a simpler alternative for basic cases), template literals reduce error-prone string joining.
Across these languages, type safety varies: in C, a mismatch between a format specifier and its argument is undefined behavior that the language itself does not check (though many compilers warn about it); JavaScript simply converts interpolated values to strings; Java reports mismatches as runtime exceptions; and Python verifies only the syntax of an f-string at compile time. Locale support enhances internationalization, as seen in Java's MessageFormat for pattern-based substitution with cultural adaptations (e.g., comma vs. period decimals), and similar capabilities in Python's strftime for dates or JavaScript's Intl API, ensuring formatted strings adapt to user locales without hardcoded adjustments.
Advanced or Specialized Functions
String Partitioning
String partitioning functions divide a string into three parts based on the first occurrence of a specified separator: the substring before the separator, the separator itself, and the substring after it. This approach provides a structured way to extract components around a delimiter without fully splitting the string into multiple pieces. Such functions are particularly useful in languages that emphasize readability and rapid prototyping, though they are not universally available across programming paradigms.[99]
In Python, the str.partition(sep) method implements this functionality by returning a tuple containing the three parts. For instance, "a:b:c".partition(":") yields ("a", ":", "b:c"), capturing only the first delimiter while preserving the remainder intact. If the separator is not found, it returns a tuple with the original string followed by two empty strings, such as "abc".partition(":") producing ("abc", "", ""). An empty separator raises a ValueError, ensuring the method is used with a valid delimiter. This design supports efficient parsing in scripting contexts.[99]
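The three cases described above (separator found, separator absent, empty separator) can be checked directly. The variable names in this sketch are arbitrary.

```python
head, sep, tail = "a:b:c".partition(":")

# Separator absent: the whole string comes back first, followed by two empty strings.
missing = "abc".partition(":")

# An empty separator is rejected with ValueError.
try:
    "abc".partition("")
    empty_sep_error = None
except ValueError as exc:
    empty_sep_error = str(exc)
```

Because the result always has exactly three elements, it can be unpacked without first checking whether the separator was present.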
In Ruby, the String#partition(sep) method works similarly, returning a three-element array with the part before the match, the match itself, and the part after. For example, "a:b:c".partition(":") yields ["a", ":", "b:c"]. If the separator is not found, it returns the original string followed by two empty strings, such as ["abc", "", ""]. Ruby's implementation also supports regular expressions as separators.[100]
Unlike Python and Ruby, many languages lack a built-in partitioning function, requiring manual implementation. In Java, the String class provides split(regex, limit), which achieves a similar result when the limit is 2 (at most two resulting elements), but it discards the separator and returns an array rather than including it explicitly; for example, "a:b:c".split(":", 2) gives ["a", "b:c"], necessitating additional logic to recover the separator.[101] In C, string partitioning must be implemented manually using functions like strstr from <string.h> to locate the delimiter, followed by pointer arithmetic or strncpy to extract parts, as no standard library method returns the three components directly.
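The manual approach described for languages without a built-in (locate the delimiter, then slice around it) can be sketched as follows. It is written in Python for consistency with the other examples; manual_partition is a hypothetical helper, and str.find stands in for C's strstr.

```python
def manual_partition(text, sep):
    """Hypothetical helper: split `text` around the first `sep` by hand."""
    if not sep:
        raise ValueError("empty separator")
    i = text.find(sep)          # analogous to locating the delimiter with strstr
    if i == -1:
        return text, "", ""     # separator not found
    return text[:i], sep, text[i + len(sep):]

found = manual_partition("a:b:c", ":")
not_found = manual_partition("abc", ":")
```

The same index arithmetic carries over to other languages: the text before index i is the head, and the tail starts at i plus the separator's length.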
Python also offers str.rpartition(sep) as a variant that splits at the last occurrence of the separator, returning ("a:b", ":", "c") for "a:b:c".rpartition(":") and two empty strings followed by the original if not found; this contrasts with partition by focusing on the end of the string.[102] Ruby provides a similar rpartition method for the last occurrence.[103]
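A short sketch of rpartition, including the not-found case in which the original string lands in the last position. The file name is an arbitrary example.

```python
last = "a:b:c".rpartition(":")
absent = "abc".rpartition(":")

# A typical use: separating the final extension from the rest of a file name.
stem, dot, ext = "archive.tar.gz".rpartition(".")
```

Only the last "." is used as the split point, so the stem keeps the earlier one.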
Common use cases for partitioning include parsing simple key-value pairs in configuration files or protocols, or separating the scheme from the remainder of a URL: "scheme://host/path".partition("://") yields ("scheme", "://", "host/path"). This feature is available in scripting languages like Python and Ruby. In other languages like Perl, split with a limit can mimic partitioning, avoiding the overhead of full decomposition.[99]
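The key-value use case can be sketched as a small parser. The function name parse_config, the "=" delimiter, and the "#" comment convention are assumptions made for illustration rather than features of any particular configuration format.

```python
def parse_config(lines):
    """Hypothetical parser for simple `key=value` lines."""
    settings = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue                      # skip blanks and comments
        key, sep, value = line.partition("=")
        if not sep:
            continue                      # no '=' on this line; ignore it
        settings[key.strip()] = value.strip()
    return settings

config = parse_config(["host = example.org", "# comment", "query=a=b", "", "broken"])
```

Since only the first "=" is treated as the delimiter, a value such as "a=b" survives intact, which a full split on "=" would not guarantee.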
String Reversal
String reversal is a fundamental string operation that inverts the order of characters within a string, often implemented differently across programming languages due to variations in string mutability and built-in support. In languages with immutable strings, reversal typically produces a new string, while mutable strings may allow in-place modification for efficiency. This operation is not universally standardized, appearing as a dedicated built-in in some languages but requiring manual implementation or auxiliary data structures in others, particularly for algorithmic exercises or palindrome checks.
Perl provides a built-in reverse function that reverses a string when called in scalar context, returning a new string with characters in inverted order; for example, scalar reverse("abc") yields "cba". In list context the same function instead reverses the order of its list arguments, so print reverse("abc") prints "abc" unchanged. In Ruby, the reverse method returns a new string with the characters reversed, such as "abc".reverse yielding "cba"; there is also a reverse! method for in-place reversal on mutable strings.[104] In contrast, Python lacks a direct string reversal method but offers the reversed() iterator, which can be converted to a string via ''.join(reversed(s)) for a reversed copy; this approach iterates from the end without modifying the original immutable string.
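The Python approach described above can be sketched in two lines. Slicing with a step of -1 is another common idiom that produces the same result.

```python
s = "abc"

joined = "".join(reversed(s))  # reversed() yields characters from the end
sliced = s[::-1]               # extended slice with step -1
```

Both expressions build a new string and leave s unchanged.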
Java, with its immutable String class, requires the mutable StringBuilder for efficient reversal, where the reverse() method inverts the contents in place and returns the modified builder, which can then be converted back to a string via toString(). For instance, new StringBuilder("abc").reverse().toString() produces "cba". Similarly, C++ supports in-place reversal of mutable std::string objects using the std::reverse algorithm from <algorithm>, which swaps elements across the range delimited by the string's begin() and end() iterators; this is efficient for large strings as it avoids allocation of a new object. In C, where strings are null-terminated character arrays, no built-in reversal exists, necessitating a manual loop to swap characters from the ends toward the center, often implemented with two pointers for O(n) time complexity.
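The two-pointer swap described for C can be sketched on any mutable sequence. The example below is written in Python for consistency with the other examples and uses a bytearray as a stand-in for a C character array; reverse_in_place is a hypothetical helper name.

```python
def reverse_in_place(chars):
    """Reverse a mutable sequence of characters by swapping from both ends."""
    left, right = 0, len(chars) - 1
    while left < right:
        chars[left], chars[right] = chars[right], chars[left]
        left += 1
        right -= 1
    return chars

buf = bytearray(b"hello")   # mutable buffer, standing in for a C char array
reverse_in_place(buf)
odd = reverse_in_place(list("abc"))
empty = reverse_in_place([])
```

The loop performs about n/2 swaps and allocates nothing, and it handles empty and odd-length inputs without special cases because the condition left < right fails as soon as the indices meet.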
Edge cases in string reversal include empty strings, which remain empty upon reversal, and strings of odd length, where the middle character stays in place without special handling in most implementations. Unicode support introduces complexities. Reversing by code unit can split surrogate pairs (Java's StringBuilder.reverse() treats each pair as a single character to avoid this), and reversing by code point, as Python does, can separate combining marks from their base characters. With bidirectional text (e.g., mixing Latin and Arabic scripts), naive reversal may also disrupt the logical reading order on which the Unicode Bidirectional Algorithm operates, so additional processing may be required for correct visual rendering.
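Combining marks are one concrete case in which code-point reversal goes wrong. The sketch below contrasts naive reversal with a simplified variant that keeps each mark attached to its base character; reverse_keep_marks is a hypothetical helper and does not implement full Unicode grapheme cluster segmentation.

```python
import unicodedata

def reverse_keep_marks(text):
    """Reverse `text` while keeping combining marks after their base character.

    Simplified sketch: it groups each base code point with the combining
    marks that follow it; emoji sequences and other multi-code-point
    clusters are not handled.
    """
    clusters = []
    for ch in text:
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch      # attach the mark to the preceding base
        else:
            clusters.append(ch)
    return "".join(reversed(clusters))

word = "cafe\u0301"                 # 'e' followed by COMBINING ACUTE ACCENT
naive = word[::-1]                  # the accent now precedes the 'e'
careful = reverse_keep_marks(word)
```

In the naive result the accent is left at the start of the string with no base character before it, while the cluster-based version keeps it on the "e".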
Despite its utility in puzzles, data processing, and certain algorithms like checking for palindromes, dedicated string reversal is relatively rare as a core built-in across languages, with many (e.g., JavaScript, which uses s.split('').reverse().join('')) relying on decomposition into arrays for reversal via generic collection methods. This scarcity reflects a design philosophy prioritizing general-purpose operations over specialized ones, often leaving reversal to user-defined functions for portability.