Comparison of programming languages (syntax)
This article compares the syntax of many notable programming languages.
Expressions
Programming language expressions can be broadly classified into four syntax structures:
- prefix notation
  - Lisp: (* (+ 2 3) (expt 4 5))
- infix notation
  - Fortran: (2 + 3) * (4 ** 5)
- suffix, postfix, or Reverse Polish notation
  - Forth: 2 3 + 4 5 ** *
- math-like notation
  - TUTOR: (2 + 3)(4⁵) $$ note implicit multiply operator
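To make the notations concrete, here is a minimal Python sketch that evaluates the same expression written in infix, prefix and postfix form. The helpers eval_prefix and eval_postfix are hypothetical, the prefix form is modelled as nested tuples rather than parsed Lisp text, and ** stands for exponentiation as in the Fortran and Forth examples.

```python
import operator

# Binary operators shared by the three notations; ** is exponentiation.
OPS = {"+": operator.add, "*": operator.mul, "**": operator.pow}


def eval_prefix(expr):
    """Evaluate a Lisp-style prefix expression given as nested tuples,
    e.g. ("*", ("+", 2, 3), ("**", 4, 5))."""
    if isinstance(expr, int):
        return expr
    op, left, right = expr
    return OPS[op](eval_prefix(left), eval_prefix(right))


def eval_postfix(source):
    """Evaluate a Forth-style postfix (RPN) string using a value stack."""
    stack = []
    for token in source.split():
        if token in OPS:
            right = stack.pop()  # the top of the stack is the right operand
            left = stack.pop()
            stack.append(OPS[token](left, right))
        else:
            stack.append(int(token))
    (result,) = stack  # a well-formed expression leaves exactly one value
    return result


infix = (2 + 3) * (4 ** 5)  # Python itself uses infix notation
prefix = eval_prefix(("*", ("+", 2, 3), ("**", 4, 5)))
postfix = eval_postfix("2 3 + 4 5 ** *")
print(infix, prefix, postfix)  # 5120 5120 5120
```

The postfix evaluator needs no parentheses or precedence rules because operand order is fixed by the stack, which is the main practical appeal of Reverse Polish notation.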
Statement delimitation
A language that supports the statement construct typically has rules for one or more of the following aspects:
- Statement terminator – marks the end of a statement
- Statement separator – demarcates the boundary between two statements; not needed for the last statement
- Line continuation – escapes a newline to continue a statement on the next line
Some languages define a special character as a terminator while some, called line-oriented, rely on the newline. Typically, a line-oriented language includes a line continuation feature whereas other languages have no need for line continuation since newline is treated like other whitespace. Some line-oriented languages provide a separator for use between statements on one line.
| Language | Statement delimitation |
|---|---|
| ABAP | period separated |
| Ada | semicolon terminated |
| ALGOL | semicolon separated |
| ALGOL 68 | semicolon and comma separated[1] |
| APL | newline terminated, ⋄ (diamond) separated |
| AppleScript | newline terminated |
| AutoHotkey | newline terminated |
| Awk | newline or semicolon terminated |
| BASIC | newline terminated, colon separated |
| Boo | newline terminated |
| C | semicolon terminated, comma separated expressions |
| C++ | semicolon terminated, comma separated expressions |
| C# | semicolon terminated |
| COBOL | whitespace separated, sometimes period separated, optionally separated with commas and semicolons |
| Cobra | newline terminated |
| CoffeeScript | newline terminated |
| CSS | semicolon terminated |
| D | semicolon terminated |
| Eiffel | newline terminated, semicolon separated |
| Erlang | comma separated, period terminated |
| F# | newline terminated, semicolon separated |
| Fortran | newline terminated, semicolon separated |
| Forth | semicolons terminate word definitions; space terminates word use |
| GFA BASIC | newline terminated |
| Go | semicolon separated (inserted by compiler) |
| Haskell | in do-notation: newline separated, in do-notation with braces: semicolon separated |
| Java | semicolon terminated |
| JavaScript | semicolon separated (but often inserted as statement terminator) |
| Kotlin | semicolon separated (but sometimes implicitly inserted on newlines) |
| Lua | whitespace separated (semicolon optional) |
| Mathematica a.k.a. Wolfram | semicolon separated |
| MATLAB | newline terminated, separated by semicolon or comma (semicolon – result of preceding statement hidden, comma – result displayed) |
| MUMPS a.k.a. M | newline terminates line-scope, the closest to a "statement" that M has, a space separates/terminates a command, allowing another command to follow |
| Nim | newline terminated |
| Object Pascal (Delphi) | semicolon separated |
| Objective-C | semicolon terminated |
| OCaml | semicolon separated |
| Pascal | semicolon separated |
| Perl | semicolon separated |
| PHP | semicolon terminated |
| Pick Basic | newline terminated, semicolon separated |
| PowerShell | newline terminated, semicolon separated |
| Prolog | comma separated (conjunction), semicolon separated (disjunction), period terminated (clause) |
| Python | newline terminated, semicolon separated |
| R | newline terminated, semicolon separated[2] |
| Raku | semicolon separated |
| Red | whitespace separated |
| Ruby | newline terminated, semicolon separated |
| Rust | semicolon terminated, comma separates expressions |
| Scala | newline terminated, semicolon separated |
| Seed7 | semicolon separated (semicolon termination is allowed) |
| Simula | semicolon separated |
| S-Lang | semicolon separated |
| Smalltalk | period separated |
| Standard ML | semicolon separated |
| Swift | semicolon separated (inserted by compiler) |
| Tcl | newline or semicolon terminated |
| V (Vlang) | newline terminated, comma or semicolon separated |
| Visual Basic | newline terminated, colon separated |
| Visual Basic (.NET) | newline terminated, colon separated |
| Xojo | newline terminated |
| Zig | semicolon terminated |
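As an illustration of one newline-terminated language from the table, the following Python sketch uses the standard ast module to count how many statements the parser sees. The helper count_statements is hypothetical.

```python
import ast

# Python is newline terminated; a semicolon may separate statements on one line.
one_line = "a = 1; b = 2; print(a + b)"
two_lines = "a = 1\nb = 2\nprint(a + b)"


def count_statements(source):
    """Return how many top-level statements the parser finds in source."""
    return len(ast.parse(source).body)


print(count_statements(one_line), count_statements(two_lines))  # 3 3

# A trailing semicolon is also accepted, so it can act as an optional terminator.
print(count_statements("x = 1;"))  # 1
```

Both spellings produce the same three statements, which is what "newline terminated, semicolon separated" means in practice.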
Line continuation
Listed below are notable line-oriented languages that provide for line continuation. Unless otherwise noted, the continuation marker must be the last text of the line.
- Backslash
  - bash[3] and other Unix shells
  - C preprocessor macros; used in conjunction with C, C++ and many other programming contexts
  - Mathematica, Wolfram Language
  - Python[4]
  - Ruby
  - JavaScript – only within single- or double-quoted strings
  - Vimscript – as the first character of the continued line
- Ellipsis (three dots)
  - MATLAB: The ellipsis need not end the line, but text following it is ignored.[5] It begins a comment that extends through (including) the first subsequent newline. Contrast this with a line comment, which extends until the next newline.
- Comma
  - Ruby: comment may follow delimiter
- Left bracket
  - Batch file: starting a parenthetical block can allow line continuation[6]
  - Ruby: left parenthesis, left square bracket, or left curly bracket
- Operator
  - Ruby: as last object of line; comment may follow operator
  - AutoHotkey: as the first character of the continued line; any expression operators except ++ and --, and a comma or a period[7]
- Some form of line comment serves as line continuation
  - Turbo Assembler: \
  - m4: dnl
  - TeX: %
- Character position
  - Fortran 77: A non-comment line is a continuation of the prior non-comment line if any non-space character appears in column 6. Comment lines cannot be continued.
  - COBOL: String constants may be continued by not ending the original string in a PICTURE clause with ', then inserting a - in column 7 (the same position used by * for a comment).
  - TUTOR: Lines starting with a tab (after any indentation required by the context) continue the prior command.
The C compiler concatenates adjacent string literals even if on separate lines, but this is not line continuation syntax as it works the same regardless of the kind of whitespace between the literals.
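Python, listed above, supports both backslash continuation and implicit continuation inside brackets. This sketch shows both, plus adjacent string literal concatenation, which (as with C) is not itself a continuation mechanism. The helper continuation_allows_comment is hypothetical; it checks that, unlike Ruby's comma or operator continuation, Python does not allow a comment after the backslash.

```python
# Explicit line joining: a backslash must be the last character of the line.
total = 1 + \
        2 + \
        3

# Implicit line joining: newlines inside (), [] or {} need no marker.
items = [
    "alpha",
    "beta",
]

# Adjacent string literals are concatenated, as in C. The parentheses make
# the newline between them legal; the concatenation itself is not continuation.
message = ("Hello, "
           "world")


def continuation_allows_comment():
    """Return True if a comment may follow a backslash continuation."""
    source = "x = 1 + \\  # comment\n    2\n"
    try:
        compile(source, "<demo>", "exec")
    except SyntaxError:
        return False
    return True


print(total, items, message)          # 6 ['alpha', 'beta'] Hello, world
print(continuation_allows_comment())  # False
```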
Consuming external software
Languages support a variety of ways to reference and consume other software in the syntax of the language. In some cases this is importing the exported functionality of a library, package or module but some mechanisms are simpler text file include operations.
Import can be classified by level (module, package, class, procedure,...) and by syntax (directive name, attributes,...).
- File include
  - #include <filename> or #include "filename" – C preprocessor used in conjunction with C and C++ and other development tools
- File import
  - addpath(directory) – MATLAB[8]
  - COPY filename. – COBOL
  - import <filename>; or import "filename"; – C++
  - :-include("filename"). – Prolog
  - #include file="filename" – ASP
  - #include <filename> or #include "filename" – AutoHotkey, AutoIt
  - #import "filename" or #import <filename> – Objective-C
  - Import["filename"] – Mathematica, Wolfram Language
  - include 'filename' – Fortran
  - include "filename"; – PHP
  - include [filename] program or #include [filename] program – Pick Basic
  - include!("filename"); – Rust
  - load "filename" – Ruby
  - load %filename – Red
  - require('filename') – Lua
  - require "filename"; – Perl, PHP
  - require "filename" – Ruby
  - source("filename") – R
  - @import("filename"); – Zig
- Package import
  - #include filename – C
  - import module; – C++
  - #[path = "filename"] mod altname; – Rust
  - @import module; – Objective-C
  - <<name – Mathematica, Wolfram Language
  - :-use_module(module). – Prolog
  - from module import * – Python
  - extern crate libname; or extern crate libname as altname; or mod modname; – Rust
  - library("package") – R
  - IMPORT module – Oberon
  - import altname "package/name" – Go
  - import package.module; or import altname = package.module; – D
  - import Module or import qualified Module as M – Haskell
  - import package.* – Java, MATLAB, Kotlin
  - import "modname"; – JavaScript
  - import altname from "modname"; – JavaScript
  - import package or import package._ – Scala
  - import module – Swift
  - import module – V (Vlang)
  - import module – Python
  - require('modname') – Lua
  - require "gem" – Ruby
  - use module – Fortran 90+
  - use module, only : identifier – Fortran 90+
  - use Module; – Perl
  - use Module qw(import options); – Perl
  - use Package.Name – Cobra
  - uses unit – Pascal
  - with package – Ada
  - @import("pkgname"); – Zig
- Class import
  - from module import Class – Python
  - import package.class – Java, MATLAB, Kotlin
  - import class from "modname"; – JavaScript
  - import {class} from "modname"; – JavaScript
  - import {class as altname} from "modname"; – JavaScript
  - import package.class – Scala
  - import package.{ class1 => alternativeName, class2 } – Scala
  - import package._ – Scala
  - use Namespace\ClassName; – PHP
  - use Namespace\ClassName as AliasName; – PHP
  - using namespace::subnamespace::Class; – C++
- Procedure/function import
  - from module import function – Python
  - import package.module : symbol; – D
  - import package.module : altsymbolname = symbol; – D
  - import Module (function) – Haskell
  - import function from "modname"; – JavaScript
  - import {function} from "modname"; – JavaScript
  - import {function as altname} from "modname"; – JavaScript
  - import package.function – MATLAB
  - import package.class.function – Scala
  - import package.class.{ function => alternativeName, otherFunction } – Scala
  - use Module ('symbol'); – Perl
  - use function Namespace\function_name; – PHP
  - use function Namespace\function_name as function_alias_name; – PHP
  - using namespace::subnamespace::symbol; – C++
  - use module::submodule::symbol; – Rust
  - use module::submodule::{symbol1, symbol2}; – Rust
  - use module::submodule::symbol as altname; – Rust
- Constant import
  - use const Namespace\CONST_NAME; – PHP
The above statements can also be classified by whether they are a syntactic convenience (allowing things to be referred to by a shorter name, but they can still be referred to by some fully qualified name without import), or whether they are actually required to access the code (without which it is impossible to access the code, even with fully qualified names).
- Syntactic convenience
  - import package.* – Java
  - import package.class – Java
  - open module – OCaml
  - using namespace namespace::subnamespace; – C++
  - use module::submodule::*; – Rust
- Required to access code
  - import module; – C++
  - import altname "package/name" – Go
  - import altname from "modname"; – JavaScript
  - import module – Python
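As a rough illustration of these distinctions in Python, the sketch below imports the same function in several ways and then loads a module from a string with importlib.import_module. It only demonstrates Python's behaviour; the other languages in the lists differ.

```python
import importlib
import math                    # binds the module object to the name "math"
from math import sqrt          # binds one function to a shorter name
from math import sqrt as root  # the same function under an alias

# All three names refer to the very same function object, so the short
# names are a convenience once the module has been imported.
assert sqrt is math.sqrt is root
print(root(16.0))  # 4.0

# A module can also be loaded at run time from a string naming it.
stats = importlib.import_module("statistics")
print(stats.median([3, 1, 2]))  # 2
```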
Block delimitation
A block is a grouping of code that is treated collectively. Many block syntaxes can consist of any number of items (statements, expressions or other units of code) – including one or zero. Languages delimit a block in a variety of ways – some via marking text and others by relative formatting such as levels of indentation.
- Curly braces (a.k.a. curly brackets): {...}
  - Curly brace languages: A defining aspect of curly brace languages is that they use curly braces to delimit a block.
- Parentheses: (...)
- Square brackets: [...]
- begin...end
  - Ada, ALGOL, F# (verbose syntax),[9] Pascal, Ruby (for, do/while & do/until loops), OCaml, SCL, Simula, Erlang.
- do...end
- do...done
  - Bash (for & while loops), F# (verbose syntax),[9] Visual Basic, Fortran, TUTOR (with mandatory indenting of block body), Visual Prolog
- do...end
- X...end (e.g. if...end):
  - Ruby (if, while, until, def, class, module statements), OCaml (for & while loops), MATLAB (if & switch conditionals, for & while loops, try clause, package, classdef, properties, methods, events, & function blocks), Lua (then/else & function)
- (begin ...)
- (progn ...)
- (do ...)
- Indentation
  - Off-side rule languages: Boo, Cobra, CoffeeScript, F#, Haskell (in do-notation when braces are omitted), LiveScript, occam, Python, Nemerle (optional; the user may use whitespace-sensitive syntax instead of the curly-brace syntax if they so desire), Nim, Scala (optional, as in Nemerle)
  - Free-form languages: most descendants of ALGOL (including C, Pascal, and Perl); Lisp languages
- Others
  - Ada, Visual Basic, Seed7: if...end if
  - ALGOL 68: begin...end, (...), if...fi, do...od
  - APL: :If...:EndIf or :If...:End
  - Bash, sh, and ksh: if...fi, do...done, case...esac;
  - COBOL: IF...END-IF, PERFORM...END-PERFORM, etc. for statements; ... . for sentences.
  - Lua, Pascal, Modula-2, Seed7: repeat...until
  - Small Basic: If...EndIf, For...EndFor, While...EndWhile
  - Visual Basic (.NET): If...End If, For...Next, Do...Loop
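Python is one of the off-side rule languages listed above. This sketch, built around a hypothetical parses helper that wraps the built-in compile, shows that indentation alone delimits a block and that malformed indentation is rejected with an IndentationError.

```python
def parses(source):
    """Return True if source compiles as Python, False on an indentation error."""
    try:
        compile(source, "<demo>", "exec")
    except IndentationError:
        return False
    return True


# The indented lines form the body of the if-block; no braces or "end" keyword.
well_formed = "if x > 0:\n    y = 1\n    z = 2\n"
# The block body must be indented; here it is not.
missing_indent = "if x > 0:\ny = 1\n"
# Dedenting to a level that matches no enclosing block is also an error.
bad_dedent = "if x > 0:\n        y = 1\n    z = 2\n"

print(parses(well_formed), parses(missing_indent), parses(bad_dedent))  # True False False
```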
Comments
With respect to a language definition, the syntax of comments can be classified in many ways, including:
- Line vs. block – a line comment starts with a delimiter and continues to the end of the line (newline marker) whereas a block comment starts with one delimiter and ends with another and can cross lines
- Nestable – whether a block comment can be inside another block comment
- How parsed with respect to the language; tools (including compilers and interpreters) may also parse comments but that may be outside the language definition
Other ways to categorize comments that are outside a language definition:
- Inline vs. prologue – an inline comment follows code on the same line and a prologue comment precedes program code to which it pertains; line or block comments can be used as either inline or prologue
- Support for API documentation generation which is outside a language definition
Line comment
| Symbol | Languages |
|---|---|
| C | Fortran I to Fortran 77 (C in column 1) |
| REM | BASIC, Batch files, Visual Basic |
| :: | Batch files, COMMAND.COM, cmd.exe |
| NB. | J; from the (historically) common abbreviation Nota bene, the Latin for "note well". |
| ⍝ | APL; the mnemonic is that the glyph (jot overstruck with shoe-down) resembles a desk lamp, and hence "illuminates" the foregoing. |
| # | Boo, Bourne shell and other UNIX shells, Cobra, Perl, Python, Ruby, Seed7, PowerShell, PHP, R, Make, Maple, Elixir, Julia, Nim[10] |
| % | TeX, Prolog, MATLAB,[11] Erlang, S-Lang, Visual Prolog, PostScript |
| // | ActionScript, Boo, C (C99), C++, C#, D, F#, Go, Java, JavaScript, Kotlin, Object Pascal (Delphi), Objective-C, PHP, Rust, Scala, Sass, Swift, Xojo, V (Vlang), Zig |
| ' | Monkey, Visual Basic, VBScript, Small Basic, Gambas, Xojo |
| ! | Factor, Fortran, Basic Plus, Inform, Pick Basic |
| ; | Most assembly languages, AutoHotkey, AutoIt, Lisp, Common Lisp, Clojure, PGN, Rebol, Red, Scheme |
| -- | Euphoria, Haskell, SQL, Ada, AppleScript, Eiffel, Lua, VHDL, SGML, PureScript, Elm |
| * | Assembler S/360 (* in column 1), COBOL I to COBOL 85, PAW, Fortran IV to Fortran 77 (* in column 1), Pick Basic, GAMS (* in column 1) |
| \|\| | Curl |
| " | Vimscript, ABAP |
| \ | Forth |
| *> | COBOL 2002 |
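For the # line comment used by Python and several other languages in the table, the standard tokenize module reports where comments actually are. This sketch (the helper line_comments is hypothetical) shows an inline and a prologue comment being found, and a # inside a string literal being ignored.

```python
import io
import tokenize


def line_comments(source):
    """Return the text of every # line comment found by Python's tokenizer."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return [tok.string for tok in tokens if tok.type == tokenize.COMMENT]


source = (
    "x = 1  # inline comment follows code\n"
    "# prologue comment on its own line\n"
    "s = '# not a comment: inside a string literal'\n"
)
print(line_comments(source))
# ['# inline comment follows code', '# prologue comment on its own line']
```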
Block comment
In these examples, ~ represents the comment content, and the text around it forms the delimiters. Whitespace (including newlines) is not considered part of the delimiters.
| Syntax | Languages |
|---|---|
| comment ~ ; | ALGOL 60, SIMULA |
| ¢ ~ ¢, # ~ #, co ~ co, comment ~ comment | ALGOL 68[12][13] |
| /* ~ */ | ActionScript, AutoHotkey, C, C++, C#, CSS, D,[14] Go, Java, JavaScript, Kotlin, Objective-C, PHP, PL/I, Prolog, Rexx, Rust (can be nested), Scala (can be nested), SAS, SASS, SQL, Swift (can be nested), V (Vlang), Visual Prolog |
| #cs ~ #ce | AutoIt[15] |
| /+ ~ +/ | D (can be nested)[14] |
| /# ~ #/ | Cobra (can be nested) |
| <# ~ #> | PowerShell |
| <!-- ~ --> | HTML, XML |
| =begin ~ =cut | Perl (Plain Old Documentation) |
| #`( ~ ) | Raku (bracketing characters can be (), <>, {}, [], any Unicode characters with BiDi mirrorings, or Unicode characters with Ps/Pe/Pi/Pf properties) |
| =begin ~ =end | Ruby |
| #<TAG> ~ #</TAG>, #stop ~ EOF, #iffalse ~ #endif, #ifntrue ~ #endif, #if false ~ #endif, #if !true ~ #endif | S-Lang[16] |
| {- ~ -} | Haskell (can be nested) |
| (* ~ *) | Delphi, ML, Mathematica, Object Pascal, Pascal, Seed7, AppleScript, OCaml (can be nested), Standard ML (can be nested), Maple, Newspeak, F# |
| { ~ } | Delphi, Object Pascal, Pascal, PGN, Red |
| {# ~ #} | Nunjucks, Twig |
| {{! ~ }} | Mustache, Handlebars |
| {{!-- ~ --}} | Handlebars (cannot be nested, but may contain {{ and }}) |
| \|# ~ #\| | Curl |
| %{ ~ %} | MATLAB[11] (each symbol must be on a line of its own) |
| #\| ~ \|# | Lisp, Scheme, Racket (can be nested in all three) |
| #= ~ =# | Julia[17] |
| #[ ~ ]# | Nim[18] |
| --[[ ~ ]], --[=[ ~ ]=], --[==[ ~ ]==] etc. | Lua (brackets can have any number of matching = characters; can be nested within non-matching delimiters) |
| " ~ " | Smalltalk |
| (comment ~ ) | Clojure |
| #If COMMENT Then ~ #End If[a] | Visual Basic (.NET) |
| #if COMMENT ~ #endif[b] | C# |
| ' comment _, REM comment _[c] | Classic Visual Basic, VBA, VBScript |
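The difference between nestable and non-nestable block comments can be illustrated with a small scanner. The strip_block_comments function below is a hypothetical sketch, not any compiler's algorithm: it ignores string literals and other lexical context, and only shows how a C-style /* ... */ comment ends at the first */ while a nestable comment tracks its depth.

```python
def strip_block_comments(text, open_="/*", close="*/", nestable=False):
    """Remove block comments from text.

    With nestable=False the first close delimiter ends the comment (C style);
    with nestable=True a depth counter pairs each open with its own close
    (as Rust and Swift do for /* */, Haskell for {- -}, OCaml for (* *)).
    """
    out = []
    depth = 0
    i = 0
    while i < len(text):
        if text.startswith(open_, i) and (depth == 0 or nestable):
            depth += 1
            i += len(open_)
        elif depth > 0 and text.startswith(close, i):
            depth -= 1
            i += len(close)
        else:
            if depth == 0:  # only text outside comments is kept
                out.append(text[i])
            i += 1
    if depth:
        raise ValueError("unterminated block comment")
    return "".join(out)


code = "a /* outer /* inner */ still outer */ b"
print(repr(strip_block_comments(code)))                 # 'a  still outer */ b'
print(repr(strip_block_comments(code, nestable=True)))  # 'a  b'
```

In the non-nestable case the leftover "still outer */" would be parsed as code, which is why commenting out a region that already contains a C-style block comment usually fails.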
Unique variants
- Fortran
In Fortran 66/77, the column in which text appears is significant. The actual statement is in columns 7 through 72 of a line. Any non-space character in column 6 indicates that this line is a continuation of the prior line. A 'C' in column 1 indicates that this entire line is a comment. Columns 1 through 5 may contain a number which serves as a label. Columns 73 through 80 are ignored and may be used for comments; in the days of punched cards, these columns often contained a sequence number so that the deck of cards could be sorted into the correct order if someone accidentally dropped the cards. Fortran 90 removed the need for the indentation rule and added line comments, using the ! character as the comment delimiter.
- COBOL
In fixed format code, line indentation is significant. Columns 1–6 and columns from 73 onwards are ignored. If a * or / is in column 7, then that line is a comment. Until COBOL 2002, if a D or d was in column 7, it would define a "debugging line" which would be ignored unless the compiler was instructed to compile it.
- Cobra
Cobra supports block comments with "/# ... #/" which is like the "/* ... */" often found in C-based languages, but with two differences. The # character is reused from the single-line comment form "# ...", and the block comments can be nested which is convenient for commenting out large blocks of code.
- Curl
Curl supports block comments with user-defined tags as in |foo# ... #foo|.
- Lua
Like raw strings, there can be any number of equals signs between the square brackets, provided both the opening and closing tags have a matching number of equals signs; this allows nesting as long as nested block comments/raw strings use a different number of equals signs than their enclosing comment: --[[comment --[=[ nested comment ]=] ]]. Lua discards the first newline (if present) that directly follows the opening tag.
- Perl
Block comments in Perl are considered part of the documentation, and are given the name Plain Old Documentation (POD). Technically, Perl does not have a convention for including block comments in source code, but POD is routinely used as a workaround.
- PHP
PHP supports standard C/C++ style comments, and also supports Perl-style # comments.
- Python
Using triple quotes to comment out lines of source does not actually form a comment.[19] The enclosed text becomes a string literal, which Python usually ignores (except when it is the first statement in the body of a module, class or function; see docstring).
- Elixir
The above trick used in Python also works in Elixir, but the compiler will emit a warning if it spots this. To suppress the warning, one would need to prepend the sigil ~S (which prevents string interpolation) to the triple-quoted string, leading to the final construct ~S""" ... """. In addition, Elixir supports a limited form of block comments as an official language feature, but as in Perl, this construct is entirely intended to write documentation. Unlike in Perl, it cannot be used as a workaround, being limited to certain parts of the code and throwing errors or even suppressing functions if used elsewhere.[20]
- Raku
Raku uses #`(...) to denote block comments.[21] Raku actually allows the use of any "right" and "left" paired brackets after #` (i.e. #`(...), #`[...], #`{...}, #`<...>, and even the more complicated #`{{...}} are all valid block comments). Brackets are also allowed to be nested inside comments (i.e. #`{ a { b } c } goes to the last closing brace).
- Ruby
A block comment in Ruby opens at a =begin line and closes at an =end line.
- S-Lang
The region of lines enclosed by the #<tag> and #</tag> delimiters is ignored by the interpreter. The tag name can be any sequence of alphanumeric characters and may be used to indicate how the enclosed block is to be deciphered. For example, #<latex> could indicate the start of a block of LaTeX formatted documentation.
- Scheme and Racket
The next complete syntactic component (s-expression) can be commented out with #; .
- ABAP
ABAP supports two different kinds of comments. If the first character of a line, including indentation, is an asterisk (*), the whole line is considered a comment, while a single double quote (") begins an in-line comment which acts until the end of the line. ABAP comments are not possible between the statements EXEC SQL and ENDEXEC because Native SQL has other usages for these characters. In most SQL dialects, the double dash (--) can be used instead.
- Esoteric languages
Many esoteric programming languages follow the convention that any text not executed by the instruction pointer (e.g., Befunge) or otherwise assigned a meaning (e.g., Brainfuck), is considered a "comment".
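The Python note above can be checked with the standard ast module. In this sketch the function names greet and f are illustrative: the triple-quoted strings survive as expression statements in the syntax tree, while the # comment does not appear at all, and only the first string becomes the docstring.

```python
import ast


def greet():
    """Return a greeting."""  # docstring: first statement of the body
    """This string is evaluated and discarded; it is not a comment."""
    return "hello"


print(greet.__doc__)  # Return a greeting.

# The parser keeps both strings as expression statements (Expr nodes),
# whereas the # comment leaves no trace in the syntax tree.
tree = ast.parse(
    'def f():\n    """doc"""\n    """not a comment"""\n    # real comment\n    return 1\n'
)
body = tree.body[0].body
print([type(node).__name__ for node in body])  # ['Expr', 'Expr', 'Return']
print(ast.get_docstring(tree.body[0]))         # doc
```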
Comment comparison
There is a wide variety of syntax styles for declaring comments in source code.
BlockComment is used here as a placeholder for the content of a block comment.
LineComment is used here as a placeholder for the content of a line comment.
| Language | In-line comment | Block comment |
|---|---|---|
| Ada, Eiffel, Euphoria, Occam, SPARK, ANSI SQL, and VHDL | -- LineComment | |
| ALGOL 60 | | comment BlockComment; |
| ALGOL 68 | | ¢ BlockComment ¢ |
| APL | ⍝ LineComment | |
| AppleScript | -- LineComment | (* BlockComment *) |
| Assembly language (varies) | ; LineComment (one example; most assembly languages use line comments only) | |
| AutoHotkey | ; LineComment | /* BlockComment */ |
| AWK, Bourne shell, C shell, Maple, PowerShell | # LineComment | <# BlockComment #> (PowerShell only) |
| Bash | # LineComment | <<EOF BlockComment EOF, : ' BlockComment ' |
| BASIC (various dialects) | 'LineComment (not all dialects) | |
| C (K&R, ANSI/C89/C90), CHILL, PL/I, REXX | | /* BlockComment */ |
| C (C99), C++, Go, Swift, JavaScript, V (Vlang) | // LineComment | /* BlockComment */ |
| C# | // LineComment, /// LineComment (XML documentation comment) | /* BlockComment */, /** BlockComment */ (XML documentation comment), #if COMMENT (compiler directive)[b] |
| COBOL I to COBOL 85 | * LineComment (* in column 7) | |
| COBOL 2002 | *> LineComment | |
| Curl | \|\| LineComment | \|# BlockComment #\| |
| Cobra | # LineComment | /# BlockComment #/ (nestable) |
| D | // LineComment, /// Documentation LineComment (ddoc comments) | /* BlockComment */, /** Documentation BlockComment */ (ddoc comments) |
| DCL | $! LineComment | |
| ECMAScript (JavaScript, ActionScript, etc.) | // LineComment | /* BlockComment */ |
| Elixir | # LineComment | ~S""" BlockComment """, @doc """ BlockComment """ (documentation, only works in modules), @moduledoc (module documentation), @typedoc (type documentation) |
| Forth | \ LineComment | ( BlockComment ) (single line and multiline) |
| FORTRAN I to FORTRAN 77 | C LineComment (C in column 1) | |
| Fortran 90 and later | ! LineComment | #if 0 BlockComment #endif[d] |
| Haskell | -- LineComment | {- BlockComment -} |
| J | NB. | |
| Java | // LineComment | /* BlockComment */ |
| Julia | # LineComment | #= BlockComment =# |
| Lisp, Scheme | ; LineComment | #\| BlockComment \|# |
| Lua | -- LineComment | --[==[ BlockComment ]==] (variable number of = signs, nestable with delimiters with different numbers of = signs) |
| Maple | # LineComment | (* BlockComment *) |
| Mathematica | | (* BlockComment *) |
| MATLAB | % LineComment | %{ BlockComment %}[e] |
| Nim | # LineComment | #[ BlockComment ]# |
| Object Pascal | // LineComment | (* BlockComment *), { BlockComment } |
| OCaml | | (* BlockComment (* nestable *) *) |
| Pascal, Modula-2, Modula-3, Oberon, ML | | (* BlockComment *) |
| Perl, Ruby | # LineComment | =begin BlockComment =cut (=end in Ruby) (POD documentation comment) |
| PGN, Red | ; LineComment | { BlockComment } |
| PHP | # LineComment, // LineComment | /* BlockComment */, /** Documentation BlockComment */ (PHP Doc comments) |
| PILOT | R:LineComment | |
| PLZ/SYS | | ! BlockComment ! |
| PL/SQL, TSQL | -- LineComment | /* BlockComment */ |
| Prolog | % LineComment | /* BlockComment */ |
| Python | # LineComment | ''' BlockComment ''' (documentation string when first line of module, class, method, or function) |
| R | # LineComment | |
| Raku | # LineComment | #`{ BlockComment } |
| Rust | // LineComment | /* BlockComment */ (nestable) |
| SAS | | * BlockComment;, /* BlockComment */ |
| Seed7 | # LineComment | (* BlockComment *) |
| Simula | | comment BlockComment;, ! BlockComment; |
| Smalltalk | | "BlockComment" |
| Smarty | | {* BlockComment *} |
| Standard ML | | (* BlockComment *) |
| TeX, LaTeX, PostScript, Erlang, S-Lang | % LineComment | |
| Texinfo | @c LineComment | |
| TUTOR | * LineComment, command $$ LineComment | |
| Visual Basic | ' LineComment, Rem LineComment | ' BlockComment _, Rem BlockComment _[c] |
| Visual Basic (.NET) | ' LineComment | #If COMMENT Then BlockComment #End If[a] |
| Visual Prolog | % LineComment | /* BlockComment */ |
| Wolfram Language | | (* BlockComment *) |
| Xojo | ' LineComment, // LineComment, rem LineComment | |
| Zig | // LineComment, /// LineComment (doc comment), //! LineComment (top-level doc comment) | |
See also
- C syntax
- C++ syntax
- Curly bracket programming languages, a broad family of programming language syntaxes
- Java syntax
- JavaScript syntax
- PHP syntax and semantics
- Python syntax and semantics
References
- ^ Three different kinds of clauses, each separates phrases and the units differently:
- serial-clause using go-on-token (viz. semicolon): begin a; b; c end – units are executed in order.
- collateral-clause using and-also-token (viz. ","): begin a, b, c end – order of execution is to be optimised by the compiler.
- parallel-clause using and-also-token (viz. ","): par begin a, b, c end – units must be run in parallel threads.
- ^ From the R Language Definition, section 3.2 Control structures: "A semicolon always indicates the end of a statement while a new line may indicate the end of a statement. If the current statement is not syntactically complete new lines are simply ignored by the evaluator."
- ^ Bash Reference Manual, 3.1.2.1 Escape Character
- ^ Python Documentation, 2. Lexical analysis: 2.1.5. Explicit line joining
- ^ "Mathworks.com". Archived from the original on 7 February 2010.
- ^ "Parenthesis/Brackets - Windows CMD - SS64.com". ss64.com.
- ^ "Scripts - Definition & Usage | AutoHotkey".
- ^ For an M-file (MATLAB source) to be accessible by name, its parent directory must be in the search path (or current directory).
- ^ a b c "Verbose Syntax - F# | Microsoft Learn". Microsoft Learn. 5 November 2021. Retrieved 17 November 2022.
- ^ "Nim Manual". nim-lang.org.
- ^ a b "Mathworks.com". Archived from the original on 21 November 2013. Retrieved 25 June 2013.
- ^ "Algol68_revised_report-AB.pdf on PDF pp. 61–62, original document pp. 121–122" (PDF). Retrieved 27 May 2014.
- ^ "HTML Version of the Algol68 Revised Report AB". Archived from the original on 17 March 2013. Retrieved 27 May 2014.
- ^ a b "DLang.org, Lexical". Retrieved 27 May 2014.
- ^ "AutoItScript.com Keyword Reference, #comments-start". Retrieved 27 May 2014.
- ^ "slang-2.2.4/src/slprepr.c – line 43 to 113". Archived from the original on 21 November 2017. Retrieved 28 May 2014.
- ^ "Punctuation · The Julia Language".
- ^ "Nim Manual". nim-lang.org.
- ^ "Python tip: You can use multi-line strings as multi-line comments", 11 September 2011, Guido van Rossum
- ^ "Writing Documentation — Elixir v1.12.3". Retrieved 28 July 2023.
- ^ "Perl 6 Documentation (Syntax)". docs.perl6.org. Comments. Retrieved 5 April 2017.
- ^ "Using the FPP Preprocessor". Archived from the original on 18 November 2022. Retrieved 18 November 2022.
- ^ "Perl 6 POD Comments". 25 May 2023.
- ^ "Perl 6 POD (Abbreviated Blocks)". 25 May 2023.
Notes
- ^ Visual Basic (.NET) does not support traditional multi-line comments, but they can be emulated through compiler directives.
- ^ a b While C# supports traditional block comments /* ... */, compiler directives can be used to mimic them just as in VB.NET.
- ^ a b The line continuation character _ can be used to extend a single-line comment to the next line without needing to type ' or REM again. This can be done up to 24 times in a row.
- ^ Fortran does not support traditional block comments, but some compilers support preprocessor directives in the style of C/C++, allowing a programmer to emulate multi-line comments.[22]
- ^ Both percent–bracket symbols must be the only non-whitespace characters on their respective lines.
…end in Ruby).[1] These differences arise from historical influences, such as ALGOL 60's introduction of formal Backus-Naur Form (BNF) syntax description, which standardized rules for subsequent languages, and design choices prioritizing orthogonality or simplicity.[1]
Empirical studies underscore syntax's role as a barrier for novice programmers, revealing that traditional C-style syntax in languages like Java and Perl offers no significant accuracy advantage over randomized keywords, whereas more intuitive designs in Python, Ruby, and Quorum—deviating from C conventions—correlate with higher comprehension and fewer errors among beginners.[2] Factors like case sensitivity (e.g., enforced in languages like C, C++, and Java) and the use of reserved words versus keywords further influence writability and maintainability, with overly complex syntax in languages like PL/I leading to criticism for reduced readability.[1] Overall, syntactic comparisons inform language selection for education, software development, and compiler design, emphasizing trade-offs between expressiveness and ease of use.[2]
Lexical Elements
Comments
Comments serve as non-executable annotations in programming languages, allowing developers to include explanatory text, documentation, or debugging notes within source code without affecting program execution; they are typically stripped or ignored during compilation or interpretation.[3] The primary purposes include enhancing code readability, facilitating maintenance, and enabling temporary code exclusion for testing.[4]
Line comments, which apply from a delimiter to the end of the current line, are a common mechanism for single-line annotations. In Python, comments begin with the # symbol, and all subsequent characters until the newline are ignored.[4] C++ uses // as the delimiter for line comments, a style shared by related languages such as Java.[3] Early versions of BASIC employed REM (short for "remark") to start a full-line comment, treating the entire line as non-executable.[5]
Block comments enclose multi-line text between paired delimiters, providing a convenient way to comment out larger code sections. In C and C++, /* opens a block comment that continues until the next */; these comments do not nest, because an inner /* is not recognized and the first */ ends the whole comment. Perl offers =begin followed by a label (e.g., =begin comment) and =end for block-style comments, particularly useful in POD documentation sections, though ordinary code comments rely on per-line # markers.[6]
Certain languages feature specialized comment variants for enhanced documentation.
Python's docstrings, delimited by triple quotes (""" or '''), are multi-line string literals that, when placed as the first statement of a module, class, or function body and not assigned to a variable, serve documentation purposes and are accessible via the __doc__ attribute, effectively acting like ignored comments.[7] Java extends the /* */ block comment with /** */ for Javadoc, enabling structured API documentation generation from source code.[8] In HTML, and in scripting contexts such as JavaScript embedded in HTML, <!-- opens a comment that can span lines until -->.[9]
| Language | Line Comment Delimiter | Block Comment Delimiter | Nesting Supported | Limitations/Notes |
|---|---|---|---|---|
| Python | # | """ (docstring) | N/A | No true block comments; docstrings are string literals used for docs.[7] |
| C/C++ | // (C99+ for C) | /* */ | No | // has always been valid in C++ and was added to C in C99; block comments may span lines but do not nest.[3] |
| Java | // | `/* */` or `/** */` (Javadoc) | No | Javadoc variant generates HTML docs.[8] |
| Perl | # | =begin/=end | Yes (POD) | Primarily for documentation; code blocks use multiple # lines.[6] |
| BASIC | REM | N/A | N/A | Full-line only; many modern variants also accept an apostrophe (').[5] |
| HTML (scripting) | N/A | `<!-- -->` | No | Spans lines; used in markup and embedded scripts.[9] |
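For a concrete comparison in one language, the following Python sketch (the function name area is purely illustrative) shows that a # comment leaves no trace at run time, while a docstring is kept as the function's `__doc__` attribute:

```python
def area(width, height):
    """Return the area of a width-by-height rectangle."""
    # A line comment: everything after '#' is ignored by the interpreter.
    return width * height  # comments may also trail code on the same line


# The docstring is an ordinary string object stored on the function.
doc = area.__doc__
```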
Identifiers and Keywords
In programming languages, identifiers are names used to denote variables, functions, classes, and other entities, while keywords are predefined reserved words that hold special syntactic meaning and cannot be used as identifiers. These elements form the foundational lexical structure for naming in code, influencing readability, portability, and error prevention across languages.[10] Identifier syntax typically allows a starting character from letters or underscores, followed by letters, digits, and sometimes other symbols, though specifics vary. For instance, in C, identifiers consist of an initial letter (uppercase or lowercase Latin) or underscore, followed by letters, digits, or underscores, with support for Unicode via escape sequences (universal character names) since C99. Similarly, Java permits an unlimited sequence starting with a Java letter (Unicode characters where Character.isJavaIdentifierStart returns true, including A-Z, a-z, _, or $) followed by Java letters or digits, enabling international scripts like Chinese or Arabic.[11] Python follows Unicode standards, allowing initial characters from ASCII letters, underscore, or specific Unicode categories (e.g., Lu for uppercase letters, Lo for other letters), with subsequent characters including digits and connector punctuation.[10] In contrast, Common Lisp symbols (serving as identifiers) use constituent characters such as alphanumerics, with escapes for special characters, but permit arbitrary strings via vertical bars.[12]
Most modern languages treat identifiers as case-sensitive, distinguishing between uppercase and lowercase, which promotes precision but requires careful typing. C, Java, and Python are case-sensitive; for example, variable and Variable represent distinct identifiers in Python.[10][11] However, Pascal is case-insensitive, treating MyVar and myvar as identical, a design choice rooted in its origins on limited-character displays. Length limits are generally absent in high-level languages like Python and Java, but C implementations must recognize at least 63 significant characters for internal identifiers since C99.[10][11]
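A small Python sketch of these rules, relying only on the built-in str.isidentifier() method. That method applies Python's lexical definition, including the non-ASCII letters admitted by PEP 3131, but it does not check whether a name is a keyword (hence "for" passes). The sample names are arbitrary:

```python
# str.isidentifier() applies Python's lexical rules for identifiers.
samples = ["total", "_private", "café", "2fast", "my-var", "for"]
valid = {name: name.isidentifier() for name in samples}

# Identifiers are case-sensitive: these are two distinct variables.
variable = 1
Variable = 2
```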
Keywords are fixed sets of reserved strings that the compiler or interpreter recognizes for control flow, types, and operations, preventing their reuse as identifiers to avoid ambiguity. In C, examples include if, while, and return, totaling 52 keywords in C23, with additional reserved prefixes like double underscores.[13] Java reserves 51 keywords such as class, public, and interface, plus literals like true and null.[14] Common Lisp uses symbols like defun for function definition and if for conditionals as part of its COMMON-LISP package, though not strictly "reserved" in the same way due to its dynamic nature.
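In Python, the standard keyword module exposes the interpreter's reserved words, and the compiler rejects any attempt to bind one. A minimal sketch (the candidate words are arbitrary examples):

```python
import keyword

candidates = ["if", "while", "return", "class", "print", "total"]
# print is an ordinary built-in function in Python 3, not a keyword.
reserved = [word for word in candidates if keyword.iskeyword(word)]

# Binding a keyword as a variable name is a compile-time error.
try:
    compile("class = 5", "<example>", "exec")
    keyword_as_name_ok = True
except SyntaxError:
    keyword_as_name_ok = False
```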
Naming conventions, while not enforced by syntax, often guide identifier formation for clarity; Hungarian notation, devised by Charles Simonyi in the 1970s and later popularized at Microsoft, prefixes identifiers with type indicators (e.g., iCount for an integer counter), influencing practices in C++ and Windows API code despite not being syntactic.[15] Modern languages like Python 3 and Java support Unicode identifiers natively, allowing non-ASCII characters (e.g., café as a variable name in Python via PEP 3131), broadening accessibility for international developers.[16][11]
To use reserved words as identifiers, languages provide escaping mechanisms. In standard SQL, double quotes delimit identifiers, permitting keywords like select as a column name (e.g., "select"). MySQL extends this with backticks for identifiers containing specials or reserves (e.g., `order` as a table name), ensuring compatibility with SQL keywords.
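The quoting rules can be tried with Python's built-in sqlite3 module. SQLite is used here only as a convenient stand-in: it accepts standard double-quoted identifiers and, for MySQL compatibility, backticks as well. The table and column names are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Standard SQL: double quotes turn keywords into ordinary identifiers.
conn.execute('CREATE TABLE "order" ("select" INTEGER, "from" TEXT)')
conn.execute('INSERT INTO "order" ("select", "from") VALUES (?, ?)', (1, "web"))
# MySQL-style backticks, which SQLite also accepts.
rows = conn.execute("SELECT `select`, `from` FROM `order`").fetchall()
conn.close()
```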
Literals and Constants
Literals and constants in programming languages provide syntactic notations for fixed values that do not change during execution, such as numbers, strings, and booleans, serving as fundamental building blocks for expressions.[17] These elements are typically defined in the language's lexical grammar and must adhere to strict syntax rules to ensure unambiguous parsing by the compiler or interpreter. Differences across languages arise in supported formats, escape mechanisms, and additional features like radix prefixes or separators, reflecting design choices for readability, precision, and compatibility with underlying hardware representations.[18] Numeric literals represent fixed numerical values, with integer and floating-point forms being ubiquitous. Integer literals commonly support decimal notation, while many languages offer alternative bases: hexadecimal (prefixed by 0x or 0X, e.g., 0xFF in C), binary (0b, e.g., 0b1010 in Python), and octal (0 or 0o, e.g., 012 in C or 0o10 in Python).[19] Floating-point literals typically include a decimal point and optional exponent (e.g., 3.14 or 1e-3 in Python and Java).[20][21] Some languages, like Rust and Python (since version 3.6), permit underscores as digit separators for improved readability, such as 1_000 for one thousand, without affecting the value.[22][23]
| Language | Integer Examples | Floating-Point Examples | Notes |
|---|---|---|---|
| C/C++ | 42 (decimal), 0xFF (hex), 052 (octal), 0b101 (binary since C++14 for C++, C23 for C) | 3.14, 1.0e3 | Suffixes like U for unsigned, L for long.[24] |
| Python | 42, 0xFF, 0b101, 0o52 | 3.14, 1e3 | Underscores allowed (e.g., 1_000); arbitrary precision integers.[19] |
| Java | 42, 0xFF, 052 (octal), 0b101 (since Java 7) | 3.14, 1.0e3 | Suffixes like L for long, F for float.[25] |
| Rust | 42, 0xFF, 0b101, 0o52 | 3.14, 1e3 | Underscores for separation (e.g., 1_000); type suffixes like i32.[26] |
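The Python forms from the table can be checked directly. This sketch also shows that Python 3, unlike C, rejects a leading-zero octal literal such as 052:

```python
hex_value = 0xFF        # 255
binary_value = 0b1010   # 10
octal_value = 0o52      # 42
million = 1_000_000     # underscores are ignored (PEP 515, Python 3.6+)
small = 1e-3            # float with an exponent, equal to 0.001

# Python 3 rejects C-style leading-zero octal literals.
try:
    compile("052", "<example>", "eval")
    legacy_octal_ok = True
except SyntaxError:
    legacy_octal_ok = False
```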
Statements and Delimitation
Statement Delimitation
Statement delimitation in programming languages refers to the syntactic rules that mark the end of a single statement or separate consecutive statements, ensuring unambiguous parsing by compilers or interpreters.[35] These mechanisms vary widely, reflecting design choices that balance readability, error-proneness, and historical influences from early computing environments.[36] Semicolon-based delimitation is prevalent in languages derived from C, where a semicolon (;) explicitly terminates each statement, including the last one in a block. For example, C code such as int x = 5; printf("%d\n", x); requires a semicolon after the declaration and after the expression statement to signal completion, aiding precise tokenization during compilation; Java follows the same rule.[35] In Go, semicolons are part of the formal grammar but are rarely written, because the lexer automatically inserts them at line ends after tokens that can end a statement, such as identifiers, literals, or closing brackets. JavaScript makes semicolons optional through automatic semicolon insertion (ASI), which inserts one at a line break when the next token cannot legally continue the statement; explicit semicolons guard against cases ASI handles counterintuitively, such as a line beginning with ( or [ being joined to the previous line, or a return followed by a line break returning undefined.[37] This approach reduces visual clutter but can lead to subtle bugs if ASI misinterprets code intent.[35]
Newline-based delimitation treats the end of a physical line as the natural boundary for statements, eliminating punctuation needs in many cases. Languages like Python and Ruby rely on this, where a newline typically concludes a simple statement, as in Python's x = 5 followed by a newline before the next command.[18] In Python, compound statements (e.g., if blocks) use colons and indentation for structure, but individual lines within them end at newlines unless explicitly continued with a backslash or implicitly continued inside parentheses, brackets, or braces.[18] Ruby similarly uses newlines for termination, allowing multiple statements per line only if separated by semicolons, which is rare in practice.[35] Other examples include BCPL and REXX, where newlines act as separators without requiring additional tokens.[35] This method promotes concise, readable code but demands careful handling of multi-line expressions through escape characters or parentheses.
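As a minimal Python illustration, the source text below is run with exec(): newlines end simple statements and a semicolon can separate two of them on one line, but a compound statement such as if cannot follow a semicolon:

```python
namespace = {}
# Each newline ends a simple statement; a semicolon separates two on one line.
source = "x = 5\ny = x + 1; z = y * 2\n"
exec(source, namespace)

# Only simple statements may be joined with semicolons.
try:
    compile("x = 1; if x: pass\n", "<example>", "exec")
    compound_after_semicolon_ok = True
except SyntaxError:
    compound_after_semicolon_ok = False
```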
Keyword-based delimitation employs reserved words to explicitly close statements, often making punctuation optional. In BASIC variants, keywords like END IF or NEXT terminate control structures, while simple statements end implicitly at line ends.[35] Shell scripting languages, such as Bash, use keywords like fi for if statements or done for loops, with newlines or semicolons separating commands in sequences. Algol 68 closes constructs with reversed keywords such as fi, od, and esac, and Eiffel uses end, so structure is delimited without relying on punctuation.[35] This approach improves clarity in nested constructs but can increase verbosity.
Errors from improper delimitation differ by method: in semicolon-based languages like C and Java, omitting a semicolon often results in syntax errors where the subsequent token is parsed as part of the prior statement, leading to cryptic compiler messages such as "expected ';' before 'int'".[35] For instance, int x = 5 int y = 10; fails because the second int is misinterpreted. In newline-based systems like Python, missing or mismatched indentation after a newline triggers an IndentationError, emphasizing structural alignment over punctuation.[18] Keyword omissions, as in shell scripts, may cause unclosed structure errors like "unexpected end of file" if fi is absent. These variances highlight how delimiter choice affects debugging, with punctuation-based systems prone to overlooked tokens and indentation-based ones sensitive to whitespace.[35]
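The Python side of this can be observed with the built-in compile() function. In this sketch, error_name is a hypothetical helper that reports which SyntaxError subclass a snippet triggers:

```python
def error_name(source):
    """Compile source and return the name of the error raised, or None."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError as exc:  # IndentationError is a subclass of SyntaxError
        return type(exc).__name__
    return None


missing_indent = error_name("if True:\nprint('hi')\n")  # body not indented
missing_separator = error_name("x = 5 y = 10\n")        # two statements, no newline
well_formed = error_name("x = 5\ny = 10\n")
```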
Historically, statement delimitation evolved from fixed-format punch-card systems in early languages like FORTRAN (1957), where column positions and line ends implicitly delimited statements without punctuation.[38] Algol 60 introduced semicolons as separators between statements (not terminators), influencing Pascal, while C's 1972 adoption of semicolons as terminators, requiring one after the last statement, sparked ongoing debates dubbed the "Semicolon Wars" over verbosity versus precision.[36] Modern editors and IDEs mitigate these issues by auto-inserting delimiters, completing a trajectory from punch-card rigidity toward flexible, editor-assisted syntax in languages like Python (1991).[36] This progression reflects a shift from hardware-constrained formats to human-readable designs.[39]
| Method | Languages | Key Characteristics | Common Error Example |
|---|---|---|---|
| Semicolon-based | C, Java, Go, JavaScript | Explicit terminator; optional in some via ASI | Missing ; causes token misparse |
| Newline-based | Python, Ruby, BCPL, REXX | Line end as boundary; indentation for blocks in Python | IndentationError on whitespace mismatch |
| Keyword-based | BASIC, Bash, Algol 68 | Reserved words close structures; line ends for simples | Unclosed keyword leads to EOF error |
Line Continuation
Line continuation in programming languages refers to syntactic mechanisms that allow a single logical statement or expression to span multiple physical lines in source code, primarily to enhance readability without altering semantics. This feature addresses the limitations of fixed line lengths in editors and terminals, enabling developers to format complex code structures more clearly. Unlike statement delimitation, which separates distinct statements (often using semicolons or newlines), line continuation operates within a single statement to join lines implicitly or explicitly.[18] One common explicit method uses the backslash (\) character at the end of a line to escape the newline and continue the statement on the next line. In Python, a physical line ending with a backslash (not part of a string literal or comment) is joined with the following line to form a logical line, though this approach is generally discouraged in favor of implicit methods due to potential issues like inability to continue comments or tokens.[18] For example:
total = item_one + \
item_two + \
item_three
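In C and Java, a newline is ordinary whitespace inside an expression, so a statement can simply be broken across lines before its terminating semicolon; C's backslash-newline splicing is needed mainly inside preprocessor directives such as multi-line macros:
int total = item_one +
    item_two +
    item_three;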
Python also joins lines implicitly inside parentheses, brackets, and braces, which PEP 8 recommends over backslashes:
total = (item_one
         + item_two
         + item_three)
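In Ruby, a statement continues implicitly onto the next line when a line ends with a binary operator (such as +, -, *, /, &&, ||, or =) or a method call dot (.), allowing expressions to flow across lines without explicit escapes; backslashes are supported but avoided except for string literals per community style guides.[42]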
total = item_one +
item_two +
item_three
F# relies on indentation-sensitive syntax, so continuation lines are indented relative to the start of the expression:
let total = itemOne +
            itemTwo +
            itemThree
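SQL treats line breaks outside string literals as ordinary whitespace, so long queries are conventionally broken before clause keywords such as SELECT, FROM, or operators like JOIN.[44]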
SELECT column1,
column2
FROM table1
JOIN table2 ON table1.id = table2.id
WHERE condition = true;
In Java, where line terminators count as whitespace, method chains are commonly broken before each dot:
String result = someObject
    .method1()
    .method2(param1, param2)
    .method3();
| Language | Method | Example Trigger | Key Source |
|---|---|---|---|
| Python | Explicit backslash | End of line with \ | Python Docs |
| Python | Implicit parentheses | Open (, [, { | PEP 8 |
| C | Explicit backslash | \ before newline | C99 Rationale |
| Ruby | Implicit operator/dot | After +, ., etc. | Ruby Style Guide |
| F# | Indentation-based | Indent continuation lines | .NET F# Guide |
| SQL | Whitespace-agnostic | Any line break outside literals | SQL Style Guide |
| Java | Implicit whitespace | Newline in expressions | JLS §3.6 |
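The two Python mechanisms in the table can be compared directly. This sketch (variable names follow the examples above) also confirms that a backslash cannot be followed by a comment on the same line:

```python
item_one, item_two, item_three = 1, 2, 3

# Explicit joining: a backslash immediately before the newline.
total_explicit = item_one + \
    item_two + \
    item_three

# Implicit joining inside parentheses, the style PEP 8 prefers.
total_implicit = (item_one
                  + item_two
                  + item_three)

# Anything after the backslash, even a comment, is a syntax error.
try:
    compile("x = 1 + \\  # note\n2\n", "<example>", "exec")
    comment_after_backslash_ok = True
except SyntaxError:
    comment_after_backslash_ok = False
```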
Expressions
Expressions in programming languages are syntactic constructs that evaluate to values, typically formed by combining operands (such as variables, literals, or subexpressions) with operators. These constructs enable computation without altering control flow, distinguishing them from statements.[46][47] Operator syntax for basic computations varies slightly across languages but follows common patterns for arithmetic, logical, and bitwise operations. Arithmetic operators, including addition (+), subtraction (-), multiplication (*), and division (/), are infix in most languages, placed between operands; for instance, a + b computes the sum in C, C++, Java, and Python.[48][49][50] Logical operators for conjunction and disjunction include && (short-circuit AND) and || (short-circuit OR) in C-like languages, while Python uses the keywords and and or with equivalent short-circuiting behavior.[51][52][53] Bitwise operators, such as & (AND), | (OR), and ^ (XOR), employ the same infix notation in C, C++, Java, and Python, operating on integer operands bit by bit.[54][55]
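A short Python sketch of the bitwise operators and of short-circuiting; the helper check is hypothetical and only records which operands were evaluated. It also shows that Python's and and or return one of their operands rather than a strict Boolean:

```python
a, b = 0b1100, 0b1010

bit_and = a & b   # 0b1000 == 8
bit_or = a | b    # 0b1110 == 14
bit_xor = a ^ b   # 0b0110 == 6

calls = []


def check(value):
    """Record that this operand was evaluated, then return it unchanged."""
    calls.append(value)
    return value


# 'and' stops at the first falsy operand, so check(True) is never called.
short_circuit = check(False) and check(True)

# 'or' returns its first truthy operand (or the last one).
fallback = 0 or "default"
```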
Precedence and associativity rules dictate evaluation order in expressions with multiple operators, preventing ambiguity. In C-like languages such as C, C++, and Java, operators follow a PEMDAS-like hierarchy: multiplicative operators (*, /, %) bind tighter than additive ones (+, -), followed by shifts (<<, >>), relational and equality comparisons, the bitwise operators (&, ^, |), and finally the logical operators (&&, ||); the ternary operator (?:) has the lowest precedence among these and associates right-to-left.[56][57] The following table summarizes precedence levels for representative C++ operators, numbered as in cppreference's C++ precedence table, where a lower number binds more tightly and omitted levels (such as relational and equality operators) are skipped; Java and C share nearly identical rules:
| Precedence | Category | Operators | Associativity |
|---|---|---|---|
| 5 | Multiplicative | *, /, % | Left-to-right |
| 6 | Additive | +, - | Left-to-right |
| 7 | Shift | <<, >> | Left-to-right |
| 11 | Bitwise AND | & | Left-to-right |
| 12 | Bitwise XOR | ^ | Left-to-right |
| 13 | Bitwise OR | \| | Left-to-right |
| 14 | Logical AND | && | Left-to-right |
| 15 | Logical OR | \|\| | Left-to-right |
| 16 | Ternary conditional | ? : | Right-to-left |
In Python, arithmetic operators bind more tightly than shifts and bitwise operators, which, unlike in C, in turn bind more tightly than comparisons; the Boolean operators (not, and, or) sit at lower levels, with or lowest among them, and binary operators associate left-to-right except the right-associative power operator (**), while comparisons chain so that a < b < c means a < b and b < c.[58] In contrast, Lisp dialects like Common Lisp employ prefix notation in s-expressions, where operators precede operands within parentheses, such as (+ 1 (* 2 3)); this fully parenthesized structure eliminates the need for precedence rules, as nesting explicitly governs order.[59] Haskell assigns fixities to infix operators via precedence levels (0–9, with 9 highest) and associativity (left, right, or none), but function application binds most tightly, allowing uniform treatment of functions and operators.[60]
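The following Python expressions illustrate several of these rules; the comments give the grouping Python applies, and the mask example notes how C would group the same tokens differently:

```python
power_chain = 2 ** 3 ** 2       # right-associative: 2 ** (3 ** 2) == 512
mixed = 1 + 2 * 3               # * before +: 1 + 6 == 7
negated_power = -2 ** 2         # ** binds tighter than unary minus: -(2 ** 2) == -4
mask_test = 6 & 1 == 0          # Python: (6 & 1) == 0, which is True
                                # C groups the same tokens as 6 & (1 == 0), giving 0
chained = 1 < 2 < 3             # chained comparison: (1 < 2) and (2 < 3)
bool_mix = not False or False   # not > and > or: (not False) or False
```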
The ternary conditional operator provides a compact way to select between two expressions based on a condition. In C, C++, and Java, it uses the syntax condition ? expression1 : expression2, evaluating to expression1 if condition is true and expression2 otherwise, with right-to-left associativity.[56][61] Functional languages like Haskell integrate conditionals directly as expressions via if condition then expression1 else expression2, which evaluates to one of the branches and supports lazy evaluation.[62]
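Python spells the same construct as a conditional expression, x if condition else y, with the operands in a different order. A minimal sketch (sign_label is an illustrative name), which also shows that only the selected branch is evaluated:

```python
def sign_label(n):
    # Python's analogue of C's  n >= 0 ? "non-negative" : "negative"
    return "non-negative" if n >= 0 else "negative"


# Only the selected branch is evaluated, so the division by zero never runs.
lazy = 1 if True else 1 / 0
```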
Lambda expressions offer concise syntax for defining anonymous functions within expressions. In C#, the lambda operator => separates parameters from the body, as in x => x * x for a squaring function; this supports both expression and statement bodies.[63] Python uses the lambda keyword followed by parameters and a colon-separated expression, such as lambda x: x * x, restricting lambdas to single expressions without statements.[64]
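The Python form in particular can be sketched as follows; the word list is arbitrary, and the final check shows that a statement such as return is not allowed in a lambda body:

```python
squares = list(map(lambda x: x * x, range(5)))

words = ["banana", "kiwi", "apple"]
# Sort by last letter; the lambda serves as a one-off key function.
by_last_letter = sorted(words, key=lambda w: w[-1])

# A lambda body must be a single expression, so a statement is rejected.
try:
    compile("f = lambda x: return x", "<example>", "exec")
    statement_in_lambda_ok = True
except SyntaxError:
    statement_in_lambda_ok = False
```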
Expressions in imperative languages often permit side effects, allowing computation alongside state mutation. In C++, the pre-increment operator ++i evaluates to the incremented value of i while modifying i as a side effect; combining such operations in one expression must respect the language's sequencing rules (sequence points in C and pre-C++11 C++, sequenced-before relations since C++11) to avoid undefined behavior.[47][65]
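Python has no ++ operator (++i merely applies unary plus twice), but since Python 3.8 the assignment expression := performs a binding inside an expression. A minimal sketch with illustrative names:

```python
values = [3, 8, 1, 9]

# ':=' binds n and also yields its value inside the condition (Python 3.8+).
if (n := len(values)) > 3:
    message = f"{n} items"
else:
    message = "few items"

# '++i' is just +(+i): no increment and no side effect.
i = 5
j = ++i

# Augmented assignment is a statement, so it cannot be nested in an expression.
try:
    compile("y = (x += 1)", "<example>", "exec")
    nested_augmented_ok = True
except SyntaxError:
    nested_augmented_ok = False
```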
Control Structures
Block Delimitation
Block delimitation in programming languages refers to the syntactic mechanisms used to group one or more statements into a compound block, typically to define the scope of control structures like conditionals or loops, ensuring that statements are executed together as a unit. These blocks often introduce lexical scopes where variables declared within are visible only to statements inside the block, promoting modularity and preventing namespace pollution. Common approaches include delimiter pairs, indentation, or keywords, each with implications for readability, error-proneness, and parser complexity. Brace-based delimitation, using curly braces { }, is prevalent in languages like C, C++, Java, and JavaScript, where braces explicitly enclose the statements of functions, loops, and conditionals. In these languages, braces are required whenever a control structure governs more than one statement, and they also resolve ambiguities such as the "dangling else" problem, in which an else could pair with either of two nested ifs; in C, for instance, the else below attaches to the inner if unless braces enforce a different grouping:
if (condition1)
    if (condition2)
        statement;
    else
        another_statement; // Attaches to inner if
Keyword-based delimitation uses paired words instead, such as begin...end in Pascal and Ada, or do...end in Ruby for certain contexts. In Pascal, begin initiates a compound statement and end closes it, allowing blocks in procedures and conditionals without braces or reliance on indentation, in keeping with the structured programming principles that shaped its design around 1970. Ruby uses do...end for multi-line blocks passed to iterators like each, as an alternative to the { } block syntax, which promotes expressiveness in dynamic code. This style avoids visual clutter from symbols but requires careful keyword balancing to prevent parsing errors.
Blocks in these languages generally introduce lexical scopes, where variables declared inside are not accessible outside, enforcing encapsulation; for example, in Java, a variable declared in a method's block is local to that block. This scoping rule, rooted in ALGOL's influence, is not universal: Python's if and for blocks do not create new scopes, because Python scopes variables at the function or module level.[67]
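A minimal Python sketch of the exception noted above (scope_demo is an illustrative name): names bound inside if and for blocks remain visible after the block ends, because only the enclosing function defines the scope:

```python
def scope_demo():
    if True:
        inside = "bound inside the if block"
    for i in range(3):
        pass
    # Neither block created a new scope, so both names are still visible here.
    return inside, i
```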
Early languages like Fortran imposed nesting depth limits on blocks due to compiler constraints; original Fortran I (1957) restricted DO-loop nesting to 50 levels to manage symbol table overhead on limited hardware.[68] Modern Fortran relaxes this, allowing deeper nesting without fixed limits, reflecting hardware advances. Within blocks, comments can appear as non-executable elements, but their placement follows the delimitation rules without altering scope boundaries.
Conditional Statements
Conditional statements in programming languages enable selective execution of code based on boolean conditions, forming a core aspect of control flow syntax. Across languages, the basic if-else construct evaluates a condition and executes one of two code paths, but syntactic variations reflect design philosophies: C-family languages like C and Java use parenthesized conditions and braces for blocks, emphasizing explicit structure, while Python employs indentation for blocks and keyword-based conditions for readability.[69][70][71] In C, the if-else syntax requires a parenthesized condition followed by a statement or block, with an optional else clause for the alternative path. For example:
if (condition) {
    // statements
} else {
    // statements
}
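Python instead marks each block with a colon and indentation; the Ellipsis literal (...) serves as a placeholder for the elided statements:
if condition:
    ...  # statements
else:
    ...  # statements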
For multi-way branching, Perl chains conditions with elsif after an initial if, evaluating subsequent conditions only if prior ones fail:
if (condition1) {
    # statements
} elsif (condition2) {
    # statements
} else {
    # statements
}
C has no dedicated elsif keyword, chaining else if clauses instead, and treats the braces as optional when a single statement follows. Python employs elif, a contraction of "else if," which similarly chains conditions without deep nesting:
if condition1:
    ...  # statements
elif condition2:
    ...  # statements
else:
    ...  # statements
The elif keyword streamlines readability in scripts with sequential checks, aligning with Python's emphasis on simplicity.[72][71]
Switch or case constructs provide multi-way branching for equality checks against constants, often more efficient than if-else chains for discrete values. In Java, the switch statement selects a case based on an integer or string expression, using colons after case labels and requiring break to prevent unintended continuation:
switch (expression) {
    case value1:
        // statements
        break;
    case value2:
        // statements
        break;
    default:
        // statements
}
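Rust's match expression generalizes this to pattern matching: each arm maps a pattern to code, _ serves as the catch-all, and unlike a C-style switch, arms never fall through:
match expression {
    pattern1 => { /* statements */ }
    pattern2 => { /* statements */ }
    _ => { /* default statements */ }
}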
For two-way choices that produce a value, C, C++, and Java also offer the conditional operator:
result = condition ? expression1 : expression2;
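In C, C++, and Java, execution falls through from one case label to the next unless interrupted by a statement such as break (or goto in C and C++), allowing intentional grouping of cases but risking bugs from omitted breaks: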
switch (expression) {
    case 1:
    case 2: // falls through from case 1
        // statements for both
        break;
    case 3:
        // statements
        break;
}
Iteration Statements
Iteration statements in programming languages provide mechanisms for repeating blocks of code, enabling efficient handling of repetitive tasks such as processing collections or performing computations until a condition is met. These constructs vary significantly across languages, reflecting design philosophies from imperative control flow in C-like languages to more declarative iteration in scripting languages like Python. Common forms include counted loops, condition-based loops, and collection iterators, often complemented by control modifiers like break and continue for fine-grained execution control.[71] The for loop, which descends from counted loops such as Fortran's DO and ALGOL's for statement and was popularized in its three-clause form by C, typically combines initialization, condition checking, and incrementation in a single construct. In C, the syntax is for (init; condition; increment) statement, where init declares or assigns loop variables, condition is evaluated before each iteration, and increment updates the variables after the body executes; this form supports flexible, index-based iteration over arrays or ranges.[76] In contrast, Python employs a more iterable-focused syntax: for target in iterable: body, which assigns each element of the iterable (such as a list or range object) to target sequentially, emphasizing readability over explicit indexing.[71] These differences highlight imperative versus Pythonic approaches, with C's model requiring manual counter management while Python abstracts it via built-in iterators.
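The Python side of this contrast can be sketched as follows; range() and enumerate() are the usual built-ins for counted and indexed iteration, and the data is arbitrary:

```python
fruits = ["apple", "banana", "cherry"]

# Iterable-style loop: no index variable to manage.
upper = []
for fruit in fruits:
    upper.append(fruit.upper())

# C-style counting, expressed with range(start, stop, step).
evens = []
for i in range(0, 10, 2):
    evens.append(i)

# enumerate() supplies an index when one is needed.
indexed = [(i, fruit) for i, fruit in enumerate(fruits)]
```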
Condition-based loops like while and do-while repeat a body while a boolean expression remains true. In C, the while loop uses while (condition) statement, testing the condition before executing the body and skipping the body entirely if the condition is initially false. The do-while variant, do statement while (condition);, inverts this by executing the body first and checking afterward, guaranteeing at least one iteration, which is useful for menus or validation prompts.[77] Python has no post-test loop: it offers only the pre-test while condition: body, and a do-while is usually emulated with while True: and a conditional break at the end of the body.[78]
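The usual Python emulation of a do-while loop places the test at the end of a while True body. In this sketch, read_until_valid is a hypothetical function that consumes candidate inputs until it sees a string of digits (it assumes such a string eventually appears):

```python
def read_until_valid(inputs):
    """Emulate a do-while loop: the body always runs before the test."""
    attempts = 0
    stream = iter(inputs)
    while True:
        attempts += 1
        value = next(stream)   # loop body: take the next candidate
        if value.isdigit():    # post-test condition
            break
    return int(value), attempts
```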
Foreach-style loops simplify iteration over collections without explicit indices. Java's enhanced for loop, introduced in Java 5, follows for (Type item : collection) body, where item binds to each element of an iterable collection like an array or List, promoting type-safe traversal.[79] PHP offers a similar construct: foreach (array_expression as $value) statement, which iterates over arrays or Traversable objects, optionally accessing keys via as $key => $value; this supports both indexed and associative arrays natively.[80] These idioms reduce boilerplate compared to traditional for loops, though they limit direct index access unless augmented with counters.
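Python's for statement covers the same ground; iterating over dict.items() gives the key and value pairs that PHP's as $key => $value form provides. A minimal sketch with illustrative data:

```python
prices = {"apple": 1.25, "pear": 0.75}

lines = []
# Each iteration unpacks one (key, value) pair; dicts keep insertion order.
for name, price in prices.items():
    lines.append(f"{name}: {price:.2f}")
```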
In functional languages, recursion serves as a primary syntactic alternative to explicit loops, leveraging tail calls for efficiency. Scheme, per the R5RS standard, mandates proper tail recursion, where a recursive call in tail position (the last operation) reuses the current stack frame, enabling unbounded iteration without stack overflow—e.g., (define (loop n) (if (> n 0) (loop (- n 1)) 'done)) executes iteratively in constant space.[81] This contrasts with imperative languages' mutable loops, favoring immutable, declarative patterns but requiring compiler support for optimization.
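CPython, by contrast, does not perform tail-call elimination, so the direct Python translation of the Scheme loop grows the call stack. This sketch (function names are illustrative) shows the recursive version failing once its depth exceeds the interpreter's recursion limit, while the equivalent explicit loop does not:

```python
import sys


def countdown(n):
    """Tail-recursive in form, like the Scheme loop; CPython still grows the stack."""
    if n > 0:
        return countdown(n - 1)
    return "done"


def countdown_iterative(n):
    """The same computation as an explicit loop, using constant stack space."""
    while n > 0:
        n -= 1
    return "done"


try:
    countdown(sys.getrecursionlimit() + 100)
    deep_recursion_ok = True
except RecursionError:
    deep_recursion_ok = False
```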
Break and continue statements alter loop flow: break exits the enclosing loop prematurely, while continue skips to the next iteration. In C and Python, these apply to the innermost loop, written break; or continue; in C and as a bare break or continue in Python.[78] Java extends this with labeled variants for nested loops, using label: for (...) { ... } followed by break label; or continue label;, which lets code in an inner loop exit or continue an outer loop directly, for example to stop a search in a double loop as soon as a match is found.[82] This feature addresses common nesting complexities but is used judiciously to maintain code clarity.
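Python has no labeled break. One common idiom, sketched here with an illustrative find_position function, moves the nested loops into a function so that return exits all of them at once; the second loop shows continue and break on a single level:

```python
def find_position(grid, target):
    """Return (row, col) of the first match; 'return' leaves both loops at once."""
    for r, row in enumerate(grid):
        for c, value in enumerate(row):
            if value == target:
                return r, c
    return None


odds = []
for n in range(6):
    if n % 2 == 0:
        continue  # skip even numbers and move to the next iteration
    if n > 4:
        break     # leave the loop entirely
    odds.append(n)
```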
External and Modular Syntax
Consuming External Software
Consuming external software in programming languages involves syntactic constructs for invoking system commands, importing libraries, interfacing with foreign code, and handling inter-process communication like pipes and redirection. These mechanisms allow programs to leverage functionality from outside the language's runtime, such as operating system utilities or pre-compiled binaries, but vary significantly in syntax and integration level across languages. System calls enable direct execution of external commands or processes. In C, the exec() family of functions, such as execl() or execvp(), replaces the current process image with a new one specified by a path and arguments; for example, execl("/bin/ls", "ls", "-l", NULL); runs ls -l in place of the calling program and returns -1 only on failure, such as when the file is not found. Python provides os.system(command) to execute shell commands synchronously, as in os.system("ls -l"), which returns the command's exit status (0 on success; on Unix, an encoded wait status) but does not capture its output. Perl encloses a command in backticks, or uses the equivalent qx operator as in my $output = qx(ls -l);, to capture the command's output as a string, and sets the $? variable to the exit status. Shell languages like Bash employ similar backticks or the more modern $(command) syntax for embedding external output, emphasizing their role in scripting environments.
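In Python, the subprocess module is the usual alternative to os.system() when output must be captured. To keep this sketch portable, it runs the current interpreter (sys.executable) as the child program instead of ls:

```python
import subprocess
import sys

# Run a child process and capture its standard output as text.
completed = subprocess.run(
    [sys.executable, "-c", "print('hello from child')"],
    capture_output=True,
    text=True,
)
```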
Library imports bring in external code modules at compile or runtime. C uses the preprocessor directive #include <header.h> to incorporate declarations from system or user headers, such as #include <stdio.h> for standard I/O functions, which the compiler processes before translation. Python's import statement loads modules dynamically, e.g., import math or from math import sqrt, allowing access to namespaced functions without qualification in some cases. In C++, the using namespace std; directive after #include <iostream> brings all names from the standard namespace into scope, simplifying code like cout << "Hello"; but risking name conflicts in large projects.
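The three common Python import forms side by side, using the standard math module:

```python
import math               # binds the module; names stay qualified
from math import sqrt     # binds a single name directly
import math as m          # binds the same module under an alias

qualified = math.sqrt(16)
unqualified = sqrt(16)
aliased = m.pi
```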
Foreign function interfaces (FFIs) facilitate calling code written in other languages. LuaJIT's FFI library, accessed via local ffi = require("ffi"), declares C structures and functions for direct invocation without wrappers, such as ffi.cdef[[int printf(const char *fmt, ...);]] followed by ffi.C.printf("Hello\n");[83]. Java's Java Native Interface (JNI) uses native method declarations like public native void callNative(String arg); in Java classes, generates the matching C header with javac -h (which replaced the older javah tool), and implements those methods in C/C++ as functions that receive a JNIEnv pointer; the separate Invocation API, with functions such as JNI_CreateJavaVM, lets native code embed a JVM. As of Java 22, the Foreign Function and Memory (FFM) API provides a modern alternative, using syntax like MethodHandle mh = linker.downcallHandle(symbol, function); for direct native calls without JNI boilerplate.[84]
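Python's standard ctypes module is a comparable FFI. This sketch assumes a POSIX system (typically Linux or macOS) on which ctypes.CDLL(None) exposes the C library's symbols, so it may not run on Windows:

```python
import ctypes

# Assumption: CDLL(None) gives access to symbols already loaded into the
# process, including the C library, as on typical Linux and macOS systems.
libc = ctypes.CDLL(None)

# Declare the C signature: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

length = libc.strlen(b"hello")
```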
Pipes and redirection handle data flow between processes. In Unix-like shells, the pipe operator | connects output to input, as in ls -l | grep ".txt", chaining commands without intermediate files. Windows Batch files use > for output redirection and >> for appending, e.g., dir > output.txt or echo "append" >> output.txt, integrating with command-line tools like findstr.
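Python can build the same kind of pipeline with subprocess.Popen by connecting one child's standard output to another's standard input. In this portable sketch, two small Python programs stand in for ls and grep, and the file names are made up:

```python
import subprocess
import sys

producer_code = "for name in ['a.txt', 'b.py', 'c.txt']: print(name)"
consumer_code = (
    "import sys\n"
    "for line in sys.stdin:\n"
    "    if line.rstrip().endswith('.txt'):\n"
    "        sys.stdout.write(line)\n"
)

# Roughly equivalent to:  producer | grep '\.txt$'
producer = subprocess.Popen([sys.executable, "-c", producer_code], stdout=subprocess.PIPE)
consumer = subprocess.Popen(
    [sys.executable, "-c", consumer_code],
    stdin=producer.stdout,
    stdout=subprocess.PIPE,
    text=True,
)
producer.stdout.close()  # the parent's copy is not needed once the consumer has it
output, _ = consumer.communicate()
producer.wait()
```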
Error handling for external invocations often relies on return codes or exceptions. In C, exec() functions return only on failure, yielding -1 and setting errno, so callers (often a child process created with fork(), which itself returns -1 on failure) check the result, as in if (execvp(path, args) == -1) perror("execvp failed");. Python's os.system() returns the command's exit status, which programs can inspect with if os.system("command") != 0: raise RuntimeError("Command failed"), though the subprocess module offers richer, exception-based handling. Perl captures external errors via $? after backticks, allowing checks like die "Command failed with code $?" if $?;, providing a simple scalar for status analysis. Languages like Java wrap native calls in try-catch blocks for UnsatisfiedLinkError, raised when a native library or method cannot be found, or for custom exceptions, ensuring robust integration.
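With Python's subprocess module, passing check=True converts a nonzero exit status into a CalledProcessError, and a missing program raises FileNotFoundError. A minimal sketch; no-such-command-xyz is a deliberately nonexistent, hypothetical command name:

```python
import subprocess
import sys

# check=True turns a nonzero exit status into an exception.
try:
    subprocess.run([sys.executable, "-c", "import sys; sys.exit(3)"], check=True)
    failed_code = None
except subprocess.CalledProcessError as exc:
    failed_code = exc.returncode  # the child's exit status

# A program that cannot be found fails before any child process starts.
try:
    subprocess.run(["no-such-command-xyz"])
    missing_command = False
except FileNotFoundError:
    missing_command = True
```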
Function and Module Declarations
Function declarations in programming languages vary significantly in syntax, reflecting differences in type systems, scoping rules, and design philosophies. In statically typed languages like C and Java, the return type is typically specified before the function name, followed by the name and parameter list in parentheses, with the body enclosed in braces. For example, C uses int add(int a, int b) { return a + b; } to declare a function that returns an integer sum. Java methods follow the same pattern inside a class, as in public int add(int a, int b) { return a + b; }, where access modifiers like public are optional but common. In contrast, dynamically typed languages such as Python employ a keyword-based approach: def add(a, b): return a + b, omitting explicit types unless optional type hints (standardized by PEP 484 in Python 3.5) are added. Languages like Go and Rust place the return type after the parameter list, as in Go's func add(a int, b int) int { return a + b } or Rust's fn add(a: i32, b: i32) -> i32 { a + b }. OCaml uses a more functional style with let add a b = a + b, where types are inferred unless annotated as let add (a : int) (b : int) : int = a + b.
Parameter lists support positional arguments in most languages, with types often required in static contexts. C, C++, Java, and Rust mandate type declarations for each parameter, such as int x in C or x: i32 in Rust, while Python and JavaScript allow untyped positional parameters like def func(x): or function func(x) {}. Python also supports keyword arguments, so a call can name the parameters it sets (func(y=3)), and its declarations can combine defaults with type hints, as in def func(x=1, y: int = 2):. JavaScript ES6+ supports defaults as function func(x = 1, y = 2) {}. Default values are thus written with assignment-like notation in C++ (int func(int x = 0)), Python (x=default), and JavaScript (x = default). C, Java, Go, and Rust lack built-in default parameters, requiring workarounds such as overloading (Java), variadic parameters or option structs (Go), or Option arguments and builder types (Rust). OCaml offers defaults only for optional labeled arguments, as in let func ?(x = 0) () = x.
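A short Python sketch of defaults combined with keyword arguments; connect is a hypothetical helper that only formats its arguments, so nothing is actually opened.

```python
def connect(host: str, port: int = 80, timeout: float = 5.0) -> str:
    """Hypothetical helper: omitted arguments take their declared defaults."""
    return f"{host}:{port} (timeout={timeout})"


# Positional call relying on both defaults.
print(connect("example.com"))               # example.com:80 (timeout=5.0)
# A keyword argument can skip over an earlier defaulted parameter.
print(connect("example.com", timeout=1.5))  # example.com:80 (timeout=1.5)
```

Python evaluates default expressions once, when the def statement runs, and stores them in connect.__defaults__; this is why mutable defaults such as [] are usually replaced with None and created inside the body.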
Return types can be explicit or inferred, influencing code verbosity and safety. Explicit declarations dominate in C (void func()), C++ (int func(), or the trailing form auto func() -> int), Java (void func()), and Go (func run() error), enabling compile-time checks. Rust writes -> Type after the parameter list, and an omitted return type means the unit type (). Python neither declares nor checks return types at run time (a function that ends without a return statement yields None), but it supports optional hints like def func() -> str:, standardized by PEP 484 for static analysis tools. OCaml infers types but allows explicit : type annotations. In JavaScript, return types are never declared and a function returns undefined when no return statement executes; TypeScript adds typed variants such as function func(): string { return "ok"; }.
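The Python behavior described above can be seen in a few lines; the function names are illustrative. A function without a return statement evaluates to None, and a return hint is recorded (typing.get_type_hints reads it back) but never checked when the function runs.

```python
from typing import get_type_hints


def greet(name: str) -> str:
    return "Hello, " + name


def log(message: str) -> None:
    # No return statement: calling log() evaluates to None.
    print(message)


def mislabeled() -> str:
    # The hint says str, but nothing checks it at run time.
    return 42
```

get_type_hints(log) reports the None return hint as type(None), which is how static tools interpret -> None.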
Function overloading allows multiple definitions with the same name but differing signatures, resolved at compile time in supporting languages. C++ enables this syntactically, as in int add(int a, int b); double add(double a, double b);, with each call resolved by its argument types. Java supports overloading of both methods and constructors within a class, e.g., int add(int a, int b) { return a + b; } alongside double add(double a, double b) { return a + b; }. In contrast, Python, Rust, Go, and OCaml lack native overloading, relying on dynamic dispatch, traits, or interfaces for polymorphism; for instance, Rust's + operator works across numeric types because the standard library implements the std::ops::Add trait separately for i32, f64, and other types, rather than defining multiple add functions. This design choice in non-overloading languages is intended to promote explicitness and avoid ambiguity in type inference.
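Python's closest standard-library substitute is functools.singledispatch, which selects an implementation at run time from the type of the first argument. This is a swapped-in technique, not compile-time overloading: only the first argument is considered, and subclasses (such as bool, a subclass of int) use the nearest registered base type. The function name describe is illustrative.

```python
from functools import singledispatch


@singledispatch
def describe(value: object) -> str:
    # Fallback used when no more specific type is registered.
    return f"object: {value!r}"


@describe.register
def _(value: int) -> str:
    # Selected from the int annotation on the first parameter.
    return f"int: {value}"


@describe.register
def _(value: float) -> str:
    return f"float: {value:.2f}"
```

With these registrations, describe(3) returns "int: 3", describe(2.5) returns "float: 2.50", and describe("x") falls back to "object: 'x'".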
Module declarations organize code into namespaces, encapsulating functions and types to manage complexity and visibility. C lacks native modules, relying on header files included with #include <module.h> for declarations. C++ introduces namespace blocks, e.g., namespace mylib { int func(); }, for logical grouping without separate files, and C++20 adds modules declared with export module mylib;. Java uses a package statement at the top of each file, as in package com.example; public class Module { }, with sources and compiled classes organized in matching directory structures. Python treats each file as a module (e.g., module.py containing def func():), imported via import module, without explicit declaration keywords. Rust employs mod for crate-internal modules, like mod mymodule { pub fn func() {} }, with pub for visibility. Go groups packages into modules defined by go.mod files containing module example.com/m; each directory forms a package whose files begin with a clause such as package mypkg. OCaml uses module for structures, e.g., module MyMod = struct let func () = () end, supporting functors for parametric modules. Earlier modular languages made the separation of interface (exports) from implementation (private details) explicit, influencing modern syntax: Modula-2 pairs DEFINITION MODULE Mod; with IMPLEMENTATION MODULE Mod;, and Ada pairs a package specification (package Mod is ... end Mod;) with a package body (package body Mod is ... end Mod;).[85]
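Because a Python module is simply a source file, the idea can be demonstrated by writing a file and importing it by name. This sketch writes a hypothetical module demo_greetings.py into a temporary directory, puts that directory on sys.path, and imports it with importlib.import_module. The leading underscore on _helper is only a naming convention for private names, not enforced visibility as with Rust's pub.

```python
import importlib
import sys
import tempfile
from pathlib import Path

MODULE_SOURCE = '''\
def _helper():
    # Leading underscore: private by convention only.
    return "hi"

def func():
    return _helper() + " from " + __name__
'''


def import_from_source(name: str, source: str):
    """Write source to <tmpdir>/<name>.py and import it like any other module."""
    directory = tempfile.mkdtemp()
    Path(directory, f"{name}.py").write_text(source)
    sys.path.insert(0, directory)
    importlib.invalidate_caches()  # let the import system notice the new file
    try:
        return importlib.import_module(name)
    finally:
        sys.path.remove(directory)


greetings = import_from_source("demo_greetings", MODULE_SOURCE)
print(greetings.func())  # hi from demo_greetings
```

The module's __name__ comes from its file name, and once imported it is cached in sys.modules, so later import statements reuse the same object.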
| Language | Function Declaration Example | Supports Defaults | Supports Overloading | Module/Namespace Example |
|---|---|---|---|---|
| C | int func(int x); | No | No | Header: #include "mod.h" |
| C++ | int func(int x = 0); | Yes | Yes | namespace Mod { ... } |
| Java | int func(int x); | No | Yes | package com.mod; |
| Python | def func(x=0): | Yes | No | File: mod.py |
| Rust | fn func(x: i32) -> i32; | No | No (traits) | mod mymod { ... } |
| Go | func add(x int) int | No | No | module example.com/m (go.mod) |
| OCaml | let func x = ... | Optional labeled arguments only | No | module Mod = struct ... end |
Input-Output Syntax
Input-output syntax in programming languages encompasses the built-in mechanisms for reading from and writing to data streams, such as standard input/output (I/O) and files, which vary significantly across languages in terms of verbosity, formatting capabilities, and integration with core language features.[86] These differences reflect design philosophies: low-level languages like C emphasize explicit control through function calls and format strings, while higher-level ones like Python prioritize simplicity with built-in functions that handle common cases automatically.[87] In C++, overloaded stream operators provide an object-oriented, type-safe approach that bridges procedural and modern paradigms. For console output, C uses the printf family of functions from the <stdio.h> header, which take a format string followed by arguments, as in printf("%s\n", "Hello, world!"); to print a string with a newline. In contrast, Python's print function accepts any number of arguments, writes them separated by spaces, and appends a newline unless the end parameter specifies otherwise, exemplified by print("Hello, world!").[86] C++ applies the << operator to std::cout for chained output, such as std::cout << "Hello, world!" << std::endl;, so values of different types can be written in a single expression. Java relies on System.out.println("Hello, world!");, where println appends a platform-dependent line separator.[87]
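A brief Python sketch of how print handles its arguments; render is a hypothetical helper that sends the output to an in-memory io.StringIO buffer through print's file parameter so the exact text can be inspected.

```python
import io


def render(*values: object, sep: str = " ", end: str = "\n") -> str:
    """Return exactly the text that print(*values, sep=sep, end=end) would write."""
    buffer = io.StringIO()
    # print converts each value with str(), joins them with sep, then writes end.
    print(*values, sep=sep, end=end, file=buffer)
    return buffer.getvalue()
```

For example, render("Hello,", "world!") gives "Hello, world!\n", while render("a", "b", sep=", ", end="") gives "a, b" with no trailing newline.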
Input syntax follows similar patterns of variation. In C, scanf reads formatted input into variables, as in char str[100]; scanf("%99s", str);, parsing according to a format specifier; %s reads one whitespace-delimited word, and the field width 99 leaves room for the terminating null character. Python's input function reads a line from standard input and returns it as a string without the trailing newline, optionally displaying a prompt: name = input("Enter name: ").[86] C++ uses the >> operator on std::cin, as in std::string name; std::cin >> name;, which extracts whitespace-separated tokens. For line-oriented input in Java, the Scanner class wraps System.in, enabling String name = new Scanner(System.in).nextLine(); to capture an entire line.[88]
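The difference between line-based and token-based input can be sketched in Python; read_name and read_tokens are hypothetical helpers. The token splitter is only a rough analogue of repeated scanf("%s", ...) or std::cin >> word, not a reimplementation of either.

```python
import io


def read_name(prompt: str = "Enter name: ") -> str:
    # input() writes the prompt, reads one line from sys.stdin, and returns it
    # without the trailing newline (raising EOFError at end of input).
    return input(prompt)


def read_tokens(stream: io.TextIOBase) -> list[str]:
    # Split everything remaining in the stream on runs of whitespace,
    # e.g. read_tokens(sys.stdin) for standard input.
    return stream.read().split()
```

Because input() reads from whatever object sys.stdin currently refers to, tests often substitute an io.StringIO to simulate a user typing lines.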
Formatting options enhance output precision and readability, often using placeholders or templates. C's printf supports specifiers like %d for integers and %f for floats, allowing printf("Value: %d\n", 42);. Python offers multiple methods, including f-strings (print(f"Value: {42}")), the format method ("Value: {}".format(42)), or older % formatting, providing dynamic string interpolation.[86] In C++, manipulators like std::fixed and std::setprecision adjust cout output, e.g., std::cout << std::fixed << std::setprecision(2) << 3.14159;. Java's System.out.printf mirrors C's style with %d and %f, as in System.out.printf("Value: %d\n", 42);.[89]
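The Python options listed above can be compared side by side in a small sketch; format_examples is an illustrative name, and the comments point to the roughly equivalent C and C++ forms.

```python
def format_examples(value: int, x: float) -> list[str]:
    """Return the same values formatted several ways."""
    return [
        f"Value: {value}",          # f-string (Python 3.6+)
        "Value: {}".format(value),  # str.format
        "Value: %d" % value,        # printf-style, like C's printf("Value: %d", value)
        f"{x:.2f}",                 # fixed point, 2 decimals: C's "%.2f" or C++ std::setprecision(2)
        f"[{value:5d}]",            # minimum width 5; numbers are right-aligned by default
        f"[{value:<5d}]",           # same width, left-aligned
    ]


print(format_examples(42, 3.14159))
# ['Value: 42', 'Value: 42', 'Value: 42', '3.14', '[   42]', '[42   ]']
```

The format-spec mini-language after the colon (width, alignment, precision, type) is shared by f-strings and str.format, so the two are interchangeable for these cases.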
File handling syntax typically involves opening a stream or file object, writing data, and closing it, with languages differing in resource management. In C, fopen("file.txt", "w") returns a FILE* pointer (or NULL on failure), which is used with fprintf(fp, "%s", "Hello"); and released with fclose(fp);. Python uses the open built-in: with open("file.txt", "w") as f: f.write("Hello\n"), where the context manager automatically closes the file.[86] C++ provides std::ofstream from the <fstream> header for output files, as in std::ofstream file("file.txt"); file << "Hello";, with the file closed automatically when the object is destroyed. Java employs FileWriter or PrintWriter for character output: try (PrintWriter pw = new PrintWriter("file.txt")) { pw.println("Hello"); }, using try-with-resources for automatic closure.[90][91]
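A minimal Python sketch of the open, write, and close cycle, including the append mode that corresponds to C's fopen(path, "a") and the shell's >>; write_then_append is a hypothetical helper and the path is supplied by the caller.

```python
def write_then_append(path: str) -> str:
    """Create or overwrite path, append a second line, and return the file's contents."""
    # "w" creates or truncates the file, like C's fopen(path, "w").
    with open(path, "w", encoding="utf-8") as f:
        f.write("Hello\n")
    # "a" appends to the end, like fopen(path, "a").
    with open(path, "a", encoding="utf-8") as f:
        print("World", file=f)  # print supplies the newline
    # Each with-block closed its file on exit, even if an exception had occurred.
    with open(path, encoding="utf-8") as f:
        return f.read()
```

Calling it twice on the same path returns the same two lines, because the "w" mode discards the previous contents before writing.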
Programmatic stream redirection allows altering default I/O targets within code, often for logging or testing. In Python, print accepts a file parameter, e.g., with open("output.txt", "w") as f: print("Hello", file=f), and contextlib.redirect_stdout temporarily replaces sys.stdout for a whole block.[86] C++ redirects by swapping a stream's buffer, as in std::ofstream log("log.txt"); std::streambuf* old = std::cout.rdbuf(log.rdbuf());, restoring the original with std::cout.rdbuf(old); before log is destroyed. Java uses System.setOut(new PrintStream(new FileOutputStream("output.txt"))); to redirect standard output.[87] In shell scripting languages like Bash, redirection operators like > achieve similar effects, either per command, as in echo "Hello" > output.txt, or for the rest of the script with exec > output.txt.
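A sketch of block-scoped redirection in Python using contextlib.redirect_stdout; capture_stdout and greet are hypothetical names. Like saving and restoring std::cout.rdbuf() in C++, the context manager swaps the target for the duration of the block and puts the previous stream back afterwards.

```python
import contextlib
import io


def capture_stdout(func, *args, **kwargs) -> str:
    """Call func with sys.stdout temporarily redirected; return what it printed."""
    buffer = io.StringIO()
    # redirect_stdout replaces sys.stdout inside the block and restores the
    # previous stream on exit, even if func raises.
    with contextlib.redirect_stdout(buffer):
        func(*args, **kwargs)
    return buffer.getvalue()


def greet() -> None:
    print("Hello")
    print("to a redirected stream")
```

To send output to a file instead, the same manager can be combined with open: with open("output.txt", "w") as f, contextlib.redirect_stdout(f): print("Hello"). Only the Python-level sys.stdout object is replaced, so output from subprocesses is not captured this way.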