R (programming language)
from Wikipedia

R
Terminal window for R
Paradigms: Multi-paradigm: procedural, object-oriented, functional, reflective, imperative, array[1]
Designed by: Ross Ihaka and Robert Gentleman
Developer: R Core Team
First appeared: August 1993
Stable release: 4.5.1[2] (13 June 2025)
Typing discipline: Dynamic
Platform: arm64 and x86-64
License: GPL-2.0-or-later[3]
Filename extensions
  • .R[4]
  • .r
  • .rdata
  • .rhistory
  • .rds
  • .rda[5]
Website: r-project.org
  • R Programming at Wikibooks

R is a programming language for statistical computing and data visualization. It has been widely adopted in the fields of data mining, bioinformatics, data analysis, and data science.[9]

The core R language is extended by a large number of software packages, which contain reusable code, documentation, and sample data. Some of the most popular R packages are in the tidyverse collection, which enhances functionality for visualizing, transforming, and modelling data and, according to its authors and users, makes programming easier.[10]

R is free and open-source software distributed under the GNU General Public License.[3][11] The language is implemented primarily in C, Fortran, and R itself. Precompiled executables are available for the major operating systems (including Linux, macOS, and Microsoft Windows).

Its core is an interpreted language with a native command line interface. In addition, multiple third-party applications are available as graphical user interfaces; such applications include RStudio (an integrated development environment) and Jupyter (a notebook interface).

History

Co-originators of the R language

R was started by professors Ross Ihaka and Robert Gentleman as a programming language to teach introductory statistics at the University of Auckland.[12] The language was inspired by the S programming language, with most S programs able to run unaltered in R.[6] The language was also inspired by Scheme's lexical scoping, allowing for local variables.[1]

The name of the language, R, comes from being both an S language successor and the shared first letter of the authors, Ross and Robert.[13] In August 1993, Ihaka and Gentleman posted a binary file of R on StatLib — a data archive website.[14] At the same time, they announced the posting on the s-news mailing list.[15] On 5 December 1997, R became a GNU project when version 0.60 was released.[16] On 29 February 2000, the 1.0 version was released.[17]

Packages

A violin plot created with the R package ggplot2 for data visualization

R packages are collections of functions, documentation, and data that expand R.[18] For example, packages can add reporting features (using packages such as RMarkdown, Quarto,[19] knitr, and Sweave) and support for various statistical techniques (such as linear, generalized linear and nonlinear modeling, classical statistical tests, spatial analysis, time-series analysis, and clustering). Ease of package installation and use have contributed to the language's adoption in data science.[20]

Immediately available when starting R after installation, base packages provide the fundamental and necessary syntax and commands for programming, computing, graphics production, basic arithmetic, and statistical functionality.[21]

An example is the tidyverse collection of R packages, which bundles several subsidiary packages to provide a common API. The collection specializes in tasks related to accessing and processing "tidy data",[22] which are data contained in a two-dimensional table with a single row for each observation and a single column for each variable.[23]

Installing a package occurs only once. For example, to install the tidyverse collection:[23]

> install.packages("tidyverse")

To load the functions, data, and documentation of a package, one calls the library() function. To load the tidyverse collection, one can execute the following code:[a]

> # The package name can be enclosed in quotes
> library("tidyverse")

> # But the package name can also be used without quotes
> library(tidyverse)

The Comprehensive R Archive Network (CRAN) was founded in 1997 by Kurt Hornik and Friedrich Leisch to host R's source code, executable files, documentation, and user-created packages.[24] CRAN's name and scope mimic the Comprehensive TeX Archive Network (CTAN) and the Comprehensive Perl Archive Network (CPAN).[24] CRAN originally had only three mirror sites and twelve contributed packages.[25] As of 30 June 2025, it has 90 mirrors[26] and 22,390 contributed packages.[27] Packages are also available in repositories such as R-Forge, Omegahat, and GitHub.[28][29][30]

To provide guidance on the CRAN web site, its Task Views area lists packages that are relevant for specific topics; sample topics include causal inference, finance, genetics, high-performance computing, machine learning, medical imaging, meta-analysis, social sciences, and spatial statistics.

The Bioconductor project provides packages for genomic data analysis, complementary DNA, microarray, and high-throughput sequencing methods.

Community

The R Consortium is one of the three main groups that support R

There are three main groups that help support R software development:

  • The R Core Team was founded in 1997 to maintain the R source code.
  • The R Foundation for Statistical Computing was founded in April 2003 to provide financial support.
  • The R Consortium is a Linux Foundation project to develop R infrastructure.

The R Journal is an open access, academic journal that features short to medium-length articles on the use and development of R. The journal includes articles on packages, programming tips, CRAN news, and foundation news.

UseR! conference is one place the R community can gather at

The R community hosts many conferences and in-person meetups.[b] These groups include:

  • UseR!: an annual international R user conference
  • Directions in Statistical Computing (DSC)
  • R-Ladies: an organization to promote gender diversity in the R community
  • SatRdays: R-focused conferences held on Saturdays
  • Data Science & AI Conferences
  • posit::conf (formerly known as rstudio::conf)

On social media sites such as Twitter, the hashtag #rstats can be used to follow new developments in the R community.[31]

Examples

Hello, World!

The following is a "Hello, World!" program:

> print("Hello, World!")
[1] "Hello, World!"

Here is an alternative version, which uses the cat() function:

> cat("Hello, World!")
Hello, World!

Basic syntax

The following examples illustrate the basic syntax of the language and use of the command-line interface.[c]

In R, the generally preferred assignment operator is an arrow made from two characters <-, although = can be used in some cases.[32]

> x <- 1:6 # Create a numeric vector in the current environment
> y <- x^2 # Similarly, create a vector based on the values in x.
> print(y) # Print the vector’s contents.
[1]  1  4  9 16 25 36

> z <- x + y # Create a new vector that is the sum of x and y
> z # Print the contents of z at the console (auto-printing).
[1]  2  6 12 20 30 42

> z_matrix <- matrix(z, nrow = 3) # Create a new matrix that transforms the vector z into a 3x2 matrix object
> z_matrix 
     [,1] [,2]
[1,]    2   20
[2,]    6   30
[3,]   12   42

> 2 * t(z_matrix) - 2 # Transpose the matrix; multiply every element by 2; subtract 2 from each element in the matrix; and then return the results to the terminal.
     [,1] [,2] [,3]
[1,]    2   10   22
[2,]   38   58   82

> new_df <- data.frame(t(z_matrix), row.names = c("A", "B")) # Create a new dataframe object that contains the data from a transposed z_matrix, with row names 'A' and 'B'
> names(new_df) <- c("X", "Y", "Z") # Set the column names of the new_df dataframe as X, Y, and Z.
> print(new_df)  # Print the current results.
   X  Y  Z
A  2  6 12
B 20 30 42

> new_df$Z # Output the Z column
[1] 12 42

> all(new_df$Z == new_df['Z']) && all(new_df[3] == new_df$Z) # The dataframe column Z can be accessed using the syntax $Z, ['Z'], or [3], and the values are the same; all() reduces each comparison to a single value, which && requires.
[1] TRUE

> attributes(new_df) # Print information about attributes of the new_df dataframe
$names
[1] "X" "Y" "Z"

$row.names
[1] "A" "B"

$class
[1] "data.frame"

> attributes(new_df)$row.names <- c("one", "two") # Access and then change the row.names attribute; this can also be done using the rownames() function
> new_df
     X  Y  Z
one  2  6 12
two 20 30 42

Structure of a function

R is able to create functions that add new functionality for code reuse.[33] Objects created within the body of the function (which are enclosed by curly brackets) remain accessible only from within the function, and any data type may be returned. In R, almost all functions and all user-defined functions are closures.[34]

The following is an example of creating a function to perform an arithmetic calculation:

# The function's input parameters are x and y.
# The function, named f, returns a linear combination of x and y.
f <- function(x, y) {
  z <- 3 * x + 4 * y

  # An explicit return() statement is optional--it could be replaced with simply `z` in this case.
  return(z)
}

# As an alternative, the last statement executed in a function is returned implicitly.
f <- function(x, y) 3 * x + 4 * y

The following is some output from using the function defined above:

> f(1, 2) #  3 * 1 + 4 * 2 = 3 + 8
[1] 11

> f(c(1, 2, 3), c(5, 3, 4)) # Element-wise calculation
[1] 23 18 25

> f(1:3, 4) # Equivalent to f(c(1, 2, 3), c(4, 4, 4))
[1] 19 22 25

It is possible to define functions to be used as infix operators by using the special syntax `%name%`, where "name" is the function variable name:

> `%sumx2y2%` <- function(e1, e2) {e1 ^ 2 + e2 ^ 2}
> 1:3 %sumx2y2% -(1:3)
[1]  2  8 18

Since R version 4.1.0, functions can be written in a short notation, which is useful for passing anonymous functions to higher-order functions:[35]

> sapply(1:5, \(i) i^2)    # here \(i) is the same as function(i) 
[1]  1  4  9 16 25

Native pipe operator

In R version 4.1.0, a native pipe operator, |>, was introduced.[36] This operator allows users to chain functions together, rather than using nested function calls.

> nrow(subset(mtcars, cyl == 4)) # Nested without the pipe character
[1] 11

> mtcars |> subset(cyl == 4) |> nrow() # Using the pipe character
[1] 11

An alternative to nested functions is the use of intermediate objects, rather than the pipe operator:

> mtcars_subset_rows <- subset(mtcars, cyl == 4)
> num_mtcars_subset <- nrow(mtcars_subset_rows)
> print(num_mtcars_subset)
[1] 11

While the pipe operator can produce code that is easier to read, influential R programmers like Hadley Wickham suggest chaining together at most 10-15 lines of code with this operator and saving the results in objects with meaningful names to avoid code obfuscation.[37]

Object-oriented programming

The R language has native support for object-oriented programming. There are two native frameworks, the so-called S3 and S4 systems. The former, being more informal, supports single dispatch on the first argument, and objects are assigned to a class simply by setting a "class" attribute in each object. The latter is a system like the Common Lisp Object System (CLOS), with formal classes (also derived from S) and generic methods, which supports multiple dispatch and multiple inheritance.[38]

In the example below, summary() is a generic function that dispatches to different methods depending on whether its argument is a character vector or a factor:

> data <- c("a", "b", "c", "a", NA)
> summary(data)
   Length     Class      Mode 
        5 character character 
> summary(as.factor(data))
   a    b    c NA's 
   2    1    1    1
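
The S3 mechanism described above can be sketched with a user-defined class. This is a minimal illustration, not code from the original article: the class name "temperature" and the constructor new_temperature() are hypothetical, while format(), print(), structure(), and inherits() are base R functions.

```r
# Hypothetical S3 class "temperature" (illustrative name, not part of base R).
# An object joins the class simply by carrying a "class" attribute.
new_temperature <- function(celsius) {
  stopifnot(is.numeric(celsius))
  structure(list(celsius = celsius), class = "temperature")
}

# Method for the existing generic format(); dispatch uses the class of `x`.
format.temperature <- function(x, ...) {
  paste0(format(x$celsius, nsmall = 1), " degrees C")
}

# Method for the existing generic print(), reusing the format method.
print.temperature <- function(x, ...) {
  cat("<temperature>", format(x), "\n")
  invisible(x)
}

t1 <- new_temperature(21.5)
inherits(t1, "temperature")  # TRUE
print(t1)                    # dispatches to print.temperature()
```

Naming a function generic.class is all that S3 needs to find the method, which is why the system is described as informal.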

Modeling and plotting

Diagnostic plots for the model from the example code in the "Modeling and plotting" section (q.v. the plot.lm() function). Mathematical notation is allowed in labels, as shown in the lower left plot.

The R language has built-in support for data modeling and graphics. The following example shows how R can generate and plot a linear model with residuals.

# Create x and y values
x <- 1:6
y <- x^2

# Linear regression model: y = A + B * x
model <- lm(y ~ x)

# Display an in-depth summary of the model
summary(model)

# Create a 2-by-2 layout for figures
par(mfrow = c(2, 2))

# Output diagnostic plots of the model
plot(model)

The output from the summary() function in the preceding code block is as follows:

Residuals:
      1       2       3       4       5       6
 3.3333 -0.6667 -2.6667 -2.6667 -0.6667  3.3333

Coefficients:
            Estimate Std. Error t value Pr(>|t|)   
(Intercept)  -9.3333     2.8441  -3.282 0.030453 * 
x             7.0000     0.7303   9.585 0.000662 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.055 on 4 degrees of freedom
Multiple R-squared:  0.9583, Adjusted R-squared:  0.9478
F-statistic: 91.88 on 1 and 4 DF,  p-value: 0.000662
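
Individual quantities from a fitted model can also be extracted programmatically rather than read from the printed summary. The following sketch refits the same model and uses the base R accessors coef(), residuals(), and predict(); the new data point x = 7 is an illustrative assumption.

```r
# Refit the model from the example above
x <- 1:6
y <- x^2
model <- lm(y ~ x)

# Named numeric vector of estimates: (Intercept) and x
coefs <- coef(model)
coefs

# Residuals, rounded as in the printed summary
round(residuals(model), 4)

# R-squared is stored in the summary object
r2 <- summary(model)$r.squared
r2

# Predict y for a hypothetical new observation, x = 7
pred <- predict(model, newdata = data.frame(x = 7))
pred
```

Because the fit is a straight line through quadratic data, the prediction at x = 7 falls well below the true value 49, which is the same lack of fit that the diagnostic plots reveal.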

Mandelbrot set

A Mandelbrot set as visualized in R. (Note: The colours in this image differ from the output of the sample code in the "Mandelbrot set" section.)

This example of a Mandelbrot set highlights the use of complex numbers. It models the first 20 iterations of the equation z = z^2 + c, where c represents different complex constants.

To run this sample code, it is necessary to first install the package that provides the write.gif() function:

install.packages("caTools")

The sample code is as follows:

library(caTools)

jet.colors <-
    colorRampPalette(
        c("green", "pink", "#007FFF", "cyan", "#7FFF7F",
          "white", "#FF7F00", "red", "#7F0000"))

dx <- 1500 # define width
dy <- 1400 # define height

C  <-
    complex(
            real = rep(seq(-2.2, 1.0, length.out = dx), each = dy),
            imag = rep(seq(-1.2, 1.2, length.out = dy), times = dx)
            )

# reshape as matrix of complex numbers
C <- matrix(C, dy, dx)

# initialize output 3D array
X <- array(0, c(dy, dx, 20))

Z <- 0

# loop with 20 iterations
for (k in 1:20) {

  # the central difference equation
  Z <- Z^2 + C

  # capture the results
  X[, , k] <- exp(-abs(Z))
}

write.gif(
    X,
    "Mandelbrot.gif",
    col = jet.colors,
    delay = 100)

Version names

A CD of R Version 1.0.0, autographed by the core team of R, photographed in Quebec City in 2019

All R version releases from 2.14.0 onward have codenames that make reference to Peanuts comics and films.[39][40][41]

In 2018, core R developer Peter Dalgaard presented a history of R releases since 1997.[42] Some notable early releases before the named releases include the following:

  • Version 1.0.0, released on 29 February 2000, a leap day
  • Version 2.0.0, released on 4 October 2004, "which at least had a nice ring to it"[42]

The idea of naming R version releases was inspired by the naming system for Debian and Ubuntu versions. Dalgaard noted an additional reason for the use of Peanuts references in R codenames—the humorous observation that "everyone in statistics is a P-nut."[42]

R release codenames
Version | Release date | Name | Peanuts reference | Reference
4.5.1 | 2025-06-13 | Great Square Root | [43] | [44]
4.5.0 | 2025-04-11 | How About a Twenty-Six | [45] | [46]
4.4.3 | 2025-02-28 | Trophy Case | [47] | [48]
4.4.2 | 2024-10-31 | Pile of Leaves | [49] | [50]
4.4.1 | 2024-06-14 | Race for Your Life | [51] | [52]
4.4.0 | 2024-04-24 | Puppy Cup | [53] | [54]
4.3.3 | 2024-02-29 | Angel Food Cake | [55] | [56]
4.3.2 | 2023-10-31 | Eye Holes | [57] | [58]
4.3.1 | 2023-06-16 | Beagle Scouts | [59] | [60]
4.3.0 | 2023-04-21 | Already Tomorrow | [61][62][63] | [64]
4.2.3 | 2023-03-15 | Shortstop Beagle | [65] | [66]
4.2.2 | 2022-10-31 | Innocent and Trusting | [67] | [68]
4.2.1 | 2022-06-23 | Funny-Looking Kid | [69][70][71][72][73][74] | [75]
4.2.0 | 2022-04-22 | Vigorous Calisthenics | [76] | [77]
4.1.3 | 2022-03-10 | One Push-Up | [76] | [78]
4.1.2 | 2021-11-01 | Bird Hippie | [79][80] | [78]
4.1.1 | 2021-08-10 | Kick Things | [81] | [82]
4.1.0 | 2021-05-18 | Camp Pontanezen | [83] | [84]
4.0.5 | 2021-03-31 | Shake and Throw | [85] | [86]
4.0.4 | 2021-02-15 | Lost Library Book | [87][88][89] | [90]
4.0.3 | 2020-10-10 | Bunny-Wunnies Freak Out | [91] | [92]
4.0.2 | 2020-06-22 | Taking Off Again | [93] | [94]
4.0.1 | 2020-06-06 | See Things Now | [95] | [96]
4.0.0 | 2020-04-24 | Arbor Day | [97] | [98]
3.6.3 | 2020-02-29 | Holding the Windsock | [99] | [100]
3.6.2 | 2019-12-12 | Dark and Stormy Night | See It was a dark and stormy night#Literature[101] | [102]
3.6.1 | 2019-07-05 | Action of the Toes | [103] | [104]
3.6.0 | 2019-04-26 | Planting of a Tree | [105] | [106]
3.5.3 | 2019-03-11 | Great Truth | [107] | [108]
3.5.2 | 2018-12-20 | Eggshell Igloos | [109] | [110]
3.5.1 | 2018-07-02 | Feather Spray | [111] | [112]
3.5.0 | 2018-04-23 | Joy in Playing | [113] | [114]
3.4.4 | 2018-03-15 | Someone to Lean On | [115][116][117] | [118]
3.4.3 | 2017-11-30 | Kite-Eating Tree | See Kite-Eating Tree[119] | [120]
3.4.2 | 2017-09-28 | Short Summer | See It Was a Short Summer, Charlie Brown | [121]
3.4.1 | 2017-06-30 | Single Candle | [122] | [123]
3.4.0 | 2017-04-21 | You Stupid Darkness | [122] | [124]
3.3.3 | 2017-03-06 | Another Canoe | [125] | [126]
3.3.2 | 2016-10-31 | Sincere Pumpkin Patch | [127] | [128]
3.3.1 | 2016-06-21 | Bug in Your Hair | [129] | [130]
3.3.0 | 2016-05-03 | Supposedly Educational | [131] | [132]
3.2.5 | 2016-04-11 | Very, Very Secure Dishes | [133] | [134][135][136]
3.2.4 | 2016-03-11 | Very Secure Dishes | [133] | [137]
3.2.3 | 2015-12-10 | Wooden Christmas-Tree | See A Charlie Brown Christmas[138] | [139]
3.2.2 | 2015-08-14 | Fire Safety | [140][141] | [142]
3.2.1 | 2015-06-18 | World-Famous Astronaut | [143] | [144]
3.2.0 | 2015-04-16 | Full of Ingredients | [145] | [146]
3.1.3 | 2015-03-09 | Smooth Sidewalk | [147][page needed] | [148]
3.1.2 | 2014-10-31 | Pumpkin Helmet | See You're a Good Sport, Charlie Brown | [149]
3.1.1 | 2014-07-10 | Sock it to Me | [150][151][152][153] | [154]
3.1.0 | 2014-04-10 | Spring Dance | [103] | [155]
3.0.3 | 2014-03-06 | Warm Puppy | [156] | [157]
3.0.2 | 2013-09-25 | Frisbee Sailing | [158] | [159]
3.0.1 | 2013-05-16 | Good Sport | [160] | [161]
3.0.0 | 2013-04-03 | Masked Marvel | [162] | [163]
2.15.3 | 2013-03-01 | Security Blanket | [164] | [165]
2.15.2 | 2012-10-26 | Trick or Treat | [166] | [167]
2.15.1 | 2012-06-22 | Roasted Marshmallows | [168] | [169]
2.15.0 | 2012-03-30 | Easter Beagle | [170] | [171]
2.14.2 | 2012-02-29 | Gift-Getting Season | See It's the Easter Beagle, Charlie Brown[172] | [173]
2.14.1 | 2011-12-22 | December Snowflakes | [174] | [175]
2.14.0 | 2011-10-31 | Great Pumpkin | See It's the Great Pumpkin, Charlie Brown[176] | [177]
r-devel | N/A | Unsuffered Consequences | [178] | [42]

Interfaces

R is installed with a command line console by default, but there are multiple ways to interface with the language:

Statistical frameworks that use R in the background include Jamovi and JASP.[citation needed]

Implementations

The main R implementation is written primarily in C, Fortran, and R itself. Other implementations include the following:

Microsoft R Open (MRO) was an R implementation. As of 30 June 2021, Microsoft began to phase out MRO in favor of the CRAN distribution.[184]

Commercial support

Although R is an open-source project, some companies provide commercial support:

  • Oracle provides commercial support for its Big Data Appliance, which integrates R into its other products.
  • IBM provides commercial support for execution of R within Hadoop.

from Grokipedia
R is an environment and programming language designed primarily for statistical computing and graphics, providing a wide variety of built-in functions for data manipulation, calculation, statistical analysis (including linear and nonlinear modeling, time-series analysis, and clustering), and the production of publication-quality plots with mathematical symbols and formulae. It is highly extensible through user-contributed packages available via the Comprehensive R Archive Network (CRAN), which hosts over 20,000 add-on packages for specialized applications in fields such as bioinformatics (as of 2025). Originally developed in 1993 by statisticians Ross Ihaka and Robert Gentleman at the University of Auckland in New Zealand as an open-source implementation inspired by the S language from Bell Laboratories, R has evolved into a collaborative project under the GNU General Public License, maintained by the R Core Team and a global community of contributors.

R's syntax is influenced by S but incorporates modern programming paradigms, supporting procedural, object-oriented, and functional styles, while its integration with languages like C, C++, and Fortran allows for efficient handling of computationally intensive tasks. The language runs on multiple platforms, including Windows, macOS, and Unix-like systems, making it accessible for researchers, data scientists, and analysts worldwide.

Its popularity stems from its role in reproducible research, with tools like R Markdown enabling the integration of code, results, and narrative documentation, and it underpins major ecosystems such as the tidyverse for data wrangling and visualization. Despite competition from languages like Python, R remains a cornerstone in academia and industry for its domain-specific strengths in statistics and data visualization.

History

Origins and Development

R was created in 1993 by statisticians Ross Ihaka and Robert Gentleman at the University of Auckland in New Zealand, as an open-source implementation of the S programming language developed at Bell Laboratories. The project began as a response to the limitations of proprietary statistical software like S-PLUS, which restricted accessibility and extensibility for academic and research use. Ihaka and Gentleman aimed to provide a free alternative focused on statistical analysis, graphical capabilities, and data manipulation, enabling broader adoption in teaching and research environments. Their initial prototype was announced publicly in August 1993 via the S-news mailing list, with early versions distributed through the StatLib archive.

The language's design drew heavily from S for its syntax and statistical primitives but incorporated influences from languages such as Scheme to improve evaluation semantics, memory management, and extensibility. This blend allowed R to support interactive data exploration and custom function creation more efficiently than its predecessor.

In 1995, R adopted the GNU General Public License, formalizing its status as free software and facilitating contributions from the global community; this transition was supported by efforts from Martin Mächler to align it with standards. By mid-1997, the R Core Team was formed, comprising Ihaka, Gentleman, and other key contributors including Douglas Bates, John Chambers, and Kurt Hornik, who took collective responsibility for maintaining the source code via a CVS repository.

The establishment of the R Foundation for Statistical Computing in April 2003 marked a pivotal step in institutionalizing support for the project, providing a nonprofit structure to manage donations, conferences, and ongoing development without commercial dependencies. This organization, founded by members of the Core Team, ensured the language's sustainability as an open-source tool for statistical computing, reflecting its evolution from an academic experiment to a foundational resource.

Version History

R's stable version history begins with the release of version 1.0.0 on February 29, 2000, marking the transition from alpha and beta versions to a stable software environment for statistical computing. Subsequent major releases have followed a structured cycle, with significant updates occurring every few years to introduce new features, improve performance, and incorporate community contributions. Prior to version 3.0.0, x.y.0 releases were scheduled twice a year, around April 1 and October 1, shifting to an annual release pattern thereafter.

Key milestones include R 1.0.0, the inaugural stable version. R 2.0.0, released on October 4, 2004, introduced namespaces to improve package organization and reduce naming conflicts, a foundational change for the growing ecosystem of user-contributed extensions. R 3.0.0, released on April 3, 2013, brought support for long vectors, allowing vectors with more than 2^31 - 1 elements. R 4.0.0, released on April 24, 2020, emphasized performance optimizations and required reinstallation of binary packages because of format changes.

More recent development continued with R 4.1.0 on May 18, 2021, which added the native pipe operator |> and so reduced reliance on external packages like magrittr. The 4.5.x series, comprising R 4.5.0 (April 11, 2025), R 4.5.1 (June 13, 2025), and R 4.5.2 (October 31, 2025), focused on refinements such as parallel downloads in install.packages() and download.packages() using libcurl for faster package acquisition, alongside string handling changes in functions like substr() to preserve names during subassignment and an improved iconv() for encoding conversions. Deprecations in this series included warnings for matrix() when the input length does not match the dimensions (introduced earlier but enforced progressively) and removal of legacy S-compatibility macros to modernize the C API.

R versions use a major.minor.patch numbering scheme, where major releases (e.g., 2.0.0, 3.0.0) introduce substantial features, minor releases add enhancements, and patch releases address bugs. Each release carries a playful codename, such as "Arbor Day" for 4.0.0, "Camp Pontanezen" for 4.1.0, and "[Not] Part in a Rumble" for 4.5.2, adding a lighthearted element to the development process without formal documentation on selection criteria.

R's development adheres to a policy that prioritizes stability for existing code, though major version increments may necessitate package recompilation or updates due to internal changes. This approach ensures broad usability across versions, with older branches occasionally reopened for critical fixes based on user needs. Releases incorporate community feedback through mechanisms like the R Bug Tracking System and the r-devel mailing list, where developers review submissions to prioritize enhancements and resolve issues before finalizing stable versions.
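
Because features such as the native pipe depend on the R version, scripts sometimes check the running version before relying on them. The sketch below is one way to do this with base R's getRversion() and the R.version list; the variable names are illustrative assumptions.

```r
# Version of the running R session, as a "package_version" object
current <- getRversion()

# Version objects can be compared directly against version strings
has_native_pipe <- current >= "4.1.0"   # |> was introduced in 4.1.0

# Extract the major, minor, and patch components as integers
parts <- unclass(current)[[1]]

cat("Running:", R.version.string, "\n")
cat("Codename:", R.version$nickname, "\n")
cat("Components:", parts, "\n")
cat("Native pipe available:", has_native_pipe, "\n")
```

Comparing version objects rather than raw strings avoids lexical pitfalls, since a plain string comparison would order "4.10.0" before "4.9.0".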

Language Overview

Philosophy and Design Goals

R was developed as an open-source implementation of the S programming language, primarily aimed at facilitating statistical analysis, with built-in extensions for graphics production and efficient data handling to support researchers in exploratory data analysis. This design choice stemmed from the need to provide a freely available alternative to the proprietary S, enabling widespread adoption in academic and research environments while preserving S's core capabilities for statistical computing. The language's philosophy emphasizes interactive use, allowing users to experiment with data and models in a dynamic environment that promotes discovery and iteration.

A central design principle of R is the promotion of vectorized operations, which apply functions across entire arrays or vectors at once, improving efficiency in numerical computations and discouraging explicit loops that could hinder performance in statistical tasks. This approach aligns with the language's focus on handling the large datasets typical of statistical research, where operations like aggregation and transformation need to be both expressive and performant without low-level programming overhead.

R's open-source ethos, governed by the GNU General Public License, further underscores its commitment to collaborative development: users can freely inspect, modify, and extend the codebase, fostering a vast ecosystem of contributed packages that advance statistical methodologies. R prioritizes statistical expressiveness (such as integrated support for statistical testing, regression modeling, and visualization) over the raw speed of general-purpose languages, and this trade-off enables domain-specific optimizations that make complex analyses more accessible to non-programmers in research fields. The influence of statistical needs is evident in features like built-in functions for common inferential procedures, reflecting the language's origins in addressing the practical demands of scientists and statisticians. Overall, these goals keep R a tool tailored for rigorous, reproducible scientific inquiry rather than for general-purpose programming.
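
The contrast between vectorized code and explicit loops can be illustrated with a small comparison. This is a sketch under simple assumptions: the function names loop_sum_sq() and vec_sum_sq() are hypothetical, and timings will vary by machine, so none are shown.

```r
set.seed(1)
values <- runif(1e5)   # 100,000 random numbers in [0, 1)

# Explicit loop: accumulates the sum of squares one element at a time
loop_sum_sq <- function(v) {
  total <- 0
  for (i in seq_along(v)) {
    total <- total + v[i]^2
  }
  total
}

# Vectorized: squares the whole vector in one operation, then sums
vec_sum_sq <- function(v) sum(v^2)

# Both approaches agree up to floating-point tolerance
all.equal(loop_sum_sq(values), vec_sum_sq(values))

# Compare elapsed times on the current machine
system.time(loop_sum_sq(values))
system.time(vec_sum_sq(values))
```

The vectorized form is also shorter and states the intent directly, which is a large part of why the style is encouraged.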

Basic Syntax

R employs a concise syntax for variable assignment, primarily using the <- operator, though = is also permitted in certain contexts such as top-level assignments. For instance, the expression x <- 5 assigns the numeric value 5 to the variable x, creating or overwriting the object in the current environment. A rightward form, ->, assigns in the opposite direction. Variables in R are dynamically typed, meaning no explicit type declaration is required on assignment; the type of the value (numeric, character, logical, and so on) is determined automatically.

Comments in R are denoted by the # symbol; all text from that point to the end of the line is ignored, which allows code documentation without affecting execution. Expressions are evaluated immediately on entry in interactive mode, such as at the R console, where entering 2 + 3 directly outputs 5 without needing assignment. This immediate evaluation supports interactive exploration, with results printed unless suppressed using functions like invisible().

A key feature of R's syntax is vector recycling during operations on vectors of unequal lengths, where the shorter vector is implicitly repeated to match the longer one. For example, c(1, 2) + 3 recycles the scalar 3 to produce c(4, 5), enabling element-wise arithmetic without explicit looping. A warning is issued if the longer length is not a multiple of the shorter one, drawing attention to potential mismatches.

Workspace management in R involves functions to inspect and modify the objects in an environment. The ls() function lists the objects in the current environment, returning a character vector of their names, such as "x" after the earlier assignment. Conversely, rm() removes specified objects, for example rm(x) to delete x, or rm(list = ls()) to clear the entire workspace.

Basic error handling in R uses the tryCatch() function to capture and manage conditions, allowing code to continue execution despite errors by specifying handlers for conditions like errors or warnings. The call wraps an expression with handler arguments, such as tryCatch(expr, error = function(e) e), providing a mechanism to handle runtime issues gracefully without halting the session.
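
The assignment forms, recycling rule, and tryCatch() pattern described above can be tried together in a short session. This is a minimal sketch; the helper safe_add() is a hypothetical name introduced only for illustration.

```r
x <- 5          # preferred assignment operator
10 -> y         # rightward assignment
z = x + y       # `=` also works for top-level assignment

# Recycling: the shorter operand is repeated to match the longer one
c(1, 2) + 3                   # 4 5
c(1, 2, 3, 4) + c(10, 20)     # 11 22 13 24

# Lengths that are not multiples still recycle but raise a warning;
# this hypothetical helper converts that warning into a NULL result.
safe_add <- function(a, b) {
  tryCatch(
    a + b,
    warning = function(w) {
      message("Caught warning: ", conditionMessage(w))
      NULL
    }
  )
}
safe_add(1:3, 1:2)            # reports the warning, returns NULL

# An error handler lets the script continue after a failure
result <- tryCatch(
  stop("something failed"),
  error = function(e) conditionMessage(e)
)
result                        # "something failed"

ls()     # names of objects in the current environment
rm(x)    # remove x from the current environment
```

Returning a value from the handler, rather than only printing a message, lets the calling code decide what to do with the failure.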

Programming Constructs

Data Types and Structures

R's fundamental data structures are built around vectors, which serve as the core building blocks for handling statistical . Atomic vectors are homogeneous collections of elements of the same basic type, including numeric (double precision), , character, logical (), and complex numbers. They can be created using the c() function to concatenate values, such as x <- c(1.1, 2, 3.3), or seq() for sequences, like y <- seq(1, 10, by=2). These vectors support 1-based indexing for access, e.g., x[2], and enable efficient vectorized operations central to R's design for statistical computing. Atomic vectors can carry attributes, which are metadata providing additional structure without altering the underlying data. Common attributes include names for labeling elements, set via names(x) <- c("first", "second"), and dim for transforming a vector into a multi-dimensional array or matrix, such as dim(z) <- c(2, 3) to create a 2x3 matrix. Attributes are stored as a named list and can be inspected with attributes() or retrieved individually using attr(). Lists provide a flexible, heterogeneous counterpart to atomic vectors, allowing elements of different types within a single ordered collection. Created with list(), as in lst <- list(a=1:3, b="hello", c=TRUE), lists are accessed using double brackets [[ ]] for single elements, e.g., lst[[1]], or the dollar sign $ for named components, like lst$b. This structure supports recursive nesting, making lists suitable for complex, tree-like data representations in statistical applications. Factors are specialized atomic vectors designed for categorical data, storing unique levels as an attribute to represent discrete variables efficiently. They are created using factor(), for example f <- factor(c("low", "high", "low")), which internally uses integers to index levels like c("low", "high"). Levels can be accessed via levels(f), and factors support ordering for ordinal data through ordered() or the ordered argument in factor(). 
The integer-coded design of factors optimizes storage and enables specialized statistical modeling for categories.

Data frames extend lists into tabular formats, combining vectors or factors of equal length as columns to mimic spreadsheet-like structures for multivariate data. Created with data.frame(), such as df <- data.frame(id=1:3, score=c(85, 90, 78), group=factor(c("A", "B", "A"))), they allow row and column access via df[1, 2] or df$score. Functions like nrow(df) and ncol(df) provide dimensions, while attributes such as row.names and names manage identifiers. Because each column may have its own type, data frames are the standard container for datasets in statistical analysis.

Missing values in R are represented by NA for general absence of data, which is coerced to the type of the vector that contains it (typed constants such as NA_real_ exist for numeric data), and by NaN specifically for undefined numeric results like 0/0. These can be included in vectors or other structures, as in v <- c(1, NA, 3). The is.na() function detects both NA and NaN, whereas is.nan() detects only NaN. Proper handling is crucial in statistical computations, because NA propagates through most calculations unless it is explicitly removed.
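A brief sketch of the data frame and missing-value behavior described above, reusing the example objects from the text. The na.rm argument of mean() is a standard base R option for dropping missing values.

```r
# A small data frame with columns of different types
df <- data.frame(
  id = 1:3,
  score = c(85, 90, 78),
  group = factor(c("A", "B", "A"))
)
cell <- df[1, 2]                 # row 1, column 2: 85
scores <- df$score               # the whole score column
dims <- c(nrow(df), ncol(df))    # 3 rows, 3 columns

# Missing values
v <- c(1, NA, 3)
na_mask <- is.na(v)              # FALSE TRUE FALSE
nan_val <- 0 / 0                 # NaN
both <- is.na(nan_val)           # TRUE: is.na() also flags NaN
only_nan <- is.nan(NA)           # FALSE: is.nan() ignores NA

m_with_na <- mean(v)             # NA, because NA propagates
m_clean <- mean(v, na.rm = TRUE) # 2, after dropping the NA
```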

Functions

Functions in R are defined using the function() constructor, which takes a list of formal arguments and a body consisting of one or more R expressions. The syntax is name <- function(arglist) body, where arglist specifies the parameters and body contains the executable code. When called, a function implicitly returns the value of its last evaluated expression, though an explicit return() can be used. For example, square <- function(x) x * x computes the square of its input and returns the result.

Arguments to R functions can be passed positionally, by name, or using partial matching of names. Formal arguments may include default values, specified as arg = default_value, allowing callers to omit them. The ellipsis ... lets a function accept a variable number of arguments, which are then accessible within the function body. For instance, in add <- function(a, b, ...) a + b + sum(...), additional arguments passed via ... are summed and added to the result. Arguments are evaluated lazily through promises, which delay computation until the argument is first accessed in the body.

R employs lexical scoping: free variables in a function are resolved by searching the environment in which the function was defined, proceeding outward through enclosing environments to the global environment and then along the search path. An environment is a collection of named objects (a frame) with a pointer to an enclosing environment, forming the hierarchy that supports this scoping mechanism. Functions are closures, meaning they capture and retain their defining environment and can still access its variables after the function that created them has returned. An example is make.power <- function(n) { pow <- function(x) x^n; pow }, where the returned pow function closes over n from its creation context.

Control flow in R is managed through conditional and iterative constructs.
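The function mechanics described above (implicit return, argument matching, the ellipsis, lazy evaluation, and closures) can be illustrated with a small sketch. The default b = 0 and the helper names lazy and cube are added here for illustration and are not part of the examples in the text.

```r
# Implicit return of the last evaluated expression
square <- function(x) x * x

# Default value for b (an illustrative addition) and variadic arguments
add <- function(a, b = 0, ...) a + b + sum(...)
r1 <- add(1, 2)          # 3, since sum() of no arguments is 0
r2 <- add(1, 2, 3, 4)    # 10, extra arguments flow through ...
r3 <- add(b = 5, a = 1)  # 6, arguments matched by name

# Lazy evaluation: y is never used, so its promise is never forced
lazy <- function(x, y) x
r4 <- lazy(1, stop("never evaluated"))   # 1, no error raised

# Closure: pow keeps access to n after make.power() has returned
make.power <- function(n) {
  pow <- function(x) x^n
  pow
}
cube <- make.power(3)
r5 <- cube(2)            # 8
```

Each call to make.power() creates a fresh environment, so functions built with different n values do not interfere with one another.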
The if statement evaluates a condition and executes one of two expressions: if (cond) true_expr else false_expr, returning the value of the executed branch. Loops include for (var in vector) body for iterating over sequences, while (cond) body for condition-based repetition, and repeat body for indefinite loops (typically exited with break). The switch() function selects an expression based on the value of its first argument, using either a numeric index or a character string to match cases: switch(expr, case1 = value1, case2 = value2). Additionally, the ifelse() function provides vectorized conditional selection.

To promote functional programming and avoid explicit loops, R includes the apply family of functions for implicit iteration over data structures. lapply() applies a function to each element of a list or vector, returning a list of results. sapply() behaves similarly but simplifies the output to a vector or matrix when possible. apply() operates on arrays or matrices along specified margins (e.g., rows or columns). For example, lapply(1:3, function(x) x^2) yields list(1, 4, 9).

Anonymous functions in R are defined inline without assignment to a name, using the same function() syntax, and are commonly passed to higher-order functions like those in the apply family. For instance, sapply(1:5, function(x) x * 2) doubles each element without defining a named function. This lambda-like usage keeps functional compositions concise.
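The control-flow constructs and apply-family functions described above can be sketched as follows. The function names sign_label and explain, and the sample values, are hypothetical and chosen only for illustration.

```r
# if / else returns the value of the branch that runs
sign_label <- function(x) {
  if (x > 0) "positive" else if (x < 0) "negative" else "zero"
}

# for, while, and repeat loops
total <- 0
for (i in 1:5) total <- total + i      # 15

n <- 0
while (n < 3) n <- n + 1               # stops when n reaches 3

k <- 0
repeat {
  k <- k + 1
  if (k >= 4) break                    # explicit exit
}

# switch() on a character string; the unnamed last value is the default
explain <- function(kind) {
  switch(kind, mean = "average", median = "middle value", "unknown")
}

# Vectorized conditional
labels <- ifelse(c(-1, 2, 0) > 0, "pos", "non-pos")

# Apply family with anonymous functions
squares_list <- lapply(1:3, function(x) x^2)   # list(1, 4, 9)
doubled <- sapply(1:5, function(x) x * 2)      # 2 4 6 8 10
m <- matrix(1:6, nrow = 2)
row_sums <- apply(m, 1, sum)                   # 9 12 (margin 1 = rows)
```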

Object-Oriented Features

R implements object-oriented programming primarily through three systems: S3, S4, and Reference Classes (informally known as R5). These systems enable method dispatch based on object classes, facilitating extensible and modular code, particularly in statistical computing.

The S3 system, R's original and simplest object-oriented approach, relies on informal conventions for class assignment and method dispatch. Objects are basic R types augmented with a class attribute, a character vector specifying one or more classes. Generic functions, such as print() or summary(), invoke methods via the UseMethod() function, which dispatches on the class of the first argument. For instance, calling print(x) where class(x) is "foo" looks for a method named print.foo; if none exists, dispatch falls back to print.default. When an object has several classes, a method can call NextMethod() to delegate to the method for the next class in the vector, which is how S3 expresses inheritance. This functional style of OOP, where methods are independent functions, originated in the S language and emphasizes simplicity for base R operations.

In contrast, the S4 system provides a more formal and rigorous framework, suitable for complex packages requiring strict validation and multiple dispatch. Classes are defined using setClass(), specifying slots (data members) with their types and an optional validity function that is checked by validObject(). Methods are registered for generic functions using setMethod(), allowing dispatch on the classes of multiple arguments. For example, a class "MyClass" might include a slot value of type "numeric", so that a non-numeric value is rejected when an object is created. S4 supports inheritance, coercion (as()), and class unions, making it well suited to domain-specific modeling. This system, implemented in the methods package, is widely used in Bioconductor for genomic data structures due to its enforceability and extensibility.

Reference Classes (R5) extend OOP to support mutable objects, addressing a limitation of S3 and S4, whose objects follow R's usual copy-on-modify semantics and therefore behave as immutable values.
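The S3 and S4 mechanisms described above can be compared in a minimal sketch. The class "account", the generics describe and scale_by, and the validity rule are hypothetical names introduced for illustration; setClass(), setGeneric(), setMethod(), and new() are from the methods package.

```r
library(methods)

# S3: a list with a class attribute, plus methods named generic.class
acct <- structure(list(owner = "Ann", balance = 100), class = "account")
describe <- function(obj, ...) UseMethod("describe")
describe.default <- function(obj, ...) "generic object"
describe.account <- function(obj, ...) paste0(obj$owner, ": ", obj$balance)
s3_hit <- describe(acct)        # "Ann: 100", via describe.account
s3_fallback <- describe(42)     # "generic object", via describe.default

# S4: formal class with a typed slot and a validity function
setClass("MyClass",
  slots = c(value = "numeric"),
  validity = function(object) {
    if (length(object@value) == 1) TRUE else "value must have length 1"
  }
)

# Dispatch on the classes of both arguments
setGeneric("scale_by", function(obj, k) standardGeneric("scale_by"))
setMethod("scale_by", signature("MyClass", "numeric"),
          function(obj, k) obj@value * k)

m <- new("MyClass", value = 2)
s4_result <- scale_by(m, 3)     # 6

# A character value violates the slot type, so new() signals an error
bad <- tryCatch(new("MyClass", value = "a"), error = function(e) "rejected")
```

The S3 half needs no declarations at all, while the S4 half fails early on a malformed object, which is the trade-off between the two systems.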
Reference Classes are defined via setRefClass() and encapsulate fields and methods in an environment, allowing direct state modification without copying: changes made through one reference are visible through all aliases. Fields and methods are accessed through $ (e.g., obj$field <- 5); the @ operator, which accesses S4 slots, is discouraged for these objects outside internals. Because assignment copies only the reference, an independent duplicate must be requested explicitly with the copy() method. These classes are useful for stateful simulations or iterative algorithms.

S3 prioritizes simplicity and is prevalent in base R for quick extensions, while S4 offers formality for robust, validated hierarchies in specialized packages such as those of Bioconductor; Reference Classes fill the gap for mutability that both leave open. In statistical contexts, model-fitting functions like lm() and glm() show S3 dispatch in practice: lm() fits linear models and returns an object of class "lm", so that generics such as summary() dispatch to class-specific methods such as summary.lm() for tailored output. Similarly, glm() fits generalized linear models and returns an object of class c("glm", "lm"), so dispatch reaches "glm" methods first and falls back to "lm" methods, enabling seamless integration of custom behaviors.
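The reference semantics and the S3 dispatch on model objects described above can be sketched as follows. The Counter class and its increment method are hypothetical; setRefClass() and the copy() method come from the methods package, and mtcars is a dataset shipped with R.

```r
library(methods)

# A Reference Class with one field and one mutating method
Counter <- setRefClass("Counter",
  fields = list(count = "numeric"),
  methods = list(
    increment = function() {
      count <<- count + 1      # <<- modifies the field in place
      invisible(.self)
    }
  )
)

a <- Counter$new(count = 0)
b <- a                         # b is an alias, not a copy
b$increment()
alias_count <- a$count         # 1: the change is visible through a

d <- a$copy()                  # explicit, independent duplicate
d$increment()
orig_count <- a$count          # still 1
copy_count <- d$count          # 2

# S3 dispatch on a fitted model object
fit <- lm(mpg ~ wt, data = mtcars)
fit_class <- class(fit)        # "lm"
s <- summary(fit)              # dispatches to summary.lm()
sum_class <- class(s)          # "summary.lm"
```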

Specialized Features

Pipe Operator

The native pipe operator |> was introduced in R version 4.1.0, released in May 2021, providing a built-in mechanism to chain operations by passing the value on the left-hand side (LHS) of the operator as the first argument of the function call on the right-hand side (RHS). This syntax improves readability, particularly for sequential data transformations, by avoiding nested function calls or intermediate variable assignments; for instance, mean(1:10) can be rewritten as 1:10 |> mean(). Unlike earlier approaches that relied on external packages, the native pipe is part of base R, eliminating the need for additional dependencies and offering slight performance advantages because it is implemented as syntax rather than as a function call.

In contrast to the %>% operator from the magrittr package, which has popularized piping in R since 2014, the native |> operator is more restrictive in some respects but integrates seamlessly with base R features. For example, while magrittr's placeholder . can be used flexibly in multiple positions or with operators without explicit naming, the native pipe defaults to the first argument and requires the _ placeholder (introduced in R 4.2.0) for other positions, where it must be passed as a named argument, such as mtcars |> lm(mpg ~ wt, data = _). Alternatively, an anonymous function can be used for custom placement, but it must be wrapped in parentheses and called, as in x |> (function(y) f(a, y))(), because the RHS has to be a function call. This design avoids the overhead of magrittr's more versatile but computationally costlier implementation, making |> preferable for performance-critical code.

The pipe excels in data manipulations, such as filtering, mutating, and summarizing datasets, promoting a linear, readable flow that aligns with R's statistical workflow; the tidyverse style guide recommends using |> to emphasize a sequence of actions rather than the object being acted on.
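A short sketch of the native pipe forms discussed above. It assumes R 4.2.0 or later, because of the _ placeholder; the sample values are illustrative.

```r
# LHS becomes the first argument of the RHS call
avg <- 1:10 |> mean()                    # same as mean(1:10): 5.5

# Chaining several steps reads left to right
root_sum <- c(4, 9, 16) |> sqrt() |> sum()   # 2 + 3 + 4 = 9

# The _ placeholder (R >= 4.2.0) must be passed as a named argument
fit <- mtcars |> lm(mpg ~ wt, data = _)

# Anonymous function for custom placement: parenthesize it and call it
tagged <- 2 |> (function(y) paste("value", y))()   # "value 2"
```

The trailing () after the parenthesized function is required, since the native pipe only accepts a function call on its right-hand side.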
Best practices include limiting chains to roughly 4-6 steps to maintain clarity, breaking long pipelines across multiple lines for readability (the native pipe, unlike magrittr's, does not accept a bare {} block as its RHS), and avoiding overuse in loops or conditional branches where traditional control structures may be more appropriate. A pipeline can also end in a function called only for its side effects, such as printing or plotting; whatever that function returns becomes the overall result, so invisible() can be used to suppress auto-printing if needed.

The native pipe requires R 4.1.0 or later; in older versions, compatibility is achieved by substituting magrittr's %>%, which offers similar functionality but requires loading the package. Because |> is syntax rather than a function, no package can backport it, but the magrittr equivalent keeps code portable across R installations.

Statistical and Mathematical Functions

R provides a comprehensive suite of built-in functions for mathematical computations and statistical analysis, which are vectorized to operate efficiently on arrays and data structures such as vectors and matrices. These functions form the core of R's utility for numerical and probabilistic tasks, enabling users to perform calculations without external dependencies.

Mathematical functions in R include elementary operations like trigonometric, logarithmic, and exponential computations. The sin(), cos(), and tan() functions compute the sine, cosine, and tangent of angles in radians, respectively, and apply element-wise to vectors; for example, sin(pi/2) returns 1. The log() function calculates the natural logarithm (base e) by default, with an optional base parameter for other bases, such as log(10, base=10) returning 1. Similarly, exp() computes the exponential function $e^x$, as in exp(1) yielding approximately 2.718.

For linear algebra, the solve() function solves systems of linear equations $Ax = b$ for $x$, where $A$ is a square matrix and $b$ a vector or matrix; if $b$ is omitted, it returns the inverse of $A$, using LAPACK routines for numerical stability. An example is solve(matrix(c(1,2,3,4), nrow=2)), which inverts a 2x2 matrix. These operations support complex numbers and preserve row/column names.

Statistical summary functions offer quick computations of central tendency and variability. The mean() function calculates the arithmetic mean of a numeric vector, equivalent to $\sum x_i / n$, as in mean(c(1,2,3)) returning 2. The sd() function computes the sample standard deviation, $\sqrt{\sum (x_i - \bar{x})^2 / (n-1)}$.
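The mathematical and summary functions mentioned above can be checked with a small sketch. The right-hand side b and the vector vals are illustrative values, and manual_sd simply restates the sample standard deviation formula for comparison with sd().

```r
# Elementary functions (vectorized, angles in radians)
s <- sin(pi / 2)               # 1
l <- log(10, base = 10)        # 1
e <- exp(1)                    # approximately 2.718282

# Linear algebra: matrix() fills column by column
A <- matrix(c(1, 2, 3, 4), nrow = 2)   # rows are (1, 3) and (2, 4)
A_inv <- solve(A)              # inverse of A
b <- c(5, 6)
x <- solve(A, b)               # solves A %*% x = b, giving (-1, 2)

# Summary statistics
vals <- c(1, 2, 3)
m <- mean(vals)                # 2
sdev <- sd(vals)               # 1, using the n - 1 denominator
manual_sd <- sqrt(sum((vals - mean(vals))^2) / (length(vals) - 1))
```

Comparing floating-point results with all.equal() rather than == is the safer habit when checking such computations.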