Uniform Resource Identifier
from Wikipedia
Abbreviation: URI
Native name: RFC 3986[1]
Status: Active
Organization: Internet Engineering Task Force
Domain: World Wide Web
Website: datatracker.ietf.org/doc/html/rfc3986#section-1.1

A Uniform Resource Identifier (URI), formerly Universal Resource Identifier, is a unique sequence of characters that identifies an abstract or physical resource,[1]: 1  such as a resource on a webpage, an email address, a phone number,[1]: 7  a book, a real-world object such as a person or place, or a concept.[1]: 5 

URIs which provide a means of locating and retrieving information resources on a network (either on the Internet or on another private network, such as a computer file system or an Intranet) are Uniform Resource Locators (URLs). Therefore, URLs are a subset of URIs, i.e. every URL is a URI (and not necessarily the other way around).[1]: 7  Other URIs provide only a unique name, without a means of locating or retrieving the resource or information about it; these are Uniform Resource Names (URNs). The web technologies that use URIs are not limited to web browsers.

History

Conception

URIs and URLs have a shared history. In 1990, Tim Berners-Lee's proposals for hypertext implicitly introduced the idea of a URL as a short string representing a resource that is the target of a hyperlink.[2] At the time, people referred to it as a "hypertext name"[3] or "document name".

Over the next three and a half years, as the World Wide Web's core technologies of HTML, HTTP, and web browsers developed, a need to distinguish a string that provided an address for a resource from a string that merely named a resource emerged. Although not yet formally defined, the term Uniform Resource Locator came to represent the former, and the more contentious Uniform Resource Name came to represent the latter. In July 1992 Berners-Lee's report on the Internet Engineering Task Force (IETF) "UDI (Universal Document Identifiers) BOF" mentions URLs (as Uniform Resource Locators), URNs (originally, as Unique Resource Numbers), and the need to charter a new working group.[4] In November 1992 the IETF "URI Working Group" met for the first time.[5]

During the debate over defining URLs and URNs, it became evident that the concepts embodied by the two terms were merely aspects of the fundamental, overarching notion of resource identification. In June 1994, the IETF published RFC 1630, Berners-Lee's first Request for Comments that acknowledged the existence of URLs and URNs. Most importantly, it defined a formal syntax for Universal Resource Identifiers (i.e. URL-like strings whose precise syntaxes and semantics depended on their schemes). It also attempted to summarize the syntaxes of URL schemes in use at the time. It acknowledged, but did not standardize, the existence of relative URLs and fragment identifiers.[6]

Refinement

In December 1994, RFC 1738[7] formally defined relative and absolute URLs, refined the general URL syntax, defined how to resolve relative URLs to absolute form, and better enumerated the URL schemes then in use. The agreed definition and syntax of URNs had to wait until the publication of IETF RFC 2141[8] in May 1997.

The publication of IETF RFC 2396[9] in August 1998 saw the URI syntax become a separate specification[9] and most of the parts of RFCs 1630 and 1738 relating to URIs and URLs in general were revised and expanded by the IETF. The new RFC changed the meaning of U in URI from "Universal" to "Uniform."

In December 1999, RFC 2732[10] provided a minor update to RFC 2396, allowing URIs to accommodate IPv6 addresses. A number of shortcomings discovered in the two specifications led to a community effort, coordinated by RFC 2396 co-author Roy Fielding, that culminated in the publication of IETF RFC 3986[1] in January 2005. While obsoleting the prior standard, it did not render the details of existing URL schemes obsolete; RFC 1738 continues to govern such schemes except where otherwise superseded. IETF RFC 2616[11] for example, refines the http scheme. Simultaneously, the IETF published the content of RFC 3986 as the full standard STD 66, reflecting the establishment of the URI generic syntax as an official Internet protocol.

In 2001, the World Wide Web Consortium's (W3C) Technical Architecture Group (TAG) published a guide to best practices and canonical URIs for publishing multiple versions of a given resource.[12] For example, content might differ by language or by size to adjust for capacity or settings of the device used to access that content.

In August 2002, IETF RFC 3305[13] pointed out that the term "URL" had, despite widespread public use, faded into near obsolescence, and served only as a reminder that some URIs act as addresses by having schemes implying network accessibility, regardless of any such actual use. As URI-based standards such as Resource Description Framework make evident, resource identification need not suggest the retrieval of resource representations over the Internet, nor need it imply network-based resources at all.

The Semantic Web uses the HTTP URI scheme to identify both documents and concepts for practical uses, which has caused confusion as to how to distinguish the two. The TAG published an e-mail in 2005 with a solution to the problem, which became known as the httpRange-14 resolution.[14] The W3C subsequently published an Interest Group Note titled "Cool URIs for the Semantic Web", which explained the use of content negotiation and the HTTP 303 response code for redirections in more detail.[15]

Design

URLs and URNs

A Uniform Resource Name (URN) is a URI that identifies a resource by name in a particular namespace. A URN may be used to talk about a resource without implying its location or how to access it. For example, in the International Standard Book Number (ISBN) system, ISBN 0-486-27557-4 identifies a specific edition of the William Shakespeare play Romeo and Juliet. The URN for that edition would be urn:isbn:0-486-27557-4. However, it gives no information as to where to find a copy of that book.

A Uniform Resource Locator (URL) is a URI that specifies the means of acting upon or obtaining the representation of a resource, i.e. specifying both its primary access mechanism and network location. For example, the URL http://example.org/wiki/Main_Page refers to a resource identified as /wiki/Main_Page, whose representation is obtainable via the Hypertext Transfer Protocol (http:) from a network host whose domain name is example.org. (In this case, HTTP usually implies it to be in the form of HTML and related code. In practice, that is not necessarily the case, as HTTP allows specifying arbitrary formats in its header.)

A URN is analogous to a person's name, while a URL is analogous to their street address. In other words, a URN identifies an item and a URL provides a method for finding it.

Technical publications, especially standards produced by the IETF and by the W3C, normally reflect a view outlined in a W3C Recommendation of 30 July 2001, which acknowledges the precedence of the term URI rather than endorsing any formal subdivision into URL and URN.

URL is a useful but informal concept: a URL is a type of URI that identifies a resource via a representation of its primary access mechanism (e.g., its network "location"), rather than by some other attributes it may have.[16]

As such, a URL is simply a URI that happens to point to a resource over a network.[a][13] However, in non-technical contexts and in software for the World Wide Web, the term "URL" remains widely used. Additionally, the term "web address" (which has no formal definition) often occurs in non-technical publications as a synonym for a URI that uses the http or https schemes. Such assumptions can lead to confusion, for example, in the case of XML namespaces that have a visual similarity to resolvable URIs.

Specifications produced by the WHATWG prefer URL over URI, and so newer HTML5 APIs use URL over URI.[17]

Standardize on the term URL. URI and IRI [Internationalized Resource Identifier] are just confusing. In practice a single algorithm is used for both so keeping them distinct is not helping anyone. URL also easily wins the search result popularity contest.[18]

While most URI schemes were originally designed to be used with a particular protocol, and often have the same name, they are semantically different from protocols. For example, the scheme http is generally used for interacting with web resources using HTTP, but the scheme file has no protocol.

Syntax

A URI has a scheme that refers to a specification for assigning identifiers within that scheme. As such, the URI syntax is a federated and extensible naming system wherein each scheme's specification may further restrict the syntax and semantics of identifiers using that scheme. The URI generic syntax is a superset of the syntax of all URI schemes. It was first defined in RFC 2396, published in August 1998,[9] and finalized in RFC 3986, published in January 2005.[19]

A URI is composed from an allowed set of ASCII characters consisting of reserved characters (gen-delims: :, /, ?, #, [, ], and @; sub-delims: !, $, &, ', (, ), *, +, ,, ;, and =),[1]: 13–14  unreserved characters (uppercase and lowercase letters, decimal digits, -, ., _, and ~),[1]: 13–14  and the character %.[1]: 12  Syntax components and subcomponents are separated by delimiters from the reserved characters (only from generic reserved characters for components) and define identifying data represented as unreserved characters, reserved characters that do not act as delimiters in the component and subcomponent respectively,[1]: §2  and percent-encodings when the corresponding character is outside the allowed set or is being used as a delimiter of, or within, the component. A percent-encoding of an identifying data octet is a sequence of three characters, consisting of the character % followed by the two hexadecimal digits representing that octet's numeric value.[1]: §2.1 
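
The percent-encoding rule above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function names `percent_encode` and `percent_decode` are hypothetical, the sketch treats its whole input as data (so every reserved character is encoded too), and it assumes UTF-8 is used to turn characters into octets, which the paragraph above does not specify.

```python
# Unreserved characters: letters, decimal digits, "-", ".", "_" and "~".
UNRESERVED = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-._~"
)


def percent_encode(text: str) -> str:
    """Encode every octet that is not an unreserved character as %XX."""
    out = []
    for octet in text.encode("utf-8"):  # assumption: UTF-8 octets
        ch = chr(octet)
        if ch in UNRESERVED:
            out.append(ch)
        else:
            # "%" followed by the two hexadecimal digits of the octet value
            out.append(f"%{octet:02X}")
    return "".join(out)


def percent_decode(text: str) -> str:
    """Reverse percent_encode; assumes every "%" starts a valid triplet."""
    raw = bytearray()
    i = 0
    while i < len(text):
        if text[i] == "%":
            raw.append(int(text[i + 1:i + 3], 16))
            i += 3
        else:
            raw.extend(text[i].encode("utf-8"))
            i += 1
    return raw.decode("utf-8")


print(percent_encode("a b/c?"))
print(percent_decode("a%20b%2Fc%3F"))
```

In practice a component-aware encoder would leave reserved characters alone where they are allowed as data; Python's `urllib.parse.quote` exposes that choice through its `safe` parameter.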

The URI generic syntax consists of five components organized hierarchically in order of decreasing significance from left to right:[1]: §3 

URI = scheme ":" ["//" authority] path ["?" query] ["#" fragment]

A component is undefined if it has an associated delimiter and the delimiter does not appear in the URI; the scheme and path components are always defined.[1]: §5.2.1  A component is empty if it has no characters; the scheme component is always non-empty.[1]: §3 
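
The difference between an undefined and an empty component can be made concrete with the regular expression that RFC 3986 gives in its Appendix B for splitting a URI reference. The sketch below is illustrative only: `split_components` is a hypothetical helper name, and the expression is a lenient splitter that does not validate the characters of each component. An undefined component comes back as `None`, an empty one as `""`.

```python
import re

# Component-splitting expression from RFC 3986, Appendix B.
URI_REFERENCE = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
)


def split_components(ref: str) -> dict:
    """Split a URI reference into its five generic components."""
    m = URI_REFERENCE.match(ref)  # every group is optional, so this matches
    return {
        "scheme": m.group(2),
        "authority": m.group(4),
        "path": m.group(5),
        "query": m.group(7),
        "fragment": m.group(9),
    }


# Authority is defined but empty; query and fragment are undefined.
print(split_components("file:///etc/hosts"))
# Authority is undefined; query is defined but empty.
print(split_components("mailto:user@example.org?"))
```

The path is never `None` here, which matches the statement that the path component is always defined even when it is empty.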

The authority component consists of subcomponents:

authority = [userinfo "@"] host [":" port]

[Figure: URI syntax diagram]

The URI comprises:

  • A non-empty scheme component followed by a colon (:), consisting of a sequence of characters beginning with a letter and followed by any combination of letters, digits, plus (+), period (.), or hyphen (-). Although schemes are case-insensitive, the canonical form is lowercase and documents that specify schemes must do so with lowercase letters. Examples of popular schemes include http, https, ftp, mailto, file, data and irc. URI schemes should be registered with the Internet Assigned Numbers Authority (IANA), although non-registered schemes are used in practice.[20]
  • An optional authority component preceded by two slashes (//), comprising:
    • An optional userinfo subcomponent followed by an at symbol (@), that may consist of a user name and an optional password preceded by a colon (:). Use of the format username:password in the userinfo subcomponent is deprecated for security reasons. Applications should not render as clear text any data after the first colon (:) found within a userinfo subcomponent unless the data after the colon is the empty string (indicating no password).
    • A host subcomponent, consisting of either a registered name (including but not limited to a hostname) or an IP address. IPv4 addresses must be in dot-decimal notation, and IPv6 addresses must be enclosed in brackets ([]).[1]: §3.2.2 [b]
    • An optional port subcomponent preceded by a colon (:), consisting of decimal digits.
  • A path component, consisting of a sequence of path segments separated by a slash (/). A path is always defined for a URI, though the defined path may be empty (zero length). A segment may also be empty, resulting in two consecutive slashes (//) in the path component. A path component may resemble or map exactly to a file system path but does not always imply a relation to one. If an authority component is defined, then the path component must either be empty or begin with a slash (/). If an authority component is undefined, then the path cannot begin with an empty segment—that is, with two slashes (//)—since the following characters would be interpreted as an authority component.[9]: §3.3 
    By convention, in http and https URIs, the last part of a path is named pathinfo and it is optional. It is composed of zero or more path segments that do not refer to an existing physical resource name (e.g. a file, an internal module program or an executable program) but to a logical part (e.g. a command or a qualifier part) that has to be passed separately to the first part of the path that identifies an executable module or program managed by a web server; this is often used to select dynamic content (a document, etc.) or to tailor it as requested (see also: CGI and PATH_INFO, etc.).
    Example:
    URI: "http://www.example.com/questions/3456/my-document"
    where: "/questions" is the first part of the path (an executable module or program) and "/3456/my-document" is the second part of the path named pathinfo, which is passed to the executable module or program named "/questions" to select the requested document.
    An http or https URI containing a pathinfo part without a query part may also be referred to as a 'clean URL,' whose last part may be a 'slug.'
  • An optional query component preceded by a question mark (?), consisting of a query string of non-hierarchical data. Its syntax is not well defined, but by convention is most often a sequence of attribute–value pairs separated by a delimiter.
    Query delimiter     Example
    Ampersand (&)       key1=value1&key2=value2
    Semicolon (;)[c]    key1=value1;key2=value2
  • An optional fragment component preceded by a hash (#). The fragment contains a fragment identifier providing direction to a secondary resource, such as a section heading in an article identified by the remainder of the URI. When the primary resource is an HTML document, the fragment is often an id attribute of a specific element, and web browsers will scroll this element into view.
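
As a rough sketch of the attribute–value convention for query strings described in the list above, the following Python splits a query on a chosen delimiter and decodes each pair. The helper name `parse_query` is hypothetical, and the behaviour for edge cases (empty fields are skipped, a key without "=" gets an empty value) is an assumption of this sketch, since the query syntax itself is not formally defined.

```python
from urllib.parse import unquote


def parse_query(query: str, delimiter: str = "&") -> list[tuple[str, str]]:
    """Split a query string into decoded (key, value) pairs."""
    pairs = []
    for field in query.split(delimiter):
        if not field:
            continue  # assumption: ignore empty fields such as "a=1&&b=2"
        key, _, value = field.partition("=")
        # Decode after splitting so an encoded delimiter stays part of the data.
        pairs.append((unquote(key), unquote(value)))
    return pairs


print(parse_query("key1=value1&key2=value2"))
print(parse_query("key1=value1;key2=value2", delimiter=";"))
print(parse_query("q=a%26b"))
```

Splitting before decoding is the important design choice: a percent-encoded `%26` is data, while a literal `&` is a delimiter.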

The scheme- or implementation-specific reserved character + may be used in the scheme, userinfo, host, path, query, and fragment, and the scheme- or implementation-specific reserved characters !, $, &, ', (, ), *, ,, ;, and = may be used in the userinfo, host, path, query, and fragment. Additionally, the generic reserved character : may be used in the userinfo, path, query and fragment, the generic reserved characters @ and / may be used in the path, query and fragment, and the generic reserved character ? may be used in the query and fragment.[1]: §A 

Example URIs

[Figure: example URIs and their component parts]

DOIs (digital object identifiers) fit within the Handle System and fit within the URI system, as facilitated by appropriate syntax.

URI references

A URI reference is either a URI or a relative reference; it is a relative reference when it does not begin with a scheme component followed by a colon (:).[1]: §4.1  A path segment that contains a colon character (e.g., foo:bar) cannot be used as the first path segment of a relative reference if its path component does not begin with a slash (/), as it would be mistaken for a scheme component. Such a path segment must be preceded by a dot path segment (e.g., ./foo:bar).[1]: §4.2 
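
The colon rule can be observed with Python's standard `urllib.parse.urlsplit`, used here as a stand-in for a URI parser. Its scheme detection is close to, but not guaranteed to be identical with, the RFC 3986 grammar, and `is_relative_reference` is a hypothetical helper name.

```python
from urllib.parse import urlsplit


def is_relative_reference(ref: str) -> bool:
    """True when the reference does not start with a scheme and a colon."""
    return urlsplit(ref).scheme == ""


# "foo" is taken for a scheme, so this is parsed as a URI.
print(is_relative_reference("foo:bar"))
# The leading dot segment keeps the colon inside the path.
print(is_relative_reference("./foo:bar"))
```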

Web document markup languages frequently use URI references to point to other resources, such as external documents or specific portions of the same logical document:[1]: §4.4 

  • in HTML, the value of the src attribute of the img element provides a URI reference, as does the value of the href attribute of the a or link element;
  • in XML, the system identifier appearing after the SYSTEM keyword in a DTD is a fragmentless URI reference;
  • in XSLT, the value of the href attribute of the xsl:import element/instruction is a URI reference; likewise the first argument to the document() function.

Examples of URI references:

https://example.com/path/resource.txt#fragment
//example.com/path/resource.txt
/path/resource.txt
path/resource.txt
../resource.txt
./resource.txt
resource.txt
#fragment

Resolution

Resolving a URI reference against a base URI results in a target URI. This implies that the base URI exists and is an absolute URI (a URI with no fragment component). The base URI can be obtained, in order of precedence, from:[1]: §5.1 

  • the reference URI itself if it is a URI;
  • the content of the representation;
  • the entity encapsulating the representation;
  • the URI used for the actual retrieval of the representation;
  • the context of the application.

Within a representation with a well defined base URI of

http://a/b/c/d;p?q

a relative reference is resolved to its target URI as follows:[1]: §5.4 

"g:h"     -> "g:h"
"g"       -> "http://a/b/c/g"
"./g"     -> "http://a/b/c/g"
"g/"      -> "http://a/b/c/g/"
"/g"      -> "http://a/g"
"//g"     -> "http://g"
"?y"      -> "http://a/b/c/d;p?y"
"g?y"     -> "http://a/b/c/g?y"
"#s"      -> "http://a/b/c/d;p?q#s"
"g#s"     -> "http://a/b/c/g#s"
"g?y#s"   -> "http://a/b/c/g?y#s"
";x"      -> "http://a/b/c/;x"
"g;x"     -> "http://a/b/c/g;x"
"g;x?y#s" -> "http://a/b/c/g;x?y#s"
""        -> "http://a/b/c/d;p?q"
"."       -> "http://a/b/c/"
"./"      -> "http://a/b/c/"
".."      -> "http://a/b/"
"../"     -> "http://a/b/"
"../g"    -> "http://a/b/g"
"../.."   -> "http://a/"
"../../"  -> "http://a/"
"../../g" -> "http://a/g"

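Several rows of the table above can be reproduced with `urllib.parse.urljoin` from the Python standard library, which performs reference resolution for the common web schemes. The sketch below checks only a subset of the rows; `resolve_all` and `EXPECTED` are names introduced for this illustration.

```python
from urllib.parse import urljoin

BASE = "http://a/b/c/d;p?q"

# A subset of the resolution examples: reference -> expected target URI.
EXPECTED = {
    "g:h": "g:h",
    "g": "http://a/b/c/g",
    "./g": "http://a/b/c/g",
    "g/": "http://a/b/c/g/",
    "/g": "http://a/g",
    "//g": "http://g",
    "?y": "http://a/b/c/d;p?y",
    "g?y": "http://a/b/c/g?y",
    "#s": "http://a/b/c/d;p?q#s",
    "../g": "http://a/b/g",
    "../../g": "http://a/g",
}


def resolve_all(base: str, refs) -> dict:
    """Resolve each reference against the base URI."""
    return {ref: urljoin(base, ref) for ref in refs}


for ref, target in resolve_all(BASE, EXPECTED).items():
    print(f"{ref!r:10} -> {target!r}")
```

Comparing the returned dictionary with `EXPECTED` is a quick way to check a resolver against the table.
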
URL munging

URL munging is a technique by which a command is appended to a URL, usually at the end, after a "?" token. It is commonly used in WebDAV as a mechanism of adding functionality to HTTP. In a versioning system, for example, a "checkout" command is added to a URL by writing it as http://editing.com/resource/file.php?command=checkout. The technique is easy for CGI parsers to handle, and in this case the command also acts as an intermediary between HTTP and the underlying resource.[24]
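
A minimal sketch of the technique, using Python's `urllib.parse`: the helper name `add_command` and the parameter name `command` follow the example URL above but are otherwise assumptions of this illustration, not part of WebDAV or any specification.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_command(url: str, command: str) -> str:
    """Append a command=<value> pair to the query component of a URL."""
    parts = urlsplit(url)
    # Keep any existing pairs, then add the command at the end.
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    pairs.append(("command", command))
    return urlunsplit(parts._replace(query=urlencode(pairs)))


print(add_command("http://editing.com/resource/file.php", "checkout"))
```

Rebuilding the URL from its parsed components, rather than concatenating strings, keeps an existing query or fragment intact.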

Relation to XML namespaces

In XML, a namespace is an abstract domain to which a collection of element and attribute names can be assigned. The namespace name is a character string which must adhere to the generic URI syntax.[25] However, the name is generally not considered to be a URI,[26] because the URI specification bases the decision not only on lexical components, but also on their intended use. A namespace name does not necessarily imply any of the semantics of URI schemes; for example, a namespace name beginning with http: may have no connection to the use of HTTP.
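
The point that a namespace name is only a string can be seen with Python's standard `xml.etree.ElementTree`, which attaches the namespace name to each tag as `{name}local` and never retrieves it. The document and the namespace name `http://example.org/ns` below are made up for illustration, and `namespace_of` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; the namespace name only looks like an HTTP URL.
DOC = '<doc xmlns="http://example.org/ns"><item/></doc>'


def namespace_of(tag: str) -> str:
    """Extract the namespace name from an ElementTree '{name}local' tag."""
    if tag.startswith("{"):
        return tag[1:].partition("}")[0]
    return ""


root = ET.fromstring(DOC)  # parsing makes no network request
print(root.tag)
print(namespace_of(root.tag))
```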

Originally, the namespace name could match the syntax of any non-empty URI reference, but the use of relative URI references was deprecated by the W3C.[27] A separate W3C specification for namespaces in XML 1.1 permits Internationalized Resource Identifier (IRI) references to serve as the basis for namespace names in addition to URI references.[28]

from Grokipedia
A Uniform Resource Identifier (URI) is a compact sequence of characters that identifies an abstract or physical resource. URIs provide a simple and extensible means for identifying such resources, enabling uniform interpretation across different contexts and schemes on the Internet. The generic syntax of a URI consists of a scheme (e.g., "http" or "urn"), followed by a hierarchical part that may include an authority (such as a host and port), a path, an optional query for non-hierarchical data, and an optional fragment identifier for a secondary resource. This structure allows URIs to function as locators, names, or both, without implying that the resource is accessible or retrievable. For instance, common schemes include "http" for web resources, "ftp" for file transfers, and "mailto" for email addresses. URIs encompass two main subsets: Uniform Resource Locators (URLs), which identify resources while providing a primary access mechanism (e.g., network location), and Uniform Resource Names (URNs), which offer globally unique and persistent names under the "urn" scheme, even if the resource becomes unavailable. Originating from the World Wide Web project in 1990, URIs evolved through standards like RFC 1630 and were formalized in RFC 3986 to support global resource identification independent of specific protocols.

Fundamentals

Definition and Purpose

A Uniform Resource Identifier (URI) is a compact sequence of characters that identifies an abstract or physical resource. This standardized string enables the unique referencing of entities such as documents, services, or concepts within networked systems, without necessarily implying direct access or location. The primary purpose of a URI is to facilitate across diverse information systems by providing a simple, universal mechanism for naming and referencing resources unambiguously. It supports a federated naming approach, allowing different protocols and schemes to coexist while ensuring consistent identification. Key characteristics include its , which aids in easy transcription and ; extensibility, permitting scheme-specific extensions without disrupting the overall framework; and scheme-based identification, where a leading scheme (e.g., "http") dictates the syntax and semantics for the remainder of the identifier. URIs originated in the early 1990s to address naming inconsistencies arising from the proliferation of protocols and systems for document retrieval on the nascent . For instance, the URI "http://" identifies a specific , distinguishing it from other identifiers by its scheme and path components. While URIs form the basis for subtypes like Uniform Resource Locators (URLs) and Uniform Resource Names (URNs), they provide a general framework for resource identification.

Components and Syntax Overview

A Uniform Resource Identifier (URI) follows a generic syntax that structures its components to enable uniform identification of resources across different schemes. The overall form is scheme : hier-part [ ? query ] [ # fragment ], where the scheme specifies the protocol or naming system, the hierarchical part often includes an authority and path, and optional query and fragment components provide additional data or references. This syntax ensures by defining how components delimit and encode .

The scheme component identifies the URI's naming or protocol scheme, such as http or mailto, and consists of a sequence starting with an alphabetic character followed by alphanumeric characters, plus, period, or hyphen. It is followed by a colon (:) and determines how the rest of the URI is interpreted.

The authority component, when present, begins with two slashes (//) and represents a hierarchical addressing ; it includes an optional userinfo (credentials like username and password, in the form userinfo@), a required host (domain name, IP address, or literal), and an optional port (a number for service identification). For example, in example.com:8080, example.com is the host and 8080 the port.

The path follows the authority (or the scheme if there is no authority) and denotes the resource's hierarchical location, composed of segments separated by slashes (/), such as /documents/file.txt. The optional query component, introduced by a question mark (?), carries non-hierarchical parameters in key-value pairs, like key=value&other=param. Finally, the fragment identifier, starting with a hash (#), points to a secondary resource or internal section within the primary resource, such as #summary.

The generic syntax is formally defined using Augmented Backus-Naur Form (ABNF) in RFC 3986. A simplified excerpt of the ABNF grammar for a URI is as follows:

URI       = scheme ":" hier-part [ "?" query ] [ "#" fragment ]
hier-part = "//" authority path-abempty
          / path-absolute
          / path-rootless
          / path-empty
authority = [ userinfo "@" ] host [ ":" port ]
scheme    = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
query     = *( pchar / "/" / "?" )
fragment  = *( pchar / "/" / "?" )

Here, pchar represents path characters: unreserved characters, percent-encoded octets, sub-delims, :, and @. This notation specifies the allowable structure and characters for each part.

URIs distinguish between reserved and unreserved characters to separate delimiters from data. Reserved characters include generic delimiters (: / ? # [ ] @) and sub-delimiters (! $ & ' ( ) * + , ; =), which may have special meanings in certain components and must be percent-encoded if used as data. Unreserved characters (letters, digits, hyphen (-), period (.), underscore (_), and tilde (~)) can appear without encoding. Percent-encoding represents characters outside this set (or reserved ones used as data) as a percent sign (%) followed by two hexadecimal digits, e.g., a space as %20, or a non-ASCII character as the percent-encoded octets of its UTF-8 encoding. This ensures safe transmission across systems; a percent-encoded unreserved character is equivalent to its decoded form.

To illustrate, consider the URI https://user:pass@example.com:8080/path?key=value#section:
  • Scheme: https (specifies secure HTTP protocol).
  • Authority: user:pass@example.com:8080 (userinfo user:pass, host example.com, port 8080).
  • Path: /path (hierarchical resource location).
  • Query: key=value (parameters for the request).
  • Fragment: section (internal reference within the resource).
If the URI contained a space in the path, it would be encoded as %20 to comply with syntax rules.
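
The same breakdown can be obtained with `urllib.parse.urlsplit` from the Python standard library. The sketch below is illustrative; `describe` is a hypothetical helper, and the path `/my path` in the last line is a made-up example of a path containing a space.

```python
from urllib.parse import quote, urlsplit


def describe(uri: str) -> dict:
    """Return the components and authority subcomponents of a URI."""
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,
        "authority": parts.netloc,
        "username": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,
        "path": parts.path,
        "query": parts.query,
        "fragment": parts.fragment,
    }


for name, value in describe(
    "https://user:pass@example.com:8080/path?key=value#section"
).items():
    print(f"{name:10} {value!r}")

# A space in a path is percent-encoded; "/" is left alone by default.
print(quote("/my path"))
```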

History

Conception

The foundational concepts for Uniform Resource Identifiers (URIs), including the addressing system now known as URLs, were developed by Tim Berners-Lee in late 1990 as part of his implementation of the first World Wide Web prototype at CERN. This work proposed a unified naming system to reference resources across the , influenced by hierarchical naming conventions in earlier systems such as the directory services and the Domain Name System (DNS). The public conception of the URI syntax emerged in early 1992 through Berners-Lee's Universal Document Identifier (UDI) proposal, which outlined a generic structure to address the growing need for consistent resource referencing.

The primary motivation was the fragmentation in internet addressing schemes during the early 1990s, which hindered seamless hypertext linking in the World Wide Web project. Protocols like FTP, Gopher, WAIS, and news groups employed incompatible formats (such as FTP's host-relative paths versus Gopher's menu-based selectors), creating barriers to a cohesive "information universe." The UDI addressed these by introducing a , scheme-based syntax that abstracted protocol-specific details, exemplified by file://info.cern.ch/pub/www/doc/udi1.ps. This enabled dynamic linking regardless of retrieval mechanisms, fostering interoperability.

Key early documents include the February 1992 UDI draft, which solicited feedback and highlighted integrations with WAIS and Gopher, and the contemporaneous November 1992 HTTP draft, which embedded URI-like addressing for hypertext retrieval. By March 1992, at an IETF BOF, these ideas had evolved into foundational web proposals, with UDI serving as the basis for unified naming across protocols.

Standardization and Evolution

The standardization of Uniform Resource Identifiers (URIs) began with RFC 1630, published in June 1994 by Tim Berners-Lee, which provided an informal definition of URI syntax and its role in enabling a global information infrastructure. This document outlined the basic structure of URIs, including schemes, hierarchical components, and the use of for non-ASCII characters, laying the groundwork for uniform naming and addressing on the without enforcing strict parsing rules. A significant refinement came with RFC 2396 in August 1998, authored by Tim Berners-Lee, Roy Fielding, and Larry Masinter, which introduced a more precise syntax specification and formalized the handling of relative URI references. This update addressed ambiguities in the original syntax, defined equivalence rules for URI comparison, and emphasized the separation of scheme-specific processing, making URIs more robust for protocols. The IETF URI Working Group, which first met in November 1992, played a central role in these developments, coordinating input from the broader community to ensure interoperability.

The current standard, RFC 3986 from January 2005, also authored by Berners-Lee, Fielding, and Masinter, obsoleted RFC 2396 and provided a comprehensive, ABNF-based syntax definition with enhanced clarity on aspects, such as reserved characters and fragment identifiers. This revision incorporated lessons from widespread URI deployment, including better support for secure schemes and normalization procedures to reduce variant representations.

URI evolution has continued through integrations with related protocols, such as RFC 7230 (June 2014), which defines HTTP/1.1 message syntax and routing and specifies how URIs are processed in HTTP messages, ensuring consistency in web transfers. Additionally, RFC 6874 (February 2013) extended URI handling to cover IPv6 zone identifiers in literal addresses within the host component, using bracketed notation to accommodate modern networking needs.

Support for internationalization advanced with RFC 3987 (January 2005), which introduced Internationalized Resource Identifiers (IRIs) as a superset of URIs, allowing Unicode characters in international contexts while maintaining compatibility through UTF-8 encoding and mapping rules. In the 2020s, discussions within IETF and W3C working groups have explored URI adaptations for decentralized systems, such as Decentralized Identifiers (DIDs) under W3C Recommendation (July 2022), which leverage URI syntax for self-sovereign identity without central authority. Key contributions to URI standardization stem from Tim Berners-Lee's foundational vision, Roy Fielding's architectural refinements in dissertations and RFCs, and collaborative efforts by IETF working groups like URI and Appsawg, which have sustained updates amid evolving web technologies.

URI Structure

General Syntax

The general syntax of a Uniform Resource Identifier (URI) is formally defined in RFC 3986 as URI = scheme ":" hier-part [ "?" query ] [ "#" fragment ], where the scheme identifies the URI's namespace and syntax rules, the hierarchical part provides the location or name, the query adds parameters, and the fragment identifies a secondary resource within the primary one. This syntax is specified using Augmented Backus-Naur Form (ABNF) grammar, which outlines the production rules for each component. The complete relevant ABNF for URI production is as follows:

URI = scheme ":" hier-part [ "?" query ] [ "#" fragment ] scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ) hier-part = "//" authority path-abempty / path-absolute / path-rootless / path-empty authority = [ userinfo "@" ] host [ ":" port ] path-abempty = *( "/" segment ) path-absolute = "/" [ segment-nz *( "/" segment ) ] path-rootless = segment-nz *( "/" segment ) path-empty = 0<pchar> segment = *pchar segment-nz = 1*pchar segment-nz-nc = 1*( unreserved / pct-encoded / sub-delims / "@" ) pchar = unreserved / pct-encoded / sub-delims / ":" / "@" query = *( pchar / "/" / "?" ) fragment = *( pchar / "/" / "?" ) pct-encoded = "%" HEXDIG HEXDIG unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~" reserved = gen-delims / sub-delims gen-delims = ":" / "/" / "?" / "#" / "[" / "]" / "@" sub-delims = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="


These rules ensure URIs are structured and unambiguous, with HEXDIG representing hexadecimal digits (0-9, A-F, a-f) and ALPHA and DIGIT representing ASCII letters and decimal digits.

URIs are classified as absolute or relative based on the presence of a scheme. An absolute URI begins with a scheme followed by a colon and includes a hierarchical part, as in absolute-URI = scheme ":" hier-part [ "?" query ] (a production that excludes the fragment), providing a complete reference independent of context. In contrast, a relative reference lacks a scheme and is resolved against a base URI; its form is relative-ref = relative-part [ "?" query ] [ "#" fragment ], where the relative part can be "//" authority path-abempty (a network-path reference), path-absolute (starting with a single "/"), path-noscheme (a rootless path whose first segment contains no colon, so that it cannot be mistaken for a scheme), or path-empty. Path-absolute forms denote an absolute path from the root, while rootless forms, like "resource" without a leading slash, indicate a relative path starting from the current level.

Percent-encoding is used to represent characters outside the unreserved set (ALPHA, DIGIT, "-", ".", "_", "~") and the reserved set (gen-delims: ":", "/", "?", "#", "[", "]", "@"; sub-delims: "!", "$", "&", "'", "(", ")", "*", "+", ",", ";", "="), and to represent reserved characters when they appear as data rather than as delimiters. Non-ASCII characters are first encoded into octet sequences (UTF-8 in current practice), then each octet is percent-encoded as "%" followed by two hexadecimal digits, with uppercase preferred; for example, the space character (U+0020) becomes "%20", the forward slash "/" (when used as data) becomes "%2F", and the question mark "?" becomes "%3F". Reserved characters must be encoded if their literal interpretation would alter parsing, such as encoding "/" inside a path segment to prevent it from being treated as a segment delimiter.

Invalid URIs violate these syntax rules and may lead to parsing failures or interoperability issues. Common errors include unencoded spaces, which are not allowed in any component and must be percent-encoded as "%20"; mismatched brackets, such as an unclosed "[" or "]" in the host (e.g., in IPv6 address literals), rendering the URI syntactically invalid; and malformed percent-encoding, such as a "%" that is not followed by two hexadecimal digits (lowercase hexadecimal digits are valid, although normalization prefers uppercase). Implementations should reject or normalize such cases to ensure interoperability, as older systems might mishandle sequences like "/../" in queries as path traversals.
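The percent-encoding rules described above can be tried out with Python's standard urllib.parse module. This is a minimal sketch: encode_component and has_malformed_escape are hypothetical helper names, and passing safe="" to quote is a choice made here so that every reserved character is treated as data.

```python
import re
from urllib.parse import quote, unquote


def encode_component(text):
    """Percent-encode everything except unreserved characters.

    With safe="" every reserved character ("/", "?", "#", ...) is treated
    as data. Non-ASCII text is encoded as UTF-8 before escaping.
    """
    return quote(text, safe="")


def has_malformed_escape(text):
    """Return True if a "%" is not followed by two hexadecimal digits."""
    return re.search(r"%(?![0-9A-Fa-f]{2})", text) is not None


if __name__ == "__main__":
    print(encode_component("a b/c?d"))      # a%20b%2Fc%3Fd
    print(encode_component("caf\u00e9"))    # caf%C3%A9
    print(unquote("a%20b%2Fc%3Fd"))         # a b/c?d
    print(has_malformed_escape("100%"))     # True
```

The encoder is meant for a single component, such as one path segment or one query value. Applying it to a whole URI would also escape the delimiters that give the URI its structure.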

Scheme-Specific Elements

The scheme component of a URI serves as the initial identifier that specifies the protocol, namespace, or access method for the resource, enabling a federated and extensible naming system across different applications and environments. Each URI scheme defines its own syntax and semantics, which may impose restrictions or extensions on the generic URI structure defined in RFC 3986, while adhering to the overall absolute-URI syntax. For instance, schemes like "http" indicate the Hypertext Transfer Protocol, while "urn" denotes a namespace for persistent identifiers.

Common schemes exhibit distinct syntactic requirements. The "http" scheme mandates an authority component with a host and optional port, where the host identifies the target server and the port defaults to 80 if omitted; for example, http://example.com is equivalent to http://example.com:80. In contrast, the "file" scheme primarily uses a path-only structure for local file access, such as file:/etc/hosts. Its form is platform-dependent (paths on Unix-like systems start with a slash, while Windows supports drive letters like file:c:/path/file.txt), and it treats an empty or "localhost" authority as referring to the local host. The "data" scheme embeds data inline directly, following the syntax data:[<mediatype>][;base64],<data>, where the media type (defaulting to text/plain;charset=US-ASCII) specifies the content format, and the data is either percent-encoded or base64-encoded; an example is data:text/plain;base64,SGVsbG8gd29ybGQ=. The "mailto" scheme, used for email addresses, consists of an email address optionally followed by headers in the query-like portion, such as mailto:user@example.com?subject=Hello.

Authority components vary by scheme, reflecting security and usability considerations. In the "http" scheme, the userinfo subcomponent (e.g., username:password@host) is deprecated due to the risk of exposing credentials in logs or referrals, and recipients are advised to treat its presence as an error, particularly in URIs from untrusted sources. Port defaults are scheme-specific: 80 for "http", 443 for "https", and none for schemes like "file" that do not use network authorities. Schemes without an authority, such as "data" or "mailto", omit the double slash (//) and proceed directly to the path or data.

Query and fragment handling also adapts to scheme semantics. For "http", the query component (?key=value) carries non-hierarchical parameters for resource selection, such as http://example.com/search?q=uri, while the fragment (#anchor) identifies a secondary resource or location within the primary one, like a document section, and is processed client-side without being sent to the server. In the "file" scheme, queries are not used, and fragments may reference byte ranges or other file-specific anchors if supported by the implementation. The "data" scheme treats any post-comma content as opaque data without separate query or fragment support, though fragments can be appended for media-type-specific dereferencing. For "mailto", the query-like part holds email header fields (e.g., subject or cc), but true fragments are not defined.

URI schemes are registered with the Internet Assigned Numbers Authority (IANA) to ensure uniqueness and interoperability, following procedures outlined in BCP 35 (RFC 7595) for expert review or first-come-first-served allocation. The registry includes permanent, provisional, and historical entries, with 349 schemes documented as of November 2025. Common registered schemes encompass ftp (File Transfer Protocol), ldap (Lightweight Directory Access Protocol), tel (Telephone), coap (Constrained Application Protocol), sip (Session Initiation Protocol), and the examples noted above, each referencing a defining RFC for precise syntax.
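The scheme-specific rules above can be sketched in Python with the standard library. The helper names effective_port, decode_data_uri, and parse_mailto are assumptions made for this example, as is the DEFAULT_PORTS table, which lists only the two defaults mentioned in the text. The data URI decoder is deliberately simplified and does not handle a trailing fragment.

```python
import base64
from urllib.parse import parse_qs, unquote, unquote_to_bytes, urlsplit

# Only the defaults named in the text; a real table would cover more schemes.
DEFAULT_PORTS = {"http": 80, "https": 443}


def effective_port(uri):
    """Return the explicit port, else the scheme default, else None."""
    parts = urlsplit(uri)
    if parts.port is not None:
        return parts.port
    return DEFAULT_PORTS.get(parts.scheme)


def decode_data_uri(uri):
    """Decode data:[<mediatype>][;base64],<data> into (media_type, bytes)."""
    scheme, _, rest = uri.partition(":")
    header, comma, payload = rest.partition(",")
    if scheme.lower() != "data" or not comma:
        raise ValueError("not a well-formed data URI")
    is_base64 = header.endswith(";base64")
    media_type = header[: -len(";base64")] if is_base64 else header
    media_type = media_type or "text/plain;charset=US-ASCII"
    data = base64.b64decode(payload) if is_base64 else unquote_to_bytes(payload)
    return media_type, data


def parse_mailto(uri):
    """Return (address, header fields) for a mailto URI."""
    parts = urlsplit(uri)
    return unquote(parts.path), parse_qs(parts.query)


if __name__ == "__main__":
    print(effective_port("http://example.com"))                        # 80
    print(decode_data_uri("data:text/plain;base64,SGVsbG8gd29ybGQ="))  # ('text/plain', b'Hello world')
    print(parse_mailto("mailto:user@example.com?subject=Hello"))
```

For a scheme with no entry in the table, such as "file", effective_port returns None, which matches the statement that such schemes have no network port.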

URI Variants

Uniform Resource Locators (URLs)

A Uniform Resource Locator (URL) is a URI that not only identifies a resource but also provides a specific mechanism for locating and accessing it, typically over a network such as the Internet. Unlike more general URIs, URLs incorporate scheme-specific details that enable retrieval, such as network protocols like HTTP or FTP. This focus on location makes URLs essential for web addressing and resource fetching in distributed systems.

The term "Uniform Resource Locator" was standardized in RFC 1738, published in December 1994, which defined the syntax and semantics for locating resources available via the Internet as part of the World Wide Web initiative. This specification built on earlier concepts from RFC 1630 and established URLs as compact string representations for Internet-accessible resources. Over time, URLs have become synonymous with web addresses, evolving alongside web technologies while maintaining their core role in resource location.

In terms of structure, URLs for network schemes, such as those using HTTP or FTP, require an authority component, which includes the host (e.g., a domain name or IP address) and optionally a port and user information, prefixed by "//". This is followed by a path that specifies the resource within the host, along with optional query parameters for additional data and a fragment identifier for intra-document navigation. The general form adheres to the URI syntax but emphasizes locatability through the scheme's access method. For example, consider the URL https://www.example.com/page?query=1#fragment:
  • Scheme: https indicates a secure HTTP connection.
  • Authority: www.example.com specifies the host.
  • Path: /page identifies the resource.
  • Query: ?query=1 passes parameters to the server.
  • Fragment: #fragment targets a section within the document.

This breakdown illustrates how URLs encode both location and access details hierarchically.
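The same breakdown can be reproduced with urlsplit from Python's standard urllib.parse module, shown here as a small illustration. Note that the parser returns the query and fragment without their leading "?" and "#" delimiters.

```python
from urllib.parse import urlsplit

# The example URL from the breakdown above.
parts = urlsplit("https://www.example.com/page?query=1#fragment")

if __name__ == "__main__":
    print("scheme:   ", parts.scheme)    # https
    print("authority:", parts.netloc)    # www.example.com
    print("path:     ", parts.path)      # /page
    print("query:    ", parts.query)     # query=1
    print("fragment: ", parts.fragment)  # fragment
```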
URLs have evolved to address practical challenges in global and constrained environments. URL shortening services, first publicly released with TinyURL in 2002, create compact aliases that redirect to the original long URL, aiding sharing on platforms with character limits like early Twitter. Additionally, support for Internationalized Domain Names (IDNs) was introduced via RFC 3490 in 2003, using Punycode to encode non-ASCII characters in domain names (e.g., converting "café.com" to "xn--caf-dma.com"), enabling multilingual URLs while preserving ASCII compatibility in DNS. These developments enhance usability without altering the foundational location-based syntax.
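The IDN conversion mentioned above can be observed with Python's built-in "idna" codec, which implements the RFC 3490 rules. The helper names to_ascii_hostname and to_unicode_hostname are assumptions made for this sketch; applications that need newer IDNA revisions would typically rely on a third-party library instead.

```python
def to_ascii_hostname(hostname):
    """Convert an internationalized hostname to its ASCII (Punycode) form."""
    return hostname.encode("idna").decode("ascii")


def to_unicode_hostname(hostname):
    """Convert an ASCII-compatible hostname back to Unicode."""
    return hostname.encode("ascii").decode("idna")


if __name__ == "__main__":
    print(to_ascii_hostname("caf\u00e9.com"))      # xn--caf-dma.com
    print(to_unicode_hostname("xn--caf-dma.com"))  # café.com
```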

Uniform Resource Names (URNs)

A Uniform Resource Name (URN) is a Uniform Resource Identifier (URI) that uses the "urn" scheme to provide a persistent, location-independent name for a resource. Originally specified in 1997, URNs serve as abstract identifiers that remain stable over time, enabling the naming of entities such as documents, books, or individuals without reference to their current location. Unlike locators, URNs focus on identification rather than retrieval, supporting long-term reference in systems where resources may migrate or change access points.

The syntax of a URN follows the form urn:<NID>:<NSS>, where <NID> is the Namespace Identifier, a registered string of alphanumeric characters and hyphens that defines the naming authority, and <NSS> is the Namespace-Specific String, which carries the identifier within that namespace. The <NID> is case-insensitive and limited to 32 characters, while the <NSS> may include percent-encoded characters to handle reserved or non-ASCII data. This structure ensures global uniqueness and compatibility with URI parsing rules. For instance, urn:isbn:0-306-40615-2 identifies a specific book using the ISBN namespace.

Namespace Identifiers (NIDs) are formally registered with the Internet Assigned Numbers Authority (IANA) to prevent collisions and maintain interoperability; examples include "isbn" for International Standard Book Numbers and "oid" for Object Identifiers used in standards like ASN.1. Registration follows an expert review process outlined in RFC 8141, ensuring each namespace has a defined assignment and resolution policy. URNs can be resolved through dedicated resolvers that map the identifier to metadata, alternative representations, or locators as per the namespace's rules.

Common examples illustrate URN applications: urn:ietf:rfc:2141 names the original URN syntax document itself, providing a stable reference for IETF standards, while namespaces like "mpeg" enable URNs for multimedia objects, such as the illustrative urn:mpeg:url:abc123 for an MPEG-encoded resource. These demonstrate how URNs support diverse, enduring naming needs across digital ecosystems.
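The urn:<NID>:<NSS> form can be split with a short regular expression. This is a simplified sketch rather than a validator for the full RFC 8141 grammar: parse_urn is a hypothetical helper, the NID pattern only enforces the character set and 32-character limit described above, and the NSS is accepted as any non-empty string.

```python
import re

# Simplified pattern: "urn", a NID of letters, digits and hyphens
# (at most 32 characters, starting with a letter or digit), then the NSS.
URN_PATTERN = re.compile(
    r"^urn:(?P<nid>[a-z0-9][a-z0-9-]{0,31}):(?P<nss>.+)$",
    re.IGNORECASE,
)


def parse_urn(urn):
    """Return (nid, nss), or None if the string is not of URN form.

    The NID is lowercased because it is case-insensitive; the NSS is
    returned unchanged because its case rules depend on the namespace.
    """
    match = URN_PATTERN.match(urn)
    if match is None:
        return None
    return match.group("nid").lower(), match.group("nss")


if __name__ == "__main__":
    print(parse_urn("urn:isbn:0-306-40615-2"))  # ('isbn', '0-306-40615-2')
    print(parse_urn("URN:IETF:rfc:2141"))       # ('ietf', 'rfc:2141')
    print(parse_urn("http://example.com/"))     # None
```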

References and Resolution

URI References

A URI reference is a string that represents either a URI or a relative reference, serving as a compact means to identify resources relative to a base URI. This form allows for flexible referencing in documents and protocols without requiring full absolute paths.

Relative references follow the syntax relative-ref = relative-part [ "?" query ] [ "#" fragment ], where the relative-part can be a network-path (starting with "//"), an absolute-path (starting with a single "/"), a rootless path (starting with a non-empty segment and no leading "/"), or an empty path. Absolute paths begin with a slash and denote a path from the root, rootless paths start directly with a non-empty segment and are interpreted relative to the base path, and empty paths indicate the base URI's own path without modification. The optional query and fragment components append parameters or internal anchors as in absolute URIs.

To resolve a relative reference into an absolute URI, the process merges it with a base URI through a defined algorithm. First, the base URI is parsed into its components: scheme, authority, path, query, and fragment. If the reference includes a scheme, it is treated as absolute. Otherwise the base scheme is retained; if the reference starts with "//", its authority and path replace those of the base, and if not, the base authority is retained as well. Relative paths are then merged by removing the last segment of the base path and appending the reference path, after which dot-segments are resolved: "." represents the current directory and is removed, while ".." ascends to the parent directory. The RFC describes this step with an input buffer and an output buffer that are processed iteratively. The base query is kept only when the reference has an empty path and no query of its own, and the fragment always comes from the reference, never from the base.

For example, given a base URI of http://a/b/c/d;p?q, the relative reference g resolves to http://a/b/c/g by replacing the last segment of the base path; ../g resolves to http://a/b/g by removing the last two segments before appending; and /g resolves to http://a/g by replacing the entire path. Another common case is ./image.jpg relative to http://example.com/dir/, which resolves to http://example.com/dir/image.jpg after removing the "." segment.

URI references are widely used in markup languages for hyperlinks and resource inclusion, such as in HTML's <a href=""> and <img src=""> attributes, where they resolve against the document's base URI, which can be set by the <base> element. In XML, the xml:base attribute establishes a base URI for resolving relative references within elements, processing instructions, or entity content. This enables modular document structures, like linking to local images or stylesheets without absolute paths.
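The resolution examples above can be checked with urljoin from Python's standard urllib.parse module, which follows the RFC 3986 merge and dot-segment rules. This is a short sketch; the resolve wrapper is a hypothetical name added only for readability.

```python
from urllib.parse import urljoin

BASE = "http://a/b/c/d;p?q"  # base URI used in the examples above


def resolve(base, reference):
    """Resolve a relative reference against a base URI."""
    return urljoin(base, reference)


if __name__ == "__main__":
    for reference in ("g", "../g", "/g"):
        print(f"{reference:6} -> {resolve(BASE, reference)}")
    print(resolve("http://example.com/dir/", "./image.jpg"))
```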

Resolution Mechanisms

Resolution of a Uniform Resource Identifier (URI) refers to the process of mapping the identifier to the corresponding resource through dereferencing, which involves determining the access mechanism and parameters based on the URI's scheme and components. This mechanism enables applications to locate and interact with resources without requiring prior knowledge of their exact representation or location.

The resolution process begins with parsing the URI into its components: scheme, authority (including host and port), path, query, and fragment, as defined by the generic syntax. The scheme dictates the protocol or handler to use, such as a TCP/IP-based protocol for hierarchical network schemes. Next, the authority is contacted: for hostnames, this typically involves Domain Name System (DNS) resolution to obtain an IP address, followed by establishing a connection to the specified port (defaulting to scheme-specific values, like port 80 for HTTP). The path and query components then guide the request to the specific resource within the authority's namespace.

Delegation in URI resolution allows hierarchical administration of the namespace, where the authority component enables a central registry to assign sub-namespaces to entities. For instance, in schemes using registered names, DNS provides a distributed delegation model, resolving hostnames through a hierarchy of authoritative servers. This structure supports scalable resource location without a single point of control.

In the HTTP scheme, resolution occurs over TCP/IP: after DNS resolves the host to an IP address, a client connects to the port, sends a GET request with the path and query, and receives a representation of the resource or an error response code. For the urn scheme, resolution is namespace-specific, often involving dedicated resolvers that map the name to locators via mechanisms like NAPTR DNS records or HTTP-based services. Unlike resolution for location-based schemes, URN resolution emphasizes persistence and may not yield direct access but rather equivalent URIs.
Error handling during resolution is scheme-dependent; for example, in HTTP, if the resource is unavailable, the server returns a 404 Not Found status code. Redirects are managed through 3xx status codes, instructing the client to follow an alternative URI for the resource. Invalid URIs or unreachable authorities may result in connection failures or protocol-specific errors, prompting applications to flag or retry as appropriate.
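The HTTP resolution steps above (parse, pick the port, build the request target, interpret the response) can be sketched without any network access. The names plan_http_request and interpret_status are assumptions made for this example, and the DNS lookup and TCP connection are intentionally left out.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def plan_http_request(uri):
    """Return (host, port, request_target) for an http(s) URI.

    The fragment is dropped because it is processed by the client and
    is never sent to the server.
    """
    parts = urlsplit(uri)
    if parts.scheme not in DEFAULT_PORTS or not parts.hostname:
        raise ValueError("expected an http or https URI with a host")
    port = parts.port if parts.port is not None else DEFAULT_PORTS[parts.scheme]
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return parts.hostname, port, target


def interpret_status(status):
    """Classify an HTTP status code in the coarse terms used above."""
    if 200 <= status < 300:
        return "ok"
    if 300 <= status < 400:
        return "redirect"
    if status == 404:
        return "not found"
    return "error"


if __name__ == "__main__":
    print(plan_http_request("http://example.com/search?q=uri#top"))
    print(interpret_status(301), interpret_status(404))
```

A real client would next resolve the host through DNS, open a connection to the returned port, and send a request line such as "GET /search?q=uri HTTP/1.1".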

Applications and Extensions

Use in Web Technologies

URIs play a foundational role in the Hypertext Transfer Protocol (HTTP), where they form the request-target that identifies the resource upon which an HTTP method is applied, such as GET for retrieval or POST for submission in RESTful architectures. This usage enables precise addressing of resources on the server, supporting stateless interactions where the URI alone suffices to locate and operate on the target without additional session state. In API design, URIs delineate endpoints that embody REST principles, allowing clients to manipulate resources through standardized methods across distributed systems.

In markup languages like HTML and XML, URIs integrate seamlessly to enable linking and resource embedding. The href attribute in HTML's <a> element specifies a URI reference for hyperlinks, directing users or agents to connected documents or sections, while the src attribute in elements like <img> or <script> denotes a URI for loading external media or code. Similarly, XML's XLink specification employs the href attribute to embed URI-based locators within elements, supporting bidirectional, multi-ended, and out-of-line links that extend beyond simple anchors to complex traversals in XML documents.

Within the Semantic Web, URIs function as unique, global identifiers for abstract resources in RDF and OWL ontologies, where HTTP URIs are preferred for their dereferenceability: accessing them with a standard HTTP GET request can return a machine-readable description. This design promotes Linked Data principles, enabling automated discovery and integration of knowledge across the web by resolving identifiers to informative representations.

Contemporary web technologies extend URI applications to interactive and service-oriented protocols. WebSockets use the ws:// and wss:// URI schemes to initiate bidirectional communication channels over HTTP, with the URI specifying the server host, port, and path for the upgrade handshake. Service workers register via a script URL and define an associated scope URL, intercepting fetch requests within that scope to enable offline functionality and caching. GraphQL APIs typically expose a single HTTP endpoint URI (e.g., /graphql) for POST requests containing queries, allowing flexible data retrieval without multiple resource-specific URIs.

URIs also underpin state management, authorization, and content negotiation in web ecosystems. In HTTP cookies, the Domain and Path attributes are matched against the request URI to scope cookie applicability, ensuring state is tied to specific origins. OAuth 2.0 employs URIs for critical parameters like redirect_uri, which specifies the client endpoint for returning authorization codes or tokens, alongside client_id, a unique identifier for the client application. Additionally, content negotiation in HTTP uses the request URI in conjunction with Accept headers to select resource variants, such as different media types or languages, based on client preferences.
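As one illustration of URIs carried inside other URIs, the sketch below builds an OAuth 2.0 authorization request in which redirect_uri travels as a percent-encoded query parameter. The endpoint https://auth.example/authorize, the client identifier, and the helper name build_authorization_url are all hypothetical; only the parameter names come from OAuth 2.0.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_authorization_url(endpoint, client_id, redirect_uri, state):
    """Append OAuth 2.0 authorization parameters to an endpoint URI.

    urlencode percent-encodes each value, so the reserved characters
    inside redirect_uri (":", "/") cannot be confused with delimiters
    of the outer URI.
    """
    query = urlencode({
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "state": state,
    })
    return f"{endpoint}?{query}"


if __name__ == "__main__":
    url = build_authorization_url(
        "https://auth.example/authorize",    # hypothetical endpoint
        "my-client",                         # hypothetical client_id
        "https://client.example/callback",   # hypothetical redirect target
        "xyz",
    )
    print(url)
    # Parsing the query back recovers the original redirect_uri.
    print(parse_qs(urlsplit(url).query)["redirect_uri"][0])
```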

Internationalization and IRIs

Internationalized Resource Identifiers (IRIs) extend the URI framework to support characters from the Universal Character Set (UCS), also known as Unicode or ISO 10646, enabling the use of non-ASCII scripts in resource identifiers. Defined in RFC 3987, published in January 2005, an IRI is a sequence of characters that allows internationalized text while maintaining compatibility with existing URI infrastructure. This extension addresses the limitations of URIs, which are restricted to ASCII characters, by permitting native representation of scripts such as Chinese or Cyrillic directly in the identifier.

The syntax of an IRI closely mirrors that of a URI, as outlined in RFC 3986, but replaces the unreserved character set with an expanded set that includes UCS characters (denoted as ucschar in the Augmented Backus-Naur Form, or ABNF, grammar). Specifically, IRI components like the scheme, authority, path, query, and fragment follow the same hierarchical structure, but non-ASCII characters are allowed in positions where URIs permit unreserved characters, with reserved characters (such as /, ?, and #) retaining their role as delimiters. For instance, the authority component can include internationalized domain names via Internationalizing Domain Names in Applications (IDNA), while path and query segments support UCS characters without immediate encoding.

To ensure interoperability with URI-based systems, IRIs are mapped to URIs through a process involving UTF-8 encoding followed by percent-encoding of non-ASCII octets. The conversion algorithm first transforms the IRI's UCS characters (excluding those in the authority's ireg-name) into UTF-8 byte sequences, then applies percent-encoding to any bytes outside the US-ASCII range, producing a valid URI. For the domain name portion (ireg-name), the ToASCII operation from RFC 3490, which relies on Punycode, is applied to convert internationalized labels to ASCII Compatible Encoding (ACE) form, prefixed with "xn--". Conversely, the reverse mapping decodes percent-encoded sequences back to UTF-8 and uses the ToUnicode operation to interpret ACE domains as Unicode labels where supported. These mappings ensure that IRIs can be processed in legacy URI environments without loss of information.

IRIs have seen widespread adoption in web standards and implementations, particularly for global accessibility. The HTML Living Standard requires support for IRI semantics in URL handling, including parsing and serialization, to accommodate internationalized content in attributes like href. Modern web browsers handle IRIs by converting internationalized domain names to Punycode for DNS resolution while displaying the native script to users, as per IDNA guidelines; for example, Chrome and other browsers apply these conversions transparently in the address bar and in link processing. Protocols such as HTTP/1.1 further accommodate IRIs, allowing non-ASCII characters in headers and document references when encoded appropriately.

Despite these advancements, IRIs present challenges related to text rendering and equivalence. Bidirectional text in scripts like Arabic or Hebrew requires logical storage order and application of the Unicode Bidirectional Algorithm, with restrictions prohibiting mixed-direction components within a single IRI to avoid visual confusion or security risks. Normalization is another key issue; IRIs should be represented in Unicode Normalization Form C (NFC) to mitigate variations from different normalization forms, ensuring consistent comparison across systems, so that simple string matching or syntax-based normalization can then determine equivalence.

For example, the IRI http://例.com/ページ maps to the URI http://xn--fsq.com/%E3%83%9A%E3%83%BC%E3%82%B8, where the domain is Punycode-encoded and the path segment is percent-encoded. Similarly, http://résumé.example.org as an IRI becomes http://xn--rsum-bpad.example.org in URI form, demonstrating that IDNA applies to the domain while an ASCII-only path needs no encoding. These conversions highlight how IRIs facilitate multilingual web navigation while preserving URI compatibility.
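The IRI-to-URI mapping described above can be approximated with Python's standard library. This is a simplified sketch, not a complete RFC 3987 implementation: iri_to_uri is a hypothetical helper, it ignores userinfo and IPv6 literals, it does not apply NFC normalization, and it relies on Python's built-in "idna" codec (RFC 3490) for the host.

```python
from urllib.parse import quote, urlsplit, urlunsplit

# Characters left untouched by quote(): reserved delimiters and "%",
# so percent-escapes already present in the IRI are preserved.
SAFE = "/:@!$&'()*+,;=%?"


def iri_to_uri(iri):
    """Map an IRI to a URI: IDNA for the host, UTF-8 plus %XX elsewhere."""
    parts = urlsplit(iri)
    host = parts.hostname or ""
    netloc = host.encode("idna").decode("ascii") if host else ""
    if parts.port is not None:
        netloc += f":{parts.port}"
    return urlunsplit((
        parts.scheme,
        netloc,
        quote(parts.path, safe=SAFE),      # quote() encodes text as UTF-8 first
        quote(parts.query, safe=SAFE),
        quote(parts.fragment, safe=SAFE),
    ))


if __name__ == "__main__":
    print(iri_to_uri("http://\u4f8b.com/\u30da\u30fc\u30b8"))
    # http://xn--fsq.com/%E3%83%9A%E3%83%BC%E3%82%B8
    print(iri_to_uri("http://r\u00e9sum\u00e9.example.org"))
    # http://xn--rsum-bpad.example.org
```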

Considerations

Normalization and Munging

Normalization standardizes URI representations to enable accurate comparison and determination of equivalence without accessing the referenced resource. The process, outlined in RFC 3986, involves syntax-based adjustments to eliminate variations that do not affect the identified resource. These adjustments ensure that equivalent URIs, such as those differing only in case or encoding, are transformed into identical forms for syntactic equivalence.

Case normalization converts the scheme and host components to lowercase, as they are case-insensitive. For example, the URI "HTTP://www.EXAMPLE.com/" normalizes to "http://www.example.com/". Hexadecimal digits within percent-encoded octets are also normalized to uppercase for consistency, treating "%3a" and "%3A" as equivalent. Percent-encoding normalization decodes any percent-encoded octets that represent unreserved characters (A-Z, a-z, 0-9, hyphen, period, underscore, and tilde), removing unnecessary encodings such as "%7E" for a tilde; octets that represent other characters, such as "%20" for a space, stay encoded. Path segment normalization applies the remove_dot_segments algorithm to eliminate "." and ".." segments, simplifying paths like "/docs/./../docs" to "/docs".

After these transformations, syntactic equivalence is assessed by character-by-character comparison of the normalized strings; identical results indicate that the URIs reference the same resource. Semantic equivalence builds on this by incorporating scheme-specific rules, such as treating an empty path in HTTP URIs as equivalent to a path of "/". For instance, "http://example.com", "http://example.com/", and "http://example.com:80/" are semantically equivalent under HTTP rules.

URL munging involves unauthorized or ad-hoc modifications to URIs that can alter their equivalence or cause resolution failures. Common practices include prepending "www." to the host component, such as changing "example.com" to "www.example.com", which may lead to errors if the server does not configure the two hostnames equivalently. Another frequent alteration is appending or removing trailing slashes from paths, potentially creating duplicate content or triggering unintended redirects; for example, "http://example.com/page" and "http://example.com/page/" might resolve differently depending on server configuration. Such changes disrupt canonical forms and can result in broken links or inconsistent access.

Best practices for handling normalization include using established parsing and canonicalization routines in programming libraries. Python's urllib.parse module, for instance, provides urlsplit and urlunsplit for decomposing and reassembling URIs; urlsplit lowercases the scheme and exposes a lowercased hostname attribute, but it does not decode percent-encodings or remove dot segments, so the remaining normalization steps must be applied separately. Implementations should apply full syntax-based normalization before comparison to avoid false non-equivalences, prioritizing these steps over scheme-specific adjustments unless required for the application context.
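The normalization steps above can be combined into one routine. The following is a minimal sketch under stated assumptions, not a complete canonicalizer: normalize, normalize_escapes, and the DEFAULT_PORTS table are names chosen for this example, userinfo is ignored, and the empty-path and default-port rules are applied only to http and https. The remove_dot_segments function follows the algorithm of RFC 3986, section 5.2.4.

```python
import re
import string
from urllib.parse import urlsplit, urlunsplit

UNRESERVED = set(string.ascii_letters + string.digits + "-._~")
DEFAULT_PORTS = {"http": 80, "https": 443}


def normalize_escapes(text):
    """Decode escapes of unreserved characters; uppercase the rest."""
    def fix(match):
        char = chr(int(match.group(1), 16))
        return char if char in UNRESERVED else "%" + match.group(1).upper()
    return re.sub(r"%([0-9A-Fa-f]{2})", fix, text)


def remove_dot_segments(path):
    """Remove "." and ".." segments (RFC 3986, section 5.2.4)."""
    output = []
    while path:
        if path.startswith("../"):
            path = path[3:]
        elif path.startswith("./") or path.startswith("/./"):
            path = path[2:]
        elif path == "/.":
            path = "/"
        elif path.startswith("/../") or path == "/..":
            path = "/" + path[4:]
            if output:
                output.pop()
        elif path in (".", ".."):
            path = ""
        else:
            # Move the first segment (with its leading "/") to the output.
            end = path.find("/", 1)
            end = len(path) if end == -1 else end
            output.append(path[:end])
            path = path[end:]
    return "".join(output)


def normalize(uri):
    parts = urlsplit(uri)                  # lowercases the scheme
    scheme = parts.scheme
    host = parts.hostname or ""            # .hostname is lowercased
    port = parts.port
    if port is not None and port != DEFAULT_PORTS.get(scheme):
        host = f"{host}:{port}"
    path = remove_dot_segments(normalize_escapes(parts.path))
    if scheme in DEFAULT_PORTS and not path:
        path = "/"                         # scheme-specific HTTP rule
    return urlunsplit((scheme, host, path,
                       normalize_escapes(parts.query),
                       normalize_escapes(parts.fragment)))


if __name__ == "__main__":
    print(normalize("HTTP://www.EXAMPLE.com/"))
    print(normalize("http://example.com:80"))
    print(normalize("http://example.com/%7euser/a/./b/../c%3a"))
```

Two URIs can then be compared by testing normalize(a) == normalize(b), which corresponds to the character-by-character comparison described above.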

Security Implications

Uniform Resource Identifiers (URIs) introduce several security risks due to their role in directing resource access, particularly when parsed or resolved without proper safeguards. Open redirects occur when applications accept untrusted URI inputs for redirection without validation, allowing attackers to manipulate users into visiting malicious sites, often as a precursor to phishing or credential harvesting. Similarly, injection attacks exploit query parameters or fragments in URIs; for instance, unescaped inputs in query strings can lead to cross-site scripting (XSS) if reflected into web pages, while fragments may trigger client-side script execution in vulnerable pages.

Scheme-specific threats amplify these vulnerabilities. The javascript: URI scheme enables direct execution of JavaScript code in the context of the current page, facilitating XSS attacks by injecting malicious scripts when users click or navigate to such links, as browsers have historically executed these URIs. The data: URI scheme, which embeds data directly into the URI, poses phishing risks by allowing attackers to craft self-contained pages mimicking legitimate sites, bypassing external hosting and evading some URL filters.

Historical incidents highlight the real-world impact of URI-related exploits. In the 2010s, URL shortening services like bit.ly were abused in malware campaigns that used shortened URIs to redirect users to malicious downloads, infecting thousands of systems. These exploits often combined open redirects with obfuscated malicious payloads, demonstrating how URI opacity can facilitate large-scale attacks.

Mitigations focus on defensive handling of URIs during parsing and resolution. URI validation involves checking schemes, hosts, and parameters against allowlists to block untrusted inputs, while browser sandboxing isolates URI processing to limit the damage malicious schemes can cause. Content-Security-Policy (CSP) headers provide an additional layer by restricting executable scripts and navigations, effectively blocking javascript: and certain data: executions in modern browsers.

Best practices emphasize proactive design to minimize exposure. Developers should avoid the deprecated userinfo component (e.g., username:password@host) in URIs, as it exposes credentials in logs and browser histories; instead, credentials should be sent through dedicated mechanisms such as request headers. Applications should always validate allowed schemes (e.g., restricting to https:) and enforce HTTPS so that URIs are encrypted in transit, preventing interception of sensitive parameters during resolution.
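The validation advice above can be sketched as a small allowlist check for redirect targets. The function name is_safe_redirect, the ALLOWED_SCHEMES set, and the example hosts are assumptions made for illustration; a production check would usually also need to account for how the target browser parses unusual input.

```python
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"https"}  # assumption: only HTTPS targets are acceptable


def is_safe_redirect(target, allowed_hosts):
    """Accept only absolute https URIs whose host is on the allowlist.

    Rejects other schemes (javascript:, data:, http:), scheme-relative
    references such as //evil.example/, and any URI carrying userinfo.
    """
    try:
        parts = urlsplit(target)
    except ValueError:
        return False  # malformed input, e.g. unbalanced "[" in the host
    if parts.scheme not in ALLOWED_SCHEMES:
        return False
    if parts.username is not None or parts.password is not None:
        return False
    return parts.hostname in allowed_hosts


if __name__ == "__main__":
    hosts = {"example.com"}  # hypothetical allowlist
    for candidate in (
        "https://example.com/home",
        "javascript:alert(1)",
        "//evil.example/",
        "https://user:pw@example.com/",
    ):
        print(f"{candidate!r:35} {is_safe_redirect(candidate, hosts)}")
```

Comparing the parsed hostname against an exact allowlist, rather than searching the raw string for a trusted domain, avoids being fooled by targets such as https://example.com.evil.example/.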
