World Wide Web
| World Wide Web | |
|---|---|
| Abbreviation | WWW |
| Status | Active |
| Year started | 1989 |
| First published | 6 August 1991 |
| Organization | |
| Authors | Tim Berners-Lee |

The World Wide Web (also known as WWW, W3, or simply the Web[1]) is an information system that enables content sharing over the Internet through user-friendly ways meant to appeal to users beyond IT specialists and hobbyists.[2] It allows documents and other web resources to be accessed over the Internet according to specific rules of the Hypertext Transfer Protocol (HTTP).[3]
The Web was invented by English computer scientist Tim Berners-Lee while at CERN in 1989 and opened to the public in 1993. It was conceived as a "universal linked information system".[4][5][6] Documents and other media content are made available to the network through web servers and can be accessed by programs such as web browsers. Servers and resources on the World Wide Web are identified and located through a character string called uniform resource locator (URL).
The original and still very common document type is a web page formatted in Hypertext Markup Language (HTML). This markup language supports plain text, images, embedded video and audio contents, and scripts (short programs) that implement complex user interaction. The HTML language also supports hyperlinks (embedded URLs) which provide immediate access to other web resources. Web navigation, or web surfing, is the common practice of following such hyperlinks across multiple websites. Web applications are web pages that function as application software. The information in the Web is transferred across the Internet using HTTP. Multiple web resources with a common theme and usually a common domain name make up a website. A single web server may provide multiple websites, while some websites, especially the most popular ones, may be provided by multiple servers. Website content is provided by a myriad of companies, organisations, government agencies, and individual users; and comprises an enormous amount of educational, entertainment, commercial, and government information.
The Web has become the world's dominant information systems platform.[7][8][9][10] It is the primary tool that billions of people worldwide use to interact with the Internet.[3]
History
The Web was invented by English computer scientist Tim Berners-Lee while working at CERN.[11][12] He was motivated by the problem of storing, updating, and finding documents and data files in that large and constantly changing organisation, as well as distributing them to collaborators outside CERN. In his design, Berners-Lee dismissed the common tree structure approach, used for instance in the existing CERNDOC documentation system and in the Unix filesystem, as well as approaches that relied on tagging files with keywords, as in the VAX/NOTES system. Instead he adopted concepts he had put into practice with his private ENQUIRE system (1980) built at CERN. When he became aware of Ted Nelson's hypertext model (1965), in which documents can be linked in unconstrained ways through hyperlinks associated with "hot spots" embedded in the text, it helped to confirm the validity of his concept.[13][14]

The model was later popularised by Apple's HyperCard system. Unlike HyperCard, Berners-Lee's new system was meant from the outset to support links between multiple databases on independent computers, and to allow simultaneous access by many users from any computer on the Internet. He also specified that the system should eventually handle other media besides text, such as graphics, speech, and video. Links could refer to mutable data files, or even fire up programs on their server computer. He also conceived "gateways" that would allow access through the new system to documents organised in other ways (such as traditional computer file systems or Usenet). Finally, he insisted that the system should be decentralised, without any central control or coordination over the creation of links.[5][15][11][12]
Berners-Lee submitted a proposal to CERN in May 1989, without giving the system a name.[5] He got a working system implemented by the end of 1990, including a browser called WorldWideWeb (which became the name of the project and of the network) and an HTTP server running at CERN. As part of that development he defined the first version of the HTTP protocol, the basic URL syntax, and implicitly made HTML the primary document format.[16] The technology was released outside CERN to other research institutions starting in January 1991, and then to the whole Internet on 23 August 1991. The Web was a success at CERN, and began to spread to other scientific and academic institutions. Within the next two years, there were 50 websites created.[17][18]
CERN made the Web protocol and code available royalty-free on 30 April 1993, enabling its widespread use.[19][20][21] After the NCSA released the Mosaic web browser later that year, the Web's popularity grew rapidly as thousands of websites sprang up in less than a year.[22][23] Mosaic was a graphical browser that could display inline images and submit forms that were processed by the HTTPd server.[24][25] Marc Andreessen and Jim Clark founded Netscape the following year and released the Navigator browser, which introduced Java and JavaScript to the Web. It quickly became the dominant browser. Netscape became a public company in 1995, which triggered a frenzy for the Web and started the dot-com bubble.[26] Microsoft responded by developing its own browser, Internet Explorer, starting the browser wars. Bundled with Windows, Internet Explorer became the dominant browser for 14 years.[27]
Berners-Lee founded the World Wide Web Consortium (W3C), which created XML in 1996 and recommended replacing HTML with stricter XHTML.[28] In the meantime, developers began exploiting an IE feature called XMLHttpRequest to make Ajax applications, launching the Web 2.0 revolution. Mozilla, Opera, and Apple rejected XHTML and created the WHATWG, which developed HTML5.[29] In 2009, the W3C conceded and abandoned XHTML.[30] In 2019, it ceded control of the HTML specification to the WHATWG.[31]
The World Wide Web has been central to the development of the Information Age and is the primary tool billions of people use to interact on the Internet.[32][33][34][10]
Nomenclature
Tim Berners-Lee states that World Wide Web is officially spelled as three separate words, each capitalised, with no intervening hyphens.[35] Use of the www prefix has been declining, especially when web applications sought to brand their domain names and make them easily pronounceable. As the mobile web grew in popularity,[36] services like Gmail.com, Outlook.com, Myspace.com, Facebook.com and Twitter.com are most often mentioned without adding "www." (or, indeed, ".com") to the domain.[37]
In English, www is usually read as double-u double-u double-u.[38] Some users pronounce it dub-dub-dub, particularly in New Zealand.[39] Stephen Fry, in his "Podgrams" series of podcasts, pronounces it wuh wuh wuh.[40] The English writer Douglas Adams once quipped in The Independent on Sunday (1999): "The World Wide Web is the only thing I know of whose shortened form takes three times longer to say than what it's short for".[41]
Function
The terms Internet and World Wide Web are often used without much distinction. However, the two terms do not mean the same thing. The Internet is a global system of computer networks interconnected through telecommunications and optical networking. In contrast, the World Wide Web is a global collection of documents and other resources, linked by hyperlinks and URIs. Web resources are accessed using HTTP or HTTPS, which are application-level Internet protocols that use the Internet transport protocols.[3]
Viewing a web page on the World Wide Web normally begins either by typing the URL of the page into a web browser or by following a hyperlink to that page or resource. The web browser then initiates a series of background communication messages to fetch and display the requested page. In the 1990s, using a browser to view web pages—and to move from one web page to another through hyperlinks—came to be known as 'browsing,' 'web surfing' (after channel surfing), or 'navigating the Web'. Early studies of this new behaviour investigated user patterns in using web browsers. One study, for example, found five user patterns: exploratory surfing, window surfing, evolved surfing, bounded navigation and targeted navigation.[42]
The following example demonstrates the functioning of a web browser when accessing a page at the URL http://example.org/home.html. The browser resolves the server name of the URL (example.org) into an Internet Protocol address using the globally distributed Domain Name System (DNS). This lookup returns an IP address such as 203.0.113.4 or 2001:db8:2e::7334. The browser then requests the resource by sending an HTTP request across the Internet to the computer at that address. It requests service from a specific TCP port number that is well known for the HTTP service so that the receiving host can distinguish an HTTP request from other network protocols it may be servicing. HTTP normally uses port 80, and HTTPS normally uses port 443. The content of the HTTP request can be as simple as two lines of text:
GET /home.html HTTP/1.1
Host: example.org
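For illustration, the following Python sketch produces that request text from a URL using the standard `urllib.parse` module. The function name `build_get_request` is hypothetical, and the sketch emits only the request line and the `Host` header; real browsers send many more headers.

```python
from urllib.parse import urlsplit


def build_get_request(url: str) -> str:
    """Return the text of a minimal HTTP/1.1 GET request for `url`.

    Assumes the URL carries no user:password@ part, since netloc is
    copied directly into the Host header.
    """
    parts = urlsplit(url)
    path = parts.path or "/"          # an empty path means the root resource
    if parts.query:
        path += "?" + parts.query     # the query string travels with the path
    # HTTP lines end with CR LF, and a blank line terminates the header block.
    return f"GET {path} HTTP/1.1\r\nHost: {parts.netloc}\r\n\r\n"


print(build_get_request("http://example.org/home.html"), end="")
```

The trailing blank line matters: it tells the server that the request headers are complete.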
The computer receiving the HTTP request delivers it to web server software listening for requests on port 80. If the web server can fulfil the request it sends an HTTP response back to the browser indicating success:
HTTP/1.1 200 OK
Content-Type: text/html; charset=UTF-8
followed by the content of the requested page. Hypertext Markup Language (HTML) for a basic web page might look like this:
<html>
<head>
<title>Example.org – The World Wide Web</title>
</head>
<body>
<p>The World Wide Web, abbreviated as WWW and commonly known ...</p>
</body>
</html>
The web browser parses the HTML and interprets the markup (<title>, <p> for paragraph, and such) that surrounds the words to format the text on the screen. Many web pages use HTML to reference the URLs of other resources such as images, other embedded media, scripts that affect page behaviour, and Cascading Style Sheets that affect page layout. The browser makes additional HTTP requests to the web server for these other Internet media types. As it receives their content from the web server, the browser progressively renders the page onto the screen as specified by its HTML and these additional resources.
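The extra fetches described above start with the browser finding resource URLs in the markup. A minimal sketch using Python's standard `html.parser` and `urljoin` is shown below; the class name `SubresourceFinder`, the sample page, and the simplified element-to-attribute table are assumptions for illustration, and a real browser's rules (for example `srcset`, or which `<link>` relations trigger a download) are considerably more involved.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class SubresourceFinder(HTMLParser):
    """Collect URLs of resources a browser would fetch after the HTML itself."""

    # Which attribute holds the URL for each element of interest (simplified).
    URL_ATTRS = {"img": "src", "script": "src", "link": "href",
                 "video": "src", "audio": "src", "source": "src"}

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.urls: list[str] = []

    def handle_starttag(self, tag, attrs):
        wanted = self.URL_ATTRS.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            if name == wanted and value:
                # Relative references are resolved against the page's own URL.
                self.urls.append(urljoin(self.base_url, value))


page = """<html><head><link rel="stylesheet" href="/style.css">
<script src="app.js"></script></head>
<body><img src="images/logo.png"><p>Hello</p></body></html>"""

finder = SubresourceFinder("http://example.org/home.html")
finder.feed(page)
print(finder.urls)
```

Each collected URL would then become its own HTTP request, which is why pages that reference many separate files can take longer to render.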
HTML
Hypertext Markup Language (HTML) is the standard markup language for creating web pages and web applications. With Cascading Style Sheets (CSS) and JavaScript, it forms a triad of cornerstone technologies for the World Wide Web.[43]
Web browsers receive HTML documents from a web server or from local storage and render the documents into multimedia web pages. HTML describes the structure of a web page semantically and originally included cues for the appearance of the document.
HTML elements are the building blocks of HTML pages. With HTML constructs, images and other objects such as interactive forms may be embedded into the rendered page. HTML provides a means to create structured documents by denoting structural semantics for text such as headings, paragraphs, lists, links, quotes and other items. HTML elements are delineated by tags, written using angle brackets. Tags such as <img /> and <input /> directly introduce content into the page. Other tags such as <p> surround and provide information about document text and may include other tags as sub-elements. Browsers do not display the HTML tags, but use them to interpret the content of the page.
HTML can embed programs written in a scripting language such as JavaScript, which affects the behaviour and content of web pages. Inclusion of CSS defines the look and layout of content. The World Wide Web Consortium (W3C), maintainer of both the HTML and the CSS standards, has encouraged the use of CSS over explicit presentational HTML since 1997.[44]
Linking
Most web pages contain hyperlinks to other related pages and perhaps to downloadable files, source documents, definitions and other web resources. In the underlying HTML, a hyperlink looks like this:
<a href="http://example.org/home.html">Example.org Homepage</a>.

Such a collection of useful, related resources, interconnected via hypertext links, is dubbed a web of information. Publication on the Internet created what Tim Berners-Lee first called the WorldWideWeb (in its original CamelCase, which was subsequently discarded) in November 1990.[45]
The hyperlink structure of the web is described by the webgraph: the nodes of the web graph correspond to the web pages (or URLs), and the directed edges between them to the hyperlinks. Over time, many web resources pointed to by hyperlinks disappear, relocate, or are replaced with different content. This makes hyperlinks obsolete, a phenomenon referred to in some circles as link rot, and the hyperlinks affected by it are often called "dead" links. The ephemeral nature of the Web has prompted many efforts to archive websites. The Internet Archive, active since 1996, is the best known of such efforts.
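A webgraph is naturally held as an adjacency list. The sketch below, with invented example.org URLs, counts the incoming links of each page and flags edges whose target is no longer among the known live pages, which is one simple way to spot link rot. In practice a link checker would request each target over HTTP and look for errors such as 404 Not Found; the function names here are hypothetical.

```python
from collections import defaultdict

# A toy webgraph: each page URL maps to the URLs its hyperlinks point to.
# The pages and links are invented purely for illustration.
webgraph = {
    "http://example.org/": ["http://example.org/a", "http://example.org/b"],
    "http://example.org/a": ["http://example.org/b", "http://example.org/old"],
    "http://example.org/b": ["http://example.org/"],
}


def in_degrees(graph):
    """Count how many hyperlinks point at each URL (incoming edges)."""
    counts = defaultdict(int)
    for targets in graph.values():
        for target in targets:
            counts[target] += 1
    return dict(counts)


def dead_links(graph, live_urls):
    """Return (source, target) edges whose target no longer exists."""
    return [(src, dst) for src, targets in graph.items()
            for dst in targets if dst not in live_urls]


live = set(webgraph)  # assume only the pages present in the graph still resolve
print(in_degrees(webgraph))
print(dead_links(webgraph, live))
```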
www prefix
Many hostnames used for the World Wide Web begin with www because of the long-standing practice of naming Internet hosts according to the services they provide. The hostname of a web server is often www, in the same way that it may be ftp for an FTP server, and news or nntp for a Usenet news server. These hostnames appear as Domain Name System (DNS) or subdomain names, as in www.example.com. The use of www is not required by any technical or policy standard and many websites do not use it; the first web server was nxoc01.cern.ch.[46] According to Paolo Palazzi, who worked at CERN along with Tim Berners-Lee, the popular use of www as a subdomain was accidental: the World Wide Web project page was intended to be published at www.cern.ch while info.cern.ch was intended to be the CERN home page; however, the DNS records were never switched, and the practice of prepending www to an institution's website domain name was subsequently copied.[47] Many established websites still use the prefix, or they employ other subdomain names such as www2, secure or en for special purposes. Many such web servers are set up so that both the main domain name (e.g., example.com) and the www subdomain (e.g., www.example.com) refer to the same site; others require one form or the other, or they may map to different websites. The use of a subdomain name is useful for load balancing incoming web traffic by creating a CNAME record that points to a cluster of web servers. Standard DNS does not allow a CNAME record to coexist with other records for the same name, and the bare domain root (the zone apex) must carry SOA and NS records, so the same result cannot be achieved with a plain CNAME at the bare domain root.[48]
When a user submits an incomplete domain name to a web browser in its address bar input field, some web browsers automatically try adding the prefix "www" to the beginning of it and possibly ".com", ".org" and ".net" at the end, depending on what might be missing. For example, entering "microsoft" may be transformed to http://www.microsoft.com/ and "openoffice" to http://www.openoffice.org. This feature, adopted from an earlier practice in browsers such as Lynx, started appearing in early versions of Firefox in early 2003, when the browser still had the working title 'Firebird'.[49] It is reported that Microsoft was granted a US patent for the same idea in 2008, but only for mobile devices.[50]
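The completion behaviour can be sketched as a function that lists candidate URLs for the browser to try in turn. The function `completion_candidates`, its choice of top-level domains, and the rule that the suffixes are only tried when the input contains no dot are hypothetical; actual browser heuristics differ and have changed over time.

```python
def completion_candidates(text: str, tlds=(".com", ".org", ".net")) -> list[str]:
    """Guess full URLs for an incomplete host name typed into an address bar.

    A hypothetical heuristic: add "www." if it is missing and, when the
    input contains no dot at all, try a few common top-level domains.
    """
    host = text.strip().lower()
    if "." in host:
        names = [host]                      # already looks like a domain name
    else:
        names = [host + tld for tld in tlds]
    candidates = []
    for name in names:
        if not name.startswith("www."):
            name = "www." + name
        candidates.append(f"http://{name}/")
    return candidates


print(completion_candidates("microsoft"))
```

A browser using such a list would stop at the first candidate whose DNS lookup and HTTP request succeed.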
Scheme specifiers
The scheme specifiers http:// and https:// at the start of a web URI refer to Hypertext Transfer Protocol or HTTP Secure, respectively. They specify the communication protocol to use for the request and response. The HTTP protocol is fundamental to the operation of the World Wide Web, and the added encryption layer in HTTPS is essential when browsers send or retrieve confidential data, such as passwords or banking information. Web browsers usually automatically prepend http:// to user-entered URIs, if omitted.
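As a sketch of how the scheme selects the protocol and, with it, the default port (80 for HTTP and 443 for HTTPS, as noted earlier), the hypothetical helper `resolve_endpoint` below prepends `http://` when the scheme is missing and then parses the address with the standard `urllib.parse.urlsplit`.

```python
from urllib.parse import urlsplit

# Well-known TCP ports for the two web schemes.
DEFAULT_PORTS = {"http": 80, "https": 443}


def resolve_endpoint(user_input: str) -> tuple[str, str, int]:
    """Return (scheme, host, port) for a user-entered web address.

    If no scheme is given, "http://" is prepended before parsing.
    """
    if "://" not in user_input:
        user_input = "http://" + user_input
    parts = urlsplit(user_input)
    if parts.scheme not in DEFAULT_PORTS:
        raise ValueError(f"not a web scheme: {parts.scheme!r}")
    # An explicit port in the URL (e.g. example.org:8080) overrides the default.
    port = parts.port or DEFAULT_PORTS[parts.scheme]
    return parts.scheme, parts.hostname, port


print(resolve_endpoint("example.org"))
print(resolve_endpoint("https://example.org/login"))
```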
Pages
A web page (also written as webpage) is a document that is suitable for the World Wide Web and web browsers. A web browser displays a web page on a monitor or mobile device.
The term web page usually refers to what is visible, but may also refer to the contents of the computer file itself, which is usually a text file containing hypertext written in HTML or a comparable markup language. Typical web pages provide hypertext for browsing to other web pages via hyperlinks, often referred to as links. Web browsers will frequently have to access multiple web resource elements, such as reading style sheets, scripts, and images, while presenting each web page.
On a network, a web browser can retrieve a web page from a remote web server. The web server may restrict access to a private network such as a corporate intranet. The web browser uses the Hypertext Transfer Protocol (HTTP) to make such requests to the web server.
A static web page is delivered exactly as stored, as web content in the web server's file system. In contrast, a dynamic web page is generated by a web application, usually driven by server-side software. Dynamic web pages are used when each user may require completely different information, for example on bank websites or webmail.
Static page
A static web page (sometimes called a flat page/stationary page) is a web page that is delivered to the user exactly as stored, in contrast to dynamic web pages which are generated by a web application.
Consequently, a static web page displays the same information for all users, from all contexts, subject to modern capabilities of a web server to negotiate content-type or language of the document where such versions are available and the server is configured to do so.
Dynamic pages
A server-side dynamic web page is a web page whose construction is controlled by an application server processing server-side scripts. In server-side scripting, parameters determine how the assembly of every new web page proceeds, including the setting up of more client-side processing.
A client-side dynamic web page processes the web page using JavaScript running in the browser. JavaScript programs can interact with the document via the Document Object Model (DOM) to query page state and alter it. The same client-side techniques can then update or change the DOM dynamically.
A dynamic web page is then reloaded by the user or by a computer program to change some variable content. The updating information could come from the server, or from changes made to that page's DOM. This may or may not truncate the browsing history or create a saved version to go back to, but a dynamic web page update using Ajax technologies will neither create a page to go back to nor truncate the web browsing history forward of the displayed page. Using Ajax technologies, the end user gets one dynamic page managed as a single page in the web browser, while the actual web content rendered on that page can vary. The Ajax engine runs only in the browser, requesting parts of the page's DOM from an application server on behalf of its client.
Dynamic HTML, or DHTML, is the umbrella term for technologies and methods used to create web pages that are not static web pages, though it has fallen out of common use since the popularisation of Ajax, a term which is now itself rarely used. Client-side scripting, server-side scripting, or a combination of these make for the dynamic web experience in a browser.
JavaScript is a scripting language that was initially developed in 1995 by Brendan Eich, then of Netscape, for use within web pages.[51] The standardised version is ECMAScript.[51] To make web pages more interactive, some web applications also use JavaScript techniques such as Ajax (asynchronous JavaScript and XML). Client-side script delivered with the page can make additional HTTP requests to the server, either in response to user actions such as mouse movements or clicks, or based on elapsed time. The server's responses are used to modify the current page rather than creating a new page with each response, so the server needs only to provide limited, incremental information. Multiple Ajax requests can be handled at the same time, and users can interact with the page while data is retrieved. Web pages may also regularly poll the server to check whether new information is available.[52]
Website
A website[53] is a collection of related web resources, such as web pages and multimedia content, typically identified with a common domain name and published on at least one web server. Notable examples are wikipedia.org, google.com, and amazon.com.
A website may be accessible via a public Internet Protocol (IP) network, such as the Internet, or a private local area network (LAN), by referencing a uniform resource locator (URL) that identifies the site.
Websites can have many functions and can be used in various fashions; a website can be a personal website, a corporate website for a company, a government website, an organisation website, etc. Websites are typically dedicated to a particular topic or purpose, ranging from entertainment and social networking to providing news and education. All publicly accessible websites collectively constitute the World Wide Web, while private websites, such as a company's website for its employees, are typically a part of an intranet.
Web pages, which are the building blocks of websites, are documents, typically composed in plain text interspersed with formatting instructions of Hypertext Markup Language (HTML, XHTML). They may incorporate elements from other websites with suitable markup anchors. Web pages are accessed and transported with the Hypertext Transfer Protocol (HTTP), which may optionally employ encryption (HTTP Secure, HTTPS) to provide security and privacy for the user. The user's application, often a web browser, renders the page content according to its HTML markup instructions onto a display terminal.
Hyperlinking between web pages conveys to the reader the site structure and guides the navigation of the site, which often starts with a home page containing a directory of the site web content. Some websites require user registration or subscription to access content. Examples of subscription websites include many business sites, news websites, academic journal websites, gaming websites, file-sharing websites, message boards, web-based email, social networking websites, websites providing real-time price quotations for different types of markets, as well as sites providing various other services. End users can access websites on a range of devices, including desktop and laptop computers, tablet computers, smartphones and smart TVs.
Browser
A web browser (commonly referred to as a browser) is a software user agent for accessing information on the World Wide Web. To connect to a website's server and display its pages, a user needs to have a web browser program. This is the program that the user runs to download, format, and display a web page on the user's computer.
In addition to allowing users to find, display, and move between web pages, a web browser will usually have features like keeping bookmarks, recording history, managing cookies (see below), and home pages and may have facilities for recording passwords for logging into websites.
The most popular browsers are Chrome, Safari, Edge, Samsung Internet and Firefox.[54]
Server
A web server is server software, or hardware dedicated to running such software, that can satisfy World Wide Web client requests. A web server can, in general, contain one or more websites. A web server processes incoming network requests over HTTP and several other related protocols.
The primary function of a web server is to store, process and deliver web pages to clients.[55] The communication between client and server takes place using the Hypertext Transfer Protocol (HTTP). Pages delivered are most frequently HTML documents, which may include images, style sheets and scripts in addition to the text content.

A user agent, commonly a web browser or web crawler, initiates communication by making a request for a specific resource using HTTP and the server responds with the content of that resource or an error message if unable to do so. The resource is typically a real file on the server's secondary storage, but this is not necessarily the case and depends on how the webserver is implemented.
While the primary function is to serve content, full implementation of HTTP also includes ways of receiving content from clients. This feature is used for submitting web forms, including uploading of files.
Many generic web servers also support scripting using Active Server Pages (ASP), PHP (Hypertext Preprocessor), or other scripting languages. This means that the behaviour of the webserver can be scripted in separate files, while the actual server software remains unchanged. Usually, this function is used to generate HTML documents dynamically ("on-the-fly") as opposed to returning static documents. The former is primarily used for retrieving or modifying information from databases. The latter is typically much faster and more easily cached but cannot deliver dynamic content.
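The difference between returning a stored document and generating one on the fly can be shown with Python's standard `http.server` module. This is a hedged illustration rather than a model of ASP or PHP: the generating code lives in the same program instead of in separate script files, the paths `/` and `/now` are hypothetical, and `http.server` is intended for testing rather than production use.

```python
import threading
import urllib.request
from datetime import datetime, timezone
from http.server import BaseHTTPRequestHandler, HTTPServer

# Static content: the same stored bytes are returned to every client.
STATIC_PAGE = b"<html><body><p>Welcome to example.org</p></body></html>"


class DemoHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/":
            body = STATIC_PAGE
        elif self.path == "/now":
            # Dynamic content: the HTML is generated on the fly per request.
            stamp = datetime.now(timezone.utc).isoformat()
            body = f"<html><body><p>Server time: {stamp}</p></body></html>".encode()
        else:
            self.send_error(404)
            return
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=UTF-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging for this demo


def start_server() -> HTTPServer:
    """Serve DemoHandler on a free localhost port in a background thread."""
    server = HTTPServer(("127.0.0.1", 0), DemoHandler)  # port 0: OS picks one
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


# Bypass any configured proxy, since this demo only talks to localhost.
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))

server = start_server()
port = server.server_address[1]
for path in ("/", "/now"):
    with opener.open(f"http://127.0.0.1:{port}{path}") as resp:
        print(path, resp.status, resp.read().decode())
server.shutdown()
server.server_close()
```

Requesting `/` twice returns identical bytes, while `/now` changes on every request, which is also why the static response is the easier of the two to cache.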
Web servers are also frequently embedded in devices such as printers, routers and webcams, serving only a local network. The web server may then be used as a part of a system for monitoring or administering the device in question. This usually means that no additional software has to be installed on the client computer since only a web browser is required (which now is included with most operating systems).
Optical networking
Optical networking is a sophisticated infrastructure that utilises optical fibre to transmit data over long distances, connecting countries, cities, and even private residences. The technology uses optical microsystems like tunable lasers, filters, attenuators, switches, and wavelength-selective switches to manage and operate these networks.[56][57]
The large quantity of optical fibre installed throughout the world at the end of the twentieth century set the foundation of the Internet as it is used today. The information highway relies heavily on optical networking, a method of sending messages encoded in light to relay information in various telecommunication networks.[58]
The Advanced Research Projects Agency Network (ARPANET) was one of the first iterations of the Internet, created in collaboration with universities and researchers in 1969.[59][60][61][62] However, access to the ARPANET was limited to researchers, and in 1985 the National Science Foundation founded the National Science Foundation Network (NSFNET), a program that provided supercomputer access to researchers.[62]
Limited public access to the Internet led to pressure from consumers and corporations to privatise the network. In 1993, the US passed the National Information Infrastructure Act, which dictated that the National Science Foundation must hand over control of the optical capabilities to commercial operators.[63][64]
The privatisation of the Internet and the release of the World Wide Web to the public in 1993 led to an increased demand for Internet capabilities. This spurred developers to seek solutions to reduce the time and cost of laying new fibre and increase the amount of information that can be sent on a single fibre, in order to meet the growing needs of the public.[65][66][67][68]
In 1994, Pirelli S.p.A.'s optical components division introduced a wavelength-division multiplexing (WDM) system to meet growing demand for increased data transmission. This four-channel WDM technology allowed more information to be sent simultaneously over a single optical fibre, effectively boosting network capacity.[69][70]
Pirelli was not the only company to develop a WDM system; the Ciena Corporation (Ciena) also created its own technology to transmit data more efficiently. Optical networking engineer David Huber and entrepreneur Kevin Kimberlin founded Ciena in 1992.[71][72][73] Drawing on laser technology from Gordon Gould and William Culver of Optelecom, Inc., the company focused on utilising optical amplifiers to transmit data via light.[74][75][76] Under chief executive officer Pat Nettles, Ciena developed a dual-stage optical amplifier for dense wavelength-division multiplexing (DWDM), which was deployed on the Sprint network in 1996 and patented in 1997.[77][78][79][80][81]
Cookie
An HTTP cookie (also called web cookie, Internet cookie, browser cookie, or simply cookie) is a small piece of data sent from a website and stored on the user's computer by the user's web browser while the user is browsing. Cookies were designed to be a reliable mechanism for websites to remember stateful information (such as items added in the shopping cart in an online store) or to record the user's browsing activity (including clicking particular buttons, logging in, or recording which pages were visited in the past). They can also be used to remember arbitrary pieces of information that the user previously entered into form fields such as names, addresses, passwords, and credit card numbers.
Cookies perform essential functions in the modern web. Perhaps most importantly, authentication cookies are the most common method used by web servers to know whether the user is logged in or not, and which account they are logged in with. Without such a mechanism, the site would not know whether to send a page containing sensitive information or require the user to authenticate themselves by logging in. The security of an authentication cookie generally depends on the security of the issuing website and the user's web browser, and on whether the cookie data is encrypted. Security vulnerabilities may allow a cookie's data to be read by a hacker, used to gain access to user data, or used to gain access (with the user's credentials) to the website to which the cookie belongs (see cross-site scripting and cross-site request forgery for examples).[82]
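Python's standard `http.cookies` module can illustrate both halves of the exchange: the server sends a `Set-Cookie` header, and the browser later returns the value in a `Cookie` header. The cookie name and token below are made up; the `HttpOnly`, `Secure` and `SameSite` attributes are real cookie attributes that reduce (but do not eliminate) exposure to the script-based and cross-site attacks mentioned above. Setting `samesite` through `SimpleCookie` requires Python 3.8 or later.

```python
from http.cookies import SimpleCookie

# Server side: issue a session cookie. The value is a made-up opaque token.
outgoing = SimpleCookie()
outgoing["session_id"] = "a1b2c3"
outgoing["session_id"]["path"] = "/"
outgoing["session_id"]["httponly"] = True   # hide it from page JavaScript
outgoing["session_id"]["secure"] = True     # only send it over HTTPS
outgoing["session_id"]["samesite"] = "Lax"  # limit sending on cross-site requests
print(outgoing.output())                     # the Set-Cookie response header

# Server side, on a later request: parse the Cookie header the browser sends.
incoming = SimpleCookie()
incoming.load("session_id=a1b2c3; theme=dark")
print(incoming["session_id"].value, incoming["theme"].value)
```

The server would then look up the token in its own session store to decide which account, if any, the request belongs to.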
Tracking cookies, and especially third-party tracking cookies, are commonly used as ways to compile long-term records of individuals' browsing histories – a potential privacy concern that prompted European[83] and U.S. lawmakers to take action in 2011.[84][85] European law requires that all websites targeting European Union member states gain "informed consent" from users before storing non-essential cookies on their device.
Google Project Zero researcher Jann Horn describes ways cookies can be read by intermediaries, like Wi-Fi hotspot providers. In such circumstances, he recommends using the browser in private browsing mode (widely known as Incognito mode in Google Chrome).[86]
Search engine
A web search engine or Internet search engine is a software system that is designed to carry out web search (Internet search), which means to search the World Wide Web in a systematic way for particular information specified in a web search query. The search results are generally presented in a line of results, often referred to as search engine results pages (SERPs). The information may be a mix of web pages, images, videos, infographics, articles, research papers, and other types of files. Some search engines also mine data available in databases or open directories. Unlike web directories, which are maintained only by human editors, search engines also maintain real-time information by running an algorithm on a web crawler. Internet content that is not capable of being searched by a web search engine is generally described as the deep web.
In 1990, Archie, the world's first search engine, was released. It was originally an index of File Transfer Protocol (FTP) sites; FTP is a protocol for moving files between a client and a server over a network.[87][88] This early search tool was superseded by more advanced engines such as Yahoo! in 1995 and Google in 1998.[89][90]
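Full-text search engines commonly answer queries from an inverted index built from crawled pages: a mapping from each word to the pages that contain it. The section does not describe any particular engine's internals, so the following Python sketch is only a generic illustration. The three pages are invented, ranking is omitted entirely, and a query simply returns the pages containing all of its words.

```python
import re
from collections import defaultdict

# A tiny crawled "corpus"; URLs and text are invented for illustration.
pages = {
    "http://example.org/": "The World Wide Web is an information system",
    "http://example.org/http": "HTTP is the protocol of the Web",
    "http://example.org/ftp": "FTP moves files between client and server",
}


def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in docs.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return URLs containing every query word (boolean AND), sorted."""
    sets = [index.get(word, set()) for word in tokenize(query)]
    if not sets:
        return []
    return sorted(set.intersection(*sets))


index = build_index(pages)
print(search(index, "web"))
```

Building the index once and reusing it for every query is what lets an engine answer quickly without rescanning the pages.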
Deep web
The deep web,[91] invisible web,[92] or hidden web[93] are parts of the World Wide Web whose contents are not indexed by standard web search engines. The opposite term to the deep web is the surface web, which is accessible to anyone using the Internet.[94] Computer scientist Michael K. Bergman is credited with coining the term deep web in 2001 as a search indexing term.[95]
Much of the deep web's content is hidden behind HTTP forms,[96][97] and it includes many very common services, such as web mail and online banking, as well as paid services protected by a paywall, such as video on demand and some online magazines and newspapers.
Deep web content can be located and accessed through a direct URL or IP address, but may require a password or other security credentials to get past the public-facing page.
Caching
A web cache is a server computer, located either on the public Internet or within an enterprise, that stores recently accessed web pages to improve response time for users when the same content is requested within a certain time after the original request. Most web browsers also implement a browser cache by writing recently obtained data to a local data storage device. HTTP requests by a browser may ask only for data that has changed since the last access. Web pages and resources may contain expiration information to control caching, for example to protect sensitive data, as in online banking, or to keep frequently updated sites, such as news media, current. Even sites with highly dynamic content may permit basic resources to be refreshed only occasionally. Website designers find it worthwhile to collate resources such as CSS data and JavaScript into a few site-wide files so that they can be cached efficiently. Enterprise firewalls often cache Web resources requested by one user for the benefit of many users. Some search engines store cached content of frequently accessed websites.
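The browser-cache behaviour described above (serving a stored copy while it is fresh, and otherwise asking the server only whether it has changed) can be sketched with HTTP's `ETag` and `If-None-Match` validators and a `max-age` freshness lifetime. This is a simplified, in-memory Python model under assumed names (`Origin`, `BrowserCache`, `CacheEntry`); it passes the current time in explicitly instead of reading a clock and ignores most `Cache-Control` directives.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class CacheEntry:
    body: str
    etag: str
    stored_at: float  # time the entry was stored or last revalidated


class Origin:
    """Hypothetical origin server that answers conditional GET requests."""

    def __init__(self) -> None:
        self.resources: Dict[str, str] = {}
        self.requests = 0        # every request that reaches the server
        self.full_responses = 0  # requests answered with a full 200 body

    @staticmethod
    def etag_for(body: str) -> str:
        # A validator derived from the content; real servers choose their own scheme.
        return '"' + hashlib.sha256(body.encode("utf-8")).hexdigest()[:16] + '"'

    def get(self, url: str, if_none_match: Optional[str] = None) -> Tuple[int, Optional[str], str]:
        self.requests += 1
        body = self.resources[url]
        etag = self.etag_for(body)
        if if_none_match == etag:
            return 304, None, etag  # Not Modified: no body is sent
        self.full_responses += 1
        return 200, body, etag


class BrowserCache:
    """Hypothetical client-side cache with a single max-age for every resource."""

    def __init__(self, origin: Origin, max_age: float) -> None:
        self.origin = origin
        self.max_age = max_age  # freshness lifetime, as in Cache-Control: max-age
        self.entries: Dict[str, CacheEntry] = {}

    def fetch(self, url: str, now: float) -> str:
        entry = self.entries.get(url)
        if entry is not None and now - entry.stored_at < self.max_age:
            return entry.body  # still fresh: answered locally, no request sent
        status, body, etag = self.origin.get(url, entry.etag if entry else None)
        if status == 304 and entry is not None:
            entry.stored_at = now  # unchanged on the server: reuse the cached body
            return entry.body
        self.entries[url] = CacheEntry(body, etag, now)
        return body


origin = Origin()
origin.resources["/style.css"] = "body { color: black }"
cache = BrowserCache(origin, max_age=60)
cache.fetch("/style.css", now=0)    # first visit: full response
cache.fetch("/style.css", now=30)   # fresh: served from cache
cache.fetch("/style.css", now=100)  # stale: revalidated, server replies 304
origin.resources["/style.css"] = "body { color: navy }"
cache.fetch("/style.css", now=200)  # stale and changed: new full response
print(origin.requests, origin.full_responses)  # 3 2
```

Of the four fetches, one never reaches the server and one costs only a small 304 exchange, which is why collating CSS and JavaScript into a few long-lived site-wide files pays off.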
Security
For criminals, the Web has become a venue to spread malware and engage in a range of cybercrime, including (but not limited to) identity theft, fraud, espionage, and intelligence gathering.[98] Web-based vulnerabilities now outnumber traditional computer security concerns,[99][100] and as measured by Google, about one in ten web pages may contain malicious code.[101] Most web-based attacks take place on legitimate websites, and most, as measured by Sophos, are hosted in the United States, China and Russia.[102] SQL injection attacks against websites are the most common of all malware threats.[103] Through HTML and URIs, the Web became vulnerable to attacks such as cross-site scripting (XSS), which arrived with the introduction of JavaScript[104] and were exacerbated to some degree by Web 2.0 and Ajax web design, which favours the use of scripts.[105] One 2007 estimate found that 70% of all websites were open to XSS attacks on their users.[106] Phishing is another common threat to the Web. In February 2013, RSA (the security division of EMC) estimated the global losses from phishing at $1.5 billion in 2012.[107] Two well-known phishing methods are Covert Redirect and Open Redirect.
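SQL injection and cross-site scripting can both be illustrated in a few lines of Python using only the standard library. The sketch below, built around a hypothetical `users` table in an in-memory SQLite database, contrasts a query assembled by string formatting (open to SQL injection) with a parameterised query, and shows HTML-escaping of user-supplied text before it is written into a page, a basic defence against XSS. The function names are illustrative, and the example shows common mitigations rather than a complete security policy.

```python
import html
import sqlite3


def find_user_unsafe(conn: sqlite3.Connection, name: str) -> list:
    # Deliberately vulnerable: user input is pasted into the SQL text itself.
    query = f"SELECT id, name FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()


def find_user_safe(conn: sqlite3.Connection, name: str) -> list:
    # Parameterised: the driver sends `name` as a value, never as SQL syntax.
    return conn.execute("SELECT id, name FROM users WHERE name = ?", (name,)).fetchall()


def render_comment(comment: str) -> str:
    # Escape &, <, >, and quotes so user text is displayed, not interpreted as markup.
    return "<p>" + html.escape(comment) + "</p>"


# Hypothetical table for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])

payload = "' OR '1'='1"
print(len(find_user_unsafe(conn, payload)))  # 2: the injected OR clause matches every row
print(len(find_user_safe(conn, payload)))    # 0: no user is literally named that
print(render_comment("<script>alert(1)</script>"))
# <p>&lt;script&gt;alert(1)&lt;/script&gt;</p>
```

In both cases the fix is the same idea: keep untrusted data separate from the code (SQL or HTML) that it is embedded in, instead of trying to spot dangerous input by inspection.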
Proposed solutions vary. Large security companies such as McAfee already design governance and compliance suites to meet post-9/11 regulations,[108] and some, such as Finjan Holdings, have recommended active real-time inspection of programming code and all content regardless of its source.[98] Some have argued that enterprises should see Web security as a business opportunity rather than a cost centre,[109] while others call for "ubiquitous, always-on digital rights management" enforced in the infrastructure to replace the hundreds of companies that secure data and networks.[110] Jonathan Zittrain has said that users sharing responsibility for computing safety is far preferable to locking down the Internet.[111]
Privacy
Every time a client requests a web page, the server can identify the request's IP address. Web servers usually log IP addresses in a log file. Also, unless set not to do so, most web browsers record requested web pages in a viewable history feature, and usually cache much of the content locally. Unless the server-browser communication uses HTTPS encryption, web requests and responses travel in plain text across the Internet and can be viewed, recorded, and cached by intermediate systems. A virtual private network (VPN) offers another way to hide personally identifiable information: it encrypts traffic between the client and the VPN server and masks the client's original IP address, lowering the chance of user identification.
When a web page asks for, and the user supplies, personally identifiable information, such as their real name, address, or e-mail address, web-based entities can associate current web traffic with that individual. If the website uses HTTP cookies, username and password authentication, or other tracking techniques, it can relate other web visits, before and after, to the identifiable information provided. In this way, a web-based organisation can build a profile of the individual people who use its site or sites. It may be able to build a record for an individual that includes information about their leisure activities, their shopping interests, their profession, and other aspects of their demographic profile. These profiles are of potential interest to marketers, advertisers, and others. Depending on the website's terms and conditions and the applicable local laws, information from these profiles may be sold, shared, or passed to other organisations without the user being informed. For many ordinary people, this means little more than some unexpected emails in their inbox or some uncannily relevant advertising on a future web page. For others, it can mean that time spent indulging an unusual interest results in a deluge of further targeted marketing that may be unwelcome. Law enforcement, counterterrorism, and espionage agencies can also identify, target, and track individuals based on their interests or proclivities on the Web.
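Relating earlier and later visits to one identity amounts to a join on a cookie identifier. In the Python sketch below, the log format, the cookie IDs, the e-mail address, and the `build_profiles` function are all hypothetical; it shows how browsing that happened before a user typed in an e-mail address is attributed to that address once the two share a cookie.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def build_profiles(visit_log: List[Tuple[str, str]],
                   identities: Dict[str, str]) -> Dict[str, List[str]]:
    """Group visited URLs by cookie ID, then label each group with a known identity (hypothetical)."""
    visits_by_cookie: Dict[str, List[str]] = defaultdict(list)
    for cookie_id, url in visit_log:
        visits_by_cookie[cookie_id].append(url)

    profiles: Dict[str, List[str]] = {}
    for cookie_id, urls in visits_by_cookie.items():
        # Cookies never tied to personal details stay pseudonymous.
        person = identities.get(cookie_id, f"anonymous:{cookie_id}")
        # setdefault/extend merges several cookies that map to the same person.
        profiles.setdefault(person, []).extend(urls)
    return profiles


visit_log = [
    ("c1", "/hiking-boots"),
    ("c2", "/news"),
    ("c1", "/checkout"),  # the user enters an e-mail address here
    ("c1", "/tents"),
]
identities = {"c1": "jane@example.com"}  # captured at checkout
print(build_profiles(visit_log, identities))
# {'jane@example.com': ['/hiking-boots', '/checkout', '/tents'], 'anonymous:c2': ['/news']}
```

The visit to `/hiking-boots` happened before any personal details were supplied, yet it ends up in the named profile. If two cookies (for example, from two devices) are later tied to the same address, their histories merge into one profile.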
Social networking sites usually try to get users to use their real names, interests, and locations, rather than pseudonyms, as their executives believe that this makes the social networking experience more engaging for users. On the other hand, uploaded photographs or unguarded statements can be attributed to an individual, who may regret this exposure. Employers, schools, parents, and other relatives may be influenced by aspects of social networking profiles, such as text posts or digital photos, that the posting individual did not intend for these audiences. Online bullies may make use of personal information to harass or stalk users. Modern social networking websites allow fine-grained control of the privacy settings for each posting, but these can be complex and difficult to find or use, especially for beginners.[112] Photographs and videos posted onto websites have caused particular problems, as they can add a person's face to an online profile. With current and emerging facial recognition technology, it may then be possible to relate that face to other, previously anonymous, images, events, and scenarios that have been captured elsewhere. Because images are cached, mirrored, and copied, it is difficult to remove an image from the World Wide Web.
Standards
Web standards include many interdependent standards and specifications, some of which govern aspects of the Internet, not just the World Wide Web. Even when not web-focused, such standards directly or indirectly affect the development and administration of websites and web services. Considerations include the interoperability, accessibility and usability of web pages and web sites.
Web standards, in the broader sense, consist of the following:
- Recommendations published by the World Wide Web Consortium (W3C)[113]
- "Living Standards" published by the Web Hypertext Application Technology Working Group (WHATWG)
- Request for Comments (RFC) documents published by the Internet Engineering Task Force (IETF)[114]
- Standards published by the International Organization for Standardization (ISO)[115]
- Standards published by Ecma International (formerly ECMA)[116]
- The Unicode Standard and various Unicode Technical Reports (UTRs) published by the Unicode Consortium[117]
- Name and number registries maintained by the Internet Assigned Numbers Authority (IANA)[118]
Web standards are not fixed sets of rules but constantly evolving sets of finalised technical specifications of web technologies.[119] They are developed by standards organisations (groups of interested and often competing parties chartered with the task of standardisation); they are not technologies developed and declared to be a standard by a single individual or company. It is crucial to distinguish specifications that are still under development from those that have already reached final status (in the case of W3C specifications, the highest maturity level).
Accessibility
There are methods for accessing the Web in alternative media and formats to facilitate use by individuals with disabilities. These disabilities may be visual, auditory, physical, speech-related, cognitive, neurological, or some combination. Accessibility features also help people with temporary disabilities, such as a broken arm, and ageing users as their abilities change.[120] The Web is used to receive information as well as to provide information and interact with society. The World Wide Web Consortium claims that it is essential that the Web be accessible, so it can provide equal access and equal opportunity to people with disabilities.[121] Tim Berners-Lee once noted, "The power of the Web is in its universality. Access by everyone regardless of disability is an essential aspect."[120] Many countries regulate web accessibility as a requirement for websites.[122] International co-operation in the W3C Web Accessibility Initiative led to simple guidelines that web content authors as well as software developers can use to make the Web accessible to persons who may or may not be using assistive technology.[120][123]
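As one concrete example of the kind of check such guidelines support, the Web Content Accessibility Guidelines ask that non-text content such as images have a text alternative. The Python sketch below uses the standard-library `html.parser` to flag `<img>` elements with no `alt` attribute at all; an empty `alt=""` is accepted, because that is the conventional marking for purely decorative images. The class name and sample markup are hypothetical, and a real accessibility audit covers far more than this single check.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class MissingAltChecker(HTMLParser):
    """Collect the src of every <img> element that lacks an alt attribute (hypothetical checker)."""

    def __init__(self) -> None:
        super().__init__()
        self.problems: List[str] = []

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        # HTMLParser lowercases tag and attribute names before calling this method;
        # self-closing <img ... /> tags also arrive here via the default handle_startendtag.
        if tag != "img":
            return
        attributes = dict(attrs)
        if "alt" not in attributes:
            self.problems.append(attributes.get("src") or "(no src)")


page = """
<p>Our laboratory</p>
<img src="logo.png" alt="CERN logo">
<img src="chart.png">
<img src="divider.png" alt="" />
"""
checker = MissingAltChecker()
checker.feed(page)
checker.close()
print(checker.problems)  # ['chart.png']
```

A screen reader has nothing meaningful to announce for `chart.png`, whereas it can read the logo's description and skip the decorative divider.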
Internationalisation
The W3C Internationalisation Activity aims to ensure that web technology works in all languages, scripts, and cultures.[124] Unicode began to gain ground in 2004 or 2005 and, in December 2007, surpassed both ASCII and Western European as the Web's most frequently used character encoding.[125] URIs, as defined in RFC 3986, identify resources using only a subset of US-ASCII. RFC 3987 allows more characters, namely any character in the Universal Character Set, so that a resource can be identified by an Internationalized Resource Identifier (IRI) in any language.[126]
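RFC 3987 specifies how an IRI is mapped to a URI for protocols that accept only ASCII: each non-ASCII character is encoded as UTF-8 and every resulting byte is percent-encoded. The Python sketch below applies that rule character by character. It is a simplification: it assumes the host name is already ASCII (internationalised domain names are normally converted separately, using IDNA/Punycode), it does not validate its input, and the function name `iri_to_uri` is hypothetical.

```python
from urllib.parse import unquote


def iri_to_uri(iri: str) -> str:
    """Percent-encode the UTF-8 bytes of every non-ASCII character; leave ASCII untouched (hypothetical helper)."""
    parts = []
    for ch in iri:
        if ord(ch) < 128:
            parts.append(ch)  # ASCII, including reserved characters such as / ? #, passes through
        else:
            parts.append("".join(f"%{byte:02X}" for byte in ch.encode("utf-8")))
    return "".join(parts)


iri = "https://example.org/wiki/caf\u00e9"  # the path ends in "café"
uri = iri_to_uri(iri)
print(uri)                 # https://example.org/wiki/caf%C3%A9
print(unquote(uri) == iri)  # True: unquote decodes percent-escapes as UTF-8 by default
```

Because the mapping only touches non-ASCII characters, an IRI that is already plain ASCII comes back unchanged, which is what allows IRIs to be used wherever URIs are expected.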
See also
- Decentralized web
- Electronic publishing
- Electronic literature
- Gopher (protocol), an early alternative to the WWW
- Internet metaphors
- Internet security
- Lists of websites
- Minitel, a predecessor of the WWW
- Streaming media
- Web 1.0
- Web 2.0
- Web 3.0
- Web3
- Web3D
- Web development tools
- Web literacy
References
- ^ "World Wide Web - MDN Web Docs Glossary: Definitions of Web-related terms | MDN". developer.mozilla.org. Retrieved 25 April 2023.
- ^ Wright, Edmund, ed. (2006). The Desk Encyclopedia of World History. New York: Oxford University Press. p. 312. ISBN 978-0-7394-7809-7.
- ^ a b c "What is the difference between the Web and the Internet?". W3C Help and FAQ. W3C. 2009. Archived from the original on 9 July 2015. Retrieved 16 July 2015.
- ^ "World Wide Web (WWW) launches in the public domain | April 30, 1993". HISTORY. 30 March 2020. Archived from the original on 6 February 2025. Retrieved 21 January 2025.
- ^ a b c Berners-Lee, Tim. "Information Management: A Proposal". w3.org. The World Wide Web Consortium. Archived from the original on 1 April 2010. Retrieved 12 February 2022.
- ^ "The World's First Web Site". HISTORY. 30 August 2009. Archived from the original on 19 August 2023. Retrieved 4 August 2016.
- ^ Bleigh, Michael (16 May 2014). "The Once And Future Web Platform". TechCrunch. Archived from the original on 5 December 2021. Retrieved 9 March 2022.
- ^ "World Wide Web Timeline". Pew Research Center. 11 March 2014. Archived from the original on 29 July 2015. Retrieved 1 August 2015.
- ^ Dewey, Caitlin (12 March 2014). "36 Ways The Web Has Changed Us". The Washington Post. Archived from the original on 9 September 2015. Retrieved 1 August 2015.
- ^ a b "Internet Live Stats". internetlivestats.com. Archived from the original on 2 July 2015. Retrieved 1 August 2015.
- ^ a b Quittner, Joshua (29 March 1999). "Network Designer Tim Berners-Lee". Time Magazine. Archived from the original on 15 August 2007. Retrieved 17 May 2010.
He wove the World Wide Web and created a mass medium for the 21st century. The World Wide Web is Berners-Lee's alone. He designed it. He set it loose it on the world. And he more than anyone else has fought to keep it an open, non-proprietary and free.
- ^ a b McPherson, Stephanie Sammartino (2009). Tim Berners-Lee: Inventor of the World Wide Web. Twenty-First Century Books. ISBN 978-0-8225-7273-2.
- ^ Rutter, Dorian (2005). From Diversity to Convergence: British Computer Networks and the Internet, 1970-1995 (PDF) (Computer Science thesis). The University of Warwick. Archived (PDF) from the original on 10 October 2022. Retrieved 27 December 2022.
When Berners-Lee developed his Enquire hypertext system during 1980, the ideas explored by Bush, Engelbart, and Nelson did not influence his work, as he was not aware of them. However, as Berners-Lee began to refine his ideas, the work of these predecessors would later confirm the legitimacy of his system.
- ^ Tim Berners-Lee (1999). Weaving the Web. Internet Archive. HarperSanFrancisco. pp. 5–6. ISBN 978-0-06-251586-5.
Unbeknownst to me at that early stage in my thinking, several people had hit upon similar concepts, which were never implemented.
- ^ Berners-Lee, T.; Cailliau, R.; Groff, J.-F.; Pollermann, B. (1992). "World-Wide Web: The Information Universe". Electron. Netw. Res. Appl. Policy. 2: 52–58. doi:10.1108/eb047254. ISSN 1066-2243. Archived from the original on 27 December 2022. Retrieved 27 December 2022.
- ^ W3 (1991). "Re: Qualifiers on Hypertext links". Archived from the original on 7 December 2021.
- ^ Hopgood, Bob. "History of the Web". w3.org. The World Wide Web Consortium. Archived from the original on 21 March 2022. Retrieved 12 February 2022.
- ^ "A short history of the Web". CERN. Archived from the original on 17 April 2022. Retrieved 15 April 2022.
- ^ "30 years of the web: a short history of the invention that changed the world". British Council. Retrieved 19 September 2025.
Berners-Lee and others worked to ensure that CERN would make the underlying code available on a royalty-free basis, forever. This decision was announced in April 1993, and sparked a global wave of creativity, collaboration and innovation.
- ^ "Software release of WWW into public domain". CERN Document Server. CERN. 30 April 1993. Archived from the original on 17 February 2022. Retrieved 17 February 2022.
- ^ "Ten Years Public Domain for the Original Web Software". Tenyears-www.web.cern.ch. 30 April 2003. Archived from the original on 13 August 2009. Retrieved 27 July 2009.
- ^ Calore, Michael (22 April 2010). "April 22, 1993: Mosaic Browser Lights Up Web With Color, Creativity". Wired. Archived from the original on 24 April 2018. Retrieved 12 February 2022.
- ^ Couldry, Nick (2012). Media, Society, World: Social Theory and Digital Media Practice. London: Polity Press. p. 2. ISBN 9780745639208. Archived from the original on 27 February 2024. Retrieved 11 December 2020.
- ^ Hoffman, Jay (21 April 1993). "The Origin of the IMG Tag". The History of the Web. Archived from the original on 13 February 2022. Retrieved 13 February 2022.
- ^ Clarke, Roger. "The Birth of Web Commerce". Roger Clarke's Web-Site. XAMAX. Archived from the original on 15 February 2022. Retrieved 15 February 2022.
- ^ McCullough, Brian. "20 YEARS ON: WHY NETSCAPE'S IPO WAS THE "BIG BANG" OF THE INTERNET ERA". www.internethistorypodcast.com. INTERNET HISTORY PODCAST. Archived from the original on 12 February 2022. Retrieved 12 February 2022.
- ^ Calore, Michael (28 September 2009). "Sept. 28, 1998: Internet Explorer Leaves Netscape in Its Wake". Wired. Archived from the original on 30 November 2021. Retrieved 14 February 2022.
- ^ Daly, Janet (26 January 2000). "World Wide Web Consortium Issues XHTML 1.0 as a Recommendation". W3C. Archived from the original on 20 June 2021. Retrieved 8 March 2022.
- ^ Hickson, Ian. "WHAT open mailing list announcement". whatwg.org. WHATWG. Archived from the original on 8 March 2022. Retrieved 16 February 2022.
- ^ Shankland, Stephen (9 July 2009). "An epitaph for the Web standard, XHTML 2". CNet. Archived from the original on 16 February 2022. Retrieved 17 February 2022.
- ^ "Memorandum of Understanding Between W3C and WHATWG". W3C. Archived from the original on 29 May 2019. Retrieved 16 February 2022.
- ^ Lee, In (30 June 2012). Electronic Commerce Management for Business Activities and Global Enterprises: Competitive Advantages. IGI Global. ISBN 978-1-4666-1801-5. Archived from the original on 21 April 2024. Retrieved 27 September 2020.
- ^ Misiroglu, Gina (26 March 2015). American Countercultures: An Encyclopedia of Nonconformists, Alternative Lifestyles, and Radical Ideas in U.S. History. Routledge. ISBN 978-1-317-47729-7. Archived from the original on 21 April 2024. Retrieved 27 September 2020.
- ^ "World Wide Web Timeline". Pew Research Center. 11 March 2014. Archived from the original on 29 July 2015. Retrieved 1 August 2015.
- ^ "Frequently asked questions - Spelling of WWW". W3C. Archived from the original on 2 August 2009. Retrieved 27 July 2009.
- ^ "Percentage of mobile device website traffic worldwide from 1st quarter 2015 to 4th quarter 2024". Statista. Retrieved 17 April 2025.
- ^ Castelluccio, Michael (1 October 2010). "It's not your grandfather's Internet". Strategic Finance. Institute of Management Accountants. Archived from the original on 5 March 2016. Retrieved 7 February 2016 – via The Free Library.
- ^ "Audible pronunciation of 'WWW'". Oxford University Press. Archived from the original on 25 May 2014. Retrieved 25 May 2014.
- ^ Harvey, Charlie (18 August 2015). "How we pronounce WWW in English: a detailed but unscientific survey". charlieharvey.org.uk. Archived from the original on 19 November 2022. Retrieved 19 May 2022.
- ^ "Stephen Fry's pronunciation of 'WWW'". Podcasts.com. Archived from the original on 4 April 2017.
- ^ Simonite, Tom (22 July 2008). "Help us find a better way to pronounce www". newscientist.com. New Scientist, Technology. Archived from the original on 13 March 2016. Retrieved 7 February 2016.
- ^ Muylle, Steve; Moenaert, Rudy; Despont, Marc (1999). "A grounded theory of World Wide Web search behaviour". Journal of Marketing Communications. 5 (3): 143. doi:10.1080/135272699345644.
- ^ Flanagan, David. JavaScript – The definitive guide (6 ed.). p. 1.
JavaScript is part of the triad of technologies that all Web developers must learn: HTML to specify the content of web pages, CSS to specify the presentation of web pages, and JavaScript to specify the behaviour of web pages.
- ^ "HTML 4.0 Specification – W3C Recommendation – Conformance: requirements and recommendations". World Wide Web Consortium. 18 December 1997. Archived from the original on 5 July 2015. Retrieved 6 July 2015.
- ^ Berners-Lee, Tim; Cailliau, Robert (12 November 1990). "WorldWideWeb: Proposal for a HyperText Project". Archived from the original on 2 May 2015. Retrieved 12 May 2015.
- ^ Berners-Lee, Tim. "Frequently asked questions by the Press". W3C. Archived from the original on 2 August 2009. Retrieved 27 July 2009.
- ^ Palazzi, P (2011). "The Early Days of the WWW at CERN". Archived from the original on 23 July 2012.
- ^ Fraser, Dominic (13 May 2018). "Why a domain's root can't be a CNAME – and other tidbits about the DNS". FreeCodeCamp. Archived from the original on 21 April 2024. Retrieved 12 March 2019.
- ^ "automatically adding www.___.com". mozillaZine. 16 May 2003. Archived from the original on 27 June 2009. Retrieved 27 May 2009.
- ^ Masnick, Mike (7 July 2008). "Microsoft Patents Adding 'www.' And '.com' To Text". Techdirt. Archived from the original on 27 June 2009. Retrieved 27 May 2009.
- ^ a b Hamilton, Naomi (31 July 2008). "The A-Z of Programming Languages: JavaScript". Computerworld. IDG. Archived from the original on 24 May 2009. Retrieved 12 May 2009.
- ^ Buntin, Seth (23 September 2008). "jQuery Polling plugin". Archived from the original on 13 August 2009. Retrieved 22 August 2009.
- ^ "website". TheFreeDictionary.com. Archived from the original on 7 May 2018. Retrieved 2 July 2011.
- ^ "Top Browsers Market Share". www.similarweb.com. Archived from the original on 17 February 2025. Retrieved 15 February 2025.
- ^ Killelea, Patrick (2002). Web performance tuning (2nd ed.). Beijing: O'Reilly. p. 264. ISBN 978-0596001728. OCLC 49502686.
- ^ Liu, Xiang (20 December 2019). "Evolution of Fiber-Optic Transmission and Networking toward the 5G Era". iScience. 22: 489–506. Bibcode:2019iSci...22..489L. doi:10.1016/j.isci.2019.11.026. ISSN 2589-0042. PMC 6920305. PMID 31838439.
- ^ Marom, Dan M. (1 January 2008), Gianchandani, Yogesh B.; Tabata, Osamu; Zappe, Hans (eds.), "3.07 - Optical Communications", Comprehensive Microsystems, Oxford: Elsevier, pp. 219–265, doi:10.1016/b978-044452190-3.00035-5, ISBN 978-0-444-52190-3, archived from the original on 23 January 2025, retrieved 17 January 2025
- ^ Chadha, Devi (2019). Optical WDM networks: from static to elastic networks. Hoboken, NJ: Wiley-IEEE Press. ISBN 978-1-119-39326-9.
- ^ "The Computer History Museum, SRI International, and BBN Celebrate the 40th Anniversary of First ARPANET Transmission, Precursor to Today's Internet | SRI International". 29 March 2019. Archived from the original on 29 March 2019. Retrieved 21 January 2025.
- ^ Markoff, John (24 January 1993). "Building the Electronic Superhighway". The New York Times. ISSN 0362-4331. Retrieved 21 January 2025.
- ^ Abbate, Janet (2000). Inventing the Internet. Inside technology (3rd printing ed.). Cambridge, Mass.: MIT Press. ISBN 978-0-262-51115-5.
- ^ a b "NSFNET: A Partnership for High-Speed Networking" (PDF). www.merit.edu. Archived (PDF) from the original on 6 November 2024. Retrieved 21 January 2025.
- ^ Boucher, Rick (14 September 1993). "H.R.1757 - 103rd Congress (1993-1994): National Information Infrastructure Act of 1993". www.congress.gov. Archived from the original on 10 November 2021. Retrieved 23 January 2025.
- ^ "NSF Shapes the Internet's Evolution | NSF - National Science Foundation". new.nsf.gov. 25 July 2003. Retrieved 23 January 2025.
- ^ Radu, Roxana (2019). "Privatization and Globalization of the Internet". Negotiating Internet Governance. pp. 75–112. doi:10.1093/oso/9780198833079.003.0004. ISBN 978-0-19-883307-9.
- ^ "Birth of the Commercial Internet - NSF Impacts | NSF - National Science Foundation". new.nsf.gov. Retrieved 23 January 2025.
- ^ Markoff, John (3 March 1997). "Fiber-Optic Technology Draws Record Stock Value". The New York Times. ISSN 0362-4331. Archived from the original on 9 October 2019. Retrieved 23 January 2025.
- ^ Korzeniowski, Paul (2 June 1997). "Record growth spurs demand for dense WDM -- Infrastructure bandwidth gears up for next wave". CommunicationsWeek. No. 666. p. T.40. ProQuest 226891627.
- ^ Hecht, Jeff (1999). City of light: the story of fiber optics. The Sloan technology series. New York: Oxford University Press. ISBN 978-0-19-510818-7.
- ^ "Cisco to Acquire Pirelli DWDM Unit for $2.15 Billion". www.fiberopticsonline.com. Retrieved 31 January 2025.
- ^ Hirsch, Stacey (2 February 2006). "Huber steps down as CEO of Broadwing". The Baltimore Sun.
- ^ "Dr. David Huber". History of the Internet. Retrieved 3 February 2025.
- ^ "Internet Commercialization History". History of the Internet. Retrieved 3 February 2025.
- ^ "May 17, 1993, page 76 - The Baltimore Sun at Baltimore Sun". Newspapers.com. 17 May 1993. p. 76. Archived from the original on 21 February 2025. Retrieved 3 February 2025.
- ^ Hall, Carla (17 December 1987). "Inventor Beams over Laser Patents: After 30 Years, Gordon Gould Gets Credit He Deserves". Los Angeles Times.
- ^ Chang, Kenneth (20 September 2005). "Gordon Gould, 85, Figure in Invention of the Laser, Dies". The New York Times. ISSN 0362-4331. Archived from the original on 19 September 2017. Retrieved 3 February 2025.
- ^ Carroll, Jim (12 December 2024). "Patrick Nettles Steps Down as Executive Chair of Ciena". Converge Digest. Archived from the original on 14 February 2025. Retrieved 3 February 2025.
- ^ US5696615A, Alexander, Stephen B., "Wavelength division multiplexed optical communication systems employing uniform gain optical amplifiers", issued 9 December 1997
- ^ Hecht, Jeff (2004). City of light: the story of fiber optics. The Sloan technology series (Rev. and expanded ed., 1st paperback ed.). Oxford: Oxford Univ. Press. ISBN 978-0-19-510818-7.
- ^ "Optica Publishing Group". opg.optica.org. Archived from the original on 26 January 2025. Retrieved 3 February 2025.
- ^ Wexler, Joanie (25 March 1996). "Sprint boots some users off 'Net". Network World. Vol. 13, no. 13. p. 25. ProQuest 215944575.
- ^ Vamosi, Robert (14 April 2008). "Gmail cookie stolen via Google Spreadsheets". News.cnet.com. Archived from the original on 9 December 2013. Retrieved 19 October 2017.
- ^ "What about the "EU Cookie Directive"?". WebCookies.org. 2013. Archived from the original on 11 October 2017. Retrieved 19 October 2017.
- ^ "New net rules set to make cookies crumble". BBC. 8 March 2011. Archived from the original on 10 August 2018. Retrieved 18 February 2019.
- ^ "Sen. Rockefeller: Get Ready for a Real Do-Not-Track Bill for Online Advertising". Adage.com. 6 May 2011. Archived from the original on 24 August 2011. Retrieved 18 February 2019.
- ^ Horn, Jann. "Want to use my wifi?". Archived from the original on 4 January 2018. Retrieved 5 January 2018.
- ^ Nguyen, Jennimai (10 September 2020). "Archie, the very first search engine, was released 30 years ago today". Mashable. Retrieved 4 February 2025.
- ^ "What is File Transfer Protocol (FTP) meaning". Fortinet. Archived from the original on 26 January 2025. Retrieved 4 February 2025.
- ^ "Britannica Money". www.britannica.com. 4 February 2025. Archived from the original on 27 July 2024. Retrieved 4 February 2025.
- ^ Clark, Andrew (1 February 2008). "How Jerry's guide to the world wide web became Yahoo". The Guardian. ISSN 0261-3077. Archived from the original on 5 October 2013. Retrieved 4 February 2025.
- ^ Hamilton, Nigel (13 May 2024). "The Mechanics of a Deep Net Metasearch Engine". IADIS Digital Library: 1034–1036. ISBN 978-972-98947-0-1. Archived from the original on 31 May 2023. Retrieved 6 May 2024.
- ^ Devine, Jane; Egger-Sider, Francine (July 2004). "Beyond google: the invisible web in the academic library". The Journal of Academic Librarianship. 30 (4): 265–269. doi:10.1016/j.acalib.2004.04.010.
- ^ Raghavan, Sriram; Garcia-Molina, Hector (11–14 September 2001). "Crawling the Hidden Web". 27th International Conference on Very Large Data Bases. Archived from the original on 17 August 2019. Retrieved 18 February 2019.
- ^ "Surface Web". Computer Hope. Archived from the original on 5 May 2020. Retrieved 20 June 2018.
- ^ Wright, Alex (22 February 2009). "Exploring a 'Deep Web' That Google Can't Grasp". The New York Times. Archived from the original on 1 March 2020. Retrieved 23 February 2009.
- ^ Madhavan, J.; Ko, D.; Kot, Ł.; Ganapathy, V.; Rasmussen, A.; Halevy, A. (2008). "Google's deep web crawl". Proceedings of the VLDB Endowment. 1 (2): 1241–52.
- ^ Shedden, Sam (8 June 2014). "How Do You Want Me to Do It? Does It Have to Look like an Accident? – an Assassin Selling a Hit on the Net; Revealed Inside the Deep Web". Sunday Mail. Archived from the original on 1 March 2020. Retrieved 5 May 2017.
- ^ a b Ben-Itzhak, Yuval (18 April 2008). "Infosecurity 2008 – New defence strategy in battle against e-crime". ComputerWeekly. Reed Business Information. Archived from the original on 4 June 2008. Retrieved 20 April 2008.
- ^ Christey, Steve & Martin, Robert A. (22 May 2007). "Vulnerability Type Distributions in CVE (version 1.1)". MITRE Corporation. Archived from the original on 17 March 2013. Retrieved 7 June 2008.
- ^ "Symantec Internet Security Threat Report: Trends for July–December 2007 (Executive Summary)" (PDF). Symantec Internet Security Threat Report. XIII. Symantec Corp. April 2008. pp. 1–2. Archived from the original (PDF) on 25 June 2008. Retrieved 11 May 2008.
- ^ "Google searches web's dark side". BBC News. 11 May 2007. Archived from the original on 7 March 2008. Retrieved 26 April 2008.
- ^ "Security Threat Report (Q1 2008)" (PDF). Sophos. Archived (PDF) from the original on 31 December 2013. Retrieved 24 April 2008.
- ^ "Security threat report" (PDF). Sophos. July 2008. Archived (PDF) from the original on 31 December 2013. Retrieved 24 August 2008.
- ^ Jeremiah Grossman; Robert "RSnake" Hansen; Petko "pdp" D. Petkov; Anton Rager; Seth Fogie (2007). Cross Site Scripting Attacks: XSS Exploits and Defense (PDF). Syngress, Elsevier Science & Technology. pp. 68–69, 127. ISBN 978-1-59749-154-9. Archived (PDF) from the original on 15 November 2024. Retrieved 23 January 2025.
- ^ O'Reilly, Tim (30 September 2005). "What Is Web 2.0". O'Reilly Media. pp. 4–5. Archived from the original on 28 June 2012. Retrieved 4 June 2008. and AJAX web applications can introduce security vulnerabilities like "client-side security controls, increased attack surfaces, and new possibilities for Cross-Site Scripting (XSS)", in Ritchie, Paul (March 2007). "The security risks of AJAX/web 2.0 applications" (PDF). Infosecurity. Archived from the original (PDF) on 25 June 2008. Retrieved 6 June 2008. which cites Hayre, Jaswinder S. & Kelath, Jayasankar (22 June 2006). "Ajax Security Basics". SecurityFocus. Archived from the original on 15 May 2008. Retrieved 6 June 2008.
- ^ Berinato, Scott (1 January 2007). "Software Vulnerability Disclosure: The Chilling Effect". CSO. CXO Media. p. 7. Archived from the original on 18 April 2008. Retrieved 7 June 2008.
- ^ "2012 Global Losses From phishing Estimated At $1.5 Bn". FirstPost. 20 February 2013. Archived from the original on 21 December 2014. Retrieved 25 January 2019.
- ^ Prince, Brian (9 April 2008). "McAfee Governance, Risk and Compliance Business Unit". eWEEK. Ziff Davis Enterprise Holdings. Archived from the original on 21 April 2024. Retrieved 25 April 2008.
- ^ Preston, Rob (12 April 2008). "Down To Business: It's Past Time To Elevate The Infosec Conversation". InformationWeek. United Business Media. Archived from the original on 14 April 2008. Retrieved 25 April 2008.
- ^ Claburn, Thomas (6 February 2007). "RSA's Coviello Predicts Security Consolidation". InformationWeek. United Business Media. Archived from the original on 7 February 2009. Retrieved 25 April 2008.
- ^ Duffy Marsan, Carolyn (9 April 2008). "How the iPhone is killing the 'Net". Network World. IDG. Archived from the original on 14 April 2008. Retrieved 17 April 2008.
- ^ boyd, danah; Hargittai, Eszter (July 2010). "Facebook privacy settings: Who cares?". First Monday. 15 (8). doi:10.5210/fm.v15i8.3086.
- ^ "W3C Technical Reports and Publications". W3C. Archived from the original on 15 July 2018. Retrieved 19 January 2009.
- ^ "IETF RFC page". IETF. Archived from the original on 2 February 2009. Retrieved 19 January 2009.
- ^ "Search for World Wide Web in ISO standards". ISO. Retrieved 24 June 2025.
- ^ "Ecma formal publications". Ecma. Archived from the original on 27 December 2017. Retrieved 19 January 2009.
- ^ "Unicode Technical Reports". Unicode Consortium. Archived from the original on 2 January 2022. Retrieved 19 January 2009.
- ^ "IANA home page". IANA. Archived from the original on 24 February 2011. Retrieved 19 January 2009.
- ^ Sikos, Leslie (2011). Web standards – Mastering HTML5, CSS3, and XML. Apress. ISBN 978-1-4302-4041-9. Archived from the original on 2 April 2015. Retrieved 12 March 2019.
- ^ a b c "Web Accessibility Initiative (WAI)". World Wide Web Consortium. Archived from the original on 2 April 2009. Retrieved 7 April 2009.
- ^ "Developing a Web Accessibility Business Case for Your Organization: Overview". World Wide Web Consortium. Archived from the original on 14 April 2009. Retrieved 7 April 2009.
- ^ "Legal and Policy Factors in Developing a Web Accessibility Business Case for Your Organization". World Wide Web Consortium. Archived from the original on 5 April 2009. Retrieved 7 April 2009.
- ^ "Web Content Accessibility Guidelines (WCAG) Overview". World Wide Web Consortium. Archived from the original on 1 April 2009. Retrieved 7 April 2009.
- ^ "Internationalization (I18n) Activity". World Wide Web Consortium. Archived from the original on 16 April 2009. Retrieved 10 April 2009.
- ^ Davis, Mark (5 April 2008). "Moving to Unicode 5.1". Archived from the original on 21 May 2009. Retrieved 10 April 2009.
- ^ "World Wide Web Consortium Supports the IETF URI Standard and IRI Proposed Standard" (Press release). World Wide Web Consortium. 26 January 2005. Archived from the original on 7 February 2009. Retrieved 10 April 2009.
Further reading
- Berners-Lee, Tim; Bray, Tim; Connolly, Dan; Cotton, Paul; Fielding, Roy; Jeckle, Mario; Lilley, Chris; Mendelsohn, Noah; Orchard, David; Walsh, Norman; Williams, Stuart (15 December 2004). "Architecture of the World Wide Web, Volume One". W3C. Version 20041215.
- Berners-Lee, Tim (August 1996). "The World Wide Web: Past, Present and Future". W3C.
- Brügger, Niels, ed. Web25: Histories from the first 25 years of the World Wide Web (Peter Lang, 2017).
- Fielding, R.; Gettys, J.; Mogul, J.; Frystyk, H.; Masinter, L.; Leach, P.; Berners-Lee, T. (June 1999). "Hypertext Transfer Protocol – HTTP/1.1". Request For Comments 2616. Information Sciences Institute.
- Niels Brügger, ed. Web History (2010) 362 pages; Historical perspective on the World Wide Web, including issues of culture, content, and preservation.
- Polo, Luciano (2003). "World Wide Web Technology Architecture: A Conceptual Analysis". New Devices.
- Skau, H.O. (March 1990). "The World Wide Web and Health Information". New Devices.
External links
- The first website
- Early archive of the first Web site
- Internet Statistics: Growth and Usage of the Web and the Internet
- Living Internet A comprehensive history of the Internet, including the World Wide Web
- World Wide Web Consortium (W3C)
- W3C Recommendations Reduce "World Wide Wait"
- World Wide Web Size Daily estimated size of the World Wide Web
- Antonio A. Casilli, Some Elements for a Sociology of Online Interactions
- The Erdős Webgraph Server[usurped] offers weekly updated graph representation of a constantly increasing fraction of the WWW
- The 25th Anniversary of the World Wide Web Archived 11 July 2021 at the Wayback Machine is an animated video produced by USAID and TechChange which explores the role of the WWW in addressing extreme poverty
History
Invention by Tim Berners-Lee
Tim Berners-Lee, a British computer scientist employed at CERN, the European Organization for Nuclear Research, proposed the World Wide Web in March 1989 as a distributed hypertext system to facilitate information sharing among physicists across heterogeneous computers and networks.[3] The initial document, titled "Information Management: A Proposal," described a scheme for linking documents via hyperlinks, enabling efficient management of project-related data without reliance on centralized databases.[4] Berners-Lee's supervisor approved the project as a low-risk experiment, describing the proposal as vague but promising.[5] A revised proposal in May 1990 incorporated collaboration with CERN colleague Robert Cailliau, emphasizing universal document access through a graphical user interface and network protocols.[6]
By late 1990, Berners-Lee had implemented the core components on a NeXT computer: the first web server software, named httpd; the first web browser and editor, named WorldWideWeb.app; and the foundational standards, including Hypertext Markup Language (HTML) for document structure, Hypertext Transfer Protocol (HTTP) for data transfer, and Uniform Resource Identifiers (URIs) for addressing resources.[7] These elements formed a client-server architecture in which documents could be linked and retrieved over the existing Internet.[1] The system became operational at CERN by December 1990, with the first webpage, a basic description of the project itself, served from the address http://info.cern.ch.
On August 6, 1991, Berners-Lee publicly announced the World Wide Web in a post to the alt.hypertext Usenet newsgroup, releasing the source code for the browser, server, and protocols to encourage adoption and contributions from the research community.[8] This open dissemination marked the transition from internal prototype to a tool available for global experimentation, predating CERN's release of the software into the public domain in 1993.[9]
Early Implementation and Standardization
Tim Berners-Lee implemented the first World Wide Web server, known as httpd, and the first web browser, named WorldWideWeb (later renamed Nexus), on a NeXT computer at CERN by the end of 1990.[10] This implementation enabled the first communication between an HTTP daemon and a browser, the first successful demonstration of hypertext document retrieval over the Internet, on December 20, 1990.[11] The browser functioned as both viewer and editor, allowing users to create and link hypertext documents in HTML, a simple formatting system Berners-Lee based on the Standard Generalized Markup Language (SGML).[10] In May 1991, Berners-Lee released the World Wide Web software, including the server, browser, and line-mode browser, to CERN colleagues and the broader Internet community via anonymous FTP and Usenet newsgroups, facilitating early adoption and experimentation.[10]
The inaugural public website, hosted at http://info.cern.ch, went live on August 6, 1991, providing an overview of the Web project, setup instructions, and search capabilities for existing documents.[9] This site served as both a demonstration and an entry point, explaining the Web's hypertext-based information sharing across distributed computers.[10] Early implementations were rudimentary, supporting only HTTP/0.9: simple GET requests without headers or status codes, a minimalism intended to encourage rapid prototyping and interoperability.[12]
Standardization began informally with Berners-Lee's publication of initial specifications for HTTP, HTML, and URIs in 1991–1993, distributed through Internet Engineering Task Force (IETF) drafts and CERN documents to promote consistent implementation.[13] These early documents described HTML as a tag-based language for structuring content and HTTP as a stateless request-response protocol, but neither had been formally ratified.[13] To address growing fragmentation from proprietary extensions in emerging browsers, Berners-Lee founded the World Wide Web Consortium (W3C) in October 1994 at the Massachusetts Institute of Technology's Laboratory for Computer Science, in collaboration with CERN; INRIA (1995) and Keio University (1996) later joined as host institutions.[14] The W3C develops open, royalty-free standards through collaborative working groups, publishing "Recommendations" that guide implementations without legal enforcement; its early focus was on core technologies such as HTML, Cascading Style Sheets (CSS), and later XML.[14] In November 1995, the IETF published HTML 2.0 as RFC 1866, the first version intended as a stable reference for conformance, which incorporated features from Berners-Lee's prototypes while resolving ambiguities in forms and anchors.[15] HTTP/1.0 followed in May 1996 as RFC 1945, formalizing the POST method, headers, and basic authentication and reflecting lessons from early deployments.[12] These milestones established foundational interoperability and enabled the Web's transition from experimental tool to scalable system, although browser vendors continued to diverge from the specifications, prompting ongoing W3C refinement.[14]
Commercialization and Mass Adoption
CERN's release of the World Wide Web software into the public domain on April 30, 1993, removed proprietary barriers and let commercial entities freely implement and extend the technology, a pivotal step toward widespread commercialization.[16][17] The decision contrasted with earlier proprietary systems and eased the integration of web protocols into business applications, since developers and companies could build on the HTTP, HTML, and URI standards without licensing restrictions.[18]
Graphical web browsers accelerated adoption by making the Web accessible to non-technical users. The Mosaic browser, released in 1993 by the National Center for Supercomputing Applications, introduced inline images and intuitive navigation, inspiring commercial spin-offs.[19] Netscape Communications, founded in April 1994 by Jim Clark and Marc Andreessen together with other members of the Mosaic team, launched Netscape Navigator later that year; its support for multimedia, forms, and faster rendering drove rapid uptake, and the company's market capitalization exceeded $2 billion after its August 1995 IPO.[20] These browsers shifted the Web from text-based academic tools to visually engaging platforms, prompted businesses and organizations to launch public-facing websites from 1994 onward, and intensified competition in the "browser wars."[19]
Commercialization fully materialized with the decommissioning of NSFNET on April 30, 1995, which ended federal restrictions on commercial traffic over the Internet backbone and handed it over to private providers.[21][22] NSFNET's policies had prohibited direct commercial use to preserve its research focus, but growing business demand prompted privatization through network access points and commercial backbones operated by firms such as MCI and Sprint.[23] This shift let Internet service providers offer paid dial-up and dedicated connections, lowering barriers for enterprises and consumers.
Mass adoption followed, fueled by affordable personal computers, expanding dial-up services from providers like AOL, and the dot-com era's influx of e-commerce sites. The starting base was small: by late 1993 just over 500 web servers existed, and Web traffic accounted for about 1% of total Internet traffic, a modest share that expanded rapidly once commercial incentives took hold.[16] Global Internet users, most of whom accessed the Web, grew from approximately 16 million in 1995 to 36 million in 1996, 70 million in 1997, and 147 million in 1998.[24] This period saw the Web evolve from a niche research tool into a core driver of economic activity, with businesses using it for advertising, online retail, and information dissemination despite early limitations in bandwidth and security.[25]
Key Milestones in Expansion
The release of the World Wide Web software into the public domain by CERN on April 30, 1993, enabled rapid adoption by developers and institutions worldwide, moving the Web beyond restricted academic use.[17] This decision, combined with the National Science Foundation's removal of commercial restrictions on Internet backbone use by 1995, set the stage for the dot-com boom, in which venture capital funded thousands of web-based startups and expanded infrastructure and content creation.[18] By 2000, global Internet users, most of them accessing the Web, numbered approximately 413 million, reflecting exponential growth driven by improved browsers and dial-up connectivity.[26]
The early 2000s marked the shift to Web 2.0 paradigms emphasizing user-generated content and interactivity, which sharply increased engagement and the number of sites. Key launches included Wikipedia in 2001, which amassed over 20,000 articles in its first year, democratizing information dissemination; MySpace and WordPress in 2003, which enabled social networking and easy blogging; and YouTube in 2005, which popularized video sharing and added to bandwidth demand.[27] These platforms coincided with growth to over 1 billion users worldwide by 2005, as broadband overtook dial-up in regions such as the United States and enabled richer media experiences.[27][24]
Mobile integration accelerated expansion in the late 2000s: the iPhone's 2007 debut introduced touch-based browsing and an app ecosystem that blurred the line between native apps and web content. By 2010, Internet users exceeded 1.9 billion, with smartphones and affordable data plans driving access in developing regions.[26] Social media platforms such as Facebook, launched in 2004, further entrenched daily web use, reaching billions of users by the 2010s and fostering real-time global connectivity, while raising concerns over data centralization.
Website counts surged correspondingly, from tens of millions in 2000 to over 850 million active sites by 2013, reflecting infrastructure scaling through cloud hosting and content management systems.[28][27]
Technical Architecture
Core Protocols and Components
The World Wide Web operates through a foundational set of protocols and components that support the distributed retrieval and display of hypermedia documents. Central to this architecture are the Hypertext Transfer Protocol (HTTP) for communication, Hypertext Markup Language (HTML) for document structure, and Uniform Resource Identifiers (URIs) for resource identification. All three were principally authored by Tim Berners-Lee at CERN, emerging from his 1989 proposal and initial implementations between 1989 and 1991.[12]
HTTP is a stateless, request-response protocol at the application layer: web clients such as browsers request resources from servers and receive responses containing data such as HTML files or images. Its earliest informal specification, HTTP/0.9, appeared in 1991 without headers or status codes and supported only GET requests for simple document retrieval. Formal documentation followed with HTTP/1.0 in RFC 1945 (May 1996), which added headers for metadata such as content type and basic caching, and HTTP/1.1, first specified in RFC 2068 (January 1997) and revised in RFC 2616 (June 1999), which added persistent connections, chunked transfer encoding, and improved error handling to make more efficient use of the underlying TCP/IP transport.[12][29][30]
HTML defines the semantic structure of web content using markup tags enclosed in angle brackets, such as <p> for paragraphs and <a> for hyperlinks, which browsers parse to render text, images, and interactive elements. Introduced alongside HTTP in 1991 and derived from SGML, it was standardized to make web page composition interoperable, with versions such as HTML 2.0 (1995) formalizing the core tags and attributes. HTML can also embed multimedia and scripts, though its primary function remains delineating document hierarchy and content semantics.[31][32]
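The HTTP version differences described above are easiest to see in the raw request text. The sketch below assembles a GET request as HTTP/0.9, 1.0, and 1.1 would put it on the wire; the helper name `build_request` and the host `example.com` are hypothetical, and real clients send more headers than this.

```python
def build_request(path: str, version: str, host: str = "example.com") -> bytes:
    """Return the raw bytes of a GET request for the given HTTP version.

    Illustrative sketch only; `build_request` is a hypothetical helper.
    """
    if version == "0.9":
        # HTTP/0.9: a single request line, no version token, no headers.
        return f"GET {path}\r\n".encode("ascii")
    if version not in ("1.0", "1.1"):
        raise ValueError(f"unsupported version: {version}")
    lines = [f"GET {path} HTTP/{version}"]
    if version == "1.1":
        # HTTP/1.1 requires a Host header so one server can host many sites.
        lines.append(f"Host: {host}")
    # Headers (first documented in HTTP/1.0) carry metadata such as accepted media types.
    lines.append("Accept: text/html")
    # An empty line terminates the header block.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


for v in ("0.9", "1.0", "1.1"):
    print(v, build_request("/hypertext/WWW/TheProject.html", v))
```

The 0.9 form is a single line, while the later forms add a version token and a header block ending in a blank line.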
URIs provide a standardized syntax for naming and locating web resources, consisting of a scheme (e.g., "http"), an authority (host and optional port), a path, and optional query and fragment components. URLs, the subset of URIs that specify a network location, let hyperlinks reference remote documents with strings such as "http://example.com/path", supporting the Web's navigable hyperlink model. URI syntax was defined in RFC 2396 (August 1998), later superseded by RFC 3986 (January 2005); it provides scheme-agnostic identification and underpins HTTP requests by mapping names to retrievable addresses.[33][34]
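Python's standard `urllib.parse.urlsplit` function splits a URL into the components listed above. The URL here is a made-up example chosen so that every component is present.

```python
from urllib.parse import urlsplit

# Hypothetical URL exercising scheme, authority (host and port), path, query, and fragment.
url = "http://example.com:8080/docs/page.html?lang=en#history"
parts = urlsplit(url)

print(parts.scheme)    # http
print(parts.netloc)    # example.com:8080  (the authority)
print(parts.hostname)  # example.com
print(parts.port)      # 8080
print(parts.path)      # /docs/page.html
print(parts.query)     # lang=en
print(parts.fragment)  # history
```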
These components interoperate as follows: a client sends an HTTP GET request to the server identified by a URI, and the server responds with HTML-formatted data that the client renders locally, forming the Web's client-server exchange paradigm. Later extensions such as HTTPS (HTTP over an encrypted channel, first offered by Netscape in 1994 using its SSL protocol and now using TLS) address security, but the original triad of HTTP, HTML, and URIs remains the core that enables global hypertext linkage.[35][36]
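On the client side, the first step after the server replies is splitting the response into its status line, headers, and body. The sketch below does this for a plain HTTP/1.x response held in memory; `parse_response` is a hypothetical helper that deliberately ignores chunked transfer encoding, repeated headers, and the binary framing of HTTP/2 and later.

```python
def parse_response(raw: bytes):
    """Split a raw HTTP/1.x response into (version, code, reason, headers, body).

    Minimal sketch: ignores chunked encoding, repeated headers, and HTTP/2+ framing.
    """
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    # Status line: HTTP-version SP status-code SP reason-phrase (reason may be empty).
    parts = lines[0].split(" ", 2)
    version, code = parts[0], int(parts[1])
    reason = parts[2] if len(parts) > 2 else ""
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so normalize them to lowercase.
        headers[name.strip().lower()] = value.strip()
    return version, code, reason, headers, body


raw = (b"HTTP/1.1 200 OK\r\n"
       b"Content-Type: text/html; charset=utf-8\r\n"
       b"Content-Length: 28\r\n"
       b"\r\n"
       b"<p>Hello from the server</p>")
version, code, reason, headers, body = parse_response(raw)
print(code, reason, headers["content-type"])  # 200 OK text/html; charset=utf-8
```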
Hypertext and Linking Mechanisms
Hypertext is a body of interconnected text in which embedded references, or hyperlinks, let users reach related content non-sequentially rather than reading linearly. The term "hypertext" was coined by Theodor Holm Nelson in 1965 in connection with his Project Xanadu, drawing on earlier ideas such as Vannevar Bush's 1945 Memex, which envisioned associative trails through microfilm-based information repositories.[37][38] This shift enabled rapid, user-directed exploration of information, in contrast to traditional bound documents.
In the World Wide Web, hypertext is the core navigational substrate, combined with Internet protocols to form a distributed, global repository. Tim Berners-Lee proposed this application in his March 1989 CERN memorandum, "Information Management: A Proposal," advocating a hypertext-based system to unify disparate scientific data across networked computers without proprietary formats.[4][16] By 1990, Berners-Lee had implemented the first hypertext browser and server, using HTML to encode links within documents and thereby enabling traversal of resources identified by URIs.[5]
Web linking relies on HTML anchor elements (<a> tags) to mark hyperlinks, with the href attribute specifying a URI, typically a Uniform Resource Locator (URL), as the target. A URL identifies a resource and also describes where to retrieve it, through components such as the scheme (e.g., https://), authority (host and port), path, query parameters, and a fragment identifier for jumps within a document.[39][40] Relative URLs are resolved against the base URL of the current document, avoiding repetition for resources on the same site, while absolute URLs give the full address and are used for cross-site navigation; when a link is followed, the resolved URL leads to a DNS lookup of the host and an HTTP request.[41]
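The anchor-and-resolve mechanism can be sketched with Python's standard library: `html.parser.HTMLParser` finds the `href` of each `<a>` element, and `urllib.parse.urljoin` resolves relative references against the page's base URL. The class name `LinkCollector`, the sample markup, and the base URL are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href targets of <a> elements, resolved against a base URL."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Relative references are resolved against the document's base URL;
                # absolute URLs pass through unchanged.
                self.links.append(urljoin(self.base_url, value))


page = ('<p>See the <a href="intro.html">introduction</a>, the '
        '<a href="../faq.html#q1">FAQ</a> and the '
        '<a href="https://www.w3.org/">W3C</a>.</p>')
collector = LinkCollector("https://example.com/docs/guide/index.html")
collector.feed(page)
print(collector.links)
```

The `../faq.html#q1` link climbs one directory from the base path and keeps its fragment identifier, which the browser uses to jump within the fetched page.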
When a hyperlink is followed, the user agent parses the URL, initiates an HTTP or HTTPS transaction with the destination server, and renders the fetched content, often HTML, in place of or alongside the current document.[42] Web links are one-directional: the target resource holds no record of the pages that link to it. Early implementations supported static links to text or images; later standards added attributes such as rel for semantic relations (e.g., nofollow, which asks search engines not to credit a link when crawling) and target for window behavior, improving usability without changing the underlying URI-based resolution.[43] The universality of this mechanism stems from its reliance on open standards, which fostered the Web's exponential growth from 10 hosted websites in 1993 to over 1.1 billion websites by 2023.[44]
Client-Server Model and Rendering
The World Wide Web relies on a client-server architecture in which clients, typically web browsers, request resources from servers that host and deliver web content.[45] This model divides the work: clients handle the user interface and rendering, while servers manage data storage, processing, and response generation.[46] Clients and servers communicate via HTTP, a stateless application-layer protocol traditionally carried over TCP (HTTP/3 uses QUIC instead), which structures every interaction as a client request followed by a server response.[35] HTTP/1.1, standardized in RFC 2616 in June 1999, made persistent connections the default, reducing latency by allowing multiple requests over a single TCP connection rather than the one-connection-per-request pattern typical of HTTP/1.0 (1996).[47]
In a standard HTTP exchange, the client sends a request line specifying the method (e.g., GET for retrieval or POST for submission), the target URI, and the HTTP version, followed by headers carrying metadata such as content type or authorization, and an optional body for data such as form inputs.[48] The server, typically listening on port 80 for HTTP or 443 for HTTPS, parses the request, authenticates the client if required, executes any server-side logic (e.g., querying a database or running scripts), and returns a response consisting of a three-digit status code (e.g., 200 for success, 404 for not found), headers indicating content length or type, and a body typically containing HTML markup, images, or other media.[49] Because each request is independent and the protocol keeps no memory of earlier interactions, servers can scale horizontally, with load balancers spreading thousands of concurrent requests across many instances; the trade-off is that mechanisms such as cookies or server-side sessions are needed to maintain user state across requests.
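The server's half of this exchange can be sketched with Python's standard WSGI interface (PEP 3333): the server framework parses the request into an `environ` dictionary, and the application reports a status and headers through `start_response` and returns the body. The routes `/about` (fixed content) and `/hello` (assembled per request from the query string) are hypothetical, and the `simulate` helper calls the application directly instead of opening a network socket.

```python
from html import escape
from urllib.parse import parse_qs
from wsgiref.util import setup_testing_defaults

# Hypothetical fixed content, served identically on every request.
STATIC_PAGES = {"/about": b"<h1>About</h1><p>Unchanging content.</p>"}


def app(environ, start_response):
    """A minimal WSGI application: map a request to status, headers, and body."""
    path = environ.get("PATH_INFO", "/")
    if path in STATIC_PAGES:
        body = STATIC_PAGES[path]
    elif path == "/hello":
        # Content assembled per request from the query string.
        query = parse_qs(environ.get("QUERY_STRING", ""))
        name = query.get("name", ["visitor"])[0]
        body = f"<h1>Hello, {escape(name)}</h1>".encode("utf-8")
    else:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"Not Found"]
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8"),
                              ("Content-Length", str(len(body)))])
    return [body]


def simulate(path, query=""):
    """Call the app directly with a synthetic request environment (no network)."""
    environ = {"PATH_INFO": path, "QUERY_STRING": query}
    setup_testing_defaults(environ)  # fills in the remaining required WSGI keys
    captured = {}

    def start_response(status, headers):
        captured["status"] = status
        captured["headers"] = headers

    body = b"".join(app(environ, start_response))
    return captured["status"], body


print(simulate("/hello", "name=Ada"))  # ('200 OK', b'<h1>Hello, Ada</h1>')
```

Escaping the user-supplied name before inserting it into HTML is what keeps this dynamic response from becoming a cross-site scripting vector.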
Rendering begins once the client receives the server's response and is carried out mainly by the browser's rendering engine, which converts raw bytes into a visual, interactive page.[50] The engine first tokenizes the HTML byte stream and builds the Document Object Model (DOM), a tree representation of the page's structure, while speculatively prefetching linked resources such as CSS stylesheets or JavaScript files referenced in the document.[51] Parsing CSS yields the CSS Object Model (CSSOM), a tree of styling rules, and JavaScript executed by the script engine (e.g., V8 in Chromium-based browsers) may alter the DOM through APIs, potentially triggering reflows or repaints.[52]
The browser then combines the DOM and CSSOM into a render tree, omitting non-visual elements such as <head> and nodes styled display: none, and performs layout (reflow) to compute each element's position and size from the viewport dimensions, using layout models such as CSS Flexbox or Grid, which the W3C has specified from 2012 onward.[50] Painting follows: the render tree is rasterized into layers of pixels for text, borders, and images, and modern engines composite these layers with hardware acceleration (introduced prominently in WebKit around 2009) so that transformations can be applied without a full repaint.[52] This critical rendering path can complete in milliseconds on capable hardware but varies with page complexity (a 2023 study reported average first paint times of 1.5 seconds for desktop sites); engines prioritize above-the-fold content for progressive display, although blocking resources such as synchronous JavaScript can delay it.[51] Engines also differ in architecture: Blink (Chrome, Edge), forked from WebKit in 2013, runs inside Chromium's multi-process model for stability and isolation, while Gecko (Firefox) couples JavaScript and DOM handling more tightly for responsiveness.[53]
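The render-tree step can be illustrated with a toy model: walk the DOM, drop non-visual elements, and drop any subtree whose computed style is `display: none`. This is a simplified sketch, not how any engine is implemented; the `Node` class is hypothetical, and the "CSSOM" here is just a dictionary keyed by tag name, whereas real engines match selectors, apply the cascade, and inherit values.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    tag: str
    children: List["Node"] = field(default_factory=list)


# Elements that never generate boxes on screen.
NON_VISUAL = {"head", "title", "meta", "link", "script", "style"}


def build_render_tree(node: Node, styles: Dict[str, Dict[str, str]]) -> Optional[Node]:
    """Return a render-tree copy of `node`, or None if it is not rendered.

    `styles` is a toy CSSOM keyed by tag name only (a simplifying assumption).
    """
    if node.tag in NON_VISUAL or styles.get(node.tag, {}).get("display") == "none":
        return None  # display: none removes the element and its whole subtree
    kept = []
    for child in node.children:
        rendered = build_render_tree(child, styles)
        if rendered is not None:
            kept.append(rendered)
    return Node(node.tag, kept)


dom = Node("html", [
    Node("head", [Node("title")]),
    Node("body", [Node("h1"), Node("p"), Node("aside", [Node("p")])]),
])
cssom = {"aside": {"display": "none"}}
render_tree = build_render_tree(dom, cssom)
print(render_tree)
```

Layout and painting would then operate only on the surviving `html > body > (h1, p)` nodes.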
Content Delivery and Optimization
Content delivery on the World Wide Web occurs primarily through HTTP, a request-response protocol that lets clients such as web browsers retrieve resources like HTML pages, images, stylesheets, and scripts from remote servers over the Internet protocol suite. HTTP is stateless: each request is independent unless extension mechanisms maintain session state, which makes hypermedia content easy to distribute at scale. Optimization of content delivery aims to minimize latency, reduce bandwidth consumption, and improve reliability amid growing global traffic volumes, which exceeded 3.7 zettabytes annually by 2017 according to industry estimates.[54]
Key techniques include protocol enhancements in successive HTTP versions. HTTP/1.1, standardized in RFC 2616 (1999), introduced persistent connections that reuse a TCP connection for multiple requests, eliminating repeated TCP handshakes.[55] HTTP/2, deployed widely from 2015 via RFC 7540, added binary framing, multiplexing of requests over a single connection, and header compression using HPACK, which together reduced page load times by 15–30% in benchmarks on resource-heavy sites.[56] HTTP/3, which runs over the QUIC transport (RFC 9000, 2021), adds features such as 0-RTT connection resumption and connection migration across network changes, making it resilient on mobile and lossy networks, with latency reductions of up to 20% over HTTP/2 in real-world tests.[57]
Compression reduces payload size before transmission: HTTP supports content-encoding with algorithms such as gzip (DEFLATE-based, typically shrinking text by 60–80%) and Brotli (20–26% better compression ratios than gzip for web content).[58] Clients advertise the algorithms they support in the Accept-Encoding request header, and servers compress selectively, applying it to compressible resources such as HTML, CSS, and JavaScript while skipping already-compressed media such as JPEG images; modern browsers decompress these payloads transparently.[58]
Content delivery networks (CDNs) distribute content through edge servers deployed worldwide, caching static assets close to users to relieve origin servers and reduce geographic latency; for instance, a CDN can cut round-trip times from 200 ms to under 50 ms for users in Asia accessing U.S.-based content.[54] Emerging in the late 1990s to handle surging traffic during the dot-com era, CDNs use techniques such as DNS-based request routing and anycast addressing to direct each user to a nearby point of presence (PoP), and balance load across thousands of nodes; Cloudflare alone operated in more than 300 cities as of 2023.[59] For dynamic content, CDNs apply origin shielding and route optimization, and in combination with HTTP/2 and later protocols they raise throughput; adoption correlates with 20–50% faster load times for sites serving video or large files, according to HTTP Archive analyses.[60] Together these methods address causes of delay such as propagation distance and server overload, letting the Web scale without changes to its core architecture.[61]
Operational Features
Static and Dynamic Web Pages
Static web pages consist of fixed content stored as files on a web server, such as HTML, CSS, and client-side JavaScript, which are delivered to the browser without any server-side processing or per-request modification.[62] They show identical content to all users regardless of time, location, or user input, which suits unchanging information such as documentation, brochures, or personal portfolios.[63] In the early World Wide Web, launched by Tim Berners-Lee in 1991, all pages were static: pre-authored HTML files served directly by servers such as the first NeXT-based web server at CERN.[64]
Dynamic web pages, in contrast, are generated by the server in response to each request, often incorporating data from databases, user sessions, or other inputs to produce customized output.[65] Generation typically involves server-side scripting languages or interfaces that execute code to assemble HTML, enabling features such as search results, e-commerce transactions, and personalized feeds.[66] The foundational mechanism was the Common Gateway Interface (CGI), developed in 1993 at the National Center for Supercomputing Applications (NCSA) to let web servers invoke external programs, written for example in Perl or C, for requests that go beyond serving static files.[67] Later advances built on CGI, including server-side includes and dedicated scripting languages; PHP, for instance, began in 1994 as a set of CGI binaries that Rasmus Lerdorf wrote to track visitors to his personal homepage, and by 1995 had grown into a general tool for generating dynamic content.[68]
Dynamic pages demand more server resources, since each request may trigger database queries or application logic, and can therefore respond more slowly than static pages, but they offer greater interactivity and flexibility for data-driven applications.[69] Early dynamic sites relied almost entirely on server-side processing, whereas modern approaches increasingly add client-side dynamism through JavaScript frameworks; the core distinction remains whether content is pre-rendered or assembled on demand.[70]
Websites, Servers, and Hosting
A website is a set of interlinked web pages and associated resources, such as images, stylesheets, and scripts, accessible through a unique domain name or IP address over the Internet. Its pages are typically written in HTML, with CSS for presentation and JavaScript for interactivity, and are stored on a server for retrieval on request. As of 2024, approximately 1.13 billion websites exist worldwide, though only about 200 million are actively maintained and updated.[71]
Web servers consist of hardware and software that process HTTP requests from clients, such as browsers, and deliver the corresponding web content. The first operational web server, implemented by Tim Berners-Lee at CERN in 1990 on a NeXT computer, demonstrated the basic client-server exchange of hypertext documents. Today two open-source server programs lead the market: as of late 2024, Nginx held a 33.8% share and Apache 27.6%, reflecting Nginx's efficiency in handling many concurrent connections and Apache's long-standing configurability.[72] These servers run on physical or virtual machines and manage tasks such as request routing, content caching, and error handling to ensure reliable delivery.
Web hosting services provide the infrastructure for storing, serving, and managing websites, encompassing server rental, bandwidth allocation, and administrative support. Commercial hosting emerged in the mid-1990s after the Web's public release and has evolved from basic shared environments to sophisticated cloud-based models.
Primary types include shared hosting, where multiple sites share the resources of a single server for cost efficiency; virtual private servers (VPS), which offer isolated partitions for greater control; dedicated servers, which give exclusive hardware access suited to high-traffic sites; and cloud hosting, which draws on distributed resources from providers such as AWS, a market leader owing to its scalability.[73] By 2023, major providers such as AWS, Azure, and Google Cloud accounted for about 80% of the global cloud market, underscoring the shift toward elastic, on-demand hosting that mitigates single points of failure and supports dynamic scaling.[74] Hosting providers handle operational tasks such as security patching, backups, and uptime guarantees, with data centers worldwide ensuring low-latency access; large operations such as the Wikimedia Foundation run racks of servers together with caching sites that act as a content delivery network (CDN) to distribute load globally.[75] The choice of hosting type depends on traffic volume, security needs, and budget: shared hosting suits small sites, while cloud offerings let enterprises with variable demand scale automatically.
Search Engines and Discovery
Search engines are essential tools for discovering content on the World Wide Web, letting users locate specific information among billions of interconnected pages that would otherwise be impractical to find without systematic navigation aids. By processing queries and retrieving ranked results from vast indexes, they turn the decentralized hypertext structure of the Web into a usable resource, handling over 5 trillion searches annually as of 2025.[76] Their development addressed the core challenge of scale: the Web grew from a few thousand pages in 1993 to more than 1 trillion unique URLs known to major engines by the early 2010s.
The origins of search technology predate the Web's public debut. In 1990, Archie, developed by Alan Emtage at McGill University, became the first search engine, indexing FTP file archives rather than web pages.[77] Once the Web emerged, Aliweb, launched in November 1993, became one of the first web-specific engines, indexing pages that site owners submitted through a form rather than discovering them automatically.[78] Subsequent innovations included WebCrawler in 1994, the first engine to index the full text of the pages its crawler fetched, and AltaVista in 1995, which introduced advanced features such as natural-language queries and Boolean search over millions of pages.[78] Yahoo!, founded in 1994 by David Filo and Jerry Yang as a human-curated directory, evolved into a hybrid search service but prioritized categorization over algorithmic crawling.[78]
Google, established in 1998 by Larry Page and Sergey Brin at Stanford University, marked a pivotal advance through its PageRank algorithm, which scores a page's importance from its inbound hyperlinks, treating each link as a vote of authority weighted by the importance of the linking page, much as academic citations confer standing.[77] This link-based ranking outperformed earlier keyword-density methods, reducing spam and improving result quality, and led to rapid adoption.
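The core idea behind PageRank can be sketched as a power iteration over a link graph: each page repeatedly passes a share of its score to the pages it links to, and a damping factor models a reader who occasionally jumps to a random page. This is the simplified textbook formulation, not Google's production ranking system; the four-page graph, the damping factor of 0.85, and the fixed iteration count are illustrative assumptions.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration.

    `links` maps each page to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives the "random jump" share regardless of its links.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's damped score evenly among its outbound links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outbound links spreads its score over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
scores = pagerank(web)
for page, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```

Page C, linked from three of the four pages, ends up with the highest score, while D, which nothing links to, keeps only the random-jump share.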
By the early 2000s, engines like Google had shifted discovery from manual directories to automated, scalable systems, fundamentally enabling the WWW's mass usability.[79] Modern search engines function via three core stages: crawling, indexing, and ranking. Crawlers (software bots) start from seed URLs and recursively follow hyperlinks to fetch pages, honoring robots.txt directives that mark restricted areas; this process runs continuously so that the engine's map of the web keeps pace with changes.[80] Fetched content is parsed, tokenized, and stored in inverted indexes that map each term to the documents containing it, together with metadata such as titles, anchor text, and page structure for efficient querying.[81] Ranking then applies proprietary algorithms to score results by factors including query relevance, link authority, content freshness, user location, and behavioral signals like click-through rates, with Google's systems weighing hundreds of such variables in milliseconds.[80] As of September 2025, Google commands 90.4% of global search market share, reflecting its refined algorithms and integration into browsers and devices, while Microsoft's Bing holds 4.08% and Russia's Yandex 1.65%.[82] Privacy-focused alternatives like DuckDuckGo, launched in 2008, aggregate results without tracking users and capture about 0.79% share amid growing concerns that data-driven personalization may skew neutral discovery.[83] These engines have amplified the WWW's reach, but their gatekeeping role raises issues: high-ranking pages receive disproportionate traffic, with the first page of results often receiving 90% of clicks, creating feedback loops where visibility reinforces popularity, sometimes at the expense of niche or emerging content.[84] Empirical studies indicate that crawler biases and algorithmic opacity can hinder equitable discovery, underscoring the need for transparent methodologies consistent with the WWW's open ethos.[85]
Caching, Cookies, and State Management
The Hypertext Transfer Protocol (HTTP), foundational to the World Wide Web, is a stateless protocol: each client request to a server is independent, and the server retains no inherent memory of prior interactions. Tim Berners-Lee made this design choice in 1989-1991 to keep distributed hypermedia information systems simple and scalable.[86][12] Statelessness enables efficient, independent exchanges but requires additional mechanisms for applications that need continuity, such as user sessions or personalized content, leading to techniques like embedding state in request headers, URLs, or client-side storage.[87] HTTP cookies, small key-value strings that browsers store and send back in subsequent requests to the same domain, emerged as a core state management tool that simulates persistence over stateless connections. Invented in June 1994 by Lou Montulli, a Netscape engineer, cookies were initially implemented to track visitor history on the Netscape website and to enable features like shopping carts for e-commerce clients, addressing servers' inability to remember user actions between page loads.[88][89] The IETF first standardized cookie handling in RFC 2109 in 1997; its successor RFC 6265 (2011) specifies security attributes such as Secure (transmission over HTTPS only) and HttpOnly (inaccessible to JavaScript, limiting cookie theft through XSS attacks). Browsers typically cap each cookie at about 4 KB and scope cookies to the domain that set them to prevent cross-site leakage. 
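As an illustration of the round trip described above, a minimal sketch using Python's standard `http.cookies` module: the server emits a `Set-Cookie` header carrying an opaque session identifier with the Secure and HttpOnly attributes, then later reads the identifier back from the browser's `Cookie` header. The cookie name `session_id`, the Max-Age value, and the SameSite attribute (a later addition not defined in RFC 6265, supported by this module since Python 3.8) are illustrative choices.

```python
import secrets
from http.cookies import SimpleCookie
from typing import Optional

COOKIE_NAME = "session_id"  # hypothetical cookie name


def new_session_id() -> str:
    """Opaque, unguessable identifier; the session data itself stays on the server."""
    return secrets.token_urlsafe(32)


def make_session_cookie(session_id: str, max_age: int = 3600) -> str:
    """Build a Set-Cookie header line for the given session identifier."""
    cookie = SimpleCookie()
    cookie[COOKIE_NAME] = session_id
    morsel = cookie[COOKIE_NAME]
    morsel["path"] = "/"
    morsel["max-age"] = max_age  # persistent for max_age seconds; omit for a session cookie
    morsel["secure"] = True      # only sent over HTTPS
    morsel["httponly"] = True    # hidden from document.cookie in the browser
    morsel["samesite"] = "Lax"   # withheld from most cross-site subrequests
    return cookie.output(header="Set-Cookie:")


def read_session_id(cookie_header: str) -> Optional[str]:
    """Extract the session identifier from an incoming Cookie request header."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get(COOKIE_NAME)
    return morsel.value if morsel is not None else None


if __name__ == "__main__":
    sid = new_session_id()
    print(make_session_cookie(sid))
    print(read_session_id(f"{COOKIE_NAME}={sid}; theme=dark") == sid)
```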
Cookies facilitate server-side sessions by storing opaque session IDs, which servers map to user data held in databases or in-memory stores like Redis, balancing client-side lightness with server control. However, they introduce privacy risks: third-party cookies (set by domains other than the visited site, typically through ads or embeds) enable cross-site tracking, prompting browser restrictions such as Safari's Intelligent Tracking Prevention (2017) and a phased third-party cookie deprecation that Chrome began testing in 2024 before Google abandoned the plan.[90] Beyond cookies, state management encompasses client-side alternatives for modern single-page applications (SPAs), including URL query parameters for bookmarkable state, hidden form fields for POST submissions, and the Web Storage APIs introduced with HTML5: localStorage (a persistent, origin-bound key-value store typically limited to 5-10 MB) and sessionStorage (cleared when the tab closes), which reduce server round trips without cookie overhead.[91] Server-side sessions, usually with cookies as identifiers, keep sensitive data on the server rather than the client, while token-based approaches like JSON Web Tokens (JWT, standardized in RFC 7519 in 2015) carry signed state in each request, enabling stateless authentication in microservices. The trade-offs pit the simplicity of cookies against localStorage's exposure to client-side tampering and script access and the larger payloads of self-contained tokens; best practice favors transferring minimal state to preserve HTTP's performance. Web caching complements state management by reducing latency for repeated stateless requests, storing response copies at the browser, proxy, or content delivery network (CDN) level so that unchanged resources can be reused without a full fetch from the origin server. 
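To show what such reuse involves, a minimal sketch of a single in-memory cache in Python, assuming a hypothetical `origin` function in place of a real server. A stored response younger than its `Cache-Control: max-age` is served without contacting the origin; a stale one is revalidated with an `If-None-Match` request carrying its ETag, and a `304 Not Modified` answer lets the cache keep its stored body. The hash-based ETag and the toy origin are illustrative, and directives such as `no-store`, `no-cache`, and `Vary` are deliberately ignored.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

# Hypothetical origin server: receives the If-None-Match value (or None)
# and returns (status, body, response headers).
Origin = Callable[[Optional[str]], Tuple[int, bytes, Dict[str, str]]]


def make_etag(body: bytes) -> str:
    """Illustrative strong validator: a quoted, truncated SHA-256 of the body."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def parse_max_age(cache_control: str) -> Optional[int]:
    """Return the max-age directive (seconds) from a Cache-Control value, if any."""
    for directive in cache_control.split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age" and value.isdigit():
            return int(value)
    return None


@dataclass
class Entry:
    body: bytes
    etag: str
    stored_at: float
    max_age: Optional[int]

    def is_fresh(self, now: float) -> bool:
        return self.max_age is not None and now - self.stored_at < self.max_age


def make_origin(body: bytes, cache_control: str) -> Origin:
    """Toy origin that answers 304 when the client's validator still matches."""
    def origin(if_none_match: Optional[str]) -> Tuple[int, bytes, Dict[str, str]]:
        etag = make_etag(body)
        headers = {"ETag": etag, "Cache-Control": cache_control}
        if if_none_match == etag:
            return 304, b"", headers
        return 200, body, headers
    return origin


def cached_get(cache: Dict[str, Entry], url: str, now: float,
               origin: Origin) -> Tuple[bytes, str]:
    """Return (body, outcome), where outcome is 'hit', 'revalidated' or 'fetched'."""
    entry = cache.get(url)
    if entry is not None and entry.is_fresh(now):
        return entry.body, "hit"  # fresh copy: no request to the origin at all
    status, body, headers = origin(entry.etag if entry is not None else None)
    max_age = parse_max_age(headers.get("Cache-Control", ""))
    if status == 304 and entry is not None:
        # Validator matched: keep the stored body and restart its freshness clock.
        entry.stored_at, entry.max_age = now, max_age
        return entry.body, "revalidated"
    cache[url] = Entry(body, headers["ETag"], now, max_age)
    return body, "fetched"
```

With `max-age=60`, a request at t=0 fetches, one at t=30 is a pure cache hit, and one at t=90 costs only a small conditional request whose 304 reply carries no body.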
Formalized in HTTP/1.1 (RFC 2616, June 1999), caching directives such as Cache-Control (e.g., max-age to set a freshness lifetime in seconds, no-cache to require revalidation before reuse) and validators such as ETag and Last-Modified for conditional revalidation enable strategies like long-lived caching of immutable, versioned assets (e.g., style.v1.css), reducing bandwidth by up to 80-90% for static content on high-traffic sites.[92] Browser caches persist across sessions unless entries are evicted under storage quotas (typically 50-250MB per origin) or excluded by directives like no-store, while shared caches such as CDNs (e.g., Akamai, operational since 1998) use geographically distributed edge servers and invalidate content through purge APIs when it is updated.[93] Invalidation remains difficult because proactive purging lags behind dynamic content changes, so sites often combine it with versioned URLs to ensure freshness without over-fetching.[94]
Security Measures
Common Vulnerabilities and Exploits
The World Wide Web's architecture, reliant on HTTP/HTTPS and client-server interactions, exposes systems to vulnerabilities that arise primarily from improper input validation, misconfigurations, and outdated software components. In the OWASP Top 10 for 2021, broken access control ranks as the most serious risk: OWASP reports that 94% of applications were tested for some form of it, and it had more occurrences in the contributed dataset than any other category. Such flaws let attackers act outside intended permissions, for example by accessing unauthorized data or functions. Injection flaws, including SQL injection, form the third-ranked category, in which untrusted data is interpreted as code, potentially leading to data exfiltration or system compromise; SQL injection was, for instance, the entry point in the 2008 Heartland Payment Systems breach, which exposed roughly 130 million payment card numbers. Cross-site scripting (XSS), which OWASP 2021 folds into the injection category, is a widespread client-side vulnerability that lets attackers inject malicious scripts into pages viewed by other users through reflected, stored, or DOM-based vectors; the 2018 British Airways breach, in which attackers planted a card-skimming script on the airline's site, showed how injected scripts can steal payment data, in that case from some 380,000 transactions. Security misconfiguration, the fifth-ranked risk, stems from default settings, incomplete configurations, or exposed error details that facilitate unauthorized access. Vulnerable and outdated components, such as unpatched third-party libraries, pose related risks: the 2017 Equifax breach exploited an Apache Struts vulnerability (CVE-2017-5638) that allowed remote code execution and compromised 147 million personal records because a patch released in March 2017 had not been applied, while the 2014 Heartbleed bug (CVE-2014-0160) in OpenSSL, a library underpinning a large share of HTTPS servers, leaked sensitive server memory and prompted a scramble to reissue certificates. 
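The injection pattern described above, where untrusted data ends up executed as code, can be demonstrated with Python's built-in `sqlite3` module. In this sketch the `users` table, its contents, and the payload string are hypothetical; the unsafe function splices input into the SQL text, while the safe one passes it as a bound parameter.

```python
import sqlite3
from typing import List, Tuple


def setup() -> sqlite3.Connection:
    """Create a throwaway in-memory database with two hypothetical users."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
    conn.executemany("INSERT INTO users VALUES (?, ?)",
                     [("alice", "a-secret"), ("bob", "b-secret")])
    return conn


def find_user_unsafe(conn: sqlite3.Connection, name: str) -> List[Tuple[str, str]]:
    # VULNERABLE: the input becomes part of the SQL statement itself.
    query = f"SELECT name, secret FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()


def find_user_safe(conn: sqlite3.Connection, name: str) -> List[Tuple[str, str]]:
    # Parameterized: the driver binds `name` as a value, never as SQL syntax.
    return conn.execute("SELECT name, secret FROM users WHERE name = ?",
                        (name,)).fetchall()


if __name__ == "__main__":
    conn = setup()
    payload = "x' OR '1'='1"
    print("unsafe:", find_user_unsafe(conn, payload))  # the OR clause matches every row
    print("safe:  ", find_user_safe(conn, payload))    # no user has that literal name
```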
Cross-site request forgery (CSRF) exploits the trust a site places in a user's browser, tricking authenticated users into submitting unauthorized requests; legacy web applications often mitigate it insufficiently. Compromises of trust infrastructure compound these risks: the 2011 breach of Dutch certificate authority DigiNotar led to fraudulently issued certificates being used for man-in-the-middle attacks on web sessions. The Log4Shell vulnerability (CVE-2021-44228) in Log4j, disclosed in December 2021, demonstrated supply-chain risks for web backends, allowing remote code execution and rapid exploitation across millions of servers before patches were deployed. These exploits show how initial flaws enable escalation and how the web's distributed nature amplifies propagation risks in the absence of robust validation and timely updates.[95]
Encryption Protocols and Authentication
The primary encryption protocol for securing communications over the World Wide Web is Transport Layer Security (TLS), which evolved from the Secure Sockets Layer (SSL) protocol that Netscape Communications developed in 1994 to protect HTTP traffic.[96] SSL 2.0 was publicly released in 1995, followed by SSL 3.0 in 1996; because of identified weaknesses, the Internet Engineering Task Force (IETF) standardized TLS 1.0 in 1999 as its successor, renaming and enhancing the protocol to address vulnerabilities like export-grade cipher restrictions and authentication gaps.[97] Subsequent versions, TLS 1.1 (2006), TLS 1.2 (2008), and TLS 1.3 (2018), introduced stronger cipher suites, forward secrecy via ephemeral keys, and reduced handshake latency; TLS 1.3 permits only authenticated encryption (AEAD) cipher suites and builds in protection against downgrade attacks.[98] Hypertext Transfer Protocol Secure (HTTPS) runs HTTP over TLS to encrypt data in transit between web clients and servers, providing confidentiality and integrity for the exchanged data and authenticating the server during the TLS handshake.[99] In the handshake, the client initiates a connection, the server presents a digital certificate containing its public key, and the client verifies the certificate against trusted root authorities while the two sides negotiate symmetric session keys for bulk encryption with algorithms like AES.[100] Public Key Infrastructure (PKI) underpins this authentication through a hierarchy of Certificate Authorities (CAs) that issue and sign X.509 certificates, letting clients validate a server's identity by following the chain of trust back to root certificates pre-installed in browsers.[101] As of 2023, over 90% of web traffic uses HTTPS, driven by browser warnings for unencrypted sites and requirements from standards bodies.[97] Server authentication via TLS certificates verifies the endpoint's identity, preventing man-in-the-middle attacks by binding public keys to domain names through Domain Validation (DV), Organization Validation (OV), or Extended Validation (EV) processes, though modern browsers have phased out EV's visual indicators because they offered little additional security benefit.[100] Client authentication is less standardized at the protocol level but can use mutual TLS (mTLS), in which clients present their own certificates for two-way verification, a setup common in enterprise APIs and IoT scenarios.[102] Application-layer mechanisms such as HTTP Basic or Digest Authentication provide username-password challenges, but they are vulnerable to interception or replay unless run over TLS; more robust methods include JSON Web Tokens (JWT) and OAuth 2.0, the latter granting delegated access without sharing the user's credentials with the client application.[103] Revocation checks via the Online Certificate Status Protocol (OCSP) or Certificate Revocation Lists (CRLs) let clients reject certificates whose keys have been compromised, and OCSP stapling avoids separate client-side queries by having the server attach a signed OCSP response to the handshake.[104]
Mitigation Strategies and Best Practices
Organizations implementing web applications should adopt a defense-in-depth strategy, layering multiple controls to address vulnerabilities such as injection attacks, broken authentication, and misconfigurations identified in frameworks like the OWASP Top 10.[105] This approach recognizes that no single measure eliminates all risks, as shown by the persistent exploitation of unpatched systems in incidents like the 2021 Log4Shell vulnerability, which affected millions of applications.[105] Key practices include rigorous input validation and context-appropriate output encoding to prevent injection flaws, with database access using parameterized queries or prepared statements so that user-supplied data is always treated as data rather than executable SQL. Against cross-site scripting (XSS), a Content Security Policy (CSP) restricts script execution, reducing the attack surface by limiting inline scripts and external resources, with studies showing CSP implementation blocks up to 70% of XSS attempts in tested environments.[105] Enforcing HTTPS, preferably with TLS 1.3, encrypts data in transit and mitigates man-in-the-middle attacks; by mid-2024, approximately 85% of top websites had migrated to HTTPS, though legacy HTTP persists in resource-constrained environments, exposing sensitive data. Web application firewalls (WAFs) provide runtime protection by inspecting traffic for signatures of exploits like SQL injection, blocking 90-95% of known attack patterns when properly tuned, though they require regular rule updates to counter evasion techniques.[106] Authentication mechanisms should incorporate multi-factor authentication (MFA) and strong password policies while avoiding common pitfalls like session fixation; OWASP guidelines recommend rate limiting login attempts to thwart brute-force attacks, which succeed in under 1% of cases with such controls in place. 
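The recommendation to rate-limit login attempts can be sketched as a sliding-window counter of recent failures per account. The limits shown, the in-memory dictionary, and keying on the username are assumptions for illustration; a deployed limiter would usually need storage shared across servers and might also key on client IP address.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class LoginRateLimiter:
    """Permit at most `max_attempts` failed logins per key within `window` seconds."""

    def __init__(self, max_attempts: int = 5, window: float = 300.0) -> None:
        self.max_attempts = max_attempts
        self.window = window
        self._failures: Dict[str, Deque[float]] = defaultdict(deque)

    def _recent(self, key: str, now: float) -> Deque[float]:
        """Drop failure timestamps that have slid out of the window."""
        q = self._failures[key]
        while q and now - q[0] >= self.window:
            q.popleft()
        return q

    def allowed(self, key: str, now: float) -> bool:
        """True if another login attempt for `key` may be processed at time `now`."""
        return len(self._recent(key, now)) < self.max_attempts

    def record_failure(self, key: str, now: float) -> None:
        self._recent(key, now).append(now)

    def record_success(self, key: str) -> None:
        # A successful login clears the failure history for that account.
        self._failures.pop(key, None)


if __name__ == "__main__":
    limiter = LoginRateLimiter(max_attempts=3, window=60)
    for t in (0, 1, 2):
        limiter.record_failure("alice", now=t)
    print(limiter.allowed("alice", now=10))  # False: three failures within 60 s
    print(limiter.allowed("alice", now=61))  # True: the t=0 and t=1 failures expired
```

A caller would check `allowed()` before verifying a password and call `record_failure()` after a wrong one, so a brute-force script is throttled to a handful of guesses per window.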
Regular security audits, including automated scanning tools and penetration testing, identify misconfigurations, with evidence from breach reports indicating that 80% of incidents stem from unpatched software or default credentials.[107]
- Patch management: Apply updates promptly, as delays in addressing CVEs like those in Apache Struts have led to widespread compromises.[108]
- Principle of least privilege: Limit user and service account permissions to essential functions, reducing lateral movement in breaches.[105]
- Logging and monitoring: Implement comprehensive event logging with anomaly detection, enabling rapid incident response; tools adhering to OWASP standards detect 60-80% of anomalous behavior pre-escalation.[105]
- Secure development lifecycle: Integrate security from the design phase via threat modeling, with code reviews catching 50% more vulnerabilities than post-deployment testing alone.[109]
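Prompt patching starts with knowing which deployed components are older than the first fixed release named in an advisory. A minimal Python sketch of that inventory check follows; the component names, version numbers, and advisory table are hypothetical, and the parser assumes purely numeric dotted versions (real ecosystems need richer rules, such as PEP 440 or Semantic Versioning pre-release handling).

```python
from typing import Dict, List, Tuple

# Hypothetical advisory data: component name -> first release containing the fix.
MIN_PATCHED: Dict[str, str] = {
    "example-web-framework": "2.5.10",
    "example-logging-lib": "1.4.0",
}


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '2.5.10' into (2, 5, 10); assumes purely numeric dotted versions."""
    return tuple(int(part) for part in version.split("."))


def outdated(installed: Dict[str, str], min_patched: Dict[str, str]) -> List[str]:
    """List installed components older than the first patched release."""
    return sorted(
        name
        for name, version in installed.items()
        if name in min_patched
        and parse_version(version) < parse_version(min_patched[name])
    )


if __name__ == "__main__":
    inventory = {
        "example-web-framework": "2.5.9",
        "example-logging-lib": "1.4.2",
        "example-unlisted-tool": "0.1",
    }
    print(outdated(inventory, MIN_PATCHED))
```

Comparing integer tuples rather than raw strings matters: as strings, "2.5.9" sorts after "2.5.10", so a naive check would wrongly treat the vulnerable 2.5.9 as already patched.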
Privacy Implications
Data Collection and Tracking Technologies
Data collection on the World Wide Web occurs primarily through client-server interactions, where browsers transmit user agent strings, IP addresses, referrers, and timestamps in HTTP requests, enabling servers to log access patterns without explicit consent. Client-side scripts, such as JavaScript embedded in web pages, further facilitate tracking by executing code that captures device characteristics, mouse movements, and keystrokes.[111] These mechanisms form the foundation for both functional personalization and cross-site behavioral profiling. HTTP cookies, small text files stored by browsers at the direction of web servers, were invented in June 1994 by Lou Montulli while working at Netscape Communications to maintain state across stateless HTTP connections, with their first implementation checking prior visits to the Netscape website.[88][112] Cookies include session variants that expire upon browser closure for temporary data like shopping carts, and persistent ones that survive sessions for longer-term identification, often set with expiration dates extending years. 
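To show how much a server can log from an ordinary page request, before any script runs or cookie is set, a minimal Python sketch that turns the head of an HTTP/1.1 request into a log record. The request text, the `.example` host names, the documentation-range IP address, and the record's field names are hypothetical; real servers read the client IP from the network connection, which is why it is passed in separately.

```python
from datetime import datetime, timezone
from typing import Dict

# Hypothetical request as a browser might send it (CRLF line endings).
RAW_REQUEST = (
    "GET /article HTTP/1.1\r\n"
    "Host: news.example\r\n"
    "User-Agent: Mozilla/5.0 (X11; Linux x86_64)\r\n"
    "Referer: https://search.example/?q=web+history\r\n"
    "Accept-Language: en-GB,en;q=0.9\r\n"
    "\r\n"
)


def log_record(raw: str, client_ip: str, received_at: datetime) -> Dict[str, str]:
    """Build a server-side log entry from the request head plus connection data."""
    head = raw.split("\r\n\r\n", 1)[0]
    request_line, *header_lines = head.split("\r\n")
    method, path, _version = request_line.split(" ", 2)
    headers: Dict[str, str] = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # header names are case-insensitive
    return {
        "time": received_at.isoformat(),
        "ip": client_ip,                         # from the TCP connection, not a header
        "method": method,
        "path": path,
        "user_agent": headers.get("user-agent", ""),
        "referrer": headers.get("referer", ""),  # HTTP spells the header "Referer"
        "language": headers.get("accept-language", ""),
    }


if __name__ == "__main__":
    record = log_record(RAW_REQUEST, "192.0.2.10", datetime.now(timezone.utc))
    for key, value in record.items():
        print(f"{key:>10}: {value}")
```

Even this small record reveals the search query that led the visitor to the page, through the Referer header, without any explicit action by the user.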
First-party cookies originate from the visited domain for site-specific functions, whereas third-party cookies from embedded external resources, such as advertisements, enable cross-site tracking by associating user activity across unrelated sites.[113] Beyond cookies, tracking pixels (tiny, invisible 1x1 images or script-invoked beacons) load from third-party servers to report events like page views or email opens, transmitting referrer data and timestamps without visible user interaction.[114] HTML5 storage APIs, including localStorage for persistent key-value pairs up to 5-10 MB per origin and sessionStorage for tab-specific data, provide cookie alternatives resilient to some privacy tools, storing identifiers for analytics or ad targeting.[115] Browser fingerprinting compiles an identifying hash from signals like screen resolution, installed fonts, timezone, canvas rendering differences, WebGL capabilities, and hardware concurrency, and in some large samples over 99% of browsers have yielded distinct fingerprints.[116] Unlike cookies, fingerprinting requires no client-side storage, so it persists across sessions and survives cookie clearing, complicating blocking efforts.[111] Analytics platforms exemplify integrated tracking: Google Analytics, launched on November 11, 2005, after Google's acquisition of Urchin Software, deploys JavaScript snippets to collect metrics on user flows, bounce rates, and conversions, powering insights for over 80% of websites via its trackers.[117][118] Third-party trackers appear on approximately 80-99% of analyzed websites, including high-stakes domains like hospitals, transferring data to entities for advertising, fraud detection, or profiling.[118][119] These technologies, while enabling functionalities like targeted content, aggregate vast datasets correlating user identities with behaviors across the web.[120]
User Protections and Regulations
The General Data Protection Regulation (GDPR), enacted by the European Union and effective from May 25, 2018, establishes stringent requirements for websites processing personal data of EU residents, including explicit consent for deploying non-essential cookies and tracking technologies such as pixels or beacons. It empowers users with rights to access, rectify, erase (known as the "right to be forgotten"), and port their data, alongside obligations for data controllers to conduct privacy impact assessments and notify breaches within 72 hours. Non-compliance can result in fines up to 4% of a company's global annual turnover or €20 million, whichever is greater, with enforcement actions exceeding €2.7 billion in penalties by mid-2023. GDPR's extraterritorial reach has influenced global practices, serving as a model for laws in over 130 countries by 2025, though critics argue its consent mechanisms often lead to "consent fatigue" without substantially reducing pervasive tracking.[121] In the United States, the California Consumer Privacy Act (CCPA), effective January 1, 2020, and expanded by the California Privacy Rights Act (CPRA) from January 1, 2023, provides residents rights to know what personal information businesses collect, opt out of its sale or sharing, request deletion, and correct inaccuracies.[122] Unlike GDPR's consent model, CCPA emphasizes opt-out mechanisms, including support for the Global Privacy Control (GPC) signal for automated do-not-sell requests, applicable to for-profit entities with annual revenues over $25 million or handling data of 100,000+ consumers.[122] By 2025, at least 18 U.S. 
states have enacted similar comprehensive privacy laws, such as Virginia's CDPA (effective 2023) and Colorado's CPA (effective July 2023), creating a patchwork of transparency and data-minimization mandates in the absence of a federal equivalent, which leads to varied enforcement and compliance burdens.[123] Private rights of action for data breaches under CCPA have spurred over 100 lawsuits annually since 2020.[123] Internationally, regulations like Brazil's LGPD (effective September 2020) mirror GDPR by requiring a legal basis, such as consent, for data processing and by imposing fines up to 2% of Brazilian revenue, while China's PIPL (effective November 2021) emphasizes data localization and security assessments for cross-border transfers.[124] As of 2025, 71% of countries have data privacy legislation, with emerging laws in places like Indonesia (PDP Law, effective 2024) mandating user notifications for tracking and breach reporting.[125] These frameworks collectively aim to curb unauthorized tracking via technologies like third-party cookies, yet studies indicate mixed efficacy, with persistent data collection often evading opt-outs due to opaque vendor ecosystems and jurisdictional gaps.[126] Enforcement remains inconsistent, particularly in less-resourced regions, highlighting tensions between user autonomy and platform incentives.[127]
Trade-offs Between Convenience and Anonymity
The World Wide Web's architecture facilitates user convenience through mechanisms like HTTP cookies and persistent sessions, which maintain state across visits (such as remembering login credentials or shopping cart contents) but inherently compromise anonymity by enabling persistent tracking of user behavior across sites.[128] Third-party cookies, in particular, allow advertisers and analytics firms to compile detailed profiles by correlating activity from disparate domains, trading anonymity for tailored content and reduced friction in navigation.[129] This design choice stems from the stateless nature of HTTP, where servers cannot natively recall prior interactions without client-side storage, and it prioritizes seamless experiences over default privacy.[130] Empirical studies reveal a consistent "privacy paradox," wherein users voice high concerns about data exposure yet disclose personal information for marginal convenience gains, such as personalized recommendations or one-click logins via social media integrations.[131] For instance, a 2021 longitudinal analysis found no significant correlation between stated privacy worries and reduced self-disclosure on social platforms, attributing this to immediate gratifications outweighing abstract risks.[131] Similarly, fintech platform data indicates that while social logins streamline authentication, their privacy costs, including cross-site data linkage, often exceed usability benefits, yet users accept them despite alternatives like password managers.[132] Surveys corroborate this, showing 73% of global consumers using Google or Facebook accounts to log in for expedited access, even amid awareness of tracking.[133]
Tools enhancing anonymity, such as Virtual Private Networks (VPNs) and the Tor network, impose performance penalties that underscore the convenience-anonymity tension: VPNs encrypt traffic and mask IP addresses but introduce latency, while Tor's onion routing, which relays data through multiple nodes, can be up to 10 times slower than standard browsing, deterring widespread adoption.[134] As of October 2024, Tor had approximately 1.95 million daily users worldwide, a small fraction of the web's billions of users, partly because of its friction in everyday tasks like video streaming.[135] VPN usage remains niche, with 68% of surveyed individuals in 2025 either unaware of or abstaining from them, reflecting preferences for unencumbered access over fortified privacy.[136] These technologies, while effective against casual surveillance, struggle to reconcile full anonymity with the web's expectation of rapid, stateful interactions, often requiring users to forgo features like geolocated services.[137] This dichotomy has shaped the web's evolution: convenience-driven features accelerate engagement and economic value (for example, through targeted ads yielding higher conversion rates) but erode anonymity through pervasive fingerprinting and data aggregation, even without cookies.[138] Users navigating this trade-off rarely opt for maximal anonymity, as evidenced by domain-specific paradoxes where convenience in e-commerce trumps privacy more than in health data contexts, suggesting a pragmatic calculus rather than ideological commitment.[139] Absent systemic redesigns, such as privacy-by-default protocols, the web's incentives favor convenience, with anonymity relegated to specialized, suboptimal paths.[140]
Standards and Governance
Role of W3C and Other Bodies
The World Wide Web Consortium (W3C), established in October 1994 by Tim Berners-Lee at the Massachusetts Institute of Technology's Laboratory for Computer Science (now part of CSAIL), is the principal international body for developing and promoting open standards that ensure the Web's interoperability and longevity.[14] Historically hosted jointly by MIT, the European Research Consortium for Informatics and Mathematics (ERCIM) in France, and Keio University in Japan, W3C operates as a membership organization with over 400 members, including major technology firms, academic institutions, and governmental entities as of 2023.[141] Its core mission is to convene global stakeholders to create technical specifications, guidelines, and tools, published as "Recommendations" after consensus-driven review by working groups, that underpin Web technologies such as HTML for document structure, CSS for presentation, XML for data exchange, and accessibility guidelines like WCAG.[142] W3C Recommendations are not legally binding; they gain authority through widespread adoption by browser vendors and developers, fostering a decentralized yet compatible ecosystem.[143] W3C's processes emphasize royalty-free licensing and public review to avoid proprietary lock-in, though its member-driven model has drawn scrutiny for potential influence by dominant corporations on specification priorities.[144] Key achievements include standardizing SVG for vector graphics (a Recommendation since 2001) and advancing semantic web technologies like RDF since the early 2000s, which enable machine-readable data integration.[141] The organization also addresses emerging challenges, such as WebAssembly for high-performance code execution (finalized as a Recommendation in 2019) and privacy-enhancing features in specifications like Permissions Policy.[145]
Complementing W3C, the Internet Engineering Task Force (IETF) develops the foundational protocols for Web communication, publishing them in the Request for Comments (RFC) series, which dates to 1969 and now exceeds 9,000 documents, including RFC 2616 (HTTP/1.1, 1999; superseded by RFCs 7230-7235 in 2014 and by RFC 9110 and its companions in 2022) and the URI standard RFC 3986.[146] Operating as an open, volunteer-led community under the Internet Society (ISOC), the IETF focuses on engineering solutions for network efficiency and security, distinct from W3C's application-layer emphasis.[147] The Web Hypertext Application Technology Working Group (WHATWG), formed in 2004 by representatives of Apple, Mozilla, and Opera amid dissatisfaction with W3C's focus on XHTML over the evolution of HTML, maintains a "living standard" for HTML, DOM, and related APIs, prioritizing iterative updates based on real-world browser implementations over periodic snapshots.[148] This approach accelerated features like HTML5 elements (e.g., <video> and <canvas>) and informed W3C's HTML5 Recommendation in 2014; after years of parallel specifications, the two bodies agreed in 2019 that WHATWG's living standards for HTML and DOM are the authoritative versions, and they serve as the de facto reference for developers.[148]
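RFC 3986 defines the generic syntax that Web addresses share: a scheme, an authority (optional user information, host, and port), a path, a query, and a fragment. A short Python sketch using the standard `urllib.parse` module shows the decomposition on a hypothetical URL. Splitting the query into key-value pairs follows the common form-encoding convention rather than RFC 3986 itself, and browsers instead implement WHATWG's URL Standard, which can differ from this module in edge cases.

```python
from typing import Dict
from urllib.parse import parse_qs, urlsplit


def describe(url: str) -> Dict[str, object]:
    """Break a URL into the generic components named in RFC 3986."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "user": parts.username,           # user information inside the authority
        "host": parts.hostname,           # lowercased by urllib.parse
        "port": parts.port,               # int, or None when absent
        "path": parts.path,
        "query": parse_qs(parts.query),   # form-encoding convention, not RFC 3986
        "fragment": parts.fragment,       # never sent to the server
    }


if __name__ == "__main__":
    example = "https://user@www.example.org:8443/docs/page.html?lang=en&v=2#section-3"
    for key, value in describe(example).items():
        print(f"{key:>8}: {value}")
```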
Ecma International, formerly the European Computer Manufacturers Association, standardizes client-side scripting via ECMAScript (e.g., ES6 in 2015, with annual updates), ratified as ISO/IEC 16262, which powers interactive Web applications in browsers.[148] The Internet Assigned Numbers Authority (IANA), under ICANN, manages protocol parameters like media types (e.g., text/html) and port numbers essential for Web resource identification.[146] These entities collectively ensure the Web's technical coherence through non-hierarchical collaboration, though tensions arise from competing priorities, such as speed versus exhaustive consensus, ultimately resolved via implementation testing and market adoption.[149]
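Media type names such as text/html, registered with IANA, travel in HTTP's `Content-Type` header. As a small illustration, Python's standard `mimetypes` module can map a file extension to a media type, and `email.message.Message` can split a `Content-Type` value into its type and charset parameter. The file names are hypothetical, the `application/octet-stream` fallback for unknown extensions is a common convention rather than a requirement, and `mimetypes` may also consult platform-specific configuration files.

```python
import mimetypes
from email.message import Message
from typing import Optional, Tuple


def media_type_for(filename: str) -> str:
    """Guess a media type from the file extension, with a generic fallback."""
    guessed, _encoding = mimetypes.guess_type(filename)
    return guessed or "application/octet-stream"


def parse_content_type(header_value: str) -> Tuple[str, Optional[str]]:
    """Split a Content-Type header into (lowercased type/subtype, charset)."""
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_type(), msg.get_content_charset()


if __name__ == "__main__":
    for name in ("index.html", "style.css", "logo.png", "archive.unknownext"):
        print(name, "->", media_type_for(name))
    print(parse_content_type("Text/HTML; charset=UTF-8"))
```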
