Mirror site
Mirror sites or mirrors are replicas of other websites. The concept of mirroring applies to network services accessible through any protocol, such as HTTP or FTP. Such sites have different URLs than the original site, but host identical or near-identical content.[1] Mirror sites are often located in a different geographic region than the original, or upstream, site. The purpose of mirrors is to reduce network traffic, improve access speed, ensure availability of the original site for technical[2] or political reasons,[3] or provide a real-time backup of the original site.[4][5][6] Mirror sites are particularly important in developing countries, where internet access may be slower or less reliable.[7]
Mirror sites were heavily used on the early internet, when most users connected via dial-up and the Internet backbone had much lower bandwidth than today, making a geographically localized mirror network a worthwhile benefit. Download archives such as Info-Mac, Tucows and CPAN maintained worldwide networks of mirrors accessible over HTTP or anonymous FTP. Some of these networks, such as Info-Mac and Tucows, are no longer active or have removed their mirrored download sections, but others, such as CPAN and the Debian package mirrors, were still active as of 2023.[8] Debian removed FTP access to its mirrors in 2017 because of declining use and the relative stagnation of the FTP protocol, citing FTP servers' lack of support for techniques such as caching and load balancing that are available to HTTP.[9] Modern mirrors support HTTPS and IPv6 alongside IPv4.[10]
Some mirrors do not replicate the entire contents of the upstream server, whether because of technical constraints or because only a subset is relevant to their purpose, such as software written in a particular programming language, runnable on a single computer platform, or written by one author. These sites are called partial mirrors or secondary mirrors.[11]
Examples
Notable websites with mirrors include Project Gutenberg,[12] KickassTorrents,[13][14][15][16] The Pirate Bay,[17][18][19][20] WikiLeaks,[21][22] the website of the Environmental Protection Agency,[23][24] and Wikipedia.[25][26][27] Some notable partial mirrors include free and open-source software projects such as GNU,[28] and in particular the Linux distributions CentOS,[29] Debian,[30] Fedora,[31] and Ubuntu;[32][33] such projects provide mirrors of their download sites, since those are expected to bear high server load. Many open-source application providers, such as VideoLAN, use mirrors to distribute VLC media player,[34] and The Document Foundation uses mirrors to distribute LibreOffice.[35]
It was once common for technology companies such as Microsoft, Hewlett-Packard or Apple to maintain networks of mirrors accessible over HTTP or anonymous FTP, hosting software updates, sample code and various freely downloadable utilities. Many of these sites were shut down in the first decades of the 21st century, with Apple ending its FTP services in 2012 and Microsoft ceasing updates to its FTP site in 2010.[36][37] Today, the contents of a number of these mirror sites are archived at The FTP Site Boneyard.[38] Occasionally, people use web-scraping software to produce static dumps of existing sites, such as the BBC's Top Gear site and RedFlagDeals.
References
[edit]- ^ Glushko, Robert J. (25 August 2014). The Discipline of Organizing: Core Concepts Edition. O'Reilly Media. ISBN 9781491912812. Archived from the original on 30 May 2024. Retrieved 29 April 2017.
- ^ "Debian worldwide mirror sites". Archived from the original on 6 September 2017. Retrieved 27 August 2017.
Using a nearby server will probably speed up your download, and also reduce the load on our central servers and on the Internet as a whole.
- ^ "Impending Trump has Internet Archive mirror to Canada". 29 November 2016. Archived from the original on 11 December 2019. Retrieved 11 December 2019.
The Internet Archive has several mirrors up right now, and Canada is set to be its next. This move is taking place specifically because of the new presidential elect Trump here in the United States.
- ^ "What is Mirror Site? Webopedia Definition". www.webopedia.com. 21 July 1998. Archived from the original on 20 May 2017. Retrieved 29 April 2017.
- ^ "What is Mirror Site? - Definition from Techopedia". Techopedia.com. Archived from the original on 4 January 2018. Retrieved 29 April 2017.
- ^ Wisshak, Max; Tapanila, Leif (2 June 2008). Current Developments in Bioerosion. Springer Science & Business Media. ISBN 9783540775973. Archived from the original on 30 May 2024. Retrieved 29 April 2017.
- ^ Sekikawa, A.; Sa, E. R.; Acosta, B.; Aaron, D. J.; Laporte, R. E. (2000). "Internet mirror sites - The Lancet". Lancet. 355 (9219): 2000. doi:10.1016/s0140-6736(05)72944-5. PMID 10859070. S2CID 32218172. Archived from the original on 6 October 2022. Retrieved 11 December 2019.
We all become frustrated when web pages take minutes to unfold. This can increase the gap between infrastructure haves and have-nots. Downloading time is important for other reasons; users connecting to the internet via telephone line in many countries are charged per minute and slow downloading itself may make users lose interest.
- ^ "The status of CPAN mirrors". mirrors.cpan.org. Archived from the original on 7 July 2013. Retrieved 5 February 2023.
- ^ Nestor, Marius (26 April 2017). "Debian Project to Shut Down Its Public FTP Services, Developers Are Not Affected". Softpedia. Archived from the original on 8 February 2024.
The decision to close the Debian FTP services for users was made because the FTP servers in their current state lack support for acceleration or caching, and they aren't quite used lately due to the fact that the Debian Installer no longer provides an FTP option for accessing mirrors since more than ten years ago... FTP as a protocol appears to no longer be efficient, requiring adding strange workarounds to firewalls and load-balancing daemons.
- ^ "CSpace Hostings Public Mirror". Archived from the original on 21 January 2021. Retrieved 21 January 2021.
This page and mirror are available over IPv4 and IPv6 and accessible over HTTP, HTTPS and Rsync
- ^ "Debian worldwide mirror sites". Archived from the original on 6 September 2017. Retrieved 27 August 2017.
A secondary mirror site may have restrictions on what they mirror
- ^ "Project Gutenberg, nonprofit organization". Archived from the original on 13 July 2021. Retrieved 11 December 2019.
In addition, dozens of "mirror" Web sites were created around the world, where the e-books were also stored and available for downloading.
- ^ Russon, Mary-Ann (22 July 2016). "Kickass Torrents is back: New domains, mirrors and proxies show business is as usual". International Business Times UK. Archived from the original on 5 May 2017. Retrieved 29 April 2017.
- ^ Clark, Bryan (21 July 2016). "IsoHunt just launched a working KickassTorrent mirror". The Next Web. Archived from the original on 12 July 2017. Retrieved 29 April 2017.
- ^ "Mexican Police Target Popular KickassTorrents 'Clone,' Seize Domain – TorrentFreak". TorrentFreak. 23 September 2016. Archived from the original on 27 August 2017. Retrieved 29 April 2017.
- ^ Wei, Wang. "New Kickass Torrents Site is Back Online by Original Staffers". The Hacker News. Archived from the original on 8 May 2017. Retrieved 29 April 2017.
- ^ "The Piratebay Blocked By Chrome, Mirror Sites Accessible". iTech Post. 8 October 2016. Archived from the original on 12 July 2017. Retrieved 29 April 2017.
- ^ "The Pirate Bay is blocked Australia wide... except it really isn't". CNET. Archived from the original on 22 April 2017. Retrieved 29 April 2017.
- ^ "Pirate Bay Mirror Shut Down: Alternative Clone Had Kickass Torrents Skin, Vows To Continue". Tech Times. 24 September 2016. Archived from the original on 22 April 2017. Retrieved 29 April 2017.
- ^ "Pirate Bay Blocked By Google Chrome And Firefox: Kickass Torrents Mirror, Extratorrent, Torrentz And Other Clones Accessible". Tech Times. 8 October 2016. Archived from the original on 6 July 2017. Retrieved 29 April 2017.
- ^ Greenemeier, Larry. "How Has WikiLeaks Managed to Keep Its Web Site Up and Running?". Scientific American. Archived from the original on 27 August 2017. Retrieved 29 April 2017.
- ^ Schroeder, Stan (6 December 2010). "WikiLeaks Now Has Hundreds of Mirrors". Mashable. Archived from the original on 17 April 2017. Retrieved 29 April 2017.
- ^ "The EPA Posted a Mirror of Its Website Before Trump Can Gut the Real One". Vice. 16 February 2017. Archived from the original on 28 April 2017. Retrieved 29 April 2017.
- ^ Hiltzik, Michael (24 April 2017). "Did 'people power' save a trove of EPA data from a shutdown by Trump?". Los Angeles Times. Archived from the original on 29 April 2017. Retrieved 29 April 2017.
- ^ "How to set up your own copy of Wikipedia – ExtremeTech". ExtremeTech. 18 January 2012. Archived from the original on 16 April 2017. Retrieved 29 April 2017.
- ^ Broughton, John (2008). Wikipedia: The Missing Manual. "O'Reilly Media, Inc.". ISBN 9780596515164. Archived from the original on 30 May 2024. Retrieved 29 April 2017.
- ^ Ayers, Phoebe; Matthews, Charles; Yates, Ben (2008). How Wikipedia Works: And how You Can be a Part of it. No Starch Press. ISBN 9781593271763. Archived from the original on 30 May 2024. Retrieved 29 April 2017.
- ^ "gnu.org". www.gnu.org. Archived from the original on 27 August 2017. Retrieved 27 August 2017.
- ^ "CentOS Mirror". CentOS Mirror. Archived from the original on 25 January 2021. Retrieved 21 January 2021.
- ^ "Debian worldwide mirror sites". debian.org. Archived from the original on 6 September 2017. Retrieved 27 August 2017.
- ^ "Home - MirrorManager". admin.fedoraproject.org. Archived from the original on 10 October 2017. Retrieved 27 August 2017.
- ^ "Mirrors : Ubuntu". Ubuntu Archive Mirrors. Archived from the original on 11 January 2021. Retrieved 21 January 2021.
- ^ "Mirrors : Ubuntu". Ubuntu CD Mirrors. Archived from the original on 5 February 2021. Retrieved 21 January 2021.
- ^ "Mirrors – VideoLAN". videolan.org. Archived from the original on 22 January 2021. Retrieved 21 January 2021.
- ^ "The Document Foundation Mirrors". download.documentfoundation.org. Archived from the original on 22 January 2021. Retrieved 21 January 2021.
- ^ "How do I access the MICROSOFT FTP server? – Windows 10 Help Forums". tenforums.com. Archived from the original on 20 October 2022. Retrieved 12 April 2022.
- ^ "Microsoft has closing ftp://ftp.microsoft.com". www.betaarchive.com. Archived from the original on 28 January 2020. Retrieved 11 December 2019.
- ^ "The FTP Site Boneyard : Free Software : Free Download, Borrow and Streaming : Internet Archive". archive.org.
Mirror site
Definition and Core Concepts
Definition
A mirror site is a complete replica of a primary website or server, hosted on a separate physical or virtual server, containing identical content, structure, and functionality to the original.[1][2] This duplication ensures that the mirrored content remains accessible through an alternative uniform resource locator (URL), often differing from the original to facilitate distinct access points.[5] Mirror sites are typically maintained through periodic or real-time synchronization processes to keep the copy up to date with changes on the primary site.[6]

The primary function of mirror sites revolves around improving system reliability and performance by providing redundancy against failures, such as server downtime or network outages, and distributing user traffic to prevent overload on a single host.[2][6] They may also serve to enhance global accessibility by hosting copies in geographically diverse locations, reducing latency for users in remote regions, or to mitigate bandwidth constraints on the original server.[5] In contexts involving censorship or political restrictions, mirror sites can provide alternative access to blocked content, though such uses may raise legal considerations depending on jurisdiction.[1] Distinct from content delivery networks (CDNs), which cache only portions of static assets rather than replicating the full site, mirror sites aim for comprehensive duplication to support failover or independent operation.[6]

Primary Purposes
Mirror sites are primarily employed to bolster the availability and redundancy of digital content, serving as exact replicas hosted on separate servers to ensure continued access during outages, maintenance, or failures of the primary site. This redundancy mitigates risks from single points of failure, such as server crashes or DDoS attacks, allowing seamless failover for users.[1][6]

A key purpose is load balancing and traffic distribution: mirror sites alleviate strain on the original server by spreading user requests across multiple hosts, particularly during spikes in demand such as software downloads or other high-traffic events. This approach enhances performance, reduces latency for geographically dispersed users by leveraging nearby servers, and prevents bottlenecks that could degrade service quality.[1][7][8]

Mirror sites also facilitate circumvention of access restrictions, including government-imposed censorship or regional blocks, by providing alternative endpoints for prohibited or geo-restricted material, as seen in activist networks or opposition media strategies. In software ecosystems, they support efficient distribution of large files, patches, and updates, as in open-source repositories, by decentralizing downloads and minimizing bandwidth overload on central repositories.[9][1]

Historical Development
Early Origins
The practice of creating mirror sites emerged in the late 1980s amid the expansion of anonymous FTP for software and data distribution, driven by bandwidth constraints on early internet infrastructure. FTP, standardized in RFC 959 in 1985, enabled remote file access, but popular archives quickly overloaded primary servers as user numbers grew from academic and research communities. Mirroring addressed this by replicating directories across geographically dispersed hosts, minimizing latency and reducing transatlantic or cross-continental traffic bottlenecks.[10]

A pioneering example was the Info-Mac archive, launched in 1984 as a mailing list for Macintosh software discussions, which transitioned to an open FTP repository by 1988 hosting shareware, freeware, and utilities. This collection amassed thousands of files, necessitating a global network of over 100 mirrors by the early 1990s to sustain accessibility without crashing the host at Stanford University's SUMEX-AIM. Mirrors synchronized content periodically, often via rsync precursors or manual scripts, exemplifying early load distribution for non-commercial digital repositories.[11][12][13]

Parallel developments occurred in academic FTP services, such as Finland's FUNET network, which began mirroring freely distributable files, including Unix tools and research data, in 1990 to serve European users efficiently. By 1993, commercial entities like Microsoft established dedicated FTP sites (e.g., ftp.microsoft.com) with initial mirrors to handle growing downloads of drivers and utilities. These efforts laid the groundwork for systematic replication, prioritizing redundancy over centralized control in an era of dial-up connections and T1 backbone limitations.[14][15]

As the World Wide Web gained traction after 1993, FTP mirroring influenced HTTP site replication; notably, the Apache HTTP Server's first mirrors went live in April 1995 via hosts like SunSite (now ibiblio), supporting early open-source web server distribution amid surging demand. This transition marked mirroring's evolution from FTP silos to web-scale redundancy, though its roots remained in 1980s-era archival needs.[4]

Evolution in the Internet Era
The proliferation of mirror sites in the internet era began with the limitations of early network infrastructure, where dial-up connections and narrow backbones necessitated replicas of FTP archives to distribute software, documentation, and data without overwhelming primary servers. In the early 1990s, academic and research institutions established FTP mirrors to handle growing demand for Unix distributions, GNU software, and other open resources, as transcontinental transfers could take hours or days on connections averaging 14.4 to 28.8 kbps.[16] These mirrors reduced latency and server load by localizing access, with sites like SunSite at the University of North Carolina serving as pivotal early providers for archiving and replication efforts.[4]

As the World Wide Web emerged in 1991, mirror sites adapted to HTTP, replicating not just files but directory structures and static web content to support the nascent web's scalability challenges. By April 1995, the Apache project had launched its initial mirror network, relying on volunteers and institutions like SunSite to synchronize web server binaries and documentation, thereby enabling broader adoption amid bandwidth constraints that limited global traffic to under 100 Gbps in total by mid-decade.[4] This shift from FTP-centric mirroring to web-inclusive models addressed the web's exponential growth, with mirrors ensuring redundancy for high-demand resources like Linux kernel releases and Perl modules via networks such as CPAN, established in 1995.

The late 1990s saw technological advancements refine mirroring practices, including the introduction of rsync in 1996, which used delta-transfer algorithms to synchronize only changed portions of files, minimizing bandwidth overhead compared to full FTP copies. This efficiency supported the mirroring of entire websites for projects facing surges in traffic, as internet users grew from 16 million in 1995 to over 248 million by 1999, straining single-server architectures. Mirrors thus evolved from ad hoc file repositories into systematic tools for availability, exemplified by GNU's global mirror list, which by 1998 included dozens of synchronized sites to counter regional bottlenecks and outages.[17] While improved fiber optics and commercial backbones reduced some performance imperatives by the early 2000s, mirrors persisted for archival integrity and distributed systems in open-source ecosystems.

Technical Implementation
Site Replication Methods
Site replication for mirror sites primarily involves duplicating static files, directories, scripts, and dynamic content such as databases across multiple servers to ensure identical copies. For static websites, file-level synchronization tools are commonly employed to transfer content efficiently, minimizing bandwidth usage by transmitting only the differences between source and target. Rsync, a standard utility on Unix-like systems, exemplifies this approach: it uses a delta-transfer algorithm to compute and send only the modified portions of files, preserving permissions, timestamps, and symbolic links during replication between local or remote hosts via SSH or the rsync daemon protocol.[18] This method supports incremental updates, making it suitable for periodic mirroring of web servers hosting HTML, CSS, images, and other assets.

GNU Wget provides an alternative crawling-based replication technique, recursively fetching web pages and resources from HTTP/HTTPS endpoints to create a browsable offline copy. Invoked with options such as --mirror, --recursive, --page-requisites, and --convert-links, it downloads linked content up to specified depths while converting absolute URLs to relative ones for local viewing, though it may require adjustments to avoid infinite loops or excessive external linking.[19] Wget is particularly effective for one-time or archival mirrors but less efficient for frequent updates than rsync, as it re-downloads unchanged files unless combined with timestamp checks.
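Both approaches can be driven from a small wrapper script. The following is a minimal sketch in Python, assuming rsync and wget are installed on the mirroring host; the hostname primary.example.org, the paths under /srv/mirror, and the exact flag set are hypothetical placeholders that would be tuned per site.

```python
"""Minimal sketch of the two replication styles described above.

Assumes rsync and wget are installed; hostnames and paths are
hypothetical placeholders, not a real deployment.
"""
import subprocess


def rsync_mirror(source: str, dest: str) -> None:
    """Delta-transfer sync: -a preserves permissions, timestamps and
    symlinks; -z compresses in transit; --delete removes files that
    have vanished upstream so the mirror stays identical."""
    subprocess.run(["rsync", "-az", "--delete", source, dest], check=True)


def wget_mirror(url: str, target_dir: str) -> None:
    """Crawling-based copy: --mirror enables recursion with
    timestamping, --page-requisites fetches CSS/images, and
    --convert-links rewrites URLs for local browsing."""
    subprocess.run(
        ["wget", "--mirror", "--page-requisites", "--convert-links",
         "--directory-prefix", target_dir, url],
        check=True,
    )


if __name__ == "__main__":
    # Pull from a hypothetical primary over SSH, then crawl a live site.
    rsync_mirror("primary.example.org:/var/www/html/", "/srv/mirror/html/")
    wget_mirror("https://www.example.org/", "/srv/mirror/crawl")
```

In practice a scheduler (for example cron) would call the rsync path on a fixed interval, while the wget path suits one-off archival snapshots, matching the trade-off noted above.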
Dynamic sites necessitate database replication alongside file syncing to mirror backend data. Techniques such as master-slave replication propagate changes from a primary database to secondary instances in near real-time, using binary log shipping or statement-based logging to maintain consistency across mirrors.[20] For SQL Server environments, this can integrate with file replication to form complete site mirrors, ensuring transactional integrity through commit acknowledgments, though latency and conflict resolution must be managed to prevent data divergence. Advanced setups may employ multi-master replication for bidirectional syncing, but these increase complexity and risk inconsistencies without proper schema design.[21] In cloud contexts, services like AWS Database Migration Service automate schema and data replication, but core methods remain rooted in log-based or query-based propagation for verifiable fidelity.[22]
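To illustrate the lag and divergence concerns raised above, the sketch below checks replication delay on a MySQL-style read replica before it is allowed to serve as a mirror backend. It is a minimal sketch, assuming the third-party mysql-connector-python package and an already configured primary/replica pair; the host, credentials, and the 30-second threshold are hypothetical.

```python
"""Sketch: gate a database-backed mirror on replication freshness.

Assumes mysql-connector-python and a configured primary/replica pair;
connection details and the threshold are hypothetical.
"""
import mysql.connector

MAX_LAG_SECONDS = 30  # hypothetical freshness threshold for this mirror

conn = mysql.connector.connect(
    host="replica.example.org", user="monitor", password="secret"
)
cur = conn.cursor(dictionary=True)
# MySQL 8.0.22+ uses SHOW REPLICA STATUS; older servers use the
# equivalent SHOW SLAVE STATUS with Seconds_Behind_Master.
cur.execute("SHOW REPLICA STATUS")
status = cur.fetchone()

# Seconds_Behind_Source is None when replication is stopped or broken.
lag = status.get("Seconds_Behind_Source") if status else None
if lag is None or lag > MAX_LAG_SECONDS:
    print("replica too stale; divert traffic to the primary")
else:
    print(f"replica healthy, {lag}s behind primary")
conn.close()
```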
Synchronization and Maintenance
Synchronization of mirror sites involves replicating changes from the primary site to secondary servers to maintain content consistency, typically using automated tools that detect and transfer deltas such as new files, modifications, or deletions.[18] For static websites, tools like rsync enable efficient file-level synchronization by comparing timestamps, sizes, and checksums to transfer only differing data, often scheduled via cron jobs for periodic updates every few minutes or hours depending on update frequency.[18] GNU Wget, invoked with the --mirror or -m flag, recursively downloads site structures including HTML, CSS, images, and linked resources while respecting robots.txt and avoiding redundant fetches through conditional requests.[19]
Dynamic sites with databases require additional layers, such as database replication protocols (e.g., MySQL binary log replication) combined with file syncs, to propagate backend changes like user-generated content or query results, though this introduces latency risks in asynchronous modes, where mirrors may temporarily lag the primary by seconds to minutes.[23] Real-time synchronization can be approximated using rsync over SSH with inotify hooks that trigger transfers on file-system events, minimizing divergence but increasing bandwidth and CPU demands on both servers.[18] Secure transfer protocols such as SFTP, together with rsync's --checksum option, protect integrity against corruption during transfer, with compression options available to optimize transfers of large media files.[18]
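The event-triggered variant can be approximated with a file-system watcher that coalesces change events and shells out to rsync. The sketch below uses the third-party watchdog package as a portable stand-in for raw inotify hooks; the source path, destination host, and the five-second debounce window are hypothetical choices, not fixed requirements.

```python
"""Sketch: push changes to a mirror as files change, in the spirit of
the inotify-triggered rsync described above.

Uses the third-party 'watchdog' package as a portable stand-in for
raw inotify; paths and the destination host are hypothetical.
"""
import subprocess
import time

from watchdog.events import FileSystemEventHandler
from watchdog.observers import Observer

SOURCE = "/var/www/html/"
DEST = "mirror.example.org:/srv/mirror/html/"


class SyncOnChange(FileSystemEventHandler):
    def __init__(self):
        super().__init__()
        self.dirty = False

    def on_any_event(self, event):
        # Coalesce bursts of events into a single pending sync.
        self.dirty = True


handler = SyncOnChange()
observer = Observer()
observer.schedule(handler, SOURCE, recursive=True)
observer.start()
try:
    while True:
        time.sleep(5)  # debounce window: sync at most every 5 seconds
        if handler.dirty:
            handler.dirty = False
            subprocess.run(["rsync", "-az", "--delete", SOURCE, DEST],
                           check=True)
except KeyboardInterrupt:
    observer.stop()
observer.join()
```

The debounce loop reflects the trade-off noted above: shorter windows reduce divergence but raise bandwidth and CPU load on both servers.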
Maintenance entails regular verification of mirror fidelity through checksum comparisons or automated diff tools to detect desynchronization caused by failures such as network interruptions or server overloads, often addressed by failover scripts that redirect traffic only to validated mirrors.[23] Operators must apply security patches and configuration updates uniformly across mirrors to prevent vulnerabilities, using centralized management tools like Ansible for orchestration, while monitoring logs for sync errors via scripts that alert on discrepancies exceeding predefined thresholds, such as file-count mismatches.[24] Bandwidth throttling during off-peak hours prevents sync processes from impacting primary-site performance, and periodic full rescans (e.g., weekly) reconcile cumulative drift, ensuring long-term reliability without over-reliance on incremental methods alone.[19]
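One simple way to implement the checksum-based fidelity check described above is to build a digest manifest of each tree and diff them. The following sketch uses only the Python standard library; the paths are hypothetical, and in a real deployment each manifest would be computed on its own host and shipped to a monitoring node rather than read from one machine.

```python
"""Sketch: verify mirror fidelity by comparing SHA-256 manifests of
the primary and mirror trees.

Paths are hypothetical; in practice each manifest is built on its own
host and sent to a monitor for comparison.
"""
import hashlib
from pathlib import Path


def manifest(root: str) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    base = Path(root)
    digests = {}
    for path in sorted(base.rglob("*")):
        if path.is_file():
            rel = str(path.relative_to(base))
            digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


primary = manifest("/srv/primary/html")
mirror = manifest("/srv/mirror/html")

missing = primary.keys() - mirror.keys()   # on primary, absent on mirror
extra = mirror.keys() - primary.keys()     # stale files left on mirror
corrupt = {p for p in primary.keys() & mirror.keys()
           if primary[p] != mirror[p]}     # content differs

# Alert on any discrepancy; real setups would apply the kind of
# tolerance thresholds mentioned above before paging an operator.
if missing or extra or corrupt:
    print(f"desync: {len(missing)} missing, {len(extra)} extra, "
          f"{len(corrupt)} corrupt")
else:
    print("mirror verified")
```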
