Archie (search engine)
| Archie | |
|---|---|
| Original author | Alan Emtage |
| Developers | Bunyip Information Systems, Inc. |
| Initial release | 10 September 1990[1] |
| Final release | 3.5 / 1996 |
| Written in | C |
| Operating system | Solaris, AIX |
| Type | Search engine |
| Website | bunyip.com/products/archie/ (original product page, archived) |
Archie is a tool for indexing FTP archives, allowing users to identify specific files more easily. It is considered the first Internet search engine.[2] The original implementation was written in 1990 by Alan Emtage, then a postgraduate student at McGill University in Montreal, Canada.[3][4][5][6] Archie was superseded by other, more sophisticated search engines, including Jughead and Veronica, which were search engines for the Gopher protocol. These were in turn superseded by World Wide Web search engines such as AltaVista and directories such as Yahoo! in 1995. Work on Archie ceased in the late 1990s. A legacy Archie server was maintained for historical purposes in Poland, at the Interdisciplinary Centre for Mathematical and Computational Modelling of the University of Warsaw, until 2023.
With assistance from the University of Warsaw, a new Archie server was created and opened for public access at The Serial Port, a web-based computer museum, on 11 May 2024.[7][8]
Origin
Archie first appeared in 1986, while Emtage was the systems manager at the McGill University School of Computer Science. His predecessor had tried to persuade the institution to connect to the Internet, but because of the cost (roughly $35,000 per year for a slow link to Boston), it had been difficult to convince the relevant decision-makers that the investment was worthwhile.[9]
The name derives from the word "archive" without the "v". Emtage has said that, contrary to popular belief, there was no association with Archie Comics.[10] Despite this, other early Internet search technologies such as Jughead and Veronica were named after characters from the comics. Anarchie, one of the earliest graphical FTP clients, was named for its ability to perform Archie searches.
Function
The earliest versions of Archie simply searched a list of public anonymous File Transfer Protocol (FTP) sites using the Telnet protocol and created index files that were available via FTP. To view the contents of a file, a user first had to download it. The indexes were updated regularly by requesting a listing from each site, roughly once a month per site so as not to waste too many of the remote servers' resources. These listings were stored in local files and searched with the Unix grep command.
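As a rough illustration of the "store listings locally, then grep" design, the Python sketch below writes each site's listing to its own file and scans every file with a regular expression, much as grep would. The one-file-per-host layout, the `.lst` suffix, the function names, and the host names are all hypothetical; the sketch does not reproduce Archie's actual file format.

```python
import re
import tempfile
from pathlib import Path


def save_listing(store: Path, host: str, listing: str) -> Path:
    """Store one site's directory listing in its own local file (hypothetical layout: <host>.lst)."""
    path = store / f"{host}.lst"
    path.write_text(listing)
    return path


def grep_listings(store: Path, pattern: str) -> list[tuple[str, str]]:
    """Return (host, line) pairs for every stored line matching pattern, like grep across all files."""
    regex = re.compile(pattern)
    hits = []
    for path in sorted(store.glob("*.lst")):
        for line in path.read_text().splitlines():
            if regex.search(line):
                hits.append((path.stem, line))  # stem drops only the .lst suffix
    return hits


# Usage with made-up hosts and paths.
with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp)
    save_listing(store, "ftp.example.edu", "/pub/gnu/gcc-1.40.tar.Z\n/pub/tex/README\n")
    save_listing(store, "ftp.sample.ca", "/pub/x11/xarchie.tar.Z\n")
    hits = grep_listings(store, r"\.tar\.Z$")
```

Because each query rescans every stored listing, search cost grows with the total size of all listings, which is one reason a dedicated index becomes attractive as the number of sites grows.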
The developers populated the engine's servers with databases of anonymous FTP host directories.[11] Because these listings were loaded into a searchable database of FTP sites, users could look up specific file names.[12] Archie neither recognized natural-language requests nor indexed the contents of files, so users had to know the name of the file they wanted. The ability to index the content inside files was later introduced by Gopher.
Development
Emtage and Heelan wrote a script allowing people to log in and search the collected information using the Telnet protocol at the host "archie.mcgill.ca" [132.206.2.3].[13] Later, more efficient front ends and back ends were developed, and the system grew from a local tool into a network-wide resource and a popular service available from multiple sites around the Internet. Neighbouring Archie servers exchanged the data they had collected. The servers could be accessed in several ways: with a local client (such as archie or xarchie), by telnetting to a server directly, by sending queries by electronic mail,[14] and later through a World Wide Web interface. At the peak of its popularity, the Archie search engine accounted for 50% of Montreal's Internet traffic.[15]
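How neighbouring Archie servers reconciled the data they exchanged is not specified here. A minimal sketch, assuming each server keeps one timestamped listing per FTP host, is to keep whichever copy of a host's listing was fetched most recently. `SiteListing`, `fetched_at`, and `merge_from_neighbour` are hypothetical names, not part of any Archie protocol.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteListing:
    host: str
    fetched_at: int  # hypothetical field: Unix time at which this listing was retrieved
    files: tuple[str, ...]


def merge_from_neighbour(local: dict[str, SiteListing],
                         remote: dict[str, SiteListing]) -> dict[str, SiteListing]:
    """Return a new table holding, for each FTP host, the most recently fetched listing."""
    merged = dict(local)  # copy so the caller's table is left untouched
    for host, listing in remote.items():
        current = merged.get(host)
        if current is None or listing.fetched_at > current.fetched_at:
            merged[host] = listing
    return merged


# Usage: the neighbour has a newer copy of one site and a site we have never polled.
local = {"ftp.example.edu": SiteListing("ftp.example.edu", 100, ("/pub/a.tar.Z",))}
remote = {
    "ftp.example.edu": SiteListing("ftp.example.edu", 200, ("/pub/a.tar.Z", "/pub/b.tar.Z")),
    "ftp.sample.ca": SiteListing("ftp.sample.ca", 150, ("/pub/x.tar.Z",)),
}
merged = merge_from_neighbour(local, remote)
```

Comparing fetch times means a server never overwrites fresher data with a stale copy, whichever direction the exchange runs, and each server is spared from polling every FTP site itself.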
In 1992, Emtage, together with J. Peter Deutsch and with some financial help from McGill University, founded Bunyip Information Systems, which offered a licensed commercial version of the Archie search engine used by millions of people worldwide. Heelan joined them at Bunyip soon afterwards, where he, together with Bibi Ali and Sandro Mazzucato, significantly updated the Archie database and indexed web pages. Work on the search engine ceased in the late 1990s, and the company dissolved in 2003.[16]
References
- Deutsch, Peter (11 September 1990). "[next] An Internet archive server server (was about Lisp)". Retrieved 29 December 2017.
- "The First Search Engine, Archie". Archived from the original on 21 June 2007. Retrieved 26 May 2007.
- "Archie". PC Magazine. Retrieved 20 September 2020.
- Samuel, Alexandra (21 February 2017). "Meet Alan Emtage, the Black Technologist Who Invented ARCHIE, the First Internet Search Engine". ITHAKA. Retrieved 20 September 2020.
- Loop News Barbados (30 August 2019). "Alan Emtage- a Barbadian you should know". loopnewsbarbados.com. Retrieved 28 April 2022.
- Grandoni, Dino; Emtage, Alan (April 2013). "Alan Emtage: The Man Who Invented The World's First Search Engine (But Didn't Patent It)". HuffPost. Retrieved 21 September 2020.
- The Serial Port (11 May 2024). "We brought back the Internet's first search engine". YouTube.
- Purdy, Kevin (16 May 2024). "Archie, the Internet's first search engine, is rescued and running". Ars Technica. Retrieved 17 May 2024.
- "Article by Kevin Savetz". 9 July 2015. Archived from the original on 9 July 2015. Retrieved 18 March 2023.
- BBC Radio 4, Saturday Live, 7 November 2009.
- West, Nicholas. A Rough Guide to the Internet. Lulu.com. ISBN 9781471005374.
- Ledford, Jerri L. (2015). Search Engine Optimization Bible. Hoboken, NJ: John Wiley & Sons. p. 4. ISBN 9780470452646.
- "Peter Deutsch: archie - An Electronic Directory Service for the Internet". Retrieved 23 February 2012.
- "EFF's (Extended) Guide to the Internet - Your Friend Archie". www2.cs.duke.edu. 12 September 1994. Retrieved 8 January 2020.
- Deutsch, P. (2000). "Archie-a Darwinian development process". IEEE Internet Computing. 4: 69–71. doi:10.1109/4236.815865.
- "Canada Business Listing". CAN1 Business. Retrieved 13 May 2024.
Further reading
- Deutsch, Peter. "Archie - A Darwinian Development Process". IEEE Internet Computing, January/February 2000, 4(1):69-71. Part of Millennial Forecasts. doi:10.1109/4236.815865.
- Deutsch, P.; Emtage, A.; Marine, A. How to Use Anonymous FTP. RFC 1635, May 1994.
External links
- Online instance of Archie
- Last surviving Archie web interface Archived 11 January 2020 at the Wayback Machine
Origins
Creation at McGill University
Development of Archie began in 1989 at McGill University in Montreal, Canada, as a personal project of Alan Emtage, a graduate student in the School of Computer Science, to address the inefficiency of manually searching anonymous FTP sites for free software.[4][2] At the time, the Internet had only recently been introduced at McGill, and Emtage, who was serving as a system administrator, had to locate programs for the department's limited resources without dedicated IT support.[5] The effort was driven by the need to automate the collection of FTP directory listings from universities and research institutions, and it marked the inception of what would become the first Internet search engine.[4]

The initial implementation consisted of a set of shell scripts that used FTP to fetch directory listings from anonymous FTP archives automatically, mostly during off-peak hours so as not to compete with other traffic on the university's slow connection.[5][4] The scripts were later extended with tools such as procmail to process and index the retrieved data, enabling basic searches through email queries before the World Wide Web was available.[4] Emtage developed the system covertly, without formal university approval, because of concerns over bandwidth usage.[5]

The name "Archie" was derived from "archive" with the letter "v" omitted, chosen for its simplicity and its direct relevance to the project's focus on archiving and retrieving files.[6] Emtage has emphasized that the name had no connection to the Archie comics character, countering a common misconception.[4]

Early testing took place in 1989 on McGill's internal network, where the system covered a small collection of North American FTP sites and served computer science students and faculty before its broader release in 1990.[4][2] This phase established Archie's role in automating the management of FTP archives within an academic setting.[5]

Key Contributors and Initial Motivation
Alan Emtage, a Black Barbadian computer scientist born in 1964, conceived and implemented the first version of Archie as a postgraduate student in computer science at McGill University in Montreal, Canada, where he earned his B.S. in 1987 and M.S. in 1991.[7][2] As a system administrator at McGill's School of Computer Science, Emtage was motivated primarily by the practical need to locate free software and public-domain files efficiently for university staff and students across the growing Internet.[4] He built the tool out of necessity, automating a manual process that had required connecting to and searching numerous anonymous FTP sites one at a time, since no centralized discovery mechanism existed.[4]

Emtage was supported by collaborators at McGill: Bill Heelan, a university system administrator who helped write the scripts that gave users access via Telnet, and J. Peter Deutsch, an undergraduate student who helped refine the code.[8][1] Together, they addressed the challenges posed by the rapid proliferation of anonymous FTP sites in the late 1980s, which made academic sharing easier but overwhelmed manual search efforts and cost researchers time when looking for specific files.[4] By 1992, Archie's index had cataloged over 200 such public FTP sites, illustrating both the scale of this growth and the tool's usefulness to the academic community.[9]

Archie's initial scope was deliberately limited to academic and research-oriented FTP archives containing free software and public-domain resources, excluding proprietary or commercial content in keeping with the collaborative ethos of early networks such as the National Science Foundation Network.[4] This focus reflected its creators' aim of supporting educational and scientific file sharing without encroaching on intellectual-property concerns.[2]

Functionality
Indexing Process
The indexing process began with Archie's servers automatically connecting to a predefined list of anonymous FTP sites across the Internet. Over FTP, the system issued commands such as `ls -lR` to fetch recursive directory listings, capturing metadata such as file names, paths, sizes, and modification dates without downloading the files themselves. This kept bandwidth use low while building a comprehensive catalog of publicly available resources.[10][11]
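A minimal Python sketch of turning such a listing into records, assuming the common nine-column Unix `ls -lR` layout (permissions, link count, owner, group, size, month, day, time or year, name). FTP servers varied in their listing formats, so this illustrates the idea rather than reproducing Archie's parser; `FileEntry` and `parse_ls_lR` are hypothetical names, and the network fetch itself is left out.

```python
import re
from dataclasses import dataclass

# Permission field at the start of an ls -l line, e.g. "-rw-r--r--" or "drwxr-xr-x".
PERMS = re.compile(r"^[-dlbcps][-rwxsStT]{9}")


@dataclass(frozen=True)
class FileEntry:
    directory: str
    name: str
    size: int
    date: str  # raw "Mon DD HH:MM" or "Mon DD YYYY" text as printed by ls


def parse_ls_lR(listing: str) -> list[FileEntry]:
    """Extract regular files from Unix-style ls -lR output; directories, links and devices are skipped."""
    entries = []
    directory = "."
    for raw in listing.splitlines():
        line = raw.rstrip()
        if not line or line.startswith("total "):
            continue
        if line.endswith(":") and not PERMS.match(line):
            directory = line[:-1]  # a "dir/path:" header starts a new directory block
            continue
        if not line.startswith("-"):
            continue  # only regular files are indexed in this sketch
        fields = line.split(maxsplit=8)  # perms, links, owner, group, size, month, day, time/year, name
        if len(fields) < 9 or not fields[4].isdigit():
            continue  # not the nine-column layout this sketch assumes
        entries.append(FileEntry(directory, fields[8], int(fields[4]), " ".join(fields[5:8])))
    return entries


sample = """pub:
total 3
drwxr-xr-x  2 ftp  ftp    512 Jan 10  1991 gnu
-rw-r--r--  1 ftp  ftp   1024 Feb  3 12:00 README

pub/gnu:
total 40
-rw-r--r--  1 ftp  ftp  20480 Mar  5  1991 gcc-1.40.tar.Z
"""
entries = parse_ls_lR(sample)
```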
The raw directory listings were then parsed to extract key attributes, relying mainly on the conventional Unix ls-style layout of FTP directory listings, and merged into a centralized index. The data was stored in flat-file databases: a primary filename index and a supplementary "whatis" database of short textual descriptions added manually by site administrators. These flat files were suited to fast searches with Unix utilities such as grep, which performed pattern matching on file names and paths. To cope with the growing volume of data, the indexes were compressed to reduce storage requirements.[12][11]
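The sketch below illustrates the flat-file design under stated assumptions: each index line is a tab-separated host, path, and size; the file is gzip-compressed, standing in for whatever compression Archie actually used; and searches stream through it line by line the way grep would. The "whatis" table is modeled as a simple name-to-description mapping. All names, the line format, and the sample data are hypothetical.

```python
import gzip
import re
import tempfile
from pathlib import Path


def write_index(path: Path, records: list[tuple[str, str, int]]) -> None:
    """Write (host, file path, size) records as sorted, tab-separated lines in a gzip-compressed flat file."""
    with gzip.open(path, "wt", encoding="utf-8") as out:
        for host, filepath, size in sorted(records):
            out.write(f"{host}\t{filepath}\t{size}\n")


def search_index(path: Path, pattern: str) -> list[tuple[str, str, int]]:
    """Stream through the compressed index grep-style, matching pattern against the file path column only."""
    regex = re.compile(pattern)
    hits = []
    with gzip.open(path, "rt", encoding="utf-8") as index:
        for line in index:
            host, filepath, size = line.rstrip("\n").split("\t")
            if regex.search(filepath):
                hits.append((host, filepath, int(size)))
    return hits


def whatis(descriptions: dict[str, str], keyword: str) -> dict[str, str]:
    """Case-insensitive keyword lookup in a 'whatis'-style table mapping names to short descriptions."""
    key = keyword.lower()
    return {name: text for name, text in descriptions.items()
            if key in name.lower() or key in text.lower()}


# Usage with made-up records and descriptions.
with tempfile.TemporaryDirectory() as tmp:
    index_file = Path(tmp) / "filenames.gz"
    write_index(index_file, [
        ("ftp.sample.ca", "/pub/x11/xarchie.tar.Z", 51200),
        ("ftp.example.edu", "/pub/gnu/gcc-1.40.tar.Z", 20480),
    ])
    gcc_hits = search_index(index_file, r"gcc")

descriptions = {"gcc": "GNU C compiler", "xarchie": "X11 client for Archie"}
compiler_hits = whatis(descriptions, "compiler")
```

Keeping one record per line means a plain sequential scan is enough to answer a query, and the whatis table offers a second route to a file when its name alone gives no hint of what it contains.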
In its early implementation in 1991, when Archie covered around 600 sites, each site was refreshed roughly once a month: a subset of sites was polled every night, and the cycle for any one site was never shorter than two weeks. By early 1992, Archie had grown to index around 900 sites holding more than 1 million files, reflecting rapid adoption among academic and research communities. Update intervals generally ranged from two weeks to a month, balancing freshness against resource constraints, and by the mid-1990s the system covered thousands of sites and millions of entries.[13][14][11]
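The nightly-subset schedule can be sketched as a round-robin split: with a cycle of N nights, every site falls into exactly one nightly batch and is therefore polled once every N nights. With roughly 900 sites and a 30-night cycle, that works out to 30 sites per night. The function names and the round-robin assignment are assumptions for illustration, not Archie's documented scheduler.

```python
def nightly_batches(sites: list[str], cycle_nights: int) -> list[list[str]]:
    """Split sites round-robin into cycle_nights batches; batch k is polled on night k of every cycle."""
    if cycle_nights < 1:
        raise ValueError("cycle_nights must be at least 1")
    batches: list[list[str]] = [[] for _ in range(cycle_nights)]
    for i, site in enumerate(sites):
        batches[i % cycle_nights].append(site)
    return batches


def sites_for_night(batches: list[list[str]], night: int) -> list[str]:
    """Return the batch to poll on a given night (counted from 0), repeating the cycle indefinitely."""
    return batches[night % len(batches)]


# Usage: about 900 hypothetical hosts on a roughly monthly and a bi-weekly cycle.
sites = [f"ftp{i}.example.edu" for i in range(900)]
monthly = nightly_batches(sites, 30)    # 900 / 30 = 30 sites per night
biweekly = nightly_batches(sites, 14)   # 900 = 14 * 64 + 4, so nights 0-3 get 65 sites
tonight = sites_for_night(monthly, 31)  # night 31 is night 1 of the second cycle
```

Spreading polls across nights keeps the load on both the Archie server and the remote FTP sites roughly constant, instead of contacting every site in one burst.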
A key limitation of Archie's indexing was its exclusive focus on file names, paths, and brief descriptions: it performed no full-text analysis of file contents, primarily to conserve bandwidth on the limited Internet infrastructure of the time.[4] Because of this metadata-only approach, the index could not search within documents and relied instead on exact or pattern-based matches against these surface-level attributes.[15]
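The difference between exact and pattern-based matching can be shown with a small sketch. The mode names (`exact`, `sub`, `subcase`, `regex`) are modeled on the search types offered by Archie's telnet interface, but the semantics here are assumptions, including the use of Python's unanchored `re.search` for regular expressions, which may differ from Archie's own regex dialect. Only names are matched, so a word that appears solely inside a file is never found.

```python
import re
from enum import Enum


class Mode(Enum):
    EXACT = "exact"      # whole file name must equal the query
    SUB = "sub"          # case-insensitive substring
    SUBCASE = "subcase"  # case-sensitive substring
    REGEX = "regex"      # regular expression (Python syntax, unanchored)


def matches(filename: str, query: str, mode: Mode) -> bool:
    """Test one file name against the query under the chosen match mode."""
    if mode is Mode.EXACT:
        return filename == query
    if mode is Mode.SUB:
        return query.lower() in filename.lower()
    if mode is Mode.SUBCASE:
        return query in filename
    return re.search(query, filename) is not None


def search(filenames: list[str], query: str, mode: Mode) -> list[str]:
    """Match only against names; file contents are never consulted."""
    return [name for name in filenames if matches(name, query, mode)]


names = ["xarchie.tar.Z", "Archie-1.3.tar.Z", "README"]
sub_hits = search(names, "archie", Mode.SUB)
subcase_hits = search(names, "archie", Mode.SUBCASE)
exact_hits = search(names, "README", Mode.EXACT)
regex_hits = search(names, r"^[Aa]rchie-\d", Mode.REGEX)
content_hits = search(names, "compiler", Mode.SUB)  # a word inside a file cannot be found
```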