ArXiv
arXiv (pronounced as "archive"—the X represents the Greek letter chi ⟨χ⟩)[1] is an open-access repository of electronic preprints and postprints (known as e-prints) approved for posting after moderation, but not peer reviewed. It consists of scientific papers in the fields of mathematics, physics, astronomy, electrical engineering, computer science, quantitative biology, statistics, mathematical finance, and economics, which can be accessed online. In many fields of mathematics and physics, almost all scientific papers are self-archived on the arXiv repository before publication in a peer-reviewed journal. Some publishers also grant permission for authors to archive the peer-reviewed postprint. Begun on August 14, 1991, arXiv.org passed the half-million-article milestone on October 3, 2008,[2][3] had hit a million by the end of 2014[4][5] and two million by the end of 2021.[6][7] As of November 2024, the submission rate is about 24,000 articles per month.[8]
History
arXiv was made possible by the compact TeX file format, which allowed scientific papers to be easily transmitted over the Internet and rendered client-side.[11] Around 1990, Joanne Cohn began emailing physics preprints to colleagues as TeX files, but the number of papers being sent soon filled mailboxes to capacity.[12] Paul Ginsparg recognized the need for central storage, and in August 1991 he created a central repository mailbox stored at the Los Alamos National Laboratory (LANL) that could be accessed from any computer.[13] Additional modes of access were soon added: FTP in 1991, Gopher in 1992, and the World Wide Web in 1993.[5][14] The term e-print was quickly adopted to describe the articles.
It began as a physics archive, called the LANL preprint archive, but soon expanded to include astronomy, mathematics, computer science, quantitative biology and, most recently, statistics. Its original domain name was xxx.lanl.gov. Due to LANL's lack of interest in the rapidly expanding technology, in 2001 Ginsparg changed institutions to Cornell University and changed the name of the repository to arXiv.org.[15] Ginsparg brainstormed the new name with his wife; the domain "archive" was already claimed, so "chi" was replaced with "X" standing in as the Greek letter chi and the "e" dropped for symmetry around the "X".[16]
arXiv was an early adopter and promoter of preprints.[17] Its success in sharing preprints was one of the precipitating factors that led to the later movement in scientific publishing known as open access.[17] Mathematicians and scientists regularly upload their papers to arXiv.org for worldwide access[18] and sometimes for reviews before they are published in peer-reviewed journals. Ginsparg was awarded a MacArthur Fellowship in 2002 for his establishment of arXiv.[19] The annual budget for arXiv was approximately $826,000 for 2013 to 2017, funded jointly by Cornell University Library, the Simons Foundation (in both gift and challenge grant forms) and annual fee income from member institutions.[20] This model arose in 2010, when Cornell sought to broaden the financial funding of the project by asking institutions to make annual voluntary contributions based on the amount of download usage by each institution. Each member institution pledges a five-year funding commitment to support arXiv. Based on institutional usage ranking, the annual fees are set in four tiers from $1,000 to $4,400. Cornell's goal is to raise at least $504,000 per year through membership fees generated by approximately 220 institutions.[21]
In September 2011, Cornell University Library took overall administrative and financial responsibility for arXiv's operation and development. Ginsparg was quoted in the Chronicle of Higher Education as joking that it "was supposed to be a three-hour tour, not a life sentence".[22] Ginsparg nevertheless remains on arXiv's Scientific Advisory Board and its Physics Advisory Committee.[23][24]
In January 2022, arXiv began assigning DOIs to articles, in collaboration with DataCite.[25]
Data format
Each arXiv paper has a unique identifier:
- YYMM.NNNNN, e.g. 1507.00123
- YYMM.NNNN, e.g. 0704.0001
- arch-ive/YYMMNNN for older papers, e.g. hep-th/9901001
Different versions of the same paper are specified by a version number at the end. For example, 1709.08980v1. If no version number is specified, the default is the latest version.
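The identifier shapes and the optional version suffix described above can be checked mechanically. The following is a minimal sketch, not arXiv's own parser: the regular expressions are assumptions derived only from the patterns and examples given here, and the names `ArxivId` and `parse_arxiv_id` are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class ArxivId(NamedTuple):
    scheme: str             # "new" (YYMM.NNNN or YYMM.NNNNN) or "old" (archive/YYMMNNN)
    archive: Optional[str]  # e.g. "hep-th" for old-style identifiers, otherwise None
    yymm: str               # two-digit year followed by two-digit month
    number: str             # sequence number within that month, kept as a string
    version: Optional[int]  # None means "latest version"


# Assumed patterns, inferred from the examples in the text (not an official grammar).
_NEW = re.compile(r"^(?P<yymm>\d{4})\.(?P<number>\d{4,5})(?:v(?P<version>\d+))?$")
_OLD = re.compile(
    r"^(?P<archive>[a-z-]+(?:\.[A-Z]{2})?)/(?P<yymm>\d{4})(?P<number>\d{3})"
    r"(?:v(?P<version>\d+))?$"
)


def parse_arxiv_id(text: str) -> ArxivId:
    """Split an identifier such as '1709.08980v1' or 'hep-th/9901001' into parts."""
    text = text.strip()
    if text.lower().startswith("arxiv:"):
        text = text[len("arxiv:"):]
    match, scheme = _NEW.match(text), "new"
    if match is None:
        match, scheme = _OLD.match(text), "old"
    if match is None:
        raise ValueError(f"not a recognised arXiv identifier: {text!r}")
    groups = match.groupdict()
    version = int(groups["version"]) if groups["version"] else None
    return ArxivId(scheme, groups.get("archive"), groups["yymm"], groups["number"], version)
```

For example, `parse_arxiv_id("1709.08980v1").version` is `1`, while `parse_arxiv_id("1709.08980").version` is `None`, which a caller would treat as a request for the latest version.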
arXiv uses a category system. Each paper is tagged with one or more categories. Some categories have two layers. For example, q-fin.TR is the "Trading and Market Microstructure" category within "quantitative finance". Other categories have one layer. For example, hep-ex is "high energy physics experiments".
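The one-layer and two-layer category tags can be separated with a single string split. This is an illustrative sketch only; the function name `split_category` is hypothetical, and it assumes the layers are always joined by a single dot, as in the two examples above.

```python
from typing import Optional, Tuple


def split_category(category: str) -> Tuple[str, Optional[str]]:
    """Return (archive, subject); subject is None for one-layer categories."""
    archive, dot, subject = category.strip().partition(".")
    if not archive or (dot and not subject):
        raise ValueError(f"malformed category tag: {category!r}")
    return archive, (subject if dot else None)
```

With this helper, `split_category("q-fin.TR")` gives `("q-fin", "TR")` and `split_category("hep-ex")` gives `("hep-ex", None)`.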
Moderation process and endorsement
Although arXiv is not peer reviewed, a collection of moderators for each area reviews the submissions; they may recategorize any that are deemed off-topic,[26] or reject submissions, either because they are not scientific papers or, sometimes, for undisclosed reasons.[27] The lists of moderators for many sections of arXiv are publicly available,[28] but moderators for most of the physics sections remain unlisted.
Additionally, an "endorsement" system was introduced in 2004 as part of an effort to ensure content is relevant and of interest to current research in the specified disciplines.[29] Under the system, for categories that use it, an author must be endorsed by an established arXiv author before being allowed to submit papers to those categories. Endorsers are not asked to review the paper for errors but to check whether the paper is appropriate for the intended subject area.[26] New authors from recognized academic institutions generally receive automatic endorsement, which in practice means that they do not need to deal with the endorsement system at all. However, the endorsement system has attracted criticism for allegedly restricting scientific inquiry.[30][31]
A majority of the e-prints are also submitted to journals for publication, but some work, including some very influential papers, remains purely as e-prints and is never published in a peer-reviewed journal. A well-known example of the latter is an outline of a proof of Thurston's geometrization conjecture, including the Poincaré conjecture as a particular case, uploaded by Grigori Perelman in November 2002.[32] Perelman appears content to forgo the traditional peer-reviewed journal process, stating: "If anybody is interested in my way of solving the problem, it's all there [on the arXiv] – let them go and read about it".[33] Despite this non-traditional method of publication, other mathematicians recognized this work by offering the Fields Medal and Clay Mathematics Millennium Prizes to Perelman, both of which he refused.[34]
While arXiv does contain some dubious e-prints, such as those claiming to refute famous theorems or to prove famous conjectures such as Fermat's Last Theorem using only high-school mathematics, a 2002 article in Notices of the American Mathematical Society described those as "surprisingly rare".[35] arXiv generally re-classifies these works, e.g. in "General mathematics", rather than deleting them;[36] however, some authors have voiced concern over the lack of transparency in the arXiv screening process.[27]
Withdrawn preprints
It has been reported that 14,000 preprints have been withdrawn from arXiv, most commonly due to "crucial errors".[37] A smaller number of the withdrawals were due to the preprint being subsumed by another publication. The report itself was posted to arXiv in December 2024.
Submission formats
Papers can be submitted in any of several formats, including LaTeX, and PDF printed from a word processor other than TeX or LaTeX. The submission is rejected by the arXiv software if generating the final PDF file fails, if any image file is too large, or if the total size of the submission is too large. arXiv now allows one to store and modify an incomplete submission, and only finalize the submission when ready. The time stamp on the article is set when the submission is finalized.
Access
The standard access route is through the arXiv.org website, which is publicly accessible and does not require an account. Other interfaces and access routes have also been created by unaffiliated organisations.
Metadata for arXiv is made available through OAI-PMH, the standard for open access repositories.[38] Content is therefore indexed in all major consumers of such data, such as BASE, CORE and Unpaywall. As of 2020, the Unpaywall dump links over 500,000 arXiv URLs as the open access version of a work found in CrossRef data from the publishers, making arXiv a top 10 global host of green open access.
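An OAI-PMH harvest is driven by plain HTTP requests whose query string names a protocol verb and its arguments. The sketch below only builds such a request URL and performs no network access. The verb `ListRecords` and the arguments `metadataPrefix`, `set` and `from` come from the OAI-PMH 2.0 protocol; the base URL `https://example.org/oai` is a placeholder rather than arXiv's endpoint, and the set name `cs` and the function name `list_records_url` are assumptions made for illustration.

```python
from typing import Optional
from urllib.parse import urlencode


def list_records_url(
    base_url: str,
    metadata_prefix: str = "oai_dc",   # Dublin Core, the format every OAI-PMH repository must offer
    set_spec: Optional[str] = None,    # optional subset to harvest (assumed name, e.g. "cs")
    from_date: Optional[str] = None,   # optional lower bound, formatted as YYYY-MM-DD
) -> str:
    """Build an OAI-PMH ListRecords request URL without sending it."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    if from_date:
        params["from"] = from_date
    return f"{base_url}?{urlencode(params)}"


# Placeholder base URL; substitute the endpoint given in the repository's documentation.
url = list_records_url("https://example.org/oai", set_spec="cs", from_date="2024-11-01")
```

A harvester would fetch this URL, parse the XML response, and repeat the request with the returned `resumptionToken` until the repository reports no further records.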
Finally, researchers can select sub-fields and receive daily e-mailings or RSS feeds of all submissions in them.
Copyright status of files
Files on arXiv can have a number of different copyright statuses:[39]
- Some are public domain, in which case they will have a statement saying so.
- Some are available under either the Creative Commons 4.0 Attribution-ShareAlike license or the Creative Commons 4.0 Attribution-Noncommercial-ShareAlike license.
- Some are copyright to the publisher, but the author has the right to distribute them and has given arXiv a non-exclusive irrevocable license to distribute them.
- Most are copyright to the author, and arXiv has only a non-exclusive irrevocable license to distribute them.
Citations
- ^ Steele, Bill (Fall 2012). "Library-managed 'arXiv' spreads scientific advances rapidly and worldwide". Ezra. Ithaca, New York: Cornell University. p. 9. OCLC 263846378. Archived from the original on January 11, 2015. "Pronounce it 'archive'. The X represents the Greek letter chi [ χ ]."
- ^ Ginsparg, Paul (2011). "It was twenty years ago today ...". arXiv:1108.2700 [cs.DL].
- ^ "Online Scientific Repository Hits Milestone: With 500,000 Articles, arXiv Established as Vital Library Resource". News.library.cornell.edu. October 3, 2008. Retrieved July 21, 2013.
- ^ Vence, Tracy (December 29, 2014), "One Million Preprints and Counting: A conversation with arXiv founder Paul Ginsparg", The Scientist
- ^ a b Staff (January 13, 2015). "In the News: Open Access Journals". Drug Discovery & Development.
- ^ "Monthly Submissions". arxiv.org. Retrieved May 16, 2023.
- ^ "Reports – arXiv info". info.arxiv.org. Retrieved May 16, 2023.
- ^ "arXiv monthly submission rate statistics". Arxiv.org. Retrieved November 19, 2024.
- ^ "Image" (GIF). Cs.cornell.edu. Retrieved March 9, 2019.
- ^ Ginsparg, Paul (August 4, 2021). "Lessons from arXiv's 30 years of information sharing". Nature Reviews Physics. 3 (9): 602–603. Bibcode:2021NatRP...3..602G. doi:10.1038/s42254-021-00360-z. ISSN 2522-5820. PMC 8335983. PMID 34377944.
- ^ O'Connell, Heath (2002). "Physicists Thriving with Paperless Publishing" (PDF). High Energy Physics Libraries Webzine. 6 (6): 3. arXiv:physics/0007040. Bibcode:2000physics...7040O. Archived (PDF) from the original on October 9, 2022.
- ^ Feder, Toni (November 8, 2021). "Joanne Cohn and the email list that led to arXiv". Physics Today. 2021 (4) 1108a. Bibcode:2021PhT..2021d1108.. doi:10.1063/PT.6.4.20211108a. S2CID 244015728.
- ^ Feder, Toni (November 8, 2021). "Joanne Cohn and the email list that led to arXiv". Physics Today. 2021 (4) 1108a. Bibcode:2021PhT..2021d1108.. doi:10.1063/PT.6.4.20211108a. S2CID 244015728.
- ^ Ginsparg, Paul (October 1, 2008). "The global-village pioneers". Physics World. Retrieved October 10, 2020.
- ^ Butler, Declan (July 5, 2001). "Los Alamos Loses Physics Archive as Preprint Pioneer Heads East". Nature. 412 (6842): 3–4. Bibcode:2001Natur.412....3B. doi:10.1038/35083708. PMID 11452262. S2CID 1527860.
- ^ Han, Sheon. "Inside arXiv—the Most Transformative Platform in All of Science". Wired. ISSN 1059-1028. Retrieved March 28, 2025.
- ^ a b "Celebrating 30 Years of arXiv and Its Lasting Legacy on Scientific Advancement". SPARC. October 25, 2021.
- ^ Glanz, James (May 1, 2001). "The World of Science Becomes a Global Village; Archive Opens a New Realm of Research". The New York Times.
- ^ Bill Steele (September 23, 2002). "Cornell professor Paul Ginsparg, science communication rebel, named a MacArthur Foundation fellow; three other alumni also receive 'genius award' fellowships". Cornell Chronicle. Archived from the original on October 27, 2021.
- ^ "Cornell University Library arXiv Financial Projections for 2013-2017" (PDF). Confluence.cornell.edu. March 28, 2012. Retrieved February 26, 2017.
- ^ "arXiv Member Institutions (2021) – arXiv about – Our Members". arXiv.org. Retrieved December 27, 2021.
- ^ Fischman, Joah (August 10, 2011). "The First Free Research-Sharing Site, arXiv, Turns 20 With an Uncertain Future". Chronicle of Higher Education. Retrieved August 12, 2011.
- ^ "arXiv Scientific Advisory Board | arXiv e-print repository". arxiv.org. Retrieved October 10, 2020.
- ^ "About the Physics Archive | arXiv e-print repository". arxiv.org. Retrieved October 10, 2020.
- ^ "New arXiv articles are now automatically assigned DOIs". Retrieved April 4, 2023.
- ^ a b McKinney, Michelle (2011), "ArXiv.org", Reference Reviews, 25 (7): 35–36, doi:10.1108/09504121111168622
- ^ a b Merali, Zeeya (January 29, 2016). "ArXiv rejections lead to spat over screening process". Nature. doi:10.1038/nature.2016.19267. S2CID 189061969. Retrieved December 14, 2017.
- ^ "Current arXiv moderators". Arxiv.org. Retrieved October 3, 2024.
- ^ Ginsparg, Paul (2006), "As we may read", Journal of Neuroscience, 26 (38): 9606–9608, doi:10.1523/JNEUROSCI.3161-06.2006, PMC 6674456, PMID 16988030
- ^ Greechie, Richard; Pulmannova, Sylvia; Svozil, Karl (July 2005), "Preface to the Proceedings of Quantum Structures 2002", International Journal of Theoretical Physics, 44 (7): 691–692, Bibcode:2005IJTP...44..691G, doi:10.1007/s10773-005-7053-z, S2CID 121442106. "The new endorsement system may contribute to an effective barrier, a digital divide"
- ^ Josephson, Brian (February 23, 2005). "Vital resource should be open to all physicists". Nature. 433 (7028): 800. Bibcode:2005Natur.433..800J. doi:10.1038/433800a. PMID 15729314.
- ^ Perelman, Grisha (November 11, 2002). "The entropy formula for the Ricci flow and its geometric applications". arXiv:math.DG/0211159.
- ^ Lobastova, Nadejda; Hirst, Michael (August 21, 2006). "Maths genius living in poverty". Sydney Morning Herald.
- ^ Kaufman, Marc (July 2, 2010), "Russian mathematician wins $1 million prize, but he appears to be happy with $0", Washington Post
- ^ Jackson, Allyn (2002). "From Preprints to E-prints: The Rise of Electronic Preprint Servers in Mathematics" (PDF). Notices of the American Mathematical Society. 49 (1): 23–32.
- ^ Ginsparg, Paul (August 2011). "ArXiv at 20". Nature. 476 (7359): 145–147. Bibcode:2011Natur.476..145G. doi:10.1038/476145a. ISSN 0028-0836. PMID 21833066. S2CID 4421407.
- ^ Rao, Delip; Young, Jonathan; Dietterich, Thomas; Callison-Burch, Chris (2024). "WithdrarXiv: A Large-Scale Dataset for Retraction Study". arXiv:2412.03775 [cs.CL].
- ^ "Open Archives Initiative (OAI)". arxiv.org. Retrieved April 25, 2020.
- ^ "arXiv License Information". Arxiv.org. Retrieved July 21, 2013.
General and cited sources
- Butler, Declan (2003). "Biologists Join Physics Preprint Club". Nature. 425 (6958): 548. Bibcode:2003Natur.425..548B. doi:10.1038/425548b. PMID 14534551. S2CID 4374168.
- Choi, Charles Q. (2003). "Biology's New Online Archive". The Scientist. Archived from the original on March 13, 2005. Retrieved June 21, 2005.
- Giles, Jim (2003). "Preprint Server Seeks Way to Halt Plagiarists". Nature. 426 (6962): 7. Bibcode:2003Natur.426Q...7G. doi:10.1038/426007a. PMID 14603280. S2CID 29003994.
- Ginsparg, Paul (1997). "Winners and Losers in the Global Research Village". The Serials Librarian. 30 (3–4): 83–95. doi:10.1300/J123v30n03_13.
- Halpern, Joseph Y. (1998). "A Computing Research Repository". D-Lib Magazine. 4 (11). doi:10.1045/november98-halpern.
- Halpern, Joseph Y. (2000). "CoRR: A computing research repository". ACM Journal of Computer Documentation. 24 (2): 41–48. arXiv:cs.DL/0005003. Bibcode:2000cs........5003H. doi:10.1145/337271.337274. S2CID 5453868.
- Luce, Richard E. (2001). "e-Prints Intersect the Digital Library: Inside the Los Alamos arXiv". Issues in Science and Technology Librarianship (29). doi:10.5062/F44B2Z95.
- McKiernan, Gerry (2000). "ArXiv.org: The Los Alamos National Laboratory e-print server" (PDF). International Journal on Grey Literature. 1 (3): 127–138. doi:10.1108/14666180010345564. Archived from the original (PDF) on May 5, 2005.
- Pinfield, Stephen (2001). "How Do Physicists Use an E-Print Archive? Implications for Institutional E-Print Services". D-Lib Magazine. 7 (12). doi:10.1045/december2001-pinfield.
- Quigley, Brian (2000). "Physics Databases and the Los Alamos e-Print Archive". EContent. 23 (5): 22–26.
- Taubes, Gary (1993). "Publication by Electronic Mail Takes Physics by Storm". Science. 259 (5099): 1246–1248. Bibcode:1993Sci...259.1246T. doi:10.1126/science.259.5099.1246. PMID 17732237.
- Warner, Simeon (2001). "Open Archives Initiative protocol development and implementation at arXiv". arXiv:cs/0101027.
- "What Is q-bio?". Open Access Now. 2004.
ArXiv
- [11] Exploring Reasoning Reward Model for Agents
- [12] Routing the Lottery: Adaptive Subnetworks for Heterogeneous Data
- [13] A multi-algorithm approach for operational human resources workload balancing in a last mile urban delivery system
For the complete and up-to-date list of AI papers from the last 24 hours, check arXiv's new or recent submissions in cs.AI, as papers are added continuously and batched daily.[14] In October 2025, arXiv updated its policies so that review and position papers in computer science are accepted only if they have already been accepted by a peer-reviewed venue, in response to a surge in low-quality AI-generated submissions. As of February 2026, no further announcements or changes specific to AI papers have been made beyond ongoing category activity; submission guidelines remain general and apply across all categories, including AI (cs.AI). Always check the official arXiv help pages for the latest details.[2][10][15][16]
History
Founding and Early Years
arXiv was founded by physicist Paul Ginsparg in 1991 at the Los Alamos National Laboratory (LANL), where he developed it as a centralized, automated email distribution system for preprints in theoretical high-energy physics. Motivated by the inefficiencies of physical preprint exchanges and the growing use of email lists for sharing TeX files among physicists, Ginsparg created the initial archive under the domain xxx.lanl.gov to automate the collection, storage, and dissemination of these documents. The system focused exclusively on high-energy physics theory (hep-th), addressing the need for rapid sharing within this specialized community.[21][22][3]
The first submission arrived by email on August 14, 1991, marking the official start of operations. Designed for a small user base of about 100 physicists, the archive processed submissions via email, generating compressed TeX files and distributing them daily to subscribers. In its inaugural year, arXiv received 353 submissions, far exceeding Ginsparg's conservative estimate of around 100 annually, as adoption spread rapidly by word of mouth in the hep-th community. By 1992, submissions had grown to over 1,000, reflecting the system's utility in accelerating feedback and collaboration.[21][23][2]
Early growth posed significant challenges: the volume surged from hundreds to thousands of submissions per year by the mid-1990s, straining LANL's computational resources and requiring manual oversight for file processing and quality control. To cope, Ginsparg integrated feedback from users and expanded the hep-th archive to handle diverse formats while maintaining its focus on unrefereed preprints. In 1993, the system transitioned to a web interface, enabling browser-based browsing and submission; this also allowed decentralized remote archives to be merged back into the central repository, streamlining operations. In 1994, with support from a National Science Foundation grant, enhancements were made, including rewriting the code in Perl. By June 1995, automated PostScript generation was implemented for submissions, further reducing administrative burdens and enhancing accessibility.[21][2][24]
Expansion and Institutional Changes
In 2001, arXiv's founder Paul Ginsparg returned to Cornell University from Los Alamos National Laboratory, relocating the archive's operations to Cornell and rebranding it as arXiv.org under the stewardship of Cornell University Library.[25][26] This move marked a pivotal institutional shift, enabling sustained academic oversight and integration into a university ecosystem, with arXiv formally operated by Cornell thereafter.[27]
arXiv expanded its subject coverage to foster interdisciplinary growth, adding the Quantitative Biology (q-bio) archive on September 15, 2003, to accommodate experimental, numerical, statistical, and mathematical contributions relevant to biology.[28][29] The Statistics (stat) archive was introduced on April 1, 2007, organizing content into categories such as Applications, Methodology, and Theory to better serve statistical research across domains like biology and engineering.[30] The Economics (econ) archive followed in September 2017, initially focusing on Econometrics before expanding to areas like General Economics and Theoretical Economics.[31][32]
In early 2015, arXiv integrated with ORCID, allowing users to link their unique researcher identifiers to arXiv accounts for improved attribution and cross-platform connectivity of scholarly works.[33] This period also saw rapid scaling, with total submissions surpassing 1 million by the end of 2014, reflecting arXiv's growing role as a central hub for preprint dissemination.[34][35] Institutionally, operations transitioned in 2018 from Cornell University Library to Cornell's Computing and Information Science unit, enhancing technical infrastructure while maintaining academic governance.[36][37]
In recent years through 2025, arXiv has bolstered support for artificial intelligence and machine learning categories (cs.AI and cs.LG) amid surging submissions in these fields, including refinements to handle increased volume and interdisciplinary overlaps. In October 2025, arXiv updated its moderation policy for the computer science category: review and position papers are no longer accepted unless they have already been accepted by a peer-reviewed venue, a change meant to address AI-generated spam submissions. Additionally, in November 2025, arXiv Labs paused acceptance of new experimental project proposals to prioritize ongoing initiatives. Partnerships with journals for direct submissions have also advanced, enabling seamless transfers from arXiv preprints to peer-reviewed outlets; for instance, General Relativity and Gravitation implemented arXiv-integrated workflows in its Editorial Manager system, while open-access journals such as those under SCOAP³ facilitate direct posting and review pipelines.[38][39]
Funding has remained crucial, with primary support from Cornell University, supplemented by grants from the National Science Foundation and the Simons Foundation, including over $10 million combined in 2023 for infrastructure upgrades and sustainability.[40][27][41][42][43]
Purpose and Scope
Subject Categories
arXiv employs a hierarchical classification system to organize submissions into distinct subject areas, facilitating targeted discovery and dissemination within scientific communities. This taxonomy divides content into primary archives, each representing a broad discipline, with further subdivisions into specialized subcategories. The system ensures that papers are grouped logically, allowing users to browse, search, and subscribe to updates by specific interests. Primary categories include Physics, Mathematics (math), Computer Science (cs), Quantitative Biology (q-bio), Statistics (stat), Electrical Engineering and Systems Science (eess), Quantitative Finance (q-fin), and Economics (econ).[1][44]
Within each primary category, subcategories provide finer granularity. For instance, the Physics archive encompasses several specialized areas such as Astrophysics (astro-ph), Condensed Matter (cond-mat), and General Relativity and Quantum Cosmology (gr-qc). The Astrophysics subcategory (astro-ph) further branches into divisions like Astrophysics of Galaxies (astro-ph.GA), Cosmology and Nongalactic Astrophysics (astro-ph.CO), and High Energy Astrophysical Phenomena (astro-ph.HE), enabling precise classification of research topics. Similar hierarchical structures apply across other archives; for example, Computer Science includes subfields like Artificial Intelligence (cs.AI) and Machine Learning (cs.LG). This nested organization supports cross-listing, where a submission can appear in multiple relevant categories to enhance visibility.[44][45]
The category system originated with arXiv's founding in 1991, initially limited to physics subfields like high-energy physics, reflecting its roots in serving particle physicists. Over time, it expanded to encompass interdisciplinary areas, with Mathematics added in the early 1990s, Computer Science in 1993, Quantitative Biology in 2003, Statistics in 2007, Quantitative Finance in 2008, and more recent additions like Economics and Electrical Engineering and Systems Science in 2017. This evolution transformed arXiv from a physics-centric repository into a multidisciplinary platform hosting over 2.8 million e-prints across quantitative sciences, as of November 2025.[46][1][7][10]
Submissions are announced through category-specific email lists, providing daily digests of new abstracts to subscribers. These announcements, sent Sunday through Friday, include titles, authors, and summaries, helping researchers stay abreast of developments in their fields without manual searching. Users can subscribe to individual categories or subcategories via email requests to arXiv's automated system.[47][48]
While comprehensive for quantitative and physical sciences, arXiv's categories deliberately exclude humanities and most social sciences, focusing instead on areas amenable to preprint sharing and mathematical rigor. This scope aligns with its mission to accelerate dissemination in fields where rapid feedback is valuable, leaving qualitative disciplines to other repositories.[1][7]
Content Types and Policies
arXiv primarily hosts preprints of scholarly papers, with research articles forming the core content type, encompassing original research contributions across its subject areas. Accepted materials also include review articles, summaries or excerpts from theses and dissertations, conference proceedings, and occasionally books or book chapters when they align with scholarly standards. These submissions enable rapid dissemination of scientific work prior to formal peer review, fostering open access to emerging research. As of October 2025, review articles and position papers submitted to the Computer Science category must already have been accepted by a peer-reviewed journal or conference.[49][17][42]
Submissions must adhere to strict policies ensuring topical relevance to arXiv's categories (such as physics, mathematics, computer science, and quantitative biology) and must represent original, refereeable scholarly contributions that follow established norms of academic communication. Prohibited content includes patents, standalone software code without accompanying scholarly narrative, non-academic materials like blog posts or opinion pieces, abstracts alone, course projects, poster summaries, and research proposals without substantive results. Political, offensive, or non-scientific content is also rejected during moderation. The computer science restriction on review articles and position papers is intended to prioritize original research amid rising submission volumes potentially influenced by AI-generated content. As of early 2026, no submission guidelines specific to AI papers in the cs.AI category have been introduced; the general policies continue to apply across all categories, including requirements for PDF format (preferably generated from LaTeX), appropriate category and subject class selection, and potential endorsement for new submitters in moderated categories.[50][17][42]
While arXiv imposes no rigid page limits, typical research preprints span 10 to 50 pages to maintain readability and focus, with file size capped at 50 MB to ensure efficient processing; oversized submissions may require compression of figures or other optimizations. Format expectations emphasize machine-readable documents, such as those generated from TeX/LaTeX source, with complete references, clear author lists, single spacing, 10-14 point font, and 1-inch margins, and without line numbers, watermarks, or advertisements.[51][52]
Non-endorsed submissions are permitted in established categories for registered users without prior endorsement requirements, though new submitters or those entering endorsement-gated categories receive a flag for moderator review to verify community affiliation. This system helps maintain quality while allowing broad access.[53][50]
Ethical guidelines strictly prohibit plagiarism and require originality. All submissions undergo moderation checks for text overlap with existing works, including prior arXiv postings or publications, and authors are asked to explain or revise if excessive similarity is detected; such flags do not imply misconduct but ensure transparency. Authors must affirm that submissions are not duplicates of prior arXiv entries and should note any concurrent submissions to other preprint servers to avoid redundancy, though arXiv postings do not constitute prior publication and are compatible with journal dual submissions.[54][55]
Operations
Submission Process
Authors seeking to submit preprints to arXiv must first create a free account through the platform's registration process, which requires providing an email address for verification and basic personal details to establish authorship identity.[50] This registration is open to anyone, though institutional email addresses from recognized domains may facilitate easier access to certain categories.[53] Optionally, authors can link their ORCID identifier to their arXiv account during or after registration to enhance the connection of their scholarly outputs across platforms.[56] Once registered, the submission process begins by logging into the arXiv user page and selecting "START NEW SUBMISSION."[50] Authors then choose an appropriate primary category from arXiv's subject areas, such as physics, mathematics, or computer science, to classify the work.[50] The submission guidelines are uniform across all categories, including artificial intelligence (cs.AI), with no unique rules or changes specific to AI papers in 2026. Papers must be submitted in PDF format (preferably generated from LaTeX), with an appropriate category and subject class selected, and may require endorsement for new submitters in moderated categories.[50] Next, they upload the source files, preferably in TeX/LaTeX format as a ZIP archive containing the main document and ancillary files, or alternatively a single PDF if source is unavailable.[57] Concurrently, authors enter essential metadata, including the title, abstract (limited to 1920 characters), author names and affiliations, comments, and journal references if applicable.[50] During this stage, authors must affirm compliance with arXiv's content policies, ensuring the submission is topical, original, and suitable for scholarly communication. 
For TeX/LaTeX submissions, arXiv's automated compilation system processes the source files to generate a PDF, using tools such as pdfLaTeX and supporting common packages.[57] Compilation occurs server-side after upload and produces a viewable PDF that incorporates all figures and equations; authors are notified if errors arise that require resubmission.[57] PDF-only submissions bypass compilation but must embed all fonts and be self-contained to ensure accessibility.[58]
First-time submitters to most categories are required to obtain an endorsement from an established arXiv author in that field before their submission can proceed.[53] Endorsement serves to verify the submitter's legitimacy and relevance to the category, and it can be requested through the arXiv interface by identifying potential endorsers via related papers or institutional affiliations.[53] Once endorsed, the submission is queued for processing; categories with high submission volumes or specific moderation needs may impose additional checks, but endorsement is the primary gatekeeping mechanism for newcomers.[53]
Upon successful submission and processing, the preprint is assigned a unique arXiv identifier (e.g., arXiv:YYMM.NNNNN) and announced in arXiv's daily email listings and web updates, typically within 1-2 business days if submitted before the cutoff time.[48] Since January 2022, all new arXiv articles have been automatically assigned a Digital Object Identifier (DOI) through collaboration with DataCite, formatted as 10.48550/arXiv.YYMM.NNNNN, to improve long-term citability and metadata interoperability.[59][60]
Moderation and Endorsement
arXiv employs an endorsement system to ensure that submitters are part of the relevant scientific community, operating on a hierarchical model in which established authors endorse newcomers. New users or those submitting to a category for the first time must secure an endorsement from a qualified arXiv author in that domain before proceeding. To qualify as an endorser, an author must have written a minimum number of papers within the endorsement domain; the threshold varies by subject area (e.g., 3 for some computer science categories).[53]
Once endorsed, submissions enter the moderation process, overseen by volunteer moderators who are subject experts appointed by arXiv's advisory committees. The approximately 240 moderators, distributed across arXiv's categories, review flagged submissions for topical appropriateness, compliance with technical and formatting standards, and adherence to scholarly communication norms. They also check for issues such as plagiarism or falsified data, but do not peer review the scientific content. The process focuses on verifying that content is refereeable and suitable for the archive, with moderators spending limited time (ideally under 30 minutes per day) on their duties. Authors are required to disclose any use of generative AI tools in the preparation of their submission.[61][15][17][49]
Submissions are typically held for 1-2 days during moderation, with around 20% flagged for manual review out of the daily volume of 600-800 papers; unflagged or cleared submissions are automatically announced in the next cycle, usually within 24 hours of resolution.
Authors whose submissions are rejected during moderation can appeal by contacting the relevant moderators and providing additional context for reconsideration, though repeated appeals without new information are not entertained.[61][62]
As of 2025, arXiv has enhanced its moderation with increased automation, incorporating AI tools to flag potential spam, plagiarism, and AI-generated content, particularly in response to a surge in low-quality submissions in fields like computer science. This lets moderators focus on higher-priority reviews. To combat floods of low-quality automated papers, arXiv also requires review and position papers in the CS category to have been peer-reviewed and accepted by a conference or journal before submission.[63][42][15]
Corrections and Withdrawals
arXiv allows authors to update their submissions after announcement through a versioning system: replacing the submission files creates a new version (e.g., v2, v3) while preserving all prior versions as part of the permanent scientific record.[64] Historical iterations remain accessible via the abstract page's version history, supporting transparency in scholarly evolution.[64] Replacements after version 5 are limited to no more than once per week to manage announcement volume, and revisions beyond this point are not included in daily email mailings.[65]
For minor corrections that do not warrant a full revision, authors can update specific metadata fields, such as journal references, DOIs, or report numbers, without generating a new version number or altering the announcement date.[66] These changes are processed directly and appear immediately in the record, keeping bibliographic information accurate without disrupting the version lineage.[66] However, any substantive file replacement or significant content update requires a new version submission.[65]
Withdrawals are permitted for valid reasons including significant errors, duplicate submissions, or ethical concerns, but they do not result in complete removal from the archive.[67] Instead, initiating a withdrawal creates a new version marked as "withdrawn," which includes a public explanation in the comments field but provides no access to the full text; previous versions remain fully available.[67] arXiv policy explicitly prohibits withdrawals due to negative reception or criticism, emphasizing the platform's commitment to maintaining the integrity of the scholarly record.[67] Authors must provide a clear rationale for the withdrawal through the submission interface, ensuring accountability.[67] Such withdrawals are relatively rare, with over 14,000 recorded across arXiv's history through 2024 out of approximately 2.6 million total submissions (roughly 0.5%).[68][10] This low incidence underscores the robustness of arXiv's initial moderation and endorsement processes, which help prevent problematic content upfront.[17]
Technical Infrastructure
File Formats and Standards
arXiv strongly prefers submissions in TeX/LaTeX source format, which allows automatic compilation into PDF, ensures consistency, and retains editable source files for future processing and accessibility.[57] arXiv generates PDFs using its TeX Live distributions, including the default TeX Live 2025, which ships standard bibliography tools such as biblatex 3.20 and Biber 2.20.[57] Authors are required to provide source files even if submitting a LaTeX-generated PDF; the system detects such PDFs and requests the underlying TeX code to avoid "PDF-only" uploads where editable sources are available.[58] PDF-only submissions are accepted as an alternative, particularly for non-TeX documents, but they are discouraged for LaTeX-based work because of limitations in searchability, editing, and long-term maintainability; such submissions must be machine-readable, include all fonts, and avoid features like line numbers or watermarks.[58]
Standards emphasize compatibility with arXiv's processing: documents should use standard LaTeX classes (e.g., article.cls or field-specific ones like revtex for physics), with figures in .ps or .eps format for traditional LaTeX (or .pdf, .jpg, .png for pdfLaTeX), and equations handled via core LaTeX math environments without non-standard extensions.[50] Custom macros are permitted if included in the upload, but reliance on unsupported packages can lead to compilation failures.[57] Processing involves automated compilation on arXiv's servers, with support for hyperlinks via the hyperref package to enhance navigability in the output PDF.[69] The AutoTeX system, which iteratively attempted to resolve common errors like missing packages or figure conversions, was retired in April 2025 to streamline operations and replaced with direct TeX Live compilation and detailed error logs that authors use to fix issues manually.[70]
In the 2020s, arXiv evolved its standards toward better preservation, recommending PDF/A compliance (ISO 19005-1) for direct PDF uploads to maintain visual fidelity and embeddability over time, independent of software changes.[58] Source uploads during the submission process remain essential for these compiled outputs.[50]
Access and Retrieval Methods
Users primarily access arXiv content through the web interface at arXiv.org, where they can search using a simple keyword box or an advanced query syntax. The search supports field-specific operators such as "au:" for authors (e.g., au:Einstein), "ti:" for titles, "abs:" for abstracts, "cat:" for categories (e.g., cat:cs.AI), and "id:" for arXiv identifiers, combined with Boolean operators like AND, OR, and NOT, as well as phrase searches in quotes.[71] This allows precise retrieval, with results displaying abstracts, authors, categories, and links to full texts, updated daily with new submissions.[72]
For programmatic access, arXiv provides a RESTful API that serves metadata and abstracts in response to HTTP GET requests to https://export.arxiv.org/api/query, using the same advanced search syntax as the web interface (e.g., ?search_query=au:author+AND+cat:physics). The API returns results in Atom XML format, supports pagination through the start and max_results parameters for up to 30,000 items per query, and is designed for non-commercial, open access applications.[71] Additionally, the OAI-PMH protocol facilitates metadata harvesting at https://export.arxiv.org/oai2, providing Dublin Core and arXiv-specific metadata for all articles, in sets by category or date, with daily updates shortly after announcements.[73] Such bulk retrieval options support large-scale access to arXiv's approximately 2.88 million articles as of November 2025.
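The query API described above can be exercised with the Python standard library alone. The sketch below builds a request URL from a search expression and the start and max_results parameters, and extracts titles from an Atom response. The helper names `build_query_url` and `parse_titles` are illustrative, and `SAMPLE_FEED` is a hand-written stand-in for a real response (not actual API output), so the example runs without network access.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

API_ENDPOINT = "https://export.arxiv.org/api/query"  # endpoint named in the text
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}    # standard Atom namespace


def build_query_url(search_query, start=0, max_results=10):
    """Build a GET URL for the query endpoint, including pagination parameters."""
    params = {"search_query": search_query, "start": start, "max_results": max_results}
    # urlencode percent-encodes ':' and turns spaces into '+'.
    return f"{API_ENDPOINT}?{urlencode(params)}"


def parse_titles(atom_xml):
    """Return the entry titles found in an Atom feed, with whitespace normalized."""
    root = ET.fromstring(atom_xml)
    titles = []
    for entry in root.findall("atom:entry", ATOM_NS):
        raw = entry.findtext("atom:title", default="", namespaces=ATOM_NS)
        titles.append(" ".join(raw.split()))
    return titles


# Hand-written sample in Atom format; a hypothetical entry for illustration only.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <id>http://arxiv.org/abs/2311.12345v2</id>
    <title>An Illustrative
      Title</title>
  </entry>
</feed>"""

if __name__ == "__main__":
    print(build_query_url("au:Einstein AND cat:cs.AI", max_results=5))
    print(parse_titles(SAMPLE_FEED))
```

To query the live service, the URL returned by `build_query_url` could be passed to `urllib.request.urlopen` and the response body handed to `parse_titles`; large result sets would be fetched in pages by advancing start.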
Metadata can be harvested comprehensively via OAI-PMH, while full-text files, including PDF versions and source tarballs (typically TeX), are available in bulk through Amazon S3 requester-pays buckets at s3://arxiv, organized by submission date and ID, allowing efficient downloads with tools like the AWS CLI.[74][75] Although traditional rsync mirrors were discontinued in September 2024, content remains distributed through archival services like the Internet Archive, which hosts snapshots and metadata dumps for redundancy.[76] Third-party mobile applications, such as arXiv mobile for Android and Lib arXiv for iOS, provide on-the-go search and download capabilities using the API.[77]
Each arXiv article is assigned a unique identifier in the format YYMM.NNNNN (e.g., 2311.12345), which serves as a permanent link via https://arxiv.org/abs/YYMM.NNNNN, with versions denoted by 'v#' (e.g., v2). These IDs enable cross-referencing to external databases, including DOIs for published versions and PubMed IDs for biomedical content, integrated directly in article pages for seamless navigation. Retrieval options include direct downloads of the PDF (processed for readability), source files (compressed archives), and plain-text abstracts from individual pages. For bulk operations, users can use the API or S3 for automated fetching, with source formats adhering to standard TeX conventions to ensure compatibility.
Legal Aspects
Copyright and Licensing
Authors retain full copyright ownership of their submissions to arXiv, granting the platform only a non-exclusive, irrevocable license to distribute and preserve the work publicly.[55] This arrangement ensures that submitters maintain all rights to their intellectual property without transferring ownership to arXiv or any third party.[55] By default, arXiv does not impose a specific license on submissions, allowing authors to choose from available options or leave the work unlicensed beyond the required distribution grant.[78] Authors are strongly encouraged to select open licenses such as Creative Commons Attribution (CC BY) to promote reuse and broader dissemination, aligning with arXiv's commitment to open access principles.[78] In 2020, arXiv expanded its licensing options to include CC BY-NonCommercial-NoDerivatives (CC BY-NC-ND), providing flexibility for authors facing restrictive journal policies while still enabling non-commercial sharing with attribution.[79]
arXiv's policies explicitly permit posting preprints even if the work is later published elsewhere, provided it complies with the publisher's self-archiving rules, supporting green open access models where authors deposit versions of their manuscripts in repositories like arXiv.[80] For co-authored works, a single submitter can typically grant the license on behalf of the group under U.S. copyright law, but authors must ensure they have authority to do so and resolve any potential conflicts with journal agreements.[80]
In the 2020s, funding agencies such as the National Science Foundation (NSF) have increasingly emphasized open licensing for grant-funded research, recommending CC BY or equivalent terms to facilitate reuse in public access plans.[81] This push encourages arXiv submitters supported by NSF to adopt permissive licenses, enhancing the platform's role in compliant scholarly sharing.[81]
Archival Policies
arXiv maintains a strong commitment to the perpetual free access of its scholarly content, ensuring that once an article is publicly announced, it cannot be completely deleted from the archive. Withdrawals are permitted by creating a new version marked as withdrawn, which replaces the default view but leaves all previous versions accessible with full text available. Deletions are only possible for submissions prior to announcement, and even then, arXiv reserves the right to remove any submission in extreme cases, such as legal orders or verified copyright infringements. This policy underscores arXiv's role as a permanent record of scientific preprints, preventing loss due to author regret or minor errors.[67][55]
For long-term preservation, arXiv relies on the archival infrastructure developed by Cornell University Library, including redundant storage systems at Cornell and off-site locations to safeguard file integrity. The platform preserves submitted source files like LaTeX when provided and maintains bitstream preservation to ensure the authenticity of original submissions. Historically, arXiv accepted PostScript files but has transitioned to emphasizing source submissions for better long-term accessibility.[46]
To facilitate offline access and broader preservation efforts, arXiv releases the full corpus through bulk data downloads, including metadata via OAI-PMH and full-text files via AWS S3 snapshots, updated regularly to capture the entire archive. These dumps enable researchers and institutions to create local mirrors for independent verification and use. Additionally, backups are distributed across partners; in 2025, the Technische Informationsbibliothek (TIB) in Germany established a dark archive containing a complete copy of arXiv's content, serving as a redundant safeguard against potential disruptions to the primary U.S.-based storage. arXiv's licensing framework supports such archival reuse by granting non-exclusive rights for distribution in support of open science.[73][46][82]
Impact and Usage
Usage Statistics
arXiv has experienced significant growth in submissions since its inception, reaching a total of 2,884,283 papers as of November 15, 2025.[10] In 2024, the platform received 244,031 new submissions, averaging approximately 20,336 per month, with monthly figures surpassing 24,000 by late 2024.[83] Figures for 2025 indicate continued expansion, with over 284,000 new submissions in the first 11 months, suggesting an annual total exceeding 300,000. Monthly submissions also set new records in 2025, reaching 26,646 in September.[10][84]
Download activity underscores arXiv's scale, with over 3.2 billion cumulative downloads recorded by the end of 2024.[83] Annual downloads have grown substantially, reaching more than 552 million in 2023 based on monthly averages of 46 million.[85] Usage is particularly concentrated in high-impact categories, such as computer science's machine learning subcategory (cs.LG), which led submissions in October 2024 with thousands of papers and corresponding high download volumes.[16]
User engagement reflects a predominantly academic audience, with over 5 million monthly active users as of 2024.[83] Geographically, the United States leads in contributions and usage, followed by China, the United Kingdom, Germany, and other European nations, accounting for the majority of global activity.[86]
Historically, submission growth was exponential during the 1990s, rising from a few dozen papers monthly to thousands by decade's end.[10] Post-2010, the platform has maintained a steady annual growth rate of 10-15%, driven by expansions in computer science and quantitative biology.[87] These metrics are derived from arXiv's annual reports and integrated analytics tools.[88]

| Year | New Submissions | Approximate Annual Growth Rate (%) |
|---|---|---|
| 2010 | ~80,000 | - |
| 2015 | ~105,000 | 12 |
| 2020 | ~170,000 | 10 |
| 2024 | 244,031 | 15 |
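
The arithmetic behind these figures can be checked with a short script. The sketch below recomputes the 2024 monthly average and derives a compound annual growth rate (CAGR) between consecutive rows of the table. Using CAGR is an assumption made here: the text does not say how the "Approximate Annual Growth Rate" column was derived, so the computed values need not match that column. The function names are illustrative.

```python
# Submission counts taken from the table above (2010-2020 values are approximate).
SUBMISSIONS = {2010: 80_000, 2015: 105_000, 2020: 170_000, 2024: 244_031}


def cagr_percent(start_count, end_count, years):
    """Compound annual growth rate between two counts, in percent."""
    return ((end_count / start_count) ** (1 / years) - 1) * 100


def growth_between_rows(counts):
    """Return {(start_year, end_year): CAGR in percent} for consecutive rows."""
    years = sorted(counts)
    return {
        (a, b): cagr_percent(counts[a], counts[b], b - a)
        for a, b in zip(years, years[1:])
    }


# 244,031 submissions spread over 12 months, as quoted in the text.
monthly_average_2024 = SUBMISSIONS[2024] / 12

if __name__ == "__main__":
    print(f"2024 monthly average: {monthly_average_2024:,.0f}")
    for (a, b), rate in growth_between_rows(SUBMISSIONS).items():
        print(f"{a}-{b}: {rate:.1f}% per year")
```

Because the rows are four or five years apart, a per-year rate depends on the averaging method chosen; CAGR is only one reasonable reading of the column.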
