ArXiv
from Wikipedia

arXiv (pronounced as "archive"—the X represents the Greek letter chi ⟨χ⟩)[1] is an open-access repository of electronic preprints and postprints (known as e-prints) approved for posting after moderation, but not peer reviewed. It consists of scientific papers in the fields of mathematics, physics, astronomy, electrical engineering, computer science, quantitative biology, statistics, mathematical finance, and economics, which can be accessed online. In many fields of mathematics and physics, almost all scientific papers are self-archived on the arXiv repository before publication in a peer-reviewed journal. Some publishers also grant permission for authors to archive the peer-reviewed postprint. Begun on August 14, 1991, arXiv.org passed the half-million-article milestone on October 3, 2008,[2][3] had hit a million by the end of 2014[4][5] and two million by the end of 2021.[6][7] As of November 2024, the submission rate is about 24,000 articles per month.[8]

History

A screenshot of the arXiv taken in 1994,[9] using the browser NCSA Mosaic. At the time, HTML forms were a new technology.
arXiv's yearly submission rate growth over 30 years since its beginning with topics labeled by the standard abbreviations used on arxiv.org[10]

arXiv was made possible by the compact TeX file format, which allowed scientific papers to be easily transmitted over the Internet and rendered client-side.[11] Around 1990, Joanne Cohn began emailing physics preprints to colleagues as TeX files, but the number of papers being sent soon filled mailboxes to capacity.[12] Paul Ginsparg recognized the need for central storage, and in August 1991 he created a central repository mailbox stored at the Los Alamos National Laboratory (LANL) that could be accessed from any computer.[13] Additional modes of access were soon added: FTP in 1991, Gopher in 1992, and the World Wide Web in 1993.[5][14] The term e-print was quickly adopted to describe the articles.

It began as a physics archive, called the LANL preprint archive, but soon expanded to include astronomy, mathematics, computer science, quantitative biology and, most recently, statistics. Its original domain name was xxx.lanl.gov. Due to LANL's lack of interest in the rapidly expanding technology, in 2001 Ginsparg changed institutions to Cornell University and changed the name of the repository to arXiv.org.[15] Ginsparg brainstormed the new name with his wife; the domain "archive" was already claimed, so "chi" was replaced with "X" standing in as the Greek letter chi and the "e" dropped for symmetry around the "X".[16]

arXiv was an early adopter and promoter of preprints.[17] Its success in sharing preprints was one of the precipitating factors that led to the later movement in scientific publishing known as open access.[17] Mathematicians and scientists regularly upload their papers to arXiv.org for worldwide access[18] and sometimes for reviews before they are published in peer-reviewed journals. Ginsparg was awarded a MacArthur Fellowship in 2002 for his establishment of arXiv.[19] The annual budget for arXiv was approximately $826,000 for 2013 to 2017, funded jointly by Cornell University Library, the Simons Foundation (in both gift and challenge grant forms) and annual fee income from member institutions.[20] This model arose in 2010, when Cornell sought to broaden the financial funding of the project by asking institutions to make annual voluntary contributions based on the amount of download usage by each institution. Each member institution pledges a five-year funding commitment to support arXiv. Based on institutional usage ranking, the annual fees are set in four tiers from $1,000 to $4,400. Cornell's goal is to raise at least $504,000 per year through membership fees generated by approximately 220 institutions.[21]

In September 2011, Cornell University Library took overall administrative and financial responsibility for arXiv's operation and development. Ginsparg was quoted in the Chronicle of Higher Education as joking that it "was supposed to be a three-hour tour, not a life sentence".[22] However, Ginsparg remains on the arXiv's Scientific Advisory Board and its Physics Advisory Committee.[23][24]

In January 2022, arXiv began assigning DOIs to articles, in collaboration with DataCite.[25]

Data format


Each arXiv paper has a unique identifier:

  • YYMM.NNNNN, e.g. 1507.00123,
  • YYMM.NNNN, e.g. 0704.0001,
  • arch-ive/YYMMNNN for older papers, e.g. hep-th/9901001.

Different versions of the same paper are specified by a version number at the end. For example, 1709.08980v1. If no version number is specified, the default is the latest version.
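
The three identifier patterns and the optional version suffix can be recognised with two regular expressions. The following is a minimal Python sketch, not arXiv's own parser: the names `ArxivId` and `parse_arxiv_id` are hypothetical, and the old-style pattern assumes archive names consist of lowercase letters and hyphens with an optional two-letter subject class (as in `math.DG`).

```python
import re
from typing import NamedTuple, Optional


class ArxivId(NamedTuple):
    archive: Optional[str]  # e.g. "hep-th" for old-style identifiers, else None
    year: int               # two-digit year exactly as written in the identifier
    month: int              # 1..12
    number: str             # sequence number, kept as text to preserve leading zeros
    version: Optional[int]  # None means "latest version"


# New style: YYMM.NNNN or YYMM.NNNNN, with an optional vN suffix.
_NEW = re.compile(r"^(\d{2})(\d{2})\.(\d{4,5})(?:v(\d+))?$")
# Old style: archive/YYMMNNN, with an optional vN suffix (assumed archive syntax).
_OLD = re.compile(r"^([a-z-]+(?:\.[A-Z]{2})?)/(\d{2})(\d{2})(\d{3})(?:v(\d+))?$")


def parse_arxiv_id(text: str) -> ArxivId:
    """Split an identifier into its parts, raising ValueError if it matches no pattern."""
    match = _NEW.match(text)
    if match:
        archive = None
        yy, mm, number, version = match.groups()
    else:
        match = _OLD.match(text)
        if not match:
            raise ValueError(f"not a recognised arXiv identifier: {text!r}")
        archive, yy, mm, number, version = match.groups()
    month = int(mm)
    if not 1 <= month <= 12:
        raise ValueError(f"month out of range in {text!r}")
    return ArxivId(archive, int(yy), month, number, int(version) if version else None)
```

For example, `parse_arxiv_id("1709.08980v1")` yields version 1, while `parse_arxiv_id("1709.08980")` leaves the version as `None`, which a caller would treat as a request for the latest version.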

arXiv uses a category system. Each paper is tagged with one or more categories. Some categories have two layers. For example, q-fin.TR is the "Trading and Market Microstructure" category within "quantitative finance". Other categories have one layer. For example, hep-ex is "high energy physics experiments".
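
A category tag can be split into its one or two layers at the first dot. This is a small illustrative sketch; the function name `split_category` is hypothetical and the code does not check the tag against arXiv's actual category list.

```python
from typing import Optional, Tuple


def split_category(tag: str) -> Tuple[str, Optional[str]]:
    """Return (archive, subject_class); subject_class is None for one-layer categories."""
    archive, dot, subject = tag.partition(".")
    # Reject empty tags and tags that end in a dangling dot such as "q-fin.".
    if not archive or (dot and not subject):
        raise ValueError(f"malformed category tag: {tag!r}")
    return archive, (subject or None)
```

With the examples above, `split_category("q-fin.TR")` gives `("q-fin", "TR")` and `split_category("hep-ex")` gives `("hep-ex", None)`.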

Moderation process and endorsement


Although arXiv is not peer reviewed, a collection of moderators for each area review the submissions; they may recategorize any that are deemed off-topic,[26] or reject submissions that are not scientific papers, or sometimes for undisclosed reasons.[27] The lists of moderators for many sections of arXiv are publicly available,[28] but moderators for most of the physics sections remain unlisted.

Additionally, an "endorsement" system was introduced in 2004 as part of an effort to ensure content is relevant and of interest to current research in the specified disciplines.[29] Under the system, for categories that use it, an author must be endorsed by an established arXiv author before being allowed to submit papers to those categories. Endorsers are not asked to review the paper for errors but to check whether the paper is appropriate for the intended subject area.[26] New authors from recognized academic institutions generally receive automatic endorsement, which in practice means that they do not need to deal with the endorsement system at all. However, the endorsement system has attracted criticism for allegedly restricting scientific inquiry.[30][31]

A majority of the e-prints are also submitted to journals for publication, but some work, including some very influential papers, remains purely as e-prints and is never published in a peer-reviewed journal. A well-known example of the latter is an outline of a proof of Thurston's geometrization conjecture, including the Poincaré conjecture as a particular case, uploaded by Grigori Perelman in November 2002.[32] Perelman appears content to forgo the traditional peer-reviewed journal process, stating: "If anybody is interested in my way of solving the problem, it's all there [on the arXiv] – let them go and read about it".[33] Despite this non-traditional method of publication, other mathematicians recognized this work by offering the Fields Medal and Clay Mathematics Millennium Prizes to Perelman, both of which he refused.[34]

While arXiv does contain some dubious e-prints, such as those claiming to refute famous theorems or to prove famous conjectures such as Fermat's Last Theorem using only high-school mathematics, a 2002 article in Notices of the American Mathematical Society described those as "surprisingly rare".[35] arXiv generally re-classifies these works, e.g. in "General mathematics", rather than deleting them;[36] however, some authors have voiced concern over the lack of transparency in the arXiv screening process.[27]

Withdrawn preprints


It has been reported that 14,000 preprints have been withdrawn at arXiv, most commonly due to "crucial errors".[37] A smaller number of withdrawals were due to the preprint being subsumed by another publication. The report itself was posted on arXiv in December 2024.

Submission formats


Papers can be submitted in any of several formats, including LaTeX, and PDF printed from a word processor other than TeX or LaTeX. The submission is rejected by the arXiv software if generating the final PDF file fails, if any image file is too large, or if the total size of the submission is too large. arXiv now allows one to store and modify an incomplete submission, and only finalize the submission when ready. The time stamp on the article is set when the submission is finalized.

Access

A screenshot of viewing a paper's abstract on arxiv.org in 2021

The standard access route is through the arXiv.org website, which is publicly accessible and does not require an account. Other interfaces and access routes have also been created by unaffiliated organisations.

Metadata for arXiv is made available through OAI-PMH, the standard for open access repositories.[38] Content is therefore indexed in all major consumers of such data, such as BASE, CORE and Unpaywall. As of 2020, the Unpaywall dump links over 500,000 arxiv URLs as the open access version of a work found in CrossRef data from the publishers, making arXiv a top 10 global host of green open access.
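
Harvesting over OAI-PMH amounts to issuing HTTP requests with a `verb` parameter and a handful of standard arguments. The sketch below only builds a `ListRecords` request URL using the protocol's standard argument names (`metadataPrefix`, `set`, `from`, `resumptionToken`); it performs no network access. The function name `list_records_url` is hypothetical, and the base URL is deliberately a placeholder: the real arXiv endpoint should be taken from arXiv's own OAI documentation.

```python
from typing import Optional
from urllib.parse import urlencode


def list_records_url(
    base_url: str,
    metadata_prefix: str = "oai_dc",  # Dublin Core, the format every OAI-PMH repository must offer
    set_spec: Optional[str] = None,
    from_date: Optional[str] = None,
    resumption_token: Optional[str] = None,
) -> str:
    """Build an OAI-PMH ListRecords request URL."""
    if resumption_token is not None:
        # The protocol makes resumptionToken an exclusive argument:
        # follow-up pages carry only the verb and the token.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec is not None:
            params["set"] = set_spec
        if from_date is not None:
            params["from"] = from_date
    return f"{base_url}?{urlencode(params)}"


# Placeholder endpoint (an assumption, not arXiv's real address).
print(list_records_url("https://repository.example/oai", from_date="2020-01-01"))
```

A harvester would fetch the first URL, read the `resumptionToken` element from the XML response, and request further pages with the token until none is returned.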

Finally, researchers can select sub-fields and receive daily e-mailings or RSS feeds of all submissions in them.

Copyright status of files

Files on arXiv can have a number of different copyright statuses:[39]

  1. Some are public domain, in which case they will have a statement saying so.
  2. Some are available under either the Creative Commons 4.0 Attribution-ShareAlike license or the Creative Commons 4.0 Attribution-Noncommercial-ShareAlike license.
  3. Some are copyright to the publisher, but the author has the right to distribute them and has given arXiv a non-exclusive irrevocable license to distribute them.
  4. Most are copyright to the author, and arXiv has only a non-exclusive irrevocable license to distribute them.

Citations

  1. ^ Steele, Bill (Fall 2012). "Library-managed 'arXiv' spreads scientific advances rapidly and worldwide". Ezra. Ithaca, New York: Cornell University. p. 9. OCLC 263846378. Archived from the original on January 11, 2015. Pronounce it 'archive'. The X represents the Greek letter chi [ χ ].
  2. ^ Ginsparg, Paul (2011). "It was twenty years ago today ...". arXiv:1108.2700 [cs.DL].
  3. ^ "Online Scientific Repository Hits Milestone: With 500,000 Articles, arXiv Established as Vital Library Resource". News.library.cornell.edu. October 3, 2008. Retrieved July 21, 2013.
  4. ^ Vence, Tracy (December 29, 2014), "One Million Preprints and Counting: A conversation with arXiv founder Paul Ginsparg", The Scientist
  5. ^ a b Staff (January 13, 2015). "In the News: Open Access Journals". Drug Discovery & Development.
  6. ^ "Monthly Submissions". arxiv.org. Retrieved May 16, 2023.
  7. ^ "Reports – arXiv info". info.arxiv.org. Retrieved May 16, 2023.
  8. ^ "arXiv monthly submission rate statistics". Arxiv.org. Retrieved November 19, 2024.
  9. ^ "Image" (GIF). Cs.cornell.edu. Retrieved March 9, 2019.
  10. ^ Ginsparg, Paul (August 4, 2021). "Lessons from arXiv's 30 years of information sharing". Nature Reviews Physics. 3 (9): 602–603. Bibcode:2021NatRP...3..602G. doi:10.1038/s42254-021-00360-z. ISSN 2522-5820. PMC 8335983. PMID 34377944.
  11. ^ O'Connell, Heath (2002). "Physicists Thriving with Paperless Publishing" (PDF). High Energy Physics Libraries Webzine. 6 (6): 3. arXiv:physics/0007040. Bibcode:2000physics...7040O. Archived (PDF) from the original on October 9, 2022.
  12. ^ Feder, Toni (November 8, 2021). "Joanne Cohn and the email list that led to arXiv". Physics Today. 2021 (4) 1108a. Bibcode:2021PhT..2021d1108.. doi:10.1063/PT.6.4.20211108a. S2CID 244015728.
  13. ^ Feder, Toni (November 8, 2021). "Joanne Cohn and the email list that led to arXiv". Physics Today. 2021 (4) 1108a. Bibcode:2021PhT..2021d1108.. doi:10.1063/PT.6.4.20211108a. S2CID 244015728.
  14. ^ Ginsparg, Paul (October 1, 2008). "The global-village pioneers". Physics World. Retrieved October 10, 2020.
  15. ^ Butler, Declan (July 5, 2001). "Los Alamos Loses Physics Archive as Preprint Pioneer Heads East". Nature. 412 (6842): 3–4. Bibcode:2001Natur.412....3B. doi:10.1038/35083708. PMID 11452262. S2CID 1527860.
  16. ^ Han, Sheon. "Inside arXiv—the Most Transformative Platform in All of Science". Wired. ISSN 1059-1028. Retrieved March 28, 2025.
  17. ^ a b "Celebrating 30 Years of arXiv and Its Lasting Legacy on Scientific Advancement". SPARC. October 25, 2021.
  18. ^ Glanz, James (May 1, 2001). "The World of Science Becomes a Global Village; Archive Opens a New Realm of Research". The New York Times.
  19. ^ Bill Steele (September 23, 2002). "Cornell professor Paul Ginsparg, science communication rebel, named a MacArthur Foundation fellow; three other alumni also receive 'genius award' fellowships". Cornell Chronicle. Archived from the original on October 27, 2021.
  20. ^ "Cornell University Library arXiv Financial Projections for 2013-2017" (PDF). Confluence.cornell.edu. March 28, 2012. Retrieved February 26, 2017.
  21. ^ "arXiv Member Institutions (2021) – arXiv about – Our Members". arXiv.org. Retrieved December 27, 2021.
  22. ^ Fischman, Joah (August 10, 2011). "The First Free Research-Sharing Site, arXiv, Turns 20 With an Uncertain Future". Chronicle of Higher Education. Retrieved August 12, 2011.
  23. ^ "arXiv Scientific Advisory Board | arXiv e-print repository". arxiv.org. Retrieved October 10, 2020.
  24. ^ "About the Physics Archive | arXiv e-print repository". arxiv.org. Retrieved October 10, 2020.
  25. ^ "New arXiv articles are now automatically assigned DOIs". Retrieved April 4, 2023.
  26. ^ a b McKinney, Michelle (2011), "ArXiv.org", Reference Reviews, 25 (7): 35–36, doi:10.1108/09504121111168622
  27. ^ a b Merali, Zeeya (January 29, 2016). "ArXiv rejections lead to spat over screening process". Nature. doi:10.1038/nature.2016.19267. S2CID 189061969. Retrieved December 14, 2017.
  28. ^ "Current arXiv moderators". Arxiv.org. Retrieved October 3, 2024.
  29. ^ Ginsparg, Paul (2006), "As we may read", Journal of Neuroscience, 26 (38): 9606–9608, doi:10.1523/JNEUROSCI.3161-06.2006, PMC 6674456, PMID 16988030
  30. ^ Greechie, Richard; Pulmannova, Sylvia; Svozil, Karl (July 2005), "Preface to the Proceedings of Quantum Structures 2002", International Journal of Theoretical Physics, 44 (7): 691–692, Bibcode:2005IJTP...44..691G, doi:10.1007/s10773-005-7053-z, S2CID 121442106, The new endorsement system may contribute to an effective barrier, a digital divide
  31. ^ Josephson, Brian (February 23, 2005). "Vital resource should be open to all physicists". Nature. 433 (7028): 800. Bibcode:2005Natur.433..800J. doi:10.1038/433800a. PMID 15729314.
  32. ^ Perelman, Grisha (November 11, 2002). "The entropy formula for the Ricci flow and its geometric applications". arXiv:math.DG/0211159.
  33. ^ Lobastova, Nadejda; Hirst, Michael (August 21, 2006). "Maths genius living in poverty". Sydney Morning Herald.
  34. ^ Kaufman, Marc (July 2, 2010), "Russian mathematician wins $1 million prize, but he appears to be happy with $0", Washington Post
  35. ^ Jackson, Allyn (2002). "From Preprints to E-prints: The Rise of Electronic Preprint Servers in Mathematics" (PDF). Notices of the American Mathematical Society. 49 (1): 23–32.
  36. ^ Ginsparg, Paul (August 2011). "ArXiv at 20". Nature. 476 (7359): 145–147. Bibcode:2011Natur.476..145G. doi:10.1038/476145a. ISSN 0028-0836. PMID 21833066. S2CID 4421407.
  37. ^ Rao, Delip; Young, Jonathan; Dietterich, Thomas; Callison-Burch, Chris (2024). "WithdrarXiv: A Large-Scale Dataset for Retraction Study". arXiv:2412.03775 [cs.CL].
  38. ^ "Open Archives Initiative (OAI)". arxiv.org. Retrieved April 25, 2020.
  39. ^ "arXiv License Information". Arxiv.org. Retrieved July 21, 2013.

from Grokipedia
arXiv is a free, open-access online repository and distribution service for electronic preprints (e-prints) of scholarly articles, primarily in the fields of physics, mathematics, computer science, quantitative biology, quantitative finance, statistics, electrical engineering and systems science, and economics. Established in August 1991 by physicist Paul Ginsparg at Los Alamos National Laboratory, arXiv began as an automated e-mail distribution system for preprints in high-energy physics theory, addressing the need for faster sharing of research than traditional journals allowed. It moved to a web-based platform in 1993 and expanded to additional disciplines, reaching over 500,000 articles by 2008, one million by 2014, and two million by 2022. Since 2001, arXiv has been operated and maintained by Cornell University, supported by academic institutions, libraries, and philanthropic contributions, with no submission or access fees.

As of February 2026, the platform hosts over 2.9 million scholarly articles, with around 24,000 new submissions processed each month from a global community of researchers. The cs.AI category was especially active in early February 2026, with 381 new submissions recorded on Monday, 2 February 2026. Recent examples include:
  • Exploring Reasoning Reward Model for Agents
  • Routing the Lottery: Adaptive Subnetworks for Heterogeneous Data
  • A multi-algorithm approach for operational human resources workload balancing in a last mile urban delivery system

The complete, current list of AI papers appears under arXiv's new and recent submissions for cs.AI; papers are added continuously and announced in daily batches. In October 2025, arXiv changed its practice for the computer science category: review and position papers are accepted only after acceptance at a peer-reviewed journal or conference, a response to a surge in low-quality AI-generated submissions. As of February 2026, no further changes specific to AI papers have been announced; the general submission guidelines apply across all categories, including cs.AI, and the official arXiv help pages carry the latest details.

Submissions are moderated by over 240 volunteer experts in the relevant fields, alongside automated checks, to confirm that they are scientific in nature and appropriate for the archive. This is not peer review; it keeps the archive broadly accessible while upholding community standards. arXiv has revolutionized scientific publishing by enabling rapid dissemination of findings, promoting open access, and facilitating breakthroughs, such as key research papers and seminal works like the 2002–2003 solution to the Poincaré conjecture by Grigori Perelman. With approximately 5 million monthly active users, it serves as a cornerstone of modern research infrastructure, influencing the development of other preprint servers and earning recognition as one of the most transformative tools in science.

History

Founding and Early Years

arXiv was founded by physicist Paul Ginsparg in 1991 at the Los Alamos National Laboratory (LANL), where he developed it as a centralized, automated distribution system for preprints in theoretical high-energy physics. Motivated by the inefficiency of exchanging physical preprints and the growing use of e-mail lists for sharing files among physicists, Ginsparg created the initial archive under the domain xxx.lanl.gov to automate the collection, storage, and dissemination of these documents. The system focused exclusively on high-energy physics theory (hep-th), serving the need for rapid sharing within this specialized community.

The first submission arrived at the archive's e-mail address on August 14, 1991, marking the official start of operations. Designed for a small user base of about 100 physicists, the archive processed submissions by e-mail, generating compressed files and distributing them daily to subscribers. In its inaugural year, arXiv received 353 submissions, far exceeding Ginsparg's conservative estimate of around 100 per year, as adoption spread rapidly by word of mouth in the hep-th community. By 1992, submissions had grown to over 1,000, reflecting the system's usefulness in accelerating feedback and collaboration.

Early growth posed significant challenges: volume surged from hundreds to thousands of submissions per year by the mid-1990s, straining LANL's computational resources and requiring manual oversight of file processing. To cope, Ginsparg incorporated user feedback and expanded the hep-th archive to handle diverse formats while keeping the focus on unrefereed preprints. In 1993, the system moved to a web interface, enabling browser-based browsing and submission; this also allowed decentralized remote archives to be merged back into the central repository, streamlining operations. In 1994, with grant support, further enhancements were made, including a rewrite of the code. By June 1995, automated file generation had been implemented for submissions, further reducing administrative burdens and improving accessibility.

Expansion and Institutional Changes

In 2001, arXiv's founder Paul Ginsparg returned to Cornell University from Los Alamos National Laboratory, relocating the archive's operations to Cornell and rebranding it as arXiv.org under the stewardship of Cornell University Library. This move marked a pivotal institutional shift, enabling sustained academic oversight and integration into a university setting; Cornell has formally operated arXiv since then.

arXiv expanded its subject coverage to foster interdisciplinary growth. The Quantitative Biology (q-bio) archive was added on September 15, 2003, to accommodate experimental, numerical, statistical, and mathematical contributions relevant to biology. The Statistics (stat) archive followed on April 1, 2007, organizing content into categories such as Applications, Methodology, and Theory to better serve statistical research across domains like biology and engineering. The Economics (econ) archive was added in September 2017, initially focusing on Econometrics before expanding to areas like General Economics and Theoretical Economics.

In early 2015, arXiv integrated with ORCID, allowing users to link their unique researcher identifiers to arXiv accounts for improved attribution and cross-platform connectivity of scholarly works. This period also saw rapid scaling, with total submissions surpassing 1 million by the end of 2014, reflecting arXiv's growing role as a central hub for preprint dissemination. Institutionally, operations moved in 2018 from Cornell University Library to Cornell's Computing and Information Science unit, strengthening technical infrastructure while maintaining academic governance.

In recent years through 2025, arXiv has bolstered support for the artificial intelligence and machine learning categories (cs.AI and cs.LG) amid surging submissions in these fields, including refinements to handle increased volume and interdisciplinary overlaps. In October 2025, arXiv updated its moderation policy for the computer science category: review and position papers are no longer accepted unless they have already passed peer review, a response to AI-generated spam submissions. In November 2025, arXiv Labs paused acceptance of new experimental project proposals to prioritize ongoing initiatives.

Partnerships with journals for direct submissions have also advanced, enabling transfers from arXiv preprints to peer-reviewed outlets; for instance, General Relativity and Gravitation implemented arXiv-integrated workflows in its Editorial Manager system, while open-access journals like those under SCOAP³ facilitate direct posting and review pipelines. Funding has remained crucial, with primary support from Cornell University, supplemented by grants from the Simons Foundation and the National Science Foundation, including over $10 million combined in 2023 for infrastructure upgrades and sustainability.

Purpose and Scope

Subject Categories

arXiv employs a hierarchical category system to organize submissions into distinct subject areas, supporting targeted discovery and dissemination within scientific communities. This scheme divides content into primary archives, each representing a broad discipline, with further subdivisions into specialized subcategories. Papers are grouped so that users can browse, search, and subscribe to updates by specific interest. Primary categories include Physics, Mathematics (math), Computer Science (cs), Quantitative Biology (q-bio), Statistics (stat), Electrical Engineering and Systems Science (eess), Quantitative Finance (q-fin), and Economics (econ).

Within each primary category, subcategories provide finer granularity. For instance, the Physics archive encompasses several specialized areas such as Astrophysics (astro-ph), Condensed Matter (cond-mat), and General Relativity and Quantum Cosmology (gr-qc). The subcategory Astrophysics (astro-ph) further branches into divisions like Astrophysics of Galaxies (astro-ph.GA), Cosmology and Nongalactic Astrophysics (astro-ph.CO), and High Energy Astrophysical Phenomena (astro-ph.HE), enabling precise classification of research topics. Similar hierarchical structures apply across other archives; for example, Computer Science includes subfields like Artificial Intelligence (cs.AI) and Machine Learning (cs.LG). This nested organization supports cross-listing, where a submission can appear in multiple relevant categories to increase its visibility.

The category system originated with arXiv's founding in 1991, initially limited to physics subfields like high-energy physics, reflecting its roots in serving particle physicists. Over time, it expanded to encompass interdisciplinary areas, with mathematics added in the early 1990s, computer science in 1993, Quantitative Biology in 2003, Statistics in 2007, Quantitative Finance in 2008, and more recent additions like Electrical Engineering and Systems Science and Economics in 2017. This evolution transformed arXiv from a physics-centric repository into a multidisciplinary platform hosting over 2.8 million e-prints across the quantitative sciences, as of November 2025.

Submissions are announced through category-specific e-mail lists, which send subscribers daily digests of new abstracts. These announcements, sent Sunday through Thursday, include titles, authors, and summaries, helping researchers follow developments in their fields without manual searching. Users can subscribe to individual categories or subcategories via requests to arXiv's automated system.

While comprehensive for the quantitative and physical sciences, arXiv's categories deliberately exclude the humanities and most social sciences, focusing instead on areas amenable to preprint sharing and mathematical rigor. This scope aligns with its mission to accelerate dissemination in fields where rapid feedback is valuable, leaving qualitative disciplines to other repositories.

Content Types and Policies

arXiv primarily hosts preprints of scholarly papers, with research articles forming the core content type, encompassing original research contributions across its subject areas. Accepted materials also include review articles, summaries or excerpts from theses and dissertations, and occasionally books or book chapters when they align with scholarly standards. These submissions enable rapid dissemination of scientific work prior to formal peer review, fostering open access to emerging research.

Submissions must adhere to strict policies ensuring topical relevance to arXiv's categories (such as physics, mathematics, computer science, and quantitative biology) and must represent original, refereeable scholarly contributions that follow established norms of academic communication. Prohibited content includes patents, standalone software code without accompanying scholarly narrative, non-academic materials like blog posts or opinion pieces, abstracts alone, course projects, poster summaries, and proposals without substantive results. Political, offensive, or non-scientific content is also rejected during moderation.

In the computer science category, a policy update effective from October 2025 restricts review articles and position papers to those previously accepted by a peer-reviewed journal or conference, aiming to prioritize original research amid rising submission volumes potentially influenced by AI-generated content. As of early 2026, no additional guidelines unique to AI papers in the cs.AI category have been introduced, and the general submission policies continue to apply across all categories, including requirements for PDF format (preferably generated from LaTeX), appropriate category and subject class selection, and potential endorsement for new submitters in moderated categories.

While arXiv imposes no rigid page limits, typical research preprints span 10 to 50 pages, and file size is capped at 50 MB to ensure efficient processing; oversized submissions may require compression of figures or other optimizations. Format expectations emphasize machine-readable documents, such as those generated from TeX/LaTeX source, with complete references, clear author lists, single spacing, 10-14 point font, and 1-inch margins, and without line numbers, watermarks, or advertisements.

Registered users can submit to established categories without prior endorsement, though new submitters and those entering endorsement-gated categories receive a flag for moderator review to verify community affiliation. This system helps maintain quality while allowing broad access.

Ethical guidelines strictly prohibit plagiarism and require originality. All submissions undergo moderation checks for text overlap with existing works, including prior arXiv postings or publications, and authors are notified to explain or revise if excessive similarity is detected; such flags do not imply misconduct but ensure transparency. Authors must affirm that submissions are not duplicates of prior arXiv entries and should note any concurrent submissions to other preprint servers to avoid redundancy, though arXiv postings do not constitute prior publication and are compatible with submission to journals.

Operations

Submission Process

Authors seeking to submit preprints to arXiv must first create a free account through the platform's registration process, which requires an e-mail address for verification and basic personal details to establish authorship identity. Registration is open to anyone, though institutional e-mail addresses from recognized domains may ease access to certain categories. Optionally, authors can link their ORCID identifier to their arXiv account during or after registration to connect their scholarly outputs across platforms.

Once registered, an author starts a submission by logging in to the arXiv user page and selecting "START NEW SUBMISSION", then chooses a primary category from arXiv's subject areas, such as physics, mathematics, or computer science. The submission guidelines are uniform across all categories, including artificial intelligence (cs.AI), with no rules or changes specific to AI papers in 2026. Next, the author uploads the source files, preferably TeX/LaTeX as a ZIP archive containing the main document and ancillary files, or alternatively a single PDF if source is unavailable. Alongside the upload, the author enters the metadata: title, abstract (limited to 1920 characters), author names and affiliations, comments, and journal references if applicable. At this stage, authors must affirm compliance with arXiv's content policies, confirming that the submission is topical, original, and suitable for scholarly communication.

For TeX/LaTeX submissions, arXiv's automated compilation system processes the source files server-side after upload to generate a PDF, using tools such as pdfLaTeX and supporting common packages. The resulting PDF incorporates all figures and equations, and authors are notified if errors require resubmission. PDF-only submissions bypass compilation but must embed all fonts and be self-contained.

First-time submitters to most categories must obtain an endorsement from an established arXiv author in that field before their submission can proceed. Endorsement serves to verify the submitter's legitimacy and relevance to the category, and it can be requested through the arXiv interface by identifying potential endorsers via related papers or institutional affiliations. Once endorsed, the submission is queued for processing; categories with high submission volumes or specific moderation needs may impose additional checks, but endorsement is the primary gatekeeping mechanism for newcomers.

Upon successful submission and processing, the preprint is assigned a unique arXiv identifier (e.g., arXiv:YYMM.NNNNN) and announced in arXiv's daily e-mail listings and web updates, typically within 1-2 business days if submitted before the cutoff time, excluding weekends. Since January 2022, all new arXiv articles have been automatically assigned a Digital Object Identifier (DOI) through collaboration with DataCite, formatted as 10.48550/arXiv.YYMM.NNNNN, to improve long-term citability and metadata interoperability.
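
The relationship between an arXiv identifier and its DataCite DOI can be illustrated in a few lines of Python. This is a sketch under stated assumptions rather than arXiv's own code: the function name `arxiv_doi` is hypothetical, it handles only new-style identifiers (four digits, a dot, then four or five digits), and it assumes the DOI is the prefix `10.48550/arXiv.` followed by the identifier with any version suffix removed.

```python
import re

# New-style identifier, optionally prefixed with "arXiv:" and suffixed with a version.
_NEW_STYLE = re.compile(r"^(?:arXiv:)?(\d{4}\.\d{4,5})(?:v\d+)?$")


def arxiv_doi(identifier: str) -> str:
    """Return the DOI assumed to correspond to a new-style arXiv identifier."""
    match = _NEW_STYLE.match(identifier.strip())
    if not match:
        raise ValueError(f"not a new-style arXiv identifier: {identifier!r}")
    # The version suffix is dropped: the DOI names the article, not one revision.
    return f"10.48550/arXiv.{match.group(1)}"


print(arxiv_doi("arXiv:2412.03775v2"))
```

Dropping the version keeps a single DOI stable across revisions, matching the convention that an unversioned identifier refers to the latest version.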

Moderation and Endorsement

arXiv employs an endorsement system to ensure that submitters are part of the relevant scientific community, operating on a hierarchical model in which established authors endorse newcomers. New users, or those submitting to a category for the first time, must secure an endorsement from a qualified arXiv author in that domain before proceeding. To qualify as an endorser, an individual must have authored a certain number of papers within the endorsement domain; the threshold varies by subject area (e.g., three for some categories).
Once endorsed, submissions enter the moderation process, overseen by volunteer moderators who are subject experts appointed by arXiv's advisory committees. These approximately 240 moderators, distributed across arXiv's categories, check submissions for topical appropriateness, compliance with technical and formatting standards, and adherence to scholarly norms. They also look for issues such as plagiarism or falsified data, but they do not peer review the scientific content. The process focuses on verifying that content is refereeable and suitable for the archive, with moderators spending limited time on their duties, ideally under 30 minutes per day. Authors are required to disclose any use of generative AI tools in the preparation of their submission.
Submissions are typically held for 1-2 days during moderation, with around 20% of the daily volume of 600-800 papers flagged for manual review; unflagged or cleared submissions are automatically announced in the next cycle, usually within 24 hours of resolution. Authors whose submissions are rejected in moderation can appeal by contacting the relevant moderators and providing additional justification for reconsideration, though repeated appeals without new information are not entertained.
As of 2025, arXiv has enhanced its moderation with increased automation, incorporating AI tools to flag potential spam, plagiarism, and AI-generated content, particularly in response to a surge in low-quality submissions in fields like computer science. This automation allows moderators to focus on higher-priority reviews. Related policies, such as requiring review and position papers in the CS category to be peer-reviewed and accepted by a conference or journal before submission, aim to stem the flood of low-quality automated papers.

Corrections and Withdrawals

arXiv allows authors to update their submissions after announcement through a versioning system: replacing the submission files creates a new version (e.g., v2, v3), incrementing the version number while preserving all prior versions as part of the permanent scientific record. Historical iterations remain accessible via the version history on the abstract page, supporting transparency in scholarly communication. Replacements after version 5 are limited to no more than once per week to manage announcement volume, and revisions beyond this point are not included in daily email mailings.
For minor corrections that do not warrant a full revision, authors can update specific metadata fields, such as adding or modifying journal references, DOIs, or report numbers, without generating a new version number or altering the announcement date. These changes are processed directly and appear immediately in the record, keeping bibliographic information accurate without disrupting the version lineage. Any substantive file replacement or significant content update, however, requires a new version.
Withdrawals are permitted for valid reasons, including significant errors, duplicate submissions, or ethical concerns, but they do not remove the article from the archive. Instead, a withdrawal creates a new version marked as "withdrawn," which includes a public explanation in the comments field but provides no access to the full text; previous versions remain fully available. arXiv policy explicitly prohibits withdrawals due to negative reception or criticism, emphasizing the platform's commitment to the integrity of the scholarly record. Authors must provide a clear rationale for the withdrawal through the submission interface, ensuring accountability.
Withdrawals are relatively rare: over 14,000 have been recorded across arXiv's history, out of approximately 2.6 million total submissions, or roughly 0.5%. This low incidence underscores the robustness of arXiv's initial moderation and endorsement processes, which help prevent problematic content upfront.
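The versioning rules above (new files create a new version, metadata-only edits do not, and a withdrawal is itself a new version without full text) can be modeled with a small data structure. This is only an illustrative sketch; the class and method names are hypothetical and do not reflect arXiv's internal implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Version:
    number: int
    full_text: Optional[str]  # None for a withdrawal notice
    comment: str = ""
    withdrawn: bool = False


@dataclass
class Record:
    arxiv_id: str
    versions: List[Version] = field(default_factory=list)
    journal_ref: Optional[str] = None

    def _append(self, full_text, comment, withdrawn):
        # Every new version gets the next number; earlier ones are kept.
        version = Version(len(self.versions) + 1, full_text, comment, withdrawn)
        self.versions.append(version)
        return version

    def submit_files(self, full_text, comment=""):
        """Initial submission or replacement: always creates a new version."""
        return self._append(full_text, comment, False)

    def update_journal_ref(self, journal_ref):
        """Metadata-only change: no new version is created."""
        self.journal_ref = journal_ref

    def withdraw(self, reason):
        """Withdrawal: a new version with a public reason and no full text."""
        if not reason.strip():
            raise ValueError("a withdrawal needs a stated reason")
        return self._append(None, reason, True)


if __name__ == "__main__":
    record = Record("2311.12345")
    record.submit_files("first draft")
    record.submit_files("corrected proof", comment="fixed typos")
    record.update_journal_ref("Hypothetical Journal 1 (2024)")
    record.withdraw("Error in the main lemma")
    for v in record.versions:
        print(f"v{v.number}", "withdrawn" if v.withdrawn else "available")
```

Keeping withdrawals as ordinary list entries is what makes the earlier versions remain retrievable in this model, mirroring the policy that nothing announced is deleted.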

Technical Infrastructure

File Formats and Standards

arXiv strongly prefers submissions in TeX/LaTeX source format, which allows automatic compilation into PDF, ensures consistency, and retains editable source files for future processing. This approach lets arXiv generate high-quality PDFs using its TeX Live distributions, including the default TeX Live 2025, which ships standard bibliography packages such as biblatex 3.20 and Biber 2.20. Authors must provide source files rather than a LaTeX-generated PDF; the system detects such PDFs and requests the underlying source, so that PDF-only uploads are avoided where editable sources exist. PDF-only submissions are accepted as an alternative, particularly for non-TeX documents, but they are discouraged for LaTeX-based work because of limitations in searchability and long-term preservation. Such submissions must be machine-readable, include all fonts, and avoid features like line numbers or watermarks.
Standards emphasize compatibility with arXiv's processing: documents should use standard LaTeX classes (e.g., article.cls or field-specific ones like revtex for physics), with figures in .ps or .eps format for traditional LaTeX (or .pdf, .jpg, .png for PDFLaTeX), and equations handled via core LaTeX math environments without non-standard extensions. Custom macros are permitted if included in the upload, but reliance on unsupported packages can lead to compilation failures.
Processing involves automated compilation on arXiv's servers, with support for hyperlinks via the hyperref package to improve navigation in the output PDF. The AutoTeX system, which iteratively attempted to resolve common errors like missing packages or figure conversions, was retired in April 2025 to streamline operations and was replaced with direct compilation and detailed error logs that authors use to fix issues manually.
In the 2020s, arXiv evolved its standards toward better preservation, recommending PDF/A compliance (ISO 19005-1) for direct PDF uploads to maintain visual fidelity and embeddability over time, independent of software changes. Source uploads during the submission process remain essential for the compiled outputs.
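The figure-format rule described above lends itself to a simple pre-upload check. The following Python sketch flags figure files whose extension does not match the chosen compilation route; the mapping is taken from the formats listed in this section, while the function name and the engine labels "latex" and "pdflatex" are assumptions made for the example, not arXiv tooling.

```python
from pathlib import PurePosixPath

# Figure extensions per compilation route, as listed in the text above.
FIGURE_FORMATS = {
    "latex": {".ps", ".eps"},
    "pdflatex": {".pdf", ".jpg", ".png"},
}

# Any extension that counts as a figure under at least one route.
ALL_FIGURE_EXTENSIONS = set().union(*FIGURE_FORMATS.values())


def incompatible_figures(filenames, engine):
    """Return figure files that the given engine is not expected to handle."""
    allowed = FIGURE_FORMATS[engine]
    flagged = []
    for name in filenames:
        suffix = PurePosixPath(name).suffix.lower()
        # Non-figure files (.tex, .bib, .sty, ...) are ignored.
        if suffix in ALL_FIGURE_EXTENSIONS and suffix not in allowed:
            flagged.append(name)
    return sorted(flagged)


if __name__ == "__main__":
    upload = ["main.tex", "refs.bib", "fig1.eps", "figs/fig2.png"]
    print(incompatible_figures(upload, "pdflatex"))
    print(incompatible_figures(upload, "latex"))
```

A check like this only catches extension mismatches; it does not replace reading the compilation log that arXiv returns after upload.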

Access and Retrieval Methods

Users primarily access arXiv content through the web interface at arXiv.org, where they can search using a simple keyword box or an advanced query syntax. The search supports field-specific prefixes such as "au:" for authors (e.g., au:Einstein), "ti:" for titles, "abs:" for abstracts, "cat:" for categories (e.g., cat:cs.AI), and "id:" for arXiv identifiers, combined with operators like AND, OR, and NOT, as well as phrase searches in quotes. This allows precise retrieval, with results displaying abstracts, authors, categories, and links to full texts, updated daily with new submissions.
For programmatic access, arXiv provides a RESTful API that supports querying metadata and abstracts via HTTP GET requests to https://export.arxiv.org/api/query, using the same advanced search syntax as the web interface (e.g., ?search_query=au:author+AND+cat:physics). The API returns results in Atom XML format, supports pagination through the start and max_results parameters for up to 30,000 items per query, and is designed for non-commercial, open access applications. Additionally, the OAI-PMH protocol supports metadata harvesting at https://export.arxiv.org/oai2, providing Dublin Core and arXiv-specific metadata for all articles in sets by category or date, with daily updates shortly after announcements.
Bulk retrieval options support large-scale access to arXiv's approximately 2.88 million articles as of November 2025. Metadata can be harvested comprehensively via OAI-PMH, while full-text files, including PDF versions and source tarballs, are available through Amazon S3 requester-pays buckets at s3://arxiv, organized by submission date and ID, allowing efficient downloads with tools like the AWS CLI. Although traditional mirrors were discontinued in September 2024, content remains distributed through archival services like the Internet Archive, which hosts snapshots and metadata dumps for redundancy.
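As a rough illustration of the API access described above, the following Python sketch builds a query URL from the documented parameters (search_query, start, max_results) and extracts a few fields from an Atom response using only the standard library. The helper names and the embedded sample feed are assumptions made for this example; the sample is hand-written and far shorter than a real response.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

API_URL = "https://export.arxiv.org/api/query"
ATOM = "{http://www.w3.org/2005/Atom}"  # Atom XML namespace


def build_query_url(search_query, start=0, max_results=10):
    """Encode the query parameters into a GET URL."""
    params = {
        "search_query": search_query,
        "start": start,
        "max_results": max_results,
    }
    return API_URL + "?" + urllib.parse.urlencode(params)


def fetch(url):
    """Download the raw Atom feed (network access required)."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8")


def parse_feed(xml_text):
    """Return id, title, and author names for each entry in an Atom feed."""
    root = ET.fromstring(xml_text)
    entries = []
    for entry in root.findall(ATOM + "entry"):
        title = entry.findtext(ATOM + "title", default="")
        entries.append({
            "id": entry.findtext(ATOM + "id", default=""),
            "title": " ".join(title.split()),  # collapse line breaks
            "authors": [
                author.findtext(ATOM + "name", default="")
                for author in entry.findall(ATOM + "author")
            ],
        })
    return entries


# Hand-written sample standing in for a real API response.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <id>http://arxiv.org/abs/2311.12345v2</id>
    <title>An Example
      Title</title>
    <author><name>A. Author</name></author>
    <author><name>B. Author</name></author>
  </entry>
</feed>"""

if __name__ == "__main__":
    print(build_query_url("au:Einstein AND cat:cs.AI", max_results=5))
    print(parse_feed(SAMPLE_FEED))
```

To page through a large result set, a caller would repeat the request while increasing start by max_results, ideally pausing between requests to keep the load on the service low.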
Third-party mobile applications, such as arXiv mobile for Android and Lib arXiv for iOS, provide on-the-go search and download capabilities using the API. Each arXiv article is assigned a unique identifier in the format YYMM.NNNNN (e.g., 2311.12345), which serves as a permanent link via https://arxiv.org/abs/YYMM.NNNNN, with versions denoted by a 'v#' suffix (e.g., v2). These IDs enable cross-referencing to external databases, including DOIs for published versions and PubMed IDs for biomedical content, integrated directly in article pages for easy navigation.
Retrieval options include direct downloads of the PDF (processed for readability), the source files (compressed archives), and plain-text abstracts from individual pages. For bulk operations, users can use the API or S3 for automated fetching, with source formats adhering to standard TeX conventions to ensure compatibility.
Authors retain full copyright ownership of their submissions to arXiv, granting the platform only a non-exclusive, irrevocable license to distribute and preserve the work publicly. This arrangement ensures that submitters keep all rights to their work without transferring ownership to arXiv or any third party. By default, arXiv does not impose a specific license on submissions, allowing authors to choose from the available options or leave the work unlicensed beyond the required distribution grant. Authors are strongly encouraged to select open licenses such as Creative Commons Attribution (CC BY) to promote reuse and broader dissemination, in line with arXiv's commitment to open access principles. In 2020, arXiv expanded its licensing options to include CC BY-NonCommercial-NoDerivatives (CC BY-NC-ND), providing flexibility for authors facing restrictive journal policies while still enabling non-commercial sharing with attribution.
arXiv's policies explicitly permit posting preprints even if the work is later published elsewhere, provided this complies with the publisher's rules, supporting self-archiving models in which authors deposit versions of their manuscripts in repositories like arXiv. For co-authored works, a single submitter can typically grant the license on behalf of the group under U.S. copyright law, but authors must ensure they have the authority to do so and resolve any potential conflicts with journal agreements. In the 2020s, funding agencies such as the National Science Foundation (NSF) have increasingly emphasized open licensing for grant-funded research, recommending CC BY or equivalent terms to facilitate reuse in public access plans. This push encourages arXiv submitters supported by NSF to adopt permissive licenses, strengthening the platform's role in compliant scholarly sharing.

Archival Policies

arXiv maintains a strong commitment to perpetual free access to its scholarly content: once an article is publicly announced, it cannot be completely deleted from the repository. Withdrawals are permitted by creating a new version marked as withdrawn, which replaces the default view but leaves all previous versions accessible with full text available. Deletions are only possible for submissions prior to announcement, although arXiv reserves the right to remove any submission in extreme cases, such as legal orders or verified infringements. This policy underscores arXiv's role as a permanent record of scientific preprints, preventing loss due to author regret or minor errors.
For long-term preservation, arXiv relies on the archival infrastructure developed by Cornell University, including redundant storage systems at Cornell and off-site locations to safeguard file integrity. The platform preserves submitted source files such as TeX/LaTeX when provided and maintains bitstream preservation to ensure the authenticity of original submissions. Historically, arXiv accepted PostScript files but has transitioned to emphasizing source submissions for better long-term preservation.
To facilitate offline access and broader preservation efforts, arXiv releases the full corpus through bulk data downloads, including metadata via OAI-PMH and full-text files via AWS S3 snapshots, updated regularly to capture the entire archive. These dumps enable researchers and institutions to create local mirrors for independent verification and use. Additionally, backups are distributed across partners; in 2025, the Technische Informationsbibliothek (TIB) in Germany established a dark archive containing a complete copy of arXiv's content, serving as a redundant safeguard against potential disruptions to the primary U.S.-based storage. arXiv's licensing framework supports such archival reuse by granting non-exclusive distribution rights.

Impact and Usage

Usage Statistics

arXiv has experienced significant growth in submissions since its launch, reaching a total of 2,884,283 papers as of November 15, 2025. In 2024, the platform received 244,031 new submissions, averaging approximately 20,336 per month, with monthly figures surpassing 24,000 by late 2024. Figures for 2025 indicate continued expansion, with over 284,000 new submissions in the first 11 months, suggesting an annual total exceeding 300,000. Monthly submissions also set new records in 2025, reaching 26,646 in September.
Download activity underscores arXiv's scale, with over 3.2 billion cumulative downloads recorded by the end of 2024. Annual downloads have grown substantially, reaching more than 552 million in 2023 based on monthly averages of 46 million. Usage is particularly concentrated in high-impact categories, such as computer science's machine learning category (cs.LG), which led submissions in October 2024 with thousands of papers and correspondingly high download volumes.
User engagement reflects a predominantly academic audience, with over 5 million monthly active users as of 2024. Geographically, the United States leads in contributions and usage, followed by a small group of other countries, several of them European, that together account for the majority of global activity.
Historically, submission growth was exponential during the 1990s, rising from a few dozen papers monthly to thousands by the decade's end. Since 2010, the platform has maintained a steady annual growth rate of 10-15%, driven by expansions in newer subject areas such as quantitative biology. These metrics are derived from arXiv's annual reports and integrated analytics tools.
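| Year | New Submissions | Approximate Annual Growth Rate (%) |
|------|-----------------|------------------------------------|
| 2010 | ~80,000         | -                                  |
| 2015 | ~105,000        | 12                                 |
| 2020 | ~170,000        | 10                                 |
| 2024 | 244,031         | 15                                 |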
This table illustrates post-2010 trends, highlighting consistent expansion.
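The arithmetic behind these figures can be reproduced with a few lines of Python. The sketch below computes the monthly average for 2024 and, as one possible way to summarize the table, the compound annual growth rate between two rows. The function names are hypothetical, and the table's rate column may have been derived differently (for example, as single-year growth), so the compound rates need not match it.

```python
def monthly_average(annual_total):
    """Average number of submissions per month for a given annual total."""
    return annual_total / 12


def compound_annual_growth(start_value, end_value, years):
    """Constant yearly growth rate that turns start_value into end_value."""
    return (end_value / start_value) ** (1 / years) - 1


if __name__ == "__main__":
    # 2024 total from the text, divided over twelve months.
    print(round(monthly_average(244_031)))

    # Annual downloads implied by a monthly average of 46 million.
    print(46_000_000 * 12)

    # Compound growth between consecutive table rows (approximate inputs).
    rows = [(2010, 80_000), (2015, 105_000), (2020, 170_000), (2024, 244_031)]
    for (y0, n0), (y1, n1) in zip(rows, rows[1:]):
        rate = compound_annual_growth(n0, n1, y1 - y0)
        print(f"{y0}-{y1}: {rate:.1%} per year")
```

Because the 2010, 2015, and 2020 totals are rounded estimates, the resulting rates are indicative only.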

Influence on Open Science

arXiv has significantly accelerated the pace of scientific research by enabling rapid dissemination of preprints, often months or years ahead of traditional journal publication. This immediacy fosters early collaboration, community feedback, and timely integration of findings into ongoing work, particularly in fast-evolving fields such as physics. Empirical analyses indicate that arXiv preprints gather citations more quickly and at higher rates than comparable papers without preprints, with one study finding that practices including preprinting correlate with up to 20% more citations overall.
As a pioneer in open access, arXiv established a scalable model for preprint repositories that has influenced discipline-specific platforms worldwide. Launched in 1991, it demonstrated the viability of free, immediate access to scholarly work, paving the way for SSRN in the social sciences (1994) and bioRxiv in biology (2013), which adopted similar structures for broader open dissemination. These successors expanded arXiv's vision, collectively serving millions of users and reinforcing open access as a cornerstone of modern scholarly communication.
However, arXiv's model has ignited ongoing debates about the necessity and timing of peer review in scholarly publishing. Proponents argue that preprints enhance transparency and speed, while critics contend that unvetted content can propagate errors or incomplete ideas, potentially undermining scientific rigor. Recent concerns have intensified around predatory preprints, including those generated by AI tools or paper mills, which flood servers with low-quality submissions; in response, arXiv implemented stricter moderation policies in 2025, such as requiring documentation of peer review for certain article types in computer science.
arXiv also integrates with traditional publishing through overlay journals, which conduct peer review on preprints hosted on the platform and provide permanent links to the original submissions. Examples include several mathematics journals, which rely on arXiv for hosting while adding a peer-review layer. Additionally, altmetrics tools track social media mentions, downloads, and online discussions of arXiv papers, offering a complementary measure of impact beyond journal citations and highlighting broader societal reach.
In the 2025 landscape, arXiv has become central to discussions on AI ethics, hosting preprints that rapidly share frameworks for responsible AI development, such as guidelines for bias mitigation and moral reasoning in large language models. This role extends to promoting global equity in research: arXiv's free access opens knowledge to researchers in developing countries, where paywalled journals often exacerbate disparities, though challenges like informal gatekeeping persist.
