LanguageTool

| LanguageTool | |
|---|---|
| Developers | Daniel Naber and Marcin Miłkowski |
| Initial release | 15 August 2005 |
| Stable release | 6.7[1] |
| Written in | Java |
| Platform | Java SE |
| Type | Grammar checker |
| License | GNU LGPL v2.1+ |
| Website | languagetool.org |
LanguageTool is a free and open-source grammar, style, and spell checker, and all its features are available for download.[4][5] The LanguageTool website connects to a proprietary sister project called LanguageTool Premium (formerly LanguageTool Plus), which provides improved error detection for English and German, as well as easier revision of longer texts, following the open-core model.
Overview
LanguageTool was started by Daniel Naber for his diploma thesis[6] in 2003 (then written in Python). It now supports 31 languages, each developed by volunteer maintainers, usually native speakers of the language.[7] Rules describing typical error patterns are created and then matched against a given text. The core application is free and open-source and can be downloaded for offline use. For some extra detections, certain languages use 'n-gram' data,[8] which is massive and demands considerable processing power and I/O speed. LanguageTool is therefore also offered as a web service that processes the n-gram data server-side. LanguageTool "Premium" also uses n-grams as part of its freemium business model.
The LanguageTool web service can be used via a web interface in a web browser, or via specialized client-side plug-ins for Microsoft Office, LibreOffice, TeXstudio, Apache OpenOffice, Vim, Emacs, Firefox, Thunderbird, and Google Chrome.[5]
LanguageTool does not check whether a sentence is grammatically correct, but whether it contains typical errors. It is therefore easy to invent ungrammatical sentences that LanguageTool will still accept. Errors are detected with a variety of rules written in XML or Java.[9] XML-based rules can be created using an online form.[10]
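As an illustration of the XML rule format, a hypothetical rule flagging the phrase "in the beginning of" might look roughly like the sketch below. The rule id, message wording, and examples are invented for this illustration, but the elements (`pattern`, `token`, `message`, `suggestion`, `example` with a `correction` attribute) follow LanguageTool's documented rule syntax:

```xml
<!-- Hypothetical rule sketch in LanguageTool's grammar.xml format -->
<rule id="IN_THE_BEGINNING_OF" name="in the beginning of → at the beginning of">
  <pattern>
    <token>in</token>
    <token>the</token>
    <token>beginning</token>
    <token>of</token>
  </pattern>
  <message>Did you mean <suggestion>at the beginning of</suggestion>?</message>
  <example correction="at the beginning of">He left <marker>in the beginning of</marker> May.</example>
  <example>He left at the beginning of May.</example>
</rule>
```

The two `example` elements double as test cases: the first must trigger the rule and accept the stated correction, the second must not trigger it.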
More recent developments rely on large n-gram libraries and artificial neural networks to offer suggestions for correcting misspellings.[11]
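The idea behind n-gram-based checking can be sketched as follows: for a confusion pair such as their/there, the variant that forms the more frequent n-gram with its surrounding words is preferred. The class and the trigram counts below are invented for illustration and are not LanguageTool's actual implementation:

```java
import java.util.HashMap;
import java.util.Map;

// Toy sketch of n-gram-based confusable-word checking. The trigram
// counts are invented for the example; real n-gram data sets are
// orders of magnitude larger.
public class NgramSketch {
    static final Map<String, Long> TRIGRAM_COUNTS = new HashMap<>();
    static {
        TRIGRAM_COUNTS.put("over there .", 9000L);
        TRIGRAM_COUNTS.put("over their .", 120L);
        TRIGRAM_COUNTS.put("their own house", 7500L);
        TRIGRAM_COUNTS.put("there own house", 80L);
    }

    // The context contains "_" where the candidate word goes; the
    // candidate forming the most frequent trigram wins.
    static String pick(String context, String... candidates) {
        String best = candidates[0];
        long bestCount = -1L;
        for (String c : candidates) {
            long n = TRIGRAM_COUNTS.getOrDefault(context.replace("_", c), 0L);
            if (n > bestCount) {
                bestCount = n;
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(pick("_ own house", "their", "there")); // their
        System.out.println(pick("over _ .", "their", "there"));    // there
    }
}
```

A real checker would only flag the written word when the alternative's probability exceeds it by a large margin, to keep false positives low.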
In April 2023, Learneo acquired LanguageTool.[12][13][14][15]
References
- ^ "Release 6.7". 10 October 2025. Retrieved 20 October 2025.
- ^ "Index of /download/". languagetool.org.
- ^ "Index of /download/ngram-data/". languagetool.org.
- ^ "LanguageTool - Spell and Grammar Checker". LanguageTool.
- ^ a b Ashwin (2019-07-08). "LanguageTool is a free, open-source, grammar and spell checker - gHacks Tech News". gHacks Technology News. Retrieved 2025-04-23.
- ^ Daniel Naber. "A Rule-Based Style and Grammar Checker" (PDF). Daniel Naber.de. Retrieved 30 June 2018.
- ^ "Supported languages". 28 December 2016. Retrieved 29 December 2016.
- ^ "N-Gram Data Download Page". languagetool.org. 2019-03-30. Retrieved 2019-03-30.
- ^ "Linux Administration", Pro Oracle Database 10g RAC on Linux, Berkeley, CA: Apress, pp. 385–400, 2006, doi:10.1007/978-1-4302-0214-1_15, ISBN 978-1-59059-524-4, retrieved 2022-02-23
- ^ "Create a new LanguageTool rule". community.languagetool.org. Retrieved 2023-10-26.
- ^ SKILL 2018 : Fachwissenschaftlicher Informatik-Kongress, Studierendenkonferenz Informatik, 26.-27. September 2018, Berlin. Gesellschaft für Informatik. [Bonn]. 2018. ISBN 978-3-88579-448-6. OCLC 1066024545.
- ^ Naber, Daniel. "LanguageTool joins Learneo".
- ^ "Learneo | Updates | Learneo, Inc. Accelerates AI Writing Innovation with LanguageTool Acquisition". www.learneo.com. Retrieved 2025-04-10.
- ^ Pathak, Shalini (2023-04-10). "US-Based Learneo Acquires Multilingual Writing Assistant LanguageTool – EdTechReview". Retrieved 2025-04-23.
- ^ Alston, Fiona (2023-04-05). "Learneo adds LanguageTool to its stable of AI-powered writing tools and services, in its latest acquisition". Tech.eu. Retrieved 2025-04-23.
External links
- LanguageTool

History and Development
Founding and Early Development
LanguageTool originated in 2003 when Daniel Naber developed it as part of his diploma thesis at the Technische Fakultät of Universität Bielefeld in Germany. The project was conceived as a rule-based tool for detecting style and grammar errors in text, addressing limitations in existing spell checkers by incorporating linguistic rules for more sophisticated analysis. Initially implemented in Python, it focused on basic grammar checking capabilities, including part-of-speech tagging and error pattern matching, with the software released as open-source under the GNU Lesser General Public License.[7]

The project gained public visibility in 2004 through its registration on SourceForge, marking the beginning of broader community involvement. By August 15, 2005, LanguageTool reached its initial public release as version 1.0, primarily supporting grammar rules for English and German to ensure reliable error detection in those languages. This version emphasized cross-platform usability and integrated with tools like OpenOffice.org, laying the groundwork for its adoption as a proofreading assistant.[8][9]

Early development encountered significant challenges in crafting precise rules, especially since many initial contributors were non-native speakers of the target languages, which sometimes led to inaccuracies in nuance detection. To address this, Naber established volunteer-based language teams, comprising linguists and enthusiasts who collaboratively developed and refined XML-based rules for error identification. These teams, numbering around 10 active members by the mid-2000s, played a crucial role in improving rule quality and fostering the tool's evolution into a multilingual resource.[9]

Key Milestones and Acquisitions
LanguageTool achieved a significant milestone in 2010 with the expansion of its proofreading capabilities to support over 20 languages, enabling broader multilingual error detection and establishing it as a versatile open-source tool.[10] This development coincided with the availability of its web interface, allowing users to access grammar and style checking online without local installation, which contributed to initial adoption among writers and developers.[10] The tool maintained a consistent release cadence, with stable versions issued approximately every six months to incorporate community-contributed rules and performance optimizations.[11]

In April 2023, LanguageTool was acquired by Learneo, Inc., marking a pivotal shift from its origins as a fully volunteer-driven open-source project to a hybrid model combining professional engineering resources with ongoing community contributions.[3] This acquisition facilitated accelerated innovation in AI-driven features while preserving the tool's open-source core, enabling integration with Learneo's suite of writing and learning platforms.[14]

In March 2025, with the release of version 6.6, Daniel Naber handed over maintenance responsibilities to Stefan Viol at LanguageTooler GmbH, transitioning to a snapshot-based release model to support ongoing development.[5] A notable recent update was version 6.7, released on October 10, 2025, which included refinements to suggestion algorithms, leveraging neural networks for more precise spellchecking and stylistic recommendations in supported languages.[12] These enhancements built on earlier neural integrations, allowing for better handling of ambiguous phrasing through probabilistic modeling.[13]

User growth reflected these advancements, expanding from a few thousand active contributors and early adopters in the mid-2000s to millions of users by 2025, fueled by the 2010 web service launch and subsequent browser extensions that reached over 3 million Chrome users alone.[1] The surge was further propelled by integrations into productivity tools and the rise of remote work, positioning LanguageTool as a staple for multilingual writing assistance.[15]

Core Functionality
Error Detection Mechanisms
LanguageTool detects errors across multiple categories, including grammar, spelling, punctuation, style, tonality, and typography, primarily through a rule-based system that combines pattern matching with part-of-speech (POS) tagging for contextual analysis.[4] Rules are defined in XML format, allowing matches against specific word sequences, POS tags (e.g., noun or verb forms), and regular expressions to identify issues such as subject-verb agreement errors or inconsistent punctuation usage.[4] Spelling errors are handled via integration with dictionaries like Hunspell, while style and tonality checks target overuse of passive constructions or informal phrasing in formal contexts.[7]

For more probabilistic error detection, LanguageTool leverages an approximately 8 GB n-gram dataset derived from Google's n-gram collection, which analyzes word sequence probabilities to flag confusable terms in context, such as "their" versus "there" based on surrounding phrases.[16] This method supports up to three-word n-grams and enhances accuracy for idiomatic or collocation-based errors in languages like English, German, French, and Spanish.[16]

Context-aware suggestions go beyond simple fixes by recommending rephrasings for improved clarity or formality; for instance, the tool can identify overuse of passive voice in sentences like "The report was written by the team" and suggest "The team wrote the report" to promote active voice.[17] Typography errors, such as improper hyphenation or spacing, are caught through pattern rules that enforce consistency.[4]

The system supports offline operation via its desktop application, which requires a download of about 252 MB for the standalone version, enabling local rule-based checks without internet connectivity.[18] In comparison, cloud-based processing provides access to the full n-gram dataset and additional AI-driven analysis for more nuanced suggestions.[16]

Language Support and Coverage
LanguageTool currently supports 31 languages and dialects, encompassing a wide range of linguistic diversity through its open-source framework.[19] This includes comprehensive grammar checking for major languages such as English (with variants for US, UK, Canada, Australia, New Zealand, and South Africa), German, French, Spanish, Dutch, and Portuguese, where the tool performs advanced error detection beyond basic spelling.[1][20] Partial support is available for other languages, including Russian and Arabic, which feature grammar rules but with fewer advanced checks compared to the primary languages.[19][21]

The development of language-specific rule sets is driven by a community of volunteer native speakers who contribute expertise to ensure cultural and idiomatic accuracy.[6] In total, 143 contributors have participated in building and maintaining the technology, focusing on tailoring rules to the nuances of each language.[6] This collaborative effort leverages the rule-based system to adapt checks for syntactic, semantic, and stylistic elements unique to individual languages.[19]

The depth of analysis varies significantly across supported languages, reflecting resource allocation and community maturity. For top languages like English and German, LanguageTool offers deep checks including style suggestions, tonality adjustments, and confusion pair resolutions to enhance clarity and professionalism.[19] In contrast, emerging or less-resourced languages, such as Swedish or Chinese, primarily provide basic spelling and grammar corrections, with limited style or advanced punctuation analysis due to fewer rules (e.g., 32 XML rules for Swedish).[19] This tiered approach ensures broad accessibility while prioritizing robust support for high-demand languages.

Community feedback plays a crucial role in ongoing enhancements, with over 20 million texts improved daily across all supported languages.[6] Users report errors and suggest refinements through platforms like GitHub and the LanguageTool community forum, enabling iterative updates to rule sets and expanding coverage for underrepresented languages.[19] This feedback loop has facilitated steady growth, with recent activity showing hundreds of rule changes in languages like Catalan and Portuguese over the past six months.[19]

Technical Architecture
Rule-Based System
LanguageTool's rule-based system forms the foundational engine for its error detection capabilities, implemented primarily in Java to ensure cross-platform compatibility and performance. This architecture processes input text by first tokenizing it into sentences and words, applying part-of-speech tags, and then matching rules defined in XML format against the tagged tokens. Rules are stored in language-specific files, such as grammar.xml, where each rule specifies an error pattern, a corrective message, and examples for validation. This declarative approach allows for modular extension without altering the core codebase.[4]
The XML rules support flexible pattern matching through <token> elements that can target exact words, lemmas, part-of-speech tags, or regular expressions, enabling detection of syntactic and stylistic issues. For instance, a simple regex-based grammar rule might identify redundant prepositions by matching patterns like <token regexp="yes">in</token> <token>the</token> <token regexp="yes">beginning</token>, flagging phrases such as "in the beginning of" and suggesting "at the beginning of" as a correction. More complex rules incorporate logical operators like OR (|), negation (<exception> or ^), and antipatterns to avoid false positives, such as excluding matches within quotes or specific contexts. This regex integration, combined with linguistic annotations, allows rules to handle nuances like subject-verb agreement or idiomatic expressions efficiently.[4]
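A minimal sketch of how such token patterns could be applied to a token stream is shown below. This is not LanguageTool's actual engine; the class and the hard-coded rule are invented for illustration, with each pattern element expressed as a regular expression tested against one token:

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Toy token-pattern matcher in the spirit of the XML rules described
// above (illustrative sketch only). Each pattern element is a regex
// applied to exactly one token, case-insensitively.
public class TokenRuleSketch {
    static final List<Pattern> PATTERN = Arrays.asList(
            Pattern.compile("in"),
            Pattern.compile("the"),
            Pattern.compile("beginning"),
            Pattern.compile("of"));
    static final String SUGGESTION = "at the beginning of";

    // Return the index of the first token where the whole pattern
    // matches, or -1 if the rule does not fire.
    static int findMatch(String[] tokens) {
        for (int i = 0; i + PATTERN.size() <= tokens.length; i++) {
            boolean allMatch = true;
            for (int j = 0; j < PATTERN.size(); j++) {
                if (!PATTERN.get(j).matcher(tokens[i + j].toLowerCase()).matches()) {
                    allMatch = false;
                    break;
                }
            }
            if (allMatch) return i;
        }
        return -1;
    }

    public static void main(String[] args) {
        String[] tokens = "In the beginning of the story".split(" ");
        int at = findMatch(tokens);
        if (at >= 0) {
            System.out.println("Match at token " + at + "; suggest: " + SUGGESTION);
        }
    }
}
```

Features like `<exception>` elements and antipatterns would extend this loop with per-token and per-context veto checks before a match is reported.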
To facilitate community involvement, LanguageTool provides an online rule editor at community.languagetool.org/ruleEditor2, where volunteers can create, test, and refine rules interactively without needing to write code or compile the software. The editor generates XML output directly, simulating matches against sample sentences and offering previews of error highlighting and suggestions, which streamlines contributions for grammar, style, and locale-specific checks across supported languages.[22]
For efficient parsing of text structures like sentences and clauses, the system leverages finite-state automata, particularly in components such as morphological analysis and dictionary lookups, to process tokenized input rapidly and scale to large documents. This approach compiles patterns and linguistic data into compact state machines, minimizing computational overhead during rule application.[23][24]
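The principle of finite-state lookup can be illustrated with a simple trie, where each node acts as an automaton state and each character consumes one transition. LanguageTool's real dictionaries are compact, precompiled automata, so the class below is only an illustrative sketch:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative trie-based dictionary: nodes are automaton states,
// map entries are character transitions. Lookup cost is proportional
// to word length, independent of dictionary size.
public class TrieSketch {
    static final class Node {
        final Map<Character, Node> next = new HashMap<>();
        boolean terminal; // true if a word ends in this state
    }

    private final Node root = new Node();

    void add(String word) {
        Node state = root;
        for (char c : word.toCharArray()) {
            state = state.next.computeIfAbsent(c, k -> new Node());
        }
        state.terminal = true;
    }

    // Follow one transition per character; reject on a missing edge
    // or a non-terminal final state.
    boolean contains(String word) {
        Node state = root;
        for (char c : word.toCharArray()) {
            state = state.next.get(c);
            if (state == null) return false;
        }
        return state.terminal;
    }

    public static void main(String[] args) {
        TrieSketch dict = new TrieSketch();
        dict.add("begin");
        dict.add("beginning");
        System.out.println(dict.contains("begin"));      // true
        System.out.println(dict.contains("beginnning")); // false
    }
}
```

Production FSA dictionaries additionally share common suffixes between words, which keeps even multi-million-entry morphological lexicons small enough to load quickly.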
The rule-based system is distributed under the GNU Lesser General Public License (LGPL) version 2.1 or later, which permits users to freely modify the source code, integrate it into other applications, and host custom servers for private or enterprise use while requiring that modifications remain open if redistributed.[25]