Language identification

from Wikipedia

In natural language processing, language identification or language guessing is the problem of determining which natural language given content is in. Computational approaches to this problem view it as a special case of text categorization, solved with various statistical methods.

Overview

There are several statistical approaches to language identification using different techniques to classify the data. One technique is to compare the compressibility of the text to the compressibility of texts in a set of known languages, an approach known as the mutual information based distance measure. The same technique can also be used to empirically construct family trees of languages which closely correspond to the trees constructed using historical methods.[citation needed] The mutual information based distance measure is essentially equivalent to more conventional model-based methods and is not generally considered to be either novel or better than simpler techniques.
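
As an illustration of the compression-based idea, the following sketch uses zlib as the compressor and measures how many extra bytes an unknown text adds when compressed together with a reference corpus; the reference strings and the language set are toy assumptions, not a real training setup.

```python
import zlib

def compressed_size(text: str) -> int:
    """Length in bytes of the zlib-compressed UTF-8 text."""
    return len(zlib.compress(text.encode("utf-8")))

def compression_distance(unknown: str, reference: str) -> int:
    """Extra bytes the unknown text costs when compressed together with a
    reference corpus: a small increase suggests similar statistics."""
    return compressed_size(reference + unknown) - compressed_size(reference)

# Illustrative reference corpora (real systems use much larger samples).
references = {
    "english": "the quick brown fox jumps over the lazy dog " * 20,
    "german":  "der schnelle braune fuchs springt ueber den faulen hund " * 20,
}

sample = "the dog jumps over the fox"
best = min(references, key=lambda lang: compression_distance(sample, references[lang]))
print(best)  # expected: english
```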

Another technique, as described by Cavnar and Trenkle (1994) and Dunning (1994), is to create a language n-gram model from a "training text" for each of the languages. These models can be based on characters (Cavnar and Trenkle) or encoded bytes (Dunning); in the latter, language identification and character encoding detection are integrated. Then, for any piece of text needing to be identified, a similar model is made, and that model is compared to each stored language model. The most likely language is the one whose model is most similar to the model built from the input text. This approach can be problematic when the input text is in a language for which there is no model; in that case, the method may return another, "most similar" language as its result. Also problematic for any approach are pieces of input text composed of several languages, as is common on the Web.
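
The rank-profile comparison of Cavnar and Trenkle can be sketched as follows; the profile size cap and the tiny training inputs are illustrative choices, not the original implementation.

```python
from collections import Counter

def ngram_profile(text: str, max_rank: int = 300) -> list[str]:
    """Ranked list of the most frequent character 1- to 5-grams."""
    padded = f" {text.lower()} "
    counts = Counter(
        padded[i:i + n]
        for n in range(1, 6)
        for i in range(len(padded) - n + 1)
    )
    return [g for g, _ in counts.most_common(max_rank)]

def out_of_place(doc_profile: list[str], lang_profile: list[str]) -> int:
    """Cavnar-Trenkle distance: sum of rank differences, with a maximum
    penalty for n-grams absent from the language profile."""
    ranks = {g: r for r, g in enumerate(lang_profile)}
    max_penalty = len(lang_profile)
    return sum(
        abs(r - ranks[g]) if g in ranks else max_penalty
        for r, g in enumerate(doc_profile)
    )

# Toy training texts; real profiles are built from substantial corpora.
profiles = {
    "english": ngram_profile("the quick brown fox jumps over the lazy dog"),
    "french":  ngram_profile("le renard brun rapide saute par dessus le chien"),
}

doc = ngram_profile("the dog and the fox")
print(min(profiles, key=lambda lang: out_of_place(doc, profiles[lang])))  # expected: english
```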

For a more recent method, see Řehůřek and Kolkus (2009). This method can detect multiple languages in an unstructured piece of text and works robustly on short texts of only a few words: something that the n-gram approaches struggle with.

An older statistical method by Grefenstette was based on the prevalence of certain function words (e.g., "the" in English).

A common non-statistical, intuitive approach, though a highly unreliable one, is to look for common letter combinations, or for distinctive diacritics or punctuation.[1][2]

Identifying similar languages

One of the major bottlenecks of language identification systems is distinguishing between closely related languages. Similar languages such as Bulgarian and Macedonian, or Indonesian and Malay, exhibit significant lexical and structural overlap, making it challenging for systems to discriminate between them.

In 2014, the DSL shared task[3] was organized, providing a dataset (Tan et al., 2014) containing 13 different languages (and language varieties) in six language groups: Group A (Bosnian, Croatian, Serbian), Group B (Indonesian, Malaysian), Group C (Czech, Slovak), Group D (Brazilian Portuguese, European Portuguese), Group E (Peninsular Spanish, Argentine Spanish), and Group F (American English, British English). The best-performing system reached an accuracy of over 95% (Goutte et al., 2014). Results of the DSL shared task are described in Zampieri et al. (2014).

Software

  • Apache OpenNLP includes a character n-gram based statistical detector and comes with a model that can distinguish 103 languages
  • Apache Tika contains a language detector for 18 languages

from Grokipedia
Language identification (LI), also known as language detection, is the computational task of determining the natural language in which a given text or speech segment is expressed, serving as a foundational step in natural language processing (NLP) pipelines. For textual data, LI involves analyzing features such as character n-grams, word distributions, and linguistic patterns to classify documents or segments among thousands of languages, with high accuracy achievable for well-resourced languages when long texts are available. In spoken language identification, the process relies on acoustic cues including prosody and intonation to distinguish languages or dialects from audio inputs, often as a precursor to selecting appropriate speech recognizers. The importance of LI stems from its role in enabling multilingual applications such as machine translation, where incorrect language detection can propagate errors through downstream NLP pipelines. It also supports spoken applications like automated call routing, and it facilitates the processing of diverse data sources such as web pages, particularly for low-resource languages that lack extensive training data. Over roughly five decades, LI research has evolved from rule-based and statistical approaches to advanced machine learning techniques, including support vector machines and neural networks, achieving accuracies exceeding 99% in controlled settings for major languages. Despite these advances, challenges persist in handling short texts, code-switching, dialects, and under-resourced languages, where performance drops significantly due to data scarcity and linguistic similarities. Ongoing efforts focus on robust, off-the-shelf systems and shared tasks such as the NIST Language Recognition Evaluations, integrating models such as convolutional and recurrent neural networks to improve generalization across real-world scenarios.

Introduction

Definition and Scope

Language identification (LID) is the computational task of automatically determining the natural language of an input, such as written text or spoken audio, through algorithmic analysis. In its core form, LID processes textual documents by examining linguistic patterns to assign a language label, or analyzes speech signals to estimate the language being spoken based on phonetic and prosodic cues. The task's formal study originated in the 1960s, with early formulations framing it as a learnability problem of identifying languages from positive examples, often as a preprocessing step for downstream systems. The scope of LID encompasses several modalities and configurations. For written text, it typically involves statistical models like character n-gram frequency analysis to detect languages from short or long documents. Spoken LID, in contrast, relies on acoustic features such as mel-frequency cepstral coefficients (MFCCs), phonotactics, and intonation patterns extracted from audio signals to distinguish languages. Hybrid systems combine these approaches, integrating textual transcripts with acoustic data for robust identification in mixed environments. LID is distinct from related tasks like script detection, which identifies writing systems (e.g., Latin vs. Cyrillic) without specifying the language, and named entity recognition, which extracts specific entities (e.g., persons or locations) within an assumed language. Key concepts in LID include distinctions based on input complexity and knowledge assumptions. Monolingual LID assumes a single language per input, simplifying classification for uniform documents, while multilingual LID detects and segments multiple languages within the same input, such as in code-switched text. Closed-set LID operates on a predefined list of known languages with available training data, outputting the most likely match from that set, whereas open-set LID handles unseen or unknown languages by rejecting or flagging inputs outside the trained classes. These variations enable LID's integration into broader pipelines, including as a precursor to machine translation and speech recognition tools.

Importance and Applications

Language identification plays a crucial role in facilitating global communication by automating the routing and processing of multilingual content across digital platforms. In social platforms and search engines, it enables content filtering, personalized recommendations, and improved search relevance for diverse user bases, thereby bridging linguistic barriers and promoting inclusive online interactions. Similarly, in customer service systems, automatic language detection streamlines interactions by directing queries to appropriate agents or services, enhancing response times and user satisfaction in multinational environments. Key applications of language identification span several domains, including web content classification, where tools like Google Translate employ auto-detection to identify source languages for real-time translation of web pages and documents, supporting 244 languages as of October 2024 following the addition of 110 new languages using the PaLM 2 AI model. In forensic linguistics, native language identification enhances authorship attribution by analyzing linguistic traces in multilingual texts, improving accuracy by up to 9% in investigations involving non-native speakers. For speech-to-text systems in call centers, language detection in automatic speech recognition (ASR) handles multilingual calls by identifying spoken languages in real time, enabling accurate transcription for over 99 languages despite challenges like accents. Additionally, in accessibility tools, language identification ensures proper pronunciation by screen readers for multilingual users, preventing misinterpretation of non-English text through language tags. The economic value of language identification lies in its ability to reduce manual labor in the localization industry, where automated detection accelerates content adaptation for global markets. The language services industry, heavily reliant on such technologies, reached USD 71.7 billion, with projections of USD 75.7 billion, driven by AI-enhanced localization that cuts costs and boosts efficiency in sectors like healthcare and media. This is particularly vital amid the multilingual web's growth: approximately 7,000 languages exist worldwide, yet only about 10 have a substantial online presence, underscoring the need for tools that expand digital inclusion beyond dominant languages like English, which accounts for roughly 60% of web content. Emerging applications include its role in AI ethics for bias detection, where language identification reveals disparities in model performance, such as AI detectors flagging non-native English writing as generated content at rates up to 19% higher than for native text. In security contexts, it aids in analyzing encrypted communications, such as VoIP traffic, by inferring languages from packet lengths with up to 86.6% accuracy in binary classifications, highlighting privacy risks and informing countermeasures.

Historical Development

Early Approaches

Early approaches to language identification relied on manual techniques employed by linguistic experts, particularly in cryptanalysis, where identifying the language of encrypted texts was in some cases a prerequisite for decryption. These methods involved analyzing phonological patterns, such as sound distributions; morphological features, such as word-formation rules; and syntax, including sentence organization. A key example was the use of character frequency analysis, which exploited the predictable distribution of letters in specific languages to aid in code-breaking, a practice refined in the 19th century during military and diplomatic efforts. The transition to computational methods began in the mid-20th century with the advent of digital text processing. In 1965, Seppo Mustonen developed one of the first automated systems using multiple discriminant analysis on character-based features, including vowel-consonant ratios and word length distributions, achieving approximately 76% accuracy in distinguishing English, Swedish, and Finnish texts from 300-word samples. This statistical approach marked the shift from purely manual analysis to machine-assisted identification, though it still required predefined linguistic models and had no learning component. Early computational efforts, such as those on mainframes in the late 1960s and early 1970s, focused on processing machine-readable texts by calculating simple statistical profiles, laying the groundwork for broader application. By the 1970s, rule-based systems emerged as key milestones, emphasizing fixed heuristics tailored to specific language families. For instance, Y. Nakamura's 1971 system identified 25 Latin-alphabet languages through rules governing character and word occurrence rates. Morton D. Rau's 1974 work advanced this by incorporating probability estimates and vowel-consonant ratios, attaining 89% accuracy in differentiating English from Spanish on an IBM Model 67, though these methods struggled with scalability for non-Latin scripts due to reliance on alphabetic assumptions. Similarly, the 1977 probabilistic approach by A. S. House and E. P. Neuburg provided a framework using Markovian probabilities on phonetic features for spoken language identification, though it was limited to European languages. Influential late-1980s contributions built on these foundations with more sophisticated probabilistic models. Kenneth R. Beesley's 1988 program utilized character n-gram probabilities to automatically identify languages in online text, demonstrating effective differentiation between English and French by comparing sequence frequencies against cryptanalysis-inspired models, with confidence levels exceeding 60% after just 10-12 words. These early systems highlighted the potential of n-grams for rapid identification but underscored persistent challenges in handling diverse scripts and short texts.

Evolution in the Digital Age

The 1990s heralded a pivotal shift in language identification toward corpus-based statistical methods, enabled by the growing availability of digitized text collections that allowed for empirical modeling of linguistic patterns across languages. Early efforts focused on character n-grams as discriminative features, with Cavnar and Trenkle's 1994 approach achieving 99.8% accuracy on a corpus spanning eight languages, establishing n-grams as a foundational technique for scalable LID. This era also saw dictionary-based methods, such as Giguet's 1995 use of alphabets and function words, which complemented statistical models by incorporating lexical resources. By the 2000s, advancements accelerated through the integration of web crawling, which provided vast, diverse training data and expanded LID to dozens of languages previously underrepresented in curated corpora. Baldwin and Lui's 2010 work utilized n-gram models trained on web-crawled texts covering 67 languages, demonstrating how internet-scale data improved robustness and coverage in digital environments. Support vector machines emerged as a key classifier during this period; for instance, Kruengkrai et al. applied SVMs to character n-grams in 2005, enhancing classification precision for multilingual texts. Parallel corpora like Europarl, introduced by Koehn in 2005 and later used for source-language identification in translations, further supported LID for European languages by offering aligned multilingual data. The late 2000s and 2010s witnessed the influence of shared tasks that standardized evaluation and spurred innovation amid surging computational resources. The Discriminating between Similar Languages (DSL) shared task, launched in 2014, utilized corpora like the DSL Corpus Collection to benchmark methods for closely related languages. These events, alongside the proliferation of web-scale datasets and GPU acceleration, enabled LID systems to process larger volumes of internet-sourced text with greater efficiency. In recent years, LID has integrated deeply with large language models, leveraging pre-trained architectures for superior performance. The release of multilingual BERT in 2018 allowed fine-tuning on LID tasks, yielding models that detect languages across 100+ variants with high accuracy using minimal domain-specific data, as evidenced by community implementations on platforms like Hugging Face. Subsequent advances include XLM-RoBERTa (2019), which improved multilingual representations for better generalization in low-resource LID, and ongoing shared tasks like VarDial (through 2023) focusing on dialectal and similar-language discrimination using transformer-based models. Integration with larger LLMs now enables zero-shot LID with accuracies exceeding 95% for many languages on short texts. The ubiquity of smartphones has amplified this progress by embedding LID in mobile ecosystems, powering features like automatic language detection in translation apps, which process user inputs in real time across over 100 languages to facilitate seamless communication.

Methods and Techniques

Statistical and Rule-Based Methods

Statistical methods for language identification rely on probabilistic models that analyze frequency distributions of textual elements, such as characters or words, to determine the most likely language. These approaches, including n-gram-based techniques, compute the probability $P(\text{language} \mid \text{text})$ as the product of conditional probabilities of n-grams given the language, often approximated using maximum likelihood estimation from pre-built language profiles. For character-level identification, a common scoring mechanism sums the log-probabilities: $\text{score} = \sum_n \log P(\text{char}_n \mid \text{language})$, where $\text{char}_n$ represents an n-gram of characters. This formulation enables efficient comparison against multiple language models without requiring extensive training data, making it suitable for resource-constrained environments. A seminal example is the Cavnar-Trenkle method, which uses overlapping character n-grams of lengths 1 to 5 to generate ranked frequency profiles for each language. The method scores a document by calculating an "out-of-place" distance metric based on rank differences between the document's n-gram ranks and those in the language profile, selecting the language with the minimal distance. Evaluated on 3,478 documents across eight Western European languages, it achieved 99.8% accuracy for texts longer than 300 bytes, demonstrating robustness to noise like OCR errors. Naive Bayes classifiers extend these statistical foundations by treating language identification as a generative classification task, assuming feature independence to compute posterior probabilities via Bayes' theorem: $P(\text{language} \mid \text{features}) \propto P(\text{language}) \prod_i P(\text{feature}_i \mid \text{language})$. Applied to word unigrams or character n-grams, this approach excels in low-compute settings, offering explainability through interpretable probability estimates and achieving near-perfect accuracy on sentence-length texts with smoothed 5-grams. Rule-based methods complement statistical techniques with hand-crafted heuristics, such as script detection via Unicode ranges to identify languages associated with specific writing systems (for instance, Cyrillic characters indicating Russian, or Devanagari indicating Hindi). These deterministic rules, often combined with keyword matching for low-resource languages, provide rapid initial filtering but are limited to scenarios with distinct orthographic features. Overall, statistical and rule-based methods prioritize simplicity and efficiency, thriving in explainable, low-resource applications despite challenges with closely related language pairs.
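
A minimal sketch of the log-probability scoring described above, using character trigrams with add-alpha smoothing; the trigram order, smoothing constant, and toy training strings are assumptions for illustration, not a tuned system.

```python
import math
from collections import Counter

def trigram_counts(text: str) -> Counter:
    """Frequency counts of character trigrams in lowercased text."""
    t = text.lower()
    return Counter(t[i:i + 3] for i in range(len(t) - 2))

def log_score(text: str, counts: Counter, alpha: float = 1.0) -> float:
    """Sum of smoothed log P(trigram | language) over the input's trigrams."""
    total = sum(counts.values())
    vocab = len(counts) + 1  # crude vocabulary size for add-alpha smoothing
    return sum(
        math.log((counts[g] + alpha) / (total + alpha * vocab))
        for g in trigram_counts(text).elements()
    )

# Illustrative training data; real profiles are built from large corpora.
models = {
    "en": trigram_counts("the cat sat on the mat and the dog ran away"),
    "fr": trigram_counts("le chat est sur le tapis et le chien court"),
}

sample = "the dog and the cat"
print(max(models, key=lambda lang: log_score(sample, models[lang])))  # expected: en
```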

Machine Learning and Neural Approaches

Machine learning approaches to language identification (LID) represent a shift from hand-crafted rules to data-driven models that learn patterns from large corpora. Classical methods typically involve feature extraction techniques such as term frequency-inverse document frequency (TF-IDF) applied to words or character n-grams, which are then fed into classifiers like support vector machines (SVMs) or random forests. For instance, TF-IDF weighting on lexical features has been shown to enhance LID performance by emphasizing distinctive terms across languages. SVMs, particularly using libraries like LIBLINEAR for efficient linear classification on high-dimensional n-gram features, have achieved strong results in tasks involving diverse language sets, such as distinguishing South African languages at the word level. Random forests have also been effective as ensemble classifiers, often outperforming single models in shared tasks like the German Dialect Identification (GDI) evaluation. Neural approaches, emerging prominently after 2015, leverage architectures suited to sequential data for capturing contextual dependencies in text. Recurrent neural networks (RNNs) and long short-term memory (LSTM) units process character or word sequences, enabling better handling of morphological variations compared to classical methods. For example, LSTM-based models with character n-gram embeddings secured top performance in code-switched identification tasks. These methods marked an advancement in modeling long-range dependencies, though they require substantial training data. Transformer-based models, such as multilingual BERT (mBERT) introduced in 2018, utilize cross-lingual embeddings pre-trained on vast multilingual corpora to support zero-shot LID, where models identify unseen languages via cross-lingual transfer. mBERT's contextual representations allow fine-tuning for LID without language-specific training, demonstrating utility in probing multilingual knowledge. Building on this, XLM-R (2020) extends coverage to 100 languages with improved masked language modeling, enabling robust transfer for LID in low-resource settings. Advanced neural techniques include hierarchical models that first map scripts to language families before fine-grained classification, addressing ambiguities in shared writing systems. For instance, hierarchical classifiers in models like LIMIT process over 350 languages by cascading script detection with language-specific heads, improving accuracy on short texts. A typical neural LID pipeline computes class probabilities as $\text{output} = \text{softmax}(W \cdot \text{embedding}(\text{text}))$, where the embedding layer captures contextual features from the input text and $W$ is a learned weight matrix. Deep learning methods, particularly via fine-tuning of transformers, have yielded accuracies exceeding 95% on datasets spanning over 100 languages, as demonstrated by XLM-R adaptations in multilingual LID benchmarks. Recent developments include the application of generative large language models, such as fine-tuned GPT variants, for zero-shot LID in low-resource and code-switched scenarios, further enhancing generalization.
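
The softmax formulation above can be made concrete with a small NumPy sketch; the bag-of-characters "embedding" and the random weight matrix are illustrative stand-ins for a trained encoder and learned parameters.

```python
import numpy as np

LANGS = ["en", "fr", "de"]               # illustrative label set
CHARS = "abcdefghijklmnopqrstuvwxyz "    # toy feature alphabet

def embedding(text: str) -> np.ndarray:
    """Toy stand-in for a learned encoder: normalized character counts."""
    v = np.zeros(len(CHARS))
    for ch in text.lower():
        if ch in CHARS:
            v[CHARS.index(ch)] += 1.0
    return v / max(v.sum(), 1.0)

def softmax(z: np.ndarray) -> np.ndarray:
    e = np.exp(z - z.max())  # subtract max for numerical stability
    return e / e.sum()

# W would be learned by minimizing cross-entropy; random here for shape only,
# so the resulting probabilities are meaningless placeholders.
rng = np.random.default_rng(0)
W = rng.normal(size=(len(LANGS), len(CHARS)))

probs = softmax(W @ embedding("bonjour tout le monde"))
print(dict(zip(LANGS, probs.round(3))))
```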

Challenges and Limitations

Distinguishing Similar Languages

Distinguishing closely related languages or dialects poses significant challenges in language identification (LID) due to high degrees of lexical, phonological, and orthographic overlap. For instance, Spanish and Portuguese exhibit approximately 89% lexical similarity, meaning a substantial portion of their vocabularies consists of cognates, which complicates automated differentiation based on word-level features alone. Similarly, Arabic dialects share the same script and much of the core lexicon derived from Standard Arabic, obscuring phonological distinctions in written text and leading to frequent misclassifications, as over 56% of dialectal sentences can be valid across multiple varieties. To address these issues, fine-grained models leverage phonological features to capture subtle acoustic or orthographic differences that broader classifiers overlook. Perceptual phonetic similarity spaces, constructed from features such as phoneme inventories and vowel qualities, enable clustering of related languages by quantifying overall phonetic overlap, aiding in the separation of Indo-European or Semitic pairs. Additionally, information gain scores serve as a feature-selection criterion in LID systems, ranking n-grams by how well they discriminate between similar languages, though they may prioritize domain-specific vocabulary over genuinely linguistic cues. For specific cases like Serbian and Croatian, discriminative character n-grams combined with support vector machines achieve high in-domain accuracy (up to 99.5% F1 on in-domain corpora) by focusing on orthographic markers, but cross-domain performance drops due to stylistic variations. Case studies from shared tasks highlight persistent difficulties. In the Discriminating between Similar Languages (DSL) shared tasks from 2014 to 2017, systems struggled with closely related Indo-Aryan pairs such as Bhojpuri and its neighbors, where linguistic proximity and short texts resulted in weighted F1 scores below 80% for confused instances, despite overall task accuracies reaching 88-90%. Orthographic reforms further exacerbate challenges, as seen in Norwegian Bokmål and Nynorsk, where flexible spelling rules (e.g., variable endings like "-e" or "-a") allow multiple valid forms for the same word, reducing transcription consistency in LID models and contributing to error rates in automatic adaptations. Mitigation strategies often involve ensemble methods that integrate LID outputs with external signals, such as geographic metadata, to refine predictions for similar varieties. Region-specific LID models, which condition candidates on geographic priors (e.g., limiting predictions to dialects local to a given region), improve F-scores by up to 10 points for short texts by resolving ambiguities between closely related options.
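
A brief scikit-learn sketch of the discriminative character n-gram plus SVM approach on a similar-language pair; the four toy sentences, the hr/sr labels, and the n-gram range are illustrative assumptions rather than a real DSL setup.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy training data; real systems train on corpora like the DSL Corpus
# Collection. The ijekavian/ekavian contrast (primjer vs. primer) and
# tko vs. ko are genuine orthographic markers for this pair.
texts = [
    "ovo je primjer rečenice",    # hr (illustrative)
    "tko je došao na sastanak",   # hr
    "ovo je primer rečenice",     # sr, Latin script (illustrative)
    "ko je došao na sastanak",    # sr
]
labels = ["hr", "hr", "sr", "sr"]

# Character n-grams (here lengths 1-4) capture such orthographic markers.
clf = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(1, 4)),
    LinearSVC(),
)
clf.fit(texts, labels)
print(clf.predict(["ovo je drugi primjer"]))  # expected: ['hr']
```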

Handling Multilingual and Dialectal Variations

Language identification systems encounter significant challenges when processing code-switched texts, where speakers alternate between two or more languages within a single utterance, as seen in bilingual communities using Spanglish, a mix of Spanish and English. This phenomenon is prevalent in social media and informal communication, complicating token-level detection due to overlapping vocabulary and grammatical structures. Conditional Random Fields (CRFs) have been employed effectively for word-level language tagging in such scenarios, leveraging features like lexical dictionaries, character n-grams, and contextual cues to label tokens as belonging to one language, the other, or mixed. For instance, on English-Spanish code-mixed data, CRF models achieve test accuracies around 85-95%, outperforming baselines by incorporating probability-based classifiers for ambiguous tokens. Dialectal variations further exacerbate identification difficulties, particularly for under-resourced languages where data scarcity limits model training. In African languages such as Swahili, which exhibits substantial dialectal diversity across regions, variations in phonology, morphology, and syntax hinder accurate detection, especially in speech or in contexts shaped by language contact. These challenges are amplified by the under-resourced nature of many such languages; with over 7,000 languages worldwide, only around 5% (approximately 350 languages, based on digital presence indicators) have substantial digital text available as of 2023, leaving African dialects with minimal annotated resources. This data paucity results in models trained on standard variants performing poorly on dialectal inputs, often requiring specialized corpora to capture tonal and morphological nuances. Noisy inputs, including OCR errors and transliteration (where text from non-Latin scripts is romanized, such as Hindi words written in Latin script), introduce additional variability that standard models struggle to handle. OCR noise, like character substitutions or deletions, can alter word forms, leading to misidentification in multilingual documents, while transliteration creates ambiguous representations that mimic multiple languages. Strategies to mitigate these include robust preprocessing techniques, such as noise simulation via rule-based edits or encoder-decoder models that generate synthetic noisy samples, and few-shot learning approaches that adapt models to limited examples despite label scarcity. These methods enhance robustness by mining hard examples and stabilizing representations between clean and noisy texts, improving performance on tasks like language detection in transliterated content. In real-world applications, failure to account for multilingual and dialectal variations can propagate errors to downstream tasks, such as sentiment analysis on dialectal Arabic, where misidentification of dialects leads to accuracy drops of 10-20% compared to standard-language processing. For example, on multi-dialect datasets, models like SVMs achieve only 51-61% accuracy on UAE and Egyptian dialects due to morphological complexity and orthographic variation, underscoring the need for dialect-aware identification to maintain reliable downstream performance.
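
Word-level CRF tagging of code-switched text can be sketched with the third-party sklearn-crfsuite package; the feature template and the two toy Spanglish sentences are assumptions for illustration, not a published system.

```python
# pip install sklearn-crfsuite
import sklearn_crfsuite

def word_features(sent: list[str], i: int) -> dict:
    """Simple per-token features: the word, its suffixes, and neighbors."""
    w = sent[i]
    return {
        "word.lower": w.lower(),
        "suffix2": w[-2:],
        "suffix3": w[-3:],
        "prev": sent[i - 1].lower() if i > 0 else "<s>",
        "next": sent[i + 1].lower() if i < len(sent) - 1 else "</s>",
    }

# Toy code-switched training data with token-level en/es labels.
sents = [
    ["I", "want", "una", "hamburguesa", "please"],
    ["vamos", "a", "la", "party", "tonight"],
]
tags = [
    ["en", "en", "es", "es", "en"],
    ["es", "es", "es", "en", "en"],
]

X = [[word_features(s, i) for i in range(len(s))] for s in sents]
crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
crf.fit(X, tags)

test = ["I", "like", "la", "comida"]
print(crf.predict([[word_features(test, i) for i in range(len(test))]]))
```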

Evaluation and Tools

Performance Metrics and Datasets

Performance in language identification (LID) is primarily evaluated using accuracy, which measures the proportion of correctly identified language instances out of the total, though it can be misleading on imbalanced datasets where high-resource languages dominate. To address class imbalance, the F1-score is widely adopted, particularly the macro-averaged F1-score, calculated as the unweighted average of per-language F1-scores (where each language's F1 is the harmonic mean of precision and recall), which penalizes poor performance on minority languages and provides a balanced view of model effectiveness across diverse linguistic distributions. Additionally, the confusion matrix serves as a diagnostic tool for error analysis, visualizing misclassifications between languages to reveal patterns such as frequent confusions between closely related varieties, enabling targeted improvements in model robustness. Key datasets for training and testing LID systems include the SETimes corpus (developed circa 2005-2007), which provides parallel texts for discriminating between similar South Slavic languages like Bosnian, Croatian, and Serbian, including short excerpts to simulate real-world identification challenges in regional contexts. For broader coverage, the open OpenLID dataset introduced in 2023 aggregates web-sourced text across 201 languages, balancing samples to about 600,000 lines per language for a total of 121 million lines, facilitating evaluation of multilingual models in low-resource scenarios. An update, OpenLID-v2, released in October 2025, extends coverage to 200 language varieties with improved handling of dialects. In spoken LID, the VoxLingua107 corpus from 2020 provides 6,628 hours of audio segments extracted from web videos across 107 languages, averaging 62 hours per language, to support acoustic feature learning in diverse speech environments. Standard evaluation protocols involve k-fold cross-validation on held-out test sets to ensure generalizability, with splits maintaining class balance to avoid overfitting to training distributions. Benchmarks from the Discriminating between Similar Languages (DSL) shared tasks highlight progress; for instance, the 2015 edition achieved top accuracies of 95.54% on text-based identification across 10 language groups using ensemble classifiers. Later iterations, such as the VarDial evaluations continuing through 2024, demonstrate incremental gains through neural methods, underscoring the shift toward handling dialectal nuances in multi-label settings. A notable limitation in LID datasets is bias toward high-resource languages, where English and similar Indo-European tongues are overrepresented compared to low-resource ones, leading to inflated metrics that fail to generalize to underrepresented languages and perpetuating inequities in global NLP applications.
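
The metrics above can be computed with scikit-learn as follows; the gold labels and predictions for three closely related languages are invented for illustration.

```python
from sklearn.metrics import confusion_matrix, f1_score

# Hypothetical gold labels and system predictions for three languages.
y_true = ["hr", "sr", "sr", "bs", "hr", "bs", "sr", "hr"]
y_pred = ["hr", "hr", "sr", "bs", "hr", "sr", "sr", "hr"]

labels = ["bs", "hr", "sr"]
# Rows are gold labels, columns predictions; off-diagonal cells reveal
# which language pairs the system confuses.
print(confusion_matrix(y_true, y_pred, labels=labels))
# Macro-F1 averages per-language F1 equally, exposing weak minority classes.
print(round(f1_score(y_true, y_pred, labels=labels, average="macro"), 3))
```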

Software Implementations

Several open-source libraries provide accessible implementations for language identification (LID), enabling developers to integrate LID into applications without proprietary dependencies. Langdetect, available in both Java and Python implementations, relies on an n-gram-based approach to detect languages and supports over 55 languages, making it suitable for straightforward text processing tasks. Google's Compact Language Detector 3 (CLD3), an open-source neural model, offers compact inference code and supports detection across more than 100 languages, prioritizing efficiency for resource-constrained environments. Frameworks like spaCy and Hugging Face Transformers facilitate the integration of LID through extensible pipelines and pre-trained models, allowing customization for specific use cases. In spaCy, extensions such as spacy-language-detection enable seamless addition of LID components to NLP workflows, leveraging underlying libraries for multi-language support. The Hugging Face hub hosts numerous fine-tuned models for LID, including those based on architectures like XLM-RoBERTa, which can be deployed for detecting dozens of languages with high accuracy. A notable example is fastText's LID model from 2017, which uses linear classifiers on subword n-grams to identify 176 languages and remains widely adopted for its balance of speed and coverage. Commercial offerings provide scalable, managed LID services with additional features like confidence scoring and enterprise integration. Microsoft's Azure Text Analytics API delivers real-time LID for unstructured text, returning language identifiers along with confidence scores for over 100 languages, optimized for cloud-based applications. IBM Watson Natural Language Understanding includes LID as part of its API suite, supporting detection in multiple languages for large-scale text analysis in business contexts. When deploying LID software, considerations such as latency and licensing play critical roles in practicality. For instance, lightweight models like fastText achieve latencies under 50 milliseconds for short texts on standard hardware, ensuring responsiveness in interactive systems. Licensing for open-source tools often permits free use under permissive terms, but extensions for low-resource languages may require custom training data, potentially involving additional compliance with data usage policies in commercial frameworks.
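
Hedged usage sketches for two of the open-source tools above; the PyPI package names and the fastText model filename reflect their published distributions but should be verified against current documentation.

```python
# pip install langdetect fasttext
from langdetect import detect, detect_langs

print(detect("Ceci est une phrase en français."))           # e.g. 'fr'
print(detect_langs("Dies ist ein kurzer deutscher Satz."))  # ranked guesses

import fasttext

# lid.176.bin is fastText's published 176-language model, downloaded
# separately from the fastText website.
model = fasttext.load_model("lid.176.bin")
labels, probs = model.predict("Esto es una prueba.", k=2)
print(labels, probs)  # e.g. ('__label__es', ...) with confidence scores
```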
