Zero-shot learning
Zero-shot learning (ZSL) is a problem setup in deep learning where, at test time, a learner observes samples from classes which were not observed during training, and needs to predict the class that they belong to. The name is a play on words based on the earlier concept of one-shot learning, in which classification can be learned from only one, or a few, examples.
Zero-shot methods generally work by associating observed and non-observed classes through some form of auxiliary information, which encodes observable distinguishing properties of objects.[1] For example, given a set of images of animals to be classified, along with auxiliary textual descriptions of what animals look like, an artificial intelligence model that has been trained to recognize horses, but has never seen a zebra, can still recognize a zebra if it also knows that zebras look like striped horses. This problem is widely studied in computer vision, natural language processing, and machine perception.[2]
Background and history
The first paper on zero-shot learning in natural language processing was a 2008 paper by Chang, Ratinov, Roth, and Srikumar at AAAI'08, although the learning paradigm was called dataless classification there.[3] The first paper on zero-shot learning in computer vision appeared at the same conference under the name zero-data learning.[4] The term zero-shot learning itself first appeared in a 2009 paper by Palatucci, Hinton, Pomerleau, and Mitchell at NIPS'09.[5] The term was then used in another computer vision paper,[6] and it caught on as a play on one-shot learning, which had been introduced in computer vision years earlier.[7]
In computer vision, zero-shot learning models learn parameters for seen classes together with their class representations, and rely on representational similarity among class labels so that, during inference, instances can be classified into new classes.
In natural language processing, the key technical direction builds on the ability to "understand the labels", that is, to represent the labels in the same semantic space as the documents to be classified. This supports the classification of a single example without observing any annotated data, the purest form of zero-shot classification. The original paper[3] used the Explicit Semantic Analysis (ESA) representation, while later papers used other representations, including dense ones. This approach has also been extended to multilingual settings,[8][9] fine-grained entity typing[10] and other problems. Beyond relying solely on representations, the approach has also been extended to rely on transfer from other tasks, such as textual entailment[11] and question answering.[12]
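As a rough illustration of "understanding the labels", the sketch below places a document and each label's description in one shared space and picks the most similar label, without any annotated documents. It is a minimal sketch under loud assumptions: plain bag-of-words counts stand in for ESA or dense representations, and the label descriptions and the function name `dataless_classify` are hypothetical.

```python
import math
import re
from collections import Counter

def bag_of_words(text):
    """Lower-case word counts: a crude stand-in for ESA or dense vectors."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[word] for word, count in u.items())  # missing words count as 0
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def dataless_classify(document, label_descriptions):
    """Return the label whose description is most similar to the document.

    No annotated documents are used: each label is "understood" by placing
    its description in the same space as the document.
    """
    doc_vec = bag_of_words(document)
    scores = {label: cosine(doc_vec, bag_of_words(description))
              for label, description in label_descriptions.items()}
    return max(scores, key=scores.get)

# Hypothetical label descriptions.
labels = {
    "sports": "sports game team player score match football",
    "politics": "politics election government vote party minister",
}
print(dataless_classify("The team scored late to win the football match", labels))  # sports
```

Because exact word matching is a weak notion of meaning, swapping `bag_of_words` for concept-based or dense vectors is what makes this practical; the scoring loop stays the same.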
The original paper[3] also points out that, beyond classifying a single example, when a collection of examples is given and assumed to come from the same distribution, performance can be bootstrapped in a semi-supervised manner (transductive learning).
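The bootstrapping idea can be sketched as self-training: classify the whole collection using the label descriptions, then fold confidently labeled documents back into each label's representation and classify again. This is an illustrative assumption, not the procedure of the original paper; the word-count representation, the margin-based confidence rule, and the names `bootstrap` and `min_margin` are hypothetical.

```python
import math
import re
from collections import Counter

def bag_of_words(text):
    """Lower-case word counts (a crude stand-in for a semantic representation)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(c * v[w] for w, c in u.items())
    nu = math.sqrt(sum(c * c for c in u.values()))
    nv = math.sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def bootstrap(documents, label_descriptions, rounds=3, min_margin=1e-9):
    """Transductive self-training over an unlabeled collection of documents.

    Round 0 classifies with the label descriptions alone. After each round,
    each label's prototype becomes its description plus the words of the
    documents confidently assigned to it (best score beats the runner-up
    by more than min_margin).
    """
    doc_vecs = [bag_of_words(d) for d in documents]
    prototypes = {lab: bag_of_words(desc) for lab, desc in label_descriptions.items()}
    assignments = []
    for _ in range(rounds):
        assignments, confident = [], []
        for vec in doc_vecs:
            scores = {lab: cosine(vec, proto) for lab, proto in prototypes.items()}
            best = max(scores, key=scores.get)  # ties go to the first label listed
            ranked = sorted(scores.values(), reverse=True)
            margin = ranked[0] - (ranked[1] if len(ranked) > 1 else 0.0)
            assignments.append(best)
            confident.append(margin > min_margin)
        # Rebuild prototypes from descriptions plus confident assignments only.
        prototypes = {lab: bag_of_words(desc) for lab, desc in label_descriptions.items()}
        for vec, lab, ok in zip(doc_vecs, assignments, confident):
            if ok:
                prototypes[lab].update(vec)  # Counter.update adds counts
    return assignments

labels = {"politics": "election vote government", "sports": "football match team"}
docs = [
    "football match striker scored",
    "striker signed contract",  # shares no word with either description
    "government called election",
]
print(bootstrap(docs, labels, rounds=1))  # ['sports', 'politics', 'politics']
print(bootstrap(docs, labels, rounds=3))  # ['sports', 'sports', 'politics']
```

The second document shares no word with either description, so the first round breaks the tie arbitrarily; once the football document is confidently labeled, its word "striker" moves the second document to `sports`. Only confident assignments are folded back in, since adding a tie-broken document would simply reinforce the arbitrary choice.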
Unlike standard generalization in machine learning, where classifiers are expected to correctly assign new samples to classes they have already observed during training, in ZSL no samples from the test classes are given while the classifier is trained. It can therefore be viewed as an extreme case of domain adaptation.
Prerequisite information for zero-shot classes
Naturally, some form of auxiliary information has to be given about these zero-shot classes, and this information can be of several types.
- Learning with attributes: classes are accompanied by a pre-defined structured description. For bird descriptions, for example, this could include "red head" or "long beak".[6][13] These attributes are often organized in a structured, compositional way, and taking that structure into account improves learning.[14] While this approach has been used mostly in computer vision, there are also examples of it in natural language processing.[15]
- Learning from textual descriptions: as noted above, this has been the key direction pursued in natural language processing. Here class labels are taken to have a meaning and are often augmented with definitions or free-text natural-language descriptions, for example a Wikipedia description of the class.[10][16][17]
- Class-class similarity: classes are embedded in a continuous space. A zero-shot classifier predicts the position of a sample in that space, and the nearest embedded class is used as the predicted class, even if no samples of that class were observed during training.[18]
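At inference time, the class-class similarity approach reduces to a nearest-neighbor search among class embeddings. The sketch below assumes a model has already mapped a sample to a point in the embedding space; the three-dimensional class vectors, the predicted point, and the choice of cosine similarity are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def nearest_class(predicted_embedding, class_embeddings):
    """Return the class whose embedding is most similar to the predicted point."""
    return max(class_embeddings,
               key=lambda name: cosine(predicted_embedding, class_embeddings[name]))

# Hypothetical class embeddings (real systems might use word vectors).
class_embeddings = {
    "horse": [0.9, 0.1, 0.0],
    "tiger": [0.1, 0.9, 0.3],
    "zebra": [0.7, 0.7, 0.0],  # no zebra samples were needed to define this point
}
# Hypothetical output of a model that maps an image into the same space.
predicted = [0.6, 0.8, 0.1]
print(nearest_class(predicted, class_embeddings))  # zebra
```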
Generalized zero-shot learning
The above ZSL setup assumes that at test time only zero-shot samples are given, namely, samples from new, unseen classes. In generalized zero-shot learning, samples from both new and known classes may appear at test time. This poses new challenges for classifiers, because it is difficult to estimate whether a given sample comes from a new or a known class. Some approaches to handle this include:
- a gating module, which is trained to decide whether a given sample comes from a new class or a known one and, at inference time, outputs either a hard decision[19] or a soft probabilistic decision;[20]
- a generative module, which is trained to generate feature representations of the unseen classes, so that a standard classifier can then be trained on samples from all classes, seen and unseen.[21]
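A gating module of this kind can be sketched as a mixture of two experts, one over seen classes and one over unseen classes, weighted by the gate's probability that the sample is new. This is a minimal sketch of the general idea, not the exact method of either cited paper; the function name `gzsl_predict`, the 0.5 threshold for the hard decision, and all probabilities in the example are assumptions.

```python
def gzsl_predict(p_unseen, seen_probs, unseen_probs, hard=False):
    """Combine a seen-class expert and a zero-shot expert through a gate.

    p_unseen:     gate's probability that the sample belongs to an unseen class
    seen_probs:   distribution over seen classes (standard classifier)
    unseen_probs: distribution over unseen classes (zero-shot classifier)
    hard:         if True, route the sample to one expert only (hard decision);
                  otherwise weight both experts by the gate (soft decision)
    Seen and unseen class names are assumed to be disjoint.
    """
    if hard:
        p_unseen = 1.0 if p_unseen >= 0.5 else 0.0  # assumed threshold
    combined = {c: (1.0 - p_unseen) * p for c, p in seen_probs.items()}
    combined.update({c: p_unseen * p for c, p in unseen_probs.items()})
    return max(combined, key=combined.get), combined

# Hypothetical expert outputs for one test image.
seen = {"horse": 0.55, "cat": 0.45}
unseen = {"zebra": 0.95, "okapi": 0.05}
print(gzsl_predict(0.4, seen, unseen)[0])             # zebra (soft)
print(gzsl_predict(0.4, seen, unseen, hard=True)[0])  # horse (hard)
```

With a gate that leans slightly towards "seen", the hard decision discards the zero-shot expert entirely, whereas the soft decision lets its confident `zebra` prediction outweigh the uncertain seen-class expert.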
Domains of application
Zero-shot learning has been applied to the following fields:
References
- Xian, Yongqin; Lampert, Christoph H.; Schiele, Bernt; Akata, Zeynep (2020-09-23). "Zero-Shot Learning -- A Comprehensive Evaluation of the Good, the Bad and the Ugly". arXiv:1707.00600 [cs.CV].
- Xian, Yongqin; Schiele, Bernt; Akata, Zeynep (2017). "Zero-shot learning - the good, the bad and the ugly". Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition: 4582–4591. arXiv:1703.04394. Bibcode:2017arXiv170304394X.
- Chang, M.W. (2008). "Importance of Semantic Representation: Dataless Classification". AAAI.
- Larochelle, Hugo (2008). "Zero-data Learning of New Tasks" (PDF).
- Palatucci, Mark (2009). "Zero-Shot Learning with Semantic Output Codes" (PDF). NIPS.
- Lampert, C.H. (2009). "Learning to detect unseen object classes by between-class attribute transfer". IEEE Conference on Computer Vision and Pattern Recognition: 951–958. CiteSeerX 10.1.1.165.9750.
- Miller, E. G. (2000). "Learning from One Example Through Shared Densities on Transforms" (PDF). CVPR.
- Song, Yangqiu (2019). "Toward any-language zero-shot topic classification of textual documents". Artificial Intelligence. 274: 133–150. doi:10.1016/j.artint.2019.02.002.
- Song, Yangqiu (2016). "Cross-Lingual Dataless Classification for Many Languages" (PDF). IJCAI.
- Zhou, Ben (2018). "Zero-Shot Open Entity Typing as Type-Compatible Grounding" (PDF). EMNLP. arXiv:1907.03228.
- Yin, Wenpeng (2019). "Benchmarking Zero-shot Text Classification: Datasets, Evaluation and Entailment Approach" (PDF). EMNLP. arXiv:1909.00161.
- Levy, Omer (2017). "Zero-Shot Relation Extraction via Reading Comprehension" (PDF). CoNLL. arXiv:1706.04115.
- Romera-Paredes, Bernardino; Torr, Philip (2015). "An embarrassingly simple approach to zero-shot learning" (PDF). International Conference on Machine Learning: 2152–2161. Archived from the original (PDF) on 2017-03-13. Retrieved 2020-06-11.
- Atzmon, Yuval; Chechik, Gal (2018). "Probabilistic AND-OR Attribute Grouping for Zero-Shot Learning" (PDF). Uncertainty in Artificial Intelligence. arXiv:1806.02664. Bibcode:2018arXiv180602664A.
- Roth, Dan (2009). "Aspect Guided Text Categorization with Unobserved Labels". ICDM. CiteSeerX 10.1.1.148.9946.
- Hu, R Lily; Xiong, Caiming; Socher, Richard (2018). "Zero-Shot Image Classification Guided by Natural Language Descriptions of Classes: A Meta-Learning Approach" (PDF). NeurIPS.
- Srivastava, Shashank; Labutov, Igor; Mitchell, Tom (2018). "Zero-shot Learning of Classifiers from Natural Language Quantification". Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). pp. 306–316. doi:10.18653/v1/P18-1029.
- Frome, Andrea; et al. (2013). "DeViSE: A deep visual-semantic embedding model" (PDF). Advances in Neural Information Processing Systems: 2121–2129.
- Socher, R; Ganjoo, M; Manning, C.D.; Ng, A. (2013). "Zero-shot learning through cross-modal transfer". Neural Information Processing Systems. arXiv:1301.3666. Bibcode:2013arXiv1301.3666S.
- Atzmon, Yuval (2019). "Adaptive Confidence Smoothing for Generalized Zero-Shot Learning". The IEEE Conference on Computer Vision and Pattern Recognition: 11671–11680. arXiv:1812.09903. Bibcode:2018arXiv181209903A.
- Felix, R; et al. (2018). "Multi-modal cycle-consistent generalized zero-shot learning". Proceedings of the European Conference on Computer Vision: 21–37. arXiv:1808.00136. Bibcode:2018arXiv180800136F.
- Wittmann, Bruce J.; Yue, Yisong; Arnold, Frances H. (2020-12-04). "Machine Learning-Assisted Directed Evolution Navigates a Combinatorial Epistatic Fitness Landscape with Minimal Screening Burden". bioRxiv 2020.12.04.408955. doi:10.1101/2020.12.04.408955. S2CID 227914824.
Fundamentals
Definition and Motivation
Zero-shot learning (ZSL) is a machine learning paradigm that enables models to recognize and classify instances from unseen classes at test time, without any training examples for those classes, by leveraging auxiliary semantic information to transfer knowledge from seen classes.[4] An influential early formulation by Lampert et al. (2009) framed ZSL as object classification in which training and test classes are disjoint, so that no visual examples of the target classes are available during training.[10] In essence, ZSL shifts the focus from data-driven pattern recognition to semantically informed inference, allowing systems to handle open-world scenarios in which new categories continually emerge.
The motivation for ZSL stems from a practical limitation of traditional supervised learning: it demands extensive labeled data for every class, which is often infeasible because of data scarcity, high annotation costs, and the dynamic nature of real-world environments.[4] By generalizing to novel categories without retraining, ZSL addresses these challenges and resembles human cognition, in which people infer properties of unfamiliar objects from linguistic descriptions or prior knowledge rather than direct observation. This capability is particularly valuable in domains such as computer vision and natural language processing, where the number of potential classes grows faster than labeled data can be collected.
In the basic ZSL workflow, a model is trained on a set of seen classes using paired visual features and auxiliary information, such as class attributes or textual descriptions, to learn a compatibility function that scores how well a visual input matches a class's semantic description, typically by mapping inputs into a shared semantic space.[4] At inference, unseen classes are described only semantically: test instances are projected into this space and assigned to the unseen class whose representation is the closest match.
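One simple way to realize this workflow is to learn a linear map from visual features to attribute vectors using seen classes only, and then to classify a test instance by the nearest unseen-class attribute vector. The sketch below assumes ridge regression as the learning rule and negative Euclidean distance as the compatibility score; many published methods instead learn a bilinear compatibility function. The two-dimensional features, the attributes ("horse-shaped", "striped"), the class `hippo`, and all helper names are hypothetical.

```python
import math

def matmul_t(X, Y):
    """Return X^T Y for row-major matrices X (n x p) and Y (n x q)."""
    p, q = len(X[0]), len(Y[0])
    return [[sum(xr[i] * yr[j] for xr, yr in zip(X, Y)) for j in range(q)]
            for i in range(p)]

def solve(M, R):
    """Solve M B = R for B by Gauss-Jordan elimination with partial pivoting."""
    n = len(M)
    aug = [list(M[i]) + list(R[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def fit_feature_to_attribute_map(X, A, lam=0.1):
    """Ridge regression from features to attributes: B = (X^T X + lam*I)^-1 X^T A."""
    gram = matmul_t(X, X)
    for i in range(len(gram)):
        gram[i][i] += lam
    return solve(gram, matmul_t(X, A))

def project(x, B):
    """Map one feature vector into attribute space (computes x^T B)."""
    return [sum(x[i] * B[i][j] for i in range(len(x))) for j in range(len(B[0]))]

def classify(x, B, class_attributes):
    """Pick the class whose attribute vector is nearest to the projected input."""
    a = project(x, B)
    return min(class_attributes, key=lambda c: math.dist(a, class_attributes[c]))

# Training uses seen classes only; features are hypothetical [shape cue, stripe cue].
seen_attributes = {"horse": [1, 0], "tiger": [0, 1]}  # ["horse-shaped", "striped"]
X_train = [[0.9, 0.1], [1.0, 0.0],   # horse images
           [0.1, 0.9], [0.0, 1.0]]   # tiger images
y_train = ["horse", "horse", "tiger", "tiger"]
A_train = [seen_attributes[y] for y in y_train]
B = fit_feature_to_attribute_map(X_train, A_train)

# Unseen classes are known only through their attribute vectors.
unseen_attributes = {"zebra": [1, 1], "hippo": [0, 0]}
print(classify([0.9, 0.9], B, unseen_attributes))  # zebra
```

No zebra or hippo image is used for training; both classes enter only through their attribute vectors at inference time.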
For instance, a model trained on images of horses and on images showing patterns such as stripes could classify a zebra (an unseen class) by recognizing that its visual features are compatible with the attribute combination "striped horse", without ever encountering zebra images during training.[10]
Comparison with Other Paradigms
Zero-shot learning (ZSL) fundamentally differs from supervised learning by enabling the recognition of entirely novel classes without any labeled training examples for those classes, instead leveraging auxiliary information such as semantic descriptions or attributes to transfer knowledge from seen classes. In supervised learning, models require extensive labeled datasets covering all target classes to learn discriminative features, which limits their applicability in scenarios where new categories emerge without prior data collection.[11]
In contrast to few-shot learning, which adapts models using a minimal number of labeled examples (typically 1 to 5 per novel class) to generalize via metric-based or optimization techniques, ZSL relies solely on auxiliary knowledge without direct exemplars, emphasizing semantic bridging over episodic training.[12] One-shot learning, a specific case of few-shot learning, provides exactly one labeled example per new class to facilitate adaptation, whereas ZSL avoids even this single instance by relying on cross-modal or embedding alignments for inference on unseen categories.
ZSL also goes beyond traditional transfer learning, which typically involves pre-training on a source task with abundant data and fine-tuning on a related target task that often shares similar classes or features; ZSL instead generalizes to semantically related but completely novel classes through compatibility functions or shared latent spaces.[11] This semantic transfer supports open-world applications in which test classes are disjoint from training ones, unlike transfer learning's emphasis on domain adaptation within overlapping distributions.
The following table summarizes key distinctions among these paradigms:
| Paradigm | Data Requirements for Novel Classes | Generalization Type | Typical Use Cases |
|---|---|---|---|
| Supervised Learning | Many labeled examples per class | Intra-class discrimination within seen data | Abundant labeled datasets for closed-set classification[11] |
| Transfer Learning | Labeled source data; optional target labels | To related tasks/domains via feature reuse | Fine-tuning pre-trained models on similar problems[11] |
| Few-Shot Learning | 1–5 labeled examples per class | To novel classes with minimal support | Data-efficient adaptation in dynamic environments[12] |
| One-Shot Learning | Exactly 1 labeled example per class | To novel classes from single instance | Extreme data scarcity, e.g., personalized recognition |
| Zero-Shot Learning | Zero labeled examples; auxiliary information | To unseen classes via semantics | Open-vocabulary tasks like emerging categories |