Mamba (deep learning architecture)

Mamba is a deep learning architecture focused on sequence modeling. It was developed by researchers from Carnegie Mellon University and Princeton University to address some limitations of transformer models, especially in processing long sequences. It is based on the structured state space sequence (S4) model.

To enable the handling of long data sequences, Mamba incorporates S4. S4 can effectively and efficiently model long-range dependencies by combining continuous-time, recurrent, and convolutional representations, which allow it to handle irregularly sampled data and unbounded context while remaining computationally efficient during both training and inference.
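
The following sketch illustrates, in plain NumPy, the equivalence that S4-style models exploit: a continuous-time state space model can be discretized and then evaluated either as a recurrence (efficient at inference) or as a convolution (parallelizable during training). The diagonal state matrix, the bilinear discretization, and all dimensions here are illustrative assumptions, not the actual S4 parameterization.

```python
# Minimal sketch of the three equivalent views used by S4-style SSMs
# (illustrative only; real S4 uses a structured state matrix and special init).
import numpy as np

N, L = 4, 16                              # state size, sequence length
rng = np.random.default_rng(0)
A = -np.diag(rng.uniform(0.5, 1.5, N))    # stable continuous-time state matrix (assumed diagonal)
B = rng.normal(size=(N, 1))
C = rng.normal(size=(1, N))
dt = 0.1                                  # step size ("Delta")

# Discretization (here: bilinear / Tustin transform)
I = np.eye(N)
Ad = np.linalg.inv(I - dt / 2 * A) @ (I + dt / 2 * A)
Bd = np.linalg.inv(I - dt / 2 * A) @ (dt * B)

u = rng.normal(size=L)                    # input sequence

# 1) Recurrent view: constant-size state per step, good for inference
x = np.zeros((N, 1))
y_rec = []
for k in range(L):
    x = Ad @ x + Bd * u[k]
    y_rec.append(float(C @ x))

# 2) Convolutional view: precompute kernel K = (C Bd, C Ad Bd, C Ad^2 Bd, ...),
#    then convolve with the input, which parallelizes well during training
K = np.array([float(C @ np.linalg.matrix_power(Ad, k) @ Bd) for k in range(L)])
y_conv = [float(np.dot(K[:k + 1][::-1], u[:k + 1])) for k in range(L)]

print(np.allclose(y_rec, y_conv))         # True: both views compute the same outputs
```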

Mamba introduces significant enhancements to S4, particularly in its treatment of time-variant operations. It adopts a selection mechanism that adapts the structured state space model (SSM) parameters based on the input. This enables Mamba to selectively focus on relevant information within sequences, effectively filtering out less pertinent data. The model thus transitions from a time-invariant to a time-varying framework: because the parameters now depend on the input, the output can no longer be precomputed as a single convolution, which changes how the computation is organized and affects efficiency.
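
A minimal sketch of the selection idea follows, assuming a diagonal state matrix and small linear projections (the names W_delta, W_B, and W_C are hypothetical): the step size Delta and the B and C parameters are computed from each input, so the recurrence coefficients change at every position and the model can gate how much of each input enters the state.

```python
# Minimal sketch of Mamba-style selectivity: Delta, B, C are computed from the
# input at every step, making the recurrence time-varying (names and shapes are
# illustrative, not the reference implementation).
import numpy as np

def selective_ssm(u, A, W_delta, W_B, W_C):
    """u: (L, D) input; A: (D, N) fixed; projections produce per-step Delta, B, C."""
    L, D = u.shape
    N = A.shape[1]
    x = np.zeros((D, N))                          # hidden state: one N-dim state per channel
    ys = []
    for t in range(L):
        delta = np.log1p(np.exp(u[t] @ W_delta))  # softplus -> positive step size, shape (D,)
        B_t = u[t] @ W_B                          # input-dependent B, shape (N,)
        C_t = u[t] @ W_C                          # input-dependent C, shape (N,)
        Ad = np.exp(delta[:, None] * A)           # ZOH-style discretization of diagonal A
        Bd = delta[:, None] * B_t[None, :]
        x = Ad * x + Bd * u[t][:, None]           # selective update: small delta ~ ignore input
        ys.append(x @ C_t)                        # readout, shape (D,)
    return np.stack(ys)                           # (L, D)

rng = np.random.default_rng(0)
L, D, N = 8, 4, 6
u = rng.normal(size=(L, D))
A = -np.exp(rng.normal(size=(D, N)))              # negative (stable) diagonal state matrix
out = selective_ssm(u, A, rng.normal(size=(D, D)) * 0.1,
                    rng.normal(size=(D, N)) * 0.1, rng.normal(size=(D, N)) * 0.1)
print(out.shape)                                  # (8, 4)
```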

Mamba employs a hardware-aware algorithm that exploits GPUs through kernel fusion, a parallel scan, and recomputation. The implementation avoids materializing expanded states in memory-intensive layers, thereby improving performance and memory usage. The result is significantly more efficient processing of long sequences compared with transformers.
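
The reason a scan can replace the sequential loop is that the time-varying recurrence x_k = a_k·x_{k−1} + b_k is associative under a simple combine rule, so it can be evaluated in O(log L) parallel steps. The sketch below demonstrates this with a Hillis-Steele-style scan in NumPy; it is an illustration of the principle, not Mamba's fused GPU kernel, which additionally recomputes intermediate states during the backward pass instead of storing them.

```python
# The update x_k = a_k * x_{k-1} + b_k is associative under the combine
# (a1, b1) o (a2, b2) = (a1*a2, a2*b1 + b2), so it admits a parallel scan.
import numpy as np

def sequential_recurrence(a, b):
    x, out = 0.0, []
    for ak, bk in zip(a, b):
        x = ak * x + bk
        out.append(x)
    return np.array(out)

def parallel_scan(a, b):
    a, b = a.copy(), b.copy()
    L = len(a)
    step = 1
    while step < L:
        # combine element k with element k-step; on a GPU these updates run in parallel
        a_prev, b_prev = a[:-step].copy(), b[:-step].copy()
        b[step:] = a[step:] * b_prev + b[step:]
        a[step:] = a[step:] * a_prev
        step *= 2
    return b                  # after the scan, b[k] holds x_k

rng = np.random.default_rng(0)
a = rng.uniform(0.5, 0.99, size=16)
b = rng.normal(size=16)
print(np.allclose(sequential_recurrence(a, b), parallel_scan(a, b)))   # True
```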

Additionally, Mamba simplifies its architecture by integrating the SSM design with MLP blocks into a single, homogeneous building block. This streamlined structure furthers the model's capability for general sequence modeling across data types including language, audio, and genomics, while maintaining efficiency in both training and inference.
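
The sketch below (PyTorch, with hypothetical names and a placeholder selective_ssm) shows the general shape of such a unified block under these assumptions: one input projection feeds both an SSM branch and a gating branch, and the gated product takes the place of a separate attention-plus-MLP pair.

```python
# Minimal sketch of a Mamba-style block merging the SSM path with a gated,
# MLP-like branch; `selective_ssm` is a placeholder, not the real kernel.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MambaBlockSketch(nn.Module):
    def __init__(self, d_model, d_inner, d_conv=4):
        super().__init__()
        self.in_proj = nn.Linear(d_model, 2 * d_inner)              # one projection feeds both branches
        self.conv = nn.Conv1d(d_inner, d_inner, d_conv,
                              padding=d_conv - 1, groups=d_inner)   # local causal mixing
        self.out_proj = nn.Linear(d_inner, d_model)

    def selective_ssm(self, x):
        # placeholder for the input-dependent SSM scan; identity keeps the sketch runnable
        return x

    def forward(self, x):                                           # x: (batch, length, d_model)
        residual = x
        u, gate = self.in_proj(x).chunk(2, dim=-1)                  # split into SSM branch and gate
        u = self.conv(u.transpose(1, 2))[..., :x.size(1)].transpose(1, 2)
        u = self.selective_ssm(F.silu(u))                           # SSM branch
        y = u * F.silu(gate)                                        # gating replaces a separate MLP block
        return self.out_proj(y) + residual

block = MambaBlockSketch(d_model=64, d_inner=128)
print(block(torch.randn(2, 16, 64)).shape)                          # torch.Size([2, 16, 64])
```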

Operating on byte-sized tokens, transformers scale poorly, since every token must "attend" to every other token, leading to O(n²) complexity. As a result, transformers opt to use subword tokenization to reduce the number of tokens in text; however, this leads to very large vocabulary tables and word embeddings.
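
A back-of-the-envelope comparison illustrates the scaling gap: attention forms an L × L interaction matrix, while a recurrent state space model performs one fixed-cost update per position.

```python
# Illustrative-only comparison of how cost grows with sequence length L:
# self-attention forms an L x L score matrix (O(L^2)), while a recurrent SSM
# touches each position once with a fixed-size state (O(L)).
for L in (1_000, 10_000, 100_000):
    attn_scores = L * L          # pairwise token interactions per attention head
    ssm_steps = L                # one state update per token
    print(f"L={L:>7}: attention ~{attn_scores:,} vs SSM ~{ssm_steps:,}")
```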

MambaByte takes a different approach to language modeling, departing from standard token-based methods. Unlike traditional models that rely on breaking text into discrete subword units, MambaByte directly processes raw byte sequences. This eliminates the need for tokenization, potentially offering several advantages:

Subword tokenization introduces a number of quirks in LLMs, such as failure modes where LLMs cannot spell words, reverse certain words, or handle rare tokens; these failure modes are not present in byte-level tokenization.
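
The sketch below shows what byte-level "tokenization" amounts to in practice: the token vocabulary is simply the 256 possible byte values, so any text round-trips losslessly without a learned vocabulary table, at the cost of longer sequences than subword methods.

```python
# Byte-level tokenization as used by byte-level models such as MambaByte:
# each UTF-8 byte becomes one token id in [0, 255], so there are no rare or
# out-of-vocabulary tokens.
text = "Mamba models naïve spellings byte-by-byte."

byte_ids = list(text.encode("utf-8"))      # one token id per byte
print(len(text), len(byte_ids))            # multi-byte characters like 'ï' expand to 2 bytes
print(byte_ids[:10])

decoded = bytes(byte_ids).decode("utf-8")  # lossless round trip, no vocabulary table needed
assert decoded == text
```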
