Hubbry Logo
MultisetMultisetMain
Open search
Multiset
Community hub
Multiset
logo
7 pages, 0 posts
0 subscribers
Be the first to start a discussion here.
Be the first to start a discussion here.
Contribute something
Multiset
Multiset
from Wikipedia

In mathematics, a multiset (or bag, or mset) is a modification of the concept of a set that, unlike a set,[1] allows for multiple instances for each of its elements. The number of instances given for each element is called the multiplicity of that element in the multiset. As a consequence, an infinite number of multisets exist that contain only elements a and b, but vary in the multiplicities of their elements:

  • The set {a, b} contains only elements a and b, each having multiplicity 1 when {a, b} is seen as a multiset.
  • In the multiset {a, a, b}, the element a has multiplicity 2, and b has multiplicity 1.
  • In the multiset {a, a, a, b, b, b}, a and b both have multiplicity 3.

These objects are all different when viewed as multisets, although they are the same set, since they all consist of the same elements. As with sets, and in contrast to tuples, the order in which elements are listed does not matter in discriminating multisets, so {a, a, b} and {a, b, a} denote the same multiset. To distinguish between sets and multisets, a notation that incorporates square brackets is sometimes used: the multiset {a, a, b} can be denoted by [a, a, b].[2]

The cardinality or "size" of a multiset is the sum of the multiplicities of all its elements. For example, in the multiset {a, a, b, b, b, c} the multiplicities of the members a, b, and c are respectively 2, 3, and 1, and therefore the cardinality of this multiset is 6.

Nicolaas Govert de Bruijn coined the word multiset in the 1970s, according to Donald Knuth.[3]: 694  However, the concept of multisets predates the coinage of the word multiset by many centuries. Knuth himself attributes the first study of multisets to the Indian mathematician Bhāskarāchārya, who described permutations of multisets around 1150. Other names have been proposed or used for this concept, including list, bunch, bag, heap, sample, weighted set, collection, and suite.[3]: 694 

History

[edit]

Wayne Blizard traced multisets back to the very origin of numbers, arguing that "in ancient times, the number n was often represented by a collection of n strokes, tally marks, or units."[4] These and similar collections of objects can be regarded as multisets, because strokes, tally marks, or units are considered indistinguishable. This shows that people implicitly used multisets even before mathematics emerged.

Practical needs for this structure have caused multisets to be rediscovered several times, appearing in literature under different names.[5]: 323  For instance, they were important in early AI languages, such as QA4, where they were referred to as bags, a term attributed to Peter Deutsch.[6] A multiset has been also called an aggregate, heap, bunch, sample, weighted set, occurrence set, and fireset (finitely repeated element set).[5]: 320 [7]

Although multisets were used implicitly from ancient times, their explicit exploration happened much later. The first known study of multisets is attributed to the Indian mathematician Bhāskarāchārya circa 1150, who described permutations of multisets.[3]: 694  The work of Marius Nizolius (1498–1576) contains another early reference to the concept of multisets.[8] Athanasius Kircher found the number of multiset permutations when one element can be repeated.[9] Jean Prestet published a general rule for multiset permutations in 1675.[10] John Wallis explained this rule in more detail in 1685.[11]

Multisets appeared explicitly in the work of Richard Dedekind.[12][13]

Other mathematicians formalized multisets and began to study them as precise mathematical structures in the 20th century. For example, Hassler Whitney (1933) described generalized sets ("sets" whose characteristic functions may take any integer value: positive, negative or zero).[5]: 326 [14]: 405  Monro (1987) investigated the category Mul of multisets and their morphisms, defining a multiset as a set with an equivalence relation between elements "of the same sort", and a morphism between multisets as a function that respects sorts. He also introduced a multinumber: a function f (x) from a multiset to the natural numbers, giving the multiplicity of element x in the multiset. Monro argued that the concepts of multiset and multinumber are often mixed indiscriminately, though both are useful.[5]: 327–328 [15]

Examples

[edit]

One of the simplest and most natural examples is the multiset of prime factors of a natural number n. Here the underlying set of elements is the set of prime factors of n. For example, the number 120 has the prime factorization which gives the multiset {2, 2, 2, 3, 5}.

A related example is the multiset of solutions of an algebraic equation. A quadratic equation, for example, has two solutions. However, in some cases they are both the same number. Thus the multiset of solutions of the equation could be {3, 5}, or it could be {4, 4}. In the latter case it has a solution of multiplicity 2. More generally, the fundamental theorem of algebra asserts that the complex solutions of a polynomial equation of degree d always form a multiset of cardinality d.

A special case of the above are the eigenvalues of a matrix, whose multiplicity is usually defined as their multiplicity as roots of the characteristic polynomial. However two other multiplicities are naturally defined for eigenvalues, their multiplicities as roots of the minimal polynomial, and the geometric multiplicity, which is defined as the dimension of the kernel of AλI (where λ is an eigenvalue of the matrix A). These three multiplicities define three multisets of eigenvalues, which may be all different: Let A be a n × n matrix in Jordan normal form that has a single eigenvalue. Its multiplicity is n, its multiplicity as a root of the minimal polynomial is the size of the largest Jordan block, and its geometric multiplicity is the number of Jordan blocks.

Definition

[edit]

A multiset may be formally defined as an ordered pair (U, m) where U is a set called a universe or the underlying set, and is a function from U to the nonnegative integers. The value for an element is called the multiplicity of in the multiset and intepreted as the number of occurrences of in the multiset.

The support of a multiset is the subset of formed by the elements such that . A finite multiset is a multiset with a finite support. Most authors define multisets as finite multisets. This is the case in this article, where, unless otherwise stated, all multisets are finite multisets.

Some authors[16] define multisets with the additional constraint that for every , or, equivalently, the support equals the underlying set. Multisets with infinite multiplicities have also been studied;[17] they are not considered in this article. Some authors[who?] define a multiset in terms of a finite index set and a function where the multiplicity of an element is , the number of elements of that get mapped to by .

Multisets may be represented as sets, with some elements repeated. For example, the multiset with support and multiplicity function such that can be represented as {a, a, b}. A more compact notation, in case of high multiplicities is for the same multiset.

If a multiset with support included in is often represented as to which the computation rules of indeterminates can be applied; that is, exponents 1 and factors with exponent 0 can be removed, and the multiset does not depend on the order of the factors. This allows extending the notation to infinite underlying sets as An advantage of notation is that it allows using the notation without knowing the exact support. For example, the prime factors of a natural number form a multiset such that

The finite subsets of a set are exactly the multisets with underlying set , such that for every .

Basic properties and operations

[edit]

Elements of a multiset are generally taken in a fixed set U, sometimes called a universe, which is often the set of natural numbers. An element of U that does not belong to a given multiset is said to have a multiplicity 0 in this multiset. This extends the multiplicity function of the multiset to a function from U to the set of non-negative integers. This defines a one-to-one correspondence between these functions and the multisets that have their elements in U.

This extended multiplicity function is commonly called simply the multiplicity function, and suffices for defining multisets when the universe containing the elements has been fixed. This multiplicity function is a generalization of the indicator function of a subset, and shares some properties with it.

The support of a multiset in a universe U is the underlying set of the multiset. Using the multiplicity function , it is characterized as

A multiset is finite if its support is finite, or, equivalently, if its cardinality is finite. The empty multiset is the unique multiset with an empty support (underlying set), and thus a cardinality 0.

The usual operations of sets may be extended to multisets by using the multiplicity function, in a similar way to using the indicator function for subsets. In the following, A and B are multisets in a given universe U, with multiplicity functions and

  • Inclusion: A is included in B, denoted AB, if
  • Union: the union (called, in some contexts, the maximum or lowest common multiple) of A and B is the multiset C with multiplicity function[13]
  • Intersection: the intersection (called, in some contexts, the infimum or greatest common divisor) of A and B is the multiset C with multiplicity function
  • Sum: the sum of A and B is the multiset C with multiplicity function It may be viewed as a generalization of the disjoint union of sets. It defines a commutative monoid structure on the finite multisets in a given universe. This monoid is a free commutative monoid, with the universe as a basis.
  • Difference: the difference of A and B is the multiset C with multiplicity function

Two multisets are disjoint if their supports are disjoint sets. This is equivalent to saying that their intersection is the empty multiset or that their sum equals their union.

There is an inclusion–exclusion principle for finite multisets (similar to the one for sets), stating that a finite union of finite multisets is the difference of two sums of multisets: in the first sum we consider all possible intersections of an odd number of the given multisets, while in the second sum we consider all possible intersections of an even number of the given multisets.[citation needed]

Counting multisets

[edit]
Bijection between 3-subsets of a 7-set (left)
and 3-multisets with elements from a 5-set (right)
So this illustrates that

The number of multisets of cardinality k, with elements taken from a finite set of cardinality n, is sometimes called the multiset coefficient or multiset number. This number is written by some authors as , a notation that is meant to resemble that of binomial coefficients; it is used for instance in (Stanley, 1997), and could be pronounced "n multichoose k" to resemble "n choose k" for Like the binomial distribution that involves binomial coefficients, there is a negative binomial distribution in which the multiset coefficients occur. Multiset coefficients should not be confused with the multinomial coefficients that occur in the multinomial theorem.

The value of multiset coefficients can be given explicitly as where the second expression is as a binomial coefficient;[a] many authors in fact avoid separate notation and just write binomial coefficients. So, the number of such multisets is the same as the number of subsets of cardinality k of a set of cardinality n + k − 1. The analogy with binomial coefficients can be stressed by writing the numerator in the above expression as a rising factorial power to match the expression of binomial coefficients using a falling factorial power:

For example, there are 4 multisets of cardinality 3 with elements taken from the set {1, 2} of cardinality 2 (n = 2, k = 3), namely {1, 1, 1}, {1, 1, 2}, {1, 2, 2}, {2, 2, 2}. There are also 4 subsets of cardinality 3 in the set {1, 2, 3, 4} of cardinality 4 (n + k − 1), namely {1, 2, 3}, {1, 2, 4}, {1, 3, 4}, {2, 3, 4}.

One simple way to prove the equality of multiset coefficients and binomial coefficients given above involves representing multisets in the following way. First, consider the notation for multisets that would represent {a, a, a, a, a, a, b, b, c, c, c, d, d, d, d, d, d, d} (6 as, 2 bs, 3 cs, 7 ds) in this form:

 •  •  •  •  •  •  |  •  •  |  •  •  •  |  •  •  •  •  •  •  •

This is a multiset of cardinality k = 18 made of elements of a set of cardinality n = 4. The number of characters including both dots and vertical lines used in this notation is 18 + 4 − 1. The number of vertical lines is 4 − 1. The number of multisets of cardinality 18 is then the number of ways to arrange the 4 − 1 vertical lines among the 18 + 4 − 1 characters, and is thus the number of subsets of cardinality 4 − 1 of a set of cardinality 18 + 4 − 1. Equivalently, it is the number of ways to arrange the 18 dots among the 18 + 4 − 1 characters, which is the number of subsets of cardinality 18 of a set of cardinality 18 + 4 − 1. This is thus is the value of the multiset coefficient and its equivalencies:

From the relation between binomial coefficients and multiset coefficients, it follows that the number of multisets of cardinality k in a set of cardinality n can be written Additionally,

Recurrence relation

[edit]

A recurrence relation for multiset coefficients may be given as with

The above recurrence may be interpreted as follows. Let be the source set. There is always exactly one (empty) multiset of size 0, and if n = 0 there are no larger multisets, which gives the initial conditions.

Now, consider the case in which n, k > 0. A multiset of cardinality k with elements from [n] might or might not contain any instance of the final element n. If it does appear, then by removing n once, one is left with a multiset of cardinality k − 1 of elements from [n], and every such multiset can arise, which gives a total of possibilities.

If n does not appear, then our original multiset is equal to a multiset of cardinality k with elements from [n − 1], of which there are

Thus,

Generating series

[edit]

The generating function of the multiset coefficients is very simple, being As multisets are in one-to-one correspondence with monomials, is also the number of monomials of degree d in n indeterminates. Thus, the above series is also the Hilbert series of the polynomial ring

As is a polynomial in n, it and the generating function are well defined for any complex value of n.

Generalization and connection to the negative binomial series

[edit]

The multiplicative formula allows the definition of multiset coefficients to be extended by replacing n by an arbitrary number α (negative, real, or complex):

With this definition one has a generalization of the negative binomial formula (with one of the variables set to 1), which justifies calling the negative binomial coefficients:

This Taylor series formula is valid for all complex numbers α and X with |X| < 1. It can also be interpreted as an identity of formal power series in X, where it actually can serve as definition of arbitrary powers of series with constant coefficient equal to 1; the point is that with this definition all identities hold that one expects for exponentiation, notably

and formulas such as these can be used to prove identities for the multiset coefficients.

If α is a nonpositive integer n, then all terms with k > −n are zero, and the infinite series becomes a finite sum. However, for other values of α, including positive integers and rational numbers, the series is infinite.

Applications

[edit]

Multisets have various applications.[7] They are becoming fundamental in combinatorics.[18][19][20][21] Multisets have become an important tool in the theory of relational databases, which often uses the synonym bag.[22][23][24] For instance, multisets are often used to implement relations in database systems. In particular, a table (without a primary key) works as a multiset, because it can have multiple identical records. Similarly, SQL operates on multisets and returns identical records. For instance, consider "SELECT name FROM Student". In the case that there are multiple records with name "Sara" in the student table, all of them are shown. That means the result of an SQL query is a multiset; if the result were instead a set, the repetitive records in the result set would have been eliminated. Another application of multisets is in modeling multigraphs. In multigraphs there can be multiple edges between any two given vertices. As such, the entity that specifies the edges is a multiset, and not a set.

There are also other applications. For instance, Richard Rado used multisets as a device to investigate the properties of families of sets. He wrote, "The notion of a set takes no account of multiple occurrence of any one of its members, and yet it is just this kind of information that is frequently of importance. We need only think of the set of roots of a polynomial f (x) or the spectrum of a linear operator."[5]: 328–329 

Generalizations

[edit]

Different generalizations of multisets have been introduced, studied and applied to solving problems.

  • Signed multisets (in which multiplicity of an element can be any integer)[25]
  • Real-valued multisets (in which multiplicity of an element can be any real number)[26]
  • Fuzzy multisets[27]
  • Rough multisets[28]
  • Hybrid sets[29]
  • Multisets whose multiplicity is any real-valued step function[30]
  • Soft multisets[31]
  • Soft fuzzy multisets[32]
  • Named sets (unification of all generalizations of sets)[33][34][35][36]

See also

[edit]
  • Frequency (statistics) as multiplicity analog
  • Quasi-sets
  • Set theory
  • Bag-of-words model
  • Learning materials related to Partitions of multisets at Wikiversity

Notes

[edit]

References

[edit]
Revisions and contributorsEdit on WikipediaRead on Wikipedia
from Grokipedia
In , a multiset (also known as a or mset) is a of a set that permits multiple instances of elements, where each element has an associated multiplicity or indicating how many times it appears. Unlike standard sets, which enforce uniqueness and disregard order, multisets emphasize the count of occurrences while remaining unordered collections. Formally, a multiset can be constructed as a pair consisting of a base set of distinct elements and a function assigning a non-negative integer multiplicity to each element, such as the multiset represented by {a: 2, b: 3, d: 1}, meaning a appears twice, b three times, and d once. Common operations on multisets include the union, which takes the maximum multiplicity for each element across the two multisets; the intersection, which takes the minimum multiplicity; and the sum (or multiset addition), which adds the multiplicities element-wise. These operations extend set-theoretic concepts to handle repetitions systematically, enabling comparisons and manipulations based on cardinality, where the total cardinality of a multiset is the sum of all multiplicities. The foundational development of multiset theory occurred in the late , with Wayne D. Blizard establishing a rigorous axiomatic framework in 1989 that defines multisets as structures allowing finite repetitions of elements within a setting. Blizard's subsequent 1991 survey traces the evolution of multiset theory from early informal uses in and to a formalized branch of , highlighting its distinctions from related concepts like fuzzy sets or sequences. Multisets play a crucial role in , where they model selections with repetition, such as the number of multisets of size k from n types given by the \binom{n+k-1}{k}, often called "stars and bars." In , they underpin data structures like bags or multisets in libraries (e.g., C++'s std::multiset) for efficient storage and querying of elements with duplicates, and they appear in algorithms for , such as multiset intersection protocols. Further applications include for knowledge representation, software verification with constraints, and in and , where multisets facilitate frequency-based similarity measures like the over repetitions.

Fundamentals

Definition

In mathematics, a multiset is a generalization of the concept of a set that allows multiple instances, or multiplicities, of elements, thereby distinguishing it from standard sets where elements are unique. Formally, a multiset MM is defined as an ordered pair (X,μ)(X, \mu), where XX is a set (often called the underlying set or the domain) and μ:XN0\mu: X \to \mathbb{N}_0 is a multiplicity function that assigns to each element xXx \in X a non-negative integer μ(x)\mu(x) representing the number of times xx occurs in the multiset. This definition assumes familiarity with basic set theory, including the notions of sets and functions. Alternative formalizations of a multiset include representing it as a set in which elements may appear more than once, such as {a,a,b}\{a, a, b\}, where repetitions are explicitly listed; however, this notation is informal and less suitable for infinite or abstract contexts. Another equivalent approach specifies a multiset via its support—the subset of elements with positive multiplicity—and the corresponding multiplicity map restricted to that support, ensuring μ(x)>0\mu(x) > 0 only for xx in the support. In a more general transfinite setting, as introduced by Richard Rado, a multiset can be viewed as a function from a set to the class of cardinal numbers, with the domain consisting of elements assigned nonzero cardinals. The of a multiset M=(X,μ)M = (X, \mu) is defined as the sum of the multiplicities over its underlying set, given by M=xXμ(x),|M| = \sum_{x \in X} \mu(x), which counts the total number of element occurrences, including repetitions. The empty multiset, denoted \emptyset or 00, is the pair (,μ)(\emptyset, \mu) where the underlying set is empty (equivalently, μ(x)=0\mu(x) = 0 for all xx in some ), and it has 0. A singleton multiset consists of an underlying set with one element xx and μ(x)=1\mu(x) = 1, yielding 1.

Examples

A shopping containing two apples and one exemplifies a multiset, which can be denoted as {apple, apple, } to indicate repetitions or more compactly as {apple: 2, : 1}, preserving the multiplicity of each item. In comparison, viewing the same through the lens of a standard set yields {apple, }, which eliminates multiplicity information and thus fails to capture the full contents. A deck of cards, particularly when multiple decks are combined in games allowing duplicates of suits and ranks, represents a multiset where identical cards coexist without regard to order. In , the prime of an provides a clear example: the number 12 decomposes into the multiset {2, 2, 3}, reflecting the repeated prime factor 2. Multisets are commonly notated by listing elements with repetitions, such as {a, a, b}, or by specifying multiplicities, as in {a: 2, b: 1} to denote two instances of a and one of b. The of a multiset, or its total size, sums the multiplicities of its elements; for instance, the multiset {1: 3, 2: 1} has 4.

Properties and Operations

Basic Properties

A multiset MM over a base set XX is defined by its multiplicity function μM:XN0\mu_M: X \to \mathbb{N}_0, where N0\mathbb{N}_0 denotes the non-negative integers, and the multiplicity μM(x)\mu_M(x) indicates the number of occurrences of each element xXx \in X. Two multisets MM and NN over the same base set are equal, denoted M=NM = N, their multiplicity functions agree for every element, that is, μM(x)=μN(x)\mu_M(x) = \mu_N(x) for all xXx \in X. This equality condition directly follows from the definition of multisets via multiplicity functions, as any difference in multiplicities would distinguish the collections. The inclusion relation for multisets generalizes set inclusion to account for multiplicities. A multiset MM is a submultiset of another multiset NN, denoted MNM \subseteq N, if μM(x)μN(x)\mu_M(x) \leq \mu_N(x) for every xXx \in X. This relation is reflexive (MMM \subseteq M) and transitive (MNM \subseteq N and NPN \subseteq P imply MPM \subseteq P), forming a partial order on the collection of multisets over XX. A proper submultiset holds when the inequality is strict for at least one xx, ensuring MNM \neq N. To verify inclusion, one compares multiplicities element-wise; if all μM(x)μN(x)\mu_M(x) \leq \mu_N(x) and equality holds everywhere, then M=NM = N, otherwise MM is a proper submultiset if the condition holds without equality. The support of a multiset MM, denoted supp(M)\operatorname{supp}(M), is the underlying set consisting of elements with positive multiplicity: supp(M)={xXμM(x)>0}\operatorname{supp}(M) = \{ x \in X \mid \mu_M(x) > 0 \}. This support forms an ordinary set without repetitions and determines the distinct elements present in MM, relating directly to the base set XX as a where multiplicities vanish outside supp(M)\operatorname{supp}(M). For submultisets, supp(M)supp(N)\operatorname{supp}(M) \subseteq \operatorname{supp}(N) if MNM \subseteq N, since zero multiplicity in MM is compatible with positive in NN, but the converse does not hold without multiplicity checks. When the base set XX is equipped with a \leq, an ordered multiset inherits this order on its elements, allowing identification of based on the order of distinct items in the support. The maximal element of MM is the largest xsupp(M)x \in \operatorname{supp}(M) under \leq, regardless of multiplicity, and similarly the minimal element is the smallest such xx. In this totally ordered context, these extremal elements coincide with the greatest and least elements of the support, providing endpoints for the ordered structure.

Arithmetic Operations

Multisets support a variety of arithmetic operations that extend their set-theoretic counterparts by accounting for element multiplicities, enabling the of new multisets from existing ones. These operations are defined via the multiplicity function μ:UN0\mu: U \to \mathbb{N}_0, where UU is the universal set and N0\mathbb{N}_0 denotes non-negative integers, ensuring that the resulting multiplicity for each element is computed based on the input multiplicities. Such operations are fundamental in and for modeling collections with repetitions, like bags in databases or weighted selections. The union of two multisets MM and NN, denoted MNM \cup N, is the multiset whose multiplicity function is given by μMN(x)=max(μM(x),μN(x))\mu_{M \cup N}(x) = \max(\mu_M(x), \mu_N(x)) for all xUx \in U. This operation retains the highest occurrence count for each element, analogous to set union but preserving multiplicities. For example, if M={a:2,b:1}M = \{a: 2, b: 1\} and N={a:1,c:3}N = \{a: 1, c: 3\}, then MN={a:2,b:1,c:3}M \cup N = \{a: 2, b: 1, c: 3\}. Union is commutative, as max(μM(x),μN(x))=max(μN(x),μM(x))\max(\mu_M(x), \mu_N(x)) = \max(\mu_N(x), \mu_M(x)); associative, since max(max(a,b),c)=max(a,max(b,c))\max(\max(a, b), c) = \max(a, \max(b, c)) for non-negative integers a,b,ca, b, c; and idempotent, with MM=MM \cup M = M. The intersection MNM \cap N takes the minimum multiplicity: μMN(x)=min(μM(x),μN(x))\mu_{M \cap N}(x) = \min(\mu_M(x), \mu_N(x)) for all xUx \in U, including only elements common to both with their overlapping counts. Using the prior example, MN={a:1}M \cap N = \{a: 1\}. Intersection shares the commutativity and associativity of union, via min(μM(x),μN(x))=min(μN(x),μM(x))\min(\mu_M(x), \mu_N(x)) = \min(\mu_N(x), \mu_M(x)) and min(min(a,b),c)=min(a,min(b,c))\min(\min(a, b), c) = \min(a, \min(b, c)), and is idempotent since MM=MM \cap M = M. The sum, or disjoint union, M+NM + N adds multiplicities directly: μM+N(x)=μM(x)+μN(x)\mu_{M + N}(x) = \mu_M(x) + \mu_N(x) for all xUx \in U, useful in combinatorial enumeration for combining independent selections. For the example multisets, M+N={a:3,b:1,c:3}M + N = \{a: 3, b: 1, c: 3\}. This operation is commutative, as addition of non-negative integers is commutative, and associative, since (a+b)+c=a+(b+c)(a + b) + c = a + (b + c). The difference MNM - N subtracts multiplicities, flooring at zero: μMN(x)=max(μM(x)μN(x),0)\mu_{M - N}(x) = \max(\mu_M(x) - \mu_N(x), 0) for all xUx \in U, removing elements from NN as much as possible from MM. In the example, MN={a:1,b:1}M - N = \{a: 1, b: 1\}, but NM={c:3}N - M = \{c: 3\}, highlighting its non-commutativity. Unlike the others, difference lacks general associativity or idempotence. The M×NM \times N extends pairs across elements, with multiplicity as the product: μM×N((x,y))=μM(x)μN(y)\mu_{M \times N}((x, y)) = \mu_M(x) \cdot \mu_N(y) for all x,yUx, y \in U, forming a multiset over U×UU \times U. For M={a:2}M = \{a: 2\} and N={b:1,c:2}N = \{b: 1, c: 2\}, M×NM \times N includes (a,b)(a, b) with multiplicity 2 and (a,c)(a, c) with multiplicity 4. This operation is commutative in the sense that M×N=N×MM \times N = N \times M up to relabeling of pairs and is associative when extended to multiple factors.

Enumeration

Recurrence Relations

The number of multisets of cardinality kk that can be formed from a base set of nn distinct types, often denoted b(n,k)b(n, k), counts the combinations where repetitions of types are permitted and the order of elements is irrelevant. A recursive approach computes b(n,k)b(n, k) via the relation b(n,k)=b(n1,k)+b(n,k1)b(n, k) = b(n-1, k) + b(n, k-1) for n1n \geq 1 and k1k \geq 1, with base cases b(n,0)=1b(n, 0) = 1 for all n0n \geq 0 (corresponding to the single empty multiset), b(0,0)=1b(0, 0) = 1, and b(0,k)=0b(0, k) = 0 for k>0k > 0 (as no positive-sized multiset is possible without types). This recurrence derives from partitioning multisets based on the highest type label, assuming types are labeled 11 through nn. Multisets excluding type nn match those of size kk from n1n-1 types, yielding b(n1,k)b(n-1, k) cases. Multisets including at least one type nn map bijectively to those of size k1k-1 from nn types by removing one instance of nn, yielding b(n,k1)b(n, k-1) cases. The then combines these disjoint cases. For illustration, the table below shows computed values of b(n,k)b(n, k) for small nn and kk, obtained by applying the recurrence iteratively from the base cases:
nkn \setminus k01234
010000
111111
212345
31361015
These values demonstrate the step-by-step buildup, such as b(3,2)=b(2,2)+b(3,1)=3+3=6b(3, 2) = b(2, 2) + b(3, 1) = 3 + 3 = 6. As a non-recursive intuition, the recurrence aligns with the stars and bars method, visualizing kk indistinguishable elements (stars) distributed among nn types via n1n-1 dividers (bars), though the recursive formulation emphasizes iterative construction over direct combinatorial placement.

Generating Functions

Generating functions provide a powerful tool for enumerating multisets by encoding the number of such objects of various sizes into a . For multisets formed from nn distinct types, the ordinary arises from the independent choice of multiplicity for each type, where the multiplicity mm of a type contributes the m=0xm=11x\sum_{m=0}^{\infty} x^m = \frac{1}{1-x}. Thus, the overall is the product i=1n11x=1(1x)n\prod_{i=1}^n \frac{1}{1-x} = \frac{1}{(1-x)^n}, which counts multisets of any size by the exponent of xx. This product form reflects the independence of choices across types, as each factor corresponds to one type and the coefficient of xkx^k sums over all ways to distribute kk indistinguishable elements into nn types allowing repetitions. The coefficient of xkx^k in 1(1x)n\frac{1}{(1-x)^n}, denoted [xk]1(1x)n[x^k] \frac{1}{(1-x)^n}, gives the number of kk-element multisets from nn types, which equals the binomial coefficient (n+k1k)\binom{n+k-1}{k}. This follows from the generalized , where (1x)n=k=0(n+k1k)xk(1-x)^{-n} = \sum_{k=0}^{\infty} \binom{n+k-1}{k} x^k. For unrestricted multiset size, the full series k=0b(n,k)xk=1(1x)n\sum_{k=0}^{\infty} b(n,k) x^k = \frac{1}{(1-x)^n}, with b(n,k)b(n,k) denoting the number of kk-multisets, provides a compact representation for problems. In the multivariate case, when types are labeled and multiplicities tracked separately, the generating function becomes i=1nm=0xim=i=1n11xi\prod_{i=1}^n \sum_{m=0}^{\infty} x_i^m = \prod_{i=1}^n \frac{1}{1 - x_i}. Here, the of x1a1xnanx_1^{a_1} \cdots x_n^{a_n} is 1 if each ai0a_i \geq 0, corresponding to the unique multiset with those multiplicities. This form extends the univariate case and is useful for distinguishing types in applications like partition enumeration. For example, consider integer partitions, which can be viewed as multisets of positive integers. The generating function for unrestricted partitions (allowing repeated parts) is i=111xi\prod_{i=1}^{\infty} \frac{1}{1 - x^i}, reflecting unlimited repetitions of each part size. In contrast, partitions into distinct parts (equivalent to sets, not multisets) have generating function i=1(1+xi)\prod_{i=1}^{\infty} (1 + x^i), where each part appears at most once, leading to different coefficient growth rates.

Connections to Binomial Expansions

The enumeration of multisets is intimately connected to the generalized binomial theorem, particularly through expansions involving negative exponents. The number of multisets of cardinality kk chosen from nn distinct elements is given by the binomial coefficient (n+k1k)\binom{n + k - 1}{k}, which arises as the coefficient in the series expansion (1x)n=k=0(n+k1k)xk(1 - x)^{-n} = \sum_{k=0}^{\infty} \binom{n + k - 1}{k} x^k. This identity follows from the generalized binomial theorem, where the binomial coefficient for negative upper index is (nk)=(1)k(n+k1k)\binom{-n}{k} = (-1)^k \binom{n + k - 1}{k}, linking the algebraic expansion directly to multiset counting upon accounting for the sign alternation. The negative binomial series (1x)n(1 - x)^{-n} converges for x<1|x| < 1, providing an infinite power series whose coefficients precisely enumerate unrestricted multisets of varying sizes from nn types. For instance, the coefficient of xkx^k counts the ways to select kk elements with repetition allowed from nn options, equivalent to the number of non-negative integer solutions to x1++xn=kx_1 + \cdots + x_n = k. A combinatorial proof of this coefficient uses the stars and bars theorem: represent the kk selected elements as kk stars (*) and the n1n-1 dividers between the nn types as bars (| ), forming a sequence of n+k1n + k - 1 positions where n1n-1 are chosen for bars, yielding (n+k1n1)=(n+k1k)\binom{n + k - 1}{n-1} = \binom{n + k - 1}{k} arrangements. This visualization maps directly to distributing indistinguishable selections into distinct bins, confirming the multiset count. For generalizations, when multiplicities are weighted or bounded, the exponent in the binomial expansion can be adjusted to reflect the structure, such as using fractional or variable exponents in the series to model weighted selections. An example of the negative upper index connection is that the number of multisets of size kk from nn elements equals (-1)^k \binom{-n}{k}, illustrating how the combinatorial interpretation "negates" the usual subset selection to allow repetitions. The hockey-stick identity further relates multiset counts by summing them: k=0m(n+k1k)=(n+mm)\sum_{k=0}^{m} \binom{n + k - 1}{k} = \binom{n + m}{m}, which enumerates all multisets of cardinality at most mm from nn elements, as the partial sum of the negative binomial series coefficients. This identity provides a closed form for cumulative multiset enumerations, bridging individual counts to total selections up to a limit.

Applications

Combinatorics and Algebra

In partition theory, multisets provide a natural framework for representing integer partitions, where the parts of the partition correspond to the elements of the multiset, allowing repetitions to indicate multiplicity in the summands. This perspective is particularly useful for typed or colored partitions, where distinct types distinguish otherwise identical parts, extending classical unrestricted partitions to more general combinatorial objects. For instance, a multiset partition algebra has been developed to handle such structures, generalizing the Robinson-Schensted-Knuth correspondence to insertions on multiset partitions. Multisets play a key role in the theory of symmetric functions by extending classical bases like power sums and elementary symmetric polynomials to account for multiplicities. The power sum symmetric functions pk=ixikp_k = \sum_i x_i^k and elementary symmetric polynomials ek=1i1<<ikxi1xike_k = \sum_{1 \leq i_1 < \cdots < i_k} x_{i_1} \cdots x_{i_k} can be generalized to multisets, where variables are weighted by their multiplicities, enabling the enumeration of symmetric objects with repeated labels. This extension aligns with Joyal's symmetric species framework, where multisets serve as the domain for functors mapping finite sets to combinatorial structures, yielding generating functions that capture multiplicity-adjusted symmetries. Seminal work in this area establishes operations like sum, product, and plethysm for symmetric species, directly corresponding to those on symmetric functions. In graph theory, multisets model the edges of multigraphs, where parallel edges between vertices form a multiset of edge incidences, allowing multiple connections without loops in simple cases. The degree sequence of a multigraph, defined as the nonincreasing list of vertex degrees, is inherently a multiset, as degrees can repeat across vertices, facilitating the study of graphical realizations and extremal properties. For example, realizing a given multiset as a degree sequence in a multigraph requires the sum of degrees to be even and no degree exceeding the possible connections. Combinatorial identities involving multisets often center on permutation counts, where the number of distinct permutations of a multiset with nn elements and multiplicities k1,k2,k_1, k_2, \dots for distinct types is given by the multinomial coefficient n!k1!k2!\frac{n!}{k_1! k_2! \cdots}. This formula arises from dividing the total permutations of a set by the repetitions within each type, providing a core identity for counting arrangements with indistinguishable objects. Such identities underpin broader results, like those in the , and extend to colored multisets for multifactorial generalizations. In algebraic structures, multisets appear in commutative algebra through multiset ideals, which generalize monomial ideals by allowing multiplicities in generators, particularly in zero-dimensional cases where they locally resemble monomial ideals under Gröbner bases. These ideals support efficient computations via algorithms like Cerlienco-Mureddu for lexicographic standard monomials. Additionally, in combinatorial species theory, multisets form the basis for species functors, as introduced by Joyal, enabling the algebraic manipulation of labeled structures through exponential generating functions that account for relabeling and multiplicity. Multisets are treated via extensions such as symmetric species, facilitating compositions like substitutions for complex enumerative problems. A specific application arises in necklace counting, where multisets encode color repetitions to determine distinct configurations under rotation. Bijections to multisets with divisible subset sums simplify enumeration for bounded colors. This approach equates necklaces with at most qq colors to multisets of integers modulo nn whose subset sums are divisible by nn, providing an explicit combinatorial correspondence.

Computer Science and Data Structures

In computer science, multisets are implemented using data structures that efficiently handle duplicate elements and their multiplicities, enabling operations like insertion, deletion, and querying counts. Balanced binary search trees, such as red-black trees, are commonly used for ordered multisets, as in the C++ standard library's std::multiset, which maintains elements in sorted order while allowing multiples of the same key. This structure supports logarithmic time complexity for key operations, making it suitable for scenarios requiring sorted access or range queries. Alternatively, hash maps provide an unordered implementation by associating each unique element with a counter for its multiplicity, as seen in libraries like Apache Commons Collections' HashMultiSet, which offers average-case constant-time insertions and lookups at the expense of order preservation. Algorithms for multiset operations, such as union and , leverage the underlying structure for efficiency. For sorted multisets, union and can be computed in linear time O(n + m) using merge-like procedures on sorted sequences, similar to those in C++'s <algorithm> header for set operations adapted to handle multiplicities by taking maximum or minimum counts, respectively. These algorithms find applications in sorting with duplicates, where multisets preserve order and multiplicity during sorts, avoiding the need for auxiliary tracking structures. Arithmetic operations like multiset union and , as defined mathematically, align with these implementations for practical computation. Programming languages provide built-in or library support for multisets through specialized classes. In Python, the collections.Counter class implements a multiset as a subclass, enabling frequency counting of hashable objects with methods like update() for adding elements and most_common() for retrieving elements by count. Similarly, Java's library offers a Multiset interface with implementations like HashMultiset and TreeMultiset, supporting operations such as count(), add(), and remove() to manage multiplicities efficiently. In databases, multisets model duplicate records naturally, with SQL's GROUP BY clause combined with COUNT() aggregating rows by key while tracking frequencies, as standardized in and implemented in systems like SQL Server. For example, querying SELECT column, COUNT(*) FROM table GROUP BY column produces a multiset view of unique values and their occurrences, facilitating duplicate without explicit multiplicity storage. The time and space complexities of multiset operations vary by implementation. Tree-based multisets, like std::multiset, achieve O(log n) time for insert, delete, and search operations, with O(n) space for n elements, due to the balanced overhead. Hash-based versions offer amortized O(1) time for these operations but may degrade to O(n) in worst-case scenarios from hash collisions, still using O(n) space. As of 2025, modern applications include , where bag-of-words models in use CountVectorizer to represent documents as multisets of terms for frequency-based feature extraction in tasks like text classification. In search engines, term frequencies are handled as multisets to compute relevance scores, such as in TF-IDF weighting, powering inverted indexes in systems like .

Extensions

Finite Variations

A bounded multiset is a generalization of a standard multiset where the multiplicity function μ satisfies μ(x) ≤ m for each element x and some fixed positive integer m, restricting the number of occurrences of any single element. This constraint models scenarios in combinatorics and computer science where repetitions are limited, such as in resource allocation or extremal set theory analogs like intersecting families of k-multisets from ^m. The number of bounded multisets of cardinality k over an n-element universe is given by the coefficient of z^k in the generating function (1 + z + ⋯ + z^m)^n, which expands using the finite geometric series formula \frac{1 - z^{m+1}}{1 - z}. Counting such objects can also employ inclusion-exclusion principles to account for violations of the bound. An m-uniform multiset extends this idea by requiring exactly m occurrences for every element in its support, meaning the support has size k and the total is mk for some k. These structures arise in the study of multiset permutations, where arrangements of such multisets exhibit symmetries analyzable via Gray codes or cyclic sieving phenomena. For instance, the multiset M(n, m) consists of m copies each of n distinct symbols, and its permutations form the set P(n, m) with \frac{(nm)!}{\prod_{i=1}^n m!}. Bounded multisets relate closely to combinatorial designs, particularly block designs, where blocks are interpreted as multisets of points with multiplicity constraints reflecting replication numbers. In such representations, the collection of blocks forms a multiset itself, allowing repeated blocks, and the between blocks is computed as the sum of products of point multiplicities. Automorphisms preserve these multiplicities, ensuring the design's symmetry under point permutations that map blocks to equivalent multisets. This framework generalizes classical balanced incomplete block designs to accommodate bounded repetitions, useful in statistical and applications. Properties of bounded multisets require adjustments to standard arithmetic operations to maintain the multiplicity bound. As an example, the bounded reformulates the input as a bounded multiset of item types, where each type i has multiplicity at most u_i (the availability bound), weights w_i, and values v_i. The goal is to select a submultiset S with total weight ∑{x ∈ S} w_x ≤ W (capacity) that maximizes ∑{x ∈ S} v_x, solvable via dynamic programming in O(n W ∑ u_i) time. This multiset perspective highlights the constraints on repetitions, distinguishing it from unbounded variants.

Infinite and Fuzzy Multisets

Infinite multisets extend the standard multiset framework to scenarios where the support—the set of elements with positive multiplicity—may be infinite. Formally, given a countable universe XX, an infinite multiset is defined by a multiplicity function μ:XN{0,}\mu: X \to \mathbb{N} \cup \{0, \infty\} (or more generally to cardinals), where the total cardinality xXμ(x)\sum_{x \in X} \mu(x) may be infinite. Cases with finite support, where μ(x)>0\mu(x) > 0 for only finitely many xx, align with finite multisets but highlight the boundary with infinite structures. These constructions appear in measure theory, for instance, as discrete atomic measures where multiplicities represent point masses. A key property of infinite multisets is that the may be infinite, with the m-cardinality providing a unified measure incorporating both the count of distinct elements and their multiplicities, yielding transfinite m-cardinals. In , infinite multisets can model local multiplicity in spaces with accumulation points, though operations like may not extend unambiguously. Counting infinite multisets poses challenges beyond finite cases, with cardinality addressed through the m-cardinality, a unified measure incorporating both the of distinct elements and their multiplicities, yielding transfinite m-cardinals strictly less than 0\aleph_0. Fuzzy multisets further generalize by replacing multiplicities with degrees of membership in the unit interval, defined as a function μ:X[0,1]\mu: X \to [0,1] for each element, capturing partial or vague repetitions beyond crisp counts. This formulation extends Zadeh's s—where membership degrees model uncertainty—by incorporating multiset-like repetition through aggregated or sequenced degrees within [0,1], as formalized by Yager. Basic operations follow fuzzy set conventions: intersection via pointwise minimum, μAB(x)=min(μA(x),μB(x))\mu_{A \cap B}(x) = \min(\mu_A(x), \mu_B(x)), and union via maximum, μAB(x)=max(μA(x),μB(x))\mu_{A \cup B}(x) = \max(\mu_A(x), \mu_B(x)), preserving Zadeh's extension principle for function propagation under vagueness. For fuzzy multisets, properties include α\alpha-cuts to convert to crisp multisets: for α[0,1]\alpha \in [0,1], the α\alpha-cut [μ]α[\mu]_\alpha is the multiset where each xx has multiplicity equal to the number of membership degrees α\geq \alpha, yielding nested crisp structures as α\alpha increases. Inverse α\alpha-cuts, defined as multiplicities for degrees <α< \alpha, are monotonic increasing and facilitate decomposition, though the ground set does not fully decompose into α\alpha-cut and inverse pairs unlike in standard fuzzy sets. Convergence for infinite fuzzy multisets requires the integral or sum of degrees to be finite, ensuring bounded total membership akin to probability measures. These extensions tease applications in for handling vague collections, with fuller details in dedicated sections on uses. For example, in , a fuzzy multiset might represent a as a collection of terms where each term tt has a relevance score μ(t)[0,1]\mu(t) \in [0,1] reflecting contextual importance or weighted frequency, enabling nuanced similarity computations.

Historical Context

Origins and Early Concepts

The concept of multisets, which permit multiple occurrences of elements unlike traditional sets, traces its implicit origins to early combinatorial problems involving repetitions. In the , the Indian mathematician addressed permutations and combinations with repetition in his treatise Lilavati, effectively employing multiset-like structures to count arrangements where identical items could recur, such as in poetic meters or selections from limited types. This approach anticipated formal multiset theory by treating repetitions as inherent to the collection rather than distinct entities. In the , Leonhard Euler's work on integer partitions during the further exemplified implicit multiset usage as a precursor to 19th-century . Euler analyzed ways to express numbers as sums of positive integers, disregarding order but allowing repeated parts, as seen in his derivations for partition counts; this treated the parts as a multiset whose elements could repeat without violating the structure. Such explorations in highlighted the utility of repetitions, influencing later combinatorial methods. The formal distinction of multisets from standard sets emerged in the mid-20th century amid the development of axiomatic . The Bourbaki group's Théorie des Ensembles (starting 1939, English 1968) rigorously defined sets via , prohibiting duplicate elements and emphasizing unique membership, which underscored the need for a separate construct to handle multiplicities in applications like algebra and geometry. By the 1950s, N. G. de Bruijn utilized multiset concepts in on sequences, paving the way for formalization. then popularized the idea in his 1968 text , Volume 1, referring to multisets as "bags" in discussions of data structures and combinatorial algorithms, where repetitions were essential for unordered collections.

Key Developments and Contributors

The term "multiset" was coined by mathematician in the 1970s through private correspondence with , who subsequently popularized the concept and defined key operations on multisets in his seminal 1973 work, , Volume 3: Sorting and Searching. Knuth's formalization integrated multisets into algorithmic analysis, particularly for sorting and problems, establishing them as a fundamental structure in . De Bruijn further advanced multiset theory in during the 1970s, developing generating functions to count multisets and their partitions. These methods linked multisets to broader combinatorial identities, influencing subsequent work on cycle indices and in algebra. In the late , Wayne D. Blizard established a rigorous axiomatic framework for multiset theory in , allowing finite repetitions of elements. Blizard's 1991 survey traced the evolution of multiset theory from early informal uses in and statistics to a formalized branch of , highlighting its distinctions from related concepts like fuzzy sets or sequences. The foundations for generalizations like fuzzy multisets were laid by Lotfi A. Zadeh's 1965 introduction of fuzzy sets, which allowed graded memberships and inspired extensions to handle multiplicities. Ronald R. Yager formalized fuzzy multisets in 1986, defining them as structures where elements have multivalued fuzzy memberships, enabling applications in uncertain reasoning. In parallel, Zdzisław Pawlak's 1982 rough set theory incorporated multiset-like approximations for handling incomplete data, bridging multisets with granular computing in the 1980s. Modern developments include integrations of multisets into , such as multiset-enriched categories for modeling nondeterministic computations and the multiset monad. Milestones encompass the formal inclusion of multisets as data types in ISO/IEC 15909-1 (2004) for high-level Petri nets, standardizing their use in .

References

Add your contribution
Related Hubs
Contribute something
User Avatar
No comments yet.