Chinese restaurant process

from Wikipedia

In probability theory, the Chinese restaurant process is a discrete-time stochastic process, analogous to seating customers at tables in a restaurant. Imagine a restaurant with an infinite number of circular tables, each with infinite capacity. Customer 1 sits at the first table. The next customer either sits at the same table as customer 1, or the next table. This continues, with each customer choosing to either sit at an occupied table with a probability proportional to the number of customers already there (i.e., they are more likely to sit at a table with many customers than few), or an unoccupied table. At time n, the n customers have been partitioned among m ≤ n tables (or blocks of the partition). The results of this process are exchangeable, meaning the order in which the customers sit does not affect the probability of the final distribution. This property greatly simplifies a number of problems in population genetics, linguistic analysis, and image recognition.

The restaurant analogy first appeared in a 1985 write-up by David Aldous,[1] where it was attributed to Jim Pitman (who additionally credits Lester Dubins).[2]

An equivalent partition process was published a year earlier by Fred Hoppe,[3] using an "urn scheme" akin to Pólya's urn. In comparison with Hoppe's urn model, the Chinese restaurant process has the advantage that it naturally lends itself to describing random permutations via their cycle structure, in addition to describing random partitions.

Formal definition


For any positive integer $n$, let $\mathcal{P}_n$ denote the set of all partitions of the set $\{1, 2, 3, \dots, n\}$. The Chinese restaurant process takes values in the infinite Cartesian product $\prod_n \mathcal{P}_n$.

The value of the process at time $n$ is a partition $B_n$ of the set $\{1, 2, 3, \dots, n\}$, whose probability distribution is determined as follows. At time $n = 1$, the trivial partition $\{\{1\}\}$ is obtained (with probability one). At time $n + 1$ the element "$n + 1$" is either:

  1. added to one of the blocks of the partition $B_n$, where each block is chosen with probability $|b|/(n+1)$, where $|b|$ is the size of the block (i.e. number of elements), or
  2. added to the partition $B_n$ as a new singleton block, with probability $1/(n+1)$.

The random partition so generated has some special properties. It is exchangeable in the sense that relabeling $\{1, \dots, n\}$ does not change the distribution of the partition, and it is consistent in the sense that the law of the partition of $\{1, \dots, n-1\}$ obtained by removing the element $n$ from the random partition $B_n$ is the same as the law of the random partition $B_{n-1}$.

The probability assigned to any particular partition (ignoring the order in which customers sit around any particular table) is

$$\Pr(B_n = B) = \frac{\prod_{b \in B} (|b| - 1)!}{n!}, \qquad B \in \mathcal{P}_n,$$

where $b$ is a block in the partition $B$ and $|b|$ is the size of $b$.
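As a sanity check, these partition probabilities can be verified to sum to one by enumerating all set partitions of a small set. The sketch below is illustrative Python; the recursive `partitions` enumerator is a hand-rolled helper, not a library function.

```python
from math import factorial

def partitions(elems):
    """Recursively enumerate all set partitions of the list elems."""
    if not elems:
        yield []
        return
    first, rest = elems[0], elems[1:]
    for part in partitions(rest):
        # put `first` into each existing block, or into a new singleton block
        for i in range(len(part)):
            yield part[:i] + [part[i] + [first]] + part[i + 1:]
        yield part + [[first]]

def crp_prob(part, n):
    """Pr(B_n = B) = prod_{b in B} (|b| - 1)! / n!  (vanilla process)."""
    p = 1.0
    for block in part:
        p *= factorial(len(block) - 1)
    return p / factorial(n)

n = 5
total = sum(crp_prob(B, n) for B in partitions(list(range(1, n + 1))))
print(round(total, 10))  # -> 1.0
```

For $n = 5$ the enumeration visits all $52$ partitions (the fifth Bell number) and the probabilities sum to one, as required.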

The definition can be generalized by introducing a parameter $\theta > 0$ which modifies the probability of the new customer sitting at a new table to $\frac{\theta}{n + \theta}$ and correspondingly modifies the probability of them sitting at a table of size $|b|$ to $\frac{|b|}{n + \theta}$. The vanilla process introduced above can be recovered by setting $\theta = 1$. Intuitively, $\theta$ can be interpreted as the effective number of customers sitting at the first empty table.

Alternative definition


An equivalent, but subtly different way to define the Chinese restaurant process, is to let new customers choose companions rather than tables.[4] Customer $n + 1$ chooses to sit at the same table as any one of the $n$ seated customers with probability $\frac{1}{n + \theta}$, or chooses to sit at a new, unoccupied table with probability $\frac{\theta}{n + \theta}$. Notice that in this formulation, the customer chooses a table without having to count table occupancies; we don't need the block sizes $|b|$.
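The companion rule is straightforward to simulate. The minimal sketch below (illustrative names, Python) seats each newcomer by picking a uniformly random companion, and checks one consequence of the partition formula above: for $\theta = 1$ the probability that the first four customers all share one table is $3!/4! = 0.25$.

```python
import random
from collections import Counter

def crp_tables(n, theta, rng):
    """Seat n customers via the companion rule; return sorted table sizes."""
    tables = []  # each entry is the list of customers at one table
    for i in range(n):  # customer i+1 arrives; i customers already seated
        if rng.random() < theta / (i + theta):
            tables.append([i + 1])  # new, unoccupied table
        else:
            # pick one of the i seated customers uniformly; each is chosen
            # with probability 1/(i + theta), so a table of size |b| is
            # joined with total probability |b|/(i + theta)
            companion = rng.randrange(i) + 1
            for t in tables:
                if companion in t:
                    t.append(i + 1)
                    break
    return tuple(sorted(len(t) for t in tables))

rng = random.Random(0)
N = 50_000
draws = Counter(crp_tables(4, 1.0, rng) for _ in range(N))
print(draws[(4,)] / N)  # close to 3!/4! = 0.25
```

Because each seated customer is chosen with probability $\frac{1}{n+\theta}$, a table with $|b|$ occupants is joined with total probability $\frac{|b|}{n+\theta}$, recovering the table-counting definition.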

Distribution of the number of tables

Chinese restaurant table distribution
Parameters: $\theta > 0$, $n \in \{1, 2, \dots\}$
Support: $k \in \{1, 2, \dots, n\}$
PMF: $\dfrac{\Gamma(\theta)}{\Gamma(n + \theta)}\,|s(n, k)|\,\theta^{k}$
Mean: $\theta\,(\psi(\theta + n) - \psi(\theta))$ (see digamma function)

The Chinese restaurant table distribution (CRT) is the probability distribution on the number of tables in the Chinese restaurant process.[5] It can be understood as the sum of $n$ independent Bernoulli random variables, each with a different parameter:

$$K = \sum_{i=1}^{n} b_i, \qquad b_i \sim \operatorname{Bernoulli}\!\left(\frac{\theta}{\theta + i - 1}\right).$$

The probability mass function of $K$ is given by [6]

$$f(k) = \frac{\Gamma(\theta)}{\Gamma(n + \theta)}\,|s(n, k)|\,\theta^{k},$$

where $s(n, k)$ denotes Stirling numbers of the first kind.
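Both descriptions can be checked numerically: the PMF should sum to one, and its mean should equal the sum of the Bernoulli success probabilities. A small Python sketch (the Stirling-number helper is a naive recursion written for illustration):

```python
from math import lgamma, exp

def stirling1_unsigned(n, k):
    """Unsigned Stirling numbers of the first kind, |s(n, k)|."""
    if n == 0:
        return 1 if k == 0 else 0
    if k == 0:
        return 0
    # recurrence |s(n,k)| = (n-1) |s(n-1,k)| + |s(n-1,k-1)|
    return (n - 1) * stirling1_unsigned(n - 1, k) + stirling1_unsigned(n - 1, k - 1)

def crt_pmf(k, n, theta):
    """f(k) = Gamma(theta)/Gamma(n+theta) * |s(n,k)| * theta^k."""
    return exp(lgamma(theta) - lgamma(n + theta)) * stirling1_unsigned(n, k) * theta ** k

n, theta = 10, 2.0
pmf = [crt_pmf(k, n, theta) for k in range(1, n + 1)]
print(round(sum(pmf), 10))  # -> 1.0
mean_pmf = sum(k * p for k, p in zip(range(1, n + 1), pmf))
mean_bern = sum(theta / (theta + i - 1) for i in range(1, n + 1))
print(round(mean_pmf, 6), round(mean_bern, 6))  # the two means agree
```

The normalization follows from the identity $\sum_k |s(n,k)|\,\theta^k = \theta(\theta+1)\cdots(\theta+n-1) = \Gamma(n+\theta)/\Gamma(\theta)$.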

Two-parameter generalization


This construction can be generalized to a model with two parameters, $\theta$ and $\alpha$,[2][7] commonly called the strength (or concentration) and discount parameters respectively. At time $n + 1$, the next customer to arrive finds $|B|$ occupied tables and decides to sit at an empty table with probability

$$\frac{\theta + |B|\,\alpha}{n + \theta},$$

or at an occupied table $b$ of size $|b|$ with probability

$$\frac{|b| - \alpha}{n + \theta}.$$

In order for the construction to define a valid probability measure it is necessary to suppose that either $\alpha < 0$ and $\theta = -L\alpha$ for some $L \in \{1, 2, \dots\}$; or that $0 \le \alpha < 1$ and $\theta > -\alpha$.

Under this model the probability assigned to any particular partition $B$ of $\{1, \dots, n\}$ can be expressed in the general case (for any values of $\theta$ and $\alpha$ that satisfy the above-mentioned constraints) in terms of the Pochhammer k-symbol, as

$$\Pr(B_n = B) = \frac{(\theta + \alpha)_{|B| - 1, \alpha}}{(\theta + 1)_{n - 1, 1}} \prod_{b \in B} (1 - \alpha)_{|b| - 1, 1},$$

where, the Pochhammer k-symbol is defined as follows: by convention, $(a)_{0,c} = 1$, and for $m > 0$

$$(a)_{m,c} = \prod_{i=0}^{m-1} (a + ic) = \begin{cases} a^{m} & \text{if } c = 0, \\ c^{m}\,\left(\tfrac{a}{c}\right)^{(m)} & \text{if } c > 0, \\ (-c)^{m}\,\left(-\tfrac{a}{c}\right)_{(m)} & \text{if } c < 0, \end{cases}$$

where $x^{(m)} = x(x+1)\cdots(x+m-1)$ is the rising factorial and $x_{(m)} = x(x-1)\cdots(x-m+1)$ is the falling factorial. It is worth noting that for the parameter setting where $\alpha < 0$ and $\theta = -L\alpha$, then $(\theta + \alpha)_{m,\alpha} = (-\alpha)^{m} (L - 1)_{(m)}$, which evaluates to zero whenever $m \ge L$, so that $L$ is an upper bound on the number of blocks in the partition; see the subsection on the Dirichlet-categorical model below for more details.
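A direct way to sanity-check the two-parameter partition probability is to evaluate it with a small Pochhammer helper and confirm it normalizes over all partitions of a small set. The following is an illustrative Python sketch (helper names are assumptions, not library calls):

```python
def poch(a, m, c):
    """Pochhammer k-symbol (a)_{m,c} = a (a + c) (a + 2c) ... (a + (m-1)c)."""
    p = 1.0
    for i in range(m):
        p *= a + i * c
    return p

def partitions(elems):
    """Enumerate all set partitions of the list elems."""
    if not elems:
        yield []
        return
    first, rest = elems[0], elems[1:]
    for part in partitions(rest):
        for i in range(len(part)):
            yield part[:i] + [part[i] + [first]] + part[i + 1:]
        yield part + [[first]]

def crp2_prob(blocks, n, theta, alpha):
    """Pr(B_n = B) = (theta+alpha)_{|B|-1,alpha} / (theta+1)_{n-1,1}
                     * prod_{b in B} (1-alpha)_{|b|-1,1}."""
    k = len(blocks)
    p = poch(theta + alpha, k - 1, alpha) / poch(theta + 1, n - 1, 1)
    for b in blocks:
        p *= poch(1 - alpha, len(b) - 1, 1)
    return p

n, theta, alpha = 5, 0.7, 0.3
total = sum(crp2_prob(B, n, theta, alpha) for B in partitions(list(range(n))))
print(round(total, 10))  # -> 1.0
```

Note that $(\theta + \alpha)_{|B|-1,\alpha} = \prod_{j=1}^{|B|-1} (\theta + j\alpha)$, one factor per new table opened after the first.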

For the case when $\theta > 0$ and $\alpha > 0$, the partition probability can be rewritten in terms of the Gamma function as

$$\Pr(B_n = B) = \frac{\Gamma(\theta)}{\Gamma(\theta + n)}\,\frac{\alpha^{|B|}\,\Gamma\!\left(\tfrac{\theta}{\alpha} + |B|\right)}{\Gamma\!\left(\tfrac{\theta}{\alpha}\right)} \prod_{b \in B} \frac{\Gamma(|b| - \alpha)}{\Gamma(1 - \alpha)}.$$

In the one-parameter case, where $\alpha$ is zero, this simplifies to

$$\Pr(B_n = B) = \frac{\Gamma(\theta)\,\theta^{|B|}}{\Gamma(\theta + n)} \prod_{b \in B} \Gamma(|b|).$$

Or, when $\theta$ is zero,

$$\Pr(B_n = B) = \frac{\alpha^{|B| - 1}\,\Gamma(|B|)}{\Gamma(n)} \prod_{b \in B} \frac{\Gamma(|b| - \alpha)}{\Gamma(1 - \alpha)}.$$

The probability assigned to any particular partition again depends only on the block sizes, so the random partition is exchangeable in the sense described above. The consistency property also still holds, by construction.

If $\theta > 0$ and $\alpha = 0$, the probability distribution of the random partition of the integer $n$ thus generated is the Ewens distribution with parameter $\theta$, used in population genetics and the unified neutral theory of biodiversity.

Animation of a Chinese restaurant process with scaling parameter $\theta$. Tables are hidden once the customers of a table cannot be displayed anymore; however, every table has infinitely many seats. (Recording of an interactive animation.[8])

Derivation


Here is one way to derive this partition probability. Let $C_i$ be the random block into which the number $i$ is added, for $i = 1, 2, \dots, n$. Then

$$\Pr(C_i = c \mid C_1, \dots, C_{i-1}) = \begin{cases} \dfrac{\theta + |B|\,\alpha}{\theta + i - 1} & \text{if } c \text{ is a new block}, \\[1ex] \dfrac{|c| - \alpha}{\theta + i - 1} & \text{if } c \text{ is an existing block}, \end{cases}$$

where $|B|$ is the number of blocks already present when customer $i$ arrives. The probability that $B_n$ is any particular partition of the set $\{1, \dots, n\}$ is the product of these probabilities as $i$ runs from $1$ to $n$. Now consider the size of block $b$: it increases by one each time we add one element into it. When the last element in block $b$ is to be added in, the block size is $|b| - 1$. For example, consider this sequence of choices: (generate a new block $b$)(join $b$)(join $b$)(join $b$). In the end, block $b$ has 4 elements and the product of the numerators in the above equation gets $(\theta + |B|\alpha)(1 - \alpha)(2 - \alpha)(3 - \alpha)$. Following this logic, we obtain $\Pr(B_n = B)$ as above.

Expected number of tables


For the one parameter case, with $\theta > 0$ and $\alpha = 0$, the number of tables is distributed according to the Chinese restaurant table distribution. The expected value of this random variable, given that there are $n$ seated customers, is[9]

$$\mathbb{E}[|B_n|] = \sum_{k=1}^{n} \frac{\theta}{\theta + k - 1} = \theta \cdot \left(\Psi(\theta + n) - \Psi(\theta)\right),$$

where $\Psi(\theta)$ is the digamma function. For the two-parameter case, for $\alpha \neq 0$, the expected number of occupied tables is[7]

$$\mathbb{E}[|B_n|] = \frac{\theta}{\alpha}\left(\frac{(\theta + \alpha)^{(n)}}{\theta^{(n)}} - 1\right),$$

where $x^{(n)} = x(x+1)\cdots(x+n-1)$ is the rising factorial (as defined above).
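The closed form can be compared against a Monte Carlo estimate from the sequential seating rules. The sketch below is an illustrative Python implementation (function names are assumptions), using `lgamma` to evaluate the rising factorials stably:

```python
import random
from math import exp, lgamma

def expected_tables(n, theta, alpha):
    """E[|B_n|] = (theta/alpha) * ((theta+alpha)^(n) / theta^(n) - 1),
    with rising factorials x^(n) = Gamma(x + n) / Gamma(x)."""
    log_ratio = (lgamma(theta + alpha + n) - lgamma(theta + alpha)
                 - lgamma(theta + n) + lgamma(theta))
    return (theta / alpha) * (exp(log_ratio) - 1.0)

def sample_tables(n, theta, alpha, rng):
    """Number of occupied tables in one draw of the two-parameter CRP."""
    sizes = []
    for i in range(n):                    # i customers already seated
        k = len(sizes)
        if rng.random() < (theta + k * alpha) / (i + theta):
            sizes.append(1)               # sit at an empty table
        else:
            r = rng.random() * (i - k * alpha)   # total occupied mass
            for j, s in enumerate(sizes):
                r -= s - alpha            # table j chosen w.p. (s_j - alpha)
                if r <= 0:
                    sizes[j] += 1
                    break
    return len(sizes)

n, theta, alpha = 50, 1.0, 0.5
rng = random.Random(1)
reps = 10_000
mc = sum(sample_tables(n, theta, alpha, rng) for _ in range(reps)) / reps
print(round(expected_tables(n, theta, alpha), 3), round(mc, 3))  # close agreement
```

The sampler's probabilities sum to one at each step, since $\frac{\theta + |B|\alpha}{i + \theta} + \sum_b \frac{|b| - \alpha}{i + \theta} = 1$ when $i$ customers are seated.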

The Dirichlet-categorical model


For the parameter choice $\alpha < 0$ and $\theta = -L\alpha$, where $L \in \{1, 2, \dots\}$, the two-parameter Chinese restaurant process is equivalent to the Dirichlet-categorical model, which is a hierarchical model that can be defined as follows. Notice that for this parameter setting, the probability of occupying a new table, when there are already $L$ occupied tables, is zero; so that the number of occupied tables is upper bounded by $L$. If we choose to identify tables with labels that take values in $\{1, 2, \dots, L\}$, then to generate a random partition of the set $\{1, 2, \dots, n\}$, the hierarchical model first draws a categorical label distribution, $\mathbf{p} = (p_1, \dots, p_L)$, from the symmetric Dirichlet distribution, with concentration parameter $-\alpha > 0$. Then, independently for each of the $n$ customers, the table label $\ell_i$ is drawn from the categorical $\mathbf{p}$. Since the Dirichlet distribution is conjugate to the categorical, the hidden variable $\mathbf{p}$ can be marginalized out to obtain the posterior predictive distribution for the next label state, $\ell_{n+1}$, given previous labels $\ell_1, \dots, \ell_n$:

$$\Pr(\ell_{n+1} = \ell \mid \ell_1, \dots, \ell_n) = \frac{|b_\ell| - \alpha}{n + \theta},$$

where $|b_\ell|$ is the number of customers that are already seated at table $\ell$. With $\theta = -L\alpha$, this agrees with the above general formula, $\frac{|b| - \alpha}{n + \theta}$, for the probability of sitting at an occupied table when $|b_\ell| > 0$. The probability for sitting at any of the $L - |B|$ unoccupied tables also agrees with the general formula and is given by

$$\frac{(L - |B|)(-\alpha)}{n + \theta} = \frac{\theta + |B|\,\alpha}{n + \theta}.$$

The marginal probability for the labels is given by

$$\Pr(\ell_1, \dots, \ell_n) = \frac{\prod_{\ell} (-\alpha)^{(|b_\ell|)}}{\theta^{(n)}},$$

where $x^{(m)}$ is the rising factorial. In general, there are however multiple label states that all correspond to the same partition. For a given partition, $B$, which has $|B| \le L$ blocks, the number of label states that all correspond to this partition is given by the falling factorial, $L_{(|B|)}$. Taking this into account, the probability for the partition is

$$\Pr(B_n = B) = \frac{L_{(|B|)} \prod_{b \in B} (-\alpha)^{(|b|)}}{\theta^{(n)}},$$

which can be verified to agree with the general version of the partition probability that is given above in terms of the Pochhammer k-symbol. Notice again, that if $B$ is outside of the support, i.e. $|B| > L$, the falling factorial, $L_{(|B|)}$, evaluates to zero as it should. (Practical implementations that evaluate the log probability for partitions via $\log\Gamma$ will return $-\infty$, whenever $|B| > L$, as required.)
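The log-probability computation described in the parenthetical remark can be sketched in a few lines of Python (the function name and argument layout are assumptions made for illustration):

```python
from math import lgamma, inf

def log_dc_partition_prob(block_sizes, L, alpha):
    """log Pr(B) = log( L_(|B|) * prod_b (-alpha)^(|b|) / theta^(n) ),
    for the Dirichlet-categorical setting theta = -L*alpha, alpha < 0."""
    k, n = len(block_sizes), sum(block_sizes)
    theta, beta = -L * alpha, -alpha      # beta = Dirichlet concentration
    if k > L:
        return -inf                       # falling factorial L_(k) = 0
    # falling factorial L_(k) = L (L-1) ... (L-k+1)
    log_fall = lgamma(L + 1) - lgamma(L - k + 1)
    # rising factorials via Gamma: x^(m) = Gamma(x + m) / Gamma(x)
    log_num = sum(lgamma(beta + m) - lgamma(beta) for m in block_sizes)
    log_den = lgamma(theta + n) - lgamma(theta)
    return log_fall + log_num - log_den

# a partition of n = 6 into blocks of sizes 3, 2, 1, with L = 4, alpha = -0.5
print(log_dc_partition_prob([3, 2, 1], L=4, alpha=-0.5))
# 5 singleton blocks exceed L = 4 tables, so the probability is zero
print(log_dc_partition_prob([1] * 5, L=4, alpha=-0.5))  # -> -inf
```

The explicit `k > L` guard mirrors the vanishing falling factorial; without it, `lgamma(L - k + 1)` would be evaluated at a non-positive integer.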

Relationship between Dirichlet-categorical and one-parameter CRP


Consider on the one hand, the one-parameter Chinese restaurant process, with $\theta > 0$ and $\alpha = 0$, which we denote $\mathrm{CRP}(\theta)$; and on the other hand the Dirichlet-categorical model with a positive integer $L$, where we choose $\alpha = -\theta/L$ (so that $\theta = -L\alpha$), which as shown above, is equivalent to the two-parameter process with parameters $(\theta, \alpha)$. As $L \to \infty$, the discount $\alpha = -\theta/L$ tends to zero and the seating probabilities of the Dirichlet-categorical model converge to those of $\mathrm{CRP}(\theta)$. This shows that the Dirichlet-categorical model can be made arbitrarily close to $\mathrm{CRP}(\theta)$, by making $L$ large.

Stick-breaking process


The two-parameter Chinese restaurant process can equivalently be defined in terms of a stick-breaking process.[10] For the case where $\theta > 0$ and $\alpha = 0$, the stick breaking process can be described as a hierarchical model, much like the above Dirichlet-categorical model, except that there is an infinite number of label states. The table labels are drawn independently from the infinite categorical distribution $\mathbf{p} = (p_1, p_2, \dots)$, the components of which are sampled using stick breaking: start with a stick of length 1 and randomly break it in two, the length of the left half is $p_1$ and the right half is broken again recursively to give $p_2, p_3, \dots$. More precisely, the left fraction, $v_\ell$, of the $\ell$-th break is sampled from the beta distribution:

$$v_\ell \sim \operatorname{Beta}(1, \theta), \qquad \ell = 1, 2, 3, \dots$$

The categorical probabilities are:

$$p_\ell = v_\ell \prod_{j=1}^{\ell - 1} (1 - v_j).$$

For the parameter settings $\alpha < 0$ and $\theta = -L\alpha$, where $L$ is a positive integer, and where the categorical is finite: $\mathbf{p} = (p_1, \dots, p_L)$, we can sample $\mathbf{p}$ from an ordinary Dirichlet distribution as explained above, but it can also be sampled with a truncated stick-breaking recipe, where the formula for sampling the fractions is modified to:

$$v_\ell \sim \operatorname{Beta}\bigl(-\alpha,\; -(L - \ell)\,\alpha\bigr), \qquad \ell = 1, \dots, L - 1,$$

and $v_L = 1$.
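The infinite stick-breaking representation can be sampled lazily, breaking a new stick only when a customer's uniform draw falls past the probabilities generated so far. The sketch below (illustrative Python, assumed names) does this for the $\alpha = 0$ case and checks that the mean number of distinct labels matches the CRP's expected table count $\sum_{i=1}^{n} \theta/(\theta + i - 1)$:

```python
import random

def stick_breaking_labels(n, theta, rng):
    """Draw table labels for n customers via lazy stick breaking (alpha = 0)."""
    v = []        # break fractions v_l ~ Beta(1, theta), sampled on demand
    labels = []
    for _ in range(n):
        u, l = rng.random(), 0
        acc, rem = 0.0, 1.0   # cumulative probability; remaining stick length
        while True:
            if l == len(v):
                v.append(rng.betavariate(1.0, theta))  # break the next stick
            p = v[l] * rem    # p_l = v_l * prod_{j<l} (1 - v_j)
            if u < acc + p:
                labels.append(l)
                break
            acc += p
            rem *= 1.0 - v[l]
            l += 1
    return labels

rng = random.Random(0)
n, theta, reps = 30, 2.0, 4000
mc = sum(len(set(stick_breaking_labels(n, theta, rng))) for _ in range(reps)) / reps
exact = sum(theta / (theta + i - 1) for i in range(1, n + 1))
print(round(mc, 2), round(exact, 2))  # occupied-table counts match CRP(theta)
```

Because the sticks are only broken as needed, no truncation of the infinite label set is required.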

The Indian buffet process


It is possible to adapt the model such that each data point is no longer uniquely associated with a class (i.e., we are no longer constructing a partition), but may be associated with any combination of the classes. This strains the restaurant-tables analogy and so is instead likened to a process in which a series of diners samples from some subset of an infinite selection of dishes on offer at a buffet. The probability that a particular diner samples a particular dish is proportional to the popularity of the dish among diners so far, and in addition the diner may sample from the untested dishes. This has been named the Indian buffet process and can be used to infer latent features in data.[11]
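The buffet dynamic described above translates directly into a sampler. The sketch below is a hedged Python illustration (function names are assumptions) of the standard one-parameter process, in which diner $i$ takes existing dish $j$ with probability $m_j/i$ and then tries a Poisson$(\alpha/i)$ number of new dishes; the expected total number of sampled dishes is $\alpha H_n$, with $H_n$ the $n$th harmonic number:

```python
import random
from math import exp

def poisson(lam, rng):
    """Poisson sampler by inversion (Knuth's method); adequate for small lam."""
    L, k, p = exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1

def indian_buffet(n_diners, alpha, rng):
    """One draw of IBP(alpha); returns the per-dish popularity counts m_j."""
    counts = []  # counts[j] = number of diners who have sampled dish j
    for i in range(1, n_diners + 1):
        # sample each existing dish j with probability m_j / i
        taken = [j for j, m in enumerate(counts) if rng.random() < m / i]
        # then try a Poisson(alpha / i) number of brand-new dishes
        for _ in range(poisson(alpha / i, rng)):
            counts.append(0)
            taken.append(len(counts) - 1)
        for j in taken:
            counts[j] += 1
    return counts

rng = random.Random(0)
n, alpha, reps = 20, 2.0, 3000
mean_dishes = sum(len(indian_buffet(n, alpha, rng)) for _ in range(reps)) / reps
harmonic = sum(1.0 / i for i in range(1, n + 1))
print(round(mean_dishes, 2), round(alpha * harmonic, 2))  # close agreement
```

Unlike the restaurant sampler, each diner's row is a subset of dishes rather than a single table, so the output describes a feature allocation, not a partition.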

Applications


The Chinese restaurant process is closely connected to Dirichlet processes and Pólya's urn scheme, and therefore useful in applications of Bayesian statistics including nonparametric Bayesian methods. The generalized Chinese restaurant process is closely related to the Pitman–Yor process. These processes have been used in many applications, including modeling text, clustering biological microarray data,[12] biodiversity modelling, and image reconstruction.[13][14]

from Grokipedia
The Chinese restaurant process (CRP) is a discrete-time stochastic process in probability theory and Bayesian nonparametrics that generates random partitions of a set of observations into an unbounded number of clusters, metaphorically described as customers arriving sequentially at a restaurant with infinitely many tables and choosing where to sit based on the current occupancy. In this analogy, the first customer sits at the first table; each subsequent customer joins an existing table with probability proportional to the number of customers already seated there, or starts a new table with probability proportional to a concentration parameter $\alpha > 0$, yielding a distribution over cluster assignments that favors a small number of densely populated clusters while allowing for unlimited growth.[1][2] Originally formulated by David Blackwell and James B. MacQueen in 1973 as a Pólya urn scheme for generating Ferguson distributions (random probability measures with the Dirichlet property), the process provided a constructive method for sampling from such distributions without directly specifying the underlying measure.[3] The evocative "Chinese restaurant" metaphor, inspired by the seemingly endless seating in Berkeley eateries, first appeared in print in 1985, attributed to Jim Pitman and Lester Dubins from their earlier discussions.[1] Mathematically, for $n$ customers, the probability of a specific assignment $\mathbf{z}$ with $k$ tables of sizes $m_1, m_2, \dots, m_k$ (where $\sum m_j = n$) is given by
$$P(\mathbf{z}) = \frac{\alpha^{k} \prod_{j=1}^{k} (m_j - 1)!}{\prod_{i=0}^{n-1} (\alpha + i)},$$
ensuring exchangeability: the joint distribution is invariant to permutations of the customers.[2] This formulation arises as the posterior predictive distribution under a Dirichlet process prior $G \sim \mathrm{DP}(\alpha G_0)$, where $G_0$ is a base probability measure, linking the CRP directly to nonparametric Bayesian inference by enabling infinite mixture models for density estimation and clustering.[1] The CRP's key properties include its reinforcement mechanism, which promotes rich-get-richer dynamics, and an expected number of occupied tables after $n$ customers of $\alpha \sum_{i=1}^{n} \frac{1}{\alpha + i - 1} \approx \alpha \log n$, reflecting sparse but growing complexity suitable for real-world data with unknown numbers of categories.[2] It has been generalized in various forms, such as the two-parameter CRP (introducing a discount parameter for more flexible, power-law tail behaviors) and the nested CRP (for hierarchical structures like topic models in documents), extending its applicability to complex data like trees and sequences.[1] In practice, the CRP underpins algorithms for nonparametric clustering in machine learning, including Dirichlet process mixture models for tasks like image segmentation and natural language processing, where it allows models to adaptively determine the number of components without fixed hyperparameters.[2]

Overview

The restaurant metaphor

The Chinese restaurant process is an intuitive metaphor for generating random partitions of a set of elements, often used to model clustering in data where the number of clusters is not fixed in advance. Imagine an infinitely large restaurant with an endless supply of tables, each capable of seating infinitely many customers. Customers arrive sequentially to be seated, representing data points being assigned to groups or clusters. The first customer has no choice but to sit alone at a new table, establishing the initial cluster. Subsequent customers decide where to sit based on a simple rule that favors joining occupied tables while allowing for the creation of new ones. Specifically, a new customer prefers to join an existing table in proportion to the number of people already seated there, a mechanism known as preferential attachment, which encourages the growth of popular clusters. Alternatively, the customer may opt to start a fresh table with probability proportional to the concentration parameter $\alpha$, introducing a new cluster. This seating dynamic naturally produces partitions of the customers into tables, where the tables symbolize distinct clusters and the sizes of the tables reflect cluster memberships. The beauty of this process lies in its exchangeability: the probability of any particular seating arrangement depends only on the relative sizes of the tables, not on the order in which customers arrive. This property ensures that the metaphor captures scenarios where the sequence of observations is arbitrary, modeling real-world data like customer behaviors or species groupings without assuming a predetermined structure. Over many customers, the resulting cluster sizes typically consist of a few large clusters and many small ones, mimicking patterns seen in natural and social systems.
This analogy serves as a constructive way to understand sampling from more complex probabilistic models, such as the Dirichlet process, by sequentially building partitions without specifying the total number upfront.

Historical background

The underlying stochastic process was originally formulated by David Blackwell and James B. MacQueen in 1973 as a Pólya urn scheme for generating samples from the Dirichlet process prior introduced by Thomas S. Ferguson.[4] The Chinese restaurant process (CRP) was first described as a metaphor for generating exchangeable random partitions in a 1985 lecture by David Aldous, who attributed the idea to Jim Pitman and Lester Dubins from their earlier work in the early 1980s on constructing consistent sequences of random permutations and partitions via de Finetti's theorem on exchangeability. This metaphorical framing built on foundational concepts in probability, including applications of de Finetti's theorem to infinite exchangeable sequences, as explored by Pitman and Dubins in their contributions to partition structures.[5] Preceding the CRP metaphor, related ideas appeared in Hoppe's 1984 development of a Pólya-like urn scheme that generates the Ewens sampling formula, a distribution on partitions arising in population genetics and neutral allele models, providing an early sequential construction analogous to the restaurant seating process.[6] These urn models echoed broader influences from de Finetti's exchangeability framework, which Pitman and Dubins extended to characterize limits of finite exchangeable partitions. The CRP gained prominence through Pitman's 1995 paper on exchangeable and partially exchangeable random partitions, where it served as a key tool for analyzing cycle structures in random permutations and connections to Poisson-Dirichlet distributions. 
Pitman further elaborated on its role in his 2002 Saint-Flour lectures (published 2006), linking the process to combinatorial stochastic models and inference in phylogenetics, solidifying its place in the study of Poisson-Dirichlet processes.[7] In the context of Bayesian nonparametrics, the CRP evolved as a sequential sampling representation for the Dirichlet process, originally introduced by Ferguson in 1973 as a prior over distributions. This connection, highlighting the CRP's utility in generating partitions from Dirichlet process mixtures, emerged in the 1990s and early 2000s, facilitating computational inference in nonparametric models.[7] The "Chinese restaurant" naming originates from a metaphor devised by Lester Dubins and Jim Pitman in the early 1980s, inspired by cultural analogies to large seating arrangements.[7]

The one-parameter model

Formal definition

The one-parameter Chinese restaurant process is a discrete-time stochastic process defined by a concentration parameter $\alpha > 0$, which controls the rate at which new tables (clusters) are formed relative to joining existing ones.[7] This parameter governs the trade-off between reinforcement of existing partitions and innovation through new components in the generated random partition.[7] The process begins with the first customer ($n = 1$), who always starts a new table, establishing the initial singleton partition $\{\{1\}\}$.[7] For the $n$th customer ($n \ge 2$), the assignment rule is as follows: the probability of joining an existing table $k$ with $m_k$ customers already seated is $\frac{m_k}{n - 1 + \alpha}$, while the probability of opening a new table is $\frac{\alpha}{n - 1 + \alpha}$.[7] These sequential probabilities ensure that the process builds partitions incrementally, with each step refining the previous configuration. The Chinese restaurant process exhibits consistency: the distribution of the partition induced on the first $m$ customers ($m < n$) is the same regardless of the total number $n$ of customers considered, and as $n \to \infty$, it yields a proper exchangeable random partition of the natural numbers $\mathbb{N}$.[7] For a specific partition of $n$ customers into $k$ tables of sizes $n_1, \dots, n_k > 0$ (with $\sum n_i = n$), the probability is
$$\frac{\alpha^{k} \prod_{i=1}^{k} (n_i - 1)!}{\alpha^{(n)}},$$
where $\alpha^{(n)} = \alpha (\alpha + 1) \cdots (\alpha + n - 1)$ denotes the rising factorial (Pochhammer symbol).[7] The restaurant metaphor illustrates these rules intuitively, with customers arriving sequentially and tables representing partition blocks.[7]

Exchangeability and partition probabilities

The assignments generated by the Chinese restaurant process form an exchangeable random partition of the set of customers, meaning the joint distribution of table assignments is invariant under any finite permutation of the customer labels.[5] This exchangeability arises from the sequential construction, where each customer's seating decision depends only on the current table sizes, not the order or identities of prior customers.[5] In the context of de Finetti's theorem for partially exchangeable sequences, the process admits a representation as a mixture over limiting relative frequencies of table sizes.[5] The probability of observing a specific partition of $n$ customers into $k$ tables with sizes $n_1, \dots, n_k$ (where $\sum n_i = n$) under the one-parameter Chinese restaurant process with parameter $\alpha > 0$ is given by
$$\frac{\alpha^{k}}{\alpha^{(n)}} \prod_{i=1}^{k} (n_i - 1)!,$$
where $\alpha^{(n)} = \alpha (\alpha + 1) \cdots (\alpha + n - 1)$ denotes the rising factorial.[8] This formula, known as the exchangeable partition probability function (EPPF), applies to any partition of the given type and follows directly from the sequential rules.[5] To derive it, multiply the individual seating probabilities along any arrival order: the first customer starts a new table with probability 1; each subsequent customer starting a new table contributes a numerator $\alpha$; and each customer joining an existing table of current size $j$ contributes a numerator $j$, with shared denominators $\alpha + m$ when $m$ customers are already seated. A partition with $k$ tables has exactly $k$ "new table" events, contributing $\alpha^{k}$ in total (absorbing the first customer's factor as $\alpha/\alpha$), while the joiners of a table that ends up with $n_i$ customers contribute $1 \cdot 2 \cdots (n_i - 1) = (n_i - 1)!$.[8] The denominators telescope to $\alpha^{(n)}$, yielding $\alpha^{k} \prod_{i=1}^{k} (n_i - 1)! / \alpha^{(n)}$, independent of the particular ordering, which is precisely the exchangeability property.[5] This EPPF coincides with Ewens' sampling formula with $\theta = \alpha$, providing the distribution of allele type counts in the infinite alleles model of population genetics.[9] In Bayesian nonparametrics, the same partition structure emerges marginally from a Dirichlet process with concentration $\alpha$ and a diffuse (non-atomic) base measure.[10]

Distribution of the number of tables

In the one-parameter Chinese restaurant process with concentration parameter $\alpha > 0$, the number of tables $K_n$ after $n$ customers follows the Chinese restaurant table (CRT) distribution.[5] The probability mass function is given by
$$P(K_n = k) = \frac{|s(n, k)|\,\alpha^{k}}{\alpha^{(n)}},$$
where $|s(n, k)|$ denotes the unsigned Stirling numbers of the first kind, which count the number of permutations of $n$ elements with exactly $k$ cycles, and $\alpha^{(n)} = \alpha (\alpha + 1) \cdots (\alpha + n - 1)$ is the rising factorial.[7] An alternative constructive representation expresses $K_n$ as the sum of $n$ independent Bernoulli random variables: $K_n = \sum_{j=1}^{n} Z_j$, where $Z_1 = 1$ and, for $j \ge 2$, $Z_j \sim \operatorname{Bernoulli}\!\left(\frac{\alpha}{\alpha + j - 1}\right)$.[7] This reflects the sequential process, with $Z_j = 1$ indicating that the $j$th customer starts a new table. The expected value is $E[K_n] = \alpha\,(\psi(n + \alpha) - \psi(\alpha))$, where $\psi$ is the digamma function.[7] For large $n$, this behaves asymptotically as $E[K_n] \approx \alpha \log n$, capturing the logarithmic growth in the number of clusters.[7] The variance is $\operatorname{Var}(K_n) = \sum_{j=1}^{n} \frac{\alpha (j - 1)}{(\alpha + j - 1)^{2}}$, which also grows asymptotically like $\alpha \log n$.[7] The number of tables thus grows sublinearly, with variability comparable in order to the mean.[7]

Generalizations

Two-parameter model

The two-parameter Chinese restaurant process generalizes the one-parameter model by incorporating a discount parameter $0 \le \alpha < 1$ alongside the concentration parameter $\theta > -\alpha$. The discount parameter $\alpha$ modifies the preferential attachment to existing tables, reducing the influence of table size and allowing for heavier-tailed cluster size distributions compared to the one-parameter case. In the sequential construction, the first customer starts the first table. For the $n$th customer ($n \ge 2$), the probability of joining an existing table $j$ with $m_j$ customers is $\frac{m_j - \alpha}{n - 1 + \theta}$, which is non-negative since $m_j \ge 1 > \alpha$. The probability of starting a new table is $\frac{\theta + k\alpha}{n - 1 + \theta}$, where $k$ is the current number of tables. This construction generates exchangeable random partitions, with the discount $\alpha$ effectively "penalizing" larger tables by subtracting $\alpha$ from their attraction weights. The probability of a specific partition with cluster sizes $n_1, \dots, n_k$ (summing to $n$) is
$$\frac{\prod_{j=1}^{k-1} (\theta + j\alpha)\ \prod_{i=1}^{k} (1 - \alpha)^{(n_i - 1)}}{\prod_{j=1}^{n-1} (\theta + j)},$$
where $(1 - \alpha)^{(m)} = (1 - \alpha)(2 - \alpha) \cdots (m - \alpha)$ is the rising factorial, capturing the sequential attachments within each cluster adjusted for the discount. This closed-form expression arises from multiplying the sequential probabilities: the denominators $\theta + 1, \dots, \theta + n - 1$ are shared by every arrival after the first; each of the $k - 1$ new tables opened after the first contributes a numerator $\theta + j\alpha$; and the joiners of a cluster that ends up with $n_i$ members contribute $(1 - \alpha)(2 - \alpha) \cdots (n_i - 1 - \alpha)$. By exchangeability, the result does not depend on the arrival order. When $\alpha = 0$, the two-parameter model recovers the one-parameter Chinese restaurant process. As $\alpha \to 1$, new tables become increasingly likely, favoring configurations with many small clusters.

Connection to the Dirichlet process

The Chinese restaurant process (CRP) serves as a sequential sampling mechanism that generates the partition structure underlying samples from a Dirichlet process. Specifically, for a Dirichlet process $\mathrm{DP}(\alpha, H)$ with concentration parameter $\alpha > 0$ and base measure $H$, the one-parameter CRP with parameter $\alpha$, denoted $\mathrm{CRP}(\alpha)$, describes the conditional probabilities for assigning successive observations to clusters via sequential conditioning. This connection was established through an urn scheme generalization, where the CRP marginalizes the cluster assignments from the full Dirichlet process model. In the Dirichlet-categorical model, cluster weights are drawn from a finite symmetric Dirichlet distribution $\mathrm{Dir}(\alpha/k, \dots, \alpha/k)$ over $k$ atoms, followed by categorical sampling of labels according to those weights. Marginalizing over the weights (and atom locations) yields the CRP partition probabilities exactly in the limit as $k \to \infty$. This construction highlights how the CRP captures the exchangeable partition induced by the Dirichlet process without specifying the base measure $H$. For the one-parameter case, the resulting partition probabilities match Ewens' sampling formula from population genetics, which describes the distribution of allele frequencies under neutral evolution and coincides with the CRP probabilities for parameter $\theta = \alpha$. The two-parameter generalization of the CRP, with discount parameter $0 \le \alpha < 1$ and concentration parameter $\theta > -\alpha$, corresponds to the marginal partition structure of the Pitman-Yor process $\mathrm{PY}(\alpha, \theta, H)$, extending the Dirichlet process to allow power-law cluster size distributions.

The two-parameter sequential rules form the basis for the Pitman-Yor marginal.[11] An alternative representation of the Dirichlet process is the stick-breaking construction, where the random measure is expressed as $G = \sum_{j=1}^{\infty} \pi_j \delta_{\phi_j}$, with locations $\phi_j \sim H$ independent and weights $\pi_j = v_j \prod_{i=1}^{j-1} (1 - v_i)$, where $v_j \overset{\text{iid}}{\sim} \mathrm{Beta}(1, \alpha)$. Sampling discretely from this measure, by drawing observations according to the weights $\pi$, induces the one-parameter CRP dynamics for the cluster assignments. For the two-parameter case, the stick-breaking weights generalize to $v_j \sim \mathrm{Beta}(1 - \alpha, \theta + j\alpha)$, linking directly to the Pitman-Yor process. In the two-parameter CRP, the expected number of tables (clusters) after $n$ customers satisfies $\mathbb{E}[K_n] \approx \frac{\Gamma(\theta + 1)}{\alpha\,\Gamma(\theta + \alpha)}\,n^{\alpha}$ for large $n$ and $\alpha > 0$, illustrating sublinear power-law growth that deviates from the logarithmic growth $\mathbb{E}[K_n] \sim \alpha \log n$ of the one-parameter case. This asymptotic reflects the increased clustering diversity introduced by the discount parameter.[12] Overall, the Dirichlet process acts as the de Finetti mixing measure for the exchangeable sequences whose induced partition structure is the CRP, providing the explicit partition probabilities consistent with de Finetti's theorem for infinite exchangeable random variables.
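The contrast between power-law and logarithmic growth is easy to see empirically. The sketch below (illustrative Python, assumed names) simulates the two-parameter seating rules and compares the average table count against the $n^{\alpha}$ asymptotic stated above:

```python
import random
from math import exp, lgamma

def py_tables(n, theta, alpha, rng):
    """Number of occupied tables in one draw of the two-parameter CRP."""
    sizes = []
    for i in range(n):                 # i customers already seated
        k = len(sizes)
        if rng.random() < (theta + k * alpha) / (i + theta):
            sizes.append(1)            # open a new table
        else:
            r = rng.random() * (i - k * alpha)   # total occupied mass
            for j, s in enumerate(sizes):
                r -= s - alpha         # table j chosen w.p. (s_j - alpha)
                if r <= 0:
                    sizes[j] += 1
                    break
    return len(sizes)

n, theta, alpha = 1000, 1.0, 0.5
rng = random.Random(1)
reps = 200
mc = sum(py_tables(n, theta, alpha, rng) for _ in range(reps)) / reps
asym = exp(lgamma(theta + 1) - lgamma(theta + alpha)) / alpha * n ** alpha
print(round(mc, 1), round(asym, 1))
# far more tables than the one-parameter case, where theta*log(n) ~ 6.9
```

With $\alpha = 0.5$ the table count grows like $\sqrt{n}$ (tens of tables after 1000 customers), whereas the $\alpha = 0$ process with the same $\theta$ would occupy only about $\log n \approx 7$.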

The Indian buffet process

The Indian buffet process (IBP) serves as a Bayesian nonparametric prior for feature allocation models, enabling data points to belong to multiple clusters or possess multiple latent features in a non-exclusive manner, in contrast to partition-based processes like the Chinese restaurant process.[13] It generates sparse binary matrices in which rows represent data points (diners) and columns represent features (dishes), with entries indicating feature presence.[14] The process ensures exchangeability of rows, meaning the joint distribution is invariant to row permutations, facilitating inference over infinite feature spaces.[13] In the metaphorical construction, diners enter an Indian buffet sequentially and sample dishes, where each dish corresponds to a potential feature. The first diner samples a $ \mathrm{Poisson}(\alpha) $ number of new dishes, setting those features to present for that diner. Each subsequent diner, entering as the $ n $th diner, samples each existing dish $ j $ independently with probability $ m_j / n $, where $ m_j $ is the number of previous diners who have sampled dish $ j $, and then samples an additional $ \mathrm{Poisson}(\alpha / n) $ number of new dishes, assigning presence to those novel features.[13] This one-parameter IBP($ \alpha $), with $ \alpha > 0 $ controlling the expected number of features per data point, produces binary matrices with exchangeable rows and a sparse structure, as later diners tend to reuse popular features while occasionally introducing innovations.[14] The two-parameter IBP($ \alpha, \theta $) extends this framework by incorporating a concentration parameter $ \theta > 0 $ alongside the mass parameter $ \alpha > 0 $, allowing more flexible control over feature popularity and innovation rates.[14] In this generative process, the first diner again samples a $ \mathrm{Poisson}(\alpha) $ number of dishes.
For the $ n $th diner ($ n \geq 2 $), the probability of sampling each existing dish $ j $ is $ \frac{m_j}{n - 1 + \theta} $, and the diner samples a $ \mathrm{Poisson}\!\left( \frac{\alpha \theta}{n - 1 + \theta} \right) $ number of new dishes. This formulation modulates the influence of popular features via $ \theta $; when $ \theta = 1 $, it reduces to the one-parameter case.[14] Marginally, under the one-parameter IBP($ \alpha $), the number of features assigned to each diner follows a $ \mathrm{Poisson}(\alpha) $ distribution, independently across diners, while the total number of unique features across $ N $ diners is distributed as $ \mathrm{Poisson}(\alpha H_N) $, where $ H_N \approx \log N + \gamma $ is the $ N $th harmonic number and $ \gamma $ is the Euler-Mascheroni constant, yielding approximately $ \alpha \log N $ expected distinct features.[13] In the two-parameter version, the per-diner feature count remains $ \mathrm{Poisson}(\alpha) $ distributed and independent, while the total number of unique features is distributed as $ \mathrm{Poisson}\!\left( \alpha \sum_{i=1}^N \frac{\theta}{\theta + i - 1} \right) $, again growing logarithmically in $ N $ at a rate governed by both $ \alpha $ and $ \theta $.[14] The IBP arises as a sequential sampling construction from the beta-Bernoulli process, analogous to how the Chinese restaurant process relates to the Dirichlet process; specifically, the beta process is the de Finetti mixing measure for the IBP, generating sparse infinite measures on $ [0, 1] $ from which Bernoulli draws produce the binary feature indicators.[15] This connection enables hierarchical extensions and posterior inference in feature allocation models, with the one-parameter IBP corresponding to a beta process $ B \sim \mathrm{BP}(c, B_0) $ with concentration parameter $ c = 1 $.[15]
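The one-parameter generative process described above can be simulated in a few lines. The Python sketch below (illustrative only; the Poisson sampler is a simple helper, not a library call) checks the $ \mathrm{Poisson}(\alpha H_N) $ law for the total number of dishes empirically:

```python
import math
import random

def poisson(lam, rng):
    # Knuth's inversion method; adequate for the small rates used here.
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def sample_ibp(num_diners, alpha, rng):
    """Simulate one draw of IBP(alpha); returns each diner's set of dish
    indices (the rows of the binary feature matrix)."""
    counts = []  # counts[j] = diners who have sampled dish j so far
    rows = []
    for n in range(1, num_diners + 1):
        dishes = set()
        # Take each existing dish j with probability counts[j] / n.
        for j in range(len(counts)):
            if rng.random() < counts[j] / n:
                counts[j] += 1
                dishes.add(j)
        # Then take a Poisson(alpha / n) number of brand-new dishes
        # (for the first diner this is Poisson(alpha)).
        for _ in range(poisson(alpha / n, rng)):
            counts.append(1)
            dishes.add(len(counts) - 1)
        rows.append(dishes)
    return rows

rng = random.Random(0)
N, alpha, trials = 50, 3.0, 200
mean_dishes = 0.0
for _ in range(trials):
    rows = sample_ibp(N, alpha, rng)
    mean_dishes += len(set().union(*rows))  # total distinct dishes
mean_dishes /= trials
harmonic = sum(1.0 / i for i in range(1, N + 1))
print(round(mean_dishes, 2), round(alpha * harmonic, 2))
```

Each row's size is marginally $ \mathrm{Poisson}(\alpha) $, and the average total number of dishes over many draws tracks $ \alpha H_N $ (about 13.5 for these settings).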

Applications

In Bayesian nonparametrics

The Chinese restaurant process (CRP) provides a discrete representation of the Dirichlet process, facilitating the construction of Bayesian nonparametric mixture models with a potentially infinite number of components. In Dirichlet process mixture (DPM) models, the CRP governs the assignment of latent cluster labels to data points, allowing flexible density estimation in which the number of mixture components is inferred from the posterior rather than fixed in advance. For instance, in infinite Gaussian mixture models for clustering, each data point is treated as a "customer" that joins an existing "table" (cluster) with probability proportional to its size, or starts a new table with probability proportional to the concentration parameter, enabling the model to adaptively discover the number of clusters from the data.[2][16] In nonparametric Bayesian inference, the CRP supports efficient posterior computation via Gibbs sampling schemes that marginalize over the component parameters and mixing weights, avoiding the limitations of finite mixture models that require the number of components K to be prespecified. Specifically, the posterior distribution over cluster assignments combines the CRP prior with the data likelihoods, permitting collapsed Gibbs samplers to sample partition structures directly without enumerating all possible clusterings. This methodology, detailed in foundational work on Markov chain methods for DPMs, has established the CRP as a cornerstone of scalable inference in infinite mixture settings.[17][18] Hierarchical extensions of the CRP, such as the nested CRP, allow multilevel clustering structures by applying the process recursively across levels, generating priors over tree-like partitions suitable for modeling nested data hierarchies. In these models, observations at the leaf level are assigned to tables within sub-restaurants, which themselves form higher-level tables, enabling inference over infinite-depth hierarchies without fixed dimensionality.
This framework has been applied to topic modeling, where documents are clustered into subtopics that aggregate into broader themes, capturing multilevel semantic structure in text corpora.[19] A more recent variant, the proportional CRP introduced in 2020, modifies the seating rule so that the probability of joining a table is proportional to a power of its size, improving estimation of the number of clusters in high-dimensional settings by reducing the under- and over-clustering biases of the standard CRP. This variant maintains exchangeability while introducing a power parameter that controls cluster granularity, and has demonstrated superior posterior inference for DPMs on synthetic and real datasets compared to traditional approaches.[20] As a further example, infinite hidden Markov models (iHMMs) use the CRP to define nonparametric priors over state transition distributions, allowing an unbounded number of hidden states in sequential data modeling. In the iHMM, the CRP representation of the Dirichlet process prior on transitions enables posterior sampling of state sequences and the discovery of emergent states, extending classical finite HMMs to complex dynamics without prespecifying the state cardinality.[21]
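The collapsed Gibbs scheme can be sketched for the simplest conjugate case: a one-dimensional Gaussian mixture with known observation variance and a conjugate normal prior on each cluster mean. The code below is an illustrative sketch in the spirit of Neal's Algorithm 3, with all names and hyperparameters chosen for exposition, not a reference implementation; each point is reseated with probability proportional to a CRP prior term times a posterior predictive likelihood:

```python
import math
import random

def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def collapsed_gibbs_dpm(data, alpha, sigma2=1.0, mu0=0.0, tau2=100.0,
                        iters=100, seed=0):
    """Collapsed Gibbs sampling for a 1-D DPM of Gaussians with known
    observation variance sigma2 and conjugate N(mu0, tau2) prior on each
    cluster mean."""
    rng = random.Random(seed)
    n = len(data)
    z = [0] * n  # start with everyone at one table
    for _ in range(iters):
        for i in range(n):
            z[i] = -1  # remove point i from the seating arrangement
            labels = sorted(set(z) - {-1})
            weights, choices = [], []
            for c in labels:
                members = [data[j] for j in range(n) if z[j] == c]
                m = len(members)
                # Posterior over the cluster mean given its members ...
                post_var = 1.0 / (1.0 / tau2 + m / sigma2)
                post_mean = post_var * (mu0 / tau2 + sum(members) / sigma2)
                # ... yields the predictive density for point i.
                lik = normal_pdf(data[i], post_mean, post_var + sigma2)
                weights.append(m * lik)  # CRP term: table size m
                choices.append(c)
            # Opening a new table uses the prior predictive density.
            weights.append(alpha * normal_pdf(data[i], mu0, tau2 + sigma2))
            choices.append(max(labels, default=-1) + 1)
            r = rng.uniform(0.0, sum(weights))
            acc = 0.0
            for w, c in zip(weights, choices):
                acc += w
                if r < acc:
                    z[i] = c
                    break
            else:
                z[i] = choices[-1]
    return z

# Two well-separated groups; the sampler typically finds two clusters.
data = [-5.1, -4.9, -5.0, -4.8, 5.0, 5.2, 4.9, 5.1]
z = collapsed_gibbs_dpm(data, alpha=1.0)
print(sorted(set(z)))
```

Note that the number of occupied tables is never fixed: tables vanish when their last member is reseated, and a fresh label is created whenever the new-table term wins.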

In machine learning and data analysis

The Chinese restaurant franchise process extends the CRP to model correlated topics in extensions of latent Dirichlet allocation (LDA), such as hierarchical LDA, by representing documents as paths in a shared topic tree that captures multilevel dependencies.[19] This allows topics to be drawn from a hierarchy in which higher-level distributions inform lower ones, enabling nonparametric discovery of topic structures with components shared across documents or groups.[22] In practice, inference via Chinese restaurant franchise sampling facilitates posterior computation over infinite topic hierarchies, improving coherence on large document collections compared to flat LDA models.[23] In clustering tasks, the infinite relational model leverages the CRP to infer latent clusters in network data, treating nodes as customers and clusters as tables to model block-structured graphs without a predefined number of communities.[24] This approach uses CRP-based Gibbs sampling to assign relational features to clusters, enabling scalable discovery of hidden structure in bipartite or multipartite networks such as social graphs or citation data.[25] Recent implementations incorporate local-access mechanisms for sublinear-time sampling, reducing computational overhead for large-scale network clustering by avoiding full data scans during inference.[26] For feature allocation, the Indian buffet process (IBP), a binary-matrix analogue of the CRP, is employed in matrix factorization for recommender systems and image tagging, modeling sparse latent features as dishes sampled by customers (data points).[27] Dependent IBP variants enable nonparametric nonnegative matrix factorization, automatically determining the number of features from observed data such as user ratings or image annotations, which enhances sparsity and interpretability in low-rank approximations.[27] The IBP is also used for multi-label tasks, where it generates exchangeable binary feature assignments suited to sparse labeling scenarios. Work from 2020 to 2025 includes sublinear-time CRP approximations that support efficient processing of large-scale datasets in streaming or distributed machine learning settings, achieving query times independent of the total data size.[26] CRP-based methods also provide alternatives to K-means by selecting the number of clusters automatically through marginalization over partitions, as in hierarchical relational models that infer optimal groupings via posterior sampling.[28] An illustrative application is anomaly detection in time series, where CRP partitions group similar observations into clusters and anomalies are flagged as singleton tables or as points with low seating probabilities relative to their temporal neighbors.[29] This nonparametric approach handles varying cluster numbers and irregular sampling, outperforming fixed-model baselines in detecting outliers in water consumption and sensor data streams.[29]
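The singleton-table rule for flagging anomalies is simple to state in code. The sketch below is a minimal illustration that assumes cluster labels have already been inferred by some upstream CRP-based sampler; the labels shown are hypothetical:

```python
from collections import Counter

def flag_singletons(labels):
    """Return the indices whose cluster is a singleton table, i.e. points
    that no other observation chose to sit with."""
    sizes = Counter(labels)
    return [i for i, c in enumerate(labels) if sizes[c] == 1]

# Hypothetical labels from an upstream clustering step.
labels = [0, 0, 0, 1, 1, 2, 0, 1]
print(flag_singletons(labels))  # prints [5]: index 5 sits alone
```

The complementary rule mentioned above, flagging points whose leave-one-out seating probability is low, would additionally require the fitted model's likelihood terms.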

In other fields

In population genetics, the Chinese restaurant process serves as a generative mechanism underlying Ewens' sampling formula, which gives the probability distribution of allele configurations in a sample from a population under the infinite-alleles model of neutral evolution. The connection arises because the sequential seating probabilities of the CRP exactly match the formula's predictive probabilities for new or existing alleles, enabling inference on genetic diversity and mutation rates from observed allele-frequency spectra.[9] In ecology, the two-parameter Chinese restaurant process, corresponding to the Pitman-Yor process, models species diversity and abundance distributions by generating partitions with power-law tails, matching empirical patterns in natural communities such as microbial ecosystems. The discount parameter controls the rate at which new species are introduced, allowing the process to capture heavy-tailed abundance spectra in which a few species dominate while many are rare, as observed in metagenomic data from environmental samples. The hierarchical extension of this process further accommodates variation across multiple communities, improving estimates of alpha- and beta-diversity metrics such as Simpson's index in studies of microbiome assemblages. The Chinese restaurant process has also found applications in image processing, particularly texture segmentation and reconstruction, by providing a nonparametric clustering prior that adapts to the number of homogeneous regions in an image. A weighted variant of the CRP, introduced in 2006 for clustering gene expression data from microarray experiments, groups genes with similar expression profiles by favoring partitions based on data similarity.
This approach has been extended to computer vision tasks, such as segmenting textures in natural images, through distance-dependent modifications that bias clustering toward spatially proximate pixels, enabling robust reconstruction of textured regions without fixed model complexity.[30][31] In physics, random graph models with preferential-attachment dynamics akin to the Chinese restaurant process simulate network growth and explain emergent scale-free properties, such as the power-law degree distributions observed in complex systems. These models treat nodes as "customers" and clusters as "tables": incoming nodes preferentially connect to high-degree nodes, leading to condensation phenomena and anomalous finite-size effects in statistical-mechanics analyses. Such frameworks have been applied to study failures of self-averaging and phase transitions in growing networks, bridging probabilistic partitioning with physical processes such as aggregation and percolation.[32] A 2020 study used an online Chinese restaurant process in a nonparametric Bayesian multimodal hierarchical model for spatial concept formation, integrating visual and linguistic inputs for tasks such as robotic perception and scene understanding. The model employs the online CRP to dynamically estimate the number of spatial-concept clusters and position distributions, enabling simultaneous place categorization, lexical acquisition, and mapping without prior knowledge.[33]
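The correspondence with Ewens' sampling formula mentioned above can be checked numerically: the formula's probability for a block-size configuration equals the CRP probability of any one set partition with those block sizes, multiplied by the number of such set partitions. A short Python sketch (function names chosen for exposition):

```python
import math
from collections import Counter

def rising(theta, n):
    """Rising factorial theta * (theta + 1) * ... * (theta + n - 1)."""
    out = 1.0
    for i in range(n):
        out *= theta + i
    return out

def crp_partition_prob(block_sizes, theta):
    """Probability that CRP(theta) produces one particular set partition
    with the given block sizes."""
    n = sum(block_sizes)
    p = theta ** len(block_sizes) / rising(theta, n)
    for b in block_sizes:
        p *= math.factorial(b - 1)
    return p

def ewens_prob(block_sizes, theta):
    """Ewens sampling formula: probability of the unordered size
    configuration (a_j = number of blocks of size j)."""
    n = sum(block_sizes)
    a = Counter(block_sizes)
    p = math.factorial(n) / rising(theta, n)
    for j, aj in a.items():
        p *= (theta / j) ** aj / math.factorial(aj)
    return p

def num_set_partitions(block_sizes):
    """How many set partitions of {1, ..., n} share this size multiset."""
    n = sum(block_sizes)
    a = Counter(block_sizes)
    count = math.factorial(n)
    for j, aj in a.items():
        count //= math.factorial(j) ** aj * math.factorial(aj)
    return count

sizes, theta = [3, 2, 1, 1], 1.5
lhs = ewens_prob(sizes, theta)
rhs = num_set_partitions(sizes) * crp_partition_prob(sizes, theta)
print(abs(lhs - rhs) < 1e-12)  # prints True
```

The identity holds for any configuration, since the combinatorial factor $ n! / \prod_j (j!)^{a_j} a_j! $ converts the per-partition CRP probability into the probability of the size configuration that Ewens' formula describes.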

References
