Burstiness
In statistics, burstiness is the intermittent increase and decrease in the activity or frequency of an event.[1][2] One measure of burstiness is the Fano factor, the ratio of the variance to the mean of event counts.
Burstiness is observable in natural phenomena, such as natural disasters, and in human-made systems, such as network traffic (data or email)[3][4] or vehicular traffic.[5] Burstiness is due, in part, to changes in the probability distribution of inter-event times.[6] Distributions of bursty processes or events are characterised by heavy, or fat, tails.[1]
Burstiness of inter-contact times between nodes in a time-varying network can markedly slow spreading processes over the network, which is of great interest for studying the spread of information and disease.[7]
Burstiness score
One relatively simple measure of burstiness is the burstiness score. The burstiness score of a subset s of a time period S relative to an event e measures how often e appears in s compared to its occurrences in S as a whole. It is defined by

b(e, s) = n_s/n_S − |s|/|S|,

where n_s is the total number of occurrences of event e in the subset s, n_S is the total number of occurrences of e in all of S, and |s| and |S| are the lengths of s and S.

The burstiness score can be used to determine whether s is a "bursty period" relative to S. A positive score says that e occurs more often during the subset s than would be expected from its overall rate over S, making s a bursty period. A negative score implies otherwise.[8]
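As a minimal sketch, the score can be computed by comparing the event's share of occurrences that fall in the subperiod against the share of time the subperiod covers, which matches the sign convention above (the function and argument names are illustrative):

```python
def burstiness_score(n_sub, n_total, dur_sub, dur_total):
    """Burstiness score of a subperiod s within a period S for an event e.

    n_sub:    occurrences of e in s       (n_s)
    n_total:  occurrences of e in all S   (n_S)
    dur_sub, dur_total: lengths |s| and |S|.
    Positive -> e is over-represented in s, i.e. s is a bursty period.
    """
    return n_sub / n_total - dur_sub / dur_total

# 30 of 50 occurrences fall in a 10-day window of a 100-day period:
print(burstiness_score(30, 50, 10, 100))  # 0.5 -> bursty period
```

A score of 0 means the event occurs in s exactly in proportion to the time s occupies.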
References
- ^ a b Lambiotte, R. (2013). "Burstiness and Spreading on Temporal Networks", University of Namur.
- ^ Neuts, M. F. (1993). "The Burstiness of Point Processes", Commun. Statist. Stochastic Models, 9(3):445–66.
- ^ D'Auria, B. and Resnick, S. I. (2006). "Data network models of burstiness", Adv. in Appl. Probab., 38(2):373–404.
- ^ Ying, Y.; Mazumdar, R.; Rosenberg, C.; Guillemin, F. (2005). "The Burstiness Behavior of Regulated Flows in Networks", Proceedings of the 4th IFIP-TC6 International Conference on Networking Technologies, Services, and Protocols; Performance of Computer and Communication Networks; Mobile and Wireless Communication Systems, 3462:918–29.
- ^ Jagerman, D. L. and Melamed, B. (1994). "Burstiness Descriptors of Traffic Streams: Indices of Dispersion and Peakedness", Proceedings of the 1994 Conference on Information Sciences and Systems, 1:24–8.
- ^ Goh, K.-I. and Barabási, A.-L. (2006). "Burstiness and Memory in Complex Systems", Physics Data.
- ^ Holme, P. and Saramäki, J. (2012). "Temporal Networks", Phys. Rep., 519:118–120. doi:10.1016/j.physrep.2012.03.001.
- ^ Hoonlor, A. et al. (2013). "An Evolution of Computer Science Research", Communications of the ACM, 56(10):79.
Definition and Characteristics
Conceptual Definition
Burstiness refers to the intermittent increases and decreases in the frequency or intensity of events within a stochastic process, where events tend to cluster in short periods of high activity separated by extended periods of relative quiescence.[8] This phenomenon contrasts sharply with uniform processes like the Poisson process, in which events occur at a constant average rate without such clustering, leading to exponentially distributed inter-event times.[9]

In bursty systems, these clusters, or "bursts," manifest as brief episodes of elevated event rates followed by prolonged inactivity, a pattern observed in diverse natural phenomena such as earthquake aftershocks, where seismic activity surges in sequences before subsiding, or in human behaviors like email sending, where individuals transmit multiple messages in rapid succession after long delays. Bursty processes are characterized as overdispersed, exhibiting greater variability (variance exceeding the mean) compared to the equidispersion of Poisson processes, while regular or underdispersed processes show the opposite, with reduced variability and more even spacing of events.[8] Positive autocorrelation among events plays a key role in sustaining these bursts, as prior occurrences increase the likelihood of subsequent ones within the cluster, thereby prolonging the high-activity phase.[10]

The concept of burstiness emerged within queueing theory during the mid-20th century, building on foundational work in stochastic modeling of service systems, with early explicit references appearing in telecommunications modeling around the 1960s to describe irregular data flows in emerging computer networks. A common quantifier of burstiness is the Fano factor, which compares the variance to the mean of event counts in fixed intervals, yielding values greater than one for bursty dynamics.[8]
Key Properties
Burstiness manifests as temporal clustering, where events occur in tight groups separated by long periods of inactivity, arising from positive autocorrelation in event timings. This clustering leads to heavy-tailed inter-event time distributions, characterized by a higher probability of both very short and very long intervals compared to uniform processes.[10] Such patterns reflect underlying memory effects, where the occurrence of one event increases the likelihood of subsequent events in the near term.[10]

A defining feature of burstiness is its scale-invariance, meaning the clustered patterns persist across multiple time scales, often described by power-law distributions in inter-event times and burst sizes. For instance, the distribution of inter-event times follows a power law, P(\tau) \sim \tau^{-\alpha}, ensuring self-similar structure regardless of the observation window.[11] This property distinguishes bursty dynamics from scale-dependent behaviors and enables modeling with fractal-like temporal structures.[10]

In contrast to random processes like the Poisson process, which are memoryless and produce uniform event spacing with exponential inter-event time tails, bursty systems exhibit persistent memory through long-range correlations. These correlations result in power-law decaying autocorrelation functions, sustaining bursts over extended periods rather than independent arrivals.[10] Bursty processes thus display overdispersion, where event counts vary more than expected under Poisson assumptions.[10] Observable signs of burstiness include marked variability in event rates over time, with the coefficient of variation of inter-event times exceeding 1, indicating greater dispersion than in regular or random sequences. This elevated variability underscores the non-stationary nature of bursty activity, where periods of high density alternate with quiescence.[12]
Measurement Methods
Fano Factor
The Fano factor serves as a primary statistical measure for quantifying burstiness in event counts, defined as the ratio of the variance to the mean of the counts observed over fixed time intervals: F = \sigma^2 / \mu, where \sigma^2 is the variance and \mu is the mean number of events per interval.[13] This dimensionless quantity captures the dispersion relative to what would be expected in a random process.[14] Under a Poisson process, where events occur independently and at a constant average rate, the variance equals the mean, yielding F = 1, which indicates non-bursty behavior with no clustering.[15] A value of F > 1 signifies overdispersion, reflecting burstiness where events cluster more than expected, leading to higher variability; conversely, F < 1 suggests underdispersion and more regular, anti-bursty patterns.[16] In bursty systems, such as transcriptional events, F directly quantifies the degree of bursting, with larger values corresponding to more pronounced intermittency.[14]

The Fano factor derives from the index of dispersion, which compares the empirical variance of event counts to the mean expected under the Poisson assumption of independence; deviations from F = 1 thus detect clustering or regularity.[17] For a binned time series of events, the calculation proceeds as follows: (1) divide the total observation period into n equal non-overlapping bins of fixed duration; (2) count the number of events N_i in each bin i; (3) compute the sample mean \mu = (1/n) \sum_i N_i; (4) compute the sample variance \sigma^2 = (1/(n-1)) \sum_i (N_i - \mu)^2; and (5) obtain F = \sigma^2 / \mu.[18] This approach assumes the underlying process is stationary, with consistent statistical properties across bins.[17] Its advantages include simplicity, requiring no additional parameters beyond the raw counts, and ease of computation for any counting process.[13] However, it is sensitive to the choice of bin size, as smaller bins may underestimate clustering while larger ones can smooth out bursts, and it presumes stationarity, which may not hold in non-ergodic or trending data.
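The binning procedure can be sketched in a few lines (a minimal illustration; the function and variable names are our own):

```python
import statistics

def fano_factor(event_times, bin_width, t_total):
    """Fano factor F = (sample variance)/(mean) of per-bin event counts."""
    n_bins = int(t_total / bin_width)
    counts = [0] * n_bins
    for t in event_times:
        if 0 <= t < t_total:
            counts[int(t / bin_width)] += 1      # step (2): count per bin
    mu = statistics.mean(counts)                 # step (3)
    var = statistics.variance(counts)            # step (4): n - 1 denominator
    return var / mu                              # step (5)

# Two tight clusters of three events over 10 time units:
events = [0.1, 0.2, 0.3, 5.1, 5.2, 5.3]
print(fano_factor(events, bin_width=1.0, t_total=10.0))  # ~2.67 > 1: bursty
```

Rerunning with a different `bin_width` illustrates the bin-size sensitivity noted above.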
The Fano factor is equivalent to the index of dispersion.[17] For example, consider daily email receipt counts over 100 days with mean \mu emails per day and variance \sigma^2 > \mu; the Fano factor F = \sigma^2 / \mu then exceeds 1, indicating bursty patterns where emails arrive in clusters on some days and sparsely on others.[19]
Index of Dispersion and Related Metrics
The index of dispersion (ID), defined as the ratio of the variance to the mean of event counts, ID = \sigma^2 / \mu, quantifies overdispersion in discrete data such as arrival processes.[20] Applied to count data in fields like network traffic, an ID exceeding 1 signals bursty clustering beyond Poisson expectations, while values below 1 indicate underdispersion.[21] Computation proceeds by aggregating events into fixed intervals to estimate \mu and \sigma^2 empirically, then taking their ratio, a process suited to stationary count series.[20]

The coefficient of variation squared, CV^2 = \sigma^2 / \mu^2, normalizes dispersion by the squared mean, enabling scale-independent comparisons of burstiness across datasets with varying event rates.[22] In biological contexts like gene expression, an elevated CV^2 highlights bursty production patterns where protein levels fluctuate more relative to the average than in steady-state processes.[23] This metric proves valuable for relative variability assessment when absolute counts differ markedly between systems.

In inter-event time analysis, burstiness emerges through the coefficient of variation of waiting times, r = \sigma_\tau / \mu_\tau, where r > 1 denotes irregular bursts interspersed with long silences, contrasting with the r = 1 of memoryless Poisson events. This approach suits point processes in human activity or neural firing, capturing temporal irregularity without binning.[24] A refined metric, the burstiness parameter B = (r - 1)/(r + 1), transforms the coefficient of variation into a bounded scale from -1 (periodic regularity) to +1 (extreme burstiness), mitigating sensitivity to outliers in finite samples.
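Both the coefficient of variation of waiting times and the bounded burstiness parameter can be computed directly from a list of inter-event times, as in this sketch (names are illustrative):

```python
import statistics

def burstiness_parameter(inter_event_times):
    """B = (cv - 1)/(cv + 1), where cv is the coefficient of variation of
    inter-event times: B = -1 periodic, B ~ 0 Poisson-like, B -> +1 bursty."""
    mu = statistics.mean(inter_event_times)
    sigma = statistics.pstdev(inter_event_times)   # population std deviation
    cv = sigma / mu
    return (cv - 1) / (cv + 1)

print(burstiness_parameter([1.0] * 10))          # -1.0: perfectly periodic
print(burstiness_parameter([0.1] * 9 + [10.0]))  # > 0: one long gap -> bursty
```

Note that B is dimensionless: rescaling all inter-event times by a constant leaves it unchanged, which is what makes it suitable for cross-system comparisons.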
It surpasses the raw CV or ID in cross-system comparisons, such as email versus web browsing patterns, by providing a dimensionless, interpretable index robust to rate variations.[25] All such metrics presume ergodicity, where a single long trajectory represents the ensemble, an assumption violated in non-stationary data exhibiting trends or regime shifts that inflate apparent dispersion.[26] For non-stationary series, preprocessing via detrending, such as linear regression removal or empirical mode decomposition, restores applicability by isolating intrinsic variability from systematic changes.[27]
Applications
In Communication and Network Traffic
In communication networks, burstiness refers to the phenomenon where data packets or frames arrive in concentrated clusters or "bursts" rather than at a steady rate, often triggered by protocol mechanisms such as TCP's delayed acknowledgments, which bundle multiple confirmations into a single packet to optimize efficiency. This clustering arises from the asynchronous nature of network protocols, where sources like file transfers or web browsing generate irregular traffic patterns, leading to short periods of high-intensity transmission followed by idle times.

The impacts of burstiness on network performance are significant, as these sudden influxes can overwhelm router buffers, resulting in queue overflows, increased packet loss, jitter in real-time applications like VoIP, and elevated end-to-end latency. For instance, in backbone networks, bursty traffic can cause peak-to-average ratios exceeding 10:1 during congestion events, amplifying delays in service level agreements for high-priority flows. To quantify this variability, metrics like the Fano factor are occasionally applied to assess the dispersion of inter-arrival times relative to a Poisson process.

Modeling bursty traffic is central to queueing theory, where sources are characterized using models like batch arrival queues or Markovian arrival processes (MAPs) to capture clustering effects in arrival processes. A key aspect involves the burst length distribution, often modeled using a geometric distribution for the number of packets per burst, with probability mass function P(K = k) = (1/\beta)(1 - 1/\beta)^{k-1} for k = 1, 2, \ldots, where \beta is the burstiness parameter representing the average burst size. This approach enables analysis of queue stability and waiting times under bursty conditions, informing capacity planning in wide-area networks.
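Assuming a geometric burst-length model parameterized by its mean burst size β, i.e. P(K = k) = (1/β)(1 − 1/β)^{k−1}, burst sizes can be sampled by flipping a "continue the burst" coin with probability 1 − 1/β per packet. A quick simulation (a sketch; names are ours) confirms the mean:

```python
import random

random.seed(42)

def burst_size(beta):
    """Sample K ~ Geometric with P(K = k) = (1/beta) * (1 - 1/beta)**(k - 1)."""
    k = 1
    while random.random() > 1.0 / beta:  # burst continues with prob 1 - 1/beta
        k += 1
    return k

sizes = [burst_size(5.0) for _ in range(100_000)]
print(sum(sizes) / len(sizes))  # close to the target mean burst size of 5
```

Feeding such burst sizes into a queue simulation is one way to study waiting times under bursty arrivals.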
Mitigation strategies focus on smoothing these bursts through traffic shaping and policing algorithms, which regulate output rates to enforce token bucket or leaky bucket profiles, thereby preventing downstream congestion. In Asynchronous Transfer Mode (ATM) networks, variable bit rate services incorporate burst tolerance parameters to handle irregular traffic, while Ethernet standards like IEEE 802.1Q use priority queuing and shaping to prioritize bursts in virtual LANs. These techniques have evolved to support quality-of-service (QoS) guarantees in modern IP networks.

Historically, the study of burstiness emerged in the 1980s amid efforts to integrate bursty data traffic with constant-rate voice services in integrated services digital networks (ISDN), prompting models that quantified burst parameters for multiplexing efficiency. This foundational work influenced subsequent QoS frameworks, culminating in burstiness-aware specifications in standards like IEEE 802.1Q, which define traffic contracts including peak rate and burst size limits to ensure predictable performance.
In Human Behavior and Activity Patterns
Human activities often exhibit bursty patterns, characterized by long periods of inactivity interspersed with short bursts of intense action. For instance, in email checking behaviors, individuals spend approximately 80% of their time inactive, with the remaining 20% involving rapid execution of multiple tasks.[28] Similarly, editing activities on Wikipedia display clustered bursts, where editors contribute multiple revisions in quick succession followed by extended inactivity, reflecting non-Poissonian dynamics in collaborative online tasks.[29]

These bursty patterns arise from psychological factors such as priority queuing, where individuals selectively address high-priority tasks during available time slots, and external triggers like notifications or deadlines that prompt sudden activity spikes. Barabási's queueing model formalizes this process, positing that tasks are assigned random priorities and executed based on perceived urgency rather than a first-in-first-out discipline, leading to heavy-tailed inter-event time distributions.[28] Empirical analyses of email datasets quantify this burstiness using the index of dispersion (Fano factor), which often exceeds 10, far above the value of 1 expected for Poisson processes.

Key studies have confirmed these patterns across communication modalities. An analysis of mobile phone call records from millions of users revealed power-law distributed inter-event times, with exponents around 1, indicating scale-free burstiness driven by individual decision-making rather than external scheduling alone.[30] In email communication, waiting times between messages follow a power-law tail, underscoring the role of internal prioritization in generating non-uniform activity.[28] Burstiness in human behavior has significant implications for modeling productivity, as it challenges assumptions of uniform task completion rates and suggests that efficiency arises from adaptive queuing rather than constant effort.
In social networks, these patterns influence dynamics by accelerating information diffusion during bursts, where correlated activity amplifies spread compared to random timing. Research from the 2010s has extended these insights to online platforms, linking bursty dynamics to circadian rhythms that modulate activity peaks during waking hours. For example, studies of social media interactions show that de-seasonalizing data for daily cycles still yields heavy-tailed inter-event times, with burstiness enhancing content popularity through synchronized user engagement.
In Natural Language Processing
In natural language processing, burstiness refers to the irregular variation in textual features such as sentence length, syntactic complexity, and word repetition patterns, which contrasts with the more uniform output typically produced by AI language models.[31] Human writing often exhibits "bursts" of short, punchy sentences interspersed with longer, elaborate ones, reflecting natural cognitive rhythms and stylistic choices, whereas AI-generated text tends toward consistent pacing and homogeneity.[32] This distinction arises because large language models optimize for predictability and fluency, resulting in reduced variability compared to the diverse, context-driven fluctuations in human-authored content.

A common adaptation for measuring burstiness in text involves calculating the coefficient of variation (CV) of sentence lengths, defined as the burstiness score B = \sigma / \mu, where \sigma is the standard deviation and \mu is the mean sentence length in words. Higher values of B indicate greater variability, often associated with human writing due to its dynamic structure, while lower values suggest the steadier patterns of AI output.[33] This metric, borrowed from general statistical dispersion measures, helps quantify stylistic burstiness without requiring complex linguistic parsing.

Burstiness has gained prominence in AI detection tools, particularly since the widespread adoption of models like ChatGPT in late 2022, which amplified concerns over indistinguishable synthetic text.[34] Tools such as GPTZero integrate burstiness analysis with perplexity—a measure of text predictability—to identify AI-generated content, as human writing typically shows higher burstiness from varied sentence structures.[35] Studies confirm that AI text often displays low burstiness, with limited variation in sentence lengths and structures.
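The sentence-length adaptation can be sketched with a crude sentence splitter (the regex split and the function name are our own; real tools use proper tokenization):

```python
import re
import statistics

def sentence_burstiness(text):
    """B = sigma/mu over sentence lengths in words; higher = more varied pacing."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    return statistics.pstdev(lengths) / statistics.mean(lengths)

uniform = "One two three four. Five six seven eight. Nine ten eleven twelve."
varied = "Short. This sentence runs on for quite a while before it stops. Tiny."
print(sentence_burstiness(uniform))  # 0.0 -> perfectly even pacing
print(sentence_burstiness(varied))   # > 1 -> highly varied pacing
```

A score of 0 means every sentence has the same length; scores above 1 mean the spread of lengths exceeds their average.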
As of 2025, burstiness combined with perplexity achieves automated detection accuracies of 65–90% in evaluations.[36] However, tools and techniques to enhance the burstiness of AI text have emerged, reducing standalone detection efficacy and prompting integration with other features such as watermarking.[37]

Beyond detection, burstiness plays a role in topic modeling for dynamic text streams, such as news articles, where algorithms identify sudden surges in keyword frequency to detect emerging trends. A seminal approach is Kleinberg's 2002 algorithm, which models document streams as infinite-state automata to pinpoint bursty periods of heightened activity around specific topics, facilitating real-time event tracking in corpora like news feeds.[38]

Challenges in applying burstiness metrics include cultural and stylistic variations in writing, which can skew scores across languages or regions; for instance, concise prose in some East Asian styles may mimic AI uniformity despite human authorship. Additionally, advancing AI models, such as later iterations of GPT-4, increasingly emulate human-like variability, diminishing the reliability of burstiness as a standalone detector and necessitating hybrid methods.
In Physics and Biology
In physics, burstiness appears in processes like particle or information diffusion on networks, where events cluster in time, leading to irregular activity patterns. Bursty timing can accelerate diffusion compared to uniform Poisson processes, particularly in heterogeneous systems where some nodes exhibit bursty behavior while others remain steady. For instance, in models of temporal networks, bursty activity enhances overall propagation by exploiting temporal correlations, as demonstrated in analyses of complex systems.[39] The effective diffusion rate in such systems scales with the burstiness parameter B = (\sigma_\tau - \mu_\tau)/(\sigma_\tau + \mu_\tau), which quantifies temporal irregularity from the standard deviation \sigma_\tau and mean \mu_\tau of inter-event times.[40] A prominent example is earthquake seismicity, where aftershocks form power-law distributed bursts following the Omori law, n(t) \propto (c + t)^{-p}, with the exponent p describing the decay of event rates after a main shock, reflecting clustered seismic releases.[41]

In biology, burstiness is evident in neural spiking, where neurons fire in rapid bursts of action potentials interspersed with silent periods, enabling efficient coding of sensory or cognitive information. These bursts improve signal reliability and temporal precision, allowing neurons to convey more data per spike than tonic firing.
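As a numerical illustration of the neural case, compare a Poisson spike train with a bursty train of the same mean rate: binned spike counts give a Fano factor near 1 for the former and well above 1 for the latter (a sketch with arbitrary illustrative parameters):

```python
import random
import statistics

random.seed(7)

def fano(spike_times, bin_width, t_total):
    """Fano factor of binned spike counts."""
    counts = [0] * int(t_total / bin_width)
    for t in spike_times:
        if t < t_total:
            counts[int(t / bin_width)] += 1
    return statistics.variance(counts) / statistics.mean(counts)

T, rate = 1000.0, 5.0
# Poisson train: exponential inter-spike intervals at the given rate.
t, poisson = 0.0, []
while t < T:
    t += random.expovariate(rate)
    poisson.append(t)
# Bursty train with the same mean rate: clusters of 10 tightly spaced
# spikes separated by long exponential silences.
t, bursty = 0.0, []
while t < T:
    t += random.expovariate(rate / 10)   # long gap between bursts
    for k in range(10):
        bursty.append(t + k * 0.001)

print(fano(poisson, 1.0, T))  # ~1: Poisson statistics
print(fano(bursty, 1.0, T))   # >> 1: overdispersed, bursty
```

The same comparison, run with the two-state promoter rates of a telegraph model, reproduces the overdispersed mRNA statistics described for gene expression.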
The Fano factor F = \sigma^2 / \mu for the spike count over a time window exceeds 2 in bursty regimes, indicating overdispersion beyond Poisson statistics (F = 1) and highlighting clustered activity.[42] Bursts optimize energy consumption in spiking neurons by maximizing information transfer while minimizing total spikes and metabolic demand, as shown in models where intermediate burstiness balances efficiency and adaptability during learning tasks.[43] This mechanism also enhances signal propagation in excitable media like neural or cardiac tissues, where burst-induced waves synchronize excitations more robustly than steady inputs.[44]

Bursty patterns similarly govern gene expression in cells, where transcription proceeds in stochastic bursts of mRNA production followed by degradation phases, contributing to variability in protein levels. Seminal stochastic models from the 2000s, such as the two-state telegraph model, capture this by treating promoter switching as a random process with bursty outputs, explaining observed noise in bacterial and eukaryotic systems.[45]

Across these domains, bursty dynamics are modeled via renewal processes with heavy-tailed inter-event time distributions, often power-law forms that generate clustered events. Simulations of such processes demonstrate that burstiness mitigates finite-size effects in small-scale systems, yielding dynamics more akin to large-system limits by amplifying rare events without requiring extensive sampling. Inter-event time metrics, like the coefficient of variation, quantify this irregularity for analysis.[46]
References
- https://arxiv.org/pdf/1604.01125.pdf
