Power transform
from Wikipedia

In statistics, a power transform is a family of functions applied to create a monotonic transformation of data using power functions. It is a data transformation technique used to stabilize variance, make the data more normal distribution-like, improve the validity of measures of association (such as the Pearson correlation between variables), and for other data stabilization procedures.

Power transforms are used in multiple fields, including multi-resolution and wavelet analysis,[1] statistical data analysis, medical research, modeling of physical processes,[2] geochemical data analysis,[3] epidemiology[4] and many other clinical, environmental and social research areas.

Definition

The power transformation is defined as a continuous function of the power parameter λ, typically given in piecewise form so that it is continuous at the point of singularity (λ = 0). For data vectors (y1, ..., yn) in which each yi > 0, the power transform is

$$y_i^{(\lambda)} = \begin{cases} \dfrac{y_i^{\lambda} - 1}{\lambda \, \operatorname{GM}(y)^{\lambda - 1}}, & \lambda \neq 0, \\[6pt] \operatorname{GM}(y) \ln y_i, & \lambda = 0, \end{cases}$$

where

$$\operatorname{GM}(y) = \left( \prod_{i=1}^{n} y_i \right)^{1/n} = \sqrt[n]{y_1 y_2 \cdots y_n}$$

is the geometric mean of the observations y1, ..., yn. The case λ = 0 is the limit as λ approaches 0. To see this, note that $y_i^{\lambda} = \exp(\lambda \ln y_i) = 1 + \lambda \ln y_i + \tfrac{\lambda^2 (\ln y_i)^2}{2!} + \cdots$ using a Taylor series. Then $\tfrac{y_i^{\lambda} - 1}{\lambda} = \ln y_i + \tfrac{\lambda (\ln y_i)^2}{2!} + \cdots$, and every term but $\ln y_i$ becomes negligible for λ sufficiently small.

The inclusion of the (λ − 1)th power of the geometric mean in the denominator simplifies the scientific interpretation of any equation involving $y_i^{(\lambda)}$, because the units of measurement do not change as λ changes.
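
As a concrete illustration, the rescaled transform above can be computed directly; the following is a minimal NumPy sketch (the helper name `rescaled_boxcox` is ours, not a library function), which also checks numerically that the λ = 0 branch is the limit of the λ ≠ 0 branch:

```python
import numpy as np

def rescaled_boxcox(y, lam):
    """Power transform rescaled by the geometric mean (illustrative helper).

    For lam != 0 the (lam - 1)-th power of the geometric mean appears in the
    denominator, so the transformed values keep the units of the original data.
    """
    y = np.asarray(y, dtype=float)
    gm = np.exp(np.mean(np.log(y)))          # geometric mean of y_1, ..., y_n
    if lam == 0:
        return gm * np.log(y)                # limiting case as lam -> 0
    return (y**lam - 1.0) / (lam * gm**(lam - 1.0))

y = np.array([0.5, 1.0, 2.0, 4.0])
# A very small lam reproduces the lam = 0 branch up to numerical error:
print(np.allclose(rescaled_boxcox(y, 1e-8), rescaled_boxcox(y, 0.0), atol=1e-5))
```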

Box and Cox (1964) introduced the geometric mean into this transformation by first including the Jacobian of the rescaled power transformation

$$\frac{y^{\lambda} - 1}{\lambda}$$

with the likelihood. This Jacobian is as follows:

$$J(\lambda; y_1, \ldots, y_n) = \prod_{i=1}^{n} \left| \frac{d y_i^{(\lambda)}}{d y_i} \right| = \prod_{i=1}^{n} y_i^{\lambda - 1} = \operatorname{GM}(y)^{n(\lambda - 1)}$$

This allows the normal log likelihood at its maximum to be written as follows:

$$\log \mathcal{L}(\hat{\mu}, \hat{\sigma}) = -\frac{n}{2} \left( \log(2\pi \hat{\sigma}^2) + 1 \right) + n(\lambda - 1) \log \operatorname{GM}(y) = -\frac{n}{2} \left( \log \frac{2\pi \hat{\sigma}^2}{\operatorname{GM}(y)^{2(\lambda - 1)}} + 1 \right)$$

From here, absorbing $\operatorname{GM}(y)^{2(\lambda - 1)}$ into the expression for $\hat{\sigma}^2$ produces an expression that establishes that minimizing the sum of squares of residuals from $y_i^{(\lambda)}$ is equivalent to maximizing the sum of the normal log likelihood of deviations from $(y^{\lambda} - 1)/\lambda$ and the log of the Jacobian of the transformation.

The value at Y = 1 for any λ is 0, and the derivative with respect to Y there is 1 for any λ. Sometimes Y is a version of some other variable scaled to give Y = 1 at some sort of average value.

The transformation is a power transformation, but done in such a way as to make it continuous with the parameter λ at λ = 0. It has proved popular in regression analysis, including econometrics.

Box and Cox also proposed a more general form of the transformation that incorporates a shift parameter:

$$\tau(y_i; \lambda, \alpha) = \begin{cases} \dfrac{(y_i + \alpha)^{\lambda} - 1}{\lambda}, & \lambda \neq 0, \\[6pt] \ln(y_i + \alpha), & \lambda = 0, \end{cases}$$

which holds if yi + α > 0 for all i. If τ(Y, λ, α) follows a truncated normal distribution, then Y is said to follow a Box–Cox distribution.

Bickel and Doksum eliminated the need to use a truncated distribution by extending the range of the transformation to all y, as follows:

$$\tau(y_i; \lambda, \alpha) = \begin{cases} \dfrac{\operatorname{sgn}(y_i + \alpha)\,|y_i + \alpha|^{\lambda} - 1}{\lambda}, & \lambda \neq 0, \\[6pt] \operatorname{sgn}(y_i + \alpha) \ln |y_i + \alpha|, & \lambda = 0, \end{cases}$$

where sgn(·) is the sign function. This change in definition has little practical import as long as $\alpha$ is less than $\min(y_i)$, which it usually is.[5]

Bickel and Doksum also proved that the parameter estimates are consistent and asymptotically normal under appropriate regularity conditions, though the standard Cramér–Rao lower bound can substantially underestimate the variance when parameter values are small relative to the noise variance.[5] However, this problem of underestimating the variance may not be a substantive problem in many applications.[6][7]

Box–Cox transformation

The one-parameter Box–Cox transformations are defined as

$$y_i^{(\lambda)} = \begin{cases} \dfrac{y_i^{\lambda} - 1}{\lambda}, & \text{if } \lambda \neq 0, \\[6pt] \ln y_i, & \text{if } \lambda = 0, \end{cases}$$

and the two-parameter Box–Cox transformations as

$$y_i^{(\boldsymbol{\lambda})} = \begin{cases} \dfrac{(y_i + \lambda_2)^{\lambda_1} - 1}{\lambda_1}, & \text{if } \lambda_1 \neq 0, \\[6pt] \ln(y_i + \lambda_2), & \text{if } \lambda_1 = 0, \end{cases}$$

as described in the original article.[8][9] Moreover, the first transformations hold for $y_i > 0$, and the second for $y_i > -\lambda_2$.[8]

The parameter $\lambda$ is estimated using the profile likelihood function and using goodness-of-fit tests.[10]
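
For instance, the maximum-likelihood (profile-likelihood) estimate of λ for the one-parameter transform can be obtained with SciPy; a short sketch using `scipy.stats.boxcox` and `scipy.stats.boxcox_llf` on simulated positive, right-skewed data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
y = rng.lognormal(mean=1.0, sigma=0.6, size=200)   # positive, right-skewed data

# Maximum-likelihood estimate of lambda for the one-parameter transform
y_transformed, lam_hat = stats.boxcox(y)
print(f"estimated lambda: {lam_hat:.3f}")

# Profile log-likelihood over a grid of candidate lambdas
grid = np.linspace(-2, 2, 81)
llf = [stats.boxcox_llf(lam, y) for lam in grid]
print(f"grid maximum near lambda = {grid[int(np.argmax(llf))]:.2f}")
```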

Confidence interval

Confidence intervals for the Box–Cox transformation can be constructed asymptotically using Wilks's theorem on the profile likelihood function to find all the values of $\lambda$ that fulfil the following restriction:[11]

$$\ell(\lambda) > \ell(\hat{\lambda}) - \tfrac{1}{2} \chi^2_{1, 1 - \alpha}.$$
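
SciPy exposes exactly this Wilks-type interval: when `boxcox` is called with an `alpha` argument it also returns the set of λ values whose profile log-likelihood lies within $\tfrac{1}{2}\chi^2_{1,1-\alpha}$ of the maximum. A brief sketch on simulated data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
y = rng.lognormal(mean=0.0, sigma=0.5, size=150)

# scipy returns the interval of lambda values whose profile log-likelihood
# lies within chi^2_{1,1-alpha}/2 of the maximum (Wilks' theorem)
_, lam_hat, (lam_lo, lam_hi) = stats.boxcox(y, alpha=0.05)
print(f"lambda_hat = {lam_hat:.3f}, 95% CI = ({lam_lo:.3f}, {lam_hi:.3f})")
```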

Example

The BUPA liver data set[12] contains data on liver enzymes ALT and γGT. Suppose we are interested in using log(γGT) to predict ALT. A plot of the data appears in panel (a) of the figure. There appears to be non-constant variance, and a Box–Cox transformation might help.

The log-likelihood of the power parameter appears in panel (b). The horizontal reference line is at a distance of $\chi_1^2/2$ from the maximum and can be used to read off an approximate 95% confidence interval for λ. It appears as though a value close to zero would be good, so we take logs.

Possibly, the transformation could be improved by adding a shift parameter to the log transformation. Panel (c) of the figure shows the log-likelihood. In this case, the maximum of the likelihood is close to zero suggesting that a shift parameter is not needed. The final panel shows the transformed data with a superimposed regression line.

Note that although Box–Cox transformations can make big improvements in model fit, there are some issues that the transformation cannot help with. In the current example, the data are rather heavy-tailed so that the assumption of normality is not realistic and a robust regression approach leads to a more precise model.

Econometric application

Economists often characterize production relationships by some variant of the Box–Cox transformation.[13]

Consider a common representation of production Q as dependent on services provided by a capital stock K and by labor hours N:

$$\tau(Q) = a_K \tau(K) + a_N \tau(N),$$

where τ(·) is the Box–Cox transformation with parameter λ and the share parameters satisfy $a_K + a_N = 1$. Solving for Q by inverting the Box–Cox transformation we find

$$Q = \left( a_K K^{\lambda} + a_N N^{\lambda} \right)^{1/\lambda},$$

which is known as the constant elasticity of substitution (CES) production function.

The CES production function is a homogeneous function of degree one.

When λ = 1, this produces the linear production function

$$Q = a_K K + a_N N.$$

When λ → 0, this produces the famous Cobb–Douglas production function

$$Q = K^{a_K} N^{a_N}.$$
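
The Cobb–Douglas limit can be verified directly. Writing the CES form with $a_K + a_N = 1$ and applying L'Hôpital's rule to the exponent gives

$$\lim_{\lambda \to 0} \left( a_K K^{\lambda} + a_N N^{\lambda} \right)^{1/\lambda} = \exp\!\left( \lim_{\lambda \to 0} \frac{\ln\!\left( a_K K^{\lambda} + a_N N^{\lambda} \right)}{\lambda} \right) = \exp\!\left( a_K \ln K + a_N \ln N \right) = K^{a_K} N^{a_N}.$$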

Activities and demonstrations

The SOCR resource pages contain a number of hands-on interactive activities[14] demonstrating the Box–Cox (power) transformation using Java applets and charts. These directly illustrate the effects of this transform on Q–Q plots, X–Y scatterplots, time-series plots and histograms.

Yeo–Johnson transformation

The Yeo–Johnson transformation[15] also allows for zero and negative values of y. λ can be any real number, where λ = 1 produces the identity transformation. The transformation law reads:

$$y_i^{(\lambda)} = \begin{cases} \dfrac{(y_i + 1)^{\lambda} - 1}{\lambda}, & \text{if } \lambda \neq 0,\ y_i \geq 0, \\[4pt] \ln(y_i + 1), & \text{if } \lambda = 0,\ y_i \geq 0, \\[4pt] -\dfrac{(-y_i + 1)^{2 - \lambda} - 1}{2 - \lambda}, & \text{if } \lambda \neq 2,\ y_i < 0, \\[4pt] -\ln(-y_i + 1), & \text{if } \lambda = 2,\ y_i < 0. \end{cases}$$
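
SciPy implements this law directly; a small sketch showing that `scipy.stats.yeojohnson` accepts zero and negative values and that λ = 1 reproduces the identity:

```python
import numpy as np
from scipy import stats

# Yeo-Johnson handles zero and negative values directly
y = np.array([-3.0, -1.2, 0.0, 0.4, 1.5, 4.0, 9.0])

yt, lam_hat = stats.yeojohnson(y)            # lambda fitted by maximum likelihood
print(f"fitted lambda: {lam_hat:.3f}")
print(np.round(yt, 3))

# lambda = 1 reproduces the identity transformation
print(np.allclose(stats.yeojohnson(y, lmbda=1.0), y))
```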

Box-Tidwell transformation

The Box-Tidwell transformation is a statistical technique used to assess and correct non-linearity between predictor variables and the logit in a generalized linear model, particularly in logistic regression. This transformation is useful when the relationship between the independent variables and the outcome is non-linear and cannot be adequately captured by the standard model.

Overview

The Box-Tidwell transformation was developed by George E. P. Box and Paul W. Tidwell in 1962 as an extension of Box-Cox transformations, which are applied to the dependent variable. However, unlike the Box-Cox transformation, the Box-Tidwell transformation is applied to the independent variables in regression models. It is often used when the assumption of linearity between the predictors and the outcome is violated.

Method

The general idea behind the Box-Tidwell transformation is to apply a power transformation to each independent variable Xi in the regression model:

$$X_i \longmapsto X_i^{\lambda_i},$$

where $\lambda_i$ is the parameter estimated from the data. If $\lambda_i$ is significantly different from 1, this indicates a non-linear relationship between Xi and the logit, and the transformation improves the model fit.

The Box-Tidwell test is typically performed by augmenting the regression model with terms of the form $X_i \ln(X_i)$ and testing the significance of their coefficients. If significant, this suggests that a transformation should be applied to achieve a linear relationship between the predictor and the logit.
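
As an illustration of this augmentation test (not the original authors' code), the sketch below fits a logistic regression with statsmodels on simulated data whose true logit is nonlinear in the predictor, so the coefficient on the $X \ln X$ term should come out significant:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.uniform(0.5, 5.0, size=500)                    # positive predictor
# True model is nonlinear in x on the logit scale (uses sqrt(x)),
# so the Box-Tidwell augmentation term should be flagged as significant.
p = 1.0 / (1.0 + np.exp(-(-2.0 + 3.0 * np.sqrt(x))))
y = rng.binomial(1, p)

# Augment the logistic model with the x*log(x) term and test its coefficient
X = sm.add_constant(np.column_stack([x, x * np.log(x)]))
fit = sm.Logit(y, X).fit(disp=0)
print(fit.pvalues)     # a small p-value on the x*log(x) term flags nonlinearity
```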

Applications

Stabilizing Continuous Predictors

The transformation is beneficial in logistic regression or proportional hazards models where non-linearity in continuous predictors can distort the relationship with the dependent variable. It is a flexible tool that allows the researcher to fit a more appropriate model to the data without guessing the relationship's functional form in advance.

Verifying Linearity in Logistic Regression

In logistic regression, a key assumption is that continuous independent variables exhibit a linear relationship with the logit of the dependent variable. Violations of this assumption can lead to biased estimates and reduced model performance. The Box-Tidwell transformation is a method used to assess and correct such violations by determining whether a continuous predictor requires transformation to achieve linearity with the logit.

Method for Verifying Linearity

The Box-Tidwell procedure introduces an interaction term between each continuous variable Xi and its natural logarithm:

$$X_i \ln(X_i).$$

This term is included in the logistic regression model to test whether the relationship between Xi and the logit is non-linear. A statistically significant coefficient for this interaction term indicates a violation of the linearity assumption, suggesting the need for a transformation of the predictor. The Box-Tidwell transformation then provides an appropriate power transformation to linearize the relationship, thereby improving model accuracy and validity. Conversely, non-significant results support the assumption of linearity.

Limitations

One limitation of the Box-Tidwell transformation is that it only works for positive values of the independent variables. If the data contain negative values, the transformation cannot be applied directly without modifying the variables (e.g., adding a constant).

The power transform appears under different names in various scientific and applied contexts:

  • Alpha-fairness – introduced in the study of network utility maximization by Frank Kelly and collaborators as the α-fair utility function family.[16][17]
  • Tsallis entropy – a generalization of Shannon entropy proposed in non-extensive statistical mechanics, which reduces to Shannon entropy as the Tsallis parameter converges to 1.
  • Q-exponential family – a generalization of the standard exponential family that replaces the exponential function with the q-exponential form derived from Tsallis statistics.[18]

from Grokipedia
In statistics, a power transform refers to a family of monotonic functions that raise data values to a specific power, often with shifting or rescaling, to modify the distribution of the data for improved statistical analysis. These transformations are particularly useful for stabilizing variance when it increases as a power of the mean, inducing approximate normality in non-normal distributions, and simplifying nonlinear relationships between variables in regression models. The most influential member of this family is the Box–Cox transformation, proposed by George E. P. Box and David R. Cox in 1964, which applies the form $y^{(\lambda)} = \frac{y^{\lambda} - 1}{\lambda}$ for $\lambda \neq 0$ and $\log(y)$ for $\lambda = 0$, but is restricted to strictly positive data values. This parameterization allows estimation of the optimal $\lambda$ via maximum likelihood to achieve goals like homoscedasticity and normality under linear model assumptions.

Subsequent developments have broadened the applicability of power transforms. For instance, common special cases include the square-root transformation ($\lambda = 0.5$), which is effective for count data with Poisson-like variance, and the reciprocal transformation ($\lambda = -1$), suitable for stabilizing variance in inversely related phenomena. To address the positive-data requirement of Box–Cox, the Yeo–Johnson transformation was introduced in 2000, extending the family to handle zero and negative values through piecewise definitions: for $y \geq 0$, $\frac{(y+1)^{\lambda} - 1}{\lambda}$ if $\lambda \neq 0$ or $\log(y+1)$ if $\lambda = 0$; for $y < 0$, $-\frac{(-y+1)^{2-\lambda} - 1}{2-\lambda}$ if $\lambda \neq 2$ or $-\log(-y+1)$ if $\lambda = 2$. This extension maintains the power-law structure while ensuring differentiability and applicability to broader datasets, such as those in machine learning preprocessing.

Power transforms are integral to various statistical and computational workflows, including generalized linear models, time series analysis, and feature engineering in predictive modeling, where they help meet assumptions of parametric methods and enhance model performance. Their estimation often involves profile likelihood methods to select $\lambda$, balancing goodness-of-fit with interpretability, though care must be taken with back-transformation for prediction intervals due to potential bias. Overall, these techniques underscore the importance of data transformation in empirical statistics, enabling more robust inference across diverse applications.

Overview

Definition

In statistics, a power transform refers to a family of parametric functions designed to apply a monotonic transformation to data, typically positive-valued, through the use of power functions. The general form of a power transform is given by

$$y^{(\lambda)} = \begin{cases} \dfrac{y^{\lambda} - 1}{\lambda}, & \lambda \neq 0, \\[4pt] \log(y), & \lambda = 0, \end{cases}$$

where $y > 0$ and $\lambda$ is a transformation parameter. This formulation ensures the transformation is continuous and differentiable at $\lambda = 0$, providing a smooth family of functions. The primary goals of power transforms are to stabilize variance across levels of a predictor, render the data distribution more closely Gaussian, or linearize nonlinear relationships in regression models. These objectives address common violations of assumptions in parametric statistical methods, such as heteroscedasticity and non-normality. As monotonic functions, power transforms preserve the rank order of the original data points, maintaining relative comparisons while altering the scale. Power transforms relate to generalized linear models (GLMs) in that both offer routes to handling non-normal responses, either through a response transformation or through an appropriate link function; this connection underscores their role in extending classical linear models to diverse error distributions. Originating from early 20th-century statistical efforts to adjust for non-normality, such as cube-root transformations for gamma-distributed data, power transforms have become foundational in data preprocessing. The Box–Cox transformation serves as a prominent example within this family.

History

The roots of power transforms trace back to the mid-20th century, with early efforts focused on variance stabilization and normalization. John W. Tukey began developing ideas for power-based adjustments to data distributions, which he later formalized into the "ladder of powers" approach in his 1957 paper and his 1977 book Exploratory Data Analysis. This method provided a systematic way to select power transformations (such as square roots or logarithms) to linearize relationships and stabilize variances in exploratory analysis. Complementing this, Francis J. Anscombe introduced a variance-stabilizing transformation in 1948 specifically for Poisson, binomial, and negative binomial data, aiming to make the variance independent of the mean and facilitate approximate normality.

A pivotal advancement came in 1964 with George E. P. Box and David R. Cox's seminal paper, which introduced a parameterized family of power transformations to normalize residuals and ensure additivity in models. Motivated by the need to relax strict normality assumptions in least-squares estimation, their approach allowed maximum likelihood estimation of the transformation parameter, making it applicable to response variables in regression contexts. This work built on earlier ideas like Tukey's but provided a rigorous, inferential framework that became foundational for subsequent statistical modeling.

In the following decades, power transforms saw extensions tailored to econometric and regression applications, including transformations for predictor variables to optimize model fit. Notably, George E. P. Box and Paul W. Tidwell's 1962 method (gaining prominence in later decades) enabled power adjustments to independent variables, addressing non-linearity in covariates for improved regression performance. These developments were driven by the growing use of transforms in economic modeling to handle heteroscedasticity and non-normality in time-series and cross-sectional data.

The Yeo–Johnson transformation, proposed in 2000 by In-Kwon Yeo and Richard A. Johnson, addressed a key limitation of the Box–Cox family by extending it to handle negative and zero values, maintaining symmetry and normality improvements across the real line. This was motivated by practical needs in datasets with mixed signs. More recently, reviews such as Atkinson et al. (2021) have examined robust extensions of these methods, including modular and generalized forms, while highlighting persistent gaps in nonparametric alternatives that avoid parameter estimation assumptions. In 2025, a new power transform was proposed, building on Box–Cox and Tukey's ladder of powers.

Parametric Power Transformations

Box–Cox Transformation

The Box–Cox transformation, introduced by George E. P. Box and David R. Cox, represents a foundational parametric power transform designed to stabilize variance and normalize data distributions in statistical modeling, particularly for positive-valued responses in linear models. It parameterizes a family of power transformations indexed by a single parameter $\lambda$, allowing flexible adjustment to achieve approximate normality and homoscedasticity under the assumption of an underlying normal error structure after transformation. The transformation is defined for strictly positive data $y_i > 0$ as

$$y_i^{(\lambda)} = \begin{cases} \dfrac{y_i^{\lambda} - 1}{\lambda}, & \lambda \neq 0, \\[4pt] \log(y_i), & \lambda = 0, \end{cases}$$

where the case $\lambda = 0$ is the limiting form of the expression as $\lambda$ approaches zero. The parameter $\lambda$ is typically estimated by maximizing the profile log-likelihood function, which profiles out the mean parameters under the normality assumption for the transformed data, thereby selecting the value that best fits model diagnostics such as residual plots or normality tests.

Key properties of the Box–Cox transformation include its strict monotonicity for $y > 0$, ensuring that the order of observations is preserved and that invertibility is maintained for back-transformation. It also facilitates variance stabilization for specific $\lambda$ values; for instance, $\lambda = 1/2$ corresponds to a square-root transformation that approximates constant variance for data exhibiting Poisson-like variability proportional to the mean. Despite these strengths, the transformation has notable limitations: it strictly requires all data to be positive, often necessitating the addition of a small constant to datasets with zeros or negatives, which can distort results. Additionally, the estimate of $\lambda$ can be highly sensitive to outliers, as extreme values disproportionately influence the profile likelihood and may unduly drive the chosen transformation. Extensions such as the Yeo–Johnson transformation address the positivity restriction by incorporating a piecewise definition for negative values.

Yeo–Johnson Transformation

The Yeo–Johnson transformation, introduced by Yeo and Johnson in 2000, extends the Box–Cox transformation to handle data across the entire real line, including non-positive values, while preserving similar power-transformation properties for positive data. This parametric family aims to reduce skewness and approximate normality in distributions, making it suitable for statistical modeling where data may include zeros or negatives without requiring preliminary shifting or rescaling.

The transformation is defined piecewise to ensure applicability and smoothness across all real values of the response variable $y$. For $y \geq 0$, it mirrors the shifted Box–Cox form:

$$w = \begin{cases} \dfrac{(y + 1)^{\lambda} - 1}{\lambda}, & \lambda \neq 0, \\[4pt] \log(y + 1), & \lambda = 0. \end{cases}$$

For $y < 0$, the formulation adjusts for symmetry and continuity:

$$w = \begin{cases} -\dfrac{(1 - y)^{2 - \lambda} - 1}{2 - \lambda}, & \lambda \neq 2, \\[4pt] -\log(1 - y), & \lambda = 2. \end{cases}$$

The parameter $\lambda$ is estimated by maximizing a modified log-likelihood function that accounts for the piecewise structure and the Jacobian of the transformation. Compared to the Box–Cox transformation, which is restricted to strictly positive data, the Yeo–Johnson approach eliminates the need for data shifting to avoid negatives or zeros, thereby simplifying preprocessing in applications like regression analysis. It also maintains strict monotonicity, ensuring the transformed values preserve the original ordering, which is crucial for interpretive consistency in models. Key properties include continuity at $y = 0$ and differentiability across the domain, facilitating robust use in likelihood-based inference and numerical optimization. These attributes make it particularly valuable in settings requiring stable variance or normality assumptions without compromising data integrity for non-positive observations. The behavior of the transformation varies with $\lambda$, influencing the degree of skewness correction. The following table summarizes representative cases, highlighting how different $\lambda$ values affect positive and negative inputs:
| $\lambda$ | Behavior for $y \geq 0$ | Behavior for $y < 0$ | Overall effect |
|---|---|---|---|
| 1 | Identity: $w = y$ | Identity: $w = y$ | No transformation; preserves original scale. |
| 0 | Logarithmic: $w = \log(y + 1)$ | Reflected power: $w = -\frac{(1 - y)^2 - 1}{2}$ | Compresses large positives and reflects/expands negatives for symmetry. |
| 2 | Quadratic expansion: $w = \frac{(y + 1)^2 - 1}{2}$ | Reflected logarithmic: $w = -\log(1 - y)$ | Expands positives quadratically while applying a log to negatives. |
| -1 | Reciprocal-like: $w = 1 - (y + 1)^{-1} = \frac{y}{y+1}$ | Adjusted power: $w = -\frac{(1 - y)^{3} - 1}{3}$ | Inverts positives and applies a higher power to negatives for skewness reduction. |
These examples illustrate how λ\lambda tunes the transformation: values near 1 yield minimal change, while deviations (e.g., toward 0) introduce logarithmic effects to stabilize variance, with the negative branch ensuring balanced treatment across the real line.
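
These closed forms can be spot-checked numerically against SciPy's implementation of the Yeo–Johnson transformation:

```python
import numpy as np
from scipy import stats

y = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])

# Transformed values for the lambdas shown in the table
for lam in [1.0, 0.0, 2.0, -1.0]:
    print(lam, np.round(stats.yeojohnson(y, lmbda=lam), 3))

# Two of the closed forms, checked directly:
print(np.allclose(stats.yeojohnson(np.array([3.0]), lmbda=-1.0), 3.0 / 4.0))     # y/(y+1) for y >= 0
print(np.allclose(stats.yeojohnson(np.array([-2.0]), lmbda=2.0), -np.log(3.0)))  # -log(1-y) for y < 0
```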

Semiparametric and Other Transformations

Box-Tidwell Transformation

The Box-Tidwell transformation is a statistical method designed to identify and apply optimal power transformations to individual predictor variables in regression models, aiming to linearize their relationship with the response through the linear predictor. Introduced by George E. P. Box and Paul W. Tidwell in 1962, the approach transforms each predictor $x_j$ using a power form $x_j^{\lambda_j}$, where the transformation parameter $\lambda_j$ is estimated separately for every predictor to maximize the model's likelihood. This technique is applicable in both linear regression and generalized linear models (GLMs), including logistic regression, and focuses on improving model fit by addressing nonlinearity in the predictors without altering the response variable.

The estimation procedure employs maximum likelihood to determine the $\lambda_j$ values, using an iterative algorithm that refines initial guesses until convergence. Initial values for the $\lambda_j$ parameters are often obtained through a grid search over a range of powers (e.g., from -2 to 2) or by fitting fractional polynomial models for each predictor individually while holding the others fixed. Subsequent iterations apply numerical optimization methods, such as Newton-Raphson, to adjust the transformations and refit the model, continuing until the relative change in parameter estimates falls below a tolerance threshold (typically 0.001) or a maximum number of iterations (e.g., 25) is reached. For predictors containing zeros or negative values, which can cause problems with certain power transformations (especially when $\lambda_j \leq 0$), a small positive constant is commonly added to the variable prior to transformation, such as shifting by the minimum absolute value or a fraction thereof, to ensure computational stability. This iterative process allows for variable-specific powers, enabling flexible handling of heterogeneous nonlinearity across predictors.

In contrast to the Box-Cox transformation, which applies a single power parameter to the response variable for variance stabilization and normality, the Box-Tidwell method targets the independent variables (covariates) exclusively and permits a distinct $\lambda_j$ for each, making it suited for diagnosing and correcting deviations from linearity in predictor effects within the model. It is particularly valuable for exploratory data analysis and model building, where verifying the linearity assumption is crucial, such as in GLMs where predictors should relate linearly to the logit or other link functions. Despite its utility, the Box-Tidwell procedure has notable limitations, including high computational demands due to the repeated model refits in the iterative optimization, especially with large datasets or many predictors. Additionally, the likelihood function can exhibit multiple local maxima, which may lead to convergence at suboptimal solutions if initial values are poorly chosen, underscoring the need for robust starting points like grid searches to mitigate this risk.

Nonparametric Alternatives

Nonparametric alternatives to power transforms seek to identify data transformations that stabilize variance, promote additivity in regression models, or achieve approximate normality without restricting the form of the transformation to a parametric family such as powers or logs. These methods employ flexible, data-driven approaches to estimate transformations for both response and predictor variables, addressing limitations of parametric methods like sensitivity to positivity assumptions or inability to capture non-monotonic relationships. By leveraging iterative algorithms and nonparametric smoothers, they optimize criteria such as multiple correlation or deviance, making them suitable for complex datasets where the underlying relationships are unknown. A foundational method is the Alternating Conditional Expectations (ACE) algorithm, introduced by Breiman and Friedman in 1985, which estimates optimal transformations to maximize the proportion of variation explained by an additive model. ACE operates through iterative backfitting, where nonparametric smoothers—such as splines or kernels—are alternately applied to transform the response and predictors until convergence, minimizing the squared error in the transformed space. This process yields transformations that achieve the highest possible R² under additivity assumptions, without requiring the data to be positive or monotonic. Building on ACE, the Additivity and Variance Stabilization (AVAS) method, developed by Tibshirani in 1988, incorporates an additional step to stabilize the variance of the transformed response while preserving additivity. AVAS iteratively smooths the predictors and applies a variance-stabilizing transformation to the response, often using a power-like adjustment derived from the residuals' spread, to produce outputs more akin to linear regression assumptions. Unlike parametric power transforms such as the Box-Cox, AVAS can handle zero or negative values and complex heteroskedasticity. A recent robust extension of AVAS, proposed by Riani et al. in 2023, integrates robust regression techniques like M-estimation during the backfitting iterations to mitigate the influence of outliers, enhancing reliability in contaminated datasets. For applications in machine learning, the Ordered Quantile (ORQ) normalization, introduced by Peterson in 2021, offers a rank-based semiparametric transformation that maps data to a normal distribution via empirical quantiles, ensuring monotonicity and robustness to outliers. ORQ works by ranking the observations and interpolating to standard normal quantiles, then inverting for the transformed values, which consistently normalizes diverse distributions including multimodal or heavy-tailed ones. This approach avoids the computational burden of full iterative smoothing while providing invertible transformations suitable for preprocessing in predictive modeling. These nonparametric methods generally outperform parametric alternatives in flexibility, as they do not presuppose a specific functional form, allowing for nonlinear and non-monotonic adjustments that better capture intricate data structures. However, they are computationally more demanding due to the iterative smoothing steps, often requiring spline fitting or kernel estimation across multiple cycles. 
For instance, on positively skewed data where a Box-Cox transform might apply a log-like power, AVAS typically yields a smoother variance-stabilized output that also linearizes predictor-response links more effectively, leading to higher predictive accuracy in additive models at the cost of longer runtimes.
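
As a flavor of the rank-based idea behind ORQ normalization, the following is a minimal, illustrative sketch (the helper `rank_to_normal` is ours, not the reference implementation): it maps each observation to the standard-normal quantile of its offset empirical rank.

```python
import numpy as np
from scipy import stats

def rank_to_normal(x):
    """Rank-based (ORQ-style) normalizing transform, illustrative only.

    Maps observations to standard-normal quantiles of their offset empirical
    ranks; monotone and robust to outliers.
    """
    x = np.asarray(x, dtype=float)
    ranks = stats.rankdata(x)                 # average ranks for ties
    u = (ranks - 0.5) / len(x)                # push ranks into (0, 1)
    return stats.norm.ppf(u)

skewed = np.random.default_rng(0).exponential(scale=3.0, size=1000)
z = rank_to_normal(skewed)
print(round(float(np.mean(z)), 3), round(float(np.std(z)), 3))  # roughly 0 and 1
```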

Estimation and Implementation

Parameter Estimation Methods

Parameter estimation for power transforms typically relies on maximum likelihood estimation (MLE), which maximizes the profile log-likelihood function with respect to the transformation parameter $\lambda$, under the assumption that the transformed data follow a normal distribution. This approach is foundational for families like the Box-Cox transformation and is adapted similarly for extensions such as the Yeo-Johnson transformation. In regression contexts, the profile likelihood is often constructed using ordinary least squares (OLS) residuals from the transformed model to approximate the normality assumption. Common algorithms for MLE involve an initial grid search over a range of $\lambda$ values to identify promising candidates, followed by numerical optimization to refine the estimate, such as the Broyden-Fletcher-Goldfarb-Shanno (BFGS) method for faster convergence. More robust variants, developed post-2020, incorporate techniques such as the forward search for outlier detection during estimation, enhancing stability in noisy data settings. For transforms with multiple parameters, such as the Box-Tidwell transformation, estimation requires joint optimization of all parameters via iterative MLE procedures that alternate between updating the transformation powers and refitting the regression model. Software implementations facilitate this: in R, the MASS package's boxcox function computes profile likelihoods on a grid for single-parameter cases, while the car package's boxTidwell function handles multi-parameter joint estimation; in Python, scikit-learn's PowerTransformer class relies on SciPy's optimizers for MLE across supported families. To assess whether a transformation is necessary, a likelihood ratio test compares the fitted model against the null hypothesis $\lambda = 1$ (indicating no transformation), yielding a chi-squared statistic under the asymptotic normality of the MLE.
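
A hedged end-to-end sketch of this workflow in Python (coarse grid search, numerical refinement of the profile log-likelihood, a likelihood-ratio test of λ = 1, and the scikit-learn wrapper); the simulated gamma data are only for illustration:

```python
import numpy as np
from scipy import stats, optimize
from sklearn.preprocessing import PowerTransformer

rng = np.random.default_rng(3)
y = rng.gamma(shape=2.0, scale=1.5, size=300)

# 1. Coarse grid search over candidate lambdas ...
grid = np.linspace(-2, 2, 41)
lam0 = grid[int(np.argmax([stats.boxcox_llf(l, y) for l in grid]))]

# 2. ... then numerical refinement of the profile log-likelihood
res = optimize.minimize_scalar(lambda l: -stats.boxcox_llf(l, y),
                               bounds=(lam0 - 0.5, lam0 + 0.5), method="bounded")
lam_hat = res.x

# 3. Likelihood-ratio test of H0: lambda = 1 (no transformation needed)
lr = 2.0 * (stats.boxcox_llf(lam_hat, y) - stats.boxcox_llf(1.0, y))
p_value = stats.chi2.sf(lr, df=1)
print(f"lambda_hat = {lam_hat:.3f}, LR = {lr:.2f}, p = {p_value:.4f}")

# scikit-learn wraps the same MLE idea for pipelines (Box-Cox or Yeo-Johnson)
pt = PowerTransformer(method="box-cox", standardize=False).fit(y.reshape(-1, 1))
print(f"sklearn lambda: {pt.lambdas_[0]:.3f}")
```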

Confidence Intervals and Diagnostics

Confidence intervals for the transformation parameter $\hat{\lambda}$ in power transforms are typically constructed using the asymptotic normality of the maximum likelihood estimator, where the variance is estimated from the inverse of the observed Fisher information matrix evaluated at $\hat{\lambda}$. This approach relies on large-sample approximations derived from the likelihood framework introduced by Box and Cox. For small sample sizes, where asymptotic approximations may be unreliable, bootstrap methods provide more accurate confidence intervals by resampling the data and recomputing $\hat{\lambda}$ across iterations to estimate the sampling distribution. In the specific case of the Box-Cox transformation, confidence intervals for $\lambda$ are often obtained via the profile likelihood method, where the interval consists of the values of $\lambda$ for which the difference in log-likelihood from the maximum is less than half the chi-squared critical value with one degree of freedom. The delta method can also be applied to approximate intervals directly on $\lambda$ using the estimated standard error from the asymptotic variance, particularly when reparameterizing for interpretability. Graphical profile likelihood plots, which display the log-likelihood as a function of $\lambda$, aid in visualizing the range of plausible values and assessing the sensitivity of the transformation to the parameter estimate.

Model diagnostics following a power transformation focus on validating the assumptions of normality and homoscedasticity on the transformed scale. Quantile-quantile (Q-Q) plots of the transformed residuals against theoretical normal quantiles provide a visual assessment of normality, with deviations from the straight line indicating remaining skewness or kurtosis. Formal tests such as the Shapiro-Wilk test can quantify departures from normality on the transformed residuals, with low p-values suggesting the transformation has not fully stabilized the distribution. Recent extensions to power transforms incorporate robustness against outliers in constructing confidence intervals, particularly through the extended Yeo-Johnson transformation, which allows separate power parameters for positive and negative responses. These robust methods, such as those using penalized likelihood or iterative outlier detection, yield more reliable intervals in contaminated datasets by downweighting influential observations during parameter estimation. For instance, automatic robust procedures for the extended Yeo-Johnson transformation have been developed to approximate normality while providing stable inference even with outliers present.
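
A short diagnostic sketch along these lines, using SciPy's Shapiro-Wilk test and Q-Q (probability) plots on data before and after a Box-Cox fit:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(5)
y = rng.lognormal(mean=0.5, sigma=0.8, size=200)
yt, lam_hat = stats.boxcox(y)

# Shapiro-Wilk on the original vs. transformed values
print("original:    p =", round(stats.shapiro(y).pvalue, 4))
print("transformed: p =", round(stats.shapiro(yt).pvalue, 4))

# Q-Q plots: the transformed sample should lie much closer to the reference line
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
stats.probplot(y, dist="norm", plot=ax1)
ax1.set_title("Original data")
stats.probplot(yt, dist="norm", plot=ax2)
ax2.set_title(f"Box-Cox transformed (lambda = {lam_hat:.2f})")
plt.tight_layout()
plt.show()
```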

Applications

Stabilizing Variance and Achieving Normality

Power transforms play a crucial role in stabilizing variance within linear regression and ANOVA frameworks, where homoscedasticity is a foundational assumption for reliable inference. These transformations adjust the response variable such that its variance becomes approximately constant across different levels of the predictor variables or factor levels, mitigating issues like increasing spread in residuals as the mean grows. The Box-Cox family, parameterized by λ, achieves this by estimating the power that minimizes heteroscedasticity, often through maximum likelihood based on the assumption that transformed residuals are normally distributed with constant variance. For specific distributions exhibiting particular variance-mean relationships, predefined λ values provide effective stabilization. In the chi-squared distribution, for instance, the square root transformation (λ = 0.5) approximates constant variance, as it addresses the relationship where variance ≈ 2 × mean. This choice derives from asymptotic approximations that balance both variance stabilization and distributional symmetry.

Beyond variance stabilization, power transforms induce approximate normality in the response variable or residuals of linear models, enhancing the validity of parametric tests and confidence intervals. In econometric contexts, such as regression analyses of GDP data, applying the Box-Cox transformation to the response often yields residuals with reduced heteroscedasticity and improved normality. For example, in a study using quarterly economic indicators including GDP components, the transformation eliminated heteroscedasticity as confirmed by Breusch-Pagan and related tests (p-values shifting from significant to non-significant post-transformation), while Shapiro-Wilk tests indicated closer adherence to normality (p-value = 0.057 post-transformation). This resulted in more efficient estimators and better model fit for growth predictions across 120 observations.

To illustrate the practical application, consider a simple demonstration in Python using simulated skewed data resembling exponential distributions common in economic counts or sizes. The following code applies the Box-Cox transformation and visualizes the effect on histograms:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# Simulate skewed data (e.g., exponential for positive skew)
np.random.seed(42)
original_data = np.random.exponential(scale=2, size=1000)

# Apply Box-Cox transformation
transformed_data, fitted_lambda = stats.boxcox(original_data)

# Plot before and after
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))
ax1.hist(original_data, bins=30, density=True, alpha=0.7, color='skyblue')
ax1.set_title('Original Skewed Data')
ax1.set_xlabel('Value')
ax2.hist(transformed_data, bins=30, density=True, alpha=0.7, color='lightgreen')
ax2.set_title(f'Transformed Data (λ ≈ {fitted_lambda:.3f})')
ax2.set_xlabel('Transformed Value')
plt.tight_layout()
plt.show()
```


This example yields a fitted λ well below 1, toward the logarithmic end of the family that suits exponential-like skew, visibly shifting the distribution toward symmetry and stabilizing variance, as the transformed histogram shows reduced tail heaviness compared to the original. Despite these benefits, power transforms carry risks, such as over-transformation, where an aggressively chosen λ (e.g., near -1 or above 1) can distort the data's inherent structure, leading to misleading relationships or inflated Type I errors in hypothesis tests. In such cases, especially when the variance-mean relationship deviates from a simple power form, generalized linear models (GLMs) are preferable, as they incorporate variance functions directly (e.g., via a gamma or quasi-Poisson family) without altering the response scale. For datasets with negative values in the response, the Yeo-Johnson transformation extends the approach while preserving these stabilizing properties.
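
To make the GLM alternative concrete, here is a hedged sketch (simulated data, statsmodels; the link-class spelling may vary across statsmodels versions) of a Gamma GLM with a log link, which models the variance-mean relationship directly instead of transforming the response:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
x = rng.uniform(1.0, 10.0, size=400)
mu = np.exp(0.3 + 0.25 * x)                       # mean grows with x
y = rng.gamma(shape=2.0, scale=mu / 2.0)          # variance proportional to mu^2

# Instead of transforming y, model the variance structure directly:
# a Gamma GLM with a log link keeps y on its original scale.
# (In older statsmodels versions the link class may be spelled `links.log`.)
X = sm.add_constant(x)
glm = sm.GLM(y, X, family=sm.families.Gamma(link=sm.families.links.Log())).fit()
print(glm.summary().tables[1])
```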

Transforming Predictors in Regression Models

Power transforms are applied to predictors in regression models to achieve linearity between the transformed predictors and the response variable, enhancing model fit and interpretability in both linear and logistic regression frameworks. In linear regression, these transformations help stabilize the relationship for continuous predictors that exhibit nonlinear patterns, allowing the assumption of linearity to hold after adjustment. Similarly, in logistic regression, power transforms address nonlinearity in the logit, ensuring that the log-odds of the response is linearly related to the predictors. This approach contrasts with response transformations by focusing on predictor optimization to improve the overall functional form of the model.

The Box-Tidwell method provides a means to verify and enforce linearity for predictors in logistic regression through score tests on the transformation parameters. This involves augmenting the model with interaction terms of the form $x \log x$ for each continuous predictor $x$, where the significance of the coefficient on this term is tested using a score statistic to detect deviations from linearity in the logit. If the test indicates nonlinearity (p-value < 0.05), the predictor is transformed using the estimated power to linearize the relationship. This procedure, adapted from its original linear regression context, helps identify necessary adjustments without assuming a specific parametric form beyond power functions.

In linear regression, power transforms stabilize continuous predictors by estimating individual powers for each to linearize their effects on the response. For instance, in modeling income as a function of age and other demographics, the Box-Tidwell approach might reveal that age requires a power near 0.5 (square root) to correct for diminishing returns, while income-related predictors benefit from a logarithmic transform ($\lambda \approx 0$) to handle skewness and nonlinearity. After transformation, the model exhibits improved linearity, as evidenced by higher R-squared values and residual plots showing no systematic patterns. These per-predictor estimates are obtained via maximum likelihood, allowing tailored adjustments that enhance predictive accuracy without overparameterizing the model.

Despite these benefits, power transformations of predictors introduce limitations, including potential multicollinearity among the transformed variables, especially when original predictors are correlated, which can inflate variance inflation factors (VIF > 5) and destabilize estimates. Interpretation also becomes challenging, as transformed predictors no longer represent original scales, complicating economic or practical interpretation; for example, a coefficient on $x^\lambda$ requires re-expression in terms of elasticities or marginal effects. As alternatives, spline-based methods offer flexible nonlinearity modeling without rigid power assumptions, preserving interpretability for piecewise linear segments.

In econometrics, power transforms are employed to specify production functions that facilitate elasticity estimation, such as adapting the Cobb-Douglas form via Box-Cox to test for constant returns or substitution elasticities between inputs like capital and labor. By estimating $\lambda$ parameters, researchers linearize the relationship between transformed inputs and output, enabling direct interpretation of coefficients as elasticities; for instance, in U.S. manufacturing data, a Box-Cox model might yield an elasticity of output with respect to labor near 0.7, indicating diminishing marginal productivity after transformation. This approach ensures the functional form aligns with theoretical expectations, improving the reliability of policy-relevant estimates like returns to scale.

Uses in Machine Learning and Time Series

In machine learning pipelines, power transforms serve as essential preprocessing tools to normalize skewed feature distributions, thereby enhancing model robustness and performance. The scikit-learn library's PowerTransformer class implements the Yeo-Johnson and Box-Cox methods, which estimate optimal parameters to reduce skewness and stabilize variance, making data more Gaussian-like for algorithms sensitive to distributional assumptions. For example, integrating Yeo-Johnson into pipelines with random forests addresses skewness in features like income or transaction amounts, leading to improved predictive accuracy on tabular datasets.

In time series analysis, power transforms facilitate detrending and stationarity by applying Box-Cox to non-stationary series, particularly for stabilizing the residuals of forecasting models such as ARIMA, where variance instability can bias forecasts. This transformation helps achieve homoscedasticity in residuals, enabling more reliable parameter estimation and prediction intervals. Similarly, in GARCH models, power transforms applied to returns mitigate heteroscedasticity by reducing heavy tails, as demonstrated in volatility forecasting applications where transformed series yield lower forecast errors compared to untransformed data.

Recent advancements have extended power transforms to handle heavy-tailed distributions through Lambert W × F integrations, which provide bijective mappings to Gaussianize data while preserving interpretability. Ranking-based transforms, such as scikit-learn's QuantileTransformer, offer robust alternatives for high-dimensional settings by rank-ordering features toward uniform or normal distributions, reducing sensitivity to outliers without assuming specific power forms. For instance, applying a Yeo-Johnson transform to skewed features in neural networks can accelerate convergence during training on imbalanced datasets, where normalized inputs prevent instability and improve minority-class representation.
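
A minimal scikit-learn sketch of this preprocessing pattern, placing a Yeo-Johnson PowerTransformer inside a pipeline so the transformation parameters are learned only from the training folds during cross-validation (the feature names and data are simulated for illustration):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PowerTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(11)
n = 1000
X = np.column_stack([
    rng.lognormal(mean=3.0, sigma=1.0, size=n),   # skewed "income-like" feature
    rng.exponential(scale=50.0, size=n),          # skewed "transaction-amount" feature
])
y = (np.log(X[:, 0]) + 0.01 * X[:, 1] + rng.normal(size=n) > 3.5).astype(int)

# Yeo-Johnson inside the pipeline: lambdas are fitted on training folds only
model = make_pipeline(PowerTransformer(method="yeo-johnson"),
                      LogisticRegression(max_iter=1000))
print(cross_val_score(model, X, y, cv=5).mean())
```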
