 Model specification | Wikipedia audio article

October 7, 2019

In statistics, model specification is part
of the process of building a statistical model: specification consists of selecting an appropriate
functional form for the model and choosing which variables to include. For example, given
personal income $y$ together with years of schooling $s$ and on-the-job experience $x$,
we might specify a functional relationship $y = f(s, x)$ as follows:

$$\ln y = \ln y_0 + \rho s + \beta_1 x + \beta_2 x^2 + \varepsilon$$

where $\varepsilon$ is the unexplained error term that is supposed to comprise independent and identically distributed
Gaussian variables. The statistician Sir David Cox has said, “How
[the] translation from subject-matter problem to statistical model is done is often the
most critical part of an analysis”.

==Specification error and bias==
Specification error occurs when the functional form or the choice of independent variables
poorly represents relevant aspects of the true data-generating process. In particular, bias
(the expected value of the difference between an estimated parameter and the true underlying
value) occurs if an independent variable is correlated with the errors inherent in the
underlying process. There are several possible causes of specification error; some
are listed below.

* An inappropriate functional form could be employed.
* A variable omitted from the model may have a relationship with both the dependent variable and one or more of the independent variables (causing omitted-variable bias).
* An irrelevant variable may be included in the model (although this does not create bias, it involves overfitting and so can lead to poor predictive performance).
* The dependent variable may be part of a system of simultaneous equations (giving simultaneity bias).

Additionally, measurement errors may affect the independent variables: while this is not a specification error, it can create
statistical bias.

Note that all models will have some specification
error. Indeed, in statistics there is a common aphorism that “all models are wrong”. In the
words of Burnham & Anderson, “Modeling is an art as well as a science and is directed
toward finding a good approximating model … as the basis for statistical inference”.

===Detection of misspecification===
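As a concrete picture of what such detection is looking for, the omitted-variable case described earlier can be reproduced in a small simulation. The sketch below is illustrative only: the data-generating process, its coefficients, and the helper name `ols` are assumptions made for the example and do not come from the source.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients for y ≈ X @ beta (hypothetical helper)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

rng = np.random.default_rng(0)
n = 20_000

# Assumed data-generating process: x2 is correlated with x1,
# and both variables affect y.
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(size=n)

ones = np.ones(n)
full = ols(np.column_stack([ones, x1, x2]), y)  # correctly specified model
short = ols(np.column_stack([ones, x1]), y)     # x2 omitted from the model

print("slope on x1, full model: ", full[1])
print("slope on x1, x2 omitted: ", short[1])
```

Under these assumed coefficients, the population slope of the short regression is 2 + 3 × 0.5 = 3.5 rather than the true value 2: the true coefficient plus the omitted coefficient times the slope of the omitted variable on the included one. The estimate from the short model should therefore sit near 3.5 no matter how large the sample is.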
The Ramsey RESET test can help test for specification error in regression analysis.
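A minimal sketch of the idea behind the test follows, assuming the common variant that adds the squared and cubed fitted values as extra regressors and checks them with an F statistic. The function names `rss` and `reset_f_stat`, the simulated quadratic relationship, and all numeric settings are hypothetical choices for illustration, not a reference implementation.

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares and fitted values from a least-squares fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta
    return float(np.sum((y - fitted) ** 2)), fitted

def reset_f_stat(X, y, max_power=3):
    """F statistic for adding powers 2..max_power of the fitted values to X."""
    n, k = X.shape
    rss_restricted, fitted = rss(X, y)
    extra = np.column_stack([fitted ** p for p in range(2, max_power + 1)])
    rss_unrestricted, _ = rss(np.column_stack([X, extra]), y)
    q = extra.shape[1]  # number of added regressors
    return ((rss_restricted - rss_unrestricted) / q) / (
        rss_unrestricted / (n - k - q)
    )

rng = np.random.default_rng(1)
n = 500
x = rng.uniform(0, 4, size=n)
# Assumed truth: y depends on x quadratically.
y = 1.0 + 0.5 * x + 0.8 * x**2 + rng.normal(size=n)

linear = np.column_stack([np.ones(n), x])           # misspecified: no x**2
quadratic = np.column_stack([np.ones(n), x, x**2])  # matches the assumed truth

f_linear = reset_f_stat(linear, y)
f_quadratic = reset_f_stat(quadratic, y)
print("RESET F, linear model:   ", f_linear)
print("RESET F, quadratic model:", f_quadratic)
```

If the fitted model is correct and the errors are Gaussian, the statistic is compared with an F distribution with q and n − k − q degrees of freedom. In this simulation the linear model should produce a far larger statistic than the quadratic one, which signals the missing curvature.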
In the example given above relating personal income to schooling and job experience, if
the assumptions of the model are correct, then the least squares estimates of the parameters $\rho$ and $\beta$
will be efficient and unbiased. Hence specification diagnostics usually involve testing the first
to fourth moments of the residuals.

==Model building==
Building a model involves finding a set of relationships to represent the process that
is generating the data. This requires avoiding all the sources of misspecification mentioned
above. One approach is to start with a model in general
form that relies on a theoretical understanding of the data-generating process. Then the model
can be fit to the data and checked for the various sources of misspecification, in a
task called statistical model validation. Theoretical understanding can then guide the
modification of the model in such a way as to retain theoretical validity while removing
the sources of misspecification. But if it proves impossible to find a theoretically
acceptable specification that fits the data, the theoretical model may have to be rejected
and replaced with another one. A quotation from Karl Popper is apposite here:
“Whenever a theory appears to you as the only possible one, take this as a sign that you
have neither understood the theory nor the problem which it was intended to solve”.

Another
approach to model building is to specify several different models as candidates, and then compare
those candidate models to each other. The purpose of the comparison is to determine
which candidate model is most appropriate for statistical inference. Common criteria
for comparing models include the following: $R^2$, Bayes factor, and the likelihood-ratio
test together with its generalization, relative likelihood. For more on this topic, see statistical
model selection.
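Two of these criteria can be sketched for a pair of nested candidates, here a linear and a quadratic specification in experience. The example assumes Gaussian errors, so the maximized log-likelihood has a closed form in the residual sum of squares. The simulated data, the coefficients, and the helper name `fit_summary` are hypothetical.

```python
import numpy as np

def fit_summary(X, y):
    """Return (R^2, maximized Gaussian log-likelihood) for a least-squares fit."""
    n = len(y)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    tss = float(np.sum((y - y.mean()) ** 2))
    r2 = 1.0 - rss / tss
    # Log-likelihood evaluated at the maximum-likelihood variance rss / n.
    loglik = -0.5 * n * (np.log(2 * np.pi) + np.log(rss / n) + 1.0)
    return r2, loglik

rng = np.random.default_rng(2)
n = 1000
x = rng.uniform(0, 40, size=n)  # assumed years of experience
log_y = 2.0 + 0.05 * x - 0.001 * x**2 + rng.normal(scale=0.3, size=n)

ones = np.ones(n)
r2_lin, ll_lin = fit_summary(np.column_stack([ones, x]), log_y)
r2_quad, ll_quad = fit_summary(np.column_stack([ones, x, x**2]), log_y)

# Likelihood-ratio statistic for the one extra parameter (the x**2 term).
lr_stat = 2.0 * (ll_quad - ll_lin)
print("R^2 linear:   ", r2_lin)
print("R^2 quadratic:", r2_quad)
print("LR statistic: ", lr_stat)
```

Because $R^2$ cannot decrease when a regressor is added, it always favours the larger of two nested models. The likelihood-ratio statistic is instead compared with a chi-squared distribution, here with one degree of freedom and a 5% critical value of about 3.84, to judge whether the improvement exceeds what chance would produce.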