
Snip!t from collection of Alan Dix



Cross-validation - Wikipedia, the free encyclopedia
http://en.wikipedia.org/wiki/Cross-validation

Categories

/Channels/math



Cross-validation is the statistical practice of partitioning a sample of data into subsets such that the analysis is initially performed on a single subset, while the other subset(s) are retained for subsequent use in confirming and validating the initial analysis.

The initial subset of data is called the training set; the other subset(s) are called validation or testing sets.
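The partitioning described above can be illustrated with a minimal sketch. It assumes a simple two-way split and uses the sample mean as a stand-in for "the analysis"; the helper names (`split_sample`, `mean_squared_error`), the 50/50 split, and the data values are illustrative assumptions, not part of the source.

```python
import random


def split_sample(sample, train_fraction=0.5, seed=0):
    """Partition a sample into a training set and a validation set.

    Hypothetical helper: shuffles a copy of the sample, then cuts it in two.
    """
    rng = random.Random(seed)
    shuffled = list(sample)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def mean(values):
    return sum(values) / len(values)


def mean_squared_error(prediction, values):
    """Average squared distance between one prediction and the held-out values."""
    return sum((v - prediction) ** 2 for v in values) / len(values)


# Made-up measurements, used only to make the sketch runnable.
sample = [2.1, 1.9, 2.4, 2.0, 2.2, 1.8, 2.3, 2.0, 2.1, 1.7]

training_set, validation_set = split_sample(sample, train_fraction=0.5, seed=42)

# The initial analysis is performed on the training set only ...
estimate = mean(training_set)

# ... and the retained subset is used afterwards to check it.
validation_error = mean_squared_error(estimate, validation_set)
print("estimate from training set:", estimate)
print("error on validation set:", validation_error)
```

The point of the design is that the validation set plays no part in producing `estimate`, so the error measured on it is not flattered by having been fitted to the same data.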

The theory of cross-validation was inaugurated by Seymour Geisser. It is important in guarding against testing hypotheses suggested by the data ("Type III error"), especially where further samples are hazardous, costly or impossible to collect (a situation known as uncomfortable science).
