Theory & Formula
The Chi-Square (χ²) Goodness-of-Fit Test is a non-parametric statistical test that determines whether observed frequencies differ significantly from expected theoretical frequencies. In engineering hydrology, it validates whether flood peak, rainfall, or streamflow data follows a specific probability distribution before using it for design return-period estimation.
The Core Formula
\[\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}\]
where \(O_i\) = observed frequency in bin \(i\), \(E_i\) = expected (theoretical) frequency in bin \(i\), and \(k\) = number of class intervals (bins).
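The formula translates directly into code. This is a minimal sketch, assuming observed and expected frequencies are supplied as equal-length sequences in the same bin order. `chi_square_statistic` is a hypothetical name, not part of any library, and the sample counts are made up for illustration.

```python
from typing import Sequence


def chi_square_statistic(observed: Sequence[float], expected: Sequence[float]) -> float:
    """Return the goodness-of-fit statistic: sum over bins of (O_i - E_i)^2 / E_i."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of bins")
    if any(e <= 0 for e in expected):
        raise ValueError("every expected frequency must be positive")
    # Each bin contributes its squared deviation scaled by its expected count.
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Illustrative counts only (not from a real record): three bins, 30 observations.
print(chi_square_statistic([8, 12, 10], [10, 10, 10]))
```

Dividing by \(E_i\) means a deviation of a given size counts for more in a bin where few observations are expected.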
Degrees of Freedom
\[df = k - 1 - m\]
where \(m\) = number of distribution parameters estimated from the observed data. For example, the Normal distribution requires estimating the mean and standard deviation, so \(m = 2\); the Gumbel (EV-I) distribution also has two parameters (location and scale), so \(m = 2\).
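The bookkeeping can be made explicit with a small helper. This is a minimal sketch: `degrees_of_freedom` is a hypothetical name, and the guard against df < 1 is an added assumption, reflecting that a test with no remaining degrees of freedom cannot be evaluated.

```python
def degrees_of_freedom(k: int, m: int) -> int:
    """df = k - 1 - m for k class intervals and m parameters estimated from the data."""
    df = k - 1 - m
    if df < 1:
        # Each fitted parameter uses up one degree of freedom; at least m + 2 bins are needed.
        raise ValueError(f"need at least m + 2 = {m + 2} bins, got k = {k}")
    return df


print(degrees_of_freedom(k=6, m=2))  # Normal or Gumbel fit with 6 bins
print(degrees_of_freedom(k=5, m=3))  # Log-Pearson III fit with 5 bins
```

The guard makes the practical consequence visible: a three-parameter distribution needs at least five bins before the test can be run at all.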
Decision Rule
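- χ²calc ≤ χ²critical: Fail to reject H₀ (H₀: the data follow the assumed distribution). The data are consistent with the distribution at significance level α, which supports, but does not prove, its use for design.
- χ²calc > χ²critical: Reject H₀. The data do not adequately fit the assumed distribution; try an alternative.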
Key Assumptions & Requirements
- Each class interval must have an expected frequency ≥ 5. Merge adjacent bins if violated.
- Observations must be independent of each other.
- Minimum recommended sample size: n ≥ 50.
- The test is sensitive to the choice of bin boundaries — results can vary with different binning strategies.
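The bin-merging requirement in the first bullet can be sketched as a greedy left-to-right pass. This is one simple strategy, assumed for illustration (merging inward from the tails is another); the calculator's own merge algorithm is not described in the text, and `merge_small_bins` plus the sample table are hypothetical.

```python
from typing import List, Sequence, Tuple


def merge_small_bins(
    observed: Sequence[float],
    expected: Sequence[float],
    min_expected: float = 5.0,
) -> Tuple[List[float], List[float]]:
    """Merge adjacent bins, left to right, until every expected frequency >= min_expected."""
    merged_o: List[float] = []
    merged_e: List[float] = []
    acc_o, acc_e = 0.0, 0.0
    for o, e in zip(observed, expected):
        acc_o += o
        acc_e += e
        # Close the current group as soon as its expected count is large enough.
        if acc_e >= min_expected:
            merged_o.append(acc_o)
            merged_e.append(acc_e)
            acc_o, acc_e = 0.0, 0.0
    if acc_o or acc_e:
        if merged_e:
            # A short tail group is folded into the last closed group.
            merged_o[-1] += acc_o
            merged_e[-1] += acc_e
        else:
            # Even the grand total falls short; return one bin and let the caller decide.
            merged_o.append(acc_o)
            merged_e.append(acc_e)
    return merged_o, merged_e


# Hypothetical 7-bin table in which the first and last two bins have E < 5.
obs = [3, 9, 15, 11, 8, 3, 1]
exp_freq = [2.5, 8.0, 14.0, 13.0, 8.5, 3.0, 1.0]
print(merge_small_bins(obs, exp_freq))
```

Merging changes the number of bins (here from 7 to 4), so the degrees of freedom must be recomputed from the merged table, not the original one.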
Common Distributions in Hydrology
| Distribution | Parameters (m) | Typical Use | Region/Standard |
|---|---|---|---|
| Gumbel (EV-I) | 2 (location μ, scale σ) | Annual flood peaks, max daily rainfall | Global, BS 5400 |
| Normal | 2 (mean, SD) | Temperature, moderate rainfall | General |
| Log-Normal | 2 (log-mean, log-SD) | Skewed flood data | General |
| Log-Pearson III | 3 (mean, SD, skew) | Flood frequency analysis | US Bulletin 17C, BIS |
| Pearson III | 3 | Flood, rainfall | China, Russia |
| Exponential | 1 (rate λ) | Inter-arrival times, drought | General |
Critical Value Table (χ², upper-tail)
| df | α = 0.10 | α = 0.05 | α = 0.01 |
|---|---|---|---|
| 1 | 2.706 | 3.841 | 6.635 |
| 2 | 4.605 | 5.991 | 9.210 |
| 3 | 6.251 | 7.815 | 11.345 |
| 4 | 7.779 | 9.488 | 13.277 |
| 5 | 9.236 | 11.070 | 15.086 |
| 6 | 10.645 | 12.592 | 16.812 |
| 7 | 12.017 | 14.067 | 18.475 |
| 8 | 13.362 | 15.507 | 20.090 |
| 9 | 14.684 | 16.919 | 21.666 |
| 10 | 15.987 | 18.307 | 23.209 |
| 15 | 22.307 | 24.996 | 30.578 |
| 20 | 28.412 | 31.410 | 37.566 |
| 25 | 34.382 | 37.652 | 44.314 |
| 30 | 40.256 | 43.773 | 50.892 |
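The decision rule can be applied mechanically against the tabulated values. This sketch transcribes the table above into a dictionary; `CRITICAL_VALUES` and `chi_square_decision` are hypothetical names, and only the tabulated df and α combinations are supported.

```python
# Upper-tail chi-square critical values transcribed from the table above, keyed by df.
CRITICAL_VALUES = {
    1: {0.10: 2.706, 0.05: 3.841, 0.01: 6.635},
    2: {0.10: 4.605, 0.05: 5.991, 0.01: 9.210},
    3: {0.10: 6.251, 0.05: 7.815, 0.01: 11.345},
    4: {0.10: 7.779, 0.05: 9.488, 0.01: 13.277},
    5: {0.10: 9.236, 0.05: 11.070, 0.01: 15.086},
    6: {0.10: 10.645, 0.05: 12.592, 0.01: 16.812},
    7: {0.10: 12.017, 0.05: 14.067, 0.01: 18.475},
    8: {0.10: 13.362, 0.05: 15.507, 0.01: 20.090},
    9: {0.10: 14.684, 0.05: 16.919, 0.01: 21.666},
    10: {0.10: 15.987, 0.05: 18.307, 0.01: 23.209},
    15: {0.10: 22.307, 0.05: 24.996, 0.01: 30.578},
    20: {0.10: 28.412, 0.05: 31.410, 0.01: 37.566},
    25: {0.10: 34.382, 0.05: 37.652, 0.01: 44.314},
    30: {0.10: 40.256, 0.05: 43.773, 0.01: 50.892},
}


def chi_square_decision(chi2_calc: float, df: int, alpha: float = 0.05) -> str:
    """Apply the upper-tail decision rule using the tabulated critical value."""
    try:
        critical = CRITICAL_VALUES[df][alpha]
    except KeyError:
        raise ValueError(f"no tabulated critical value for df={df}, alpha={alpha}") from None
    # Only a large statistic counts as evidence against H0.
    if chi2_calc > critical:
        return f"reject H0 ({chi2_calc:.3f} > {critical:.3f})"
    return f"fail to reject H0 ({chi2_calc:.3f} <= {critical:.3f})"


print(chi_square_decision(0.989, df=1))
```

For degrees of freedom not in the table (for example 11 to 14), a fuller table or a library routine is needed; with SciPy, `scipy.stats.chi2.ppf(1 - alpha, df)` returns the upper-tail critical value.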
Worked Example
Problem: 50 years of annual flood peak data grouped into 4 class intervals. Test whether data follows the Gumbel (EV-I) distribution at α = 0.05. Parameters (location & scale) are estimated from the data, so m = 2.
| Interval (m³/s) | Oi | Ei | (O−E)²/E |
|---|---|---|---|
| 0 – 200 | 12 | 15 | (12−15)²/15 = 9/15 = 0.600 |
| 200 – 400 | 20 | 18 | (20−18)²/18 = 4/18 = 0.222 |
| 400 – 600 | 11 | 11 | (11−11)²/11 = 0/11 = 0.000 |
| > 600 | 7 | 6 | (7−6)²/6 = 1/6 = 0.167 |
| Total | 50 | 50 | χ² = 0.989 |
Degrees of Freedom: df = k − 1 − m = 4 − 1 − 2 = 1
Critical Value at α = 0.05, df = 1: χ²critical = 3.841
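Decision: 0.989 < 3.841, so fail to reject H₀. The Gumbel (EV-I) distribution is an acceptable fit to these data at α = 0.05.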
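The arithmetic of the worked example can be reproduced in a few lines. This plain-Python sketch hard-codes the example's counts and the tabulated critical value for df = 1, α = 0.05; nothing beyond the numbers in the example is assumed.

```python
# Worked example: 50 annual flood peaks in 4 bins, Gumbel (EV-I) fit with m = 2.
bins = ["0-200", "200-400", "400-600", ">600"]
observed = [12, 20, 11, 7]
expected = [15, 18, 11, 6]
m = 2             # location and scale estimated from the data
critical = 3.841  # upper-tail value for df = 1, alpha = 0.05, from the table

# Per-bin contributions (O - E)^2 / E, matching the last column of the example table.
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
for label, c in zip(bins, contributions):
    print(f"{label:>8}: {c:.3f}")

chi2 = sum(contributions)
df = len(observed) - 1 - m
print(f"chi2 = {chi2:.3f}, df = {df}")
print("fail to reject H0" if chi2 <= critical else "reject H0")
```

The script prints the four per-bin contributions (0.600, 0.222, 0.000, 0.167), then χ² = 0.989 with df = 1, and the decision not to reject H₀.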
Frequently Asked Questions
1. What is the Chi-Square goodness-of-fit test in hydrology?
It is a formal statistical test that determines whether observed hydrological data — such as flood peaks, annual rainfall, or streamflow — follows a specified theoretical probability distribution. It quantifies the discrepancy between what was observed and what a theoretical model predicts, giving a statistically defensible basis for distribution selection in frequency analysis.
2. Why is distribution fitting important in flood frequency analysis?
The chosen probability distribution directly determines design flood estimates for critical structures like dams, bridges, and spillways. An incorrect distribution can lead to either dangerous underestimation of design flows or costly over-engineering. The Chi-Square test provides a formal, reproducible criterion for accepting or rejecting a candidate distribution.
3. What are degrees of freedom and how are they calculated?
Degrees of freedom (df) = k − 1 − m, where k is the number of class intervals and m is the number of distribution parameters estimated from the data. Subtracting parameters accounts for the fact that fitting parameters uses some of the data's information, reducing independent variation. For example, a Normal distribution fit has m = 2 (mean and SD), so with 6 bins: df = 6 − 1 − 2 = 3.
4. What happens when an expected frequency bin is less than 5?
Bins with E < 5 violate a core assumption of the chi-square approximation, causing the test statistic to be unreliable — typically inflating χ², leading to false rejections. Adjacent bins must be merged until all expected frequencies are ≥ 5. This calculator warns you when this condition is violated and includes an auto-merge button.
5. How does the Gumbel distribution relate to extreme flood events?
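The Gumbel (Extreme Value Type I) distribution is the limiting distribution of the maximum of a large number of independent, identically distributed variables whose parent distribution has an exponential-type upper tail (for example, Normal, exponential, or gamma). Annual flood peaks are the maxima of a year's daily or instantaneous flows, so under these assumptions they tend toward a Gumbel distribution, although real flows within a year are neither fully independent nor identically distributed. Its upper tail is heavier than that of the Normal distribution, reflecting the higher probability of extreme events.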
6. What is the Log-Pearson Type III and when is it used?
LP-III applies the Pearson Type III (three-parameter gamma) distribution to the logarithms of the data. It can accommodate the skewness common in flood data. In the United States, the federal flood frequency guidelines (Bulletin 17C) recommend it for flood frequency analysis. It has three parameters: the mean, standard deviation, and skewness coefficient of log(Q), so m = 3.
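The three LP-III parameters are moments of the log-transformed flows. This is a minimal sketch, assuming base-10 logarithms, the sample standard deviation with an n − 1 divisor, and the commonly used small-sample skew adjustment; `log_pearson3_moments` is a hypothetical name. The full Bulletin 17C procedure adds steps, such as weighting the station skew with a regional skew, that this sketch omits.

```python
import math
from typing import Sequence, Tuple


def log_pearson3_moments(flows: Sequence[float]) -> Tuple[float, float, float]:
    """Mean, standard deviation and skew coefficient of log10(Q): the three LP-III parameters."""
    if len(flows) < 3:
        raise ValueError("need at least 3 values to estimate a skew coefficient")
    if any(q <= 0 for q in flows):
        raise ValueError("log-transform requires strictly positive flows")
    x = [math.log10(q) for q in flows]
    n = len(x)
    mean = sum(x) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in x) / (n - 1))  # sample SD (n - 1 divisor)
    if sd == 0:
        raise ValueError("all log-flows are identical; skew is undefined")
    # Sample skew with the usual small-sample adjustment n / ((n - 1)(n - 2)).
    skew = n * sum((v - mean) ** 3 for v in x) / ((n - 1) * (n - 2) * sd ** 3)
    return mean, sd, skew


# Hypothetical annual peaks (m^3/s), for illustration only.
print(log_pearson3_moments([120.0, 340.0, 95.0, 610.0, 210.0, 180.0, 1450.0]))
```

Because all three parameters are estimated from the same record, an LP-III fit uses m = 3 when computing the degrees of freedom.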
7. When should I use Kolmogorov-Smirnov instead of Chi-Square?
The K-S test works on the continuous CDF without requiring data grouping. It is preferred for: small samples (n < 50), continuous distributions where natural bin boundaries are unclear, or when you want to test at every point in the distribution rather than just within bins. Chi-Square is better when data is naturally grouped or when you have n ≥ 50 with clear class intervals.
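The K-S statistic compares the empirical CDF with the fitted CDF at every observation, with no binning step. This is a minimal sketch; `ks_statistic` and the Gumbel parameter values are hypothetical. Standard K-S critical values assume the distribution is fully specified in advance. When parameters are estimated from the same data, those values are conservative, and adjusted tables (such as Lilliefors' for the Normal) are needed.

```python
import math
from typing import Callable, Sequence


def ks_statistic(data: Sequence[float], cdf: Callable[[float], float]) -> float:
    """Kolmogorov-Smirnov D = sup |F_n(x) - F(x)|, evaluated at the sorted observations."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # F_n steps from (i - 1)/n to i/n at x, so check the gap on both sides of the step.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


# Hypothetical Gumbel parameters, used only to show the call pattern.
def gumbel_cdf(x: float, mu: float = 300.0, sigma: float = 150.0) -> float:
    return math.exp(-math.exp(-(x - mu) / sigma))


print(ks_statistic([180.0, 260.0, 340.0, 455.0, 610.0], gumbel_cdf))
```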
8. What does the p-value mean in this test?
The p-value is the probability of obtaining a χ² statistic at least as large as the calculated value if the null hypothesis (the data follow the assumed distribution) is true. A small p-value (< α) indicates that the observed data would be unlikely under the assumed distribution, which is evidence against it; p > α means the data are consistent with the assumed distribution. This calculator provides an approximation using a gamma-function-based method.
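The upper-tail probability of the chi-square distribution equals the regularized upper incomplete gamma function Q(df/2, χ²/2). The calculator's own routine is not shown in the source; the sketch below is one standard way to evaluate Q, switching between a power series and a continued fraction in the style popularized by Numerical Recipes. All function names are hypothetical. If SciPy is available, `scipy.stats.chi2.sf(chi2, df)` computes the same quantity.

```python
import math


def _lower_series(a: float, x: float) -> float:
    """Regularized lower incomplete gamma P(a, x) by its power series (used for x < a + 1)."""
    term = total = 1.0 / a
    ap = a
    for _ in range(1000):
        ap += 1.0
        term *= x / ap
        total += term
        if abs(term) < abs(total) * 1e-15:
            break
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a))


def _upper_continued_fraction(a: float, x: float) -> float:
    """Regularized upper incomplete gamma Q(a, x) by a continued fraction (used for x >= a + 1)."""
    tiny = 1e-300
    b = x + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 1000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-14:
            break
    return h * math.exp(-x + a * math.log(x) - math.lgamma(a))


def chi_square_p_value(chi2: float, df: int) -> float:
    """Upper-tail p-value P(X >= chi2) for X ~ chi-square(df), i.e. Q(df/2, chi2/2)."""
    if chi2 <= 0.0:
        return 1.0
    a, x = df / 2.0, chi2 / 2.0
    if x < a + 1.0:
        return 1.0 - _lower_series(a, x)
    return _upper_continued_fraction(a, x)


print(chi_square_p_value(0.989, df=1))
```

For the worked example (χ² = 0.989, df = 1) the p-value is about 0.32, well above α = 0.05, which agrees with the critical-value decision.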
9. How sensitive is the test to the number and width of bins?
Quite sensitive. Too few bins reduce the test's power; too many create bins with low expected frequencies. A common rule of thumb is Sturges' rule, k = 1 + 3.3 × log₁₀(n) bins. Equal-probability bins (each bin has the same expected frequency) are generally preferred over equal-width bins because they spread the information evenly across the distribution. Always try more than one binning strategy to check that the conclusion is robust.
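Both binning ideas can be sketched together: Sturges' rule for the number of bins, and equal-probability edges obtained by inverting the fitted CDF. The Gumbel distribution is used only as an example; the parameter values and function names are hypothetical, and rounding Sturges' result to the nearest integer is an assumed convention (some texts round up).

```python
import math
from typing import List


def sturges_bins(n: int) -> int:
    """Sturges' rule k = 1 + 3.3 log10(n), rounded to the nearest whole number of bins."""
    return round(1 + 3.3 * math.log10(n))


def gumbel_equal_probability_edges(mu: float, sigma: float, k: int) -> List[float]:
    """Interior edges x_j with F(x_j) = j/k, from the inverse Gumbel CDF x = mu - sigma*ln(-ln p)."""
    return [mu - sigma * math.log(-math.log(j / k)) for j in range(1, k)]


n = 50
k = sturges_bins(n)
# Hypothetical Gumbel parameters (m^3/s); each of the k bins then has E = n / k.
edges = gumbel_equal_probability_edges(mu=300.0, sigma=150.0, k=k)
print(k, [round(x, 1) for x in edges], n / k)
```

With n = 50, Sturges' rule gives k = 7, so each equal-probability bin expects 50/7 ≈ 7.1 observations and the E ≥ 5 requirement is met automatically.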
10. What is the difference between one-tailed and two-tailed chi-square tests?
For goodness-of-fit, only the upper tail matters: a very large χ² indicates a poor fit. A very small χ² (a fit better than chance would predict) can sometimes indicate manipulated or over-smoothed data, but standard practice uses only the upper-tail critical value. The chi-square test of independence for contingency tables is also an upper-tail test; two-tailed chi-square critical values arise in other applications, such as testing a single population variance.
11. Can the chi-square test be used for non-hydrological data?
Absolutely. The test is generic and applicable to any frequency data: structural load distributions, wind speed analysis, earthquake magnitude frequency, traffic volume, or any engineering variable where you wish to validate a theoretical model against observations. The principles remain the same regardless of the physical domain.
12. What sample size is needed for reliable results?
At minimum n = 50 observations are recommended, with each bin having E ≥ 5. Below n = 30, results are unreliable. For very small datasets (n < 30), consider the Anderson-Darling test, which is more powerful for small samples and works on continuous distributions without binning. Larger samples (n > 100) give the chi-square test high statistical power.
13. How do I determine the expected frequencies for a distribution?
Expected frequency for bin i: Eᵢ = n × P(aᵢ ≤ X ≤ bᵢ) = n × [F(bᵢ) − F(aᵢ)], where F is the distribution's CDF. For Gumbel: F(x) = exp(−exp(−y)), where y = (x − μ)/σ is the reduced variate. For Normal: P = Φ((bᵢ − μ)/σ) − Φ((aᵢ − μ)/σ), where Φ is the standard normal CDF. Parameters μ and σ are estimated from the data using the method of moments or maximum likelihood.
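For the Gumbel case, the steps above can be sketched as follows, assuming method-of-moments estimates (σ = √6·s/π, μ = x̄ − 0.5772σ). The function names and parameter values are hypothetical. The outermost bins are treated as open-ended so that the expected frequencies sum to n; this means the lowest bin is (−∞, a₁) rather than starting at 0.

```python
import math
from statistics import mean, stdev
from typing import List, Sequence, Tuple

EULER_GAMMA = 0.5772156649  # Euler-Mascheroni constant


def gumbel_mom(data: Sequence[float]) -> Tuple[float, float]:
    """Method-of-moments Gumbel parameters: sigma = sqrt(6) s / pi, mu = mean - 0.5772 sigma."""
    sigma = math.sqrt(6.0) * stdev(data) / math.pi
    mu = mean(data) - EULER_GAMMA * sigma
    return mu, sigma


def gumbel_cdf(x: float, mu: float, sigma: float) -> float:
    """F(x) = exp(-exp(-y)) with reduced variate y = (x - mu) / sigma."""
    return math.exp(-math.exp(-(x - mu) / sigma))


def expected_frequencies(edges: Sequence[float], n: int, mu: float, sigma: float) -> List[float]:
    """E_i = n [F(b_i) - F(a_i)]; the first and last bins are open-ended so that sum(E) = n."""
    cdf = [0.0] + [gumbel_cdf(x, mu, sigma) for x in edges] + [1.0]
    return [n * (hi - lo) for lo, hi in zip(cdf, cdf[1:])]


# Hypothetical parameters; interior edges taken from the worked example's bins.
print(expected_frequencies([200.0, 400.0, 600.0], n=50, mu=300.0, sigma=150.0))
```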
14. What is the relationship between chi-square test and probability paper plots?
Probability paper plots are graphical methods — you plot data on specially scaled axes so that a fitted distribution appears as a straight line. The chi-square test is the formal numerical counterpart, providing a quantitative decision criterion. Best practice is to use both: plot to visually assess fit and identify outliers, then use chi-square to formally confirm or reject the distribution choice.
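The coordinates of a Gumbel probability plot can be computed directly: sort the peaks, assign each a plotting position, and convert it to the reduced variate y = −ln(−ln p). This sketch assumes the Weibull plotting position p = i/(n + 1), one common choice among several, and summarizes straightness with a correlation coefficient. The function names and sample peaks are hypothetical, and the correlation is a visual-check aid, not a substitute for the chi-square test.

```python
import math
from typing import List, Sequence, Tuple


def gumbel_plot_points(peaks: Sequence[float]) -> List[Tuple[float, float]]:
    """(reduced variate y, flow x) pairs for a Gumbel probability plot."""
    xs = sorted(peaks)
    n = len(xs)
    # Weibull plotting position p_i = i / (n + 1) for the i-th smallest value (assumed convention).
    return [(-math.log(-math.log(i / (n + 1))), x) for i, x in enumerate(xs, start=1)]


def plot_correlation(points: Sequence[Tuple[float, float]]) -> float:
    """Pearson correlation of the plotted points; values near 1 mean a nearly straight line."""
    ys = [p[0] for p in points]
    xs = [p[1] for p in points]
    my, mx = sum(ys) / len(ys), sum(xs) / len(xs)
    sxy = sum((y - my) * (x - mx) for y, x in zip(ys, xs))
    syy = sum((y - my) ** 2 for y in ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / math.sqrt(syy * sxx)


# Hypothetical annual peaks (m^3/s), for illustration only.
peaks = [310.0, 190.0, 455.0, 260.0, 720.0, 340.0, 230.0, 505.0, 280.0]
print(round(plot_correlation(gumbel_plot_points(peaks)), 3))
```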
15. How do I handle tied observations when grouping into bins?
Ties (identical values) cause ambiguity only when they fall exactly on a bin boundary. In that case, assign them consistently to one bin, for example always to the upper bin by using half-open intervals aᵢ ≤ x < bᵢ, and document your convention. For continuous distributions, the probability of exactly tied values is theoretically zero, so ties usually reflect rounding or limited measurement resolution.
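The half-open convention is easy to apply consistently in code. This sketch uses Python's standard `bisect.bisect_right`, which places a value equal to an edge in the bin to its right. The interior edges mirror the worked example's bins, the lowest bin is treated as open below, and `count_in_bins` and the sample values are hypothetical.

```python
from bisect import bisect_right
from typing import List, Sequence


def count_in_bins(values: Sequence[float], edges: Sequence[float]) -> List[int]:
    """Observed counts for bins (-inf, e1), [e1, e2), ..., [e_last, +inf).

    bisect_right sends a value equal to an edge to the bin on its right,
    which implements the half-open convention a_i <= x < b_i.
    """
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect_right(edges, v)] += 1
    return counts


# Two tied values sit exactly on the 200 boundary; both go to the 200-400 bin.
print(count_in_bins([150, 200, 200, 399.9, 400, 650], [200, 400, 600]))
```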