Simple Linear Regression

Reviewed by: Martín Rodríguez (editorial policy) · Last reviewed: 23 May 2026

Simple linear regression is one of the most fundamental tools in statistics and data analysis. It fits a straight line through a set of (x, y) data points by minimizing the total squared vertical distance between each observed point and the line, a method known as Ordinary Least Squares (OLS). The resulting line takes the form y = mx + b, where m is the slope (the rate of change of y per unit of x) and b is the y-intercept (the predicted y when x = 0). The coefficient of determination R² (between 0 and 1) tells you what fraction of the variance in y is explained by x: for example, R² = 0.85 means the linear model accounts for 85% of the variability in your data.

OLS was developed independently by Adrien-Marie Legendre (1805) and Carl Friedrich Gauss (1809) to handle astronomical observations and remains the workhorse of empirical research two centuries later. You will see it in econometrics (estimating elasticities of demand), epidemiology (dose-response curves), finance (CAPM beta estimation), engineering (calibration curves for sensors), psychology (correlation between variables), education (predicting student performance), and machine learning (where it serves as the baseline before moving to more complex models like ridge regression, lasso, or neural networks).

Use this calculator any time you need to: quantify a linear relationship between two continuous variables, forecast future values based on past data, check whether two variables are linearly associated before deciding on a more complex model, or simply understand the math behind the regression line that statistical software produces. The tool computes slope, intercept, R², residuals, and the full regression equation, with worked examples for context. Just paste comma-separated x and y values and get instant results with clear interpretation.

For deeper analysis (confidence intervals, hypothesis tests, residual diagnostics, multicollinearity), step up to a full statistical package like R, Python's statsmodels, or SPSS.

Last reviewed: May 22, 2026. Verified by Martín Rodríguez. Sources: NIST/SEMATECH e-Handbook of Statistical Methods (Linear Least Squares Regression); Wikipedia (Simple Linear Regression); NIST/SEMATECH e-Handbook of Statistical Methods (Measures of Fit, R²).

When to use this calculator

Estimating a student's exam score (y) from hours studied (x) using a class dataset of 30 students — example: y = 8.5x + 20, predicting 105 points for 10 hours of study with R² = 0.78.
Analyzing whether monthly advertising spend (x, in USD) predicts monthly sales revenue (y, in USD) for a small business — example: an extra $1,000 ad spend yields $4,200 incremental revenue (m = 4.2), with R² = 0.83.
Fitting a calibration curve in a chemistry laboratory relating instrument signal readings (x, in absorbance units) to known concentration standards (y, in mg/L) to back-calculate unknown sample concentrations with R² ≥ 0.999.
Modeling outdoor temperature (x, in °F) vs daily household electricity consumption (y, in kWh) — example: each 1°F drop in winter adds 0.6 kWh/day, enabling utility bill forecasting for the coming month.
Estimating used car prices (y, in USD) as a function of vehicle age (x, in years) using a dealer dataset of 200 sales — example: each year of age drops resale value by $1,500 (m = -1500) with R² = 0.65.
Tracking website conversion rate (y, in %) vs page load time (x, in seconds) for an e-commerce site — example: each additional second of load time costs 2.3 percentage points in conversion (m = -2.3).
Sports analytics: predicting marathon finish time (y, in minutes) from weekly training mileage (x) using data from 100 runners — example: each additional weekly mile shaves 1.8 minutes off finish time (m = -1.8).
Real estate: estimating home prices (y, in $1000s) from square footage (x) within a single neighborhood — example: each extra square foot adds $180 (m = 0.18), with R² typically 0.55-0.75 depending on data homogeneity.

Example Calculation

Input: (1, 2), (2, 4), (3, 5)

Result: y = 1.5x + 0.67

How it works

How It's Calculated

Simple Linear Regression uses the Ordinary Least Squares (OLS) method, which finds the unique line that minimizes the sum of squared residuals (vertical distances from each data point to the line).

Step-by-step formulas

Given n data pairs (x₁,y₁), (x₂,y₂), …, (xₙ,yₙ):

─── Intermediate sums ────────────────────────────────────────
Σx   = x₁ + x₂ + … + xₙ
Σy   = y₁ + y₂ + … + yₙ
Σxy  = x₁y₁ + x₂y₂ + … + xₙyₙ
Σx²  = x₁² + x₂² + … + xₙ²
x̄   = Σx / n
ȳ   = Σy / n

─── Slope ────────────────────────────────────────────────────
m = [n·Σxy − Σx·Σy] / [n·Σx² − (Σx)²]
  = Σ[(xᵢ − x̄)(yᵢ − ȳ)] / Σ[(xᵢ − x̄)²]

─── Intercept ────────────────────────────────────────────────
b = (Σy − m·Σx) / n  =  ȳ − m·x̄

─── Fitted line ──────────────────────────────────────────────
ŷ = m·x + b

─── Residuals & R² ───────────────────────────────────────────
eᵢ     = yᵢ − ŷᵢ                        (residual for point i)
SS_res = Σeᵢ²                            (sum of squared residuals)
SS_tot = Σ(yᵢ − ȳ)²                     (total sum of squares)
R²     = 1 − SS_res / SS_tot            (coefficient of determination)

> Note: R² equals the square of the Pearson correlation coefficient r when there is exactly one predictor.
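The formulas above translate almost line for line into code. The following is a minimal sketch in plain Python, not the calculator's actual implementation; the function name `fit_line` is an assumption made for illustration. It uses the sum-based slope formula and the definition R² = 1 − SS_res / SS_tot.

```python
def fit_line(xs, ys):
    """Return (m, b, r2) for the least-squares line through the points.

    fit_line is a hypothetical helper name used only for this sketch.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")

    # Intermediate sums
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)

    # Slope and intercept
    denom = n * sum_x2 - sum_x ** 2
    if denom == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    m = (n * sum_xy - sum_x * sum_y) / denom
    b = (sum_y - m * sum_x) / n

    # Residuals and R² (raises ZeroDivisionError if every y is the same)
    y_bar = sum_y / n
    ss_res = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    r2 = 1 - ss_res / ss_tot
    return m, b, r2


m, b, r2 = fit_line([1, 2, 3], [2, 4, 5])
print(f"y = {m:.2f}x + {b:.2f}, R^2 = {r2:.3f}")
```

For the three points (1, 2), (2, 4), (3, 5) this prints the same line as the example calculation, y = 1.50x + 0.67, with R² of about 0.964.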

---

Reference Table

The table below shows benchmark R² values and their conventional interpretation in applied sciences (based on guidance from NIST/SEMATECH e-Handbook of Statistical Methods). Treat the bands as rough conventions: what counts as a good fit varies by field.

R² Range	Interpretation	Typical Field Example
0.00 – 0.19	Very weak / negligible fit	Social-science surveys with noisy data
0.20 – 0.39	Weak fit	Early-stage exploratory biology studies
0.40 – 0.59	Moderate fit	Economics / marketing models
0.60 – 0.79	Strong fit	Climate vs. energy-use regression
0.80 – 0.94	Very strong fit	Engineering calibration curves
0.95 – 1.00	Near-perfect fit	Physical law verification, lab standards

Also useful: Pearson r interpretation (|r| = √R² for simple regression):

|r| value	Strength
0.00 – 0.09	Negligible
0.10 – 0.39	Weak
0.40 – 0.69	Moderate
0.70 – 0.89	Strong
0.90 – 1.00	Very strong

---

Typical Cases (Worked Examples)

Example 1 — Exam scores vs. hours studied (n = 3)

x (hours)	y (score)
1	2
2	4
3	5

Σx = 6,  Σy = 11,  Σxy = 1·2 + 2·4 + 3·5 = 25,  Σx² = 14,  n = 3

m = (3·25 − 6·11) / (3·14 − 6²)
  = (75 − 66)   / (42 − 36)
  = 9 / 6 = 1.5

b = (11 − 1.5·6) / 3
  = (11 − 9) / 3 = 2/3 ≈ 0.667

→  ŷ = 1.5x + 0.67

SS_tot = (2−11/3)² + (4−11/3)² + (5−11/3)²
       = (−5/3)² + (1/3)² + (4/3)² = 25/9 + 1/9 + 16/9 = 42/9 ≈ 4.667

ŷ₁ ≈ 2.167, ŷ₂ ≈ 3.667, ŷ₃ ≈ 5.167
SS_res = (2−2.167)² + (4−3.667)² + (5−5.167)²
       ≈ 0.028 + 0.111 + 0.028 = 0.167

R² = 1 − 0.167/4.667 ≈ 0.964  (very strong fit)

Example 2 — Ad spend vs. revenue (n = 5)

x ($00s ad spend)	y ($000s revenue)
1	14
2	17
3	19
4	22
5	25

Σx=15, Σy=97, Σxy=1·14+2·17+3·19+4·22+5·25 = 318, Σx²=55, n=5

m = (5·318 − 15·97) / (5·55 − 15²)
  = (1590 − 1455) / (275 − 225)
  = 135/50 = 2.7

b = (97 − 2.7·15)/5 = (97 − 40.5)/5 = 56.5/5 = 11.3

→  ŷ = 2.7x + 11.3   (R² ≈ 0.996, a near-perfect linear trend)

---

Common Errors

1. Reversing x and y — OLS is NOT symmetric. Regressing y on x gives a different slope than regressing x on y. Always put the predictor (independent variable) as x and the response (dependent variable) as y.

2. Extrapolating far beyond the data range — The fitted line is valid only within (and near) the observed x range. Predicting y for x values far outside the training data ignores the possibility that the relationship is non-linear or changes regime.

3. Confusing R² with causation — A high R² (e.g., 0.97) only means the linear model fits well; it does not prove that x causes y. Spurious correlations can produce very high R² values (classic example: US per-capita cheese consumption vs. deaths by bedsheet tangling, r ≈ 0.95).

4. Using regression with too few data points — With n = 2 you always get R² = 1.0 (a line always passes through 2 points exactly), which is statistically meaningless. A minimum of n ≥ 10–20 is generally recommended for reliable OLS estimates.

5. Ignoring heteroscedasticity and outliers — OLS is sensitive to outliers because errors are squared. A single extreme point can dramatically shift m and b. Always plot your data and residuals before trusting the equation.

6. Assuming linearity without checking — Always plot a scatter diagram first. If the relationship is curved (e.g., quadratic or logarithmic), a straight-line OLS fit will give a misleading m and a deflated R².
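The asymmetry in error 1 is easy to see numerically. The sketch below is an illustration only; the helper `slope` is a hypothetical name. It regresses y on x and then x on y for the three points from Example 1. If OLS were symmetric, the second slope would simply be the reciprocal of the first.

```python
def slope(xs, ys):
    """OLS slope of ys regressed on xs (hypothetical helper for this sketch)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


xs, ys = [1, 2, 3], [2, 4, 5]

m_yx = slope(xs, ys)  # y regressed on x
m_xy = slope(ys, xs)  # x regressed on y

print("slope of y on x:", m_yx)
print("x-on-y line rewritten as y = ...:", 1 / m_xy)
print("product of the two slopes:", m_yx * m_xy)
```

The y-on-x slope is 1.5, while the x-on-y line, rearranged to give y in terms of x, has a slope of about 1.556. The product of the two slopes is about 0.964, which is r² for these data; the two lines coincide only when |r| = 1.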

---

Frequently asked questions

What is the difference between slope (m) and intercept (b) in simple linear regression?

The slope m measures how much y changes for each one-unit increase in x — for example, m = 2.3 means y rises by 2.3 units when x increases by 1. The intercept b is the predicted value of y when x = 0. Practically, b is meaningful only if x = 0 falls within a plausible range of your data; otherwise it is just a mathematical anchor for the line. For example, in a regression of weight on height, the intercept (weight at zero height) is biologically meaningless but mathematically necessary. Always interpret the intercept in the context of your specific data and only at values of x that make physical or business sense.

What does an R² of 0.85 actually mean?

R² = 0.85 means that 85% of the total variability in your y values is explained by the linear relationship with x. The remaining 15% is unexplained variance — measurement noise, omitted variables, non-linear effects, or random fluctuation. According to NIST/SEMATECH guidelines, R² ≥ 0.80 is considered a very strong fit in most applied fields. However, R² is not the only quality metric: a model with R² = 0.85 can still produce poor predictions if it suffers from outliers, heteroscedasticity, or systematic patterns in residuals. Always combine R² with residual plots, cross-validation, and field-specific judgment about whether the residuals are tolerable for your use case.

How many data points do I need for simple linear regression to be reliable?

A commonly cited rule of thumb is at least n = 10–20 observations for simple linear regression with one predictor. With n < 5 the estimates are highly unstable and confidence intervals are extremely wide. For inference (hypothesis testing on the slope), about n = 10 is a practical floor: the t-test has only n − 2 degrees of freedom, and with so few points the normality of the residuals cannot really be checked. For prediction with moderate accuracy, n = 30+ is preferred. If you need to estimate confidence intervals around predictions for unseen x values, n = 50+ is recommended. For machine-learning contexts and complex industrial calibrations, n = 100+ becomes the working minimum. Detecting a moderate relationship (correlation of about 0.5) with 80% power at α = 0.05 typically requires n = 25–30.

Can R² be negative?

Technically yes: R² can be negative if you compute it for a model that was not fit by OLS to your specific data. For example, if you impose a fixed slope or intercept from external knowledge, or if you compute R² on a held-out test set using a model trained on different data, you can get a negative value. A negative R² means the model fits worse than simply predicting the mean ȳ for every observation. However, when you fit a line with an intercept by OLS and compute R² on the same data, R² is mathematically constrained to be between 0 and 1. Negative R² on a test set is a strong signal of overfitting or model misspecification.
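A small numerical check makes this concrete. In the sketch below, the "imposed" line y = −x + 6 is a made-up example of a model that was not fit to the data, and `r_squared` is a hypothetical helper that applies R² = 1 − SS_res / SS_tot to any set of predictions.

```python
def r_squared(ys, preds):
    """R² of arbitrary predictions against observed ys (hypothetical helper)."""
    y_bar = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


xs, ys = [1, 2, 3], [2, 4, 5]

# OLS line for these points: y = 1.5x + 2/3
ols_preds = [1.5 * x + 2 / 3 for x in xs]

# Hypothetical externally imposed line with the wrong direction: y = -x + 6
imposed_preds = [-1.0 * x + 6.0 for x in xs]

print("R² of the OLS line:    ", round(r_squared(ys, ols_preds), 3))
print("R² of the imposed line:", round(r_squared(ys, imposed_preds), 3))
```

The OLS line scores about 0.964. The imposed line has SS_res = 13 against SS_tot ≈ 4.667, so its R² is about −1.786: it does worse than predicting ȳ for every point.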

What is the difference between simple and multiple linear regression?

Simple linear regression uses exactly one predictor (x) to model y: ŷ = mx + b. Multiple linear regression uses two or more predictors: ŷ = b₀ + b₁x₁ + b₂x₂ + … + bₖxₖ. Adding a predictor never lowers the in-sample R² and almost always raises it, even if the new variable is irrelevant, which is why adjusted R² is used in multiple regression to penalize unnecessary predictors. Multiple regression also introduces complications: multicollinearity (predictors correlated with each other), interaction effects (where the slope of one predictor depends on another), and increased risk of overfitting when the number of predictors approaches the number of observations. Start with simple regression to understand the relationship, then add predictors only if theory or data justifies it.
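The text does not spell out the adjustment, so here is a short sketch using the standard formula, adjusted R² = 1 − (1 − R²)(n − 1)/(n − k − 1), where k is the number of predictors. The function name and the input values (R² = 0.80, n = 20) are illustrative assumptions.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R² for n observations and k predictors (intercept not counted)."""
    if n - k - 1 <= 0:
        raise ValueError("need more than k + 1 observations")
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Hypothetical case: the same raw R² of 0.80 from 20 observations,
# reached with 1, 5, or 10 predictors.
for k in (1, 5, 10):
    print(k, round(adjusted_r2(0.80, 20, k), 3))
# prints:
# 1 0.789
# 5 0.729
# 10 0.578
```

The same raw R² is worth less the more predictors it took to reach it, which is the penalty described above.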

Is a high R² always better?

Not necessarily. In social sciences and economics, R² values of 0.30–0.50 are common and considered meaningful because human behavior is inherently noisy. In physics or chemistry, R² < 0.99 might indicate a measurement or model problem. Context matters: an overfitted model on small data can show R² ≈ 1.0 yet predict new observations poorly — the classic warning sign of overfitting. Cross-validation (split data into train/test, or use k-fold) gives a more honest assessment of true predictive power. Also, a high R² combined with systematic patterns in residuals (curvature, increasing variance) indicates that despite the good fit number, the model is structurally wrong and should be replaced (e.g., with polynomial regression or log-transformed variables).
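As a rough illustration of the cross-validation idea, the sketch below runs leave-one-out cross-validation, which is k-fold with k = n, on a small made-up dataset and compares the result with the in-sample error. The data values and helper names are assumptions for this example only.

```python
def fit_line(xs, ys):
    """Return (m, b) of the OLS line (hypothetical helper for this sketch)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    m = sxy / sxx
    return m, y_bar - m * x_bar


def in_sample_mse(xs, ys):
    """Mean squared error on the same points the line was fit to."""
    m, b = fit_line(xs, ys)
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys)) / len(xs)


def loo_mse(xs, ys):
    """Leave-one-out MSE: fit on n - 1 points, score on the held-out point."""
    errors = []
    for i in range(len(xs)):
        m, b = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        errors.append((ys[i] - (m * xs[i] + b)) ** 2)
    return sum(errors) / len(errors)


# Hypothetical noisy data scattered around y = 2x
xs = [1, 2, 3, 4, 5, 6]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]

print("in-sample MSE:    ", in_sample_mse(xs, ys))
print("leave-one-out MSE:", loo_mse(xs, ys))
```

The leave-one-out error comes out larger than the in-sample error, because each point is predicted by a line that never saw it. A large gap between the two numbers is the overfitting warning sign described above.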

What assumptions does ordinary least squares (OLS) regression make?

OLS regression makes four classical assumptions, often summarized as LINE: (1) Linearity — the true relationship between x and y is linear in the parameters. (2) Independence — observations are not correlated with each other (a critical violation occurs in time series data with autocorrelation). (3) Normality of residuals — residuals are approximately normally distributed (important for hypothesis tests and confidence intervals, less critical for point estimates). (4) Equal variance (homoscedasticity) — the variance of residuals is constant across all x values (heteroscedasticity is common in real data: residuals fan out as x grows). Violations of these assumptions can bias slope estimates or invalidate p-values and confidence intervals. Always check assumptions via residual plots, Q-Q plots, and tests like Durbin-Watson (autocorrelation) or Breusch-Pagan (heteroscedasticity).
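Of the checks mentioned, the Durbin-Watson statistic is simple enough to compute by hand: it is the sum of squared differences between successive residuals divided by the sum of squared residuals. The sketch below uses made-up residual sequences and assumes the residuals are already in time order.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals listed in time order."""
    num = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    den = sum(e ** 2 for e in residuals)
    return num / den


# Hypothetical residuals that drift slowly (positive autocorrelation)
trending = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
# Hypothetical residuals that flip sign every step (negative autocorrelation)
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]

print(round(durbin_watson(trending), 3))
print(round(durbin_watson(alternating), 3))
```

The statistic ranges from 0 to 4. Values near 2 suggest no first-order autocorrelation, values toward 0 suggest positive autocorrelation (about 0.333 for the drifting sequence), and values toward 4 suggest negative autocorrelation (about 3.333 for the alternating one).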

How is the Pearson correlation coefficient r related to R²?

In simple linear regression with one predictor, R² is exactly the square of the Pearson correlation coefficient: R² = r². For example, if r = 0.90, then R² = 0.81. If r = -0.7, then R² = 0.49. The sign of r tells you the direction of the relationship (positive or negative slope), which R² alone does not; R² is always ≥ 0 regardless of direction. This relationship breaks down in multiple regression, where R² generalizes but is no longer a simple square of a single correlation coefficient. In simple regression, you get R² from r by squaring, and r from R² by taking the square root and attaching the sign of the slope. Pearson r is preferred when you want to communicate a symmetric measure of association; R² is preferred when you want to communicate the fraction of variance explained.

Can I use simple linear regression if my data has outliers?

Use it with caution. OLS is sensitive to outliers because it minimizes squared residuals, giving disproportionate weight to extreme points. A single outlier can shift the slope dramatically and inflate or deflate R². Best practice: (1) Always plot your data first: a scatter plot reveals outliers immediately. (2) Identify potential outliers using residual plots, leverage values (hat matrix diagonal), Cook's distance (combined leverage and residual), or studentized residuals. (3) Investigate before deleting: is the outlier a data-entry error, a genuine but unusual observation, or a sign of a different underlying mechanism? Never delete data just because it spoils your model. (4) If the outliers are genuine and must stay in the data, use a robust regression method (Theil-Sen estimator, RANSAC, Huber regression, quantile regression). These methods are less sensitive to extreme points and also serve as a sanity check on the OLS results.
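To see how much one point can matter, the sketch below compares the OLS slope with a bare-bones Theil-Sen slope (the median of all pairwise slopes) on made-up data. The data and function names are illustrative assumptions, and a production Theil-Sen implementation would typically also estimate the intercept and handle large n more efficiently.

```python
from itertools import combinations
from statistics import median


def ols_slope(xs, ys):
    """Ordinary least-squares slope."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


def theil_sen_slope(xs, ys):
    """Median of the slopes over all point pairs with distinct x values."""
    slopes = [
        (y2 - y1) / (x2 - x1)
        for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2)
        if x2 != x1
    ]
    return median(slopes)


xs = [1, 2, 3, 4, 5, 6]
clean = [2, 4, 6, 8, 10, 12]  # hypothetical data exactly on y = 2x
dirty = [2, 4, 6, 8, 10, 30]  # same data with the last point replaced by an outlier

print("OLS, clean:      ", ols_slope(xs, clean))
print("OLS, outlier:    ", round(ols_slope(xs, dirty), 3))
print("Theil-Sen, outlier:", theil_sen_slope(xs, dirty))
```

With the outlier in place the OLS slope more than doubles, from 2 to about 4.571, while the Theil-Sen slope stays at 2 because ten of the fifteen pairwise slopes are unaffected.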

What is the difference between regression and correlation?

Correlation measures the strength and direction of a linear association between two variables, using a symmetric metric like Pearson r (range -1 to +1). It tells you whether x and y move together but does not produce a predictive equation. Regression fits a specific equation y = mx + b that lets you predict y from x. Regression is asymmetric — regressing y on x gives a different line than regressing x on y. Use correlation when both variables are exchangeable and you only want to quantify association. Use regression when one variable is naturally the predictor (cause, independent variable, treatment) and the other is the response (effect, dependent variable, outcome) and you want to make predictions or estimate effect sizes.

When should I use log-transformations or polynomial regression instead of simple linear regression?

Use log-transformations when: (1) the relationship is multiplicative rather than additive — common in economic data where percent changes matter more than absolute changes. (2) the data spans several orders of magnitude — for example, income, city populations, or response times. (3) residuals are heteroscedastic with variance proportional to the level — log transforms often stabilize variance. Common transforms: log y on log x (elasticity model), log y on x (exponential growth), y on log x (logarithmic growth). Use polynomial regression (y = a + bx + cx² + …) when a scatter plot shows clear curvature that a straight line cannot capture. Limit polynomial degree to 2 or 3 to avoid wild oscillations at the edges (Runge's phenomenon). If the relationship is more complex, consider spline regression or non-linear regression with theory-derived functional forms.
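As an example of the "log y on x" transform, the sketch below fits a straight line to log(y) for made-up data that doubles at each step, then converts the slope and intercept back to a growth factor and a starting value. The data and the helper name are assumptions for illustration.

```python
import math


def fit_line(xs, ys):
    """Return (m, b) of the OLS line (hypothetical helper for this sketch)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    m = sxy / sxx
    return m, y_bar - m * x_bar


# Hypothetical data following y = 3 * 2**x (exponential growth)
xs = [0, 1, 2, 3, 4]
ys = [3, 6, 12, 24, 48]

# Fit log(y) = m*x + b, which corresponds to y = exp(b) * exp(m)**x
m, b = fit_line(xs, [math.log(y) for y in ys])
growth_factor = math.exp(m)  # multiplicative change in y per unit of x
start_value = math.exp(b)    # fitted y at x = 0

print("growth factor per unit x:", round(growth_factor, 6))
print("fitted value at x = 0:  ", round(start_value, 6))
```

On the log scale the relationship is exactly linear, so the fit recovers a growth factor of 2 and a starting value of 3. A straight line fit to the raw y values would miss this curvature.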

How do I interpret confidence intervals and p-values for the slope?

The slope's confidence interval gives a range of plausible values for the true population slope. A 95% CI like [1.8, 2.6] means that if you repeated the sampling and regression many times, 95% of such intervals would contain the true slope. The p-value for the slope tests the null hypothesis H₀: m = 0 (no linear relationship). A small p-value (typically p < 0.05) means the data are inconsistent with no relationship — you reject H₀ and conclude there is evidence of a linear association. Important caveats: p-values do not measure effect size (a tiny slope can have a tiny p-value with enough data); confidence intervals depend on assumptions (independence, normality, homoscedasticity); and statistical significance is not the same as practical significance. Always report both the point estimate of the slope and its confidence interval, not just the p-value.
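The sketch below works through a confidence interval for the slope on the three points (1, 2), (2, 4), (3, 5), using the standard error SE(m) = √(SS_res / (n − 2) / Σ(xᵢ − x̄)²). The critical value 12.706 is the two-sided 95% value of Student's t with n − 2 = 1 degree of freedom; it is hard-coded here because Python's standard library has no t distribution, so treat this as an illustration rather than a general routine.

```python
import math

xs, ys = [1, 2, 3], [2, 4, 5]
n = len(xs)

x_bar, y_bar = sum(xs) / n, sum(ys) / n
sxx = sum((x - x_bar) ** 2 for x in xs)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

m = sxy / sxx
b = y_bar - m * x_bar
ss_res = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))

# Standard error of the slope and the t statistic for H0: m = 0
se_m = math.sqrt(ss_res / (n - 2) / sxx)
t_stat = m / se_m

# Two-sided 95% critical value of Student's t for n - 2 = 1 degree of freedom
# (hard-coded assumption; for other sample sizes look up the value for df = n - 2)
T_CRIT = 12.706
ci = (m - T_CRIT * se_m, m + T_CRIT * se_m)

print("slope:", m)
print("standard error:", round(se_m, 4))
print("t statistic:", round(t_stat, 3))
print("95% CI:", tuple(round(v, 3) for v in ci))
```

Even though R² is about 0.964 for these points, the 95% interval runs from roughly −2.17 to 5.17 and includes zero. With only three observations the data cannot rule out a flat line, which is why the point estimate should always be reported with its interval.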