Regression Calculator

Paste your x and y data and instantly get a linear regression trendline: slope, intercept, correlation (r), R², and predictions. Designed for fast screenshots, homework checks, business experiments, and “is this relationship linear?” vibes.

⚡ Instant equation: ŷ = a + bx

🧠 r & R² explained in plain English

🎯 Predict y for any x

💾 Save & share your trendline

Enter your data

You can paste numbers separated by commas, spaces, or new lines. If you prefer, paste pairs like x,y on each line in the X box.

X values or x,y pairs *

Tip: If you paste pairs, leave Y empty.

Y values (optional if using pairs)

Same separators work: commas, spaces, or lines.

Predict y at x (optional)

We’ll compute ŷ using your fitted line.

Your regression result will appear here

Paste your data and tap “Run Regression” to get the trendline equation and R².

This calculator runs entirely in your browser. Nothing is uploaded.

Fit quality bar: 0% = weak linear fit · 50% = moderate · 100% = very strong.

Weak · Moderate · Strong

Educational tool. Regression does not prove causation. Always sanity-check assumptions and consider outliers.

🧩 What you’ll get

Regression outputs (instant)

You paste numbers. We return the key stats people actually screenshot.

Trendline: ŷ = a + bx
r & R²: relationship strength + explained variance
RMSE: typical prediction error (in y-units)
Prediction: enter x → get ŷ

Pro tip: If your fit looks “bad,” check for one extreme outlier. One point can bend a line.

✅ Quick interpretation

How to read R² fast

R² is the shareable “how good is the line?” number. It ranges from 0 to 1.

R² ≈ 0.00–0.30: weak linear relationship
R² ≈ 0.30–0.70: moderate relationship
R² ≈ 0.70–1.00: strong linear relationship

R² can be high even when the relationship is not causal—correlation ≠ causation.

📚 Formula breakdown

Linear regression, explained like a human (not a textbook)

Linear regression is the simplest way to summarize a relationship between two numerical variables: an input x (what you control or observe) and an output y (what changes). The idea is to draw a straight line that “best fits” your data points.

The line is written as: ŷ = a + b x. Here, ŷ (y-hat) means “predicted y.” The two key parameters are:

Slope (b): how much y changes when x increases by 1 unit.
Intercept (a): the predicted y when x = 0 (sometimes meaningful, sometimes not).

What “best fit” means

“Best fit” is defined using least squares. For each data point (xᵢ, yᵢ), the line predicts ŷᵢ = a + b xᵢ. The vertical error is eᵢ = yᵢ − ŷᵢ. Least squares chooses a and b to minimize the sum of squared errors: SSE = Σ (yᵢ − ŷᵢ)². Squaring does two things: it makes all errors positive and it punishes big errors more. That’s why one outlier can dominate the fit.

How we compute slope and intercept

In simple linear regression, the math collapses to two clean formulas using means and variance:

b = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)²
a = ȳ − b x̄

x̄ is the average of all x values, and ȳ is the average of all y values. The numerator in b is the covariance between x and y before dividing by the number of points: it measures whether x and y move together. The denominator is the variance of x, again before dividing by the number of points: it measures how spread out x is. The missing divisors cancel, so b is simply covariance divided by variance. If all x values are the same (no spread), the denominator is zero and you can’t fit a line, because there’s no way to see how y changes with x.
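If you want to see those two formulas as code, here is a minimal Python sketch. It is not this calculator's actual source, and the function name fit_line is made up for illustration. It runs on a perfectly linear dataset (x = 1..5, y = 2x).

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for simple linear regression."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Numerator: sum of cross-deviations (covariance before dividing by n).
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # Denominator: sum of squared x-deviations (variance before dividing by n).
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


a, b = fit_line([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(f"slope b = {b}, intercept a = {a}")  # slope b = 2.0, intercept a = 0.0
```

The explicit check on the denominator mirrors the "no spread in x" case described above.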

Correlation (r) and R²

People love regression because it gives a single, shareable “relationship strength” number. That number is usually R² (pronounced “R-squared”). In simple linear regression with an intercept, R² is the square of Pearson’s correlation r.

Correlation is: r = Σ(xᵢ − x̄)(yᵢ − ȳ) / √(Σ(xᵢ − x̄)² · Σ(yᵢ − ȳ)²). It ranges from -1 to +1: positive means y tends to increase when x increases, negative means y tends to decrease. R² is r², so it ranges from 0 to 1.

R² ≈ 0: the line explains almost none of the variation in y.
R² ≈ 1: the line explains nearly all of the variation in y.

Another way to remember it: R² is “how much of y’s wiggle the line explains.” If y is noisy relative to the trend, R² drops.
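Here is a small Python sketch of the correlation formula, assuming plain lists of numbers. The function name correlation is hypothetical, and the data is the negative-relationship example from the examples section (x = 1..5, y = 10, 9, 7, 6, 5).

```python
import math


def correlation(xs, ys):
    """Pearson's r from sums of deviations around the means."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when x or y has no spread")
    return sxy / math.sqrt(sxx * syy)


r = correlation([1, 2, 3, 4, 5], [10, 9, 7, 6, 5])
r_squared = r ** 2
print(round(r, 3), round(r_squared, 3))  # about -0.991 and 0.983
```

Notice that r carries the sign (y falls as x rises), while R² drops the sign and keeps only the strength.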

RMSE: the “typical error” you can feel

R² tells you the strength of the relationship, but it doesn’t tell you the size of errors in y-units. That’s where RMSE comes in: RMSE = √(SSE / n) (or √(SSE/(n−2)) in some textbooks). This tool uses √(SSE/n) as a simple, intuitive “average-ish” error. If your y is measured in dollars, RMSE is in dollars. If your y is in degrees, RMSE is in degrees. That makes RMSE great for decision-making: “If I predict y using this line, I’m typically off by ~RMSE.”
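To make the two RMSE variants concrete, here is a hedged Python sketch. The function name rmse and its fitted_params argument are illustrative assumptions, not part of this tool. The data is the noisy example from the examples section, and a = 4.4, b = 0.88 is the least-squares line for that data.

```python
import math


def rmse(xs, ys, a, b, fitted_params=0):
    """Root mean squared error of the line y-hat = a + b*x.

    fitted_params=0 divides SSE by n (the version described above);
    fitted_params=2 divides by n - 2 (the textbook variant).
    """
    divisor = len(xs) - fitted_params
    if divisor <= 0:
        raise ValueError("not enough points for this divisor")
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(sse / divisor)


xs = [10, 20, 30, 40, 50]
ys = [15, 19, 35, 33, 52]
rmse_n = rmse(xs, ys, a=4.4, b=0.88)                          # sqrt(SSE / n)
rmse_n_minus_2 = rmse(xs, ys, a=4.4, b=0.88, fitted_params=2)  # sqrt(SSE / (n - 2))
print(round(rmse_n, 3), round(rmse_n_minus_2, 3))  # about 4.157 and 5.367
```

With only five points the two versions differ noticeably; with hundreds of points they are nearly the same number.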

Regression ≠ causation (the most important sentence)

Regression is a summary of association, not proof of cause. Ice cream sales and sunburns both rise in summer; regression will show a strong relationship, but ice cream does not cause sunburn. This is why context matters. Use regression as a flashlight, not a verdict.

🧪 Examples

Worked examples you can copy-paste

Here are practical mini datasets. Paste them as lists (X box and Y box), or paste them as pairs in the X box (leave Y empty). Then compare your output to the expected story.

Example 1: perfectly linear

X: 1, 2, 3, 4, 5
Y: 2, 4, 6, 8, 10

You should get b = 2 and a = 0.
R² should be 1.00 because every point sits exactly on the line y = 2x.

Example 2: linear-ish with noise

X: 10 20 30 40 50
Y: 15 19 35 33 52

Slope should be positive (y rises as x rises).
R² won’t be 1.00 because there’s noise around the trend.

Example 3: negative relationship

(Pairs in X box)
1, 10
2, 9
3, 7
4, 6
5, 5

Slope should be negative (as x increases, y decreases).
r should be negative; R² stays positive.

Example 4: outlier check

X: 1,2,3,4,5,6
Y: 2,4,6,8,10,100

The last point is an extreme outlier that will yank the line upward.
Try deleting 100 and re-running; notice the massive shift.
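If you would rather see the shift as numbers, this Python sketch fits the Example 4 data with and without the last point. The helper fit_line is a hypothetical name and a plain least-squares fit, not this tool's code.

```python
def fit_line(xs, ys):
    """Return (intercept a, slope b) from the least-squares formulas."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b


xs = [1, 2, 3, 4, 5, 6]
ys = [2, 4, 6, 8, 10, 100]

a_with, b_with = fit_line(xs, ys)                  # all six points
a_without, b_without = fit_line(xs[:-1], ys[:-1])  # outlier removed

print(round(b_with, 2), round(a_with, 2))        # about 14.57 and -29.33
print(round(b_without, 2), round(a_without, 2))  # 2.0 and 0.0
```

One point moves the slope from 2 to roughly 14.6 and pushes the intercept far below zero.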

Want a quick viral post idea? Run regression on two “relatable” things (sleep vs mood, coffee vs productivity), screenshot the equation + R², and caption it: “My life in one line.”

🧠 How it works

What happens when you click “Run Regression”

This tool does three things: (1) it cleans your input, (2) it computes the regression with the standard least-squares formulas, and (3) it formats everything for readability and easy sharing.

Step 1: Parse and validate data

We accept commas, spaces, tabs, and line breaks. The calculator tries to be forgiving because real-world copy/paste is messy. It also ignores empty tokens. If you use the “pairs” format, each line is split by comma (or tab), and we read x and y together. If you use two lists, we match by position: the first x goes with the first y, and so on.

If the counts don’t match, we show an error.
If you have fewer than 2 points, regression isn’t possible.
If all x values are identical, slope can’t be computed (no x-variance).
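The parsing rules above could look something like this in Python. This is only a sketch under assumptions: the function names parse_numbers and parse_input are invented here, the error messages are made up, and pair lines are split on commas or any whitespace, which is slightly more lenient than the comma-or-tab rule described above.

```python
import re


def parse_numbers(text):
    """Split on commas, spaces, tabs, or line breaks; ignore empty tokens."""
    tokens = [t for t in re.split(r"[,\s]+", text.strip()) if t]
    return [float(t) for t in tokens]  # float() raises ValueError on junk


def parse_input(x_text, y_text=""):
    """Return (xs, ys) from two lists, or from x,y pairs if y_text is empty."""
    if y_text.strip():
        xs, ys = parse_numbers(x_text), parse_numbers(y_text)
        if len(xs) != len(ys):
            raise ValueError("x and y must have the same number of values")
    else:
        xs, ys = [], []
        for line in x_text.splitlines():
            if not line.strip():
                continue  # skip blank lines
            pair = parse_numbers(line)
            if len(pair) != 2:
                raise ValueError(f"expected an x,y pair, got: {line!r}")
            xs.append(pair[0])
            ys.append(pair[1])
    if len(xs) < 2:
        raise ValueError("need at least 2 points")
    if len(set(xs)) == 1:
        raise ValueError("all x values are identical (no x-variance)")
    return xs, ys


print(parse_input("1, 2, 3", "4 5 6"))  # ([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print(parse_input("1, 10\n2, 9\n3, 7"))  # ([1.0, 2.0, 3.0], [10.0, 9.0, 7.0])
```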

Step 2: Compute the line

Once we have x and y arrays, we compute x̄ and ȳ, then compute slope b and intercept a. With those, we can predict ŷ for each x and calculate errors. That yields SSE and RMSE. We also compute correlation r and R².
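Putting the whole step together, a rough Python version might look like this. The function name regress and the dictionary keys are assumptions for illustration, not the tool's real code. It also checks that R² computed from the errors (1 − SSE / total variation in y) matches r², which holds for simple linear regression with an intercept.

```python
import math


def regress(xs, ys):
    """Compute slope, intercept, SSE, RMSE, r, and R² in one pass of sums."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx                      # slope
    a = y_bar - b * x_bar              # intercept
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    r = sxy / math.sqrt(sxx * syy)     # Pearson correlation
    return {
        "a": a, "b": b, "sse": sse,
        "rmse": math.sqrt(sse / n),
        "r": r, "r2": r ** 2,
        "r2_from_errors": 1 - sse / syy,
    }


result = regress([10, 20, 30, 40, 50], [15, 19, 35, 33, 52])
prediction = result["a"] + result["b"] * 60   # y-hat at x = 60

print(round(result["b"], 2), round(result["a"], 2))  # 0.88 4.4
print(round(result["r2"], 3))                        # about 0.9
print(round(prediction, 1))                          # 57.2
```

This sketch assumes the input has already been validated, so it does not guard against zero spread in x or y.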

Step 3: Make it share-friendly

Your result card shows the equation and key stats. The fit-quality bar uses R² as the fill value, so you can “feel” the strength at a glance. The share buttons generate a short summary containing equation + R² + prediction (if you asked for one).
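As a rough illustration of this step, here is a Python sketch. The exact summary wording, the number of decimals, and the function names fit_bar_percent and share_summary are all assumptions; only the idea (R² drives the bar, the summary holds equation + R² + optional prediction) comes from the description above.

```python
def fit_bar_percent(r2):
    """Map R² to a 0-100 fill value, clamped in case of rounding noise."""
    return max(0.0, min(1.0, r2)) * 100


def share_summary(a, b, r2, x_new=None):
    """Build a one-line summary: equation, R², and an optional prediction."""
    sign = "+" if b >= 0 else "-"
    text = f"ŷ = {a:.2f} {sign} {abs(b):.2f}x | R² = {r2:.2f}"
    if x_new is not None:
        text += f" | ŷ({x_new:g}) = {a + b * x_new:.2f}"
    return text


print(share_summary(4.4, 0.88, 0.8996, x_new=60))
# ŷ = 4.40 + 0.88x | R² = 0.90 | ŷ(60) = 57.20
print(share_summary(11.3, -1.3, 0.9826))
# ŷ = 11.30 - 1.30x | R² = 0.98
```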

When linear regression is a bad idea

Linear regression is not magic. It’s a straight line. If your relationship is curved (like exponential growth, diminishing returns, or U-shapes), a straight line can mislead. Signs you should be cautious:

R² is low but the scatter plot looks obviously curved.
Residuals (errors) get bigger as x increases.
One outlier completely changes slope.

Still, for quick insight and communication, linear regression is the fastest “data story” tool. It’s the reason trendlines exist in spreadsheets.
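To see how badly a straight line can miss a curve, try this small Python sketch on a made-up U-shaped dataset (y = x² for x from −2 to 2). The data and variable names are illustrative only.

```python
import math

xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]  # 4, 1, 0, 1, 4: a perfect U-shape

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

b = sxy / sxx                         # slope of the best straight line
a = y_bar - b * x_bar                 # intercept
r2 = (sxy / math.sqrt(sxx * syy)) ** 2
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

print(b, a, r2)    # 0.0 2.0 0.0
print(residuals)   # [2.0, -1.0, -2.0, -1.0, 2.0]
```

y is completely determined by x here, yet the fitted line is flat and R² is 0. The residuals are positive at both ends and negative in the middle, which is the kind of pattern that signals a curve rather than noise.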

❓ FAQ

Frequently Asked Questions

What kind of regression is this?
This calculator performs simple linear regression with an intercept: ŷ = a + bx. It’s the standard “trendline” most people use in Excel/Google Sheets.
Does a high R² mean x causes y?
No. High R² means a linear model explains a lot of variation in y, but causation requires domain knowledge, experiment design, and confounder checks.
Why is my R² low even though I see a pattern?
Your pattern might be non-linear (curved), your data might be noisy, or you might have outliers. Try plotting your points or checking if one value is far from the rest.
What’s the difference between r and R²?
r is correlation (direction + strength), ranging from -1 to +1. R² is r squared, ranging from 0 to 1, and represents explained variance in y. r tells you if the slope is positive or negative; R² is always non-negative.
How many points do I need?
Technically two points define a line, but that’s fragile. For anything meaningful, use more points. With tiny datasets, one outlier can dominate and “fake” a trend.
Why does the intercept look weird?
The intercept is the predicted y at x = 0. If x = 0 is outside your data range (common), the intercept can feel unintuitive. The slope is often the more interpretable parameter.
Is this the same as Excel’s trendline?
For linear trendlines, yes: it uses the same least-squares idea. Minor differences can occur due to rounding, settings, or whether the intercept is forced to zero (this tool does not force it).

MaximCalculator provides simple, user-friendly tools. Always treat results as educational guidance and double-check any important numbers elsewhere.