How text diff works: line, word, and character comparisons
Comparing two pieces of text seems trivial until you try to do it well. A good diff does not just say "these are different"; it shows the smallest set of changes that turns one version into the other. Finding that set is a real computation, and understanding it makes diffs easier to read.
What a diff is really computing
Under the hood, most diff tools solve a version of the longest common subsequence (LCS) problem. They find the largest amount of content the two texts share in the same order, then treat everything outside that shared core as a change. This framing is why a good diff highlights a small inserted sentence rather than marking everything after it as different: the text following the insertion still belongs to the common subsequence. The goal is the minimal, most readable set of edits.
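To make the LCS idea concrete, here is a minimal sketch in Python. It uses a plain dynamic-programming table, which costs O(n·m) time and memory. Production tools typically use faster algorithms such as Myers' diff, so treat this as an illustration only. The helper names `lcs_table` and `lcs_diff` are hypothetical.

```python
# Minimal LCS-based diff over a list of tokens (lines, words, or characters).
# O(n*m) time and memory: fine for illustration, too slow for large files.

def lcs_table(a: list[str], b: list[str]) -> list[list[int]]:
    """table[i][j] holds the length of the LCS of a[i:] and b[j:]."""
    n, m = len(a), len(b)
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    return table


def lcs_diff(a: list[str], b: list[str]) -> list[tuple[str, str]]:
    """Emit (' ', tok) for kept, ('-', tok) for removed, ('+', tok) for added tokens."""
    table = lcs_table(a, b)
    ops: list[tuple[str, str]] = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            # Shared token: part of the common subsequence.
            ops.append((" ", a[i]))
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            # Skipping a[i] keeps the LCS length, so a[i] was removed.
            ops.append(("-", a[i]))
            i += 1
        else:
            # Otherwise b[j] is new.
            ops.append(("+", b[j]))
            j += 1
    # Anything left over on either side is a pure removal or insertion.
    ops.extend(("-", tok) for tok in a[i:])
    ops.extend(("+", tok) for tok in b[j:])
    return ops


old = ["the", "cat", "sat", "on", "the", "mat"]
new = ["the", "cat", "quietly", "sat", "on", "the", "mat"]
for op, tok in lcs_diff(old, new):
    print(op, tok)
```

Only "quietly" is marked as added. The words after it stay in the common subsequence instead of being reported as changed.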
Line, word, and character granularity
- Line diff treats each line as a unit. It is ideal for code and config, where changes are naturally line-based, and it is what version control tools such as Git show by default.
- Word diff compares word by word. For prose, where a sentence is edited mid-line, this pinpoints the changed words instead of flagging the whole line.
- Character diff is the finest grain, useful for spotting a single changed digit or a typo.
Picking the right granularity is most of the battle. A line diff of edited prose is noisy; a character diff of code is overwhelming.
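The sketch below shows how granularity changes the result. It runs the same matcher over three different tokenizations of one small edit. It assumes Python's standard-library `difflib.SequenceMatcher`, which uses its own Ratcliff/Obershelp-style matching heuristic rather than a strict LCS. The helpers `tokenize` and `changes` are hypothetical.

```python
import difflib
import re


def tokenize(text: str, granularity: str) -> list[str]:
    """Split text into the units the diff will compare."""
    if granularity == "line":
        return text.splitlines(keepends=True)
    if granularity == "word":
        # Keep runs of whitespace as their own tokens so "".join() rebuilds the text.
        return re.findall(r"\S+|\s+", text)
    if granularity == "char":
        return list(text)
    raise ValueError(f"unknown granularity: {granularity!r}")


def changes(old: str, new: str, granularity: str) -> list[tuple[str, str, str]]:
    """Return (tag, old_fragment, new_fragment) for every non-equal region."""
    a = tokenize(old, granularity)
    b = tokenize(new, granularity)
    # autojunk=False stops difflib from treating very frequent tokens as junk on long inputs.
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    return [
        (tag, "".join(a[i1:i2]), "".join(b[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


before = "The total was 1250 dollars.\n"
after = "The total was 1280 dollars.\n"
for level in ("line", "word", "char"):
    print(level, changes(before, after, level))
```

For this one-digit edit, the three granularities give different answers:
- The line diff replaces the whole line.
- The word diff isolates "1250" → "1280".
- The character diff narrows it to "5" → "8".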
Reading a unified diff
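The unified format, the one you see in code review, marks removed lines with -, added lines with +, and surrounds them with a few unchanged context lines prefixed by a space. Each hunk begins with an @@ header giving its starting line and line count in both versions. For example, @@ -12,5 +12,6 @@ means the hunk starts at line 12 and covers 5 lines in the old file and 6 in the new one. Once you read - as "old" and + as "new," the whole format becomes obvious.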
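Python's standard-library `difflib.unified_diff` emits this format, so it is a convenient way to experiment with it. Here is a short sketch using two small, hypothetical config files given as lists of lines:

```python
import difflib

old = ["host = localhost\n", "port = 8080\n", "debug = false\n"]
new = ["host = localhost\n", "port = 9090\n", "debug = false\n", "timeout = 30\n"]

# n sets how many unchanged context lines surround each change (3 is the default).
diff = difflib.unified_diff(old, new, fromfile="config.old", tofile="config.new", n=3)
text = "".join(diff)
print(text, end="")

# Expected output:
# --- config.old
# +++ config.new
# @@ -1,3 +1,4 @@
#  host = localhost
# -port = 8080
# +port = 9090
#  debug = false
# +timeout = 30
```

The header @@ -1,3 +1,4 @@ says the hunk starts at line 1 and spans 3 lines of the old file and 4 of the new one. The changed port line appears as a - line followed by a + line.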
The Diff Checker compares two texts in your browser, with nothing uploaded, and highlights exactly what changed. For counting and analyzing a single text rather than comparing two, see the text statistics tool.