Data Story

The Two Ledgers

A double-entry audit: where IMDB and TMDB disagree on the money.

film, imdb, tmdb, finance, data-quality, audit
Dataset scope
  • Films: 7168
  • Years: 1914–2024
  • Budget pairs ($): 859
  • Revenue pairs ($): 637
The chart focuses on titles that report a value on both ledgers. USD-only mode restricts the comparison to dollar-denominated figures, so no currency conversion is needed.
Hypothesis

Disagreement in reported budgets and revenue is systematic — driven by missingness, older releases, and metadata practices — not pure noise.

Question: How consistent are financial metrics between IMDB and TMDB for the same titles?

Method: Compare IMDB raw budget/gross text against TMDB budget/revenue, measuring log-ratio gaps and outlier density by decade.
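
The log-ratio gap named in the method could be computed along these lines. This is a minimal sketch, not the story's actual pipeline: the helper names `parse_usd` and `log_gap`, the assumed text format (a leading dollar sign followed by comma-grouped digits), the base-10 logarithm, and the choice to treat zero as missing are all assumptions made for illustration.

```python
import math
import re
from typing import Optional

# Assumed format: a leading "$" followed by digits with optional commas,
# e.g. "$15,000,000 (estimated)". The real IMDB text fields may differ.
_USD = re.compile(r"^\s*\$\s*([\d,]+)")


def parse_usd(raw: Optional[str]) -> Optional[float]:
    """Return the dollar amount in a raw text field.

    Returns None when the field is empty, not dollar-denominated,
    or not a positive number.
    """
    if not raw:
        return None
    match = _USD.match(raw)
    if not match:
        return None
    digits = match.group(1).replace(",", "")
    if not digits:
        return None
    value = float(digits)
    return value if value > 0 else None


def log_gap(imdb_usd: Optional[float], tmdb_usd: Optional[float]) -> Optional[float]:
    """Return log10(IMDB / TMDB), or None if either side is unusable.

    A gap of 0 means the ledgers agree; +1 means IMDB reports ten times
    the TMDB figure; -1 means one tenth. Zero and negative values are
    treated as missing here (an assumption).
    """
    if imdb_usd is None or tmdb_usd is None:
        return None
    if imdb_usd <= 0 or tmdb_usd <= 0:
        return None
    return math.log10(imdb_usd / tmdb_usd)
```

Working on a log scale makes the gap symmetric: a title over-reported by a factor of ten and one under-reported by a factor of ten sit the same distance from zero, which a plain difference in dollars would not show.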

Prediction: Older decades and sparse metadata show larger gaps and more outliers.

Test: Compute log-gap distributions by decade and inspect extreme outliers.
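
The per-decade test could be sketched as below. The function names, the input shape (pairs of release year and log10 gap), and the outlier cutoff of 1.0 (ledgers differing by more than a factor of ten) are assumptions for illustration; the story does not state which threshold or summary statistics it uses.

```python
from collections import defaultdict
from statistics import median

# Assumed cutoff: an absolute log10 gap above 1.0 means the two ledgers
# differ by more than a factor of ten.
OUTLIER_THRESHOLD = 1.0


def decade_of(year: int) -> int:
    """Map a release year to the first year of its decade (1987 -> 1980)."""
    return (year // 10) * 10


def summarize_by_decade(rows, threshold: float = OUTLIER_THRESHOLD) -> dict:
    """Summarize absolute log gaps per decade.

    rows: iterable of (release_year, log10_gap) pairs, with missing
    gaps already filtered out.
    Returns {decade: {"n", "median_abs_gap", "outlier_share"}} with
    decades in ascending order.
    """
    buckets = defaultdict(list)
    for year, gap in rows:
        buckets[decade_of(year)].append(abs(gap))

    summary = {}
    for decade in sorted(buckets):
        gaps = buckets[decade]
        summary[decade] = {
            "n": len(gaps),
            "median_abs_gap": median(gaps),
            "outlier_share": sum(g > threshold for g in gaps) / len(gaps),
        }
    return summary
```

If the prediction holds, both `median_abs_gap` and `outlier_share` should be larger in the early decades than in recent ones. The `n` column matters when reading the result, since a decade with only a handful of paired titles can show a large share from a single outlier.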

Narrative Arc
Act I

Two columns type themselves in: IMDB on the left, TMDB on the right.

Act II

Threads stretch between the numbers; most hold steady, some fray and snap.

Act III

Audit stamps reveal outliers — the cases where the ledgers tell different stories.

Datasets
  • imdb.films
  • tmdb.movies
  • 14_two_ledgers.json
Limitations
  • IMDB finances are text fields; currency and estimates can differ.
  • TMDB budgets/revenue are incomplete and not always audited.
  • USD-only is a cleaner slice, not a universal truth.