SeenRank Blog
The Princeton GEO study, explained for marketers
Updated 2026-05-13. By the SeenRank team.
Short answer: in 2024, researchers from Princeton, Georgia Tech, the Allen Institute for AI, and IIT Delhi published "GEO: Generative Engine Optimization" at KDD, the first peer-reviewed quantification of what increases brand citation rate inside generative AI engines. They tested 9 content interventions; 7 produced measurable lift, 2 did not. The largest lever: adding statistics to a page (+41% citation rate). Second largest: direct quotes from named sources (+28%). Of the two that failed, unique-word substitution was neutral and keyword stuffing actively hurt. This is the closest thing the GEO field has to an empirical foundation in 2026.
What the study was, and why it matters
The paper is "GEO: Generative Engine Optimization" by Pranjal Aggarwal et al., published at the KDD 2024 conference (ACM SIGKDD Conference on Knowledge Discovery and Data Mining). The authors built an evaluation framework called GEO-Bench that simulates how generative engines select content to cite, then tested 9 different content optimization techniques against it. They measured each technique’s effect on two metrics: citation rate (how often the engine cites the page) and subjective impression (how favorably the cited page is framed).
Why it matters: until this study, GEO was vibes and anecdote. Practitioners traded tactics on Twitter and in private Slacks but nobody had measured anything. The Princeton paper is the first rigorous baseline. Almost every credible GEO playbook published since then traces its quantitative claims back to this study.
The 9 techniques they tested
Each technique was applied to a base set of pages and then evaluated by the GEO-Bench framework. The 9, in the order they appeared in the paper:
- Authoritative – rewrite content in an authoritative, persuasive tone.
- Citing Sources – add inline citations to authoritative external sources.
- Easy-to-Understand – simplify complex sentences, make structure scannable.
- Fluency Optimization – improve readability and flow without changing the content.
- Keyword Stuffing – add target keywords aggressively throughout the body.
- Quotation Addition – insert direct quotations from named experts.
- Statistics Addition – embed quantitative data points (industry numbers, percentages, hard metrics).
- Technical Terms – introduce or clarify field-specific technical terminology.
- Unique Words – replace generic phrasing with more distinctive vocabulary.
The results, ranked by measured citation lift
| Technique | Citation rate lift | Read this as |
|---|---|---|
| Statistics Addition | +41% | The single largest measured lever. Hard data with citations. |
| Quotation Addition | +28% | Direct quotes from named experts. Yourself counts if you have credentials. |
| Citing Sources | Large positive (no clean single-number lift reported) | Link out to authoritative external sources. |
| Authoritative tone | Medium positive | Write with confidence. Hedging language hurts. |
| Fluency Optimization | Medium positive | Long, clear sentences over fragmented bullets. |
| Easy-to-Understand | Medium positive | Scannable hierarchy with descriptive headers. |
| Technical Terms | Small positive | Clarify or coin technical terminology where useful. |
| Unique Words | Neutral | Distinctive vocabulary did not measurably help in the study. |
| Keyword Stuffing | Negative | Counterproductive. Hurt citation rate measurably. |
The headline numbers everyone quotes from this study are Statistics Addition's +41% and Quotation Addition's +28%. Those are the two with clean single-number lifts that survived the paper's statistical significance tests. The others are directionally documented but reported as ranges or qualitative effects.
The methodology, briefly
Three pieces are worth understanding so you know what the numbers do and don’t mean:
1. The evaluation set: GEO-Bench
The authors built a benchmark of 10,000 queries spanning multiple domains (consumer products, B2B, finance, health, education, etc.). For each query, they curated a candidate set of pages a generative engine might cite. Then they ran the candidate set through a simulated generative engine and recorded which pages got cited.
2. The treatment: 9 content interventions applied to candidate pages
For each candidate page, they generated 9 alternative versions, one per intervention technique. Then they re-ran the engine and measured whether the intervention version got cited more or less often than the original.
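The apply-one-intervention, re-run, compare loop in step 2 can be sketched in a few lines. This is a toy stand-in, not the paper's released code: the engine stub, the intervention stub, and the page names are all hypothetical, chosen only to make the measurement structure concrete.

```python
# Toy sketch of the baseline-vs-intervention comparison described above.
# Everything here (engine behavior, page names, the intervention) is a
# hypothetical stand-in, not the study's actual GEO-Bench code.

def run_engine(query, candidates):
    # Toy engine: "cites" any candidate page whose text contains a digit,
    # a crude proxy for the statistical signal the study measured.
    return [pid for pid, text in candidates.items()
            if any(c.isdigit() for c in text)]

def add_statistics(text):
    # Toy Statistics Addition intervention: append a quantitative data point.
    return text + " A 2024 benchmark measured a 41% lift."

queries = ["best crm for startups", "what is geo"]
baseline = {"page-a": "Our CRM is great.", "page-b": "GEO differs from SEO."}

def citation_rates(candidates):
    # Fraction of queries for which each page gets cited.
    cited = [run_engine(q, candidates) for q in queries]
    return {pid: sum(pid in c for c in cited) / len(queries)
            for pid in candidates}

treated = {pid: add_statistics(text) for pid, text in baseline.items()}
before, after = citation_rates(baseline), citation_rates(treated)
for pid in baseline:
    print(pid, before[pid], "->", after[pid])
```

The study's real pipeline does this across 10,000 queries and 9 interventions; the shape of the measurement (per-page citation rate before and after one isolated change) is the part worth internalizing.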
3. The engines they tested against
The paper primarily used GPT-3.5 and GPT-4-class models with retrieval augmentation as the test bed. Subsequent independent studies have observed broadly similar effects on Claude, Perplexity, and Google AI Overview, but the original paper’s strongest claims are most cleanly anchored to GPT-class behavior.
The caveats every honest practitioner should mention
The study is the best empirical anchor we have, but it has real limits. Marketers who quote the +41% number without these caveats are either rounding for brevity or hand-waving.
- One paper, one team. The numbers haven’t been independently replicated at the scale of the original study. They have been broadly directionally confirmed by independent observation, but treat the specific percentages as best-available estimates, not settled science.
- Simulated generative engine, not live production. GEO-Bench is a benchmark. Real ChatGPT, real Claude, real Perplexity, real Google AI Overview have additional ranking signals (freshness, brand authority, click-through patterns) that GEO-Bench doesn’t fully capture.
- The interventions were applied in isolation. The +41% lift is measured when statistics are added to a page that didn't have any. The marginal lift of adding statistics to a page that already has them is smaller, and the individual lifts don't simply stack: applying every positive technique will not add up to a 100%+ total gain.
- 2024 cutoff. AI engines have updated several times since the study was published. The relative ranking of techniques is likely durable; the absolute numbers may have shifted.
- "Citation rate" measured at the page level. The study measures whether a specific page gets cited, not whether a brand’s overall visibility improves across queries. The two are related but not identical.
With those caveats in mind: this is still the single best foundation for GEO content strategy in 2026. The directional claims (statistics work, quotes work, keyword stuffing hurts) are about as well-established as anything in this young field.
What to actually do with the results
The study points at five concrete actions, ordered by leverage:
1. Add 2-3 statistics with cited sources to every important page
The +41% lever. Pick a high-leverage page (homepage, pricing, comparison, FAQ, top blog post). Source 2-3 hard quantitative data points relevant to the topic, cite each one to an authoritative external source, and distribute them across the body rather than bunching them at the end.
2. Add at least one named-expert quote per page
The +28% lever. A direct quotation from a named expert (with credentials in the topic) makes the page more extractable. Yourself counts if you have credentials. Use full attribution: name, title, where the quote came from.
3. Link out to 2-3 authoritative sources
The Citing Sources technique scored large positive lift. Link out to peer-reviewed studies, government data, recognized industry leaders. Don’t be afraid to link out; AI engines reward credibility signals.
4. Write with authority, not hedging
The Authoritative tone technique scored medium positive lift. Replace "this might help" with "this helps". Replace "many experts believe" with "[Specific named expert] argues that". Confident, direct prose extracts cleanly.
5. Strip keyword stuffing from your content
The one technique that actively hurt. If your top pages have unnatural keyword density, fix that first. Once. Then forget about classical keyword optimization for GEO purposes.
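For step 5, a quick density check makes "unnatural keyword density" concrete. This is an illustrative sketch: the regex tokenization is crude and the ~1-2% "natural" threshold in the comment is practitioner folklore, not a number from the study.

```python
# Rough keyword-density check for spotting stuffed pages.
# Tokenization is deliberately crude; thresholds are illustrative only.
import re

def keyword_density(text, keyword):
    words = re.findall(r"[a-z']+", text.lower())
    hits = sum(1 for w in words if w == keyword.lower())
    return hits / max(len(words), 1)

# A deliberately stuffed example body.
body = ("GEO tools help with GEO because GEO is the new GEO "
        "and every GEO page needs GEO for GEO reasons.")

density = keyword_density(body, "geo")
print(f"{density:.0%}")  # well above the ~1-2% range of natural prose
```

Anything that prints a double-digit percentage for a single keyword is worth a rewrite pass.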
For a full per-page audit list see the 2026 GEO checklist. For the broader playbook see the AI Search Visibility 2026 guide.
Start by checking your baseline
Before you start applying the Princeton interventions, run a baseline check. Knowing where you currently stand makes it possible to measure whether the interventions worked. The free SeenRank check gives you that baseline in 30 seconds for ChatGPT.
FAQ
Has the Princeton GEO study been replicated?
Not at the scale of the original. Multiple independent practitioner studies have observed broadly similar effects on production engines (ChatGPT, Claude, Perplexity, Google AI Overview). The directional claims (statistics help, quotes help, keyword stuffing hurts) are well-established. The specific percentages are best-available estimates, not settled science.
Does the +41% lift apply to all engines equally?
Probably not equally, but in the same direction. The original study was anchored to GPT-class models. Independent observation suggests Perplexity weights freshness and citations more heavily, Claude weights original analysis more heavily, Google AI Overview weights organic rank more heavily. The Statistics Addition lever appears to work on all four; the exact magnitude likely varies.
If I add statistics, will I see results in a week?
Maybe. Web-layer citations (Perplexity, ChatGPT with Search, Google AI Overview) refresh in 3-5 business days, so a substantive content fix can show up in citations within a week. Brand-level visibility shifts take longer. See how often AI engines update what they know about your brand.
How many statistics is too many?
The study doesn’t bound the upper end. Practitioner observation suggests 2-3 well-placed statistics per page (one every ~300-500 words of body content) is the right density. Cramming in 20 statistics on a page reads like a content farm and doesn’t produce additional lift.
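The "one statistic per ~300-500 words" heuristic above can be turned into a quick per-page budget. The 400-word midpoint and the floor of 2 are illustrative choices from the practitioner guidance in this post, not numbers from the study.

```python
# Rough statistics budget from page length; parameters are illustrative,
# not study-derived.
import math

def stat_budget(word_count, words_per_stat=400):
    # Floor of 2 per the guidance above, scaling at roughly
    # one statistic per 400 words for longer pages.
    return max(2, math.ceil(word_count / words_per_stat))

for n in (800, 1200, 3000):
    print(n, "words ->", stat_budget(n), "statistics")
```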
Where can I read the original paper?
The paper title is "GEO: Generative Engine Optimization" by Pranjal Aggarwal, Vishvak Murahari, Tanmay Rajpurohit, Ashwin Kalyan, Karthik R Narasimhan, and Ameet Deshpande. Published at KDD 2024 (Aug 2024). Search the paper title in Google Scholar for the PDF and the project page; the team has also released GEO-Bench publicly for replication.
Run a free SeenRank check now →
Related: What is generative engine optimization? · The 2026 GEO checklist · AI Search Visibility: the 2026 guide