SEO and GEO for a Personal Site
I rebuilt this site as a small Next.js + MDX app and shipped it without thinking much about discovery. It loaded fast, it looked the way I wanted, and that was the bar. Then I asked the obvious next question: when someone searches "Bingran You" — on Google, on Bing, inside ChatGPT, inside Claude — what shows up?
The answer turned out to be "not much." So I spent a weekend doing the boring, invisible work that makes a personal site legible to both classical search and the new generation of LLM-driven answer engines. None of it changed how the site looks. All of it changed how the site is parsed.
This post is the field notes.
Two audiences, same plumbing
The split today is roughly:
- Classical search (Google, Bing) reads HTML, follows links, ranks pages.
- Generative engines (ChatGPT search, Claude with browsing, Perplexity, Cursor, Phind) consume the same web but with very different priorities. They love structured data, clean prose, machine-readable indexes, and clear entity signals. They tolerate messy HTML far less than Googlebot does.
The good news: the moves that help the first audience also help the second. The bad news: the visible moves (rewriting copy, redesigning) don't help much. The real wins are below the fold.
What I changed
1. Verify ownership in Google Search Console and Bing Webmaster
Both let you submit a sitemap, watch indexing status, and see what queries you actually rank for. Bing matters specifically because ChatGPT search, Copilot, and DuckDuckGo all index through it. I verified the apex via DNS TXT for Google, then dropped a BingSiteAuth.xml into public/.
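Both consoles want a sitemap URL. In the App Router that can be a single `app/sitemap.ts` that Next.js serves as `/sitemap.xml`. A minimal sketch — the route list and date here are placeholders, not the site's real inventory, and a real version would enumerate the MDX slugs:

```typescript
// app/sitemap.ts — Next.js serves the returned entries as /sitemap.xml.
// Routes and lastModified are illustrative placeholders.
const BASE = "https://bingranyou.com";

export default function sitemap() {
  const routes = ["", "/about", "/papers", "/projects", "/blog"];
  return routes.map((path) => ({
    url: `${BASE}${path}`,
    lastModified: new Date("2025-01-01"), // placeholder; use real mtimes
  }));
}
```

Once deployed, the same `/sitemap.xml` URL gets submitted to both Search Console and Bing Webmaster.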
2. One host of record
The site was reachable on both bingranyou.com and www.bingranyou.com. Google sees that as two sites and splits ranking signal between them. Fix: a permanent (308) redirect from www to apex via Next.js redirects() with a host matcher, plus an explicit metadata.alternates.canonical on every route. From now on there is exactly one URL Google can call canonical.
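The host-matched redirect looks roughly like this in `next.config.ts` — a sketch of the setup described above, where `permanent: true` is what makes Next.js emit a 308:

```typescript
// next.config.ts (sketch) — collapse www onto the apex host.
// `has` with a host matcher means the rule only fires for www requests;
// `permanent: true` produces a 308, which preserves the request method.
export const wwwToApex = {
  source: "/:path*",
  has: [{ type: "host" as const, value: "www.bingranyou.com" }],
  destination: "https://bingranyou.com/:path*",
  permanent: true,
};

const nextConfig = {
  async redirects() {
    return [wwwToApex];
  },
};

export default nextConfig;
```

The `:path*` capture in both `source` and `destination` keeps deep links intact, so `www.bingranyou.com/blog/x` lands on `bingranyou.com/blog/x` rather than the homepage.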
3. Per-entity structured data
The layout already had a Person and a WebSite JSON-LD block. I added per-route entities:
- Each paper as `ScholarlyArticle`, with `isPartOf` pointing to the venue and a `sameAs` link to arXiv or the journal DOI.
- Each project as `SoftwareSourceCode`, with `codeRepository` set when it lives on GitHub.
- Each blog post as `BlogPosting` with `datePublished`, `dateModified`, and a canonical `mainEntityOfPage`.
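A paper's block can be built with a small helper like the sketch below. The field names follow schema.org; the `Paper` shape and the author string are assumptions about what the MDX frontmatter carries, not the site's actual types:

```typescript
// Build a ScholarlyArticle JSON-LD object for one paper.
// The Paper interface is hypothetical; a real page would read these
// fields from MDX frontmatter.
interface Paper {
  title: string;
  venue: string;
  arxivUrl: string;
  slug: string;
}

export function scholarlyArticleJsonLd(paper: Paper) {
  return {
    "@context": "https://schema.org",
    "@type": "ScholarlyArticle",
    headline: paper.title,
    author: { "@type": "Person", name: "Bingran You" },
    isPartOf: { "@type": "Periodical", name: paper.venue },
    sameAs: paper.arxivUrl,
    mainEntityOfPage: `https://bingranyou.com/papers/${paper.slug}`,
  };
}
```

The object then gets serialized into a `<script type="application/ld+json">` tag with `JSON.stringify` inside the route's component.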
This is the difference between Google parsing your /papers page as "a list of links" versus parsing it as "five publications, each with an author, venue, and abstract." The second produces rich results. The first produces a blue link.
4. Dynamic Open Graph images
Every route segment now ships an opengraph-image.tsx that renders a 1200×630 PNG at build time via next/og. Cream paper, serif display title, mono wordmark — same vocabulary as the site itself. Slack, X, LinkedIn, iMessage, and most LLM previews now show a real card instead of a placeholder.
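The skeleton of such a file, using `next/og`, looks roughly like this — colors, fonts, and copy are placeholders standing in for the site's actual vocabulary:

```tsx
// app/opengraph-image.tsx (sketch) — Next.js renders this to a PNG
// and wires up the og:image meta tag automatically.
import { ImageResponse } from "next/og";

export const size = { width: 1200, height: 630 };
export const contentType = "image/png";

export default function Image() {
  return new ImageResponse(
    (
      <div
        style={{
          width: "100%",
          height: "100%",
          display: "flex",
          flexDirection: "column",
          justifyContent: "center",
          padding: 80,
          background: "#faf6ee", // cream paper (placeholder value)
          fontFamily: "serif",
        }}
      >
        <div style={{ fontSize: 64 }}>Bingran You</div>
        <div style={{ fontSize: 28, fontFamily: "monospace" }}>
          bingranyou.com
        </div>
      </div>
    ),
    size
  );
}
```

Dropping a copy of this file into a route segment overrides the image for that segment, so blog posts can render their own titles.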
5. /llms.txt and /llms-full.txt
The llmstxt.org convention is to expose two markdown files at the root: a short navigational index (/llms.txt) and a full-text bundle (/llms-full.txt). LLM crawlers actively look for them — they're the GEO equivalent of sitemap.xml. Mine include identity, social and scholarly profiles, the paper list, the project list, and (for the full version) the body of every blog post, read straight from the MDX source.
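Both files are just markdown strings, so they can be assembled from the same data the pages use. A sketch of the index builder — the `Link` shape and section names are assumptions, not the llmstxt.org spec, which only prescribes the H1 / blockquote-summary / link-list layout:

```typescript
// Assemble the short /llms.txt index: an H1, a one-line summary,
// then one H2 section of markdown links per content type.
interface Link {
  title: string;
  url: string;
}

export function buildLlmsTxt(
  name: string,
  summary: string,
  sections: Record<string, Link[]>
) {
  const lines = [`# ${name}`, "", `> ${summary}`, ""];
  for (const [heading, links] of Object.entries(sections)) {
    lines.push(`## ${heading}`);
    for (const l of links) lines.push(`- [${l.title}](${l.url})`);
    lines.push("");
  }
  return lines.join("\n");
}
```

A route handler at `app/llms.txt/route.ts` can then return the string with a `text/plain` content type; the full-text variant does the same but appends the MDX bodies.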
6. A real /about page
This is the only visible addition. It's a list-and-divider page, same vocabulary as the rest of the site, but every paragraph is a first-person factual sentence: I am a PhD candidate at UC Berkeley. I work in the Haeffner Lab. I do X and Y. LLMs ground entity queries on dense factual prose. A hero with a poetic tagline is fine for humans; an /about page with claims a model can lift verbatim is what answers questions like "who is Bingran You?"
The page also embeds ProfilePage schema with a mainEntity Person carrying jobTitle, affiliation, knowsAbout, and the full sameAs profile set.
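Shaped roughly like this — the `jobTitle`, `knowsAbout` entries, and the GitHub URL are illustrative placeholders; only the Wikidata QID comes from the item created below:

```typescript
// ProfilePage JSON-LD for /about (sketch; field values are placeholders
// except the Wikidata QID, which is the item discussed in this post).
export const profilePageJsonLd = {
  "@context": "https://schema.org",
  "@type": "ProfilePage",
  mainEntity: {
    "@type": "Person",
    name: "Bingran You",
    jobTitle: "PhD Candidate",
    affiliation: { "@type": "Organization", name: "UC Berkeley" },
    knowsAbout: ["artificial intelligence"],
    sameAs: [
      "https://www.wikidata.org/wiki/Q139620371",
      "https://github.com/example", // hypothetical profile URL
    ],
  },
};
```

The `sameAs` array is the glue: every profile listed here should eventually list `bingranyou.com` back, which is the bidirectional loop the Wikidata section below completes.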
7. A Wikidata item
The single highest-leverage off-site thing for entity recognition is a Wikidata item. Wikidata is the structured-data layer behind Google's Knowledge Graph and a major grounding source for LLM entity tables — once an item exists and links back to the canonical site, "Bingran You" stops being an ambiguous string and becomes a node every system can collapse to.
I created Q139620371 and added the statements that actually do work:
- `instance of` → `human`. Without this one, patrollers can flag the item as missing a notability claim and propose deletion within 24–72 hours.
- `ORCID iD`, `Google Scholar author ID`, `official website`. These are the four claims that, together, are enough for most search systems to merge profiles across the web.
- `occupation` (researcher, scientist), `field of work` (artificial intelligence), `educated at` (UC Berkeley). The "what does this person do" surface.
- `GitHub`, `LinkedIn`, `X (Twitter)` identifiers. Closing the loop with the platforms.
Every statement has a reference URL pointing at bingranyou.com/about or another authoritative source — referenced statements are dramatically less likely to be removed.
The site then declares the Wikidata QID outward in Person.sameAs, so both sides point at each other. That bidirectional reference is what Knowledge Graph and most LLM pipelines look for when deciding whether two mentions of "Bingran You" are the same entity.
A note on doing this with an agent: writing to a public, indexed knowledge base is the kind of action that should not be fully automated under a generic "I authorize you to operate the browser" instruction. The harness I was using blocked the actual publish clicks, even after explicit verbal approval — which is the right default. I filled the property and value fields automatically, then clicked publish by hand. About five minutes of clicks, and very much worth it.
What I deliberately didn't do
- Keyword-stuffed copy. Useless for LLMs, embarrassing on a personal site.
- AI-generated filler posts. Worse than no content; both Google and LLMs are increasingly hostile to it.
- Tracking and analytics noise. A personal site doesn't need a heatmap.
- Visual changes. The constraint was "keep the simple, elegant style." Almost everything above is invisible to a human visitor — it's all in the head, the meta tags, the structured data, the off-page assets.
What's left
The remaining levers are external and slow:
- Backlink closure — making sure GitHub, X, LinkedIn, ORCID, and arXiv submissions all point to `bingranyou.com`. The site already declares `sameAs` outward; now the platforms need to point inward.
- Per-paper Wikidata items — each publication can be its own item with `author` pointing back at me. SourceMD automates this from a DOI.
- Writing posts that are actually worth indexing. The only lever that compounds.
— Bingran