Labor Markets · February 14, 2026

What Does the Labor Market Look Like from Inside an LLM?

Last year, Seongwoon Kim, Yong-Yeol Ahn, and I published Labor Space at WWW'24 — a paper about using large language models to build a unified representation of the labor market. The idea was straightforward: instead of measuring similarity between jobs through job titles or occupational codes, which are inconsistent and sparse, could we embed the full text of job descriptions into a shared semantic space and reason about skill relationships from there?
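The core mechanic can be sketched in a few lines. Once each job description is mapped to a vector by some sentence-level encoder, similarity between jobs reduces to cosine similarity between vectors. The sketch below is illustrative only — the toy 4-dimensional vectors stand in for real LLM embeddings, and the job names are hypothetical, not data from the paper:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Stand-in embeddings. In practice each vector would come from an LLM
# encoder applied to the full job-description text, e.g.
#     vec = embed(job_description_text)
# Here we use toy 4-d vectors just to show the mechanics.
jobs = {
    "graphic_designer": np.array([0.9, 0.8, 0.1, 0.2]),
    "ux_researcher":    np.array([0.8, 0.9, 0.2, 0.1]),
    "retail_manager":   np.array([0.1, 0.2, 0.9, 0.3]),
}

sim_design = cosine_similarity(jobs["graphic_designer"], jobs["ux_researcher"])
sim_cross  = cosine_similarity(jobs["graphic_designer"], jobs["retail_manager"])

# Jobs with similar skill content sit close together in the space,
# regardless of what their titles suggest.
assert sim_design > sim_cross
```

The payoff of this setup is that similarity is computed from what the descriptions say the work involves, not from any predefined occupational code.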

The answer was yes. But what I want to write about here is not the method — it's the surprises.

What the model got right that we didn't expect

When you project job descriptions into an LLM's embedding space, you get a map. And like any good map, the most interesting thing about it is not the expected landmarks — it's the unexpected topology. Jobs that we would normally classify as far apart (say, a graphic designer and a UX researcher) turn out to live very close together in skill-space. Jobs that look similar on a resume (two "manager" roles in different industries) can be in completely different neighborhoods.

The model doesn't know what a "manager" is. It knows what managers do — and apparently that differs a lot.

This is more than a curiosity. It suggests that our standard occupational classification systems — O*NET, ISCO, KSOCs — are encoding social categories of work, not functional ones. The LLM, never told about these category boundaries, recovers something closer to the actual skill structure.

What the model got wrong

It also got some things wrong in illuminating ways. The embedding space reflects the text of job descriptions, which reflects how employers talk about work — not necessarily what workers actually do. Jobs in sectors with more elaborate HR writing (tech, finance, consulting) are described with more semantic richness. Jobs in physical labor sectors are often described in sparse, formulaic language that clusters them closer together than they really are.

In other words: the model is a mirror of how the labor market represents itself in language, which is not quite the same thing as the labor market.
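A crude way to see this asymmetry, without any embedding model at all, is lexical diversity. The sketch below uses a toy type-token ratio on two invented postings (nothing here is data from the paper) to show how formulaic language compresses the vocabulary an embedding has to work with:

```python
# Invented example postings: one in HR-polished language, one formulaic.
postings = {
    "tech": ("We seek a passionate, collaborative engineer to architect "
             "scalable, resilient, cloud-native data platforms."),
    "warehouse": ("Load and unload trucks. Lift up to 50 lbs. "
                  "Load and unload pallets as needed."),
}

def type_token_ratio(text: str) -> float:
    """Share of distinct words: a crude proxy for lexical richness."""
    tokens = text.lower().replace(".", " ").replace(",", " ").split()
    return len(set(tokens)) / len(tokens)

richness = {sector: type_token_ratio(text) for sector, text in postings.items()}

# Formulaic postings repeat themselves, so their ratio is lower; in
# embedding space, that repetition pulls such jobs artificially close.
assert richness["tech"] > richness["warehouse"]
```

The point of the toy is only directional: sectors described in sparse, repetitive language give the model less text to differentiate on, so their apparent clustering partly reflects writing convention rather than skill overlap.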

Why this matters for policy

If you're building a reskilling program, a job-matching platform, or a labor market forecasting tool, it matters enormously whether your similarity metric reflects actual skill adjacency or just textual convention. Our paper is a step toward the former. But the gap between "better than O*NET" and "actually correct" is worth taking seriously.

I'm continuing to work on this. If you're working on related problems — especially on the policy or measurement side — I'd genuinely love to talk.