The Accidental Taxonomist (2010–2012)

Chaun Burnette
May 26
6 min read

Part 2 of The "Quantum" Ontology Chronicles

Several concepts in this series may appear adjacent to existing ideas in ontologies, knowledge graphs, contextual AI, semantic reasoning, and enterprise governance systems. That overlap is intentional. But the core execution model behind “quantum” ontologies is not simply contextual filtering or role-based semantic views.

As the series progresses, I’ll fully unpack the distinction between traditional semantic architectures and runtime semantic collapse/convergence, including why the execution model matters more than the terminology itself.

Abstract editorial illustration showing the transformation of raw, chaotic travel data from multiple sources on the left — overlapping text fragments, price symbols, and inconsistent formats — passing through a geometric classification layer at center, and emerging on the right as a clean, color-coded knowledge graph with interconnected attribute nodes representing cuisine, ambiance, price sensitivity, and weather dependency.

The goal wasn't to invent anything.

The goal was simply to get Excel to automatically ingest dozens of sources, classify hundreds of attributes, across thousands of records. That's all. Seems reasonable, right? This poor attempt of sarcasm means it clearly was a major frustration.

What wasn't visible from that vantage point (hunched over a laptop, writing the fourteenth nested IF statement of the afternoon) was that this particular frustration would turn into something amazing. Something that compounds over the next fourteen years. Growing. Mutating. Becoming something that would eventually be called a "quantum" ontology.

But that's getting ahead of the story.

In 2010, the problem wasn't ontologies. It was travel data. Specifically, it was the question of why a restaurant's "casual dining" classification on TripAdvisor had almost nothing in common with the same phrase used by a food blogger in Charleston.

That gap, even though it seemed small, semantic, and apparently insignificant, turned out to be the first crack in a very large wall.

What Was Actually Being Built

The project was AskCONCIE: a platform built on a single, ambitious premise. Not just personalized travel recommendations. Hyper-personalized recommendations. The kind that understands why a specific rooftop bar is exactly right on a Tuesday in October and completely wrong on a Saturday in July.

For that to work, the platform needed to reason about places, not just retrieve them. And to reason about them, it needed a dataset that was semantically coherent. Every attribute had to mean something consistent. Every relationship between attributes had to be explicit. Every record had to carry enough contextual richness to support a nuanced, real-time inference.

The raw data was everything but that.

Point-of-interest records were coming from Yahoo Travel, TripAdvisor, Yelp, and a network of local publications (food blogs, city guides, independent travel writers). The large aggregator platforms were structured and consistent, but shallow. A restaurant might have a cuisine type, a price range, and a neighborhood tag. That was it. The local publications were the opposite: rich, textured, full of the contextual detail that actually makes a recommendation feel personal, but wildly inconsistent in how they expressed it.

Getting them to speak the same language was the problem.

The tool available was Excel.

So a classification layer took shape inside it. Formula by formula. VB function by VB function. Lookup table by lookup table. The goal was to take that messy, multi-source input and return a single, normalized, confidence-weighted output for each attribute. Something the platform could actually use.

For a restaurant, that meant classifying cuisine type, price range, ambiance, noise level, and dress code. Whether outdoor seating was available, and in what seasons. Neighborhood character, time-of-day relevance, whether the experience skewed local or tourist. And for every classification: how much confidence did the sources actually earn, given where they agreed and disagreed?

Custom VB functions and formulas evaluated multiple source fields simultaneously. They flagged records where two sources conflicted on the same attribute. They applied different classification logic based on the type of POI being evaluated.

At this point, the concept of an ontology was not yet known.

But one was being built.

The Vocabulary Problem

The deeper the classification work went, the more the same wall kept appearing from different directions.

The same word meant different things depending on who wrote it and why.

"Casual dining" on TripAdvisor didn't match "casual dining" in a local food column. "Family-friendly" from a hotel aggregator had almost no relationship to "family-friendly" from a parent reviewing a science museum. "Walkable neighborhood" from a travel influencer was a different concept entirely than the same phrase from a disability travel resource.

To make the classification layer work, term libraries had to be built. Controlled mappings that said: when source A uses this phrase in this context, it means this canonical attribute. The formal term "controlled vocabularies" wasn't known yet. They were just referred to as lookup tables.

But that's what they were.

And underneath the lookup tables, something more important was taking shape: the recognition that meaning doesn't live in the word. It lives in the context surrounding the word.

There wasn't language for that insight yet. In 2010, it just meant the lookup tables kept getting longer.

The Rule That Broke Everything

The attribute that forced the deepest rethinking was "expensive."

The working rule seemed clean: flag any restaurant with an average entrée price above $40 as expensive. Consistent. Scalable. Done.

Then the edge cases arrived.

Three-panel horizontal infographic illustrating why expensive cannot be a universal classification rule: Panel one shows a simple price threshold rule applied universally; Panel two contrasts three user types — a young solo traveler for whom twenty-five dollars is expensive, a business executive for whom forty dollars is entry-level, and a family of four for whom per-person price is the wrong metric entirely; Panel three reframes the platform's role as learning what expensive means to each individual user, with a closing line reading 'This is semantic particularism. We just didn't know what to call it yet.

A 23-year-old solo traveler on a backpacker budget considers $25 expensive. A 46-year-old executive celebrating a client dinner considers $40 approachable. A family of four doesn't care about per-person entrée price, they're calculating a total bill and checking whether there's a kids' menu.

The rule was updated to account for those groups. Then the rule broke again.

Because even within those groups, individuals varied. A 23-year-old on a tech salary in San Francisco has a completely different threshold than a 23-year-old graduate student in Richmond, Virginia. The segment wasn't the answer. The individual in a specific moment, for a specific occasion, with a specific purpose was the only unit that actually mattered.

The platform's responsibility wasn't to apply a universal rule. It was to understand what "expensive" actually meant to this person, on this occasion, in this context. (The mechanisms behind how the platform does that are something we protect. The principle is what matters here.)

What was being built, without yet having a name for it, were increasingly complex conditional structures, not to enforce a classification, but to determine when any given classification actually applied.

Decision tree flowchart titled 'Why Expensive Is Never Just Expensive' showing how a raw data input of average entrée price thirty-eight dollars branches across three user types — young solo traveler, business professional, and family of four — then further branches by occasion context and observed platform signals, converging on a context-specific classification output rather than a universal rule, with a footnote noting that how the platform observes and validates user context is proprietary.

The label "expensive" wasn't wrong. The assumption that it applied consistently across people, contexts, and moments was.

Years later, a philosophical framework would surface that named this precisely: semantic particularism, the recognition that no rule or category applies identically across all situations. Meaning is contextual, not universal. Classification has to respond to the particular, not default to the general.

What the Technical Evolution Actually Looked Like

From the outside, this period looks like "building spreadsheets." From the inside, it was a progressive discovery of what structured semantic reasoning actually requires.

Multi-source normalization came first. Evaluating a field across three or four source columns and returning a single confidence-weighted classification. If TripAdvisor said "Italian" and a local food blog said "Neapolitan," the output wasn't a coin flip. It was a reasoned synthesis with a confidence score attached.

Attribute hierarchies emerged from the normalization work. "Outdoor dining" needed to be understood as a child of "dining," which was a child of "restaurant." Parent-child relationships weren't just organizational, they enabled inheritance. An attribute applied to a parent could propagate to children under the right conditions.

Conditional classification logic followed. VB functions that changed their output based on combinations of attributes, not just single field values. A noise level classification meant something different for a business lunch venue than it did for a first date. The conditions around a classification mattered as much as the classification itself.

Cross-source validation completed the picture. Preserving disagreement as signal rather than discarding it as noise. When two sources conflicted on the same attribute, that conflict carried information. Low confidence on a classification was itself a meaningful data point.

None of this had names yet. But the architecture was real. Something was being built that reasoned about relationships and context, not just rows and columns.

The Scale Wall

By 2011, the Excel file had become something genuinely unwieldy.

Hundreds of formulas. Dozens of VB functions. Thousands of POI records, each classified across scores of attributes, each carrying its source lineage and confidence weight.

Then one morning, Excel 2007 stopped opening it.

Not a crash. Not an error message. Just refusal. The file had accumulated more calculations than Excel 2007 could handle at startup. Upgrading to Excel 2010 solved the problem and bought more runway. But the moment was clarifying in a way that months of compounding complexity had been building toward:

The container had been outgrown.

The classification layer worked. The semantic relationships were real. The logic held. But Excel had made one thing unmistakably clear.

This could not scale.

Where This Led

By the end of 2012, something functional existed. A normalized, semantically structured dataset that could support nuanced, attribute-rich recommendations. What didn't exist was a path forward that wouldn't eventually hit the same wall, just bigger.

Taking AskCONCIE toward what had been envisioned, a platform that could reason about a user's intent in real time, across thousands of locations and millions of data points, required more than a larger container. It required infrastructure that could express semantic relationships natively. Not as an afterthought bolted onto rows and columns.

That meant SQL. Then JSON. And eventually a direct confrontation with what relational databases and document stores could and couldn't do when the thing being modeled isn't just data.

It's meaning.

That confrontation is where the next chapter begins.

Next: Part 3 – The Birth of Hyper-Personalization (2013-2016)

When Excel gave way to SQL and JSON, the semantic problems didn't disappear — they grew.

Previous Post: Part 1 - When Reality Depends on Who's Asking