
Search and matching algorithms for marketplaces: ranking, filtering, personalization

How production marketplaces design search — query understanding, filters, ranking signals, learning-to-rank, and the A/B testing discipline that separates winners from guesswork.

Marketplace search is not e-commerce search wearing a different hat. The candidate pool is two-sided, inventory is often perishable, supply moves, prices float, and every ranked list has to balance relevance against liquidity. Airbnb optimizes for booking probability. Thumbtack ranks pros by the odds they will respond and the customer will hire. DoorDash ranks restaurants by the intersection of user preference and the drivers available in the next twelve minutes. The underlying pattern is the same: rank by the probability of a successful transaction, not by keyword match. Here is how production teams actually build that system.

Query understanding comes before ranking

A ranker cannot save a bad candidate set. The first job of marketplace search is turning a messy query — often a natural-language string on mobile — into a structured filter intent that retrieval can act on. That means tokenization, spell correction, synonym expansion, and entity extraction: locations, dates, categories, price bands. In 2026, more teams route query understanding through a small LLM for intent classification and slot filling, then hand the structured output to a conventional retrieval layer. The LLM is not ranking results; it is deciding whether "quiet studio near downtown Austin with parking" means a location filter plus an amenity filter plus a latent vibe signal.
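To make that concrete, here is a minimal sketch of the structured output that step might produce. The `QueryIntent` shape and the rule-based `parseQuery` are illustrative stand-ins for an LLM slot-filler, not any production schema.

```typescript
// Hypothetical shape for the output of query understanding. Names are
// illustrative; a real system would have a richer, versioned schema.
type QueryIntent = {
  category?: string;
  location?: { text: string; radiusKm: number };
  priceBand?: { min?: number; max?: number };
  amenities: string[];
  freeText: string; // residual "vibe" terms left for the semantic layer
};

// A rule-based stand-in for the LLM intent-classification / slot-filling step.
function parseQuery(raw: string): QueryIntent {
  const intent: QueryIntent = { amenities: [], freeText: raw };
  if (/\bparking\b/i.test(raw)) intent.amenities.push("parking");
  const loc = raw.match(/near\s+([\w\s]+?)(?:\s+with|$)/i);
  if (loc) intent.location = { text: loc[1].trim(), radiusKm: 10 };
  if (/\bstudio\b/i.test(raw)) intent.category = "studio";
  return intent;
}
```

Running it on the example query above yields a category, an amenity filter, and a location slot, with the rest left as free text for the semantic layer.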

Real-time eligibility is the quiet killer here. Delivery zones, geo-dynamic availability, category gating, and regional pricing all have to be evaluated per query, per user — not baked into a static index. Most marketplaces end up with a two-stage system: a fast candidate retrieval that fetches a few thousand plausible matches, then a heavier ranker that re-scores the top slice with real-time signals.
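A minimal sketch of that two-stage shape, with hypothetical function names standing in for the index, the eligibility service, and the heavy ranker:

```typescript
// Two-stage retrieval sketch. retrieveCandidates, isEligibleNow, and rerank
// are illustrative stand-ins, not real APIs.
type Listing = { id: string; baseScore: number };

function search(
  retrieveCandidates: (limit: number) => Listing[], // stage 1: fast, index-backed
  isEligibleNow: (l: Listing) => boolean,           // per-query, per-user real-time gates
  rerank: (l: Listing) => number,                   // stage 2: heavier real-time scorer
  topK: number
): Listing[] {
  const candidates = retrieveCandidates(2000)       // a few thousand plausible matches
    .filter(isEligibleNow);                         // zones, availability, regional pricing
  return candidates
    .map((l) => ({ l, s: rerank(l) }))              // re-score the surviving slice
    .sort((a, b) => b.s - a.s)
    .slice(0, topK)
    .map((x) => x.l);
}
```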

Filters and sort orders — the user-facing surface

Filters are deterministic. Sort orders are opinionated. The biggest mistake teams make is hiding the default sort order behind a label like "Recommended" without logging what users do when they switch away from it. That one click — the sort dropdown change — is the single most valuable ranking signal you can instrument, because it is a clean declaration of user dissatisfaction with your default.

  • Filters should be stateless and composable. Avoid filter logic that implicitly depends on sort order or on other filters being set.
  • Sort orders should be named for the user's goal ("Lowest price", "Fastest delivery"), not the internal signal ("By score_v3").
  • Always log the raw query, the applied filters, the active sort, the ranked list shown, and what the user did next. That tuple is the training data for everything downstream.

Ranking signals — what actually moves the needle

Every mature marketplace converges on a similar palette of signals. What changes is the weight. Airbnb has publicly described using hundreds of signals clustered around booking probability. Thumbtack moved from a hand-tuned heuristic to a machine-learned ranker and was transparent about the categories that mattered most. The table below is the shortlist we start with on every new marketplace engagement — it covers the 80% of ranking value before any fine-tuning.

Signal category | Examples | Why it matters
--- | --- | ---
Quality | Average rating, review recency, completion rate, response time | Single strongest predictor of whether the match will convert and repeat
Recency | Last active, new-listing boost, last review within N days | Stale supply depresses the whole index; freshness is a liquidity signal
Geography | Haversine distance, delivery-zone polygon, local density | Proximity is almost always a non-negotiable filter and a ranking tiebreaker
Price fit | Price vs. category median, user's implied budget, dynamic pricing | Price outliers convert poorly even when quality is high
Behavioral | CTR, favorites, past bookings, dwell time on detail pages | Strongest personalization signal once you have 10+ sessions per user
Supply liquidity | Provider capacity, calendar availability, acceptance rate | Ranking a booked-out provider first wastes the slot; acceptance is half the transaction
Trust | Verification status, tenure on platform, dispute rate | Drives long-term retention even when short-term CTR is neutral

A starter scoring formula

Before investing in learning-to-rank infrastructure, most marketplaces ship a linear or log-linear scoring function hand-tuned from the signals above. It is ugly, it will get replaced, and it is almost always the right first step because it gives you a transparent baseline to A/B against.

// A first-cut marketplace ranker. Replace with LTR once you have training data.
type Candidate = {
  id: string;
  rating: number;          // 0..5
  reviewsRecent90d: number;
  distanceKm: number;
  priceVsMedian: number;   // fractional deviation from category median: 0 = at median, negative = cheaper
  acceptanceRate: number;  // 0..1
  isVerified: boolean;
  lastActiveDays: number;
};

const W = {
  quality: 0.30,
  recency: 0.15,
  distance: 0.20,
  priceFit: 0.10,
  liquidity: 0.15,
  trust: 0.10,
};

function score(c: Candidate): number {
  const quality = (c.rating / 5) * Math.min(1, c.reviewsRecent90d / 10);
  const recency = Math.exp(-c.lastActiveDays / 14);
  const distance = Math.exp(-c.distanceKm / 5);
  const priceFit = Math.exp(-Math.abs(c.priceVsMedian));
  const liquidity = c.acceptanceRate;
  const trust = c.isVerified ? 1 : 0.6;

  return (
    W.quality * quality +
    W.recency * recency +
    W.distance * distance +
    W.priceFit * priceFit +
    W.liquidity * liquidity +
    W.trust * trust
  );
}

Cold-start users have no behavioral signal, so personalization silently collapses to a global popularity ranker. That is a failure mode, not a feature. Give new users an onboarding signal (category, location, price band) within the first session and fall back to geographically-filtered category leaders — never a global bestseller list for a two-sided marketplace.
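One way to wire that fallback, sketched with hypothetical names (`rankerFor` and `categoryLeaders` are illustrative, and the 10-session threshold is a placeholder):

```typescript
// Cold-start routing sketch. categoryLeaders stands in for a lookup of top
// providers per (category, region); personalized for the behavioral ranker.
type UserProfile = { sessions: number; category?: string; region?: string };

function rankerFor(
  user: UserProfile,
  personalized: () => string[],
  categoryLeaders: (category: string, region: string) => string[]
): string[] {
  // Enough history: behavioral personalization is trustworthy.
  if (user.sessions >= 10) return personalized();
  // Otherwise use the onboarding signals to serve local category leaders,
  // never a global bestseller list.
  if (user.category && user.region) return categoryLeaders(user.category, user.region);
  // No signal at all yet: prompt for onboarding rather than guess globally.
  return [];
}
```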

From hand-tuned weights to learning-to-rank

The moment a hand-tuned scorer starts producing counterintuitive rankings in one vertical while looking great in another, it is time to train. Learning-to-rank (LTR) approaches like LambdaMART, XGBoost with pairwise loss, or a neural ranker trained on click-and-convert data will outperform a tuned linear combination once you have enough labeled interactions — usually in the hundreds of thousands per category.

  • Pointwise loss (predict booking probability per item) is simplest to train and explain, and usually the first rung.
  • Pairwise loss (given two items, which will the user pick?) reflects ranking directly and tends to outperform pointwise on ranking metrics like NDCG.
  • Listwise loss is the theoretically right answer for ranked lists but is rarely worth the added complexity below search-engine scale.
  • Train on transactions, not clicks. Clicks correlate with relevance but also with position bias — items near the top get clicked because they are near the top.
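One common way to mine pairwise labels from transaction logs is the skip-above heuristic: the converted item beats every item ranked above it that the user demonstrably passed over. A sketch, not any specific team's pipeline:

```typescript
// Pairwise label mining from a single logged session. Using the transaction
// (not clicks) as the positive, and only items shown above it as negatives,
// limits the position bias described above.
type Pair = { winner: string; loser: string };

function pairsFromSession(rankedIds: string[], convertedId: string): Pair[] {
  const pos = rankedIds.indexOf(convertedId);
  if (pos <= 0) return [];            // converted at top, or not shown: no preference signal
  return rankedIds
    .slice(0, pos)                    // items ranked above and skipped
    .map((loser) => ({ winner: convertedId, loser }));
}
```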

Personalization without the creep factor

Personalization is the highest-leverage lever once a marketplace has density. It is also the easiest place to break user trust. The pattern that works in production separates query-item relevance from user-item preference as two distinct signals, then combines them in the ranker. Personalization that silently overrides an explicit filter ("you asked for under $50 but we showed you $80 because you usually click premium") generates support tickets. Personalization that re-orders within a relevant set ("you prefer modern decor, so we surfaced the modern studios first") feels magical.
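The separation can be sketched as follows; the 0.7/0.3 blend is an illustrative placeholder, not a recommended weighting:

```typescript
// Hard filters gate the set; relevance and preference are scored separately;
// preference only re-orders items that already passed both gates.
type Scored = { id: string; relevance: number; preference: number };

function personalize(items: Scored[], passesFilters: (id: string) => boolean): string[] {
  return items
    .filter((i) => passesFilters(i.id))  // explicit filters are never overridden
    .map((i) => ({ id: i.id, s: 0.7 * i.relevance + 0.3 * i.preference })) // illustrative blend
    .sort((a, b) => b.s - a.s)
    .map((i) => i.id);
}
```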

Embedding-based retrieval is now standard for medium-to-large marketplaces. A two-tower model trains a user encoder and an item encoder so that a dot product approximates booking probability, and the item tower outputs are indexed in a vector database for millisecond retrieval. This does not replace the structured filter layer — it augments it for the long tail where keyword matching falls short.
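At serving time the retrieval step reduces to a nearest-neighbor lookup by dot product. Here is a brute-force sketch standing in for the vector index; the trained towers are assumed to exist elsewhere:

```typescript
// Brute-force nearest-neighbor retrieval over item-tower outputs. A real
// system would use an approximate index instead of a linear scan.
function dot(a: number[], b: number[]): number {
  return a.reduce((s, v, i) => s + v * b[i], 0);
}

function topKByDot(
  userVec: number[],                          // output of the user tower
  items: { id: string; vec: number[] }[],     // indexed item-tower outputs
  k: number
): string[] {
  return items
    .map((it) => ({ id: it.id, s: dot(userVec, it.vec) })) // dot product ≈ booking probability
    .sort((a, b) => b.s - a.s)
    .slice(0, k)
    .map((it) => it.id);
}
```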

A/B testing is the only honest feedback loop

Every ranking change feels like an improvement to the person who made it. The only way to know is an A/B test, and the only way to run A/B tests at marketplace scale without losing your statistical power is to get the experiment design right.

  • Randomize at the user level, not the session level. Session-level randomization contaminates retention metrics.
  • Define the primary metric before the experiment starts and pre-register the effect size you consider significant. Peeking at dashboards midway is how teams accidentally ship harmful changes.
  • Interleaving beats classic A/B for ranking experiments when you can implement it — Thumbtack published a strong writeup on this. A single user sees a merged list from two rankers and clicks reveal preference directly.
  • Watch for supply-side effects. A ranker that boosts conversion for buyers but starves a segment of sellers will destroy liquidity on a timescale longer than your experiment window.
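Team-draft interleaving, a commonly used variant, can be sketched in a few lines. The coin flip is injected as a parameter so the merge is testable; a real implementation would also credit clicks back to each team:

```typescript
// Team-draft interleaving sketch: two rankers alternate picks, a coin flip per
// round decides who drafts first, and each result remembers its ranker.
function teamDraft(
  a: string[],
  b: string[],
  coin: () => boolean
): { id: string; team: "A" | "B" }[] {
  const out: { id: string; team: "A" | "B" }[] = [];
  const used = new Set<string>();
  const next = (list: string[]) => list.find((id) => !used.has(id));
  while (out.length < a.length + b.length) {
    const order: ["A" | "B", "A" | "B"] = coin() ? ["A", "B"] : ["B", "A"];
    for (const team of order) {
      const id = next(team === "A" ? a : b);
      if (id !== undefined) {
        used.add(id);
        out.push({ id, team });
      }
    }
    if (next(a) === undefined && next(b) === undefined) break; // both rankers exhausted
  }
  return out;
}
```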

Guardrail metrics matter as much as primary metrics. A ranking change that lifts bookings 3% while cutting provider earnings variance in half is a different product than one that lifts bookings 3% by concentrating volume on the top 5% of sellers. Both ship as wins; only one keeps the marketplace healthy.
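One illustrative guardrail for that concentration failure: the share of transaction volume captured by the top few percent of sellers, tracked per experiment arm. A sketch:

```typescript
// Seller-concentration guardrail: fraction of total volume held by the top
// `fraction` of sellers. A rising value under a new ranker flags the
// concentration failure described above before it shows up in seller churn.
function topShare(volumeBySeller: number[], fraction = 0.05): number {
  const sorted = [...volumeBySeller].sort((x, y) => y - x);
  const cut = Math.max(1, Math.ceil(sorted.length * fraction));
  const total = sorted.reduce((s, v) => s + v, 0);
  const top = sorted.slice(0, cut).reduce((s, v) => s + v, 0);
  return total === 0 ? 0 : top / total;
}
```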

Common failure modes

  • Over-personalization — the marketplace collapses into a filter bubble and new supply never surfaces, starving the long tail.
  • Under-logging — teams try to retrofit ranking intelligence onto a system that never logged impressions, which makes every analysis a guess.
  • Heuristic sprawl — the hand-tuned scorer grows 40 signals, nobody remembers what half of them do, and regressions become untraceable.
  • Ignoring the supply side — rankers optimized purely for buyer conversion slowly hollow out provider retention, and the marketplace ages out.

Key takeaways

  • Rank by probability of a successful transaction, not keyword match. Everything else is implementation detail.
  • Start with a transparent hand-tuned scorer over six or seven signals — quality, recency, distance, price fit, liquidity, trust, behavior — before investing in LTR.
  • Handle cold-start users with explicit onboarding signals and category-level fallbacks. Do not paper over the gap with global popularity.
  • Separate query-item relevance from user-item preference so personalization augments filtering instead of silently overriding it.
  • Every ranking change goes behind an A/B test with a pre-registered primary metric and supply-side guardrails. The marketplace has two sides; the metrics do too.
#marketplaces #search #ranking #learning-to-rank #personalization #ab-testing