feat: Add support for ordering by multiple fields. #2681

stuhood · 2025-08-07T19:20:53Z

What

Add a TopDocs::order_by method, which supports ordering by multiple fast fields and scores in one collection pass, as defined by the TopOrderable trait. The TopOrderable trait is implemented (by a macro) for tuples of length 1 through 3 (for now).

How

Add:

- a TopOrderable trait which is implemented for tuples, and a TopOrderableCollector to collect for it.
- a Feature trait which is implemented for Scores, and for fast fields.
  - To allow for boxing/dynamic dispatch of Features (which reduces code generation when the sort features are not known until runtime), Arc<dyn Feature> is implemented via ErasedFeature.
- a TopNCompare trait which can be used together with a LazyTopNComputer to lazily fetch features during TopN.
  - This new interface is necessary because TopNComputer does not allow for lazily fetching additional fields for the comparison tuple. Lazy fetching can eliminate a lot of IO when tiebreakers only rarely come into play in the comparison (because most values are eliminated by earlier features).
  - It could also allow for making DocId/DocAddress tiebreaking optional (see #2672 (comment)), via something like a "DocIdFeature".
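The lazy fetching of tiebreakers described in the list above can be sketched in isolation. This is a minimal illustration under stated assumptions, not the PR's `TopNCompare` or `LazyTopNComputer`: `CountingColumn`, `cmp_lazy`, and `top_one` are hypothetical names, and the "column" is a plain `Vec<u64>` that counts its reads so the saved lookups are visible.

```rust
use std::cell::Cell;
use std::cmp::Ordering;

/// Hypothetical stand-in for a fast-field column that counts how often it is read.
struct CountingColumn {
    values: Vec<u64>,
    reads: Cell<usize>,
}

impl CountingColumn {
    fn new(values: Vec<u64>) -> Self {
        CountingColumn { values, reads: Cell::new(0) }
    }

    fn get(&self, doc: usize) -> u64 {
        self.reads.set(self.reads.get() + 1);
        self.values[doc]
    }
}

/// Compare two docs by `primary`, reading `tiebreaker` only when `primary` ties.
fn cmp_lazy(
    primary: &CountingColumn,
    tiebreaker: &CountingColumn,
    a: usize,
    b: usize,
) -> Ordering {
    primary
        .get(a)
        .cmp(&primary.get(b))
        // `then_with` takes a closure, so the tiebreaker column is left
        // untouched unless the primary values are equal.
        .then_with(|| tiebreaker.get(a).cmp(&tiebreaker.get(b)))
}

/// Return the doc with the largest (primary, tiebreaker) key.
/// Assumes the columns hold at least one doc.
fn top_one(primary: &CountingColumn, tiebreaker: &CountingColumn) -> usize {
    let mut best = 0;
    for doc in 1..primary.values.len() {
        if cmp_lazy(primary, tiebreaker, doc, best) == Ordering::Greater {
            best = doc;
        }
    }
    best
}
```

With primary values `[5, 9, 9, 1]`, only the one comparison between the two `9`s touches the tiebreaker column. Every other comparison is decided by the first feature alone.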

This interface additionally could not use the CustomScorer APIs because they do not allow segments to Top-N a different type than their final output type (which is essential for ordering by Strings).
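Why a segment may need to Top-N a different type than the final output can be shown with string ordering. The sketch below is an assumption-laden model, not tantivy's API: `Segment`, `top_n_ords`, `decode`, and `merge_top_n` are hypothetical. Each segment ranks docs by cheap term ordinals, decodes only the surviving hits to `String`, and the cross-segment merge compares the decoded strings, because ordinals from different segments are not comparable.

```rust
/// Hypothetical segment: a sorted term dictionary plus one term ordinal per doc.
struct Segment {
    /// Sorted, so ordinal order matches string order within this segment.
    dictionary: Vec<String>,
    doc_ords: Vec<u64>,
}

impl Segment {
    /// Per-segment top-n by ascending string, computed purely on ordinals.
    fn top_n_ords(&self, n: usize) -> Vec<(u64, usize)> {
        let mut hits: Vec<(u64, usize)> = self
            .doc_ords
            .iter()
            .enumerate()
            .map(|(doc, &ord)| (ord, doc))
            .collect();
        hits.sort();
        hits.truncate(n);
        hits
    }

    /// Decode ordinals to strings, only for the docs that survived top-n.
    fn decode(&self, hits: Vec<(u64, usize)>) -> Vec<(String, usize)> {
        hits.into_iter()
            .map(|(ord, doc)| (self.dictionary[ord as usize].clone(), doc))
            .collect()
    }
}

/// Merge across segments on decoded strings, with (segment, doc) as the tiebreaker.
fn merge_top_n(segments: &[Segment], n: usize) -> Vec<(String, usize, usize)> {
    let mut merged: Vec<(String, usize, usize)> = Vec::new();
    for (seg_ord, segment) in segments.iter().enumerate() {
        for (value, doc) in segment.decode(segment.top_n_ords(n)) {
            merged.push((value, seg_ord, doc));
        }
    }
    merged.sort();
    merged.truncate(n);
    merged
}
```

The per-segment type here is `(u64, usize)` while the merged type is `(String, usize, usize)`, which is the kind of split the description says the existing APIs do not permit.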

Note

This patch isolates everything to one module, but should almost certainly be split up into multiple modules, and better integrated with the existing modules. I was hoping to get some feedback on it before rearranging things, but I'm very happy to do so!


Uses:

* a `TopOrderable` trait which can be derived for tuples
* a `TopOrderableCollector` to collect for it.
* a `TopNCompare` trait which can be used together with a `LazyTopNComputer` to lazily fetch columns during TopN.

Note that this does not use the `CustomScorer` API because:

1. `TopNComputer` does not allow for lazily fetching additional fields for the comparison tuple, which is important when tiebreakers are only rarely actually coming into play in the comparison, and most values are being eliminated by earlier columns.
2. `CustomScoreTopCollector` does not allow segments to Top-N a different type than their final output type, which is essential for ordering by `String`s.
3. In order to include scores as one of the ordering columns, we need to be able to optionally enable scores.
4. The `CustomScoreTopCollector::merge_fruits` function needs to operate over a wrapper type in order to apply different ordering globally than per-segment.

fulmicoton · 2025-08-11T14:37:34Z

I am surprised this requires macros @stuhood. Can we get away with doing only generics?

fulmicoton · 2025-08-11T14:42:02Z

src/collector/top_orderable.rs

+    fn get(
+        &self,
+        column: &FeatureColumn,
+        order: Order,


Suggested change

-        order: Order,

The comment is wrong. This method does not return the value for this feature: depending on the order, it can return the opposite.
At this point, I don't think it is a good idea to have order here. We could integrate the order within the Feature trait.

stuhood · 2025-08-12

> At this point, I don't think it is a good idea to have order here. We could integrate the order within the Feature trait.

By making it a generic parameter, or as a field?

I think that a bit more performance could be gained by making it a generic parameter (as TopNComputer now does): will give that a shot.
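One way to picture the "generic parameter" option from this exchange is a marker type that fixes the direction at compile time. This is only a sketch under assumptions: `Direction`, `Asc`, `Desc`, and `ColumnFeature` are hypothetical names, not the PR's `Feature` trait or tantivy's `Order` enum. It also keeps `get` returning the raw value, so the direction affects comparison only.

```rust
use std::cmp::Ordering;
use std::marker::PhantomData;

/// Hypothetical compile-time sort direction.
trait Direction {
    fn apply(ordering: Ordering) -> Ordering;
}

struct Asc;
struct Desc;

impl Direction for Asc {
    fn apply(ordering: Ordering) -> Ordering {
        ordering
    }
}

impl Direction for Desc {
    fn apply(ordering: Ordering) -> Ordering {
        ordering.reverse()
    }
}

/// Hypothetical feature over a column of u64 values, with the direction in its type.
struct ColumnFeature<D: Direction> {
    values: Vec<u64>,
    _direction: PhantomData<D>,
}

impl<D: Direction> ColumnFeature<D> {
    fn new(values: Vec<u64>) -> Self {
        ColumnFeature { values, _direction: PhantomData }
    }

    /// Returns the raw value: the direction never alters what `get` returns.
    fn get(&self, doc: usize) -> u64 {
        self.values[doc]
    }

    /// The direction is applied only here and is resolved at compile time,
    /// so there is no runtime branch on an order field.
    fn compare(&self, lhs: usize, rhs: usize) -> Ordering {
        D::apply(self.get(lhs).cmp(&self.get(rhs)))
    }
}
```

The trade-off is the usual one: a generic parameter monomorphizes one copy of the code per direction, while a field keeps a single copy and pays for a branch per comparison.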

fulmicoton · 2025-08-11T14:42:32Z

src/collector/top_orderable.rs

+    /// NOTE: We don't require a `PartialOrd` bound on the output type in order to make it possible
+    /// to use a boxed type like `OwnedValue` without giving it a `PartialOrd` implementation which
+    /// might be unsafe (i.e.: panicing) in other positions.
+    fn compare(&self, a: &Self::Output, b: &Self::Output) -> Option<Ordering>;


Suggested change

-    fn compare(&self, a: &Self::Output, b: &Self::Output) -> Option<Ordering>;
+    fn compare(&self, lhs: &Self::Output, rhs: &Self::Output) -> Option<Ordering>;

fulmicoton · 2025-08-11T14:42:54Z

src/collector/top_orderable.rs

+        }
+    }
+
+    fn compare(&self, a: &Self::Output, b: &Self::Output) -> Option<Ordering> {


stuhood · 2025-08-12T04:23:22Z

> I am surprised this requires macros @stuhood. Can we get away with doing only generics?

I don't think so... AFAIK it is not possible to abstract over the length of a tuple, and doing this without tuples requires that the types be boxed/dynamic or wrapped in enums.

I think that this is similar to the "SegmentCollectors for tuples" pattern over here:

tantivy/src/collector/mod.rs, lines 291 to 292 in 2e4615c:

    // -----------------------------------------------
    // Tuple implementations.

-- except implemented with a macro.
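For readers unfamiliar with the pattern, a macro that implements a trait for tuples of several lengths might look roughly like the following. This is a toy sketch, not the PR's macro: `DocCompare`, `ByColumn`, and `impl_doc_compare_for_tuple` are hypothetical names, and the trait is far smaller than `TopOrderable`.

```rust
use std::cmp::Ordering;

/// Hypothetical trait: compare two docs by one sort key.
trait DocCompare {
    fn compare_docs(&self, lhs: usize, rhs: usize) -> Ordering;
}

/// A single key backed by a column of values.
struct ByColumn(Vec<u64>);

impl DocCompare for ByColumn {
    fn compare_docs(&self, lhs: usize, rhs: usize) -> Ordering {
        self.0[lhs].cmp(&self.0[rhs])
    }
}

/// Implement `DocCompare` for a tuple of the given arity: compare by each
/// element in turn, moving on only when the earlier elements tie.
macro_rules! impl_doc_compare_for_tuple {
    ($($name:ident : $idx:tt),+) => {
        impl<$($name: DocCompare),+> DocCompare for ($($name,)+) {
            fn compare_docs(&self, lhs: usize, rhs: usize) -> Ordering {
                Ordering::Equal
                    $(.then_with(|| self.$idx.compare_docs(lhs, rhs)))+
            }
        }
    };
}

// One invocation per supported tuple length (1 through 3 here).
impl_doc_compare_for_tuple!(A: 0);
impl_doc_compare_for_tuple!(A: 0, B: 1);
impl_doc_compare_for_tuple!(A: 0, B: 1, C: 2);
```

Each invocation expands to an ordinary generic `impl`, so supporting a longer tuple means adding one more invocation line.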

fulmicoton · 2025-09-03T07:56:22Z

@stuhood

I think this is possible:

struct LexicographicComparator<HeadComparator, TailComparator> {
    head: HeadComparator,
    tail: TailComparator,
}

impl<HeadComparator: TopNCompare, TailComparator: TopNCompare> TopNCompare for LexicographicComparator<HeadComparator, TailComparator> {
    type Accepted = (HeadComparator::Accepted, TailComparator::Accepted);

    fn accept(
        &self,
        threshold_value: &Self::Accepted,
        threshold_doc_id: DocId,
        score: Score,
        doc_id: DocId,
    ) -> Option<Self::Accepted> {
        todo!();
    }

    fn get(&self, score: Score, doc_id: DocId) -> Self::Accepted {
        todo!()
    }
}

struct LexicographicOrderable<Head, Tail> {
    head: Head,
    tail: Tail,
}

impl<Head, Tail> TopOrderable for LexicographicOrderable<Head, Tail>
where
    Head: TopOrderable,
    Tail: TopOrderable,
{
    type SegmentOutput = (Head::SegmentOutput, Tail::SegmentOutput);

    type Output = (Head::Output, Tail::Output);

    type SegmentComparator = LexicographicComparator<Head::SegmentComparator, Tail::SegmentComparator>;

    fn requires_scoring(&self) -> bool {
        todo!()
    }

    fn segment_comparator(
        &self,
        segment_reader: &SegmentReader,
    ) -> crate::Result<Self::SegmentComparator> {
        todo!()
    }

    fn feature_columns(
        &self,
        segment_reader: &SegmentReader,
    ) -> crate::Result<Vec<(FeatureColumn, Order)>> {
        todo!()
    }

    fn decode(
        &self,
        features: &Vec<(FeatureColumn, Order)>,
        segment_output: Vec<(Self::SegmentOutput, DocAddress)>,
    ) -> Vec<(Self::Output, DocAddress)> {
        todo!()
    }

    fn compare(&self, a: &(Self::Output, DocAddress), b: &(Self::Output, DocAddress)) -> bool {
        todo!()
    }

}

You could then write an adapter for 1, 2, or 3 features without macros.
The current macro code is too hard for me to understand.

As for the collector over tuples: it used to be implemented that way.
After noticing that the API had stabilized, I ended up ditching it and wrote the current implementation, which uses neither generics nor macros and is just verbose.
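The head/tail composition suggested in this comment can be tried out on a much smaller trait. The sketch below is illustrative only: `SegmentCompare`, `ColumnKey`, `Lexicographic`, and `from_triple` are hypothetical names standing in for the `TopNCompare`/`TopOrderable` machinery, whose real methods are left as `todo!()` above.

```rust
use std::cmp::Ordering;

/// Hypothetical trait: compare two docs by one sort key.
trait SegmentCompare {
    fn compare_docs(&self, lhs: usize, rhs: usize) -> Ordering;
}

/// A single key backed by a column of values.
struct ColumnKey(Vec<u64>);

impl SegmentCompare for ColumnKey {
    fn compare_docs(&self, lhs: usize, rhs: usize) -> Ordering {
        self.0[lhs].cmp(&self.0[rhs])
    }
}

/// Compare by `head` first, and consult `tail` only when `head` ties.
struct Lexicographic<Head, Tail> {
    head: Head,
    tail: Tail,
}

impl<Head: SegmentCompare, Tail: SegmentCompare> SegmentCompare for Lexicographic<Head, Tail> {
    fn compare_docs(&self, lhs: usize, rhs: usize) -> Ordering {
        self.head
            .compare_docs(lhs, rhs)
            .then_with(|| self.tail.compare_docs(lhs, rhs))
    }
}

/// Adapter for three keys: nest them as head + (head + tail), no macro needed.
fn from_triple<A, B, C>(keys: (A, B, C)) -> Lexicographic<A, Lexicographic<B, C>>
where
    A: SegmentCompare,
    B: SegmentCompare,
    C: SegmentCompare,
{
    let (a, b, c) = keys;
    Lexicographic {
        head: a,
        tail: Lexicographic { head: b, tail: c },
    }
}
```

Because `Lexicographic` is itself a `SegmentCompare`, any number of keys can be expressed by nesting, and each arity needs only a small adapter function like `from_triple`.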

Adjusts `Feature` to use `Option` where necessary. `StringFeature::SegmentOutput` avoids being wrapped in an `Option` because the `u64::MAX` "niche" is safe to use in that case (since it would require that many terms to trigger a collision). Surprisingly, no performance impact downstream.

I additionally needed to adjust the unit tests to:

* include a NULL
* use a stable id, because for some reason at three segments (but not two?) the fact that segment ords are not stable was exposed.

stuhood · 2025-09-03T18:24:00Z

> @stuhood
>
> I think this is possible:
> ...

Interesting: thanks! Will give that a shot.

In the meantime, I just pushed a change to properly handle NULLs in TopOrderable.

I think that TopDocs::order_by_fast_field and TopDocs::order_by_string_fast_field currently do not do the right thing for missing values / NULLs: they use sort_column.first_or_default_col(default_value), but a default of either 0 or u64::MAX can collide with a real value. And in the case of string ordering, u64::MAX will trigger a lookup error.

I've adjusted the TopOrderable interface to encode Option<u64>, and (surprisingly?) it did not make a noticeable difference in Top-N performance.
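The collision described above, and how an `Option<u64>` key avoids it, can be demonstrated with the standard library alone. This is a sketch of the general idea rather than the PR's encoding: `key_with_default` and `sort_nulls_first` are hypothetical helpers, and the column is modeled as a slice of `Option<u64>`.

```rust
/// Sort key that substitutes a default for missing values.
/// A NULL and a real value equal to the default become indistinguishable.
fn key_with_default(value: Option<u64>, default_value: u64) -> u64 {
    value.unwrap_or(default_value)
}

/// Sort doc ids ascending by an optional column value.
/// `Option<u64>` orders `None` before every `Some`, so NULLs stay distinct
/// from both 0 and u64::MAX and come first.
fn sort_nulls_first(column: &[Option<u64>]) -> Vec<usize> {
    let mut docs: Vec<usize> = (0..column.len()).collect();
    docs.sort_by_key(|&doc| column[doc]);
    docs
}
```

For a column `[Some(0), None, Some(u64::MAX)]`, `sort_nulls_first` yields doc order `[1, 0, 2]`, whereas a default of `0` would make the first two docs tie.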

stuhood · 2025-09-26T19:39:47Z

This has been on the backburner, but I did a bit of work to apply this feedback a few weeks ago, and it looks good: thanks a lot! I'm also going to apply this feedback (#2681 (comment)) before posting, sometime in the next few weeks.

Leaving one note for myself about an additional helpful feature here: we'd additionally like the ability to preserve scores during a TopDocs::order_by, without necessarily including them in the feature columns (which would cause them to be included in the comparison). It's relatively minor, so it may be that the answer is to just go ahead and include the feature.


This was referenced Aug 7, 2025

feat: Add support for ordering by multiple fields. paradedb/tantivy#57

Merged

Fix TopNComputer for reverse order #2672

Merged

stuhood force-pushed the stuhood.generic-order-by-upstream branch 2 times, most recently from 407a1d8 to 1c82ef0 Compare August 8, 2025 17:42

stuhood added 2 commits August 8, 2025 14:09

Fix unrelated doc test.

140c506

stuhood force-pushed the stuhood.generic-order-by-upstream branch from 1c82ef0 to 140c506 Compare August 8, 2025 21:09

fulmicoton reviewed Aug 11, 2025

src/collector/top_orderable.rs

+        }
+    }
+
+    fn compare(&self, a: &Self::Output, b: &Self::Output) -> Option<Ordering> {

naming

stuhood mentioned this pull request Sep 26, 2025

feat: TopN scan can emit scores for any ORDER BY paradedb/paradedb#3230

Merged
