-
-
Notifications
You must be signed in to change notification settings - Fork 818
feat: Add support for ordering by multiple fields. #2681
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
feat: Add support for ordering by multiple fields. #2681
Conversation
407a1d8 to
1c82ef0
Compare
## What
Add a `TopDocs::order_by` method, which supports ordering by multiple fast fields and scores in one collection pass, as defined by the `TopOrderable` trait. The `TopOrderable` trait is implemented (by a macro) for tuples of length 1 through 3 (for now).
## How
Add:
* a `TopOrderable` trait which is implemented for tuples, and a `TopOrderableCollector` to collect for it.
* a `Feature` trait which is implemented for `Score`s, and for fast fields.
* To allow for boxing/dynamic dispatch of `Features` (which reduces code generation when the sort columns are not known until runtime), `Arc<dyn Feature>` is implemented via `ErasedFeature`.
* a `TopNCompare` trait which can be used together with a `LazyTopNComputer` to lazily fetch columns during TopN.
* This new interface is necessary because `TopNComputer` does not allow for lazily fetching additional fields for the comparison tuple, which can eliminate a lot of IO when tiebreakers are only rarely actually coming into play in the comparison (because most values are being eliminated by earlier columns).
* It could also allow for making `DocId`/`DocAddress` tiebreaking optional ([see](quickwit-oss#2672 (comment))), via something like a "`DocIdFeature`".
This interface additionally could not use the `CustomScorer` APIs because it does not allow segments to Top-N a different type than their final output type (which is essential for ordering by `String`s).
## Note
This patch isolates everything to one module, but should almost certainly be split up into multiple modules, and better integrated with the existing modules. I was hoping to get some feedback on it before rearranging things, but I'm very happy to do so!
----
Upstream at quickwit-oss#2681
Uses: * a `TopOrderable` trait which can be derived for tuples * a `TopOrderableCollector` to collect for it. * a `TopNCompare` trait which can be used together with a `LazyTopNComputer` to lazily fetch columns during TopN. Note that this does not use the `CustomScorer` API because: 1. `TopNComputer` does not allow for lazily fetching additional fields for the comparison tuple, which is important when tiebreakers are only rarely actually coming into play in the comparison, and most values are being eliminated by earlier columns. 2. `CustomScoreTopCollector` does not allow segments to Top-N a different type than their final output type, which is essential for ordering by `String`s. 3. In order to include scores as one of the ordering columns, we need to be able to optionally enable scores. 4. The `CustomScoreTopCollector::merge_fruits` function needs to operate over a wrapper type in order to apply different ordering globally than per-segment.
1c82ef0 to
140c506
Compare
|
I am surprised this requires macros @stuhood. Can we get away with doing only generics? |
| fn get( | ||
| &self, | ||
| column: &FeatureColumn, | ||
| order: Order, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| order: Order, |
The comment is wrong. This method is not returning the value for this feature. (depending on the order it can return the opposite).
At this point, I don't think this is a good idea to have order. We could integrate the order within the Feature trait.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At this point, I don't think this is a good idea to have order. We could integrate the order within the Feature trait.
By making it a generic parameter, or as a field?
I think that a bit more performance could be gained by making it a generic parameter (as TopNComputer now does): will give that a shot.
| /// NOTE: We don't require a `PartialOrd` bound on the output type in order to make it possible | ||
| /// to use a boxed type like `OwnedValue` without giving it a `PartialOrd` implementation which | ||
| /// might be unsafe (i.e.: panicing) in other positions. | ||
| fn compare(&self, a: &Self::Output, b: &Self::Output) -> Option<Ordering>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| fn compare(&self, a: &Self::Output, b: &Self::Output) -> Option<Ordering>; | |
| fn compare(&self, lhs: &Self::Output, rhs: &Self::Output) -> Option<Ordering>; |
| } | ||
| } | ||
|
|
||
| fn compare(&self, a: &Self::Output, b: &Self::Output) -> Option<Ordering> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
naming
I don't think so... AFAIK it is not possible to abstract over the length of a tuple, and doing this without tuples requires that the types be boxed/dynamic or wrapped in enums. I think that this is similar to the " Lines 291 to 292 in 2e4615c
|
|
I think this is possible: You could then write an adapter for 1,2,3 features without macros. For the collector over tuples. It used to be implemented that way. |
Adjusts `Feature` to use `Option` where necessary. `StringFeature::SegmentOutput` avoids being wrapped in an `Option` because the `u64::MAX` "niche" is safe to use in that case (since it would require that many terms to trigger a collision). Surprisingly, no performance impact downstream. I additionally needed to adjust the unit tests to: * include a NULL * use a stable id, because for some reason at three segments (but not two?) the fact that segment ords are not stable was exposed.
Interesting: thanks! Will give that a shot. In the meantime, I just pushed a change to properly handle NULLs in I think that I've adjusted the |
|
This has been on the backburner, but I did a bit of work to apply this feedback a few weeks ago, and it looks good: thanks a lot! I'm also going to apply this feedback before posting: #2681 (comment) : sometime in the next few weeks. Leaving one note for myself about an additional helpful feature here: we'd additionally like the ability to preserve scores during a |
## What
Add a `TopDocs::order_by` method, which supports ordering by multiple fast fields and scores in one collection pass, as defined by the `TopOrderable` trait. The `TopOrderable` trait is implemented (by a macro) for tuples of length 1 through 3 (for now).
## How
Add:
* a `TopOrderable` trait which is implemented for tuples, and a `TopOrderableCollector` to collect for it.
* a `Feature` trait which is implemented for `Score`s, and for fast fields.
* To allow for boxing/dynamic dispatch of `Features` (which reduces code generation when the sort columns are not known until runtime), `Arc<dyn Feature>` is implemented via `ErasedFeature`.
* a `TopNCompare` trait which can be used together with a `LazyTopNComputer` to lazily fetch columns during TopN.
* This new interface is necessary because `TopNComputer` does not allow for lazily fetching additional fields for the comparison tuple, which can eliminate a lot of IO when tiebreakers are only rarely actually coming into play in the comparison (because most values are being eliminated by earlier columns).
* It could also allow for making `DocId`/`DocAddress` tiebreaking optional ([see](quickwit-oss#2672 (comment))), via something like a "`DocIdFeature`".
This interface additionally could not use the `CustomScorer` APIs because it does not allow segments to Top-N a different type than their final output type (which is essential for ordering by `String`s).
## Note
This patch isolates everything to one module, but should almost certainly be split up into multiple modules, and better integrated with the existing modules. I was hoping to get some feedback on it before rearranging things, but I'm very happy to do so!
----
Upstream at quickwit-oss#2681
## What
Add a `TopDocs::order_by` method, which supports ordering by multiple fast fields and scores in one collection pass, as defined by the `TopOrderable` trait. The `TopOrderable` trait is implemented (by a macro) for tuples of length 1 through 3 (for now).
## How
Add:
* a `TopOrderable` trait which is implemented for tuples, and a `TopOrderableCollector` to collect for it.
* a `Feature` trait which is implemented for `Score`s, and for fast fields.
* To allow for boxing/dynamic dispatch of `Features` (which reduces code generation when the sort columns are not known until runtime), `Arc<dyn Feature>` is implemented via `ErasedFeature`.
* a `TopNCompare` trait which can be used together with a `LazyTopNComputer` to lazily fetch columns during TopN.
* This new interface is necessary because `TopNComputer` does not allow for lazily fetching additional fields for the comparison tuple, which can eliminate a lot of IO when tiebreakers are only rarely actually coming into play in the comparison (because most values are being eliminated by earlier columns).
* It could also allow for making `DocId`/`DocAddress` tiebreaking optional ([see](quickwit-oss#2672 (comment))), via something like a "`DocIdFeature`".
This interface additionally could not use the `CustomScorer` APIs because it does not allow segments to Top-N a different type than their final output type (which is essential for ordering by `String`s).
## Note
This patch isolates everything to one module, but should almost certainly be split up into multiple modules, and better integrated with the existing modules. I was hoping to get some feedback on it before rearranging things, but I'm very happy to do so!
----
Upstream at quickwit-oss#2681
## What
Add a `TopDocs::order_by` method, which supports ordering by multiple fast fields and scores in one collection pass, as defined by the `TopOrderable` trait. The `TopOrderable` trait is implemented (by a macro) for tuples of length 1 through 3 (for now).
## How
Add:
* a `TopOrderable` trait which is implemented for tuples, and a `TopOrderableCollector` to collect for it.
* a `Feature` trait which is implemented for `Score`s, and for fast fields.
* To allow for boxing/dynamic dispatch of `Features` (which reduces code generation when the sort columns are not known until runtime), `Arc<dyn Feature>` is implemented via `ErasedFeature`.
* a `TopNCompare` trait which can be used together with a `LazyTopNComputer` to lazily fetch columns during TopN.
* This new interface is necessary because `TopNComputer` does not allow for lazily fetching additional fields for the comparison tuple, which can eliminate a lot of IO when tiebreakers are only rarely actually coming into play in the comparison (because most values are being eliminated by earlier columns).
* It could also allow for making `DocId`/`DocAddress` tiebreaking optional ([see](quickwit-oss#2672 (comment))), via something like a "`DocIdFeature`".
This interface additionally could not use the `CustomScorer` APIs because it does not allow segments to Top-N a different type than their final output type (which is essential for ordering by `String`s).
## Note
This patch isolates everything to one module, but should almost certainly be split up into multiple modules, and better integrated with the existing modules. I was hoping to get some feedback on it before rearranging things, but I'm very happy to do so!
----
Upstream at quickwit-oss#2681
What
Add a
TopDocs::order_bymethod, which supports ordering by multiple fast fields and scores in one collection pass, as defined by theTopOrderabletrait. TheTopOrderabletrait is implemented (by a macro) for tuples of length 1 through 3 (for now).How
Add:
TopOrderabletrait which is implemented for tuples, and aTopOrderableCollectorto collect for it.Featuretrait which is implemented forScores, and for fast fields.Features(which reduces code generation when the sort features are not known until runtime),Arc<dyn Feature>is implemented viaErasedFeature.TopNComparetrait which can be used together with aLazyTopNComputerto lazily fetch features during TopN.TopNComputerdoes not allow for lazily fetching additional fields for the comparison tuple, which can eliminate a lot of IO when tiebreakers are only rarely actually coming into play in the comparison (because most values are being eliminated by earlier features).DocId/DocAddresstiebreaking optional (see), via something like a "DocIdFeature".This interface additionally could not use the
CustomScorerAPIs because it does not allow segments to Top-N a different type than their final output type (which is essential for ordering byStrings).Note
This patch isolates everything to one module, but should almost certainly be split up into multiple modules, and better integrated with the existing modules. I was hoping to get some feedback on it before rearranging things, but I'm very happy to do so!