[KinoSearch] passing positions

Nathan Kurz nate at verse.com
Thu Sep 6 00:17:47 PDT 2007



After sending you this message, I worked on this particular approach
enough to convince myself that it was a dead end.  I've moved on to
another approach that I hope you'll like better, and while I don't
have it working yet I think I'm convinced it can be made to work.
There is still a Match structure, but it doesn't nest.  Instead, the
existing Scorer hierarchy is used.  I'll send you more details in a
followup message, and respond more generally here.

> FWIW, I've written up the naming principles I follow as a blog entry:
> <http://www.rectangular.com/blog/my_name_is_variable.html>.

Yes, I can agree with most of that.  Personally, I have less fear of
long identifiers and find 'string_compare' to be clearer to purpose
than 'scomp'.  :)

> One of your inventions is Scorer_Advance.  I like it as a substitute
> for Scorer_Next, and it might be worth a global search and replace
> since that method isn't public yet. :)  However, in your code it
> appears to be a substitute for Scorer_Skip_To.

I'm hoping to collapse those two down to a single function.
Currently, I'm thinking that function is Scorer_Match(), to emphasize
that the contents of the Match struct are available only until the
next such call, in a manner parallel to Scorer_Tally().
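
To make the idea concrete, here is a rough sketch of how one call could
cover both Next and Skip_To.  Everything in it is made up for
illustration: ToyScorer, ToyScorer_Match(), and the layout of Match are
assumptions, not code from KinoSearch.  The point is only that the
returned Match lives inside the scorer and is overwritten by the next
call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, pared-down Match: only the doc number is shown. */
typedef struct Match {
    uint32_t doc_num;
} Match;

/* Hypothetical scorer over a sorted array of doc numbers. */
typedef struct ToyScorer {
    const uint32_t *doc_nums;  /* sorted ascending */
    size_t          num_docs;
    size_t          tick;      /* index of the next unread posting */
    Match           match;     /* reused on every call */
} ToyScorer;

/* Return the first unread posting whose doc_num is >= target, or NULL
 * when the postings are exhausted.  A target of 0 behaves like a plain
 * "next"; a larger target behaves like "skip to".  The returned pointer
 * refers to storage inside the scorer, so its contents are only valid
 * until the next call. */
Match *ToyScorer_Match(ToyScorer *self, uint32_t target) {
    assert(self != NULL);
    while (self->tick < self->num_docs
           && self->doc_nums[self->tick] < target) {
        self->tick++;
    }
    if (self->tick >= self->num_docs) {
        return NULL;
    }
    self->match.doc_num = self->doc_nums[self->tick];
    self->tick++;
    return &self->match;
}
```

A caller that needs a value to survive past the next call has to copy
it out of the Match first.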

>  From a DRY standpoint, it would be nice to have a single
> PhraseScorer working over sub-scorers rather than having one which
> uses sub-Scorers and one which uses PostingLists.

Yes, I think that PhraseScorer should use a subscorer and not PostingLists.
That said, it may be simpler to restrict complexity of that subscorer
at least temporarily so that we don't have to start with a fully
recursive phrase scorer.

Something like allowing:
PhraseScorer -> AndScorer -> [TermScorer TermScorer TermScorer]
and not yet handling:
PhraseScorer -> AndScorer -> [OrScorer PhraseScorer AndScorer]

> I think similar reasoning led you to Match and me to Tally.

Well, that and the hope that if I paralleled Match and Tally you'd
like the idea better :).

> > The trickiness (and I don't like trickiness) is that each Match is
> > allowed to contain either an array of positions, or an array of Match
> > structs:
>
> I doubt that's necessary.  Just create a default wrapper at the
> lowest level.  That's how TermScorer does things presently.

I fear the trickiness is still necessary at some level, but I think
I've managed to hide it in a place you'll like better.  Essentially,
I'm going to propose two main subclasses for Scorer, MultiScorer and
MatchScorer.  MultiScorers contain a public VArray of other Scorers,
while MatchScorers contain a public Match struct.
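
In rough C terms, the two subclasses might be laid out something like
the sketch below.  This is only an assumption about the shape: the
"kind" tag stands in for whatever class test the object system
provides, a plain pointer array stands in for the VArray, and the
Match fields and Scorer_Peek_Match() are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for class identity; hypothetical. */
typedef enum { SCORER_MULTI, SCORER_MATCH } ScorerKind;

typedef struct Scorer {
    ScorerKind kind;
} Scorer;

/* Hypothetical Match contents. */
typedef struct Match {
    uint32_t        doc_num;
    uint32_t        num_positions;
    const uint32_t *positions;
} Match;

/* A leaf: exposes one public Match struct. */
typedef struct MatchScorer {
    Scorer base;   /* first member, so a MatchScorer* is also a Scorer* */
    Match  match;
} MatchScorer;

/* A container: exposes its child scorers (plain array instead of VArray). */
typedef struct MultiScorer {
    Scorer   base;
    Scorer **children;
    size_t   num_children;
} MultiScorer;

/* Return the public Match of a MatchScorer, or NULL for a MultiScorer,
 * whose matches have to be reached through its children instead. */
const Match *Scorer_Peek_Match(const Scorer *scorer) {
    assert(scorer != NULL);
    if (scorer->kind != SCORER_MATCH) {
        return NULL;
    }
    return &((const MatchScorer *)scorer)->match;
}
```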

> This variable name violates my "avoid overload overload" rule. :)
> "field" has a very specific meaning in the context of KS and this
> isn't it.

I agree with you in general, but I thought this was the specific
meaning.  It's removed from Match in my new incarnation, but would
you prefer it to be called 'index_field' or 'field_num'?

> I think we can avoid this union.  See below.

Yes, it's gone in the current incarnation.  Unfortunately, what it
switches to is a run-time type check for OBJ_IS_A(MultiScorer).

> This was the driving factor behind the ScoreProx class.

I've forgotten the details, but I came to the conclusion that
ScoreProx was at odds with Rich Positions, and that to allow a
Proximity-type scorer to use Positions-specific weights, some wider
interface was needed.

> A better name for the ScoreProx class would be appreciated.  :)  It's
> the worst class name in KS, and the "num_sproxen" member var in Tally
> is the worst member var name.

:)

> Collation of positions gets complicated when these scorers are nested.

It's possible we are defining terms differently here, but my current
plan is that there never will be any collation.   Instead, the
MultiScorers (AndScorer, OrScorer) will allow their children's Match
structs to be accessed directly.  I tried to pursue collation at one
point, and gave up: positions from multiple fields, phrases of
different lengths. On the bright side, direct access is very
efficient!
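
As a sketch of what "direct access" could look like, the walk below
hands each leaf's Match to a callback without copying or merging
anything.  All of the names (the kind tag, Scorer_Visit_Matches,
Count_Positions, the Match fields) are hypothetical, and the pointer
array is a stand-in for a VArray.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { SCORER_MULTI, SCORER_MATCH } ScorerKind;  /* hypothetical */

typedef struct Scorer { ScorerKind kind; } Scorer;

typedef struct Match {
    uint32_t        doc_num;
    uint32_t        num_positions;
    const uint32_t *positions;
} Match;

typedef struct MatchScorer {
    Scorer base;
    Match  match;
} MatchScorer;

typedef struct MultiScorer {
    Scorer   base;
    Scorer **children;
    size_t   num_children;
} MultiScorer;

typedef void (*Match_Visitor)(const Match *match, void *context);

/* Walk the scorer tree and pass every leaf Match to the visitor as-is.
 * No collated position list is ever built. */
void Scorer_Visit_Matches(const Scorer *scorer, Match_Visitor visit,
                          void *context) {
    assert(scorer != NULL && visit != NULL);
    if (scorer->kind == SCORER_MATCH) {
        visit(&((const MatchScorer *)scorer)->match, context);
        return;
    }
    const MultiScorer *multi = (const MultiScorer *)scorer;
    for (size_t i = 0; i < multi->num_children; i++) {
        Scorer_Visit_Matches(multi->children[i], visit, context);
    }
}

/* Example visitor: add up the number of positions seen.
 * The context must point to a size_t. */
void Count_Positions(const Match *match, void *context) {
    *(size_t *)context += match->num_positions;
}
```

Each child keeps its own positions array, so positions from different
fields or phrases of different lengths never have to share one list.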

More tomorrow about what I'm currently aiming for.   It's still rough,
but I think you'll like it better than the initial proposal.

Nathan Kurz
nate at verse.com

_______________________________________________
KinoSearch mailing list
KinoSearch at rectangular.com
http://www.rectangular.com/mailman/listinfo/kinosearch



