[KinoSearch] Roadmap .30 and Scorers
Andrew Bramble
bramble.andrew at gmail.com
Tue Jul 22 19:11:51 PDT 2008
Justin,
Plans? Plans? I'm still struggling for ideas :) Don't stop on my account.
I never finish anythi...
KinoSearch was one of several approaches I tried for indexing and slicing
CPAN data. My original hack was not a text-based search but rather an index
of META.yml information against distributions, allowing for queries like
requires:Test::More (you'll never guess the headslap effect of
'set_heed_colons' when doing this with QueryParser) or
requires: Test::More
license: apache
The original naive implementation used Graph and some list utils for
intersections.
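
A minimal sketch of that kind of metadata index, written in Python with plain
sets rather than the Perl Graph module the original hack used. The
distribution names, the metadata values and the build_index/query helpers are
all made up for illustration; only the field names (requires, license) come
from the queries above.

```python
from collections import defaultdict


def build_index(dists):
    """Map each (field, value) pair to the set of distributions carrying it."""
    index = defaultdict(set)
    for name, meta in dists.items():
        for field, values in meta.items():
            # Treat scalar fields (license) and list fields (requires) alike.
            if isinstance(values, str):
                values = [values]
            for value in values:
                index[(field, value)].add(name)
    return index


def query(index, **terms):
    """Return the distributions matching every field=value term given."""
    postings = [index.get((field, value), set()) for field, value in terms.items()]
    # Intersect the posting sets; no terms at all means no results.
    return set.intersection(*postings) if postings else set()


# Hypothetical metadata, for illustration only.
DISTS = {
    "Dist-A": {"license": "apache", "requires": ["Test::More", "YAML"]},
    "Dist-B": {"license": "perl", "requires": ["Test::More"]},
    "Dist-C": {"license": "apache", "requires": ["YAML"]},
}
INDEX = build_index(DISTS)

# Equivalent of "requires: Test::More" plus "license: apache".
print(sorted(query(INDEX, requires="Test::More", license="apache")))
```

Each extra field=value term only ever shrinks the result set, which is why
simple intersections were enough for a first pass.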
To be honest, I'm working more towards a product search engine than a CPAN
index in particular. CPAN data was safe to work on at home; business data,
not so safe.
What pushed me towards KinoSearch was seeing some results from the evo.com
beta linked from rectangular.com. evo.com appears to have the functionality
I'm thinking of, where the results have computed 'refinements' for
categories like brand that are presumably document fields. See
http://www.evo.com/search?q=cooking&tag=Lead-Free
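
A rough sketch of what such computed refinements amount to, in Python and
outside of KinoSearch entirely: count the values of one field across the
matched documents, then filter on a chosen value. The facet_counts and refine
helpers and the synthetic hit list are assumptions for illustration; the
numbers mirror the license breakdown in my original message quoted below. It
says nothing about how a Scorer or hit collector would do this internally.

```python
from collections import Counter


def facet_counts(hits, field):
    """Count how often each value of `field` occurs among the matched docs."""
    counts = Counter()
    for doc in hits:
        value = doc.get(field)
        if value is not None:
            counts[value] += 1
    return counts


def refine(hits, field, value):
    """Narrow the hits to documents whose `field` equals `value`."""
    return [doc for doc in hits if doc.get(field) == value]


# Synthetic hits standing in for the 100 documents matching "date parser".
BREAKDOWN = {"perl": 50, "artistic": 30, "gpl": 10, "bsd": 5, "apache": 5}
HITS = [{"license": lic} for lic, n in BREAKDOWN.items() for _ in range(n)]

counts = facet_counts(HITS, "license")
print(counts.most_common())

# Repeating the search with license:perl added keeps only those documents.
print(len(refine(HITS, "license", "perl")))
```

This walks every matched document once per search, which is the cost that
makes me think the real thing belongs close to the matching loop.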
On Wed, Jul 23, 2008 at 10:57 AM, Justin DeVuyst <justin at devuyst.com> wrote:
> Hello,
>
> I was playing around with indexing and searching CPAN with KinoSearch
> recently myself. Could you elaborate on what your plans are? I'd
> like to move on to something else if someone else is already doing
> what I would like to see happen.
>
> Basically my goal is to make searchable, in one place, everything
> known about modules on the CPAN. Whether KinoSearch can fit the
> whole bill or just part of the bill I'm still not sure of.
>
> Thanks,
> jdv
>
> Andrew Bramble wrote:
> > Hello,
> >
> > After getting useful results, and fast, with KinoSearch .20, I began
> > looking at ways to narrow results further using field-specific
> > refinements, e.g. having CPAN metadata indexed and being able to slice
> > into it by a license field.
> > Might it be possible for a Scorer (I think it's a scorer) to compute,
> > from within the set of matched results, the total frequency of tokens
> > from a given field? To use the CPAN example again: rather than
> > choosing to search for "date parser" and license:artistic, might the
> > initial search for "date parser" return the matching results AND a
> > structure describing that, of 100 matched documents, the field
> > 'license' breaks down to perl=50, artistic=30, gpl=10, bsd=5,
> > apache=5?
> > One could then repeat the original search, adding 'license:perl' to
> > narrow the search to only the 50 matching documents.
> >
> > Since this would require reading/examining each matched record, I
> > would guess this belongs in the XS/C rather than Perl.
> >
> > Is it wishful thinking? Or might this be possible with subclassable
> > scorers/hit collectors?
> >
> > ++KinoSearch
> >
> > Andrew
> > _______________________________________________
> > KinoSearch mailing list
> > KinoSearch at rectangular.com
> > http://www.rectangular.com/mailman/listinfo/kinosearch
> >