Subject: RE: Transport requirements for DNS-like protocols
--On Saturday, June 29, 2002 10:11 AM +0200 Patrik Fältström
<paf@cisco.com> wrote:
> --On 2002-06-28 10.24 -0700 Nicolas Popp <nico@realnames.com>
> wrote:
>
>> As soon as you do fuzzy matching that forces you to retrieve
>> multiple records and rank them, the operational complexity is
>> increased ten-fold (and your query response time becomes way
>> more unpredictable unless you do a few "right things").
>
> Doing fuzzy-matching is most efficiently done by doing a
> calculation of a hash on the search string (something like
> soundex) and then exact matching in the database.
>
> So, fuzzy-matching is for me just another version of
> "preparation" of the search string.
Patrik,
In a number of areas, matching by distance function --i.e.,
knowing all of the things that might match and determining which
one(s) are closest-- has turned out to be much more useful than
matching on a canonical form. In one of the classic examples,
the first-generation theory of how to do OCR was to try to
standardize ("prepare" in your terminology) characters,
font-independent, down to a common abstraction. Nice idea, but
it basically didn't work. Instead, we now assume (with English)
that a given character has to match one of 62, and make a
tentative decision based on similarity functions. Then we
repeat the process, looking up word-candidates in a dictionary
to see which candidates can be excluded because they are
uncommon in, or absent from, the language.
Sonex/soundex matches are fuzzy matching, but they are not
fuzzy search; I think that fuzzy search is going to be needed
here.
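
To make the distinction concrete, here is a rough sketch of the two
approaches in Python. It is only an illustration under stated
assumptions: the name list, the helper names (canonical_lookup,
distance_search) and the choice of Levenshtein edit distance as the
similarity function are all made up for this example, and the soundex
routine follows the usual American Soundex rules rather than any
particular database's variant.

```python
def soundex(name):
    """Four-character American Soundex key (canonical form of a name)."""
    codes = {}
    for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                           ("L", "4"), ("MN", "5"), ("R", "6")):
        for ch in letters:
            codes[ch] = digit
    name = "".join(c for c in name.upper() if c.isalpha())
    if not name:
        return ""
    key = name[0]
    prev = codes.get(name[0], "")
    for ch in name[1:]:
        digit = codes.get(ch, "")
        if digit and digit != prev:
            key += digit
        if ch not in "HW":  # H and W do not separate equal codes
            prev = digit
    return (key + "000")[:4]


def edit_distance(a, b):
    """Levenshtein distance, used here as the (assumed) similarity function."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


# Hypothetical data set, indexed once by its canonical (soundex) key.
NAMES = ["Robert", "Rupert", "Roberta", "Albert"]
INDEX = {}
for n in NAMES:
    INDEX.setdefault(soundex(n), []).append(n)


def canonical_lookup(query):
    """'Prepare' the search string, then do one exact-match lookup."""
    return INDEX.get(soundex(query), [])


def distance_search(query, limit=3):
    """Compare the query against every candidate and rank by closeness."""
    q = query.lower()
    ranked = sorted(NAMES, key=lambda n: (edit_distance(q, n.lower()), n))
    return ranked[:limit]
```

With this data, a query of "Obert" (first letter dropped) hashes to a
key that no stored name shares, so canonical_lookup finds nothing,
while distance_search still ranks "Robert" first. The cost is that
distance_search has to look at every candidate, which is the
operational burden Nico was pointing at.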
So, I hope you are right -- it would be a lot easier. But...
john