Earlier today, while searching for discussion of the Evolution Robotics deal, I found something weird at Technorati. The first page of search results for “ERSP” is filled with German pages that don't contain the term ERSP.
But the posts do contain “erspüren”, “erspäht”, “erspähe”, “erspüre”, “erspähen”, etc.
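No, they're not doing substring matching. Yes, they seem to be tokenizing words very poorly for any language that doesn't stick to plain ASCII: a word like “erspüren” apparently gets split at the “ü”, leaving a token “ersp” that matches the query.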
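One plausible mechanism, sketched below in Python, is a tokenizer that counts only ASCII letters and digits as word characters. This is an assumption for illustration; nothing here reflects Technorati's or Sphere's actual code. Such a tokenizer treats “ü” and “ä” as separators, so “erspüren” becomes the tokens “ersp” and “ren”, and a case-insensitive query for “ERSP” then matches. The function names `ascii_tokenize`, `unicode_tokenize`, and `matches` are hypothetical.

```python
import re
import unicodedata

# Hypothetical buggy tokenizer: only [A-Za-z0-9] count as word characters,
# so every non-ASCII letter (ü, ä, ...) acts as a word boundary.
ASCII_WORD = re.compile(r"[A-Za-z0-9]+")

def _nfc(text):
    # Normalize to composed form so "ü" is one code point, not "u" + combining mark.
    return unicodedata.normalize("NFC", text)

def ascii_tokenize(text):
    """Lowercase tokens, breaking on every non-ASCII character (the suspected bug)."""
    return [m.group(0).lower() for m in ASCII_WORD.finditer(_nfc(text))]

def unicode_tokenize(text):
    """Lowercase tokens using Unicode-aware word characters."""
    # In Python 3, \w on str patterns matches Unicode letters and digits (and "_").
    return [t.lower() for t in re.findall(r"\w+", _nfc(text))]

def matches(query, document, tokenize):
    """True if every query token appears as a whole token in the document."""
    doc_tokens = set(tokenize(document))
    return all(t in doc_tokens for t in tokenize(query))

words = ["erspüren", "erspäht", "erspähe", "erspüre", "erspähen"]

for w in words:
    # e.g. erspüren ['ersp', 'ren'] ['erspüren']
    print(w, ascii_tokenize(w), unicode_tokenize(w))
```

With the ASCII-only tokenizer, `matches("ERSP", w, ascii_tokenize)` is true for every word in the list; with the Unicode-aware one it is false, because each German word stays a single token.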
Never-ending shoddiness over there.
Later: Ha! Sphere tokenizes in the same wrong way.

Posted by jjwiseman at February 14, 2007 12:55 AM