This page has been moved to http://xesam.org/main/XesamUserSearchLanguage
Xesam End User Search Language
This is not a final spec, only a proposal
Related pages: XesamQueryLanguage, XesamSearchLive, XesamRoadMap, XesamUI, XesamAbout
Introduction
This is a proposal for an end-user search language for desktop applications, not a full-fledged query language. Since the language targets end users, it should be kept as simple as possible.
Nested queries such as hello and (world or internet) are deliberately not allowed. I claim that only a very limited user base would ever dream of writing them, and usability studies support this: users simply start a new query instead of adding logical operations. That said, nothing in the spec prevents a search engine from supporting nested queries.
It is designed as an extended synthesis of Apple's Spotlight and Google's search languages.
Goals
- Any Google-like search should do what the user expects
- Be easy and simple to use
- Be easy to parse for search engines: don't cater for complex queries (hence sub-queries/braces are left out)
- Let different search engines expose their unique features without breaking the standard
- Be easy to embed in full query XML, see XesamQueryLanguage
Specification
The basic definition can be outlined like this (details follow) (svg source):
Word, Select, and Phrase are commonly referred to as Terms.
If multiple terms are provided the default operation is to AND them together.
Old versions of language structure drawings: png and svg
Word
A string without whitespace (i.e. a word).
Select
A select term is a tuple <keyword><relation>. A keyword is a mapping from a word to a set of metadata fields to search. For possible relations see below.
The keyword mapping is constructed via the xesam ontology (or other installed ontologies) plus some additional aliases. By default a keyword matches the corresponding entry in the xesam ontology without the xesam namespace. For example:
title maps to xesam:title
If there is no such field in the ontology, other ontologies may be searched at the discretion of the search engine. To supplement this mapping and make the language more user friendly, a set of aliases is provided. They include the following:
Alias | Searched fields |
ext | |
format | |
mime | |
tag | |
type | Special, see below. Matches content or source type |
The keyword-to-field-name map should be case insensitive; thus the keyword usercomment should match the field name xesam:userComment.
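The keyword lookup described above could be sketched roughly as follows. This is only an illustration: the field list is a hypothetical subset of the ontology, the names resolve_keyword and ONTOLOGY_FIELDS are made up for this example, and letting aliases take precedence over ontology fields is an assumption, not something this proposal states.

```python
# Hypothetical subset of ontology fields; a real engine would load these
# from the installed ontologies.
ONTOLOGY_FIELDS = ["xesam:title", "xesam:userComment", "xesam:creator"]

# Case-insensitive index: lower-cased name without namespace -> field name.
_FIELD_INDEX = {name.split(":", 1)[1].lower(): name for name in ONTOLOGY_FIELDS}


def resolve_keyword(keyword, aliases=None):
    """Return the list of fields a keyword searches, or [] if unknown.

    `aliases` is an optional dict mapping a lower-case alias to a list of
    field names. Checking aliases first is an assumption of this sketch.
    """
    key = keyword.lower()
    if aliases and key in aliases:
        return list(aliases[key])
    field = _FIELD_INDEX.get(key)
    return [field] if field else []
```

With this sketch, resolve_keyword("usercomment") yields ["xesam:userComment"], and an unknown keyword yields an empty list, at which point an engine may consult other ontologies.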
The relation is a comparison operator. The following are allowed
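Relation | Description |
= | Equality. Case insensitive on strings. |
: | Contains. The value is contained in the field(s) the keyword maps to. |
<= | Less than or equal. Only well defined for dates and integers/floats. Undefined (but allowed) on strings. |
>= | Greater than or equal. Same restrictions as <=. |
< | Less than. Same restrictions as <=. |
> | Greater than. Same restrictions as <=. |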
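As a rough illustration of the relations, the following sketch evaluates one relation between a stored field value and a query value. The function name relation_matches is hypothetical, and treating the : relation as a case-insensitive substring test is an assumption; the proposal only says the value is "contained".

```python
import operator

# Ordering relations; only well defined for dates and numbers.
_ORDERING = {
    "<=": operator.le,
    ">=": operator.ge,
    "<": operator.lt,
    ">": operator.gt,
}


def relation_matches(relation, field_value, query_value):
    """Evaluate one relation between a field value and a query value."""
    if relation == "=":
        # Equality is case insensitive on strings.
        if isinstance(field_value, str) and isinstance(query_value, str):
            return field_value.lower() == query_value.lower()
        return field_value == query_value
    if relation == ":":
        # Assumption: containment is a case-insensitive substring test.
        return str(query_value).lower() in str(field_value).lower()
    if relation in _ORDERING:
        # On strings this falls back to Python's lexicographic order,
        # which the proposal leaves undefined (but allowed).
        return _ORDERING[relation](field_value, query_value)
    raise ValueError("unknown relation: %r" % relation)
```

For example, relation_matches(":", "flower-red", "flower") is true, which is the behaviour the tag:flower example relies on.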
Phrase
Any string enclosed in quotes. You can append modifiers immediately after the final quote. A modifier is a single letter, and you can list any number of modifiers.
Modifiers do not have to be respected, but they must not cause parse errors. They are an optional extension. If a modifier is unsupported, it is up to the service implementation to ignore it or handle it on a best-effort basis. The following query should match any object with the words hello, world, and printf, case sensitively, within ten words of each other:
"hello world printf"cp
Some search engines accept parameters for things like fuzziness, but these can't be tweaked from the xesam search language. The search engine should use sane default values where needed.
With some modifiers the phrase is not considered a phrase as such, merely a sequence of words (as in the example above). This is indicated in the Input column of the modifier table below.
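To make the "sequence of words" reading concrete, here is a minimal sketch of how the proximity example above might be checked against a piece of text. The function name within_proximity is hypothetical, words are split on whitespace only (punctuation is not stripped), and "within ten words of each other" is interpreted as: all words occur inside a window whose first and last positions are at most max_distance apart. That interpretation is an assumption.

```python
def within_proximity(text, words, max_distance=10, case_sensitive=False):
    """Return True if every word occurs within one window of the text."""
    tokens = text.split()
    if not case_sensitive:
        tokens = [t.lower() for t in tokens]
        words = [w.lower() for w in words]
    wanted = set(words)
    last_seen = {}  # word -> index of its most recent occurrence
    for index, token in enumerate(tokens):
        if token in wanted:
            last_seen[token] = index
            if len(last_seen) == len(wanted):
                # Smallest window ending here that contains all words.
                if index - min(last_seen.values()) <= max_distance:
                    return True
    return False
```

Calling within_proximity(text, ["hello", "world", "printf"], case_sensitive=True) would roughly correspond to the cp modifiers in the example query.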
Modifier | Input | Description |
b | phrase | Boost. Any match on the phrase should boost the score of the hit significantly |
c | phrase | Case sensitive |
C | phrase | Case insensitive |
d | phrase | Diacritic sensitive |
D | phrase | Diacritic insensitive |
e | phrase | Exact match. Short for cdl |
f | phrase | Fuzzy search |
l | phrase | Don't do stemming |
L | phrase | Do stemming |
o | words | Ordered words. The words in the string should appear in order, but not necessarily next to each other |
p | words | Proximity search. The words in the string should appear close to each other (suggested default: 10) |
r | special | The phrase is a regular expression |
s | words | Sloppy search. Not all words need to match (suggested default slack: floor(sqrt(num_words))) |
w | words | Word based matching. Match words inside other strings if there is some meaningful word separation. For example, "case"w matches CamelCase |
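A parser might split a phrase term into its text and modifiers along these lines. This is a sketch under assumptions: the names parse_phrase, KNOWN_MODIFIERS and sloppy_slack are hypothetical, escaped quotes inside a phrase are not handled, and unknown modifier letters are simply dropped (one of the options the proposal allows).

```python
import math
import re

# The modifier letters listed in the table above.
KNOWN_MODIFIERS = set("bcCdDeflLoprsw")

# A quoted string followed directly by zero or more modifier letters.
_PHRASE_RE = re.compile(r'^"([^"]*)"([A-Za-z]*)$')


def parse_phrase(term):
    """Split a phrase term into (text, set of recognised modifiers)."""
    match = _PHRASE_RE.match(term)
    if match is None:
        raise ValueError("not a phrase term: %r" % term)
    text, letters = match.groups()
    modifiers = set()
    for letter in letters:
        if letter == "e":
            modifiers.update("cdl")  # e is short for cdl
        elif letter in KNOWN_MODIFIERS:
            modifiers.add(letter)
        # Unknown letters are skipped instead of raising a parse error.
    return text, modifiers


def sloppy_slack(num_words):
    """Suggested default slack for the s modifier."""
    return math.floor(math.sqrt(num_words))
```

For instance, parse_phrase('"hello world printf"cp') gives the text hello world printf with the modifiers c and p, and a ten-word sloppy query would tolerate three missing words by default.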
Collectors
Collector | Representations |
Logical AND | AND, and, && |
Logical OR | OR, or, || |
Note that the default operation on multiple terms is AND.
+ and -
Since terms are ANDed together by default, a leading + is ignored. A leading - means "AND NOT".
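The collector and +/- rules could be applied to an already tokenized query roughly as shown here. The function name combine and the (operator, term) output shape are inventions of this sketch. How a leading - interacts with an explicit OR is not specified in this proposal; the sketch always yields "AND NOT" for a negated term, which is an assumption.

```python
AND_TOKENS = {"AND", "and", "&&"}
OR_TOKENS = {"OR", "or", "||"}


def combine(tokens):
    """Turn a flat token list into (operator, term) pairs.

    The operator says how a term joins the terms before it. For the first
    term the operator is always "AND" and carries no real meaning.
    """
    pairs = []
    pending = "AND"  # default when no collector is written
    for token in tokens:
        if token in AND_TOKENS:
            pending = "AND"
        elif token in OR_TOKENS:
            pending = "OR"
        else:
            op = pending
            if len(token) > 1 and token[0] == "+":
                token = token[1:]  # + is redundant with the default AND
            elif len(token) > 1 and token[0] == "-":
                token = token[1:]
                op = "AND NOT"
            pairs.append((op, token))
            pending = "AND"
    return pairs
```

Note that because and and or are collectors, a user who wants to search for those literal words would have to quote them.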
The Type Selector
The value of the type selector indicates what types of items the search should include. The value is matched against a xesam category, so both source and content categories are allowed. As with other keywords, the namespace is omitted and the match should be case insensitive. The default namespace is xesam. For example, to search only in xesam:Audio content use:
type:audio hendrix
You can search within a specific source type as well, like (here xesam:File):
type:file algorithm
To help users, there is a convenience set of aliases for the category values, like the one for fields:
Category Alias | Real Category |
music | xesam:Audio |
picture | |
attachment | |
Note: Since the type selector allows you to query "either source or content", it has no clean mapping to the XesamQueryLanguage. You can, however, easily create a clean mapping if the engine supports the category extension. Since this is a very isolated corner case, it is not considered a big problem.
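Resolving the value of a type selector might look like the following sketch. The category list is a hypothetical subset containing only the categories named on this page, the only alias included is music (which the examples state maps to xesam:Audio), and the name resolve_type is made up for illustration.

```python
# Hypothetical subset of xesam categories (content and source types).
CATEGORIES = ["xesam:Audio", "xesam:File"]

# Only the alias whose target is stated on this page.
CATEGORY_ALIASES = {"music": "xesam:Audio"}

# Case-insensitive index: lower-cased name without namespace -> category.
_CATEGORY_INDEX = {c.split(":", 1)[1].lower(): c for c in CATEGORIES}


def resolve_type(value):
    """Map a type selector value to a category, or None if unknown."""
    key = value.lower()
    if key in CATEGORY_ALIASES:
        return CATEGORY_ALIASES[key]
    return _CATEGORY_INDEX.get(key)
```

Here type:audio and type:music both resolve to xesam:Audio, while type:file resolves to the source category xesam:File.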
Examples
Match any document containing the words "hello" and "world" disregarding letter casing:
hello world
Match any document containing "hello world" as one string, disregarding letter casing:
"hello world"
Match any document that contains the words "hello" and "world" close to each other, in that order, and taking letter casing into account:
"hello world"cpo
Match any document of type "music" (which maps to xesam:Audio by the category aliases) with the contents of xesam:creator, or any child field thereof, matching the string "Jimi Hendrix" disregarding letter casing:
type:music creator="Jimi Hendrix"
Find all images that have "flower" somewhere in their keywords (for example "flower-red") and that match a full text search on "africa":
type:image tag:flower africa
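The example queries above can be split into terms with a small tokenizer such as the one sketched here. It is not a reference parser: the name tokenize and the tuple layout are hypothetical, escaped quotes are not handled, and attaching the word or phrase that directly follows a relation to the select term is an assumption about how selects carry their value.

```python
import re

_TOKEN_RE = re.compile(
    r'''
    (?P<select>(?P<keyword>[A-Za-z][\w.]*)(?P<relation><=|>=|=|:|<|>)
        (?P<value>"[^"]*"[A-Za-z]*|\S+))
  | (?P<phrase>"[^"]*"[A-Za-z]*)
  | (?P<word>\S+)
    ''',
    re.VERBOSE,
)


def tokenize(query):
    """Split a query string into select, phrase and word terms."""
    terms = []
    for m in _TOKEN_RE.finditer(query):
        if m.group("select"):
            terms.append(
                ("select", m.group("keyword"), m.group("relation"), m.group("value"))
            )
        elif m.group("phrase"):
            # The phrase is kept raw, including quotes and modifiers.
            terms.append(("phrase", m.group("phrase")))
        else:
            terms.append(("word", m.group("word")))
    return terms
```

Applied to the last example, tokenize("type:image tag:flower africa") produces two select terms followed by the word africa. One known limitation of this sketch is that a word containing a colon, such as a URL, would be read as a select term.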



