Text containing formal language elements considered more searchable


It is observed that searching for a piece of sample code is much easier
than searching for a solution described in natural language. This is
because source code written in a programming language against a given
code library or system interface almost always has an unambiguous
vocabulary (e.g. class and function names) and/or consistent syntactic
conventions. For example, to find code that gets the caret position
from a Win32 edit control, searching for the keywords GetCaretPos and
AttachThreadInput occurring close to each other quickly turns up a few
sample snippets for this purpose. Similar observations can be made of
search systems for scientific data such as DNA sequences.
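A minimal sketch of that kind of proximity search, in Python. The two
"documents" are made up for illustration, the 20-token window is an
arbitrary assumption, and near() is a hypothetical helper, not the API
of any real search engine. API names are compared exactly, which is
what makes them such reliable search keys.

```python
import re

def tokenize(text):
    # \w+ keeps letters, digits and underscores together,
    # so identifiers like GetCaretPos stay whole.
    return re.findall(r"\w+", text)

def near(text, a, b, window=20):
    """True if tokens a and b occur within `window` tokens of each other."""
    tokens = tokenize(text)
    pos_a = [i for i, t in enumerate(tokens) if t == a]
    pos_b = [i for i, t in enumerate(tokens) if t == b]
    return any(abs(i - j) <= window for i in pos_a for j in pos_b)

# Hypothetical corpus: one code snippet, one natural-language answer.
docs = {
    "snippet": "DWORD tid = GetWindowThreadProcessId(hwnd, NULL); "
               "AttachThreadInput(GetCurrentThreadId(), tid, TRUE); "
               "GetCaretPos(&pt);",
    "prose": "To find where the cursor is in a text box, you first have to "
             "connect your program to the other program's input queue.",
}

hits = [name for name, text in docs.items()
        if near(text, "GetCaretPos", "AttachThreadInput")]
print(hits)  # ['snippet']
```

The prose answer may describe the same technique, but nothing in its
wording is guaranteed to match what a searcher types.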

So I propose that composing and retrieving natural language documents
may also benefit from this principle. We could define and promote a
controlled vocabulary (glossary) for a knowledge domain, and encourage
information producers and searchers to use that vocabulary when
creating and retrieving information in the domain. Information
producers could also include one or more publicly recognized domain
identifiers in their documents, and searchers could use the same kind
of domain identifier to narrow their searches.
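The controlled-vocabulary idea might look roughly like the following
sketch. The glossary, the documents and the domain identifier
"win32_programming" are all hypothetical; a real glossary would have to
be agreed on by the domain's community. Each variant phrase is mapped
to one controlled term, and the domain identifier acts as a hard filter.

```python
import re

# Hypothetical glossary: each variant phrase maps to one controlled term.
GLOSSARY = {
    "insertion point": "caret",
    "text cursor": "caret",
    "caret": "caret",
    "edit control": "edit_control",
    "text box": "edit_control",
}

def controlled_terms(text):
    """Return the set of controlled terms whose variant phrases appear in text."""
    lowered = text.lower()
    return {term for phrase, term in GLOSSARY.items()
            if re.search(r"\b" + re.escape(phrase) + r"\b", lowered)}

def search(docs, query, domain):
    """Keep docs tagged with `domain` that contain every controlled term in the query."""
    wanted = controlled_terms(query)
    return [d["id"] for d in docs
            if domain in d["domains"] and wanted <= controlled_terms(d["text"])]

# Hypothetical documents, each carrying domain identifiers.
docs = [
    {"id": 1, "domains": {"win32_programming"},
     "text": "Reading the insertion point of a text box in another process."},
    {"id": 2, "domains": {"typography"},
     "text": "The caret is the ^ mark used in proofreading."},
    {"id": 3, "domains": {"win32_programming"},
     "text": "Changing the font of an edit control."},
]

print(search(docs, "get the text cursor position of an edit control",
             "win32_programming"))  # [1]
```

Document 1 is found although its wording differs from the query's,
because both normalize to the same controlled terms; document 2 is
excluded by its domain identifier despite mentioning "caret".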

Besides controlled terms, more complex formal language elements could
be specified to formalize structured concepts in information. For
example, a "win32_programming" knowledge domain may define a term
"caret" and a possible action "get" associated with "caret". This is
like defining a class "caret" in a C++ program with a member function
"get". The combined formal expression "caret.get" then describes the
structured concept "get the caret" more precisely than the words
"caret" and "get" merely appearing near each other.
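To make the difference concrete, here is a hypothetical Python sketch
contrasting a plain proximity test with matching a "term.action"
expression checked against a domain schema. The SCHEMA dictionary,
which plays the role of the C++ class declaration, and both sample
sentences are invented for illustration.

```python
import re

# Hypothetical schema for the domain: term -> actions declared on it,
# much like a class and its member functions.
SCHEMA = {"caret": {"get", "set", "hide", "show"}}

EXPR = re.compile(r"\b([a-z_]+)\.([a-z_]+)\b")

def formal_expressions(text):
    """Extract term.action pairs that the schema declares valid."""
    return {(t, a) for t, a in EXPR.findall(text)
            if a in SCHEMA.get(t, set())}

def words_near(text, a, b, window=5):
    """Loose test: do words a and b occur within `window` words of each other?"""
    words = re.findall(r"\w+", text.lower())
    return any(abs(i - j) <= window
               for i, w in enumerate(words) if w == a
               for j, v in enumerate(words) if v == b)

doc_formal = "Use caret.get after attaching to the target thread."
doc_loose = "Hit the caret key again to get a superscript."

for doc in (doc_formal, doc_loose):
    print(words_near(doc, "caret", "get"),
          ("caret", "get") in formal_expressions(doc))
# True True
# True False
```

The proximity test accepts both sentences; only the formal expression
separates the one that is actually about getting the caret.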

Using formal semantics also enables a search engine to reliably deduce
the implications of a search criterion and therefore to return relevant
results even when no exact match exists.
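One simple form such deduction could take is generalization along is-a
relations: if no document answers a query for a specific concept, an
answer for a broader concept is taken to apply as well. The sketch
below assumes a hypothetical, acyclic IS_A table and a toy index that
maps each document to one formal expression and one target term; a real
system would need a richer logic than this.

```python
# Hypothetical is-a relations between controlled terms (assumed acyclic).
IS_A = {
    "multiline_edit": "edit_control",
    "edit_control": "window",
}

def ancestors(term):
    """Yield term, then each broader term up the is-a chain."""
    while term is not None:  # would loop forever if IS_A had a cycle
        yield term
        term = IS_A.get(term)

def search(index, action, target):
    """Return (doc, matched_target) pairs for the most specific target with any match.

    A document answering `action` for a broader target is taken to
    apply to the narrower one too (the deduction step).
    """
    for t in ancestors(target):
        hits = [doc for doc, (a, tt) in index.items() if a == action and tt == t]
        if hits:
            return [(doc, t) for doc in hits]
    return []

# Toy index: document -> (formal expression, target term).
index = {
    "doc_a": ("caret.get", "edit_control"),
    "doc_b": ("caret.set", "multiline_edit"),
    "doc_c": ("text.get", "window"),
}

print(search(index, "caret.get", "multiline_edit"))  # [('doc_a', 'edit_control')]
```

No document mentions caret.get for a multiline edit control, but since
the index says doc_a covers every edit control, it is returned, together
with the broader term it actually matched.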

Yao Ziyuan
