Does anyone know how Google handles foreign languages?

I'm doing research on how well Google can deal with foreign languages.
Some languages order their words completely differently from English.
For example, some languages do not put spaces between words, so a
search engine really needs to handle them differently. I wonder how
Google handles foreign languages today. Do they need native developers
from every country to develop a parser for each foreign-language
engine?
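
To make the word-boundary problem concrete, here is a minimal Python sketch
of greedy longest-match segmentation, one simple dictionary-based approach.
It is only an illustration and not a description of what Google does. The
`segment` function and the toy vocabulary are made up for this example, and
English text with the spaces removed stands in for a language that is
written without spaces.

```python
def segment(text, vocabulary):
    """Split unspaced text into words by greedy longest-match lookup."""
    max_len = max(len(word) for word in vocabulary)
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shorter ones.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in vocabulary:
                words.append(candidate)
                i += length
                break
        else:
            # No dictionary word starts here: emit a single character.
            words.append(text[i])
            i += 1
    return words


# Toy vocabulary (an assumption for this example only).
VOCABULARY = {"the", "cat", "cats", "sat", "at"}
print(segment("thecatsat", VOCABULARY))
```

With this toy vocabulary the greedy approach splits "thecatsat" into
"the", "cats", "at" rather than "the", "cat", "sat". That ambiguity is why
a plain dictionary lookup is not enough and a real segmenter has to weigh
alternative splits.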

I have tried to find articles about "how Google organizes foreign
languages" in many places, but I cannot find any.

I really need help. Thanks for your kindness.
    Prachaya Phaisanwiphatpong

Re: Does anyone know how Google handles foreign languages?

You don't say what your home language is. I assume, from your name, that it
may be Thai.

There does not yet appear to be a www.google.th (for Thailand), but there is
a www.google.cn (for China), so you would need to conduct your own
experiments there. Chinese is an "isolating" language, meaning that words
don't undergo much in the way of grammatical changes.

Turkish is a non-Aryan language and is an agglutinating language, meaning
that words get longer and longer as they undergo a series of grammatical
changes.  See http://www.google.com.tr/

While English words and words in the other Aryan languages do undergo
grammatical changes, they are not nearly as dramatic as in a true
agglutinating language.
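
As a rough illustration of why agglutination matters for search, here is a
small Python sketch using the Turkish stem "ev" (house): ev, evler (houses),
evlerim (my houses), evlerimde (in my houses). The helper names and the tiny
word list are assumptions made up for this example; it does not reflect how
any real search engine works.

```python
# Toy word list: one heavily suffixed form of "ev" and one unrelated word.
# "evet" means "yes" and has nothing to do with "ev" (house).
DOCUMENT_WORDS = ["evlerimde", "evet"]


def exact_matches(query, words):
    """Return the words that are identical to the query."""
    return [word for word in words if word == query]


def prefix_matches(query, words):
    """Return the words that merely start with the query."""
    return [word for word in words if word.startswith(query)]


print(exact_matches("ev", DOCUMENT_WORDS))   # misses the suffixed form
print(prefix_matches("ev", DOCUMENT_WORDS))  # finds it, plus a false hit
```

Exact matching misses "evlerimde" entirely, while naive prefix matching
also returns the unrelated "evet". Getting this right needs knowledge of
the language's suffixes, which is the kind of per-language optimization
meant here.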

So, in brief, to get decent results for a language a search engine needs to
be optimized for that language.

Google offers an interface in many languages:
http://www.google.com/language_tools but I suspect that their search engine
has only been optimized for the major European languages.

Re: Does anyone know how Google handles foreign languages?

I can speak for Russian. Russian has 6 cases for singular nouns plus
6 cases for plural nouns. To simplify, each Russian noun may have up to
12 forms in text. There are many other rules in Russian that change a
word's form in different situations. Several months ago Google started
understanding the main rules of Russian morphology. This improved its
search relevance a lot. They will definitely improve their
understanding of Russian in time.
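
To show what handling these forms can look like, here is a minimal Python
sketch that maps every inflected form of a noun back to one dictionary form
before matching. The table for the noun "книга" (book) is written by hand,
and the names `LEMMA_FORMS`, `build_form_index` and `normalize` are
assumptions for this example. Real morphological analyzers are far more
involved, and this is not a description of Google's method.

```python
# Hand-written case forms of one noun. Several of the 12 case/number
# slots share a spelling, so only 9 distinct forms appear.
LEMMA_FORMS = {
    "книга": [
        "книга", "книги", "книге", "книгу", "книгой",  # singular
        "книг", "книгам", "книгами", "книгах",         # plural
    ],
}


def build_form_index(lemma_forms):
    """Invert the table: inflected form -> dictionary form (lemma)."""
    index = {}
    for lemma, forms in lemma_forms.items():
        for form in forms:
            index[form] = lemma
    return index


def normalize(text, form_index):
    """Lowercase, split on whitespace, and replace known forms by lemmas."""
    return [form_index.get(token, token) for token in text.lower().split()]


FORM_INDEX = build_form_index(LEMMA_FORMS)
print(normalize("Книгу о книгах", FORM_INDEX))
```

If both the query and the indexed pages are normalized this way, a search
for one form of the word can match pages that contain any other form.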


Anyone willing to trade multi-lingual dictionary databases?


Hello All,

I am a Windows Automated Robot Script Programmer
with an interest in multi-lingual applications.
I program my robot scripts using a powerful
automated robot scripting tool named Macro
Scheduler by Mjtnet (www.mjtnet.com)

My current project has the objective of enabling
the user to perform very effective web search
engine queries in languages the user has even total

In a nutshell, users E-mail my computer (server) a
search engine request: a list of search terms in
their native language font or characters.  My
automated robot script does a dictionary
word-for-word or word-for-short-term translation of
the words in the E-mail request into the user's
designated or 'target' language, for a search engine
search in the native font or characters of the
target language.
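
The word-for-word lookup described above can be sketched in a few lines of
Python. This is only an illustration under stated assumptions: the real
robot is a Macro Scheduler script working against Excel sheets, and the
`translate_terms` function and the tiny English-to-Spanish dictionary here
are invented for the example.

```python
# Hypothetical toy dictionary (English -> Spanish), keyed in lowercase.
EN_ES = {
    "white": "blanco",
    "horse": "caballo",
}


def translate_terms(terms, dictionary):
    """Translate each term independently, keeping the original order.

    Terms missing from the dictionary are passed through unchanged, so
    names and other unknown words still reach the search engine.
    """
    translated = []
    for term in terms:
        cleaned = term.strip()
        translated.append(dictionary.get(cleaned.lower(), cleaned))
    return translated


print(translate_terms(["White", "horse", "Venice"], EN_ES))
```

Passing unknown terms through unchanged is a design choice made for this
sketch. It keeps the request usable when the dictionary has gaps, which is
exactly where the size of the dictionaries becomes the limiting factor.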

Currently the databases for my dictionaries in
several languages are in (English) Excel
spreadsheets because, without getting too technical,
Macro Scheduler has specialized commands that make
interacting with Excel a trivial proposition versus
one that would otherwise be complicated.
Fortunately, non-English Unicode text characters keep
their attributes just fine in English Excel.  Thus,
using Excel as a text parsing and calculation
intermediary is recommended by other Macro Scheduler
users.

The user can also designate the target language
to be the same as that of the E-mail request.  In
this case words/terms from the request are placed
directly in the search URL (which I will describe
in more detail shortly) and no dictionary
translation is required.

In the current application my automated robot
E-Mails the results back to the user in an attached
Excel spreadsheet.

The key to the effectiveness of my multi-lingual
search engine interface is the establishment of
dictionaries in all languages in their respective
native Unicode font or character sets, not just
with respect to "regular" dictionary words but
geographic locations and proper (i.e. people's)
names as well.

The crux of my post is to ask whether anyone who
has developed an application, or for that matter
just extensively uses an application, in 'native'
Unicode fonts or characters would be receptive to
the idea of trading their word database with mine.

This could make, say, your present Russian program
(Cyrillic text characters) or application truly
multi-lingual/multi-national ... perhaps with a
little help from an automated robot script with
respect to either gleaning the words/terms or
making other languages' characters applicable in
your program.

I will address these topics in further detail
shortly but first ...

Because Google is the current world search
engine leader, it was/is my first choice for
implementing my automated robot scripts.

Google provides and advertises an API or
"Application Programming Interface" which provides
the user essentially some robot capability for
automated searches of their famed search engine.  I
naively figured Google had no qualms or opposition
to automated scripts interfacing with their search
engine provided the number of accesses does not
exceed the limit Google sets for their API.

In other words, I figured that whether the user
bypasses Google's main page via their API or via my
robot script, which has much more in the way of
custom specialized functionality and capability, it
would be a "wash".


Google in its Terms of Service verbiage
specifically prohibits automated robot activity
or interaction with its services by its users
unless authorized by them.

For a few moments after I read Google's explicit
prohibition it didn't make sense.  But then it
occurred to me that Google's main order of business
is their search engine and their carnivorous
assimilation of data from its users.  With "third
party" automated script robots such as mine, the
explicit association between the user's search
request and the user's IP address is lost ... and
with it one of the crown jewels of Google's company
interests that separate it from other search engine
providers.

Interestingly, other search engines I've
investigated appear not to have such explicit
Terms of Service prohibitions as Google against
automated scripts accessing them.  Perhaps the
others have other primary business interests and
directions where the association of user and IP
address is not so paramount.

Also, I found that the same concept of my "packing"
the "search URL" is even easier with other search
engines!  Where Google requires a different search
URL "string" for each language, as will shortly be
described, other search engines have one search URL
"template" or cookie cutter format where all the
robot has to do is plug in the Unicode characters
for any language in a standard search URL and ...
Voilà! It works!

So, while the following examples are all with
respect to Google, the actual robot searches will
not be using Google but other search engines.
However, importantly the underlying concept and
mechanics are all the same.

As I mentioned earlier my Multi-Lingual Macro
Scheduler automated robot search engine
interface has the following format:

The user sends my computer (server) an E-mail
with a list of words for a Google search in his/her
preferred or native language, in the native Unicode
font or character set, and designates the language
for which the search engine (Google) search is to
be performed.

The robot automatically scans for new E-mails
and, upon recognizing a valid request (a valid user
login and password, a language that is operational,
a request in a valid format so the robot can act
on it, etc.), the first thing the robot does is
make a word-for-word or word-for-short-phrase
dictionary translation of the word list.

These will be the search engine (Google) search
terms ... again, in the target language's native
font or character set and in the order the user
lists them in the request.

The request and the target language can be the
same.  In this case no dictionary translation need
take place and the words from the request are
directly transferred "as is" to the search
processing portion of the application.

Probably most of you reading this post are aware
Google has a "main page" for various languages
in a continuing worldwide collaborative effort.
The portal to this capability is selecting the
"Language Tools" link on the "regular" English
Google web page: www.google.com

Interestingly, after running a Google search on
any one of its various foreign language main pages,
the result URLs not only contain the search
words/terms in the native font, but the URLs for
each language also consistently maintain their
format.

With Google every language has its own search URL,
in which the search terms can be replaced by English
ones.

For instance, the Urdu search string for the famous
world traveler and explorer Marco Polo, using
English characters, is:


The Greek search string for Marco Polo using
English characters is:


Google provides by default the first 10 results on
the first result page.  The "next 10" Google
URLs for Urdu and Greek respectively are:



Likewise, I've found there is an equivalent of these
standard "next 10" URLs in other search engines as
well.

Once my robot has parsed the search words or terms
from the E-mail request and performed a dictionary
translation if required, it "plugs in" the terms in
the search URL and deploys it bypassing the need to
interact with Google's main page for the given
language or, for that matter, the main page of any
search engine.

The Marco and Polo, delimited by a plus '+' sign,
are replaced respectively with the native Unicode
renditions of Marco and Polo in Urdu and Greek.
Deploying the search URL with the respective native
font/character renditions of Marco and Polo will
yield different, often more effective results
depending on the context.
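
The "plug the terms into a search URL template" idea, including the "next
10" pages, can be sketched in Python as follows. Everything specific here is
an assumption for illustration: the host `search.example.com`, the `q` and
`start` parameter names, and the `build_search_url` function are
hypothetical and are not the URL format of Google or of any particular
search engine.

```python
from urllib.parse import quote_plus

# Hypothetical "cookie cutter" template; host and parameters are made up.
SEARCH_TEMPLATE = "https://search.example.com/search?q={query}&start={start}"


def build_search_url(terms, page=0, per_page=10, template=SEARCH_TEMPLATE):
    """Join the terms with '+' and fill them into the URL template.

    Non-ASCII terms are percent-encoded as UTF-8 so the URL stays valid.
    page=0 is the first result page, page=1 the "next 10", and so on.
    """
    query = "+".join(quote_plus(term) for term in terms)
    return template.format(query=query, start=page * per_page)


print(build_search_url(["Marco", "Polo"]))
print(build_search_url(["Marco", "Polo"], page=1))
print(build_search_url(["Μάρκο", "Πόλο"]))  # Greek rendition of the name
```

Because the template is just a string, supporting a search engine that uses
a different URL layout only requires passing a different `template` value.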

More importantly, where just text parsing and
processing is the objective, not only don't I need
to interface with a search engine's main page ...
I don't need to use a graphic browser such as
Microsoft Internet Explorer (IE), Firefox, Netscape
etc. to deploy the search URLs.

Macro Scheduler has an HTTPRequest command which
gleans the text, whether it be standard ASCII
English text or the Unicode text for various
foreign languages, in a fraction of a second versus
waiting for the graphics of a web page to stabilize
in standard browsers.

For applications where pure text and no graphical
(i.e. picture) aspects are involved, a Macro
Scheduler solution is an order of magnitude more
efficient and robust than an automated robot
solution that interacts with a browser.

The results of the search URL are URLs of web pages
that contain and/or pertain to the search criteria.
My robot recognizes these URLs and, in a most
expedited and efficient manner, again using Macro
Scheduler's HTTPRequest command, requests each of
the result URLs and finds instances of the words
and terms of the search request and their frequency
in the result pages.
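
As a rough Python equivalent of this fetch-and-count step, here is a minimal
sketch that uses the standard library's `urllib.request` in place of Macro
Scheduler's HTTPRequest command. The function names `fetch_text` and
`term_frequencies` are assumptions for this example, and the counting is
deliberately naive: it counts case-insensitive substring hits in the raw
page text, markup included.

```python
from urllib.request import urlopen


def fetch_text(url, timeout=10):
    """Download a page and decode it as text, with no browser involved."""
    with urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def term_frequencies(page_text, terms):
    """Count case-insensitive, non-overlapping occurrences of each term."""
    text = page_text.casefold()
    return {term: text.count(term.casefold()) for term in terms if term}


# Counting works on any string, so it can be tried without network access.
sample = "Marco Polo met marco."
print(term_frequencies(sample, ["Marco", "Polo", "Venice"]))
```

In a full run, `term_frequencies(fetch_text(url), terms)` would be called
once per result URL.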

The result URLs and presence/frequency data of the
search terms are ported into an Excel spreadsheet
and E-mailed back to the user as an attachment.
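
For readers without Macro Scheduler or Excel, the same result table can be
sketched in Python as CSV text, which Excel can open. Note that this swaps
in CSV for the native Excel spreadsheet described above. The
`results_to_csv` function and the column names are assumptions for this
example.

```python
import csv
import io


def results_to_csv(results):
    """Turn [(url, {term: count, ...}), ...] into CSV text.

    One row is written per (url, term) pair, after a header row.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["url", "term", "count"])
    for url, counts in results:
        for term, count in counts.items():
            writer.writerow([url, term, count])
    return buffer.getvalue()


# Made-up result data for illustration only.
example = [("http://example.com/a", {"Marco": 2, "Polo": 1})]
print(results_to_csv(example))
```

When saving this text to a file for Excel, writing it with the `utf-8-sig`
encoding is usually the safer choice so that non-English characters are
detected correctly.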

Macro Scheduler also has specialized commands for
making the aspect of scanning, receiving and
sending E-mail posts trivial as well.

I hope in this post I have adequately conveyed the
gist of my Multi-Lingual Automated Robot Search
Engine Interface (MLARSEI).  However feature-rich
I can make it, it is inherently limited by the
extent of the dictionaries.

Thank you for your interest and consideration.


Joel S.
Rochester, New York
(585) 473-7013
(585) 255-0997 - Cell

Re: Anyone willing to trade multi-lingual dictionary databases?

You mean like this ?
