FindWord recognises foreign language documents

Foreign language documents can be recognised -
and displayed separately *

The recognition of foreign language texts is based on the following strategy:

In normal English documents, FindWord recognises a certain number of words using its dictionary and parser. These are shown in the word list with an asterisk "*" prepended. For every file shown in the file window, the number of recognised words is shown as a percentage of the total words it contains (in the column on the right).
In foreign texts and special texts (address books, computer programs and the like), only a few words are recognised, so their percentage of the total number of words is very low.
FindWord makes use of this by letting you define a minimum percentage of recognised words, below which a document is classed as not containing normal English text. Within a project you can then choose to display
- all documents,
- only English documents, or
- only foreign language or special documents.
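
The strategy above can be sketched in a few lines of Python. This is only an illustration, not FindWord's own code: the dictionary and parser are replaced by a plain set lookup, and the names recognition_percentage, is_english, known_words and minimum_percentage are assumptions made for this example.

```python
def recognition_percentage(words, known_words):
    """Share of words found in known_words, as a percentage (0 for empty input)."""
    if not words:
        return 0.0
    # Assumption: a simple case-insensitive set lookup stands in for
    # FindWord's dictionary and parser.
    recognised = sum(1 for w in words if w.lower() in known_words)
    return 100.0 * recognised / len(words)


def is_english(words, known_words, minimum_percentage):
    """True when the recognised share reaches the user-defined minimum."""
    return recognition_percentage(words, known_words) >= minimum_percentage
```

A document whose percentage falls below the minimum would be treated as foreign or special text; one that reaches it would be treated as English.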

An example:

A project has 13 documents, all containing the word "patent".

The "Recognition quotient" specifies, as a percentage for the current project, how many words must be recognised from the dictionary or by the parser for a document to be classified as containing readable English. On average we observe that

more than 30% of words are recognised in an English text, but
less than 15% of words are recognised in foreign or special texts.

Therefore we recommend a recognition quotient of about 20%.

With that set and "Only foreign language files" selected, we immediately see that 3 files, with recognition quotients of 6%, 12% and 8% respectively, lie below the 20% mark.

Conversely, if we select "Only english language files", we see the remaining 13 - 3 = 10 files, each with a recognition quotient of at least 20%.
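
The filtering in this example can be reproduced with a short Python sketch. It is not FindWord's implementation: the function select_documents and the mode names are hypothetical. Only the values 6, 12 and 8 come from the example; the file names and the value 35 used for the ten English files are placeholders chosen purely for illustration.

```python
def select_documents(quotients, threshold, mode):
    """Return the names shown for a display mode: 'all', 'english' or 'foreign'."""
    if mode == "all":
        return list(quotients)
    if mode == "english":
        return [name for name, q in quotients.items() if q >= threshold]
    if mode == "foreign":
        return [name for name, q in quotients.items() if q < threshold]
    raise ValueError(f"unknown mode: {mode}")


# The three low values are taken from the example above; the file names and
# the value 35 are placeholders, not measured results.
quotients = {"foreign_1": 6, "foreign_2": 12, "foreign_3": 8}
quotients.update({f"english_{i}": 35 for i in range(1, 11)})

foreign = select_documents(quotients, 20, "foreign")
english = select_documents(quotients, 20, "english")
print(len(foreign), len(english))  # prints: 3 10
```

A file whose quotient is exactly at the threshold counts as English here, matching the "at least 20%" wording.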
