V3.0 — ??.???.2023
© 2013-2023 Marco A.G.Pinto and Community Contributors.
Freely distributable and modifiable under the
Apache License v2.0.
SEMI-FINISHED MANUAL — REQUIRES A FULL REVISION WHEN I HAVE THE TIME!
LAST UPDATE: 2023-05-05


Index
1 — Introduction
2 — Copyright & DISCLAIMER
3 — Contacts
4 — Thanks
5 — How it works

  5.1a — Using UTF-8
  5.1b — EOL Windows VS Unix VS Mac
  5.1c — Packing the files into Extensions
  5.1d — Shortcut keys
  5.1e — Preferences
  5.1f — Sort
  5.1g — Find
  5.2  — Dictionary

       5.2.1 — Creating a Dictionary
       5.2.2 — Editing a Dictionary
       5.2.3 — How Suffixes/Prefixes work
       5.2.4 — What is flag position and rule
       5.2.5 — Menus
             5.2.5.1  — AFF Validate
             5.2.5.2  — Bulk Import

             5.2.5.3  — Import places/people names using possessives

             5.2.5.4  — GB to AU/CA/NZ/ZA (Marco Pinto - GB)
             5.2.5.5  — GB primary to -ize/-ise script (Marco Pinto - GB)
             5.2.5.6  — Extract wordlist
                      5.2.5.6.1 — All .txt
                      5.2.5.6.2 — All .csv
                      5.2.5.6.3 — Compounds .txt
                      5.2.5.6.4 — Compounds .txt (LanguageTool)

             5.2.5.7  — Count wordlist
             5.2.5.8  — Show/Merge/Delete duplicates .dic
             5.2.5.9  — Show duplicates wordlist
             5.2.5.10 — Statistics
             5.2.5.11 — Words missing in Master Wordlist
             5.2.5.12 — Sort flags .dic
             5.2.5.13 — Crunch words with flags (Marco Pinto - GB)

             5.2.5.14 — Fix invalid spaces
  5.3  — Thesaurus

       5.3.1 — Creating a Thesaurus
       5.3.2 — Editing a Thesaurus
       5.3.3 — Menus
             5.3.3.1 — Thesaurus Validate
             5.3.3.2 — Bulk Import

             5.3.3.3 — Extract synonyms
             5.3.3.4 — Clean up symbols (Hex to Emoji)
             5.3.3.5 — Show/Merge duplicates
             5.3.3.6 — Fix invalid spaces
             5.3.3.7 — Combine/Sort/Deduplicate simple meanings
                     5.3.3.7.1 — Combine
                     5.3.3.7.2 — Deduplicate
                     5.3.3.7.3 — Sort
             5.3.3.8 — Preview Thesaurus
  5.4  — Hyphenation
       5.4.1 — Creating a Hyphenation
       5.4.2 — Editing a Hyphenation
       5.4.3 — Menus
             5.4.3.1 — HYP Validate
             5.4.3.2 — Fix invalid spaces
  5.5  — Autocorrect
       5.5.1 — Creating an Autocorrect
       5.5.2 — Editing an Autocorrect
       5.5.3 — Menus
             5.5.3.1 — AC Validate
             5.5.3.2 — Bulk Import

             5.5.3.3 — Extract autocorrects
             5.5.3.4 — Clean up hex symbols
             5.5.3.5 — Show/Delete/XDelete duplicates
             5.5.3.6 — Fix invalid spaces

  5.6  — LanguageTool
  5.7  — Language Specific

  5.8  — Extension
       5.8.1 — Menus

             5.8.1.1 — .xpi/.oxt Properties
6 — History
7 — Apache License
8 — PCRE Licence



F1 opens the help in various windows (not fully implemented yet).



1 — Introduction
An open-source multiplatform advanced linguistic tool coded in PureBasic for editing the Dictionary/Thesaurus/Hyphenation/Autocorrect files of OpenOffice, LibreOffice, Firefox, Thunderbird and SeaMonkey, provided they are in UTF-8 format.

This program was originally developed to easily edit the synonyms of OpenOffice and LibreOffice.

I had this idea after asking the people in charge of the pt_PT project, at Minho University in Portugal, how I could suggest synonyms, since only words suggested for the Portuguese speller were being added.

I was told that they didn't know how to add synonyms, since the person in charge of that part of the project had left long ago (2006).

Later, I wanted to make it compatible with Firefox and Thunderbird, after it became possible to edit dictionaries. I hoped that in the future, someone would use it in Thunderbird and fix the en_GB speller, which was full of typos and missing words. Since no one volunteered, I took on this task myself in 2013.

This is where my idea came from: develop something easy to use, since I had tried some official tools for these tasks and could not understand them, not even how to use them.

My tool is so intuitive that even a child can use it.


2 — Copyright & DISCLAIMER
This program is copyrighted to Marco A.G.Pinto and Community Contributors.

It is freely distributable and modifiable under the Apache License v2.0.

The RegularExpression library used to decode prefixes and suffixes is copyrighted to PCRE ( http://www.pcre.org/pcre.txt ) and the licence is at the end of the manual.


The lamp image used here was designed by Ignacio Javier Igjav under the Creative Commons Attribution-Share Alike 3.0 Unported licence.


3 — Contacts
(coder)

S.Mail:
Marco A.G.Pinto
Apartado 3083
2746-501 Queluz
(Portugal)

E.Mail:
marcoagpinto@sapo.pt


4 — Thanks
Some special thanks go to:

Groups/Organisations:
 — Apache Community;
 — LanguageTool Community;
 — LibreOffice Community;
 — Mozilla/Thunderbird Community;
 — PureBasic Community.

Persons:

 — Alberto Simões (Minho University);
 — Alexandro Colorado (Apache OpenOffice);
 — Andrea Pescetti (Apache OpenOffice);
 — Andreas Mantke (LibreOffice);
 — Andrew Ferguson (PureBasic);

 — António Manuel Dias (former pt_PT maintainer);
 — Áron Budea (LibreOffice);
 — Ashley Scott (PureBasic);
 — Bernd Krüger-Knauber (PureBasic);
 — Brandon Dupuis;
 — Chris Saxon (PureBasic);
 — Daniel Naber (LanguageTool);
 — Dennis Roczek (LibreOffice);
 — Filiep Spyckerelle (European Parliament);
 — Dominic Wujastyk;
 — Frédéric Laboureur (PureBasic);
 — Gervase Markham (Mozilla);

 — Guy Waterval (Apache OpenOffice);
 — Heinz Urban (PureBasic);
 — Ian Neal (Mozilla);
 — Jeroen Ooms (LibreOffice);
 — Jonathan Kew (Mozilla);

 — Jörg Knobloch (Mozilla);
 — José Almeida (Minho University);

 — Kruno (LibreOffice);
 — Kevin Scannell (Mozilla);
 — Martin Srebotnjak (LanguageTool);

 — Martin Tlustos (LibreOffice);
 — Matthias Mailänder (LanguageTool);

 — Mauro Trevisan (LibreOffice)(Mozilla);
 — Mykhailo Oliinyk;

 — Olivier Hallot (LibreOffice);
 — Pedro Marques (IADE — Creative University);
 — Peter Chamberlin (Mozilla);
 — Philip Taylor;
 — Ricardo Palomares Martínez (Apache OpenOffice);
 — Shantanu Oak (LibreOffice);
 — srod (PureBasic);
 — Stuart Swales (Apache OpenOffice);
 — Thomas Schulz (PureBasic);
 — Tiago Santos (LibreOffice)(LanguageTool).


5 — How it works
5.1a — Using UTF-8
This tool was made to work with UTF-8 encoding.

A good trick to convert the old encoding formats to UTF-8 is to use, for example, the Notepad++ editor for Windows.

Simply open the files with it and convert the encoding using the menu: Encoding → Convert to UTF-8-BOM, so that accented characters display correctly.

Then, use the Save As option and select "Normal text file (*.txt)" and it is done.

Don't forget to edit the header of each file by hand, replacing the name of the old encoding with the new one.

The headers with the character encoding are inside the files. See for example Version 2.4 (01/09/2007) of the Italian files:
— The Dictionary (.DIC + .AFF):
The .DIC has no keyword.

The .AFF has the following keyword:
SET ISO8859-15 → Replace with SET UTF-8

— The Thesaurus (.DAT):
It has in the first line:
ISO8859-15 → Replace with UTF-8

— The Hyphenator (.DIC):
It has in the first line:
ISO8859-15 → Replace with UTF-8
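
The same conversion can be scripted instead of done by hand in an editor. The following is only a minimal sketch in Python, not part of PTG: the function name `to_utf8` and its `header` parameter are made up for this example, and it assumes the old encoding is the ISO8859-15 shown above. It writes UTF-8 without a BOM.

```python
def to_utf8(raw, old_encoding="iso8859-15", header="none"):
    """Re-encode the bytes of one proofing file as UTF-8 and fix its header.

    header="set"        -> .AFF file: rewrite the first "SET ..." line
    header="first-line" -> thesaurus .DAT or hyphenation .DIC: rewrite line 1
    header="none"       -> dictionary .DIC: no encoding keyword to change
    """
    lines = raw.decode(old_encoding).split("\n")
    if header == "set":
        for i, line in enumerate(lines):
            if line.startswith("SET "):
                # Keep a trailing CR so Windows line endings stay intact.
                lines[i] = "SET UTF-8" + ("\r" if line.endswith("\r") else "")
                break
    elif header == "first-line":
        lines[0] = "UTF-8" + ("\r" if lines[0].endswith("\r") else "")
    return "\n".join(lines).encode("utf-8")


if __name__ == "__main__":
    old_aff = "SET ISO8859-15\nTRY aeiò\n".encode("iso8859-15")
    print(to_utf8(old_aff, header="set").decode("utf-8"))
```

Always keep a backup of the original files, since decoding with the wrong old encoding silently produces wrong characters.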


5.1b — EOL Windows VS Unix VS Mac
I have done some tests saving in Windows and Unix formats, and the Windows files are bigger.

This happens because the End of Line characters are different in Windows and in Linux.

Windows uses #CRLF$ (two bytes), Unix #LF$ and classic Mac #CR$ (one byte each).

Linux/Unix is the open-source standard, so it is better to use its format.
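
To illustrate the size difference, here is a small Python sketch (not PTG code; `to_unix_eol` is a name chosen for this example) that converts any of the three line endings to the Unix one and compares the sizes:

```python
def to_unix_eol(data):
    """Convert Windows (CRLF) and classic Mac (CR) line endings to Unix (LF)."""
    # Replace CRLF first, otherwise each Windows ending would become two LFs.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


if __name__ == "__main__":
    windows = b"party/S\r\nboy/S\r\n"
    unix = to_unix_eol(windows)
    # The Unix version is one byte shorter per line.
    print(len(windows), len(unix))
```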


5.1c — Packing the files into Extensions
To create extensions you will have to use another package, which I don't know yet.

The simplest way, though, is just to replace the files of an existing extension with yours.

You should use the SORT button before you can consider your Dictionary/Thesaurus/Autocorrect ready for being packed into an extension.

Making extensions for Mozilla seems easier than for OpenOffice/LibreOffice, which are more complex because they can hold multiple languages in one archive.

Compress the files into a .ZIP archive and then change the file extension to the one used by the target software (.xpi or .oxt).


5.1d — Shortcut keys
TAB SWITCH RIGHT — CTRL + TAB
TAB SWITCH LEFT — SHIFT + CTRL + TAB
OPEN — CTRL + O
SAVE — CTRL + S
SAVE AS — SHIFT + CTRL + S
FIND — CTRL + F
ADD — CTRL + A
GOTO — CTRL + G
DELETE — DEL
EXIT A WINDOW & ABORT PROCESSING & ABORT OPEN/SAVE/SAVE AS — <ESC>
EXIT — CTRL + Q



5.1e — Preferences



Tools → Preferences

This window lets you select the global settings of PTG.

If PTG doesn't find the file ptg3.prefs, or if it is from a different version, it will open this window automatically at the start and you will need to save it.


Number of lines visible for each ListIconGadget in the tabs:
Double-click the last line that fits in the gadget (the last visible line) and check that the number in the visible lines textbox changes to the selected line. More space can be gained by changing the window resolution in this preferences dialogue.
You may want to use "Plus Pixels" to fine-tune the fit by between -10 and +10 pixels.
This will make it work with all OSes.

PTG will try to guess the number of lines according to the maximum window resolution in the default prefs.


Resolution:
Set your preferred window resolution.
There is also the possibility of using a window that will fit your desktop entirely.
The GUI will fit to the chosen resolution.

The default resolution is always the window size that fits the screen resolution.


LanguageTool:
This will set the minimum number of blank lines between each chunk in a LanguageTool grammar.xml file for parsing.
The idea is to edit/create rules using PTG.
Not working yet.


.AFF Aid Language:
It is possible to have external files with a list of each code, so that while working in the dictionary edit/add window, it is easier to find specific rules.

The GB Aid is hard-coded into PTG.

How .AFF Aid works:

The first line of the .txt uses the three-letter ISO 639-3 code for the language, followed by the name of the language in the language itself, separated by, e.g., a colon: vec:Vèneto

You can still make separate versions of the file if you want a more specialised version (indicating the country of the dialect): vec-IT (for the Italian version) or vec-BR.

The standard list of codes can be found at: https://www-01.sil.org/iso639-3

You can also consult the list directly on the Web at https://www-01.sil.org/iso639-3/codes.asp

There is the possibility of a third field with the name of the language in English, to suit people who know neither the code nor the language, e.g. nld:Nederlands:Dutch .

So, for Veneto, see the example:
custom_aff_aid.vec.png ;16x16 PNG locale for the ComboBox.

custom_aff_aid.vec.txt ;Veneto Rules. See the ".vec" before the ".txt".
                       
In the .txt file we followed this line logic:
vec:vèneto:Veneto
a0/a1/a2: common verbs (first conjugation verbs, second conjugation arrhizotonic verbs)
b0/b1/b2: second conjugation rhizotonic verbs


So, the first line is what appears in the PTG ComboBox, and the rest are the rules.

Then place both files inside the folder: "custom_aff_aid"
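
As an illustration of the file layout described above, the following Python sketch reads such an aid file into a header and a list of rules. It is not how PTG itself parses the file: the function name `parse_aff_aid` and the returned dictionary keys are invented for this example, and it assumes a colon is used as the separator.

```python
def parse_aff_aid(text):
    """Split an .AFF Aid .txt into its header fields and its rule lines."""
    lines = text.splitlines()
    header = lines[0].split(":")
    aid = {
        "code": header[0],                                      # e.g. "vec"
        "native_name": header[1] if len(header) > 1 else "",    # e.g. "vèneto"
        "english_name": header[2] if len(header) > 2 else "",   # optional third field
        "rules": [],
    }
    for line in lines[1:]:
        if not line.strip():
            continue  # skip blank lines
        codes, _, description = line.partition(":")
        aid["rules"].append((codes.strip(), description.strip()))
    return aid


if __name__ == "__main__":
    sample = (
        "vec:vèneto:Veneto\n"
        "a0/a1/a2: common verbs\n"
        "b0/b1/b2: second conjugation rhizotonic verbs\n"
    )
    print(parse_aff_aid(sample))
```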



Number Separator Character:
Sets the character used to separate thousands when displaying the number of items.
For example: 1,000,000 or 1 000 000, etc.
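
A minimal Python sketch of this kind of formatting, for illustration only (`group_thousands` is a hypothetical name, not a PTG function):

```python
def group_thousands(number, separator=","):
    """Format an integer with the chosen character between groups of thousands."""
    # Python groups with commas; swap them for the preferred separator.
    return f"{number:,}".replace(",", separator)


if __name__ == "__main__":
    print(group_thousands(1000000))
    print(group_thousands(1000000, " "))
```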


Warn over N words decoding (RAM/HDD):
To avoid running out of RAM or hard disk space in wordlists with millions of entries, we have this option.

If more than N words are about to be processed, it shows a warning asking whether you wish to continue or abort.

For example, the option to search for duplicates fills the RAM with the decoded wordlist.

The option to extract the wordlist to HDD fills the space in the hard disk with the decoded wordlist.


Font for language:
If you have corrupt characters in your language, use this option to change the font and all gadgets will adapt their contents accordingly.

For example: for the Marathi language use the font: "Arial Unicode MS".


Check for Updates:
You may select an interval for automatically checking for PTG updates when it is run, or choose not to check at all.

If an update is available, a pop-up window informs you that there is a new version; if you choose to download it, you are taken directly to the download page: https://proofingtoolgui.org/#downloads


Affix colours:
It lets you select the font colour for primary words, prefixes, suffixes and both (CIRCUMFIX) in the dictionary editor window.



5.1f — Sort
Before releasing an official update of your work you should sort the items.

I came up with an idea to solve the PureBasic issue in which strings are sorted by character code (ASCII order), which makes words with diacritics appear at the bottom of the lists.

What I did was create a structured array with one field where I store each word in lowercase without diacritics, and another field where I store the original word.

Then, I sort by the first field and repopulate the list from the second.

Notice that I had to create a function to remove the accents; some letters may still be missing there and will be added when found/suggested.

With my code, letters with diacritics appear in the same order as if they didn't have diacritics, which is what is wanted for spellers.

Entries may change position on sorting because uppercase and lowercase are treated as equal.
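
The two-field approach can be sketched in Python as follows. This is an illustration of the idea, not the PureBasic code used in PTG: the names `FOLD`, `sort_key` and `sort_words` are made up, and the table holds only a handful of the conversions listed in this section.

```python
# A few of the accent conversions, for illustration only.
FOLD = {"á": "a", "à": "a", "ã": "a", "ç": "c", "é": "e", "ñ": "n", "ø": "o", "ž": "z"}


def sort_key(word):
    """Lowercase the word and replace each accented letter by its plain letter."""
    return "".join(FOLD.get(ch, ch) for ch in word.lower())


def sort_words(words):
    # Field 1 is the folded key, field 2 the original word.
    records = [(sort_key(word), word) for word in words]
    records.sort(key=lambda record: record[0])       # sort by field 1
    return [original for _, original in records]     # repopulate from field 2


if __name__ == "__main__":
    words = ["zebra", "água", "Abel", "casa", "ácido"]
    print(sorted(words))       # plain code-point order: accents end up last
    print(sort_words(words))   # accents sorted as if they were plain letters
```

Letters that are missing from the table keep their own character code, so they still sort after "z", which matches the remark that missing letters have to be added when found.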

Converted accents on 2020-05-08:
; a
"á","a"
"à","a"
"ä","a"
"ã","a"
"â","a"
"å","a"
"ā","a"
"æ","a"
"ă","a"


; b
"ḃ","b"

; c
"ç","c"
"č","c"
"ċ","c"
"ć","c"
"ĉ","c"

; d
"ḋ","d"
"đ","d"
"ď","d"


; e
"é","e"
"è","e"
"ë","e"
"ê","e"
"ĕ","e"
"ē","e"
"ė","e"
"ễ","e"


; f
"ḟ","f"

; g
"ģ","g"
"ğ","g"
"ġ","g"

; i
"í","i"
"ì","i"
"ï","i"
"î","i"
"ī","i"

; j
"ɉ","j"

; l
"ƚ","l"
"ł","l"
"ľ","l"


; m
"ṁ","m"

; n
"ñ","n"
"ń","n"
"ň","n"


; o
"ó","o"
"ò","o"
"ö","o"
"õ","o"
"ō","o"
"ø","o"
"ô","o"

; p
"ṗ","p"

; r
"ř","r"

; s
"ṡ","s"
"š","s"
"ș","s"

; t
"ṫ","t"
"ŧ","t"
"ť","t"
"ț","t"

; u
"ú","u"
"ù","u"
"ü","u"
"û","u"
"ū","u"

; w
"ẃ","w"
"ẁ","w"
"ŵ","w"
"ẅ","w"

; y
"ў","y"
"ý","y"
"ỳ","y"
"ŷ","y"
"ÿ","y"

; z
"ž","z"


5.1g — Find
It is possible to search for items in each tab by pressing CTRL + F or the "Find" button.

It is also possible to directly Edit/Delete items from the results by right-clicking the found items.

The options are very simple:
 1) Left match text: Searches for items that match the search expression from the left;
 2) Match case: Case sensitive search;
 3) Match extra information: Searches for text in all columns of the items and not just in the first.

The gadget tabs are:
 1) Matches: Shows the words that match the search text;
 2) Position: Shows the match position in main ListIconGadget;
 3) % Match: Shows how much the search text matches the found words. Notice that it includes separators and flags;
 4) Status: If you right-click and edit a word making changes, it will appear in red: "Changed".




You can abort an ongoing search by pressing <ESC>.

TIP:

Always check whether a word exists before adding it.

Press CTRL + F and type/paste the word there.

This takes a second and saves a lot of work.


5.2 — Dictionary
A dictionary is basically two files: one with the words+flags (.dic) and another with the flag rules (.aff) that derive new words from the .dic words.

For example, I could have in the affixes file some lines saying that a flag "S" would add a "123" to words, and in the .dic I could have a word "Marco/S" that would be decoded into:
Marco
Marco123

Flags are case sensitive, so "S" is different from "s".


PTG supports the following Dictionary features:
 1) FLAG chr, number and long;
 2) AF Compression;
 3) Twofold recursivity;
 4) NOSUGGEST flag;
 5) NEEDAFFIX flag;
 6) FORBIDDENWORD flag;
 7) CIRCUMFIX flag;
 8) KEEPCASE flag;
 9) COMPLEXPREFIXES flag (detects but doesn't decode yet).



5.2.1 — Creating a Dictionary
If you have a Dictionary in memory, use ERASE to delete all entries.

To create a Dictionary from scratch you just have to press the button ADD to add words.

Use EDIT or double-click to change information regarding the words.

Use DELETE or <DEL> to remove entries.

The Dictionary consists of two UTF-8 files with the extensions .DIC and .AFF.

Even though the tool reads the .AFF file, I still haven't read the documentation about how it works. This means that creating a Dictionary from scratch will require some previous knowledge.

The .DIC file is the list of words, and the .AFF file is a list of rules and other options.

See the first two paragraphs of: https://www.chromium.org/developers/how-tos/editing-the-spell-checking-dictionaries

Remember to SAVE/SAVE AS now and then to play safe.


5.2.2 — Editing a Dictionary
First download the extension of the language you intend to use, from the official pages.

You should have an .oxt or .xpi file which you rename to .zip in order to extract its contents to HDD.

Press OPEN and select the .dic file of the Dictionary and my tool will also open the associated .aff file.

Now just ADD/EDIT/DELETE the current entries.

Remember to SAVE/SAVE AS now and then to play safe.


5.2.3 — How Suffixes/Prefixes work



A short explanation of how suffixes/prefixes work, based on the e-mail written by Ricardo Palomares Martínez:

While editing dictionaries, you can add one or more identifiers after a word, following a "/". For example, the en_GB .AFF uses the identifier "S" to create the plural:
party/S

This will look in the .AFF file and find:
SFX S Y 9
SFX S y ies [^aeiou]y
SFX S 0 s [aeiou]y
SFX S 0 es [sxz]
SFX S 0 es [cs]h
SFX S 0 s [^cs]h
SFX S 0 s [ae]u
SFX S 0 x [ae]u
SFX S 0 s [^ae]u
SFX S 0 s [^hsuxyz]

SFX S Y 9
SFX → It is a suffix (PFX would mean a prefix).
S   → The suffix identifier.
Y   → Y for YES. It means the rule can be cross-used with other prefixes and suffixes.
       If N the rule can't be applied together with other affixes the word might have.
9   → The number of lines related to this rule.


SFX S y ies [^aeiou]y
SFX       → It is a suffix (PFX would mean a prefix).
S         → It is the suffix/prefix identifier.
y         → For a suffix it is the letter(s) to be removed from the end of the word.
             For a prefix, from the beginning of the word.
ies       → For a suffix, it is the letter(s) to be added at the end of a word.
             For a prefix, at the beginning of the word.

[^aeiou]y → Condition in regexp notation. Here, the rule is applied to words ending with
             a "y" whose second-to-last letter is NOT a, e, i, o or u.
             Yes, the ^ means that the letters mustn't match.

So, party/S would produce: parties

And boy/S would produce: boys, by triggering the following rule, which has a 0 meaning that no letters are removed, just added. It applies to words ending with a "y". There is no ^, which means that the second letter from the right must be a, e, i, o or u.
SFX S 0 s [aeiou]y

Also notice that if a word has capitalised letters, the Hunspell in the host software will only accept it with exactly the same capitalisation as in the .DIC (it reports a typo otherwise).
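
The way the "S" rules are matched can be tried out with a short Python sketch. It is only an approximation for illustration, not PTG's or Hunspell's actual decoder: the names `SFX_S` and `apply_suffix_rules` are made up, the nine rules are copied from the en_GB lines quoted above, and Python's `re` module stands in for the condition matching.

```python
import re

# (letters to strip, letters to add, condition) for flag "S".
SFX_S = [
    ("y", "ies", "[^aeiou]y"),
    ("0", "s", "[aeiou]y"),
    ("0", "es", "[sxz]"),
    ("0", "es", "[cs]h"),
    ("0", "s", "[^cs]h"),
    ("0", "s", "[ae]u"),
    ("0", "x", "[ae]u"),
    ("0", "s", "[^ae]u"),
    ("0", "s", "[^hsuxyz]"),
]


def apply_suffix_rules(word, rules):
    """Return every form produced by the suffix rules whose condition matches."""
    forms = []
    for strip, add, condition in rules:
        # A suffix condition is checked against the end of the word.
        if not re.search("(?:" + condition + ")$", word):
            continue
        if strip != "0" and not word.endswith(strip):
            continue
        stem = word if strip == "0" else word[: len(word) - len(strip)]
        forms.append(stem + ("" if add == "0" else add))
    return forms


if __name__ == "__main__":
    for word in ("party", "boy", "box", "bureau"):
        print(word, apply_suffix_rules(word, SFX_S))
```

A prefix would work the same way with the condition anchored at the start of the word and the letters removed/added at the beginning.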


5.2.4 — What is flag position and rule
The derived words ListIconGadget has the fields: "Flag Position" and "Rule".

"Flag Position" is the character position of the first line (header) of each rule used. For example:
SFX S Y 9
(It is a Suffix with identifier "S", cross-product "Yes" and "9" rules in it)

Then, inside the dictionary editor, you now have a column with the rule number after the header. Double-clicking in a ListIconGadget line will jump to the header, then you will just have to scroll a few lines down to the rule number.

Please notice that the editor gadget in the add/edit word window has a "clean" version of the .AFF with space repetitions removed in order to be faster finding the flags (fewer characters to process).

Regarding the rules:
[^abc]de[fghi]

[fghi] means the current character must be one of f, g, h, i.
de means the letters "de" must appear immediately to the left of the character checked above, if that one matched the condition.
[^abc] means the current character in the word must not be a, b or c.
See the example above for "party".

The rules may be checked either from right to left (suffix) or from left to right (prefix).

Prefixes are run against the primary word and its suffixed forms. This means that if you have in the en_GB speller:
party/S

You will get:
party (primary)
parties (suffix)


If you had:
party/SU

You would run the "U" code (prefix) against the two words above, giving:
party (primary)
parties (suffix)
unparty (prefix)
unparties (prefix)
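
The combination of a prefix with the suffixed forms can be sketched in Python like this. It is an illustration only: `expand` is a made-up name, and the sketch assumes that flag "U" simply adds "un" in front of the word and that both rules allow cross-use ("Y" in their headers).

```python
def expand(primary, suffixed_forms, prefix, cross_product=True):
    """List the primary word, its suffixed forms and the prefixed variants."""
    forms = [primary] + list(suffixed_forms)
    # With cross-use allowed the prefix is applied to every form,
    # otherwise only to the primary word.
    targets = forms if cross_product else [primary]
    return forms + [prefix + form for form in targets]


if __name__ == "__main__":
    print(expand("party", ["parties"], "un"))
```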



5.2.5 — Menus
5.2.5.1 — AFF Validate
Once it is coded, it will search for missing UTF-8 flags or duplicates of them, duplicate rules or missing rules, showing the problematic lines where the errors occur.


5.2.5.2 — Bulk Import
word1
word2
word3
etc.




5.2.5.3 — Import places/people names using possessives




5.2.5.4 — GB to AU/CA/NZ/ZA (Marco Pinto - GB)

(21/NOV/2018)
[09:10] <marcoagpinto> I came up with a brilliant idea for AU+CA+NZ spellers since Kevin Atkinson's versions suck
[09:10] <marcoagpinto> I will code a feature into Proofing Tool GUI in January, that can be used with my GB speller
[09:10] <marcoagpinto> +word1
[09:10] <marcoagpinto> -word2
[09:11] <marcoagpinto> if we find maintainers, they can create a .txt with a list of British words to remove and AU+CA+NZ specific words to add

"GB to AU/CA/NZ" <- MENU OPTION 2018-12-13
[15:22] <marcoagpinto> I still need to implement the flags merging to remove duplicates
[15:22] <marcoagpinto> and the feature that will allows to convert GB to AU+CA+NZ
[15:22] <marcoagpinto> :)
[15:22] <marcoagpinto> will allow*
[15:22] <marcoagpinto> all one needs is a list of (-) words that will be applied to the speller
[15:23] <marcoagpinto> :)
[15:23] <marcoagpinto> and add words from the countries
[15:23] <marcoagpinto> country specific words

[09:35] <marcoagpinto> darktrojan: I want to code a feature into Proofing Tool GUI that will create AU+CA+NZ spellers based on the GB one
[09:35] <marcoagpinto> there will be options to remove -ise/-ize from the .dic, as well as providing a list of words to remove from the .dic (GB only words)
[09:36] <marcoagpinto> is NZ only -ise?
[09:37] <%darktrojan> theoretically
[09:37] <%darktrojan> we tend not to care too much about that one
[09:37] <marcoagpinto> ahhhh
[09:37] <marcoagpinto> cool
[09:37] <marcoagpinto> and British words like "arse"? Do they exist in NZ?
[09:37] <marcoagpinto> :)
[09:38] <marcoagpinto> after the feature is implemented, someone will need to create a list of words to remove from GB
[09:38] <marcoagpinto> a .txt file with removal words


[13:12] <marcoagpinto> in a few months it will be possible to create AU+CA+NZ spellers based on the GB one!
[13:12] <marcoagpinto> :)
[13:12] <bearon> please stop saying my name
[13:12] <marcoagpinto> ahhhh
[13:12] <marcoagpinto> sorry
[13:13] <marcoagpinto> I have drawn/written the GUI for it
[13:13] <marcoagpinto> :)
[13:13] <marcoagpinto> on paper of course
[13:14] <marcoagpinto> one will be able to remove -ise or -ize according to the country
[13:14] <bearon> sounds like an interesting plan
[13:14] <marcoagpinto> for example, the future maintainer for AU could open the GB speller and select "remove all -ise" and then add AU words
[13:14] <marcoagpinto> :)
[13:14] <bearon> would be nice to have all the common words in all
[13:14] <marcoagpinto> and have a list of GB words to remove in a .txt
[13:15] <bearon> perhaps it'd be best to have one with the common words, and let GB be based on that as well
[13:16] <marcoagpinto> :)
[13:16] <marcoagpinto> well, the idea is also an option to "merged GB"
[13:17] <bearon> so you don't need to maintain a removal list, but can decide where to add the new word
[13:17] <marcoagpinto> I won't maintain such a list
[13:17] <bearon> since it's easier to forget to add something to the list of words to be removed ;)
[13:17] <marcoagpinto> the maintainers will
[13:17] <bearon> well, you have to
[13:17] <bearon> you add a new word to the GB speller
[13:17] <bearon> who will know if it's GB only or not?
[13:18] <marcoagpinto> "merge GB" + remove -ize/-ise + list of words to remove
[13:18] <marcoagpinto> :)
[13:18] <marcoagpinto> the maintainers?
[13:18] <marcoagpinto> the guys from CA+AU+NZ will know?
[13:18] <marcoagpinto> :)
[13:18] <marcoagpinto> we need to find people from these countries

5.2.5.5 — GB primary to -ize/-ise script (Marco Pinto - GB)


5.2.5.6 — Extract wordlist
Decodes and extracts the wordlist into a .txt file.


5.2.5.6.1 — All .txt


5.2.5.6.2 — All .csv


5.2.5.6.3 — Compounds .txt


5.2.5.6.4 — Compounds .txt (LanguageTool)



5.2.5.7 — Count wordlist
Decodes and counts the total number of words in the wordlist.


5.2.5.8 — Show/Merge/Delete duplicates .dic
The menu to search for duplicates in the dictionary will match two identical words, unless they have morphological information that differentiates them.
See the example of the Portuguese speller where each word is followed by morphological information:
celeste/p [CAT=adj,N=s,G=_]
Celeste [CAT=np,G=f,SEM=p]





The main purpose is to merge flags into primary words, delete repeated words and merge flags of the same kind (customised).

For example, in the GB speller:
Marco (primary)
Marco/S (suffix)
Marco/GD (suffixes)
Anna/UA (prefixes)
Anna/I (prefix)

Would result in merging the flags of common type (unless you change the checkboxes in the window for other behaviour):
Marco/SGD
Anna/UAI

These two words in the .dic would be the result.
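
The default merge from the example can be sketched in Python as below. This is only an illustration, not PTG's code: `merge_duplicate_entries` is a made-up name, it assumes one-character flags (FLAG chr), and it ignores morphological information and the checkboxes of the window.

```python
def merge_duplicate_entries(lines):
    """Merge .dic lines that share the same word, joining their flags."""
    merged = {}  # word -> list of flags, in order of first appearance
    for line in lines:
        word, _, flags = line.partition("/")
        seen = merged.setdefault(word, [])
        for flag in flags:  # every character is one flag
            if flag not in seen:
                seen.append(flag)
    return [word + ("/" + "".join(flags) if flags else "") for word, flags in merged.items()]


if __name__ == "__main__":
    print(merge_duplicate_entries(["Marco", "Marco/S", "Marco/GD", "Anna/UA", "Anna/I"]))
```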


5.2.5.9 — Show duplicates wordlist
It decodes the .dic to RAM, compares the whole wordlist for duplicates and opens a window with the duplicate words, the number of duplicates and the positions in the .dic that cause them.

You can export the results as a text file for easier removal of duplicates.




5.2.5.10 — Statistics
Decodes and counts the total number of words in the wordlist, also showing statistical information.


Now we can know the number of new words added to the speller at any time based on a reference value (good for release notes).

I will explain the extra functionality of the window based on the line from above: "4 characters long".

If you left-click in a line, it will show three options in the pop-up menu:
1) Show
You can also double-click in a line instead of manually selecting "Show".

It will open the window below based on the criterion of the selected line.

For example, in the window below it will show all the words that, once decoded, have 4 characters, including the flags that caused that to happen.

2) Extract
It will extract the words based on the criterion of the selected line.

No flags will be extracted, just the decoded words matching the criterion (a normal wordlist).

3) Extract with flags

It will extract the words based on the criterion of the selected line, including mentions of the primary/flags that generated them:
beer Flag:R (Primary:bee)
bees Flag:S (Primary:bee)





You can scroll in the ListIconGadget using the cursor keys and it will decode in the preview panel.

In the left panel you will see the words that match the criterion and in the right panel their decoding, including primary/flags.

Pressing the "Extract" button will do the same as 2) with the criterion based on the previous Statistics window.
It will do the task for all the words listed and not just for the highlighted one.

Pressing the "Extract with flags" button will do the same as 3) with the criterion based on the previous Statistics window.
It will do the task for all the words listed and not just for the highlighted one.


5.2.5.11 — Words missing in Master Wordlist
This lets you import a master wordlist and then other wordlists, analyse them, and show which words are missing from the master wordlist.

For example: for English I have my main wordlist (the one I am working on) and I found others on the Internet, such as the ones provided by Kevin Atkinson, so I thought about an easy way to find out which words are missing from my speller instead of checking by hand.

Or an easy way to find which words are missing in Kevin's en_AU, en_CA and en_US and report them to him.

On 25-OCT-2020 I provided wordlists to Kevin Atkinson on his GitHub with the words missing in CA + AU + US.

This feature coded into Proofing Tool GUI allows that.

I select the master wordlist (CA/AU/US) and the other (GB); then the software checks which words are missing from the master.
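
In essence this is a set difference. A minimal Python sketch of the idea, with the made-up name `missing_in_master` and plain lists of already decoded words as input:

```python
def missing_in_master(master, others):
    """Return the words found in the other wordlists but not in the master."""
    known = set(master)
    reported = set()
    missing = []
    for wordlist in others:
        for word in wordlist:
            if word not in known and word not in reported:
                reported.add(word)  # report each missing word only once
                missing.append(word)
    return missing


if __name__ == "__main__":
    master = ["colour", "centre"]
    others = [["colour", "arse", "centre"], ["arse", "ute"]]
    print(missing_in_master(master, others))
```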


5.2.5.12 — Sort flags .dic
Notice that the sort is not case-sensitive, so "Aa" can become "Aa" or "aA" (for example, in FLAG CHR): there is no defined order between the same letter in different cases.


5.2.5.13 — Crunch words with flags (Marco Pinto - GB)
This feature will add the flags to words, for example, in the Oxford Professor wordlist, if he had the words:
party
parties


After validating them in the official dictionary I would import them into the GB speller using the copy/paste option in the bulk import.

Then this feature adds the flags, so in the end those two words become one:
party/S

Flag "S" makes the plural in the rules file of the speller.

It took around 5 hours to scan the entire 90K-line .dic file to automatically add and later merge the words...

It is slow as hell... there are around 90K lines of words in the .dic file and each word has to be checked against the others... so, it should be about 90 000^2 tests... huge calculations...

Later, I added a setting to select the number of characters that must match on the left: 1, 2 or 3 for the loop ranges.

This increased speed a lot: with "2" it took 4 minutes instead of 5 hours for the 90K lines, and "3" took just 1 minute.
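
Why the setting helps can be seen in a small Python sketch: words are only compared with other words that start with the same characters, so far fewer pairs have to be tested. This is an illustration of the principle, not PTG's loop, and the name `candidate_pairs` is invented for it.

```python
from collections import defaultdict


def candidate_pairs(words, match_chars):
    """Yield only the pairs of words that share their first match_chars characters."""
    buckets = defaultdict(list)
    for word in words:
        buckets[word[:match_chars]].append(word)
    for group in buckets.values():
        for i, first in enumerate(group):
            for second in group[i + 1:]:
                yield first, second


if __name__ == "__main__":
    words = ["party", "parties", "boy", "boys", "bee"]
    for n in (0, 1, 2):
        # 0 means "no matching characters required", i.e. every pair is tested.
        print(n, len(list(candidate_pairs(words, n))))
```

With more matching characters each group gets smaller, which is the kind of drop the timings above reflect.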

"2" should be enough for a first approach and then "3" after you are sure that all the words you added with bulk import are at least 3 characters long.




5.2.5.14 — Fix invalid spaces
This option searches for invalid spaces in the list of words, such as two consecutive spaces or a leading space.

It requires a future update for smarter lookup, such as recognising remmed (commented-out) lines.
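
A minimal Python sketch of such a clean-up, for illustration only. `fix_invalid_spaces` is a made-up name, and treating trailing spaces as invalid too is an assumption of this example, not something stated above.

```python
import re


def fix_invalid_spaces(line):
    """Collapse runs of spaces into one and drop spaces at both ends."""
    return re.sub(r" {2,}", " ", line).strip(" ")


if __name__ == "__main__":
    print(repr(fix_invalid_spaces(" New  York")))
```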



5.3 — Thesaurus
The thesaurus has a .dat and an .idx file.

See for example the pt-PT .dat:
UTF-8
a cerca de|1
   (-) |a respeito de|sobre
a começar de|1
   (-) |a partir de|desde


The pt-PT .idx:
UTF-8
50864
a cerca de|6
a começar de|47


Can you see the logic?
The .idx says that "a cerca de" starts at byte 6 of the .dat and "a começar de" at byte 47 (byte = file position).

Both the .idx and the .dat are saved at the same time: for each .dat entry I save, I add its file position to the .idx.
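
The byte positions can be reproduced with a short Python sketch that walks through the .dat and records where each entry starts. It is an illustration, not PTG's code: `build_idx` is a made-up name, and the sketch assumes the second line of the .idx is the number of entries and that each "word|N" line is followed by N meaning lines.

```python
def build_idx(dat):
    """Build the text of an .idx from the bytes of a thesaurus .dat."""
    lines = dat.splitlines(keepends=True)
    encoding = lines[0].strip().decode("ascii")
    offset = len(lines[0])          # the first entry starts right after the header
    entries = []
    i = 1
    while i < len(lines):
        head = lines[i].decode(encoding).rstrip("\r\n")
        if not head:                # tolerate stray blank lines
            offset += len(lines[i])
            i += 1
            continue
        word, _, count = head.rpartition("|")
        entries.append(f"{word}|{offset}")
        block = lines[i : i + 1 + int(count)]   # entry line plus its meaning lines
        offset += sum(len(line) for line in block)
        i += len(block)
    return "\n".join([encoding, str(len(entries))] + entries) + "\n"


if __name__ == "__main__":
    dat = (
        "UTF-8\n"
        "a cerca de|1\n"
        "   (-) |a respeito de|sobre\n"
        "a começar de|1\n"
        "   (-) |a partir de|desde\n"
    ).encode("utf-8")
    print(build_idx(dat))
```

The offsets are counted in bytes, not characters, which matters for UTF-8 text because an accented letter such as "ç" takes two bytes.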

5.3.1 — Creating a Thesaurus

If you have a Thesaurus in memory, use ERASE to delete all entries.

To create a Thesaurus from scratch you just have to press the button ADD to add synonyms.

Use EDIT or double-click to change information regarding the synonyms.

Use DELETE or <DEL> to remove entries.

The Thesaurus is a UTF-8 file with the extension .DAT.

Remember to SAVE/SAVE AS now and then to play safe.


5.3.2 — Editing a Thesaurus
First download the extension of the language you intend to use, from the official pages.

You should have an .OXT file, which you rename to .ZIP in order to extract its contents to disk.

Press OPEN and select the .DAT file of the Thesaurus.

Now just ADD/EDIT/DELETE the current entries.

Now and then, remember to SAVE/SAVE AS to play safe.

In build 82 (14.Aug.2015) I improved the Thesaurus part. It is now possible to use DEL to delete synonyms, and there is a new menu, "Thesaurus Tools". Its most important option is "Combine", which combines all meanings but only works with simple lines. For example:
x|2
a
b
would generate:
a|2
x
b
and:
b|2
a
x
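
A minimal Python sketch of what "Combine" does with such simple lines, under the assumption that every synonym gets a new entry listing the original word plus the remaining synonyms. The function names combine and to_dat_lines are hypothetical, and the order of the lines inside each generated entry is an assumption.

```python
def combine(entry_word, synonyms):
    """Return one new entry per synonym: the head word plus the other synonyms."""
    group = [entry_word] + list(synonyms)
    combined = {}
    for synonym in synonyms:
        combined[synonym] = [word for word in group if word != synonym]
    return combined


def to_dat_lines(word, meanings):
    """Format an entry as 'word|count' followed by its meaning lines."""
    return [f"{word}|{len(meanings)}"] + list(meanings)


for word, meanings in combine("x", ["a", "b"]).items():
    print("\n".join(to_dat_lines(word, meanings)))
```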



PTG creates .idx files for the Thesaurus.


5.3.3 — Menus

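5.3.3.2 — Bulk Import
It imports a list with one entry per line, each a word followed by its synonyms, separated by commas:
word1,syn1,syn2
word2,syn1
etc.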


5.3.3.3 — Extract synonyms

5.3.3.4 — Clean up symbols (Hex to Emoji)

5.3.3.5 — Show/Merge duplicates

5.3.3.6 — Fix invalid spaces


5.3.3.7 — Combine/Sort/Deduplicate simple meanings


5.3.3.7.1 — Combine


5.3.3.7.2 — Deduplicate simple meanings

What is the definition of a "duplicate" meaning?

For example:
apple|3
one
two
one

Here the repeated "one" would be removed once, and the entry becomes:
apple|2
one
two

It checks line by line and not column by column:
apple|1
-|one|two|one

This entry wouldn't be changed, because the repeated "one" is inside a single line.
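
The line-by-line rule can be sketched in Python as follows. The function name dedupe_meanings is hypothetical and the sketch only illustrates the behaviour described above, not PTG's own code.

```python
def dedupe_meanings(word, meanings):
    """Drop repeated meaning lines, keeping the first occurrence of each."""
    seen = set()
    kept = []
    for line in meanings:
        if line not in seen:  # whole lines are compared, never columns
            seen.add(line)
            kept.append(line)
    return f"{word}|{len(kept)}", kept


print(dedupe_meanings("apple", ["one", "two", "one"]))
print(dedupe_meanings("apple", ["-|one|two|one"]))
```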


5.3.3.7.3 — Sort
"Sort simple meanings" also works line by line in the Thesaurus meanings.


5.4 — Hyphenation
.AFF files are opened in the first tab, "Dictionary".

When you open a .DIC in "Dictionary", it also opens the .AFF. Then, after opening a dictionary, you can go to the "Hyphenation" tab and open a hyphenation .DIC file.

Combining both allows you to hyphenate the words derived from the dictionary, or even to type words by hand.




5.4.1 — Creating a Hyphenation
Simply add the rules to the EditorGadget or open an existing one.


5.4.2 — Editing a Hyphenation
Open the hyphenation .dic of the language you intend to use or add new rules.

One can open a speller and then check hyphenation from its wordlist, or just use a hyphenation file for words without codes.

Thanks to Mauro Trevisan for explaining to me how the rules work.

Results can always be checked with: https://www.ushuaia.pl/hyphen/?ln=en

The "Rule" field in the ListIconGadget is for developers to debug while they create rules.

Example of rules, taken from Németh László's PDF:
. a l g o r i t h m .
   4l1g4
    l g o3
     1g o
           2i t h
               4h1m
  -----------------
   4 1 4 3 2 0 4 1
  a l-g o-r i t h-m = al-go-rith-m

In simple words, if you have the word "algorithm", it will (internally) add a space after each letter and a "." on each side:
". a l g o r i t h m . "
Then you may create the rules above in the EditorGadget, without spaces:
4l1g4
lgo3
1go
2ith
4h1m


If you use a dot:
.1go ; A dot on the left means the rule only matches at the start of the word.
4h1m. ; A dot on the right means the rule only matches at the end of the word.
.a4l1g4o3r2it4h1m. ; A dot on each side means that the whole word must match the rule.

Only the odd numbers are converted to hyphens.

If several rules match the word, the highest number in each column is kept. This is probably used for priorities, with higher numbers overriding lower ones.

Remember that the first word in the Hyphenator EditorGadget must be: UTF-8
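
The matching described above can be sketched in Python. This is a simplified illustration with hypothetical function names (parse_rule, hyphenate): it covers numeric rules and dots only, ignores rules that already contain a hyphen, and does not apply LEFTHYPHENMIN/RIGHTHYPHENMIN.

```python
def parse_rule(rule):
    """Split a rule such as '4l1g4' into its letters and gap values."""
    letters = ""
    values = [0]  # values[k] is the number placed before letters[k]
    for ch in rule:
        if ch.isdigit():
            values[-1] = int(ch)
        else:
            letters += ch
            values.append(0)
    return letters, values


def hyphenate(word, rules):
    """Apply every matching rule, keep the highest number per gap,
    and put a hyphen wherever the final number is odd."""
    if not word:
        return word
    padded = "." + word + "."
    scores = [0] * (len(padded) + 1)  # scores[i] is the gap before padded[i]
    for rule in rules:
        letters, values = parse_rule(rule)
        start = padded.find(letters)
        while start != -1:
            for k, value in enumerate(values):
                scores[start + k] = max(scores[start + k], value)
            start = padded.find(letters, start + 1)
    result = word[0]
    for j in range(1, len(word)):
        if scores[j + 1] % 2 == 1:
            result += "-"
        result += word[j]
    return result


print(hyphenate("algorithm", ["4l1g4", "lgo3", "1go", "2ith", "4h1m"]))
```

With the five rules of the example it prints al-go-rith-m, matching the table above.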

On build 127, thanks to Mauro Trevisan, I improved the parsing of rules, to accept repeated rules per word:

For the word "ultrateren":

. u l t r a t e r e n .
   1l
   2l t
     1t    1t
      t2r
     1t
       1r      1r
                   1n
                   2n .
 -----------------------
   2 1 2   1   1   2
  u l-t r a-t e-r e n = ul-tra-te-ren

Rules:
1l
2lt
1t
t2r
1t
1r
1n
2n.



Mauro also told me about another type of rule, which already contains a hyphen.

For the word "inportar":

. i n p o r t a r .
      p o-r t a r
  i n1
 -------------------
     1   -
  i n-p o-r t a r = in-po-rtar

Rules:
po-rtar
in1


Press OPEN and select the hyphenation .DIC file.


Now and then, remember to SAVE/SAVE AS to play safe.


5.4.3 — Menus

5.4.3.1 — HYP Validate

Once it is coded, it will search for missing or duplicated UTF-8 flags, duplicate rules and rules without numbers, showing the problematic lines where the errors occur.


5.4.3.2 — Fix invalid spaces


5.5 — Autocorrect



5.5.1 — Creating an Autocorrect

Since build 201, Proofing Tool GUI refuses a word that is entered both as correct and as incorrect, which avoids the mistake of having words that autocorrect to themselves.

5.5.2 — Editing an Autocorrect
First download the DocumentList.xml of the language you intend to use, from the official AOO/LO pages.

The autocorrect files in AOO/LO are stored in the path:
$instdir/share/autocorr/acor_*.dat which are actually zipped files containing the XML files.

Rename the .DAT files to .ZIP and extract the contents.

Using Notepad++ or another tool, format the DocumentList.xml so that it is in UTF-8 and uses the following structure (you can copy/paste this first line over the XML entry):
<?xml version="1.0" encoding="UTF-8" ?> <block-list:block-list xmlns:block-list="http://openoffice.org/2001/block-list">
<block-list:block block-list:abbreviated-name="incorrect1" block-list:name="correct1"/>
<block-list:block block-list:abbreviated-name="incorrect2" block-list:name="correct2"/>
<block-list:block block-list:abbreviated-name="incorrect3" block-list:name="correct3"/>
<block-list:block block-list:abbreviated-name="incorrect4" block-list:name="correct4"/>
    etc. (use lines like the previous)
</block-list:block-list>

Bear in mind that each line must contain only one incorrect/correct pair. I noticed that the .XML I edited had all the text on one single line, so use Notepad++ to insert a line break after each entry.
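
To check a DocumentList.xml outside PTG, the pairs can be read with a short Python sketch based on the standard xml.etree.ElementTree module. The function name read_pairs is hypothetical; the namespace and attribute names are the ones shown in the structure above.

```python
import xml.etree.ElementTree as ET

NS = "http://openoffice.org/2001/block-list"


def read_pairs(xml_bytes):
    """Return a list of (incorrect, correct) pairs from a DocumentList.xml."""
    root = ET.fromstring(xml_bytes)
    pairs = []
    for block in root.iter(f"{{{NS}}}block"):
        incorrect = block.get(f"{{{NS}}}abbreviated-name")
        correct = block.get(f"{{{NS}}}name")
        pairs.append((incorrect, correct))
    return pairs


sample = (
    b'<?xml version="1.0" encoding="UTF-8" ?> '
    b'<block-list:block-list xmlns:block-list="http://openoffice.org/2001/block-list">\n'
    b'<block-list:block block-list:abbreviated-name="incorrect1" block-list:name="correct1"/>\n'
    b'<block-list:block block-list:abbreviated-name="incorrect2" block-list:name="correct2"/>\n'
    b'</block-list:block-list>\n'
)
print(read_pairs(sample))
```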


Press OPEN and select the DocumentList.xml file.

Now just ADD/EDIT/DELETE the current entries.

Now and then, remember to SAVE/SAVE AS to play safe.


5.5.3 — Menus

5.5.3.1 — AC Validate
Opens a window with validation tests and shows the results in an EditorGadget which can be extracted to a .txt file.

It has two steps:

Step 1:
It checks if the correct entries exist in the speller, thus the need to have a loaded dictionary.


Step 2:
Checks for invalid suggestion patterns such as:
color,colour
colour,color
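
A minimal Python sketch of this kind of check, assuming that an "invalid pattern" is a pair that is reversed by another pair or a word that corrects to itself. The function name find_loops is hypothetical and PTG's real test may cover more cases.

```python
def find_loops(pairs):
    """Return the (incorrect, correct) pairs that undo each other
    or that replace a word with itself."""
    lookup = set(pairs)
    loops = []
    for incorrect, correct in pairs:
        if incorrect == correct or (correct, incorrect) in lookup:
            loops.append((incorrect, correct))
    return loops


pairs = [("color", "colour"), ("colour", "color"), ("speeking", "speaking")]
print(find_loops(pairs))
```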


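5.5.3.2 — Bulk Import
It imports a list with one pair per line, the incorrect word followed by the correct one, separated by a comma:
incorrect1,correct1
incorrect2,correct2
etc.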



5.5.3.3 — Extract autocorrects
It extracts to a .txt file the list of autocorrect entries, with the structure:
diden't → didn't
speeking → speaking



5.5.3.4 — Clean up hex symbols


5.5.3.5 — Show/Delete/XDelete duplicates
It shows, deletes or eXclusive Deletes duplicate entries in the autocorrect list.

For example, Delete will keep one of the duplicates found:
word1
word1
word1
word2
word2
word3


It will keep:
word1
word2
word3



eXclusive Delete will remove all entries if they have at least one duplicate:
word1
word1
word1
word2
word2
word3

It will only keep:
word3

This is useful if you have appended (merged) your list into an existing autocorrect file and want to commit to Gerrit (LibreOffice) only the entries that are not present in both your list and Gerrit's.
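
The difference between the two operations can be sketched in Python. The function names are hypothetical and the lists are the ones from the examples above.

```python
from collections import Counter


def delete_duplicates(items):
    """Delete: keep the first copy of every entry."""
    return list(dict.fromkeys(items))


def exclusive_delete(items):
    """eXclusive Delete: keep only the entries that occur exactly once."""
    counts = Counter(items)
    return [item for item in items if counts[item] == 1]


entries = ["word1", "word1", "word1", "word2", "word2", "word3"]
print(delete_duplicates(entries))
print(exclusive_delete(entries))
```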


5.5.3.6 — Fix invalid spaces


5.6 — LanguageTool


5.7 — Language Specific


5.8 — Extension


5.8.1 — Menus


5.8.1.1 — .xpi/.oxt Properties



6 — History

V3.0 — ??.???.2020
Compiled with PureBasic 5.XX.

a) General
 — The manual has been rewritten;
 — Supports more window resolutions;
 — Several new options and gadgets;
 — It uses dynamic arrays making all load/save operations ultrafast;
 — Shortcut keys;
 — Enhanced pop-up menu which can be used on the ListIconGadgets items with options for smart/faster use;
 — Invalid characters, such as spaces, while inserting data, turn the gadget's background red;
 — Created a menu "Prefs" with settings that load and save in a file named "ptg3.prefs"
   (Prefs allow selecting a dynamic number of lines. This makes it compatible with all OSes);
 — It saves Dictionary + Thesaurus + Hyphenation + Autocorrect with #LF$ instead of #CRLF$ for Linux mode;
 — Speeded up several operations;
 — Better UTF-8 warnings;
 — GUI improvements and fixes;
 — Modern menus look for Windows;
 — Pop-up menu has an extra option "Clone" (very useful);
 — Added "Not coded yet!" to tabs not coded yet;
 — New About PureBasic menu icon;
 — Renamed "Quit" menu item to "Exit";
 — Created a colour constant #GreenDark that is better seen on a grey background;
 — Coded undo/redo;
 — New button to move to the bottom of the ListIconGadget;
 — Help menu item now shows "F1" in it;
 — F1 (help) now works in the edit/new items windows;
 — Formats the total number of items (numbers) to be better seen;
 — There is a "Recent Files:" entry in the "File" menu;
 — It is possible to Edit/Delete from the "Advanced Find" window;
 — Cleaned the code;
 — HTML Help menu item;
 — Added images to pop-up menus;
 — Message Requesters now have a sign;
 — About window now shows www.proofingtoolgui.org as the project's page;
 — The default separator chr for numbers is now a comma;
 — New pop-up menu option "Invert Selection";
 — Changed the e-mail address in "About" window to Sapo;
 — Improved the check for updates code.

b) Linux
 — GTK3 support;
 — Now there is an icon while running;
 — Apache License link now works;
 — Fix: Get maximum window on PREFS now works (still doesn't work with Ubuntu 16.04, only with 17.10 and maybe 17.04);
 — Fix: Highlighted item during a CURS UP at the top seems to only be removed after the scroll;
 — Fix: Highlighted item during a CURS DOWN at the bottom seems to only be removed after the scroll.

c) Dictionary
 — Pop-up menu to copy the selected line word into the clipboard;
 — Taboo warning if the NOSUGGEST flag is used;
 — It is possible to have custom "AFF Aid" files with 16x16 PNG flags;
 — Replaced the ListIconGadget field "Position" with "Code Position";
 — Added recursivity to dictionary word decoding (twofold);
 — Support for FLAG NUM and LONG;
 — Improved: If a code isn't found in the .aff it no longer exits the decoding function;
 — Major speed gain in the .AFF optimising code (gl_ES);
 — It accepts: \/ are escaped "/" in dictionary words;
 — It combines PREFIXES against PRIMARY+SUFFIXES;
 — Support for extracting wordlist compounds for LanguageTool and check for duplicates makes use of morphological information;
 — The check for duplicates in dictionary also shows the line numbers;
 — Added "Missing Codes:" to the Dictionary editor;
 — Window enlarges according to window size definition;
 — Major speed gain in "Optimising .AFF decode cache" (up to ~30% faster);
 — Fix: Check for duplicates improved (morphologic data);
 — Fix: Decoding of rules with a 0 (pt_PT).

d) Autocorrect
 — Added Exclusive Delete;
 — Allow to try to open a file with a non-valid header.

e) Hyphenation
 — Coded the hyphenator.

f) Thesaurus
 — Sorting synonyms naturally by replacing | with chr(9) before sorting and restoring | afterwards;
 — Update number of meaning while editing synonyms supports Mac OS line endings;
 — Saving the Thesaurus creates an .idx file;
 — It is possible to extract a thesaurus in book format (can be later converted to PDF);
 — EditorGadget has a grey tip on it;
 — Window enlarges according to window size definition.


On build 198:

 — Compiled using PureBasic 6.00 LTS;
 — Fix: Wordlists importing would cause refresh of GUI;
 — "Crunch words with flags (Marco Pinto - GB)":
   1) Fix: It now recognises POS information;
   2) Fix: Use wrong flag for 1 chr words for non-derivable words;
   3) Major speed gain: around twice the speed;
   4) Added "Characters match" tab;
   5) GTK3 ready.
 — Updated colour constants to 2022-08-06;
 — Included common_procedures_20220814;
 — Coded statistics by letter and statistics are now GTK3 ready;
 — The Master wordlist extraction now has a button to extract uppercase-only words;
 — "Extract to bottom" in pop-up menu now has "(PFX)" and "(All)";
 — Initial support for mouse wheel scroll in the virtual ListIconGadgets
   (DOESN'T WORK PROPERLY YET WITH PUREBASIC 6.00 LTS);
 — One procedure for all tabs to update the virtual vertical scroll bars;
 — Removed from LanguageTool part "-ize/-ise .txt (English)";
 — Coding of "Sort regexp";
 — New pop-up menu option "Copy flag rule" in dictionary editor;
 — Added option to Language Specific Tools: "Convert XML tags to accents wordlist (very slow)".

On build 197:
 — Updated PureBasic to 6.00 beta 8, now compatible with Ubuntu 22.04 LTS;
 — Fix: It now correctly enables the undo/redo buttons on each tab and menu items.

On build 196:
 — Updated PureBasic to 6.00 beta 6;
 — Tip on prefs improved and changed position;
 — Prefs default to full-screen resolution and three simple rules for number of lines in full screen;
 — Main GUI 100% Ubuntu 20.04 LTS gadgets size;
 — Moved up the gadgets 7 pixels on each tab of the main window;
 — New Linux icon in 64x64;
 — add-on image 24x24 now is grey when no add-on opened;
 — Replaced "Filename:" with "File:";
 — Disable main GUI for input/output operations;
 — Started working on the use of .xpi/.oxt;
 — "Words missing in Master Wordlist" shows filesizes in ListIconGadget;
 — Words editor now has a tip saying that COMPLEX PREFIXES are not supported yet;
 — Edit autocorrect, the text boxes now have red ink for incorrect and green ink for correct;
 — LanguageTool: Improved the extract rules structure information;
 — Optimised the .AFF decode cache;
 — Included common_procedures_20220210.

On build 193–195:
 — Updated PureBasic to 6.00 beta 1;
 — Improved GUI to GTK 3.20 (Ubuntu 20.04);
 — Initial support for "AutoText" (view only);
 — Added "eye glasses" to "read only" files (AutoText + LanguageTool);
 — Added "trashcan" image to clear the file history in the menu;
 — Coding of "Delete Duplicates small wordlist (very slow)";
 — Started showing the time taken in timely operations (WIP);
 — Add-ons icon for open .xpi/.oxt (still not working);
 — Improved the LanguageTool Structure Extract;
 — Included common_procedures_20211130b.

On build 192:
 — Fix: "Show/Merge/Delete duplicates in .dic" would not detect "All Prefixes";
 — Improved "GB to AU/CA/NZ/ZA (Marco Pinto - GB)";
 — Rewrote the "Preferences";
 — Added spaces on left regarding autocorrect (prefs + core);
 — Added tab "AutoText" plus corresponding places in the last files list;
 — Updated the common procedures to V20210720.


On build 189–191:
 — Fix: Root word in exporting as .CSV;
 — Fix: LanguageTool: Trying to export empty rules structure now shows "Aborted.";
 — Fix: Fix font issue in Prefs;
 — Improved "GB to AU/CA/NZ/ZA (Marco Pinto - GB)";
 — Cleaned the Prefs code and disabled 1024x resolutions in it;
 — Improved LanguageTool rules decoding;
 — Created an option in the pop-up menu "Extract to bottom" (not working yet);
 — The sizes of load/save main files for each tab shows the files label. Ex: "(aff:1 MB)+(dic:10MB)";
 — Cleaned the code;
 — Updated the common procedures to V20210703.

On build 188:
 — Fix: Bulk Import: empty lines aren't added to the .dic;
 — Replaced all "Okay" occurrences with "OK" (Olivier Hallot);
 — Options that use file open/save show on the status bar the filesize after "OK.";
 — Prefs: "0" disables above N words warning (Olivier Hallot);
 — LanguageTool:
   1) Basic support to open a grammar.xml file;
   2) Bulk Import Multiwords now accepts more POSs + combo box improvements;
   3) Added menu item to export rules structure.
 — "Show/Merge/Delete duplicates .dic":
   1) Now has the "Process" + "Export" button disabled if results=0;
   2) Now uses an EditorGadget line buffer to avoid refresh CPU load.
 — Added menu item to delete duplicates small wordlist;
 — Extract wordlist as CSV now has a "Root_word" field (Shantanu Oak);
 — Updated the common procedures to V20210113.

On build 183–187:
 — Compiled with PureBasic 5.73 LTS.
   Its beta 3 fixed Hindi and Korean fonts:
   https://www.purebasic.fr/english/viewtopic.php?f=4&t=76204;
 — Fix: Certain parts of PTG would make it stop responding if large amounts of data
        were used (such as the LibreOffice pt-BR dictionary);
 — Major GUI improvements all over, such as:
   1) Added version information to title bar;
   2) "Clear" buttons now use a trashcan image;
   3) Several windows now have the go to top/bottom buttons;
   4) "AC Validate" supports GTK3 and reordered its buttons;
   5) Partial coding of the "LanguageTool" tab, adding extra buttons;
   6) Options using file save/extract show on the status bar the filesize.
 — New menu items:
   1) "Crunch words";
   2) "Crunch POS";
   3) "Wikipedia rules testing";
   4) "Sort small wordlist";
   5) "Release notes".
 — Adding of F1 to open the help in various windows (not fully implemented yet);
 — Dictionary:
   1) Major improvements in "AFF Validate";
   2) Coded showing twofold missing flags;
   3) Coding of "Words missing in Master Wordlist";
   4) Coded "Sort flags".
 — Autocorrect:
   1) "AC Validate": disables the gadgets before processing and enables after;
   2) "AC Validate": asks if continue when extracting words goes beyond the safe value on the prefs.
 — LanguageTool:
   1) Added more POSes.
 — Updated the common procedures to V20201202.

On build 177–182:
 — Fix: CIRCUMFIX flag (Brandon)(Mauro);
 — Fix: CTRL+A works in the EditorGadget of the dictionary and hyphenation;
 — Fix: Dictionary Editor words starting with / (comments) accept spaces at end;
 — Major cleanup of the process pfx/sfx procedure;
 — Yellow flag in Dictionary editor if input is a rem;
 — Several parts have go to bottom/top buttons:
   1) Bulk import;
   2) Names/Places import;
   3) AFF Validate;
   4) AC Validate;
   5) GUI tabs.
 — Added a "Clear" button:
   1) Bulk Import;
   2) "Import places/people names".
 — "Show/Merge/Delete duplicates .dic" has a "Remember" button;
 — "Show duplicates wordlist" shows the total of words in the wordlist;
 — In the merge flags in .dics there is an EditorGadget and buttons;
 — Thesaurus: coded "Sort on number";
 — Initial LanguageTool support on menu items:
   1) Exporting POS (unfinished);
   2) Bulk Import Multiwords.
 — Updated the common procedures to V20200508.

On build 163–176:
 — Fix: PureBasic 5.72 LTS updated all libraries fixing a vulnerability in the RegExp library;
 — Fix: Now in Linux, it scrolls to the cursor position of the flags in the Dictionary editor;
 — Fix: Long filenames now fit in the GUI;
 — Fix: NEEDAFFIX flag (Shantanu);
 — Fix: CIRCUMFIX flag partially (Brandon);
 — All file operations now use UNIX #LF$ BOM;
 — Better warning if no "SET UTF-8" is found in the .AFF while loading a .DIC;
 — Added menu item "Sort flags .dic";
 — Dictionary Editor:
   1) Coding of Shantanu's summary;
   2) New "Match" gadgets.
 — Started coding the "AFF Validate" (almost ready);
 — New icon in menus for exporting files;
 — About window URL now points to HTTPS;
 — Updated the common procedures to V20200322.

On build 152–162:
 — Fix: Opening more than one instance of PTG says that the prefs file is invalid (file not closed).
 — Dictionary Editor:
   1) Checks inside a rule if it has the NEEDAFFIX FLAG;
   2) Allows sorting by column (Shantanu Oak);
   3) Replaced "Flag" symbol with text.
 — Statistics:
   1) Coding the extra options for the statistics (Shantanu Oak);
   2) Extract with and without flags option (Shantanu Oak).
 — Added one new menu: "Extension Tools".
 — Added submenus:
   1) Thesaurus Validate;
   2) eXclusive Master Wordlist;
   3) .xpi/.oxt Properties;
   4) Preview Thesaurus.
 — Replaced "GB to AU/CA/NZ" with "GB to AU/CA/NZ/ZA" (added also ZA).
 — Replaced "UTF-8 WITHOUT BOM" notices with just "UTF-8".
 — About window now shows "Version"+"Build"+"(beta)" (no more hyphen).
 — Support for Latin sorting:
   1) Dictionary;
   2) Thesaurus;
   3) Autocorrect.
 — Updated the common procedures to V20191210.

On build 151:
 — Uses PureBasic 5.71 LTS;
 — Common Procedures include (common functions for all my projects to avoid repeating the code);
 — Statistics now removes ".00" for % with round values;
 — Find:
   1) Has two extra tabs: "% Match" and "Status";
   2) Pressing CTRL + F places the cursor in the search string gadget.

On build 150:
 — First column of the main ListIconGadgets now in grey (more professional);
 — Dictionary Editor:
   1) Added improved GB rules to .AFF Aid Language;
   2) Coded "Remove flag from textbox" in pop-up menu;
   3) Coding of COMPLEXPREFIXES (only detects it, it doesn't work).
 — Autocorrect: New wrong/right images;
 — Ubuntu 18.10+: Main window gets a bigger height panel gadget if the OS is Linux.

On build 148–149:
 — Fix: CIRCUMFIX flag, thanks to Brandon and Mauro;
 — Dictionary editor: Added 10 pixels to each field of the ListIconGadget for larger fonts usage.

On build 147:
 — Added a new menu item: "GB primary to -ize/-ise script";
 — Several parts now show "<ESC> to abort." during scan;
 — Fix: "Show/Merge/Delete duplicates .dic" variable in function used by value (undo not working properly);
 — Preferences allow to choose the font for the language.

On build 146:
 — Linux requires GTK3 (Ubuntu 18.04 LTS or above);
 — "GOTO" window shows the number of items between 1 and range;
 — Statistics use two decimal places in the %.

On build 145:
 — Coded "GB to AU/CA/NZ".
 
On build 144:
 — Saving the dictionary also shows "Saving .aff file...";
 — Autocorrect string gadgets are now wider;
 — Dictionary editor:
   1) Added a refresh button;
   2) Allows to refresh decoding during processing, and OKAY and CANCEL.
 — Redesigned the About PureBasic window;
 — Finished coding "Show/Merge/Delete duplicates .dic" and added colours to affixes;
 — Now it uses a folder "settings" for the various working files;
 — Hyphenation:
   1) Decodes also "=";
   2) Supports commented words (turns StringGadget to yellow);
   3) Uses colours in the flags (Prefix, Suffix, Both);
   4) "No matches" in red if no rule applied.

On build 143:
 — Fix: "Preferences": Small GUI fix for Linux;
 — Fix: "Show/Merge/Delete duplicates .dic": Small GUI fix for Linux;
 — Added support for "KEEPCASE" flag;
 — In the dictionary editor:
   1) Increased the delay before refreshing the derivates while typing;
   2) Refreshing derivates now updates the number found during processing.
 — Continued coding "Show/Merge/Delete duplicates .dic": it will only respect the settings for
   morphological information and merge primary words and delete exact duplicates. I have been
   trying to code the merging of flags but there is a bug in the code which will require more
   attention. In simple words: the merging of flags is bugged and doesn't do anything.

On build 142:
 — Fix: Improved the Preferences window;
 — "Show duplicates wordlist" now clearer export to read and exports in UNIX format with BOM;
 — Export wordlist now exports in UNIX format with BOM;
 — Extracting as CSV the rules number have zeros before to make sorting easier in Calc/Excel;
 — Added to dictionary editor pop-up menu:
   1) "Copy all words"
   2) "Copy all words & rules"
 — Continued coding "Show/Merge/Delete duplicates .dic" ("Process" button still not coded).

On build 141:
 — Fix: Description of two GB rules in .AFF Aid Language by Ding-adong;
 — Fix: No longer decodes commented dictionary words (starting with "/");
 — Fix: Decoding of prefixes using twofold if there were flags of suffixes after flags of prefixes;
 — Commented dictionary words are now displayed in yellow in the ListIconGadgets;
 — Dictionary editor: Change background colour to yellow if word is commented (starts with "/");
 — Continued coding "Show/Merge/Delete duplicates .dic";
 — It is now possible to abort an advanced find in progress with <ESC>;
 — Speeded up decoding affixes;
 — "Warn over N words" while extracting and counting duplicates on wordlist.

On build 140:
 — Fix: Removed flag "=" to be recognised as morphological information on dictionary load;
 — Dictionary editor now has in the pop-up menu "Copy word & rule";
 — Minor speed up decoding affixes;
 — Replaced prefix flag "O" with flag "^" GB rule in .AFF Aid Language by Ding-adong;
 — Started coding "Show/Merge/Delete duplicates .dic" (only "Scan" and "Export" working);
 — Dictionary words editor now allows to jump to flags position also in Linux and Mac.

On build 139:
 — Better decoding of affix rules, including dots, by using the RegularExpressions library;
 — Better icon for help;
 — "unduplicate" is now called "deduplicate";
 — Two new dictionary options (not working yet):
   1) Show/Merge/Delete duplicates .dic;
   2) GB to AU/CA/NZ.
 — E-mailing Marco in the About window now adds the PTG version+build to the subject;
 — Added new GB rule to .AFF Aid Language created by Ding-adong.

On build 138:
 — Fix: Preferences: Combobox icons with colours now appear straight in Linux;
 — Preferences: If "Maximise window" height <600 pixels, the radio button gets disabled;
 — CIRCUMFIX finally coded.


On builds 134–137:
 — Compiled with PB 5.70 LTS;
 — Major speed up in "AC Validate" in Ubuntu, thanks to the forum user #NULL;
 — Added one sentence in check for updates, if up to date: "Your version is up to date.";
 — Coding of "Bulk import" for dictionary+thesaurus+autocorrect;
 — Fix: "Bulk import" was doing undo for dic + thesaurus + autocorrect (for all on abort);
 — Import the places/people names now accepts empty dictionaries;
 — Linux: "Bulk Import" and "Import Proper names" can now close the window in the gadget;
 — Edit/New Dictionary word window now shows "FLAG CHR", "FLAG NUM" OR "FLAG LONG";
 — Edit/New Dictionary word now shows "CIRCUMFIX";
 — Edit/New Dictionary word now supports two instances of "0" in the rules;
 — Fix: replaced regarding prefixes conditions_per_line_sfx$(4) with conditions_per_line_pfx$(4);
 — Coded the "CIRCUMFIX" flag (doesn't decode properly yet, just informs if it is used);
 — Import places/people names now also removes the chrs "<", ">", " - ", " − " and " — ";
 — Replaced some more "words" with "dictionary words";
 — In the "Statistics" the total number of words is now a StringGadget (can copy its contents);
 — Now the dashes are larger in the confirm quit for files changed;
 — Now the dashes are larger in the copyright years, PB version and About window;
 — Added two "Copy" (to clipboard) buttons in statistics;
 — Preferences now has an option for "Both" in Affixes (CIRCUMFIX);
 — Improved the warning message when "AC Validate" is selected and there is no dictionary.

On build 133:
 — Check for updates now checks if build on the site data is outdated;
 — Allows to import places/proper names with possessives (it still doesn't check if the flags exist in any position);
 — Better Linux icon;
 — Better looking logo in the about window;
 — Updated the check for updates URL;
 — "Recent Files" menu item now has an icon;
 — If checking for updates and found, the OKAY button now directs the browser to the download bookmark;
 — Pop-up menus now use the flag: #PB_Menu_ModernLook (Windows only);
 — Floppy icon on tabs (blue/red);
 — Coding of the "AC Validate";
 — Replace the warning "No words found." with "No dictionary words found.".

On build 132:
 — Cleaned the decoding of prefixes and suffixes;
 — Executable icon no longer stretched (Windows);
 — Menu items in "Hyphenation" (not working yet):
   1) HYP Validate;
   2) Fix invalid spaces.
 — Added undo/redo buttons to Hyphenation tab (disabled);
 — Default resolution is now 1280×600;
 — Dictionary editor:
   1) Code (flag) is now an Emoji;
   2) Taboo image now has a tip;
   3) Missing flags is now written in red;
   4) Colours for "Suffixes" (#GreenDark) and "Prefixes" (#Blue)
      (Prefs allows to customise colours for "Primary", "Suffix" and "Prefix");
   5) Better check if a word is valid before exiting;
   6) Image for NEEDAFFIX and FORBIDDENWORD;
   7) Fix: NEEDAFFIX and FORBIDDENWORD variables reset at start of ADD/EDIT.
 — About window:
   1) Added (replaced text with) Emoji to Name, E.mail and S.mail;
   2) E-mail and URL now use the function GadgetFit to avoid Linux size/positioning issues.
 — Replaced the word "Tags" with "Postags";
 — Statistics:
   1) Now allows to calculate the number of new words since the last mentioned number;
   2) Now exports in Linux format and also the extra information;
   3) Optimised slightly the processing regarding uppercase words.
 — Changes show a colour asterisk in red;
 — Cleaned the code by using a function to turn the gadgets background to Red/White on errors.

On build 131:
 — Speed up in dictionary processing by searching first for "SFX Y" instead of "SFX N";
 — Fix: Count words now separates thousands with the preferences character;
 — Improved error handling in hyphenator;
 — Taboo's warning sign is now an image and is stored in a long instead of a string for speed up;
 — New strategy for extra speed decoding the dictionary, but it takes more RAM;
 — New pop-up menu option: "Copy" for LanguageTool;
 — Dictionary: Combobox of ".AFF Aid" wider for bigger windows;
 — Fix: Decoding of prefixes wasn't working for some suffixes under certain conditions
        (due to the new "SFX N" code).

On build 130:
 — Fix: In Hyphenation tab it was not possible to use CURS UP/DOWN in the Hyphenation ListIconGadget;
 — New pop-up menu option: "Move" for LanguageTool;
 — Coding of "Check for Updates";
 — Fix: Coded "SFX N" rules, by not applying prefixes to it (Mykhailo Oliinyk);
 — Cleaned the code a bit.

On build 129:
 — Hyphenation: Fixed/Improved the display of rules (Mauro Trevisan):
   1) No spaces;
   2) Dots.

On build 127–128:
 — Hyphenation:
   1) Pop-up menu: "Copy word","Copy rule","Copy word & rule";
   2) Support for "-" in hyphenation rules;
   3) Support for "%" in hyphenation rules;
   4) Fix: Hyphenation now supports repeated rules per word.
 — Major Speed up of hyphenation;
 — Major Speed up of dictionary processing.

On build 124–126:
 — Improved the manual a lot;
 — The tab "Dictionary" now has "*.dic + *.aff" to make it easier to understand;
 — Apache License is now built-in in the manual;
 — Support for compressed AF;
 — NEEDAFFIX support;
 — Hyphenation:
    1) While hyphenating words it shows the current count+total;
    2) Adds "UTF-8" to hyphenator if EditorGadget is blank;
    3) Hyphenation only works if the EditorGadget starts with the word "UTF-8";
    4) Fix: Didn't skip the word "UTF-8";
    5) Fix: defaults MIN LEFT/RIGHT before checking for new values;
    6) Now it detects if rules don't end with a <RETURN>;
    7) Better detection of no rules;
    8) Added a "Clear" button;
    9) Added label: "# Hyphenations:";
   10) Hyphenation now respects the flags: LEFTHYPHENMIN and RIGHTHYPHENMIN;
   11) Fix: Would skip the 5 lines header (if no standard header, it would skip the first five rules);
   12) Fix: Changing resolution in the Preferences would reset each tab filename to "n/a" and erase the Hyphenation rules.


8 — PCRE Licence (used by the RegularExpression library)

PCRE LICENCE
------------

PCRE is a library of functions to support regular expressions whose syntax
and semantics are as close as possible to those of the Perl 5 language.

Release 7 of PCRE is distributed under the terms of the "BSD" licence, as
specified below. The documentation for PCRE, supplied in the "doc"
directory, is distributed under the same terms as the software itself.

The basic library functions are written in C and are freestanding. Also
included in the distribution is a set of C++ wrapper functions.


THE BASIC LIBRARY FUNCTIONS
---------------------------

Written by:       Philip Hazel
Email local part: ph10
Email domain:     cam.ac.uk

University of Cambridge Computing Service,
Cambridge, England.

Copyright (c) 1997-2007 University of Cambridge
All rights reserved.


THE C++ WRAPPER FUNCTIONS
-------------------------

Contributed by:   Google Inc.

Copyright (c) 2007, Google Inc.
All rights reserved.


THE "BSD" LICENCE
-----------------

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

    * Redistributions of source code must retain the above copyright notice,
      this list of conditions and the following disclaimer.

    * Redistributions in binary form must reproduce the above copyright
      notice, this list of conditions and the following disclaimer in the
      documentation and/or other materials provided with the distribution.

    * Neither the name of the University of Cambridge nor the name of Google
      Inc. nor the names of their contributors may be used to endorse or
      promote products derived from this software without specific prior
      written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.

End