V3.0 — ??.???.2023
© 2013-2023 Marco A.G.Pinto and
Community Contributors.
Freely distributable and modifiable under the
Apache License v2.0.
SEMI-FINISHED MANUAL — REQUIRES A
FULL REVISION WHEN I HAVE THE TIME!
LAST UPDATE: 2023-05-05
Index
1 — Introduction
2 — Copyright & DISCLAIMER
3 — Contacts
4 — Thanks
5 — How it works
5.1a — Using UTF-8
5.1b — EOL Windows VS Unix VS Mac
5.1c — Packing the files into Extensions
5.1d — Shortcut keys
5.1e — Preferences
5.1f — Sort
5.1g — Find
5.2 — Dictionary
5.2.1 — Creating a Dictionary
5.2.2 — Editing a Dictionary
5.2.3 — How Suffixes/Prefixes work
5.2.4 — What is flag position and rule
5.2.5 — Menus
5.2.5.1 — AFF Validate
5.2.5.2 — Bulk Import
5.2.5.3 — Import places/people names using possessives
5.2.5.4 — GB to AU/CA/NZ/ZA (Marco Pinto - GB)
5.2.5.5 — GB primary to -ize/-ise script (Marco Pinto - GB)
5.2.5.6 — Extract wordlist
5.2.5.6.1 — All .txt
5.2.5.6.2 — All .csv
5.2.5.6.3 — Compounds .txt
5.2.5.6.4 — Compounds .txt (LanguageTool)
5.2.5.7 — Count wordlist
5.2.5.8 — Show/Merge/Delete duplicates .dic
5.2.5.9 — Show duplicates wordlist
5.2.5.10 — Statistics
5.2.5.11 — Words missing in Master Wordlist
5.2.5.12 — Sort flags .dic
5.2.5.13 — Crunch words with flags (Marco Pinto - GB)
5.2.5.14 — Fix invalid spaces
5.3 — Thesaurus
5.3.1 — Creating a Thesaurus
5.3.2 — Editing a Thesaurus
5.3.3 — Menus
5.3.3.1 — Thesaurus Validate
5.3.3.2 — Bulk Import
5.3.3.3 — Extract synonyms
5.3.3.4 — Clean up symbols (Hex to Emoji)
5.3.3.5 — Show/Merge duplicates
5.3.3.6 — Fix invalid spaces
5.3.3.7 — Combine/Sort/Deduplicate simple meanings
5.3.3.7.1 — Combine
5.3.3.7.2 — Deduplicate
5.3.3.7.3 — Sort
5.3.3.8 — Preview Thesaurus
5.4 — Hyphenation
5.4.1 — Creating a Hyphenation
5.4.2 — Editing a Hyphenation
5.4.3 — Menus
5.4.3.1 — HYP Validate
5.4.3.2 — Fix invalid spaces
5.5 — Autocorrect
5.5.1 — Creating an Autocorrect
5.5.2 — Editing an Autocorrect
5.5.3 — Menus
5.5.3.1 — AC Validate
5.5.3.2 — Bulk Import
5.5.3.3 — Extract autocorrects
5.5.3.4 — Clean up hex symbols
5.5.3.5 — Show/Delete/XDelete duplicates
5.5.3.6 — Fix invalid spaces
5.6 — LanguageTool
5.7 — Language Specific
5.8 — Extension
5.8.1 — Menus
5.8.1.1 — .xpi/.oxt Properties
6 — History
7 — Apache License
8 — PCRE Licence
F1 opens the help in various windows (not
fully implemented yet).
1 — Introduction
An open-source
multiplatform advanced linguistic tool coded in PureBasic for
editing the Dictionary/Thesaurus/Hyphenation/Autocorrect files
of OpenOffice, LibreOffice, Firefox, Thunderbird and
SeaMonkey, provided they are in UTF-8
format.
This program was originally developed to easily
edit the synonyms of OpenOffice and LibreOffice.
I had this idea after asking the people in charge of the pt_PT project, from Minho University in Portugal, what I should do to suggest synonyms, since only suggested words for the Portuguese speller were being added.
I was told that they didn't know how to add synonyms, since the person in charge of that project had left it long ago (2006).
Later, I wanted to make it compatible with
Firefox and Thunderbird, after it became possible to edit
dictionaries. I hoped that in the future, someone would use it
in Thunderbird and fix the en_GB speller, which was full of
typos and missing words. Since no one volunteered, I took this
task myself in 2013.
This is where my idea came from: develop something easy to use, since I had tried some official tools for these tasks and didn't understand anything about them, not even how to use them.
My tool is so intuitive that even a child can use
it.
2 — Copyright & DISCLAIMER
This program is copyrighted to Marco
A.G.Pinto and Community Contributors.
It
is freely distributable and modifiable under the Apache
License v2.0.
The RegularExpression library used to decode prefixes and suffixes is copyrighted to PCRE (http://www.pcre.org/pcre.txt) and its licence is at the end of the manual.
The lamp image used here was designed by
Ignacio Javier Igjav under the
Creative Commons Attribution-Share Alike 3.0 Unported
licence.
3 — Contacts
(coder)
S.Mail: Marco A.G.Pinto, Apartado 3083, 2746-501 Queluz (Portugal)
E.Mail: marcoagpinto@sapo.pt
4 — Thanks
Some special thanks go to:
Groups/Organisations:
— Apache Community;
— LanguageTool Community;
— LibreOffice Community;
— Mozilla/Thunderbird Community;
— PureBasic Community.
Persons:
— Alberto Simões (Minho
University);
— Alexandro Colorado (Apache
OpenOffice);
— Andrea Pescetti (Apache
OpenOffice);
— Andreas Mantke (LibreOffice);
— Andrew Ferguson (PureBasic);
— António Manuel Dias (former
pt_PT maintainer);
— Áron Budea (LibreOffice);
— Ashley Scott (PureBasic);
— Bernd Krüger-Knauber (PureBasic);
— Brandon Dupuis;
— Chris Saxon (PureBasic);
— Daniel Naber (LanguageTool);
— Dennis Roczek (LibreOffice);
— Filiep Spyckerelle (European Parliament);
— Dominic
Wujastyk;
— Frédéric Laboureur (PureBasic);
— Gervase Markham (Mozilla);
— Guy Waterval (Apache
OpenOffice);
— Heinz Urban (PureBasic);
— Ian Neal (Mozilla);
— Jeroen Ooms (LibreOffice);
— Jonathan Kew (Mozilla);
— Jörg Knobloch (Mozilla);
— José Almeida (Minho University);
— Kruno (LibreOffice);
— Kevin Scannell (Mozilla);
— Martin Srebotnjak (LanguageTool);
— Martin Tlustos (LibreOffice);
— Matthias Mailänder (LanguageTool);
— Mauro
Trevisan (LibreOffice)(Mozilla);
— Mykhailo Oliinyk;
— Olivier
Hallot (LibreOffice);
— Pedro Marques (IADE —
Creative University);
— Peter Chamberlin (Mozilla);
— Philip
Taylor;
— Ricardo Palomares Martínez (Apache OpenOffice);
— Shantanu Oak (LibreOffice);
— srod (PureBasic);
— Stuart Swales (Apache OpenOffice);
— Thomas Schulz (PureBasic);
— Tiago Santos (LibreOffice)(LanguageTool).
5 — How it works
5.1a — Using UTF-8
This tool was made to work with UTF-8
encoding.
A good trick to convert the old encoding formats to UTF-8 is to use, for example, the Notepad++ editor for Windows.
Simply open the files with it and change the encoding to UTF-8 using the menu Encoding → Convert to UTF-8-BOM, so that accented characters display correctly.
Then use the Save As option, select "Normal text file (*.txt)" and it is done.
Don't forget to edit the header of each file by hand, replacing the name of the old encoding with the new one.
The headers with the character encoding are inside the files. See for example Version 2.4 (01/09/2007) of the Italian files:
— The Dictionary (.DIC + .AFF):
The .DIC has no keyword.
The .AFF has the following keyword:
SET ISO8859-15 → Replace
with SET UTF-8
— The Thesaurus (.DAT):
It has in the first line:
ISO8859-15 → Replace
with UTF-8
— The
Hyphenator (.DIC):
It has in the first line:
ISO8859-15 → Replace
with UTF-8
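As an alternative to doing this by hand, the conversion of an .AFF file can be sketched in a few lines of Python. This is only an illustration: the function name is made up, the old encoding is assumed to be ISO8859-15 as in the Italian example, and the result is written without a byte order mark, which is an assumption rather than something PTG requires.

```python
import re

def convert_aff_to_utf8(raw: bytes, old_encoding: str = "iso8859-15") -> bytes:
    """Decode an .AFF file from its old encoding and return it as UTF-8."""
    text = raw.decode(old_encoding)
    # Update only the first SET line, e.g. "SET ISO8859-15" becomes "SET UTF-8".
    text = re.sub(r"^SET\s+\S+", "SET UTF-8", text, count=1, flags=re.MULTILINE)
    return text.encode("utf-8")

# A tiny made-up .AFF fragment stored in the old encoding.
sample = "SET ISO8859-15\nSFX A Y 1\nSFX A 0 à .\n".encode("iso8859-15")
converted = convert_aff_to_utf8(sample)
```

For the Thesaurus and Hyphenator files the same idea applies, except that the encoding name stands alone on the first line instead of following a SET keyword.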
5.1b — EOL Windows VS Unix VS Mac
I ran some tests saving files in Windows and Unix formats, and the Windows files come out bigger.
This happens because the End of Line characters are different on each system: Windows uses #CRLF$ (two bytes per line break), Unix #LF$ and classic Mac OS #CR$ (one byte each).
Linux/Unix is the open-source standard, so it is better to use its format.
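The size difference is easy to see with a minimal sketch that normalises line endings to the Unix format. The function name and the sample data are made up for illustration.

```python
def to_unix_eol(data: bytes) -> bytes:
    """Convert Windows (CRLF) and old Mac (CR) line endings to Unix (LF)."""
    # Handle CRLF first so that its CR is not turned into a second line break.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")

windows = b"party/S\r\nboy/S\r\n"
unix = to_unix_eol(windows)
saved_bytes = len(windows) - len(unix)  # one byte saved per line
```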
5.1c — Packing the files into Extensions
To create extensions from scratch you will have to use another package, which I am not familiar with yet.
The simplest way, though, is to replace the files of an existing extension with yours.
Use the SORT button before you consider your Dictionary/Thesaurus/Autocorrect ready to be packed into an extension.
Making extensions for Mozilla seems easier than for OpenOffice/LibreOffice, whose extensions are more complex because they can hold multiple languages in one archive.
Compress the files into a .ZIP archive and then change the file extension to the one used by the target software (.xpi or .oxt).
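The final zip-and-rename step can be sketched with Python's standard zipfile module. The function name and the file names are hypothetical, and the sketch only packs the proofing files themselves. A working extension also needs the other files found in an existing extension, which is why replacing files in one is the simpler route.

```python
import io
import zipfile

def pack_extension(files: dict[str, bytes]) -> bytes:
    """Return the bytes of a ZIP archive holding the given files."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, content in files.items():
            archive.writestr(name, content)
    return buffer.getvalue()

# Made-up file contents; real files would be read from disk.
archive_bytes = pack_extension({
    "en_GB.dic": b"1\nparty/S\n",
    "en_GB.aff": b"SET UTF-8\n",
})
# Saving archive_bytes under a name ending in .oxt or .xpi gives the renamed archive.
```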
5.1d — Shortcut keys
TAB SWITCH RIGHT — CTR + TAB
TAB SWITCH LEFT — SHIFT + CTR + TAB
OPEN — CTR + O
SAVE — CTR + S
SAVE AS — SHIFT + CTR + S
FIND — CTR + F
ADD — CTR + A
GOTO — CTR + G
DELETE — DEL
EXIT A WINDOW & ABORT PROCESSING & ABORT OPEN/SAVE/SAVE AS — <ESC>
EXIT — CTR + Q
5.1e — Preferences
Tools → Preferences
This window lets you select the global settings of PTG.
If PTG doesn't find the file ptg3.prefs, or if the file is from a different version, it will open this window automatically at startup and you will need to save the settings.
Number of lines visible for each ListIconGadget in the tabs:
Double-click the last line that fits in the gadget (the last visible line) and check that the number in the visible-line textbox changes to match the selected line. More space can be gained by changing the window resolution in this preferences dialogue.
You may want to use the "Plus Pixels" setting to fine-tune the accuracy between -10 and +10 pixels.
This makes it work with all OSes.
In the default prefs, PTG tries to guess the number of lines from the maximum window resolution.
Resolution:
Set your preferred window resolution.
There is also the possibility of using a
window that will fit your desktop entirely.
The GUI will fit to the chosen resolution.
The default
resolution is always the window size that fits the screen
resolution
LanguageTool:
This will set the minimum number of blank lines between each chunk
in a LanguageTool grammar.xml file for parsing.
The idea is to edit/create rules using PTG.
Not working yet.
.AFF Aid Language:
It is possible to have external files with a list of each code, so
that while working in the dictionary edit/add window, it is easier
to find specific rules.
The GB Aid is hard-coded into PTG.
How .AFF Aid works:
The first line of the .txt holds the 3-letter ISO code for the language, followed by the name of the language in the language itself, separated by, for example, a colon: vec:Vèneto
You can still make separate versions of the file if you want a more specialised version (indicating the country of the dialect): vec-IT (for the Italian version) or vec-BR.
The standard list of codes can be found at:
https://www-01.sil.org/iso639-3
You can consult the list also directly on the Web at
https://www-01.sil.org/iso639-3/codes.asp
A third field with the name of the language in English is also possible, to suit people who know neither the code nor the language's own name, e.g. nld:Nederlands:Dutch .
So, for Veneto, see the example:
custom_aff_aid.vec.png ;16x16
PNG locale for the ComboBox.
custom_aff_aid.vec.txt
;Veneto Rules. See the ".vec"
before the ".txt".
In the .txt file we followed this line logic:
vec:vèneto:Veneto
a0/a1/a2: common verbs (first
conjugation verbs, second conjugation arrhizotonic verbs)
b0/b1/b2: second conjugation
rhizotonic verbs
So, the first line is what appears in the PTG ComboBox, and the
rest are the rules.
Then place both files inside the folder:
"custom_aff_aid"
Number Separator Character:
To separate thousands in the number of items by any character,
while displaying numbers.
For example: 1,000,000 or 1 000 000, etc.
Warn over N words decoding (RAM/HDD):
To avoid running out of RAM or hard disk space in wordlists
with millions of entries, we have this option.
It shows a warning when it processes more than N words, asking whether you wish to continue or abort.
For example, the option to search for duplicates fills the RAM
with the decoded wordlist.
The option to extract the wordlist to HDD fills the space in
the hard disk with the decoded wordlist.
Font for language:
If characters in your language display as corrupted, use this option to change the font, and all gadgets will adapt their contents accordingly.
For example: for the Marathi language use the font: "Arial
Unicode MS".
Check for Updates:
You may select a search interval to automatically look for PTG
updates when it is run or not to search at all.
If an update is available and you wish to download it, you can
choose that in the pop-up window informing that there is a new
version and it will take you directly to the download page:
https://proofingtoolgui.org/#downloads
Affix colours:
This lets you select the font colour for primary words, prefixes, suffixes and both (CIRCUMFIX) in the dictionary editor window.
5.1f — Sort
Before releasing an official update of your work you should sort the items.
PureBasic sorts strings by character code, which makes words with diacritics appear at the bottom of the lists, so I came up with a way around it.
What I did was create a structured array with one field where I store each word in lowercase without diacritics, and another field where I store the original word.
Then I sort by the first field and repopulate the list from the second field.
Notice that I had to write a function to remove the accents; some letters may still be missing from it and will be added when found or suggested.
With my code, letters with diacritics are ordered as if they had no diacritics, which is the appropriate order for spellers.
Because uppercase and lowercase are treated as equal, entries that differ only in case can change position when sorting.
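The two-field approach described above can be sketched in Python as follows. This is not the PureBasic code used by PTG. The names are made up and the accent table is only a short excerpt of the full conversion list.

```python
# Short excerpt of an accent table; the real list is much longer.
ACCENT_MAP = {"á": "a", "à": "a", "ç": "c", "é": "e", "ñ": "n", "ø": "o"}

def sort_key(word: str) -> str:
    """Lowercase the word and replace accented letters by their base letters."""
    return "".join(ACCENT_MAP.get(ch, ch) for ch in word.lower())

def sort_words(words: list[str]) -> list[str]:
    # Field 1 is the simplified key, field 2 is the original word.
    pairs = [(sort_key(word), word) for word in words]
    pairs.sort(key=lambda pair: pair[0])
    # Repopulate the list from field 2.
    return [original for _, original in pairs]

words = ["zebra", "água", "Apple", "école", "echo"]
sorted_words = sort_words(words)   # ['água', 'Apple', 'echo', 'école', 'zebra']
plain_sorted = sorted(words)       # ['Apple', 'echo', 'zebra', 'água', 'école']
```

The second result shows the problem being solved: a plain sort pushes the accented words to the end of the list.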
Converted accents on 2020-05-08:
; a
"á","a"
"à","a"
"ä","a"
"ã","a"
"â","a"
"å","a"
"ā","a"
"æ","a"
"ă","a"
; b
"ḃ","b"
; c
"ç","c"
"č","c"
"ċ","c"
"ć","c"
"ĉ","c"
; d
"ḋ","d"
"đ","d"
"ď","d"
; e
"é","e"
"è","e"
"ë","e"
"ê","e"
"ĕ","e"
"ē","e"
"ė","e"
"ễ","e"
; f
"ḟ","f"
; g
"ģ","g"
"ğ","g"
"ġ","g"
; i
"í","i"
"ì","i"
"ï","i"
"î","i"
"ī","i"
; j
"ɉ","j"
; l
"ƚ","l"
"ł","l"
"ľ","l"
; m
"ṁ","m"
; n
"ñ","n"
"ń","n"
"ň","n"
; o
"ó","o"
"ò","o"
"ö","o"
"õ","o"
"ō","o"
"ø","o"
"ô","o"
; p
"ṗ","p"
; r
"ř","r"
; s
"ṡ","s"
"š","s"
"ș","s"
; t
"ṫ","t"
"ŧ","t"
"ť","t"
"ț","t"
; u
"ú","u"
"ù","u"
"ü","u"
"û","u"
"ū","u"
; w
"ẃ","w"
"ẁ","w"
"ŵ","w"
"ẅ","w"
; y
"ў","y"
"ý","y"
"ỳ","y"
"ŷ","y"
"ÿ","y"
; z
"ž","z"
5.1g — Find
It is possible to search for items in
each tab by pressing CTR + F or the "Find"
button.
It is also possible to Edit/Delete items directly from the results by right-clicking the found items.
The options are very simple:
1) Left match text:
Searches for items that match from the
left the search expression;
2) Match case:
Case sensitive search;
3) Match extra information: Searches for
text in all columns of the items and not just in the first.
The gadget tabs are:
1) Matches: Shows the words
that match the search text;
2) Position: Shows the match
position in main ListIconGadget;
3) % Match: Shows how much
the search text matches the found words. Notice that it includes
separators and flags;
4) Status: If you
right-click and edit a word making changes, it will appear in
red: "Changed".
You can abort an ongoing search by pressing <ESC>.
TIP: Always check whether a word exists before adding it. Press CTR + F and type/paste the word there. This takes a second and saves a lot of work.
5.2 — Dictionary
A dictionary is basically two files: one with the words plus their flags (.dic) and another with the flag rules (.aff) that derive further words from the .dic entries.
For example, the affix file could have some lines saying that a flag "S" adds "123" to words, and the .dic could have the word "Marco/S", which would be decoded into:
Marco
Marco123
Flags are case sensitive, so "S"
is different from "s".
PTG supports the following dictionary features:
1) FLAG chr, number and long;
2) AF Compression;
3) Twofold recursivity;
4) NOSUGGEST flag;
5) NEEDAFFIX flag;
6) FORBIDDENWORD flag;
7) CIRCUMFIX flag;
8) KEEPCASE flag;
9) COMPLEXPREFIXES flag
(detects but doesn't decode yet).
5.2.1 — Creating a Dictionary
If you have a Dictionary in memory, use ERASE to delete all entries.
To create a Dictionary from zero you just have to press the ADD button to add words.
Use EDIT or double-click to change information regarding the words.
Use DELETE or <DEL> to remove entries.
A Dictionary consists of two UTF-8 files with the extensions .DIC and .AFF.
Even though the tool reads the .AFF file, I still haven't read documentation about how it works. This means that creating a Dictionary from scratch will require some previous knowledge.
The .DIC file is the list of words, and the .AFF file is a list of rules and other options.
See the first two paragraphs of:
https://www.chromium.org/developers/how-tos/editing-the-spell-checking-dictionaries
Remember to SAVE/SAVE AS now and then to play it safe.
5.2.2 — Editing a Dictionary
First download the extension of
the language you intend to use, from the official pages.
You should have an .oxt or .xpi
file which you rename to .zip
in order to extract its contents to HDD.
Press OPEN and select
the .dic file of the
Dictionary and my tool will also open the associated .aff file.
Now just ADD/EDIT/DELETE
the current entries.
Remember to SAVE/SAVE AS now and then to play it safe.
5.2.3 — How Suffixes/Prefixes work
A short explanation of how suffixes/prefixes work, based on an e-mail written by Ricardo Palomares Martínez:
While editing dictionaries, you can add one or more identifiers to a word, after a "/".
For example, the en_GB .AFF uses the identifier "S" to create the plural:
party/S
This will look in the .AFF file and find:
SFX S Y 9
SFX S y ies [^aeiou]y
SFX S 0 s [aeiou]y
SFX S 0 es [sxz]
SFX S 0 es [cs]h
SFX S 0 s [^cs]h
SFX S 0 s [ae]u
SFX S 0 x [ae]u
SFX S 0 s [^ae]u
SFX S 0 s [^hsuxyz]
SFX S Y 9
SFX → It is a suffix (PFX
would mean a prefix).
S → The suffix identifier.
Y → Y for YES. It means the rule
can be cross-used with other prefixes and suffixes.
If N the rule can't be
applied together with other affixes the word might have.
9 → The number of lines related
to this rule.
SFX S y ies [^aeiou]y
SFX → It is a suffix (PFX would
mean a prefix).
S → It is the suffix/prefix
identifier.
y → For a suffix it is the
letter(s) to be removed from the end of the word.
For a prefix, from the
beginning of the word.
ies → For a suffix, it is the letter(s) to be added at the end of the word.
For a prefix, at the beginning of the word.
[^aeiou]y
→ Condition in regexp notation. Here, the rule is applied to
words ending with
a "y"
and the letter next to the last is NOT a,
e, i, o or
u.
Yes, the ^
means that the letters mustn't match.
So, party/S would produce: parties
And boy/S would produce: boys, triggering the following rule, whose 0 says that no letters are removed, only added. It applies to words ending with a "y".
There is no ^, which means that the second letter from the right must be a, e, i, o or u.
SFX S 0 s [aeiou]y
Also notice that if words have capitalised letters, the Hunspell in the host software will only accept them with the capitalised letters exactly as in the .DIC (it suggests a typo if different).
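The way the "S" rules walked through in this section select and transform a word can be sketched in Python. This is only an illustration of the logic, not Hunspell's or PTG's implementation. The function name is made up, and the rule table copies the strip, add and condition fields from the en_GB excerpt in this section.

```python
import re

# (strip, add, condition) for flag "S"; "0" means nothing is stripped.
SFX_S = [
    ("y", "ies", "[^aeiou]y"),
    ("0", "s", "[aeiou]y"),
    ("0", "es", "[sxz]"),
    ("0", "es", "[cs]h"),
    ("0", "s", "[^cs]h"),
    ("0", "s", "[ae]u"),
    ("0", "x", "[ae]u"),
    ("0", "s", "[^ae]u"),
    ("0", "s", "[^hsuxyz]"),
]

def apply_suffix_rules(word: str, rules) -> list[str]:
    """Return every form produced by the rules whose condition matches."""
    results = []
    for strip, add, condition in rules:
        # A suffix condition is tested against the end of the word.
        if not re.search(condition + "$", word):
            continue
        if strip == "0":
            stem = word
        elif word.endswith(strip):
            stem = word[: len(word) - len(strip)]
        else:
            continue
        results.append(stem + add)
    return results

party_forms = apply_suffix_rules("party", SFX_S)    # ['parties']
boy_forms = apply_suffix_rules("boy", SFX_S)        # ['boys']
bureau_forms = apply_suffix_rules("bureau", SFX_S)  # ['bureaus', 'bureaux']
```

The last example shows that more than one rule line can fire for the same word.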
5.2.4 — What is flag position and rule
The derived words ListIconGadget has the fields "Flag Position" and "Rule".
"Flag Position" is the character position of the first line (header) of each rule used. For example:
SFX S Y 9
(It is a Suffix with identifier "S",
"Yes" and "9"
rules in it)
Then, inside the dictionary editor, you now have a column with the
rule number after the header. Double-clicking in a ListIconGadget
line will jump to the header, then you will just have to scroll a
few lines down to the rule number.
Please notice that the editor gadget in the add/edit word window has a "clean" version of the .AFF with repeated spaces removed, in order to find the flags faster (fewer characters to process).
Regarding the rules:
[^abc]de[fghi]
[fghi] means the current character must be one of f,g,h,i.
de means the literal text de must appear immediately to the left of the character checked above, if that one matched the condition.
[^abc] means the current character in the word must not be any of a,b,c.
See the example above for "party".
The rules are checked either from right to left (suffix) or from left to right (prefix).
Prefixes are run against the primary word and
suffixes. This means that if you have in the en_GB speller:
party/S
You will get:
party (primary)
parties (suffix)
If you had:
party/SU
You would run the "U" code (prefix)
against the two words above:
party (primary)
parties (suffix)
unparty (prefix)
unparties (prefix)
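The order of application shown above can be sketched as follows. The function name is made up, and the "un" prefix for flag "U" is taken from the example output rather than from the .AFF file itself.

```python
def expand(primary: str, suffixed_forms: list[str], prefixes: list[str]) -> list[str]:
    """Apply each prefix to the primary word and to every suffixed form."""
    base = [primary] + suffixed_forms
    return base + [prefix + word for prefix in prefixes for word in base]

# party/SU: "S" gives "parties", then "U" is run against both words.
decoded = expand("party", ["parties"], ["un"])
```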
5.2.5 — Menus
5.2.5.1 — AFF Validate
Once it is coded, it will search for missing or duplicate UTF-8 flags and for duplicate or missing rules, showing the problematic lines where the errors occur.
5.2.5.2 — Bulk
Import
word1
word2
word3
etc.
5.2.5.3
— Import places/people names using possessives
5.2.5.4
— GB to AU/CA/NZ/ZA (Marco Pinto - GB)
(21/NOV/2018)
[09:10] <marcoagpinto> I came up with a brilliant idea for
AU+CA+NZ spellers since Kevin Atkinson's versions suck
[09:10] <marcoagpinto> I will code a feature into Proofing
Tool GUI in January, that can be used with my GB speller
[09:10] <marcoagpinto> +word1
[09:10] <marcoagpinto> -word2
[09:11] <marcoagpinto> if we find maintainers, they can
create a .txt with a list of British words to remove and AU+CA+NZ
specific words to add
"GB to AU/CA/NZ" <- MENU OPTION 2018-12-13
[15:22] <marcoagpinto> I still need to implement the flags
merging to remove duplicates
[15:22] <marcoagpinto> and the feature that will allows to
convert GB to AU+CA+NZ
[15:22] <marcoagpinto> :)
[15:22] <marcoagpinto> will allow*
[15:22] <marcoagpinto> all one needs is a list of (-) words
that will be applied to the speller
[15:23] * deneb__alpha (~deneb_alp@fedora/denebalpha) has joined
#libreoffice-qa
[15:23] <marcoagpinto> :)
[15:23] <marcoagpinto> and add words from the countries
[15:23] <marcoagpinto> country specific words
[09:35] <marcoagpinto> darktrojan: I want to code a feature
into Proofing Tool GUI that will create AU+CA+NZ spellers based on
the GB one
[09:35] <marcoagpinto> there will be options to remove
-ise/-ize from the .dic, as well as providing a list of words to
remove from the .dic (GB only words)
[09:36] <marcoagpinto> is NZ only -ise?
[09:37] <%darktrojan> theoretically
[09:37] <%darktrojan> we tend not to care too much about
that one
[09:37] <marcoagpinto> ahhhh
[09:37] <marcoagpinto> cool
[09:37] <marcoagpinto> and British words like "arse"? Do
they exist in NZ?
[09:37] <marcoagpinto> :)
[09:38] <marcoagpinto> after the feature is implemented,
someone will need to create a list of words to remove from GB
[09:38] <marcoagpinto> a .txt file with removal words
[13:12] <marcoagpinto> in a few months it will be possible
to create AU+CA+NZ spellers based on the GB one!
[13:12] <marcoagpinto> :)
[13:12] <bearon> please stop saying my name
[13:12] <marcoagpinto> ahhhh
[13:12] <marcoagpinto> sorry
[13:13] <marcoagpinto> I have drawn/written the GUI for it
[13:13] <marcoagpinto> :)
[13:13] <marcoagpinto> on paper of course
[13:14] <marcoagpinto> one will be able to remove -ise or
-ize according to the country
[13:14] <bearon> sounds like an interesting plan
[13:14] <marcoagpinto> for example, the future maintainer
for AU could open the GB speller and select "remove all -ise" and
then add AU words
[13:14] <marcoagpinto> :)
[13:14] <bearon> would be nice to have all the common words
in all
[13:14] <marcoagpinto> and have a list of GB words to remove
in a .txt
[13:15] <bearon> perhaps it'd be best to have one with the
common words, and let GB be based on that as well
[13:16] <marcoagpinto> :)
[13:16] <marcoagpinto> well, the idea is also an option to
"merged GB"
[13:17] <bearon> so you don't need to maintain a removal
list, but can decide where to add the new word
[13:17] <marcoagpinto> I won't maintain such a list
[13:17] <bearon> since it's easier to forget to add
something to the list of words to be removed ;)
[13:17] <marcoagpinto> the maintainers will
[13:17] <bearon> well, you have to
[13:17] <bearon> you add a new word to the GB speller
[13:17] <bearon> who will know if it's GB only or not?
[13:18] <marcoagpinto> "merge GB" + remove -ize/-ise + list
of words to remove
[13:18] <marcoagpinto> :)
[13:18] <marcoagpinto> the maintainers?
[13:18] <marcoagpinto> the guys from CA+AU+NZ will know?
[13:18] <marcoagpinto> :)
[13:18] <marcoagpinto> we need to find people from these
countries
5.2.5.5
— GB primary to -ize/-ise script (Marco Pinto - GB)
5.2.5.6 — Extract wordlist
Decodes and extracts the wordlist into a .txt file.
5.2.5.6.1
— All .txt
5.2.5.6.2
— All .csv
5.2.5.6.3
— Compounds .txt
5.2.5.6.4
— Compounds .txt (LanguageTool)
5.2.5.7 —
Count
wordlist
Decodes and counts the total number of words in the wordlist.
5.2.5.8
—
Show/Merge/Delete duplicates .dic
The menu
to search for duplicates in the dictionary will match two
identical words, unless they have morphological information that
differentiates them.
See the example of the Portuguese speller, where each word is followed by morphological information:
celeste/p [CAT=adj,N=s,G=_]
Celeste [CAT=np,G=f,SEM=p]
The main purpose is to merge flags into primary words, delete
repeated words and merge flags of the same kind (customised).
For example, in the GB speller:
Marco (primary)
Marco/S (suffix)
Marco/GD (suffixes)
Anna/UA (prefixes)
Anna/I (prefix)
Would result in merging the flags of common type (unless
you change the checkboxes in the window for other behaviour):
Marco/SGD
Anna/UAI
These two words in the .dic would be the result.
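The default merging behaviour from the example above can be sketched in Python. This is a simplification with a made-up function name: it assumes single-character flags and entries without morphological information, and it ignores the checkboxes that change the behaviour in PTG.

```python
def merge_flags(lines: list[str]) -> list[str]:
    """Merge .dic lines that share the same word, keeping each flag once."""
    merged: dict[str, list[str]] = {}  # keeps the first-seen order of words
    for line in lines:
        word, _, flags = line.partition("/")
        seen = merged.setdefault(word, [])
        for flag in flags:
            if flag not in seen:
                seen.append(flag)
    return [
        word + ("/" + "".join(flags) if flags else "")
        for word, flags in merged.items()
    ]

entries = ["Marco", "Marco/S", "Marco/GD", "Anna/UA", "Anna/I"]
merged_entries = merge_flags(entries)   # ['Marco/SGD', 'Anna/UAI']
```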
5.2.5.9 —
Show duplicates wordlist
It decodes the .dic to RAM, compares the whole wordlist for duplicates and opens a window with the duplicate words, the number of duplicates and the positions in the .dic that cause them.
You can export the results as a text file for easier removal
of duplicates.
5.2.5.10 —
Statistics
Decodes and counts the total number of words in the wordlist
showing also statistical information.
Now we can know the number of new words added to the speller
at any time based on a reference value (good for
release notes).
I will explain the extra functionality of the window based on
the line from above: "4 characters
long".
If you left-click in a line, it will show three options in the
pop-up menu:
1) Show
You can also double-click a line instead of manually selecting "Show".
It will open the window below based on the criterion of the selected line.
For example, the window below shows all the words that have 4 characters once decoded, including the flags that produced them.
2) Extract
It will extract the words based on the criterion of the selected line.
No flags are extracted, only the decoded words matching the criterion (a normal wordlist).
3) Extract with flags
It will extract the words based on the criterion of the selected line, including mentions of the primary/flags that generated them:
beer Flag:R (Primary:bee)
bees Flag:S (Primary:bee)
You can scroll in the ListIconGadget using the cursor keys and it will decode in the preview panel.
In the left panel you will see the words that match the criterion, and in the right panel their decoding, including primary/flags.
Pressing the "Extract" button does the same as 2), with the criterion taken from the previous Statistics window.
It does the task for all the words listed and not just for the highlighted one.
Pressing the "Extract with flags" button does the same as 3), with the criterion taken from the previous Statistics window.
It does the task for all the words listed and not just for the highlighted one.
5.2.5.11
— Words missing in Master Wordlist
This lets you import a master wordlist and then other wordlists, analyse them, and show which words are missing in the master wordlist.
For example: for English I have my main wordlist (the one I am working on) and I found others on the Internet, such as the ones provided by Kevin Atkinson, so I thought about an easy way to find out which words are missing in my speller instead of checking by hand.
Or an easy way to find which words are missing in Kevin's en_AU, en_CA and en_US and report them to him.
On 25-OCT-2020 I provided wordlists to Kevin Atkinson on his GitHub with the words missing in CA + AU + US.
This feature of Proofing Tool GUI makes that possible.
I select the master wordlist (CA/AU/US) and the other one (GB); the software then checks which words are missing in the master.
5.2.5.12 — Sort flags .dic
Notice that the sort is not case-sensitive, so "Aa" can become "Aa" or "aA" (for example, in FLAG CHR): there is no order between the same letter in different cases.
5.2.5.13 — Crunch words with
flags (Marco Pinto - GB)
This feature will add the flags to words, for example, in the
Oxford Professor wordlist, if he had the words:
party
parties
After validating them in the official dictionary I would import them into the GB speller using a copy/paste option in the bulk import.
Then this feature adds the flags, so in the end those two words become one:
party/S
Flag "S" makes the plural of it in the rules file of the speller.
At first it took around 5 hours to scan the entire 90K-line .dic file to automatically add and later merge the words...
It is slow as hell... there are around 90K lines of words in the .dic file and each word has to be checked against the others... so, it should be 90 000^2 tests... huge calculations...
Later, I added a setting to select the number of matching characters on the left: 1, 2 or 3 for the loop ranges.
This increased speed a lot: with "2" it took 4 minutes instead of 5 hours for the 90K lines, and "3" took just 1 minute.
"2" should be enough for a first approach, and then "3" after you are sure that all the words you added with bulk import are at least 3 characters long.
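Why matching the first characters helps so much can be seen with a rough sketch. It assumes that the setting restricts comparisons to words sharing the same first N characters, which is an interpretation of the description above and not PTG's actual code. The function name and the word sample are made up.

```python
from collections import defaultdict

def count_comparisons(words: list[str], prefix_len: int) -> int:
    """Count word-against-word tests when only words with the same
    first prefix_len characters are compared with each other."""
    buckets = defaultdict(list)
    for word in words:
        buckets[word[:prefix_len]].append(word)
    return sum(len(bucket) ** 2 for bucket in buckets.values())

words = ["party", "parties", "boy", "boys", "box"]
all_pairs = count_comparisons(words, 0)    # one bucket: 5 * 5 = 25
two_chars = count_comparisons(words, 2)    # "pa" and "bo": 4 + 9 = 13
three_chars = count_comparisons(words, 3)  # "par", "boy", "box": 4 + 4 + 1 = 9
```

On a real wordlist the buckets are tiny compared with the whole file, so the saving is far larger than in this five-word sample.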
5.2.5.14 —
Fix
invalid spaces
This option will search for invalid spaces on the list of
words, such as two spaces or starting with a space.
It requires a future update for smarter lookup such as
recognising remmed lines.
5.3
— Thesaurus
The thesaurus has a .dat and an .idx file.
See for example the pt-PT .dat
UTF-8
a cerca de|1
(-) |a respeito de|sobre
a começar de|1
(-) |a partir de|desde
The pt-PT .idx:
UTF-8
50864
a cerca de|6
a começar de|47
Can you see the logic?
The .idx says that the entry "a cerca de" starts at byte 6 of the .dat and "a começar de" at byte 47 (byte = file position).
Both the .idx and the .dat are saved at the same time: for each .dat entry I save, I add its file position to the .idx.
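The relation between the two files can be sketched in Python. This is an illustration with made-up English entries and a made-up function name, not PTG's code. It assumes Unix line endings and that the second line of the .idx holds the number of entries.

```python
def build_thesaurus(entries, encoding_name="UTF-8"):
    """entries is a list of (headword, [meaning lines]).
    Returns the .dat bytes and the .idx text."""
    dat = bytearray((encoding_name + "\n").encode("utf-8"))
    index = []
    for headword, meanings in entries:
        # The byte position where this entry starts goes into the .idx.
        index.append((headword, len(dat)))
        dat += f"{headword}|{len(meanings)}\n".encode("utf-8")
        for meaning in meanings:
            dat += (meaning + "\n").encode("utf-8")
    idx_lines = [encoding_name, str(len(index))]
    idx_lines += [f"{word}|{offset}" for word, offset in index]
    return bytes(dat), "\n".join(idx_lines) + "\n"

dat_bytes, idx_text = build_thesaurus([
    ("big", ["(adj)|large|huge"]),
    ("small", ["(adj)|little"]),
])
# idx_text is "UTF-8\n2\nbig|6\nsmall|29\n"
```

Offsets are counted in bytes, not characters, so accented letters in UTF-8 move the following entries by more than one position each.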
5.3.1 — Creating a Thesaurus
If you have a Thesaurus
in memory, use ERASE
to delete all entries.
To create a Thesaurus from zero you just have to press the button
ADD to add synonyms.
Use EDIT or double-click to change information
regarding the synonyms.
Use DELETE or <DEL> to
remove entries.
The Thesaurus is a UTF-8 file with the extension .DAT.
Remember to SAVE/SAVE AS now and then to play it safe.
5.3.2 — Editing a Thesaurus
First download the extension of the
language you intend to use, from the official pages.
You should have an .OXT
file which you rename to .ZIP in order to
extract its contents to HDD.
Press OPEN and select
the .DAT file of the Thesaurus.
Now just ADD/EDIT/DELETE
the current entries.
Remember to SAVE/SAVE AS now and then to play it safe.
In build 82 (14.Aug.2015) I improved the Thesaurus part. It is now possible to use DEL to delete synonyms, and I added a "Thesaurus Tools" menu. Its most important option is "Combine", which combines all meanings but only works with simple lines:
x|2
a
b
would generate:
a|2
x
b
and:
b|2
a
x
PTG creates .idx files
for the Thesaurus.
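The "Combine" example above can be reproduced with a short sketch. It follows the order shown in the example, where the original headword takes the place of the synonym that becomes the new headword. The function name is made up and this is not PTG's code.

```python
def combine(headword: str, synonyms: list[str]) -> dict[str, list[str]]:
    """Build one entry per word so that every word lists all the others."""
    result = {headword: list(synonyms)}
    for synonym in synonyms:
        # The old headword replaces the synonym that is now the headword.
        result[synonym] = [headword if item == synonym else item for item in synonyms]
    return result

combined = combine("x", ["a", "b"])
# {'x': ['a', 'b'], 'a': ['x', 'b'], 'b': ['a', 'x']}
```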
5.3.3 — Menus
5.3.3.2 — Bulk
Import
word1,syn1,syn2
word2,syn1
etc.
5.3.3.3 —
Extract
synonyms
5.3.3.4 —
Clean
up symbols (Hex to Emoji)
5.3.3.5 —
Show/Merge
duplicates
5.3.3.6 —
Fix
invalid spaces
5.3.3.7
— Combine/Sort/Deduplicate
simple meanings
5.3.3.7.1
— Combine
5.3.3.7.2 —
Deduplicate simple meanings
What is the definition of a "duplicate"
meaning?
It means for example:
apple|3
one
two
one
It would remove the repeated "one", resulting in:
apple|2
one
two
It checks line by line and not column by column:
apple|1
-|one|two|one
This wouldn't change the meanings.
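The line-by-line behaviour can be illustrated with a short Python sketch. The function name dedupe_meanings is hypothetical; it keeps the first occurrence of each meaning line and compares whole lines only, which is why the pipe-separated line is left alone.

```python
def dedupe_meanings(meanings):
    """Remove repeated meaning lines, keeping the first occurrence of each."""
    seen = set()
    unique = []
    for line in meanings:
        if line not in seen:       # whole-line comparison, not per column
            seen.add(line)
            unique.append(line)
    return unique


simple = dedupe_meanings(["one", "two", "one"])
columns = dedupe_meanings(["-|one|two|one"])
```

The number after the headword (apple|3 becoming apple|2) would then simply be the length of the returned list.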
5.3.3.7.3 — Sort
"Sort simple meanings" also works line by line on the Thesaurus meanings.
5.4 — Hyphenation
.AFF files are opened in the first tab, "Dictionary".
When you open a .DIC in "Dictionary", it also opens the .AFF. Then, after opening a dictionary, you can go to the "Hyphenation" tab and open a Hyphenation .DIC file.
Combining both lets you derive words from the dictionary and hyphenate them, or type words by hand.
5.4.1 — Creating a Hyphenation
Simply add the rules to the EditorGadget or open an existing file.
5.4.2 — Editing a Hyphenation
Open the hyphenation .dic of the language you intend to use, or add new rules.
You can open a speller and then check hyphenation against its wordlist, or just use a hyphenation file for words without codes.
Thanks to Mauro Trevisan for explaining to me how the rules work.
Results can always be checked with:
https://www.ushuaia.pl/hyphen/?ln=en
The "Rule" field in the ListIconGadget is for developers to debug while they create rules.
Example of rules, taken from Németh László's PDF:
. a l g o r i t h m .
   4l1g4
    l g o3
     1g o
           2i t h
               4h1m
---------------------
   4 1 4 3 2 0 4 1
  a l-g o-r i t h-m = al-go-rith-m
Put simply, if you have the word "algorithm", PTG internally separates its letters with spaces and adds a "." on each side:
". a l g o r i t h m . "
Then you may create the rules above in the EditorGadget without
spaces:
4l1g4
lgo3
1go
2ith
4h1m
If you use a dot:
.1go ; the rule only matches at the start of the word
4h1m. ; the rule only matches at the end of the word
.a4l1g4o3r2it4h1m. ; with a dot on each side, the rule must match the whole word
Only odd numbers are converted to hyphens.
If several rules match the word, the highest number in each column is kept. This works as a priority scheme: a higher number overrides a lower one, so an even number can forbid a hyphen that a lower odd number would allow.
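The matching described above can be sketched in Python as follows. This is a simplified illustration, not PTG's implementation: the names parse_rule and hyphenate are hypothetical, it ignores LEFTHYPHENMIN/RIGHTHYPHENMIN, and it does not handle rules containing "-", "=" or "%".

```python
def parse_rule(rule):
    """Split a rule such as '4l1g4' into its letters and per-gap numbers."""
    letters = "".join(ch for ch in rule if not ch.isdigit())
    values = [0] * (len(letters) + 1)   # one slot before, between and after letters
    pos = 0
    for ch in rule:
        if ch.isdigit():
            values[pos] = int(ch)
        else:
            pos += 1
    return letters, values


def hyphenate(word, rules):
    """Apply the rules to a word and return it with hyphens inserted."""
    dotted = "." + word.lower() + "."   # dots mark the word boundaries
    points = [0] * (len(dotted) + 1)    # points[i] is the number before dotted[i]
    for rule in rules:
        letters, values = parse_rule(rule)
        start = dotted.find(letters)
        while start != -1:              # a rule may match more than once
            for k, value in enumerate(values):
                points[start + k] = max(points[start + k], value)
            start = dotted.find(letters, start + 1)
    out = []
    for k, ch in enumerate(word):
        if k > 0 and points[k + 1] % 2 == 1:   # odd number: hyphen allowed here
            out.append("-")
        out.append(ch)
    return "".join(out)


result = hyphenate("algorithm", ["4l1g4", "lgo3", "1go", "2ith", "4h1m"])
```

With the five rules from the example, the per-gap maxima come out as 4 1 4 3 2 0 4 1, matching the column totals shown above.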
Remember that the first word in the Hyphenator EditorGadget must
be: UTF-8
In build 127, thanks to Mauro Trevisan, I improved the parsing of rules so that it accepts repeated rules per word.
For the word "ultrateren":
. u l t r a t e r e n .
1l
2l t
1t
1t
t2r
1t
1r 1r
1n
2n
.
-----------------------
2 1 2 1 1 2
u l-t r a-t e-r e n = ul-tra-te-ren
Rules:
1l
2lt
1t
t2r
1t
1r
1n
2n.
Mauro also told me about another type of rule, one that already contains a hyphen.
For the word "inportar":
. i n p o r t a r .
p o-r
t a r
i n1
-------------------
1 -
i n-p o-r t a r = in-po-rtar
Rules:
po-rtar
in1
Press OPEN and select the hyphenation .DIC file.
Remember to SAVE/SAVE AS now and then to play safe.
5.4.3 — Menus
5.4.3.1 — HYP Validate
Once it is coded, it will search for a missing or duplicated UTF-8 flag, duplicate rules and rules without numbers, showing the lines where the errors occur.
5.4.3.2 — Fix invalid spaces
5.5 — Autocorrect
5.5.1 — Creating an Autocorrect
Since build 201, Proofing Tool GUI refuses to accept the same word as both incorrect and correct, which avoids entries that autocorrect to themselves.
5.5.2 — Editing an Autocorrect
First, download the DocumentList.xml of the language you intend to use from the official AOO/LO pages.
The autocorrect files in AOO/LO are stored in the path:
$instdir/share/autocorr/acor_*.dat
These are actually zipped files containing the XML files.
Rename the .DAT files to .ZIP and extract the contents.
Using Notepad++ or another tool, format the DocumentList.xml so that it is in UTF-8 and uses the following structure (you can copy/paste this first line over the XML entry):
<?xml version="1.0" encoding="UTF-8" ?> <block-list:block-list xmlns:block-list="http://openoffice.org/2001/block-list">
<block-list:block block-list:abbreviated-name="incorrect1" block-list:name="correct1"/>
<block-list:block block-list:abbreviated-name="incorrect2" block-list:name="correct2"/>
<block-list:block block-list:abbreviated-name="incorrect3" block-list:name="correct3"/>
<block-list:block block-list:abbreviated-name="incorrect4" block-list:name="correct4"/>
etc. (use lines like the previous)
</block-list:block-list>
Bear in mind that each line must hold only one incorrect/correct pair. The .XML I edited had all the text on a single line, so use Notepad++ to insert a line break after each entry.
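As an alternative to reformatting by hand, a file with the structure shown above can be read and rewritten with one entry per line using Python's standard library. This is a sketch under assumptions: the function names read_pairs and write_lines are hypothetical, and the output layout is simply the one described in this section.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

NS = "http://openoffice.org/2001/block-list"


def read_pairs(xml_bytes: bytes):
    """Return (incorrect, correct) pairs from the bytes of a DocumentList.xml."""
    root = ET.fromstring(xml_bytes)
    pairs = []
    for block in root.iter(f"{{{NS}}}block"):
        wrong = block.get(f"{{{NS}}}abbreviated-name")
        right = block.get(f"{{{NS}}}name")
        pairs.append((wrong, right))
    return pairs


def write_lines(pairs):
    """Serialise the pairs with exactly one block-list:block element per line."""
    out = ['<?xml version="1.0" encoding="UTF-8" ?> '
           f'<block-list:block-list xmlns:block-list="{NS}">']
    for wrong, right in pairs:
        out.append("<block-list:block "
                   f"block-list:abbreviated-name={quoteattr(wrong)} "
                   f"block-list:name={quoteattr(right)}/>")
    out.append("</block-list:block-list>")
    return "\n".join(out) + "\n"


# Everything on a single line, like the file mentioned above.
single_line = (b'<?xml version="1.0" encoding="UTF-8"?>'
               b'<block-list:block-list xmlns:block-list="http://openoffice.org/2001/block-list">'
               b'<block-list:block block-list:abbreviated-name="teh" block-list:name="the"/>'
               b'<block-list:block block-list:abbreviated-name="speeking" block-list:name="speaking"/>'
               b'</block-list:block-list>')
pairs = read_pairs(single_line)
formatted = write_lines(pairs)
```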
Press OPEN and select the DocumentList.xml file.
Now just ADD/EDIT/DELETE the current entries.
Remember to SAVE/SAVE AS now and then to play safe.
5.5.3 — Menus
5.5.3.1 — AC Validate
Opens a window with validation tests and shows the results in an EditorGadget, which can be exported to a .txt file.
It has two steps:
Step 1:
It checks whether the correct entries exist in the speller, which is why a dictionary must be loaded.
Step 2:
It checks for invalid suggestion patterns such as:
color,colour
colour,color
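One possible reading of Step 2 is that a "correct" word must never also appear as an "incorrect" word, since that would make corrections loop or undo each other. The manual gives only the color/colour example, so the exact rule PTG applies is an assumption here, and the function name find_conflicts is hypothetical.

```python
def find_conflicts(pairs):
    """Return the pairs whose correct word is itself listed as incorrect."""
    incorrect_words = {wrong for wrong, _ in pairs}
    return [(wrong, right) for wrong, right in pairs if right in incorrect_words]


conflicts = find_conflicts([("color", "colour"),
                            ("colour", "color"),
                            ("teh", "the")])
```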
5.5.3.2 — Bulk Import
incorrect1,correct1
incorrect2,correct2
etc.
5.5.3.3 — Extract autocorrects
It extracts the list of autocorrect entries to a .txt file, with the structure:
diden't → didn't
speeking → speaking
5.5.3.4 — Clean up hex symbols
5.5.3.5 — Show/Delete/XDelete duplicates
It shows, deletes or eXclusive-deletes duplicate entries in the autocorrect list.
For example, Delete keeps one copy of each duplicate found:
word1
word1
word1
word2
word2
word3
It will keep:
word1
word2
word3
eXclusive Delete removes every entry that has at least one duplicate:
word1
word1
word1
word2
word2
word3
It will only keep:
word3
This is useful if you have appended (merged) your list into an existing autocorrect file and only want to commit to Gerrit (LibreOffice) the words that are not in both your list and Gerrit's.
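The difference between the two operations can be shown with a short Python sketch. The function names are hypothetical, and the entries are treated as plain strings, as in the example lists above.

```python
from collections import Counter


def delete_duplicates(entries):
    """Delete: keep the first copy of every entry, in the original order."""
    return list(dict.fromkeys(entries))


def exclusive_delete(entries):
    """eXclusive Delete: keep only the entries that occur exactly once."""
    counts = Counter(entries)
    return [entry for entry in entries if counts[entry] == 1]


entries = ["word1", "word1", "word1", "word2", "word2", "word3"]
kept = delete_duplicates(entries)
unique_only = exclusive_delete(entries)
```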
5.5.3.6 — Fix invalid spaces
5.6 — LanguageTool
5.7 — Language Specific
5.8 — Extension
5.8.1 — Menus
5.8.1.1 — .xpi/.oxt Properties
6 — History
V3.0 — ??.???.2020
Compiled with PureBasic 5.XX.
a) General
— The manual has been rewritten;
— Supports more window resolutions
— Several new options and gadgets;
— It uses dynamic arrays making all load/save operations
ultrafast;
— Shortcut keys;
— Enhanced pop-up menu which can be used on the
ListIconGadgets items with options for smart/faster use;
— Invalid characters, such as spaces, typed while inserting data turn the gadget's background red;
— Created a "Prefs" menu with settings that are loaded from and saved to a file named "ptg3.prefs" (Prefs allow selecting a dynamic number of lines, which makes it compatible with all OSes);
— It saves Dictionary + Thesaurus + Hyphenation + Autocorrect with #LF$ instead of #CRLF$ for Linux mode;
— Speeded up several operations;
— Better UTF-8
warnings;
— GUI improvements and fixes;
—
Modern menus look for Windows;
— Pop-up menu has an extra
option "Clone"
(very useful);
— Added "Not coded yet!"
to tabs not coded yet;
— New About PureBasic menu icon;
— Renamed "Quit" menu
item to "Exit";
— Created a colour constant #GreenDark
that is better seen on a grey background;
— Coded undo/redo;
— New button to move to the bottom of the ListIconGadget;
— Help menu item now shows
"F1" in it;
— F1 (help) now works in the edit/new
items windows;
— Formats the total number of items (numbers)
to be better seen;
— There is a "Recent Files:"
entry in the "File"
menu;
— It is possible to Edit/Delete from the
"Advanced Find" window;
— Cleaned the code.
— HTML Help menu item;
— Added images to pop-up menus;
— Message Requesters now have a sign;
— About window now shows
www.proofingtoolgui.org as the project's page;
— The default separator chr for numbers is now a comma;
— New pop-up menu option
"Invert Selection";
— Changed the e-mail address in
"About" window to Sapo;
— Improved the check for updates code;
b) Linux
—
GTK3 support;
— Now there is an icon while
running;
— Apache License link now works.
— Fix: Get maximum window on PREFS now
works (still doesn't work with Ubuntu 16.04, only
with 17.10 and maybe 17.04);
— Fix: Highlighted item during a CURS UP at
the top seems to only be removed after the scroll.
— Fix: Highlighted item during a CURS DOWN
at the bottom seems to only be removed after the scroll.
c) Dictionary
— Pop-up menu to copy the word of the selected line to the clipboard;
— Taboo warning if the NOSUGGEST flag is used;
— It is possible to have custom "AFF Aid" files with 16x16 PNG flags;
— Replaced the ListIconGadget field
"Position"
with "Code Position".
— Added recursion to dictionary word decoding (twofold);
— Support
for FLAG NUM and LONG;
— Improved: If a code isn't found
in the .aff it no longer exits the decoding function;
— Major speed gain in the .AFF
optimising code (gl_ES);
— It accepts \/ as an escaped "/" in dictionary words;
— It combines PREFIXES against
PRIMARY+SUFFIXES.
—
Support for extracting wordlist compounds for LanguageTool
and check for duplicates makes use of morphological
information;
— The check for duplicates in the dictionary also shows the line numbers;
— Added "Missing Codes:"
to the Dictionary editor;
— Window enlarges according to window size definition;
— Major speed gain in
"Optimising .AFF decode cache" — up to ~30%
faster;
— Fix: Check for duplicates improved (morphologic
data);
— Fix: Decoding of rules with a 0 (pt_PT);
d) Autocorrect
— Added
Exclusive Delete;
— Allows trying to open a file with an invalid header;
e) Hyphenation
—
Coded the hyphenator.
f) Thesaurus
— Sorts synonyms naturally by replacing | with chr(9) before sorting and restoring | afterwards;
— Updating the number of meanings while editing synonyms supports Mac OS line endings;
— Saving
the Thesaurus creates an .idx file.
— It
is possible to extract a thesaurus in book format (can
be later converted to PDF);
— EditorGadget has a grey tip on it;
—
Window enlarges according to window size definition;
On build 198:
—
Compiled using PureBasic 6.00
LTS;
— Fix:
Wordlists importing would cause refresh of GUI;
— "Crunch
words with flags (Marco Pinto - GB)":
1) Fix:
It now recognises POS information;
2)
Fix:
Use wrong flag for 1 chr words for
non-derivable words;
3) Major speed gain: around twice
the speed;
4) Added
"Characters
match" tab;
5) GTK3 ready.
—
Updated colour constants
to 2022-08-06;
— Included
common_procedures_20220814;
— Coded statistics
by letter and statistics are now GTK3 ready;
—
The Master wordlist extraction
now has a button to extract uppercase-only
words;
—
"Extract
to bottom" in
pop-up menu now has
"(PFX)" and
"(All)";
—
Initial support for mouse
wheel scroll in the virtual ListIconGadgets
(DOESN'T WORK PROPERLY YET
WITH PUREBASIC 6.00 LTS).
—
One procedure for all tabs to
update the virtual vertical scroll bars;
—
Removed from LanguageTool part
"-ize/-ise .txt
(English)";
—
Coding of "Sort
regexp";
—
New pop-up menu option "Copy
flag rule" in
dictionary editor;
—
Added option to Language
Specific Tools:
"Convert XML tags to accents wordlist (very slow)".
On build 197:
— Updated PureBasic to 6.00 beta 8, now
compatible with Ubuntu 22.04 LTS;
—
Fix: It now
correctly enables the undo/redo buttons on each tab
and menu items.
On build 196:
— Updated PureBasic to 6.00 beta 6;
— Tip on prefs improved and changed position;
— Prefs default to full-screen resolution and three simple rules for the number of lines in full screen;
— Main GUI 100% Ubuntu 20.04 LTS gadgets size;
— Moved up the gadgets 7 pixels on each tab of
the main window;
— New Linux icon in 64x64;
— add-on image 24x24 now is grey when no
add-on opened;
— Replaced "Filename:"
with "File:";
— Disable main GUI for input/output
operations;
— Started working on the use of .xpi/.oxt;
— "Words
missing in Master Wordlist"
shows filesizes in ListIconGadget;
— Words editor now has a tip saying that
COMPLEX PREFIXES are not supported yet;
— Edit autocorrect, the text boxes now have
red ink for incorrect and green ink for correct;
— LanguageTool: Improved the extract rules
structure information;
— Optimised the .AFF decode cache;
— Included common_procedures_20220210.
On build 193–195:
— Updated PureBasic to 6.00 beta 1;
— Improved GUI to GTK 3.20 (Ubuntu
20.04);
— Initial support for "AutoText"
(view only);
— Added
"eye glasses" to "read only"
files (AutoText + LanguageTool);
— Added
"trashcan" image to
clear the file history in the menu;
— Coding of
"Delete Duplicates small wordlist (very slow)";
— Started showing the time taken by time-consuming operations (WIP);
— Add-ons icon for open .xpi/.oxt (still
not working);
— Improved the LanguageTool Structure Extract;
— Included common_procedures_20211130b.
On build 192:
— Fix:
"Show/Merge/Delete duplicates in .dic"
would not detect
"All Prefixes";
— Improved "GB to
AU/CA/NZ/ZA (Marco Pinto - GB)";
— Rewrote the
"Preferences";
— Added spaces on left regarding autocorrect (prefs
+ core);
— Added tab
"AutoText" plus
corresponding places in the last files list;
— Updated the common procedures to V20210720.
On build 189–191:
— Fix: Root
word in exporting as .CSV;
— Fix:
LanguageTool: Trying to export empty rules structure
now shows
"Aborted.";
— Fix: Fix
font issue in Prefs;
— Improved
"GB to AU/CA/NZ/ZA (Marco Pinto - GB)";
— Cleaned the Prefs code and disabled 1024x
resolutions in it;
— Improved LanguageTool rules decoding;
— Created an option in the pop-up menu
"Extract to bottom" (not working yet);
— The sizes of load/save main files for each
tab shows the files label. Ex:
"(aff:1 MB)+(dic:10MB)";
— Cleaned the code;
— Updated the common procedures to V20210703.
On build 188:
—
Fix: Bulk Import:
empty lines aren't added to the .dic;
— Replaced all
"Okay" occurrences
with "OK" (Olivier Hallot);
— Options that use file open/save show on the
status bar the filesize after
"OK.";
— Prefs:
"0" disables above
N words warning (Olivier Hallot);
— LanguageTool:
1) Basic support to
open a grammar.xml file;
2) Bulk Import
Multiwords now accepts more POSs + combo box
improvements;
3) Added menu
item to export rules structure.
—
"Show/Merge/Delete duplicates .dic":
1) Now has the
"Process"
+ "Export" button disabled if results=0;
2) Now uses an
EditorGadget line buffer to avoid refresh CPU load.
— Added menu item to delete duplicates small
wordlist;
— Extract wordlist as CSV now has a "Root_word"
field (Shantanu Oak);
— Updated the common procedures to V20210113.
On build 183–187:
— Compiled with PureBasic 5.73 LTS.
Its beta 3 fixed Hindi and Korean fonts:
https://www.purebasic.fr/english/viewtopic.php?f=4&t=76204;
—
Fix: Certain parts
of PTG would make it stop responding if large
amounts of data
were used (such as the LibreOffice
pt-BR dictionary);
— Major GUI improvements all over, such as:
1) Added version information to
title bar;
2)
"Clear" buttons now
use a trashcan image;
3) Several windows now have the
go to top/bottom buttons;
4)
"AC Validate"
supports GTK3 and reordered its buttons;
5) Partial coding of the "LanguageTool"
tab, adding extra buttons;
6) Options using file
save/extract show on the status bar the filesize.
— New menu items:
1)
"Crunch words";
2)
"Crunch POS";
3)
"Wikipedia rules testing";
4)
"Sort small wordlist";
5)
"Release notes".
— Adding of F1 to open the help
in various windows (not fully implemented
yet);
— Dictionary:
1) Major improvements in
"AFF Validate";
2) Coded showing twofold
missing flags;
3) Coding of
"Words missing in Master Wordlist";
4) Coded "Sort
flags".
— Autocorrect:
1)
"AC Validate":
disables the gadgets before processing and enables
after;
2)
"AC Validate": asks
if continue when extracting words goes beyond the
safe value on the prefs.
— LanguageTool:
1) Added more POSes.
— Updated the common procedures to V20201202.
On build 177–182:
— Fix: CIRCUMFIX
flag (Brandon)(Mauro);
— Fix: CTRL+A works in the EditorGadget of the dictionary and hyphenation;
— Fix:
Dictionary Editor words starting with / (comments)
accept spaces at end;
— Major cleanup of the process pfx/sfx procedure;
— Yellow flag in Dictionary editor if input is a
rem;
— Several parts have go to bottom/top buttons:
1) Bulk import;
2) Names/Places import;
3) AFF Validate;
4) AC Validate;
5) GUI tabs.
— Added a
"Clear" button:
1) Bulk Import;
2)
"Import places/people names".
—
"Show/Merge/Delete duplicates .dic"
has a
"Remember" button;
— "Show
duplicates wordlist"
shows the total of words in the wordlist;
— In the merge flags in .dics there is an
EditorGadget and buttons;
— Thesaurus: coded
"Sort on number";
— Initial LanguageTool support on menu items:
1) Exporting POS (unfinished);
2) Bulk Import Multiwords.
— Updated the common procedures to V20200508.
On build 163–176:
— Fix: PureBasic
5.72 LTS updated all libraries fixing a vulnerability in
the RegExp library;
— Fix: Now in Linux,
it scrolls to the cursor position of the flags in the
Dictionary editor;
— Fix: Long
filenames now fit in the GUI;
— Fix: NEEDAFFIX
flag (Shantanu);
— Fix: CIRCUMFIX
flag partially (Brandon);
— All file operations now use UNIX #LF$ BOM;
— Better warning if no
"SET UTF-8" is found in the .AFF while loading
a .DIC;
— Added menu item "Sort
flags .dic";
— Dictionary Editor:
1) Coding of Shantanu's summary;
2) New
"Match" gadgets.
— Started coding the "AFF
Validate" (almost ready);
— New icon in menus for exporting files;
— About window URL now points to HTTPS;
— Updated the common procedures to V20200322.
On build 152–162:
— Fix:
Opening more than one instance of PTG says that the
prefs file is invalid (file not closed).
— Dictionary Editor:
1) Checks inside a rule if it has the NEEDAFFIX
FLAG;
2) Allows sorting by column (Shantanu
Oak);
3) Replaced "Flag"
symbol with text.
— Statistics:
1) Coding the extra options for the statistics (Shantanu
Oak);
2) Extract with and without flags option (Shantanu
Oak).
— Added one new menu: "Extension
Tools".
— Added submenus:
1) Thesaurus Validate;
2) eXclusive Master Wordlist;
3) .xpi/.oxt Properties;
4) Preview Thesaurus.
— Replaced "GB
to AU/CA/NZ" with "GB
to AU/CA/NZ/ZA" (added also ZA).
— Replaced "UTF-8
WITHOUT BOM" notices with just
"UTF-8".
— About window now shows "Version"+"Build"+"(beta)"
(no more hyphen).
— Support for Latin sorting:
1) Dictionary;
2) Thesaurus;
3) Autocorrect.
— Updated the common procedures to
V20191210.
On build 151:
— Uses PureBasic 5.71 LTS;
— Common Procedures include (common functions
for all my projects to avoid repeating the code);
— Statistics now removes ".00"
for % with round values;
— Find:
1) Has two extra tabs: "%
Match" and "Status";
2) Pressing CTRL + F places the cursor in the search string gadget.
On build 150:
— First column of the main ListIconGadgets now in grey (more
professional);
— Dictionary Editor:
1) Added improved GB rules to .AFF
Aid Language;
2) Coded "Remove
flag from textbox" in pop-up menu;
3) Coding of COMPLEXPREFIXES (only
detects it, it doesn't work).
— Autocorrect: New wrong/right images;
— Ubuntu 18.10+: Main window gets a bigger height panel
gadget if the OS is Linux.
On build 148–149:
— Fix: CIRCUMFIX flag,
thanks to Brandon and Mauro;
— Dictionary editor: Added 10 pixels to each field of the
ListIconGadget for larger fonts usage.
On build 147:
— Added a new menu item: "GB
primary to -ize/-ise script";
— Several parts now show "<ESC>
to abort." during scan;
— Fix: "Show/Merge/Delete
duplicates .dic" variable in function used by
value (undo not working properly);
— Preferences allow choosing the font for the language.
On build 146:
— Linux requires GTK3 (Ubuntu
18.04 LTS or above);
— "GOTO" window shows
the number of items between 1 and range;
— Statistics use two decimal places in the %.
On build 145:
— Coded "GB
to AU/CA/NZ".
On build 144:
— Saving the
dictionary also shows "Saving
.aff file...";
— Autocorrect string
gadgets are now wider;
— Dictionary editor:
1) Added a refresh button;
2) Allows to refresh decoding during
processing, and OKAY and CANCEL.
— Redesigned the
About PureBasic window;
— Finished coding "Show/Merge/Delete duplicates .dic"
and added colours to affixes;
— Now it uses a
folder "settings" for
the various working files;
— Hyphenation:
1) Decodes also "=";
2) Supports commented words (turns
StringGadget to yellow);
3) Uses colours in the flags (Prefix,
Suffix, Both);
4) "No
matches" in red if no rule applied.
On build 143:
— Fix:
"Preferences":
Small GUI fix for Linux;
— Fix:
"Show/Merge/Delete duplicates .dic": Small GUI
fix for Linux;
— Added support for
"KEEPCASE" flag;
— In the dictionary editor:
1) Increased the delay before
refreshing the derivates while typing;
2) Refreshing derivates now updates
the number found during processing.
— Continued coding
"Show/Merge/Delete duplicates .dic": it will only
respect the settings for
morphological information and merge primary words and
delete exact duplicates. I have been
trying to code the merging of flags but there is a bug
in the code which will require more
attention. In simple words: the merging of flags is
bugged and doesn't do anything.
On build 142:
— Fix: Improved the
Preferences window;
— "Show duplicates wordlist" now produces an export that is clearer to read and is saved in UNIX format with BOM;
— Export wordlist now exports in UNIX format with BOM;
— When extracting as CSV, the rule numbers are zero-padded to make sorting easier in Calc/Excel;
— Added to dictionary editor pop-up menu:
1) "Copy
all words"
2) "Copy
all words & rules"
— Continued coding
"Show/Merge/Delete duplicates .dic" ("Process" button still not
coded).
On build 141:
— Fix: Description of
two GB rules in .AFF Aid Language by Ding-adong;
— Fix: No longer
decodes commented dictionary words (starting with
"/");
— Fix: Decoding of
prefixes using twofold if there were flags of suffixes
after flags of prefixes;
— Commented dictionary words are now displayed in yellow
in the ListIconGadgets;
— Dictionary editor: Change background colour to yellow
if word is commented (starts with
"/");
— Continued coding
"Show/Merge/Delete duplicates .dic";
— It is now possible to abort an advanced find in
progress with <ESC>;
— Speeded up decoding affixes;
— "Warn over N words"
while extracting and counting duplicates on wordlist.
On build 140:
— Fix:
Removed flag "=" to be recognised as morphological
information on dictionary load;
— Dictionary editor now has in the pop-up menu "Copy word & rule";
— Minor speed up decoding affixes;
— Replaced prefix flag
"O" with flag "^" GB
rule in .AFF Aid Language by Ding-adong;
— Started coding
"Show/Merge/Delete duplicates .dic"
(only
"Scan" and "Export"
working);
— Dictionary words editor now allows to jump to flags
position also in Linux and Mac.
On build 139:
— Better decoding of affix rules, including dots, by using
the RegularExpressions library;
— Better icon for help;
— "unduplicate" is now called
"deduplicate";
— Two new dictionary options (not working yet):
1) Show/Merge/Delete duplicates .dic;
2) GB to AU/CA/NZ.
— E-mailing Marco in the About window now adds the PTG
version+build to the subject;
— Added new GB rule to .AFF Aid Language created by
Ding-adong.
On build 138:
— Fix:
Preferences: Combobox icons with colours now appear straight
in Linux;
— Preferences: If
"Maximise window" height
<600 pixels, the radio button gets disabled;
— CIRCUMFIX finally coded.
On builds 134–137:
— Compiled with PB 5.70 LTS;
— Major speed up in
"AC Validate" in
Ubuntu, thanks to the forum user #NULL;
— Added one sentence in check for updates, if up to
date: "Your
version is up to date.";
— Coding of
"Bulk import" for
dictionary+thesaurus+autocorrect;
— Fix:
"Bulk import" was
doing undo for dic + thesaurus + autocorrect (for
all on abort);
— Import the places/people names now accepts empty
dictionaries;
— Linux: "Bulk
Import" and
"Import Proper names" can now close the window in the
gadget;
— Edit/New Dictionary word window now shows
"FLAG CHR",
"FLAG NUM" OR
"FLAG LONG";
— Edit/New Dictionary word now shows "CIRCUMFIX";
— Edit/New Dictionary word now supports two instances
of "0" in the rules;
— Fix: replaced regarding prefixes
conditions_per_line_sfx$(4) with
conditions_per_line_pfx$(4);
— Coded the
"CIRCUMFIX" flag (doesn't
decode properly yet, just informs if it is used);
— Import places/people names now also removes the
chrs "<",
">", " - ",
" − "
and " — ";
— Replaced some more
"words" with
"dictionary words";
— In the
"Statistics" the
total number of words is now a StringGadget (can
copy its contents);
— Now the dashes are larger in the confirm quit for
files changed;
— Now the dashes are larger in the copyright years,
PB version and About window;
— Added two
"Copy" (to
clipboard) buttons in statistics;
— Preferences now has an option for
"Both" in Affixes (CIRCUMFIX);
— Improved the warning message when
"AC Validate" is
selected and there is no dictionary.
On build 133:
— Check for updates now checks if build on the site
data is outdated;
— Allows to import places/proper names with
possessives (it still doesn't check if the
flags exist in any position);
— Better Linux icon;
— Better looking logo in the about window;
— Updated the check for updates URL;
— "Recent
Files" menu item now
has an icon;
— If checking for updates and found, the OKAY button
now directs the browser to the download bookmark;
— Pop-up menus now use the flag: #PB_Menu_ModernLook
(Windows only);
— Floppy icon on tabs (blue/red);
— Coding of the
"AC Validate";
— Replace the warning
"No words found."
with "No
dictionary words found.".
On build 132:
— Cleaned the decoding of prefixes and suffixes;
— Executable icon no longer stretched (Windows);
— Menu items in
"Hyphenation" (not working yet):
1) HYP Validate;
2) Fix invalid spaces.
— Added undo/redo buttons to Hyphenation tab (disabled);
— Default resolution is now 1280×600;
— Dictionary editor:
1) Code (flag) is now an Emoji;
2) Taboo image now has a tip;
3) Missing flags is now written in red;
4) Colours for "Suffixes"
(#GreenDark) and
"Prefixes" (#Blue)
(Prefs allows to customise colours for "Primary",
"Suffix" and
"Prefix");
5) Better check if a word is valid before exiting;
6) Image for NEEDAFFIX and FORBIDDENWORD;
7)
Fix: NEEDAFFIX and FORBIDDENWORD
variables reset at start of ADD/EDIT.
— About window:
1) Added (replaced text with) Emoji to Name, E.mail and
S.mail;
2) E-mail and URL now use the function GadgetFit to
avoid Linux size/positioning issues.
— Replaced the word "Tags"
with "Postags";
— Statistics:
1) Now allows to calculate the number of new words
since the last mentioned number;
2) Now exports in Linux format and also the extra
information;
3) Optimised slightly the processing regarding
uppercase words.
— Changes show a colour asterisk in red;
— Cleaned the code by using a function to turn the
gadgets background to Red/White on errors.
On build 131:
— Speed up in dictionary processing by searching first
for "SFX Y"
instead of "SFX N";
— Fix: Count words now
separates thousands with the preferences character;
— Improved error handling in hyphenator;
— Taboo's warning sign is now an image and is stored in a
long instead of a string for speed up;
— New strategy for extra speed decoding the dictionary, but
it takes more RAM;
— New pop-up menu option:
"Copy" for LanguageTool;
— Dictionary: Combobox of
".AFF Aid" wider for bigger windows;
— Fix: Decoding of
prefixes wasn't working for some suffixes under certain
conditions
(due to the new
"SFX N" code).
On build 130:
— Fix:
In Hyphenation tab it was not possible to use CURS UP/DOWN
in the Hyphenation ListIconGadget;
— New pop-up menu option:
"Move" for LanguageTool;
— Coding of "Check for
Updates";
— Fix:
Coded "SFX N" rules, by
not applying prefixes to it (Mykhailo Oliinyk);
— Cleaned the code a bit.
On build 129:
— Hyphenation: Fixed/Improved the display of rules (Mauro
Trevisan):
1) No spaces;
2) Dots.
On build 127–128:
— Hyphenation:
1) Pop-up menu: "Copy word","Copy rule","Copy
word & rule";
2) Support for "-" in
hyphenation rules;
3) Support for "%" in
hyphenation rules;
4) Fix:
Hyphenation now supports repeated rules per word.
— Major Speed up of hyphenation;
— Major Speed up of dictionary processing.
On build 124–126:
— Improved the manual a lot;
— The tab "Dictionary"
now has "*.dic + *.aff"
to make it easier to understand;
— Apache License is now built-in in the manual;
— Support for compressed AF;
— NEEDAFFIX support;
— Hyphenation:
1) While hyphenating words it shows the current
count+total;
2) Adds "UTF-8" to
hyphenator if EditorGadget is blank;
3) Hyphenation only works if the EditorGadget starts
with the word "UTF-8";
4) Fix:
Didn't skip the word "UTF-8";
5) Fix:
defaults MIN LEFT/RIGHT before checking for new values;
6) Now it detects if rules don't end with a
<RETURN>;
7) Better detection of no rules;
8) Added a "Clear"
button;
9) Added label: "#
Hyphenations:";
10) Hyphenation now respects the flags: LEFTHYPHENMIN and
RIGHTHYPHENMIN;
11) Fix:
Would skip the 5 lines header (if no standard
header, it would skip the first five rules);
12) Fix:
Changing resolution in the Preferences would reset each tab
filename to "n/a" and
erase the Hyphenation rules.
8 — PCRE Licence (used by the RegularExpression library)