British Dictionary (en_GB) —
Forked by Marco A.G.Pinto
Welcome to the forked version of the British speller.
Since no one was updating the speller (a free but very demanding project), I took on the task myself in 2013, over a decade ago.
I grabbed the original project and started adding/removing/fixing words. I kept the original authors' credits and added my name.
I have created the best
up-to-date British speller. It encompasses several
fields of knowledge, from simple to complex words.
Furthermore, it is suitable as
a basis for Commonwealth and European English.
Regardless of race, religion, gender, age or academic background, everyone should have free and equal access to all words.
I am improving the speller as much as I can by testing it in the field: most of my e-mails are in English, so I see which words are flagged and check whether they are real typos or missing words.
I have also pasted web pages from newspapers, TV channels and similar sites to see which words are flagged.
To make sure the words I add are correct, I look them up in credible sources:
1) Oxford Dictionaries;
2) Collins Dictionary;
3) Cambridge Dictionary;
4) Merriam-Webster Dictionary (used with caution);
5) Wiktionary (used with caution);
6) Wikipedia (used with caution);
7) Physical dictionaries.
In January 2015, I purchased an “Oxford Gold Account” to get fuller access to Oxford Dictionaries.
I am also involved in several projects with specific jargon, so I have added some “special” words.
I have been told to use scripts to update the dictionary, but I add words by copy/paste after checking them in the dictionaries mentioned above. This is slower and harder, but the results are much better and more accurate.
Some words are chiefly American, and I will only add them if there is no British equivalent.
In July 2019, Стоян e-mailed me saying that I had added many random words, turning the dictionary into a lengthy lexicon instead of a spellchecker, and that I needed a big corpus with the top 15–30% most frequent words from different areas of science, as well as newspapers, fiction, poetry, Wikipedia articles, texts from Project Gutenberg and other books or websites.
Thus, I have been focusing on adding plurals and possessives to the wordlist, and on cleaning the .dic file by removing duplicates and merging words using affix flags. I have also been checking important Wikipedia articles to find missing words for specific subjects, so that most of the terms needed to write an essay on a subject are available.
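A minimal Python sketch of merging duplicate entries follows. It is not Proofing Tool GUI's code; it assumes Hunspell's default single-character flags, a word count on the first line of the .dic file and one `word/FLAGS` entry per line, and it drops any morphological fields. Merging by taking the union of flags is only safe when it does not put a prefix flag and a suffix flag on the same entry, because cross-product affixes can then combine and accept extra forms.

```python
def parse_dic(lines):
    """Parse .dic lines into {word: set_of_flags}, merging duplicate stems.

    Assumes single-character flags; line 0 (the word count) is skipped and
    anything after the first whitespace (morphological data) is ignored.
    """
    entries = {}
    for line in lines[1:]:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        word, _, flags = fields[0].partition("/")
        # Union of flags: "colour/MS" + "colour/DG" -> colour with {D, G, M, S}
        entries.setdefault(word, set()).update(flags)
    return entries


def write_dic(entries):
    """Serialise merged entries: count line first, then sorted word/FLAGS lines."""
    out = [str(len(entries))]
    for word in sorted(entries):
        flags = "".join(sorted(entries[word]))
        out.append(f"{word}/{flags}" if flags else word)
    return out


dic = ["4", "colour/MS", "colour/DG", "cat/S", "dog"]
print(write_dic(parse_dic(dic)))  # ['3', 'cat/S', 'colour/DGMS', 'dog']
```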
I pioneered the concept of
adding possessives to words and listing them in the
release notes.
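Listing the new possessives for the release notes can be sketched as a set difference between the previous and the current wordlist. This is a hypothetical helper, assuming plain wordlists with one word per line and straight apostrophes (').

```python
def new_possessives(old_words, new_words):
    """Return possessives present in new_words but not in old_words, sorted."""
    added = set(new_words) - set(old_words)
    # Singular possessives end in "'s"; plural possessives end in "s'".
    return sorted(w for w in added if w.endswith("'s") or w.endswith("s'"))


old = ["cat", "cat's", "dog"]
new = ["cat", "cat's", "cats'", "dog", "dog's", "dogs"]
print(new_possessives(old, new))  # ["cats'", "dog's"]
```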
Some people have complained that I add “all the words under the sun”. If you find any obsolete words, or archaic words that are close to current replacements, please report them. I have been adding derivatives of words to ensure that words like “biblically” aren't missing (see the LibreOffice Bugzilla ticket: https://bugs.documentfoundation.org/show_bug.cgi?id=154826).
Adding as many words as possible is useful: it is better to have valid words in the dictionary, even if they are sometimes confused with others, than to risk typos and uncertainty about the correct spelling. Would it be better to see most words marked in red?
Status of the British Dictionary V3.3.7:
The statistics refer to the British Dictionary V3.3.7 (Proofing Tool GUI 4.0 build 300 WIP), released on 1.Jan.2025.
Please note that there are thousands of duplicates in the wordlist, because some words in the .dic file can't be merged: their flags contain both SFX and PFX.
Those PFXs make it harder to find whether words are already in the .dic file and harder to merge flags, and they also mess up the order of the wordlist extracted on release day.
This and other issues, such as duplicated flags, cause duplicates.
In future versions of Proofing Tool GUI, I will code features to mitigate these issues.
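Finding the entries that mix prefix and suffix flags is a first step towards that kind of feature. The sketch below is hypothetical, not Proofing Tool GUI's code: it reads the affix type of each flag from the PFX/SFX lines of the .aff file and assumes single-character flags.

```python
def affix_flag_types(aff_lines):
    """Map each affix flag to "PFX" or "SFX" using the .aff affix lines."""
    types = {}
    for line in aff_lines:
        parts = line.split()
        if len(parts) >= 2 and parts[0] in ("PFX", "SFX"):
            types[parts[1]] = parts[0]
    return types


def mixed_entries(dic_lines, flag_types):
    """Return .dic words whose flags include both a prefix and a suffix flag."""
    mixed = []
    for line in dic_lines[1:]:  # line 0 is the word count
        fields = line.split()
        if not fields:
            continue
        word, _, flags = fields[0].partition("/")
        # Non-affix flags (e.g. NOSUGGEST) map to None and are ignored.
        kinds = {flag_types.get(f) for f in flags}
        if {"PFX", "SFX"} <= kinds:
            mixed.append(word)
    return mixed


aff = ["PFX A Y 1", "PFX A 0 re .", "SFX S Y 1", "SFX S 0 s ."]
dic = ["3", "write/AS", "cat/S", "read/A"]
print(mixed_entries(dic, affix_flag_types(aff)))  # ['write']
```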
About ise/ize:
As in other languages, some words can be written in more than one way. Since Oxford says some words are valid both ways, I kept both, and users decide which they prefer.
A good example is “online” and “on-line”.
For ise/ize, both ways are
accepted in some words:
— optimize/optimise;
— realize/realise.
Only Premium accounts on Oxford Dictionaries show that certain words accept both ise and ize.
Regular users of the Oxford website won't see this, but I have access to it.
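Generating the -ise spelling from an -ize form can be sketched with a regular expression. This is a hypothetical helper, not part of Proofing Tool GUI: the exception set is a small illustrative sample, and real use would still need checking against Oxford, since not every word ending in "ize" (for example “size” or “capsize”) has an -ise form.

```python
import re

# Hypothetical, incomplete list of words whose "-ize" is not the suffix
# and which therefore have no "-ise" spelling.
EXCEPTIONS = {"size", "prize", "seize", "capsize", "baize", "maize"}

# "iz" followed by a common ending, anchored at the end of the word.
IZE_RE = re.compile(r"iz(e|es|ed|ing|er|ers|ation|ations)$")


def to_ise(word):
    """Return the -ise spelling of an -ize form, or the word unchanged."""
    m = IZE_RE.search(word)
    if not m:
        return word
    # Rebuild the base form (e.g. "capsized" -> "capsize") to test exceptions.
    if word[: m.start()] + "ize" in EXCEPTIONS:
        return word
    return word[: m.start()] + "is" + m.group(1)


for w in ["realize", "optimization", "organizers", "capsized", "online"]:
    print(w, "->", to_ise(w))
```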
Places from New Zealand/UK (England, Scotland, Wales & Northern Ireland):
In V2.61–2.64, I included tons of place names.
My scientist friend, Peter McGavin, told me that New Zealand uses British English, so I decided to do something about it. I did the same for the UK. I searched Wikipedia for “towns”, “counties”, “villages”, “boroughs”, “suburbs”, etc. and based the additions on:
— https://en.wikipedia.org/wiki/List_of_towns_in_England;
— https://en.wikipedia.org/wiki/List_of_towns_in_New_Zealand;
— https://en.wikipedia.org/wiki/List_of_civil_parishes_in_England;
— https://en.wikipedia.org/wiki/List_of_civil_parishes_in_Scotland;
— https://en.wikipedia.org/wiki/List_of_places_in_Scotland;
— https://en.wikipedia.org/wiki/List_of_communities_in_Wales;
— https://en.wikipedia.org/wiki/Local_government_in_Wales;
— https://en.wikipedia.org/wiki/List_of_towns_and_villages_in_Northern_Ireland;
— https://en.wikipedia.org/wiki/Counties_of_Northern_Ireland;
— https://en.wikipedia.org/wiki/Category:Suburbs_in_New_Zealand;
— https://en.wikipedia.org/wiki/List_of_Church_of_Scotland_parishes.
Furthermore, I added places sent to me by Peter C.:
© OpenStreetMap contributors: www.openstreetmap.org/copyright.
© The Clergy of the Church of England Database Project, 2005.
Cities from Australia
In V2.65, I added the cities in Australia by population, since they are valid English:
— https://en.wikipedia.org/wiki/List_of_cities_in_Australia_by_population
Cities from the US
In V2.65, I added tons of cities in the US with a population of 10 000+, since they are valid English.
This list was supplied by Michael Holroyd on Kevin Atkinson's GitHub.
Cities from Canada
In V2.67, I added the cities in Canada, since they are valid English:
— https://en.wikipedia.org/wiki/List_of_cities_in_Canada
State and union territory capitals in India
In V2.90, I added this list to the dictionary, since they are valid English:
— https://en.wikipedia.org/wiki/List_of_state_and_union_territory_capitals_in_India
Common prescription and OTC drugs
In V2.63, I added tons of drug names supplied by Andrew Ziem on Kevin Atkinson's GitHub.
The generic drugs (such as “diphenhydramine”) are in lowercase, while the brand names (such as “Abilify”) are capitalised.
Words regarding COVID-19
Main difficulties in developing this dictionary:
1) Proper names;
2) Possessive forms;
3) Plurals.
I have been checking word by word to spot errors and missing plurals/possessives.
It will take many years to finish.
Some words need rechecking, either because I can't find their plurals or because their entries in the .dic file use both PFX and SFX, so I can't fix them properly.
I need to code a feature in my tool, Proofing Tool GUI, to extract PFX + SFX words.
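Expanding an entry that carries both prefix and suffix flags into every form it accepts is what Hunspell's own `unmunch` utility does. The compact Python sketch below only illustrates the idea and is not Proofing Tool GUI's code: it assumes single-character flags, reads only PFX/SFX headers (with the Y/N cross-product field) and their strip/add/condition fields, ignores continuation flags and every other .aff option, and checks prefix conditions against the root word.

```python
import re


def parse_aff(aff_lines):
    """Return {flag: (kind, cross_product, [(strip, add, condition_regex)])}."""
    rules = {}
    for line in aff_lines:
        parts = line.split()
        if len(parts) < 4 or parts[0] not in ("PFX", "SFX"):
            continue
        kind, flag = parts[0], parts[1]
        if flag not in rules:
            # Header line: "SFX S Y 2" -> kind, flag, cross product, rule count
            rules[flag] = (kind, parts[2] == "Y", [])
            continue
        strip = "" if parts[2] == "0" else parts[2]
        add = parts[3].split("/")[0]  # ignore continuation flags
        add = "" if add == "0" else add
        cond = parts[4] if len(parts) > 4 else "."
        # Suffix conditions match the end of the word, prefix ones the start.
        regex = re.compile(cond + "$" if kind == "SFX" else "^" + cond)
        rules[flag][2].append((strip, add, regex))
    return rules


def expand(word, flags, rules):
    """Return the root plus all forms generated by its affix flags."""
    forms = {word}
    cross_suffixed = set()  # suffixed forms that may also take a prefix
    for f in flags:
        if f in rules and rules[f][0] == "SFX":
            _, cross, entries = rules[f]
            for strip, add, regex in entries:
                if regex.search(word) and word.endswith(strip):
                    new = word[: len(word) - len(strip)] + add
                    forms.add(new)
                    if cross:
                        cross_suffixed.add(new)
    for f in flags:
        if f in rules and rules[f][0] == "PFX":
            _, cross, entries = rules[f]
            for strip, add, regex in entries:
                if regex.search(word) and word.startswith(strip):
                    forms.add(add + word[len(strip):])
                    if cross:
                        # Cross product: the prefix also applies to suffixed forms.
                        for s in cross_suffixed:
                            forms.add(add + s[len(strip):])
    return forms


aff = ["PFX A Y 1", "PFX A 0 re .",
       "SFX S Y 2", "SFX S y ies [^aeiou]y", "SFX S 0 s [^y]"]
rules = parse_aff(aff)
print(sorted(expand("write", "AS", rules)))  # ['rewrite', 'rewrites', 'write', 'writes']
print(sorted(expand("try", "S", rules)))     # ['tries', 'try']
```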
I was checking the words from 'a' to 'z', but Peter (a friend who suggests words) told me to begin with the letters that have the lowest percentage of existing words.
To do it thoroughly, there is no other way: I must check word by word. It will take years, but it will be done.
Adding new words:
If you believe you have found a missing or incorrect word, please send it to me for analysis. If it is in the Oxford or Collins dictionaries, I will add it.
Removing US words:
If you find American words that both the Oxford and Collins dictionaries list as American and that have a British equivalent, please send them to me for analysis and removal.
Note that the dictionary was originally based on the US one, which is where many US words came from.
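Candidate Americanisms can be gathered automatically before the manual check in Oxford and Collins. The hypothetical helper below only looks at two common patterns (-or/-our and -er/-re) and lists pairs where both spellings are in the wordlist; every hit still needs checking, since some pairs are both British with different meanings (for example “meter”, a measuring device, and “metre”, the unit).

```python
def us_gb_pairs(words):
    """Return (US, GB) spelling pairs where both forms are in the wordlist."""
    present = set(words)
    pairs = []
    for w in sorted(present):
        if w.endswith("or") and w[:-2] + "our" in present:
            pairs.append((w, w[:-2] + "our"))  # color -> colour
        elif w.endswith("er") and w[:-2] + "re" in present:
            pairs.append((w, w[:-2] + "re"))  # center -> centre
    return pairs


print(us_gb_pairs(["color", "colour", "center", "centre", "water", "doctor"]))
# [('center', 'centre'), ('color', 'colour')]
```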
Obscene/offensive words:
If you find any such words in the wordlist without the NOSUGGEST flag (!), please report them to me.
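Hunspell's .aff option `NOSUGGEST flag` marks words that are accepted but never offered as suggestions. A hypothetical check for reported words that still lack that flag is sketched below; it assumes the flag character is `!`, as the text above suggests, single-character flags, and a list of suspect words supplied by the reader (the placeholder words are made up).

```python
def missing_nosuggest(dic_lines, suspects, flag="!"):
    """Return suspect words present in the .dic but lacking the NOSUGGEST flag."""
    flags_by_word = {}
    for line in dic_lines[1:]:  # line 0 is the word count
        fields = line.split()
        if not fields:
            continue
        word, _, flags = fields[0].partition("/")
        flags_by_word.setdefault(word, set()).update(flags)
    return sorted(
        w for w in suspects
        if w in flags_by_word and flag not in flags_by_word[w]
    )


dic = ["3", "badword1/!S", "badword2/S", "cat/S"]
print(missing_nosuggest(dic, ["badword1", "badword2", "xyz"]))  # ['badword2']
```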
Archaic words:
I will only add archaic words if they don't interfere with other words.
Note that in literary writing, some authors use archaic words.
Please report to me any archaic words that have very similar current words.
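Archaic spellings that are very close to current words can be shortlisted with fuzzy matching, for example the dated “connexion” next to “connection”. The sketch uses Python's standard `difflib.get_close_matches`; the 0.8 similarity cutoff is an arbitrary assumption, and the results are only candidates to report, not an automatic decision.

```python
import difflib


def close_current_words(archaic, current, cutoff=0.8):
    """Map each archaic word to current words spelt very similarly to it."""
    result = {}
    for word in archaic:
        # Ratio-based fuzzy matching from the standard library.
        matches = difflib.get_close_matches(word, current, n=3, cutoff=cutoff)
        if matches:
            result[word] = matches
    return result


print(close_current_words(["connexion", "zzzz"], ["connection", "dog"]))
# {'connexion': ['connection']}
```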
Obsolete words:
If you find any obsolete words, please report them to me.
Hyphenated words:
I have been avoiding adding words with hyphens, and I check whether they can also be written as one word (closed up), or whether the official dictionaries state that they have no hyphen at all; in that case, I remove them from the .dic file.
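A hypothetical helper for the hyphen check lists hyphenated entries whose closed-up form is also in the wordlist. The hits are only candidates: when Oxford accepts both spellings, as with “online” and “on-line”, both are kept.

```python
def closed_form_pairs(words):
    """Return (hyphenated, closed) pairs where both spellings are listed."""
    present = set(words)
    return sorted(
        (w, w.replace("-", ""))
        for w in present
        if "-" in w and w.replace("-", "") in present
    )


print(closed_form_pairs(["on-line", "online", "e-mail", "well-known"]))
# [('on-line', 'online')]
```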
To-do list:
Release Proofing Tool GUI 4.0;
Rewrite the release notes in a simpler/cleaner way;
Officially fork the dictionary: ZA (South Africa);
Update both the GB and the ZA dictionaries in parallel;
Copy most proper names from the GB dictionary to the ZA dictionary (PTG enhancement required);
Add plurals and possessives to most current nouns;
Add uncountable nouns;
Add proper names;
Add basic missing words, which users must suggest;
Add un- and non- words;
Add derivatives of words;
Remove Americanisms from the GB dictionary;
Search for incorrect hyphenated words and remove them;
Remove (extract) all prefixes from the .dic file to make it simpler to find/export words (PTG enhancement required);
Remove duplicates (PTG enhancement required);
Add flags to .dic words to make it easier to merge words (PTG enhancement required);
Create GitHub folders for -ise and -ize Hunspell files (PTG enhancement required);
Make the GB dictionary use only -ise, but keep GitHub folders for -ise, -ize and both (PTG enhancement required);
Officially fork the dictionaries: US, Canada and Australia (PTG enhancement required).
I hope that people will enjoy my work and that it may be useful for the progress of humankind.
Kind regards from:
Marco in Spain drinking
Coca-Cola —
30.Jun.2016
Marco A.G.Pinto
Master of Science in Information Warfare/Competitive
Intelligence.
Open-Source Developer.
Last update:
12.Jan.2025