British Dictionary (en_GB) —
Forked by Marco A.G.Pinto
Welcome to the forked version of
the British speller.
Since no one was updating the speller (a free project that
takes a great deal of effort), I took on the task myself in
2013, over a decade ago.
I took the original project and started
adding/removing/fixing words. I kept the original authors'
credits and added my name.
I have created the best, most up-to-date British speller. It
encompasses several fields of knowledge, from simple to
complex words.
Furthermore, it is suitable as a basis for Commonwealth
and European English.
Regardless of race, religion, gender, age or academic
background, everyone should have free and equal access to
all words.
I am improving the speller as much as possible because I
test it in the field:
Most of my e-mails are in English, so I review the words
flagged as typos and verify whether they really are typos
or missing words.
I have also pasted text from newspaper websites, TV channel
websites and similar sources to see which words are flagged.
To make sure the words I add are the correct ones, I look
for them in credible sources:
1) Oxford Dictionaries;
2) Collins Dictionary;
3) Cambridge Dictionary;
4) Merriam-Webster Dictionary (used with caution);
5) Wiktionary (used with caution);
6) Wikipedia (used with caution);
7) Physical dictionaries.
In January 2015, I purchased an “Oxford
Gold Account” to gain greater access to Oxford
Dictionaries.
I am also involved in several projects with specialised
jargon, so I have added some “special”
words.
I have been told to use scripts to update the dictionary,
but I add the words with copy/paste after checking
them in the dictionaries mentioned above. This is slower
and harder, but the results are much better and more accurate.
Some words are chiefly American, and I will only add them
if there is no British equivalent.
In July 2019, Стоян e-mailed me saying that I had added many
random words, making the dictionary a lengthy lexicon
instead of a spellchecker. He added that I needed a large
corpus with the top 15–30% most frequent words from
different areas of science, newspapers, fiction, poetry,
Wikipedia articles, Project Gutenberg texts and other
books or websites.
Thus, I have been focusing on adding plurals and
possessives to the wordlist and on cleaning the .dic
file by removing duplicates and merging words using affix
flags. I have also been checking important Wikipedia articles
to find missing words for specific subjects, so that most of
the terms used in a subject are available when writing an
essay about it.
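Merging duplicate stems by uniting their affix flags can be sketched in Python as below. This is a hypothetical helper, not code from Proofing Tool GUI. It assumes the standard Hunspell .dic format with Hunspell's default single-character flags, and it takes only the entry lines (without the leading word-count line). Entries carrying any prefix flag are left untouched: uniting prefix and suffix flags that came from different lines could, through cross-product, generate forms that neither original line produced.

```python
def merge_dic_lines(lines, prefix_flags=frozenset()):
    """Merge duplicate stems in Hunspell .dic entry lines (hypothetical helper).

    Assumes single-character flags (Hunspell's default FLAG mode).
    Entries carrying any flag in `prefix_flags` are kept as they are.
    """
    merged = {}     # stem -> set of flags; dicts keep first-seen order
    untouched = []  # entries with prefix flags, left for manual review
    for raw in lines:
        fields = raw.split()
        if not fields:
            continue  # skip blank lines
        entry = fields[0]  # drop any morphological fields after whitespace
        stem, _, flags = entry.partition("/")
        if set(flags) & prefix_flags:
            untouched.append(entry)
            continue
        merged.setdefault(stem, set()).update(flags)
    out = [f"{stem}/{''.join(sorted(fl))}" if fl else stem
           for stem, fl in merged.items()]
    return out + untouched


body = ["colour/MS", "colour/DG", "cat/S", "dog", "do/US"]
print(merge_dic_lines(body, prefix_flags={"U"}))
# ['colour/DGMS', 'cat/S', 'dog', 'do/US']
```

The flag letters here are made up for illustration; real flag meanings come from the matching .aff file.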
I pioneered the concept of adding possessives to words and
listing them in the release notes.
Some people have complained that I add “all
words under the sun”. If you find any obsolete
words, or archaic words that are close to current
replacements, please report them. I have been adding
derivatives of words to ensure that words like “biblically”
aren't missing (see the LibreOffice Bugzilla ticket:
https://bugs.documentfoundation.org/show_bug.cgi?id=154826).
Adding as many words as possible is useful: it is better
to have valid words, even if they are sometimes confused
with others, than to risk typos and uncertainty about the
correct spelling. Would it be better to see most words
marked in red?
Status of the British Dictionary V3.3.3:
Statistics for the British Dictionary V3.3.3 (Proofing
Tool GUI 4.0 build 300 WIP), released on 1.Jun.2024.
Please note that there are thousands of duplicates in the
wordlist because some entries in the .dic can't be merged,
as their flags contain both SFX and PFX.
Those PFX flags make it harder to check whether words are
already in the .dic and to merge flags, and they also
disrupt the order of the extracted wordlist on release day.
These and other issues, such as duplicated flags, cause
duplicates.
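Spotting the duplicates in an exported wordlist (one word per line, such as the wordlist extracted on release day) is a simple counting problem. A minimal Python sketch, assuming the list is already fully expanded; the function name is hypothetical.

```python
from collections import Counter


def find_duplicates(words):
    """Return words that occur more than once in an exported wordlist."""
    counts = Counter(w.strip() for w in words if w.strip())
    return sorted(w for w, n in counts.items() if n > 1)


wordlist = ["undo", "redo", "undo", "", "redo", "cat"]
print(find_duplicates(wordlist))  # ['redo', 'undo']
```

This only reports the duplicates; tracing each one back to the .dic lines (and flags) that produced it still has to be done separately.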
In future versions of Proofing Tool GUI, I will code
features to mitigate these issues.
About ize/ise:
Just like in other languages, some words can be
written in more than one way. When Oxford says a word is
valid both ways, I keep both and users decide which they
prefer. A good example is “online” and “on-line”.
For ize/ise, both ways are accepted in some words:
— optimize/optimise;
— realize/realise.
Oxford Dictionaries only states that certain words
accept both -ize and -ise to Premium account holders.
Regular users won't see this on the Oxford website, but
I have access to it.
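Finding which -ize words already have an -ise counterpart can be sketched as follows. This is a hypothetical Python helper that simply compares spellings; it does not know which words Oxford actually accepts both ways.

```python
def ize_ise_pairs(words):
    """Split -ize words by whether their -ise spelling is also present."""
    vocab = set(words)
    paired, unpaired = [], []
    for word in sorted(vocab):
        if word.endswith("ize"):
            twin = word[:-3] + "ise"
            (paired if twin in vocab else unpaired).append(word)
    return paired, unpaired


words = ["optimize", "optimise", "realize", "realise", "capsize", "size"]
print(ize_ise_pairs(words))
# (['optimize', 'realize'], ['capsize', 'size'])
```

The unpaired list shows why this cannot be fully automated: “capsize” and “size” end in -ize but have no -ise form, so each candidate still has to be checked against Oxford.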
Places from New Zealand/UK (England, Scotland, Wales & Northern Ireland):
In V2.61–2.64 I included a large number of place names.
My scientist friend, Peter McGavin, told me that New Zealand
uses British English, so I decided to do something about it. I
did the same for the UK. I searched Wikipedia for “towns”,
“counties”, “villages”, “boroughs”, “suburbs”, etc., and
based the additions on:
— https://en.wikipedia.org/wiki/List_of_towns_in_England;
— https://en.wikipedia.org/wiki/List_of_towns_in_New_Zealand;
— https://en.wikipedia.org/wiki/List_of_civil_parishes_in_England;
— https://en.wikipedia.org/wiki/List_of_civil_parishes_in_Scotland;
— https://en.wikipedia.org/wiki/List_of_places_in_Scotland;
— https://en.wikipedia.org/wiki/List_of_communities_in_Wales;
— https://en.wikipedia.org/wiki/Local_government_in_Wales;
— https://en.wikipedia.org/wiki/List_of_towns_and_villages_in_Northern_Ireland;
— https://en.wikipedia.org/wiki/Counties_of_Northern_Ireland;
— https://en.wikipedia.org/wiki/Category:Suburbs_in_New_Zealand;
— https://en.wikipedia.org/wiki/List_of_Church_of_Scotland_parishes.
Furthermore, I added places sent to me by Peter C.:
© OpenStreetMap contributors:
www.openstreetmap.org/copyright.
© The Clergy of the Church of England Database Project,
2005.
Cities from Australia
In V2.65 I added the cities in Australia by
population, since their names are valid English:
— https://en.wikipedia.org/wiki/List_of_cities_in_Australia_by_population
Cities from the US
In V2.65 I added a large number of US cities with a
population of 10 000+, since their names are valid English.
This list was supplied by Michael Holroyd on Kevin
Atkinson's GitHub.
Cities from Canada
In V2.67 I added the cities in Canada,
since their names are valid English:
— https://en.wikipedia.org/wiki/List_of_cities_in_Canada
State and union territory capitals in India
In V2.90 I added this list to the dictionary,
since the names are valid English:
— https://en.wikipedia.org/wiki/List_of_state_and_union_territory_capitals_in_India
Common prescription and OTC drugs
In V2.63 I added a large number of drug names supplied by
Andrew Ziem on Kevin Atkinson's GitHub.
The generic drugs (such as “diphenhydramine”)
are in lowercase, while the brand names (such as “Abilify”)
are capitalised.
Words regarding COVID-19
Main difficulties developing this dictionary:
1) Proper names;
2) Possessive forms;
3) Plurals.
I have been checking word by word to spot errors and missing
plurals/possessives.
It will take many years to finish.
Some words need rechecking because I can't find their
plurals, or because their entries in the .dic use both PFX
and SFX, so I can't fix them properly.
I need to code a feature in my tool, Proofing Tool GUI, to
extract PFX + SFX words.
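Such an extraction feature could start by listing every .dic entry that carries both a prefix flag and a suffix flag. The Python below is a hypothetical sketch, not Proofing Tool GUI code; it assumes the standard Hunspell .aff syntax, where both the header line and the rule lines of an affix class give the flag in the second field, and single-character flags.

```python
def affix_flags(aff_lines):
    """Collect prefix and suffix flags declared in Hunspell .aff lines."""
    pfx, sfx = set(), set()
    for line in aff_lines:
        parts = line.split()
        # Both header ("PFX U Y 1") and rule ("PFX U 0 un .") lines
        # carry the flag in the second field.
        if len(parts) >= 2 and parts[0] == "PFX":
            pfx.add(parts[1])
        elif len(parts) >= 2 and parts[0] == "SFX":
            sfx.add(parts[1])
    return pfx, sfx


def pfx_sfx_entries(dic_lines, pfx, sfx):
    """Return .dic entries whose flags include both a prefix and a suffix."""
    hits = []
    for raw in dic_lines:
        fields = raw.split()
        if not fields:
            continue
        _, _, flags = fields[0].partition("/")
        chars = set(flags)  # assumes single-character flags
        if chars & pfx and chars & sfx:
            hits.append(fields[0])
    return hits


aff = ["PFX U Y 1", "PFX U 0 un .", "SFX S Y 1", "SFX S 0 s ."]
dic = ["3", "do/US", "cat/S", "tie/U"]
pfx, sfx = affix_flags(aff)
print(pfx_sfx_entries(dic, pfx, sfx))  # ['do/US']
```

The word-count line at the top of the .dic has no flags, so it is skipped naturally.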
I was checking the words from 'a' to 'z', but Peter (a
friend who suggests words) told me to begin with the letters
that have a lower percentage of existing words.
If one wants to do it properly, there is no other way: I must
check word by word. It will take years, but it will be done.
Adding new words:
If you believe you have found a missing/incorrect word,
please send it to me for analysis. If it is in the Oxford or
Collins dictionaries, I will add it.
Removing US words:
If you find American words that both the Oxford and Collins
dictionaries label as such and that have a British
equivalent, please send them to me for analysis
and removal.
Note that the dictionary was
originally based on the US one, which is where many
US words originated.
Obscene/offensive words:
If you find any such words in the wordlist that don't
have the NOSUGGEST flag (!), please report them to me.
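Checking for missing NOSUGGEST flags can be sketched in Python, assuming the standard Hunspell format where the .aff declares the flag with a `NOSUGGEST` line and flags are single characters. The watchlist of words to check is a hypothetical input supplied by whoever runs the check; the function names are also hypothetical.

```python
def nosuggest_flag(aff_lines):
    """Return the flag declared by NOSUGGEST in .aff lines, or None."""
    for line in aff_lines:
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "NOSUGGEST":
            return parts[1]
    return None


def missing_nosuggest(dic_lines, flag, watchlist):
    """List watchlist words present in the .dic without the NOSUGGEST flag."""
    if not flag:
        raise ValueError("no NOSUGGEST flag declared")
    watch = {w.lower() for w in watchlist}
    missing = []
    for raw in dic_lines:
        fields = raw.split()
        if not fields:
            continue
        stem, _, flags = fields[0].partition("/")
        if stem.lower() in watch and flag not in flags:
            missing.append(stem)
    return missing


flag = nosuggest_flag(["SET UTF-8", "NOSUGGEST !"])
print(missing_nosuggest(["3", "darn/!", "heck", "cat/S"], flag, ["darn", "heck"]))
# ['heck']
```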
Archaic words:
I will only add archaic words if they don't interfere with
other words.
Note that in literary writing, some writers use archaic
words.
Please report to me any archaic words that are very similar
to current words.
Obsolete words:
If you find any obsolete words, please report them to me.
Hyphenated words:
I have been avoiding adding hyphenated words, and I have
been checking whether they can also be written as one word
or whether the official dictionaries state that they have no
hyphen at all; in those cases, I remove them from the .dic.
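A first pass over hyphenated words can flag those whose closed (unhyphenated) form is also in the wordlist. A minimal, hypothetical Python sketch; it only compares spellings and does not consult any dictionary.

```python
def hyphenated_with_closed_form(words):
    """Return hyphenated words whose unhyphenated form is also listed."""
    vocab = set(words)
    return sorted(w for w in vocab if "-" in w and w.replace("-", "") in vocab)


words = ["on-line", "online", "e-mail", "well-being"]
print(hyphenated_with_closed_form(words))  # ['on-line']
```

Matches are only candidates for review: pairs such as “online” and “on-line” are kept on purpose when Oxford accepts both spellings.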
THE DICTIONARY PROJECT GOALS/TASKS FOR 2025+:
Release Proofing Tool GUI 4.0;
Rewrite the release notes in a simpler/cleaner way;
Update both the GB dictionary and the ZA dictionary in parallel;
Copy most proper names from the GB dictionary to the ZA dictionary (PTG enhancement required);
Add plurals and possessives to most current nouns;
Add uncountable nouns;
Add proper names;
Add basic missing words, which users must suggest;
Add un- and non- words;
Add derivatives of words;
Remove Americanisms;
Search for incorrect hyphenated words and remove them;
Remove (extract) all prefixes from the .dic file to make it simpler to find/export words (PTG enhancement required);
Remove duplicates (PTG enhancement required);
Add flags to .dic words to make it easier to merge words (PTG enhancement required);
Create GitHub folders for -ize and -ise Hunspell files, updated twice a year (PTG enhancement required).
I hope that people will enjoy my work and that it may be
useful to the progress of humankind.
Kind regards from:
30.Jun.2016
Marco A.G.Pinto
Master of Science in Information Warfare/Competitive
Intelligence.
Open-Source Developer.
Last update:
11.Sep.2024