British Dictionary (en_GB) — Forked by Marco A.G.Pinto
 


Welcome to the forked version of the British speller.

Since no one was updating the speller (a free project that demands a lot of effort), I took on the task myself in 2013, over a decade ago.

I grabbed the original project and started adding/removing/fixing words. I kept the original authors' credits and added my name.

I have created the best up-to-date British speller. It encompasses several fields of knowledge, from simple to complex words.

Furthermore, it is suitable as a basis for Commonwealth and European English.

Whatever your race, religion, gender, age or academic background, everyone should have free and equal access to all words.

I am improving the speller as much as I can by testing it in the field:
Most of my e-mails are in English, so I see which words are flagged as typos and check whether they really are typos or missing words.

I have also pasted webpages from newspapers, TV channels and such to see which words are flagged.

To make sure the words I add are the correct ones, I look for them in credible sources:
 1) Oxford Dictionaries;
 2) Collins Dictionary;
 3) Cambridge Dictionary;
 4) Merriam-Webster Dictionary (used with caution);
 5) Wiktionary (used with caution);
 6) Wikipedia (used with caution);
 7) Physical dictionaries.

In January 2015, I purchased an “Oxford Gold Account” to have fuller access to Oxford Dictionaries.

I am also involved in several projects with specific jargon, so I have added some “special” words.

I have been told to use scripts to update the dictionary, but I am adding the words with copy/paste after checking them in the dictionaries mentioned above. This is slower and harder, but the results are much better and more accurate.

Some words are chiefly American, and I will only add them if there is no British equivalent.

In July 2019, Стоян e-mailed me saying that I had added many random words, making the dictionary a lengthy lexicon instead of a spellchecker, and that I needed a big corpus with the top 15–30% of the most frequent words from different areas of science, newspapers, fiction, poetry, Wikipedia articles, texts from Project Gutenberg and more books or websites, etc.

Thus, I have been focusing on adding plurals and possessives to the wordlist and also cleaning the .dic file by removing duplicates and merging words using affix flags. I have been checking important Wikipedia articles to find missing words for specific subjects, so that most of the terms needed to write an essay on a subject are available.
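
As an illustration of the merging step, here is a minimal Python sketch. It is not taken from Proofing Tool GUI. It assumes each .dic line has the form stem/FLAGS with single-character flags, that there are no morphological fields or escaped slashes, and that the caller has already skipped the word-count line at the top of the file. The function name merge_dic_entries is hypothetical.

```python
def merge_dic_entries(lines):
    """Merge entries that share a stem by taking the union of their flags.

    Assumptions (not from the original project): single-character flags,
    no morphological fields, and no word-count header line in `lines`.
    """
    merged = {}  # stem -> list of flags, in order of first appearance
    for line in lines:
        entry = line.strip()
        if not entry:
            continue
        stem, _, flags = entry.partition("/")
        seen = merged.setdefault(stem, [])
        for flag in flags:
            if flag not in seen:
                seen.append(flag)
    # Rebuild the entries; stems without flags are written without a slash.
    return [
        f"{stem}/{''.join(flags)}" if flags else stem
        for stem, flags in merged.items()
    ]


if __name__ == "__main__":
    print(merge_dic_entries(["cat/S", "dog", "cat/M", "cat/S"]))
```

A union like this is only safe when the combined flags do not generate unwanted forms, so the output would still need checking by hand.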

I pioneered the concept of adding possessives to words and listing them in the release notes.


Some people complained that I add “all words under the sun”. If you find any obsolete words or archaic words that are close to current replacements, please report them. I have been adding derivatives of words to ensure that words like “biblically” aren't missing (see Bugzilla ticket in LibreOffice:
https://bugs.documentfoundation.org/show_bug.cgi?id=154826).

Adding as many words as possible is useful because it is better to accept valid words, even if they are sometimes confused with others, than to risk typos and uncertainty about the correct spelling. Would it be better to see most words marked in red?


Status of the British Dictionary V3.3.3:

The statistics for the British Dictionary V3.3.3 (Proofing Tool GUI 4.0 build 300 WIP), released on 1.Jun.2024.

Please note that there are thousands of duplicates in the wordlist: some entries in the .dic can't be merged because their flags contain both SFX and PFX.

Those PFXs make it harder to find out whether words are already in the .dic and harder to merge flags, and they also disrupt the order in which the wordlist is extracted on release day.

This and other things such as duplicated flags cause duplicates.

In future versions of Proofing Tool GUI, I will code features to mitigate these issues.
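
One way such a feature could start is by listing the entries that carry both kinds of flag. The Python sketch below is only an illustration under stated assumptions: it relies on the Hunspell convention that every PFX or SFX line in the .aff file has the flag as its second field, and it assumes single-character flags in the .dic. The function names are hypothetical and nothing here comes from Proofing Tool GUI.

```python
def affix_flag_types(aff_lines):
    """Return (prefix_flags, suffix_flags) as declared in .aff lines."""
    prefixes, suffixes = set(), set()
    for line in aff_lines:
        parts = line.split()
        if len(parts) < 2:
            continue
        if parts[0] == "PFX":
            prefixes.add(parts[1])
        elif parts[0] == "SFX":
            suffixes.add(parts[1])
    return prefixes, suffixes


def entries_with_both(dic_lines, prefixes, suffixes):
    """List .dic entries whose flags include a prefix AND a suffix flag.

    Assumes single-character flags (an assumption, not a project fact).
    """
    result = []
    for line in dic_lines:
        entry = line.strip()
        _stem, sep, flags = entry.partition("/")
        if not sep:
            continue  # entry has no flags at all
        flag_set = set(flags)
        if flag_set & prefixes and flag_set & suffixes:
            result.append(entry)
    return result


if __name__ == "__main__":
    aff = ["PFX A Y 1", "PFX A 0 re .", "SFX S Y 1", "SFX S 0 s ."]
    pfx, sfx = affix_flag_types(aff)
    print(entries_with_both(["do/AS", "cat/S", "emu"], pfx, sfx))
```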


About ize/ise:
Just like in other languages, some words can be written differently. Since Oxford says some words are valid both ways, I kept both and users decide which they prefer. A good example is: “online” and “on-line”.
For ize/ise, both ways are accepted in some words:
 — optimize/optimise;
 — realize/realise.

Oxford Dictionaries only states that certain words accept both ize/ise to Premium account holders.
A regular user of the Oxford website won't see this, but I have access to it.
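
To see which words are present in both spellings, a plain wordlist can be scanned for -ise words whose -ize counterpart is also listed. This is a minimal Python sketch with a hypothetical function name. It only reports pairs that already exist in the list and never generates a spelling, because not every -ise word has a valid -ize form.

```python
def ise_ize_pairs(words):
    """Return (ize_form, ise_form) pairs where both spellings are listed."""
    wordset = set(words)
    pairs = []
    for word in sorted(wordset):
        if word.endswith("ise"):
            ize_form = word[:-3] + "ize"
            if ize_form in wordset:
                pairs.append((ize_form, word))
    return pairs


if __name__ == "__main__":
    sample = ["optimise", "optimize", "realise", "realize", "advertise", "size"]
    print(ise_ize_pairs(sample))
```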


Places from New Zealand/UK (England, Scotland, Wales & Northern Ireland):
In V2.61–2.64 I included tons of place names.
My scientist friend, Peter McGavin, told me that in NZ they use British English, so I decided to do something about it. I did the same for the UK. I searched Wikipedia for “towns”, “counties”, “villages”, “boroughs”, “suburbs”, etc. and based my additions on:
 — https://en.wikipedia.org/wiki/List_of_towns_in_England;
 — https://en.wikipedia.org/wiki/List_of_towns_in_New_Zealand;
 — https://en.wikipedia.org/wiki/List_of_civil_parishes_in_England;
 — https://en.wikipedia.org/wiki/List_of_civil_parishes_in_Scotland;
 — https://en.wikipedia.org/wiki/List_of_places_in_Scotland;
 — https://en.wikipedia.org/wiki/List_of_communities_in_Wales;
 — https://en.wikipedia.org/wiki/Local_government_in_Wales;
 — https://en.wikipedia.org/wiki/List_of_towns_and_villages_in_Northern_Ireland;
 — https://en.wikipedia.org/wiki/Counties_of_Northern_Ireland;
 — https://en.wikipedia.org/wiki/Category:Suburbs_in_New_Zealand;
 — https://en.wikipedia.org/wiki/List_of_Church_of_Scotland_parishes.

Furthermore, I added places sent to me by Peter C.:
© OpenStreetMap contributors:
www.openstreetmap.org/copyright.
© The Clergy of the Church of England Database Project, 2005.


Cities from Australia
In V2.65 I added the cities in Australia by population, since they are in valid English:
 — https://en.wikipedia.org/wiki/List_of_cities_in_Australia_by_population


Cities from US
In V2.65 I added tons of cities in the US with a 10 000+ population, since they are in valid English.
This list was supplied by Michael Holroyd on Kevin Atkinson's GitHub.


Cities from Canada
In V2.67 I added the cities in Canada, since they are in valid English:
 — https://en.wikipedia.org/wiki/List_of_cities_in_Canada


State and union territory capitals in India
In V2.90 I added this list to the dictionary, since they are in valid English:
 — https://en.wikipedia.org/wiki/List_of_state_and_union_territory_capitals_in_India


Common prescription and OTC drugs
In V2.63 I added tons of drug names supplied by Andrew Ziem on Kevin Atkinson's GitHub.
The generic drugs (such as “diphenhydramine”) are in lowercase, while the brand names (such as “Abilify”) are capitalised.


Words regarding COVID-19
In V2.83 I added tons of entries regarding the pandemic.


Main difficulties developing this dictionary:

 1) Proper names;
 2) Possessive forms;
 3) Plurals.

I have been checking word by word to spot errors and missing plurals/possessives.
It will take many years to have it ready.
Some words need rechecking since I can't find their plurals, or their entries in the .dic use both PFX and SFX, so I can't properly fix them.
I need to code a feature in my tool, Proofing Tool GUI, to extract PFX + SFX words.
I was checking the words from 'a' to 'z', but Peter (a friend who suggests words) told me to begin where the percentage of existing words is lowest.
To do it thoroughly, there is no other way: I must check word by word. It will take years, but it will be done.


Adding new words:
If you believe you have found a missing/incorrect word, please send it to me for analysis. If it is in the Oxford or Collins dictionaries, I will add it.


Removing US words:
If you find American words that both the Oxford and Collins dictionaries mark as such and that have a British equivalent, please send them to me for analysis and removal.
Note that the dictionary was originally based on the US one, which is where many US words came from.


Obscene/offensive words:
If you find any such words in the wordlist and they don't have the flag NOSUGGEST !, please report them to me.
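
For anyone who wants to check this before reporting, here is a minimal Python sketch. It assumes the .aff file declares the flag on a line such as "NOSUGGEST !", that the .dic uses single-character flags, and that you supply your own set of words to check. The function names and the sample words are hypothetical placeholders.

```python
def nosuggest_flag(aff_lines):
    """Return the flag declared by the NOSUGGEST line, or None if absent."""
    for line in aff_lines:
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "NOSUGGEST":
            return parts[1]
    return None


def missing_nosuggest(dic_lines, words_to_check, flag):
    """List stems from words_to_check whose .dic entry lacks the flag.

    Assumes single-character flags (an assumption, not a project fact).
    """
    missing = []
    for line in dic_lines:
        stem, _, flags = line.strip().partition("/")
        if stem in words_to_check and flag not in flags:
            missing.append(stem)
    return missing


if __name__ == "__main__":
    flag = nosuggest_flag(["SET UTF-8", "NOSUGGEST !"])
    # "darn" and "heck" stand in for real entries here.
    print(missing_nosuggest(["darn/S!", "heck/S", "cat/S"], {"darn", "heck"}, flag))
```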


Archaic words:

I will only add archaic words if they don't interfere with other words.
Note that in literary writing, some writers use archaic words.
Please report to me any archaic words that have very similar current words.


Obsolete words:
If you find any obsolete words, please report them to me.


Hyphenated words:
I have been avoiding adding words with hyphens. I check whether they can also be written as a single word, or whether the official dictionaries state that they have no hyphen at all, in which case I remove them from the .dic.
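
A first pass at finding candidates could be automated, with the final decision still made against the dictionaries. The Python sketch below uses a hypothetical function name and assumes a plain wordlist with one word per item. It lists hyphenated words whose closed form is also in the list.

```python
def hyphen_closed_pairs(words):
    """Return (hyphenated, closed) pairs where both forms are listed."""
    wordset = set(words)
    pairs = []
    for word in sorted(wordset):
        if "-" in word:
            closed = word.replace("-", "")
            if closed in wordset:
                pairs.append((word, closed))
    return pairs


if __name__ == "__main__":
    print(hyphen_closed_pairs(["on-line", "online", "well-being", "e-mail"]))
```

Each pair found this way is only a candidate for review: some words are valid in both forms, as with “online” and “on-line”.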



THE DICTIONARY PROJECT GOALS/TASKS FOR 2025+:
 Release Proofing Tool GUI 4.0;
 Rewrite the release notes in a simpler/cleaner way;
 Update both the GB dictionary and the ZA dictionary in parallel;
 Copy most proper names from the GB dictionary to the ZA dictionary (PTG enhancement required);
 Add plurals and possessives to most current nouns;
 Add uncountable nouns;
 Add proper names;
 Add basic missing words which users must suggest;
 Add un- and non- words;
 Add derivatives for words;
 Remove Americanisms;
 Search for incorrect hyphenated words and remove them;
 Remove (extract) all prefixes from .dic file to make it simpler to find/export words (PTG enhancement required);
 Remove duplicates (PTG enhancement required);
 Add flags in .dic words to make it easier to merge words (PTG enhancement required);
 Create GitHub folders for -ize and -ise Hunspell files, updated twice a year (PTG enhancement required).



I hope that people will enjoy my work and that it may be useful to the progress of humankind.

Kind regards from:
Marco in Spain drinking Coca-Cola (2016)
30.Jun.2016

Marco A.G.Pinto
Master of Science in Information Warfare/Competitive Intelligence.
Open-Source Developer.
 




Last update: 11.Sep.2024