List of all words in the English language

Hey gang, let's see if anyone can suggest a resource for this problem. I am trying to locate and store in a DB all of the words in the English language. I have dug around MS Word for its word database, but it is harder to get at than I thought. I know that there are fewer than 100,000 words in the English language. Can anyone suggest a resource for this information?
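For the "store it in a DB" half of the question, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for whatever database you actually use. The table name `words` and the function `store_words` are hypothetical, and it assumes you already have the words as an iterable of strings. The PRIMARY KEY plus INSERT OR IGNORE keeps duplicates out, so you can load several word lists into the same table.

```python
import sqlite3

def store_words(conn, words):
    """Insert words into a hypothetical 'words' table and return the row count."""
    conn.execute("CREATE TABLE IF NOT EXISTS words (word TEXT PRIMARY KEY)")
    # INSERT OR IGNORE skips any word that is already present (PRIMARY KEY clash).
    conn.executemany(
        "INSERT OR IGNORE INTO words (word) VALUES (?)",
        ((w,) for w in words),
    )
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM words").fetchone()[0]

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # in-memory DB just for the demo
    print(store_words(conn, ["apple", "banana", "apple"]))  # prints 2
```

Note that "Apple" and "apple" would be stored as two rows; normalize case before inserting if that matters for your use.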

Hi
I don't know where to get all the words in the English language, but I want to point out that there are many more than 100,000 words: somewhere between 500,000 and 1 million, depending on how you count. See http://hypertextbook.com/facts/2001/JohnnyLing.shtml for more info.

// Gregof

Hey, I’m sure there are more than 100,000 words. I have a database that has 118,619 words in it (just a standard .mdb; my school gave it to me for a predictive text project, and it’s 5.26 MB).
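For a predictive-text project like that, one simple approach is to keep the words sorted and binary-search for the typed prefix, since every word starting with a given prefix sits in one contiguous run of a sorted list. This is a sketch only: the function name `complete` is hypothetical, and it returns matches in alphabetical order, whereas real predictive text usually ranks candidates by how often they are used.

```python
import bisect

def complete(sorted_words, prefix, limit=10):
    """Return up to `limit` words from an alphabetically sorted list that start with `prefix`."""
    # First position whose word is >= prefix; all matches follow contiguously from here.
    i = bisect.bisect_left(sorted_words, prefix)
    out = []
    while i < len(sorted_words) and len(out) < limit and sorted_words[i].startswith(prefix):
        out.append(sorted_words[i])
        i += 1
    return out
```

For example, with `words = sorted(["car", "card", "care", "cat", "dog"])`, `complete(words, "car")` gives `["car", "card", "care"]`.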

James.

I found a 5 meg one in /usr/dict/ on Fedora Core 3 :) I think that’s where it was…

I’m getting a bit nervous. First he wanted all the colors in a database, now all the words. And the Soviet references in his username. Is another revolution on its way?

shmoove

As Malohkan mentions above, most UNIX boxes will have a database of words at one of the following locations:

/usr/dict/words
/usr/share/dict/words

They usually aren’t all that comprehensive, though: the machines I have here contain databases of 25K to 45K words, but no more. They aren’t definitive lists of every word form, and will often be missing different tenses or derivations of words, leaving it to the user to be sensible about things.
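A small Python sketch that checks the two locations above and loads whichever one exists. The function name `load_word_list` is hypothetical, and file contents vary by system (some lists may include proper nouns or possessive forms), so the lowercasing and de-duplication here are one reasonable choice rather than the right one.

```python
from pathlib import Path

# The two usual locations; which one exists (if either) depends on the system.
CANDIDATE_PATHS = ("/usr/share/dict/words", "/usr/dict/words")

def load_word_list(paths=CANDIDATE_PATHS):
    """Return a sorted list of unique lowercase words from the first file that exists."""
    for p in paths:
        path = Path(p)
        if path.is_file():
            words = set()
            # errors="replace" keeps going if the file has stray non-UTF-8 bytes.
            with path.open(encoding="utf-8", errors="replace") as f:
                for line in f:
                    w = line.strip().lower()
                    if w:  # skip blank lines
                        words.add(w)
            return sorted(words)
    raise FileNotFoundError("no word list found in: " + ", ".join(paths))
```

Lowercasing merges entries like "Turkey" and "turkey" into one; drop the `.lower()` if you want to keep proper nouns distinct.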

You might want to reconsider your requirement to have every English word in your database. Maybe fifty thousand words is good enough?

The language is not static, and not even agreed upon - different people have different opinions on whether something is a valid word or not. Many people treat the Oxford English Dictionary as law, some consider it an archaic publication that isn’t moving with the times fast enough.

[quote]The language is not static, and not even agreed upon - different people have different opinions on whether something is a valid word or not. Many people treat the Oxford English Dictionary as law, some consider it an archaic publication that isn’t moving with the times fast enough.
[/quote]
Generally the Oxford dictionary is considered the de facto reference for British English, while Webster’s Dictionary is considered the de facto source for American English. Both have been known to do some pretty wacky stuff, such as Webster’s adding “ain’t” and Oxford adding “d’oh!”. :)

FWIW, 10,000-50,000 words is probably more than enough to cover the majority of written and spoken English. Building an auto-complete function that handles more words than that would be difficult at best. Of course, if you REALLY wanted to capture the sum of the English language, you could always build a web crawler that reads and indexes every word it comes across. Once a word crosses a certain threshold (e.g. 10,000 occurrences), you add it to the database as a “real” word. Be warned, however! While Google has successfully used this method to suggest better-spelled matches, it may result in pseudo-words such as “LOL” ending up in your database. It may also let in common misspellings such as “missle” instead of “missile”.
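The count-then-threshold idea can be sketched as below. Fetching pages is left out entirely: `count_words` and `accepted_words` are hypothetical names, they assume you already have page text as strings, and the tokenizer is a crude regex that only handles ASCII letters and straight apostrophes. It will happily accept "lol" once it is frequent enough, which is exactly the pitfall described above.

```python
import re
from collections import Counter

# Crude tokenizer: runs of a-z, optionally joined by straight apostrophes ("don't").
TOKEN_RE = re.compile(r"[a-z]+(?:'[a-z]+)*")

def count_words(documents):
    """Count lowercase word occurrences across an iterable of text strings."""
    counts = Counter()
    for text in documents:
        counts.update(TOKEN_RE.findall(text.lower()))
    return counts

def accepted_words(counts, threshold=10_000):
    """Treat words seen at least `threshold` times as 'real' words."""
    return sorted(w for w, n in counts.items() if n >= threshold)
```

With a tiny corpus like `["LOL that's a missle", "lol the missile"]` and `threshold=2`, only "lol" gets accepted.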

The Linux one is good enough to cover Word Whomp and Text Twist… ;D