Spell checker dictionary for a game

I’m making the board game Boggle and have finished almost all of the code. The one thing I am lacking is a dictionary to check whether a string is actually a word. Basically, I want to write a method that, given a string, will iterate through a text file and see if the string is there. I have tried spell checkers like Jazzy, and they are not what I want. If anyone could help me with this, that would be awesome.


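import java.io.*;
import java.nio.file.*;

public boolean isWord(String word) throws IOException
{
	//JAVA >= 7
	BufferedReader reader = new BufferedReader(new InputStreamReader(Files.newInputStream(Paths.get("path/to/file.txt"), StandardOpenOption.READ)));

	//JAVA <= 6 (use this line instead of the one above)
	//BufferedReader reader = new BufferedReader(new InputStreamReader(new FileInputStream("path/to/file.txt")));

	try
	{
		String line;

		//Scan the file, one word per line, until a match is found
		while((line = reader.readLine()) != null)
		{
			//Compare contents, ignoring case and stray whitespace
			if(word.equalsIgnoreCase(line.trim()))
				return true;
		}
		return false;
	}
	finally
	{
		//Always release the file handle
		reader.close();
	}
}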

EDIT:

Changed the method. Before, it was loading all the words into an ArrayList for storage in RAM instead of just checking the file for the word.

It seems inefficient to open a huge dictionary file and traverse every character of it, first looking for line breaks and then again to see whether the word matches, every time you check a word. Also, don’t use ‘==’ to compare strings, because it compares references rather than contents; use equals() or, for this purpose, equalsIgnoreCase().

Using a hash map is probably better; word lists are not incredibly massive in file size anyway, and since you’re working on a word game you’ll probably need to spell-check a lot.
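A minimal sketch of the in-memory approach, assuming a plain text file with one word per line. It uses a HashSet (effectively a hash map with keys only) rather than a HashMap, since there is no value to store per word. The class name WordSet and the load method are made up for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper: holds the whole word list in memory for fast lookups.
class WordSet {
    private final Set<String> words = new HashSet<String>();

    // Builds the set from any sequence of lines, one word per line.
    WordSet(Iterable<String> lines) {
        for (String line : lines) {
            String word = line.trim().toLowerCase(Locale.ROOT);
            if (!word.isEmpty()) {
                words.add(word);
            }
        }
    }

    // Reads the file once at startup (assumes UTF-8; Java 7 or later).
    static WordSet load(String path) throws IOException {
        return new WordSet(Files.readAllLines(Paths.get(path), StandardCharsets.UTF_8));
    }

    // Case-insensitive lookup; no file access is needed here.
    boolean isWord(String candidate) {
        return words.contains(candidate.trim().toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        WordSet set = new WordSet(Arrays.asList("cat", "dog", "Boggle"));
        System.out.println(set.isWord("DOG")); // prints true
        System.out.println(set.isWord("xyz")); // prints false
    }
}
```

In the game you would call WordSet.load("path/to/file.txt") once and keep the result around, so each check is a single hash lookup.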

Also check out this for ideas on “corrections”:
http://norvig.com/spell-correct.html

Yes, but think of the memory it would take up if you loaded ALL the words into RAM. :o
I just gave the code he asked for. He can decide what to do in the end. :point:
(if that comment was for the OP, and not me, never mind :))

@OP:

I recommend you do some culling on the words, because in Boggle not all words are possible. Take out all the ones that aren’t necessary and see how many are left.
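One possible way to do the culling, as a rough sketch. It assumes a standard 4x4 board and a minimum word length of 3, so it keeps words of 3 to 16 letters made only of a-z; it ignores the two-letter "Qu" cube, and both limits are parameters you can change. WordCuller and cull are made-up names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: drops words that can never appear on the board.
class WordCuller {
    // Keeps words with minLength..maxLength letters that contain only a-z.
    static List<String> cull(List<String> words, int minLength, int maxLength) {
        List<String> kept = new ArrayList<String>();
        for (String raw : words) {
            String word = raw.trim().toLowerCase(Locale.ROOT);
            boolean lengthOk = word.length() >= minLength && word.length() <= maxLength;
            if (lengthOk && word.matches("[a-z]+")) {
                kept.add(word);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> all = Arrays.asList(
                "at", "cat", "don't", "Boggle", "antidisestablishmentarianism");
        // Assumed limits: 3 letters minimum, 16 cells on a 4x4 board.
        System.out.println(cull(all, 3, 16)); // prints [cat, boggle]
    }
}
```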

Whether you load them into RAM or go through a file, is up to you.
If you hit performance issues, rethink your design.

Thanks for the suggestions, but I was actually looking for a text file or some such. I guess I should have been clearer. I just don’t want to have to rewrite the entire English language.

I compiled a dictionary file a while back…enjoy!

ra4king thank you so much! this is perfect! ;D

Coincidentally, I recently thought about implementing Boggle or something similar just for the fun of it. It looks like there are two kinds of structures: those designed for spell checking, which mainly try to be compact, and those used for solving word game puzzles, which try to enable fast search operations other than binary “is it in the list?” questions. Word lists are the simplest and most obvious answer, though.

Though it didn’t come up and isn’t what I was looking for, here are some simple improvements you could make. 1) Compress the file and read it using GZIPInputStream. (Probably won’t save nearly as much space as you could with complex structures, but it’s only one extra line.) 2) Create an index file with fewer words so that you can seek somewhere in the middle of a file and find the result sooner. (If it’s not all in memory where you can use a binary search.) This is like the top of the page in a dictionary where it shows the first and last word on each page or the tabs with letter labels on them for large dictionaries.
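To illustrate the first suggestion, here is a small sketch of reading a gzip-compressed word list. The only change from reading a plain file is wrapping the input in a GZIPInputStream. The example compresses to a byte array in memory so it runs without any files; in the game you would pass a FileInputStream for the .gz file instead. GzipWordList, compress and read are made-up names, and UncheckedIOException assumes Java 8 or later.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper: round-trips a word list through gzip.
class GzipWordList {
    // Writes one word per line into gzip-compressed bytes (stands in for a .gz file).
    static byte[] compress(List<String> words) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer out = new OutputStreamWriter(
                new GZIPOutputStream(bytes), StandardCharsets.UTF_8)) {
            for (String word : words) {
                out.write(word);
                out.write('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reads the words back; GZIPInputStream is the one extra wrapper.
    static List<String> read(InputStream compressed) {
        List<String> words = new ArrayList<String>();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(compressed), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                words.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return words;
    }

    public static void main(String[] args) {
        byte[] gz = compress(Arrays.asList("cat", "dog"));
        System.out.println(read(new ByteArrayInputStream(gz))); // prints [cat, dog]
    }
}
```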


Also, two possible vocabulary clarifications for everyone. You should say “word list” and not “dictionary”. Word lists are simply files that contain one word per line. Dictionaries tend to refer to things that include extra data besides the words themselves or files that have words with the same root grouped together. I guess that’s why some people call maps dictionaries, though word lists don’t have a Java analog.

And avoid “spell check” if you’re just looking for whether a word exists. Spell check implies spelling corrections. :expressionless: Although I don’t know what you should use instead. “Word check” maybe?

‘computer freezes while attempting to scroll down’
:emo: :emo:

I had the same problem for a while; it depends on what you’re using. Notepad, Word, and other text editors should work fine. Eclipse or anything like that won’t be able to handle it. Try dividing up the data as well, for example by putting certain letters in one file.

More details? Give us a reason to believe it’s a good list!

Word lists with a bit of background and a licence which clearly allows their use include EOWL and 12dict. If you have good enough lawyers to take on Hasbro, you could also try to get hold of a SOWPODS or TWL file.

I don’t know why some people think that a word list is too big to hold in RAM. We’re talking 1 to 3 megabytes here, which is less than the memory required for a back buffer for a 1024x768 image at 32-bit depth. I hold that the most appropriate data structure for checking whether something is a word is a trie. You might want to generate it as a pre-processing step and serialise it to a file.
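For anyone who hasn’t met a trie before, here is a bare-bones sketch, not a tuned implementation. Each node maps a letter to a child node, and a flag marks where a word ends. The prefix check is the part that a plain word list or hash set does not give you, and it is handy when searching a Boggle board because you can abandon a path as soon as no word starts with it. WordTrie and its method names are made up; a HashMap per node is the simplest choice, not the most compact one.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical minimal trie over lower-cased words.
class WordTrie {
    private static final class Node {
        final Map<Character, Node> children = new HashMap<Character, Node>();
        boolean endOfWord;
    }

    private final Node root = new Node();

    // Inserts a word, creating nodes along the path as needed.
    void add(String word) {
        Node node = root;
        for (char c : word.toLowerCase(Locale.ROOT).toCharArray()) {
            Node next = node.children.get(c);
            if (next == null) {
                next = new Node();
                node.children.put(c, next);
            }
            node = next;
        }
        node.endOfWord = true;
    }

    // Follows the letters of s; returns null if the path does not exist.
    private Node find(String s) {
        Node node = root;
        for (char c : s.toLowerCase(Locale.ROOT).toCharArray()) {
            node = node.children.get(c);
            if (node == null) {
                return null;
            }
        }
        return node;
    }

    // True only if s was added as a complete word.
    boolean isWord(String s) {
        Node node = find(s);
        return node != null && node.endOfWord;
    }

    // True if at least one added word starts with s.
    boolean isPrefix(String s) {
        return find(s) != null;
    }

    public static void main(String[] args) {
        WordTrie trie = new WordTrie();
        trie.add("cat");
        trie.add("cart");
        System.out.println(trie.isWord("cat"));    // prints true
        System.out.println(trie.isWord("ca"));     // prints false
        System.out.println(trie.isPrefix("ca"));   // prints true
        System.out.println(trie.isPrefix("dog"));  // prints false
    }
}
```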

Ah yeah that text file is a compilation of 12dict and others (I noticed some words missing from 12dict). I then wrote a program that removed duplicates and sorted them, so don’t worry :stuck_out_tongue: