CRUCIVERB.COM

Constructing => General Discussion => Topic started by: zvit on October 31, 2021, 07:07:29 AM

Title: How many words should be in a word list?
Post by: zvit on October 31, 2021, 07:07:29 AM
I tried to add two dictionaries in addition to the "default" one in Crossfire, and it wouldn't accept the third one, saying it was out of memory. It's not my computer's memory, because I have 128GB of RAM and most of it is free when I get this error. I'm guessing it's a limitation of the software's database, but I don't understand why.

The Crossfire default dictionary, together with my two dictionaries combined, has 979,211 words.

This got me thinking: I see that the more words you let the software work with, the more autofill options it finds. With too small a word list, the software will say "unfillable grid."

But if 979,211 words produce a memory error, I would like to know: how many words do you guys typically let the software work with?

Title: Re: How many words should be in a word list?
Post by: mmcbs on October 31, 2021, 07:29:35 AM
I'm not a Crossfire user, but 900K+ words is larger than any viable word list I've heard of. I suspect you have some duplicates in there, perhaps phrases spaced differently such as HELP OUT and HELPOUT, or some that are the same but with one in lower case and the other in upper case, or some punctuated and others not. Crossword Compiler has a utility to merge these, but I don't know about Crossfire.
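The kind of merge described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `normalize` and `merge_duplicates` are made up here, the sample entries are invented, and it is not how Crossword Compiler or Crossfire does it. It also assumes entries use plain A-Z letters, so digits and accented letters are dropped.

```python
import re


def normalize(entry: str) -> str:
    """Reduce an entry to the upper-case A-Z letters that would go in a grid.

    Assumption: spaces, punctuation, digits, and accented letters are dropped.
    """
    return re.sub(r"[^A-Z]", "", entry.upper())


def merge_duplicates(entries):
    """Return the unique normalized entries, keeping first-seen order."""
    seen = set()
    unique = []
    for entry in entries:
        key = normalize(entry)
        if key and key not in seen:  # skip blanks and repeats
            seen.add(key)
            unique.append(key)
    return unique


# Hypothetical sample: three spellings of one phrase, two of one word.
sample = ["HELP OUT", "helpout", "Help-out", "ASPIRIN", "aspirin", "E.M.T."]
merged = merge_duplicates(sample)
print(len(sample), "entries ->", len(merged), "unique")
```

Comparing the entry count before and after a pass like this is a quick way to see whether a very large list is inflated by spacing, case, or punctuation variants.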
Title: Re: How many words should be in a word list?
Post by: zvit on October 31, 2021, 07:53:19 AM
Thanks, but nope, I don't have duplicates like the ones you mentioned. I use Excel to delete duplicates and to check that my list only has single-word entries. (screenshot below)

Being new to compiling, I noticed that the more words I had, the more autofill options I got, and, thinking that professionals probably use millions of words, I just kept adding more.

So my question is, what IS a normal word list size? <=10K? <=100K?

And WHY wouldn't you use a million words if you get more autofill results? I know professionals don't autofill every puzzle, but the autofill gives you a good base to start with and good ideas.

(https://bit.ly/3nNdbVM)


Title: Re: How many words should be in a word list?
Post by: mmcbs on October 31, 2021, 08:08:05 AM
Well, of the words in the screenshot you provided, about half would never be used in a mainstream crossword, and unfortunately this is typical of the "big" word lists. They're just full of technical or specialized words/phrases that are not known by most people. They are chock full of abbreviations that are not familiar. They have many offensive, inappropriate, or outdated words/phrases. When it comes to word lists, bigger is not better. My "default" word list is about 400,000 entries, and probably less than half of it is usable in a typical crossword. To create a grid that will be enjoyable for your solvers (or acceptable to an editor), you have to be very selective about what words you use, and this is why professionals never use autofill unless they have a 100% curated word list. I actually have one of these that I call Basic English, which has under 40,000 entries.
Title: Re: How many words should be in a word list?
Post by: zvit on October 31, 2021, 08:34:21 AM
Thanks. By the way, I am not creating a mainstream crossword. It's for the medical community (EMS, doctors, nurses, paramedics, etc.), which is why my list is unique.
Title: Re: How many words should be in a word list?
Post by: asternknit on October 12, 2023, 12:38:41 AM
Big word lists simply contain a lot of unfamiliar technical or specialist terminology and phrases. They are jam-packed with unfamiliar abbreviations. They include a lot of language that is offensive, improper, or out of date. Bigger is not always better when it comes to word lists.