CRUCIVERB.COM

Topic: How big are your word lists?  (Read 9539 times)

Simeon

  • Guest
How big are your word lists?
« on: April 11, 2011, 12:58:49 PM »
Hello everyone. I'm still working on my crossword editor Palabra that I mentioned in an earlier post. Just for technical reasons I'd like to know how crossword authors handle their word lists.

In particular I'd like to get an idea of how word lists are created and used:

  • How many words are there on your word lists? Should I be thinking of 100,000 words or millions of words?
  • Do you usually work with a few word lists (such as a specialized word list with theme words) or just all the word lists that you've ever created?

Thanks in advance :)

Todd G

  • Jr. Member
  • Posts: 55
Re: How big are your word lists?
« Reply #1 on: April 17, 2011, 05:00:50 AM »
I believe the largest word lists have about 750K entries.  My own word list has fewer than 75K entries, and almost 65K of those came from the Cruciverb NYT list.  And I've been published.

This is the list I use for nearly all the puzzles I create. I have also created specialized lists for specific puzzles (which didn't get published, so take that with a grain of salt), and I have a theme list of my own 15-letter entries that I can use for themeless puzzles...though I'm guessing most constructors put theirs into their standard word list.

Of course, by all means, feel free to add as many fresh, colorful, commonly known entries to your list(s) as you can.  But you don't need as many as you might think.

jorkel

  • Full Member
  • Posts: 134
Re: How big are your word lists?
« Reply #2 on: April 17, 2011, 09:30:59 AM »
Todd is correct: you can make a decent puzzle with just 75K entries, and many lists run into the hundreds of thousands.  (And I'm convinced that if you have the Cruciverb wordlist, then you probably have access to 99% of the entries shorter than 7 letters that you'll ever need ... the longer, multi-word entries are the ones that you need to go out and locate from whatever source you can.)  A more important issue is how well ranked one's wordlist is, and a beginner can rerank enough words in his wordlist to produce publishable puzzles.  If you have the Professional version of Crossword Compiler, for instance, you can adjust the ranks of words that show up in the fill menus ... and that helps immensely.  Something to think about.

I have dozens of wordlists which function in different ways.  For instance, if you were going to make a puzzle in which none of the entries contain the letter E, then you would go to the Word List Manager (in CC) and figure out a way to produce such a wordlist.  (How else would one make such a puzzle?)  I also periodically make wordlists which contain entries of all the same rank or the same word length ... just so I can make visual inspections and adjust them accordingly.  (Think about it:  words only compete with other words which are the same length).  One could also make mini-wordlists that are all Roman numerals -- things like that -- so that their ranks could be adjusted quickly before adding them back to the main list.
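
For anyone working outside of CC, the kinds of sub-lists described above can be sketched in a few lines of Python. This is only an illustration: it assumes the wordlist is held as (entry, rank) pairs, and the function names and sample data are made up for the example rather than taken from any existing tool.

```python
from collections import defaultdict

# Hypothetical sample data: (entry, rank) pairs. Real lists are far larger.
WORDLIST = [("AREA", 60), ("OBOE", 55), ("MMXI", 20), ("TANGO", 70),
            ("STRUM", 65), ("XVII", 20), ("ONION", 60)]

def without_letter(wordlist, letter):
    """Keep only entries that do not contain the given letter."""
    letter = letter.upper()
    return [(w, r) for w, r in wordlist if letter not in w.upper()]

def group_by_length(wordlist):
    """Bucket entries by length, so same-length entries can be compared."""
    buckets = defaultdict(list)
    for w, r in wordlist:
        buckets[len(w)].append((w, r))
    return dict(buckets)

def is_roman_numeral_like(word):
    """Crude check: every letter is a Roman numeral symbol.

    This over-matches ordinary words such as MIX or CIVIL, so the
    resulting mini-list still needs a visual inspection.
    """
    return bool(word) and set(word.upper()) <= set("IVXLCDM")

no_e = without_letter(WORDLIST, "E")          # entries usable in an E-free puzzle
by_length = group_by_length(WORDLIST)         # {4: [...], 5: [...]}
romans = [(w, r) for w, r in WORDLIST if is_roman_numeral_like(w)]
```

Each result is an ordinary list, so it could be written out, inspected or reranked by hand, and merged back into the main list afterwards.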

One other thing I recommend:  have a separate wordlist which contains theme-specific entries, and use that wordlist to periodically purge such entries from your main wordlist.  Whenever you download an upgraded version of the Cruciverb wordlist, you tend to pick up more theme-specific words, so identify these and add them to your purge list.
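
The purge step amounts to a set difference. A minimal sketch in Python, assuming both lists are plain sequences of entries (the `purge` function and the sample entries are hypothetical, not a CC feature):

```python
def purge(main_list, purge_list):
    """Return main_list without any entry that appears in purge_list.

    The comparison ignores case and surrounding whitespace.
    The order of the remaining entries is preserved.
    """
    banned = {entry.strip().upper() for entry in purge_list}
    return [entry for entry in main_list if entry.strip().upper() not in banned]

# Hypothetical data for illustration only.
main_list = ["AREA", "OBOE", "PIRATEPUNS", "TANGO"]
purge_list = ["piratepuns", "KNIGHTMOVES"]

cleaned = purge(main_list, purge_list)
```

Rerunning the same purge after each wordlist upgrade keeps previously identified theme-specific entries from creeping back in.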

Regards,
Joe K.
« Last Edit: April 17, 2011, 09:47:42 AM by jorkel »

Simeon

  • Guest
Re: How big are your word lists?
« Reply #3 on: April 18, 2011, 04:33:58 PM »
Thank you both for your detailed replies. So far I have been testing my program with a wordlist of 73K words, so that should be a good start for a first release. If problems arise with a wordlist of 750K words, then I'll focus on that for a future release.

I don't have features such as ranking words or composing wordlists yet but it's only version 0.1.5 after all. The current features are:

- Create a crossword, edit the grid and add clues.
- Symmetry options when editing the grid.
- Export to PNG and PDF.
- Open and save puzzles in XPF file format (also used on xwordinfo.com).
- View a variety of puzzle properties (number of blocks, characters in use, etc.).
- Warnings for unchecked cells, consecutive unchecked cells and two-letter words.
- Modify the grid by adding/removing rows/columns and moving content up/down/left/right.
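
To give an idea of what the warnings involve, here is a simplified sketch in Python. It is not the actual Palabra code: the grid representation (a list of strings with `#` for blocks), the names `slots` and `grid_warnings`, and the rule that a cell counts as unchecked when it belongs to fewer than two words are all assumptions made for this example. Detecting runs of consecutive unchecked cells is left out.

```python
BLOCK = "#"

def slots(grid):
    """Yield each across and down slot as a list of (row, col) cells.

    A slot is a maximal run of two or more non-block cells in a row or column.
    """
    height, width = len(grid), len(grid[0])
    lines = [[(r, c) for c in range(width)] for r in range(height)]   # rows
    lines += [[(r, c) for r in range(height)] for c in range(width)]  # columns
    for line in lines:
        run = []
        for cell in line + [None]:  # None is a sentinel that flushes the last run
            if cell is not None and grid[cell[0]][cell[1]] != BLOCK:
                run.append(cell)
            else:
                if len(run) >= 2:
                    yield run
                run = []

def grid_warnings(grid):
    """Return (unchecked_cells, two_letter_slots) for the grid."""
    all_slots = list(slots(grid))
    count = {}  # number of slots each cell belongs to
    for slot in all_slots:
        for cell in slot:
            count[cell] = count.get(cell, 0) + 1
    unchecked = sorted(
        (r, c)
        for r, row in enumerate(grid)
        for c, ch in enumerate(row)
        if ch != BLOCK and count.get((r, c), 0) < 2
    )
    two_letter = [slot for slot in all_slots if len(slot) == 2]
    return unchecked, two_letter

# Hypothetical 3x4 grid: '.' is an empty cell, '#' is a block.
GRID = [
    "..#.",
    "....",
    ".#..",
]
unchecked, two_letter = grid_warnings(GRID)
```

In this grid the cells at (0, 3) and (2, 0) are crossed by a down word only, so they are reported as unchecked, and four slots have exactly two letters.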

That may not seem like much, but it will be much easier now to build new features on top of what I have. It's important to get the basics right and to make the program easy to use.

I'm aware that an important missing feature is automated or assisted grid filling. I've been working on that, but I'm not putting it in this release because it's simply not done yet. I'm currently wrapping things up for release, and then I intend to post it here to hopefully get some initial feedback.

Thanks again for your feedback and insight on how wordlists are handled by constructors.
« Last Edit: April 18, 2011, 04:44:44 PM by Simeon »

Simeon

  • Guest
Re: How big are your word lists?
« Reply #4 on: April 20, 2011, 05:01:42 AM »
I'm pleased to say that version 0.1.5 is now available for download at http://pypi.python.org/pypi/palabra. See the Readme or here ( https://bitbucket.org/svisser/palabra/wiki/Home ) for installation instructions. It should work best on Ubuntu 10.04 LTS 64-bit (my current system), but it should also work on other Ubuntu versions. If you have any problems installing or running it, please let me know. It currently uses the default wordlist of Ubuntu, but in the future you will also be able to configure your own wordlists.
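
For the curious, turning a system word list into crossword entries could look roughly like the sketch below. It is an illustration, not Palabra's actual loading code: the path `/usr/share/dict/words` is the usual location on Ubuntu but is an assumption here, and the cleaning rules (uppercase, A-Z only, at least three letters) and function names are choices made for the example.

```python
import os

SYSTEM_WORDLIST = "/usr/share/dict/words"  # usual location on Ubuntu; an assumption here

def clean_entries(lines, min_length=3):
    """Turn raw dictionary lines into unique uppercase A-Z entries, sorted."""
    entries = set()
    for line in lines:
        word = line.strip()
        # Skip possessives, accented words, and anything too short for a grid.
        if len(word) >= min_length and word.isascii() and word.isalpha():
            entries.add(word.upper())
    return sorted(entries)

def load_system_wordlist(path=SYSTEM_WORDLIST):
    """Load and clean the system word list, or return [] if it is not installed."""
    if not os.path.exists(path):
        return []
    with open(path, encoding="utf-8", errors="replace") as handle:
        return clean_entries(handle)

# Small in-memory example, so the sketch does not depend on the file existing.
sample = clean_entries(["apple\n", "Apple's\n", "étude\n", "ox\n", "Zebra\n", "apple\n"])
```

Here `sample` ends up as `["APPLE", "ZEBRA"]`: the possessive, the accented word, the two-letter word and the duplicate are all dropped.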

I don't have a version for Windows / Mac OS X yet but it should technically be possible. I'll let you know when there is news.

Simeon

  • Guest
Re: How big are your word lists?
« Reply #5 on: April 21, 2011, 06:06:08 AM »
If you have trouble installing, please visit https://bitbucket.org/svisser/palabra/wiki/Home . I have placed updated installation instructions there.

 

