
Introducing WordPos

Wordpos (github, MIT license) is a node.js library and npm module built on top of natural’s WordNet module. While natural.WordNet lets one lookup() a word in the WordNet database, I needed a simple way to pick all the adjectives from a small corpus.

WordNet data files are split into the four principal parts of speech: nouns, verbs, adverbs, and adjectives. The lookup() method naturally looks in all four files for the definition. If we’re only looking for adjectives, searching all four is inefficient.

Hence the wordpos module.  To use:

var WordPOS = require('wordpos'),
    wordpos = new WordPOS();

wordpos.getAdjectives('The angry bear chased the frightened little squirrel.', function(result){
    console.log(result);
});
// [ 'little', 'angry', 'frightened' ]

Similar functions exist for the other parts of speech: getNouns(), getVerbs(), and getAdverbs(). If you need all of them:

wordpos.getPOS('The angry bear chased the frightened little squirrel.', console.log)
// output:
  {
    nouns: [ 'bear', 'squirrel', 'little', 'chased' ],
    verbs: [ 'bear' ],
    adjectives: [ 'little', 'angry', 'frightened' ],
    adverbs: [ 'little' ],
    rest: [ 'the' ]
  }

Certain stopwords and duplicates are removed before the lookup. result.rest holds the words that were not found in the dictionary.
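To make that preprocessing step concrete, here is a minimal sketch of how stopwords and duplicates might be stripped before any lookup. It is not the code wordpos actually uses: the function name prepareWords, the stopword list, the tokenizing regex, and the lowercasing are all assumptions made for illustration.

```javascript
// Hypothetical stopword list for illustration only; wordpos uses its own.
const STOPWORDS = new Set(['a', 'an', 'and', 'of', 'to', 'is']);

// Split text into lowercase word tokens, drop stopwords, and remove
// duplicates while keeping first-seen order. (Assumed behavior, not
// taken from the wordpos source.)
function prepareWords(text) {
  const tokens = text.toLowerCase().match(/[a-z']+/g) || [];
  const seen = new Set();
  const words = [];
  for (const token of tokens) {
    if (STOPWORDS.has(token) || seen.has(token)) continue;
    seen.add(token);
    words.push(token);
  }
  return words;
}

console.log(prepareWords('A bear and a squirrel is a bear of a problem.'));
// [ 'bear', 'squirrel', 'problem' ]
```

Doing this once up front means each distinct word is looked up only a single time, however often it appears in the text.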

Underneath, the getX() functions use an efficient isX() check, which looks up the word in the index files but does not fetch the definition from the data files. The isX() functions are similarly exposed:

wordpos.isVerb('fish', console.log); // true
wordpos.isNoun('fish', console.log);  // true
wordpos.isAdjective('fishy', console.log); // true
wordpos.isAdverb('fishly', console.log);  // false
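The index-only idea can be sketched with plain in-memory sets. This is a simplified, synchronous illustration, not how wordpos is implemented: toyIndex, isPos, and getPos are hypothetical names, and the entries are made up rather than taken from WordNet.

```javascript
// Toy stand-ins for the per-part-of-speech WordNet index files.
// The entries are invented for illustration; they are not real WordNet data.
const toyIndex = {
  noun: new Set(['fish', 'bear', 'squirrel']),
  verb: new Set(['fish', 'bear', 'chase']),
  adjective: new Set(['fishy', 'angry', 'little']),
  adverb: new Set(['little', 'quickly'])
};

// Membership test against a single index: no definition is ever loaded.
function isPos(pos, word) {
  return toyIndex[pos].has(word.toLowerCase());
}

// Filter a word list down to one part of speech by reusing isPos().
function getPos(pos, words) {
  return words.filter(word => isPos(pos, word));
}

console.log(isPos('verb', 'fish'));     // true
console.log(isPos('adverb', 'fishly')); // false
console.log(getPos('adjective', ['angry', 'bear', 'little']));
// [ 'angry', 'little' ]
```

The point of the design is that answering "is this word an adjective?" only needs to know whether the word appears in the adjective index, so the much larger data files never have to be read.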

Note that all these functions are async. To install:

npm install wordpos

This will also install the WNdb module, which contains the WordNet files (about 10 MB compressed).
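Since every call is async and reports its result through a callback, it can be handy to wrap the functions in Promises. The sketch below assumes the callback shape shown in this post (a single result argument and no error argument). toPromise is a hypothetical helper, and fakeIsNoun is a stand-in so the example runs without the library installed.

```javascript
// Wrap a function whose last argument is a result-only callback
// (callback(result), with no error argument) so it returns a Promise.
function toPromise(fn, context) {
  return (...args) =>
    new Promise((resolve, reject) => {
      try {
        fn.call(context, ...args, resolve);
      } catch (err) {
        reject(err);
      }
    });
}

// Stand-in for an async checker such as wordpos.isNoun.
// Hypothetical, for demonstration only.
function fakeIsNoun(word, callback) {
  setTimeout(() => callback(word === 'fish'), 0);
}

const isNounAsync = toPromise(fakeIsNoun);
isNounAsync('fish').then(result => console.log(result)); // true
```

With the real module, something like toPromise(wordpos.isNoun, wordpos)('fish') should behave the same way, provided the method follows the callback signature used in the examples above.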

Written by Moos

May 18th, 2012 at 6:53 pm
