Package : perl-Search-Tokenizer

Package details

Summary: Decompose a string into tokens (words)

Description:
This module builds an iterator function that will progressively extract
terms from a given input string. Terms are defined by a regular expression
(for example '\w+'). Term matching relies on Perl's built-in "global match"
operator (the 'g' flag), and is therefore quite efficient.

Before being returned to the caller, terms may be filtered by an auxiliary
function, for performing tasks such as stemming or stopword elimination.

A tokenizer returned from the "new" method is a code reference, _not_ a
regular Perl object. To use the tokenizer, just call it with a string to
parse: this will return another code reference, which works as an
iterator. Each call to the iterator will return the next term from the
string, until the string is exhausted.
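
The two-level design described above (a tokenizer that is itself a
function, which returns an iterator function per input string) can be
illustrated with a short conceptual sketch. It is written in Python rather
than Perl and does not reproduce the module's actual API: the names
make_tokenizer, term_filter, drop_stopwords and STOPWORDS are hypothetical,
and returning None once the string is exhausted is an assumption made for
this illustration only.

```python
import re

def make_tokenizer(pattern=r"\w+", term_filter=None):
    """Build a tokenizer: a plain function, not an object (hypothetical name)."""
    regex = re.compile(pattern)

    def tokenizer(text):
        # finditer scans the string lazily and remembers its position,
        # loosely analogous to repeated global matching.
        matches = regex.finditer(text)

        def next_term():
            for m in matches:
                term = m.group(0)
                if term_filter is not None:
                    term = term_filter(term)
                    if not term:
                        continue  # filter rejected this term, try the next one
                return term
            return None  # assumption: None signals that the string is exhausted

        return next_term

    return tokenizer

STOPWORDS = {"the", "a", "of"}  # illustrative stopword list

def drop_stopwords(term):
    """Example filter: lowercase the term and discard stopwords."""
    term = term.lower()
    return None if term in STOPWORDS else term

tokenize = make_tokenizer(r"\w+", drop_stopwords)
next_term = tokenize("The quick brown fox")

terms = []
while (t := next_term()) is not None:
    terms.append(t)
```

Here the filter both normalizes terms (lowercasing) and removes them
(stopwords), which corresponds to the kind of auxiliary filtering the
description mentions. Because each call to tokenize creates a fresh
iterator, the same tokenizer can be reused on any number of strings.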


URL: http://metacpan.org/release/Search-Tokenizer
License: GPLv1+ or Artistic

Maintainer: nobody

List of RPMs