Package: boilerpipe
Package details
Summary: Boilerplate Removal and Fulltext Extraction from HTML pages
Description:
The boilerpipe library provides algorithms to detect and
remove the surplus "clutter" (boilerplate, templates)
around the main textual content of a web page.
The library already provides specific strategies
for common tasks (for example, news article extraction) and
can easily be extended for individual problem settings.
Extraction is fast (on the order of milliseconds), needs only the
input document (no global or site-level information) and
is usually quite accurate.
URL: https://github.com/kohlschutter/boilerpipe
License: ASL 2.0
Maintainer: neoclust
List of RPMs
- boilerpipe-1.2.0-11.mga7.src.rpm (Mageia 7, i586 media, core-release)