Crawl and extract all articles from a domain using DevExtrac. Useful for analytics and web text mining. GPLv3.
Article Extractor (DevExtrac)
High-performance, intelligent text extraction from web pages, based on patterns we observed across thousands of pages. Useful for any purpose that requires clutter-free text. GPLv3.
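As a rough illustration of clutter-free extraction, here is a minimal sketch using only the Python standard library. It is not DevExtrac's actual method, which is not described here. The heuristics (skip navigation-like tags, keep blocks with enough words and few link characters) and every name and threshold below are assumptions chosen for illustration.

```python
from html.parser import HTMLParser

# Assumed tag sets; real pages may need a different selection.
SKIP_TAGS = {"script", "style", "nav", "header", "footer", "aside"}
BLOCK_TAGS = {"p", "div", "article", "section", "li", "h1", "h2", "h3"}


class TextBlocks(HTMLParser):
    """Collect text block by block, tracking how much of it sits inside links."""

    def __init__(self):
        super().__init__()
        self.blocks = []       # finished (text, link_chars) pairs
        self._buf = []         # text pieces of the block being read
        self._link_chars = 0   # characters of the current block inside <a>
        self._skip_depth = 0   # > 0 while inside a skipped tag
        self._in_link = 0      # > 0 while inside <a>

    def flush(self):
        # Close the current block, normalizing whitespace.
        text = " ".join("".join(self._buf).split())
        if text:
            self.blocks.append((text, self._link_chars))
        self._buf = []
        self._link_chars = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.flush()
        elif tag == "a":
            self._in_link += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag in BLOCK_TAGS:
            self.flush()
        elif tag == "a":
            self._in_link = max(0, self._in_link - 1)

    def handle_data(self, data):
        if self._skip_depth:
            return
        self._buf.append(data)
        if self._in_link:
            self._link_chars += len(data.strip())


def extract_text(html, min_words=8, max_link_ratio=0.3):
    """Keep blocks that are long enough and not dominated by link text."""
    parser = TextBlocks()
    parser.feed(html)
    parser.close()
    parser.flush()
    kept = [
        text
        for text, link_chars in parser.blocks
        if len(text.split()) >= min_words
        and link_chars / len(text) <= max_link_ratio
    ]
    return "\n\n".join(kept)
```

Calling `extract_text(page_html)` on a typical article page should return the body paragraphs while dropping menus, scripts, and "read more" link blocks. The thresholds would need tuning against real pages.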
News Categorization Corpus
Hand-picked keywords for 6 news categories. No training required: select the right category quickly using our proprietary news categorizer.
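To show how keyword-based categorization can work without training, here is a minimal sketch. The three categories and their keyword lists are hypothetical placeholders, not the hand-picked corpus, and the scoring rule (count keyword hits, pick the highest) is an assumption rather than the proprietary categorizer's logic.

```python
import re
from collections import Counter

# Hypothetical keyword lists for illustration only.
CATEGORY_KEYWORDS = {
    "sports": {"match", "team", "goal", "tournament", "coach"},
    "business": {"market", "shares", "profit", "investor", "revenue"},
    "technology": {"software", "startup", "smartphone", "internet", "app"},
}


def categorize(text, keywords=CATEGORY_KEYWORDS):
    """Return the category with the most keyword hits, or None if there are none."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    for token in tokens:
        for category, words in keywords.items():
            if token in words:
                counts[category] += 1
    if not counts:
        return None
    return counts.most_common(1)[0][0]
```

For example, `categorize("The team scored a late goal to win the match")` picks `"sports"` because three of its keywords appear. Because the lookup is a set membership test per token, no model has to be trained or loaded.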
We have developed a next-generation clustering method that uses an index-based approach. Lightning-fast, percent-wise clustering.
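One way to read "index-based" and "percent-wise" is sketched below: an inverted index finds candidate clusters without comparing every pair of documents, and a document joins a cluster only when its term overlap reaches a percentage threshold. This interpretation, the Jaccard-style overlap measure, and all names and defaults are assumptions, not a description of the actual product.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and return its set of alphabetic terms."""
    return set(re.findall(r"[a-z]+", text.lower()))


def cluster(docs, min_percent=50.0):
    """Greedy clustering: each document joins the best-matching earlier
    cluster seed if their term overlap is at least min_percent, otherwise
    it becomes a new seed."""
    index = defaultdict(set)  # term -> ids of seed documents containing it
    seed_terms = {}           # seed id -> its term set
    clusters = {}             # seed id -> member document ids

    for doc_id, text in enumerate(docs):
        terms = tokenize(text)

        # Count shared terms per seed through the index.
        shared = defaultdict(int)
        for term in terms:
            for seed in index[term]:
                shared[seed] += 1

        best, best_pct = None, 0.0
        for seed, n_shared in shared.items():
            union = len(terms | seed_terms[seed])
            pct = 100.0 * n_shared / union
            if pct > best_pct:
                best, best_pct = seed, pct

        if best is not None and best_pct >= min_percent:
            clusters[best].append(doc_id)
        else:
            seed_terms[doc_id] = terms
            clusters[doc_id] = [doc_id]
            for term in terms:
                index[term].add(doc_id)

    return list(clusters.values())
```

With `["stock market falls sharply", "stock market falls again", "team wins the final match"]`, the first two documents share 3 of 5 distinct terms (60%) and land in one cluster, while the third starts its own. Raising `min_percent` makes clusters tighter.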
Hire us to build a solution for your idea. The idea must appeal to us.
Tools We Work On
Python, Django, PHP, CodeIgniter, Java, Play Framework, WP, Drupal.