apt-fast and Axel: Roughly 26x Faster apt-get Installations and Upgrades

apt-fast is a small shell script I wrote that speeds up apt-get many times over. You need to have the axel download accelerator installed, which is a short, simple process; everything else is extremely straightforward. I started out downloading the upgrades for Kubuntu at 32kb/s. Not terrible, but not …

boilerpipe – Project Hosting on Google Code

The boilerpipe library provides algorithms to detect and remove the surplus “clutter” (boilerplate, templates) around the main textual content of a web page. The library already provides specific strategies for common tasks (for example: news article extraction) and may also be easily extended for individual problem settings. Extracting content is very fast (milliseconds), just needs …

Goliath: Non-blocking, Ruby 1.9 Web Server – igvita.com

There are easily half a dozen factors you need to consider when picking an app server: the choice of the VM, implementation model, performance and memory usage, driver and library availability, community support, and so forth. In other words, it is a complex set of requirements, and no one solution is likely to meet …

Scrappy, Simple Stupid Spidering

Scrappy is an easy (and hopefully fun) way of scraping, spidering, and/or harvesting information from web pages. Internally, Scrappy uses the awesome Web::Scraper and WWW::Mechanize modules, and as such inherits their awesomeness. Scrappy is inspired by the fun and easy-to-use Dancer API. Beyond being a pretty API for WWW::Mechanize::Plugin::Web::Scraper, Scrappy also has its own …