Jay Taylor's notes

JoBo

Original source (www.matuschek.net)
Tags: web-crawler archival www.matuschek.net
Clipped on: 2016-08-08

Purpose

JoBo is a simple program for downloading complete websites to your local computer. Internally it is basically a web spider. Its main advantage over other download tools is that it can automatically fill out forms (e.g. for automated login) and use cookies for session handling. Compared to other products the GUI looks very simple, but it is the internal features that matter. Do you know any other download tool that can log in to a web server and download content when that server uses web forms for login and cookies for session handling? JoBo also features very flexible rules to limit downloads by URL, size and/or MIME type.

For programmers it offers a very flexible object model and is easily extensible; expect new modules in the future. It is implemented in Java and the source code is available. If you want to implement your own web spider, the WebRobot class is a good starting point. Even if you want to use it not as a download tool but for indexing, link checking or something else, JoBo is a suitable tool. Retrieving documents and handling those documents are completely separated, so you can easily plug in your own module.
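
To illustrate what separating retrieval from handling can look like, here is a minimal sketch. It is not JoBo's actual object model: the names DocumentSource, DocumentHandler, SizeIndexer and MiniCrawler are hypothetical and exist only for this example, and the "web" is an in-memory map so the example runs without network access. It uses language features newer than Java 1.5 (diamond operator, method references).

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical names throughout; this is NOT JoBo's real API.

/** Drives the crawl: fetches each URL and passes the result to every handler. */
class MiniCrawler {
    private final DocumentSource source;
    private final List<DocumentHandler> handlers = new ArrayList<>();

    MiniCrawler(DocumentSource source) {
        this.source = source;
    }

    void addHandler(DocumentHandler handler) {
        handlers.add(handler);
    }

    void crawl(List<String> urls) {
        for (String url : urls) {
            byte[] content = source.fetch(url);
            if (content == null) {
                continue; // nothing retrieved, nothing to handle
            }
            for (DocumentHandler handler : handlers) {
                handler.handle(url, content);
            }
        }
    }

    public static void main(String[] args) {
        // In-memory stand-in for a web server (assumption for the demo).
        Map<String, byte[]> fakeWeb = new LinkedHashMap<>();
        fakeWeb.put("http://example.com/",
                "<html>hello</html>".getBytes(StandardCharsets.UTF_8));

        MiniCrawler crawler = new MiniCrawler(fakeWeb::get);
        SizeIndexer indexer = new SizeIndexer();
        crawler.addHandler(indexer);
        crawler.crawl(Arrays.asList("http://example.com/", "http://example.com/missing"));
        System.out.println(indexer.sizes);
    }
}

/** Retrieval side: knows how to fetch a URL, nothing about processing. */
interface DocumentSource {
    byte[] fetch(String url); // returns null if the document is unavailable
}

/** Handling side: receives each retrieved document. */
interface DocumentHandler {
    void handle(String url, byte[] content);
}

/** Example plug-in: records document sizes instead of saving files. */
class SizeIndexer implements DocumentHandler {
    final Map<String, Integer> sizes = new LinkedHashMap<>();

    @Override
    public void handle(String url, byte[] content) {
        sizes.put(url, content.length);
    }
}
```

With this shape, a download tool, an indexer or a link checker differ only in which handler is registered; the fetching code stays the same.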

Features

  • command line and graphical versions (the command line version needs a major update; currently the GUI version has many more features)
  • recursive search of all documents starting from a given start document
  • support of <A> <AREA> <IMG> <FRAME> tags (with fault tolerance)
  • support of the robot exclusion protocol
  • user-controlled maximum search depth
  • user agent name can be defined
  • support of referrer headers
  • support of automated form handling (JoBo can fill fields with predefined values)
  • cookie support
  • XML configuration
  • used bandwidth can be limited
  • allow/deny downloads by mime type and document size (e.g. ignore all image/* files)
  • allow/deny downloads by regular expressions (e.g. don't download /cgi-bin)
  • can convert absolute links to relative
  • download only files newer than a given age
  • resume job
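
The allow/deny rules from the list above can be sketched as follows. This is only an illustration of the idea, not JoBo's implementation or configuration format: the class DownloadRules and its methods are hypothetical, and the "first matching URL rule wins, allow by default" policy is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical sketch; names and rule semantics are assumptions, not JoBo's API.
class DownloadRules {
    private static final class UrlRule {
        final Pattern pattern;
        final boolean allow;

        UrlRule(String regex, boolean allow) {
            this.pattern = Pattern.compile(regex);
            this.allow = allow;
        }
    }

    private final List<UrlRule> urlRules = new ArrayList<>();
    private final List<String> deniedMimeTypes = new ArrayList<>();
    private long maxBytes = Long.MAX_VALUE;

    DownloadRules allowUrl(String regex) {
        urlRules.add(new UrlRule(regex, true));
        return this;
    }

    DownloadRules denyUrl(String regex) {
        urlRules.add(new UrlRule(regex, false));
        return this;
    }

    /** Accepts an exact type ("text/css") or a wildcard subtype ("image/*"). */
    DownloadRules denyMime(String mimePattern) {
        deniedMimeTypes.add(mimePattern.toLowerCase(Locale.ROOT));
        return this;
    }

    DownloadRules maxSize(long bytes) {
        this.maxBytes = bytes;
        return this;
    }

    /** First rule whose regex is found in the URL decides; default is allow. */
    boolean urlAllowed(String url) {
        for (UrlRule rule : urlRules) {
            if (rule.pattern.matcher(url).find()) {
                return rule.allow;
            }
        }
        return true;
    }

    /** Checks the size limit first, then the denied MIME types. */
    boolean contentAllowed(String mimeType, long sizeInBytes) {
        if (sizeInBytes > maxBytes) {
            return false;
        }
        String type = mimeType.toLowerCase(Locale.ROOT);
        for (String denied : deniedMimeTypes) {
            if (denied.endsWith("/*")) {
                // "image/*" becomes the prefix "image/"
                String prefix = denied.substring(0, denied.length() - 1);
                if (type.startsWith(prefix)) {
                    return false;
                }
            } else if (type.equals(denied)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        DownloadRules rules = new DownloadRules()
                .denyUrl("/cgi-bin")
                .denyMime("image/*")
                .maxSize(1_000_000);
        System.out.println(rules.urlAllowed("http://example.com/cgi-bin/search"));
        System.out.println(rules.contentAllowed("image/png", 2048));
        System.out.println(rules.contentAllowed("text/html", 2048));
    }
}
```

A crawler would typically call urlAllowed before issuing a request and contentAllowed once the response headers (content type and length) are known.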

16.12.2006: JoBo 1.4 released

After more than two years of beta tests and only some minor changes, I created a new JoBo version. The new version contains several bug fixes, but no new functionality. In addition, many deprecated methods have been removed and untyped collections have been replaced by generics.

JoBo will run under Java 1.5 or higher (only 1.5 tested).

Archived page

This page has been archived, i.e. it is no longer actively maintained and the information may no longer be up to date.