
Living Semantic Web

Does the Semantic Web behave like a living system?



The New RDFCrawler is a modification of the existing RDFCrawler. Its RDF API has been replaced with Jena so that it can handle as much as possible of the RDF metadata available on the Web. Some other changes have also been introduced to improve its capabilities. They are summarised in the following points:

Installation

Download and unzip LivingSW.zip. It contains the source code (/src), the compiled code (/bin), a regular expressions package (/lib) and a pair of useful scripts. The New RDFCrawler also requires some Jena libraries to be placed in /lib. It has been tested with those from Jena 1.6.1: jena.jar, icu4j.jar, xerces.jar, junit.jar, concurrent-1.3.0.jar

Use

The functionality of the New RDFCrawler is packaged in the two provided scripts. The first one launches the crawler and has two forms. The first form crawls from the given URL, limited by the given crawling depth and time. The second form pre-processes the given HTML URL to extract the URLs from which the crawl will be performed.

> rdfcrawl URL [depth :int] [time :int]
> rdfcrawl base :htmlURL [depth :int] [time :int]
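
The way the depth and time arguments bound a crawl can be illustrated with a minimal sketch. It is written in Python for brevity and is not the New RDFCrawler code. The function name `crawl`, the `fetch_links` callback, the breadth-first order and the use of seconds as the time unit are all assumptions made for the example.

```python
import time
from collections import deque


def crawl(start_url, fetch_links, max_depth=2, max_seconds=60.0):
    """Breadth-first crawl bounded by link depth and elapsed time.

    fetch_links(url) is a hypothetical callback that returns the URLs
    referenced by the document at url. Returns the URLs in visit order.
    """
    deadline = time.monotonic() + max_seconds  # assumed unit: seconds
    visited = {start_url}
    order = []
    queue = deque([(start_url, 0)])

    while queue and time.monotonic() < deadline:
        url, depth = queue.popleft()
        order.append(url)
        if depth >= max_depth:
            continue  # depth limit reached: visit the URL, do not follow its links
        for link in fetch_links(url):
            if link not in visited:
                visited.add(link)
                queue.append((link, depth + 1))
    return order
```

With an in-memory link table standing in for the Web, `crawl("a", lambda u: graph.get(u, []), max_depth=1)` visits the start URL and its direct links only.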

The other script converts the N-Triples RDF model produced by the crawler to a Pajek network. It can also convert RDF/XML serialisation files:

> nt2pajek rdfserialisationfile(.nt|.xml)
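
To give an idea of what such a conversion involves, here is a minimal Python sketch that turns N-Triples text into a Pajek network. It is not the nt2pajek implementation. The function name `nt_to_pajek` is hypothetical, and the sketch assumes that subjects and objects become vertices, that each triple becomes one unlabelled arc, and that triples with literal objects are skipped.

```python
import re

# One simple N-Triples statement whose subject and object are URIs or blank nodes.
TRIPLE = re.compile(r'^(<[^>]*>|_:\S+)\s+<[^>]*>\s+(<[^>]*>|_:\S+)\s*\.$')


def nt_to_pajek(nt_text):
    """Convert N-Triples text to a Pajek network (assumed mapping, see above)."""
    ids = {}   # RDF term -> 1-based Pajek vertex number
    arcs = []  # (subject vertex, object vertex) pairs

    def vertex_id(term):
        return ids.setdefault(term, len(ids) + 1)

    for line in nt_text.splitlines():
        line = line.strip()
        if not line or line.startswith('#'):
            continue  # blank line or comment
        match = TRIPLE.match(line)
        if match is None:
            continue  # literal object or unsupported syntax: skipped in this sketch
        subject, obj = match.groups()
        arcs.append((vertex_id(subject), vertex_id(obj)))

    out = ['*Vertices %d' % len(ids)]
    for term, number in ids.items():
        # Drop the angle brackets around URIs; keep blank node labels as they are.
        label = term[1:-1] if term.startswith('<') else term
        out.append('%d "%s"' % (number, label))
    out.append('*Arcs')
    out.extend('%d %d' % arc for arc in arcs)
    return '\n'.join(out) + '\n'
```

Dropping the predicate keeps the output to the plain vertex and arc lists that Pajek reads. A variant could keep predicates as arc labels if the analysis needs them.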
