Our spider discards any URL whose host name differs from that of the
starting URL, so the crawl is limited to a single host. The program could
also be extended to accept a set of permitted hosts, allowing a small group
of servers to be searched for content.
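The host check described above can be sketched as follows. This is only an illustration in Python, not the spider's actual code: the function names build_allowed_hosts and is_allowed, and the example.com URLs in the usage note, are assumptions made for the example.

```python
from urllib.parse import urlsplit


def build_allowed_hosts(start_url, extra_hosts=()):
    """Return the set of host names the crawl may visit.

    The host of start_url is always included; extra_hosts is the
    optional extension to a small group of servers.
    """
    start_host = urlsplit(start_url).hostname  # already lower-cased
    if start_host is None:
        raise ValueError("start URL has no host name: %r" % start_url)
    return {start_host} | {h.lower() for h in extra_hosts}


def is_allowed(url, allowed_hosts):
    """Return True if url is absolute and its host is in allowed_hosts."""
    host = urlsplit(url).hostname  # None for relative or mailto: URLs
    return host is not None and host in allowed_hosts
```

For example, with allowed = build_allowed_hosts("http://www.example.com/"), the call is_allowed("http://www.example.com/a.html", allowed) is true, while a URL on any other host is rejected. Relative links have no host of their own, so a real spider would first resolve them against the page they were found on.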
Based on the URL you enter, the spider creates three files that include
almost all the information you need for web analysis.
1. summary.txt includes information about every link found on the
site: URL, status code returned, content-length, last modified date,
content-type, number of internal links, number of external (off-site)
links and number of images.
2. images.txt includes information about every image found on the
site: image URL, status code returned, image size, last modified date
and image type.
3. referencedTimes.txt lists, for each page, the number of times it is
referenced by other pages.
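The count reported in referencedTimes.txt can be illustrated with a small Python sketch. It is not the spider's own implementation: the function name count_references, the input layout (a dictionary mapping each crawled page to the links found on it) and the rule that a page's links to itself are ignored are all assumptions made for the example.

```python
from collections import Counter


def count_references(links_by_page):
    """Count how many times each page is linked to from other pages.

    links_by_page maps a page URL to the list of URLs found on it.
    Assumption: every link occurrence counts, except self-links.
    """
    # Start every crawled page at zero so unreferenced pages still appear.
    counts = Counter({page: 0 for page in links_by_page})
    for source, targets in links_by_page.items():
        for target in targets:
            if target != source:  # "other pages" only
                counts[target] += 1
    return counts
```

Given pages a, b and c, where a links to b and c, and b links to c, the result is 0 for a, 1 for b and 2 for c.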
Please try our spider and review the analysis results in the three files it
generates. A run takes a few minutes; the maximum number of URLs is 2000,
including reference and image links.
If you want to read the program documentation, please click DOCUMENTATION.