perils of web crawling

writing · internet-policy · internet-culture · technology-policy · forwarded-content
1996-02-13 · 3 min read

Source

Automatically imported from: http://commons.somewhere.com:80/rre/1996/perils.of.web.crawling.html

Content

This web service brought to you by Somewhere.Com, LLC.


```
[The world is such a strange place.]

---

This message was forwarded through the Red Rock Eater News Service (RRE). Send any replies to the original author, listed in the From: field below. You are welcome to send the message along to others but please do not use the "redirect" command. For information on RRE, including instructions for (un)subscribing, send an empty message to rre-help@weber.ucsd.edu

---

Date: Tue, 13 Feb 96 9:13:29 PST
From: RISKS List Owner
Subject: RISKS DIGEST 17.71

RISKS-LIST: Risks-Forum Digest Tuesday 13 February 1996 Volume 17 : Issue 71

---

Date: Sun, 11 Feb 1996 09:12:18 -0500
From: simsong@vineyard.net (Simson L. Garfinkel)
Subject: Those fun-loving guys at LANL.GOV

Reading about LANL's "(Click here to initiate automated `seek-and-destroy' against your site.)", I was reminded of something that happened to a friend of mine a few weeks ago.

Turns out that my friend was writing a web-walking robot, and it made the mistake of walking into the LANL site. This robot was running at the end of a 28.8K SLIP link, so it wasn't capable of issuing more than 1 request every 2-3 seconds.

Well, the folks at LANL have some sort of monitoring software, because they noticed it immediately. They called up his Internet service provider, said that he was attacking a federal interest computer, and threatened to take legal action against the ISP unless it revealed my friend's name and phone number.

Those fun-loving guys at LANL then called up my friend and left the following message on his answering machine:

"YOU ARE RUNNING A WEB ROBOT THAT IS ATTACKING A FEDERAL INTEREST SITE. UNLESS YOU TURN IT OFF WITHIN AN HOUR, WE WILL SUE YOU AND SHUT YOUR COMPANY DOWN."

The folks at LANL then called my friend's Internet service provider and threatened them with legal prosecution for violation of various computer crime statutes, unless the ISP cut off my friend's Internet connection.

This is really scary --- the thought that some government official can call up your ISP and, through a combination of threats and legal citations, have somebody's Internet feed immediately terminated. What about due process of law? What about innocent until proven guilty? What about having to go through the mere formality of obtaining a court injunction before having action such as this taken?

---

Date: 12 Feb 1996 14:02:09 GMT
From: weberwu@compute.tfh-berlin.de (Debora Weber-Wulff)
Subject: More on WWW-Robot false hits...

A few weeks ago our WWW server was brought to its knees. We were being inundated with thousands of URL requests for a student's home page. The page didn't look that interesting, but we closed out the account and put out an all-points search for the student in question and tried to figure out what the entire world wanted from this student. Theories varied from a viral attack to wayward robots.

When he was dragged into the computer services center, he confessed to what he had done. He had installed one of these nifty counters to see how many times his pages were read. Since he had no hits other than his own, he decided to include some good names on his page (a little racier than "sex 'n drugs 'n rock 'n roll", but this is a family publication), and then he registered his page with "some cyberporn list". He did not remember which one it was. So apparently all the robots in the world found a new site that seemed to have racy new stuff in it, and it was duly registered. There appear to be quite a number of people who either automate the search for sex pictures or check out what's new first thing in the morning, and many are stupid enough to keep trying even when the server tells them that this link is no longer in operation. It took days for things to calm down. Needless to say, the student currently has his net account revoked...

The moral of the story: don't attract robots with false claims; there are too many of them out there!

Debora Weber-Wulff, Technische Fachhochschule Berlin, FB Informatik,
Luxemburger Str. 10, 13353 Berlin, Germany weberwu@tfh-berlin.de

---

Date: Tue, 13 Feb 1996 16:36:23 +1100 (EDT)
From: Cameron Simpson
Subject: Re: Risks of web robots (Dellinger, RISKS-17.66,67)

There is a protocol called the Robot Exclusion Protocol designed explicitly to prevent robots from traversing on-the-fly datasets. It solves exactly the problem outlined above.

A moment's search through Yahoo reveals: http://info.webcrawler.com/mak/projects/robots/norobots.html entitled "A Standard for Robot Exclusion".

- Cameron Simpson cameron@dap.csiro.au, DoD#743 http://www.dap.csiro.au/~cameron/

---

End of RISKS-FORUM Digest 17.71

---
```
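
The Robot Exclusion Protocol mentioned in the last message works by having a robot read a `/robots.txt` file from the server and skip any paths that file excludes for it. A minimal sketch of that check, written in Python with the standard-library `urllib.robotparser` module, might look like the following. The robots.txt rules, the `ExampleBot` name, the helper function names, and the `www.example.com` URLs are all made up for illustration; none of them come from the digest.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content. A real robot would first fetch this
# file from http://<host>/robots.txt before requesting anything else.
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def allowed(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the rules permit this robot to request the URL."""
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    parser = build_parser(EXAMPLE_ROBOTS_TXT)
    # Illustrative URLs only: one ordinary page and one generated on the fly.
    for url in (
        "http://www.example.com/index.html",
        "http://www.example.com/cgi-bin/counter",
    ):
        verdict = "fetch" if allowed(parser, "ExampleBot", url) else "skip"
        print(verdict, url)
```

The protocol is advisory: it only helps when the robot's author chooses to run a check like this before each request, so it does nothing against robots that ignore it.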
