Rimm "study" of Internet pornography

education · privacy · labor · internet-culture · law · commerce · forwarded-content · community-networking
1995-06-28 · 9 min read · Edit on Pyrite

Source

Automatically imported from: http://commons.somewhere.com:80/rre/1995/Rimm.study.of.Internet.p.html

Content

This web service brought to you by Somewhere.Com, LLC.

Rimm "study" of Internet pornography

[Posted with permission. Too bad the Internet can't sue for libel.]

Date: Wed, 05 Jul 95 20:30:49 -0700
From: Brian Reid
Subject: Critique of the Rimm study

I have read a preprint of the Rimm study of pornography and I am so distressed by its lack of scientific credibility that I don't even know where to begin critiquing it. Normally when I am sent a publication for review, if I find a flaw in it I can identify it and say "here, in this paragraph, you are making some unwarranted assumptions". In this study I have trouble finding measurement techniques that are not flawed. The writer appears to me not to have a glimmer of an understanding even of basic statistical measurement technique, let alone of the application of that technique to something as elusive and ill-defined as USENET.

I have been measuring USENET readership and analyzing USENET content, and publishing studies of what I find, since April 1986. I have spent years refining the measurement techniques and the data processing algorithms. Despite those 9 years of working on the problem, I still do not believe that it is possible to get measurements whose accuracy is within a factor of 10 of the truth. In other words, if I measure something that seems to be 79, the truth might be 790 or 7.9 or anywhere in between. Despite this inaccuracy, the measurements are interesting, because whatever unknowns it is that they are measuring, these unknowns are similar from one month to the next, so the study of trends is meaningful. As long as you are aware of what it is that you are taking the ratio of, it is also meaningful to compare USENET measurements, because whatever the errors might be, they are often similar in two numbers from the same measurement set, and they are multiplicative, so they tend to cancel out in the quotient.
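Reid's point about multiplicative errors canceling in a ratio can be illustrated with a toy calculation (the numbers below are invented for illustration, not taken from his measurements):

```python
# Sketch: why a ratio of two measurements from the same run is meaningful
# even when each individual measurement is off by a large unknown factor.
true_a, true_b = 500.0, 100.0   # hypothetical true readerships of two groups
bias = 7.9                      # unknown multiplicative error, shared by both

measured_a = true_a * bias      # 3950.0 -- wildly wrong as a single point
measured_b = true_b * bias      # 790.0  -- also wildly wrong

# The shared bias divides out, leaving the true ratio.
print(measured_a / measured_b)  # 5.0 == true_a / true_b
```

This is why Reid trusts within-month ratios and month-to-month trends but not single-point counts: the bias only cancels when both numbers come from the same measurement set.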

In other words, in the results that I publish, the two kinds of measurements that are meaningful enough to pay attention to for serious scholarship are the normalized month-to-month trends in the readership percentages of a given newsgroup, and the within-the-same-month ratio of the readership of one newsgroup to the readership of another. The reason that I publish the numbers is primarily to enable trend analysis; it is not reasonable to take a single-point measurement seriously.

No matter what the level of accuracy you are seeking, it is imperative that you understand what it is that you are measuring. Whenever you cannot measure an entire population, you must find and measure a sample, and the error in your measurement will be magnified if your sample is not a representative sample. A small error in understanding the nature of the sample population will lead to an error like the famous "Dewey defeats Truman" headline in the 1948 US Presidential election. A large error in understanding the nature of the sample population can lead to results that are completely meaningless, such as measuring pregnancy rates in a population whose age and sex are unknown.

Rimm has made three "beginner's errors" that, in my opinion, when taken together, render his numbers completely meaningless:

1. He has selected a very homogeneous population to measure. While he has chosen not to identify his population, he has included enough of his sample data to allow me to correlate his numbers with my own numbers for the same measurement period. His data correlate exactly with my numbers for Pittsburgh newsgroups in that measurement period; only his own university (Carnegie-Mellon) has widespread enough campus networking to make it possible for him to sample that large a population. It is therefore almost certain that he has measured his own university. I received my Ph.D. in Computer Science from Carnegie-Mellon University, and I am very aware that it is dominantly male and dominantly a technology school. The behavior of computer-using students at a high-tech urban engineering school might not be very similar to the behavior of other student populations, let alone non-student populations.

2. He has measured only one time period, January 1995. Having lived at Carnegie-Mellon University for a number of years, I know first-hand that student interests in January are extremely different from student interests in September or April. When measuring human behavior about which very little is known, it is important to take numerous measurements over time and to look for time series. Taking the last few years worth of my data and doing a trend analysis in the newsgroups that he has named as pornographic shows an average 3:1 seasonal trend change between low-readership months (November and April) and high-readership months (September and January). But the trends are different in different newsgroups. A single-point measurement is not nearly as meaningful as a series of measurements.
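The seasonal argument can be made concrete with invented monthly counts (these figures are hypothetical, chosen only to mirror the roughly 3:1 swing Reid describes between low and high months):

```python
# Hypothetical monthly readership for one newsgroup (invented numbers).
readers = {"Sep": 3000, "Nov": 1000, "Jan": 2900, "Apr": 950}

high = max(readers.values())
low = min(readers.values())
print(f"seasonal swing: {high / low:.1f}:1")  # seasonal swing: 3.2:1
```

A study that samples only January (a high month) and treats the result as typical will overstate the year-round figure by a factor on the order of this swing.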

3. He makes the assumption that by seeing a data reference to an image or a file, it is possible to tell what the individual did with the file. We in the network measurement business are very careful to explain what it is that our measurements mean. Here is the standard explanation that I publish with my monthly measurements to talk about the number that Rimm calls "number of downloads".

To "read" a newsgroup means to have been presented with the opportunity to look at at least one message in it. Going through a newsgroup with the "n" key counts as reading it. For a news site, "user X reads group Y" means that user X's .newsrc file has marked at least one unexpired message in Y.

Rimm used my network measurement software tools to take his data, and he did not anywhere in his article state that he had made changes to them, so I must conclude that his numbers and my numbers are derived from the same software. But the number that he is using for "number of downloads" is the same number that I call "number of readers" by the above definition. It has nothing to do with the number of downloads. In fact, it is not possible for this measurement system to tell whether or not a file has been downloaded; it can tell whether or not a person has been presented with the opportunity to download a file but it cannot tell whether the user answered "yes" or "no".
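The definition above — a user "reads" group Y if their .newsrc marks at least one unexpired message in Y — can be sketched as follows. This is an illustrative simplification, not Reid's actual measurement code; `.newsrc` lines mark read-article ranges like `group: 1-2045,2047`, and `lowest_unexpired` stands in for the server's expiry horizon:

```python
# Simplified sketch of the "reader" count (not Reid's arbitron software).
# A user counts as a reader if their .newsrc marks any unexpired article;
# this records an *opportunity* to look, and says nothing about downloads.

def highest_marked(newsrc_line: str) -> int:
    """Return the highest article number marked read in a .newsrc line."""
    _, _, ranges = newsrc_line.partition(":")
    top = 0
    for part in ranges.split(","):
        part = part.strip()
        if not part:
            continue
        end = part.split("-")[-1]      # "1-2045" -> "2045", "2047" -> "2047"
        top = max(top, int(end))
    return top

def is_reader(newsrc_line: str, lowest_unexpired: int) -> bool:
    return highest_marked(newsrc_line) >= lowest_unexpired

print(is_reader("alt.binaries.pictures.erotica: 1-2045,2047", 2000))  # True
print(is_reader("alt.binaries.pictures.erotica: 1-150", 2000))        # False
```

Nothing in this data says whether the user viewed, saved, or skipped past any article — which is exactly why relabeling this count as "number of downloads" is indefensible.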

In summary, I do not consider Rimm's analysis to have enough technical rigor to be worthy of publication in a scholarly journal.

Brian Reid, Ph.D.
Director, Network Systems Laboratory
Digital Equipment Corporation
Palo Alto, California
reid@pa.dec.com
http://www.research.digital.com/nsl/people/reid/bio.html

Forwarded-by: Hal
Forwarded-by: bostic@CS.Berkeley.EDU (Keith Bostic)
Forwarded-by: Wendell Craig Baker
Forwarded-by: kieran@interport.net (Aaron Dickey)

A Preliminary Discussion of Methodological Peculiarities in the Rimm Study of Pornography on the "Information Superhighway"

June 28, 1995

David G. Post
Visiting Associate Professor of Law
Georgetown University Law Center
Dpost@eff.org, or Dpostn00@Counsel.com
202-364-5010

Please Distribute Freely

The Georgetown Law Journal is about to publish the results of a study by Marty Rimm of Carnegie Mellon University on "Marketing Pornography on the Information Superhighway: A Survey of 917,410 Images, Descriptions, Short Stories, and Animations Downloaded 8.5 Million Times by Consumers in over 2000 Cities in Forty Countries, Provinces, and Territories." The study has recently been the subject of a cover story in Time magazine ("Cyberporn," July 3, 1995). Rimm has claimed that the methodology and results were extensively reviewed by Carnegie Mellon faculty (see "Cybersensitivity," Washington Post, page C1, June 28, 1995); whether or not that was the case, it appears that the Georgetown Law Journal did not similarly make the study available to outside reviewers (other than the three commentators -- Anne Wells Branscomb, Catherine MacKinnon, and Carlin Meyer) prior to publication. As a member of the Georgetown University faculty with research interests in this area, I was approached in March 1995 to help several of the student editors with questions that they had arising out of the study; they would not, however, show me a copy of the study itself, and they asserted that they were unable to do so because of a secrecy arrangement they had made with Mr. Rimm.

One would have, perhaps, more confidence in the results of the Rimm study had it been subjected to more vigorous peer review. What follows is a preliminary list of some of the methodological oddities that I have uncovered after review of a pre-publication copy of the study that the Law Journal editors made available to me on June 26 (the publication date for the Time story). THIS LIST IS NOT, NOR IS IT INTENDED TO BE, EXHAUSTIVE; I anticipate that other such oddities will emerge as the interested community takes a more careful look at these results in the coming weeks and months.

1. Usenet Groups. Rimm's study of Usenet groups was confined to those groups with the "alt.binaries" prefix (p. 1865).

The researchers determined that "[s]eventeen of the thirty-two alt.binaries newsgroups located on the Usenet contained pornographic images" (p. 1867). During a single seven-day period (9/21/94 to 9/27/94), the researchers logged 827 image postings to the "non-pornographic" newsgroups (Rimm's descriptor), and 4206 image postings to the "pornographic newsgroups." Thus, of the 827+4206=5033 images posted, 83.5% (4206) were to newsgroups that contain pornographic material.

Preposterously, in his "Summary of Significant Results of the Carnegie Mellon Study," Rimm writes that "83.5% of all images posted on the Usenet are pornographic." The correct conclusion, of course, is that 83.5% of the images posted to a subset of newsgroups (the alt.binaries newsgroups) are to newsgroups that contain pornographic images. Rimm's conclusion is the precise methodological equivalent to the following: (a) restricting a study of printed pornography to magazines located in the "adult" area of a bookstore, (b) finding that 83.5% of the reader submissions during a one-week period were to magazines that contained "pornographic" material, and concluding (c) that 83.5% of all reader submissions to all magazines are pornographic.
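The arithmetic behind the 83.5% figure uses only the numbers Rimm himself reports, and makes the restricted denominator explicit:

```python
# Redoing the study's own arithmetic (figures from pp. 1867).
nonporn_posts = 827   # image posts to "non-pornographic" alt.binaries groups
porn_posts = 4206     # image posts to "pornographic" alt.binaries groups

total = nonporn_posts + porn_posts          # 5033 -- alt.binaries ONLY
share = porn_posts / total
print(f"{share:.1%} of alt.binaries image posts")  # prints 83.6%
```

(The division gives 83.6% at one decimal; the study reports 83.5%.) The denominator is image posts to 32 alt.binaries groups, not "all images posted on the Usenet" — substituting the latter is the base-rate error Post describes.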

2. Usenet, II. Rimm writes:

"The best data concerning network pornography consumption comes from the Usenet, which itself constitutes only 11.5% of Internet traffic. Of this 11.5%, approximately 3% by message count, but 22% by byte count (e.g., 2.5% of total Internet backbone traffic) is associated with Usenet newsgroups containing pornographic imagery" (p. 1869).

Thus, by Rimm's own figures (which he chooses not to highlight), fewer than one-half of 1% of the messages on the Internet (3% of 11.5%) are "associated with" newsgroups that contain pornographic imagery; since some (many? most?) of those messages are, presumably, not themselves pornographic, the actual proportion of pornographic messages is therefore even smaller than that.
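The "3% of 11.5%" product works out as follows, again using only figures quoted from the study:

```python
# Rimm's own figures: Usenet is 11.5% of Internet traffic, and 3% of
# Usenet messages are in groups containing pornographic imagery.
usenet_share = 0.115
porn_group_share_of_usenet = 0.03

share_of_internet = usenet_share * porn_group_share_of_usenet
print(f"{share_of_internet:.3%} of Internet messages")  # 0.345% of Internet messages
```

Roughly a third of one percent — and that is an upper bound, since it counts every message in those groups as "associated with" pornography.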

3. "However, as this study makes clear, studying pornography according to consumption, as opposed to availability, provides a much more revealing picture of the marketplace" (p. 1869). Although Rimm's figures show that of the forty most popular newsgroups worldwide "only one -- alt.binaries.pictures.erotica -- contained encoded pornographic images," (p. 1871), he claims that "when the data is (sic) classified by percent of news readers who subscribe to the newsgroups, three of the five most popular newsgroups are pornographic. Moreover, 20,644 of the 101,211 monthly Usenet posts in the top forty newsgroups, or 20.4%, are pornographic" (p. 1873).

Oddly, no data are presented to support this claim, i.e., no data classify newsgroups by "percent of newsreaders who subscribe to the newsgroups." Nor is it clear whether Rimm, as he appears to claim, actually looked at 101,211 Usenet posts in the top forty newsgroups in order to determine that 20.4% of the postings "are pornographic."

4. World Wide Web. In his Summary of Significant Results, Rimm reports that "[p]edophilic and paraphilic pornography are widely available through various computer networks and protocols such as the Usenet, World Wide Web, and commercial 'adult' BBS" (p. 1849). No evidence is presented to demonstrate that such material is available anywhere on the Web. Indeed, in the Appendix dealing with the results of a March 1995 Web Survey (Appendix C), Rimm reports locating only 123 Web sites containing any "sexually explicit imagery or materials" (p. 1923), only 9 of which had any "pornographic material" at all. Rimm provides no information that any of these sites -- which constitute, in any event, far less than one-tenth of 1% of all Web sites -- contain pedophilic or paraphilic material.
