What Google Search Reveals About Us
Billions of search terms can paint a detailed picture. But who gets to see it?
By Thomas Claburn
InformationWeek
Mar 13, 2006 12:01 AM

Organizing the world's information and making it universally accessible--Google's ambition--has its ramifications. In handling billions of search requests, Google generates terabytes of data about the Web searches of its users, a wealth of information the company mines regularly but guards vigorously. For six months, the U.S. Department of Justice, in an effort to uphold the Child Online Protection Act, has been pressing to get its hands on some of that data. What happens next could reveal as much about Google as Google knows about its users.

The Justice Department subpoenaed Google in August, demanding two months of search queries and all the URLs in its index. Negotiations narrowed the government's request to 5,000 queries and 50,000 URLs, but the sides hit an impasse. Unlike AOL, MSN, and Yahoo, which gave the government what it sought, Google contested the order. A federal judge will hear Google's case in San Jose, Calif., this week.

Google argues that the data the feds want isn't relevant to the government's effort and that some of what's requested--the number, length, and type of queries processed--constitutes a trade secret. Google also objects to the work involved in producing the requested information and worries that "being forced to compromise its privacy principles" would erode customer confidence.

But it's not just a matter of principle. Google has always been coy about how much personal information it collects. Now questions over how much it knows and who gets access to that information are becoming more important. Google has become the canary in the data mine.
Google has called into question whether the government's subpoena complies with the Electronic Communications Privacy Act, which limits the circumstances under which electronic data and communications can be disclosed to the government and other entities. The act covers two types of network services: electronic communications and remote computing. The government claims Google provides neither, but attorney Richard Wiebe disagrees. In a brief filed by the Center for Democracy and Technology, Wiebe argues that Google, as an outsourcer of search functions, qualifies as a remote computing service provider.

It's not the first time Google has had to stare down the double barrels of user privacy and government compliance, and given Google's rich archive of user data, it probably won't be the last. Anticipating interest in its data stores from authorities in China, where its legal options are fewer, Google elected not to offer Gmail or Blogger from servers based in that country until "we can do so in a manner that respects our users' interests in the privacy of their personal communications."

In the States, the Justice Department is trying to prove that the Child Online Protection Act is necessary by demonstrating that Internet filtering software doesn't adequately protect minors from viewing sexually explicit material. To make its case, the government aims to use data from search engines to perform a statistical analysis of how effectively Internet filters screen out pornography. The feds aren't looking to include information that can be tied to individual users, but some worry the government's request could become a precedent for just that.

Search keywords and associated URLs aren't exactly trade secrets. Search engine queries are routinely sold, stripped of any personally identifiable information that might have been gleaned from the original query. InfoSpace, which owns the meta-search engines Dogpile.com and MetaCrawler.com, sells keyword lists to online advertising companies.
Google doesn't sell such lists, though it does routinely publish a list of popular search terms called the Google Zeitgeist. What's more, Google queries are "disclosed routinely to third parties when a user clicks any link in Google search results," writes Philip B. Stark, a professor of statistics at the University of California at Berkeley and a government expert witness, in court documents. Indeed, Google tells anyone with a Web site more than it has told the government so far.

Web servers keep logs, text files that record site visitor activity. Anyone with access to those logs--and that's not limited to system administrators--can, in most cases, determine the IP address of the visitor's computer, the date and time the visitor requested each page and image on the site, the referring URL (which, for a visitor arriving from a Google results page, includes the search terms), and more. That may seem like trivial information, but taken in aggregate it "can be very revealing," writes blogger Tom Owad.

In January, Owad demonstrated how Web information can be manipulated to encroach on privacy: He wrote a computer script to download all Amazon Wish Lists posted by people named Edgar, winnowed the results to controversial titles, located the individuals who posted the lists using Yahoo People Search, then plotted their locations on Google Maps. "The technology exists to watch everybody," he warns.

Identity Game

Google confirms that it can identify people who have submitted specific keyword search terms using their IP addresses or HTTP cookies. In some circumstances, it also can identify the search terms submitted by a specific user. "The list we could produce is of corresponding IP addresses or cookies, not an actual list of people, unless they have provided us with their names by registration" or in some other way, says Nicole Wong, associate general counsel for Google, in an E-mail.

Google's Goods: What can get collected
Google Search: Keywords, IP address and server log data, OS and browser
Google Account: E-mail address, password, home country
Google Base: Photos, documents
Search Across Computers: PC files
Google Toolbar: URLs visited, personal information
Gmail: E-mail address, messages, contact list, login history
AdWords: Name, mailing address, Social Security or tax ID number
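To make the server-log passage above concrete, here is a minimal Python sketch of what a site owner can pull out of a single log entry. It assumes the Apache "combined" log format and assumes that a Google results-page URL carries the query in a q parameter, as it did at the time; the sample log line, the documentation-range IP address, and the function names parse_line and google_query are hypothetical, chosen only for illustration.

```python
import re
from datetime import datetime
from urllib.parse import urlparse, parse_qs

# One entry in the Apache "combined" format (assumed; servers can be configured otherwise):
# host ident user [time] "request" status bytes "referrer" "user-agent"
COMBINED = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Hypothetical visit: someone searched Google, then clicked through to this site.
SAMPLE = (
    '192.0.2.44 - - [13/Mar/2006:00:01:00 -0800] '
    '"GET /articles/privacy.html HTTP/1.1" 200 5120 '
    '"http://www.google.com/search?hl=en&q=search+engine+privacy" '
    '"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"'
)


def google_query(referrer):
    """Return the search terms if the referrer is a Google results page, else None."""
    url = urlparse(referrer)
    host = url.hostname or ""
    # Only google.com hosts are checked; country domains are ignored in this sketch.
    if host != "google.com" and not host.endswith(".google.com"):
        return None
    terms = parse_qs(url.query).get("q")  # parse_qs decodes "+" to spaces
    return terms[0] if terms else None


def parse_line(line):
    """Split one log line into visitor details, or return None if it doesn't match."""
    m = COMBINED.match(line)
    if m is None:
        return None
    entry = m.groupdict()
    # e.g. 13/Mar/2006:00:01:00 -0800 becomes a timezone-aware datetime
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    entry["search_terms"] = google_query(entry["referrer"])
    return entry


if __name__ == "__main__":
    entry = parse_line(SAMPLE)
    print(entry["ip"], entry["time"].isoformat(), entry["search_terms"])
    # prints: 192.0.2.44 2006-03-13T00:01:00-08:00 search engine privacy
```

Run over a whole log file rather than one line, the same few fields let a site owner group visits by IP address and list the searches that brought each visitor in, which is the kind of aggregation Owad warns about.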
Google collects still more data via other applications, some of which users volunteer and some of which they generate. Frequent users can set up personal Google accounts consisting of an E-mail address, password, and home country, which might also include a credit card number if the user has signed up with Google AdWords. A Google account may also be the source of a user's search history--what was clicked on and when--if he or she used Google's Personal Search service to improve search relevancy. Google says it may tie other information to Personal Search to improve search quality.

Google accounts typically are associated with services such as Gmail. The trail left behind by a Gmail user might include an E-mail address, password, alternate E-mail address and password, list of contacts, login history, and records of actions such as clicking on certain user interface elements, ads, and links. It's much the same for other apps such as Google Base, Google Desktop, Google Talk, and Google Toolbar; some of these can be configured to give Google potentially sensitive information. Google Base, for instance, is a public database that contains any digital information a user has submitted. Google Desktop's Search Across Computers feature creates encrypted copies of a user's local files on Google servers. And the Google Toolbar and Google Web Accelerator report the Web pages you're viewing to Google and may include personal information submitted to third-party Web sites, such as a person's name, if the Web site in question embeds login details in a URL.

These privacy issues are likely to become more complicated: documents revealed at a recent Google presentation for analysts indicate the company may offer an online storage service called GDrive.

Google's Data

Google is open about the information it collects. It gathers data that users provide and generate through their online activity with the company's sites, according to Google's privacy policy.
But it's impossible to know how long Google keeps all that clickstream information. "Data is kept for as long as it's useful," Wong says. When a user deletes information from his or her Google account, related data still may be accessible to Google for weeks or months. Deleted E-mail, for example, can stay in Google's databases for up to 60 days. And Google retains data from AdWords accounts for bookkeeping and tax purposes, Wong says.

Just what Google does with user-generated information isn't always obvious. The company says it uses the information to provide products and services, including the display of advertising and custom content. It also uses that information to analyze its services, ensure good performance of its sites, and develop new services. That's similar to what AOL, Yahoo, and other Internet companies do.

In analyzing its data, Google can correlate things such as service response time and customer retention. If a user, for example, conducts a search and leaves before the result is returned, that's a clue that Google's servers are too slow. That's one of the reasons for launching the China-based Google.cn search site: the filters imposed by the Chinese government on Google.com made that site perform poorly in China.

What, Me Worry?

Google is one company among many with massive amounts of potentially sensitive information. The potential to invade users' privacy is enormous, but Google's risks remain mostly unrealized. Google's resistance to the Justice Department's demands is meant to keep things that way.

Other companies have acquiesced to government requests for user-generated Internet data and various kinds of communications traffic, sometimes reluctantly and other times without hesitation. In January, the Electronic Frontier Foundation, a cyberliberties group, sued AT&T for giving the National Security Agency access to customers' phone and Internet communications.
The issue of search engine privacy has come to the fore because the wealth of information search companies compile is vast and relatively unprotected by law, at least compared with data held by companies in regulated industries such as finance, health care, and telecommunications. Ray Everett-Church, chief privacy officer for ePrivacy Group and co-author of Internet Privacy For Dummies (For Dummies, 2002), says a worst-case scenario is that data requested by the government could be used in fishing expeditions. Moreover, if such subpoenas are allowed, there are likely to be more of them. "From a precedent perspective, turning over this sort of information and responding to this sort of large, blanket request is very dangerous," Everett-Church says.

That raises the question: Can your searches be used against you? The short answer is yes. There's the case of North Carolina computer consultant Robert Petrick, convicted of murder in November. During Petrick's trial, Google search terms were introduced as evidence. As it turned out, the most incriminating terms--including "break," "snap," and "neck"--were obtained from Petrick's computers, but the potential for damning evidence from search engine archives was real.

David Townsend, CEO of eFor Computer Forensics, who served as a forensic consultant in the recent trials of Michael Jackson and Scott Peterson, compares computers to surveillance devices because they can yield search information and other data. "Say a civil suit comes in, and they've erased all this stuff," Townsend says. "I can pull the HTML of the Web page back up that says Yahoo on it, and it says what his search terms were."

The evidence left behind during Web searches can be telling, yet it also can be misconstrued or used with malice. Even those who think the U.S. government is looking after their best interests may find themselves in disputes with colleagues, employers, ex-spouses, insurers, or competitors.
In such circumstances, bits of Web usage information can be connected to present an unwelcome, unflattering, or damning picture--and one that may or may not be accurate.

Data Policies Key

For companies concerned about these issues, Townsend says it's critical to have a formal data-retention policy--and to follow it. If a court order arrives and an employee has been dumping data contrary to the rules, "they're in a pickle," he says.

Privacy laws eventually will be revised to address the perils confronting Google and others. Until then, the way around this bind is to minimize the amount of data that's collected and retained, an approach taken by many companies whose business models, unlike Google's, aren't based on indexing vast databases. "You can't respond to a request for data if you don't have the data," Everett-Church says.

The issue has made it to Washington. U.S. Rep. Edward J. Markey of Massachusetts, the ranking Democrat on the Telecommunications and Internet Subcommittee of the House Energy and Commerce Committee, last month introduced the Eliminate Warehousing of Consumer Internet Data Act of 2006, aimed at protecting consumer privacy and preventing the indefinite storage of data.

An 800-person survey released last month by the Center for Survey Research at the University of Connecticut suggests Americans are divided about whether search engines should turn over information about their users' search habits to the government: 50% say the companies shouldn't comply with a government request, while 44% say they should. Yet only 30% support government monitoring of Internet search behavior, while 65% oppose it, which suggests that respondents separated the question of obeying the government from the question of whether monitoring is appropriate. And 60% oppose permanent storage of search behavior by the likes of Google and Yahoo.

"People are getting increasingly wary about government surveillance of any type," says Samuel Best, director of UConn's Center for Survey Research. "It's likely what we're seeing here is the beginning of a big battle on information technologies and how data gets stored."

InformationWeek
Bush Signs Bill Targeting Knockoffs
Manufacturers lose an estimated $200 billion a year from counterfeit products, and the new bill closes some loopholes.
By
TechWeb News
Mar 17, 2006 11:04 AM
The Stop Counterfeiting In Manufactured Goods bill closes loopholes that allowed fake products, ranging from electronic components and automotive parts to apparel, to be shipped into the United States. The bill also requires courts to order the destruction of all counterfeit products seized as part of a criminal investigation, and it requires convicted counterfeiters to relinquish their profits and any equipment used in the operation. Those convicted of counterfeiting must also reimburse the legitimate businesses they exploited. Manufacturers lose an estimated $200 billion a year from counterfeit products, according to U.S. Customs and Border Protection.

Mike Wills, Intermec Corp.'s vice president of global services, RFID, and intellectual property, said more manufacturers in the consumer goods and retail industries are looking at embedding radio frequency identification (RFID) tags in individual items as a way to stop counterfeiting. Some industries, such as pharmaceuticals, have already begun to deploy RFID technology in their supply chains for that purpose. Drug manufacturers and distributors have integrated RFID technology to make certain fake drugs don't reach consumers, with projects being rolled out at the request of the U.S. Food and Drug Administration (FDA).

The White House said in a statement that it broke up a prescription drug counterfeiting network and seized more than $4 million in counterfeit medicine with help from partners overseas. With help from 16 countries on five continents, it also eliminated more than $100 million in illegal online software, games, movies, and music.