Search Logger
Posts from: Research Admin

Author Archive

Large-scale graph computing at Google

1:47 pm - June 15, 2009 in Google Research Blog


If you squint the right way, you will notice that graphs are everywhere. For example, social networks, popularized by Web 2.0, are graphs that describe relationships among people. Transportation routes create a graph of physical connections among geographical locations. Paths of disease outbreaks form a graph, as do games among soccer teams, computer network topologies, and citations among scientific papers. Perhaps the most pervasive graph is the web itself, where documents are vertices and links are edges. Mining the web has become an important branch of information technology, and at least one major Internet company has been founded upon this graph.

Despite differences in structure and origin, many graphs out there have two things in common: each of them keeps growing in size, and there is a seemingly endless number of facts and details people would like to know about each one. Take, for example, geographic locations. A relatively simple analysis of a standard map (a graph!) can provide the shortest route between two cities. But progressively more sophisticated analysis could be applied to richer information such as speed limits, expected traffic jams, roadworks and even weather conditions. In addition to the shortest route, measured as sheer distance, you could learn about the most scenic route, or the most fuel-efficient one, or the one which has the most rest areas. All these options, and more, can all be extracted from the graph and made useful — provided you have the right tools and inputs. The web graph is similar. The web contains billions of documents, and that number increases daily. To help you find what you need from that vast amount of information, Google extracts more than 200 signals from the web graph, ranging from the language of a webpage to the number and quality of other pages pointing to it.

In order to achieve that, we have created scalable infrastructure, named Pregel, to mine a wide range of graphs. In Pregel, programs are expressed as a sequence of iterations. In each iteration, a vertex can, independently of other vertices, receive messages sent to it in the previous iteration, send messages to other vertices, modify its own and its outgoing edges' states, and mutate the graph's topology (experts in parallel processing will recognize that the Bulk Synchronous Parallel Model inspired Pregel).

Currently, Pregel scales to billions of vertices and edges, but this limit will keep expanding. Pregel's applicability is harder to quantify, but so far we haven't come across a type of graph or a practical graph computing problem which is not solvable with Pregel. It computes over large graphs much faster than alternatives, and the application programming interface is easy to use. Implementing PageRank, for example, takes only about 15 lines of code. Developers of dozens of Pregel applications within Google have found that "thinking like a vertex," which is the essence of programming in Pregel, is intuitive.

We've been using Pregel internally for a while now, but we are beginning to share information about it outside of Google. Greg Malewicz will be speaking at the joint industrial track between ACM PODC and ACM SPAA this August on the very subject. In case you aren't able to join us there, here's a spoiler: The seven bridges of Königsberg — inspiration for Leonhard Euler's famous theorem that established the basics of graph theory — spanned the Pregel river.
 

A new landmark in computer vision

9:30 am - June 22, 2009 in Google Research Blog


[Cross-posted with the Official Google Blog]

Science fiction books and movies have long imagined that computers will someday be able to see and interpret the world. At Google, we think computer vision has tremendous potential benefits for consumers, which is why we're dedicated to research in this area. And today, a Google team is presenting a paper on landmark recognition (think: Statue of Liberty, Eiffel Tower) at the Computer Vision and Pattern Recognition (CVPR) conference in Miami, Florida. In the paper, we present a new technology that enables computers to quickly and efficiently identify images of more than 50,000 landmarks from all over the world with 80% accuracy.

To be clear up front, this is a research paper, not a new Google product, but we still think it's cool. For our demonstration, we begin with an unnamed, untagged picture of a landmark, enter its web address into the recognition engine, and poof — the computer identifies and names it: "Recognized Landmark: Acropolis, Athens, Greece." Thanks computer.

How did we do it? It wasn't easy. For starters, where do you find a good list of thousands of landmarks? Even if you have that list, where do you get the pictures to develop visual representations of the locations? And how do you pull that source material together in a coherent model that actually works, is fast, and can process an enormous corpus of data? Think about all the different photographs of the Golden Gate Bridge you've seen — the different perspectives, lighting conditions, and image qualities. Recognizing a landmark can be difficult for a human, let alone a computer.

Our research builds on the vast number of images on the web, the ability to search those images, and advances in object recognition and clustering techniques. First, we generated a list of landmarks relying on two sources: 40 million GPS-tagged photos (from Picasa and Panoramio) and online tour guide webpages. Next, we found candidate images for each landmark using these sources and Google Image Search, which we then "pruned" using efficient image matching and unsupervised clustering techniques. Finally, we developed a highly efficient indexing system for fast image recognition. The following image provides a visual representation of the resulting clustered recognition model:



In the above image, related views of the Acropolis are "clustered" together, allowing for a more efficient image matching system.

While we've gone a long way towards unlocking the information stored in text on the web, there's still much work to be done unlocking the information stored in pixels. This research demonstrates the feasibility of efficient computer vision techniques based on large, noisy datasets. We expect the insights we've gained will lay a useful foundation for future research in computer vision.

If you're interested to learn more about this research, check out the paper.
 

Speed Matters

5:15 pm - June 23, 2009 in Google Research Blog


At Google, we've gathered hard data to reinforce our intuition that "speed matters" on the Internet. Google runs experiments on the search results page to understand and improve the search experience. Recently, we conducted some experiments to determine how users react when web search takes longer. We've always viewed speed as a competitive advantage, so this research is important to understand the trade-off between speed and other features we might introduce. We wanted to share this information with the public because we hope it will give others greater insight into how important speed can be.

Speed as perceived by the end user is driven by multiple factors, including how fast results are returned and how long it takes a browser to display the content. Our experiments injected server-side delay to model one of these factors: extending the processing time before and during the time that the results are transmitted to the browser. In other words, we purposefully slowed the delivery of search results to our users to see how they might respond.

All other things being equal, more usage, as measured by number of searches, reflects more satisfied users. Our experiments demonstrate that slowing down the search results page by 100 to 400 milliseconds has a measurable impact on the number of searches per user of -0.2% to -0.6% (averaged over four or six weeks depending on the experiment). That's 0.2% to 0.6% fewer searches for changes under half a second!

Furthermore, users do fewer and fewer searches the longer they are exposed to the experiment. Users exposed to a 200 ms delay since the beginning of the experiment did 0.22% fewer searches during the first three weeks, but 0.36% fewer searches during the second three weeks. Similarly, users exposed to a 400 ms delay since the beginning of the experiment did 0.44% fewer searches during the first three weeks, but 0.76% fewer searches during the second three weeks. Even if the page returns to the faster state, users who saw the longer delay take time to return to their previous usage level. Users exposed to the 400 ms delay for six weeks did 0.21% fewer searches on average during the five week period after we stopped injecting the delay.

While these numbers may seem small, a daily impact of 0.5% is of real consequence at the scale of Google web search, or indeed at the scale of most Internet sites. Because the cost of slower performance increases over time and persists, we encourage site designers to think twice about adding a feature that hurts performance if the benefit of the feature is unproven. To learn more on how to improve the performance of your website visit code.google.com/speed. For more details on our experiments, download this PDF.
 

International Conference on Machine Learning (ICML 2009) in Montreal

6:00 pm - July 2, 2009 in Google Research Blog


The 26th International Conference on Machine Learning (ICML 2009) was recently held in Montreal in conjunction with the 22nd Conference On Learning Theory (COLT 2009) and the 25th Conference on Uncertainty in Artificial Intelligence (UAI 2009). This is one of the major forums for researchers from both industry and academia to share the recent developments in the area of machine learning and artificial intelligence. Machine learning is a central area for Google as it has many applications in extracting useful information from a vast amount of data available on the web. In addition to sponsoring this scientific event, Google contributed intellectually to several scientific forums. Here's a short report of those activities:
  • There were ten papers co-authored by Googlers in these conferences, which covered several areas of machine learning including domain adaption, online learning, bandits, boosting, sparsity and kernel learning.
  • Corinna Cortes, the head of Google Research NY gave one of the three invited talks of ICML. She surveyed the last decade of research in learning kernels and highlighted both the successes and the failures in learning kernels with a focus on applications of convex optimization for this purpose. Corinna concluded with a call for applying new ideas and novel techniques to overcome the current obstacles.
  • We presented a tutorial on Convergence of Natural Game Dynamics. This topic has received a lot of attention recently as it stands at the conflux of many fields such as economics, machine learning and theoretical computer science. In the tutorial, we surveyed the convergence properties of the most natural game dynamics such as the Nash dynamics or the best-response dynamics to the popular no-regret learning-based dynamics. The tutorial highlighted similarities and differences between the approaches in both the time of convergence, the point of convergence, and the quality of the outcome. We believe that the influence of the learning algorithms on the behavior of the users is an exciting and intriguing topic of research for many, and in particular for the analysis of ad auctions.
Google's main mission is "to organize the world's information and make it universally accessible and useful," and machine learning plays a fundamental role in both of these aspects. As a result, Google has invested significant resources in this area of research, and we look forward to continued participation and collaboration at these conferences for many more years.
 

Google’s Research Awards Program Update

3:30 pm - July 14, 2009 in Google Research Blog


When we think about innovation, it is easy to forget that it took about 55 years to spread automobile usage to 1/4 of the US population, ... 35 years for the telephone, ... 20 years for the radio, ... 15 years for the PC, ... 10 years for the cell phone, ... 7 years for the Internet (Council of Competiveness, Innovate America, 2004).

Recognizing that innovation holds the key to many of the unique technical challenges we face, we remain committed to maintaining strong relations with the academic research community. Our Research Awards Program has experienced phenomenal growth. Of the total number of applications that we received since the program's inception in 2005, more than half were submitted in the past year. To cope with the increased level of interest worldwide, we reorganized the program to accept submissions three times per year: April 15th, August 15th, and December 15th. Proposals are evaluated by teams of engineers and researchers, who make recommendations for funding. We try to move fast. Investigators receive a response about three months after their submission.

Here are some highlights from a recent round of applications:

"Recognition and Modeling of Objects from Street View Scans of Cities"
Thomas Funkhouser, Princeton University
Professor Funkhouser aims to develop methods for automatic construction of semantically-labeled, detailed, and photorealistic 3D models of cities from Street View data. The main efforts will be towards the segmentation and recognition of small objects (e.g. mailboxes, fire hydrants, parking meters, etc.) in Lidar data based on shape classification and contextual reasoning. A second objective will be to construct seamless, photorealistic 3D models of complete cities by extracting and fitting parts from repositories of polygonal models.

"An Ad Auctions Trading Agent Competition"
Michael Wellman, University of Michigan
The University of Michigan will introduce and operate a new game in the Trading Agent Competition (TAC) series of research competitions, in the domain of sponsored search. The TAC Ad Auctions (TAC/AA) game challenges participants to develop bidding strategies for advertisers in a simulated retail home entertainment market. The aim is to spur research and generate insights about advertiser bidding strategy, in a scenario more complex than those considered in the research literature to date. The TAC/AA environment features multiple interrelated keywords, a structured search user model, rich data availability, and a dynamic market context. Since 2000, the annual TAC series has catalyzed research on trading agent design and analysis, produced by a diverse group of researchers from academia and industry.

"A Suite of Automated Tools for Efficient Management and Search in Web-based Arabic Documents"
Adnan Yahya, Birzeit University, Palestine.
This research aims to design text mining and processing tools that are able to efficiently index, process, search, and categorize large quantities of Arabic data. This research addresses the challenges Arabic poses for NLP and information retrieval, automatic Arabic document categorization, root extraction, language detection, and Arabic query correction, suggestion and expansion. The PIs employ a statistical/Corpus-based approach based on contemporary data initially obtained from a local newspaper.

"When Children Search: Understanding what they do and what they could do with Google Search"
Allison Druin, University of Maryland
Children ages 5-13 are among the most frequent users of the Internet; yet, searching and browsing the web can present many challenges. Spelling, typing, query formulation, and deciphering results are all barriers for children in attempting to find the information they need. Professor Druin is trying to understand these issues in more diverse ages of children by focusing on current and ubiquitous search tools, namely, keyword-based web search engines.

"An Educational Camera for Kids"
Shree Nayar, Columbia University
Professor Nayar is desiging a novel digital camera that could be used as an innovative educational medium. His target audience is students between the ages of 10 and 13 years living in poor communities across the globe. The camera, named “Bigshot,” will be presented to students as a kit to expose them to diverse science and engineering concepts. Once assembled, the camera will be used so that students can share their photos with students in other cultures using Picasa and Google Groups.

"Discovering semantic concepts and their relations in large image collections"
Bernt Schiele, Technische Unversitat Darmstadt, Germany
Professor Schiele will investigate to what extent meaningful structures can be discovered from large sets of images both in a fully unsupervised fashion as well with minimal human supervision. To this end, Professor Schiele's work will try to first discover structure and learn multi-feature distance metrics in large collections of images and then to enrich such structure by weak annotations in order to link discovered structure and to derive semantic concepts.

"Dataspace Metrics: Measuring Progress for Pay-as-you-go Information Integration"
Michael Franklin, University of California
The goal of this project is to develop a measurement framework for gauging progress in terms of the quality and accuracy of information integration. The starting point is the development of a set of metrics for judging the “goodness’ of information integration across a number of information types and use cases. These metrics will then be analyzed and where possible, unified, so that a more general measurement framework can be developed. Such a framework will serve as a key component for future Dataspace management systems, and could provide a grounding for other collaborative information integration solutions.

For more information about this program, including submission guidelines, please visit the Research Awards Program page.
 

ACM EC Conference and Workshop on Ad Auctions

7:00 pm - July 21, 2009 in Google Research Blog


This month, the 10th ACM Conference on Electronic Commerce (EC 2009) and the 5th Workshop on Ad Auctions took place at Stanford University. This is one of the major forums for economists and computer scientists to share their ideas about mechanism design and algorithmic game theory. Other than co-authoring several papers in the conference and workshops, Google contributed significantly in presenting tutorials.

Among the four tutorials given at the ACM EC conference, we participated in presenting two of them:
  • In a joint tutorial with Google researcher Muthu Muthukrishnan, we explored research problems in sponsored search inspired by taking the advertiser's perspective. Emphasizing a cross-disciplinary approach, we presented sample research directions in keyword selection, traffic prediction and bidding strategy, encouraging the research community to build upon known auction models in order to tackle these even more challenging domains. We explored in more detail specific examples of research in bidding strategies.
  • Moreover, in a joint tutorial with H. Roeglin, we presented results in convergence of game dynamics, both to equilibria and nearly-optimal solutions. This was a more algorithm-oriented variant of the tutorial at ICML (which is described in a previous blog post.)

The Ad Auctions Workshop brought together many industry and academic research leaders to discuss ongoing challenges in online advertisement. The topics presented at the workshop included the role of externalities in ad auctions, new truthful ad auction mechanisms with budget constraints, efficiency loss of generalized second-price ad auctions, and complex combinatorial ad auctions. Google researchers co-organized, participated in the discussions, and contributed the following presentations:
  • Google's chief economist, Hal Varian, gave an enlightening invited talk about using Google Trends data for "predicting the present." In this work, Google Trends data is used to help improve forecasts of various economic time series. Examples illustrating this technique were drawn from the auto industry, real estate, and unemployment. He emphasized that Google Trends data is publicly available and encouraged people to use this data for their research.
  • We presented two papers in the workshop, one about optimal pricing mechanisms over social networks and one about using offline optimization in stochastic online ad allocation problems. In the latter talk, we presented an algorithm in which we use an optimal offline solution in an "expected instance", and use this solution as a signal in online decision making. Using the idea of the power of two choices from the CS literature, we give a novel theoretical analysis of our method, improving the best previously known result. We also gave some practical insight about using these methods in online ad allocation. The theoretical results will appear in the upcoming FOCS 2009 conference.

Motivated by our various ad systems, there is a large research effort at Google around areas at the intersection of Economics, Computer Science and Machine Learning. The Ad Auctions Workshop and ACM Conference on Electronic Commerce are among the best forums for stimulating ideas and collaboration in these interdisciplinary areas.
 

Predicting Initial Claims for Unemployment Benefits

7:00 pm - July 22, 2009 in Google Research Blog


One of the strongest leading indicators of economic activity is the number of people who file for unemployment benefits. Macroeconomists Robert Gordon and James Hamilton have recently examined the historical evidence. According to Hamilton's summary: "...in each of the last six recessions, the recovery began within 8 weeks of the peak in new unemployment claims."

In an earlier blog post, we suggested that Google Trends/Search Insights data could be useful in short term predictions of economic variables. Given the importance of initial claims as a macroeconomic predictor, we thought it would be useful to try to forecast this economic metric. The initial claims data is available from the Department of Labor, while the Google Trends data for relevant categories is available here.

We applied the methodology outlined in our earlier paper, building a model to forecast initial claims using the past values of the time series, and then added the Google Trends variables to see how much they improved the forecast. We found a 15.74% reduction in mean absolute error for one-week ahead out, of sample forecasts. Most economists would consider this to be a significant boost. Details of our analysis may be found in this paper.

The bottom line is that initial claims have been generally declining from their peak and that, so far at least, the Google query data is forecasting further short term declines. It would be good news indeed if this particular Google trend continues.
 

App Inventor for Android

5:15 pm - July 31, 2009 in Google Research Blog


At Google Research, we are making it easy to build mobile applications, and we're collaborating with faculty from a dozen colleges and universities to explore whether this could change the nature of introductory computing. With the support of Google University Relations, the faculty group will work together this fall to pilot courses where beginning students, including non-computer science majors, create Android applications that incorporate social networking, location awareness, and Web-based data collections.

Mobile applications are triggering a fundamental shift in the way people experience computing and use mobile phones. Ten years ago, people "went to the computer" to perform tasks and access the Internet, and they used a cell phone only to make calls. Today, smartphones let us carry computing with us, have become central to servicing our communication and information needs, and have made the web part of all that we do. Ten years ago, people's use of computing was largely dissociated from real life. With the ubiquity of social networking, online and offline life are becoming fused. This fall's exploration is motivated by the vision that open mobile platforms like Android can bring some of that same change to introductory Computer Science, to make it more about people and their interactions with others and with the world around them. It's a vision where young people—and everyone—can engage the world of mobile services and applications as creators, not just consumers. Through this work, we hope to do the following:

  • Make mobile application development accessible to anyone.
  • Enhance introductory learning experiences in computing through the vehicle of Android’s open platform.
  • Encourage a community of faculty and students to share material and ideas for teaching and exploring.

The collaborative experiment kicked off with a three-day workshop at Google's Mountain View campus in June, where invited faculty shared their plans for the courses they will offer this fall. The group also got an advance look at App Inventor for Android, the prototype development platform that Google is working on and that the faculty and their students will use in their courses. App Inventor for Android lets people assemble Android applications by arranging "components" using a graphical drag-and-drop-interface. One of the goals of the fall experiment is to further shape the system in response to the experience and feedback of students and faculty.

The schools participating in this fall's collaboration are Ball State University, University of Colorado Boulder, Georgia Tech, Harvard, Indiana University, Mills College, MIT, Olin College, University of California at Berkeley, University of Michigan, University of Queensland, University of San Francisco, and Wellesley College.

Questions or comments? Please send us feedback. We look forward to hearing from you!
 

Two Views from the 2009 Google Faculty Summit

4:59 pm - August 3, 2009 in Google Research Blog


[cross-posted with the Official Google Blog]

We held our fifth Computer Science Faculty Summit at our Mountain View campus last week. About 100 faculty attendees from schools in the Western hemisphere attended the summit, which focused on a collection of technologies that serve to connect and empower people. Included in the agenda were presentations on technologies for automated translation of human language, voice recognition, responding to crises, power monitoring and collaborative data management. We also talked about technologies to make personal systems more secure, and how to teach programming — even using Android phones. You can see a more complete list of the topics in the Faculty Summit Agenda or check out my introductory presentation for more information.

I asked a few of the faculty to provide us their perspective on the summit, thinking their views may be more valuable than our own: Professor Deborah Estrin, a Professor of Computer Science at UCLA and an expert in large-scale sensing of environmental and other information, and Professor John Ousterhout, an expert in distributed operating systems and scripting languages.

Professor Estrin's perspective:

We all know that Google has produced a spectacular array of technologies and services that has changed the way we create, access, manage, share and curate information. A very broad range of people samples and experiences Google’s enhancements and new services on a daily basis. I, of course, am one of those minions, but last week I had the special opportunity to get a glimpse inside the hive while attending the 2009 Google Faculty Summit. I still haven't processed all of the impressions, facts, figures and URLs that I jotted down over the packed day and a half-long gathering, but here are a few of the things that impressed me most:

  • The way Google simultaneously launches production services while making great advances in really hard technical areas such as machine translation and voice search, and how these two threads are fully intertwined and feed off of one another.
  • Their embrace of open source activities, particularly in the Android operating system and programming environment for mobiles. They also seed and sponsor all sorts of creative works, from K-12 computer science learning opportunities to an the open data kit that supports data-gathering projects worldwide.
  • The company’s commitment to thinking big and supporting their employees in acting on their concerns and cares in the larger geopolitical sphere. From the creation of Flu Trends to the support of a new "Crisis Response Hackathon" (an event that Google, Microsoft and Yahoo are planning to jointly sponsor to help programmers find opportunities to use their technical skills to solve societal problems), Googlers are not just encouraged to donate dollars to important causes — they are encouraged to use their technical skills to create new solutions and tools to address the world's all-too-many challenges.

This was my second Google Faculty Summit — I previously attended in 2007. I was impressed by the 2007 Summit, but not as deeply as I was this year. Among other things, this year I felt that Googlers talked to us like colleagues instead of just visitors. The conversations flowed: Not once did I run up across the "Sorry, can't talk about that... you know our policy on early announcements". I left quite excited about Google's expanded role in the CS research ecosystem. Thanks for changing that API!

Professor Ousterhout's perspective:

I spent Thursday and Friday this week at Google for their annual Faculty Summit. After listening to descriptions of several Google projects and talking with Googlers and the other faculty attendees, I left with two overall takeaways. First, it's becoming clear that information at
scale is changing science and engineering. If you have access to enormous datasets, it opens up whole new avenues for scientific discovery and for solving problems. For example, Google's machine translation tools take advantage of "parallel texts": documents that have been translated by humans from one language to another, with both forms available. By comparing the sentences from enormous numbers of parallel texts, machine translation tools can develop effective translation tools using simple probabilistic approaches. The results are better than any previous attempts at computerized translation, but only if there are billions of words available in parallel texts. Another example of using large-scale information is Flu Trends, which tracks the spread of flu by counting the frequency of certain search terms in Google's search engine; the data is surprisingly accurate and available more quickly than that from traditional approaches.

My second takeaway is that it's crucial to keep as much information as possible publicly available. It used to be that much of science and engineering was driven by technology: whoever had the biggest particle accelerator or the fastest computer had an advantage. From now on, information will be just as important as technology: whoever has access to the most information will make the most discoveries and create the most exciting new products. If we want to maintain the leadership position of the U.S., we must find ways to make as much information as possible freely available. There will always be vested commercial interests that want to restrict access to information, but we must fight these interests. The overall benefit to society of publishing information outweighs the benefit to individual companies from restricting it.
 

Under the Hood of App Inventor for Android

10:00 am - August 11, 2009 in Google Research Blog


We recently announced our App Inventor for Android project on the Google Research Blog. That blog entry was long on vision but short on technological details--details which we think would be of interest to our readers.

Of particular interest is our use of Scheme. Part of our development environment is a visual programming language similar to Scratch. The visual language provides a drag-and-drop interface for assembling procedures and event handlers that manipulate high-level components of Android-based phones. The components are similar to the ones in the recently announced Simple; in fact, the code bases share an ancestor.

We parse the visual programming language into an S-expression intermediate language, which is a domain-specific language expressed as a set of Scheme macros, along with a Scheme runtime library. We did this for a few reasons:
  • S-expressions are easy to generate and read for both humans and machines.
  • Scheme macros are a convenient (albeit sometimes arcane) way to express S-expression based syntax.
  • Scheme is a small, powerful and elegant language well suited to describe and evaluate a large set of programming semantics. Additionally, it provides the flexibility that we require as our language and its semantics grow and develop.
  • Scheme expertise was readily available among our team.
  • A pre-existing tool (Kawa by Per Bothner) to create Android compatible output from scheme code was already available.
For now the project is just an experiment we're performing with a dozen colleges and universities, but we hope to eventually open up the development environment to wider use and to open-source parts of the code.
 
 
 
 
 
 
It's All About Search | © clsc.net |
2012.05.1822:59
Tech used here: Valid HTML - Valid CSS - Valid RSS - JavaScript - PHP - Smarty - MySQL - and a partridge in a pear tree.