Search Logger
Posts from: Research Admin

Author Archive

Socially Adjusted CAPTCHAs

12:58 pm - April 16, 2009 in Google Research Blog


Unfortunately, there is a war going on between humans and 'bots. Software
'bots are attempting to generate massive numbers of computer accounts,
which are then sold in bulk to spammers. Spammers use these accounts to
flood email inboxes and discussion boards. Meanwhile, humans simply want
to create an account and don't want to spend a lot of time proving
that they are not programs.

Typically we use CAPTCHAs -- we present an image of some distorted text
and then ask the applicant to type in the letters. As image processing gets
more sophisticated, these letter sequences tend to get longer and more
distorted, sometimes to the point where humans fail too.

So we switched the game. We show an image, say an airplane, but it
is randomly rotated and we ask the applicant to rotate it to "up." This
is generally hard for computers but easy for people. Well, for the most
part.

Since computers are good at recognizing faces, skies, text, etc., we sift
through our database of images, running state-of-the-art "up" detectors,
and remove the images those detectors can orient. But of the images that
remain, some are too hard for people to figure out. Which way is up for a
plate or a piece of abstract art?

So here is where it gets interesting. We show people several images, one
of which is a "candidate" and we see how people do. If everyone rotates
it the same way, it is a keeper. If there is a lot of variation, we
discard it. As a bonus, it turns out that even if the original image was
taken at an angle, it does not matter, since people, in large numbers,
socially adjust the CAPTCHA.
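
The keep-or-discard rule above can be sketched in a few lines of Python. This is only an illustration, not the method from the paper: the function names, the use of circular statistics, and the `max_spread` threshold are all assumptions made for the example. Each submitted angle is treated as a point on the unit circle; if the points cluster tightly, the image is kept and the consensus direction becomes "up," even if the photo itself was taken at a tilt.

```python
import math

def circular_stats(angles_deg):
    """Return (mean direction in degrees, spread in [0, 1]) for a list of angles.

    Spread is 1 minus the mean resultant length: 0 means everyone chose the
    same angle, values near 1 mean the answers point in all directions.
    """
    n = len(angles_deg)
    s = sum(math.sin(math.radians(a)) for a in angles_deg) / n
    c = sum(math.cos(math.radians(a)) for a in angles_deg) / n
    mean = math.degrees(math.atan2(s, c)) % 360.0
    spread = 1.0 - math.hypot(s, c)
    return mean, spread

def evaluate_candidate(angles_deg, max_spread=0.05):
    """Return the consensus "up" angle for a keeper, or None to discard.

    max_spread is a hypothetical threshold chosen for this sketch.
    """
    mean, spread = circular_stats(angles_deg)
    if spread > max_spread:
        return None  # too much disagreement: people cannot tell which way is up
    return mean      # the crowd's answer defines "up" for this image

# People agree closely: keep the image, with "up" near 10 degrees.
keeper = evaluate_candidate([10, 12, 8, 11, 9])

# People disagree completely (think of a plate): discard the image.
ambiguous = evaluate_candidate([0, 90, 180, 270])
```

Averaging on the circle rather than averaging raw numbers matters here, because answers of 350 and 10 degrees agree closely even though their arithmetic mean is 180.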

Read the full paper here (posted with the permission of WWW'09).
 

Congratulations to NSF CLuE Grant awardees

2:03 pm - April 23, 2009 in Google Research Blog


The first goal of the Academic Cluster Computing Initiative was to familiarize the academic community with the methods necessary to run very large datasets on massive distributed computer networks. By expanding that program to include research grants through the National Science Foundation's Cluster Exploratory (CLuE) program, we're also hoping to enable new and better approaches to data-intensive research across a range of disciplines.

Now that the NSF has announced the 2009 CLuE grants in addition to some previous Small Grant for Exploratory Research (SGER) grants, we're excited to congratulate the recipient researchers and wish them the best as they bring new projects online and continue to run existing SGER projects on the Google/IBM cluster.

The NSF selected projects based on their potential to advance computer science as well as to benefit society as a whole, and researchers at 14 institutions are tackling ambitious problems in everything from computer science to bioinformatics. The institutions receiving CLuE grants are Purdue, UC Santa Barbara, University of Washington, University of Massachusetts-Amherst, UC San Diego, University of Virginia, Yale, MIT, University of Wisconsin-Madison, Carnegie Mellon, University of Maryland-College Park, University of Utah and UC Irvine. Florida International University, Carnegie Mellon and University of Maryland will continue other projects with existing SGER grants. These grantees will run their projects on a Google/IBM-provided cluster running an open source implementation of Google's MapReduce and File System.

We're excited to help foster new approaches to difficult, data-intensive problems across a range of fields, and we can't wait to see more students and researchers come up with creative applications for massive, highly distributed computing.
 

The Continuing Metamorphosis of the Web

12:39 pm - April 27, 2009 in Google Research Blog


I just returned from giving a talk at the 18th World Wide Web Conference in Madrid and was pleased to see a healthy and dynamic conference despite difficult economic conditions. Madrid had beautiful spring weather, and magnificent modern architecture abounds throughout the city. I will say, though, that the Madrid subway does not vibrate (shake, rattle, and roll) one’s soul quite as much as does our local NYC subway.

My talk was entitled The Continuing Metamorphosis of the Web. In it, I noted that the initial web standards were so simple and sensible that they engendered a path of stepwise innovations, which taken together have aggregated into amazing accomplishments. Metaphorically, I feel our community has been on a kind of pseudo-random walk that has taken us to remarkable places. The truly great results have included the creation of a virtual Library of Alexandria, the creation of the search engine (to be that library’s super-card catalog), the empowerment of the long tail (in diverse communities), and great innovations in doing business. I argued that the bottom-up evolution is continuing (perhaps even accelerating) today, and that the current stepwise improvements are still leading to broad innovations, which we will come to view as being as extraordinary as any that have occurred to date.

Here are three great achievements currently a-brewing:
  1. “Totally Transparent Processing.” By this, I argued that our use of the web (whether for search, communication, or information access) can increasingly occur in a fluid manner that is independent of the device we are using, independent of the human language we prefer, independent of the modality of the data, and independent of the corpus of information on which our interaction is based. In effect, processing can be transparent ∀d∈D, ∀l∈L, ∀m∈M, ∀c∈C. Our barriers to using information technology are fading away and becoming transparent.
  2. “Ideal Distributed Computing.” While we have known the fundamentals of distributed computing for many decades, only today are we reaching a state where we can achieve a powerful and efficient balance of computation between all end-user devices and a vast collection of shared storage and computational resources. Cloud computing is today’s term of art, but I talked more generally about systems flexible enough that computation and data can move across computers within a cluster, across clusters of computers and, of course, between clusters and all other (say, end-user) devices. The result is the efficient, even awesome, capability to provide communication, computation and data to a vast collection of people and applications.
  3. “Hybrid, Not Artificial, Intelligence.” Systems are regularly augmenting the capability of all of us in day-to-day life, and our collective use of those systems is, in turn, augmenting the capabilities of those systems in a beneficial virtuous circle. The virtuous circle is operating already in the search engine, voice recognition systems, recommendation systems, and more. There is every reason to think the effect will become ever more potent as computers are applied to more domains and used by larger populations. The result may not be artificially intelligent machines that pass the Turing Test, but instead systems that will be ever more capable of helping us achieve our goals in life -- in a kind of partnership. For a related take on this, you might look at a Google Official Blog post, “The Intelligent Cloud,” which Franz Och and I posted last Fall.
More explanation and many examples, based on Google research and services, are available in the slides I used with my talk. A PDF file of those slides is available on the WWW2009 website under the papers and presentations link.
 

Cloud Computing and the Internet

10:17 am - April 28, 2009 in Google Research Blog


[adapted from the speech given on the occasion of the honoris causa ceremony
at the Universidad Politécnica de Madrid]

The Internet is largely a software artifact and a layered one, as my distinguished colleague Sir Tim Berners-Lee has observed on many occasions. The layering has permitted a remarkable versatility in the implementation of the Internet and its applications. New technology can be used to implement each layer and, as long as the interfaces between the layers remain static, the changes do not affect the functionality of the system. In this way, the Internet has evolved and adapted new transmission and switching technology into its lower layers and has supported new upper layers such as the HTTP, HTML and SSL protocols of the World Wide Web.

In recent years, the term “cloud computing” has emerged to make reference to the idea that from the standpoint of a device, say a laptop, on the Internet, many of the applications appear to be operating somewhere in the network “cloud.” Google, Amazon, Microsoft and others, as well as enterprise operators, are constructing these cloud computing centers. Generally, each cloud knows only about itself and is unaware of the existence of other cloud computing facilities. In some ways, cloud computing is like the networks of the 1960s when my colleagues and I began to think about connecting computers together on networks. Each network was typically proprietary. IBM had Systems Network Architecture; Digital Equipment Corporation had its DECNET; Hewlett-Packard had its Distributed System. These networks were specific to each manufacturer and did not interconnect nor even have a way to express the idea of connecting to another network. The Internet was the solution that Robert Kahn and I developed to allow all such networks to be interconnected in a uniform way.

Cloud computing is at the same stage. Each cloud is a system unto itself. There is no way to express the idea of exchanging information between distinct computing clouds because there is no way to express the idea of “another cloud.” Nor is there any way to describe the information that is to be exchanged. Moreover, if the information contained in one computing cloud is protected from access by any but authorized users, there is no way to express how that protection is provided and how information about it should be propagated to another cloud when the data is transferred.

Interestingly, my colleague, Sir Tim Berners-Lee, has been pursuing ideas that may inform the so-called “inter-cloud” problem. His idea of data linking may prove to be a part of the vocabulary needed to interconnect computing clouds. The semantics of data and of the actions one can take on the data, and the vocabulary in which these actions are expressed appear to me to constitute the beginning of an inter-cloud computing language. This seems to me to be an extremely open field in which creative minds everywhere can be free to contribute ideas and to experiment with new concepts. It is a new layer in the Internet architecture and, like the many layers that have been invented before, it is an open opportunity to add functionality to an increasingly global network.

There are many unanswered questions that can be posed about this new problem. How should one reference another cloud system? What functions can one ask another cloud system to perform? How can one move data from one cloud to another? Can one request that two or more cloud systems carry out a series of transactions? If a laptop is interacting with multiple clouds, does the laptop become a sort of “cloudlet”? Could the laptop become an unintended channel of information exchange between two clouds? If we implement an inter-cloud system of computing, what abuses may arise? How will information be protected within a cloud and when transferred between clouds? How will we refer to the identity of authorized users of cloud systems? What strong authentication methods will be adequate to implement data access controls?

Because the Internet is primarily a software artifact, there seems to be no end to its possibilities. It is an endless frontier, open to exploration by virtually anyone. I cannot guess what will be discovered in these explorations but I am sure that we will continue to be surprised by the richness of the Internet’s undiscovered territory in the decades ahead.
 

The bar-bet phenomenon: increasing diversity in mobile searches

1:27 pm - May 7, 2009 in Google Research Blog


Historically, research suggests that web search on mobile phones has been limited when compared to the diverse set of queries which comprise computer-based search. Researchers attribute the homogeneous mobile search behavior in part to the phone's form factor and browsing capabilities. However, our new logs-based study indicates that high-end phones, like the iPhone, are changing the landscape of mobile search. We found that search from these phones has evolved not only to mimic computer web search patterns, but to exceed the expectations set by conventional web search in some cases.

We see iPhone searches mimicking computer-based search behavior in terms of query length (~3 words per query for computer and iPhone queries, as opposed to 2.5 words per query for conventional mobile queries) and query classification (notably, the percentage of Adult and Entertainment searches has decreased on the iPhone relative to conventional mobile phones). But what is most surprising to us is that frequent searchers on iPhone surpass frequent searchers on computers in terms of the diversity of queries they issue. In other words, people are using high-end phones to search for a more diverse set of information needs than computers are used for; we jokingly refer to this as the "bar-bet" phenomenon -- or the "pub-quiz" phenomenon for those of you in the UK.

We devised a metric for quantifying the variability of a user’s search intentions across time. This variability metric, entro-percent, is a normalized entropy metric which compares the number of search tasks issued by a user to the number of categories those search tasks fall under. This user variability is much lower for conventional mobile web search than for computer-based search, confirming the hypothesis that mobile web users query over a much less diverse set of topics. The surprising news is that iPhone users, on the other hand, had a higher variability than computer-based users, indicating their information needs are more diverse! This shows that the challenges posed by a phone's form factor can be outweighed by its "always on, always in your pocket" benefits.
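
To give a feel for what a normalized entropy metric looks like, here is a minimal Python sketch. It is not the entro-percent equation itself, which is defined in the paper; the function name, the category labels, and the choice to normalize by the logarithm of the number of tasks are assumptions made for illustration. The idea is that a user whose search tasks all fall into one category scores 0, while a user whose every task falls into a different category scores 1.

```python
import math
from collections import Counter

def normalized_entropy(task_categories):
    """Shannon entropy of a user's task categories, scaled into [0, 1].

    task_categories holds one category label per search task. The entropy is
    divided by log2(number of tasks), the largest value it could reach if
    every task fell into its own category (an assumption of this sketch).
    """
    total = len(task_categories)
    if total == 0:
        raise ValueError("need at least one search task")
    counts = Counter(task_categories)
    if len(counts) == 1:
        return 0.0  # a single category carries no variability
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return entropy / math.log2(total)

# Hypothetical users, with made-up category labels.
focused_user = normalized_entropy(["news", "news", "news", "news"])
mixed_user = normalized_entropy(["news", "news", "sports", "sports"])
diverse_user = normalized_entropy(["news", "sports", "travel", "food"])
```

Under these assumptions the three users come out at 0, 0.5 and 1 respectively, so a higher score corresponds to the more diverse, bar-bet style of searching described above.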

To understand the meaning of the entro-percent equation, read our full paper summarizing the findings of our logs-based study of search patterns on conventional mobile phones, iPhones and conventional computers and get all the juicy details.
 

ACM Multimedia 2009 Grand Challenges

9:19 am - May 12, 2009 in Google Research Blog


At Google Research we interact closely with the academic research community through programs like Research Awards and the Visiting Faculty Program, and by active participation in various conferences. Dealing with large quantities of data gives us some unique challenges and perspectives on various problems. In many cases entirely new problem classes begin to emerge. These problems often have not received attention from a broad part of the research community. In an effort to bridge this gap for multimedia problems, we participated in setting Grand Challenges for this year's ACM Multimedia Conference. We proposed "Robust, As-Accurate-As-Human Genre Classification for Video" as a challenge.

The majority of research in video analysis today focuses on surveillance video. While this is critical for a lot of security applications, it is incomplete in describing challenges that come up when we tackle a video retrieval and discovery application like YouTube. Analysis work beyond surveillance is often limited to specific categories like News and Sports that have well-defined structures that the solution methods can explicitly work with. Our challenge aims to encourage more work in the area of semantic understanding of a broad variety of videos. Genre classification is a problem that's representative of some of the challenges that stem from the sheer diversity that can exist across video categories. The challenge will encourage new methods to solve these problems, as well as attempts at standardizing datasets to represent this problem. With internet video gaining popularity at an astounding rate, we believe this challenge will steer the multimedia research community towards the challenges posed by the magnitude and variety of this new problem area.

We are grateful to Mor Naaman (Rutgers University) and Tat-Seng Chua (National University of Singapore) for organizing this industry challenge track at ACM Multimedia and inviting us to be a part of it.

Details of our challenge can be found here.
 

The best and the brightest

11:00 am - May 15, 2009 in Google Research Blog


[Also posted on the Official Google Blog]

I can't think of a better environment than academia for asking hard questions and trying to solve the unsolvable. It's at universities that graduate students perform some of the most exciting and game-changing research in computer science and technology. These university labs foster the students that are going to be the next innovators and leaders in research.

We started the Google Fellowship Program this year to support graduate students in their quest to discover and achieve great things. Our goal was to find the best and brightest PhD students and award them a unique fellowship that highlights their contributions to research and supports them through their graduate studies. Several top universities submitted their students for consideration by research scientists, distinguished engineers and executives at Google. The breadth of research covered by these students and the scope of their vision was astounding. Learning about them was exciting; choosing from among them was truly difficult.

After careful review, we are proud to announce the 2009 Google Fellowship recipients:
  • Roxana Geambasu, Google Fellowship in Cloud Computing (University of Washington)
  • Michael Piatek, Google Fellowship in Computer Networking (University of Washington)
  • David Sontag, Google Fellowship in Machine Learning (Massachusetts Institute of Technology)
  • Ali Farhadi, Google Fellowship in Computer Vision Image Interpretation (University of Illinois at Urbana-Champaign)
  • Nicholas Chen, Google Fellowship in Human-Computer Interaction (University of Maryland)
  • Siddhartha Sen, Google Fellowship in Fault Tolerant Computing (Princeton University)
  • Ryan Peterson, Google Fellowship in Distributed Systems (Cornell University)
  • Eric Gilbert, Google Fellowship in Social Computing (University of Illinois at Urbana-Champaign)
  • Micha Elsner, Google Fellowship in Natural Language Processing (Brown University)
  • Subhransu Maji, Google Fellowship in Computer Vision Object Recognition (University of California, Berkeley)
  • Nicolas Lambert, Google Fellowship in Market Algorithms (Stanford University)
  • Han Liu, Google Fellowship in Statistics (Carnegie Mellon University)
  • Lixia Liu, Google Fellowship in Compiler Technology (Purdue University)
These students exemplify excellence in all areas, and we look forward to the impact that they are sure to have on their fields and the world. The Google Fellowship will provide them with funding to cover their tuition and expenses, plus an Android-powered phone and a Google mentor. Our sincere congratulations to all of them!
 

Google Fellowships, the Nuts and Bolts

11:00 am - May 15, 2009 in Google Research Blog


As you may have read, today we announced the recipients of the 2009 Google Fellowships. (You can read the announcement over on the Official Google Blog.) This is fantastic news, and the blog post makes the Google Fellowship Program sound very polished. But the truth is there was a lot more work (and scrambling) done in the background...here's a quick snapshot.

We first conceived of the idea of the fellowships late last year. Google already funds academic research through the Google Research Awards, but we really wanted to support the graduate students who are doing a lot of the research and are the future of their respective fields. Idea: why don't we search out the best and brightest PhD students and pay their tuition and expenses, plus give them an Android phone and hook them up with a Google researcher so we can all share really cool ideas? Done and done.

After we made the decision to do the fellowships in 2009, we were in for some hard work. We quickly spread the word about the fellowships in order to give the universities and students time to prepare and send us information about themselves and their research. The nominated students were doing research on a vast array of subjects: Cloud Computing, Computer Graphics, Market Algorithms, Machine Learning, Natural Language Processing, Social Computing, Information Retrieval, Compilers, and Computer Vision to name a few. I relied upon a small army of research scientists and distinguished engineers to help me review them. In addition to lending their scientific expertise to looking over the Google Research Awards, not to mention their "day job", the forty-five Googlers also were able to provide feedback on the students in record time - these guys are champs. Then a whirlwind review with Alfred Spector, VP of Research and Special Initiatives at Google, and just six months later we are proud to announce the 2009 Google Fellowship recipients.

It was a jam-packed 6 months, and I'm really proud of how the program turned out this year. That said, I'm already looking forward to our sophomore year in 2010. You should expect to see a broader program covering more areas of research, more schools, and more geographies. I can't wait.
 

Remembering Rajeev Motwani

10:06 am - June 8, 2009 in Google Research Blog


Many hundreds of us at Google were fortunate to have been educated, advised, and inspired by Professor Rajeev Motwani. Six of us were his PhD students and very many others (including our founders) were advised by or took courses from him. Other Googlers, who were not students at Stanford, had close collegial relations. But, no matter what the relationship, we respected Rajeev as a great man. He was not just a mathematically deep computer scientist, not just an entrepreneurial computer scientist who catalyzed value at the intersection of his work and the real world, he was also a thoughtful, caring, and honorable friend.

The words of just a few of us speak louder than any summary I can make:

Sergey Brin wrote in his blog, “Officially, Rajeev was not my advisor, and yet he played just as big a role in my research, education, and professional development. In addition to being a brilliant computer scientist, Rajeev was a very kind and amicable person and his door was always open. No matter what was going on with my life or work, I could always stop by his office for an interesting conversation and a friendly smile.”

Zoltan Gyongyi wrote, “Not only a great educator and one of the brightest researchers of his generation, Rajeev was also a catalyst of Silicon Valley innovation--Google itself standing as a proof. Moreover, he was a mentor, colleague, role model, friend to many Googlers. I am utterly unable to find words that would properly express my personal gratitude to him and the weight of this loss.”

Mayur Datar wrote, “I was fortunate to have Rajeev as my PhD advisor for five years at Stanford. Beyond graduation, he often helped me with priceless career guidance and professional help in terms of meetings with other people in Silicon Valley. There are only a handful of people I can think of who are such high caliber academics and entrepreneurs. His contributions and impact on CS theory community, Stanford CS Dept, and Silicon Valley enterprises and entrepreneurs is unfathomable. I still find it hard to come to terms with this horrible reality. My deepest condolences and prayers go out to his family. He will be fondly remembered and dearly missed by all of us!”

An Zhu wrote, “I am both fortunate and honored to have Rajeev as my PhD advisor. The 5 years at Stanford is very memorable to me. I’m eternally grateful for his advice and support throughout. It is indeed a sad day for many, including his students.”

Alon Halevy wrote, “Rajeev was an inspiration to me and my colleagues on so many levels. As a young graduate student, I remember him working on some of the toughest theoretical computer science problems of the day. Later, his taste for good theory and ability to apply it to practice had a huge impact on various aspects of data management research. As a professor, and now as a Googler, I am awed at the amazing stream of high-caliber students that he mentored. As an entrepreneur, he gave me some generous and well-timed advice. And most of all, as a person, his kindness and willingness to help anyone was a true inspiration.”

Vibhu Mittal wrote, “He was a brilliant researcher and a great professor. And yet the only thing that I can remember right now is that he was a fun, generous, helpful guy who was always willing to sit down and chat for a few minutes. I hope wherever he is, he is still doing it. And I hope there’ll be more people like him in this world to help people like us. I wish his family well — words cannot express what I feel for them.”

Gagan Aggarwal wrote, “I feel extremely fortunate to have had Rajeev as my PhD advisor. He was a wonderful advisor--always very flexible and willing to let his students work at their own pace, while making sure that things are going alright and providing guidance when needed. One of the several striking features of Rajeev's research was his ability to translate real life problems into clean, well-motivated, abstract questions (that he would promptly pose to his students). He was for me an eternal source of fresh problems and great ideas, a source I could tap into whenever my own ideas dried up (and was planning to, just last week). It is impossible to come to terms with the fact that I am never going to do this again. Rajeev had an unmatched clarity of thought and perceptiveness that was evident not only in doing research with him but also in the invaluable advice he gave me about career choices and life in general. ...Rajeev took on many diverse roles: teacher, entrepreneur, advisor and friend, and filled them all as only he could have. His passing will leave an impossible-to-fill void among all those whose lives he touched.”

There are more notes from Googlers, among those of many others, on the Stanford blog commemorating Rajeev.

I’d like to close by noting that Rajeev Motwani’s work on the intersection of theory and practice inspired not only the way Google processes information, but also Google's core scientific values: we fundamentally believe in the power of applying mathematical analysis and algorithmic thinking to challenging real world problems. This philosophy was inherent in Rajeev’s research, the education he gave PhD students, and the advice and classes he provided to many more.

With his and the recent untimely deaths of other influential computer scientists and friends, we are all reminded to seize each day and make the most of it. I think Rajeev would have wanted us to keep this in mind.
 

Google Fusion Tables

6:00 pm - June 9, 2009 in Google Research Blog


Database systems are notorious for being hard to use. It is even more difficult to integrate data from multiple sources and collaborate on large data sets with people outside your organization. Without an easy way to offer all the collaborators access to the same server, data sets get copied, emailed and ftp'd--resulting in multiple versions that get out of sync very quickly.


Today we're introducing Google Fusion Tables on Labs, an experimental system for data management in the cloud. It draws on the expertise of folks within Google Research who have been studying collaboration, data integration, and user requirements from a variety of domains. Fusion Tables is not a traditional database system focusing on complicated SQL queries and transaction processing. Instead, the focus is on fusing data management and collaboration: merging multiple data sources, discussion of the data, querying, visualization, and Web publishing. We plan to iteratively add new features to the system as we get feedback from users.


In the version we're launching today, you can upload tabular data sets (right now, we're supporting up to 100 MB per data set, 250 MB of data per user) and share them with your collaborators or with the world. You can choose to share all of your data with your collaborators, or keep parts of it hidden. You can even share different portions of your data with different collaborators.


When you edit the data in place, your collaborators always get the latest version. The attribution feature means your data will get credit for its contribution to any data set built with it. And yes, you can export your data back out of the cloud as CSV files.


Want to understand your data better? You can filter and aggregate the data, and you can visualize it on Google Maps or with other visualizations from the Google Visualization API. In this example, an intensity map of the world shows countries that won more than 10 gold medals in the Summer Olympics. You can then embed these visualizations in other properties on the Web (e.g., blogs and discussion groups) by simply pasting some HTML code we provide you.
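
For readers who think in code, the filter and aggregate operations described above behave roughly like the following Python sketch. It does not use any Fusion Tables API; the helper names and the rows (with placeholder countries and invented medal counts) are hypothetical and exist only to show the two operations.

```python
from collections import defaultdict

# Placeholder rows standing in for an uploaded table; the values are invented.
rows = [
    {"country": "Country A", "year": 2004, "gold": 12},
    {"country": "Country A", "year": 2008, "gold": 15},
    {"country": "Country B", "year": 2008, "gold": 3},
    {"country": "Country C", "year": 2008, "gold": 11},
]

def filter_rows(table, predicate):
    """Keep only the rows for which predicate(row) is true."""
    return [row for row in table if predicate(row)]

def aggregate_sum(table, group_by, value):
    """Sum one column, grouped by the values of another column."""
    totals = defaultdict(int)
    for row in table:
        totals[row[group_by]] += row[value]
    return dict(totals)

# Filter: countries with more than 10 gold medals in the 2008 rows.
in_2008 = filter_rows(rows, lambda row: row["year"] == 2008)
more_than_10 = filter_rows(in_2008, lambda row: row["gold"] > 10)

# Aggregate: total gold medals per country across all rows.
totals = aggregate_sum(rows, "country", "gold")
```

A filtered result like `more_than_10` is the kind of table that would then feed a map or chart.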


The power of data is truly harnessed when you combine data from multiple sources. For example, consider combining data about access to fresh water in various countries with data about malaria rates in those countries, or as shown here, showing three sources of GDP data side by side. Fusion Tables enables you to fuse multiple sets of data when they are about the same entities. In database speak, this is a join on a primary key, except that the data originates from multiple independent sources. This is just the start; more join capabilities will come soon.
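
As a rough illustration of what fusing tables on a shared key means, here is a small Python sketch of an inner join across independent sources. It is not how Fusion Tables is implemented; the function name, the source names, and the figures are hypothetical. Each source is a list of rows, and only entities present in every source appear in the result, with each column prefixed by the source it came from.

```python
def join_on_key(key, sources):
    """Inner-join several tables (lists of dicts) on a shared key column.

    sources maps a source name to its table and must contain at least one
    table. Columns in the result are prefixed with the source name so that
    values from independent sources can sit side by side.
    """
    indexed = {
        name: {row[key]: row for row in table}
        for name, table in sources.items()
    }
    # Only entities that appear in every source survive an inner join.
    shared = set.intersection(*(set(index) for index in indexed.values()))
    joined = []
    for entity in sorted(shared):
        merged = {key: entity}
        for name, index in indexed.items():
            for column, value in index[entity].items():
                if column != key:
                    merged[f"{name}.{column}"] = value
        joined.append(merged)
    return joined

# Two hypothetical GDP sources with invented figures.
source_a = [{"country": "X", "gdp": 100}, {"country": "Y", "gdp": 200}]
source_b = [{"country": "X", "gdp": 110}, {"country": "Z", "gdp": 50}]

fused = join_on_key("country", {"source_a": source_a, "source_b": source_b})
```

Here only country X appears in both sources, so the fused table has a single row showing the two GDP values next to each other.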


But Fusion Tables doesn't require you and your collaborators to stop there. What if you don't agree on all of the values? Or need to understand the assumptions behind the data better? Fusion Tables enables you to discuss data at different granularity levels -- you can discuss individual rows or columns or even individual cells. If a collaborator with edit permission changes data during the discussion, viewers will see the change as part of the discussion trail.


We hope you find Fusion Tables useful. As usual with first releases, we realize there is much missing, and we look forward to hearing your feedback.

 
 
 
 
 
 