Search Logger
Archives for November, 2009.

Save on holiday shopping

3:19 pm - November 25, 2009 in Checkout: The Official Google Checkout Blog
With the holiday shopping rush underway, Google Checkout can help you find great deals this season. Over 900 Checkout stores including TigerDirect.com, BlueNile.com, and Petco.com are offering exclusive discounts of $5, $10, or $20 on what you buy through December 17th.

What's more, it's easy to find places to save as you shop quickly and securely with Checkout. You can either search for products on Google.com and look for the Google Checkout promotion badge, or browse participating stores on our new Checkout deals page. Good luck out there!

 

Let your subscriptions’ personality come through

1:13 pm - November 24, 2009 in Official Google Reader Blog
Posted by Mihai Parparita, Software Engineer

Favicons menu screenshot

We recently asked you for your ideas (and votes) on how to make Reader better. One of the more popular suggestions was adding favicon support for subscriptions, so today we're introducing just that (thanks to 20%-er Shreyas Desai).

We realize that not everyone wants their subscription list to turn into a multi-colored extravaganza, so we've made it into a setting that you can access from your subscriptions menu.

Be on the lookout for more ideas being implemented, and feel free to let us know how you like this feature on Twitter or in our help forum.

 

Google Product Search on the Official Google Blog

9:16 pm - November 23, 2009 in Google Merchant Blog
Earlier today, Jeff and Sameer from the Product Search team shared some holiday shopping updates on the Official Google Blog. Check out their post to see a recap of some of the work we've been doing over the past few months, including local store locations, video product reviews, and our new gallery view, among others.

Posted by Vivek Tata, Product Marketing Manager, Google Product Search
 

IE8 SmartScreen in action

4:32 pm - November 23, 2009 in IEBlog

Last week at PDC, as we were about to start talking to people about IE9, I saw the following notification from my Facebook account:

From: Facebook [mailto:notification+mwm5axbx@facebookmail.com]
Sent: Tuesday, November 17, 2009 10:05 AM

Dina posted something on your Wall and wrote:

"funny vid of u, you see it? http://www.facebook.com/l/ca339;hTTP://www.N70.InFO/2d"

To see your Wall or to write on Dina's Wall, follow the link below:

<..>

Thanks,

The Facebook Team

The message was from someone I know pretty well, and I believed the message. The address itself (http://www.n70.info/2d) wasn’t that suspicious; there are a lot of URL shortening services, and the .info domain has many legitimate sites on it. So I clicked it:

IE8 SmartScreen blocking page indicating that the requested URL is unsafe

and thought – whew. 

IE8’s SmartScreen now blocks malware sites over two million times a day. IE8 offers a lot of protection from real-world attacks: phishing protection, a cross-site scripting filter, and Protected Mode (I may run as an administrator, but my browser doesn’t). With attacks on the rise, using (or upgrading to) a browser with this much protection is more important than ever. IE8 also offers great reliability because of process-isolation, and offers users the ability to manage add-ons that affect performance and stability. InPrivate Browsing and InPrivate Filtering are also quite handy.

I wrote back to my friend, and she was surprised. You can read Facebook’s guidance about what to do if this happens to you or a friend.

Dean Hachamovitch

 

Blogger Status 2009-11-23 14:21:00

2:21 pm - November 23, 2009 in Blogger Status
Image uploads will be unavailable Monday (11/23) at 4:00PM PDT for about 30 minutes for maintenance.


Update: This is now complete. Thanks for your patience.
 

Explore Images with Google Image Swirl

1:12 pm - November 23, 2009 in Google Research Blog


Earlier this week, we announced the Labs launch of Google Image Swirl, an experimental search tool that organizes image-search results. We hope to take this opportunity to explain some of the research underlying this feature, and why it is an important area of focus for computer vision research at Google.

As the Web becomes more "visual," it is important for Google to go beyond traditional text and hyperlink analysis to unlock the information stored in the image pixels. If our search algorithms can understand the content of images and organize search results accordingly, we can provide users with a more engaging and useful image-search experience.

Google Image Swirl represents a concrete step towards reaching that goal. It looks at the pixel values of the top search results and organizes and presents them in visually distinctive groups. For example, for ambiguous queries such as "jaguar," Image Swirl separates the top search results into categories such as jaguar the animal and jaguar the brand of car. The top-level groups are further divided into collections of subgroups, allowing users to explore a broad set of visual concepts associated with the query, such as the front view of a Jaguar car or the Eiffel Tower at night or from a distance. This is a distinct departure from the way images are ranked by Google Similar Images, which excels at finding images very visually similar to the query image.


No matter how much work goes into engineering image and text features to represent the content of images, there will always be errors and inconsistencies. Sometimes two images share many visual or text features, but have little real-world connection. In other cases, objects that look similar to the human eye may appear drastically different to computer vision algorithms. Most difficult of all, the system has to work at Web Scale -- it must cover a large fraction of query traffic, and handle ambiguities and inconsistencies in the quality of information extracted from Web images.

In Google Image Swirl, we address this set of challenges by organizing all available information about an image set into a pairwise similarity graph, and applying novel graph-analysis algorithms to discover higher-order similarity and category information from this graph. Given the high dimensionality of image features and the noise in the data, it can be difficult to train a monolithic categorization engine that can generalize across all queries. In contrast, image similarities need only be defined for similar enough objects and trained with limited sets of data. Also, invariance to certain transformations or typical intra-class variation can be built into the perceptual similarity function. Different features or similarity functions may be selected, or learned, for different types of queries or image contents. Given a robust set of similarity functions, one can generate a graph (nodes are images and edges are similarity values) and apply graph analysis algorithms to infer similarities and categorical relationships that are not immediately obvious. In this work, we combined multiple sources of similarity such as those used in Google Similar Images, landmark recognition, Picasa's face recognition, anchor text similarity, and category-instance relationships between keywords similar to that in WordNet. It is a continuation of our prior effort [paper] to rank images based on visual similarity.
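As a rough illustration of the graph construction described above, the following Python sketch builds a pairwise similarity graph over a handful of images and then groups them. It is not the algorithm Image Swirl uses, whose details are not given here: the toy feature vectors, the cosine similarity function, the 0.9 threshold, and the use of connected components as the graph-analysis step are all assumptions chosen to keep the example small.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def build_similarity_graph(features, threshold):
    """Nodes are image names; an edge stores the similarity of its endpoints.

    Only pairs at or above the (assumed) threshold get an edge, so the
    similarity function only has to be reliable for similar-enough images.
    """
    graph = {name: {} for name in features}
    for a, b in combinations(features, 2):
        sim = cosine_similarity(features[a], features[b])
        if sim >= threshold:
            graph[a][b] = sim
            graph[b][a] = sim
    return graph


def connected_groups(graph):
    """Group images by connected component (a stand-in for real graph analysis)."""
    seen = set()
    groups = []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        group = []
        while stack:
            node = stack.pop()
            group.append(node)
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        groups.append(sorted(group))
    return groups


# Hypothetical three-dimensional features for results of the query "jaguar".
features = {
    "jaguar_animal_1": [0.9, 0.1, 0.0],
    "jaguar_animal_2": [0.8, 0.2, 0.1],
    "jaguar_car_1": [0.1, 0.9, 0.2],
    "jaguar_car_2": [0.0, 0.8, 0.3],
}

graph = build_similarity_graph(features, threshold=0.9)
groups = connected_groups(graph)
print(groups)
```

With these made-up vectors the two animal images end up in one group and the two car images in another. A real system would combine several similarity sources per edge and use a more discriminating analysis than connected components.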

As with any practical application of computer vision techniques, there are a number of ad hoc details which are critical to the success of the system but are scientifically less interesting. One important direction of our future work will be to generalize some of the heuristics present in the system to make them more robust, while at the same time making the algorithm easier to analyze and evaluate against existing state-of-the-art methods. We hope that this work will lead to further research in the area of content-based image organization and look forward to your feedback.
 

Google Base’s Search Page Retired

9:14 pm - November 18, 2009 in Google Merchant Blog
As we announced we'd be doing a few weeks ago, we've retired Google Base's separate search page today. We don't expect that this change will have much impact for data providers. However, if you have any questions about this or any other issue, check out the Google Merchant Center Forums to get help from the Base community and Google staff.

Update: We've corrected the publish date.

Posted by Robin Züger, Product Manager, Google Base
 

An Early Look At IE9 for Developers

12:23 pm - November 18, 2009 in IEBlog

We’re just about a month after the Windows 7 launch, and wanted to show an early look at some of the work underway on Internet Explorer 9. 

At the PDC today, in addition to demonstrating some of the progress on performance and interoperable standards, we showed how IE and Windows will make the power of PC hardware available to web developers in the browser. Specifically, we demonstrated hardware-accelerated rendering of all graphics and text in web pages, something that other browsers don’t do today. Web site developers will see performance gains and other benefits without having to re-write their sites.

Performance Progress. Browser performance involves many different sub-systems within the browser. Different sites – and different activities within the same site – place different loads and demands on the browser.

For example, two news sites might look similar to a user but have very different performance characteristics. Because of how the developers authored the sites, one site might spend most of its time in the Javascript engine and DOM, while the other site might spend most of its time in layout and rendering. A site that’s more of an “application” than a page (like web-based email, or the Office Web Apps) can exercise browser subsystems in completely different ways depending on the user’s actions.

The chart below shows how much time different sites spend in different subsystems of IE. For example, it shows that one major news site spends most of its time in the script engine and marshalling, while another spends most of its time in script and rendering, and the Excel Web App spends very little of its time running script at all.

chart of which IE subsystems different websites spend their time in.  The chart shows that each site has a very different allocation of which subsystems they spend time in.

Note that this chart shows the percentages of total time spent in each subsystem, not relative time between sites. It focuses on just the primary browsing sub-systems and doesn’t include “frame” functionality (like anti-phishing), or third-party software that’s running in the IE process (like toolbars, or controls like Flash). It also factors out networking since that’s dependent on the user’s network speed. Notice also that a site’s profile can change significantly across scenarios; for example, the Excel Web App profile for loading a file is quite different from the profile for selecting part of the sheet.
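To make the distinction between percentage of total time and relative time between sites concrete, here is a small Python sketch. The subsystem names and millisecond figures are hypothetical and are not taken from the chart; the point is only that two sites can have identical percentage profiles while one is ten times slower overall.

```python
def subsystem_shares(timings_ms):
    """Convert absolute per-subsystem timings into percentages of the site's total."""
    total = sum(timings_ms.values())
    return {name: 100.0 * t / total for name, t in timings_ms.items()}


# Hypothetical timings in milliseconds; site_a is ten times slower than site_b.
site_a = {"script": 600, "layout": 200, "rendering": 200}
site_b = {"script": 60, "layout": 20, "rendering": 20}

shares_a = subsystem_shares(site_a)
shares_b = subsystem_shares(site_b)

print(shares_a)
print(shares_b)
```

Both sites come out at 60% script, 20% layout, and 20% rendering, so a percentage chart alone says nothing about which site is faster.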

The script engine is just one of these browser subsystems. There are many benchmarks for script performance. One common test of script performance is from Apple’s Webkit team, the SunSpider test. The chart below shows the relative performance of different browsers on the same machine running the SunSpider test.

chart of IE, FF, Chrome and Safari performance of Sunspider test.  The IE9 results on sunspider are competitive with FF 3.6, Chrome4 and the nightly webkit build.

In addition to IE7 and the current “final release” versions of major browsers, we’ve included the latest pre-release “under development” builds of the major browsers. We’re just about a month after IE8 was released as part of the Windows 7 launch, and the version of IE under development is no longer an outlier. 

It is worth noting that once the differences are this small, the other subsystems that contribute to performance become much more important, and perceiving the differences may be difficult on real-world sites. That said, we remain committed to improving script performance.

We’re looking at the performance characteristics of all the browser sub-systems as real-world sites use them. Our goal is to deliver better performance across the board for real-world sites, not just benchmarks.

Standards Progress. Our focus is providing rich capabilities – the ones that most developers want to use – in an interoperable way.  Developers want more capabilities in the browser to build great apps and experiences; they want them to work in an interoperable way so they don’t have to re-write and re-test their sites again and again. The standards process offers a good means to that end.

As engineers, when we want to assess progress, we develop a test suite that exercises the breadth and depth of functionality. With IE8, we delivered a highly-interoperable implementation of CSS 2.1 and contributed over 7,200 tests to the W3C. Standards that do not include validation tests are much more difficult to implement consistently, and more difficult for site developers to rely on.

Some standards tests – like Acid3 – have become widely used as shorthand for standards compliance, even with some shortcomings. Acid3 tests about 100 aspects of different technologies (many still in the “working draft” stage of standardization), including many edge cases and error conditions. Here’s the latest build of IE9 running Acid3: 

screen shot of ACID3 test showing a score of 32.

As we improve support in IE for technologies that site developers use, the score will continue to go up. A more meaningful (from the point of view of web developers) example of standards support involves rounded corners. Here’s IE9 drawing rounded corners, along with the underlying mark-up:

screenshot of a box with rounded corners.  each corner is rounded differently.

Another example of standards support that matters to web developers is CSS3 selectors. Here’s a test page that some people in the web development community put together at css3.info; it’s a good illustration of a more thorough test, and one that shows some of the progress we’ve made since releasing IE8:

screenshot of css3.info test page showing many passing test cases.

Community testing efforts like this one can be helpful. Ultimately, we want to work with the community and W3C and other members of the working groups to define true validation test suites, like the one that we’re all working on together for CSS 2.1, for the standards that matter to developers. For example, this link tests one of the HTML5 storage APIs; some browsers (including IE8) support it today, while others don’t.

The work we do here, both in the product and on test suites, is a means to an end: a rich interoperable platform that developers can rely on. 

Bringing the power of PC hardware and Windows to web developers in the browser. The PC platform and ecosystem around Windows deliver amazing hardware innovation. The browser should be a place where the benefits of that hardware innovation shine through for web developers.

We’re changing IE to use the DirectX family of Windows APIs to enable many advances for web developers. The starting point is moving all graphics and text rendering from the CPU to the graphics card using Direct2D and DirectWrite. Graphics hardware acceleration means that rich, graphically intensive sites can render faster while using less CPU. (This interview includes screen captures of a few examples.) Now, web developers can take advantage of the hardware ecosystem’s advances in graphics while they continue to author sites with the same interoperable standards patterns they’re used to.

In addition to better performance, this technology shift also increases font quality and readability with sub-pixel positioning:

96 point Gabriola on a Lenovo X61 ThinkPad at 100% Zoom using GDI (note jaggies):

text "Direct2D" in 96pt Gabriola font using GDI rendering.  The rendering looks somewhat jagged.

96 point Gabriola on a Lenovo X61 ThinkPad at 100% Zoom: Direct2D (without jaggies):

text "Direct2D" in 96pt Gabriola font using Direct2D rendering.  The rendering looks much smoother than how it is rendered in GDI.

Last week, Channel 9 interviewed several of the engineers on the team. You can find videos of the interviews here:

Introduction, and Interoperable Standards

Early look at the Script Engine

Hardware accelerated graphics and text in the browser via Direct2D

While we’re still early in the product cycle, we wanted to be clear to developers about our approach and the progress so far. We’re applying the feedback from the IE8 product cycle, and we’re committed to delivering on another version of IE.

Thanks,
Dean Hachamovitch
General Manager, Internet Explorer

Update 11/23/09 - The IE9 demo from PDC is now available.  The IE content starts around minute 48.

 

The Next Frontier in Search: Questions & Answers

12:03 am - November 14, 2009 in The Ask.com Blog

A few months ago at SemTech 2009 we announced that our questions and answers database – launched almost a year ago – had grown to more than 300 million high-quality Q&A pairs. “High-quality” means that we use our semantic and extraction capabilities to recognize the best answer from within the sea of information on relevant pages. Instead of 10 blue links, we deliver the best answer right at the top of the page.

This week we’ve achieved another significant milestone by reaching 400 million Q&A pairs, and I want to acknowledge the outstanding work of our engineering and product teams who have built one of the largest and most useful Q&A collections on the web.

I also want to share what we’re seeing from our users in response to our Q&A offerings, and to preview what’s next for Ask.

Our Q&A strategy has started to pay off. We see increasing loyalty among users who conduct question searches on Ask. Simultaneously, we’ve seen a pronounced increase in the percentage of users on Ask who conduct queries in the form of a question – we now see 3x more questions on our site as a share of total queries than our competitors. And perhaps most rewarding for us is when we ask Internet users where they go for questions and answers online, they consistently rank Ask.com first, making us the #1 brand for questions and answers online.

Online search in the form of natural-language questions was the ingenious proposition of the original Ask Jeeves in 1996, and frankly, it’s the reason we’re still around today after so many other Internet brands didn’t survive. 

As the leader in questions for more than a decade, one thing is crystal clear: Asking a question isn’t the same as searching.

Our users tell us that their expectation when asking a question is different from their expectation when conducting a search. When asking a question, they have a specific need for a specific piece of information. When conducting a search, they’re browsing for information, sorting through results to unearth the answer they’re looking for. 

Put another way, when asking a question, you expect the work to be done for you (much like when you ask a librarian for a book at the library). When conducting a search, you do the work yourself (skipping the librarian, and heading to the card catalog instead).

Further, with the advent of the social web, asking questions online is now more natural, as we have the ability to broadcast a question to real people, our friends, instead of hoping a computer can understand our inquiry.

I firmly believe that questions are the future of search, but search technologies as we know them today can’t deliver against this future.

And this brings me to what’s next for Ask.

We’re focused on solving the two shortcomings of search as it relates to questions:

1. Traditional search signals don’t work well for answers to questions.
2. The answers to many questions are wrong or don’t exist online.

Let me explain what I mean.

When you’re in the business of answering questions, the volume of inbound links to a given web page – a long-accepted search technique for ranking web sites – doesn’t tell you the site with the best answer to a user’s question; it just tells you the most popular page with relevant information. Nor does another search technique, text matching, sufficiently identify the best answer, as the text in a question is rarely found in the best answer. The same is true of a newer, though established, technique (pioneered at Ask, actually) that uses click-through behavior to determine a site's relevance. Unlike presenting a text snippet that merely describes a site and a link, presenting the actual answer requires no click through to the destination site, so there is no click to measure.
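The text-matching point above can be illustrated with a minimal Python sketch. The scoring function here (the fraction of question words that also appear in a candidate text) is an assumption chosen for simplicity, not a description of how Ask or any other engine scores results, and the question and candidate strings are made up.

```python
import re


def tokens(text):
    """Lower-cased set of alphanumeric words in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def question_overlap(question, candidate):
    """Fraction of the question's words that also appear in the candidate."""
    q = tokens(question)
    return len(q & tokens(candidate)) / len(q)


# Hypothetical question and two hypothetical candidate texts.
question = "What time does the library close?"
restating_page = "What time does the library close? Find out what time the library closes here."
direct_answer = "9 pm on weekdays."

print(question_overlap(question, restating_page))
print(question_overlap(question, direct_answer))
```

The page that merely repeats the question shares every one of its words, while the text that actually answers it shares none, so ranking by word overlap alone would put the less useful result first.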

Below are some examples which bring this to life.
 
Pic1
Pic2

Without a wholly different approach, search engines will never be able to adequately answer all the questions that users increasingly have for them.

More importantly, no method that merely extracts answers from a published web page will ever be able to access the limitless number of answers that are unpublished on the Internet. Indeed, the information that is directly relevant to many questions most certainly exists; it's just that it’s locked in people’s heads or captured in unpublished conversations, and therefore inaccessible by traditional search. Obviously, this is not a trivial deficiency in a world that is increasingly interconnected and clamoring for perspective, guidance, and shared knowledge at an interpersonal level online.
 
At Ask.com, we’re dedicating ourselves to solving these problems and we're approaching the solution in two primary ways: 

1. Extracting and ranking existing answers
2. Indexing sources of answers that have not yet been published

To extract and rank existing answers, as opposed to merely ranking web pages that contain information, we have developed, and are continuing to develop, a unique set of algorithms and technologies that are based on new signals for relevance specifically tuned to questions and answers.

I’ve outlined a few of these below.

Pic3  Pic4  Pic5

Pic6 
 
Developing a new Q&A relevance algorithm that draws upon these signals is what we’re focused on building here at Ask, honing our ability to extract answers from the published Internet, and allowing us to fulfill a vastly larger volume of questions than can be done with existing search technologies.

But our work doesn’t end with extraction and ranking of existing, published answers. Where our vision really comes to life is in our efforts to index the sources of unpublished knowledge that can generate answers specifically in response to a question, in the moment it’s asked. This is the long tail of questions that are nearly impossible for search engines to answer, but which create incredible value for users when they are.

Here are some examples:

Pic7 

As we accelerate our strategy to answer the world’s questions, these “tough questions” are where we see huge opportunity, and where we are also focusing our efforts. And as you’ve probably guessed by now, we will do this unconventionally, harnessing the equity of the Ask brand, and our loyal, question-loving users to build a community of answerers available through Ask.

We’ve learned at Ask that while the existing Web can solve many problems, when you’re in the pursuit of answering questions, relying on published information sources can really only get you part of the way there. There is an infinite volume of answers in people’s heads that isn’t being indexed by the search engines today, and that can’t be successfully deployed against questions until you unleash it, in real-time, in response to the unique needs expressed by the person asking the question. 

This is the problem we’re in the process of solving here at Ask: Connecting our users’ questions to the best possible answers on the planet – be they published or unpublished. And as we solve this problem, we believe today’s multi-billion dollar questions and answers value proposition will one day transcend search as we know it today.

I’m very passionate about this, and so is our team at Ask.com. You’ll be hearing much more from us on this in the coming months.

Doug

Doug Leeds
President
Ask.com US

 
 
 
 
 
 