Microsoft open sources algorithm that gives Bing some of its smarts


Karlston

You can ask "How tall is the tower in Paris?" and it knows what you're talking about.

The Eiffel Tower.

Search engines today are more than just the dumb keyword matchers they used to be. You can ask a question—say, "How tall is the tower in Paris?"—and they'll tell you that the Eiffel Tower is 324 meters (1,063 feet) tall, about the same as an 81-story building. They can do this even though the question never actually names the tower.

 

How do they do this? As with everything else these days, they use machine learning. Machine-learning algorithms are used to build vectors—essentially, long lists of numbers—that in some sense represent their input data, whether it be text on a webpage, images, sound, or videos. Bing captures billions of these vectors for all the different kinds of media that it indexes. To search the vectors, Microsoft uses an algorithm it calls SPTAG ("Space Partition Tree and Graph"). An input query is converted into a vector, and SPTAG is used to quickly find "approximate nearest neighbors" (ANN), which is to say, vectors that are similar to the input.

 

This (with some amount of hand-waving) is how the Eiffel Tower question can be answered: a search for "How tall is the tower in Paris?" will be "near" pages talking about towers, Paris, and how tall things are. Such pages are almost surely going to be about the Eiffel Tower.
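To make the "nearness" idea concrete, here is a minimal sketch in Python. It is not SPTAG and not how Bing builds its vectors: the "vector" is just a word count, the search is an exact brute-force scan rather than an approximate one, and the page texts and the names `embed`, `cosine`, `nearest` and `PAGES` are all made up for illustration.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a learned vector: a bag of lowercase word counts."""
    for ch in "?.,":
        text = text.replace(ch, " ")
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest(query, pages):
    """Exact nearest neighbor: compare the query against every page."""
    q = embed(query)
    return max(pages, key=lambda title: cosine(q, embed(pages[title])))


# Hypothetical page texts, invented for this example.
PAGES = {
    "Eiffel Tower": "The Eiffel Tower in Paris is a tower 324 meters tall",
    "Tower of London": "The Tower of London is a castle on the River Thames",
    "Paris Hilton": "Paris Hilton is an American media personality",
}

best = nearest("How tall is the tower in Paris?", PAGES)
print(best)
```

The query never contains the word "Eiffel", yet the Eiffel Tower page scores highest because it shares the most words with the question. A real system replaces the word counts with learned vectors and the full scan with an index, which is the job SPTAG does.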

 

Microsoft today released the SPTAG algorithm as MIT-licensed open source on GitHub. This code is proven and production-grade, used to answer questions in Bing. Developers can use this algorithm to search their own sets of vectors and do so quickly: a single machine can handle 250 million vectors and answer 1,000 queries per second. There are some samples and explanations in Microsoft's AI Lab, and Azure will have a service using the same algorithms.
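A rough back-of-the-envelope calculation shows why those figures call for an approximate index rather than comparing the query against every vector. The vector count and query rate come from the article; the dimension of 100 is purely an assumption for illustration, since the article does not give one.

```python
VECTORS = 250_000_000   # from the article
QUERIES_PER_SECOND = 1_000  # from the article
ASSUMED_DIMS = 100      # hypothetical: the article gives no vector dimension

# A brute-force scan compares every query against every stored vector.
comparisons_per_second = VECTORS * QUERIES_PER_SECOND

# Each comparison costs roughly one multiply-add per dimension.
multiply_adds_per_second = comparisons_per_second * ASSUMED_DIMS

print(f"{comparisons_per_second:,} vector comparisons per second")
print(f"{multiply_adds_per_second:,} multiply-adds per second")
```

Under that assumption a full scan would need on the order of 25 trillion multiply-adds every second on one machine, so the index has to skip most vectors for each query, accepting "approximate" nearest neighbors in exchange.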

 

Microsoft CEO Satya Nadella has spoken on a number of occasions of his desire to "Democratize AI" and make it available to everyone, creating not just a centralized, specialized tool that demands considerable expertise but something that a wide range of developers, solving a wide range of problems, can use as part of their toolkit. The release of SPTAG is an example of how Microsoft is putting those words into practice; the combination of an Azure service and open source means that developers can start with the more constrained, easy-to-use service, and as their expertise or requirements grow more complex, they can use SPTAG to build their own services.

 

Source: Microsoft open sources algorithm that gives Bing some of its smarts (Ars Technica)


 

3 hours ago, Karlston said:

Search engines today are more than just the dumb keyword matchers they used to be. You can ask a question—say, "How tall is the tower in Paris?"—and they'll tell you that the Eiffel Tower is 324 meters (1,063 feet) tall, about the same as an 81-story building.

 

You have to ask the right question. If you ask "HOW TALL IS PARIS", you get 1.73 m and a photo of Paris Hilton!

 

If you only ask "How high is the Tower", Google answers 300 m. Maybe hundreds of people asked the same thing today after reading this article, so Google presumes you are not asking about the Tower of London, the Sears Tower, the Trump Tower, or the Burj Khalifa, but, among all the towers in the world, about the Eiffel Tower!

 

Now, if you search for "How long is the channel" in quotes, you get about 170,000 results (but only 2 pages are valid). The first one is:

 

How long is the Channel Tunnel? The Channel Tunnel is 31.5 miles long or 50.45 km. That's the equivalent of 169 Eiffel Towers stacked on top of each other. 23.5 miles (37.9 km) of the Channel Tunnel is under the English Channel, making it the world's longest undersea tunnel.

 

No mention of the Panama Canal, the Suez Canal or the English Channel... and after some inconsequential entries, the last item is "How long is the channel that runs through the Grand Canyon".

 

So, draw your own conclusion: how smart is Google at answering incomplete questions?

 



Archived

This topic is now archived and is closed to further replies.
