There’s been increased buzz lately about the semantic web and what it all means. Alex Iskold, CEO of AdaptiveBlue, has a great piece on SemanticWeb.com titled "The Semantic Curmudgeon".
AdaptiveBlue makes an interesting browser plugin that understands context and applies that understanding to generate useful shortcuts on pages, links, and text. So if you’re browsing music on the web, their plugin "understands" that and suggests useful contextually related links. Visit the site and take a tour — it will explain it much better than I just did (sorry Alex!).
Part of what we do here in MonkeyVille is semantic-technology related, in that some of our code understands words and attempts to infer context. But by no means are we a semantic web application (and no, I will not get drawn into the idiotic "I’m a web 3.0 app" discussions that are bouncing around the blogosphere).
However, our choice not to be a fully semantic app was deliberate, and Alex’s article hits the proverbial nail on the noggin as to why:
1. It lacks memory and is not iterative in nature.
2. Its ultimate goal is to deliver perfect answers, which are unattainable.
3. It is technologically impractical to achieve.
Back in the day, I was involved with a company that did a lot of research into symbol recognition for engineering drawings. The idea is just like OCR — scan a page of text and get real words — but for engineering symbols. It is a tough problem to solve, arguably worse than handwriting recognition because symbols can be anywhere in a drawing and can be drawn on top of other lines and features.
We had some clever engineers who spent a lot of time trying to solve the problem. Using AI, fuzzy this and neural that, they boosted recognition rates from ~65% to (I think) 80+%. We were proud, and very condescending toward our competitor with their stubby and sad 65%. But the competition were smart, as well as clever. They responded not by developing even better technology, but by creating a better workflow. They redefined the real problem: customers wanted to quickly convert hand-drawn squiggles to symbols within a CAD system. Customers really didn’t care how they got the end result. So the competitor took their oh-so-sad 65% algorithm, used it to identify everything in the drawing that might possibly be a symbol, and developed some very quick tools to tab around the drawing and manually replace all the squiggles with symbols.
Using their system, you could convert an entire drawing in ~30 minutes. Using ours, initial processing took only 10 or 15 minutes, but the cleanup (finding and fixing the 20% that wasn’t recognized correctly) took an hour or more. So their dopey, oh-so-stupid technology kicked our asses by 2x or more every time.
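To make the arithmetic concrete, here’s a back-of-the-envelope sketch of the two workflows. The recognition rates and the rough totals come from the story above; the drawing size and per-fix times are hypothetical numbers chosen only to illustrate why cleanup cost, not recognition rate, dominates:

```python
def total_minutes(auto_minutes, symbols, miss_rate, seconds_per_fix):
    """Automatic pass plus manual cleanup of the symbols it missed."""
    cleanup = symbols * miss_rate * seconds_per_fix / 60
    return auto_minutes + cleanup

# Hypothetical drawing with 200 symbols.
symbols = 200

# Our system: ~80% recognition, but slow hunt-and-fix cleanup
# (you had to find each missed squiggle yourself).
ours = total_minutes(auto_minutes=12, symbols=symbols,
                     miss_rate=0.20, seconds_per_fix=120)

# Competitor: ~65% recognition, but fast tab-and-replace cleanup
# (their tools jumped you straight to each candidate).
theirs = total_minutes(auto_minutes=5, symbols=symbols,
                       miss_rate=0.35, seconds_per_fix=20)

print(f"ours:   {ours:.0f} min")    # cleanup dominates: over an hour total
print(f"theirs: {theirs:.0f} min")  # roughly the ~30 minutes in the story
```

Under these assumed numbers the "worse" 65% algorithm wins by about 3x, because the per-symbol fix time, not the recognition rate, drives the total.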
Doh!
Smart almost always beats clever.
And the situation with the purely semantic web is almost identical. The idea that code will ever be able to identify context and meaning with 100% certainty for every individual is absurd. It might hit 85% or more for most people. But it will never, ever, hit perfect accuracy.
The second issue I see with "pure" semantic web plays is that they expect authors of web pages to add additional markup that conveys contextual meaning. Given how long it has taken for CSS to be adopted — something obviously useful at the individual level — I question how readily authors will adapt to adding contextual markup. Not to mention that there is an awful lot of data already out there. IDC reckon that “The digital universe in 2006 could be likened to 12 stacks of books extending from the Earth to the sun. By 2010 the stack of books could reach from the sun to Pluto and back…”.
So while semantic tools absolutely have their use, they are just another component of an overall solution, not a universal magic bullet.
More on what we’re doing on Wednesday!