Shirky and Ontology vs. Folksonomies
January 27, 2006
Fields: semweb, Semantic Web, Ontology
An interesting article here, with some comments here. It's obviously linked to the last post, and it's something that is probably bubbling up. (Incidentally, it’s an area that seems to have been rather ignored by the ‘academic’ community: when I checked, CiteSeer was down, and arXiv (cs) only had one paper on folksonomies, and that was a network analysis.) I actually think there are some far more incisive comments about the problems with SemWeb in a different Shirky article, so I’ll try to comment on both. This is just a collection of comments, rather than a point-by-point reply.
He’s fundamentally right about the weakness in deductive logic; the world just isn’t neat enough to allow us to hope for closure on the world. On the other hand, that doesn’t mean that we should just throw ontology out of the window; we just need to be careful about our uses & claims, and more than that, need to be careful about our definitions.
1: Gruber’s (famous) definition of Ontology is: An ontology is a specification of a conceptualization. I still don’t know what this means. I think it means that an ontology is a MODEL of the world. I tend to chop its functions up into terminological (shared words), taxonomic (Is-A in a tree) and ontological (complex definitions of classes). I don’t know whether it works for others, but it works for me, and I think it’s got an important bit to it: The reason we go to so much trouble in defining classes is that it allows us to say: If
2: (many) things aren’t definitely one thing or another; this is also very true. I’m not talking here about different uses of ontology, which need different views; instead, the fundamental failure of ontologies to reflect real life. And he’s right. It’s just that it doesn’t matter. What matters is not whether it’s right, but whether it’s right enough. In 1993, some people from MIT pointed out some uncomfortable truths about Knowledge Representation (of which Ontology is part). Their first, and most powerful point, is that all KR is a surrogate, and an imperfect surrogate, so it will, always, produce some incorrect inferences. What matters is whether it also produces enough correct ones.
3: The failure of classification systems, which brings us on to more interesting ideas. If you start by saying that there are books about Russia, of course you run into problems. That’s because books don’t ‘have’ a country. If instead you start by asking ‘what can we say about a book’ (Title, Main subject, Author, Colour) then you get somewhere. What you then need to do is to define ‘Books about Russia’ as being those books that have a main subject Russia (I’m trying to resist the urge to break into RDF here), and run your reasoner. It will tell you which books are ‘about Russia’. Of course, if you want a green sociology book, then you can do that too. The point is that defined classes are essentially queries, and poly-hierarchical ontologies let each book be the answer to multiple queries. As long as you can define your class/query, you can find the book. Of course, it’s not always perfect, but see (2) above.
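To make the ‘defined classes are essentially queries’ point concrete, here is a minimal sketch in plain Python rather than RDF/OWL. It is not a real reasoner: the Book fields follow the properties listed above, while the class names, the sample books and the classify function are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Book:
    # The properties from the post: things we can actually say about a book.
    title: str
    main_subject: str
    author: str
    colour: str


# Each defined class is a named query (a predicate) over those properties.
# The names are hypothetical; a real ontology would state these in OWL.
DEFINED_CLASSES = {
    "BooksAboutRussia": lambda b: b.main_subject == "Russia",
    "GreenBooks": lambda b: b.colour == "green",
    "GreenSociologyBooks": lambda b: b.colour == "green"
    and b.main_subject == "Sociology",
}


def classify(book):
    """Return, sorted, the name of every defined class the book satisfies."""
    return sorted(name for name, test in DEFINED_CLASSES.items() if test(book))


# Invented sample data.
russia_book = Book("A History of Russia", "Russia", "A. Author", "green")
sociology_book = Book("Introducing Sociology", "Sociology", "B. Author", "green")

print(classify(russia_book))
print(classify(sociology_book))
```

The same book lands in more than one class without anyone having to pick a single shelf for it, which is the poly-hierarchy point in miniature.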
Now, where this gets interesting is: what base terms should we use to build our queries? In the past, we’ve tried big top-down approaches; where it might get interesting, and where I think folksonomies get good, is if you use the folk-tags as your base terms, and build queries/classes out of those. Of course, you might need a bit more (like pulling the country space out of the URL), and language encoding would be a cherry, but it might be nice. Very nice, in fact.
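As a rough sketch of what building classes out of folk-tags might look like, assume each bookmarked URL simply carries a set of free-form tags. The data, the defined_class helper and the country_hint function below are hypothetical, and the country hint is deliberately crude: it only looks at a two-letter final label in the hostname.

```python
from urllib.parse import urlparse

# Invented tag data: URL -> the set of tags people applied to it.
bookmarks = {
    "http://example.org/moscow-guide": {"russia", "travel"},
    "http://example.co.uk/tolstoy": {"russia", "literature", "books"},
    "http://example.org/paris": {"france", "travel"},
}


def defined_class(all_of=(), none_of=()):
    """Build a class definition (a query) out of folk-tags as base terms."""
    required, excluded = set(all_of), set(none_of)

    def test(tags):
        # Member if every required tag is present and no excluded tag is.
        return required <= tags and not (excluded & tags)

    return test


def members(test, tagged):
    """Return, sorted, the URLs whose tag sets satisfy the class definition."""
    return sorted(url for url, tags in tagged.items() if test(tags))


def country_hint(url):
    """Return a two-letter final hostname label, if any (a crude assumption)."""
    host = urlparse(url).hostname or ""
    suffix = host.rsplit(".", 1)[-1]
    return suffix if len(suffix) == 2 else None


russian_travel = defined_class(all_of={"russia", "travel"})
russia_not_travel = defined_class(all_of={"russia"}, none_of={"travel"})

print(members(russian_travel, bookmarks))
print(members(russia_not_travel, bookmarks))
print(country_hint("http://example.co.uk/tolstoy"))
```

Nobody had to agree on a top-level hierarchy first; the classes are just combinations of whatever tags people already use.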
4: N% belonging: The final thing is to deal with the fact that we can’t be sure that things are always something. His suggestion is to use n% membership, which is ok (although I have no real idea of the semantics of this: you might be able to jury-rig a frequentist approach, but it would be very variable in its response, and I don’t know what a subjective approach would mean here. Anyway, I digress…), and there has been some work on Bayesian Ontologies (somewhere, sorry). The other approach is to say that things either are, or are not, in a category, but we don’t know which. What we might be able to do is to come up with some reasons - some arguments - for believing one or the other. Of course, which you trust is up to you (and I’m not sure there’s a ‘right’ answer). Then again, seeing as my Thesis is on hooking up ontologies and arguments, I would say this.
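For what it’s worth, here is one way the jury-rigged frequentist reading could be sketched, assuming that ‘n% membership’ means the fraction of taggers who applied a given tag to an item. That reading is my assumption, not necessarily what Shirky means, and the tagging data and the membership function are invented for illustration.

```python
from fractions import Fraction


def membership(taggings, category):
    """Fraction of taggers whose tag set for one item includes the category."""
    if not taggings:
        raise ValueError("need at least one tagger to estimate membership")
    hits = sum(1 for tags in taggings if category in tags)
    return Fraction(hits, len(taggings))


# Invented data: the tags five different people gave the same book.
taggings = [
    {"russia", "history"},
    {"russia"},
    {"history"},
    {"russia", "politics"},
    {"travel"},
]

print(membership(taggings, "russia"))

# One more tagger who skips the tag moves the estimate quite a lot,
# which is the variability worry with small numbers of taggers.
print(membership(taggings + [{"history"}], "russia"))
```

With only a handful of taggers the number jumps around with every new vote, so I wouldn’t read too much into any single percentage.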
Any comments on this insanely long post gratefully received…

Random thoughts on the n% belonging idea: it seems to me that there are 2 different ways in which something can fail to always fit into a certain category, and to have a percentage associated with it. On the one hand, you can have something that occasionally matches category A but then occasionally matches category B, with a complete transition between the two categories. In that case, you could estimate the amount of time in which the item was category A versus category B and say that something is 60% A and 40% B. More problematic, though, would be things that share a percentage of characteristics with category A and a percentage with B. Then, if you say that something is 60% A and 40% B, you mean something different: you mean that it has a combination of characteristics from both categories. Both approaches might be useful, but troublesome to work with.
Also, I think even Shirky would agree with you that “throwing ontology out the window” is not a necessary step. In the article on classification, he points out that certain domains (with expert users, formal categories, established boundaries, etc) still lend themselves well to the ontological approach. From what I can gather from your blog here, it sounds like the project you are working on tends more towards that kind of domain than to the sprawling chaos of web communities.
Comment by Karen — February 1, 2006 @ 9:40 pm
Apologies if this comment ends up doubleposted. I tried once and it didn’t seem to take…
Comment by Karen — February 1, 2006 @ 9:43 pm