Collaborative tagging: how networking sites connect people by interests and goals
13-Oct-05
The organic growth of the web is bringing the issue of information categorization and retrieval to the fore as never before. Publishing information is no longer an expensive activity, from either a technical or a financial point of view, and a large number of valuable independent editors contribute their daily effort to global knowledge by means of the new cultural weapons of weblogs and wikis. Internet users are constantly looking for new and adaptive forms of categorization to help them structure the overwhelming flow of information that reaches their desktops every day.
Searching vs Browsing
Structuring web content in order to ease information retrieval has been a tough issue for the web community since the very beginning. Librarians brought their professional experience to bear by adopting a traditional categorization approach for classifying web content. This approach, created for the optimal retrieval of physical objects such as books on library shelves, proved difficult to adapt to the non-physical world of digital information. Categorization systems like the Dewey Decimal System were meant to superimpose a fixed hierarchical structure designed to optimize seek time, and were therefore based on the strict “be in one place” requirement.
When the Yahoo Directory project started over ten years ago, it was one of the first attempts to bring some order to the already vast production of content on the web. The Yahoo list soon became a hierarchy of categories and, more or less in the way a standard file system behaves, it forced a strict classification to define the right category in which each website should be put. The top-down approach was chosen on the strength of long-term cataloguing experience, which was claimed to be the best recipe for coping with that giant effort.
But the web is made of links, and this potentially infinite collection of links brings to our desktops every day a network of cross-referenced content that is impossible to constrain within a weak model based on predefined classifications. There is in fact no shelf on the web and no giant file system whose rules must be obeyed; nothing but a network of links is required to keep it alive and self-organized. Google’s success was also based on this argument: the only categorization needed is what a user puts into the search box. The shift from browsing, where the people making the ontology also have the responsibility to structure the world in advance, to searching, where query results are based only on the web of links and on the actual content, was immediate.
Directories like DMOZ (the Open Directory Project) keep struggling against the enormous power of the scalable architecture of the Googleplex, which today is capable of returning results for more than 300 queries a second. When offered side by side, search and categorization were rarely used with comparable intensity, search being by far the preferred approach. Ontological classification and browsing remained the primary approach to information finding in contexts where faceted classifications were successfully adopted. Thematic directories like epicurios.com or wine.com took advantage of labeling to allow for searches that consider several different features (or facets) of an item at the same time. This approach has proved particularly effective for organizing a relatively small corpus of information, made of a stable and restricted collection of entities grouped into formal categories with clear edges.
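As a rough illustration of the faceted approach, the following Python sketch filters a small catalogue on several facets at once. The wine records and the facet names (colour, region, price_band) are invented for the example and say nothing about how the directories named above actually store their data.

```python
# Hypothetical catalogue: every item carries one value for each facet.
WINES = [
    {"name": "Barolo Riserva", "colour": "red", "region": "Piedmont", "price_band": "high"},
    {"name": "Chianti Classico", "colour": "red", "region": "Tuscany", "price_band": "medium"},
    {"name": "Vernaccia", "colour": "white", "region": "Tuscany", "price_band": "low"},
    {"name": "Brunello", "colour": "red", "region": "Tuscany", "price_band": "high"},
]


def faceted_search(items, **facets):
    """Return the items whose facet values match every requested facet."""
    return [
        item for item in items
        if all(item.get(facet) == value for facet, value in facets.items())
    ]


def facet_counts(items, facet):
    """Count how many items fall under each value of one facet."""
    counts = {}
    for item in items:
        counts[item[facet]] = counts.get(item[facet], 0) + 1
    return counts


if __name__ == "__main__":
    # Combine two facets in a single query.
    for wine in faceted_search(WINES, colour="red", region="Tuscany"):
        print(wine["name"])
    print(facet_counts(WINES, "region"))
```

The sketch only works because every item has a known value for every facet, which is exactly the "stable and restricted collection of entities" condition mentioned above.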
Bookmarking and remote storage
But this is not what the Internet was meant for. Born as a project to connect research centres, as soon as it reached a wider distribution the web hosted an unregulated, dynamic flow of information among peers whose diversity exceeded any expectation. In this context no authoritative source of judgment, a feature necessary to design any top-down categorization model, is conceivable. Information exchanges on the web are instead driven by uncoordinated users, involved in growing a continuously evolving corpus of unstable entities. Their primary need is to remember the items they come across while surfing the web. The added value of personal elaboration on the actual information content of each web item is as important as the seeking process itself, and should be supported by a technology that can seamlessly follow each individual mental model.
In a world of links, web browsers offered the bookmarks functionality to allow for a personal archive of references. Nevertheless, local storage was soon perceived as a limitation, especially by those who surfed the web from both home and office computers. Bookmarklets helped keep the bookmarking habit alive while allowing for remote storage of personal collections of links. These little JavaScript routines, running within the browser at the speed of a mouse click, also made it possible to comment on specific links while storing them in a remote personal repository like the one offered by the SaveThis service. It was the first step towards the sharing of an enhanced surfing experience.
Then the architectures of participation came into view. Starting with very popular websites like the online auctioneer eBay or the techies’ favorite SlashDot, a collaborative approach to information collection began to emerge as an effective way to demote untrusted content and promote popular and socially certified web publications.
This self-regulating approach has been successfully tried out in content-harvesting processes like Wikipedia, in which a bottom-up aggregation of voluntary contributions has made a user-generated encyclopedia one of the most brilliant successes of collaboration on the web.
Social classification vs. Formal classification
The social aspects of these architectures became particularly effective with the advent of the collaborative bookmarking website del.icio.us. As its creator Joshua Schachter has defined it, del.icio.us is a “social bookmarks manager”, a remote repository of personal bookmarks in which, thanks to a wisely formatted URL in the form of http://del.icio.us/username, each individual collection of links is accessible to any visitor. But what made it unique and so popular was the possibility offered to users to add a set of labels to each bookmark, thus forming an incremental personal classification of the links stored remotely. These labels, or tags as the current widespread wording seems to prefer, form a self-organised classification schema that encompasses all the items bookmarked in del.icio.us. It was the birth of “folksonomy”, a play on the word “taxonomy” that the information architect Thomas Vander Wal coined in a SIGIA discussion thread.
In del.icio.us, as in any other tag-based repository, each user can browse the link collection through a special interface, known as a tag cloud, which shows each tag at a font size that depends on its frequency: the larger the font, the higher the tag usage frequency. Additionally, users can browse all the stored bookmarks marked with a specific tag or group of tags and can be notified, through specific RSS feeds built on the fly, each time a new bookmark is added for a specific tag or group of tags, or by a specific user.
The tagging habit has now spread virally throughout the web. Del.icio.us for bookmarks; Flickr for photography; 43Things for goal-setting; 43Places for travel; AllConsuming for books, albums and movies consumed; LibraryThing for personal libraries; CiteULike for academic papers; last.fm for music; and 43People for social networking: the websites offering tagging functionalities now number in the tens.
Not to mention the giant Google, which allows for labeling in its GMail web application and in its newborn Reader, an efficient RSS aggregator with a brilliant user interface that sends us back to the BBS era with an amazingly simple set of keyboard shortcuts.
Collaborative tagging thus promotes itself as an alternative way to categorize web content. Instead of the hierarchical and exclusive approach of taxonomies, it suggests a flat and inclusive approach through which it is possible to select, as in a Venn diagram, content categorized with a number of tags at the same time, thus overcoming the limitations of the single classification forced by taxonomies.
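The tag-based browsing described in this section can be sketched in a few lines of Python. The bookmarks, the tag names and the min_pt/max_pt font sizes are made up for illustration. This is not how del.icio.us is implemented, only one simple way to model an index from tags to bookmarks, a Venn-style intersection query, and a tag cloud whose font size grows linearly with tag frequency.

```python
from collections import defaultdict

# Hypothetical bookmarks: URL -> set of tags chosen by the user.
BOOKMARKS = {
    "http://example.org/folksonomy": {"tagging", "folksonomy", "web"},
    "http://example.org/rss-howto": {"rss", "web", "howto"},
    "http://example.org/tag-clouds": {"tagging", "web", "design"},
    "http://example.org/dewey": {"taxonomy", "library"},
}


def build_index(bookmarks):
    """Build an inverted index mapping each tag to the URLs carrying it."""
    index = defaultdict(set)
    for url, tags in bookmarks.items():
        for tag in tags:
            index[tag].add(url)
    return index


def with_all_tags(index, *tags):
    """Return the URLs marked with every given tag (set intersection)."""
    if not tags:
        return set()
    return set.intersection(*(index.get(tag, set()) for tag in tags))


def tag_cloud(index, min_pt=10, max_pt=30):
    """Map each tag to a font size that grows linearly with its frequency."""
    counts = {tag: len(urls) for tag, urls in index.items()}
    if not counts:
        return {}
    low, high = min(counts.values()), max(counts.values())
    span = (high - low) or 1  # avoid division by zero when all counts are equal
    return {
        tag: round(min_pt + (count - low) * (max_pt - min_pt) / span)
        for tag, count in counts.items()
    }


if __name__ == "__main__":
    index = build_index(BOOKMARKS)
    print(sorted(with_all_tags(index, "tagging", "web")))
    print(tag_cloud(index))
```

Unlike a folder hierarchy, the same bookmark appears under every tag it carries, so no single "right place" has to be chosen when it is stored.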
The cognitive process behind tagging
From a semantic point of view, the problems affecting taxonomies and tagging systems are pretty much the same. They deal with the process of creating semantic relations between words and often depend on the context in which a concept is framed. Polysemy, the possibility for a word to have many related senses, and homonymy, the property of words having multiple unrelated senses, are just two examples. A tag-based search can help in the case of homonyms, because including related terms in the search can eliminate unwanted homonyms from the results. In the case of synonymy, namely the possibility of referring to the same meaning with multiple words, collective tagging instead tends to amplify the problem, since the process of tagging is unregulated by definition and so open to inconsistencies. But the requirement of a “controlled vocabulary”, a basic ingredient of all taxonomic organizations, is to some extent fulfilled by the social, self-organised nature of collaborative tagging.
As a matter of fact, what differentiates the cognitive process behind tagging from the one behind categorization is the possibility for an individual to avoid choosing the best place in which to put a certain item. After identifying content that is worth remembering, and after forming a list of the concepts that the content itself suggests to our minds, tagging offers the possibility of simply writing those concepts down in the way we perceive to be optimal for future retrieval, according to our own mindset. This is much simpler and more straightforward than selecting, amongst the activated concepts, the one that best represents the item in order to find the best category for it. The relatively low cognitive cost of tagging is what makes the collaborative categorization effort a reality. The structure emerges from a bottom-up aggregation of widespread concepts, so that the shared categorization comes out of the multitude of different tags used to mark an item.
The del.icio.us bookmark posting interface, for example, tends to reinforce the power law that governs the tagging process. While tags are being added, a list of suggestions is available, consisting of commonly used tags for that item together with commonly related tags. These suggestions make the whole tagging process even simpler, helping the tag set for each item, and thus its classification, converge toward a stable pattern. A document, or more generally any knowledge item, thus becomes the sum of its tags.
An interesting aspect of tagging is the intended purpose of each classification. Tags can address the identification of each item (what it is, what it is about, who owns it) and the elaboration the user has performed on it (qualities and features, actions to perform, related projects or tasks). Tags can in fact be used with a ‘selfish’ discipline, with personal retrieval primarily in mind, or with an ‘altruistic’ approach that takes other users’ retrieval process into account. While the former simply denotes the application of a new approach to an old problem, the latter reveals an attempt to take advantage of the collective effort to enhance the knowledge acquisition experience. Collaborative tagging then becomes a way to receive filtered recommendations on topics of interest with the possibility, as in the case of del.icio.us bookmarking, of selecting only trusted recommenders.
Language is also a driver for community-specific classifications. English tags are used for general accessibility, and the English language naturally becomes the source of the emergent controlled vocabulary. At the same time, context-specific jargon, such as the one to which most of the technology-related tags stored in del.icio.us belong, helps select the audience of a knowledge base by means of groups of tags related to a specific application area.
The signal loss typical of compressing many diverse concepts into one single category is reduced in tagging systems when we consider the aggregated, emergent tag group that represents each item. Instead of being a problem, the multiplicity of points of view that can characterize the way a resource is tagged simply widens the spectrum of classification and eventually consolidates into the minimum common ground that takes into account all the most accepted visions. The advantage of this approach is perhaps also its main drawback: like any other socially driven phenomenon, it follows the dynamics of imitation. The popularity of a resource is strictly related to the visibility of its recommender, and so is its categorization. During the tagging process people tend to accept the suggestions provided, so when a tag set is propagated thanks to the popularity of the user who originally created it, each of the chosen tags is promoted at the same speed. But, all in all, this is the way buzzwords become popular, and tagging systems are no different.
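The suggestion mechanism described in this section, commonly used tags for the item together with commonly related tags, can be approximated as follows. The posting history, the function names and the ranking rule (count first, then alphabetical order) are assumptions made for this sketch; the actual del.icio.us logic is not described in this article.

```python
from collections import Counter

# Hypothetical posting history: (user, url, tags) triples.
POSTS = [
    ("anna", "http://example.org/folksonomy", {"tagging", "folksonomy"}),
    ("bruno", "http://example.org/folksonomy", {"tagging", "web2.0"}),
    ("carla", "http://example.org/folksonomy", {"tagging", "folksonomy", "ia"}),
    ("anna", "http://example.org/tag-clouds", {"tagging", "design"}),
    ("bruno", "http://example.org/taxonomy", {"taxonomy", "ia"}),
]


def rank(counts, limit):
    """Order tags by descending count, breaking ties alphabetically."""
    ordered = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
    return [tag for tag, _ in ordered[:limit]]


def popular_tags(posts, url, limit=3):
    """Tags most often applied to this URL across all users."""
    counts = Counter(
        tag for _, posted_url, tags in posts if posted_url == url for tag in tags
    )
    return rank(counts, limit)


def related_tags(posts, seed_tags, limit=3):
    """Tags that most often appear in the same post as any seed tag."""
    seeds = set(seed_tags)
    counts = Counter()
    for _, _, tags in posts:
        if tags & seeds:
            counts.update(tags - seeds)
    return rank(counts, limit)


def suggest_tags(posts, url, limit=3):
    """Offer popular tags for the item plus tags commonly related to them."""
    popular = popular_tags(posts, url, limit)
    return {"popular": popular, "related": related_tags(posts, popular, limit)}


if __name__ == "__main__":
    print(suggest_tags(POSTS, "http://example.org/folksonomy"))
```

Because each accepted suggestion raises the count of a tag that is already frequent, a loop of this kind favours convergence on a stable tag set, and it also shows how imitation can promote the choices of the earliest or most visible users.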
Towards a personal tagging desktop
Tagging is at the moment a growing habit among web users. Nonetheless a coherent personal approach is still missing. Several platforms are available to collect items of different kinds, but no service or application offers a central repository for the whole personal tag set. No interface is available to browse heterogeneous content with the same tag-driven mindset, so users are forced to repeat their queries on each system. An ideal scenario would be one in which a tag cloud, or any comparable browsing/searching interface, provides access to all our personal knowledge items, grouping together e-mail messages, documents, web links, contacts and any other privately or publicly accessible resource under the same personal classification schema. Steps towards this goal have already been taken, for example by spurl.net, which provides a way to store private resources together with public ones using the same tag set provided to del.icio.us, or by Technorati, the blogger-support organization, which centralizes searches by tag and presents result sets made of resources coming from weblogs, del.icio.us link collections and Flickr photos on one single webpage. But this has to do with the magic of Web 2.0, the programmable web idea, and it will be the topic of the next articles.
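The unified scenario outlined in this section can be pictured as a single index fed by several sources. In the Python sketch below the Item record, the source names and the sample data are all hypothetical; no existing service or API is being described, only the idea of browsing e-mail, documents and web links through one personal tag set.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """One personal knowledge item, whatever system it comes from."""
    source: str      # illustrative names: "mail", "documents", "bookmarks"
    location: str    # message id, file path or URL
    tags: frozenset


def merge_sources(*sources):
    """Combine the items of every source into one tag -> items index."""
    index = defaultdict(list)
    for items in sources:
        for item in items:
            for tag in item.tags:
                index[tag].append(item)
    return index


def browse(index, tag):
    """Group the items carrying a tag by the system they came from."""
    grouped = defaultdict(list)
    for item in index.get(tag, []):
        grouped[item.source].append(item.location)
    return dict(grouped)


# Made-up sample data standing in for three separate systems.
MAIL = [Item("mail", "msg-0412", frozenset({"km", "tagging"}))]
DOCUMENTS = [Item("documents", "reports/folksonomy-notes.txt", frozenset({"tagging"}))]
BOOKMARKS = [
    Item("bookmarks", "http://del.icio.us/", frozenset({"tagging", "web"})),
    Item("bookmarks", "http://www.dlib.org/", frozenset({"library"})),
]

if __name__ == "__main__":
    index = merge_sources(MAIL, DOCUMENTS, BOOKMARKS)
    print(browse(index, "tagging"))
```

A single query on one tag then returns mail, files and links together, instead of being repeated on each system.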
References
Please visit http://del.icio.us/KBTechSIG for the complete bookmark list. Links related to this article are marked with the tag "KBTechNews07"
Hammond, T., Hannay, T., Lund, B., Scott, J., Social Bookmarking Tools – A General Review, D-Lib Magazine, April 2005. http://www.dlib.org/dlib/april05/hammond/04hammond.html
Arnold, S., Google Technology, in The Google legacy, how Google's internet search is transforming application software, Infonortics, Tetbury, England; September 2005. http://www.infonortics.com/publications/google/google-legacy.html
Sinha, R., A cognitive analysis of tagging (or how the lower cognitive cost of tagging makes it popular), weblog post, September 27, 2005. http://www.rashmisinha.com/archives/05_09/tagging-cognitive.html
Shirky, C., Ontology is Overrated: Categories, Links, and Tags, shirky.com, 2005. http://www.shirky.com/writings/ontology_overrated.html
Golder, S., Huberman, B.A., The Structure of Collaborative Tagging Systems, Information Dynamics Lab, HP Labs, 2005 (working paper). http://arxiv.org/ftp/cs/papers/0508/0508082.pdf
Details
- Author: Silverio Petruzzellis
- Publisher: KnowledgeBoard
- Date: 13-Oct-05
- Categories: Technology
Member comments (8)
Really Useful Article
A really interesting article - especially the stuff on tagging. Also spent quite a bit of time on Ning after reading the comments (thanks Ed)!
Knowledge Categorization
According to the research report, people spend 20-30% of their time looking for knowledge / information. If it takes more than 5 clicks to get to a specific piece of knowledge, most people assume that the knowledge doesn't exist. Hence searching is very expensive.
Comprehensive and more meaningful tagging (from an individual as well as a team perspective), along with structured navigation, is essential to reduce the cost of searching. It helps maximize productivity and responsiveness.
Most information is categorized using the taxonomy or folder hierarchy model. Folder hierarchy is the most primitive model for knowledge categorization. Proper categorization makes the content meaningful based on the needs of the knowledge seekers.
For example, the following categories can be applied to ensure that the search is faster and relevant:
1. Knowledge Category (Subject Area). Eg: Technology->IT->Java->JDBC.
2. Knowledge Level (1 - beginners 5 - experts)
3. Knowledge Functions or Skills (Presales, consultant, marketing, research, training and so on). Multiple knowledge functions can also be assigned to one piece of information when it is relevant to multiple business functions.
4. Knowledge Type such as article, presentation, white-paper, faq and so on
5. Keywords (for pointing to related knowledge)
6. and many more
Collaborative (common or team) tagging must be reviewed and regulated. Then, we can also introduce personalized tagging which is visible or meaningful only to the individual.
This comprehensive tagging or categorization makes the navigation structured, easy and relevant. It makes the knowledge portal more meaningful.
following from Ed's comment
and there's even more! I just heard about Ning which looks rather fun.
Ed
Caution and alarm are raised to get on with it
Silverio, you say:
"Tagging is at the moment a growing habit among web users. Nonetheless a coherent personal approach is still missing."
There seem to be indications that this personal focus on local and global responsibility is now being left behind by the powers that be, above everyone's head. Here are 2 examples of fuel being added to the fire of censorship and control of the world wide web:
http://www.techcentralstation.com/102105A.html
and: http://www.techcentralstation.com/hayek.html
Because of such a condition, should we not be considering the appeal I am making to the EU KM Board membership for:
DIPLOMACY IS THE SEARCH OF WISDOM'S CURRENT APPLICATION
We must grow with the momentum of the Faculty of Living to ease civilization out of the grip of stress and terror. Wisdom, knowledge and the timing of humanity's massive need to learn how to personally live as a global community nominates Chris Macrae and appoints the EU KM Board with the mission to serve the wise movement of maturity and the bold transparency of open source KM, so as to justify the invitation on this thread, to support the campaign in these 2 links:
http://www.bbc.co.uk/dna/actionnetwork/G1281
http://www.knowledgeboard.com/cgi-bin/item.cgi?id=129672&d=pnd
And at the last 2 posts of this thread:
http://disruptive-mice.org/forums/711/ShowPost.aspx
Knowledge of personal and communal wisdom is an experience that we cannot afford to ignore any longer. There is plenty of caution and civility to proceed boldly. Let us now engage, shall we...
And there's more...
Currently one of my favourite tag sites was introduced to me by Social Network architect Julian Bond: www.last.fm (mentioned above) is the first internet radio station that plays you music based on tags - it's a fantastic implementation of folksonomy and well worth a look.
In the open source space there is also a new email solution from www.hula-project.org that provides a facility to tag your emails just like Gmail.
I'd also keep an eye on Seth Godin's latest venture: www.squidoo.com which introduces the concept of lenses and experts.
This is a very interesting space to be in right now and I'm glad to be working on a variety of projects that are implementing this approach to KM.
Derek might care to take a look at Omea: http://www.jetbrains.com/omea/
Tagging for a personal desktop
A link to this article was posted in cpsquare in response to a question I posted: I'm looking for a utility that lets me tag files on my hard drive.
# I've tried a few Google searches, but can't find exactly what I want. I have downloaded and installed a search utility from http://www.copernic.com/index.html This is blindingly fast, and has helped. It indexes content as well as file titles. I have e-mailed these guys and asked about tagging, but had no response.
# I've done the Google desktop thing.
# What I'd like is something like Picasa for documents on my HDD.
Until reading this comment I had never thought of a blended set of web tags and HDD tags. This would be even better. :-)
Is there an app out there that enables tagging of documents on a HDD?
Social Bookmarking by Google to come
After its very effective Reader, Google is presenting a new feature in its Google History application interface: the good old bookmarks. Of course, no bookmark makes sense without a tag nowadays. So Google allows each bookmark to be marked with labels.
I’m afraid the vision I have in mind is very close to becoming reality. Google is approaching Social Bookmarking and will allow us to have that unified tag-based search+browse interface overarching the whole personal knowledge base on our desktops.
I’m sure that Google Desktop will allow private items to be bookmarked as well, and its labeling system will then cover them all: mail+documents+weblinks.
A short comment on wording: Google keeps calling “label” what almost everywhere else is named “tag”. I think this is an attempt to differentiate the brand and promote the Google approach in contrast to all the rest. It will be interesting to see how much the term label will be used in the web community as a synonym of tag, and to measure the diffusion of the Google web apps to come in the near future.


Where are the paragraphs?
Is it just me or did nobody see the absence of Paragraphs in this huge block of text? It becomes difficult to read without paragraphs...
Yet, I must say I have tried about 90% of the tools mentioned in this document and still feel that they are not enough.
Collaborative tagging can at the most help identify the most popular tag for a news or an item, but when it comes to Knowledge Management, you cannot store all the knowledge in the Tag. The tag is merely an indicator of the potential Knowledge.
Most of these sites are modified Meta-Content Management Systems which double up as People/Social Networks because of their Human element. The moot point that needs to be addressed is managing the knowledge that these people carry.
I may or may not use the same Tags as a person like me. Again, the meaning that I apply to my tag may not be the same meaning that applies to the Tag in general. The simplest examples would be words with multiple meanings (e.g. wind).
In this context, connecting two people because they have the same tags may not necessarily mean they have the same interests. Again, most of the tagging happens among young people or in offices. Connections among the former are fast and furious, while connections among the latter are business oriented. People-matching in either case is bound to lead into the wrong arenas (most of the time).
All these hypotheses come from my recent experiences with these tools. For a simple example, click on a particular tag and follow three random users who have used that 'Tag' for categorizing something. See how many can be called people with the same interests...