Wednesday 16 May 2007

The 3+1 dimensions of information.


Suppose we have a set S of pieces of information. S is a homogeneous set: all elements are of the same "type". The elements can be of "type" text-document, picture, semi-structured description of an event, whatever.



We want to make S accessible to humans in an intelligent manner. What does that mean? For us that means:

Humans may search and browse S, and connect elements of S to knowledge they already have.

So we want to allow for 3 tasks:

  • Search: This could be free-text search in the familiar information retrieval (IR) style.
  • Browse: How this works depends on the application and on the type of the elements in S.

  • Connect with existing knowledge: With what kind of existing knowledge would users want to connect information? In almost all cases, to answers to the following three questions:
    1. When?
    2. Where?
    3. Who?

These 3 sides (dimensions is the proper term) of a piece of information are almost always important for users. Why?

  • to use the information
  • to understand the information
  • to connect the information to the "world", including the user himself

  • to connect several pieces of information, to structure S
  • ...

Information enrichment


Because these 3 dimensions are so fundamental, pieces of information often already contain answers to these 3 questions, but usually only implicitly. For intelligent information access we need to make this implicit information explicit, in the form of (semi-)structured metadata added to each piece of information.
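A minimal sketch of what such enrichment could look like, in Python. Everything in it is an assumption made for illustration: the Item record, the enrich function, the two tiny lookup sets and the example sentence are all hypothetical, and a real system would use proper gazetteers or a named-entity recogniser rather than substring matching.

```python
import re
from dataclasses import dataclass, field

# Hypothetical lookup lists, standing in for real gazetteers or a
# named-entity recogniser.
KNOWN_PLACES = {"France", "New York"}
KNOWN_PEOPLE = {"Sarkozy"}


@dataclass
class Item:
    """A piece of information plus explicit when/where/who metadata."""
    text: str
    when: list = field(default_factory=list)   # years mentioned in the text
    where: list = field(default_factory=list)  # known places mentioned
    who: list = field(default_factory=list)    # known people mentioned


def enrich(text):
    """Make the implicit when/where/who in `text` explicit (toy version)."""
    years = [int(y) for y in re.findall(r"\b(1[89]\d\d|20\d\d)\b", text)]
    places = sorted(p for p in KNOWN_PLACES if p in text)
    people = sorted(p for p in KNOWN_PEOPLE if p in text)
    return Item(text, years, places, people)


# Made-up example sentence, used only to exercise the function.
item = enrich("Sarkozy won the election in France in May 2007.")
print(item.when, item.where, item.who)
```

The point is only the shape of the result: the original text stays untouched, and the three dimensions become separate fields that search and browse tools can use.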

Categorization: the what question (the fourth dimension)


Much emphasis is given to the what question, often in the form of categorization (putting everything into a box or a pigeonhole). Is this really needed for intelligent information access? It happens so much, we do it so often and without thinking, but is it really needed?


Think of music and the usual categories: classical, jazz, pop, rock, world music, and so on. Does that inform you? Often this kind of information is not implicit in the source, but rather a subjective assignment of the source to some class.


I think we should focus on more objective meta-information which is implicit in the source. It is important and useful to extract that. From it, we can also create meaningful categories like

music created in the 1930s in New York City by people with Jewish names.

For semantic browsing this might be much more interesting information than an is-a hierarchy of musical genres.
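Such a category is just a query over the when, where and who metadata. A rough Python sketch, assuming the metadata has already been extracted: the Recording record, the in_category function and the three catalogue entries are invented placeholders, not real data. Only the when and where parts are filtered here; a who condition could be added in the same way.

```python
from dataclasses import dataclass


@dataclass
class Recording:
    title: str
    year: int        # when
    place: str       # where
    creators: tuple  # who


def in_category(rec, start, end, place):
    """True if the recording was made in `place` between `start` and `end`."""
    return start <= rec.year <= end and rec.place == place


# Placeholder catalogue, invented for illustration only.
catalogue = [
    Recording("track A", 1934, "New York City", ("person X",)),
    Recording("track B", 1958, "New York City", ("person Y",)),
    Recording("track C", 1936, "Paris", ("person Z",)),
]

nyc_1930s = [r.title for r in catalogue
             if in_category(r, 1930, 1939, "New York City")]
print(nyc_1930s)
```

No genre label is needed anywhere: the category is computed from the metadata on request, and a different question simply means a different filter.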

Examples


Uitburo


A list of cultural events. The when and where dimensions are essential for using the information. The who and what dimensions are important for understanding it and for making a choice. The what dimension is given explicitly in the data. It looks as if the when and where dimensions are already there as well, but as Bas de Beer discovered they are 1) very incomplete and 2) not very reliable.


User generated content: blogs and reactions to news articles


Here it seems most important to try to find out

  • what is the blog or reaction about? Often an event.
  • what is the opinion of the author on that event?

The when, where and who dimensions seem to be very useful in discovering the event. Extracting opinions is yet another matter.


Historical archive of (news)articles


Here the when, where and who dimensions could be useful in making special semantic browsing facilities through the enormous set S. If we have this metadata we can think of cool browsing and connecting tools:

  • putting articles on timelines
  • connecting them through Google Earth
  • connecting them through main players, possibly with pictures
  • ...
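The first and third of these tools come down to sorting and grouping on the metadata. A small Python sketch under the assumption that every article already carries a when date and a who list; the article ids, dates and names are placeholders invented for the example.

```python
from collections import defaultdict
from datetime import date

# Placeholder articles with already extracted when/who metadata.
articles = [
    {"id": "a1", "when": date(2007, 5, 6), "who": ["person X"]},
    {"id": "a2", "when": date(2007, 4, 22), "who": ["person X", "person Y"]},
    {"id": "a3", "when": date(2007, 5, 16), "who": ["person Y"]},
]

# Timeline: article ids ordered by the when dimension.
timeline = [a["id"] for a in sorted(articles, key=lambda a: a["when"])]

# Main players: for each person, the articles that mention them.
by_player = defaultdict(list)
for a in articles:
    for person in a["who"]:
        by_player[person].append(a["id"])

print(timeline)
print(dict(by_player))
```

A map view would work the same way, with coordinates looked up from the where dimension instead of dates.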

Here too, we might connect articles to events through this kind of metadata. Often this seems to be enough: for example, for the query France, Sarkozy, May 2007, Google nicely returns the right event.
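One simple way to try this is to treat the combination of where, who and month as a rough event key, and to group articles that share it. The Python sketch below assumes one place and one person per article and ISO-formatted dates; the event_key function and the three articles with their dates are made up for illustration.

```python
from collections import defaultdict


def event_key(article):
    """Rough event identity: (where, who, year-month)."""
    return (article["where"], article["who"], article["when"][:7])


# Invented articles; "when" is an ISO date string (YYYY-MM-DD).
articles = [
    {"id": "a1", "where": "France", "who": "Sarkozy", "when": "2007-05-06"},
    {"id": "a2", "where": "France", "who": "Sarkozy", "when": "2007-05-16"},
    {"id": "a3", "where": "France", "who": "Sarkozy", "when": "2006-11-30"},
]

events = defaultdict(list)
for a in articles:
    events[event_key(a)].append(a["id"])

print(events[("France", "Sarkozy", "2007-05")])
```

Whether a month is the right granularity will depend on the archive; the key is a heuristic, not a definition of an event.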


Historical archive of manifestos


Inspired by Lipschits' beautiful indices, we mostly focused on creating these. That is a kind of categorization too, but admittedly a much more objective one. Still, I see no reason not to do the simple things first.

Assignment 4 Google Docs

  • In a new window or a new tab, go to http://docs.google.com and log in with your Google account. You either already had one, or you have just created it in the first blog assignment.
  • Click on new document, copy this assignment with Ctrl C and paste it into the word processor with Ctrl V. Save it under the name opdracht4.