Conceptual data structures for personal knowledge management

Max Völkel, Heiko Haller. Online Information Review. Bradford: 2009. Vol. 33, Iss. 2; pg. 298

Abstract (Summary)

The purpose of the paper is to design a model and tools that are capable of representing and handling personal knowledge in different degrees of structuredness and formalisation, and usable and extensible by end-users. This paper presents the results of analysing literature and various data models and formalisms used to structure information on the desktop. The unified data model (CDS) is capable of representing structures from various information tools, including documents, file systems, hypertext, tagging and mind maps. The five knowledge axes of CDS are identity, order, hierarchy, annotation and linking. The CDS model is based on text. Extensions for multimedia annotations have not been investigated. Future personal knowledge management (PKM) tools should take the mentioned shortcoming of existing PKM tools into account. Implementing the CDS model can be a way to make PKM tools interoperable. This paper presents research combining cognitive psychology, personal knowledge management and semantic web technologies. The CDS model provides a way to let end-users work on different levels of granularity and different levels of formality in one environment.

Full Text

Personal knowledge management

Edited by Dr David Pauleen

Introduction

The most important contribution of management in the 20th century was to increase manual worker productivity fifty-fold. The most important contribution of management in the 21st century will be to increase knowledge worker productivity - hopefully by the same percentage. [...] The methods, however, are totally different from those that increased the productivity of manual workers ([10] Drucker, 1999a, p. 79).

What might these methods be? Since 1995 ([38] Stankosky, 2005) the field of knowledge management has investigated how people and knowledge work together. [30] North (2007) defined knowledge work as work based on knowledge with an immaterial result; value creation is based on processing, generating and communicating knowledge. [33] Polanyi (1966) made a distinction between explicit knowledge encoded in artefacts such as books or webpages, and tacit knowledge that resides in the individual. The SECI model of [27] Nonaka (1994) describes knowledge transfers between humans and artefacts.

The field of knowledge management has focused on knowledge transfer between people, either via socialisation or via externalised artefacts. The high expectations of central enterprise knowledge repositories have often not been met ([3] Braganza and Mollenkramer, 2002). The following wave of expert finders and corporate white pages focused mostly on connecting the right people and letting them communicate.

Today, knowledge workers are flooded with information ([1] Alvarado et al. , 2003). The field of personal information management (PIM) aims to help individuals to manage all artefacts in the personal space of information, which "includes all the information items that are, at least nominally, under that person's control (but not necessarily exclusively so)" ([18] Jones and Bruce, 2005, p. 9). Recently, more research is being focused on the individual knowledge worker, establishing the field of personal knowledge management (PKM):

- The knowledge-based organisation is no more effective than the sum of its knowledge workers ([7] Davenport, 2005).

- One should focus on the individual and give individual users incentive and benefit before focusing on the social network ([31] Oren, 2006).

- [34] Schütt (2003) defines knowledge workers based on the works of [11] Drucker (1999b) and [39] Taylor (1911) - simplified, workers (doing) are instructed by managers (thinking). These managers have to manage themselves. This self-managing is considered an important characteristic of the knowledge worker - they have to manage themselves because their tasks are constantly changing. Increasing knowledge worker productivity has to be a company's main goal, not storing documents in databases.

Seminal articles by [4] Bush (1945) and [12] Engelbart (1963) describe tools that allow an individual to work more efficiently and effectively with external representations of knowledge.

In knowledge work, people are frequently confronted with two limitations of the human mind - long-term memory recall and short-term memory capacity. Limits of the long-term memory can be overcome partially with tools to help the remembering or reconstructing of knowledge. Human short-term memory can hold only around seven objects at a time ([24] Miller, 1956). For user interfaces, [37] Shneiderman (1998) advises us to "Do everything possible to free the user's memory burden" (p. 75). Interestingly, this other limitation can also be partly relieved by using external knowledge representations, such as by taking short notes, or drawing a diagram or mind-map that helps us keep an overview of a somewhat larger set of items and quickly brings each single one into full recall on demand. We conclude that both of these very prominent cognitive limitations can be addressed by providing an adequate external knowledge representation tool.

[29] Nonaka and Takeuchi (1995) distinguished two kinds of knowledge:

explicit; and

tacit (internal).

Later works ([9] Despres and Chauvel, 2000; [28] Nonaka and Konno, 1998) concluded that explicit and tacit knowledge are actually two extremes on a spectrum. [23] Maurer (1999) stated that knowledge resides in the heads of people and that the computer can only store "computerised knowledge" which is to be understood as "shadow knowledge", a "weakish image" of the real knowledge (p. 12).

In PKM, we often deal with knowledge that is somewhere in the middle of these extremes. Note-taking, for example, is a core activity of PKM where an individual creates an external representation for internal concepts. Later, the external representation is internalised again to re-activate the knowledge in the individual's mind. If somebody writes a short informal note to himself it is often completely meaningless to others. The knowledge is thus not fully externalised - yet this note is an external reminder about some knowledge that the author would otherwise forget.

Research goals

The goal of this research was to find a general representation model for PKM tools. Different from other models, the model we were looking for is not meant to be hidden behind a yet-to-be defined user interface, but to be exposed as directly as possible to the end-user. The model should be easy to learn and it should be possible to import, represent and handle a number of existing knowledge organisation formalisms. In a more colloquial way, we were looking for a model that can do with concepts what spreadsheets can do with numbers.

Scenario

Today's knowledge workers are confronted with an overwhelming amount of information. Sometimes information is sent to the knowledge worker (e.g. by e-mail or RSS feeds), found by chance (e.g. while meeting somebody in an airplane), or actively researched (e.g. in the library). A typical information professional could be a business analyst reviewing AJAX-frameworks in Web 2.0 start-ups, a biologist researching where sharks live and how and why their populations change, a lecturer in French history or a lawyer specialising in environmental law. The running example in this paper will be a biologist called Linda who is writing a paper on great white sharks for a conference in Italy.

The information typically encountered by knowledge workers is either of a self-management nature, such as tasks and appointments or contact data, all of which is well covered by existing specialised PIM tools, or information belonging to the knowledge worker's domain of interest. Since the structure of this domain-specific information is typically unique and often even undergoing thorough change, there are often no specialised tools available that would support handling this information in a way appropriate for its structure. Popular generic tools are spreadsheets, text documents, slides and the file system. However, none of them would let Linda collect material about shark populations, reasons for their growth or decline, different shark species, shark hunting strategies, etc., in an integrated way. In a text file she would probably lose the overview and in a spreadsheet she would not be able to represent relations between shark species. If she used both tools, she could not easily refer to a specific cell in the spreadsheet from the document (as just one example of the disadvantages).

The two main tasks we intend to support are structured note-taking and document creation. The core process from notes to a document can be described as steps in a knowledge maturing process ([22] Maier and Schmidt, 2007). Here the question of granularity arises. [35] Shneiderman (1989) found that users are better able to answer questions when a text is modelled as more-fine-grained (46 articles) hypertext instead of large chunks (five articles). Granularity is also an important cost-driver for PKM and a PKM system is only of value to a user if it provides more benefit in delivering relevant information than the cost of using it, that is, externalisation, refinement and search ([41] Völkel and Abecker, 2008). We concluded that a PKM system should be able to represent a whole range of granularity from short items (e.g. notes) to longer items (e.g. emerging documents).

Research design

In order to create a lean vocabulary for incremental recording and step-wise formalisation of personal knowledge, we conducted an extensive analysis of existing models and tools widely used to record, structure and communicate knowledge. We identified a set of common knowledge structures found to be inherent in most knowledge artefacts - ranging from vague paper notes and books, hypertexts and folksonomies to highly structured documents and even taxonomies and most ontologies.

In order to allow gradual transitions between various degrees of formalisation, the types of these structural relations were modelled hierarchically as a lightweight, top-level ontology of general relation types by subsuming the more specific ones under those they semantically imply. This resulted in the conceptual data structures (CDS) model ([43] Völkel et al. , 2008).

A first open source CDS back-end was implemented on the semantic web content repository (SWCR; [40] Völkel, 2007). On top of this CDS back-end, three user interface prototypes have been realised. Additionally, we ran a small user survey to refine and extend requirements proposed in state-of-the-art literature.

We first briefly present the CDS model and then, at greater length, the tools, while also explaining how Linda could use these tools in her daily work.

CDS model

The CDS model consists of two parts - a data model and a set of core relations found most often in existing models and tools used today for PKM.

CDS data model

The CDS data model consists of:

- items of unstructured text, structured text, images or other content;

- names;

- statements between items of all kinds; and

- relation types.

Formally, we have a Model M , which is a set of Items . Each Item has a unique identifier (a URI) that makes it globally addressable. Each Item belongs to exactly one Model , has a creation date, a last modified date and an author. There are four kinds of Items a user can use, described below.

A ContentItem represents a piece of addressable content. Content may be textual or binary. Binary content is defined as on the web ([17] Jacobs and Walsh, 2004), that is, having an encoding, MIME-type and length measured in bytes. Textual content in CDS has by default UTF-8 encoding and may use some formatting using the structured text interchange format (STIF), as defined in [43] Völkel et al. (2008).

A NameItem models a term of the user's vocabulary. The name of the NameItem must be unique within a Model . There may be two or more ContentItems having the same content as a NameItem or as another ContentItem .

NameItems allow URIs to be completely hidden from user interfaces. In this respect, they are similar to the titles of wiki pages, for example. Note that NameItems represent only the name itself. For example, a wiki page can be modelled as two Items - a NameItem to represent the wiki page title and a ContentItem to represent the wiki page content. The NameItems can be used as generic named containers, tags or formal types. NameItems allow jumping directly into certain nodes of the Model , similar to using known URLs to start browsing the web.

A Relation is a special kind of NameItem . Relations are used in Statements , which are explained in the next paragraph. Each Relation has a mandatory inverse Relation .

A Statement connects Items. A Statement is always of the form ( Item , Relation , Item ). As a Statement is itself an Item , the user can annotate Statements as well - a handy feature, for example, for discussion systems.

It is possible for different Statements with different URIs to assert the same triple. But it is not possible for two different Statements (differing in source, relation or target) to have the same URI. For every Statement ( s , p , o ), the inverse Statement ( o , p⁻¹ , s ) is inferred, where p⁻¹ is the inverse of p . This is handy for user interfaces, which allow browsing of items in both directions.
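
The data model described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the API of the CDS back-end: the class and method names (`Model`, `add_name`, `add_relation`, `state`, `triples`) and the `urn:uuid:` URI scheme are hypothetical choices made for this example.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


def _new_uri() -> str:
    # Assumption: urn:uuid stands in for the URI scheme of a real back-end.
    return "urn:uuid:" + str(uuid.uuid4())


def _now() -> datetime:
    return datetime.now(timezone.utc)


@dataclass(eq=False)  # Items are compared by identity, not by field values
class Item:
    author: str
    uri: str = field(default_factory=_new_uri)
    created: datetime = field(default_factory=_now)
    modified: datetime = field(default_factory=_now)


@dataclass(eq=False)
class ContentItem(Item):
    content: str = ""


@dataclass(eq=False)
class NameItem(Item):
    name: str = ""


@dataclass(eq=False)
class Relation(NameItem):
    inverse: Optional["Relation"] = None  # filled in by Model.add_relation


@dataclass(eq=False)
class Statement(Item):
    source: Optional[Item] = None
    relation: Optional[Relation] = None
    target: Optional[Item] = None


class Model:
    """A set of Items in which every name is unique."""

    def __init__(self, author: str):
        self.author = author
        self.items = {}  # URI -> Item
        self.names = {}  # name -> NameItem (Relations included)

    def _add(self, item):
        self.items[item.uri] = item
        if isinstance(item, NameItem):
            self.names[item.name] = item
        return item

    def add_name(self, name: str) -> NameItem:
        # Re-use an existing NameItem so that names stay unique.
        return self.names.get(name) or self._add(NameItem(self.author, name=name))

    def add_content(self, text: str) -> ContentItem:
        return self._add(ContentItem(self.author, content=text))

    def add_relation(self, name: str, inverse_name: str) -> Relation:
        if name in self.names or inverse_name in self.names:
            raise ValueError("relation names must be unused")
        rel = self._add(Relation(self.author, name=name))
        # A symmetric relation is its own inverse.
        inv = rel if inverse_name == name else self._add(Relation(self.author, name=inverse_name))
        rel.inverse, inv.inverse = inv, rel
        return rel

    def state(self, source: Item, relation: Relation, target: Item) -> Statement:
        return self._add(Statement(self.author, source=source, relation=relation, target=target))

    def triples(self):
        """Yield every stated triple followed by its inferred inverse."""
        for item in list(self.items.values()):
            if isinstance(item, Statement):
                yield item.source, item.relation, item.target
                yield item.target, item.relation.inverse, item.source


model = Model(author="Linda")
has_detail = model.add_relation("has detail", "has context")
shark = model.add_name("Great white shark")
note = model.add_content("Reproduction is slow.")
statement = model.state(shark, has_detail, note)
```

Because a `Statement` is itself an `Item` in this sketch, it can be used as the source or target of a further statement, which mirrors the annotation of Statements mentioned above.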

CDS relation type hierarchy

The second part of the CDS model is a set of built-in Relations , which are repeatedly occurring across different knowledge organisation tools and models. The Relations are arranged in an inheritance hierarchy so that the Relations with more specific semantics imply those with broader semantics.

The five core relation types deal with identity ( similar to , same as , has alias ), order, hierarchy, different forms of annotation (i.e. free-text annotations, tagging and formal typing) and generic hyperlinks. As the relation hierarchy is represented in the CDS data model (see Figure 1 [Figure omitted. See Article Image.]), the user can (and is expected to) extend it.

The root type of the relationship hierarchy is related . Every Item is related to another Item , if any kind of Relation has been stated. This Relation allows very vague knowledge to be stated - "these items are related, but I can't tell why or don't want to spend the time to refine this now". The next level in the hierarchy is either similar - to link items that describe the same real-world entity, or has target - to interlink different items. Has target models a generic, directed hyperlink as it is found in the web, references in documents or links in the file system. The CDS model contains three built-in refinements for has target , which user interfaces should treat specially.

Has after and its inverse relation, has before , model any kind of ordering relation. It might be order in space, time or by other means.

Has detail and its inverse, has context , represent any kind of hierarchy and nesting. This relation represents hierarchies in a generic way, for example, part-whole relations or type hierarchies are considered special cases of this relation. Types can be arranged in a type inheritance hierarchy, like classes in an ontology or programming language.

Both ordering and hierarchies are most often used among items that have the same type. To represent links between items of different modelling layers, CDS uses has annotation and its sub-relations.

Together with the hype around Web 2.0, tagging became popular for assigning easy-to-type keywords to items. In CDS, tagging is treated as a kind of annotation, hence has tag is a sub-relation of has annotation .

Assigning items a formal type is accomplished with the relation has type . In CDS, has type is a sub-relation of has tag , which leads to the desired effect that, for example, a species of shark that is typed as a carnivore implies it is also tagged as carnivore .
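
The implication from specific to broader relations can be illustrated with a short sketch. It encodes only the relations whose position in the hierarchy is stated in the text above and omits the inverse relations; the dictionary layout and the function names are assumptions made for this example.

```python
# Direct super-relations, as far as the text places them in the hierarchy.
# Inverse relations (has before, has context, ...) are left out for brevity.
SUPER_RELATIONS = {
    "related": [],
    "similar": ["related"],
    "has target": ["related"],
    "has after": ["has target"],
    "has detail": ["has target"],
    "has annotation": ["has target"],
    "has tag": ["has annotation"],
    "has type": ["has tag"],
}


def implied_relations(relation):
    """Return the relation followed by all broader relations it implies."""
    found, queue = [], [relation]
    while queue:
        current = queue.pop(0)
        if current not in found:
            found.append(current)
            queue.extend(SUPER_RELATIONS[current])
    return found


def implied_triples(source, relation, target):
    """Expand one stated triple into all triples it implies."""
    return [(source, broader, target) for broader in implied_relations(relation)]


for triple in implied_triples("great white shark", "has type", "carnivore"):
    print(triple)
```

Each relation maps to a list of super-relations, so a user-defined relation with several super-relations fits the same structure.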

Tools based on CDS

In this section we present three prototypes of tools based on the CDS model, which have been developed within the NEPOMUK project (see http://nepomuk.semanticdesktop.org/).

Hypertext Knowledge Workbench

The Hypertext Knowledge Workbench (HKW) resembles a semantic wiki, but without the tight coupling of one title to one page. HKW is different from semantic wikis in that:

- it is backed by the more flexible CDS model;

- it allows formal statements to be created and changed easily; and

- it integrates authoring, structuring and formalisation into the retrieval.

Figure 2 [Figure omitted. See Article Image.] shows a screenshot of the graphical user interface (GUI - try it online or download from http://cds.xam.de) focusing on the NameItem "Great white shark". The screen is divided into seven coloured areas, showing related Items . More formally, for a centred Item i the GUI shows a dynamic view for the query ( i , *, *), including all inverse statements and inferred triples.
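
A minimal sketch of such an item-centred query follows. It groups the results of ( i , *, *) by relation and adds the inverse statements, but leaves out the triples inferred from the relation hierarchy. The `INVERSE` table, the function name and the sample data (including the "Sharks" context item) are illustrative assumptions, not taken from HKW.

```python
from collections import defaultdict

# Assumption: only two relation pairs are registered for this example.
INVERSE = {
    "has detail": "has context",
    "has context": "has detail",
    "has after": "has before",
    "has before": "has after",
}


def centred_view(item, stated):
    """Answer the query (item, *, *) as {relation: [targets]}, inverses included."""
    view = defaultdict(list)
    for source, relation, target in stated:
        if source == item:
            view[relation].append(target)
        if target == item:
            # The inverse statement makes the item reachable from its target.
            view[INVERSE[relation]].append(source)
    return dict(view)


stated = [
    ("Sharks", "has detail", "Great white shark"),
    ("Great white shark", "has detail", "6 m"),
    ("Great white shark", "has after", "Basking shark"),
]
view = centred_view("Great white shark", stated)
```

Each key of the resulting dictionary would correspond to one area of the screen.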

Below the "Great white shark" Item , HKW shows the Items related via the relation has detail . For example, the statement "Great white shark"-"maximum length"-"6 m" is rendered here. This tells the user (Linda) that "maximum length" is a sub-relation of has detail . Behind the word "6 m" there are icons allowing Linda to navigate to the Statement "Great white shark"-"maximum length"-"6 m". In a Statement view, the Statement can be changed - for example, Linda can change the Relation or create a new source or target. Auto-linking is supported wherever possible.

Alternatively, she can delete this statement or create a new Item at the location (Great white shark, maximum length) by pressing the plus icon. This allows new semantically interlinked items to be created easily. If the user enters a longer text or uses line breaks the system assumes the user is creating a ContentItem . For short text, the system suggests existing NameItems or creates new ones.

Items related to the selected Item via has context (the inverse relation of has detail ) are shown above the "Great white shark" item. The other coloured boxes represent other CDS core relations. The GUI always shows relations in their most specific box. Items are only presented in different boxes at the same time if the user assigned multiple super-relations to a relation.

QuiKey

QuiKey is a kind of smart semantic command-line that focuses on highest interaction-efficiency to browse, query and author CDS-based knowledge bases in a step-by-step manner. It combines ideas of simple interaction techniques like auto-completion, command interpreters and faceted browsing, and integrates them into a new interaction concept. QuiKey forms a generic, extensible user interface for CDS models. Despite its versatility, QuiKey needs very little screen space, which also makes it a candidate for future mobile use. (QuiKey is described in more detail in [43] Völkel et al. , 2008.) A screenshot of its current implementation is depicted in Figure 3 [Figure omitted. See Article Image.].

iMapping

iMapping is a technique for visually structuring information objects. It supports the full range from informal note-taking and semi-structured personal information management to formal knowledge models. With iMaps, users can easily go from overview to fine-grained structures while browsing, editing or refining the knowledge base in one comprehensive view. An iMap is comparable to a large white-board where information items can be positioned like post-it notes, but also nested into each other. Spatial browsing and zooming as well as graphical editing facilities make it easy to structure content in an intuitive way. iMapping builds on a zooming user interface approach to facilitate navigation and to help users maintain an overview in the knowledge space. (The iMapping approach and its motivations and foundations are described in more detail in [15] Haller, 2006.) A small sample map mock-up is depicted in Figure 4 [Figure omitted. See Article Image.].

Analysis and requirements

We asked 27 participants (of whom 17 were researchers, largely in the field of computer science) to complete an online survey called "What should a personal knowledge management (PKM) tool do for you?"

Information types

In response to the survey, users mentioned all kinds of information types to be tackled by a PKM tool. One user summarised the situation as:

A broad amalgam of scientific papers, non-scientific articles, URLs and other documents, IM conversations, emails and personal notes comes in daily, forming sediments of data on my disk.

An exhaustive list of all information types mentioned by users includes:

- scientific papers (2×);

- non-scientific articles (1×);

- bookmarks (5×);

- instant messaging conversations (1×);

- e-mails (2×);

- personal notes (2×);

- social networks (1×);

- scans (1×);

- pictures (1×);

- online documents (1×);

- account information (e.g. bank account number - 1×);

- topics (1×);

- how-tos (e.g. how to set a classpath in Java - 1×);

- mathematical knowledge (1×);

- contacts (2×);

- presentations (1×);

- projects (i.e. their notes, time plans, accounts, ideas, etc. - 1×);

- concept maps (1×);

- tools (1×);

- ideas (1×);

- events (1×);

- recipes (1×);

- favourite teas (1×);

- tax information (1×); and

- to-dos (1×).

Tasks

In the online survey, people mentioned a number of diverse tasks: note taking, paper writing, birthday reminders, organising to move to another country, strategic planning, scientific research, and consultation with friends and colleagues. Only the task "paper writing" ([13] Esselborn-Krumbiegel, 2002) was mentioned more than once (five times). One user summarised this as:

... locate the information by keyword, date, other metadata or by tracing a path of discovery, then attributing the source correctly, and communicating in a universally readable format.

Functional requirements

In the remainder of this section we discuss functionality requirements and their mapping to CDS and tools based on it. The basic processes in PIM have been identified ([18] Jones and Bruce, 2005) as:

- keeping (i.e. input of information into a personal space of information);

- finding or re-finding (i.e. output of information from a personal space of information); and

- meta-activities (e.g. mapping between information and need, maintenance and organisation).

For note-taking a user in our survey wrote:

A PKM tool should help me aggregate, collect and view all the small bits of information, which are either needed for long term reference, or in the short term for completing a task.

Easily find things

At the heart of PKM is the requirement to easily find things that are stored in the PKM tool. Although the information stored in items and their relations is itself often not self-contained, it might suffice to remind the user of the knowledge that was present when the information was entered.

In HKW, Linda can, for example, retrieve the year in which the movie Jaws was released if she remembers the director "Steven Spielberg". She can enter "Stev" and get an auto-completion list containing "Steven Spielberg". From there, she would probably look under the hyperlink "has directed" to see the title " Jaws ". After clicking on it, she can see the details of the movie, including the release date of 1975. In general, if the item to be found is not a NameItem , HKW allows her to browse associatively from a known entity to the desired one, just like in a wiki. However, navigation in HKW is expected to be faster because links are already grouped into different cognitive dimensions (ordering, hierarchical, typing or other links).

In the iMapping prototype, she can first zoom in to the upper right corner for "private stuff". In there, she can go to her "Movies" item and browse. As there are many movies, she can first go into the "Best Directors Ever" item inside the "Movies" item. In there, she can select "Steven Spielberg" and see all outgoing links labelled with the type. Although there are quite a few links, she can just look at links pointing at movies and can quickly identify the Jaws movie. After zooming onto it she can see "1975" inside the " Jaws " item. On hovering over it, a relation "release date" is shown between the outer " Jaws " item and the inner "1975" item. iMapping allows an item to be found based on spatial proximity simply by moving around in the infinite 2D space.

Fast entry of new items and extension of existing items

The survey revealed a desire for the fast entry of new items (mentioned twice in the survey) as well as an easy way to extend existing items (also mentioned twice). [31] Oren (2006) advises a focus on simply capturing and representing the things that the user wants to store, before doing any reasoning with it.

QuiKey is the fastest tool for entering data in terms of mouse clicks and keys typed. With one shortcut the tool is brought into focus. Now Linda can simply enter a new short note such as "reproduction is slow, with sexual maturity occurring at about 12-15 years of age". Alternatively she can write "white shark" <tab>, "sexual maturity" <tab>, "12-15 years" <return>. This will add a CDS statement to her PKM knowledge base, extending the existing white shark items, without requiring her to navigate anywhere first. New items and relations are created on the fly if needed. Re-use of existing items and relations is encouraged with auto-completion. After she has written "white shark" <tab>, QuiKey will present her with a ranked list of existing statements about the white shark. Thus QuiKey also includes browsing of the knowledge base.
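
The entry behaviour described above can be sketched as follows. This is an illustration under assumptions, not QuiKey's implementation: the class `KnowledgeBase` and its methods are hypothetical, the tab character stands for the <tab> key, and the ranking of suggestions is not modelled.

```python
class KnowledgeBase:
    """Hypothetical store that accepts notes and tab-separated statements."""

    def __init__(self):
        self.names = set()     # item and relation names known so far
        self.statements = []   # (source, relation, target) triples
        self.notes = []        # free-text notes

    def enter(self, text):
        """Store one line of input; tabs separate source, relation and target."""
        parts = [part.strip() for part in text.split("\t")]
        if len(parts) == 3 and all(parts):
            # Missing items and relations are created on the fly.
            self.names.update(parts)
            self.statements.append(tuple(parts))
            return "statement"
        self.notes.append(text.strip())
        return "note"

    def about(self, name):
        """List existing statements whose source is the given name (unranked)."""
        return [s for s in self.statements if s[0] == name]


kb = KnowledgeBase()
kb.enter("reproduction is slow, with sexual maturity occurring at about 12-15 years of age")
kb.enter("white shark\tsexual maturity\t12-15 years")
```

Calling `kb.about("white shark")` after the first <tab> would supply the statements to be listed for browsing.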

Grouping of items

The next set of features required is centred on grouping of items. The tool should be able to let "me group seemingly unrelated content" (survey). Users need composition for navigation ([14] Frank, 1988). This allows browsing, for example, thereby narrowing down the user's view, and allows the discovery of related yet unexpected items. In the iMapping prototype, Linda can, for example, simply move the item "basking shark" next to "white shark", as both shark types have similar body shapes. This would not introduce any kind of statement in the underlying CDS model, but would help her to remember associations with minimal modelling effort.

Named containers

Survey respondents also demanded that it should be easy to place new items into a named container. However, [14] Frank (1988) advised not to require a user to name all items. Consider linking several contacts in Linda's address book to the same postal address (for example, all 20 people working for a non-governmental underwater-life protection organisation). In this case it would be too much work to assign the address of the office a dedicated name. Yet it would also be cumbersome to have to change the entries of all those people if the postal address of the NGO changed. Therefore users should be allowed but not required to give entities a name.

CDS accomplishes this with NameItems , which are unique within a knowledge model and which can be used as contexts (i.e. like document folders), tags, types or anything. The three prototype tools support consistent re-use of NameItems by offering auto-completion features. Conceptually, the relationship types are also names from the same namespace; that is, there cannot be a relation named "knows" that is something different from a NameItem " knows ". One NameItem represents only one thing, although multiple NameItems can represent the same thing (synonyms).
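
How unique names and auto-completion work together can be shown with a small sketch. The class `NameIndex` is hypothetical, and treating names as case-insensitive is an assumption of this example that the CDS model does not prescribe.

```python
import bisect


class NameIndex:
    """Hypothetical registry of unique names with prefix auto-completion."""

    def __init__(self):
        self._keys = []      # case-folded names, kept sorted for prefix search
        self._display = {}   # case-folded name -> name as first entered

    def add(self, name):
        """Register a name, or return the existing one if it is already taken."""
        key = name.casefold()
        if key not in self._display:
            bisect.insort(self._keys, key)
            self._display[key] = name
        return self._display[key]

    def complete(self, prefix, limit=10):
        """Return up to `limit` registered names starting with the prefix."""
        key = prefix.casefold()
        start = bisect.bisect_left(self._keys, key)
        matches = []
        for candidate in self._keys[start:]:
            if not candidate.startswith(key) or len(matches) == limit:
                break
            matches.append(self._display[candidate])
        return matches


index = NameIndex()
for name in ("knows", "known issues", "white shark"):
    index.add(name)
```

Offering `index.complete("kno")` while the user types encourages re-use of an existing name instead of the creation of a near-duplicate.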

In iMapping, Linda can create an item called "fish eaten by white shark" and put inside it items named "rays", "tuna" and "smaller sharks". She can simply click inside an existing item and thereby get a cursor to enter the text of a newly created child item. She could also create the three fish items first and then add a new item and drag the fish items into it.

In HKW, the intended way would be to navigate to "white shark" and click on "add" in the "has detail" panel. Linda would get a pop-up window with two fields - the first pre-filled with the text "has detail" and the second holding the input focus. She could enter just some text " x " now and thereby add the statement (white shark, has detail, x ). Instead she decides to create a new sub-relation of "has detail" by typing "eats" into the first field. In the second field she types "rays" and presses <submit>. This creates the two statements (has detail, has sub-relation, eats) and (white shark, eats, rays). HKW now shows "eats" as a sub-relation of "has detail". She clicks on the "add" icon of "eats" and enters "tuna" <submit> and repeats this for "smaller sharks".
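
The statements created in this example can be written down as plain triples. The sketch below shows why the new items still appear under "has detail": a query for a relation also follows its sub-relations. The function names are assumptions made for this illustration.

```python
statements = [
    ("has detail", "has sub-relation", "eats"),
    ("white shark", "eats", "rays"),
    ("white shark", "eats", "tuna"),
    ("white shark", "eats", "smaller sharks"),
]


def refinements(relation, statements):
    """Return the relation together with all direct and indirect sub-relations."""
    found, queue = {relation}, [relation]
    while queue:
        current = queue.pop()
        for source, predicate, target in statements:
            if source == current and predicate == "has sub-relation" and target not in found:
                found.add(target)
                queue.append(target)
    return found


def targets(item, relation, statements):
    """List what the item is linked to via the relation or any of its refinements."""
    wanted = refinements(relation, statements)
    return [o for s, p, o in statements if s == item and p in wanted]


details = targets("white shark", "has detail", statements)
```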

Categories

In the survey, users preferred categories over strict hierarchies (mentioned three times). All three CDS-based tools allow multiple parents; that is, an item can have several tags, types, annotations or contexts. A relation can have several super-relations. HKW presents the relationship inheritance graph as a flat tree and some relations appear simply as children of several other nodes.

Context

According to the survey, users wish it to be clear "which data is from my personal information sphere and which is coming from outside". This is in line with [31] Oren (2006) - understand the notion of context, capture it together with the information and use it to enhance recall and understanding. The CDS back-end records for each item the creation date and the author who entered it into the system. Items created by the system are marked with a different author.

Links

The next set of required features deals with explicit links between items. [31] Oren (2006) summarises:

- exploit the interlinked nature;

- do not rely only on search; and

- allow people to associate freely.

Three users required links between items, for example, a link between the tasks "buy food for dog" and "bring dog to veterinary".

The link is one of the five core CDS relation types and all three prototypes build on it. In iMapping users can drag-and-drop typed links between items. In HKW the user can even annotate, tag and link the links themselves.

Order

Ordering a collection of ideas or text snippets into a coherent flow is one of the main tasks of authoring ([13] Esselborn-Krumbiegel, 2002). A user should be able to create order gradually, for example, by stating order between some sections, but not requiring a total ordering.

Partial or total order is one of the five core dimensions in CDS. It is supported by HKW, which, for example, allows Linda to state explicitly that section A of her paper comes before B and before C, but that there is no relation yet between B and C. This allows Linda to add order to her article outline gradually and consciously, without having to remember which items were explicitly put in their place and which merely happen to be there while the list is being sorted.
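
Gradual ordering of this kind can be sketched as a partial order over items. The class below is hypothetical; treating the ordering relation as transitive and rejecting contradictory statements are assumptions of this example rather than documented HKW behaviour.

```python
class PartialOrder:
    """Hypothetical store for explicitly stated 'comes before' relations."""

    def __init__(self):
        self._after = {}  # item -> set of items stated to come directly after it

    def comes_before(self, first, second):
        """True if the order follows from the stated relations (assumed transitive)."""
        stack, seen = [first], set()
        while stack:
            current = stack.pop()
            for following in self._after.get(current, ()):
                if following == second:
                    return True
                if following not in seen:
                    seen.add(following)
                    stack.append(following)
        return False

    def state_before(self, first, second):
        """Record that `first` comes before `second`, refusing contradictions."""
        if first == second or self.comes_before(second, first):
            raise ValueError(f"{first!r} before {second!r} contradicts the stated order")
        self._after.setdefault(first, set()).add(second)

    def is_ordered(self, a, b):
        """True if any order between the two items is known."""
        return self.comes_before(a, b) or self.comes_before(b, a)


outline = PartialOrder()
outline.state_before("A", "B")
outline.state_before("A", "C")
```

At this point `outline.is_ordered("B", "C")` is false, which is exactly the information Linda would otherwise have to keep in her head.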

Hierarchy

Hierarchies of all kinds are commonly used in user interfaces to let the user narrow down their interests step-by-step. Users need ways to see multiple levels of detail at once ([14] Frank, 1988). [36] Shneiderman (1996) emphasises the need to get "Overview first, zoom and filter, then details-on-demand" (p. 336). Users in our survey required the ability to hide the level of detail to get an overview of the content. Others wished a graphical overview that represents connections and interactions between notes.

CDS has a built-in relationship type to represent hierarchies - "has context" and "has detail". HKW allows three levels of a hierarchy to be seen - current item, context of that item and details of that item. iMapping allows an infinite number of levels to be seen at once, limited only by screen resolution.

Transclusion

Users often lose the structure of knowledge cues when transforming from one tool to another. For example, text snippets from a hypertext context lose their identity when pasted into a document. Instead of copying the value of an item, it is more elegant to copy a reference to the item. If the content item is changed, the change is reflected in all parts where it is embedded. Embedding a reference and rendering the content is called transclusion. The need for transclusion is explained by [21] Ludwig (2005) and [25] Nelson (1995).

CDS makes it easy to reference all parts of a model, as each item has a globally unique URI. There is currently no tool support implemented for this in the prototypes.
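Since the prototypes do not implement transclusion, the following Python sketch is purely illustrative. It assumes an item store keyed by URI and a hypothetical Ref type; the URI shown is made up. A document is a list of parts, each either literal text or a reference, and rendering looks up the current content of every referenced item.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ref:
    """Placeholder for an item embedded by reference (hypothetical type)."""
    uri: str


def render(parts, store):
    """Join literal strings with the current content of each referenced item."""
    return " ".join(store[part.uri] if isinstance(part, Ref) else part for part in parts)


# Item store keyed by URI; the URI is invented for illustration only.
SHARK = "urn:example:item:shark-senses"
store = {SHARK: "White sharks sense electrical fields."}

article = ["Introduction.", Ref(SHARK)]
notes = [Ref(SHARK), "Check this claim."]

before = render(article, store)

# Changing the item once changes every document that transcludes it.
store[SHARK] = "White sharks can sense weak electrical fields."
after_article = render(article, store)
after_notes = render(notes, store)
```

Because both documents hold a reference rather than a copy, the edited sentence appears in the article and in the notes without any further action.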

The last set of requirements deals with adding more structure and semantics to the items and with using them.

Flexible schema

In a survey conducted by [31] Oren (2006) users expressed a requirement for flexible schemas - leave users their freedom and do not constrain them into rigid schemas.

The CDS model can be used to simply capture the semantics on a level of node-and-arc diagrams. Items represent the nodes and labelled arcs can be represented with a statement, where the content of the statement is the label of the arc. Arcs can be undirected and use the CDS built-in relationship type "related". Directed arcs are modelled with "has target". There is also the option to create more formal relationship types in lower levels of the relationship inheritance hierarchy.
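A node-and-arc reading of statements can be sketched as follows. The relation names "related" and "has target" come from the text; placing "has target" below "related" in the inheritance hierarchy, and the user-defined relation "lives in", are assumptions made for this illustration. The helper names are hypothetical.

```python
from dataclasses import dataclass

# Assumed relation inheritance: child relation -> parent relation.
PARENT = {
    "has target": "related",
    "lives in": "has target",  # hypothetical, more formal user-defined relation
}


@dataclass(frozen=True)
class Statement:
    source: str
    relation: str  # the arc label
    target: str


def is_subrelation_of(relation, ancestor):
    """True if 'relation' equals 'ancestor' or inherits from it."""
    while relation is not None:
        if relation == ancestor:
            return True
        relation = PARENT.get(relation)
    return False


def neighbours(statements, item):
    """Undirected reading: every item connected to 'item' by any kind of 'related' arc."""
    found = set()
    for s in statements:
        if is_subrelation_of(s.relation, "related"):
            if s.source == item:
                found.add(s.target)
            elif s.target == item:
                found.add(s.source)
    return found


def targets(statements, source, relation):
    """Directed reading: targets reachable from 'source' via 'relation' or a refinement."""
    return {
        s.target
        for s in statements
        if s.source == source and is_subrelation_of(s.relation, relation)
    }


statements = [
    Statement("buy food for dog", "related", "bring dog to veterinary"),
    Statement("white shark", "lives in", "ocean"),
]
```

Querying with the general relation "has target" also finds statements made with the more specific "lives in", which is the effect of subsuming relation types under more general ones.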

Structuring

One of the most often requested features (five survey respondents) was support for (re-)structuring existing structures: a PKM tool should help to structure and sort items, be easy to restructure, help to move from unstructured to more structured, organize pieces of larger text, and help to categorize items according to existing filing schemes such as taxonomies, tags, vocabularies and ontologies (survey).

QuiKey is not well suited to restructuring existing knowledge. HKW allows existing statements to be refined or changed, for example, by navigating to a statement and changing the relationship type into something different, such as something more specific. By default the auto-completion shows only refinements of the currently selected relation.

In iMapping, the user gets a graphical, zoom-able overview of all their items and can simply structure them by drag-and-drop, as on a physical pin board. After grouping related items together and moving them inside another item, a number of items can efficiently be manipulated at once. In this regard, iMapping has the same restructuring capabilities as, for example, mind-mapping ([5] Buzan, 1991) tools, but with the added value of spatial hypertext; that is, the positions of items are chosen by the user, which allows the creation of very lightweight "piles" of related items, just like on a physical desk.

Search and query

Besides browsing, a user also needs the ability to search and query the data ([14] Frank, 1988). The CDS back-end offers queries that let Linda exploit her modelling effort. In QuiKey she could ask ((white shark, has sense, ?x) AND (NOT (human, has sense, ?x))) to find out that white sharks have a sense for electrical fields that humans do not. In systems that do not allow formalising of content she would potentially have to read many articles and build up the query results in her mind. Semantic queries are especially useful for creating lists of items fulfilling certain criteria. (More details about the user interaction for creating such queries and the query language itself can be found in [16] Haller, 2008.)
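The logic of this query can be reproduced over a small in-memory set of triples. The sketch below is not the QuiKey query language or the CDS back-end; the function names and the sample statements are illustrative assumptions. The conjunction with a negated pattern amounts to a set difference over the bindings of ?x.

```python
# Illustrative statements only; Linda's real knowledge base would contain her own items.
triples = {
    ("white shark", "has sense", "sight"),
    ("white shark", "has sense", "smell"),
    ("white shark", "has sense", "electrical fields"),
    ("human", "has sense", "sight"),
    ("human", "has sense", "smell"),
}


def objects(triples, subject, predicate):
    """All bindings of ?x in the pattern (subject, predicate, ?x)."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def has_but_lacks(triples, predicate, having, lacking):
    """Bindings of ?x matching (having, predicate, ?x) AND NOT (lacking, predicate, ?x)."""
    return objects(triples, having, predicate) - objects(triples, lacking, predicate)


answer = has_but_lacks(triples, "has sense", "white shark", "human")
```

With the sample data, answer contains only "electrical fields".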

Formal knowledge

CDS allows users not only to structure but also to formalise knowledge. This allows Linda to retrieve "white shark" using an expression such as (?x, has type, Lamniformes) although she never told the system that this is true. She might just have entered (white shark, has type, Lamnidae) and also (Lamnidae, has supertype, Lamniformes). This allows the CDS back-end to deduce - using a standard RDFS reasoning engine - that white sharks also belong to the Lamniformes. It remains the responsibility of the user to decide which content should be formalised up to what degree.
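The deduction can be illustrated with a naive fixed-point computation in Python. This is not the RDFS reasoning engine used by the CDS back-end; it only mimics two RDFS-style rules, under the assumption that "has type" behaves like rdf:type and "has supertype" like rdfs:subClassOf. The function name infer_types is hypothetical.

```python
def infer_types(triples):
    """Return the input triples plus everything derivable by two rules:
    (x, has type, C) and (C, has supertype, D)      => (x, has type, D)
    (C, has supertype, D) and (D, has supertype, E) => (C, has supertype, E)
    """
    closure = set(triples)
    while True:
        derived = {
            (s, p, o2)
            for s, p, o in closure
            if p in ("has type", "has supertype")
            for s2, p2, o2 in closure
            if p2 == "has supertype" and s2 == o
        }
        if derived <= closure:  # nothing new: fixed point reached
            return closure
        closure |= derived


entered = {
    ("white shark", "has type", "Lamnidae"),
    ("Lamnidae", "has supertype", "Lamniformes"),
}

closure = infer_types(entered)

# The query (?x, has type, Lamniformes) evaluated over the inferred statements.
matches = {s for s, p, o in closure if p == "has type" and o == "Lamniformes"}
```

The statement (white shark, has type, Lamniformes) was never entered, yet "white shark" is in matches after inference.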

Interoperability

To re-use the data in other systems (particularly other knowledge management systems), Linda needs to export all items and structures into a common format. The Resource Description Framework (RDF; [6] Brickley and Guha, 2004) defines an extensible, graph-based model for integrating distributed, heterogeneous information sources. The CDS back-end represents all data (besides binary content) natively as RDF data.
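As a rough illustration of what such an export could look like, the sketch below writes statements as N-Triples lines, one of the standard RDF serialisations. The namespace http://example.org/cds/ and the way labels are turned into URIs are assumptions made for this example; the actual CDS vocabulary URIs are not given in this text.

```python
from urllib.parse import quote

BASE = "http://example.org/cds/"  # hypothetical namespace, not the real CDS vocabulary


def uri(label):
    """Turn a human-readable label into an N-Triples URI reference (simplified)."""
    return f"<{BASE}{quote(label.replace(' ', '_'))}>"


def to_ntriples(triples):
    """Serialise (subject, predicate, object) label triples, one statement per line."""
    return "\n".join(f"{uri(s)} {uri(p)} {uri(o)} ." for s, p, o in sorted(triples))


exported = to_ntriples({
    ("white shark", "has type", "Lamnidae"),
    ("Lamnidae", "has supertype", "Lamniformes"),
})
```

Any RDF-aware tool can read such lines; a production export would more likely rely on an RDF library and stable item URIs rather than on labels.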

Related work

Semantic wikis are designed and used not only for collaborative use but also for PKM (Oren, 2005, cited in [8] Decker et al., 2005; [32] Oren et al., 2006). Semantic wikis allow stepwise formalisation of content. First, a page is created, then it is filled with text, spell-corrected, structured, restructured, and linked to other pages. Then links are typed and pages linked to categories. Ironically, just as with paper-based approaches, changing things is not that easy in semantic wikis. Tasks such as renaming a relation typically require an administrator to run scripts over the database, as the wiki source text of many pages needs to be changed. In addition, a common use-case of PKM tools is the need to import knowledge from external sources. In most semantic wikis, imported semantic data has to be represented by artificially generated wiki syntax inserted into pages, which does not integrate easily with existing content.

[21] Ludwig (2005) sees redundancy within and among documents as a hurdle to efficient information usage. He questions whether documents are the best container for knowledge representations and proposes to work more directly with redundancy-free semantic knowledge management systems. In such a system, the traditional notion of a document is replaced by virtual documents, which render parts of the knowledge base as an interactive tree.

[2] Bernstein (2006) describes TinderBox, a personal content management assistant, which offers sophisticated HTML generation via templates.

Both systems ([2] Bernstein, 2006; [21] Ludwig, 2005) allow end-users to construct ontologies out of their linked information objects. The same direction can be observed in the larger fields of semantic desktop ([8] Decker et al., 2005) and semantic wiki ([42] Völkel and Schaffert, 2006).

Conclusions

In an attempt to create a lean vocabulary for incremental recording and step-wise formalisation of personal knowledge, we identified a set of common knowledge structures. These conceptual data structures (CDS) were found to be inherent in a variety of different knowledge artefacts, ranging from vague paper notes to highly structured documents.

The CDS model allows knowledge to be gradually represented in various degrees of formalisation in a uniform fashion. As a lightweight top-level ontology about relation types, CDS is designed to bridge the gap between unstructured content (e.g. informal notes) and formal semantics (e.g. ontologies) by allowing the use of vague semantics and by subsuming arbitrary relation types under more general ones.

It serves two purposes:

as a guideline for future PKM tools, providing a set of crucial structural primitives; and

as a knowledge exchange format, based on the RDF representation of CDS.

The three prototypical CDS front-end tools described here show the variety of visualisation and interaction paradigms that can benefit from using this common data model.

Part of this work has been funded by the European Commission in the context of the IST NEPOMUK IP - The Social Semantic Desktop (FP6-027705). Another part of this work has been undertaken in "WAVES - Wissensaustausch bei der verteilten Entwicklung von Software" (see http://waves.fzi.de/), funded by BMBF, Germany. The views expressed in this paper are the views of the authors but not necessarily the views of any sponsor.

References

1. Alvarado, C., Teevan, J., Ackerman, M.S. and Karger, D. (2003), "Surviving the information explosion: how people find their electronic information", Technical Report AIM-2003-006, MIT AI Lab, Cambridge, MA.

2. Bernstein, M. (2006), "Shadows in the cave: hypertext transformations", paper presented at the Symposium on Interactive Visual Information Collections and Activity (IVICA), College Station, TX, 1-2 October, available at: www.csdl.tamu.edu/ivica/papers/L2-2-Bernstein.pdf (accessed 15 January 2009).

3. Braganza, A. and Mollenkramer, G.J. (2002), "Anatomy of a failed knowledge management initiative: lessons from PharmaCorp's experiences", Knowledge and Process Management, Vol. 9 No. 1, pp. 23-33.

4. Bush, V. (1945), "As we may think", Atlantic Monthly, Vol. 176, pp. 101-8.

5. Buzan, T. (1991), Use Both Sides of Your Brain: New Mind-mapping Techniques, 3rd ed., Plume, New York, NY.

6. Brickley, D. and Guha, R. (2004), "RDF vocabulary description language 1.0: RDF schema", available at: www.w3.org/TR/rdf-schema/ (accessed 16 December 2008).

7. Davenport, T.H. (2005), Thinking for a Living: How to Get Better Performances and Results from Knowledge Workers, Harvard Business School Press, Boston, MA.

8. Decker, S., Park, J., Quan, D. and Sauermann, L. (Eds) (2005), The Semantic Desktop - Next Generation Information Management and Collaboration Infrastructure, Vol. 175, CEUR Workshop Proceedings, ISWC Workshop, Galway.

9. Despres, C. and Chauvel, D. (2000), Knowledge Horizons: The Present and Promise of Knowledge Management, Butterworth-Heinemann, Boston, MA.

10. Drucker, P.F. (1999a), "Knowledge-worker productivity: the biggest challenge", California Management Review, Vol. 41 No. 6, pp. 79-94.

11. Drucker, P.F. (1999b), Management Challenges for the 21st Century, HarperBusiness, New York, NY.

12. Engelbart, D. (1963), A Conceptual Framework for the Augmentation of Man's Intellect, Spartan Books, Washington, DC, pp. 1-29.

13. Esselborn-Krumbiegel, H. (2002), Von der Idee zum Text. Eine Anleitung zum wissenschaftlichen Schreiben, 2nd ed., Schöningh-Utb, Paderborn.

14. Frank, G.H. (1988), "Reflections on notecards: seven issues for the next generation of hypermedia systems", Communications of the ACM, Vol. 31 No. 7, pp. 836-52.

15. Haller, H. (2006), "iMapping - a graphical approach to semi-structured knowledge modelling", paper presented at the 3rd International Semantic Web User Interaction Workshop (SWUI2006), Athens, GA, 6 November, available at: www.aifb.uni-karlsruhe.de/WBS/hha/papers/iMapping_SWUI2006_paper.pdf (accessed 15 January 2009).

16. Haller, H. (2008), "QuiKey", Proceedings of the Workshop on Semantic Search at the 5th European Semantic Web Conference, Vol. 334, pp. 74-8.

17. Jacobs, I. and Walsh, N. (2004), "Architecture of the World Wide Web, Volume One", W3C, available at: www.w3.org/TR/webarch/ (accessed 16 December 2008).

18. Jones, W. and Bruce, H. (2005), "A report on the NSF-sponsored workshop on personal information management", report, Seattle, WA, 27-29 January, available at: http://pim.ischool.washington.edu/final%20PIM%20report.pdf (accessed 15 January 2009).

21. Ludwig, L. (2005), "Semantic personal knowledge management", Technical Report D11.01_v0, DERI, Galway.

22. Maier, R. and Schmidt, A. (2007), "Characterizing knowledge maturing: a conceptual process model for integrating e-learning and knowledge management", in Gronau, N. (Ed.), 4th Conference Professional Knowledge Management - Experiences and Visions (WM '07), Potsdam, Vol. 1, GITO, Berlin, pp. 325-34.

23. Maurer, H. (1999), "The heart of the problem: knowledge management and knowledge transfer", Proceedings of ENABLE'99, Espoo-Vantaa Institute of Technology, Vantaa, pp. 8-17.

24. Miller, G. (1956), "The magical number seven, plus or minus two: some limits on our capacity for processing information", Psychological Review, Vol. 63, pp. 81-97.

25. Nelson, T.H. (1995), "The heart of connection: hypermedia unified by transclusion", Communications of the ACM, Vol. 38 No. 8, pp. 31-3.

27. Nonaka, I. (1994), "A dynamic theory of organizational knowledge creation", Organization Science, Vol. 5 No. 1, pp. 14-37.

28. Nonaka, I. and Konno, N. (1998), "The concept of "ba": building a foundation for knowledge creation", California Management Review, Vol. 40 No. 3, pp. 40-54.

29. Nonaka, I. and Takeuchi, H. (1995), The Knowledge-creating Company: How Japanese Companies Create the Dynamics of Innovation, Oxford University Press, New York, NY.

30. North, K. (2007), "Produktive Wissensarbeit", paper presented at the 5. Karlsruher Symposium für Wissensmanagement in Theorie und Praxis, Karlsruhe, 11 October.

31. Oren, E. (2006), "An overview of information management and knowledge work studies: lessons for the semantic desktop", paper presented at the Semantic Desktop and Social Semantic Collaboration Workshop, the International Semantic Web Conference, Athens, GA, 6 November.

32. Oren, E., Völkel, M., Breslin, J.G. and Decker, S. (2006), "Semantic wikis for personal knowledge management", Database and Expert Systems Applications, Vol. 4080/2006, Springer, Berlin/Heidelberg, pp. 509-18.

33. Polanyi, M. (1966), Tacit Dimension, Routledge and Kegan Paul, London.

34. Schütt, P. (2003), "The post-Nonaka knowledge management", Journal of Universal Computer Science, Vol. 9 No. 6, pp. 451-62.

35. Shneiderman, B. (1989), "Reflections on authoring, editing, and managing hypertext", in Barrett, E. (Ed.), The Society of Text: Hypertext, Hypermedia, and the Social Construction of Information, MIT Press, Cambridge, MA, pp. 115-31.

36. Shneiderman, B. (1996), "The eyes have it: a task by data type taxonomy for information visualizations", VL '96: Proceedings of the 1996 IEEE Symposium on Visual Languages, IEEE Computer Society, Washington, DC.

37. Shneiderman, B. (1998), Designing the User Interface, Addison Wesley, Reading, MA.

38. Stankosky, M. (2005), Creating the Discipline of Knowledge Management: The Latest in University Research, Butterworth-Heinemann, New York, NY.

39. Taylor, F.W. (1911), The Principles of Scientific Management, Harper and Brothers, New York, NY.

40. Völkel, M. (2007), "A semantic web content model and repository", Proceedings of the 3rd International Conference on Semantic Technologies, JUCS, Graz, pp. 254-61.

41. Völkel, M. and Abecker, A. (2008), "Cost-benefit analysis for the design of personal knowledge management systems", in Cordeiro, J. and Filipe, J. (Eds), ICEIS 2008 - Proceedings of the Tenth International Conference on Enterprise Information Systems, Volume AIDSS, Barcelona, June 12-16, Springer, Berlin, pp. 95-105.

42. Völkel, M. and Schaffert, S. (Eds) (2006), Proceedings of the First Workshop on Semantic Wikis - From Wiki To Semantics, FZI Forschungszentrum Informatik Karlsruhe, Karlsruhe.

43. Völkel, M., Haller, H., Bolinder, W., Davis, B., Groth, K., Gudjónsdóttir, R., Kotelnikov, M., Lannerö, P. and Lindquist, S. (2008), "Conceptual data structure tools", Deliverable 1.2, Nepomuk Consortium, Karlsruhe.

Indexing (document details)

Subjects:	International, Studies, Knowledge management, Personal information, Models, World Wide Web, User interface
Classification Codes	9179 Asia & the Pacific, 9130 Experimental/theoretical, 5200 Communications & information management
Author(s):	Max Völkel, Heiko Haller
Author Affiliation:	Max Völkel, Forschungszentrum für Informatik, Universität Karlsruhe, Karlsruhe, Germany Heiko Haller, Forschungszentrum für Informatik, Universität Karlsruhe, Karlsruhe, Germany
Document types:	Feature
Document features:	References, Illustrations, Charts
Publication title:	Online Information Review. Bradford: 2009. Vol. 33, Iss. 2; pg. 298
Source type:	Periodical
ISSN:	14684527
ProQuest document ID:	1882655981
Text Word Count	7596
DOI:	10.1108/14684520910951221
Document URL:	http://proquest.umi.com/pqdweb?did=1882655981&sid=4&Fmt=3&clientId=12687&RQT=309&VName=PQD
