

Proc. Int’l Conf. on Dublin Core and Metadata Applications 2010

Metadata for WICRI, a Network of Semantic Wikis for Communities in Research and Innovation

Jacques Ducloy

DRRT Lorraine, France


Thierry Daunois

Univ. Lorraine, France


Muriel Foulonneau

CRP Tudor, Luxembourg


Alice Hermann

INSA Rennes, France


Jean-Charles Lamirel

Univ. Lorraine, France


Stéphane Sire

EPFL, Switzerland


Jean-Pierre Thomesse

Univ. Lorraine, France


Christine Vanoirbeek

EPFL, Switzerland



This paper introduces metadata issues in the framework of the WICRI project, a network of semantic wikis for communities in research and innovation. A wiki can be related to an institution, to a research field (currently, mainly environment or ICT), or to a regional entity. Metadata and semantic items play a strategic role in maintaining the quality and consistency of the network. An important point is the “wiki way of working”, in which a metadata specialist and a scientist familiar with abstract formalisms can work together, at the same time, on the same pages. Some first experiments in designing metadata are presented. A wiki serving as an encyclopedia of metadata is proposed, and related technical issues are discussed.

Keywords: network of wikis; Semantic MediaWiki; metadata encyclopedia; e-Science; CRIS.

1. Introduction

Since March 1995, when Ward Cunningham launched WikiWikiWeb, a collaborative web site devoted to software, wikis have played an increasing role in scientific information systems. This paper analyzes the place of metadata in a large wiki network. When a working group launches a “tiny lonesome wiki” dealing with a very specific topic, metadata is not perceived as playing an important role. This perception changes with the size of an application, for instance Wikipedia, or with its complexity, for instance a network of more than 100 wikis. For our part, we are starting a project in which we have to deal with a large network of semantic wikis.

As Wikipedia reaches 3,000,000 articles, a large share of them on scientific topics, the need for metadata becomes ubiquitous. For instance, its statistics for January 2010 [1] give 259,000 templates and 552,000 categories. The “animal” page requires several specialists (programming for templates, communication for readability, semantics for the “poly taxonomy” design, zoology, paleontology…) to work together. They share the same pages and can modify the related metadata at almost the same time. Is the success of Wikipedia not therefore based mainly on the consistency of the encyclopedia and, hence, on its metadata system?

The architecture hosted by the Wikimedia Foundation is rather centralized: a multilingual family around the English version, supplemented by specialized wikis. On the other hand, most wikis found so far in research organizations are quite monolithic. What would happen if several scientific communities aimed to build an editorial collection distributed over a network of semantic wikis? We are discovering the extent of this problem in the WICRI project.

This article aims to identify several metadata issues we faced when starting this network. WICRI stands for "WIkis for Communities in Research and Innovation". Right now, WICRI is a demonstrator containing about sixty wikis; some of them are designed on a regional or institutional basis, others are related to scientific topics. The knowledge architecture to be designed is much the same as the one that would be required for several thousand wikis. Metadata therefore plays a crucial role. Semantic wikis introduce a new generation of metadata, allowing knowledge modeling in an RDF framework, which is worth considering.

In this paper, we will first introduce the WICRI network; then we will present the initial technical choices we started with. We will then discuss the remaining issues from two perspectives: that of a contributor faced with writing metadata, and that of new services to help with this task.

Note: This article was written using a collaborative practice, in the same way as our DC 2006 paper (Ducloy et al. 2006). It will be published in two versions: a traditional one on the web site of the conference, and a “Wicrified” [2] one on the Artist wiki [3].

2. WICRI, a Network of Wikis for Research and Innovation

Customizing Wikipedia for Research and Innovation

Wikipedia has demonstrated the value of the wiki approach for building and disseminating common knowledge on a very large scale. It provides a first, but not sufficient, answer to research needs. Academic institutions are still suspicious about Wikipedia's validity. As a result, transparency of contributions and validity assessment are absolutely necessary for WICRI: its infrastructure must include registration processes driven by institutional entities. These institutions must therefore find an advantage in "investing" in wikis. Hence, in a network, each partner can manage its own wiki and promote its own visibility.

Our first experience has highlighted important issues from an editorial point of view. For instance, publishing new results of research activities does not comply with Wikipedia's practices: Wikipedia's contributors must present information attested by external references. On WICRI, such results must be written under the control (or moderation) of scientific committees. This approach is being tested with a journal (AMETIST) published in the network. Publishing authored articles implies a very constrained way of modifying the original text, i.e. changes are limited to adding links to articles explaining a particular topic, or to a discussion area.

A networked framework makes it possible to manage several editorial strategies, mainly institutional, thematic and regional. The first demonstrator was built with a few institutional wikis. It showed that, if several pages deal with the same topic, it is better to develop this topic specifically on a thematic wiki. Several thematic wikis have therefore been introduced. Consequently, one topic can be described in different ways on different wikis. A small team, mainly three people in the same office, has operated the demonstrator. Even with such a small group, consistency problems have emerged, underlining the need for effective handling of metadata.

2.1 Different Classes of Wikis

The WICRI network accepts two main classes of wikis. An entity can open an institutional wiki. A regional one has a two-part identifier: the region name followed by an acronym; e.g. Lorraine/SGE stands for the research cluster SGE (environmental sciences and engineering, Sciences et Génie de l'Environnement) in the Lorraine area. For a wiki of a scientific working group, the first part is a code identifying the theme; for instance, ICT/Artist belongs to the Artist workgroup, which deals with Information and Communication Technologies.

The global WICRI community can also design a common wiki. Whether or not it is managed by an organization, it fully shares the common rules and is moderated by independent scientific committees. A common wiki has an identifier with Wicri/ as its first part, e.g. Wicri/Lorraine or Wicri/Water.

Institutional wikis may have specific rules that differ from those of the common wikis. For example, an institutional wiki can be open to anonymous contributions or, on the contrary, be even more strictly limited. The editorial line can also differ strongly from WICRI's.

In the WICRI network, most wikis are related to a "family" (i.e. a set of wikis, one for each language, connected by interwiki links). Wicri/Water (fr) refers to the French component of the family, and Wicri/Water (en) to the English one.
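The naming conventions of this section can be illustrated by a small parser. This is only a sketch: the identifier shape (a prefix, a slash, a name, and an optional two-letter language code in parentheses) is inferred from the examples above, and the names `WikiId` and `parse_wiki_id` are hypothetical, not part of the WICRI software.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed shape, inferred from examples such as "Lorraine/SGE" or
# "Wicri/Water (fr)": "<prefix>/<name>" plus an optional " (<lang>)".
_ID_PATTERN = re.compile(
    r"^(?P<prefix>[^/\s]+)/(?P<name>[^()]+?)(?:\s*\((?P<lang>[a-z]{2})\))?$"
)


@dataclass(frozen=True)
class WikiId:
    prefix: str          # "Wicri", a region name, or a theme code
    name: str            # acronym or topic, e.g. "SGE" or "Water"
    lang: Optional[str]  # member of a multilingual family, if stated

    @property
    def is_common(self) -> bool:
        # Common wikis are the ones whose identifier starts with "Wicri/".
        return self.prefix == "Wicri"


def parse_wiki_id(text: str) -> WikiId:
    """Split a WICRI-style identifier into its parts."""
    match = _ID_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not a WICRI-style identifier: {text!r}")
    return WikiId(match["prefix"], match["name"].strip(), match["lang"])
```

For instance, `parse_wiki_id("Wicri/Water (fr)")` yields a common wiki named `Water` with language `fr`, while `parse_wiki_id("ICT/Artist")` yields a non-common wiki with no language.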

2.2 The Current WICRI Network

At the beginning of 2010, the WICRI network contains almost 30 common wikis. A first set is based on a regional framework, such as Wicri/Lorraine. Another set is devoted to thematic fields. At this time, one wiki, Wicri/Ticri, is related to “Information & Communication Technology” (a Dublin Core portal is included). Four wiki families deal with the environment: Wicri/Water, /Woods, /Biomass and /UrbanSoils. They contain information system items (such as program committees) and editorial texts (scientific articles, scientific surveys).

FIG. 1. The current WICRI network (a subset)

A few common wikis have been designed to ensure the global consistency of the network. The most visible, Wicri/Wicri, gives a global view of the network: all topics must appear there and link to more detailed pages or desks in other wikis. Wicri/Media, an image repository, plays the same role as Wikimedia Commons. Concerning metadata handling, Wicri/Base contains templates and semantic items, which can be used in all other wikis. Most institutional wikis have relationships with a regional wiki and with one or two thematic wikis. The whole network can be browsed, using ontologies, along a thematic path, and can also be used as an information system.

2.3 WICRI: a Networked Current Research Information System

A Current Research Information System, commonly known as "CRIS", is any information tool dedicated to providing access to and disseminating research information, such as People, Projects, Organizations, Results (publications, patents and products), Facilities, and Equipment (EuroCRIS, 2009).

The European Commission supports the CRIS approach through the CERIF (Common European Research Information Format) recommendation [4]. This way of working is spreading worldwide, for instance at the USDA (United States Department of Agriculture) [5]. Such a system could play a very strategic role in the WICRI network, acting as a structural skeleton. This approach resembles those of Jeffery (2007) and Erbach (2006), who would like to merge organization-related items (CRIS) with open archives in order to produce an e-Science infrastructure (Jeffery, 2005).

FIG. 2. Integrating a CRIS on wiki.

WICRI wants to go one step further, to obtain a highly detailed and understandable CRIS while using the editorial facilities of wikis to provide a human-readable summary. In this perspective, semantic wikis could provide a technical basis for implementing a CRIS as a skeleton.

2.4 Initial Technical Issues

The WICRI project aims to set up a set of services. Starting from the initial demonstrator, it is becoming a digital infrastructure on which pragmatic solutions are promoted, following Zack Rosen's advice to Semantic Web researchers (2009): “Researchers need to stop thinking of themselves as researchers and start thinking of themselves as implementors”.

A wiki engine thus had to be chosen at the start of the project. A priority was to allow as many researchers as possible to disseminate their results to as many potentially interested actors as possible; in other words, to be fully compatible with Wikipedia. Accordingly, MediaWiki [6] was chosen as the engine of the WICRI network. Used by Wikipedia, it is very popular in research and innovation contexts. A strong advantage is the possibility of using Semantic MediaWiki, which provides an “extension that enables wiki-users to semantically annotate wiki pages, based on which the wiki contents can be browsed, searched, and reused in novel ways” (Krötzsch, 2007).

A substantial investment is needed to achieve a level of functionality comparable to that of Wikipedia. This means supplementing the functionality of MediaWiki with templates that are commonly used in Wikipedia, so that a new contributor is not disoriented when moving from Wikipedia to WICRI. The wiki Wicri/Base was created specifically to manage the collection of templates (and also semantic items) needed throughout the network.

It is important to point out here that the choice of MediaWiki is not exclusive. The network can theoretically support different engines. Due to the small size of the current WICRI team, we have limited our choice, at least temporarily, to a single engine type.

3. Writing a Networked Hypertext with Formulas and Metadata

In most content management systems, which were designed “before blogs and wikis”, a clear barrier exists between editing content and programming or managing metadata. Since scientists usually write mainly short, isolated papers, digital libraries are reduced to the storage of isolated papers in archives or various databases. Can we expect the global consistency of a knowledge domain to be provided, for a human reader, only by ontologies and semantic properties or, why not, “using a magic wand”, through folksonomies?

In contrast, on a wiki, any actor can handle all these activities (from programming to writing content) at any time on any page. Wikipedia acts as a digital library in which the experts of a given scientific area can directly design a portal. Authors may write pages and the associated metadata at the same time. They do not write “one or several papers”, but a “human brain designed” hypertext.

In this section, we will focus on several aspects of writing a scientific, readable and networked hypertext: handling scientific objects and knowledge; submitting given information in different ways, in different contexts, for different audiences.

3.1 Semantic Wikis for Scientific Objects

Scientists and engineers usually work with many technical objects, such as formulas, drawings and 3D images, and not only with text. For this purpose, they must very often use formal notation, and not only WYSIWYG interfaces. This practice can be considered as training for entering knowledge items or metadata. In other words, improving the handling of scientific objects would improve the quality of metadata or semantic facilities. MediaWiki is quite poor at handling formulas or drawings, so technical support is soon needed to obtain the corresponding extensions (for instance “imagemap” [7] or LaTeX [8]). The Proteopedia project (Hodis, 2008) goes one step further, tackling the management of molecular items such as proteins, RNA, DNA and other macromolecules [9]. The contributor can set several kinds of molecular interaction by using “green links” in the wiki text. These links interact with a Java applet (Jmol). Generalizing this approach requires more complete XML support [10], with contributors who have acquired good practice of markup languages. In such a context, handling the syntax of metadata or semantic items would be quite easy. The difficulties would come from designing global knowledge collectively. For instance, several sets of taxonomies about living species are implemented in Wikipedia. A comparison between several language versions, Commons and WikiSpecies shows a multipurpose use of three classification schemes [11].

Regarding Semantic MediaWiki in science, a first set of applications deals with organizational issues; for instance, semanticWeb.org and openResearch.org provide a semantic model for scientific events. Another set aims at building or curating ontologies. However, we have not yet found wikis that use ontologies to handle scientific objects for an editorial purpose. While adapting the semanticWeb.org model, we encountered several difficulties, due both to the variety of situations occurring in different communities and to the translation into French.

Moreover, Semantic MediaWiki is not a universal solution. For instance, SWiM (Lange, 2008), a semantic wiki for mathematical knowledge management [12], handles mathematical formulas better than the LaTeX extension of SMW.

3.2 Different Writing in Different Contexts for Different Audiences

In the WICRI context, most information has to be developed several times on different wikis. For instance, each research project with several partners must be cited and commented on in the regional wiki of each partner, as well as in all relevant thematic wikis. Here are three examples we encountered, all related to DCMI activities: a city (Pittsburgh), a scientific paper, and a call for papers.

Pittsburgh appears on at least three wikis. On Wicri/Ticri, this city is linked to DC 2010, and the corresponding page covers the main activities related to information science [13]. On Wicri/Water, the content deals with the confluence of the Allegheny and Monongahela rivers, which forms the Ohio River. On Wicri/Wicri, the page gives general facts and provides commented links to the other pages. These three pages are related to the same topic, but display clearly distinct content.

The second example is a translation of Lagoze’s paper, “Qu’est-ce qu’une bibliothèque numérique, au juste?” (Lagoze, 2005). In ICT/Artist, the paper is integrated in the portal of the Ametist journal, in which it was first translated [14]. As a reference paper, it has been copied to Wicri/Ticri, where the anchors and links are quite different from those on ICT/Artist. Since this paper's introduction could reach a very large audience, this part is exclusively displayed on Wicri/Wicri.

The third example is an ICT conference held in Lorraine, whose call for papers is duplicated on two wikis, Wicri/Ticri and Wicri/Lorraine. Table 1 shows different ways of managing the relationships between this event and the committee members. The event model of semanticweb.org is used, with the properties Has PC member and Has OC member. Paul Dupont, who works in Lorraine, is always qualified with the property Has PC member. On Wicri/Lorraine, John Smith is only linked to Wicri/Ticri with an interwiki link, [[ticri.en:John Smith]], because he has no author page on Wicri/Lorraine (up to now, SMW does not provide semantic links between different wikis).

TABLE 1: A part of a page relative to a conference happening in Nancy.

The committee as it appears on every page.

Program Committee

  • Paul Dupont, Nancy (Fr)

  • John Smith, London (UK)

Organizing Committee

  • Jean Durand, Nancy (Fr)

As it would be coded in a thematic (i.e. Ticri) wiki.

PC members are qualified by properties. OC members have only interwiki links

==Program Committee==

* [[Has PC member::Paul Dupont]], Nancy (Fr)

* [[Has PC member::John Smith]], London (UK)

==Organizing Committee==

* [[wicri-lor.fr:Jean Durand|Jean Durand]], Nancy (Fr)

As it would be coded in a regional (Lorraine) wiki.

Only local PC or OC members are qualified by properties.

==Program Committee==

* [[Has PC member::Paul Dupont]], Nancy (Fr)

* [[ticri.en:John Smith|John Smith]], London (UK)

==Organizing Committee==

* [[Has OC member::Jean Durand]], Nancy (Fr)
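The two codings in Table 1 appear to follow one rule: a member who has an author page on the current wiki is annotated with a semantic property, and any other member gets a plain interwiki link. The sketch below reproduces the table under that assumption. The `Member` class, the `home_wikis` field and the choice of interwiki prefixes as wiki keys are illustrative, not part of Semantic MediaWiki or of the WICRI tools.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Member:
    name: str
    city: str
    role: str                   # "PC" or "OC"
    home_wikis: FrozenSet[str]  # interwiki prefixes holding an author page


def committee_line(member: Member, current_wiki: str) -> str:
    """Return the wikitext bullet for one committee member on one wiki."""
    if current_wiki in member.home_wikis:
        # An author page exists locally: use a semantic property.
        link = f"[[Has {member.role} member::{member.name}]]"
    else:
        # No local page: fall back to an interwiki link. Taking the
        # alphabetically first home wiki is an arbitrary choice.
        target = sorted(member.home_wikis)[0]
        link = f"[[{target}:{member.name}|{member.name}]]"
    return f"* {link}, {member.city}"


# Sample data matching Table 1 (where each person has a page is assumed).
PAUL = Member("Paul Dupont", "Nancy (Fr)", "PC",
              frozenset({"ticri.en", "wicri-lor.fr"}))
JOHN = Member("John Smith", "London (UK)", "PC", frozenset({"ticri.en"}))
JEAN = Member("Jean Durand", "Nancy (Fr)", "OC", frozenset({"wicri-lor.fr"}))
```

Calling `committee_line(JOHN, "wicri-lor.fr")` gives the interwiki form used on the regional wiki, and `committee_line(JOHN, "ticri.en")` gives the property form used on the thematic wiki.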

3.3 Managing Network Consistency

For the WICRI project, a critical issue is to manage network consistency. Here is an example that involves a large set of pages about geographic items such as countries, towns, etc.

FIG. 3. Interlinks between geographic items

When a new city appears on a given wiki, the contributor should in principle preserve the connectivity of the networked hypertext. Fig. 3 gives an example with the city of Nancy in an institutional wiki (Artist). The Nancy page on ICT/Artist must be linked to the Lorraine, France and Europe pages on the same wiki (these pages must be created if necessary). It must also be linked to the Nancy page on Wicri/Ticri, Wicri/Wicri, and so on. In a multilingual context, this graph must be duplicated, taking care of translation (for instance, the page name for Lorraine would be "Lorraine (region)" in English, for disambiguation reasons).

To be better understood by a reader, this consistency needs to be explained in text. Automatic tools could provide an initial structure, but contributors must also be involved in writing explanations. Thus, managing network consistency and the related metadata is a cooperative task involving both human contributors and computers.
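The bookkeeping behind the Nancy example can be sketched as follows. This is an illustration under simplifying assumptions: the containment hierarchy is a plain dictionary, pages are identified by `(wiki, title)` pairs, and the function names are hypothetical. It only lists the links a tool could propose; the explanatory text remains the contributor's job.

```python
from typing import Dict, Iterable, List, Set, Tuple

Page = Tuple[str, str]  # (wiki identifier, page title)

# Assumed containment hierarchy for the example in the text.
PARENT: Dict[str, str] = {
    "Nancy": "Lorraine",
    "Lorraine": "France",
    "France": "Europe",
}


def ancestors(place: str, parent: Dict[str, str]) -> List[str]:
    """Return the enclosing geographic items, nearest first."""
    chain = []
    while place in parent:
        place = parent[place]
        chain.append(place)
    return chain


def required_links(place: str, wiki: str, other_wikis: Iterable[str],
                   existing: Set[Page]) -> Tuple[List[Page], List[Page]]:
    """Return (links to add, pages to create) for a new page on `wiki`."""
    links: List[Page] = []
    to_create: List[Page] = []
    for item in ancestors(place, PARENT):
        # Links to the enclosing items on the same wiki.
        links.append((wiki, item))
        if (wiki, item) not in existing:
            to_create.append((wiki, item))
    for other in other_wikis:
        # Links to the page about the same place on the other wikis.
        links.append((other, place))
    return links, to_create
```

With `existing = {("ICT/Artist", "France")}`, a new Nancy page on ICT/Artist needs five links, and the Lorraine and Europe pages still have to be created on that wiki.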

4. Metadata for Authors and Contributors

All these pages are mainly written by human contributors, not by computers. Computers can help in various ways but, in the end, contributors make the pages. In a repository-based network using OAI-PMH, computer protocols share controlled metadata and provide consistency. In a wiki network, a contributor can write on many wikis and interact with metadata that plays a crucial role in the authoring process. This section introduces a new wiki for supporting metadata design.

4.1 Introducing Wicri/Metadata

Almost any contributor may have to create metadata in the WICRI network. Take the example of writing a call for papers, whose first sentence reads: DCMI announces that DC-2010 will be held in Pittsburgh. How should it be written in a semantic wiki with the right properties? According to the user manual of Semantic MediaWiki, introducing a new property seems very easy: you just have to contribute something like this:

[[organizer::DCMI]] announces that DC-2010 will be held in [[place::Pittsburgh]]

When the “Save page” button is pressed, the relations and, if needed, the properties are created. The true problem is thus not syntax but semantics: how should a property be chosen and named? For instance, for the role of DCMI in the DC conference, we could write: organizer, has organizer, has global organizer, DC:contributor, etc.
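To make the syntax side concrete, the following sketch extracts `[[property::value]]` annotations from a piece of wikitext and returns them as (page, property, value) triples. It is a simplified reading of the annotation syntax, not Semantic MediaWiki's own parser; the regular expression, the function name and the page title `DC-2010` are assumptions made for illustration.

```python
import re
from typing import List, Tuple

# Matches [[property::value]] and [[property::value|label]].
# Ordinary and interwiki links, which use a single colon, do not match.
ANNOTATION = re.compile(r"\[\[([^\[\]:|]+)::([^\[\]|]+)(?:\|[^\[\]]*)?\]\]")


def extract_annotations(page_title: str,
                        wikitext: str) -> List[Tuple[str, str, str]]:
    """Return one (page, property, value) triple per annotation found."""
    return [
        (page_title, prop.strip(), value.strip())
        for prop, value in ANNOTATION.findall(wikitext)
    ]


SENTENCE = ("[[organizer::DCMI]] announces that DC-2010 will be held in "
            "[[place::Pittsburgh]]")
TRIPLES = extract_annotations("DC-2010", SENTENCE)
```

Here `TRIPLES` holds two statements about the page, one for `organizer` and one for `place`. The extraction is mechanical; nothing in it says whether `organizer` was the right name to pick.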

A look at semanticweb.org illustrates this difficulty [15]. The “Property namespace” contains 773 pages; 768 are real properties; 277 pages are classified as “wanted properties” (without an explicit page). Looking for DC:creator, we found several variants. The preferred term is “Has author” (frequency 99). The most used term is “Author” (1058). The expression “Written by” appears 35 times. Finally, “Author of”, “Content author”, and “Creator” each appear once. In WICRI, the problem that we have pointed out for semanticweb.org is distributed over a network. The following questions therefore have to be addressed. How can a contributor know whether a property exists in the semantic model of the wiki? How should a name be chosen for a new property, consistently with the existing ones? In a multilingual family of wikis, how can metadata items be translated?
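The figures above can be turned into a small worked example. The sketch below uses the frequencies quoted for the DC:creator variants to measure how much usage already follows the preferred term, and uses the standard library's `difflib.get_close_matches` to suggest existing property names that resemble a proposed one. The function names are hypothetical, and string similarity is only a crude proxy for the semantic question raised here.

```python
import difflib
from collections import Counter
from typing import Dict, Iterable, List

# Frequencies quoted in the text for variants of DC:creator.
USAGE = Counter({
    "Author": 1058,
    "Has author": 99,
    "Written by": 35,
    "Author of": 1,
    "Content author": 1,
    "Creator": 1,
})


def consolidation_report(usage: Counter, preferred: str) -> Dict[str, float]:
    """Summarize how many annotations already use the preferred name."""
    total = sum(usage.values())
    already = usage[preferred]
    return {
        "total": total,
        "already_preferred": already,
        "to_rewrite": total - already,
        "share_preferred": already / total,
    }


def suggest_existing(name: str, known: Iterable[str],
                     limit: int = 3) -> List[str]:
    """Suggest existing property names that look like `name`."""
    by_lower = {k.lower(): k for k in known}
    hits = difflib.get_close_matches(name.lower(), list(by_lower),
                                     n=limit, cutoff=0.6)
    return [by_lower[h] for h in hits]


REPORT = consolidation_report(USAGE, "Has author")
```

`REPORT["total"]` is 1195 and `REPORT["to_rewrite"]` is 1096, so under these figures the preferred term covers less than a tenth of actual usage. A lookup such as `suggest_existing("Has autor", USAGE)` can catch near-duplicates by spelling, but it cannot relate "Written by" to "Has author"; that still requires a curated mapping.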

We propose to set up a wiki about metadata with an encyclopedic philosophy. Several wikis dedicated to metadata exist on the web, for instance at DCMI (Enoksson, 2008), but they are usually intended for specialists and often tied to a particular schema. Here, we want to be understood by a non-specialist [16] who has to deal with many topics at the same time.

4.2 Main Lines for Wicri/Metadata

Metadata are related to a model (possibly expressed through an ontology in a semantic wiki) that represents the structure of the wiki and the properties of wiki resources. Each wiki can be created with several specific domain models (for instance, we use the FAO World Reference Base for soil resources in Wicri/UrbanSoils) and several general models (for instance the research event model used in semanticweb.org). Moreover, some concepts may exist in different languages. As a result, different wikis may use close or similar concepts from different models. A specific wiki, called Wicri/Base, was created early on to provide common tools for the WICRI community, including templates, particular metadata sets (e.g. Semantic Infobox Laboratory) and metadata elements. But this wiki only deals with items on which there is a strong consensus. For its part, Wicri/Metadata must help in building this consensus.

4.2.1 Representing General Research Resources

The main function of Wicri/Metadata is to provide elements for defining metadata related to general resources of scientific communication. It relates to CRIS as well as to research repositories. The representation of resources is bound to the general domain of research, including concepts that belong to CRIS; Knowledge Organization Systems used in the different research domains or created ad hoc (for instance, see Tifous, 2007); bibliographic formats such as MARC or the DCMI Scholarly Work Application Profile; dataset formatting models for text (TEI…) or survey datasets (DDI); educational formats such as LOM; descriptions of persons (e.g. FOAF)…

With this set of schemas, the same concept can appear several times, with several shades of meaning. Wicri/Metadata has to explain this kind of situation in order to design guidelines, or to support multilinguality (e.g. Attribut:A pour ville adapted from Property:Has location city).

4.2.2 Ensuring Interoperability with other Semantic Applications

An interesting strategy is to find a “kernel ontology” that can be used without major adaptations. In this case, only the extensions have to be explained in Wicri/Metadata. This approach ensures interoperability with other semantic applications. WICRI operates in this way for the model of conferences, starting from OpenResearch.org and explaining local adaptations.

This approach is generalized to the description of scientific content. WICRI uses Eurovoc [17] as a general ontology, which should be complemented by specialized ones, for example WRB. Some repositories, such as OntologyPattern [18] or Watson [19], can be used for discovering domain ontologies. However, metadata editors have to search specifically for existing properties, and they may sometimes find close but not exactly similar properties. This raises the issue of defining the relations between concepts defined in different models.

4.2.3 The Wiki as a Metadata Registry?

Until now, WICRI has chosen to define redirects (i.e. owl:sameAs relations) to concepts from ontology repositories. However, this can only express strict equivalence between two concepts, which is limiting. Ontology mapping requires richer relations to be encoded, such as the SKOS mapping properties skos:exactMatch, skos:closeMatch... Moreover, collaborative ontology mapping mechanisms (Correndo, 2008) should be available to the network, so that any contributor who creates a new metadata concept or identifies a new relation is able to enrich the system.
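The difference between a redirect and the richer SKOS mapping properties can be illustrated with a few triples. In the SKOS Reference, skos:exactMatch is symmetric and transitive, whereas skos:closeMatch is symmetric but deliberately not transitive, so close matches must not be chained. The sketch below encodes that distinction with plain tuples; the concept identifiers (`wikiA:Water` and so on) are invented for the example and do not come from WICRI or from any real vocabulary.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]

EXACT = "skos:exactMatch"
CLOSE = "skos:closeMatch"

# Hypothetical mappings between concepts of three vocabularies.
MAPPINGS = [
    ("wikiA:Water", EXACT, "repoB:Water"),
    ("repoB:Water", EXACT, "repoC:Eau"),
    ("wikiA:Water", CLOSE, "repoC:Freshwater"),
    ("repoC:Freshwater", CLOSE, "repoD:River"),
]


def _neighbours(triples: Iterable[Triple],
                predicate: str) -> Dict[str, Set[str]]:
    """Build a symmetric adjacency map for one mapping property."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for subject, pred, obj in triples:
        if pred == predicate:
            graph[subject].add(obj)
            graph[obj].add(subject)
    return graph


def exact_equivalents(concept: str, triples: Iterable[Triple]) -> Set[str]:
    """Follow skos:exactMatch transitively from `concept`."""
    graph = _neighbours(triples, EXACT)
    seen, stack = {concept}, [concept]
    while stack:
        for nxt in graph.get(stack.pop(), set()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen - {concept}


def close_matches(concept: str, triples: Iterable[Triple]) -> Set[str]:
    """Return direct skos:closeMatch partners only, without chaining."""
    return set(_neighbours(triples, CLOSE).get(concept, set()))
```

With these sample triples, `exact_equivalents("wikiA:Water", MAPPINGS)` reaches `repoC:Eau` through `repoB:Water`, while `close_matches("wikiA:Water", MAPPINGS)` stops at `repoC:Freshwater` and does not continue to `repoD:River`. A redirect cannot express this second, weaker kind of relation.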

This should end up as a wiki-based metadata registry for the WICRI network, though with some specific features. The wiki architecture allows a mix of structured and unstructured content. Scientific concepts are not defined only by traditional definitions, but also through scientific literature, guidelines, etc. This is particularly important in a multilingual context, as we observed in the WICRI network as well as in other collaborative scientific platforms. A review of the concepts used to describe resources in the field of education (Sarre, 2010) shows that many concepts proposed as metadata for this domain are not fully specified. There are metadata schemas, as well as concepts defined only in journal articles, guidelines… It should therefore be possible to add concepts even outside the scope of a proper ontology. In addition, semantic wikis include some intelligence, which can be useful for making inferences about the existing, or potential, relations between the concepts used in the network. The wiki network is not only an interface to a CRIS and to research repositories; it also makes research content and scientific communication a building block of the semantic Web, by providing dereferenceable resources and reasoning mechanisms in a decentralized and collaborative environment.

5. Metadata for Computers

The “wiki way of doing” puts the contributor at the heart of metadata handling. So, what could the role of the computer be? Our feeling is that we cannot expect true automation in the short term. However, several tools or approaches appear very promising for specific problems.

5.1 Networks and Distributed Wiki Applications

A major issue for a network of wikis is replication management. In the WICRI network, a given piece of data can appear on many pages of many wikis. What happens when this information must be modified? We have identified five classes of replication cases.

1. Wiki replication. A whole wiki can be duplicated in a P2P network of wikis with a distributed replication mechanism (Oster, 2006). This feature is useful for technical reasons (a strategic wiki such as Wicri/Wicri) or sometimes for political ones (a wiki bringing visibility to several institutions). However, it addresses neither editorial replication nor metadata.

2. Page replication. A page (or a set of pages) is replicated on several wikis. This kind of facility is becoming available (Rahhal, 2009), and could be very useful for invariant pages, such as templates related to semantic models. With the DSMW (Distributed Semantic MediaWiki) extension [20], this mechanism is driven by metadata (semantic properties).

3. Paragraph replication. So far, we have not found an extension of SMW that extends the previous mechanism to the paragraph level. This need is ubiquitous in the WICRI network. A workaround, creating a template for each paragraph, might work, but is not really usable by a human contributor (who would, for instance, need one page for each bibliographic reference).

4. Paragraph replication, with transformations. In many cases, the previous mechanisms cannot be applied because the paragraph must be transformed while being replicated. For instance, for editorial reasons, the requirements for handling organization committees can be different in a regional wiki (with semantic links for local members) and in a thematic wiki (no links).

5. Replication of sets of several pages. Such an example was given before (geographic items).

Given this large number of problems, we have to set aside fully automated systems and think instead about "computer assisted hypertext writing".

5.2 Handling WICRI Network Consistency

WICRI operates among scientific communities and institutes. If WICRI could gain the support of academic entities, such as libraries, a true “a posteriori validation process” could be set up. So, what kind of tools could help scientists work together with semantic or metadata experts? A first approach consists in extending to the network the facilities that are already provided on a single wiki. We have begun to implement bots that use an XML schema describing how to access wiki facilities, such as “RecentChanges”, and that provide consolidation at the network level.

server="http://maquettewicri.loria.fr" path="/fr.wicri/index.php5?">

FIG. 4. XML description of WICRI network, used for piloting bots.
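As an illustration of how a bot might use such a description, the sketch below reads a network description and merges per-wiki change lists into one network-level view. Only the `server` and `path` attributes come from the figure; the element names `network` and `wiki`, the `id` attribute, the second sample entry and the shape of the change records are assumptions. The fetching step is left out, so the code runs without network access.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Hypothetical wrapper around the attributes shown in the figure.
NETWORK_XML = """
<network>
  <wiki id="fr.wicri" server="http://maquettewicri.loria.fr"
        path="/fr.wicri/index.php5?"/>
  <wiki id="en.example" server="http://wiki.example.org"
        path="/en.example/index.php5?"/>
</network>
"""


def load_network(xml_text: str) -> Dict[str, str]:
    """Map each wiki id to its base URL (server followed by path)."""
    root = ET.fromstring(xml_text)
    return {w.get("id"): w.get("server") + w.get("path")
            for w in root.iter("wiki")}


def recent_changes_url(base_url: str) -> str:
    """Build the address of the RecentChanges special page."""
    return base_url + "title=Special:RecentChanges"


def consolidate(changes_by_wiki: Dict[str, List[dict]]) -> List[dict]:
    """Merge per-wiki change lists, newest first.

    Each change is assumed to be a dict with an ISO 8601 "timestamp"
    and a "title"; the originating wiki id is added under "wiki".
    """
    merged = [
        dict(change, wiki=wiki_id)
        for wiki_id, changes in changes_by_wiki.items()
        for change in changes
    ]
    return sorted(merged, key=lambda c: c["timestamp"], reverse=True)


NETWORK = load_network(NETWORK_XML)
```

A bot would call `recent_changes_url(NETWORK["fr.wicri"])` for each wiki, parse what it retrieves into change records, and pass them to `consolidate`.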

Looking further ahead, we are investigating how to use specialized tools in interaction with the wiki network. For instance, pages about geographic items are handled simultaneously by administrators or bots, but also by non-specialist contributors. Defining the ontology on a common wiki is therefore not really safe. A better approach is to use external tools, such as Protégé, and to use bots to maintain consistency in the wiki network. Several works on designing ontologies cooperatively (Tudorache, 2008) are promising.

Regarding human-computer interaction, the semantic forms facility of SMW is useful in several cases, mainly for particular pages (for instance, periodical records), but appears insufficient under editorial constraints. Using XML editors, for instance Xtiger (Sire, 2010), seems promising, but requires better handling of XML objects by MediaWiki. For instance, implementing requirements such as “structuring a wiki page in TEI” or “templates with lists as parameters” is a major issue, which must be planned for the long term.

In the shorter term, however, we can expect to help contributors better discover resources when they write a new page.

5.3 Enriching the Wiki Network through the use of Web Data

The global exploitation of Web information is an important challenge for enhancing the dynamism, flexibility and scope of a wiki network like the one we propose. On the one hand, this process is necessary for assisting future contributors with elaborate and reliable writing guidelines during the network construction phase. On the other hand, it is also decisive for supplying end-users with external information whose added value is that it maintains significant relationships with the semantic context of the wiki network.

On the end-user's side, the goal of querying the web is both to complete as well as to enrich the information on a given topic as soon as this latter has been formerly furnished to the user by the wiki network semantic context. The wiki network can thus be considered as a structured information support for intelligently querying and mining the Web. Clustering processes can be used in a last step to synthesize the obtained Web results (Lamirel, 2006).

On the author's side, relevant semantic roles that should take part in the wiki context can be selected, or even attributed, through looking up a large amount of unstructured Web data. In such case, one can also rely on the help of clustering process in combination with the use of wiki network metadata and the one of external annotation sources, in order to organize the querying results in a suitable way with the final goal of facilitating author's decision.
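
To make the role of clustering in this step more concrete, here is a deliberately simple sketch that groups Web result snippets by shared vocabulary. It is not the MultiSOM model cited above: it uses a greedy single-pass grouping with Jaccard similarity as a stand-in, and the function names, the threshold value and the sample snippets are assumptions chosen for the example.

```python
import re


def tokens(text):
    """Lower-cased set of words of a snippet, ignoring words of 3 letters or fewer."""
    return {word for word in re.findall(r"\w+", text.lower()) if len(word) > 3}


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_snippets(snippets, threshold=0.2):
    """Greedy single-pass clustering of text snippets.

    Each snippet joins the first cluster whose seed snippet is similar
    enough; otherwise it becomes the seed of a new cluster.
    """
    clusters = []  # list of (seed token set, member snippets)
    for snippet in snippets:
        current = tokens(snippet)
        for seed, members in clusters:
            if jaccard(seed, current) >= threshold:
                members.append(snippet)
                break
        else:
            clusters.append((current, [snippet]))
    return [members for _, members in clusters]


results = [
    "Forest ecology research in Lorraine",
    "Research on forest ecology and biodiversity",
    "Semantic wiki software for metadata",
]
groups = cluster_snippets(results)
```

With these three snippets, the two results about forest ecology fall into one group and the result about wiki software into another. In the setting described above, wiki network metadata could additionally be used to name or to filter the groups.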

In our case, an important task is to identify the main actors and the salient institutions of a domain. This implies highlighting their various potential roles in that domain, as well as characterizing the nature of their relationships in the social networks associated with their disciplines. This kind of information can only be obtained through a large-scope querying process that gathers enough information to bring out reliable hypotheses and conclusions. This led us to consider intelligent, guided access to external wiki data through the use of existing wiki metadata. A main challenge is thus to isolate strategic wiki information, such as author or institution names, in a flow of unformatted data. This approach belongs to the general field of automated named-entity labeling techniques. The current statistical systems that could be used in this context need a large quantity of pre-annotated data to learn all the possible forms of the named entities. It is thus necessary to label a corpus, which will serve as training material. Since this task is hardly affordable with limited human resources, recent initiatives such as DBpedia (Bizer, 2009) or Yago (Suchanek, 2007) seek to provide suitable semantic corpora to help design labeling tools. In the same spirit, certain semantic ontologies such as NLGbAse21 are largely directed towards labeling. In our own case, the WICRI network itself can also serve as a particularly rich database for picking up reliable information about such potential entities.
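
The idea of using the network itself as a source of known entities can be sketched with a simple dictionary lookup. This is not a statistical labeling system of the kind discussed above, only a gazetteer-based baseline; the function name, the entity types and the person name in the example are hypothetical, and the dictionary is assumed to have been built beforehand from wiki page titles and their categories.

```python
import re


def label_entities(text, gazetteer):
    """Find known names in unformatted text.

    gazetteer: {name: entity type}, e.g. built from wiki page titles.
    Returns a list of (start, end, name, type) sorted by position.
    Longer names are tried first, so that a name contained in a longer
    one is not labeled a second time inside it.
    """
    found = []
    taken = set()  # character positions already covered by a match
    for name in sorted(gazetteer, key=len, reverse=True):
        pattern = r"(?<!\w)" + re.escape(name) + r"(?!\w)"
        for match in re.finditer(pattern, text):
            span = range(match.start(), match.end())
            if taken.isdisjoint(span):
                taken.update(span)
                found.append((match.start(), match.end(), name, gazetteer[name]))
    return sorted(found)


# Hypothetical gazetteer entries and input sentence.
known = {"INSA Rennes": "institution", "Rennes": "place", "Jane Doe": "person"}
labels = label_entities("Jane Doe works at INSA Rennes, near Rennes.", known)
```

In this example "Rennes" is labeled as a place only where it stands alone, not inside "INSA Rennes". Such a lookup could also serve to pre-annotate a corpus that is then corrected by hand and used to train a statistical system.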

6. Conclusion

About 18 months ago, the WICRI initiative was launched to show that wikis can be useful to research and innovation communities. Six months later, it appeared that a networked and semantic approach should be tried out, through one thematic and one regional wiki. For almost a year now, we have been working with a set of environment-oriented wikis. In doing so, we have faced, and still face, many difficulties, to which our answers are sometimes only partial and unsatisfactory. Yet the network way of thinking looks better than isolated services.

Similarly, semantic technologies applied to wikis make it possible to build a research information system that acts as an editorial portal to archives, with a strong interdisciplinary dimension.

The quality and consistency of the network are correlated with the quality of its metadata. To improve it from a technical point of view, Semantic MediaWiki makes it possible to skip a step, in a data-centric approach. A wiki works as a lightly structured CMS, “a set of pages”, which can be enhanced by RDF annotations. A wiki also carries lightly structured texts, “a simplified approach to HTML”. Our feeling now is that better handling of XML by wikis is a key issue.

A wiki is also a cooperative place where specialists work together. Given that many scientific communities use strong formalisms (LaTeX…), their training and education offer an immediate way of improving the quality of metadata and semantic models in the WICRI network.

References


Bizer, Christian; Lehmann, Jens; Kobilarov, Georgi; Auer, Soren; Becker, Christian; Cyganiak, Richard; Hellmann, Sebastian (September 2009). "DBpedia - A crystallization point for the Web of Data". Web Semantics: Science, Services and Agents on the World Wide Web 7 (3): 154-165. ISSN 1570-8268

Correndo, G., Alani, H., & Smart, P. (2008). A community based approach for managing ontology alignments. In The 7th International Semantic Web Conference (p. 61). From http://eprints.ecs.soton.ac.uk/16673/

Ducloy, Jacques, Yann Nicolas, Diane Le Hénaff, Muriel Foulonneau, Luc Grivel, Jean-Paul Ducasse. Metadata towards an e-research cyberinfrastructure: the case of francophone PhD theses. Proceedings of DC 2006, Manzanillo, Mexico, 2006, from http://dcpapers.dublincore.org/ojs/pubs/article/view/846.

Erbach, Gregor (2006). Data-centric view in e-Science information systems. Data Science Journal Vol. 5 (2006) pp.219-222, from http://www.jstage.jst.go.jp/article/dsj/5/0/219/_pdf

EuroCRIS (2009). Recording Research. Report for CRIS seminar September 2009. Retrieved February 10, 2010, from http://www.eurocris.org/fileadmin/Upload/200909.pdf

Hodis, Eran (2008), Jaime Prilusky, Eric Martz, Israel Silman, John Moult and Joel L. Sussman. Proteopedia - a scientific 'wiki' bridging the rift between 3D structure and function of biomacromolecules, Genome Biology 2008, doi:10.1186/gb-2008-9-8-r121. From http://genomebiology.com/2008/9/8/R121

Jeffery, Keith (2005). CRIS + open access = the route to research knowledge on the GRID. In 71st IFLA General Conf. and Council proceedings, Oslo, Norway, 2005, from http://www.ifla.org/IV/ifla71/papers/007e-Jeffery.pdf

Jeffery, Keith (2007). Technical Infrastructure and Policy Framework for Maximising the Benefits from Research. Proc. of the 11th Int. Conf. on Electronic Publishing, Vienna, Austria 13 June 2007. Leslie Chan and Bob Martens. ISBN 978-3-85437-292-9, 2007, pp. 1-12, from http://citeseerx.ist.psu.edu/viewdoc/download?doi=

Krötzsch, Markus, Denny Vrandecic, Max Völkel, Heiko Haller, Rudi Studer (2007). Semantic Wikipedia. In: Journal of Web Semantics 5/2007, pp. 251–261. Elsevier 2007.

Lagoze, Carl, Dean Krafft, Sandy Payette, and Susan Jesuroga. (2005, November). What is a digital library anyway, anymore? Beyond search and access in the NSDL. D-Lib Magazine, 11(11). Retrieved, January 10, 2007, from http://www.dlib.org/dlib/november05/lagoze/11lagoze.html.

Lamirel, Jean-Charles (2006), and Shadi Al Shehabi. MultiSOM: a multiview neural model for accurately analyzing and mining complex data. In Proceedings of the 4th International Conference on Coordinated & Multiple Views in Exploratory Visualization (CMV), London, UK, July 2006.

Lange, Christoph (2008). SWiM – a semantic wiki for mathematical knowledge management. In Sean Bechhofer, Manfred Hauswirth, Jörg Hoffmann, and Manolis Koubarakis, editors, ESWC, volume 5021 of Lecture Notes in Computer Science, pages 832–837. Springer, 2008.

Oster, Gérald (2006), Pascal Urso, Pascal Molli and Abdessamad Imine. Data consistency for P2P collaborative editing. In Proceedings of the 2006 ACM Conference on Computer Supported Cooperative Work, CSCW 2006, Banff, Alberta, Canada, November 4-8, 2006. From http://www.loria.fr/~molli/pmwiki/uploads/Main/oster06cscw.pdf

Rahhal, Charbel (2009), Hala Skaf-Molli, Pascal Molli and Stéphane Weiss: Multi-synchronous Collaborative Semantic Wikis. In Wise'09: International Conference on Web Information Systems, 2009. Retrieved, February 2010, from http://www.loria.fr/~molli/pmwiki/uploads/Main/Skaf09wise.pdf

Rosen, Zack (2010). RDF Semantic web research isn't working. Blog post, retrieved March 28, from http://www.zacker.org/semantic-web-research-isnt-working

Sarre, S., Foulonneau, M. (2010) "Reusability in e-assessment : Towards a multifaceted approach for managing metadata of e-assessment resources", Fifth International Conference on Internet and Web Applications and Services.

Sire, Stéphane (2010), Christine Vanoirbeek, Vincent Quint, Cécile Roisin. Authoring XML all the Time, Everywhere and by Everyone. Proc. of XML Prague 2010, p. 125-149, Institute for Theoretical Computer Science, March 2010.

Fabian M. Suchanek, Gjergji Kasneci, and Gerhard Weikum. 2007. Yago: a core of semantic knowledge. In Proceedings of the 16th international conference on World Wide Web (WWW '07). ACM, New York, NY, USA, 697-706. DOI=10.1145/1242572.1242667 http://doi.acm.org/10.1145/1242572.1242667

Tifous, A., El Ghali, A., Dieng-Kuntz, R., Giboin, A., Christina, C., and Vidou, G. 2007. An ontology for supporting communities of practice. In Proceedings of the 4th international Conference on Knowledge Capture (Whistler, BC, Canada, October 28 - 31, 2007). D. Sleeman and K. Barker, Eds. K-CAP '07. ACM, New York, NY, 39-46. DOI= http://doi.acm.org/10.1145/1298406.1298415

Tudorache, Tania (2008), Natalya F. Noy, Samson Tu and Mark A. Musen. Supporting Collaborative Ontology Development in Protégé. In: Proceedings of the 7th International Conference on The Semantic Web, Lecture Notes in Computer Science, Vol. 5318.

1 < http://stats.wikimedia.org/EN/TablesWikipediaEN.htm#namespaces >

2 Wicrified is a neologism that comes from “wikified” in Wikipedia jargon. This task consists in using Wiki mark-up and adapting a page to the network, i.e. setting links, categories, metadata or semantic annotations.

3 < http://maquettewicri.loria.fr/en.artist/index.php5?title=DC_2010_WICRI_paper >

4 < http://www.euroCRIS.org >

5 < http://cwf.uvm.edu/cris/ >

6 < http://www.mediawiki.org/wiki/MediaWiki >

7 An image map is a list of coordinates for hyper linking areas of an image to various destinations.

8 It requires installing a LaTeX environment at the operating system level, which is quite a complex task.

9 < http://proteopedia.org/wiki/index.php >

10 Right now, the current SVG (Scalable Vector Graphics) extension of MediaWiki converts an XML object into an image format without possibility of interactions between text and images.

11 For instance Acer on Wikipedia Species < http://species.wikimedia.org/wiki/Acer >,

  • Wikipedia (en): < http://en.wikipedia.org/w/index.php?title=Maple&oldid=345810808 >

  • Wikipédia (fr): < http://fr.wikipedia.org/wiki/%C3%89rable >

  • Wikipedia Commons: < http://commons.wikimedia.org/wiki/Category:Acer >

12 < http://wiki.openmath.org/ >

13 < http://maquettewicri.loria.fr/fr.ticri/index.php5?title=Pittsburgh >

14 < http://maquettewicri.loria.fr/fr.artist/index.php5?title=Ametist_0_Lagoze >

15 Data collected on 4 March 2010.

16 For instance, we avoid linking to pages containing a thousand lines of RDF/XML as an explanation!

17 < http://europa.eu/eurovoc/ >

18 < http://ontologydesignpatterns.org >

19 < http://kmi-web05.open.ac.uk/WatsonWUI/ >

20 < http://m3p.gforge.inria.fr/pmwiki/pmwiki.php >

21 < http://www.nlgbase.org/publi.html >
