Informatics Institute Annual Report 2004: Chapter Introduction




Characterization

Research within the Information and Language Processing Systems group is aimed at developing and studying the computational, linguistic, and statistical underpinnings of effective ways of providing intelligent information access, especially in the face of massive amounts of information. Addressing this task requires synergy between AI research, IR techniques, and applied natural language processing. Our leading methodology is to identify real-world scenarios that give rise to interesting research challenges. Where possible, we address such challenges from a broad spectrum of perspectives, ranging from foundational and theoretical to experimental.

The ILPS group takes part in a number of world-wide evaluation exercises in the areas of information retrieval and language processing, including TREC, CLEF, INEX, ACE, and Senseval. To support our participation in these evaluation efforts, the group has made, and continues to make, a considerable investment in software infrastructure. During 2004 ILPS increasingly became both a participant in and an organizer of large-scale evaluation efforts. It is now coordinator of the Dutch question answering efforts for CLEF, as well as organizer of WebCLEF, a multi-lingual web retrieval evaluation effort.

Key Words

Information retrieval, applied natural language processing, knowledge representation and reasoning, machine learning, data mining, question answering, social networks, semi-structured data.


Main themes
The overall aim of the projects carried out within the Information and Language Processing Systems group is to put abstract theories to work, gaining insight into the computational, statistical, and linguistic underpinnings of dealing with large amounts of textual information.
Research activities within the Information and Language Processing Systems group fall under one or more of the following headings:
Information Retrieval

Work under this heading covers topics such as spatial reasoning and image retrieval, semistructured data, cross-lingual retrieval, mono-lingual retrieval for European languages, question answering systems, knowledge representation, and web logs.


Applied Natural Language Processing

This heading covers topics such as information extraction, text mining, lexical semantics, shallow parsing technologies, robust generation of semantic representations, and online sentiment analysis.


Knowledge Representation and Reasoning

Work under this heading includes semistructured data, constraint satisfaction problems, expressive power and functionality of query languages for XML data and of restricted description languages (including modal, description, and feature logic), proof and decision methods for modal-like logics, benchmarking XML query languages, query evaluation for XPath, and automated reasoning.

Much of the research in the Information and Language Processing Systems group is aimed at understanding the computational behavior of information and language processing technologies, especially in relation to their potential benefits for real-world information processing tasks. As a consequence, there is a strong emphasis on implementation efforts.


2004 Results



Information Retrieval

Our retrieval work was organized around participations in CLEF, INEX, and TREC, with work on XML retrieval, multi-lingual retrieval, question answering, and web retrieval. In the area of XML retrieval, we investigated the appropriate unit of retrieval, which led to a large number of publications, both at SIGIR and elsewhere. We also studied the potential contribution of structural hints in XML retrieval; this too proved a fruitful research area, with publications at CIKM and elsewhere.

Mixing content-dependent and content-independent document features, our web retrieval work proved competitive at the 2004 TREC Web track, building on our expanding work on statistical language modeling. Additional work on the use of light-weight query refinement techniques for web retrieval led to an ECIR publication that received the best student paper award.
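To illustrate the general idea of mixing the two kinds of features, the following minimal sketch combines a query-likelihood language model score (content-dependent) with a query-independent document prior, such as one derived from link counts or URL type. The documents, prior values, and smoothing parameter are invented for illustration; this is not the system used in the TREC runs.

```python
import math
from collections import Counter

def lm_score(query_terms, doc_terms, collection_tf, collection_len, lam=0.85):
    """Query-likelihood score with Jelinek-Mercer smoothing, in log space."""
    tf = Counter(doc_terms)
    score = 0.0
    for t in query_terms:
        p_doc = tf[t] / len(doc_terms) if doc_terms else 0.0
        p_col = collection_tf.get(t, 0) / collection_len
        score += math.log(lam * p_doc + (1 - lam) * p_col + 1e-12)
    return score

# Toy documents provide the content-dependent statistics.
docs = {
    "d1": "informatics institute amsterdam research home page".split(),
    "d2": "course schedule informatics students amsterdam".split(),
}
collection = [t for terms in docs.values() for t in terms]
collection_tf = Counter(collection)

# Content-independent priors, e.g. derived from link counts or URL type
# (the values here are purely illustrative).
priors = {"d1": 0.7, "d2": 0.3}

query = "informatics amsterdam".split()
ranking = sorted(
    docs,
    key=lambda d: lm_score(query, docs[d], collection_tf, len(collection))
    + math.log(priors[d]),
    reverse=True,
)
print(ranking)
```

Combining the two signals in log space keeps the content-dependent and content-independent contributions cleanly separated, which is what makes such priors easy to plug into a language-modeling framework.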

Work on our multi-stream question answering technology proved fruitful and productive, with a successful participation in the question answering track at CLEF, and with publications on question analysis, predictive answering, and answer selection. We have started efforts to marry our question answering and XML retrieval work, and are exploring question answering against online encyclopedias and against collections of frequently asked questions mined from the web.


Applied Natural Language Processing

Work here was organized around a number of evaluation efforts. To begin with, we took part in Senseval-3 (Semantic forms and Logic forms) as part of our work on robust shallow semantic analysis of natural texts. Our semantic representations provide a good trade-off between the richness of the resulting structures and the complexity and robustness of the computational methods.


Richer levels of linguistic analysis were studied as part of a new generative language modeling formalism that allowed group members to systematically study a broad spectrum of grammatical formalisms.

Other applied linguistic interests that were pursued include question generation and the study of word order variations. In a pilot study, manual annotations in a treebank were exploited to generate complete syntactic annotations of questions derived from declarative sentences.


Within the NWO project “ITEQA: Inference for Temporal Question Answering”, we worked on the general issue of improving information extraction systems by incorporating data-driven techniques. Recognition of temporal expressions yields readily to machine learning, but their normalization, or interpretation, when considered as a monolithic task, seems to call for a rule-based approach. We worked out an analysis of normalization that separates context-independent from context-dependent processing and, in particular, identifies context-dependent classification tasks as a potential application area for machine learning. We found that automatically learned classifiers can improve timex normalization performance while simplifying system development.
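To illustrate the separation (this is not the ITEQA system itself), the toy sketch below resolves fully specified expressions with context-independent rules and delegates the context-dependent part, here the choice between a past and a future reading of a bare weekday, to a stand-in for a learned classifier. All rules, features, and dates are invented for illustration.

```python
import re
from datetime import date, timedelta

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]

def normalize_context_independent(timex):
    """Rule-based step: values fully determined by the surface form alone."""
    if re.fullmatch(r"\d{4}-\d{2}-\d{2}", timex):
        return timex                      # already an ISO date
    if re.fullmatch(r"\d{4}", timex):
        return timex                      # a bare year
    return None                           # interpretation depends on context

def classify_direction(timex, context_tense):
    """Context-dependent step: stand-in for a learned classifier deciding
    whether an underspecified expression refers to the past or the future,
    e.g. from features such as the tense of the surrounding sentence."""
    return "past" if context_tense == "past" else "future"

def normalize(timex, reference_date, context_tense):
    value = normalize_context_independent(timex)
    if value is not None:
        return value
    day = timex.lower()
    if day in WEEKDAYS:
        step = -1 if classify_direction(timex, context_tense) == "past" else 1
        d = reference_date + timedelta(days=step)
        while d.weekday() != WEEKDAYS.index(day):
            d += timedelta(days=step)
        return d.isoformat()
    return None

print(normalize("2004-11-03", date(2004, 11, 17), "past"))  # 2004-11-03
print(normalize("Monday", date(2004, 11, 17), "past"))      # 2004-11-15
print(normalize("Monday", date(2004, 11, 17), "future"))    # 2004-11-22
```

The point of the split is that only the small classification step needs training data, while the deterministic arithmetic stays rule-based.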


Knowledge Representation and Reasoning

Almost all technology for processing XML data makes use of the W3C standard language XPath. Within the NWO projects “A Model Checking Approach to Query Evaluation on XML Documents” and “Model Checking Algorithms and Tools for Hybrid Logics” we started by extending the XMark XQuery benchmark to test the full functionality of XPath 1.0.
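To give a concrete feel for the kind of functionality such benchmark queries must cover (navigation, value predicates, joins through attribute values, and built-in functions), here is a small self-contained sketch that evaluates a few XPath 1.0 expressions with Python's lxml over a toy, auction-style fragment. The element names and queries are illustrative and are not taken from the XMark benchmark itself.

```python
from lxml import etree

# A toy auction-style fragment; the element names are invented for illustration.
doc = etree.fromstring("""<site>
  <people>
    <person id="p1"><name>Alice</name><city>Amsterdam</city></person>
    <person id="p2"><name>Bob</name><city>Utrecht</city></person>
  </people>
  <items>
    <item id="i1"><seller ref="p1"/><price>10</price></item>
    <item id="i2"><seller ref="p2"/><price>25</price></item>
  </items>
</site>""")

# Simple downward navigation.
print(doc.xpath("/site/people/person/name/text()"))      # ['Alice', 'Bob']

# Predicates combining structure and data values.
print(doc.xpath("//item[price > 15]/@id"))                # ['i2']

# A join through attribute values.
print(doc.xpath(
    "//person[@id = //item[price > 15]/seller/@ref]/city/text()"
))                                                        # ['Utrecht']

# A built-in function.
print(doc.xpath("count(//item)"))                         # 2.0
```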

Within the NWO project “Complex Knowledge Base Classification” we worked on establishing the computational complexity and expressive power of XML query languages. This work is mainly disseminated in the database community. Using techniques from modal and temporal logic, we provided a complete characterization of the important Navigational XPath language. We characterized the W3C standard language XPath 1.0 as the two-variable fragment of first-order logic (SIGMOD Record), and we designed a natural extension of XPath 1.0 that is complete in the sense of Codd (ACM PODS 2004, best paper award).

The master's thesis of one of our students has been turned into the NWO Mozaïek proposal “A Model Checking Approach to Query Evaluation on XML Documents”, which was granted in 2004. The PhD student started working on this project in September. The aim is to see whether the techniques developed in temporal logic model checking can be applied to XML query processing.

Finally, within the NWO Pioneer project held by ILPS, work was done on constructing ontologies. To help improve current modeling and reasoning environments, we introduced, jointly with researchers from the AMC, formal criteria for good modeling of terminologies. By introducing methods for debugging, explanation, and automatic semantic enrichment to support the modeling process, we aim to encourage others to use more expressive formalisms.
2005 and beyond
In 2005 we will continue to shift our focus to web-based information access. This development will show up prominently in our organization efforts for WebCLEF, a new multi-lingual web retrieval task that is being set up under the CLEF umbrella.
Additionally, we will extend our web-based question answering demos. Going beyond factoid questions will be one of our main question answering aims for 2005, and it will be centered around crawling, extracting, and retrieving frequently asked question pages.
Further planned activities include creating synergy between our XML retrieval and question answering activities. In particular, we aim to re-implement much of the language processing required for question answering as off-line annotation activities, thus creating collections of concurrent XML files. Answering questions would then be addressed as retrieval against these files.

Finally, our work on benchmarking XML query evaluation engines will continue, and we now aspire to create multiple test sets that together cover as much of the required query and language functionality as possible.


The Laboratory for Human Computer Studies
The Human Computer Studies (HCS) laboratory performs research on theories, methods, and technology regarding the design, use, and evaluation of complex human-computer systems. A first class of complex human-computer systems that are the object of study are knowledge-intensive systems. Such systems can be based on human knowledge that is represented in ontologies and knowledge bases, or can contain knowledge obtained by machine learning methods. This research is centered around questions concerning semantic modeling, ontology engineering, multi-agent systems, and adaptive systems. Typical examples of such systems are: Semantic Web applications, text mining tools, tools for developing and maintaining ontologies, qualitative reasoning systems, adaptive systems, and ontology population and learning systems. Application domains include: cultural heritage, E-government services on the web, knowledge management, modeling physical systems, intelligent learning environments, bioinformatics, collaborative information management, and virtual organizations.
A second class of complex human-computer systems that are the object of study at HCS are e-learning environments, simulation environments for educational purposes, and interactive systems. A central topic is the study of interaction requirements and user experiences, in particular for special user groups. Results of this research include methods for requirements extraction, user evaluation methods, and best practices.
General Information
Contact person : Prof Dr B.J. Wielinga

Telephone : +31 – 20 – 888 4696/4689

URL : http://hcs.science.uva.nl

Fax : +31 – 20 – 525 6896

Email : wielinga@science.uva.nl
Position within the organization
The Laboratory for Human Computer Studies (HCS) is one of the three laboratories at the Informatics Institute of the Faculty of Science at the Universiteit van Amsterdam. The HCS laboratory was founded in 2004 as a merger between the former Social Science Informatics group in the Faculty of Social and Behavioural Sciences and a number of smaller groups in the Faculty of Science. The HCS group participates in the Dutch Graduate School SIKS (School for Information and Knowledge Systems).
Characterization
The general mission of HCS is the study of how people interact with ICT applications to achieve their goals, how human knowledge can be brought to bear in ICT applications and how ICT technology can support human activities. The research in HCS can be categorized under the following main themes:

Knowledge, Agent and Semantic Web Technology

Adaptive Information Management

Co-operative Information Management in Federated Systems

Interactive Systems

Qualitative Reasoning


Each of these themes will be discussed in detail below.
Knowledge, Agent and Semantic Web Technology
Key Words
Ontologies, Semantic Web, Multi-agent systems, Knowledge-based indexing and retrieval
Main Theme
This theme concerns research on theories, methods, and technologies for representing, extracting, and interpreting the semantics of information resources. The background of the research is the vision of the Semantic Web: an extension of the current web with meaning attached to information resources such as documents, images, video, and audio material. Ontologies, as shared conceptualizations of parts of the world, will play an essential role in the attribution of meaning to heterogeneous information resources. Hence, an important topic in the theme is the representation of, the reasoning with, and the (semi-)automatic acquisition of ontologies.
The Semantic Web will be a distributed system; hence one may expect that agent technology will play an important role in it. The role of agents will be to harvest, combine, and extract information from heterogeneous sources. Agents will have different types of knowledge: for example, one agent may know how to extract terms from a document, while another may have knowledge about an ontology of a particular domain. One of the research topics of the theme is how such agents can be organized in a multi-agent system.
2004 Results
Ontologies

The Ph.D. project on knowledge-rich indexing of learning objects (S. Kabel) showed that re-use of learning material can be facilitated by using ontologies for indexing and retrieval. Fragments of instructional material represented by various media (text, video, images, audio) were indexed using ontologies of various types (domain, instructional, knowledge type). Empirical studies using the indexed material and the ontologies for retrieval showed that the ontologies increase the efficiency and the effectiveness of re-use of instruction material.


Document analysis and indexing

The general goal of the Metis project is the development of knowledge on how to make organisations smarter by focusing on: collaboration and communication; relating and visualizing information and communication; and organisational perspectives on knowledge management. The HCS laboratory has contributed by developing tools and ontologies that make it possible to index and retrieve documents within communities of practice (CoPs). An initial evaluation of the results at a large plastics company suggests that the "knowledge maps" semi-automatically generated from such CoPs make it easier to retrieve relevant knowledge. Further application of text analysis tools to "noisy" document-sets is ongoing.


Multi-agent systems

In 2004 C. van Aart completed his thesis on “Organizational Building Blocks for Design of Distributed Intelligent Systems”. This work resulted in a framework for multi-agent design, based both on human organizational models and principles for distributed intelligent systems design. Three organizational structures were developed for coordination and cooperation in multi-agent systems. A number of case studies were developed as a proof of principle for the theoretical concepts.


Programming and Semantic Web infrastructure

The core of the HCS software resources is formed by SWI-Prolog and its supporting packages, which deal with visualization, networking, markup languages, semantic web storage and querying, and much more. After initial development within an EU project for editing models in the field of knowledge engineering, the system has grown into a widely accepted open-source resource for research and education. The current focus is on ontology management and text mining.


Achievements in 2004 include the addition of Constraint Logic Programming (CLP) in cooperation with KU Leuven, multi-threaded HTTP/HTTPS server and client support, the extension of the semantic web storage module with concurrency and internationalization (Unicode) support, and a Sesame-compatible query engine and API developed in cooperation with the VU.
2005 and beyond
Semantic Web and E-culture

The work on the topics described above will continue in 2005 and beyond. The work on converting existing thesauri to ontologies represented with Semantic Web technology (RDF(S) and OWL) will continue. In particular, the conversion and use of large ontologies, such as the Getty thesauri (AAT, TGN, ULAN), in the domain of annotating cultural heritage objects will be an important research topic for the coming years.
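As a minimal sketch of what such a conversion involves, assuming the Python rdflib library and invented thesaurus records (the real Getty conversions use the vocabularies' own identifiers and much richer mappings), a flat record with a term and a broader term can be turned into an RDFS class hierarchy roughly as follows.

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, RDFS

# Invented namespace and thesaurus records, for illustration only.
EX = Namespace("http://example.org/thesaurus#")
records = [
    {"id": "t002", "term": "painting techniques", "broader": "t001"},
    {"id": "t001", "term": "techniques", "broader": None},
]

g = Graph()
g.bind("ex", EX)
for rec in records:
    concept = EX[rec["id"]]
    g.add((concept, RDF.type, RDFS.Class))
    g.add((concept, RDFS.label, Literal(rec["term"], lang="en")))
    if rec["broader"]:
        # Model the thesaurus broader-term hierarchy as a subclass hierarchy.
        g.add((concept, RDFS.subClassOf, EX[rec["broader"]]))

print(g.serialize(format="turtle"))
```

Whether broader/narrower relations should really become subclass links, or a weaker relation, is exactly the kind of modeling decision such conversion work has to settle.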


Another topic in this context is the population and learning of ontologies from heterogeneous sources. Initial tests in the domain of e-culture have shown that relations, for example between artists and art styles, can be automatically derived from information on the WWW. This work will be extended to ontology learning: the extraction of generic concepts and relations from various sources.
Communication Coordination in Hybrid Multi-Agent Systems

The major objective of the CCHMAS part of the CDM cluster in the ICIS project (Bsik) is to investigate organizational structures and strategies for coordination in dynamic, distributed, and hybrid agent systems. Van Aart et al. (2004) have used traditional organizational principles to build ad-hoc intelligent systems from intelligent services. The CCHMAS project will extend this work to more dynamic situations, in which there is uncertainty about the world in which the agents operate, the (amount of) information, and the availability of agents. We will research a specific class of agents, so-called manager agents, that have no other task than to organize and coordinate a dynamic set of diverse operator agents in an uncertain world, and that report to, and take instructions from, humans. The project started at the end of 2004. In 2005 we will set up a prototype system within the RoboCup Rescue Simulation environment. We will use this environment to define initial coordination and communication ontologies for joint actions within the RoboCup Rescue world. We will extend this framework by successively introducing more dynamics into the environment, into the organizational structures, and into the available information.


Adaptive Information Management
Key Words
machine learning, data mining, adaptive systems, knowledge discovery, grammar induction, web mining, interactive systems
Main Theme
The research on Adaptive Information Management (AIM) focuses on the study of systems that can adapt their behavior to the environment. There are two main lines of research: single learning systems and groups of cooperating systems. This field of research has a strong link with the study of agents. The distinguishing characteristic of AIM is its focus on the adaptation of behavior and learning. The ambitions of the AIM group for the coming five years are:

  • to study formal models of adaptive systems,

  • to study collaboration models,

  • to realize a number of industrial or prototypical applications of adaptive systems,

  • to study the complexity and explain the efficacy of existing learning systems.


2004 Results


  • In the DUMPERS project, it was shown that a method for clustering page transitions can discover structure in usage from the traces of users. This can be used to automatically construct navigation support (a minimal sketch of the clustering idea follows after this list).

  • The thesis by Floor Verdenius discussed the application of machine learning to industrial problems and provided methods for selecting an appropriate learning method for a problem. A methodology is outlined and a method is presented to assess whether a particular class of models is appropriate.

  • In the context of AID, we will be building a suite of dynamic, model-driven information access and knowledge extraction tools on top of an architecture for grid-based distributed data analysis over the next four years. In the AID VL-e project, an activity analysis of the cases provided by the Food Informatics partners was made. This analysis is the basis for a demo application of the adaptive query environment that will be presented to the project partners in February 2005. We decided to participate in the TREC Genomics track to test our results in an international context. The AID tools were made operational on the TREC Genomics data set of over 4 million Medline abstracts, using the MeSH ontology.

  • Research on data assimilation resulted in a study in the biological domain (bird migration) and in the traffic forecasting area.

  • Research into virtual organizations to support ICT led to novel concepts for ICT maintenance. In the scope of the VL-e project, the proposed models are experimentally verified.
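
As referenced in the DUMPERS item above, the following sketch illustrates the underlying idea, assuming scikit-learn and an invented toy log: represent each user session by its page-transition counts and cluster those vectors, so that the resulting cluster profiles can feed navigation support. The actual DUMPERS method is more elaborate than this.

```python
from collections import Counter
from sklearn.cluster import KMeans

# Invented navigation traces: each session is the ordered list of pages visited.
sessions = [
    ["home", "courses", "schedule", "courses"],
    ["home", "courses", "schedule"],
    ["home", "research", "publications", "research"],
    ["home", "research", "publications"],
]

# Represent each session by counts over the page transitions it contains.
transitions = sorted({pair for s in sessions for pair in zip(s, s[1:])})
vectors = [
    [Counter(zip(s, s[1:])).get(t, 0) for t in transitions] for s in sessions
]

# Cluster sessions with similar transition behaviour; the cluster profiles
# can then be turned into navigation suggestions (e.g. links to add to a page).
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(vectors)
for label, session in zip(labels, sessions):
    print(label, "->", " / ".join(session))
```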


2005 and beyond
Two new projects started at the end of 2004. An IOP-MMI project will address methods for text classification applied in the context of trainable information distribution: can a system be trained to send information to the person for whom it is most useful, given their current activities? In the ICIS-CHIM project, adaptive systems will be studied from the perspective of acceptability for the user. Other themes are the discovery of concepts and relations from text, and methods for learning and sharing “learning bias”: knowledge that is used for further learning. In 2005 the AID VL-e project will make its first prototype of the adaptive information retrieval tool suite available on the GRID Proof of Concept environment of VL-e. In 2005 we will also start a collaboration with the Biorange programme, focusing on the following themes: SP3: Integrative Bioinformatics, research theme 5: Content Driven Data Modeling; and SP4: VL-e for Bioinformatics Applications, research theme 3: Collaborative Information Management.
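As a toy illustration of the trainable-distribution idea (not the IOP-MMI system; the messages, recipients, and model choice below are invented assumptions), one could treat each potential recipient as a class and train a standard text classifier on the items they previously found useful.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Invented training data: messages each colleague previously marked as useful.
messages = [
    "new grid node available for distributed data analysis jobs",
    "scheduled maintenance of the compute cluster this weekend",
    "call for papers on ontology engineering and the semantic web",
    "workshop announcement on knowledge representation for e-culture",
]
recipients = ["sysadmin", "sysadmin", "researcher", "researcher"]

# A simple routing model: tf-idf features plus a naive Bayes classifier.
router = make_pipeline(TfidfVectorizer(), MultinomialNB())
router.fit(messages, recipients)

# Route an incoming message to the most likely recipient.
print(router.predict(["downtime announced for the grid cluster"]))
# most likely ['sysadmin'] on this toy data
```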
Collaborative Information Management in Federated Systems
Keywords
Federated/distributed databases, collaborative networked organizations, knowledge management, ontology engineering, schema matching/integration, knowledge discovery, virtual organization breeding environment
Main theme
Research in the CO-IM group is primarily focused on collaborative information management in Federated Systems. It addresses the design and development of architectural frameworks, semantic models, and supporting services necessary for the inter-operation and coordination of goal-oriented collaboration among heterogeneous/autonomous systems. Special emphasis is given to: modeling of information/knowledge, ontology engineering, federated schema management, federated query processing, and semi-automatic assisting tools for developers of federated architectures and systems, all of which are applied to a wide variety of complex emerging domains, from scientific to manufacturing and from control engineering to tele-assistance. The main areas of research and prototypical development activities aim at the following:

  • Federated information/knowledge engineering in complex emerging domains

  • Information management architecture for cooperative systems

  • Semi-automatic tools supporting integration / inter-operability among heterogeneous / autonomous environments

  • Collaborative Networked Organizations (CNO) paradigm, their theoretical foundations, management of breeding environments, ontology engineering / discovery, trust modeling and trust management


2004 Results
In the EC 5FP project ENBI (European Network for Biodiversity Information), the applicability and potential of both GRID technology and Virtual Organizations for the biodiversity domain were explored. An analysis and characterization of the collaborative information management needs of the ENBI application domain was performed, and the results of these activities were reported to the global biodiversity community of GBIF.
In the EC 5FP Networks of Experts THINKcreative and VOSTER, we collaborated in the organization of several international workshops and conferences related to the area of Virtual Organizations. We were involved (as a co-editor) in the preparation of a technical book, published by Springer, with several chapters based on the results of the THINKcreative project and other chapters submitted by international experts in the field. Both of these projects ended in 2004.
In EC 6FP ECOLEAD (European Collaborative networked Organization LEADership initiative), the CO-IM group is one of the four main players in this large Integrated Project, with 19 European partners, launched in April 2004. During its first year, the key entities and components of the breeding environment for collaborative networks (VBE) were identified and modelled, VBE operating principles were defined, and the design of a VBE support infrastructure was achieved. Furthermore, a requirements analysis of the VBE functionality has identified the need for: a pro-active VBE competency management system, support for ontology discovery/evolution during the life cycle of the VBE, and performance-based trust establishment among the VBE members.
In the Dutch Bsik project VL-e (Virtual Laboratory e-Science), the design of the VL-e COLIM (collaborative information management) component started. A study was performed on the development of flexible approaches for the automatic generation of database schema definitions based on the ontology entries provided by expert scientists. Also, an existing proof-of-concept module for virtual laboratory information management (VIMCO), developed earlier by the CO-IM group, was enhanced and made available on the GRID proof of concept environment of VL-e. The VIMCO component will be used as a “base” for information management by VL-e's existing scientific applications. The PhD thesis of Ersin Kaletas presented the information management of the Virtual Laboratory project and introduced the use of pre-defined “Process Flow Templates” and workflows for the proper modeling of experimentation steps in different scientific domains.
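The following is a highly simplified sketch of that schema-generation idea, with invented class and property names and an invented type mapping; the actual VL-e study deals with far richer ontologies and mapping rules.

```python
# Illustrative only: map ontology-style class definitions (typed properties)
# onto SQL table definitions. Names and types are invented examples.
ontology = {
    "Experiment": {"title": "string", "started": "date", "sampleCount": "int"},
    "Sample": {"label": "string", "concentration": "float"},
}

TYPE_MAP = {"string": "VARCHAR(255)", "int": "INTEGER",
            "float": "REAL", "date": "DATE"}

def class_to_ddl(name, properties):
    columns = ["  id INTEGER PRIMARY KEY"]
    columns += [f"  {prop} {TYPE_MAP[ptype]}" for prop, ptype in properties.items()]
    return f"CREATE TABLE {name} (\n" + ",\n".join(columns) + "\n);"

for class_name, props in ontology.items():
    print(class_to_ddl(class_name, props))
```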
In EC 5FP TeleCARE (Multi-Agent Tele-Supervision System for Elderly Care), the development of several key information management components was completed, including: the federated database system and its schema management, agent-based federated query processing, the resource catalog management system supporting TeleCARE's HW/SW resources, and the tool for semi-automatic generation of data structures from their ontology definitions. These components were demonstrated and evaluated, serving as proofs of concept. The project was completed in 2004.
2005 and beyond
In ENBI, the design of the CIMS (Cooperative Information Management System) infrastructure, which applies GRID technology, federated information management, and Virtual Organizations to the biodiversity domain, will be finalized. This model and approach will be evaluated for adoption by the global biodiversity community of GBIF.
The ECOLEAD project runs until 2008. By that time, the development of the breeding environment to support the establishment and operation of collaborative networked organizations (CNOs) will be achieved. Significant contributions will be made to: the development of a reference model and reference system architecture for VO breeding environments, the VBE management system, VBE life cycle support, the discovery/evolution of competency/resource ontologies, trust models, and approaches to performance-based trust establishment.
The VL-e project runs until 2008. The VL-e COLIM effort will address several open issues related to the design and implementation of a “generic and extensible collaborative information management architecture” for supporting the VL-e application areas, including:

  • Common representation models and common framework / language, to support integration of pre-existing heterogeneous information systems, towards their full federation.

  • Development of Grid-based modules for federated information retrieval and federated query processing.

  • Development of advanced semi-automatic tools, e.g. schema matching/integration and automatic generation of database schemas, to assist the VL-e's scientists, developers, and administrators with the management of their information.


Interactive Systems
Key Words
Human computer interaction, Cultural aspects of ICT, E-government, Learning environments, Meta-cognition
Main Theme
In general, this theme covers the theory and practice of human-computer interaction in a variety of contexts. One topic under this theme is the study of interaction requirements and user experiences for special user groups (blind children, cultural minorities, disadvantaged communities, entrepreneurs, students). Special evaluation methodologies have to be developed to gain insight into interface design principles for these groups.
A second topic under this theme is the design and evaluation of interactive learning environments: what design principles can be developed for systems that support learners in their learning tasks? An important theoretical issue is how the principles derived from theories of learning and instruction can be translated into methods for e-learning environments.
2004 Results

