2.1. Describe the Unit’s research (max. 4 pages)
This question surveys how the research carried out in the Unit has impacted research in its own field(s). Describe the orientation of scientific publishing, most important research results and the role of multidisciplinarity or interdisciplinarity etc. Also, describe the role of basic and applied research. In case the research carried out in the Unit is clearly specialised in the different fields of computer science, describe each field separately (see also question 6.3).
The Helsinki Institute for Information Technology HIIT is a joint research institute of the two leading research universities in Finland, the University of Helsinki (UH) and the Helsinki University of Technology (TKK). It was founded in 1999. At present, HIIT has some 135 researchers and staff. It operates in close collaboration with the Computer Science Departments of the parent universities at three locations: Ruoholahti, Otaniemi (TKK campus) and Kumpula (UH campus). Administratively, HIIT presently consists of two units: the Advanced Research Unit (ARU), founded in 1999, and the Basic Research Unit (BRU), founded in late 2001. The ARU is located at TKK and the BRU at UH.
HIIT conducts internationally high-level strategic research in information technology and related multidisciplinary topics, especially in areas where the Finnish IT industry has a significant role. It works in close co-operation with Finnish universities, research institutes, and industry, aiming at significant scientific impact that also benefits industry and the progress of the Finnish information society. HIIT has a strong network of international partnerships with leading foreign research universities and institutions.
HIIT's work is organised in long-term research programmes, each consisting of several co-operating groups with a total of 25-40 researchers and led by a senior professor-level researcher. Each programme has an Advisory Board consisting of representatives from industry and academia. An internal Management Board consisting of the senior researchers participating in the programme coordinates the research of each programme. Programmes operate through various instruments, such as externally funded projects (TEKES, Academy of Finland, EU, companies), research positions (internal and Academy of Finland funding) and graduate school positions.
Programmes combine basic and strategic research with activities aimed at innovations. Through this, they aim at three kinds of impact: scientific impact through publications and influence on the scientific community; industrial impact through research prototypes and demonstrations, standardization activities, and close links with leading companies; and societal impact through participation in information society research, innovation-oriented activities, direct links with decision-makers, and active participation in public debate. For scientific impact, HIIT publishes its results in high-quality scientific journals and at leading conferences, and also releases them as open-source software. It also maintains close links with leading researchers in its fields through research visits and personal communication.
The present research programmes of HIIT (since 1.1.2006) are as follows:
Algorithmic Data Analysis (ADA). Director: Academy Professor Heikki Mannila
Developments in measurement and data collection technologies have made it possible to gather and store large amounts of information in many areas of science and industry. The ability to analyze these masses of raw data has, however, increased at a much slower pace. The research programme on data analysis develops data mining and computational statistics methods for various application tasks.
Future Internet (FI). Director: Prof. Kimmo Raatikainen
Enhancing Internet infrastructure to enable efficient, secure and trusted always-on connectivity and services.
Network Society (NS). Director: Prof. Marko Turpeinen
Human-centric multidisciplinary anticipation and development of ubiquitous information and communication technology, which is based on deep understanding of needs and practices of our everyday life and our social relations in a network society.
Probabilistic Adaptive Systems (PAS). Director: Prof. Petri Myllymäki
Studying and further developing the theory of sophisticated probabilistic models, and exploring their applications in solving problems that arise in complex real-world stochastic systems.
The following paragraphs give selected highlights of HIIT’s research results. They have been chosen to display different research approaches and forms of impact as well as multi-disciplinary research lines covering bioinformatics, behavioural sciences, political science, and law.
Host Identity Protocol and related Internet infrastructure
The Host Identity Protocol (HIP) is an approach to solving the present architectural deficiencies of the Internet protocol stack, especially the lack of support for mobility and multihoming, by introducing a new protocol layer at the “waist” of the stack. The layer introduces a new name space of Host Identities (HI) into the stack, effectively replacing IP addresses at the higher levels of the protocol stack. This separates the presently bundled functions of IP addresses as both locators and identities.
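The identity/locator split described above can be illustrated with a toy sketch. It is not the HIP protocol and not HIPL code: the class names `LocatorTable` and `Session` are hypothetical, and a truncated SHA-256 digest of a public key merely stands in for a real host identifier format. The point is only that a session bound to an identity keeps working when the underlying IP address changes.

```python
import hashlib


def host_identity_tag(public_key: bytes) -> str:
    """Derive a fixed-length identifier from a host's public key.

    Assumption: a truncated SHA-256 digest is used purely for illustration;
    it is not the tag format defined by the HIP specifications.
    """
    return hashlib.sha256(public_key).hexdigest()[:32]


class LocatorTable:
    """Hypothetical mapping from a host identity to its current IP locator."""

    def __init__(self):
        self._locators = {}

    def update(self, identity: str, ip_address: str) -> None:
        # Called whenever the host attaches to a new network.
        self._locators[identity] = ip_address

    def resolve(self, identity: str) -> str:
        return self._locators[identity]


class Session:
    """A transport session bound to a host identity, not to an IP address."""

    def __init__(self, peer_identity: str, table: LocatorTable):
        self.peer_identity = peer_identity
        self._table = table

    def current_destination(self) -> str:
        # The locator is looked up at send time, so it may change freely.
        return self._table.resolve(self.peer_identity)


table = LocatorTable()
peer = host_identity_tag(b"example public key bytes")
table.update(peer, "192.0.2.10")
session = Session(peer, table)

first = session.current_destination()
table.update(peer, "198.51.100.7")  # the peer moves to another network
second = session.current_destination()
```

The session object is never rebuilt when the peer moves; only the identity-to-locator mapping is updated, which is the role that rendezvous and HI-IP mapping services play in a real deployment.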
HIIT has been involved in the (initially small) HIP research community since 2002. We have developed our own HIP implementation, HIP for Linux (HIPL), and also various network infrastructure components related to rendezvous service, HI-IP mapping, and support of various kinds of middleboxes. Jointly with UC Berkeley, HIIT also developed the Hi3 overlay infrastructure for managing HIP sessions, and has performed extensive testing of it on the PlanetLab network.
HIIT is presently a central node in the growing network of HIP-inspired researchers. In particular, HIIT’s Dr. Andrei Gurtov co-leads the IRTF research group related to HIP, and HIIT has contributed significantly to the Internet-Drafts related to HIP infrastructure. As another direct result of our work, HIP support was integrated into the standard Linux kernel in late 2006, with the result that all Linux systems are now HIP-compatible.
Fuego Core middleware platform
The Fuego Core middleware platform is the result of a series of related projects focusing on middleware for the future mobile Internet. It covers various themes considered to be of fundamental significance: XML processing and messaging, a mobile distributed event system, XML synchronization and data access, and software configuration management. Through this platform, the work has contributed to international standardization, particularly in the IETF (SIMPLE WG) and the W3C (Mobile Web Initiative and Device Independence Activity). The platform has also been adopted by industry for its own research and development.
ContextPhone
In the area of context-awareness and smartphones, there has been significant success in recognizing context by analysing user situation data. The results include ContextPhone, a prototyping platform for context-aware applications running on smartphones, specifically on Nokia’s S60 platform. ContextPhone consists of about 30 distinct components that implement data gathering, generalized event services, data logging, user interfaces, network protocols and debugging facilities. The platform has been published both in academic publications and as freely downloadable software, licensed under both the GPL version 2 and the MIT free software licenses.
Applications built on top of ContextPhone have been used in several research institutes. The data logging application ContextLogger was used to gather a unique dataset from one hundred participants over nine months by Nathan Eagle at the MIT Media Lab, and has been the basis for data analysis method development at HIIT.
ContextMedia, a contextual mobile media gathering tool, has been used together with the University of Art and Design Helsinki in several artist-led workshops around the world as well as by the Garage Cinema Research Group at UC Berkeley, with an end-user version released for public consumption under the name Merkitys—Meaning. A special-purpose sensor network version of ContextPhone is used in an artist-led cross-disciplinary project (Evans in press). Datasets from these experiments have been released publicly and have been used by, among others, researchers at the University of Helsinki and the University of Jyväskylä (Mazhelis et al., HICSS, 2006).
Future Internet search
Future Internet search technologies have been a focus area of the PAS programme since 2002. The work uses probabilistic and information-theoretic methods to model information retrieval, also following the principles of open source software development. The underlying hypothesis of the work is that distributed, semantic-based and multilingual methods will have a central role in the future of information retrieval. The work has been carried out in several parallel projects funded by the Academy of Finland, TEKES, and EU’s 6th framework programme.
Highlights of this work include algorithms and freely available software for learning latent variable models for text analysis, developed by W. Buntine and others, which have made it possible to create radically novel, semantic (content-based) search engines. In another line of work, new results in Minimum Description Length (MDL) theory by J. Rissanen, P. Myllymäki and others have been successfully applied in clustering, density estimation and image denoising.
Methods and tools for gene mapping, haplotyping, diagnostic markers and gene regulation
This line of research is based on a fruitful long-term collaboration between HIIT researchers and medical geneticists. We started with the problem of how to find loci in the genome that predispose to certain diseases. The first important results included tools for association analysis of haplotype data using techniques from data mining (Toivonen et al., American Journal of Human Genetics, 2000). This algorithm was then successfully used by geneticists at the Karolinska Institutet, Stockholm, to locate an asthma susceptibility gene, a highly significant finding that was published in Science.
Later, we developed a novel model for the genomes of a population, which led to a new efficient algorithm for haplotyping genome data using hidden Markov techniques (Ukkonen, WABI 2003; Koivisto et al., WABI 2005). The resulting haplotyping software has accuracy and speed that are among the very best available at the moment. A similar founder approach has recently been applied by at least two leading groups elsewhere.
In gene copy number analysis, one patent application has been filed covering the diagnostic use of chromosomal copy number change regions.
Most recently, we have developed, in collaboration with Professor Jussi Taipale (Biomedicum, Helsinki), a new model for so-called gene enhancer elements in mammalian genomes. Such elements have an important role in the regulation of gene activity. We carried out a genome-wide comparative analysis and predicted several new enhancer elements that were successfully verified in vivo (Hallikas et al., Cell 2006; Palin et al., Nature Protocols 2006).
Finding orders from data
In certain data analysis applications there is a natural ordering for the rows or columns of the data. For example, in paleontological presence/absence data the rows represent sites and the columns represent species: the task is to find an ordering of the sites such that the occurrences of each species fall in consecutive observations. In the error-free case this seriation problem reduces to the consecutive ones problem, but it is NP-hard for realistic data. In recent years we have developed novel algorithms for this seriation task (Gionis et al., Paleobiology 2006; Puolamäki et al., PLoS Computational Biology 2006); their performance is excellent compared to previous approaches. Recent results (Gionis et al., KDD 2006) also show that finding partial orders can be done efficiently.
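To make the seriation objective concrete, the following minimal sketch scores a row ordering of a small presence/absence matrix by counting the zeros that interrupt each column's run of ones, and finds the best ordering by exhaustive search. It is an illustration only, not one of the published algorithms: the function names and the toy matrix are assumptions, and brute force is feasible only for a handful of rows.

```python
from itertools import permutations


def column_gaps(matrix, order):
    """Count zeros lying between the first and last one of each column.

    A score of 0 means the ordering has the consecutive ones property.
    """
    gaps = 0
    for col in range(len(matrix[0])):
        values = [matrix[row][col] for row in order]
        ones = [i for i, v in enumerate(values) if v == 1]
        if ones:
            span = ones[-1] - ones[0] + 1
            gaps += span - len(ones)
    return gaps


def best_order_brute_force(matrix):
    """Try every row ordering; only usable for very small matrices."""
    rows = range(len(matrix))
    return min(permutations(rows), key=lambda order: column_gaps(matrix, order))


# Rows are sites, columns are species (1 = species present at the site).
sites = [
    [1, 0, 0],  # site 0
    [0, 1, 1],  # site 1
    [1, 1, 0],  # site 2
]

given_gaps = column_gaps(sites, (0, 1, 2))
best = best_order_brute_force(sites)
best_gaps = column_gaps(sites, best)
```

In the given order, the first species is present at sites 0 and 2 but absent at site 1, which counts as one gap; placing site 2 between sites 0 and 1 removes it. With noisy data no ordering reaches zero gaps, which is where the hardness of the problem arises.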
Techniques and tools for learning linear latent variable models
A common data-analysis framework for continuous data is to describe the data as a linear mixture of some underlying hidden variables. This family of methods includes Independent Component Analysis (ICA) and Non-negative Matrix Factorization (NMF), both of which have received considerable attention in the machine learning community. We have contributed significantly to the problem formulation, solution algorithms, and software for these methods. In particular, we have published a book which is now the standard reference on ICA (Hyvärinen et al., Independent Component Analysis, Wiley, 2001). We have also developed and improved the FastICA MATLAB package (www.cis.hut.fi/projects/ica/fastica/), which implements the most widely used ICA algorithm worldwide, developed by us in the 1990s.
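As a brief reminder of what "linear mixture" means here, the two models can be written as follows. The notation is the conventional textbook one and is an assumption of this sketch, not taken from the cited works.

```latex
% ICA: observed vector x is a linear mixture of hidden sources s,
% whose components are assumed statistically independent and non-Gaussian.
\mathbf{x} = \mathbf{A}\,\mathbf{s},
\qquad
\hat{\mathbf{s}} = \mathbf{W}\,\mathbf{x}
\quad \text{with } \mathbf{W} \approx \mathbf{A}^{-1}

% NMF: a non-negative data matrix V is approximated by a product of
% two non-negative factors.
\mathbf{V} \approx \mathbf{W}\,\mathbf{H},
\qquad
\mathbf{W} \ge 0,\ \mathbf{H} \ge 0
```

In ICA the mixing matrix and the sources are both unknown and are estimated from the observations alone; in NMF the non-negativity constraints take the place of the independence assumption.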
Furthermore, we have focused on the important problem of estimating the reliability of ICA components (Himberg et al., NeuroImage, 2004). We have extended the standard NMF method to include sparseness constraints. The resulting method (Hoyer, JMLR, 2004) has become a main reference for modern approaches to NMF, and our corresponding MATLAB package is widely used.
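A sparseness constraint needs a way to quantify how sparse a vector is. The sketch below computes a measure based on the ratio of the L1 and L2 norms, which to our understanding is the form used in the cited JMLR paper; the function name and the example vectors are illustrative assumptions, and this is not code from the MATLAB package.

```python
import math


def sparseness(vector):
    """Sparseness from the L1/L2 norm ratio, scaled to the range [0, 1].

    Returns 0 when all entries have equal magnitude and 1 when exactly
    one entry is non-zero.
    """
    n = len(vector)
    l1 = sum(abs(v) for v in vector)
    l2 = math.sqrt(sum(v * v for v in vector))
    if n < 2 or l2 == 0:
        raise ValueError("need at least two entries and a non-zero vector")
    return (math.sqrt(n) - l1 / l2) / (math.sqrt(n) - 1)


dense = sparseness([1.0, 1.0, 1.0, 1.0])   # all entries equal
single = sparseness([0.0, 0.0, 3.0, 0.0])  # a single active entry
mixed = sparseness([3.0, 4.0, 0.0, 0.0])   # two of four entries active
```

A constrained factorization would then require the columns or rows of the factors to reach a chosen target value of this measure.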
Social media, especially mobile photography and mobile spectator media
In this line of work, mobile media services, especially for social photography and large-scale events, have been conceptualized, developed and extensively tested. This work has resulted in service design principles for mobile group media, as well as explorative implementations in commercial products (Kuvaboxi, Jaiku, Comeks), service prototypes (Comedia), and open mobile application platforms (MUPE). The work has been performed in close co-operation with UC Berkeley (Prof. Marc Davis and Prof. Nancy Van House).
Mobility and cognition
The long-term objective of this line of research is to understand, qualitatively and quantitatively, the impact of mobile computing and communication on the interactive behaviour of users and user groups. To this end, the research has focused on three major lines: 1) the investigation of cognitive regulation of action in mobile human-computer interaction; 2) the description of the fundamental limitations in interacting with mobile devices when mobile; and 3) the charting of possible user interface solutions.
During the research, several innovative research methods and instruments have been developed to facilitate experimental research in naturalistic real-world settings. For instance, HIIT has developed a state-of-the-art wearable video recording system that makes it possible to collect rich data for mobile human-computer interaction studies.
The availability of such data has enabled us to study phenomena that would not appear in a laboratory setting. As an example, we built a predictive model of a mobile user’s attention, based on Bayesian networks and data collected from 28 users of mobile web browsers. The results are promising, with accuracy in binary classification reaching 72% (22% above default), even with realistic sensors.
Creative Commons licenses for media sharing
After introducing the Creative Commons (CC) licenses in Finland in 2003, HIIT researchers have focused on the pros and cons of applying CC licenses to community-created content and peer-to-peer media creation and delivery. We have also aimed to understand the new media business models and the large-scale societal implications of the Creative Commons approach. In addition, we are building concrete experiments, especially related to media archive sharing (the P2P Fusion EU project) and educational material distribution (the EduGrid initiative). The work has had a significant societal impact by facilitating the adoption and use of CC licenses in Finland and elsewhere.
Global network society research
The aim of this research line is to analyse, at the macroscopic societal level, the logic and global challenges of the network society. The baseline of the work is given by the studies of Prof. Pekka Himanen with Prof. Manuel Castells, who have comparatively analysed the Finnish/European, the Silicon Valley/USA, and the Singapore/Chinese network society models. An interim goal of the work is to develop an integrated set of indicators, the Global Future Index, for describing the relations of network society development to innovation systems and social context. Outcomes of the work include a draft version of the index that has been presented to the World Economic Forum.