Using social semantic knowledge to improve annotations in personal photo collections
João Moura Pires (superv.), Universidade NOVA de Lisboa, February 2015.
Keywords: Personal Photo Collections, Context Separation, Annotations, Multimedia Summarisation, Human Factors, Empirical User Study
Abstract: Personal photo collections pose archival and retrieval challenges that differ from those of general-purpose multimedia collections. Their images vary widely in the items depicted and carry hidden semantics. These features make it hard to find a fully automated archival and retrieval solution that deals with the sensory and semantic gaps. Since emotions and non-visual contextual information can be very important in addressing those problems, keeping the user in the loop is relevant. Manual annotations are therefore key, although their time-consuming nature may discourage users from making them.
The approach followed in this dissertation uses social semantic knowledge as a basis for building algorithms that support the archival and retrieval of images from personal photo collections. It borrows from data warehousing the notion of a multidimensional space, capable of answering rare, personalised and previously unseen queries, based on a highly descriptive, socially aware, hierarchical set of dimensions: the “when”, “where”, “who” and “what”. User annotations position photos in the multidimensional space, which is key to supporting retrieval results adapted to the user interacting with the system. To reduce manual labour, the system preprocesses the available information, gathered from the metadata and from previously inserted information, to suggest annotations that users then correct or accept. The suggestions are supported by a knowledge base of relevant concepts for a personal domain, stored as an ontology.
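As an illustration of the multidimensional idea, the following minimal sketch positions photos along the four dimensions and answers a query by matching hierarchy prefixes. It is not the dissertation's implementation: the `Photo` class, the `matches` and `query` helpers, the sample values, and the simplification of one value per dimension are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Photo:
    """A photo positioned in a hypothetical four-dimensional space.

    Each dimension value is a path from the most general term to the
    most specific one (an assumed encoding of a hierarchy).
    """
    name: str
    when: tuple   # e.g. ("2014", "2014-08")
    where: tuple  # e.g. ("Portugal", "Lisbon")
    who: tuple    # e.g. ("family", "Ana")
    what: tuple   # e.g. ("holiday", "beach")


def matches(value: tuple, prefix: tuple) -> bool:
    """True if `value` lies at or below `prefix` in the hierarchy."""
    return value[:len(prefix)] == prefix


def query(photos, **constraints):
    """Return the photos that fall under every given dimension prefix."""
    return [
        p for p in photos
        if all(matches(getattr(p, dim), tuple(prefix))
               for dim, prefix in constraints.items())
    ]


# Illustrative data only.
photos = [
    Photo("a.jpg", ("2014", "2014-08"), ("Portugal", "Lisbon"),
          ("family", "Ana"), ("holiday", "beach")),
    Photo("b.jpg", ("2014", "2014-12"), ("Portugal", "Porto"),
          ("friends", "Rui"), ("party",)),
    Photo("c.jpg", ("2013", "2013-08"), ("Spain", "Madrid"),
          ("family", "Ana"), ("holiday", "museum")),
]

# Family photos taken anywhere in Portugal, at any time.
family_in_portugal = query(photos, where=("Portugal",), who=("family",))
```

Querying with a shorter prefix, such as `what=("holiday",)`, rolls the result up to a more general level of the hierarchy, which is the kind of drill-down and roll-up behaviour borrowed from data warehousing.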
Two key algorithms are proposed, along with a prototype. The first algorithm, used during archival, automatically segments a set of photos, keeping the spatio-temporal context coherent within segments. The second algorithm, used during retrieval, summarises a set of photos with clustering techniques and short descriptions, relying on hierarchies of textual terms retrieved from the dimensions of the multidimensional space.
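To make the idea of spatio-temporal segmentation concrete, here is a minimal sketch of one simple way to split photos into coherent segments. The abstract does not specify the dissertation's algorithm, so this swaps in a plain threshold rule: a new segment starts whenever the time gap or the distance to the previous photo exceeds a limit. The function names, the `(timestamp, (lat, lon))` photo representation, and the threshold values are assumptions for illustration.

```python
import math
from datetime import datetime, timedelta


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def segment(photos, max_gap=timedelta(hours=6), max_km=50.0):
    """Split photos into segments using assumed time and distance thresholds.

    Each photo is a (timestamp, (lat, lon)) tuple. A photo joins the
    current segment only if it is close to the previous photo in both
    time and space; otherwise it opens a new segment.
    """
    segments = []
    for photo in sorted(photos, key=lambda p: p[0]):
        if segments:
            prev = segments[-1][-1]
            close_in_time = photo[0] - prev[0] <= max_gap
            close_in_space = haversine_km(prev[1], photo[1]) <= max_km
            if close_in_time and close_in_space:
                segments[-1].append(photo)
                continue
        segments.append([photo])
    return segments


# Illustrative data: two photos in Lisbon, then Porto, then Porto a day later.
lisbon, porto = (38.72, -9.14), (41.15, -8.61)
photos = [
    (datetime(2014, 8, 15, 10, 0), lisbon),
    (datetime(2014, 8, 15, 11, 0), lisbon),
    (datetime(2014, 8, 15, 12, 0), porto),
    (datetime(2014, 8, 16, 12, 0), porto),
]
segments = segment(photos)
```

With these thresholds the two Lisbon photos stay together, the move to Porto opens a second segment, and the one-day gap opens a third. Fixed thresholds are the crudest possible choice; they ignore the temporal regularities and social knowledge that the dissertation relies on.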
The acceptance of the algorithms by end users shows that using social semantic knowledge, supporting temporal regularities, and describing the context with human-understandable textual terms are important for building reliable solutions in this domain.