Fri, 2014-07-18 17:11


THURSDAY 4 September 2014 (Room 2.11)
09:00 - 09:40 Thematic group meeting. Report on past activities and proposed activities for the coming year.
09:45 - 10:15 Talk by David Martín: Visual Odometry for Intelligent Vehicles.
10:30 - 11:10 Talk by Francesc Moreno: Monocular 3D detection of rigid and non-rigid shapes.
11:30 - 12:15 Talk by Joost van de Weijer: Coloring Object Recognition.
12:30 - 13:00 Technical presentation by INFAIMON, given by Toni Ruiz: Machine Vision Solutions. Embedded Technology Within the User's Reach.


THURSDAY 4 September 2014 (Assembly Hall)
15:30 - 16:30 Plenary lecture by Krystian Mikolajczyk (University of Surrey): Long term tracking and activity recognition in multi-camera network.



FRIDAY 5 September 2014 (Room 2.11)
10:00 - 12:00 Mini-tutorial. Ariadna Quattoni: Spectral Methods for Structured Prediction.


Descriptions of Thursday's talks:


David Martín: Visual Odometry for Intelligent Vehicles

Thursday, September 4, 9:45.

Abstract: Visual Odometry is a Computer Vision technique that estimates the egomotion of a vehicle by means of one or more in-vehicle cameras. It estimates the pose of the vehicle by examining the changes that motion induces on the images from those cameras. The technique uses visual information from the ground in front of the vehicle, such as the road; a highly textured road surface is essential for extracting the apparent motion of the vehicle. Images are captured at short intervals (e.g. every 100 ms) to ensure sufficient scene overlap. The technique is presented using stereo vision, because motion estimation from stereo images achieves better accuracy and performance. The steps of a visual odometry algorithm for motion estimation using stereo vision are: (i) obtain the feature points of the stereo images, (ii) match the left and right feature points of the stereo rig and triangulate the matches to obtain 3D points, and (iii) track the feature points across consecutive frames (e.g. every 100 ms) to establish the vehicle movement.
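As a rough illustration of the triangulation in step (ii), the sketch below recovers a 3D point from one matched feature pair. It assumes a rectified stereo rig and a pinhole camera model; the function name `triangulate_rectified` and every numeric value (focal length, baseline, principal point, pixel coordinates) are hypothetical choices for the example, not parameters from the talk.

```python
from typing import Tuple


def triangulate_rectified(
    u_left: float, v_left: float, u_right: float,
    focal_px: float, baseline_m: float, cx: float, cy: float,
) -> Tuple[float, float, float]:
    """Return the 3D point (X, Y, Z) in the left camera frame, in metres.

    Assumes a rectified stereo pair, so a match lies on the same image row
    and depth follows from the horizontal disparity: Z = f * B / d.
    """
    disparity = u_left - u_right
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of the rig")
    z = focal_px * baseline_m / disparity
    x = (u_left - cx) * z / focal_px
    y = (v_left - cy) * z / focal_px
    return x, y, z


# Hypothetical rig: 700 px focal length, 0.5 m baseline, principal point (640, 360).
point = triangulate_rectified(710.0, 500.0, 675.0, 700.0, 0.5, 640.0, 360.0)
```

Tracking such 3D points between consecutive frames, as in step (iii), then constrains the motion of the vehicle between the two capture times.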

Besides presenting the foundations of Visual Odometry for Intelligent Vehicles, the talk will detail recent research in this field: an autocalibration method that determines the pose of a stereo vision system using the geometry of the ground in front of the in-vehicle cameras. The pose changes considerably while the vehicle is driven, so it is useful to track continuously the variations in the extrinsic parameters (height, pitch, roll). The camera pose is useful for several Computer Vision applications, such as Advanced Driver Assistance Systems, Autonomous Vehicles and Robotics.
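One way to picture how ground geometry constrains the extrinsic parameters is a least-squares plane fit to triangulated ground points. The sketch below is not the autocalibration method of the talk; it assumes a camera frame with x right, y down and z forward, fits y = a*x + b*z + c, and reads off the camera height plus the ground slopes along and across the optical axis. Those slopes match pitch and roll only up to sign and rotation-order conventions, and the function names and synthetic data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def solve3(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("ground points are degenerate")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= factor * m[col][c]
    x = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):  # back substitution
        x[i] = (m[i][3] - sum(m[i][j] * x[j] for j in range(i + 1, 3))) / m[i][i]
    return x


def extrinsics_from_ground(points: Sequence[Point]) -> Tuple[float, float, float]:
    """Fit y = a*x + b*z + c to ground points; return (height, pitch, roll).

    Assumed camera frame: x right, y down, z forward. Angles are in radians.
    """
    rows = [(x, z, 1.0) for x, _, z in points]
    ys = [y for _, y, _ in points]
    # Normal equations of the least-squares problem in the unknowns (a, b, c).
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    aty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
    a, b, c = solve3(ata, aty)
    height = abs(c) / math.sqrt(a * a + b * b + 1.0)  # origin-to-plane distance
    pitch = math.atan(b)  # ground slope along the optical axis
    roll = math.atan(a)   # ground slope across the image
    return height, pitch, roll


# Synthetic ground points (hypothetical) lying exactly on a tilted plane.
ground = [(x, 0.02 * x - 0.05 * z + 1.2, z)
          for x in (-2.0, 0.0, 2.0) for z in (5.0, 10.0, 15.0)]
height, pitch, roll = extrinsics_from_ground(ground)
```

Repeating the fit on every stereo frame would give a time series of the three parameters, which is the kind of continuous variation the abstract refers to.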

Short bio: David Martín graduated in Industrial Physics (Automation) from the National University of Distance Education (UNED) in 2002 and received his Ph.D. degree in Computer Science from the Spanish Council for Scientific Research (CSIC) and UNED, Spain, in 2008. He joined CSIC in 2002. He was a fellow at the European Organization for Nuclear Research (CERN, Switzerland, 2006-2008) and a postdoctoral researcher in Robotics at CSIC (2008-2011). Since 2011 he has been a Professor and postdoctoral researcher at Carlos III University of Madrid and a member of the Intelligent Systems Lab. His research interests are Real-time Systems, Computer Vision, Sensor Fusion, Intelligent Transportation Systems, Advanced Driver Assistance Systems, Autonomous Navigation, Vehicle Positioning, and Field Robotics. He participates in several industrial research projects, is a reviewer for prestigious journals, and is a member of the Spanish Computer Vision Group, among others.


Francesc Moreno: Monocular 3D detection of rigid and non-rigid shapes

Thursday, September 4, 10:30.

Abstract: In this talk, I will first present an approach to the PnP problem (the estimation of the pose of a calibrated camera from n point correspondences between an image and a 3D model of a rigid object) whose computational complexity grows linearly with n. Our central idea is to express the 3D points as a weighted sum of four virtual control points. The problem then reduces to estimating the coordinates of these control points in the camera reference frame, which can be done in O(n) using simple linearization techniques. I will then show how an algebraic outlier rejection scheme can be introduced within the computation of the pose, without the need to resort to RANSAC-based strategies. In the second part of the talk I will discuss how the same linear formulations can be extended to retrieving 3D deformable objects. I will present results for both non-rigid shape and human pose recovery.
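The control-point idea can be illustrated with a small sketch. It computes the weights that express a 3D point as a combination of four non-coplanar control points, then checks that the same weights reproduce the point after a rigid motion is applied to the control points. That invariance is why recovering the control points in the camera frame is enough to recover every model point. The choice of control points (origin plus unit axes), the pose in `rigid`, and all helper names are assumptions for the example; the linearized O(n) pose solver from the talk is not reproduced here.

```python
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]


def det3(m: Sequence[Sequence[float]]) -> float:
    """Determinant of a 3x3 matrix."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def barycentric_weights(p: Vec, ctrl: Sequence[Vec]) -> List[float]:
    """Weights a_0..a_3 with sum 1 such that p = sum_j a_j * ctrl[j]."""
    c0 = ctrl[0]
    # Column k of the 3x3 system is the edge ctrl[k + 1] - ctrl[0].
    mat = [[ctrl[k + 1][i] - c0[i] for k in range(3)] for i in range(3)]
    rhs = [p[i] - c0[i] for i in range(3)]
    d = det3(mat)
    if abs(d) < 1e-12:
        raise ValueError("control points must not be coplanar")
    weights = []
    for j in range(3):  # Cramer's rule: replace column j by the right-hand side
        mj = [[rhs[i] if k == j else mat[i][k] for k in range(3)] for i in range(3)]
        weights.append(det3(mj) / d)
    return [1.0 - sum(weights)] + weights


def combine(weights: Sequence[float], ctrl: Sequence[Vec]) -> Vec:
    """Weighted sum of the control points."""
    return tuple(sum(w * c[i] for w, c in zip(weights, ctrl)) for i in range(3))


def rigid(p: Vec) -> Vec:
    """Hypothetical camera pose: 90 degree rotation about z, then a translation."""
    x, y, z = p
    return (-y + 0.5, x - 1.0, z + 4.0)


ctrl_world = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
point_world = (0.3, -0.2, 2.0)
alphas = barycentric_weights(point_world, ctrl_world)

# The weights computed in the world frame also hold in the camera frame.
ctrl_cam = [rigid(c) for c in ctrl_world]
point_cam = combine(alphas, ctrl_cam)
```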

Short bio: Francesc Moreno-Noguer received the MSc degrees in industrial engineering and electronics from the Technical University of Catalonia (UPC) and the Universitat de Barcelona in 2001 and 2002, respectively, and the PhD degree from UPC in 2005. From 2006 to 2008, he was a postdoctoral fellow at the computer vision departments of Columbia University and the EPFL. In 2009, he joined the Institut de Robòtica i Informàtica Industrial in Barcelona as an associate researcher of the Spanish Scientific Research Council. His research interests include retrieving rigid and nonrigid shape, motion, and camera pose from single images and video sequences.


Joost van de Weijer: Coloring Object Recognition

Thursday, September 4, 11:30.

Abstract: Bag-of-words based image representations have proven very successful for object recognition. Initially, these representations were solely shape based, but over the last several years they have been extended with color information. Much research has been dedicated to optimal color feature design, often from a photometric invariance point of view. However, relatively little research has gone into the question of where color should be introduced in the bag-of-words pipeline. In this talk I will focus on this aspect: I will indicate several places where color can be introduced, analyze the theoretical consequences, and compare experimental results.

The two main strategies for combining multiple cues, known as early and late fusion, both suffer from significant drawbacks. Based on an analysis of these drawbacks, I will propose two novel methods for combining shape and color cues. Firstly, I will discuss a method motivated by human color vision, called Color Attention. Here color is used to construct a top-down, category-specific attention map. The color attention map is then deployed to modulate the shape features by taking more features from regions within an image that are likely to contain an object instance. Evaluation of both approaches on several benchmark data sets shows that the proposed methods outperform both early and late fusion.
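The difference between the fusion strategies can be sketched on quantized features. The code below assumes each local patch has already been assigned a shape word and a color word. In it, late fusion concatenates two independent histograms, early fusion counts joint (shape, color) words, and the attention variant lets each patch vote in the shape histogram with a weight p(class | color word). This is a simplified reading of the abstract, not the speaker's implementation: the product vocabulary stands in for a vocabulary learned on joint descriptors, and the feature list and probabilities are made up.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Feature = Tuple[int, int]  # (shape word index, color word index) of one patch


def late_fusion(features: Sequence[Feature], n_shape: int, n_color: int) -> List[int]:
    """Concatenate a shape histogram and a color histogram built independently."""
    shape = Counter(s for s, _ in features)
    color = Counter(c for _, c in features)
    return [shape[i] for i in range(n_shape)] + [color[j] for j in range(n_color)]


def early_fusion(features: Sequence[Feature], n_shape: int, n_color: int) -> List[int]:
    """Histogram over a joint vocabulary: one bin per (shape, color) pair."""
    joint = Counter(features)
    return [joint[(i, j)] for i in range(n_shape) for j in range(n_color)]


def color_attention(
    features: Sequence[Feature], n_shape: int, p_class_given_color: Sequence[float]
) -> List[float]:
    """Shape histogram in which each patch votes with weight p(class | color word)."""
    hist = [0.0] * n_shape
    for s, c in features:
        hist[s] += p_class_given_color[c]
    return hist


# Hypothetical image with four patches, 3 shape words and 2 color words.
patches = [(0, 0), (0, 1), (1, 1), (2, 0)]
late = late_fusion(patches, 3, 2)
early = early_fusion(patches, 3, 2)
# Hypothetical class model: color word 0 is much more typical of the class.
attended = color_attention(patches, 3, [0.9, 0.1])
```

The attention histogram keeps the dimensionality of the shape vocabulary but becomes class specific, since patches whose color is unlikely for the class contribute little.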

Short bio: Joost van de Weijer is a senior scientist at the Computer Vision Center. He holds a master's degree in applied physics from Delft University of Technology and a PhD degree from the University of Amsterdam. He obtained a Marie Curie Intra-European scholarship in 2006, which was carried out in the LEAR team at INRIA Rhône-Alpes. From 2008 to 2012 he was a Ramón y Cajal fellow at the Universitat Autònoma de Barcelona. His research interests are color in computer vision, object recognition, and color imaging. He has published over 60 peer-reviewed papers in total. He has given several postgraduate tutorials at major venues such as ICIP 2009, DAGM 2010, and ICCV 2011.


Toni Ruiz (Technical Director of the INFAIMON Group): Machine Vision Solutions. Embedded Technology Within the User's Reach.

Thursday, September 4, 12:30.

Content: The aim of this talk is to present the different embedded vision solutions available on the market. Recent years have seen a boom in the development of vision hardware compatible with embedded platforms. Several manufacturers have committed to these technologies, developing cameras with compatible interfaces and implementing drivers specific to these platforms. We will give a brief description of current systems and practical examples of applications already deployed in the field of industrial vision.


Description of Thursday's PLENARY LECTURE:

Krystian Mikolajczyk: Long term tracking and activity recognition in multi-camera network.

Thursday, September 4, 15:30 - 16:30

Much research effort has gone into tracking and activity recognition, and excellent methods have been proposed, in particular for surveillance applications. Their performance is typically reported for short sequences with relatively high frame rates. It is, however, hard to find approaches designed for analysing visual data recorded over weeks or months. Long term behavioural patterns of individual subjects are excellent indicators of their health status, mood, personal character, etc. Animals are excellent subjects for such analysis: their environment can be restricted, they are unaware of the surveillance, and privacy and data protection are not a problem, in contrast to humans; yet the challenges and methods for data analysis are applicable to images of humans. The design of such a system has to consider all components affecting the pipeline, including the environment, the hardware, the software, and the processing budget. We have developed a methodology for animal behaviour analysis in confined areas such as laboratories, zoos, and farms, based on a network of motion-triggered cameras recording low frame rate sequences. Off-the-shelf camera traps were used, but because of their unreliable internal clocks the sequences needed to be synchronised every few days. New binary descriptors were proposed for efficient background subtraction to track the animals between images. Local features were also used for short term activity recognition. Finally, long term behavioural patterns are accumulated in multi-dimensional histograms and compared using various distance metrics.
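The last step, comparing accumulated behaviour histograms, can be sketched as follows. The abstract does not say which distance metrics were used, so the L1 and chi-square distances here are common choices picked for illustration. The two-dimensional layout (activity by time of day) and the counts are hypothetical.

```python
from typing import List, Sequence


def flatten(hist2d: Sequence[Sequence[float]]) -> List[float]:
    """Flatten a multi-dimensional histogram into a single list of bins."""
    return [value for row in hist2d for value in row]


def normalize(hist: Sequence[float]) -> List[float]:
    """Scale bin counts so that they sum to 1."""
    total = float(sum(hist))
    if total <= 0:
        raise ValueError("histogram must have positive mass")
    return [h / total for h in hist]


def l1_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """Sum of absolute bin differences."""
    return sum(abs(a - b) for a, b in zip(p, q))


def chi_square_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """Symmetric chi-square distance; bins empty in both histograms are skipped."""
    return 0.5 * sum((a - b) ** 2 / (a + b) for a, b in zip(p, q) if a + b > 0)


# Hypothetical counts: rows are activities (resting, moving),
# columns are time slots (day, night), accumulated over two periods.
period_a = normalize(flatten([[8, 2], [1, 9]]))
period_b = normalize(flatten([[6, 4], [3, 7]]))

d_l1 = l1_distance(period_a, period_b)
d_chi2 = chi_square_distance(period_a, period_b)
```

Normalizing first makes periods of different length comparable, since only the relative time spent in each bin matters.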

Krystian Mikolajczyk is an Associate Professor at the Centre for Vision, Speech and Signal Processing, in the Electronic Engineering department at the University of Surrey.
He did his undergraduate studies at the University of Science and Technology (AGH) in Krakow, Poland. He completed his PhD degree in 2002 at the Institut National Polytechnique de Grenoble, France, with an internship at the University of British Columbia, Canada. He then worked as a research officer at INRIA, the University of Oxford and the Technical University of Darmstadt (Germany), before joining the University of Surrey in 2005. His main area of expertise is image and video recognition, in particular problems related to image representation and machine learning. He has participated in a number of EU and UK projects, all in the area of image and video annotation and retrieval. Dr Mikolajczyk has more than 60 publications in top-tier computer vision and machine learning forums (http://bit.ly/1kE8nfg). He has served in various roles at major international conferences (e.g. ICCV, CVPR, ICPR, NIPS), co-chairing the British Machine Vision Conference 2012 and the IEEE International Conference on Advanced Video and Signal-Based Surveillance 2013.


Description of Friday's tutorial:

Ariadna Quattoni: Spectral Methods for Structured Prediction

Friday, September 5, 10:00 - 12:00


Free admission with prior registration, until capacity is reached.

To register, write to VisionCea@gmail.com with the subject line: Inscripción Tutorial JJAA2014



Many problems in robotics and computer vision require modeling paired sequences of inputs and outputs, i.e. sequence tagging. For example, consider a robot moving in some environment: at each point in time the robot receives readings from some sensors and must decide what action (out of a discrete set of actions) should be taken. In this example, the input would consist of a sequence of sensor readings and the output would consist of a sequence of discrete actions. Several problems in computer vision can also be cast as modeling paired sequences of continuous inputs and discrete outputs. For example, consider the problem of human gesture recognition, where given a video sequence the task is to predict the gesture that is being performed at each frame. Clearly, this can be cast as a sequence prediction problem where the continuous inputs correspond to real-valued features of the video sequence and the discrete outputs correspond to the gestures being performed at each point in time.
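To make the sequence tagging setting concrete, the sketch below labels a sequence of real-valued readings with discrete tags. It uses a hand-set Hidden Markov Model with Gaussian emissions and standard Viterbi decoding, which is a classical baseline and not the spectral method of the tutorial. The labels, the one-dimensional feature, and all model parameters are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def gaussian_loglik(x: float, mean: float, std: float) -> float:
    """Log density of a 1-D Gaussian."""
    return -0.5 * ((x - mean) / std) ** 2 - math.log(std * math.sqrt(2 * math.pi))


def viterbi(
    readings: Sequence[float],
    labels: Sequence[str],
    log_start: Dict[str, float],
    log_trans: Dict[str, Dict[str, float]],
    emission: Dict[str, Tuple[float, float]],
) -> List[str]:
    """Most likely label sequence for a sequence of real-valued readings."""
    if not readings:
        return []
    score = {y: log_start[y] + gaussian_loglik(readings[0], *emission[y]) for y in labels}
    back = []  # back[t][y] = best previous label when label y is used at time t + 1
    for x in readings[1:]:
        prev, score, ptr = score, {}, {}
        for y in labels:
            best = max(labels, key=lambda y0: prev[y0] + log_trans[y0][y])
            score[y] = prev[best] + log_trans[best][y] + gaussian_loglik(x, *emission[y])
            ptr[y] = best
        back.append(ptr)
    path = [max(labels, key=lambda y: score[y])]
    for ptr in reversed(back):  # follow back-pointers from the last frame
        path.append(ptr[path[-1]])
    return path[::-1]


# Hypothetical gesture model: one feature per frame, two gestures.
labels = ["rest", "wave"]
log_start = {"rest": math.log(0.5), "wave": math.log(0.5)}
log_trans = {
    "rest": {"rest": math.log(0.9), "wave": math.log(0.1)},
    "wave": {"rest": math.log(0.1), "wave": math.log(0.9)},
}
emission = {"rest": (0.0, 1.0), "wave": (3.0, 1.0)}  # (mean, std) per label

frames = [0.1, -0.2, 2.9, 3.3, 3.1, 0.2]
tags = viterbi(frames, labels, log_start, log_trans, emission)
```

The input is a sequence of continuous values and the output is one discrete label per frame, which is the paired-sequence structure described above.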

In recent years we have seen the development of efficient, provably correct algorithms for learning Hidden Markov Models and closely related function classes: the so-called spectral algorithms. These algorithms are appealing because of the existence of theoretical guarantees and because of their efficiency. In this tutorial I will present a simple derivation of spectral learning for structured prediction that puts emphasis on providing intuitions on the inner workings of the method and provides a unified view of several algorithms under a single simple framework. I will emphasize how this learning approach can be used to derive learning algorithms for sequence tagging tasks.
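In a common formulation of spectral learning for sequence models, the starting point is a Hankel matrix whose entry for a prefix and a suffix is the estimated probability of their concatenation. A low-rank factorization of this matrix (typically via SVD) then yields the model parameters. The sketch below covers only the first step, estimating the matrix from samples. The factorization is omitted, the tutorial's own derivation may differ, and the sample sequences and the prefix and suffix bases are hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Seq = Tuple[str, ...]


def empirical_hankel(
    samples: Sequence[Seq], prefixes: Sequence[Seq], suffixes: Sequence[Seq]
) -> List[List[float]]:
    """H[i][j] = empirical probability of the sequence prefixes[i] + suffixes[j]."""
    if not samples:
        raise ValueError("at least one sample sequence is required")
    counts = Counter(samples)
    n = len(samples)
    return [[counts[p + s] / n for s in suffixes] for p in prefixes]


# Hypothetical training set of 10 label sequences over the alphabet {a, b}.
samples = [("a", "b")] * 4 + [("a",)] * 2 + [("b",)] * 2 + [()] * 2

# Bases containing the empty sequence and one symbol each.
prefixes = [(), ("a",)]
suffixes = [(), ("b",)]

hankel = empirical_hankel(samples, prefixes, suffixes)
```

The rows correspond to prefixes and the columns to suffixes, so the entry for prefix `("a",)` and suffix `("b",)` is the relative frequency of the full sequence `("a", "b")` in the sample.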