Description
With the rapid growth of Internet capacity and speed, and the wide adoption of media technologies in people's daily lives, the volume of video has surged, and it needs to be efficiently processed or organized according to interest. The human visual perception system can, without difficulty, interpret and recognize thousands of events in videos, despite high levels of object clutter, different types of scene context, variability of motion scales, appearance changes, occlusions and object interactions. For a computer vision system, automatic video event understanding has been very challenging for decades. Broadly speaking, the challenges include robust detection of events under motion clutter, event interpretation in complex scenes, multi-level semantic event inference, putting events in context and across multiple cameras, and event inference from object interactions. In recent years, steady progress has been made towards better models for video event categorisation and recognition, e.g., from modelling events with bags of spatio-temporal features to discovering event context, from detecting events using a single camera to inferring events through a distributed camera network, and from low-level event feature extraction and description to high-level semantic event classification and recognition. Nowadays, text-based video retrieval is widely used by commercial search engines. However, it is still very difficult to retrieve or categorise a specific video segment based on its content in a real multimedia system or in surveillance applications.