D. Lenat, M. Prakash, M. Shepherd (1986)
CYC: Using Common Sense Knowledge to Overcome Brittleness and Knowledge Acquisition Bottlenecks. AI Mag., 6
J. McCarthy (1987)
Some Philosophical Problems from the Standpoint of Artificial Intelligence. Machine Intelligence
S. Fahlman (2011)
Using Scone's Multiple-Context Mechanism to Emulate Human-Like Reasoning
Jean-Christophe Nebel, Michal Lewandowski, Jérôme Thévenon, Francisco Martínez-Contreras, S. Velastín (2011)
Are Current Monocular Computer Vision Systems for Human Action Recognition Suitable for Visual Surveillance Applications?
A. Salah, T. Gevers, N. Sebe, A. Vinciarelli (2010)
Challenges of Human Behavior Understanding
T. Kanungo, D. Mount, N. Netanyahu, C. Piatko, R. Silverman, A. Wu (2002)
An Efficient k-Means Clustering Algorithm: Analysis and Implementation. IEEE Trans. Pattern Anal. Mach. Intell., 24
T. Joachims (1998)
Text Categorization with Support Vector Machines: Learning with Many Relevant Features
M. Santofimia, J. Rincón, Jean-Christophe Nebel (2012)
Common-Sense Knowledge for a Computer Vision System for Human Action Recognition
S. Fahlman (2006)
Marker-Passing Inference in the Scone Knowledge-Base System
Liang Wang, D. Suter (2008)
Visual learning and recognition of sequential data manifolds with applications to human movement analysis. Comput. Vis. Image Underst., 110
Hilde Kuehne, Hueihan Jhuang, Estíbaliz Garrote, T. Poggio, Thomas Serre (2011)
HMDB: A large video database for human motion recognition. 2011 International Conference on Computer Vision
Chin-Hsien Fang, Ju-Chin Chen, Chien-Chung Tseng, J. Lien (2009)
Human Action Recognition Using Spatio-temporal Classification
Pingkun Yan, S. Khan, M. Shah (2008)
Learning 4D action feature models for arbitrary view action recognition. 2008 IEEE Conference on Computer Vision and Pattern Recognition
Rakesh Gupta, Mykel Kochenderfer (2004)
Common Sense Data Acquisition for Indoor Mobile Robots
Maja Stikic, B. Schiele (2009)
Activity Recognition from Sparsely Labeled Data Using Multi-Instance Learning
B. Krausz, C. Bauckhage (2010)
Action Recognition in Videos Using Nonnegative Tensor Factorization. 2010 20th International Conference on Pattern Recognition
D. Bruckner, G. Yin, Armin Faltinger (2012)
Relieved commissioning and human behavior detection in Ambient Assisted Living Systems. e & i Elektrotechnik und Informationstechnik, 129
(2008)
Delivery context ontology. World Wide Web Consortium, Working Draft WDdcontology
Gabriella Csurka, C. Dance, Lixin Fan, J. Willamowski, Cédric Bray (2002)
Visual categorization with bags of keypoints, 1
William Pentney, Ana-Maria Popescu, Shiaokai Wang, Henry Kautz, Matthai Philipose (2006)
Sensor-Based Understanding of Daily Life via Large-Scale Use of Common Sense
Francisco Martínez-Contreras, C. Orrite-Uruñuela, J. Jaraba, Hossein Ragheb, S. Velastín (2009)
Recognizing Human Actions Using Silhouette-based HMM. 2009 Sixth IEEE International Conference on Advanced Video and Signal Based Surveillance
G. Primiero (2007)
Information and Knowledge, A Constructive Type-theoretical Approach, 10
I. Laptev, P. Pérez (2007)
Retrieving actions in movies. 2007 IEEE 11th International Conference on Computer Vision
D. Bruckner, B. Sallans, R. Lang (2007)
Behavior learning via state chains from motion detector sensors. 2007 2nd Bio-Inspired Models of Network, Information and Computing Systems
J. Rincón, M. Santofimia, Jean-Christophe Nebel (2013)
Common-sense reasoning for human action recognition. Pattern Recognit. Lett., 34
Shabbir Hossain, P. Valente, Kasper Hallenborg, Y. Demazeau (2011)
User modeling for activity recognition and support in Ambient Assisted Living. 6th Iberian Conference on Information Systems and Technologies (CISTI 2011)
Rodrigo Cilla, M. Patricio, A. Berlanga, J. Molina (2011)
Improving the Accuracy of Action Classification Using View-Dependent Context Information
G. Antoniou, F. Harmelen (2004)
Web Ontology Language: OWL
J. Hobbs, Robert Moore (1985)
Formal Theories of the Commonsense World
D. Davidson (1980)
Actions, Reasons, And Causes
Chris Baker, R. Saxe, J. Tenenbaum (2009)
Action understanding as inverse planning. Cognition, 113
R. Vezzani, Davide Baltieri, R. Cucchiara (2010)
HMM Based Action Recognition with Projection Histogram Features
H. Yasunaga (2008)
Information and Knowledge. Joho Chishiki Gakkaishi, 18
H. Knublauch, M. Musen, A. Rector (2004)
Editing Description Logic Ontologies with the Protégé OWL Plugin. Description Logics
M. Santofimia, S. Fahlman, X. Toro, F. Moya, J. López (2012)
Possible-World and Multiple-Context Semantics for Common-Sense Action Planning
P. Berka (2011)
NEST: A Compositional Approach to Rule-Based and Case-Based Reasoning. Adv. Artif. Intell., 2011
Mohiudding Ahmad, Seong-Whan Lee (2008)
Human action recognition using shape and CLG-motion flow from multi-view image sequences. Pattern Recognit., 41
Ivan Laptev (2005)
On Space-Time Interest Points. International Journal of Computer Vision, 64
Danny Wyatt, Matthai Philipose, Tanzeem Choudhury (2005)
Unsupervised Activity Recognition Using Automatically Mined Common Sense
I. Sipiran, B. Bustos (2011)
Harris 3D: a robust extension of the Harris operator for interest point detection on 3D meshes. The Visual Computer, 27
M. Kaâniche, F. Brémond (2010)
Gesture recognition by learning local motion signatures. 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition
I. Laptev, Marcin Marszalek, C. Schmid, Benjamin Rozenfeld (2008)
Learning realistic human actions from movies. 2008 IEEE Conference on Computer Vision and Pattern Recognition
Xiaohang Wang, Daqing Zhang, T. Gu, H. Pung (2004)
Ontology based context modeling and reasoning using OWL. Proceedings of the Second IEEE Annual Conference on Pervasive Computing and Communications Workshops, 2004
M. Minsky (1999)
The emotion machine: from pain to suffering
J. McCarthy (1960)
Programs with common sense
Michal Lewandowski, D. Makris, Jean-Christophe Nebel (2010)
View and Style-Independent Action Manifolds for Human Activity Recognition
Grégory Rogez, J. Guerrero, J. Rincón, C. Orrite-Uruñuela (2006)
Viewpoint Independent Human Motion Analysis in Man-made Environments
L. Sigal, A. Balan, Michael Black (2010)
HumanEva: Synchronized Video and Motion Capture Dataset and Baseline Algorithm for Evaluation of Articulated Human Motion. International Journal of Computer Vision, 87
K. Ducatel, M. Bogdanowicz, F. Scapolo, J. Leijten, J. Burgelman (2001)
Scenarios for Ambient Intelligence in 2010 Final Report
S. Heymans, Li Ma, Darko Anicic, Zhilei Ma, Nathalie Steinmetz, Yue Pan, Jing Mei, Achille Fokoue, Aditya Kalyanpur, A. Kershenbaum, E. Schonberg, Kavitha Srinivas, C. Feier, Graham Hench, B. Wetzstein, U. Keller (2008)
Ontology Reasoning with Large Data Repositories
Tanzeem Choudhury, Matthai Philipose, Danny Wyatt, Jonathan Lester (2006)
Towards Activity Databases: Using Sensors and Statistical Models to Summarize People's Lives. IEEE Data Eng. Bull., 29
J. Shand (2006)
Central Works of Philosophy: The Twentieth Century: Quine and After: Introduction
T. Kasteren, B. Kröse (2007)
Bayesian Activity Recognition in Residence for Elders
D. Lenat, R. Guha (1990)
Building Large Knowledge-Based Systems: Representation and Inference in the Cyc Project
Jingen Liu, Saad Ali, M. Shah (2008)
Recognizing human actions using multiple features. 2008 IEEE Conference on Computer Vision and Pattern Recognition
M. Tazari, Francesco Furfari, J. Lázaro, E. Ferro (2010)
The PERSONA Service Platform for AAL Spaces
Thomas Eiter, Giovambattista Ianni, A. Polleres, R. Schindlauer, H. Tompits (2006)
Reasoning with Rules and Ontologies
J. Divers (2002)
Possible Worlds. Problems of Philosophy
Daniel Weinland, Edmond Boyer, Rémi Ronfard (2007)
Action Recognition from Arbitrary Views using 3D Exemplars. 2007 IEEE 11th International Conference on Computer Vision
H. Storf, Martin Becker, Mark Riedl (2009)
Rule-based activity recognition framework: Challenges, technique and learning. 2009 3rd International Conference on Pervasive Computing Technologies for Healthcare
Jianguo Zhang, S. Gong (2010)
Action categorization with modified hidden conditional random field. Pattern Recognit., 43
Hindawi Publishing Corporation, The Scientific World Journal, Volume 2014, Article ID 270171, 18 pages. http://dx.doi.org/10.1155/2014/270171

Research Article

Maria J. Santofimia,1 Jesus Martinez-del-Rincon,2 and Jean-Christophe Nebel3

1 Computer Architecture and Network Group, School of Computer Science, University of Castilla-La Mancha, 13072 Ciudad Real, Spain
2 The Institute of Electronics, Communications and Information Technology (ECIT), Queens University of Belfast, Belfast BT3 9DT, UK
3 Digital Imaging Research Centre, Kingston University, London KT1 2EE, UK

Correspondence should be addressed to Maria J. Santofimia; [email protected]

Received 23 August 2013; Accepted 29 October 2013; Published 14 May 2014

Academic Editors: G. Bordogna and I. García

Copyright © 2014 Maria J. Santofimia et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Smart Spaces, Ambient Intelligence, and Ambient Assisted Living are environmental paradigms that strongly depend on their capability to recognize human actions. While most solutions rest on sensor value interpretations and video analysis applications, few have realized the importance of incorporating common-sense capabilities to support the recognition process. Unfortunately, human action recognition cannot be successfully accomplished by only analyzing body postures. On the contrary, this task should be supported by profound knowledge of human agency nature and its tight connection to the reasons and motivations that explain it. The combination of this knowledge and the knowledge about how the world works is essential for recognizing and understanding human actions without committing common-senseless mistakes. This work demonstrates the impact that episodic reasoning has in improving the accuracy of a computer vision system for human action recognition.
This work also presents formalization, implementation, and evaluation details of the knowledge model that supports the episodic reasoning. 1. Introduction recognizeactions.Forexample,kickingandpunchingaretwo actions that suggest an ongoing gfi ht. In this sense, having Recognizing human actions is an essential requirement for the ability to recognize the sequence of actions that define an fulfilling the vision of Smart Spaces, Ambient Intelligence, undesirable behavior can be used to trigger a security alarm. or Ambient Assisted Living. These paradigms envision envi- Obviously, several challenges arise when dealing with ronments in which electronic devices, merged with the human action recognition. In addition to the inherent background, operate as sensors retrieving environmental difficulty of recognizing different people’s body postures information. Among all different types of sensors, video performingthesameaction[2], different actions may involve cameras are extremely powerful devices because of the great similar or identical poses. Moreover, images recorded within amount of contextual information that they are capable of a real environment are not always captured from the best capturing. However, despite human’s ability to understand perspectiveorangle,which makesitimpossibletoretrieve effortlessly video sequences through observation, computer poses consistently [3]. vision systems still have work to do in this regard. Fortunately, the human ability to recognize actions does Automatic video understanding is a delicate task that yet notonlyrelyonvisualanalysisofhuman body postures remains an unresolved topic [1]. Among all the challenges but also requires additional sources of information such as involved in video understanding, this paper focuses on context, knowledge about actor intentions, or knowledge humanactionrecognition sincethisisanenablingkey about how the world works normally referred to as common for Smart Spaces applications. 
Applications that depend on sense. This type of information helps people to recognize, the identification of certain behavior require the ability to among several similar actions, the one that is the most 2 The Scientific World Journal consistent with knowledge that person holds about previous into concrete procedures that can be computationally run. experiences. For example, consider the actions of waving and Section 6 validates the working hypothesis motivating throwing something overhead. They can be performed quite this work by assessing the proposed system performance, in thesameway.However,ifitisknown beforehand that from both thecomputervisionperspective andthe human the actor is not holding anything that could be thrown away, cognition one. Finally, Section 7 presents the most relevant waving is themostlikelyactionbeing performed. conclusions drawn from the work described here. However, this human ability is far more sophisticated than just a simple condition matching process. People have also the capacity to hypothesize about dieff rent situations 2. Previous Work or episodes, project eeff cts of actions based on previous experiences, wait for following actions to explain a previously Different approaches have been devised to tackle the problem nonunderstood action, or even ignore the occurrence of a of human action recognition from the computer vision per- certain action that cannot be recognized without interfering spective, such as [7–10]. Mainly, video-based action recogni- in the interpretation and understanding of the situation. tion algorithms rely on learning from examples and machine Human’s episodic memory is what enables us to model, learning techniques such as HMM [11], dimensionality reduc- represent, and reason about events, actions, preconditions, tion [12–14], or Bag of Words [9]. Since these approaches do and consequences [4]. 
not include any reasoning capability, their efficiency relies An episode is considered here to extend along time completely on the training and its coverage of all actions andinvolve asequenceofeventsand actions, presided by present in a given scenario. Unfortunately, all those action an undergoing plan. In this sense, a single action, such as recognition experiments are conducted with videos that are walking, should be seen as part of a higher level action, not representative of real life data, as it is demonstrated activity, or episode such as approaching an object to pick by the poor performance obtained on videos captured in it up. Single actions can take place on isolation but, for uncontrolled environments [9, 15]. It has been concluded understanding purposes, it is essential to distinguish when that none of existing techniques based solely on computer they are being part of a more complex activity or episode. vision and machine learning is currently suitable for real Notethatthewordsactivityandepisodeareusedinstinctively video surveillance applications [1]. along this document. However, few works combine video-based strategies with Consequently, a successful action recognition system, anthropological aspects or knowledge about human and inspired by the human one, should entail two different social behavior [16], despite these essential elements being perspectives, that is, visual analysis and episodic reasoning. of human behavior [17]. According to [18]human behavior This work extends the work in [ 5] in which an off-the- shelf computer vision system is combined with a heuristic is enabled by six different types of mechanisms: instinctive system for human action recognition. This work improves reactions, learned reactions, deliberative thinking, reflective the heuristic system by providing a computational implemen- thinking, self-reflective thinking, and self-conscious reflec- tation of the human episodic memory paradigm to support tion. 
es Th e mechanisms should be therefore considered as episode modeling, representation, and reasoning. Different an inherent part of any system intended to understand mental models intervene in episodic reasoning and this work human behavior, independent of the dimension in which it proposes use of three of them: beliefs, expectations, and is expressed, thinking, acting, or talking, for example. On the estimations. es Th e mental models hold different implications, contrary, the approach that followed from the human action such as the fact that a belief is true in the context of the recognition perspective consists in rather equipping systems person that holds that idea although it may not be true in with the minimum amount of information required to solve therealworldcontext.Theseimplicationshavetobecaptured the problem. in order to successfully implement an episodic reasoning Enabling computational systems with these mechanisms approach. A preliminary attempt to present this formulation is not an accessory demand, but, on the contrary, it is was introduced in [6]. Furthermore, usage of these mental becoming more and more essential as new paradigms depend models is formalized by means of a semantic model and on showing rational behavior. In this sense, human activity validated by, first, translating them into a software implemen- recognition is becoming a hot topic due to the key role it plays tation and, second, assessing the commonsensicality level of inefi ldsofknowledgesuchasAmbientAssistedLiving(AAL) the resulting system. [19] or Ambient Intelligence (AmI) [20]. In fact,asstatedin The following sections describe how these endeavors can [21], activity recognition is one of the main challenges faced be articulated based on the implementation of philosophical by AAL [22]. Provided that human activity recognition is a theories about knowledge and human behavior. 
More task that can be framed in very different elds fi of knowledge, it particularly, Section 2 presents some of the most relevant is important to state here that this work focuses on achieving worksgoing in thesamedirection as theone presentedhere. video-basedhuman action recognitionasanenablingkey Section 3 discusses the relevance that common sense has for AAL spaces. es Th e spaces are characterized by showing in achieving system with intelligent capabilities. Section 4 skills for supervising, helping, and assisting the elderly in proposes and formalizes a knowledge model for video-based human action recognition. Section 5 describes how the their daily life. es Th e skills need to therefore be grounded in formal theories supporting this work can be implemented cognitive and understanding capabilities. The Scientific World Journal 3 This reference to cognitive and understanding capabilities Cyc or OpenCyc (http://www.opencyc.org/), and Scone basically alludes to computational mechanisms for interpret- (http://www.cs.cmu.edu/∼sef/scone/). The first system sim- ing the facts provided by sensors and video devices deployed ply provides knowledge-based capabilities, lacking of an in an AAL space. The events captured by environmental sen- inference and reasoning engine, similarly, although OpenCyc sors are interpreted as signal describing an ongoing episode holds these mechanisms, it only provides limited capability in in a well-known context. Modeling the knowledge and infor- comparison with the commercial project Cyc. Finally, Scone mation gathered from this type of scenarios has also been is an open-source system that provides efficient mechanisms a major topic of discussion. In this sense, the World Wide for supporting common-sense reasoning and knowledge Web Consortium (W3C), aware of that shortage, provides modeling operations [4, 36]. The characteristic that makes a standardized and formal model of the environment [23]. 
Scone the most suitable choice when it comes to episodic Despite this attempt to standardize the conceptual entities reasoning is its capability to deal with multiple contexts at that should be part of the model, this ontology fails to provide thesametime. eTh conceptof context in Scone provides the the means to model ongoing episodes or situations, and, for perfect abstraction to hold episodes or situations. eTh way that reason, the work presented here has adopted the model Scone handles contexts is also essential to enable episodic proposed by McCarthy and Hayes [24]. The situation concept reasoning, since it implements a lightweight approach that proposed by McCarthy models world episodes as changes barely overloads the system response as contexts are being result of actions and events taking place in it. This work has createdinthe knowledgebase. Moreover,the fact that only therefore adopted this approach by describing actions and one context is active at a time provides a way of keeping events in terms of a set of statements describing the world inconsistent information in the same knowledge base without before theactiontakes placeand aeft rward. causing any disturbance in the data consistency. Setting aside the formality employed for knowledge modeling, next issue to be considered is the employed mech- 3. Leveraging Common Sense anism for undertaking human action recognition. Despite the fact that there is not a unique standard procedure for The development of the field of Artificial Intelligence has been action recognition in AAL, some of the most common led by the will of building computational intelligent systems. approaches are rule-based [21, 25, 26], statistical [27]or This task has turned out to be a very difficult one, and, despite learning, both in supervised and unsupervised modes [28, the fact that computing systems have been improving their 29]. 
However, due to the background of this paper, special intelligent skills, the lack of common sense that they sueff r attention is paid to those approaches based on video, like from has prevented them from becoming truly intelligent. In the one presented here. The work in [ 30] employs human words of Minsky [18]“some programs canbeatpeopleatchess. silhouettes linked by connectors, in such a way that different Others candiagnoseheart attacks. Yetotherscan recognize postures are represented by means of different silhouettes and pictures of faces, assemble cars in factories, or even pilot ships connections. The work in [ 31] proposes decomposing human andplanes. Butnomachine yetcan make abed,orreadabook, actions into subtasks, such that the recognition process is or babysit.”Inhis 1968 paper[37], McCarthy proposes an accomplished in different stages. eTh work in [ 32], despite not approach with which to build a program with the capability to being specicfi ally devoted to a video-based solution, refers to solve problems in the form of an advice taker. In order to do sensorsingeneral so it canbeeasilyextrapolatedtovideo- so,McCarthyreckonsthatsuchanattemptshouldbefounded basedsystems.Itconsistsinapplyingstatistical modeling of in the knowledge of the logical consequences of anything that sensor behavior to learn behavioral patterns that can be used could be told, as well as the knowledge that precedes it. In for security and care systems. The work in [ 33] extends the this work, McCarthy postulates that “aprogram hascommon previous work to consider additional approaches for not only senseifitautomatically deducesfromitselfasufficiently wide monitoring systems but also making special emphasis on the class of immediate consequences of anything it is told and what behavior modeling task. However, these types of systems, it already knows.” mainly characterized by their rigidness, fail to deal with For Lenat et al. 
[38], “common sense is the sort of unexpectedorunforeseensituations. Forthatreason, more knowledge that an encyclopedia would assume the reader elaborated reasoning mechanisms are required to deal with knew without being told (e.g., an object can’t be in two places action recognition in open spaces. By open spaces we refer at once).” Minsky [18] uses the term with regard to the things here to those environments in which interactions and events that we expect other people to know, those things labeled as are coming from different sources at unexpected times. obvious. In this sense, the feature that distinguishes people The task of modeling human behavior has been tack- from computers, regarding cognitive and understanding led in this work from the perspective of common sense. capabilities, is the vast amount of knowledge they hold as well Some activities have already been undertaken in this regard, as their associated mechanisms that support an effective use although from the perspective of indoor mobile robots of such knowledge. [34, 35]. Due to the great eoff rt involved in collecting Replicating human intelligence is therefore a task that knowledge about the everyday world, the most commonly requires an extremely large amount of knowledge. However, employed approach consists in resorting to existing systems. it is neither expert nor specific knowledge that needs to eTh re are not many systems dedicated to collect and man- age common-sense knowledge. In fact, the most famous be improved in these systems. On the contrary, the focus ones are OpenMind (http://commons.media.mit.edu/en/), should be placed at everyday knowledge known as common 4 The Scientific World Journal sense. In this sense, the working hypothesis motivating this 4. 
A Semantic Model for Human work was that video-based human action recognition could Action Recognition be enhanced with common-sense knowledge in order to There is a set of relevant concepts that characterize the enable episodic reasoning to overcome the occurrence of process of episodic reasoning for human action recognition, nonsensical errors. Two main difficulties are found in demonstrating this independent of whether it is carried out computationally or by a person. Essentially, there is a context in which a working hypothesis: on the one hand, to date, computer person, typically referred to as an actor, performs a set of vision systems are not yet capable of recognizing whichever human action performed in video sequences recorded from temporal actions, each of which is intended to a specicfi end. In this sense, a video-based human action recognition real scenarios [1]; and, on the other hand, collecting the system only requires a concrete set of entities to model the vast amount of common-sense knowledge held by humans is problem domain. eTh se are the actor who appears in the far from being a feasible task. Note that Cyc [39]has been scene, the context in which the scene is framed, the actions gathering common-sense knowledge for over 25 years and he/she performs, the beliefs and expectations the system it is still working on it. It is therefore necessary to make holds about what the actor is doing and what he/she is some simplifications to the original problem: human actions doing next, and finally the estimation in which all these that aretoberecognizedhavetobelimited to agiven set and human common-sense knowledge has to be reduced to beliefs are considered. es Th e concepts and their relationships, expressed in a semantic model, should suffice to formally an incomplete set. 
So, in this sense, the conclusions drawn from this incomplete set of common-sense knowledge can be directly extrapolated to the complete one. It can be tempting to think that hand-crafted representation of expert knowledge can, at some point, replace the role of common-sense knowledge. In fact, the following quotation, extracted from [39], discusses this issue.

"It is often difficult to make a convincing case for having a consensus reality knowledge base, because whenever one cites a particular piece of common sense that would be needed in a situation, it's easy to dismiss it and say 'well, we would have put that into our expert system as just one more (premise on a) rule.' For instance, in diagnosing a sick twenty-year-old coal miner, the program is told that he has been working in coal mines for 22 years (the typist accidentally hit two 2s instead of just one). Common sense tells us to question the idea of someone working in a coal mine since age −2. Yes, if this sort of error had been foreseen, the expert system could of course question it also. The argument is, however, that we could keep coming up with instance after instance where some additional piece of common-sense knowledge would be needed in order to avoid falling into an inhumanly silly mistake."

Obviously, a more careful representation of information could take into consideration that the number of years a person has been working in coal mines cannot be bigger than the age of that person. Using the same context that concerns us here, it could be stated that, in order to throw something overhead, the person has to previously pick up the object that is about to be thrown. However, the work presented here is more concerned with describing the knowledge that would allow the system to reach that same conclusion on its own, rather than with providing these matching condition rules. The counterpart is that the amount of information required to do so is huge.

For that reason, the approach followed by this work consists in minimizing the common-sense knowledge involved in the considered scenario by constraining the context in which actors perform. However, it is essential to highlight that these constraints should not be equated to the approach followed by expert systems.

The semantic model proposed in this work makes it possible to model the knowledge involved in video-based human action recognition, as empirically demonstrated in this paper. The semantic model also provides a set of syntactic rules with their associated meaning which allows describing the knowledge involved in any episode of human action recognition. This knowledge is, in practice, reduced to a set of propositional statements written in terms of instances of these concepts and their relationships.

Finally, the need for information standardization in distributed systems also supports the demand for a semantic model. When more than one system or module interoperates to perform an operation, there exists an information exchange that should be supported on some sort of agreement stating how such information can be correctly processed and understood.

However, despite the importance of counting on a semantic model for human action recognition, a complete review of the state of the art has brought to light that this aspect has been totally overlooked. The fact that most solutions focus on the proposal of new algorithms, methodologies, or signal processing approaches is probably the reason why the knowledge management aspect of the problem has not been exploited. On the contrary, this topic has been thoroughly studied by philosophers [40–42].

Among existing theories about actions, this work implements the theory proposed by Donald Davidson about actions, reasons, and causes [40]. According to the Davidsonian view of the nature of actions, every human action is rational because the explanation of that action involves a judgment of some sort. In other words, what this theory states is that every action is motivated by an intention, in the broad sense of the word. So, the link between the action and the reason that explains it is what Davidson refers to as the rationalization.

The most relevant conclusion of this theory is that the reason that motivates an action also rationalizes it. This fact has very relevant implications for this work because it supports not only the computational formal model for human actions proposed here but also the validation of the working hypothesis motivating this work.

The Scientific World Journal 5

[Figure 1 (diagram): the concepts Context, Actor, Action, Belief, Expectation, and Estimation, linked by the relationships "performs", "exists in", "is true in", "consists in", and "is a sequence of".]
Figure 1: A semantic model for video-based human action recognition.

Figure 1 depicts the set of concepts and relationships that comprise a semantic model for human action recognition. Apart from the concepts of action and actor, some other relevant entities require their semantics to be modeled. It is obvious that human action recognition cannot be conceived without considering the context in which actions are being performed.

The simplicity of this model lies in reducing the nature of human action to those concepts that cannot be avoided. This semantic model can be used to model the domain knowledge, independent of the environment in which actions are being considered.
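The concepts and relationships just enumerated can be made concrete as a small labeled graph. The following Python sketch is purely illustrative and not part of the published model; the exact wiring of the arrows is our reading of Figure 1, and all identifiers are ours.

```python
# Illustrative sketch (ours): the concepts of Figure 1 and the labeled
# relationships that link them, stored as a tiny (subject, relation, object) graph.
CONCEPTS = {"Context", "Actor", "Action", "Belief", "Expectation", "Estimation"}

RELATIONS = [
    ("Actor", "performs", "Action"),
    ("Actor", "exists in", "Context"),
    ("Belief", "is true in", "Context"),
    ("Belief", "consists in", "Action"),           # a sequence of actions
    ("Expectation", "is a sequence of", "Action"),
    ("Estimation", "is a sequence of", "Belief"),
]

def related(subject, relationship):
    """Concepts reachable from `subject` via the given relationship label."""
    return {o for s, r, o in RELATIONS if s == subject and r == relationship}
```

For instance, `related("Actor", "performs")` yields `{"Action"}`, mirroring the "performs" arrow of the figure.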
Moreover, this simplicity eases the process of translating the formal model into a concrete programming language implementation of the semantic model. The following definitions state the foundation of the proposed semantic model.

Definition 1. A Context is the set C composed of statements which, when used together, describe knowledge about the general world or a specific belief. There may be multiple contexts describing each of the different views or beliefs of the world. The meaning or truth value of a statement is a function of the context in which it is considered. Let us define the function meaning : S, C → M, where S is the set of statements describing the world, C is the set of possible contexts, and M is the set of possible meanings. meaning(s, c) returns the meaning or truth value of the statement s in the context c. This can be formally stated as

∃m ∈ M ∀ci ∈ C ∀si ∈ S : m = meaning(si, ci) ⟺ si ⊆ ci. (1)

The meaning or truth value of a given statement depends on the contexts in which it has been declared.

Definition 2. An Action is the set A of individual actions that have been described from the perspective of their relation to the primary reason that rationalizes them. The function AG : A → G, such that A is the set of possible actions and G is the set of possible actors, returns the actor performing the given action. Furthermore, the function PR : G, A → R, such that R is the set of possible reasons motivating a specific action, returns the primary reason for an actor performing an action in seeking specific results. Finally, the function PA : G → A returns the actions performed by an actor. Consider

∃g ∈ G ∀ai ∈ A : (AG(ai) = g ∧ PR(g, ai)) ⟺ PA(g) = ai. (2)

Therefore, every action is performed by an actor if and only if there exists an actor with a primary reason to perform that action.

Definition 3. An Actor is the set G of individual actors who perform actions. The function attitudes : G → T returns the set T of proattitudes held by an actor that support the reasons to perform certain actions. Moreover, the function PF : G, S → A, such that S is the subset of statements describing actions performed by actors, returns the action described by a given statement. The function ST : G, A → S returns a statement describing that a specific actor performs a specific action at that exact time instant. Consider

∃g ∈ G ∀ai ∈ A ∀si ∈ S : PF(g, si) = ai ⟺ si = ST(g, ai). (3)

Every action performed by an actor is described by means of a time-stamped statement. Consider

∃g ∈ G ∀ai ∈ A ∃r ∈ R : PR(g, ai) = r ⟺ r ∈ attitudes(g). (4)

The definition of actor therefore implies that, for every action performed by that actor and motivated by a specific primary reason, the set of proattitudes supporting the actor behavior includes that specific primary reason.

Definition 4. A Belief is the ordered set B of individual beliefs comprised of a temporal sequence of statements describing actions performed by actors. The function BF : B → S returns the sequence of action statements considered in a specific belief. Consider

∀ai ∈ A ∀gi ∈ G ∀si ∈ S : ST(gi, ai) = si ⟺ si ⊆ BF(bi). (5)

Every statement describing the fact that an action has been performed by an actor is part of a belief.

As has been already mentioned, the set B is an ordered set of individual beliefs. The order is a direct consequence of the belief grade associated with each individual belief. The more a specific belief is considered to be the real sequence of actions taking place, the higher order it has in the ordered set.
The belief located at the top of the ordered sequence of beliefs is referred to as the main belief. Consider

∃mb ∈ B : ∀bi ∈ B | mb > bi. (6)

Finally, beliefs are not considered in isolation but as part of a more general entity called estimation. The function GE : E → B returns the ordered sequence of beliefs that comprise a specific estimation of a video-based analysis of human action recognition. Consider

∀bi ∈ B ∃e ∈ E : bi ⊆ GE(e). (7)

Definition 5. An Expectation is the set X of individual expectations; each of them contains an ordered sequence of actions that is normally referred to as an activity. The function EX : X → A returns the ordered set of actions composing a specific expectation. Consider

∃x ∈ X ∃a1, a2, ..., an ∈ A : n = |x|, EX(x) = (a1, a2, ..., an),
∃a ∈ A ∃x ∈ X : a ⊆ x ⟺ a ⊆ EX(x). (8)

The function RA : X, A → A returns the remaining ordered set of actions that follow up a specific ordered set:

∃x ∈ X ∃a1, a2, ..., am, ..., an ∈ A ∃n, m ∈ ℕ | m < n : RA(a1, a2, ..., am) = (am+1, ..., an) ⟺ (a1, a2, ..., am) ⊆ EX(x). (9)

Definition 6. An Estimation is the set E of individual estimations for each human action recognition process performed during a video sequence. An estimation consists in an ordered set of beliefs, in which the main belief is the one output by the recognition process. The function GE : E → B returns the ordered set of beliefs that compose that estimation. Additionally, the function MB : E → B returns the main belief of a specific estimation. Consider

∃e ∈ E ∃b1, b2, ..., bn ∈ B : MB(e) = b1 ⟺ GE(e) = (b1, b2, ..., bn). (10)

[Figure 2 (diagram): an action video is processed by an SVM classifier into a ranked list of sequences of actions, which knowledge modeling turns into beliefs and expectations that feed story understanding.]
Figure 2: Stages involved in the proposed solution for human action recognition.

5. System Implementation

The ultimate goal of this work is to demonstrate that combining video recognition tools with episodic reasoning is the most compelling approach for human action recognition. The motivation is therefore to support the recognition process not only in video feature analysis but also in the knowledge about human behavior and how the world works. In this endeavor, several stages can be identified in the proposed solution, as depicted in Figure 2. The first step consists in an initial classification of actions based on visual body posture analysis. This initial classification is then provided, as input, to the knowledge-based system in charge of rationalizing the recognized actions. However, rather than proposing just one action, the computer vision system returns a list of actions whose order depends on their associated probabilities. The first action in the ordered set is the most probable one, although this does not necessarily mean that it is the correct one. For that reason, it is more sensible to consider the set of most probable actions rather than taking for granted that the most probable action, the first in the ranked list, is the correct one. This approach exploits the fact that, even when the first action is not the correct one, in most cases the groundtruth action is present in the list of the five most probable actions. Hopefully, if actors are really behaving in a rational manner, that is, performing actions motivated by reasons, and the groundtruth action is present in that list, then we expect the reasoning system to be able to identify the correct or groundtruth action even when it has not been returned in first position. The third stage basically seeks the motivations that might be behind each of these actions. This information supports the reasoning system in deciding which action better complies with actions believed to have been previously performed in the same video sequence.

A prototype system, going through these three stages, has been built in order to turn the working hypothesis into a real implementation that could be run and empirically evaluated. This section describes the technological decisions, grouping them into three major areas, namely, computer vision analysis, knowledge management, and common-sense reasoning.
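As claimed above, the simplicity of the formal model eases its translation into a concrete programming language. As a hedged illustration (ours, not the authors' implementation), Definitions 4–6 map almost one-to-one onto plain data structures, with `remaining()` playing the role of RA from Definition 5:

```python
from dataclasses import dataclass, field
from typing import List

# Illustrative transcription (ours) of Definitions 4-6: a belief orders
# time-stamped action statements, an estimation orders beliefs, and an
# expectation orders the actions that compose an activity.

@dataclass
class Belief:
    statements: List[str] = field(default_factory=list)   # BF(b), Definition 4

@dataclass
class Estimation:
    beliefs: List[Belief] = field(default_factory=list)   # GE(e), Definition 6

    def main_belief(self) -> Belief:
        """MB(e): the highest-ranked belief heads the ordered set."""
        return self.beliefs[0]

@dataclass
class Expectation:
    actions: List[str] = field(default_factory=list)      # EX(x), Definition 5

    def remaining(self, performed: List[str]) -> List[str]:
        """RA (Definition 5): actions still to come when `performed` is a prefix of EX(x)."""
        m = len(performed)
        return self.actions[m:] if self.actions[:m] == performed else []
```

For example, for the expectation (walk towards, pick up, turn around, sit down, get up), `remaining(["walk towards", "pick up"])` returns the tail (turn around, sit down, get up).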
5.1. Computer Vision Module for Human Action Recognition. The first stage of our system consists in generating initial action estimations by applying machine learning. Then, these estimates are passed to the knowledge-based system for further reasoning. Given a video sequence, the computer vision system, trained to recognize a given set of actions, returns an ordered sequence of actions which best describes the video according to the computer vision capability. Although each of those actions has been assessed by the system as the most probable, alternative actions may still be likely. As a consequence, in addition to the ordered sequence of actions, alternative actions with high probabilities are also provided to the knowledge-based system.

Among the different machine learning techniques that can be applied to perform human action recognition [43, 44], the Bag of Words (BoW) framework [45, 46] is particularly suitable. BoW has been proved [47–49] to be one of the most accurate methods for action recognition, able to perform on a large variety of different scenarios with a low computational cost. Contrary to other classification techniques, it does not require any additional segmentation algorithm, which significantly simplifies the computer vision task and makes it possible to work directly on video data. Consequently, the BoW methodology was chosen as the base of our computer vision module for action recognition.

Similar to most machine learning techniques, BoW relies on a training phase to learn the discriminative features and the classifiers that allow a correct recognition. Therefore, our BoW training stage consists of, firstly, producing a codebook of feature descriptors, secondly, generating a descriptor for each action video available in the training set, and, finally, training a classifier with those video descriptors. The pipeline starts by extracting salient feature points in each labeled video belonging to the training set. To ensure discriminative features, a well-known detector, Harris3D [50, 51], is applied. Once feature points are extracted from all training videos, a clustering algorithm [52] is used to group and quantize the salient point descriptors and to generate a codebook, or dictionary, which provides the vocabulary in which data will be described. Finally, each video of the training set is described in terms of the new word descriptors and used as input to train a cascade of linear Support Vector Machine (SVM) classifiers. In this way, the SVM classifiers, one per action, learn the optimal hyperplane that best separates the different actions.

During the action classification phase, actions performed in the video of interest are recognized by applying a similar procedure. Salient feature points are first detected using the same Harris3D algorithm. Then, the features are quantized using the learned codebook in order to generate a video descriptor. As a final step, the descriptor is fed into each SVM classifier, which allows quantifying the similarity between the new sequence and each trained action type. As a result, an ordered list of action labels is generated according to their fit.
5.2. Knowledge and Semantic Model. The capability of reasoning about knowledge has become an essential feature of any system intended to behave intelligently. However, some important questions arise in relation to that knowledge: What does the system need to know in order to understand the ongoing situation? How sure can the system be about its interpretation? Whenever a conflict arises between the computer vision estimation and the knowledge-based one, which one should be considered more reliable?

These and similar questions are formally and theoretically addressed from the knowledge model perspective. The implementation of that model is, however, not a trivial issue, and several concerns need to be considered first. Selection of the most appropriate implementation technology to comply with the model requirements is one of these issues, as well as sharing the model among all modules involved in the proposed distributed architecture. This last requirement therefore imposes the constraint of being compatible with the rest of the architectural module technologies.

Regarding the first issue, ontologies, especially those written in the OWL language [53], are one of the most extended approaches to implement knowledge models. However, there are several reasons arguing against their suitability for the purpose that concerns us here. Firstly, the computer vision system returns an ordered list of actions for each single action performed in the video sequence. Although only one of those actions is selected to be part of the main belief, it is necessary to keep a record of all discarded actions just in case later hints suggest that a previously selected action was not correct, in which case the estimation needs to be revised to propose a different one.

The need to keep track of uncertain actions implies that a priori inconsistent knowledge should be asserted to the knowledge base. Inconsistency issues arise when propositional statements describe the actor performing different actions at the same time instant. These same-time-instant actions correspond to each of the actions returned by the computer vision module. For example, if two of the actions of the set are sitting down and getting up from a chair, two propositional statements stating these facts should be asserted to the knowledge base. Obviously, this situation would lead to an inconsistency, since both actions cannot be performed at the same time.

Philosophers [41, 42] have suggested a theory to tackle the problem of how to deal with inconsistent knowledge. This theory has been extrapolated to computing and, according to Hobbs and Moore [54], instead of talking about the propositions that are true in a given context—or belief, using the terminology proposed here—one should rather talk about what states of affairs are compatible with what is already known. These states of affairs are referred to by philosophers as possible worlds [55]. The possible worlds theory basically consists in creating different worlds—once again we can talk about beliefs—each of which comprises the propositional knowledge verified to be consistent.

This leads to isolating inconsistent facts in different knowledge islands, referred to here as beliefs. Consistency issues can therefore be avoided by considering true only the knowledge described under the active belief. In this sense, each of the actions returned by the computer vision module, instead of being asserted to the general knowledge base, is individually asserted to a different belief. This approach assures that the general knowledge base is consistent, as well as each of the different beliefs created in each estimation process.

Implementing the possible world theory to describe the propositional knowledge comprised in each belief has several advantages: (a) standard automatic deduction and inference techniques can be applied; (b) it assures knowledge-base consistency; and (c), more importantly, uncertain information does not need to be discarded.

Unfortunately, ontologies do not yet enable the representation of possible worlds due to the impossibility of deactivating some parts of the ontology while keeping the rest active. This mechanism is supported by neither ontologies nor the existing approaches to manage them, such as Protégé [56]. On the contrary, Scone represents an excellent option to deal with possible worlds by means of its multiple-context mechanism [4, 57]. Every world or belief can be described in a particular context, and only one context at a time is active. Only the knowledge described in the active context is considered, therefore avoiding inconsistency issues among statements asserted to different contexts.

Not being able to deal with a priori inconsistent knowledge is not the only reason why ontologies cannot be used for the proposed architecture. In addition, although several frameworks claim to support ontology reasoning [58–60], they are actually only performing consistency checking operations. In this regard, Scone provides powerful mechanisms to support real reasoning tasks. The marker-passing mechanism [36] that it implements provides an extremely efficient way of performing inference, deduction, or reasoning-by-default operations.

In summary, the use of ontologies is unsuitable for the purposes described here, whereas Scone, through the multiple-context and marker-passing mechanisms, represents an excellent option for fulfilling the requirements of the proposed semantic model and the knowledge management demands.

Once the election of the Scone knowledge-based system has been justified, the next matter to be tackled is the implementation of the proposed knowledge model using the Scone language syntax. Figure 3 depicts this implementation using the Scone terminology.

[Figure 3 (diagram): a Scone network in which concepts such as Thing, Event, Person, Actor, Action, Believe, and Expectation exist in the General Context, with individuals (e.g., "reading a book at t1") connected by "is a", "has action", and "exists in context" links.]
Figure 3: Knowledge modeled using Scone.

The Scone language is a dialect of Lisp, in which new concepts and relationships can be easily created to represent all the elements of the semantic model. Sconecode 1 shows how the actor concept is created as a specialization of the person concept, therefore inheriting all the properties and relationships of a person. Still, the actor concept is a high-level entity. This entity can be made concrete by means of the individual abstraction. Whenever an individual of a certain type is declared, an instance of that type is created.
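To make the isolation idea concrete, the following toy Python analogue (ours; it omits Scone's inheritance and marker-passing machinery [36]) keeps mutually inconsistent action statements in separate belief contexts that all share a consistent general context, and answers queries only from the active one.

```python
# Simplified analogue (ours) of Scone's multiple-context mechanism: each
# belief is a private layer of statements over the shared general context.
class BeliefStore:
    def __init__(self):
        self.general = set()   # always-inherited general (common-sense) knowledge
        self.beliefs = {}      # belief name -> statements private to that belief
        self.active = None     # only one belief context is active at a time

    def assert_general(self, statement):
        self.general.add(statement)

    def assert_in(self, belief, statement):
        self.beliefs.setdefault(belief, set()).add(statement)

    def activate(self, belief):
        self.active = belief

    def holds(self, statement):
        """True only if visible from the active belief or the general context."""
        return statement in (self.general | self.beliefs.get(self.active, set()))
```

With "sits down at t1" asserted to one belief and "gets up at t1" to another, each statement holds only while its belief is active, so the store never contains both at once—the behavior that motivates choosing Scone over a single monolithic knowledge base.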
(new-type {actor} {person})
(new-indv {actor1} {actor})
(new-type {believe} {compound event})
(new-type {expectation} {thing})

;; An expectation is composed of an ordered sequence
;; of actions
(new-type-role {has expectation} {expectation} {event})

;; Here is an example of how an expectation is defined
(new-indv {picking up a book for reading it} {expectation})

;; Object properties in Scone are referred to as roles
(the-x-of-y-is-z {has expectation} {picking up a book for reading it} {walk towards})
(the-x-of-y-is-z {has expectation} {picking up a book for reading it} {pick up})
(the-x-of-y-is-z {has expectation} {picking up a book for reading it} {turn around})
(the-x-of-y-is-z {has expectation} {picking up a book for reading it} {sit down})
(the-x-of-y-is-z {has expectation} {picking up a book for reading it} {get up})

Sconecode 1

(new-indv {test room} {room})
(new-indv {test room doorway} {doorway})
(the-x-of-y-is-z {doorway} {test room} {test room doorway})
(new-indv {test room floor} {floor})
(the-x-of-y-is-z {floor} {test room} {test room floor})

(new-type {chair} {thing})
(new-type-role {chair leg} {chair} {thing})
(new-type-role {chair sitting surface} {chair} {surface} :n 1)
(new-indv {test room chair} {chair})

Sconecode 2

Finally, this module does not only consider the semantic model knowledge or the knowledge describing how the world works, also known as common-sense knowledge, but it also counts on domain specific knowledge. Domain specific knowledge can also be referred to as context knowledge. However, for simplicity purposes, we will refer to it as domain specific knowledge (DSK) to avoid confusion with the word context, which was previously used to describe the mechanism implemented by Scone to support the possible world theory. DSK consists in the propositional knowledge that describes the environment in which actions are being performed. This information turns out to be essential for meaning disambiguation purposes. DSK is also described using the Scone language and asserted to the Scone knowledge-based system in the general context, inherited by every belief context.

Sconecode 2 shows some of the propositional statements describing the DSK of the particular scenario. This code sample shows how basic information about the environment is described under the proposed framework. It represents the description of a test room, in which there is an entrance or doorway, as an example of domain specific knowledge (DSK). The common-sense knowledge—also referred to as world knowledge or WK—already holds propositional statements stating that entering a room is an action that consists in crossing through a doorway to enter an enclosed space of a building. In addition, there is also a chair in the room—an example of DSK—which provides a sitting surface—another example of DSK. In the same way, the other elements present in the test room are described following similar rules.
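The three Scone constructs used in the listings map onto a minimal frame store. The sketch below is our rough, heavily simplified analogue—`new_type`, `new_indv`, and `set_role` stand in for new-type, new-indv, and the-x-of-y-is-z, while real Scone additionally provides inheritance of role fillers, defaults, and exception handling.

```python
# Rough Python analogue (ours) of the Scone constructs used in Sconecodes 1-2.
class MiniKB:
    def __init__(self):
        self.parents = {}   # node -> supertype, as created by new-type / new-indv
        self.roles = {}     # (owner, role) -> filler, as in the-x-of-y-is-z

    def new_type(self, name, parent):
        self.parents[name] = parent

    def new_indv(self, name, parent):
        # An individual is an instance node hanging off its type.
        self.parents[name] = parent

    def set_role(self, role, owner, filler):
        self.roles[(owner, role)] = filler

    def is_a(self, name, ancestor):
        """Walk the specialization chain, as Scone's is-a links would."""
        while name is not None:
            if name == ancestor:
                return True
            name = self.parents.get(name)
        return False
```

For example, after `new_type("chair", "thing")` and `new_indv("test room chair", "chair")`, the query `is_a("test room chair", "thing")` succeeds, mirroring the inheritance that the Scone listings rely on.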
5.3. Common-Sense Reasoning Module. This section describes the functioning of the world knowledge and the way people's behavior can be heuristically used to recognize human actions. More specifically, this section intends to replicate the foundations of humans' ability to recognize actions when, for example, viewing angles or video quality are quite poor and visual information is scarce or insufficient to support the recognition task.

As stated in Section 5.2, the proposed framework for knowledge modeling is based on the possible world theory. This theory is the enabling key for modeling cognitive mental models, such as beliefs, through the use of the world abstraction. Section 5.2 also states that the possible world theory is implemented in Scone by means of the multiple-context mechanism. This subsection is now concerned with how to implement the reasoning routines exploiting the semantics implicit in the possible world theory.

[Figure 4 (diagram): the ranked action list is analyzed to extract the point of interest (PoI); the actions that sum up 50% of the probability are reordered; action consistency is checked (e.g., you need to pick up something before throwing it up); expectations are selected and the active list updated (e.g., a person picks up a book and sits down in order to read the book); and the most appropriate belief is selected to assert the action.]
Figure 4: High level stages involved in the reasoning process.

Figure 4 depicts a high-level description of the proposed reasoning routine. According to the entities involved in the different routine stages, three different levels can be identified. The first level deals with the action set returned by the computer vision system. Every human action can be characterized by the body part that mainly intervenes in accomplishing that action. For example, the punching action mainly involves fists, as part of arms, whereas the kicking one involves legs. The action set is therefore analyzed in order to determine the prevailing body parts. The body part, or so-called point of interest (PoI), that most frequently appears in the action set is used to reorder the same action set so that the first actions are those involving the PoI, delegating the others to the bottom of the list. Given that kicking, punching, waving, and scratching head are examples of an action set, the arm is the body part that appears most often. This means that the first action, or the most probable action, is not kicking as it was originally estimated, but the next most probable one involving the arm, which is, in this case, the punching action. Then, the reordered list of actions is checked for inconsistency issues. Consistency checking consists in determining whether the action requirements are fulfilled. Those actions whose requirements are not fulfilled are discarded.

The second level considers the case where an action is part of a composite or activity, here referred to as an expectation. When an expectation is active, it can be assumed that the actor is engaged in accomplishing that sequential set of actions. For that reason, given the actions already performed by the actor, it is possible to know the actions that are expected next.

It might be possible that more than one expectation is active at a time. In that case, the system should keep track of them, which will lead to not forcing any belief about the actions to come. Alternatively, if the active expectation is unique, the next action to come is sought in the ordered action set and afterward asserted to the main belief. If the expected action was not in the list, the action will be forced into the main belief. Remaining actions are asserted to the following beliefs.

Figure 5 depicts the activity diagram of the previously described process, whereas Figure 6 shows the class diagram for the implementation routine.

[Figure 5 (diagram): activity flow that parses the file of classified actions, creates the main belief and asserts "walking towards" for the first line, discards inconsistent actions, reorders the rest based on the PoI, marks and matches expectations across beliefs, identifies the expected action, and asserts each action to the most appropriate belief.]
Figure 5: Activity diagram for reasoning about human actions.

[Figure 6 (diagram): classes CSAR, Estimation, Belief, Expectation, Action, and ClassifiedResultFile, with the attributes and operations used to parse classified result files, manage beliefs and expectations, filter inconsistent actions, reorder actions based on the PoI, and assert actions to the corresponding beliefs.]
Figure 6: Class diagram for the common-sense reasoning module.

Finally, going one step further in the level of detail used to describe the reasoning routine, Algorithm 1 shows its implementation using pseudocode. Whereas the focus of this work is on describing the proposed knowledge framework, the work in [5] provides a thorough description of the algorithmic aspects.

This condensed version of the proposed reasoning mechanism essentially analyzes each of the five actions provided by the computer vision system. These actions are evaluated in the context in which they are being considered, referred to as DSK, and in the context of general knowledge, referred to as WK. If actions are consistent with this information, they are studied in order to determine whether any of them is part of an activity.
If the analysis of the current and past The main axiomatic fact supporting this working hypoth- actionss brings into light that there is a unique activity taking esis is directly drawn from the nature of human agency or place, then theactivityissaidtobeactive. Active activities— humanbehavior. Basedonthe Davidsonianviewofhuman also referred to here as expectations—drive the system in its agency [40], actions are always motivated by a primary decision of which actions to assert in each of the parallel reason. Supported in this philosophical premise, the reason considered beliefs. that motivates actions also rationalizes them. This section aims at demonstrating the correctness of the working hypothesis. Since the proposed solution is tackled 6. Experimental Validation from taking advantage of different perspectives, that is, the computer vision and human cognition, each of them has to Following descriptions of both the theoretical and imple- be validated. mentational aspects of the working hypothesis, this section In this sense, the followed approach consists in comparing is devoted to discussing validation issues. It is important to recall that the working hypothesis motivating this work the accuracy of a common-sense based computer vision The Scientific World Journal 13 If the reason that motivates an action also rationalizes (1) Actions = 𝑎 ,𝑎 ,𝑎 ,𝑎 ,...,𝑎 1 2 3 4 𝑁 it, and, consequently, if motivations could be heuristically (2) for 𝑖=0 to 𝑁 do driven and restricted, actions would also be limited to those (3) if 𝑖==0 then matching the available motivations. In other words, if we (4) create(main belief) want a person to perform a certain action all what has to be (5) assert(“walk”, main belief) done is to motivate that person to do so. 
If that motivation is (6) end if subtle enough, this implication can be used to demonstrate (7) constraints = WK(𝑎 ) that common-sense capabilities enhance the performance of (8) if DSK(constraints) == true then a computer vision system. Recall that it is necessary to limit (9) mark expectations(active belief) the set of actions performed in a scene because, on the one (10) mark expectations(𝑎 ) hand, the available common-sense knowledge-based system (11) if isExpectationUnique(expectation) then (12) assert(𝑎 ,active belief) is incomplete and, on the other hand, the computer vision (13) end if system is only capable of recognizing a small set of actions. It (14) activate(𝑏 +1) 𝑖 has to be highlighted that actors should not be instructed to (15) else perform specific actions, because, by doing so, the rationality (16) discard(𝑎 ) explaining the action would have been contaminated. On the (17) end if contrary, by creating the appropriate atmosphere to motivate (18) end for certain actions, we are creating a real scenario, in which actors actdrivenbyintentions, whileatthe same time assuring that Algorithm 1: Perform estimation (actions). the scenario remains in the boundaries of the set of actions and knowledge known by the system. eTh limitednumberofactions that canberecognized by computer vision systems justifies the need for a set-up system with one without reasoning capabilities. Moreover, scenario. er Th e, actors are surrounded by suitable elements thehuman cognitiveperspective is validatedbycomparing that encourage them to perform a predefined and expected the system with the most characteristic example of cognitive set of actions such as punch or kick a punching-ball or read subjects: people. This was achieved by asking people to a book. The negligible probability of an actor performing perform the same recognition tasks performed by the system those activities without the presence of interactive objects for and under the same circumstances. 
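The estimation loop of Algorithm 1 can be sketched in Python. This is a minimal sketch, not the paper's implementation: the WK and DSK knowledge sources, which the actual system queries from the Scone knowledge base, are replaced here by hypothetical toy dictionaries, and the parallel beliefs and expectation activation (activate(b_{i+1})) are collapsed into a single belief list.

```python
# Minimal sketch of Algorithm 1 ("perform estimation").
# WK and DSK below are hypothetical stand-ins for the world knowledge and
# domain-specific knowledge used by the reasoning module.

# World knowledge: the expectations (activities) that may explain an action.
WK = {
    "walk":  {"enter room", "exercise", "read"},
    "punch": {"exercise"},
    "kick":  {"exercise"},
    "sit":   {"read"},
}

# Domain-specific knowledge: expectations plausible in the monitored
# scenario (a waiting room with a punching ball, a chair, and a book).
DSK = {"enter room", "exercise", "read"}


def perform_estimation(actions):
    """Filter classified actions with common-sense checks.

    Returns the actions asserted into the main belief: those consistent
    with the domain and explained by a unique marked expectation.
    """
    belief = []
    for i, action in enumerate(actions):
        if i == 0:
            belief.append("walk")            # (4)-(5): seed the main belief
        constraints = WK.get(action, set())  # (7): expectations explaining it
        marked = constraints & DSK           # (8)-(10): domain consistency
        if marked:
            if len(marked) == 1:             # (11): unique active expectation
                belief.append(action)        # (12): assert into the belief
        # (16): actions with no consistent expectation are discarded
    return belief
```

For instance, with the toy knowledge above, perform_estimation(["walk", "punch", "kick", "fly"]) keeps the walking and fighting actions and discards the unexplained "fly", mirroring how the reasoning module removes common-senseless classifications.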
6.1. Accuracy Assessment. Traditionally, the best way of assessing the performance of a computer vision system for human action recognition is by training and testing the proposed system with one of the publicly available datasets. These open and public datasets are therefore the most suitable benchmarks for evaluating and comparing proposed solutions with existing approaches.

Despite the relevant number of existing datasets, none of them fulfills the initial requirements of this work, which include encapsulating the complexity of real-life applications with a significant number of complex activities. The lack of rationality with which the actions in these datasets are performed makes them unsuitable for the purposes of this work. Indeed, the proposed solution is based on the premise that actions have to be performed for a reason in order to be rational. Unfortunately, existing datasets consist of video sequences in which actors are told what to do in a contextless scenario. Actions have to be part of a comprehensive story, so that performed actions make sense with regard to the aims or reasons that motivate actors to behave as they do.

Two main premises supported the creation of the new dataset: first, rule-based strategies are to be avoided; and, second, directions given to actors are kept to a minimum. These two premises can be satisfied by creating the appropriate atmosphere that makes actors prone to perform certain actions while allowing them, at the same time, to behave in a rational manner.

The proposed scenario consists of a waiting room in which several objects have been strategically placed. Objects such as a punching ball, a chair, or a book motivate people's behavior, for example, to play with the punching ball, which should lead them to perform the kicking and punching actions, or to sit down on the chair.

Eleven sequences were recorded in which actors were simply told to remain in the room for a period of time and to feel free to enjoy the facilities present in the room. These eleven sequences were manually ground-truthed and segmented into actions. Afterward, these actions were fed to both the basic computer vision system and the enhanced version in which common-sense capabilities had been leveraged.

In order to train a computer vision system capable of successfully detecting and segmenting the actions happening in this testing scenario, a suitable dataset must be chosen. This training dataset must not only comprise activities similar to the ones being promoted in our testing contextualized dataset but also fulfill a set of requirements. Thus, the training set must cover a variety of camera views, so that recognition is view-independent, and it should include a sufficiently large number of instances of the actions of interest. These instances must be not only annotated but also perfectly segmented and organized to simplify training. The only suitable set which fulfills these requirements and covers most of the activities promoted in our testing environment is IXMAS [43]. IXMAS is focused on standard indoor actions, which allows providing quite an exhaustive description of possible actions in our limited scenario. Since it is comprised of 12 actions, performed by 12 different actors and recorded simultaneously by 5 different cameras, it provides view independence and should offer sufficient examples to train a discriminative action classifier.

Table 1 shows the average of the accuracy rates obtained for the eleven video sequences. A closer look at the accuracy rates obtained by actor shows that, in the best-case scenarios, accuracy rates reach 75% positive recognition for the enhanced system.

Table 1: Average accuracy rates obtained by the basic and the common-sense enhanced systems.
  Computer vision system                       Accuracy rate
  Basic computer vision system                 29.4%
  Common-sense based computer vision system    51.9%

Table 2 presents the accuracy rates obtained by both systems, the basic and the common-sense enhanced one, for each individual actor. The columns labeled 1 to 11 represent each of the 11 individual actors, each of whom has been recorded in a video sequence. As can be seen in that table, even when using the same recognition approach—basic or common-sense enhanced—the accuracy rate experiences dramatic variations. Several reasons explain these values, mainly based on the rationality with which actions were being performed by each actor. However, since these aspects belong to the human cognition side, they will be detailed and analyzed in the next subsection.

Note that the results shown in Tables 1 and 2 were initially presented in [5].

Table 2: Accuracy rates (%) for each individual actor.
  CVS    1     2     3     4     5     6     7     8     9     10    11    Avg.
  Basic  35.5  16.0  30.0  58.3  44.4  22.2  40.0  15.4  40.0  16.7  33.3  29.4
  CS     64.5  52.0  50.0  75.0  55.6  66.7  40.0  30.8  60.0  25.0  33.3  51.9

6.2. Commonsensicality Assessment. From the cognitive perspective, this system claims to hold common-sense knowledge and reasoning capabilities complementing the computer vision system in the task of recognizing rationally performed human actions. Assessing this claim is not a trivial matter, mainly because common-sense knowledge is neither unique nor common to everybody. On the contrary, common sense varies from one person to another due to criteria such as age, gender, education, or culture.

Measuring how commonsensical a person or a system is resembles the problem of measuring human intelligence. Traditionally, intelligence quotients have been used to determine and compare intelligence levels among humans. These quotients are obtained from the performance of subjects in intelligence tests. The same approach is therefore followed here to measure the commonsensical level of the system in comparison to humans. Rather than resorting to complex and philosophical questionnaires about common-sense knowledge, the proposed approach consists in presenting humans with the same situations analyzed by the system and comparing their answers. In the aforementioned intelligence tests, intelligence is treated as though it were unique and common to every human being, with the disadvantages involved in this simplification. However, if results are interpreted within the boundaries of these simplifications, this type of test can be very useful. In other words, intelligence tests cannot be considered the silver bullet for determining how intelligent a person is, but, if they are correctly addressed, they can certainly bring to light very relevant information about certain aspects of human intelligence. This fact is highlighted here in order to make sure that results retrieved from the proposed questionnaires are not misused or misinterpreted and that they are considered within the boundaries in which they were conceived.

Obviously, if humans were provided with the video sequences, they would easily figure out the actions being performed. Moreover, the performance of the vision system has already been stated in the previous section. For both reasons, subjects are presented with the same information provided as input to the common-sense reasoning system: the set of the five most probable actions returned by the computer vision system. Based on that action set and the description of the scenario in which actions are being performed, humans have to determine the course of actions taking place. In order to allow a fair comparison, the people completing the questionnaire have also been provided with a full description of the environment and the actions actors can perform. The questionnaire has been elaborated to allow people to change previous estimations based on information from following actions, in the same way that the common-sense reasoning system interchanges beliefs whenever a lower-priority belief starts gaining credit due to the sequential activation of expectations.

Since humans, unlike machines, easily get fatigued when engaged in tedious and repetitive tasks, the questionnaire should be compiled so as to mitigate the impact of tiredness on the obtained results. The proposed approach consists in focusing on the two most extreme cases, that is, those in which the recognition accuracy rates obtained by the system are the highest and the lowest. Looking back at Table 2, in which the system performance is compared with a computer vision system, it can be noticed that actors 4 and 10 are, respectively, those for which the highest and lowest accuracy rates are achieved.

Results show how, despite some variations, the majority of the people tend to identify certain actions with the same estimation. This suggests that the mental process followed to reason about human actions is quite similar among people, independent of nationality, education, or age. However, there are always subjects in the groups who disagree. This probably means that they are following a different reasoning course.

Independent of the mental process followed by the questioned people, it has to be highlighted that, when compared with the system under test, they do not outperform the system in accuracy rate. In fact, people even degrade the performance of the computer vision system. This fact can therefore be used to demonstrate that the proposed system works at least as well as a representative sample of people. It also indicates that the common-sense approach used by the system better suits the characteristics of the problem than the mechanisms employed by the questioned people. Probably, people resort to more complex mechanisms, such as past experiences, that encourage them to ignore the recommendations of the computer vision system. It is also probable that those who have had previous experience with computer vision or intelligent systems better understand the mechanisms of these systems. Consequently, these people provide estimations more similar to the ones produced by the system. This is indeed one of the boundaries constraining the importance that should be given to the questionnaire results.

It is also worth mentioning that actor 4, the one behaving in a more rational manner, obtains a higher recognition rate than actor 10, the one behaving more erratically. Questionnaire results demonstrate that, as expected, rational behavior can be more easily recognized than erratic behavior. In this sense, accuracy rates obtained by the questioned people are always better for actor 4 than for actor 10. This is the most relevant conclusion drawn from the analysis of the questionnaire results, since it can be used to demonstrate one of the axiomatic facts motivating this work: common-sense capabilities improve recognition rates of rational behavior.

The following tables summarize the most relevant aspects of the undertaken test. Table 3 starts by summarizing the different subjects that participated in these tests. Thirty-seven people, from six different nationalities and various age groups, have performed the questionnaires.
Additionally, Table 4 summarizes the average accuracy obtained by the 37 questioned subjects. These values are compared with the ones obtained by the system proposed here. Finally, Table 5 shows the accuracy rate obtained in the recognition of each of the 12 actions composing the analyzed sequence.

Table 3: Participants' information (37 subjects in total).
  Gender: male 34, female 3
  Age: under 25: 9; 25-40: 11; over 40: 4; undisclosed: 13
  Education: undergraduate 9, postgraduate 15, undisclosed 13
  Nationality (6 in total): Spanish 16, other EU 5, Asian 2, Canadian 1, undisclosed 13

Table 4: Average accuracy rates obtained by the questioned people and by the system.
  System            Actor 4 (%)   Actor 10 (%)
  Questionnaires    43.01         25.67
  Reasoning system  75.0          25.0

Table 5: Accuracy (%) obtained in recognizing each action.
  Actor 4:  Walk 100, Punch 75.67, Point 16.21, Walk 5.40, Punch 54.05, Turn 0, Walk 2.70, Punch 64.86, Turn 10.81, Punch 56.75, Check 48.64, Walk 94.59
  Actor 10: Walk 97.29, Kick 0, Turn 2.70, Walk 18.91, Punch 48.64, Walk 2.70, Scratch 2.70, Walk 16.21, Sit 5.40, Get 5.40, Wave 16.21, Walk 91.89

7. Conclusions

This paper describes a system for video-based human action recognition enhanced with common-sense knowledge and reasoning capabilities. The main motivation of this work was to demonstrate that computational tasks involving some degree of human behavior understanding cannot be successfully addressed without considering some form of reasoning and contextual information. To demonstrate this axiomatic fact, a system has been built combining both strategies: computer vision and common sense.

The proposed system performs a primary recognition of actions based only on image analysis capabilities. This first stage calculates the five most probable actions according to the actors' body postures. These actions are provided as inputs to the common-sense reasoning system. In a second stage, the common-sense reasoning model performs reasoning tasks upon the actions suggested by the computer vision system. These operations are supported by a formal model of knowledge, also proposed and formalized here.

Essentially, three conceptual abstractions are proposed in this model in order to replicate in a computational system the mental process followed by humans. The notions of action, belief, and expectation articulate the reasoning mechanisms implemented according to the Davidsonian theory of actions. In order to validate this model, a new video dataset has been proposed here, in which actions are motivated by reasons. The environment in which those video sequences are recorded has been carefully designed to provide actors with the reasons to perform the actions known by the computer vision system. This contribution is validated by the construction of the prototype, therefore verifying that the proposed semantic model complies with the knowledge requirements arising in supervised contexts for human action recognition.

Two more aspects need to be validated: the performance of the system in terms of recognition rates and its commonsensicality. The first aspect has been evaluated by implementing a state-of-the-art approach for vision-based human action recognition. The second aspect is evaluated by asking people to recognize human actions based on the sole information provided by the five most probable actions. Results on both sides demonstrate that incorporating common-sense knowledge and reasoning capabilities dramatically improves recognition rates. Additionally, it can also be concluded from the questionnaire analysis that, in order for the common-sense reasoning system to show its great potential, the human actions being analyzed should be part of the rational behavior of the actor. Both the common-sense reasoning system and people have failed to successfully recognize actions performed by erratic actors.

Finally, it should be highlighted that this work tackles the problem of vision-based human action recognition from a comprehensive perspective. This entitles the proposed system to be deployed in any supervised environment in which human behavior understanding is required, as in Ambient Assisted Living.

Conflict of Interests

The authors declare that they have no conflict of interests.

Acknowledgment

This research was supported by the Spanish Ministry of Science and Innovation under the Project DREAMS (TEC2011-28666-C04-03).

References

[1] J. C. Nebel, M. Lewandowski, J. Thevenon, F. Martínez, and S. Velastin, "Are current monocular computer vision systems for human action recognition suitable for visual surveillance applications?" in Proceedings of the 7th International Conference on Advances in Visual Computing (ISVC '11), vol. 2, pp. 290-299, Springer, Berlin, Germany, 2011.
[2] L. Sigal, A. O. Balan, and M. J. Black, "HumanEva: synchronized video and motion capture dataset and baseline algorithm for evaluation of articulated human motion," International Journal of Computer Vision, vol. 87, no. 1-2, pp. 4-27, 2010.
[3] G. Rogez, J. J. Guerrero, J. Martínez, and C. O. Uruñuela, "Viewpoint independent human motion analysis in man-made environments," in Proceedings of the British Machine Vision Conference, M. J. Chantler, R. B. Fisher, and E. Trucco, Eds., pp. 659-668, British Machine Vision Association, Edinburgh, UK.
[4] S. Fahlman, "Using Scone's multiple-context mechanism to emulate human-like reasoning," in Proceedings of the AAAI Fall Symposium Series, 2011.
[5] J. Martínez del Rincón, M. J. Santofimia, and J.-C. Nebel, "Common-sense reasoning for human action recognition," Pattern Recognition Letters, vol. 34, no. 15, pp. 1849-1860, 2013.
[6] M. J. Santofimia, J. Martinez-del Rincon, and J. C. Nebel, "Common-sense knowledge for a computer vision system for human action recognition," in Proceedings of the 4th International Conference on Ambient Assisted Living and Home Care (IWAAL '12), pp. 159-166, Springer, Berlin, Germany, 2012.
[7] R. Vezzani, D. Baltieri, and R. Cucchiara, "HMM based action recognition with projection histogram features," in Proceedings of the 20th International Conference on Recognizing Patterns in Signals, Speech, Images, and Videos (ICPR '10), pp. 286-293, Springer, Berlin, Germany, 2010.
[8] F. Martínez-Contreras, C. Orrite-Uruñuela, E. Herrero-Jaraba, H. Ragheb, and S. A. Velastin, "Recognizing human actions using silhouette-based HMM," in Proceedings of the 6th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS '09), pp. 43-48, IEEE Computer Society, Washington, DC, USA, September 2009.
[9] I. Laptev, M. Marszałek, C. Schmid, and B. Rozenfeld, "Learning realistic human actions from movies," in Proceedings of the 26th IEEE Conference on Computer Vision and Pattern Recognition, pp. 1-8, Los Alamitos, Calif, USA, June 2008.
[10] J. Zhang and S. Gong, "Action categorization with modified hidden conditional random field," Pattern Recognition, vol. 43, no. 1, pp. 197-203, 2010.
[11] M. Ahmad and S.-W. Lee, "Human action recognition using shape and CLG-motion flow from multi-view image sequences," Pattern Recognition, vol. 41, no. 7, pp. 2237-2252, 2008.
[12] L. Wang and D. Suter, "Visual learning and recognition of sequential data manifolds with applications to human movement analysis," Computer Vision and Image Understanding, vol. 110, no. 2, pp. 153-172, 2008.
[13] C.-H. Fang, J.-C. Chen, C.-C. Tseng, and J.-J. James Lien, "Human action recognition using spatio-temporal classification," in Proceedings of the 9th Asian Conference on Computer Vision (ACCV '09), vol. 2, pp. 98-109, Springer, Berlin, Germany, 2010.
[14] M. Lewandowski, D. Makris, and J.-C. Nebel, "View and style-independent action manifolds for human activity recognition," in Proceedings of the 11th European Conference on Computer Vision (ECCV '10), pp. 547-560, Springer, Berlin, Germany, 2010.
[15] H. Kuehne, H. Jhuang, E. Garrote, T. Poggio, and T. Serre, "HMDB: a large video database for human motion recognition," in Proceedings of the IEEE International Conference on Computer Vision (ICCV '11), pp. 2556-2563, November 2011.
[16] C. L. Baker, R. Saxe, and J. B. Tenenbaum, "Action understanding as inverse planning," Cognition, vol. 113, no. 3, pp. 329-349, 2009.
[17] A. A. Salah, T. Gevers, N. Sebe, and A. Vinciarelli, "Challenges of human behavior understanding," in Proceedings of the 1st International Conference on Human Behavior Understanding (HBU '10), pp. 1-12, Springer, Berlin, Germany, 2010.
[18] M. Minsky, "The emotion machine: from pain to suffering," in Creativity & Cognition, pp. 7-13, 1999.
[19] T. V. Kasteren and B. Krose, "Bayesian activity recognition in residence for elders," in Proceedings of the 3rd IET International Conference on Intelligent Environments (IE '07), pp. 209-212, September 2007.
[20] K. Ducatel, M. Bogdanowicz, F. Scapolo, J. Leijten, and J. C. Burgelma, "Scenarios for ambient intelligence in 2010 (ISTAG 2001)," Tech. Rep., 2001.
[21] H. Storf, M. Becker, and M. Riedl, "Rule-based activity recognition framework: challenges, technique and learning," in Proceedings of the 3rd International Conference on Pervasive Computing Technologies for Healthcare (PCTHealth '09), pp. 1-7, April 2009.
[22] M.-R. Tazari, F. Furfari, J. P. Lazaro Ramos, and E. Ferro, "The PERSONA service platform for AAL spaces," in Handbook of Ambient Intelligence and Smart Environments, pp. 1171-1199.
[23] R. Lewis and J. M. Cantera Fonseca, "Delivery context ontology. World Wide Web Consortium, Working Draft WD-dcontology," Tech. Rep. 20080415, 2008.
[24] J. McCarthy and P. J. Hayes, Some Philosophical Problems from the Standpoint of Artificial Intelligence, Morgan Kaufmann, San Francisco, Calif, USA, 1987.
[25] P. Berka, "NEST: a compositional approach to rule-based and case-based reasoning," Advances in Artificial Intelligence, vol. 2011, Article ID 374250, 15 pages, 2011.
[26] S. Hossain, P. Valente, K. Hallenborg, and Y. Demazeau, "User modeling for activity recognition and support in ambient assisted living," in Proceedings of the 6th Iberian Conference on Information Systems and Technologies (CISTI '11), pp. 1-4, June 2011.
[27] T. Choudhury, M. Philipose, D. Wyatt, and J. Lester, "Towards activity databases: using sensors and statistical models to summarize people's lives," IEEE Data Engineering Bulletin, vol. 29, no. 1, pp. 49-58, 2006.
[28] M. Stikic and B. Schiele, "Activity recognition from sparsely labeled data using multi-instance learning," in Proceedings of the 4th International Symposium on Location and Context Awareness (LoCA '09), pp. 156-173, Springer, Berlin, Germany, 2009.
[29] D. Wyatt, M. Philipose, and T. Choudhury, "Unsupervised activity recognition using automatically mined common sense," in Proceedings of the 20th National Conference on Artificial Intelligence (AAAI '05), vol. 1, pp. 21-27, AAAI Press, July 2005.
[30] B. Krausz and C. Bauckhage, "Action recognition in videos using nonnegative tensor factorization," in Proceedings of the 20th International Conference on Pattern Recognition (ICPR '10), pp. 1763-1766, August 2010.
[31] R. Cilla, M. A. Patricio, A. Berlanga, and J. M. Molina, "Improving the accuracy of action classification using view-dependent context information," in Proceedings of the 6th International Conference on Hybrid Artificial Intelligent Systems (HAIS '11), vol. 2, pp. 136-143, Springer, Heidelberg, Germany, 2011.
[32] D. Bruckner, B. Sallans, and R. Lang, "Behavior learning via state chains from motion detector sensors," in Proceedings of the 2nd International Conference on Bio-Inspired Models of Network, Information, and Computing Systems (BIONETICS '07), pp. 176-183, December 2007.
[33] D. Bruckner, G. Q. Yin, and A. Faltinger, "Relieved commissioning and human behavior detection in Ambient Assisted Living Systems," Elektrotechnik und Informationstechnik, vol. 129, no. 4, pp. 293-298, 2012.
[34] M. J. Kochenderfer and R. Gupta, "Common sense data acquisition for indoor mobile robots," in Proceedings of the 19th National Conference on Artificial Intelligence (AAAI '04), pp. 605-610, AAAI Press/The MIT Press, July 2004.
[35] W. Pentney, A.-M. Popescu, S. Wang, H. Kautz, and M. Philipose, "Sensor-based understanding of daily life via large-scale use of common sense," in Proceedings of the 21st National Conference on Artificial Intelligence, vol. 1, pp. 906-912, AAAI Press, July 2006.
[36] S. E. Fahlman, "Marker-passing inference in the Scone knowledge-base system," in Proceedings of the 1st International Conference on Knowledge Science, Engineering and Management (KSEM '06), Lecture Notes in AI, Springer, 2006.
[37] J. McCarthy, "Programs with common sense," in Semantic Information Processing, vol. 1, pp. 403-418, 1968.
[38] D. Lenat, M. Prakash, and M. Shepherd, "CYC: using common sense knowledge to overcome brittleness and knowledge acquisition bottlenecks," Artificial Intelligence Magazine, vol. 6, no. 4, pp. 65-85, 1986.
[39] D. B. Lenat and R. V. Guha, Building Large Knowledge-Based Systems: Representation and Inference in the Cyc Project, Addison-Wesley, Boston, Mass, USA, 1989.
[40] D. Davidson, "Actions, reasons, and causes," Journal of Philosophy, vol. 60, pp. 685-700, 1963.
[41] J. Divers, Possible Worlds, Problems of Philosophy, Routledge.
[42] P. Bricker and D. Lewis, On the Plurality of Worlds, Central Works of Philosophy: The Twentieth Century: Quine and After.
[43] D. Weinland, E. Boyer, and R. Ronfard, "Action recognition from arbitrary views using 3D exemplars," in Proceedings of the 11th IEEE International Conference on Computer Vision (ICCV '07), pp. 1-7, Rio de Janeiro, Brazil, October 2007.
[44] P. Yan, S. M. Khan, and M. Shah, "Learning 4D action feature models for arbitrary view action recognition," in Proceedings of the 26th IEEE Conference on Computer Vision and Pattern Recognition (CVPR '08), IEEE Computer Society, Anchorage, Alaska, USA, June 2008.
[45] T. Joachims, "Text categorization with support vector machines: learning with many relevant features," in Proceedings of the 10th European Conference on Machine Learning (ECML '98), C. Nedellec and C. Rouveirol, Eds., pp. 137-142, Springer, Heidelberg, Germany, 1998.
[46] G. Csurka, C. R. Dance, L. Fan, J. Willamowski, and C. Bray, "Visual categorization with bags of keypoints," in Proceedings of the Workshop on Statistical Learning in Computer Vision, pp. 1-22, 2004.
[47] M. B. Kaâniche and F. Brémond, "Gesture recognition by learning local motion signatures," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '10), pp. 2745-2752, June 2010.
[48] J. Liu, S. Ali, and M. Shah, "Recognizing human actions using multiple features," in Proceedings of the 26th IEEE Conference on Computer Vision and Pattern Recognition (CVPR '08), Anchorage, Alaska, USA, June 2008.
[49] I. Laptev and P. Pérez, "Retrieving actions in movies," in Proceedings of the 11th IEEE International Conference on Computer Vision, pp. 1-8, October 2007.
[50] I. Laptev, "On space-time interest points," International Journal of Computer Vision, vol. 64, no. 2-3, pp. 107-123, 2005.
[51] I. Sipiran and B. Bustos, "Harris 3D: a robust extension of the Harris operator for interest point detection on 3D meshes," Visual Computer, vol. 27, no. 11, pp. 963-976, 2011.
[52] T. Kanungo, D. M. Mount, N. S. Netanyahu, C. D. Piatko, R. Silverman, and A. Y. Wu, "An efficient k-means clustering algorithm: analysis and implementation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 7, pp. 881-892, 2002.
[53] G. Antoniou and F. van Harmelen, "Web Ontology Language: OWL," in Handbook on Ontologies in Information Systems, pp. 67-92, Springer, 2003.
[54] J. R. Hobbs and R. C. Moore, Formal Theories of the Commonsense World, Ablex Series in Artificial Intelligence, Ablex, 1985.
[55] G. Primiero, Information and Knowledge: A Constructive Type-Theoretical Approach, vol. 10 of Logic, Epistemology, and the Unity of Science, Springer, 2008.
[56] H. Knublauch, M. A. Musen, and A. L. Rector, "Editing description logic ontologies with the Protégé OWL plugin," in Proceedings of the International Conference on Description Logics, 2004.
[57] M. J. Santofimia, S. E. Fahlman, F. Moya, and J. C. Lopez, "Possible-world and multiple-context semantics for common-sense action planning," in Proceedings of the Workshop on Space, Time and Ambient Intelligence (IJCAI '11), M. Bhatt, H. W. Guesgen, and J. C. Augusto, Eds., 2011.
[58] T. Eiter, G. Ianni, A. Polleres, R. Schindlauer, and H. Tompits, "Reasoning with rules and ontologies," in Reasoning Web 2006, pp. 93-127, Springer, 2006.
[59] X. H. Wang, D. Q. Zhang, T. Gu, and H. K. Pung, "Ontology based context modeling and reasoning using OWL," in Proceedings of the 2nd IEEE Annual Conference on Pervasive Computing and Communications Workshops (PerCom '04), pp. 18-22, IEEE Computer Society, Washington, DC, USA, March 2004.
[60] S. Heymans, L. Ma, D. Anicic et al., "Ontology reasoning with large data repositories," in Ontology Management, pp. 89-128.
The Scientific World Journal – PubMed Central
Published: May 14, 2014