Paper Group ANR 412
Segmentation and Classification of Skin Lesions for Disease Diagnosis
Title | Segmentation and Classification of Skin Lesions for Disease Diagnosis |
Authors | Sumithra R, Mahamad Suhil, D. S. Guru |
Abstract | In this paper, a novel approach for automatic segmentation and classification of skin lesions is proposed. Initially, skin images are filtered to remove unwanted hairs and noise, and then the segmentation process is carried out to extract lesion areas. For segmentation, a region growing method is applied by automatic initialization of seed points. The segmentation performance is measured with different well-known measures and the results are appreciable. Subsequently, the extracted lesion areas are represented by color and texture features. SVM and k-NN classifiers are used, along with their fusion, for classification using the extracted features. The performance of the system is tested on our own dataset of 726 samples from 141 images consisting of 5 different classes of diseases. The results are very promising, with F-measures of 46.71% and 34% using the SVM and k-NN classifiers respectively, and 61% for the fusion of SVM and k-NN. |
Tasks | |
Published | 2016-09-12 |
URL | http://arxiv.org/abs/1609.03277v1 |
PDF | http://arxiv.org/pdf/1609.03277v1.pdf |
PWC | https://paperswithcode.com/paper/segmentation-and-classification-of-skin |
Repo | |
Framework | |
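The region-growing segmentation described in the abstract above can be sketched minimally. This is an illustrative seeded region grower (4-connectivity, a made-up intensity tolerance, and the automatic seed initialization omitted), not the authors' implementation:

```python
from collections import deque

def region_grow(image, seed, tol=20):
    """Grow a region from `seed`, absorbing 4-connected pixels whose
    intensity differs from the seed pixel's by at most `tol`."""
    h, w = len(image), len(image[0])
    base = image[seed[0]][seed[1]]
    region, queue = {seed}, deque([seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if (0 <= nr < h and 0 <= nc < w and (nr, nc) not in region
                    and abs(image[nr][nc] - base) <= tol):
                region.add((nr, nc))
                queue.append((nr, nc))
    return region

# A dark 2x2 lesion patch inside a bright background:
img = [[200, 200, 200, 200],
       [200,  40,  45, 200],
       [200,  42,  38, 200],
       [200, 200, 200, 200]]
lesion = region_grow(img, seed=(1, 1))
```

In the toy image, growth from the dark seed absorbs only the 2x2 lesion patch and stops at the bright background.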
Social Behavior Prediction from First Person Videos
Title | Social Behavior Prediction from First Person Videos |
Authors | Shan Su, Jung Pyo Hong, Jianbo Shi, Hyun Soo Park |
Abstract | This paper presents a method to predict the future movements (location and gaze direction) of basketball players as a whole from their first person videos. The predicted behaviors reflect an individual physical space that affords to take the next actions while conforming to social behaviors by engaging in joint attention. Our key innovation is to use the 3D reconstruction of multiple first person cameras to automatically annotate, in each other’s views, the visual semantics of social configurations. We leverage two learning signals uniquely embedded in first person videos. Individually, a first person video records the visual semantics of the spatial and social layout around a person, which allows associating with similar past situations. Collectively, first person videos follow joint attention that can link the individuals to a group. We learn the egocentric visual semantics of group movements using a Siamese neural network to retrieve future trajectories. We consolidate the retrieved trajectories from all players by maximizing a measure of social compatibility: the gaze alignment towards joint attention predicted by their social formation, where the dynamics of joint attention is learned by a long-term recurrent convolutional network. This allows us to characterize which social configuration is more plausible and to predict future group trajectories. |
Tasks | 3D Reconstruction |
Published | 2016-11-29 |
URL | http://arxiv.org/abs/1611.09464v1 |
PDF | http://arxiv.org/pdf/1611.09464v1.pdf |
PWC | https://paperswithcode.com/paper/social-behavior-prediction-from-first-person |
Repo | |
Framework | |
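The trajectory-retrieval step above can be caricatured as nearest-neighbour search in an embedding space. In the paper the embeddings come from a trained Siamese network over egocentric visual semantics; here they are made-up vectors, and `retrieve_trajectory` is a hypothetical helper:

```python
def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = lambda w: sum(a * a for a in w) ** 0.5
    return dot / (norm(u) * norm(v))

def retrieve_trajectory(query_emb, bank):
    """Nearest-neighbour retrieval of a future trajectory by embedding
    similarity. The embeddings and trajectories below are made up, not
    network outputs."""
    best = max(bank, key=lambda item: cosine(query_emb, item["embedding"]))
    return best["future_trajectory"]

bank = [
    {"embedding": [1.0, 0.0], "future_trajectory": [(0, 0), (1, 0)]},  # drive right
    {"embedding": [0.0, 1.0], "future_trajectory": [(0, 0), (0, 1)]},  # cut left
]
traj = retrieve_trajectory([0.9, 0.1], bank)
```

The retrieved candidates would then be reconciled across players via the social-compatibility measure the abstract describes.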
Interactive Spoken Content Retrieval by Deep Reinforcement Learning
Title | Interactive Spoken Content Retrieval by Deep Reinforcement Learning |
Authors | Yen-Chen Wu, Tzu-Hsiang Lin, Yang-De Chen, Hung-Yi Lee, Lin-Shan Lee |
Abstract | User-machine interaction is important for spoken content retrieval. For text content retrieval, the user can easily scan through and select from a list of retrieved items. This is impossible for spoken content retrieval, because the retrieved items are difficult to show on screen. Besides, due to the high degree of uncertainty in speech recognition, the retrieval results can be very noisy. One way to counter such difficulties is through user-machine interaction. The machine can take different actions to interact with the user to obtain better retrieval results before showing them to the user. The suitable actions depend on the retrieval status, for example requesting extra information from the user, or returning a list of topics for the user to select from. In our previous work, some hand-crafted states estimated from the present retrieval results are used to determine the proper actions. In this paper, we propose to use Deep-Q-Learning techniques instead to determine the machine actions for interactive spoken content retrieval. Deep-Q-Learning bypasses the need for estimation of the hand-crafted states, and directly determines the best action based on the present retrieval status, even without any human knowledge. It is shown to achieve significantly better performance compared with the previous hand-crafted states. |
Tasks | Q-Learning, Speech Recognition |
Published | 2016-09-16 |
URL | http://arxiv.org/abs/1609.05234v1 |
PDF | http://arxiv.org/pdf/1609.05234v1.pdf |
PWC | https://paperswithcode.com/paper/interactive-spoken-content-retrieval-by-deep |
Repo | |
Framework | |
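The Q-learning machinery behind the paper can be illustrated with a tabular toy (the paper uses deep networks over real retrieval statuses; the states, actions, and rewards below are invented for the sketch):

```python
import random

# Toy interactive-retrieval MDP. States are coarse retrieval statuses and
# actions are machine moves; all names and rewards are illustrative.
ACTIONS = ("request_more_info", "return_topic_list", "show_results")

def q_learning(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in ("noisy", "clear") for a in ACTIONS}
    for _ in range(episodes):
        s = "noisy"
        while True:
            a = rng.choice(ACTIONS)          # pure exploration suffices here
            if a == "show_results":          # terminal: user sees the results
                r = 1.0 if s == "clear" else -1.0
                Q[(s, a)] += alpha * (r - Q[(s, a)])
                break
            s_next = "clear"                 # interaction improves the status
            r = -0.1                         # each extra turn costs the user
            target = r + gamma * max(Q[(s_next, b)] for b in ACTIONS)
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s = s_next
    return Q

Q = q_learning()
best_action = max(ACTIONS, key=lambda a: Q[("clear", a)])
```

With interaction improving a noisy status at a small per-turn cost, the learned values favour further interaction when the status is noisy and showing results only once it is clear.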
Ceteris paribus logic in counterfactual reasoning
Title | Ceteris paribus logic in counterfactual reasoning |
Authors | Patrick Girard, Marcus Anthony Triplett |
Abstract | The semantics for counterfactuals due to David Lewis has been challenged on the basis of unlikely, or impossible, events. Such events may skew a given similarity order in favour of those possible worlds which exhibit them. By updating the relational structure of a model according to a ceteris paribus clause one forces out, in a natural manner, those possible worlds which do not satisfy the requirements of the clause. We develop a ceteris paribus logic for counterfactual reasoning capable of performing such actions, and offer several alternative (relaxed) interpretations of ceteris paribus. We apply this framework in a way which allows us to reason counterfactually without having our similarity order skewed by unlikely events. This continues the investigation of formal ceteris paribus reasoning, which has previously been applied to preferences, logics of game forms, and questions in decision-making, among other areas. |
Tasks | Decision Making |
Published | 2016-06-24 |
URL | http://arxiv.org/abs/1606.07522v1 |
PDF | http://arxiv.org/pdf/1606.07522v1.pdf |
PWC | https://paperswithcode.com/paper/ceteris-paribus-logic-in-counterfactual |
Repo | |
Framework | |
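A minimal possible-worlds evaluation of a ceteris paribus counterfactual, in the spirit of the abstract. The Nixon-style example, variable names, and counting-based similarity measure are all illustrative, not the paper's formal system:

```python
from itertools import product

# A world assigns truth values to propositional variables; similarity to
# the actual world counts differing variables.
VARS = ("pressed", "signal_ok", "war")
ALL = [dict(zip(VARS, bits)) for bits in product([False, True], repeat=3)]
# Keep only law-abiding worlds: war breaks out iff the button is pressed
# while the signal wire works.
WORLDS = [w for w in ALL if w["war"] == (w["pressed"] and w["signal_ok"])]

def dist(w, actual):
    return sum(w[v] != actual[v] for v in VARS)

def counterfactual(actual, antecedent, consequent, cp_vars=()):
    """Lewis-style evaluation: true iff every closest antecedent-world
    satisfies the consequent. `cp_vars` is a ceteris paribus clause that
    forces out worlds disagreeing with the actual world on those
    variables, as in the update described in the abstract."""
    candidates = [w for w in WORLDS if antecedent(w)
                  and all(w[v] == actual[v] for v in cp_vars)]
    if not candidates:
        return True  # vacuously true
    d = min(dist(w, actual) for w in candidates)
    return all(consequent(w) for w in candidates if dist(w, actual) == d)

actual = {"pressed": False, "signal_ok": True, "war": False}
pressed = lambda w: w["pressed"]
war = lambda w: w["war"]
skewed = counterfactual(actual, pressed, war)                            # False
repaired = counterfactual(actual, pressed, war, cp_vars=("signal_ok",))  # True
```

Without the clause, a far-fetched broken-signal world ties for closest and blocks the counterfactual; holding `signal_ok` fixed, ceteris paribus, forces that world out.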
Fast Task-Specific Target Detection via Graph Based Constraints Representation and Checking
Title | Fast Task-Specific Target Detection via Graph Based Constraints Representation and Checking |
Authors | Went Luan, Yezhou Yang, Cornelia Fermuller, John S. Baras |
Abstract | In this work, we present a fast target detection framework for real-world robotics applications. Considering that an intelligent agent attends to a task-specific object target during execution, our goal is to detect the object efficiently. We propose the concept of early recognition, which influences the candidate proposal process to achieve fast and reliable detection performance. To check the target constraints efficiently, we put forward a novel policy to generate a sub-optimal checking order, and prove that it has bounded time cost compared to the optimal checking sequence, which is not achievable in polynomial time. Experiments on two different scenarios, 1) rigid object detection and 2) non-rigid body part detection, validate our pipeline. To show that our method is widely applicable, we further present a human-robot interaction system based on our non-rigid body part detection. |
Tasks | |
Published | 2016-11-14 |
URL | http://arxiv.org/abs/1611.04519v2 |
PDF | http://arxiv.org/pdf/1611.04519v2.pdf |
PWC | https://paperswithcode.com/paper/fast-task-specific-target-detection-via-graph |
Repo | |
Framework | |
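The idea of ordering constraint checks for speed can be sketched with a classic greedy rule: order independent checks by cost per unit rejection probability, so cheap, selective checks run first. This stands in for, and is not, the paper's bounded sub-optimal policy; the costs and probabilities are invented:

```python
def greedy_check_order(constraints):
    """Order constraint checks by cost / rejection probability. For
    independent checks this greedy rule minimizes the expected total
    checking cost of vetting a candidate."""
    return sorted(constraints, key=lambda c: c["cost"] / c["reject_prob"])

def expected_cost(order):
    """Expected cost of vetting a non-target candidate: each check is
    paid only if every earlier check failed to reject."""
    total, survive = 0.0, 1.0
    for c in order:
        total += survive * c["cost"]
        survive *= 1.0 - c["reject_prob"]
    return total

# Invented per-constraint check costs and rejection probabilities:
constraints = [
    {"name": "color", "cost": 1.0,  "reject_prob": 0.9},
    {"name": "shape", "cost": 5.0,  "reject_prob": 0.5},
    {"name": "pose",  "cost": 20.0, "reject_prob": 0.95},
]
order = greedy_check_order(constraints)
```

Running the cheap, highly selective color check first prunes most candidates before the expensive pose check is ever paid.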
Collaborative Training of Tensors for Compositional Distributional Semantics
Title | Collaborative Training of Tensors for Compositional Distributional Semantics |
Authors | Tamara Polajnar |
Abstract | Type-based compositional distributional semantic models present an interesting line of research into functional representations of linguistic meaning. One of the drawbacks of such models, however, is the lack of training data required to train each word-type combination. In this paper we address this by introducing training methods that share parameters between similar words. We show that these methods enable zero-shot learning for words that have no training data at all, as well as enabling construction of high-quality tensors from very few training examples per word. |
Tasks | Zero-Shot Learning |
Published | 2016-07-08 |
URL | http://arxiv.org/abs/1607.02310v3 |
PDF | http://arxiv.org/pdf/1607.02310v3.pdf |
PWC | https://paperswithcode.com/paper/collaborative-training-of-tensors-for |
Repo | |
Framework | |
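The parameter-sharing idea can be caricatured as building a missing word's tensor from the tensors of similar words. The weighting scheme, toy tensors, and similarity function below are invented; the paper's actual training objectives differ:

```python
def shared_tensor(word, tensors, similarity):
    """Build a tensor for `word` as a similarity-weighted average of the
    tensors of related words, enabling zero-shot construction when `word`
    has no training data of its own."""
    neighbours = [(w, similarity(word, w)) for w in tensors]
    total = sum(s for _, s in neighbours)
    size = len(next(iter(tensors.values())))
    out = [0.0] * size
    for w, s in neighbours:
        for i, v in enumerate(tensors[w]):
            out[i] += (s / total) * v
    return out

# Hypothetical flattened 2x2 verb tensors and a toy similarity function:
tensors = {"devour": [1.0, 0.0, 0.0, 1.0], "nibble": [0.0, 1.0, 1.0, 0.0]}
sim = lambda a, b: 3.0 if b == "devour" else 1.0   # "eat" is closer to "devour"
eat = shared_tensor("eat", tensors, sim)
```

The unseen verb inherits three quarters of its tensor from its nearest neighbour and one quarter from the more distant one.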
Algorithmic Composition of Melodies with Deep Recurrent Neural Networks
Title | Algorithmic Composition of Melodies with Deep Recurrent Neural Networks |
Authors | Florian Colombo, Samuel P. Muscinelli, Alexander Seeholzer, Johanni Brea, Wulfram Gerstner |
Abstract | A big challenge in algorithmic composition is to devise a model that is both easily trainable and able to reproduce the long-range temporal dependencies typical of music. Here we investigate how artificial neural networks can be trained on a large corpus of melodies and turned into automated music composers able to generate new melodies coherent with the style they have been trained on. We employ gated recurrent unit networks that have been shown to be particularly efficient in learning complex sequential activations with arbitrary long time lags. Our model processes rhythm and melody in parallel while modeling the relation between these two features. Using such an approach, we were able to generate interesting complete melodies or suggest possible continuations of a melody fragment that is coherent with the characteristics of the fragment itself. |
Tasks | |
Published | 2016-06-23 |
URL | http://arxiv.org/abs/1606.07251v1 |
PDF | http://arxiv.org/pdf/1606.07251v1.pdf |
PWC | https://paperswithcode.com/paper/algorithmic-composition-of-melodies-with-deep |
Repo | |
Framework | |
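The gated recurrent unit at the heart of the model follows the standard GRU equations, shown here for scalar state and input with made-up weights (real models use trained weight matrices over note and rhythm encodings):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gru_step(h, x, W):
    """One gated recurrent unit step, per the standard GRU equations:
    update gate z, reset gate r, candidate state. The weights in `W`
    are illustrative scalars, not trained values."""
    z = sigmoid(W["wz"] * x + W["uz"] * h + W["bz"])      # update gate
    r = sigmoid(W["wr"] * x + W["ur"] * h + W["br"])      # reset gate
    h_cand = math.tanh(W["wh"] * x + W["uh"] * (r * h) + W["bh"])
    return (1.0 - z) * h + z * h_cand

W = {"wz": 1.0, "uz": 0.5, "bz": 0.0,
     "wr": 1.0, "ur": 0.5, "br": 0.0,
     "wh": 1.0, "uh": 1.0, "bh": 0.0}
h = 0.0
for note in (0.2, -0.1, 0.4):    # a toy normalized pitch sequence
    h = gru_step(h, note, W)
```

The gating is what lets such networks carry information across the long time lags the abstract mentions: when z stays near 0 the old state passes through almost unchanged.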
Iterative Hough Forest with Histogram of Control Points for 6 DoF Object Registration from Depth Images
Title | Iterative Hough Forest with Histogram of Control Points for 6 DoF Object Registration from Depth Images |
Authors | Caner Sahin, Rigas Kouskouridas, Tae-Kyun Kim |
Abstract | State-of-the-art techniques proposed for 6D object pose recovery depend on occlusion-free point clouds to accurately register objects in 3D space. To reduce this dependency, we introduce a novel architecture called Iterative Hough Forest with Histogram of Control Points that is capable of estimating occluded and cluttered objects’ 6D pose given a candidate 2D bounding box. Our Iterative Hough Forest is learnt using patches extracted only from the positive samples. These patches are represented with Histogram of Control Points (HoCP), a “scale-variant” implicit volumetric description, which we derive from recently introduced Implicit B-Splines (IBS). The rich discriminative information provided by this scale-variance is leveraged during inference, where the initial pose estimation of the object is iteratively refined based on more discriminative control points by using our Iterative Hough Forest. We conduct experiments on several test objects of a publicly available dataset to test our architecture and to compare with the state-of-the-art. |
Tasks | Pose Estimation |
Published | 2016-03-08 |
URL | http://arxiv.org/abs/1603.02617v2 |
PDF | http://arxiv.org/pdf/1603.02617v2.pdf |
PWC | https://paperswithcode.com/paper/iterative-hough-forest-with-histogram-of |
Repo | |
Framework | |
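Hough-forest inference aggregates per-patch votes into an accumulator and reads off the peak. The sketch below collapses the 6 DoF pose to a 1D object-centre offset for brevity and invents the patch votes; the paper iterates such voting rounds with increasingly discriminative control points:

```python
from collections import Counter

def hough_vote(patches, bin_size=1.0):
    """Aggregate per-patch (position, predicted offset) votes into a
    quantized Hough accumulator and return the peak hypothesis."""
    acc = Counter()
    for patch_pos, predicted_offset in patches:
        acc[round((patch_pos + predicted_offset) / bin_size)] += 1
    return max(acc, key=acc.get) * bin_size

# Most patches agree the object centre is near 5.0; one is an outlier:
patches = [(2.0, 3.0), (4.0, 1.0), (6.0, -1.0), (1.0, 7.0)]
centre = hough_vote(patches)
```

Voting makes the estimate robust to the occlusion and clutter the abstract targets: a minority of corrupted patches cannot move the accumulator peak.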
Improving Semantic Embedding Consistency by Metric Learning for Zero-Shot Classification
Title | Improving Semantic Embedding Consistency by Metric Learning for Zero-Shot Classification |
Authors | Maxime Bucher, Stéphane Herbin, Frédéric Jurie |
Abstract | This paper addresses the task of zero-shot image classification. The key contribution of the proposed approach is to control the semantic embedding of images – one of the main ingredients of zero-shot learning – by formulating it as a metric learning problem. The optimized empirical criterion associates two types of sub-task constraints: metric discriminating capacity and accurate attribute prediction. This results in a novel expression of zero-shot learning not requiring the notion of class in the training phase: only pairs of image/attributes, augmented with a consistency indicator, are given as ground truth. At test time, the learned model can predict the consistency of a test image with a given set of attributes, allowing flexible ways to produce recognition inferences. Despite its simplicity, the proposed approach gives state-of-the-art results on four challenging datasets used for zero-shot recognition evaluation. |
Tasks | Image Classification, Metric Learning, Zero-Shot Learning |
Published | 2016-07-27 |
URL | http://arxiv.org/abs/1607.08085v1 |
PDF | http://arxiv.org/pdf/1607.08085v1.pdf |
PWC | https://paperswithcode.com/paper/improving-semantic-embedding-consistency-by |
Repo | |
Framework | |
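The consistency-based zero-shot inference can be sketched with a diagonal metric (the paper learns a full metric under discriminative and attribute-prediction constraints; the attribute signatures and weights here are invented):

```python
def consistency(image_emb, attributes, weights):
    """Consistency of an image embedding with an attribute vector:
    negative weighted squared distance. The diagonal `weights` stand in
    for the learned metric."""
    return -sum(w * (i - a) ** 2
                for w, i, a in zip(weights, image_emb, attributes))

def zero_shot_classify(image_emb, class_attributes, weights):
    """Assign the unseen class whose attribute signature is most
    consistent with the image."""
    return max(class_attributes,
               key=lambda c: consistency(image_emb, class_attributes[c], weights))

# Hypothetical attribute signatures (striped, four-legged, flies) for two
# unseen classes, plus made-up metric weights:
class_attributes = {"zebra": [1.0, 1.0, 0.0], "horse": [0.0, 1.0, 0.0]}
weights = [2.0, 1.0, 1.0]
pred = zero_shot_classify([0.9, 0.8, 0.1], class_attributes, weights)
```

Because the scorer needs only image/attribute pairs, no class labels enter training, matching the formulation in the abstract.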
AVEC 2016 - Depression, Mood, and Emotion Recognition Workshop and Challenge
Title | AVEC 2016 - Depression, Mood, and Emotion Recognition Workshop and Challenge |
Authors | Michel Valstar, Jonathan Gratch, Bjorn Schuller, Fabien Ringeval, Denis Lalanne, Mercedes Torres Torres, Stefan Scherer, Guiota Stratou, Roddy Cowie, Maja Pantic |
Abstract | The Audio/Visual Emotion Challenge and Workshop (AVEC 2016) “Depression, Mood and Emotion” will be the sixth competition event aimed at comparison of multimedia processing and machine learning methods for automatic audio, visual and physiological depression and emotion analysis, with all participants competing under strictly the same conditions. The goal of the Challenge is to provide a common benchmark test set for multi-modal information processing and to bring together the depression and emotion recognition communities, as well as the audio, video and physiological processing communities, to compare the relative merits of the various approaches to depression and emotion recognition under well-defined and strictly comparable conditions and establish to what extent fusion of the approaches is possible and beneficial. This paper presents the challenge guidelines, the common data used, and the performance of the baseline system on the two tasks. |
Tasks | Emotion Recognition |
Published | 2016-05-05 |
URL | http://arxiv.org/abs/1605.01600v4 |
PDF | http://arxiv.org/pdf/1605.01600v4.pdf |
PWC | https://paperswithcode.com/paper/avec-2016-depression-mood-and-emotion |
Repo | |
Framework | |
Reference-Aware Language Models
Title | Reference-Aware Language Models |
Authors | Zichao Yang, Phil Blunsom, Chris Dyer, Wang Ling |
Abstract | We propose a general class of language models that treat reference as an explicit stochastic latent variable. This architecture allows models to create mentions of entities and their attributes by accessing external databases (required by, e.g., dialogue generation and recipe generation) and internal state (required by, e.g., language models which are aware of coreference). This facilitates the incorporation of information that can be accessed in predictable locations in databases or discourse context, even when the targets of the reference may be rare words. Experiments on three tasks show that our model variants outperform models based on deterministic attention. |
Tasks | Dialogue Generation, Recipe Generation |
Published | 2016-11-05 |
URL | http://arxiv.org/abs/1611.01628v5 |
PDF | http://arxiv.org/pdf/1611.01628v5.pdf |
PWC | https://paperswithcode.com/paper/reference-aware-language-models |
Repo | |
Framework | |
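A minimal caricature of the latent-reference idea: word probability as a mixture of a copy-from-table component and a base language model. In the paper the mixture weight and both components are context-dependent neural models; everything below is illustrative:

```python
def reference_lm_prob(word, base_lm, table, p_ref):
    """Word probability under a mixture with a latent reference variable:
    with probability `p_ref` the word is produced by referring to an
    external table, otherwise by the base language model."""
    return p_ref * table.get(word, 0.0) + (1.0 - p_ref) * base_lm.get(word, 0.0)

base_lm = {"the": 0.4, "restaurant": 0.01, "Bellagio": 0.0001}
table = {"Bellagio": 1.0}        # hypothetical entity field for this dialogue
p = reference_lm_prob("Bellagio", base_lm, table, p_ref=0.3)
```

The reference component is what rescues rare entity names: the table lifts the probability of "Bellagio" far above what the base model assigns.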
Interpretability in Linear Brain Decoding
Title | Interpretability in Linear Brain Decoding |
Authors | Seyed Mostafa Kia, Andrea Passerini |
Abstract | Improving the interpretability of brain decoding approaches is of primary interest in many neuroimaging studies. Despite extensive studies of this type, at present, there is no formal definition for interpretability of brain decoding models. As a consequence, there is no quantitative measure for evaluating the interpretability of different brain decoding methods. In this paper, we present a simple definition for interpretability of linear brain decoding models. Then, we propose to combine the interpretability and the performance of the brain decoding into a new multi-objective criterion for model selection. Our preliminary results on the toy data show that optimizing the hyper-parameters of the regularized linear classifier based on the proposed criterion results in more informative linear models. The presented definition provides the theoretical background for quantitative evaluation of interpretability in linear brain decoding. |
Tasks | Brain Decoding, Model Selection |
Published | 2016-06-17 |
URL | http://arxiv.org/abs/1606.05672v1 |
PDF | http://arxiv.org/pdf/1606.05672v1.pdf |
PWC | https://paperswithcode.com/paper/interpretability-in-linear-brain-decoding |
Repo | |
Framework | |
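The proposed multi-objective model selection can be realized, in its simplest scalarized form, as a convex combination of performance and interpretability scores (the weighting and candidate values below are hypothetical, not the paper's criterion):

```python
def select_model(candidates, alpha=0.5):
    """Scalarized multi-objective model selection: maximize a convex
    combination of decoding performance and interpretability, both
    assumed normalized to [0, 1]."""
    def score(c):
        return alpha * c["performance"] + (1.0 - alpha) * c["interpretability"]
    return max(candidates, key=score)

# Hypothetical regularization settings for a linear decoder:
candidates = [
    {"lambda": 0.001, "performance": 0.90, "interpretability": 0.30},
    {"lambda": 0.1,   "performance": 0.86, "interpretability": 0.80},
    {"lambda": 10.0,  "performance": 0.70, "interpretability": 0.85},
]
best = select_model(candidates)
```

A pure-performance criterion (alpha = 1) would pick the weakly regularized model; the combined criterion trades a little accuracy for a much more interpretable one.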
Interpretability of Multivariate Brain Maps in Brain Decoding: Definition and Quantification
Title | Interpretability of Multivariate Brain Maps in Brain Decoding: Definition and Quantification |
Authors | Seyed Mostafa Kia |
Abstract | Brain decoding is a popular multivariate approach for hypothesis testing in neuroimaging. It is well known that the brain maps derived from weights of linear classifiers are hard to interpret because of high correlations between predictors, low signal to noise ratios, and the high dimensionality of neuroimaging data. Therefore, improving the interpretability of brain decoding approaches is of primary interest in many neuroimaging studies. Despite extensive studies of this type, at present, there is no formal definition for interpretability of multivariate brain maps. As a consequence, there is no quantitative measure for evaluating the interpretability of different brain decoding methods. In this paper, first, we present a theoretical definition of interpretability in brain decoding; we show that the interpretability of multivariate brain maps can be decomposed into their reproducibility and representativeness. Second, as an application of the proposed theoretical definition, we formalize a heuristic method for approximating the interpretability of multivariate brain maps in a binary magnetoencephalography (MEG) decoding scenario. Third, we propose to combine the approximated interpretability and the performance of the brain decoding model into a new multi-objective criterion for model selection. Our results for the MEG data show that optimizing the hyper-parameters of the regularized linear classifier based on the proposed criterion results in more informative multivariate brain maps. More importantly, the presented definition provides the theoretical background for quantitative evaluation of interpretability, and hence, facilitates the development of more effective brain decoding algorithms in the future. |
Tasks | Brain Decoding, Model Selection |
Published | 2016-03-29 |
URL | http://arxiv.org/abs/1603.08704v1 |
PDF | http://arxiv.org/pdf/1603.08704v1.pdf |
PWC | https://paperswithcode.com/paper/interpretability-of-multivariate-brain-maps |
Repo | |
Framework | |
Incremental Noising and its Fractal Behavior
Title | Incremental Noising and its Fractal Behavior |
Authors | Konstantinos A. Raftopoulos, Marin Ferecatu, Dionyssios D. Sourlas, Stefanos D. Kollias |
Abstract | This manuscript further elucidates the concept of noising, which first appeared in [CVPR14] in the context of curvature estimation and vertex localization on planar shapes. There are indications that noising can play for global methods the role smoothing plays for local methods in this task. We investigate this claim by introducing incremental noising, in a recursive deterministic manner, analogous to how smoothing is extended to progressive smoothing in similar tasks. In investigating the properties and behavior of incremental noising, the experiments reveal a surprising connection between incremental noising and progressive smoothing. To explain this phenomenon, the fractal and space filling properties of the two methods, respectively, are considered in a unifying context. |
Tasks | |
Published | 2016-07-28 |
URL | http://arxiv.org/abs/1607.08362v2 |
PDF | http://arxiv.org/pdf/1607.08362v2.pdf |
PWC | https://paperswithcode.com/paper/incremental-noising-and-its-fractal-behavior |
Repo | |
Framework | |
Boltzmann meets Nash: Energy-efficient routing in optical networks under uncertainty
Title | Boltzmann meets Nash: Energy-efficient routing in optical networks under uncertainty |
Authors | Panayotis Mertikopoulos, Aris L. Moustakas, Anna Tzanakaki |
Abstract | Motivated by the massive deployment of power-hungry data centers for service provisioning, we examine the problem of routing in optical networks with the aim of minimizing traffic-driven power consumption. To tackle this issue, routing must take into account energy efficiency as well as capacity considerations; moreover, in rapidly-varying network environments, this must be accomplished in a real-time, distributed manner that remains robust in the presence of random disturbances and noise. In view of this, we derive a pricing scheme whose Nash equilibria coincide with the network’s socially optimum states, and we propose a distributed learning method based on the Boltzmann distribution of statistical mechanics. Using tools from stochastic calculus, we show that the resulting Boltzmann routing scheme exhibits remarkable convergence properties under uncertainty: specifically, the long-term average of the network’s power consumption converges within $\varepsilon$ of its minimum value in time which is at most $\tilde O(1/\varepsilon^2)$, irrespective of the fluctuations’ magnitude; additionally, if the network admits a strict, non-mixing optimum state, the algorithm converges to it - again, no matter the noise level. Our analysis is supplemented by extensive numerical simulations which show that Boltzmann routing can lead to a significant decrease in power consumption over basic, shortest-path routing schemes in realistic network conditions. |
Tasks | |
Published | 2016-05-04 |
URL | http://arxiv.org/abs/1605.01451v1 |
PDF | http://arxiv.org/pdf/1605.01451v1.pdf |
PWC | https://paperswithcode.com/paper/boltzmann-meets-nash-energy-efficient-routing |
Repo | |
Framework | |
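The Boltzmann distribution that drives the routing scheme can be written down directly: route probabilities proportional to exp(-cost/T). The routes and power costs below are invented, and the paper's scheme additionally couples this rule with pricing and a stochastic-approximation analysis:

```python
import math

def boltzmann_route_probs(costs, temperature):
    """Boltzmann (Gibbs) choice probabilities over candidate routes:
    p(r) proportional to exp(-cost_r / T). As T -> 0 the mass
    concentrates on the minimum-power route."""
    weights = {r: math.exp(-c / temperature) for r, c in costs.items()}
    z = sum(weights.values())
    return {r: w / z for r, w in weights.items()}

costs = {"short_path": 3.0, "green_path": 2.0, "long_path": 5.0}  # toy power costs
hot = boltzmann_route_probs(costs, temperature=100.0)   # near-uniform exploration
cold = boltzmann_route_probs(costs, temperature=0.1)    # near-greedy
```

High temperature explores routes almost uniformly, which keeps the scheme robust to noise; low temperature concentrates traffic on the minimum-power route.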