July 28, 2019

3090 words 15 mins read

Paper Group ANR 337

Shape from Shading through Shape Evolution. Empirical Evaluation of Parallel Training Algorithms on Acoustic Modeling. Multimodal Observation and Interpretation of Subjects Engaged in Problem Solving. DOTE: Dual cOnvolutional filTer lEarning for Super-Resolution and Cross-Modality Synthesis in MRI. Multi-View Deep Learning for Consistent Semantic M …

Shape from Shading through Shape Evolution

Title Shape from Shading through Shape Evolution
Authors Dawei Yang, Jia Deng
Abstract In this paper, we address the shape-from-shading problem by training deep networks with synthetic images. Unlike conventional approaches that combine deep learning and synthetic imagery, we propose an approach that does not need any external shape dataset to render synthetic images. Our approach consists of two synergistic processes: the evolution of complex shapes from simple primitives, and the training of a deep network for shape-from-shading. The evolution generates better shapes guided by the network training, while the training improves by using the evolved shapes. We show that our approach achieves state-of-the-art performance on a shape-from-shading benchmark.
Tasks
Published 2017-12-08
URL http://arxiv.org/abs/1712.02961v1
PDF http://arxiv.org/pdf/1712.02961v1.pdf
PWC https://paperswithcode.com/paper/shape-from-shading-through-shape-evolution
Repo
Framework
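To make the shape-evolution idea concrete, here is a minimal, hypothetical sketch of the co-evolution loop: a population of primitive-based shapes is mutated and filtered by a fitness signal. In the paper the fitness is tied to how much a shape helps the shape-from-shading network during training; here it is a toy stand-in, and all names (`mutate`, `evolve`) are illustrative, not the authors' code.

```python
import random

# Toy sketch of the shape-evolution loop (hypothetical helpers; the
# fitness below is a placeholder for the network-training signal).
# Shapes are represented as parameter lists for primitive unions.

def mutate(shape):
    # Perturb one primitive parameter, or add a new primitive.
    child = list(shape)
    if random.random() < 0.5 and child:
        i = random.randrange(len(child))
        child[i] = child[i] + random.gauss(0, 0.1)
    else:
        child.append(random.gauss(0, 1.0))  # new primitive parameter
    return child

def evolve(population, fitness, generations=10, keep=8):
    for _ in range(generations):
        children = [mutate(random.choice(population)) for _ in range(len(population))]
        population = sorted(population + children, key=fitness, reverse=True)[:keep]
    return population

if __name__ == "__main__":
    pop = [[random.gauss(0, 1.0)] for _ in range(8)]
    best = evolve(pop, fitness=lambda s: -abs(sum(s)))  # toy objective
    print(best[0])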

Empirical Evaluation of Parallel Training Algorithms on Acoustic Modeling

Title Empirical Evaluation of Parallel Training Algorithms on Acoustic Modeling
Authors Wenpeng Li, BinBin Zhang, Lei Xie, Dong Yu
Abstract Deep learning models (DLMs) are the state of the art in speech recognition. However, training good DLMs can be time-consuming, especially for production-size models and corpora. Although several parallel training algorithms have been proposed to improve training efficiency, there is no clear guidance on which one to choose for the task at hand due to the lack of a systematic and fair comparison among them. In this paper we aim to fill this gap by comparing four popular parallel training algorithms in speech recognition, namely asynchronous stochastic gradient descent (ASGD), blockwise model-update filtering (BMUF), bulk synchronous parallel (BSP) and elastic averaging stochastic gradient descent (EASGD), on the 1000-hour LibriSpeech corpus using feed-forward deep neural networks (DNNs) and convolutional long short-term memory DNNs (CLDNNs). Based on our experiments, we recommend BMUF as the top choice for training acoustic models: it is the most stable, scales well with the number of GPUs, achieves reproducible results, and in many cases even outperforms single-GPU SGD. ASGD can be used as a substitute in some cases.
Tasks Speech Recognition
Published 2017-03-17
URL http://arxiv.org/abs/1703.05880v2
PDF http://arxiv.org/pdf/1703.05880v2.pdf
PWC https://paperswithcode.com/paper/empirical-evaluation-of-parallel-training
Repo
Framework
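The BMUF recommendation above can be illustrated with a small numpy sketch of the block update. It follows the commonly cited BMUF recipe (average the worker models, form the block update, filter it with block momentum); the hyperparameters and the toy worker loop are assumptions, not values from the paper.

```python
import numpy as np

def bmuf_block_update(w_prev, worker_weights, delta_prev,
                      block_momentum=0.9, block_lr=1.0):
    """One BMUF sync step (sketch): aggregate the block update across
    workers, then filter it with block momentum."""
    w_bar = np.mean(worker_weights, axis=0)   # average of worker models
    g = w_bar - w_prev                        # aggregated block update
    delta = block_momentum * delta_prev + block_lr * g
    w_new = w_prev + delta                    # filtered global model
    return w_new, delta

# Toy usage with 4 "workers" drifting from a shared starting point.
w = np.zeros(5)
delta = np.zeros(5)
for step in range(3):
    workers = [w + 0.01 * np.random.randn(5) - 0.1 for _ in range(4)]
    w, delta = bmuf_block_update(w, workers, delta)
print(w)
```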

Multimodal Observation and Interpretation of Subjects Engaged in Problem Solving

Title Multimodal Observation and Interpretation of Subjects Engaged in Problem Solving
Authors Thomas Guntz, Raffaella Balzarini, Dominique Vaufreydaz, James L. Crowley
Abstract In this paper we present the first results of a pilot experiment in the capture and interpretation of multimodal signals of human experts engaged in solving challenging chess problems. Our goal is to investigate the extent to which observations of eye gaze, posture, emotion and other physiological signals can be used to model the cognitive state of subjects, and to explore the integration of multiple sensor modalities to improve the reliability of detecting human displays of awareness and emotion. We observed chess players engaged in problems of increasing difficulty while recording their behavior. Such recordings can be used to estimate a participant’s awareness of the current situation and to predict their ability to respond effectively to challenging situations. Results show that a multimodal approach is more accurate than a unimodal one. By combining body posture, visual attention and emotion, the multimodal approach reaches up to 93% accuracy when determining a player’s chess expertise, while the unimodal approach reaches 86%. Finally, this experiment validates the use of our equipment as a general and reproducible tool for the study of participants engaged in screen-based interaction and/or problem solving.
Tasks
Published 2017-10-12
URL http://arxiv.org/abs/1710.04486v1
PDF http://arxiv.org/pdf/1710.04486v1.pdf
PWC https://paperswithcode.com/paper/multimodal-observation-and-interpretation-of
Repo
Framework
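A minimal sketch of why combining modalities can beat any single one: late fusion of per-modality class probabilities. This illustrates the general idea only, not the authors' pipeline; the modality names and weights are placeholders.

```python
import numpy as np

# Late-fusion sketch: each modality produces class probabilities, and
# fusion takes their (weighted) average. This is one of the simplest
# ways multimodal cues can outperform any single modality.

def late_fusion(prob_posture, prob_gaze, prob_emotion, weights=(1, 1, 1)):
    probs = np.stack([prob_posture, prob_gaze, prob_emotion])
    w = np.asarray(weights, dtype=float)[:, None]
    fused = (w * probs).sum(axis=0) / w.sum()
    return fused.argmax(), fused

# Toy example: two modalities favor class 1, one favors class 0.
pred, fused = late_fusion(np.array([0.3, 0.7]),
                          np.array([0.4, 0.6]),
                          np.array([0.6, 0.4]))
print(pred, fused)
```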

DOTE: Dual cOnvolutional filTer lEarning for Super-Resolution and Cross-Modality Synthesis in MRI

Title DOTE: Dual cOnvolutional filTer lEarning for Super-Resolution and Cross-Modality Synthesis in MRI
Authors Yawen Huang, Ling Shao, Alejandro F. Frangi
Abstract Cross-modal image synthesis is a topical problem in medical image computing. Existing methods for image synthesis are either tailored to a specific application, require large-scale training sets, or rely on partitioning images into overlapping patches. In this paper, we propose a novel Dual cOnvolutional filTer lEarning (DOTE) approach to overcome the drawbacks of these approaches. We construct a closed-loop joint filter learning strategy that generates informative feedback for model self-optimization. Our method can leverage data more efficiently, reducing the size of the required training set. We extensively evaluate DOTE on two challenging tasks: image super-resolution and cross-modality synthesis. The experimental results demonstrate superior performance of our method over other state-of-the-art methods.
Tasks Image Generation, Image Super-Resolution, Super-Resolution
Published 2017-06-15
URL http://arxiv.org/abs/1706.04954v1
PDF http://arxiv.org/pdf/1706.04954v1.pdf
PWC https://paperswithcode.com/paper/dote-dual-convolutional-filter-learning-for
Repo
Framework

Multi-View Deep Learning for Consistent Semantic Mapping with RGB-D Cameras

Title Multi-View Deep Learning for Consistent Semantic Mapping with RGB-D Cameras
Authors Lingni Ma, Jörg Stückler, Christian Kerl, Daniel Cremers
Abstract Visual scene understanding is an important capability that enables robots to act purposefully in their environment. In this paper, we propose a novel approach to object-class segmentation from multiple RGB-D views using deep learning. We train a deep neural network to predict object-class semantics that are consistent across several viewpoints in a semi-supervised way. At test time, the semantic predictions of our network can be fused more consistently into semantic keyframe maps than the predictions of a network trained on individual views. We base our network architecture on a recent single-view deep learning approach to RGB and depth fusion for semantic object-class segmentation and enhance it with multi-scale loss minimization. We obtain the camera trajectory using RGB-D SLAM and warp the predictions of RGB-D images into ground-truth annotated frames in order to enforce multi-view consistency during training. At test time, predictions from multiple views are fused into keyframes. We propose and analyze several methods for enforcing multi-view consistency during training and testing. We evaluate the benefit of multi-view consistency training and demonstrate that pooling deep features and fusing over multiple views outperforms single-view baselines on the NYUDv2 benchmark for semantic segmentation. Our end-to-end trained network achieves state-of-the-art performance on the NYUDv2 dataset in single-view segmentation as well as multi-view semantic fusion.
Tasks Scene Understanding, Semantic Segmentation
Published 2017-03-26
URL http://arxiv.org/abs/1703.08866v2
PDF http://arxiv.org/pdf/1703.08866v2.pdf
PWC https://paperswithcode.com/paper/multi-view-deep-learning-for-consistent
Repo
Framework
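The fusion step can be sketched with standard Bayesian label fusion: per-pixel class probabilities from several aligned views are combined by summing log-probabilities. The warping into the keyframe (done with RGB-D SLAM poses in the paper) is assumed to have already happened here.

```python
import numpy as np

# Sketch of multi-view semantic fusion into a keyframe. Assumption:
# the per-view predictions have already been warped into the keyframe.

def fuse_views(view_probs, eps=1e-8):
    """view_probs: list of (H, W, C) per-pixel class probabilities,
    all aligned to the keyframe. Returns fused per-pixel labels."""
    log_sum = np.zeros_like(view_probs[0])
    for p in view_probs:
        log_sum += np.log(p + eps)     # Bayesian fusion = sum of log-probs
    return log_sum.argmax(axis=-1)     # per-pixel fused class

# Toy usage: 3 views, a 4x4 image, 3 classes.
views = [np.random.dirichlet(np.ones(3), size=(4, 4)) for _ in range(3)]
print(fuse_views(views))
```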

A Fixed-Point of View on Gradient Methods for Big Data

Title A Fixed-Point of View on Gradient Methods for Big Data
Authors Alexander Jung
Abstract Interpreting gradient methods as fixed-point iterations, we provide a detailed analysis of those methods for minimizing convex objective functions. Due to their conceptual and algorithmic simplicity, gradient methods are widely used in machine learning for massive data sets (big data). In particular, stochastic gradient methods are considered the de facto standard for training deep neural networks. Studying gradient methods within the realm of fixed-point theory provides us with powerful tools to analyze their convergence properties. In particular, gradient methods using inexact or noisy gradients, such as stochastic gradient descent, can be studied conveniently using well-known results on inexact fixed-point iterations. Moreover, as we demonstrate in this paper, the fixed-point approach allows an elegant derivation of accelerations for basic gradient methods. In particular, we will show how gradient descent can be accelerated by a fixed-point preserving transformation of an operator associated with the objective function.
Tasks
Published 2017-06-29
URL http://arxiv.org/abs/1706.09880v4
PDF http://arxiv.org/pdf/1706.09880v4.pdf
PWC https://paperswithcode.com/paper/a-fixed-point-of-view-on-gradient-methods-for
Repo
Framework
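The central observation is easy to demonstrate: gradient descent is the fixed-point iteration x_{k+1} = T(x_k) with T(x) = x - a * grad f(x), and for a convex quadratic T is a contraction whenever 0 < a < 2/L, where L is the largest eigenvalue of the Hessian. A small numpy check:

```python
import numpy as np

# Gradient descent as a fixed-point iteration. For
# f(x) = 0.5 x^T Q x - b^T x, the operator T(x) = x - a (Q x - b) is a
# contraction when 0 < a < 2 / L, and its unique fixed point is the
# minimizer Q^{-1} b.

Q = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])
L = np.linalg.eigvalsh(Q).max()
a = 1.0 / L                         # safe step size (< 2/L)

T = lambda x: x - a * (Q @ x - b)   # the fixed-point operator

x = np.zeros(2)
for _ in range(200):
    x = T(x)

print(x, np.linalg.solve(Q, b))     # iterate matches the fixed point
```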

Deep 6-DOF Tracking

Title Deep 6-DOF Tracking
Authors Mathieu Garon, Jean-François Lalonde
Abstract We present a temporal 6-DOF tracking method which leverages deep learning to achieve state-of-the-art performance on challenging datasets of real-world captures. Our method is both more accurate and more robust to occlusions than the best existing approaches while maintaining real-time performance. To assess its efficacy, we evaluate our approach on several challenging RGBD sequences of real objects in a variety of conditions. Notably, we systematically evaluate robustness to occlusions through a series of sequences where the object to be tracked is increasingly occluded. Finally, our approach is purely data-driven and does not require any hand-designed features: robust tracking is automatically learned from data.
Tasks
Published 2017-03-28
URL http://arxiv.org/abs/1703.09771v2
PDF http://arxiv.org/pdf/1703.09771v2.pdf
PWC https://paperswithcode.com/paper/deep-6-dof-tracking
Repo
Framework
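The temporal tracking loop implied by the abstract can be sketched as follows; `render` and `predict_pose_delta` are hypothetical stand-ins for the object renderer and the learned network, and the pose composition is plain 4x4 rigid-transform multiplication.

```python
import numpy as np

# Sketch of a temporal 6-DOF tracking loop. The two callables are
# hypothetical stand-ins, not the paper's components.

def compose(pose, delta):
    """Compose 4x4 rigid transforms: apply the predicted delta."""
    return delta @ pose

def track(frames, pose0, render, predict_pose_delta):
    pose = pose0
    for rgbd in frames:
        reference = render(pose)             # object as seen from estimate
        delta = predict_pose_delta(reference, rgbd)
        pose = compose(pose, delta)          # refined 6-DOF estimate
        yield pose

# Toy usage with identity stand-ins.
frames = [None] * 3
poses = list(track(frames, np.eye(4),
                   render=lambda p: None,
                   predict_pose_delta=lambda ref, obs: np.eye(4)))
print(poses[-1])
```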

Action Recognition: From Static Datasets to Moving Robots

Title Action Recognition: From Static Datasets to Moving Robots
Authors Fahimeh Rezazadegan, Sareh Shirazi, Ben Upcroft, Michael Milford
Abstract Deep learning models have achieved state-of-the-art performance in recognizing human activities, but often rely on background cues present in typical computer vision datasets, which predominantly use a stationary camera. If these models are to be employed by autonomous robots in real-world environments, they must be adapted to perform independently of background cues and camera motion effects. To address these challenges, we propose a new method that first generates generic action region proposals with good potential to locate a human action in unconstrained videos regardless of camera motion, and then uses these proposals to extract and classify effective shape and motion features within a ConvNet framework. In a range of experiments, we demonstrate that by actively proposing action regions during both training and testing, state-of-the-art or better performance is achieved on benchmarks. We show that our approach outperforms the state of the art on two new datasets: one emphasizes irrelevant background, the other camera motion. We also validate our action recognition method in an abnormal behavior detection scenario to improve workplace safety. The results show a higher success rate for our method, owing to our system’s ability to recognize human actions regardless of environment and camera motion.
Tasks Temporal Action Localization
Published 2017-01-18
URL http://arxiv.org/abs/1701.04925v1
PDF http://arxiv.org/pdf/1701.04925v1.pdf
PWC https://paperswithcode.com/paper/action-recognition-from-static-datasets-to
Repo
Framework
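The two-stage pipeline (propose camera-motion-robust action regions, then classify each proposal) can be sketched generically; `propose_action_regions` and `classify_clip` are hypothetical stand-ins, not the paper's components.

```python
# Two-stage sketch: generate action region proposals, then classify
# features cropped from each proposal. Both callables are stand-ins.

def recognize_actions(video, propose_action_regions, classify_clip, top_k=1):
    detections = []
    for region in propose_action_regions(video):   # camera-motion robust
        label, score = classify_clip(video, region)
        detections.append((score, label, region))
    detections.sort(reverse=True)
    return detections[:top_k]

# Toy usage with dummy stand-ins.
result = recognize_actions(
    video="clip.mp4",
    propose_action_regions=lambda v: [(0, 0, 64, 64), (32, 32, 96, 96)],
    classify_clip=lambda v, r: ("walking", 0.8 if r[0] == 0 else 0.6),
)
print(result)
```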

Representation learning of drug and disease terms for drug repositioning

Title Representation learning of drug and disease terms for drug repositioning
Authors Sahil Manchanda, Ashish Anand
Abstract Drug repositioning (DR) refers to the identification of novel indications for approved drugs. The huge investment of time and money required, and the risk of failure in clinical trials, have led to a surge of interest in drug repositioning. DR exploits two major aspects of drugs and diseases: similarity among drugs and among diseases due to shared genes, pathways, or common biological effects. Existing methods for identifying drug-disease associations rely mainly on the information available in structured databases. On the other hand, the abundant information available as free text in biomedical research articles is not fully exploited. Word embeddings, i.e., vector representations of words learned from large text corpora with neural network methods, have been shown to perform well on several natural language processing tasks. In this work we propose a novel representation learning approach that obtains features of drugs and diseases by combining complementary information available in unstructured texts and structured datasets. Next, we use a matrix completion approach on these feature vectors to learn a projection matrix between the drug and disease vector spaces. The proposed method shows competitive performance with state-of-the-art methods. Further, case studies on Alzheimer’s disease and hypertension show that the predicted associations match existing knowledge.
Tasks Matrix Completion, Representation Learning
Published 2017-05-15
URL http://arxiv.org/abs/1705.05183v2
PDF http://arxiv.org/pdf/1705.05183v2.pdf
PWC https://paperswithcode.com/paper/representation-learning-of-drug-and-disease
Repo
Framework
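As a rough illustration of learning a mapping between the two embedding spaces, here is a ridge-regression stand-in: fit a projection matrix on known drug-disease pairs and rank candidate diseases for a new drug. The paper uses a matrix completion approach; this simplification only conveys the shape of the computation.

```python
import numpy as np

# Sketch of learning a projection between drug and disease embedding
# spaces from known associations (ridge regression as a stand-in for
# the paper's matrix-completion step).

def learn_projection(drug_vecs, disease_vecs, pairs, lam=1e-2):
    """pairs: list of (drug_idx, disease_idx) known associations."""
    X = np.stack([drug_vecs[i] for i, _ in pairs])      # (n, d)
    Y = np.stack([disease_vecs[j] for _, j in pairs])   # (n, k)
    d = X.shape[1]
    W = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)
    return W                                            # (d, k)

def score_repositioning(drug_vec, disease_vecs, W):
    """Rank diseases by cosine similarity to the projected drug."""
    proj = drug_vec @ W
    sims = disease_vecs @ proj / (
        np.linalg.norm(disease_vecs, axis=1) * np.linalg.norm(proj) + 1e-8)
    return np.argsort(-sims)

drugs = np.random.randn(5, 8)
diseases = np.random.randn(6, 4)
W = learn_projection(drugs, diseases, pairs=[(0, 1), (2, 3), (4, 0)])
print(score_repositioning(drugs[1], diseases, W))
```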

Is Saki #delicious? The Food Perception Gap on Instagram and Its Relation to Health

Title Is Saki #delicious? The Food Perception Gap on Instagram and Its Relation to Health
Authors Ferda Ofli, Yusuf Aytar, Ingmar Weber, Raggi al Hammouri, Antonio Torralba
Abstract Food is an integral part of our life, and what and how much we eat crucially affects our health. Our food choices largely depend on how we perceive certain characteristics of food, such as whether it is healthy, delicious, or qualifies as a salad. But these perceptions differ from person to person, and one person’s “single lettuce leaf” might be another person’s “side salad”. Studying how food is perceived in relation to what it actually is typically involves a laboratory setup. Here we propose to use recent advances in image recognition to tackle this problem. Concretely, we use 1.9 million Instagram images from the US to look at systematic differences between how a machine objectively labels an image and how a human subjectively does. We show that this difference, which we call the “perception gap”, relates to a number of health outcomes observed at the county level. To the best of our knowledge, this is the first time image recognition has been used to study the “misalignment” between how people describe food images and what they actually depict.
Tasks
Published 2017-02-21
URL http://arxiv.org/abs/1702.06318v1
PDF http://arxiv.org/pdf/1702.06318v1.pdf
PWC https://paperswithcode.com/paper/is-saki-delicious-the-food-perception-gap-on
Repo
Framework
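A toy sketch of the perception-gap measurement: per image, compare the machine's labels with the user's hashtags, then aggregate the disagreement by county. The field names and the Jaccard-style gap definition here are illustrative assumptions, not the paper's exact metric.

```python
from collections import defaultdict

# Per image, compare machine labels against user hashtags; aggregate
# the disagreement at the county level. All data below is made up.

images = [
    {"county": "A", "machine": {"salad", "cheese"}, "human": {"salad", "healthy"}},
    {"county": "A", "machine": {"burger", "fries"}, "human": {"healthy"}},
    {"county": "B", "machine": {"sushi"},           "human": {"sushi", "delicious"}},
]

def jaccard_gap(machine, human):
    union = machine | human
    return 1.0 - len(machine & human) / len(union) if union else 0.0

county_gap = defaultdict(list)
for img in images:
    county_gap[img["county"]].append(jaccard_gap(img["machine"], img["human"]))

for county, gaps in county_gap.items():
    print(county, sum(gaps) / len(gaps))   # county-level perception gap
```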

Neural Machine Translation via Binary Code Prediction

Title Neural Machine Translation via Binary Code Prediction
Authors Yusuke Oda, Philip Arthur, Graham Neubig, Koichiro Yoshino, Satoshi Nakamura
Abstract In this paper, we propose a new method for calculating the output layer in neural machine translation systems. The method is based on predicting a binary code for each word and can reduce the computation time and memory requirements of the output layer to be logarithmic in vocabulary size in the best case. In addition, we introduce two approaches to improve the robustness of the proposed model: using error-correcting codes and combining softmax and binary codes. Experiments on two English-Japanese bidirectional translation tasks show that the proposed models achieve BLEU scores approaching those of the softmax, while reducing memory usage to less than 1/10 and improving decoding speed on CPUs by 5x to 10x.
Tasks Machine Translation
Published 2017-04-23
URL http://arxiv.org/abs/1704.06918v1
PDF http://arxiv.org/pdf/1704.06918v1.pdf
PWC https://paperswithcode.com/paper/neural-machine-translation-via-binary-code
Repo
Framework
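The core trick is easy to sketch: give each vocabulary word a ceil(log2 V)-bit code and predict the bits with sigmoids, so the output layer has O(log V) units instead of V. This toy version omits the error-correcting codes and the softmax hybrid mentioned in the abstract; all parameter names are illustrative.

```python
import numpy as np

# Binary-code output layer sketch: predict bits instead of a V-way
# softmax, then decode the bits back to a word id.

V = 10000                                  # vocabulary size
n_bits = int(np.ceil(np.log2(V)))          # 14 bits instead of 10000 logits
# codes[w] is the bit pattern the output layer is trained to predict for w.
codes = np.array([[(w >> b) & 1 for b in range(n_bits)] for w in range(V)])

def predict_word(hidden, W_out, b_out):
    """hidden: (d,) decoder state; W_out: (d, n_bits). Returns word id."""
    bits = 1.0 / (1.0 + np.exp(-(hidden @ W_out + b_out))) > 0.5
    word = int(np.dot(bits.astype(int), 1 << np.arange(n_bits)))
    return min(word, V - 1)   # codes past V-1 are unused; clipped here

d = 32
rng = np.random.default_rng(0)
print(n_bits, predict_word(rng.standard_normal(d),
                           rng.standard_normal((d, n_bits)),
                           rng.standard_normal(n_bits)))
```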

Multi-focus image fusion using VOL and EOL in DCT domain

Title Multi-focus image fusion using VOL and EOL in DCT domain
Authors Mostafa Amin-Naji, Ali Aghagolzadeh
Abstract The purpose of multi-focus image fusion is to gather the essential information and the focused regions of the input multi-focus images into a single image. These multi-focus images are captured with different camera focus depths. Performing multi-focus fusion in the discrete cosine transform (DCT) domain is efficient and appropriate, especially when JPEG images are used in visual sensor networks (VSN). Previous DCT-domain methods make errors in selecting the appropriate blocks because of the criterion they use to measure block contrast. In this paper, we use the variance of the Laplacian (VOL) and the energy of the Laplacian (EOL) as contrast criteria. We also compute EOL and VOL directly in the DCT domain using vector processing, developing four matrices that easily calculate the Laplacian of a block in the DCT domain. Our methods greatly reduce the errors caused by unsuitable block selection. We compare the results of the proposed algorithms with previous algorithms to demonstrate the superior quality of the output images produced by our methods. Experiments on several JPEG multi-focus images compare the images fused by our methods and by other algorithms under different measurement criteria.
Tasks
Published 2017-10-17
URL http://arxiv.org/abs/1710.06511v2
PDF http://arxiv.org/pdf/1710.06511v2.pdf
PWC https://paperswithcode.com/paper/multi-focus-image-fusion-using-vol-and-eol-in
Repo
Framework
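The block-selection rule can be sketched directly: split both inputs into 8x8 blocks and keep each block from whichever image has the higher energy of Laplacian. For readability the EOL below is computed in the spatial domain; the paper's contribution is computing the same quantity directly from DCT coefficients.

```python
import numpy as np

# Block-wise multi-focus fusion sketch: per 8x8 block, keep the block
# from the sharper input, with sharpness measured by the energy of the
# Laplacian (EOL). Spatial-domain EOL shown for clarity.

def eol(block):
    lap = (-4 * block
           + np.roll(block, 1, 0) + np.roll(block, -1, 0)
           + np.roll(block, 1, 1) + np.roll(block, -1, 1))
    return float((lap ** 2).sum())

def fuse(img_a, img_b, bs=8):
    out = np.empty_like(img_a)
    for i in range(0, img_a.shape[0], bs):
        for j in range(0, img_a.shape[1], bs):
            a = img_a[i:i+bs, j:j+bs]
            b = img_b[i:i+bs, j:j+bs]
            out[i:i+bs, j:j+bs] = a if eol(a) >= eol(b) else b
    return out

a = np.random.rand(16, 16)          # toy stand-ins for the two inputs
b = np.random.rand(16, 16)
print(fuse(a, b).shape)
```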

Synergistic Team Composition

Title Synergistic Team Composition
Authors Ewa Andrejczuk, Juan A. Rodriguez-Aguilar, Carme Roig, Carles Sierra
Abstract Effective teams are crucial for organisations, especially in environments that require teams to be constantly created and dismantled, such as software development, scientific experiments, crowd-sourcing, or the classroom. Key factors influencing team performance are the competences and personalities of team members. Hence, we present a computational model to compose proficient and congenial teams based on individuals’ personalities and their competences to perform tasks of different nature. To this end, we extend Wilde’s post-Jungian method for team composition, which employs individuals’ personalities only. The aim of this study is to create a model that partitions agents into teams balanced in competences, personality, and gender. Finally, we present preliminary empirical results obtained from analysing student performance. The results show the benefits of a more informed team composition that exploits individuals’ competences besides information about their personalities.
Tasks
Published 2017-02-27
URL http://arxiv.org/abs/1702.08222v1
PDF http://arxiv.org/pdf/1702.08222v1.pdf
PWC https://paperswithcode.com/paper/synergistic-team-composition
Repo
Framework
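As a toy illustration of competence-balanced partitioning (one ingredient of the model; the paper also balances personality and gender), a snake-draft assignment keeps team competence totals close. The candidate data and the greedy rule are assumptions for illustration, not the paper's algorithm.

```python
# Greedy sketch of competence-balanced team partitioning.

def balanced_teams(candidates, n_teams):
    """candidates: list of (name, competence). Snake-draft assignment
    keeps team competence totals close."""
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    teams = [[] for _ in range(n_teams)]
    for round_idx in range(0, len(ranked), n_teams):
        chunk = ranked[round_idx:round_idx + n_teams]
        if (round_idx // n_teams) % 2:        # reverse every other round
            chunk = chunk[::-1]
        for team, person in zip(teams, chunk):
            team.append(person)
    return teams

people = [("ana", 9), ("bo", 7), ("cy", 6), ("di", 5), ("ed", 3), ("fi", 2)]
for team in balanced_teams(people, 2):
    print(team, sum(score for _, score in team))
```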

Local Directional Relation Pattern for Unconstrained and Robust Face Retrieval

Title Local Directional Relation Pattern for Unconstrained and Robust Face Retrieval
Authors Shiv Ram Dubey
Abstract Face recognition remains a very demanding area of research. The problem becomes more challenging in unconstrained environments and in the presence of variations such as pose, illumination, and expression. Local descriptors are widely used for this task. Most existing local descriptors consider only a few immediate local neighbors and are unable to utilize wider local information to make the descriptor more discriminative. Descriptors based on wider local information, in turn, mainly suffer from increased dimensionality. In this paper, this problem is solved by efficiently encoding the relationships among directional neighbors. The relationship between the center pixel and the encoded directional neighbors is further utilized to form the proposed local directional relation pattern (LDRP). The descriptor is inherently invariant to uniform illumination changes. A multi-scale mechanism is also adopted to further boost the descriptor’s discriminative ability. The proposed descriptor is evaluated in an image retrieval framework over face databases. Very challenging databases such as PaSC, LFW, PubFig, ESSEX, FERET, AT&T, and FaceScrub are used to test the discriminative ability and robustness of the LDRP descriptor. Results are also compared with recent state-of-the-art face descriptors such as LBP, LTP, LDP, LDN, LVP, DCP, LDGP, and LGHP. The proposed descriptor shows very promising performance on these face databases compared to existing face descriptors. LDRP also outperforms pre-trained ImageNet CNN models on the large-scale FaceScrub face dataset, and it outperforms the deep-learning-based DLib face descriptor in many scenarios.
Tasks Face Recognition, Image Retrieval
Published 2017-09-20
URL https://arxiv.org/abs/1709.09518v2
PDF https://arxiv.org/pdf/1709.09518v2.pdf
PWC https://paperswithcode.com/paper/local-directional-relation-pattern-for
Repo
Framework
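To convey the flavor of directional-neighbor encoding, here is an LBP-family sketch: along each of eight directions, compare a nearer neighbor with a farther one and pack the comparisons into an 8-bit code. This illustrates the general idea only; it is not the exact LDRP definition from the paper.

```python
import numpy as np

# LBP-family sketch of encoding directional neighbor relations: for
# each of 8 directions, compare the neighbor at radius r1 with the one
# at radius r2 and pack the results into an 8-bit code per pixel.

DIRS = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]

def directional_code(img, y, x, r1=1, r2=2):
    code = 0
    for bit, (dy, dx) in enumerate(DIRS):
        near = img[y + r1 * dy, x + r1 * dx]
        far = img[y + r2 * dy, x + r2 * dx]
        code |= (int(near >= far) << bit)   # relation along this direction
    return code

img = np.random.randint(0, 256, (8, 8))
print(directional_code(img, 4, 4))          # one pixel's 8-bit pattern
```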

Dense Pooling layers in Fully Convolutional Network for Skin Lesion Segmentation

Title Dense Pooling layers in Fully Convolutional Network for Skin Lesion Segmentation
Authors Ebrahim Nasr-Esfahani, Shima Rafiei, Mohammad H. Jafari, Nader Karimi, James S. Wrobel, S. M. Reza Soroushmehr, Shadrokh Samavi, Kayvan Najarian
Abstract One of the essential tasks in medical image analysis is segmentation with accurate detection of borders. Lesion segmentation in skin images is an essential step in the computerized detection of skin cancer. However, many state-of-the-art segmentation methods have deficiencies in their border detection phase. In this paper, a new class of fully convolutional network is proposed, with new dense pooling layers, for segmenting lesion regions in skin images. The network produces highly accurate segmentations of lesions on skin lesion datasets, outperforming state-of-the-art algorithms in skin lesion segmentation.
Tasks Lesion Segmentation
Published 2017-12-29
URL https://arxiv.org/abs/1712.10207v4
PDF https://arxiv.org/pdf/1712.10207v4.pdf
PWC https://paperswithcode.com/paper/dense-fully-convolutional-network-for-skin
Repo
Framework