July 27, 2019

Paper Group ANR 629

Deformable Part-based Fully Convolutional Network for Object Detection

Title Deformable Part-based Fully Convolutional Network for Object Detection
Authors Taylor Mordan, Nicolas Thome, Matthieu Cord, Gilles Henaff
Abstract Existing region-based object detectors are limited to regions with fixed box geometry to represent objects, even if those are highly non-rectangular. In this paper we introduce DP-FCN, a deep model for object detection which explicitly adapts to shapes of objects with deformable parts. Without additional annotations, it learns to focus on discriminative elements and to align them, and simultaneously brings more invariance for classification and geometric information to refine localization. DP-FCN is composed of three main modules: a Fully Convolutional Network to efficiently maintain spatial resolution, a deformable part-based RoI pooling layer to optimize positions of parts and build invariance, and a deformation-aware localization module explicitly exploiting displacements of parts to improve accuracy of bounding box regression. We experimentally validate our model and show significant gains. DP-FCN achieves state-of-the-art performances of 83.1% and 80.9% on PASCAL VOC 2007 and 2012 with VOC data only.
Tasks Object Detection
Published 2017-07-19
URL http://arxiv.org/abs/1707.06175v1
PDF http://arxiv.org/pdf/1707.06175v1.pdf
PWC https://paperswithcode.com/paper/deformable-part-based-fully-convolutional
Repo
Framework

Unsupervised Image-to-Image Translation with Generative Adversarial Networks

Title Unsupervised Image-to-Image Translation with Generative Adversarial Networks
Authors Hao Dong, Paarth Neekhara, Chao Wu, Yike Guo
Abstract It’s useful to automatically transform an image from its original form to some synthetic form (style, partial contents, etc.), while keeping the original structure or semantics. We define this requirement as the “image-to-image translation” problem, and propose a general approach to achieve it, based on deep convolutional and conditional generative adversarial networks (GANs), which have achieved phenomenal success in learning to map noise input to images since 2014. In this work, we develop a two-step (unsupervised) learning method to translate images between different domains by using unlabeled images without specifying any correspondence between them, so as to avoid the cost of acquiring labeled data. Compared with prior works, we demonstrate the generality of our model, by which a variety of translations can be conducted by a single type of model. Such capability is desirable in applications like bidirectional translation.
Tasks Image-to-Image Translation, Unsupervised Image-To-Image Translation
Published 2017-01-10
URL http://arxiv.org/abs/1701.02676v1
PDF http://arxiv.org/pdf/1701.02676v1.pdf
PWC https://paperswithcode.com/paper/unsupervised-image-to-image-translation-with
Repo
Framework

Text2Action: Generative Adversarial Synthesis from Language to Action

Title Text2Action: Generative Adversarial Synthesis from Language to Action
Authors Hyemin Ahn, Timothy Ha, Yunho Choi, Hwiyeon Yoo, Songhwai Oh
Abstract In this paper, we propose a generative model which learns the relationship between language and human action in order to generate a human action sequence given a sentence describing human behavior. The proposed generative model is a generative adversarial network (GAN), which is based on the sequence to sequence (SEQ2SEQ) model. Using the proposed generative network, we can synthesize various actions for a robot or a virtual agent using a text encoder recurrent neural network (RNN) and an action decoder RNN. The proposed generative network is trained from 29,770 pairs of actions and sentence annotations extracted from MSR-Video-to-Text (MSR-VTT), a large-scale video dataset. We demonstrate that the network can generate human-like actions which can be transferred to a Baxter robot, such that the robot performs an action based on a provided sentence. Results show that the proposed generative network correctly models the relationship between language and action and can generate a diverse set of actions from the same sentence.
Tasks
Published 2017-10-15
URL http://arxiv.org/abs/1710.05298v2
PDF http://arxiv.org/pdf/1710.05298v2.pdf
PWC https://paperswithcode.com/paper/text2action-generative-adversarial-synthesis
Repo
Framework

What Can This Robot Do? Learning from Appearance and Experiments

Title What Can This Robot Do? Learning from Appearance and Experiments
Authors Ashwin Khadke, Manuela Veloso
Abstract When presented with an unknown robot (subject) how can an autonomous agent (learner) figure out what this new robot can do? The subject’s appearance can provide cues to its physical as well as cognitive capabilities. Seeing a humanoid can make one wonder if it can kick balls, climb stairs or recognize faces. What if the learner can request the subject to perform these tasks? We present an approach to make the learner build a model of the subject at a task based on the latter’s appearance and refine it by experimentation. Apart from the subject’s inherent capabilities, certain extrinsic factors may affect its performance at a task. Based on the subject’s appearance and prior knowledge about the task a learner can identify a set of potential factors, a subset of which we assume are controllable. Our approach picks values of controllable factors to generate the most informative experiments to test the subject at. Additionally, we present a metric to determine if a factor should be incorporated in the model. We present results of our approach on modeling a humanoid robot at the task of kicking a ball. Firstly, we show that actively picking values for controllable factors, even in noisy experiments, leads to faster learning of the subject’s model for the task. Secondly, starting from a minimal set of factors our metric identifies the set of relevant factors to incorporate in the model. Lastly, we show that the refined model better represents the subject’s performance at the task.
Tasks
Published 2017-12-15
URL http://arxiv.org/abs/1712.05497v2
PDF http://arxiv.org/pdf/1712.05497v2.pdf
PWC https://paperswithcode.com/paper/what-can-this-robot-do-learning-from
Repo
Framework

Annotating High-Level Structures of Short Stories and Personal Anecdotes

Title Annotating High-Level Structures of Short Stories and Personal Anecdotes
Authors Boyang Li, Beth Cardier, Tong Wang, Florian Metze
Abstract Stories are a vital form of communication in human culture; they are employed daily to persuade, to elicit sympathy, or to convey a message. Computational understanding of human narratives, especially high-level narrative structures, remains limited to date. Multiple literary theories for narrative structures exist, but operationalization of the theories has remained a challenge. We developed an annotation scheme by consolidating and extending existing narratological theories, including Labov and Waletzky’s (1967) functional categorization scheme and Freytag’s (1863) pyramid of dramatic tension, and present 360 annotated short stories collected from online sources. In the future, this research will support an approach that enables systems to intelligently sustain complex communications with humans.
Tasks
Published 2017-10-08
URL http://arxiv.org/abs/1710.06917v2
PDF http://arxiv.org/pdf/1710.06917v2.pdf
PWC https://paperswithcode.com/paper/annotating-high-level-structures-of-short
Repo
Framework

Low-dose spectral CT reconstruction using L0 image gradient and tensor dictionary

Title Low-dose spectral CT reconstruction using L0 image gradient and tensor dictionary
Authors Weiwen Wu, Yanbo Zhang, Qian Wang, Fenglin Liu, Peijun Chen, Hengyong Yu
Abstract Spectral computed tomography (CT) offers clear advantages in lesion detection, tissue characterization and material decomposition. To further extend its potential clinical applications, in this work, we propose an improved tensor dictionary learning method for low-dose spectral CT reconstruction with a constraint on the image gradient L0-norm, named L0TDL. The L0TDL method inherits the advantages of tensor dictionary learning (TDL) by exploiting the similarity of spectral CT images. In addition, by introducing the L0-norm constraint in the gradient image domain, the proposed method emphasizes spatial sparsity to overcome the weakness of TDL in preserving edge information. The alternating direction minimization method (ADMM) is employed to solve the proposed method. Both numerical simulations and real mouse studies are performed to evaluate the proposed method. The results show that the proposed L0TDL method outperforms other competing methods, such as total variation (TV) minimization, TV with low rank (TV+LR), and TDL methods.
Tasks Computed Tomography (CT), Dictionary Learning
Published 2017-12-13
URL http://arxiv.org/abs/1801.01452v2
PDF http://arxiv.org/pdf/1801.01452v2.pdf
PWC https://paperswithcode.com/paper/low-dose-spectral-ct-reconstruction-using-l0
Repo
Framework
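
The abstract above relies on the L0-norm of the image gradient as a sparsity measure. As a rough illustration only (a one-dimensional sketch, not the paper's L0TDL reconstruction), the L0 gradient norm simply counts how many neighbouring samples differ. The function name `gradient_l0_norm` and the tolerance parameter `tol` are hypothetical choices made for this example.

```python
def gradient_l0_norm(signal, tol=0.0):
    """Count the nonzero forward differences of a 1D signal.

    This is the L0 "norm" of the discrete gradient: it measures how many
    jumps (edges) the signal has, not how large they are.
    `tol` is an assumed tolerance below which a difference counts as zero.
    """
    return sum(1 for a, b in zip(signal, signal[1:]) if abs(b - a) > tol)


# A piecewise-constant signal has few edges, so its gradient is sparse.
piecewise = [1, 1, 1, 5, 5, 2, 2]
# A ramp changes at every step, so its gradient is dense.
ramp = [1, 2, 3, 4, 5, 6, 7]

piecewise_edges = gradient_l0_norm(piecewise)
ramp_edges = gradient_l0_norm(ramp)
```

Because the count ignores the size of each jump, penalizing it favours images with a small number of sharp edges, which is consistent with the edge-preservation motivation stated in the abstract.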

On Estimation of $L_{r}$-Norms in Gaussian White Noise Models

Title On Estimation of $L_{r}$-Norms in Gaussian White Noise Models
Authors Yanjun Han, Jiantao Jiao, Rajarshi Mukherjee
Abstract We provide a complete picture of asymptotically minimax estimation of $L_r$-norms (for any $r\ge 1$) of the mean in Gaussian white noise model over Nikolskii-Besov spaces. In this regard, we complement the work of Lepski, Nemirovski and Spokoiny (1999), who considered the cases of $r=1$ (with poly-logarithmic gap between upper and lower bounds) and $r$ even (with asymptotically sharp upper and lower bounds) over Hölder spaces. We additionally consider the case of asymptotically adaptive minimax estimation and demonstrate a difference between even and non-even $r$ in terms of an investigator’s ability to produce asymptotically adaptive minimax estimators without paying a penalty.
Tasks
Published 2017-10-11
URL https://arxiv.org/abs/1710.03863v4
PDF https://arxiv.org/pdf/1710.03863v4.pdf
PWC https://paperswithcode.com/paper/on-estimation-of-l_r-norms-in-gaussian-white
Repo
Framework

Spatial Aggregation of Holistically-Nested Convolutional Neural Networks for Automated Pancreas Localization and Segmentation

Title Spatial Aggregation of Holistically-Nested Convolutional Neural Networks for Automated Pancreas Localization and Segmentation
Authors Holger R. Roth, Le Lu, Nathan Lay, Adam P. Harrison, Amal Farag, Andrew Sohn, Ronald M. Summers
Abstract Accurate and automatic organ segmentation from 3D radiological scans is an important yet challenging problem for medical image analysis. Specifically, the pancreas demonstrates very high inter-patient anatomical variability in both its shape and volume. In this paper, we present an automated system using 3D computed tomography (CT) volumes via a two-stage cascaded approach: pancreas localization and segmentation. For the first step, we localize the pancreas from the entire 3D CT scan, providing a reliable bounding box for the more refined segmentation step. We introduce a fully deep-learning approach, based on an efficient application of holistically-nested convolutional networks (HNNs) on the three orthogonal axial, sagittal, and coronal views. The resulting HNN per-pixel probability maps are then fused using pooling to reliably produce a 3D bounding box of the pancreas that maximizes the recall. We show that our introduced localizer compares favorably to both a conventional non-deep-learning method and a recent hybrid approach based on spatial aggregation of superpixels using random forest classification. The second, segmentation, phase operates within the computed bounding box and integrates semantic mid-level cues of deeply-learned organ interior and boundary maps, obtained by two additional and separate realizations of HNNs. By integrating these two mid-level cues, our method is capable of generating boundary-preserving pixel-wise class label maps that result in the final pancreas segmentation. Quantitative evaluation is performed on a publicly available dataset of 82 patient CT scans using 4-fold cross-validation (CV). We achieve a Dice similarity coefficient (DSC) of 81.27+/-6.27% in validation, which significantly outperforms previous state-of-the-art methods that report DSCs of 71.80+/-10.70% and 78.01+/-8.20%, respectively, using the same dataset.
Tasks 3D Medical Imaging Segmentation, Computed Tomography (CT), Pancreas Segmentation
Published 2017-01-31
URL http://arxiv.org/abs/1702.00045v1
PDF http://arxiv.org/pdf/1702.00045v1.pdf
PWC https://paperswithcode.com/paper/spatial-aggregation-of-holistically-nested-1
Repo
Framework
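
The abstract above reports its results as a Dice similarity coefficient (DSC). For readers unfamiliar with the metric, here is a minimal sketch of the standard definition, DSC = 2|A ∩ B| / (|A| + |B|), on flattened binary masks. The function name `dice_coefficient` and the toy masks are illustrative assumptions and are not taken from the paper's evaluation code.

```python
def dice_coefficient(pred, truth):
    """Dice similarity coefficient between two equally sized binary masks.

    Masks are flat sequences of 0/1 (or False/True) values, one per voxel.
    """
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of elements")
    # Voxels labelled foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    # Total foreground voxels across both masks.
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    if total == 0:
        # Both masks empty: treated as perfect agreement (a convention,
        # assumed here; other implementations may return 0 or NaN).
        return 1.0
    return 2.0 * intersection / total


# Toy example: a predicted and a reference segmentation of six voxels.
pred = [0, 1, 1, 1, 0, 0]
truth = [0, 0, 1, 1, 1, 0]
dsc = dice_coefficient(pred, truth)
```

In this toy case the masks share two foreground voxels out of three each, so the score is 2 * 2 / (3 + 3), about 0.67. A DSC of 1 means the masks coincide exactly and 0 means they do not overlap.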

Color-opponent mechanisms for local hue encoding in a hierarchical framework

Title Color-opponent mechanisms for local hue encoding in a hierarchical framework
Authors Paria Mehrani, Andrei Mouraviev, Oscar J. Avella Gonzalez, John K. Tsotsos
Abstract A biologically plausible computational model for color representation is introduced. We present a mechanistic hierarchical model of neurons that not only successfully encodes local hue, but also explicitly reveals how the contributions of each visual cortical layer participating in the process can lead to a hue representation. Our proposed model benefits from studies on the visual cortex and builds a network of single-opponent and hue-selective neurons. Local hue encoding is achieved through gradually increasing nonlinearity in terms of cone inputs to single-opponent cells. We demonstrate that our model’s single-opponent neurons have wide tuning curves, while the hue-selective neurons in our model V4 layer exhibit narrower tunings, resembling those in V4 of the primate visual system. Our simulation experiments suggest that neurons in V4 or later layers have the capacity of encoding unique hues. Moreover, with a few examples, we present the possibility of spanning the infinite space of physical hues by combining the hue-selective neurons in our model.
Tasks
Published 2017-06-30
URL http://arxiv.org/abs/1706.10266v2
PDF http://arxiv.org/pdf/1706.10266v2.pdf
PWC https://paperswithcode.com/paper/color-opponent-mechanisms-for-local-hue
Repo
Framework

Object-Extent Pooling for Weakly Supervised Single-Shot Localization

Title Object-Extent Pooling for Weakly Supervised Single-Shot Localization
Authors Amogh Gudi, Nicolai van Rosmalen, Marco Loog, Jan van Gemert
Abstract In the face of scarcity in detailed training annotations, the ability to perform object localization tasks in real-time with weak-supervision is very valuable. However, the computational cost of generating and evaluating region proposals is heavy. We adapt the concept of Class Activation Maps (CAM) into the very first weakly-supervised ‘single-shot’ detector that does not require the use of region proposals. To facilitate this, we propose a novel global pooling technique called Spatial Pyramid Averaged Max (SPAM) pooling for training this CAM-based network for object extent localisation with only weak image-level supervision. We show this global pooling layer possesses a near ideal flow of gradients for extent localization, that offers a good trade-off between the extremes of max and average pooling. Our approach only requires a single network pass and uses a fast-backprojection technique, completely omitting any region proposal steps. To the best of our knowledge, this is the first approach to do so. Due to this, we are able to perform inference in real-time at 35fps, which is an order of magnitude faster than all previous weakly supervised object localization frameworks.
Tasks Object Localization, Weakly-Supervised Object Localization
Published 2017-07-19
URL http://arxiv.org/abs/1707.06180v1
PDF http://arxiv.org/pdf/1707.06180v1.pdf
PWC https://paperswithcode.com/paper/object-extent-pooling-for-weakly-supervised
Repo
Framework

Scalable and Effective Deep CCA via Soft Decorrelation

Title Scalable and Effective Deep CCA via Soft Decorrelation
Authors Xiaobin Chang, Tao Xiang, Timothy M. Hospedales
Abstract Recently the widely used multi-view learning model, Canonical Correlation Analysis (CCA) has been generalised to the non-linear setting via deep neural networks. Existing deep CCA models typically first decorrelate the feature dimensions of each view before the different views are maximally correlated in a common latent space. This feature decorrelation is achieved by enforcing an exact decorrelation constraint; these models are thus computationally expensive due to the matrix inversion or SVD operations required for exact decorrelation at each training iteration. Furthermore, the decorrelation step is often separated from the gradient descent based optimisation, resulting in sub-optimal solutions. We propose a novel deep CCA model Soft CCA to overcome these problems. Specifically, exact decorrelation is replaced by soft decorrelation via a mini-batch based Stochastic Decorrelation Loss (SDL) to be optimised jointly with the other training objectives. Extensive experiments show that the proposed soft CCA is more effective and efficient than existing deep CCA models. In addition, our SDL loss can be applied to other deep models beyond multi-view learning, and obtains superior performance compared to existing decorrelation losses.
Tasks Multi-View Learning
Published 2017-07-30
URL http://arxiv.org/abs/1707.09669v2
PDF http://arxiv.org/pdf/1707.09669v2.pdf
PWC https://paperswithcode.com/paper/scalable-and-effective-deep-cca-via-soft
Repo
Framework
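
The abstract above contrasts exact decorrelation (matrix inversion or SVD) with a soft penalty computed per mini-batch. The abstract does not give the formula for the Stochastic Decorrelation Loss, so the sketch below only illustrates the general idea under an assumed form: the sum of absolute off-diagonal entries of the mini-batch covariance matrix. The function name `offdiag_covariance_penalty` is hypothetical, and this should not be read as the paper's exact loss.

```python
def offdiag_covariance_penalty(batch):
    """Sum of absolute off-diagonal covariances of a mini-batch.

    `batch` is a list of feature vectors (rows = samples, columns = features).
    The penalty is zero when all pairs of features are uncorrelated within
    the batch, so minimizing it softly encourages decorrelation.
    Assumed form for illustration; the paper's loss may differ.
    """
    n = len(batch)
    d = len(batch[0])
    means = [sum(row[j] for row in batch) / n for j in range(d)]
    penalty = 0.0
    for i in range(d):
        for j in range(d):
            if i == j:
                continue  # variances on the diagonal are not penalized
            cov = sum((row[i] - means[i]) * (row[j] - means[j]) for row in batch) / n
            penalty += abs(cov)
    return penalty


# Two perfectly correlated features versus two uncorrelated ones.
correlated = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
uncorrelated = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]

penalty_correlated = offdiag_covariance_penalty(correlated)
penalty_uncorrelated = offdiag_covariance_penalty(uncorrelated)
```

A penalty of this kind needs only sums and products over the batch, which is why it can be added to the other training objectives and optimized by gradient descent without a matrix decomposition at each step.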

Bilingual Words and Phrase Mappings for Marathi and Hindi SMT

Title Bilingual Words and Phrase Mappings for Marathi and Hindi SMT
Authors Sreelekha S, Pushpak Bhattacharyya
Abstract Lack of proper linguistic resources is a major challenge in developing machine translation systems for resource-poor languages. In this paper, we describe effective ways to utilize lexical resources to improve the quality of statistical machine translation. Our research on the use of lexical resources focused mainly on two approaches: augmenting the parallel corpus with more vocabulary and providing various word forms. We have augmented the training corpus with various lexical resources such as lexical words, function words, kridanta pairs and verb phrases. We describe the case studies, evaluations and detailed error analysis for both Marathi to Hindi and Hindi to Marathi machine translation systems. From the evaluations we observed an incremental improvement in the quality of machine translation as the use of various lexical resources increases. Moreover, the use of various lexical resources helps to improve the coverage and quality of machine translation where only a limited parallel corpus is available.
Tasks Machine Translation
Published 2017-10-05
URL http://arxiv.org/abs/1710.02398v3
PDF http://arxiv.org/pdf/1710.02398v3.pdf
PWC https://paperswithcode.com/paper/bilingual-words-and-phrase-mappings-for
Repo
Framework

A Survey on Multi-Task Learning

Title A Survey on Multi-Task Learning
Authors Yu Zhang, Qiang Yang
Abstract Multi-Task Learning (MTL) is a learning paradigm in machine learning whose aim is to leverage useful information contained in multiple related tasks to help improve the generalization performance of all the tasks. In this paper, we give a survey of MTL. First, we classify different MTL algorithms into several categories, including feature learning approach, low-rank approach, task clustering approach, task relation learning approach, and decomposition approach, and then discuss the characteristics of each approach. In order to improve the performance of learning tasks further, MTL can be combined with other learning paradigms including semi-supervised learning, active learning, unsupervised learning, reinforcement learning, multi-view learning and graphical models. When the number of tasks is large or the data dimensionality is high, batch MTL models have difficulty handling this situation, and online, parallel and distributed MTL models as well as dimensionality reduction and feature hashing are reviewed to reveal their computational and storage advantages. Many real-world applications use MTL to boost their performance and we review representative works. Finally, we present theoretical analyses and discuss several future directions for MTL.
Tasks Active Learning, Dimensionality Reduction, Multi-Task Learning, Multi-View Learning
Published 2017-07-25
URL http://arxiv.org/abs/1707.08114v2
PDF http://arxiv.org/pdf/1707.08114v2.pdf
PWC https://paperswithcode.com/paper/a-survey-on-multi-task-learning
Repo
Framework

Deep Hyperalignment

Title Deep Hyperalignment
Authors Muhammad Yousefnezhad, Daoqiang Zhang
Abstract This paper proposes Deep Hyperalignment (DHA) as a regularized, deep, scalable extension of the Hyperalignment (HA) method, which is well-suited for applying functional alignment to fMRI datasets with nonlinearity, high-dimensionality (broad ROI), and a large number of subjects. Unlike previous methods, DHA is not limited by a restricted fixed kernel function. Further, it uses a parametric approach, rank-$m$ Singular Value Decomposition (SVD), and stochastic gradient descent for optimization. Therefore, DHA has a suitable time complexity for large datasets, and DHA does not require the training data when it computes the functional alignment for a new subject. Experimental studies on multi-subject fMRI analysis confirm that the DHA method achieves superior performance to other state-of-the-art HA algorithms.
Tasks
Published 2017-10-11
URL http://arxiv.org/abs/1710.03923v1
PDF http://arxiv.org/pdf/1710.03923v1.pdf
PWC https://paperswithcode.com/paper/deep-hyperalignment
Repo
Framework

Marine Animal Classification with Correntropy Loss Based Multi-view Learning

Title Marine Animal Classification with Correntropy Loss Based Multi-view Learning
Authors Zheng Cao, Shujian Yu, Bing Ouyang, Fraser Dalgleish, Anni Vuorenkoski, Gabriel Alsenas, Jose Principe
Abstract To analyze marine animals' behavior, seasonal distribution and abundance, digital imagery can be acquired by a visual or Lidar camera. Depending on the quantity and properties of acquired imagery, the animals are characterized as either features (shape, color, texture, etc.), or dissimilarity matrices derived from different shape analysis methods (shape context, internal distance shape context, etc.). For both cases, multi-view learning is critical in integrating more than one set of features or dissimilarity matrices for higher classification accuracy. This paper adopts correntropy loss as the cost function in multi-view learning, which has favorable statistical properties for rejecting noise. For the case of features, the correntropy loss-based multi-view learning and its entrywise variation are developed based on the multi-view intact space learning algorithm. For the case of dissimilarity matrices, the robust Euclidean embedding algorithm is extended to its multi-view form with the correntropy loss function. Results from simulated data and real-world marine animal imagery show that the proposed algorithms can effectively enhance classification rate, as well as suppress noise under different noise conditions.
Tasks Multi-View Learning
Published 2017-05-03
URL http://arxiv.org/abs/1705.01217v1
PDF http://arxiv.org/pdf/1705.01217v1.pdf
PWC https://paperswithcode.com/paper/marine-animal-classification-with-correntropy
Repo
Framework
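
The abstract above motivates correntropy loss by its ability to reject noise. The sketch below shows why, using one commonly used form of the correntropy-induced loss with a Gaussian kernel, 1 - exp(-e^2 / (2 sigma^2)) per residual e. The abstract does not state the exact form or kernel width used in the paper, so the kernel width `sigma`, the function names, and the toy residuals are assumptions made for illustration.

```python
import math


def correntropy_loss(residuals, sigma=1.0):
    """Mean correntropy-induced loss with a Gaussian kernel of width sigma.

    Each residual contributes at most 1, however large it is, so a single
    outlier cannot dominate the total.
    """
    return sum(
        1.0 - math.exp(-(e * e) / (2.0 * sigma * sigma)) for e in residuals
    ) / len(residuals)


def mean_squared_error(residuals):
    """Ordinary mean squared error, shown for comparison."""
    return sum(e * e for e in residuals) / len(residuals)


# Small residuals plus one gross outlier.
clean = [0.1, -0.2, 0.05]
noisy = clean + [10.0]

closs_noisy = correntropy_loss(noisy)
mse_noisy = mean_squared_error(noisy)
```

With the outlier included, the mean squared error is driven almost entirely by the single large residual, while the correntropy-style loss stays below 1 because each term saturates. This bounded influence is the robustness property the abstract refers to.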