Paper Group ANR 629
Deformable Part-based Fully Convolutional Network for Object Detection. Unsupervised Image-to-Image Translation with Generative Adversarial Networks. Text2Action: Generative Adversarial Synthesis from Language to Action. What Can This Robot Do? Learning from Appearance and Experiments. Annotating High-Level Structures of Short Stories and Personal Anecdotes. Low-dose spectral CT reconstruction using L0 image gradient and tensor dictionary. On Estimation of $L_{r}$-Norms in Gaussian White Noise Models. Spatial Aggregation of Holistically-Nested Convolutional Neural Networks for Automated Pancreas Localization and Segmentation. Color-opponent mechanisms for local hue encoding in a hierarchical framework. Object-Extent Pooling for Weakly Supervised Single-Shot Localization. Scalable and Effective Deep CCA via Soft Decorrelation. Bilingual Words and Phrase Mappings for Marathi and Hindi SMT. A Survey on Multi-Task Learning. Deep Hyperalignment. Marine Animal Classification with Correntropy Loss Based Multi-view Learning.
Deformable Part-based Fully Convolutional Network for Object Detection
| Title | Deformable Part-based Fully Convolutional Network for Object Detection |
|---|---|
| Authors | Taylor Mordan, Nicolas Thome, Matthieu Cord, Gilles Henaff |
| Abstract | Existing region-based object detectors are limited to regions with fixed box geometry to represent objects, even when objects are highly non-rectangular. In this paper we introduce DP-FCN, a deep model for object detection which explicitly adapts to shapes of objects with deformable parts. Without additional annotations, it learns to focus on discriminative elements and to align them, and simultaneously brings more invariance for classification and geometric information to refine localization. DP-FCN is composed of three main modules: a Fully Convolutional Network to efficiently maintain spatial resolution, a deformable part-based RoI pooling layer to optimize positions of parts and build invariance, and a deformation-aware localization module explicitly exploiting displacements of parts to improve accuracy of bounding box regression. We experimentally validate our model and show significant gains. DP-FCN achieves state-of-the-art performance of 83.1% and 80.9% on PASCAL VOC 2007 and 2012 with VOC data only. |
| Tasks | Object Detection |
| Published | 2017-07-19 |
| URL | http://arxiv.org/abs/1707.06175v1 |
| PDF | http://arxiv.org/pdf/1707.06175v1.pdf |
| PWC | https://paperswithcode.com/paper/deformable-part-based-fully-convolutional |
| Repo | |
| Framework | |
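A rough sketch of the deformable part-based RoI pooling idea: divide the RoI into a grid of parts, let each part shift within a small range, keep the displacement that maximizes its response, and return the displacements for the localization module. This is an illustrative reading of the abstract under stated assumptions (exhaustive displacement search, parts scored by mean response), not the authors' implementation.

```python
import numpy as np

def deformable_part_pooling(feat, roi, k=3, max_shift=2):
    """Toy deformable part-based RoI pooling (illustrative sketch).

    feat: 2D response map (H, W) for one class/channel.
    roi:  (x0, y0, x1, y1) box in feature-map coordinates.
    Each of the k*k parts may shift by up to max_shift pixels; we keep the
    displacement maximizing the part's mean response. Returns pooled part
    scores and the chosen (dy, dx) per part.
    """
    H, W = feat.shape
    x0, y0, x1, y1 = roi
    xs = np.linspace(x0, x1, k + 1).astype(int)
    ys = np.linspace(y0, y1, k + 1).astype(int)
    scores = np.full((k, k), -np.inf)
    disps = np.zeros((k, k, 2), dtype=int)
    for i in range(k):
        for j in range(k):
            for dy in range(-max_shift, max_shift + 1):
                for dx in range(-max_shift, max_shift + 1):
                    ya, yb = ys[i] + dy, ys[i + 1] + dy
                    xa, xb = xs[j] + dx, xs[j + 1] + dx
                    if ya < 0 or xa < 0 or yb > H or xb > W:
                        continue  # displaced part falls outside the map
                    s = feat[ya:yb, xa:xb].mean()
                    if s > scores[i, j]:
                        scores[i, j] = s
                        disps[i, j] = (dy, dx)
    return scores, disps

feat = np.random.rand(64, 64)
scores, disps = deformable_part_pooling(feat, roi=(10, 10, 40, 40))
print(scores.shape, disps.shape)  # (3, 3) (3, 3, 2)
```

The returned displacements matter here: DP-FCN's deformation-aware localization module consumes them to refine bounding-box regression.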
Unsupervised Image-to-Image Translation with Generative Adversarial Networks
| Title | Unsupervised Image-to-Image Translation with Generative Adversarial Networks |
|---|---|
| Authors | Hao Dong, Paarth Neekhara, Chao Wu, Yike Guo |
| Abstract | It is useful to automatically transform an image from its original form to some synthetic form (style, partial contents, etc.) while keeping the original structure or semantics. We define this requirement as the "image-to-image translation" problem and propose a general approach to achieve it, based on deep convolutional and conditional generative adversarial networks (GANs), which since 2014 have achieved phenomenal success in learning to map images from noise input. In this work, we develop a two-step (unsupervised) learning method to translate images between different domains using unlabeled images, without specifying any correspondence between them, thereby avoiding the cost of acquiring labeled data. Compared with prior works, we demonstrate the generality of our model: a variety of translations can be conducted by a single type of model. Such capability is desirable in applications like bidirectional translation. |
| Tasks | Image-to-Image Translation, Unsupervised Image-To-Image Translation |
| Published | 2017-01-10 |
| URL | http://arxiv.org/abs/1701.02676v1 |
| PDF | http://arxiv.org/pdf/1701.02676v1.pdf |
| PWC | https://paperswithcode.com/paper/unsupervised-image-to-image-translation-with |
| Repo | |
| Framework | |
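The adversarial core of such translation models can be sketched as follows. This is a generic unpaired GAN objective with toy placeholder networks, not the paper's two-step procedure:

```python
import torch
import torch.nn as nn

# G maps domain-A images toward domain B; D scores realism of B images.
# The tiny convnets below are illustrative placeholders, not the paper's nets.
G = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                  nn.Conv2d(16, 3, 3, padding=1), nn.Tanh())
D = nn.Sequential(nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.LeakyReLU(0.2),
                  nn.Flatten(), nn.Linear(16 * 16 * 16, 1))
bce = nn.BCEWithLogitsLoss()
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)

real_a = torch.rand(4, 3, 32, 32) * 2 - 1  # unlabeled domain-A batch
real_b = torch.rand(4, 3, 32, 32) * 2 - 1  # unlabeled domain-B batch

# Discriminator step: real B -> 1, translated A -> 0.
fake_b = G(real_a).detach()
d_loss = bce(D(real_b), torch.ones(4, 1)) + bce(D(fake_b), torch.zeros(4, 1))
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

# Generator step: fool D with translated images.
g_loss = bce(D(G(real_a)), torch.ones(4, 1))
opt_g.zero_grad(); g_loss.backward(); opt_g.step()
print(float(d_loss), float(g_loss))
```

Note that no pairing between `real_a` and `real_b` is assumed, which is the point of the unsupervised setting.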
Text2Action: Generative Adversarial Synthesis from Language to Action
| Title | Text2Action: Generative Adversarial Synthesis from Language to Action |
|---|---|
| Authors | Hyemin Ahn, Timothy Ha, Yunho Choi, Hwiyeon Yoo, Songhwai Oh |
| Abstract | In this paper, we propose a generative model which learns the relationship between language and human action in order to generate a human action sequence given a sentence describing human behavior. The proposed generative model is a generative adversarial network (GAN), which is based on the sequence-to-sequence (SEQ2SEQ) model. Using the proposed generative network, we can synthesize various actions for a robot or a virtual agent using a text encoder recurrent neural network (RNN) and an action decoder RNN. The proposed generative network is trained from 29,770 pairs of actions and sentence annotations extracted from MSR-Video-to-Text (MSR-VTT), a large-scale video dataset. We demonstrate that the network can generate human-like actions which can be transferred to a Baxter robot, such that the robot performs an action based on a provided sentence. Results show that the proposed generative network correctly models the relationship between language and action and can generate a diverse set of actions from the same sentence. |
| Tasks | |
| Published | 2017-10-15 |
| URL | http://arxiv.org/abs/1710.05298v2 |
| PDF | http://arxiv.org/pdf/1710.05298v2.pdf |
| PWC | https://paperswithcode.com/paper/text2action-generative-adversarial-synthesis |
| Repo | |
| Framework | |
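A minimal sketch of the generator the abstract describes: a text-encoder RNN conditions an action-decoder RNN, with noise injected for GAN-style diversity. Layer sizes, GRU cells, and the autoregressive decoding scheme are assumptions, not the paper's exact architecture:

```python
import torch
import torch.nn as nn

class TextToActionGenerator(nn.Module):
    """Sketch of a SEQ2SEQ generator: encode word embeddings with an RNN,
    then decode a pose sequence step by step. Sizes (300-d word vectors,
    24-d poses) are illustrative assumptions."""
    def __init__(self, word_dim=300, pose_dim=24, hidden=128, noise_dim=16):
        super().__init__()
        self.encoder = nn.GRU(word_dim, hidden, batch_first=True)
        self.decoder = nn.GRU(pose_dim + noise_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, pose_dim)
        self.noise_dim = noise_dim
        self.pose_dim = pose_dim

    def forward(self, words, steps=30):
        _, h = self.encoder(words)          # sentence summary as initial state
        b = words.size(0)
        pose = torch.zeros(b, 1, self.pose_dim)
        poses = []
        for _ in range(steps):              # autoregressive pose decoding
            z = torch.randn(b, 1, self.noise_dim)   # noise -> diverse actions
            o, h = self.decoder(torch.cat([pose, z], dim=-1), h)
            pose = self.out(o)
            poses.append(pose)
        return torch.cat(poses, dim=1)      # (batch, steps, pose_dim)

gen = TextToActionGenerator()
actions = gen(torch.randn(2, 12, 300))      # two 12-word sentences
print(actions.shape)                        # torch.Size([2, 30, 24])
```

In the full GAN, a discriminator over (sentence, pose sequence) pairs would supply the training signal.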
What Can This Robot Do? Learning from Appearance and Experiments
| Title | What Can This Robot Do? Learning from Appearance and Experiments |
|---|---|
| Authors | Ashwin Khadke, Manuela Veloso |
| Abstract | When presented with an unknown robot (subject), how can an autonomous agent (learner) figure out what this new robot can do? The subject's appearance can provide cues to its physical as well as cognitive capabilities. Seeing a humanoid can make one wonder if it can kick balls, climb stairs or recognize faces. What if the learner can request the subject to perform these tasks? We present an approach by which the learner builds a model of the subject at a task based on the latter's appearance and refines it by experimentation. Apart from the subject's inherent capabilities, certain extrinsic factors may affect its performance at a task. Based on the subject's appearance and prior knowledge about the task, a learner can identify a set of potential factors, a subset of which we assume are controllable. Our approach picks values of controllable factors to generate the most informative experiments with which to test the subject. Additionally, we present a metric to determine if a factor should be incorporated in the model. We present results of our approach on modeling a humanoid robot at the task of kicking a ball. Firstly, we show that actively picking values for controllable factors, even in noisy experiments, leads to faster learning of the subject's model for the task. Secondly, starting from a minimal set of factors, our metric identifies the set of relevant factors to incorporate in the model. Lastly, we show that the refined model better represents the subject's performance at the task. |
| Tasks | |
| Published | 2017-12-15 |
| URL | http://arxiv.org/abs/1712.05497v2 |
| PDF | http://arxiv.org/pdf/1712.05497v2.pdf |
| PWC | https://paperswithcode.com/paper/what-can-this-robot-do-learning-from |
| Repo | |
| Framework | |
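One way to make "most informative experiments" concrete is a simple Bayesian sketch: keep a Beta posterior over success for each controllable-factor setting and always test the setting whose outcome is most uncertain. This is a generic illustration of active experiment selection, not the paper's criterion; the settings and probabilities are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
settings = ["near_ball", "mid_range", "far_ball"]   # hypothetical factor values
true_p = {"near_ball": 0.9, "mid_range": 0.5, "far_ball": 0.1}
alpha = {s: 1.0 for s in settings}   # Beta(1, 1) priors over kick success
beta = {s: 1.0 for s in settings}

def posterior_var(a, b):
    # Variance of a Beta(a, b) posterior: our uncertainty about P(success).
    return a * b / ((a + b) ** 2 * (a + b + 1))

for trial in range(30):
    # Pick the setting we are most uncertain about and run a noisy experiment.
    s = max(settings, key=lambda t: posterior_var(alpha[t], beta[t]))
    success = rng.random() < true_p[s]
    alpha[s] += success
    beta[s] += 1 - success

for s in settings:
    mean = alpha[s] / (alpha[s] + beta[s])
    print(f"{s}: estimated P(success) = {mean:.2f}")
```

Uncertain settings get tested more, so the model of the subject converges faster than uniform experimentation, which is the qualitative effect the paper reports.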
Annotating High-Level Structures of Short Stories and Personal Anecdotes
| Title | Annotating High-Level Structures of Short Stories and Personal Anecdotes |
|---|---|
| Authors | Boyang Li, Beth Cardier, Tong Wang, Florian Metze |
| Abstract | Stories are a vital form of communication in human culture; they are employed daily to persuade, to elicit sympathy, or to convey a message. Computational understanding of human narratives, especially high-level narrative structures, remains limited to date. Multiple literary theories for narrative structures exist, but operationalization of the theories has remained a challenge. We developed an annotation scheme by consolidating and extending existing narratological theories, including Labov and Waletzky's (1967) functional categorization scheme and Freytag's (1863) pyramid of dramatic tension, and present 360 annotated short stories collected from online sources. In the future, this research will support an approach that enables systems to intelligently sustain complex communications with humans. |
| Tasks | |
| Published | 2017-10-08 |
| URL | http://arxiv.org/abs/1710.06917v2 |
| PDF | http://arxiv.org/pdf/1710.06917v2.pdf |
| PWC | https://paperswithcode.com/paper/annotating-high-level-structures-of-short |
| Repo | |
| Framework | |
Low-dose spectral CT reconstruction using L0 image gradient and tensor dictionary
| Title | Low-dose spectral CT reconstruction using L0 image gradient and tensor dictionary |
|---|---|
| Authors | Weiwen Wu, Yanbo Zhang, Qian Wang, Fenglin Liu, Peijun Chen, Hengyong Yu |
| Abstract | Spectral computed tomography (CT) offers great advantages in lesion detection, tissue characterization and material decomposition. To further extend its potential clinical applications, in this work we propose an improved tensor dictionary learning method for low-dose spectral CT reconstruction with a constraint on the image gradient L0-norm, named L0TDL. The L0TDL method inherits the advantages of tensor dictionary learning (TDL) by employing the similarity of spectral CT images. On the other hand, by introducing the L0-norm constraint in the gradient image domain, the proposed method emphasizes spatial sparsity to overcome the weakness of TDL in preserving edge information. The alternating direction method of multipliers (ADMM) is employed to solve the resulting optimization problem. Both numerical simulations and real mouse studies are performed to evaluate the proposed method. The results show that the proposed L0TDL method outperforms other competing methods, such as total variation (TV) minimization, TV with low rank (TV+LR), and TDL methods. |
| Tasks | Computed Tomography (CT), Dictionary Learning |
| Published | 2017-12-13 |
| URL | http://arxiv.org/abs/1801.01452v2 |
| PDF | http://arxiv.org/pdf/1801.01452v2.pdf |
| PWC | https://paperswithcode.com/paper/low-dose-spectral-ct-reconstruction-using-l0 |
| Repo | |
| Framework | |
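The effect of an L0 constraint on the image gradient can be illustrated with the hard-thresholding subproblem that a variable split of such a term induces (as in classic L0 gradient smoothing). The sketch below shows that single step, not the full L0TDL/ADMM solver:

```python
import numpy as np

def l0_gradient_threshold(img, lam):
    """Hard-thresholding step for an auxiliary-gradient split of an L0
    gradient term: keep (dx, dy) only where dx^2 + dy^2 > lam, zeroing
    small gradients to enforce spatial sparsity while preserving strong
    edges. Forward differences with wrap-around are a simplification."""
    dx = np.roll(img, -1, axis=1) - img
    dy = np.roll(img, -1, axis=0) - img
    mask = dx ** 2 + dy ** 2 > lam     # keep only strong edges
    return dx * mask, dy * mask

img = np.random.rand(64, 64)
h, v = l0_gradient_threshold(img, lam=0.05)
print(f"nonzero gradients kept: {(h != 0).mean():.2%}")
```

Inside the ADMM loop, this thresholded gradient field is what pulls the reconstruction toward piecewise-smooth, edge-preserving solutions.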
On Estimation of $L_{r}$-Norms in Gaussian White Noise Models
| Title | On Estimation of $L_{r}$-Norms in Gaussian White Noise Models |
|---|---|
| Authors | Yanjun Han, Jiantao Jiao, Rajarshi Mukherjee |
| Abstract | We provide a complete picture of asymptotically minimax estimation of $L_r$-norms (for any $r\ge 1$) of the mean in the Gaussian white noise model over Nikolskii-Besov spaces. In this regard, we complement the work of Lepski, Nemirovski and Spokoiny (1999), who considered the cases of $r=1$ (with poly-logarithmic gap between upper and lower bounds) and $r$ even (with asymptotically sharp upper and lower bounds) over Hölder spaces. We additionally consider the case of asymptotically adaptive minimax estimation and demonstrate a difference between even and non-even $r$ in terms of an investigator's ability to produce asymptotically adaptive minimax estimators without paying a penalty. |
| Tasks | |
| Published | 2017-10-11 |
| URL | https://arxiv.org/abs/1710.03863v4 |
| PDF | https://arxiv.org/pdf/1710.03863v4.pdf |
| PWC | https://paperswithcode.com/paper/on-estimation-of-l_r-norms-in-gaussian-white |
| Repo | |
| Framework | |
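As context for the abstract, the estimation problem can be stated in two standard displays (textbook definitions of the Gaussian white noise model and the norm functional, not anything specific to this paper's proofs):

```latex
% Observation model: Gaussian white noise at noise level n^{-1/2}.
\[
  \mathrm{d}Y(t) \;=\; f(t)\,\mathrm{d}t \;+\; \tfrac{1}{\sqrt{n}}\,\mathrm{d}W(t),
  \qquad t \in [0,1],
\]
% where W is a standard Brownian motion. The functional to estimate is
\[
  \|f\|_{r} \;=\; \Bigl(\int_{0}^{1} |f(t)|^{r}\,\mathrm{d}t\Bigr)^{1/r},
  \qquad r \ge 1 .
\]
```

The even/non-even dichotomy arises because for even $r$ the map $x \mapsto x^r$ is a polynomial, which admits unbiased-estimation tricks unavailable for $|x|^r$ with non-even $r$.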
Spatial Aggregation of Holistically-Nested Convolutional Neural Networks for Automated Pancreas Localization and Segmentation
| Title | Spatial Aggregation of Holistically-Nested Convolutional Neural Networks for Automated Pancreas Localization and Segmentation |
|---|---|
| Authors | Holger R. Roth, Le Lu, Nathan Lay, Adam P. Harrison, Amal Farag, Andrew Sohn, Ronald M. Summers |
| Abstract | Accurate and automatic organ segmentation from 3D radiological scans is an important yet challenging problem for medical image analysis. Specifically, the pancreas demonstrates very high inter-patient anatomical variability in both its shape and volume. In this paper, we present an automated system using 3D computed tomography (CT) volumes via a two-stage cascaded approach: pancreas localization and segmentation. For the first step, we localize the pancreas from the entire 3D CT scan, providing a reliable bounding box for the more refined segmentation step. We introduce a fully deep-learning approach, based on an efficient application of holistically-nested convolutional networks (HNNs) on the three orthogonal axial, sagittal, and coronal views. The resulting HNN per-pixel probability maps are then fused using pooling to reliably produce a 3D bounding box of the pancreas that maximizes the recall. We show that our introduced localizer compares favorably to both a conventional non-deep-learning method and a recent hybrid approach based on spatial aggregation of superpixels using random forest classification. The second, segmentation, phase operates within the computed bounding box and integrates semantic mid-level cues of deeply-learned organ interior and boundary maps, obtained by two additional and separate realizations of HNNs. By integrating these two mid-level cues, our method is capable of generating boundary-preserving pixel-wise class label maps that result in the final pancreas segmentation. Quantitative evaluation is performed on a publicly available dataset of 82 patient CT scans using 4-fold cross-validation (CV). We achieve a Dice similarity coefficient (DSC) of 81.27±6.27% in validation, which significantly outperforms previous state-of-the-art methods that report DSCs of 71.80±10.70% and 78.01±8.20%, respectively, using the same dataset. |
| Tasks | 3D Medical Imaging Segmentation, Computed Tomography (CT), Pancreas Segmentation |
| Published | 2017-01-31 |
| URL | http://arxiv.org/abs/1702.00045v1 |
| PDF | http://arxiv.org/pdf/1702.00045v1.pdf |
| PWC | https://paperswithcode.com/paper/spatial-aggregation-of-holistically-nested-1 |
| Repo | |
| Framework | |
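The localization stage's view fusion can be sketched as follows, assuming the slice-wise HNN outputs have already been stacked back into three probability volumes. The mean fusion, threshold, and recall-friendly margin are illustrative choices, not the paper's exact pooling:

```python
import numpy as np

def fuse_views_to_bbox(prob_axial, prob_sagittal, prob_coronal,
                       thresh=0.5, margin=2):
    """Fuse per-voxel probabilities from the three orthogonal views,
    threshold, and return the bounding box around surviving voxels,
    padded by a margin to favor recall (sketch of the localization stage)."""
    fused = (prob_axial + prob_sagittal + prob_coronal) / 3.0
    zs, ys, xs = np.nonzero(fused > thresh)
    if zs.size == 0:
        return None                         # nothing confidently detected
    lo = np.maximum(np.array([zs.min(), ys.min(), xs.min()]) - margin, 0)
    hi = np.minimum(np.array([zs.max(), ys.max(), xs.max()]) + margin,
                    np.array(fused.shape) - 1)
    return tuple(lo), tuple(hi)

vol_shape = (40, 64, 64)                    # (z, y, x) toy CT volume
maps = [np.random.rand(*vol_shape) for _ in range(3)]
print(fuse_views_to_bbox(*maps, thresh=0.9))
```

The segmentation stage then works only inside this box, combining interior and boundary HNN maps into the final label map.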
Color-opponent mechanisms for local hue encoding in a hierarchical framework
| Title | Color-opponent mechanisms for local hue encoding in a hierarchical framework |
|---|---|
| Authors | Paria Mehrani, Andrei Mouraviev, Oscar J. Avella Gonzalez, John K. Tsotsos |
| Abstract | A biologically plausible computational model for color representation is introduced. We present a mechanistic hierarchical model of neurons that not only successfully encodes local hue, but also explicitly reveals how the contributions of each visual cortical layer participating in the process can lead to a hue representation. Our proposed model benefits from studies on the visual cortex and builds a network of single-opponent and hue-selective neurons. Local hue encoding is achieved through gradually increasing nonlinearity in terms of cone inputs to single-opponent cells. We demonstrate that our model's single-opponent neurons have wide tuning curves, while the hue-selective neurons in our model V4 layer exhibit narrower tunings, resembling those in V4 of the primate visual system. Our simulation experiments suggest that neurons in V4 or later layers have the capacity of encoding unique hues. Moreover, with a few examples, we present the possibility of spanning the infinite space of physical hues by combining the hue-selective neurons in our model. |
| Tasks | |
| Published | 2017-06-30 |
| URL | http://arxiv.org/abs/1706.10266v2 |
| PDF | http://arxiv.org/pdf/1706.10266v2.pdf |
| PWC | https://paperswithcode.com/paper/color-opponent-mechanisms-for-local-hue |
| Repo | |
| Framework | |
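A toy version of the model's first stage: half-wave rectified, signed combinations of LMS cone activations give the classic single-opponent channels. The weights below are illustrative, not the fitted model:

```python
import numpy as np

def single_opponent_responses(lms):
    """Toy single-opponent units: rectified cone-opponent combinations,
    the linear-ish first stage of the hue hierarchy described above."""
    L, M, S = lms[..., 0], lms[..., 1], lms[..., 2]
    relu = lambda x: np.maximum(x, 0.0)   # half-wave rectification
    return {
        "L+M-": relu(L - M),                      # red-green opponency
        "M+L-": relu(M - L),
        "S+(L+M)-": relu(S - 0.5 * (L + M)),      # blue-yellow opponency
        "(L+M)+S-": relu(0.5 * (L + M) - S),
    }

pixel = np.array([0.8, 0.3, 0.1])   # cone (L, M, S) activations for one pixel
for name, resp in single_opponent_responses(pixel).items():
    print(name, float(resp))
```

In the paper's hierarchy, later layers apply increasingly nonlinear combinations of such channels until narrowly tuned, V4-like hue-selective units emerge.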
Object-Extent Pooling for Weakly Supervised Single-Shot Localization
| Title | Object-Extent Pooling for Weakly Supervised Single-Shot Localization |
|---|---|
| Authors | Amogh Gudi, Nicolai van Rosmalen, Marco Loog, Jan van Gemert |
| Abstract | In the face of scarcity in detailed training annotations, the ability to perform object localization tasks in real-time with weak supervision is very valuable. However, the computational cost of generating and evaluating region proposals is heavy. We adapt the concept of Class Activation Maps (CAM) into the very first weakly-supervised 'single-shot' detector that does not require the use of region proposals. To facilitate this, we propose a novel global pooling technique called Spatial Pyramid Averaged Max (SPAM) pooling for training this CAM-based network for object extent localization with only weak image-level supervision. We show this global pooling layer possesses a near ideal flow of gradients for extent localization, offering a good trade-off between the extremes of max and average pooling. Our approach only requires a single network pass and uses a fast-backprojection technique, completely omitting any region proposal steps. To the best of our knowledge, this is the first approach to do so. Due to this, we are able to perform inference in real-time at 35 fps, which is an order of magnitude faster than all previous weakly supervised object localization frameworks. |
| Tasks | Object Localization, Weakly-Supervised Object Localization |
| Published | 2017-07-19 |
| URL | http://arxiv.org/abs/1707.06180v1 |
| PDF | http://arxiv.org/pdf/1707.06180v1.pdf |
| PWC | https://paperswithcode.com/paper/object-extent-pooling-for-weakly-supervised |
| Repo | |
| Framework | |
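One plausible reading of SPAM pooling, sketched below: max-pool the class activation map over an l-by-l grid at each pyramid level, average the cell maxima, then average across levels. Level 1 is global max pooling, and ever finer grids approach global average pooling, so the pyramid interpolates between the two extremes. The exact level set is an assumption, not the paper's configuration:

```python
import torch
import torch.nn.functional as F

def spam_pool(cam, levels=(1, 2, 4)):
    """Spatial-pyramid averaged-max pooling (one plausible reading of SPAM).

    cam: (batch, classes, H, W) class activation maps.
    Returns (batch, classes) image-level scores suitable for training with
    image-level labels only.
    """
    scores = [F.adaptive_max_pool2d(cam, l).mean(dim=(2, 3)) for l in levels]
    return torch.stack(scores, dim=0).mean(dim=0)

cam = torch.randn(2, 20, 14, 14)   # CAMs for 20 classes
print(spam_pool(cam).shape)        # torch.Size([2, 20])
```

The trade-off claim is visible in the gradients: pure max pooling back-propagates to one pixel per map, pure averaging smears gradient over everything, while the pyramid spreads it over a few strong regions, matching object extent.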
Scalable and Effective Deep CCA via Soft Decorrelation
| Title | Scalable and Effective Deep CCA via Soft Decorrelation |
|---|---|
| Authors | Xiaobin Chang, Tao Xiang, Timothy M. Hospedales |
| Abstract | Recently the widely used multi-view learning model, Canonical Correlation Analysis (CCA), has been generalised to the non-linear setting via deep neural networks. Existing deep CCA models typically first decorrelate the feature dimensions of each view before the different views are maximally correlated in a common latent space. This feature decorrelation is achieved by enforcing an exact decorrelation constraint; these models are thus computationally expensive due to the matrix inversion or SVD operations required for exact decorrelation at each training iteration. Furthermore, the decorrelation step is often separated from the gradient descent based optimisation, resulting in sub-optimal solutions. We propose a novel deep CCA model, Soft CCA, to overcome these problems. Specifically, exact decorrelation is replaced by soft decorrelation via a mini-batch based Stochastic Decorrelation Loss (SDL) optimised jointly with the other training objectives. Extensive experiments show that the proposed Soft CCA is more effective and efficient than existing deep CCA models. In addition, our SDL loss can be applied to other deep models beyond multi-view learning, and obtains superior performance compared to existing decorrelation losses. |
| Tasks | Multi-View Learning |
| Published | 2017-07-30 |
| URL | http://arxiv.org/abs/1707.09669v2 |
| PDF | http://arxiv.org/pdf/1707.09669v2.pdf |
| PWC | https://paperswithcode.com/paper/scalable-and-effective-deep-cca-via-soft |
| Repo | |
| Framework | |
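The soft decorrelation idea can be sketched directly from the description: estimate the feature covariance from the mini-batch and penalize its off-diagonal entries, so decorrelation happens by gradient descent instead of exact whitening (no SVD or matrix inversion per iteration). The paper additionally maintains a running covariance estimate across batches; the per-batch version below is a simplification:

```python
import torch

def stochastic_decorrelation_loss(z):
    """Mini-batch soft decorrelation penalty in the spirit of SDL.

    z: (batch, dim) features from one view. Centers the batch, estimates
    the covariance, and returns the L1 norm of its off-diagonal entries.
    """
    z = z - z.mean(dim=0, keepdim=True)
    cov = z.t() @ z / (z.size(0) - 1)
    off_diag = cov - torch.diag(torch.diag(cov))
    return off_diag.abs().sum()

z = torch.randn(32, 8, requires_grad=True)
loss = stochastic_decorrelation_loss(z)
loss.backward()                      # differentiable: joins other objectives
print(float(loss))
```

Because the penalty is just another differentiable loss term, it can be weighted and summed with the cross-view correlation objective and trained end to end, which is the scalability argument of the paper.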
Bilingual Words and Phrase Mappings for Marathi and Hindi SMT
| Title | Bilingual Words and Phrase Mappings for Marathi and Hindi SMT |
|---|---|
| Authors | Sreelekha S, Pushpak Bhattacharyya |
| Abstract | Lack of proper linguistic resources is a major challenge faced by machine translation system development when dealing with resource-poor languages. In this paper, we describe effective ways to utilize lexical resources to improve the quality of statistical machine translation. Our research on the usage of lexical resources mainly focuses on two approaches: augmenting the parallel corpus with more vocabulary, and providing various word forms. We have augmented the training corpus with various lexical resources such as lexical words, function words, kridanta pairs and verb phrases. We describe case studies, evaluations and a detailed error analysis for both Marathi-to-Hindi and Hindi-to-Marathi machine translation systems. From the evaluations, we observed incremental growth in the quality of machine translation as the usage of various lexical resources increases. Moreover, the usage of various lexical resources helps to improve the coverage and quality of machine translation where only a limited parallel corpus is available. |
| Tasks | Machine Translation |
| Published | 2017-10-05 |
| URL | http://arxiv.org/abs/1710.02398v3 |
| PDF | http://arxiv.org/pdf/1710.02398v3.pdf |
| PWC | https://paperswithcode.com/paper/bilingual-words-and-phrase-mappings-for |
| Repo | |
| Framework | |
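Mechanically, the corpus-augmentation idea reduces to appending lexical entries as tiny pseudo-parallel "sentence" pairs before training the SMT system. A hypothetical sketch, where file names and the word/phrase pairs are purely illustrative:

```python
# Append bilingual lexical resources (words, function words, verb phrases)
# to the parallel training corpus as extra sentence pairs, so the SMT
# system sees more vocabulary and word forms. Files and pairs are
# hypothetical placeholders, not the authors' actual resources.
lexical_resources = [
    ("pustak", "kitaab"),               # Marathi -> Hindi word pair (illustrative)
    ("chaalat aahe", "chal raha hai"),  # verb-phrase pair (illustrative)
]

with open("train.mr", "a", encoding="utf-8") as src, \
     open("train.hi", "a", encoding="utf-8") as tgt:
    for mr, hi in lexical_resources:
        src.write(mr + "\n")   # each lexical entry becomes a one-line
        tgt.write(hi + "\n")   # pseudo-parallel sentence pair
```

The phrase-based training pipeline then extracts these entries into its phrase table like any other aligned pair, which is what improves coverage for out-of-vocabulary words.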
A Survey on Multi-Task Learning
| Title | A Survey on Multi-Task Learning |
|---|---|
| Authors | Yu Zhang, Qiang Yang |
| Abstract | Multi-Task Learning (MTL) is a learning paradigm in machine learning whose aim is to leverage useful information contained in multiple related tasks to help improve the generalization performance of all the tasks. In this paper, we give a survey of MTL. First, we classify different MTL algorithms into several categories, including the feature learning approach, low-rank approach, task clustering approach, task relation learning approach, and decomposition approach, and then discuss the characteristics of each approach. To further improve the performance of learning tasks, MTL can be combined with other learning paradigms including semi-supervised learning, active learning, unsupervised learning, reinforcement learning, multi-view learning and graphical models. When the number of tasks is large or the data dimensionality is high, batch MTL models have difficulty handling this situation; online, parallel and distributed MTL models, as well as dimensionality reduction and feature hashing, are reviewed to reveal their computational and storage advantages. Many real-world applications use MTL to boost their performance, and we review representative works. Finally, we present theoretical analyses and discuss several future directions for MTL. |
| Tasks | Active Learning, Dimensionality Reduction, Multi-Task Learning, Multi-View Learning |
| Published | 2017-07-25 |
| URL | http://arxiv.org/abs/1707.08114v2 |
| PDF | http://arxiv.org/pdf/1707.08114v2.pdf |
| PWC | https://paperswithcode.com/paper/a-survey-on-multi-task-learning |
| Repo | |
| Framework | |
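As a concrete instance of the feature-learning (shared representation) category surveyed above, here is a minimal hard parameter-sharing model: a shared trunk feeds one lightweight head per task, so related tasks regularize each other through the common representation. The two-task setup and sizes are illustrative:

```python
import torch
import torch.nn as nn

class SharedTrunkMTL(nn.Module):
    """Minimal hard parameter-sharing MTL model: shared trunk, per-task heads."""
    def __init__(self, in_dim=16, hidden=32, task_dims=(3, 1)):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.heads = nn.ModuleList(nn.Linear(hidden, d) for d in task_dims)

    def forward(self, x):
        h = self.trunk(x)                 # representation shared by all tasks
        return [head(h) for head in self.heads]

model = SharedTrunkMTL()
x = torch.randn(8, 16)
out_cls, out_reg = model(x)               # task 1: 3-way logits, task 2: scalar
loss = nn.functional.cross_entropy(out_cls, torch.randint(0, 3, (8,))) \
     + nn.functional.mse_loss(out_reg.squeeze(1), torch.randn(8))
loss.backward()                           # both task gradients update the trunk
print(out_cls.shape, out_reg.shape)
```

The other surveyed categories (low-rank, task clustering, task relation learning, decomposition) differ mainly in how the coupling between task parameters is structured rather than in this basic training loop.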
Deep Hyperalignment
| Title | Deep Hyperalignment |
|---|---|
| Authors | Muhammad Yousefnezhad, Daoqiang Zhang |
| Abstract | This paper proposes Deep Hyperalignment (DHA) as a regularized, deep, and scalable extension of Hyperalignment (HA), which is well-suited for applying functional alignment to fMRI datasets with nonlinearity, high dimensionality (broad ROI), and a large number of subjects. Unlike previous methods, DHA is not limited by a restricted fixed kernel function. Further, it uses a parametric approach, rank-$m$ Singular Value Decomposition (SVD), and stochastic gradient descent for optimization. Therefore, DHA has a suitable time complexity for large datasets, and it does not require the training data when computing the functional alignment for a new subject. Experimental studies on multi-subject fMRI analysis confirm that the DHA method achieves superior performance to other state-of-the-art HA algorithms. |
| Tasks | |
| Published | 2017-10-11 |
| URL | http://arxiv.org/abs/1710.03923v1 |
| PDF | http://arxiv.org/pdf/1710.03923v1.pdf |
| PWC | https://paperswithcode.com/paper/deep-hyperalignment |
| Repo | |
| Framework | |
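For context, the classical building block that DHA deepens can be shown compactly: hyperalignment's linear step is an orthogonal Procrustes problem, solved in closed form by an SVD. The sketch below is that classical step, not DHA itself (DHA replaces the linear map with a deep network trained by SGD):

```python
import numpy as np

def procrustes_align(X, template):
    """Classical hyperalignment building block: find the orthogonal map R
    minimizing ||X R - template||_F via SVD (orthogonal Procrustes),
    rotating one subject's voxel space onto a shared template.

    X, template: (time points, voxels) response matrices.
    """
    U, _, Vt = np.linalg.svd(X.T @ template)
    R = U @ Vt
    return X @ R

rng = np.random.default_rng(0)
template = rng.standard_normal((100, 50))
R_true, _ = np.linalg.qr(rng.standard_normal((50, 50)))  # random rotation
X = template @ R_true.T + 0.01 * rng.standard_normal((100, 50))
aligned = procrustes_align(X, template)
# Relative alignment error should be small (noise-limited).
print(np.linalg.norm(aligned - template) / np.linalg.norm(template))
```

The scalability contrast is then clear: this closed-form step requires a full SVD per subject per iteration, whereas DHA's parametric, SGD-trained mapping sidesteps that cost and generalizes to unseen subjects.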
Marine Animal Classification with Correntropy Loss Based Multi-view Learning
| Title | Marine Animal Classification with Correntropy Loss Based Multi-view Learning |
|---|---|
| Authors | Zheng Cao, Shujian Yu, Bing Ouyang, Fraser Dalgleish, Anni Vuorenkoski, Gabriel Alsenas, Jose Principe |
| Abstract | To analyze marine animal behavior, seasonal distribution and abundance, digital imagery can be acquired by a visual or Lidar camera. Depending on the quantity and properties of the acquired imagery, the animals are characterized by either features (shape, color, texture, etc.) or dissimilarity matrices derived from different shape analysis methods (shape context, inner-distance shape context, etc.). For both cases, multi-view learning is critical for integrating more than one set of features or dissimilarity matrices for higher classification accuracy. This paper adopts the correntropy loss as the cost function in multi-view learning, which has favorable statistical properties for rejecting noise. For the case of features, the correntropy loss-based multi-view learning and its entrywise variation are developed based on the multi-view intact space learning algorithm. For the case of dissimilarity matrices, the robust Euclidean embedding algorithm is extended to its multi-view form with the correntropy loss function. Results from simulated data and real-world marine animal imagery show that the proposed algorithms can effectively enhance the classification rate, as well as suppress noise under different noise conditions. |
| Tasks | Multi-View Learning |
| Published | 2017-05-03 |
| URL | http://arxiv.org/abs/1705.01217v1 |
| PDF | http://arxiv.org/pdf/1705.01217v1.pdf |
| PWC | https://paperswithcode.com/paper/marine-animal-classification-with-correntropy |
| Repo | |
| Framework | |
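The correntropy-induced loss has a short closed form with a Gaussian kernel, and a few lines suffice to show the outlier-rejection behavior the abstract refers to (the kernel width sigma is a free parameter):

```python
import numpy as np

def correntropy_loss(pred, target, sigma=1.0):
    """Correntropy-induced loss with a Gaussian kernel: each error e
    contributes 1 - exp(-e^2 / (2 sigma^2)). Small errors behave like a
    scaled squared loss, while large (outlier) errors saturate near 1,
    which gives the noise-rejection property exploited in the paper's
    multi-view objectives."""
    e = pred - target
    return np.mean(1.0 - np.exp(-e ** 2 / (2.0 * sigma ** 2)))

clean = np.zeros(10)
noisy = np.zeros(10)
noisy[0] = 100.0                              # one gross outlier
print(correntropy_loss(clean + 0.1, clean))   # small loss for small errors
print(correntropy_loss(noisy, clean))         # outlier saturates: ~0.1 total
```

Under squared loss the single outlier would dominate the objective (mean error 1000 here); under correntropy it contributes at most 1/n, which is why the embedding and intact-space variants stay robust under heavy noise.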