May 5, 2019

Paper Group ANR 561

Image Colorization Using a Deep Convolutional Neural Network

Title Image Colorization Using a Deep Convolutional Neural Network
Authors Tung Nguyen, Kazuki Mori, Ruck Thawonmas
Abstract In this paper, we present a novel approach that uses deep learning techniques for colorizing grayscale images. By utilizing a pre-trained convolutional neural network, which is originally designed for image classification, we are able to separate content and style of different images and recombine them into a single image. We then propose a method that can add colors to a grayscale image by combining its content with the style of a color image that is semantically similar to the grayscale one. As an application, to our knowledge the first of its kind, we use the proposed method to colorize images of ukiyo-e, a genre of Japanese painting, and obtain interesting results, showing the potential of this method in the growing field of computer-assisted art.
Tasks Colorization, Image Classification, Semantic Similarity, Semantic Textual Similarity
Published 2016-04-27
URL http://arxiv.org/abs/1604.07904v1
PDF http://arxiv.org/pdf/1604.07904v1.pdf
PWC https://paperswithcode.com/paper/image-colorization-using-a-deep-convolutional
Repo
Framework

PetroSurf3D - A Dataset for high-resolution 3D Surface Segmentation

Title PetroSurf3D - A Dataset for high-resolution 3D Surface Segmentation
Authors Georg Poier, Markus Seidl, Matthias Zeppelzauer, Christian Reinbacher, Martin Schaich, Giovanna Bellandi, Alberto Marretta, Horst Bischof
Abstract The development of powerful 3D scanning hardware and reconstruction algorithms has strongly promoted the generation of 3D surface reconstructions in different domains. An area of special interest for such 3D reconstructions is the cultural heritage domain, where surface reconstructions are generated to digitally preserve historical artifacts. While reconstruction quality nowadays is sufficient in many cases, the robust analysis (e.g. segmentation, matching, and classification) of reconstructed 3D data is still an open topic. In this paper, we target the automatic and interactive segmentation of high-resolution 3D surface reconstructions from the archaeological domain. To foster research in this field, we introduce a fully annotated and publicly available large-scale 3D surface dataset including high-resolution meshes, depth maps and point clouds as a novel benchmark dataset to the community. We provide baseline results for our existing random forest-based approach and for the first time investigate segmentation with convolutional neural networks (CNNs) on the data. Results show that both approaches have complementary strengths and weaknesses and that the provided dataset represents a challenge for future research.
Tasks Interactive Segmentation
Published 2016-10-06
URL http://arxiv.org/abs/1610.01944v3
PDF http://arxiv.org/pdf/1610.01944v3.pdf
PWC https://paperswithcode.com/paper/petrosurf3d-a-dataset-for-high-resolution-3d
Repo
Framework

Iterative Refinement for Machine Translation

Title Iterative Refinement for Machine Translation
Authors Roman Novak, Michael Auli, David Grangier
Abstract Existing machine translation decoding algorithms generate translations in a strictly monotonic fashion and never revisit previous decisions. As a result, earlier mistakes cannot be corrected at a later stage. In this paper, we present a translation scheme that starts from an initial guess and then makes iterative improvements that may revisit previous decisions. We parameterize our model as a convolutional neural network that predicts discrete substitutions to an existing translation based on an attention mechanism over both the source sentence as well as the current translation output. By making less than one modification per sentence, we improve the output of a phrase-based translation system by up to 0.4 BLEU on WMT15 German-English translation.
Tasks Machine Translation
Published 2016-10-20
URL http://arxiv.org/abs/1610.06602v3
PDF http://arxiv.org/pdf/1610.06602v3.pdf
PWC https://paperswithcode.com/paper/iterative-refinement-for-machine-translation
Repo
Framework
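
The refinement loop described in the abstract above (start from an initial guess, then repeatedly apply the single best word substitution) can be sketched as follows. This is only an illustration: the paper's scorer is a convolutional network with attention, whereas `toy_scorer`, `LEXICON`, and the one-to-one word alignment used here are hypothetical stand-ins.

```python
def refine(source, translation, score_substitution, vocabulary, max_steps=10):
    """Greedily apply the best-scoring single-word substitution until none helps."""
    current = list(translation)
    for _ in range(max_steps):
        best_gain, best_edit = 0.0, None
        for position in range(len(current)):
            for word in vocabulary:
                if word == current[position]:
                    continue  # not a real substitution
                gain = score_substitution(source, current, position, word)
                if gain > best_gain:
                    best_gain, best_edit = gain, (position, word)
        if best_edit is None:
            break  # no substitution is predicted to improve the output
        position, word = best_edit
        current[position] = word  # revisit and overwrite an earlier decision
    return current


# Hypothetical scorer: a lookup table standing in for the learned model.
LEXICON = {"das": "the", "haus": "house", "ist": "is", "klein": "small"}


def toy_scorer(source, current, position, word):
    """Return 1.0 when `word` is the lexicon entry for the aligned source word."""
    if position < len(source) and LEXICON.get(source[position]) == word:
        return 1.0
    return 0.0


source = ["das", "haus", "ist", "klein"]
initial_guess = ["the", "home", "is", "small"]
vocabulary = sorted(set(LEXICON.values()) | {"home"})
refined = refine(source, initial_guess, toy_scorer, vocabulary)
```

Here one substitution (`home` to `house`) is made and the loop then stops, which mirrors the "few modifications per sentence" behaviour the abstract reports, though the numbers in the paper come from the learned model, not from a lookup like this.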

Direct Visual Odometry using Bit-Planes

Title Direct Visual Odometry using Bit-Planes
Authors Hatem Alismail, Brett Browning, Simon Lucey
Abstract Feature descriptors, such as SIFT and ORB, are well-known for their robustness to illumination changes, which has made them popular for feature-based VSLAM. However, in degraded imaging conditions such as low light, low texture, blur and specular reflections, feature extraction is often unreliable. In contrast, direct VSLAM methods which estimate the camera pose by minimizing the photometric error using raw pixel intensities are often more robust to low textured environments and blur. Nonetheless, at the core of direct VSLAM is the reliance on a consistent photometric appearance across images, otherwise known as the brightness constancy assumption. Unfortunately, brightness constancy seldom holds in real world applications. In this work, we overcome brightness constancy by incorporating feature descriptors into a direct visual odometry framework. This combination results in an efficient algorithm that combines the strength of both feature-based algorithms and direct methods. Namely, we achieve robustness to arbitrary photometric variations while operating in low-textured and poorly lit environments. Our approach utilizes an efficient binary descriptor, which we call Bit-Planes, and we show how it can be used in the gradient-based optimization required by direct methods. Moreover, we show that the squared Euclidean distance between Bit-Planes is equivalent to the Hamming distance. Hence, the descriptor may be used in least squares optimization without sacrificing its photometric invariance. Finally, we present empirical results that demonstrate the robustness of the approach in poorly lit underground environments.
Tasks Visual Odometry
Published 2016-04-04
URL http://arxiv.org/abs/1604.00990v1
PDF http://arxiv.org/pdf/1604.00990v1.pdf
PWC https://paperswithcode.com/paper/direct-visual-odometry-using-bit-planes
Repo
Framework
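
The claim in the abstract above, that squared Euclidean distance and Hamming distance coincide for binary descriptors, is easy to check numerically. The sketch below assumes a simple descriptor that compares each of the 8 neighbours of a pixel with the centre; `binary_descriptor` is a hypothetical illustration and not necessarily the exact Bit-Planes construction from the paper.

```python
def binary_descriptor(patch):
    """Return 8 bits for a 3x3 patch: 1 where a neighbour is brighter than the centre."""
    centre = patch[1][1]
    return [
        1 if patch[row][col] > centre else 0
        for row in range(3)
        for col in range(3)
        if (row, col) != (1, 1)
    ]


def hamming(a, b):
    """Number of positions at which two bit vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def squared_euclidean(a, b):
    """Sum of squared element-wise differences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


patch_a = [[10, 20, 30], [40, 25, 5], [60, 15, 25]]
patch_b = [[30, 20, 10], [5, 25, 40], [25, 15, 60]]
# Same scene under a gain and bias change (assumed positive gain).
brighter_a = [[2 * value + 7 for value in row] for row in patch_a]

desc_a = binary_descriptor(patch_a)
desc_b = binary_descriptor(patch_b)
```

Because each bit difference is 0 or 1 and squaring leaves both values unchanged, `squared_euclidean(desc_a, desc_b)` equals `hamming(desc_a, desc_b)`, and `binary_descriptor(brighter_a)` equals `desc_a` since the comparisons are unaffected by a positive gain and a bias.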

Filter based Taxonomy Modification for Improving Hierarchical Classification

Title Filter based Taxonomy Modification for Improving Hierarchical Classification
Authors Azad Naik, Huzefa Rangwala
Abstract Hierarchical Classification (HC) is a supervised learning problem where unlabeled instances are classified into a taxonomy of classes. Several methods that utilize the hierarchical structure have been developed to improve the HC performance. However, in most cases the a priori hierarchical structure defined by domain experts is inconsistent; as a consequence, performance improvement is not noticeable in comparison to flat classification methods. We propose a scalable data-driven filter based rewiring approach to modify an expert-defined hierarchy. Experimental comparisons of top-down HC with our modified hierarchy on a wide range of datasets show classification performance improvement over the baseline hierarchy (i.e., defined by experts), clustered hierarchy and flattening based hierarchy modification approaches. In comparison to existing rewiring approaches, our developed method (rewHier) is computationally efficient, enabling it to scale to datasets with large numbers of classes, instances and features. We also show that our modified hierarchy leads to improved classification performance for classes with few training samples in comparison to flat and state-of-the-art HC approaches.
Tasks
Published 2016-03-02
URL http://arxiv.org/abs/1603.00772v3
PDF http://arxiv.org/pdf/1603.00772v3.pdf
PWC https://paperswithcode.com/paper/filter-based-taxonomy-modification-for
Repo
Framework

Texture and Color-based Image Retrieval Using the Local Extrema Features and Riemannian Distance

Title Texture and Color-based Image Retrieval Using the Local Extrema Features and Riemannian Distance
Authors Minh-Tan Pham, Grégoire Mercier, Lionel Bombrun, Julien Michel
Abstract A novel efficient method for content-based image retrieval (CBIR) is developed in this paper using both texture and color features. Our motivation is to represent and characterize an input image by a set of local descriptors extracted at characteristic points (i.e. keypoints) within the image. Then, the dissimilarity measure between images is calculated based on the geometric distance between the topological feature spaces (i.e. manifolds) formed by the sets of local descriptors generated from these images. In this work, we propose to extract and use the local extrema pixels as our feature points. Then, the so-called local extrema-based descriptor (LED) is generated for each keypoint by integrating all color, spatial as well as gradient information captured by a set of its nearest local extrema. Hence, each image is encoded by an LED feature point cloud, and Riemannian distances between these point clouds enable us to tackle CBIR. Experiments performed on Vistex, Stex and colored Brodatz texture databases using the proposed approach provide very efficient and competitive results compared to the state-of-the-art methods.
Tasks Content-Based Image Retrieval, Image Retrieval
Published 2016-11-07
URL http://arxiv.org/abs/1611.02102v2
PDF http://arxiv.org/pdf/1611.02102v2.pdf
PWC https://paperswithcode.com/paper/texture-and-color-based-image-retrieval-using
Repo
Framework

Sub-sampled Newton Methods with Non-uniform Sampling

Title Sub-sampled Newton Methods with Non-uniform Sampling
Authors Peng Xu, Jiyan Yang, Farbod Roosta-Khorasani, Christopher Ré, Michael W. Mahoney
Abstract We consider the problem of finding the minimizer of a convex function $F: \mathbb{R}^d \rightarrow \mathbb{R}$ of the form $F(w) := \sum_{i=1}^n f_i(w) + R(w)$ where a low-rank factorization of $\nabla^2 f_i(w)$ is readily available. We consider the regime where $n \gg d$. As second-order methods prove to be effective in finding the minimizer to a high-precision, in this work, we propose randomized Newton-type algorithms that exploit non-uniform sub-sampling of $\{\nabla^2 f_i(w)\}_{i=1}^{n}$, as well as inexact updates, as means to reduce the computational complexity. Two non-uniform sampling distributions based on block norm squares and block partial leverage scores are considered in order to capture important terms among $\{\nabla^2 f_i(w)\}_{i=1}^{n}$. We show that at each iteration non-uniformly sampling at most $\mathcal{O}(d \log d)$ terms from $\{\nabla^2 f_i(w)\}_{i=1}^{n}$ is sufficient to achieve a linear-quadratic convergence rate in $w$ when a suitable initial point is provided. In addition, we show that our algorithms achieve a lower computational complexity and exhibit more robustness and better dependence on problem specific quantities, such as the condition number, compared to similar existing methods, especially the ones based on uniform sampling. Finally, we empirically demonstrate that our methods are at least twice as fast as Newton’s methods with ridge logistic regression on several real datasets.
Tasks
Published 2016-07-02
URL http://arxiv.org/abs/1607.00559v2
PDF http://arxiv.org/pdf/1607.00559v2.pdf
PWC https://paperswithcode.com/paper/sub-sampled-newton-methods-with-non-uniform
Repo
Framework
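
To make the sub-sampling idea in the abstract above concrete, here is a generic importance-sampled Hessian estimator written in the abstract's notation. It is a sketch under stated assumptions, not the paper's exact algorithm: it assumes the low-rank factorization has the form $\nabla^2 f_i(w) = A_i(w)^\top A_i(w)$, that $R$ is twice differentiable, and that "block norm squares" means sampling probabilities proportional to $\|A_i(w)\|_F^2$. The sample size $s$ and the indices $i_1, \dots, i_s$ are notation introduced here.

$$
p_i = \frac{\|A_i(w)\|_F^2}{\sum_{j=1}^{n} \|A_j(w)\|_F^2},
\qquad
\tilde{H}(w) = \frac{1}{s} \sum_{t=1}^{s} \frac{1}{p_{i_t}} \nabla^2 f_{i_t}(w) + \nabla^2 R(w),
\qquad i_t \sim p \ \text{i.i.d.}
$$

$$
\mathbb{E}\big[\tilde{H}(w)\big]
= \sum_{i=1}^{n} p_i \cdot \frac{1}{p_i} \nabla^2 f_i(w) + \nabla^2 R(w)
= \nabla^2 F(w).
$$

The reweighting by $1/p_{i_t}$ keeps the estimator unbiased for any sampling distribution with $p_i > 0$; the choice of $p$ only affects how many samples are needed for $\tilde{H}(w)$ to be close to $\nabla^2 F(w)$.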

Uncovering Locally Discriminative Structure for Feature Analysis

Title Uncovering Locally Discriminative Structure for Feature Analysis
Authors Sen Wang, Feiping Nie, Xiaojun Chang, Xue Li, Quan Z. Sheng, Lina Yao
Abstract Manifold structure learning is often used to exploit geometric information among data in semi-supervised feature learning algorithms. In this paper, we find that local discriminative information is also of importance for semi-supervised feature learning. We propose a method that utilizes both the manifold structure of data and local discriminant information. Specifically, we define a local clique for each data point. The k-Nearest Neighbors (kNN) is used to determine the structural information within each clique. We then employ a variant of Fisher criterion model to each clique for local discriminant evaluation and sum all cliques as global integration into the framework. In this way, local discriminant information is embedded. Labels are also utilized to minimize distances between data from the same class. In addition, we use the kernel method to extend our proposed model and facilitate feature learning in a high-dimensional space after feature mapping. Experimental results show that our method is superior to all other compared methods over a number of datasets.
Tasks
Published 2016-07-09
URL http://arxiv.org/abs/1607.02559v1
PDF http://arxiv.org/pdf/1607.02559v1.pdf
PWC https://paperswithcode.com/paper/uncovering-locally-discriminative-structure
Repo
Framework

Proximal Quasi-Newton Methods for Regularized Convex Optimization with Linear and Accelerated Sublinear Convergence Rates

Title Proximal Quasi-Newton Methods for Regularized Convex Optimization with Linear and Accelerated Sublinear Convergence Rates
Authors Hiva Ghanbari, Katya Scheinberg
Abstract In [19], a general, inexact, efficient proximal quasi-Newton algorithm for composite optimization problems has been proposed and a sublinear global convergence rate has been established. In this paper, we analyze the convergence properties of this method, both in the exact and inexact setting, in the case when the objective function is strongly convex. We also investigate a practical variant of this method by establishing a simple stopping criterion for the subproblem optimization. Furthermore, we consider an accelerated variant, based on FISTA [1], to the proximal quasi-Newton algorithm. A similar accelerated method has been considered in [7], where the convergence rate analysis relies on very strong impractical assumptions. We present a modified analysis while relaxing these assumptions and perform a practical comparison of the accelerated proximal quasi-Newton algorithm and the regular one. Our analysis and computational results show that acceleration may not bring any benefit in the quasi-Newton setting.
Tasks
Published 2016-07-11
URL http://arxiv.org/abs/1607.03081v2
PDF http://arxiv.org/pdf/1607.03081v2.pdf
PWC https://paperswithcode.com/paper/proximal-quasi-newton-methods-for-regularized
Repo
Framework

Local Region Sparse Learning for Image-on-Scalar Regression

Title Local Region Sparse Learning for Image-on-Scalar Regression
Authors Yao Chen, Xiao Wang, Linglong Kong, Hongtu Zhu
Abstract Identification of regions of interest (ROI) associated with certain diseases has a great impact on public health. Imposing sparsity of pixel values and extracting active regions simultaneously greatly complicate the image analysis. We address these challenges by introducing a novel region-selection penalty in the framework of image-on-scalar regression. Our penalty combines the Smoothly Clipped Absolute Deviation (SCAD) regularization, enforcing sparsity, and the SCAD of total variation (TV) regularization, enforcing spatial contiguity, into one group, which segments contiguous spatial regions against zero-valued background. An efficient algorithm is based on the alternating direction method of multipliers (ADMM), which decomposes the non-convex problem into two iterative optimization problems with explicit solutions. Another virtue of the proposed method is that a divide and conquer learning algorithm is developed, thereby allowing scaling to large images. Several examples are presented and the experimental results are compared with other state-of-the-art approaches.
Tasks Sparse Learning
Published 2016-05-27
URL http://arxiv.org/abs/1605.08501v1
PDF http://arxiv.org/pdf/1605.08501v1.pdf
PWC https://paperswithcode.com/paper/local-region-sparse-learning-for-image-on
Repo
Framework
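
The SCAD penalty named in the abstract above has a standard closed form, sketched below with the commonly used default `a = 3.7`. The `region_penalty` function is a hypothetical 1-D simplification that adds SCAD on the coefficients to SCAD on neighbouring differences; the paper's actual grouping over 2-D images may differ.

```python
def scad_penalty(t, lam, a=3.7):
    """Standard SCAD penalty: linear near zero, quadratic blend, then constant."""
    t = abs(t)
    if t <= lam:
        return lam * t
    if t <= a * lam:
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    return lam * lam * (a + 1) / 2


def region_penalty(coefficients, lam, a=3.7):
    """Hypothetical 1-D illustration: sparsity term plus total-variation term."""
    sparsity = sum(scad_penalty(value, lam, a) for value in coefficients)
    contiguity = sum(
        scad_penalty(right - left, lam, a)
        for left, right in zip(coefficients, coefficients[1:])
    )
    return sparsity + contiguity
```

Unlike an L1 penalty, `scad_penalty` stops growing once `|t|` exceeds `a * lam`, so large coefficients inside an active region are not shrunk further, while an all-zero background contributes nothing to `region_penalty`.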

3D Simulation for Robot Arm Control with Deep Q-Learning

Title 3D Simulation for Robot Arm Control with Deep Q-Learning
Authors Stephen James, Edward Johns
Abstract Recent trends in robot arm control have seen a shift towards end-to-end solutions, using deep reinforcement learning to learn a controller directly from raw sensor data, rather than relying on a hand-crafted, modular pipeline. However, the high dimensionality of the state space often means that it is impractical to generate sufficient training data with real-world experiments. As an alternative solution, we propose to learn a robot controller in simulation, with the potential of then transferring this to a real robot. Building upon the recent success of deep Q-networks, we present an approach which uses 3D simulations to train a 7-DOF robotic arm in a control task without any prior knowledge. The controller accepts images of the environment as its only input, and outputs motor actions for the task of locating and grasping a cube, over a range of initial configurations. To encourage efficient learning, a structured reward function is designed with intermediate rewards. We also present preliminary results in direct transfer of policies over to a real robot, without any further training.
Tasks Q-Learning
Published 2016-09-13
URL http://arxiv.org/abs/1609.03759v2
PDF http://arxiv.org/pdf/1609.03759v2.pdf
PWC https://paperswithcode.com/paper/3d-simulation-for-robot-arm-control-with-deep
Repo
Framework
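
The two ingredients mentioned in the abstract above, a Q-learning target and a reward with intermediate terms, can be sketched as follows. The target is the standard one-step Q-learning target; `shaped_reward`, its constants, and its inputs are hypothetical and are not taken from the paper.

```python
def shaped_reward(distance_to_cube, grasped):
    """Hypothetical structured reward: full reward on grasp, small bonus for proximity."""
    if grasped:
        return 1.0
    # Intermediate reward grows as the gripper approaches the cube (assumed form).
    return 0.1 / (1.0 + distance_to_cube)


def td_target(reward, next_q_values, done, gamma=0.99):
    """One-step Q-learning target: r + gamma * max_a' Q(s', a'), or r at episode end."""
    if done:
        return reward
    return reward + gamma * max(next_q_values)
```

In a deep Q-network, `next_q_values` would come from the network evaluated on the next image observation, and the network would be trained to regress its prediction for the taken action toward `td_target(...)`.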

Can neural machine translation do simultaneous translation?

Title Can neural machine translation do simultaneous translation?
Authors Kyunghyun Cho, Masha Esipova
Abstract We investigate the potential of attention-based neural machine translation in simultaneous translation. We introduce a novel decoding algorithm, called simultaneous greedy decoding, that allows an existing neural machine translation model to begin translating before a full source sentence is received. This approach is unique from previous works on simultaneous translation in that segmentation and translation are done jointly to maximize the translation quality and that translating each segment is strongly conditioned on all the previous segments. This paper presents a first step toward building a full simultaneous translation system based on neural machine translation.
Tasks Machine Translation
Published 2016-06-07
URL http://arxiv.org/abs/1606.02012v1
PDF http://arxiv.org/pdf/1606.02012v1.pdf
PWC https://paperswithcode.com/paper/can-neural-machine-translation-do
Repo
Framework

EEG-informed attended speaker extraction from recorded speech mixtures with application in neuro-steered hearing prostheses

Title EEG-informed attended speaker extraction from recorded speech mixtures with application in neuro-steered hearing prostheses
Authors Simon Van Eyndhoven, Tom Francart, Alexander Bertrand
Abstract OBJECTIVE: We aim to extract and denoise the attended speaker in a noisy, two-speaker acoustic scenario, relying on microphone array recordings from a binaural hearing aid, which are complemented with electroencephalography (EEG) recordings to infer the speaker of interest. METHODS: In this study, we propose a modular processing flow that first extracts the two speech envelopes from the microphone recordings, then selects the attended speech envelope based on the EEG, and finally uses this envelope to inform a multi-channel speech separation and denoising algorithm. RESULTS: Strong suppression of interfering (unattended) speech and background noise is achieved, while the attended speech is preserved. Furthermore, EEG-based auditory attention detection (AAD) is shown to be robust to the use of noisy speech signals. CONCLUSIONS: Our results show that AAD-based speaker extraction from microphone array recordings is feasible and robust, even in noisy acoustic environments, and without access to the clean speech signals to perform EEG-based AAD. SIGNIFICANCE: Current research on AAD always assumes the availability of the clean speech signals, which limits the applicability in real settings. We have extended this research to detect the attended speaker even when only microphone recordings with noisy speech mixtures are available. This is an enabling ingredient for new brain-computer interfaces and effective filtering schemes in neuro-steered hearing prostheses. Here, we provide a first proof of concept for EEG-informed attended speaker extraction and denoising.
Tasks Denoising, EEG, Speech Separation
Published 2016-02-18
URL http://arxiv.org/abs/1602.05702v4
PDF http://arxiv.org/pdf/1602.05702v4.pdf
PWC https://paperswithcode.com/paper/eeg-informed-attended-speaker-extraction-from
Repo
Framework

Model-based Adversarial Imitation Learning

Title Model-based Adversarial Imitation Learning
Authors Nir Baram, Oron Anschel, Shie Mannor
Abstract Generative adversarial learning is a popular new approach to training generative models which has been proven successful for other related problems as well. The general idea is to maintain an oracle $D$ that discriminates between the expert’s data distribution and that of the generative model $G$. The generative model is trained to capture the expert’s distribution by maximizing the probability of $D$ misclassifying the data it generates. Overall, the system is differentiable end-to-end and is trained using basic backpropagation. This type of learning was successfully applied to the problem of policy imitation in a model-free setup. However, a model-free approach does not allow the system to be differentiable, which requires the use of high-variance gradient estimations. In this paper we introduce the Model based Adversarial Imitation Learning (MAIL) algorithm, a model-based approach to the problem of adversarial imitation learning. We show how to use a forward model to make the system fully differentiable, which enables us to train policies using the (stochastic) gradient of $D$. Moreover, our approach requires relatively few environment interactions, and fewer hyper-parameters to tune. We test our method on the MuJoCo physics simulator and report initial results that surpass the current state-of-the-art.
Tasks Imitation Learning
Published 2016-12-07
URL http://arxiv.org/abs/1612.02179v1
PDF http://arxiv.org/pdf/1612.02179v1.pdf
PWC https://paperswithcode.com/paper/model-based-adversarial-imitation-learning
Repo
Framework

Curvature Integration in a 5D Kernel for Extracting Vessel Connections in Retinal Images

Title Curvature Integration in a 5D Kernel for Extracting Vessel Connections in Retinal Images
Authors Samaneh Abbasi-Sureshjani, Marta Favali, Giovanna Citti, Alessandro Sarti, Bart M. ter Haar Romeny
Abstract Tree-like structures such as retinal images are widely studied in computer-aided diagnosis systems for large-scale screening programs. Despite several segmentation and tracking methods proposed in the literature, there still exist several limitations, specifically when two or more curvilinear structures cross or bifurcate, or in the presence of interrupted lines or highly curved blood vessels. In this paper, we propose a novel approach based on multi-orientation scores augmented with a contextual affinity matrix, both of which are inspired by the geometry of the primary visual cortex (V1) and its contextual connections. The connectivity is described with a five-dimensional kernel obtained as the fundamental solution of the Fokker-Planck equation modelling the cortical connectivity in the lifted space of positions, orientations, curvatures and intensity. It is further used in a self-tuning spectral clustering step to identify the main perceptual units in the stimuli. The proposed method has been validated on several easy and challenging structures in a set of artificial images and actual retinal patches. Supported by quantitative and qualitative results, the method is capable of overcoming the limitations of current state-of-the-art techniques.
Tasks
Published 2016-08-29
URL http://arxiv.org/abs/1608.08049v3
PDF http://arxiv.org/pdf/1608.08049v3.pdf
PWC https://paperswithcode.com/paper/curvature-integration-in-a-5d-kernel-for
Repo
Framework