May 5, 2019


Paper Group ANR 561

Image Colorization Using a Deep Convolutional Neural Network. PetroSurf3D - A Dataset for high-resolution 3D Surface Segmentation. Iterative Refinement for Machine Translation. Direct Visual Odometry using Bit-Planes. Filter based Taxonomy Modification for Improving Hierarchical Classification. Texture and Color-based Image Retrieval Using the Loca …

Image Colorization Using a Deep Convolutional Neural Network


Title	Image Colorization Using a Deep Convolutional Neural Network
Authors	Tung Nguyen, Kazuki Mori, Ruck Thawonmas
Abstract	In this paper, we present a novel approach that uses deep learning techniques for colorizing grayscale images. By utilizing a pre-trained convolutional neural network, which is originally designed for image classification, we are able to separate content and style of different images and recombine them into a single image. We then propose a method that can add colors to a grayscale image by combining its content with the style of a color image that is semantically similar to the grayscale one. As an application, to our knowledge the first of its kind, we use the proposed method to colorize images of ukiyo-e, a genre of Japanese painting, and obtain interesting results, showing the potential of this method in the growing field of computer-assisted art.
Tasks	Colorization, Image Classification, Semantic Similarity, Semantic Textual Similarity
Published	2016-04-27
URL	http://arxiv.org/abs/1604.07904v1
PDF	http://arxiv.org/pdf/1604.07904v1.pdf
PWC	https://paperswithcode.com/paper/image-colorization-using-a-deep-convolutional
Repo
Framework

PetroSurf3D - A Dataset for high-resolution 3D Surface Segmentation


Title	PetroSurf3D - A Dataset for high-resolution 3D Surface Segmentation
Authors	Georg Poier, Markus Seidl, Matthias Zeppelzauer, Christian Reinbacher, Martin Schaich, Giovanna Bellandi, Alberto Marretta, Horst Bischof
Abstract	The development of powerful 3D scanning hardware and reconstruction algorithms has strongly promoted the generation of 3D surface reconstructions in different domains. An area of special interest for such 3D reconstructions is the cultural heritage domain, where surface reconstructions are generated to digitally preserve historical artifacts. While reconstruction quality nowadays is sufficient in many cases, the robust analysis (e.g. segmentation, matching, and classification) of reconstructed 3D data is still an open topic. In this paper, we target the automatic and interactive segmentation of high-resolution 3D surface reconstructions from the archaeological domain. To foster research in this field, we introduce a fully annotated and publicly available large-scale 3D surface dataset including high-resolution meshes, depth maps and point clouds as a novel benchmark dataset to the community. We provide baseline results for our existing random forest-based approach and for the first time investigate segmentation with convolutional neural networks (CNNs) on the data. Results show that both approaches have complementary strengths and weaknesses and that the provided dataset represents a challenge for future research.
Tasks	Interactive Segmentation
Published	2016-10-06
URL	http://arxiv.org/abs/1610.01944v3
PDF	http://arxiv.org/pdf/1610.01944v3.pdf
PWC	https://paperswithcode.com/paper/petrosurf3d-a-dataset-for-high-resolution-3d
Repo
Framework


Iterative Refinement for Machine Translation


Title	Iterative Refinement for Machine Translation
Authors	Roman Novak, Michael Auli, David Grangier
Abstract	Existing machine translation decoding algorithms generate translations in a strictly monotonic fashion and never revisit previous decisions. As a result, earlier mistakes cannot be corrected at a later stage. In this paper, we present a translation scheme that starts from an initial guess and then makes iterative improvements that may revisit previous decisions. We parameterize our model as a convolutional neural network that predicts discrete substitutions to an existing translation based on an attention mechanism over both the source sentence as well as the current translation output. By making less than one modification per sentence, we improve the output of a phrase-based translation system by up to 0.4 BLEU on WMT15 German-English translation.
Tasks	Machine Translation
Published	2016-10-20
URL	http://arxiv.org/abs/1610.06602v3
PDF	http://arxiv.org/pdf/1610.06602v3.pdf
PWC	https://paperswithcode.com/paper/iterative-refinement-for-machine-translation
Repo
Framework

Direct Visual Odometry using Bit-Planes


Title	Direct Visual Odometry using Bit-Planes
Authors	Hatem Alismail, Brett Browning, Simon Lucey
Abstract	Feature descriptors, such as SIFT and ORB, are well-known for their robustness to illumination changes, which has made them popular for feature-based VSLAM. However, in degraded imaging conditions such as low light, low texture, blur and specular reflections, feature extraction is often unreliable. In contrast, direct VSLAM methods, which estimate the camera pose by minimizing the photometric error using raw pixel intensities, are often more robust to low-textured environments and blur. Nonetheless, at the core of direct VSLAM is the reliance on a consistent photometric appearance across images, otherwise known as the brightness constancy assumption. Unfortunately, brightness constancy seldom holds in real-world applications. In this work, we overcome brightness constancy by incorporating feature descriptors into a direct visual odometry framework. This combination results in an efficient algorithm that combines the strengths of both feature-based algorithms and direct methods. Namely, we achieve robustness to arbitrary photometric variations while operating in low-textured and poorly lit environments. Our approach utilizes an efficient binary descriptor, which we call Bit-Planes, and we show how it can be used in the gradient-based optimization required by direct methods. Moreover, we show that the squared Euclidean distance between Bit-Planes is equivalent to the Hamming distance. Hence, the descriptor may be used in least squares optimization without sacrificing its photometric invariance. Finally, we present empirical results that demonstrate the robustness of the approach in poorly lit underground environments.
Tasks	Visual Odometry
Published	2016-04-04
URL	http://arxiv.org/abs/1604.00990v1
PDF	http://arxiv.org/pdf/1604.00990v1.pdf
PWC	https://paperswithcode.com/paper/direct-visual-odometry-using-bit-planes
Repo
Framework
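
The claim that the squared Euclidean distance between Bit-Planes equals the Hamming distance can be checked on a toy patch. The sketch below is a minimal illustration, not the authors' implementation: it assumes an 8-neighbour comparison layout (one binary channel per neighbour), which the abstract does not specify, and the helper names `bit_planes`, `squared_euclidean` and `hamming` are hypothetical.

```python
# Assumed layout: one binary channel per 8-connected neighbour.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

def bit_planes(img, r, c):
    """Return 8 bits for pixel (r, c): 1 where the neighbour is brighter than the centre."""
    center = img[r][c]
    return [1 if img[r + dr][c + dc] > center else 0 for dr, dc in OFFSETS]

def squared_euclidean(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def hamming(u, v):
    return sum(a != b for a, b in zip(u, v))

img_a = [[10, 20, 30], [40, 50, 60], [70, 80, 90]]
img_b = [[90, 20, 30], [40, 50, 10], [70, 80, 5]]
# Same scene as img_a under a gain and bias change (a monotonic brightness change).
img_a_bright = [[2 * p + 10 for p in row] for row in img_a]

desc_a = bit_planes(img_a, 1, 1)
desc_b = bit_planes(img_b, 1, 1)
desc_a_bright = bit_planes(img_a_bright, 1, 1)

# For 0/1 vectors each differing bit contributes (1 - 0)^2 = 1, so both distances agree.
print(squared_euclidean(desc_a, desc_b), hamming(desc_a, desc_b))
# Only the ordering of intensities matters, so the descriptor survives the brightness change.
print(desc_a == desc_a_bright)
```

Because every channel holds only 0 or 1, a least-squares residual over the channels counts differing bits, which is why such a descriptor can sit inside a gradient-based least-squares solver.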

Filter based Taxonomy Modification for Improving Hierarchical Classification


Title	Filter based Taxonomy Modification for Improving Hierarchical Classification
Authors	Azad Naik, Huzefa Rangwala
Abstract	Hierarchical Classification (HC) is a supervised learning problem where unlabeled instances are classified into a taxonomy of classes. Several methods that utilize the hierarchical structure have been developed to improve the HC performance. However, in most cases the hierarchical structure defined a priori by domain experts is inconsistent; as a consequence, performance improvement is not noticeable in comparison to flat classification methods. We propose a scalable data-driven filter-based rewiring approach to modify an expert-defined hierarchy. Experimental comparisons of top-down HC with our modified hierarchy on a wide range of datasets show classification performance improvement over the baseline hierarchy (i.e., defined by experts), clustered hierarchy and flattening-based hierarchy modification approaches. In comparison to existing rewiring approaches, our developed method (rewHier) is computationally efficient, enabling it to scale to datasets with large numbers of classes, instances and features. We also show that our modified hierarchy leads to improved classification performance for classes with few training samples in comparison to flat and state-of-the-art HC approaches.
Tasks
Published	2016-03-02
URL	http://arxiv.org/abs/1603.00772v3
PDF	http://arxiv.org/pdf/1603.00772v3.pdf
PWC	https://paperswithcode.com/paper/filter-based-taxonomy-modification-for
Repo
Framework

Texture and Color-based Image Retrieval Using the Local Extrema Features and Riemannian Distance


Title	Texture and Color-based Image Retrieval Using the Local Extrema Features and Riemannian Distance
Authors	Minh-Tan Pham, Grégoire Mercier, Lionel Bombrun, Julien Michel
Abstract	A novel efficient method for content-based image retrieval (CBIR) is developed in this paper using both texture and color features. Our motivation is to represent and characterize an input image by a set of local descriptors extracted at characteristic points (i.e. keypoints) within the image. Then, the dissimilarity measure between images is calculated based on the geometric distance between the topological feature spaces (i.e. manifolds) formed by the sets of local descriptors generated from these images. In this work, we propose to extract and use the local extrema pixels as our feature points. Then, the so-called local extrema-based descriptor (LED) is generated for each keypoint by integrating all color, spatial as well as gradient information captured by a set of its nearest local extrema. Hence, each image is encoded by an LED feature point cloud, and Riemannian distances between these point clouds enable us to tackle CBIR. Experiments performed on Vistex, Stex and colored Brodatz texture databases using the proposed approach provide very efficient and competitive results compared to the state-of-the-art methods.
Tasks	Content-Based Image Retrieval, Image Retrieval
Published	2016-11-07
URL	http://arxiv.org/abs/1611.02102v2
PDF	http://arxiv.org/pdf/1611.02102v2.pdf
PWC	https://paperswithcode.com/paper/texture-and-color-based-image-retrieval-using
Repo
Framework

Sub-sampled Newton Methods with Non-uniform Sampling


Title	Sub-sampled Newton Methods with Non-uniform Sampling
Authors	Peng Xu, Jiyan Yang, Farbod Roosta-Khorasani, Christopher Ré, Michael W. Mahoney
Abstract	We consider the problem of finding the minimizer of a convex function $F: \mathbb R^d \rightarrow \mathbb R$ of the form $F(w) := \sum_{i=1}^n f_i(w) + R(w)$ where a low-rank factorization of $\nabla^2 f_i(w)$ is readily available. We consider the regime where $n \gg d$. As second-order methods prove to be effective in finding the minimizer to high precision, in this work, we propose randomized Newton-type algorithms that exploit non-uniform sub-sampling of $\{\nabla^2 f_i(w)\}_{i=1}^{n}$, as well as inexact updates, as means to reduce the computational complexity. Two non-uniform sampling distributions based on block norm squares and block partial leverage scores are considered in order to capture important terms among $\{\nabla^2 f_i(w)\}_{i=1}^{n}$. We show that at each iteration non-uniformly sampling at most $\mathcal O(d \log d)$ terms from $\{\nabla^2 f_i(w)\}_{i=1}^{n}$ is sufficient to achieve a linear-quadratic convergence rate in $w$ when a suitable initial point is provided. In addition, we show that our algorithms achieve a lower computational complexity and exhibit more robustness and better dependence on problem specific quantities, such as the condition number, compared to similar existing methods, especially the ones based on uniform sampling. Finally, we empirically demonstrate that our methods are at least twice as fast as Newton’s methods with ridge logistic regression on several real datasets.
Tasks
Published	2016-07-02
URL	http://arxiv.org/abs/1607.00559v2
PDF	http://arxiv.org/pdf/1607.00559v2.pdf
PWC	https://paperswithcode.com/paper/sub-sampled-newton-methods-with-non-uniform
Repo
Framework
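
The block-norm-squares sampling idea can be sketched for ridge logistic regression, where each $\nabla^2 f_i(w) = A_i^T A_i$ with a single-row factor $A_i$. The code below is a rough illustration under several assumptions that are not taken from the paper: sampling is done with replacement, each sampled term is rescaled by $1/(s\,p_i)$ so the estimate is unbiased, $R(w) = \frac{\lambda}{2}\|w\|^2$, and all function names are hypothetical.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def hessian_factors(X, w):
    """Row i is A_i, where the logistic-loss Hessian of sample i is A_i^T A_i."""
    factors = []
    for x in X:
        p = sigmoid(sum(xj * wj for xj, wj in zip(x, w)))
        scale = math.sqrt(p * (1.0 - p))
        factors.append([scale * xj for xj in x])
    return factors

def block_norm_probs(factors):
    """Sampling probabilities proportional to the squared norm of each block A_i."""
    norms = [sum(a * a for a in row) for row in factors]
    total = sum(norms)
    return [n / total for n in norms]

def subsampled_hessian(X, w, lam, s, rng):
    """Estimate sum_i A_i^T A_i + lam * I from s non-uniformly sampled terms."""
    factors = hessian_factors(X, w)
    probs = block_norm_probs(factors)
    d = len(w)
    H = [[lam if r == c else 0.0 for c in range(d)] for r in range(d)]
    for i in rng.choices(range(len(X)), weights=probs, k=s):
        weight = 1.0 / (s * probs[i])  # rescaling keeps the estimate unbiased
        a = factors[i]
        for r in range(d):
            for c in range(d):
                H[r][c] += weight * a[r] * a[c]
    return H

rng = random.Random(0)
X = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(200)]  # n >> d toy data
w = [0.1, -0.2, 0.3]
probs = block_norm_probs(hessian_factors(X, w))
H = subsampled_hessian(X, w, lam=0.1, s=20, rng=rng)
```

A Newton-type step would then solve a linear system with `H` in place of the full Hessian. The sample size `s = 20` is arbitrary here and is not the $\mathcal O(d \log d)$ bound from the abstract.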

Uncovering Locally Discriminative Structure for Feature Analysis


Title	Uncovering Locally Discriminative Structure for Feature Analysis
Authors	Sen Wang, Feiping Nie, Xiaojun Chang, Xue Li, Quan Z. Sheng, Lina Yao
Abstract	Manifold structure learning is often used to exploit geometric information among data in semi-supervised feature learning algorithms. In this paper, we find that local discriminative information is also of importance for semi-supervised feature learning. We propose a method that utilizes both the manifold structure of data and local discriminant information. Specifically, we define a local clique for each data point. k-Nearest Neighbors (kNN) is used to determine the structural information within each clique. We then apply a variant of the Fisher criterion model to each clique for local discriminant evaluation and sum all cliques as global integration into the framework. In this way, local discriminant information is embedded. Labels are also utilized to minimize distances between data from the same class. In addition, we use the kernel method to extend our proposed model and facilitate feature learning in a high-dimensional space after feature mapping. Experimental results show that our method is superior to all other compared methods over a number of datasets.
Tasks
Published	2016-07-09
URL	http://arxiv.org/abs/1607.02559v1
PDF	http://arxiv.org/pdf/1607.02559v1.pdf
PWC	https://paperswithcode.com/paper/uncovering-locally-discriminative-structure
Repo
Framework

Proximal Quasi-Newton Methods for Regularized Convex Optimization with Linear and Accelerated Sublinear Convergence Rates


Title	Proximal Quasi-Newton Methods for Regularized Convex Optimization with Linear and Accelerated Sublinear Convergence Rates
Authors	Hiva Ghanbari, Katya Scheinberg
Abstract	In [19], a general, inexact, efficient proximal quasi-Newton algorithm for composite optimization problems has been proposed and a sublinear global convergence rate has been established. In this paper, we analyze the convergence properties of this method, both in the exact and inexact setting, in the case when the objective function is strongly convex. We also investigate a practical variant of this method by establishing a simple stopping criterion for the subproblem optimization. Furthermore, we consider an accelerated variant, based on FISTA [1], to the proximal quasi-Newton algorithm. A similar accelerated method has been considered in [7], where the convergence rate analysis relies on very strong impractical assumptions. We present a modified analysis while relaxing these assumptions and perform a practical comparison of the accelerated proximal quasi-Newton algorithm and the regular one. Our analysis and computational results show that acceleration may not bring any benefit in the quasi-Newton setting.
Tasks
Published	2016-07-11
URL	http://arxiv.org/abs/1607.03081v2
PDF	http://arxiv.org/pdf/1607.03081v2.pdf
PWC	https://paperswithcode.com/paper/proximal-quasi-newton-methods-for-regularized
Repo
Framework
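
For readers unfamiliar with FISTA, which the accelerated variant above builds on, the following is a minimal sketch of plain FISTA for an $\ell_1$-regularized problem. It uses a fixed step $1/L$ rather than a quasi-Newton Hessian approximation, so it is not the algorithm analyzed in the paper. The separable quadratic test problem and all names are illustrative assumptions.

```python
import math

def soft_threshold(v, tau):
    """Proximal operator of tau * |v| (scalar soft-thresholding)."""
    return math.copysign(max(abs(v) - tau, 0.0), v)

def fista(grad, lipschitz, lam, x0, iters):
    """Minimise f(x) + lam * ||x||_1 using step 1/L and FISTA momentum."""
    x_prev = list(x0)
    y = list(x0)
    t = 1.0
    for _ in range(iters):
        g = grad(y)
        # Proximal gradient step taken from the extrapolated point y.
        x = [soft_threshold(yi - gi / lipschitz, lam / lipschitz)
             for yi, gi in zip(y, g)]
        t_next = (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0
        # Momentum: extrapolate along the last displacement.
        y = [xi + (t - 1.0) / t_next * (xi - xpi) for xi, xpi in zip(x, x_prev)]
        x_prev, t = x, t_next
    return x_prev

# Toy smooth part: f(x) = 0.5 * sum_i curv_i * (x_i - target_i)^2.
curv = [1.0, 4.0, 10.0]
target = [3.0, -0.2, 1.0]
lam = 1.0

def grad(x):
    return [c * (xi - bi) for c, xi, bi in zip(curv, x, target)]

solution = fista(grad, lipschitz=max(curv), lam=lam, x0=[0.0, 0.0, 0.0], iters=2000)
# The problem is separable, so each coordinate has a closed-form minimiser.
closed_form = [soft_threshold(b, lam / c) for b, c in zip(target, curv)]
```

Comparing `solution` with `closed_form` gives a quick sanity check of the iteration; a proximal quasi-Newton method would replace the scalar `1 / lipschitz` with a Hessian approximation and solve the resulting subproblem, possibly inexactly.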

Local Region Sparse Learning for Image-on-Scalar Regression


Title	Local Region Sparse Learning for Image-on-Scalar Regression
Authors	Yao Chen, Xiao Wang, Linglong Kong, Hongtu Zhu
Abstract	Identification of regions of interest (ROI) associated with certain diseases has a great impact on public health. Imposing sparsity of pixel values and extracting active regions simultaneously greatly complicate the image analysis. We address these challenges by introducing a novel region-selection penalty in the framework of image-on-scalar regression. Our penalty combines the Smoothly Clipped Absolute Deviation (SCAD) regularization, enforcing sparsity, and the SCAD of total variation (TV) regularization, enforcing spatial contiguity, into one group, which segments contiguous spatial regions against zero-valued background. An efficient algorithm is based on the alternating direction method of multipliers (ADMM), which decomposes the non-convex problem into two iterative optimization problems with explicit solutions. Another virtue of the proposed method is that a divide-and-conquer learning algorithm is developed, thereby allowing scaling to large images. Several examples are presented and the experimental results are compared with other state-of-the-art approaches.
Tasks	Sparse Learning
Published	2016-05-27
URL	http://arxiv.org/abs/1605.08501v1
PDF	http://arxiv.org/pdf/1605.08501v1.pdf
PWC	https://paperswithcode.com/paper/local-region-sparse-learning-for-image-on
Repo
Framework
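
The two ingredients of the penalty can be sketched as follows. The SCAD function is the standard one (with the customary default $a = 3.7$). Applying it to a 1-D coefficient vector and to its first differences is a simplifying assumption made here for illustration: the paper works with images, and its exact grouping and weighting are not given in the abstract. The name `region_penalty` is hypothetical.

```python
def scad_penalty(t, lam, a=3.7):
    """Standard SCAD penalty: linear near zero, quadratic transition, then constant."""
    t = abs(t)
    if t <= lam:
        return lam * t
    if t <= a * lam:
        return (2.0 * a * lam * t - t * t - lam * lam) / (2.0 * (a - 1.0))
    return lam * lam * (a + 1.0) / 2.0

def region_penalty(beta, lam_sparse, lam_tv):
    """SCAD on the values (sparsity) plus SCAD on first differences (contiguity)."""
    sparsity = sum(scad_penalty(b, lam_sparse) for b in beta)
    variation = sum(scad_penalty(b1 - b0, lam_tv) for b0, b1 in zip(beta, beta[1:]))
    return sparsity + variation

# One contiguous active region versus the same values scattered across the signal.
contiguous = [0.0, 0.0, 2.0, 2.0, 2.0, 0.0]
scattered = [2.0, 0.0, 2.0, 0.0, 2.0, 0.0]
print(region_penalty(contiguous, 1.0, 1.0) < region_penalty(scattered, 1.0, 1.0))
```

Both signals have the same sparsity term, so the difference comes entirely from the variation term, which charges the scattered signal for its extra jumps.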

3D Simulation for Robot Arm Control with Deep Q-Learning


Title	3D Simulation for Robot Arm Control with Deep Q-Learning
Authors	Stephen James, Edward Johns
Abstract	Recent trends in robot arm control have seen a shift towards end-to-end solutions, using deep reinforcement learning to learn a controller directly from raw sensor data, rather than relying on a hand-crafted, modular pipeline. However, the high dimensionality of the state space often means that it is impractical to generate sufficient training data with real-world experiments. As an alternative solution, we propose to learn a robot controller in simulation, with the potential of then transferring this to a real robot. Building upon the recent success of deep Q-networks, we present an approach which uses 3D simulations to train a 7-DOF robotic arm in a control task without any prior knowledge. The controller accepts images of the environment as its only input, and outputs motor actions for the task of locating and grasping a cube, over a range of initial configurations. To encourage efficient learning, a structured reward function is designed with intermediate rewards. We also present preliminary results in direct transfer of policies over to a real robot, without any further training.
Tasks	Q-Learning
Published	2016-09-13
URL	http://arxiv.org/abs/1609.03759v2
PDF	http://arxiv.org/pdf/1609.03759v2.pdf
PWC	https://paperswithcode.com/paper/3d-simulation-for-robot-arm-control-with-deep
Repo
Framework
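
The role of intermediate rewards can be illustrated with a toy tabular Q-learning loop. This is a deliberately simplified stand-in, not the paper's setup: the paper trains a deep Q-network on images for a 7-DOF arm, whereas the sketch below assumes a hypothetical 1-D track with the cube at position 5, a small reward for each step that reduces the distance to the cube, and a final reward on reaching it.

```python
import random

GOAL = 5             # hypothetical cube position on a 1-D track
ACTIONS = (-1, +1)   # move left / move right

def step(state, action):
    nxt = min(max(state + action, 0), GOAL)
    # Intermediate reward: positive when the move reduces the distance to the goal.
    reward = 0.1 * (abs(GOAL - state) - abs(GOAL - nxt))
    done = nxt == GOAL
    if done:
        reward += 1.0  # final reward for reaching the cube
    return nxt, reward, done

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(GOAL + 1)]
    for _ in range(episodes):
        state = 0
        for _ in range(50):  # step limit per episode
            if rng.random() < epsilon:
                a = rng.randrange(2)  # explore
            else:
                a = max(range(2), key=lambda i: q[state][i])  # exploit
            nxt, reward, done = step(state, ACTIONS[a])
            # Q-learning target: bootstrap from the best next action unless terminal.
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][a] += alpha * (target - q[state][a])
            state = nxt
            if done:
                break
    return q

q_table = train()
policy = [ACTIONS[max(range(2), key=lambda i: row[i])] for row in q_table[:GOAL]]
print(policy)
```

In a deep Q-network the table `q` would be replaced by a neural network that maps images to action values, with the same bootstrapped target used as the regression label.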

Can neural machine translation do simultaneous translation?


Title	Can neural machine translation do simultaneous translation?
Authors	Kyunghyun Cho, Masha Esipova
Abstract	We investigate the potential of attention-based neural machine translation in simultaneous translation. We introduce a novel decoding algorithm, called simultaneous greedy decoding, that allows an existing neural machine translation model to begin translating before a full source sentence is received. This approach differs from previous work on simultaneous translation in that segmentation and translation are done jointly to maximize the translation quality and that translating each segment is strongly conditioned on all the previous segments. This paper presents a first step toward building a full simultaneous translation system based on neural machine translation.
Tasks	Machine Translation
Published	2016-06-07
URL	http://arxiv.org/abs/1606.02012v1
PDF	http://arxiv.org/pdf/1606.02012v1.pdf
PWC	https://paperswithcode.com/paper/can-neural-machine-translation-do
Repo
Framework

EEG-informed attended speaker extraction from recorded speech mixtures with application in neuro-steered hearing prostheses


Title	EEG-informed attended speaker extraction from recorded speech mixtures with application in neuro-steered hearing prostheses
Authors	Simon Van Eyndhoven, Tom Francart, Alexander Bertrand
Abstract	OBJECTIVE: We aim to extract and denoise the attended speaker in a noisy, two-speaker acoustic scenario, relying on microphone array recordings from a binaural hearing aid, which are complemented with electroencephalography (EEG) recordings to infer the speaker of interest. METHODS: In this study, we propose a modular processing flow that first extracts the two speech envelopes from the microphone recordings, then selects the attended speech envelope based on the EEG, and finally uses this envelope to inform a multi-channel speech separation and denoising algorithm. RESULTS: Strong suppression of interfering (unattended) speech and background noise is achieved, while the attended speech is preserved. Furthermore, EEG-based auditory attention detection (AAD) is shown to be robust to the use of noisy speech signals. CONCLUSIONS: Our results show that AAD-based speaker extraction from microphone array recordings is feasible and robust, even in noisy acoustic environments, and without access to the clean speech signals to perform EEG-based AAD. SIGNIFICANCE: Current research on AAD always assumes the availability of the clean speech signals, which limits the applicability in real settings. We have extended this research to detect the attended speaker even when only microphone recordings with noisy speech mixtures are available. This is an enabling ingredient for new brain-computer interfaces and effective filtering schemes in neuro-steered hearing prostheses. Here, we provide a first proof of concept for EEG-informed attended speaker extraction and denoising.
Tasks	Denoising, EEG, Speech Separation
Published	2016-02-18
URL	http://arxiv.org/abs/1602.05702v4
PDF	http://arxiv.org/pdf/1602.05702v4.pdf
PWC	https://paperswithcode.com/paper/eeg-informed-attended-speaker-extraction-from
Repo
Framework

Model-based Adversarial Imitation Learning


Title	Model-based Adversarial Imitation Learning
Authors	Nir Baram, Oron Anschel, Shie Mannor
Abstract	Generative adversarial learning is a popular new approach to training generative models which has been proven successful for other related problems as well. The general idea is to maintain an oracle $D$ that discriminates between the expert’s data distribution and that of the generative model $G$. The generative model is trained to capture the expert’s distribution by maximizing the probability of $D$ misclassifying the data it generates. Overall, the system is differentiable end-to-end and is trained using basic backpropagation. This type of learning was successfully applied to the problem of policy imitation in a model-free setup. However, a model-free approach does not allow the system to be differentiable, which requires the use of high-variance gradient estimations. In this paper we introduce the Model-based Adversarial Imitation Learning (MAIL) algorithm, a model-based approach for the problem of adversarial imitation learning. We show how to use a forward model to make the system fully differentiable, which enables us to train policies using the (stochastic) gradient of $D$. Moreover, our approach requires relatively few environment interactions, and fewer hyper-parameters to tune. We test our method on the MuJoCo physics simulator and report initial results that surpass the current state-of-the-art.
Tasks	Imitation Learning
Published	2016-12-07
URL	http://arxiv.org/abs/1612.02179v1
PDF	http://arxiv.org/pdf/1612.02179v1.pdf
PWC	https://paperswithcode.com/paper/model-based-adversarial-imitation-learning
Repo
Framework

Curvature Integration in a 5D Kernel for Extracting Vessel Connections in Retinal Images


Title	Curvature Integration in a 5D Kernel for Extracting Vessel Connections in Retinal Images
Authors	Samaneh Abbasi-Sureshjani, Marta Favali, Giovanna Citti, Alessandro Sarti, Bart M. ter Haar Romeny
Abstract	Tree-like structures such as those in retinal images are widely studied in computer-aided diagnosis systems for large-scale screening programs. Despite several segmentation and tracking methods proposed in the literature, there still exist several limitations, specifically when two or more curvilinear structures cross or bifurcate, or in the presence of interrupted lines or highly curved blood vessels. In this paper, we propose a novel approach based on multi-orientation scores augmented with a contextual affinity matrix, both of which are inspired by the geometry of the primary visual cortex (V1) and its contextual connections. The connectivity is described with a five-dimensional kernel obtained as the fundamental solution of the Fokker-Planck equation modelling the cortical connectivity in the lifted space of positions, orientations, curvatures and intensity. It is further used in a self-tuning spectral clustering step to identify the main perceptual units in the stimuli. The proposed method has been validated on several easy and challenging structures in a set of artificial images and actual retinal patches. Supported by quantitative and qualitative results, the method is capable of overcoming the limitations of current state-of-the-art techniques.
Tasks
Published	2016-08-29
URL	http://arxiv.org/abs/1608.08049v3
PDF	http://arxiv.org/pdf/1608.08049v3.pdf
PWC	https://paperswithcode.com/paper/curvature-integration-in-a-5d-kernel-for
Repo
Framework