Paper Group ANR 1638
A Distributed Approach towards Discriminative Distance Metric Learning
Title | A Distributed Approach towards Discriminative Distance Metric Learning |
Authors | Jun Li, Xun Lin, Xiaoguang Rui, Yong Rui, Dacheng Tao |
Abstract | Distance metric learning is successful in discovering intrinsic relations in data. However, most algorithms are computationally demanding when the problem size becomes large. In this paper, we propose a discriminative metric learning algorithm and develop a distributed scheme that learns metrics on moderate-sized subsets of the data and aggregates the results into a global solution. The technique leverages the power of parallel computation. The aggregated distance metric learning (ADML) algorithm scales well with the data size, and its cost can be controlled via the partition. We theoretically analyse the error induced by the distributed treatment and provide bounds for it. We have experimentally evaluated ADML, both on specially designed tests and on practical image annotation tasks. These tests show that ADML achieves state-of-the-art performance at only a fraction of the cost incurred by most existing methods. |
Tasks | Metric Learning |
Published | 2019-05-11 |
URL | https://arxiv.org/abs/1905.05177v1 |
PDF | https://arxiv.org/pdf/1905.05177v1.pdf |
PWC | https://paperswithcode.com/paper/a-distributed-approach-towards-discriminative |
Repo | |
Framework | |
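The divide-and-aggregate pattern described in the abstract can be sketched in a few lines. This is a minimal illustration, not ADML itself: the per-subset learner below is a toy within-class-scatter metric, and plain averaging stands in for the paper's aggregation rule.

```python
import numpy as np

def learn_local_metric(X, y):
    """Toy stand-in for the paper's discriminative learner: inverse of the
    within-class scatter, i.e. a Mahalanobis-style metric."""
    d = X.shape[1]
    S = np.eye(d) * 1e-3                           # regularizer keeps S invertible
    for c in np.unique(y):
        Xc = X[y == c]
        if len(Xc) > 1:
            S += np.cov(Xc, rowvar=False) * (len(Xc) - 1)
    return np.linalg.inv(S)

def aggregated_metric(X, y, n_parts=4, seed=0):
    """Partition the data, learn a metric on each moderate-sized subset
    (an embarrassingly parallel step), and aggregate the local solutions.
    Plain averaging is an assumption, not the paper's aggregation rule."""
    rng = np.random.default_rng(seed)
    parts = np.array_split(rng.permutation(len(X)), n_parts)
    locals_ = [learn_local_metric(X[p], y[p]) for p in parts]
    return sum(locals_) / n_parts

# usage on synthetic data
X = np.random.randn(400, 5)
y = np.random.randint(0, 3, size=400)
M = aggregated_metric(X, y)                        # d x d metric matrix
```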
Distribution oblivious, risk-aware algorithms for multi-armed bandits with unbounded rewards
Title | Distribution oblivious, risk-aware algorithms for multi-armed bandits with unbounded rewards |
Authors | Anmol Kagrecha, Jayakrishnan Nair, Krishna Jagannathan |
Abstract | Classical multi-armed bandit problems use the expected value of an arm as a metric to evaluate its goodness. However, the expected value is a risk-neutral metric. In many applications like finance, one is interested in balancing the expected return of an arm (or portfolio) with the risk associated with that return. In this paper, we consider the problem of selecting the arm that optimizes a linear combination of the expected reward and the associated Conditional Value at Risk (CVaR) in a fixed budget best-arm identification framework. We allow the reward distributions to be unbounded or even heavy-tailed. For this problem, our goal is to devise algorithms that are entirely distribution oblivious, i.e., the algorithm is not aware of any information on the reward distributions, including bounds on the moments/tails, or the suboptimality gaps across arms. In this paper, we provide a class of such algorithms with provable upper bounds on the probability of incorrect identification. In the process, we develop a novel estimator for the CVaR of unbounded (including heavy-tailed) random variables and prove a concentration inequality for the same, which could be of independent interest. We also compare the error bounds for our distribution oblivious algorithms with those corresponding to standard non-oblivious algorithms. Finally, numerical experiments reveal that our algorithms perform competitively when compared with non-oblivious algorithms, suggesting that distribution obliviousness can be realised in practice without incurring a significant loss of performance. |
Tasks | Multi-Armed Bandits |
Published | 2019-06-03 |
URL | https://arxiv.org/abs/1906.00569v1 |
PDF | https://arxiv.org/pdf/1906.00569v1.pdf |
PWC | https://paperswithcode.com/paper/190600569 |
Repo | |
Framework | |
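The abstract's objective can be illustrated with the textbook empirical CVaR (the paper's contribution is a truncation-based estimator for unbounded rewards, which this sketch does not reproduce). The trade-off weight `xi` and the uniform budget split are assumptions.

```python
import numpy as np

def empirical_cvar(losses, alpha=0.1):
    """Textbook empirical CVaR_alpha of a loss sample: the mean of the
    worst alpha-fraction of outcomes. The paper instead develops a
    truncation-based estimator that handles heavy tails."""
    x = np.sort(np.asarray(losses))[::-1]          # losses, largest first
    k = max(1, int(np.ceil(alpha * len(x))))
    return x[:k].mean()

def score(rewards, xi=1.0, alpha=0.1):
    """Linear combination of expected reward and the CVaR of the loss
    (-reward), as in the fixed-budget objective above; xi trades them off."""
    r = np.asarray(rewards)
    return r.mean() - xi * empirical_cvar(-r, alpha)

# fixed-budget best-arm identification with a uniform budget split
budget, K = 3000, 4
pulls = [np.random.standard_t(df=3, size=budget // K) + m
         for m in (0.0, 0.2, 0.4, 0.6)]            # heavy-tailed arms
best = max(range(K), key=lambda a: score(pulls[a]))
```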
High Fidelity Face Manipulation with Extreme Pose and Expression
Title | High Fidelity Face Manipulation with Extreme Pose and Expression |
Authors | Chaoyou Fu, Yibo Hu, Xiang Wu, Guoli Wang, Qian Zhang, Ran He |
Abstract | Face manipulation has shown remarkable advances with the flourishing of Generative Adversarial Networks. However, due to the difficulty of controlling structure and texture at high resolution, it is challenging to simultaneously model pose and expression during manipulation. In this paper, we propose a novel framework that simplifies face manipulation with extreme pose and expression into two correlated stages: a boundary prediction stage and a disentangled face synthesis stage. In the first stage, we propose to use a boundary image for joint pose and expression modeling. An encoder-decoder network is employed to predict the boundary image of the target face in a semi-supervised way. Pose and expression estimators are employed to improve the prediction accuracy. In the second stage, the predicted boundary image and the original face are encoded into the structure and texture latent spaces by two encoder networks, respectively. A proxy network and a feature threshold loss are further imposed to disentangle the latent space. Furthermore, given the lack of high-resolution face databases on which to verify the effectiveness of our method, we collect a new high-quality Multi-View Face (MVF-HQ) database at 6000 $\times$ 4000 resolution. It contains 120,283 images from 479 identities with diverse pose, expression and illumination variations, and is much larger in scale and much higher in resolution than current public high-resolution face manipulation databases. We expect it to push forward the advance of face manipulation. Qualitative and quantitative experiments on four databases show that our method dramatically improves the visual quality of face manipulation. |
Tasks | Face Generation, Face Recognition |
Published | 2019-03-28 |
URL | https://arxiv.org/abs/1903.12003v2 |
PDF | https://arxiv.org/pdf/1903.12003v2.pdf |
PWC | https://paperswithcode.com/paper/high-fidelity-face-manipulation-with-extreme |
Repo | |
Framework | |
Dynamic Convolutions: Exploiting Spatial Sparsity for Faster Inference
Title | Dynamic Convolutions: Exploiting Spatial Sparsity for Faster Inference |
Authors | Thomas Verelst, Tinne Tuytelaars |
Abstract | Modern convolutional neural networks apply the same operations on every pixel in an image. However, not all image regions are equally important. To address this inefficiency, we propose a method to dynamically apply convolutions conditioned on the input image. We introduce a residual block where a small gating branch learns which spatial positions should be evaluated. These discrete gating decisions are trained end-to-end using the Gumbel-Softmax trick, in combination with a sparsity criterion. Our experiments on Food-101, CIFAR and ImageNet show that our method has better focus on the region of interest and better accuracy than existing methods, at a lower computational complexity. Moreover, we provide an efficient CUDA implementation of our dynamic convolutions using a gather-scatter approach, achieving a significant improvement in inference speed on MobileNetV2 and ShuffleNetV2. On human pose estimation, a task that is inherently spatially sparse, the processing speed is increased by 45% with less than 0.1% loss in accuracy. |
Tasks | Pose Estimation |
Published | 2019-12-06 |
URL | https://arxiv.org/abs/1912.03203v1 |
PDF | https://arxiv.org/pdf/1912.03203v1.pdf |
PWC | https://paperswithcode.com/paper/dynamic-convolutions-exploiting-spatial |
Repo | |
Framework | |
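A minimal PyTorch sketch of the gated residual block described above, with illustrative (not the paper's) layer sizes; the discrete execute/skip decisions use `torch.nn.functional.gumbel_softmax`. Note that dense masking, as here, only zeroes outputs: the actual speedup requires the paper's gather-scatter CUDA kernels.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicResBlock(nn.Module):
    """Residual block with a small gating branch that predicts, per spatial
    position, whether the residual branch should be evaluated."""

    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.gate = nn.Conv2d(channels, 2, 1)      # 2 logits per position: skip / run

    def forward(self, x):
        logits = self.gate(x)                                               # (N,2,H,W)
        mask = F.gumbel_softmax(logits, tau=1.0, hard=True, dim=1)[:, 1:2]  # (N,1,H,W)
        out = self.conv2(F.relu(self.conv1(x)))
        return x + out * mask      # dense masking; sparse kernels would skip the work

block = DynamicResBlock(32)
y = block(torch.randn(1, 32, 56, 56))
```

During training, the paper adds a sparsity criterion; in this sketch that would be an extra loss term on the mean of the gating mask.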
Hidden State Guidance: Improving Image Captioning using An Image Conditioned Autoencoder
Title | Hidden State Guidance: Improving Image Captioning using An Image Conditioned Autoencoder |
Authors | Jialin Wu, Raymond J. Mooney |
Abstract | Most RNN-based image captioning models receive supervision only on the output words, to mimic human captions. Therefore, the hidden states can only receive noisy gradient signals via layers of back-propagation through time, leading to less accurate generated captions. Consequently, we propose a novel framework, Hidden State Guidance (HSG), that matches the hidden states in the caption decoder to those in a teacher decoder trained on the easier task of autoencoding the captions conditioned on the image. During training with the REINFORCE algorithm, the conventional rewards are sentence-based evaluation metrics distributed equally to each generated word, regardless of its relevance. HSG provides a word-level reward that helps the model learn better hidden representations. Experimental results demonstrate that HSG clearly outperforms various state-of-the-art caption decoders using either raw images or detected objects as inputs. |
Tasks | Image Captioning |
Published | 2019-10-31 |
URL | https://arxiv.org/abs/1910.14208v2 |
PDF | https://arxiv.org/pdf/1910.14208v2.pdf |
PWC | https://paperswithcode.com/paper/hidden-state-guidance-improving-image |
Repo | |
Framework | |
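A sketch of the hidden-state matching idea, under assumptions the abstract does not pin down: the L2 distance and the mean reduction are illustrative choices.

```python
import torch
import torch.nn.functional as F

def hidden_state_guidance_loss(student_h, teacher_h):
    """Penalize the distance between the caption decoder's hidden states
    and those of a teacher decoder trained to autoencode captions
    conditioned on the image. Both tensors: (batch, seq_len, hidden)."""
    return F.mse_loss(student_h, teacher_h.detach())

def word_level_reward(student_h, teacher_h):
    """Per-word reward for REINFORCE: smaller hidden-state distance at a
    word position yields a larger reward at that position."""
    d = ((student_h - teacher_h.detach()) ** 2).sum(-1)   # (batch, seq_len)
    return -d
```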
KuroNet: Pre-Modern Japanese Kuzushiji Character Recognition with Deep Learning
Title | KuroNet: Pre-Modern Japanese Kuzushiji Character Recognition with Deep Learning |
Authors | Tarin Clanuwat, Alex Lamb, Asanobu Kitamoto |
Abstract | Kuzushiji, a cursive writing style, was used in Japan for over a thousand years, starting from the 8th century. Over 3 million books on a diverse array of topics, such as literature, science, mathematics and even cooking, are preserved. However, following a change to the Japanese writing system in 1900, Kuzushiji has not been included in regular school curricula. Therefore, most Japanese natives nowadays cannot read books written or printed just 150 years ago. Museums and libraries have invested a great deal of effort into creating digital copies of these historical documents as a safeguard against fires, earthquakes and tsunamis. The result has been datasets with hundreds of millions of photographs of historical documents which can only be read by a small number of specially trained experts. Thus there has been a great deal of interest in using Machine Learning to automatically recognize these historical texts and transcribe them into modern Japanese characters. Nevertheless, several challenges in Kuzushiji recognition have made the performance of existing systems extremely poor. To tackle these challenges, we propose KuroNet, a new end-to-end model which jointly recognizes an entire page of text using a residual U-Net architecture that predicts the location and identity of all characters given a page of text (without any pre-processing). This allows the model to handle long-range context, large vocabularies, and non-standardized character layouts. We demonstrate that our system is able to successfully recognize a large fraction of pre-modern Japanese documents, and we also explore areas where our system is limited, suggesting directions for future work. |
Tasks | |
Published | 2019-10-21 |
URL | https://arxiv.org/abs/1910.09433v1 |
PDF | https://arxiv.org/pdf/1910.09433v1.pdf |
PWC | https://paperswithcode.com/paper/kuronet-pre-modern-japanese-kuzushiji |
Repo | |
Framework | |
Robust Sparse Regularization: Simultaneously Optimizing Neural Network Robustness and Compactness
Title | Robust Sparse Regularization: Simultaneously Optimizing Neural Network Robustness and Compactness |
Authors | Adnan Siraj Rakin, Zhezhi He, Li Yang, Yanzhi Wang, Liqiang Wang, Deliang Fan |
Abstract | Deep Neural Networks (DNNs) trained by gradient descent are known to be vulnerable to maliciously perturbed adversarial inputs, a.k.a. adversarial attacks. As a countermeasure, increasing model capacity has been reported as an effective approach for enhancing DNN robustness by many recent works. In this work, we show that shrinking the model size through proper weight pruning can even help improve DNN robustness under adversarial attack. To obtain a simultaneously robust and compact DNN model, we propose a multi-objective training method called Robust Sparse Regularization (RSR), which fuses several regularization techniques: channel-wise noise injection, a lasso weight penalty, and adversarial training. We conduct extensive experiments across the popular ResNet-20, ResNet-18 and VGG-16 DNN architectures to demonstrate the effectiveness of RSR against popular white-box (i.e., PGD and FGSM) and black-box attacks. Thanks to RSR, 85% of the weight connections of ResNet-18 can be pruned while still achieving 0.68% and 8.72% improvements in clean- and perturbed-data accuracy, respectively, on the CIFAR-10 dataset, in comparison to its PGD adversarial training baseline. |
Tasks | Adversarial Attack |
Published | 2019-05-30 |
URL | https://arxiv.org/abs/1905.13074v1 |
PDF | https://arxiv.org/pdf/1905.13074v1.pdf |
PWC | https://paperswithcode.com/paper/robust-sparse-regularization-simultaneously |
Repo | |
Framework | |
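Two of RSR's ingredients, adversarial training and the lasso weight penalty, can be sketched as follows; hyperparameters are assumptions, inputs are assumed to be images in [0, 1], and the channel-wise noise injection (which lives inside the model's layers) is omitted.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, step=2/255, iters=7):
    """Standard PGD within an L_inf ball, as used for adversarial training."""
    x_adv = x + torch.empty_like(x).uniform_(-eps, eps)
    for _ in range(iters):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        g = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv + step * g.sign()
        x_adv = x.detach() + (x_adv - x).clamp(-eps, eps)   # project to eps-ball
        x_adv = x_adv.clamp(0, 1)                           # keep valid pixels
    return x_adv.detach()

def rsr_style_loss(model, x, y, lam=1e-4):
    """One training objective fusing adversarial training with a lasso
    penalty. A plain lasso over all weights is used here for brevity;
    the paper's penalty is channel-wise."""
    x_adv = pgd_attack(model, x, y)
    loss = F.cross_entropy(model(x_adv), y)
    return loss + lam * sum(w.abs().sum() for w in model.parameters())
```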
Fixed-Horizon Temporal Difference Methods for Stable Reinforcement Learning
Title | Fixed-Horizon Temporal Difference Methods for Stable Reinforcement Learning |
Authors | Kristopher De Asis, Alan Chan, Silviu Pitis, Richard S. Sutton, Daniel Graves |
Abstract | We explore fixed-horizon temporal difference (TD) methods, reinforcement learning algorithms for a new kind of value function that predicts the sum of rewards over a $\textit{fixed}$ number of future time steps. To learn the value function for horizon $h$, these algorithms bootstrap from the value function for horizon $h-1$, or some shorter horizon. Because no value function bootstraps from itself, fixed-horizon methods are immune to the stability problems that plague other off-policy TD methods using function approximation (also known as “the deadly triad”). Although fixed-horizon methods require the storage of additional value functions, this gives the agent additional predictive power, while the added complexity can be substantially reduced via parallel updates, shared weights, and $n$-step bootstrapping. We show how to use fixed-horizon value functions to solve reinforcement learning problems competitively with methods such as Q-learning that learn conventional value functions. We also prove convergence of fixed-horizon temporal difference methods with linear and general function approximation. Taken together, our results establish fixed-horizon TD methods as a viable new way of avoiding the stability problems of the deadly triad. |
Tasks | Q-Learning |
Published | 2019-09-09 |
URL | https://arxiv.org/abs/1909.03906v2 |
PDF | https://arxiv.org/pdf/1909.03906v2.pdf |
PWC | https://paperswithcode.com/paper/fixed-horizon-temporal-difference-methods-for |
Repo | |
Framework | |
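A tabular sketch of the fixed-horizon TD update described above, evaluating a random policy on an assumed Gym-style environment with discrete states: each table V_h bootstraps only from V_{h-1}, never from itself.

```python
import numpy as np

def fixed_horizon_td(env, H=10, episodes=500, alpha=0.1, gamma=1.0):
    """Fixed-horizon TD policy evaluation. V[h, s] predicts the sum of
    rewards over the next h steps; V[0] is identically zero, so no value
    function ever bootstraps from itself."""
    V = np.zeros((H + 1, env.observation_space.n))
    for _ in range(episodes):
        s, _ = env.reset()
        done = False
        while not done:
            a = env.action_space.sample()              # random evaluation policy
            s2, r, terminated, truncated, _ = env.step(a)
            done = terminated or truncated
            for h in range(1, H + 1):                  # parallelizable over h
                target = r + gamma * (0.0 if terminated else V[h - 1, s2])
                V[h, s] += alpha * (target - V[h, s])
            s = s2
    return V
```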
Mirror Descent View for Neural Network Quantization
Title | Mirror Descent View for Neural Network Quantization |
Authors | Thalaiyasingam Ajanthan, Kartik Gupta, Philip H. S. Torr, Richard Hartley, Puneet K. Dokania |
Abstract | Quantizing large Neural Networks (NNs) while maintaining performance is highly desirable for resource-limited devices, due to reduced memory and time complexity. Quantization is usually formulated as a constrained optimization problem and optimized via a modified version of gradient descent. In this work, by interpreting the continuous (unconstrained) parameters as the dual of the quantized ones, we introduce a Mirror Descent (MD) framework for NN quantization. Specifically, we provide conditions on the projections (i.e., the mappings from continuous to quantized parameters) that enable us to derive valid mirror maps and, in turn, the respective MD updates. Furthermore, we present a numerically stable implementation of MD that requires storing an additional set of auxiliary (unconstrained) variables, and show that it is strikingly analogous to the Straight Through Estimator (STE) based method, which is typically viewed as a “trick” to avoid the vanishing-gradient issue. Our experiments on the CIFAR-10/100, TinyImageNet, and ImageNet classification datasets with the VGG-16, ResNet-18, and MobileNetV2 architectures show that our MD variants obtain quantized networks with state-of-the-art performance. |
Tasks | Quantization |
Published | 2019-10-18 |
URL | https://arxiv.org/abs/1910.08237v2 |
PDF | https://arxiv.org/pdf/1910.08237v2.pdf |
PWC | https://paperswithcode.com/paper/mirror-descent-view-for-neural-network |
Repo | |
Framework | |
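A minimal sketch of the numerically stable MD implementation described above, for binary quantization; the tanh mirror map is an illustrative choice, not necessarily the paper's.

```python
import torch

def md_quantization_step(aux, grad_fn, lr=0.1):
    """Keep unconstrained auxiliary variables, map them to (soft-)quantized
    weights with tanh, evaluate the loss gradient at the projected point,
    and take the gradient step in the unconstrained space -- the pattern
    that the abstract notes is strikingly analogous to STE."""
    w = torch.tanh(aux)            # projection onto (-1, 1)
    g = grad_fn(w)                 # dL/dw at the projected weights
    return aux - lr * g

# usage with a toy quadratic loss pulling the weights toward +1
aux = torch.zeros(5)
for _ in range(100):
    aux = md_quantization_step(aux, lambda w: 2 * (w - 1.0))
w_q = torch.sign(torch.tanh(aux))  # hard {-1, +1} quantization at the end
```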
Emergence of order in random languages
Title | Emergence of order in random languages |
Authors | E. DeGiuli |
Abstract | We consider languages generated by weighted context-free grammars. It is shown that the behaviour of large texts is controlled by saddle-point equations for an appropriate generating function. We then consider ensembles of grammars, in particular the Random Language Model of E. DeGiuli, Phys. Rev. Lett., 122, 128301, 2019. This model is solved in the replica-symmetric ansatz, which is valid in the high-temperature, disordered phase. It is shown that in the phase in which languages carry information, the replica symmetry must be broken. |
Tasks | Language Modelling |
Published | 2019-02-20 |
URL | https://arxiv.org/abs/1902.07516v2 |
PDF | https://arxiv.org/pdf/1902.07516v2.pdf |
PWC | https://paperswithcode.com/paper/emergence-of-order-in-random-languages |
Repo | |
Framework | |
Quickly Finding the Best Linear Model in High Dimensions
Title | Quickly Finding the Best Linear Model in High Dimensions |
Authors | Yahya Sattar, Samet Oymak |
Abstract | We study the problem of finding the best linear model that minimizes least-squares loss on a given dataset. While this problem is trivial in the low-dimensional regime, it becomes more interesting in high dimensions, where the population minimizer is assumed to lie on a manifold such as the set of sparse vectors. We propose a projected gradient descent (PGD) algorithm to estimate the population minimizer in the finite-sample regime. We establish a linear convergence rate and data-dependent estimation error bounds for PGD. Our contributions include: 1) the results are established for heavier-tailed sub-exponential distributions in addition to sub-Gaussian ones; 2) we directly analyze the empirical risk minimization and do not require a realizable model connecting input data and labels; 3) our PGD algorithm is augmented to learn the bias terms, which boosts performance. Numerical experiments validate our theoretical results. |
Tasks | |
Published | 2019-07-03 |
URL | https://arxiv.org/abs/1907.01728v1 |
PDF | https://arxiv.org/pdf/1907.01728v1.pdf |
PWC | https://paperswithcode.com/paper/quickly-finding-the-best-linear-model-in-high |
Repo | |
Framework | |
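A sketch of PGD over the sparse-vector manifold mentioned in the abstract, i.e. gradient steps followed by hard thresholding (also known as iterative hard thresholding). The sparsity level `k` is assumed known, and the bias-learning augmentation from contribution 3 is omitted.

```python
import numpy as np

def hard_threshold(w, k):
    """Projection onto k-sparse vectors: keep the k largest-magnitude
    entries, zero the rest."""
    out = np.zeros_like(w)
    idx = np.argsort(np.abs(w))[-k:]
    out[idx] = w[idx]
    return out

def pgd_sparse_ls(X, y, k, iters=200):
    """Projected gradient descent for least squares over k-sparse vectors,
    with a simple 1/L step size for the quadratic loss."""
    n, d = X.shape
    L = np.linalg.norm(X, 2) ** 2 / n          # Lipschitz constant of the gradient
    w = np.zeros(d)
    for _ in range(iters):
        grad = X.T @ (X @ w - y) / n
        w = hard_threshold(w - grad / L, k)
    return w

# usage: recover a 5-sparse planted vector from noiseless measurements
rng = np.random.default_rng(0)
X = rng.standard_normal((300, 100))
w_star = np.zeros(100)
w_star[:5] = 1.0
w_hat = pgd_sparse_ls(X, X @ w_star, k=5)
```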
JPEG XT Image Compression with Hue Compensation for Two-Layer HDR Coding
Title | JPEG XT Image Compression with Hue Compensation for Two-Layer HDR Coding |
Authors | Hiroyuki Kobayashi, Hitoshi Kiya |
Abstract | We propose a novel JPEG XT image compression method with hue compensation for two-layer HDR coding. LDR images produced from JPEG XT bitstreams exhibit some hue distortion due to tone mapping operations. To suppress this color distortion, we apply a novel hue compensation method based on the maximally saturated colors. Moreover, the bitstreams generated by the proposed method are fully compatible with the JPEG XT standard. In an experiment, the proposed method is demonstrated not only to produce images with little hue degradation but also to maintain well-mapped luminance, in terms of three criteria: TMQI, hue value in CIEDE2000, and the maximally saturated color on the constant-hue plane. |
Tasks | Image Compression |
Published | 2019-04-25 |
URL | http://arxiv.org/abs/1904.11315v1 |
PDF | http://arxiv.org/pdf/1904.11315v1.pdf |
PWC | https://paperswithcode.com/paper/jpeg-xt-image-compression-with-hue |
Repo | |
Framework | |
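As a generic illustration of hue compensation (not the paper's method, which operates on maximally saturated colors on the constant-hue plane and is evaluated with CIEDE2000), one can restore the hue channel of a tone-mapped image from a reference:

```python
import numpy as np
from matplotlib.colors import rgb_to_hsv, hsv_to_rgb

def compensate_hue(ldr, reference):
    """Tone mapping distorts hue, so copy the hue channel of a reference
    image into the tone-mapped LDR image, keeping the mapped saturation
    and value. Both inputs are float RGB arrays in [0, 1]."""
    hsv = rgb_to_hsv(np.clip(ldr, 0, 1))
    hsv_ref = rgb_to_hsv(np.clip(reference, 0, 1))
    hsv[..., 0] = hsv_ref[..., 0]      # replace distorted hue, keep S and V
    return hsv_to_rgb(hsv)
```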
Analysis of the Ethiopic Twitter Dataset for Abusive Speech in Amharic
Title | Analysis of the Ethiopic Twitter Dataset for Abusive Speech in Amharic |
Authors | Seid Muhie Yimam, Abinew Ali Ayele, Chris Biemann |
Abstract | In this paper, we present an analysis of the first Ethiopic Twitter dataset for the Amharic language, targeted at recognizing abusive speech. The dataset, written in the Fidel script, has been collected since 2014. Since several languages can be written using the Fidel script, we have used existing Amharic, Tigrinya and Ge’ez corpora to retain only the Amharic tweets. We have analyzed the tweets for abusive speech content with the following goals: analyzing the distribution and tendency of abusive speech content over time, and comparing the abusive speech content between the Twitter corpus and a general-reference Amharic corpus. |
Tasks | |
Published | 2019-12-09 |
URL | https://arxiv.org/abs/1912.04419v1 |
PDF | https://arxiv.org/pdf/1912.04419v1.pdf |
PWC | https://paperswithcode.com/paper/analysis-of-the-ethiopic-twitter-dataset-for |
Repo | |
Framework | |
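The script-based filtering step described in the abstract can be sketched as below; the Unicode ranges are the standard Ethiopic blocks, while the threshold and the per-character heuristic are assumptions. Separating Amharic from Tigrinya and Ge’ez then requires corpus-based language identification, as the paper does.

```python
import re

# Ethiopic (Fidel) Unicode blocks: main, supplement, and extended
FIDEL = re.compile(r'[\u1200-\u137F\u1380-\u139F\u2D80-\u2DDF]')

def is_fidel_tweet(text, min_ratio=0.5):
    """Keep tweets whose non-space characters are mostly in the Fidel
    script; min_ratio is an assumed threshold."""
    letters = [c for c in text if not c.isspace()]
    if not letters:
        return False
    fidel = sum(bool(FIDEL.match(c)) for c in letters)
    return fidel / len(letters) >= min_ratio

print(is_fidel_tweet("ሰላም ለዓለም"))    # True
print(is_fidel_tweet("hello world"))   # False
```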
Fashion Editing with Adversarial Parsing Learning
Title | Fashion Editing with Adversarial Parsing Learning |
Authors | Haoye Dong, Xiaodan Liang, Yixuan Zhang, Xujie Zhang, Zhenyu Xie, Bowen Wu, Ziqi Zhang, Xiaohui Shen, Jian Yin |
Abstract | Interactive fashion image manipulation, which enables users to edit images with sketches and color strokes, is an interesting research problem with great application value. Existing works often treat it as a general inpainting task and do not fully leverage the semantic structural information in fashion images. Moreover, they directly utilize conventional convolution and normalization layers to restore the incomplete image, which tends to wash away the sketch and color information. In this paper, we propose a novel Fashion Editing Generative Adversarial Network (FE-GAN), which is capable of manipulating fashion images by free-form sketches and sparse color strokes. FE-GAN consists of two modules: 1) a free-form parsing network that learns to control the human parsing generation by manipulating sketch and color; 2) a parsing-aware inpainting network that renders detailed textures with semantic guidance from the human parsing map. A new attention normalization layer is further applied at multiple scales in the decoder of the inpainting network to enhance the quality of the synthesized image. Extensive experiments on high-resolution fashion image datasets demonstrate that the proposed method significantly outperforms the state-of-the-art methods on image manipulation. |
Tasks | Human Parsing |
Published | 2019-06-03 |
URL | https://arxiv.org/abs/1906.00884v2 |
PDF | https://arxiv.org/pdf/1906.00884v2.pdf |
PWC | https://paperswithcode.com/paper/190600884 |
Repo | |
Framework | |
Guidelines for creating man-machine multimodal interfaces
Title | Guidelines for creating man-machine multimodal interfaces |
Authors | João Ranhel, Cacilda Vilela |
Abstract | Understanding the details of human multimodal interaction can elucidate many aspects of the information processing that machines must perform to interact with humans. This article gives an overview of recent findings from Linguistics regarding the organization of conversation into turns, adjacency pairs, (dis)preferred responses, (self-)repairs, etc. In addition, we describe how multiple modalities of signs interfere with each other and modify meanings. We then propose an abstract algorithm that describes how a machine can implement a double-feedback system that reproduces human-like face-to-face interaction by processing various signs, such as verbal, prosodic, facial expressions, gestures, etc. Multimodal face-to-face interactions enrich the exchange of information between agents, mainly because these agents are active all the time, emitting and interpreting signs simultaneously. This article is not about an untested new computational model. Instead, it translates findings from Linguistics into guidelines for the design of multimodal man-machine interfaces. An algorithm is presented: drawn from Linguistics, it describes how human face-to-face interactions work. The linguistic findings reported here are first steps towards the integration of multimodal communication. Some developers involved in interface design continue to work on isolated models for interpreting text, grammar, gestures and facial expressions, neglecting the interweaving of these signs. In contrast, for linguists working on state-of-the-art multimodal integration, interpreting modalities separately leads to an incomplete interpretation, if not a miscomprehension, of the information. The algorithm proposed herein is intended to guide man-machine interface designers who want to integrate multimodal components into face-to-face interactions that are as close as possible to those performed between humans. |
Tasks | |
Published | 2019-01-29 |
URL | http://arxiv.org/abs/1901.10408v1 |
PDF | http://arxiv.org/pdf/1901.10408v1.pdf |
PWC | https://paperswithcode.com/paper/guidelines-for-creating-man-machine |
Repo | |
Framework | |