Paper Group ANR 1638
A Distributed Approach towards Discriminative Distance Metric Learning
Title | A Distributed Approach towards Discriminative Distance Metric Learning |
Authors | Jun Li, Xun Lin, Xiaoguang Rui, Yong Rui, Dacheng Tao |
Abstract | Distance metric learning is successful in discovering intrinsic relations in data. However, most algorithms are computationally demanding when the problem size becomes large. In this paper, we propose a discriminative metric learning algorithm and develop a distributed scheme that learns metrics on moderate-sized subsets of the data and aggregates the results into a global solution. The technique leverages the power of parallel computation. The aggregated distance metric learning (ADML) algorithm scales well with the data size, and its cost can be controlled via the partition. We theoretically analyse the error induced by the distributed treatment and provide bounds for it. We have experimentally evaluated ADML, both on specially designed tests and on practical image annotation tasks. These tests show that ADML achieves state-of-the-art performance at only a fraction of the cost incurred by most existing methods. |
Tasks | Metric Learning |
Published | 2019-05-11 |
URL | https://arxiv.org/abs/1905.05177v1 |
PDF | https://arxiv.org/pdf/1905.05177v1.pdf |
PWC | https://paperswithcode.com/paper/a-distributed-approach-towards-discriminative |
Repo | |
Framework | |
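The divide-and-aggregate pattern described in the abstract can be sketched in a few lines. This is a minimal illustration, not ADML itself: the per-subset learner below is a toy within-class-scatter metric, and plain averaging stands in for the paper's aggregation rule.

```python
import numpy as np

def learn_local_metric(X, y):
    """Toy stand-in for the paper's discriminative learner: inverse of the
    within-class scatter, i.e. a Mahalanobis-style metric."""
    d = X.shape[1]
    S = np.eye(d) * 1e-3                           # regularizer keeps S invertible
    for c in np.unique(y):
        Xc = X[y == c]
        if len(Xc) > 1:
            S += np.cov(Xc, rowvar=False) * (len(Xc) - 1)
    return np.linalg.inv(S)

def aggregated_metric(X, y, n_parts=4, seed=0):
    """Partition the data, learn a metric on each moderate-sized subset
    (an embarrassingly parallel step), and aggregate the local solutions.
    Plain averaging is an assumption, not the paper's aggregation rule."""
    rng = np.random.default_rng(seed)
    parts = np.array_split(rng.permutation(len(X)), n_parts)
    locals_ = [learn_local_metric(X[p], y[p]) for p in parts]
    return sum(locals_) / n_parts

# usage on synthetic data
X = np.random.randn(400, 5)
y = np.random.randint(0, 3, size=400)
M = aggregated_metric(X, y)                        # d x d metric matrix
```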
Distribution oblivious, risk-aware algorithms for multi-armed bandits with unbounded rewards
Title | Distribution oblivious, risk-aware algorithms for multi-armed bandits with unbounded rewards |
Authors | Anmol Kagrecha, Jayakrishnan Nair, Krishna Jagannathan |
Abstract | Classical multi-armed bandit problems use the expected value of an arm as a metric to evaluate its goodness. However, the expected value is a risk-neutral metric. In many applications like finance, one is interested in balancing the expected return of an arm (or portfolio) with the risk associated with that return. In this paper, we consider the problem of selecting the arm that optimizes a linear combination of the expected reward and the associated Conditional Value at Risk (CVaR) in a fixed budget best-arm identification framework. We allow the reward distributions to be unbounded or even heavy-tailed. For this problem, our goal is to devise algorithms that are entirely distribution oblivious, i.e., the algorithm is not aware of any information on the reward distributions, including bounds on the moments/tails, or the suboptimality gaps across arms. In this paper, we provide a class of such algorithms with provable upper bounds on the probability of incorrect identification. In the process, we develop a novel estimator for the CVaR of unbounded (including heavy-tailed) random variables and prove a concentration inequality for the same, which could be of independent interest. We also compare the error bounds for our distribution oblivious algorithms with those corresponding to standard non-oblivious algorithms. Finally, numerical experiments reveal that our algorithms perform competitively when compared with non-oblivious algorithms, suggesting that distribution obliviousness can be realised in practice without incurring a significant loss of performance. |
Tasks | Multi-Armed Bandits |
Published | 2019-06-03 |
URL | https://arxiv.org/abs/1906.00569v1 |
PDF | https://arxiv.org/pdf/1906.00569v1.pdf |
PWC | https://paperswithcode.com/paper/190600569 |
Repo | |
Framework | |
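The abstract's objective can be illustrated with the textbook empirical CVaR (the paper's contribution is a truncation-based estimator for unbounded rewards, which this sketch does not reproduce). The trade-off weight `xi` and the uniform budget split are assumptions.

```python
import numpy as np

def empirical_cvar(losses, alpha=0.1):
    """Textbook empirical CVaR_alpha of a loss sample: the mean of the
    worst alpha-fraction of outcomes. The paper instead develops a
    truncation-based estimator that handles heavy tails."""
    x = np.sort(np.asarray(losses))[::-1]          # losses, largest first
    k = max(1, int(np.ceil(alpha * len(x))))
    return x[:k].mean()

def score(rewards, xi=1.0, alpha=0.1):
    """Linear combination of expected reward and the CVaR of the loss
    (-reward), as in the fixed-budget objective above; xi trades them off."""
    r = np.asarray(rewards)
    return r.mean() - xi * empirical_cvar(-r, alpha)

# fixed-budget best-arm identification with a uniform budget split
budget, K = 3000, 4
pulls = [np.random.standard_t(df=3, size=budget // K) + m
         for m in (0.0, 0.2, 0.4, 0.6)]            # heavy-tailed arms
best = max(range(K), key=lambda a: score(pulls[a]))
```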
High Fidelity Face Manipulation with Extreme Pose and Expression
Title | High Fidelity Face Manipulation with Extreme Pose and Expression |
Authors | Chaoyou Fu, Yibo Hu, Xiang Wu, Guoli Wang, Qian Zhang, Ran He |
Abstract | Face manipulation has shown remarkable advances with the flourishing of Generative Adversarial Networks. However, due to the difficulty of controlling structure and texture at high resolution, it is challenging to simultaneously model pose and expression during manipulation. In this paper, we propose a novel framework that simplifies face manipulation with extreme pose and expression into two correlated stages: a boundary prediction stage and a disentangled face synthesis stage. In the first stage, we propose to use a boundary image for joint pose and expression modeling. An encoder-decoder network is employed to predict the boundary image of the target face in a semi-supervised way. Pose and expression estimators are employed to improve the prediction accuracy. In the second stage, the predicted boundary image and the original face are encoded into the structure and texture latent spaces by two encoder networks, respectively. A proxy network and a feature threshold loss are further imposed to disentangle the latent space. Furthermore, given the lack of high-resolution face databases on which to verify the effectiveness of our method, we collect a new high-quality Multi-View Face (MVF-HQ) database at 6000 $\times$ 4000 resolution. It contains 120,283 images from 479 identities with diverse pose, expression and illumination variations, and is much larger in scale and much higher in resolution than current public high-resolution face manipulation databases. We expect it to push forward the advance of face manipulation. Qualitative and quantitative experiments on four databases show that our method dramatically improves the visual quality of face manipulation. |
Tasks | Face Generation, Face Recognition |
Published | 2019-03-28 |
URL | https://arxiv.org/abs/1903.12003v2 |
PDF | https://arxiv.org/pdf/1903.12003v2.pdf |
PWC | https://paperswithcode.com/paper/high-fidelity-face-manipulation-with-extreme |
Repo | |
Framework | |
Dynamic Convolutions: Exploiting Spatial Sparsity for Faster Inference
Title | Dynamic Convolutions: Exploiting Spatial Sparsity for Faster Inference |
Authors | Thomas Verelst, Tinne Tuytelaars |
Abstract | Modern convolutional neural networks apply the same operations on every pixel in an image. However, not all image regions are equally important. To address this inefficiency, we propose a method to dynamically apply convolutions conditioned on the input image. We introduce a residual block where a small gating branch learns which spatial positions should be evaluated. These discrete gating decisions are trained end-to-end using the Gumbel-Softmax trick, in combination with a sparsity criterion. Our experiments on Food-101, CIFAR and ImageNet show that our method has better focus on the region of interest and better accuracy than existing methods, at a lower computational complexity. Moreover, we provide an efficient CUDA implementation of our dynamic convolutions using a gather-scatter approach, achieving a significant improvement in inference speed on MobileNetV2 and ShuffleNetV2. On human pose estimation, a task that is inherently spatially sparse, the processing speed is increased by 45% with less than 0.1% loss in accuracy. |
Tasks | Pose Estimation |
Published | 2019-12-06 |
URL | https://arxiv.org/abs/1912.03203v1 |
PDF | https://arxiv.org/pdf/1912.03203v1.pdf |
PWC | https://paperswithcode.com/paper/dynamic-convolutions-exploiting-spatial |
Repo | |
Framework | |
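A minimal PyTorch sketch of the gated residual block described above, with illustrative (not the paper's) layer sizes; the discrete execute/skip decisions use `torch.nn.functional.gumbel_softmax`. Note that dense masking, as here, only zeroes outputs: the actual speedup requires the paper's gather-scatter CUDA kernels.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicResBlock(nn.Module):
    """Residual block with a small gating branch that predicts, per spatial
    position, whether the residual branch should be evaluated."""

    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.gate = nn.Conv2d(channels, 2, 1)      # 2 logits per position: skip / run

    def forward(self, x):
        logits = self.gate(x)                                               # (N,2,H,W)
        mask = F.gumbel_softmax(logits, tau=1.0, hard=True, dim=1)[:, 1:2]  # (N,1,H,W)
        out = self.conv2(F.relu(self.conv1(x)))
        return x + out * mask      # dense masking; sparse kernels would skip the work

block = DynamicResBlock(32)
y = block(torch.randn(1, 32, 56, 56))
```

During training, the paper adds a sparsity criterion; in this sketch that would be an extra loss term on the mean of the gating mask.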
Hidden State Guidance: Improving Image Captioning using An Image Conditioned Autoencoder
Title | Hidden State Guidance: Improving Image Captioning using An Image Conditioned Autoencoder |
Authors | Jialin Wu, Raymond J. Mooney |
Abstract | Most RNN-based image captioning models receive supervision only on the output words, to mimic human captions. Therefore, the hidden states can only receive noisy gradient signals via layers of back-propagation through time, leading to less accurate generated captions. Consequently, we propose a novel framework, Hidden State Guidance (HSG), that matches the hidden states in the caption decoder to those in a teacher decoder trained on the easier task of autoencoding the captions conditioned on the image. During training with the REINFORCE algorithm, the conventional rewards are sentence-based evaluation metrics distributed equally to each generated word, regardless of its relevance. HSG provides a word-level reward that helps the model learn better hidden representations. Experimental results demonstrate that HSG clearly outperforms various state-of-the-art caption decoders using either raw images or detected objects as inputs. |
Tasks | Image Captioning |
Published | 2019-10-31 |
URL | https://arxiv.org/abs/1910.14208v2 |
PDF | https://arxiv.org/pdf/1910.14208v2.pdf |
PWC | https://paperswithcode.com/paper/hidden-state-guidance-improving-image |
Repo | |
Framework | |
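A sketch of the hidden-state matching idea, under assumptions the abstract does not pin down: the L2 distance and the mean reduction are illustrative choices.

```python
import torch
import torch.nn.functional as F

def hidden_state_guidance_loss(student_h, teacher_h):
    """Penalize the distance between the caption decoder's hidden states
    and those of a teacher decoder trained to autoencode captions
    conditioned on the image. Both tensors: (batch, seq_len, hidden)."""
    return F.mse_loss(student_h, teacher_h.detach())

def word_level_reward(student_h, teacher_h):
    """Per-word reward for REINFORCE: smaller hidden-state distance at a
    word position yields a larger reward at that position."""
    d = ((student_h - teacher_h.detach()) ** 2).sum(-1)   # (batch, seq_len)
    return -d
```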
KuroNet: Pre-Modern Japanese Kuzushiji Character Recognition with Deep Learning
Title | KuroNet: Pre-Modern Japanese Kuzushiji Character Recognition with Deep Learning |
Authors | Tarin Clanuwat, Alex Lamb, Asanobu Kitamoto |
Abstract | Kuzushiji, a cursive writing style, was used in Japan for over a thousand years, starting from the 8th century. Over 3 million books on a diverse array of topics, such as literature, science, mathematics and even cooking, are preserved. However, following a change to the Japanese writing system in 1900, Kuzushiji has not been included in regular school curricula. Therefore, most Japanese natives nowadays cannot read books written or printed just 150 years ago. Museums and libraries have invested a great deal of effort into creating digital copies of these historical documents as a safeguard against fires, earthquakes and tsunamis. The result has been datasets with hundreds of millions of photographs of historical documents which can only be read by a small number of specially trained experts. Thus there has been a great deal of interest in using Machine Learning to automatically recognize these historical texts and transcribe them into modern Japanese characters. Nevertheless, several challenges in Kuzushiji recognition have made the performance of existing systems extremely poor. To tackle these challenges, we propose KuroNet, a new end-to-end model which jointly recognizes an entire page of text using a residual U-Net architecture that predicts the location and identity of all characters given a page of text (without any pre-processing). This allows the model to handle long-range context, large vocabularies, and non-standardized character layouts. We demonstrate that our system is able to successfully recognize a large fraction of pre-modern Japanese documents, and we also explore areas where our system is limited, suggesting directions for future work. |
Tasks | |
Published | 2019-10-21 |
URL | https://arxiv.org/abs/1910.09433v1 |
PDF | https://arxiv.org/pdf/1910.09433v1.pdf |
PWC | https://paperswithcode.com/paper/kuronet-pre-modern-japanese-kuzushiji |
Repo | |
Framework | |
Robust Sparse Regularization: Simultaneously Optimizing Neural Network Robustness and Compactness
Title | Robust Sparse Regularization: Simultaneously Optimizing Neural Network Robustness and Compactness |
Authors | Adnan Siraj Rakin, Zhezhi He, Li Yang, Yanzhi Wang, Liqiang Wang, Deliang Fan |
Abstract | Deep Neural Networks (DNNs) trained by gradient descent are known to be vulnerable to maliciously perturbed adversarial inputs, a.k.a. adversarial attacks. As a countermeasure, increasing model capacity has been reported as an effective approach for enhancing DNN robustness by many recent works. In this work, we show that shrinking the model size through proper weight pruning can even help improve DNN robustness under adversarial attack. To obtain a simultaneously robust and compact DNN model, we propose a multi-objective training method called Robust Sparse Regularization (RSR), which fuses several regularization techniques: channel-wise noise injection, a lasso weight penalty, and adversarial training. We conduct extensive experiments across the popular ResNet-20, ResNet-18 and VGG-16 DNN architectures to demonstrate the effectiveness of RSR against popular white-box (i.e., PGD and FGSM) and black-box attacks. Thanks to RSR, 85% of the weight connections of ResNet-18 can be pruned while still achieving 0.68% and 8.72% improvements in clean- and perturbed-data accuracy, respectively, on the CIFAR-10 dataset, in comparison to its PGD adversarial training baseline. |
Tasks | Adversarial Attack |
Published | 2019-05-30 |
URL | https://arxiv.org/abs/1905.13074v1 |
PDF | https://arxiv.org/pdf/1905.13074v1.pdf |
PWC | https://paperswithcode.com/paper/robust-sparse-regularization-simultaneously |
Repo | |
Framework | |
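Two of RSR's ingredients, adversarial training and the lasso weight penalty, can be sketched as follows; hyperparameters are assumptions, inputs are assumed to be images in [0, 1], and the channel-wise noise injection (which lives inside the model's layers) is omitted.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, step=2/255, iters=7):
    """Standard PGD within an L_inf ball, as used for adversarial training."""
    x_adv = x + torch.empty_like(x).uniform_(-eps, eps)
    for _ in range(iters):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        g = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv + step * g.sign()
        x_adv = x.detach() + (x_adv - x).clamp(-eps, eps)   # project to eps-ball
        x_adv = x_adv.clamp(0, 1)                           # keep valid pixels
    return x_adv.detach()

def rsr_style_loss(model, x, y, lam=1e-4):
    """One training objective fusing adversarial training with a lasso
    penalty. A plain lasso over all weights is used here for brevity;
    the paper's penalty is channel-wise."""
    x_adv = pgd_attack(model, x, y)
    loss = F.cross_entropy(model(x_adv), y)
    return loss + lam * sum(w.abs().sum() for w in model.parameters())
```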
Fixed-Horizon Temporal Difference Methods for Stable Reinforcement Learning
Title | Fixed-Horizon Temporal Difference Methods for Stable Reinforcement Learning |
Authors | Kristopher De Asis, Alan Chan, Silviu Pitis, Richard S. Sutton, Daniel Graves |
Abstract | We explore fixed-horizon temporal difference (TD) methods, reinforcement learning algorithms for a new kind of value function that predicts the sum of rewards over a $\textit{fixed}$ number of future time steps. To learn the value function for horizon $h$, these algorithms bootstrap from the value function for horizon $h-1$, or some shorter horizon. Because no value function bootstraps from itself, fixed-horizon methods are immune to the stability problems that plague other off-policy TD methods using function approximation (also known as “the deadly triad”). Although fixed-horizon methods require the storage of additional value functions, this gives the agent additional predictive power, while the added complexity can be substantially reduced via parallel updates, shared weights, and $n$-step bootstrapping. We show how to use fixed-horizon value functions to solve reinforcement learning problems competitively with methods such as Q-learning that learn conventional value functions. We also prove convergence of fixed-horizon temporal difference methods with linear and general function approximation. Taken together, our results establish fixed-horizon TD methods as a viable new way of avoiding the stability problems of the deadly triad. |
Tasks | Q-Learning |
Published | 2019-09-09 |
URL | https://arxiv.org/abs/1909.03906v2 |
PDF | https://arxiv.org/pdf/1909.03906v2.pdf |
PWC | https://paperswithcode.com/paper/fixed-horizon-temporal-difference-methods-for |
Repo | |
Framework | |
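A tabular sketch of the fixed-horizon TD update described above, evaluating a random policy on an assumed Gym-style environment with discrete states: each table V_h bootstraps only from V_{h-1}, never from itself.

```python
import numpy as np

def fixed_horizon_td(env, H=10, episodes=500, alpha=0.1, gamma=1.0):
    """Fixed-horizon TD policy evaluation. V[h, s] predicts the sum of
    rewards over the next h steps; V[0] is identically zero, so no value
    function ever bootstraps from itself."""
    V = np.zeros((H + 1, env.observation_space.n))
    for _ in range(episodes):
        s, _ = env.reset()
        done = False
        while not done:
            a = env.action_space.sample()              # random evaluation policy
            s2, r, terminated, truncated, _ = env.step(a)
            done = terminated or truncated
            for h in range(1, H + 1):                  # parallelizable over h
                target = r + gamma * (0.0 if terminated else V[h - 1, s2])
                V[h, s] += alpha * (target - V[h, s])
            s = s2
    return V
```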
Mirror Descent View for Neural Network Quantization
Title | Mirror Descent View for Neural Network Quantization |
Authors | Thalaiyasingam Ajanthan, Kartik Gupta, Philip H. S. Torr, Richard Hartley, Puneet K. Dokania |
Abstract | Quantizing large Neural Networks (NNs) while maintaining performance is highly desirable for resource-limited devices, due to reduced memory and time complexity. Quantization is usually formulated as a constrained optimization problem and optimized via a modified version of gradient descent. In this work, by interpreting the continuous (unconstrained) parameters as the dual of the quantized ones, we introduce a Mirror Descent (MD) framework for NN quantization. Specifically, we provide conditions on the projections (i.e., the mappings from continuous to quantized parameters) that enable us to derive valid mirror maps and, in turn, the respective MD updates. Furthermore, we present a numerically stable implementation of MD that requires storing an additional set of auxiliary (unconstrained) variables, and show that it is strikingly analogous to the Straight Through Estimator (STE) based method, which is typically viewed as a “trick” to avoid the vanishing-gradient issue. Our experiments on the CIFAR-10/100, TinyImageNet, and ImageNet classification datasets with the VGG-16, ResNet-18, and MobileNetV2 architectures show that our MD variants obtain quantized networks with state-of-the-art performance. |
Tasks | Quantization |
Published | 2019-10-18 |
URL | https://arxiv.org/abs/1910.08237v2 |
PDF | https://arxiv.org/pdf/1910.08237v2.pdf |
PWC | https://paperswithcode.com/paper/mirror-descent-view-for-neural-network |
Repo | |
Framework | |
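A minimal sketch of the numerically stable MD implementation described above, for binary quantization; the tanh mirror map is an illustrative choice, not necessarily the paper's.

```python
import torch

def md_quantization_step(aux, grad_fn, lr=0.1):
    """Keep unconstrained auxiliary variables, map them to (soft-)quantized
    weights with tanh, evaluate the loss gradient at the projected point,
    and take the gradient step in the unconstrained space -- the pattern
    that the abstract notes is strikingly analogous to STE."""
    w = torch.tanh(aux)            # projection onto (-1, 1)
    g = grad_fn(w)                 # dL/dw at the projected weights
    return aux - lr * g

# usage with a toy quadratic loss pulling the weights toward +1
aux = torch.zeros(5)
for _ in range(100):
    aux = md_quantization_step(aux, lambda w: 2 * (w - 1.0))
w_q = torch.sign(torch.tanh(aux))  # hard {-1, +1} quantization at the end
```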
Emergence of order in random languages
Title | Emergence of order in random languages |
Authors | E. DeGiuli |
Abstract | We consider languages generated by weighted context-free grammars. It is shown that the behaviour of large texts is controlled by saddle-point equations for an appropriate generating function. We then consider ensembles of grammars, in particular the Random Language Model of E. DeGiuli, Phys. Rev. Lett., 122, 128301, 2019. This model is solved in the replica-symmetric ansatz, which is valid in the high-temperature, disordered phase. It is shown that in the phase in which languages carry information, the replica symmetry must be broken. |
Tasks | Language Modelling |
Published | 2019-02-20 |
URL | https://arxiv.org/abs/1902.07516v2 |
PDF | https://arxiv.org/pdf/1902.07516v2.pdf |
PWC | https://paperswithcode.com/paper/emergence-of-order-in-random-languages |
Repo | |
Framework | |
Quickly Finding the Best Linear Model in High Dimensions
Title | Quickly Finding the Best Linear Model in High Dimensions |
Authors | Yahya Sattar, Samet Oymak |
Abstract | We study the problem of finding the best linear model that minimizes least-squares loss on a given dataset. While this problem is trivial in the low-dimensional regime, it becomes more interesting in high dimensions, where the population minimizer is assumed to lie on a manifold such as the set of sparse vectors. We propose a projected gradient descent (PGD) algorithm to estimate the population minimizer in the finite-sample regime. We establish a linear convergence rate and data-dependent estimation error bounds for PGD. Our contributions include: 1) the results are established for heavier-tailed sub-exponential distributions in addition to sub-Gaussian ones; 2) we directly analyze the empirical risk minimization and do not require a realizable model connecting input data and labels; 3) our PGD algorithm is augmented to learn the bias terms, which boosts performance. Numerical experiments validate our theoretical results. |
Tasks | |
Published | 2019-07-03 |
URL | https://arxiv.org/abs/1907.01728v1 |
PDF | https://arxiv.org/pdf/1907.01728v1.pdf |
PWC | https://paperswithcode.com/paper/quickly-finding-the-best-linear-model-in-high |
Repo | |
Framework | |
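A sketch of PGD over the sparse-vector manifold mentioned in the abstract, i.e. gradient steps followed by hard thresholding (also known as iterative hard thresholding). The sparsity level `k` is assumed known, and the bias-learning augmentation from contribution 3 is omitted.

```python
import numpy as np

def hard_threshold(w, k):
    """Projection onto k-sparse vectors: keep the k largest-magnitude
    entries, zero the rest."""
    out = np.zeros_like(w)
    idx = np.argsort(np.abs(w))[-k:]
    out[idx] = w[idx]
    return out

def pgd_sparse_ls(X, y, k, iters=200):
    """Projected gradient descent for least squares over k-sparse vectors,
    with a simple 1/L step size for the quadratic loss."""
    n, d = X.shape
    L = np.linalg.norm(X, 2) ** 2 / n          # Lipschitz constant of the gradient
    w = np.zeros(d)
    for _ in range(iters):
        grad = X.T @ (X @ w - y) / n
        w = hard_threshold(w - grad / L, k)
    return w

# usage: recover a 5-sparse planted vector from noiseless measurements
rng = np.random.default_rng(0)
X = rng.standard_normal((300, 100))
w_star = np.zeros(100)
w_star[:5] = 1.0
w_hat = pgd_sparse_ls(X, X @ w_star, k=5)
```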
JPEG XT Image Compression with Hue Compensation for Two-Layer HDR Coding
Title | JPEG XT Image Compression with Hue Compensation for Two-Layer HDR Coding |
Authors | Hiroyuki Kobayashi, Hitoshi Kiya |
Abstract | We propose a novel JPEG XT image compression method with hue compensation for two-layer HDR coding. LDR images produced from JPEG XT bitstreams exhibit some hue distortion due to tone mapping operations. To suppress this color distortion, we apply a novel hue compensation method based on the maximally saturated colors. Moreover, the bitstreams generated by the proposed method are fully compatible with the JPEG XT standard. In an experiment, the proposed method is demonstrated not only to produce images with little hue degradation but also to maintain well-mapped luminance, in terms of three criteria: TMQI, hue value in CIEDE2000, and the maximally saturated color on the constant-hue plane. |
Tasks | Image Compression |
Published | 2019-04-25 |
URL | http://arxiv.org/abs/1904.11315v1 |
PDF | http://arxiv.org/pdf/1904.11315v1.pdf |
PWC | https://paperswithcode.com/paper/jpeg-xt-image-compression-with-hue |
Repo | |
Framework | |
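As a generic illustration of hue compensation (not the paper's method, which operates on maximally saturated colors on the constant-hue plane and is evaluated with CIEDE2000), one can restore the hue channel of a tone-mapped image from a reference:

```python
import numpy as np
from matplotlib.colors import rgb_to_hsv, hsv_to_rgb

def compensate_hue(ldr, reference):
    """Tone mapping distorts hue, so copy the hue channel of a reference
    image into the tone-mapped LDR image, keeping the mapped saturation
    and value. Both inputs are float RGB arrays in [0, 1]."""
    hsv = rgb_to_hsv(np.clip(ldr, 0, 1))
    hsv_ref = rgb_to_hsv(np.clip(reference, 0, 1))
    hsv[..., 0] = hsv_ref[..., 0]      # replace distorted hue, keep S and V
    return hsv_to_rgb(hsv)
```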
Analysis of the Ethiopic Twitter Dataset for Abusive Speech in Amharic
Title | Analysis of the Ethiopic Twitter Dataset for Abusive Speech in Amharic |
Authors | Seid Muhie Yimam, Abinew Ali Ayele, Chris Biemann |
Abstract | In this paper, we present an analysis of the first Ethiopic Twitter dataset for the Amharic language, targeted at recognizing abusive speech. The dataset, written in the Fidel script, has been collected since 2014. Since several languages can be written using the Fidel script, we have used existing Amharic, Tigrinya and Ge’ez corpora to retain only the Amharic tweets. We have analyzed the tweets for abusive speech content with the following goals: analyzing the distribution and tendency of abusive speech content over time, and comparing the abusive speech content between the Twitter corpus and a general-reference Amharic corpus. |
Tasks | |
Published | 2019-12-09 |
URL | https://arxiv.org/abs/1912.04419v1 |
PDF | https://arxiv.org/pdf/1912.04419v1.pdf |
PWC | https://paperswithcode.com/paper/analysis-of-the-ethiopic-twitter-dataset-for |
Repo | |
Framework | |
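The script-based filtering step described in the abstract can be sketched as below; the Unicode ranges are the standard Ethiopic blocks, while the threshold and the per-character heuristic are assumptions. Separating Amharic from Tigrinya and Ge’ez then requires corpus-based language identification, as the paper does.

```python
import re

# Ethiopic (Fidel) Unicode blocks: main, supplement, and extended
FIDEL = re.compile(r'[\u1200-\u137F\u1380-\u139F\u2D80-\u2DDF]')

def is_fidel_tweet(text, min_ratio=0.5):
    """Keep tweets whose non-space characters are mostly in the Fidel
    script; min_ratio is an assumed threshold."""
    letters = [c for c in text if not c.isspace()]
    if not letters:
        return False
    fidel = sum(bool(FIDEL.match(c)) for c in letters)
    return fidel / len(letters) >= min_ratio

print(is_fidel_tweet("ሰላም ለዓለም"))    # True
print(is_fidel_tweet("hello world"))   # False
```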
Fashion Editing with Adversarial Parsing Learning
Title | Fashion Editing with Adversarial Parsing Learning |
Authors | Haoye Dong, Xiaodan Liang, Yixuan Zhang, Xujie Zhang, Zhenyu Xie, Bowen Wu, Ziqi Zhang, Xiaohui Shen, Jian Yin |
Abstract | Interactive fashion image manipulation, which enables users to edit images with sketches and color strokes, is an interesting research problem with great application value. Existing works often treat it as a general inpainting task and do not fully leverage the semantic structural information in fashion images. Moreover, they directly utilize conventional convolution and normalization layers to restore the incomplete image, which tends to wash away the sketch and color information. In this paper, we propose a novel Fashion Editing Generative Adversarial Network (FE-GAN), which is capable of manipulating fashion images by free-form sketches and sparse color strokes. FE-GAN consists of two modules: 1) a free-form parsing network that learns to control the human parsing generation by manipulating sketch and color; 2) a parsing-aware inpainting network that renders detailed textures with semantic guidance from the human parsing map. A new attention normalization layer is further applied at multiple scales in the decoder of the inpainting network to enhance the quality of the synthesized image. Extensive experiments on high-resolution fashion image datasets demonstrate that the proposed method significantly outperforms the state-of-the-art methods on image manipulation. |
Tasks | Human Parsing |
Published | 2019-06-03 |
URL | https://arxiv.org/abs/1906.00884v2 |
PDF | https://arxiv.org/pdf/1906.00884v2.pdf |
PWC | https://paperswithcode.com/paper/190600884 |
Repo | |
Framework | |
Guidelines for creating man-machine multimodal interfaces
Title | Guidelines for creating man-machine multimodal interfaces |
Authors | João Ranhel, Cacilda Vilela |
Abstract | Understanding the details of human multimodal interaction can elucidate many aspects of the information processing that machines must perform to interact with humans. This article gives an overview of recent findings from Linguistics regarding the organization of conversation into turns, adjacency pairs, (dis)preferred responses, (self-)repairs, etc. In addition, we describe how multiple modalities of signs interfere with each other and modify meanings. We then propose an abstract algorithm that describes how a machine can implement a double-feedback system that reproduces human-like face-to-face interaction by processing various signs, such as verbal, prosodic, facial expressions, gestures, etc. Multimodal face-to-face interactions enrich the exchange of information between agents, mainly because these agents are active all the time, emitting and interpreting signs simultaneously. This article is not about an untested new computational model. Instead, it translates findings from Linguistics into guidelines for the design of multimodal man-machine interfaces. An algorithm is presented: drawn from Linguistics, it describes how human face-to-face interactions work. The linguistic findings reported here are first steps towards the integration of multimodal communication. Some developers involved in interface design continue to work on isolated models for interpreting text, grammar, gestures and facial expressions, neglecting the interweaving of these signs. In contrast, for linguists working on state-of-the-art multimodal integration, interpreting modalities separately leads to an incomplete interpretation, if not a miscomprehension, of the information. The algorithm proposed herein is intended to guide man-machine interface designers who want to integrate multimodal components into face-to-face interactions that are as close as possible to those performed between humans. |
Tasks | |
Published | 2019-01-29 |
URL | http://arxiv.org/abs/1901.10408v1 |
PDF | http://arxiv.org/pdf/1901.10408v1.pdf |
PWC | https://paperswithcode.com/paper/guidelines-for-creating-man-machine |
Repo | |
Framework | |