February 1, 2020

3064 words 15 mins read

Paper Group AWR 297

Data-Free Quantization Through Weight Equalization and Bias Correction. Learning Humanoid Robot Running Skills through Proximal Policy Optimization. Meshed-Memory Transformer for Image Captioning. Don’t Settle for Average, Go for the Max: Fuzzy Sets and Max-Pooled Word Vectors. The Effect of Translationese in Machine Translation Test Sets. Benchmar …

Data-Free Quantization Through Weight Equalization and Bias Correction

Title Data-Free Quantization Through Weight Equalization and Bias Correction
Authors Markus Nagel, Mart van Baalen, Tijmen Blankevoort, Max Welling
Abstract We introduce a data-free quantization method for deep neural networks that does not require fine-tuning or hyperparameter selection. It achieves near-original model performance on common computer vision architectures and tasks. 8-bit fixed-point quantization is essential for efficient inference on modern deep learning hardware. However, quantizing models to run in 8-bit is a non-trivial task, frequently leading to either significant performance reduction or engineering time spent on training a network to be amenable to quantization. Our approach relies on equalizing the weight ranges in the network by making use of a scale-equivariance property of activation functions. In addition, the method corrects biases in the error that are introduced during quantization. This improves quantization accuracy and can be applied to many common computer vision architectures with a straightforward API call. For common architectures, such as the MobileNet family, we achieve state-of-the-art quantized model performance. We further show that the method also extends to other computer vision architectures and tasks such as semantic segmentation and object detection.
Tasks Object Detection, Quantization, Semantic Segmentation
Published 2019-06-11
URL https://arxiv.org/abs/1906.04721v3
PDF https://arxiv.org/pdf/1906.04721v3.pdf
PWC https://paperswithcode.com/paper/data-free-quantization-through-weight
Repo https://github.com/jakc4103/DFQ
Framework pytorch
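
The cross-layer equalization idea above exploits the fact that ReLU-like activations satisfy f(s·x) = s·f(x) for s > 0, so per-channel scale can be moved between adjacent layers without changing the network's output. Below is a minimal NumPy sketch for a pair of fully-connected layers; the scaling choice s_i = sqrt(r1_i / r2_i) and the function name are illustrative assumptions, not the authors' implementation (see the linked repo for that).

```python
import numpy as np

def equalize_pair(w1, b1, w2, eps=1e-8):
    """Cross-layer range equalization sketch for two dense layers with a ReLU
    in between.  w1: (out1, in1), b1: (out1,), w2: (out2, out1).  Dividing the
    rows of w1 by s and multiplying the columns of w2 by s leaves the composed
    function unchanged because ReLU(s * x) = s * ReLU(x) for s > 0."""
    r1 = np.abs(w1).max(axis=1)          # per-output-channel range of layer 1
    r2 = np.abs(w2).max(axis=0)          # per-input-channel range of layer 2
    s = np.sqrt(r1 / (r2 + eps)) + eps   # assumed choice: makes r1/s equal to r2*s
    return w1 / s[:, None], b1 / s, w2 * s[None, :]

# quick check that the composed function is unchanged
rng = np.random.default_rng(0)
w1, b1, w2 = rng.normal(size=(16, 8)), rng.normal(size=16), rng.normal(size=(4, 16))
x = rng.normal(size=8)
w1e, b1e, w2e = equalize_pair(w1, b1, w2)
y_ref = w2 @ np.maximum(w1 @ x + b1, 0)
y_eq = w2e @ np.maximum(w1e @ x + b1e, 0)
assert np.allclose(y_ref, y_eq)
```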

Learning Humanoid Robot Running Skills through Proximal Policy Optimization

Title Learning Humanoid Robot Running Skills through Proximal Policy Optimization
Authors Luckeciano C. Melo, Marcos R. O. A. Maximo
Abstract At the current level of evolution of Soccer 3D, motion control is a key factor in a team’s performance. Recent works take advantage of model-free approaches based on Machine Learning to exploit robot dynamics in order to obtain faster locomotion skills, achieving running policies and, therefore, opening a new research direction in the Soccer 3D environment. In this work, we present a methodology based on Deep Reinforcement Learning that learns running skills without any prior knowledge, using a neural network whose inputs are related to the robot’s dynamics. Our results outperformed the previous state-of-the-art sprint velocity reported in the Soccer 3D literature by a significant margin. The approach also demonstrated improved sample efficiency, learning to run in just a few hours. We report our results by analyzing the training procedure and evaluating the policies in terms of speed, reliability, and human similarity. Finally, we present key factors that led us to improve on previous results and share some ideas for future work.
Tasks
Published 2019-10-22
URL https://arxiv.org/abs/1910.10620v1
PDF https://arxiv.org/pdf/1910.10620v1.pdf
PWC https://paperswithcode.com/paper/learning-humanoid-robot-running-skills
Repo https://github.com/luckeciano/humanoid-run-ppo
Framework none
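
For reference, Proximal Policy Optimization as used above builds on the standard clipped surrogate objective. The sketch below shows only that generic loss in PyTorch; it says nothing about the paper's reward shaping, observation space, or network architecture.

```python
import torch

def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Clipped surrogate objective of PPO (generic form).  All inputs are 1-D
    tensors over a batch of (state, action) samples; advantages are assumed to
    be precomputed (e.g. with GAE) and logp_* are log-probabilities of the
    taken actions under the new and old policies."""
    ratio = torch.exp(logp_new - logp_old)                       # pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()                 # maximize surrogate => minimize negative
```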

Meshed-Memory Transformer for Image Captioning

Title Meshed-Memory Transformer for Image Captioning
Authors Marcella Cornia, Matteo Stefanini, Lorenzo Baraldi, Rita Cucchiara
Abstract Transformer-based architectures represent the state of the art in sequence modeling tasks like machine translation and language understanding. Their applicability to multi-modal contexts like image captioning, however, is still largely under-explored. With the aim of filling this gap, we present M$^2$ - a Meshed Transformer with Memory for Image Captioning. The architecture improves both the image encoding and the language generation steps: it learns a multi-level representation of the relationships between image regions integrating learned a priori knowledge, and uses a mesh-like connectivity at the decoding stage to exploit low- and high-level features. Experimentally, we investigate the performance of the M$^2$ Transformer and different fully-attentive models in comparison with recurrent ones. When tested on COCO, our proposal achieves a new state of the art in single-model and ensemble configurations on the “Karpathy” test split and on the online test server. We also assess its performance when describing objects unseen in the training set. Trained models and code for reproducing the experiments are publicly available at: https://github.com/aimagelab/meshed-memory-transformer.
Tasks Image Captioning, Machine Translation, Text Generation
Published 2019-12-17
URL https://arxiv.org/abs/1912.08226v2
PDF https://arxiv.org/pdf/1912.08226v2.pdf
PWC https://paperswithcode.com/paper/m2-meshed-memory-transformer-for-image
Repo https://github.com/aimagelab/meshed-memory-transformer
Framework pytorch
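
The “memory” in the encoder above extends the keys and values of self-attention with learned slots that can encode a priori knowledge not present in the input regions. A single-head PyTorch sketch of that idea follows; the slot count and dimensions are illustrative, and the actual M$^2$ encoder is multi-head and multi-layer, with the meshed decoder connections not shown here.

```python
import torch
import torch.nn as nn

class MemoryAugmentedAttention(nn.Module):
    """Single-head sketch: keys/values are concatenated with learned memory
    slots, so attention can also retrieve prior knowledge independent of x."""
    def __init__(self, d_model=512, n_memory=40):
        super().__init__()
        self.q = nn.Linear(d_model, d_model)
        self.k = nn.Linear(d_model, d_model)
        self.v = nn.Linear(d_model, d_model)
        self.mem_k = nn.Parameter(torch.randn(n_memory, d_model) / d_model ** 0.5)
        self.mem_v = nn.Parameter(torch.randn(n_memory, d_model) / d_model ** 0.5)
        self.scale = d_model ** -0.5

    def forward(self, x):                         # x: (batch, regions, d_model)
        b = x.size(0)
        q = self.q(x)
        k = torch.cat([self.k(x), self.mem_k.unsqueeze(0).expand(b, -1, -1)], dim=1)
        v = torch.cat([self.v(x), self.mem_v.unsqueeze(0).expand(b, -1, -1)], dim=1)
        attn = torch.softmax(q @ k.transpose(1, 2) * self.scale, dim=-1)
        return attn @ v                           # (batch, regions, d_model)
```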

Don’t Settle for Average, Go for the Max: Fuzzy Sets and Max-Pooled Word Vectors

Title Don’t Settle for Average, Go for the Max: Fuzzy Sets and Max-Pooled Word Vectors
Authors Vitalii Zhelezniak, Aleksandar Savkov, April Shen, Francesco Moramarco, Jack Flann, Nils Y. Hammerla
Abstract Recent literature suggests that averaged word vectors followed by simple post-processing outperform many deep learning methods on semantic textual similarity tasks. Furthermore, when averaged word vectors are trained in a supervised fashion on large corpora of paraphrases, they achieve state-of-the-art results on standard STS benchmarks. Inspired by these insights, we push the limits of word embeddings even further. We propose a novel fuzzy bag-of-words (FBoW) representation for text that contains all the words in the vocabulary simultaneously but with different degrees of membership, which are derived from similarities between word vectors. We show that max-pooled word vectors are only a special case of fuzzy BoW and should be compared via the fuzzy Jaccard index rather than cosine similarity. Finally, we propose DynaMax, a completely unsupervised and non-parametric similarity measure that dynamically extracts and max-pools good features depending on the sentence pair. This method is both efficient and easy to implement, yet outperforms current baselines on STS tasks by a large margin and is even competitive with supervised word vectors trained to directly optimise cosine similarity.
Tasks Semantic Textual Similarity, Word Embeddings
Published 2019-04-30
URL http://arxiv.org/abs/1904.13264v1
PDF http://arxiv.org/pdf/1904.13264v1.pdf
PWC https://paperswithcode.com/paper/dont-settle-for-average-go-for-the-max-fuzzy-1
Repo https://github.com/Babylonpartners/fuzzymax
Framework none
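
A compact NumPy sketch of the DynaMax idea described above: the feature space is built dynamically from the union of the two sentences' word vectors, each sentence is max-pooled over it, and the resulting membership vectors are compared with the fuzzy Jaccard index. Clipping memberships at zero is a simplification assumed here; consult the linked repo for the exact algorithm.

```python
import numpy as np

def dynamax_jaccard(x, y):
    """DynaMax-style similarity sketch.  x, y are word-vector matrices for the
    two sentences, shapes (n_words_x, d) and (n_words_y, d)."""
    u = np.vstack([x, y])                      # dynamic feature space from both sentences
    mx = np.maximum((x @ u.T).max(axis=0), 0)  # max-pooled membership of sentence x
    my = np.maximum((y @ u.T).max(axis=0), 0)  # max-pooled membership of sentence y
    return np.minimum(mx, my).sum() / (np.maximum(mx, my).sum() + 1e-12)  # fuzzy Jaccard
```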

The Effect of Translationese in Machine Translation Test Sets

Title The Effect of Translationese in Machine Translation Test Sets
Authors Mike Zhang, Antonio Toral
Abstract The effect of translationese has been studied in the field of machine translation (MT), mostly with respect to training data. We study in depth the effect of translationese on test data, using the test sets from the last three editions of WMT’s news shared task, containing 17 translation directions. We show evidence that (i) the use of translationese in test sets results in inflated human evaluation scores for MT systems; (ii) in some cases system rankings do change; and (iii) the impact translationese has on a translation direction is inversely correlated to the translation quality attainable by state-of-the-art MT systems for that direction.
Tasks Machine Translation
Published 2019-06-19
URL https://arxiv.org/abs/1906.08069v1
PDF https://arxiv.org/pdf/1906.08069v1.pdf
PWC https://paperswithcode.com/paper/the-effect-of-translationese-in-machine
Repo https://github.com/jjzha/translationese
Framework none

Benchmarking Neural Machine Translation for Southern African Languages

Title Benchmarking Neural Machine Translation for Southern African Languages
Authors Laura Martinus, Jade Z. Abbott
Abstract Unlike major Western languages, most African languages are very low-resourced. Furthermore, the resources that do exist are often scattered and difficult to obtain and discover. As a result, the data and code for existing research have rarely been shared. This has led to a struggle to reproduce reported results, and few publicly available benchmarks for African machine translation models exist. To start to address these problems, we trained neural machine translation models for 5 Southern African languages on publicly available datasets. Code is provided for training the models and evaluating them on a newly released evaluation set, with the aim of spurring future research in the field for Southern African languages.
Tasks Machine Translation
Published 2019-06-17
URL https://arxiv.org/abs/1906.10511v1
PDF https://arxiv.org/pdf/1906.10511v1.pdf
PWC https://paperswithcode.com/paper/benchmarking-neural-machine-translation-for
Repo https://github.com/LauraMartinus/ukuxhumana
Framework tf

Towards conceptual generalization in the embedding space

Title Towards conceptual generalization in the embedding space
Authors Luka Nenadović, Vladimir Prelovac
Abstract Humans are able to conceive physical reality by jointly learning different facets thereof. To every pair of notions related to a perceived reality may correspond a mutual relation, which is a notion on its own, but one level higher. Thus, we may have a description of perceived reality on at least two levels, and the translation map between them is in general, due to their different content corpus, one-to-many. Following the success of unsupervised neural machine translation models, which are essentially one-to-one mappings trained separately on monolingual corpora, we examine further capabilities of the unsupervised deep learning methods used there and apply some of these methods to sets of notions of different level and measure. Using graph and word embedding-like techniques, we build a one-to-many map without parallel data in order to establish a unified vector representation of the outer world by combining notions of different kinds into a unique conceptual framework. Due to their latent similarity, by aligning the two embedding spaces in a purely unsupervised way, one obtains a geometric relation between objects of cognition on the two levels, making it possible to express knowledge from one description in the context of the other.
Tasks Machine Translation
Published 2019-06-05
URL https://arxiv.org/abs/1906.01873v3
PDF https://arxiv.org/pdf/1906.01873v3.pdf
PWC https://paperswithcode.com/paper/deep-learning-based-unsupervised-concept
Repo https://github.com/kagi-ai/concept-unification
Framework tf
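
Unsupervised alignment of two embedding spaces typically alternates between inducing a dictionary of matched pairs and solving an orthogonal Procrustes problem on them. The snippet below shows only that Procrustes refinement step as a generic building block; it is not the paper's full one-to-many procedure, and the variable names are illustrative.

```python
import numpy as np

def procrustes_align(src, tgt):
    """Orthogonal Procrustes step: find the rotation W minimizing
    ||src @ W - tgt||_F for already-matched embedding pairs (rows of src and
    tgt correspond).  Returns W; apply it to the whole source space afterwards."""
    u, _, vt = np.linalg.svd(src.T @ tgt)
    return u @ vt

# usage sketch: w = procrustes_align(x_matched, y_matched); aligned = x_all @ w
```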

Feedback Network for Image Super-Resolution

Title Feedback Network for Image Super-Resolution
Authors Zhen Li, Jinglei Yang, Zheng Liu, Xiaomin Yang, Gwanggil Jeon, Wei Wu
Abstract Recent advances in image super-resolution (SR) have explored the power of deep learning to achieve better reconstruction performance. However, the feedback mechanism, which commonly exists in the human visual system, has not been fully exploited in existing deep learning based image SR methods. In this paper, we propose an image super-resolution feedback network (SRFBN) to refine low-level representations with high-level information. Specifically, we use hidden states in an RNN with constraints to achieve this feedback mechanism. A feedback block is designed to handle the feedback connections and to generate powerful high-level representations. The proposed SRFBN comes with a strong early reconstruction ability and can create the final high-resolution image step by step. In addition, we introduce a curriculum learning strategy to make the network well suited to more complicated tasks, where the low-resolution images are corrupted by multiple types of degradation. Extensive experimental results demonstrate the superiority of the proposed SRFBN in comparison with state-of-the-art methods. Code is available at https://github.com/Paper99/SRFBN_CVPR19.
Tasks Image Super-Resolution, Super-Resolution
Published 2019-03-23
URL https://arxiv.org/abs/1903.09814v2
PDF https://arxiv.org/pdf/1903.09814v2.pdf
PWC https://paperswithcode.com/paper/feedback-network-for-image-super-resolution
Repo https://github.com/zhuxyme/zxySRFBN_CVPR2019
Framework pytorch
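
The feedback mechanism above can be pictured as one block unrolled over several steps, with its hidden state fed back as input and a reconstruction emitted at every step (which is what gives the early reconstruction ability). The toy PyTorch module below illustrates that control flow only; the layer sizes, x4 upscaling, and residual reconstruction are assumptions, not the SRFBN design.

```python
import torch
import torch.nn as nn

class TinyFeedbackSR(nn.Module):
    """Toy feedback super-resolution: the same block runs T times, the hidden
    state from step t-1 is concatenated with the input features at step t, and
    every step produces an upscaled image."""
    def __init__(self, ch=32, steps=3, scale=4):
        super().__init__()
        self.steps = steps
        self.extract = nn.Conv2d(3, ch, 3, padding=1)
        self.feedback = nn.Conv2d(2 * ch, ch, 3, padding=1)   # [features, prev state] -> new state
        self.reconstruct = nn.Sequential(
            nn.Conv2d(ch, 3 * scale * scale, 3, padding=1),
            nn.PixelShuffle(scale),
        )
        self.upsample = nn.Upsample(scale_factor=scale, mode="bilinear", align_corners=False)

    def forward(self, lr):                                     # lr: (batch, 3, H, W)
        feat = torch.relu(self.extract(lr))
        state = torch.zeros_like(feat)
        outputs = []
        for _ in range(self.steps):
            state = torch.relu(self.feedback(torch.cat([feat, state], dim=1)))
            outputs.append(self.upsample(lr) + self.reconstruct(state))  # residual reconstruction
        return outputs                                         # one SR estimate per unrolled step
```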

BatchBALD: Efficient and Diverse Batch Acquisition for Deep Bayesian Active Learning

Title BatchBALD: Efficient and Diverse Batch Acquisition for Deep Bayesian Active Learning
Authors Andreas Kirsch, Joost van Amersfoort, Yarin Gal
Abstract We develop BatchBALD, a tractable approximation to the mutual information between a batch of points and model parameters, which we use as an acquisition function to select multiple informative points jointly for the task of deep Bayesian active learning. BatchBALD is a greedy linear-time $1 - \frac{1}{e}$-approximate algorithm amenable to dynamic programming and efficient caching. We compare BatchBALD to the commonly used approach for batch data acquisition and find that the current approach acquires similar and redundant points, sometimes performing worse than randomly acquiring data. We finish by showing that, using BatchBALD to consider dependencies within an acquisition batch, we achieve new state of the art performance on standard benchmarks, providing substantial data efficiency improvements in batch acquisition.
Tasks Active Learning
Published 2019-06-19
URL https://arxiv.org/abs/1906.08158v2
PDF https://arxiv.org/pdf/1906.08158v2.pdf
PWC https://paperswithcode.com/paper/batchbald-efficient-and-diverse-batch
Repo https://github.com/BlackHC/BatchBALD
Framework pytorch
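
The greedy selection can be sketched directly from the definition: given class probabilities under several posterior samples (e.g. MC dropout), repeatedly add the pool point that maximizes the mutual information between the joint batch prediction and the model parameters. The naive version below tracks the full joint distribution and therefore scales as n_classes ** batch_size; the paper's linear-time algorithm avoids this with sampled configurations and caching, so treat this purely as an illustration of the objective.

```python
import numpy as np

def batchbald_greedy(probs, batch_size):
    """Greedy BatchBALD sketch.  probs: (n_points, k_mc, n_classes), class
    probabilities of each pool point under k_mc posterior samples."""
    n, k, c = probs.shape
    chosen = []
    joint = np.ones((k, 1))                       # p(y_chosen | w) per posterior sample
    for _ in range(batch_size):
        best, best_score = None, -np.inf
        for i in range(n):
            if i in chosen:
                continue
            # candidate joint p(y_chosen, y_i | w), shape (k, |configs| * c)
            cand = (joint[:, :, None] * probs[i][:, None, :]).reshape(k, -1)
            marg = cand.mean(axis=0)              # p(y_chosen, y_i), posterior-averaged
            joint_entropy = -(marg * np.log(marg + 1e-12)).sum()
            cond_entropy = -(cand * np.log(cand + 1e-12)).sum(axis=1).mean()
            score = joint_entropy - cond_entropy  # I(y_batch ; w)
            if score > best_score:
                best, best_score = i, score
        chosen.append(best)
        joint = (joint[:, :, None] * probs[best][:, None, :]).reshape(k, -1)
    return chosen
```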

A Machine-learning Based Ensemble Method For Anti-patterns Detection

Title A Machine-learning Based Ensemble Method For Anti-patterns Detection
Authors Antoine Barbez, Foutse Khomh, Yann-Gaël Guéhéneuc
Abstract Anti-patterns are poor solutions to recurring design problems. Several empirical studies have highlighted their negative impact on program comprehension, maintainability, and fault-proneness. A variety of detection approaches have been proposed to identify their occurrences in source code. However, these approaches can identify only a subset of the occurrences and report large numbers of false positives and misses. Furthermore, low agreement is generally observed among different approaches. Recent studies have shown the potential of machine-learning models to improve this situation. However, such algorithms require large sets of manually produced training data, which often limits their application in practice. In this paper, we present SMAD (SMart Aggregation of Anti-patterns Detectors), a machine-learning based ensemble method that aggregates various anti-pattern detection approaches on the basis of their internal detection rules. Thus, our method uses several detection tools to produce an improved prediction from a reasonable number of training examples. We implemented SMAD for the detection of two well-known anti-patterns: God Class and Feature Envy. With the results of our experiments conducted on eight Java projects, we show that: (1) our method clearly improves on the aggregated tools; (2) SMAD significantly outperforms other ensemble methods.
Tasks
Published 2019-01-29
URL https://arxiv.org/abs/1903.01899v3
PDF https://arxiv.org/pdf/1903.01899v3.pdf
PWC https://paperswithcode.com/paper/a-machine-learning-based-ensemble-method-for
Repo https://github.com/antoineBarbez/SMAD
Framework tf
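
The aggregation idea is essentially stacking: the per-tool signals for a code entity (rule scores or raw decisions) become the feature vector of a small learned classifier. The sketch below is a hypothetical toy with synthetic data and a plain logistic-regression aggregator; SMAD's actual feature set and model differ, so this only illustrates the ensemble principle.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data: each row is one class/method, each column the output
# of one existing anti-pattern detector; labels mark manually validated
# occurrences (e.g. God Class).  Everything here is synthetic.
rng = np.random.default_rng(0)
tool_outputs = rng.random((200, 3))                        # 3 stand-in detectors
labels = (tool_outputs.mean(axis=1) > 0.6).astype(int)     # illustration-only labels

ensemble = LogisticRegression().fit(tool_outputs, labels)
print(ensemble.predict_proba(tool_outputs[:5])[:, 1])      # aggregated detection scores
```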

Good News, Everyone! Context driven entity-aware captioning for news images

Title Good News, Everyone! Context driven entity-aware captioning for news images
Authors Ali Furkan Biten, Lluis Gomez, Marçal Rusiñol, Dimosthenis Karatzas
Abstract Current image captioning systems perform at a merely descriptive level, essentially enumerating the objects in the scene and their relations. Humans, on the contrary, interpret images by integrating several sources of prior knowledge of the world. In this work, we aim to take a step closer to producing captions that offer a plausible interpretation of the scene, by integrating such contextual information into the captioning pipeline. For this we focus on the captioning of images used to illustrate news articles. We propose a novel captioning method that is able to leverage contextual information provided by the text of news articles associated with an image. Our model is able to selectively draw information from the article guided by visual cues, and to dynamically extend the output dictionary to out-of-vocabulary named entities that appear in the context source. Furthermore, we introduce ‘GoodNews’, the largest news image captioning dataset in the literature, and demonstrate state-of-the-art results.
Tasks Image Captioning
Published 2019-04-02
URL http://arxiv.org/abs/1904.01475v1
PDF http://arxiv.org/pdf/1904.01475v1.pdf
PWC https://paperswithcode.com/paper/good-news-everyone-context-driven-entity
Repo https://github.com/furkanbiten/GoodNews
Framework pytorch

Visual-Inertial Mapping with Non-Linear Factor Recovery

Title Visual-Inertial Mapping with Non-Linear Factor Recovery
Authors Vladyslav Usenko, Nikolaus Demmel, David Schubert, Jörg Stückler, Daniel Cremers
Abstract Cameras and inertial measurement units are complementary sensors for ego-motion estimation and environment mapping. Their combination makes visual-inertial odometry (VIO) systems more accurate and robust. For globally consistent mapping, however, combining visual and inertial information is not straightforward. To estimate the motion and geometry from a set of images, large baselines are required. Because of that, most systems operate on keyframes that have large time intervals between each other. Inertial data, on the other hand, quickly degrades with the duration of the intervals, and after several seconds of integration it typically contains only little useful information. In this paper, we propose to extract relevant information for visual-inertial mapping from visual-inertial odometry using non-linear factor recovery. We reconstruct a set of non-linear factors that make an optimal approximation of the information on the trajectory accumulated by VIO. To obtain a globally consistent map, we combine these factors with loop-closing constraints using bundle adjustment. The VIO factors make the roll and pitch angles of the global map observable, and improve the robustness and the accuracy of the mapping. In experiments on a public benchmark, we demonstrate superior performance of our method over state-of-the-art approaches.
Tasks Motion Estimation
Published 2019-04-13
URL http://arxiv.org/abs/1904.06504v2
PDF http://arxiv.org/pdf/1904.06504v2.pdf
PWC https://paperswithcode.com/paper/visual-inertial-mapping-with-non-linear
Repo https://github.com/VladyslavUsenko/basalt-mirror
Framework none

ROVO: Robust Omnidirectional Visual Odometry for Wide-baseline Wide-FOV Camera Systems

Title ROVO: Robust Omnidirectional Visual Odometry for Wide-baseline Wide-FOV Camera Systems
Authors Hochang Seok, Jongwoo Lim
Abstract In this paper we propose a robust visual odometry system for a wide-baseline camera rig with wide field-of-view (FOV) fisheye lenses, which provides full omnidirectional stereo observations of the environment. For more robust and accurate ego-motion estimation, we add three components to the standard VO pipeline: 1) a hybrid projection model for improved feature matching, 2) a multi-view P3P RANSAC algorithm for pose estimation, and 3) online update of the rig extrinsic parameters. The hybrid projection model combines perspective and cylindrical projection to maximize the overlap between views and minimize the image distortion that degrades feature matching performance. The multi-view P3P RANSAC algorithm extends the conventional P3P RANSAC to multi-view images so that all feature matches in all views are considered in the inlier counting for robust pose estimation. Finally, the online extrinsic calibration is seamlessly integrated into the backend optimization framework so that changes in camera poses due to shocks or vibrations can be corrected automatically. The proposed system is extensively evaluated on synthetic datasets with ground truth and on real sequences of highly dynamic environments, and its superior performance is demonstrated.
Tasks Calibration, Motion Estimation, Pose Estimation, Visual Odometry
Published 2019-02-28
URL http://arxiv.org/abs/1902.11154v2
PDF http://arxiv.org/pdf/1902.11154v2.pdf
PWC https://paperswithcode.com/paper/rovo-robust-omnidirectional-visual-odometry
Repo https://github.com/renmengqisheng/stereo_multifisheye
Framework none

Decoding the Style and Bias of Song Lyrics

Title Decoding the Style and Bias of Song Lyrics
Authors Manash Pratim Barman, Amit Awekar, Sambhav Kothari
Abstract The central idea of this paper is to gain a deeper understanding of song lyrics computationally. We focus on two aspects: the style and biases of song lyrics. All prior work on these two aspects is limited to manual analysis of a small corpus of song lyrics. In contrast, we analyzed more than half a million songs spread over five decades. We characterize lyrics style in terms of vocabulary, length, repetitiveness, speed, and readability. We have observed that the style of popular songs differs significantly from that of other songs. We have used distributed representation methods and the WEAT test to measure various gender and racial biases in song lyrics. We have observed that biases in song lyrics correlate with prior results on human subjects. This correlation indicates that song lyrics reflect the biases that exist in society. The increasing consumption of music and the effect of lyrics on human emotions make this analysis important.
Tasks
Published 2019-07-17
URL https://arxiv.org/abs/1907.07818v1
PDF https://arxiv.org/pdf/1907.07818v1.pdf
PWC https://paperswithcode.com/paper/decoding-the-style-and-bias-of-song-lyrics
Repo https://github.com/manashpratim/Decoding-the-Style-and-Bias-of-Song-Lyrics
Framework none
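
The WEAT test mentioned above compares how strongly two sets of target words (e.g. male vs. female terms) associate with two sets of attribute words (e.g. career vs. family terms) in the embedding space. A minimal NumPy sketch of the effect size follows; the permutation-test p-value and the paper's specific word lists are omitted.

```python
import numpy as np

def weat_effect_size(X, Y, A, B):
    """WEAT effect size.  X, Y: word-vector matrices for the two target sets;
    A, B: word-vector matrices for the two attribute sets (rows are vectors)."""
    def cos(u, V):
        return (V @ u) / (np.linalg.norm(V, axis=1) * np.linalg.norm(u) + 1e-12)

    def assoc(w):                      # s(w, A, B): differential association of word w
        return cos(w, A).mean() - cos(w, B).mean()

    sx = np.array([assoc(w) for w in X])
    sy = np.array([assoc(w) for w in Y])
    return (sx.mean() - sy.mean()) / np.concatenate([sx, sy]).std(ddof=1)
```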

Mixed Dimension Embeddings with Application to Memory-Efficient Recommendation Systems

Title Mixed Dimension Embeddings with Application to Memory-Efficient Recommendation Systems
Authors Antonio Ginart, Maxim Naumov, Dheevatsa Mudigere, Jiyan Yang, James Zou
Abstract In many real-world applications, e.g. recommendation systems, certain items appear much more frequently than other items. However, standard embedding methods—which form the basis of many ML algorithms—allocate the same dimension to all of the items. This leads to statistical and memory inefficiencies. In this work, we propose mixed dimension embedding layers in which the dimension of a particular embedding vector can depend on the frequency of the item. This approach drastically reduces the memory requirement for the embedding, while maintaining and sometimes improving the ML performance. We show that the proposed mixed dimension layers achieve higher accuracy, while using 8X fewer parameters, for collaborative filtering on the MovieLens dataset. They also improve accuracy by 0.1% using half as many parameters, or maintain baseline accuracy using 16X fewer parameters, for the click-through rate prediction task on the Criteo Kaggle dataset.
Tasks Click-Through Rate Prediction, Recommendation Systems
Published 2019-09-25
URL https://arxiv.org/abs/1909.11810v1
PDF https://arxiv.org/pdf/1909.11810v1.pdf
PWC https://paperswithcode.com/paper/mixed-dimension-embeddings-with-application
Repo https://github.com/facebookresearch/dlrm
Framework pytorch
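
One simple way to realize mixed-dimension embeddings is to partition items into frequency blocks, give each block its own embedding table with a smaller dimension, and project every block back to a common base dimension. The PyTorch sketch below follows that scheme; the block sizes, dimensions, and class name are illustrative, not the DLRM implementation in the linked repo.

```python
import torch
import torch.nn as nn

class MixedDimEmbedding(nn.Module):
    """Frequency-blocked embeddings: each block has its own table and dimension,
    plus a per-block linear projection to a shared base dimension."""
    def __init__(self, block_sizes, block_dims, base_dim):
        super().__init__()
        self.base_dim = base_dim
        self.offsets = torch.tensor([0] + list(block_sizes)).cumsum(0)
        self.tables = nn.ModuleList(nn.Embedding(n, d) for n, d in zip(block_sizes, block_dims))
        self.projs = nn.ModuleList(nn.Linear(d, base_dim, bias=False) for d in block_dims)

    def forward(self, idx):                                    # idx: (batch,) global item ids
        out = torch.zeros(idx.size(0), self.base_dim)
        for b, (table, proj) in enumerate(zip(self.tables, self.projs)):
            mask = (idx >= self.offsets[b]) & (idx < self.offsets[b + 1])
            if mask.any():
                out[mask] = proj(table(idx[mask] - self.offsets[b]))
        return out

# illustrative partition: 1k frequent items at dim 64, 100k tail items at dim 8
emb = MixedDimEmbedding(block_sizes=[1000, 100000], block_dims=[64, 8], base_dim=64)
vecs = emb(torch.tensor([3, 50000, 999]))                      # (3, 64)
```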