February 1, 2020

3064 words 15 mins read

Paper Group AWR 297

Data-Free Quantization Through Weight Equalization and Bias Correction. Learning Humanoid Robot Running Skills through Proximal Policy Optimization. Meshed-Memory Transformer for Image Captioning. Don’t Settle for Average, Go for the Max: Fuzzy Sets and Max-Pooled Word Vectors. The Effect of Translationese in Machine Translation Test Sets. Benchmar …

Data-Free Quantization Through Weight Equalization and Bias Correction

Title Data-Free Quantization Through Weight Equalization and Bias Correction
Authors Markus Nagel, Mart van Baalen, Tijmen Blankevoort, Max Welling
Abstract We introduce a data-free quantization method for deep neural networks that does not require fine-tuning or hyperparameter selection. It achieves near-original model performance on common computer vision architectures and tasks. 8-bit fixed-point quantization is essential for efficient inference on modern deep learning hardware. However, quantizing models to run in 8-bit is a non-trivial task, frequently leading to either significant performance reduction or engineering time spent on training a network to be amenable to quantization. Our approach relies on equalizing the weight ranges in the network by making use of a scale-equivariance property of activation functions. In addition, the method corrects biases in the error that are introduced during quantization. This improves quantization accuracy and can be applied to many common computer vision architectures with a straightforward API call. For common architectures, such as the MobileNet family, we achieve state-of-the-art quantized model performance. We further show that the method also extends to other computer vision architectures and tasks such as semantic segmentation and object detection.
Tasks Object Detection, Quantization, Semantic Segmentation
Published 2019-06-11
URL https://arxiv.org/abs/1906.04721v3
PDF https://arxiv.org/pdf/1906.04721v3.pdf
PWC https://paperswithcode.com/paper/data-free-quantization-through-weight
Repo https://github.com/jakc4103/DFQ
Framework pytorch
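
The cross-layer equalization idea above exploits the fact that ReLU-like activations satisfy f(s·x) = s·f(x) for s > 0, so per-channel scale can be moved between adjacent layers without changing the network's output. Below is a minimal NumPy sketch for a pair of fully-connected layers; the scaling choice s_i = sqrt(r1_i / r2_i) and the function name are illustrative assumptions, not the authors' implementation (see the linked repo for that).

```python
import numpy as np

def equalize_pair(w1, b1, w2, eps=1e-8):
    """Cross-layer range equalization sketch for two dense layers with a ReLU
    in between.  w1: (out1, in1), b1: (out1,), w2: (out2, out1).  Dividing the
    rows of w1 by s and multiplying the columns of w2 by s leaves the composed
    function unchanged because ReLU(s * x) = s * ReLU(x) for s > 0."""
    r1 = np.abs(w1).max(axis=1)          # per-output-channel range of layer 1
    r2 = np.abs(w2).max(axis=0)          # per-input-channel range of layer 2
    s = np.sqrt(r1 / (r2 + eps)) + eps   # assumed choice: makes r1/s equal to r2*s
    return w1 / s[:, None], b1 / s, w2 * s[None, :]

# quick check that the composed function is unchanged
rng = np.random.default_rng(0)
w1, b1, w2 = rng.normal(size=(16, 8)), rng.normal(size=16), rng.normal(size=(4, 16))
x = rng.normal(size=8)
w1e, b1e, w2e = equalize_pair(w1, b1, w2)
y_ref = w2 @ np.maximum(w1 @ x + b1, 0)
y_eq = w2e @ np.maximum(w1e @ x + b1e, 0)
assert np.allclose(y_ref, y_eq)
```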

Learning Humanoid Robot Running Skills through Proximal Policy Optimization

Title Learning Humanoid Robot Running Skills through Proximal Policy Optimization
Authors Luckeciano C. Melo, Marcos R. O. A. Maximo
Abstract At the current level of evolution of Soccer 3D, motion control is a key factor in a team’s performance. Recent works take advantage of model-free approaches based on Machine Learning to exploit robot dynamics in order to obtain faster locomotion skills, achieving running policies and, therefore, opening a new research direction in the Soccer 3D environment. In this work, we present a methodology based on Deep Reinforcement Learning that learns running skills without any prior knowledge, using a neural network whose inputs are related to the robot’s dynamics. Our results outperformed the previous state-of-the-art sprint velocity reported in the Soccer 3D literature by a significant margin. The approach also demonstrated improved sample efficiency, learning to run in just a few hours. We report our results by analyzing the training procedure and evaluating the policies in terms of speed, reliability, and human similarity. Finally, we present key factors that led us to improve on previous results and share some ideas for future work.
Tasks
Published 2019-10-22
URL https://arxiv.org/abs/1910.10620v1
PDF https://arxiv.org/pdf/1910.10620v1.pdf
PWC https://paperswithcode.com/paper/learning-humanoid-robot-running-skills
Repo https://github.com/luckeciano/humanoid-run-ppo
Framework none
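
For reference, Proximal Policy Optimization as used above builds on the standard clipped surrogate objective. The sketch below shows only that generic loss in PyTorch; it says nothing about the paper's reward shaping, observation space, or network architecture.

```python
import torch

def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Clipped surrogate objective of PPO (generic form).  All inputs are 1-D
    tensors over a batch of (state, action) samples; advantages are assumed to
    be precomputed (e.g. with GAE) and logp_* are log-probabilities of the
    taken actions under the new and old policies."""
    ratio = torch.exp(logp_new - logp_old)                       # pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()                 # maximize surrogate => minimize negative
```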

Meshed-Memory Transformer for Image Captioning

Title Meshed-Memory Transformer for Image Captioning
Authors Marcella Cornia, Matteo Stefanini, Lorenzo Baraldi, Rita Cucchiara
Abstract Transformer-based architectures represent the state of the art in sequence modeling tasks like machine translation and language understanding. Their applicability to multi-modal contexts like image captioning, however, is still largely under-explored. With the aim of filling this gap, we present M$^2$ - a Meshed Transformer with Memory for Image Captioning. The architecture improves both the image encoding and the language generation steps: it learns a multi-level representation of the relationships between image regions integrating learned a priori knowledge, and uses a mesh-like connectivity at the decoding stage to exploit low- and high-level features. Experimentally, we investigate the performance of the M$^2$ Transformer and different fully-attentive models in comparison with recurrent ones. When tested on COCO, our proposal achieves a new state of the art in single-model and ensemble configurations on the “Karpathy” test split and on the online test server. We also assess its performance when describing objects unseen in the training set. Trained models and code for reproducing the experiments are publicly available at: https://github.com/aimagelab/meshed-memory-transformer.
Tasks Image Captioning, Machine Translation, Text Generation
Published 2019-12-17
URL https://arxiv.org/abs/1912.08226v2
PDF https://arxiv.org/pdf/1912.08226v2.pdf
PWC https://paperswithcode.com/paper/m2-meshed-memory-transformer-for-image
Repo https://github.com/aimagelab/meshed-memory-transformer
Framework pytorch
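
The “memory” in the encoder above extends the keys and values of self-attention with learned slots that can encode a priori knowledge not present in the input regions. A single-head PyTorch sketch of that idea follows; the slot count and dimensions are illustrative, and the actual M$^2$ encoder is multi-head and multi-layer, with the meshed decoder connections not shown here.

```python
import torch
import torch.nn as nn

class MemoryAugmentedAttention(nn.Module):
    """Single-head sketch: keys/values are concatenated with learned memory
    slots, so attention can also retrieve prior knowledge independent of x."""
    def __init__(self, d_model=512, n_memory=40):
        super().__init__()
        self.q = nn.Linear(d_model, d_model)
        self.k = nn.Linear(d_model, d_model)
        self.v = nn.Linear(d_model, d_model)
        self.mem_k = nn.Parameter(torch.randn(n_memory, d_model) / d_model ** 0.5)
        self.mem_v = nn.Parameter(torch.randn(n_memory, d_model) / d_model ** 0.5)
        self.scale = d_model ** -0.5

    def forward(self, x):                         # x: (batch, regions, d_model)
        b = x.size(0)
        q = self.q(x)
        k = torch.cat([self.k(x), self.mem_k.unsqueeze(0).expand(b, -1, -1)], dim=1)
        v = torch.cat([self.v(x), self.mem_v.unsqueeze(0).expand(b, -1, -1)], dim=1)
        attn = torch.softmax(q @ k.transpose(1, 2) * self.scale, dim=-1)
        return attn @ v                           # (batch, regions, d_model)
```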

Don’t Settle for Average, Go for the Max: Fuzzy Sets and Max-Pooled Word Vectors

Title Don’t Settle for Average, Go for the Max: Fuzzy Sets and Max-Pooled Word Vectors
Authors Vitalii Zhelezniak, Aleksandar Savkov, April Shen, Francesco Moramarco, Jack Flann, Nils Y. Hammerla
Abstract Recent literature suggests that averaged word vectors followed by simple post-processing outperform many deep learning methods on semantic textual similarity tasks. Furthermore, when averaged word vectors are trained in a supervised fashion on large corpora of paraphrases, they achieve state-of-the-art results on standard STS benchmarks. Inspired by these insights, we push the limits of word embeddings even further. We propose a novel fuzzy bag-of-words (FBoW) representation for text that contains all the words in the vocabulary simultaneously but with different degrees of membership, which are derived from similarities between word vectors. We show that max-pooled word vectors are only a special case of fuzzy BoW and should be compared via the fuzzy Jaccard index rather than cosine similarity. Finally, we propose DynaMax, a completely unsupervised and non-parametric similarity measure that dynamically extracts and max-pools good features depending on the sentence pair. This method is both efficient and easy to implement, yet outperforms current baselines on STS tasks by a large margin and is even competitive with supervised word vectors trained to directly optimise cosine similarity.
Tasks Semantic Textual Similarity, Word Embeddings
Published 2019-04-30
URL http://arxiv.org/abs/1904.13264v1
PDF http://arxiv.org/pdf/1904.13264v1.pdf
PWC https://paperswithcode.com/paper/dont-settle-for-average-go-for-the-max-fuzzy-1
Repo https://github.com/Babylonpartners/fuzzymax
Framework none
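
A compact NumPy sketch of the DynaMax idea described above: the feature space is built dynamically from the union of the two sentences' word vectors, each sentence is max-pooled over it, and the resulting membership vectors are compared with the fuzzy Jaccard index. Clipping memberships at zero is a simplification assumed here; consult the linked repo for the exact algorithm.

```python
import numpy as np

def dynamax_jaccard(x, y):
    """DynaMax-style similarity sketch.  x, y are word-vector matrices for the
    two sentences, shapes (n_words_x, d) and (n_words_y, d)."""
    u = np.vstack([x, y])                      # dynamic feature space from both sentences
    mx = np.maximum((x @ u.T).max(axis=0), 0)  # max-pooled membership of sentence x
    my = np.maximum((y @ u.T).max(axis=0), 0)  # max-pooled membership of sentence y
    return np.minimum(mx, my).sum() / (np.maximum(mx, my).sum() + 1e-12)  # fuzzy Jaccard
```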

The Effect of Translationese in Machine Translation Test Sets

Title The Effect of Translationese in Machine Translation Test Sets
Authors Mike Zhang, Antonio Toral
Abstract The effect of translationese has been studied in the field of machine translation (MT), mostly with respect to training data. We study in depth the effect of translationese on test data, using the test sets from the last three editions of WMT’s news shared task, containing 17 translation directions. We show evidence that (i) the use of translationese in test sets results in inflated human evaluation scores for MT systems; (ii) in some cases system rankings do change; and (iii) the impact translationese has on a translation direction is inversely correlated to the translation quality attainable by state-of-the-art MT systems for that direction.
Tasks Machine Translation
Published 2019-06-19
URL https://arxiv.org/abs/1906.08069v1
PDF https://arxiv.org/pdf/1906.08069v1.pdf
PWC https://paperswithcode.com/paper/the-effect-of-translationese-in-machine
Repo https://github.com/jjzha/translationese
Framework none

Benchmarking Neural Machine Translation for Southern African Languages

Title Benchmarking Neural Machine Translation for Southern African Languages
Authors Laura Martinus, Jade Z. Abbott
Abstract Unlike major Western languages, most African languages are very low-resourced. Furthermore, the resources that do exist are often scattered and difficult to obtain and discover. As a result, the data and code for existing research have rarely been shared. This has led to a struggle to reproduce reported results, and few publicly available benchmarks for African machine translation models exist. To start to address these problems, we trained neural machine translation models for 5 Southern African languages on publicly available datasets. Code is provided for training the models and evaluating them on a newly released evaluation set, with the aim of spurring future research in the field for Southern African languages.
Tasks Machine Translation
Published 2019-06-17
URL https://arxiv.org/abs/1906.10511v1
PDF https://arxiv.org/pdf/1906.10511v1.pdf
PWC https://paperswithcode.com/paper/benchmarking-neural-machine-translation-for
Repo https://github.com/LauraMartinus/ukuxhumana
Framework tf

Towards conceptual generalization in the embedding space

Title Towards conceptual generalization in the embedding space
Authors Luka Nenadović, Vladimir Prelovac
Abstract Humans are able to conceive physical reality by jointly learning different facets thereof. To every pair of notions related to a perceived reality may correspond a mutual relation, which is a notion on its own, but one level higher. Thus, we may have a description of perceived reality on at least two levels, and the translation map between them is in general, due to their different content corpus, one-to-many. Following the success of unsupervised neural machine translation models, which are essentially one-to-one mappings trained separately on monolingual corpora, we examine further capabilities of the unsupervised deep learning methods used there and apply some of these methods to sets of notions of different level and measure. Using graph and word embedding-like techniques, we build a one-to-many map without parallel data in order to establish a unified vector representation of the outer world by combining notions of different kinds into a unique conceptual framework. Due to their latent similarity, by aligning the two embedding spaces in a purely unsupervised way, one obtains a geometric relation between objects of cognition on the two levels, making it possible to express knowledge from one description in the context of the other.
Tasks Machine Translation
Published 2019-06-05
URL https://arxiv.org/abs/1906.01873v3
PDF https://arxiv.org/pdf/1906.01873v3.pdf
PWC https://paperswithcode.com/paper/deep-learning-based-unsupervised-concept
Repo https://github.com/kagi-ai/concept-unification
Framework tf
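
Unsupervised alignment of two embedding spaces typically alternates between inducing a dictionary of matched pairs and solving an orthogonal Procrustes problem on them. The snippet below shows only that Procrustes refinement step as a generic building block; it is not the paper's full one-to-many procedure, and the variable names are illustrative.

```python
import numpy as np

def procrustes_align(src, tgt):
    """Orthogonal Procrustes step: find the rotation W minimizing
    ||src @ W - tgt||_F for already-matched embedding pairs (rows of src and
    tgt correspond).  Returns W; apply it to the whole source space afterwards."""
    u, _, vt = np.linalg.svd(src.T @ tgt)
    return u @ vt

# usage sketch: w = procrustes_align(x_matched, y_matched); aligned = x_all @ w
```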

Feedback Network for Image Super-Resolution

Title Feedback Network for Image Super-Resolution
Authors Zhen Li, Jinglei Yang, Zheng Liu, Xiaomin Yang, Gwanggil Jeon, Wei Wu
Abstract Recent advances in image super-resolution (SR) have explored the power of deep learning to achieve better reconstruction performance. However, the feedback mechanism, which commonly exists in the human visual system, has not been fully exploited in existing deep learning based image SR methods. In this paper, we propose an image super-resolution feedback network (SRFBN) to refine low-level representations with high-level information. Specifically, we use hidden states in an RNN with constraints to achieve this feedback mechanism. A feedback block is designed to handle the feedback connections and to generate powerful high-level representations. The proposed SRFBN comes with a strong early reconstruction ability and can create the final high-resolution image step by step. In addition, we introduce a curriculum learning strategy to make the network well suited to more complicated tasks, where the low-resolution images are corrupted by multiple types of degradation. Extensive experimental results demonstrate the superiority of the proposed SRFBN in comparison with state-of-the-art methods. Code is available at https://github.com/Paper99/SRFBN_CVPR19.
Tasks Image Super-Resolution, Super-Resolution
Published 2019-03-23
URL https://arxiv.org/abs/1903.09814v2
PDF https://arxiv.org/pdf/1903.09814v2.pdf
PWC https://paperswithcode.com/paper/feedback-network-for-image-super-resolution
Repo https://github.com/zhuxyme/zxySRFBN_CVPR2019
Framework pytorch
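
The feedback mechanism above can be pictured as one block unrolled over several steps, with its hidden state fed back as input and a reconstruction emitted at every step (which is what gives the early reconstruction ability). The toy PyTorch module below illustrates that control flow only; the layer sizes, x4 upscaling, and residual reconstruction are assumptions, not the SRFBN design.

```python
import torch
import torch.nn as nn

class TinyFeedbackSR(nn.Module):
    """Toy feedback super-resolution: the same block runs T times, the hidden
    state from step t-1 is concatenated with the input features at step t, and
    every step produces an upscaled image."""
    def __init__(self, ch=32, steps=3, scale=4):
        super().__init__()
        self.steps = steps
        self.extract = nn.Conv2d(3, ch, 3, padding=1)
        self.feedback = nn.Conv2d(2 * ch, ch, 3, padding=1)   # [features, prev state] -> new state
        self.reconstruct = nn.Sequential(
            nn.Conv2d(ch, 3 * scale * scale, 3, padding=1),
            nn.PixelShuffle(scale),
        )
        self.upsample = nn.Upsample(scale_factor=scale, mode="bilinear", align_corners=False)

    def forward(self, lr):                                     # lr: (batch, 3, H, W)
        feat = torch.relu(self.extract(lr))
        state = torch.zeros_like(feat)
        outputs = []
        for _ in range(self.steps):
            state = torch.relu(self.feedback(torch.cat([feat, state], dim=1)))
            outputs.append(self.upsample(lr) + self.reconstruct(state))  # residual reconstruction
        return outputs                                         # one SR estimate per unrolled step
```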

BatchBALD: Efficient and Diverse Batch Acquisition for Deep Bayesian Active Learning

Title BatchBALD: Efficient and Diverse Batch Acquisition for Deep Bayesian Active Learning
Authors Andreas Kirsch, Joost van Amersfoort, Yarin Gal
Abstract We develop BatchBALD, a tractable approximation to the mutual information between a batch of points and model parameters, which we use as an acquisition function to select multiple informative points jointly for the task of deep Bayesian active learning. BatchBALD is a greedy linear-time $1 - \frac{1}{e}$-approximate algorithm amenable to dynamic programming and efficient caching. We compare BatchBALD to the commonly used approach for batch data acquisition and find that the current approach acquires similar and redundant points, sometimes performing worse than randomly acquiring data. We finish by showing that, using BatchBALD to consider dependencies within an acquisition batch, we achieve new state of the art performance on standard benchmarks, providing substantial data efficiency improvements in batch acquisition.
Tasks Active Learning
Published 2019-06-19
URL https://arxiv.org/abs/1906.08158v2
PDF https://arxiv.org/pdf/1906.08158v2.pdf
PWC https://paperswithcode.com/paper/batchbald-efficient-and-diverse-batch
Repo https://github.com/BlackHC/BatchBALD
Framework pytorch
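
The greedy selection can be sketched directly from the definition: given class probabilities under several posterior samples (e.g. MC dropout), repeatedly add the pool point that maximizes the mutual information between the joint batch prediction and the model parameters. The naive version below tracks the full joint distribution and therefore scales as n_classes ** batch_size; the paper's linear-time algorithm avoids this with sampled configurations and caching, so treat this purely as an illustration of the objective.

```python
import numpy as np

def batchbald_greedy(probs, batch_size):
    """Greedy BatchBALD sketch.  probs: (n_points, k_mc, n_classes), class
    probabilities of each pool point under k_mc posterior samples."""
    n, k, c = probs.shape
    chosen = []
    joint = np.ones((k, 1))                       # p(y_chosen | w) per posterior sample
    for _ in range(batch_size):
        best, best_score = None, -np.inf
        for i in range(n):
            if i in chosen:
                continue
            # candidate joint p(y_chosen, y_i | w), shape (k, |configs| * c)
            cand = (joint[:, :, None] * probs[i][:, None, :]).reshape(k, -1)
            marg = cand.mean(axis=0)              # p(y_chosen, y_i), posterior-averaged
            joint_entropy = -(marg * np.log(marg + 1e-12)).sum()
            cond_entropy = -(cand * np.log(cand + 1e-12)).sum(axis=1).mean()
            score = joint_entropy - cond_entropy  # I(y_batch ; w)
            if score > best_score:
                best, best_score = i, score
        chosen.append(best)
        joint = (joint[:, :, None] * probs[best][:, None, :]).reshape(k, -1)
    return chosen
```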

A Machine-learning Based Ensemble Method For Anti-patterns Detection

Title A Machine-learning Based Ensemble Method For Anti-patterns Detection
Authors Antoine Barbez, Foutse Khomh, Yann-Gaël Guéhéneuc
Abstract Anti-patterns are poor solutions to recurring design problems. Several empirical studies have highlighted their negative impact on program comprehension, maintainability, and fault-proneness. A variety of detection approaches have been proposed to identify their occurrences in source code. However, these approaches can identify only a subset of the occurrences and report large numbers of false positives and misses. Furthermore, low agreement is generally observed among different approaches. Recent studies have shown the potential of machine-learning models to improve this situation. However, such algorithms require large sets of manually produced training data, which often limits their application in practice. In this paper, we present SMAD (SMart Aggregation of Anti-patterns Detectors), a machine-learning based ensemble method that aggregates various anti-pattern detection approaches on the basis of their internal detection rules. Thus, our method uses several detection tools to produce an improved prediction from a reasonable number of training examples. We implemented SMAD for the detection of two well-known anti-patterns: God Class and Feature Envy. With the results of our experiments conducted on eight Java projects, we show that: (1) our method clearly improves on the aggregated tools; (2) SMAD significantly outperforms other ensemble methods.
Tasks
Published 2019-01-29
URL https://arxiv.org/abs/1903.01899v3
PDF https://arxiv.org/pdf/1903.01899v3.pdf
PWC https://paperswithcode.com/paper/a-machine-learning-based-ensemble-method-for
Repo https://github.com/antoineBarbez/SMAD
Framework tf
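
The aggregation idea is essentially stacking: the per-tool signals for a code entity (rule scores or raw decisions) become the feature vector of a small learned classifier. The sketch below is a hypothetical toy with synthetic data and a plain logistic-regression aggregator; SMAD's actual feature set and model differ, so this only illustrates the ensemble principle.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data: each row is one class/method, each column the output
# of one existing anti-pattern detector; labels mark manually validated
# occurrences (e.g. God Class).  Everything here is synthetic.
rng = np.random.default_rng(0)
tool_outputs = rng.random((200, 3))                        # 3 stand-in detectors
labels = (tool_outputs.mean(axis=1) > 0.6).astype(int)     # illustration-only labels

ensemble = LogisticRegression().fit(tool_outputs, labels)
print(ensemble.predict_proba(tool_outputs[:5])[:, 1])      # aggregated detection scores
```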

Good News, Everyone! Context driven entity-aware captioning for news images

Title Good News, Everyone! Context driven entity-aware captioning for news images
Authors Ali Furkan Biten, Lluis Gomez, Marçal Rusiñol, Dimosthenis Karatzas
Abstract Current image captioning systems perform at a merely descriptive level, essentially enumerating the objects in the scene and their relations. Humans, on the contrary, interpret images by integrating several sources of prior knowledge of the world. In this work, we aim to take a step closer to producing captions that offer a plausible interpretation of the scene, by integrating such contextual information into the captioning pipeline. For this we focus on the captioning of images used to illustrate news articles. We propose a novel captioning method that is able to leverage contextual information provided by the text of news articles associated with an image. Our model is able to selectively draw information from the article guided by visual cues, and to dynamically extend the output dictionary to out-of-vocabulary named entities that appear in the context source. Furthermore, we introduce ‘GoodNews’, the largest news image captioning dataset in the literature, and demonstrate state-of-the-art results.
Tasks Image Captioning
Published 2019-04-02
URL http://arxiv.org/abs/1904.01475v1
PDF http://arxiv.org/pdf/1904.01475v1.pdf
PWC https://paperswithcode.com/paper/good-news-everyone-context-driven-entity
Repo https://github.com/furkanbiten/GoodNews
Framework pytorch

Visual-Inertial Mapping with Non-Linear Factor Recovery

Title Visual-Inertial Mapping with Non-Linear Factor Recovery
Authors Vladyslav Usenko, Nikolaus Demmel, David Schubert, Jörg Stückler, Daniel Cremers
Abstract Cameras and inertial measurement units are complementary sensors for ego-motion estimation and environment mapping. Their combination makes visual-inertial odometry (VIO) systems more accurate and robust. For globally consistent mapping, however, combining visual and inertial information is not straightforward. To estimate the motion and geometry from a set of images, large baselines are required. Because of that, most systems operate on keyframes that have large time intervals between each other. Inertial data, on the other hand, quickly degrades with the duration of the intervals, and after several seconds of integration it typically contains only little useful information. In this paper, we propose to extract relevant information for visual-inertial mapping from visual-inertial odometry using non-linear factor recovery. We reconstruct a set of non-linear factors that make an optimal approximation of the information on the trajectory accumulated by VIO. To obtain a globally consistent map, we combine these factors with loop-closing constraints using bundle adjustment. The VIO factors make the roll and pitch angles of the global map observable, and improve the robustness and the accuracy of the mapping. In experiments on a public benchmark, we demonstrate superior performance of our method over state-of-the-art approaches.
Tasks Motion Estimation
Published 2019-04-13
URL http://arxiv.org/abs/1904.06504v2
PDF http://arxiv.org/pdf/1904.06504v2.pdf
PWC https://paperswithcode.com/paper/visual-inertial-mapping-with-non-linear
Repo https://github.com/VladyslavUsenko/basalt-mirror
Framework none

ROVO: Robust Omnidirectional Visual Odometry for Wide-baseline Wide-FOV Camera Systems

Title ROVO: Robust Omnidirectional Visual Odometry for Wide-baseline Wide-FOV Camera Systems
Authors Hochang Seok, Jongwoo Lim
Abstract In this paper we propose a robust visual odometry system for a wide-baseline camera rig with wide field-of-view (FOV) fisheye lenses, which provides full omnidirectional stereo observations of the environment. For more robust and accurate ego-motion estimation, we add three components to the standard VO pipeline: 1) a hybrid projection model for improved feature matching, 2) a multi-view P3P RANSAC algorithm for pose estimation, and 3) online update of the rig extrinsic parameters. The hybrid projection model combines perspective and cylindrical projection to maximize the overlap between views and minimize the image distortion that degrades feature matching performance. The multi-view P3P RANSAC algorithm extends the conventional P3P RANSAC to multi-view images so that all feature matches in all views are considered in the inlier counting for robust pose estimation. Finally, the online extrinsic calibration is seamlessly integrated into the backend optimization framework so that changes in camera poses due to shocks or vibrations can be corrected automatically. The proposed system is extensively evaluated on synthetic datasets with ground truth and on real sequences of highly dynamic environments, and its superior performance is demonstrated.
Tasks Calibration, Motion Estimation, Pose Estimation, Visual Odometry
Published 2019-02-28
URL http://arxiv.org/abs/1902.11154v2
PDF http://arxiv.org/pdf/1902.11154v2.pdf
PWC https://paperswithcode.com/paper/rovo-robust-omnidirectional-visual-odometry
Repo https://github.com/renmengqisheng/stereo_multifisheye
Framework none

Decoding the Style and Bias of Song Lyrics

Title Decoding the Style and Bias of Song Lyrics
Authors Manash Pratim Barman, Amit Awekar, Sambhav Kothari
Abstract The central idea of this paper is to gain a deeper understanding of song lyrics computationally. We focus on two aspects: the style and biases of song lyrics. All prior work on these two aspects is limited to manual analysis of a small corpus of song lyrics. In contrast, we analyzed more than half a million songs spread over five decades. We characterize lyrics style in terms of vocabulary, length, repetitiveness, speed, and readability. We have observed that the style of popular songs differs significantly from that of other songs. We have used distributed representation methods and the WEAT test to measure various gender and racial biases in song lyrics. We have observed that biases in song lyrics correlate with prior results on human subjects. This correlation indicates that song lyrics reflect the biases that exist in society. The increasing consumption of music and the effect of lyrics on human emotions make this analysis important.
Tasks
Published 2019-07-17
URL https://arxiv.org/abs/1907.07818v1
PDF https://arxiv.org/pdf/1907.07818v1.pdf
PWC https://paperswithcode.com/paper/decoding-the-style-and-bias-of-song-lyrics
Repo https://github.com/manashpratim/Decoding-the-Style-and-Bias-of-Song-Lyrics
Framework none
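
The WEAT test mentioned above compares how strongly two sets of target words (e.g. male vs. female terms) associate with two sets of attribute words (e.g. career vs. family terms) in the embedding space. A minimal NumPy sketch of the effect size follows; the permutation-test p-value and the paper's specific word lists are omitted.

```python
import numpy as np

def weat_effect_size(X, Y, A, B):
    """WEAT effect size.  X, Y: word-vector matrices for the two target sets;
    A, B: word-vector matrices for the two attribute sets (rows are vectors)."""
    def cos(u, V):
        return (V @ u) / (np.linalg.norm(V, axis=1) * np.linalg.norm(u) + 1e-12)

    def assoc(w):                      # s(w, A, B): differential association of word w
        return cos(w, A).mean() - cos(w, B).mean()

    sx = np.array([assoc(w) for w in X])
    sy = np.array([assoc(w) for w in Y])
    return (sx.mean() - sy.mean()) / np.concatenate([sx, sy]).std(ddof=1)
```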

Mixed Dimension Embeddings with Application to Memory-Efficient Recommendation Systems

Title Mixed Dimension Embeddings with Application to Memory-Efficient Recommendation Systems
Authors Antonio Ginart, Maxim Naumov, Dheevatsa Mudigere, Jiyan Yang, James Zou
Abstract In many real-world applications, e.g. recommendation systems, certain items appear much more frequently than other items. However, standard embedding methods—which form the basis of many ML algorithms—allocate the same dimension to all of the items. This leads to statistical and memory inefficiencies. In this work, we propose mixed dimension embedding layers in which the dimension of a particular embedding vector can depend on the frequency of the item. This approach drastically reduces the memory requirement for the embedding, while maintaining and sometimes improving the ML performance. We show that the proposed mixed dimension layers achieve higher accuracy, while using 8X fewer parameters, for collaborative filtering on the MovieLens dataset. They also improve accuracy by 0.1% using half as many parameters, or maintain baseline accuracy using 16X fewer parameters, for the click-through rate prediction task on the Criteo Kaggle dataset.
Tasks Click-Through Rate Prediction, Recommendation Systems
Published 2019-09-25
URL https://arxiv.org/abs/1909.11810v1
PDF https://arxiv.org/pdf/1909.11810v1.pdf
PWC https://paperswithcode.com/paper/mixed-dimension-embeddings-with-application
Repo https://github.com/facebookresearch/dlrm
Framework pytorch
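
One simple way to realize mixed-dimension embeddings is to partition items into frequency blocks, give each block its own embedding table with a smaller dimension, and project every block back to a common base dimension. The PyTorch sketch below follows that scheme; the block sizes, dimensions, and class name are illustrative, not the DLRM implementation in the linked repo.

```python
import torch
import torch.nn as nn

class MixedDimEmbedding(nn.Module):
    """Frequency-blocked embeddings: each block has its own table and dimension,
    plus a per-block linear projection to a shared base dimension."""
    def __init__(self, block_sizes, block_dims, base_dim):
        super().__init__()
        self.base_dim = base_dim
        self.offsets = torch.tensor([0] + list(block_sizes)).cumsum(0)
        self.tables = nn.ModuleList(nn.Embedding(n, d) for n, d in zip(block_sizes, block_dims))
        self.projs = nn.ModuleList(nn.Linear(d, base_dim, bias=False) for d in block_dims)

    def forward(self, idx):                                    # idx: (batch,) global item ids
        out = torch.zeros(idx.size(0), self.base_dim)
        for b, (table, proj) in enumerate(zip(self.tables, self.projs)):
            mask = (idx >= self.offsets[b]) & (idx < self.offsets[b + 1])
            if mask.any():
                out[mask] = proj(table(idx[mask] - self.offsets[b]))
        return out

# illustrative partition: 1k frequent items at dim 64, 100k tail items at dim 8
emb = MixedDimEmbedding(block_sizes=[1000, 100000], block_dims=[64, 8], base_dim=64)
vecs = emb(torch.tensor([3, 50000, 999]))                      # (3, 64)
```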