February 1, 2020

3100 words 15 mins read

Paper Group AWR 96


Fixing the train-test resolution discrepancy

Title Fixing the train-test resolution discrepancy
Authors Hugo Touvron, Andrea Vedaldi, Matthijs Douze, Hervé Jégou
Abstract Data-augmentation is key to the training of neural networks for image classification. This paper first shows that existing augmentations induce a significant discrepancy between the typical size of the objects seen by the classifier at train and test time. We experimentally validate that, for a target test resolution, using a lower train resolution offers better classification at test time. We then propose a simple yet effective and efficient strategy to optimize the classifier performance when the train and test resolutions differ. It involves only a computationally cheap fine-tuning of the network at the test resolution. This enables training strong classifiers using small training images. For instance, we obtain 77.1% top-1 accuracy on ImageNet with a ResNet-50 trained on 128x128 images, and 79.8% with one trained on 224x224 images. In addition, if we use extra training data we get 82.5% with the ResNet-50 trained on 224x224 images. Conversely, when training a ResNeXt-101 32x48d pre-trained in weakly-supervised fashion on 940 million public images at resolution 224x224 and further optimizing for test resolution 320x320, we obtain a test top-1 accuracy of 86.4% (top-5: 98.0%) (single-crop). To the best of our knowledge this is the highest ImageNet single-crop, top-1 and top-5 accuracy to date.
Tasks Data Augmentation, Fine-Grained Image Classification, Image Classification
Published 2019-06-14
URL https://arxiv.org/abs/1906.06423v3
PDF https://arxiv.org/pdf/1906.06423v3.pdf
PWC https://paperswithcode.com/paper/fixing-the-train-test-resolution-discrepancy
Repo https://github.com/facebookresearch/FixRes
Framework pytorch
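
The core recipe is cheap to reproduce: take a network trained at a low resolution, then briefly fine-tune only the classifier and batch-norm statistics at the target test resolution. Below is a minimal PyTorch sketch of that fine-tuning step; the dataset path, crop sizes and hyperparameters are placeholders, not the authors' exact FixRes settings.

```python
# Minimal sketch of the FixRes idea: after standard training at a lower
# resolution, briefly fine-tune only the final layers (and BN statistics)
# at the higher test resolution. Paths and hyperparameters are placeholders.
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

test_res = 320  # target test resolution (training used a lower one, e.g. 224)

model = models.resnet50(pretrained=True)

# Freeze everything except the classifier; unfreezing BN layers approximates
# the paper's re-estimation of batch-norm statistics at the new resolution.
for p in model.parameters():
    p.requires_grad = False
for m in model.modules():
    if isinstance(m, nn.BatchNorm2d):
        m.requires_grad_(True)
model.fc.requires_grad_(True)

finetune_tf = transforms.Compose([
    transforms.Resize(test_res + 32),
    transforms.CenterCrop(test_res),      # crop at the *test* resolution
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])
train_set = datasets.ImageFolder("path/to/imagenet/train", finetune_tf)  # placeholder path
loader = torch.utils.data.DataLoader(train_set, batch_size=64, shuffle=True)

opt = torch.optim.SGD([p for p in model.parameters() if p.requires_grad],
                      lr=1e-3, momentum=0.9)
criterion = nn.CrossEntropyLoss()

model.train()
for images, labels in loader:             # one short pass is typically enough
    opt.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    opt.step()
```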

Neural Percussive Synthesis Parameterised by High-Level Timbral Features

Title Neural Percussive Synthesis Parameterised by High-Level Timbral Features
Authors António Ramires, Pritish Chandna, Xavier Favory, Emilia Gómez, Xavier Serra
Abstract We present a deep neural network-based methodology for synthesising percussive sounds with control over high-level timbral characteristics of the sounds. This approach allows for intuitive control of a synthesizer, enabling the user to shape sounds without extensive knowledge of signal processing. We use a feedforward convolutional neural network-based architecture, which is able to map input parameters to the corresponding waveform. We propose two datasets to evaluate our approach, one in a restrictive context and one covering a broader spectrum of sounds. The timbral features used as parameters are taken from recent literature in signal processing. We also use these features for evaluation and validation of the presented model, to ensure that changing the input parameters produces a congruent waveform with the desired characteristics. Finally, we evaluate the quality of the output sound using a subjective listening test. We provide sound examples and the system’s source code for reproducibility.
Tasks
Published 2019-11-25
URL https://arxiv.org/abs/1911.11853v1
PDF https://arxiv.org/pdf/1911.11853v1.pdf
PWC https://paperswithcode.com/paper/neural-percussive-synthesis-parameterised-by
Repo https://github.com/pc2752/percussive_synth
Framework tf
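
To make the idea concrete, here is a hedged PyTorch sketch of a feedforward decoder that maps a small vector of timbral controls to a fixed-length waveform with transposed 1-D convolutions. Layer sizes, the number of control features and the output length are illustrative, not taken from the paper.

```python
# Hedged sketch of the general idea: a feedforward decoder mapping a few
# high-level timbral controls to a waveform via transposed 1-D convolutions.
import torch
import torch.nn as nn

class TimbralToWaveform(nn.Module):
    def __init__(self, n_features=7, out_len=4096):
        super().__init__()
        self.fc = nn.Linear(n_features, 256 * 64)   # expand controls to a coarse time axis
        self.net = nn.Sequential(
            nn.ConvTranspose1d(256, 128, kernel_size=8, stride=4, padding=2), nn.ReLU(),
            nn.ConvTranspose1d(128, 64, kernel_size=8, stride=4, padding=2), nn.ReLU(),
            nn.ConvTranspose1d(64, 1, kernel_size=8, stride=4, padding=2), nn.Tanh(),
        )
        self.out_len = out_len

    def forward(self, feats):                        # feats: (batch, n_features) in [0, 1]
        x = self.fc(feats).view(-1, 256, 64)
        wav = self.net(x).squeeze(1)                 # 64 -> 256 -> 1024 -> 4096 samples
        return wav[:, :self.out_len]

model = TimbralToWaveform()
controls = torch.rand(2, 7)                          # e.g. boominess, brightness, hardness, ...
print(model(controls).shape)                         # torch.Size([2, 4096])
```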

Reinforcement Learning for Robotic Manipulation using Simulated Locomotion Demonstrations

Title Reinforcement Learning for Robotic Manipulation using Simulated Locomotion Demonstrations
Authors Ozsel Kilinc, Yang Hu, Giovanni Montana
Abstract Learning robot manipulation policies through reinforcement learning (RL) with only sparse rewards is still considered a largely unsolved problem. Although learning with human demonstrations can make the training process more sample efficient, the demonstrations are often expensive to obtain, and their benefits heavily depend on the expertise of the demonstrators. In this paper we propose a novel approach for learning complex robot manipulation tasks with self-learned demonstrations. We note that a robot manipulation task can be interpreted, from the object’s perspective, as a locomotion task. In a virtual world, the object might be able to learn how to move from its initial position to the final target position on its own, without being manipulated. Although objects cannot move on their own in the real world, a policy to achieve object locomotion can be learned through physically-realistic simulators, which are nowadays widely available and routinely adopted to train RL systems. The resulting object-level trajectories are called Simulated Locomotion Demonstrations (SLD). The SLDs are then leveraged to learn the robot manipulation policy through deep RL using only sparse rewards. We thoroughly evaluate the proposed approach on 13 tasks of increasing complexity, and demonstrate that our framework can result in faster learning and higher success rates than alternative algorithms. We demonstrate that SLDs are especially beneficial for complex tasks like multi-object stacking and non-rigid object manipulation.
Tasks
Published 2019-10-16
URL https://arxiv.org/abs/1910.07294v2
PDF https://arxiv.org/pdf/1910.07294v2.pdf
PWC https://paperswithcode.com/paper/reinforcement-learning-for-robotic
Repo https://github.com/WMGDataScience/gym_wmgds
Framework none
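
The two-stage structure (learn object-level "locomotion" in simulation, then reuse those trajectories to shape a sparse-reward manipulation policy) can be illustrated on a deliberately tiny toy problem. The sketch below is self-contained and purely illustrative; it is neither the paper's algorithm nor the repo's interface.

```python
# Toy illustration of the two-stage SLD idea (not the paper's implementation):
# stage 1 collects object-level trajectories in a world where the object moves
# on its own; stage 2 uses them as dense auxiliary rewards for a sparse-reward
# "manipulation" task on a 1-D line.
import numpy as np

rng = np.random.default_rng(0)
TARGET, START = 1.0, 0.0

# Stage 1: simulated locomotion demonstrations -- the object walks itself.
def collect_sld(n=20, steps=20):
    demos = []
    for _ in range(n):
        pos, traj = START, []
        for _ in range(steps):
            pos += np.clip(TARGET - pos, -0.1, 0.1)   # object moves toward the target
            traj.append(pos)
        demos.append(np.array(traj))
    return np.mean(demos, axis=0)                     # reference object trajectory

reference = collect_sld()

# Stage 2: the sparse reward only fires at the goal; the SLD term additionally
# rewards tracking the reference object trajectory.
def episode(policy_gain, use_sld):
    obj, total = START, 0.0
    for t in range(20):
        action = policy_gain * (TARGET - obj)          # trivial linear "manipulation" policy
        obj += np.clip(action, -0.1, 0.1)
        sparse = 1.0 if abs(obj - TARGET) < 0.05 else 0.0
        sld_bonus = -abs(obj - reference[t]) if use_sld else 0.0
        total += sparse + sld_bonus
    return total

# Naive random search over the single policy parameter, with and without SLD shaping.
for use_sld in (False, True):
    gains = rng.uniform(0.0, 2.0, size=50)
    best = max(gains, key=lambda g: episode(g, use_sld))
    print(f"use_sld={use_sld}: best gain {best:.2f}, return {episode(best, use_sld):.2f}")
```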

A Low-Cost, Flexible and Portable Volumetric Capturing System

Title A Low-Cost, Flexible and Portable Volumetric Capturing System
Authors Vladimiros Sterzentsenko, Antonis Karakottas, Alexandros Papachristou, Nikolaos Zioulis, Alexandros Doumanoglou, Dimitrios Zarpalas, Petros Daras
Abstract Multi-view capture systems are complex systems to engineer. They require technical knowledge to install and intricate setup processes, related mainly to the sensors’ spatial alignment (i.e. external calibration). However, with the ongoing developments in new production methods, we are now in a position where the production of high-quality realistic 3D assets is possible even with commodity sensors. Nonetheless, the capturing systems developed with these methods are heavily intertwined with the methods themselves, relying on custom solutions and seldom - if at all - publicly available. In light of this, we design, develop and publicly offer a multi-view capture system based on the latest RGB-D sensor technology. For our system, we develop a portable and easy-to-use external calibration method that greatly reduces the effort and knowledge required, and simplifies the overall process.
Tasks Calibration
Published 2019-09-03
URL https://arxiv.org/abs/1909.01207v1
PDF https://arxiv.org/pdf/1909.01207v1.pdf
PWC https://paperswithcode.com/paper/a-low-cost-flexible-and-portable-volumetric
Repo https://github.com/VCL3D/VolumetricCapture
Framework none
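
External calibration of a multi-sensor rig ultimately comes down to estimating rigid transforms between corresponding 3D points seen by different sensors. The sketch below shows the standard Kabsch/Procrustes solution to that sub-problem in NumPy; it is a generic building block for illustration, not the system's actual calibration pipeline.

```python
# Standard Kabsch/Procrustes estimation of a rigid transform from 3-D
# correspondences -- the basic ingredient of external calibration.
import numpy as np

def rigid_transform(src, dst):
    """Find R, t minimising ||R @ src_i + t - dst_i|| over corresponding points."""
    c_src, c_dst = src.mean(axis=0), dst.mean(axis=0)
    H = (src - c_src).T @ (dst - c_dst)               # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))            # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = c_dst - R @ c_src
    return R, t

# Synthetic check: recover a known rotation/translation from noisy correspondences.
rng = np.random.default_rng(0)
pts = rng.normal(size=(100, 3))
angle = np.pi / 6
R_true = np.array([[np.cos(angle), -np.sin(angle), 0],
                   [np.sin(angle),  np.cos(angle), 0],
                   [0, 0, 1]])
t_true = np.array([0.2, -0.1, 0.5])
moved = pts @ R_true.T + t_true + rng.normal(scale=1e-3, size=pts.shape)
R_est, t_est = rigid_transform(pts, moved)
print(np.allclose(R_est, R_true, atol=1e-2), np.allclose(t_est, t_true, atol=1e-2))
```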

Wasserstein Style Transfer

Title Wasserstein Style Transfer
Authors Youssef Mroueh
Abstract We propose Gaussian optimal transport for image style transfer in an encoder/decoder framework. Optimal transport for Gaussian measures has closed-form Monge mappings from source to target distributions. Moreover, interpolations between a content and a style image can be seen as geodesics in the Wasserstein geometry. Using this insight, we show how to mix different target styles using the Wasserstein barycenter of Gaussian measures. Since Gaussians are closed under Wasserstein barycenters, this allows for simple style transfer, style mixing and interpolation. Moreover, we show how mixing different styles can be achieved using other geodesic metrics between Gaussians, such as the Fisher-Rao metric, while the transport of the content to the new interpolated style is still performed with Gaussian OT maps. Our simple methodology allows us to generate new stylized content interpolating between many artistic styles. The metric used in the interpolation results in different stylizations.
Tasks Style Transfer
Published 2019-05-30
URL https://arxiv.org/abs/1905.12828v1
PDF https://arxiv.org/pdf/1905.12828v1.pdf
PWC https://paperswithcode.com/paper/wasserstein-style-transfer
Repo https://github.com/wasserstein-transfer/wasserstein-transfer.github.io
Framework none
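
The closed-form Monge map between Gaussians is the central computation here: features with source statistics (m_s, S_s) are pushed to the style statistics (m_t, S_t) via T(x) = m_t + A(x - m_s), with A = S_s^{-1/2}(S_s^{1/2} S_t S_s^{1/2})^{1/2} S_s^{-1/2}. A small NumPy/SciPy sketch of that map on toy 2-D features follows; the encoder/decoder and the barycenter machinery are omitted.

```python
# Closed-form Gaussian optimal-transport (Monge) map on toy features.
import numpy as np
from scipy.linalg import sqrtm

def gaussian_monge_map(m_s, S_s, m_t, S_t):
    """Return f(x) = m_t + A (x - m_s), the OT map N(m_s, S_s) -> N(m_t, S_t)."""
    S_half = np.real(sqrtm(S_s))
    S_half_inv = np.linalg.inv(S_half)
    middle = np.real(sqrtm(S_half @ S_t @ S_half))
    A = S_half_inv @ middle @ S_half_inv              # symmetric positive definite
    return lambda x: m_t + (x - m_s) @ A.T

# Toy example in 2-D feature space: transport content features to style statistics.
rng = np.random.default_rng(0)
content = rng.multivariate_normal([0, 0], [[1.0, 0.3], [0.3, 0.5]], size=1000)
m_s, S_s = content.mean(0), np.cov(content.T)
m_t, S_t = np.array([2.0, -1.0]), np.array([[0.5, -0.2], [-0.2, 1.5]])

transport = gaussian_monge_map(m_s, S_s, m_t, S_t)
stylised = transport(content)
print(np.round(stylised.mean(0), 2))       # close to m_t
print(np.round(np.cov(stylised.T), 2))     # close to S_t
```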

Educating Text Autoencoders: Latent Representation Guidance via Denoising

Title Educating Text Autoencoders: Latent Representation Guidance via Denoising
Authors Tianxiao Shen, Jonas Mueller, Regina Barzilay, Tommi Jaakkola
Abstract Generative autoencoders offer a promising approach for controllable text generation by leveraging their latent sentence representations. However, current models struggle to maintain coherent latent spaces required to perform meaningful text manipulations via latent vector operations. Specifically, we demonstrate by example that neural encoders do not necessarily map similar sentences to nearby latent vectors. A theoretical explanation for this phenomenon establishes that high-capacity autoencoders can learn an arbitrary mapping between sequences and associated latent representations. To remedy this issue, we augment adversarial autoencoders with a denoising objective where original sentences are reconstructed from perturbed versions (referred to as DAAE). We prove that this simple modification guides the latent space geometry of the resulting model by encouraging the encoder to map similar texts to similar latent representations. In empirical comparisons with various types of autoencoders, our model provides the best trade-off between generation quality and reconstruction capacity. Moreover, the improved geometry of the DAAE latent space enables zero-shot text style transfer via simple latent vector arithmetic.
Tasks Denoising, Style Transfer, Text Generation, Text Style Transfer
Published 2019-05-29
URL https://arxiv.org/abs/1905.12777v2
PDF https://arxiv.org/pdf/1905.12777v2.pdf
PWC https://paperswithcode.com/paper/latent-space-secrets-of-denoising-text
Repo https://github.com/shentianxiao/text-autoencoders
Framework pytorch
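
The modification the abstract describes is conceptually small: perturb the input sentence and train the autoencoder to reconstruct the original. Below is a minimal sketch of one plausible word-level noise function; the paper's exact perturbation scheme may differ, and the adversarial autoencoder itself is omitted.

```python
# Sketch of a word-level noise function for a denoising text autoencoder:
# reconstruct the original sentence from a lightly perturbed version.
import random

def perturb(tokens, p_drop=0.1, p_swap=0.1, seed=None):
    rng = random.Random(seed)
    # Randomly drop words...
    noised = [t for t in tokens if rng.random() > p_drop]
    # ...and swap a few adjacent pairs.
    for i in range(len(noised) - 1):
        if rng.random() < p_swap:
            noised[i], noised[i + 1] = noised[i + 1], noised[i]
    return noised if noised else tokens   # never return an empty sentence

sentence = "the movie was surprisingly good".split()
print(perturb(sentence, seed=0))
# Training then minimises the reconstruction loss of `sentence` given
# encode(perturb(sentence)), plus the usual adversarial term on the latent code.
```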

Techniques for Inverted Index Compression

Title Techniques for Inverted Index Compression
Authors Giulio Ermanno Pibiri, Rossano Venturini
Abstract The data structure at the core of large-scale search engines is the inverted index, which is essentially a collection of sorted integer sequences called inverted lists. Because of the many documents indexed by such engines and stringent performance requirements imposed by the heavy load of queries, the inverted index stores billions of integers that must be searched efficiently. In this scenario, index compression is essential because it leads to a better exploitation of the computer memory hierarchy for faster query processing and, at the same time, allows reducing the number of storage machines. The aim of this article is twofold: first, surveying the encoding algorithms suitable for inverted index compression and, second, characterizing the performance of the inverted index through experimentation.
Tasks
Published 2019-08-28
URL https://arxiv.org/abs/1908.10598v1
PDF https://arxiv.org/pdf/1908.10598v1.pdf
PWC https://paperswithcode.com/paper/techniques-for-inverted-index-compression
Repo https://github.com/jermp/2i_bench
Framework none
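
For a flavour of the encoders the survey covers, here is one of the simplest schemes: delta-gap a sorted posting list and variable-byte encode the gaps. This is a generic textbook codec shown for illustration; the article benchmarks far more sophisticated methods.

```python
# Delta-gap + variable-byte encoding of a sorted posting list.
def vbyte_encode(postings):
    out = bytearray()
    prev = 0
    for doc_id in postings:            # postings must be strictly increasing
        gap = doc_id - prev
        prev = doc_id
        while gap >= 128:              # 7 payload bits per byte,
            out.append(gap & 0x7F)     # high bit clear = "more bytes follow"
            gap >>= 7
        out.append(gap | 0x80)         # high bit set = last byte of this gap
    return bytes(out)

def vbyte_decode(data):
    postings, gap, shift, prev = [], 0, 0, 0
    for b in data:
        gap |= (b & 0x7F) << shift
        if b & 0x80:                   # terminator byte: emit the accumulated gap
            prev += gap
            postings.append(prev)
            gap, shift = 0, 0
        else:
            shift += 7
    return postings

lst = [3, 7, 21, 150, 151, 4000]
enc = vbyte_encode(lst)
print(len(enc), "bytes;", vbyte_decode(enc) == lst)   # 8 bytes; True
```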

Do Neural Networks Show Gestalt Phenomena? An Exploration of the Law of Closure

Title Do Neural Networks Show Gestalt Phenomena? An Exploration of the Law of Closure
Authors Been Kim, Emily Reif, Martin Wattenberg, Samy Bengio
Abstract One characteristic of human visual perception is the presence of ‘Gestalt phenomena,’ that is, that the whole is something other than the sum of its parts. A natural question is whether image-recognition networks show similar effects. Our paper investigates one particular type of Gestalt phenomenon, the law of closure, in the context of a feedforward image classification neural network (NN). This is a robust effect in human perception, but experiments typically rely on measurements (e.g., reaction time) that are not available for artificial neural nets. We describe a protocol for identifying a closure effect in NNs, and report on the results of experiments with simple visual stimuli. Our findings suggest that NNs trained with natural images do exhibit closure, in contrast to networks with randomized weights or networks that have been trained on visually random data. Furthermore, the closure effect reflects something beyond good feature extraction; it is correlated with the network’s higher layer features and ability to generalize.
Tasks Image Classification
Published 2019-03-04
URL http://arxiv.org/abs/1903.01069v3
PDF http://arxiv.org/pdf/1903.01069v3.pdf
PWC https://paperswithcode.com/paper/do-neural-networks-show-gestalt-phenomena-an
Repo https://github.com/google-research/gestalt
Framework none

Any-Precision Deep Neural Networks

Title Any-Precision Deep Neural Networks
Authors Haichao Yu, Haoxiang Li, Honghui Shi, Thomas S. Huang, Gang Hua
Abstract We present Any-Precision Deep Neural Networks (Any-Precision DNNs), which are trained with a new method that empowers learned DNNs to be flexible in any numerical precision during inference. The same model at runtime can be flexibly and directly set to different bit-widths, by truncating the least significant bits, to support dynamic speed and accuracy trade-offs. When all layers are set to low bits, we show that the model achieves accuracy comparable to dedicated models trained at the same precision. This nice property facilitates flexible deployment of deep learning models in real-world applications, where in practice trade-offs between model accuracy and runtime efficiency are often sought. Previous literature presents solutions to train models at each individual fixed efficiency/accuracy trade-off point. But how to produce a model flexible in runtime precision is largely unexplored. When the demand for an efficiency/accuracy trade-off varies from time to time or even changes dynamically at runtime, it is infeasible to re-train models accordingly, and the storage budget may forbid keeping multiple models. Our proposed framework achieves this flexibility without performance degradation. More importantly, we demonstrate that this achievement is agnostic to model architectures. We experimentally validated our method with different deep network backbones (AlexNet-small, ResNet-20, ResNet-50) on different datasets (SVHN, CIFAR-10, ImageNet) and observed consistent results. Code and models will be available at https://github.com/haichaoyu.
Tasks
Published 2019-11-17
URL https://arxiv.org/abs/1911.07346v1
PDF https://arxiv.org/pdf/1911.07346v1.pdf
PWC https://paperswithcode.com/paper/any-precision-deep-neural-networks
Repo https://github.com/haichaoyu/any-precision-nets
Framework pytorch
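
The runtime mechanism is easy to picture: store weights quantised at the highest supported precision and obtain lower precisions by dropping least-significant bits. The sketch below illustrates that truncation step only; it says nothing about how the paper trains the model to remain accurate across bit-widths.

```python
# Sketch of running one stored weight tensor at several bit-widths by
# truncating least-significant bits. Illustrative, not the training method.
import torch

def quantize(w, bits):
    """Uniform symmetric quantisation of a float tensor to `bits` bits."""
    levels = 2 ** (bits - 1) - 1
    scale = w.abs().max() / levels
    return torch.clamp(torch.round(w / scale), -levels, levels).to(torch.int32), scale

def truncate(q, from_bits, to_bits):
    """Drop least-significant bits to move from a higher to a lower precision."""
    shift = from_bits - to_bits
    return q >> shift                      # arithmetic shift keeps the sign

w = torch.randn(5)
q8, scale8 = quantize(w, 8)
q4 = truncate(q8, 8, 4)                    # same stored weights, lower runtime precision
print(w)
print(q8.float() * scale8)                 # 8-bit reconstruction
print(q4.float() * (scale8 * 2 ** 4))      # 4-bit reconstruction from truncated bits
```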

GILT: Generating Images from Long Text

Title GILT: Generating Images from Long Text
Authors Ori Bar El, Ori Licht, Netanel Yosephian
Abstract Creating an image reflecting the content of a long text is a complex process that requires a sense of creativity. For example, creating a book cover or a movie poster based on its summary, or a food image based on its recipe. In this paper we present the new task of generating images from long text that does not describe the visual content of the image directly. For this, we build a system for generating high-resolution 256 $\times$ 256 images of food conditioned on their recipes. The relation of the recipe text (without its title) to the visual content of the image is vague, and the textual structure of recipes is complex, consisting of two sections (ingredients and instructions), both containing multiple sentences. We used the Recipe1M dataset to train and evaluate our model, which is based on the StackGAN-v2 architecture.
Tasks
Published 2019-01-08
URL http://arxiv.org/abs/1901.02404v1
PDF http://arxiv.org/pdf/1901.02404v1.pdf
PWC https://paperswithcode.com/paper/gilt-generating-images-from-long-text
Repo https://github.com/netanelyo/Recipe2ImageGAN
Framework pytorch

EBPC: Extended Bit-Plane Compression for Deep Neural Network Inference and Training Accelerators

Title EBPC: Extended Bit-Plane Compression for Deep Neural Network Inference and Training Accelerators
Authors Lukas Cavigelli, Georg Rutishauser, Luca Benini
Abstract In the wake of the success of convolutional neural networks in image classification, object recognition, speech recognition, etc., the demand for deploying these compute-intensive ML models on embedded and mobile systems with tight power and energy constraints at low cost, as well as for boosting throughput in data centers, is growing rapidly. This has sparked a surge of research into specialized hardware accelerators. Their performance is typically limited by I/O bandwidth, power consumption is dominated by I/O transfers to off-chip memory, and on-chip memories occupy a large part of the silicon area. We introduce and evaluate a novel, hardware-friendly, and lossless compression scheme for the feature maps present within convolutional neural networks. We present hardware architectures and synthesis results for the compressor and decompressor in 65nm. With a throughput of one 8-bit word/cycle at 600MHz, they fit into 2.8kGE and 3.0kGE of silicon area, respectively - together the size of less than seven 8-bit multiply-add units at the same throughput. We show that an average compression ratio of 5.1x for AlexNet, 4x for VGG-16, 2.4x for ResNet-34 and 2.2x for MobileNetV2 can be achieved - a gain of 45-70% over existing methods. Our approach also works effectively for various number formats, has a low frame-to-frame variance on the compression ratio, and achieves compression factors for gradient map compression during training that are even better than for inference.
Tasks Image Classification, Object Recognition, Speech Recognition
Published 2019-08-30
URL https://arxiv.org/abs/1908.11645v2
PDF https://arxiv.org/pdf/1908.11645v2.pdf
PWC https://paperswithcode.com/paper/ebpc-extended-bit-plane-compression-for-deep
Repo https://github.com/pulp-platform/stream-ebpc
Framework pytorch
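
The abstract does not spell out the codec, but its target is the high zero content of post-ReLU feature maps. The toy zero run-length pass below, a much simpler relative of EBPC's combined zero-RLE and bit-plane stages, only illustrates where the compression potential comes from; it is not the proposed scheme.

```python
# Toy zero run-length encoding of a sparse (post-ReLU) feature map.
import numpy as np

def zero_rle(values):
    """Encode a flat integer array as (zero_run_length, nonzero_value) pairs."""
    pairs, run = [], 0
    for v in values:
        if v == 0:
            run += 1
        else:
            pairs.append((run, int(v)))
            run = 0
    if run:
        pairs.append((run, None))          # trailing zeros, no value follows
    return pairs

rng = np.random.default_rng(0)
fmap = np.maximum(rng.normal(size=1000), 0)          # ReLU => roughly half zeros
fmap = (fmap * 16).astype(np.int64)                  # pretend low-bit quantised activations
encoded = zero_rle(fmap)
print(f"{len(fmap)} values -> {len(encoded)} (run, value) pairs")
```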

Deploying Technology to Save Endangered Languages

Title Deploying Technology to Save Endangered Languages
Authors Hilaria Cruz, Joseph Waring
Abstract Computer scientists working on natural language processing, native speakers of endangered languages, and field linguists come together to discuss ways to harness Automatic Speech Recognition, especially neural networks, to automate annotation, speech tagging, and text parsing for endangered languages.
Tasks Speech Recognition
Published 2019-08-23
URL https://arxiv.org/abs/1908.08971v2
PDF https://arxiv.org/pdf/1908.08971v2.pdf
PWC https://paperswithcode.com/paper/deploying-technology-to-save-endangered
Repo https://github.com/pywirrarika/naki
Framework none

RaKUn: Rank-based Keyword extraction via Unsupervised learning and Meta vertex aggregation

Title RaKUn: Rank-based Keyword extraction via Unsupervised learning and Meta vertex aggregation
Authors Blaž Škrlj, Andraž Repar, Senja Pollak
Abstract Keyword extraction is used for summarizing the content of a document and supports efficient document retrieval, and is as such an indispensable part of modern text-based systems. We explore how load centrality, a graph-theoretic measure applied to graphs derived from a given text, can be used to efficiently identify and rank keywords. By introducing meta vertices (aggregates of existing vertices) and systematic redundancy filters, the proposed method performs on par with the state of the art for the keyword extraction task on 14 diverse datasets. The proposed method is unsupervised, interpretable and can also be used for document visualization.
Tasks Keyword Extraction
Published 2019-07-15
URL https://arxiv.org/abs/1907.06458v3
PDF https://arxiv.org/pdf/1907.06458v3.pdf
PWC https://paperswithcode.com/paper/rakun-rank-based-keyword-extraction-via
Repo https://github.com/SkBlaz/rakun
Framework none
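
A stripped-down sketch of the load-centrality idea: build a word co-occurrence graph from the text and rank words by load centrality with networkx. The meta-vertex aggregation and redundancy filters of the actual RaKUn method are omitted, and the toy corpus is made up.

```python
# Rank candidate keywords by load centrality on a word co-occurrence graph.
import networkx as nx

text = ("keyword extraction supports document retrieval . "
        "graph centrality ranks keyword candidates in the document graph .")
tokens = [t.lower() for t in text.split() if t.isalpha()]

G = nx.Graph()
for a, b in zip(tokens, tokens[1:]):       # edges between adjacent words
    if a != b:
        G.add_edge(a, b)

scores = nx.load_centrality(G)
top = sorted(scores, key=scores.get, reverse=True)[:5]
print(top)                                 # highest-centrality words as keyword candidates
```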

An Attention-based Graph Neural Network for Heterogeneous Structural Learning

Title An Attention-based Graph Neural Network for Heterogeneous Structural Learning
Authors Huiting Hong, Hantao Guo, Yucheng Lin, Xiaoqing Yang, Zang Li, Jieping Ye
Abstract In this paper, we focus on graph representation learning of heterogeneous information network (HIN), in which various types of vertices are connected by various types of relations. Most of the existing methods conducted on HIN revise homogeneous graph embedding models via meta-paths to learn low-dimensional vector space of HIN. In this paper, we propose a novel Heterogeneous Graph Structural Attention Neural Network (HetSANN) to directly encode structural information of HIN without meta-path and achieve more informative representations. With this method, domain experts will not be needed to design meta-path schemes and the heterogeneous information can be processed automatically by our proposed model. Specifically, we implicitly represent heterogeneous information using the following two methods: 1) we model the transformation between heterogeneous vertices through a projection in low-dimensional entity spaces; 2) afterwards, we apply the graph neural network to aggregate multi-relational information of projected neighborhood by means of attention mechanism. We also present three extensions of HetSANN, i.e., voices-sharing product attention for the pairwise relationships in HIN, cycle-consistency loss to retain the transformation between heterogeneous entity spaces, and multi-task learning with full use of information. The experiments conducted on three public datasets demonstrate that our proposed models achieve significant and consistent improvements compared to state-of-the-art solutions.
Tasks Graph Embedding, Graph Representation Learning, Heterogeneous Node Classification, Multi-Task Learning, Node Classification, Representation Learning
Published 2019-12-19
URL https://arxiv.org/abs/1912.10832v1
PDF https://arxiv.org/pdf/1912.10832v1.pdf
PWC https://paperswithcode.com/paper/an-attention-based-graph-neural-network-for
Repo https://github.com/didi/hetsann
Framework tf
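
The two mechanisms listed in the abstract can be condensed into a few lines: type-specific linear projections that map neighbours of different types into the target vertex's space, followed by attention-weighted aggregation. The PyTorch sketch below is a toy single-vertex version with made-up dimensions, not the HetSANN implementation.

```python
# Toy heterogeneous-graph attention layer: per-type projections + attention.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyHetAttnLayer(nn.Module):
    def __init__(self, dims_by_type, out_dim):
        super().__init__()
        # One projection per neighbour type into the target entity space.
        self.proj = nn.ModuleDict({t: nn.Linear(d, out_dim) for t, d in dims_by_type.items()})
        self.attn = nn.Linear(2 * out_dim, 1)

    def forward(self, target, neighbours):
        # target: (out_dim,) representation; neighbours: list of (type, feature) pairs.
        projected = torch.stack([self.proj[t](x) for t, x in neighbours])
        scores = self.attn(torch.cat([target.expand_as(projected), projected], dim=-1))
        alpha = F.softmax(scores.squeeze(-1), dim=0)            # attention over neighbours
        return F.elu((alpha.unsqueeze(-1) * projected).sum(0))  # aggregated representation

layer = ToyHetAttnLayer({"author": 8, "paper": 16}, out_dim=4)
target = torch.zeros(4)
neigh = [("author", torch.randn(8)), ("paper", torch.randn(16)), ("paper", torch.randn(16))]
print(layer(target, neigh).shape)   # torch.Size([4])
```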

ReCoSa: Detecting the Relevant Contexts with Self-Attention for Multi-turn Dialogue Generation

Title ReCoSa: Detecting the Relevant Contexts with Self-Attention for Multi-turn Dialogue Generation
Authors Hainan Zhang, Yanyan Lan, Liang Pang, Jiafeng Guo, Xueqi Cheng
Abstract In multi-turn dialogue generation, the response is usually related to only a few contexts. Therefore, an ideal model should be able to detect these relevant contexts and produce a suitable response accordingly. However, the widely used hierarchical recurrent encoder-decoder models treat all the contexts indiscriminately, which may hurt the subsequent response generation process. Some researchers try to use cosine similarity or the traditional attention mechanism to find the relevant contexts, but they suffer from either an insufficient relevance assumption or a position bias problem. In this paper, we propose a new model, named ReCoSa, to tackle this problem. First, a word-level LSTM encoder is used to obtain the initial representation of each context. Then, the self-attention mechanism is utilized to update both the context and masked response representations. Finally, the attention weights between each context and response representation are computed and used in the further decoding process. Experimental results on both a Chinese customer service dataset and the English Ubuntu dialogue dataset show that ReCoSa significantly outperforms baseline models, in terms of both metric-based and human evaluations. Further analysis of attention shows that the relevant contexts detected by ReCoSa are highly coherent with human understanding, validating the correctness and interpretability of ReCoSa.
Tasks Dialogue Generation
Published 2019-07-09
URL https://arxiv.org/abs/1907.05339v1
PDF https://arxiv.org/pdf/1907.05339v1.pdf
PWC https://paperswithcode.com/paper/recosa-detecting-the-relevant-contexts-with
Repo https://github.com/zhanghainan/ReCoSa
Framework tf
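
The relevance-detection step at the heart of ReCoSa, encoding each context utterance and letting self-attention weigh the contexts against each other, can be sketched in a few lines of PyTorch. Positional encoding, the masked response branch and the decoder are omitted, and all sizes here are made up.

```python
# Sketch of context-level self-attention over utterance representations.
import torch
import torch.nn as nn

torch.manual_seed(0)
n_contexts, vocab, emb_dim, hid = 4, 100, 32, 64

embed = nn.Embedding(vocab, emb_dim)
utt_encoder = nn.LSTM(emb_dim, hid, batch_first=True)
self_attn = nn.MultiheadAttention(embed_dim=hid, num_heads=4, batch_first=True)

# Four context utterances of 6 tokens each (dummy ids).
contexts = torch.randint(0, vocab, (n_contexts, 6))
_, (h, _) = utt_encoder(embed(contexts))          # h: (1, n_contexts, hid) final states
ctx_reps = h.squeeze(0).unsqueeze(0)              # (1, n_contexts, hid): one dialogue, 4 turns

attended, weights = self_attn(ctx_reps, ctx_reps, ctx_reps)
print(attended.shape)                              # torch.Size([1, 4, 64])
print(weights.squeeze(0))                          # how much each turn attends to each other turn
```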