Paper Group AWR 244
Contents:
- Automatically Identifying Complaints in Social Media
- A Simple Joint Model for Improved Contextual Neural Lemmatization
- Learning To Follow Directions in Street View
- JSIS3D: Joint Semantic-Instance Segmentation of 3D Point Clouds with Multi-Task Pointwise Networks and Multi-Value Conditional Random Fields
- Neural Volumes: Learning Dynamic Renderable Volumes from Images
- COEGAN: Evaluating the Coevolution Effect in Generative Adversarial Networks
- Semantic Classification of Tabular Datasets via Character-Level Convolutional Neural Networks
- srlearn: A Python Library for Gradient-Boosted Statistical Relational Models
- Hierarchical Reinforcement Learning with Advantage-Based Auxiliary Rewards
- Learning Problem-agnostic Speech Representations from Multiple Self-supervised Tasks
- Covariance-free Partial Least Squares: An Incremental Dimensionality Reduction Method
- FaceQnet: Quality Assessment for Face Recognition based on Deep Learning
- Understanding and Visualizing Deep Visual Saliency Models
- HiLLoC: Lossless Image Compression with Hierarchical Latent Variable Models
- Extending Stein's unbiased risk estimator to train deep denoisers with correlated pairs of noisy images
Automatically Identifying Complaints in Social Media
Title | Automatically Identifying Complaints in Social Media |
Authors | Daniel Preotiuc-Pietro, Mihaela Gaman, Nikolaos Aletras |
Abstract | Complaining is a basic speech act regularly used in human and computer mediated communication to express a negative mismatch between reality and expectations in a particular situation. Automatically identifying complaints in social media is of utmost importance for organizations or brands seeking to improve the customer experience, and for developing dialogue systems that handle and respond to complaints. In this paper, we introduce the first systematic analysis of complaints in computational linguistics. We collect a new annotated data set of written complaints expressed in English on Twitter.\footnote{Data and code are available here: \url{https://github.com/danielpreotiuc/complaints-social-media}} We present an extensive linguistic analysis of complaining as a speech act in social media and train strong feature-based and neural models of complaints across nine domains, achieving a predictive performance of up to 79 F1 using distant supervision. |
Tasks | |
Published | 2019-06-10 |
URL | https://arxiv.org/abs/1906.03890v1 |
PDF | https://arxiv.org/pdf/1906.03890v1.pdf |
PWC | https://paperswithcode.com/paper/automatically-identifying-complaints-in |
Repo | https://github.com/danielpreotiuc/complaints-social-media |
Framework | none |
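
As a minimal, hypothetical sketch of the kind of feature-based baseline the paper trains (bag-of-n-grams plus logistic regression over invented example tweets; not the authors' exact feature set, lexicons, or distant-supervision setup):

```python
# Minimal feature-based complaint classifier sketch (toy data, assumed
# hyperparameters; not the paper's exact model).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

tweets = [
    "@airline my flight was delayed 5 hours and nobody helped",  # complaint
    "@airline thanks for the smooth check-in today!",            # not a complaint
    "my package arrived broken, third time this month",          # complaint
    "loving the new update, great work",                         # not a complaint
]
labels = [1, 0, 1, 0]

# Word 1-2 grams roughly mirror the n-gram features used in such baselines.
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), lowercase=True),
    LogisticRegression(max_iter=1000),
)
model.fit(tweets, labels)
print(model.predict(["the product stopped working after one day"]))
```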
A Simple Joint Model for Improved Contextual Neural Lemmatization
Title | A Simple Joint Model for Improved Contextual Neural Lemmatization |
Authors | Chaitanya Malaviya, Shijie Wu, Ryan Cotterell |
Abstract | English verbs have multiple forms. For instance, talk may also appear as talks, talked or talking, depending on the context. The NLP task of lemmatization seeks to map these diverse forms back to a canonical one, known as the lemma. We present a simple joint neural model for lemmatization and morphological tagging that achieves state-of-the-art results on 20 languages from the Universal Dependencies corpora. Our paper describes the model in addition to training and decoding procedures. Error analysis indicates that joint morphological tagging and lemmatization is especially helpful in low-resource lemmatization and languages that display a larger degree of morphological complexity. Code and pre-trained models are available at https://sigmorphon.github.io/sharedtasks/2019/task2/. |
Tasks | Lemmatization, Morphological Tagging |
Published | 2019-04-04 |
URL | https://arxiv.org/abs/1904.02306v3 |
PDF | https://arxiv.org/pdf/1904.02306v3.pdf |
PWC | https://paperswithcode.com/paper/a-simple-joint-model-for-improved-contextual |
Repo | https://github.com/shijie-wu/neural-transducer |
Framework | pytorch |
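
A toy illustration of the joint factorization the model exploits, p(lemma, tag | word, context) = p(tag | context) × p(lemma | word, tag). The probabilities below are made up; the paper uses an LSTM tagger and a neural transducer rather than lookup tables:

```python
# Toy joint scoring: the contextual tagger and the tag-conditioned
# lemmatizer are stand-in lookup tables, not the paper's neural model.
p_tag = {("talks", "V"): 0.7, ("talks", "N"): 0.3}    # p(tag | context)
p_lemma = {("talks", "V", "talk"): 0.95,              # p(lemma | word, tag)
           ("talks", "N", "talk"): 0.60}

def joint_score(word, tag, lemma):
    return p_tag[(word, tag)] * p_lemma[(word, tag, lemma)]

best = max([("V", "talk"), ("N", "talk")],
           key=lambda tl: joint_score("talks", tl[0], tl[1]))
print(best)  # ('V', 'talk'): the verb reading wins under the joint score
```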
Learning To Follow Directions in Street View
Title | Learning To Follow Directions in Street View |
Authors | Karl Moritz Hermann, Mateusz Malinowski, Piotr Mirowski, Andras Banki-Horvath, Keith Anderson, Raia Hadsell |
Abstract | Navigating and understanding the real world remains a key challenge in machine learning and inspires a great variety of research in areas such as language grounding, planning, navigation and computer vision. We propose an instruction-following task that requires all of the above, and which combines the practicality of simulated environments with the challenges of ambiguous, noisy real world data. StreetNav is built on top of Google Street View and provides visually accurate environments representing real places. Agents are given driving instructions which they must learn to interpret in order to successfully navigate in this environment. Since humans equipped with driving instructions can readily navigate in previously unseen cities, we set a high bar and test our trained agents for similar cognitive capabilities. Although deep reinforcement learning (RL) methods are frequently evaluated only on data that closely follow the training distribution, our dataset extends to multiple cities and has a clean train/test separation. This allows for thorough testing of generalisation ability. This paper presents the StreetNav environment and tasks, models that establish strong baselines, and extensive analysis of the task and the trained agents. |
Tasks | |
Published | 2019-03-01 |
URL | https://arxiv.org/abs/1903.00401v2 |
PDF | https://arxiv.org/pdf/1903.00401v2.pdf |
PWC | https://paperswithcode.com/paper/learning-to-follow-directions-in-street-view |
Repo | https://github.com/deepmind/streetlearn |
Framework | tf |
JSIS3D: Joint Semantic-Instance Segmentation of 3D Point Clouds with Multi-Task Pointwise Networks and Multi-Value Conditional Random Fields
Title | JSIS3D: Joint Semantic-Instance Segmentation of 3D Point Clouds with Multi-Task Pointwise Networks and Multi-Value Conditional Random Fields |
Authors | Quang-Hieu Pham, Duc Thanh Nguyen, Binh-Son Hua, Gemma Roig, Sai-Kit Yeung |
Abstract | Deep learning techniques have become the go-to models for most vision-related tasks on 2D images. However, their power has not been fully realised on several tasks in 3D space, e.g., 3D scene understanding. In this work, we jointly address the problems of semantic and instance segmentation of 3D point clouds. Specifically, we develop a multi-task pointwise network that simultaneously performs two tasks: predicting the semantic classes of 3D points and embedding the points into high-dimensional vectors so that points of the same object instance are represented by similar embeddings. We then propose a multi-value conditional random field model to incorporate the semantic and instance labels and formulate the problem of semantic and instance segmentation as jointly optimising labels in the field model. The proposed method is thoroughly evaluated and compared with existing methods on different indoor scene datasets including S3DIS and SceneNN. Experimental results showed the robustness of the proposed joint semantic-instance segmentation scheme over its single components. Our method also achieved state-of-the-art performance on semantic segmentation. |
Tasks | 3D Instance Segmentation, 3D Semantic Instance Segmentation, 3D Semantic Segmentation, Scene Understanding |
Published | 2019-04-01 |
URL | http://arxiv.org/abs/1904.00699v2 |
PDF | http://arxiv.org/pdf/1904.00699v2.pdf |
PWC | https://paperswithcode.com/paper/jsis3d-joint-semantic-instance-segmentation |
Repo | https://github.com/pqhieu/JSIS3D |
Framework | pytorch |
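
A sketch of the multi-task pointwise idea: a shared per-point network with a semantic head (class logits) and an instance head (embeddings pulled toward their instance centroid). Layer sizes and the simple pull loss are assumptions for illustration, not the paper's exact MT-PNet configuration or full discriminative loss:

```python
import torch
import torch.nn as nn

class MultiTaskPointNet(nn.Module):
    def __init__(self, in_dim=6, num_classes=13, embed_dim=32):
        super().__init__()
        self.backbone = nn.Sequential(          # shared per-point MLP
            nn.Linear(in_dim, 64), nn.ReLU(),
            nn.Linear(64, 128), nn.ReLU(),
        )
        self.semantic_head = nn.Linear(128, num_classes)
        self.instance_head = nn.Linear(128, embed_dim)

    def forward(self, points):                  # points: (N, in_dim)
        feats = self.backbone(points)
        return self.semantic_head(feats), self.instance_head(feats)

def pull_loss(embeddings, instance_ids):
    """Pull each point's embedding toward its instance centroid."""
    loss = 0.0
    for inst in instance_ids.unique():
        members = embeddings[instance_ids == inst]
        loss = loss + ((members - members.mean(0)) ** 2).sum(1).mean()
    return loss / instance_ids.unique().numel()

points = torch.randn(1024, 6)                   # xyz + rgb per point
sem_logits, embeds = MultiTaskPointNet()(points)
sem_labels = torch.randint(0, 13, (1024,))
inst_ids = torch.randint(0, 20, (1024,))
loss = nn.functional.cross_entropy(sem_logits, sem_labels) + pull_loss(embeds, inst_ids)
```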
Neural Volumes: Learning Dynamic Renderable Volumes from Images
Title | Neural Volumes: Learning Dynamic Renderable Volumes from Images |
Authors | Stephen Lombardi, Tomas Simon, Jason Saragih, Gabriel Schwartz, Andreas Lehrmann, Yaser Sheikh |
Abstract | Modeling and rendering of dynamic scenes is challenging, as natural scenes often contain complex phenomena such as thin structures, evolving topology, translucency, scattering, occlusion, and biological motion. Mesh-based reconstruction and tracking often fail in these cases, and other approaches (e.g., light field video) typically rely on constrained viewing conditions, which limit interactivity. We circumvent these difficulties by presenting a learning-based approach to representing dynamic objects inspired by the integral projection model used in tomographic imaging. The approach is supervised directly from 2D images in a multi-view capture setting and does not require explicit reconstruction or tracking of the object. Our method has two primary components: an encoder-decoder network that transforms input images into a 3D volume representation, and a differentiable ray-marching operation that enables end-to-end training. By virtue of its 3D representation, our construction extrapolates better to novel viewpoints compared to screen-space rendering techniques. The encoder-decoder architecture learns a latent representation of a dynamic scene that enables us to produce novel content sequences not seen during training. To overcome memory limitations of voxel-based representations, we learn a dynamic irregular grid structure implemented with a warp field during ray-marching. This structure greatly improves the apparent resolution and reduces grid-like artifacts and jagged motion. Finally, we demonstrate how to incorporate surface-based representations into our volumetric-learning framework for applications where the highest resolution is required, using facial performance capture as a case in point. |
Tasks | |
Published | 2019-06-18 |
URL | https://arxiv.org/abs/1906.07751v1 |
PDF | https://arxiv.org/pdf/1906.07751v1.pdf |
PWC | https://paperswithcode.com/paper/neural-volumes-learning-dynamic-renderable |
Repo | https://github.com/facebookresearch/neuralvolumes |
Framework | pytorch |
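
A minimal sketch of the differentiable ray-marching component: trilinearly sample an RGBA volume at points along each ray and alpha-composite front to back. The volume here is random; in the paper it is produced by the encoder-decoder network, and the warp field is omitted:

```python
import torch
import torch.nn.functional as F

volume = torch.rand(1, 4, 32, 32, 32, requires_grad=True)  # RGBA voxel grid

def render(rays_o, rays_d, n_steps=64, step=2.0 / 64):
    t = torch.arange(n_steps).float() * step
    pts = rays_o[:, None, :] + rays_d[:, None, :] * t[None, :, None]
    # grid_sample expects coords in [-1, 1]; out-of-range points hit zero padding.
    grid = pts.view(1, -1, 1, 1, 3)
    samp = F.grid_sample(volume, grid, align_corners=True)
    samp = samp.view(4, rays_o.shape[0], n_steps)
    rgb, alpha = samp[:3], samp[3].clamp(0, 1)
    # Front-to-back compositing: transmittance decays with accumulated alpha.
    trans = torch.cumprod(torch.cat([torch.ones_like(alpha[:, :1]),
                                     1 - alpha[:, :-1]], dim=1), dim=1)
    weights = alpha * trans                       # (n_rays, n_steps)
    return (weights[None] * rgb).sum(-1).T        # (n_rays, 3)

rays_o = torch.zeros(8, 3)                        # rays from the origin
rays_d = F.normalize(torch.randn(8, 3), dim=-1)
colors = render(rays_o, rays_d)                   # differentiable w.r.t. volume
colors.sum().backward()
```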
COEGAN: Evaluating the Coevolution Effect in Generative Adversarial Networks
Title | COEGAN: Evaluating the Coevolution Effect in Generative Adversarial Networks |
Authors | Victor Costa, Nuno Lourenço, João Correia, Penousal Machado |
Abstract | Generative adversarial networks (GANs) present state-of-the-art results in the generation of samples following the distribution of the input dataset. However, GANs are difficult to train, and several aspects of the model should be previously designed by hand. Neuroevolution is a well-known technique used to provide the automatic design of network architectures which was recently expanded to deep neural networks. COEGAN is a model that uses neuroevolution and coevolution in the GAN training algorithm to provide a more stable training method and the automatic design of neural network architectures. COEGAN makes use of the adversarial aspect of the GAN components to implement coevolutionary strategies in the training algorithm. Our proposal was evaluated on the Fashion-MNIST and MNIST datasets. We compare our results with a baseline based on DCGAN and also with results from a random search algorithm. We show that our method is able to discover efficient architectures in the Fashion-MNIST and MNIST datasets. The results also suggest that COEGAN can be used as a training algorithm for GANs to avoid common issues, such as the mode collapse problem. |
Tasks | |
Published | 2019-12-12 |
URL | https://arxiv.org/abs/1912.06180v1 |
PDF | https://arxiv.org/pdf/1912.06180v1.pdf |
PWC | https://paperswithcode.com/paper/coegan-evaluating-the-coevolution-effect-in |
Repo | https://github.com/vfcosta/coegan |
Framework | pytorch |
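
A toy sketch of the neuroevolution side: a genome is a list of layer specs, mutation adds a layer or tweaks an activation, and selection keeps the fittest. Fitness here is a random placeholder; in COEGAN it is derived from the GAN loss against the coevolving adversary population:

```python
import copy, random

ACTS = ["relu", "tanh", "elu"]

def random_layer():
    return {"units": random.choice([64, 128, 256]),
            "activation": random.choice(ACTS)}

def mutate(genome):
    child = copy.deepcopy(genome)
    if random.random() < 0.5:                    # add a layer...
        child.insert(random.randrange(len(child) + 1), random_layer())
    else:                                        # ...or change an activation
        random.choice(child)["activation"] = random.choice(ACTS)
    return child

population = [[random_layer()] for _ in range(8)]   # e.g. generator genomes
for generation in range(5):
    scored = [(random.random(), g) for g in population]   # placeholder fitness
    scored.sort(key=lambda s: s[0], reverse=True)
    survivors = [g for _, g in scored[:4]]
    population = survivors + [mutate(random.choice(survivors)) for _ in range(4)]
print(population[0])
```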
Semantic Classification of Tabular Datasets via Character-Level Convolutional Neural Networks
Title | Semantic Classification of Tabular Datasets via Character-Level Convolutional Neural Networks |
Authors | Paul Azunre, Craig Corcoran, Numa Dhamani, Jeffrey Gleason, Garrett Honke, David Sullivan, Rebecca Ruppel, Sandeep Verma, Jonathon Morgan |
Abstract | A character-level convolutional neural network (CNN) motivated by applications in “automated machine learning” (AutoML) is proposed to semantically classify columns in tabular data. Simulated data containing a set of base classes is first used to learn an initial set of weights. Hand-labeled data from the CKAN repository is then used in a transfer-learning paradigm to adapt the initial weights to a more sophisticated representation of the problem (e.g., including more classes). In doing so, realistic data imperfections are learned and the set of classes handled can be expanded from the base set with reduced labeled data and computing power requirements. Results show the effectiveness and flexibility of this approach in three diverse domains: semantic classification of tabular data, age prediction from social media posts, and email spam classification. In addition to providing further evidence of the effectiveness of transfer learning in natural language processing (NLP), our experiments suggest that analyzing the semantic structure of language at the character level without additional metadata—i.e., network structure, headers, etc.—can produce competitive accuracy for type classification, spam classification, and social media age prediction. We present our open-source toolkit SIMON, an acronym for Semantic Inference for the Modeling of ONtologies, which implements this approach in a user-friendly and scalable/parallelizable fashion. |
Tasks | AutoML, Transfer Learning |
Published | 2019-01-24 |
URL | http://arxiv.org/abs/1901.08456v1 |
PDF | http://arxiv.org/pdf/1901.08456v1.pdf |
PWC | https://paperswithcode.com/paper/semantic-classification-of-tabular-datasets |
Repo | https://github.com/algorine/nokore |
Framework | tf |
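
A minimal sketch of a character-level CNN over cell contents (written in PyTorch for brevity; the released SIMON toolkit is TensorFlow/Keras). Vocabulary, sizes, and the class set are illustrative assumptions:

```python
import torch
import torch.nn as nn

CHARS = "abcdefghijklmnopqrstuvwxyz0123456789@.,-:/ "
MAX_LEN, N_CLASSES = 64, 4                       # e.g. int / float / date / text

def encode(cell):
    # Unknown characters map to index 0, shared with padding.
    ids = [CHARS.find(c) + 1 for c in cell.lower()[:MAX_LEN]]
    return torch.tensor(ids + [0] * (MAX_LEN - len(ids)))

class CharCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(len(CHARS) + 1, 16, padding_idx=0)
        self.conv = nn.Conv1d(16, 64, kernel_size=3, padding=1)
        self.fc = nn.Linear(64, N_CLASSES)

    def forward(self, x):                        # x: (batch, MAX_LEN)
        h = self.embed(x).transpose(1, 2)        # (batch, 16, MAX_LEN)
        h = torch.relu(self.conv(h)).max(dim=2).values   # global max pool
        return self.fc(h)

batch = torch.stack([encode("1999-04-23"), encode("42.7")])
logits = CharCNN()(batch)                        # (2, N_CLASSES)
```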
srlearn: A Python Library for Gradient-Boosted Statistical Relational Models
Title | srlearn: A Python Library for Gradient-Boosted Statistical Relational Models |
Authors | Alexander L. Hayes |
Abstract | We present srlearn, a Python library for boosted statistical relational models. We adapt the scikit-learn interface to this setting and provide examples for how this can be used to express learning and inference problems. |
Tasks | |
Published | 2019-12-17 |
URL | https://arxiv.org/abs/1912.08198v1 |
PDF | https://arxiv.org/pdf/1912.08198v1.pdf |
PWC | https://paperswithcode.com/paper/srlearn-a-python-library-for-gradient-boosted |
Repo | https://github.com/hayesall/srlearn-StarAI-2020-workshop |
Framework | none |
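
For context on what "adapting the scikit-learn interface" means, here is a generic mock of the estimator contract (fit/predict, get/set_params via the constructor, learned attributes with a trailing underscore). This is an illustration of the convention, not srlearn's actual API:

```python
from sklearn.base import BaseEstimator, ClassifierMixin
import numpy as np

class MockBoostedRelationalModel(BaseEstimator, ClassifierMixin):
    def __init__(self, n_estimators=10):
        self.n_estimators = n_estimators     # exposed via get/set_params

    def fit(self, X, y):
        self.classes_ = np.unique(y)         # learned state gets a trailing _
        self.prior_ = y.mean()
        return self                          # fit returns self by convention

    def predict(self, X):
        return np.full(len(X), self.classes_[int(self.prior_ >= 0.5)])

model = MockBoostedRelationalModel(n_estimators=20)
model.fit(np.zeros((4, 2)), np.array([0, 1, 1, 1]))
print(model.predict(np.zeros((2, 2))))
```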
Hierarchical Reinforcement Learning with Advantage-Based Auxiliary Rewards
Title | Hierarchical Reinforcement Learning with Advantage-Based Auxiliary Rewards |
Authors | Siyuan Li, Rui Wang, Minxue Tang, Chongjie Zhang |
Abstract | Hierarchical Reinforcement Learning (HRL) is a promising approach to solving long-horizon problems with sparse and delayed rewards. Many existing HRL algorithms either use pre-trained low-level skills that are unadaptable, or require domain-specific information to define low-level rewards. In this paper, we aim to adapt low-level skills to downstream tasks while maintaining the generality of reward design. We propose an HRL framework which sets auxiliary rewards for low-level skill training based on the advantage function of the high-level policy. This auxiliary reward enables efficient, simultaneous learning of the high-level policy and low-level skills without using task-specific knowledge. In addition, we also theoretically prove that optimizing low-level skills with this auxiliary reward will increase the task return for the joint policy. Experimental results show that our algorithm dramatically outperforms other state-of-the-art HRL methods in MuJoCo domains. We also find that both the low-level and high-level policies trained by our algorithm are transferable. |
Tasks | Hierarchical Reinforcement Learning |
Published | 2019-10-10 |
URL | https://arxiv.org/abs/1910.04450v1 |
PDF | https://arxiv.org/pdf/1910.04450v1.pdf |
PWC | https://paperswithcode.com/paper/hierarchical-reinforcement-learning-with-3 |
Repo | https://github.com/ArayCHN/HAAR-A-Hierarchical-RL-Algorithm |
Framework | none |
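
A sketch of the core idea: the low-level skill's auxiliary reward is tied to the high-level advantage, A(s, o) = r + γV(s′) − V(s). The value function and the scaling factor below are placeholders; the paper learns the critic jointly with the high-level policy:

```python
# Advantage-based auxiliary reward sketch (stand-in value function,
# assumed scaling eta; not the paper's exact formulation).
gamma = 0.99

def value(state):                 # placeholder for the learned high-level critic
    return sum(state)

def high_level_advantage(state, next_state, reward):
    return reward + gamma * value(next_state) - value(state)

def low_level_auxiliary_reward(state, next_state, reward, eta=0.1):
    # Skills are rewarded for transitions the high-level policy values,
    # with no task-specific shaping.
    return eta * high_level_advantage(state, next_state, reward)

print(low_level_auxiliary_reward([0.0, 1.0], [0.5, 1.2], reward=0.0))
```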
Learning Problem-agnostic Speech Representations from Multiple Self-supervised Tasks
Title | Learning Problem-agnostic Speech Representations from Multiple Self-supervised Tasks |
Authors | Santiago Pascual, Mirco Ravanelli, Joan Serrà, Antonio Bonafonte, Yoshua Bengio |
Abstract | Learning good representations without supervision is still an open issue in machine learning, and is particularly challenging for speech signals, which are often characterized by long sequences with a complex hierarchical structure. Some recent works, however, have shown that it is possible to derive useful speech representations by employing a self-supervised encoder-discriminator approach. This paper proposes an improved self-supervised method, where a single neural encoder is followed by multiple workers that jointly solve different self-supervised tasks. The needed consensus across different tasks naturally imposes meaningful constraints on the encoder, helping it discover general representations and minimizing the risk of learning superficial ones. Experiments show that the proposed approach can learn transferable, robust, and problem-agnostic features that carry relevant information from the speech signal, such as speaker identity, phonemes, and even higher-level features such as emotional cues. In addition, a number of design choices make the encoder easily exportable, facilitating its direct usage or adaptation to different problems. |
Tasks | Distant Speech Recognition |
Published | 2019-04-06 |
URL | http://arxiv.org/abs/1904.03416v1 |
PDF | http://arxiv.org/pdf/1904.03416v1.pdf |
PWC | https://paperswithcode.com/paper/learning-problem-agnostic-speech |
Repo | https://github.com/santi-pdp/pase |
Framework | pytorch |
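
A sketch of the encoder-plus-workers layout: one shared encoder feeds several self-supervised heads, and the total loss is the sum of per-worker losses, so the encoder receives one consensus gradient. The worker tasks, targets, and sizes below are simplified stand-ins for those used in PASE:

```python
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Conv1d(1, 64, 16, stride=8), nn.ReLU(),
                        nn.Conv1d(64, 100, 3, padding=1))

workers = nn.ModuleDict({
    "waveform": nn.Conv1d(100, 1, 1),    # regress waveform-like targets
    "mfcc":     nn.Conv1d(100, 20, 1),   # regress MFCC-like targets
})

wave = torch.randn(4, 1, 8000)           # half a second of 16 kHz audio
feats = encoder(wave)                    # shared representation
targets = {"waveform": torch.randn(4, 1, feats.shape[-1]),
           "mfcc": torch.randn(4, 20, feats.shape[-1])}

loss = sum(nn.functional.mse_loss(head(feats), targets[name])
           for name, head in workers.items())
loss.backward()                          # one consensus gradient for the encoder
```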
Covariance-free Partial Least Squares: An Incremental Dimensionality Reduction Method
Title | Covariance-free Partial Least Squares: An Incremental Dimensionality Reduction Method |
Authors | Artur Jordao, Maiko Lie, Victor Hugo Cunha de Melo, William Robson Schwartz |
Abstract | Dimensionality reduction plays an important role in computer vision problems since it reduces computational cost and is often capable of yielding more discriminative data representation. In this context, Partial Least Squares (PLS) has presented notable results in tasks such as image classification and neural network optimization. However, PLS is infeasible on large datasets (e.g., ImageNet) because it requires all the data to be in memory in advance, which is often impractical due to hardware limitations. Additionally, this requirement prevents us from employing PLS on streaming applications where the data are being continuously generated. Motivated by this, we propose a novel incremental PLS, named Covariance-free Incremental Partial Least Squares (CIPLS), which learns a low-dimensional representation of the data using a single sample at a time. In contrast to other state-of-the-art approaches, instead of adopting a partially-discriminative or SGD-based model, we extend Nonlinear Iterative Partial Least Squares (NIPALS) - the standard algorithm used to compute PLS - for incremental processing. Among the advantages of this approach are the preservation of discriminative information across all components, the possibility of employing its score matrices for feature selection, and its computational efficiency. We validate CIPLS on face verification and image classification tasks, where it outperforms several other incremental dimensionality reduction methods. In the context of feature selection, CIPLS achieves comparable results when compared to state-of-the-art techniques. |
Tasks | Dimensionality Reduction, Face Verification, Feature Selection, Image Classification |
Published | 2019-10-05 |
URL | https://arxiv.org/abs/1910.02319v1 |
PDF | https://arxiv.org/pdf/1910.02319v1.pdf |
PWC | https://paperswithcode.com/paper/covariance-free-partial-least-squares-an |
Repo | https://github.com/arturjordao/IncrementalDimensionalityReduction |
Framework | none |
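
A sketch of the covariance-free intuition for the first PLS direction: in PLS1 the first weight vector is w ∝ Xᵀy, which can be accumulated one sample at a time in O(d) memory, so the full data matrix never has to be held. Centering, deflation, and further components are omitted, so this is not the full CIPLS algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 10
w = np.zeros(d)                       # running X^T y accumulator

for _ in range(10_000):               # stream of (x, y) pairs, one at a time
    x = rng.normal(size=d)
    y = x[0] + 0.1 * rng.normal()     # y correlates with feature 0
    w += x * y                        # single-sample update, O(d) memory

w /= np.linalg.norm(w)
print(np.round(w, 2))                 # weight concentrates on feature 0

x_new = rng.normal(size=d)
score = x_new @ w                     # 1-D projection of a new sample
```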
FaceQnet: Quality Assessment for Face Recognition based on Deep Learning
Title | FaceQnet: Quality Assessment for Face Recognition based on Deep Learning |
Authors | Javier Hernandez-Ortega, Javier Galbally, Julian Fierrez, Rudolf Haraksim, Laurent Beslay |
Abstract | In this paper we develop a Quality Assessment approach for face recognition based on deep learning. The method consists of a Convolutional Neural Network, FaceQnet, that is used to predict the suitability of a specific input image for face recognition purposes. The training of FaceQnet is done using the VGGFace2 database. We employ the BioLab-ICAO framework for labeling the VGGFace2 images with quality information related to their ICAO compliance level. The groundtruth quality labels are obtained using FaceNet to generate comparison scores. We employ the groundtruth data to fine-tune a ResNet-based CNN, making it capable of returning a numerical quality measure for each input image. Finally, we verify whether the FaceQnet scores are suitable for predicting the expected performance when employing a specific image for face recognition with a COTS face recognition system. Several conclusions can be drawn from this work, most notably: 1) we managed to employ an existing ICAO compliance framework and a pretrained CNN to automatically label data with quality information, 2) we trained FaceQnet for quality estimation by fine-tuning a pre-trained face recognition network (ResNet-50), and 3) we have shown that the predictions from FaceQnet are highly correlated with the face recognition accuracy of a state-of-the-art commercial system not used during development. FaceQnet is publicly available on GitHub. |
Tasks | Face Recognition |
Published | 2019-04-03 |
URL | http://arxiv.org/abs/1904.01740v2 |
PDF | http://arxiv.org/pdf/1904.01740v2.pdf |
PWC | https://paperswithcode.com/paper/faceqnet-quality-assessment-for-face |
Repo | https://github.com/uam-biometrics/FaceQnet |
Framework | tf |
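
A sketch of the fine-tuning recipe: take a pretrained ResNet-50, swap the classifier for a single-output regression head, and train against groundtruth quality scores. Note the assumptions: torchvision's weights are ImageNet rather than the VGGFace2 recognition weights used in the paper, and the data here is random:

```python
import torch
import torch.nn as nn
from torchvision import models

# Pretrained backbone (ImageNet weights; torchvision >= 0.13 API).
net = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
net.fc = nn.Linear(net.fc.in_features, 1)          # quality-score head

images = torch.randn(8, 3, 224, 224)               # stand-in face crops
quality = torch.rand(8, 1)                         # stand-in groundtruth labels

optimizer = torch.optim.Adam(net.parameters(), lr=1e-4)
loss = nn.functional.mse_loss(net(images), quality)
loss.backward()
optimizer.step()
```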
Understanding and Visualizing Deep Visual Saliency Models
Title | Understanding and Visualizing Deep Visual Saliency Models |
Authors | Sen He, Hamed R. Tavakoli, Ali Borji, Yang Mi, Nicolas Pugeault |
Abstract | Recently, data-driven deep saliency models have achieved high performance and have outperformed classical saliency models, as demonstrated by results on datasets such as the MIT300 and SALICON. Yet, there remains a large gap between the performance of these models and the inter-human baseline. Some outstanding questions include what have these models learned, how and where they fail, and how they can be improved. This article attempts to answer these questions by analyzing the representations learned by individual neurons located at the intermediate layers of deep saliency models. To this end, we follow the steps of existing deep saliency models, that is, borrowing a pre-trained object recognition model to encode the visual features and learning a decoder to infer the saliency. We consider two cases: when the encoder is used as a fixed feature extractor and when it is fine-tuned, and compare the inner representations of the network. To study how the learned representations depend on the task, we fine-tune the same network using the same image set but for two different tasks: saliency prediction versus scene classification. Our analyses reveal that: 1) some visual regions (e.g. head, text, symbol, vehicle) are already encoded within various layers of the network pre-trained for object recognition, 2) using modern datasets, we find that fine-tuning pre-trained models for saliency prediction makes them favor some categories (e.g. head) over some others (e.g. text), 3) although deep models of saliency outperform classical models on natural images, the converse is true for synthetic stimuli (e.g. pop-out search arrays), evidence of a significant difference between human and data-driven saliency models, and 4) we confirm that, after fine-tuning, the change in inner representations is mostly due to the task and not the domain shift in the data. |
Tasks | Object Recognition, Saliency Prediction, Scene Classification |
Published | 2019-03-06 |
URL | http://arxiv.org/abs/1903.02501v3 |
PDF | http://arxiv.org/pdf/1903.02501v3.pdf |
PWC | https://paperswithcode.com/paper/understanding-and-visualizing-deep-visual |
Repo | https://github.com/SenHe/uavdvsm |
Framework | pytorch |
HiLLoC: Lossless Image Compression with Hierarchical Latent Variable Models
Title | HiLLoC: Lossless Image Compression with Hierarchical Latent Variable Models |
Authors | James Townsend, Thomas Bird, Julius Kunze, David Barber |
Abstract | We make the following striking observation: fully convolutional VAE models trained on 32x32 ImageNet can generalize well, not just to 64x64 but also to far larger photographs, with no changes to the model. We use this property, applying fully convolutional models to lossless compression, demonstrating a method to scale the VAE-based ‘Bits-Back with ANS’ algorithm for lossless compression to large color photographs, and achieving state of the art for compression of full size ImageNet images. We release Craystack, an open source library for convenient prototyping of lossless compression using probabilistic models, along with full implementations of all of our compression results. |
Tasks | Image Compression, Latent Variable Models |
Published | 2019-12-20 |
URL | https://arxiv.org/abs/1912.09953v1 |
PDF | https://arxiv.org/pdf/1912.09953v1.pdf |
PWC | https://paperswithcode.com/paper/hilloc-lossless-image-compression-with-1 |
Repo | https://github.com/hilloc-submission/hilloc |
Framework | tf |
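
As background for the 'Bits-Back with ANS' scheme the paper scales up, the net codelength per image follows the standard bits-back identity (restated here, not taken from the paper's text): encoding z from q(z|x) and x given z costs bits, but the bits used to sample z are recovered from the ANS stack, so on average the rate matches the negative ELBO.

```latex
% Net bits-back codelength for an image x with latent z drawn from q(z|x):
% pay -log2 p(z) and -log2 p(x|z) to encode; get log2 q(z|x) bits back.
L(x) \;=\; -\log_2 p(x \mid z) \;-\; \log_2 p(z) \;+\; \log_2 q(z \mid x),
\qquad
\mathbb{E}_{q(z \mid x)}\!\left[ L(x) \right] \;=\; -\,\mathrm{ELBO}(x)\ \text{(in bits)}
```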
Extending Stein’s unbiased risk estimator to train deep denoisers with correlated pairs of noisy images
Title | Extending Stein’s unbiased risk estimator to train deep denoisers with correlated pairs of noisy images |
Authors | Magauiya Zhussip, Shakarim Soltanayev, Se Young Chun |
Abstract | Recently, Stein’s unbiased risk estimator (SURE) has been applied to unsupervised training of deep neural network Gaussian denoisers that outperformed classical non-deep learning based denoisers and yielded comparable performance to those trained with ground truth. While SURE requires only one noise realization per image for training, it does not take advantage of having multiple noise realizations per image when they are available (e.g., two uncorrelated noise realizations per image for Noise2Noise). Here, we propose an extended SURE (eSURE) to train deep denoisers with correlated pairs of noise realizations per image, and apply it to the case of two uncorrelated realizations per image to achieve better performance than the SURE-based method and comparable results to Noise2Noise. Then, we further investigated the case of imperfect ground truth (i.e., mild noise in the ground truth), which may arise given the painstaking, time-consuming, and even expensive process of collecting ground truth images along with multiple noisy images. For the case of generating noisy training data by adding synthetic noise to imperfect ground truth to yield correlated pairs of images, our proposed eSURE-based training method outperformed the conventional SURE-based method as well as Noise2Noise. |
Tasks | Denoising, Image Restoration |
Published | 2019-02-07 |
URL | https://arxiv.org/abs/1902.02452v2 |
PDF | https://arxiv.org/pdf/1902.02452v2.pdf |
PWC | https://paperswithcode.com/paper/theoretical-analysis-on-noise2noise-using |
Repo | https://github.com/Magauiya/Extended_SURE |
Framework | tf |
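
For orientation, a sketch of the standard Monte Carlo SURE loss for a Gaussian denoiser f, where the divergence term is estimated with a random probe so no clean target is needed. This is the baseline SURE that the paper extends; the eSURE extension to correlated noisy pairs is not reproduced here:

```python
import torch

def mc_sure_loss(f, y, sigma, eps=1e-3):
    # SURE = (1/n)||f(y) - y||^2 - sigma^2 + (2 sigma^2 / n) div f(y),
    # with the divergence estimated by a Monte Carlo probe.
    n = y.numel()
    fy = f(y)
    b = torch.randint_like(y, 0, 2) * 2 - 1                   # +/-1 probe
    div = (b * (f(y + eps * b) - fy)).sum() / eps             # ~ div f(y)
    return ((fy - y) ** 2).sum() / n - sigma ** 2 + (2 * sigma ** 2 / n) * div

denoiser = torch.nn.Conv2d(1, 1, 3, padding=1)                # toy denoiser
noisy = torch.rand(1, 1, 32, 32) + 0.1 * torch.randn(1, 1, 32, 32)
loss = mc_sure_loss(denoiser, noisy, sigma=0.1)
loss.backward()
```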