Paper Group AWR 244
Contents:
- Automatically Identifying Complaints in Social Media
- A Simple Joint Model for Improved Contextual Neural Lemmatization
- Learning To Follow Directions in Street View
- JSIS3D: Joint Semantic-Instance Segmentation of 3D Point Clouds with Multi-Task Pointwise Networks and Multi-Value Conditional Random Fields
- Neural Volumes: Learning Dynamic Renderable Volumes from Images
- COEGAN: Evaluating the Coevolution Effect in Generative Adversarial Networks
- Semantic Classification of Tabular Datasets via Character-Level Convolutional Neural Networks
- srlearn: A Python Library for Gradient-Boosted Statistical Relational Models
- Hierarchical Reinforcement Learning with Advantage-Based Auxiliary Rewards
- Learning Problem-agnostic Speech Representations from Multiple Self-supervised Tasks
- Covariance-free Partial Least Squares: An Incremental Dimensionality Reduction Method
- FaceQnet: Quality Assessment for Face Recognition based on Deep Learning
- Understanding and Visualizing Deep Visual Saliency Models
- HiLLoC: Lossless Image Compression with Hierarchical Latent Variable Models
- Extending Stein's unbiased risk estimator to train deep denoisers with correlated pairs of noisy images
Automatically Identifying Complaints in Social Media
Title | Automatically Identifying Complaints in Social Media |
Authors | Daniel Preotiuc-Pietro, Mihaela Gaman, Nikolaos Aletras |
Abstract | Complaining is a basic speech act regularly used in human and computer mediated communication to express a negative mismatch between reality and expectations in a particular situation. Automatically identifying complaints in social media is of utmost importance for organizations or brands seeking to improve the customer experience, and for developing dialogue systems that handle and respond to complaints. In this paper, we introduce the first systematic analysis of complaints in computational linguistics. We collect a new annotated data set of written complaints expressed in English on Twitter.\footnote{Data and code are available here: \url{https://github.com/danielpreotiuc/complaints-social-media}} We present an extensive linguistic analysis of complaining as a speech act in social media and train strong feature-based and neural models of complaints across nine domains, achieving a predictive performance of up to 79 F1 using distant supervision. |
Tasks | |
Published | 2019-06-10 |
URL | https://arxiv.org/abs/1906.03890v1 |
PDF | https://arxiv.org/pdf/1906.03890v1.pdf |
PWC | https://paperswithcode.com/paper/automatically-identifying-complaints-in |
Repo | https://github.com/danielpreotiuc/complaints-social-media |
Framework | none |
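
As a minimal, hypothetical sketch of the kind of feature-based baseline the paper trains (bag-of-n-grams plus logistic regression over invented example tweets; not the authors' exact feature set, lexicons, or distant-supervision setup):

```python
# Minimal feature-based complaint classifier sketch (toy data, assumed
# hyperparameters; not the paper's exact model).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

tweets = [
    "@airline my flight was delayed 5 hours and nobody helped",  # complaint
    "@airline thanks for the smooth check-in today!",            # not a complaint
    "my package arrived broken, third time this month",          # complaint
    "loving the new update, great work",                         # not a complaint
]
labels = [1, 0, 1, 0]

# Word 1-2 grams roughly mirror the n-gram features used in such baselines.
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), lowercase=True),
    LogisticRegression(max_iter=1000),
)
model.fit(tweets, labels)
print(model.predict(["the product stopped working after one day"]))
```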
A Simple Joint Model for Improved Contextual Neural Lemmatization
Title | A Simple Joint Model for Improved Contextual Neural Lemmatization |
Authors | Chaitanya Malaviya, Shijie Wu, Ryan Cotterell |
Abstract | English verbs have multiple forms. For instance, talk may also appear as talks, talked or talking, depending on the context. The NLP task of lemmatization seeks to map these diverse forms back to a canonical one, known as the lemma. We present a simple joint neural model for lemmatization and morphological tagging that achieves state-of-the-art results on 20 languages from the Universal Dependencies corpora. Our paper describes the model in addition to training and decoding procedures. Error analysis indicates that joint morphological tagging and lemmatization is especially helpful in low-resource lemmatization and languages that display a larger degree of morphological complexity. Code and pre-trained models are available at https://sigmorphon.github.io/sharedtasks/2019/task2/. |
Tasks | Lemmatization, Morphological Tagging |
Published | 2019-04-04 |
URL | https://arxiv.org/abs/1904.02306v3 |
PDF | https://arxiv.org/pdf/1904.02306v3.pdf |
PWC | https://paperswithcode.com/paper/a-simple-joint-model-for-improved-contextual |
Repo | https://github.com/shijie-wu/neural-transducer |
Framework | pytorch |
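
A toy illustration of the joint factorization the model exploits, p(lemma, tag | word, context) = p(tag | context) × p(lemma | word, tag). The probabilities below are made up; the paper uses an LSTM tagger and a neural transducer rather than lookup tables:

```python
# Toy joint scoring: the contextual tagger and the tag-conditioned
# lemmatizer are stand-in lookup tables, not the paper's neural model.
p_tag = {("talks", "V"): 0.7, ("talks", "N"): 0.3}    # p(tag | context)
p_lemma = {("talks", "V", "talk"): 0.95,              # p(lemma | word, tag)
           ("talks", "N", "talk"): 0.60}

def joint_score(word, tag, lemma):
    return p_tag[(word, tag)] * p_lemma[(word, tag, lemma)]

best = max([("V", "talk"), ("N", "talk")],
           key=lambda tl: joint_score("talks", tl[0], tl[1]))
print(best)  # ('V', 'talk'): the verb reading wins under the joint score
```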
Learning To Follow Directions in Street View
Title | Learning To Follow Directions in Street View |
Authors | Karl Moritz Hermann, Mateusz Malinowski, Piotr Mirowski, Andras Banki-Horvath, Keith Anderson, Raia Hadsell |
Abstract | Navigating and understanding the real world remains a key challenge in machine learning and inspires a great variety of research in areas such as language grounding, planning, navigation and computer vision. We propose an instruction-following task that requires all of the above, and which combines the practicality of simulated environments with the challenges of ambiguous, noisy real world data. StreetNav is built on top of Google Street View and provides visually accurate environments representing real places. Agents are given driving instructions which they must learn to interpret in order to successfully navigate in this environment. Since humans equipped with driving instructions can readily navigate in previously unseen cities, we set a high bar and test our trained agents for similar cognitive capabilities. Although deep reinforcement learning (RL) methods are frequently evaluated only on data that closely follow the training distribution, our dataset extends to multiple cities and has a clean train/test separation. This allows for thorough testing of generalisation ability. This paper presents the StreetNav environment and tasks, models that establish strong baselines, and extensive analysis of the task and the trained agents. |
Tasks | |
Published | 2019-03-01 |
URL | https://arxiv.org/abs/1903.00401v2 |
PDF | https://arxiv.org/pdf/1903.00401v2.pdf |
PWC | https://paperswithcode.com/paper/learning-to-follow-directions-in-street-view |
Repo | https://github.com/deepmind/streetlearn |
Framework | tf |
JSIS3D: Joint Semantic-Instance Segmentation of 3D Point Clouds with Multi-Task Pointwise Networks and Multi-Value Conditional Random Fields
Title | JSIS3D: Joint Semantic-Instance Segmentation of 3D Point Clouds with Multi-Task Pointwise Networks and Multi-Value Conditional Random Fields |
Authors | Quang-Hieu Pham, Duc Thanh Nguyen, Binh-Son Hua, Gemma Roig, Sai-Kit Yeung |
Abstract | Deep learning techniques have become the go-to models for most vision-related tasks on 2D images. However, their power has not been fully realised on several tasks in 3D space, e.g., 3D scene understanding. In this work, we jointly address the problems of semantic and instance segmentation of 3D point clouds. Specifically, we develop a multi-task pointwise network that simultaneously performs two tasks: predicting the semantic classes of 3D points and embedding the points into high-dimensional vectors so that points of the same object instance are represented by similar embeddings. We then propose a multi-value conditional random field model to incorporate the semantic and instance labels and formulate the problem of semantic and instance segmentation as jointly optimising labels in the field model. The proposed method is thoroughly evaluated and compared with existing methods on different indoor scene datasets including S3DIS and SceneNN. Experimental results showed the robustness of the proposed joint semantic-instance segmentation scheme over its single components. Our method also achieved state-of-the-art performance on semantic segmentation. |
Tasks | 3D Instance Segmentation, 3D Semantic Instance Segmentation, 3D Semantic Segmentation, Scene Understanding |
Published | 2019-04-01 |
URL | http://arxiv.org/abs/1904.00699v2 |
PDF | http://arxiv.org/pdf/1904.00699v2.pdf |
PWC | https://paperswithcode.com/paper/jsis3d-joint-semantic-instance-segmentation |
Repo | https://github.com/pqhieu/JSIS3D |
Framework | pytorch |
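
A sketch of the multi-task pointwise idea: a shared per-point network with a semantic head (class logits) and an instance head (embeddings pulled toward their instance centroid). Layer sizes and the simple pull loss are assumptions for illustration, not the paper's exact MT-PNet configuration or full discriminative loss:

```python
import torch
import torch.nn as nn

class MultiTaskPointNet(nn.Module):
    def __init__(self, in_dim=6, num_classes=13, embed_dim=32):
        super().__init__()
        self.backbone = nn.Sequential(          # shared per-point MLP
            nn.Linear(in_dim, 64), nn.ReLU(),
            nn.Linear(64, 128), nn.ReLU(),
        )
        self.semantic_head = nn.Linear(128, num_classes)
        self.instance_head = nn.Linear(128, embed_dim)

    def forward(self, points):                  # points: (N, in_dim)
        feats = self.backbone(points)
        return self.semantic_head(feats), self.instance_head(feats)

def pull_loss(embeddings, instance_ids):
    """Pull each point's embedding toward its instance centroid."""
    loss = 0.0
    for inst in instance_ids.unique():
        members = embeddings[instance_ids == inst]
        loss = loss + ((members - members.mean(0)) ** 2).sum(1).mean()
    return loss / instance_ids.unique().numel()

points = torch.randn(1024, 6)                   # xyz + rgb per point
sem_logits, embeds = MultiTaskPointNet()(points)
sem_labels = torch.randint(0, 13, (1024,))
inst_ids = torch.randint(0, 20, (1024,))
loss = nn.functional.cross_entropy(sem_logits, sem_labels) + pull_loss(embeds, inst_ids)
```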
Neural Volumes: Learning Dynamic Renderable Volumes from Images
Title | Neural Volumes: Learning Dynamic Renderable Volumes from Images |
Authors | Stephen Lombardi, Tomas Simon, Jason Saragih, Gabriel Schwartz, Andreas Lehrmann, Yaser Sheikh |
Abstract | Modeling and rendering of dynamic scenes is challenging, as natural scenes often contain complex phenomena such as thin structures, evolving topology, translucency, scattering, occlusion, and biological motion. Mesh-based reconstruction and tracking often fail in these cases, and other approaches (e.g., light field video) typically rely on constrained viewing conditions, which limit interactivity. We circumvent these difficulties by presenting a learning-based approach to representing dynamic objects inspired by the integral projection model used in tomographic imaging. The approach is supervised directly from 2D images in a multi-view capture setting and does not require explicit reconstruction or tracking of the object. Our method has two primary components: an encoder-decoder network that transforms input images into a 3D volume representation, and a differentiable ray-marching operation that enables end-to-end training. By virtue of its 3D representation, our construction extrapolates better to novel viewpoints compared to screen-space rendering techniques. The encoder-decoder architecture learns a latent representation of a dynamic scene that enables us to produce novel content sequences not seen during training. To overcome memory limitations of voxel-based representations, we learn a dynamic irregular grid structure implemented with a warp field during ray-marching. This structure greatly improves the apparent resolution and reduces grid-like artifacts and jagged motion. Finally, we demonstrate how to incorporate surface-based representations into our volumetric-learning framework for applications where the highest resolution is required, using facial performance capture as a case in point. |
Tasks | |
Published | 2019-06-18 |
URL | https://arxiv.org/abs/1906.07751v1 |
PDF | https://arxiv.org/pdf/1906.07751v1.pdf |
PWC | https://paperswithcode.com/paper/neural-volumes-learning-dynamic-renderable |
Repo | https://github.com/facebookresearch/neuralvolumes |
Framework | pytorch |
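
A minimal sketch of the differentiable ray-marching component: trilinearly sample an RGBA volume at points along each ray and alpha-composite front to back. The volume here is random; in the paper it is produced by the encoder-decoder network, and the warp field is omitted:

```python
import torch
import torch.nn.functional as F

volume = torch.rand(1, 4, 32, 32, 32, requires_grad=True)  # RGBA voxel grid

def render(rays_o, rays_d, n_steps=64, step=2.0 / 64):
    t = torch.arange(n_steps).float() * step
    pts = rays_o[:, None, :] + rays_d[:, None, :] * t[None, :, None]
    # grid_sample expects coords in [-1, 1]; out-of-range points hit zero padding.
    grid = pts.view(1, -1, 1, 1, 3)
    samp = F.grid_sample(volume, grid, align_corners=True)
    samp = samp.view(4, rays_o.shape[0], n_steps)
    rgb, alpha = samp[:3], samp[3].clamp(0, 1)
    # Front-to-back compositing: transmittance decays with accumulated alpha.
    trans = torch.cumprod(torch.cat([torch.ones_like(alpha[:, :1]),
                                     1 - alpha[:, :-1]], dim=1), dim=1)
    weights = alpha * trans                       # (n_rays, n_steps)
    return (weights[None] * rgb).sum(-1).T        # (n_rays, 3)

rays_o = torch.zeros(8, 3)                        # rays from the origin
rays_d = F.normalize(torch.randn(8, 3), dim=-1)
colors = render(rays_o, rays_d)                   # differentiable w.r.t. volume
colors.sum().backward()
```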
COEGAN: Evaluating the Coevolution Effect in Generative Adversarial Networks
Title | COEGAN: Evaluating the Coevolution Effect in Generative Adversarial Networks |
Authors | Victor Costa, Nuno Lourenço, João Correia, Penousal Machado |
Abstract | Generative adversarial networks (GANs) present state-of-the-art results in the generation of samples following the distribution of the input dataset. However, GANs are difficult to train, and several aspects of the model should be previously designed by hand. Neuroevolution is a well-known technique used to provide the automatic design of network architectures which was recently expanded to deep neural networks. COEGAN is a model that uses neuroevolution and coevolution in the GAN training algorithm to provide a more stable training method and the automatic design of neural network architectures. COEGAN makes use of the adversarial aspect of the GAN components to implement coevolutionary strategies in the training algorithm. Our proposal was evaluated on the Fashion-MNIST and MNIST datasets. We compare our results with a baseline based on DCGAN and also with results from a random search algorithm. We show that our method is able to discover efficient architectures in the Fashion-MNIST and MNIST datasets. The results also suggest that COEGAN can be used as a training algorithm for GANs to avoid common issues, such as the mode collapse problem. |
Tasks | |
Published | 2019-12-12 |
URL | https://arxiv.org/abs/1912.06180v1 |
PDF | https://arxiv.org/pdf/1912.06180v1.pdf |
PWC | https://paperswithcode.com/paper/coegan-evaluating-the-coevolution-effect-in |
Repo | https://github.com/vfcosta/coegan |
Framework | pytorch |
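
A toy sketch of the neuroevolution side: a genome is a list of layer specs, mutation adds a layer or tweaks an activation, and selection keeps the fittest. Fitness here is a random placeholder; in COEGAN it is derived from the GAN loss against the coevolving adversary population:

```python
import copy, random

ACTS = ["relu", "tanh", "elu"]

def random_layer():
    return {"units": random.choice([64, 128, 256]),
            "activation": random.choice(ACTS)}

def mutate(genome):
    child = copy.deepcopy(genome)
    if random.random() < 0.5:                    # add a layer...
        child.insert(random.randrange(len(child) + 1), random_layer())
    else:                                        # ...or change an activation
        random.choice(child)["activation"] = random.choice(ACTS)
    return child

population = [[random_layer()] for _ in range(8)]   # e.g. generator genomes
for generation in range(5):
    scored = [(random.random(), g) for g in population]   # placeholder fitness
    scored.sort(key=lambda s: s[0], reverse=True)
    survivors = [g for _, g in scored[:4]]
    population = survivors + [mutate(random.choice(survivors)) for _ in range(4)]
print(population[0])
```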
Semantic Classification of Tabular Datasets via Character-Level Convolutional Neural Networks
Title | Semantic Classification of Tabular Datasets via Character-Level Convolutional Neural Networks |
Authors | Paul Azunre, Craig Corcoran, Numa Dhamani, Jeffrey Gleason, Garrett Honke, David Sullivan, Rebecca Ruppel, Sandeep Verma, Jonathon Morgan |
Abstract | A character-level convolutional neural network (CNN) motivated by applications in “automated machine learning” (AutoML) is proposed to semantically classify columns in tabular data. Simulated data containing a set of base classes is first used to learn an initial set of weights. Hand-labeled data from the CKAN repository is then used in a transfer-learning paradigm to adapt the initial weights to a more sophisticated representation of the problem (e.g., including more classes). In doing so, realistic data imperfections are learned and the set of classes handled can be expanded from the base set with reduced labeled data and computing power requirements. Results show the effectiveness and flexibility of this approach in three diverse domains: semantic classification of tabular data, age prediction from social media posts, and email spam classification. In addition to providing further evidence of the effectiveness of transfer learning in natural language processing (NLP), our experiments suggest that analyzing the semantic structure of language at the character level without additional metadata—i.e., network structure, headers, etc.—can produce competitive accuracy for type classification, spam classification, and social media age prediction. We present our open-source toolkit SIMON, an acronym for Semantic Inference for the Modeling of ONtologies, which implements this approach in a user-friendly and scalable/parallelizable fashion. |
Tasks | AutoML, Transfer Learning |
Published | 2019-01-24 |
URL | http://arxiv.org/abs/1901.08456v1 |
PDF | http://arxiv.org/pdf/1901.08456v1.pdf |
PWC | https://paperswithcode.com/paper/semantic-classification-of-tabular-datasets |
Repo | https://github.com/algorine/nokore |
Framework | tf |
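
A minimal sketch of a character-level CNN over cell contents (written in PyTorch for brevity; the released SIMON toolkit is TensorFlow/Keras). Vocabulary, sizes, and the class set are illustrative assumptions:

```python
import torch
import torch.nn as nn

CHARS = "abcdefghijklmnopqrstuvwxyz0123456789@.,-:/ "
MAX_LEN, N_CLASSES = 64, 4                       # e.g. int / float / date / text

def encode(cell):
    # Unknown characters map to index 0, shared with padding.
    ids = [CHARS.find(c) + 1 for c in cell.lower()[:MAX_LEN]]
    return torch.tensor(ids + [0] * (MAX_LEN - len(ids)))

class CharCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(len(CHARS) + 1, 16, padding_idx=0)
        self.conv = nn.Conv1d(16, 64, kernel_size=3, padding=1)
        self.fc = nn.Linear(64, N_CLASSES)

    def forward(self, x):                        # x: (batch, MAX_LEN)
        h = self.embed(x).transpose(1, 2)        # (batch, 16, MAX_LEN)
        h = torch.relu(self.conv(h)).max(dim=2).values   # global max pool
        return self.fc(h)

batch = torch.stack([encode("1999-04-23"), encode("42.7")])
logits = CharCNN()(batch)                        # (2, N_CLASSES)
```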
srlearn: A Python Library for Gradient-Boosted Statistical Relational Models
Title | srlearn: A Python Library for Gradient-Boosted Statistical Relational Models |
Authors | Alexander L. Hayes |
Abstract | We present srlearn, a Python library for boosted statistical relational models. We adapt the scikit-learn interface to this setting and provide examples for how this can be used to express learning and inference problems. |
Tasks | |
Published | 2019-12-17 |
URL | https://arxiv.org/abs/1912.08198v1 |
PDF | https://arxiv.org/pdf/1912.08198v1.pdf |
PWC | https://paperswithcode.com/paper/srlearn-a-python-library-for-gradient-boosted |
Repo | https://github.com/hayesall/srlearn-StarAI-2020-workshop |
Framework | none |
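
For context on what "adapting the scikit-learn interface" means, here is a generic mock of the estimator contract (fit/predict, get/set_params via the constructor, learned attributes with a trailing underscore). This is an illustration of the convention, not srlearn's actual API:

```python
from sklearn.base import BaseEstimator, ClassifierMixin
import numpy as np

class MockBoostedRelationalModel(BaseEstimator, ClassifierMixin):
    def __init__(self, n_estimators=10):
        self.n_estimators = n_estimators     # exposed via get/set_params

    def fit(self, X, y):
        self.classes_ = np.unique(y)         # learned state gets a trailing _
        self.prior_ = y.mean()
        return self                          # fit returns self by convention

    def predict(self, X):
        return np.full(len(X), self.classes_[int(self.prior_ >= 0.5)])

model = MockBoostedRelationalModel(n_estimators=20)
model.fit(np.zeros((4, 2)), np.array([0, 1, 1, 1]))
print(model.predict(np.zeros((2, 2))))
```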
Hierarchical Reinforcement Learning with Advantage-Based Auxiliary Rewards
Title | Hierarchical Reinforcement Learning with Advantage-Based Auxiliary Rewards |
Authors | Siyuan Li, Rui Wang, Minxue Tang, Chongjie Zhang |
Abstract | Hierarchical Reinforcement Learning (HRL) is a promising approach to solving long-horizon problems with sparse and delayed rewards. Many existing HRL algorithms either use pre-trained low-level skills that are unadaptable, or require domain-specific information to define low-level rewards. In this paper, we aim to adapt low-level skills to downstream tasks while maintaining the generality of reward design. We propose an HRL framework which sets auxiliary rewards for low-level skill training based on the advantage function of the high-level policy. This auxiliary reward enables efficient, simultaneous learning of the high-level policy and low-level skills without using task-specific knowledge. In addition, we also theoretically prove that optimizing low-level skills with this auxiliary reward will increase the task return for the joint policy. Experimental results show that our algorithm dramatically outperforms other state-of-the-art HRL methods in MuJoCo domains. We also find that both the low-level and high-level policies trained by our algorithm are transferable. |
Tasks | Hierarchical Reinforcement Learning |
Published | 2019-10-10 |
URL | https://arxiv.org/abs/1910.04450v1 |
PDF | https://arxiv.org/pdf/1910.04450v1.pdf |
PWC | https://paperswithcode.com/paper/hierarchical-reinforcement-learning-with-3 |
Repo | https://github.com/ArayCHN/HAAR-A-Hierarchical-RL-Algorithm |
Framework | none |
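
A sketch of the core idea: the low-level skill's auxiliary reward is tied to the high-level advantage, A(s, o) = r + γV(s′) − V(s). The value function and the scaling factor below are placeholders; the paper learns the critic jointly with the high-level policy:

```python
# Advantage-based auxiliary reward sketch (stand-in value function,
# assumed scaling eta; not the paper's exact formulation).
gamma = 0.99

def value(state):                 # placeholder for the learned high-level critic
    return sum(state)

def high_level_advantage(state, next_state, reward):
    return reward + gamma * value(next_state) - value(state)

def low_level_auxiliary_reward(state, next_state, reward, eta=0.1):
    # Skills are rewarded for transitions the high-level policy values,
    # with no task-specific shaping.
    return eta * high_level_advantage(state, next_state, reward)

print(low_level_auxiliary_reward([0.0, 1.0], [0.5, 1.2], reward=0.0))
```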
Learning Problem-agnostic Speech Representations from Multiple Self-supervised Tasks
Title | Learning Problem-agnostic Speech Representations from Multiple Self-supervised Tasks |
Authors | Santiago Pascual, Mirco Ravanelli, Joan Serrà, Antonio Bonafonte, Yoshua Bengio |
Abstract | Learning good representations without supervision is still an open issue in machine learning, and is particularly challenging for speech signals, which are often characterized by long sequences with a complex hierarchical structure. Some recent works, however, have shown that it is possible to derive useful speech representations by employing a self-supervised encoder-discriminator approach. This paper proposes an improved self-supervised method, where a single neural encoder is followed by multiple workers that jointly solve different self-supervised tasks. The needed consensus across different tasks naturally imposes meaningful constraints on the encoder, helping it discover general representations and minimizing the risk of learning superficial ones. Experiments show that the proposed approach can learn transferable, robust, and problem-agnostic features that carry relevant information from the speech signal, such as speaker identity, phonemes, and even higher-level features such as emotional cues. In addition, a number of design choices make the encoder easily exportable, facilitating its direct usage or adaptation to different problems. |
Tasks | Distant Speech Recognition |
Published | 2019-04-06 |
URL | http://arxiv.org/abs/1904.03416v1 |
PDF | http://arxiv.org/pdf/1904.03416v1.pdf |
PWC | https://paperswithcode.com/paper/learning-problem-agnostic-speech |
Repo | https://github.com/santi-pdp/pase |
Framework | pytorch |
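
A sketch of the encoder-plus-workers layout: one shared encoder feeds several self-supervised heads, and the total loss is the sum of per-worker losses, so the encoder receives one consensus gradient. The worker tasks, targets, and sizes below are simplified stand-ins for those used in PASE:

```python
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Conv1d(1, 64, 16, stride=8), nn.ReLU(),
                        nn.Conv1d(64, 100, 3, padding=1))

workers = nn.ModuleDict({
    "waveform": nn.Conv1d(100, 1, 1),    # regress waveform-like targets
    "mfcc":     nn.Conv1d(100, 20, 1),   # regress MFCC-like targets
})

wave = torch.randn(4, 1, 8000)           # half a second of 16 kHz audio
feats = encoder(wave)                    # shared representation
targets = {"waveform": torch.randn(4, 1, feats.shape[-1]),
           "mfcc": torch.randn(4, 20, feats.shape[-1])}

loss = sum(nn.functional.mse_loss(head(feats), targets[name])
           for name, head in workers.items())
loss.backward()                          # one consensus gradient for the encoder
```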
Covariance-free Partial Least Squares: An Incremental Dimensionality Reduction Method
Title | Covariance-free Partial Least Squares: An Incremental Dimensionality Reduction Method |
Authors | Artur Jordao, Maiko Lie, Victor Hugo Cunha de Melo, William Robson Schwartz |
Abstract | Dimensionality reduction plays an important role in computer vision problems since it reduces computational cost and is often capable of yielding more discriminative data representation. In this context, Partial Least Squares (PLS) has presented notable results in tasks such as image classification and neural network optimization. However, PLS is infeasible on large datasets (e.g., ImageNet) because it requires all the data to be in memory in advance, which is often impractical due to hardware limitations. Additionally, this requirement prevents us from employing PLS on streaming applications where the data are being continuously generated. Motivated by this, we propose a novel incremental PLS, named Covariance-free Incremental Partial Least Squares (CIPLS), which learns a low-dimensional representation of the data using a single sample at a time. In contrast to other state-of-the-art approaches, instead of adopting a partially-discriminative or SGD-based model, we extend Nonlinear Iterative Partial Least Squares (NIPALS) - the standard algorithm used to compute PLS - for incremental processing. Among the advantages of this approach are the preservation of discriminative information across all components, the possibility of employing its score matrices for feature selection, and its computational efficiency. We validate CIPLS on face verification and image classification tasks, where it outperforms several other incremental dimensionality reduction methods. In the context of feature selection, CIPLS achieves comparable results when compared to state-of-the-art techniques. |
Tasks | Dimensionality Reduction, Face Verification, Feature Selection, Image Classification |
Published | 2019-10-05 |
URL | https://arxiv.org/abs/1910.02319v1 |
PDF | https://arxiv.org/pdf/1910.02319v1.pdf |
PWC | https://paperswithcode.com/paper/covariance-free-partial-least-squares-an |
Repo | https://github.com/arturjordao/IncrementalDimensionalityReduction |
Framework | none |
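
A sketch of the covariance-free intuition for the first PLS direction: in PLS1 the first weight vector is w ∝ Xᵀy, which can be accumulated one sample at a time in O(d) memory, so the full data matrix never has to be held. Centering, deflation, and further components are omitted, so this is not the full CIPLS algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 10
w = np.zeros(d)                       # running X^T y accumulator

for _ in range(10_000):               # stream of (x, y) pairs, one at a time
    x = rng.normal(size=d)
    y = x[0] + 0.1 * rng.normal()     # y correlates with feature 0
    w += x * y                        # single-sample update, O(d) memory

w /= np.linalg.norm(w)
print(np.round(w, 2))                 # weight concentrates on feature 0

x_new = rng.normal(size=d)
score = x_new @ w                     # 1-D projection of a new sample
```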
FaceQnet: Quality Assessment for Face Recognition based on Deep Learning
Title | FaceQnet: Quality Assessment for Face Recognition based on Deep Learning |
Authors | Javier Hernandez-Ortega, Javier Galbally, Julian Fierrez, Rudolf Haraksim, Laurent Beslay |
Abstract | In this paper we develop a Quality Assessment approach for face recognition based on deep learning. The method consists of a Convolutional Neural Network, FaceQnet, that is used to predict the suitability of a specific input image for face recognition purposes. The training of FaceQnet is done using the VGGFace2 database. We employ the BioLab-ICAO framework for labeling the VGGFace2 images with quality information related to their ICAO compliance level. The groundtruth quality labels are obtained using FaceNet to generate comparison scores. We employ the groundtruth data to fine-tune a ResNet-based CNN, making it capable of returning a numerical quality measure for each input image. Finally, we verify whether the FaceQnet scores are suitable for predicting the expected performance when employing a specific image for face recognition with a COTS face recognition system. Several conclusions can be drawn from this work, most notably: 1) we managed to employ an existing ICAO compliance framework and a pretrained CNN to automatically label data with quality information, 2) we trained FaceQnet for quality estimation by fine-tuning a pre-trained face recognition network (ResNet-50), and 3) we have shown that the predictions from FaceQnet are highly correlated with the face recognition accuracy of a state-of-the-art commercial system not used during development. FaceQnet is publicly available on GitHub. |
Tasks | Face Recognition |
Published | 2019-04-03 |
URL | http://arxiv.org/abs/1904.01740v2 |
PDF | http://arxiv.org/pdf/1904.01740v2.pdf |
PWC | https://paperswithcode.com/paper/faceqnet-quality-assessment-for-face |
Repo | https://github.com/uam-biometrics/FaceQnet |
Framework | tf |
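
A sketch of the fine-tuning recipe: take a pretrained ResNet-50, swap the classifier for a single-output regression head, and train against groundtruth quality scores. Note the assumptions: torchvision's weights are ImageNet rather than the VGGFace2 recognition weights used in the paper, and the data here is random:

```python
import torch
import torch.nn as nn
from torchvision import models

# Pretrained backbone (ImageNet weights; torchvision >= 0.13 API).
net = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
net.fc = nn.Linear(net.fc.in_features, 1)          # quality-score head

images = torch.randn(8, 3, 224, 224)               # stand-in face crops
quality = torch.rand(8, 1)                         # stand-in groundtruth labels

optimizer = torch.optim.Adam(net.parameters(), lr=1e-4)
loss = nn.functional.mse_loss(net(images), quality)
loss.backward()
optimizer.step()
```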
Understanding and Visualizing Deep Visual Saliency Models
Title | Understanding and Visualizing Deep Visual Saliency Models |
Authors | Sen He, Hamed R. Tavakoli, Ali Borji, Yang Mi, Nicolas Pugeault |
Abstract | Recently, data-driven deep saliency models have achieved high performance and have outperformed classical saliency models, as demonstrated by results on datasets such as the MIT300 and SALICON. Yet, there remains a large gap between the performance of these models and the inter-human baseline. Some outstanding questions include what have these models learned, how and where they fail, and how they can be improved. This article attempts to answer these questions by analyzing the representations learned by individual neurons located at the intermediate layers of deep saliency models. To this end, we follow the steps of existing deep saliency models, that is, borrowing a pre-trained object recognition model to encode the visual features and learning a decoder to infer the saliency. We consider two cases: when the encoder is used as a fixed feature extractor and when it is fine-tuned, and compare the inner representations of the network. To study how the learned representations depend on the task, we fine-tune the same network using the same image set but for two different tasks: saliency prediction versus scene classification. Our analyses reveal that: 1) some visual regions (e.g. head, text, symbol, vehicle) are already encoded within various layers of the network pre-trained for object recognition, 2) using modern datasets, we find that fine-tuning pre-trained models for saliency prediction makes them favor some categories (e.g. head) over some others (e.g. text), 3) although deep models of saliency outperform classical models on natural images, the converse is true for synthetic stimuli (e.g. pop-out search arrays), evidence of a significant difference between human and data-driven saliency models, and 4) we confirm that, after fine-tuning, the change in inner representations is mostly due to the task and not the domain shift in the data. |
Tasks | Object Recognition, Saliency Prediction, Scene Classification |
Published | 2019-03-06 |
URL | http://arxiv.org/abs/1903.02501v3 |
PDF | http://arxiv.org/pdf/1903.02501v3.pdf |
PWC | https://paperswithcode.com/paper/understanding-and-visualizing-deep-visual |
Repo | https://github.com/SenHe/uavdvsm |
Framework | pytorch |
HiLLoC: Lossless Image Compression with Hierarchical Latent Variable Models
Title | HiLLoC: Lossless Image Compression with Hierarchical Latent Variable Models |
Authors | James Townsend, Thomas Bird, Julius Kunze, David Barber |
Abstract | We make the following striking observation: fully convolutional VAE models trained on 32x32 ImageNet can generalize well, not just to 64x64 but also to far larger photographs, with no changes to the model. We use this property, applying fully convolutional models to lossless compression, demonstrating a method to scale the VAE-based ‘Bits-Back with ANS’ algorithm for lossless compression to large color photographs, and achieving state of the art for compression of full size ImageNet images. We release Craystack, an open source library for convenient prototyping of lossless compression using probabilistic models, along with full implementations of all of our compression results. |
Tasks | Image Compression, Latent Variable Models |
Published | 2019-12-20 |
URL | https://arxiv.org/abs/1912.09953v1 |
PDF | https://arxiv.org/pdf/1912.09953v1.pdf |
PWC | https://paperswithcode.com/paper/hilloc-lossless-image-compression-with-1 |
Repo | https://github.com/hilloc-submission/hilloc |
Framework | tf |
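
As background for the 'Bits-Back with ANS' scheme the paper scales up, the net codelength per image follows the standard bits-back identity (restated here, not taken from the paper's text): encoding z from q(z|x) and x given z costs bits, but the bits used to sample z are recovered from the ANS stack, so on average the rate matches the negative ELBO.

```latex
% Net bits-back codelength for an image x with latent z drawn from q(z|x):
% pay -log2 p(z) and -log2 p(x|z) to encode; get log2 q(z|x) bits back.
L(x) \;=\; -\log_2 p(x \mid z) \;-\; \log_2 p(z) \;+\; \log_2 q(z \mid x),
\qquad
\mathbb{E}_{q(z \mid x)}\!\left[ L(x) \right] \;=\; -\,\mathrm{ELBO}(x)\ \text{(in bits)}
```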
Extending Stein’s unbiased risk estimator to train deep denoisers with correlated pairs of noisy images
Title | Extending Stein’s unbiased risk estimator to train deep denoisers with correlated pairs of noisy images |
Authors | Magauiya Zhussip, Shakarim Soltanayev, Se Young Chun |
Abstract | Recently, Stein’s unbiased risk estimator (SURE) has been applied to unsupervised training of deep neural network Gaussian denoisers that outperformed classical non-deep learning based denoisers and yielded comparable performance to those trained with ground truth. While SURE requires only one noise realization per image for training, it does not take advantage of having multiple noise realizations per image when they are available (e.g., two uncorrelated noise realizations per image for Noise2Noise). Here, we propose an extended SURE (eSURE) to train deep denoisers with correlated pairs of noise realizations per image, and apply it to the case of two uncorrelated realizations per image to achieve better performance than the SURE-based method and comparable results to Noise2Noise. Then, we further investigated the case of imperfect ground truth (i.e., mild noise in the ground truth), which may arise given the painstaking, time-consuming, and even expensive process of collecting ground truth images along with multiple noisy images. For the case of generating noisy training data by adding synthetic noise to imperfect ground truth to yield correlated pairs of images, our proposed eSURE-based training method outperformed the conventional SURE-based method as well as Noise2Noise. |
Tasks | Denoising, Image Restoration |
Published | 2019-02-07 |
URL | https://arxiv.org/abs/1902.02452v2 |
PDF | https://arxiv.org/pdf/1902.02452v2.pdf |
PWC | https://paperswithcode.com/paper/theoretical-analysis-on-noise2noise-using |
Repo | https://github.com/Magauiya/Extended_SURE |
Framework | tf |
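
For orientation, a sketch of the standard Monte Carlo SURE loss for a Gaussian denoiser f, where the divergence term is estimated with a random probe so no clean target is needed. This is the baseline SURE that the paper extends; the eSURE extension to correlated noisy pairs is not reproduced here:

```python
import torch

def mc_sure_loss(f, y, sigma, eps=1e-3):
    # SURE = (1/n)||f(y) - y||^2 - sigma^2 + (2 sigma^2 / n) div f(y),
    # with the divergence estimated by a Monte Carlo probe.
    n = y.numel()
    fy = f(y)
    b = torch.randint_like(y, 0, 2) * 2 - 1                   # +/-1 probe
    div = (b * (f(y + eps * b) - fy)).sum() / eps             # ~ div f(y)
    return ((fy - y) ** 2).sum() / n - sigma ** 2 + (2 * sigma ** 2 / n) * div

denoiser = torch.nn.Conv2d(1, 1, 3, padding=1)                # toy denoiser
noisy = torch.rand(1, 1, 32, 32) + 0.1 * torch.randn(1, 1, 32, 32)
loss = mc_sure_loss(denoiser, noisy, sigma=0.1)
loss.backward()
```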