January 31, 2020

3202 words 16 mins read

Paper Group AWR 455

Mask-ShadowGAN: Learning to Remove Shadows from Unpaired Data. Model-based Interactive Semantic Parsing: A Unified Framework and A Text-to-SQL Case Study. Emotion Recognition in Conversation: Research Challenges, Datasets, and Recent Advances. Convolutional Neural Networks on non-uniform geometrical signals using Euclidean spectral transformation. …

Mask-ShadowGAN: Learning to Remove Shadows from Unpaired Data

Title Mask-ShadowGAN: Learning to Remove Shadows from Unpaired Data
Authors Xiaowei Hu, Yitong Jiang, Chi-Wing Fu, Pheng-Ann Heng
Abstract This paper presents a new method for shadow removal using unpaired data, enabling us to avoid tedious annotations and obtain more diverse training samples. However, directly employing adversarial learning and cycle-consistency constraints is insufficient to learn the underlying relationship between the shadow and shadow-free domains, since the mapping between shadow and shadow-free images is not simply one-to-one. To address the problem, we formulate Mask-ShadowGAN, a new deep framework that automatically learns to produce a shadow mask from the input shadow image and then takes the mask to guide the shadow generation via re-formulated cycle-consistency constraints. Particularly, the framework simultaneously learns to produce shadow masks and to remove shadows, maximizing the overall performance. Also, we prepared an unpaired dataset for shadow removal and demonstrated the effectiveness of Mask-ShadowGAN in various experiments, even though it was trained on unpaired data.
Tasks
Published 2019-03-26
URL https://arxiv.org/abs/1903.10683v3
PDF https://arxiv.org/pdf/1903.10683v3.pdf
PWC https://paperswithcode.com/paper/mask-shadowgan-learning-to-remove-shadows
Repo https://github.com/vinthony/ghost-free-shadow-removal
Framework tf
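
The mask is the interesting part: it is derived from the generator's own output rather than from annotations. Below is a minimal PyTorch sketch of that mask-guided cycle; the tiny generators and the fixed 0.05 threshold are illustrative stand-ins for the paper's ResNet generators and its binarization step, not the released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyGenerator(nn.Module):
    """Stand-in for the paper's image-to-image generators (illustrative only)."""
    def __init__(self, in_ch):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1), nn.Tanh())
    def forward(self, *xs):
        return self.net(torch.cat(xs, dim=1))

def shadow_mask(shadow_img, free_img, thresh=0.05):
    # Pixels the generator brightened are treated as shadow pixels.
    diff = (free_img - shadow_img).mean(dim=1, keepdim=True)
    return (diff > thresh).float()

G_f = TinyGenerator(3)                  # shadow image -> shadow-free image
G_s = TinyGenerator(4)                  # (shadow-free image, mask) -> shadow image
x = torch.rand(1, 3, 64, 64)            # a fake shadow image
free = G_f(x)
mask = shadow_mask(x, free)             # mask learned "for free" from the output
rec = G_s(free, mask)                   # mask-guided shadow re-synthesis
cycle_loss = F.l1_loss(rec, x)          # re-formulated cycle-consistency term
```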

Model-based Interactive Semantic Parsing: A Unified Framework and A Text-to-SQL Case Study

Title Model-based Interactive Semantic Parsing: A Unified Framework and A Text-to-SQL Case Study
Authors Ziyu Yao, Yu Su, Huan Sun, Wen-tau Yih
Abstract As a promising paradigm, interactive semantic parsing has shown to improve both semantic parsing accuracy and user confidence in the results. In this paper, we propose a new, unified formulation of the interactive semantic parsing problem, where the goal is to design a model-based intelligent agent. The agent maintains its own state as the current predicted semantic parse, decides whether and where human intervention is needed, and generates a clarification question in natural language. A key part of the agent is a world model: it takes a percept (either an initial question or subsequent feedback from the user) and transitions to a new state. We then propose a simple yet remarkably effective instantiation of our framework, demonstrated on two text-to-SQL datasets (WikiSQL and Spider) with different state-of-the-art base semantic parsers. Compared to an existing interactive semantic parsing approach that treats the base parser as a black box, our approach solicits less user feedback but yields higher run-time accuracy.
Tasks Semantic Parsing, Text-To-Sql
Published 2019-10-11
URL https://arxiv.org/abs/1910.05389v1
PDF https://arxiv.org/pdf/1910.05389v1.pdf
PWC https://paperswithcode.com/paper/model-based-interactive-semantic-parsing-a
Repo https://github.com/LittleYUYU/Interactive-Semantic-Parsing
Framework tf
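
A hedged sketch of the agent loop the abstract describes, with the world model, error detector, and actuator as pluggable components; the interfaces below are illustrative, not the released API.

```python
class InteractiveParsingAgent:
    """Sketch of a MISP-style agent: maintain a parse as state, decide where
    human intervention is needed, ask, and let the world model transition."""
    def __init__(self, world_model, error_detector, actuator):
        self.world_model = world_model        # percept -> updated semantic parse
        self.error_detector = error_detector  # parse -> uncertain span or None
        self.actuator = actuator              # uncertain span -> NL clarification

    def parse(self, question, user, max_turns=3):
        state = self.world_model.init_state(question)   # initial percept
        for _ in range(max_turns):
            span = self.error_detector.find_uncertain(state)
            if span is None:                  # agent is confident: stop asking
                break
            clarification = self.actuator.ask(state, span)
            feedback = user(clarification)    # human answers in natural language
            state = self.world_model.transition(state, feedback)
        return state                          # final predicted SQL parse
```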

Emotion Recognition in Conversation: Research Challenges, Datasets, and Recent Advances

Title Emotion Recognition in Conversation: Research Challenges, Datasets, and Recent Advances
Authors Soujanya Poria, Navonil Majumder, Rada Mihalcea, Eduard Hovy
Abstract Emotion is intrinsic to humans and consequently emotion understanding is a key part of human-like artificial intelligence (AI). Emotion recognition in conversation (ERC) is becoming increasingly popular as a new research frontier in natural language processing (NLP) due to its ability to mine opinions from the plethora of publicly available conversational data on platforms such as Facebook, YouTube, Reddit, Twitter, and others. Moreover, it has potential applications in health-care systems (as a tool for psychological analysis), education (understanding student frustration), and more. ERC is also extremely important for generating emotion-aware dialogues that require an understanding of the user’s emotions. Catering to these needs calls for effective and scalable conversational emotion-recognition algorithms. However, it is a strenuous problem to solve because of several research challenges. In this paper, we discuss these challenges and shed light on the recent research in this field. We also describe the drawbacks of these approaches and discuss the reasons why they fail to successfully overcome the research challenges in ERC.
Tasks Emotion Recognition, Emotion Recognition in Conversation
Published 2019-05-08
URL https://arxiv.org/abs/1905.02947v1
PDF https://arxiv.org/pdf/1905.02947v1.pdf
PWC https://paperswithcode.com/paper/emotion-recognition-in-conversation-research
Repo https://github.com/SenticNet/conv-emotion
Framework pytorch

Convolutional Neural Networks on non-uniform geometrical signals using Euclidean spectral transformation

Title Convolutional Neural Networks on non-uniform geometrical signals using Euclidean spectral transformation
Authors Chiyu “Max” Jiang, Dequan Wang, Jingwei Huang, Philip Marcus, Matthias Nießner
Abstract Convolutional Neural Networks (CNN) have been successful in processing data signals that are uniformly sampled in the spatial domain (e.g., images). However, most data signals do not natively exist on a grid, and they suffer significant aliasing error and information loss when sampled onto a uniform physical grid. Moreover, signals can exist in different topological structures as, for example, points, lines, surfaces and volumes. It has been challenging to analyze signals with mixed topologies (for example, a point cloud with a surface mesh). To this end, we develop mathematical formulations for Non-Uniform Fourier Transforms (NUFT) to directly, and optimally, sample non-uniform data signals of different topologies defined on a simplex mesh into the spectral domain with no spatial sampling error. The spectral transform is performed in the Euclidean space, which removes the translation ambiguity from works on the graph spectrum. Our representation has four distinct advantages: (1) the process causes no spatial sampling error during the initial sampling, (2) the generality of this approach provides a unified framework for using CNNs to analyze signals of mixed topologies, (3) it allows us to leverage state-of-the-art backbone CNN architectures for effective learning without having to design a particular architecture for a particular data structure in an ad-hoc fashion, and (4) the representation allows weighted meshes where each element has a different weight (i.e., texture) indicating local properties. We achieve results on par with the state-of-the-art for the 3D shape retrieval task, and a new state-of-the-art for the point cloud to surface reconstruction task.
Tasks 3D Shape Retrieval
Published 2019-01-07
URL http://arxiv.org/abs/1901.02070v1
PDF http://arxiv.org/pdf/1901.02070v1.pdf
PWC https://paperswithcode.com/paper/convolutional-neural-networks-on-non-uniform
Repo https://github.com/maxjiang93/DDSL
Framework pytorch
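
For intuition, the 0-simplex (point-cloud) special case of the NUFT is an exact Fourier sum over the points, which can then be inverse-FFT'd onto a uniform grid with no spatial sampling error. Below is a NumPy sketch of that special case; the 2D setting and grid resolution are illustrative, and the paper derives closed forms for line, surface, and volume simplices as well.

```python
import numpy as np

def nuft_points(points, weights, res=32):
    """Exact DFT of a weighted point set. points: (N, 2) in [0, 1)^2, weights: (N,).
    Returns the complex (res, res) spectrum on a uniform frequency grid."""
    freqs = np.fft.fftfreq(res, d=1.0 / res)            # integer frequencies, FFT order
    ku, kv = np.meshgrid(freqs, freqs, indexing="ij")
    phase = ku[..., None] * points[:, 0] + kv[..., None] * points[:, 1]  # (res, res, N)
    return (weights * np.exp(-2j * np.pi * phase)).sum(axis=-1)

pts = np.random.rand(100, 2)
spec = nuft_points(pts, np.ones(100))
raster = np.fft.ifft2(spec).real        # band-limited rasterization on a 32x32 grid
```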

Adapting Neural Networks for the Estimation of Treatment Effects

Title Adapting Neural Networks for the Estimation of Treatment Effects
Authors Claudia Shi, David M. Blei, Victor Veitch
Abstract This paper addresses the use of neural networks for the estimation of treatment effects from observational data. Generally, estimation proceeds in two stages. First, we fit models for the expected outcome and the probability of treatment (propensity score) for each unit. Second, we plug these fitted models into a downstream estimator of the effect. Neural networks are a natural choice for the models in the first step. The question we address is: how can we adapt the design and training of the neural networks used in the first step in order to improve the quality of the final estimate of the treatment effect? We propose two adaptations based on insights from the statistical literature on the estimation of treatment effects. The first is a new architecture, the Dragonnet, that exploits the sufficiency of the propensity score for estimation adjustment. The second is a regularization procedure, targeted regularization, that induces a bias towards models that have non-parametrically optimal asymptotic properties out-of-the-box. Studies on benchmark datasets for causal inference show these adaptations outperform existing methods. Code is available at github.com/claudiashi57/dragonnet.
Tasks Causal Inference
Published 2019-06-05
URL https://arxiv.org/abs/1906.02120v2
PDF https://arxiv.org/pdf/1906.02120v2.pdf
PWC https://paperswithcode.com/paper/adapting-neural-networks-for-the-estimation
Repo https://github.com/claudiashi57/dragonnet
Framework tf
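
A hedged PyTorch sketch of the Dragonnet-style architecture described above: a shared representation feeding two outcome heads (one per treatment arm) and a propensity head. Layer widths are illustrative and the targeted-regularization term is omitted; see the linked repo for the authors' implementation.

```python
import torch
import torch.nn as nn

class Dragonnet(nn.Module):
    def __init__(self, in_dim, hidden=200):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(in_dim, hidden), nn.ELU(),
                                    nn.Linear(hidden, hidden), nn.ELU())
        self.q0 = nn.Sequential(nn.Linear(hidden, 100), nn.ELU(), nn.Linear(100, 1))
        self.q1 = nn.Sequential(nn.Linear(hidden, 100), nn.ELU(), nn.Linear(100, 1))
        self.g = nn.Linear(hidden, 1)                     # propensity-score logit

    def forward(self, x):
        z = self.shared(x)
        return self.q0(z), self.q1(z), torch.sigmoid(self.g(z))

def dragonnet_loss(model, x, t, y, alpha=1.0):
    q0, q1, g = model(x)
    q_obs = torch.where(t.bool().unsqueeze(-1), q1, q0).squeeze(-1)  # head matching treatment
    outcome = ((q_obs - y) ** 2).mean()
    propensity = nn.functional.binary_cross_entropy(g.squeeze(-1), t.float())
    return outcome + alpha * propensity

model = Dragonnet(in_dim=25)
x, t, y = torch.randn(64, 25), torch.randint(0, 2, (64,)), torch.randn(64)
loss = dragonnet_loss(model, x, t, y)   # ATE can then be estimated from (q1 - q0)
```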

PAC-Bayes Un-Expected Bernstein Inequality

Title PAC-Bayes Un-Expected Bernstein Inequality
Authors Zakaria Mhammedi, Peter D. Grunwald, Benjamin Guedj
Abstract We present a new PAC-Bayesian generalization bound. Standard bounds contain a $\sqrt{L_n \cdot \mathrm{KL}/n}$ complexity term which dominates unless $L_n$, the empirical error of the learning algorithm’s randomized predictions, vanishes. We manage to replace $L_n$ by a term which vanishes in many more situations, essentially whenever the employed learning algorithm is sufficiently stable on the dataset at hand. Our new bound consistently beats state-of-the-art bounds both on a toy example and on UCI datasets (with large enough $n$). Theoretically, unlike existing bounds, our new bound can be expected to converge to $0$ faster whenever a Bernstein/Tsybakov condition holds, thus connecting PAC-Bayesian generalization and excess risk bounds; for the latter it has long been known that faster convergence can be obtained under Bernstein conditions. Our main technical tool is a new concentration inequality which is like Bernstein’s but with $X^2$ taken outside its expectation.
Tasks
Published 2019-05-31
URL https://arxiv.org/abs/1905.13367v2
PDF https://arxiv.org/pdf/1905.13367v2.pdf
PWC https://paperswithcode.com/paper/pac-bayes-un-expected-bernstein-inequality
Repo https://github.com/bguedj/PAC-Bayesian-Un-Expected-Bernstein-Inequality
Framework none
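
As a schematic reference for the complexity term mentioned in the abstract, a standard PAC-Bayes bound has roughly the following shape (constants and logarithmic factors omitted; this is not the paper's exact bound):

$$
\mathbb{E}_{h \sim \rho}\big[L(h)\big] \;\lesssim\; L_n \;+\; \sqrt{\frac{L_n \cdot \mathrm{KL}(\rho \,\|\, \pi)}{n}} \;+\; \frac{\mathrm{KL}(\rho \,\|\, \pi)}{n},
$$

where $L_n$ is the empirical error of the randomized predictor, $\rho$ the posterior, and $\pi$ the prior. The paper's contribution is to replace the $L_n$ inside the square root by a quantity that can vanish even when $L_n$ does not.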

You Only Watch Once: A Unified CNN Architecture for Real-Time Spatiotemporal Action Localization

Title You Only Watch Once: A Unified CNN Architecture for Real-Time Spatiotemporal Action Localization
Authors Okan Köpüklü, Xiangyu Wei, Gerhard Rigoll
Abstract Spatiotemporal action localization requires the incorporation of two sources of information into the designed architecture: (1) temporal information from the previous frames and (2) spatial information from the key frame. Current state-of-the-art approaches usually extract this information with separate networks and use an extra fusion mechanism to obtain detections. In this work, we present YOWO, a unified CNN architecture for real-time spatiotemporal action localization in video streams. YOWO is a single-stage architecture with two branches to extract temporal and spatial information concurrently and predict bounding boxes and action probabilities directly from video clips in one evaluation. Since the whole architecture is unified, it can be optimized end-to-end. YOWO is fast, providing 34 frames per second on 16-frame input clips and 62 frames per second on 8-frame input clips, making it currently the fastest state-of-the-art architecture for spatiotemporal action localization. Remarkably, YOWO outperforms the previous state-of-the-art results on J-HMDB-21 and UCF101-24 with an impressive improvement of ~3% and ~12%, respectively. We make our code and pretrained models publicly available.
Tasks Action Localization
Published 2019-11-15
URL https://arxiv.org/abs/1911.06644v3
PDF https://arxiv.org/pdf/1911.06644v3.pdf
PWC https://paperswithcode.com/paper/you-only-watch-once-a-unified-cnn
Repo https://github.com/wei-tim/YOWO
Framework pytorch
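
A hedged PyTorch sketch of the two-branch idea: a 3D-CNN branch encodes the clip for temporal context, a 2D-CNN branch encodes the key frame for spatial detail, and the fused feature map feeds a single-stage detection head. The tiny backbones and plain concatenation below are illustrative; the paper uses much deeper 3D/2D backbones and a channel fusion-and-attention block.

```python
import torch
import torch.nn as nn

class TinyYOWO(nn.Module):
    def __init__(self, num_classes, num_anchors=5):
        super().__init__()
        self.branch3d = nn.Sequential(                     # temporal branch over the clip
            nn.Conv3d(3, 32, 3, stride=(1, 2, 2), padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d((1, 28, 28)))
        self.branch2d = nn.Sequential(                     # spatial branch over the key frame
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((28, 28)))
        # one-stage head: each grid cell predicts (x, y, w, h, conf, classes) per anchor
        self.head = nn.Conv2d(64, num_anchors * (5 + num_classes), 1)

    def forward(self, clip):                               # clip: (B, 3, T, H, W)
        key_frame = clip[:, :, -1]                         # most recent frame as key frame
        f3d = self.branch3d(clip).squeeze(2)               # (B, 32, 28, 28)
        f2d = self.branch2d(key_frame)                     # (B, 32, 28, 28)
        return self.head(torch.cat([f3d, f2d], dim=1))     # fused, single evaluation

out = TinyYOWO(num_classes=24)(torch.rand(1, 3, 16, 224, 224))
```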

Permutohedral Attention Module for Efficient Non-Local Neural Networks

Title Permutohedral Attention Module for Efficient Non-Local Neural Networks
Authors Samuel Joutard, Reuben Dorent, Amanda Isaac, Sebastien Ourselin, Tom Vercauteren, Marc Modat
Abstract Medical image processing tasks such as segmentation often require capturing non-local information. As organs, bones, and tissues share common characteristics such as intensity, shape, and texture, the contextual information plays a critical role in correctly labeling them. Segmentation and labeling are now typically done with convolutional neural networks (CNNs), but the context of the CNN is limited by the receptive field, which itself is limited by memory requirements and other properties. In this paper, we propose a new attention module, which we call the Permutohedral Attention Module (PAM), to efficiently capture non-local characteristics of the image. The proposed method is both memory and computationally efficient. We provide a GPU implementation of this module suitable for 3D medical imaging problems. We demonstrate the efficiency and scalability of our module with the challenging task of vertebrae segmentation and labeling where context plays a crucial role because of the very similar appearance of different vertebrae.
Tasks
Published 2019-07-01
URL https://arxiv.org/abs/1907.00641v2
PDF https://arxiv.org/pdf/1907.00641v2.pdf
PWC https://paperswithcode.com/paper/permutohedral-attention-module-for-efficient
Repo https://github.com/SamuelJoutard/Permutohedral_attention_module
Framework pytorch
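
To make concrete what the module approximates, the dense non-local (attention) operation below averages each voxel's features with Gaussian weights computed from feature-space distances; its quadratic cost in the number of voxels is exactly what the permutohedral-lattice filtering in the paper avoids. This is the baseline operation, not the lattice implementation from the repo.

```python
import torch

def dense_nonlocal(features, values, sigma=1.0):
    """features: (N, Df) descriptors, values: (N, Dv). Returns filtered (N, Dv)."""
    d2 = torch.cdist(features, features) ** 2             # pairwise feature distances
    w = torch.softmax(-d2 / (2 * sigma ** 2), dim=1)      # normalized Gaussian affinities
    return w @ values                                     # weighted aggregation: O(N^2)

feat = torch.rand(512, 8)       # e.g. (position + embedding) descriptor per voxel
val = torch.rand(512, 16)
out = dense_nonlocal(feat, val)
```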

Point-Voxel CNN for Efficient 3D Deep Learning

Title Point-Voxel CNN for Efficient 3D Deep Learning
Authors Zhijian Liu, Haotian Tang, Yujun Lin, Song Han
Abstract We present Point-Voxel CNN (PVCNN) for efficient, fast 3D deep learning. Previous work processes 3D data using either voxel-based or point-based NN models. However, both approaches are computationally inefficient. The computation cost and memory footprints of the voxel-based models grow cubically with the input resolution, making it memory-prohibitive to scale up the resolution. As for point-based networks, up to 80% of the time is wasted on structuring the sparse data which have rather poor memory locality, not on the actual feature extraction. In this paper, we propose PVCNN that represents the 3D input data in points to reduce the memory consumption, while performing the convolutions in voxels to reduce the irregular, sparse data access and improve the locality. Our PVCNN model is both memory and computation efficient. Evaluated on semantic and part segmentation datasets, it achieves much higher accuracy than the voxel-based baseline with 10x GPU memory reduction; it also outperforms the state-of-the-art point-based models with 7x measured speedup on average. Remarkably, the narrower version of PVCNN achieves 2x speedup over PointNet (an extremely efficient model) on part and scene segmentation benchmarks with much higher accuracy. We validate the general effectiveness of PVCNN on 3D object detection: by replacing the primitives in Frustrum PointNet with PVConv, it outperforms Frustrum PointNet++ by 2.4% mAP on average with 1.5x measured speedup and GPU memory reduction.
Tasks 3D Object Detection, Object Detection, Scene Segmentation
Published 2019-07-08
URL https://arxiv.org/abs/1907.03739v2
PDF https://arxiv.org/pdf/1907.03739v2.pdf
PWC https://paperswithcode.com/paper/point-voxel-cnn-for-efficient-3d-deep
Repo https://github.com/mit-han-lab/pvcnn
Framework pytorch
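
A hedged PyTorch sketch of a PVConv-style layer: a coarse voxel branch captures neighborhood context with a cheap 3D convolution, while a point-wise MLP branch keeps the fine per-point features, and the two are fused by addition. Nearest-voxel devoxelization and the layer sizes are simplifications (the paper uses trilinear devoxelization).

```python
import torch
import torch.nn as nn

class TinyPVConv(nn.Module):
    def __init__(self, in_ch, out_ch, res=16):
        super().__init__()
        self.res = res
        self.voxel_conv = nn.Sequential(nn.Conv3d(in_ch, out_ch, 3, padding=1), nn.ReLU())
        self.point_mlp = nn.Sequential(nn.Conv1d(in_ch, out_ch, 1), nn.ReLU())

    def forward(self, coords, feats):          # coords: (B, N, 3) in [0, 1), feats: (B, C, N)
        B, C, N = feats.shape
        idx = (coords * self.res).long().clamp(0, self.res - 1)
        flat = idx[..., 0] * self.res ** 2 + idx[..., 1] * self.res + idx[..., 2]   # (B, N)
        grid = feats.new_zeros(B, C, self.res ** 3)
        count = feats.new_zeros(B, 1, self.res ** 3)
        grid.scatter_add_(2, flat.unsqueeze(1).expand(-1, C, -1), feats)  # sum feats per voxel
        count.scatter_add_(2, flat.unsqueeze(1),
                           torch.ones_like(flat, dtype=feats.dtype).unsqueeze(1))
        grid = (grid / count.clamp(min=1)).view(B, C, self.res, self.res, self.res)
        voxel_feats = self.voxel_conv(grid).view(B, -1, self.res ** 3)
        devox = voxel_feats.gather(2, flat.unsqueeze(1).expand(-1, voxel_feats.size(1), -1))
        return devox + self.point_mlp(feats)    # coarse context + fine point features

coords, feats = torch.rand(2, 1024, 3), torch.rand(2, 8, 1024)
out = TinyPVConv(8, 8)(coords, feats)           # (2, 8, 1024)
```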

Interpreting Black Box Models via Hypothesis Testing

Title Interpreting Black Box Models via Hypothesis Testing
Authors Collin Burns, Jesse Thomason, Wesley Tansey
Abstract While many methods for interpreting machine learning models have been proposed, they are often ad hoc, difficult to interpret, and come with limited guarantees. This is especially problematic in science and medicine, where model interpretations may be reported as discoveries or guide patient treatments. As a step toward more principled and reliable interpretations, in this paper we reframe black box model interpretability as a multiple hypothesis testing problem. The task is to discover “important” features by testing whether the model prediction is significantly different from what would be expected if the features were replaced with uninformative counterfactuals. We propose two testing methods: one that provably controls the false discovery rate but which is not yet feasible for large-scale applications, and an approximate testing method which can be applied to real-world data sets. In simulation, both tests have high power relative to existing interpretability methods. When applied to state-of-the-art vision and language models, the framework selects features that intuitively explain model predictions. The resulting explanations have the additional advantage that they are themselves easy to interpret.
Tasks
Published 2019-03-29
URL https://arxiv.org/abs/1904.00045v2
PDF https://arxiv.org/pdf/1904.00045v2.pdf
PWC https://paperswithcode.com/paper/interpreting-black-box-models-with
Repo https://github.com/collin-burns/interpretability-hypothesis-testing
Framework none
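
A hedged toy sketch of the testing idea: replace one feature with draws from an "uninformative" counterfactual (here, resampled from the training marginal, which is only a valid null if that feature is independent of the rest), ask how atypical the actual prediction is under those draws, and control the false discovery rate across features. The permutation-style p-value and the Benjamini-Hochberg step below are simple stand-ins, not the paper's tests.

```python
import numpy as np

def feature_pvalue(model, x, X_train, j, n_draws=199, rng=None):
    """How atypical is the prediction at x compared to predictions in which
    feature j is resampled from its training marginal?"""
    rng = rng or np.random.default_rng(0)
    t_obs = float(model(x[None])[0])
    t_null = []
    for _ in range(n_draws):
        x_cf = x.copy()
        x_cf[j] = rng.choice(X_train[:, j])        # uninformative counterfactual draw
        t_null.append(float(model(x_cf[None])[0]))
    t_null = np.asarray(t_null)
    center = np.median(t_null)
    extreme = np.sum(np.abs(t_null - center) >= np.abs(t_obs - center))
    return (1 + extreme) / (n_draws + 1)

def benjamini_hochberg(pvals, alpha=0.1):
    """Indices declared important while (approximately) controlling the FDR."""
    pvals = np.asarray(pvals)
    order = np.argsort(pvals)
    thresh = alpha * np.arange(1, len(pvals) + 1) / len(pvals)
    passed = pvals[order] <= thresh
    k = passed.nonzero()[0].max() + 1 if passed.any() else 0
    return set(order[:k].tolist())

X_train = np.random.rand(500, 10)
w = np.zeros(10); w[0] = 3.0                       # toy model: only feature 0 matters
model = lambda X: X @ w
pvals = [feature_pvalue(model, X_train[0], X_train, j) for j in range(10)]
print(benjamini_hochberg(pvals))
```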

Generating Summaries with Topic Templates and Structured Convolutional Decoders

Title Generating Summaries with Topic Templates and Structured Convolutional Decoders
Authors Laura Perez-Beltrachini, Yang Liu, Mirella Lapata
Abstract Existing neural generation approaches create multi-sentence text as a single sequence. In this paper we propose a structured convolutional decoder that is guided by the content structure of target summaries. We compare our model with existing sequential decoders on three data sets representing different domains. Automatic and human evaluation demonstrate that our summaries have better content coverage.
Tasks
Published 2019-06-11
URL https://arxiv.org/abs/1906.04687v1
PDF https://arxiv.org/pdf/1906.04687v1.pdf
PWC https://paperswithcode.com/paper/generating-summaries-with-topic-templates-and
Repo https://github.com/lauhaide/WikiCatSum
Framework pytorch

Constructing Multiple Tasks for Augmentation: Improving Neural Image Classification With K-means Features

Title Constructing Multiple Tasks for Augmentation: Improving Neural Image Classification With K-means Features
Authors Tao Gui, Lizhi Qing, Qi Zhang, Jiacheng Ye, Hang Yan, Zichu Fei, Xuanjing Huang
Abstract Multi-task learning (MTL) has received considerable attention, and numerous deep learning applications benefit from MTL with multiple objectives. However, constructing multiple related tasks is difficult, and sometimes only a single task is available for training in a dataset. To tackle this problem, we explored the idea of using unsupervised clustering to construct a variety of auxiliary tasks from unlabeled data or existing labeled data. We found that some of these newly constructed tasks could exhibit semantic meanings corresponding to certain human-specific attributes, but some were non-ideal. In order to effectively reduce the impact of non-ideal auxiliary tasks on the main task, we further proposed a novel meta-learning-based multi-task learning approach, which trained the shared hidden layers on auxiliary tasks, while the meta-optimization objective was to minimize the loss on the main task, ensuring that the optimizing direction led to an improvement on the main task. Experimental results across five image datasets demonstrated that the proposed method significantly outperformed existing single task learning, semi-supervised learning, and some data augmentation methods, including an improvement of more than 9% on the Omniglot dataset.
Tasks Data Augmentation, Image Classification, Meta-Learning, Multi-Task Learning, Omniglot
Published 2019-11-18
URL https://arxiv.org/abs/1911.07518v1
PDF https://arxiv.org/pdf/1911.07518v1.pdf
PWC https://paperswithcode.com/paper/constructing-multiple-tasks-for-augmentation
Repo https://github.com/Howardqlz/Meta-MTL
Framework pytorch
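
A hedged sketch of the auxiliary-task construction step: cluster image embeddings with k-means and treat the cluster assignments as labels for new auxiliary classification tasks (one per choice of k). The meta-learned weighting that protects the main task from non-ideal auxiliary tasks is not shown, and the feature source and values of k are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_auxiliary_tasks(features, ks=(5, 10, 20), seed=0):
    """features: (N, D) image embeddings. Returns a list of (N,) pseudo-label arrays,
    each of which defines one auxiliary classification task."""
    tasks = []
    for k in ks:
        labels = KMeans(n_clusters=k, random_state=seed, n_init=10).fit_predict(features)
        tasks.append(labels)
    return tasks

feats = np.random.rand(1000, 64)        # e.g. penultimate-layer activations
aux_tasks = build_auxiliary_tasks(feats)
```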

HiGRU: Hierarchical Gated Recurrent Units for Utterance-level Emotion Recognition

Title HiGRU: Hierarchical Gated Recurrent Units for Utterance-level Emotion Recognition
Authors Wenxiang Jiao, Haiqin Yang, Irwin King, Michael R. Lyu
Abstract In this paper, we address three challenges in utterance-level emotion recognition in dialogue systems: (1) the same word can deliver different emotions in different contexts; (2) some emotions are rarely seen in general dialogues; (3) long-range contextual information is hard to capture effectively. We therefore propose a hierarchical Gated Recurrent Unit (HiGRU) framework with a lower-level GRU to model the word-level inputs and an upper-level GRU to capture the contexts of utterance-level embeddings. Moreover, we extend the framework to two variants, HiGRU with individual features fusion (HiGRU-f) and HiGRU with self-attention and features fusion (HiGRU-sf), so that the word/utterance-level individual inputs and the long-range contextual information can be sufficiently utilized. Experiments on three dialogue emotion datasets, IEMOCAP, Friends, and EmotionPush, demonstrate that our proposed HiGRU models attain at least 8.7%, 7.5%, and 6.0% improvement over state-of-the-art methods on the respective datasets. Particularly, by utilizing only the textual feature in IEMOCAP, our HiGRU models gain at least 3.8% improvement over the state-of-the-art conversational memory network (CMN) with the trimodal features of text, video, and audio.
Tasks Emotion Recognition
Published 2019-04-09
URL http://arxiv.org/abs/1904.04446v1
PDF http://arxiv.org/pdf/1904.04446v1.pdf
PWC https://paperswithcode.com/paper/higru-hierarchical-gated-recurrent-units-for
Repo https://github.com/wxjiao/HiGRUs
Framework pytorch
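
A hedged PyTorch sketch of the hierarchical idea: a lower-level GRU turns each utterance's word embeddings into an utterance vector, and an upper-level GRU runs over the sequence of utterance vectors to add conversational context before per-utterance emotion classification. The fusion and self-attention variants (HiGRU-f, HiGRU-sf) and the exact dimensions are not reproduced here.

```python
import torch
import torch.nn as nn

class TinyHiGRU(nn.Module):
    def __init__(self, vocab, emb=100, hid=100, classes=6):
        super().__init__()
        self.emb = nn.Embedding(vocab, emb)
        self.word_gru = nn.GRU(emb, hid, batch_first=True)   # lower level: words
        self.utt_gru = nn.GRU(hid, hid, batch_first=True)    # upper level: utterances
        self.clf = nn.Linear(hid, classes)

    def forward(self, dialogue):                 # (B, n_utts, n_words) token ids
        B, U, W = dialogue.shape
        words = self.emb(dialogue.view(B * U, W))
        _, h = self.word_gru(words)              # h: (1, B*U, hid)
        utt_vecs = h.squeeze(0).view(B, U, -1)   # one embedding per utterance
        ctx, _ = self.utt_gru(utt_vecs)          # contextualized utterance states
        return self.clf(ctx)                     # per-utterance emotion logits

logits = TinyHiGRU(vocab=5000)(torch.randint(0, 5000, (2, 8, 20)))  # (2, 8, 6)
```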

Speech-VGG: A deep feature extractor for speech processing

Title Speech-VGG: A deep feature extractor for speech processing
Authors Pierre Beckmann, Mikolaj Kegler, Hugues Saltini, Milos Cernak
Abstract A growing number of studies in the field of speech processing employ feature losses to train deep learning systems. While the application of this framework typically yields beneficial results, the question of what the optimal setup is for extracting transferable speech features to compute losses remains underexplored. In this study, we extend our previous work on speechVGG, a deep feature extractor for training speech processing frameworks. The extractor is based on the classic VGG-16 convolutional neural network re-trained to identify words from log-magnitude STFT features. To estimate the influence of different hyperparameters on the extractor’s performance, we applied several configurations of speechVGG to train a system for informed speech inpainting, the context-based recovery of missing parts from time-frequency masked speech segments. We show that changing the size of the dictionary and the size of the dataset used to pre-train the speechVGG notably modulates task performance of the main framework.
Tasks
Published 2019-10-22
URL https://arxiv.org/abs/1910.09909v3
PDF https://arxiv.org/pdf/1910.09909v3.pdf
PWC https://paperswithcode.com/paper/speech-vgg-a-deep-feature-extractor-for
Repo https://github.com/bepierre/SpeechVGG
Framework tf
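
A hedged sketch of how such an extractor is typically used as a feature loss: compute log-magnitude STFT features, pass both the processed and the reference speech through a frozen extractor, and penalize the distance between internal activations. The tiny CNN below stands in for the pre-trained speechVGG, and the STFT parameters are illustrative.

```python
import torch
import torch.nn as nn

def log_mag_stft(wave, n_fft=512, hop=128):
    spec = torch.stft(wave, n_fft=n_fft, hop_length=hop, return_complex=True)
    return torch.log(spec.abs() + 1e-6).unsqueeze(1)     # (B, 1, freq, time)

extractor = nn.Sequential(                # stand-in for the pre-trained speechVGG
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU())
for p in extractor.parameters():
    p.requires_grad_(False)               # the extractor stays frozen

def feature_loss(pred_wave, ref_wave):
    return nn.functional.l1_loss(extractor(log_mag_stft(pred_wave)),
                                 extractor(log_mag_stft(ref_wave)))

loss = feature_loss(torch.randn(2, 16000), torch.randn(2, 16000))
```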

Compressing deep neural networks by matrix product operators

Title Compressing deep neural networks by matrix product operators
Authors Ze-Feng Gao, Song Cheng, Rong-Qiang He, Z. Y. Xie, Hui-Hai Zhao, Zhong-Yi Lu, Tao Xiang
Abstract A deep neural network is a parameterization of a multi-layer mapping of signals in terms of many alternately arranged linear and nonlinear transformations. The linear transformations, which are generally used in the fully-connected as well as convolutional layers, contain most of the variational parameters that are trained and stored. Compressing a deep neural network to reduce its number of variational parameters without reducing its prediction power is an important but challenging step towards training these parameters efficiently and lowering the risk of overfitting. Here we show that this problem can be effectively solved by representing linear transformations with matrix product operators (MPO). We have tested this approach on five representative neural networks, including FC2, LeNet-5, VGG, ResNet, and DenseNet, on two widely used datasets, namely MNIST and CIFAR-10, and found that this MPO representation indeed sets up a faithful and efficient mapping between input and output signals, which can keep or even improve the prediction accuracy with a dramatically reduced number of parameters.
Tasks
Published 2019-04-11
URL http://arxiv.org/abs/1904.06194v1
PDF http://arxiv.org/pdf/1904.06194v1.pdf
PWC https://paperswithcode.com/paper/compressing-deep-neural-networks-by-matrix
Repo https://github.com/zfgao66/deeplearning-mpo
Framework tf
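
A hedged sketch of the core idea: factorize a dense weight matrix into a chain of small MPO (tensor-train) cores and contract the input through them, trading a quadratic parameter count for a few small tensors controlled by the bond dimension. Two cores and the 784-to-256 shape below are illustrative; the paper's layers chain more cores.

```python
import torch
import torch.nn as nn

class MPOLinear2(nn.Module):
    """Dense 784x256 layer replaced by two MPO cores (illustrative sizes)."""
    def __init__(self, in_dims=(28, 28), out_dims=(16, 16), rank=4):
        super().__init__()
        m1, m2 = in_dims
        n1, n2 = out_dims
        self.in_dims = in_dims
        self.core1 = nn.Parameter(torch.randn(m1, n1, rank) * 0.02)   # (m1, n1, r)
        self.core2 = nn.Parameter(torch.randn(rank, m2, n2) * 0.02)   # (r, m2, n2)
        self.bias = nn.Parameter(torch.zeros(n1 * n2))

    def forward(self, x):                          # x: (B, m1*m2)
        x = x.view(x.size(0), *self.in_dims)       # (B, m1, m2)
        y = torch.einsum("bij,iar,rjc->bac", x, self.core1, self.core2)
        return y.reshape(x.size(0), -1) + self.bias

layer = MPOLinear2()
out = layer(torch.randn(5, 784))                   # (5, 256); ~3.6k params vs ~200k dense
```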