July 27, 2019

2982 words 14 mins read

Paper Group ANR 566

Scientific Article Summarization Using Citation-Context and Article’s Discourse Structure. Less-forgetful Learning for Domain Expansion in Deep Neural Networks. Compact Tensor Pooling for Visual Question Answering. DeepDeblur: Fast one-step blurry face images restoration. Handling PDDL3.0 State Trajectory Constraints with Temporal Landmarks. UG^2: …

Scientific Article Summarization Using Citation-Context and Article’s Discourse Structure

Title Scientific Article Summarization Using Citation-Context and Article’s Discourse Structure
Authors Arman Cohan, Nazli Goharian
Abstract We propose a summarization approach for scientific articles which takes advantage of citation-context and the document discourse model. While citations have been previously used in generating scientific summaries, they lack the related context from the referenced article and therefore do not accurately reflect the article's content. Our method overcomes the problem of inconsistency between the citation summary and the article's content by providing context for each citation. We also leverage the inherent discourse structure of scientific articles to produce better summaries. We show that our proposed method effectively improves over existing summarization approaches (greater than 30% improvement over the best performing baseline) in terms of ROUGE scores on the TAC 2014 scientific summarization dataset. While the dataset we use for evaluation is in the biomedical domain, most of our approaches are general and therefore adaptable to other domains.
Tasks
Published 2017-04-21
URL http://arxiv.org/abs/1704.06619v1
PDF http://arxiv.org/pdf/1704.06619v1.pdf
PWC https://paperswithcode.com/paper/scientific-article-summarization-using
Repo
Framework
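
The core scoring idea described in this entry can be pictured with a small sketch: score each article sentence by its TF-IDF cosine similarity to the citation contexts that reference the paper, then pick the top sentences per discourse section. This is only an illustration of the general idea; the function names and the discourse grouping are assumptions, not the authors' implementation.

```python
# Illustrative sketch only: score article sentences by similarity to the
# citation contexts that reference them, then keep the top sentences per
# discourse section. Not the authors' implementation.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def summarize(article_sentences, citation_contexts, section_of, per_section=2):
    """article_sentences: list of str; citation_contexts: list of str;
    section_of: maps a sentence index to a discourse label (e.g. 'method')."""
    vec = TfidfVectorizer(stop_words="english")
    matrix = vec.fit_transform(article_sentences + citation_contexts)
    sent_vecs = matrix[: len(article_sentences)]
    ctx_vecs = matrix[len(article_sentences):]
    # Each sentence is scored by its best-matching citation context.
    scores = cosine_similarity(sent_vecs, ctx_vecs).max(axis=1)
    by_section = {}
    for i, s in enumerate(article_sentences):
        by_section.setdefault(section_of(i), []).append((scores[i], i, s))
    summary = []
    for sec, items in by_section.items():
        summary += [s for _, _, s in sorted(items, reverse=True)[:per_section]]
    return summary
```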

Less-forgetful Learning for Domain Expansion in Deep Neural Networks

Title Less-forgetful Learning for Domain Expansion in Deep Neural Networks
Authors Heechul Jung, Jeongwoo Ju, Minju Jung, Junmo Kim
Abstract Expanding the domain that a deep neural network has already learned without accessing old-domain data is challenging because deep neural networks forget previously learned information when trained on data from a new domain. In this paper, we propose a less-forgetful learning method for the domain expansion scenario. While existing domain adaptation techniques focus solely on adapting to new domains, the proposed technique aims to work well on both old and new domains without needing to know whether the input comes from the old or the new domain. First, we present two naive approaches and show why they are problematic; we then provide a new method based on two proposed properties for less-forgetful learning. Finally, we demonstrate the effectiveness of our method through experiments on image classification tasks. All datasets used in the paper will be released on our website for follow-up studies.
Tasks Domain Adaptation, Image Classification
Published 2017-11-16
URL http://arxiv.org/abs/1711.05959v1
PDF http://arxiv.org/pdf/1711.05959v1.pdf
PWC https://paperswithcode.com/paper/less-forgetful-learning-for-domain-expansion
Repo
Framework
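
A minimal PyTorch sketch of the kind of regularized objective the abstract hints at: reuse the old classifier to keep the decision boundary, and pull new-domain features toward the old network's features. This is a hedged reconstruction from the abstract, not the paper's exact formulation; the frozen-classifier choice and the weight `lambda_feat` are assumptions.

```python
# Hedged sketch (assumptions: frozen old classifier, L2 feature-preservation
# term weighted by lambda_feat); not the paper's exact loss.
import copy
import torch
import torch.nn.functional as F


def less_forgetful_step(new_net, old_net, classifier, x, y, optimizer,
                        lambda_feat=1e-3):
    old_net.eval()
    with torch.no_grad():
        old_feat = old_net(x)            # features from the frozen old network
    new_feat = new_net(x)
    logits = classifier(new_feat)        # classifier weights kept from the old domain
    loss = F.cross_entropy(logits, y) \
         + lambda_feat * (new_feat - old_feat).pow(2).sum(dim=1).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


# Typical setup: old_net = trained feature extractor; new_net = copy.deepcopy(old_net);
# classifier = the old softmax layer with requires_grad_(False) on its parameters.
```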

Compact Tensor Pooling for Visual Question Answering

Title Compact Tensor Pooling for Visual Question Answering
Authors Yang Shi, Tommaso Furlanello, Anima Anandkumar
Abstract Performing high-level cognitive tasks requires the integration of feature maps with drastically different structure. In Visual Question Answering (VQA), image descriptors have spatial structure, while lexical inputs inherently follow a temporal sequence. The recently proposed Multimodal Compact Bilinear pooling (MCB) forms the outer products, via count-sketch approximation, of the visual and textual representations at each spatial location. While this procedure preserves spatial information locally, outer products are taken independently for each fiber of the activation tensor and therefore do not include spatial context. In this work, we introduce the multi-dimensional sketch (MD-sketch), a novel extension of count-sketch to tensors. Using this new formulation, we propose Multimodal Compact Tensor Pooling (MCT) to fully exploit the global spatial context during bilinear pooling operations. In contrast to MCB, our approach preserves spatial context by directly convolving the MD-sketch from the visual tensor features with the text vector feature using a higher-order FFT. Furthermore, we apply MCT incrementally at each step of the question embedding and accumulate the multi-modal vectors with a second LSTM layer before the final answer is chosen.
Tasks Question Answering, Visual Question Answering
Published 2017-06-20
URL http://arxiv.org/abs/1706.06706v1
PDF http://arxiv.org/pdf/1706.06706v1.pdf
PWC https://paperswithcode.com/paper/compact-tensor-pooling-for-visual-question
Repo
Framework
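
The count-sketch/FFT machinery this entry builds on can be illustrated in a few lines of NumPy. The sketch below shows the standard MCB-style approximation for two vectors (count-sketch each one, then multiply in the Fourier domain, which corresponds to circular convolution); the tensor-valued MD-sketch extension proposed in the paper is not reproduced here.

```python
# Illustrative NumPy sketch of count-sketch + FFT bilinear pooling (MCB-style).
# The multi-dimensional (tensor) MD-sketch of the paper is not shown.
import numpy as np

rng = np.random.default_rng(0)


def count_sketch(x, d, h, s):
    """Project x into d dims using index hash h and sign hash s."""
    out = np.zeros(d)
    np.add.at(out, h, s * x)
    return out


def compact_bilinear(v, t, d=1024):
    # In practice the hash functions are sampled once and shared across all
    # examples; they are re-sampled here only to keep the sketch short.
    h_v = rng.integers(0, d, size=v.shape[0])
    s_v = rng.choice([-1.0, 1.0], size=v.shape[0])
    h_t = rng.integers(0, d, size=t.shape[0])
    s_t = rng.choice([-1.0, 1.0], size=t.shape[0])
    # Circular convolution of the two sketches == elementwise product in FFT space.
    fv = np.fft.rfft(count_sketch(v, d, h_v, s_v))
    ft = np.fft.rfft(count_sketch(t, d, h_t, s_t))
    return np.fft.irfft(fv * ft, n=d)


pooled = compact_bilinear(rng.normal(size=2048), rng.normal(size=300))
```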

DeepDeblur: Fast one-step blurry face images restoration

Title DeepDeblur: Fast one-step blurry face images restoration
Authors Lingxiao Wang, Yali Li, Shengjin Wang
Abstract We propose a very fast and effective one-step restoration method for blurry face images. Over the last decades, many blind deblurring algorithms have been proposed to restore latent sharp images. However, these algorithms run slowly because they involve two steps: kernel estimation followed by non-blind deconvolution or latent image estimation. They also cannot handle small face images. Our proposed method restores sharp face images directly in one step using a Convolutional Neural Network. Unlike previous deep-learning-based methods that can only handle a single blur kernel at a time, our network is trained on fully random and numerous training sample pairs to deal with the variance caused by different blur kernels in practice. A smoothness regularization as well as a facial regularization are added to preserve facial identity information, which is key to face image applications. Comprehensive experiments demonstrate that our proposed method can handle various blur kernels and achieves state-of-the-art results for restoring small blurry face images. Moreover, the proposed method significantly improves face recognition accuracy while running more than 100 times faster.
Tasks Deblurring, Face Recognition
Published 2017-11-27
URL http://arxiv.org/abs/1711.09515v1
PDF http://arxiv.org/pdf/1711.09515v1.pdf
PWC https://paperswithcode.com/paper/deepdeblur-fast-one-step-blurry-face-images
Repo
Framework
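
The composite objective suggested by this entry, pixel reconstruction plus a smoothness term plus a facial-identity term, could look roughly like the sketch below. `face_embed` stands in for any frozen face-recognition feature extractor, and the weights are illustrative, not taken from the paper.

```python
# Hedged sketch of a deblurring loss with smoothness and identity regularizers.
# `face_embed` is a placeholder for a frozen face-recognition network;
# the weights w_tv and w_id are illustrative, not from the paper.
import torch


def total_variation(img):
    # img: (N, C, H, W) -- mean absolute difference between neighbouring pixels
    dh = (img[:, :, 1:, :] - img[:, :, :-1, :]).abs().mean()
    dw = (img[:, :, :, 1:] - img[:, :, :, :-1]).abs().mean()
    return dh + dw


def deblur_loss(restored, sharp, face_embed, w_tv=1e-4, w_id=1e-2):
    recon = (restored - sharp).pow(2).mean()            # pixel reconstruction
    smooth = total_variation(restored)                  # smoothness regularizer
    identity = (face_embed(restored) - face_embed(sharp)).pow(2).mean()
    return recon + w_tv * smooth + w_id * identity
```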

Handling PDDL3.0 State Trajectory Constraints with Temporal Landmarks

Title Handling PDDL3.0 State Trajectory Constraints with Temporal Landmarks
Authors Eliseo Marzal, Mohannad Babli, Eva Onaindia, Laura Sebastia
Abstract Temporal landmarks have proved to be a helpful mechanism for dealing with temporal planning problems, specifically for improving planners' performance and handling problems with deadline constraints. In this paper, we show the strength of using temporal landmarks to handle the state trajectory constraints of PDDL3.0. We analyze the formalism of TempLM, a temporal planner particularly aimed at solving planning problems with deadlines, and we present a detailed study that exploits the underlying temporal landmark-based mechanism of TempLM for representing and reasoning with trajectory constraints.
Tasks
Published 2017-06-26
URL http://arxiv.org/abs/1706.08317v1
PDF http://arxiv.org/pdf/1706.08317v1.pdf
PWC https://paperswithcode.com/paper/handling-pddl30-state-trajectory-constraints
Repo
Framework
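
As a concrete reminder of what PDDL3.0 state-trajectory constraints mean, here is a tiny Python checker for two of the modalities (`always` and `sometime-before`) over a plan's state sequence. It illustrates the semantics only and has nothing to do with TempLM's landmark machinery.

```python
# Tiny illustration of PDDL3.0 trajectory-constraint semantics over a plan's
# state sequence (each state is a set of ground facts). Not related to TempLM.
def always(trajectory, fact):
    """(always fact): the fact must hold in every state."""
    return all(fact in state for state in trajectory)


def sometime_before(trajectory, fact, before_fact):
    """(sometime-before fact before_fact): if fact ever holds, before_fact
    must have held in some strictly earlier state."""
    seen_before = False
    for state in trajectory:
        if fact in state and not seen_before:
            return False
        if before_fact in state:
            seen_before = True
    return True


states = [{"at-home"}, {"have-ticket"}, {"at-airport", "have-ticket"}]
assert sometime_before(states, "at-airport", "have-ticket")
```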

UG^2: a Video Benchmark for Assessing the Impact of Image Restoration and Enhancement on Automatic Visual Recognition

Title UG^2: a Video Benchmark for Assessing the Impact of Image Restoration and Enhancement on Automatic Visual Recognition
Authors Rosaura G. Vidal, Sreya Banerjee, Klemen Grm, Vitomir Struc, Walter J. Scheirer
Abstract Advances in image restoration and enhancement techniques have led to discussion about how such algorithms can be applied as a pre-processing step to improve automatic visual recognition. In principle, techniques like deblurring and super-resolution should yield improvements by de-emphasizing noise and increasing signal in an input image. But the historically divergent goals of the computational photography and visual recognition communities have created a significant need for more work in this direction. To facilitate new research, we introduce a new benchmark dataset called UG^2, which contains three difficult real-world scenarios: uncontrolled videos taken by UAVs and manned gliders, as well as controlled videos taken on the ground. Over 160,000 annotated frames for hundreds of ImageNet classes are available, which are used for baseline experiments that assess the impact of known and unknown image artifacts and other conditions on common deep learning-based object classification approaches. Further, current image restoration and enhancement techniques are evaluated by determining whether or not they improve baseline classification performance. Results show that there is plenty of room for algorithmic innovation, making this dataset a useful tool going forward.
Tasks Deblurring, Image Restoration, Object Classification, Super-Resolution
Published 2017-10-09
URL http://arxiv.org/abs/1710.02909v2
PDF http://arxiv.org/pdf/1710.02909v2.pdf
PWC https://paperswithcode.com/paper/ug2-a-video-benchmark-for-assessing-the
Repo
Framework

Whale swarm algorithm for function optimization

Title Whale swarm algorithm for function optimization
Authors Bing Zeng, Liang Gao, Xinyu Li
Abstract A growing number of nature-inspired metaheuristic algorithms are being applied to real-world optimization problems, as they have advantages over classical numerical optimization methods. This paper proposes a new nature-inspired metaheuristic for function optimization, the Whale Swarm Algorithm, inspired by the way whales communicate with each other via ultrasound while hunting. The proposed Whale Swarm Algorithm is compared with several popular metaheuristic algorithms on comprehensive performance metrics. According to the experimental results, the Whale Swarm Algorithm performs quite competitively compared with the other algorithms.
Tasks
Published 2017-02-11
URL http://arxiv.org/abs/1702.03389v2
PDF http://arxiv.org/pdf/1702.03389v2.pdf
PWC https://paperswithcode.com/paper/whale-swarm-algorithm-for-function
Repo
Framework
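
A compact NumPy sketch in the spirit of the described behaviour: each whale moves toward its nearest better neighbour with a random step that decays with distance (the ultrasound attenuation analogy). Treat the update rule and constants as assumptions inferred from the abstract, not the paper's exact algorithm.

```python
# Sketch of a whale-swarm-style update (minimization). The attenuated
# attraction rule and constants are assumptions based on the abstract,
# not the paper's exact algorithm.
import numpy as np

rng = np.random.default_rng(1)


def whale_swarm(f, dim=2, n_whales=30, iters=200, rho0=2.0, eta=0.5):
    pos = rng.uniform(-5, 5, size=(n_whales, dim))
    for _ in range(iters):
        fit = np.array([f(p) for p in pos])
        for i in range(n_whales):
            better = np.where(fit < fit[i])[0]
            if better.size == 0:
                continue  # the current best whale stays put
            dists = np.linalg.norm(pos[better] - pos[i], axis=1)
            y = pos[better[np.argmin(dists)]]         # nearest better whale
            d = dists.min()
            step = rng.uniform(0, rho0 * np.exp(-eta * d), size=dim)
            pos[i] = pos[i] + step * (y - pos[i])     # attenuated attraction
    fit = np.array([f(p) for p in pos])
    return pos[np.argmin(fit)], fit.min()


best_x, best_val = whale_swarm(lambda x: np.sum(x ** 2))
```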

Scale-Aware Face Detection

Title Scale-Aware Face Detection
Authors Zekun Hao, Yu Liu, Hongwei Qin, Junjie Yan, Xiu Li, Xiaolin Hu
Abstract Convolutional neural network (CNN) based face detectors are inefficient in handling faces of diverse scales. They rely on either fitting a single large model to faces across a large scale range or multi-scale testing; both are computationally expensive. We propose the Scale-aware Face Detector (SAFD) to handle scale explicitly using a CNN, achieving better performance with less computational cost. Prior to detection, an efficient CNN predicts the scale distribution histogram of the faces. The scale histogram then guides the zoom-in and zoom-out of the image. Since the faces are approximately at a uniform scale after zooming, they can be detected accurately even with a much smaller CNN. In fact, more than 99% of the faces in AFW can be covered with fewer than two zooms per image. Extensive experiments on FDDB, MALF and AFW show the advantages of SAFD.
Tasks Face Detection
Published 2017-06-29
URL http://arxiv.org/abs/1706.09876v1
PDF http://arxiv.org/pdf/1706.09876v1.pdf
PWC https://paperswithcode.com/paper/scale-aware-face-detection
Repo
Framework
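
The zoom scheduling described in this entry can be pictured with a small helper that converts a predicted face-scale histogram into zoom factors so that the likely face sizes land near the scale a small detector handles best. The bin edges, threshold and target size below are invented for illustration.

```python
# Illustration of turning a predicted face-scale histogram into zoom factors,
# so detected faces fall near the scale a small detector handles best.
# Bin centres, the threshold and the target size are invented for this sketch.
import numpy as np


def zooms_from_histogram(hist, bin_centers_px, target_px=64, min_mass=0.1):
    """hist[i] = predicted probability that the image contains faces whose
    size falls in the bin centred at bin_centers_px[i]."""
    zooms = []
    for p, size in zip(hist, bin_centers_px):
        if p >= min_mass:            # only zoom for scales that are likely present
            zooms.append(target_px / size)
    return sorted(set(round(z, 2) for z in zooms))


hist = np.array([0.05, 0.6, 0.05, 0.3])
bin_centers = np.array([16, 32, 64, 128])
print(zooms_from_histogram(hist, bin_centers))   # [0.5, 2.0]
```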

Decoupled classifiers for fair and efficient machine learning

Title Decoupled classifiers for fair and efficient machine learning
Authors Cynthia Dwork, Nicole Immorlica, Adam Tauman Kalai, Max Leiserson
Abstract When it is ethical and legal to use a sensitive attribute (such as gender or race) in machine learning systems, the question remains how to do so. We show that the naive application of machine learning algorithms using sensitive features leads to an inherent tradeoff in accuracy between groups. We provide a simple and efficient decoupling technique that can be added on top of any black-box machine learning algorithm to learn different classifiers for different groups. Transfer learning is used to mitigate the problem of having too little data on any one group. The method applies to a range of fairness criteria. In particular, we require the application designer to specify a joint loss function that makes the trade-off between fairness and accuracy explicit. Our reduction is shown to efficiently find the minimum loss as long as the objective has a certain natural monotonicity property, which may be of independent interest in the study of fairness in algorithms.
Tasks Transfer Learning
Published 2017-07-20
URL http://arxiv.org/abs/1707.06613v1
PDF http://arxiv.org/pdf/1707.06613v1.pdf
PWC https://paperswithcode.com/paper/decoupled-classifiers-for-fair-and-efficient
Repo
Framework
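
The decoupling idea can be sketched in a few lines of scikit-learn: fit one classifier per value of the sensitive attribute and route each example to its group's model. The paper's joint fairness/accuracy objective and its transfer-learning step for small groups are omitted in this sketch.

```python
# Minimal sketch of decoupled per-group classifiers (scikit-learn).
# The paper's joint fairness/accuracy objective and transfer-learning step
# for data-poor groups are omitted here.
import numpy as np
from sklearn.linear_model import LogisticRegression


class DecoupledClassifier:
    def __init__(self, make_model=lambda: LogisticRegression(max_iter=1000)):
        self.make_model = make_model
        self.models = {}

    def fit(self, X, y, group):
        for g in np.unique(group):
            idx = group == g
            self.models[g] = self.make_model().fit(X[idx], y[idx])
        return self

    def predict(self, X, group):
        out = np.empty(len(X), dtype=int)
        for g, model in self.models.items():
            idx = group == g
            if idx.any():
                out[idx] = model.predict(X[idx])
        return out


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
group = rng.integers(0, 2, size=200)
y = (X[:, 0] + 0.5 * group + rng.normal(scale=0.1, size=200) > 0).astype(int)
preds = DecoupledClassifier().fit(X, y, group).predict(X, group)
```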

Analysis of Convolutional Neural Networks for Document Image Classification

Title Analysis of Convolutional Neural Networks for Document Image Classification
Authors Chris Tensmeyer, Tony Martinez
Abstract Convolutional Neural Networks (CNNs) are state-of-the-art models for document image classification tasks. However, many of these approaches rely on parameters and architectures designed for classifying natural images, which differ from document images. We question whether this is appropriate and conduct a large empirical study to find what aspects of CNNs most affect performance on document images. Among other results, we exceed the state-of-the-art on the RVL-CDIP dataset by using shear transform data augmentation and an architecture designed for a larger input image. Additionally, we analyze the learned features and find evidence that CNNs trained on RVL-CDIP learn region-specific layout features.
Tasks Data Augmentation, Document Image Classification, Image Classification
Published 2017-08-10
URL http://arxiv.org/abs/1708.03273v1
PDF http://arxiv.org/pdf/1708.03273v1.pdf
PWC https://paperswithcode.com/paper/analysis-of-convolutional-neural-networks-for
Repo
Framework
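
As a concrete picture of the shear-transform augmentation mentioned in this entry, here is a small Pillow helper; the shear range and the file name are illustrative choices, not the paper's settings.

```python
# Illustrative shear augmentation for document images using Pillow.
# The +/-0.1 shear range is an example choice, not the paper's setting.
import random
from PIL import Image


def random_shear(img, max_shear=0.1, fill=255):
    m = random.uniform(-max_shear, max_shear)
    # Affine map (x, y) -> (x + m*y, y); documents get a white background fill.
    return img.transform(img.size, Image.AFFINE, (1, m, 0, 0, 1, 0),
                         resample=Image.BILINEAR, fillcolor=fill)


page = Image.open("page.png").convert("L")   # "page.png" is a placeholder path
augmented = random_shear(page)
```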

Dynamic Integration of Background Knowledge in Neural NLU Systems

Title Dynamic Integration of Background Knowledge in Neural NLU Systems
Authors Dirk Weissenborn, Tomáš Kočiský, Chris Dyer
Abstract Common-sense and background knowledge is required to understand natural language, but in most neural natural language understanding (NLU) systems, this knowledge must be acquired from training corpora during learning, and then it is static at test time. We introduce a new architecture for the dynamic integration of explicit background knowledge in NLU models. A general-purpose reading module reads background knowledge in the form of free-text statements (together with task-specific text inputs) and yields refined word representations to a task-specific NLU architecture that reprocesses the task inputs with these representations. Experiments on document question answering (DQA) and recognizing textual entailment (RTE) demonstrate the effectiveness and flexibility of the approach. Analysis shows that our model learns to exploit knowledge in a semantically appropriate way.
Tasks Common Sense Reasoning, Natural Language Inference, Question Answering
Published 2017-06-08
URL http://arxiv.org/abs/1706.02596v3
PDF http://arxiv.org/pdf/1706.02596v3.pdf
PWC https://paperswithcode.com/paper/dynamic-integration-of-background-knowledge
Repo
Framework
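
A rough NumPy caricature of the reading-module idea in this entry: attend from each task-input word over encoded knowledge statements and fold the retrieved vector back into the word representation. The additive gate and the dimensions are assumptions, not the paper's architecture.

```python
# Rough sketch (NumPy) of refining word representations with attention over
# encoded background-knowledge statements. The additive gate is an assumption,
# not the paper's architecture.
import numpy as np


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def refine(word_reps, knowledge_reps, gate=0.5):
    """word_reps: (T, d) task-input word vectors;
    knowledge_reps: (K, d) encoded knowledge statements."""
    attn = softmax(word_reps @ knowledge_reps.T)   # (T, K) attention weights
    retrieved = attn @ knowledge_reps              # (T, d) retrieved knowledge
    return word_reps + gate * retrieved            # refined representations


rng = np.random.default_rng(0)
refined = refine(rng.normal(size=(12, 64)), rng.normal(size=(5, 64)))
```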

Convolutional Gated Recurrent Neural Network Incorporating Spatial Features for Audio Tagging

Title Convolutional Gated Recurrent Neural Network Incorporating Spatial Features for Audio Tagging
Authors Yong Xu, Qiuqiang Kong, Qiang Huang, Wenwu Wang, Mark D. Plumbley
Abstract Environmental audio tagging is a newly proposed task to predict the presence or absence of a specific audio event in an audio chunk. Deep neural network (DNN) based methods have been successfully adopted for predicting audio tags in the domestic audio scene. In this paper, we propose to use a convolutional neural network (CNN) to extract robust features from mel-filter banks (MFBs), spectrograms or even raw waveforms for audio tagging. Gated recurrent unit (GRU) based recurrent neural networks (RNNs) are then cascaded to model the long-term temporal structure of the audio signal. To complement the input information, an auxiliary CNN is designed to learn spatial features from the stereo recordings. We evaluate our proposed methods on Task 4 (audio tagging) of the Detection and Classification of Acoustic Scenes and Events 2016 (DCASE 2016) challenge. Compared with our recent DNN-based method, the proposed structure reduces the equal error rate (EER) from 0.13 to 0.11 on the development set. The spatial features further reduce the EER to 0.10. The performance of end-to-end learning on raw waveforms is also comparable. Finally, on the evaluation set, we achieve state-of-the-art performance with an EER of 0.12, while the best existing system obtains 0.15.
Tasks Audio Tagging
Published 2017-02-24
URL http://arxiv.org/abs/1702.07787v1
PDF http://arxiv.org/pdf/1702.07787v1.pdf
PWC https://paperswithcode.com/paper/convolutional-gated-recurrent-neural-network
Repo
Framework
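
A compact PyTorch sketch of the CNN-then-GRU pipeline this entry describes, applied to mel-filter-bank input and producing multi-label tag probabilities. Layer sizes are illustrative, and the auxiliary spatial-feature CNN for stereo recordings is omitted.

```python
# Compact PyTorch sketch of a CNN + GRU audio tagger over mel-filter-bank
# features. Layer sizes are illustrative; the auxiliary spatial-feature CNN
# for stereo recordings is omitted.
import torch
import torch.nn as nn


class CRNNTagger(nn.Module):
    def __init__(self, n_mels=64, n_tags=7, hidden=128):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d((1, 2)),                 # pool over the mel axis only
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d((1, 2)),
        )
        self.gru = nn.GRU(64 * (n_mels // 4), hidden, batch_first=True,
                          bidirectional=True)
        self.head = nn.Linear(2 * hidden, n_tags)

    def forward(self, x):                 # x: (batch, time, n_mels)
        h = self.cnn(x.unsqueeze(1))      # (batch, 64, time, n_mels // 4)
        h = h.permute(0, 2, 1, 3).flatten(2)              # (batch, time, features)
        out, _ = self.gru(h)
        return torch.sigmoid(self.head(out.mean(dim=1)))  # tag probabilities


probs = CRNNTagger()(torch.randn(4, 240, 64))             # (4, 7)
```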

Uncertainty Estimates for Efficient Neural Network-based Dialogue Policy Optimisation

Title Uncertainty Estimates for Efficient Neural Network-based Dialogue Policy Optimisation
Authors Christopher Tegho, Paweł Budzianowski, Milica Gašić
Abstract In statistical dialogue management, the dialogue manager learns a policy that maps a belief state to an action for the system to perform. Efficient exploration is key to successful policy optimisation. Current deep reinforcement learning methods are very promising but rely on epsilon-greedy exploration, thus subjecting the user to a random choice of action during learning. Alternative approaches such as Gaussian Process SARSA (GPSARSA) estimate uncertainties and are sample-efficient, leading to a better user experience, but at the expense of greater computational complexity. This paper examines approaches to extracting uncertainty estimates from deep Q-networks (DQN) in the context of dialogue management. We perform an extensive benchmark of deep Bayesian methods for extracting uncertainty estimates, namely Bayes-by-Backprop, dropout, its concrete variation, bootstrapped ensembles and alpha-divergences, combining them with the DQN algorithm.
Tasks Dialogue Management, Efficient Exploration
Published 2017-11-30
URL http://arxiv.org/abs/1711.11486v1
PDF http://arxiv.org/pdf/1711.11486v1.pdf
PWC https://paperswithcode.com/paper/uncertainty-estimates-for-efficient-neural
Repo
Framework
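
One of the benchmarked families of estimators, Monte-Carlo dropout, is easy to picture: keep dropout active at decision time, run several stochastic forward passes through the Q-network, and read off a mean and spread per action. The network below and the optimistic action rule are illustrative assumptions, not the paper's setup.

```python
# Sketch of Monte-Carlo-dropout uncertainty for a Q-network (one of the
# families of estimators benchmarked in the paper). Architecture and the
# exploration rule below are illustrative assumptions.
import torch
import torch.nn as nn


class DropoutQNet(nn.Module):
    def __init__(self, n_belief=20, n_actions=10, hidden=128, p=0.2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_belief, hidden), nn.ReLU(), nn.Dropout(p),
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Dropout(p),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, b):
        return self.net(b)


def q_with_uncertainty(qnet, belief, n_samples=30):
    qnet.train()                      # keep dropout active at decision time
    with torch.no_grad():
        samples = torch.stack([qnet(belief) for _ in range(n_samples)])
    return samples.mean(0), samples.std(0)


qnet = DropoutQNet()
mean_q, std_q = q_with_uncertainty(qnet, torch.randn(1, 20))
action = (mean_q + std_q).argmax(dim=-1)   # optimistic, uncertainty-aware choice
```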

Nearly Optimal Sampling Algorithms for Combinatorial Pure Exploration

Title Nearly Optimal Sampling Algorithms for Combinatorial Pure Exploration
Authors Lijie Chen, Anupam Gupta, Jian Li, Mingda Qiao, Ruosong Wang
Abstract We study the combinatorial pure exploration problem Best-Set in stochastic multi-armed bandits. In a Best-Set instance, we are given $n$ arms with unknown reward distributions, as well as a family $\mathcal{F}$ of feasible subsets over the arms. Our goal is to identify the feasible subset in $\mathcal{F}$ with the maximum total mean using as few samples as possible. The problem generalizes the classical best arm identification problem and the top-$k$ arm identification problem, both of which have attracted significant attention in recent years. We provide a novel instance-wise lower bound for the sample complexity of the problem, as well as a nontrivial sampling algorithm, matching the lower bound up to a factor of $\ln|\mathcal{F}|$. For an important class of combinatorial families, we also provide a polynomial-time implementation of the sampling algorithm, using the equivalence of separation and optimization for convex programs, and approximate Pareto curves in multi-objective optimization. We also show that the $\ln|\mathcal{F}|$ factor is inevitable in general through a nontrivial lower bound construction. Our results significantly improve on previous results for several important combinatorial constraints, and provide a tighter understanding of the general Best-Set problem. We further introduce an even more general problem, formulated in geometric terms. We are given $n$ Gaussian arms with unknown means and unit variance. Consider the $n$-dimensional Euclidean space $\mathbb{R}^n$, and a collection $\mathcal{O}$ of disjoint subsets. Our goal is to determine the subset in $\mathcal{O}$ that contains the $n$-dimensional vector of the means. The problem generalizes most pure exploration bandit problems studied in the literature. We provide the first nearly optimal sample complexity upper and lower bounds for the problem.
Tasks Multi-Armed Bandits
Published 2017-06-04
URL http://arxiv.org/abs/1706.01081v1
PDF http://arxiv.org/pdf/1706.01081v1.pdf
PWC https://paperswithcode.com/paper/nearly-optimal-sampling-algorithms-for
Repo
Framework
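
For intuition about the Best-Set setting, the sketch below implements only the naive baseline: sample every arm the same number of times and return the feasible set with the largest empirical mean sum. The paper's contribution is precisely the instance-adaptive sampling that improves on this.

```python
# Naive uniform-sampling baseline for Best-Set (for intuition only);
# the paper's algorithms sample adaptively and are far more efficient.
import numpy as np

rng = np.random.default_rng(0)


def naive_best_set(true_means, feasible_sets, samples_per_arm=500):
    n = len(true_means)
    # Gaussian arms with unit variance, pulled uniformly.
    pulls = rng.normal(loc=true_means, scale=1.0, size=(samples_per_arm, n))
    empirical = pulls.mean(axis=0)
    return max(feasible_sets, key=lambda S: sum(empirical[i] for i in S))


true_means = np.array([0.1, 0.9, 0.5, 0.8])
feasible = [frozenset({0, 1}), frozenset({1, 3}), frozenset({2, 3})]
print(naive_best_set(true_means, feasible))   # likely frozenset({1, 3})
```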

Supervising Neural Attention Models for Video Captioning by Human Gaze Data

Title Supervising Neural Attention Models for Video Captioning by Human Gaze Data
Authors Youngjae Yu, Jongwook Choi, Yeonhwa Kim, Kyung Yoo, Sang-Hun Lee, Gunhee Kim
Abstract The attention mechanisms in deep neural networks are inspired by human attention, which sequentially focuses on the most relevant parts of the information over time to generate prediction output. The attention parameters in those models are implicitly trained in an end-to-end manner, yet there have been few attempts to explicitly incorporate human gaze tracking to supervise the attention models. In this paper, we investigate whether attention models can benefit from explicit human gaze labels, especially for the task of video captioning. We collect a new dataset called VAS, consisting of movie clips and corresponding multiple descriptive sentences along with human gaze tracking data. We propose a video captioning model named Gaze Encoding Attention Network (GEAN) that can leverage gaze tracking information to provide spatial and temporal attention for sentence generation. Through evaluation with language similarity metrics and human assessment via Amazon Mechanical Turk, we demonstrate that spatial attention guided by human gaze data indeed improves the performance of multiple captioning methods. Moreover, we show that the proposed approach achieves state-of-the-art performance for both gaze prediction and video captioning not only on our VAS dataset but also on standard datasets (e.g. LSMDC and Hollywood2).
Tasks Gaze Prediction, Video Captioning
Published 2017-07-19
URL http://arxiv.org/abs/1707.06029v1
PDF http://arxiv.org/pdf/1707.06029v1.pdf
PWC https://paperswithcode.com/paper/supervising-neural-attention-models-for-video
Repo
Framework
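
The core supervision signal can be illustrated independently of the GEAN architecture: add a divergence term that pulls the model's spatial attention toward the recorded gaze distribution at each timestep. The KL weighting and the shapes here are assumptions, not the GEAN specifics.

```python
# Sketch of supervising spatial attention with human gaze: a KL term pulls
# the model's attention map toward the gaze distribution. Weighting and
# shapes are assumptions, not the GEAN specifics.
import torch
import torch.nn.functional as F


def gaze_supervised_loss(caption_loss, attn_logits, gaze_maps, w_gaze=0.1):
    """attn_logits: (T, H*W) unnormalised attention scores per timestep;
    gaze_maps:   (T, H*W) gaze fixation maps, normalised to sum to 1."""
    log_attn = F.log_softmax(attn_logits, dim=-1)
    kl = F.kl_div(log_attn, gaze_maps, reduction="batchmean")
    return caption_loss + w_gaze * kl


loss = gaze_supervised_loss(torch.tensor(2.3),
                            torch.randn(8, 49),
                            torch.softmax(torch.randn(8, 49), dim=-1))
```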