January 31, 2020


Paper Group AWR 382


Reverse-Engineering Satire, or “Paper on Computational Humor Accepted Despite Making Serious Advances”


Title	Reverse-Engineering Satire, or “Paper on Computational Humor Accepted Despite Making Serious Advances”
Authors	Robert West, Eric Horvitz
Abstract	Humor is an essential human trait. Efforts to understand humor have called out links between humor and the foundations of cognition, as well as the importance of humor in social engagement. As such, it is a promising and important subject of study, with relevance for artificial intelligence and human-computer interaction. Previous computational work on humor has mostly operated at a coarse level of granularity, e.g., predicting whether an entire sentence, paragraph, document, etc., is humorous. As a step toward deep understanding of humor, we seek fine-grained models of attributes that make a given text humorous. Starting from the observation that satirical news headlines tend to resemble serious news headlines, we build and analyze a corpus of satirical headlines paired with nearly identical but serious headlines. The corpus is constructed via Unfun.me, an online game that incentivizes players to make minimal edits to satirical headlines with the goal of making other players believe the results are serious headlines. The edit operations used to successfully remove humor pinpoint the words and concepts that play a key role in making the original, satirical headline funny. Our analysis reveals that the humor tends to reside toward the end of headlines, and primarily in noun phrases, and that most satirical headlines follow a certain logical pattern, which we term false analogy. Overall, this paper deepens our understanding of the syntactic and semantic structure of satirical news headlines and provides insights for building humor-producing systems.
Tasks	Humor Detection
Published	2019-01-10
URL	https://arxiv.org/abs/1901.03253v3
PDF	https://arxiv.org/pdf/1901.03253v3.pdf
PWC	https://paperswithcode.com/paper/reverse-engineering-satire-or-paper-on
Repo	https://github.com/epfl-dlab/unfun
Framework	none
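The abstract's point that minimal edits pinpoint where the humor sits can be illustrated with a token-level diff. This is an illustrative sketch, not the authors' analysis code. It uses Python's difflib to find which tokens of a satirical headline were replaced or deleted, and it reports their relative positions in the headline. The function name and the headline pair are invented for this example.

```python
import difflib

def edited_positions(satirical: str, serious: str) -> list[float]:
    """Relative positions (0 = first token, 1 = last token) of satirical-headline
    tokens that were replaced or deleted to obtain the serious headline."""
    a = satirical.lower().split()
    b = serious.lower().split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    positions = []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            for i in range(i1, i2):
                # Normalize by headline length so short and long headlines are comparable.
                positions.append(i / max(len(a) - 1, 1))
    return positions

# Hypothetical pair written for this example (not taken from the Unfun.me corpus).
sat = "Study finds most Americans would rather die than use a calendar"
ser = "Study finds most Americans would rather walk than drive"
print(edited_positions(sat, ser))
```

Aggregating such positions over many pairs is one simple way to check whether edits cluster toward the end of headlines.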

Learning Dynamics of Attention: Human Prior for Interpretable Machine Reasoning


Title	Learning Dynamics of Attention: Human Prior for Interpretable Machine Reasoning
Authors	Wonjae Kim, Yoonho Lee
Abstract	Without relevant human priors, neural networks may learn uninterpretable features. We propose Dynamics of Attention for Focus Transition (DAFT) as a human prior for machine reasoning. DAFT is a novel method that regularizes attention-based reasoning by modelling it as a continuous dynamical system using neural ordinary differential equations. As a proof of concept, we augment a state-of-the-art visual reasoning model with DAFT. Our experiments reveal that applying DAFT yields similar performance to the original model while using fewer reasoning steps, showing that it implicitly learns to skip unnecessary steps. We also propose a new metric, Total Length of Transition (TLT), which represents the effective reasoning step size by quantifying how much a given model’s focus drifts while reasoning about a question. We show that adding DAFT results in lower TLT, demonstrating that our method indeed obeys the human prior towards shorter reasoning paths in addition to producing more interpretable attention maps. Our code is available at https://github.com/kakao/DAFT.
Tasks	Visual Reasoning
Published	2019-05-28
URL	https://arxiv.org/abs/1905.11666v3
PDF	https://arxiv.org/pdf/1905.11666v3.pdf
PWC	https://paperswithcode.com/paper/learning-dynamics-of-attention-human-prior
Repo	https://github.com/kakao/DAFT
Framework	pytorch
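As a rough illustration of the TLT metric, the sketch below treats the attention maps from successive reasoning steps as points and sums the distances between consecutive ones. The discrete form and the choice of L1 distance are our assumptions. DAFT models attention in continuous time, so consult the paper or the kakao/DAFT repository for the exact definition.

```python
import numpy as np

def total_length_of_transition(attn: np.ndarray) -> float:
    """attn has shape (steps, regions); each row is one step's attention distribution.
    Assumed discrete TLT: sum of L1 distances between consecutive attention maps."""
    steps = np.diff(attn, axis=0)                  # change in focus between consecutive steps
    return float(np.abs(steps).sum(axis=1).sum())  # length of each step, summed along the path

# A focus that jumps back and forth travels farther than one that drifts once.
jumpy = np.array([[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 1]], dtype=float)
steady = np.array([[1, 0, 0], [1, 0, 0], [0.5, 0.5, 0], [0, 1, 0]], dtype=float)
print(total_length_of_transition(jumpy), total_length_of_transition(steady))
```

A lower value means the model's focus moved less in total, which is the sense in which the abstract links lower TLT to shorter reasoning paths.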

A Study on Wrist Identification for Forensic Investigation


Title	A Study on Wrist Identification for Forensic Investigation
Authors	Wojciech Michal Matkowski, Frodo Kin Sun Chan, Adams Wai Kin Kong
Abstract	Criminal and victim identification based on crime scene images is an important part of forensic investigation. Criminals usually avoid identification by covering their faces and tattoos in the evidence images, which are taken in uncontrolled environments. Existing identification methods, which make use of biometric traits such as vein, skin mark, height, skin color, weight, and race, have been considered for solving this problem. The soft biometric traits, including skin color, gender, height, weight and race, provide useful information but are not distinctive enough. Veins and skin marks are limited to high-resolution images, and some body sites may have neither enough skin marks nor clear veins. Terrorists and rioters tend to expose their wrists in a gesture of triumph, greeting or salute, while paedophiles usually show them when touching victims. However, wrists have been neglected by the biometric community for forensic applications. In this paper, a wrist identification algorithm, which includes skin segmentation, key point localization, image to template alignment, large feature set extraction, and classification, is proposed. The proposed algorithm is evaluated on NTU-Wrist-Image-Database-v1, which consists of 3945 images from 731 different wrists, including 205 pairs of wrist images collected from the Internet, taken under uneven illuminations with different poses and resolutions. The experimental results show that the wrist is a useful clue for criminal and victim identification. Keywords: biometrics, criminal and victim identification, forensics, wrist.
Tasks
Published	2019-10-08
URL	https://arxiv.org/abs/1910.03213v1
PDF	https://arxiv.org/pdf/1910.03213v1.pdf
PWC	https://paperswithcode.com/paper/a-study-on-wrist-identification-for-forensic
Repo	https://github.com/BFLTeam/NTU_Dataset
Framework	none

Overview and Results: CL-SciSumm Shared Task 2019


Title	Overview and Results: CL-SciSumm Shared Task 2019
Authors	Muthu Kumar Chandrasekaran, Michihiro Yasunaga, Dragomir Radev, Dayne Freitag, Min-Yen Kan
Abstract	The CL-SciSumm Shared Task is the first medium-scale shared task on scientific document summarization in the computational linguistics (CL) domain. In 2019, it comprised three tasks: (1A) identifying relationships between citing documents and the referred document, (1B) classifying the discourse facets, and (2) generating the abstractive summary. The dataset comprised 40 annotated sets of citing and reference papers of the CL-SciSumm 2018 corpus and 1000 more from the SciSummNet dataset. All papers are from the open access research papers in the CL domain. This overview describes the participation and the official results of the CL-SciSumm 2019 Shared Task, organized as a part of the 42nd Annual Conference of the Special Interest Group in Information Retrieval (SIGIR), held in Paris, France in July 2019. We compare the participating systems in terms of two evaluation metrics and discuss the use of ROUGE as an evaluation metric. The annotated dataset used for this shared task and the scripts used for evaluation can be accessed and used by the community at: https://github.com/WING-NUS/scisumm-corpus.
Tasks	Document Summarization, Information Retrieval
Published	2019-07-23
URL	https://arxiv.org/abs/1907.09854v1
PDF	https://arxiv.org/pdf/1907.09854v1.pdf
PWC	https://paperswithcode.com/paper/overview-and-results-cl-scisumm-shared-task
Repo	https://github.com/WING-NUS/scisumm-corpus
Framework	none
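Since the overview discusses ROUGE as an evaluation metric, here is a minimal sketch of ROUGE-N recall: the number of overlapping n-grams divided by the number of n-grams in the reference. It is a simplification for illustration. It assumes lowercase whitespace tokenization and omits the stemming, stopword handling, and F-score variants offered by the official ROUGE scripts, so its numbers will not match the shared task's official results.

```python
from collections import Counter

def ngrams(tokens: list[str], n: int) -> Counter:
    """Multiset of n-grams in a token list (empty if the list is shorter than n)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n_recall(candidate: str, reference: str, n: int = 2) -> float:
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    if not ref:
        return 0.0
    overlap = sum((cand & ref).values())  # clipped counts: min of the two frequencies
    return overlap / sum(ref.values())

print(rouge_n_recall("the cat lay on the mat", "the cat sat on the mat", n=2))
```

Because it only counts surface n-gram matches, a paraphrased but faithful summary can score poorly, which is one reason its use for abstractive summaries is debated.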

Learning Task-Specific Generalized Convolutions in the Permutohedral Lattice


Title	Learning Task-Specific Generalized Convolutions in the Permutohedral Lattice
Authors	Anne S. Wannenwetsch, Martin Kiefel, Peter V. Gehler, Stefan Roth
Abstract	Dense prediction tasks typically employ encoder-decoder architectures, but the prevalent convolutions in the decoder are not image-adaptive and can lead to boundary artifacts. Different generalized convolution operations have been introduced to counteract this. We go beyond these by leveraging guidance data to redefine their inherent notion of proximity. Our proposed network layer builds on the permutohedral lattice, which performs sparse convolutions in a high-dimensional space allowing for powerful non-local operations despite small filters. Multiple features with different characteristics span this permutohedral space. In contrast to prior work, we learn these features in a task-specific manner by generalizing the basic permutohedral operations to learnt feature representations. As the resulting objective is complex, a carefully designed framework and learning procedure are introduced, yielding rich feature embeddings in practice. We demonstrate the general applicability of our approach in different joint upsampling tasks. When adding our network layer to state-of-the-art networks for optical flow and semantic segmentation, boundary artifacts are removed and the accuracy is improved.
Tasks	Optical Flow Estimation, Semantic Segmentation
Published	2019-09-09
URL	https://arxiv.org/abs/1909.03677v1
PDF	https://arxiv.org/pdf/1909.03677v1.pdf
PWC	https://paperswithcode.com/paper/learning-task-specific-generalized
Repo	https://github.com/visinf/semantic_lattice
Framework	mxnet

Luck Matters: Understanding Training Dynamics of Deep ReLU Networks


Title	Luck Matters: Understanding Training Dynamics of Deep ReLU Networks
Authors	Yuandong Tian, Tina Jiang, Qucheng Gong, Ari Morcos
Abstract	We analyze the dynamics of training deep ReLU networks and their implications for generalization capability. Using a teacher-student setting, we discovered a novel relationship between the gradient received by hidden student nodes and the activations of teacher nodes for deep ReLU networks. With this relationship and the assumption of small overlapping teacher node activations, we prove that (1) student nodes whose weights are initialized to be close to teacher nodes converge to them at a faster rate, and (2) in the over-parameterized regime and the 2-layer case, while a small set of lucky nodes do converge to the teacher nodes, the fan-out weights of other nodes converge to zero. This framework provides insight into multiple puzzling phenomena in deep learning like over-parameterization, implicit regularization, lottery tickets, etc. We verify our assumption by showing that the majority of BatchNorm biases of pre-trained VGG11/16 models are negative. Experiments on (1) random deep teacher networks with Gaussian inputs, (2) teacher network pre-trained on CIFAR-10 and (3) extensive ablation studies validate our multiple theoretical predictions.
Tasks
Published	2019-05-31
URL	https://arxiv.org/abs/1905.13405v4
PDF	https://arxiv.org/pdf/1905.13405v4.pdf
PWC	https://paperswithcode.com/paper/luck-matters-understanding-training-dynamics
Repo	https://github.com/facebookresearch/luckmatters
Framework	pytorch

Convergence Theory of Learning Over-parameterized ResNet: A Full Characterization


Title	Convergence Theory of Learning Over-parameterized ResNet: A Full Characterization
Authors	Huishuai Zhang, Da Yu, Mingyang Yi, Wei Chen, Tie-Yan Liu
Abstract	ResNet structure has achieved great empirical success since its debut. Recent work established the convergence of learning over-parameterized ResNet with a scaling factor $\tau=1/L$ on the residual branch where $L$ is the network depth. However, it is not clear how learning ResNet behaves for other values of $\tau$. In this paper, we fully characterize the convergence theory of gradient descent for learning over-parameterized ResNet with different values of $\tau$. Specifically, hiding logarithmic factors and constant coefficients, we show that for $\tau\le 1/\sqrt{L}$ gradient descent is guaranteed to converge to the global minima, and especially when $\tau\le 1/L$ the convergence is independent of the network depth. Conversely, we show that for $\tau>L^{-\frac{1}{2}+c}$, the forward output grows at least with rate $L^c$ in expectation and then the learning fails because of gradient explosion for large $L$. This means the bound $\tau\le 1/\sqrt{L}$ is sharp for learning ResNet with arbitrary depth. To the best of our knowledge, this is the first work that studies learning ResNet with the full range of $\tau$.
Tasks
Published	2019-03-17
URL	https://arxiv.org/abs/1903.07120v4
PDF	https://arxiv.org/pdf/1903.07120v4.pdf
PWC	https://paperswithcode.com/paper/training-over-parameterized-deep-resnet-is
Repo	https://github.com/dayu11/tau-ResNet
Framework	pytorch
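The role of the scaling factor $\tau$ can be seen in a toy forward pass. The sketch assumes a simplified residual network $h_l = h_{l-1} + \tau W_l \,\mathrm{ReLU}(h_{l-1})$ with fresh Gaussian weights in every block, which is not the exact architecture analyzed in the paper, and the width and seed are arbitrary. Under this assumption each block multiplies the expected squared norm by roughly $1 + \tau^2$, so the output norm grows like $2^{L/2}$ for $\tau = 1$ but stays bounded for $\tau = 1/\sqrt{L}$ and $\tau = 1/L$.

```python
import numpy as np

def forward_norm(tau: float, depth: int, width: int = 256, seed: int = 0) -> float:
    """Norm of h_L for the toy ResNet h_l = h_{l-1} + tau * W_l relu(h_{l-1})."""
    rng = np.random.default_rng(seed)
    h = rng.standard_normal(width)
    h /= np.linalg.norm(h)  # start from a unit-norm input
    for _ in range(depth):
        # He-style scaling: E||W relu(h)||^2 is about ||h||^2 when h is roughly symmetric.
        w = rng.standard_normal((width, width)) * np.sqrt(2.0 / width)
        h = h + tau * (w @ np.maximum(h, 0.0))
    return float(np.linalg.norm(h))

L = 64
for name, tau in [("1", 1.0), ("1/sqrt(L)", L ** -0.5), ("1/L", 1.0 / L)]:
    print(f"tau = {name:>9}: ||h_L|| = {forward_norm(tau, L):.3g}")
```

The explosion for $\tau = 1$ is the forward-pass counterpart of the gradient explosion the abstract describes for $\tau$ above the $1/\sqrt{L}$ threshold.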

Semantic Relation Classification via Bidirectional LSTM Networks with Entity-aware Attention using Latent Entity Typing


Title	Semantic Relation Classification via Bidirectional LSTM Networks with Entity-aware Attention using Latent Entity Typing
Authors	Joohong Lee, Sangwoo Seo, Yong Suk Choi
Abstract	Classifying semantic relations between entity pairs in sentences is an important task in Natural Language Processing (NLP). Most previous models for relation classification rely on the high-level lexical and syntactic features obtained by NLP tools such as WordNet, dependency parser, part-of-speech (POS) tagger, and named entity recognizers (NER). In addition, state-of-the-art neural models based on attention mechanisms do not fully utilize entity information, which may be the most crucial feature for relation classification. To address these issues, we propose a novel end-to-end recurrent neural model which incorporates an entity-aware attention mechanism with a latent entity typing (LET) method. Our model not only utilizes entities and their latent types as features effectively but also is more interpretable by visualizing attention mechanisms applied to our model and results of LET. Experimental results on the SemEval-2010 Task 8, one of the most popular relation classification tasks, demonstrate that our model outperforms existing state-of-the-art models without any high-level features.
Tasks	Entity Typing, Relation Classification, Relation Extraction
Published	2019-01-23
URL	http://arxiv.org/abs/1901.08163v1
PDF	http://arxiv.org/pdf/1901.08163v1.pdf
PWC	https://paperswithcode.com/paper/semantic-relation-classification-via-1
Repo	https://github.com/maganaluis/EARC_Estimator
Framework	tf

Unlabeled Data Improves Adversarial Robustness


Title	Unlabeled Data Improves Adversarial Robustness
Authors	Yair Carmon, Aditi Raghunathan, Ludwig Schmidt, Percy Liang, John C. Duchi
Abstract	We demonstrate, theoretically and empirically, that adversarial robustness can significantly benefit from semisupervised learning. Theoretically, we revisit the simple Gaussian model of Schmidt et al. that shows a sample complexity gap between standard and robust classification. We prove that unlabeled data bridges this gap: a simple semisupervised learning procedure (self-training) achieves high robust accuracy using the same number of labels required for achieving high standard accuracy. Empirically, we augment CIFAR-10 with 500K unlabeled images sourced from 80 Million Tiny Images and use robust self-training to outperform state-of-the-art robust accuracies by over 5 points in (i) $\ell_\infty$ robustness against several strong attacks via adversarial training and (ii) certified $\ell_2$ and $\ell_\infty$ robustness via randomized smoothing. On SVHN, adding the dataset’s own extra training set with the labels removed provides gains of 4 to 10 points, within 1 point of the gain from using the extra labels.
Tasks
Published	2019-05-31
URL	https://arxiv.org/abs/1905.13736v3
PDF	https://arxiv.org/pdf/1905.13736v3.pdf
PWC	https://paperswithcode.com/paper/unlabeled-data-improves-adversarial
Repo	https://github.com/yaircarmon/semisup-adv
Framework	pytorch
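The Gaussian model in the abstract makes self-training easy to sketch. Assuming $x \sim \mathcal{N}(y\mu, I)$ with labels $y \in \{-1, +1\}$, the code estimates the direction of $\mu$ from a handful of labels, pseudo-labels a large unlabeled pool with that estimate, and re-estimates from the pseudo-labels. This only illustrates standard self-training. The paper's robust variant additionally trains adversarially (or with randomized smoothing) on the pseudo-labeled data, and the dimensions and sample sizes here are arbitrary choices.

```python
import numpy as np

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

rng = np.random.default_rng(0)
d, n_labeled, n_unlabeled = 100, 10, 5000
mu = np.full(d, 0.3)  # assumed class mean; x ~ N(y * mu, I)

def sample(n: int):
    y = rng.choice([-1.0, 1.0], size=n)
    x = y[:, None] * mu + rng.standard_normal((n, d))
    return x, y

x_l, y_l = sample(n_labeled)
x_u, _ = sample(n_unlabeled)  # labels of the unlabeled pool are discarded

theta_init = (y_l[:, None] * x_l).mean(axis=0)     # estimate from the few labels
pseudo = np.sign(x_u @ theta_init)                 # pseudo-label the unlabeled pool
theta_self = (pseudo[:, None] * x_u).mean(axis=0)  # re-estimate from pseudo-labels

print(f"cosine to mu: labeled only {cosine(theta_init, mu):.3f}, "
      f"self-trained {cosine(theta_self, mu):.3f}")
```

The pseudo-labels are mostly correct once the labeled estimate is reasonable, and averaging over thousands of unlabeled points then cancels most of the noise that ten labels cannot.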

Libra R-CNN: Towards Balanced Learning for Object Detection


Title	Libra R-CNN: Towards Balanced Learning for Object Detection
Authors	Jiangmiao Pang, Kai Chen, Jianping Shi, Huajun Feng, Wanli Ouyang, Dahua Lin
Abstract	Compared with model architectures, the training process, which is also crucial to the success of detectors, has received relatively less attention in object detection. In this work, we carefully revisit the standard training practice of detectors, and find that the detection performance is often limited by the imbalance during the training process, which generally occurs at three levels: sample level, feature level, and objective level. To mitigate the adverse effects caused thereby, we propose Libra R-CNN, a simple but effective framework towards balanced learning for object detection. It integrates three novel components: IoU-balanced sampling, balanced feature pyramid, and balanced L1 loss, respectively for reducing the imbalance at sample, feature, and objective level. Benefiting from the overall balanced design, Libra R-CNN significantly improves the detection performance. Without bells and whistles, it achieves 2.5 points and 2.0 points higher Average Precision (AP) than FPN Faster R-CNN and RetinaNet respectively on MSCOCO.
Tasks	Object Detection
Published	2019-04-04
URL	http://arxiv.org/abs/1904.02701v1
PDF	http://arxiv.org/pdf/1904.02701v1.pdf
PWC	https://paperswithcode.com/paper/libra-r-cnn-towards-balanced-learning-for
Repo	https://github.com/hualuluu/--every-day-paper--
Framework	none
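The balanced L1 loss is compact enough to write down. The sketch follows the form published with Libra R-CNN as we understand it, with $\alpha = 0.5$, $\gamma = 1.5$, and the transition point fixed at $|x| = 1$. The parameter $b$ is set so that $\alpha \ln(b + 1) = \gamma$, which makes the gradient continuous at $|x| = 1$, and the constant term makes the loss itself continuous there. Treat it as an illustration rather than a drop-in replacement for a framework implementation.

```python
import math

def balanced_l1(x: float, alpha: float = 0.5, gamma: float = 1.5) -> float:
    """Balanced L1 loss of a regression error x, with the transition at |x| = 1."""
    b = math.exp(gamma / alpha) - 1.0  # chosen so that alpha * ln(b + 1) = gamma
    ax = abs(x)
    if ax < 1.0:
        # Inlier branch: gradient alpha * ln(b|x| + 1) is larger than smooth L1's for small errors.
        return alpha / b * (b * ax + 1.0) * math.log(b * ax + 1.0) - alpha * ax
    # Outlier branch: linear with slope gamma; constant gamma / b - alpha keeps the loss continuous.
    return gamma * ax + gamma / b - alpha

print([round(balanced_l1(x), 4) for x in (0.1, 0.5, 1.0, 2.0)])
```

The design intent is to raise the gradient contribution of accurate (inlier) boxes while capping the gradient from outliers at $\gamma$.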

Task-Aware Monocular Depth Estimation for 3D Object Detection


Title	Task-Aware Monocular Depth Estimation for 3D Object Detection
Authors	Xinlong Wang, Wei Yin, Tao Kong, Yuning Jiang, Lei Li, Chunhua Shen
Abstract	Monocular depth estimation enables 3D perception from a single 2D image, thus attracting much research attention for years. Almost all methods treat foreground and background regions (“things and stuff”) in an image equally. However, not all pixels are equal. Depth of foreground objects plays a crucial role in 3D object recognition and localization. To date, how to boost the depth prediction accuracy of foreground objects has rarely been discussed. In this paper, we first analyse the data distributions and interaction of foreground and background, then propose the foreground-background separated monocular depth estimation (ForeSeE) method, to estimate the foreground depth and background depth using separate optimization objectives and depth decoders. Our method significantly improves the depth estimation performance on foreground objects. Applying ForeSeE to 3D object detection, we achieve 7.5 AP gains and set new state-of-the-art results among other monocular methods. Code will be available at: https://github.com/WXinlong/ForeSeE.
Tasks	3D Object Detection, 3D Object Recognition, Depth Estimation, Monocular Depth Estimation, Object Detection, Object Recognition
Published	2019-09-17
URL	https://arxiv.org/abs/1909.07701v2
PDF	https://arxiv.org/pdf/1909.07701v2.pdf
PWC	https://paperswithcode.com/paper/task-aware-monocular-depth-estimation-for-3d
Repo	https://github.com/WXinlong/ForeSeE
Framework	none
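The abstract's key design choice, separate objectives for foreground and background depth, can be illustrated with a masked loss. Everything here is hypothetical: the L1 error, the function names, and the mask-based merge are assumptions made for illustration, and the paper's actual losses and fusion rule may differ. In practice the foreground mask would come from object annotations or a detector.

```python
import numpy as np

def separated_losses(pred_fg, pred_bg, gt, fg_mask):
    """Hypothetical separated objective: each decoder is supervised only on its own region."""
    fg, bg = fg_mask, ~fg_mask
    loss_fg = np.abs(pred_fg[fg] - gt[fg]).mean() if fg.any() else 0.0
    loss_bg = np.abs(pred_bg[bg] - gt[bg]).mean() if bg.any() else 0.0
    return loss_fg, loss_bg

def merge_depth(pred_fg, pred_bg, fg_mask):
    """Hypothetical fusion: foreground decoder inside the mask, background decoder elsewhere."""
    return np.where(fg_mask, pred_fg, pred_bg)

gt = np.array([[10.0, 50.0], [12.0, 60.0]])   # toy depth map in metres
mask = np.array([[True, False], [True, False]])
pred_fg = np.array([[11.0, 0.0], [12.0, 0.0]])
pred_bg = np.array([[0.0, 52.0], [0.0, 60.0]])
print(separated_losses(pred_fg, pred_bg, gt, mask))
print(merge_depth(pred_fg, pred_bg, mask))
```

Keeping the losses separate prevents the many distant background pixels from dominating the gradient that the foreground decoder receives.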

Generating Classical Chinese Poems from Vernacular Chinese


Title	Generating Classical Chinese Poems from Vernacular Chinese
Authors	Zhichao Yang, Pengshan Cai, Yansong Feng, Fei Li, Weijiang Feng, Elena Suet-Ying Chiu, Hong Yu
Abstract	Classical Chinese poetry is a jewel in the treasure house of Chinese culture. Previous poem generation models only allow users to employ keywords to influence the meaning of generated poems, leaving the dominion of generation to the model. In this paper, we propose a novel task of generating classical Chinese poems from vernacular, which allows users to have more control over the semantics of generated poems. We adapt the approach of unsupervised machine translation (UMT) to our task. We use segmentation-based padding and reinforcement learning to address under-translation and over-translation respectively. According to experiments, our approach significantly improves the perplexity and BLEU compared with typical UMT models. Furthermore, we explored guidelines on how to write the input vernacular to generate better poems. Human evaluation showed our approach can generate high-quality poems which are comparable to amateur poems.
Tasks	Machine Translation, Unsupervised Machine Translation
Published	2019-08-31
URL	https://arxiv.org/abs/1909.00279v1
PDF	https://arxiv.org/pdf/1909.00279v1.pdf
PWC	https://paperswithcode.com/paper/generating-classical-chinese-poems-from
Repo	https://github.com/whaleloops/interpoetry
Framework	pytorch

Visual Entailment: A Novel Task for Fine-Grained Image Understanding


Title	Visual Entailment: A Novel Task for Fine-Grained Image Understanding
Authors	Ning Xie, Farley Lai, Derek Doran, Asim Kadav
Abstract	Existing visual reasoning datasets, such as Visual Question Answering (VQA), often suffer from biases conditioned on the question, image or answer distributions. The recently proposed CLEVR dataset addresses these limitations and requires fine-grained reasoning but the dataset is synthetic and consists of similar objects and sentence structures across the dataset. In this paper, we introduce a new inference task, Visual Entailment (VE), consisting of image-sentence pairs whereby a premise is defined by an image, rather than a natural language sentence as in traditional Textual Entailment tasks. The goal of a trained VE model is to predict whether the image semantically entails the text. To realize this task, we build a dataset SNLI-VE based on the Stanford Natural Language Inference corpus and Flickr30k dataset. We evaluate various existing VQA baselines and build a model called Explainable Visual Entailment (EVE) system to address the VE task. EVE achieves up to 71% accuracy and outperforms several other state-of-the-art VQA based models. Finally, we demonstrate the explainability of EVE through cross-modal attention visualizations. The SNLI-VE dataset is publicly available at https://github.com/necla-ml/SNLI-VE.
Tasks	Natural Language Inference, Question Answering, Visual Question Answering, Visual Reasoning
Published	2019-01-20
URL	http://arxiv.org/abs/1901.06706v1
PDF	http://arxiv.org/pdf/1901.06706v1.pdf
PWC	https://paperswithcode.com/paper/visual-entailment-a-novel-task-for-fine
Repo	https://github.com/necla-ml/SNLI-VE
Framework	none

Particle Smoothing Variational Objectives


Title	Particle Smoothing Variational Objectives
Authors	Antonio Khalil Moretti, Zizhao Wang, Luhuan Wu, Iddo Drori, Itsik Pe’er
Abstract	A body of recent work has focused on constructing a variational family of filtered distributions using Sequential Monte Carlo (SMC). Inspired by this work, we introduce Particle Smoothing Variational Objectives (SVO), a novel backward simulation technique and smoothed approximate posterior defined through a subsampling process. SVO augments support of the proposal and boosts particle diversity. Recent literature argues that increasing the number of samples K to obtain tighter variational bounds may hurt the proposal learning, due to a signal-to-noise ratio (SNR) of gradient estimators decreasing at the rate $\mathcal{O}(1/\sqrt{K})$. As a second contribution, we develop theoretical and empirical analysis of the SNR in filtering SMC, which motivates our choice of biased gradient estimators. We prove that introducing bias by dropping Categorical terms from the gradient estimate or using Gumbel-Softmax mitigates the adverse effect on the SNR. We apply SVO to three nonlinear latent dynamics tasks and provide statistics to rigorously quantify the predictions of filtered and smoothed objectives. SVO consistently outperforms filtered objectives when given fewer Monte Carlo samples on three nonlinear systems of increasing complexity.
Tasks
Published	2019-09-20
URL	https://arxiv.org/abs/1909.09734v1
PDF	https://arxiv.org/pdf/1909.09734v1.pdf
PWC	https://paperswithcode.com/paper/190909734
Repo	https://github.com/amoretti86/PSVO
Framework	tf
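The abstract mentions Gumbel-Softmax as one way to trade bias for a better SNR. The relaxation itself is standard: add Gumbel noise to the log-probabilities and apply a temperature-scaled softmax, which yields a differentiable, approximately one-hot sample. The sketch below shows only that sampling step. How SVO plugs it into particle resampling is not shown, and the particle weights in the usage lines are made up.

```python
import numpy as np

def gumbel_softmax(logits: np.ndarray, temperature: float, rng: np.random.Generator) -> np.ndarray:
    """Relaxed one-hot sample from Categorical(softmax(logits))."""
    u = rng.uniform(low=1e-12, high=1.0, size=logits.shape)  # keep u away from 0 to avoid log(0)
    g = -np.log(-np.log(u))                                   # standard Gumbel noise
    y = (logits + g) / temperature
    y = np.exp(y - y.max(axis=-1, keepdims=True))             # numerically stable softmax
    return y / y.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
weights = np.array([0.2, 0.5, 0.3])  # hypothetical normalized particle weights
sample = gumbel_softmax(np.log(weights), temperature=0.5, rng=rng)
print(sample.round(3))
```

Lower temperatures give samples closer to one-hot at the cost of higher-variance gradients; the bias comes from using the relaxed sample in place of a discrete index.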

Calibrating the Adaptive Learning Rate to Improve Convergence of ADAM


Title	Calibrating the Adaptive Learning Rate to Improve Convergence of ADAM
Authors	Qianqian Tong, Guannan Liang, Jinbo Bi
Abstract	Adaptive gradient methods (AGMs) have become popular for optimizing nonconvex problems in deep learning. We revisit AGMs and identify that the adaptive learning rate (A-LR) used by AGMs varies significantly across the dimensions of the problem over epochs (i.e., anisotropic scale), which may lead to issues in convergence and generalization. All existing modified AGMs actually represent efforts in revising the A-LR. Theoretically, we provide a new way to analyze the convergence of AGMs and prove that the convergence rate of Adam also depends on its hyper-parameter $\epsilon$, which has been overlooked previously. Based on these two facts, we propose a new AGM by calibrating the A-LR with an activation (softplus) function, resulting in the Sadam and SAMSGrad methods (code is available at https://github.com/neilliang90/Sadam.git). We further prove that these algorithms enjoy better convergence speed under nonconvex, non-strongly convex, and Polyak-Łojasiewicz conditions compared with Adam. Empirical studies support our observation of the anisotropic A-LR and show that the proposed methods outperform existing AGMs and generalize even better than S-Momentum in multiple deep learning tasks.
Tasks
Published	2019-08-02
URL	https://arxiv.org/abs/1908.00700v2
PDF	https://arxiv.org/pdf/1908.00700v2.pdf
PWC	https://paperswithcode.com/paper/calibrating-the-learning-rate-for-adaptive
Repo	https://github.com/lessw2020/Ranger-Deep-Learning-Optimizer
Framework	pytorch
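A minimal sketch of the calibration idea: Adam divides the momentum by $\sqrt{\hat v_t} + \epsilon$, and the abstract proposes passing $\sqrt{\hat v_t}$ through a softplus instead. The sketch assumes the form $\mathrm{softplus}_\beta(x) = \frac{1}{\beta}\log(1 + e^{\beta x})$, keeps Adam's bias correction, and uses $\beta = 50$. These details are assumptions, so check the neilliang90/Sadam repository for the authors' exact update. Because $\mathrm{softplus}_\beta(x) \ge \log 2/\beta$, the per-coordinate step size stays bounded even when $\hat v_t$ is near zero, replacing the hard floor set by $\epsilon$ with a smooth one.

```python
import numpy as np

def softplus(x: np.ndarray, beta: float) -> np.ndarray:
    # Numerically stable log(1 + exp(beta * x)) / beta.
    return np.logaddexp(0.0, beta * x) / beta

def sadam_step(theta, grad, state, lr=1e-3, b1=0.9, b2=0.999, beta=50.0):
    """Adam-style update with sqrt(v_hat) + eps replaced by softplus(sqrt(v_hat)) (assumed form)."""
    state["t"] += 1
    state["m"] = b1 * state["m"] + (1 - b1) * grad
    state["v"] = b2 * state["v"] + (1 - b2) * grad ** 2
    m_hat = state["m"] / (1 - b1 ** state["t"])  # bias-corrected first moment
    v_hat = state["v"] / (1 - b2 ** state["t"])  # bias-corrected second moment
    return theta - lr * m_hat / softplus(np.sqrt(v_hat), beta)

# Usage: minimize f(theta) = ||theta||^2 from an arbitrary starting point.
theta = np.array([5.0, -3.0])
state = {"t": 0, "m": np.zeros(2), "v": np.zeros(2)}
for _ in range(500):
    grad = 2 * theta  # gradient of ||theta||^2
    theta = sadam_step(theta, grad, state, lr=0.05)
print(theta)
```

For large $\sqrt{\hat v_t}$ the softplus is close to the identity, so the update behaves like Adam; the calibration matters mainly in coordinates with very small second-moment estimates.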