January 26, 2020

3511 words 17 mins read

Paper Group ANR 1485

Understanding overfitting peaks in generalization error: Analytical risk curves for $l_2$ and $l_1$ penalized interpolation

Title Understanding overfitting peaks in generalization error: Analytical risk curves for $l_2$ and $l_1$ penalized interpolation
Authors Partha P Mitra
Abstract Traditionally in regression one minimizes the number of fitting parameters or uses smoothing/regularization to trade training (TE) and generalization error (GE). Driving TE to zero by increasing fitting degrees of freedom (dof) is expected to increase GE. However modern big-data approaches, including deep nets, seem to over-parametrize and send TE to zero (data interpolation) without impacting GE. Overparametrization has the benefit that global minima of the empirical loss function proliferate and become easier to find. These phenomena have drawn theoretical attention. Regression and classification algorithms have been shown to interpolate data while also generalizing optimally. An interesting related phenomenon has been noted: the existence of non-monotonic risk curves, with a peak in GE with increasing dof. It was suggested that this peak separates a classical regime from a modern regime where over-parametrization improves performance. Similar over-fitting peaks were reported previously (statistical physics approach to learning) and attributed to increased fitting model flexibility. We introduce a generative and fitting model pair (“Misparametrized Sparse Regression” or MiSpaR) and show that the overfitting peak can be dissociated from the point at which the fitting function gains enough dofs to match the data generative model and thus provides good generalization. This complicates the interpretation of overfitting peaks as separating a “classical” from a “modern” regime. Data interpolation itself cannot guarantee good generalization: we need to study the interpolation with different penalty terms. We present analytical formulae for GE curves for MiSpaR with $l_2$ and $l_1$ penalties, in the interpolating limit $\lambda\rightarrow 0$. These risk curves exhibit important differences and help elucidate the underlying phenomena.
Tasks
Published 2019-06-09
URL https://arxiv.org/abs/1906.03667v1
PDF https://arxiv.org/pdf/1906.03667v1.pdf
PWC https://paperswithcode.com/paper/understanding-overfitting-peaks-in
Repo
Framework
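The interpolating limit the abstract studies can be illustrated numerically: for an over-parametrized linear model, the ridge solution converges, as the penalty vanishes, to the minimum-$l_2$-norm interpolator given by the pseudoinverse. The sketch below is a toy stand-in for the setting (not the paper's MiSpaR model); dimensions and the sparse generative coefficients are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "misparametrized" setup: the generative model uses only s of the
# p candidate features, while the fit uses all p (values are illustrative).
n, p, s = 20, 50, 3
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:s] = 1.0
y = X @ beta_true  # noiseless for clarity

# Ridge in the under-determined regime:
# beta(lam) = X^T (X X^T + lam I)^{-1} y.
# As lam -> 0 this converges to the min-l2-norm interpolator (pseudoinverse).
def ridge(X, y, lam):
    n = X.shape[0]
    return X.T @ np.linalg.solve(X @ X.T + lam * np.eye(n), y)

beta_min_norm = np.linalg.pinv(X) @ y
beta_small_lam = ridge(X, y, 1e-8)

# Both solutions interpolate the training data exactly...
assert np.allclose(X @ beta_min_norm, y)
# ...and the ridge path approaches the min-norm interpolator as lam -> 0.
assert np.allclose(beta_small_lam, beta_min_norm, atol=1e-5)
```

Sweeping `p` past `n` in a setup like this is how the non-monotonic risk curves in the paper are typically visualized empirically.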

Active Scene Learning

Title Active Scene Learning
Authors Erelcan Yanik, Tevfik Metin Sezgin
Abstract Sketch recognition allows natural and efficient interaction in pen-based interfaces. A key obstacle to building accurate sketch recognizers has been the difficulty of creating large amounts of annotated training data. Several authors have attempted to address this issue by creating synthetic data, and by building tools that support efficient annotation. Two prominent sets of approaches stand out. They use interim classifiers trained with a small set of labeled data to aid the labeling of the remainder of the data. The first set of approaches uses a classifier trained with a partially labeled dataset to automatically label unlabeled instances. The others, based on active learning, save annotation effort by giving priority to labeling informative data instances. The former is sub-optimal since it doesn’t prioritize the order of labeling to favor informative instances, while the latter makes the strong assumption that unlabeled data comes in an already segmented form (i.e. the ink in the training data is already assembled into groups forming isolated object instances). In this paper, we propose an active learning framework that combines the strengths of these methods, while addressing their weaknesses. In particular, we propose two methods for deciding how batches of unsegmented sketch scenes should be labeled. The first method, scene-wise selection, assesses the informativeness of each drawing (sketch scene) as a whole, and asks the user to annotate all objects in the drawing. The second, segment-wise selection, attempts more precise targeting to locate informative fragments of drawings for user labeling. We show that both selection schemes outperform random selection. Furthermore, we demonstrate that precise targeting yields superior performance. Overall, our approach allows reaching top accuracy figures with up to 30% savings in annotation cost.
Tasks Active Learning, Sketch Recognition
Published 2019-03-07
URL http://arxiv.org/abs/1903.02832v1
PDF http://arxiv.org/pdf/1903.02832v1.pdf
PWC https://paperswithcode.com/paper/active-scene-learning
Repo
Framework
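Scene-wise selection as described above can be sketched with a generic informativeness score. The snippet below uses mean prediction entropy as the scoring rule, which is a common choice in active learning but an assumption here, not necessarily the paper's criterion.

```python
import numpy as np

def entropy(p):
    """Shannon entropy of each probability row (natural log)."""
    p = np.clip(p, 1e-12, 1.0)
    return -(p * np.log(p)).sum(axis=-1)

def scene_wise_selection(scene_probs, k):
    """Pick the k scenes whose segments the interim classifier is, on
    average, least certain about (entropy-based scoring is illustrative)."""
    scores = [entropy(np.asarray(p)).mean() for p in scene_probs]
    return np.argsort(scores)[::-1][:k]

# Three toy scenes, each a list of per-segment class probabilities;
# scene 1 contains the most uncertain predictions.
scenes = [
    [[0.9, 0.1], [0.8, 0.2]],   # fairly confident
    [[0.5, 0.5], [0.6, 0.4]],   # uncertain -> most informative
    [[0.99, 0.01]],             # very confident
]
picked = scene_wise_selection(scenes, k=1)
assert picked[0] == 1
```

Segment-wise selection would score individual fragments with the same kind of rule instead of averaging over the whole drawing.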

Discovering Implicational Knowledge in Wikidata

Title Discovering Implicational Knowledge in Wikidata
Authors Tom Hanika, Maximilian Marx, Gerd Stumme
Abstract Knowledge graphs have recently become the state-of-the-art tool for representing the diverse and complex knowledge of the world. Examples include the proprietary knowledge graphs of companies such as Google, Facebook, IBM, or Microsoft, but also freely available ones such as YAGO, DBpedia, and Wikidata. A distinguishing feature of Wikidata is that the knowledge is collaboratively edited and curated. While this greatly enhances the scope of Wikidata, it also makes it impossible for a single individual to grasp complex connections between properties or understand the global impact of edits in the graph. We apply Formal Concept Analysis to efficiently identify comprehensible implications that are implicitly present in the data. Although the complex structure of data modelling in Wikidata is not amenable to a direct approach, we overcome this limitation by extracting contextual representations of parts of Wikidata in a systematic fashion. We demonstrate the practical feasibility of our approach through several experiments and show that the results may lead to the discovery of interesting implicational knowledge. Besides providing a method for obtaining large real-world data sets for FCA, we sketch potential applications in offering semantic assistance for editing and curating Wikidata.
Tasks Knowledge Graphs
Published 2019-02-03
URL http://arxiv.org/abs/1902.00916v1
PDF http://arxiv.org/pdf/1902.00916v1.pdf
PWC https://paperswithcode.com/paper/discovering-implicational-knowledge-in
Repo
Framework
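The notion of an implication in Formal Concept Analysis is easy to state concretely: an implication A → B holds in a formal context when every object possessing all attributes in A also possesses all attributes in B. A minimal check (illustrative only; the paper's contribution is the systematic extraction of such contexts from Wikidata, not this test):

```python
# Toy formal context: object -> set of attributes
# (Wikidata-flavored names are hypothetical examples).
context = {
    "Q1": {"capital", "city", "has_population"},
    "Q2": {"city", "has_population"},
    "Q3": {"river", "has_length"},
}

def holds(context, premise, conclusion):
    """True iff the implication premise -> conclusion holds in the context."""
    return all(conclusion <= attrs
               for attrs in context.values()
               if premise <= attrs)

assert holds(context, {"capital"}, {"city"})           # every capital is a city
assert holds(context, {"city"}, {"has_population"})
assert not holds(context, {"has_population"}, {"capital"})
```

FCA algorithms such as the computation of the Duquenne-Guigues basis build on exactly this semantics to enumerate a minimal set of valid implications.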

Adaptive Hedging under Delayed Feedback

Title Adaptive Hedging under Delayed Feedback
Authors Alexander Korotin, Vladimir V’yugin, Evgeny Burnaev
Abstract The article is devoted to investigating the application of hedging strategies to online expert weight allocation under delayed feedback. As the main result, we develop the General Hedging algorithm $\mathcal{G}$ based on the exponential reweighing of experts’ losses. We build the artificial probabilistic framework and use it to prove the adversarial loss bounds for the algorithm $\mathcal{G}$ in the delayed feedback setting. The designed algorithm $\mathcal{G}$ can be applied to both countable and continuous sets of experts. We also show how algorithm $\mathcal{G}$ extends classical Hedge (Multiplicative Weights) and adaptive Fixed Share algorithms to the delayed feedback and derive their regret bounds for the delayed setting by using our main result.
Tasks
Published 2019-02-27
URL https://arxiv.org/abs/1902.10433v2
PDF https://arxiv.org/pdf/1902.10433v2.pdf
PWC https://paperswithcode.com/paper/adaptive-hedging-under-delayed-feedback
Repo
Framework
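The delayed-feedback setting is easy to picture with plain exponential weights: the loss of round t only becomes observable d rounds later, so the weights at time t can use only the losses revealed so far. This is a simplified sketch of the setting, not the paper's General Hedging algorithm $\mathcal{G}$ itself; `eta` is a fixed learning rate.

```python
import numpy as np

def delayed_hedge(losses, eta, delay):
    """Exponentially reweighted experts under delayed feedback: the loss
    vector of round t arrives only at round t + delay (simplified sketch)."""
    T, K = losses.shape
    cum = np.zeros(K)              # cumulative *observed* losses
    preds = []
    for t in range(T):
        w = np.exp(-eta * cum)
        preds.append(w / w.sum())  # predict before new feedback arrives
        if t >= delay:             # feedback for round t - delay arrives now
            cum += losses[t - delay]
    return np.array(preds)

losses = np.array([[1.0, 0.0]] * 6)     # expert 1 is always better
p = delayed_hedge(losses, eta=0.5, delay=2)
assert np.allclose(p[0], [0.5, 0.5])    # nothing observed yet
assert p[-1, 1] > p[-1, 0]              # weight shifts to the better expert
```

The delay effectively slows the concentration of weight, which is where the extra regret terms in the delayed-feedback bounds come from.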

Diversified Co-Attention towards Informative Live Video Commenting

Title Diversified Co-Attention towards Informative Live Video Commenting
Authors Zhihan Zhang, Zhiyi Yin, Shuhuai Ren, Xinhang Li, Shicheng Li
Abstract We focus on the task of Automatic Live Video Commenting (ALVC), which aims to generate real-time video comments based on both video frames and other viewers’ remarks. An intractable challenge in this task is the appropriate modeling of complex dependencies between video and textual inputs. Previous work in the ALVC task applies separate attention on these two input sources to obtain their representations. In this paper, we argue that the information of video and text should be modeled integrally. We propose a novel model equipped with a Diversified Co-Attention layer (DCA) and a Gated Attention Module (GAM). DCA allows interactions between video and text from diversified perspectives via metric learning, while GAM collects an informative context for comment generation. We further introduce a parameter orthogonalization technique to alleviate information redundancy in DCA. Experimental results show that our model outperforms previous approaches in the ALVC task and the traditional co-attention model, achieving state-of-the-art results.
Tasks Metric Learning
Published 2019-11-07
URL https://arxiv.org/abs/1911.02739v1
PDF https://arxiv.org/pdf/1911.02739v1.pdf
PWC https://paperswithcode.com/paper/diversified-co-attention-towards-informative
Repo
Framework
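The "traditional co-attention model" the paper compares against is worth pinning down: an affinity matrix between the two sequences, normalized in both directions, yields text-aware video features and video-aware text features. The sketch below shows that baseline (not the DCA layer, which additionally diversifies the affinity via metric learning); dimensions are arbitrary.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def co_attention(video, text):
    """Plain co-attention baseline: one affinity matrix, attended both ways."""
    A = video @ text.T                        # (Tv, Tt) affinity scores
    video_ctx = softmax(A, axis=1) @ text     # attend over tokens per frame
    text_ctx = softmax(A, axis=0).T @ video   # attend over frames per token
    return video_ctx, text_ctx

video = np.random.default_rng(0).standard_normal((4, 8))  # 4 frames, dim 8
text = np.random.default_rng(1).standard_normal((6, 8))   # 6 tokens, dim 8
v_ctx, t_ctx = co_attention(video, text)
assert v_ctx.shape == (4, 8) and t_ctx.shape == (6, 8)
```

DCA replaces the single dot-product affinity with several learned metrics, and the orthogonalization term keeps those metrics from collapsing onto each other.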

Towards Multimodal Emotion Recognition in German Speech Events in Cars using Transfer Learning

Title Towards Multimodal Emotion Recognition in German Speech Events in Cars using Transfer Learning
Authors Deniz Cevher, Sebastian Zepf, Roman Klinger
Abstract The recognition of emotions by humans is a complex process which considers multiple interacting signals such as facial expressions and both prosody and semantic content of utterances. Commonly, research on automatic recognition of emotions is, with few exceptions, limited to one modality. We describe an in-car experiment for emotion recognition from speech interactions for three modalities: the audio signal of a spoken interaction, the visual signal of the driver’s face, and the manually transcribed content of utterances of the driver. We use off-the-shelf tools for emotion detection in audio and face and compare that to a neural transfer learning approach for emotion recognition from text which utilizes existing resources from other domains. We see that transfer learning enables models based on out-of-domain corpora to perform well. This method contributes up to 10 percentage points in F1, with a micro-average F1 of up to 76 across the emotions joy, annoyance and insecurity. Our findings also indicate that off-the-shelf tools analyzing face and audio are not ready yet for emotion detection in in-car speech interactions without further adjustments.
Tasks Emotion Recognition, Multimodal Emotion Recognition, Transfer Learning
Published 2019-09-06
URL https://arxiv.org/abs/1909.02764v2
PDF https://arxiv.org/pdf/1909.02764v2.pdf
PWC https://paperswithcode.com/paper/towards-multimodal-emotion-recognition-in
Repo
Framework

What comes next? Extractive summarization by next-sentence prediction

Title What comes next? Extractive summarization by next-sentence prediction
Authors Jingyun Liu, Jackie C. K. Cheung, Annie Louis
Abstract Existing approaches to automatic summarization assume that a length limit for the summary is given, and view content selection as an optimization problem to maximize informativeness and minimize redundancy within this budget. This framework ignores the fact that human-written summaries have rich internal structure which can be exploited to train a summarization system. We present NEXTSUM, a novel approach to summarization based on a model that predicts the next sentence to include in the summary using not only the source article, but also the summary produced so far. We show that such a model successfully captures summary-specific discourse moves, and leads to better content selection performance, in addition to automatically predicting how long the target summary should be. We perform experiments on the New York Times Annotated Corpus of summaries, where NEXTSUM outperforms lead and content-model summarization baselines by significant margins. We also show that the lengths of summaries produced by our system correlate with the lengths of the human-written gold standards.
Tasks
Published 2019-01-12
URL http://arxiv.org/abs/1901.03859v1
PDF http://arxiv.org/pdf/1901.03859v1.pdf
PWC https://paperswithcode.com/paper/what-comes-next-extractive-summarization-by
Repo
Framework
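The decoding loop implied by the abstract is a greedy next-sentence selection with a learned stop action: repeatedly pick the source sentence that scores highest given the summary so far, and stop when the stop action wins. The scorer below is a deliberately trivial stand-in (NEXTSUM learns it from data); the `STOP` symbol and `toy_score` are illustrative.

```python
# Greedy next-sentence-prediction decoding (scorer is a hypothetical stand-in).
STOP = "<stop>"

def summarize(sentences, score, max_len=10):
    """Build a summary one sentence at a time until STOP wins or max_len."""
    summary = []
    remaining = list(sentences)
    while remaining and len(summary) < max_len:
        candidates = remaining + [STOP]
        best = max(candidates, key=lambda s: score(summary, s))
        if best == STOP:
            break
        summary.append(best)
        remaining.remove(best)
    return summary

# Toy scorer: prefer shorter sentences, stop once two have been chosen.
def toy_score(summary, cand):
    if cand == STOP:
        return 0.0 if len(summary) < 2 else 100.0
    return 1.0 / len(cand)

out = summarize(["a long sentence here", "short one", "tiny"], toy_score)
assert out == ["tiny", "short one"]
```

Because stopping is itself a scored action, the model predicts summary length instead of needing a length budget, which is exactly the point the abstract makes.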

Greedy Shallow Networks: A New Approach for Constructing and Training Neural Networks

Title Greedy Shallow Networks: A New Approach for Constructing and Training Neural Networks
Authors Anton Dereventsov, Armenak Petrosyan, Clayton Webster
Abstract We present a greedy-based approach to construct an efficient single hidden layer neural network with the ReLU activation that approximates a target function. In our approach we obtain a shallow network by utilizing a greedy algorithm with the prescribed dictionary provided by the available training data and a set of possible inner weights. To facilitate the greedy selection process we employ an integral representation of the network, based on the ridgelet transform, that significantly reduces the cardinality of the dictionary and hence promotes feasibility of the greedy selection. Our approach allows for the construction of efficient architectures which can be treated either as improved initializations to be used in place of random-based alternatives, or as fully-trained networks in certain cases, thus potentially nullifying the need for backpropagation training. Numerical experiments demonstrate the tenability of the proposed concept and its advantages compared to the conventional techniques for selecting architectures and initializations for neural networks.
Tasks
Published 2019-05-24
URL https://arxiv.org/abs/1905.10409v2
PDF https://arxiv.org/pdf/1905.10409v2.pdf
PWC https://paperswithcode.com/paper/greedy-shallow-networks-a-new-approach-for
Repo
Framework
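The greedy construction can be made concrete with a small orthogonal-greedy sketch: given a dictionary of candidate ReLU atoms max(0, w·x + b), repeatedly add the atom most correlated with the current residual and refit the outer coefficients by least squares. This is a simplification of the paper's method (which builds the dictionary via a ridgelet-transform representation); the dictionary here is hand-picked.

```python
import numpy as np

def greedy_relu_fit(x, y, weights, biases, n_atoms):
    """Greedily assemble a one-hidden-layer ReLU network (simplified sketch):
    pick the atom max(0, w*x + b) best aligned with the residual, then
    refit the outer-layer coefficients by least squares."""
    atoms = np.maximum(0.0, np.outer(x, weights) + biases)  # (n, D) dictionary
    chosen, resid = [], y.copy()
    for _ in range(n_atoms):
        norms = np.linalg.norm(atoms, axis=0) + 1e-12
        k = int(np.argmax(np.abs(atoms.T @ resid) / norms))
        chosen.append(k)
        A = atoms[:, chosen]
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
    return chosen, coef, resid

x = np.linspace(-1, 1, 50)
y = np.maximum(0.0, x)                 # the target is itself a ReLU
W = np.array([1.0, -1.0, 1.0])         # candidate inner weights
B = np.array([0.0, 0.0, 0.5])          # candidate biases
chosen, coef, resid = greedy_relu_fit(x, y, W, B, n_atoms=1)
assert chosen == [0]                   # the exactly-matching atom is picked
assert np.linalg.norm(resid) < 1e-8
```

The selected atoms and coefficients can then serve either as a finished shallow network or as an initialization for subsequent gradient-based training, mirroring the two use cases in the abstract.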

Few-shot tweet detection in emerging disaster events

Title Few-shot tweet detection in emerging disaster events
Authors Anna Kruspe
Abstract Social media sources can provide crucial information in crisis situations, but discovering relevant messages is not trivial. Methods have so far focused on universal detection models for all kinds of crises or for certain crisis types (e.g. floods). Event-specific models could implement a more focused search area, but collecting data and training new models for a crisis that is already in progress is costly and may take too much time for a prompt response. As a compromise, manually collecting a small amount of example messages is feasible. Few-shot models can generalize to unseen classes with such a small handful of examples, and do not need to be trained anew for each event. We compare how few-shot approaches (matching networks and prototypical networks) perform for this task. Since this is essentially a one-class problem, we also demonstrate how a modified one-class version of prototypical models can be used for this application.
Tasks
Published 2019-10-05
URL https://arxiv.org/abs/1910.02290v1
PDF https://arxiv.org/pdf/1910.02290v1.pdf
PWC https://paperswithcode.com/paper/few-shot-tweet-detection-in-emerging-disaster
Repo
Framework
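Prototypical classification, one of the two few-shot approaches compared, reduces to a very small amount of code once an embedding is given: the prototype of a class is the mean of its support embeddings, and a query is assigned to the nearest prototype. The 2-D "embeddings" below are toy stand-ins for learned tweet representations.

```python
import numpy as np

def prototypes(support, labels):
    """Class prototypes = mean embedding of each class's support examples."""
    classes = sorted(set(labels))
    return classes, np.stack([
        np.mean([e for e, l in zip(support, labels) if l == c], axis=0)
        for c in classes
    ])

def classify(query, classes, protos):
    """Assign the query to the class with the nearest prototype (Euclidean)."""
    d = np.linalg.norm(protos - query, axis=1)
    return classes[int(np.argmin(d))]

# Toy 2-D embeddings standing in for learned tweet representations.
support = np.array([[0.0, 0.0], [0.2, 0.0], [5.0, 5.0], [5.2, 5.0]])
labels = ["irrelevant", "irrelevant", "disaster", "disaster"]
classes, protos = prototypes(support, labels)
assert classify(np.array([4.9, 5.1]), classes, protos) == "disaster"
assert classify(np.array([0.1, -0.1]), classes, protos) == "irrelevant"
```

The one-class variant the paper proposes keeps only the positive ("disaster") prototype and thresholds the distance to it, since no representative negative class exists for unseen events.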

Egocentric Hand Track and Object-based Human Action Recognition

Title Egocentric Hand Track and Object-based Human Action Recognition
Authors Georgios Kapidis, Ronald Poppe, Elsbeth van Dam, Lucas P. J. J. Noldus, Remco C. Veltkamp
Abstract Egocentric vision is an emerging field of computer vision that is characterized by the acquisition of images and video from the first person perspective. In this paper we address the challenge of egocentric human action recognition by utilizing the presence and position of detected regions of interest in the scene explicitly, without further use of visual features. Initially, we recognize that human hands are essential in the execution of actions and focus on obtaining their movements as the principal cues that define actions. We employ object detection and region tracking techniques to locate hands and capture their movements. Prior knowledge about egocentric views facilitates hand identification between left and right. With regard to detection and tracking, we contribute a pipeline that successfully operates on unseen egocentric videos to find the camera wearer’s hands and associate them through time. Moreover, we emphasize the value of scene information for action recognition. We acknowledge that the presence of objects is significant for the execution of actions by humans and in general for the description of a scene. To acquire this information, we utilize object detection for specific classes that are relevant to the actions we want to recognize. Our experiments are targeted on videos of kitchen activities from the Epic-Kitchens dataset. We model action recognition as a sequence learning problem of the detected spatial positions in the frames. Our results show that explicit hand and object detections with no other visual information can be relied upon to classify hand-related human actions. Testing against methods fully dependent on visual features signals that for actions where hand motions are conceptually important, a region-of-interest-based description of a video contains equally expressive information with comparable classification performance.
Tasks Object Detection, Temporal Action Localization
Published 2019-05-02
URL https://arxiv.org/abs/1905.00742v1
PDF https://arxiv.org/pdf/1905.00742v1.pdf
PWC https://paperswithcode.com/paper/egocentric-hand-track-and-object-based-human
Repo
Framework

MAANet: Multi-view Aware Attention Networks for Image Super-Resolution

Title MAANet: Multi-view Aware Attention Networks for Image Super-Resolution
Authors Jingcai Guo, Shiheng Ma, Song Guo
Abstract In recent years, deep convolutional neural networks (DCNNs) based image super-resolution (SR) has gained increasing attention in multimedia and computer vision communities, focusing on restoring the high-resolution (HR) image from a low-resolution (LR) image. However, one nonnegligible flaw of DCNNs based methods is that most of them are not able to restore high-resolution images containing sufficient high-frequency information from low-resolution images with low-frequency information redundancy. Worse still, as the depth of DCNNs increases, the training easily encounters the problem of vanishing gradients, which makes the training more difficult. These problems hinder the effectiveness of DCNNs in image SR task. To solve these problems, we propose the Multi-view Aware Attention Networks (MAANet) for image SR task. Specifically, we propose the local aware (LA) and global aware (GA) attention to deal with LR features in distinct manners, which can highlight the high-frequency components and discriminate each feature from LR images in the local and the global views, respectively. Furthermore, we propose the local attentive residual-dense (LARD) block, which combines the LA attention with multiple residual and dense connections, to fit a deeper yet easy-to-train architecture. The experimental results show that our proposed approach can achieve remarkable performance compared with other state-of-the-art methods.
Tasks Image Super-Resolution, Super-Resolution
Published 2019-04-12
URL http://arxiv.org/abs/1904.06252v1
PDF http://arxiv.org/pdf/1904.06252v1.pdf
PWC https://paperswithcode.com/paper/maanet-multi-view-aware-attention-networks
Repo
Framework

Meta-descent for Online, Continual Prediction

Title Meta-descent for Online, Continual Prediction
Authors Andrew Jacobsen, Matthew Schlegel, Cameron Linke, Thomas Degris, Adam White, Martha White
Abstract This paper investigates different vector step-size adaptation approaches for non-stationary online, continual prediction problems. Vanilla stochastic gradient descent can be considerably improved by scaling the update with a vector of appropriately chosen step-sizes. Many methods, including AdaGrad, RMSProp, and AMSGrad, keep statistics about the learning process to approximate a second order update—a vector approximation of the inverse Hessian. Another family of approaches use meta-gradient descent to adapt the step-size parameters to minimize prediction error. These meta-descent strategies are promising for non-stationary problems, but have not been as extensively explored as quasi-second order methods. We first derive a general, incremental meta-descent algorithm, called AdaGain, designed to be applicable to a much broader range of algorithms, including those with semi-gradient updates or even those with accelerations, such as RMSProp. We provide an empirical comparison of methods from both families. We conclude that methods from both families can perform well, but in non-stationary prediction problems the meta-descent methods exhibit advantages. Our method is particularly robust across several prediction problems, and is competitive with the state-of-the-art method on a large-scale, time-series prediction problem on real data from a mobile robot.
Tasks Time Series, Time Series Prediction
Published 2019-07-17
URL https://arxiv.org/abs/1907.07751v2
PDF https://arxiv.org/pdf/1907.07751v2.pdf
PWC https://paperswithcode.com/paper/meta-descent-for-online-continual-prediction
Repo
Framework
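Meta-descent on per-weight step-sizes is an old idea worth seeing in code. The sketch below implements Sutton's IDBD rule, a classic precursor that AdaGain generalizes (it is not AdaGain itself): each weight carries a log step-size that is itself adapted by gradient descent on prediction error, with `h` tracing recent updates.

```python
import numpy as np

def idbd(X, y, theta=0.05, beta0=np.log(0.05)):
    """Sutton's IDBD meta-descent rule for online linear prediction:
    adapts a per-weight step-size alpha_i = exp(beta_i) by descending
    the prediction error with meta step-size theta."""
    n, d = X.shape
    w = np.zeros(d)
    beta = np.full(d, beta0)   # log step-sizes
    h = np.zeros(d)            # decaying trace of recent weight updates
    for x, target in zip(X, y):
        delta = target - w @ x
        beta += theta * delta * x * h       # meta-gradient step on log alpha
        alpha = np.exp(beta)
        w += alpha * delta * x              # LMS step with adapted step-sizes
        h = h * np.maximum(0.0, 1.0 - alpha * x * x) + alpha * delta * x
    return w

rng = np.random.default_rng(0)
X = rng.standard_normal((2000, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true                              # noiseless stream for clarity
w = idbd(X, y)
assert np.allclose(w, w_true, atol=0.1)
```

AdaGain's contribution is to derive this kind of update for a much broader class of base learners, including semi-gradient and accelerated updates, rather than plain LMS.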

An Artificial Spiking Quantum Neuron

Title An Artificial Spiking Quantum Neuron
Authors Lasse Bjørn Kristensen, Matthias Degroote, Peter Wittek, Alán Aspuru-Guzik, Nikolaj T. Zinner
Abstract Artificial spiking neural networks have found applications in areas where the temporal nature of activation offers an advantage, such as time series prediction and signal processing. To improve their efficiency, spiking architectures often run on custom-designed neuromorphic hardware, but, despite their attractive properties, these implementations have been limited to digital systems. We describe an artificial quantum spiking neuron that relies on the dynamical evolution of two easy-to-implement Hamiltonians and subsequent local measurements. The architecture allows exploiting complex amplitudes and back-action from measurements to influence the input. This approach to learning protocols is advantageous in the case where the input and output of the system are both quantum states. We demonstrate this through the classification of Bell pairs which can be seen as a certification protocol. Stacking the introduced elementary building blocks into larger networks combines the spatiotemporal features of a spiking neural network with the non-local quantum correlations across the graph.
Tasks Time Series, Time Series Prediction
Published 2019-07-14
URL https://arxiv.org/abs/1907.06269v1
PDF https://arxiv.org/pdf/1907.06269v1.pdf
PWC https://paperswithcode.com/paper/an-artificial-spiking-quantum-neuron
Repo
Framework

Counterfactual Evaluation of Treatment Assignment Functions with Networked Observational Data

Title Counterfactual Evaluation of Treatment Assignment Functions with Networked Observational Data
Authors Ruocheng Guo, Jundong Li, Huan Liu
Abstract Counterfactual evaluation of novel treatment assignment functions (e.g., advertising algorithms and recommender systems) is one of the most crucial causal inference problems for practitioners. Traditionally, randomized controlled trials (A/B tests) are performed to evaluate treatment assignment functions. However, such trials can be time-consuming, expensive, and even unethical in some cases. Therefore, offline counterfactual evaluation of treatment assignment functions becomes a pressing issue because a massive amount of observational data is available in today’s big data era. Counterfactual evaluation requires handling the hidden confounders – the unmeasured features which causally influence both the treatment assignment and the outcome. Most of the existing methods sidestep this difficulty by assuming that there are no hidden confounders. However, this assumption can be untenable in the context of massive observational data. When such data comes with network information, the latter can potentially be used to correct hidden confounding bias. As such, we first formulate a novel problem, counterfactual evaluation of treatment assignment functions with networked observational data. Then, we investigate the following research questions: How can we utilize network information in counterfactual evaluation? Can network information improve the estimates in counterfactual evaluation? Toward answering these questions, first, we propose a novel framework, \emph{Counterfactual Network Evaluator} (CONE), which (1) learns partial representations of latent confounders under the supervision of observed treatments and outcomes; and (2) combines them for counterfactual evaluation. Then through extensive experiments, we corroborate the effectiveness of CONE. The results imply that incorporating network information mitigates hidden confounding bias in counterfactual evaluation.
Tasks Causal Inference, Recommendation Systems
Published 2019-12-22
URL https://arxiv.org/abs/1912.10536v1
PDF https://arxiv.org/pdf/1912.10536v1.pdf
PWC https://paperswithcode.com/paper/counterfactual-evaluation-of-treatment
Repo
Framework

Eternal Sunshine of the Spotless Net: Selective Forgetting in Deep Networks

Title Eternal Sunshine of the Spotless Net: Selective Forgetting in Deep Networks
Authors Aditya Golatkar, Alessandro Achille, Stefano Soatto
Abstract We explore the problem of selectively forgetting a particular subset of the data used for training a deep neural network. While the effects of the data to be forgotten can be hidden from the output of the network, insights may still be gleaned by probing deep into its weights. We propose a method for “scrubbing” the weights clean of information about a particular set of training data. The method does not require retraining from scratch, nor access to the data originally used for training. Instead, the weights are modified so that any probing function of the weights is indistinguishable from the same function applied to the weights of a network trained without the data to be forgotten. This condition is a generalized and weaker form of Differential Privacy. Exploiting ideas related to the stability of stochastic gradient descent, we introduce an upper-bound on the amount of information remaining in the weights, which can be estimated efficiently even for deep neural networks.
Tasks
Published 2019-11-12
URL https://arxiv.org/abs/1911.04933v5
PDF https://arxiv.org/pdf/1911.04933v5.pdf
PWC https://paperswithcode.com/paper/eternal-sunshine-of-the-spotless-net
Repo
Framework