January 30, 2020


Paper Group ANR 240


SNODE: Spectral Discretization of Neural ODEs for System Identification


Title	SNODE: Spectral Discretization of Neural ODEs for System Identification
Authors	Alessio Quaglino, Marco Gallieri, Jonathan Masci, Jan Koutník
Abstract	This paper proposes the use of spectral element methods (Canuto et al., 1988) for fast and accurate training of Neural Ordinary Differential Equations (ODE-Nets; Chen et al., 2018) for system identification. This is achieved by expressing their dynamics as a truncated series of Legendre polynomials. The series coefficients, as well as the network weights, are computed by minimizing the weighted sum of the loss function and the violation of the ODE-Net dynamics. The problem is solved by coordinate descent that alternately minimizes, with respect to the coefficients and the weights, two unconstrained sub-problems using standard backpropagation and gradient methods. The resulting optimization scheme is fully time-parallel and results in a low memory footprint. Experimental comparison to standard methods, such as backpropagation through explicit solvers and the adjoint technique (Chen et al., 2018), on training surrogate models of small and medium-scale dynamical systems shows that it is at least one order of magnitude faster at reaching a comparable value of the loss function. The corresponding testing MSE is one order of magnitude smaller as well, suggesting generalization capabilities increase.
Tasks
Published	2019-06-17
URL	https://arxiv.org/abs/1906.07038v2
PDF	https://arxiv.org/pdf/1906.07038v2.pdf
PWC	https://paperswithcode.com/paper/accelerating-neural-odes-with-spectral
Repo
Framework
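
The abstract above describes representing the dynamics as a truncated series of Legendre polynomials. As a rough illustration of that representation only (not the authors' training scheme), the sketch below evaluates such a series at a point in [-1, 1] using the standard three-term recurrence. The function names `legendre_values` and `truncated_series` are hypothetical and chosen for this example.

```python
def legendre_values(n_terms, t):
    """Return [P_0(t), ..., P_{n_terms-1}(t)] via Bonnet's recurrence:
    (k + 1) P_{k+1}(t) = (2k + 1) t P_k(t) - k P_{k-1}(t)."""
    values = [1.0, t][:n_terms]
    for k in range(1, n_terms - 1):
        values.append(((2 * k + 1) * t * values[k] - k * values[k - 1]) / (k + 1))
    return values


def truncated_series(coeffs, t):
    """Evaluate sum_k coeffs[k] * P_k(t) for a truncated Legendre series."""
    basis = legendre_values(len(coeffs), t)
    return sum(c * p for c, p in zip(coeffs, basis))


# t**2 is exactly (1/3) P_0(t) + (2/3) P_2(t), so three terms reproduce it.
value = truncated_series([1.0 / 3.0, 0.0, 2.0 / 3.0], 0.5)
```

In the paper's setting the coefficients would be optimized alongside the network weights; here they are fixed by hand purely to show how a short coefficient vector encodes a function of time.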

Quantitative analysis of Matthew effect and sparsity problem of recommender systems


Title	Quantitative analysis of Matthew effect and sparsity problem of recommender systems
Authors	Hao Wang, Zonghu Wang, Weishi Zhang
Abstract	Recommender systems have received great commercial success. Recommendation has been used widely in areas such as e-commerce, online music FM, online news portal, etc. However, several problems related to input data structure pose serious challenge to recommender system performance. Two of these problems are Matthew effect and sparsity problem. Matthew effect heavily skews recommender system output towards popular items. Data sparsity problem directly affects the coverage of recommendation result. Collaborative filtering is a simple benchmark ubiquitously adopted in the industry as the baseline for recommender system design. Understanding the underlying mechanism of collaborative filtering is crucial for further optimization. In this paper, we do a thorough quantitative analysis on Matthew effect and sparsity problem in the particular context setting of collaborative filtering. We compare the underlying mechanism of user-based and item-based collaborative filtering and give insight to industrial recommender system builders.
Tasks	Recommendation Systems
Published	2019-09-24
URL	https://arxiv.org/abs/1909.12798v1
PDF	https://arxiv.org/pdf/1909.12798v1.pdf
PWC	https://paperswithcode.com/paper/quantitative-analysis-of-matthew-effect-and
Repo
Framework

Face representation by deep learning: a linear encoding in a parameter space?


Title	Face representation by deep learning: a linear encoding in a parameter space?
Authors	Qiulei Dong, Jiayin Sun, Zhanyi Hu
Abstract	Recently, Convolutional Neural Networks (CNNs) have achieved tremendous performances on face recognition, and one popular perspective regarding CNNs’ success is that CNNs could learn discriminative face representations from face images with complex image feature encoding. However, it is still unclear what is the intrinsic mechanism of face representation in CNNs. In this work, we investigate this problem by formulating face images as points in a shape-appearance parameter space, and our results demonstrate that: (i) The encoding and decoding of the neuron responses (representations) to face images in CNNs could be achieved under a linear model in the parameter space, in agreement with the recent discovery in primate IT face neurons, but different from the aforementioned perspective on CNNs’ face representation with complex image feature encoding; (ii) The linear model for face encoding and decoding in the parameter space could achieve close or even better performances on face recognition and verification than state-of-the-art CNNs, which might provide new lights on the design strategies for face recognition systems; (iii) The neuron responses to face images in CNNs could not be adequately modelled by the axis model, a model recently proposed on face modelling in primate IT cortex. All these results might shed some lights on the often complained blackbox nature behind CNNs’ tremendous performances on face recognition.
Tasks	Face Recognition
Published	2019-10-22
URL	https://arxiv.org/abs/1910.09768v1
PDF	https://arxiv.org/pdf/1910.09768v1.pdf
PWC	https://paperswithcode.com/paper/face-representation-by-deep-learning-a-linear
Repo
Framework

When to Talk: Chatbot Controls the Timing of Talking during Multi-turn Open-domain Dialogue Generation


Title	When to Talk: Chatbot Controls the Timing of Talking during Multi-turn Open-domain Dialogue Generation
Authors	Tian Lan, Xianling Mao, Heyan Huang, Wei Wei
Abstract	Despite the multi-turn open-domain dialogue systems have attracted more and more attention and made great progress, the existing dialogue systems are still very boring. Nearly all the existing dialogue models only provide a response when the user’s utterance is accepted. But during daily conversations, humans always decide whether to continue to utter an utterance based on the context. Intuitively, a dialogue model that can control the timing of talking autonomously based on the conversation context can chat with humans more naturally. In this paper, we explore the dialogue system that automatically controls the timing of talking during the conversation. Specifically, we adopt the decision module for the existing dialogue models. Furthermore, modeling conversation context effectively is very important for controlling the timing of talking. So we also adopt the graph neural networks to process the context with the natural graph structure. Extensive experiments on two benchmarks show that controlling the timing of talking can effectively improve the quality of dialogue generation, and the proposed methods significantly improve the accuracy of the timing of talking. In addition, we have publicly released the codes of our proposed model.
Tasks	Chatbot, Dialogue Generation
Published	2019-12-20
URL	https://arxiv.org/abs/1912.09879v1
PDF	https://arxiv.org/pdf/1912.09879v1.pdf
PWC	https://paperswithcode.com/paper/when-to-talk-chatbot-controls-the-timing-of
Repo
Framework

Local Distance Restricted Bribery in Voting


Title	Local Distance Restricted Bribery in Voting
Authors	Palash Dey
Abstract	Studying complexity of various bribery problems has been one of the main research focus in computational social choice. In all the models of bribery studied so far, the briber has to pay every voter some amount of money depending on what the briber wants the voter to report and the briber has some budget at her disposal. Although these models successfully capture many real world applications, in many other scenarios, the voters may be unwilling to deviate too much from their true preferences. In this paper, we study the computational complexity of the problem of finding a preference profile which is as close to the true preference profile as possible and still achieves the briber’s goal subject to budget constraints. We call this problem Optimal Bribery. We consider three important measures of distances, namely, swap distance, footrule distance, and maximum displacement distance, and resolve the complexity of the optimal bribery problem for many common voting rules. We show that the problem is polynomial time solvable for the plurality and veto voting rules for all the three measures of distance. On the other hand, we prove that the problem is NP-complete for a class of scoring rules which includes the Borda voting rule, maximin, Copeland$^\alpha$ for any $\alpha\in[0,1]$, and Bucklin voting rules for all the three measures of distance even when the distance allowed per voter is $1$ for the swap and maximum displacement distances and $2$ for the footrule distance even without the budget constraints (which corresponds to having an infinite budget). For the $k$-approval voting rule for any constant $k>1$ and the simplified Bucklin voting rule, we show that the problem is NP-complete for the swap distance even when the distance allowed is $2$ and for the footrule distance even when the distance allowed is $4$ even without the budget constraints.
Tasks
Published	2019-01-25
URL	http://arxiv.org/abs/1901.08711v2
PDF	http://arxiv.org/pdf/1901.08711v2.pdf
PWC	https://paperswithcode.com/paper/local-distance-restricted-bribery-in-voting
Repo
Framework
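
The three distance measures named in the abstract above can be made concrete with a small sketch. It assumes the usual definitions for rankings over the same candidates: swap distance as the number of candidate pairs ordered differently, footrule distance as the sum of position shifts, and maximum displacement as the largest single shift. The function names are hypothetical, and this illustrates the measures only, not the paper's algorithms.

```python
from itertools import combinations


def positions(ranking):
    """Map each candidate to its index in the ranking (0 = most preferred)."""
    return {candidate: index for index, candidate in enumerate(ranking)}


def swap_distance(r1, r2):
    """Number of candidate pairs that the two rankings order differently."""
    p1, p2 = positions(r1), positions(r2)
    return sum(
        1
        for a, b in combinations(r1, 2)
        if (p1[a] - p1[b]) * (p2[a] - p2[b]) < 0
    )


def footrule_distance(r1, r2):
    """Sum over candidates of the absolute change in position."""
    p1, p2 = positions(r1), positions(r2)
    return sum(abs(p1[c] - p2[c]) for c in r1)


def max_displacement_distance(r1, r2):
    """Largest absolute change in position of any single candidate."""
    p1, p2 = positions(r1), positions(r2)
    return max(abs(p1[c] - p2[c]) for c in r1)


true_vote = ["a", "b", "c", "d"]
bribed_vote = ["b", "a", "c", "d"]  # one adjacent swap
```

For a single adjacent swap these give 1, 2, and 1 respectively, which lines up with the per-voter bounds quoted in the abstract (1 for swap and maximum displacement, 2 for footrule).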

A Boost in Revealing Subtle Facial Expressions: A Consolidated Eulerian Framework


Title	A Boost in Revealing Subtle Facial Expressions: A Consolidated Eulerian Framework
Authors	Wei Peng, Xiaopeng Hong, Yingyue Xu, Guoying Zhao
Abstract	Facial Micro-expression Recognition (MER) distinguishes the underlying emotional states of spontaneous subtle facial expressions. Automatic MER is challenging because that 1) the intensity of subtle facial muscle movement is extremely low and 2) the duration of ME is transient. Recent works adopt motion magnification or time interpolation to resolve these issues. Nevertheless, existing works divide them into two separate modules due to their non-linearity. Though such operation eases the difficulty in implementation, it ignores their underlying connections and thus results in inevitable losses in both accuracy and speed. Instead, in this paper, we explore their underlying joint formulations and propose a consolidated Eulerian framework to reveal the subtle facial movements. It expands the temporal duration and amplifies the muscle movements in micro-expressions simultaneously. Compared to existing approaches, the proposed method can not only process ME clips more efficiently but also make subtle ME movements more distinguishable. Experiments on two public MER databases indicate that our model outperforms the state-of-the-art in both speed and accuracy.
Tasks
Published	2019-01-23
URL	http://arxiv.org/abs/1901.07765v1
PDF	http://arxiv.org/pdf/1901.07765v1.pdf
PWC	https://paperswithcode.com/paper/a-boost-in-revealing-subtle-facial
Repo
Framework

Efficiency through Auto-Sizing: Notre Dame NLP’s Submission to the WNGT 2019 Efficiency Task


Title	Efficiency through Auto-Sizing: Notre Dame NLP’s Submission to the WNGT 2019 Efficiency Task
Authors	Kenton Murray, Brian DuSell, David Chiang
Abstract	This paper describes the Notre Dame Natural Language Processing Group’s (NDNLP) submission to the WNGT 2019 shared task (Hayashi et al., 2019). We investigated the impact of auto-sizing (Murray and Chiang, 2015; Murray et al., 2019) to the Transformer network (Vaswani et al., 2017) with the goal of substantially reducing the number of parameters in the model. Our method was able to eliminate more than 25% of the model’s parameters while suffering a decrease of only 1.1 BLEU.
Tasks
Published	2019-10-16
URL	https://arxiv.org/abs/1910.07134v1
PDF	https://arxiv.org/pdf/1910.07134v1.pdf
PWC	https://paperswithcode.com/paper/efficiency-through-auto-sizing-notre-dame
Repo
Framework

Character-based Surprisal as a Model of Reading Difficulty in the Presence of Error


Title	Character-based Surprisal as a Model of Reading Difficulty in the Presence of Error
Authors	Michael Hahn, Frank Keller, Yonatan Bisk, Yonatan Belinkov
Abstract	Intuitively, human readers cope easily with errors in text; typos, misspelling, word substitutions, etc. do not unduly disrupt natural reading. Previous work indicates that letter transpositions result in increased reading times, but it is unclear if this effect generalizes to more natural errors. In this paper, we report an eye-tracking study that compares two error types (letter transpositions and naturally occurring misspelling) and two error rates (10% or 50% of all words contain errors). We find that human readers show unimpaired comprehension in spite of these errors, but error words cause more reading difficulty than correct words. Also, transpositions are more difficult than misspellings, and a high error rate increases difficulty for all words, including correct ones. We then present a computational model that uses character-based (rather than traditional word-based) surprisal to account for these results. The model explains that transpositions are harder than misspellings because they contain unexpected letter combinations. It also explains the error rate effect: upcoming words are more difficult to predict when the context is degraded, leading to increased surprisal.
Tasks	Eye Tracking
Published	2019-02-02
URL	https://arxiv.org/abs/1902.00595v3
PDF	https://arxiv.org/pdf/1902.00595v3.pdf
PWC	https://paperswithcode.com/paper/character-based-surprisal-as-a-model-of-human
Repo
Framework
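
To make the idea of character-based surprisal in the abstract above concrete, here is a minimal sketch that scores a word as the sum of -log2 P(character | previous character). It assumes a character bigram model with add-one smoothing over the observed alphabet, which is a deliberately simple stand-in and not the model used in the paper. The names `train_bigram` and `word_surprisal` and the toy corpus are hypothetical.

```python
import math
from collections import Counter


def train_bigram(text):
    """Count character bigrams and their left contexts in the training text."""
    pair_counts = Counter(zip(text, text[1:]))
    context_counts = Counter(text[:-1])
    alphabet_size = len(set(text))
    return pair_counts, context_counts, alphabet_size


def word_surprisal(model, word):
    """Sum of -log2 P(char | previous char), starting from a space context."""
    pair_counts, context_counts, alphabet_size = model
    padded = " " + word
    total = 0.0
    for prev, cur in zip(padded, padded[1:]):
        # Add-one smoothing keeps unseen letter combinations at nonzero probability.
        prob = (pair_counts[(prev, cur)] + 1) / (context_counts[prev] + alphabet_size)
        total += -math.log2(prob)
    return total


model = train_bigram("the cat sat on the mat")
correct = word_surprisal(model, "the")
transposed = word_surprisal(model, "teh")
```

On this toy corpus the transposed form "teh" receives a higher surprisal than "the" because the pairs "te" and "eh" never occur in the training text, mirroring the abstract's point about unexpected letter combinations.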

Joint Spatial and Layer Attention for Convolutional Networks


Title	Joint Spatial and Layer Attention for Convolutional Networks
Authors	Tony Joseph, Konstantinos G. Derpanis, Faisal Z. Qureshi
Abstract	In this paper, we propose a novel approach that learns to sequentially attend to different Convolutional Neural Networks (CNN) layers (i.e., “what” feature abstraction to attend to) and different spatial locations of the selected feature map (i.e., “where”) to perform the task at hand. Specifically, at each Recurrent Neural Network (RNN) step, both a CNN layer and localized spatial region within it are selected for further processing. We demonstrate the effectiveness of this approach on two computer vision tasks: (i) image-based six degree of freedom camera pose regression and (ii) indoor scene classification. Empirically, we show that combining the “what” and “where” aspects of attention improves network performance on both tasks. We evaluate our method on standard benchmarks for camera localization (Cambridge, 7-Scenes, and TUM-LSI) and for scene classification (MIT-67 Indoor Scenes). For camera localization our approach reduces the median error by 18.8% for position and 8.2% for orientation (averaged over all scenes), and for scene classification it improves the mean accuracy by 3.4% over previous methods.
Tasks	Camera Localization, Scene Classification
Published	2019-01-16
URL	https://arxiv.org/abs/1901.05376v2
PDF	https://arxiv.org/pdf/1901.05376v2.pdf
PWC	https://paperswithcode.com/paper/uan-unified-attention-network-for
Repo
Framework

Feature reinforcement with word embedding and parsing information in neural TTS


Title	Feature reinforcement with word embedding and parsing information in neural TTS
Authors	Huaiping Ming, Lei He, Haohan Guo, Frank K. Soong
Abstract	In this paper, we propose a feature reinforcement method under the sequence-to-sequence neural text-to-speech (TTS) synthesis framework. The proposed method utilizes the multiple input encoder to take three levels of text information, i.e., phoneme sequence, pre-trained word embedding, and grammatical structure of sentences from parser as the input feature for the neural TTS system. The added word and sentence level information can be viewed as the feature based pre-training strategy, which clearly enhances the model generalization ability. The proposed method not only improves the system robustness significantly but also improves the synthesized speech to near recording quality in our experiments for out-of-domain text.
Tasks
Published	2019-01-03
URL	http://arxiv.org/abs/1901.00707v2
PDF	http://arxiv.org/pdf/1901.00707v2.pdf
PWC	https://paperswithcode.com/paper/feature-reinforcement-with-word-embedding-and
Repo
Framework

Fingerprint Spoof Detection: Temporal Analysis of Image Sequence


Title	Fingerprint Spoof Detection: Temporal Analysis of Image Sequence
Authors	Tarang Chugh, Anil K. Jain
Abstract	We utilize the dynamics involved in the imaging of a fingerprint on a touch-based fingerprint reader, such as perspiration, changes in skin color (blanching), and skin distortion, to differentiate real fingers from spoof (fake) fingers. Specifically, we utilize a deep learning-based architecture (CNN-LSTM) trained end-to-end using sequences of minutiae-centered local patches extracted from ten color frames captured on a COTS fingerprint reader. A time-distributed CNN (MobileNet-v1) extracts spatial features from each local patch, while a bi-directional LSTM layer learns the temporal relationship between the patches in the sequence. Experimental results on a database of 26,650 live frames from 685 subjects (1,333 unique fingers), and 32,910 spoof frames of 7 spoof materials (with 14 variants) shows the superiority of the proposed approach in both known-material and cross-material (generalization) scenarios. For instance, the proposed approach improves the state-of-the-art cross-material performance from TDR of 81.65% to 86.20% @ FDR = 0.2%.
Tasks
Published	2019-12-17
URL	https://arxiv.org/abs/1912.08240v1
PDF	https://arxiv.org/pdf/1912.08240v1.pdf
PWC	https://paperswithcode.com/paper/fingerprint-spoof-detection-temporal-analysis
Repo
Framework

Single-Camera Basketball Tracker through Pose and Semantic Feature Fusion


Title	Single-Camera Basketball Tracker through Pose and Semantic Feature Fusion
Authors	Adrià Arbués-Sangüesa, Coloma Ballester, Gloria Haro
Abstract	Tracking sports players is a widely challenging scenario, especially in single-feed videos recorded in tight courts, where cluttering and occlusions cannot be avoided. This paper presents an analysis of several geometric and semantic visual features to detect and track basketball players. An ablation study is carried out and then used to remark that a robust tracker can be built with Deep Learning features, without the need of extracting contextual ones, such as proximity or color similarity, nor applying camera stabilization techniques. The presented tracker consists of: (1) a detection step, which uses a pretrained deep learning model to estimate the players' pose, followed by (2) a tracking step, which leverages pose and semantic information from the output of a convolutional layer in a VGG network. Its performance is analyzed in terms of MOTA over a basketball dataset with more than 10k instances.
Tasks
Published	2019-06-05
URL	https://arxiv.org/abs/1906.02042v2
PDF	https://arxiv.org/pdf/1906.02042v2.pdf
PWC	https://paperswithcode.com/paper/single-camera-basketball-tracker-through-pose
Repo
Framework

3D-RelNet: Joint Object and Relational Network for 3D Prediction


Title	3D-RelNet: Joint Object and Relational Network for 3D Prediction
Authors	Nilesh Kulkarni, Ishan Misra, Shubham Tulsiani, Abhinav Gupta
Abstract	We propose an approach to predict the 3D shape and pose for the objects present in a scene. Existing learning based methods that pursue this goal make independent predictions per object, and do not leverage the relationships amongst them. We argue that reasoning about these relationships is crucial, and present an approach to incorporate these in a 3D prediction framework. In addition to independent per-object predictions, we predict pairwise relations in the form of relative 3D pose, and demonstrate that these can be easily incorporated to improve object level estimates. We report performance across different datasets (SUNCG, NYUv2), and show that our approach significantly improves over independent prediction approaches while also outperforming alternate implicit reasoning methods.
Tasks
Published	2019-06-06
URL	https://arxiv.org/abs/1906.02729v3
PDF	https://arxiv.org/pdf/1906.02729v3.pdf
PWC	https://paperswithcode.com/paper/3d-relnet-joint-object-and-relational-network-1
Repo
Framework

Where to Look Next: Unsupervised Active Visual Exploration on 360° Input


Title	Where to Look Next: Unsupervised Active Visual Exploration on 360° Input
Authors	Soroush Seifi, Tinne Tuytelaars
Abstract	We address the problem of active visual exploration of large 360° inputs. In our setting an active agent with a limited camera bandwidth explores its 360° environment by changing its viewing direction at limited discrete time steps. As such, it observes the world as a sequence of narrow field-of-view ‘glimpses’, deciding for itself where to look next. Our proposed method exceeds previous works’ performance by a significant margin without the need for deep reinforcement learning or training separate networks as sidekicks. A key component of our system are the spatial memory maps that make the system aware of the glimpses’ orientations (locations in the 360° image). Further, we stress the advantages of retina-like glimpses when the agent’s sensor bandwidth and time-steps are limited. Finally, we use our trained model to do classification of the whole scene using only the information observed in the glimpses.
Tasks
Published	2019-09-23
URL	https://arxiv.org/abs/1909.10304v2
PDF	https://arxiv.org/pdf/1909.10304v2.pdf
PWC	https://paperswithcode.com/paper/190910304
Repo
Framework

Differentiable Causal Computations via Delayed Trace


Title	Differentiable Causal Computations via Delayed Trace
Authors	David Sprunger, Shin-ya Katsumata
Abstract	We investigate causal computations taking sequences of inputs to sequences of outputs where the $n$th output depends on the first $n$ inputs only. We model these in category theory via a construction taking a Cartesian category $C$ to another category $St(C)$ with a novel trace-like operation called “delayed trace”, which misses yanking and dinaturality axioms of the usual trace. The delayed trace operation provides a feedback mechanism in $St(C)$ with an implicit guardedness guarantee. When $C$ is equipped with a Cartesian differential operator, we construct a differential operator for $St(C)$ using an abstract version of backpropagation through time, a technique from machine learning based on unrolling of functions. This obtains a swath of properties for backpropagation through time, including a chain rule and Schwartz theorem. Our differential operator is also able to compute the derivative of a stateful network without requiring the network to be unrolled.
Tasks
Published	2019-03-04
URL	http://arxiv.org/abs/1903.01093v1
PDF	http://arxiv.org/pdf/1903.01093v1.pdf
PWC	https://paperswithcode.com/paper/differentiable-causal-computations-via
Repo
Framework