Paper Group ANR 240
SNODE: Spectral Discretization of Neural ODEs for System Identification
Title | SNODE: Spectral Discretization of Neural ODEs for System Identification |
Authors | Alessio Quaglino, Marco Gallieri, Jonathan Masci, Jan Koutník |
Abstract | This paper proposes the use of spectral element methods (Canuto et al., 1988) for fast and accurate training of Neural Ordinary Differential Equations (ODE-Nets; Chen et al., 2018) for system identification. This is achieved by expressing their dynamics as a truncated series of Legendre polynomials. The series coefficients, as well as the network weights, are computed by minimizing the weighted sum of the loss function and the violation of the ODE-Net dynamics. The problem is solved by coordinate descent that alternately minimizes, with respect to the coefficients and the weights, two unconstrained sub-problems using standard backpropagation and gradient methods. The resulting optimization scheme is fully time-parallel and results in a low memory footprint. Experimental comparison to standard methods, such as backpropagation through explicit solvers and the adjoint technique (Chen et al., 2018), on training surrogate models of small and medium-scale dynamical systems shows that it is at least one order of magnitude faster at reaching a comparable value of the loss function. The corresponding testing MSE is one order of magnitude smaller as well, suggesting that generalization capabilities increase. |
Tasks | |
Published | 2019-06-17 |
URL | https://arxiv.org/abs/1906.07038v2 |
https://arxiv.org/pdf/1906.07038v2.pdf | |
PWC | https://paperswithcode.com/paper/accelerating-neural-odes-with-spectral |
Repo | |
Framework | |
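As an illustration of the two ingredients named in the SNODE abstract (a truncated Legendre series and the violation of the dynamics), here is a minimal sketch. It is not the authors' implementation: the toy dynamics dx/dt = -x, the helper name `legendre_residual`, and the least-squares fit used in place of coordinate descent are all assumptions made for this example.

```python
import numpy as np
from numpy.polynomial import legendre as L


def legendre_residual(coeffs, f, t):
    """Mean squared violation of dx/dt = f(x) for x(t) = sum_k coeffs[k] * P_k(t)."""
    x = L.legval(t, coeffs)              # trajectory from the truncated series
    dx = L.legval(t, L.legder(coeffs))   # its time derivative, computed on the coefficients
    return float(np.mean((dx - f(x)) ** 2))


# Hypothetical toy system: dx/dt = -x on t in [-1, 1], with exact solution exp(-t).
t = np.linspace(-1.0, 1.0, 200)
f = lambda x: -x

# A degree-10 series represents the trajectory well; a degree-1 series does not.
good = legendre_residual(L.legfit(t, np.exp(-t), deg=10), f, t)
bad = legendre_residual(L.legfit(t, np.exp(-t), deg=1), f, t)
print(good, bad)
```

Because the residual is evaluated at all time points at once rather than by stepping a solver, this kind of term can be computed in parallel over time, which is the property the abstract highlights.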
Quantitative analysis of Matthew effect and sparsity problem of recommender systems
Title | Quantitative analysis of Matthew effect and sparsity problem of recommender systems |
Authors | Hao Wang, Zonghu Wang, Weishi Zhang |
Abstract | Recommender systems have achieved great commercial success. Recommendation has been used widely in areas such as e-commerce, online music FM, online news portals, etc. However, several problems related to the structure of the input data pose serious challenges to recommender system performance. Two of these problems are the Matthew effect and the sparsity problem. The Matthew effect heavily skews recommender system output towards popular items. The data sparsity problem directly affects the coverage of recommendation results. Collaborative filtering is a simple benchmark ubiquitously adopted in the industry as the baseline for recommender system design. Understanding the underlying mechanism of collaborative filtering is crucial for further optimization. In this paper, we do a thorough quantitative analysis of the Matthew effect and the sparsity problem in the particular setting of collaborative filtering. We compare the underlying mechanisms of user-based and item-based collaborative filtering and give insight to industrial recommender system builders. |
Tasks | Recommendation Systems |
Published | 2019-09-24 |
URL | https://arxiv.org/abs/1909.12798v1 |
https://arxiv.org/pdf/1909.12798v1.pdf | |
PWC | https://paperswithcode.com/paper/quantitative-analysis-of-matthew-effect-and |
Repo | |
Framework | |
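The abstract does not say which quantities the paper measures. As a hedged illustration only, the sketch below assumes two common choices: the fraction of empty cells in a user-item matrix as the sparsity, and the Gini coefficient of item popularity as a proxy for the Matthew effect. The function names and the toy matrix are hypothetical.

```python
import numpy as np


def sparsity(interactions):
    """Fraction of user-item cells with no recorded interaction."""
    interactions = np.asarray(interactions)
    return 1.0 - np.count_nonzero(interactions) / interactions.size


def gini(counts):
    """Gini coefficient of non-negative counts: 0 means equal, near 1 means concentrated."""
    x = np.sort(np.asarray(counts, dtype=float))
    n = x.size
    if x.sum() == 0:
        return 0.0
    ranks = np.arange(1, n + 1)
    return float(2.0 * np.sum(ranks * x) / (n * x.sum()) - (n + 1) / n)


# Toy matrix (rows: users, columns: items); 1 marks an interaction.
R = np.array([
    [1, 0, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
])
item_popularity = R.sum(axis=0)  # interactions per item
print(sparsity(R), gini(item_popularity))
```

A Gini value close to 1 on item popularity indicates that a few items attract most interactions, which is the skew the abstract attributes to the Matthew effect.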
Face representation by deep learning: a linear encoding in a parameter space?
Title | Face representation by deep learning: a linear encoding in a parameter space? |
Authors | Qiulei Dong, Jiayin Sun, Zhanyi Hu |
Abstract | Recently, Convolutional Neural Networks (CNNs) have achieved tremendous performance on face recognition, and one popular perspective regarding CNNs’ success is that CNNs could learn discriminative face representations from face images with complex image feature encoding. However, it is still unclear what the intrinsic mechanism of face representation in CNNs is. In this work, we investigate this problem by formulating face images as points in a shape-appearance parameter space, and our results demonstrate that: (i) The encoding and decoding of the neuron responses (representations) to face images in CNNs could be achieved under a linear model in the parameter space, in agreement with the recent discovery in primate IT face neurons, but different from the aforementioned perspective on CNNs’ face representation with complex image feature encoding; (ii) The linear model for face encoding and decoding in the parameter space could achieve close or even better performance on face recognition and verification than state-of-the-art CNNs, which might shed new light on the design strategies for face recognition systems; (iii) The neuron responses to face images in CNNs could not be adequately modelled by the axis model, a model recently proposed for face modelling in primate IT cortex. All these results might shed some light on the often-criticized black-box nature behind CNNs’ tremendous performance on face recognition. |
Tasks | Face Recognition |
Published | 2019-10-22 |
URL | https://arxiv.org/abs/1910.09768v1 |
https://arxiv.org/pdf/1910.09768v1.pdf | |
PWC | https://paperswithcode.com/paper/face-representation-by-deep-learning-a-linear |
Repo | |
Framework | |
When to Talk: Chatbot Controls the Timing of Talking during Multi-turn Open-domain Dialogue Generation
Title | When to Talk: Chatbot Controls the Timing of Talking during Multi-turn Open-domain Dialogue Generation |
Authors | Tian Lan, Xianling Mao, Heyan Huang, Wei Wei |
Abstract | Although multi-turn open-domain dialogue systems have attracted more and more attention and made great progress, existing dialogue systems are still very boring. Nearly all existing dialogue models only provide a response when the user’s utterance is accepted. But during daily conversations, humans always decide whether to continue speaking based on the context. Intuitively, a dialogue model that can control the timing of talking autonomously based on the conversation context can chat with humans more naturally. In this paper, we explore a dialogue system that automatically controls the timing of talking during the conversation. Specifically, we add a decision module to existing dialogue models. Furthermore, modeling conversation context effectively is very important for controlling the timing of talking, so we also adopt graph neural networks to process the context with its natural graph structure. Extensive experiments on two benchmarks show that controlling the timing of talking can effectively improve the quality of dialogue generation, and the proposed methods significantly improve the accuracy of the timing of talking. In addition, we have publicly released the code of our proposed model. |
Tasks | Chatbot, Dialogue Generation |
Published | 2019-12-20 |
URL | https://arxiv.org/abs/1912.09879v1 |
https://arxiv.org/pdf/1912.09879v1.pdf | |
PWC | https://paperswithcode.com/paper/when-to-talk-chatbot-controls-the-timing-of |
Repo | |
Framework | |
Local Distance Restricted Bribery in Voting
Title | Local Distance Restricted Bribery in Voting |
Authors | Palash Dey |
Abstract | Studying the complexity of various bribery problems has been one of the main research focuses in computational social choice. In all the models of bribery studied so far, the briber has to pay every voter some amount of money depending on what the briber wants the voter to report and the briber has some budget at her disposal. Although these models successfully capture many real world applications, in many other scenarios, the voters may be unwilling to deviate too much from their true preferences. In this paper, we study the computational complexity of the problem of finding a preference profile which is as close to the true preference profile as possible and still achieves the briber’s goal subject to budget constraints. We call this problem Optimal Bribery. We consider three important distance measures, namely, swap distance, footrule distance, and maximum displacement distance, and resolve the complexity of the optimal bribery problem for many common voting rules. We show that the problem is polynomial time solvable for the plurality and veto voting rules for all the three measures of distance. On the other hand, we prove that the problem is NP-complete for a class of scoring rules which includes the Borda voting rule, maximin, Copeland$^\alpha$ for any $\alpha\in[0,1]$, and Bucklin voting rules for all the three measures of distance even when the distance allowed per voter is $1$ for the swap and maximum displacement distances and $2$ for the footrule distance even without the budget constraints (which corresponds to having an infinite budget). For the $k$-approval voting rule for any constant $k>1$ and the simplified Bucklin voting rule, we show that the problem is NP-complete for the swap distance even when the distance allowed is $2$ and for the footrule distance even when the distance allowed is $4$ even without the budget constraints. |
Tasks | |
Published | 2019-01-25 |
URL | http://arxiv.org/abs/1901.08711v2 |
http://arxiv.org/pdf/1901.08711v2.pdf | |
PWC | https://paperswithcode.com/paper/local-distance-restricted-bribery-in-voting |
Repo | |
Framework | |
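The three distance measures named in the abstract can be sketched as follows, assuming their usual definitions on rankings over the same candidates: swap distance counts candidate pairs ordered differently, footrule distance sums the position shifts, and maximum displacement takes the largest shift. The function names are illustrative and not taken from the paper.

```python
from itertools import combinations


def _positions(ranking):
    """Map each candidate to its 0-based position in the ranking."""
    return {candidate: i for i, candidate in enumerate(ranking)}


def swap_distance(a, b):
    """Number of candidate pairs that the two rankings order differently."""
    pa, pb = _positions(a), _positions(b)
    return sum(
        1 for x, y in combinations(a, 2)
        if (pa[x] - pa[y]) * (pb[x] - pb[y]) < 0
    )


def footrule_distance(a, b):
    """Sum over candidates of the absolute change in position."""
    pa, pb = _positions(a), _positions(b)
    return sum(abs(pa[c] - pb[c]) for c in a)


def max_displacement(a, b):
    """Largest absolute change in position of any single candidate."""
    pa, pb = _positions(a), _positions(b)
    return max(abs(pa[c] - pb[c]) for c in a)


true_vote = ["a", "b", "c"]
bribed_vote = ["b", "a", "c"]  # one adjacent swap
print(swap_distance(true_vote, bribed_vote),
      footrule_distance(true_vote, bribed_vote),
      max_displacement(true_vote, bribed_vote))
```

A single adjacent swap has swap distance 1, footrule distance 2, and maximum displacement 1, which lines up with the per-voter bounds of 1, 2, and 1 quoted in the hardness results.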
A Boost in Revealing Subtle Facial Expressions: A Consolidated Eulerian Framework
Title | A Boost in Revealing Subtle Facial Expressions: A Consolidated Eulerian Framework |
Authors | Wei Peng, Xiaopeng Hong, Yingyue Xu, Guoying Zhao |
Abstract | Facial Micro-expression Recognition (MER) distinguishes the underlying emotional states of spontaneous subtle facial expressions. Automatic MER is challenging because 1) the intensity of subtle facial muscle movement is extremely low and 2) the duration of ME is transient. Recent works adopt motion magnification or time interpolation to resolve these issues. Nevertheless, existing works divide them into two separate modules due to their non-linearity. Though such operation eases the difficulty in implementation, it ignores their underlying connections and thus results in inevitable losses in both accuracy and speed. Instead, in this paper, we explore their underlying joint formulations and propose a consolidated Eulerian framework to reveal the subtle facial movements. It expands the temporal duration and amplifies the muscle movements in micro-expressions simultaneously. Compared to existing approaches, the proposed method can not only process ME clips more efficiently but also make subtle ME movements more distinguishable. Experiments on two public MER databases indicate that our model outperforms the state-of-the-art in both speed and accuracy. |
Tasks | |
Published | 2019-01-23 |
URL | http://arxiv.org/abs/1901.07765v1 |
http://arxiv.org/pdf/1901.07765v1.pdf | |
PWC | https://paperswithcode.com/paper/a-boost-in-revealing-subtle-facial |
Repo | |
Framework | |
Efficiency through Auto-Sizing: Notre Dame NLP’s Submission to the WNGT 2019 Efficiency Task
Title | Efficiency through Auto-Sizing: Notre Dame NLP’s Submission to the WNGT 2019 Efficiency Task |
Authors | Kenton Murray, Brian DuSell, David Chiang |
Abstract | This paper describes the Notre Dame Natural Language Processing Group’s (NDNLP) submission to the WNGT 2019 shared task (Hayashi et al., 2019). We investigated the impact of auto-sizing (Murray and Chiang, 2015; Murray et al., 2019) to the Transformer network (Vaswani et al., 2017) with the goal of substantially reducing the number of parameters in the model. Our method was able to eliminate more than 25% of the model’s parameters while suffering a decrease of only 1.1 BLEU. |
Tasks | |
Published | 2019-10-16 |
URL | https://arxiv.org/abs/1910.07134v1 |
https://arxiv.org/pdf/1910.07134v1.pdf | |
PWC | https://paperswithcode.com/paper/efficiency-through-auto-sizing-notre-dame |
Repo | |
Framework | |
Character-based Surprisal as a Model of Reading Difficulty in the Presence of Error
Title | Character-based Surprisal as a Model of Reading Difficulty in the Presence of Error |
Authors | Michael Hahn, Frank Keller, Yonatan Bisk, Yonatan Belinkov |
Abstract | Intuitively, human readers cope easily with errors in text; typos, misspelling, word substitutions, etc. do not unduly disrupt natural reading. Previous work indicates that letter transpositions result in increased reading times, but it is unclear if this effect generalizes to more natural errors. In this paper, we report an eye-tracking study that compares two error types (letter transpositions and naturally occurring misspelling) and two error rates (10% or 50% of all words contain errors). We find that human readers show unimpaired comprehension in spite of these errors, but error words cause more reading difficulty than correct words. Also, transpositions are more difficult than misspellings, and a high error rate increases difficulty for all words, including correct ones. We then present a computational model that uses character-based (rather than traditional word-based) surprisal to account for these results. The model explains that transpositions are harder than misspellings because they contain unexpected letter combinations. It also explains the error rate effect: upcoming words are more difficult to predict when the context is degraded, leading to increased surprisal. |
Tasks | Eye Tracking |
Published | 2019-02-02 |
URL | https://arxiv.org/abs/1902.00595v3 |
https://arxiv.org/pdf/1902.00595v3.pdf | |
PWC | https://paperswithcode.com/paper/character-based-surprisal-as-a-model-of-human |
Repo | |
Framework | |
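To show what character-based surprisal means in practice, the sketch below scores a word as the sum of -log2 p(character | previous character). The add-one-smoothed bigram model and the tiny corpus are stand-ins chosen for brevity and are assumptions of this example; the abstract does not specify the paper's model, so this is not a reproduction of it.

```python
import math
from collections import Counter


def train_bigram(text):
    """Collect the character vocabulary, bigram counts, and context counts."""
    vocab = sorted(set(text))
    pair_counts = Counter(zip(text, text[1:]))
    context_counts = Counter(text[:-1])
    return vocab, pair_counts, context_counts


def surprisal(word, model, prev=" ", alpha=1.0):
    """Total surprisal in bits of `word`, reading it after the character `prev`."""
    vocab, pair_counts, context_counts = model
    bits = 0.0
    for ch in word:
        # Add-alpha smoothing keeps unseen letter combinations at non-zero probability.
        p = (pair_counts[(prev, ch)] + alpha) / (context_counts[prev] + alpha * len(vocab))
        bits += -math.log2(p)
        prev = ch
    return bits


# Hypothetical toy corpus; a realistic model would need far more text.
model = train_bigram("the cat sat on the mat and the hat sat on the cat")
print(surprisal("the", model), surprisal("teh", model))
```

The transposed form "teh" contains the letter pairs "te" and "eh", which never occur in the toy corpus, so it receives higher surprisal than "the". This mirrors the abstract's explanation that transpositions are hard because they contain unexpected letter combinations.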
Joint Spatial and Layer Attention for Convolutional Networks
Title | Joint Spatial and Layer Attention for Convolutional Networks |
Authors | Tony Joseph, Konstantinos G. Derpanis, Faisal Z. Qureshi |
Abstract | In this paper, we propose a novel approach that learns to sequentially attend to different Convolutional Neural Networks (CNN) layers (i.e., “what” feature abstraction to attend to) and different spatial locations of the selected feature map (i.e., “where”) to perform the task at hand. Specifically, at each Recurrent Neural Network (RNN) step, both a CNN layer and localized spatial region within it are selected for further processing. We demonstrate the effectiveness of this approach on two computer vision tasks: (i) image-based six degree of freedom camera pose regression and (ii) indoor scene classification. Empirically, we show that combining the “what” and “where” aspects of attention improves network performance on both tasks. We evaluate our method on standard benchmarks for camera localization (Cambridge, 7-Scenes, and TUM-LSI) and for scene classification (MIT-67 Indoor Scenes). For camera localization our approach reduces the median error by 18.8% for position and 8.2% for orientation (averaged over all scenes), and for scene classification it improves the mean accuracy by 3.4% over previous methods. |
Tasks | Camera Localization, Scene Classification |
Published | 2019-01-16 |
URL | https://arxiv.org/abs/1901.05376v2 |
https://arxiv.org/pdf/1901.05376v2.pdf | |
PWC | https://paperswithcode.com/paper/uan-unified-attention-network-for |
Repo | |
Framework | |
Feature reinforcement with word embedding and parsing information in neural TTS
Title | Feature reinforcement with word embedding and parsing information in neural TTS |
Authors | Huaiping Ming, Lei He, Haohan Guo, Frank K. Soong |
Abstract | In this paper, we propose a feature reinforcement method under the sequence-to-sequence neural text-to-speech (TTS) synthesis framework. The proposed method utilizes the multiple input encoder to take three levels of text information, i.e., phoneme sequence, pre-trained word embedding, and grammatical structure of sentences from parser as the input feature for the neural TTS system. The added word and sentence level information can be viewed as the feature based pre-training strategy, which clearly enhances the model generalization ability. The proposed method not only improves the system robustness significantly but also improves the synthesized speech to near recording quality in our experiments for out-of-domain text. |
Tasks | |
Published | 2019-01-03 |
URL | http://arxiv.org/abs/1901.00707v2 |
http://arxiv.org/pdf/1901.00707v2.pdf | |
PWC | https://paperswithcode.com/paper/feature-reinforcement-with-word-embedding-and |
Repo | |
Framework | |
Fingerprint Spoof Detection: Temporal Analysis of Image Sequence
Title | Fingerprint Spoof Detection: Temporal Analysis of Image Sequence |
Authors | Tarang Chugh, Anil K. Jain |
Abstract | We utilize the dynamics involved in the imaging of a fingerprint on a touch-based fingerprint reader, such as perspiration, changes in skin color (blanching), and skin distortion, to differentiate real fingers from spoof (fake) fingers. Specifically, we utilize a deep learning-based architecture (CNN-LSTM) trained end-to-end using sequences of minutiae-centered local patches extracted from ten color frames captured on a COTS fingerprint reader. A time-distributed CNN (MobileNet-v1) extracts spatial features from each local patch, while a bi-directional LSTM layer learns the temporal relationship between the patches in the sequence. Experimental results on a database of 26,650 live frames from 685 subjects (1,333 unique fingers), and 32,910 spoof frames of 7 spoof materials (with 14 variants) show the superiority of the proposed approach in both known-material and cross-material (generalization) scenarios. For instance, the proposed approach improves the state-of-the-art cross-material performance from TDR of 81.65% to 86.20% @ FDR = 0.2%. |
Tasks | |
Published | 2019-12-17 |
URL | https://arxiv.org/abs/1912.08240v1 |
https://arxiv.org/pdf/1912.08240v1.pdf | |
PWC | https://paperswithcode.com/paper/fingerprint-spoof-detection-temporal-analysis |
Repo | |
Framework | |
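The reported metric, TDR at a fixed FDR, can be computed from detector scores roughly as below. The sketch assumes that TDR is the fraction of spoof samples flagged as spoof, that FDR is the fraction of live samples wrongly flagged as spoof, and that higher scores mean "more spoof-like". The function name and the toy scores are hypothetical.

```python
import numpy as np


def tdr_at_fdr(live_scores, spoof_scores, fdr=0.002):
    """True detection rate at the strictest threshold that flags at most `fdr` of live samples."""
    live = np.sort(np.asarray(live_scores, dtype=float))[::-1]  # descending
    spoof = np.asarray(spoof_scores, dtype=float)
    k = int(np.floor(fdr * live.size))  # number of live samples allowed to be flagged
    # Scores strictly above the (k+1)-th largest live score flag at most k live samples.
    threshold = live[k] if k < live.size else -np.inf
    return float(np.mean(spoof > threshold))


# Toy scores for illustration only.
live = [0.1, 0.2, 0.3, 0.4]
spoof = [0.35, 0.5, 0.9, 0.45]
print(tdr_at_fdr(live, spoof, fdr=0.25), tdr_at_fdr(live, spoof, fdr=0.0))
```

Fixing the FDR first and reading off the TDR is what makes two detectors comparable: both are evaluated at the same rate of inconvenience to genuine users.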
Single-Camera Basketball Tracker through Pose and Semantic Feature Fusion
Title | Single-Camera Basketball Tracker through Pose and Semantic Feature Fusion |
Authors | Adrià Arbués-Sangüesa, Coloma Ballester, Gloria Haro |
Abstract | Tracking sports players is a challenging scenario, especially in single-feed videos recorded in tight courts, where cluttering and occlusions cannot be avoided. This paper presents an analysis of several geometric and semantic visual features to detect and track basketball players. An ablation study is carried out and then used to show that a robust tracker can be built with Deep Learning features, without the need to extract contextual ones, such as proximity or color similarity, or to apply camera stabilization techniques. The presented tracker consists of: (1) a detection step, which uses a pretrained deep learning model to estimate the players’ pose, followed by (2) a tracking step, which leverages pose and semantic information from the output of a convolutional layer in a VGG network. Its performance is analyzed in terms of MOTA over a basketball dataset with more than 10k instances. |
Tasks | |
Published | 2019-06-05 |
URL | https://arxiv.org/abs/1906.02042v2 |
https://arxiv.org/pdf/1906.02042v2.pdf | |
PWC | https://paperswithcode.com/paper/single-camera-basketball-tracker-through-pose |
Repo | |
Framework | |
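The abstract reports MOTA without defining it. Assuming the standard CLEAR MOT definition, MOTA is one minus the ratio of all tracking errors (missed targets, false positives, identity switches) to the number of ground-truth instances. The counts in this sketch are invented for illustration and are not results from the paper.

```python
def mota(false_negatives, false_positives, id_switches, num_ground_truth):
    """Multiple Object Tracking Accuracy, summed over all frames (CLEAR MOT definition)."""
    if num_ground_truth <= 0:
        raise ValueError("num_ground_truth must be positive")
    errors = false_negatives + false_positives + id_switches
    return 1.0 - errors / num_ground_truth


# Hypothetical counts: 100 errors over 1000 ground-truth player instances.
score = mota(false_negatives=50, false_positives=30, id_switches=20, num_ground_truth=1000)
print(score)
```

MOTA can be negative when the tracker makes more errors than there are ground-truth instances, so it is not a percentage in the usual sense.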
3D-RelNet: Joint Object and Relational Network for 3D Prediction
Title | 3D-RelNet: Joint Object and Relational Network for 3D Prediction |
Authors | Nilesh Kulkarni, Ishan Misra, Shubham Tulsiani, Abhinav Gupta |
Abstract | We propose an approach to predict the 3D shape and pose for the objects present in a scene. Existing learning based methods that pursue this goal make independent predictions per object, and do not leverage the relationships amongst them. We argue that reasoning about these relationships is crucial, and present an approach to incorporate these in a 3D prediction framework. In addition to independent per-object predictions, we predict pairwise relations in the form of relative 3D pose, and demonstrate that these can be easily incorporated to improve object level estimates. We report performance across different datasets (SUNCG, NYUv2), and show that our approach significantly improves over independent prediction approaches while also outperforming alternate implicit reasoning methods. |
Tasks | |
Published | 2019-06-06 |
URL | https://arxiv.org/abs/1906.02729v3 |
https://arxiv.org/pdf/1906.02729v3.pdf | |
PWC | https://paperswithcode.com/paper/3d-relnet-joint-object-and-relational-network-1 |
Repo | |
Framework | |
Where to Look Next: Unsupervised Active Visual Exploration on 360° Input
Title | Where to Look Next: Unsupervised Active Visual Exploration on 360° Input |
Authors | Soroush Seifi, Tinne Tuytelaars |
Abstract | We address the problem of active visual exploration of large 360° inputs. In our setting an active agent with a limited camera bandwidth explores its 360° environment by changing its viewing direction at limited discrete time steps. As such, it observes the world as a sequence of narrow field-of-view ‘glimpses’, deciding for itself where to look next. Our proposed method exceeds previous works’ performance by a significant margin without the need for deep reinforcement learning or training separate networks as sidekicks. Key components of our system are the spatial memory maps that make the system aware of the glimpses’ orientations (locations in the 360° image). Further, we stress the advantages of retina-like glimpses when the agent’s sensor bandwidth and time-steps are limited. Finally, we use our trained model to do classification of the whole scene using only the information observed in the glimpses. |
Tasks | |
Published | 2019-09-23 |
URL | https://arxiv.org/abs/1909.10304v2 |
https://arxiv.org/pdf/1909.10304v2.pdf | |
PWC | https://paperswithcode.com/paper/190910304 |
Repo | |
Framework | |
Differentiable Causal Computations via Delayed Trace
Title | Differentiable Causal Computations via Delayed Trace |
Authors | David Sprunger, Shin-ya Katsumata |
Abstract | We investigate causal computations taking sequences of inputs to sequences of outputs where the $n$th output depends on the first $n$ inputs only. We model these in category theory via a construction taking a Cartesian category $C$ to another category $St(C)$ with a novel trace-like operation called “delayed trace”, which misses yanking and dinaturality axioms of the usual trace. The delayed trace operation provides a feedback mechanism in $St(C)$ with an implicit guardedness guarantee. When $C$ is equipped with a Cartesian differential operator, we construct a differential operator for $St(C)$ using an abstract version of backpropagation through time, a technique from machine learning based on unrolling of functions. This obtains a swath of properties for backpropagation through time, including a chain rule and Schwartz theorem. Our differential operator is also able to compute the derivative of a stateful network without requiring the network to be unrolled. |
Tasks | |
Published | 2019-03-04 |
URL | http://arxiv.org/abs/1903.01093v1 |
http://arxiv.org/pdf/1903.01093v1.pdf | |
PWC | https://paperswithcode.com/paper/differentiable-causal-computations-via |
Repo | |
Framework | |
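The notion of a causal computation in the abstract (the $n$th output depends only on the first $n$ inputs) can be illustrated with a stateful step function applied to a stream. This is only a concrete reading of the definition; it does not implement the categorical construction $St(C)$ or the delayed trace, and the names `run_causal` and `running_sum` are hypothetical.

```python
def run_causal(step, state, inputs):
    """Apply a stateful step function to a sequence, producing one output per input."""
    outputs = []
    for x in inputs:
        state, y = step(state, x)  # the state carries everything remembered from earlier inputs
        outputs.append(y)
    return outputs


def running_sum(state, x):
    """Example step: the output is the sum of all inputs seen so far."""
    new_state = state + x
    return new_state, new_state


print(run_causal(running_sum, 0, [1, 2, 3]))
```

Because each output is produced before later inputs are read, changing a later input cannot alter an earlier output, which is exactly the causality condition stated in the abstract.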