February 1, 2020

3063 words 15 mins read

Paper Group AWR 166

Deep Reinforcement Learning Algorithm for Dynamic Pricing of Express Lanes with Multiple Access Locations. Imbalance Problems in Object Detection: A Review. Learning to Transport with Neural Networks. Deep speech inpainting of time-frequency masks. SART - Similarity, Analogies, and Relatedness for Tatar Language: New Benchmark Datasets for Word Emb …

Deep Reinforcement Learning Algorithm for Dynamic Pricing of Express Lanes with Multiple Access Locations

Title Deep Reinforcement Learning Algorithm for Dynamic Pricing of Express Lanes with Multiple Access Locations
Authors Venktesh Pandey, Evana Wang, Stephen D. Boyles
Abstract This article develops a deep reinforcement learning (Deep-RL) framework for dynamic pricing on managed lanes with multiple access locations and heterogeneity in travelers’ value of time, origin, and destination. This framework relaxes assumptions in the literature by considering multiple origins and destinations, multiple access locations to the managed lane, en route diversion of travelers, partial observability of the sensor readings, and stochastic demand and observations. The problem is formulated as a partially observable Markov decision process (POMDP) and policy gradient methods are used to determine tolls as a function of real-time observations. Tolls are modeled as continuous and stochastic variables, and are determined using a feedforward neural network. The method is compared against a feedback control method used for dynamic pricing. We show that Deep-RL is effective in learning toll policies for maximizing revenue, minimizing total system travel time, and other joint weighted objectives, when tested on real-world transportation networks. The Deep-RL toll policies outperform the feedback control heuristic for the revenue maximization objective by generating revenues up to 9.5% higher than the heuristic and for the objective minimizing total system travel time (TSTT) by generating TSTT up to 10.4% lower than the heuristic. We also propose reward shaping methods for the POMDP to overcome the undesired behavior of toll policies, like the jam-and-harvest behavior of revenue-maximizing policies. Additionally, we test transferability of the algorithm trained on one set of inputs for new input distributions and offer recommendations on real-time implementations of Deep-RL algorithms. The source code for our experiments is available online at https://github.com/venktesh22/ExpressLanes_Deep-RL
Tasks Policy Gradient Methods
Published 2019-09-10
URL https://arxiv.org/abs/1909.04760v1
PDF https://arxiv.org/pdf/1909.04760v1.pdf
PWC https://paperswithcode.com/paper/deep-reinforcement-learning-algorithm-for
Repo https://github.com/venktesh22/ExpressLanes_Deep-RL
Framework none
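
The tolls in this work are continuous, stochastic outputs of a feedforward network trained with policy gradients. Below is a minimal sketch of that setup: a Gaussian toll policy plus a REINFORCE-style update. Layer sizes, the observation/trajectory interface, and reward handling are placeholder assumptions, not the authors' implementation from the linked repository.

```python
import torch
import torch.nn as nn

class TollPolicy(nn.Module):
    """Feedforward net mapping sensor observations to a Gaussian over tolls."""
    def __init__(self, obs_dim, n_tolls, hidden=64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim, hidden), nn.Tanh(),
                                  nn.Linear(hidden, hidden), nn.Tanh())
        self.mu = nn.Linear(hidden, n_tolls)           # mean toll per access point
        self.log_std = nn.Parameter(torch.zeros(n_tolls))

    def forward(self, obs):
        h = self.body(obs)
        return torch.distributions.Normal(self.mu(h), self.log_std.exp())

def reinforce_update(policy, optimizer, trajectory, gamma=0.99):
    """One REINFORCE step on a list of (obs, toll, reward) tuples (tensors, tensors, floats)."""
    returns, G = [], 0.0
    for _, _, r in reversed(trajectory):
        G = r + gamma * G
        returns.insert(0, G)
    returns = torch.tensor(returns)
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)  # baseline via normalization
    loss = 0.0
    for (obs, toll, _), G in zip(trajectory, returns):
        dist = policy(obs)
        loss = loss - dist.log_prob(toll).sum() * G
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```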

Imbalance Problems in Object Detection: A Review

Title Imbalance Problems in Object Detection: A Review
Authors Kemal Oksuz, Baris Can Cam, Sinan Kalkan, Emre Akbas
Abstract In this paper, we present a comprehensive review of the imbalance problems in object detection. To analyze the problems in a systematic manner, we introduce a problem-based taxonomy. Following this taxonomy, we discuss each problem in depth and present a unifying yet critical perspective on the solutions in the literature. In addition, we identify major open issues regarding the existing imbalance problems as well as imbalance problems that have not been discussed before. Moreover, in order to keep our review up to date, we provide an accompanying webpage which catalogs papers addressing imbalance problems, according to our problem-based taxonomy. Researchers can track newer studies on this webpage available at: https://github.com/kemaloksuz/ObjectDetectionImbalance .
Tasks Object Detection
Published 2019-08-31
URL https://arxiv.org/abs/1909.00169v3
PDF https://arxiv.org/pdf/1909.00169v3.pdf
PWC https://paperswithcode.com/paper/imbalance-problems-in-object-detection-a
Repo https://github.com/kemaloksuz/ObjectDetectionImbalance
Framework none
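
The review catalogues imbalance problems and their remedies rather than proposing a single method. As a concrete reference point, here is the focal loss, one widely known remedy for foreground-background class imbalance in detectors; this is a generic illustration, not something introduced by the review itself.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """Binary focal loss: down-weights easy examples so the rare foreground
    class is not swamped by the many easy background anchors."""
    p = torch.sigmoid(logits)
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = p * targets + (1 - p) * (1 - targets)          # prob. of the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * (1 - p_t) ** gamma * ce).mean()
```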

Learning to Transport with Neural Networks

Title Learning to Transport with Neural Networks
Authors Andrea Schioppa
Abstract We compare several approaches to learn an Optimal Map, represented as a neural network, between probability distributions. The approaches fall into two categories: “Heuristics” and approaches with a more sound mathematical justification, motivated by the dual of the Kantorovitch problem. Among the algorithms we consider a novel approach involving dynamic flows and reductions of Optimal Transport to supervised learning.
Tasks
Published 2019-08-04
URL https://arxiv.org/abs/1908.01394v1
PDF https://arxiv.org/pdf/1908.01394v1.pdf
PWC https://paperswithcode.com/paper/learning-to-transport-with-neural-networks
Repo https://github.com/salayatana66/learn_to_transport_code
Framework pytorch
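
As a toy illustration of the “heuristic” family the abstract mentions, the sketch below trains a network T to push source samples onto the target distribution by penalizing a kernel distribution mismatch (MMD) plus the quadratic transport cost. This illustrates the problem setup under assumed toy distributions; it is not any specific algorithm from the paper.

```python
import torch
import torch.nn as nn

def rbf_mmd(x, y, sigma=1.0):
    """Biased RBF-kernel MMD^2 between two minibatches."""
    def k(a, b):
        return torch.exp(-torch.cdist(a, b) ** 2 / (2 * sigma ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

T = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 2))
opt = torch.optim.Adam(T.parameters(), lr=1e-3)

for step in range(2000):
    x = torch.randn(256, 2)                  # source: standard Gaussian
    y = torch.randn(256, 2) * 0.5 + 3.0      # target: shifted, scaled Gaussian
    tx = T(x)
    # distribution matching + small transport-cost penalty on the displacement
    loss = rbf_mmd(tx, y) + 1e-3 * ((tx - x) ** 2).sum(dim=1).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```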

Deep speech inpainting of time-frequency masks

Title Deep speech inpainting of time-frequency masks
Authors Mikolaj Kegler, Pierre Beckmann, Milos Cernak
Abstract In particularly noisy environments, transient loud intrusions can completely overpower parts of the speech signal, leading to an inevitable loss of information. Recent algorithms for noise suppression often yield impressive results but tend to struggle when the signal-to-noise ratio (SNR) of the mixture is low or when parts of the signal are missing. To address these issues, here we introduce an end-to-end framework for the retrieval of missing or severely distorted parts of time-frequency representation of speech, from the short-term context, thus speech inpainting. The framework is based on a convolutional U-Net trained via deep feature losses, obtained through speechVGG, a deep speech feature extractor pre-trained on the word classification task. Our evaluation results demonstrate that the proposed framework is effective at recovering large portions of missing or distorted parts of speech. Specifically, it yields notable improvements in STOI & PESQ objective metrics, as assessed using the LibriSpeech dataset.
Tasks
Published 2019-10-20
URL https://arxiv.org/abs/1910.09058v3
PDF https://arxiv.org/pdf/1910.09058v3.pdf
PWC https://paperswithcode.com/paper/deep-speech-inpainting-of-time-frequency
Repo https://github.com/bepierre/SpeechVGG
Framework tf
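
The U-Net here is trained with deep feature losses computed from a pre-trained speechVGG extractor. Below is a generic sketch of such a loss; the extractor is assumed to be a frozen nn.Sequential stand-in, not the actual speechVGG weights, and the chosen layer indices are arbitrary.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DeepFeatureLoss(nn.Module):
    """L1 distance between intermediate features of a frozen extractor,
    evaluated on inpainted vs. clean time-frequency representations."""
    def __init__(self, extractor, layer_ids=(2, 5, 8)):
        super().__init__()
        self.extractor = extractor.eval()
        for p in self.extractor.parameters():
            p.requires_grad_(False)
        self.layer_ids = set(layer_ids)

    def features(self, x):
        feats = []
        for i, layer in enumerate(self.extractor):   # extractor assumed nn.Sequential
            x = layer(x)
            if i in self.layer_ids:
                feats.append(x)
        return feats

    def forward(self, inpainted, clean):
        return sum(F.l1_loss(a, b)
                   for a, b in zip(self.features(inpainted), self.features(clean)))
```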

SART - Similarity, Analogies, and Relatedness for Tatar Language: New Benchmark Datasets for Word Embeddings Evaluation

Title SART - Similarity, Analogies, and Relatedness for Tatar Language: New Benchmark Datasets for Word Embeddings Evaluation
Authors Albina Khusainova, Adil Khan, Adín Ramírez Rivera
Abstract There is a huge imbalance between languages currently spoken and corresponding resources to study them. Most of the attention naturally goes to the “big” languages: those which have the largest presence in terms of media and number of speakers. Other less represented languages sometimes do not even have a good quality corpus to study them. In this paper, we tackle this imbalance by presenting a new set of evaluation resources for Tatar, a language of the Turkic language family which is mainly spoken in Tatarstan Republic, Russia. We present three datasets: Similarity and Relatedness datasets that consist of human scored word pairs and can be used to evaluate semantic models; and an Analogies dataset that comprises analogy questions and allows one to explore semantic, syntactic, and morphological aspects of language modeling. All three datasets build upon existing datasets for the English language and follow the same structure. However, they are not mere translations. They take into account specifics of the Tatar language and expand beyond the original datasets. We evaluate state-of-the-art word embedding models for two languages using our proposed datasets for Tatar and the original datasets for English and report our findings on performance comparison.
Tasks Language Modelling, Word Embeddings
Published 2019-03-31
URL http://arxiv.org/abs/1904.00365v1
PDF http://arxiv.org/pdf/1904.00365v1.pdf
PWC https://paperswithcode.com/paper/sart-similarity-analogies-and-relatedness-for
Repo https://github.com/tat-nlp/SART
Framework none
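
Benchmarks of this kind are typically consumed as follows: similarity/relatedness pairs are scored with cosine similarity and correlated (Spearman) with the human ratings, and analogy questions are answered by vector arithmetic (3CosAdd). The snippet below is a generic evaluation sketch; the data structures are assumptions, not the SART file format.

```python
import numpy as np
from scipy.stats import spearmanr

def evaluate_similarity(pairs, human_scores, emb):
    """pairs: list of (w1, w2); human_scores: list of floats; emb: dict word -> vector."""
    model_scores, gold = [], []
    for (w1, w2), s in zip(pairs, human_scores):
        if w1 in emb and w2 in emb:                 # skip out-of-vocabulary pairs
            v1, v2 = emb[w1], emb[w2]
            cos = float(v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2)))
            model_scores.append(cos)
            gold.append(s)
    rho, _ = spearmanr(model_scores, gold)
    return rho

def solve_analogy(a, b, c, emb):
    """Return the vocabulary word closest to emb[b] - emb[a] + emb[c] (3CosAdd)."""
    target = emb[b] - emb[a] + emb[c]
    target = target / np.linalg.norm(target)
    best, best_sim = None, -1.0
    for w, v in emb.items():
        if w in (a, b, c):
            continue
        sim = float(target @ v / np.linalg.norm(v))
        if sim > best_sim:
            best, best_sim = w, sim
    return best
```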

On Measuring Social Biases in Sentence Encoders

Title On Measuring Social Biases in Sentence Encoders
Authors Chandler May, Alex Wang, Shikha Bordia, Samuel R. Bowman, Rachel Rudinger
Abstract The Word Embedding Association Test shows that GloVe and word2vec word embeddings exhibit human-like implicit biases based on gender, race, and other social constructs (Caliskan et al., 2017). Meanwhile, research on learning reusable text representations has begun to explore sentence-level texts, with some sentence encoders seeing enthusiastic adoption. Accordingly, we extend the Word Embedding Association Test to measure bias in sentence encoders. We then test several sentence encoders, including state-of-the-art methods such as ELMo and BERT, for the social biases studied in prior work and two important biases that are difficult or impossible to test at the word level. We observe mixed results including suspicious patterns of sensitivity that suggest the test’s assumptions may not hold in general. We conclude by proposing directions for future work on measuring bias in sentence encoders.
Tasks Word Embeddings
Published 2019-03-25
URL http://arxiv.org/abs/1903.10561v1
PDF http://arxiv.org/pdf/1903.10561v1.pdf
PWC https://paperswithcode.com/paper/on-measuring-social-biases-in-sentence
Repo https://github.com/W4ngatang/sent-bias
Framework tf
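
The test being extended (WEAT, and its sentence-level variant) compares association strengths via cosine similarity. A minimal effect-size computation is sketched below; how the sentence encoder produces the vectors in X, Y, A, B is left abstract.

```python
import numpy as np

def cos(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def association(w, A, B):
    """s(w, A, B): mean cosine with attribute set A minus mean cosine with set B."""
    return np.mean([cos(w, a) for a in A]) - np.mean([cos(w, b) for b in B])

def weat_effect_size(X, Y, A, B):
    """Effect size d of the embedding association test: difference of mean
    associations of target sets X and Y, normalized by the pooled standard
    deviation of associations over X and Y combined."""
    sX = [association(x, A, B) for x in X]
    sY = [association(y, A, B) for y in Y]
    return (np.mean(sX) - np.mean(sY)) / np.std(sX + sY, ddof=1)
```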

Sparse Bounded Degree Sum of Squares Optimization for Certifiably Globally Optimal Rotation Averaging

Title Sparse Bounded Degree Sum of Squares Optimization for Certifiably Globally Optimal Rotation Averaging
Authors Matthew Giamou, Filip Maric, Valentin Peretroukhin, Jonathan Kelly
Abstract Estimating unknown rotations from noisy measurements is an important step in SfM and other 3D vision tasks. Typically, local optimization methods susceptible to returning suboptimal local minima are used to solve the rotation averaging problem. A new wave of approaches that leverage convex relaxations have provided the first formal guarantees of global optimality for state estimation techniques involving SO(3). However, most of these guarantees are only applicable when the measurement error introduced by noise is within a certain bound that depends on the problem instance’s structure. In this paper, we cast rotation averaging as a polynomial optimization problem over unit quaternions to produce the first rotation averaging method that is formally guaranteed to provide a certifiably globally optimal solution for \textit{any} problem instance. This is achieved by formulating and solving a sparse convex sum of squares (SOS) relaxation of the problem. We provide an open source implementation of our algorithm and experiments, demonstrating the benefits of our globally optimal approach.
Tasks
Published 2019-04-02
URL https://arxiv.org/abs/1904.01645v2
PDF https://arxiv.org/pdf/1904.01645v2.pdf
PWC https://paperswithcode.com/paper/sparse-bounded-degree-sum-of-squares
Repo https://github.com/utiasSTARS/sos-rotation-averaging
Framework none
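
The decision variables are unit quaternions and the objective is polynomial in them. The sketch below only writes out that objective for noisy relative-rotation measurements; it does not implement the sparse SOS relaxation, which requires a semidefinite solver.

```python
import numpy as np

def qmul(p, q):
    """Hamilton product of quaternions in (w, x, y, z) convention."""
    pw, px, py, pz = p
    qw, qx, qy, qz = q
    return np.array([pw*qw - px*qx - py*qy - pz*qz,
                     pw*qx + px*qw + py*qz - pz*qy,
                     pw*qy - px*qz + py*qw + pz*qx,
                     pw*qz + px*qy - py*qx + pz*qw])

def rotation_averaging_cost(quats, measurements):
    """quats: dict node -> unit quaternion; measurements: list of (i, j, q_ij),
    q_ij being the measured relative rotation from frame i to frame j.
    Each edge contributes min(||q_j - q_i*q_ij||^2, ||q_j + q_i*q_ij||^2),
    which handles the q / -q double-cover ambiguity."""
    cost = 0.0
    for i, j, q_ij in measurements:
        pred = qmul(quats[i], q_ij)
        cost += min(np.sum((quats[j] - pred) ** 2),
                    np.sum((quats[j] + pred) ** 2))
    return cost
```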

Causal Discovery Toolbox: Uncover causal relationships in Python

Title Causal Discovery Toolbox: Uncover causal relationships in Python
Authors Diviyan Kalainathan, Olivier Goudet
Abstract This paper presents a new open source Python framework for causal discovery from observational data and domain background knowledge, aimed at causal graph and causal mechanism modeling. The ‘cdt’ package implements the end-to-end approach, recovering the direct dependencies (the skeleton of the causal graph) and the causal relationships between variables. It includes algorithms from the ‘Bnlearn’ and ‘Pcalg’ packages, together with algorithms for pairwise causal discovery such as ANM. ‘cdt’ is available under the MIT License at https://github.com/Diviyan-Kalainathan/CausalDiscoveryToolbox.
Tasks Causal Discovery
Published 2019-03-06
URL http://arxiv.org/abs/1903.02278v1
PDF http://arxiv.org/pdf/1903.02278v1.pdf
PWC https://paperswithcode.com/paper/causal-discovery-toolbox-uncover-causal
Repo https://github.com/GoudetOlivier/CGNN
Framework tf
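
One of the pairwise methods mentioned in the abstract is the additive noise model (ANM): regress each variable on the other and prefer the direction whose residuals look independent of the putative cause. Below is a library-free sketch that uses correlation of residuals as a crude independence proxy; the actual ‘cdt’ implementation relies on a proper independence test (HSIC), so treat this only as an illustration of the idea.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def anm_score(cause, effect):
    """Fit effect = f(cause) + noise and return |corr(residual, cause)|;
    smaller means residuals look more independent of the cause,
    i.e. the direction is more plausible under an ANM."""
    f = GradientBoostingRegressor().fit(cause.reshape(-1, 1), effect)
    resid = effect - f.predict(cause.reshape(-1, 1))
    return abs(np.corrcoef(resid, cause)[0, 1])

def anm_direction(x, y):
    """Return '->' if x -> y is preferred over y -> x, else '<-'."""
    return "->" if anm_score(x, y) < anm_score(y, x) else "<-"
```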

Adversarial Computation of Optimal Transport Maps

Title Adversarial Computation of Optimal Transport Maps
Authors Jacob Leygonie, Jennifer She, Amjad Almahairi, Sai Rajeswar, Aaron Courville
Abstract Computing optimal transport maps between high-dimensional and continuous distributions is a challenging problem in optimal transport (OT). Generative adversarial networks (GANs) are powerful generative models which have been successfully applied to learn maps across high-dimensional domains. However, little is known about the nature of the map learned with a GAN objective. To address this problem, we propose a generative adversarial model in which the discriminator’s objective is the $2$-Wasserstein metric. We show that during training, our generator follows the $W_2$-geodesic between the initial and the target distributions. As a consequence, it reproduces an optimal map at the end of training. We validate our approach empirically in both low-dimensional and high-dimensional continuous settings, and show that it outperforms prior methods on image data.
Tasks
Published 2019-06-24
URL https://arxiv.org/abs/1906.09691v1
PDF https://arxiv.org/pdf/1906.09691v1.pdf
PWC https://paperswithcode.com/paper/adversarial-computation-of-optimal-transport
Repo https://github.com/jshe/wasserstein-2
Framework pytorch
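
The discriminator’s objective here is the 2-Wasserstein metric. As a non-adversarial point of reference, the snippet below computes the exact squared W2 between two equal-size minibatches by solving an assignment problem; this is the quantity a W2 critic approximates at scale, not the authors’ training procedure.

```python
import torch
from scipy.optimize import linear_sum_assignment

def minibatch_w2(x, y):
    """Exact squared 2-Wasserstein distance between two equal-size minibatches,
    via optimal assignment on the pairwise squared-distance matrix."""
    cost = torch.cdist(x, y) ** 2                          # (n, n) squared distances
    rows, cols = linear_sum_assignment(cost.detach().cpu().numpy())
    rows, cols = torch.as_tensor(rows), torch.as_tensor(cols)
    return cost[rows, cols].mean()

# A generator G(z) can be nudged toward the target by treating the assignment
# as fixed and backpropagating through the selected cost entries:
#   loss = minibatch_w2(G(z), y_batch); loss.backward()
```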

A Self-Attentive model for Knowledge Tracing

Title A Self-Attentive model for Knowledge Tracing
Authors Shalini Pandey, George Karypis
Abstract Knowledge tracing is the task of modeling each student’s mastery of knowledge concepts (KCs) as (s)he engages with a sequence of learning activities. Each student’s knowledge is modeled by estimating the performance of the student on the learning activities. It is an important research area for providing a personalized learning platform to students. In recent years, methods based on Recurrent Neural Networks (RNN) such as Deep Knowledge Tracing (DKT) and Dynamic Key-Value Memory Network (DKVMN) outperformed all the traditional methods because of their ability to capture complex representation of human learning. However, these methods face the issue of not generalizing well while dealing with sparse data which is the case with real-world data as students interact with few KCs. In order to address this issue, we develop an approach that identifies the KCs from the student’s past activities that are \textit{relevant} to the given KC and predicts his/her mastery based on the relatively few KCs that it picked. Since predictions are made based on relatively few past activities, it handles the data sparsity problem better than the methods based on RNN. For identifying the relevance between the KCs, we propose a self-attention based approach, Self Attentive Knowledge Tracing (SAKT). Extensive experimentation on a variety of real-world datasets shows that our model outperforms the state-of-the-art models for knowledge tracing, improving AUC by 4.43% on average.
Tasks Knowledge Tracing
Published 2019-07-16
URL https://arxiv.org/abs/1907.06837v1
PDF https://arxiv.org/pdf/1907.06837v1.pdf
PWC https://paperswithcode.com/paper/a-self-attentive-model-for-knowledge-tracing
Repo https://github.com/shalini1194/SAKT
Framework tf
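
A minimal sketch of the self-attentive prediction step: past (exercise, response) interactions are embedded as keys and values, the next exercise embedding acts as the query, and attention over the history drives the probability of a correct response. The dimensions and single-head simplification are assumptions, not the paper’s configuration.

```python
import torch
import torch.nn as nn

class TinySAKT(nn.Module):
    def __init__(self, n_exercises, d=64):
        super().__init__()
        # interaction id = exercise_id + n_exercises * correctness (2 * n_exercises ids)
        self.interaction_emb = nn.Embedding(2 * n_exercises, d)
        self.exercise_emb = nn.Embedding(n_exercises, d)
        self.attn = nn.MultiheadAttention(d, num_heads=1, batch_first=True)
        self.out = nn.Sequential(nn.Linear(2 * d, d), nn.ReLU(), nn.Linear(d, 1))

    def forward(self, past_ex, past_correct, next_ex):
        """past_ex, past_correct: (B, T) int tensors; next_ex: (B,) int tensor."""
        n = self.exercise_emb.num_embeddings
        keys = self.interaction_emb(past_ex + past_correct * n)     # (B, T, d)
        query = self.exercise_emb(next_ex).unsqueeze(1)              # (B, 1, d)
        ctx, _ = self.attn(query, keys, keys)                        # attend over history
        logits = self.out(torch.cat([ctx.squeeze(1), query.squeeze(1)], dim=-1))
        return torch.sigmoid(logits).squeeze(-1)                     # P(correct on next_ex)
```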

Deep Multimodality Model for Multi-task Multi-view Learning

Title Deep Multimodality Model for Multi-task Multi-view Learning
Authors Lecheng Zheng, Yu Cheng, Jingrui He
Abstract Many real-world problems exhibit the coexistence of multiple types of heterogeneity, such as view heterogeneity (i.e., multi-view property) and task heterogeneity (i.e., multi-task property). For example, in an image classification problem containing multiple poses of the same object, each pose can be considered as one view, and the detection of each type of object can be treated as one task. Furthermore, in some problems, the data type of multiple views might be different. In a web classification problem, for instance, we might be provided an image and text mixed data set, where the web pages are characterized by both images and texts. A common strategy to solve this kind of problem is to leverage the consistency of views and the relatedness of tasks to build the prediction model. In the context of deep neural network, multi-task relatedness is usually realized by grouping tasks at each layer, while multi-view consistency is usually enforced by finding the maximal correlation coefficient between views. However, there is no existing deep learning algorithm that jointly models task and view dual heterogeneity, particularly for a data set with multiple modalities (text and image mixed data set or text and video mixed data set, etc.). In this paper, we bridge this gap by proposing a deep multi-task multi-view learning framework that learns a deep representation for such dual-heterogeneity problems. Empirical studies on multiple real-world data sets demonstrate the effectiveness of our proposed Deep-MTMV algorithm.
Tasks Image Classification, Multi-view Learning
Published 2019-01-25
URL http://arxiv.org/abs/1901.08723v1
PDF http://arxiv.org/pdf/1901.08723v1.pdf
PWC https://paperswithcode.com/paper/deep-multimodality-model-for-multi-task-multi
Repo https://github.com/Leo02016/DeepMTMV
Framework pytorch
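
A schematic of the dual-heterogeneity setup described above: one encoder per view (e.g. image features and text features), a fused shared representation, one head per task, and a simple view-agreement penalty. This is a structural illustration only, not the Deep-MTMV architecture from the repository.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTaskMultiView(nn.Module):
    def __init__(self, view_dims, n_tasks, d=128):
        super().__init__()
        self.encoders = nn.ModuleList(
            [nn.Sequential(nn.Linear(v, d), nn.ReLU()) for v in view_dims])
        self.heads = nn.ModuleList([nn.Linear(d, 1) for _ in range(n_tasks)])

    def forward(self, views):
        zs = [enc(v) for enc, v in zip(self.encoders, views)]     # one embedding per view
        # view consistency: penalize disagreement between view embeddings
        consistency = sum(F.mse_loss(zs[i], zs[0]) for i in range(1, len(zs)))
        z = torch.stack(zs).mean(dim=0)                           # fused shared representation
        logits = [head(z).squeeze(-1) for head in self.heads]     # one output per task
        return logits, consistency
```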

Multi-view Deep Subspace Clustering Networks

Title Multi-view Deep Subspace Clustering Networks
Authors Pengfei Zhu, Binyuan Hui, Changqing Zhang, Dawei Du, Longyin Wen, Qinghua Hu
Abstract Multi-view subspace clustering aims to discover the inherent structure by fusing multi-view complementary information. Most existing methods first extract multiple types of hand-crafted features and then learn a joint affinity matrix for clustering. The disadvantage lies in two aspects: 1) Multi-view relations are not embedded into feature learning. 2) The end-to-end learning manner of deep learning is not well used in multi-view clustering. To address the above issues, we propose a novel multi-view deep subspace clustering network (MvDSCN) by learning a multi-view self-representation matrix in an end-to-end manner. MvDSCN consists of two sub-networks, i.e., diversity network (Dnet) and universality network (Unet). A latent space is built upon deep convolutional auto-encoders and a self-representation matrix is learned in the latent space using a fully connected layer. Dnet learns view-specific self-representation matrices while Unet learns a common self-representation matrix for all views. To exploit the complementarity of multi-view representations, Hilbert Schmidt Independence Criterion (HSIC) is introduced as a diversity regularization, which can capture the non-linear and high-order inter-view relations. As different views share the same label space, the self-representation matrices of each view are aligned to the common one by a universality regularization. Experiments on both multi-feature and multi-modality learning validate the superiority of the proposed multi-view subspace clustering model.
Tasks Multi-view Subspace Clustering
Published 2019-08-06
URL https://arxiv.org/abs/1908.01978v1
PDF https://arxiv.org/pdf/1908.01978v1.pdf
PWC https://paperswithcode.com/paper/multi-view-deep-subspace-clustering-networks
Repo https://github.com/huybery/MvDSCN
Framework tf
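
The core building block is the self-representation (self-expressive) layer: latent codes Z from the auto-encoder are reconstructed as C·Z with a learned coefficient matrix C, whose symmetrized absolute value later serves as the clustering affinity. A minimal version of that layer and its loss terms is sketched below; the HSIC diversity term and the Dnet/Unet split are omitted.

```python
import torch
import torch.nn as nn

class SelfExpressiveLayer(nn.Module):
    """Learns C such that Z ≈ C Z, with a zero diagonal (no self-reconstruction)."""
    def __init__(self, n_samples):
        super().__init__()
        self.C = nn.Parameter(1e-4 * torch.randn(n_samples, n_samples))

    def forward(self, z):
        C = self.C - torch.diag(torch.diag(self.C))               # enforce zero diagonal
        z_hat = C @ z
        loss = ((z - z_hat) ** 2).sum() + 0.1 * C.abs().sum()     # self-expression + sparsity
        return C, loss

# After training, the affinity |C| + |C|^T is fed to spectral clustering.
```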

LSTM based Similarity Measurement with Spectral Clustering for Speaker Diarization

Title LSTM based Similarity Measurement with Spectral Clustering for Speaker Diarization
Authors Qingjian Lin, Ruiqing Yin, Ming Li, Hervé Bredin, Claude Barras
Abstract More and more neural network approaches have achieved considerable improvement upon submodules of speaker diarization system, including speaker change detection and segment-wise speaker embedding extraction. Still, in the clustering stage, traditional algorithms like probabilistic linear discriminant analysis (PLDA) are widely used for scoring the similarity between two speech segments. In this paper, we propose a supervised method to measure the similarity matrix between all segments of an audio recording with sequential bidirectional long short-term memory networks (Bi-LSTM). Spectral clustering is applied on top of the similarity matrix to further improve the performance. Experimental results show that our system significantly outperforms the state-of-the-art methods and achieves a diarization error rate of 6.63% on the NIST SRE 2000 CALLHOME database.
Tasks Speaker Diarization
Published 2019-07-23
URL https://arxiv.org/abs/1907.10393v1
PDF https://arxiv.org/pdf/1907.10393v1.pdf
PWC https://paperswithcode.com/paper/lstm-based-similarity-measurement-with
Repo https://github.com/cvqluu/nn-similarity-diarization
Framework pytorch
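
Once the Bi-LSTM has produced a segment-by-segment similarity matrix, the clustering stage is standard. A sketch using scikit-learn’s spectral clustering on a precomputed affinity is shown below; the similarity model itself is not reproduced here.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def cluster_segments(similarity, n_speakers):
    """similarity: (N, N) matrix of pairwise segment scores in [0, 1]."""
    affinity = 0.5 * (similarity + similarity.T)       # symmetrize the LSTM scores
    labels = SpectralClustering(n_clusters=n_speakers,
                                affinity="precomputed",
                                assign_labels="kmeans").fit_predict(affinity)
    return labels                                      # speaker index per segment
```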

Making Convolutional Networks Shift-Invariant Again

Title Making Convolutional Networks Shift-Invariant Again
Authors Richard Zhang
Abstract Modern convolutional networks are not shift-invariant, as small input shifts or translations can cause drastic changes in the output. Commonly used downsampling methods, such as max-pooling, strided-convolution, and average-pooling, ignore the sampling theorem. The well-known signal processing fix is anti-aliasing by low-pass filtering before downsampling. However, simply inserting this module into deep networks degrades performance; as a result, it is seldomly used today. We show that when integrated correctly, it is compatible with existing architectural components, such as max-pooling and strided-convolution. We observe \textit{increased accuracy} in ImageNet classification, across several commonly-used architectures, such as ResNet, DenseNet, and MobileNet, indicating effective regularization. Furthermore, we observe \textit{better generalization}, in terms of stability and robustness to input corruptions. Our results demonstrate that this classical signal processing technique has been undeservingly overlooked in modern deep networks. Code and anti-aliased versions of popular networks are available at https://richzhang.github.io/antialiased-cnns/ .
Tasks Classification Consistency, Conditional Image Generation, Image Classification, Image Generation
Published 2019-04-25
URL https://arxiv.org/abs/1904.11486v2
PDF https://arxiv.org/pdf/1904.11486v2.pdf
PWC https://paperswithcode.com/paper/190411486
Repo https://github.com/mnikitin/Shift-Invariant-CNNs
Framework mxnet
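
The fix described above amounts to “blur, then subsample.” A minimal anti-aliased max-pool (max-pool with stride 1 followed by a fixed low-pass filter applied with stride 2) is sketched below; the filter size and padding are one common choice, and the official implementations live in the linked repositories.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BlurPool2d(nn.Module):
    """Anti-aliased downsampling: fixed binomial low-pass filter, then stride-2 subsampling."""
    def __init__(self, channels):
        super().__init__()
        k = torch.tensor([1., 2., 1.])
        k = torch.outer(k, k)
        k = k / k.sum()
        self.register_buffer("kernel", k.expand(channels, 1, 3, 3).clone())
        self.channels = channels

    def forward(self, x):
        x = F.pad(x, (1, 1, 1, 1), mode="reflect")
        return F.conv2d(x, self.kernel, stride=2, groups=self.channels)

def antialiased_maxpool(channels):
    """MaxPool(stride 1) + BlurPool(stride 2) replaces MaxPool(stride 2)."""
    return nn.Sequential(nn.MaxPool2d(kernel_size=2, stride=1), BlurPool2d(channels))
```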

Rethinking Table Recognition using Graph Neural Networks

Title Rethinking Table Recognition using Graph Neural Networks
Authors Shah Rukh Qasim, Hassan Mahmood, Faisal Shafait
Abstract Document structure analysis, such as zone segmentation and table recognition, is a complex problem in document processing and is an active area of research. The recent success of deep learning in solving various computer vision and machine learning problems has not been reflected in document structure analysis since conventional neural networks are not well suited to the input structure of the problem. In this paper, we propose an architecture based on graph networks as a better alternative to standard neural networks for table recognition. We argue that graph networks are a more natural choice for these problems, and explore two gradient-based graph neural networks. Our proposed architecture combines the benefits of convolutional neural networks for visual feature extraction and graph networks for dealing with the problem structure. We empirically demonstrate that our method outperforms the baseline by a significant margin. In addition, we identify the lack of large scale datasets as a major hindrance for deep learning research for structure analysis and present a new large scale synthetic dataset for the problem of table recognition. Finally, we open-source our implementation of dataset generation and the training framework of our graph networks to promote reproducible research in this direction.
Tasks
Published 2019-05-31
URL https://arxiv.org/abs/1905.13391v2
PDF https://arxiv.org/pdf/1905.13391v2.pdf
PWC https://paperswithcode.com/paper/rethinking-table-parsing-using-graph-neural
Repo https://github.com/hassan-mahmood/TIES_DataGeneration
Framework none
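
A bare-bones illustration of the combination described above: per-cell (or per-word) features are refined with a simple message-passing step over a cell adjacency graph, and pairs of vertex features are classified (e.g. “same row” vs. “not same row”). This is a generic sketch, not the specific graph layers explored in the paper.

```python
import torch
import torch.nn as nn

class SimpleGraphLayer(nn.Module):
    """One round of mean-aggregation message passing over an adjacency matrix."""
    def __init__(self, d):
        super().__init__()
        self.update = nn.Sequential(nn.Linear(2 * d, d), nn.ReLU())

    def forward(self, x, adj):
        """x: (N, d) vertex features; adj: (N, N) float 0/1 adjacency (e.g. k-nearest cells)."""
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1)
        neigh = adj @ x / deg                            # mean of neighbour features
        return self.update(torch.cat([x, neigh], dim=-1))

class PairClassifier(nn.Module):
    """Predicts whether two vertices share a structure (same row/column/cell)."""
    def __init__(self, d):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(2 * d, d), nn.ReLU(), nn.Linear(d, 1))

    def forward(self, x, pairs):
        """pairs: (P, 2) long tensor of vertex index pairs."""
        a, b = x[pairs[:, 0]], x[pairs[:, 1]]
        return torch.sigmoid(self.mlp(torch.cat([a, b], dim=-1))).squeeze(-1)
```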