January 30, 2020

Paper Group ANR 398

An Online Decision-Theoretic Pipeline for Responder Dispatch

Title An Online Decision-Theoretic Pipeline for Responder Dispatch
Authors Ayan Mukhopadhyay, Geoffrey Pettet, Chinmaya Samal, Abhishek Dubey, Yevgeniy Vorobeychik
Abstract The problem of dispatching emergency responders to service traffic accidents, fire, distress calls and crimes plagues urban areas across the globe. While such problems have been extensively looked at, most approaches are offline. Such methodologies fail to capture the dynamically changing environments under which critical emergency response occurs, and therefore, fail to be implemented in practice. Any holistic approach towards creating a pipeline for effective emergency response must also look at other challenges that it subsumes - predicting when and where incidents happen and understanding the changing environmental dynamics. We describe a system that collectively deals with all these problems in an online manner, meaning that the models get updated with streaming data sources. We highlight why such an approach is crucial to the effectiveness of emergency response, and present an algorithmic framework that can compute promising actions for a given decision-theoretic model for responder dispatch. We argue that carefully crafted heuristic measures can balance the trade-off between computational time and the quality of solutions achieved and highlight why such an approach is more scalable and tractable than traditional approaches. We also present an online mechanism for incident prediction, as well as an approach based on recurrent neural networks for learning and predicting environmental features that affect responder dispatch. We compare our methodology with prior state-of-the-art and existing dispatch strategies in the field, which show that our approach results in a reduction in response time with a drastic reduction in computational time.
Tasks
Published 2019-02-21
URL http://arxiv.org/abs/1902.08274v1
PDF http://arxiv.org/pdf/1902.08274v1.pdf
PWC https://paperswithcode.com/paper/an-online-decision-theoretic-pipeline-for
Repo
Framework

ShEMO – A Large-Scale Validated Database for Persian Speech Emotion Detection

Title ShEMO – A Large-Scale Validated Database for Persian Speech Emotion Detection
Authors Omid Mohamad Nezami, Paria Jamshid Lou, Mansoureh Karami
Abstract This paper introduces a large-scale, validated database for Persian called Sharif Emotional Speech Database (ShEMO). The database includes 3000 semi-natural utterances, equivalent to 3 hours and 25 minutes of speech data extracted from online radio plays. The ShEMO covers speech samples of 87 native-Persian speakers for five basic emotions including anger, fear, happiness, sadness and surprise, as well as neutral state. Twelve annotators label the underlying emotional state of utterances and majority voting is used to decide on the final labels. According to the kappa measure, the inter-annotator agreement is 64% which is interpreted as “substantial agreement”. We also present benchmark results based on common classification methods in speech emotion detection task. According to the experiments, support vector machine achieves the best results for both gender-independent (58.2%) and gender-dependent models (female=59.4%, male=57.6%). The ShEMO is available for academic purposes free of charge to provide a baseline for further research on Persian emotional speech.
Tasks
Published 2019-06-04
URL https://arxiv.org/abs/1906.01155v3
PDF https://arxiv.org/pdf/1906.01155v3.pdf
PWC https://paperswithcode.com/paper/shemo-a-large-scale-validated-database-for
Repo
Framework
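
The ShEMO abstract says that final labels come from majority voting over twelve annotators. A minimal sketch of that step follows. It assumes plurality voting with ties left unresolved; the abstract does not say how ties are handled, and the function name and the example votes are hypothetical.

```python
from collections import Counter


def majority_label(votes):
    """Return the most frequent label, or None when the top count is tied."""
    if not votes:
        raise ValueError("at least one vote is required")
    counts = Counter(votes).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None  # tie: the abstract does not state a tie-breaking rule
    return counts[0][0]


# Hypothetical votes from twelve annotators for a single utterance.
votes = ["anger"] * 7 + ["neutral"] * 3 + ["surprise"] * 2
print(majority_label(votes))
```

Returning `None` on a tie keeps ambiguous utterances visible so they can be reviewed or discarded instead of being labeled arbitrarily.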

Fast Fixed Dimension L2-Subspace Embeddings of Arbitrary Accuracy, With Application to L1 and L2 Tasks

Title Fast Fixed Dimension L2-Subspace Embeddings of Arbitrary Accuracy, With Application to L1 and L2 Tasks
Authors Malik Magdon-Ismail, Alex Gittens
Abstract We give a fast oblivious L2-embedding of $A\in \mathbb{R}^{n \times d}$ to $B\in \mathbb{R}^{r \times d}$ satisfying $(1-\varepsilon)\|Ax\|_2^2 \le \|Bx\|_2^2 \le (1+\varepsilon)\|Ax\|_2^2.$ Our embedding dimension $r$ equals $d$, a constant independent of the distortion $\varepsilon$. We use as a black-box any L2-embedding $\Pi^T A$ and inherit its runtime and accuracy, effectively decoupling the dimension $r$ from runtime and accuracy, allowing downstream machine learning applications to benefit from both a low dimension and high accuracy (in prior embeddings higher accuracy means higher dimension). We give applications of our L2-embedding to regression, PCA and statistical leverage scores. We also give applications to L1: 1.) An oblivious L1-embedding with dimension $d+O(d\ln^{1+\eta} d)$ and distortion $O((d\ln d)/\ln\ln d)$, with application to constructing well-conditioned bases; 2.) Fast approximation of L1-Lewis weights using our L2 embedding to quickly approximate L2-leverage scores.
Tasks
Published 2019-09-27
URL https://arxiv.org/abs/1909.12580v1
PDF https://arxiv.org/pdf/1909.12580v1.pdf
PWC https://paperswithcode.com/paper/fast-fixed-dimension-l2-subspace-embeddings
Repo
Framework
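
To illustrate how a $d$-row matrix can inherit the accuracy of a black-box sketch $\Pi^T A$, the sketch below compresses the sketched matrix $S = \Pi^T A$ to a $d \times d$ matrix $B$ with $B^T B = S^T S$, so that $\|Bx\|_2 = \|Sx\|_2$ for every $x$. This is an assumption about one possible construction, not necessarily the authors' algorithm. The Gaussian sketch, the matrix sizes, and all helper names are hypothetical, and a QR factorization would be the numerically preferable choice over the Cholesky factor used here for brevity.

```python
import math
import random


def matmul(X, Y):
    """Plain nested-list matrix product."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


def transpose(X):
    return [list(col) for col in zip(*X)]


def cholesky(G):
    """Lower-triangular L with L L^T = G, for symmetric positive definite G."""
    d = len(G)
    L = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(G[i][i] - s)
            else:
                L[i][j] = (G[i][j] - s) / L[j][j]
    return L


def sq_norm(M, x):
    """Squared Euclidean norm of M x."""
    return sum(sum(m * xi for m, xi in zip(row, x)) ** 2 for row in M)


rng = random.Random(0)
n, d, k = 200, 3, 40  # hypothetical sizes: A is n x d, the sketch has k rows
A = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
# Stand-in black-box embedding: a k x n Gaussian matrix scaled by 1/sqrt(k).
Pi_T = [[rng.gauss(0, 1) / math.sqrt(k) for _ in range(n)] for _ in range(k)]

S = matmul(Pi_T, A)                # k x d sketched matrix
G = matmul(transpose(S), S)        # d x d Gram matrix of the sketch
B = transpose(cholesky(G))         # d x d, with B^T B = S^T S

x = [1.0, -2.0, 0.5]
print(sq_norm(A, x), sq_norm(S, x), sq_norm(B, x))
```

The last two printed values agree up to rounding, so in this sketch any distortion relative to $\|Ax\|_2^2$ comes entirely from the black-box embedding and not from the reduction to $d$ rows.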

Optimal Training of Fair Predictive Models

Title Optimal Training of Fair Predictive Models
Authors Razieh Nabi, Daniel Malinsky, Ilya Shpitser
Abstract Recently there has been sustained interest in modifying prediction algorithms to satisfy fairness constraints. These constraints are typically complex nonlinear functionals of the observed data distribution. Focusing on the causal constraints proposed by Nabi and Shpitser (2018), we introduce new theoretical results and optimization techniques to make model training easier and more accurate. Specifically, we show how to reparameterize the observed data likelihood such that fairness constraints correspond directly to parameters that appear in the likelihood, transforming a complex constrained optimization objective into a simple optimization problem with box constraints. We also exploit methods from empirical likelihood theory in statistics to improve predictive performance, without requiring parametric models for high-dimensional feature vectors.
Tasks
Published 2019-10-09
URL https://arxiv.org/abs/1910.04109v1
PDF https://arxiv.org/pdf/1910.04109v1.pdf
PWC https://paperswithcode.com/paper/optimal-training-of-fair-predictive-models
Repo
Framework

DLBricks: Composable Benchmark Generation to Reduce Deep Learning Benchmarking Effort on CPUs (Extended)

Title DLBricks: Composable Benchmark Generation to Reduce Deep Learning Benchmarking Effort on CPUs (Extended)
Authors Cheng Li, Abdul Dakkak, Jinjun Xiong, Wen-mei Hwu
Abstract The past few years have seen a surge of applying Deep Learning (DL) models for a wide array of tasks such as image classification, object detection, machine translation, etc. While DL models provide an opportunity to solve otherwise intractable tasks, their adoption relies on them being optimized to meet latency and resource requirements. Benchmarking is a key step in this process but has been hampered in part due to the lack of representative and up-to-date benchmarking suites. This is exacerbated by the fast-evolving pace of DL models. This paper proposes DLBricks, a composable benchmark generation design that reduces the effort of developing, maintaining, and running DL benchmarks on CPUs. DLBricks decomposes DL models into a set of unique runnable networks and constructs the original model’s performance using the performance of the generated benchmarks. DLBricks leverages two key observations: DL layers are the performance building blocks of DL models and layers are extensively repeated within and across DL models. Since benchmarks are generated automatically and the benchmarking time is minimized, DLBricks can keep up-to-date with the latest proposed models, relieving the pressure of selecting representative DL models. Moreover, DLBricks allows users to represent proprietary models within benchmark suites. We evaluate DLBricks using $50$ MXNet models spanning $5$ DL tasks on $4$ representative CPU systems. We show that DLBricks provides an accurate performance estimate for the DL models and reduces the benchmarking time across systems (e.g. within $95\%$ accuracy and up to $4.4\times$ benchmarking time speedup on Amazon EC2 c5.xlarge).
Tasks Image Classification, Machine Translation, Object Detection
Published 2019-11-18
URL https://arxiv.org/abs/1911.07967v3
PDF https://arxiv.org/pdf/1911.07967v3.pdf
PWC https://paperswithcode.com/paper/dlbricks-composable-benchmark-generation
Repo
Framework
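
The two observations in the DLBricks abstract (layers are the performance building blocks, and layers repeat heavily) suggest a simple estimation scheme: benchmark each distinct layer once and compose the results. The sketch below is a rough illustration under strong assumptions: it works at single-layer granularity, treats execution as sequential so latencies simply add, and uses made-up layer descriptors and latencies. None of the names come from the DLBricks code base.

```python
def estimate_latency(layers, bench):
    """Sum per-layer latencies while benchmarking each distinct layer only once."""
    cache = {}
    for layer in layers:
        if layer not in cache:
            cache[layer] = bench(layer)  # the expensive measurement, run once
    total = sum(cache[layer] for layer in layers)
    return total, len(cache)


# Hypothetical layer descriptors (kind, width) and made-up latencies in ms.
fake_ms = {("conv3x3", 64): 1.5, ("relu", 64): 0.25, ("fc", 1000): 0.5}
model = [("conv3x3", 64), ("relu", 64)] * 4 + [("fc", 1000)]

total_ms, n_unique = estimate_latency(model, fake_ms.__getitem__)
print(total_ms, n_unique, len(model))
```

In this toy model nine layers require only three benchmark runs, which is the kind of saving that repetition within and across models makes possible.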

Contribution au Niveau de l’Approche Indirecte à Base de Transfert dans la Traduction Automatique

Title Contribution au Niveau de l’Approche Indirecte à Base de Transfert dans la Traduction Automatique
Authors Sadik Bessou
Abstract In this thesis, we address several important issues concerning the morphological analysis of the Arabic language, applied to textual data and machine translation. First, we give an overview of machine translation, its history and its development; we then review human translation techniques as a possible source of inspiration for machine translation, and describe linguistic approaches, in particular indirect transfer approaches. Finally, we present our contributions to resolving morphosyntactic problems in computational linguistics, such as multilingual information retrieval and machine translation. As a first contribution, we developed a morphological analyzer for Arabic and used it in bilingual information retrieval, as a computer application for multilingual documents. Validation of the results showed statistically significant performance. As a second contribution, we proposed a list of morphosyntactic transfer rules from English to Arabic for translation in three phases: analysis, transfer, and generation. We focused on the transfer phase without semantic distortion, for an abstraction of English in a sufficient subset of Arabic.
Tasks Information Retrieval, Machine Translation, Morphological Analysis
Published 2019-11-16
URL https://arxiv.org/abs/1911.07030v1
PDF https://arxiv.org/pdf/1911.07030v1.pdf
PWC https://paperswithcode.com/paper/contribution-au-niveau-de-lapproche-indirecte
Repo
Framework

Adversarial Learning and Self-Teaching Techniques for Domain Adaptation in Semantic Segmentation

Title Adversarial Learning and Self-Teaching Techniques for Domain Adaptation in Semantic Segmentation
Authors Umberto Michieli, Matteo Biasetton, Gianluca Agresti, Pietro Zanuttigh
Abstract Deep learning techniques have been widely used in autonomous driving systems for the semantic understanding of urban scenes. However, they need a huge amount of labeled data for training, which is difficult and expensive to acquire. A recently proposed workaround is to train deep networks using synthetic data, but the domain shift between real world and synthetic representations limits the performance. In this work, a novel Unsupervised Domain Adaptation (UDA) strategy is introduced to solve this issue. The proposed learning strategy is driven by three components: a standard supervised learning loss on labeled synthetic data; an adversarial learning module that exploits both labeled synthetic data and unlabeled real data; finally, a self-teaching strategy applied to unlabeled data. The last component exploits a region growing framework guided by the segmentation confidence. Furthermore, we weighted this component on the basis of the class frequencies to enhance the performance on less common classes. Experimental results prove the effectiveness of the proposed strategy in adapting a segmentation network trained on synthetic datasets, like GTA5 and SYNTHIA, to real world datasets like Cityscapes and Mapillary.
Tasks Autonomous Driving, Domain Adaptation, Semantic Segmentation, Unsupervised Domain Adaptation
Published 2019-09-02
URL https://arxiv.org/abs/1909.00781v2
PDF https://arxiv.org/pdf/1909.00781v2.pdf
PWC https://paperswithcode.com/paper/adversarial-learning-and-self-teaching
Repo
Framework

Bilateral Operators for Functional Maps

Title Bilateral Operators for Functional Maps
Authors Gautam Pai, Mor Joseph-Rivlin, Ron Kimmel
Abstract A majority of shape correspondence frameworks are based on devising pointwise and pairwise constraints on the correspondence map. The functional maps framework allows for formulating these constraints in the spectral domain. In this paper, we develop a functional map framework for the shape correspondence problem by constructing pairwise constraints using point-wise descriptors. Our core observation is that every point-wise descriptor allows for the construction of a pairwise kernel operator whose low frequency eigenfunctions depict regions of similar descriptor values at various scales of frequency. By aggregating the pairwise information from the descriptor and the intrinsic geometry of the surface encoded in the heat kernel, we construct a hybrid kernel and call it the bilateral operator. Analogous to the edge preserving bilateral filter in image processing, the action of the bilateral operator on a function defined over the manifold yields a descriptor dependent local smoothing of that function. By forcing the correspondence map to commute with the bilateral operator, we show that we can maximally exploit the information from a given set of pointwise descriptors in a functional map framework.
Tasks
Published 2019-07-30
URL https://arxiv.org/abs/1907.12993v1
PDF https://arxiv.org/pdf/1907.12993v1.pdf
PWC https://paperswithcode.com/paper/bilateral-operators-for-functional-maps
Repo
Framework

Vehicle Re-identification with Viewpoint-aware Metric Learning

Title Vehicle Re-identification with Viewpoint-aware Metric Learning
Authors Ruihang Chu, Yifan Sun, Yadong Li, Zheng Liu, Chi Zhang, Yichen Wei
Abstract This paper considers the vehicle re-identification (re-ID) problem. The extreme viewpoint variation (up to 180 degrees) poses great challenges for existing approaches. Inspired by the human recognition process, we propose a novel viewpoint-aware metric learning approach. It learns two metrics for similar viewpoints and different viewpoints in two feature spaces, respectively, giving rise to the viewpoint-aware network (VANet). During training, two types of constraints are applied jointly. During inference, the viewpoint is first estimated and the corresponding metric is used. Experimental results confirm that VANet significantly improves re-ID accuracy, especially when the pair is observed from different viewpoints. Our method establishes the new state-of-the-art on two benchmarks.
Tasks Metric Learning, Vehicle Re-Identification
Published 2019-10-09
URL https://arxiv.org/abs/1910.04104v1
PDF https://arxiv.org/pdf/1910.04104v1.pdf
PWC https://paperswithcode.com/paper/vehicle-re-identification-with-viewpoint
Repo
Framework

Cascaded Cross-Module Residual Learning towards Lightweight End-to-End Speech Coding

Title Cascaded Cross-Module Residual Learning towards Lightweight End-to-End Speech Coding
Authors Kai Zhen, Jongmo Sung, Mi Suk Lee, Seungkwon Beack, Minje Kim
Abstract Speech codecs learn compact representations of speech signals to facilitate data transmission. Many recent deep neural network (DNN) based end-to-end speech codecs achieve low bitrates and high perceptual quality at the cost of model complexity. We propose a cross-module residual learning (CMRL) pipeline as a module carrier with each module reconstructing the residual from its preceding modules. CMRL differs from other DNN-based speech codecs in that, rather than modeling the speech compression problem in a single large neural network, it optimizes a series of less-complicated modules in a two-phase training scheme. The proposed method shows better objective performance than AMR-WB and the state-of-the-art DNN-based speech codec with a similar network architecture. As an end-to-end model, it takes raw PCM signals as input, but is also compatible with linear predictive coding (LPC), showing better subjective quality at high bitrates than AMR-WB and OPUS. The gain is achieved by using only 0.9 million trainable parameters, a significantly less complex architecture than the other DNN-based codecs in the literature.
Tasks
Published 2019-06-18
URL https://arxiv.org/abs/1906.07769v4
PDF https://arxiv.org/pdf/1906.07769v4.pdf
PWC https://paperswithcode.com/paper/cascaded-cross-module-residual-learning
Repo
Framework
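
The core idea in the CMRL abstract, each module reconstructing the residual left by its predecessors, can be shown with a toy cascade. In the sketch below, uniform scalar quantizers with shrinking step sizes stand in for the neural modules; that substitution, the step sizes, and the sample values are all assumptions, and the two-phase training scheme is not modeled.

```python
def quantize(signal, step):
    """Stand-in 'module': uniform scalar quantization with the given step."""
    return [round(s / step) * step for s in signal]


def cascade(signal, steps):
    """Run the modules in sequence, each coding what earlier ones missed."""
    recon = [0.0] * len(signal)
    residual = list(signal)
    errors = []
    for step in steps:
        coded = quantize(residual, step)                   # code the current residual
        recon = [r + c for r, c in zip(recon, coded)]      # accumulate reconstruction
        residual = [s - r for s, r in zip(signal, recon)]  # what is still missing
        errors.append(max(abs(e) for e in residual))
    return recon, errors


# Hypothetical samples standing in for a short frame of raw PCM audio.
signal = [0.83, -0.41, 0.12, 0.57, -0.96]
recon, errors = cascade(signal, steps=[0.5, 0.1, 0.02])
print(errors)
```

Each stage only has to represent a small residual, which is the intuition behind replacing one large network with a series of simpler modules.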

DeepRacer: Educational Autonomous Racing Platform for Experimentation with Sim2Real Reinforcement Learning

Title DeepRacer: Educational Autonomous Racing Platform for Experimentation with Sim2Real Reinforcement Learning
Authors Bharathan Balaji, Sunil Mallya, Sahika Genc, Saurabh Gupta, Leo Dirac, Vineet Khare, Gourav Roy, Tao Sun, Yunzhe Tao, Brian Townsend, Eddie Calleja, Sunil Muralidhara, Dhanasekar Karuppasamy
Abstract DeepRacer is a platform for end-to-end experimentation with RL and can be used to systematically investigate the key challenges in developing intelligent control systems. Using the platform, we demonstrate how a 1/18th scale car can learn to drive autonomously using RL with a monocular camera. It is trained in simulation with no additional tuning in the physical world and demonstrates: 1) formulation and solution of a robust reinforcement learning algorithm, 2) narrowing the reality gap through joint perception and dynamics, 3) distributed on-demand compute architecture for training optimal policies, and 4) a robust evaluation method to identify when to stop training. It is the first successful large-scale deployment of deep reinforcement learning on a robotic control agent that uses only raw camera images as observations and a model-free learning method to perform robust path planning. We open source our code and video demo on GitHub: https://git.io/fjxoJ.
Tasks
Published 2019-11-05
URL https://arxiv.org/abs/1911.01562v1
PDF https://arxiv.org/pdf/1911.01562v1.pdf
PWC https://paperswithcode.com/paper/deepracer-educational-autonomous-racing
Repo
Framework

Task2Vec: Task Embedding for Meta-Learning

Title Task2Vec: Task Embedding for Meta-Learning
Authors Alessandro Achille, Michael Lam, Rahul Tewari, Avinash Ravichandran, Subhransu Maji, Charless Fowlkes, Stefano Soatto, Pietro Perona
Abstract We introduce a method to provide vectorial representations of visual classification tasks which can be used to reason about the nature of those tasks and their relations. Given a dataset with ground-truth labels and a loss function defined over those labels, we process images through a “probe network” and compute an embedding based on estimates of the Fisher information matrix associated with the probe network parameters. This provides a fixed-dimensional embedding of the task that is independent of details such as the number of classes and does not require any understanding of the class label semantics. We demonstrate that this embedding is capable of predicting task similarities that match our intuition about semantic and taxonomic relations between different visual tasks (e.g., tasks based on classifying different types of plants are similar). We also demonstrate the practical value of this framework for the meta-task of selecting a pre-trained feature extractor for a new task. We present a simple meta-learning framework for learning a metric on embeddings that is capable of predicting which feature extractors will perform well. Selecting a feature extractor with task embedding obtains a performance close to the best available feature extractor, while costing substantially less than exhaustively training and evaluating on all available feature extractors.
Tasks Meta-Learning
Published 2019-02-10
URL http://arxiv.org/abs/1902.03545v1
PDF http://arxiv.org/pdf/1902.03545v1.pdf
PWC https://paperswithcode.com/paper/task2vec-task-embedding-for-meta-learning
Repo
Framework
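
The Task2Vec abstract describes an embedding built from estimates of the Fisher information of a probe network's parameters. The toy sketch below replaces the deep probe with a two-weight logistic model, for which the diagonal of the Fisher information has the closed form $\mathbb{E}_x[p(1-p)\,x_j^2]$. The logistic probe, the fixed weights, the plain cosine distance, and the two small datasets are all assumptions made for illustration and are not the paper's setup.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fisher_diagonal(weights, inputs):
    """Diagonal Fisher information of a logistic probe w.r.t. its weights."""
    diag = [0.0] * len(weights)
    for x in inputs:
        p = sigmoid(sum(w * xi for w, xi in zip(weights, x)))
        for j, xj in enumerate(x):
            diag[j] += p * (1.0 - p) * xj * xj
    return [v / len(inputs) for v in diag]


def cosine_distance(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


# Hypothetical probe weights and two toy "tasks" with different input statistics.
probe = [0.5, -0.25]
task_a = [[1.0, 0.1], [2.0, -0.2], [1.5, 0.0]]
task_b = [[0.1, 1.0], [0.0, 2.0], [-0.2, 1.5], [0.1, 1.8]]

emb_a = fisher_diagonal(probe, task_a)
emb_b = fisher_diagonal(probe, task_b)
print(emb_a, emb_b, cosine_distance(emb_a, emb_b))
```

Both embeddings have one entry per probe parameter regardless of how many examples each task contains, which mirrors the fixed-dimensional property highlighted in the abstract.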

Entropy Minimization In Emergent Languages

Title Entropy Minimization In Emergent Languages
Authors Eugene Kharitonov, Rahma Chaabouni, Diane Bouchacourt, Marco Baroni
Abstract There is a growing interest in studying the languages emerging when neural agents are jointly trained to solve tasks requiring communication through a discrete channel. We investigate here the information-theoretic complexity of such languages, focusing on the basic two-agent, one-exchange setup. We find that, under common training procedures, the emergent languages are subject to an entropy minimization pressure that has also been detected in human language, whereby the mutual information between the communicating agent’s inputs and the messages is minimized, within the range afforded by the need for successful communication. This pressure is amplified as we increase communication channel discreteness. Further, we observe that stronger discrete-channel-driven entropy minimization leads to representations with increased robustness to overfitting and adversarial attacks. We conclude by discussing the implications of our findings for the study of natural and artificial communication systems.
Tasks Representation Learning
Published 2019-05-31
URL https://arxiv.org/abs/1905.13687v2
PDF https://arxiv.org/pdf/1905.13687v2.pdf
PWC https://paperswithcode.com/paper/information-minimization-in-emergent
Repo
Framework
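
The quantity at the center of the entropy minimization abstract is the mutual information between the sender's inputs and its messages. The sketch below estimates it from samples for two hypothetical deterministic "languages" on a task that needs only one bit about the input. The task, the two message mappings, and the function names are illustrative assumptions, not the paper's experimental setup.

```python
import math
from collections import Counter


def entropy(samples):
    """Empirical Shannon entropy in bits."""
    n = len(samples)
    return -sum(c / n * math.log2(c / n) for c in Counter(samples).values())


def mutual_information(inputs, messages):
    """I(X; M) = H(X) + H(M) - H(X, M), estimated from paired samples."""
    joint = list(zip(inputs, messages))
    return entropy(inputs) + entropy(messages) - entropy(joint)


# Eight equally likely inputs; the receiver only needs to know whether x >= 4.
inputs = list(range(8))
verbose_language = [x for x in inputs]       # one message per input: 3 bits
minimal_language = [x // 4 for x in inputs]  # just enough for the task: 1 bit

print(mutual_information(inputs, verbose_language))
print(mutual_information(inputs, minimal_language))
```

Both languages solve the task, but the second transmits no more information than success requires, which is the kind of outcome the abstract attributes to the entropy minimization pressure.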

Siamese Networks with Location Prior for Landmark Tracking in Liver Ultrasound Sequences

Title Siamese Networks with Location Prior for Landmark Tracking in Liver Ultrasound Sequences
Authors Alvaro Gomariz, Weiye Li, Ece Ozkan, Christine Tanner, Orcun Goksel
Abstract Image-guided radiation therapy can benefit from accurate motion tracking by ultrasound imaging, in order to minimize treatment margins and radiate moving anatomical targets, e.g., due to breathing. One way to formulate this tracking problem is the automatic localization of given tracked anatomical landmarks throughout a temporal ultrasound sequence. For this, we herein propose a fully-convolutional Siamese network that learns the similarity between pairs of image regions containing the same landmark. Accordingly, it learns to localize and thus track arbitrary image features, not only predefined anatomical structures. We employ a temporal consistency model as a location prior, which we combine with the network-predicted location probability map to track a target iteratively in ultrasound sequences. We applied this method on the dataset of the Challenge on Liver Ultrasound Tracking (CLUST) with competitive results, where our work is the first to effectively apply CNNs on this tracking problem, thanks to our temporal regularization.
Tasks Landmark Tracking
Published 2019-01-23
URL http://arxiv.org/abs/1901.08109v1
PDF http://arxiv.org/pdf/1901.08109v1.pdf
PWC https://paperswithcode.com/paper/siamese-networks-with-location-prior-for
Repo
Framework
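
The abstract states that a temporal consistency model serves as a location prior and is combined with the network-predicted location probability map. One simple way to combine them, assumed here and not confirmed by the abstract, is to multiply the map by an isotropic Gaussian centered on the previous landmark position and take the argmax. The map values, `sigma`, and the function name are hypothetical.

```python
import math


def track_step(prob_map, prev_rc, sigma):
    """Weight the map by a Gaussian prior around prev_rc and return the argmax."""
    best_score, best_rc = -1.0, None
    for r, row in enumerate(prob_map):
        for c, p in enumerate(row):
            d2 = (r - prev_rc[0]) ** 2 + (c - prev_rc[1]) ** 2
            score = p * math.exp(-d2 / (2.0 * sigma ** 2))
            if score > best_score:
                best_score, best_rc = score, (r, c)
    return best_rc


# Hypothetical similarity map: a strong distractor far from the previous
# position and a slightly weaker true response next to it.
prob_map = [
    [0.0, 0.0, 0.0, 0.0, 0.9],
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.7, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0],
]
previous = (2, 0)

print(track_step(prob_map, previous, sigma=1.0))    # prior dominates
print(track_step(prob_map, previous, sigma=100.0))  # prior is nearly flat
```

With a tight prior the tracker stays on the nearby response, while a nearly flat prior lets the distant distractor win, which shows why some temporal regularization helps on repetitive ultrasound textures.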

When to Intervene: Detecting Abnormal Mood using Everyday Smartphone Conversations

Title When to Intervene: Detecting Abnormal Mood using Everyday Smartphone Conversations
Authors John Gideon, Katie Matton, Steve Anderau, Melvin G McInnis, Emily Mower Provost
Abstract Bipolar disorder (BPD) is a chronic mental illness characterized by extreme mood and energy changes from mania to depression. These changes drive behaviors that often lead to devastating personal or social consequences. BPD is managed clinically with regular interactions with care providers, who assess mood, energy levels, and the form and content of speech. Recent work has proposed smartphones for monitoring mood using speech. However, these works do not predict when to intervene. Predicting when to intervene is challenging because there is not a single measure that is relevant for every person: different individuals may have different levels of symptom severity considered typical. Additionally, this typical mood, or baseline, may change over time, making a single symptom threshold insufficient. This work presents an innovative approach that expands clinical mood monitoring to predict when interventions are necessary using an anomaly detection framework, which we call Temporal Normalization. We first validate the model using a dataset annotated for clinical interventions and then incorporate this method in a deep learning framework to predict mood anomalies from natural, unstructured, telephone speech data. The combination of these approaches provides a framework to enable real-world speech-focused mood monitoring.
Tasks Anomaly Detection
Published 2019-09-25
URL https://arxiv.org/abs/1909.11248v2
PDF https://arxiv.org/pdf/1909.11248v2.pdf
PWC https://paperswithcode.com/paper/when-to-intervene-detecting-abnormal-mood
Repo
Framework
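
The abstract argues that a single symptom threshold is insufficient because each person's baseline differs and drifts over time. The abstract does not define Temporal Normalization, so the sketch below is only a generic stand-in: it scores each new rating against a rolling window of the same person's recent ratings and flags large deviations. The window length, threshold, and ratings are hypothetical.

```python
import statistics


def flag_anomalies(scores, window, threshold):
    """Flag ratings that deviate strongly from the person's own recent baseline."""
    flags = []
    for t, score in enumerate(scores):
        history = scores[max(0, t - window):t]
        if len(history) < 2:
            flags.append(False)  # not enough history to estimate a baseline
            continue
        mean = statistics.mean(history)
        std = statistics.stdev(history)
        if std == 0:
            flags.append(score != mean)  # any change from a constant baseline
        else:
            flags.append(abs(score - mean) / std > threshold)
    return flags


# Hypothetical symptom ratings for one person over eight assessments.
ratings = [3, 4, 3, 4, 3, 9, 4, 3]
flags = flag_anomalies(ratings, window=5, threshold=3.0)
print(flags)
```

Because the baseline is recomputed from each person's own history, the same rating can be typical for one individual and anomalous for another.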