January 25, 2020


Paper Group ANR 1740

End-to-end Anchored Speech Recognition. Social and Scene-Aware Trajectory Prediction in Crowded Spaces. Characterizing Distribution Equivalence for Cyclic and Acyclic Directed Graphs. Almost Tune-Free Variance Reduction. Robust Deep Networks with Randomized Tensor Regression Layers. Striking the Right Balance with Uncertainty. Sound source detectio …

End-to-end Anchored Speech Recognition


Title	End-to-end Anchored Speech Recognition
Authors	Yiming Wang, Xing Fan, I-Fan Chen, Yuzong Liu, Tongfei Chen, Björn Hoffmeister
Abstract	Voice-controlled household devices, like Amazon Echo or Google Home, face the problem of performing speech recognition of device-directed speech in the presence of interfering background speech, i.e., background noise and interfering speech from another person or media device in proximity need to be ignored. We propose two end-to-end models to tackle this problem with information extracted from the “anchored segment”. The anchored segment refers to the wake-up word part of an audio stream, which contains valuable speaker information that can be used to suppress interfering speech and background noise. The first method is called “Multi-source Attention”, where the attention mechanism takes both the speaker information and decoder state into consideration. The second method directly learns a frame-level mask on top of the encoder output. We also explore a multi-task learning setup where we use the ground truth of the mask to guide the learner. Given that audio data with interfering speech is rare in our training data set, we also propose a way to synthesize “noisy” speech from “clean” speech to mitigate the mismatch between training and test data. Our proposed methods show up to 15% relative reduction in WER for Amazon Alexa live data with interfering background speech without significantly degrading on clean speech.
Tasks	Multi-Task Learning, Speech Recognition
Published	2019-02-06
URL	http://arxiv.org/abs/1902.02383v1
PDF	http://arxiv.org/pdf/1902.02383v1.pdf
PWC	https://paperswithcode.com/paper/end-to-end-anchored-speech-recognition
Repo
Framework

Social and Scene-Aware Trajectory Prediction in Crowded Spaces


Title	Social and Scene-Aware Trajectory Prediction in Crowded Spaces
Authors	Matteo Lisotto, Pasquale Coscia, Lamberto Ballan
Abstract	Mimicking the human ability to forecast future positions or interpret complex interactions in urban scenarios, such as streets, shopping malls or squares, is essential to develop socially compliant robots or self-driving cars. Autonomous systems may gain an advantage by anticipating human motion to avoid collisions or to behave naturally alongside people. To foresee plausible trajectories, we construct an LSTM (long short-term memory)-based model considering three fundamental factors: people interactions, past observations in terms of previously crossed areas and semantics of surrounding space. Our model encompasses several pooling mechanisms to join the above elements defining multiple tensors, namely social, navigation and semantic tensors. The network is tested in unstructured environments where complex paths emerge according to both internal (intentions) and external (other people, not accessible areas) motivations. As demonstrated, modeling paths unaware of social interactions or context information is insufficient to correctly predict future positions. Experimental results corroborate the effectiveness of the proposed framework in comparison to LSTM-based models for human path prediction.
Tasks	Self-Driving Cars, Trajectory Prediction
Published	2019-09-19
URL	https://arxiv.org/abs/1909.08840v1
PDF	https://arxiv.org/pdf/1909.08840v1.pdf
PWC	https://paperswithcode.com/paper/social-and-scene-aware-trajectory-prediction
Repo
Framework

Characterizing Distribution Equivalence for Cyclic and Acyclic Directed Graphs


Title	Characterizing Distribution Equivalence for Cyclic and Acyclic Directed Graphs
Authors	AmirEmad Ghassami, Kun Zhang, Negar Kiyavash
Abstract	The main way for defining equivalence among acyclic directed graphs is based on the conditional independencies of the distributions that they can generate. However, it is known that when cycles are allowed in the structure, conditional independence is not a suitable notion for equivalence of two structures, as it does not reflect all the information in the distribution that can be used for identification of the underlying structure. In this paper, we present a general, unified notion of equivalence for linear Gaussian directed graphs. Our proposed definition for equivalence is based on the set of distributions that the structure is able to generate. We take a first step towards devising methods for characterizing the equivalence of two structures, which may be cyclic or acyclic. Additionally, we propose a score-based method for learning the structure from observational data.
Tasks
Published	2019-10-28
URL	https://arxiv.org/abs/1910.12993v1
PDF	https://arxiv.org/pdf/1910.12993v1.pdf
PWC	https://paperswithcode.com/paper/characterizing-distribution-equivalence-for
Repo
Framework

Almost Tune-Free Variance Reduction


Title	Almost Tune-Free Variance Reduction
Authors	Bingcong Li, Lingda Wang, Georgios B. Giannakis
Abstract	The variance reduction class of algorithms including the representative ones, abbreviated as SVRG and SARAH, have well documented merits for empirical risk minimization tasks. However, they require grid search to optimally tune parameters (step size and the number of iterations per inner loop) for best performance. This work introduces ‘almost tune-free’ SVRG and SARAH schemes by equipping them with Barzilai-Borwein (BB) step sizes. To achieve the best performance, both i) averaging schemes; and, ii) the inner loop length are adjusted according to the BB step size. SVRG and SARAH are first reexamined through an ‘estimate sequence’ lens. Such analysis provides new averaging methods that tighten the convergence rates of both SVRG and SARAH theoretically, and improve their performance empirically when the step size is chosen large. Then a simple yet effective means of adjusting the number of iterations per inner loop is developed, which completes the tune-free variance reduction together with BB step sizes. Numerical tests corroborate the proposed methods.
Tasks
Published	2019-08-25
URL	https://arxiv.org/abs/1908.09345v1
PDF	https://arxiv.org/pdf/1908.09345v1.pdf
PWC	https://paperswithcode.com/paper/almost-tune-free-variance-reduction
Repo
Framework
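
For readers unfamiliar with Barzilai-Borwein step sizes, the following is a minimal sketch of one common BB rule, step = ||s||^2 / (s . y), where s is the change in the iterate and y is the change in the gradient. It is not taken from the paper: the function names, the toy quadratic objective and the constant a = 4 are assumptions made for illustration, and the paper's SVRG/SARAH variants may scale or average the step differently.

```python
def bb_step_size(x_prev, x_curr, g_prev, g_curr):
    """One common Barzilai-Borwein step size: ||s||^2 / (s . y).

    s = x_curr - x_prev is the change in the iterate,
    y = g_curr - g_prev is the change in the gradient.
    """
    s = [c - p for c, p in zip(x_curr, x_prev)]
    y = [c - p for c, p in zip(g_curr, g_prev)]
    sy = sum(si * yi for si, yi in zip(s, y))
    if sy <= 0:
        raise ValueError("s . y must be positive for this BB step size")
    return sum(si * si for si in s) / sy


def grad(x, a=4.0):
    # Gradient of the toy objective f(x) = 0.5 * a * ||x||^2 (hypothetical example).
    return [a * xi for xi in x]


# Two consecutive iterates of some earlier optimization steps (made-up values).
x0, x1 = [1.0, -2.0], [0.5, -1.0]
step = bb_step_size(x0, x1, grad(x0), grad(x1))

# One plain gradient step using the BB step size.
x2 = [xi - step * gi for xi, gi in zip(x1, grad(x1))]
```

On this toy quadratic the gradient change is exactly a times the iterate change, so the BB rule recovers 1/a without any manual tuning, which is the intuition behind using it to avoid grid search.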

Robust Deep Networks with Randomized Tensor Regression Layers


Title	Robust Deep Networks with Randomized Tensor Regression Layers
Authors	Arinbjörn Kolbeinsson, Jean Kossaifi, Yannis Panagakis, Adrian Bulat, Anima Anandkumar, Ioanna Tzoulaki, Paul Matthews
Abstract	In this paper, we propose a novel randomized tensor decomposition for tensor regression. It allows the weights of tensor regression layers to be stochastically approximated by randomly sampling in the low-rank subspace. We theoretically and empirically establish the link between our proposed stochastic rank-regularization and the dropout on low-rank tensor regression. This acts as an additional stochastic regularization on the regression weight, which, combined with the deterministic regularization imposed by the low-rank constraint, improves both the performance and robustness of neural networks augmented with it. In particular, it makes the model more robust to adversarial attacks and random noise, without requiring any adversarial training. We perform a thorough study of our method on synthetic data, object classification on the CIFAR100 and ImageNet datasets, and large scale brain-age prediction on the UK Biobank brain MRI dataset. We demonstrate superior performance in all cases, as well as significant improvement in robustness to adversarial attacks and random noise.
Tasks	Object Classification
Published	2019-02-27
URL	https://arxiv.org/abs/1902.10758v3
PDF	https://arxiv.org/pdf/1902.10758v3.pdf
PWC	https://paperswithcode.com/paper/stochastically-rank-regularized-tensor
Repo
Framework

Striking the Right Balance with Uncertainty


Title	Striking the Right Balance with Uncertainty
Authors	Salman Khan, Munawar Hayat, Waqas Zamir, Jianbing Shen, Ling Shao
Abstract	Learning unbiased models on imbalanced datasets is a significant challenge. Rare classes tend to get a concentrated representation in the classification space which hampers the generalization of learned boundaries to new test examples. In this paper, we demonstrate that the Bayesian uncertainty estimates directly correlate with the rarity of classes and the difficulty level of individual samples. Subsequently, we present a novel framework for uncertainty based class imbalance learning that follows two key insights: First, classification boundaries should be extended further away from a more uncertain (rare) class to avoid overfitting and enhance its generalization. Second, each sample should be modeled as a multi-variate Gaussian distribution with a mean vector and a covariance matrix defined by the sample’s uncertainty. The learned boundaries should respect not only the individual samples but also their distribution in the feature space. Our proposed approach efficiently utilizes sample and class uncertainty information to learn robust features and more generalizable classifiers. We systematically study the class imbalance problem and derive a novel loss formulation for max-margin learning based on Bayesian uncertainty measure. The proposed method shows significant performance improvements on six benchmark datasets for face verification, attribute prediction, digit/object classification and skin lesion detection.
Tasks	Face Verification, Object Classification
Published	2019-01-22
URL	http://arxiv.org/abs/1901.07590v3
PDF	http://arxiv.org/pdf/1901.07590v3.pdf
PWC	https://paperswithcode.com/paper/striking-the-right-balance-with-uncertainty
Repo
Framework

Sound source detection, localization and classification using consecutive ensemble of CRNN models


Title	Sound source detection, localization and classification using consecutive ensemble of CRNN models
Authors	Sławomir Kapka, Mateusz Lewandowski
Abstract	In this paper, we describe our method for DCASE2019 task3: Sound Event Localization and Detection (SELD). We use four CRNN SELDnet-like single output models which run in a consecutive manner to recover all possible information of occurring events. We decompose the SELD task into estimating the number of active sources, estimating the direction of arrival of a single source, estimating the direction of arrival of the second source where the direction of the first one is known, and a multi-label classification task. We use a custom consecutive ensemble to predict events’ onset, offset, direction of arrival and class. The proposed approach is evaluated on the TAU Spatial Sound Events 2019 - Ambisonic dataset and compared with other participants’ submissions.
Tasks	Multi-Label Classification
Published	2019-08-02
URL	https://arxiv.org/abs/1908.00766v2
PDF	https://arxiv.org/pdf/1908.00766v2.pdf
PWC	https://paperswithcode.com/paper/sound-source-detection-localization-and
Repo
Framework

Latent User Linking for Collaborative Cross Domain Recommendation


Title	Latent User Linking for Collaborative Cross Domain Recommendation
Authors	Sapumal Ahangama, Danny Chiang-Choon Poo
Abstract	With the widespread adoption of information systems, recommender systems are widely used for better user experience. Collaborative filtering is a popular approach in implementing recommender systems. Yet, collaborative filtering methods are highly dependent on user feedback, which is often highly sparse and hard to obtain. However, such issues could be alleviated if knowledge from a much denser and a related secondary domain could be used to enhance the recommendation accuracy in the sparse target domain. In this publication, we propose a deep learning method for cross-domain recommender systems through the linking of cross-domain user latent representations as a form of knowledge transfer across domains. We assume that cross-domain similarities of user tastes and behaviors are clearly observable in the low dimensional user latent representations. These user similarities are used to link the domains. As a result, we propose a Variational Autoencoder based network model for cross-domain linking with added contextualization to handle sparse data and for better transfer of cross-domain knowledge. We further extend the model to be more suitable in cold start scenarios and to utilize auxiliary user information for additional gains in recommendation accuracy. The effectiveness of the proposed model was empirically evaluated using multiple datasets. The experiments proved that the proposed model outperforms the state of the art techniques.
Tasks	Recommendation Systems, Transfer Learning
Published	2019-08-19
URL	https://arxiv.org/abs/1908.06583v1
PDF	https://arxiv.org/pdf/1908.06583v1.pdf
PWC	https://paperswithcode.com/paper/latent-user-linking-for-collaborative-cross
Repo
Framework

Lifted Weight Learning of Markov Logic Networks Revisited


Title	Lifted Weight Learning of Markov Logic Networks Revisited
Authors	Ondrej Kuzelka, Vyacheslav Kungurtsev
Abstract	We study lifted weight learning of Markov logic networks. We show that there is an algorithm for maximum-likelihood learning of 2-variable Markov logic networks which runs in time polynomial in the domain size. Our results are based on existing lifted-inference algorithms and recent algorithmic results on computing maximum entropy distributions.
Tasks
Published	2019-03-07
URL	http://arxiv.org/abs/1903.03099v1
PDF	http://arxiv.org/pdf/1903.03099v1.pdf
PWC	https://paperswithcode.com/paper/lifted-weight-learning-of-markov-logic
Repo
Framework

Scene Memory Transformer for Embodied Agents in Long-Horizon Tasks


Title	Scene Memory Transformer for Embodied Agents in Long-Horizon Tasks
Authors	Kuan Fang, Alexander Toshev, Li Fei-Fei, Silvio Savarese
Abstract	Many robotic applications require the agent to perform long-horizon tasks in partially observable environments. In such applications, decision making at any step can depend on observations received far in the past. Hence, being able to properly memorize and utilize the long-term history is crucial. In this work, we propose a novel memory-based policy, named Scene Memory Transformer (SMT). The proposed policy embeds and adds each observation to a memory and uses the attention mechanism to exploit spatio-temporal dependencies. This model is generic and can be efficiently trained with reinforcement learning over long episodes. On a range of visual navigation tasks, SMT demonstrates superior performance to existing reactive and memory-based policies by a margin.
Tasks	Decision Making, Visual Navigation
Published	2019-03-09
URL	http://arxiv.org/abs/1903.03878v1
PDF	http://arxiv.org/pdf/1903.03878v1.pdf
PWC	https://paperswithcode.com/paper/scene-memory-transformer-for-embodied-agents
Repo
Framework
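
The abstract describes adding each observation embedding to a memory and attending over that memory. A minimal sketch of that general idea, using standard scaled dot-product attention, is given below. It is not the SMT architecture: the function name `attend`, the 2-dimensional embeddings and the use of the raw observations as both keys and values are assumptions for illustration only.

```python
import math


def attend(query, keys, values):
    """Scaled dot-product attention of one query over a list of memory slots."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    # Numerically stable softmax over the scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of the value vectors.
    dim = len(values[0])
    context = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]
    return weights, context


# The memory grows by one embedding per observation (made-up embeddings).
memory = []
for observation in ([1.0, 0.0], [0.0, 1.0], [1.0, 0.0]):
    memory.append(observation)

# Query the whole history with the current observation embedding.
weights, context = attend([1.0, 0.0], memory, memory)
```

Because every past observation stays in the memory, the attention weights can single out a relevant observation regardless of how long ago it was received.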

CMTS: Conditional Multiple Trajectory Synthesizer for Generating Safety-critical Driving Scenarios


Title	CMTS: Conditional Multiple Trajectory Synthesizer for Generating Safety-critical Driving Scenarios
Authors	Wenhao Ding, Mengdi Xu, Ding Zhao
Abstract	Naturalistic driving trajectories are crucial for the performance of autonomous driving algorithms. However, most of the data is collected in safe scenarios, leading to the duplication of trajectories which are easily handled by currently developed algorithms. When considering safety, testing algorithms in near-miss scenarios that rarely show up in off-the-shelf datasets is a vital part of the evaluation. As a remedy, we propose a near-miss data synthesizing framework based on Variational Bayesian methods and term it Conditional Multiple Trajectory Synthesizer (CMTS). We leverage a generative model conditioned on road maps to bridge safe and collision driving data by representing their distribution in the latent space. By sampling from the near-miss distribution, we can synthesize safety-critical data crucial for understanding traffic scenarios but shown in neither the original dataset nor the collision dataset. Our experimental results demonstrate that the augmented dataset covers more kinds of driving scenarios, especially the near-miss ones, which help improve the trajectory prediction accuracy and the capability of dealing with risky driving scenarios.
Tasks	Autonomous Driving, Trajectory Prediction
Published	2019-09-17
URL	https://arxiv.org/abs/1910.00099v2
PDF	https://arxiv.org/pdf/1910.00099v2.pdf
PWC	https://paperswithcode.com/paper/cmts-conditional-multiple-trajectory
Repo
Framework

Phrase-Level Class based Language Model for Mandarin Smart Speaker Query Recognition


Title	Phrase-Level Class based Language Model for Mandarin Smart Speaker Query Recognition
Authors	Yiheng Huang, Liqiang He, Lei Han, Guangsen Wang, Dan Su
Abstract	The success of speech assistants requires precise recognition of a number of entities in particular contexts. A common solution is to train a class-based n-gram language model and then expand the classes into specific words or phrases. However, when the class has a huge list, e.g., more than 20 million songs, a full expansion will cause memory explosion. Worse still, the list items in the class need to be updated frequently, which requires a dynamic model updating technique. In this work, we propose to train pruned language models for the word classes to replace the slots in the root n-gram. We further propose to use a novel technique, named Difference Language Model (DLM), to correct the bias from the pruned language models. Once the decoding graph is built, we only need to recalculate the DLM when the entities in word classes are updated. Results show that the proposed method consistently and significantly outperforms the conventional approaches on all datasets, especially for large lists, which the conventional approaches cannot handle.
Tasks	Language Modelling
Published	2019-09-02
URL	https://arxiv.org/abs/1909.00556v1
PDF	https://arxiv.org/pdf/1909.00556v1.pdf
PWC	https://paperswithcode.com/paper/phrase-level-class-based-language-model-for
Repo
Framework
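
The class-based n-gram model mentioned in the abstract factors a word probability as P(word | history) = P(class | history) * P(word | class). The sketch below illustrates only that standard factorization; it does not implement the pruned class models or the Difference Language Model. The class name `SONG`, the toy probability tables and the function name are hypothetical.

```python
def class_based_prob(word, history, class_of, p_class_given_history, p_word_given_class):
    """P(word | history) = P(class | history) * P(word | class)."""
    c = class_of[word]
    return p_class_given_history[(history, c)] * p_word_given_class[(c, word)]


# Toy tables (made-up numbers): a root bigram over classes plus a per-class word list.
class_of = {"yesterday": "SONG", "hello": "SONG"}
p_class_given_history = {(("play",), "SONG"): 0.4}
p_word_given_class = {("SONG", "yesterday"): 0.25, ("SONG", "hello"): 0.75}

prob = class_based_prob(
    "yesterday", ("play",), class_of, p_class_given_history, p_word_given_class
)
```

Updating the song list only touches `p_word_given_class`, which hints at why the abstract wants to avoid expanding a class with millions of entries into the root n-gram.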

Complexer-YOLO: Real-Time 3D Object Detection and Tracking on Semantic Point Clouds


Title	Complexer-YOLO: Real-Time 3D Object Detection and Tracking on Semantic Point Clouds
Authors	Martin Simon, Karl Amende, Andrea Kraus, Jens Honer, Timo Sämann, Hauke Kaulbersch, Stefan Milz, Horst Michael Gross
Abstract	Accurate detection of 3D objects is a fundamental problem in computer vision and has an enormous impact on autonomous cars, augmented/virtual reality and many applications in robotics. In this work we present a novel fusion of neural network based state-of-the-art 3D detector and visual semantic segmentation in the context of autonomous driving. Additionally, we introduce Scale-Rotation-Translation score (SRTs), a fast and highly parameterizable evaluation metric for comparison of object detections, which speeds up our inference time by up to 20% and halves training time. On top, we apply state-of-the-art online multi target feature tracking on the object measurements to further increase accuracy and robustness utilizing temporal information. Our experiments on KITTI show that we achieve the same results as state-of-the-art in all related categories, while maintaining the performance and accuracy trade-off and still running in real-time. Furthermore, our model is the first one that fuses visual semantic with 3D object detection.
Tasks	3D Object Detection, Autonomous Driving, Object Detection, Semantic Segmentation
Published	2019-04-16
URL	http://arxiv.org/abs/1904.07537v1
PDF	http://arxiv.org/pdf/1904.07537v1.pdf
PWC	https://paperswithcode.com/paper/complexer-yolo-real-time-3d-object-detection
Repo
Framework

Flexibly Fair Representation Learning by Disentanglement


Title	Flexibly Fair Representation Learning by Disentanglement
Authors	Elliot Creager, David Madras, Jörn-Henrik Jacobsen, Marissa A. Weis, Kevin Swersky, Toniann Pitassi, Richard Zemel
Abstract	We consider the problem of learning representations that achieve group and subgroup fairness with respect to multiple sensitive attributes. Taking inspiration from the disentangled representation learning literature, we propose an algorithm for learning compact representations of datasets that are useful for reconstruction and prediction, but are also flexibly fair, meaning they can be easily modified at test time to achieve subgroup demographic parity with respect to multiple sensitive attributes and their conjunctions. We show empirically that the resulting encoder, which does not require the sensitive attributes for inference, enables the adaptation of a single representation to a variety of fair classification tasks with new target labels and subgroup definitions.
Tasks	Representation Learning
Published	2019-06-06
URL	https://arxiv.org/abs/1906.02589v1
PDF	https://arxiv.org/pdf/1906.02589v1.pdf
PWC	https://paperswithcode.com/paper/flexibly-fair-representation-learning-by
Repo
Framework
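
Demographic parity, which the abstract targets at the subgroup level, asks that the rate of positive predictions be the same across groups. A minimal sketch of measuring the gap is shown below; it is an evaluation helper only, not the paper's representation-learning algorithm. The function name, the binary predictions and the group labels are assumptions for illustration.

```python
def demographic_parity_gap(predictions, groups):
    """Return (gap, rates): the largest difference in positive-prediction rate
    between any two groups, and the per-group rates themselves."""
    rates = {}
    for g in set(groups):
        members = [p for p, gi in zip(predictions, groups) if gi == g]
        rates[g] = sum(members) / len(members)
    return max(rates.values()) - min(rates.values()), rates


# Made-up binary predictions for six individuals in two groups.
predictions = [1, 0, 1, 1, 0, 0]
groups = ["a", "a", "a", "b", "b", "b"]
gap, rates = demographic_parity_gap(predictions, groups)
```

For subgroups defined by conjunctions of attributes, the same helper can be applied by passing tuples such as `("a", "x")` as the group labels.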

Patient trajectory prediction in the Mimic-III dataset, challenges and pitfalls


Title	Patient trajectory prediction in the Mimic-III dataset, challenges and pitfalls
Authors	Jose F Rodrigues-Jr, Gabriel Spadon, Bruno Brandoli, Sihem Amer-Yahia
Abstract	Automated medical prognosis has gained interest as artificial intelligence evolves and the potential for computer-aided medicine becomes evident. Nevertheless, it is challenging to design an effective system that, given a patient’s medical history, is able to predict probable future conditions. Previous works, mostly carried out over private datasets, have tackled the problem by using artificial neural network architectures that cannot deal with low-cardinality datasets, or by means of non-generalizable inference approaches. We introduce a Deep Learning architecture whose design results from an intensive experimental process. The final architecture is based on two parallel Minimal Gated Recurrent Unit networks working in bi-directional manner, which was extensively tested with the open-access Mimic-III dataset. Our results demonstrate significant improvements in automated medical prognosis, as measured with Recall@k. We summarize our experience as a set of relevant insights for the design of Deep Learning architectures. Our work improves the performance of computer-aided medicine and can serve as a guide in designing artificial neural networks used in prediction tasks.
Tasks	Trajectory Prediction
Published	2019-09-10
URL	https://arxiv.org/abs/1909.04605v4
PDF	https://arxiv.org/pdf/1909.04605v4.pdf
PWC	https://paperswithcode.com/paper/patient-trajectory-prediction-in-the-mimic
Repo
Framework
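
The abstract reports results in terms of Recall@k. A minimal sketch of one common definition follows: the fraction of truly relevant items that appear among the top k ranked predictions. The paper's exact variant may differ, and the function name and the diagnosis-code strings in the example are hypothetical.

```python
def recall_at_k(ranked_predictions, relevant, k):
    """Fraction of relevant items found in the top-k ranked predictions."""
    relevant = set(relevant)
    if not relevant:
        raise ValueError("recall is undefined when there are no relevant items")
    top_k = set(ranked_predictions[:k])
    return len(top_k & relevant) / len(relevant)


# Made-up example: codes ranked by predicted probability vs. codes actually observed.
ranked = ["code_a", "code_b", "code_c", "code_d"]
observed = {"code_a", "code_d", "code_e"}
r2 = recall_at_k(ranked, observed, 2)
r4 = recall_at_k(ranked, observed, 4)
```

Recall@k can only grow as k increases, so values are comparable only when reported at the same k.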