Paper Group ANR 1740
End-to-end Anchored Speech Recognition
Title | End-to-end Anchored Speech Recognition |
Authors | Yiming Wang, Xing Fan, I-Fan Chen, Yuzong Liu, Tongfei Chen, Björn Hoffmeister |
Abstract | Voice-controlled household devices, like Amazon Echo or Google Home, face the problem of performing speech recognition of device-directed speech in the presence of interfering background speech, i.e., background noise and interfering speech from another person or media device in proximity need to be ignored. We propose two end-to-end models to tackle this problem with information extracted from the “anchored segment”. The anchored segment refers to the wake-up word part of an audio stream, which contains valuable speaker information that can be used to suppress interfering speech and background noise. The first method is called “Multi-source Attention” where the attention mechanism takes both the speaker information and decoder state into consideration. The second method directly learns a frame-level mask on top of the encoder output. We also explore a multi-task learning setup where we use the ground truth of the mask to guide the learner. Given that audio data with interfering speech is rare in our training data set, we also propose a way to synthesize “noisy” speech from “clean” speech to mitigate the mismatch between training and test data. Our proposed methods show up to 15% relative reduction in WER for Amazon Alexa live data with interfering background speech without significantly degrading on clean speech. |
Tasks | Multi-Task Learning, Speech Recognition |
Published | 2019-02-06 |
URL | http://arxiv.org/abs/1902.02383v1 |
http://arxiv.org/pdf/1902.02383v1.pdf | |
PWC | https://paperswithcode.com/paper/end-to-end-anchored-speech-recognition |
Repo | |
Framework | |
Social and Scene-Aware Trajectory Prediction in Crowded Spaces
Title | Social and Scene-Aware Trajectory Prediction in Crowded Spaces |
Authors | Matteo Lisotto, Pasquale Coscia, Lamberto Ballan |
Abstract | Mimicking human ability to forecast future positions or interpret complex interactions in urban scenarios, such as streets, shopping malls or squares, is essential to develop socially compliant robots or self-driving cars. Autonomous systems may gain advantage on anticipating human motion to avoid collisions or to naturally behave alongside people. To foresee plausible trajectories, we construct an LSTM (long short-term memory)-based model considering three fundamental factors: people interactions, past observations in terms of previously crossed areas and semantics of surrounding space. Our model encompasses several pooling mechanisms to join the above elements defining multiple tensors, namely social, navigation and semantic tensors. The network is tested in unstructured environments where complex paths emerge according to both internal (intentions) and external (other people, not accessible areas) motivations. As demonstrated, modeling paths unaware of social interactions or context information, is insufficient to correctly predict future positions. Experimental results corroborate the effectiveness of the proposed framework in comparison to LSTM-based models for human path prediction. |
Tasks | Self-Driving Cars, Trajectory Prediction |
Published | 2019-09-19 |
URL | https://arxiv.org/abs/1909.08840v1 |
https://arxiv.org/pdf/1909.08840v1.pdf | |
PWC | https://paperswithcode.com/paper/social-and-scene-aware-trajectory-prediction |
Repo | |
Framework | |
Characterizing Distribution Equivalence for Cyclic and Acyclic Directed Graphs
Title | Characterizing Distribution Equivalence for Cyclic and Acyclic Directed Graphs |
Authors | AmirEmad Ghassami, Kun Zhang, Negar Kiyavash |
Abstract | The main way for defining equivalence among acyclic directed graphs is based on the conditional independencies of the distributions that they can generate. However, it is known that when cycles are allowed in the structure, conditional independence is not a suitable notion for equivalence of two structures, as it does not reflect all the information in the distribution that can be used for identification of the underlying structure. In this paper, we present a general, unified notion of equivalence for linear Gaussian directed graphs. Our proposed definition for equivalence is based on the set of distributions that the structure is able to generate. We take a first step towards devising methods for characterizing the equivalence of two structures, which may be cyclic or acyclic. Additionally, we propose a score-based method for learning the structure from observational data. |
Tasks | |
Published | 2019-10-28 |
URL | https://arxiv.org/abs/1910.12993v1 |
https://arxiv.org/pdf/1910.12993v1.pdf | |
PWC | https://paperswithcode.com/paper/characterizing-distribution-equivalence-for |
Repo | |
Framework | |
Almost Tune-Free Variance Reduction
Title | Almost Tune-Free Variance Reduction |
Authors | Bingcong Li, Lingda Wang, Georgios B. Giannakis |
Abstract | The variance reduction class of algorithms including the representative ones, abbreviated as SVRG and SARAH, have well documented merits for empirical risk minimization tasks. However, they require grid search to optimally tune parameters (step size and the number of iterations per inner loop) for best performance. This work introduces “almost tune-free” SVRG and SARAH schemes by equipping them with Barzilai-Borwein (BB) step sizes. To achieve the best performance, both i) averaging schemes; and, ii) the inner loop length are adjusted according to the BB step size. SVRG and SARAH are first reexamined through an “estimate sequence” lens. Such analysis provides new averaging methods that tighten the convergence rates of both SVRG and SARAH theoretically, and improve their performance empirically when the step size is chosen large. Then a simple yet effective means of adjusting the number of iterations per inner loop is developed, which completes the tune-free variance reduction together with BB step sizes. Numerical tests corroborate the proposed methods. |
Tasks | |
Published | 2019-08-25 |
URL | https://arxiv.org/abs/1908.09345v1 |
https://arxiv.org/pdf/1908.09345v1.pdf | |
PWC | https://paperswithcode.com/paper/almost-tune-free-variance-reduction |
Repo | |
Framework | |
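The Barzilai-Borwein step size named in the abstract above can be illustrated with a minimal sketch. This shows only the standard BB formula, eta = (s·s)/(s·y) with s the change in iterates and y the change in gradients; it is not the authors' SVRG/SARAH scheme, and it does not include their averaging or inner-loop-length rules. The function name `bb_step_size`, the helper `grad`, and the toy quadratic objective are assumptions made for illustration.

```python
def bb_step_size(x_prev, x_curr, g_prev, g_curr):
    """Standard Barzilai-Borwein step: (s.s) / (s.y).

    s = x_curr - x_prev (change in iterates)
    y = g_curr - g_prev (change in gradients)
    """
    s = [a - b for a, b in zip(x_curr, x_prev)]
    y = [a - b for a, b in zip(g_curr, g_prev)]
    sy = sum(si * yi for si, yi in zip(s, y))
    if sy <= 0:
        # The formula needs positive curvature along s.
        raise ValueError("curvature s.y must be positive")
    return sum(si * si for si in s) / sy


def grad(x):
    # Gradient of the hypothetical toy objective f(x) = x[0]**2 + x[1]**2.
    return [2.0 * xi for xi in x]


x_prev, x_curr = [1.0, 0.0], [0.0, 1.0]
eta = bb_step_size(x_prev, x_curr, grad(x_prev), grad(x_curr))
print(eta)
```

For this toy quadratic the Hessian is 2I, so the BB step equals the inverse curvature, which is why no manual step-size search is needed in this simple case.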
Robust Deep Networks with Randomized Tensor Regression Layers
Title | Robust Deep Networks with Randomized Tensor Regression Layers |
Authors | Arinbjörn Kolbeinsson, Jean Kossaifi, Yannis Panagakis, Adrian Bulat, Anima Anandkumar, Ioanna Tzoulaki, Paul Matthews |
Abstract | In this paper, we propose a novel randomized tensor decomposition for tensor regression. It allows stochastic approximation of the weights of tensor regression layers by randomly sampling in the low-rank subspace. We theoretically and empirically establish the link between our proposed stochastic rank-regularization and the dropout on low-rank tensor regression. This acts as an additional stochastic regularization on the regression weight, which, combined with the deterministic regularization imposed by the low-rank constraint, improves both the performance and robustness of neural networks augmented with it. In particular, it makes the model more robust to adversarial attacks and random noise, without requiring any adversarial training. We perform a thorough study of our method on synthetic data, object classification on the CIFAR100 and ImageNet datasets, and large scale brain-age prediction on the UK Biobank brain MRI dataset. We demonstrate superior performance in all cases, as well as significant improvement in robustness to adversarial attacks and random noise. |
Tasks | Object Classification |
Published | 2019-02-27 |
URL | https://arxiv.org/abs/1902.10758v3 |
https://arxiv.org/pdf/1902.10758v3.pdf | |
PWC | https://paperswithcode.com/paper/stochastically-rank-regularized-tensor |
Repo | |
Framework | |
Striking the Right Balance with Uncertainty
Title | Striking the Right Balance with Uncertainty |
Authors | Salman Khan, Munawar Hayat, Waqas Zamir, Jianbing Shen, Ling Shao |
Abstract | Learning unbiased models on imbalanced datasets is a significant challenge. Rare classes tend to get a concentrated representation in the classification space which hampers the generalization of learned boundaries to new test examples. In this paper, we demonstrate that the Bayesian uncertainty estimates directly correlate with the rarity of classes and the difficulty level of individual samples. Subsequently, we present a novel framework for uncertainty based class imbalance learning that follows two key insights: First, classification boundaries should be extended further away from a more uncertain (rare) class to avoid overfitting and enhance its generalization. Second, each sample should be modeled as a multi-variate Gaussian distribution with a mean vector and a covariance matrix defined by the sample’s uncertainty. The learned boundaries should respect not only the individual samples but also their distribution in the feature space. Our proposed approach efficiently utilizes sample and class uncertainty information to learn robust features and more generalizable classifiers. We systematically study the class imbalance problem and derive a novel loss formulation for max-margin learning based on Bayesian uncertainty measure. The proposed method shows significant performance improvements on six benchmark datasets for face verification, attribute prediction, digit/object classification and skin lesion detection. |
Tasks | Face Verification, Object Classification |
Published | 2019-01-22 |
URL | http://arxiv.org/abs/1901.07590v3 |
http://arxiv.org/pdf/1901.07590v3.pdf | |
PWC | https://paperswithcode.com/paper/striking-the-right-balance-with-uncertainty |
Repo | |
Framework | |
Sound source detection, localization and classification using consecutive ensemble of CRNN models
Title | Sound source detection, localization and classification using consecutive ensemble of CRNN models |
Authors | Sławomir Kapka, Mateusz Lewandowski |
Abstract | In this paper, we describe our method for DCASE2019 task3: Sound Event Localization and Detection (SELD). We use four CRNN SELDnet-like single output models which run in a consecutive manner to recover all possible information of occurring events. We decompose the SELD task into estimating number of active sources, estimating direction of arrival of a single source, estimating direction of arrival of the second source where the direction of the first one is known and a multi-label classification task. We use custom consecutive ensemble to predict events’ onset, offset, direction of arrival and class. The proposed approach is evaluated on the TAU Spatial Sound Events 2019 - Ambisonic and it is compared with other participants’ submissions. |
Tasks | Multi-Label Classification |
Published | 2019-08-02 |
URL | https://arxiv.org/abs/1908.00766v2 |
https://arxiv.org/pdf/1908.00766v2.pdf | |
PWC | https://paperswithcode.com/paper/sound-source-detection-localization-and |
Repo | |
Framework | |
Latent User Linking for Collaborative Cross Domain Recommendation
Title | Latent User Linking for Collaborative Cross Domain Recommendation |
Authors | Sapumal Ahangama, Danny Chiang-Choon Poo |
Abstract | With the widespread adoption of information systems, recommender systems are widely used for better user experience. Collaborative filtering is a popular approach in implementing recommender systems. Yet, collaborative filtering methods are highly dependent on user feedback, which is often highly sparse and hard to obtain. However, such issues could be alleviated if knowledge from a much denser and a related secondary domain could be used to enhance the recommendation accuracy in the sparse target domain. In this publication, we propose a deep learning method for cross-domain recommender systems through the linking of cross-domain user latent representations as a form of knowledge transfer across domains. We assume that cross-domain similarities of user tastes and behaviors are clearly observable in the low dimensional user latent representations. These user similarities are used to link the domains. As a result, we propose a Variational Autoencoder based network model for cross-domain linking with added contextualization to handle sparse data and for better transfer of cross-domain knowledge. We further extend the model to be more suitable in cold start scenarios and to utilize auxiliary user information for additional gains in recommendation accuracy. The effectiveness of the proposed model was empirically evaluated using multiple datasets. The experiments proved that the proposed model outperforms the state of the art techniques. |
Tasks | Recommendation Systems, Transfer Learning |
Published | 2019-08-19 |
URL | https://arxiv.org/abs/1908.06583v1 |
https://arxiv.org/pdf/1908.06583v1.pdf | |
PWC | https://paperswithcode.com/paper/latent-user-linking-for-collaborative-cross |
Repo | |
Framework | |
Lifted Weight Learning of Markov Logic Networks Revisited
Title | Lifted Weight Learning of Markov Logic Networks Revisited |
Authors | Ondrej Kuzelka, Vyacheslav Kungurtsev |
Abstract | We study lifted weight learning of Markov logic networks. We show that there is an algorithm for maximum-likelihood learning of 2-variable Markov logic networks which runs in time polynomial in the domain size. Our results are based on existing lifted-inference algorithms and recent algorithmic results on computing maximum entropy distributions. |
Tasks | |
Published | 2019-03-07 |
URL | http://arxiv.org/abs/1903.03099v1 |
http://arxiv.org/pdf/1903.03099v1.pdf | |
PWC | https://paperswithcode.com/paper/lifted-weight-learning-of-markov-logic |
Repo | |
Framework | |
Scene Memory Transformer for Embodied Agents in Long-Horizon Tasks
Title | Scene Memory Transformer for Embodied Agents in Long-Horizon Tasks |
Authors | Kuan Fang, Alexander Toshev, Li Fei-Fei, Silvio Savarese |
Abstract | Many robotic applications require the agent to perform long-horizon tasks in partially observable environments. In such applications, decision making at any step can depend on observations received far in the past. Hence, being able to properly memorize and utilize the long-term history is crucial. In this work, we propose a novel memory-based policy, named Scene Memory Transformer (SMT). The proposed policy embeds and adds each observation to a memory and uses the attention mechanism to exploit spatio-temporal dependencies. This model is generic and can be efficiently trained with reinforcement learning over long episodes. On a range of visual navigation tasks, SMT demonstrates superior performance to existing reactive and memory-based policies by a margin. |
Tasks | Decision Making, Visual Navigation |
Published | 2019-03-09 |
URL | http://arxiv.org/abs/1903.03878v1 |
http://arxiv.org/pdf/1903.03878v1.pdf | |
PWC | https://paperswithcode.com/paper/scene-memory-transformer-for-embodied-agents |
Repo | |
Framework | |
CMTS: Conditional Multiple Trajectory Synthesizer for Generating Safety-critical Driving Scenarios
Title | CMTS: Conditional Multiple Trajectory Synthesizer for Generating Safety-critical Driving Scenarios |
Authors | Wenhao Ding, Mengdi Xu, Ding Zhao |
Abstract | Naturalistic driving trajectories are crucial for the performance of autonomous driving algorithms. However, most of the data is collected in safe scenarios leading to the duplication of trajectories which are easily handled by currently developed algorithms. When considering safety, testing algorithms in near-miss scenarios that rarely show up in off-the-shelf datasets is a vital part of the evaluation. As a remedy, we propose a near-miss data synthesizing framework based on Variational Bayesian methods and term it the Conditional Multiple Trajectory Synthesizer (CMTS). We leverage a generative model conditioned on road maps to bridge safe and collision driving data by representing their distribution in the latent space. By sampling from the near-miss distribution, we can synthesize safety-critical data crucial for understanding traffic scenarios but shown in neither the original dataset nor the collision dataset. Our experimental results demonstrate that the augmented dataset covers more kinds of driving scenarios, especially the near-miss ones, which help improve the trajectory prediction accuracy and the capability of dealing with risky driving scenarios. |
Tasks | Autonomous Driving, Trajectory Prediction |
Published | 2019-09-17 |
URL | https://arxiv.org/abs/1910.00099v2 |
https://arxiv.org/pdf/1910.00099v2.pdf | |
PWC | https://paperswithcode.com/paper/cmts-conditional-multiple-trajectory |
Repo | |
Framework | |
Phrase-Level Class based Language Model for Mandarin Smart Speaker Query Recognition
Title | Phrase-Level Class based Language Model for Mandarin Smart Speaker Query Recognition |
Authors | Yiheng Huang, Liqiang He, Lei Han, Guangsen Wang, Dan Su |
Abstract | The success of speech assistants requires precise recognition of a number of entities in particular contexts. A common solution is to train a class-based n-gram language model and then expand the classes into specific words or phrases. However, when the class has a huge list, e.g., more than 20 million songs, a full expansion will cause memory explosion. Worse still, the list items in the class need to be updated frequently, which requires a dynamic model updating technique. In this work, we propose to train pruned language models for the word classes to replace the slots in the root n-gram. We further propose to use a novel technique, named Difference Language Model (DLM), to correct the bias from the pruned language models. Once the decoding graph is built, we only need to recalculate the DLM when the entities in word classes are updated. Results show that the proposed method consistently and significantly outperforms the conventional approaches on all datasets, especially for large lists, which the conventional approaches cannot handle. |
Tasks | Language Modelling |
Published | 2019-09-02 |
URL | https://arxiv.org/abs/1909.00556v1 |
https://arxiv.org/pdf/1909.00556v1.pdf | |
PWC | https://paperswithcode.com/paper/phrase-level-class-based-language-model-for |
Repo | |
Framework | |
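The "common solution" described in the abstract above, a class-based n-gram whose class slots are expanded into concrete entries, rests on the usual factorization P(word | history) = P(class | history) × P(word | class). The sketch below illustrates only that textbook factorization on a toy bigram; it is not the paper's pruned-model or Difference Language Model technique. All names and probabilities (`root_lm`, `song_lm`, `$SONG`, `expanded_prob`) are hypothetical.

```python
# Root bigram over words and class slots: P(next token | history).
root_lm = {("play",): {"$SONG": 0.6, "music": 0.4}}

# In-class distribution: P(entry | class). A real list could hold millions of entries.
song_lm = {"yesterday": 0.25, "imagine": 0.75}


def expanded_prob(history, word, root, class_name, class_lm):
    """P(word | history) when `word` is reached through the class slot."""
    p_class = root[history].get(class_name, 0.0)
    p_word_in_class = class_lm.get(word, 0.0)
    return p_class * p_word_in_class


p = expanded_prob(("play",), "yesterday", root_lm, "$SONG", song_lm)
print(p)
```

Expanding the slot ahead of time means materializing one such product per class entry for every history that can reach the slot, which is the memory growth the abstract refers to.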
Complexer-YOLO: Real-Time 3D Object Detection and Tracking on Semantic Point Clouds
Title | Complexer-YOLO: Real-Time 3D Object Detection and Tracking on Semantic Point Clouds |
Authors | Martin Simon, Karl Amende, Andrea Kraus, Jens Honer, Timo Sämann, Hauke Kaulbersch, Stefan Milz, Horst Michael Gross |
Abstract | Accurate detection of 3D objects is a fundamental problem in computer vision and has an enormous impact on autonomous cars, augmented/virtual reality and many applications in robotics. In this work we present a novel fusion of neural network based state-of-the-art 3D detector and visual semantic segmentation in the context of autonomous driving. Additionally, we introduce Scale-Rotation-Translation score (SRTs), a fast and highly parameterizable evaluation metric for comparison of object detections, which speeds up our inference time by up to 20% and halves training time. On top of this, we apply state-of-the-art online multi target feature tracking on the object measurements to further increase accuracy and robustness utilizing temporal information. Our experiments on KITTI show that we achieve the same results as the state of the art in all related categories, while maintaining the performance and accuracy trade-off and still running in real-time. Furthermore, our model is the first one that fuses visual semantics with 3D object detection. |
Tasks | 3D Object Detection, Autonomous Driving, Object Detection, Semantic Segmentation |
Published | 2019-04-16 |
URL | http://arxiv.org/abs/1904.07537v1 |
http://arxiv.org/pdf/1904.07537v1.pdf | |
PWC | https://paperswithcode.com/paper/complexer-yolo-real-time-3d-object-detection |
Repo | |
Framework | |
Flexibly Fair Representation Learning by Disentanglement
Title | Flexibly Fair Representation Learning by Disentanglement |
Authors | Elliot Creager, David Madras, Jörn-Henrik Jacobsen, Marissa A. Weis, Kevin Swersky, Toniann Pitassi, Richard Zemel |
Abstract | We consider the problem of learning representations that achieve group and subgroup fairness with respect to multiple sensitive attributes. Taking inspiration from the disentangled representation learning literature, we propose an algorithm for learning compact representations of datasets that are useful for reconstruction and prediction, but are also “flexibly fair”, meaning they can be easily modified at test time to achieve subgroup demographic parity with respect to multiple sensitive attributes and their conjunctions. We show empirically that the resulting encoder, which does not require the sensitive attributes for inference, enables the adaptation of a single representation to a variety of fair classification tasks with new target labels and subgroup definitions. |
Tasks | Representation Learning |
Published | 2019-06-06 |
URL | https://arxiv.org/abs/1906.02589v1 |
https://arxiv.org/pdf/1906.02589v1.pdf | |
PWC | https://paperswithcode.com/paper/flexibly-fair-representation-learning-by |
Repo | |
Framework | |
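Subgroup demographic parity, the criterion mentioned in the abstract above, asks that the rate of positive predictions be equal across groups. A minimal sketch of how the gap might be measured follows. It uses the common definition (largest minus smallest positive-prediction rate across groups) and is not taken from the paper; the function names and the toy data are assumptions. Subgroups defined by conjunctions of attributes are represented here as tuples.

```python
def positive_rate(preds, groups, group):
    """Fraction of positive (1) predictions among members of `group`."""
    selected = [p for p, g in zip(preds, groups) if g == group]
    if not selected:
        raise ValueError("group has no members")
    return sum(selected) / len(selected)


def demographic_parity_gap(preds, groups):
    """Largest difference in positive-prediction rate between any two groups."""
    rates = [positive_rate(preds, groups, g) for g in set(groups)]
    return max(rates) - min(rates)


# Hypothetical binary predictions, with subgroups given as attribute conjunctions.
preds = [1, 0, 1, 1]
groups = [("a", "x"), ("a", "x"), ("b", "y"), ("b", "y")]
gap = demographic_parity_gap(preds, groups)
print(gap)
```

A gap of 0 means every subgroup receives positive predictions at the same rate; larger values indicate a larger violation of parity.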
Patient trajectory prediction in the Mimic-III dataset, challenges and pitfalls
Title | Patient trajectory prediction in the Mimic-III dataset, challenges and pitfalls |
Authors | Jose F Rodrigues-Jr, Gabriel Spadon, Bruno Brandoli, Sihem Amer-Yahia |
Abstract | Automated medical prognosis has gained interest as artificial intelligence evolves and the potential for computer-aided medicine becomes evident. Nevertheless, it is challenging to design an effective system that, given a patient’s medical history, is able to predict probable future conditions. Previous works, mostly carried out over private datasets, have tackled the problem by using artificial neural network architectures that cannot deal with low-cardinality datasets, or by means of non-generalizable inference approaches. We introduce a Deep Learning architecture whose design results from an intensive experimental process. The final architecture is based on two parallel Minimal Gated Recurrent Unit networks working in bi-directional manner, which was extensively tested with the open-access Mimic-III dataset. Our results demonstrate significant improvements in automated medical prognosis, as measured with Recall@k. We summarize our experience as a set of relevant insights for the design of Deep Learning architectures. Our work improves the performance of computer-aided medicine and can serve as a guide in designing artificial neural networks used in prediction tasks. |
Tasks | Trajectory Prediction |
Published | 2019-09-10 |
URL | https://arxiv.org/abs/1909.04605v4 |
https://arxiv.org/pdf/1909.04605v4.pdf | |
PWC | https://paperswithcode.com/paper/patient-trajectory-prediction-in-the-mimic |
Repo | |
Framework | |
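The Recall@k metric named in the abstract above can be sketched as follows. This uses a common definition, the share of truly relevant items that appear among the top k ranked predictions; the paper's exact formulation may differ. The function name `recall_at_k` and the example codes are hypothetical.

```python
def recall_at_k(ranked, relevant, k):
    """Share of `relevant` items found in the first k entries of `ranked`."""
    relevant = set(relevant)
    if not relevant:
        raise ValueError("relevant set must not be empty")
    top_k = set(ranked[:k])
    return len(top_k & relevant) / len(relevant)


# Hypothetical ranked predictions for one patient's next visit, best first.
ranked = ["401.9", "250.00", "428.0", "585.9"]
relevant = {"401.9", "585.9"}

r2 = recall_at_k(ranked, relevant, 2)
r4 = recall_at_k(ranked, relevant, 4)
print(r2, r4)
```

Increasing k can only keep the score the same or raise it, so scores are comparable only at the same k.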