Paper Group ANR 89
Papers in this group:
Learning spectro-temporal features with 3D CNNs for speech emotion recognition
Evaluation of Deep Learning on an Abstract Image Classification Dataset
Unsupervised neural and Bayesian models for zero-resource speech processing
No Need for a Lexicon? Evaluating the Value of the Pronunciation Lexica in End-to-End Models
Improved Lossy Image Compression with Priming and Spatially Adaptive Bit Rates for Recurrent Networks
Restricted Eigenvalue from Stable Rank with Applications to Sparse Linear Regression
Providing Self-Aware Systems with Reflexivity
Mining Smart Card Data for Travelers’ Mini Activities
Robust Optimization of Unconstrained Binary Quadratic Problems
Relevance-based Word Embedding
Deep Speaker Verification: Do We Need End to End?
Perceiving and Reasoning About Liquids Using Fully Convolutional Networks
An Investigation of Newton-Sketch and Subsampled Newton Methods
Discovering objects and their relations from entangled scene representations
Attentive Memory Networks: Efficient Machine Reading for Conversational Search
Learning spectro-temporal features with 3D CNNs for speech emotion recognition
Title | Learning spectro-temporal features with 3D CNNs for speech emotion recognition |
Authors | Jaebok Kim, Khiet P. Truong, Gwenn Englebienne, Vanessa Evers |
Abstract | In this paper, we propose to use deep 3-dimensional convolutional networks (3D CNNs) in order to address the challenge of modelling spectro-temporal dynamics for speech emotion recognition (SER). Compared to a hybrid of a Convolutional Neural Network and Long Short-Term Memory (CNN-LSTM), our proposed 3D CNNs simultaneously extract short-term and long-term spectral features with a moderate number of parameters. We evaluated our proposed and other state-of-the-art methods in a speaker-independent manner using aggregated corpora that give a large and diverse set of speakers. We found that 1) shallow temporal and moderately deep spectral kernels of a homogeneous architecture are optimal for the task; and 2) our 3D CNNs are more effective for spectro-temporal feature learning compared to other methods. Finally, we visualised the feature space obtained with our proposed method using t-distributed stochastic neighbour embedding (t-SNE) and could observe distinct clusters of emotions. |
Tasks | Emotion Recognition, Speech Emotion Recognition |
Published | 2017-08-14 |
URL | http://arxiv.org/abs/1708.05071v1 |
PDF | http://arxiv.org/pdf/1708.05071v1.pdf |
PWC | https://paperswithcode.com/paper/learning-spectro-temporal-features-with-3d |
Repo | |
Framework | |
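The paper's core claim is that a single 3D convolution stack can capture both short- and long-term spectro-temporal dynamics with a moderate parameter count. Below is a minimal PyTorch sketch of that idea; the kernel sizes, channel widths, and input layout are illustrative assumptions rather than the authors' exact configuration, though the kernels keep a small temporal extent and a larger spectral extent, in line with the paper's finding.

```python
import torch
import torch.nn as nn

class SER3DCNN(nn.Module):
    # Input: (batch, 1, time_steps, mel_bins, context_frames) -- an
    # illustrative layout; the paper's exact input shape may differ.
    def __init__(self, n_emotions=4):
        super().__init__()
        self.features = nn.Sequential(
            # Shallow temporal extent (3) vs. deeper spectral extent (5),
            # echoing the paper's finding; exact sizes are assumptions.
            nn.Conv3d(1, 16, kernel_size=(3, 5, 5), padding=(1, 2, 2)),
            nn.ReLU(),
            nn.MaxPool3d((1, 2, 2)),
            nn.Conv3d(16, 32, kernel_size=(3, 5, 5), padding=(1, 2, 2)),
            nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),
        )
        self.classifier = nn.Linear(32, n_emotions)

    def forward(self, x):
        h = self.features(x).flatten(1)
        return self.classifier(h)

model = SER3DCNN()
dummy = torch.randn(8, 1, 32, 40, 20)  # batch of log-mel cubes
print(model(dummy).shape)              # torch.Size([8, 4])
```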
Evaluation of Deep Learning on an Abstract Image Classification Dataset
Title | Evaluation of Deep Learning on an Abstract Image Classification Dataset |
Authors | Sebastian Stabinger, Antonio Rodriguez-Sanchez |
Abstract | Convolutional Neural Networks have become state-of-the-art methods for image classification over the last couple of years. By now they perform better than human subjects on many image classification datasets. Most of these datasets are based on the notion of concrete classes (i.e. images are classified by the type of object in the image). In this paper we present a novel image classification dataset, using abstract classes, which should be easy to solve for humans, but variations of it are challenging for CNNs. The classification performance of popular CNN architectures is evaluated on this dataset and variations of the dataset that might be interesting for further research are identified. |
Tasks | Image Classification |
Published | 2017-08-25 |
URL | http://arxiv.org/abs/1708.07770v1 |
PDF | http://arxiv.org/pdf/1708.07770v1.pdf |
PWC | https://paperswithcode.com/paper/evaluation-of-deep-learning-on-an-abstract |
Repo | |
Framework | |
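For intuition, here is a toy generator in the spirit of an abstract class: membership depends on a relation between objects (equal size or not) rather than on what the objects are. The specific class pair is hypothetical and not necessarily one of the dataset's actual classes.

```python
import numpy as np

def make_example(abstract_class, size=32, rng=None):
    """Toy abstract-class image: two square blobs that are either the
    same size (class 0) or different sizes (class 1). A hypothetical
    class pair illustrating the abstract-vs-concrete distinction; blobs
    may occasionally overlap, which a real generator would avoid."""
    rng = rng or np.random.default_rng()
    img = np.zeros((size, size), dtype=np.float32)
    s1 = rng.integers(3, 7)
    s2 = s1 if abstract_class == 0 else s1 + rng.integers(2, 5)
    for s in (s1, s2):
        y, x = rng.integers(0, size - s, size=2)
        img[y:y + s, x:x + s] = 1.0
    return img

X = np.stack([make_example(c % 2) for c in range(8)])
y = np.arange(8) % 2
print(X.shape, y)
```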
Unsupervised neural and Bayesian models for zero-resource speech processing
Title | Unsupervised neural and Bayesian models for zero-resource speech processing |
Authors | Herman Kamper |
Abstract | In settings where only unlabelled speech data is available, zero-resource speech technology needs to be developed without transcriptions, pronunciation dictionaries, or language modelling text. There are two central problems in zero-resource speech processing: (i) finding frame-level feature representations which make it easier to discriminate between linguistic units (phones or words), and (ii) segmenting and clustering unlabelled speech into meaningful units. In this thesis, we argue that a combination of top-down and bottom-up modelling is advantageous in tackling these two problems. To address the problem of frame-level representation learning, we present the correspondence autoencoder (cAE), a neural network trained with weak top-down supervision from an unsupervised term discovery system. By combining this top-down supervision with unsupervised bottom-up initialization, the cAE yields much more discriminative features than previous approaches. We then present our unsupervised segmental Bayesian model that segments and clusters unlabelled speech into hypothesized words. By imposing a consistent top-down segmentation while also using bottom-up knowledge from detected syllable boundaries, our system outperforms several others on multi-speaker conversational English and Xitsonga speech data. Finally, we show that the clusters discovered by the segmental Bayesian model can be made less speaker- and gender-specific by using features from the cAE instead of traditional acoustic features. In summary, the different models and systems presented in this thesis show that both top-down and bottom-up modelling can improve representation learning, segmentation and clustering of unlabelled speech data. |
Tasks | Language Modelling, Representation Learning |
Published | 2017-01-03 |
URL | http://arxiv.org/abs/1701.00851v1 |
PDF | http://arxiv.org/pdf/1701.00851v1.pdf |
PWC | https://paperswithcode.com/paper/unsupervised-neural-and-bayesian-models-for |
Repo | |
Framework | |
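The cAE idea is compact enough to sketch: rather than reconstructing its own input, the network maps a frame from one occurrence of a discovered word onto the DTW-aligned frame of the matched occurrence, and the hidden layer is then used as the feature extractor. Layer sizes, the 39-dimensional input, and the synthetic frame pairs below are assumptions.

```python
import torch
import torch.nn as nn

class CorrespondenceAE(nn.Module):
    """Correspondence autoencoder sketch: weak top-down supervision
    comes from aligned frame pairs found by unsupervised term
    discovery, not from the input frame itself."""
    def __init__(self, dim=39, hidden=100):
        super().__init__()
        self.encode = nn.Sequential(nn.Linear(dim, hidden), nn.Tanh())
        self.decode = nn.Linear(hidden, dim)

    def forward(self, x):
        return self.decode(self.encode(x))

cae = CorrespondenceAE()
opt = torch.optim.Adam(cae.parameters(), lr=1e-3)
frames_a = torch.randn(256, 39)  # frames from one term occurrence
frames_b = torch.randn(256, 39)  # DTW-aligned frames from its pair
for _ in range(10):
    opt.zero_grad()
    loss = nn.functional.mse_loss(cae(frames_a), frames_b)
    loss.backward()
    opt.step()
with torch.no_grad():
    features = cae.encode(frames_a)  # hidden layer = learned features
```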
No Need for a Lexicon? Evaluating the Value of the Pronunciation Lexica in End-to-End Models
Title | No Need for a Lexicon? Evaluating the Value of the Pronunciation Lexica in End-to-End Models |
Authors | Tara N. Sainath, Rohit Prabhavalkar, Shankar Kumar, Seungji Lee, Anjuli Kannan, David Rybach, Vlad Schogol, Patrick Nguyen, Bo Li, Yonghui Wu, Zhifeng Chen, Chung-Cheng Chiu |
Abstract | For decades, context-dependent phonemes have been the dominant sub-word unit for conventional acoustic modeling systems. This status quo has begun to be challenged recently by end-to-end models which seek to combine acoustic, pronunciation, and language model components into a single neural network. Such systems, which typically predict graphemes or words, simplify the recognition process since they remove the need for a separate expert-curated pronunciation lexicon to map from phoneme-based units to words. However, there has been little previous work comparing phoneme-based versus grapheme-based sub-word units in the end-to-end modeling framework, to determine whether the gains from such approaches are primarily due to the new probabilistic model, or from the joint learning of the various components with grapheme-based units. In this work, we conduct detailed experiments which are aimed at quantifying the value of phoneme-based pronunciation lexica in the context of end-to-end models. We examine phoneme-based end-to-end models, which are contrasted against grapheme-based ones on a large vocabulary English Voice Search task, where we find that graphemes do indeed outperform phonemes. We also compare grapheme- and phoneme-based approaches on a multi-dialect English task, which once again confirms the superiority of graphemes, greatly simplifying the system for recognizing multiple dialects. |
Tasks | Language Modelling |
Published | 2017-12-05 |
URL | http://arxiv.org/abs/1712.01864v1 |
PDF | http://arxiv.org/pdf/1712.01864v1.pdf |
PWC | https://paperswithcode.com/paper/no-need-for-a-lexicon-evaluating-the-value-of |
Repo | |
Framework | |
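The practical difference between the two target inventories can be shown in a few lines: grapheme targets come straight from the spelling, while phoneme targets require a lexicon lookup. The phoneme string below is an illustrative ARPABET-style rendering, not taken from the paper.

```python
# Grapheme vs. phoneme targets for an end-to-end model: the grapheme
# system predicts characters directly, removing the need for an
# expert-curated lexicon mapping words to phoneme strings.
word = "search"
grapheme_targets = list(word)        # ['s', 'e', 'a', 'r', 'c', 'h']
phoneme_targets = ["S", "ER", "CH"]  # needs a pronunciation lexicon entry
print(grapheme_targets, phoneme_targets)
```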
Improved Lossy Image Compression with Priming and Spatially Adaptive Bit Rates for Recurrent Networks
Title | Improved Lossy Image Compression with Priming and Spatially Adaptive Bit Rates for Recurrent Networks |
Authors | Nick Johnston, Damien Vincent, David Minnen, Michele Covell, Saurabh Singh, Troy Chinen, Sung Jin Hwang, Joel Shor, George Toderici |
Abstract | We propose a method for lossy image compression based on recurrent, convolutional neural networks that outperforms BPG (4:2:0), WebP, JPEG2000, and JPEG as measured by MS-SSIM. We introduce three improvements over previous research that lead to this state-of-the-art result. First, we show that training with a pixel-wise loss weighted by SSIM increases reconstruction quality according to several metrics. Second, we modify the recurrent architecture to improve spatial diffusion, which allows the network to more effectively capture and propagate image information through the network’s hidden state. Finally, in addition to lossless entropy coding, we use a spatially adaptive bit allocation algorithm to more efficiently use the limited number of bits to encode visually complex image regions. We evaluate our method on the Kodak and Tecnick image sets and compare against standard codecs as well as recently published methods based on deep neural networks. |
Tasks | Image Compression |
Published | 2017-03-29 |
URL | http://arxiv.org/abs/1703.10114v1 |
PDF | http://arxiv.org/pdf/1703.10114v1.pdf |
PWC | https://paperswithcode.com/paper/improved-lossy-image-compression-with-priming |
Repo | |
Framework | |
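The first improvement, an SSIM-weighted pixel loss, admits a simple sketch: compute a local SSIM map between reconstruction and target and use it to up-weight poorly reconstructed patches. This is one plausible reading of the idea, using a pooled rather than Gaussian-windowed SSIM, and is not necessarily the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def local_ssim(x, y, window=8, c1=0.01**2, c2=0.03**2):
    """Coarse per-patch SSIM map via average pooling (a simplification
    of the usual Gaussian-window SSIM)."""
    mu_x, mu_y = F.avg_pool2d(x, window), F.avg_pool2d(y, window)
    var_x = F.avg_pool2d(x * x, window) - mu_x ** 2
    var_y = F.avg_pool2d(y * y, window) - mu_y ** 2
    cov = F.avg_pool2d(x * y, window) - mu_x * mu_y
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))

def ssim_weighted_l1(recon, target, window=8):
    # Up-weight patches the model currently reconstructs poorly
    # (low SSIM); weights are detached so only the L1 term trains.
    w = (1 - local_ssim(recon, target, window)).clamp(min=0).detach()
    w = F.interpolate(w, size=recon.shape[-2:], mode="nearest")
    return (w * (recon - target).abs()).mean()

recon, target = torch.rand(2, 3, 64, 64), torch.rand(2, 3, 64, 64)
print(ssim_weighted_l1(recon, target).item())
```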
Restricted Eigenvalue from Stable Rank with Applications to Sparse Linear Regression
Title | Restricted Eigenvalue from Stable Rank with Applications to Sparse Linear Regression |
Authors | Shiva Prasad Kasiviswanathan, Mark Rudelson |
Abstract | High-dimensional settings, where the data dimension ($d$) far exceeds the number of observations ($n$), are common in many statistical and machine learning applications. Methods based on $\ell_1$-relaxation, such as Lasso, are very popular for sparse recovery in these settings. The Restricted Eigenvalue (RE) condition is among the weakest, and hence most general, conditions in the literature imposed on the Gram matrix that guarantees nice statistical properties for the Lasso estimator. It is natural to ask: what families of matrices satisfy the RE condition? Following a line of work in this area, we construct a new broad ensemble of dependent random design matrices that have an explicit RE bound. Our construction starts with a fixed (deterministic) matrix $X \in \mathbb{R}^{n \times d}$ satisfying a simple stable rank condition, and we show that a matrix drawn from the distribution $X \Phi^\top \Phi$, where $\Phi \in \mathbb{R}^{m \times d}$ is a subgaussian random matrix, with high probability, satisfies the RE condition. This construction allows incorporating a fixed matrix that has an easily {\em verifiable} condition into the design process, and allows for generation of {\em compressed} design matrices that have a lower storage requirement than a standard design matrix. We give two applications of this construction to sparse linear regression problems, including one to a compressed sparse regression setting where the regression algorithm only has access to a compressed representation of a fixed design matrix $X$. |
Tasks | |
Published | 2017-07-25 |
URL | http://arxiv.org/abs/1707.08092v4 |
PDF | http://arxiv.org/pdf/1707.08092v4.pdf |
PWC | https://paperswithcode.com/paper/restricted-eigenvalue-from-stable-rank-with |
Repo | |
Framework | |
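A few lines of NumPy make the construction concrete: start from a fixed $X$, draw a subgaussian $\Phi$, and form the ensemble $X \Phi^\top \Phi$; the compressed representation $X \Phi^\top$ needs only $n \times m$ storage. The dimensions and the Gaussian choice of $\Phi$ below are illustrative; the paper's contribution is the guarantee that, given the stable rank condition on $X$, the RE condition then holds with high probability.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 50, 200, 40

X = rng.standard_normal((n, d))   # fixed design; in the paper it must
                                  # satisfy a stable-rank condition
Phi = rng.standard_normal((m, d)) / np.sqrt(m)  # subgaussian sketch

design = X @ Phi.T @ Phi          # the random ensemble X Phi^T Phi
compressed = X @ Phi.T            # n x m storage instead of n x d

# Stable rank of X: ||X||_F^2 / ||X||_2^2
sv = np.linalg.svd(X, compute_uv=False)
print("stable rank:", (sv ** 2).sum() / sv[0] ** 2)
print(design.shape, compressed.shape)
```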
Providing Self-Aware Systems with Reflexivity
Title | Providing Self-Aware Systems with Reflexivity |
Authors | Alessandro Valitutti, Giuseppe Trautteur |
Abstract | We propose a new type of self-aware system inspired by ideas from higher-order theories of consciousness. First, we discuss the crucial distinction between introspection and reflexion. Then, we focus on computational reflexion as a mechanism by which a computer program can inspect its own code at every stage of the computation. Finally, we provide a formal definition and a proof-of-concept implementation of computational reflexion, viewed as an enriched form of program interpretation and a way to dynamically “augment” a computational process. |
Tasks | |
Published | 2017-07-27 |
URL | http://arxiv.org/abs/1707.08901v1 |
PDF | http://arxiv.org/pdf/1707.08901v1.pdf |
PWC | https://paperswithcode.com/paper/providing-self-aware-systems-with-reflexivity |
Repo | |
Framework | |
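Python's `sys.settrace` hook gives a minimal concrete analogue of a program inspecting its own execution at every step. This is an illustration of the notion only, not the authors' proof-of-concept implementation.

```python
import sys

def reflexive_trace(frame, event, arg):
    """Called at every interpreter event: the running program observes
    its own execution state -- a crude analogue of reflexion."""
    if event == "line":
        print(f"executing {frame.f_code.co_name}:{frame.f_lineno}, "
              f"locals={frame.f_locals}")
    return reflexive_trace  # keep tracing within this frame

def compute(n):
    total = 0
    for i in range(n):
        total += i
    return total

sys.settrace(reflexive_trace)
compute(3)
sys.settrace(None)  # switch reflexion off again
```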
Mining Smart Card Data for Travelers’ Mini Activities
Title | Mining Smart Card Data for Travelers’ Mini Activities |
Authors | Boris Chidlovskii |
Abstract | In the context of public transport modeling and simulation, we address the problem of mismatch between simulated transit trips and observed ones. We point to the weakness of the current travel demand modeling process; the trips it generates are over-optimistic and do not reflect the real passenger choices. We introduce the notion of mini activities the travelers do during the trips; they can explain the deviation of simulated trips from the observed trips. We propose to mine the smart card data to extract the mini activities. We develop a technique to integrate them in the generated trips and learn such an integration from two available sources, the trip history and trip planner recommendations. For an input travel demand, we build a Markov chain over the trip collection and apply the Monte Carlo Markov Chain algorithm to integrate mini activities in such a way that the selected characteristics converge to the desired distributions. We test our method in different settings on the passenger trip collection of Nancy, France. We report experimental results demonstrating a substantial mismatch reduction. |
Tasks | |
Published | 2017-12-19 |
URL | http://arxiv.org/abs/1712.06935v1 |
PDF | http://arxiv.org/pdf/1712.06935v1.pdf |
PWC | https://paperswithcode.com/paper/mining-smart-card-data-for-travelers-mini |
Repo | |
Framework | |
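A toy Metropolis-style loop shows the mechanics: propose toggling a mini activity on a random trip and accept or reject so that a chosen statistic drifts toward its target. The single matched statistic, the temperature, and the `coffee_stop` activity are all stand-ins; the paper matches several characteristics learned from trip history and planner recommendations.

```python
import math
import random

random.seed(0)
trips = [{"duration": random.uniform(10, 30), "activity": None}
         for _ in range(200)]
TARGET_SHARE = 0.4  # assumed desired share of trips with a mini activity
TEMP = 0.01

def energy(ts):
    # Distance between current and desired statistic; the real model
    # matches several characteristics, this toy matches one.
    share = sum(t["activity"] is not None for t in ts) / len(ts)
    return abs(share - TARGET_SHARE)

e = energy(trips)
for _ in range(5000):
    t = random.choice(trips)                # propose toggling one trip
    old = t["activity"]
    t["activity"] = None if old else "coffee_stop"
    e_new = energy(trips)
    if e_new <= e or random.random() < math.exp((e - e_new) / TEMP):
        e = e_new                           # accept the move
    else:
        t["activity"] = old                 # reject and revert

share = sum(t["activity"] is not None for t in trips) / len(trips)
print("final share of trips with a mini activity:", share)
```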
Robust Optimization of Unconstrained Binary Quadratic Problems
Title | Robust Optimization of Unconstrained Binary Quadratic Problems |
Authors | Mark Lewis, Gary Kochenberger, John Metcalfe |
Abstract | In this paper we focus on the unconstrained binary quadratic optimization model, maximize $x^\top Q x$ with $x$ binary, and consider the problem of identifying optimal solutions that are robust with respect to perturbations in the $Q$ matrix. We are motivated to find robust, or stable, solutions because of the uncertainty inherent in the big data origins of $Q$ and limitations in computer numerical precision, particularly in a new class of quantum annealing computers. Experimental design techniques are used to generate a diverse subset of possible scenarios, from which robust solutions are identified. An illustrative example with practical application to business decision making is examined. The approach presented also generates a surface response equation which is used to estimate upper bounds in constant time for $Q$ instantiations within the scenario extremes. In addition, a theoretical framework for the robustness of individual $x_i$ variables is considered by examining the range of $Q$ values over which the $x_i$ are predetermined. |
Tasks | Decision Making |
Published | 2017-09-21 |
URL | http://arxiv.org/abs/1709.07511v1 |
PDF | http://arxiv.org/pdf/1709.07511v1.pdf |
PWC | https://paperswithcode.com/paper/robust-optimization-of-unconstrained-binary |
Repo | |
Framework | |
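A brute-force sketch makes the robustness question concrete for a tiny instance: solve the nominal problem, then re-solve under perturbed $Q$ scenarios and count how often the nominal optimum survives. Random perturbations stand in here for the paper's experimental-design scenarios.

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
n = 8
Q = rng.integers(-5, 6, size=(n, n)).astype(float)
Q = (Q + Q.T) / 2                     # symmetrize the nominal matrix

def best_x(Qm):
    """Exhaustive UBQP solve (fine for tiny n): maximize x^T Q x."""
    return max((np.array(x) for x in itertools.product([0, 1], repeat=n)),
               key=lambda x: x @ Qm @ x)

x_star = best_x(Q)
# Count how often x_star stays optimal under perturbed scenarios.
stable = sum(
    np.array_equal(x_star, best_x(Q + rng.normal(0, 0.5, (n, n))))
    for _ in range(20))
print(f"x* = {x_star}, optimal in {stable}/20 perturbed scenarios")
```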
Relevance-based Word Embedding
Title | Relevance-based Word Embedding |
Authors | Hamed Zamani, W. Bruce Croft |
Abstract | Learning a high-dimensional dense representation for vocabulary terms, also known as a word embedding, has recently attracted much attention in natural language processing and information retrieval tasks. The embedding vectors are typically learned based on term proximity in a large corpus. This means that the objective in well-known word embedding algorithms, e.g., word2vec, is to accurately predict adjacent word(s) for a given word or context. However, this objective is not necessarily equivalent to the goal of many information retrieval (IR) tasks. The primary objective in various IR tasks is to capture relevance instead of term proximity, syntactic, or even semantic similarity. This is the motivation for developing unsupervised relevance-based word embedding models that learn word representations based on query-document relevance information. In this paper, we propose two learning models with different objective functions; one learns a relevance distribution over the vocabulary set for each query, and the other classifies each term as belonging to the relevant or non-relevant class for each query. To train our models, we used over six million unique queries and the top ranked documents retrieved in response to each query, which are assumed to be relevant to the query. We extrinsically evaluate our learned word representation models using two IR tasks: query expansion and query classification. Both query expansion experiments on four TREC collections and query classification experiments on the KDD Cup 2005 dataset suggest that the relevance-based word embedding models significantly outperform state-of-the-art proximity-based embedding models, such as word2vec and GloVe. |
Tasks | Information Retrieval, Semantic Similarity, Semantic Textual Similarity |
Published | 2017-05-09 |
URL | http://arxiv.org/abs/1705.03556v2 |
PDF | http://arxiv.org/pdf/1705.03556v2.pdf |
PWC | https://paperswithcode.com/paper/relevance-based-word-embedding |
Repo | |
Framework | |
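A sketch of the first variant, learning a relevance distribution over the vocabulary per query, might look as follows: a query encoder scores every vocabulary term and is trained with a KL objective against the term distribution of (pseudo-)relevant documents. Dimensions, data, and the mean-pooled query encoder are assumptions.

```python
import torch
import torch.nn as nn

V, D = 1000, 64
query_emb = nn.Embedding(V, D)  # input space for query terms
word_emb = nn.Embedding(V, D)   # output ("relevance-based") embeddings

opt = torch.optim.Adam(list(query_emb.parameters()) +
                       list(word_emb.parameters()), lr=1e-2)

q_terms = torch.randint(0, V, (32, 3))          # batch of 3-term queries
target = torch.softmax(torch.randn(32, V), -1)  # toy relevance dists
                                                # (from relevant docs)
for _ in range(5):
    opt.zero_grad()
    q = query_emb(q_terms).mean(1)              # mean of query term vecs
    logits = q @ word_emb.weight.T              # score every vocab term
    loss = nn.functional.kl_div(
        torch.log_softmax(logits, -1), target, reduction="batchmean")
    loss.backward()
    opt.step()
```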
Deep Speaker Verification: Do We Need End to End?
Title | Deep Speaker Verification: Do We Need End to End? |
Authors | Dong Wang, Lantian Li, Zhiyuan Tang, Thomas Fang Zheng |
Abstract | End-to-end learning treats the entire system as a whole adaptable black box, which, if sufficient data are available, may learn a system that works very well for the target task. This principle has recently been applied in several prototype studies on speaker verification (SV), where the feature learning and classifier are learned together with an objective function that is consistent with the evaluation metric. An opposite approach to end-to-end is feature learning, which first trains a feature learning model, and then constructs a back-end classifier separately to perform SV. Recently, both approaches achieved significant performance gains on SV, mainly attributed to the smart utilization of deep neural networks. However, the two approaches have not been carefully compared, and their respective advantages have not been well discussed. In this paper, we compare the end-to-end and feature learning approaches on a text-independent SV task. Our experiments on a dataset sampled from the Fisher database and involving 5,000 speakers demonstrated that the feature learning approach outperformed the end-to-end approach. This is strong support for the feature learning approach, at least with data and computation resources similar to ours. |
Tasks | Speaker Verification |
Published | 2017-06-22 |
URL | http://arxiv.org/abs/1706.07859v1 |
PDF | http://arxiv.org/pdf/1706.07859v1.pdf |
PWC | https://paperswithcode.com/paper/deep-speaker-verification-do-we-need-end-to |
Repo | |
Framework | |
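The feature-learning route the paper favors separates the two stages: a frame-level network trained on its own, then a simple back-end for the verification trial. The sketch below uses an untrained stand-in network, plain frame averaging, and cosine scoring with an arbitrary threshold.

```python
import torch
import torch.nn as nn

# Stand-in for a frame-level feature network that would be trained
# separately (e.g., to discriminate speakers); sizes are assumptions.
feature_net = nn.Sequential(nn.Linear(40, 256), nn.ReLU(),
                            nn.Linear(256, 128))

def utterance_vector(frames):            # frames: (T, 40) fbank features
    return feature_net(frames).mean(0)   # average frame-level features

enroll = utterance_vector(torch.randn(300, 40))
test = utterance_vector(torch.randn(250, 40))
score = nn.functional.cosine_similarity(enroll, test, dim=0)
print("accept" if score > 0.5 else "reject", float(score))  # toy threshold
```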
Perceiving and Reasoning About Liquids Using Fully Convolutional Networks
Title | Perceiving and Reasoning About Liquids Using Fully Convolutional Networks |
Authors | Conor Schenck, Dieter Fox |
Abstract | Liquids are an important part of many common manipulation tasks in human environments. If we wish to have robots that can accomplish these types of tasks, they must be able to interact with liquids in an intelligent manner. In this paper, we investigate ways for robots to perceive and reason about liquids. That is, a robot asks the questions “What in the visual data stream is liquid?” and “How can I use that to infer all the potential places where liquid might be?” We collected two datasets to evaluate these questions, one using a realistic liquid simulator and another on our robot. We used fully convolutional neural networks to learn to detect and track liquids across pouring sequences. Our results show that these networks are able to perceive and reason about liquids, and that integrating temporal information is important to performing such tasks well. |
Tasks | |
Published | 2017-03-05 |
URL | http://arxiv.org/abs/1703.01564v2 |
PDF | http://arxiv.org/pdf/1703.01564v2.pdf |
PWC | https://paperswithcode.com/paper/perceiving-and-reasoning-about-liquids-using |
Repo | |
Framework | |
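A minimal fully convolutional sketch for per-pixel liquid detection follows: because every layer is convolutional, the output is a liquid-probability map at input resolution. Depth and widths are assumptions, and the paper additionally studies recurrent variants for the temporal integration it finds important.

```python
import torch
import torch.nn as nn

fcn = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 1, 1),   # 1x1 conv -> one liquid logit per pixel
)
frame = torch.randn(1, 3, 120, 160)      # one RGB frame
liquid_prob = torch.sigmoid(fcn(frame))  # per-pixel liquid probability
print(liquid_prob.shape)                 # torch.Size([1, 1, 120, 160])
```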
An Investigation of Newton-Sketch and Subsampled Newton Methods
Title | An Investigation of Newton-Sketch and Subsampled Newton Methods |
Authors | Albert S. Berahas, Raghu Bollapragada, Jorge Nocedal |
Abstract | Sketching, a dimensionality reduction technique, has received much attention in the statistics community. In this paper, we study sketching in the context of Newton’s method for solving finite-sum optimization problems in which the number of variables and data points are both large. We study two forms of sketching that perform dimensionality reduction in data space: Hessian subsampling and randomized Hadamard transformations. Each has its own advantages, and their relative tradeoffs have not been investigated in the optimization literature. Our study focuses on practical versions of the two methods in which the resulting linear systems of equations are solved approximately, at every iteration, using an iterative solver. The advantages of using the conjugate gradient method vs. a stochastic gradient iteration are revealed through a set of numerical experiments, and a complexity analysis of the Hessian subsampling method is presented. |
Tasks | Dimensionality Reduction |
Published | 2017-05-17 |
URL | https://arxiv.org/abs/1705.06211v4 |
PDF | https://arxiv.org/pdf/1705.06211v4.pdf |
PWC | https://paperswithcode.com/paper/an-investigation-of-newton-sketch-and |
Repo | |
Framework | |
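The Hessian-subsampling variant with an inexact CG solve is short to sketch for logistic regression: each iteration forms the Hessian on a small sample but uses the full gradient, then takes an approximate Newton step via a few CG iterations. Sample sizes, damping, and the absent step-size control are simplifications of what the paper studies.

```python
import numpy as np
from scipy.sparse.linalg import cg

rng = np.random.default_rng(0)
n, d = 5000, 20
A = rng.standard_normal((n, d))
y = np.sign(A @ rng.standard_normal(d) + 0.5 * rng.standard_normal(n))

def grad(w):
    # Full gradient of (1/n) sum_i log(1 + exp(-y_i a_i.w))
    s = 1 / (1 + np.exp(y * (A @ w)))
    return -(A.T @ (y * s)) / n

w = np.zeros(d)
for it in range(10):
    S = rng.choice(n, size=200, replace=False)   # Hessian subsample
    As = A[S]
    p = 1 / (1 + np.exp(-As @ w))
    H = (As.T * (p * (1 - p))) @ As / len(S) + 1e-8 * np.eye(d)
    step, _ = cg(H, grad(w), maxiter=10)         # inexact solve via CG
    w -= step                                    # unit step, no control

print("train accuracy:", np.mean(np.sign(A @ w) == y))
```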
Discovering objects and their relations from entangled scene representations
Title | Discovering objects and their relations from entangled scene representations |
Authors | David Raposo, Adam Santoro, David Barrett, Razvan Pascanu, Timothy Lillicrap, Peter Battaglia |
Abstract | Our world can be succinctly and compactly described as structured scenes of objects and relations. A typical room, for example, contains salient objects such as tables, chairs and books, and these objects typically relate to each other by their underlying causes and semantics. This gives rise to correlated features, such as position, function and shape. Humans exploit knowledge of objects and their relations for learning a wide spectrum of tasks, and more generally when learning the structure underlying observed data. In this work, we introduce relation networks (RNs) - a general purpose neural network architecture for object-relation reasoning. We show that RNs are capable of learning object relations from scene description data. Furthermore, we show that RNs can act as a bottleneck that induces the factorization of objects from entangled scene description inputs, and from distributed deep representations of scene images provided by a variational autoencoder. The model can also be used in conjunction with differentiable memory mechanisms for implicit relation discovery in one-shot learning tasks. Our results suggest that relation networks are a potentially powerful architecture for solving a variety of problems that require object relation reasoning. |
Tasks | One-Shot Learning |
Published | 2017-02-16 |
URL | http://arxiv.org/abs/1702.05068v1 |
PDF | http://arxiv.org/pdf/1702.05068v1.pdf |
PWC | https://paperswithcode.com/paper/discovering-objects-and-their-relations-from |
Repo | |
Framework | |
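The RN's general form, $\mathrm{RN}(O) = f_\phi\big(\sum_{i,j} g_\theta(o_i, o_j)\big)$, translates directly into code: a shared MLP $g_\theta$ scores every ordered object pair, the results are summed, and $f_\phi$ maps the aggregate to the output. Layer sizes below are illustrative assumptions.

```python
import torch
import torch.nn as nn

class RelationNetwork(nn.Module):
    """RN(O) = f_phi( sum_{i,j} g_theta(o_i, o_j) ): a shared pairwise
    MLP aggregated by summation, then a readout MLP."""
    def __init__(self, obj_dim=8, hidden=64, out_dim=10):
        super().__init__()
        self.g = nn.Sequential(nn.Linear(2 * obj_dim, hidden), nn.ReLU(),
                               nn.Linear(hidden, hidden), nn.ReLU())
        self.f = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                               nn.Linear(hidden, out_dim))

    def forward(self, objects):                 # (batch, n_obj, obj_dim)
        b, n, d = objects.shape
        oi = objects.unsqueeze(2).expand(b, n, n, d)
        oj = objects.unsqueeze(1).expand(b, n, n, d)
        pairs = torch.cat([oi, oj], dim=-1)     # all ordered pairs
        rel = self.g(pairs).sum(dim=(1, 2))     # aggregate relations
        return self.f(rel)

rn = RelationNetwork()
scene = torch.randn(4, 6, 8)  # 4 scenes, 6 objects of 8 features each
print(rn(scene).shape)        # torch.Size([4, 10])
```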
Attentive Memory Networks: Efficient Machine Reading for Conversational Search
Title | Attentive Memory Networks: Efficient Machine Reading for Conversational Search |
Authors | Tom Kenter, Maarten de Rijke |
Abstract | Recent advances in conversational systems have changed the search paradigm. Traditionally, a user poses a query to a search engine that returns an answer based on its index, possibly leveraging external knowledge bases and conditioning the response on earlier interactions in the search session. In a natural conversation, there is an additional source of information to take into account: utterances produced earlier in a conversation can also be referred to, and a conversational IR system has to keep track of information conveyed by the user during the conversation, even if it is implicit. We argue that the process of building a representation of the conversation can be framed as a machine reading task, where an automated system is presented with a number of statements about which it should answer questions. The questions should be answered solely by referring to the statements provided, without consulting external knowledge. The time is right for the information retrieval community to embrace this task, both as a stand-alone task and integrated in a broader conversational search setting. In this paper, we focus on machine reading as a stand-alone task and present the Attentive Memory Network (AMN), an end-to-end trainable machine reading algorithm. Its key contribution is in efficiency, achieved by a hierarchical input encoder that iterates over the input only once. Speed is an important requirement in the setting of conversational search, as gaps between conversational turns have a detrimental effect on naturalness. On 20 datasets commonly used for evaluating machine reading algorithms we show that the AMN achieves performance comparable to the state-of-the-art models, while using considerably fewer computations. |
Tasks | Information Retrieval, Reading Comprehension |
Published | 2017-12-19 |
URL | http://arxiv.org/abs/1712.07229v1 |
PDF | http://arxiv.org/pdf/1712.07229v1.pdf |
PWC | https://paperswithcode.com/paper/attentive-memory-networks-efficient-machine |
Repo | |
Framework | |
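The efficiency claim rests on a hierarchical encoder that passes over the input once. A compact sketch follows, with assumed dimensions and a simple dot-product attention that the paper may well implement differently.

```python
import torch
import torch.nn as nn

class HierarchicalEncoder(nn.Module):
    """Single-pass hierarchical input encoder in the spirit of the AMN:
    a word-level GRU produces one vector per statement, a sentence-level
    GRU turns those into memories, and a question vector attends over
    them. Dimensions and attention form are assumptions."""
    def __init__(self, dim=32):
        super().__init__()
        self.word_rnn = nn.GRU(dim, dim, batch_first=True)
        self.sent_rnn = nn.GRU(dim, dim, batch_first=True)

    def forward(self, statements, question):
        # statements: (n_sents, n_words, dim), question: (dim,)
        _, h = self.word_rnn(statements)  # h: (1, n_sents, dim)
        memories, _ = self.sent_rnn(h)    # one pass over sentence vectors
        mem = memories[0]                 # (n_sents, dim)
        attn = torch.softmax(mem @ question, dim=0)
        return attn @ mem                 # attended answer state

enc = HierarchicalEncoder()
out = enc(torch.randn(5, 7, 32), torch.randn(32))
print(out.shape)  # torch.Size([32])
```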