Paper Group ANR 318
Deep Learning for Distant Speech Recognition
Title | Deep Learning for Distant Speech Recognition |
Authors | Mirco Ravanelli |
Abstract | Deep learning is an emerging technology that is considered one of the most promising directions for reaching higher levels of artificial intelligence. Among the other achievements, building computers that understand speech represents a crucial leap towards intelligent machines. Despite the great efforts of the past decades, however, a natural and robust human-machine speech interaction still appears to be out of reach, especially when users interact with a distant microphone in noisy and reverberant environments. The latter disturbances severely hamper the intelligibility of a speech signal, making Distant Speech Recognition (DSR) one of the major open challenges in the field. This thesis addresses the latter scenario and proposes some novel techniques, architectures, and algorithms to improve the robustness of distant-talking acoustic models. We first elaborate on methodologies for realistic data contamination, with a particular emphasis on DNN training with simulated data. We then investigate approaches for better exploiting speech contexts, proposing some original methodologies for both feed-forward and recurrent neural networks. Lastly, inspired by the idea that cooperation across different DNNs could be the key for counteracting the harmful effects of noise and reverberation, we propose a novel deep learning paradigm called network of deep neural networks. The analysis of the original concepts was based on extensive experimental validations conducted on both real and simulated data, considering different corpora, microphone configurations, environments, noisy conditions, and ASR tasks. |
Tasks | Distant Speech Recognition, Speech Recognition |
Published | 2017-12-17 |
URL | http://arxiv.org/abs/1712.06086v1 |
http://arxiv.org/pdf/1712.06086v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-learning-for-distant-speech-recognition |
Repo | |
Framework | |
Highlighting objects of interest in an image by integrating saliency and depth
Title | Highlighting objects of interest in an image by integrating saliency and depth |
Authors | Subhayan Mukherjee, Irene Cheng, Anup Basu |
Abstract | Stereo images have been captured primarily for 3D reconstruction in the past. However, the depth information acquired from stereo can also be used along with saliency to highlight certain objects in a scene. This approach can be used to make still images more interesting to look at, and highlight objects of interest in the scene. We introduce this novel direction in this paper, and discuss the theoretical framework behind the approach. Even though we use depth from stereo in this work, our approach is applicable to depth data acquired from any sensor modality. Experimental results on both indoor and outdoor scenes demonstrate the benefits of our algorithm. |
Tasks | 3D Reconstruction |
Published | 2017-11-28 |
URL | http://arxiv.org/abs/1711.10515v1 |
http://arxiv.org/pdf/1711.10515v1.pdf | |
PWC | https://paperswithcode.com/paper/highlighting-objects-of-interest-in-an-image |
Repo | |
Framework | |
Control of the Correlation of Spontaneous Neuron Activity in Biological and Noise-activated CMOS Artificial Neural Microcircuits
Title | Control of the Correlation of Spontaneous Neuron Activity in Biological and Noise-activated CMOS Artificial Neural Microcircuits |
Authors | Ramin M. Hasani, Giorgio Ferrari, Hideaki Yamamoto, Sho Kono, Koji Ishihara, Soya Fujimori, Takashi Tanii, Enrico Prati |
Abstract | There are several indications that the brain is organized not on a basis of individual unreliable neurons, but on a micro-circuital scale providing Lego blocks employed to create complex architectures. At such an intermediate scale, the firing activity in the microcircuits is governed by collective effects emerging from the background noise soliciting spontaneous firing, the degree of mutual connections between the neurons, and the topology of the connections. We compare spontaneous firing activity of small populations of neurons adhering to an engineered scaffold with simulations of biologically plausible CMOS artificial neuron populations whose spontaneous activity is ignited by tailored background noise. We provide a full set of flexible and low-power consuming silicon blocks including neurons, excitatory and inhibitory synapses, and both white and pink noise generators for spontaneous firing activation. We achieve a comparable degree of correlation of the firing activity of the biological neurons by controlling the kind and the number of connections among the silicon neurons. The correlation between groups of neurons, organized as a ring of four distinct populations connected by the equivalent of interneurons, is triggered more effectively by adding multiple synapses to the connections than by increasing the number of independent point-to-point connections. The comparison between the biological and the artificial systems suggests that a considerable number of synapses is active also in biological populations adhering to engineered scaffolds. |
Tasks | |
Published | 2017-02-24 |
URL | http://arxiv.org/abs/1702.07426v1 |
http://arxiv.org/pdf/1702.07426v1.pdf | |
PWC | https://paperswithcode.com/paper/control-of-the-correlation-of-spontaneous |
Repo | |
Framework | |
BridgeNets: Student-Teacher Transfer Learning Based on Recursive Neural Networks and its Application to Distant Speech Recognition
Title | BridgeNets: Student-Teacher Transfer Learning Based on Recursive Neural Networks and its Application to Distant Speech Recognition |
Authors | Jaeyoung Kim, Mostafa El-Khamy, Jungwon Lee |
Abstract | Despite the remarkable progress achieved on automatic speech recognition, recognizing far-field speech mixed with various noise sources is still a challenging task. In this paper, we introduce a novel student-teacher transfer learning framework, BridgeNet, which can provide a solution to improve distant speech recognition. There are two key features in BridgeNet. First, BridgeNet extends traditional student-teacher frameworks by providing multiple hints from a teacher network. Hints are not limited to the soft labels from a teacher network. The teacher’s intermediate feature representations can better guide a student network to learn how to denoise or dereverberate noisy input. Second, the proposed recursive architecture in the BridgeNet can iteratively improve denoising and recognition performance. The experimental results of BridgeNet showed significant improvements in tackling the distant speech recognition problem, where it achieved up to 13.24% relative WER reductions on the AMI corpus compared to a baseline neural network without the teacher’s hints. |
Tasks | Denoising, Distant Speech Recognition, Speech Recognition, Transfer Learning |
Published | 2017-10-27 |
URL | http://arxiv.org/abs/1710.10224v3 |
http://arxiv.org/pdf/1710.10224v3.pdf | |
PWC | https://paperswithcode.com/paper/bridgenets-student-teacher-transfer-learning |
Repo | |
Framework | |
Progressively Diffused Networks for Semantic Image Segmentation
Title | Progressively Diffused Networks for Semantic Image Segmentation |
Authors | Ruimao Zhang, Wei Yang, Zhanglin Peng, Xiaogang Wang, Liang Lin |
Abstract | This paper introduces Progressively Diffused Networks (PDNs) for unifying multi-scale context modeling with deep feature learning, by taking semantic image segmentation as an exemplar application. Prior neural networks, such as ResNet, tend to enhance representational power by increasing the depth of architectures and driving the training objective across layers. However, we argue that spatial dependencies in different layers, which generally represent the rich contexts among data elements, are also critical to building deep and discriminative representations. To this end, our PDNs progressively broadcast information over the learned feature maps by inserting a stack of information diffusion layers, each of which exploits multi-dimensional convolutional LSTMs (Long-Short-Term Memory Structures). In each LSTM unit, a special type of atrous filter is designed to capture the short range and long range dependencies from various neighbors to a certain site of the feature map and pass the accumulated information to the next layer. In extensive experiments on semantic image segmentation benchmarks (e.g., ImageNet Parsing, PASCAL VOC2012 and PASCAL-Part), our framework substantially improves performance over the popular existing neural network models, and achieves state-of-the-art results on ImageNet Parsing for large scale semantic segmentation. |
Tasks | Semantic Segmentation |
Published | 2017-02-20 |
URL | http://arxiv.org/abs/1702.05839v1 |
http://arxiv.org/pdf/1702.05839v1.pdf | |
PWC | https://paperswithcode.com/paper/progressively-diffused-networks-for-semantic |
Repo | |
Framework | |
Dress like a Star: Retrieving Fashion Products from Videos
Title | Dress like a Star: Retrieving Fashion Products from Videos |
Authors | Noa Garcia, George Vogiatzis |
Abstract | This work proposes a system for retrieving clothing and fashion products from video content. Although films and television are the perfect showcase for fashion brands to promote their products, spectators are not always aware of where to buy the latest trends they see on screen. Here, a framework for bridging the gap between fashion products shown on videos and users is presented. By relating clothing items and video frames in an indexed database and performing frame retrieval with temporal aggregation and fast indexing techniques, we can find fashion products from videos in a simple and non-intrusive way. Experiments on a large-scale dataset conducted here show that, by using the proposed framework, memory requirements can be reduced by 42.5X with respect to linear search, whereas accuracy is maintained at around 90%. |
Tasks | |
Published | 2017-10-19 |
URL | http://arxiv.org/abs/1710.07198v1 |
http://arxiv.org/pdf/1710.07198v1.pdf | |
PWC | https://paperswithcode.com/paper/dress-like-a-star-retrieving-fashion-products |
Repo | |
Framework | |
Learning Edge Representations via Low-Rank Asymmetric Projections
Title | Learning Edge Representations via Low-Rank Asymmetric Projections |
Authors | Sami Abu-El-Haija, Bryan Perozzi, Rami Al-Rfou |
Abstract | We propose a new method for embedding graphs while preserving directed edge information. Learning such continuous-space vector representations (or embeddings) of nodes in a graph is an important first step for using network information (from social networks, user-item graphs, knowledge bases, etc.) in many machine learning tasks. Unlike previous work, we (1) explicitly model an edge as a function of node embeddings, and we (2) propose a novel objective, the “graph likelihood”, which contrasts information from sampled random walks with non-existent edges. Individually, both of these contributions improve the learned representations, especially when there are memory constraints on the total size of the embeddings. When combined, our contributions enable us to significantly improve the state-of-the-art by learning more concise representations that better preserve the graph structure. We evaluate our method on a variety of link-prediction tasks, including social networks, collaboration networks, and protein interactions, showing that our proposed method learns representations with error reductions of up to 76% and 55%, on directed and undirected graphs. In addition, we show that the representations learned by our method are quite space efficient, producing embeddings which have higher structure-preserving accuracy but are 10 times smaller. |
Tasks | Link Prediction |
Published | 2017-05-16 |
URL | http://arxiv.org/abs/1705.05615v4 |
http://arxiv.org/pdf/1705.05615v4.pdf | |
PWC | https://paperswithcode.com/paper/learning-edge-representations-via-low-rank |
Repo | |
Framework | |
Generative Adversarial Networks with Inverse Transformation Unit
Title | Generative Adversarial Networks with Inverse Transformation Unit |
Authors | Zhifeng Kong, Shuo Ding |
Abstract | In this paper we introduce a new structure to Generative Adversarial Networks by adding an inverse transformation unit behind the generator. We present two theorems to claim the convergence of the model, and two conjectures for nonideal situations when the transformation is not a bijection. A general survey on models with different transformations was done on the MNIST dataset and the Fashion-MNIST dataset, which shows the transformation does not necessarily need to be a bijection. Also, with certain transformations that blur an image, our model successfully learned to sharpen the images and recover blurred images, which was additionally verified by our measurement of sharpness. |
Tasks | |
Published | 2017-09-27 |
URL | http://arxiv.org/abs/1709.09354v1 |
http://arxiv.org/pdf/1709.09354v1.pdf | |
PWC | https://paperswithcode.com/paper/generative-adversarial-networks-with-inverse |
Repo | |
Framework | |
A Quantum-Inspired Ensemble Method and Quantum-Inspired Forest Regressors
Title | A Quantum-Inspired Ensemble Method and Quantum-Inspired Forest Regressors |
Authors | Zeke Xie, Issei Sato |
Abstract | We propose a Quantum-Inspired Subspace (QIS) Ensemble Method for generating feature ensembles based on feature selections. We assign each principal component a Fraction Transition Probability as its probability weight based on Principal Component Analysis and quantum interpretations. In order to generate the feature subset for each base regressor, we select a feature subset from principal components based on Fraction Transition Probabilities. The idea originating from quantum mechanics can encourage ensemble diversity and accuracy simultaneously. We incorporate the Quantum-Inspired Subspace Method into Random Forest and propose Quantum-Inspired Forest. We theoretically prove that the quantum interpretation corresponds to the first order approximation of ensemble regression. We also evaluate the empirical performance of Quantum-Inspired Forest and Random Forest in multiple hyperparameter settings. Quantum-Inspired Forest proves the significant robustness of the default hyperparameters on most data sets. The contribution of this work is two-fold: a novel ensemble regression algorithm inspired by quantum mechanics and the theoretical connection between quantum interpretations and machine learning algorithms. |
Tasks | |
Published | 2017-11-22 |
URL | http://arxiv.org/abs/1711.08117v1 |
http://arxiv.org/pdf/1711.08117v1.pdf | |
PWC | https://paperswithcode.com/paper/a-quantum-inspired-ensemble-method-and |
Repo | |
Framework | |
Interpretable probabilistic embeddings: bridging the gap between topic models and neural networks
Title | Interpretable probabilistic embeddings: bridging the gap between topic models and neural networks |
Authors | Anna Potapenko, Artem Popov, Konstantin Vorontsov |
Abstract | We consider probabilistic topic models and more recent word embedding techniques from a perspective of learning hidden semantic representations. Inspired by a striking similarity of the two approaches, we merge them and learn probabilistic embeddings with online EM-algorithm on word co-occurrence data. The resulting embeddings perform on par with Skip-Gram Negative Sampling (SGNS) on word similarity tasks and benefit in the interpretability of the components. Next, we learn probabilistic document embeddings that outperform paragraph2vec on a document similarity task and require less memory and time for training. Finally, we employ multimodal Additive Regularization of Topic Models (ARTM) to obtain a high sparsity and learn embeddings for other modalities, such as timestamps and categories. We observe further improvement of word similarity performance and meaningful inter-modality similarities. |
Tasks | Topic Models |
Published | 2017-11-11 |
URL | http://arxiv.org/abs/1711.04154v1 |
http://arxiv.org/pdf/1711.04154v1.pdf | |
PWC | https://paperswithcode.com/paper/interpretable-probabilistic-embeddings |
Repo | |
Framework | |
FlagIt: A System for Minimally Supervised Human Trafficking Indicator Mining
Title | FlagIt: A System for Minimally Supervised Human Trafficking Indicator Mining |
Authors | Mayank Kejriwal, Jiayuan Ding, Runqi Shao, Anoop Kumar, Pedro Szekely |
Abstract | In this paper, we describe and study the indicator mining problem in the online sex advertising domain. We present an in-development system, FlagIt (Flexible and adaptive generation of Indicators from text), which combines the benefits of both a lightweight expert system and classical semi-supervision (heuristic re-labeling) with recently released state-of-the-art unsupervised text embeddings to tag millions of sentences with indicators that are highly correlated with human trafficking. The FlagIt technology stack is open source. On preliminary evaluations involving five indicators, FlagIt illustrates promising performance compared to several alternatives. The system is being actively developed, refined and integrated into a domain-specific search system used by over 200 law enforcement agencies to combat human trafficking, and is being aggressively extended to mine at least six more indicators with minimal programming effort. FlagIt is a good example of a system that operates in limited label settings, and that requires creative combinations of established machine learning techniques to produce outputs that could be used by real-world non-technical analysts. |
Tasks | |
Published | 2017-12-05 |
URL | http://arxiv.org/abs/1712.03086v1 |
http://arxiv.org/pdf/1712.03086v1.pdf | |
PWC | https://paperswithcode.com/paper/flagit-a-system-for-minimally-supervised |
Repo | |
Framework | |
Symmetry Learning for Function Approximation in Reinforcement Learning
Title | Symmetry Learning for Function Approximation in Reinforcement Learning |
Authors | Anuj Mahajan, Theja Tulabandhula |
Abstract | In this paper we explore methods to exploit symmetries for ensuring sample efficiency in reinforcement learning (RL). This problem deserves ever-increasing attention given the recent advances in the use of deep networks for complex RL tasks, which require large amounts of training data. We introduce a novel method to detect symmetries using reward trails observed during episodic experience and prove its completeness. We also provide a framework to incorporate the discovered symmetries for functional approximation. Finally, we show that the use of potential based reward shaping is especially effective for our symmetry exploitation mechanism. Experiments on various classical problems show that our method improves the learning performance significantly by utilizing symmetry information. |
Tasks | |
Published | 2017-06-09 |
URL | http://arxiv.org/abs/1706.02999v1 |
http://arxiv.org/pdf/1706.02999v1.pdf | |
PWC | https://paperswithcode.com/paper/symmetry-learning-for-function-approximation |
Repo | |
Framework | |
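For readers unfamiliar with the potential based reward shaping mentioned in the abstract above, the following is a minimal sketch of the standard formulation, in which a shaping bonus gamma * Phi(s') - Phi(s) is added to the environment reward. It is not the authors' symmetry mechanism. The gridworld states, the goal, and the choice of potential (negative Manhattan distance to the goal) are all assumptions made for illustration.

```python
def potential(state, goal):
    """Hypothetical potential: negative Manhattan distance from state to goal."""
    return -(abs(state[0] - goal[0]) + abs(state[1] - goal[1]))


def shaping_bonus(state, next_state, goal, gamma):
    """Potential-based shaping term: gamma * Phi(s') - Phi(s)."""
    return gamma * potential(next_state, goal) - potential(state, goal)


def shaped_reward(reward, state, next_state, goal, gamma):
    """Environment reward plus the shaping bonus for one transition."""
    return reward + shaping_bonus(state, next_state, goal, gamma)


if __name__ == "__main__":
    goal = (2, 2)
    path = [(0, 0), (0, 1), (1, 1), (2, 1), (2, 2)]  # illustrative trajectory
    for s, s_next in zip(path, path[1:]):
        print(s, "->", s_next, shaped_reward(0, s, s_next, goal, gamma=1))
```

With gamma equal to 1 the bonuses along a trajectory telescope to Phi(last state) - Phi(first state), which is why this family of shaping terms does not change which policies are optimal.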
Statistical inference on random dot product graphs: a survey
Title | Statistical inference on random dot product graphs: a survey |
Authors | Avanti Athreya, Donniell E. Fishkind, Keith Levin, Vince Lyzinski, Youngser Park, Yichen Qin, Daniel L. Sussman, Minh Tang, Joshua T. Vogelstein, Carey E. Priebe |
Abstract | The random dot product graph (RDPG) is an independent-edge random graph that is analytically tractable and, simultaneously, either encompasses or can successfully approximate a wide range of random graphs, from relatively simple stochastic block models to complex latent position graphs. In this survey paper, we describe a comprehensive paradigm for statistical inference on random dot product graphs, a paradigm centered on spectral embeddings of adjacency and Laplacian matrices. We examine the analogues, in graph inference, of several canonical tenets of classical Euclidean inference: in particular, we summarize a body of existing results on the consistency and asymptotic normality of the adjacency and Laplacian spectral embeddings, and the role these spectral embeddings can play in the construction of single- and multi-sample hypothesis tests for graph data. We investigate several real-world applications, including community detection and classification in large social networks and the determination of functional and biologically relevant network properties from an exploratory data analysis of the Drosophila connectome. We outline requisite background and current open problems in spectral graph inference. |
Tasks | Community Detection |
Published | 2017-09-16 |
URL | http://arxiv.org/abs/1709.05454v1 |
http://arxiv.org/pdf/1709.05454v1.pdf | |
PWC | https://paperswithcode.com/paper/statistical-inference-on-random-dot-product |
Repo | |
Framework | |
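As an illustration of the model named in the abstract above, here is a minimal sketch of sampling an undirected random dot product graph: each node has a latent position vector, and each edge appears independently with probability equal to the dot product of its endpoints' positions. The function name and the example latent positions are assumptions chosen for illustration; the spectral embedding and inference procedures surveyed in the paper are not shown.

```python
import random


def sample_rdpg(latent, seed=0):
    """Sample a symmetric 0/1 adjacency matrix from latent positions.

    Edge {i, j} is present independently with probability <x_i, x_j>.
    """
    rng = random.Random(seed)
    n = len(latent)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            p = sum(a * b for a, b in zip(latent[i], latent[j]))
            if not 0.0 <= p <= 1.0:
                raise ValueError("dot product must be a valid probability")
            if rng.random() < p:
                adj[i][j] = adj[j][i] = 1
    return adj


if __name__ == "__main__":
    # Two groups of latent positions give a two-block stochastic block model:
    # within-block edge probability 0.68, between-block probability 0.32.
    latent = [(0.8, 0.2)] * 5 + [(0.2, 0.8)] * 5
    for row in sample_rdpg(latent, seed=1):
        print(row)
```

Placing several nodes at the same latent position, as in the usage example, recovers a stochastic block model, which is the sense in which the RDPG encompasses simpler random graphs.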
On labeling Android malware signatures using minhashing and further classification with Structural Equation Models
Title | On labeling Android malware signatures using minhashing and further classification with Structural Equation Models |
Authors | Ignacio Martín, José Alberto Hernández, Sergio de los Santos |
Abstract | Multi-scanner Antivirus systems provide insightful information on the nature of a suspect application; however, there is often a lack of consensus and consistency between different Anti-Virus engines. In this article, we analyze more than 250 thousand malware signatures generated by 61 different Anti-Virus engines after analyzing 82 thousand different Android malware applications. We identify 41 different malware classes grouped into three major categories, namely Adware, Harmful Threats and Unknown or Generic signatures. We further investigate the relationships between these 41 classes using community detection algorithms from graph theory to identify similarities between them; and we finally propose a Structural Equation Model to identify which Anti-Virus engines are more powerful at detecting each macro-category. As an application, we show how such models can help in identifying whether Unknown malware applications are more likely to be of Harmful or Adware type. |
Tasks | Community Detection |
Published | 2017-09-13 |
URL | http://arxiv.org/abs/1709.04186v1 |
http://arxiv.org/pdf/1709.04186v1.pdf | |
PWC | https://paperswithcode.com/paper/on-labeling-android-malware-signatures-using |
Repo | |
Framework | |
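The minhashing named in the title above can be sketched as follows: each signature string is broken into character shingles, a fixed number of salted hash functions each keep the minimum hash over the shingle set, and the fraction of matching minima estimates the Jaccard similarity of two sets. This is a generic illustration under assumptions, not the authors' pipeline; the shingle size, the number of hash functions, and the example signature strings are all hypothetical.

```python
import hashlib


def shingles(label, k=3):
    """Return the set of character k-grams of a lowercased signature string."""
    s = label.lower()
    return {s[i:i + k] for i in range(max(1, len(s) - k + 1))}


def minhash_signature(items, num_hashes=64):
    """Keep the minimum 64-bit hash of the items under each salted hash."""
    signature = []
    for seed in range(num_hashes):
        salt = str(seed).encode()
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(salt + b":" + x.encode(), digest_size=8).digest(),
                "big",
            )
            for x in items
        ))
    return signature


def estimated_jaccard(sig_a, sig_b):
    """Fraction of positions where two MinHash signatures agree."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


if __name__ == "__main__":
    # Hypothetical antivirus signature strings, for illustration only.
    a = minhash_signature(shingles("Android.Adware.Airpush.A"))
    b = minhash_signature(shingles("Adware/Airpush.B"))
    print(estimated_jaccard(a, b))
```

Signatures with a high estimated similarity could then be grouped under a common label; the grouping threshold would need to be chosen empirically.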
Weighted Message Passing and Minimum Energy Flow for Heterogeneous Stochastic Block Models with Side Information
Title | Weighted Message Passing and Minimum Energy Flow for Heterogeneous Stochastic Block Models with Side Information |
Authors | T. Tony Cai, Tengyuan Liang, Alexander Rakhlin |
Abstract | We study the misclassification error for community detection in general heterogeneous stochastic block models (SBM) with noisy or partial label information. We establish a connection between the misclassification rate and the notion of minimum energy on the local neighborhood of the SBM. We develop an optimally weighted message passing algorithm to reconstruct labels for SBM based on the minimum energy flow and the eigenvectors of a certain Markov transition matrix. The general SBM considered in this paper allows for unequal-size communities, degree heterogeneity, and different connection probabilities among blocks. We focus on how to optimally weigh the message passing to improve misclassification. |
Tasks | Community Detection |
Published | 2017-09-12 |
URL | http://arxiv.org/abs/1709.03907v1 |
http://arxiv.org/pdf/1709.03907v1.pdf | |
PWC | https://paperswithcode.com/paper/weighted-message-passing-and-minimum-energy |
Repo | |
Framework | |