Paper Group ANR 614
Risk Bounds for Learning Multiple Components with Permutation-Invariant Losses. Blind Denoising Autoencoder. DynamoNet: Dynamic Action and Motion Network. Automatic Model Selection for Neural Networks. Twitter Sentiment Analysis using Distributed Word and Sentence Representation. Social Attention for Autonomous Decision-Making in Dense Traffic. Top …
Risk Bounds for Learning Multiple Components with Permutation-Invariant Losses
Title | Risk Bounds for Learning Multiple Components with Permutation-Invariant Losses |
Authors | Fabien Lauer |
Abstract | This paper proposes a simple approach to derive efficient error bounds for learning multiple components with sparsity-inducing regularization. We show that for such regularization schemes, known decompositions of the Rademacher complexity over the components can be used in a more efficient manner to result in tighter bounds without too much effort. We give examples of application to switching regression and center-based clustering/vector quantization. Then, the complete workflow is illustrated on the problem of subspace clustering, for which decomposition results were not previously available. For all these problems, the proposed approach yields risk bounds with mild dependencies on the number of components and completely removes this dependence for nonconvex regularization schemes that could not be handled by previous methods. |
Tasks | Quantization |
Published | 2019-04-16 |
URL | https://arxiv.org/abs/1904.07594v2 |
https://arxiv.org/pdf/1904.07594v2.pdf | |
PWC | https://paperswithcode.com/paper/risk-bounds-for-learning-multiple-components |
Repo | |
Framework | |
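As a point of reference for the abstract above, the empirical Rademacher complexity of a function class $\mathcal{F}$ on a sample $S = (x_1, \dots, x_n)$, in terms of which such risk bounds are stated, is the standard quantity

$$\hat{\mathfrak{R}}_S(\mathcal{F}) = \mathbb{E}_{\sigma}\Big[\sup_{f \in \mathcal{F}} \frac{1}{n}\sum_{i=1}^{n} \sigma_i f(x_i)\Big], \qquad \sigma_1,\dots,\sigma_n \ \text{i.i.d. uniform on } \{-1,+1\}.$$

The paper's contribution lies in how this quantity is decomposed over the learned components under sparsity-inducing regularization; that decomposition is not reproduced here.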
Blind Denoising Autoencoder
Title | Blind Denoising Autoencoder |
Authors | Angshul Majumdar |
Abstract | The term blind denoising refers to the fact that the basis used for denoising is learnt from the noisy sample itself during denoising. Dictionary learning and transform learning based formulations for blind denoising are well known, but there has been no autoencoder based solution for this blind denoising approach. So far, autoencoder based denoising formulations have learnt the model on separate training data and have used the learnt model to denoise test samples. Such a methodology fails when the test image (to denoise) is not of the same kind as the data the model was learnt from. This is the first work in which the autoencoder is learnt from the noisy sample itself while denoising. Experimental results show that our proposed method performs better than dictionary learning (KSVD), transform learning, the sparse stacked denoising autoencoder and the gold standard BM3D algorithm. |
Tasks | Denoising, Dictionary Learning |
Published | 2019-12-11 |
URL | https://arxiv.org/abs/1912.07358v1 |
https://arxiv.org/pdf/1912.07358v1.pdf | |
PWC | https://paperswithcode.com/paper/blind-denoising-autoencoder |
Repo | |
Framework | |
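A minimal sketch of the blind-denoising idea described above: the autoencoder is fit on overlapping patches of the noisy image itself (no external training set) and then used to reconstruct it. The patch size, layer sizes, sparsity penalty and training schedule are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn

def blind_denoise(noisy, patch=8, hidden=32, epochs=200, l1=1e-4, lr=1e-3):
    """noisy: 2-D tensor (H, W) with values in [0, 1]."""
    H, W = noisy.shape
    # Collect all overlapping patches as rows of a matrix.
    patches = noisy.unfold(0, patch, 1).unfold(1, patch, 1)   # (H-p+1, W-p+1, p, p)
    n_h, n_w = patches.shape[:2]
    X = patches.reshape(-1, patch * patch)

    model = nn.Sequential(
        nn.Linear(patch * patch, hidden), nn.ReLU(),
        nn.Linear(hidden, patch * patch),
    )
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        code = torch.relu(model[0](X))                        # hidden code
        recon = model(X)
        # Reconstruct the noisy patches, with an L1 penalty on the code.
        loss = ((recon - X) ** 2).mean() + l1 * code.abs().mean()
        loss.backward()
        opt.step()

    # Average the overlapping reconstructed patches back into an image.
    with torch.no_grad():
        recon = model(X).reshape(n_h, n_w, patch, patch)
    out, count = torch.zeros(H, W), torch.zeros(H, W)
    for i in range(n_h):
        for j in range(n_w):
            out[i:i + patch, j:j + patch] += recon[i, j]
            count[i:i + patch, j:j + patch] += 1
    return out / count
```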
DynamoNet: Dynamic Action and Motion Network
Title | DynamoNet: Dynamic Action and Motion Network |
Authors | Ali Diba, Vivek Sharma, Luc Van Gool, Rainer Stiefelhagen |
Abstract | In this paper, we are interested in self-supervised learning of motion cues in videos using dynamic motion filters, for a better motion representation and, ultimately, better human action recognition. Thus far, the vision community has focused on spatio-temporal approaches with standard filters; we instead propose dynamic filters that adaptively learn a video-specific internal motion representation by predicting short-term future frames. We name this new motion representation dynamic motion representation (DMR) and embed it inside a 3D convolutional network as a new layer, which captures the visual appearance and motion dynamics of the entire video clip via end-to-end network learning. Simultaneously, we utilize these motion representations to enrich video classification. We design the frame prediction task as an auxiliary task to strengthen the classification problem. To this end, we introduce a novel unified spatio-temporal 3D-CNN architecture (DynamoNet) that jointly optimizes video classification and motion representation learning by predicting future frames as a multi-task learning problem. We conduct experiments on challenging human action datasets: Kinetics 400, UCF101 and HMDB51. The experiments with the proposed DynamoNet show promising results on all datasets. |
Tasks | Multi-Task Learning, Temporal Action Localization, Video Classification |
Published | 2019-04-25 |
URL | http://arxiv.org/abs/1904.11407v1 |
http://arxiv.org/pdf/1904.11407v1.pdf | |
PWC | https://paperswithcode.com/paper/dynamonet-dynamic-action-and-motion-network |
Repo | |
Framework | |
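A rough sketch of the dynamic-filter idea from the abstract above: a small head predicts a per-sample convolution kernel from the clip features, and that kernel is applied to the last frame to predict the next one. The shapes, kernel size and grayscale-frame simplification are assumptions; in the paper this sits inside a 3D CNN and is trained jointly with classification.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicMotionFilter(nn.Module):
    def __init__(self, feat_dim, k=3):
        super().__init__()
        self.k = k
        self.predict_kernel = nn.Linear(feat_dim, k * k)   # one kernel per sample

    def forward(self, clip_feat, last_frame):
        """clip_feat: (B, feat_dim); last_frame: (B, 1, H, W) grayscale frames."""
        B, _, H, W = last_frame.shape
        kernels = torch.softmax(self.predict_kernel(clip_feat), dim=1)
        kernels = kernels.view(B, 1, self.k, self.k)
        # Grouped convolution applies each sample's own kernel to its own frame.
        out = F.conv2d(last_frame.view(1, B, H, W), kernels,
                       padding=self.k // 2, groups=B)
        return out.view(B, 1, H, W)                        # predicted next frame
```

The predicted frame can then be compared to the true future frame with a reconstruction loss and added to the classification loss for multi-task training.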
Automatic Model Selection for Neural Networks
Title | Automatic Model Selection for Neural Networks |
Authors | David Laredo, Yulin Qin, Oliver Schütze, Jian-Qiao Sun |
Abstract | Neural networks and deep learning are changing the way that artificial intelligence is being done. Efficiently choosing a suitable network architecture and fine-tuning its hyper-parameters for a specific dataset is a time-consuming task, given the staggering number of possible alternatives. In this paper, we address the problem of model selection by means of a fully automated framework for efficiently selecting a neural network model for a given task: classification or regression. The algorithm, named Automatic Model Selection (AMS), is a modified micro-genetic algorithm that automatically and efficiently finds the most suitable neural network model for a given dataset. The main contributions of this method are a simple list-based encoding of neural networks as genotypes in an evolutionary algorithm, new crossover and mutation operators, a fitness function that considers both the accuracy of the model and its complexity, and a method to measure the similarity between two neural networks. AMS is evaluated on two different datasets. By comparing some models obtained with AMS to state-of-the-art models for each dataset, we show that AMS can automatically find efficient neural network models. Furthermore, AMS is computationally efficient and can make use of distributed computing paradigms to further boost its performance. |
Tasks | Model Selection |
Published | 2019-05-15 |
URL | https://arxiv.org/abs/1905.06010v1 |
https://arxiv.org/pdf/1905.06010v1.pdf | |
PWC | https://paperswithcode.com/paper/automatic-model-selection-for-neural-networks |
Repo | |
Framework | |
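A toy sketch of the list-based genotype idea mentioned above: a network is encoded as a list of layer widths, simple one-point crossover and random mutation act on that list, and the fitness trades off accuracy against complexity. The specific operators, width choices and complexity penalty are illustrative, not AMS's exact definitions.

```python
import random

WIDTHS = (16, 32, 64, 128)

def random_genotype(max_layers=4):
    """A genotype is just a list of hidden-layer widths."""
    return [random.choice(WIDTHS) for _ in range(random.randint(1, max_layers))]

def crossover(a, b):
    """One-point crossover on the two layer lists."""
    cut_a, cut_b = random.randint(0, len(a)), random.randint(0, len(b))
    return (a[:cut_a] + b[cut_b:]) or [random.choice(WIDTHS)]

def mutate(g, p=0.3):
    g = [random.choice(WIDTHS) if random.random() < p else w for w in g]
    if random.random() < p:              # occasionally add or drop a layer
        g = g + [random.choice(WIDTHS)] if random.random() < 0.5 else (g[:-1] or g)
    return g

def fitness(accuracy, genotype, complexity_weight=1e-4):
    # Reward accuracy, penalise model size (here: total number of hidden units).
    return accuracy - complexity_weight * sum(genotype)
```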
Twitter Sentiment Analysis using Distributed Word and Sentence Representation
Title | Twitter Sentiment Analysis using Distributed Word and Sentence Representation |
Authors | Dwarampudi Mahidhar Reddy, Dr. N V Subba Reddy |
Abstract | An important part of information gathering and data analysis is finding out what people think about a product or an entity. Twitter is an opinion-rich social networking site, and its posts or tweets can be mined for people's opinions. The recent surge of activity in this area can be attributed to the computational treatment of data, which has made opinion extraction and sentiment analysis easier. This paper classifies tweets into positive and negative sentiments, but instead of using traditional methods on preprocessed text data, we use distributed representations of words and sentences to classify the tweets. We use Long Short-Term Memory (LSTM) networks, Convolutional Neural Networks (CNNs) and Artificial Neural Networks; the first two operate on distributed representations of words, while the latter operates on distributed representations of sentences. This paper achieves accuracies as high as 81%. It also identifies which of the available methods for creating distributed word representations work best for sentiment analysis. |
Tasks | Sentiment Analysis, Twitter Sentiment Analysis |
Published | 2019-04-01 |
URL | http://arxiv.org/abs/1904.12580v1 |
http://arxiv.org/pdf/1904.12580v1.pdf | |
PWC | https://paperswithcode.com/paper/190412580 |
Repo | |
Framework | |
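A minimal PyTorch sketch of one of the configurations described above: an LSTM over pre-trained distributed word vectors for binary tweet sentiment. The hidden size and the choice to freeze the embeddings are placeholder decisions, not the paper's reported setup.

```python
import torch
import torch.nn as nn

class TweetLSTM(nn.Module):
    def __init__(self, embedding_matrix, hidden=128):
        super().__init__()
        # Start from word2vec/GloVe-style distributed word vectors, kept fixed.
        self.embed = nn.Embedding.from_pretrained(embedding_matrix, freeze=True)
        self.lstm = nn.LSTM(embedding_matrix.size(1), hidden, batch_first=True)
        self.out = nn.Linear(hidden, 1)

    def forward(self, token_ids):                 # (B, T) integer token ids
        x = self.embed(token_ids)
        _, (h, _) = self.lstm(x)                  # final hidden state
        return torch.sigmoid(self.out(h[-1]))     # P(positive sentiment)
```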
Social Attention for Autonomous Decision-Making in Dense Traffic
Title | Social Attention for Autonomous Decision-Making in Dense Traffic |
Authors | Edouard Leurent, Jean Mercat |
Abstract | We study the design of learning architectures for behavioural planning in a dense traffic setting. Such architectures should deal with a varying number of nearby vehicles and be invariant to the ordering chosen to describe them, while staying accurate and compact. We observe that the two most popular representations in the literature do not fit these criteria and perform badly on a complex negotiation task. We propose an attention-based architecture that satisfies all these properties and explicitly accounts for the existing interactions between the traffic participants. We show that this architecture leads to significant performance gains and is able to capture interaction patterns that can be visualised and qualitatively interpreted. Videos and code are available at https://eleurent.github.io/social-attention/. |
Tasks | Decision Making |
Published | 2019-11-27 |
URL | https://arxiv.org/abs/1911.12250v1 |
https://arxiv.org/pdf/1911.12250v1.pdf | |
PWC | https://paperswithcode.com/paper/social-attention-for-autonomous-decision |
Repo | |
Framework | |
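A minimal sketch of an ego-centric attention layer in the spirit of the architecture above: one query computed from the ego vehicle, keys and values from all observed vehicles, so the output does not depend on the order in which nearby vehicles are listed and accepts a varying number of them. The single head and the feature/embedding sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class EgoAttention(nn.Module):
    def __init__(self, feat_dim=7, embed_dim=64):
        super().__init__()
        self.q = nn.Linear(feat_dim, embed_dim)
        self.k = nn.Linear(feat_dim, embed_dim)
        self.v = nn.Linear(feat_dim, embed_dim)
        self.scale = embed_dim ** 0.5

    def forward(self, ego, others):
        """ego: (B, feat_dim); others: (B, N, feat_dim), N may vary between batches."""
        q = self.q(ego).unsqueeze(1)                  # (B, 1, E) query from the ego vehicle
        vehicles = torch.cat([ego.unsqueeze(1), others], dim=1)
        k, v = self.k(vehicles), self.v(vehicles)     # (B, N+1, E)
        attn = torch.softmax(q @ k.transpose(1, 2) / self.scale, dim=-1)
        return (attn @ v).squeeze(1)                  # (B, E), permutation-invariant summary
```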
Top-down induction of decision trees: rigorous guarantees and inherent limitations
Title | Top-down induction of decision trees: rigorous guarantees and inherent limitations |
Authors | Guy Blanc, Jane Lange, Li-Yang Tan |
Abstract | Consider the following heuristic for building a decision tree for a function $f : \{0,1\}^n \to \{\pm 1\}$. Place the most influential variable $x_i$ of $f$ at the root, and recurse on the subfunctions $f_{x_i=0}$ and $f_{x_i=1}$ on the left and right subtrees respectively; terminate once the tree is an $\varepsilon$-approximation of $f$. We analyze the quality of this heuristic, obtaining near-matching upper and lower bounds: $\circ$ Upper bound: For every $f$ with decision tree size $s$ and every $\varepsilon \in (0,\frac{1}{2})$, this heuristic builds a decision tree of size at most $s^{O(\log(s/\varepsilon)\log(1/\varepsilon))}$. $\circ$ Lower bound: For every $\varepsilon \in (0,\frac{1}{2})$ and $s \le 2^{\tilde{O}(\sqrt{n})}$, there is an $f$ with decision tree size $s$ such that this heuristic builds a decision tree of size $s^{\tilde{\Omega}(\log s)}$. We also obtain upper and lower bounds for monotone functions: $s^{O(\sqrt{\log s}/\varepsilon)}$ and $s^{\tilde{\Omega}(\sqrt[4]{\log s})}$ respectively. The lower bound disproves conjectures of Fiat and Pechyony (2004) and Lee (2009). Our upper bounds yield new algorithms for properly learning decision trees under the uniform distribution. We show that these algorithms—which are motivated by widely employed and empirically successful top-down decision tree learning heuristics such as ID3, C4.5, and CART—achieve provable guarantees that compare favorably with those of the current fastest algorithm (Ehrenfeucht and Haussler, 1989). Our lower bounds shed new light on the limitations of these heuristics. Finally, we revisit the classic work of Ehrenfeucht and Haussler. We extend it to give the first uniform-distribution proper learning algorithm that achieves polynomial sample and memory complexity, while matching its state-of-the-art quasipolynomial runtime. |
Tasks | |
Published | 2019-11-18 |
URL | https://arxiv.org/abs/1911.07375v1 |
https://arxiv.org/pdf/1911.07375v1.pdf | |
PWC | https://paperswithcode.com/paper/top-down-induction-of-decision-trees-rigorous |
Repo | |
Framework | |
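A toy Python sketch of the top-down heuristic analysed above: estimate each free variable's influence by sampling, split on the most influential one, and recurse until the current subfunction is nearly constant. The sampling-based influence estimate, the stopping rule and the depth cap are practical shortcuts; the paper works with $\{\pm 1\}$-valued functions and exact quantities.

```python
import random

def influence(f, n, i, fixed, samples=2000):
    """Estimated influence of x_i on f restricted by `fixed` (dict index -> bit)."""
    flips = 0
    for _ in range(samples):
        x = [fixed.get(j, random.randint(0, 1)) for j in range(n)]
        y = list(x)
        y[i] ^= 1
        flips += f(x) != f(y)
    return flips / samples

def bias(f, n, fixed, samples=2000):
    """Estimated fraction of inputs consistent with `fixed` on which f is 1."""
    return sum(f([fixed.get(j, random.randint(0, 1)) for j in range(n)])
               for _ in range(samples)) / samples

def build_tree(f, n, fixed=None, eps=0.05, depth=0, max_depth=10):
    fixed = fixed or {}
    p = bias(f, n, fixed)
    if p <= eps or p >= 1 - eps or depth == max_depth:
        return int(p >= 0.5)                          # leaf: majority value
    free = [i for i in range(n) if i not in fixed]
    i = max(free, key=lambda j: influence(f, n, j, fixed))
    return (i,                                        # internal node splits on x_i
            build_tree(f, n, {**fixed, i: 0}, eps, depth + 1, max_depth),
            build_tree(f, n, {**fixed, i: 1}, eps, depth + 1, max_depth))

# Example: majority of three bits.
maj3 = lambda x: int(x[0] + x[1] + x[2] >= 2)
print(build_tree(maj3, 3))
```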
BMF: Block matrix approach to factorization of large scale data
Title | BMF: Block matrix approach to factorization of large scale data |
Authors | Prasad G Bhavana, Vineet C Nair |
Abstract | Matrix Factorization (MF) on large scale matrices is a computationally as well as memory intensive task. Alternative convergence techniques are needed when the size of the input matrix exceeds the available memory on a Central Processing Unit (CPU) or Graphics Processing Unit (GPU). While alternating least squares (ALS) convergence on a CPU could take forever, loading all the required matrices onto GPU memory may not be possible when the dimensions are very large. Hence we introduce a novel technique that treats the entire data as a block matrix and relies on factorization at the block level. |
Tasks | |
Published | 2019-01-02 |
URL | http://arxiv.org/abs/1901.00444v2 |
http://arxiv.org/pdf/1901.00444v2.pdf | |
PWC | https://paperswithcode.com/paper/bmf-block-matrix-approach-to-factorization-of |
Repo | |
Framework | |
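An illustrative sketch of block-wise factorization in the spirit of the abstract above: the large matrix is visited one block at a time, so only the current block and the corresponding slices of the factors need to be resident in (GPU) memory at once. The SGD-style update rule and the block size are assumptions, not the paper's exact procedure.

```python
import numpy as np

def block_mf(X, rank=10, block=1024, epochs=20, lr=0.01, reg=0.02, seed=0):
    """Factorize X ~ U @ V.T, touching only one block of X at a time."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    U = 0.1 * rng.standard_normal((m, rank))
    V = 0.1 * rng.standard_normal((n, rank))
    for _ in range(epochs):
        for r0 in range(0, m, block):
            for c0 in range(0, n, block):
                r1, c1 = min(r0 + block, m), min(c0 + block, n)
                Xb = X[r0:r1, c0:c1]           # only this block needs to be in memory
                Ub, Vb = U[r0:r1], V[c0:c1]
                E = Xb - Ub @ Vb.T             # residual on the block
                dU = lr * (E @ Vb - reg * Ub)  # gradient-style updates with L2 regularisation
                dV = lr * (E.T @ Ub - reg * Vb)
                U[r0:r1] += dU
                V[c0:c1] += dV
    return U, V
```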
Unpaired Image Captioning via Scene Graph Alignments
Title | Unpaired Image Captioning via Scene Graph Alignments |
Authors | Jiuxiang Gu, Shafiq Joty, Jianfei Cai, Handong Zhao, Xu Yang, Gang Wang |
Abstract | Most current image captioning models rely heavily on paired image-caption datasets. However, getting large scale image-caption paired data is labor-intensive and time-consuming. In this paper, we present a scene graph-based approach for unpaired image captioning. Our framework comprises an image scene graph generator, a sentence scene graph generator, a scene graph encoder, and a sentence decoder. Specifically, we first train the scene graph encoder and the sentence decoder on the text modality. To align the scene graphs between images and sentences, we propose an unsupervised feature alignment method that maps the scene graph features from the image to the sentence modality. Experimental results show that our proposed model can generate quite promising results without using any image-caption training pairs, outperforming existing methods by a wide margin. |
Tasks | Image Captioning |
Published | 2019-03-26 |
URL | https://arxiv.org/abs/1903.10658v4 |
https://arxiv.org/pdf/1903.10658v4.pdf | |
PWC | https://paperswithcode.com/paper/unpaired-image-captioning-via-scene-graph |
Repo | |
Framework | |
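A very rough sketch of one way to realise the unsupervised feature alignment mentioned above: a small mapping network pushes image scene-graph features towards the sentence scene-graph feature space, trained adversarially against a discriminator that tries to tell mapped image features from real sentence features. The adversarial formulation, network sizes and `feat_dim` are assumptions about the alignment step, not the paper's exact method.

```python
import torch
import torch.nn as nn

feat_dim = 512                                    # placeholder scene-graph feature size
mapper = nn.Sequential(nn.Linear(feat_dim, feat_dim), nn.ReLU(),
                       nn.Linear(feat_dim, feat_dim))
disc = nn.Sequential(nn.Linear(feat_dim, 128), nn.ReLU(), nn.Linear(128, 1))
opt_m = torch.optim.Adam(mapper.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(disc.parameters(), lr=1e-4)
bce = nn.BCEWithLogitsLoss()

def alignment_step(img_feats, sent_feats):
    """img_feats, sent_feats: (B, feat_dim) scene-graph features from each modality."""
    # 1) Discriminator: real sentence features vs. mapped image features.
    mapped = mapper(img_feats).detach()
    d_loss = (bce(disc(sent_feats), torch.ones(len(sent_feats), 1)) +
              bce(disc(mapped), torch.zeros(len(mapped), 1)))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    # 2) Mapper: update so that mapped image features look like sentence features.
    m_loss = bce(disc(mapper(img_feats)), torch.ones(len(img_feats), 1))
    opt_m.zero_grad(); m_loss.backward(); opt_m.step()
    return d_loss.item(), m_loss.item()
```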
A Weighted Multi-Criteria Decision Making Approach for Image Captioning
Title | A Weighted Multi-Criteria Decision Making Approach for Image Captioning |
Authors | Hassan Maleki Galandouz, Mohsen Ebrahimi Moghaddam, Mehrnoush Shamsfard |
Abstract | Image captioning aims at automatically generating descriptions of an image in natural language. This is a challenging problem in artificial intelligence that has recently received significant attention in computer vision and natural language processing. Among the existing approaches, visual retrieval based methods have proven to be highly effective: they search for similar images and then build a caption for the query image based on the captions of the retrieved images. In this study, we present a method for visual retrieval based image captioning, in which we use a multi-criteria decision making algorithm to effectively combine several criteria with proportional impact weights to retrieve the most relevant caption for the query image. The main idea of the proposed approach is to design a mechanism that retrieves captions that are more semantically relevant to the query image and then selects the most appropriate one, imitating human judgement, via a weighted multi-criteria decision making algorithm. Experiments conducted on the MS COCO benchmark dataset show that the proposed method provides much more effective results compared to state-of-the-art models by using criteria with proportional impact weights. |
Tasks | Decision Making, Image Captioning |
Published | 2019-03-17 |
URL | http://arxiv.org/abs/1904.00766v1 |
http://arxiv.org/pdf/1904.00766v1.pdf | |
PWC | https://paperswithcode.com/paper/a-weighted-multi-criteria-decision-making |
Repo | |
Framework | |
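A toy sketch of the weighted multi-criteria selection step described above: each candidate caption (taken from the captions of visually similar retrieved images) receives a score per criterion, and a weighted sum with proportional impact weights picks the final caption. The criterion names and weights below are placeholders, not the paper's.

```python
def select_caption(candidates, weights):
    """
    candidates : list of dicts like
                 {"caption": str, "scores": {"visual_sim": 0.8, "semantic_sim": 0.6}}
    weights    : dict criterion_name -> proportional impact weight
    """
    def weighted_score(c):
        return sum(w * c["scores"].get(name, 0.0) for name, w in weights.items())
    return max(candidates, key=weighted_score)["caption"]

# Example usage with made-up criteria and weights.
caption = select_caption(
    [{"caption": "a dog runs on the beach",
      "scores": {"visual_sim": 0.8, "semantic_sim": 0.6}},
     {"caption": "a man surfing a wave",
      "scores": {"visual_sim": 0.5, "semantic_sim": 0.9}}],
    weights={"visual_sim": 0.6, "semantic_sim": 0.4})
print(caption)
```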
LIMIT-BERT: Linguistic Informed Multi-Task BERT
Title | LIMIT-BERT: Linguistic Informed Multi-Task BERT |
Authors | Junru Zhou, Zhuosheng Zhang, Hai Zhao |
Abstract | In this paper, we present Linguistic Informed Multi-Task BERT (LIMIT-BERT), which learns language representations across multiple linguistic tasks via Multi-Task Learning (MTL). LIMIT-BERT covers five key linguistic syntax and semantics tasks: Part-Of-Speech (POS) tagging, constituent and dependency syntactic parsing, and span and dependency semantic role labeling (SRL). In addition, LIMIT-BERT adopts a linguistics-informed masking strategy, Syntactic and Semantic Phrase Masking, which masks all of the tokens corresponding to a syntactic or semantic phrase. Different from recent Multi-Task Deep Neural Networks (MT-DNN) (Liu et al., 2019), LIMIT-BERT is linguistically motivated and trained in a semi-supervised manner, which provides large amounts of linguistic-task data on the same scale as the BERT training corpus. As a result, LIMIT-BERT not only improves performance on linguistic tasks but also benefits from a regularization effect and from linguistic information that leads to more general representations, helping it adapt to new tasks and domains. LIMIT-BERT obtains new state-of-the-art or competitive results on both span and dependency semantic parsing on PropBank benchmarks and on both dependency and constituent syntactic parsing on the Penn Treebank. |
Tasks | Multi-Task Learning, Semantic Parsing, Semantic Role Labeling |
Published | 2019-10-31 |
URL | https://arxiv.org/abs/1910.14296v1 |
https://arxiv.org/pdf/1910.14296v1.pdf | |
PWC | https://paperswithcode.com/paper/limit-bert-linguistic-informed-multi-task |
Repo | |
Framework | |
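A rough sketch of the multi-task setup described above: a shared BERT-style encoder feeds several task-specific heads and the per-task losses are summed. Only two placeholder token-level heads are shown, standing in for the five linguistic tasks; the phrase-masking objective and the paper's exact heads are not reproduced.

```python
import torch
import torch.nn as nn

class MultiTaskHeads(nn.Module):
    def __init__(self, encoder, hidden_dim, n_pos_tags, n_srl_labels):
        super().__init__()
        # Shared BERT-style encoder, assumed to return (B, T, hidden_dim) states.
        self.encoder = encoder
        self.pos_head = nn.Linear(hidden_dim, n_pos_tags)
        self.srl_head = nn.Linear(hidden_dim, n_srl_labels)

    def forward(self, token_ids, pos_labels=None, srl_labels=None):
        h = self.encoder(token_ids)                  # (B, T, hidden_dim)
        ce = nn.functional.cross_entropy
        losses = {}
        if pos_labels is not None:                   # token-level POS tagging loss
            losses["pos"] = ce(self.pos_head(h).transpose(1, 2), pos_labels)
        if srl_labels is not None:                   # token-level SRL tagging loss
            losses["srl"] = ce(self.srl_head(h).transpose(1, 2), srl_labels)
        return sum(losses.values()), losses
```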
Zero-shifting Technique for Deep Neural Network Training on Resistive Cross-point Arrays
Title | Zero-shifting Technique for Deep Neural Network Training on Resistive Cross-point Arrays |
Authors | Hyungjun Kim, Malte Rasch, Tayfun Gokmen, Takashi Ando, Hiroyuki Miyazoe, Jae-Joon Kim, John Rozen, Seyoung Kim |
Abstract | A resistive memory device-based computing architecture is one of the promising platforms for energy-efficient Deep Neural Network (DNN) training accelerators. The key technical challenge in realizing such accelerators is to accumulate the gradient information without bias. Unlike digital numbers in software, which can be assigned and accessed with the desired accuracy, numbers stored in resistive memory devices can only be manipulated following the physics of the device, which can significantly limit training performance. Therefore, additional techniques and algorithm-level remedies are required to achieve the best possible performance in resistive memory device-based accelerators. In this paper, we analyze asymmetric conductance modulation characteristics in RRAM using a soft-bound synapse model and present an in-depth analysis of the relationship between device characteristics and DNN model accuracy, using a 3-layer DNN trained on the MNIST dataset. We show that the imbalance between up and down updates leads to poor network performance. We introduce the concept of a symmetry point and propose a zero-shifting technique which compensates for this imbalance by programming the reference device and changing the zero value point of the weight. Using this zero-shifting method, we show that network performance improves dramatically for imbalanced synapse devices. |
Tasks | |
Published | 2019-07-24 |
URL | https://arxiv.org/abs/1907.10228v2 |
https://arxiv.org/pdf/1907.10228v2.pdf | |
PWC | https://paperswithcode.com/paper/zero-shifting-technique-for-deep-neural |
Repo | |
Framework | |
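An illustrative sketch of the symmetry-point and zero-shifting idea: under an asymmetric soft-bound update, the conductance to which alternating up/down pulses converge is (approximately) the symmetry point, and programming the reference device to that value makes the effective weight w = g - g_ref read as zero exactly where updates balance out. The soft-bound update model and its parameters are assumptions for illustration.

```python
def soft_bound_update(g, direction, g_max=1.0, g_min=-1.0, step=0.05):
    """Asymmetric device update: the step shrinks as g approaches its bound."""
    if direction > 0:
        return g + step * (g_max - g)
    return g - step * (g - g_min)

def symmetry_point(iters=2000, **kwargs):
    """Conductance reached by alternating up/down pulses (~ the symmetry point)."""
    g = 0.3
    for _ in range(iters):
        g = soft_bound_update(soft_bound_update(g, +1, **kwargs), -1, **kwargs)
    return g

# Zero-shifting: programme the reference device to the symmetry point, so the
# effective weight w = g - g_ref is zero where up and down updates cancel out.
g_ref = symmetry_point()
g = 0.3                      # conductance of the trainable device
w = g - g_ref                # weight actually seen by the network
print(g_ref, w)
```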
Improving Image Captioning by Leveraging Knowledge Graphs
Title | Improving Image Captioning by Leveraging Knowledge Graphs |
Authors | Yimin Zhou, Yiwei Sun, Vasant Honavar |
Abstract | We explore the use of knowledge graphs, which capture general or commonsense knowledge, to augment the information extracted from images by state-of-the-art methods for image captioning. The results of our experiments on several benchmark data sets, such as MS COCO, as measured by CIDEr-D, a performance metric for image captioning, show that variants of the state-of-the-art methods for image captioning that make use of information extracted from knowledge graphs can substantially outperform those that rely solely on the information extracted from images. |
Tasks | Image Captioning, Knowledge Graphs |
Published | 2019-01-25 |
URL | http://arxiv.org/abs/1901.08942v1 |
http://arxiv.org/pdf/1901.08942v1.pdf | |
PWC | https://paperswithcode.com/paper/improving-image-captioning-by-leveraging |
Repo | |
Framework | |
Enriching Rare Word Representations in Neural Language Models by Embedding Matrix Augmentation
Title | Enriching Rare Word Representations in Neural Language Models by Embedding Matrix Augmentation |
Authors | Yerbolat Khassanov, Zhiping Zeng, Van Tung Pham, Haihua Xu, Eng Siong Chng |
Abstract | Neural language models (NLMs) achieve strong generalization capability by learning dense representations of words and using them to estimate probability distributions. However, learning representations of rare words is a challenging problem that causes the NLM to produce unreliable probability estimates. To address this problem, we propose a method to enrich the representations of rare words in a pre-trained NLM and consequently improve its probability estimation performance. The proposed method augments the word embedding matrices of the pre-trained NLM while keeping all other parameters unchanged. Specifically, our method updates the embedding vectors of rare words using the embedding vectors of other semantically and syntactically similar words. To evaluate the proposed method, we enrich the rare street names in a pre-trained NLM and use it to rescore the 100-best hypotheses output by a Singapore English speech recognition system. The enriched NLM reduces the word error rate by 6% relative and improves the recognition accuracy of the rare words by 16% absolute compared to the baseline NLM. |
Tasks | Speech Recognition |
Published | 2019-04-08 |
URL | https://arxiv.org/abs/1904.03799v2 |
https://arxiv.org/pdf/1904.03799v2.pdf | |
PWC | https://paperswithcode.com/paper/enriching-rare-word-representations-in-neural |
Repo | |
Framework | |
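A small sketch of the core idea above: leave the pre-trained NLM untouched except for its embedding matrix, and move each rare word's vector towards the mean vector of semantically/syntactically similar in-vocabulary words. How the similar-word lists are obtained and the interpolation weight `alpha` are assumptions.

```python
import numpy as np

def enrich_rare_embeddings(emb, word2idx, similar_words, alpha=0.7):
    """
    emb           : (V, d) word embedding matrix of the pre-trained NLM
    word2idx      : vocabulary mapping word -> row index
    similar_words : dict rare_word -> list of similar in-vocabulary words
    """
    emb = emb.copy()
    for rare, neighbours in similar_words.items():
        idxs = [word2idx[w] for w in neighbours if w in word2idx]
        if rare not in word2idx or not idxs:
            continue
        mean_vec = emb[idxs].mean(axis=0)
        i = word2idx[rare]
        # Move the rare word towards its neighbours, keeping part of its old vector.
        emb[i] = alpha * mean_vec + (1 - alpha) * emb[i]
    return emb
```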
Error-Correcting Neural Sequence Prediction
Title | Error-Correcting Neural Sequence Prediction |
Authors | James O’ Neill, Danushka Bollegala |
Abstract | We propose a novel neural sequence prediction method based on \textit{error-correcting output codes} (ECOC) that avoids exact softmax normalization and allows for a tradeoff between speed and performance. Instead of minimizing divergence measures between the predicted probability distribution and the true distribution, we use error-correcting codes to represent both predictions and outputs. Secondly, we propose multiple ways to improve accuracy and convergence rates by maximizing the separability between codes that correspond to classes, in proportion to word embedding similarities. Lastly, we introduce our main contribution, \textit{Latent Variable Mixture Sampling}, a technique used to mitigate exposure bias that can be integrated into the training of latent variable-based neural sequence predictors such as ECOC. This involves mixing the latent codes of past predictions and past targets in one of two ways: (1) according to a predefined sampling schedule, or (2) via a differentiable sampling procedure whereby the mixing probability is learned throughout training by replacing the greedy argmax operation with a smooth approximation. ECOC-NSP leads to consistent improvements on language modelling datasets, and the proposed Latent Variable Mixture Sampling methods perform well on text generation tasks such as image captioning. |
Tasks | Image Captioning, Language Modelling, Text Generation |
Published | 2019-01-21 |
URL | https://arxiv.org/abs/1901.07002v2 |
https://arxiv.org/pdf/1901.07002v2.pdf | |
PWC | https://paperswithcode.com/paper/error-correcting-neural-sequence-prediction |
Repo | |
Framework | |
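A hedged sketch of the error-correcting-output-code output layer described above: each vocabulary item gets a binary codeword, the network predicts the bits with independent sigmoids instead of a full softmax, and decoding picks the item whose codeword is nearest in Hamming distance. Random codewords are used here for brevity; the paper constructs and orders codes using word embedding similarities, and the mixture-sampling part is not shown.

```python
import torch
import torch.nn as nn

class ECOCOutput(nn.Module):
    def __init__(self, vocab_size, hidden_dim, code_bits=64, seed=0):
        super().__init__()
        g = torch.Generator().manual_seed(seed)
        # One fixed binary codeword per vocabulary item (random here).
        codes = torch.randint(0, 2, (vocab_size, code_bits), generator=g).float()
        self.register_buffer("codes", codes)
        self.bit_logits = nn.Linear(hidden_dim, code_bits)

    def loss(self, h, targets):
        """h: (B, hidden_dim) decoder states; targets: (B,) target word ids."""
        return nn.functional.binary_cross_entropy_with_logits(
            self.bit_logits(h), self.codes[targets])

    def decode(self, h):
        """Return the id of the nearest codeword (in Hamming distance) per state."""
        bits = (torch.sigmoid(self.bit_logits(h)) > 0.5).float()          # (B, code_bits)
        hamming = (bits.unsqueeze(1) != self.codes.unsqueeze(0)).sum(-1)  # (B, V)
        return hamming.argmin(dim=1)
```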