July 28, 2019

Paper Group ANR 444

Finding Efficient Swimming Strategies in a Three Dimensional Chaotic Flow by Reinforcement Learning. End-to-end Network for Twitter Geolocation Prediction and Hashing. A Fully-Automated Pipeline for Detection and Segmentation of Liver Lesions and Pathological Lymph Nodes. Fuzzy Based Implicit Sentiment Analysis on Quantitative Sentences. A Novel Fr …

Finding Efficient Swimming Strategies in a Three Dimensional Chaotic Flow by Reinforcement Learning


Title	Finding Efficient Swimming Strategies in a Three Dimensional Chaotic Flow by Reinforcement Learning
Authors	K. Gustavsson, L. Biferale, A. Celani, S. Colabrese
Abstract	We apply a reinforcement learning algorithm to show how smart particles can learn approximately optimal strategies to navigate in complex flows. In this paper we consider microswimmers in a paradigmatic three-dimensional case given by a stationary superposition of two Arnold-Beltrami-Childress flows with chaotic advection along streamlines. In such a flow, we study the evolution of point-like particles which can decide in which direction to swim, while keeping the velocity amplitude constant. We show that it is sufficient to endow the swimmers with a very restricted set of actions (six fixed swimming directions in our case) to have enough freedom to find efficient strategies to move upward and escape local fluid traps. The key ingredient is the learning-from-experience structure of the algorithm, which assigns positive or negative rewards depending on whether the taken action is, or is not, profitable for the predetermined goal in the long term horizon. This is another example supporting the efficiency of the reinforcement learning approach to learn how to accomplish difficult tasks in complex fluid environments.
Tasks
Published	2017-11-15
URL	http://arxiv.org/abs/1711.05826v2
PDF	http://arxiv.org/pdf/1711.05826v2.pdf
PWC	https://paperswithcode.com/paper/finding-efficient-swimming-strategies-in-a
Repo
Framework
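
The setting in the abstract can be sketched as follows. This is a minimal illustration, not the authors' code: it uses a single Arnold-Beltrami-Childress flow (the paper uses a superposition of two), and the coefficients, swimming speed, time step, and reward definition (upward displacement) are assumptions made for the example.

```python
import math

# Six fixed swimming directions (+/-x, +/-y, +/-z), matching the restricted
# action set described in the abstract.
DIRECTIONS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def abc_velocity(pos, a=1.0, b=1.0, c=1.0):
    """Velocity of a single ABC flow at pos; a, b, c are assumed values."""
    x, y, z = pos
    return (
        a * math.sin(z) + c * math.cos(y),
        b * math.sin(x) + a * math.cos(z),
        c * math.sin(y) + b * math.cos(x),
    )


def swim_step(pos, action, swim_speed=0.5, dt=0.01):
    """Advance the swimmer by one explicit Euler step.

    Returns (new_pos, reward). The reward is an assumption for this sketch:
    the upward (z) displacement gained during the step.
    """
    flow = abc_velocity(pos)
    direction = DIRECTIONS[action]
    new_pos = tuple(
        p + dt * (u + swim_speed * d) for p, u, d in zip(pos, flow, direction)
    )
    reward = new_pos[2] - pos[2]
    return new_pos, reward
```

A learning agent would call `swim_step` repeatedly, choosing `action` from the six directions and using the returned reward to update its policy.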

End-to-end Network for Twitter Geolocation Prediction and Hashing


Title	End-to-end Network for Twitter Geolocation Prediction and Hashing
Authors	Jey Han Lau, Lianhua Chi, Khoi-Nguyen Tran, Trevor Cohn
Abstract	We propose an end-to-end neural network to predict the geolocation of a tweet. The network takes as input several pieces of raw Twitter metadata, such as the tweet message and associated user account information. Our model is language independent, and despite minimal feature engineering, it is interpretable and capable of learning location indicative words and timing patterns. Our model outperforms state-of-the-art systems by 2%-6%. Additionally, we propose extensions to the model to compress the representation learnt by the network into binary codes. Experiments show that it produces compact codes compared to benchmark hashing algorithms. An implementation of the model is released publicly.
Tasks	Feature Engineering
Published	2017-10-13
URL	http://arxiv.org/abs/1710.04802v1
PDF	http://arxiv.org/pdf/1710.04802v1.pdf
PWC	https://paperswithcode.com/paper/end-to-end-network-for-twitter-geolocation
Repo
Framework

A Fully-Automated Pipeline for Detection and Segmentation of Liver Lesions and Pathological Lymph Nodes


Title	A Fully-Automated Pipeline for Detection and Segmentation of Liver Lesions and Pathological Lymph Nodes
Authors	Assaf Hoogi, John W. Lambert, Yefeng Zheng, Dorin Comaniciu, Daniel L. Rubin
Abstract	We propose a fully-automated method for accurate and robust detection and segmentation of potentially cancerous lesions found in the liver and in lymph nodes. The process is performed in three steps, including organ detection, lesion detection and lesion segmentation. Our method applies machine learning techniques such as marginal space learning and convolutional neural networks, as well as active contour models. The method proves to be robust in its handling of extremely high lesion diversity. We tested our method on volumetric computed tomography (CT) images, including 42 volumes containing liver lesions and 86 volumes containing 595 pathological lymph nodes. Preliminary results under 10-fold cross validation show that for both the liver lesions and the lymph nodes, a total detection sensitivity of 0.53 and average Dice score of $0.71 \pm 0.15$ for segmentation were obtained.
Tasks	Computed Tomography (CT), Lesion Segmentation, Organ Detection
Published	2017-03-19
URL	http://arxiv.org/abs/1703.06418v1
PDF	http://arxiv.org/pdf/1703.06418v1.pdf
PWC	https://paperswithcode.com/paper/a-fully-automated-pipeline-for-detection-and
Repo
Framework
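
The two metrics reported in the abstract, detection sensitivity and Dice score, have standard definitions. The sketch below computes them on flat binary masks and on counts; the function names and the convention for two empty masks are assumptions made for illustration.

```python
def dice_score(pred_mask, true_mask):
    """Dice coefficient 2|A n B| / (|A| + |B|) for two flat binary masks."""
    pred = {i for i, v in enumerate(pred_mask) if v}
    true = {i for i, v in enumerate(true_mask) if v}
    if not pred and not true:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2 * len(pred & true) / (len(pred) + len(true))


def sensitivity(true_positives, false_negatives):
    """Fraction of real lesions that were detected: TP / (TP + FN)."""
    return true_positives / (true_positives + false_negatives)
```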

Fuzzy Based Implicit Sentiment Analysis on Quantitative Sentences


Title	Fuzzy Based Implicit Sentiment Analysis on Quantitative Sentences
Authors	Amir Hossein Yazdavar, Monireh Ebrahimi, Naomie Salim
Abstract	With the rapid growth of social media on the web, emotional polarity computation has become a flourishing frontier in the text mining community. However, it is challenging to understand the latest trends and summarize the state or general opinions about products because of the great diversity and size of social media data, and this creates the need for automated, real-time opinion extraction and mining. On the other hand, the bulk of current research has been devoted to studying subjective sentences that contain opinion keywords, and limited work has been reported on objective statements that imply sentiment. In this paper, a fuzzy based knowledge engineering model has been developed for sentiment classification of a special group of such sentences, namely those describing a change or deviation from a desired range or value. Drug reviews are a rich source of such statements. Therefore, in this research, experiments were carried out on patients' reviews of several different cholesterol lowering drugs to determine their sentiment polarity. The main conclusion of this study is that, in order to increase the accuracy of existing drug opinion mining systems, objective sentences which imply opinion should be taken into account. Our experimental results demonstrate that our proposed model obtains an F1 value of over 72 percent.
Tasks	Opinion Mining, Sentiment Analysis
Published	2017-01-03
URL	http://arxiv.org/abs/1701.00798v1
PDF	http://arxiv.org/pdf/1701.00798v1.pdf
PWC	https://paperswithcode.com/paper/fuzzy-based-implicit-sentiment-analysis-on
Repo
Framework

A Novel Framework for Robustness Analysis of Visual QA Models


Title	A Novel Framework for Robustness Analysis of Visual QA Models
Authors	Jia-Hong Huang, Cuong Duc Dao, Modar Alfadly, Bernard Ghanem
Abstract	Deep neural networks have been playing an essential role in many computer vision tasks including Visual Question Answering (VQA). Until recently, the study of their accuracy was the main focus of research but now there is a trend toward assessing the robustness of these models against adversarial attacks by evaluating their tolerance to varying noise levels. In VQA, adversarial attacks can target the image and/or the proposed main question and yet there is a lack of proper analysis of the latter. In this work, we propose a flexible framework that focuses on the language part of VQA that uses semantically relevant questions, dubbed basic questions, acting as controllable noise to evaluate the robustness of VQA models. We hypothesize that the level of noise is positively correlated to the similarity of a basic question to the main question. Hence, to apply noise on any given main question, we rank a pool of basic questions based on their similarity by casting this ranking task as a LASSO optimization problem. Then, we propose a novel robustness measure, R_score, and two large-scale basic question datasets (BQDs) in order to standardize robustness analysis for VQA models.
Tasks	Question Answering, Visual Question Answering
Published	2017-11-16
URL	http://arxiv.org/abs/1711.06232v3
PDF	http://arxiv.org/pdf/1711.06232v3.pdf
PWC	https://paperswithcode.com/paper/a-novel-framework-for-robustness-analysis-of
Repo
Framework
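
The idea of ranking basic questions through a LASSO problem can be illustrated with a small sketch. It assumes each question is already embedded as a vector, solves min 0.5 * ||A x - b||^2 + lam * ||x||_1 by coordinate descent, and ranks candidates by coefficient magnitude. The embeddings, the solver, and the ranking rule are assumptions for illustration and are not taken from the paper.

```python
def soft_threshold(value, lam):
    """Shrink value toward zero by lam (the proximal step for the L1 penalty)."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def lasso_coordinate_descent(columns, target, lam, sweeps=100):
    """Solve the LASSO problem; columns[j] is the embedding of basic question j."""
    n = len(columns)
    x = [0.0] * n
    for _ in range(sweeps):
        for j in range(n):
            # Residual of the target with the contribution of column j removed.
            resid = [
                t - sum(x[k] * columns[k][i] for k in range(n) if k != j)
                for i, t in enumerate(target)
            ]
            rho = sum(c * r for c, r in zip(columns[j], resid))
            norm_sq = sum(c * c for c in columns[j])
            x[j] = soft_threshold(rho, lam) / norm_sq
    return x


def rank_basic_questions(columns, target, lam):
    """Indices of basic questions, largest absolute coefficient first."""
    weights = lasso_coordinate_descent(columns, target, lam)
    return sorted(range(len(weights)), key=lambda j: -abs(weights[j]))
```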

SEARNN: Training RNNs with Global-Local Losses


Title	SEARNN: Training RNNs with Global-Local Losses
Authors	Rémi Leblond, Jean-Baptiste Alayrac, Anton Osokin, Simon Lacoste-Julien
Abstract	We propose SEARNN, a novel training algorithm for recurrent neural networks (RNNs) inspired by the “learning to search” (L2S) approach to structured prediction. RNNs have been widely successful in structured prediction applications such as machine translation or parsing, and are commonly trained using maximum likelihood estimation (MLE). Unfortunately, this training loss is not always an appropriate surrogate for the test error: by only maximizing the ground truth probability, it fails to exploit the wealth of information offered by structured losses. Further, it introduces discrepancies between training and predicting (such as exposure bias) that may hurt test performance. Instead, SEARNN leverages test-alike search space exploration to introduce global-local losses that are closer to the test error. We first demonstrate improved performance over MLE on two different tasks: OCR and spelling correction. Then, we propose a subsampling strategy to enable SEARNN to scale to large vocabulary sizes. This allows us to validate the benefits of our approach on a machine translation task.
Tasks	Machine Translation, Optical Character Recognition, Spelling Correction, Structured Prediction
Published	2017-06-14
URL	http://arxiv.org/abs/1706.04499v3
PDF	http://arxiv.org/pdf/1706.04499v3.pdf
PWC	https://paperswithcode.com/paper/searnn-training-rnns-with-global-local-losses
Repo
Framework

Hierarchical Multi-scale Attention Networks for Action Recognition


Title	Hierarchical Multi-scale Attention Networks for Action Recognition
Authors	Shiyang Yan, Jeremy S. Smith, Wenjin Lu, Bailing Zhang
Abstract	Recurrent Neural Networks (RNNs) have been widely used in natural language processing and computer vision. Among them, the Hierarchical Multi-scale RNN (HM-RNN), a kind of multi-scale hierarchical RNN proposed recently, can learn the hierarchical temporal structure from data automatically. In this paper, we extend the work to solve the computer vision task of action recognition. However, in sequence-to-sequence models like RNNs, it is normally very hard to discover the relationships between inputs and outputs given static inputs. As a solution, an attention mechanism can be applied to extract the relevant information from the input, thus facilitating the modeling of input-output relationships. Based on these considerations, we propose a novel attention network, namely the Hierarchical Multi-scale Attention Network (HM-AN), by combining the HM-RNN and the attention mechanism, and apply it to action recognition. A newly proposed gradient estimation method for stochastic neurons, namely Gumbel-softmax, is exploited to implement the temporal boundary detectors and the stochastic hard attention mechanism. To ameliorate the negative effect of the sensitive temperature of the Gumbel-softmax, an adaptive temperature training method is applied to improve system performance. The experimental results demonstrate the improved effect of HM-AN over LSTM with attention on the vision task. Through visualization of what has been learnt by the networks, it can be observed that both the attention regions of images and the hierarchical temporal structure can be captured by HM-AN.
Tasks	Temporal Action Localization
Published	2017-08-25
URL	http://arxiv.org/abs/1708.07590v2
PDF	http://arxiv.org/pdf/1708.07590v2.pdf
PWC	https://paperswithcode.com/paper/hierarchical-multi-scale-attention-networks
Repo
Framework
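
The Gumbel-softmax relaxation mentioned in the abstract can be sketched in a few lines: Gumbel noise is added to the logits, and a softmax with temperature turns the result into a relaxed, near one-hot sample as the temperature decreases. This is a generic sketch of the sampling step only; it does not include the paper's adaptive temperature training, and the function name and clamping constant are assumptions.

```python
import math
import random


def gumbel_softmax(logits, temperature, rng):
    """Draw one relaxed categorical sample from logits.

    rng is a random.Random instance; temperature must be positive.
    """
    noisy = []
    for logit in logits:
        # Clamp the uniform draw away from 0 and 1 so both logs are defined.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))
        noisy.append((logit + gumbel) / temperature)
    # Numerically stable softmax over the perturbed, scaled logits.
    top = max(noisy)
    exps = [math.exp(v - top) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]
```

Lower temperatures push the output toward a one-hot vector, which is why the choice of temperature is sensitive in practice.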

Large-Scale YouTube-8M Video Understanding with Deep Neural Networks


Title	Large-Scale YouTube-8M Video Understanding with Deep Neural Networks
Authors	Manuk Akopyan, Eshsou Khashba
Abstract	The video classification problem has been studied for many years. The success of Convolutional Neural Networks (CNNs) in image recognition tasks gives researchers a powerful incentive to create more advanced video classification approaches. Because video has temporal content, Long Short-Term Memory (LSTM) networks become a handy tool for modeling long-term temporal cues. Both approaches need a large dataset of input data. In this paper, three models are provided to address video classification using the recently announced YouTube-8M large-scale dataset. The first model is based on a frame pooling approach. The two other models are based on LSTM networks. A Mixture of Experts intermediate layer is used in the third model, which increases model capacity without dramatically increasing computation. A set of experiments on handling imbalanced training data has been conducted.
Tasks	Video Classification, Video Understanding
Published	2017-06-14
URL	http://arxiv.org/abs/1706.04488v1
PDF	http://arxiv.org/pdf/1706.04488v1.pdf
PWC	https://paperswithcode.com/paper/large-scale-youtube-8m-video-understanding
Repo
Framework
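
Two of the building blocks named in the abstract, frame pooling and a Mixture of Experts layer, can be sketched generically. The sketch assumes mean pooling over per-frame feature vectors and a softmax gate over sigmoid experts for a single label; the paper's actual pooling and gating details are not given in the abstract, so these are illustrative choices.

```python
import math


def mean_pool(frames):
    """Average per-frame feature vectors into one video-level vector."""
    count = len(frames)
    return [sum(column) / count for column in zip(*frames)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softmax(values):
    top = max(values)
    exps = [math.exp(v - top) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def mixture_of_experts(gate_logits, expert_logits):
    """Probability for one label: gate-weighted sum of expert probabilities."""
    gates = softmax(gate_logits)
    return sum(g * sigmoid(e) for g, e in zip(gates, expert_logits))
```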

Curriculum Q-Learning for Visual Vocabulary Acquisition


Title	Curriculum Q-Learning for Visual Vocabulary Acquisition
Authors	Ahmed H. Zaidi, Russell Moore, Ted Briscoe
Abstract	The structure of curriculum plays a vital role in our learning process, both as children and adults. Presenting material in ascending order of difficulty that also exploits prior knowledge can have a significant impact on the rate of learning. However, the notion of difficulty and prior knowledge differs from person to person. Motivated by the need for a personalised curriculum, we present a novel method of curriculum learning for vocabulary words in the form of visual prompts. We employ a reinforcement learning model grounded in pedagogical theories that emulates the actions of a tutor. We simulate three students with different levels of vocabulary knowledge in order to evaluate how well our model adapts to the environment. The results of the simulation reveal that through interaction, the model is able to identify the areas of weakness, as well as push students to the edge of their zone of proximal development (ZPD). We hypothesise that these methods can also be effective in training agents to learn language representations in a simulated environment where it has previously been shown that order of words and prior knowledge play an important role in the efficacy of language learning.
Tasks	Q-Learning
Published	2017-11-29
URL	http://arxiv.org/abs/1711.10837v1
PDF	http://arxiv.org/pdf/1711.10837v1.pdf
PWC	https://paperswithcode.com/paper/curriculum-q-learning-for-visual-vocabulary
Repo
Framework
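
The tutor model is described as Q-learning based. A minimal sketch of the standard tabular Q-learning update is shown below; the state and action encoding, the learning rate, and the discount factor are assumptions, since the abstract does not specify how the tutor's states and rewards are defined.

```python
def q_update(q_table, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """Apply one tabular Q-learning update in place and return the new value.

    q_table maps each state to a list of action values.
    """
    best_next = max(q_table[next_state])
    td_target = reward + gamma * best_next
    q_table[state][action] += alpha * (td_target - q_table[state][action])
    return q_table[state][action]
```

In a tutoring setting, an action could correspond to choosing the next word to present and the reward to the simulated student's response, but that mapping is hypothetical here.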

Manifold Constrained Low-Rank Decomposition


Title	Manifold Constrained Low-Rank Decomposition
Authors	Chen Chen, Baochang Zhang, Alessio Del Bue, Vittorio Murino
Abstract	Low-rank decomposition (LRD) is a state-of-the-art method for visual data reconstruction and modelling. However, it is a very challenging problem when the image data contains significant occlusion, noise, illumination variation, and misalignment from rotation or viewpoint changes. We leverage the specific structure of data in order to improve the performance of LRD when the data are not ideal. To this end, we propose a new framework that embeds manifold priors into LRD. To implement the framework, we design an alternating direction method of multipliers (ADMM) scheme which efficiently integrates the manifold constraints during the optimization process. The proposed approach is successfully used to calculate low-rank models from face images, hand-written digits and planar surface images. The results show a consistent increase in performance when compared to the state of the art over a wide range of realistic image misalignments and corruptions.
Tasks
Published	2017-08-06
URL	http://arxiv.org/abs/1708.01846v1
PDF	http://arxiv.org/pdf/1708.01846v1.pdf
PWC	https://paperswithcode.com/paper/manifold-constrained-low-rank-decomposition
Repo
Framework

Certified Defenses for Data Poisoning Attacks


Title	Certified Defenses for Data Poisoning Attacks
Authors	Jacob Steinhardt, Pang Wei Koh, Percy Liang
Abstract	Machine learning systems trained on user-provided data are susceptible to data poisoning attacks, whereby malicious users inject false training data with the aim of corrupting the learned model. While recent work has proposed a number of attacks and defenses, little is understood about the worst-case loss of a defense in the face of a determined attacker. We address this by constructing approximate upper bounds on the loss across a broad family of attacks, for defenders that first perform outlier removal followed by empirical risk minimization. Our approximation relies on two assumptions: (1) that the dataset is large enough for statistical concentration between train and test error to hold, and (2) that outliers within the clean (non-poisoned) data do not have a strong effect on the model. Our bound comes paired with a candidate attack that often nearly matches the upper bound, giving us a powerful tool for quickly assessing defenses on a given dataset. Empirically, we find that even under a simple defense, the MNIST-1-7 and Dogfish datasets are resilient to attack, while in contrast the IMDB sentiment dataset can be driven from 12% to 23% test error by adding only 3% poisoned data.
Tasks	data poisoning
Published	2017-06-09
URL	http://arxiv.org/abs/1706.03691v2
PDF	http://arxiv.org/pdf/1706.03691v2.pdf
PWC	https://paperswithcode.com/paper/certified-defenses-for-data-poisoning-attacks
Repo
Framework
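
The defender model in the abstract, outlier removal followed by empirical risk minimization, can be illustrated with a simple filter. The sketch below drops training points that lie farther than a fixed radius from their class centroid; the centroid rule, the radius, and the function names are assumptions for illustration, and the surviving points would then be passed to whatever training procedure is used.

```python
import math


def class_centroids(points, labels):
    """Mean feature vector of each class."""
    grouped = {}
    for point, label in zip(points, labels):
        grouped.setdefault(label, []).append(point)
    return {
        label: [sum(column) / len(members) for column in zip(*members)]
        for label, members in grouped.items()
    }


def sphere_filter(points, labels, radius):
    """Keep (point, label) pairs within radius of their class centroid."""
    centroids = class_centroids(points, labels)
    return [
        (point, label)
        for point, label in zip(points, labels)
        if math.dist(point, centroids[label]) <= radius
    ]
```

Note that the centroids here are computed from the possibly poisoned data itself, which is one reason a worst-case analysis of such defenses is needed.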

BiSeg: Simultaneous Instance Segmentation and Semantic Segmentation with Fully Convolutional Networks


Title	BiSeg: Simultaneous Instance Segmentation and Semantic Segmentation with Fully Convolutional Networks
Authors	Viet-Quoc Pham, Satoshi Ito, Tatsuo Kozakaya
Abstract	We present a simple and effective framework for simultaneous semantic segmentation and instance segmentation with Fully Convolutional Networks (FCNs). The method, called BiSeg, predicts instance segmentation as a posterior in Bayesian inference, where semantic segmentation is used as a prior. We extend the idea of position-sensitive score maps used in recent methods to a fusion of multiple score maps at different scales and partition modes, and adopt it as a robust likelihood for instance segmentation inference. As both Bayesian inference and map fusion are performed per pixel, BiSeg is a fully convolutional end-to-end solution that inherits all the advantages of FCNs. We demonstrate state-of-the-art instance segmentation accuracy on PASCAL VOC.
Tasks	Bayesian Inference, Instance Segmentation, Semantic Segmentation
Published	2017-06-07
URL	http://arxiv.org/abs/1706.02135v2
PDF	http://arxiv.org/pdf/1706.02135v2.pdf
PWC	https://paperswithcode.com/paper/biseg-simultaneous-instance-segmentation-and
Repo
Framework
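
The per-pixel Bayesian step described in the abstract follows the usual rule that the posterior is proportional to prior times likelihood. The sketch below applies that rule to one pixel over a small set of hypotheses; how BiSeg actually builds its prior from semantic segmentation and its likelihood from fused score maps is not specified here, so the inputs are placeholders.

```python
def pixel_posterior(prior, likelihood):
    """Normalize prior[k] * likelihood[k] over hypotheses k for one pixel."""
    unnormalized = [p * l for p, l in zip(prior, likelihood)]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("prior and likelihood have no common support")
    return [u / total for u in unnormalized]
```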

The Sup-norm Perturbation of HOSVD and Low Rank Tensor Denoising


Title	The Sup-norm Perturbation of HOSVD and Low Rank Tensor Denoising
Authors	Dong Xia, Fan Zhou
Abstract	The higher order singular value decomposition (HOSVD) of tensors is a generalization of matrix SVD. The perturbation analysis of HOSVD under random noise is more delicate than its matrix counterpart. Recently, polynomial time algorithms have been proposed where statistically optimal estimates of the singular subspaces and the low rank tensors are attainable in the Euclidean norm. In this article, we analyze the sup-norm perturbation bounds of HOSVD and introduce estimators of the singular subspaces with sharp deviation bounds in the sup-norm. We also investigate a low rank tensor denoising estimator and demonstrate its fast convergence rate with respect to the entry-wise errors. The sup-norm perturbation bounds reveal unconventional phase transitions for statistical learning applications such as the exact clustering in high dimensional Gaussian mixture model and the exact support recovery in sub-tensor localizations. In addition, the bounds established for HOSVD also elaborate the one-sided sup-norm perturbation bounds for the singular subspaces of unbalanced (or fat) matrices.
Tasks	Denoising
Published	2017-07-05
URL	http://arxiv.org/abs/1707.01207v5
PDF	http://arxiv.org/pdf/1707.01207v5.pdf
PWC	https://paperswithcode.com/paper/the-sup-norm-perturbation-of-hosvd-and-low
Repo
Framework

Deep Network Guided Proof Search


Title	Deep Network Guided Proof Search
Authors	Sarah Loos, Geoffrey Irving, Christian Szegedy, Cezary Kaliszyk
Abstract	Deep learning techniques lie at the heart of several significant AI advances in recent years including object recognition and detection, image captioning, machine translation, speech recognition and synthesis, and playing the game of Go. Automated first-order theorem provers can aid in the formalization and verification of mathematical theorems and play a crucial role in program analysis, theory reasoning, security, interpolation, and system verification. Here we suggest deep learning based guidance in the proof search of the theorem prover E. We train and compare several deep neural network models on the traces of existing ATP proofs of Mizar statements and use them to select processed clauses during proof search. We give experimental evidence that with a hybrid, two-phase approach, deep learning based guidance can significantly reduce the average number of proof search steps while increasing the number of theorems proved. Using a few proof guidance strategies that leverage deep neural networks, we have found first-order proofs of 7.36% of the first-order logic translations of the Mizar Mathematical Library theorems that did not previously have ATP generated proofs. This increases the ratio of statements in the corpus with ATP generated proofs from 56% to 59%.
Tasks	Game of Go, Image Captioning, Machine Translation, Object Recognition, Speech Recognition
Published	2017-01-24
URL	http://arxiv.org/abs/1701.06972v1
PDF	http://arxiv.org/pdf/1701.06972v1.pdf
PWC	https://paperswithcode.com/paper/deep-network-guided-proof-search
Repo
Framework
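
The two percentages in the abstract are consistent with each other, which a short calculation makes explicit: proving 7.36% of the previously unproved 44% adds about 3.2 percentage points to the 56% baseline. The function name below is hypothetical.

```python
def updated_coverage(previous_ratio, newly_proved_share):
    """Coverage after proving a share of the previously unproved statements."""
    return previous_ratio + (1.0 - previous_ratio) * newly_proved_share


# 0.56 + 0.44 * 0.0736 = 0.592384, which rounds to the reported 59%.
new_ratio = updated_coverage(0.56, 0.0736)
```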

Hyper-dimensional computing for a visual question-answering system that is trainable end-to-end


Title	Hyper-dimensional computing for a visual question-answering system that is trainable end-to-end
Authors	Guglielmo Montone, J. Kevin O’Regan, Alexander V. Terekhov
Abstract	In this work we propose a system for visual question answering. Our architecture is composed of two parts, the first part creates the logical knowledge base given the image. The second part evaluates questions against the knowledge base. Differently from previous work, the knowledge base is represented using hyper-dimensional computing. This choice has the advantage that all the operations in the system, namely creating the knowledge base and evaluating the questions against it, are differentiable, thereby making the system easily trainable in an end-to-end fashion.
Tasks	Question Answering, Visual Question Answering
Published	2017-11-28
URL	http://arxiv.org/abs/1711.10185v1
PDF	http://arxiv.org/pdf/1711.10185v1.pdf
PWC	https://paperswithcode.com/paper/hyper-dimensional-computing-for-a-visual
Repo
Framework
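
The knowledge-base operations of hyper-dimensional computing can be sketched with random bipolar vectors: binding (elementwise product) attaches a value to a role, bundling (elementwise sum) stores several bound pairs in one vector, and unbinding with a role recovers a noisy copy of its value. This is a generic illustration with discrete vectors; the paper's differentiable formulation, and the role and value names used in the usage example, are not taken from the source.

```python
import random


def random_hypervector(dim, rng):
    """Random vector with entries in {-1, +1}; rng is a random.Random."""
    return [rng.choice((-1, 1)) for _ in range(dim)]


def bind(a, b):
    """Elementwise product; binding twice with the same vector undoes it."""
    return [x * y for x, y in zip(a, b)]


def bundle(*vectors):
    """Elementwise sum, superposing several bound pairs in one vector."""
    return [sum(column) for column in zip(*vectors)]


def similarity(a, b):
    """Normalized dot product; close to 0 for unrelated random vectors."""
    return sum(x * y for x, y in zip(a, b)) / len(a)
```

For example, with hypothetical roles `color` and `shape` and values `red` and `square`, the bundle of `bind(color, red)` and `bind(shape, square)` acts as a tiny knowledge base, and `bind(knowledge_base, color)` is far more similar to `red` than to `square`.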