January 29, 2020

Paper Group ANR 713

Factorized Multimodal Transformer for Multimodal Sequential Learning

Title Factorized Multimodal Transformer for Multimodal Sequential Learning
Authors Amir Zadeh, Chengfeng Mao, Kelly Shi, Yiwei Zhang, Paul Pu Liang, Soujanya Poria, Louis-Philippe Morency
Abstract The complex world around us is inherently multimodal and sequential (continuous). Information is scattered across different modalities and requires multiple continuous sensors to be captured. As machine learning leaps towards better generalization to the real world, multimodal sequential learning becomes a fundamental research area. Arguably, modeling arbitrarily distributed spatio-temporal dynamics within and across modalities is the biggest challenge in this research area. In this paper, we present a new transformer model, called the Factorized Multimodal Transformer (FMT), for multimodal sequential learning. FMT inherently models the intramodal and intermodal (involving two or more modalities) dynamics within its multimodal input in a factorized manner. The proposed factorization allows for increasing the number of self-attentions to better model the multimodal phenomena at hand, without encountering difficulties during training (e.g. overfitting) even on relatively low-resource setups. All the attention mechanisms within FMT have a full time-domain receptive field, which allows them to asynchronously capture long-range multimodal dynamics. In our experiments we focus on datasets that contain the three commonly studied modalities of language, vision, and acoustics. We perform a wide range of experiments, spanning 3 well-studied datasets and 21 distinct labels. FMT shows superior performance over previously proposed models, setting the new state of the art on the studied datasets.
Tasks
Published 2019-11-22
URL https://arxiv.org/abs/1911.09826v1
PDF https://arxiv.org/pdf/1911.09826v1.pdf
PWC https://paperswithcode.com/paper/factorized-multimodal-transformer-for
Repo
Framework
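
A minimal sketch of the factorization idea described in the abstract, assuming illustrative module names, dimensions, and a simple fusion layer (not the authors' released implementation): one self-attention per modality captures intramodal dynamics and one per modality pair captures intermodal dynamics, with every attention spanning the full time domain.

```python
# Hedged sketch of FMT-style factorized attention; names and sizes are illustrative.
import itertools
import torch
import torch.nn as nn

class FactorizedMultimodalAttention(nn.Module):
    def __init__(self, d_model=64, n_heads=4, modalities=("language", "vision", "acoustic")):
        super().__init__()
        # one self-attention per single modality (intramodal dynamics)
        self.uni = nn.ModuleDict({m: nn.MultiheadAttention(d_model, n_heads, batch_first=True)
                                  for m in modalities})
        # one self-attention per modality pair (intermodal dynamics)
        self.bi = nn.ModuleDict({f"{a}+{b}": nn.MultiheadAttention(d_model, n_heads, batch_first=True)
                                 for a, b in itertools.combinations(modalities, 2)})
        n_factors = len(self.uni) + len(self.bi)
        self.fuse = nn.Linear(n_factors * d_model, d_model)

    def forward(self, inputs):  # inputs: dict modality -> (batch, time, d_model)
        outs = []
        for m, attn in self.uni.items():
            x = inputs[m]
            outs.append(attn(x, x, x)[0])                           # intramodal attention
        for pair, attn in self.bi.items():
            a, b = pair.split("+")
            x = torch.cat([inputs[a], inputs[b]], dim=1)            # joint sequence of the pair
            outs.append(attn(x, x, x)[0][:, : inputs[a].shape[1]])  # keep a common length
        return self.fuse(torch.cat(outs, dim=-1))

x = {m: torch.randn(2, 10, 64) for m in ("language", "vision", "acoustic")}
print(FactorizedMultimodalAttention()(x).shape)  # torch.Size([2, 10, 64])
```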

Lucene for Approximate Nearest-Neighbors Search on Arbitrary Dense Vectors

Title Lucene for Approximate Nearest-Neighbors Search on Arbitrary Dense Vectors
Authors Tommaso Teofili, Jimmy Lin
Abstract We demonstrate three approaches for adapting the open-source Lucene search library to perform approximate nearest-neighbor search on arbitrary dense vectors, using similarity search on word embeddings as a case study. At its core, Lucene is built around inverted indexes of a document collection’s (sparse) term-document matrix, which is incompatible with the lower-dimensional dense vectors that are common in deep learning applications. We evaluate three techniques to overcome these challenges that can all be natively integrated into Lucene: the creation of documents populated with fake words, LSH applied to lexical realizations of dense vectors, and k-d trees coupled with dimensionality reduction. Experiments show that the “fake words” approach represents the best balance between effectiveness and efficiency. These techniques are integrated into the Anserini open-source toolkit and made available to the community.
Tasks Dimensionality Reduction, Word Embeddings
Published 2019-10-22
URL https://arxiv.org/abs/1910.10208v2
PDF https://arxiv.org/pdf/1910.10208v2.pdf
PWC https://paperswithcode.com/paper/lucene-for-approximate-nearest-neighbors
Repo
Framework
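
A minimal sketch of the "fake words" technique, assuming an illustrative token format and quantization scheme rather than the Anserini integration: each dimension of a dense vector is discretized into a synthetic token so that a standard inverted index can rank candidates by token overlap.

```python
# Hedged sketch of the "fake words" idea; token format and binning are assumptions.
import numpy as np

def fake_words(vec, n_bins=10, lo=-1.0, hi=1.0):
    """Map a dense vector to a bag of synthetic tokens like 'f12_7'."""
    q = np.digitize(np.clip(vec, lo, hi), np.linspace(lo, hi, n_bins))
    return {f"f{i}_{b}" for i, b in enumerate(q)}

def overlap_score(query_tokens, doc_tokens):
    return len(query_tokens & doc_tokens)   # stand-in for inverted-index scoring

rng = np.random.default_rng(0)
docs = rng.normal(scale=0.3, size=(1000, 50))          # stand-in word embeddings
index = [fake_words(d) for d in docs]                  # "documents" of fake words
q = docs[42] + rng.normal(scale=0.05, size=50)         # noisy query near doc 42
q_tokens = fake_words(q)
best = max(range(len(index)), key=lambda i: overlap_score(q_tokens, index[i]))
print(best)  # expected to retrieve 42 (an approximate nearest neighbor)
```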

LiStereo: Generate Dense Depth Maps from LIDAR and Stereo Imagery

Title LiStereo: Generate Dense Depth Maps from LIDAR and Stereo Imagery
Authors Junming Zhang, Manikandasriram Srinivasan Ramanagopal, Ram Vasudevan, Matthew Johnson-Roberson
Abstract An accurate depth map of the environment is critical to the safe operation of autonomous robots and vehicles. Currently, either light detection and ranging (LIDAR) or stereo matching algorithms are used to acquire such depth information. However, a high-resolution LIDAR is expensive and produces sparse depth maps at long range; stereo matching algorithms are able to generate denser depth maps but are typically less accurate than LIDAR at long range. This paper combines these approaches to generate high-quality dense depth maps. Unlike previous approaches that are trained using ground-truth labels, the proposed model adopts a self-supervised training process. Experiments show that the proposed method is able to generate high-quality dense depth maps and performs robustly even with low-resolution inputs. This shows the potential to reduce cost by using lower-resolution LIDARs in concert with stereo systems while maintaining high resolution.
Tasks Stereo Matching
Published 2019-05-07
URL https://arxiv.org/abs/1905.02744v1
PDF https://arxiv.org/pdf/1905.02744v1.pdf
PWC https://paperswithcode.com/paper/listereo-generate-dense-depth-maps-from-lidar
Repo
Framework
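
A hedged sketch of one plausible piece of such a self-supervised objective, assuming illustrative loss terms and weights (the paper additionally exploits the stereo pair itself, e.g. photometric consistency): sparse LIDAR returns supervise the prediction where they exist and a smoothness term regularizes the rest.

```python
# Hedged sketch of a LIDAR-supervised + smoothness loss; terms and weights are assumptions.
import torch
import torch.nn.functional as F

def lidar_plus_smoothness_loss(pred_depth, lidar_depth, lidar_mask, w_smooth=0.1):
    """pred_depth, lidar_depth: (B,1,H,W); lidar_mask: 1 where a LIDAR return exists."""
    # sparse supervision from the LIDAR returns only (simplified masking)
    sparse_term = F.l1_loss(pred_depth * lidar_mask, lidar_depth * lidar_mask)
    # first-order smoothness on the dense prediction
    dx = (pred_depth[:, :, :, 1:] - pred_depth[:, :, :, :-1]).abs().mean()
    dy = (pred_depth[:, :, 1:, :] - pred_depth[:, :, :-1, :]).abs().mean()
    return sparse_term + w_smooth * (dx + dy)

pred = torch.rand(2, 1, 64, 128, requires_grad=True)
lidar = torch.rand(2, 1, 64, 128)
mask = (torch.rand(2, 1, 64, 128) < 0.05).float()   # ~5% sparse LIDAR coverage
loss = lidar_plus_smoothness_loss(pred, lidar, mask)
loss.backward()
print(loss.item())
```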

TransMatch: A Transfer-Learning Scheme for Semi-Supervised Few-Shot Learning

Title TransMatch: A Transfer-Learning Scheme for Semi-Supervised Few-Shot Learning
Authors Zhongjie Yu, Lin Chen, Zhongwei Cheng, Jiebo Luo
Abstract The successful application of deep learning to many visual recognition tasks relies heavily on the availability of a large amount of labeled data, which is usually expensive to obtain. The few-shot learning problem has attracted increasing attention from researchers for building a robust model upon only a few labeled samples. Most existing works tackle this problem under the meta-learning framework by mimicking the few-shot learning task with an episodic training strategy. In this paper, we propose a new transfer-learning framework for semi-supervised few-shot learning to fully utilize the auxiliary information from labeled base-class data and unlabeled novel-class data. The framework consists of three components: 1) pre-training a feature extractor on base-class data; 2) using the feature extractor to initialize the classifier weights for the novel classes; and 3) further updating the model with a semi-supervised learning method. Under the proposed framework, we develop a novel method for semi-supervised few-shot learning called TransMatch by instantiating the three components with Imprinting and MixMatch. Extensive experiments on two popular benchmark datasets for few-shot learning, CUB-200-2011 and miniImageNet, demonstrate that our proposed method can effectively utilize the auxiliary information from labeled base-class data and unlabeled novel-class data to significantly improve the accuracy of the few-shot learning task.
Tasks Few-Shot Learning, Meta-Learning, Transfer Learning
Published 2019-12-19
URL https://arxiv.org/abs/1912.09033v2
PDF https://arxiv.org/pdf/1912.09033v2.pdf
PWC https://paperswithcode.com/paper/transmatch-a-transfer-learning-scheme-for
Repo
Framework
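
A minimal sketch of the Imprinting component, assuming illustrative shapes: novel-class classifier weights are initialized as L2-normalized class means of the support embeddings, yielding cosine-similarity logits that semi-supervised training (MixMatch) can then refine.

```python
# Hedged sketch of weight imprinting for novel classes; shapes are illustrative.
import torch
import torch.nn.functional as F

def imprint_weights(features, labels, n_classes):
    """features: (N, D) embeddings of support images; labels: (N,) in [0, n_classes)."""
    feats = F.normalize(features, dim=1)                 # unit-length embeddings
    weights = torch.zeros(n_classes, feats.shape[1])
    for c in range(n_classes):
        weights[c] = feats[labels == c].mean(dim=0)      # class mean embedding
    return F.normalize(weights, dim=1)                   # imprinted classifier weights

feats = torch.randn(25, 512)                             # e.g. a 5-way 5-shot support set
labels = torch.arange(5).repeat_interleave(5)
w = imprint_weights(feats, labels, n_classes=5)
logits = F.normalize(torch.randn(8, 512), dim=1) @ w.t() # cosine-similarity logits
print(logits.shape)  # torch.Size([8, 5])
```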

ExperienceThinking: Hyperparameter Optimization with Budget Constraints

Title ExperienceThinking: Hyperparameter Optimization with Budget Constraints
Authors Chunnan Wang, Hongzhi Wang, Chang Zhou, Hanxiao Chen, Jianzhong Li, Hong Gao
Abstract The problem of hyperparameter optimization exists widely in real life, and many common tasks can be transformed into it, such as neural architecture search and feature subset selection. Without considering various constraints, the existing hyperparameter tuning techniques can solve these problems effectively by traversing as many hyperparameter configurations as possible. However, because of limited resources and budget, it is not feasible to evaluate so many configurations, which requires us to design effective algorithms that find the best possible hyperparameter configuration with a finite number of configuration evaluations. In this paper, we simulate human thinking processes and combine the merits of existing techniques to propose a new algorithm called ExperienceThinking, which aims to solve this constrained hyperparameter optimization problem. In addition, we analyze the performance of 3 classical hyperparameter optimization algorithms with a finite number of configuration evaluations and compare it with that of ExperienceThinking. The experimental results show that our proposed algorithm provides superior results.
Tasks Hyperparameter Optimization, Neural Architecture Search
Published 2019-12-02
URL https://arxiv.org/abs/1912.00602v1
PDF https://arxiv.org/pdf/1912.00602v1.pdf
PWC https://paperswithcode.com/paper/experiencethinking-hyperparameter
Repo
Framework
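
To make the budget-constrained setting concrete, here is a sketch of plain random search under a hard cap on configuration evaluations, one of the classical baselines the paper compares against (not the ExperienceThinking algorithm itself); the objective function is a stand-in for an expensive training-plus-validation run.

```python
# Hedged sketch of budget-constrained hyperparameter search (random-search baseline).
import random

def objective(config):
    # stand-in for an expensive model training + validation run
    return -(config["lr"] - 0.01) ** 2 - 0.1 * (config["layers"] - 3) ** 2

def random_search(space, budget=20, seed=0):
    rng = random.Random(seed)
    best_cfg, best_score = None, float("-inf")
    for _ in range(budget):                          # hard cap on evaluations
        cfg = {k: rng.choice(v) for k, v in space.items()}
        score = objective(cfg)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score

space = {"lr": [0.1, 0.03, 0.01, 0.003, 0.001], "layers": [1, 2, 3, 4, 5]}
print(random_search(space, budget=20))
```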

Informative and Controllable Opinion Summarization

Title Informative and Controllable Opinion Summarization
Authors Reinald Kim Amplayo, Mirella Lapata
Abstract Opinion summarization is the task of automatically generating summaries for a set of opinions about a specific target (e.g., a movie or a product). Since the number of input documents can be prohibitively large, neural network-based methods sacrifice end-to-end elegance and follow a two-stage approach where an extractive model first pre-selects a subset of salient opinions and an abstractive model creates the summary while conditioning on the extracted subset. However, the extractive stage leads to information loss and inflexible generation capability. In this paper we propose a summarization framework that eliminates the need to pre-select salient content. We view opinion summarization as an instance of multi-source transduction, and make use of all input documents by condensing them into multiple dense vectors which serve as input to an abstractive model. Beyond producing more informative summaries, we demonstrate that our approach allows user preferences to be taken into account based on a simple zero-shot customization technique. Experimental results show that our model improves the state of the art on the Rotten Tomatoes dataset by a wide margin and generates customized summaries effectively.
Tasks
Published 2019-09-05
URL https://arxiv.org/abs/1909.02322v1
PDF https://arxiv.org/pdf/1909.02322v1.pdf
PWC https://paperswithcode.com/paper/informative-and-controllable-opinion
Repo
Framework
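
A hedged sketch of the condensation step, assuming an illustrative attention-pooling module (names and sizes are not from the paper): every encoded input review is kept, and a small set of learned query slots condenses them into a few dense vectors for the abstractive decoder to attend over.

```python
# Hedged sketch of condensing many encoded documents into a few dense vectors.
import torch
import torch.nn as nn

class Condenser(nn.Module):
    def __init__(self, d_model=256, n_slots=8):
        super().__init__()
        self.slots = nn.Parameter(torch.randn(n_slots, d_model))  # learned query slots
        self.attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)

    def forward(self, doc_encodings):  # (batch, n_docs, d_model), one vector per review
        q = self.slots.unsqueeze(0).expand(doc_encodings.shape[0], -1, -1)
        condensed, _ = self.attn(q, doc_encodings, doc_encodings)
        return condensed  # (batch, n_slots, d_model), input to an abstractive decoder

docs = torch.randn(2, 100, 256)         # e.g. 100 encoded reviews of one movie
print(Condenser()(docs).shape)          # torch.Size([2, 8, 256])
```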

Gaussianity and typicality in matrix distributional semantics

Title Gaussianity and typicality in matrix distributional semantics
Authors Sanjaye Ramgoolam, Mehrnoosh Sadrzadeh, Lewis Sword
Abstract Constructions in type-driven compositional distributional semantics associate large collections of matrices of size $D$ to linguistic corpora. We develop the proposal of analysing the statistical characteristics of this data in the framework of permutation invariant matrix models. The observables in this framework are permutation invariant polynomial functions of the matrix entries, which correspond to directed graphs. Using the general 13-parameter permutation invariant Gaussian matrix models recently solved, we find, using a dataset of matrices constructed via standard techniques in distributional semantics, that the expectation values of a large class of cubic and quartic observables show high Gaussianity at levels between 90 and 99 percent. Beyond expectation values, which are averages over words, the dataset allows the computation of standard deviations for each observable, which can be viewed as a measure of typicality for each observable. There is a wide range of magnitudes in the measures of typicality. The permutation invariant matrix models, considered as functions of random couplings, give a very good prediction of the magnitude of the typicality for different observables. We find evidence that observables with similar matrix model characteristics of Gaussianity and typicality also have high degrees of correlation between the ranked lists of words associated to these observables.
Tasks
Published 2019-12-19
URL https://arxiv.org/abs/1912.10839v1
PDF https://arxiv.org/pdf/1912.10839v1.pdf
PWC https://paperswithcode.com/paper/gaussianity-and-typicality-in-matrix
Repo
Framework
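
A small sketch of what permutation invariant observables look like in practice, using a few low-order examples over a toy collection of matrices (the 13-parameter Gaussian model itself is not reproduced here): each observable is a polynomial in the matrix entries that is unchanged under simultaneous row/column permutations, and its mean and standard deviation across words correspond to the expectation value and typicality discussed above.

```python
# Hedged sketch of low-order permutation invariant observables on toy "word matrices".
import numpy as np

def observables(M):
    return {
        "sum_ii": np.trace(M),        # linear invariant: sum of diagonal entries
        "sum_ij": M.sum(),            # linear invariant: sum of all entries
        "tr_M2": np.trace(M @ M),     # quadratic invariant: directed 2-cycle
        "sum_M2": (M * M).sum(),      # quadratic invariant: sum of squared entries
    }

rng = np.random.default_rng(0)
words = [rng.normal(size=(20, 20)) for _ in range(500)]   # toy stand-ins for word matrices
vals = {k: np.array([observables(M)[k] for M in words]) for k in observables(words[0])}
for name, v in vals.items():
    # mean over words ~ expectation value; std over words ~ measure of typicality
    print(f"{name}: mean={v.mean():.2f} std={v.std():.2f}")
```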

Never Forget: Balancing Exploration and Exploitation via Learning Optical Flow

Title Never Forget: Balancing Exploration and Exploitation via Learning Optical Flow
Authors Hsuan-Kung Yang, Po-Han Chiang, Kuan-Wei Ho, Min-Fong Hong, Chun-Yi Lee
Abstract An exploration bonus derived from the novelty of the states in an environment has become a popular approach to motivate exploration for deep reinforcement learning agents in the past few years. Recent methods such as curiosity-driven exploration usually estimate the novelty of new observations by the prediction errors of their system dynamics models. Due to the capacity limitation of the models and the difficulty of performing next-frame prediction, however, these methods typically fail to balance between exploration and exploitation in high-dimensional observation tasks, resulting in the agents forgetting the visited paths and exploring those states repeatedly. Such inefficient exploration behavior causes significant performance drops, especially in large environments with sparse reward signals. In this paper, we propose to introduce the concept of optical flow estimation from the field of computer vision to deal with the above issue. We propose to employ optical flow estimation errors to examine the novelty of new observations, such that agents are able to memorize and understand the visited states in a more comprehensive fashion. We compare our method against the previous approaches in a number of experiments. Our results indicate that the proposed method delivers superior and longer-lasting performance than the previous methods. We further provide a comprehensive ablative analysis of the proposed method, and investigate the impact of optical flow estimation on the learning curves of the DRL agents.
Tasks Optical Flow Estimation
Published 2019-01-24
URL http://arxiv.org/abs/1901.08486v1
PDF http://arxiv.org/pdf/1901.08486v1.pdf
PWC https://paperswithcode.com/paper/never-forget-balancing-exploration-and
Repo
Framework
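
A hedged sketch of the intrinsic reward, assuming an illustrative predictor network and a stand-in flow target (the paper obtains flow from a dedicated estimation network): the error of a learned flow prediction between consecutive observations serves as the exploration bonus, so well-visited transitions stop being rewarded.

```python
# Hedged sketch of an optical-flow-error exploration bonus; the flow target is a stand-in.
import torch
import torch.nn as nn
import torch.nn.functional as F

flow_predictor = nn.Sequential(                     # predicts 2-channel flow from 2 stacked frames
    nn.Conv2d(2, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 2, 3, padding=1),
)

def intrinsic_reward(obs_t, obs_t1, flow_target):
    """obs_t, obs_t1: (B,1,H,W) grayscale frames; flow_target: (B,2,H,W) estimated flow."""
    pred = flow_predictor(torch.cat([obs_t, obs_t1], dim=1))
    per_sample = F.mse_loss(pred, flow_target, reduction="none").mean(dim=(1, 2, 3))
    return per_sample            # high error -> novel transition -> larger bonus

obs_t, obs_t1 = torch.rand(4, 1, 64, 64), torch.rand(4, 1, 64, 64)
flow_target = torch.rand(4, 2, 64, 64)
print(intrinsic_reward(obs_t, obs_t1, flow_target))   # one bonus per transition
```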

Stitching Videos from a Fisheye Lens Camera and a Wide-Angle Lens Camera for Telepresence Robots

Title Stitching Videos from a Fisheye Lens Camera and a Wide-Angle Lens Camera for Telepresence Robots
Authors Yanmei Dong, Mingtao Pei, Lijia Zhang, Bin Xu, Yuwei Wu, Yunde Jia
Abstract Many telepresence robots are equipped with a forward-facing (FF) camera for video communication and a downward-facing (DF) camera for navigation. In this paper, we propose to stitch videos from the FF-camera with a wide-angle lens and the DF-camera with a fisheye lens for telepresence robots. We aim at providing more compact and efficient visual feedback for the user interface of telepresence robots with user-friendly interactive experiences. To this end, we present a multi-homography-based video stitching method which stitches videos from a wide-angle camera and a fisheye camera. The method consists of video image alignment, seam cutting, and image blending. We directly align the wide-angle video image and the fisheye video image based on the multi-homography alignment without calibration, distortion correction, and unwarping procedures. Thus, we can obtain a stitched video with shape preservation in the non-overlapping regions and alignment in the overlapping area for telepresence. To alleviate ghosting effects caused by moving objects and/or moving cameras during telepresence robot driving, an optimal seam is found for aligned video composition, and the optimal seam is updated in subsequent frames, considering spatial and temporal coherence. The final stitched video is created by image blending based on the optimal seam. We conducted a user study to demonstrate the effectiveness of our method and the superiority of telepresence robots with a stitched video as visual feedback.
Tasks Calibration
Published 2019-03-15
URL http://arxiv.org/abs/1903.06319v1
PDF http://arxiv.org/pdf/1903.06319v1.pdf
PWC https://paperswithcode.com/paper/stitching-videos-from-a-fisheye-lens-camera
Repo
Framework
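
A minimal sketch of the alignment step with a single homography and naive blending, using synthetic correspondences so it runs without camera footage; the actual method estimates multiple local homographies, selects an optimal seam, and blends along it.

```python
# Hedged sketch of homography-based alignment + naive blending (synthetic data).
import cv2
import numpy as np

rng = np.random.default_rng(0)
pts_a = rng.uniform(0, 400, size=(40, 2)).astype(np.float32)        # points in view A
H_true = np.array([[1.05, 0.02, 15.0], [0.01, 0.98, -8.0], [1e-5, 2e-5, 1.0]])
pts_h = np.hstack([pts_a, np.ones((40, 1), dtype=np.float32)]) @ H_true.T
pts_b = (pts_h[:, :2] / pts_h[:, 2:]).astype(np.float32)            # same points in view B

# robust homography estimation (RANSAC), standing in for the alignment stage
H, inliers = cv2.findHomography(pts_a, pts_b, cv2.RANSAC, 3.0)

# warp view A into view B's frame and blend naively in the overlap
img_a = rng.uniform(0, 255, size=(400, 400)).astype(np.uint8)
img_b = rng.uniform(0, 255, size=(400, 400)).astype(np.uint8)
warped_a = cv2.warpPerspective(img_a, H, (img_b.shape[1], img_b.shape[0]))
stitched = cv2.addWeighted(warped_a, 0.5, img_b, 0.5, 0)            # stand-in for seam-based blending
print(np.allclose(H / H[2, 2], H_true / H_true[2, 2], atol=1e-2))   # recovered the true homography
```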

Fully Convolutional Neural Network for Semantic Segmentation of Anatomical Structure and Pathologies in Colour Fundus Images Associated with Diabetic Retinopathy

Title Fully Convolutional Neural Network for Semantic Segmentation of Anatomical Structure and Pathologies in Colour Fundus Images Associated with Diabetic Retinopathy
Authors Oindrila Saha, Rachana Sathish, Debdoot Sheet
Abstract Diabetic retinopathy (DR) is the most common form of diabetic eye disease. Retinopathy can affect all diabetic patients and becomes particularly dangerous, increasing the risk of blindness, if it is left untreated. The success of treatment depends largely on diagnosis at an early stage. The development of automated computer-aided disease diagnosis tools could help in faster detection of symptoms with a wider reach and reasonable cost. This paper proposes a method for the automated segmentation of retinal lesions and the optic disk in fundus images using a deep fully convolutional neural network for semantic segmentation. This trainable segmentation pipeline consists of an encoder network and a corresponding decoder network followed by pixel-wise classification to segment microaneurysms, hemorrhages, hard exudates, soft exudates, and the optic disk from the background. The network was trained using a binary cross-entropy criterion with a sigmoid as the last layer, while during inference an additional softmax layer was used for boosting the response of a single class. The performance of the proposed method is evaluated using sensitivity, positive predictive value (PPV) and accuracy as the metrics. Further, the position of the optic disk is localised using the segmented output map.
Tasks Semantic Segmentation
Published 2019-02-07
URL http://arxiv.org/abs/1902.03122v1
PDF http://arxiv.org/pdf/1902.03122v1.pdf
PWC https://paperswithcode.com/paper/fully-convolutional-neural-network-for
Repo
Framework
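
A short sketch of the training criterion as described, with illustrative shapes: one logit map per structure, per-pixel binary cross entropy on sigmoid outputs during training, and a softmax over channels available for boosting a single class at inference.

```python
# Hedged sketch of the per-pixel multi-label training criterion; shapes are illustrative.
import torch
import torch.nn as nn

n_classes = 5                                                        # MA, HE, hard EX, soft EX, optic disk
logits = torch.randn(2, n_classes, 128, 128, requires_grad=True)    # stand-in decoder output
targets = (torch.rand(2, n_classes, 128, 128) > 0.9).float()        # binary mask per structure

criterion = nn.BCEWithLogitsLoss()             # sigmoid + binary cross entropy, per pixel
loss = criterion(logits, targets)
loss.backward()

probs_train = torch.sigmoid(logits)            # training-time per-class probabilities
probs_boost = torch.softmax(logits, dim=1)     # inference-time single-class boosting
print(loss.item(), probs_boost.sum(dim=1).mean().item())   # softmax sums to 1 over classes
```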

Ultrasound Image Representation Learning by Modeling Sonographer Visual Attention

Title Ultrasound Image Representation Learning by Modeling Sonographer Visual Attention
Authors Richard Droste, Yifan Cai, Harshita Sharma, Pierre Chatelain, Lior Drukker, Aris T. Papageorghiou, J. Alison Noble
Abstract Image representations are commonly learned from class labels, which are a simplistic approximation of human image understanding. In this paper we demonstrate that transferable representations of images can be learned without manual annotations by modeling human visual attention. The basis of our analyses is a unique gaze tracking dataset of sonographers performing routine clinical fetal anomaly screenings. Models of sonographer visual attention are learned by training a convolutional neural network (CNN) to predict gaze on ultrasound video frames through visual saliency prediction or gaze-point regression. We evaluate the transferability of the learned representations to the task of ultrasound standard plane detection in two contexts. Firstly, we perform transfer learning by fine-tuning the CNN with a limited number of labeled standard plane images. We find that fine-tuning the saliency predictor is superior to training from random initialization, with an average F1-score improvement of 9.6% overall and 15.3% for the cardiac planes. Secondly, we train a simple softmax regression on the feature activations of each CNN layer in order to evaluate the representations independently of transfer learning hyper-parameters. We find that the attention models derive strong representations, approaching the precision of a fully-supervised baseline model for all but the last layer.
Tasks Representation Learning, Saliency Prediction, Transfer Learning
Published 2019-03-07
URL https://arxiv.org/abs/1903.02974v2
PDF https://arxiv.org/pdf/1903.02974v2.pdf
PWC https://paperswithcode.com/paper/ultrasound-image-representation-learning-by
Repo
Framework
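
A hedged sketch of the second evaluation protocol (the "softmax regression" probe), assuming a generic torchvision backbone and an illustrative number of standard planes rather than the authors' gaze-trained network: activations from a frozen layer feed a multinomial logistic regression, so the representation is assessed independently of fine-tuning hyper-parameters.

```python
# Hedged sketch of a softmax-regression probe on frozen CNN features; backbone and
# the number of standard planes are illustrative assumptions.
import torch
import torch.nn as nn
from torchvision import models

backbone = models.resnet18(weights=None)
backbone.fc = nn.Identity()                        # expose 512-d penultimate features
backbone.eval()
for p in backbone.parameters():
    p.requires_grad = False                        # frozen representation

n_planes = 13                                      # illustrative number of standard planes
probe = nn.Linear(512, n_planes)                   # softmax regression on frozen features
opt = torch.optim.SGD(probe.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

images = torch.randn(16, 3, 224, 224)              # stand-in ultrasound frames
labels = torch.randint(0, n_planes, (16,))
with torch.no_grad():
    feats = backbone(images)
loss = loss_fn(probe(feats), labels)
opt.zero_grad()
loss.backward()
opt.step()
print(loss.item())
```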

Curriculum Model Adaptation with Synthetic and Real Data for Semantic Foggy Scene Understanding

Title Curriculum Model Adaptation with Synthetic and Real Data for Semantic Foggy Scene Understanding
Authors Dengxin Dai, Christos Sakaridis, Simon Hecker, Luc Van Gool
Abstract This work addresses the problem of semantic scene understanding under fog. Although marked progress has been made in semantic scene understanding, it is mainly concentrated on clear-weather scenes. Extending semantic segmentation methods to adverse weather conditions such as fog is crucial for outdoor applications. In this paper, we propose a novel method, named Curriculum Model Adaptation (CMAda), which gradually adapts a semantic segmentation model from light synthetic fog to dense real fog in multiple steps, using both labeled synthetic foggy data and unlabeled real foggy data. The method is based on the fact that the results of semantic segmentation in moderately adverse conditions (light fog) can be bootstrapped to solve the same problem in highly adverse conditions (dense fog). CMAda is extensible to other adverse conditions and provides a new paradigm for learning with synthetic data and unlabeled real data. In addition, we present four other main stand-alone contributions: 1) a novel method to add synthetic fog to real, clear-weather scenes using semantic input; 2) a new fog density estimator; 3) a novel fog densification method to densify the fog in real foggy scenes without using depth; and 4) the Foggy Zurich dataset comprising 3808 real foggy images, with pixel-level semantic annotations for 40 images under dense fog. Our experiments show that 1) our fog simulation and fog density estimator outperform their state-of-the-art counterparts with respect to the task of semantic foggy scene understanding (SFSU); 2) CMAda improves the performance of state-of-the-art models for SFSU significantly, benefiting from both our synthetic and real foggy data. The datasets and code are available at the project website.
Tasks Scene Understanding, Semantic Segmentation
Published 2019-01-05
URL http://arxiv.org/abs/1901.01415v2
PDF http://arxiv.org/pdf/1901.01415v2.pdf
PWC https://paperswithcode.com/paper/curriculum-model-adaptation-with-synthetic
Repo
Framework
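
A schematic sketch of the curriculum control flow, with stand-in training and pseudo-labelling functions so it runs as-is (data, fog parameters, and the model are all placeholders): at each stage the current model pseudo-labels real images of the next fog density, and training mixes them with labeled synthetic fog of the same density.

```python
# Hedged sketch of curriculum model adaptation from light to dense fog; all pieces are stand-ins.
def train(model, labeled_synthetic, pseudo_labeled_real):
    return model + 1                       # stand-in for one semantic segmentation training run

def pseudo_label(model, images):
    return [(img, f"mask_from_model_{model}") for img in images]   # stand-in inference

fog_schedule = [0.005, 0.01, 0.02]         # fog densities, from light to dense
real_fog_pools = {0.005: ["real_light_1", "real_light_2"],
                  0.01:  ["real_medium_1"],
                  0.02:  ["real_dense_1"]}

model = 0                                  # stand-in for a clear-weather segmentation model
for beta in fog_schedule:
    synthetic = [f"synthetic_fog_{beta}"]                   # labeled synthetic foggy images
    pseudo = pseudo_label(model, real_fog_pools[beta])      # unlabeled real fog, pseudo-labeled
    model = train(model, synthetic, pseudo)                 # one curriculum step
print(model)                                                # adapted through all stages
```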

Toward Imitating Visual Attention of Experts in Software Development Tasks

Title Toward Imitating Visual Attention of Experts in Software Development Tasks
Authors Yoshiharu Ikutani, Nishanth Koganti, Hideaki Hata, Takatomi Kubo, Kenichi Matsumoto
Abstract Expert programmers’ eye-movements during source code reading are valuable sources that are considered to be associated with their domain expertise. We advocate a vision of new intelligent systems incorporating the expertise of experts for software development tasks, such as issue localization, comment generation, and code generation. We present a conceptual framework of neural autonomous agents based on imitation learning (IL), which enables agents to mimic the visual attention of an expert via his/her eye movements. In this framework, an autonomous agent is constructed as a context-based attention model that consists of an encoder/decoder network and is trained with state-action sequences generated by an expert’s demonstrations. Challenges in implementing an IL-based autonomous agent specialized for software development tasks are discussed in this paper.
Tasks Code Generation, Imitation Learning
Published 2019-03-15
URL http://arxiv.org/abs/1903.06320v1
PDF http://arxiv.org/pdf/1903.06320v1.pdf
PWC https://paperswithcode.com/paper/toward-imitating-visual-attention-of-experts
Repo
Framework

A Tutorial on Concentration Bounds for System Identification

Title A Tutorial on Concentration Bounds for System Identification
Authors Nikolai Matni, Stephen Tu
Abstract We provide a brief tutorial on the use of concentration inequalities as they apply to system identification of state-space parameters of linear time invariant systems, with a focus on the fully observed setting. We draw upon tools from the theories of large deviations and self-normalized martingales, and provide both data-dependent and independent bounds on the learning rate.
Tasks
Published 2019-06-27
URL https://arxiv.org/abs/1906.11395v2
PDF https://arxiv.org/pdf/1906.11395v2.pdf
PWC https://paperswithcode.com/paper/a-tutorial-on-concentration-bounds-for-system
Repo
Framework
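
A worked sketch of the estimation problem these bounds apply to: ordinary least-squares identification of the state matrix A of a fully observed linear system x_{t+1} = A x_t + w_t from a single trajectory; the system and noise level below are arbitrary choices for illustration.

```python
# Hedged sketch of least-squares system identification; the system is an arbitrary example.
import numpy as np

rng = np.random.default_rng(0)
n, T = 3, 2000
A = np.array([[0.9, 0.1, 0.0],
              [0.0, 0.8, 0.1],
              [0.0, 0.0, 0.7]])                     # a stable state matrix

x = np.zeros((T + 1, n))
for t in range(T):
    x[t + 1] = A @ x[t] + rng.normal(scale=0.1, size=n)   # process noise w_t

X, Y = x[:-1], x[1:]                                 # regressors and next states
A_hat = np.linalg.lstsq(X, Y, rcond=None)[0].T       # least-squares estimate of A
print(np.linalg.norm(A_hat - A, 2))                  # spectral-norm estimation error, shrinks with T
```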

An Alarm System For Segmentation Algorithm Based On Shape Model

Title An Alarm System For Segmentation Algorithm Based On Shape Model
Authors Fengze Liu, Yingda Xia, Dong Yang, Alan Yuille, Daguang Xu
Abstract It is usually hard for a learning system to predict correctly on rare events that never occur in the training data, and segmentation algorithms are no exception. Meanwhile, manual inspection of each case to locate failures becomes infeasible due to the trend of large data scale and limited human resources. Therefore, we build an alarm system that will set off alerts when the segmentation result is possibly unsatisfactory, assuming no corresponding ground truth mask is provided. One plausible solution is to project the segmentation results into a low-dimensional feature space and then learn classifiers/regressors to predict their quality. Motivated by this, in this paper we learn a feature space using shape information, which is a strong prior shared among different datasets and robust to the appearance variation of input data. The shape feature is captured using a Variational Auto-Encoder (VAE) network that is trained with only the ground truth masks. During testing, segmentation results with bad shapes will not fit the shape prior well, resulting in large loss values. Thus, the VAE is able to evaluate the quality of a segmentation result on unseen data, without using ground truth. Finally, we learn a regressor in the one-dimensional feature space to predict the quality of segmentation results. Our alarm system is evaluated on several recent state-of-the-art segmentation algorithms for 3D medical segmentation tasks. Compared with other standard quality assessment methods, our system consistently provides more reliable predictions of the quality of segmentation results.
Tasks
Published 2019-03-26
URL https://arxiv.org/abs/1903.10645v3
PDF https://arxiv.org/pdf/1903.10645v3.pdf
PWC https://paperswithcode.com/paper/an-alarm-system-for-segmentation-algorithm-1
Repo
Framework
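
A hedged sketch of the alarm signal, using a tiny MLP-based VAE for illustration rather than the authors' 3D architecture: the VAE is trained only on ground-truth masks, so its per-sample loss (reconstruction plus KL) on a candidate segmentation acts as a shape-plausibility score that a simple regressor can then map to predicted quality.

```python
# Hedged sketch of a VAE loss used as a shape-plausibility / alarm score; architecture is illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskVAE(nn.Module):
    def __init__(self, in_dim=32 * 32, z_dim=16):
        super().__init__()
        self.enc = nn.Linear(in_dim, 2 * z_dim)       # outputs mean and log-variance
        self.dec = nn.Linear(z_dim, in_dim)

    def loss(self, mask):                              # mask: (B, in_dim), binary segmentation
        mu, logvar = self.enc(mask).chunk(2, dim=1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()          # reparameterization
        recon = self.dec(z)
        rec = F.binary_cross_entropy_with_logits(recon, mask, reduction="none").sum(dim=1)
        kl = 0.5 * (mu.pow(2) + logvar.exp() - 1 - logvar).sum(dim=1)
        return rec + kl                                # per-sample shape-plausibility score

vae = MaskVAE()                                        # would be trained on ground-truth masks only
candidate = (torch.rand(4, 32 * 32) > 0.5).float()     # stand-in segmentation outputs to assess
score = vae.loss(candidate)                            # higher score -> less plausible shape -> alarm
print(score.shape)                                     # torch.Size([4])
```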