October 15, 2019

Paper Group NANR 103

Deep Homogeneous Mixture Models: Representation, Separation, and Approximation. Cross-Pair Text Representations for Answer Sentence Selection. DFT-based Transformation Invariant Pooling Layer for Visual Classification. Scale-Awareness of Light Field Camera based Visual Odometry. Transfer and Multi-Task Learning for Noun–Noun Compound Interpretatio …

Deep Homogeneous Mixture Models: Representation, Separation, and Approximation

Title Deep Homogeneous Mixture Models: Representation, Separation, and Approximation
Authors Priyank Jaini, Pascal Poupart, Yaoliang Yu
Abstract At their core, many unsupervised learning models provide a compact representation of homogeneous density mixtures, but their similarities and differences are not always clearly understood. In this work, we formally establish the relationships among latent tree graphical models (including special cases such as hidden Markov models and tensorial mixture models), hierarchical tensor formats and sum-product networks. Based on this connection, we then give a unified treatment of exponential separation in exact representation size between deep mixture architectures and shallow ones. In contrast, for approximate representation, we show that the conditional gradient algorithm can approximate any homogeneous mixture within ε accuracy by combining O(1/ε²) "shallow" architectures, where the hidden constant may decrease (exponentially) with respect to the depth. Our experiments on both synthetic and real datasets confirm the benefits of depth in density estimation.
Tasks Density Estimation
Published 2018-12-01
URL http://papers.nips.cc/paper/7944-deep-homogeneous-mixture-models-representation-separation-and-approximation
PDF http://papers.nips.cc/paper/7944-deep-homogeneous-mixture-models-representation-separation-and-approximation.pdf
PWC https://paperswithcode.com/paper/deep-homogeneous-mixture-models
Repo
Framework
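
To make the shallow-versus-deep contrast concrete, here is a minimal numpy/scipy sketch (not the authors' code) evaluating a shallow mixture and a two-level deep mixture over two variables; all weights and means are random placeholders:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
x = rng.normal(size=2)  # one 2-D data point

# Shallow mixture: a single sum node over K fully specified product components.
K = 4
w = rng.dirichlet(np.ones(K))            # mixture weights
mu = rng.normal(size=(K, 2))             # component means
shallow = sum(w[k] * norm.pdf(x[0], mu[k, 0]) * norm.pdf(x[1], mu[k, 1])
              for k in range(K))

# Deep mixture: a top-level sum over r branches, each branch a product of
# per-variable sub-mixtures over shared components. Expanding it yields up to
# r**2 shallow product components from O(r) parameters per branch -- the kind
# of separation in representation size the paper formalizes.
r = 2
m = rng.normal(size=(2, r))                   # per-variable component means
top = rng.dirichlet(np.ones(r))               # top-level sum weights
v = rng.dirichlet(np.ones(r), size=(r, 2))    # v[j, d]: sub-mixture weights
deep = sum(top[j]
           * sum(v[j, 0, i] * norm.pdf(x[0], m[0, i]) for i in range(r))
           * sum(v[j, 1, i] * norm.pdf(x[1], m[1, i]) for i in range(r))
           for j in range(r))
print(shallow, deep)  # both are valid densities evaluated at x
```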

Cross-Pair Text Representations for Answer Sentence Selection

Title Cross-Pair Text Representations for Answer Sentence Selection
Authors Kateryna Tymoshenko, Alessandro Moschitti
Abstract High-level semantics tasks, e.g., paraphrasing, textual entailment or question answering, involve modeling of text pairs. Before the emergence of neural networks, this has been mostly performed using intra-pair features, which incorporate similarity scores or rewrite rules computed between the members within the same pair. In this paper, we compute scalar products between vectors representing similarity between members of different pairs, in place of simply using a single vector for each pair. This allows us to obtain a representation specific to any pair of pairs, which delivers the state of the art in answer sentence selection. Most importantly, our approach can outperform much more complex algorithms based on neural networks.
Tasks Natural Language Inference, Open-Domain Question Answering, Question Answering
Published 2018-10-01
URL https://www.aclweb.org/anthology/D18-1240/
PDF https://www.aclweb.org/anthology/D18-1240
PWC https://paperswithcode.com/paper/cross-pair-text-representations-for-answer
Repo
Framework
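
A hedged sketch of the cross-pair idea: rather than one feature vector per (question, answer) pair, score a pair of pairs via scalar products of similarities between members of different pairs. TF-IDF cosine stands in for the paper's representations, and the particular kernel combination below is an illustrative assumption:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

pairs = [("what is the capital of france", "paris is the capital of france"),
         ("what is the capital of france", "france is in europe"),
         ("who wrote hamlet", "shakespeare wrote hamlet")]

texts = [t for p in pairs for t in p]
tfidf = TfidfVectorizer().fit(texts)
Q = tfidf.transform([p[0] for p in pairs])   # question representations
A = tfidf.transform([p[1] for p in pairs])   # answer representations

def cross_pair_kernel(i, j):
    """Similarity of pair i to pair j built from member-wise similarities."""
    qq = cosine_similarity(Q[i], Q[j])[0, 0]   # question vs. question
    aa = cosine_similarity(A[i], A[j])[0, 0]   # answer vs. answer
    qa = cosine_similarity(Q[i], A[j])[0, 0]   # cross terms
    aq = cosine_similarity(A[i], Q[j])[0, 0]
    return qq * aa + qa * aq                   # one simple combination

K = np.array([[cross_pair_kernel(i, j) for j in range(len(pairs))]
              for i in range(len(pairs))])
print(K.round(3))  # kernel matrix usable with e.g. SVC(kernel="precomputed")
```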

DFT-based Transformation Invariant Pooling Layer for Visual Classification

Title DFT-based Transformation Invariant Pooling Layer for Visual Classification
Authors Jongbin Ryu, Ming-Hsuan Yang, Jongwoo Lim
Abstract We propose a novel discrete Fourier transform-based pooling layer for convolutional neural networks. DFT magnitude pooling replaces the traditional max/average pooling layer between the convolution and fully connected layers to retain translation invariance and shape-preserving (shape-difference-aware) properties, based on the shift theorem of the Fourier transform. Thanks to its ability to handle image misalignment while keeping important structural information in the pooling stage, DFT magnitude pooling improves classification accuracy significantly. In addition, we propose the DFT+ method for ensemble networks using the middle convolution layer outputs. The proposed methods are extensively evaluated on various classification tasks using the ImageNet, CUB 2010-2011, MIT Indoors, Caltech 101, FMD and DTD datasets. AlexNet, VGG-VD 16, Inception-v3, and ResNet are used as the base networks on which the DFT and DFT+ methods are implemented. Experimental results show that the proposed methods improve classification performance on all networks and datasets.
Tasks
Published 2018-09-01
URL http://openaccess.thecvf.com/content_ECCV_2018/html/Jongbin_Ryu_DFT-based_Transformation_Invariant_ECCV_2018_paper.html
PDF http://openaccess.thecvf.com/content_ECCV_2018/papers/Jongbin_Ryu_DFT-based_Transformation_Invariant_ECCV_2018_paper.pdf
PWC https://paperswithcode.com/paper/dft-based-transformation-invariant-pooling
Repo
Framework
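
A minimal PyTorch sketch of DFT magnitude pooling as described above: by the shift theorem, the 2-D DFT magnitude is invariant to circular translations, so keeping a low-frequency crop of magnitudes replaces max/average pooling while preserving coarse shape information. The crop size is an assumption, not the paper's exact configuration:

```python
import torch

def dft_magnitude_pool(feat: torch.Tensor, keep: int = 3) -> torch.Tensor:
    """feat: (N, C, H, W) conv features -> (N, C * keep * keep) vector."""
    spec = torch.fft.fft2(feat)       # complex spectrum per channel
    mag = spec.abs()                  # translation-invariant magnitudes
    low = mag[..., :keep, :keep]      # low-frequency block (incl. DC)
    return low.flatten(1)

x = torch.randn(2, 8, 14, 14)
shifted = torch.roll(x, shifts=(3, 5), dims=(2, 3))  # circular shift
print(torch.allclose(dft_magnitude_pool(x), dft_magnitude_pool(shifted),
                     atol=1e-4))
# True: the pooled descriptor is unchanged under circular translation.
```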

Scale-Awareness of Light Field Camera based Visual Odometry

Title Scale-Awareness of Light Field Camera based Visual Odometry
Authors Niclas Zeller, Franz Quint, Uwe Stilla
Abstract We propose a novel direct visual odometry algorithm for micro-lens-array-based light field cameras. The algorithm calculates a detailed, semi-dense 3D point cloud of its environment. This is achieved by establishing probabilistic depth hypotheses based on stereo observations between the micro images of different recordings. Tracking is performed in a coarse-to-fine process, working directly on the recorded raw images. The tracking accounts for changing lighting conditions and utilizes a linear motion model to be more robust. A novel scale optimization framework is proposed. It estimates the scene scale on the basis of keyframes, and optimizes the scale of the entire trajectory by filtering over multiple estimates. The method is tested on a versatile dataset consisting of challenging indoor and outdoor sequences and is compared to state-of-the-art monocular and stereo approaches. The algorithm recovers the absolute scale of the scene and significantly outperforms state-of-the-art monocular algorithms with respect to scale drift.
Tasks Visual Odometry
Published 2018-09-01
URL http://openaccess.thecvf.com/content_ECCV_2018/html/Niclas_Zeller_Scale-Awareness_of_Light_ECCV_2018_paper.html
PDF http://openaccess.thecvf.com/content_ECCV_2018/papers/Niclas_Zeller_Scale-Awareness_of_Light_ECCV_2018_paper.pdf
PWC https://paperswithcode.com/paper/scale-awareness-of-light-field-camera-based
Repo
Framework
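
A hedged numpy sketch of one ingredient described above: fusing several noisy per-keyframe scale estimates into a single trajectory scale by filtering. Inverse-variance weighting in log-scale space (scale being multiplicative) is an assumption standing in for the paper's actual optimization framework:

```python
import numpy as np

def fuse_scales(estimates, variances):
    """estimates: per-keyframe scales s_i > 0; variances: their
    uncertainties in log-scale. Returns fused scale and its variance."""
    log_s = np.log(np.asarray(estimates))
    w = 1.0 / np.asarray(variances)
    fused_log = np.sum(w * log_s) / np.sum(w)   # precision-weighted mean
    return float(np.exp(fused_log)), float(1.0 / np.sum(w))

scale, var = fuse_scales([0.98, 1.05, 1.02, 0.40], [0.01, 0.02, 0.01, 1.0])
print(scale, var)  # the outlier (0.40) is down-weighted by its large variance
```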

Transfer and Multi-Task Learning for Noun–Noun Compound Interpretation

Title Transfer and Multi-Task Learning for Noun–Noun Compound Interpretation
Authors Murhaf Fares, Stephan Oepen, Erik Velldal
Abstract In this paper, we empirically evaluate the utility of transfer and multi-task learning on a challenging semantic classification task: semantic interpretation of noun{–}noun compounds. Through a comprehensive series of experiments and in-depth error analysis, we show that transfer learning via parameter initialization and multi-task learning via parameter sharing can help a neural classification model generalize over a highly skewed distribution of relations. Further, we demonstrate how dual annotation with two distinct sets of relations over the same set of compounds can be exploited to improve the overall accuracy of a neural classifier and its F1 scores on the less frequent, but more difficult relations.
Tasks Information Retrieval, Multi-Task Learning, Question Answering, Transfer Learning, Word Embeddings
Published 2018-10-01
URL https://www.aclweb.org/anthology/D18-1178/
PDF https://www.aclweb.org/anthology/D18-1178
PWC https://paperswithcode.com/paper/transfer-and-multi-task-learning-for
Repo
Framework
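
A minimal PyTorch sketch of multi-task learning via parameter sharing as described above: one shared encoder over the two constituent noun embeddings and one classification head per relation inventory. All dimensions and the simple summing of the two losses are illustrative assumptions:

```python
import torch
import torch.nn as nn

class SharedCompoundClassifier(nn.Module):
    def __init__(self, emb_dim=100, hidden=64, n_rel_a=12, n_rel_b=25):
        super().__init__()
        # Shared encoder: consumes the two constituent noun embeddings.
        self.encoder = nn.Sequential(nn.Linear(2 * emb_dim, hidden), nn.ReLU())
        self.head_a = nn.Linear(hidden, n_rel_a)  # task A: relation set 1
        self.head_b = nn.Linear(hidden, n_rel_b)  # task B: relation set 2

    def forward(self, head_noun, modifier):
        h = self.encoder(torch.cat([head_noun, modifier], dim=-1))
        return self.head_a(h), self.head_b(h)

model = SharedCompoundClassifier()
e1, e2 = torch.randn(4, 100), torch.randn(4, 100)
logits_a, logits_b = model(e1, e2)
loss = (nn.functional.cross_entropy(logits_a, torch.randint(0, 12, (4,)))
        + nn.functional.cross_entropy(logits_b, torch.randint(0, 25, (4,))))
loss.backward()  # gradients flow into the shared encoder from both tasks
```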

Uniform Information Density Effects on Syntactic Choice in Hindi

Title Uniform Information Density Effects on Syntactic Choice in Hindi
Authors Ayush Jain, Vishal Singh, Sidharth Ranjan, Rajakrishnan Rajkumar, Sumeet Agarwal
Abstract According to the Uniform Information Density (UID) hypothesis (Levy and Jaeger, 2007; Jaeger, 2010), speakers tend to distribute information density uniformly across the signal while producing language. The prior works cited above studied syntactic reduction in language production at particular choice points in a sentence. In contrast, we use a variant of the UID hypothesis to investigate the extent to which word order choices in Hindi are influenced by the drive to minimize the variance of information across entire sentences. To this end, we propose multiple lexical and syntactic measures (at both word and constituent levels) to capture the uniform spread of information across a sentence. Subsequently, we incorporate these measures into machine learning models aimed at distinguishing between a naturally occurring corpus sentence and its grammatical variants (expressing the same idea). Our results indicate that our UID measures are not a significant factor in predicting the corpus sentence in the presence of lexical surprisal, a competing control predictor. Finally, in light of other recent work, we conclude with a discussion of reasons why UID is not suitable for a theory of word order.
Tasks
Published 2018-08-01
URL https://www.aclweb.org/anthology/W18-4605/
PDF https://www.aclweb.org/anthology/W18-4605
PWC https://paperswithcode.com/paper/uniform-information-density-effects-on
Repo
Framework
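
A small sketch of a sentence-level UID measure of the kind the paper proposes: the variance of per-word surprisal (-log2 p) across a sentence. The probabilities below are made up for illustration; in the paper they would come from a language model:

```python
import math

def uid_variance(word_probs):
    """Lower variance = information spread more uniformly across the sentence."""
    surprisals = [-math.log2(p) for p in word_probs]
    mean = sum(surprisals) / len(surprisals)
    return sum((s - mean) ** 2 for s in surprisals) / len(surprisals)

uniform_sentence = [0.1, 0.12, 0.09, 0.11]   # evenly informative words
spiky_sentence = [0.5, 0.5, 0.001, 0.6]      # one highly surprising word
print(uid_variance(uniform_sentence) < uid_variance(spiky_sentence))  # True
```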

DIDEC: The Dutch Image Description and Eye-tracking Corpus

Title DIDEC: The Dutch Image Description and Eye-tracking Corpus
Authors Emiel van Miltenburg, Ákos Kádár, Ruud Koolen, Emiel Krahmer
Abstract We present a corpus of spoken Dutch image descriptions, paired with two sets of eye-tracking data: Free viewing, where participants look at images without any particular purpose, and Description viewing, where we track eye movements while participants produce spoken descriptions of the images they are viewing. This paper describes the data collection procedure and the corpus itself, and provides an initial analysis of self-corrections in image descriptions. We also present two studies showing the potential of this data. Though these studies mainly serve as an example, we do find two interesting results: (1) the eye-tracking data for the description viewing task is more coherent than for the free-viewing task; (2) variation in image descriptions (also called 'image specificity'; Jas and Parikh, 2015) is only moderately correlated across different languages. Our corpus can be used to gain a deeper understanding of the image description task, particularly how visual attention is correlated with the image description process.
Tasks Eye Tracking
Published 2018-08-01
URL https://www.aclweb.org/anthology/C18-1310/
PDF https://www.aclweb.org/anthology/C18-1310
PWC https://paperswithcode.com/paper/didec-the-dutch-image-description-and-eye
Repo
Framework
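
A hedged sketch of the "image specificity" analysis mentioned above (Jas and Parikh, 2015): specificity as the mean pairwise similarity among the descriptions produced for one image. TF-IDF cosine is an illustrative stand-in for the similarity used in the actual study:

```python
from itertools import combinations
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def image_specificity(descriptions):
    vecs = TfidfVectorizer().fit_transform(descriptions)
    sims = [cosine_similarity(vecs[i], vecs[j])[0, 0]
            for i, j in combinations(range(len(descriptions)), 2)]
    return sum(sims) / len(sims)

specific = ["a red tram in an empty street", "a red tram on a street",
            "red tram driving down the street"]
unspecific = ["a busy market", "people shopping outside", "a sunny day in town"]
print(image_specificity(specific) > image_specificity(unspecific))  # True
```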

Deep Hashing via Discrepancy Minimization

Title Deep Hashing via Discrepancy Minimization
Authors Zhixiang Chen, Xin Yuan, Jiwen Lu, Qi Tian, Jie Zhou
Abstract This paper presents a discrepancy minimizing model to address the discrete optimization problem in hashing learning. The discrete optimization introduced by the binary constraint is an NP-hard mixed-integer programming problem. It is usually addressed by relaxing the binary variables into continuous variables to suit the gradient-based learning of hashing functions, especially the training of deep neural networks. To deal with the objective discrepancy caused by relaxation, we transform the original binary optimization into a differentiable optimization problem over hash functions through series expansion. This transformation decouples the binary constraint from the optimization of the similarity-preserving hashing function. The transformed objective is optimized in a tractable alternating optimization framework with gradual discrepancy minimization. Extensive experimental results on three benchmark datasets validate the efficacy of the proposed discrepancy minimizing hashing.
Tasks
Published 2018-06-01
URL http://openaccess.thecvf.com/content_cvpr_2018/html/Chen_Deep_Hashing_via_CVPR_2018_paper.html
PDF http://openaccess.thecvf.com/content_cvpr_2018/papers/Chen_Deep_Hashing_via_CVPR_2018_paper.pdf
PWC https://paperswithcode.com/paper/deep-hashing-via-discrepancy-minimization
Repo
Framework
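
A hedged PyTorch sketch of the relaxation problem discussed above: sign() has zero gradient almost everywhere, so training substitutes a smooth surrogate whose discrepancy from the binary codes is driven down during optimization. tanh with an annealed sharpness is a common stand-in; the paper's own construction is a series expansion over hash functions, which this does not reproduce:

```python
import torch

def soft_hash(x: torch.Tensor, beta: float) -> torch.Tensor:
    """Smooth surrogate for binary codes sign(x); -> {-1, +1} as beta grows."""
    return torch.tanh(beta * x)

x = torch.randn(4, 8)
for beta in [1.0, 5.0, 25.0]:  # gradually sharpen the relaxation
    gap = (soft_hash(x, beta) - torch.sign(x)).abs().max()
    print(f"beta={beta:5.1f}  max discrepancy to sign(x): {gap:.4f}")
```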

DetNet: Design Backbone for Object Detection

Title DetNet: Design Backbone for Object Detection
Authors Zeming Li, Chao Peng, Gang Yu, Xiangyu Zhang, Yangdong Deng, Jian Sun
Abstract Recent CNN-based object detectors, whether one-stage methods like YOLO, SSD, and RetinaNet or two-stage detectors like Faster R-CNN, R-FCN, and FPN, usually finetune directly from ImageNet pre-trained models designed for image classification. However, there has been little work on backbone feature extractors designed specifically for object detection. More importantly, there are several differences between the tasks of image classification and object detection. (1) Recent object detectors like FPN and RetinaNet involve extra stages beyond those used for image classification in order to handle objects at various scales. (2) Object detection needs not only to recognize the categories of object instances but also to localize them spatially. Large downsampling factors yield a large valid receptive field, which is good for image classification but compromises object localization. Given this gap between image classification and object detection, we propose DetNet, a novel backbone network designed specifically for object detection. DetNet includes extra stages relative to a traditional classification backbone, yet maintains high spatial resolution in its deeper layers. Without any bells and whistles, state-of-the-art results are obtained for both object detection and instance segmentation on the MSCOCO benchmark with our DetNet (4.8G FLOPs) backbone. Code will be released.
Tasks Image Classification, Instance Segmentation, Object Detection, Semantic Segmentation
Published 2018-09-01
URL http://openaccess.thecvf.com/content_ECCV_2018/html/Zeming_Li_DetNet_Design_Backbone_ECCV_2018_paper.html
PDF http://openaccess.thecvf.com/content_ECCV_2018/papers/Zeming_Li_DetNet_Design_Backbone_ECCV_2018_paper.pdf
PWC https://paperswithcode.com/paper/detnet-design-backbone-for-object-detection
Repo
Framework
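
A minimal PyTorch sketch of the design idea above: an extra backbone stage that enlarges the receptive field with dilated convolutions instead of further downsampling, so deep layers keep high spatial resolution for localization. Channel sizes and the dilation rate are illustrative, not DetNet's exact configuration:

```python
import torch
import torch.nn as nn

class DilatedBottleneck(nn.Module):
    def __init__(self, channels=256, dilation=2):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels // 4, 1, bias=False),
            nn.BatchNorm2d(channels // 4), nn.ReLU(inplace=True),
            # Dilated 3x3: bigger receptive field, stride stays 1.
            nn.Conv2d(channels // 4, channels // 4, 3, padding=dilation,
                      dilation=dilation, bias=False),
            nn.BatchNorm2d(channels // 4), nn.ReLU(inplace=True),
            nn.Conv2d(channels // 4, channels, 1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(x + self.body(x))  # residual, spatial size unchanged

stage = DilatedBottleneck()
x = torch.randn(1, 256, 50, 50)
print(stage(x).shape)  # torch.Size([1, 256, 50, 50]) -- no extra downsampling
```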

An information-theoretic analysis of deep latent-variable models

Title An information-theoretic analysis of deep latent-variable models
Authors Alex Alemi, Ben Poole, Ian Fischer, Josh Dillon, Rif A. Saurous, Kevin Murphy
Abstract We present an information-theoretic framework for understanding trade-offs in the unsupervised learning of deep latent-variable models using variational inference. This framework emphasizes the need to consider latent-variable models along two dimensions: the ability to reconstruct inputs (distortion) and the communication cost (rate). We derive the optimal frontier of generative models in the two-dimensional rate-distortion plane, and show how the standard evidence lower bound objective is insufficient to select between points along this frontier. However, by performing targeted optimization to learn generative models with different rates, we are able to learn many models that achieve similar generative performance but make vastly different trade-offs in their usage of the latent variable. Through experiments on MNIST and Omniglot with a variety of architectures, we show how our framework sheds light on many recently proposed extensions to the variational autoencoder family.
Tasks Latent Variable Models, Omniglot
Published 2018-01-01
URL https://openreview.net/forum?id=H1rRWl-Cb
PDF https://openreview.net/pdf?id=H1rRWl-Cb
PWC https://paperswithcode.com/paper/an-information-theoretic-analysis-of-deep
Repo
Framework
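
A small PyTorch sketch of the rate-distortion view above: training with distortion + β·rate instead of the plain ELBO (β = 1), and sweeping β to reach different points on the frontier. The Gaussian encoder and Bernoulli decoder shapes here are illustrative assumptions:

```python
import torch

def rate_distortion_loss(x, x_recon_logits, mu, logvar, beta):
    # Distortion: reconstruction negative log-likelihood (Bernoulli decoder).
    distortion = torch.nn.functional.binary_cross_entropy_with_logits(
        x_recon_logits, x, reduction="sum") / x.shape[0]
    # Rate: KL(q(z|x) || N(0, I)), the cost of communicating the code.
    rate = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp()) / x.shape[0]
    return distortion + beta * rate, distortion, rate

x = torch.rand(16, 784).round()          # fake binarized MNIST batch
mu, logvar = torch.randn(16, 8), torch.randn(16, 8)
x_logits = torch.randn(16, 784)
for beta in [0.1, 1.0, 10.0]:            # low beta favors low distortion,
    loss, d, r = rate_distortion_loss(x, x_logits, mu, logvar, beta)
    print(f"beta={beta:4.1f} loss={loss:.1f} distortion={d:.1f} rate={r:.1f}")
```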

Wavelet Pooling for Convolutional Neural Networks

Title Wavelet Pooling for Convolutional Neural Networks
Authors Travis Williams, Robert Li
Abstract Convolutional Neural Networks continuously advance the progress of 2D and 3D image and object classification. The steadfast use of this algorithm requires constant evaluation and upgrading of foundational concepts to maintain progress. Network regularization techniques typically focus on convolutional layer operations, while leaving pooling layer operations without suitable options. We introduce wavelet pooling as an alternative to traditional neighborhood pooling. The method performs a second-level wavelet decomposition of the features and discards the first-level subbands to reduce feature dimensions. This addresses the overfitting problem encountered by max pooling, while reducing features in a more structurally compact manner than pooling over neighborhood regions. Experimental results on four benchmark classification datasets demonstrate that our proposed method outperforms or performs comparably to max, mean, mixed, and stochastic pooling.
Tasks Object Classification
Published 2018-01-01
URL https://openreview.net/forum?id=rkhlb8lCZ
PDF https://openreview.net/pdf?id=rkhlb8lCZ
PWC https://paperswithcode.com/paper/wavelet-pooling-for-convolutional-neural
Repo
Framework
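
A hedged numpy/PyWavelets sketch of the pooling step as described above: take a two-level 2-D discrete wavelet transform, discard the first-level detail subbands, and reconstruct one level back, halving the spatial size like a stride-2 pool. Haar is an assumed wavelet choice:

```python
import numpy as np
import pywt

def wavelet_pool(feature_map: np.ndarray) -> np.ndarray:
    """(H, W) -> (H/2, W/2): structure-preserving alternative to max pooling."""
    ll1, details1 = pywt.dwt2(feature_map, "haar")  # first-level decomposition
    ll2, details2 = pywt.dwt2(ll1, "haar")          # second-level decomposition
    # Keep only the second-level subbands (details1 is discarded) and
    # reconstruct the half-resolution map.
    return pywt.idwt2((ll2, details2), "haar")

x = np.random.rand(8, 8)
print(wavelet_pool(x).shape)  # (4, 4)
```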

Unsupervised Induction of Linguistic Categories with Records of Reading, Speaking, and Writing

Title Unsupervised Induction of Linguistic Categories with Records of Reading, Speaking, and Writing
Authors Maria Barrett, Ana Valeria González-Garduño, Lea Frermann, Anders Søgaard
Abstract When learning POS taggers and syntactic chunkers for low-resource languages, different resources may be available, and often all we have is a small tag dictionary, motivating type-constrained unsupervised induction. Even small dictionaries can improve the performance of unsupervised induction algorithms. This paper shows that performance can be further improved by including data that is readily available or can be easily obtained for most languages, i.e., eye-tracking, speech, or keystroke logs (or any combination thereof). We project information from all these data sources into shared spaces, in which the union of words is represented. For English unsupervised POS induction, the additional information, which is not required at test time, leads to an average error reduction on OntoNotes domains of 1.5% over systems augmented with state-of-the-art word embeddings. On the Penn Treebank the best model achieves a 5.4% error reduction over a word embeddings baseline. We also achieve significant improvements for syntactic chunk induction. Our analysis shows that improvements are even bigger when the available tag dictionaries are smaller.
Tasks Chunking, Eye Tracking, Word Embeddings
Published 2018-06-01
URL https://www.aclweb.org/anthology/N18-1184/
PDF https://www.aclweb.org/anthology/N18-1184
PWC https://paperswithcode.com/paper/unsupervised-induction-of-linguistic
Repo
Framework
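
A hedged sketch of the "shared space" idea above: project two word-level views (e.g., pre-trained embeddings and averaged gaze features) into a common space with CCA, so words from either source become comparable. Random matrices stand in for the real eye-tracking and embedding data:

```python
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
n_words = 200
embeddings = rng.normal(size=(n_words, 50))   # view 1: word embeddings
# view 2: gaze features, here synthetically correlated with view 1
gaze = embeddings[:, :10] + 0.1 * rng.normal(size=(n_words, 10))

cca = CCA(n_components=5).fit(embeddings, gaze)
shared_emb, shared_gaze = cca.transform(embeddings, gaze)
# Both views now live in one 5-D space; type-level features for POS
# induction can be read off either projection for the union of words.
print(shared_emb.shape, shared_gaze.shape)  # (200, 5) (200, 5)
```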

Weighting Model Based on Group Dynamics to Measure Convergence in Multi-party Dialogue

Title Weighting Model Based on Group Dynamics to Measure Convergence in Multi-party Dialogue
Authors Zahra Rahimi, Diane Litman
Abstract This paper proposes a new weighting method for extending a dyad-level measure of convergence to multi-party dialogues by considering group dynamics instead of simply averaging. Experiments indicate the usefulness of the proposed weighted measure and also show that in general a proper weighting of the dyad-level measures performs better than non-weighted averaging in multiple tasks.
Tasks
Published 2018-07-01
URL https://www.aclweb.org/anthology/W18-5046/
PDF https://www.aclweb.org/anthology/W18-5046
PWC https://paperswithcode.com/paper/weighting-model-based-on-group-dynamics-to
Repo
Framework
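
A hedged numpy sketch of the contrast drawn above: extending a dyad-level convergence score to a group by a weighted rather than uniform average. Weighting dyads by how much each pair actually interacts is an illustrative choice, not necessarily the paper's group-dynamics weighting:

```python
import numpy as np

dyad_convergence = np.array([0.8, 0.1, 0.2])   # convergence per speaker pair
exchange_counts = np.array([50, 3, 5])         # how often each pair interacts

uniform = dyad_convergence.mean()
weights = exchange_counts / exchange_counts.sum()
weighted = np.sum(weights * dyad_convergence)  # active dyads dominate the score
print(f"uniform={uniform:.3f} weighted={weighted:.3f}")
```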

Gaze Prediction in Dynamic 360° Immersive Videos

Title Gaze Prediction in Dynamic 360° Immersive Videos
Authors Yanyu Xu, Yanbing Dong, Junru Wu, Zhengzhong Sun, Zhiru Shi, Jingyi Yu, Shenghua Gao
Abstract This paper explores gaze prediction in dynamic 360° immersive videos, i.e., based on the history scan path and VR content, we predict where a viewer will look at an upcoming time. To tackle this problem, we first present a large-scale eye-tracking dataset for dynamic VR scenes. Our dataset contains 208 360° videos captured in dynamic scenes, and each video is viewed by at least 31 subjects. Our analysis shows that gaze prediction depends on the history scan path and the image content. In terms of image content, salient objects easily attract viewers' attention, and saliency is related to both the appearance and the motion of objects. Considering that saliency measured at different scales differs, we propose to compute saliency maps at three spatial scales: the sub-image patch centered at the current gaze point, the sub-image corresponding to the Field of View (FoV), and the panorama image. We then feed both the saliency maps and the corresponding images into a Convolutional Neural Network (CNN) for feature extraction. Meanwhile, we use a Long Short-Term Memory (LSTM) network to encode the history scan path. We then combine the CNN and LSTM features to predict the gaze displacement between the gaze point at the current time and the gaze point at an upcoming time. Extensive experiments validate the effectiveness of our method for gaze prediction in dynamic VR scenes.
Tasks Eye Tracking, Gaze Prediction
Published 2018-06-01
URL http://openaccess.thecvf.com/content_cvpr_2018/html/Xu_Gaze_Prediction_in_CVPR_2018_paper.html
PDF http://openaccess.thecvf.com/content_cvpr_2018/papers/Xu_Gaze_Prediction_in_CVPR_2018_paper.pdf
PWC https://paperswithcode.com/paper/gaze-prediction-in-dynamic-360a-immersive
Repo
Framework
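
A minimal PyTorch sketch of the architecture outlined above: a small CNN encodes image/saliency inputs, an LSTM encodes the scan-path history, and their concatenation predicts the 2-D gaze displacement. All layer sizes are illustrative, and the paper's multi-scale saliency pipeline is omitted:

```python
import torch
import torch.nn as nn

class GazeDisplacementNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.cnn = nn.Sequential(                    # content encoder
            nn.Conv2d(4, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.lstm = nn.LSTM(input_size=2, hidden_size=32, batch_first=True)
        self.head = nn.Linear(32 + 32, 2)            # (dx, dy) displacement

    def forward(self, image_and_saliency, scan_path):
        visual = self.cnn(image_and_saliency)        # (N, 32)
        _, (h, _) = self.lstm(scan_path)             # history of gaze points
        return self.head(torch.cat([visual, h[-1]], dim=-1))

net = GazeDisplacementNet()
frames = torch.randn(2, 4, 64, 64)   # e.g. RGB + one saliency channel
history = torch.randn(2, 10, 2)      # last 10 gaze points (x, y)
print(net(frames, history).shape)    # torch.Size([2, 2])
```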

Multi-Scale Context Intertwining for Semantic Segmentation

Title Multi-Scale Context Intertwining for Semantic Segmentation
Authors Di Lin, Yuanfeng Ji, Dani Lischinski, Daniel Cohen-Or, Hui Huang
Abstract Accurate semantic image segmentation requires the joint consideration of local appearance, semantic information, and global scene context. In today's age of pre-trained deep networks and their powerful convolutional features, state-of-the-art semantic segmentation approaches differ mostly in how they choose to combine these different kinds of information. In this work, we propose a novel scheme for aggregating features from different scales, which we refer to as Multi-Scale Context Intertwining (MSCI). In contrast to previous approaches, which typically propagate information between scales in a one-directional manner, we merge pairs of feature maps in a bidirectional and recurrent fashion, via connections between two LSTM chains. By training the parameters of the LSTM units on the segmentation task, the above approach learns how to extract powerful and effective features for pixel-level semantic segmentation, which are then combined hierarchically. Furthermore, rather than using fixed information propagation routes, we subdivide images into super-pixels, and use the spatial relationship between them in order to perform image-adapted context aggregation. Our extensive evaluation on public benchmarks indicates that all of the aforementioned components of our approach increase the effectiveness of information propagation throughout the network, and significantly improve its eventual segmentation accuracy.
Tasks Semantic Segmentation
Published 2018-09-01
URL http://openaccess.thecvf.com/content_ECCV_2018/html/Di_Lin_Multi-Scale_Context_Intertwining_ECCV_2018_paper.html
PDF http://openaccess.thecvf.com/content_ECCV_2018/papers/Di_Lin_Multi-Scale_Context_Intertwining_ECCV_2018_paper.pdf
PWC https://paperswithcode.com/paper/multi-scale-context-intertwining-for-semantic
Repo
Framework
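
A hedged PyTorch sketch of the bidirectional idea above: two feature maps at different scales repeatedly exchange information (here via resize plus 1x1 conv fusion) instead of a one-directional pass. The paper realizes the exchange with connections between two LSTM chains, which this greatly simplifies:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoScaleIntertwine(nn.Module):
    def __init__(self, channels=32, steps=2):
        super().__init__()
        self.steps = steps
        self.fuse_fine = nn.Conv2d(2 * channels, channels, 1)    # coarse -> fine
        self.fuse_coarse = nn.Conv2d(2 * channels, channels, 1)  # fine -> coarse

    def forward(self, fine, coarse):
        for _ in range(self.steps):  # recurrent, bidirectional exchange
            up = F.interpolate(coarse, size=fine.shape[-2:], mode="bilinear",
                               align_corners=False)
            down = F.adaptive_avg_pool2d(fine, coarse.shape[-2:])
            fine = self.fuse_fine(torch.cat([fine, up], dim=1))
            coarse = self.fuse_coarse(torch.cat([coarse, down], dim=1))
        return fine, coarse

m = TwoScaleIntertwine()
f, c = m(torch.randn(1, 32, 64, 64), torch.randn(1, 32, 32, 32))
print(f.shape, c.shape)  # both scales updated with each other's context
```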