Paper Group ANR 956
Hyperbolic Deep Learning for Chinese Natural Language Understanding. Domain Confusion with Self Ensembling for Unsupervised Adaptation. Social Media Analysis For Organizations: US Northeastern Public And State Libraries Case Study. RIFT: Multi-modal Image Matching Based on Radiation-invariant Feature Transform. Semi-supervised classification by reaching consensus among modalities. Negative Binomial Matrix Factorization for Recommender Systems. TGANv2: Efficient Training of Large Models for Video Generation with Multiple Subsampling Layers. Fully Automatic Segmentation of Sublingual Veins from Retrained U-Net Model for Few Near Infrared Images. Kinematic Morphing Networks for Manipulation Skill Transfer. 3D Human Pose Estimation with Relational Networks. A Unified Framework for Training Neural Networks. PM-GANs: Discriminative Representation Learning for Action Recognition Using Partial-modalities. Tight Query Complexity Lower Bounds for PCA via Finite Sample Deformed Wigner Law. Scalable Simple Linear Iterative Clustering (SSLIC) Using a Generic and Parallel Approach. Explainable Recommendation via Multi-Task Learning in Opinionated Text Data.
Hyperbolic Deep Learning for Chinese Natural Language Understanding
Title | Hyperbolic Deep Learning for Chinese Natural Language Understanding |
Authors | Marko Valentin Micic, Hugo Chu |
Abstract | Recently hyperbolic geometry has proven to be effective in building embeddings that encode hierarchical and entailment information. This makes it particularly suited to modelling the complex asymmetrical relationships between Chinese characters and words. In this paper we first train a large-scale hyperboloid skip-gram model on a Chinese corpus, then apply the character embeddings to a downstream hyperbolic Transformer model derived from the principles of gyrovector spaces for the Poincaré disk model. In our experiments the character-based Transformer outperformed its word-based Euclidean equivalent. To the best of our knowledge, this is the first time in Chinese NLP that a character-based model has outperformed its word-based counterpart, allowing the circumvention of the challenging and domain-dependent task of Chinese Word Segmentation (CWS). |
Tasks | Chinese Word Segmentation |
Published | 2018-12-11 |
URL | http://arxiv.org/abs/1812.10408v1 |
http://arxiv.org/pdf/1812.10408v1.pdf | |
PWC | https://paperswithcode.com/paper/hyperbolic-deep-learning-for-chinese-natural |
Repo | |
Framework | |
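For readers unfamiliar with the geometry behind this entry, the building blocks of Poincaré-ball models are Möbius addition (the gyrovector-space analogue of vector addition) and the hyperbolic geodesic distance. The sketch below implements these two standard formulas in NumPy for the unit ball; it is a minimal illustration of the geometry, not the authors' hyperboloid skip-gram or Transformer code.

```python
import numpy as np

def mobius_add(x, y):
    """Mobius addition on the Poincare unit ball (curvature c = 1)."""
    xy, x2, y2 = np.dot(x, y), np.dot(x, x), np.dot(y, y)
    num = (1 + 2 * xy + y2) * x + (1 - x2) * y
    den = 1 + 2 * xy + x2 * y2
    return num / den

def poincare_dist(x, y):
    """Geodesic distance between two points strictly inside the unit ball."""
    diff2 = np.dot(x - y, x - y)
    denom = (1 - np.dot(x, x)) * (1 - np.dot(y, y))
    return np.arccosh(1 + 2 * diff2 / denom)

u, v = np.array([0.1, 0.2]), np.array([-0.3, 0.05])
print(mobius_add(u, v), poincare_dist(u, v))
```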
Domain Confusion with Self Ensembling for Unsupervised Adaptation
Title | Domain Confusion with Self Ensembling for Unsupervised Adaptation |
Authors | Jiawei Wang, Zhaoshui He, Chengjian Feng, Zhouping Zhu, Qinzhuang Lin, Jun Lv, Shengli Xie |
Abstract | Data collection and annotation are time-consuming in machine learning, especially for large-scale problems. A common approach to this problem is to transfer knowledge from a related labeled domain to the target one. There are two popular ways to achieve this goal: adversarial learning and self-training. In this article, we first analyze the training instability problem and the mistaken confusion issue in the adversarial learning process. Then, inspired by domain confusion and self-ensembling methods, we propose a combined model that learns a jointly feature- and class-invariant representation, namely Domain Confusion with Self Ensembling (DCSE). Experiments verify that the proposed approach offers better performance than existing methods on a variety of unsupervised domain adaptation benchmarks. |
Tasks | Domain Adaptation, Unsupervised Domain Adaptation |
Published | 2018-10-10 |
URL | http://arxiv.org/abs/1810.04472v1 |
http://arxiv.org/pdf/1810.04472v1.pdf | |
PWC | https://paperswithcode.com/paper/domain-confusion-with-self-ensembling-for |
Repo | |
Framework | |
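The abstract combines two well-known ingredients: an adversarial domain-confusion term and a self-ensembling (EMA teacher) consistency term. The sketch below shows one plausible way to assemble such an objective in PyTorch; the `student`, `teacher`, and `disc` interfaces and the equal loss weighting are assumptions for illustration, not the DCSE implementation.

```python
import torch
import torch.nn.functional as F

def dcse_losses(student, teacher, disc, x_src, y_src, x_tgt):
    """student/teacher return (features, logits); disc maps features to a domain logit."""
    f_src, logits_src = student(x_src)
    f_tgt, logits_tgt = student(x_tgt)

    # Supervised classification loss on the labeled source domain.
    cls = F.cross_entropy(logits_src, y_src)

    # Domain confusion: push the domain discriminator toward maximal uncertainty (0.5)
    # so that the extracted features become domain-invariant.
    d_src, d_tgt = disc(f_src), disc(f_tgt)
    conf = (F.binary_cross_entropy_with_logits(d_src, torch.full_like(d_src, 0.5)) +
            F.binary_cross_entropy_with_logits(d_tgt, torch.full_like(d_tgt, 0.5)))

    # Self-ensembling: consistency between student and EMA-teacher predictions on target data.
    with torch.no_grad():
        _, t_logits = teacher(x_tgt)
    cons = F.mse_loss(F.softmax(logits_tgt, dim=1), F.softmax(t_logits, dim=1))

    return cls + conf + cons

@torch.no_grad()
def ema_update(teacher, student, alpha=0.999):
    # Exponential moving average of the student weights; call after each optimizer step.
    for p_t, p_s in zip(teacher.parameters(), student.parameters()):
        p_t.mul_(alpha).add_(p_s, alpha=1 - alpha)
```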
Social Media Analysis For Organizations: US Northeastern Public And State Libraries Case Study
Title | Social Media Analysis For Organizations: US Northeastern Public And State Libraries Case Study |
Authors | Matthew Collins, Amir Karami |
Abstract | Social networking sites such as Twitter have provided a great opportunity for organizations such as public libraries to disseminate information for public relations purposes. However, there is a need to analyze vast amounts of social media data. This study presents a computational approach to explore the content of tweets posted by nine public libraries in the northeastern United States of America. In December 2017, this study extracted more than 19,000 tweets from the Twitter accounts of seven state libraries and two urban public libraries. Computational methods were applied to collect the tweets and discover meaningful themes. This paper shows how the libraries have used Twitter to represent their services and provides a starting point for different organizations to evaluate the themes of their public tweets. |
Tasks | |
Published | 2018-03-24 |
URL | http://arxiv.org/abs/1803.09133v1 |
http://arxiv.org/pdf/1803.09133v1.pdf | |
PWC | https://paperswithcode.com/paper/social-media-analysis-for-organizations-us |
Repo | |
Framework | |
RIFT: Multi-modal Image Matching Based on Radiation-invariant Feature Transform
Title | RIFT: Multi-modal Image Matching Based on Radiation-invariant Feature Transform |
Authors | Jiayuan Li, Qingwu Hu, Mingyao Ai |
Abstract | Traditional feature matching methods such as the scale-invariant feature transform (SIFT) usually use image intensity or gradient information to detect and describe feature points; however, both intensity and gradient are sensitive to nonlinear radiation distortions (NRD). To solve this problem, this paper proposes a novel feature matching algorithm that is robust to large NRD. The proposed method is called the radiation-invariant feature transform (RIFT). There are three main contributions in RIFT: first, RIFT uses phase congruency (PC) instead of image intensity for feature point detection. RIFT considers both the number and repeatability of feature points, and detects both corner points and edge points on the PC map. Second, RIFT introduces a maximum index map (MIM) for feature description. The MIM is constructed from the log-Gabor convolution sequence and is much more robust to NRD than the traditional gradient map. Thus, RIFT not only largely improves the stability of feature detection, but also overcomes the limitation of gradient information for feature description. Third, RIFT analyzes the inherent influence of rotations on the values of the MIM and achieves rotation invariance. We use six different types of multi-modal image datasets to evaluate RIFT, including optical-optical, infrared-optical, synthetic aperture radar (SAR)-optical, depth-optical, map-optical, and day-night datasets. Experimental results show that RIFT is clearly superior to SIFT and SAR-SIFT. To the best of our knowledge, RIFT is the first feature matching algorithm that can achieve good performance on all the above-mentioned types of multi-modal images. The source code of RIFT and the multi-modal remote sensing image datasets are made public. |
Tasks | |
Published | 2018-04-25 |
URL | http://arxiv.org/abs/1804.09493v1 |
http://arxiv.org/pdf/1804.09493v1.pdf | |
PWC | https://paperswithcode.com/paper/rift-multi-modal-image-matching-based-on |
Repo | |
Framework | |
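The core descriptor idea, the maximum index map (MIM), records which filter in a log-Gabor convolution sequence responds most strongly at each pixel. The sketch below illustrates that idea with scikit-image's ordinary Gabor filters standing in for the paper's log-Gabor bank; it is a simplified illustration, not the RIFT pipeline (no phase-congruency detector and no rotation-invariance handling).

```python
import numpy as np
from skimage import data
from skimage.filters import gabor

def maximum_index_map(image, n_orient=6, frequency=0.15):
    """For each pixel, record which filter orientation responds most strongly."""
    responses = []
    for k in range(n_orient):
        theta = k * np.pi / n_orient
        real, imag = gabor(image, frequency=frequency, theta=theta)
        responses.append(np.hypot(real, imag))   # filter response magnitude per pixel
    stack = np.stack(responses, axis=0)          # (n_orient, H, W)
    return np.argmax(stack, axis=0)              # index of the strongest filter

mim = maximum_index_map(data.camera().astype(float))
print(mim.shape, mim.min(), mim.max())
```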
Semi-supervised classification by reaching consensus among modalities
Title | Semi-supervised classification by reaching consensus among modalities |
Authors | Zining Zhu, Jekaterina Novikova, Frank Rudzicz |
Abstract | Deep learning has demonstrated the ability to learn complex structures, but it can be restricted by the available data. Recently, Consensus Networks (CNs) were proposed to alleviate data sparsity by utilizing features from multiple modalities, but they too have been limited by the size of the labeled data. In this paper, we extend CNs to Transductive Consensus Networks (TCNs), suitable for semi-supervised learning. In TCNs, different modalities of input are compressed into latent representations, which we encourage to become indistinguishable during iterative adversarial training. To understand the two mechanisms of TCNs, consensus and classification, we put forward three variants in ablation studies on these mechanisms. To further investigate TCN models, we treat the latent representations as probability distributions and measure their similarities as negative relative Jensen-Shannon divergences. We show that a consensus state beneficial for classification requires a stable but imperfect similarity between the representations. Overall, TCNs outperform or match the best benchmark algorithms given 20 to 200 labeled samples on the Bank Marketing and DementiaBank datasets. |
Tasks | |
Published | 2018-05-23 |
URL | http://arxiv.org/abs/1805.09366v2 |
http://arxiv.org/pdf/1805.09366v2.pdf | |
PWC | https://paperswithcode.com/paper/semi-supervised-classification-by-reaching |
Repo | |
Framework | |
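The analysis described above compares latent representations through negative Jensen-Shannon divergences. As a reference point, the sketch below computes the plain JS divergence between two discrete distributions and negates it as a similarity score; the paper's "relative" variant and the way latent representations are turned into distributions are not reproduced here.

```python
import numpy as np

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence between two discrete probability distributions."""
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)
    kl_pm = np.sum(p * np.log((p + eps) / (m + eps)))
    kl_qm = np.sum(q * np.log((q + eps) / (m + eps)))
    return 0.5 * kl_pm + 0.5 * kl_qm

def similarity(p, q):
    """Negative JS divergence used as a similarity score between representations."""
    return -js_divergence(p, q)

print(similarity(np.array([0.2, 0.5, 0.3]), np.array([0.25, 0.45, 0.3])))
```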
Negative Binomial Matrix Factorization for Recommender Systems
Title | Negative Binomial Matrix Factorization for Recommender Systems |
Authors | Olivier Gouvert, Thomas Oberlin, Cédric Févotte |
Abstract | We introduce negative binomial matrix factorization (NBMF), a matrix factorization technique specially designed for analyzing over-dispersed count data. It can be viewed as an extension of Poisson matrix factorization (PF) perturbed by a multiplicative term which models exposure. This term brings a degree of freedom for controlling the dispersion, making NBMF more robust to outliers. We show that NBMF allows us to skip traditional pre-processing stages, such as binarization, which lead to a loss of information. Two estimation approaches are presented: maximum likelihood and variational Bayes inference. We test our model on a recommendation task and show its ability to predict user tastes with better precision than PF. |
Tasks | Recommendation Systems |
Published | 2018-01-05 |
URL | http://arxiv.org/abs/1801.01708v1 |
http://arxiv.org/pdf/1801.01708v1.pdf | |
PWC | https://paperswithcode.com/paper/negative-binomial-matrix-factorization-for |
Repo | |
Framework | |
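As a concrete anchor for the model class, the sketch below evaluates the log-likelihood of a count matrix under a negative binomial model with mean W @ H and a shared dispersion (shape) parameter, using SciPy's parameterization. It omits the paper's exposure term and both of its estimation procedures (maximum likelihood and variational Bayes); the toy data and the choice r = 1 are only for illustration.

```python
import numpy as np
from scipy.stats import nbinom

def nbmf_loglik(Y, W, H, r=1.0):
    """Negative binomial log-likelihood of counts Y with mean W @ H and shape r."""
    mu = W @ H
    p = r / (r + mu)                 # scipy's success-probability parameterization
    return nbinom.logpmf(Y, r, p).sum()

rng = np.random.default_rng(0)
U, I, K = 20, 30, 5
W, H = rng.gamma(1.0, 1.0, (U, K)), rng.gamma(1.0, 1.0, (K, I))
Y = rng.poisson(W @ H)               # toy counts, just to exercise the function
print(nbmf_loglik(Y, W, H))
```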
TGANv2: Efficient Training of Large Models for Video Generation with Multiple Subsampling Layers
Title | TGANv2: Efficient Training of Large Models for Video Generation with Multiple Subsampling Layers |
Authors | Masaki Saito, Shunta Saito |
Abstract | In this paper, we propose a novel method to efficiently train a Generative Adversarial Network (GAN) on high-dimensional samples. The key idea is to introduce a differentiable subsampling layer which appropriately reduces the dimensionality of intermediate feature maps in the generator during training. In general, generators require large memory and computational costs in the latter stages of the network as the feature maps become larger, even though the latter stages have relatively fewer parameters than the earlier stages. This makes training large models for video generation difficult under limited computational resources. We solve this problem by introducing a method that gradually reduces the dimensionality of feature maps in the generator with multiple subsampling layers. We also propose a network (Temporal GAN v2) with such layers and perform video generation experiments. As a consequence, our model trained on the UCF101 dataset at $192 \times 192$ pixels achieves an Inception Score (IS) of 24.34, which shows a significant improvement over the previous state-of-the-art score of 14.56. |
Tasks | Video Generation |
Published | 2018-11-22 |
URL | http://arxiv.org/abs/1811.09245v1 |
http://arxiv.org/pdf/1811.09245v1.pdf | |
PWC | https://paperswithcode.com/paper/tganv2-efficient-training-of-large-models-for |
Repo | |
Framework | |
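To make the subsampling idea concrete, the sketch below shows a toy PyTorch layer that keeps every stride-th frame of an (N, C, T, H, W) video tensor during training and passes the full tensor through at inference; slicing is differentiable with respect to the retained frames. TGANv2's actual layers subsample at multiple levels of the generator, which this simplified sketch does not reproduce.

```python
import torch
import torch.nn as nn

class FrameSubsample(nn.Module):
    """Keep every `stride`-th frame of a video feature map during training only."""
    def __init__(self, stride=2):
        super().__init__()
        self.stride = stride

    def forward(self, x):                                   # x: (N, C, T, H, W)
        if self.training:
            offset = int(torch.randint(self.stride, (1,)))  # random temporal phase each step
            return x[:, :, offset::self.stride]
        return x                                            # full resolution at test time

x = torch.randn(2, 8, 16, 32, 32, requires_grad=True)
y = FrameSubsample(stride=2)(x)
print(y.shape)   # torch.Size([2, 8, 8, 32, 32])
```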
Fully Automatic Segmentation of Sublingual Veins from Retrained U-Net Model for Few Near Infrared Images
Title | Fully Automatic Segmentation of Sublingual Veins from Retrained U-Net Model for Few Near Infrared Images |
Authors | Tingxiao Yang, Yuichiro Yoshimura, Akira Morita, Takao Namiki, Toshiya Nakaguchi |
Abstract | The sublingual vein is commonly used to diagnose health status, and the width of the main sublingual veins gives information about blood circulation. Therefore, it is necessary to segment the main sublingual veins from the tongue automatically. In general, datasets in the medical field are small, which is a challenge for training deep learning models. In order to train a model with a small dataset, the proposed method for automatically segmenting the sublingual veins is to re-train a U-Net model with different sets of a limited number of labels for the same training images. With pre-knowledge of the segmentation, the loss of the trained model converges more easily. To further improve segmentation performance, a novel data augmentation strategy was utilized: the operation that masks the output of the model with the input was randomly switched on or off in each training step. This approach forces the model to learn contrast invariance and avoids overfitting. Images in the dataset were taken with the developed device using eight near-infrared LEDs. The final segmentation results were evaluated on the validation dataset with the IoU metric. |
Tasks | Data Augmentation |
Published | 2018-12-22 |
URL | http://arxiv.org/abs/1812.09477v1 |
http://arxiv.org/pdf/1812.09477v1.pdf | |
PWC | https://paperswithcode.com/paper/fully-automatic-segmentation-of-sublingual |
Repo | |
Framework | |
Kinematic Morphing Networks for Manipulation Skill Transfer
Title | Kinematic Morphing Networks for Manipulation Skill Transfer |
Authors | Peter Englert, Marc Toussaint |
Abstract | The transfer of a robot skill between different geometric environments is non-trivial since a wide variety of environments exists, sensor observations as well as robot motions are high-dimensional, and the environment might only be partially observed. We consider the problem of extracting a low-dimensional description of the manipulated environment in the form of a kinematic model. This allows us to transfer a skill by defining a policy on a prototype model and morphing the observed environment to this prototype. A deep neural network is used to map depth-image observations of the environment to morphing parameters, which include transformation and configuration parameters of the prototype model. The concatenation property of affine transformations and the ability to convert point clouds to depth images make it possible to apply the network in an iterative manner. The network is trained on data generated in a simulator and on augmented data that is created using network predictions. The algorithm is evaluated on different tasks, where it is shown that iterative predictions lead to higher accuracy than one-step predictions. |
Tasks | |
Published | 2018-03-05 |
URL | http://arxiv.org/abs/1803.01777v1 |
http://arxiv.org/pdf/1803.01777v1.pdf | |
PWC | https://paperswithcode.com/paper/kinematic-morphing-networks-for-manipulation |
Repo | |
Framework | |
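The iterative use of the network relies on two facts stated in the abstract: affine transformations concatenate, and point clouds can be re-rendered as depth images. A schematic loop under those assumptions, with placeholder `net` and `render` callables that are not the paper's code, could look like this:

```python
import numpy as np

def iterative_morphing(net, render, depth_obs, n_iters=3):
    """Predict morphing parameters, re-render under the accumulated transform, refine."""
    T = np.eye(4)                         # accumulated affine transform
    for _ in range(n_iters):
        delta = net(depth_obs)            # predicted 4x4 affine increment
        T = delta @ T                     # concatenation property of affine transforms
        depth_obs = render(T)             # point cloud -> depth image under the new transform
    return T

# Dummy callables just to show the calling convention.
dummy_net = lambda obs: np.eye(4)
dummy_render = lambda T: np.zeros((64, 64))
print(iterative_morphing(dummy_net, dummy_render, np.zeros((64, 64))))
```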
3D Human Pose Estimation with Relational Networks
Title | 3D Human Pose Estimation with Relational Networks |
Authors | Sungheon Park, Nojun Kwak |
Abstract | In this paper, we propose a novel 3D human pose estimation algorithm from a single image based on neural networks. We adopt the structure of relational networks in order to capture the relations among different body parts. In our method, each pair of different body parts generates features, and the average of the features from all the pairs is used for 3D pose estimation. In addition, we propose a dropout method that can be used in relational modules, which inherently imposes robustness to occlusions. The proposed network achieves state-of-the-art performance for 3D pose estimation on the Human3.6M dataset, and it effectively produces plausible results even in the presence of missing joints. |
Tasks | 3D Human Pose Estimation, 3D Pose Estimation, Pose Estimation |
Published | 2018-05-23 |
URL | http://arxiv.org/abs/1805.08961v2 |
http://arxiv.org/pdf/1805.08961v2.pdf | |
PWC | https://paperswithcode.com/paper/3d-human-pose-estimation-with-relational |
Repo | |
Framework | |
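The relational module described above builds a feature for every pair of body parts and averages the pair features. A minimal PyTorch sketch of that mechanism, with illustrative layer sizes and without the proposed relational dropout, is shown below.

```python
import torch
import torch.nn as nn
from itertools import combinations

class PairwiseRelation(nn.Module):
    """Every pair of body-part features goes through a shared MLP; pair outputs are averaged."""
    def __init__(self, part_dim=64, hidden=128, out_dim=128):
        super().__init__()
        self.g = nn.Sequential(nn.Linear(2 * part_dim, hidden), nn.ReLU(),
                               nn.Linear(hidden, out_dim))

    def forward(self, parts):              # parts: (batch, n_parts, part_dim)
        feats = [self.g(torch.cat([parts[:, i], parts[:, j]], dim=1))
                 for i, j in combinations(range(parts.size(1)), 2)]
        return torch.stack(feats, dim=0).mean(dim=0)

x = torch.randn(4, 16, 64)                 # 16 body parts, 64-d features each
print(PairwiseRelation()(x).shape)          # torch.Size([4, 128])
```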
A Unified Framework for Training Neural Networks
Title | A Unified Framework for Training Neural Networks |
Authors | Hadi Ghauch, Hossein Shokri-Ghadikolaei, Carlo Fischione, Mikael Skoglund |
Abstract | The lack of mathematical tractability of Deep Neural Networks (DNNs) has hindered progress towards having a unified convergence analysis of training algorithms in the general setting. We propose a unified optimization framework for training different types of DNNs, and establish its convergence for arbitrary loss, activation, and regularization functions, assumed to be smooth. We show that the framework generalizes well-known first- and second-order training methods, and thus allows us to show the convergence of these methods for various DNN architectures and learning tasks, as a special case of our approach. We discuss some of its applications in training various DNN architectures (e.g., feed-forward, convolutional, linear networks) on regression and classification tasks. |
Tasks | |
Published | 2018-05-23 |
URL | http://arxiv.org/abs/1805.09214v1 |
http://arxiv.org/pdf/1805.09214v1.pdf | |
PWC | https://paperswithcode.com/paper/a-unified-framework-for-training-neural |
Repo | |
Framework | |
PM-GANs: Discriminative Representation Learning for Action Recognition Using Partial-modalities
Title | PM-GANs: Discriminative Representation Learning for Action Recognition Using Partial-modalities |
Authors | Lan Wang, Chenqiang Gao, Luyu Yang, Yue Zhao, Wangmeng Zuo, Deyu Meng |
Abstract | Data of different modalities generally convey complementary but heterogeneous information, and a more discriminative representation is often preferred by combining multiple data modalities such as RGB and infrared features. However, in reality, obtaining both data channels is challenging due to many limitations. For example, RGB surveillance cameras are often restricted from private spaces, which conflicts with the need for abnormal activity detection for personal security. As a result, using partial data channels to build a full representation of multi-modalities is clearly desired. In this paper, we propose novel Partial-modal Generative Adversarial Networks (PM-GANs) that learn a full-modal representation using data from only partial modalities. The full representation is achieved by a generated representation in place of the missing data channel. Extensive experiments are conducted to verify the performance of our proposed method on action recognition, compared with four state-of-the-art methods. Meanwhile, a new Infrared-Visible dataset for action recognition is introduced, which will be the first publicly available action dataset that contains paired infrared and visible spectrum data. |
Tasks | Action Detection, Activity Detection, Representation Learning, Temporal Action Localization |
Published | 2018-04-17 |
URL | http://arxiv.org/abs/1804.06248v1 |
http://arxiv.org/pdf/1804.06248v1.pdf | |
PWC | https://paperswithcode.com/paper/pm-gans-discriminative-representation |
Repo | |
Framework | |
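The key mechanism is a generator that produces a stand-in representation for the missing data channel, trained adversarially against real features of that channel. The sketch below shows that idea at the feature level with made-up dimensions and layer sizes; it is not the PM-GANs architecture or training schedule.

```python
import torch
import torch.nn as nn

FEAT = 512  # illustrative feature dimensionality

gen = nn.Sequential(nn.Linear(FEAT, 1024), nn.ReLU(), nn.Linear(1024, FEAT))
disc = nn.Sequential(nn.Linear(FEAT, 256), nn.ReLU(), nn.Linear(256, 1))

vis_feat = torch.randn(8, FEAT)   # features from the available (visible) modality
ir_feat = torch.randn(8, FEAT)    # real infrared features, seen only at training time

fake_ir = gen(vis_feat)           # generated stand-in for the missing modality
d_real, d_fake = disc(ir_feat), disc(fake_ir)

bce = nn.BCEWithLogitsLoss()
d_loss = bce(d_real, torch.ones_like(d_real)) + bce(d_fake.detach(), torch.zeros_like(d_fake))
g_loss = bce(d_fake, torch.ones_like(d_fake))
print(float(d_loss), float(g_loss))
```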
Tight Query Complexity Lower Bounds for PCA via Finite Sample Deformed Wigner Law
Title | Tight Query Complexity Lower Bounds for PCA via Finite Sample Deformed Wigner Law |
Authors | Max Simchowitz, Ahmed El Alaoui, Benjamin Recht |
Abstract | We prove a \emph{query complexity} lower bound for approximating the top $r$-dimensional eigenspace of a matrix. We consider an oracle model where, given a symmetric matrix $\mathbf{M} \in \mathbb{R}^{d \times d}$, an algorithm $\mathsf{Alg}$ is allowed to make $\mathsf{T}$ exact queries of the form $\mathsf{w}^{(i)} = \mathbf{M} \mathsf{v}^{(i)}$ for $i \in \{1,\dots,\mathsf{T}\}$, where $\mathsf{v}^{(i)}$ is drawn from a distribution which depends arbitrarily on the past queries and measurements $\{\mathsf{v}^{(j)},\mathsf{w}^{(j)}\}_{1 \le j \le i-1}$. We show that for every $\mathtt{gap} \in (0,1/2]$, there exists a distribution over matrices $\mathbf{M}$ for which 1) $\mathrm{gap}_r(\mathbf{M}) = \Omega(\mathtt{gap})$ (where $\mathrm{gap}_r(\mathbf{M})$ is the normalized gap between the $r$-th and $(r+1)$-st largest-magnitude eigenvalues of $\mathbf{M}$), and 2) any algorithm $\mathsf{Alg}$ which takes fewer than $\mathrm{const} \times \frac{r \log d}{\sqrt{\mathtt{gap}}}$ queries fails (with overwhelming probability) to identify a matrix $\widehat{\mathsf{V}} \in \mathbb{R}^{d \times r}$ with orthonormal columns for which $\langle \widehat{\mathsf{V}}, \mathbf{M} \widehat{\mathsf{V}}\rangle \ge (1 - \mathrm{const} \times \mathtt{gap})\sum_{i=1}^r \lambda_i(\mathbf{M})$. Our bound requires only that $d$ is a small polynomial in $1/\mathtt{gap}$ and $r$, and matches the upper bounds of Musco and Musco ‘15. Moreover, it establishes a strict separation between convex optimization and \emph{randomized}, “strict-saddle” non-convex optimization, of which PCA is a canonical example: in the former, first-order methods can have dimension-free iteration complexity, whereas in PCA, the iteration complexity of gradient-based methods must necessarily grow with the dimension. |
Tasks | |
Published | 2018-04-04 |
URL | http://arxiv.org/abs/1804.01221v1 |
http://arxiv.org/pdf/1804.01221v1.pdf | |
PWC | https://paperswithcode.com/paper/tight-query-complexity-lower-bounds-for-pca |
Repo | |
Framework | |
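The oracle model in the abstract charges one query per exact matrix-vector product. The sketch below wraps a symmetric matrix in a query-counting oracle and runs a block power method, the kind of algorithm whose query count the lower bound constrains; it illustrates only the access model, not the proof or its hard instance.

```python
import numpy as np

class MatvecOracle:
    """Counts exact matrix-vector queries w = M v, the access model in the abstract."""
    def __init__(self, M):
        self.M, self.queries = M, 0

    def __call__(self, V):
        self.queries += V.shape[1]          # one query per column of V
        return self.M @ V

def block_power(oracle, d, r, n_iters):
    """Simple block power method for the top-r eigenspace."""
    V = np.linalg.qr(np.random.randn(d, r))[0]
    for _ in range(n_iters):
        V = np.linalg.qr(oracle(V))[0]
    return V

d, r = 200, 3
M = np.random.randn(d, d); M = (M + M.T) / 2
oracle = MatvecOracle(M)
V = block_power(oracle, d, r, n_iters=30)
print(oracle.queries, np.trace(V.T @ M @ V))   # queries used, captured "energy"
```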
Scalable Simple Linear Iterative Clustering (SSLIC) Using a Generic and Parallel Approach
Title | Scalable Simple Linear Iterative Clustering (SSLIC) Using a Generic and Parallel Approach |
Authors | Bradley C. Lowekamp, David T. Chen, Ziv Yaniv, Terry S. Yoo |
Abstract | Superpixel algorithms have proven to be a useful initial step for segmentation and subsequent processing of images, reducing computational complexity by replacing the use of expensive per-pixel primitives with a higher-level abstraction, superpixels. They have been successfully applied both in traditional image analysis and in deep learning based approaches. In this work, we present an implementation of the simple linear iterative clustering (SLIC) superpixel algorithm that has been generalized for n-dimensional scalar and multi-channel images. Additionally, the standard iterative implementation is replaced by a parallel, multi-threaded one. We describe the implementation details and analyze its scalability using a strong scaling formulation. Quantitative evaluation is performed using a 3D image, the Visible Human cryosection dataset, and a 2D image from the same dataset. Results show good scalability, with runtime gains even when the number of threads exceeds the physical number of available cores (hyperthreading). |
Tasks | |
Published | 2018-06-22 |
URL | http://arxiv.org/abs/1806.08741v2 |
http://arxiv.org/pdf/1806.08741v2.pdf | |
PWC | https://paperswithcode.com/paper/scalable-simple-linear-iterative-clustering |
Repo | |
Framework | |
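The paper's contribution is a generalized, parallel SLIC implementation; as a quick reminder of what the superpixel abstraction produces, the snippet below runs scikit-image's standard single-threaded `slic` on a 2D test image. The parameter values are arbitrary and unrelated to the paper's experiments.

```python
from skimage import data, segmentation

image = data.astronaut()                   # 2D RGB test image
labels = segmentation.slic(image, n_segments=400, compactness=10, start_label=1)
print(labels.shape, labels.max())          # one superpixel id per pixel
```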
Explainable Recommendation via Multi-Task Learning in Opinionated Text Data
Title | Explainable Recommendation via Multi-Task Learning in Opinionated Text Data |
Authors | Nan Wang, Hongning Wang, Yiling Jia, Yue Yin |
Abstract | Explaining automatically generated recommendations allows users to make more informed and accurate decisions about which results to utilize, and therefore improves their satisfaction. In this work, we develop a multi-task learning solution for explainable recommendation. Two companion learning tasks, user preference modeling for recommendation and opinionated content modeling for explanation, are integrated via a joint tensor factorization. As a result, the algorithm predicts not only a user’s preference over a list of items, i.e., recommendation, but also how the user would appreciate a particular item at the feature level, i.e., opinionated textual explanation. Extensive experiments on two large collections of Amazon and Yelp reviews confirm the effectiveness of our solution in both recommendation and explanation tasks, compared with several existing recommendation algorithms. Our extensive user study clearly demonstrates the practical value of the explainable recommendations generated by our algorithm. |
Tasks | Multi-Task Learning |
Published | 2018-06-10 |
URL | http://arxiv.org/abs/1806.03568v1 |
http://arxiv.org/pdf/1806.03568v1.pdf | |
PWC | https://paperswithcode.com/paper/explainable-recommendation-via-multi-task |
Repo | |
Framework | |
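The paper integrates its two learning tasks via a joint tensor factorization. The sketch below shows only the basic building block, a plain CP (PARAFAC) decomposition of a toy tensor using TensorLy; the coupling between the preference and opinion tasks is not reproduced, and the user × item × opinion-feature layout here is an assumption for illustration.

```python
import numpy as np
import tensorly as tl
from tensorly.decomposition import parafac

rng = np.random.default_rng(0)
X = tl.tensor(rng.random((30, 50, 10)))        # toy users x items x opinion features
cp = parafac(X, rank=5, n_iter_max=100)        # rank-5 CP decomposition
U, V, F = cp.factors                           # user, item and feature factor matrices
print(U.shape, V.shape, F.shape)
```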