February 2, 2020

Paper Group AWR 29

Meta-Curvature

Title Meta-Curvature
Authors Eunbyung Park, Junier B. Oliva
Abstract We propose meta-curvature (MC), a framework to learn curvature information for better generalization and fast model adaptation. MC expands on model-agnostic meta-learning (MAML) by learning to transform the gradients in the inner optimization such that the transformed gradients achieve better generalization performance on a new task. To train large-scale neural networks, we decompose the curvature matrix into smaller matrices in a novel scheme that captures the dependencies among the model’s parameters with a series of tensor products. We demonstrate the effects of the proposed method on several few-shot learning tasks and datasets. Without any task-specific techniques or architectures, the proposed method achieves substantial improvement over previous MAML variants and outperforms recent state-of-the-art methods. Furthermore, we observe faster convergence of the meta-training process. Finally, we present an analysis that explains the better generalization performance obtained with the meta-trained curvature.
Tasks Few-Shot Image Classification, Few-Shot Learning, Image Classification
Published 2019-02-09
URL https://arxiv.org/abs/1902.03356v3
PDF https://arxiv.org/pdf/1902.03356v3.pdf
PWC https://paperswithcode.com/paper/meta-curvature
Repo https://github.com/silverbottlep/meta_curvature
Framework none
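
To make the gradient-transformation idea concrete, here is a minimal PyTorch sketch of an MC-style inner-loop step for a single 2-D weight, where the Kronecker-style factorization from the abstract reduces to two small matrices. The names (`mc_inner_update`, `m_out`, `m_in`) and the identity initialization are illustrative assumptions, not the paper's code:

```python
import torch

def mc_inner_update(w, grad, m_out, m_in, inner_lr=0.01):
    """One meta-curvature inner-loop step for a 2-D weight (sketch).

    For a weight w of shape (m, n), the full curvature matrix would be
    (mn x mn); factoring it as a tensor product reduces it to the two
    small learnable matrices m_out (m x m) and m_in (n x n), which
    transform the raw gradient before the SGD step.
    """
    transformed = m_out @ grad @ m_in      # MC-transformed gradient
    return w - inner_lr * transformed

# toy usage: m_out / m_in are meta-parameters updated in the outer loop
m, n = 4, 3
w = torch.randn(m, n, requires_grad=True)
m_out = torch.eye(m, requires_grad=True)   # identity init ~ plain MAML
m_in = torch.eye(n, requires_grad=True)
grad = torch.randn(m, n)                   # stand-in for a task gradient
w_adapted = mc_inner_update(w, grad, m_out, m_in)
```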

Analyzing machine-learned representations: A natural language case study

Title Analyzing machine-learned representations: A natural language case study
Authors Ishita Dasgupta, Demi Guo, Samuel J. Gershman, Noah D. Goodman
Abstract As modern deep networks become more complex, and get closer to human-like capabilities in certain domains, the question arises of how the representations and decision rules they learn compare to the ones in humans. In this work, we study representations of sentences in one such artificial system for natural language processing. We first present a diagnostic test dataset to examine the degree of abstract composable structure represented. Analyzing performance on these diagnostic tests indicates a lack of systematicity in the representations and decision rules, and reveals a set of heuristic strategies. We then investigate the effect of the training distribution on learning these heuristic strategies, and study changes in these representations with various augmentations to the training set. Our results reveal parallels to the analogous representations in people. We find that these systems can learn abstract rules and generalize them to new contexts under certain circumstances – similar to human zero-shot reasoning. However, we also note some shortcomings in this generalization behavior – similar to human judgment errors like belief bias. Studying these parallels suggests new ways to understand psychological phenomena in humans and informs strategies for building artificial intelligence with human-like language understanding.
Tasks
Published 2019-09-12
URL https://arxiv.org/abs/1909.05885v1
PDF https://arxiv.org/pdf/1909.05885v1.pdf
PWC https://paperswithcode.com/paper/analyzing-machine-learned-representations-a
Repo https://github.com/ishita-dg/ScrambleTests
Framework pytorch

Mix and Match: An Optimistic Tree-Search Approach for Learning Models from Mixture Distributions

Title Mix and Match: An Optimistic Tree-Search Approach for Learning Models from Mixture Distributions
Authors Matthew Faw, Rajat Sen, Karthikeyan Shanmugam, Constantine Caramanis, Sanjay Shakkottai
Abstract We consider a covariate shift problem where one has access to several different training datasets for the same learning problem and a small validation set which possibly differs from all the individual training distributions. This covariate shift is caused, in part, by unobserved features in the datasets. The objective, then, is to find the best mixture distribution over the training datasets (with only observed features) such that training a learning algorithm using this mixture has the best validation performance. Our proposed algorithm, ${\sf Mix&Match}$, combines stochastic gradient descent (SGD) with optimistic tree search and model re-use (evolving partially trained models with samples from different mixture distributions) over the space of mixtures for this task. We prove simple regret guarantees for our algorithm with respect to recovering the optimal mixture, given a total budget of SGD evaluations. Finally, we validate our algorithm on two real-world datasets.
Tasks
Published 2019-07-23
URL https://arxiv.org/abs/1907.10154v4
PDF https://arxiv.org/pdf/1907.10154v4.pdf
PWC https://paperswithcode.com/paper/mix-and-match-an-optimistic-tree-search
Repo https://github.com/matthewfaw/mixnmatch-infrastructure
Framework none
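
As a toy illustration of the optimistic tree search over mixtures, the sketch below searches a two-source mixture weight with warm-started models. `train_steps` and `val_loss` are assumed callbacks, and the constant exploration bonus is a stand-in for the paper's confidence terms:

```python
import copy

class Node:
    def __init__(self, lo, hi, model, depth):
        self.lo, self.hi, self.model, self.depth = lo, hi, model, depth
        self.alpha = (lo + hi) / 2            # weight on training source 1
        self.loss = float("inf")

def mix_and_match(init_model, train_steps, val_loss, budget, k=50, c=0.5):
    """Toy optimistic tree search over a two-source mixture distribution.

    train_steps(model, alpha, k): run k SGD steps on samples drawn from the
    alpha-mixture of the two training sets, returning the updated model.
    val_loss(model): loss on the small validation set. Both are assumed
    callbacks; the optimism bonus c / 2**depth is a simplification.
    """
    root = Node(0.0, 1.0, train_steps(init_model, 0.5, k), 0)
    root.loss = val_loss(root.model)
    budget -= k
    leaves, best = [root], root
    while budget >= 2 * k:
        node = min(leaves, key=lambda n: n.loss - c / 2 ** n.depth)  # optimistic pick
        leaves.remove(node)
        mid = (node.lo + node.hi) / 2
        for lo, hi in ((node.lo, mid), (mid, node.hi)):
            child = Node(lo, hi, copy.deepcopy(node.model), node.depth + 1)
            child.model = train_steps(child.model, child.alpha, k)  # model re-use
            child.loss = val_loss(child.model)
            budget -= k
            leaves.append(child)
            if child.loss < best.loss:
                best = child
    return best.alpha, best.model
```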

The Universal Decompositional Semantics Dataset and Decomp Toolkit

Title The Universal Decompositional Semantics Dataset and Decomp Toolkit
Authors Aaron Steven White, Elias Stengel-Eskin, Siddharth Vashishtha, Venkata Govindarajan, Dee Ann Reisinger, Tim Vieira, Keisuke Sakaguchi, Sheng Zhang, Francis Ferraro, Rachel Rudinger, Kyle Rawlins, Benjamin Van Durme
Abstract We present the Universal Decompositional Semantics (UDS) dataset (v1.0), which is bundled with the Decomp toolkit (v0.1). UDS1.0 unifies five high-quality, decompositional semantics-aligned annotation sets within a single semantic graph specification—with graph structures defined by the predicative patterns produced by the PredPatt tool and real-valued node and edge attributes constructed using sophisticated normalization procedures. The Decomp toolkit provides a suite of Python 3 tools for querying UDS graphs using SPARQL. Both UDS1.0 and Decomp0.1 are publicly available at http://decomp.io.
Tasks
Published 2019-09-30
URL https://arxiv.org/abs/1909.13851v1
PDF https://arxiv.org/pdf/1909.13851v1.pdf
PWC https://paperswithcode.com/paper/the-universal-decompositional-semantics
Repo https://github.com/decompositional-semantics-initiative/decomp
Framework none
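
To illustrate the kind of SPARQL pattern matching the toolkit enables, here is a minimal sketch over a toy graph. It uses plain rdflib rather than Decomp's own API, and the namespace and attribute names are invented for the example:

```python
from rdflib import Graph, Literal, Namespace

UDS = Namespace("http://decomp.io/example/")  # hypothetical namespace
g = Graph()
# toy graph: a predicate node with a real-valued attribute and one argument edge
g.add((UDS.pred1, UDS.attrFactuality, Literal(1.2)))
g.add((UDS.pred1, UDS.hasArgument, UDS.arg1))

q = """
PREFIX uds: <http://decomp.io/example/>
SELECT ?pred ?score WHERE {
    ?pred uds:attrFactuality ?score .
    FILTER (?score > 0.0)
}
"""
for pred, score in g.query(q):
    print(pred, float(score))
```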

Compressed Indexes for Fast Search of Semantic Data

Title Compressed Indexes for Fast Search of Semantic Data
Authors Raffaele Perego, Giulio Ermanno Pibiri, Rossano Venturini
Abstract The sheer increase in volume of RDF data demands efficient solutions to the triple indexing problem: devising a compressed data structure that compactly represents RDF triples while guaranteeing fast pattern-matching operations. This problem lies at the heart of delivering good practical performance for the resolution of complex SPARQL queries on large RDF datasets. In this work, we propose a trie-based index layout to solve the problem and introduce two novel techniques to reduce its space of representation for improved effectiveness. An extensive experimental analysis conducted over a wide range of publicly available real-world datasets reveals that our best space/time trade-off configuration substantially outperforms existing state-of-the-art solutions, taking 30-60% less space and speeding up query execution by a factor of 2-81x.
Tasks
Published 2019-04-16
URL https://arxiv.org/abs/1904.07619v3
PDF https://arxiv.org/pdf/1904.07619v3.pdf
PWC https://paperswithcode.com/paper/compressed-indexes-for-fast-search-of
Repo https://github.com/jermp/rdf_indexes
Framework none
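
A toy, uncompressed trie makes the access-path idea concrete. The paper's contribution is a compressed layout over such structures; real systems also keep multiple triple permutations so every pattern has a fast path:

```python
def build_trie(triples):
    """Nested-dict trie over (subject, predicate, object) IDs (toy sketch)."""
    trie = {}
    for s, p, o in triples:
        trie.setdefault(s, {}).setdefault(p, set()).add(o)
    return trie

def match(trie, s=None, p=None, o=None):
    """Resolve a triple pattern; None acts as a wildcard."""
    for s2, preds in (trie.items() if s is None else [(s, trie.get(s, {}))]):
        for p2, objs in (preds.items() if p is None else [(p, preds.get(p, set()))]):
            for o2 in (objs if o is None else ([o] if o in objs else [])):
                yield (s2, p2, o2)

t = build_trie([(0, 1, 2), (0, 1, 3), (4, 1, 2)])
print(list(match(t, s=0, p=1)))   # -> [(0, 1, 2), (0, 1, 3)]
print(list(match(t, p=1, o=2)))   # scans subjects; real indexes add permutations
```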

Deep Independently Recurrent Neural Network (IndRNN)

Title Deep Independently Recurrent Neural Network (IndRNN)
Authors Shuai Li, Wanqing Li, Chris Cook, Yanbo Gao, Ce Zhu
Abstract Recurrent neural networks (RNNs) are known to be difficult to train due to the gradient vanishing and exploding problems, and it is thus difficult for them to learn long-term patterns. Long short-term memory (LSTM) was developed to address these problems, but the use of the hyperbolic tangent and sigmoid activation functions results in gradient decay over layers. Consequently, constructing an efficiently trainable deep RNN is challenging. Moreover, training an LSTM is very compute-intensive, as the recurrent connection using a matrix product is computed at every time step. To address these problems, this paper proposes a new type of RNN with the recurrent connection formulated as a Hadamard product, referred to as the independently recurrent neural network (IndRNN), where neurons in the same layer are independent of each other and connected across layers. The gradient vanishing and exploding problems are solved in IndRNN by simply regulating the recurrent weights, and thus long-term dependencies can be learned. Moreover, an IndRNN can work with non-saturating activation functions such as ReLU and still be trained robustly. Different deeper IndRNN architectures, including the basic stacked IndRNN, residual IndRNN and densely connected IndRNN, have been investigated, all of which can be much deeper than existing RNNs. Furthermore, IndRNN reduces the computation at each time step and can be over 10 times faster than the LSTM. The code is made publicly available at https://github.com/Sunnydreamrain/IndRNN_pytorch. Experimental results show that the proposed IndRNN is able to process very long sequences (over 5000 time steps) and can be used to construct very deep networks (for example, the 21-layer residual IndRNN and the deep densely connected IndRNN used in the experiments). IndRNNs achieve better performance than traditional RNNs and LSTMs on various tasks.
Tasks Language Modelling, Sequential Image Classification, Skeleton Based Action Recognition
Published 2019-10-11
URL https://arxiv.org/abs/1910.06251v2
PDF https://arxiv.org/pdf/1910.06251v2.pdf
PWC https://paperswithcode.com/paper/deep-independently-recurrent-neural-network
Repo https://github.com/Sunnydreamrain/IndRNN_pytorch
Framework pytorch
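
The core recurrence is simple enough to sketch directly. This minimal cell follows the Hadamard-product formulation from the abstract; recurrent-weight clamping and the deep stacked/residual variants are omitted:

```python
import torch
import torch.nn as nn

class IndRNNCell(nn.Module):
    """Single IndRNN layer: h_t = relu(W x_t + u * h_{t-1} + b).

    The recurrent connection is an elementwise (Hadamard) product with the
    vector u, so each neuron sees only its own previous state; cross-neuron
    mixing happens through the input weights of the *next* layer.
    """
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.w = nn.Linear(input_size, hidden_size)
        self.u = nn.Parameter(torch.empty(hidden_size).uniform_(-1, 1))

    def forward(self, x):                  # x: (seq_len, batch, input_size)
        h = x.new_zeros(x.size(1), self.u.size(0))
        outputs = []
        for x_t in x:
            h = torch.relu(self.w(x_t) + self.u * h)
            outputs.append(h)
        return torch.stack(outputs)        # (seq_len, batch, hidden_size)

# long-range behaviour is controlled by regulating |u| (see the paper);
# stacking cells yields the deep variants
cell = IndRNNCell(8, 16)
y = cell(torch.randn(100, 4, 8))
```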

CityLearn: Diverse Real-World Environments for Sample-Efficient Navigation Policy Learning

Title CityLearn: Diverse Real-World Environments for Sample-Efficient Navigation Policy Learning
Authors Marvin Chancán, Michael Milford
Abstract Visual navigation tasks in real-world environments often require both self-motion and place recognition feedback. While deep reinforcement learning has shown success in solving these perception and decision-making problems in an end-to-end manner, these algorithms require large amounts of experience to learn navigation policies from high-dimensional data, which is generally impractical for real robots due to sample complexity. In this paper, we address these problems with two main contributions. We first leverage place recognition and deep learning techniques combined with goal destination feedback to generate compact, bimodal image representations that can then be used to effectively learn control policies from a small amount of experience. Second, we present an interactive framework, CityLearn, that enables, for the first time, training and deployment of navigation algorithms across city-sized, realistic environments with extreme visual appearance changes. CityLearn features more than 10 benchmark datasets, often used in visual place recognition and autonomous driving research, including over 100 recorded traversals across 60 cities around the world. We evaluate our approach on two CityLearn environments, training our navigation policy on a single traversal. Results show our method can be over two orders of magnitude faster than training on raw images, and can also generalize across extreme visual changes including day-to-night and summer-to-winter transitions.
Tasks Autonomous Driving, Decision Making, Visual Navigation, Visual Place Recognition
Published 2019-10-10
URL https://arxiv.org/abs/1910.04335v2
PDF https://arxiv.org/pdf/1910.04335v2.pdf
PWC https://paperswithcode.com/paper/from-visual-place-recognition-to-navigation
Repo https://github.com/mchancan/citylearn
Framework none
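
A minimal sketch of the compact, bimodal observation described in the abstract, assuming a place-recognition model that outputs a belief over reference places; the function name and dimensions are illustrative:

```python
import numpy as np

def bimodal_observation(place_probs, goal_id, num_places):
    """Compact observation: place-recognition belief + goal one-hot (sketch).

    place_probs is assumed to come from a visual place-recognition model
    scoring the current image against the reference traversal; the goal
    destination feedback is encoded as a one-hot index.
    """
    goal = np.zeros(num_places)
    goal[goal_id] = 1.0
    return np.concatenate([place_probs, goal])

obs = bimodal_observation(np.random.dirichlet(np.ones(100)), goal_id=42, num_places=100)
print(obs.shape)   # (200,) -- orders of magnitude smaller than raw pixels
```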

Pose-aware Multi-level Feature Network for Human Object Interaction Detection

Title Pose-aware Multi-level Feature Network for Human Object Interaction Detection
Authors Bo Wan, Desen Zhou, Yongfei Liu, Rongjie Li, Xuming He
Abstract Reasoning about human-object interactions is a core problem in human-centric scene understanding, and detecting such relations poses a unique challenge to vision systems due to large variations in human-object configurations, multiple co-occurring relation instances and subtle visual differences between relation categories. To address these challenges, we propose a multi-level relation detection strategy that utilizes human pose cues both to capture the global spatial configuration of a relation and as an attention mechanism to dynamically zoom into relevant regions at the human part level. Specifically, we develop a multi-branch deep network to learn a pose-augmented relation representation at three semantic levels, incorporating interaction context, object features and detailed semantic part cues. As a result, our approach is capable of generating robust predictions on fine-grained human-object interactions with interpretable outputs. Extensive experimental evaluations on public benchmarks show that our model outperforms prior methods by a considerable margin, demonstrating its efficacy in handling complex scenes.
Tasks Human-Object Interaction Detection, Scene Understanding
Published 2019-09-18
URL https://arxiv.org/abs/1909.08453v1
PDF https://arxiv.org/pdf/1909.08453v1.pdf
PWC https://paperswithcode.com/paper/pose-aware-multi-level-feature-network-for
Repo https://github.com/bobwan1995/PMFNet
Framework pytorch
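
A skeletal sketch of a three-branch relation head in the spirit of the abstract's three semantic levels. The branch inputs, the linear heads, and fusion by summation are simplifying assumptions, not the paper's exact architecture:

```python
import torch
import torch.nn as nn

class MultiLevelHOIHead(nn.Module):
    """Illustrative three-branch relation head (not the PMFNet architecture).

    The branches mirror the abstract's semantic levels: holistic interaction
    context, object appearance, and pose-guided part features.
    """
    def __init__(self, dim, num_relations):
        super().__init__()
        self.context_branch = nn.Linear(dim, num_relations)  # union-box context
        self.object_branch = nn.Linear(dim, num_relations)   # object features
        self.part_branch = nn.Linear(dim, num_relations)     # pose-attended parts

    def forward(self, context_feat, object_feat, part_feat):
        scores = (self.context_branch(context_feat)
                  + self.object_branch(object_feat)
                  + self.part_branch(part_feat))
        return torch.sigmoid(scores)   # multi-label relation probabilities

head = MultiLevelHOIHead(dim=256, num_relations=117)
probs = head(torch.randn(2, 256), torch.randn(2, 256), torch.randn(2, 256))
```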

Adaptive Wing Loss for Robust Face Alignment via Heatmap Regression

Title Adaptive Wing Loss for Robust Face Alignment via Heatmap Regression
Authors Xinyao Wang, Liefeng Bo, Li Fuxin
Abstract Heatmap regression with a deep network has become one of the mainstream approaches to localizing facial landmarks. However, the loss function for heatmap regression is rarely studied. In this paper, we analyze the ideal loss function properties for heatmap regression in face alignment problems. We then propose a novel loss function, named Adaptive Wing loss, that is able to adapt its shape to different types of ground-truth heatmap pixels. This adaptability penalizes foreground pixels more heavily than background pixels. To address the imbalance between foreground and background pixels, we also propose the Weighted Loss Map, which assigns high weights to foreground and difficult background pixels to help the training process focus on the pixels that are crucial to landmark localization. To further improve face alignment accuracy, we introduce boundary prediction and CoordConv with boundary coordinates. Extensive experiments on different benchmarks, including COFW, 300W and WFLW, show our approach outperforms the state of the art by a significant margin on various evaluation metrics. Besides, the Adaptive Wing loss also helps other heatmap regression tasks. Code will be made publicly available at https://github.com/protossw512/AdaptiveWingLoss.
Tasks Face Alignment, Robust Face Alignment
Published 2019-04-16
URL https://arxiv.org/abs/1904.07399v2
PDF https://arxiv.org/pdf/1904.07399v2.pdf
PWC https://paperswithcode.com/paper/adaptive-wing-loss-for-robust-face-alignment
Repo https://github.com/SeungyounShin/Adaptive-Wing-Loss-for-Robust-Face-Alignment-via-Heatmap-Regression
Framework pytorch
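
The loss itself is compact enough to sketch. The parameter defaults below are the values commonly cited for the paper (alpha=2.1, omega=14, epsilon=1, theta=0.5), and the Weighted Loss Map is omitted:

```python
import torch

def adaptive_wing_loss(pred, target, alpha=2.1, omega=14.0, epsilon=1.0, theta=0.5):
    """Adaptive Wing loss for heatmap regression (sketch).

    The exponent (alpha - y) depends on the ground-truth pixel value y, so
    the loss is more curved (stronger gradients) on foreground pixels
    (y near 1) and smoother on background pixels (y near 0).
    """
    diff = (target - pred).abs()
    power = alpha - target
    # linear-branch constants, chosen so the two pieces join smoothly at theta
    a = omega * (1 / (1 + (theta / epsilon) ** power)) * power \
        * (theta / epsilon) ** (power - 1) / epsilon
    c = theta * a - omega * torch.log1p((theta / epsilon) ** power)
    nonlinear = omega * torch.log1p((diff / epsilon) ** power)
    linear = a * diff - c
    return torch.where(diff < theta, nonlinear, linear).mean()

loss = adaptive_wing_loss(torch.rand(2, 68, 64, 64), torch.rand(2, 68, 64, 64))
```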

Graph Neural Tangent Kernel: Fusing Graph Neural Networks with Graph Kernels

Title Graph Neural Tangent Kernel: Fusing Graph Neural Networks with Graph Kernels
Authors Simon S. Du, Kangcheng Hou, Barnabás Póczos, Ruslan Salakhutdinov, Ruosong Wang, Keyulu Xu
Abstract While graph kernels (GKs) are easy to train and enjoy provable theoretical guarantees, their practical performances are limited by their expressive power, as the kernel function often depends on hand-crafted combinatorial features of graphs. Compared to graph kernels, graph neural networks (GNNs) usually achieve better practical performance, as GNNs use multi-layer architectures and non-linear activation functions to extract high-order information of graphs as features. However, due to the large number of hyper-parameters and the non-convex nature of the training procedure, GNNs are harder to train. Theoretical guarantees of GNNs are also not well-understood. Furthermore, the expressive power of GNNs scales with the number of parameters, and thus it is hard to exploit the full power of GNNs when computing resources are limited. The current paper presents a new class of graph kernels, Graph Neural Tangent Kernels (GNTKs), which correspond to infinitely wide multi-layer GNNs trained by gradient descent. GNTKs enjoy the full expressive power of GNNs and inherit advantages of GKs. Theoretically, we show GNTKs provably learn a class of smooth functions on graphs. Empirically, we test GNTKs on graph classification datasets and show they achieve strong performance.
Tasks Graph Classification
Published 2019-05-30
URL https://arxiv.org/abs/1905.13192v2
PDF https://arxiv.org/pdf/1905.13192v2.pdf
PWC https://paperswithcode.com/paper/graph-neural-tangent-kernel-fusing-graph
Repo https://github.com/KangchengHou/gntk
Framework none
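
Sketching the recursion behind "infinitely wide GNNs trained by gradient descent": each GNTK block first aggregates node-pair covariances over neighborhoods, then applies the standard infinite-width NTK dense-layer update. The notation below is mine, simplified from standard NTK derivations, not verbatim from the paper:

```latex
% Neighborhood aggregation of the covariance \Sigma and the kernel \Theta:
\Sigma_{\mathrm{agg}}(u,u') = \sum_{v \in \mathcal{N}(u)} \sum_{v' \in \mathcal{N}(u')} \Sigma(v,v'),
\qquad
\Theta_{\mathrm{agg}}(u,u') = \sum_{v \in \mathcal{N}(u)} \sum_{v' \in \mathcal{N}(u')} \Theta(v,v')

% Dense-layer update with activation \sigma, where \Lambda is the 2x2
% covariance built from \Sigma_{\mathrm{agg}} at (u,u), (u,u'), (u',u'):
\Sigma'(u,u') = \mathbb{E}_{(a,b) \sim \mathcal{N}(0,\Lambda)}\bigl[\sigma(a)\,\sigma(b)\bigr],
\qquad
\Theta'(u,u') = \Theta_{\mathrm{agg}}(u,u')\,\dot{\Sigma}'(u,u') + \Sigma'(u,u')
```

The graph-level kernel is then obtained by summing the node-pair kernels over the two graphs, and graph classification reduces to kernel regression with this fixed kernel.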

Face Alignment using a 3D Deeply-initialized Ensemble of Regression Trees

Title Face Alignment using a 3D Deeply-initialized Ensemble of Regression Trees
Authors Roberto Valle, José M. Buenaposada, Antonio Valdés, Luis Baumela
Abstract Face alignment algorithms locate a set of landmark points in images of faces taken in unrestricted situations. State-of-the-art approaches typically fail or lose accuracy in the presence of occlusions, strong deformations, large pose variations and ambiguous configurations. In this paper we present 3DDE, a robust and efficient face alignment algorithm based on a coarse-to-fine cascade of ensembles of regression trees. It is initialized by robustly fitting a 3D face model to the probability maps produced by a convolutional neural network. This initialization addresses self-occlusions and large face rotations. Further, the regressor implicitly imposes a prior face shape on the solution, addressing occlusions and ambiguous face configurations. Its coarse-to-fine structure tackles the combinatorial explosion of part deformations. In the experiments performed, 3DDE improves the state of the art on the 300W, COFW, AFLW and WFLW datasets. Finally, we perform cross-dataset experiments that reveal the existence of a significant dataset bias in these benchmarks.
Tasks Face Alignment, Facial Landmark Detection
Published 2019-02-05
URL https://arxiv.org/abs/1902.01831v2
PDF https://arxiv.org/pdf/1902.01831v2.pdf
PWC https://paperswithcode.com/paper/face-alignment-using-a-3d-deeply-initialized
Repo https://github.com/bobetocalo/bobetocalo_eccv18
Framework none
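
A generic cascaded-regression skeleton in the spirit of the paper's coarse-to-fine cascade. The 3D-model-based initialization is assumed given, `features` is an assumed shape-indexed feature extractor, and a single decision tree stands in for each ensemble of regression trees:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_cascade(images, gt_shapes, init_shapes, features, n_stages=4):
    """Generic cascaded shape regression (sketch; not the 3DDE pipeline).

    Shapes have layout (n_images, n_landmarks, 2); each stage regresses the
    residual between the current shape estimate and the ground truth.
    """
    shapes = init_shapes.copy()
    stages = []
    for _ in range(n_stages):
        X = np.stack([features(im, s) for im, s in zip(images, shapes)])
        y = (gt_shapes - shapes).reshape(len(images), -1)  # residual targets
        reg = DecisionTreeRegressor(max_depth=5).fit(X, y)
        shapes = shapes + reg.predict(X).reshape(shapes.shape)
        stages.append(reg)
    return stages, shapes
```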

Defense Against Adversarial Attacks Using Feature Scattering-based Adversarial Training

Title Defense Against Adversarial Attacks Using Feature Scattering-based Adversarial Training
Authors Haichao Zhang, Jianyu Wang
Abstract We introduce a feature-scattering-based adversarial training approach for improving model robustness against adversarial attacks. Conventional adversarial training approaches leverage a supervised scheme (either targeted or non-targeted) in generating attacks for training, which typically suffers from issues such as label leaking, as noted in recent works. In contrast, the proposed approach generates adversarial images for training through feature scattering in the latent space, which is unsupervised in nature and avoids label leaking. More importantly, this new approach generates perturbed images in a collaborative fashion, taking inter-sample relationships into consideration. We analyze model robustness and demonstrate the effectiveness of the proposed approach through extensive experiments on different datasets, comparing against state-of-the-art approaches.
Tasks
Published 2019-07-24
URL https://arxiv.org/abs/1907.10764v4
PDF https://arxiv.org/pdf/1907.10764v4.pdf
PWC https://paperswithcode.com/paper/defense-against-adversarial-attacks-using-2
Repo https://github.com/Line290/FeatureAttack
Framework pytorch
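
A simplified sketch of the label-free perturbation generation: the paper maximizes an optimal-transport distance between the clean and perturbed batch feature distributions, which is replaced below by a plain mean-squared feature distance; `feature_fn` is an assumed feature extractor:

```python
import torch
import torch.nn.functional as F

def feature_scattering_attack(feature_fn, x, steps=5, eps=8/255, step_size=2/255):
    """Label-free adversarial perturbation in feature space (simplified sketch).

    Keeps only the unsupervised, batch-level idea from the paper; the
    optimal-transport distance is replaced by MSE between feature batches.
    """
    with torch.no_grad():
        clean_feats = feature_fn(x)
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = F.mse_loss(feature_fn(x_adv), clean_feats)  # no labels: no leaking
        grad, = torch.autograd.grad(loss, x_adv)
        x_adv = x_adv + step_size * grad.sign()            # ascend the distance
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()   # then train the model on (x_adv, y) as usual
```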

Richly Activated Graph Convolutional Network for Action Recognition with Incomplete Skeletons

Title Richly Activated Graph Convolutional Network for Action Recognition with Incomplete Skeletons
Authors Yi-Fan Song, Zhang Zhang, Liang Wang
Abstract Current methods for skeleton-based human action recognition usually work with completely observed skeletons. In real scenarios, however, captured skeletons are often incomplete and noisy, which deteriorates the performance of traditional models. To enhance the robustness of action recognition models to incomplete skeletons, we propose a multi-stream graph convolutional network (GCN) for exploring sufficient discriminative features distributed over all skeleton joints. Each stream of the network is responsible only for learning features from currently unactivated joints, which are distinguished by the class activation maps (CAM) obtained from preceding streams, so that the proposed method activates considerably more joints than traditional methods. The proposed method is therefore termed richly activated GCN (RA-GCN), where the richly discovered features improve the robustness of the model. Compared to state-of-the-art methods, RA-GCN achieves comparable performance on the NTU RGB+D dataset. Moreover, on a synthetic occlusion dataset, RA-GCN significantly alleviates the performance deterioration.
Tasks Skeleton Based Action Recognition, Temporal Action Localization
Published 2019-05-16
URL https://arxiv.org/abs/1905.06774v2
PDF https://arxiv.org/pdf/1905.06774v2.pdf
PWC https://paperswithcode.com/paper/richlt-activated-graph-convolutional-network
Repo https://github.com/yfsong0709/RA-GCNv1
Framework pytorch
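
A simplified sketch of masking already-activated joints between streams. The paper derives per-joint activations from class activation maps; gradient saliency is substituted here, and the tensor layout (batch, time, joints, channels) is an assumption:

```python
import torch

def mask_activated_joints(stream, x, labels, threshold=0.5):
    """Mask joints already 'activated' by a trained stream (simplified sketch).

    stream: trained model mapping (batch, T, joints, C) -> class scores.
    The per-joint activation map is approximated by the gradient magnitude
    of the true-class score; the next stream trains on x * mask so it must
    discover discriminative features on *other* joints.
    """
    x = x.detach().requires_grad_(True)
    scores = stream(x)                                     # (batch, classes)
    true_scores = scores.gather(1, labels[:, None]).sum()  # labels: LongTensor
    grad, = torch.autograd.grad(true_scores, x)
    cam = grad.abs().mean(dim=(1, 3))                      # (batch, joints)
    cam = cam / cam.amax(dim=1, keepdim=True).clamp_min(1e-8)
    mask = (cam < threshold).float()                       # keep unactivated joints
    return x.detach() * mask[:, None, :, None]
```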

Learning protein sequence embeddings using information from structure

Title Learning protein sequence embeddings using information from structure
Authors Tristan Bepler, Bonnie Berger
Abstract Inferring the structural properties of a protein from its amino acid sequence is a challenging yet important problem in biology. Structures are not known for the vast majority of protein sequences, but structure is critical for understanding function. Existing approaches for detecting structural similarity between proteins from sequence are unable to recognize and exploit structural patterns when sequences have diverged too far, limiting our ability to transfer knowledge between structurally related proteins. We approach this problem through the lens of representation learning. We introduce a framework that maps any protein sequence to a sequence of vector embeddings, one per amino acid position, that encode structural information. We train bidirectional long short-term memory (LSTM) models on protein sequences with a two-part feedback mechanism that incorporates information from (i) global structural similarity between proteins and (ii) pairwise residue contact maps for individual proteins. To enable learning from structural similarity information, we define a novel similarity measure between arbitrary-length sequences of vector embeddings based on a soft symmetric alignment (SSA) between them. Our method is able to learn useful position-specific embeddings despite lacking direct observations of position-level correspondence between sequences. We show empirically that our multi-task framework outperforms other sequence-based methods, and even a top-performing structure-based alignment method, on our target task of predicting structural similarity. Finally, we demonstrate that our learned embeddings can be transferred to other protein sequence problems, improving the state of the art in transmembrane domain prediction.
Tasks Representation Learning
Published 2019-02-22
URL https://arxiv.org/abs/1902.08661v2
PDF https://arxiv.org/pdf/1902.08661v2.pdf
PWC https://paperswithcode.com/paper/learning-protein-sequence-embeddings-using
Repo https://github.com/cguerramain/protein-structure-prediction-models
Framework none
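
The soft symmetric alignment similarity is compact enough to sketch directly. This follows the symmetric-soft-alignment construction the abstract describes, using L1 distances between position embeddings; it is a sketch, not the authors' exact implementation:

```python
import torch

def soft_symmetric_alignment(z1, z2):
    """SSA similarity between embedding sequences z1 (n, d) and z2 (m, d).

    Alignment weights are soft in both directions and combined symmetrically;
    the similarity is the negative weighted mean of pairwise L1 distances,
    so no hard position-level correspondence is ever needed.
    """
    dist = torch.cdist(z1, z2, p=1)          # (n, m) pairwise L1 distances
    alpha = torch.softmax(-dist, dim=1)      # align z1 positions onto z2
    beta = torch.softmax(-dist, dim=0)       # align z2 positions onto z1
    a = alpha + beta - alpha * beta          # symmetric soft alignment
    return -(a * dist).sum() / a.sum()

sim = soft_symmetric_alignment(torch.randn(50, 100), torch.randn(70, 100))
```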

Semantic Conditioned Dynamic Modulation for Temporal Sentence Grounding in Videos

Title Semantic Conditioned Dynamic Modulation for Temporal Sentence Grounding in Videos
Authors Yitian Yuan, Lin Ma, Jingwen Wang, Wei Liu, Wenwu Zhu
Abstract Temporal sentence grounding in videos aims to detect and localize one target video segment that semantically corresponds to a given sentence. Existing methods mainly tackle this task by matching and aligning semantics between a sentence and candidate video segments, while neglecting the fact that the sentence information plays an important role in temporally correlating and composing the described contents in videos. In this paper, we propose a novel semantic conditioned dynamic modulation (SCDM) mechanism, which relies on the sentence semantics to modulate the temporal convolution operations for better correlating and composing the sentence-related video contents over time. More importantly, the proposed SCDM performs dynamically with respect to the diverse video contents so as to establish a more precise matching relationship between sentence and video, thereby improving the temporal grounding accuracy. Extensive experiments on three public datasets demonstrate that our proposed model outperforms the state of the art by clear margins, illustrating the ability of SCDM to better associate and localize relevant video contents for temporal sentence grounding. Our code for this paper is available at https://github.com/yytzsy/SCDM .
Tasks
Published 2019-10-31
URL https://arxiv.org/abs/1910.14303v1
PDF https://arxiv.org/pdf/1910.14303v1.pdf
PWC https://paperswithcode.com/paper/semantic-conditioned-dynamic-modulation-for
Repo https://github.com/yytzsy/SCDM
Framework tf
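
A FiLM-style simplification of the modulation idea: a sentence vector produces per-channel scale and shift applied to temporal convolution features. The paper's mechanism is dynamic with respect to the video content, which this static per-sentence version does not capture; the class and parameter names are mine:

```python
import torch
import torch.nn as nn

class SCDMBlock(nn.Module):
    """Sentence-conditioned scale/shift over temporal conv features (sketch)."""
    def __init__(self, channels, sent_dim):
        super().__init__()
        self.to_gamma = nn.Linear(sent_dim, channels)
        self.to_beta = nn.Linear(sent_dim, channels)
        self.conv = nn.Conv1d(channels, channels, kernel_size=3, padding=1)

    def forward(self, video_feats, sent_vec):
        # video_feats: (batch, channels, time); sent_vec: (batch, sent_dim)
        gamma = self.to_gamma(sent_vec).unsqueeze(-1)   # (batch, channels, 1)
        beta = self.to_beta(sent_vec).unsqueeze(-1)
        modulated = gamma * video_feats + beta          # sentence-conditioned modulation
        return torch.relu(self.conv(modulated))

block = SCDMBlock(channels=128, sent_dim=300)
out = block(torch.randn(2, 128, 64), torch.randn(2, 300))
```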