February 1, 2020


Paper Group AWR 245


Automated Machine Learning with Monte-Carlo Tree Search


Title	Automated Machine Learning with Monte-Carlo Tree Search
Authors	Herilalaina Rakotoarison, Marc Schoenauer, Michèle Sebag
Abstract	The AutoML task consists of selecting the proper algorithm in a machine learning portfolio, and its hyperparameter values, in order to deliver the best performance on the dataset at hand. Mosaic, a Monte-Carlo tree search (MCTS) based approach, is presented to handle the AutoML hybrid structural and parametric expensive black-box optimization problem. Extensive empirical studies are conducted to independently assess and compare: i) the optimization processes based on Bayesian optimization or MCTS; ii) its warm-start initialization; iii) the ensembling of the solutions gathered along the search. Mosaic is assessed on the OpenML 100 benchmark and the Scikit-learn portfolio, with statistically significant gains over Auto-Sklearn, winner of former international AutoML challenges.
Tasks	AutoML
Published	2019-06-01
URL	https://arxiv.org/abs/1906.00170v2
PDF	https://arxiv.org/pdf/1906.00170v2.pdf
PWC	https://paperswithcode.com/paper/190600170
Repo	https://github.com/herilalaina/mosaic_ml
Framework	none
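
The tree search described in the abstract above has to trade off re-trying promising pipeline choices against exploring untested ones. A minimal sketch of the classic UCB1 selection rule often used in MCTS is given below. It is not Mosaic's actual selection policy; the function names, the exploration constant `c`, and the example statistics are all hypothetical.

```python
import math


def ucb1_score(total_reward, visits, parent_visits, c=math.sqrt(2)):
    """Classic UCB1: mean reward plus an exploration bonus."""
    if visits == 0:
        return math.inf  # unvisited children are always tried first
    mean = total_reward / visits
    return mean + c * math.sqrt(math.log(parent_visits) / visits)


def select_child(children):
    """children maps a name to (total_reward, visits); returns the best name."""
    parent_visits = sum(visits for _, visits in children.values())
    return max(
        children,
        key=lambda name: ucb1_score(*children[name], parent_visits),
    )


# Hypothetical statistics for three algorithm choices at one tree node.
stats = {"random_forest": (7.2, 10), "svm": (3.0, 4), "knn": (0.0, 0)}
chosen = select_child(stats)  # "knn", because it has never been visited
```

Once every child has been visited, the bonus term shrinks for frequently visited children, so a less-explored child with a similar mean reward can still be selected.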

Efficient Neural Interaction Function Search for Collaborative Filtering


Title	Efficient Neural Interaction Function Search for Collaborative Filtering
Authors	Quanming Yao, Xiangning Chen, James Kwok, Yong Li, Cho-Jui Hsieh
Abstract	In collaborative filtering (CF), the interaction function (IFC) plays the important role of capturing interactions among items and users. The most popular IFC is the inner product, which has been successfully used in low-rank matrix factorization. However, interactions in real-world applications can be highly complex. Thus, other operations (such as plus and concatenation), which may potentially offer better performance, have been proposed. Nevertheless, it is still hard for existing IFCs to have consistently good performance across different application scenarios. Motivated by the recent success of automated machine learning (AutoML), we propose in this paper the search for simple neural interaction functions (SIF) in CF. By examining and generalizing existing CF approaches, an expressive SIF search space is designed and represented as a structured multi-layer perceptron. We propose a one-shot search algorithm that simultaneously updates both the architecture and learning parameters. Experimental results demonstrate that the proposed method can be much more efficient than popular AutoML approaches, can obtain much better prediction performance than state-of-the-art CF approaches, and can discover distinct IFCs for different data sets and tasks.
Tasks	AutoML
Published	2019-06-28
URL	https://arxiv.org/abs/1906.12091v2
PDF	https://arxiv.org/pdf/1906.12091v2.pdf
PWC	https://paperswithcode.com/paper/searching-for-interaction-functions-in
Repo	https://github.com/xiangning-chen/SIF
Framework	pytorch
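
The abstract above names three interaction functions: the inner product, plus, and concatenation. The sketch below shows what each one computes on a pair of user and item embeddings. The embedding values and function names are made up for illustration, and the SIF search space itself is not reproduced here.

```python
def inner_product(u, v):
    """Scalar score, as used in low-rank matrix factorization."""
    return sum(a * b for a, b in zip(u, v))


def plus(u, v):
    """Element-wise sum; yields a vector of the same size."""
    return [a + b for a, b in zip(u, v)]


def concat(u, v):
    """Concatenation; yields a vector twice as long."""
    return list(u) + list(v)


# Hypothetical 3-dimensional embeddings.
user = [0.5, -1.0, 2.0]
item = [1.0, 0.5, 0.25]

score = inner_product(user, item)  # 0.5
summed = plus(user, item)          # [1.5, -0.5, 2.25]
joined = concat(user, item)        # six values
```

Only the inner product returns a scalar directly. The vector-valued operations would typically be followed by a further learned mapping to produce a prediction.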

Discovering the Compositional Structure of Vector Representations with Role Learning Networks


Title	Discovering the Compositional Structure of Vector Representations with Role Learning Networks
Authors	Paul Soulos, Tom McCoy, Tal Linzen, Paul Smolensky
Abstract	Neural networks are able to perform tasks that rely on compositional structure even though they lack obvious mechanisms for representing this structure. To analyze the internal representations that enable such success, we propose ROLE, a technique that detects whether these representations implicitly encode symbolic structure. ROLE learns to approximate the representations of a target encoder E by learning a symbolic constituent structure and an embedding of that structure into E’s representational vector space. The constituents of the approximating symbol structure are defined by structural positions - roles - that can be filled by symbols. We show that when E is constructed to explicitly embed a particular type of structure (string or tree), ROLE successfully extracts the ground-truth roles defining that structure. We then analyze a GRU seq2seq network trained to perform a more complex compositional task (SCAN), where there is no ground truth role scheme available. For this model, ROLE successfully discovers an interpretable symbolic structure that the model implicitly uses to perform the SCAN task, providing a comprehensive account of the representations that drive the behavior of a frequently-used but hard-to-interpret type of model. We verify the causal importance of the discovered symbolic structure by showing that, when we systematically manipulate hidden embeddings based on this symbolic structure, the model’s resulting output is changed in the way predicted by our analysis. Finally, we use ROLE to explore whether popular sentence embedding models are capturing compositional structure and find evidence that they are not; we conclude by suggesting how insights from ROLE can be used to impart new inductive biases to improve the compositional abilities of such models.
Tasks	Sentence Embedding
Published	2019-10-21
URL	https://arxiv.org/abs/1910.09113v2
PDF	https://arxiv.org/pdf/1910.09113v2.pdf
PWC	https://paperswithcode.com/paper/discovering-the-compositional-structure-of
Repo	https://github.com/psoulos/role-decomposition
Framework	pytorch

Transfer Learning for Risk Classification of Social Media Posts: Model Evaluation Study


Title	Transfer Learning for Risk Classification of Social Media Posts: Model Evaluation Study
Authors	Derek Howard, Marta Maslej, Justin Lee, Jacob Ritchie, Geoffrey Woollard, Leon French
Abstract	Mental illness affects a significant portion of the worldwide population. Online mental health forums can provide a supportive environment for those afflicted and also generate a large amount of data which can be mined to predict mental health states using machine learning methods. We benchmark multiple methods of text feature representation for social media posts and compare their downstream use with automated machine learning (AutoML) tools to triage content for moderator attention. We used 1588 labeled posts from the CLPsych 2017 shared task collected from the Reachout.com forum (Milne et al., 2019). Posts were represented using lexicon-based tools including VADER, Empath, and LIWC, as well as pre-trained artificial neural network models including DeepMoji, Universal Sentence Encoder, and GPT-1. We used TPOT and auto-sklearn as AutoML tools to generate classifiers to triage the posts. The top-performing system used features derived from the GPT-1 model, which was finetuned on over 150,000 unlabeled posts from Reachout.com. Our top system had a macro averaged F1 score of 0.572, providing a new state-of-the-art result on the CLPsych 2017 task. This was achieved without additional information from meta-data or preceding posts. Error analyses revealed that this top system often misses expressions of hopelessness. We additionally present visualizations that aid understanding of the learned classifiers. We show that transfer learning is an effective strategy for predicting risk with relatively little labeled data. We note that finetuning of pretrained language models provides further gains when large amounts of unlabeled text are available.
Tasks	AutoML, Transfer Learning
Published	2019-07-04
URL	https://arxiv.org/abs/1907.02581v2
PDF	https://arxiv.org/pdf/1907.02581v2.pdf
PWC	https://paperswithcode.com/paper/application-of-transfer-learning-for
Repo	https://github.com/derekhoward/Reachout_triage
Framework	none
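
The headline number in the abstract above is a macro averaged F1 score. As a reminder of how that metric is formed, the sketch below computes a per-label F1 and averages the results with equal weight. The label names and predictions are invented, and the shared task's official scoring may restrict the average to a subset of labels, which this sketch does not attempt to reproduce.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-label F1 over every label that appears."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the label never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical triage labels for four posts.
truth = ["green", "green", "amber", "red"]
preds = ["green", "amber", "amber", "red"]
result = macro_f1(truth, preds)  # (2/3 + 2/3 + 1) / 3 = 7/9
```

Because each label contributes equally, rare labels influence the score as much as frequent ones.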

Pushing the Boundaries of View Extrapolation with Multiplane Images


Title	Pushing the Boundaries of View Extrapolation with Multiplane Images
Authors	Pratul P. Srinivasan, Richard Tucker, Jonathan T. Barron, Ravi Ramamoorthi, Ren Ng, Noah Snavely
Abstract	We explore the problem of view synthesis from a narrow baseline pair of images, and focus on generating high-quality view extrapolations with plausible disocclusions. Our method builds upon prior work in predicting a multiplane image (MPI), which represents scene content as a set of RGB$\alpha$ planes within a reference view frustum and renders novel views by projecting this content into the target viewpoints. We present a theoretical analysis showing how the range of views that can be rendered from an MPI increases linearly with the MPI disparity sampling frequency, as well as a novel MPI prediction procedure that theoretically enables view extrapolations of up to $4\times$ the lateral viewpoint movement allowed by prior work. Our method ameliorates two specific issues that limit the range of views renderable by prior methods: 1) We expand the range of novel views that can be rendered without depth discretization artifacts by using a 3D convolutional network architecture along with a randomized-resolution training procedure to allow our model to predict MPIs with increased disparity sampling frequency. 2) We reduce the repeated texture artifacts seen in disocclusions by enforcing a constraint that the appearance of hidden content at any depth must be drawn from visible content at or behind that depth. Please see our results video at: https://www.youtube.com/watch?v=aJqAaMNL2m4.
Tasks
Published	2019-05-01
URL	http://arxiv.org/abs/1905.00413v1
PDF	http://arxiv.org/pdf/1905.00413v1.pdf
PWC	https://paperswithcode.com/paper/pushing-the-boundaries-of-view-extrapolation
Repo	https://github.com/google-research/google-research/tree/master/mpi_extrapolation
Framework	tf
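
The abstract above describes an MPI as a set of RGBα planes. A minimal sketch of how such planes can be combined into one pixel colour with the standard back-to-front "over" operation follows. It deliberately omits the step of projecting the planes into the target viewpoint, and the function name and plane values are hypothetical.

```python
def composite_back_to_front(planes):
    """planes: list of (rgb, alpha) for one pixel, ordered farthest to nearest."""
    out = (0.0, 0.0, 0.0)
    for rgb, alpha in planes:
        # "Over" operator: the nearer plane covers what is behind it by alpha.
        out = tuple(alpha * c + (1.0 - alpha) * o for c, o in zip(rgb, out))
    return out


# An opaque blue plane far away, with a half-transparent red plane in front.
pixel = composite_back_to_front([
    ((0.0, 0.0, 1.0), 1.0),
    ((1.0, 0.0, 0.0), 0.5),
])
# pixel == (0.5, 0.0, 0.5)
```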

CAT: CRF-based ASR Toolkit


Title	CAT: CRF-based ASR Toolkit
Authors	Keyu An, Hongyu Xiang, Zhijian Ou
Abstract	In this paper, we present a new open source toolkit for automatic speech recognition (ASR), named CAT (CRF-based ASR Toolkit). A key feature of CAT is discriminative training in the framework of conditional random field (CRF), particularly with connectionist temporal classification (CTC) inspired state topology. CAT contains a full-fledged implementation of CTC-CRF and provides a complete workflow for CRF-based end-to-end speech recognition. Evaluation results on Chinese and English benchmarks such as Switchboard and Aishell show that CAT obtains state-of-the-art results among existing end-to-end models with fewer parameters, and is competitive with the hybrid DNN-HMM models. Towards flexibility, we show that i-vector based speaker-adapted recognition and a latency control mechanism can be explored easily and effectively in CAT. We hope CAT, especially the CRF-based framework and software, will be of broad interest to the community, and can be further explored and improved.
Tasks	End-To-End Speech Recognition, Speech Recognition
Published	2019-11-20
URL	https://arxiv.org/abs/1911.08747v1
PDF	https://arxiv.org/pdf/1911.08747v1.pdf
PWC	https://paperswithcode.com/paper/cat-crf-based-asr-toolkit
Repo	https://github.com/thu-spmi/cat
Framework	pytorch

Invariant Risk Minimization


Title	Invariant Risk Minimization
Authors	Martin Arjovsky, Léon Bottou, Ishaan Gulrajani, David Lopez-Paz
Abstract	We introduce Invariant Risk Minimization (IRM), a learning paradigm to estimate invariant correlations across multiple training distributions. To achieve this goal, IRM learns a data representation such that the optimal classifier, on top of that data representation, matches for all training distributions. Through theory and experiments, we show how the invariances learned by IRM relate to the causal structures governing the data and enable out-of-distribution generalization.
Tasks
Published	2019-07-05
URL	https://arxiv.org/abs/1907.02893v3
PDF	https://arxiv.org/pdf/1907.02893v3.pdf
PWC	https://paperswithcode.com/paper/invariant-risk-minimization
Repo	https://github.com/reiinakano/invariant-risk-minimization
Framework	pytorch
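
The requirement in the abstract above, that one classifier be optimal on top of the representation in every training distribution, is commonly made practical as a gradient penalty. The sketch below assumes a squared loss and a scalar dummy classifier fixed at 1.0, and treats the representation outputs as given numbers. The function names, the weight `lam`, and the data are hypothetical.

```python
def env_risk_and_penalty(features, targets):
    """Risk of the dummy classifier w = 1.0 and its squared gradient w.r.t. w."""
    n = len(features)
    # Risk R(w) = mean((w * f - y)^2), evaluated at w = 1.
    risk = sum((f - y) ** 2 for f, y in zip(features, targets)) / n
    # dR/dw at w = 1 is mean(2 * (f - y) * f).
    grad = sum(2.0 * (f - y) * f for f, y in zip(features, targets)) / n
    return risk, grad ** 2


def irm_objective(envs, lam=1.0):
    """Sum over environments of risk plus lam times the invariance penalty."""
    total = 0.0
    for features, targets in envs:
        risk, penalty = env_risk_and_penalty(features, targets)
        total += risk + lam * penalty
    return total


# One hypothetical environment where the representation is off by a factor of 2.
value = irm_objective([([1.0, 2.0], [2.0, 4.0])])  # 2.5 + 25.0 = 27.5
```

The penalty is zero only when w = 1.0 is already a stationary point of the risk in that environment, which is how the sketch encodes "the same classifier is optimal everywhere".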

Deep Equilibrium Models


Title	Deep Equilibrium Models
Authors	Shaojie Bai, J. Zico Kolter, Vladlen Koltun
Abstract	We present a new approach to modeling sequential data: the deep equilibrium model (DEQ). Motivated by an observation that the hidden layers of many existing deep sequence models converge towards some fixed point, we propose the DEQ approach that directly finds these equilibrium points via root-finding. Such a method is equivalent to running an infinite depth (weight-tied) feedforward network, but has the notable advantage that we can analytically backpropagate through the equilibrium point using implicit differentiation. Using this approach, training and prediction in these networks require only constant memory, regardless of the effective “depth” of the network. We demonstrate how DEQs can be applied to two state-of-the-art deep sequence models: self-attention transformers and trellis networks. On large-scale language modeling tasks, such as the WikiText-103 benchmark, we show that DEQs 1) often improve performance over these state-of-the-art models (for similar parameter counts); 2) have similar computational requirements to existing models; and 3) vastly reduce memory consumption (often the bottleneck for training large sequence models), demonstrating an up to 88% memory reduction in our experiments. The code is available at https://github.com/locuslab/deq.
Tasks	Language Modelling
Published	2019-09-03
URL	https://arxiv.org/abs/1909.01377v2
PDF	https://arxiv.org/pdf/1909.01377v2.pdf
PWC	https://paperswithcode.com/paper/deep-equilibrium-models
Repo	https://github.com/locuslab/deq
Framework	pytorch
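
The two ideas in the abstract above, solving for an equilibrium point and differentiating through it implicitly, can be illustrated on a scalar toy layer. This sketch uses plain fixed-point iteration rather than the root-finding method used by the authors, and the layer `tanh(w * z + x)`, the function names, and the constants are assumptions chosen so that the iteration is a contraction (|w| < 1).

```python
import math


def layer(z, x, w):
    """A single weight-tied layer applied to hidden state z and input x."""
    return math.tanh(w * z + x)


def solve_fixed_point(x, w, tol=1e-12, max_iter=1000):
    """Iterate z <- layer(z, x, w) until it stops changing."""
    z = 0.0
    for _ in range(max_iter):
        z_next = layer(z, x, w)
        if abs(z_next - z) < tol:
            return z_next
        z = z_next
    return z


def implicit_grad_x(z_star, x, w):
    """dz*/dx from z* = f(z*, x): (df/dx) / (1 - df/dz), no unrolling needed."""
    s = 1.0 - math.tanh(w * z_star + x) ** 2  # derivative of tanh at z*
    return s / (1.0 - w * s)


z_star = solve_fixed_point(x=0.3, w=0.5)
grad = implicit_grad_x(z_star, x=0.3, w=0.5)
```

The gradient uses only the final equilibrium value, so no intermediate iterates have to be stored. This is the source of the constant-memory property mentioned in the abstract.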

Video Instance Segmentation


Title	Video Instance Segmentation
Authors	Linjie Yang, Yuchen Fan, Ning Xu
Abstract	In this paper we present a new computer vision task, named video instance segmentation. The goal of this new task is simultaneous detection, segmentation and tracking of instances in videos. In other words, it is the first time that the image instance segmentation problem is extended to the video domain. To facilitate research on this new task, we propose a large-scale benchmark called YouTube-VIS, which consists of 2883 high-resolution YouTube videos, a 40-category label set and 131k high-quality instance masks. In addition, we propose a novel algorithm called MaskTrack R-CNN for this task. Our new method introduces a new tracking branch to Mask R-CNN to perform the detection, segmentation and tracking tasks jointly. Finally, we evaluate the proposed method and several strong baselines on our new dataset. Experimental results clearly demonstrate the advantages of the proposed algorithm and reveal insight for future improvement. We believe the video instance segmentation task will motivate the community along the line of research for video understanding.
Tasks	Instance Segmentation, Semantic Segmentation, Video Understanding
Published	2019-05-12
URL	https://arxiv.org/abs/1905.04804v4
PDF	https://arxiv.org/pdf/1905.04804v4.pdf
PWC	https://paperswithcode.com/paper/video-instance-segmentation
Repo	https://github.com/jiawen9611/Awesome-Video-Instance-Segmentation
Framework	pytorch

ErasureHead: Distributed Gradient Descent without Delays Using Approximate Gradient Coding


Title	ErasureHead: Distributed Gradient Descent without Delays Using Approximate Gradient Coding
Authors	Hongyi Wang, Zachary Charles, Dimitris Papailiopoulos
Abstract	We present ErasureHead, a new approach for distributed gradient descent (GD) that mitigates system delays by employing approximate gradient coding. Gradient coded distributed GD uses redundancy to exactly recover the gradient at each iteration from a subset of compute nodes. ErasureHead instead uses approximate gradient codes to recover an inexact gradient at each iteration, but with higher delay tolerance. Unlike prior work on gradient coding, we provide a performance analysis that combines both delay and convergence guarantees. We establish that down to a small noise floor, ErasureHead converges as quickly as distributed GD and has faster overall runtime under a probabilistic delay model. We conduct extensive experiments on real world datasets and distributed clusters and demonstrate that our method can lead to significant speedups over both standard and gradient coded GD.
Tasks
Published	2019-01-28
URL	http://arxiv.org/abs/1901.09671v1
PDF	http://arxiv.org/pdf/1901.09671v1.pdf
PWC	https://paperswithcode.com/paper/erasurehead-distributed-gradient-descent
Repo	https://github.com/hwang595/ErasureHead
Framework	pytorch

Towards Real-Time Multi-Object Tracking


Title	Towards Real-Time Multi-Object Tracking
Authors	Zhongdao Wang, Liang Zheng, Yixuan Liu, Shengjin Wang
Abstract	Modern multiple object tracking (MOT) systems usually follow the tracking-by-detection paradigm. Such a system has 1) a detection model for target localization and 2) an appearance embedding model for data association. Having the two models separately executed might lead to efficiency problems, as the running time is simply a sum of the two steps without investigating potential structures that can be shared between them. Existing research efforts on real-time MOT usually focus on the association step, so they are essentially real-time association methods but not real-time MOT systems. In this paper, we propose an MOT system that allows target detection and appearance embedding to be learned in a shared model. Specifically, we incorporate the appearance embedding model into a single-shot detector, such that the model can simultaneously output detections and the corresponding embeddings. As such, the system is formulated as a multi-task learning problem: there are multiple objectives, i.e., anchor classification, bounding box regression, and embedding learning; and the individual losses are automatically weighted. To our knowledge, this work reports the first (near) real-time MOT system, with a running speed of 18.8 to 24.1 FPS depending on the input resolution. Meanwhile, its tracking accuracy is comparable to the state-of-the-art trackers embodying separate detection and embedding (SDE) learning (64.4% MOTA vs. 66.1% MOTA on MOT-16 challenge). The code and models are available at https://github.com/Zhongdao/Towards-Realtime-MOT.
Tasks	Multi-Object Tracking, Multiple Object Tracking, Multi-Task Learning, Object Tracking
Published	2019-09-27
URL	https://arxiv.org/abs/1909.12605v1
PDF	https://arxiv.org/pdf/1909.12605v1.pdf
PWC	https://paperswithcode.com/paper/towards-real-time-multi-object-tracking
Repo	https://github.com/Zhongdao/Towards-Realtime-MOT
Framework	pytorch
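
The abstract above says that the classification, box regression, and embedding losses are weighted automatically. One common way to do this is uncertainty-based weighting with a learnable log-variance per task, sketched below. The abstract does not spell out the exact scheme, so treat this as an assumption; the function name and loss values are hypothetical.

```python
import math


def weighted_total_loss(losses, log_vars):
    """Sum of 0.5 * (exp(-s_i) * L_i + s_i) over tasks, with learnable s_i."""
    return sum(
        0.5 * (math.exp(-s) * loss + s)
        for loss, s in zip(losses, log_vars)
    )


# Hypothetical per-task losses: classification, box regression, embedding.
task_losses = [1.0, 2.0, 3.0]

# With every s_i = 0, all tasks get the same weight of 0.5.
uniform = weighted_total_loss(task_losses, [0.0, 0.0, 0.0])  # 3.0

# Setting s_i = log(L_i) minimises each term for fixed losses.
tuned = weighted_total_loss(task_losses, [math.log(v) for v in task_losses])
```

The `s_i` term acts as a regulariser: without it, the optimiser could drive every weight `exp(-s_i)` to zero and ignore all tasks.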

Simple Unsupervised Summarization by Contextual Matching


Title	Simple Unsupervised Summarization by Contextual Matching
Authors	Jiawei Zhou, Alexander M. Rush
Abstract	We propose an unsupervised method for sentence summarization using only language modeling. The approach employs two language models, one that is generic (i.e. pretrained), and the other that is specific to the target domain. We show that by using a product-of-experts criterion these are enough for maintaining continuous contextual matching while maintaining output fluency. Experiments on both abstractive and extractive sentence summarization data sets show promising results of our method without being exposed to any paired data.
Tasks	Language Modelling
Published	2019-07-31
URL	https://arxiv.org/abs/1907.13337v1
PDF	https://arxiv.org/pdf/1907.13337v1.pdf
PWC	https://paperswithcode.com/paper/simple-unsupervised-summarization-by-1
Repo	https://github.com/jzhou316/Unsupervised-Sentence-Summarization
Framework	pytorch
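
The product-of-experts idea mentioned in the abstract above can be illustrated by multiplying two next-word distributions and renormalising. This is a generic sketch rather than the paper's exact scoring criterion. The vocabulary, the probabilities, and the weight `lam` are hypothetical, and both distributions are assumed to assign non-zero probability to every word.

```python
import math


def product_of_experts(p_generic, p_domain, lam=1.0):
    """Combine two distributions as p_generic * p_domain**lam, renormalised."""
    scores = {
        tok: math.log(p_generic[tok]) + lam * math.log(p_domain[tok])
        for tok in p_generic
    }
    top = max(scores.values())  # subtract the max for numerical stability
    norm = sum(math.exp(s - top) for s in scores.values())
    return {tok: math.exp(s - top) / norm for tok, s in scores.items()}


# Hypothetical next-word distributions from the two language models.
generic = {"cat": 0.5, "dog": 0.3, "car": 0.2}
domain = {"cat": 0.2, "dog": 0.6, "car": 0.2}

combined = product_of_experts(generic, domain)
best = max(combined, key=combined.get)  # "dog"
```

A word scores highly only when both models consider it plausible, which is why the product behaves like a logical "and" over the two experts.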

Dual Active Sampling on Batch-Incremental Active Learning


Title	Dual Active Sampling on Batch-Incremental Active Learning
Authors	Johan Phan, Massimiliano Ruocco, Francesco Scibilia
Abstract	Recently, Convolutional Neural Networks (CNNs) have shown unprecedented success in the field of computer vision, especially on challenging image classification tasks, by relying on a universal approach, i.e., training a deep model on a massive dataset of supervised examples. While unlabeled data are often an abundant resource, collecting a large set of labeled data is very expensive and often requires considerable human effort. One way to ease this is to effectively select and label highly informative instances from a pool of unlabeled data (i.e., active learning). This paper proposes a new method of batch-mode active learning, Dual Active Sampling (DAS), which is based on a simple assumption: if two deep neural networks (DNNs) of the same structure and trained on the same dataset give significantly different output for a given sample, then that particular sample should be picked for additional training. While other state-of-the-art methods in this field usually require intensive computational power or rely on a complicated structure, DAS is simpler to implement and achieves improved results on CIFAR-10 with preferable computational time compared to the core-set method.
Tasks	Active Learning, Image Classification
Published	2019-05-22
URL	https://arxiv.org/abs/1905.09247v1
PDF	https://arxiv.org/pdf/1905.09247v1.pdf
PWC	https://paperswithcode.com/paper/dual-active-sampling-on-batch-incremental
Repo	https://github.com/JohanPhan/Dual_active_sampling_Active_learning
Framework	pytorch
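
The selection rule stated in the abstract above (pick the samples on which two identically structured networks disagree most) can be sketched in a few lines. The Euclidean distance used to measure disagreement is an assumption, since the abstract only says "significantly different output". The function names and the example outputs are hypothetical.

```python
import math


def disagreement(p, q):
    """Euclidean distance between two models' output vectors for one sample."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def select_batch(outputs_a, outputs_b, k):
    """Indices of the k unlabeled samples with the largest disagreement."""
    ranked = sorted(
        range(len(outputs_a)),
        key=lambda i: disagreement(outputs_a[i], outputs_b[i]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical class probabilities from two networks on three unlabeled samples.
net_a = [[0.9, 0.1], [0.5, 0.5], [0.2, 0.8]]
net_b = [[0.8, 0.2], [0.1, 0.9], [0.25, 0.75]]

to_label = select_batch(net_a, net_b, k=2)  # [1, 0]
```

The selected indices would then be sent for labeling and added to the training set before both networks are retrained.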

Conceptualize and Infer User Needs in E-commerce


Title	Conceptualize and Infer User Needs in E-commerce
Authors	Xusheng Luo, Yonghua Yang, Kenny Q. Zhu, Yu Gong, Keping Yang
Abstract	Understanding latent user needs beneath shopping behaviors is critical to e-commerce applications. Without a proper definition of user needs in e-commerce, most industry solutions are not driven directly by user needs at the current stage, which prevents them from further improving user satisfaction. Representing implicit user needs explicitly as nodes like “outdoor barbecue” or “keep warm for kids” in a knowledge graph opens up new possibilities for various e-commerce applications. Backed by such an e-commerce knowledge graph, we propose a supervised learning algorithm to conceptualize user needs from their transaction history as “concept” nodes in the graph and infer those concepts for each user through a deep attentive model. Offline experiments demonstrate the effectiveness and stability of our model, and online industry strength tests show substantial advantages of such user needs understanding.
Tasks
Published	2019-10-08
URL	https://arxiv.org/abs/1910.03295v1
PDF	https://arxiv.org/pdf/1910.03295v1.pdf
PWC	https://paperswithcode.com/paper/conceptualize-and-infer-user-needs-in-e
Repo	https://github.com/angrymidiao/concept_net
Framework	none

Optimal Sparse Decision Trees


Title	Optimal Sparse Decision Trees
Authors	Xiyang Hu, Cynthia Rudin, Margo Seltzer
Abstract	Decision tree algorithms have been among the most popular algorithms for interpretable (transparent) machine learning since the early 1980s. The problem that has plagued decision tree algorithms since their inception is their lack of optimality, or lack of guarantees of closeness to optimality: decision tree algorithms are often greedy or myopic, and sometimes produce unquestionably suboptimal models. Hardness of decision tree optimization is both a theoretical and practical obstacle, and even careful mathematical programming approaches have not been able to solve these problems efficiently. This work introduces the first practical algorithm for optimal decision trees for binary variables. The algorithm is a co-design of analytical bounds that reduce the search space and modern systems techniques, including data structures and a custom bit-vector library. Our experiments highlight advantages in scalability, speed, and proof of optimality.
Tasks
Published	2019-04-29
URL	https://arxiv.org/abs/1904.12847v3
PDF	https://arxiv.org/pdf/1904.12847v3.pdf
PWC	https://paperswithcode.com/paper/optimal-sparse-decision-trees
Repo	https://github.com/xiyanghu/OSDT
Framework	none