October 20, 2019

2937 words 14 mins read

Paper Group AWR 258

Rice Classification Using Spatio-Spectral Deep Convolutional Neural Network. Fully Supervised Speaker Diarization. Deep Object Co-Segmentation. DRACO: Byzantine-resilient Distributed Training via Redundant Gradients. 3D Steerable CNNs: Learning Rotationally Equivariant Features in Volumetric Data. QA4IE: A Question Answering based Framework for Inf …

Rice Classification Using Spatio-Spectral Deep Convolutional Neural Network

Title Rice Classification Using Spatio-Spectral Deep Convolutional Neural Network
Authors Itthi Chatnuntawech, Kittipong Tantisantisom, Paisan Khanchaitit, Thitikorn Boonkoom, Berkin Bilgic, Ekapol Chuangsuwanich
Abstract Rice is a staple food that contributes significantly to the human food supply, and numerous rice varieties are cultivated, imported, and exported worldwide. Because different varieties can become mixed during production and trading, impurities can erode the trust between rice importers and exporters, motivating the development of a rice variety inspection system. In this work, we develop a non-destructive rice variety classification system that benefits from the synergy between hyperspectral imaging and a deep convolutional neural network (CNN). The proposed method uses a hyperspectral imaging system to simultaneously acquire complementary spatial and spectral information of rice seeds. The rice varieties are then determined from the acquired spatio-spectral data using a deep CNN. In contrast to several existing rice variety classification methods that require hand-engineered features, the proposed method automatically extracts spatio-spectral features from the raw sensor data. As demonstrated on two types of rice datasets, the proposed method achieves up to an 11.9% absolute improvement in mean classification accuracy over commonly used classification methods based on support vector machines.
Tasks
Published 2018-05-29
URL https://arxiv.org/abs/1805.11491v3
PDF https://arxiv.org/pdf/1805.11491v3.pdf
PWC https://paperswithcode.com/paper/rice-classification-using-spatio-spectral
Repo https://github.com/ichatnun/spatiospectral-densenet-rice-classification
Framework tf
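
The repository is TensorFlow, but the core idea is framework-agnostic: feed the seed patch's spectral bands to a CNN as input channels so that spatial and spectral features are learned jointly from raw sensor data. Below is a minimal PyTorch sketch; the band count, layer sizes, and six-way output are illustrative placeholders, not the paper's DenseNet-based architecture.

```python
# Minimal sketch (not the paper's DenseNet): hyperspectral bands of a seed
# patch become input channels, so spatial and spectral features mix jointly.
import torch
import torch.nn as nn

class SpatioSpectralCNN(nn.Module):
    def __init__(self, n_bands=100, n_classes=6):  # band/class counts are assumptions
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(n_bands, 64, kernel_size=3, padding=1),  # mixes all bands at each pixel
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(64, 128, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(128, n_classes)

    def forward(self, x):  # x: (batch, bands, height, width)
        return self.classifier(self.features(x).flatten(1))

model = SpatioSpectralCNN()
seeds = torch.randn(4, 100, 32, 32)  # four 32x32 seed patches with 100 bands
print(model(seeds).shape)            # torch.Size([4, 6]) class logits
```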

Fully Supervised Speaker Diarization

Title Fully Supervised Speaker Diarization
Authors Aonan Zhang, Quan Wang, Zhenyao Zhu, John Paisley, Chong Wang
Abstract In this paper, we propose a fully supervised speaker diarization approach, named unbounded interleaved-state recurrent neural networks (UIS-RNN). Given extracted speaker-discriminative embeddings (a.k.a. d-vectors) from input utterances, each individual speaker is modeled by a parameter-sharing RNN, while the RNN states for different speakers interleave in the time domain. This RNN is naturally integrated with a distance-dependent Chinese restaurant process (ddCRP) to accommodate an unknown number of speakers. Our system is fully supervised and is able to learn from examples where time-stamped speaker labels are annotated. We achieved a 7.6% diarization error rate on NIST SRE 2000 CALLHOME, which is better than the state-of-the-art method using spectral clustering. Moreover, our method decodes in an online fashion while most state-of-the-art systems rely on offline clustering.
Tasks Speaker Diarization
Published 2018-10-10
URL http://arxiv.org/abs/1810.04719v7
PDF http://arxiv.org/pdf/1810.04719v7.pdf
PWC https://paperswithcode.com/paper/fully-supervised-speaker-diarization
Repo https://github.com/google/uis-rnn
Framework pytorch
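
A toy sketch of the Chinese-restaurant-process flavor of speaker assignment that UIS-RNN builds on: each incoming d-vector either joins an existing speaker, with weight proportional to that speaker's count and embedding similarity, or opens a new one with weight alpha. This is a plain CRP with a similarity weighting, not the paper's distance-dependent CRP or its RNN state model; all constants are illustrative.

```python
# Toy CRP-style assignment: join an existing speaker with weight
# count * cosine-similarity, or open a new speaker with weight alpha.
import numpy as np

def crp_assign(d_vectors, alpha=1.0, rng=np.random.default_rng(0)):
    labels, centroids, counts = [], [], []
    for v in d_vectors:
        if not centroids:
            labels.append(0); centroids.append(v.copy()); counts.append(1)
            continue
        sims = [c @ v / (np.linalg.norm(c) * np.linalg.norm(v)) for c in centroids]
        weights = np.array([n * max(s, 1e-6) for n, s in zip(counts, sims)] + [alpha])
        k = rng.choice(len(weights), p=weights / weights.sum())
        if k == len(centroids):          # open a new speaker
            centroids.append(v.copy()); counts.append(1)
        else:                            # join speaker k, update its running mean
            counts[k] += 1
            centroids[k] += (v - centroids[k]) / counts[k]
        labels.append(int(k))
    return labels

rng = np.random.default_rng(1)
a = rng.normal(0, 0.05, (5, 8)) + np.eye(8)[0]  # five d-vectors near one direction
b = rng.normal(0, 0.05, (5, 8)) + np.eye(8)[1]  # five near another
print(crp_assign(np.vstack([a, b])))            # sampled speaker labels
```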

Deep Object Co-Segmentation

Title Deep Object Co-Segmentation
Authors Weihao Li, Omid Hosseini Jafari, Carsten Rother
Abstract This work presents a deep object co-segmentation (DOCS) approach for segmenting common objects of the same class within a pair of images. The method learns to ignore common or uncommon background stuff and to focus on objects. If multiple object classes are present in the image pair, they are jointly extracted as foreground. To address this task, we propose a CNN-based Siamese encoder-decoder architecture. The encoder extracts high-level semantic features of the foreground objects, a mutual correlation layer detects the common objects, and finally the decoder generates the output foreground masks for each image. To train our model, we compile a large object co-segmentation dataset consisting of image pairs from the PASCAL VOC dataset with common-object masks. We evaluate our approach on commonly used co-segmentation datasets and observe that it consistently outperforms competing methods, for both seen and unseen object classes.
Tasks
Published 2018-04-17
URL https://arxiv.org/abs/1804.06423v2
PDF https://arxiv.org/pdf/1804.06423v2.pdf
PWC https://paperswithcode.com/paper/deep-object-co-segmentation
Repo https://github.com/ohosseini/DOCS-pytorch
Framework pytorch
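
The mutual correlation layer is easy to sketch in isolation: for every spatial location in one image's feature map, take dot products with every location in the other's. A minimal PyTorch version, assuming (batch, channels, h, w) feature maps; shapes are illustrative.

```python
# Mutual correlation sketch: correlate each A-location with all B-locations.
import torch

def mutual_correlation(fa, fb):
    # fa, fb: (batch, channels, h, w) feature maps of the two images
    b, c, h, w = fa.shape
    fa_flat = fa.flatten(2)                              # (b, c, h*w)
    fb_flat = fb.flatten(2)
    corr = torch.bmm(fa_flat.transpose(1, 2), fb_flat)   # (b, h*w, h*w)
    return corr.view(b, h * w, h, w)  # per A-location correlation map over B

fa, fb = torch.randn(2, 64, 16, 16), torch.randn(2, 64, 16, 16)
print(mutual_correlation(fa, fb).shape)  # torch.Size([2, 256, 16, 16])
```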

DRACO: Byzantine-resilient Distributed Training via Redundant Gradients

Title DRACO: Byzantine-resilient Distributed Training via Redundant Gradients
Authors Lingjiao Chen, Hongyi Wang, Zachary Charles, Dimitris Papailiopoulos
Abstract Distributed model training is vulnerable to Byzantine system failures and adversarial compute nodes, i.e., nodes that use malicious updates to corrupt the global model stored at a parameter server (PS). To guarantee some form of robustness, recent work suggests using variants of the geometric median as an aggregation rule, in place of gradient averaging. Unfortunately, median-based rules can incur a prohibitive computational overhead in large-scale settings, and their convergence guarantees often require strong assumptions. In this work, we present DRACO, a scalable framework for robust distributed training that uses ideas from coding theory. In DRACO, each compute node evaluates redundant gradients that are used by the parameter server to eliminate the effects of adversarial updates. DRACO comes with problem-independent robustness guarantees, and the model that it trains is identical to the one trained in the adversary-free setup. We provide extensive experiments on real datasets and distributed setups across a variety of large-scale models, where we show that DRACO is several times to orders of magnitude faster than median-based approaches.
Tasks
Published 2018-03-27
URL http://arxiv.org/abs/1803.09877v4
PDF http://arxiv.org/pdf/1803.09877v4.pdf
PWC https://paperswithcode.com/paper/draco-byzantine-resilient-distributed
Repo https://github.com/hwang595/Draco
Framework pytorch
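
The redundancy principle can be illustrated with the simplest possible code, repetition plus majority vote: if each gradient is computed by 2r+1 nodes, up to r adversaries can be out-voted and the aggregate equals the clean gradient exactly. DRACO's actual codes are more communication-efficient; this NumPy toy only shows the principle.

```python
# Repetition-code toy: 2r+1 replicas of each gradient, coordinate-wise median.
# With r=1 and one corrupted replica, the honest majority fixes the aggregate.
import numpy as np

def majority_gradient(replicas):
    # replicas: (2r+1, dim); the median equals the honest value whenever a
    # strict majority of replicas agree coordinate-wise
    return np.median(replicas, axis=0)

true_grad = np.ones(4)
replicas = np.stack([true_grad, true_grad, 100.0 * np.ones(4)])  # one Byzantine copy
print(majority_gradient(replicas))  # [1. 1. 1. 1.] -- the adversary is filtered out
```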

3D Steerable CNNs: Learning Rotationally Equivariant Features in Volumetric Data

Title 3D Steerable CNNs: Learning Rotationally Equivariant Features in Volumetric Data
Authors Maurice Weiler, Mario Geiger, Max Welling, Wouter Boomsma, Taco Cohen
Abstract We present a convolutional network that is equivariant to rigid body motions. The model uses scalar-, vector-, and tensor fields over 3D Euclidean space to represent data, and equivariant convolutions to map between such representations. These SE(3)-equivariant convolutions utilize kernels which are parameterized as a linear combination of a complete steerable kernel basis, which is derived analytically in this paper. We prove that equivariant convolutions are the most general equivariant linear maps between fields over R^3. Our experimental results confirm the effectiveness of 3D Steerable CNNs for the problem of amino acid propensity prediction and protein structure classification, both of which have inherent SE(3) symmetry.
Tasks
Published 2018-07-06
URL http://arxiv.org/abs/1807.02547v2
PDF http://arxiv.org/pdf/1807.02547v2.pdf
PWC https://paperswithcode.com/paper/3d-steerable-cnns-learning-rotationally
Repo https://github.com/mariogeiger/se3cnn
Framework pytorch
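
A quick numeric check of the kind of property the paper targets: a 3D convolution whose kernel is symmetric under 90-degree rotations commutes with those rotations of the volume. Full SE(3) equivariance requires the steerable kernel basis derived in the paper; this sketch only verifies the easy special case with a hand-built symmetric kernel.

```python
# Equivariance check for the easy special case: a 3D conv kernel that is
# symmetric under 90-degree rotations commutes with torch.rot90 on the volume.
import torch
import torch.nn.functional as F

kernel = torch.zeros(1, 1, 3, 3, 3)
kernel[0, 0, 1, 1, 1] = 1.0  # center tap
# six face-neighbor taps, all equal, so the kernel is rot90-symmetric:
kernel[0, 0, [0, 2], 1, 1] = kernel[0, 0, 1, [0, 2], 1] = kernel[0, 0, 1, 1, [0, 2]] = 0.5

x = torch.randn(1, 1, 8, 8, 8)
rot = lambda v: torch.rot90(v, 1, dims=(2, 3))   # 90-degree rotation of the volume
conv = lambda v: F.conv3d(v, kernel, padding=1)
print(torch.allclose(conv(rot(x)), rot(conv(x)), atol=1e-5))  # True
```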

QA4IE: A Question Answering based Framework for Information Extraction

Title QA4IE: A Question Answering based Framework for Information Extraction
Authors Lin Qiu, Hao Zhou, Yanru Qu, Weinan Zhang, Suoheng Li, Shu Rong, Dongyu Ru, Lihua Qian, Kewei Tu, Yong Yu
Abstract Information Extraction (IE) refers to automatically extracting structured relation tuples from unstructured texts. Common IE solutions, including Relation Extraction (RE) and open IE systems, can hardly handle cross-sentence tuples and are severely restricted by limited relation types as well as informal relation specifications (e.g., free-text based relation tuples). To overcome these weaknesses, we propose a novel IE framework named QA4IE, which leverages flexible question answering (QA) approaches to produce high-quality relation triples across sentences. Based on the framework, we develop a large IE benchmark with high-quality human evaluation. This benchmark contains 293K documents, 2M golden relation triples, and 636 relation types. We compare our system with several IE baselines on our benchmark, and the results show that our system achieves substantial improvements.
Tasks Question Answering, Relation Extraction
Published 2018-04-10
URL http://arxiv.org/abs/1804.03396v2
PDF http://arxiv.org/pdf/1804.03396v2.pdf
PWC https://paperswithcode.com/paper/qa4ie-a-question-answering-based-framework
Repo https://github.com/SJTU-lqiu/QA4IE
Framework tf
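
The reduction at the heart of QA4IE can be sketched in a few lines: turn each candidate (entity, relation) pair into a question, run an extractive QA reader over the document, and keep confident answers as triples. The `qa_model` below is a hypothetical stand-in, and the question template is far cruder than what the paper's system produces.

```python
# Hypothetical reduction: (entity, relation) -> question -> extractive QA ->
# triple. `toy_reader` is a fake reader that makes the sketch runnable; QA4IE
# uses a learned QA model and far better query generation.
def extract_triples(document, entities, relations, qa_model, threshold=0.5):
    triples = []
    for e in entities:
        for r in relations:
            question = f"What is the {r} of {e}?"   # naive template
            answer, score = qa_model(question, document)
            if answer is not None and score >= threshold:
                triples.append((e, r, answer))
    return triples

def toy_reader(question, document):
    if "capital" in question and "Paris" in document:
        return "Paris", 0.9
    return None, 0.0

doc = "France is a country in Europe. Its capital is Paris."
print(extract_triples(doc, ["France"], ["capital"], toy_reader))
# [('France', 'capital', 'Paris')]
```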

Narrative Modeling with Memory Chains and Semantic Supervision

Title Narrative Modeling with Memory Chains and Semantic Supervision
Authors Fei Liu, Trevor Cohn, Timothy Baldwin
Abstract Story comprehension requires a deep semantic understanding of the narrative, making it a challenging task. Inspired by previous studies on ROC Story Cloze Test, we propose a novel method, tracking various semantic aspects with external neural memory chains while encouraging each to focus on a particular semantic aspect. Evaluated on the task of story ending prediction, our model demonstrates superior performance to a collection of competitive baselines, setting a new state of the art.
Tasks
Published 2018-05-16
URL http://arxiv.org/abs/1805.06122v1
PDF http://arxiv.org/pdf/1805.06122v1.pdf
PWC https://paperswithcode.com/paper/narrative-modeling-with-memory-chains-and
Repo https://github.com/liufly/narrative-modeling
Framework tf
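
A minimal sketch of the memory-chain idea: several parallel memory cells, each keyed to a different semantic aspect, gated and updated as each story sentence arrives, in the style of recurrent entity networks. Dimensions, gating, and update rules here are illustrative, not the paper's exact model.

```python
# Memory chains sketch: one memory per semantic aspect, gated per sentence.
import torch
import torch.nn as nn

class MemoryChains(nn.Module):
    def __init__(self, n_chains=3, dim=32):
        super().__init__()
        self.keys = nn.Parameter(torch.randn(n_chains, dim))  # one key per aspect
        self.U = nn.Linear(dim, dim, bias=False)
        self.V = nn.Linear(dim, dim, bias=False)

    def forward(self, sentence_vecs):
        # sentence_vecs: (seq_len, dim) encoded story sentences
        mem = self.keys.clone()  # (n_chains, dim) initial memories
        for s in sentence_vecs:
            gate = torch.sigmoid(mem @ s + self.keys @ s).unsqueeze(1)  # (n_chains, 1)
            cand = torch.tanh(self.U(mem) + self.V(s).unsqueeze(0))
            mem = nn.functional.normalize(mem + gate * cand, dim=1)
        return mem  # final per-aspect story summaries

chains = MemoryChains()
print(chains(torch.randn(5, 32)).shape)  # torch.Size([3, 32])
```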

Solving Inverse Computational Imaging Problems using Deep Pixel-level Prior

Title Solving Inverse Computational Imaging Problems using Deep Pixel-level Prior
Authors Akshat Dave, Anil Kumar Vadathya, Ramana Subramanyam, Rahul Baburajan, Kaushik Mitra
Abstract Signal reconstruction is a challenging aspect of computational imaging as it often involves solving ill-posed inverse problems. Recently, deep feed-forward neural networks have led to state-of-the-art results in solving various inverse imaging problems. However, being task-specific, these networks have to be learned for each inverse problem. A more flexible approach is to learn a deep generative model once and then use it as a signal prior for solving various inverse problems. We show that among the various state-of-the-art deep generative models, autoregressive models are especially suitable for this purpose, for three reasons. First, they explicitly model pixel-level dependencies and are hence better able to reconstruct low-level details such as texture patterns and edges. Second, they provide an explicit expression for the image prior, which can then be used for MAP-based inference along with the forward model. Third, they can model long-range dependencies in images, which makes them ideal for handling the global multiplexing encountered in various compressive imaging systems. We demonstrate the efficacy of the proposed approach on three computational imaging problems: Single Pixel Camera (SPC), LiSens, and FlatCam. For both real and simulated cases, we obtain better reconstructions than state-of-the-art methods in terms of perceptual and quantitative metrics.
Tasks
Published 2018-02-27
URL http://arxiv.org/abs/1802.09850v2
PDF http://arxiv.org/pdf/1802.09850v2.pdf
PWC https://paperswithcode.com/paper/solving-inverse-computational-imaging
Repo https://github.com/adaveiitm/deep-pixel-level-prior
Framework tf
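
The second point above, an explicit prior enabling MAP inference, can be sketched directly: maximize the prior log-likelihood minus the data-fidelity term by gradient descent. In this sketch a smoothness penalty stands in for the autoregressive model's log p(x), and the measurement operator is a random matrix, so it shows the shape of the computation only.

```python
# MAP reconstruction sketch: gradient descent on data fidelity minus a prior
# log-likelihood. A smoothness penalty stands in for the PixelCNN log p(x);
# A is a random compressive operator (single-pixel-camera style).
import torch

def map_reconstruct(y, A, sigma=0.1, lam=0.05, steps=500, lr=0.01):
    x = torch.zeros(A.shape[1], requires_grad=True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        data_term = ((A @ x - y) ** 2).sum() / (2 * sigma ** 2)
        log_prior = -lam * ((x[1:] - x[:-1]) ** 2).sum()  # stand-in for log p(x)
        (data_term - log_prior).backward()
        opt.step()
    return x.detach()

torch.manual_seed(0)
A = torch.randn(20, 50)            # 20 measurements of a 50-sample signal
x_true = torch.linspace(0, 1, 50)
y = A @ x_true + 0.01 * torch.randn(20)
print(map_reconstruct(y, A)[:5])   # should be close to x_true[:5]
```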

Nonlinear 3D Face Morphable Model

Title Nonlinear 3D Face Morphable Model
Authors Luan Tran, Xiaoming Liu
Abstract As a classic statistical model of 3D facial shape and texture, 3D Morphable Model (3DMM) is widely used in facial analysis, e.g., model fitting, image synthesis. Conventional 3DMM is learned from a set of well-controlled 2D face images with associated 3D face scans, and represented by two sets of PCA basis functions. Due to the type and amount of training data, as well as the linear bases, the representation power of 3DMM can be limited. To address these problems, this paper proposes an innovative framework to learn a nonlinear 3DMM model from a large set of unconstrained face images, without collecting 3D face scans. Specifically, given a face image as input, a network encoder estimates the projection, shape and texture parameters. Two decoders serve as the nonlinear 3DMM to map from the shape and texture parameters to the 3D shape and texture, respectively. With the projection parameter, 3D shape, and texture, a novel analytically-differentiable rendering layer is designed to reconstruct the original input face. The entire network is end-to-end trainable with only weak supervision. We demonstrate the superior representation power of our nonlinear 3DMM over its linear counterpart, and its contribution to face alignment and 3D reconstruction.
Tasks 3D Reconstruction, Face Alignment, Image Generation
Published 2018-04-11
URL http://arxiv.org/abs/1804.03786v3
PDF http://arxiv.org/pdf/1804.03786v3.pdf
PWC https://paperswithcode.com/paper/nonlinear-3d-face-morphable-model
Repo https://github.com/tranluan/Nonlinear_Face_3DMM
Framework tf
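
The wiring described in the abstract, one encoder predicting projection, shape, and texture codes and two decoders mapping the codes to a 3D shape and a texture, looks roughly like the PyTorch sketch below. All dimensions are placeholders, texture is reduced to per-vertex color, and the differentiable rendering layer that closes the training loop is omitted.

```python
# Schematic nonlinear-3DMM wiring: encoder -> (projection, shape code,
# texture code); two MLP decoders as the nonlinear "bases".
import torch
import torch.nn as nn

class Nonlinear3DMM(nn.Module):
    def __init__(self, shape_dim=40, tex_dim=40, n_verts=500):
        super().__init__()
        self.shape_dim, self.n_verts = shape_dim, n_verts
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, 6 + shape_dim + tex_dim),  # 6 projection params + two codes
        )
        self.shape_dec = nn.Sequential(nn.Linear(shape_dim, 256), nn.ReLU(),
                                       nn.Linear(256, n_verts * 3))
        self.tex_dec = nn.Sequential(nn.Linear(tex_dim, 256), nn.ReLU(),
                                     nn.Linear(256, n_verts * 3))

    def forward(self, img):
        code = self.encoder(img)
        proj = code[:, :6]                                   # projection parameters
        f_s = code[:, 6:6 + self.shape_dim]                  # shape code
        f_t = code[:, 6 + self.shape_dim:]                   # texture code
        shape = self.shape_dec(f_s).view(-1, self.n_verts, 3)
        tex = self.tex_dec(f_t).view(-1, self.n_verts, 3)    # per-vertex color stand-in
        return proj, shape, tex

proj, shape, tex = Nonlinear3DMM()(torch.randn(2, 3, 64, 64))
print(proj.shape, shape.shape, tex.shape)
```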

G2D: from GTA to Data

Title G2D: from GTA to Data
Authors Anh-Dzung Doan, Abdul Mohsi Jawaid, Thanh-Toan Do, Tat-Jun Chin
Abstract This document describes G2D, a software tool that enables capturing videos from Grand Theft Auto V (GTA V), a popular role-playing game set in an expansive virtual city. The target users of our software are computer vision researchers who wish to collect hyper-realistic computer-generated imagery of a city from the street level, under controlled 6DOF camera poses and varying environmental conditions (weather, season, time of day, traffic density, etc.). G2D accesses and calls the native functions of the game, so users can interact with G2D directly while playing. Specifically, G2D enables users to manipulate conditions of the virtual environment on the fly, while the gameplay camera automatically retraces a predetermined 6DOF camera pose trajectory within the game coordinate system. Concurrently, automatic screen capture is executed while the virtual environment is being explored. G2D and its source code are publicly available at https://goo.gl/SS7fS6. In addition, we demonstrate an application of G2D by generating a large-scale dataset with ground-truth camera poses for testing structure-from-motion (SfM) algorithms. The dataset and generated 3D point clouds are also made available at https://goo.gl/DNzxHx.
Tasks 3D Reconstruction, Autonomous Driving, Visual Localization, Visual Place Recognition
Published 2018-06-16
URL http://arxiv.org/abs/1806.07381v1
PDF http://arxiv.org/pdf/1806.07381v1.pdf
PWC https://paperswithcode.com/paper/g2d-from-gta-to-data
Repo https://github.com/dadung/G2D
Framework none

PReMVOS: Proposal-generation, Refinement and Merging for Video Object Segmentation

Title PReMVOS: Proposal-generation, Refinement and Merging for Video Object Segmentation
Authors Jonathon Luiten, Paul Voigtlaender, Bastian Leibe
Abstract We address semi-supervised video object segmentation, the task of automatically generating accurate and consistent pixel masks for objects in a video sequence, given first-frame ground-truth annotations. Towards this goal, we present the PReMVOS algorithm (Proposal-generation, Refinement and Merging for Video Object Segmentation). Our method separates the problem into two steps: first, generating a set of accurate object segmentation mask proposals for each video frame; and second, selecting and merging these proposals into accurate and temporally consistent pixel-wise object tracks, in a way designed to tackle the difficult challenges involved in segmenting multiple objects across a video sequence. Our approach surpasses all previous state-of-the-art results on the DAVIS 2017 video object segmentation benchmark with a J & F mean score of 71.6 on the test-dev dataset, and achieves first place in both the DAVIS 2018 Video Object Segmentation Challenge and the YouTube-VOS 1st Large-scale Video Object Segmentation Challenge.
Tasks Semantic Segmentation, Semi-supervised Video Object Segmentation, Video Object Segmentation, Video Semantic Segmentation
Published 2018-07-24
URL http://arxiv.org/abs/1807.09190v2
PDF http://arxiv.org/pdf/1807.09190v2.pdf
PWC https://paperswithcode.com/paper/premvos-proposal-generation-refinement-and
Repo https://github.com/gunpowder78/PReMVOS
Framework tf
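
The second step, merging per-frame proposals into temporally consistent tracks, can be caricatured with greedy IoU linking. PReMVOS scores links with optical-flow warping, ReID embeddings, and other cues; plain mask IoU is used here only to keep the sketch short.

```python
# Toy proposal linking: greedily extend each track with the previous frame's
# best-overlapping proposal, measured by mask IoU.
import numpy as np

def iou(a, b):
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union if union else 0.0

def link_tracks(frames, min_iou=0.3):
    # frames: list of per-frame proposal lists (each proposal a boolean mask)
    tracks = [[p] for p in frames[0]]
    for props in frames[1:]:
        for track in tracks:
            scores = [iou(track[-1], p) for p in props]
            if scores and max(scores) >= min_iou:
                track.append(props[int(np.argmax(scores))])
    return tracks

m = np.zeros((8, 8), dtype=bool); m[2:5, 2:5] = True
m2 = np.zeros((8, 8), dtype=bool); m2[2:5, 3:6] = True
print(len(link_tracks([[m], [m2]])[0]))  # 2 -- the proposal was linked across frames
```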

Revisiting Reweighted Wake-Sleep for Models with Stochastic Control Flow

Title Revisiting Reweighted Wake-Sleep for Models with Stochastic Control Flow
Authors Tuan Anh Le, Adam R. Kosiorek, N. Siddharth, Yee Whye Teh, Frank Wood
Abstract Stochastic control-flow models (SCFMs) are a class of generative models that involve branching on choices from discrete random variables. Amortized gradient-based learning of SCFMs is challenging, as most approaches targeting discrete variables rely on their continuous relaxations, which can be intractable in SCFMs because branching on relaxations requires evaluating all (exponentially many) branching paths. Tractable alternatives mainly combine REINFORCE with complex control-variate schemes to reduce the variance of naive estimators. Here, we revisit the reweighted wake-sleep (RWS) algorithm (Bornschein and Bengio, 2015) and, through extensive evaluations, show that it outperforms current state-of-the-art methods in learning SCFMs. Further, in contrast to the importance weighted autoencoder, we observe that RWS learns better models and inference networks with increasing numbers of particles. Our results suggest that RWS is a competitive, often preferable, alternative for learning SCFMs.
Tasks
Published 2018-05-26
URL https://arxiv.org/abs/1805.10469v2
PDF https://arxiv.org/pdf/1805.10469v2.pdf
PWC https://paperswithcode.com/paper/revisiting-reweighted-wake-sleep
Repo https://github.com/tuananhle7/rwspp
Framework pytorch
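
The wake-phase update of the inference network is the heart of RWS and fits in a few lines: draw K particles from q, form self-normalized importance weights, and maximize the weighted log-density of q. The sketch below uses 1-D Gaussians as stand-ins for the model and guide and omits the analogous model-parameter update.

```python
# Wake-phase q update of reweighted wake-sleep with 1-D Gaussians as a
# stand-in model/guide; the model-parameter update is analogous and omitted.
import torch

def rws_wake_q(model_logp, guide, K=8):
    z = guide.sample((K,))                        # K particles from q (no reparameterization)
    log_w = model_logp(z) - guide.log_prob(z)     # unnormalized importance weights
    w = torch.softmax(log_w, dim=0).detach()      # self-normalized, treated as constants
    return -(w * guide.log_prob(z)).sum()         # minimizing this maximizes weighted log q

loc = torch.tensor(0.0, requires_grad=True)
guide = torch.distributions.Normal(loc, 1.0)
target = torch.distributions.Normal(2.0, 1.0)     # stands in for the model joint
rws_wake_q(target.log_prob, guide).backward()
print(loc.grad)  # negative: a descent step moves q's mean toward the target
```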

Informative Features for Model Comparison

Title Informative Features for Model Comparison
Authors Wittawat Jitkrittum, Heishiro Kanagawa, Patsorn Sangkloy, James Hays, Bernhard Schölkopf, Arthur Gretton
Abstract Given two candidate models, and a set of target observations, we address the problem of measuring the relative goodness of fit of the two models. We propose two new statistical tests which are nonparametric, computationally efficient (runtime complexity is linear in the sample size), and interpretable. As a unique advantage, our tests can produce a set of examples (informative features) indicating the regions in the data domain where one model fits significantly better than the other. In a real-world problem of comparing GAN models, the test power of our new test matches that of the state-of-the-art test of relative goodness of fit, while being one order of magnitude faster.
Tasks
Published 2018-10-27
URL http://arxiv.org/abs/1810.11630v1
PDF http://arxiv.org/pdf/1810.11630v1.pdf
PWC https://paperswithcode.com/paper/informative-features-for-model-comparison
Repo https://github.com/wittawatj/kernel-mod
Framework pytorch
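
A relative goodness-of-fit comparison can be illustrated with plain MMD: compare each candidate model's squared MMD to the reference sample. The paper's UME-based tests are linear-time and additionally return informative witness locations; this quadratic-time NumPy toy conveys only the comparison itself.

```python
# Toy relative fit test: smaller squared MMD to the data = better-fitting model.
import numpy as np

def mmd2(x, y, bw=1.0):
    def k(a, b):  # Gaussian kernel Gram matrix
        d = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-d / (2 * bw ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

rng = np.random.default_rng(0)
data = rng.normal(0.0, 1.0, (200, 1))
model_p = rng.normal(0.1, 1.0, (200, 1))   # close to the data distribution
model_q = rng.normal(1.5, 1.0, (200, 1))   # far from it
print(mmd2(model_p, data) < mmd2(model_q, data))  # True: P fits better
```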

Reconciling modern machine learning practice and the bias-variance trade-off

Title Reconciling modern machine learning practice and the bias-variance trade-off
Authors Mikhail Belkin, Daniel Hsu, Siyuan Ma, Soumik Mandal
Abstract Breakthroughs in machine learning are rapidly changing science and society, yet our fundamental understanding of this technology has lagged far behind. Indeed, one of the central tenets of the field, the bias-variance trade-off, appears to be at odds with the observed behavior of methods used in modern machine-learning practice. The bias-variance trade-off implies that a model should balance under-fitting and over-fitting: rich enough to express underlying structure in the data, simple enough to avoid fitting spurious patterns. In modern practice, however, very rich models such as neural networks are trained to exactly fit (i.e., interpolate) the data. Classically, such models would be considered over-fit, and yet they often obtain high accuracy on test data. This apparent contradiction has raised questions about the mathematical foundations of machine learning and their relevance to practitioners. In this paper, we reconcile the classical understanding and the modern practice within a unified performance curve. This “double descent” curve subsumes the textbook U-shaped bias-variance trade-off curve by showing how increasing model capacity beyond the point of interpolation results in improved performance. We provide evidence for the existence and ubiquity of double descent for a wide spectrum of models and datasets, and we posit a mechanism for its emergence. This connection between the performance and the structure of machine learning models delineates the limits of classical analyses and has implications for both the theory and practice of machine learning.
Tasks
Published 2018-12-28
URL https://arxiv.org/abs/1812.11118v2
PDF https://arxiv.org/pdf/1812.11118v2.pdf
PWC https://paperswithcode.com/paper/reconciling-modern-machine-learning-and-the
Repo https://github.com/devoworm/DW-ML
Framework tf
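
The double-descent shape is easy to probe with a random-features regressor: test error typically spikes as the feature count approaches the number of training points and falls again beyond it, because the minimum-norm interpolant behaves well in the over-parameterized regime. A self-contained NumPy sketch with illustrative constants:

```python
# Random-features regression sketch: with min-norm least squares (pinv), test
# error typically spikes near the interpolation threshold (features ~= train
# points) and falls again in the over-parameterized regime.
import numpy as np

rng = np.random.default_rng(0)
n_train = 30
x = rng.uniform(-1, 1, (n_train + 200, 1))
y = np.sin(3 * x).ravel() + 0.1 * rng.normal(size=len(x))

def random_features(x, n_feat, seed=1):
    r = np.random.default_rng(seed)
    w = r.normal(0, 3, (x.shape[1], n_feat))
    b = r.uniform(0, 2 * np.pi, n_feat)
    return np.cos(x @ w + b)

for n_feat in (5, 30, 300):  # under-, critically-, over-parameterized
    f = random_features(x, n_feat)
    beta = np.linalg.pinv(f[:n_train]) @ y[:n_train]  # minimum-norm solution
    mse = ((f[n_train:] @ beta - y[n_train:]) ** 2).mean()
    print(f"{n_feat:4d} features: test MSE {mse:.3f}")
```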

A robust self-learning method for fully unsupervised cross-lingual mappings of word embeddings

Title A robust self-learning method for fully unsupervised cross-lingual mappings of word embeddings
Authors Mikel Artetxe, Gorka Labaka, Eneko Agirre
Abstract Recent work has managed to learn cross-lingual word embeddings without parallel data by mapping monolingual embeddings to a shared space through adversarial training. However, their evaluation has focused on favorable conditions, using comparable corpora or closely-related languages, and we show that they often fail in more realistic scenarios. This work proposes an alternative approach based on a fully unsupervised initialization that explicitly exploits the structural similarity of the embeddings, and a robust self-learning algorithm that iteratively improves this solution. Our method succeeds in all tested scenarios and obtains the best published results in standard datasets, even surpassing previous supervised systems. Our implementation is released as an open source project at https://github.com/artetxem/vecmap
Tasks Word Embeddings
Published 2018-05-16
URL http://arxiv.org/abs/1805.06297v2
PDF http://arxiv.org/pdf/1805.06297v2.pdf
PWC https://paperswithcode.com/paper/a-robust-self-learning-method-for-fully
Repo https://github.com/artetxem/vecmap
Framework none
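
The self-learning loop itself is compact: alternate between solving the orthogonal Procrustes problem on the current dictionary and re-inducing the dictionary by nearest neighbors. The sketch below seeds with a small dictionary and uses plain nearest-neighbor retrieval, omitting the paper's fully unsupervised initialization, CSLS retrieval, and stochastic dictionary induction.

```python
# Self-learning sketch: Procrustes mapping <-> nearest-neighbor dictionary.
import numpy as np

def self_learning(x, z, seed_pairs, iters=5):
    # x, z: (n, d) row-normalized source / target embedding matrices
    src, tgt = (list(t) for t in zip(*seed_pairs))
    for _ in range(iters):
        u, _, vt = np.linalg.svd(x[src].T @ z[tgt])      # orthogonal Procrustes
        w = u @ vt
        src = list(range(len(x)))
        tgt = ((x @ w) @ z.T).argmax(axis=1).tolist()    # re-induce the dictionary
    return w, list(zip(src, tgt))

rng = np.random.default_rng(0)
z = rng.normal(size=(50, 8))
z /= np.linalg.norm(z, axis=1, keepdims=True)
q, _ = np.linalg.qr(rng.normal(size=(8, 8)))
x = z @ q.T                                  # source space = rotated target space
w, induced = self_learning(x, z, [(i, i) for i in range(10)])
print(sum(s == t for s, t in induced), "/ 50 pairs correctly induced")  # expect 50 / 50
```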