Paper Group AWR 104
The Kinetics Human Action Video Dataset. Adversarial Variational Optimization of Non-Differentiable Simulators. EmoAtt at EmoInt-2017: Inner attention sentence embedding for Emotion Intensity. On better training the infinite restricted Boltzmann machines. Rainbow: Combining Improvements in Deep Reinforcement Learning. Learning Spatio-Temporal Representation with Pseudo-3D Residual Networks. Prediction Under Uncertainty with Error-Encoding Networks. SRL4ORL: Improving Opinion Role Labeling using Multi-task Learning with Semantic Role Labeling. EMNIST: an extension of MNIST to handwritten letters. Unsupervised Generative Modeling Using Matrix Product States. SUPRA: Open Source Software Defined Ultrasound Processing for Real-Time Applications. Detecting Cancer Metastases on Gigapixel Pathology Images. Learning from Synthetic Humans. Projection Based Weight Normalization for Deep Neural Networks. Past, Present, Future: A Computational Investigation of the Typology of Tense in 1000 Languages.
The Kinetics Human Action Video Dataset
Title | The Kinetics Human Action Video Dataset |
Authors | Will Kay, Joao Carreira, Karen Simonyan, Brian Zhang, Chloe Hillier, Sudheendra Vijayanarasimhan, Fabio Viola, Tim Green, Trevor Back, Paul Natsev, Mustafa Suleyman, Andrew Zisserman |
Abstract | We describe the DeepMind Kinetics human action video dataset. The dataset contains 400 human action classes, with at least 400 video clips for each action. Each clip lasts around 10s and is taken from a different YouTube video. The actions are human-focused and cover a broad range of classes, including human-object interactions such as playing instruments, as well as human-human interactions such as shaking hands. We describe the statistics of the dataset, how it was collected, and give some baseline performance figures for neural network architectures trained and tested for human action classification on this dataset. We also carry out a preliminary analysis of whether imbalance in the dataset leads to bias in the classifiers. |
Tasks | Action Classification, Human-Object Interaction Detection |
Published | 2017-05-19 |
URL | http://arxiv.org/abs/1705.06950v1 |
PDF | http://arxiv.org/pdf/1705.06950v1.pdf |
PWC | https://paperswithcode.com/paper/the-kinetics-human-action-video-dataset |
Repo | https://github.com/OanaIgnat/i3d_keras |
Framework | tf |
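The abstract fully specifies the dataset's shape: 400 classes, at least 400 clips per class, each clip roughly 10 s and cut from a distinct YouTube video. As a minimal, hedged sketch of what a Kinetics-style annotation record and its invariants might look like (the field names are hypothetical, not the official schema):

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class KineticsClip:
    """One annotation: a ~10 s clip cut from a unique YouTube video."""
    label: str          # one of the 400 action classes
    youtube_id: str     # source video
    start_s: float      # clip start time within the source video
    end_s: float        # clip end time, roughly start_s + 10

def check_invariants(clips):
    """Check the dataset properties stated in the abstract."""
    per_class = Counter(c.label for c in clips)
    assert len(per_class) == 400, "expected 400 action classes"
    assert all(n >= 400 for n in per_class.values()), "expected >= 400 clips per class"
    # each clip is taken from a different YouTube video
    assert len({c.youtube_id for c in clips}) == len(clips)
```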
Adversarial Variational Optimization of Non-Differentiable Simulators
Title | Adversarial Variational Optimization of Non-Differentiable Simulators |
Authors | Gilles Louppe, Joeri Hermans, Kyle Cranmer |
Abstract | Complex computer simulators are increasingly used across fields of science as generative models tying parameters of an underlying theory to experimental observations. Inference in this setup is often difficult, as simulators rarely admit a tractable density or likelihood function. We introduce Adversarial Variational Optimization (AVO), a likelihood-free inference algorithm for fitting a non-differentiable generative model incorporating ideas from generative adversarial networks, variational optimization and empirical Bayes. We adapt the training procedure of generative adversarial networks by replacing the differentiable generative network with a domain-specific simulator. We solve the resulting non-differentiable minimax problem by minimizing variational upper bounds of the two adversarial objectives. Effectively, the procedure results in learning a proposal distribution over simulator parameters, such that the JS divergence between the marginal distribution of the synthetic data and the empirical distribution of observed data is minimized. We evaluate and compare the method with simulators producing both discrete and continuous data. |
Tasks | |
Published | 2017-07-22 |
URL | http://arxiv.org/abs/1707.07113v4 |
PDF | http://arxiv.org/pdf/1707.07113v4.pdf |
PWC | https://paperswithcode.com/paper/adversarial-variational-optimization-of-non |
Repo | https://github.com/neychev/adversarial_variational_optimization |
Framework | pytorch |
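The core of AVO is replacing the differentiable GAN generator with a black-box simulator and learning a proposal distribution over its parameters; since the simulator admits no gradients, the proposal can be updated with a score-function (REINFORCE-style) estimator. The toy sketch below illustrates that idea under simplifying assumptions: a Gaussian proposal, a toy simulator, and a plain score-function gradient in place of the paper's variational upper bounds.

```python
import torch

def simulator(theta, n=256):
    """Black-box, non-differentiable simulator: here a toy Gaussian."""
    with torch.no_grad():
        return theta[0] + theta[1].abs() * torch.randn(n, 1)

disc = torch.nn.Sequential(torch.nn.Linear(1, 32), torch.nn.ReLU(), torch.nn.Linear(32, 1))
mu = torch.zeros(2, requires_grad=True)         # proposal mean over simulator params
log_sigma = torch.zeros(2, requires_grad=True)  # proposal log-std
opt_d = torch.optim.Adam(disc.parameters(), lr=1e-3)
opt_p = torch.optim.Adam([mu, log_sigma], lr=1e-2)
x_real = 1.5 + 0.5 * torch.randn(4096, 1)       # "observed" data

for step in range(2000):
    theta = mu + log_sigma.exp() * torch.randn(2)   # sample simulator parameters
    x_fake = simulator(theta.detach())
    # discriminator step: standard GAN loss
    d_loss = -(torch.sigmoid(disc(x_real)).log().mean()
               + (1 - torch.sigmoid(disc(x_fake))).log().mean())
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    # proposal step: score-function gradient, as the simulator is not differentiable
    reward = torch.sigmoid(disc(x_fake)).log().mean().detach()
    log_q = (-((theta.detach() - mu) ** 2) / (2 * (2 * log_sigma).exp()) - log_sigma).sum()
    p_loss = -reward * log_q
    opt_p.zero_grad(); p_loss.backward(); opt_p.step()
```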
EmoAtt at EmoInt-2017: Inner attention sentence embedding for Emotion Intensity
Title | EmoAtt at EmoInt-2017: Inner attention sentence embedding for Emotion Intensity |
Authors | Edison Marrese-Taylor, Yutaka Matsuo |
Abstract | In this paper we describe a deep learning system designed and built for the WASSA 2017 Emotion Intensity Shared Task. We introduce a representation learning approach based on inner attention on top of an RNN. Results show that our model performs well and is able to successfully identify emotion-bearing words to predict intensity without relying on lexicons, placing 13th among 22 shared-task competitors. |
Tasks | Representation Learning, Sentence Embedding |
Published | 2017-08-18 |
URL | http://arxiv.org/abs/1708.05521v1 |
PDF | http://arxiv.org/pdf/1708.05521v1.pdf |
PWC | https://paperswithcode.com/paper/emoatt-at-emoint-2017-inner-attention |
Repo | https://github.com/epochx/emoatt |
Framework | tf |
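The model's core, an inner-attention layer that scores each RNN hidden state and pools the states into a sentence embedding for intensity regression, could be sketched in PyTorch roughly as below. The layer sizes, the BiLSTM choice, and the sigmoid output head are illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn

class InnerAttentionRegressor(nn.Module):
    """BiLSTM + inner attention -> sentence embedding -> emotion intensity."""
    def __init__(self, vocab_size, emb_dim=100, hidden=64):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.att = nn.Linear(2 * hidden, 1)   # scores each time step
        self.out = nn.Linear(2 * hidden, 1)   # intensity in [0, 1]

    def forward(self, tokens):                # tokens: (batch, seq_len)
        h, _ = self.rnn(self.emb(tokens))     # (batch, seq_len, 2*hidden)
        alpha = torch.softmax(self.att(torch.tanh(h)), dim=1)  # attention weights
        sent = (alpha * h).sum(dim=1)         # weighted sum = sentence embedding
        return torch.sigmoid(self.out(sent)).squeeze(-1)
```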
On better training the infinite restricted Boltzmann machines
Title | On better training the infinite restricted Boltzmann machines |
Authors | Xuan Peng, Xunzhang Gao, Xiang Li |
Abstract | The infinite restricted Boltzmann machine (iRBM) is an extension of the classic RBM. It enjoys the useful property of automatically deciding the size of the hidden layer according to the specific training data. With sufficient training, the iRBM can achieve performance competitive with that of the classic RBM. However, learning the iRBM converges slowly, because the iRBM is sensitive to the ordering of its hidden units: the learned filters change slowly from the left-most hidden unit to the right. To break this dependency between neighboring hidden units and speed up the convergence of training, a novel training strategy is proposed. The key idea is to randomly regroup the hidden units before each gradient descent step. Potentially, this learning method achieves a mixture of infinitely many iRBMs with different permutations of the hidden units, which, much like dropout, helps prevent the model from over-fitting. The original iRBM is also modified to support discriminative training. To evaluate the impact of our method on the convergence speed of learning and the model's generalization ability, several experiments were performed on the binarized MNIST and CalTech101 Silhouettes datasets. Experimental results indicate that the proposed training strategy greatly accelerates learning and enhances the generalization ability of iRBMs. |
Tasks | |
Published | 2017-09-11 |
URL | http://arxiv.org/abs/1709.03239v2 |
PDF | http://arxiv.org/pdf/1709.03239v2.pdf |
PWC | https://paperswithcode.com/paper/on-better-training-the-infinite-restricted |
Repo | https://github.com/Boltzxuann/RP-iRBM |
Framework | none |
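The proposed strategy reduces to a small operation: before each gradient step, randomly permute the hidden units, applying the same permutation to the weight columns and the hidden biases so each unit keeps its filter but loses its fixed left-to-right position. A minimal NumPy sketch, with hypothetical parameter names:

```python
import numpy as np

rng = np.random.default_rng(0)

def regroup_hidden_units(W, b_h):
    """Randomly permute hidden units before a gradient step.

    W   : (n_visible, n_hidden) weight matrix
    b_h : (n_hidden,) hidden biases
    The same permutation is applied to both, so each unit keeps its
    filter and bias but loses its fixed left-to-right position.
    """
    perm = rng.permutation(W.shape[1])
    return W[:, perm], b_h[perm]

# inside the training loop, before each CD/gradient update:
# W, b_h = regroup_hidden_units(W, b_h)
```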
Rainbow: Combining Improvements in Deep Reinforcement Learning
Title | Rainbow: Combining Improvements in Deep Reinforcement Learning |
Authors | Matteo Hessel, Joseph Modayil, Hado van Hasselt, Tom Schaul, Georg Ostrovski, Will Dabney, Dan Horgan, Bilal Piot, Mohammad Azar, David Silver |
Abstract | The deep reinforcement learning community has made several independent improvements to the DQN algorithm. However, it is unclear which of these extensions are complementary and can be fruitfully combined. This paper examines six extensions to the DQN algorithm and empirically studies their combination. Our experiments show that the combination provides state-of-the-art performance on the Atari 2600 benchmark, both in terms of data efficiency and final performance. We also provide results from a detailed ablation study that shows the contribution of each component to overall performance. |
Tasks | |
Published | 2017-10-06 |
URL | http://arxiv.org/abs/1710.02298v1 |
PDF | http://arxiv.org/pdf/1710.02298v1.pdf |
PWC | https://paperswithcode.com/paper/rainbow-combining-improvements-in-deep |
Repo | https://github.com/mingyip/Rainbow-Reinforcement-Learning-For-Chess-Games |
Framework | none |
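Of the six DQN extensions Rainbow combines, two compose especially compactly: double Q-learning and multi-step returns. The sketch below computes an n-step double-DQN target; the distributional head, prioritized replay, dueling architecture, and noisy nets are omitted, so this illustrates two ingredients rather than the full agent.

```python
import torch

def n_step_double_dqn_target(online_net, target_net, rewards, next_obs, done, gamma=0.99):
    """n-step double-DQN target: action chosen by the online net,
    evaluated by the target net.

    rewards : (batch, n) rewards for the n steps after each transition
    next_obs: (batch, obs_dim) observation n steps ahead
    done    : (batch,) 1.0 if the episode ended within the n steps
    """
    n = rewards.shape[1]
    discounts = gamma ** torch.arange(n, dtype=rewards.dtype)
    n_step_return = (rewards * discounts).sum(dim=1)
    with torch.no_grad():
        best_action = online_net(next_obs).argmax(dim=1)   # select with online net
        bootstrap = target_net(next_obs).gather(1, best_action.unsqueeze(1)).squeeze(1)
    return n_step_return + (gamma ** n) * (1.0 - done) * bootstrap
```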
Learning Spatio-Temporal Representation with Pseudo-3D Residual Networks
Title | Learning Spatio-Temporal Representation with Pseudo-3D Residual Networks |
Authors | Zhaofan Qiu, Ting Yao, Tao Mei |
Abstract | Convolutional Neural Networks (CNN) have been regarded as a powerful class of models for image recognition problems. Nevertheless, it is not trivial to utilize a CNN for learning spatio-temporal video representation. A few studies have shown that performing 3D convolutions is a rewarding approach to capture both spatial and temporal dimensions in videos. However, developing a very deep 3D CNN from scratch results in expensive computational cost and memory demand. A natural question is why not recycle off-the-shelf 2D networks for a 3D CNN. In this paper, we devise multiple variants of bottleneck building blocks in a residual learning framework by simulating $3\times3\times3$ convolutions with $1\times3\times3$ convolutional filters on the spatial domain (equivalent to a 2D CNN) plus $3\times1\times1$ convolutions to construct temporal connections on adjacent feature maps in time. Furthermore, we propose a new architecture, named Pseudo-3D Residual Net (P3D ResNet), that exploits all the variants of blocks, composing each at a different placement in the ResNet, following the philosophy that enhancing structural diversity while going deep can improve the power of neural networks. Our P3D ResNet achieves clear improvements on the Sports-1M video classification dataset against 3D CNN and frame-based 2D CNN baselines, by 5.3% and 1.8%, respectively. We further examine the generalization performance of the video representation produced by our pre-trained P3D ResNet on five different benchmarks and three different tasks, demonstrating superior performance over several state-of-the-art techniques. |
Tasks | Action Recognition In Videos, Video Classification |
Published | 2017-11-28 |
URL | http://arxiv.org/abs/1711.10305v1 |
PDF | http://arxiv.org/pdf/1711.10305v1.pdf |
PWC | https://paperswithcode.com/paper/learning-spatio-temporal-representation-with |
Repo | https://github.com/ZhaofanQiu/pseudo-3d-residual-networks |
Framework | pytorch |
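The central building block, which simulates a 3x3x3 convolution with a 1x3x3 spatial filter followed by a 3x1x1 temporal filter inside a residual bottleneck, maps directly onto `nn.Conv3d`. Below is a sketch of the serial (P3D-A-style) variant; the channel counts and plain ReLU layout are illustrative, not the paper's exact block.

```python
import torch
import torch.nn as nn

class P3DBlockA(nn.Module):
    """Residual bottleneck with a 1x3x3 spatial conv followed by a
    3x1x1 temporal conv (the serial, P3D-A-style decomposition)."""
    def __init__(self, channels, mid=None):
        super().__init__()
        mid = mid or channels // 4
        self.reduce = nn.Conv3d(channels, mid, kernel_size=1)
        self.spatial = nn.Conv3d(mid, mid, kernel_size=(1, 3, 3), padding=(0, 1, 1))
        self.temporal = nn.Conv3d(mid, mid, kernel_size=(3, 1, 1), padding=(1, 0, 0))
        self.expand = nn.Conv3d(mid, channels, kernel_size=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):                 # x: (batch, C, T, H, W)
        y = self.relu(self.reduce(x))
        y = self.relu(self.spatial(y))    # 2D-style conv over H, W
        y = self.relu(self.temporal(y))   # 1D conv over adjacent frames
        return self.relu(x + self.expand(y))
```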
Prediction Under Uncertainty with Error-Encoding Networks
Title | Prediction Under Uncertainty with Error-Encoding Networks |
Authors | Mikael Henaff, Junbo Zhao, Yann LeCun |
Abstract | In this work we introduce a new framework for performing temporal predictions in the presence of uncertainty. It is based on a simple idea of disentangling components of the future state which are predictable from those which are inherently unpredictable, and encoding the unpredictable components into a low-dimensional latent variable which is fed into a forward model. Our method uses a supervised training objective which is fast and easy to train. We evaluate it in the context of video prediction on multiple datasets and show that it is able to consistently generate diverse predictions without the need for alternating minimization over a latent space or adversarial training. |
Tasks | Video Prediction |
Published | 2017-11-14 |
URL | http://arxiv.org/abs/1711.04994v3 |
PDF | http://arxiv.org/pdf/1711.04994v3.pdf |
PWC | https://paperswithcode.com/paper/prediction-under-uncertainty-with-error |
Repo | https://github.com/21lva/EEN_acrobot |
Framework | tf |
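The key step is to run a deterministic forward model first, encode its residual error into a low-dimensional latent z, and feed z back into a second, conditional prediction. A minimal PyTorch sketch of that wiring follows; the module internals are placeholders, not the authors' architecture.

```python
import torch
import torch.nn as nn

class ErrorEncodingNet(nn.Module):
    def __init__(self, dim=128, z_dim=8):
        super().__init__()
        self.f0 = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.phi = nn.Linear(dim, z_dim)   # prediction error -> latent z
        self.f = nn.Sequential(nn.Linear(dim + z_dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, x, y):
        """x: current state, y: true future state (both (batch, dim))."""
        y_det = self.f0(x)                          # deterministic guess
        z = self.phi(y - y_det)                     # encode the unpredictable residual
        y_cond = self.f(torch.cat([x, z], dim=1))   # prediction conditioned on z
        return y_det, y_cond

# training is plainly supervised, e.g. loss = mse(y_det, y) + mse(y_cond, y);
# at test time, different z values yield diverse predicted futures.
```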
SRL4ORL: Improving Opinion Role Labeling using Multi-task Learning with Semantic Role Labeling
Title | SRL4ORL: Improving Opinion Role Labeling using Multi-task Learning with Semantic Role Labeling |
Authors | Ana Marasović, Anette Frank |
Abstract | For over a decade, machine learning has been used to extract opinion-holder-target structures from text to answer the question “Who expressed what kind of sentiment towards what?”. Recent neural approaches do not outperform the state-of-the-art feature-based models for Opinion Role Labeling (ORL). We suspect this is due to the scarcity of labeled training data and address this issue using different multi-task learning (MTL) techniques with a related task which has substantially more data, i.e. Semantic Role Labeling (SRL). We show that two MTL models improve significantly over the single-task model for labeling of both holders and targets, on the development and the test sets. We find that the vanilla MTL model, which makes predictions using only shared ORL and SRL features, performs best. Through deeper analysis, we determine what works and what might be done to further improve ORL. |
Tasks | Fine-Grained Opinion Analysis, Multi-Task Learning |
Published | 2017-11-02 |
URL | http://arxiv.org/abs/1711.00768v3 |
PDF | http://arxiv.org/pdf/1711.00768v3.pdf |
PWC | https://paperswithcode.com/paper/srl4orl-improving-opinion-role-labeling-using |
Repo | https://github.com/amarasovic/naacl-mpqa-srl4orl |
Framework | tf |
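The "vanilla" MTL setup the paper finds best, a fully shared encoder with task-specific classifiers for ORL and SRL, might be wired as below. The tag-set sizes and the batch-alternation scheme are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SharedMTLTagger(nn.Module):
    """One shared BiLSTM encoder, two task-specific tagging heads."""
    def __init__(self, vocab, emb=100, hidden=128, n_orl_tags=7, n_srl_tags=60):
        super().__init__()
        self.emb = nn.Embedding(vocab, emb)
        self.encoder = nn.LSTM(emb, hidden, batch_first=True, bidirectional=True)
        self.orl_head = nn.Linear(2 * hidden, n_orl_tags)
        self.srl_head = nn.Linear(2 * hidden, n_srl_tags)

    def forward(self, tokens, task):
        h, _ = self.encoder(self.emb(tokens))   # shared features for both tasks
        return self.orl_head(h) if task == "orl" else self.srl_head(h)

# training alternates batches from the small ORL corpus and the much larger
# SRL corpus, so the shared encoder benefits from the extra SRL supervision.
```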
EMNIST: an extension of MNIST to handwritten letters
Title | EMNIST: an extension of MNIST to handwritten letters |
Authors | Gregory Cohen, Saeed Afshar, Jonathan Tapson, André van Schaik |
Abstract | The MNIST dataset has become a standard benchmark for learning, classification and computer vision systems. Contributing to its widespread adoption are the understandable and intuitive nature of the task, its relatively small size and storage requirements, and the accessibility and ease-of-use of the database itself. The MNIST database was derived from a larger dataset known as NIST Special Database 19, which contains digits and uppercase and lowercase handwritten letters. This paper introduces a variant of the full NIST dataset, which we call Extended MNIST (EMNIST), created following the same conversion paradigm used to build the MNIST dataset. The result is a set of datasets that constitute more challenging classification tasks involving letters and digits, and that share the same image structure and parameters as the original MNIST task, allowing for direct compatibility with all existing classifiers and systems. Benchmark results are presented along with a validation of the conversion process through comparison of the classification results on converted NIST digits and the MNIST digits. |
Tasks | |
Published | 2017-02-17 |
URL | http://arxiv.org/abs/1702.05373v2 |
PDF | http://arxiv.org/pdf/1702.05373v2.pdf |
PWC | https://paperswithcode.com/paper/emnist-an-extension-of-mnist-to-handwritten |
Repo | https://github.com/mohitiitb/NeuralNetworkVerification_GlobalRobustness |
Framework | tf |
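EMNIST ships in several splits (e.g. 'letters', 'digits', 'balanced'), each keeping the 28x28 MNIST image format. If you use torchvision, loading it looks roughly like this; the split and transform choices are illustrative.

```python
import torch
from torchvision import datasets, transforms

# torchvision's EMNIST loader; split="letters" selects the 26-class letter task.
train = datasets.EMNIST(root="data", split="letters", train=True, download=True,
                        transform=transforms.ToTensor())
loader = torch.utils.data.DataLoader(train, batch_size=128, shuffle=True)

images, labels = next(iter(loader))
print(images.shape)   # (128, 1, 28, 28): same 28x28 format as MNIST
```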
Unsupervised Generative Modeling Using Matrix Product States
Title | Unsupervised Generative Modeling Using Matrix Product States |
Authors | Zhao-Yu Han, Jun Wang, Heng Fan, Lei Wang, Pan Zhang |
Abstract | Generative modeling, which learns a joint probability distribution from data and generates samples according to it, is an important task in machine learning and artificial intelligence. Inspired by the probabilistic interpretation of quantum physics, we propose a generative model using matrix product states, a tensor network originally proposed for describing (particularly one-dimensional) entangled quantum states. Our model enjoys efficient learning analogous to the density matrix renormalization group method, which allows dynamically adjusting the dimensions of the tensors and offers an efficient direct sampling approach for generative tasks. We apply our method to generative modeling of several standard datasets, including Bars and Stripes, random binary patterns and the MNIST handwritten digits, to illustrate the abilities, features and drawbacks of our model relative to popular generative models such as the Hopfield model, Boltzmann machines and generative adversarial networks. Our work sheds light on many interesting directions of future exploration in the development of quantum-inspired algorithms for unsupervised machine learning, which may plausibly be realized on quantum devices. |
Tasks | |
Published | 2017-09-06 |
URL | http://arxiv.org/abs/1709.01662v3 |
PDF | http://arxiv.org/pdf/1709.01662v3.pdf |
PWC | https://paperswithcode.com/paper/unsupervised-generative-modeling-using-matrix |
Repo | https://github.com/congzlwag/UnsupGenModbyMPS |
Framework | none |
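The model assigns p(x) proportional to |psi(x)|^2, where psi(x) is the contraction of a matrix product state along the pixel values of x. The contraction itself is a few lines of NumPy; random tensors stand in for learned ones, and the (left bond, physical, right bond) index convention is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
n_sites, phys_dim, bond = 6, 2, 4   # 6 binary pixels, bond dimension 4

# MPS tensors A[k] with shape (left_bond, phys_dim, right_bond);
# the boundary bonds have dimension 1.
dims = [1] + [bond] * (n_sites - 1) + [1]
tensors = [rng.normal(size=(dims[k], phys_dim, dims[k + 1])) for k in range(n_sites)]

def psi(x):
    """Contract the MPS along a configuration x (tuple of 0/1)."""
    v = np.ones((1,))
    for A, xi in zip(tensors, x):
        v = v @ A[:, xi, :]         # absorb one site's matrix
    return v.item()

# unnormalized probability of a configuration: p(x) proportional to psi(x)**2
configs = [tuple(bits) for bits in np.ndindex(*(phys_dim,) * n_sites)]
Z = sum(psi(c) ** 2 for c in configs)   # brute-force normalization (toy size only)
print(psi((0, 1, 0, 1, 1, 0)) ** 2 / Z)
```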
SUPRA: Open Source Software Defined Ultrasound Processing for Real-Time Applications
Title | SUPRA: Open Source Software Defined Ultrasound Processing for Real-Time Applications |
Authors | Rüdiger Göbl, Nassir Navab, Christoph Hennersperger |
Abstract | Research in ultrasound imaging is limited in reproducibility by two factors: first, many existing ultrasound pipelines are protected by intellectual property, rendering exchange of code difficult; second, most pipelines are implemented in special hardware, resulting in limited flexibility of the implemented processing steps on such platforms. Methods: With SUPRA we propose an open-source pipeline for fully Software Defined Ultrasound Processing for Real-time Applications to alleviate these problems. Covering all steps from beamforming to output of B-mode images, SUPRA can help improve the reproducibility of results and make modifications to the image acquisition mode accessible to the research community. We evaluate the pipeline qualitatively, quantitatively, and regarding its run-time. Results: The pipeline achieves image quality comparable to that of a clinical system and, backed by point-spread-function measurements, comparable resolution. Including all processing stages of a usual ultrasound pipeline, the run-time analysis shows that it can be executed in 2D and 3D on consumer GPUs in real time. Conclusions: Our software ultrasound pipeline opens up research in image acquisition. By giving access to ultrasound data from early stages (raw channel data, radio-frequency data), it simplifies development in imaging. Furthermore, it tackles the reproducibility of research results, as code can be shared easily and even executed without dedicated ultrasound hardware. |
Tasks | |
Published | 2017-11-16 |
URL | http://arxiv.org/abs/1711.06127v3 |
PDF | http://arxiv.org/pdf/1711.06127v3.pdf |
PWC | https://paperswithcode.com/paper/supra-open-source-software-defined-ultrasound |
Repo | https://github.com/IFL-CAMP/supra |
Framework | pytorch |
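SUPRA itself is a C++/CUDA codebase, so the sketch below is not its API; it is a generic, heavily simplified Python illustration of the stages the abstract names, from delay-and-sum beamforming of raw channel data to a log-compressed B-mode image.

```python
import numpy as np
from scipy.signal import hilbert

def das_bmode(channel_data, element_x, scanline_x, c=1540.0, fs=40e6):
    """Toy delay-and-sum beamforming + B-mode conversion.

    channel_data: (n_elements, n_samples) raw RF data
    element_x   : (n_elements,) lateral element positions [m]
    scanline_x  : (n_lines,) lateral positions of the beamformed lines [m]
    """
    n_elem, n_samp = channel_data.shape
    depths = np.arange(n_samp) * c / (2 * fs)            # sample index -> depth [m]
    rf = np.zeros((len(scanline_x), n_samp))
    for i, x in enumerate(scanline_x):
        for e in range(n_elem):
            # two-way path: down to (x, depth), back to element e
            dist = depths + np.sqrt(depths ** 2 + (x - element_x[e]) ** 2)
            idx = np.clip((dist / c * fs).astype(int), 0, n_samp - 1)
            rf[i] += channel_data[e, idx]                # sum delayed samples
    env = np.abs(hilbert(rf, axis=1))                    # envelope detection
    return 20 * np.log10(env / env.max() + 1e-6)         # log compression -> dB
```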
Detecting Cancer Metastases on Gigapixel Pathology Images
Title | Detecting Cancer Metastases on Gigapixel Pathology Images |
Authors | Yun Liu, Krishna Gadepalli, Mohammad Norouzi, George E. Dahl, Timo Kohlberger, Aleksey Boyko, Subhashini Venugopalan, Aleksei Timofeev, Philip Q. Nelson, Greg S. Corrado, Jason D. Hipp, Lily Peng, Martin C. Stumpe |
Abstract | Each year, the treatment decisions for more than 230,000 breast cancer patients in the U.S. hinge on whether the cancer has metastasized away from the breast. Metastasis detection is currently performed by pathologists reviewing large expanses of biological tissues. This process is labor intensive and error-prone. We present a framework to automatically detect and localize tumors as small as 100 x 100 pixels in gigapixel microscopy images sized 100,000 x 100,000 pixels. Our method leverages a convolutional neural network (CNN) architecture and obtains state-of-the-art results on the Camelyon16 dataset in the challenging lesion-level tumor detection task. At 8 false positives per image, we detect 92.4% of the tumors, relative to 82.7% by the previous best automated approach. For comparison, a human pathologist attempting exhaustive search achieved 73.2% sensitivity. We achieve image-level AUC scores above 97% on both the Camelyon16 test set and an independent set of 110 slides. In addition, we discover that two slides in the Camelyon16 training set were erroneously labeled normal. Our approach could considerably reduce false negative rates in metastasis detection. |
Tasks | Medical Object Detection |
Published | 2017-03-03 |
URL | http://arxiv.org/abs/1703.02442v2 |
PDF | http://arxiv.org/pdf/1703.02442v2.pdf |
PWC | https://paperswithcode.com/paper/detecting-cancer-metastases-on-gigapixel |
Repo | https://github.com/Reemr/Cancer-detection |
Framework | none |
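A gigapixel slide cannot be fed to a CNN whole; the standard recipe this work builds on is patch-level classification tiled into a tumor-probability heatmap. A hedged sketch of that sliding-window step, where `model` is any patch classifier with a single tumor logit and the patch size and stride are illustrative:

```python
import numpy as np
import torch

@torch.no_grad()
def tumor_heatmap(model, slide, patch=299, stride=128):
    """Slide a patch classifier over a huge image.

    slide: (H, W, 3) uint8 array (in practice read lazily, e.g. via OpenSlide)
    returns a coarse (H//stride, W//stride) tumor-probability map.
    """
    H, W, _ = slide.shape
    heat = np.zeros((H // stride, W // stride), dtype=np.float32)
    for i in range(heat.shape[0]):
        for j in range(heat.shape[1]):
            y, x = i * stride, j * stride
            tile = slide[y:y + patch, x:x + patch]
            if tile.shape[:2] != (patch, patch):
                continue                     # skip partial border tiles
            t = torch.from_numpy(tile).permute(2, 0, 1).float().unsqueeze(0) / 255.0
            heat[i, j] = torch.sigmoid(model(t)).item()  # P(tumor) for this patch
    return heat
```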
Learning from Synthetic Humans
Title | Learning from Synthetic Humans |
Authors | Gül Varol, Javier Romero, Xavier Martin, Naureen Mahmood, Michael J. Black, Ivan Laptev, Cordelia Schmid |
Abstract | Estimating human pose, shape, and motion from images and videos is a fundamental challenge with many applications. Recent advances in 2D human pose estimation use large amounts of manually-labeled training data for learning convolutional neural networks (CNNs). Such data is time-consuming to acquire and difficult to extend. Moreover, manual labeling of 3D pose, depth and motion is impractical. In this work we present SURREAL (Synthetic hUmans foR REAL tasks): a new large-scale dataset with synthetically-generated but realistic images of people rendered from 3D sequences of human motion capture data. We generate more than 6 million frames together with ground truth pose, depth maps, and segmentation masks. We show that CNNs trained on our synthetic dataset allow for accurate human depth estimation and human part segmentation in real RGB images. Our results and the new dataset open up new possibilities for advancing person analysis using cheap and large-scale synthetic data. |
Tasks | Human Part Segmentation, Motion Capture, Pose Estimation |
Published | 2017-01-05 |
URL | http://arxiv.org/abs/1701.01370v3 |
PDF | http://arxiv.org/pdf/1701.01370v3.pdf |
PWC | https://paperswithcode.com/paper/learning-from-synthetic-humans |
Repo | https://github.com/gulvarol/surreal |
Framework | torch |
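Because every SURREAL frame is rendered, pixel-perfect labels such as part masks come for free, and training a part-segmentation CNN on them is ordinary supervised learning. A minimal sketch, assuming a generic fully-convolutional `seg_net` and a loader yielding (rendered frame, per-pixel part label) batches:

```python
import torch
import torch.nn as nn

def train_on_synthetic(seg_net, synthetic_loader, epochs=1):
    """Supervised part segmentation on rendered frames with free labels."""
    opt = torch.optim.Adam(seg_net.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for frames, part_masks in synthetic_loader:
            # frames: (B, 3, H, W) rendered RGB; part_masks: (B, H, W) int labels
            logits = seg_net(frames)            # (B, n_parts, H, W)
            loss = loss_fn(logits, part_masks)
            opt.zero_grad(); loss.backward(); opt.step()

# after training purely on synthetic data, evaluate on real RGB images.
```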
Projection Based Weight Normalization for Deep Neural Networks
Title | Projection Based Weight Normalization for Deep Neural Networks |
Authors | Lei Huang, Xianglong Liu, Bo Lang, Bo Li |
Abstract | Optimizing deep neural networks (DNNs) often suffers from ill-conditioning. We observe that the scaling-based weight-space symmetry in rectified nonlinear networks causes this negative effect. We therefore propose to constrain the incoming weights of each neuron to be unit-norm, which we formulate as an optimization problem over the Oblique manifold. A simple yet efficient method, referred to as projection based weight normalization (PBWN), is developed to solve this problem. PBWN executes standard gradient updates, followed by projecting the updated weights back onto the Oblique manifold. The proposed method has a regularizing effect and collaborates well with the commonly used batch normalization technique. We conduct comprehensive experiments on several widely-used image datasets, including CIFAR-10, CIFAR-100, SVHN and ImageNet, for supervised learning with state-of-the-art convolutional neural networks such as Inception, VGG and residual networks. The results show that our method consistently improves the performance of DNNs with different architectures. We also apply our method to the Ladder network for semi-supervised learning on the permutation-invariant MNIST dataset, where it outperforms the state-of-the-art methods: we obtain test errors of 2.52%, 1.06%, and 0.91% with only 20, 50, and 100 labeled samples, respectively. |
Tasks | |
Published | 2017-10-06 |
URL | http://arxiv.org/abs/1710.02338v1 |
PDF | http://arxiv.org/pdf/1710.02338v1.pdf |
PWC | https://paperswithcode.com/paper/projection-based-weight-normalization-for |
Repo | https://github.com/huangleiBuaa/NormProjection |
Framework | torch |
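The method itself is compact: take a standard gradient step, then project each neuron's incoming weight vector back to unit norm (i.e. onto the Oblique manifold). A PyTorch sketch of one such update:

```python
import torch

def pbwn_step(model, optimizer):
    """One PBWN update: standard gradient step, then projection."""
    optimizer.step()
    with torch.no_grad():
        for m in model.modules():
            if isinstance(m, (torch.nn.Linear, torch.nn.Conv2d)):
                w = m.weight.view(m.weight.shape[0], -1)   # one row per neuron
                w.div_(w.norm(dim=1, keepdim=True).clamp_min(1e-12))
```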
Past, Present, Future: A Computational Investigation of the Typology of Tense in 1000 Languages
Title | Past, Present, Future: A Computational Investigation of the Typology of Tense in 1000 Languages |
Authors | Ehsaneddin Asgari, Hinrich Schütze |
Abstract | We present SuperPivot, an analysis method for low-resource languages that occur in a superparallel corpus, i.e., in a corpus that contains an order of magnitude more languages than parallel corpora currently in use. We show that SuperPivot performs well for the crosslingual analysis of the linguistic phenomenon of tense. We produce analysis results for more than 1000 languages, conducting, to the best of our knowledge, the largest crosslingual computational study performed to date. We extend existing methodology for leveraging parallel corpora for typological analysis by overcoming a limiting assumption of earlier work: we only require that a linguistic feature is overtly marked in a few of the thousands of languages, as opposed to requiring that it be marked in all languages under investigation. |
Tasks | |
Published | 2017-04-28 |
URL | http://arxiv.org/abs/1704.08914v2 |
PDF | http://arxiv.org/pdf/1704.08914v2.pdf |
PWC | https://paperswithcode.com/paper/past-present-future-a-computational |
Repo | https://github.com/pywirrarika/naki |
Framework | none |